Enabling the NuNet platform to support GPU clustering across multiple Compute Providers in a decentralized network for large-scale distributed jobs, potentially approaching the scale of major cloud providers.
Please provide your proposal title
NuNet: Enabling GPU clusters - Scaling&Expansion
Enter the amount of funding you are requesting in ADA
100000
Please specify how many months you expect your project to last
8
Please indicate if your proposal has been auto-translated
No
Original Language
en
What is the problem you want to solve?
The inability to use multiple GPUs across multiple computers in NuNet’s network hinders many decentralized compute applications, including large-scale AI model training, federated learning, and others.
Supporting links
Does your project have any dependencies on other organizations, technical or otherwise?
Yes
Describe any dependencies or write 'No dependencies'
Internal dependencies (existing NuNet technologies that will be augmented during the project):
The primary dependency is NuNet's device-management-service (DMS) protocol, which provides the foundational decentralized orchestration layer for compute resources spread across globally distributed devices. DMS enables distributed task execution across a network of participating devices, handling resource discovery, allocation, and coordination in a trustless environment.
The NuActor system is an integral part of the device-management-service and delivers a specialized programming model designed for the secure development of decentralized systems. The NuActor framework provides isolation guarantees, message-passing semantics, and fault-tolerance mechanisms essential for reliable distributed applications. It abstracts the complexities of distributed state management while maintaining security properties across network boundaries.
Consequently, most of the work in this project will go into enhancing the Device Management Service to support job deployments that require a clustered set of GPUs.
Will your project's outputs be fully open source?
Yes
License and Additional Information
NuNet's licensing policy, adopted in April 2023, follows the principle of "Open Source code for community and business use" and is publicly available at https://docs.nunet.io/docs/nunet-licensing-policy. It defines how licensing is applied across all of NuNet's projects, which are publicly hosted at https://gitlab.com/nunet. All components developed as a result of this project will be categorized as core or non-core protocol components and licensed under the Apache 2.0 open source software license from the start.
Please choose the most relevant theme and tag related to the outcomes of your proposal.
AI
Describe what makes your idea innovative compared to what has been previously funded (whether by you or others).
NuNet’s decentralized infrastructure currently limits each job to a single GPU on a single machine, which restricts scalable workloads. Our innovation addresses this critical challenge by enabling dynamic, clustered GPU processing across a network of diverse machines. We are delivering a truly scalable solution that pools computing resources to maximize performance and bandwidth efficiency despite differences in hardware capabilities. This overcomes the current limitation and offers a robust, adaptable platform for demanding GPU workloads.
Describe what your prototype or MVP will demonstrate, and where it can be accessed.
The prototype will be built on the Device Management Service (DMS), which forms the core of the NuNet decentralized computing platform. Once completed, it should be possible to use multiple GPUs from multiple machines for a single job by pooling the resources of each machine into an integrated cluster.
Describe realistic measures of success, ideally with on-chain metrics.
Successful completion of this project will mean that the NuNet platform supports ensemble specifications (job descriptions) that can deploy a job requiring multiple GPUs and leverage GPUs across multiple machines. In addition, example ensemble specifications, documentation, and tutorials will be provided to make this feature easy to use and accessible for both testers and users.
Please describe your proposed solution and how it addresses the problem
NuNet has already established a robust GPU identification and computing capability within the Device Management Service (DMS), developed through funding from Cardano Catalyst Fund 8. This initial implementation successfully supports single-GPU jobs, providing reliable resource discovery, scheduling, and execution across the decentralized network. Building on this foundation, the next step is to extend DMS and the existing ensemble specification so that Orchestrators can run multi-GPU jobs across multiple Compute Providers. By developing mechanisms to orchestrate GPU resources across heterogeneous devices in a decentralized environment, we aim to unlock a new layer of scalability and efficiency for high-performance workloads, benefiting NuNet users and contributors alike.
At the core of the solution is the extension of the ensemble specification used for job descriptions to request and coordinate multiple GPUs in a flexible and transparent manner. This extension of the ensemble spec will involve adding new parameters for GPU requirements, such as the number of GPUs, minimum memory per GPU, CUDA/ROCm compatibility, and interconnect performance. For example, a job may specify that it requires four NVIDIA A100-class GPUs or eight consumer-level GPUs with at least 8GB VRAM each. The platform’s scheduling system should then identify available GPUs across the network that collectively satisfy the requirement, regardless of whether they reside on a single machine or multiple compute providers’ devices.
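To make the matching step above concrete, the following is a minimal Python sketch under stated assumptions. `GPURequirement`, `AdvertisedGPU`, and `select_gpus` are hypothetical names, not part of the current ensemble specification or DMS; their fields simply mirror the parameters named above (number of GPUs, minimum memory per GPU, CUDA/ROCm compatibility). The sketch filters advertised GPUs against the requirement and fills the request from the machines offering the most eligible GPUs first, so the job spans as few machines as possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPURequirement:
    """Hypothetical multi-GPU request mirroring the parameters proposed above."""
    count: int          # number of GPUs the job needs
    min_vram_gb: int    # minimum memory per GPU
    backend: str        # "cuda" or "rocm"


@dataclass(frozen=True)
class AdvertisedGPU:
    """Hypothetical record of a GPU advertised by a Compute Provider's machine."""
    machine_id: str
    vram_gb: int
    backend: str


def select_gpus(req, available):
    """Return req.count compatible GPUs, or None if the network cannot satisfy req.

    Machines with the most eligible GPUs are used first, so the job spans as
    few machines as possible (less cross-machine traffic)."""
    eligible = [g for g in available
                if g.vram_gb >= req.min_vram_gb and g.backend == req.backend]
    if len(eligible) < req.count:
        return None
    # Group eligible GPUs by the machine that advertised them
    by_machine = {}
    for g in eligible:
        by_machine.setdefault(g.machine_id, []).append(g)
    groups = sorted(by_machine.values(), key=len, reverse=True)
    chosen = []
    for group in groups:
        chosen.extend(group[: req.count - len(chosen)])
        if len(chosen) == req.count:
            break
    return chosen
```

For example, a request for four CUDA GPUs with at least 8 GB each, against machines offering three, two, and one eligible GPUs, takes three GPUs from the first machine and one from the second. A real scheduler would also weigh benchmark results, cost, and network topology.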
A key technical challenge is enabling distributed GPU execution across machines that may be geographically dispersed and connected over variable network conditions. To address this, the solution will integrate established distributed computing paradigms such as data parallelism and model parallelism into NuNet’s decentralized architecture. Data parallelism runs identical replicas of the model on multiple GPUs, each processing a different shard of the data, with gradients or parameters synchronized at each training step; model parallelism divides different parts of a computational graph across GPUs. We will implement middleware that leverages libraries like Horovod and gRPC-based message passing to ensure efficient communication between GPUs, even in decentralized settings. It should also be possible to run libraries such as NCCL (NVIDIA Collective Communication Library) in containerized jobs without such middleware. This is not currently possible because of the way the IP layer on top of the peer-to-peer network is implemented to forward ports.
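The data-parallel pattern described above can be illustrated with a small, dependency-free Python simulation. This is not the planned middleware: `allreduce_mean` is a hypothetical stand-in for the collective operation that Horovod or NCCL would perform across GPUs, and the model is a one-parameter linear regression chosen only to keep the arithmetic visible. Each simulated worker holds a full replica of the model, computes a gradient on its own data shard, and applies the averaged gradient, so all replicas remain identical after every step.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error of y ~ w * x over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Hypothetical stand-in for an all-reduce collective (as performed by
    Horovod or NCCL): every worker receives the mean of all workers' values."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def data_parallel_step(weights, shards, lr):
    """One synchronous data-parallel SGD step across all simulated workers."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    synced = allreduce_mean(grads)  # parameter synchronization at each step
    return [w - lr * g for w, g in zip(weights, synced)]


def train(data, num_workers, lr, steps):
    # Round-robin split of the dataset into one shard per worker
    shards = [data[i::num_workers] for i in range(num_workers)]
    weights = [0.0] * num_workers  # identical initial replicas
    for _ in range(steps):
        weights = data_parallel_step(weights, shards, lr)
    return weights


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # y = 3x
    print(train(data, num_workers=4, lr=0.01, steps=50))
```

Because the shards here are equal in size, the averaged gradient equals the full-batch gradient; with unequal shards, a weighted average by shard size would be needed. Over a wide-area network, the all-reduce at every step is exactly the communication cost that the middleware and GPU placement must keep small.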
The solution also requires a resource discovery and allocation layer capable of securely advertising GPU availability, benchmarking performance, and verifying compatibility. Because GPUs differ significantly in performance, driver stack, and software dependencies, a profiling and benchmarking mechanism will be built into the Device Management Service (DMS). This will allow DMS’s resource allocation to better match job requirements with available GPU resources, optimizing for latency, bandwidth, and cost efficiency. Where possible, GPUs on the same machine or local network will be prioritized to minimize interconnect bottlenecks, while cross-machine GPU coordination will be handled through synchronization protocols that tolerate network variability.
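One way to express the locality preference above is a placement score that rewards benchmarked throughput and penalizes spreading a job across machines and networks. The sketch below is an illustration only: the fields `benchmark_score` and `network_id` and both penalty constants are hypothetical assumptions, real values would come from the profiling step and measured latency and bandwidth, and the exhaustive search is practical only for small candidate sets.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ProfiledGPU:
    gpu_id: str
    machine_id: str
    network_id: str          # hypothetical label for a LAN or site
    benchmark_score: float   # hypothetical output of DMS profiling


# Hypothetical penalties; real values would be derived from measured latency/bandwidth
CROSS_MACHINE_PENALTY = 10.0
CROSS_NETWORK_PENALTY = 50.0


def placement_score(gpus):
    """Total benchmarked compute minus penalties for each extra machine/network spanned."""
    machines = {g.machine_id for g in gpus}
    networks = {g.network_id for g in gpus}
    compute = sum(g.benchmark_score for g in gpus)
    return (compute
            - CROSS_MACHINE_PENALTY * (len(machines) - 1)
            - CROSS_NETWORK_PENALTY * (len(networks) - 1))


def best_placement(candidates, count):
    """Highest-scoring set of `count` GPUs, or None if too few candidates exist."""
    best = max(combinations(candidates, count), key=placement_score, default=None)
    return list(best) if best is not None else None
```

With these example penalties, two GPUs on the same machine outscore a pairing with higher raw throughput that is split across two networks; if the remote GPU were fast enough to outweigh the penalties, the score would favor it instead.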
From a decentralization and trust perspective, the solution will build upon NuNet’s existing sandboxing and containerization technologies. Each multi-GPU job will be packaged in a container or virtualized execution environment with explicit resource access controls. Compute Providers will retain full sovereignty over their devices, with configurable limits on how their GPUs may be pooled into cluster jobs.
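A provider-side admission check is one possible way to enforce such limits. The policy fields and the `admit` function below are hypothetical illustrations, not an existing DMS configuration schema; the idea shown is that the decision runs against the provider's own policy, so the provider keeps the final say over whether its GPUs join a cluster job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderClusterPolicy:
    """Hypothetical provider-side limits on pooling local GPUs into cluster jobs."""
    allow_cluster_jobs: bool
    max_pooled_gpus: int     # local GPUs that may be in cluster jobs at once
    max_job_hours: float     # longest cluster job the provider will accept


@dataclass(frozen=True)
class ClusterJobRequest:
    """Hypothetical request for a share of this provider's GPUs in a cluster job."""
    gpus_requested_here: int
    estimated_hours: float


def admit(policy, request, gpus_already_pooled):
    """Return (accepted, reason) according to the provider's own policy."""
    if not policy.allow_cluster_jobs:
        return False, "cluster jobs disabled by provider"
    if gpus_already_pooled + request.gpus_requested_here > policy.max_pooled_gpus:
        return False, "pooled GPU limit exceeded"
    if request.estimated_hours > policy.max_job_hours:
        return False, "job exceeds maximum duration"
    return True, "accepted"
```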
To make this functionality usable and widely adoptable, we will develop documentation, example ensemble specs, and tutorials demonstrating common workflows. For instance, Orchestrators will be able to submit a training job for a large transformer model that automatically distributes computation across multiple Compute Providers’ GPUs, with the system abstracting away most of the complexity. Similarly, tutorials will show how to specify GPU requirements for simulations or inference tasks without needing to manage hardware allocation directly.
The final deliverable will be a production-ready GPU clustering feature integrated into the NuNet platform, complete with APIs for specifying GPU jobs, a scheduling and orchestration layer, and supporting developer tools. This will empower both early testers and users to harness a truly decentralized GPU cluster at scale, paving the way for NuNet to support the next generation of compute-intensive applications in AI, data science, and beyond.
Please define the positive impact your project will have on the wider Cardano community
The development of GPU clustering within NuNet’s decentralized computing platform represents a transformative milestone with wide-reaching impact across technological, economic, and societal domains. By enabling single jobs to utilize multiple GPUs across multiple machines contributed by diverse Compute Providers, this feature unlocks a new dimension of scalability, accessibility, and efficiency in distributed and decentralized computing. The integration of this capability within NuNet’s existing ensemble specification and Device Management Service (DMS) ensures that the innovation builds directly upon the platform’s foundations, strengthening its role as a global marketplace for compute.
Empowering Orchestrators with Scalable Compute
The ability to aggregate GPU resources across heterogeneous machines addresses one of the most pressing challenges faced by Orchestrators today: access to scalable, high-performance infrastructure. Training large machine learning models, running simulations, or performing inference at scale requires clusters of GPUs that are typically available only in centralized data centers operated by major cloud providers. By contrast, NuNet’s GPU clustering functionality democratizes access to these capabilities, allowing Orchestrators to seamlessly harness distributed GPUs owned by independent Compute Providers. This reduces reliance on centralized platforms, lowering costs and fostering greater autonomy for developers, researchers, and enterprises.
Unlocking Economic Opportunities for Compute Providers
For Compute Providers, GPU clustering creates new opportunities to monetize underutilized resources. High-performance GPUs are often idle or underused outside peak workloads, particularly in academic institutions, enterprise environments, or even among individual owners of gaming-class hardware. By enabling these GPUs to be pooled into distributed clusters, Compute Providers can participate in high-value compute markets such as AI training and scientific computing. This creates a decentralized revenue stream, incentivizing broader participation and contributing to the growth and resilience of the NuNet ecosystem.
Advancing Decentralized Infrastructure and Innovation
The impact of GPU clustering extends beyond economics into the realm of infrastructure innovation. By demonstrating that complex, high-performance workloads can be executed efficiently on a decentralized and heterogeneous network, NuNet challenges the conventional assumption that such capabilities require centralized cloud-scale infrastructure. This represents a paradigm shift in how compute-intensive tasks can be distributed, coordinated, and validated. In the long term, it sets the stage for more advanced decentralized capabilities, including large-scale AI model training, distributed simulations for scientific research, and federated approaches to compute that prioritize data sovereignty and privacy.
Impact on the Cardano Ecosystem
NuNet’s NTX token is a native asset on Cardano, and the integration of GPU clustering directly strengthens the Cardano ecosystem. NuNet is actively developing payment systems for the platform, and by enabling high-performance compute services within a tokenized framework, GPU clustering creates new real-world demand for NTX and drives utility within the Cardano economy. Orchestrators will require NTX to access decentralized GPU resources, while Compute Providers will be compensated in NTX for their contributions, creating a vibrant, self-sustaining marketplace of value exchange. This deepens liquidity and adoption of Cardano-native assets while showcasing Cardano’s scalability for real-world applications.
Furthermore, GPU clustering opens pathways for collaboration with other Cardano-native projects, such as decentralized AI initiatives, research collectives, and DeFi platforms that require compute-intensive analytics. By anchoring these services to Cardano, NuNet strengthens the ecosystem’s positioning as not just a financial infrastructure but also as a foundation for decentralized compute and AI innovation. This alignment demonstrates Cardano’s ability to host advanced, real-economy use cases, reinforcing its competitiveness in the broader blockchain landscape.
In summary, GPU clustering on NuNet is not just a technical enhancement, but a catalyst for broad and lasting impact. It democratizes access to high-performance compute, creates new economic opportunities, strengthens decentralized infrastructure, supports sustainability, and accelerates innovation. By extending the existing capabilities of the DMS and ensemble spec framework, and by anchoring utility to the NTX token on Cardano, this development delivers benefits both to the NuNet network and the broader Cardano ecosystem.
What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?
Illustration of Capacity:
Our organization comes with a history of successfully bringing intricate technology projects to fruition. The pillars of our success lie in our deep-rooted technical understanding, stringent project management practices, and an unwavering focus on transparency and responsibility.
Our team consists of experienced software engineers with a strong track record in building distributed computing and decentralized systems. We have already demonstrated our capability by developing the first iteration of GPU support for single-GPU jobs, which is fully integrated into the NuNet platform. The Device Management Service (DMS) that makes up the NuNet platform is currently live and operational, providing a proven foundation for extending support to multi-GPU, multi-machine workloads. This history of successful implementation underscores our readiness to tackle the challenges of the proposed GPU clustering project.
NuNet has been committed to Open Source Software development since its inception. Therefore, all our development and progress is available for public scrutiny at all times and open to collaboration with the community. We actively invite and work with the community on contributing to, using, and testing the platform codebase.
Link: https://gitlab.com/nunet
NuNet licensing policy:
https://docs.nunet.io/nunet-licensing-policy/
Openness and Responsibility:
We have established a robust framework to ensure openness and responsibility in the execution of the project and the management of finances:
These measures reflect our commitment to openness, responsibility, and proper management of funds. We believe that these factors, along with our technical capabilities, make us an ideal choice to successfully execute this project.
We understand that not all of the steps we have implemented apply to this Catalyst proposal, but they demonstrate the internal working procedures we have in place.
Catalyst Experience
NuNet has also received funding for proposals in Fund7 and Fund8, and both proposals have been delivered. The funds were spent as intended on development, which can be monitored on GitLab through daily commits since the awards.
https://gitlab.com/groups/nunet/-/milestones/19#tab-issues
https://gitlab.com/groups/nunet/-/milestones/20#tab-issues
Financial Stability
As a 20+ strong team, we have independent funding to develop the core platform with a cash runway for at least 1-1.5 years. Cardano Catalyst proposals are used to extend the functionality and add features to the platform in order to enrich the possible use cases.
Risk analysis and mitigation:
Insufficient Compute Providers – If the network does not have enough Compute Providers available to support multi-GPU jobs due to the substantial resource requirements of high-performance workloads, a compensation structure will be implemented in which Orchestrators pay NTX to Compute Providers for access to their GPUs, similar to a cloud computing model. Additionally, NuNet will deploy a portion of its own GPU resources as a fallback to ensure network balance and maintain availability for critical jobs at all times.
Milestone Title
Requirements and Architecture Design
Milestone Outputs
NuNet will provide the following:
Acceptance Criteria
NuNet will provide the following:
Evidence of Completion
NuNet will provide the following:
Delivery Month
1
Cost
15000
Progress
10 %
Milestone Title
Resource Discovery and Allocation Layer
Milestone Outputs
NuNet will provide the following:
Acceptance Criteria
NuNet will provide the following:
Evidence of Completion
NuNet will provide the following:
Delivery Month
4
Cost
30000
Progress
40 %
Milestone Title
Distributed Execution Middleware
Milestone Outputs
NuNet will provide the following:
Acceptance Criteria
NuNet will provide the following:
Evidence of Completion
NuNet will provide the following:
Delivery Month
6
Cost
30000
Progress
70 %
Milestone Title
Documentation, Tutorials, and Examples
Milestone Outputs
NuNet will provide the following:
Acceptance Criteria
NuNet will provide the following:
Evidence of Completion
NuNet will provide the following:
Delivery Month
7
Cost
10000
Progress
90 %
Milestone Title
Testing, Validation, and Production Release
Milestone Outputs
NuNet will provide the following:
Acceptance Criteria
NuNet will provide the following:
Evidence of Completion
NuNet will provide the following:
Delivery Month
8
Cost
15000
Progress
100 %
Please provide a cost breakdown of the proposed work and resources
The following budget breakdown provides an overview of the anticipated costs essential to delivering NuNet’s GPU clustering functionality. This allocation supports every critical stage, including project management, coordination, software development, testing, and associated research of ecosystem technologies, ensuring that NuNet has the resources necessary to build a robust, production-ready multi-GPU, multi-machine system.
Outlined in ADA, the budget is categorized to cover project management, IT development, system integration, testing, documentation, and research. This structured approach ensures operational efficiency, seamless technical execution, and a strategic reserve for unforeseen requirements — setting a strong foundation for project success.
Milestone #1: Requirements and Architecture Design
15,000 ADA / Resources & Skills Needed: Systems architects, software engineers/developers
Milestone #2: Resource Discovery and Allocation Layer
30,000 ADA / Resources & Skills Needed: Software engineers, DevOps, infrastructure for testing
Milestone #3: Distributed Execution Middleware
30,000 ADA / Resources & Skills Needed: Software developers, distributed computing engineers, testing infrastructure
Milestone #4: Documentation, Tutorials, and Examples
10,000 ADA / Resources & Skills Needed: Technical writing and documentation, tutorial creation
Milestone #5: Testing, Validation, and Production Release
15,000 ADA / Resources & Skills Needed: QA engineers, performance benchmarking and testing infrastructure
How does the cost of the project represent value for the Cardano ecosystem?
This project delivers exceptional value by addressing a critical need for high-performance, decentralized GPU compute. Building on NuNet’s existing single-GPU functionality, it enables multi-GPU, multi-machine workloads, significantly expanding the platform’s capabilities for Orchestrators and Compute Providers alike.
By allowing jobs to leverage GPUs across multiple machines, the project enhances scalability, efficiency, and resource utilization. Orchestrators gain access to flexible, cost-effective compute resources without relying on centralized providers, while Compute Providers are able to monetize underutilized GPUs, creating a sustainable and self-reinforcing ecosystem.
From an operational perspective, the platform reduces risks associated with hardware bottlenecks, single points of failure, and underutilized resources. The decentralized architecture ensures that workloads continue to execute reliably even when individual nodes experience downtime or performance variability.
By combining distributed GPU orchestration, efficient resource utilization, and a tokenized settlement mechanism, this project delivers a robust, production-ready solution that strengthens NuNet’s platform, unlocks new opportunities for complex workloads, and ensures the network remains flexible, resilient, and sustainable.
Terms and Conditions:
Yes
Project Team
NuNet is a deep tech startup developing cutting-edge solutions in the decentralized open source space. Currently, there are 30+ people at NuNet working on delivering use cases, primarily for Cardano.
On top of that, as a SingularityNET spin-off, NuNet has access to 100+ AI and software engineers for support. The main team members responsible for this proposal are presented below.
The NuNet Team working on this project:
Name: Kabir Veitas, PhD AI, MBA
Location: Brussels, Belgium
LinkedIn: https://www.linkedin.com/in/vveitas/
Position: Co-Founder & CEO/CTO
Bio:
Demonstrated experience working in the computer software, research, and management consulting industries. Skilled in artificial intelligence, cognitive and computer sciences, systems thinking, technology strategy, strategic business planning, management, and social science research. Strong operations professional with a Doctor of Philosophy (PhD) focused on Multi/Interdisciplinary Studies from Vrije Universiteit Brussel.
Name: Janaina Senna, MSc CS, MBA
Location: Belo Horizonte, Brazil
LinkedIn: https://www.linkedin.com/in/janaina-farnese-senna/
Position: Product Owner
Bio:
She holds a master's degree in computer science and has played different roles over the past 20 years, such as development manager, tech lead, and system architect, helping organizations launch new software and hardware products in the telecommunications and energy sectors. As a product owner, she has shaped the product vision into manageable tasks and built the bridge between developers and stakeholders. She enjoys seeing products come to life!
Name: Dagim Sisay Anbessie, BSc CS
Location: Addis Ababa, Ethiopia
LinkedIn: https://www.linkedin.com/in/dagim-sisay-7b4b05b8/
Position: Tech Lead
Bio:
Experience in projects in the areas of robotics, machine learning, system software development, and server application deployment and administration for several international clients. At SingularityNET he worked on AI and miscellaneous software development. His main responsibilities were researching the development path and the technology to be used, and directing specific tasks to the dev team. He has also been involved in system development when circumstances demand it.
Name: Samuel Lake
Location: Phuket, Thailand
LinkedIn: https://www.linkedin.com/in/sam-lake-a04698127/
Position: Dev Rel / Community testing
Bio:
30 years of experience delivering projects in IT infrastructure, cloud, networking, and IoT. Sam's responsibilities at NuNet include working alongside the development team to test and provide input into the solutions being developed, as well as handling communications with, and input from, the community.
Name: Jennifer Bourke, BA, MSc
Location: Dublin, Ireland
LinkedIn: https://www.linkedin.com/in/jennifer-bourke-1bb286158/
Position: Marketing and Community Lead
Bio:
A data-driven marketing expert with a postgraduate degree in digital marketing and data analytics. Currently pursuing a postgraduate degree in global leadership, she combines her strategic marketing skills with a global perspective. With over 6 years of experience, Jennifer has a proven track record of driving successful marketing campaigns.
Name: Ilija Radeljic, MSc CE
Location: Oslo, Norway
LinkedIn: https://www.linkedin.com/in/ilija-radelji%C4%87-2108ab14/
Position: COO
Bio:
Corporate industry veteran and AI and blockchain enthusiast, bringing 15 years of experience managing major infrastructure, power, and manufacturing projects to the emerging blockchain world and its applications.
15+ years of experience in business negotiation, partnerships, leads, market entry, project management, promotion and presentations worldwide.
Formal engineering education, MSc Civil Engineering + MIT Sloan Executive Management and Leadership certified.
Cardano Catalyst Community Advisor and Cardano Catalyst Veteran Community Advisor since the beginning (Fund2) and consulted several funded proposals in Cardano Catalyst.
External auditors:
NuNet is also collaborating with the external auditing company Obsidian (https://obsidian.systems/), which has been contracted to audit the core platform development as well as specific use case integrations such as this one.
We intend to extend their contract (or hire another suitable 3rd party auditor) for auditing the implementation of this research work as well.
External support:
NuNet has a capable team (30+) to tackle the project, but extra resources or skills from outside the available pool may sometimes be needed. These will be sourced either as additional employees or through subcontracting, depending on the size and length of the development.