The recent rise in AI and ML has resulted in a shortage of available GPU computing power on the market, which is dominated by big cloud service providers whose offerings are often expensive and out of reach.
NuNet is a decentralized, open-source, peer-to-peer computing alternative to big cloud providers. Currently, NuNet enables computing on one GPU on one machine for one task; scaling up is the next step.
NuNet: Decentralized GPU Splitting on software level. Splitting large-scale compute work into small containers optimized for deployment on decentralized hardware is a necessary component of any decentralized system.
Avimanyu Bandyopadhyay, PhD candidate
Lead Researcher and systems scientist
Kabir Veitas, PhD
CEO and lead architect
For the development we shall use only open source frameworks as follows:
These technical dependencies do not require any permissions beyond their open-source licenses and have been taken into account in our feasibility study. Therefore, integrating these dependencies will not cause any delays.
The project will be fully open source, in line with our licensing policy.
SDG Goals
SDG Subgoals
Problem:
The recent rise in AI and ML has resulted in a shortage of available GPU computing power on the market, which is dominated by big cloud service providers whose offerings are often expensive and out of reach.
Currently, there is no real-world use case for distributed scaling that works on a decentralized network. There is a need for a globally distributed computing infrastructure that works seamlessly on consumer devices and machines. That can be achieved by splitting large-scale models into separate containers deployable on decentralized infrastructure for parallel execution and combination of the results.
Unique solution:
Currently, NuNet enables GPU computing on multiple GPU cards on one machine for a single task; this was implemented as part of the funded Fund8 proposal "NuNet: Decentralized GPU ML Cloud". Splitting large-scale execution models into containerized components that run and communicate in parallel on decentralized hardware is the next step.
Detailed approach:
The general process of distributing a single GPU job (training, inference, or general-purpose computing) across multiple nodes in a decentralized network using the specified tools and techniques is briefly outlined as follows:
1. Job Preparation:
Prepare the specific task: it could be training a machine learning (ML) model or performing inference using a pre-trained model, with the Python program ready for the task at hand. It could also be a non-ML computational Python program.
2. Environment Setup:
Wrap the Python script and the necessary libraries (like TensorFlow, PyTorch, or other machine learning or computational libraries) into a standardized unit, which we'll refer to as the 'distributed job.' The distributed job will also include tools for distributed processing (like Horovod) and network communication (like libp2p).
3. Node Configuration:
Arrange the nodes in the network and ensure they have the necessary tools to handle the 'distributed job'. All nodes should be connected through a communication protocol, such as libp2p.
4. Job Splitting:
Distribute the job across the nodes in the network. Each node now has an identical setup and is capable of executing the task independently.
5. Task Initialization:
Initiate the task using distributed processing tools. For training, the data is split among nodes, and the model is trained simultaneously on each node. For inference, the nodes make predictions independently on a subset of the data. (A minimal training sketch illustrating steps 5 to 7 is given after this list.)
6. Inter-container Communication:
As the nodes execute the task inside containers, they use peer-to-peer network communication, such as libp2p, to share and synchronize their work.
7. Task Finalization:
Once the task is complete, we gather the results. For training, the final model parameters can be accessed from any of the nodes. For inference, we may need to collect and compile the prediction results from each node. For general-purpose computation, we simply collect the results.
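To make steps 5 to 7 more concrete, below is a minimal sketch of what the per-node training script inside each container could look like, assuming Horovod with PyTorch. The dataset, model, and hyperparameters are placeholders for illustration only; the actual jobs submitted to the network would supply their own.

```python
# Minimal per-node training sketch (illustrative only): each container runs an
# identical copy of this script; Horovod synchronizes gradients across nodes.
import torch
import torch.nn as nn
import torch.utils.data as data
import horovod.torch as hvd

hvd.init()                                   # one Horovod process per node/GPU
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())  # pin this process to its local GPU

# Placeholder dataset and model; a real job would ship its own.
features = torch.randn(10_000, 32)
labels = torch.randint(0, 2, (10_000,))
dataset = data.TensorDataset(features, labels)

# Shard the data so every node trains on a different subset (step 5).
sampler = data.distributed.DistributedSampler(
    dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = data.DataLoader(dataset, batch_size=64, sampler=sampler)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
if torch.cuda.is_available():
    model.cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# The cross-node synchronization described in step 6 happens inside this
# wrapper, via Horovod's own allreduce on every optimizer step.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Start all nodes from identical weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):
    sampler.set_epoch(epoch)
    for x, y in loader:
        if torch.cuda.is_available():
            x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

# Step 7: after training, the synchronized parameters can be read from any
# node; by convention, rank 0 saves the final model.
if hvd.rank() == 0:
    torch.save(model.state_dict(), "final_model.pt")
```

A job of this shape would typically be launched on every node by the orchestration layer (for example via horovodrun or an equivalent launcher), after which the trained parameters are identical on all participating nodes.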
Benefits for the Cardano ecosystem:
The research is a continuation and expansion of the already completed Fund8 proposal. It will enable all dApps and use cases in the web2 and web3 space that need GPU computing power to source it via NuNet. The value for the compute provided will be exchanged via the NTX token, which is a Cardano Native Token.
Each transaction will be executed as a Smart Contract on the Cardano blockchain, which will directly increase transaction volume and CNT volume, as well as provide unique use cases to be built on top of it for the Cardano ecosystem.
The proposal addresses the following directions of the challenge:
The research done in this proposal will lead to the development of the NuNet framework, available as open source to all users in the Cardano ecosystem and beyond as development continues. To enable the open-source community to use NuNet, an extensive knowledge base, documentation, and step-by-step procedures will be prepared.
The current hot trends are in AI and large-scale machine learning, and they are not slowing down. GPU computing is the main enabler of these trends, and the results of this research and development will tap directly into that demand.
NuNet is building technology that will allow people to provision hardware for AI/ML jobs, monetized via the Cardano ecosystem; in the short term, in case of success, this may boost Cardano usage; in the long term, it would connect real-world assets (computing power) with the crypto payment space through Cardano integration.
NuNet is building a potentially disruptive technology with the potential to capture a share of the global computing market, currently valued at 548 billion USD and projected to grow to 1,240 billion USD. Capturing even a fraction of this market would result in substantial value being moved via Cardano Smart Contracts. Based on this proposal and its research, implementation will proceed, at which point a more precise estimate of the number of users can be made. Anyone in the Cardano ecosystem could deploy and use these cheaper GPU cluster resources for AI, ML, rendering, and many other applications. It is a fundamental enabling technology.
Source:
This project will result in the implementation of a way to distribute large-scale models that need GPU computing resources (mostly machine learning and AI related) across a decentralized network of hardware resources owned by the community. In case of success, it will give access to these resources to groups of users who are currently excluded (due to the high price and low availability of GPU resources, as explained in the problem statement).
After completing this project, we expect a substantial increase in deployment requests on the NuNet network, which uses Cardano smart contracts for its tokenomics and settlement layer; this in turn will increase transactions on the Cardano network and further develop real use cases in the Cardano ecosystem.
Some of the direct benefits to the Cardano ecosystem are:
Some of the indirect benefits to the Cardano ecosystem are:
Spreading Outputs Over a Timescale
Our project plan includes clear milestones and deliverables, which will be shared publicly as they are completed. This incremental release of outputs will ensure a continuous stream of updates for the community.
This approach lets us provide updates on a regular basis, and offers users the chance to provide feedback that we can use to guide subsequent development.
Sharing Outputs, Impacts, and Opportunities
We intend to leverage various communication channels to share our project's outputs, impacts, and opportunities:
Testing and further research
As an open-source project, our outputs will be freely accessible for further research and development. We encourage the community's involvement in testing our solutions to enhance their real-world performance.
Community Testing: We'll invite our users to participate in alpha and beta testing phases, where they can help identify bugs and suggest improvements. We'll use GitLab's issue tracking for managing feedback and provide guidelines for issue reporting and feature suggestions.
Internally, we'll use project insights and community feedback to guide our future work, optimize performance, and prioritize new features. Our aim is to foster a collaborative development ecosystem that is robust, relevant, and of high quality.
Illustration of Capacity:
Our organization comes with a history of successfully bringing intricate technology projects to fruition. The pillars of our success lie in our deep-rooted technical understanding, stringent project management practices, and an unwavering focus on transparency and responsibility.
Our team comprises seasoned software engineers skilled in containerization (Docker), distributed computing (Horovod), and peer-to-peer networking (Go libp2p). NuNet's past work includes the implementation of projects similar to the one proposed here, showcasing our readiness to tackle the unique challenges this project poses.
NuNet has been committed to open-source software development from its inception. All our development and progress is therefore available for public scrutiny at all times and open to collaboration with the community. We actively invite and work with the community on contributions to, usage of, and testing of the platform codebase.
Link: https://gitlab.com/nunet
NuNet licensing policy:
https://docs.nunet.io/nunet-licensing-policy/
Openness and Responsibility:
We have established a robust framework to ensure openness and responsibility in the execution of the project and the management of finances:
1. Elaborate Budgeting: We present an exhaustive budget layout at the start of the project that details the fund allocation across various tasks. This leaves no room for ambiguity regarding the utilization of funds.
2. Periodic Reporting: Regular updates regarding the project and financial statements will be shared, offering complete transparency into the progression of the project and the use of funds.
3. External Auditing: We are open to audits conducted by independent third parties at regular intervals. This ensures responsibility and openness in our financial management.
4. Escrow Mechanisms: To further reassure proper use of funds, we can utilize an escrow service. This arrangement ensures that the project funds are held by a third party and released according to pre-set milestones. This provides an extra layer of assurance for the funds.
5. Payment Based on Milestones: Our payment structure is built around specific, agreed-upon milestones. This ensures that funds are released as we achieve these milestones. The completion of each milestone can be verified, ensuring you pay only for verifiable progress.
These measures reflect our commitment to openness, responsibility, and proper management of funds. We believe that these factors, along with our technical capabilities, make us an ideal choice to successfully execute this project.
We understand that not all of the steps we have implemented are applicable to this Catalyst proposal, but they demonstrate the internal working procedures we have in place.
Catalyst Experience
NuNet has also received funding for proposals in Fund7 and Fund8. One proposal has been successfully closed and the other is close to completion, with one technical obstacle left to be solved. Overall, the funds were spent as intended on development, which can be monitored on GitLab, with daily commits since the award.
https://gitlab.com/groups/nunet/-/milestones/19#tab-issues
https://gitlab.com/groups/nunet/-/milestones/20#tab-issues
Financial Stability
As a 28+ strong team, we have independent funding to develop the core platform with a cash runway for at least 1-1.5 years. Cardano Catalyst proposals are used to extend the functionality and add features to the platform in order to enrich the possible use cases.
The financial report is publicly available and can be reviewed here:
https://medium.com/nunet/nunet-financial-report-2022-and-outlook-for-2023-405d38397629
Opening Statement:
This initiative seeks to engineer a decentralized, universally adaptable, GPU-enhanced environment for shared machine intelligence tasks. By integrating the capacities of Docker modules, the distributed training capabilities of Horovod, and the peer-to-peer networking of Go libp2p, we aim to architect a system that can optimize GPU usage across a distributed network. Our strategy will pave the way for an innovative approach to conducting machine intelligence experiments, one that is not tethered to a single provider or a centrally controlled infrastructure.
Main goals:
1. Design a Universally Adaptable GPU-Enhanced Environment: The system will be designed to cooperate with GPUs from all manufacturers, enabling users to utilize their current GPU resources without needing to commit to a specific provider. This objective will be verified by illustrating the ability to run Docker modules with machine intelligence workloads on GPUs from various manufacturers (see the device-selection sketch after this list).
2. Incorporate Distributed Machine Intelligence Training: Our goal is to facilitate distributed training of machine intelligence models across various Docker modules. This will be verified by successfully training a model on multiple GPUs across the network, and comparing the training duration and model performance to those achieved with single-GPU training.
3. Develop Inter-Container Communication: The initiative will implement a communication protocol between Docker modules using libp2p, enabling efficient inter-container communication for distributed computing tasks. This will be validated by demonstrating efficient message passing and synchronization between modules in the network.
4. Showcase Scalability and Efficiency: Our system should display improved efficiency and scalability in executing machine intelligence experiments as compared to traditional centralized methods. We will validate this by running experiments that compare the performance of our system to traditional methods in terms of processing time, resource utilization, and scalability.
5. Enhance User Experience: While our initiative is highly technical, we aim to design a user-friendly platform. The success of this objective will be validated qualitatively through user feedback and quantitatively through user engagement metrics.
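As a small, hedged illustration of objective 1, the sketch below shows one way a container's entry script could select a GPU in a vendor-neutral manner with PyTorch: ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda interface, so a single check covers both NVIDIA and AMD devices, with a CPU fallback otherwise. The helper name and the smoke test are illustrative assumptions, not part of the final design.

```python
# Illustrative device-selection helper (assumed approach, not the final design):
# PyTorch's ROCm builds reuse the torch.cuda namespace, so one check covers
# both NVIDIA (CUDA) and AMD (ROCm/HIP) GPUs; anything else falls back to CPU.
import torch

def select_device() -> torch.device:
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        backend = "ROCm/HIP" if torch.version.hip else "CUDA"
        print(f"Using GPU 0 ({name}) via {backend}")
        return torch.device("cuda", 0)
    print("No supported GPU found, falling back to CPU")
    return torch.device("cpu")

if __name__ == "__main__":
    device = select_device()
    # Tiny smoke test: run a matrix multiplication on the chosen device.
    x = torch.randn(1024, 1024, device=device)
    y = x @ x
    print("Result checksum:", float(y.sum()))
```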
Strategy for Implementation:
Our initiative is technical and experimental in nature. We plan to commence by establishing Docker modules equipped with GPUs from various manufacturers. Next, we will incorporate Horovod into these modules to facilitate distributed training. Simultaneously, we will work on implementing a communication protocol between the modules using Go libp2p. Once these components are established, we will conduct experiments to validate the efficiency, scalability, and user-friendliness of our system.
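The transport layer named in this proposal is Go libp2p. Purely to illustrate the message-passing pattern the modules will need, and to keep all sketches in Python, the following stand-in uses asyncio streams: one node publishes a partial result and a peer acknowledges it. The port, message format, and function names are hypothetical placeholders, not the actual NuNet or libp2p wire protocol.

```python
# Minimal stand-in for inter-container message passing (hypothetical protocol;
# the planned implementation sits on top of libp2p rather than raw sockets).
import asyncio
import json

PORT = 9000  # placeholder port for this illustration

async def handle_peer(reader: asyncio.StreamReader,
                      writer: asyncio.StreamWriter) -> None:
    # Receive one JSON message with a peer's partial result and acknowledge it.
    raw = await reader.readline()
    message = json.loads(raw.decode())
    print(f"received partial result from rank {message['rank']}: {message['value']}")
    writer.write(b'{"status": "ack"}\n')
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def publish_result(host: str, rank: int, value: float) -> dict:
    # Connect to a peer, send our partial result, and wait for the acknowledgement.
    reader, writer = await asyncio.open_connection(host, PORT)
    writer.write((json.dumps({"rank": rank, "value": value}) + "\n").encode())
    await writer.drain()
    reply = json.loads((await reader.readline()).decode())
    writer.close()
    await writer.wait_closed()
    return reply

async def main() -> None:
    # Run a listener and a sender in one process just to demonstrate the flow.
    server = await asyncio.start_server(handle_peer, "127.0.0.1", PORT)
    async with server:
        ack = await publish_result("127.0.0.1", rank=1, value=0.42)
        print("peer replied:", ack)

if __name__ == "__main__":
    asyncio.run(main())
```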
The anticipated outcome of this initiative is a functional prototype of a decentralized, universally adaptable, GPU-enhanced system for distributed machine intelligence. This system will not only push the boundaries of what's possible in machine intelligence infrastructure, but will also empower researchers and developers to conduct their machine intelligence experiments in a more efficient and flexible manner.
Milestone 1: Project Commencement and detailed architecture blueprints
Milestone 2: Development of Proof-of-Concept for GPU Job Splitting
Milestone 3: System Testing & Improvement for GPU Job Splitting
Milestone 4: Production Release & Documentation for GPU Job Splitting
Milestone 5: Dissemination & Research Paper Writing for GPU Job Splitting
The success of this project will be measured by the successful deployment of the decentralized GPU job splitting system on the Cardano testnet first and later on mainnet. The long-term impact of the project will be evaluated by the adoption and usage of the system by the Cardano community. The project's progress will be tracked by checking the completion of each milestone's deliverables and intended outcomes.
Each milestone’s progress will be tracked through the completion of the stated expected results and the achievement of the anticipated impact. Regular project update meetings and reports will provide visibility into the project's progress, and any issues or delays will be addressed through the project's risk management process. The overall project management methodology will be agile, with regular sprint planning, daily stand-up meetings, and retrospective meetings. Key performance indicators will be defined to track the progress and success of the project. The team will regularly communicate with stakeholders and the Cardano community to keep them updated on the progress and gather feedback.
Each project is examined in great detail, as can be seen in the proposed budgeting sheet. This results in pre-feasibility and feasibility studies that minimize the risk of budget overruns.
Project management at NuNet is of a high standard, employing techniques such as Agile, Scrum, CCPM, and others, resulting in a good daily overview of project progress.
The project is complex and involves research and development uncertainties; however, NuNet is a well-funded deep-tech startup and, in case of budget overruns, will continue development until delivery, as this is a critical part of the overall NuNet development plan. This is evidenced by the funding received in Cardano Catalyst Fund 7 and Fund 8, where NuNet continued the work despite substantial unexpected technical roadblocks and their impact on the timeline.
The costs of the project are based on the average salary levels of engineers currently employed by NuNet. Since the team is fully distributed and remote, it is challenging to have a suitable median cost that covers the range of countries (India, Pakistan, Ethiopia, Brasil, Egypt, UAE, UK, Italy and others).
We believe that the costs are reasonable and reflect the seniority and knowledge of various positions involved in the delivering of the proposal.
In line with full openness, the budget table shows a very granular distribution of costs, down to the hours of each position for each milestone.
In addition, fully remote workers can compete for jobs in Western countries, which drives individual compensation levels much higher than is typical in their home countries.
NuNet is a deep tech startup that is developing cutting-edge solutions in the decentralized open-source space. Currently, there are 28+ people at NuNet working on delivering use cases, primarily for Cardano. On top of that, as a SingularityNET spin-off, NuNet has access to 100+ AI and software engineers for support. The main team members responsible for this proposal are presented below.
The NuNet Team working on this project:
Name: Kabir Veitas, PhD AI, MBA
Location: Brussels, Belgium
LinkedIn: https://www.linkedin.com/in/vveitas/
Position: Co-Founder & CEO
Bio:
Demonstrated experience in the computer software, research, and management consulting industries. Skilled in artificial intelligence, cognitive and computer sciences, systems thinking, technology strategy, strategic business planning, management, and social science research. A strong operations professional with a PhD focused on multi/interdisciplinary studies from Vrije Universiteit Brussel.
Name: Janaina Senna, MSc CS, MBA
Location: Belo Horizonte, Brasil
LinkedIn: https://www.linkedin.com/in/janaina-farnese-senna/
Position: Product Owner
Bio:
She holds a master's degree in computer science and has played different roles over the past 20 years, such as development manager, tech lead, and system architect, helping organizations launch new software and hardware products in the telecommunications and energy sectors. As a product owner, she shapes the product vision into manageable tasks and builds the bridge between developers and stakeholders. She enjoys seeing products come to life!
Name: Avimanyu Bandyopadhyay, PhD Candidate, Bioinformatics, MTech CS
Location: Kolkata, India
LinkedIn: https://www.linkedin.com/in/iavimanyu/
Position: Systems Scientist
Bio:
Knowledge-driven PhD candidate who manages resources and technical skills to accelerate collaborative research with GPU-based Bioinformatics. He thrives in a fast-paced and cross-disciplinary team environment that challenges his capacity for problem-solving and troubleshooting. He’s very passionate about understanding how various open source software work and loves to design new deployment models for them. Furthermore, he also believes that any software is as good as its documentation.
An interest-driven researcher and the author of "Hands-On GPU Computing with Python", he has produced several scientific articles in different areas of science and research, with an academic publication related to enhancing productivity while working with extensive data.
At NuNet, he works with the integration of GPUs, tools and mechanisms with the broader NuNet platform.
Name: Dagim Sisay Anbessie, BSc CS
Location: Addis Ababa, Ethiopia
LinkedIn: https://www.linkedin.com/in/dagim-sisay-7b4b05b8/
Position: Tech Lead
Bio:
He has experience in projects in the areas of Robotics, Machine Learning, System Software Development, and Server Application Deployment and Administration for several international clients. At SingularityNET he worked on AI and miscellaneous software development. His main responsibilities lie in researching the development path and the technology to be used, and in directing specific tasks to the dev team. Additionally, he has been involved in system development when circumstances demand it.
Name: Jennifer Bourke, BA, MSc
Location: Dublin, Ireland
LinkedIn: https://www.linkedin.com/in/jennifer-bourke-1bb286158/
Position: Marketing and Community Lead
Bio:
A data-driven marketing expert with a postgraduate degree in digital marketing and data analytics. Currently pursuing a postgraduate degree in global leadership, she combines her strategic marketing skills with a global perspective. With over 6 years of experience, Jennifer has a proven track record of driving successful marketing campaigns.
Name: Ilija Radeljic, MSc CE
Location: Oslo, Norway
LinkedIn: https://www.linkedin.com/in/dagim-sisay-7b4b05b8/
Position: Director of Operations and Business Development
Bio:
Corporate industry veteran and AI&Blockchain enthusiast. This combination brings a wealth of 15 years of experience managing major infrastructure, power and manufacturing projects to the emerging blockchain world and its applications.
15+ years of experience in business negotiation, partnerships, leads, market entry, project management, promotion and presentations worldwide.
Formal engineering education, MSc Civil Engineering + MIT Sloan Executive Management and Leadership certified.
A Cardano Catalyst Community Advisor and Cardano Catalyst Veteran Community Advisor since the beginning (Fund2), he has consulted on several funded proposals in Cardano Catalyst.
External auditors:
NuNet is also collaborating with the external auditing company Obsidian (https://obsidian.systems/) which has been contracted to audit the core platform development as well as specific use case integrations such as this one.
We intend to extend their contract (or hire another suitable 3rd party auditor) for auditing the implementation of this research work as well.
External support:
NuNet has a capable team (28+) to tackle the project, but occasionally extra resources or skills may be needed beyond the available pool. These will be sourced either as additional employees or via subcontracting, depending on the size and length of the development.