
ID: #800271

Last updated 2 years ago


NuNet: Decentralized GPU ML Cloud

Status: Complete

Country: Belgium

Industry group: Development & Tools


Problem

Artificial Intelligence (Machine Learning) models need GPU processing power. How can such GPU power be provided in a decentralized way to grow Cardano?

Solution

NuNet platform that connects decentralized GPU hardware providers and enables secure, safe and decentralized access to GPUs for Cardano.

Completed outcome


Total to date

This is the total amount allocated to NuNet: Decentralized GPU ML Cloud.

83,000 USD

Total funds requested

Distributed: 83,000 USD

Remaining: 0 USD


Total votes cast: 2.34K

Votes yes: 364M

Votes no: 10.2M

About this idea

scaling adoption infrastructure

NuNet platform that connects decentralized GPU hardware providers and enables secure, safe and decentralized access to GPUs for Cardano.

Team leader: Dr. V. Kabir Veitas - AI researcher & software architect; co-founder & CEO, NuNet.io

https://www.linkedin.com/in/vveitas/

83000

[IMPACT]

Summary

Applications running on Cardano, as well as SPOs, need computing power in the form of CPUs or GPUs. Currently the only options are renting cloud computing from big tech, which increases reliance on those companies, or purchasing costly hardware setups. In an increasingly hostile and censorship-prone environment, it is essential to secure the reliability and decentralization of Cardano.

Computing needs in the Cardano ecosystem can broadly be divided into:

1. CPU requirements - Stake Pool Operators

2. GPU requirements - Artificial Intelligence (Machine Learning), Dapps, Metaverse, others.

Decentralized computing on CPUs is a prerequisite for running Cardano nodes via NuNet, a project that was already awarded funding in Cardano Catalyst Fund7 as one of the top 20 voted proposals.

This Fund8 proposal builds on that work and expands the scope to focus on GPUs.

Source:

https://cardano.ideascale.com/c/idea/383862

https://medium.com/nunet/decentralized-compute-for-spos-is-coming-aecdcbbc3fa7

Overview

Utilization of GPUs by the NuNet platform will proceed in two phases:

Phase 1: Foundation - One User Per GPU

Phase 2: Scaling - GPU Grid Computing

Phase 1: Foundation - One User Per GPU Model

This model involves getting the NuNet containers to support GPU access, monitoring the resource usage of GPUs, and making GPUs directly available to the processes running inside the containers. Initially, the GPUs used in this model will be those available on the specific provider device.

This model has its own use cases: it allows ML model training and inference whenever the available GPU is capable of handling the workload by itself. It also guides the next phases of development by allowing the core development to be performed: supporting GPU device onboarding to NuNet, enabling the NuNet Adapter to manage GPUs, implementing GPU access from within virtual machines and containers, and monitoring GPU resource usage for provider compensation.
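As an illustration of monitoring GPU usage for provider compensation, periodic utilization readings could be reduced to a single billable quantity. The sketch below is only an assumption about how that might look: the `GpuSample` type, its fields, and the "busy GPU-seconds" unit are hypothetical and are not taken from the NuNet codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSample:
    """One hypothetical reading: when it was taken and how busy the GPU was."""
    timestamp_s: float      # seconds since the job started
    utilization_pct: float  # 0-100, as a monitoring tool might report it


def gpu_seconds(samples: list[GpuSample]) -> float:
    """Integrate utilization over time with the trapezoidal rule.

    The result is in "fully busy GPU-seconds": 10 s at 50% counts as 5.0.
    """
    ordered = sorted(samples, key=lambda s: s.timestamp_s)
    total = 0.0
    for prev, curr in zip(ordered, ordered[1:]):
        dt = curr.timestamp_s - prev.timestamp_s
        mean_util = (prev.utilization_pct + curr.utilization_pct) / 2 / 100
        total += dt * mean_util
    return total


# Example readings: the GPU ramps up for 10 s, then runs flat out for 10 s.
readings = [GpuSample(0, 0), GpuSample(10, 100), GpuSample(20, 100)]
usage = gpu_seconds(readings)
```

A metering unit like this makes compensation proportional to actual use rather than to wall-clock time, which is one possible reading of "monitoring GPU resource usage for provider compensation".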

Regular personal computers typically do not have enough GPU capacity for large workloads, so this model is limited in its ability to support large-scale ML projects, and especially federated learning, where data should not be transmitted to the device where the GPU is located. To let users train without uploading data to a Provider's device, data storage and the GPU device used for training must be decoupled. It should be possible to relay only the tasks and processes that need GPU execution to Provider devices, without transmitting the full training data; that is, the process is transmitted rather than the code and data.

Phase 1 is the scope of the present proposal for Cardano Catalyst Fund8.

Source:

https://arxiv.org/pdf/2103.08894.pdf

Phase 2: Scaling - GPU Grid Computing

This model involves accumulating massive amounts of processing power by virtualizing GPUs and aggregating them in a pool where end users of these GPUs have access to a cluster instead of a single device.

Technically, this will be implemented in two interconnected steps:

Phase 2A: Splitting jobs into manageable tasks

Phase 2B: Assigning a cluster of virtual GPUs to workloads

Phase 2A: Splitting Jobs

This method involves three main components:

Worker: This component performs the actual work. It is essentially a single procedure that is executed on a GPU.
Work Manager: This component performs task splitting. It accepts large jobs, splits them into individually processable tasks and dispatches them to Workers across Provider devices.
Job Dispatcher: This component submits the full job to be executed to the Work Manager.
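The relationship between the three components can be sketched in a few lines. This is a minimal, single-machine illustration under stated assumptions: a thread pool stands in for Workers on Provider devices, a sum of squares stands in for a GPU procedure, and all function names are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def worker(task: list[float]) -> float:
    """Worker: a single procedure; a sum of squares stands in for GPU work."""
    return sum(x * x for x in task)


def work_manager(job: list[float], task_size: int, n_workers: int = 4) -> float:
    """Work Manager: split the job into tasks, dispatch them, combine the results."""
    tasks = [job[i:i + task_size] for i in range(0, len(job), task_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, tasks))
    return sum(partials)


def job_dispatcher(job: list[float]) -> float:
    """Job Dispatcher: submit the full job to the Work Manager."""
    return work_manager(job, task_size=3)


result = job_dispatcher([1, 2, 3, 4, 5, 6, 7])
```

The job here splits cleanly because each task's partial sum is independent of the others; jobs without that property are the reason the proposal expects changes to how ML tasks are programmed.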

Developing this method successfully requires changes to how the ML tasks are originally programmed, because the Work Manager must be able to split jobs into individually executable tasks. This can be achieved, for example, by building a library with a high-level API over NumPy in which certain operations are overloaded to be splittable. Developers could write code much as they are used to, but would have to use certain recommended functions and data structures.
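To make the idea of an overloaded, splittable operation concrete, here is a small sketch. It is an assumption about what such a library could look like, not a description of any existing NuNet API: `SplittableMatrix` is a hypothetical class, and plain Python lists stand in for NumPy arrays to keep the example dependency-free.

```python
class SplittableMatrix:
    """Hypothetical wrapper whose `@` operator is computed in row blocks."""

    def __init__(self, rows, block_rows=2):
        self.rows = [list(r) for r in rows]
        self.block_rows = block_rows

    def split(self):
        """Yield independent row blocks; each could go to a different Worker."""
        for i in range(0, len(self.rows), self.block_rows):
            yield self.rows[i:i + self.block_rows]

    @staticmethod
    def _block_matvec(block, vector):
        """The per-task computation: one block of rows times the vector."""
        return [sum(a * b for a, b in zip(row, vector)) for row in block]

    def __matmul__(self, vector):
        """Overloaded `matrix @ vector`: familiar syntax, splittable execution."""
        result = []
        # Sequential here; a Work Manager would dispatch each block instead.
        for block in self.split():
            result.extend(self._block_matvec(block, vector))
        return result


m = SplittableMatrix([[1, 2], [3, 4], [5, 6]])
product = m @ [1, 1]
```

The developer still writes `m @ v`, but because the operation is defined on the wrapper type, the library decides how the work is partitioned.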

Phase 2B: Cluster of Virtual GPUs

This method will virtualize all GPUs available on the NuNet platform and present them to containers running ML tasks as if they were physical GPUs located on that virtual machine. It does not involve building a task splitter, because the splitting, scheduling and prioritization of tasks are handled by the low-level APIs themselves.

It is based on the following use-case worked out with DeepChainADA: https://github.com/nunet-io/simple-ML-on-GPU/issues/1

The description of Phase 2 is given here to convey the long-term potential and the plan that the fundamentals built in Phase 1 are meant to support. The current proposal does not include the Phase 2 scope, which will be submitted to future Catalyst funds depending on the success of Phase 1.

GPU requirements - Artificial Intelligence (Machine Learning)

Training a machine learning (ML) model requires a lot of processing power, which can be costly or difficult to obtain. In Cardano Catalyst Fund7, a proposal was funded that enables decentralized federated machine learning, preserving privacy to allow open collaboration. That project will need GPU power to train its ML models, and it is just one example of the potential use of decentralized GPU power provided by NuNet. Furthermore, running inference on those models is less computationally expensive than training, but it still needs considerable GPU compute resources and is somewhat more amenable to decentralization.

DeepchainAda: Trustless AI training

Source:

https://app.ideascale.com/t/UM5UZBqdc

What is Machine Learning?

Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

Source:

https://en.wikipedia.org/wiki/Machine_learning

Why GPUs for Machine Learning?

GPUs are optimized for training artificial intelligence and deep learning models as they can process multiple computations simultaneously.

They have a large number of cores, which allows for better computation of multiple parallel processes. Additionally, computations in deep learning need to handle huge amounts of data — this makes a GPU’s memory bandwidth most suitable.

Source:

https://towardsdatascience.com/what-is-a-gpu-and-do-you-need-one-in-deep-learning-718b9597aa0d

The cryptocurrency industry has many providers of CPU and GPU hardware whose capacity can easily be redirected to training ML models. NuNet's proposal will enable tapping into that large potential market (e.g. ETH miners) and linking it to demand in the Cardano ecosystem.

NuNet, a spinoff of SingularityNET, allows arbitrary computing workflows to run on community-provisioned hardware and provides payment gateways directly from software or applications via Cardano Plutus smart contracts. Adding the ability to source decentralized GPU computing resources via the NuNet ecosystem will tap into a huge and expanding part of global computing infrastructure, powering the growing AI industry as well as the emerging Metaverse industry. NuNet's ability to connect decentralized hardware into a single workflow is an attractive possibility for these industries.

This would greatly increase the possibilities of the growing ecosystem on Cardano as already witnessed by the needs of DeepchainAda: Trustless AI training. NuNet can provide resilience and true decentralization through the Cardano network both in CPU and GPU computing domains.

The proposal addresses the Challenge goals in terms of:

Deployment, testing, and monitoring frameworks
Support structures
Incentivization structures

To summarize, this proposal brings value to Cardano by enabling flexible, decentralized, robust, faster or cheaper CPU and GPU resources as a computing framework to support the Cardano ecosystem.

Risk 1: General technical research and development uncertainties, and the project complexity that follows from them. We are fairly confident that the team will be able to deal with difficulties, but that may require additional time and work.

Risk 2: Complexities with deployment with the pilot partner. To be mitigated with the possibility of including more testing partners inside the NuNet open source community.

Risk 3: Increased hardware prices and uncertainty in the GPU device market. To be mitigated by focused monitoring of price swings and acquiring hardware when prices are lowest.

[FEASIBILITY]

The delivery timeline can be split as follows upon receipt of the funding:

In two months:
NuNet onboarding GPU devices and allowing users to set amount of resource to be used
NuNet adapter equipped with ability to access multi-vendor GPUs
In four months:
NuNet containers supporting GPU access;
Framework to monitor resource usage of GPUs and make them directly available to the processes running inside the containers;
Webapp API specification and implementation for machine learning dApp access to the workflows;
In six months:
Onboarding ML workloads on NuNet for alpha testing GPUs available on that specific provider device;
Plutus contracts and adaptation of Tokenomics API for compensating GPU resource owners via NuNet platform;
Machine Learning webapp implemented, tested and deployed for accessing GPU resources via NuNet platform.

The budget includes a mix of personnel, hardware as well as partners defining and running the ML scripts for which GPU computing is needed.

Item | Expense (USD) | Months/Units | Total (USD)
Systems engineer | 6,000 | 6 | 36,000
Blockchain development (Plutus) | 7,000 | 3 | 21,000
Fullstack development | 3,000 | 4 | 12,000
Testing hardware | 2,000 | 3 | 6,000
Testing and pilot costs | 8,000 | 1 | 8,000
Total | | | 83,000
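The line totals and the grand total can be checked directly from the unit costs and quantities in the budget. The snippet below simply recomputes them; the variable names are illustrative.

```python
# Line items from the budget: (item, cost per month or unit in USD, months or units)
BUDGET = [
    ("Systems engineer", 6000, 6),
    ("Blockchain development (Plutus)", 7000, 3),
    ("Fullstack development", 3000, 4),
    ("Testing hardware", 2000, 3),
    ("Testing and pilot costs", 8000, 1),
]

# Each line total is cost x quantity; the grand total is their sum.
line_totals = {item: cost * qty for item, cost, qty in BUDGET}
grand_total = sum(line_totals.values())
```

The recomputed figures agree with the stated line totals and with the 83,000 USD requested.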

The proposed budget is deemed sufficient for the implementation. In case of additional costs or scope, NuNet commits to allocate additional resources from its full-time development team in order to deliver project results as described.

Team lead:

Dr. V. Kabir Veitas - AI researcher & software architect; co-founder & CEO, NuNet.io

https://www.linkedin.com/in/vveitas

Project Manager:

Nara Bagiyan

https://www.linkedin.com/in/narina8

Technical manager:

Dagim Sisay - NuNet tech lead

https://www.linkedin.com/in/dagim-sisay-7b4b05b8

Main developers:

Israel Abebe Azime - MSc in Machine Learning

https://www.linkedin.com/in/israel-abebe

Tewodros Kederalah - BSc in Electrical and computer engineering

https://www.linkedin.com/in/tewodroskederalah

Khaled Yasser - BSc in Information technology

https://www.linkedin.com/in/khaled-yasser/

The NuNet team is also supported by SingularityNET human resources on an as-needed basis, while expanding rapidly and organically after its successful token launch on 17.11.

https://medium.com/nunet/nunet-community-contribution-round-completed-5543ce39915f

Pilot and implementation partner:

NuNet will partner with PGWAD to define the structure and needs for enabling access to decentralized GPUs on Cardano for ML. The pilot will be deployed and run for the Fund7-funded project DeepchainAda: Trustless AI training as a proof of concept.

PGWAD is a Cardano stake pool running on Raspberry Pi. It is part of the armada-alliance, an alliance of independent stake pool operators using low-powered ARM cores to help decentralize Cardano. PGWAD is also part of the xSPO alliance.

PGWAD means Packet GateWay for AI and Decentralization. PGWAD has been focusing on the DeepchainAda project.

Risk mitigation

Addressed under IMPACT section: What main challenges or risks do you foresee to deliver this project successfully.

[AUDITABILITY]

Roadmap with milestones

Addressed under FEASIBILITY section: Please provide a detailed plan and timeline for delivering the solution.

Metrics/KPIs

NuNet onboarding GPU devices and allowing users to set amount of resources to be used
NuNet adapter equipped with ability to access multi-vendor GPUs
NuNet containers supporting GPU access;
Framework to monitor resource usage of GPUs and make them directly available to the processes running inside the containers
Onboarding ML workloads on NuNet for alpha testing GPUs available on that specific provider device.
Framework and APIs adaptation for compensating GPU and CPU resources via Cardano blockchain transactions;

One of the proposed key metrics for this Challenge is the number of developers building on top of Cardano. For Phase 1, at least one developer creating a complex ML model will be onboarded.

Training a Machine learning (ML) model requires a lot of processing power which can be costly or difficult to obtain. In addition, there are also other use cases on Cardano (AI, dapps, Metaverse etc.) where GPU computing power might be needed.

Solution:

Ability for community providers to onboard their GPU enabled computers via NuNet framework;
Ability to onboard ML workloads on NuNet for alpha testing GPUs available on that specific provider device
Ability for users of ML workloads on NuNet to compensate for GPU resources used using Cardano blockchain transactions;

To summarize, this proposal brings value to Cardano by enabling flexible, decentralized, robust, faster and cheaper GPU resources as a computing framework to support the Cardano ecosystem.

Entirely new project

Monthly report

NB: Monthly reporting was deprecated from January 2024 and replaced fully by the Milestones Program framework.

February 3, 2023 Progress report

Status: In progress

On track: No

Estimated completion date: -

Summary

Please refer to our monthly recap blog for January updates: https://medium.com/nunet/nunet-monthly-recap-january-2023-8ea138143e4

Evidence

https://gitlab.com/groups/nunet/-/milestones/20#tab-issues

Explanation

- For our GPU ML decentralized cloud progress, we can now resume interrupted ML jobs on either PyTorch or TensorFlow. As a real-world example, we've tested training an open source alternative to ChatGPT, known as PaLM+RLHF, and have successfully been able to interrupt and resume it on other machines.
- It is important to note that no additional modifications were made to the existing ML on GPU service workflow in order to test the above real-world example; it was successfully implemented like any other ML training program and dependency.
- We are immensely happy to share that different machines on our network can use checkpointing to carry on ML training or any other computational progress. We've also designed an ML on GPU test protocol for community testing.
- Our ML on GPU service is ready for production-level usage. Our device management service (DMS) can now send ML progress logs back to the web app.
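The interrupt-and-resume behaviour reported here rests on checkpointing: saving enough state that another machine can pick up where the first one stopped. The sketch below shows the pattern with a toy loop and a JSON file. It is an assumption-laden illustration, not NuNet's implementation: the `train` function, its state layout, and the file name are hypothetical, and a real PyTorch or TensorFlow job would use the framework's own checkpoint utilities.

```python
import json
import os
import tempfile


def train(total_steps, checkpoint_path, stop_after=None):
    """Toy training loop that saves its state after every step.

    `stop_after` simulates an interruption after that many steps in this run.
    Returns the state as of the last completed step.
    """
    state = {"step": 0, "weight": 0.0}
    if os.path.exists(checkpoint_path):      # resume if a checkpoint exists
        with open(checkpoint_path) as f:
            state = json.load(f)

    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            break                            # simulated interruption
        state["weight"] += 0.5               # stand-in for a real model update
        state["step"] += 1
        steps_this_run += 1
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)  # swap in the complete file
    return state


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first = train(10, path, stop_after=4)  # "machine A" is interrupted after 4 steps
final = train(10, path)                # "machine B" resumes from the checkpoint
```

Writing to a temporary file and then replacing the checkpoint means an interruption during the save cannot leave a half-written file behind.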

January 10, 2023 Progress report

Status: In progress

On track: Yes

Estimated completion date: -

Summary

Please refer to our monthly recap blog for December updates: https://medium.com/nunet/nunet-development-update-december-2022-3308c80ad563

Evidence

https://gitlab.com/groups/nunet/-/milestones/20#tab-issues

Explanation

This month we have been busy security testing our ML containers for GPU computing. The functional aspect of the ML on GPU milestone was completed well before mid-December, with some tokenomics-based implementations remaining. Focusing on additional features to enhance the framework, we recently developed a security layer that scans any ML job and its dependencies requested for deployment on NuNet and checks them for high-severity issues. If any are found, the job is not allowed to execute and is immediately removed. This is a positive development for ensuring a level of security on the machines provided by our compute providers. This update has recently been merged on our GitLab (see the link below). Currently, we're working on implementing a checkpointing solution for safeguarding the progress of ML jobs. In addition to the above link, related issues on the device management service repository can be found here: https://gitlab.com/groups/nunet/device-management-service/-/issues/?sort=created_date&state=all
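The gating rule described in this report (scan, then refuse to run anything with a high-severity finding) can be sketched as a simple filter. Everything below is hypothetical: the finding format, the package names, and the choice to also block a "critical" level are assumptions, and the report does not say which scanner is used.

```python
# Assumption: severities that disqualify a job. The report only mentions "high".
BLOCKING_SEVERITIES = {"high", "critical"}


def is_job_allowed(findings):
    """Return (allowed, blocking) for a list of hypothetical scanner findings.

    Each finding is a dict such as {"package": "examplelib", "severity": "low"}.
    """
    blocking = [f for f in findings if f["severity"].lower() in BLOCKING_SEVERITIES]
    return (len(blocking) == 0, blocking)


# Hypothetical scan output for one requested ML job.
scan = [
    {"package": "examplelib", "severity": "low"},
    {"package": "otherlib", "severity": "HIGH"},
]
allowed, reasons = is_job_allowed(scan)
```

Returning the blocking findings alongside the decision lets the platform tell the requester why a job was rejected.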

December 6, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: -

Summary

Please refer to our monthly recap blog for all November updates: https://medium.com/nunet/nunet-monthly-recap-november-2022-617ec1abaee4

Evidence

https://gitlab.com/groups/nunet/-/milestones/20#tab-issues

Explanation

This month, advancements in the Decentralized SPO Computing use case include fixing messaging and deployment issues for compute providers, establishing a testing environment, and solutions for security issues and peer authorization. On the GPU ML Cloud front, progress was made on implementing the ML WebApp, defining the algorithm to estimate compute resource prices, creating the API for calling the device management service (DMS), and adding a new command on the DMS to install a GPU driver, among others.

November 2, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: -

Summary

Please refer to our monthly recap blog for all updates: https://medium.com/nunet/nunet-monthly-recap-october-2022-423e56d9a637

Evidence

https://gitlab.com/groups/nunet/-/milestones/20#tab-issues

Explanation

We've completed working on the following issues:
- Implementing the ML WebApp - Front-end
- Defining the algorithm to estimate the required compute resource
- Defining the algorithm to calculate the estimated compute resource price
- Compute API - Calling the Device Management Service (DMS) REST API on the ML WebApp
- Defining how the service provider receives the computed job results
- Adding a new command on the NuNet DMS command line interface (CLI) to install a GPU driver

Currently, our goal is to bring the NuNet ML on GPU WebApp to fruition as soon as we can, so that you, the users, developers and researchers, can start working independently on your very own computational infrastructure with NTX! To bring it to a quick but steady completion, our current focus is on defining an ML log saving algorithm to report the compute job's result. Once that is done, we will implement it through the DMS, which would send the URL with the job result to the service provider to show it on the WebApp.

We also continue to work on:
- Getting NuNet tokens for computing work and putting them into escrow
- Security improvements when deploying docker containers from DMS
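The report does not describe the price estimation algorithm or the escrow mechanism, so the following is only a sketch of the general shape such logic could take. The linear pricing formula, the rate, and the `Escrow` class are all assumptions made for illustration.

```python
from decimal import Decimal


def estimate_price_ntx(estimated_gpu_hours, rate_ntx_per_gpu_hour):
    """Assumed formula: price = estimated GPU-hours x hourly rate."""
    return Decimal(str(estimated_gpu_hours)) * Decimal(str(rate_ntx_per_gpu_hour))


class Escrow:
    """Toy escrow: tokens are locked up front and released when the job finishes."""

    def __init__(self):
        self.locked = Decimal("0")
        self.paid_out = Decimal("0")

    def deposit(self, amount):
        """Lock the estimated price before the job starts."""
        self.locked += amount

    def release(self, amount):
        """Pay the provider out of the locked funds."""
        if amount > self.locked:
            raise ValueError("cannot release more than is locked")
        self.locked -= amount
        self.paid_out += amount


price = estimate_price_ntx(2.5, 4)  # hypothetical: 2.5 GPU-hours at 4 NTX per hour
escrow = Escrow()
escrow.deposit(price)
escrow.release(price)
```

`Decimal` is used instead of floats so that token amounts do not pick up binary rounding errors.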

October 3, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: -

Summary

Please refer to our monthly recap blog for all updates: https://medium.com/nunet/nunet-monthly-recap-september-2022-c8d418b346ef

Evidence

https://gitlab.com/groups/nunet/-/milestones/20#tab-issues https://gitlab.com/nunet/ml-on-gpu/ml-on-gpu-service/container_registry https://nunet-demo.vercel.app/

Explanation

Starting in September, we've developed our own Dockerfiles and uploaded our own versions of container images for TensorFlow and PyTorch to our GitLab registry. Based on an initial overview of our GPU ML webapp, we've developed an initial preview. We've also worked on mapping the entries in the webapp to parameters for containers built from our ML Dockerfile. To achieve this in a generic manner, we've developed a Python script bundled inside our containers that downloads open source ML code for our users, prepares it with any necessary dependencies before deployment, and then finally runs it. Beyond that, we're also working on testing NTX tokens for computing work and putting them in escrow, calling the Device Management Service REST API from the ML webapp, and defining algorithms to estimate resource usage along with the corresponding NTX prices. We also continue to work on revamping the security infrastructure of our use case. All issues on our ML on GPU repository can be tracked at the link above.
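The download, prepare, run sequence performed by the bundled script can be pictured as three commands derived from the webapp's parameters. The sketch below only builds the commands and does not execute them. The function names, the repository URL, and the file layout are hypothetical, and the actual NuNet script may work quite differently.

```python
import shlex


def build_commands(repo_url, entry_script, workdir="job", requirements="requirements.txt"):
    """Return the argument lists a container entrypoint might run, in order."""
    return [
        # 1. Download the user's open source ML code.
        ["git", "clone", "--depth", "1", repo_url, workdir],
        # 2. Prepare it by installing its declared dependencies.
        ["python", "-m", "pip", "install", "-r", f"{workdir}/{requirements}"],
        # 3. Run the training script.
        ["python", f"{workdir}/{entry_script}"],
    ]


def as_shell_text(commands):
    """Render the commands as safely quoted shell lines, e.g. for logging."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


# Hypothetical parameters as they might arrive from the webapp form.
commands = build_commands("https://example.com/user/ml-project.git", "train.py")
preview = as_shell_text(commands)
```

Keeping each command as an argument list, which could be passed to `subprocess.run(cmd, check=True)` one at a time, avoids handing user-supplied strings to a shell.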

September 6, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: -

Summary

NuNet platform that connects decentralized GPU hardware providers and enables secure, safe and decentralized access to GPUs for Cardano.

Evidence

https://gitlab.com/groups/nunet/-/milestones/20#tab-issues

Explanation

In early August, we worked on stability testing and on the ability to run ML jobs without allowing administrative access inside the environments they run in. In the past week, we've developed individual Dockerfiles that are intended to be generated per ML request and forwarded to Nomad as a job. We are currently working on creating the sequence diagram of our GPU ML use case. This includes prototyping and designing the ML on GPU web app, the service, and its API specifications. We are also working comprehensively on strengthening the security aspects of this use case.
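Generating a Dockerfile per ML request, with the job running as an unprivileged user, might look roughly like the sketch below. The base image names, the `mluser` account, and the request fields are assumptions chosen for illustration; the report does not show NuNet's actual Dockerfiles or the Nomad job that wraps them.

```python
# Assumption: public framework images used as placeholders for the base image.
BASE_IMAGES = {
    "pytorch": "pytorch/pytorch:latest",
    "tensorflow": "tensorflow/tensorflow:latest-gpu",
}


def render_dockerfile(framework, packages, entry_script):
    """Render Dockerfile text for one ML request, running as a non-root user."""
    if framework not in BASE_IMAGES:
        raise ValueError(f"unsupported framework: {framework}")
    lines = [f"FROM {BASE_IMAGES[framework]}"]
    if packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    lines += [
        "RUN useradd --create-home mluser",  # create an unprivileged account
        "USER mluser",                       # drop root before the job runs
        "WORKDIR /home/mluser",
        f"COPY {entry_script} .",
        f'CMD ["python", "{entry_script}"]',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical request: a PyTorch job that needs one extra package.
dockerfile_text = render_dockerfile("pytorch", ["tqdm"], "train.py")
```

Switching to a non-root `USER` before the job starts is one common way to deny a workload administrative access inside its container.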

August 7, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: -

July 20, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: -

June 23, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: 01/01/2023

May 23, 2022 Progress report

Status: In progress

On track: Yes

Estimated completion date: 01/01/2023

Team

Team leader: Dr. V. Kabir Veitas - AI researcher & software architect; co-founder & CEO, NuNet.io

https://www.linkedin.com/in/vveitas/

Nara

Website: https://gitlab.com/nunet
