The construction and utilization of LLM inference infrastructure is costly and centralized, placing control in the hands of major corporations and making it inaccessible to global populations.
This is the total amount allocated to Wolfram: AI - LLM Distributed Inference Services. 4 out of 4 milestones are completed.
Milestone 1/4: Development of easy-to-deploy LLM Server. Cost: ₳ 25,000. Delivery: Month 2 - May 2024.
Milestone 2/4: Development of LLM API + WebUI. Cost: ₳ 30,000. Delivery: Month 5 - Aug 2024.
Milestone 3/4: Iterate: Deploy/Test/Debug. Cost: ₳ 30,000. Delivery: Month 7 - Oct 2024.
Milestone 4/4: Close Out Reporting. Cost: ₳ 15,000. Delivery: Month 8 - Nov 2024.
NB: Monthly reporting was deprecated from January 2024 and replaced fully by the Milestones Program framework.
Matt Blumberg
GridRepublic Team
Development and prototyping of a distributed LLM inference service. The modular system will support a range of models.
GridRepublic (the development team behind Charity Engine) provides essential expertise and technology in building and operating distributed systems at scale.
The LLM(s) used will be open-source. The distributed computing tools and infrastructure, however, are provided by GridRepublic; some portions of this are open-source and some are not.
Artificial Intelligence plays a growing role in assisting people engaged in knowledge-based work.
Wolfram Blockchain Labs (WBL) is actively engaged in Cardano Catalyst-sponsored research to develop chatbot assistance services for Cardano Catalyst in Fund10 (note: we've since renamed that project the Cardano “Catalyst Navigator”).
In the course of this work, it’s become clear that lower-cost inference resources are needed in order to appropriately deploy LLM-based applications for Catalyst, blockchain, and general communities worldwide.
Thus, to build appropriate distributed infrastructure to power such low-cost inference, WBL is collaborating with GridRepublic (an organization with nearly two decades of experience in distributed computing applications at global scale).
For this project, we propose to build a modular inference service, with a simple API (and WebUI), but running on a global network of participating servers. Towards this end, the project will provide a simple-to-deploy LLM-Server application which can be run on appropriately provisioned servers, and will then automatically plug into and integrate with the global inference service: "The Network is the computer", as they used to say at Sun Microsystems.
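As a purely illustrative sketch of this idea (not the project's actual design), the Python below models a coordinator that LLM-Servers register with on startup and that routes incoming prompts across them in round-robin order. Every name here (`InferenceNetwork`, `register`, `unregister`, `dispatch`, `make_stub_server`) is hypothetical, the "servers" are stub functions rather than real models, and round-robin is just one simple routing policy chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption for illustration: a server is modeled as a function
# that maps a prompt string to a completion string.
Handler = Callable[[str], str]


@dataclass
class InferenceNetwork:
    """Toy coordinator: tracks registered LLM-Servers and routes requests."""
    servers: Dict[str, Handler] = field(default_factory=dict)
    order: List[str] = field(default_factory=list)
    next_index: int = 0

    def register(self, server_id: str, handler: Handler) -> None:
        """Called by an LLM-Server when it starts up and joins the network."""
        if server_id not in self.servers:
            self.order.append(server_id)
        self.servers[server_id] = handler

    def unregister(self, server_id: str) -> None:
        """Remove a server that has left the network."""
        if server_id in self.servers:
            del self.servers[server_id]
            self.order.remove(server_id)

    def dispatch(self, prompt: str) -> str:
        """Send the prompt to the next registered server (round-robin)."""
        if not self.order:
            raise RuntimeError("no LLM-Servers registered")
        server_id = self.order[self.next_index % len(self.order)]
        self.next_index += 1
        return self.servers[server_id](prompt)


def make_stub_server(name: str) -> Handler:
    """Stand-in for a real model: echoes the prompt, tagged with its name."""
    return lambda prompt: f"[{name}] echo: {prompt}"


network = InferenceNetwork()
network.register("node-a", make_stub_server("node-a"))
network.register("node-b", make_stub_server("node-b"))
replies = [network.dispatch("hello") for _ in range(3)]
```

In this toy version the API caller only ever talks to `dispatch`; which participating server answers is an internal detail of the network. A real service would additionally need health checks, authentication, and load-aware routing, none of which are shown.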
Through this distributed inference infrastructure, Project Catalyst will then be able to launch scalable and low-cost LLM applications trained on Cardano datasets. This will help enhance sharing of knowledge about, and boost participation in, the Cardano ecosystem.
A key component in this is the creation of a distributed infrastructure for running LLM-based applications, which aims to lower costs and enhance democratic control of critical systems.
The current project proposes development of a prototype. Future work, however, could enable integration of ADA-based payments to support the ecosystem, e.g. LLM users could in principle pay in ADA, and resource-providers (i.e. participants running the LLM-Server app) could be paid in ADA.
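To make the payment idea concrete, here is a minimal sketch, under our own assumptions, of how a single user payment might be divided among resource-providers in proportion to the tokens each one served. The function name `split_payment`, the proportional rule, and the remainder handling are all hypothetical; nothing here is part of the proposed prototype. Amounts are kept in lovelace (1 ADA = 1,000,000 lovelace) so that the arithmetic stays in integers.

```python
from typing import Dict

LOVELACE_PER_ADA = 1_000_000  # 1 ADA = 1,000,000 lovelace


def split_payment(payment_lovelace: int, tokens_served: Dict[str, int]) -> Dict[str, int]:
    """Split a payment among providers in proportion to tokens served.

    Hypothetical rule: each provider gets floor(payment * tokens / total).
    Any lovelace left over from rounding goes to the provider that served
    the most tokens (ties broken alphabetically), so the shares always
    sum to the original payment.
    """
    total_tokens = sum(tokens_served.values())
    if total_tokens <= 0:
        raise ValueError("no tokens served")
    shares = {
        provider: payment_lovelace * tokens // total_tokens
        for provider, tokens in tokens_served.items()
    }
    remainder = payment_lovelace - sum(shares.values())
    top_provider = max(sorted(tokens_served), key=lambda p: tokens_served[p])
    shares[top_provider] += remainder
    return shares


# Example: a 2 ADA payment split between two providers.
payout = split_payment(2 * LOVELACE_PER_ADA, {"node-a": 300, "node-b": 100})
```

With these inputs, `node-a` served three quarters of the tokens and so receives 1,500,000 lovelace, while `node-b` receives 500,000. On-chain settlement, pricing, and verification of the reported token counts are outside the scope of this sketch.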
It's worth noting also that we intend to build our prototype around the MDEL LLM, which has unique and extensive multilingual capabilities – being, we believe, the only LLM supporting languages like Hindi, Vietnamese, and others. This opens exciting avenues of outreach to communities worldwide where both tools and infrastructure for advanced AI are presently unavailable. (*MDEL have experience and tooling for extending the range of languages: thus providing another key avenue for future growth.)
Wolfram is working with GridRepublic (which manages Charity Engine), an entity we've worked with before, with over a decade of experience building distributed computational systems. (Wolfram Research has also been a Charity Engine customer and user for many years; see also Wolfram Language Batch Compute.)
The GridRepublic team (through the Charity Engine service) has operated distributed applications running on as many as a million simultaneous CPU cores, in domains including molecular simulation, advanced mathematics, and genomics.
For example:
Development of easy-to-deploy LLM Server (e.g. in container form)
Development of LLM API (*Concept)
Development of LLM WebUI (*Concept)
Iterate: Deploy/Test/Debug
N/A
Share working demo: an LLM inference service powered by distributed compute resources, with tolerable latency
Matthew Blumberg, GridRepublic Co-founder and CEO
Matthew Blumberg has been working in the fields of network computing and large scale collaboration for 15 years. He is Executive Director of GridRepublic and Co-Founder of Charity Engine, two large-scale distributed computing services. Past projects include work as Fellow at Harvard's MetaLAB; Visiting Fellow at the Laboratory for Innovation Science at Harvard (LISH); Section Editor of "The Handbook of Human Computation"; Consultant to DARPA’s “Social Computing Seedling”; and Partner in TGT Energy, an industrial-scale energy storage venture.
Jon Woodard is the CEO at Wolfram Blockchain Labs, where he coordinates the decentralized projects that connect the Wolfram Technology ecosystem to different DLT ecosystems. Previously, at Wolfram Research, Jon worked on projects at the direction of Wolfram Research CEO Stephen Wolfram, and prior to that was a member of the team that worked on monetization strategy and execution for Wolfram|Alpha. Jon has a background in economics and computational neuroscience. He enjoys cycling in his spare time.
Johan Veerman is General Manager at Wolfram Research South America and CTO at Wolfram Blockchain Labs. Previously he was Science Advisor at the Ministry of Foreign Affairs in Peru and Chief Scientist on two Antarctic expeditions. Johan's background is in physics and business management. He enjoys playing soccer and is a certified cave diver.
Steph Macurdy, WBL Head of Research and Education
Steph Macurdy has a background in economics, with a focus on complex systems. He attended the Real World Risk Institute in 2019, led by Nassim Taleb, and has been investing in the crypto asset space since 2015. He previously worked for Tesla as an energy advisor and Cambridge Associates as an investment analyst. Steph is a youth soccer coach in the Philadelphia area and is interested in permaculture.
Gabriela Guerra Galan, WBL Business Operations Specialist
Gabriela Guerra Galan has 15+ years of experience leading projects. She is a certified PMP and Product Owner with a bachelor's degree in Mechatronical Engineering, complemented by a master's degree in Automotive Engineering. As the co-founder of Bloinx, a startup that secured funding from the UNICEF Innovation Fund, she has demonstrated a passion for driving innovation and social impact.
Milestone 1: ₳10,000
Milestone 2: ₳25,000
Milestone 3: ₳20,000
Milestone 4: ₳35,000
This initiative involves developing a distributed infrastructure to run LLM-based applications. It not only complements other applications that require LLM usage but also offers substantial benefits to the Catalyst community.
GridRepublic's tools and expertise pave the way for such a functional, reliable, and scalable distributed inference service at a relatively low cost.
Furthermore, as noted above, this concept system will be well-suited for future projects. These include potential integration of Cardano-based payment and incentive systems to develop a Minimum Viable Product (MVP) in the future, scaling into a sustainable 'intelligence-as-a-service' ecosystem on Cardano.