Fund10 had nearly 1,600 proposals, which is overwhelming for the typical Catalyst voter. Meanwhile, Cardano has many decentralized infrastructure projects but few tools that help developers use them.
This is the total amount allocated to RAGDoC: Open Source and Decentralized AI Analysis of Catalyst Proposals.
I will recreate and open source the AI analysis pipeline I built to filter Catalyst proposals for Minswap. Unlike the original, this version will run entirely on Cardano infrastructure using only open source tools.
All tools developed along the way will be released under the MIT license, a permissive open source license that allows public, private, and commercial use.
Any data published in the course of this work will either maintain the original licensing or be licensed under CC-BY-SA-4.0.
Outline
Abstract
I want to build an AI analysis pipeline called RAGDoC (Retrieval Augmented Generation for Documents on Cardano) that runs entirely on Cardano infrastructure (e.g., NuNet and Iagon). This pipeline can cluster and summarize Catalyst proposals (or any set of documents), making it easier to find proposals that align with your interests. In the process of developing this tool, I will create or further develop the open source tools needed to run this and other pipelines on Cardano infrastructure.
Background
I was part of a group of Minswap volunteers who advised Minswap on the 50 proposals it voted for in Fund10. With 1,600 proposals, there were far too many documents to read through. I therefore combined RAG models, dimension reduction, and clustering to group similar proposals together, and then had AI models summarize each group. This helped us browse the proposals more easily and find the ones relevant to our community. However, I used OpenAI for this and never released any of the source code.
Since Fund10, NuNet and Iagon have matured considerably, with functional alphas for compute and storage, respectively. Iagon also plans to release an alpha version of its compute offering in early 2024. With these tools, it is possible to recreate the workflow I developed for Minswap using only open source tools and decentralized infrastructure. However, developers need better tooling to make use of them.
Approach
A broad overview of this approach is retrieval augmented generation (RAG) with a dimension reduction and clustering intermediate step. The general steps are Catalyst proposal aggregation, text embedding with a large language model to obtain vector embeddings, dimension reduction of the vector embeddings, clustering, and finally summarization of the contents of the clusters using a large language model. Below is an example of the result of this workflow from Fund10, showing that the model clustered similar proposals together and appropriately summarized them.
Group 26 (relevance: 100.00%):
The common themes across the proposals include the use of the Aiken programming language, the need for audits and bug bounties, the goal of increasing DeFi usage on Cardano, and the desire to strengthen liquidity in the ecosystem. Other common goals include showcasing the efficiency and interoperability of Aiken, empowering Cardano developers with open-source tools, upgrading contracts for efficiency and functionality, and enabling decentralized renting. Feasibility is a key consideration, with proposals emphasizing technical assessments, prototype development and testing, security audits, user feedback and validation, and community engagement and adoption. The proposals also highlight specific challenges such as the lack of open-source Stableswap and options for launching tokens on Cardano, as well as the need for better user experiences during high chain load. Customizability and adaptability are important factors in addressing these challenges.
Proposals:
Title: Minswap Aiken Stableswap Audit + Bug Bounty
https://cardano.ideascale.com/a/dtd/101498-163 (332000 ada requested of 9,080,400 ada available)
Title: SundaeSwap Aiken Smart Contracts
https://cardano.ideascale.com/a/dtd/102976-163 (276000 ada requested of 9,080,400 ada available)
Title: Lenfi V2 Aiken Audit + Bug Bounty
https://cardano.ideascale.com/a/dtd/103087-163 (265000 ada requested of 9,080,400 ada available)
Title: Revolutionizing Cardano Rewards Contracts: Aiken Language Upgrade for Efficiency and Functionality
https://cardano.ideascale.com/a/dtd/103870-163 (85000 ada requested of 9,080,400 ada available)
Title: FluidShare: Decentralized Uncollateralized Renting [Release + Audit + Open Source]
https://cardano.ideascale.com/a/dtd/104787-163 (200000 ada requested of 9,080,400 ada available)
Title: Minswap Aiken V2 Audit
https://cardano.ideascale.com/a/dtd/105516-163 (467000 ada requested of 9,080,400 ada available)
Title: Minswap Liquidity Bootstrapping for DAOs
https://cardano.ideascale.com/a/dtd/103138-163 (206000 ada requested of 3,158,400 ada available)
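The last step groups proposals by cluster label and asks a language model to summarize each group. A minimal plain-Python sketch of that step follows. It is an illustration only and is not the original Minswap pipeline. The prompt wording, the function names `group_by_cluster` and `build_prompts`, and the choice to pass only titles are all assumptions; a real run would include the proposal text. The -1 label follows the HDBSCAN convention for points left unclustered as noise.

```python
from collections import defaultdict

# Hypothetical prompt template; the wording used in the original pipeline is not published.
PROMPT_TEMPLATE = (
    "Summarize the common themes, goals, and feasibility considerations "
    "across the following Catalyst proposals:\n\n{body}"
)


def group_by_cluster(titles, labels):
    """Map each cluster label to the proposal titles assigned to it."""
    groups = defaultdict(list)
    for title, label in zip(titles, labels):
        if label == -1:  # HDBSCAN-style noise label: skip unclustered proposals
            continue
        groups[label].append(title)
    return dict(groups)


def build_prompts(groups):
    """Build one summarization prompt per cluster, in ascending label order."""
    prompts = {}
    for label in sorted(groups):
        body = "\n".join(f"- {title}" for title in groups[label])
        prompts[label] = PROMPT_TEMPLATE.format(body=body)
    return prompts


# Example: two proposals share cluster 26, one is left as noise.
titles = [
    "Minswap Aiken Stableswap Audit + Bug Bounty",
    "SundaeSwap Aiken Smart Contracts",
    "An unrelated proposal",
]
labels = [26, 26, -1]
prompts = build_prompts(group_by_cluster(titles, labels))
```

Each prompt would then be sent to the summarization model, and its answer would be stored alongside the cluster's member list.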
The original version of this workflow used OpenAI for the embedding and summarization steps. These can be replaced by open source models that are competitive with the OpenAI models, and better on some benchmarks. For text embedding, I will use Instructor-XL, developed by researchers at the University of Hong Kong, the University of Washington, Meta AI, and the Allen Institute for AI. For summarization, I will use Llama 2 from Meta. A stretch goal for this project is to generalize the code so that any model can be used for embedding or summarization.
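One way to meet the model-agnostic stretch goal is a small interface that every embedding backend implements, plus a registry keyed by name. The sketch below assumes that design. `Embedder`, `HashingEmbedder`, `EMBEDDERS`, and `get_embedder` are hypothetical names. The hashing embedder is only a dependency-free stand-in for a real model.

```python
import hashlib
import math
from typing import Callable, Dict, List, Protocol


class Embedder(Protocol):
    """Hypothetical interface: any backend that turns texts into vectors."""

    def embed(self, texts: List[str]) -> List[List[float]]: ...


class HashingEmbedder:
    """Toy stand-in for a real model: hashed bag-of-words, L2-normalized."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                digest = hashlib.md5(token.encode("utf-8")).digest()
                vec[int.from_bytes(digest[:4], "little") % self.dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
            vectors.append([v / norm for v in vec])
        return vectors


# Registry of backends by name; a real model wrapper would be added here.
EMBEDDERS: Dict[str, Callable[[], Embedder]] = {"hashing": HashingEmbedder}


def get_embedder(name: str) -> Embedder:
    """Instantiate a registered embedder, with a clear error for unknown names."""
    try:
        return EMBEDDERS[name]()
    except KeyError:
        raise ValueError(
            f"unknown embedder {name!r}; choose from {sorted(EMBEDDERS)}"
        ) from None
```

An Instructor-XL backend could implement the same `embed` method by wrapping the `InstructorEmbedding` package. For example, it could call `INSTRUCTOR("hkunlp/instructor-xl").encode([[instruction, text] for text in texts]).tolist()` and register the result under its own name. The rest of the pipeline would not need to change.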
Vector storage will use FAISS, an MIT-licensed library from Meta's Facebook AI Research. The dimension reduction step will support several algorithms, including UMAP and PaCMAP. The clustering step will likewise support several algorithms, including HDBSCAN and standard k-means.
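UMAP, PaCMAP, and HDBSCAN would come from their own libraries: `umap-learn`, `pacmap`, and `hdbscan`, or scikit-learn's `HDBSCAN` and `KMeans`. The standard k-means option, however, is simple enough to show directly. Below is a minimal pure-Python version of Lloyd's algorithm. It is meant only to illustrate what the clustering step does with reduced embeddings; the real pipeline would presumably use a library implementation.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length float vectors.

    Returns (labels, centroids). Initial centroids are k distinct input
    points chosen at random; an empty cluster keeps its previous centroid.
    Assumes n_iter >= 1 and k <= len(points).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In the pipeline, `points` would be the low-dimensional vectors produced by UMAP or PaCMAP, and `k` would be a tunable parameter. Unlike HDBSCAN, k-means requires choosing `k` up front and assigns every proposal to some cluster.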
Tooling
All tools will be developed in Python, the primary language for AI development. The tooling component of this proposal is as valuable as the end product itself. It will create new open source tools, or build on ones I have already released, that enable AI developers to use decentralized infrastructure on Cardano.
nunet-py
NuNet is a decentralized computing project on Cardano that allows individuals to rent out the processing power of their computers. I developed nunet-py while actively testing NuNet during its alpha phase, and it allows programmatic execution of jobs on NuNet. It can fully configure and execute a job on NuNet, but it has basic usability issues and no documentation. I will develop it further, and it will serve as the job submission tool for RAGDoC's data aggregation, text embedding, clustering, and other steps.
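nunet-py's interface is not described here, so the following is only a hypothetical illustration of what a job description for a submission tool needs to contain. It covers which container image to run, which command to execute, and what resources to request. `JobSpec` and its fields are assumptions, not nunet-py's or NuNet's actual API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class JobSpec:
    """Hypothetical compute-job description (not nunet-py's real API)."""

    name: str
    image: str                 # container image that runs the step
    command: List[str]         # command executed inside the container
    cpu_cores: int = 1
    ram_mb: int = 2048
    gpu: bool = False
    env: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject obviously invalid specs before anything is submitted."""
        if not self.name:
            raise ValueError("job name must be non-empty")
        if self.cpu_cores < 1 or self.ram_mb < 1:
            raise ValueError("cpu_cores and ram_mb must be positive")
        if not self.command:
            raise ValueError("command must contain at least one argument")

    def to_json(self) -> str:
        """Validate, then serialize to a stable JSON payload."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical embedding job; the image name is a placeholder.
spec = JobSpec(
    name="embed-fund11",
    image="example/ragdoc-embed:latest",
    command=["python", "embed.py"],
    gpu=True,
)
payload = spec.to_json()
```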
iagon-py
Iagon is a decentralized, privacy-focused storage solution that runs on Cardano. It allows individuals to rent out disk space on their computers. I developed iagon-py during Iagon's alpha test phase, but it currently has only rudimentary functionality and no documentation. This tool will be used to store intermediate data, such as text embeddings, clusters, and summaries.
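Intermediate data such as embeddings must be serialized to bytes before it can be stored. A minimal sketch follows. It assumes gzipped JSON as the stored format and a SHA-256 digest as the object name; both are assumptions. iagon-py's actual upload interface is not shown, and the helper names are hypothetical.

```python
import gzip
import hashlib
import json


def pack_embeddings(ids, vectors):
    """Serialize proposal IDs and their vectors into gzipped JSON bytes."""
    payload = {"ids": list(ids), "vectors": [[float(x) for x in v] for v in vectors]}
    # mtime=0 keeps the gzip header fixed, so identical data gives identical bytes.
    return gzip.compress(json.dumps(payload).encode("utf-8"), mtime=0)


def unpack_embeddings(blob):
    """Inverse of pack_embeddings."""
    payload = json.loads(gzip.decompress(blob).decode("utf-8"))
    return payload["ids"], payload["vectors"]


def content_key(blob):
    """SHA-256 hex digest: a stable name for the blob and an integrity check."""
    return hashlib.sha256(blob).hexdigest()
```

Because the bytes are deterministic, re-hashing a downloaded blob and comparing the result with its key confirms that the data arrived intact.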
cardano-flows
To provide additional utility to developers, RAGDoC's workflow should be modular. Data aggregation, embedding, dimension reduction, clustering, and summarization would each be a separate step, so the tools can be reused for other applications. Python tools for creating workflows exist, but most are tied directly to a specific workflow manager. cardano-flows will be a new tool for creating and running workflows on Cardano infrastructure. For this proposal, it will use NuNet for compute and Iagon for storage. Its compute and storage components will be abstracted so that new projects can be added easily as they come online. For example, when Iagon's compute infrastructure launches, cardano-flows should be able to incorporate it easily as a compute backend.
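The separation between workflow steps and the backends they run on can be sketched with two small interfaces. Everything below is hypothetical, since cardano-flows does not exist yet. `Step`, `run_workflow`, `InMemoryStorage`, and `LocalCompute` only illustrate the design. Steps are assumed to be listed in dependency order rather than scheduled from their inputs and outputs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Protocol


class Storage(Protocol):
    def put(self, key: str, value: Any) -> None: ...
    def get(self, key: str) -> Any: ...


class Compute(Protocol):
    def run(self, fn: Callable[..., Any], *args: Any) -> Any: ...


class InMemoryStorage:
    """Stand-in for a remote store such as Iagon."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data[key]


class LocalCompute:
    """Stand-in for a remote compute backend such as NuNet; runs in-process."""

    def run(self, fn: Callable[..., Any], *args: Any) -> Any:
        return fn(*args)


@dataclass
class Step:
    name: str
    fn: Callable[..., Any]
    inputs: List[str]  # storage keys this step reads
    output: str        # storage key this step writes


def run_workflow(steps: List[Step], compute: Compute, storage: Storage) -> Storage:
    """Run steps in list order, passing data between them through storage."""
    for step in steps:
        args = [storage.get(key) for key in step.inputs]
        storage.put(step.output, compute.run(step.fn, *args))
    return storage


# Usage: a two-step toy workflow.
storage = InMemoryStorage()
storage.put("raw", ["Aiken audit", "DEX liquidity"])
steps = [
    Step("embed", lambda docs: [[len(d)] for d in docs], ["raw"], "embeddings"),
    Step("count", lambda vecs: len(vecs), ["embeddings"], "n_docs"),
]
run_workflow(steps, LocalCompute(), storage)
```

Adding a new backend, such as Iagon compute, would then mean writing one more class with a `run` method, while the steps stay unchanged.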
RAGDoC Dashboard
The final piece of RAGDoC is a dashboard for browsing Catalyst data, tuning parameters, and submitting workflows. The dashboard will be built with Solara, a Python web framework built on a React-style component model. It will let users submit the pipeline to NuNet and load results from Iagon into an interface for browsing them, with links back to the original proposals on IdeaScale. As part of this work, I will open source some custom Solara components. These include the wallet connector that lets people sign transactions and CIP-8 messages, which is already live and in use on the SteelSwap DEX aggregator.
Audience
I see two general categories of audience for this project:
The success of this project will give the Cardano community better ways to evaluate Catalyst proposals, a task that has become increasingly burdensome as the number of submissions has grown.
Further, the success of this project will enable developers to more easily adopt the decentralized computing tools on Cardano.
The success of this project will be evaluated a few different ways.
I am highly capable of delivering this project with high levels of trust and accountability. I have already developed a prototype of this tool for the Minswap community in Fund10, and I have prototype versions of most of the tools needed to make this work. I have been in active communication with the NuNet and Iagon teams while developing these tools, and the NuNet team has committed compute resources for developing this project.
Outputs
Catalyst proposal aggregation toolbox.
Completion and documentation for nunet-py and iagon-py.
Acceptance Criteria
A GitHub repo with the code needed to aggregate Catalyst proposals.
MkDocs documentation sites for nunet-py and iagon-py, describing all functionality and providing example use cases.
Outputs
A job specification for configuring jobs in a workflow.
The cardano-flows tool, which permits configuring jobs in Python, with execution on NuNet and storage on Iagon.
Acceptance Criteria
A GitHub repo.
A PyPI package for cardano-flows.
An MkDocs site describing all functionality and providing some simple test cases. One test case will pull in Fund11 data, embed it with an open source model, and store the results on Iagon.
Output
The RAGDoC dashboard.
Acceptance Criteria
A GitHub repo with a README on how to set up the dashboard.
A dashboard that will execute the analysis workflow and visualize the outputs.
A deployment that serves the dashboard, with CIP-8 login for credentials.
Stretch Goal
Provide configuration for the workflow to permit different AI models, dimension reduction algorithms and parameter tuning, and clustering algorithms.
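A configuration object for this stretch goal might look like the sketch below. The option names, the validation rules, and the defaults are illustrative assumptions rather than a committed design. The defaults include the Hugging Face model IDs `hkunlp/instructor-xl` and `meta-llama/Llama-2-7b-chat-hf`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical option sets; the real ones would match what the pipeline supports.
REDUCERS = {"umap", "pacmap"}
CLUSTERERS = {"hdbscan", "kmeans"}


@dataclass(frozen=True)
class RagdocConfig:
    """Hypothetical, validated settings for one RAGDoC run."""

    embedding_model: str = "hkunlp/instructor-xl"
    summary_model: str = "meta-llama/Llama-2-7b-chat-hf"
    reducer: str = "umap"
    n_components: int = 5
    clusterer: str = "hdbscan"
    cluster_params: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail early on unsupported choices instead of partway through a remote job.
        if self.reducer not in REDUCERS:
            raise ValueError(f"reducer must be one of {sorted(REDUCERS)}, got {self.reducer!r}")
        if self.clusterer not in CLUSTERERS:
            raise ValueError(f"clusterer must be one of {sorted(CLUSTERERS)}, got {self.clusterer!r}")
        if self.n_components < 1:
            raise ValueError("n_components must be at least 1")
        if self.clusterer == "kmeans" and "n_clusters" not in self.cluster_params:
            raise ValueError("k-means needs an explicit n_clusters in cluster_params")


cfg = RagdocConfig(reducer="pacmap", clusterer="kmeans", cluster_params={"n_clusters": 40})
```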
I, Elder Millenial, am the sole developer on this project. I have the AI, compute, and tooling skills needed to perform this work. Although I operate under a pseudonym, I will provide any verification required if my proposal is selected.
I have already engaged with the NuNet team. I have a direct line of communication with them, and they have committed compute resources for testing.
I have already engaged with the Iagon team, and I have a direct line of communication with them.
The predominant cost for this project is my time developing it. I estimate this will take 10-20 hours of work per week over the next 6 months. I am asking for ~10,000 ADA/month.
This does not account for other development costs, such as domain names and server costs for the final deliverable. Any such costs will come out of the final two months' budget.
At approximately 15 hours a week and 10,000 ADA per month, my hourly rate comes out to about $60/hour. This is reasonable for a mid- to senior-level developer and is below my standard pay.
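The hourly figure can be checked with a few lines of arithmetic. The proposal does not state an ADA price. This sketch assumes about $0.39 per ADA, which is the price implied by the $60/hour figure at 15 hours a week. The same function shows how the rate shifts across the estimated 10-20 hour range.

```python
WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12
MONTHS = 6                 # project duration from the proposal
ADA_PER_MONTH = 10_000     # requested monthly amount
USD_PER_ADA = 0.39         # assumption: price implied by the $60/hour figure


def hourly_rate_usd(ada_per_month, hours_per_week, usd_per_ada):
    """USD per hour for a monthly ADA payment at a given weekly workload."""
    hours_per_month = hours_per_week * WEEKS_PER_YEAR / MONTHS_PER_YEAR
    return ada_per_month * usd_per_ada / hours_per_month


total_ada = ADA_PER_MONTH * MONTHS
rates = {h: hourly_rate_usd(ADA_PER_MONTH, h, USD_PER_ADA) for h in (10, 15, 20)}
```

At 15 hours a week this gives about 65 hours a month and roughly $60/hour. The rate is about $90/hour at 10 hours a week and about $45/hour at 20 hours a week, over a total of 60,000 ADA for the six months.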