Community members need a way to ask questions about, and educate themselves about, CIP-1694 governance. We can address this using AI-powered contextual retrieval.
This is the total amount allocated to AI RAG Analysis of CIP 1694.
Provide an educational exposition of the development of an open-source LLM (Large Language Model) based on RAG (Retrieval Augmented Generation) for question & analysis of CIP 1694.
No dependencies.
Yes, the development of this proposal will be under the Apache 2.0 Licence. Some external data sources may be attributed using Creative Commons. The text of this proposal is licenced under Creative Commons CC BY-NC-SA 4.0
-------------------------------------
Overview
CIP 1694 (Cardano Improvement Proposal) was an important step in Cardano’s Voltaire Roadmap. Various sources were used to draft, edit, review, consult and agree consensus for the CIP. This diversity of sources is an important feature of our decentralised community; but it does come with challenges. Despite a great deal of community communication, many of these sources remain opaque and hard to access.
When vital information is scattered across multiple sources in this way, it can lead to confusion, and make it challenging for people to educate themselves on the topic, or analyse the data to extract relevant insights, or verify exactly what was said at any given point in the process. This impacts the transparency and accountability of Cardano’s governance materials and processes.
Simply applying generalist Large Language Models such as Open AI will generate misleading responses, because external data sources are not carefully incorporated, which results in the LLM filling its knowledge gaps with hallucinations. This means that the effectiveness of untutored LLMs is limited, and any deeper understanding based on context-specific information is not possible without a way to embed specific external data sources.
Reasons for our approach: Retrieval-Augmented Generation (RAG)
-------------------------------------
This proposal will address these limitations by developing a Language Model using a Retrieval-Augmented Generation (RAG) approach which can be tailored to specific datasets. The RAG approach integrates a necessary data retrieval component with a language model, which enhances its ability to generate context-relevant responses.
The aim is to ease the process of question-and-answering, and enable more accurate answers. The RAG process provides contextual information retrieval and synthesis (data source comparisons) to ensure accuracy and comprehension.
A further aim of the proposal is as a demonstration/proof-of-concept. By developing a RAG retrieval process with a very specific data source (the CIP itself and related data), we aim to demonstrate how it can support community members to self-educate and keep informed about a specific topic - in this case, Voltaire governance.
Additionally the RAG approach allows for greater control over prompt engineering constraints and unit testing, so that quality assurance and ethical safeguards can be applied.
(See What is retrieval-augmented generation? for further information)
Who will we engage ?
-------------------------------------
QADAO will apply its extensive experience in community engagement and outreach to publicise our Open Source workflow and invite the community to reuse our methods, code and documentation. We also intend to engage the community with expositions at the close of each milestone, where we will demonstrate how we have worked, and invite the community to engage with and learn from our process.
How will we demonstrate or prove our impact ?
-------------------------------------
Overview of the proposed RAG model architecture
-------------------------------------
We will demonstrate our impact by providing educational step-by-step documentation and exposition of our RAG workflow.
Our methodology will focus on the most accessible models as a proof of concept, with each step documented in Colab (Python) notebooks. The Python code will be hosted, processed and documented in Colab Notebooks which will be committed to the project’s public GitHub repository with an Open Source Apache 2.0 licence.
Langchain libraries will be used to build the model architecture. Large Language Models (e.g. Open Source models hosted on HuggingFace) will be assessed for semantic use in combination with local datasets.
Data preprocessing and preparation
-------------------------------------
The data will be sourced, prepared and processed prior to embedding in a vector data store.
Model training and fine-tuning
-------------------------------------
The embeddings in the vector data store will provide the basis for model training and fine tuning. Sample queries and expected responses, sourced from the community, will be tested against the model.
Evaluation metrics and techniques
-------------------------------------
Conversation or query chains will be built and tested against the model. This will take the form of a Q&A interaction between a constrained local source and a general LLM, and a series of prescriptive prompt instructions.
Documentation and Knowledge Transfer
-------------------------------------
This entire process - the workflow, the code, the data processing, model training and evaluation - will be fully documented along the way and published at the close of the project.
The project will bring value to the Cardano ecosystem by delivering a straightforward, open-source, AI process that can be reused, queried and analysed by community members.
This will support people to self-educate on CIP-1694 (the topic of the dataset) despite barriers such as relevant data being widely dispersed, hard to track down, and lengthy; and it will thereby contribute to improved transparency and governance inclusion.
The project will also offer a demonstration of how a RAG retrieval process can work in this type of context, and will enable others in the ecosystem to use and adapt the process to enable AI querying of any set of data - both to analyse the data itself, and to use it as source material for community education about a topic.
We will measure this impact by:
We will share the outputs via our open-source documentation, and with a closing video which walks through what we did and how it works.
The team members are skilled and experienced members of the Catalyst community, and both have experience of working in transparent and open-source ways via GitHub, GitBook, and Dework, providing a trackable, accountable and trustworthy audit trail. See for example
They also have well-established skills in community engagement and education; and a thorough grasp of the RAG retrieval process and its educative and ethical implications.
Our proposal includes not only thorough documentation of our process, but also community education and sharing. This will offer a high level of trust and accountability, since the community verifies our work by learning about it and trying it in practice.
-------------------------------------
Data preparation and exploration; RAG model development
We will locate, preprocess and prepare data sources relevant to CIP-1694; and we will deliver a baseline RAG implementation and data.
Milestone outputs
Acceptance criteria
Evidence of milestone completion
-------------------------------------
Model refinement and optimization; model deployment and integration.
We will deliver refinements to the baseline model; and deploy the RAG model in a public, Open Source environment
Milestone outputs:
Acceptance criteria
Evidence of milestone completion
-------------------------------------
Testing and evaluation; documentation and knowledge transfer
We will provide comprehensive testing and evaluation of the deployed system; and deliver the final comprehensive documentation and knowledge transfer.
Milestone outputs
Acceptance criteria
Evidence of milestone completion
Stephen Whitenstall is the co-founder of Quality-Assurance DAO, https://qadao.io/ , and has provided project management consultancy for many Catalyst projects since Fund 4 including Catalyst Circle, Audit Circle, Community Governance Oversight, Training & Automation (with Treasury Guild), Governance Guild and Swarm. A Circle V2 representative for funded proposers. Also engaged in cross chain collaboration with SingularityNET managing an Archive project. He has 30 years experience in development, test management, project management, social enterprises in Investment Banking, Telecoms and Local Government. A philosophy honors graduate with an interest in Blockchain governance.
Vanessa Cardui
Community engagement professional with 20+ years' experience of working with communities to record and document their information, archive it, and make it discoverable. Part of QA-DAO, where she leads on documentation; founding member of The Facilitators’ Collective; founding member of the SingularityNET Archives; part of the SingularityNET DeepFunding Focus Group.
-------------------------------------
Data preparation and exploration; RAG model development
Milestone 1 outputs
Subtotal - 15,000 ADA
-------------------------------------
Model refinement and optimization; model deployment and integration
Milestone 2 outputs:
Subtotal - 21,000 ADA
-------------------------------------
Testing and evaluation; documentation and knowledge transfer
Final Milestone outputs
Subtotal - 15,000 ADA
-------------------------------------
Overall Total - 51,000 ADA
The pay rates given are self-employed rates that take into account the employment overheads of the resources contracted. The rates are based on the low end of US and European averages. The amounts are calculated for each milestone based on the hours to complete.
A freelance project manager can charge from $50/hr. In addition management of this project requires knowledge of open source software tools and an awareness of blockchain technology and LLMs. [Source - Project Management Fees | Hourly & Consulting Rates | Salaries – OCM Solution]
In addition, all the resources working on this project are taking on the currency risk of being paid in ADA. This means that a fall in the ADA price will result in being paid less or delivering less in each milestone. Any rise in the ADA price will represent a reward for investing in the Cardano ecosystem.
Consequently, given these factors, we believe this proposal offers excellent value for money in a volatile cryptocurrency environment.