ID: #1200047

Last updated a month ago

Build a customised LLM for Aiken Smart Contract code analysis and release on Hugging Face

Status: In progress

Country: United Kingdom

Industry group: Development & Tools

Problem

A custom LLM for Aiken will significantly accelerate development workflows with features such as code suggestions and error detection, enabling developers to boost overall productivity and quality.

Total to date

This is the total amount allocated to Build a customised LLM for Aiken Smart Contract code analysis and release on Hugging Face. 2 out of 3 milestones are completed.

Total funds requested: ₳24,890

Distributed: ₳14,934

Remaining: ₳9,956

Milestone timeline: 10/24, 12/24, 01/25 (legend: Complete, In progress, To be completed)

Total votes cast: 290

Votes yes: ₳66.9M

Votes abstain: ₳46.9M

About this idea

smart-contract developers teaching

NB: Monthly reporting was deprecated from January 2024 and replaced fully by the Milestones Program framework.

[GENERAL] Name and surname of main applicant

Ami Bening

[GENERAL] Are you delivering this project as an individual or as an entity (whether formally incorporated or not)

Entity (Incorporated)

[GENERAL] Please specify how many months you expect your project to last (from 2-12 months)

[GENERAL] Please indicate if your proposal has been auto-translated into English from another language

[GENERAL] Summarize your solution to the problem (200-character limit including spaces)

Create a customised LLM using an iterative process with a custom dataset for fine-tuning and validation testing to perfect the custom LLM to accurately analyse Aiken code.

[GENERAL] Does your project have any dependencies on other organizations, technical or otherwise?

[GENERAL] If YES, please describe what the dependency is and why you believe it is essential for your project’s delivery. If NO, please write “No dependencies.”

[GENERAL] Will your project’s output/s be fully open source?

Yes

[GENERAL] Please provide here more information on the open source status of your project outputs

Open source under the MIT License. Repository: https://github.com/amibening/AikenLLM

The model will be open-sourced as AikenLLM on https://huggingface.co/

[METADATA] SDG rating

SDG Goals

9 - Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation

SDG Subgoals

9.b - Support domestic technology development, research and innovation in developing countries, including by ensuring a conducive policy environment for, inter alia, industrial diversification and value addition to commodities

Key Performance Indicator (KPI)

9.b.1 - Proportion of medium and high-tech industry value added in total value added

[SOLUTION] Please describe your proposed solution

There is a need for tools that can enhance developer productivity, improve code quality, reduce entry barriers, and assist with adhering to standards in smart contract development with Aiken on Cardano.

Creating a custom Large Language Model (LLM) for Aiken, hosted on Hugging Face, addresses many of these challenges for Cardano smart contract development and helps train new developers coming to the ecosystem.

What is Hugging Face?

Hugging Face is an open source library and language model repository that also provides pre-trained models and a simple interface. It is widely used and has been adopted by many researchers, developers, and companies. It will be used to host the model along with a custom dataset and will provide the following -

A platform for deploying and managing the model in production environments
A command-line interface for interacting with the model

Why a Custom LLM?

Efficiency and Productivity: Automating repetitive coding tasks and offering intelligent suggestions speeds up the development process.
Quality Assurance: Detecting potential errors or inadequate code before deployment enhances the reliability of smart contracts.
Educational Tool: The LLM can serve as an educational resource, which is invaluable for onboarding new developers.
Community and Open Source: Hosting on Hugging Face encourages community collaboration, allowing developers to contribute to the model’s training and improvement, which enhances its effectiveness and adaptability.

Who Will Engage with the Project?

Cardano Smart Contract Developers: Especially those working with or planning to work with the Aiken language.
Enterprises: Companies and projects focusing on developing Cardano smart contracts can use this LLM to streamline their development processes.
Educational Institutions: Those that offer courses in Cardano development could use the LLM as a teaching aid.

Metrics and Methods to Measure Impact:

Quality Metrics: Compare the frequency and severity of bugs in projects developed with the help of the LLM versus those without. This could involve statistical analysis of project outcomes.
Educational Outcomes: Engage with educational institutions to measure how the LLM affects learning outcomes for Aiken students.
Performance Benchmarks: Benchmark the LLM’s performance on tasks such as code completion accuracy and bug detection rates
Community Contributions: Community contributions to the model’s training and development will be enabled, reflecting the model's openness and collaborative potential and future growth.
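
As an illustration of the code completion benchmark mentioned above, the following is a minimal sketch of one possible accuracy measure: exact match after whitespace normalisation. The function names and the choice of exact match are assumptions made for this example; the proposal does not specify which accuracy measure will be used.

```python
def normalise(code: str) -> str:
    """Collapse runs of whitespace so formatting differences are not counted as errors."""
    return " ".join(code.split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predicted completions that equal the reference after normalisation."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalise(p) == normalise(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Illustrative completions only; these are not taken from the project's dataset.
    predicted = ["a  + b", "list.length(xs)"]
    expected = ["a + b", "list.length(ys)"]
    print(exact_match_accuracy(predicted, expected))
```

Exact match is strict: a completion that is functionally equivalent but textually different scores zero, so in practice it would likely be reported alongside softer measures.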

Proof of Concept:

Pilot Projects: Launch pilot projects with selected project development teams to integrate the LLM into their workflows and measure the changes in development speed, bug rates, and overall project success.

[IMPACT] Please define the positive impact your project will have on the wider Cardano community

The following is a list of positive impacts the project will have on the Cardano developer community, helping to grow the ecosystem's development projects -

Increased Developer Adoption and Efficiency:
Impact: By helping simplify the process of writing, testing, and deploying smart contracts in Aiken, the LLM will attract more developers to the Cardano ecosystem, assisting its growth and diversity.
Measurement: Track the number of developers who start using the LLM, measured by downloads.
Enhanced Smart Contract Quality:
Impact: The LLM will help reduce bugs in smart contracts, which is crucial for the trust and reliability of the Cardano blockchain.
Measurement: Analyse the reduction in reported issues and vulnerabilities pre and post-LLM implementation. Collaborate with Cardano projects to validate improvements.
Education and Training:
Impact: The LLM will serve as a valuable learning tool for new and existing developers, helping to lower the learning curve associated with Cardano smart contract development.
Measurement: Partner with educational institutions to track usage and performance improvements in courses that incorporate the LLM.
Community Building and Collaboration:
Impact: Hosting the model on an open platform like Hugging Face fosters a collaborative environment where developers can contribute towards improvements, share insights, and leverage collective intelligence.
Measurement: Monitor the volume and quality of community contributions and updates to the model. Track engagement metrics on the repository.

[CAPABILITY & FEASIBILITY] What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?

Jest has a proven track record of the successful completion of projects within the software development and AI space. Our experience has equipped us with a good understanding of the challenges and specific requirements needed for such a project.

Validation of Feasibility:

We have conducted a set of pilot studies that have validated that our proposal is feasible
We intend to involve members of the existing development projects to assist in the testing and validation of our proposal for the production environment

[PROJECT MILESTONES] What are the key milestones you need to achieve in order to complete your project successfully?

Milestone 1: Data Collection and Preparation

Objective: Gather and prepare a comprehensive dataset of Aiken code for model training.

Tasks:
Identify and collect Aiken smart contracts and relevant code examples from code repositories such as GitHub, GitLab and others, and connect with potential beta testers
Automate the collection of Aiken code samples from repositories
Clean and format the data into a custom dataset for model learning purposes, automating this as much as possible; any manual steps are to be documented
Acceptance Criteria:
Ensure there are diverse examples of Aiken code to assist developers
Data fully annotated and prepared for building the model
Ensure there are enough beta testers from development and educational projects
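
The cleaning and formatting task above could be sketched as follows. This is a hedged example, not the project's actual pipeline: the record fields (`path`, `sha256`, `text`), the JSON Lines output, and the deduplication by content hash are all assumptions made for illustration.

```python
import hashlib
import json


def clean_source(text: str) -> str:
    """Normalise line endings and strip trailing whitespace from every line."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip()


def build_records(samples: list[tuple[str, str]]) -> list[dict]:
    """Turn (path, source) pairs into deduplicated dataset records."""
    seen: set[str] = set()
    records = []
    for path, source in samples:
        cleaned = clean_source(source)
        if not cleaned:
            continue  # skip files that are empty after cleaning
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # skip exact duplicates, e.g. the same file in a forked repository
        seen.add(digest)
        records.append({"path": path, "sha256": digest, "text": cleaned})
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialise records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


if __name__ == "__main__":
    # Hypothetical file paths and contents, used only to show the behaviour.
    samples = [
        ("repo_a/validators/always_true.ak", "fn main() {  \r\n  True\r\n}\r\n"),
        ("repo_b/validators/always_true.ak", "fn main() {\n  True\n}"),
        ("repo_b/validators/empty.ak", "   \n"),
    ]
    print(to_jsonl(build_records(samples)))
```

Hashing the cleaned text rather than the raw file means copies that differ only in line endings or trailing whitespace are treated as duplicates.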

Milestone 2: Model Selection, Initial Training, Fine-Tuning and Optimisation

Objective: Select an appropriate base model and begin the initial training phase.

Tasks:
Evaluate potential base models suitable for adaptation to the Aiken language
Perform initial training and fine-tuning with the prepared custom dataset
Acceptance Criteria:
Base model selected based on performance metrics such as accuracy in code understanding tasks
Initial model achieves a pre-set benchmark on training accuracy and loss metrics

Objective: Fine-tune the model to specifically adapt to the nuances of the Aiken language.

Tasks:
Continue training the model with a focus on reducing overfitting and improving
Fine-tune and optimise model parameters for better performance on specific tasks like code syntax and bug detection
Acceptance Criteria:
Demonstrates code understanding and prediction capabilities specific to Aiken
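
One common way to limit overfitting during fine-tuning is early stopping on a held-out validation loss. The sketch below is illustrative only: the proposal does not name a specific technique, and the function name, `patience`, and `min_delta` parameters are assumptions introduced for this example.

```python
def early_stopping_epoch(
    val_losses: list[float], patience: int = 2, min_delta: float = 0.0
) -> int:
    """Return the index of the best epoch seen before training would be stopped.

    Training stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs after the best epoch.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_epoch = 0
    best_loss = val_losses[0]
    for epoch, loss in enumerate(val_losses[1:], start=1):
        if loss < best_loss - min_delta:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training here
    return best_epoch


if __name__ == "__main__":
    # Made-up validation losses: improving at first, then rising as the model overfits.
    losses = [1.0, 0.8, 0.7, 0.75, 0.9]
    print(early_stopping_epoch(losses, patience=2))
```

The checkpoint saved at the returned epoch would be the one kept for further evaluation, since later epochs fit the training data more closely without improving on unseen code.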

Final Milestone: Evaluation and Testing

Objective: Thoroughly evaluate and test the model to ensure reliability and effectiveness.

Tasks:
Conduct comprehensive testing
Gather feedback from initial user groups (beta testers)
Iterate based on feedback to improve the model
Acceptance Criteria:
Model meets or exceeds performance benchmarks such as accuracy, precision, and recall in code related tasks
Positive feedback from a large majority of beta testers regarding usability and effectiveness
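
For the bug detection task, the accuracy, precision, and recall benchmarks named in the acceptance criteria can be computed from labelled examples as in the sketch below. It assumes a binary framing (each code sample either contains a bug or does not) and uses hypothetical names; the proposal does not define the exact evaluation setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionScores:
    accuracy: float
    precision: float
    recall: float


def score_bug_detection(predicted: list[bool], actual: list[bool]) -> DetectionScores:
    """Score binary bug predictions (True = bug flagged) against ground-truth labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # bugs correctly flagged
    fp = sum(p and not a for p, a in pairs)      # clean code wrongly flagged
    fn = sum(not p and a for p, a in pairs)      # bugs that were missed
    tn = sum(not p and not a for p, a in pairs)  # clean code correctly passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return DetectionScores((tp + tn) / len(pairs), precision, recall)


if __name__ == "__main__":
    # Illustrative labels only, not project results.
    flagged = [True, True, True, False]
    has_bug = [True, True, False, False]
    print(score_bug_detection(flagged, has_bug))
```

Precision and recall are reported separately because they fail differently: low precision means developers are distracted by false alarms, while low recall means real bugs reach deployment.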

Objective: Deploy the model and custom dataset onto Hugging Face / GitHub and ensure all aspects of the project are documented for future community contributions

Tasks:
Prepare the model for deployment, including final optimisations and packaging
Deploy the model on Hugging Face along with all documentation on GitHub
Acceptance Criteria:
Model and dataset are accessible and downloadable from Hugging Face
Upload all source code to build model onto GitHub
Document automation scripts for gathering data for dataset and share on GitHub
Document how the Dataset data is cleaned and formatted on GitHub
Document how development and educational processes improved with beta testers

[RESOURCES] Who is in the project team and what are their roles?

Ami Bening - www.linkedin.com/in/amibening/ will research and compile the project and its deliverables. He has worked as a software developer, product consultant, business analyst and software architect in various global organisations, where he has contributed to the design and development of AI/ML software projects.

[BUDGET & COSTS] Please provide a cost breakdown of the proposed work and resources

The project budget is ₳24,890, calculated as follows:

Cost per hour: $50 (discounted from the usual minimum developer rate of $100)
Total number of days: 28 (8-hour working day)
ADA/USD conversion rate: 0.45 (as of 9th May 2024)

Break down of cost per milestone -

The project will be carried out over a period of 6 months.

In summary, the milestones and payments are as follows -

Milestone 1 : ₳8889 (36%)
Milestone 2 : ₳7112 (28%)
Final Milestone : ₳8889 (36%)
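
The figures above can be checked with a short calculation. This sketch uses only the numbers stated in this section; the variable names are illustrative.

```python
HOURLY_RATE_USD = 50
HOURS_PER_DAY = 8
WORKING_DAYS = 28
USD_PER_ADA = 0.45  # conversion rate quoted as of 9th May 2024

MILESTONES_ADA = {
    "Milestone 1": 8889,
    "Milestone 2": 7112,
    "Final Milestone": 8889,
}

# Total cost in USD, then converted to ADA at the quoted rate.
total_usd = HOURLY_RATE_USD * HOURS_PER_DAY * WORKING_DAYS
raw_total_ada = total_usd / USD_PER_ADA

# Sum of the milestone payments and each milestone's share of that sum.
milestone_total_ada = sum(MILESTONES_ADA.values())
shares = {name: amount / milestone_total_ada for name, amount in MILESTONES_ADA.items()}

if __name__ == "__main__":
    print(f"Total in USD: ${total_usd:,}")
    print(f"Converted to ADA: {raw_total_ada:,.2f}")
    print(f"Sum of milestones in ADA: {milestone_total_ada:,}")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

The direct conversion gives about ₳24,888.89, while the milestone payments sum to the stated ₳24,890; the small difference is presumably due to rounding each milestone amount individually. The milestone shares work out to roughly 35.7%, 28.6% and 35.7%, which the summary presents as 36%, 28% and 36%.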

[VALUE FOR MONEY] How does the cost of the project represent value for money for the Cardano ecosystem?

The cost of the project offers good value for money as follows -

A heavily discounted rate has been used for this project
The tool will help improve the quality and standards of Aiken code
The model can be added to workflows within projects, improving overall productivity
Onboarding new developers will be made easier with the additional educational tool
The tool will help increase diversity by supporting developers who do not speak English as a first language
Community collaboration, allowing developers to contribute to the model’s training and improvement
Enabling the building of new or adapting existing developer tools to utilise and enhance the open sourced model

Overall, the project will add value to the Cardano developer ecosystem.
