[GENERAL] Name and surname of main applicant
Ami Bening
[GENERAL] Are you delivering this project as an individual or as an entity (whether formally incorporated or not)
Entity (Incorporated)
[GENERAL] Please specify how many months you expect your project to last (from 2-12 months)
6
[GENERAL] Please indicate if your proposal has been auto-translated into English from another language
No
[GENERAL] Summarize your solution to the problem (200-character limit including spaces)
Create a customised LLM using an iterative process with a custom dataset for fine-tuning and validation testing to perfect the custom LLM to accurately analyse Aiken code.
[GENERAL] Does your project have any dependencies on other organizations, technical or otherwise?
No
[GENERAL] If YES, please describe what the dependency is and why you believe it is essential for your project's delivery. If NO, please write "No dependencies."
No dependencies.
[GENERAL] Will your project's output/s be fully open source?
Yes
[GENERAL] Please provide here more information on the open source status of your project outputs
Open source under the MIT License. The repository is available here: https://github.com/amibening/AikenLLM
The model will be open-sourced as AikenLLM on https://huggingface.co/
[METADATA] SDG rating
SDG Goals
9 - Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation
SDG Subgoals
9.b - Support domestic technology development, research and innovation in developing countries, including by ensuring a conducive policy environment for, inter alia, industrial diversification and value addition to commodities
Key Performance Indicator (KPI)
9.b.1 - Proportion of medium and high-tech industry value added in total value added
[SOLUTION] Please describe your proposed solution
There is a need for tools that can enhance developer productivity, improve code quality, reduce entry barriers, and assist with adhering to standards in smart contract development with Aiken on Cardano.
Creating a custom Large Language Model (LLM) for Aiken, hosted on Hugging Face, addresses many of these challenges for Cardano smart contract development and helps train new developers coming to the ecosystem.
What is Hugging Face?
Hugging Face is an open-source language model repository and library that also provides pre-trained models and a simple interface. It is widely used and has been adopted by many researchers, developers, and companies. It will be used to host the model along with the custom dataset and will enable the following -
- Provide a platform for deploying and managing the model in production environments
- A command-line interface for interacting with the model
Why a Custom LLM?
- Efficiency and Productivity: Automating repetitive coding tasks and offering intelligent suggestions speeds up the development process.
- Quality Assurance: Detecting potential errors or inadequate code before deployment enhances the reliability of smart contracts.
- Educational Tool: The LLM can serve as an educational resource, which is invaluable for onboarding new developers.
- Community and Open Source: Hosting on Hugging Face encourages community collaboration, allowing developers to contribute to the model's training and improvement, which enhances its effectiveness and adaptability.
Who Will Engage with the Project?
- Cardano Smart Contract Developers: Especially those working with or planning to work with the Aiken language.
- Enterprises: Companies and projects focusing on developing Cardano smart contracts can use this LLM to streamline their development processes.
- Educational Institutions: Those that offer courses in Cardano development could use the LLM as a teaching aid.
Metrics and Methods to Measure Impact:
- Quality Metrics: Compare the frequency and severity of bugs in projects developed with the help of the LLM versus those without. This could involve statistical analysis of project outcomes.
- Educational Outcomes: Engage with educational institutions to measure how the LLM affects learning outcomes for Aiken students.
- Performance Benchmarks: Benchmark the LLM's performance on tasks such as code completion accuracy and bug detection rates.
- Community Contributions: Community contributions to the model's training and development will be enabled, reflecting the model's openness, collaborative potential, and capacity for future growth.
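As an illustration of the code completion accuracy benchmark mentioned above, the following is a minimal sketch that scores predictions by exact match after whitespace normalisation. The function name, the choice of exact match as the metric, and the sample strings are assumptions made for illustration; the project may adopt a different metric.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference, ignoring whitespace differences."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalise(text):
        # Collapse runs of whitespace so formatting differences are not penalised.
        return " ".join(text.split())

    hits = sum(normalise(p) == normalise(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical model completions and expected completions (illustrative only).
predictions = ["a + b", "x  * 2", "True"]
references = ["a + b", "x * 2", "False"]
accuracy = exact_match_accuracy(predictions, references)
print(f"exact-match accuracy: {accuracy:.2f}")
```

Exact match is a strict measure; it is easy to reproduce, but it does not credit completions that are functionally equivalent and textually different.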
Proof of Concept:
- Pilot Projects: Launch pilot projects with selected project development teams to integrate the LLM into their workflows and measure the changes in development speed, bug rates, and overall project success.
[IMPACT] Please define the positive impact your project will have on the wider Cardano community
The following is a list of positive impacts the project will have on the Cardano developer community, helping to grow the ecosystem's development projects -
- Increased Developer Adoption and Efficiency:
- Impact: By helping simplify the process of writing, testing, and deploying smart contracts in Aiken, the LLM will attract more developers to the Cardano ecosystem, assisting its growth and diversity.
- Measurement: Track the number of developers who start using the LLM by downloads.
- Enhanced Smart Contract Quality:
- Impact: The LLM will help reduce bugs in smart contracts, which is crucial for the trust and reliability of the Cardano blockchain.
- Measurement: Analyse the reduction in reported issues and vulnerabilities pre and post-LLM implementation. Collaborate with Cardano projects to validate improvements.
- Education and Training:
- Impact: The LLM will serve as a valuable learning tool for new and existing developers, helping to lower the learning curve associated with Cardano smart contract development.
- Measurement: Partner with educational institutions to track usage and performance improvements in courses that incorporate the LLM.
- Community Building and Collaboration:
- Impact: Hosting the model on an open platform like Hugging Face fosters a collaborative environment where developers can contribute towards improvements, share insights, and leverage collective intelligence.
- Measurement: Monitor the volume and quality of community contributions and updates to the model. Track engagement metrics on the repository.
[CAPABILITY & FEASIBILITY] What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?
Jest has a proven track record of the successful completion of projects within the software development and AI space. Our experience has equipped us with a good understanding of the challenges and specific requirements needed for such a project.
Validation of Feasibility:
- We have conducted a set of pilot studies that have validated that our proposal is feasible
- We intend to involve members of the existing development projects to assist in the testing and validation of our proposal for the production environment
[PROJECT MILESTONES] What are the key milestones you need to achieve in order to complete your project successfully?
Milestone 1: Data Collection and Preparation
Objective: Gather and prepare a comprehensive dataset of Aiken code for model training.
- Tasks:
- Identify and collect Aiken smart contracts and relevant code examples from code repositories such as GitHub, GitLab and others, and connect with potential beta testers
- Automate the collection of Aiken code samples from repositories
- Clean and format the data into a custom dataset for model training, automating this as much as possible and documenting any manual steps
- Acceptance Criteria:
- Ensure there are diverse examples of Aiken code to assist developers
- Data fully annotated and prepared for building the model
- Ensure there are enough beta testers from development and educational projects
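The collection and cleaning tasks in this milestone could be automated along the following lines. This is a minimal sketch, assuming the repositories have already been cloned locally and that Aiken source files use the `.ak` extension. The function name `build_dataset`, the JSONL record layout, and the cleaning rules (trailing whitespace removal and exact-duplicate filtering) are hypothetical choices, not the project's confirmed pipeline.

```python
import hashlib
import json
from pathlib import Path


def build_dataset(repo_root, out_path):
    """Collect .ak files under repo_root, drop exact duplicates, and write JSONL."""
    root = Path(repo_root)
    seen = set()
    records = []
    for path in sorted(root.rglob("*.ak")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Normalise line endings and strip trailing whitespace on each line.
        cleaned = "\n".join(line.rstrip() for line in text.splitlines()).strip()
        if not cleaned:
            continue  # skip empty files
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # skip exact duplicates (for example, vendored copies)
        seen.add(digest)
        records.append({"path": str(path.relative_to(root)), "code": cleaned})

    with open(out_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return len(records)
```

Calling `build_dataset("path/to/clones", "aiken_dataset.jsonl")` (both paths are placeholders) returns the number of unique files written. Annotation would remain a separate step.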
Milestone 2: Model Selection, Initial Training, Fine-Tuning and Optimisation
Objective: Select an appropriate base model and begin the initial training phase.
- Tasks:
- Evaluate potential base models suitable for adaptation to the Aiken language
- Perform initial training and fine-tuning with the prepared custom dataset
- Acceptance Criteria:
- Base model selected based on performance metrics such as accuracy in code understanding tasks
- Initial model achieves a pre-set benchmark on training accuracy and loss metrics
Objective: Fine-tune the model to specifically adapt to the nuances of the Aiken language.
- Tasks:
- Continue training the model with a focus on reducing overfitting and improving
- Fine-tune and optimise model parameters for better performance on specific tasks like code syntax and bug detection
- Acceptance Criteria:
- Demonstrates code understanding and prediction capabilities specific to Aiken
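One common way to limit overfitting during fine-tuning is early stopping on a held-out validation loss. The sketch below is illustrative only: the function name, the `patience` and `min_delta` parameters, and the loss values are assumptions, and the proposal does not state which technique will actually be used.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the best epoch seen before training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stalled; stop here
    return best_epoch


# Hypothetical per-epoch validation losses (illustrative numbers only).
val_losses = [1.20, 0.95, 0.80, 0.82, 0.85, 0.70]
best = early_stopping_epoch(val_losses, patience=2)
print(f"restore checkpoint from epoch {best}")
```

With `patience=2` the loop stops after two non-improving epochs and never sees the later, lower loss, which shows the trade-off between training cost and the risk of stopping too soon.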
Final Milestone: Evaluation and Testing
Objective: Thoroughly evaluate and test the model to ensure reliability and effectiveness.
- Tasks:
- Conduct comprehensive testing
- Gather feedback from initial user groups (beta testers)
- Iterate based on feedback to improve the model
- Acceptance Criteria:
- Model meets or exceeds performance benchmarks such as accuracy, precision, and recall in code related tasks
- Positive feedback from a large majority of beta testers regarding usability and effectiveness
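The accuracy, precision, and recall benchmarks named in the acceptance criteria can be computed as follows for a binary task such as bug detection. This is a minimal sketch: the function name and the sample labels are hypothetical, and treating bug detection as a per-snippet yes/no classification is an assumption.

```python
def classification_metrics(predicted, actual):
    """Accuracy, precision, and recall for boolean predictions (True = bug flagged)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))          # bugs correctly flagged
    fp = sum(p and not a for p, a in zip(predicted, actual))      # clean code flagged
    fn = sum(not p and a for p, a in zip(predicted, actual))      # bugs missed
    tn = sum(not p and not a for p, a in zip(predicted, actual))  # clean code passed
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical results on six labelled snippets (illustrative only).
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]
metrics = classification_metrics(predicted, actual)
print(metrics)
```

Precision indicates how often a flagged snippet really contains a bug, while recall indicates how many real bugs were caught; reporting both avoids rewarding a model that flags everything or nothing.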
Objective: Deploy the model and custom dataset onto Hugging Face / GitHub and ensure all aspects of the project are documented for future community contributions
- Tasks:
- Prepare the model for deployment, including final optimisations and packaging
- Deploy the model on Hugging Face along with all documentation on GitHub
- Acceptance Criteria:
- Model and dataset are accessible and downloadable from Hugging Face
- Upload all source code used to build the model to GitHub
- Document automation scripts for gathering data for dataset and share on GitHub
- Document how the Dataset data is cleaned and formatted on GitHub
- Document how development and educational processes improved with beta testers
[RESOURCES] Who is in the project team and what are their roles?
Ami Bening - www.linkedin.com/in/amibening/ will research and compile the project and its deliverables. He has worked as a software developer, product consultant, business analyst and software architect in various global organisations, where he has contributed to the design and development of AI/ML software projects.
[BUDGET & COSTS] Please provide a cost breakdown of the proposed work and resources
The budget of the project is ₳24890, which is calculated as follows:
- Cost per hour: $50 (discounted from the usual minimum developer rate of $100)
- Total number of days = 28 (8-hour working day)
- ADA/USD conversion rate: 0.45 (as of 9th May 2024)
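The calculation above can be reproduced with a few lines of arithmetic. The sketch assumes the conversion rate of 0.45 means USD per 1 ADA; the variable names are illustrative.

```python
HOURLY_RATE_USD = 50   # discounted hourly rate
WORKING_DAYS = 28      # total number of days
HOURS_PER_DAY = 8      # 8-hour working day
USD_PER_ADA = 0.45     # assumed meaning of the ADA/USD rate (9th May 2024)

total_hours = WORKING_DAYS * HOURS_PER_DAY
total_usd = total_hours * HOURLY_RATE_USD
total_ada = total_usd / USD_PER_ADA

print(f"{total_hours} hours, ${total_usd}, about {total_ada:.0f} ADA")
```

This gives 224 hours and $11,200, or roughly ₳24,889, which agrees with the stated budget of ₳24890 to within rounding.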
Break down of cost per milestone -
The project will be carried out over a period of 6 months.
In summary, the milestones and payments are as follows -
- Milestone 1: ₳8889 (36%)
- Milestone 2: ₳7112 (28%)
- Final Milestone: ₳8889 (36%)
[VALUE FOR MONEY] How does the cost of the project represent value for money for the Cardano ecosystem?
The cost of the project offers good value for money as follows -
- A heavily discounted rate has been used for this project
- The tool will help improve the quality and standards of Aiken code
- The model can be added to workflows within projects improving overall productivity
- Onboarding new developers will be made easier with the additional educational tool
- Assist in increasing diversity as the tool will help developers who do not speak English as a first language
- Community collaboration, allowing developers to contribute to the model's training and improvement
- Enabling the building of new or adapting existing developer tools to utilise and enhance the open sourced model
Overall, the project will add value to the Cardano developer ecosystem.