The absence of standardized LLM benchmarks for Cardano’s smart contract languages hinders the advancement of a robust and efficient AI-assisted development ecosystem.
Wolfram will build an open-source framework to benchmark LLMs on interpreting and generating Cardano smart contracts, measuring correctness, syntax compliance, vulnerability detection, and explainability.
Please provide your proposal title
Wolfram: AI Benchmarks for Cardano
Enter the amount of funding you are requesting in ADA
50000
Please specify how many months you expect your project to last
5
Please indicate if your proposal has been auto-translated
No
Original Language
en
What is the problem you want to solve?
The absence of standardized LLM benchmarks for Cardano’s smart contract languages hinders the advancement of a robust and efficient AI-assisted development ecosystem.
Supporting links
Does your project have any dependencies on other organizations, technical or otherwise?
No
Describe any dependencies or write 'No dependencies'
No dependencies
Will your project's outputs be fully open source?
Yes
License and Additional Information
The project will be open-sourced under the Apache 2.0 license for maximum permissiveness and enterprise adoption. All source code will be published on GitHub from project start. Documentation, API references, and example use cases will be included. Community contributions will be encouraged through public issues and PRs.
Please choose the most relevant theme and tag related to the outcomes of your proposal.
AI
Describe what makes your idea innovative compared to what has been previously funded (whether by you or others).
Unlike prior AI projects in blockchain or Cardano, this initiative builds the first open-source framework to benchmark LLM proficiency in Cardano’s smart contract languages. It combines blockchain-specific validation (syntax, deployment, vulnerability checks) with AI evaluation methods (semantic similarity, explainability). By testing both commercial APIs and open-weight models, it enables fair, extensible comparisons, giving developers actionable insights to adopt AI tools more effectively.
Describe what your prototype or MVP will demonstrate, and where it can be accessed.
The MVP will be a working benchmarking framework, accessible as a public GitHub repository under the Apache 2.0 license from project start. It will include:
A curated dataset of Plutus smart contracts with reference code and test cases.
An LLM execution pipeline capable of running prompts across multiple commercial and local models.
An evaluation module for correctness, syntax compliance, semantic similarity, and vulnerability detection.
A final benchmark report comparing well-known commercial LLMs and open-weight models.
Describe realistic measures of success, ideally with on-chain metrics.
50+ clones of the GitHub repository within 2 months of release.
Benchmark dataset includes 50+ unique contract tasks of varying complexity.
Identification of actionable improvements for Plutus LLM support.
Pull requests contributed by the community.
Smart contracts generated during benchmarking successfully deployed to the Cardano testnet.
Please describe your proposed solution and how it addresses the problem
With Cardano's smart contract ecosystem evolving, its range of smart contract languages, such as Plutus and Aiken, provides developers with flexibility and tailored capabilities. Yet this variety also brings differing levels of complexity, unique learning requirements, and distinct workflows, which can pose entry barriers for newcomers and slow productivity in general. Large Language Models (LLMs) have emerged as transformative tools in software engineering, offering capabilities such as accelerating code generation, explaining complex logic, optimizing performance, and identifying security vulnerabilities.
Within the Cardano ecosystem, LLMs hold significant potential to enhance smart contract development. However, the extent to which these models truly understand Cardano’s smart contract languages remains unclear. We propose the development of a systematic benchmarking framework to evaluate LLM proficiency in interpreting, generating, and explaining Cardano smart contract code. The framework will identify strengths, weaknesses, and language-specific challenges, providing actionable insights to guide developers toward the most LLM-compatible tools and workflows.
The proposed solution aims to:
Benchmark LLM proficiency in interpreting, generating, and explaining Cardano smart contract code.
Identify strengths, weaknesses, and language-specific challenges for each model tested.
Provide actionable insights that guide developers toward the most LLM-compatible tools and workflows.
Diagram for LLM Benchmarking: https://amoeba.wolfram.com/index.php/s/5F2fczGM6tWH8wi
The evaluation framework will assess LLM performance across key dimensions including correctness, syntax compliance, vulnerability detection, and explainability. It will ingest a curated collection of prompt-response pairs, and each prompt will be executed through an LLM Execution Engine capable of interfacing with both commercial LLM APIs and open-weight local models. The generated responses will be processed by an Evaluation Engine, which will compare them to ground truth using techniques such as rule-based correctness checks, syntax compliance validation, embedding-based semantic similarity scoring, and vulnerability detection.
A Performance Metrics Module will output structured evaluation data, visual summaries, and annotated response logs, enabling the Cardano community to benchmark LLM behavior and identify opportunities for improving language toolchains, documentation, and ecosystem readiness for AI-assisted development. The framework will be built using open-source technologies, fully documented, and designed for extensibility so it can adapt to additional Cardano smart contract languages, new evaluation methods, and emerging LLM capabilities.
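As a rough illustration of how these three stages fit together, the sketch below wires a stubbed Execution Engine, a minimal Evaluation Engine, and a metrics aggregator into one loop. All function names and the sample task are hypothetical; the real Execution Engine would call commercial APIs or local models, and the Evaluation Engine would apply the richer checks described above rather than a simple string-similarity ratio.

```python
import difflib

def query_model(prompt: str) -> str:
    """Stub for the LLM Execution Engine: in the real framework this would
    call a commercial API or run a local open-weight model."""
    return "mkValidator :: Datum -> Redeemer -> ScriptContext -> Bool"

def evaluate(response: str, reference: str) -> dict:
    """Stub for the Evaluation Engine: compare a response to ground truth.
    A string-similarity ratio stands in for correctness, syntax, and
    semantic checks."""
    similarity = difflib.SequenceMatcher(None, response, reference).ratio()
    return {"similarity": round(similarity, 3),
            "exact_match": response == reference}

def run_benchmark(tasks: list) -> list:
    """Performance Metrics Module: aggregate per-task scores into
    structured evaluation records."""
    results = []
    for task in tasks:
        response = query_model(task["prompt"])
        scores = evaluate(response, task["reference"])
        results.append({"task_id": task["id"], **scores})
    return results

tasks = [{"id": "plutus-001",
          "prompt": "Write the type signature of a Plutus validator.",
          "reference": "mkValidator :: Datum -> Redeemer -> ScriptContext -> Bool"}]
print(run_benchmark(tasks))
```

In the full framework each record would also carry deployment and vulnerability results, and the module would emit visual summaries and annotated response logs alongside the structured data.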
Please define the positive impact your project will have on the wider Cardano community
This project will deliver immediate and long-term benefits by systematically evaluating how effectively Large Language Models (LLMs) understand and work with Cardano’s smart contract languages. The results will empower developers, researchers, and ecosystem leaders with actionable insights to guide AI integration strategies for the Cardano ecosystem.
The project will:
Give developers evidence-based guidance on which LLMs best support Cardano smart contract development.
Reduce duplicated evaluation effort through a shared, open-source benchmarking framework.
Strengthen documentation and the community knowledge base for AI-assisted development on Cardano.
What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?
Our team has extensive expertise in Large Language Model (LLM) evaluation, stemming from our work on large-scale public benchmarks and custom internal projects. We have a strong track record in developing comprehensive LLM benchmarks, including the Wolfram Benchmarking Project: https://www.wolfram.com/llm-benchmarking-project/. This work provides us with deep insights into LLM capabilities by rigorously evaluating their performance on complex tasks like code generation.
Our experience is further enriched by the creation and use of a suite of internal benchmarks for specialized AI applications, such as AI tutors for mathematics and biology, and for evaluating tool-assisted mathematical problem-solving. We are proficient in a wide array of evaluation methodologies, from designing robust frameworks and metrics to performing granular error analysis. This allows us to develop a nuanced, actionable understanding of model performance that goes far beyond simple accuracy scores. Wolfram brings its proven expertise in AI, blockchain, and data-driven applications, ensuring the successful delivery of this comprehensive LLM benchmarking project.
Organizational Strengths
Our team has over three decades of leadership in computational science, data science, and AI. We have a multidisciplinary team specializing in AI, blockchain development, data science, and community engagement. Our company has always prided itself on its global talent and knowledge. Along with our innovative tech stack, we have an in-house consulting team that has helped organizations solve their most difficult problems for decades.
Milestone Title
Project Setup & Dataset Curation
Milestone Outputs
A public GitHub repository is initialized under the Apache 2.0 license with clear base documentation, including README and license files. A curated dataset of Plutus smart contract code snippets is prepared, organized into folders, and accompanied by prompt templates with examples that demonstrate intended use.
Acceptance Criteria
The GitHub repository must be publicly accessible and include an initial commit with README, Apache 2.0 license, dataset folder, and template documentation. Example prompts are tested and confirmed to run correctly, ensuring that contributors and community members can reproduce the setup and understand usage.
Evidence of Completion
Public GitHub repository link displaying the dataset folder, documentation, and prompt templates. Repository history confirms an initial commit containing README, license, and curated dataset files.
Delivery Month
1
Cost
10000
Progress
20 %
Milestone Title
Benchmarking Framework Architecture & LLM Integration
Milestone Outputs
The architecture for the LLM execution and code validation pipeline is fully defined and documented, with diagrams and specifications uploaded to GitHub. A working integration layer is implemented that connects with at least one commercial API and one local open-weight model. A collection module for prompts and responses is also established and tested.
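One plausible shape for such an integration layer is a common provider interface behind which commercial APIs and local models are interchangeable. The sketch below is illustrative only; the class and method names are assumptions, not the framework's actual API, and the stubbed `complete` methods stand in for real HTTP calls and local inference.

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface so commercial APIs and local open-weight models
    plug into the same pipeline."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class CommercialAPIProvider(LLMProvider):
    def __init__(self, model_name: str):
        self.model_name = model_name
    def complete(self, prompt: str) -> str:
        # A real implementation would call the vendor's HTTP API here.
        return f"[{self.model_name}] response to: {prompt}"

class LocalModelProvider(LLMProvider):
    def __init__(self, weights_path: str):
        self.weights_path = weights_path
    def complete(self, prompt: str) -> str:
        # A real implementation would run local inference (e.g. via vLLM).
        return f"[local:{self.weights_path}] response to: {prompt}"

def collect(providers: list, prompt: str) -> list:
    """Prompt–response collection module: record pairs for reproducibility."""
    return [{"provider": type(p).__name__,
             "prompt": prompt,
             "response": p.complete(prompt)} for p in providers]
```

Storing every prompt–response pair alongside the provider identity is what makes later benchmark runs reproducible and auditable.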
Acceptance Criteria
The GitHub repository contains architecture diagrams and technical specifications of the benchmarking pipeline. The system must demonstrate successful retrieval of outputs from both a commercial API and a local model, with prompt–response pairs collected and stored for reproducibility, proving integration works as intended.
Evidence of Completion
Repository updates include architecture diagrams, integration code for API and local models, and a record of successful prompt–response retrievals to show that both sources are functioning in the pipeline.
Delivery Month
2
Cost
10000
Progress
40 %
Milestone Title
Evaluation Engine & Metrics Implementation
Milestone Outputs
A comprehensive evaluation engine is implemented with modules for rule-based correctness and syntax compliance of Plutus contracts. Validators check blockchain-specific criteria including deployment success, execution cost efficiency, and vulnerability detection. A semantic similarity module using embedding scoring is integrated, with unit tests confirming accuracy.
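The embedding-based semantic similarity scoring mentioned above typically reduces to a cosine similarity between two vectors. The toy sketch below uses bag-of-words counts in place of a neural embedding model (an assumption made purely to keep the example self-contained); the real module would embed texts with a learned model and apply the same cosine computation.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real module would use a neural
    embedding model instead."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def semantic_score(response: str, reference: str) -> float:
    """Score a model response against the reference answer."""
    return cosine_similarity(embed(response), embed(reference))
```

Identical texts score 1.0, disjoint texts score 0.0, and partial overlap falls in between, which is exactly the behavior unit tests for the module would pin down.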
Acceptance Criteria
All evaluation modules, including syntax, correctness, and semantic similarity, must pass unit tests on a curated set of sample inputs. Preliminary benchmarking results comparing multiple LLMs must be generated and published in a structured report, demonstrating that the evaluation engine produces consistent, actionable outputs.
Evidence of Completion
GitHub repository updated with evaluation engine source code and test cases, plus an initial benchmarking report comparing LLMs. Both PDF and Markdown formats are available for review by the community.
Delivery Month
3
Cost
10000
Progress
60 %
Milestone Title
Security Assessment & Continuous Benchmarking Setup
Milestone Outputs
A security and vulnerability assessment module is created and integrated into the benchmarking pipeline. This module checks for known vulnerability patterns in Plutus smart contracts. Documentation is written explaining how to extend the framework to additional Cardano languages, providing step-by-step guidance for community developers.
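A pattern-based scan of this kind can be sketched as a table of named patterns matched against contract source. The patterns below are illustrative placeholders, not real Plutus vulnerability signatures; the actual module would encode known vulnerability classes and likely combine pattern matching with deeper static checks.

```python
import re

# Illustrative patterns only — stand-ins for real Plutus vulnerability
# signatures the module would ship with.
VULNERABILITY_PATTERNS = {
    "hardcoded-key": re.compile(r'PubKeyHash\s+"[0-9a-f]+"'),
    "debug-trace-left-in": re.compile(r"\btraceError\b|\btrace\b"),
}

def scan_contract(source: str) -> list:
    """Return the names of all vulnerability patterns found in the source."""
    return [name for name, pattern in VULNERABILITY_PATTERNS.items()
            if pattern.search(source)]
```

Keeping the patterns in a data table rather than hard-coded logic is what makes the module easy to extend to additional Cardano languages, as the accompanying documentation would describe.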
Acceptance Criteria
The security assessment module must successfully detect known vulnerabilities across a curated set of sample contracts. Documentation for extending the benchmark to other Cardano languages is reviewed by community developers and validated for clarity and completeness, ensuring long-term extensibility.
Evidence of Completion
GitHub repository includes the security module source code and examples of detected vulnerabilities, as well as extension documentation that has been publicly shared for review and feedback.
Delivery Month
4
Cost
10000
Progress
80 %
Milestone Title
Final Benchmark Report & Open-Source Release
Milestone Outputs
The complete benchmarking framework is released publicly under Apache 2.0 with full source code, datasets, and detailed documentation. A comprehensive final report compares all tested LLMs across correctness, syntax, vulnerability, and explainability. Supporting materials include video tutorials, user guides, developer documentation, and a community presentation summarizing findings.
Acceptance Criteria
The GitHub repository contains the final benchmarking framework with all supporting materials. A final benchmark report is published in PDF and Markdown formats, alongside tutorials, walkthrough videos, and clear documentation. Feedback from community testing is incorporated to ensure the release is robust and accessible.
Evidence of Completion
Public GitHub repository includes final code, datasets, and documentation. A recorded walkthrough or demo session is uploaded alongside the benchmark report and tutorials to provide evidence of completion and usability.
Delivery Month
5
Cost
10000
Progress
100 %
Please provide a cost breakdown of the proposed work and resources
The total project budget is 50,000 ADA, allocated to ensure each development stage is fully resourced and tied to measurable outputs.
Project Setup & Dataset Curation – 10,000 ADA covers initial repository creation under Apache 2.0 license, dataset collection of Plutus smart contract snippets, and definition of prompt templates.
Benchmarking Framework Architecture & LLM Integration – 10,000 ADA funds the design and documentation of the LLM Execution & Code Validation Pipeline, integration with both commercial LLM APIs and local models, and development of a structured prompt–response collection system.
Evaluation Engine & Preliminary Benchmarking – 10,000 ADA supports the implementation of rule-based correctness checks, syntax compliance validation, semantic similarity scoring, and blockchain-specific deployment tests, culminating in the first benchmarking report across multiple LLMs.
Security Assessment & Continuous Benchmarking Setup – 10,000 ADA enables integration of vulnerability detection into the evaluation pipeline and preparation of documentation for extending benchmarks to other Cardano smart contract languages.
Final Benchmark Report & Open-Source Release – 10,000 ADA delivers the complete open-source benchmarking framework, final comparative benchmark report, developer and user documentation, video tutorials, community presentation, and incorporation of community feedback into the final release.
How does the cost of the project represent value for the Cardano ecosystem?
This project aims to deliver lasting value without unnecessary cost. By creating an open-source benchmarking framework, we reduce duplication of effort and make future model evaluations more efficient. Developers gain actionable insights into LLM performance, saving time otherwise spent on trial-and-error. The framework also strengthens Cardano’s developer community through improved documentation and a shared knowledge base. Released under Apache 2.0, it can be maintained and extended at low cost, ensuring scalability and continued ecosystem benefit.
Terms and Conditions:
Yes
Jon Woodard, CEO
Jon Woodard is the CEO of Wolfram Blockchain Labs, where he coordinates the decentralized projects that connect the Wolfram Technology ecosystem to different DLT ecosystems. Previously, at Wolfram Research, Jon worked on projects at the direction of CEO Stephen Wolfram, and prior to that he was a member of the team that worked on monetization strategies and execution for Wolfram|Alpha. Jon has a background in economics and computational neuroscience. He enjoys cycling in his spare time.
Steph Macurdy, Head of Research and Education
Steph Macurdy has a background in economics, with a focus on complex systems. He attended the Real World Risk Institute in 2019, led by Nassim Taleb, and has been investing in the crypto asset space since 2015. He previously worked for Tesla as an energy advisor and Cambridge Associates as an investment analyst. Steph is a youth soccer coach in the Philadelphia area and is interested in permaculture.
Gaurav Vishal, Manager
Gaurav Vishal is a Manager in Wolfram Research’s Technical Consulting team with over 6 years of experience in designing and delivering computational applications and enterprise AI solutions. He specializes in full-stack development, data analysis, and machine learning, with expertise in microservices architecture and distributed data systems. Since joining Wolfram in 2019, he has led projects ranging from AI-powered tutoring platforms and large-scale data integration pipelines to secure on-premise analytics systems for enterprise and government clients.
Gaurav holds a B.Tech degree from IIT Bhubaneswar, where he earned the Institute’s Silver Medal for academic excellence. He also received the TSIL Research Partner Award for his research on heat transfer in coal-fired Sponge Iron Rotary Kilns. Known for his focus on computational efficiency, system reliability, and client satisfaction, Gaurav has contributed to high-impact technical solutions for Fortune 500 companies, academic institutions, and public sector organizations.
Subesh Sonthalia, Application Developer
Subesh Sonthalia is an Application Developer at Wolfram, specializing in scalable software architecture and AI-driven systems. With a Master’s in Mechanical Engineering focused on Microfabrication and Simulation Optimization, he combines engineering precision with computational innovation to build solutions that reflect Wolfram’s vision. Outside of work, he enjoys adventure sports.
Sanjeet Patra, Application Developer
Sanjeet Patra is an Application Developer at Wolfram Technology Consulting, building products aligned with Wolfram’s mission and technological vision. He previously worked as an internal combustion engine engineer, gaining deep expertise in automotive systems. His career journey reflects a versatile skill set, spanning core mechanical engineering to modern software development. Outside of work, he enjoys playing cricket and soccer.
Gabriela Guerra Galan, Project Manager
Gabriela has 15+ years of experience leading projects. She is a certified PMP and Product Owner with a bachelor's degree in Mechatronics Engineering, complemented by a master's degree in Automotive Engineering. As the co-founder of Bloinx, a startup that secured funding from the UNICEF Innovation Fund, she has demonstrated a passion for driving innovation and social impact.