Free and open access to Cardano's blockchain data. Anybody with basic SQL knowledge can retrieve information about specific pools, addresses, and transactions quickly and free of charge.
Cardano on BigQuery: open, fast and easy querying of on-chain data.
We have already built a PoC that exports Cardano on-chain data to BigQuery using db-sync. We will make this PoC a reliable, production-ready solution and take it to the next level.
This project will depend on data provided by Cardano Db-sync (https://github.com/IntersectMBO/cardano-db-sync)
All source code for the update and monitoring processes will be openly available under the Apache v2.0 license. Additionally, our documentation will be open source, featuring comprehensive guides on data querying, along with practical examples and code snippets for connecting to BigQuery. We will also make the results of our data validation publicly accessible, ensuring transparency and reliability.
By offering Cardano's blockchain data for free in the cloud, we open access to this information for less privileged people. At the same time we improve sustainability, as the resources that would otherwise be needed to access on-chain data individually are now shared.
SDG Goals
9 - Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation
10 - Reduce inequality within and among countries
SDG Subgoals
9.1 - Develop quality, reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all
9.3 - Increase the access of small-scale industrial and other enterprises, in particular in developing countries, to financial services, including affordable credit, and their integration into value chains and markets
9.c - Significantly increase access to information and communications technology and strive to provide universal and affordable access to the Internet in least developed countries by 2020
Key Performance Indicator (KPI)
9.c.1 - Proportion of population covered by a mobile network, by technology
9.3.1 - Proportion of small-scale industries in total industry value added
Currently there is no easy way to access and query the information on the Cardano blockchain.
The existing solutions are:
We want to solve this problem by offering a way that is easy (you only need to set up a Google Cloud project), free (the first terabyte of queried data per month is free: https://cloud.google.com/bigquery/pricing) and versatile (using SQL you can run custom queries, extracting insights from the data).
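To illustrate the kind of custom querying this enables, here is a hedged sketch: the dataset, table, and column names in the SQL string are hypothetical (not the actual PoC schema), and the helper estimates on-demand query cost under the free tier described above, assuming BigQuery's published per-TiB price.

```python
# Hypothetical example: the dataset/table/column names below are
# illustrative only, not the actual Cardano-on-BigQuery schema.
SAMPLE_QUERY = """
SELECT epoch_no, COUNT(*) AS tx_count
FROM `my-project.cardano.tx`          -- hypothetical table name
GROUP BY epoch_no
ORDER BY epoch_no DESC
LIMIT 10
"""

FREE_TIER_BYTES = 1 * 1024**4  # first 1 TiB of scanned data per month is free

def query_cost_usd(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Estimate on-demand cost after the monthly free tier (price is an assumption)."""
    billable = max(0, bytes_scanned - FREE_TIER_BYTES)
    return billable / 1024**4 * price_per_tib

# A query scanning 200 GiB stays entirely within the free tier:
print(query_cost_usd(200 * 1024**3))  # 0.0
```

In practice a user would submit `SAMPLE_QUERY` through the BigQuery console or a client library; the point is that ordinary SQL, within the free quota, is all that is needed.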
The current proof of concept already exports data from a db-sync pipeline (Cardano node + db-sync) to a dataset in Google BigQuery, exposing the on-chain data. However, it is limited in capability: updates are slow, there are discrepancies in the data, validation occurs only after an epoch ends, and not all on-chain data is included.
We would like to take this project to the next level: faster updates, guaranteed data accuracy, complete on-chain data, and insightful analytics on top of it.
This project originated in 2022 within IOG's Data Analytics team, and we know of a number of projects in the Cardano community that already build on Cardano_on_BigQuery.
To secure the long-term stability of the project, we ask for funding to transition it to a community-owned project that is completely open source.
The more users are involved, the greater the potential for future development; and the more shoulders carry the burden of maintaining Cardano_on_BigQuery, the lower the costs.
We believe that this project enables small and mid-sized projects on the Cardano blockchain to implement their data layers with scalability at minimal, usage-related cost.
Both team members have worked at IOHK (later known as IOG, the company that built Cardano) for a combined 8 years, on the Cardano node and on commercial blockchain solutions. A proof of concept has already been built.
M1: Redesign/Optimise data export (2 months)
The current version of the data export from db-sync to BigQuery runs in a slow, sequential manner. This milestone involves parallelising the data export, redesigning certain parts of the process, and improving the data schema.
The acceptance criterion for this milestone would be to double the speed at which the data is updated (from a data refresh every 30 minutes in the PoC version to every 15 minutes).
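A minimal sketch of the parallelisation idea, assuming the export can be split into independent per-table jobs (the `export_table` function and the table names are hypothetical stand-ins for the real extract-and-load step):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-table export job; in the real pipeline this would read
# rows from the db-sync Postgres database and load them into BigQuery.
def export_table(table: str) -> str:
    # ... extract rows for `table` and load them into the BigQuery dataset ...
    return f"{table}: done"

TABLES = ["tx", "tx_out", "block", "stake_address"]  # illustrative subset

# Run independent table exports concurrently instead of one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(export_table, TABLES))

print(results)
```

With I/O-bound export jobs, running them concurrently rather than sequentially is the kind of change that makes a halved refresh interval plausible.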
M2: Setup monitoring (1 month)
The whole system has many moving parts and various components that can fail (the db-sync process, the Cardano node process, CPU/memory utilization, the Postgres database).
This milestone involves setting up monitoring and alerting for all of these subsystems.
The acceptance criteria for this milestone would be:
M3: Continuous data validation (2 months)
In the PoC version the data is validated at the epoch boundary, which means we can only be sure of its validity after the epoch changes. This milestone involves redesigning the process so that we can guarantee the data is valid and accurate at all times.
This would be achieved by making the updates atomic and deterministic.
Acceptance criteria for this milestone would be the code in the GitHub repo, and a demonstration that an intentionally failed update has no side effects.
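One common pattern for atomic, side-effect-free updates is sketched below with an in-memory stand-in for the dataset. In BigQuery this could be realised, for example, by loading into a staging table and swapping it in only on success; the helper here is purely illustrative, not the project's actual implementation:

```python
import copy

def atomic_update(dataset: dict, compute_batch) -> dict:
    """Apply a batch of changes all-or-nothing: a failure leaves `dataset` untouched."""
    staging = copy.deepcopy(dataset)   # work on a staging copy, never the live data
    compute_batch(staging)             # may raise at any point during the batch
    return staging                     # "swap": only reached if the batch succeeded

live = {"blocks": 100, "txs": 5000}

def failing_batch(d):
    d["blocks"] += 1
    raise RuntimeError("db-sync rollback mid-update")

try:
    live = atomic_update(live, failing_batch)
except RuntimeError:
    pass

# The intentionally failed update left the live data untouched.
print(live)  # {'blocks': 100, 'txs': 5000}
```

This is exactly the behaviour the acceptance test above would exercise: trigger a failing update and verify the exposed data is unchanged.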
Final Milestone: Documentation - Analytics (1 month)
The final milestone output would be detailed documentation on how to use the data: how to connect to it, how to write optimised and cost-effective queries, and a full data dictionary, as well as analytics built on top of the data.
The deliverables would be documentation on GitHub Pages, example queries, code snippets showing how to use the data, as well as Data Studio dashboards and BigQuery views offering interesting insights into the data.
Alexander Diemand
Thomas Kaliakos
Cost breakdown assumptions:
M1:
4 man-months -> 4500 * 4 = $18,000
2 months of cloud infra -> 1500 * 2 = $3,000
M1 cost: $21,000
M2:
2 man-months -> 4500 * 2 = $9,000
1 month of cloud infra -> 1500 * 1 = $1,500
M2 cost: $10,500
M3:
4 man-months -> 4500 * 4 = $18,000
2 months of cloud infra -> 1500 * 2 = $3,000
M3 cost: $21,000
Final milestone:
2 man-months -> 4500 * 2 = $9,000
1 month of cloud infra -> 1500 * 1 = $1,500
Final Milestone cost: $10,500
Total cost:
$63,000 -> at 1 ADA (₳) = $0.40 -> ₳157,500
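The cost breakdown above can be checked with a short calculation (the per-month rates and the assumed ADA price are taken directly from the figures stated above):

```python
RATE = 4500       # USD per man-month
INFRA = 1500      # USD per month of cloud infrastructure
ADA_USD = 0.40    # assumed ADA price used in the proposal

milestones = {                 # (man-months, months of cloud infra)
    "M1": (4, 2),
    "M2": (2, 1),
    "M3": (4, 2),
    "Final": (2, 1),
}

costs = {m: mm * RATE + im * INFRA for m, (mm, im) in milestones.items()}
total_usd = sum(costs.values())

print(costs)                # {'M1': 21000, 'M2': 10500, 'M3': 21000, 'Final': 10500}
print(total_usd)            # 63000
print(total_usd / ADA_USD)  # 157500.0
```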
We aim to empower everyone, from individual Cardano users with basic SQL skills to companies with whole teams of data analysts, to gain information and insights by querying the on-chain data.
We aspire to share the public blockchain data in its most practical form, such that each participant can work with it independently. We believe that SQL is the most accessible way of querying data, and that everybody will find a way in their own setup to connect to and query the BigQuery dataset, which is always available, updated in a timely manner, and accurate.
Running the complete db-sync pipeline costs several hundred dollars per month in infrastructure and maintenance. BigQuery, on the other hand, offers a free monthly quota of 1 TB of queried data and usually incurs no cost when used sparingly.