Free and open access to Cardano's blockchain data. Anybody with basic SQL knowledge can retrieve information about specific pools, addresses, and transactions quickly and free of charge.
Cardano on BigQuery: open, fast and easy querying of on-chain data.
We have already built a PoC that exports Cardano on-chain data to BigQuery using db-sync. We will make this PoC a reliable, production-ready solution and take it to the next level.
This project will depend on data provided by Cardano Db-sync (https://github.com/IntersectMBO/cardano-db-sync)
All source code for the update and monitoring processes will be openly available under the Apache v2.0 license. Additionally, our documentation will be open source, featuring comprehensive guides on data querying, along with practical examples and code snippets for connecting to BigQuery. We will also make the results of our data validation publicly accessible, ensuring transparency and reliability.
By offering Cardano's blockchain data for free in the cloud, we open access to this information for less privileged people. At the same time we improve sustainability, as the resources that would otherwise be needed to access on-chain data individually are now shared.
SDG Goals
9 - Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation
10 - Reduce inequality within and among countries
SDG Subgoals
9.1 - Develop quality, reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure, to support economic development and human well-being, with a focus on affordable and equitable access for all
9.3 - Increase the access of small-scale industrial and other enterprises, in particular in developing countries, to financial services, including affordable credit, and their integration into value chains and markets
9.c - Significantly increase access to information and communications technology and strive to provide universal and affordable access to the Internet in least developed countries by 2020
Key Performance Indicator (KPI)
9.c.1 - Proportion of population covered by a mobile network, by technology
9.3.1 - Proportion of small-scale industries in total industry value added
Currently there is no easy way to access and query the information on the Cardano blockchain.
The existing solutions are:
We want to solve this problem by offering a way that is easy (you only need to set up a Google Cloud project), free (the first terabyte of queried data per month is free: https://cloud.google.com/bigquery/pricing) and versatile (using SQL you can run custom queries, extracting insights from the data).
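To illustrate the kind of custom querying this enables, here is a hedged sketch: the dataset, table, and column names in the SQL string are hypothetical (not the actual PoC schema), and the helper estimates on-demand query cost under the free tier described above, assuming BigQuery's published per-TiB price.

```python
# Hypothetical example: the dataset/table/column names below are
# illustrative only, not the actual Cardano-on-BigQuery schema.
SAMPLE_QUERY = """
SELECT epoch_no, COUNT(*) AS tx_count
FROM `my-project.cardano.tx`          -- hypothetical table name
GROUP BY epoch_no
ORDER BY epoch_no DESC
LIMIT 10
"""

FREE_TIER_BYTES = 1 * 1024**4  # first 1 TiB of scanned data per month is free

def query_cost_usd(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Estimate on-demand cost after the monthly free tier (price is an assumption)."""
    billable = max(0, bytes_scanned - FREE_TIER_BYTES)
    return billable / 1024**4 * price_per_tib

# A query scanning 200 GiB stays entirely within the free tier:
print(query_cost_usd(200 * 1024**3))  # 0.0
```

In practice a user would submit `SAMPLE_QUERY` through the BigQuery console or a client library; the point is that ordinary SQL, within the free quota, is all that is needed.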
The current proof of concept already exports data from a db-sync pipeline (Cardano node + db-sync) to a dataset in Google BigQuery, exposing the on-chain data. However, it is limited in capability: updates are slow, there are discrepancies in the data, validation occurs only after an epoch ends, and not all on-chain data is included.
We would like to take this project to the next level: faster updates, guaranteed data accuracy, complete on-chain data, and insightful analytics on top of it.
This project originated in 2022 within IOG's Data Analytics team, and we know of a number of projects in the Cardano community that already build on Cardano_on_BigQuery.
To secure the long-term stability of the project, we ask for funding to transition it to a community-owned project that is completely open source.
The more users are involved, the greater the potential for future development; and the more shoulders carry the burden of maintaining Cardano_on_BigQuery, the lower the costs.
We believe that this project enables small and mid-sized projects on the Cardano blockchain to implement their data layers with scalability at minimal, usage-related cost.
Both team members have worked at IOHK (later known as IOG, the company that built Cardano) for a combined 8 years, on the Cardano node and on commercial blockchain solutions. A proof of concept has already been built.
M1: Redesign/Optimise data export (2 months)
The current version of the data export from db-sync to BigQuery runs in a slow, sequential manner. This milestone involves parallelising the data export, redesigning certain parts of the process, and improving the data schema.
The acceptance criterion for this milestone would be to double the speed at which the data is updated (from a data refresh every 30 minutes in the PoC version to every 15 minutes).
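A minimal sketch of the parallelisation idea, assuming the export can be split into independent per-table jobs (the `export_table` function and the table names are hypothetical stand-ins for the real extract-and-load step):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-table export job; in the real pipeline this would read
# rows from the db-sync Postgres database and load them into BigQuery.
def export_table(table: str) -> str:
    # ... extract rows for `table` and load them into the BigQuery dataset ...
    return f"{table}: done"

TABLES = ["tx", "tx_out", "block", "stake_address"]  # illustrative subset

# Run independent table exports concurrently instead of one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(export_table, TABLES))

print(results)
```

With I/O-bound export jobs, running them concurrently rather than sequentially is the kind of change that makes a halved refresh interval plausible.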
M2: Setup monitoring (1 month)
The whole system has many moving parts and various components that can fail (the db-sync process, the Cardano node process, CPU/memory utilization, the Postgres database).
This milestone involves setting up monitoring and alerting for all of these subsystems.
The acceptance criteria for this milestone would be:
M3: Continuous data validation (2 months)
In the PoC version the data is validated at the epoch boundary, which means we can only be sure of its validity after the epoch changes. This milestone involves redesigning the process so that we can guarantee the data is valid and accurate at all times.
This would be achieved by making the updates atomic and deterministic.
Acceptance criteria for this milestone would be the code in the GitHub repo, and a demonstration that an intentionally failed update has no side effects.
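One common pattern for atomic, side-effect-free updates is sketched below with an in-memory stand-in for the dataset. In BigQuery this could be realised, for example, by loading into a staging table and swapping it in only on success; the helper here is purely illustrative, not the project's actual implementation:

```python
import copy

def atomic_update(dataset: dict, compute_batch) -> dict:
    """Apply a batch of changes all-or-nothing: a failure leaves `dataset` untouched."""
    staging = copy.deepcopy(dataset)   # work on a staging copy, never the live data
    compute_batch(staging)             # may raise at any point during the batch
    return staging                     # "swap": only reached if the batch succeeded

live = {"blocks": 100, "txs": 5000}

def failing_batch(d):
    d["blocks"] += 1
    raise RuntimeError("db-sync rollback mid-update")

try:
    live = atomic_update(live, failing_batch)
except RuntimeError:
    pass

# The intentionally failed update left the live data untouched.
print(live)  # {'blocks': 100, 'txs': 5000}
```

This is exactly the behaviour the acceptance test above would exercise: trigger a failing update and verify the exposed data is unchanged.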
Final Milestone: Documentation - Analytics (1 month)
The final milestone output would be detailed documentation on how to use the data: how to connect to it, how to write optimised and cost-effective queries, and a full data dictionary, as well as analytics built on top of the data.
The deliverables would be documentation on GitHub Pages, example queries, code snippets showing how to use the data, as well as Data Studio dashboards and BigQuery views offering interesting insights into the data.
Alexander Diemand
Thomas Kaliakos
Cost breakdown assumptions:
M1:
4 man-months -> 4500 * 4 = $18,000
2 months of cloud infra -> 1500 * 2 = $3,000
M1 cost: $21,000
M2:
2 man-months -> 4500 * 2 = $9,000
1 month of cloud infra -> 1500 * 1 = $1,500
M2 cost: $10,500
M3:
4 man-months -> 4500 * 4 = $18,000
2 months of cloud infra -> 1500 * 2 = $3,000
M3 cost: $21,000
Final milestone:
2 man-months -> 4500 * 2 = $9,000
1 month of cloud infra -> 1500 * 1 = $1,500
Final Milestone cost: $10,500
Total cost:
$63,000 -> at 1 ADA (₳) = $0.40 -> ₳157,500
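The cost breakdown above can be checked with a short calculation (the per-month rates and the assumed ADA price are taken directly from the figures stated above):

```python
RATE = 4500       # USD per man-month
INFRA = 1500      # USD per month of cloud infrastructure
ADA_USD = 0.40    # assumed ADA price used in the proposal

milestones = {                 # (man-months, months of cloud infra)
    "M1": (4, 2),
    "M2": (2, 1),
    "M3": (4, 2),
    "Final": (2, 1),
}

costs = {m: mm * RATE + im * INFRA for m, (mm, im) in milestones.items()}
total_usd = sum(costs.values())

print(costs)                # {'M1': 21000, 'M2': 10500, 'M3': 21000, 'Final': 10500}
print(total_usd)            # 63000
print(total_usd / ADA_USD)  # 157500.0
```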
We aim to empower everyone, from individual Cardano users with basic SQL skills to companies with whole teams of data analysts, to gain information and insights by querying the on-chain data.
We aspire to share the public blockchain data in its most practical form, such that each participant can work with it independently. We believe that SQL is the most accessible way of querying data, and that everybody will find a way in their own setup to connect to and query the BigQuery dataset, which is always available, updated in a timely manner, and accurate.
Running the complete db-sync pipeline costs several hundred dollars per month in infrastructure and maintenance. BigQuery, on the other hand, offers a free monthly quota of 1 TB of queried data and usually incurs no cost when used sparingly.