Querying Cardano data today is either restrictive (when relying on online data providers) or resource-intensive and complex to set up (when using db-sync/cardano-cli), which hinders efficient blockchain analysis.
This is the total amount allocated to Cardano on BigQuery: scalably querying Cardano’s authenticated blockchain data on BigQuery.
This project addresses the challenge of providing free and open data access to Cardano’s blockchain. We export all on-chain data to BigQuery and back-validate it, creating a proof of data authenticity.
No dependencies.
All source code for the update and monitoring processes will be openly available under the Apache v2.0 license. Additionally, our documentation will be open source, featuring comprehensive guides on data querying, along with practical examples and code snippets for connecting to BigQuery. We will also make the results of our data validation publicly accessible, ensuring transparency and reliability.
This project addresses the challenge of providing free and open data access to Cardano’s blockchain, which grows constantly and demands ever more resources to process. Because the network has agreed on the finalisation of the on-chain data, these data can be shared with everybody in their most practical and cheapest form, trustlessly. Our back-validation procedure builds a proof of authenticity for the data exported to BigQuery.
This project will have a positive impact on the Cardano community, as it provides an accessible and affordable data foundation for launching DApps, web3, NFT, transaction explorers, on-chain analytics, and other types of data-driven projects with ease.
It can also empower any member of the Cardano community with basic SQL skills to interrogate the Cardano blockchain for specific information of interest.
We have more than two years of experience with our self-developed Db-Sync Enterprise protocol, which minimises the downtime of a Db-Sync pipeline.
The export of the Cardano blockchain data to BigQuery has been running without interruption for ten consecutive months, up to June 2023.
The project team covers all aspects required to achieve the stated goals.
M1: create infrastructure for the project (1 month)
This process entails setting up the servers to run the Cardano node, db-sync, and a Postgres database. The setup includes configuring a fully functional db-sync pipeline, including the mentioned components. The criteria for acceptance will focus on ensuring this pipeline runs smoothly, testing failover capabilities, and implementing comprehensive monitoring.
M2: setup of the export and its continuous processing (1 month)
This milestone involves setting up the export process from db-sync to BigQuery.
The export is split into 2 separate processes: exporting data at the end of every epoch and exporting data every 30 minutes.
The acceptance criteria will be testing and validating that both the continuous update and the end-of-epoch update successfully export the data to BigQuery.
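The split into a 30-minute export and an end-of-epoch export can be sketched as a small scheduling check. This is only an illustration under stated assumptions, not the project's implementation: `ExportState`, `due_exports`, and the job labels are hypothetical names, and the rule "an epoch is complete once the chain tip has moved into a later epoch" is an assumption about how the end of an epoch would be detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Interval of the continuous export, taken from the milestone description.
CONTINUOUS_INTERVAL = timedelta(minutes=30)


@dataclass
class ExportState:
    """Hypothetical bookkeeping record for the two export processes."""
    last_continuous_run: datetime
    last_epoch_exported: int  # highest epoch whose end-of-epoch export is done


def due_exports(state: ExportState, now: datetime, tip_epoch: int) -> list[str]:
    """Return the export jobs that should run at time `now`."""
    jobs = []
    if now - state.last_continuous_run >= CONTINUOUS_INTERVAL:
        jobs.append("continuous")
    # Assumption: an epoch is complete once the tip is in a later epoch,
    # so every epoch strictly below `tip_epoch` is ready for export.
    for epoch in range(state.last_epoch_exported + 1, tip_epoch):
        jobs.append(f"end-of-epoch:{epoch}")
    return jobs


# Example values are made up for illustration.
state = ExportState(
    last_continuous_run=datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc),
    last_epoch_exported=414,
)
now = datetime(2023, 6, 1, 12, 30, tzinfo=timezone.utc)
print(due_exports(state, now, tip_epoch=416))
```

Keeping the decision in a pure function makes it easy to test both paths separately, which matches the acceptance criterion of validating each update process.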
M3: deep comparison to create authoritative data equivalence with Cardano’s blockchain (1 month)
This milestone involves setting up the deep comparison process: we will use hashing to compare the data exported to BigQuery with the data in db-sync, creating a proof of data authenticity.
The acceptance criteria are that the deep comparison process runs successfully for all past epochs and that all the exported data in BigQuery is validated.
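A hash-based comparison of this kind can be sketched as follows. This is a minimal illustration, not the project's actual procedure: the canonical row encoding, the choice of SHA-256, and the names `row_digest`, `epoch_digest`, and `compare_epoch` are all assumptions. In practice the rows would be fetched per epoch from db-sync and from BigQuery; here they are plain tuples.

```python
import hashlib
from typing import Iterable, Sequence

FIELD_SEP = "\x1f"   # assumed separator; values must not contain it
NULL_MARKER = "\\N"  # assumed text form for NULL values


def row_digest(row: Sequence[object]) -> bytes:
    """Hash one row after converting it to an assumed canonical text form."""
    canonical = FIELD_SEP.join(NULL_MARKER if v is None else str(v) for v in row)
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def epoch_digest(rows: Iterable[Sequence[object]]) -> str:
    """Digest of a set of rows that does not depend on retrieval order."""
    h = hashlib.sha256()
    # Sorting the per-row hashes removes any dependence on row order.
    for digest in sorted(row_digest(r) for r in rows):
        h.update(digest)
    return h.hexdigest()


def compare_epoch(source_rows, exported_rows) -> bool:
    """True when both sides hold the same multiset of rows."""
    return epoch_digest(source_rows) == epoch_digest(exported_rows)


# Made-up rows standing in for one epoch of data from each system.
db_sync_rows = [(1, "abc", 100), (2, "def", None)]
bigquery_rows = [(2, "def", None), (1, "abc", 100)]
print(compare_epoch(db_sync_rows, bigquery_rows))
```

Comparing one digest per epoch keeps the amount of data exchanged between the two systems small, and a mismatch narrows the search to a single epoch.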
Final Milestone: fully document the update process and the data quality monitoring (1 month)
This milestone involves creating extensive and descriptive documentation of the schema, the export process as well as the monitoring process.
The milestone output will be:
Alexander Diemand (
Thomas Kaliakos (
Bitseat Tadesse (
B1: 50 PD for architecture, design, project management, documentation, communication
B1.1: 10 PD architecture & design
B1.2: 10 PD project management
B1.3: 10 PD documentation
B1.4: 20 PD communication
B2: 20 PD for systems engineering (devops)
B2.1: 10 PD infrastructure setup (redundant hardware, high-availability)
B2.2: 10 PD process monitoring, alerting, mitigation procedures
B3: 55 PD for data engineering
B3.1: 5 PD PostgreSQL optimisations
B3.2: 10 PD BigQuery maintenance
B3.3: 20 PD Update process
B3.4: 20 PD Deep comparison process (back validation)
B4: Infrastructure costs
B4.1: $680 per month for redundant server hardware
(PD = person day; 8 hrs/day; 1 hr = $90)
Sum PD = 125 person days
At rate $90/h, 8 hrs/day: Sum budget PD = $90,000
12 months running costs: Hardware $680 x 12 = $8,160
Total Budget: $98,160
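The budget arithmetic above can be reproduced with a few lines of Python. All figures are taken from the budget items B1 to B4; only the variable names are introduced here.

```python
HOURLY_RATE_USD = 90
HOURS_PER_DAY = 8
HARDWARE_USD_PER_MONTH = 680
MONTHS = 12

# Person days per work package (B1 to B3).
person_days = {"B1": 50, "B2": 20, "B3": 55}

total_pd = sum(person_days.values())
labour_usd = total_pd * HOURS_PER_DAY * HOURLY_RATE_USD
hardware_usd = HARDWARE_USD_PER_MONTH * MONTHS
total_usd = labour_usd + hardware_usd

print(f"Person days: {total_pd}")
print(f"Labour:      ${labour_usd:,}")
print(f"Hardware:    ${hardware_usd:,}")
print(f"Total:       ${total_usd:,}")
```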
Blockchain data is, by definition, identical for all participants of the network. It therefore makes sense to share this data in its most practical form so that each participant can work with it independently. We believe that SQL is the most accessible way of querying data, and that everybody will find a way to connect from their own setup and query the BigQuery dataset, which is always available.
Trustless data querying is enabled by our back validation, which proves that the data in BigQuery faithfully represents the on-chain Cardano blockchain.
Running the complete Db-Sync pipeline costs several hundred dollars per month. BigQuery, on the other hand, offers a free monthly quota of 1 TB of queried data and usually incurs no costs if used sparingly.
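To stay within the free quota, a user can ask BigQuery how many bytes a query would scan before running it. The sketch below uses the dry-run option of the `google-cloud-bigquery` Python client. It assumes that the package is installed and that credentials are configured, and it treats the 1 TB quota as 1 TiB. The helper names are hypothetical, and the table name in the usage note is a placeholder, not the project's actual dataset.

```python
FREE_QUOTA_BYTES = 1024 ** 4  # assumption: the 1 TB monthly quota taken as 1 TiB


def quota_fraction(bytes_processed: int, quota_bytes: int = FREE_QUOTA_BYTES) -> float:
    """Share of the monthly free quota that a query of this size would use."""
    return bytes_processed / quota_bytes


def estimate_query_bytes(sql: str) -> int:
    """Dry-run a query: BigQuery reports the bytes it would scan without executing it."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses the default project and credentials
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

A possible use is to call `estimate_query_bytes` with a query such as "SELECT epoch_no, COUNT(*) FROM `your-project.your_dataset.block` GROUP BY epoch_no" (placeholder table and column names) and pass the result to `quota_fraction` to see how much of the month's allowance the query would consume.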