This project exports Cardano’s blockchain data to BigQuery and makes it available to the public for free: trustless data usage, scalable queries, stunning dashboards within everyone's reach.
We export all data from Db-Sync to BigQuery and back validate it to create a proof of data authenticity. All code and data, including metadata, are open source and openly available.
This is the total amount allocated to Cardano on BigQuery: scalably querying Cardano’s authenticated blockchain data on BigQuery.
Thomas Kaliakos (@thomaska)
Bitseat Tadesse (@bitseatt)
No dependencies.
Project will be fully open source.
This project addresses the challenge of providing free and open access to Cardano’s blockchain data, which grow constantly and demand ever more infrastructure to serve. Because the network has agreed on the finality of the on-chain data, these data can be shared with everybody in their most practical and cheapest form: trustless. Our back validation procedure builds a proof of authenticity for the data exported to BigQuery.
This project might have a great impact on the Cardano community as it provides an accessible and affordable data basis to launch DApps, web3, NFT, data analytics, and other projects with ease.
Earlier stages of this project originated in 2022 within IOG’s Data Analytics team, and we know of a number of projects in the Cardano community that already build on Cardano_on_BigQuery.
To secure the long-term stability of the project, we ask for funding for the transition to a community-owned project that is completely open source.
The more users are involved, the greater the potential for future development. The more shoulders that carry the burden of maintaining Cardano_on_BigQuery, the lower the costs.
We believe that this project enables small and mid-sized projects on the Cardano blockchain to implement their data layers with scalability at minimal, usage-related costs.
We will measure the success of this project by the rising number of great projects developing on Cardano. This project enables DApps, web3, transaction explorers, on-chain analytics, and other kinds of adoption-related work.
We are looking for testimonials from users of the BigQuery dataset, possibly through our Twitter account: @CardanoBigQuery
There are no metrics available to determine the number of users connecting to the public BigQuery dataset. We can, however, get some access statistics from the back validation dashboard. We also hope to receive great contributions in the GitHub repository.
We are open source from day one: all development and documentation happen in our public GitHub repository.
All data in BigQuery is publicly available.
All metadata and the results of the back validation are publicly accessible, as is the dashboard summarising the results.
We have more than two years of experience with our in-house Db-Sync Enterprise protocol, which minimises downtime of a Db-Sync pipeline.
The export of the Cardano blockchain data to BigQuery ran without interruption for ten consecutive months, until June 2023.
The project team covers all aspects required to achieve the stated goals.
G1: Continuous update of the BigQuery dataset with new block data from Cardano’s chain
Feasibility: back validation as a proof of data authenticity
G2: Open source the code and document the processes
Feasibility: GitHub repo with wiki
G3: Communication with users of the data and the Cardano community
Feasibility: feedback from the community, collaboration with users
M1: create infrastructure for the project (1 month, in progress)
M2: setup of the export and its continuous processing (1 month)
M3: deep comparison to create authoritative data equivalence with Cardano’s blockchain (1 month)
M4: fully document the update process and the data quality monitoring (1 month)
D1: BigQuery dataset of Cardano blockchain data publicly available
D2: Open source code repository containing all code of the update and monitoring processes
D3: Metadata and dashboard of back validation publicly available
D4: complete documentation
D5: continuous communication with the community on the update of the BigQuery dataset
B1: 50 PD for architecture, design, project management, documentation, communication
B1.1: 10 PD architecture & design
B1.2: 10 PD project management
B1.3: 10 PD documentation
B1.4: 20 PD communication
B2: 20 PD for systems engineering (devops)
B2.1: 10 PD infrastructure setup (redundant hardware, high-availability)
B2.2: 10 PD process monitoring, alerting, mitigation procedures
B3: 55 PD for data engineering
B3.1: 5 PD PostgreSQL optimisations
B3.2: 10 PD BigQuery maintenance
B3.3: 20 PD Update process
B3.4: 20 PD Deep comparison process (back validation)
B4: Infrastructure costs
B4.1: $680 per month for redundant server hardware
(PD = person day; 8 hrs/day; 1 hr = $90)
Sum PD = 125 person days
At a rate of $90/hr and 8 hrs/day: Sum budget PD = 125 x 8 x $90 = $90,000
12 months running costs: Hardware $680 x 12 = $8,160
Total Budget: $98,160
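The budget arithmetic above can be re-checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the figures are copied from budget lines B1 to B4.

```python
# Person days per work package, copied from B1, B2 and B3 above.
PERSON_DAYS = {
    "B1 architecture, design, project management, documentation, communication": 50,
    "B2 systems engineering (devops)": 20,
    "B3 data engineering": 55,
}
HOURS_PER_DAY = 8             # from the PD definition above
RATE_USD_PER_HOUR = 90        # from the PD definition above
HARDWARE_USD_PER_MONTH = 680  # from B4.1
MONTHS = 12


def budget_summary() -> dict:
    """Return person days, labour cost, hardware cost and total in USD."""
    total_pd = sum(PERSON_DAYS.values())
    labour_usd = total_pd * HOURS_PER_DAY * RATE_USD_PER_HOUR
    hardware_usd = HARDWARE_USD_PER_MONTH * MONTHS
    return {
        "person_days": total_pd,
        "labour_usd": labour_usd,
        "hardware_usd": hardware_usd,
        "total_usd": labour_usd + hardware_usd,
    }


if __name__ == "__main__":
    for key, value in budget_summary().items():
        print(f"{key}: {value}")
```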
Blockchain data are by definition identical for all participants of the network, so it makes sense to share these data in their most practical form, such that each participant can work with them independently. We believe that SQL is the most accessible way of querying data, and that everybody will find a way within their own setup to connect to and query the always-on BigQuery dataset.
Trustless data querying is enabled by our back validation, which proves that the data in BigQuery faithfully represent the on-chain data of the Cardano blockchain.
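The proposal does not spell out how the back validation works internally. As a rough illustration only, one common way to compare an export against its source is to hash a canonical form of every row on both sides and compare the digests. The sketch below assumes this approach; the function names and the sample rows are hypothetical and do not come from the project's code.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """SHA-256 over a canonical JSON form, so that key order does not matter."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def table_digest(rows: list) -> str:
    """Order-independent digest of a set of rows (row digests are sorted first)."""
    h = hashlib.sha256()
    for digest in sorted(row_digest(r) for r in rows):
        h.update(digest.encode("ascii"))
    return h.hexdigest()


def back_validate(source_rows: list, exported_rows: list) -> dict:
    """Compare the source rows with the exported rows and report differences."""
    src = {row_digest(r) for r in source_rows}
    exp = {row_digest(r) for r in exported_rows}
    return {
        "match": table_digest(source_rows) == table_digest(exported_rows),
        "missing_in_export": len(src - exp),
        "unexpected_in_export": len(exp - src),
    }


if __name__ == "__main__":
    # Placeholder rows, not real chain data.
    source = [{"block_no": 1, "hash": "aa"}, {"block_no": 2, "hash": "bb"}]
    export = [{"hash": "bb", "block_no": 2}, {"hash": "aa", "block_no": 1}]
    print(back_validate(source, export))
```

Hashing a canonical form keeps the comparison independent of row and column order, which tends to differ between a PostgreSQL source and a BigQuery export.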
Running the complete Db-Sync pipeline costs several hundred dollars per month. BigQuery, on the other hand, offers a free monthly quota of 1 TB of queried data and usually incurs no costs if used sparingly.
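To judge whether a workload stays within the free quota mentioned above, the scanned volume can be estimated up front. A minimal sketch, assuming the quota is counted as 10**12 bytes (1 TB, decimal) per month and that every query scans the same volume; the function name and the example workload are illustrative.

```python
# Assumption: the free quota of 1 TB is counted as 10**12 bytes per month.
FREE_BYTES_PER_MONTH = 10**12


def quota_usage(bytes_per_query: int, queries_per_month: int,
                free_bytes: int = FREE_BYTES_PER_MONTH) -> dict:
    """Estimate how much of the monthly free quota a workload consumes."""
    scanned = bytes_per_query * queries_per_month
    return {
        "scanned_bytes": scanned,
        # Share of the free quota that is used up, capped at 1.0.
        "free_quota_used": min(scanned, free_bytes) / free_bytes,
        # Volume beyond the free quota, which would be charged.
        "billable_bytes": max(0, scanned - free_bytes),
    }


if __name__ == "__main__":
    # Hypothetical workload: 200 queries per month, 2 GB scanned per query.
    print(quota_usage(2 * 10**9, 200))
```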
Alexander Diemand (@cardanobigquery): architecture & design, project management, communication, documentation
Thomas Kaliakos (@thomaska): data engineering, data quality responsibility, documentation
Bitseat Tadesse (@bitseatt): data science, social networks, documentation