Every user hosting an independent cardano-db-sync instance repeats the same setup work. A centralized database will eliminate this redundancy and lower barriers to entry.
Load blockchain data into BigQuery to eliminate users' database setup process, lower costs, and enable more complex cross-chain queries.
Full-stack developers, CIOs, stake pool operators
Why a centralized database?
As the blockchain matures into later epochs, it will become increasingly costly for the community to maintain separate databases. We therefore aim to reduce user costs by using community funds to maintain a single centralized database.
How is this different?
We'll increase accessibility by reducing users' capital expenses, the knowledge required to set up cardano-db-sync, and the time required to sync with the blockchain. Unlike a local PostgreSQL instance, BigQuery can run complex analytical queries and aggregations at scale. This includes cross-chain analysis and determining the most active or popular tokens, contracts, NFT artists, or applications on the blockchain, giving the community data-driven insight into different areas of adoption.
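As a sketch of the kind of query we have in mind, the snippet below ranks native assets by how often they appear in transaction outputs. The dataset name (cardano_bq.chain) is a placeholder, and the tables are assumed to mirror cardano-db-sync's multi_asset and ma_tx_out schema.

```python
# Sketch of a "most active native assets" query. The dataset name
# "cardano_bq.chain" is a placeholder; tables are assumed to mirror the
# cardano-db-sync schema (multi_asset, ma_tx_out).
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
SELECT
  ma.policy,
  ma.name,
  COUNT(*) AS output_count  -- how often the asset appears in outputs
FROM `cardano_bq.chain.ma_tx_out` AS mto
JOIN `cardano_bq.chain.multi_asset` AS ma
  ON mto.ident = ma.id
GROUP BY ma.policy, ma.name
ORDER BY output_count DESC
LIMIT 20
"""

for row in client.query(QUERY).result():
    print(row.policy, row.name, row.output_count)
```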
We are currently running a proof of concept.
We run cardano-db-sync locally to fetch information from the chain. Data is extracted and inserted into a Google Cloud SQL database, which we access through its PostgreSQL interface. The database is updated with the latest rows every 24 hours, exported to CSV, and saved in Google Cloud Storage.
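A minimal sketch of this daily export step, assuming db-sync's default cexplorer database and a placeholder GCS bucket name (cardano-bq-staging):

```python
# Proof-of-concept export sketch: pull the last 24 hours of rows from a
# local cardano-db-sync PostgreSQL instance and stage them in GCS.
# The database and bucket names are placeholders, not final values.
import csv
import datetime

import psycopg2                    # PostgreSQL driver
from google.cloud import storage   # GCS client

EXPORT_DATE = datetime.date.today()
LOCAL_FILE = f"block_{EXPORT_DATE}.csv"

# 1. Fetch the rows added since the last daily export from the "block" table.
conn = psycopg2.connect(dbname="cexplorer", user="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT id, epoch_no, slot_no, time FROM block "
        "WHERE time >= NOW() - INTERVAL '24 hours'"
    )
    with open(LOCAL_FILE, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "epoch_no", "slot_no", "time"])
        writer.writerows(cur.fetchall())

# 2. Upload the CSV to Cloud Storage for the later BigQuery load.
bucket = storage.Client().bucket("cardano-bq-staging")
bucket.blob(f"exports/{LOCAL_FILE}").upload_from_filename(LOCAL_FILE)
```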
How will the funds be used?
In addition to the cost of maintaining the database, the funds will pay developers to write automation code that syncs the latest rows of the blockchain to Google Cloud Storage. To streamline and schedule this workflow, we will use the Google Cloud Composer service, which is built on Apache Airflow and more stable than our current ad-hoc process. Once the exported CSV files are loaded into BigQuery, users can query the data in the BigQuery console or via an API.
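A hedged sketch of what such a Composer DAG could look like. The operator shown (GCSToBigQueryOperator from Airflow's Google provider package) is standard Airflow; the bucket, dataset, and table names are placeholders:

```python
# Sketch of the Cloud Composer (Airflow) workflow described above.
# Bucket, dataset, and table names are placeholders, not final values.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="cardano_daily_sync",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",   # mirrors the 24-hour update cadence
    catchup=False,
) as dag:
    load_blocks = GCSToBigQueryOperator(
        task_id="load_blocks_csv",
        bucket="cardano-bq-staging",
        source_objects=["exports/block_{{ ds }}.csv"],
        destination_project_dataset_table="cardano_bq.chain.block",
        source_format="CSV",
        skip_leading_rows=1,          # skip the CSV header row
        write_disposition="WRITE_APPEND",
    )
```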
Our code is open-sourced.
Users may replicate our solution to optimize it or to vary the interval at which the latest data is uploaded to BigQuery. Our current configuration performs a daily extraction of data from the Cardano blockchain, including the results of smart contract transactions such as token transfers. The data is then de-normalized and stored in date-partitioned BigQuery tables for easy, cost-effective exploration.
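A sketch of the partitioned load under the same placeholder names, assuming the exported CSVs carry a time column to partition on:

```python
# Sketch of a date-partitioned BigQuery load. Dataset, table, and bucket
# names are placeholders; the "time" partition column is an assumption
# about the exported CSV schema.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # Partitioning by day keeps scans (and cost) limited to the dates a
    # query actually touches.
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="time",
    ),
)

load_job = client.load_table_from_uri(
    "gs://cardano-bq-staging/exports/block_*.csv",
    "cardano_bq.chain.block",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
```

Date partitioning is the main lever for keeping user queries cheap: a query filtered to a date range only scans, and is only billed for, the matching partitions.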
This solution has proven to be successful for other blockchains.
BTC and ETH already have their blockchains hosted on BigQuery, and users have been able to easily perform bulk operations on them. BigQuery also supports user-defined functions (UDFs), so it's easy to run scripts over an entire dataset in seconds. Here's an example of Tomasz Kolinko analyzing 1.2M ETH contracts in 20 seconds with BigQuery and a custom script: https://tinyurl.com/ethcontractsanalysis
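For illustration, the snippet below runs a temporary JavaScript UDF over BigQuery's existing public Ethereum dataset (bigquery-public-data.crypto_ethereum); the same pattern would apply to a hosted Cardano dataset:

```python
# A temporary JavaScript UDF applied to every row of the public Ethereum
# contracts table. The same pattern would work against a Cardano dataset
# once it is hosted on BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = r"""
CREATE TEMP FUNCTION bytecode_length(bytecode STRING)
RETURNS FLOAT64
LANGUAGE js AS '''
  // Strip the 0x prefix and count bytes (2 hex chars each).
  return bytecode ? (bytecode.length - 2) / 2 : 0;
''';

SELECT
  AVG(bytecode_length(bytecode)) AS avg_contract_bytes
FROM `bigquery-public-data.crypto_ethereum.contracts`
"""

for row in client.query(QUERY).result():
    print("Average contract size (bytes):", row.avg_contract_bytes)
```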
The queries are low cost, and zero setup is required by users.
Furthermore, users no longer need to host an independent cardano-db-sync instance to perform data analytics. They can sign up for BigQuery and access the database within minutes.
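In practice, "zero setup" reduces to installing the client library (pip install google-cloud-bigquery), authenticating, and running a query; the dataset name below is again a placeholder:

```python
# What "zero setup" looks like for an end user. The dataset name
# "cardano_bq.chain" is a placeholder for the hosted dataset.
from google.cloud import bigquery

client = bigquery.Client()
row = next(iter(client.query(
    "SELECT MAX(epoch_no) AS latest_epoch FROM `cardano_bq.chain.block`"
).result()))
print("Latest synced epoch:", row.latest_epoch)
```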
Our goal is to have a self-sustaining database that grows with the Cardano community.
The funds will maintain the database for two years, after which we will fund its continued operation by running a Cardano stake pool.
The project will take 3 months.
Key performance indicators:
Run the interesting queries from the cardano-db-sync documentation in BigQuery: https://tinyurl.com/interestingqueriescardano
Our current pool: https://armadacardano.io/
Details on projected costs and the schemas to load: https://tinyurl.com/bigquerycostprojections