Every user hosting an independent cardano-db-sync instance repeats the same setup work. A centralized database will eliminate this redundancy and lower barriers to entry.
Load blockchain data into BigQuery to eliminate users' database setup process, lower costs, and enable more complex cross-chain queries.
Full-stack developers, CIOs, stake pool operators
Why a centralized database?
As the blockchain matures into later epochs, it will become increasingly costly for the community to maintain separate databases. We therefore aim to reduce user costs by using community funds to maintain a single centralized database.
How is this different?
We'll increase accessibility by reducing users' capital expenses, the knowledge required to set up cardano-db-sync, and the time required to sync with the blockchain. Unlike a local PostgreSQL instance, BigQuery can run complex analytical queries and aggregations at scale. This includes cross-chain analysis and determining the most active or popular tokens, contracts, NFT artists, or applications on the blockchain, giving the community data-driven insight into different areas of adoption.
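As a sketch of the kind of query we have in mind, the snippet below ranks native assets by how often they appear in transaction outputs. The dataset name (cardano_bq.chain) is a placeholder, and the tables are assumed to mirror cardano-db-sync's multi_asset and ma_tx_out schema.

```python
# Sketch of a "most active native assets" query. The dataset name
# "cardano_bq.chain" is a placeholder; tables are assumed to mirror the
# cardano-db-sync schema (multi_asset, ma_tx_out).
from google.cloud import bigquery

client = bigquery.Client()

QUERY = """
SELECT
  ma.policy,
  ma.name,
  COUNT(*) AS output_count  -- how often the asset appears in outputs
FROM `cardano_bq.chain.ma_tx_out` AS mto
JOIN `cardano_bq.chain.multi_asset` AS ma
  ON mto.ident = ma.id
GROUP BY ma.policy, ma.name
ORDER BY output_count DESC
LIMIT 20
"""

for row in client.query(QUERY).result():
    print(row.policy, row.name, row.output_count)
```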
We are currently running a proof of concept.
We run cardano-db-sync locally to fetch information from the chain. Data is extracted and inserted into a Google Cloud SQL database, which we access through its PostgreSQL interface. The database is updated with the latest rows every 24 hours, exported to CSV, and saved in Google Cloud Storage.
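A minimal sketch of this daily export step, assuming db-sync's default cexplorer database and a placeholder GCS bucket name (cardano-bq-staging):

```python
# Proof-of-concept export sketch: pull the last 24 hours of rows from a
# local cardano-db-sync PostgreSQL instance and stage them in GCS.
# The database and bucket names are placeholders, not final values.
import csv
import datetime

import psycopg2                    # PostgreSQL driver
from google.cloud import storage   # GCS client

EXPORT_DATE = datetime.date.today()
LOCAL_FILE = f"block_{EXPORT_DATE}.csv"

# 1. Fetch the rows added since the last daily export from the "block" table.
conn = psycopg2.connect(dbname="cexplorer", user="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT id, epoch_no, slot_no, time FROM block "
        "WHERE time >= NOW() - INTERVAL '24 hours'"
    )
    with open(LOCAL_FILE, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "epoch_no", "slot_no", "time"])
        writer.writerows(cur.fetchall())

# 2. Upload the CSV to Cloud Storage for the later BigQuery load.
bucket = storage.Client().bucket("cardano-bq-staging")
bucket.blob(f"exports/{LOCAL_FILE}").upload_from_filename(LOCAL_FILE)
```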
How will the funds be used?
In addition to the cost of maintaining the database, the funds will pay developers to write automation code that syncs the latest rows of the blockchain to Google Cloud Storage. To streamline and schedule this workflow, we will use the Google Cloud Composer service, which is built on Apache Airflow and more stable than our current ad-hoc process. Once the exported CSV files are loaded into BigQuery, users can query the data in the BigQuery console or via an API.
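A hedged sketch of what such a Composer DAG could look like. The operator shown (GCSToBigQueryOperator from Airflow's Google provider package) is standard Airflow; the bucket, dataset, and table names are placeholders:

```python
# Sketch of the Cloud Composer (Airflow) workflow described above.
# Bucket, dataset, and table names are placeholders, not final values.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="cardano_daily_sync",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",   # mirrors the 24-hour update cadence
    catchup=False,
) as dag:
    load_blocks = GCSToBigQueryOperator(
        task_id="load_blocks_csv",
        bucket="cardano-bq-staging",
        source_objects=["exports/block_{{ ds }}.csv"],
        destination_project_dataset_table="cardano_bq.chain.block",
        source_format="CSV",
        skip_leading_rows=1,          # skip the CSV header row
        write_disposition="WRITE_APPEND",
    )
```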
Our code is open-sourced.
Users may replicate our solution to optimize it or to vary the interval at which the latest data is uploaded to BigQuery. Our current configuration performs a daily extraction of data from the Cardano blockchain, including the results of smart contract transactions such as token transfers. The data is then de-normalized and stored in date-partitioned BigQuery tables for easy, cost-effective exploration.
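A sketch of the partitioned load under the same placeholder names, assuming the exported CSVs carry a time column to partition on:

```python
# Sketch of a date-partitioned BigQuery load. Dataset, table, and bucket
# names are placeholders; the "time" partition column is an assumption
# about the exported CSV schema.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # Partitioning by day keeps scans (and cost) limited to the dates a
    # query actually touches.
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="time",
    ),
)

load_job = client.load_table_from_uri(
    "gs://cardano-bq-staging/exports/block_*.csv",
    "cardano_bq.chain.block",
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
```

Date partitioning is the main lever for keeping user queries cheap: a query filtered to a date range only scans, and is only billed for, the matching partitions.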
This solution has proven to be successful for other blockchains.
BTC and ETH already have their blockchains hosted on BigQuery, and users have been able to easily perform bulk operations on them. BigQuery also supports user-defined functions (UDFs), so it's easy to run scripts over an entire dataset in seconds. Here's an example of Tomasz Kolinko analyzing 1.2M ETH contracts in 20 seconds with BigQuery and a custom script: https://tinyurl.com/ethcontractsanalysis
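For illustration, the snippet below runs a temporary JavaScript UDF over BigQuery's existing public Ethereum dataset (bigquery-public-data.crypto_ethereum); the same pattern would apply to a hosted Cardano dataset:

```python
# A temporary JavaScript UDF applied to every row of the public Ethereum
# contracts table. The same pattern would work against a Cardano dataset
# once it is hosted on BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

QUERY = r"""
CREATE TEMP FUNCTION bytecode_length(bytecode STRING)
RETURNS FLOAT64
LANGUAGE js AS '''
  // Strip the 0x prefix and count bytes (2 hex chars each).
  return bytecode ? (bytecode.length - 2) / 2 : 0;
''';

SELECT
  AVG(bytecode_length(bytecode)) AS avg_contract_bytes
FROM `bigquery-public-data.crypto_ethereum.contracts`
"""

for row in client.query(QUERY).result():
    print("Average contract size (bytes):", row.avg_contract_bytes)
```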
The queries are low cost, and zero setup is required by users.
Furthermore, users no longer need to host an independent cardano-db-sync instance to perform data analytics. They can sign up for BigQuery and access the database within minutes.
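In practice, "zero setup" reduces to installing the client library (pip install google-cloud-bigquery), authenticating, and running a query; the dataset name below is again a placeholder:

```python
# What "zero setup" looks like for an end user. The dataset name
# "cardano_bq.chain" is a placeholder for the hosted dataset.
from google.cloud import bigquery

client = bigquery.Client()
row = next(iter(client.query(
    "SELECT MAX(epoch_no) AS latest_epoch FROM `cardano_bq.chain.block`"
).result()))
print("Latest synced epoch:", row.latest_epoch)
```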
Our goal is to have a self-sustaining database that grows with the Cardano community.
The funds will maintain the database for two years, after which we will fund its continued operation by running a Cardano stake pool.
The project will take 3 months.
Key performance indicators:
Run the interesting queries from the cardano-db-sync documentation in BigQuery: https://tinyurl.com/interestingqueriescardano
Our current pool: https://armadacardano.io/
Details on projected costs and the schemas to load: https://tinyurl.com/bigquerycostprojections