## Problem
Collection and democratization of data in the Cardano ecosystem will be critical for feeding new ideas and measuring critical project KPIs.
## Solution
Cardano ETL will support transformation of Cardano blockchain data into convenient formats like CSV, JSON Newline, GCP PubSub, and relational databases.
Initially we will support exporting to CSV, JSON Newline Delimited, and GCP PubSub. We will also operate a passive stake pool that streams blockchain data in real time, via a PubSub/Dataflow/BigQuery pipeline, into a public BigQuery dataset for everyone to use.
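As a rough sketch of what the JSON Newline Delimited output looks like, here is a minimal exporter in Python. The field names (`block_hash`, `slot`, `tx_count`) are illustrative only, not the final Cardano ETL schema, and a real run would read blocks from a node rather than a hard-coded list:

```python
import json
import os
import tempfile

# Hypothetical block records; the real exporter would decode these from
# the chain (field names here are illustrative, not the final schema).
blocks = [
    {"block_hash": "abc123", "slot": 4492800, "tx_count": 3},
    {"block_hash": "def456", "slot": 4492820, "tx_count": 1},
]

def export_jsonl(records, path):
    """Write one JSON object per line (JSON Newline Delimited)."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record, sort_keys=True) + "\n")

path = os.path.join(tempfile.gettempdir(), "blocks.jsonl")
export_jsonl(blocks, path)

# Read it back: each line is an independent JSON document, which is why
# this format loads cleanly into BigQuery, Athena, jq, etc.
with open(path) as f:
    lines = f.read().splitlines()
```

The same records could be serialized once and fanned out to CSV or a PubSub topic; JSON Newline is just the simplest target to show.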
We will expand into additional convenient formats in later phases and milestones, likely depending on the success of this Fund3 effort (both the technology and the vlog lessons/AMAs/deep dives).
If you want instant Cardano data and would rather NOT export the blockchain yourself using `cardanoetl`, please check out the quickstart below for our realtime public BigQuery data!
Proposal Details (Github): https://github.com/floydcraft/cardano-etl
Proposal Overview (YouTube): https://www.youtube.com/watch?v=QeFCzwNBR5U
Proposal BigQuery use case Examples (YouTube): https://youtu.be/0LtND_PDfQU
## Target Impact
## Auditability
## Feasibility
### Approach
Well, great news! IOHK already has a model for how to sync the blockchain to a SQL database: https://github.com/input-output-hk/cardano-db-sync.
Initially I'll work quickly to see whether a Haskell-only solution works, much like the one the cardano-db-sync repo currently uses. I suspect it will not scale to many target databases/formats or allow for a clean, useful implementation, but I need to follow up on this as my first action item.
So, in case I can't use pure Haskell, I'll use the serialization lib to load the data from disk into Python, which will allow the Cardano ETL project to target just about any destination (BigQuery, Athena, JSON, …). This is more or less the Ethereum ETL approach: https://github.com/blockchain-etl/ethereum-etl. This might require work to make the serialization lib available in Python (TODO).
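The appeal of the Python route is a small core that decodes blocks once and fans them out to pluggable targets. A minimal sketch of that shape follows; the class and method names are my own illustration, not Ethereum ETL's or Cardano ETL's actual interfaces:

```python
import json

class JsonLinesTarget:
    """Collects records as JSON Newline Delimited strings."""
    def __init__(self):
        self.lines = []

    def write(self, record):
        self.lines.append(json.dumps(record, sort_keys=True))

class RowTarget:
    """Collects records as tuples, the way a relational loader might."""
    def __init__(self, columns):
        self.columns = columns
        self.rows = []

    def write(self, record):
        self.rows.append(tuple(record[c] for c in self.columns))

def export(records, targets):
    # One pass over the decoded blocks, fanned out to every target.
    # Adding a new destination (BigQuery, Athena, ...) means adding
    # one more class with a write() method, not touching the core.
    for record in records:
        for target in targets:
            target.write(record)

blocks = [
    {"block_hash": "abc", "slot": 1},
    {"block_hash": "def", "slot": 2},
]
jsonl = JsonLinesTarget()
rows = RowTarget(columns=["block_hash", "slot"])
export(blocks, [jsonl, rows])
```

This is the property that makes the Python fallback attractive: the decode step is written once, and each output format is an independent, testable target.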
Worst case, I'll end up needing to improve or add features to the serialization lib to enable the second approach.
All of this will be open source and I welcome contributions / feedback along the way.
One note on the Cardano ETL CLI: the idea is that it supports exporting to all likely formats/streams, but it could be that a limited set is supported via a Haskell CLI and the full set via a Python CLI (e.g. PubSub streaming).
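To make the CLI idea concrete, here is one possible shape for the Python side using `argparse`. The subcommand and flag names (`export_blocks`, `--start-slot`, `--output-format`) are hypothetical placeholders, not a committed interface:

```python
import argparse

def build_parser():
    """A sketch of a `cardanoetl` CLI with one export subcommand.

    Subcommand and flag names are illustrative only.
    """
    parser = argparse.ArgumentParser(prog="cardanoetl")
    sub = parser.add_subparsers(dest="command", required=True)

    export = sub.add_parser("export_blocks",
                            help="export a slot range to a sink")
    export.add_argument("--start-slot", type=int, required=True)
    export.add_argument("--end-slot", type=int, required=True)
    # The Python CLI could expose the full target set, including
    # streaming sinks like PubSub that a Haskell CLI might omit at first.
    export.add_argument("--output-format",
                        choices=["csv", "jsonl", "pubsub"],
                        default="jsonl")
    return parser

args = build_parser().parse_args(
    ["export_blocks", "--start-slot", "0", "--end-slot", "100",
     "--output-format", "pubsub"])
```

Keeping the two CLIs flag-compatible for the shared subset would let users switch between them without rewriting scripts.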
### Applicable Skills
### Estimates
### Resourcing
## Future Funding (for both operations and development)
I've been working in the mobile gaming space for 8 years, creating client and server architectures at petabyte scale with both GCP and AWS.