Data relating to the Cardano ecosystem is granular and scattered, making it difficult to access and use for analytics or machine learning.
A data hub for the Cardano ecosystem that makes historical and modelled datasets available through multiple access mechanisms.
This is the total amount allocated to Cardano Analytics Data Hub.
Problem Overview
Currently, data about the Cardano ecosystem is available but spread across multiple sources.
DBSync is the most detailed source of on-chain data, but it is neither easy nor cheap to run, and its data is highly normalized, so gaining insights from it often requires detailed knowledge of the database schema. Additionally, there are other sources of data which must be integrated to make the data as useful as possible. These include:
Additionally, there is a wealth of data in the transaction metadata which is specific to certain use cases, and is difficult to access without knowledge of how to query JSON data structures.
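To illustrate why nested transaction metadata is awkward to analyse directly, the following is a minimal sketch of flattening it into table-like rows. It assumes metadata shaped like the CIP-25 NFT convention (label 721, then policy ID, then asset name, then attributes). The sample values and the function name flatten_nft_metadata are hypothetical and are not part of any existing tool.

```python
import json

# Hypothetical sample in the shape of CIP-25 NFT metadata (label 721).
# The policy ID, asset names, and attribute values are made up for illustration.
SAMPLE_METADATA = json.loads("""
{
  "721": {
    "policy_abc": {
      "Token001": {"name": "Token #1", "image": "ipfs://example1"},
      "Token002": {"name": "Token #2", "image": "ipfs://example2"}
    }
  }
}
""")


def flatten_nft_metadata(metadata):
    """Turn nested label -> policy -> asset -> attributes into flat rows."""
    rows = []
    for label, policies in metadata.items():
        if not isinstance(policies, dict):
            continue  # skip labels that do not hold a nested object
        for policy_id, assets in policies.items():
            if not isinstance(assets, dict):
                continue  # skip non-object entries such as a version field
            for asset_name, attrs in assets.items():
                if not isinstance(attrs, dict):
                    continue
                rows.append({
                    "label": int(label),
                    "policy_id": policy_id,
                    "asset_name": asset_name,
                    "name": attrs.get("name"),
                    "image": attrs.get("image"),
                })
    return rows


rows = flatten_nft_metadata(SAMPLE_METADATA)
for row in rows:
    print(row)
```

Each row can then be written to a CSV file or loaded into a database table, which is the kind of analytics-ready shape that removes the need to query JSON structures directly.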
There are currently several excellent sites which offer pool-specific data, such as adapools.org and pooltool.io, as well as several block explorers. However, none of these sites provides full historical data, custom queries, or data which has been modelled specifically for analysis or machine learning use cases.
Solution
We propose to build the initial MVP of a community data hub which will provide consolidated analytics-ready data to the Cardano ecosystem.
At a minimum, data from DBSync and the other sources listed above will be available, modelled for various analytics activities. The DBSync data will have additional aggregated views such as the ones in the following repository: https://github.com/cardanocanuck/db-sync-queries
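As a rough illustration of what an aggregated view over normalized data can look like, the sketch below builds a per-epoch transaction summary. It is not taken from the repository above. It uses an in-memory SQLite database in place of the PostgreSQL database that DBSync populates, a heavily simplified two-table subset of the schema, made-up sample rows, and a hypothetical view name (epoch_tx_summary).

```python
import sqlite3

# In-memory SQLite stands in for the DBSync PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, illustrative subset of a normalized schema.
CREATE TABLE block (id INTEGER PRIMARY KEY, epoch_no INTEGER);
CREATE TABLE tx (
    id INTEGER PRIMARY KEY,
    block_id INTEGER REFERENCES block(id),
    fee INTEGER  -- fee in lovelace
);

-- Made-up sample data.
INSERT INTO block (id, epoch_no) VALUES (1, 300), (2, 300), (3, 301);
INSERT INTO tx (id, block_id, fee) VALUES
    (1, 1, 170000), (2, 1, 180000), (3, 2, 200000), (4, 3, 175000);

-- Hypothetical aggregated view: one row per epoch, no joins needed by the user.
CREATE VIEW epoch_tx_summary AS
SELECT b.epoch_no AS epoch_no,
       COUNT(t.id) AS tx_count,
       SUM(t.fee)  AS total_fee_lovelace
FROM tx t
JOIN block b ON b.id = t.block_id
GROUP BY b.epoch_no;
""")

summary = conn.execute(
    "SELECT epoch_no, tx_count, total_fee_lovelace "
    "FROM epoch_tx_summary ORDER BY epoch_no"
).fetchall()

for epoch_no, tx_count, total_fee in summary:
    print(epoch_no, tx_count, total_fee)
```

An analyst querying the view needs no knowledge of how blocks and transactions are linked, which is the main benefit of modelling the data ahead of time.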
Additionally, we will continue to add special-purpose datasets for various domains within the Cardano ecosystem. We have submitted, or will be submitting, several smaller proposals for specialized datasets to be modelled and developed, such as:
The initial MVP Data Hub will allow the download of CSV datasets. In the future, the range of sharing methods will be expanded.
Some of these sharing methods will be:
We will prioritize free community access methods, but some access methods such as direct database access or data sharing may be monetized with a subscription model. The purpose of monetizing premium aspects of the data hub is to fund future ongoing development and enhancement.
This proposal is for the core functionality and backend infrastructure development of this community hub.
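For a sense of how a CSV dataset downloaded from the MVP might be consumed, here is a minimal sketch using only the Python standard library. The file layout, column names, and values are assumptions made for illustration, since the actual datasets have not yet been defined.

```python
import csv
import io

# Hypothetical CSV export; the layout and column names are assumptions.
SAMPLE_CSV = """epoch_no,pool_ticker,blocks_minted,active_stake_ada
300,AAA,12,1500000
300,BBB,7,900000
301,AAA,10,1550000
"""


def blocks_per_pool(csv_text):
    """Sum blocks minted per pool ticker across all epochs in the file."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ticker = row["pool_ticker"]
        totals[ticker] = totals.get(ticker, 0) + int(row["blocks_minted"])
    return totals


totals = blocks_per_pool(SAMPLE_CSV)
print(totals)
```

With a real download, the io.StringIO wrapper would be replaced by an open file handle for the downloaded file.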
Project Plan
The requested funds will cover the first 3 months of development of the platform as well as the first 6 months of running costs.
We propose to follow a hybrid waterfall/agile methodology, starting with upfront architecture, design, and feature planning, followed by 4 sprints of feature development. The project plan will be updated throughout this Catalyst process as we find team members and refine our idea and feature set.
See attached diagram.
Budget
The budget we are requesting will fund the first 3 months of development, and 6 months of infrastructure costs.
The approximate budget breakdown by role is as follows:
Total development costs: $48,000
Infrastructure costs: estimated at $2,000/month × 6 months = $12,000
Total Budget: $60,000
Core Team Experience
Michael Stewart
Vivek Nankissoor
Founders of Cardano Canucks and Canuckz NFT, with 30+ years of experience in data infrastructure, analysis, and data visualization for enterprises.