Data relating to the Cardano ecosystem is granular and scattered, making it difficult to access and use for analytics or machine learning.
A data hub for the Cardano ecosystem that makes historical and modelled datasets available through multiple access mechanisms.
Problem Overview
Currently, data relating to the Cardano ecosystem is available but spread across multiple sources.
DBSync is the most detailed source of on-chain data, but it is neither easy nor cheap to run, and its data is highly normalized, so gaining insights from it usually requires detailed knowledge of the database schema. Every project that consumes this data must account for the time and effort needed to transform it. In other words, the transformation is a repeatable process that should not have to be redone each time a project needs data.
Additionally, other sources of data must be integrated to make the data optimally useful, including:
There is also a wealth of data in the transaction metadata that is specific to certain use cases and is difficult to access without knowing how to query JSON data structures.
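To illustrate the kind of work involved in making nested JSON metadata analytics-ready, here is a minimal sketch that flattens a nested structure into dotted-path columns. The function name `flatten_metadata` and the sample payload are hypothetical and invented for illustration; they are not taken from the data hub itself.

```python
import json
from typing import Any, Dict


def flatten_metadata(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON into a single-level dict keyed by dotted paths."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_metadata(child, path))
    elif isinstance(value, list):
        # List items are addressed by their position in the list.
        for index, child in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            flat.update(flatten_metadata(child, path))
    else:
        # Scalars (strings, numbers, booleans, null) end the recursion.
        flat[prefix] = value
    return flat


# Hypothetical metadata payload, invented for illustration only.
raw = '{"721": {"policy_x": {"asset_1": {"name": "Example", "tags": ["a", "b"]}}}}'
row = flatten_metadata(json.loads(raw))
```

Each key in `row` can then become a column in a tabular data set, so a consumer can filter on a field such as `721.policy_x.asset_1.name` without writing JSON path queries.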
Several excellent sites currently offer pool-specific data, such as adapools.org and pooltool.io, as do several block explorers. However, none of these sites provide full historical data, custom queries, or data that has been modelled specifically for analysis or machine learning use cases.
Solution
We propose to build the initial MVP of a community data hub that will provide consolidated, analytics-ready data to the Cardano ecosystem. We have already begun work on initial data sets (on-chain data and stake pool data sets), which would be integrated into the single data hub platform.
At a minimum, there will be data available from DBSync and other sources listed above, which have been modelled for various analytics activities. The DBSync data will have additional aggregated views such as the ones in the following repository: https://github.com/cardanocanuck/db-sync-queries
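As a rough sketch of what an aggregated view over normalized data can look like, the example below joins two tables and summarizes transactions per epoch. The table layout is a deliberately simplified, hypothetical stand-in rather than the actual DBSync schema, the view name `epoch_fee_summary` is invented, and SQLite is used only so the example is self-contained (DBSync itself runs on PostgreSQL).

```python
import sqlite3

# In-memory database with a simplified, hypothetical two-table layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE block (id INTEGER PRIMARY KEY, epoch_no INTEGER);
    CREATE TABLE tx (
        id INTEGER PRIMARY KEY,
        block_id INTEGER REFERENCES block(id),
        fee INTEGER
    );

    -- Invented sample rows for illustration only.
    INSERT INTO block VALUES (1, 100), (2, 100), (3, 101);
    INSERT INTO tx VALUES (1, 1, 170000), (2, 2, 180000), (3, 3, 200000);

    -- The aggregated view hides the join from the data consumer.
    CREATE VIEW epoch_fee_summary AS
    SELECT b.epoch_no AS epoch_no,
           COUNT(t.id) AS tx_count,
           SUM(t.fee)  AS total_fee
    FROM tx AS t
    JOIN block AS b ON b.id = t.block_id
    GROUP BY b.epoch_no;
    """
)

summary = conn.execute(
    "SELECT epoch_no, tx_count, total_fee FROM epoch_fee_summary ORDER BY epoch_no"
).fetchall()
```

A consumer queries `epoch_fee_summary` directly and never needs to know how `tx` and `block` relate, which is the schema knowledge the hub aims to remove.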
Additionally, we will continue to add special-purpose datasets for various domains within the Cardano ecosystem. We have submitted, or will be submitting, several smaller proposals for specialized datasets to be modelled and developed, such as:
The initial MVP Data Hub will allow the download of scheduled CSV data sets. In the future, the range of sharing methods will be expanded. Some of these sharing methods will be:
We will prioritize free community access methods, but some access methods such as direct database access or data sharing may be monetized with a subscription model. The purpose of monetizing premium aspects of the data hub is to fund future ongoing development and enhancement.
This proposal is for the core functionality and backend infrastructure development of this community hub.
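The scheduled CSV data sets described for the MVP could be produced along the following lines. This is only a sketch under stated assumptions: the function `export_snapshot`, the dataset name, the file-naming pattern, and the sample rows are all hypothetical, and the scheduling itself (for example, a daily job that calls the function) is left out.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path
from typing import Dict, Iterable, List


def export_snapshot(
    rows: Iterable[Dict[str, object]],
    columns: List[str],
    out_dir: Path,
    dataset: str,
    run_date: date,
) -> Path:
    """Write one dated CSV snapshot of a dataset and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming pattern: <dataset>_<YYYY-MM-DD>.csv
    path = out_dir / f"{dataset}_{run_date.isoformat()}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Invented sample rows; a real job would read them from the modelled data.
sample_rows = [
    {"epoch_no": 100, "tx_count": 2},
    {"epoch_no": 101, "tx_count": 1},
]
snapshot_path = export_snapshot(
    sample_rows,
    ["epoch_no", "tx_count"],
    Path(tempfile.mkdtemp()),
    "epoch_summary",
    date(2022, 1, 1),
)
```

Putting the run date in the file name keeps earlier snapshots available, so consumers can download either the latest file or a historical one.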
Project Plan
The requested funds will cover the first 3 months of development of the platform as well as the first 6 months of running costs.
We propose to follow a hybrid waterfall/agile methodology, starting with some upfront architecture, design, and feature planning, followed by 4 sprints of feature development. The project plan will be updated throughout this Catalyst process as we find team members and refine our idea and feature set.
See attached diagram.
Budget
The budget we are requesting will fund the first 3 months of development, and 6 months of infrastructure costs.
The approximate budget breakdown by role is as follows:
Total development costs: $48,000
Infrastructure costs estimated at $2000/mo x 6 months = $12,000
Total Budget: $60,000
Core Team Experience
Michael Stewart
Vivek Nankissoor
This solution will address the challenge by providing a starting point for data and analytics projects within the developer ecosystem, removing the time and effort needed to create a usable data set. In addition, developers will not need to learn the nuances of raw data sets (e.g., structures and relationships within DBSync). Instead, they can start with curated data that is easy to integrate into developer applications.
This solution will also allow previously funded projects to be integrated into a single place for aggregating and distributing curated data sets:
The risks are:
Please see the attachment for the overall project timeline.
Budget
The budget we are requesting will fund the first 3 months of development, and 6 months of infrastructure costs.
The approximate budget breakdown by role is as follows:
Total development costs: $53,000
Infrastructure costs estimated at $2000/mo x 6 months = $12,000
Total Budget: $65,000
Project resource breakdown (see budget breakdown).
Leadership:
Michael Stewart
Vivek Nankissoor
This project will be measured primarily by:
The secondary KPIs may include:
Success is defined as a website where curated datasets are refreshed daily and can be downloaded by data consumers on an ad hoc basis.
Once completed, this project will serve as the foundation for analytics on Cardano data: modeling, visualization, machine learning, applications specific to particular domains (stake pools, NFTs, etc.), and much more.
This is a net-new project, but it will tie in the outputs from previous projects:
It will also provide a platform for future data set development.
The vision for this project is to increase awareness of and engagement with Cardano among the data and analytics community. Many developers are excited and energized by data science but face the barrier of data curation before they can apply their skills. This project is intended to remove that barrier.
SDG goals:
Goal 8. Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all
SDG subgoals:
8.3 Promote development-oriented policies that support productive activities, decent job creation, entrepreneurship, creativity and innovation, and encourage the formalization and growth of micro-, small- and medium-sized enterprises, including through access to financial services
Key Performance Indicator (KPI):
8.3.1 Proportion of informal employment in total employment, by sector and sex
Founders of Cardano Canucks and Canuckz Publishing, with 30+ years of experience in data infrastructure, analysis, and data visualization for enterprises.