Operators and Designers need reliable, decentralised connections and better data to understand and optimise individual and overall performance.
The well-known TopologyUpdater API has been running for a long time. BlockPerformance API/metrics can be a valuable supplement to it.
This is the total amount allocated to Pool Topology and Block Propagation.
TopologyUpdater (TU) is an existing API solution that has been repeatedly adapted and improved since the mainnet launch in the summer of 2020 to cope with the increasing number of participating relay nodes. Currently 2,700 mainnet nodes and about 150 testnet nodes use TU. Originally intended as a short-term transitional solution until the P2P module is available, it provides each participating node with 10-20 working peer connections, distributed from very near to far around the globe. Stale nodes are removed by automated quality tests within an hour. Participation is generally free and permissionless. Currently, this creates about 40,000 inter-peer connections and makes the Cardano network significantly more decentralised, supplementing the (few) connections manually configured by SPOs and the star-topology relays provided by IOG. Every day, the TopologyUpdater servers handle about 80,000 requests. Part of this proposal covers the operation of this service for another 12 months.
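The idea of giving each node 10-20 peers spread from near to far can be illustrated with a minimal sketch. This is not the TopologyUpdater implementation: the `Relay` type, the `pick_peers` function, the three distance bands and the use of great-circle distance are all assumptions made for illustration only.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relay:
    # Hypothetical record; a real service would hold more fields.
    host: str
    lat: float
    lon: float


def haversine_km(a: Relay, b: Relay) -> float:
    """Great-circle distance between two relays in kilometres."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def pick_peers(me: Relay, candidates: List[Relay], count: int = 15,
               bands: int = 3, rng: Optional[random.Random] = None) -> List[Relay]:
    """Pick `count` peers spread over `bands` distance bands (near to far)."""
    rng = rng or random.Random()
    others = sorted((c for c in candidates if c.host != me.host),
                    key=lambda c: haversine_km(me, c))
    if len(others) <= count:
        return others
    # Split the distance-sorted list into contiguous bands.
    size = math.ceil(len(others) / bands)
    groups = [others[i:i + size] for i in range(0, len(others), size)]
    for group in groups:
        rng.shuffle(group)
    # Round-robin over the bands so near, mid and far peers are all represented.
    picked: List[Relay] = []
    i = 0
    while len(picked) < count:
        group = groups[i % len(groups)]
        if group:
            picked.append(group.pop())
        i += 1
    return picked
```

Calling `pick_peers(me, all_relays)` returns a list that mixes nearby and distant relays instead of only the closest ones, which is the property the paragraph above describes.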
A second part of this project is yet to come as an additional API service, and will take care of measuring the individual block propagation times in the Cardano network. The real-time aggregation of this data will provide an exceptional overview and comparison. The resulting insights into individual and general optimisation possibilities of the network are very promising. Participating operators can compare the performance of their nodes (topology, network, CPU, configuration) with the average and the best in the entire network. Errors such as unnecessary or incorrect delays in the generation and distribution of blocks from certain pools can be identified and thus addressed quickly and concretely.
The data collected will also be made available for fundamental optimisation projects (research, engineering and application development) with the aim of significantly increasing the transaction throughput and net capacity of the Cardano Blockchain. The proposal includes the costs for setting up and operating this service for one year.
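As a rough illustration of the comparison described above, the following sketch computes a propagation delay and compares one node with the network average and the best node. The function names, the input shape (a mapping from node ID to a list of delays in milliseconds) and the use of a plain mean are assumptions; the planned BlockPerformance API may define and measure these values differently.

```python
from statistics import mean
from typing import Dict, List


def propagation_delay_ms(slot_start_ms: int, first_seen_ms: int) -> int:
    """Delay between the start of a block's slot and its arrival at a node."""
    return first_seen_ms - slot_start_ms


def compare_node(reports: Dict[str, List[int]], node_id: str) -> Dict[str, float]:
    """Compare one node's average delay with the network average and the best node.

    `reports` maps a (hypothetical) node ID to its observed delays in ms.
    """
    per_node_avg = {n: mean(d) for n, d in reports.items() if d}
    averages = list(per_node_avg.values())
    return {
        "node_avg_ms": per_node_avg[node_id],
        "network_avg_ms": mean(averages),
        "best_avg_ms": min(averages),
    }
```

An operator whose `node_avg_ms` sits well above `network_avg_ms` would have a concrete starting point for checking topology, network or configuration.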
The TopologyUpdater has been used since 2020 with a steadily increasing number of relays. This number should be maintained in line with the development of the network or continue to grow, at least until the P2P module can take over this task.
The new BlockPerformance API and Dashboard is currently at the preliminary-study stage (see the illustration at https://cardano.ideascale.com/userimages/accounts/93/936143/panel_upload_48088/TUBlockPerf_dashboard-2ce138.png).
As with the TopologyUpdater, an open source script is to be made available to every SPO free of charge, with which they can easily and reliably submit the timing data of their relays. By participating, the SPO will have free access to the data and findings. The visualisation Dashboard will provide web-based interactive Gantt timeline diagrams with filtering and sorting functions, as well as detailed popup information on the propagation progress and time of each reported block. Violin plots will present each relay's performance range, to be compared with the network average and the best-performing relays in each category. Additional graphs and statistics over time will be created and also provided to the public and to other ecosystem members for further analysis, interpretation and use cases such as application planning and design.
We are aiming for at least 150 participating nodes for the BlockPerf tool, but are designing the system so that 1500 nodes can also participate if there is interest.
The main goal is not to increase the number of relay nodes, but to support ongoing improvements, quality control and performance tuning. Operators and Application designers should be able to use these graphs, reports and data to form, prove or change their opinion on future topics and discussions.
The collection of data must be done reliably without affecting the basic function and performance of the relays (low memory and CPU usage, no block propagation delays). The data must be made available in aggregated, summarised and anonymised form without exposing the network connections, setups and behaviour of individual nodes. The aim is to develop comprehensible representations that summarise the large amounts of data in a clear and insightful way.
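One way to publish only aggregated data is to emit per-block summary statistics and to drop any block with too few reporters. The sketch below assumes a hypothetical input shape (block hash mapped to per-node delays in ms) and an arbitrary threshold of five reports; neither is taken from the actual service design.

```python
from statistics import median
from typing import Dict


def summarise(delays_by_block: Dict[str, Dict[str, int]],
              min_reports: int = 5) -> Dict[str, Dict[str, float]]:
    """Return per-block summaries that contain no node identifiers.

    `delays_by_block` maps block hash -> {node_id: delay_ms}.
    Blocks with fewer than `min_reports` reporters are skipped, because a
    summary over very few nodes could reveal an individual node's behaviour.
    """
    summary: Dict[str, Dict[str, float]] = {}
    for block_hash, by_node in delays_by_block.items():
        values = sorted(by_node.values())
        if len(values) < min_reports:
            continue
        summary[block_hash] = {
            "reports": len(values),
            "min_ms": values[0],
            "median_ms": median(values),
            "max_ms": values[-1],
        }
    return summary
```

The output keeps the block hash and the distribution of delays, but the node IDs never leave the aggregation step.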
The TopologyUpdater section is already in place, and is to be continued for the period April 2022 to April 2023, if the proposal is accepted.
The BlockPerformance part of the project has completed the pre-study (Nov 2021 - Feb 2022), which showed that full project execution would bring a significant benefit to participating Operators and the whole network. First results will be available from April 2022 and will be extended step by step during 2022. The BlockPerformance API service operations are also funded for 12 months, from April 2022 to April 2023.
TopologyUpdater 12 Months Operation Apr 2022-2023 8,000.-
BlockPerformance Development Feb-Sep 2022 8,000.-
BlockPerformance 12 Months Operation Apr 2022-2023 8,000.-
Markus Gufler: CLIO1 Pool operator, Cardano Ambassador, 20 years of server operations, network design and application maintenance, CTO of an ISP business.
The described services are managed and maintained by an experienced DevOps team, under my guidance.
TopologyUpdater is under permanent monitoring, and is announced and documented in several places:
https://forum.cardano.org/t/topology-service/34093
Support requests document the usage and development over time, for example in the forum as well as in different Telegram and Discord channels.
https://forum.cardano.org/search?q=topologyupdater
As part of the widely used cnTools suite, provided by the operators guild, TU is an integrated, easy-to-use option. BlockPerformance will be integrated in the same way, ensuring updates and reliable client-side setups.
https://cardano-community.github.io/guild-operators/Scripts/topologyupdater/
The progress of participating nodes was documented from time to time in a publicly visible but anonymised form at https://gist.github.com/gufmar/2fdc4f4582015404d19038331c3d5091 (see Revisions). A similar form of documentation, discussion and data presentation is also planned for the BlockPerformance part.
KPI 1: maintain TU participation numbers until P2P is ready
KPI 2: grow BlockPerf participation to at least 150 reporting and globally distributed nodes
Since BlockPerformance will generate much more general network data than TopologyUpdater, this proposal plans to create a web-based dashboard in which live data, including the number of participating nodes, can be seen at any time. This is also recorded over time and is thus available as an ongoing KPI indicator.
KPI 3: provide a public Dashboard showing the ongoing processing and activity based on collected metrics
Apart from the interest of the pool operators, shown through their active participation, an additional goal is to make this infrastructure and performance data available to other existing or new projects (Explorers, Blockchain Query Layers, Dapps, Wallets, Research projects).
KPI 4: find at least one, and up to three, cooperating/integration partner projects.
Reliable connections, a stable mainnet, findings, improvements, and a common data and knowledge base for different improvement ideas and proposals.
People and Entities involved in Network, Infrastructure and Application design should consume the aggregated data, findings and metrics.
No preceding work for the TopologyUpdater Service or the BlockPerf study was funded by Catalyst or any other funding. All services and support were provided for free to all participating Operators.
TopologyUpdater has been working continuously on the testnet and mainnet since summer 2020 at 99.999% uptime. As an SPO, I know the needs and challenges that can be optimised. I have 25 years of active work experience in networking and security, and a redundant DevOps team.