Steemdev Idea/concept: DNS based querying of a small set of STEEM-API node monitoring agents for dynamic failover.

in utopian-io •  6 years ago  (edited)

Repositories

Components

This idea consists of two distinct components:

  • A simple STEEM API-node monitoring infrastructure.
  • A best-node lookup hook for each STEEM client library, making use of the monitoring infrastructure.

Proposal Description

The proposal is for implementors of STEEM JSON-RPC clients and libraries to agree on a query and response format for two communication protocols, one DNS based and one JSON-RPC based, between a STEEM API client library and a small set of STEEM API-node monitoring daemons. The ultimate goal is to streamline dynamic failover for API clients.

The idea consists of:

  • A DNS configuration for some sub domain, configured for dynamic failover
  • A small set of monitoring agents running both a DNS and a JSON-RPC service
  • A piece of easy-to-install software (possibly a Docker image) for running the above agent.
  • A specification of the communication between monitoring agents and STEEM JSON-RPC API libraries.

DNS Configuration

The dynamic failover setup is quite simple. We take a sub domain and a naming convention for agent instances, and we create both a CNAME (or A) record and an NS record for each agent instance, in the following way:

  • Let's say our domain is timelord.ninja (I'll use this actual domain for now)
  • For each agent, we define a CNAME or A record.
    • steem-monitor-01.timelord.ninja -> epub.timelord-ninja
    • steem-monitor-02 -> ???
    • etc
  • For each of these CNAME/A records, we define an NS record that delegates a shared sub domain to it.
    • steem-api.timelord.ninja -> steem-monitor-01.timelord.ninja
    • steem-api.timelord.ninja -> steem-monitor-02.timelord.ninja
    • etc
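
As a rough illustration of the naming convention above, the NS delegations could be generated with a few lines of Python. This is only a sketch: the function name `delegation_records`, its parameters, and the zone-file style of the output lines are assumptions made for this example, and the CNAME/A targets are deliberately left out because they differ per agent.

```python
def delegation_records(domain, subdomain, agent_count, prefix="steem-monitor"):
    """Return one NS record line per agent for the shared sub domain.

    The helper name and the zone-file style output are illustrative
    assumptions; only the naming convention comes from the proposal.
    """
    zone = f"{subdomain}.{domain}."
    return [
        # Agents are numbered 01, 02, ... following the convention above.
        f"{zone} IN NS {prefix}-{number:02d}.{domain}."
        for number in range(1, agent_count + 1)
    ]


if __name__ == "__main__":
    for line in delegation_records("timelord.ninja", "steem-api", 2):
        print(line)
```

With several NS records on the same sub domain, a resolver can fall back to another agent when one of them is unreachable, which is what gives the lookup service itself its redundancy.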

The monitoring agent

The monitoring agent is to be a piece of software based on this experimental txjsonrpcqueue script. The monitoring script runs an API coverage test on all known full-API nodes every dozen or so minutes and keeps track of:

  • Failures due to APIs not being supported
  • Failures below JSON-RPC command level (HTTP errors, certificate errors, etc)
  • The time it takes to complete the set of testing commands.
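
To make the three tracked items a bit more concrete, here is a minimal sketch of how an agent might keep a rolling one-hour history per node. All names (`CoverageRun`, `NodeHistory`, `WINDOW_SECONDS`) are hypothetical and not taken from the txjsonrpcqueue script; the actual agent may store its results quite differently.

```python
from dataclasses import dataclass, field
from typing import List, Set

WINDOW_SECONDS = 3600.0  # assumption: keep roughly one hour of results


@dataclass
class CoverageRun:
    finished_at: float          # seconds since epoch when the run finished
    low_level_failure: bool     # HTTP error, certificate error, etc.
    unsupported_apis: Set[str]  # APIs the node refused to serve in this run
    duration: float             # seconds needed for the whole command set


@dataclass
class NodeHistory:
    url: str
    runs: List[CoverageRun] = field(default_factory=list)

    def record(self, run: CoverageRun) -> None:
        """Store a run and drop everything older than the window."""
        self.runs.append(run)
        cutoff = run.finished_at - WINDOW_SECONDS
        self.runs = [r for r in self.runs if r.finished_at >= cutoff]

    def failure_percentage(self) -> float:
        """Percentage of runs in the window with a low level failure."""
        if not self.runs:
            return 0.0
        failed = sum(1 for r in self.runs if r.low_level_failure)
        return 100.0 * failed / len(self.runs)

    def mean_duration(self) -> float:
        """Average duration of the runs that did not fail at the low level."""
        ok = [r.duration for r in self.runs if not r.low_level_failure]
        return sum(ok) / len(ok) if ok else float("inf")
```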

Based on the results from these tests over, say, the last hour, the agent then provides two lookup services:

  • A JSON-RPC service
  • A DNS service

The DNS service.

The DNS service, being a registered name server for the shared sub domain, allows any monitoring agent that has monitoring data available to respond to queries about suitable STEEM JSON-RPC API nodes it has been monitoring. It is important to note that the agent won't start up its DNS service until it has been running for at least a full hour and has gathered enough monitoring data to answer queries confidently.

The DNS service, once running, will answer both TXT and A queries, where the TXT queries are meant for use by a STEEM JSON-RPC client library. A TXT query is built up from the following parts, in this order:

  • A low level failure filter specification.
  • API-usage pattern
  • API-usage filter
  • sub domain
  • domain

The low level failure filter lets the user exclude nodes that had more than a given share of low level failures in the past hour. The filter consists of the maximum percentage of failures (p) and the minimum number of minutes since the last low level failure (m).

For example:

  • p20m10 : Only nodes that gave a failure at most 20% of the time in the last hour, and that didn't produce a failure in the last ten minutes.
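
A filter string like this is easy to parse on the agent side. The snippet below is a minimal sketch; the names `FailureFilter` and `parse_failure_filter` and the accepted ranges (0 to 100 percent, 0 to 60 minutes) are assumptions for this example, not part of the proposal.

```python
import re
from typing import NamedTuple


class FailureFilter(NamedTuple):
    max_failure_pct: int    # the number after "p"
    min_quiet_minutes: int  # the number after "m"


_FILTER_RE = re.compile(r"p(\d{1,3})m(\d{1,2})")


def parse_failure_filter(label: str) -> FailureFilter:
    """Parse a label such as 'p20m10' into its two numbers."""
    match = _FILTER_RE.fullmatch(label.lower())
    if match is None:
        raise ValueError(f"not a failure filter: {label!r}")
    pct, minutes = int(match.group(1)), int(match.group(2))
    # Assumed limits: percentages up to 100, and at most the one hour window.
    if pct > 100 or minutes > 60:
        raise ValueError(f"filter out of range: {label!r}")
    return FailureFilter(pct, minutes)


if __name__ == "__main__":
    print(parse_failure_filter("p20m10"))
```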

An API-usage pattern is defined by one of the following strings:

  • condenser : The user intends to use all APIs through the condenser API
  • namespace : The user intends to use all APIs through their own namespaced sub-APIs
  • any : The library or user code is smart enough to do its own failover between the condenser API and the specific APIs

The API-usage filter consists of a hyphen-separated list of specific APIs, each named without the _api suffix.

A full TXT record query could look something like this:

  • p20m10.condenser.database-follow-reputation.steem-api.timelord.ninja
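
On the client library side, composing such a query name is mostly string handling. The following is a sketch under assumptions: `build_query` and `parse_query` are hypothetical helper names, and the zone defaults to the example domain used in this post. The 63 character limit on a single DNS label is a general DNS constraint, so a very long API list would have to be rejected or split.

```python
USAGE_PATTERNS = ("condenser", "namespace", "any")
DEFAULT_ZONE = "steem-api.timelord.ninja"  # example zone from this post


def build_query(failure_filter, pattern, apis, zone=DEFAULT_ZONE):
    """Compose the TXT query name from its parts."""
    if pattern not in USAGE_PATTERNS:
        raise ValueError(f"unknown usage pattern: {pattern!r}")
    # Strip the "_api" suffix, as the API-usage filter leaves it out.
    names = [a[:-4] if a.endswith("_api") else a for a in apis]
    api_label = "-".join(names)
    if not 0 < len(api_label) <= 63:  # a DNS label holds at most 63 octets
        raise ValueError("API filter does not fit in one DNS label")
    return ".".join([failure_filter, pattern, api_label, zone])


def parse_query(name, zone=DEFAULT_ZONE):
    """Split a query name back into (filter, pattern, list of APIs)."""
    suffix = "." + zone
    name = name.lower().rstrip(".")
    if not name.endswith(suffix):
        raise ValueError(f"query is not inside {zone}")
    labels = name[: -len(suffix)].split(".")
    if len(labels) != 3 or labels[1] not in USAGE_PATTERNS:
        raise ValueError(f"malformed query: {name!r}")
    return labels[0], labels[1], labels[2].split("-")


if __name__ == "__main__":
    print(build_query("p20m10", "condenser",
                      ["database_api", "follow_api", "reputation_api"]))
```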

The response to this query should be a list of nodes, sorted by response time, that support at least the database, follow and reputation APIs through the condenser API, that didn't produce any low level failures in the last ten minutes, and that had low level failures in at most 20% of all tests run in the last hour.
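
The selection logic behind such a response could look roughly like the sketch below, here for the condenser usage pattern only. The `NodeStats` record, the `select_nodes` function and the example node URLs are all hypothetical; the proposal does not yet fix how the agent stores its data or how the sorted list is encoded in the TXT record.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class NodeStats(NamedTuple):
    url: str
    failure_pct: float            # low level failures over the last hour
    minutes_since_failure: float  # float("inf") if none in the window
    condenser_apis: FrozenSet[str]
    mean_duration: float          # seconds for the test command set


def select_nodes(nodes: Iterable[NodeStats], max_failure_pct: float,
                 min_quiet_minutes: float,
                 required_apis: Iterable[str]) -> List[str]:
    """Return URLs of matching nodes, fastest first."""
    wanted = set(required_apis)
    matching = [
        n for n in nodes
        if n.failure_pct <= max_failure_pct
        and n.minutes_since_failure >= min_quiet_minutes
        and wanted <= n.condenser_apis
    ]
    return [n.url for n in sorted(matching, key=lambda n: n.mean_duration)]


if __name__ == "__main__":
    apis = frozenset({"database", "follow", "reputation"})
    nodes = [  # made-up nodes, for illustration only
        NodeStats("https://slow.example", 0.0, float("inf"), apis, 3.2),
        NodeStats("https://fast.example", 10.0, 25.0, apis, 1.9),
        NodeStats("https://flaky.example", 40.0, 50.0, apis, 0.8),
    ]
    print(select_nodes(nodes, 20, 10, ["database", "follow", "reputation"]))
```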

For convenience, the DNS service will also respond to A queries, and if applicable AAAA queries, with its own external IP address. This is meant to extend the DNS service with a simple single-method JSON-RPC API that provides exactly the same service as the DNS TXT record lookups.

Benefits

The benefit of the proposed system and interface is that it offloads the complex task of node failover to a small set of redundantly running monitoring agents. This should make it considerably easier, both for RPC client library authors and for DApp builders, to make applications fail over dynamically to the best nodes currently available.

Feedback

The main reason why I post this now, and not after implementing the monitoring agent, is that I want to put the querying interface out there for developers to comment on. I think the currently proposed querying design could be quite powerful, but it is possible I am missing problems or opportunities. So if you have any doubts about the sanity or completeness of the proposed interface, please comment below.

And if anyone is interested in running a monitoring agent instance once the code is ready, please also leave a comment or talk to me on Discord.

GitHub Account


Hello, @mattockfs. Thank you for supporting these projects with your valuable idea. You might want to consider the following when submitting your next idea contribution:

  • It is good that you have used a lot of technical terms in your contribution; consider simplifying them to help external contributors to the project understand it easily.
  • Please also consider submitting your idea contribution as an issue to the project it is for. This will enable the project owner/maintainer to easily keep track of the issues.

Thanks again for using Utopian and I am looking forward to seeing your next contribution via Utopian.
