Building a high availability steemd node for web apis

in steem •  8 years ago  (edited)

If you're interested in building a steemd node for use with a web application, this is meant to serve as your guide. I will attempt to repost this guide occasionally with updates as requirements/steps change (since editing doesn't work past 12 hrs).

Running a load balanced steemd node for the web apis

Running a node for the web is slightly different than a node you'd use for mining or as a witness. We need all of the features available through the API to serve out every piece of information possible. Many mining/witness nodes turn most of these options off to reduce load on the server.

For a web node, we want all of these enabled! The plugins this guide will enable are as follows:

  • database_api
  • login_api
  • market_history_api
  • tags_api
  • follow_api
  • network_broadcast_api (thanks for the tip @rainman)

In the end, your nodes will run as a load balanced websocket server on port 80 (or 443 if you install an SSL cert, not covered here). Both steemstats and piston's APIs follow these conventions.

Current hardware requirements:

As of 8/8, the hardware requirements are as follows. I will attempt to recreate this post with best practices over the coming months as requirements increase.

Dual web node:

  • 4 vCPU
  • 16 GB RAM (may exceed this soon)
  • Lots of bandwidth to spare (my node uses between 4mb-25mb+ a second)

If you choose to run just a single node, you can effectively cut those requirements in half.

Assumptions

  1. You have a Linux server
  2. The Linux OS is already installed
  3. You have remote access to the server
  4. You have permissions to install software (you may need to add sudo to some of these commands)
  5. You have a basic understanding of system administration

Building steemd, twice.

This configuration runs two instances of steemd for load balancing and failover purposes. If one of the nodes goes down, the other node should still be available to take requests. This will only build the steemd application (thanks to @rainman for the tip here).

If you're interested in compiling the code faster, replace the 4 in -j4 with the number of vCPUs your machine has.
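
Rather than hard-coding the job count, it can be derived from the machine. A minimal sketch, assuming nproc (GNU coreutils) is installed; build_jobs is a variable name introduced here, and the fallback of 4 simply mirrors the value used in this guide:

```shell
# Use one make job per available CPU; fall back to 4 if nproc is missing.
build_jobs="$(nproc 2>/dev/null || echo 4)"

# Print the resulting build command; run it from inside steem1 or steem2.
echo "make -j${build_jobs} steemd"
```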

Building node #1

cd ~
git clone https://github.com/steemit/steem.git steem1
cd steem1
git submodule update --init --recursive
cmake -DCMAKE_BUILD_TYPE=Release CMakeLists.txt
make -j4 steemd

Building node #2

cd ~
git clone https://github.com/steemit/steem.git steem2
cd steem2
git submodule update --init --recursive
cmake -DCMAKE_BUILD_TYPE=Release CMakeLists.txt
make -j4 steemd

Configuring steemd

Before we start running anything, we need to configure our two steemd nodes. Listed below are two configuration examples, one for each node, where the only difference is the rpc-endpoint port number.


steemd node #1

~/steem1/programs/steemd/witness_node_data_dir/config.ini

rpc-endpoint = 127.0.0.1:5090

seed-node=52.38.66.234:2001
seed-node=52.37.169.52:2001
seed-node=52.26.78.244:2001
seed-node=192.99.4.226:2001
seed-node=46.252.27.1:1337
seed-node=81.89.101.133:2001
seed-node=52.4.250.181:39705
seed-node=85.214.65.220:2001
seed-node=104.199.157.70:2001
seed-node=104.236.82.250:2001
seed-node=104.168.154.160:40696
seed-node=162.213.199.171:34191
seed-node=seed.steemed.net:2001
seed-node=steem.clawmap.com:2001
seed-node=seed.steemwitness.com:2001
seed-node=steem-seed1.abit-more.com:2001

enable-plugin = account_history
enable-plugin = follow
enable-plugin = market_history
enable-plugin = private_message
enable-plugin = tags

public-api = database_api login_api market_history_api tags_api follow_api network_broadcast_api

steemd node #2

~/steem2/programs/steemd/witness_node_data_dir/config.ini

rpc-endpoint = 127.0.0.1:5091

seed-node=52.38.66.234:2001
seed-node=52.37.169.52:2001
seed-node=52.26.78.244:2001
seed-node=192.99.4.226:2001
seed-node=46.252.27.1:1337
seed-node=81.89.101.133:2001
seed-node=52.4.250.181:39705
seed-node=85.214.65.220:2001
seed-node=104.199.157.70:2001
seed-node=104.236.82.250:2001
seed-node=104.168.154.160:40696
seed-node=162.213.199.171:34191
seed-node=seed.steemed.net:2001
seed-node=steem.clawmap.com:2001
seed-node=seed.steemwitness.com:2001
seed-node=steem-seed1.abit-more.com:2001

enable-plugin = account_history
enable-plugin = follow
enable-plugin = market_history
enable-plugin = private_message
enable-plugin = tags

public-api = database_api login_api market_history_api tags_api follow_api network_broadcast_api
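
Since the two files differ only in the rpc-endpoint port, the second one can be generated from the first instead of being maintained by hand. A sketch, where clone_config is a hypothetical helper (not part of steemd) and the example paths are the ones used above:

```shell
# clone_config SRC DST PORT
# Copy SRC to DST, rewriting only the port on the rpc-endpoint line.
clone_config() {
  sed "s|^rpc-endpoint *=.*|rpc-endpoint = 127.0.0.1:$3|" "$1" > "$2"
}

# Example (uncomment to use; assumes node #1's config already exists):
# clone_config ~/steem1/programs/steemd/witness_node_data_dir/config.ini \
#              ~/steem2/programs/steemd/witness_node_data_dir/config.ini 5091
```

Every other line (seed nodes, plugins, public-api) is copied through unchanged, so the two nodes cannot drift apart by accident.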

Downloading a snapshot of the blockchain

@fydel offers up a snapshot of the blockchain to help you get sync'd faster. You'll need to download this and place it in the appropriate folders to get started.

You need to do this for each individual node you are running, in both steem1 and steem2 folders.

Automatically starting steemd on boot

It's important to ensure your node is running 24/7. If you're running Ubuntu, @steemed wrote a guide that walks you through configuring this.

You'll have to create two of these, one for each steem node you're setting up. I'd recommend naming them as follows:

  • /etc/init/steem1.conf
  • /etc/init/steem2.conf
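
For reference, an Upstart job for one node might look roughly like the output of the sketch below. This is not @steemed's script: upstart_job is a hypothetical helper, /home/steem is a placeholder path, and the stanzas are ordinary Upstart ones. Upstart picks up job files named like /etc/init/steem1.conf.

```shell
# upstart_job N HOME_DIR: print an Upstart job definition for node N.
upstart_job() {
  node_dir="$2/steem$1/programs/steemd"
  cat <<EOF
description "steemd web node $1"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
chdir $node_dir
exec $node_dir/steemd --data-dir $node_dir/witness_node_data_dir
EOF
}

# Example: upstart_job 1 /home/steem | sudo tee /etc/init/steem1.conf
```

The respawn stanza asks Upstart to restart steemd if it exits, which is the part that matters for keeping a node up around the clock.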

Once you have the startup scripts created, running start steem1 and then start steem2 should bring up both of your nodes. If you'd like to monitor the progress of both nodes simultaneously, you can use:

tail -f path/to/steem1/programs/steemd/debug.log -f path/to/steem2/programs/steemd/debug.log

You will see the nodes replaying the blockchain and once they are ready, you will see lines like this appear:

2163510ms th_a       application.cpp:439           handle_block         ] Got 2 transactions from network on block 3913580
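
If you'd rather script this check than watch the log, something like the following could work. It is only a sketch: node_is_live is a hypothetical helper, and the pattern is taken from the sample line above, so it may need adjusting if the log format changes.

```shell
# node_is_live LOGFILE
# Succeed if the last 50 log lines show blocks arriving from the network.
node_is_live() {
  tail -n 50 "$1" | grep -q 'handle_block.*Got [0-9]* transactions from network on block'
}

# Example:
# node_is_live path/to/steem1/programs/steemd/debug.log && echo "steem1 is live"
```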

As more scripts for different distros are created, I'll start adding links to them here or in the next iteration of this guide.

Configuring nginx as your load balancer

I won't go into installing nginx, as you should probably have a basic understanding of how to do this yourself. If you're looking for a package, nginx provides a package for most popular distros.

We do need to configure nginx a little, though. First up, the basic nginx configuration:

nginx config:

/etc/nginx/nginx.conf

events {
  worker_connections 768;
}

http {
  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  keepalive_timeout 65;
  types_hash_max_size 2048;

  include /etc/nginx/mime.types;
  default_type application/octet-stream;

  access_log /var/log/nginx/access.log;
  error_log /var/log/nginx/error.log;

  gzip on;
  gzip_disable "msie6";

  limit_req_zone $binary_remote_addr zone=ws:10m rate=1r/s;

  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
}

Most of this should already exist in your nginx configuration, but note the limit_req_zone line towards the bottom. This is a measure to help prevent overloading your node by setting up some request throttling.

One more file needs to be added to finish off this configuration, the actual file inside of /etc/nginx/sites-enabled. If this server isn't going to be used for anything else, remove all of the default configurations from that folder and add the following:

nginx vhost config:

/etc/nginx/sites-enabled/default.conf

upstream websockets {
  server 127.0.0.1:5090;
  server 127.0.0.1:5091;
}

server {
    listen 80;
    server_name _;
    root /var/www/html/;

    keepalive_timeout 65;
    keepalive_requests 100000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;

    location ~ ^(/|/ws) {
        limit_req zone=ws burst=5;
        access_log off;
        proxy_pass http://websockets;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_next_upstream error timeout invalid_header http_500;
        proxy_connect_timeout 2;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

}
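
Before reloading, the configuration can be checked with nginx -t so that a typo doesn't take the proxy down. A sketch; reload_if_valid and the NGINX and RELOAD variables are names introduced here for illustration:

```shell
# Commands used by reload_if_valid; override them if your distro differs.
NGINX="${NGINX:-nginx}"
RELOAD="${RELOAD:-service nginx reload}"

# reload_if_valid: reload nginx only when `nginx -t` accepts the configuration.
reload_if_valid() {
  if "$NGINX" -t; then
    # RELOAD is left unquoted on purpose so it splits into command + arguments.
    $RELOAD
  else
    echo "nginx config test failed; not reloading" >&2
    return 1
  fi
}

# Example: sudo sh -c '. ./reload.sh && reload_if_valid'
```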

Reload nginx with service nginx reload and your server should now be responding on port 80 with the steemd nodes you just created. If you load it with a web browser, or just run curl http://localhost, it should return:

11 eof_exception: End Of File
stringstream
    {}
    th_a  sstream.cpp:109 peek

    {"str":""}
    th_a  json.cpp:478 from_string

(which is totally ok, as your web browser isn't issuing the proper request)
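
To see a real answer instead of the parse error, you can send a JSON-RPC request in the body. The sketch below makes a few assumptions that are not from this guide: that the node accepts the "call" form with a numeric API id, that database_api is id 0 because it is listed first in public-api, and that get_dynamic_global_properties is the method you want. rpc_payload and query_node are hypothetical helper names.

```shell
# rpc_payload METHOD: build a JSON-RPC request for a no-argument call on API 0.
# Assumption: database_api is API id 0 because it is listed first in public-api.
rpc_payload() {
  printf '{"jsonrpc":"2.0","id":1,"method":"call","params":[0,"%s",[]]}' "$1"
}

# query_node URL METHOD: POST the request and print the raw response.
query_node() {
  curl -s --data "$(rpc_payload "$2")" "$1"
}

# Example: query_node http://localhost get_dynamic_global_properties
```

Keep in mind that the limit_req rule above throttles each client IP, so running this in a tight loop from one machine will start getting rejected.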

Congrats!

You're now running a fully featured web node that will return content, following information, tags, account history and plenty of other information! Go forth, build awesome things, and help make the steem community even greater! :)

Something missing?

If you know of something that should be included in this guide, please let me know. I'm looking to help build a comprehensive guide so others can start hosting their own web APIs.


This is awesome!

Some tips:

  • To compile even faster, only compile steemd: 'make -j4 steemd'
  • To support transactions add 'network_broadcast_api'

Thank you so much, that's awesome.

I'll get these included in the appropriate sections and then get my servers modified for the same.

'Build twice' sounds really funny :) Apparently, you don't really like the --data-dir option

steemd --data-dir /node1
steemd --data-dir /node2

I didn't realize that was a thing! Now I wish I could go back and edit this post lol.

That's a great point lol

Thanks, @jesta! This will be helpful to all devs! Quick question - why clone the repo twice and build separately? Can we instead just build once and duplicate the whole folder, all built? This will save time on building.

You absolutely could (in fact I did that as well). I just wanted to outline a bullet proof way to make it work. I've had issues with the blockchain getting corrupted when I cp -r the whole folder to another location, and didn't want others to encounter the same problem :)

Haha, maybe the config and make puts the absolute path into the artifacts? Anyway, love nginx and steemstats!! Thanks for all the hard work

What about taking a snapshot of the final VPS, then spawning a few instances of it and load balancing those? Then there'll be multiple nginx nodes, and the setup will have even higher availability!! =D This way nginx itself is no longer a single point of failure.


I was actually doing this while I was mining a few weeks ago, it worked pretty well. And yes, this would increase availability even further, though it would get pricey pretty fast!

You also have to consider that if you use multiple nginx nodes and proxy them to different servers, all of the bandwidth will be multiplied by the proxy. Not a problem though if you're using internal networking, since most providers don't charge for it.

This is what we developers are looking for. Many thanks to @jesta! It deserves a thousand upvotes.


This is amazing and something I've been wanting to explore further. It should be listed as official documentation to encourage people to set up their own APIs. I'm spreading the word!

Great walk-through! I must say that I'm impressed by Steemit so far, but some of the documentation is a bit lacking. These kinds of posts really help.

It is, I've searched high and low for some things and haven't found a lot. But, that's the kind of stuff I thrive on: deconstructing something and learning how it works.

Hopefully what I've learned will help others get involved!

this is awesome. are you in the bay area? if so, would love for you to join our meetup in a few weeks. we're investors in nginx as well :)

https://steemit.com/steem/@ntomaino/silicon-valley-steem-meetup

I'm not unfortunately, I'm in the LA area. I'd love to join but that would require either a long, long drive or a flight :)

Thank you for posting this. I don't have the requisite skills to use it but I'm sure I will be benefitting from things that people make using your post:)

I usually don't upvote like this, but I use your tools all the time.
They are extremely helpful - Without them, I couldn't even follow people.
Here is my blind upvote - I just assume that this post is good.


Hahaha, hey... at least you're being honest :)

This post is primarily to get the instructions out there, so I can point other people getting involved to how my nodes are set up. Many people use this.piston.rocks and steem.steemstats.com to power their own websites - and we'd like to see others creating nodes as well!

Why clone and build twice? What I do is

mkdir steem1 steem2    
git clone https://github.com/steemit/steem.git src 
cd src && git submodule update --init --recursive
cmake -DCMAKE_BUILD_TYPE=Release .
make -j4 steemd
cp programs/steemd/steemd ../steem1/
cp programs/steemd/steemd ../steem2/

This should save some build time.

I ran into issues with blockchain corruption and the dreaded:

Starting chain with 0 blocks...

I seem to get this error every time I cp a witness_node_data_dir around. I didn't know how common that would be, so to avoid people saying "hey this broke" I decided to go the long route and just have people create it twice.

But - It's totally possible to do it this way, and it would save time. But if you run into errors like I did, just start over and compile from scratch ;)

Yeah, that makes sense. Thanks.

You have my upvote every time, blind or not :)

OK, this time it is not blind because I read it all through. I did understand most of it, too :)

I know it is out of the scope of this guide but ... any suggestions on where is the best to host the server(s)?

I was a sysadmin but I am out of it for several years now. Maybe it is time to start relearning the old knowledge and adding new stuff :)

My current go-tos are Linode, DigitalOcean or AWS. I've also heard good things about Google's compute, but haven't used it yet.

TNX. Will check them out.

And now back to php and mysql programming :)

So great. A very good steemd node


@jesta
The URL for the blockchain snapshot has not been updated in 10 months and now returns 404:
http://einfachmalnettsein.de/steem-blocks-and-index.zip

Any updates hmm on this very productive and expanding community or just sex and money are actual subject ?

@den @steem @steemit

Hi i am very interested in this.
But i am quite confused. Since there is not only STEEM, STEEM DOLLAR and also STEEM POWER.

On the server side there are also STEEM nodes, witnesses, and a third thing I forgot. All the explanations are quite difficult, too. Maybe someone has a good explanation they could repost for me?

So is this essentially, in comparison to Ethereum, that we can designate our own node and build whatever we want on top of it? Say, a pet shop? What about ICOs: is there a capability to write ICOs that interact with the main STEEM network via the node, or vice versa?

are there any frameworks developed like solidity but for STEEM

I am trying to get a handle on the capabilities, tools, and resources. I would like to help develop new tech and possibly new deployment frameworks on top of this network. I have a good feeling this platform is going to blow up in a short amount of time.

Well, you could, but you're still limited to the capabilities of Steem itself.

Steem itself doesn't actually have a smart contract layer - so you can't write consensus based logic. It's primarily a content (posts) store with voting mechanisms to surface content.

Currently the platform has ballooned as well, and this article is severely out of date. To run a "full node" on the network now requires something like 270GB of RAM, which can be trimmed down slightly by using some of the filtering options. For example, I built a forum interface on top of Steem (chainbb.com), and the Steem API node behind that is currently sitting at 54GB of RAM.

steem-python and steem-js are the two primary libraries that most people use at this time to interact with the chain, both available under https://github.com/steemit.

We have a discord chat running for steem developers, if you're interested in chatting/asking more questions, feel free to join!

https://discord.gg/bU5fYD

Hi @jesta my witness

Steemian all

Previously I apologize if commented, because it does not match the topic. But I am sure you are a good and caring person, I am very sure you are too great person of course, I am very motivated with you @jesta
You love to travel, on the way you meet abandoned children, I am sure you are a caring, loving and loving person that children can smile at children, it will be nice even though the valentine moment has passed. I am sure you will want to be discouraged, if you do not mind visit my bloq, i hope you can give input to my writing and direct me @jesta
Thanks you so much

Give a little smile (Save the children)