RE: SBDS to MySQL in Docker

You are viewing a single comment's thread from:

SBDS to MySQL in Docker

in utopian-io •  7 years ago 

Our discussion in my earlier post formed much of this post so thanks again for sharing everything!

So I'm one folder deeper in _data/steem/ running this command specifically:

# du -sh _data/steem/
419G    _data/steem/

I'm also surprised I'm at 419GB (already up 2GB!) and you're at 388GB so here's the exact tables that are at least a GB in size.

# ls -lh _data/steem/ | egrep '[0-9]G'
total 419G
-rw-r-----. 1 polkitd ssh_keys 255G Mar  1 21:43 sbds_core_blocks.ibd
-rw-r-----. 1 polkitd ssh_keys 1.5G Mar  1 21:43 sbds_tx_claim_reward_balances.ibd
-rw-r-----. 1 polkitd ssh_keys  77G Mar  1 21:43 sbds_tx_comments.ibd
-rw-r-----. 1 polkitd ssh_keys  25G Mar  1 21:43 sbds_tx_custom_jsons.ibd
-rw-r-----. 1 polkitd ssh_keys 2.5G Mar  1 21:43 sbds_tx_transfers.ibd
-rw-r-----. 1 polkitd ssh_keys  57G Mar  1 21:43 sbds_tx_votes.ibd

I don't have a good explanation for why we'd see different sizes, so I'm pretty curious about this.

Authors get paid when people like you upvote their post.
If you enjoyed what you read here, create your account today and start earning FREE STEEM!
Sort Order:  

I just created a new digitalocean droplet (the 16GB droplet) to try and see what the database size should be, and I'm just following the script you show in this post. Hopefully the database size will end up being the same as yours (currently around 419G) or mine (currently around 370G) and not be some new number. Right now the mysql database size per the size of _data/steem on the new droplet is at 5G, and I plan to periodically check on the progress of the database size over the next few days.

One thing I noticed is that "mysql -h 172.17.0.2 -u root -p" gives the error "-bash: mysql: command not found" and I wanted to bring that to your attention because I think it means you'll want to adjust the script you show in your post.

One other comment I wanted to make regarding tests with droplets and volumes is that the storage volumes that are created when creating a droplet aren't actually deleted when the droplet is destroyed and there is a separate step to destroy a volume. I actually had three $50/month volumes that I wasn't using but still being charged for which I incorrectly assumed were deleted when I deleted the corresponding droplet. Fortunately, however, I noticed them and deleted them after only a few days.

I'll be interested to see where the new droplet comes out, like you said, hopefully not a third new size.

And good catch! I've added INSTALL MYSQL CLIENT UTILITIES to the script here now so that'll be installed as well.

Sorry about the extra volumes, I'm glad you caught that early before the cost added up too much. DigitalOcean gets expensive quickly when you get to the larger sizes.

Both my servers are caught up to block 20410475. The _data/steem on the most recent server shows as 437G and the _data/steem on my other server (the one I suspect to be wrong) shows as 377G. it would be nice if your server was at 437G around block 20410475 too, but I'm thinking you probably aren't because you were at 419G four days ago.

So, I was running some queries, joining between sbds_tx_comments and sbds_tx_votes and found that there was no index on permlink in the sbds_tx_votes table so my join was pretty slow.

First of all, adding an index to a 46GB table is basically impossible so I just created a new duplicate table with no data and added an index on that field, then inserted a few thousand records to compare the performance against the primary table. The difference was significant.

So, now I'm carefully re-inserting the entire sbds_tx_votes into my new sbds_tx_votes_copy table with the additional index.

I stopped SBDS to avoid new writes while this is running, and I have a new table now too! So, it'll probably be a day or two before I'm back up to the head_block again, and my numbers might be a little different now with the index, but I'll post an update when that's back up.

And I think I might throw in an ALTER TABLE sbds_tx_votes ADD INDEX... in my script so this is implemented at the start.

Back up to sync so I wanted to get some updated totals for you.

So, I'm a little confused now. My total size is now lower than yours from two days ago!

# du -sh _data/steem/
431G    _data/steem/

Here's the biggest tables. I thought adding an index to sbds_tx_votes would have more of an impact on size, but when I still had both tables they were very close so I'm not sure if that has any impact here.

# ls -lh _data/steem/ | egrep '[0-9]G'
total 431G
-rw-r-----. 1 polkitd ssh_keys 262G Mar  7 12:34 sbds_core_blocks.ibd
-rw-r-----. 1 polkitd ssh_keys 1.6G Mar  7 12:34 sbds_tx_claim_reward_balances.ibd
-rw-r-----. 1 polkitd ssh_keys  80G Mar  7 12:34 sbds_tx_comments.ibd
-rw-r-----. 1 polkitd ssh_keys  25G Mar  7 12:33 sbds_tx_custom_jsons.ibd
-rw-r-----. 1 polkitd ssh_keys 2.7G Mar  7 12:34 sbds_tx_transfers.ibd
-rw-r-----. 1 polkitd ssh_keys  58G Mar  7 12:34 sbds_tx_votes.ibd

Yes, I'm confused by things too. Here is my current data for my most recent server around block number 20475678. I was very careful to follow the script with this server.

du -sh _data/steem/
440G    _data/steem/

ls -lh _data/steem/ | egrep '[0-9]G'
total 440G
-rw-r-----. 1 polkitd ssh_keys 263G Mar  7 20:04 sbds_core_blocks.ibd
-rw-r-----. 1 polkitd ssh_keys 1.7G Mar  7 20:04 sbds_tx_claim_reward_balances.ibd
-rw-r-----. 1 polkitd ssh_keys  82G Mar  7 20:04 sbds_tx_comments.ibd
-rw-r-----. 1 polkitd ssh_keys  28G Mar  7 20:04 sbds_tx_custom_jsons.ibd
-rw-r-----. 1 polkitd ssh_keys 3.0G Mar  7 20:04 sbds_tx_transfers.ibd
-rw-r-----. 1 polkitd ssh_keys  61G Mar  7 20:04 sbds_tx_votes.ibd

Note: I tried comparing this server with the other server I have running sbds to see if I could identify why one was bigger than the other, and I notice that the data contained in the 'raw' field in the 'sbds_core_blocks' table doesn't seem to match on both servers. For example, the 'raw' field for 'block_num' 20470001 on one server is different than the same raw field on the other server.

Speaking of expensive, this hit my bank account yesterday:

digital_ocean_bill.jpg

I was spinning up massive instances and playing around this month, obviously far more than I thought. This definitely was not in the budget so looks like this will be an interesting month.

Hopefully my mistake is a lesson for others to be aware of their usage and avoid surprises like this.

Wow that is a lot of money to spend playing around. Hopefully it will be well worth it for you in the long run as this opens up so many possibilities. Thanks for the reminder about how costs can quickly add up because I'm spending at a $260/month pace, but I should be able to cut that back after my latest droplet test gives us another number for the total database size. So far after 18 hours the new droplet is at 159G so if everything continues to go well, then we may have another number for total steem mysql database size by tomorrow night.

The difference is actually larger since 388GB is 370G. The issue is most likely on my end, but I don't think I did anything that should cause data to be missing other than have steem_mysql shut down on its own a bunch of times.

ls -lh _data/steem/ | egrep '[0-9]G'
total 370G
-rw-r-----. 1 polkitd ssh_keys 243G Mar  1 22:49 sbds_core_blocks.ibd
-rw-r-----. 1 polkitd ssh_keys 1.4G Mar  1 22:49 sbds_tx_claim_reward_balances.ibd
-rw-r-----. 1 polkitd ssh_keys  65G Mar  1 22:49 sbds_tx_comments.ibd
-rw-r-----. 1 polkitd ssh_keys  17G Mar  1 22:49 sbds_tx_custom_jsons.ibd
-rw-r-----. 1 polkitd ssh_keys 2.3G Mar  1 22:49 sbds_tx_transfers.ibd
-rw-r-----. 1 polkitd ssh_keys  40G Mar  1 22:49 sbds_tx_votes.ibd