Hivemind Queries


I installed my own instance of Hivemind so that I could play around with querying the blockchain using Postgres. It took a while to sync. But once it did, oh nelly!!

What is Hivemind? The repo says:

Developer-friendly microservice powering social networks on the Steem blockchain.

Hive is a "consensus interpretation" layer for the Steem blockchain, maintaining the state of social features such as post feeds, follows, and communities. Written in Python, it synchronizes an SQL database with chain state, providing developers with a more flexible/extensible alternative to the raw steemd API.

This means you can bypass steemd and access data in a more traditional way. Business solutions often use SQL for data, but you can't run SQL queries against steemd. Hivemind solves that problem.

Now, in reality, Hivemind has another goal, as mentioned in the @steemitblog application team update, Hivemind/Communities, Sign-Ups, Developer Tools, Condenser, and More!:

Work on Hivemind 1.0 remains a major focus. Over the past few weeks, we have been heavily testing compatibility between hive and condenser. We have also committed significant resources to documenting the steps required for Hivemind integration, which will help community developers deploy and take full advantage of hive once it is ready.

Hivemind will facilitate community front-ends. But the 1.0 version doesn't offer much in the way of new features, and it's hard to convey why it's exciting. It's just a drop-in replacement for something that already works, right? Yes, but in doing so, it takes some of the load off of steemd. And that's a very good thing.

At its core, it does that by making the same data easier to query. But beyond that, if you can run your own Postgres database, you can do some interesting queries of your own.

For example, I want to know what the top 10 apps are, by payout (all time). Well, the query for that looks like this:

SELECT   json::json->>'app' AS app, SUM(payout)
FROM     hive_posts_cache
GROUP BY json::json->>'app'
ORDER BY SUM(payout) DESC
LIMIT    10;

And that gives us the following results:

App              Payout in SBD
steemit/0.1      1225887.103
busy/2.5.3       93830.625
dlive/0.1        86517.867
busy/2.5.4       63678.215
dtube/0.7        55127.314
busy/2.5.2       42774.839
unknown          38544.404
steemhunt/1.0.0  34304.329
esteem/1.6.0     31151.406
dsound/0.3       17792.419

Kinda cool, right?

Or, I can query for specific mentions with certain tags:

SELECT hive_posts.*
FROM   hive_posts
WHERE  ( hive_posts.id ) IN (
  SELECT hive_posts_cache.post_id
  FROM   hive_posts_cache
  WHERE  ( hive_posts_cache.body LIKE '%@inertia%' )
  AND    ( hive_posts_cache.body LIKE '%@whatsup%' ))
AND ( hive_posts.id ) IN (
  SELECT hive_post_tags.post_id
  FROM   hive_post_tags
  WHERE  ( hive_post_tags.tag ) IN ( 'community', 'shoutout' ));

That one's saying that each result must contain both mentions and at least one of the two tags, which gives us the following results:

@steemexperience/update-on-the-steem-experience
@wishmaiden/attention-noobs-come-join-the-voice-chat-community-at-steemspeak-com
@arsenal49/400-followers-steemspeak-randowhale
@vocalists-trail/thursday-shoutout
@steemexperience/3uaqax-update-on-the-steem-experience
@steemexperience/45ncqm-update-on-the-steem-experience
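
One caveat: LIKE is case-sensitive in Postgres, so a body that mentions @Inertia with a capital I would be missed. Swapping in ILIKE, Postgres's case-insensitive variant, fixes that (a sketch of just the mentions subquery):

SELECT hive_posts_cache.post_id
FROM   hive_posts_cache
WHERE  ( hive_posts_cache.body ILIKE '%@inertia%' )
AND    ( hive_posts_cache.body ILIKE '%@whatsup%' );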

Or, let's say we want to query posts that must have all three tags: kitty, pet, and cute:

SELECT hive_posts.* 
FROM   hive_posts 
WHERE  ( hive_posts.id ) IN (
  SELECT hive_post_tags.post_id
  FROM   hive_post_tags
  WHERE  hive_post_tags.tag = 'kitty')
AND ( hive_posts.id ) IN (
  SELECT hive_post_tags.post_id
  FROM   hive_post_tags
  WHERE  hive_post_tags.tag = 'pet')
AND ( hive_posts.id ) IN (
  SELECT hive_post_tags.post_id
  FROM   hive_post_tags
  WHERE  hive_post_tags.tag = 'cute');

I'm really excited about this kind of query because, without SQL, this kind of lookup normally returns a huge result. For example, all posts tagged kitty, plus all posts tagged pet, plus all posts tagged cute would give you 23,844 results. But because I require all three tags on each post, I only get two:

@seoya/my-lovely-kitty-jelly
@justwatchout/8nin82lj
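
Incidentally, the same all-three-tags filter can be written more compactly with GROUP BY/HAVING (a sketch, assuming hive_post_tags holds one row per post/tag pair, as the queries above imply):

SELECT hive_posts.*
FROM   hive_posts
WHERE  ( hive_posts.id ) IN (
  SELECT hive_post_tags.post_id
  FROM   hive_post_tags
  WHERE  hive_post_tags.tag IN ( 'kitty', 'pet', 'cute' )
  GROUP  BY hive_post_tags.post_id
  HAVING COUNT(DISTINCT hive_post_tags.tag) = 3);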

You can also ask for the most upvoted post (at this very moment):

SELECT hive_posts.*
FROM   hive_posts
INNER JOIN hive_posts_cache ON hive_posts_cache.post_id = hive_posts.id
ORDER  BY hive_posts_cache.rshares DESC
LIMIT  1;

... which is this:

@chbartist/right-before-the-daw

... and the most downvoted post:

SELECT hive_posts.*
FROM   hive_posts
INNER JOIN hive_posts_cache ON hive_posts_cache.post_id = hive_posts.id
ORDER  BY hive_posts_cache.rshares ASC
LIMIT  1;

... which is this:

@joanaltres/re-elfspice-dan-larimer-so-insecure-he-has-to-self-vote-to-put-his-posts-that-already-are-getting-enough-votes-to-the-top-of-trending-20170802t061503811z

Notice that the highest upvoted post is distinct from the highest paid post. This is because market prices are a factor, as are quadratic rewards and the fact that this payout pre-dated the voting slider. Here's the query for the highest paid post:

SELECT hive_posts.*
FROM   hive_posts
INNER JOIN hive_posts_cache ON hive_posts_cache.post_id = hive_posts.id
ORDER  BY hive_posts_cache.payout DESC
LIMIT  1;

... which is this:

@xeroc/piston-web-first-open-source-steem-gui---searching-for-alpha-testers

So yeah, I'm excited about Hivemind. It's a great way to look at the blockchain from a community perspective.

Bonus Query:

Here are my 10 most upvoted, ordered by rshares.
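It's the same shape as the most-upvoted query above, just narrowed to one author (a sketch, assuming hive_posts stores the author name directly; flip DESC to ASC for the downvoted list):

SELECT hive_posts.*
FROM   hive_posts
INNER JOIN hive_posts_cache ON hive_posts_cache.post_id = hive_posts.id
WHERE  hive_posts.author = 'inertia'
ORDER  BY hive_posts_cache.rshares DESC
LIMIT  10;

... which returns: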

@inertia/deer-on-the-dock
@inertia/primer-primer
@inertia/ganymede-a-growing-collection-of-steem-web-tools
@inertia/creating-demand-for-steem-power-vote-negation
@inertia/prisma-pumpkin-patch
@inertia/dr-otto-vote-bidding-bot
@inertia/radiator-0-3-4
@inertia/before-and-after
@inertia/profile
@inertia/steemit-the-blockchain

And my 10 most downvoted:

@inertia/re-dantheman-origin-of-the-right-to-vote-and-how-the-system-denies-this-right-20160813t161354289z
@inertia/re-berniesanders-why-did-ned-lie-to-the-steem-community-20170315t004719152z
@inertia/re-fyrstikken-to-the-co-owners-of-steem-i-am-being-stalked-and-flagged-by-bernieslanders-nextgencraphole-every-post-i-make-on-steem-20170406t055739063z
@inertia/re-ats-david-re-inertia-re-jerrybanfield-i-am-sorry-for-my-last-post-20171010t180440642z
@inertia/them-slashdot-trolls
@inertia/them-java-coders
@inertia/re-berniesanders-re-inertia-re-berniesanders-why-did-ned-lie-to-the-steem-community-20170315t010729361z
@inertia/re-cryptopian68-re-inertia-re-haejin-v4dxybd8-20180205t042726761z
@inertia/re-berniesanders-berniesanders-re-inertiare-haejin-v4dxybd8-20180204t235618208z
@inertia/re-ats-witness-long-time-user-first-time-witness-20171121t175253701z


Looking sharp.

One thing I am skeptical about is syncing.

It should be rock-solid and catch up with the chain no matter what the block volume or database size is.

I will certainly play with the project since it's written in my beloved Python.

btw,

It took a while to sync.

How many hours/days?


Yup, the sync routine is rock solid. @roadscape really nails it. Initial sync took me about a week. But that’s because I used api.steemit.com. I wanted to see what the typical hobbyist might experience trying this out, so I didn’t try anything heroic on the initial sync.

I also didn’t set any of the recommended Postgres configurations. Same reason. Hobbyists might skip that too.

It took 3 days for me a couple weeks back. It likely depends on the stability of the RPC node and server specs.

Shouldn't you and other top witnesses be setting price feed bias percentages, with the SBD debt ratio at 6+%? There is too much Steem being printed.

Or is it that you and the witnesses know this and play dumb? Because at 10% the SBD floor will be gone, and a bail-in is just what you guys want, so that users and the community get pickpocketed?

Who else is going to pay to bring that 15+ million SBD debt down? Who is going to burn it?

Looking forward to your answer.

Very exciting.
Is it possible to import your database into another instance of hivemind so that a full sync wouldn't be necessary?

Thanks for sharing!

I think that would be possible. Just have to transfer a 300 GB dump.

Wow, that's big.

Yes @inertia, but someone might say, “I love the idea, but I also love Ruby.” This looks harder than it might need to be. :)

That's an idea. Like some kind of ORM?

Cool post! Thanks for taking the time to share your experiences with the upcoming #hivemind update.

I am not a programmer, but I managed to follow most of what you said. It was very informative!

Hopefully, by making search queries easier, we can see better interactions with the Steem blockchain.

Thanks,
@kabir88

Hivemind is the next big thing. Until we see it more, and use it more, no one gets it... but every time I hear about the possibilities of what Hivemind can bring, I understand the importance of it.

Thank you very much for sharing your experience with it. It is very helpful and valuable... I hope you write more about hivemind as you do more with it.

It is the advanced version of steemd, right?

No, it's taking on some of the role that steemd currently has. But steemd will always continue to provide a focus on blockchain consensus.

I guess there will be more development that could build on Hivemind later on.

No, Hivemind is not meant to replace steemd but to work on top of it. Hivemind is a godsend for those who do not want to rely on SteemSQL, a paid service, for pulling information from the blockchain to run analytics. Hivemind is also meant to operate not far behind the latest block, so the information will be quite up to date.

Being able to pull this data into a SQL database will have massive implications for what we can run on Steemit. We could see better curation programs, ways to enhance our feeds, and ways to find relevant content faster. Hivemind looks very cool; gave you a follow @inertia and I look forward to learning more about this initiative.

Saw this comment below from @chekohler and I wonder:

What would it take to be able to make queries like these to Hivemind straight from a script.js file, like we currently do with steem.api.getDiscussionsByBlog, for example?

Nice one

Being able to pull this data into a SQL database will have massive implications for what we can run on Steemit. We could see better curation programs, ways to enhance our feeds, and ways to find relevant content faster. Hivemind looks very cool; gave you a follow @inertia and I look forward to learning more about this initiative.

That's precisely the motivation for building it in the first place.

That's precisely the
Motivation for building
It in the first place.

                 - markkujantunen


I'm a bot. I detect haiku.

@inertia, do you have an idea of the specs required for just the postgres instance? My fellow flag enthusiasts are interested in the prospect of housing our own relational database of any sort and are trying to get a grasp on the requirements.

Resteemed and will revisit to vote this quality contribution. (VP too low)


I’ve been doing fine on 32GB, 6 cores. It could probably get by with way less, at the expense of performance, obviously.

I run Postgres on my local dev laptop for testing. It never gets in the way.

From a community perspective... it's okay.
I'd like to be a part of it...

Can I run Hivemind on an Ubuntu virtual machine?

Yes. I’m running it on Digital Ocean.

As interesting as they seem... for the clueless like me, steemd is easy ☺

Once 'adopted', will it eliminate the need for the (now pay-to-play) SteemSQL site?

No. Hivemind does not capture everything like SteemSQL does. That’s not the purpose.

I like SteemSQL very much. I’m glad @arcange doesn’t offer his services for free anymore. Hopefully it’s sustainable.

Great work! If I understand the documentation correctly, hive will primarily provide a JSON/HTTP API similar to the steemd condenser_api, possibly with additional methods. Direct database access is only possible because you have it on your own machine, right?
Would you mind sharing the disk space usage of the database as it is right now?

Correct, running it on my own gives me access to more than just the API endpoints.

Right now, it’s taking up about 310 GB disk.