RE: Curating for Value: How "Follower Network Strength" Improves Steem Post Ranking

You are viewing a single comment's thread from:

Curating for Value: How "Follower Network Strength" Improves Steem Post Ranking

in hive-127586 •  6 months ago 

Follower networks are comprised of active and inactive followers, so all follower counts are not the same. For example, someone who raised 2,000 followers in six months should probably get a higher score than someone who raised 2,000 followers in six years, since a higher percentage of the 2nd author's followers are likely to be inactive.

In principle, I would agree with this, because it can be assumed that new followers are constantly being added to active (even older) accounts.

I have to point out an older quote from you in this context:

I'm seeing that it's probably far too easy to get a top score.

Wouldn't your new approach with a bunch of spam accounts lead to the top even more easily?

and then average the number of followers into a time-based average over the life of the account (i.e. new followers per day, new followers per week, new followers per month, etc.).

I was wondering how you want to determine the number of followers per day (i.e. for a certain period of time), etc. I probably had the same difficulties in understanding as event-horizon. However, a look at your new code revealed to me that you calculate the number over the entire account lifetime.

It would be more interesting and comparable to look at the same time period for all accounts, but this would not be possible with the existing blockchain methods (or only with more network load).
However, Steemchiller's data service could be useful here once again. Among other things, it offers a follower history that can also be filtered by time period:
https://sds0.steemworld.org/followers_api for Follow and Unfollow

Authors get paid when people like you upvote their post.
If you enjoyed what you read here, create your account today and start earning FREE STEEM!
Sort Order:  

Wouldn't your new approach with a bunch of spam accounts lead to the top even more easily?

So far, that doesn't seem to be the case. In general, it has lowered the scores on most of the accounts that I've checked. The big problem with this new scoring has been that there is sort of a dumbbell shaped distribution with a number of the big/old/inactive accounts still retaining a top score. The more recent accounts that I've checked all look fairly reasonable, but there are a number of accounts who still appear to have strong follower networks, even though they've been inactive for 4+ years.

However, Steemchiller's data service could be useful here once again. Among other things, it offers a follower history that can also be filtered by time period:
https://sds0.steemworld.org/followers_api for Follow and Unfollow

Thanks! This might be good way of dealing with that dumbbell shape. I might might not get that in until September, though. I think my next step is to write up a script to collect all the accounts for one day, then score and visualize them so I can gauge the score distribution for the accounts who are actively posting. I'm doubtful about whether I'll find time for both before the end of June.

but there are a number of accounts who still appear to have strong follower networks, even though they've been inactive for 4+ years.

Das wäre allerdings nicht nachvollziehbar.

I might might not get that in until September

There's no rush. I also don't have much time at the moment (when it's warmer outside). There are soo many other tasks on my list :-))

collect all the accounts for one day

Di you really mean all accounts? That would of course be a good basis, especially if the activity is recorded at the same time. Once you have the data, you can also work better with minimal changes without having to request the data every time...

Here are two visualizations of all the accounts that posted or commented during a 24 hour period from early yesterday until early today using the scoring that I had posted last weekend. With my setup, it takes about 6 hours to collect and score a day's worth of accounts. The median score is about 0.225.

Followers / month across the X-axis, median follower reputations along the Y. Grey is the low end of the scale, orange is the top end.

In this visual, the bigger circles have more accounts in the range. The color gradients are the same as above.

Most of the circles outside the main cluster are size=1. The fair-sized circle at [0, 0] was sort of a surprise. I didn't check, but I'm guessing that the big groups with the 61 median rep and low follower counts are nearly all followed by the same account.

I was able to correctly guess the one in the [392-408, 24-26] bucket. I'm guessing that the 0.01 score tells us something about how the account got its ~33k followers.

I think the score already works quite well. The extreme cases at the edges (top and left) will probably be interesting and important for the "settings"... the clusters should be equalised with the "settings" in order to be able to differentiate them better.

I find the value at [48,24] remarkable. Although the follower median is very low and the number of monthly followers is not very high, it has a high score. This must then probably compensate for the low median with an above-average number of followers.

I also find it interesting that the score in the "columns" of the X-axis differs only slightly along the Y-axis. I'm not sure what this means. The score seems to be more influenced by the median than the total number of followers. Does this indicate that over all accounts, the number of new followers could be almost constant, and thus the age of the accounts (for active accounts) could have almost no influence?

I think the score already works quite well.

Agreed. There's certainly still room for improvement, but this feels much better to me than the previous version. The big thing that I wanted to do was make it harder to get a top-score, and I think I probably accomplished that. IMO, top scores should be pretty rare. In general, I also think that the overall distribution of scores is much better than before, but I can't be sure based on the spotty checking that I did with the May 18 version.

I just updated my python script & PowerBI report so I can collect a day's worth of comments now and score them with both the old and new methods and then compare the visualizations. I'll run it overnight so that over the weekend I'll be able to see what the two methods look like side by side with the same set of accounts from a full day of posting/commenting.

I find the value at [48,24] remarkable.

Yeah, I'm not sure what I think about this one. On one hand, 48 followers per month (the real value is 50, but it's in the 48 bin) really is uncommonly high, so the score might not be unreasonable. On the other hand, the follower quality still seems low. I reduced the reputation cut-off back down from 30 to 25 in this version. I might bump it back up, at least part way.

Does this indicate that over all accounts, the number of new followers could be almost constant, and thus the age of the accounts (for active accounts) could have almost no influence?

This is kind-of what I had in mind by switching to followers per month. The hope was that an author would need to keep gaining followers with similar reputations in order to maintain their score. If an account stops gaining followers (or slows down), then the score would decay. As implemented, it seems to me that new accounts might have a small advantage, but that might not be too bad. It could potentially give a little bit of a boost to new arrivals.

TEAM 5

Congratulations! This post has been upvoted through steemcuratorXX We support quality posts, good comments anywhere, and any tags.



Curated by : @soulfuldreamer