What are the Nash Equilibria in the curation game?

in steemit •  6 years ago  (edited)

Steem pays curation weighted by sqrt( P + R ) - sqrt( P ), where P is the rshares of the previous votes and R is the shares of the current voter. But, curation rewards are multiplied by min( 1, T/15 ), where T is the number of minutes past the post that the voter acts. (I'm not 100% clear whether it's just that voter or the entire curation reward.)

We can build a simplified model of this as an N-player game where all players have equal voting weights. The pure strategies then consist of a vector of N-1 numbers: at what minute to vote, given that "i" players have upvoted already. (If all other players have already voted, then wait for the 15 minutes mark.)

For example, in a 5-player game the strategy [13,14,15,15] says to vote at 13 minutes if there are 0 votes, 14 minutes if there is 1 vote, and at 15 minutes if there are 2, or 3 votes already. This gives 16^4 = 65536 pure strategies.

I was thinking about whether it would be feasible to simulate this game using fictitious play, but that number of pure strategies leads to a very big matrix, and the simplifying assumption that all opponents use the same pure strategy probably leads to incorrect results, so we would end up with an even larger number of pure-strategy evaluations to iterate on a mixed strategy.

However, I think that we can look for Nash Equilibria in the game. These are pure strategies where no player can individually make a move that improves their payout.

In the 2-player game, we can easily see that [15],[15] is not a Nash equilibrium. A player who moves to (14) will see an increased payout. The payout for being first is 0.707 of the curation rewards, while second is 0.293. Voting at minute 14 is worth 0.660 (=0.707 * 14/15) when the other player votes at minute 15, compared to an expected fair split of 0.500 if you both vote at minute 15.

A quick spreadsheet model:

Because the game is not zero-sum, equilibria are not guaranteed to exist and a quick eyeball search didn't turn up any. None of the diagonals are; when they are high, the player or his opponent is better off switching to a lower value, and when they are low, the player is better off giving up and voting at 15 minutes.

[9],[9] is better than [15],[9] for the player, but [8],[8] is worse for the player than [15],[8]. But the opponent is tempted to move later. So, there do not appear to be any pure-strategy equilibria. (And it's way to late for me to calculate the mixed-strategy equilibrium.)

Update

The spreadsheet above assumed that the second player would still vote rather than wait for minute 15. The one below fixes that:

I also calculated the time value that would make the player indifferent between tying and waiting, which is 8.79 minutes. But tying at this value still makes the player motivated to try for 8 minutes instead, so (8.79,8.79) is not an equilibrium.

Authors get paid when people like you upvote their post.
If you enjoyed what you read here, create your account today and start earning FREE STEEM!
Sort Order:  

sqrt( P + R ) + sqrt( P )

I think this should be weight = sqrt( P + R ) - sqrt( P )

I'm not 100% clear whether it's just that voter or the entire curation reward.

The weight of the individual votes is scaled down with the reverse auction system, but the total_weight is calculated without the reverse auction system. The rshares remain unchanged from the reverse auction system. This means that for posts with votes before 15 mins, the sum of the individual vote weights is less that the total_weight. Curation rewards (the STEEM-equivalent of 25% of all rshares) for the individual voters are then payed out with the fraction weight/total_weight, leaving a share of the curation rewards in the pool.

Fixed the plus to minus, thank you. Will have to figure out if more accurately modeling the 15 minute interval makes any difference.

While rewards are still pending, the weights do add up to 100% when you include the author-curate-reward % (which since HF20 goes back to the reward pool).

I guess you mean the weight-% and author-curate-reward-% values as shown on steemd? These values are calculated by the steemd website and are not directly part of the blockchain data. They are calculated from the individual vote weights in relation to the total vote weight and, to my understanding, the author-curate-reward % is exactly a result of the mismatch between the sum of the individual vote weights and the total weight.

This is incorrect, I think. These are the the payouts for an unresponsive strategy rather than always waiting for 15 minutes if you're not first.

I think the expression max(1, T/15) should be min(1, T/15). That is, after 15 minutes, it should be 1.

Yes, thank you.

I studied this some while ago, there should not be an equilibrium. If your opponents are playing optimally it is always better to self-vote. The time to self-vote depends if a minimum reward of 0.001 in curation can reached by anyone voting before you. So given your sp we can compute when you should self-vote to make maximum profit and such that your strategy cannot be attacked.

I'm looking at just the case for voting on a post not authored by any of the players. This is unrealistic in other ways, for example, it ignores opportunity cost of voting on something else instead.

Optimal time in the 2 player game may not be an integer number of minutes, but working with countably or uncountably many strategies was something I hoped to avoid.

in that case there will be an equilibrium as long as you can squeeze all the votes before the 15 minutes I guess





This post has been voted on by the SteemSTEM curation team and voting trail in collaboration with @curie.

If you appreciate the work we are doing then consider voting both projects for witness by selecting stem.witness and curie!

For additional information please join us on the SteemSTEM discord and to get to know the rest of the community!