Exploratory Analysis: DTube Correlation Hypothesis to test for Independence in Voting

in dtube •  6 years ago 

DTube is a decentralized video streaming app built on top of the STEEM blockchain.  It currently receives a delegation from @misterdelegation to help support and grow the project.  This delegation is used to reward content creators with upvotes that carry a value.

But do we know whether all the votes given by DTube are independent and free from bias?

Repository:

https://github.com/dtube

Aim of Analysis 

The aim of this analysis is to explore whether there is a correlation between voting by DTube and other data points contained within the blockchain.  A strong correlation would indicate that DTube's voting may be biased towards that data point. However, correlation does not equal causation, so a strong positive correlation would only be an indicator that further analysis is needed.  No correlation, or a very weak one, would suggest independent voting, and no further analysis would be required.

What is Correlation?

Correlation is used to test the relationship between two variables: it measures how they are related.  The correlation coefficient is a number between -1 and 1 that measures the strength of the relationship.

This type of analysis is useful when you want to establish if there are possible connections between two variables.

Correlation can be charted as shown below, where one variable is charted against a second.  A strong positive correlation will show a line or curve increasing as you move right across the chart as seen in 1 below.

A strong negative correlation will show a line or curve decreasing as you move right across the chart as seen in 4 below. 

 

In addition to charting correlation, one can also calculate the strength of the relationship.  This is known as the correlation coefficient.

The coefficient ranges between -1 and +1 and quantifies the direction and strength of the linear association between the two variables. The correlation between two variables can be positive (i.e., higher levels of one variable are associated with higher levels of the other) or negative (i.e., higher levels of one variable are associated with lower levels of the other).

The sign of the correlation coefficient indicates the direction of the association. The magnitude of the correlation coefficient indicates the strength of the association.

For example, a correlation of r = 0.9 suggests a strong, positive association between two variables, whereas a correlation of r = -0.2 suggests a weak, negative association. A correlation close to zero suggests no linear association between two continuous variables.
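To make the coefficient concrete, here is a minimal Python sketch of the Pearson correlation coefficient. It is not the calculation used in this post (that was done in Excel); the function name `pearson_r` and the sample numbers are made up purely for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement term and the two spread terms (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Made-up illustrative data, not taken from the DTube analysis.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with x
falling = [10, 8, 6, 4, 2]   # moves against x

print(pearson_r(x, rising))   # 1.0
print(pearson_r(x, falling))  # -1.0
```

The sign of the result gives the direction and the magnitude gives the strength, exactly as described above.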

What is hypothesis testing?

A hypothesis is an educated guess about something in the world around us. It should be testable, either by experiment or observation. Hypothesis testing is the use of statistics to test the results of a survey or experiment to see if you have meaningful results. Basically you are testing whether your results are valid by working out the odds that your results have happened by chance. If your results may have happened by chance, the experiment won’t be repeatable and so has little use.

When carrying out hypothesis testing, a hypothesis question and a null hypothesis are put forward.  Here, the null hypothesis states the outcome in which the answer to the hypothesis question is NOT true.
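One common way to work out "the odds that your results have happened by chance" for a correlation is a permutation test. The sketch below is an illustration only and was not part of this analysis: it shuffles one variable many times and counts how often a random pairing gives a correlation at least as strong as the observed one. Note that it uses "no association" as its chance model. The helper names, the shuffle count, the seed and the data are all assumptions made for the example.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    return cov / spread

def chance_probability(xs, ys, n_shuffles=2000, seed=0):
    """Share of random pairings whose |r| is at least the observed |r|."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)      # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles

# Made-up data with an obvious upward trend, for illustration only.
x = list(range(1, 21))
y = [2 * v + (1 if v % 2 else -1) for v in x]

p = chance_probability(x, y)
print(p)  # a small value means the observed correlation is unlikely to be chance
```

A small probability says the observed relationship is unlikely to be a fluke; a large one says random pairings produce similar values easily.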

Correlation Hypothesis to test for Independence in Voting – Analysis

Test 1:

Null Hypothesis: There is a strong positive correlation between the voting weight used by DTube and the % of app posts supported for each creator.

Alternative Hypothesis: There is not a strong positive correlation between the voting weight used by DTube and the % of app posts supported for each creator.

Variables:

  • average voting weight used for each creator voted on by DTube.
  • % of posts made to DTube by each creator that are supported by DTube votes.

If the average weight used increases with the % of posts supported, there would be a correlation between the weight used and the overall proportion of posts DTube supports for each creator.

Correlation coefficient = 0.22 – suggests a weak positive correlation. Null hypothesis is therefore rejected.
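As a rough illustration of how the two variables in Test 1 can be built per creator, here is a small Python sketch. The actual modeling was done in Power BI; the row layouts, the function name `creator_variables` and the sample rows below are hypothetical. The only detail taken from the queries in this post is that a vote weight is divided by 10000 to turn it into a fraction.

```python
from collections import defaultdict

# Hypothetical rows: (post_id, author) for posts made via DTube.
posts = [(1, "alice"), (2, "alice"), (3, "alice"), (4, "alice"), (5, "bob"), (6, "bob")]
# Hypothetical rows: (post_id, weight) for votes cast by DTube.
votes = [(1, 2500), (2, 7500), (5, 10000)]

def creator_variables(posts, votes):
    """Return {author: (average vote weight as a fraction, share of posts supported)}."""
    author_of = {post_id: author for post_id, author in posts}
    post_count = defaultdict(int)
    for _, author in posts:
        post_count[author] += 1

    weights = defaultdict(list)
    voted_posts = defaultdict(set)
    for post_id, weight in votes:
        author = author_of.get(post_id)
        if author is None:
            continue  # vote on a post outside the sample
        weights[author].append(weight / 10000)
        voted_posts[author].add(post_id)

    # Only creators who received at least one vote appear in the result.
    return {
        author: (sum(w) / len(w), len(voted_posts[author]) / post_count[author])
        for author, w in weights.items()
    }

result = creator_variables(posts, votes)
print(result["alice"])  # (0.5, 0.5)
print(result["bob"])    # (1.0, 0.5)
```

Each creator contributes one point (average weight, share supported), and the correlation coefficient is then calculated across all of those points.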

 

Test 2:

Null Hypothesis: There is a strong positive correlation between the voting weight used by DTube and the % of posts made by each creator on the blockchain that relate to DTube.

Alternative Hypothesis: There is not a strong positive correlation between the voting weight used by DTube and the % of posts made by each creator on the blockchain that relate to DTube.

Variables: 

  • average voting weight used for each creator voted on by DTube.
  • % of all posts made to the blockchain by each creator that are made via DTube.

If the average weight used increases with the % of posts there would be a correlation. 

Correlation coefficient = 0.105 – suggests a weak positive correlation. Null hypothesis is therefore rejected. 

 

Test 3:

Null Hypothesis: There is a strong positive correlation between the voting weight used by DTube and the owned STEEM Power by each creator.

Alternative Hypothesis: There is not a strong positive correlation between the voting weight used by DTube and the owned STEEM Power by each creator.

Variables: 

  • average voting weight used for each creator voted on by DTube.
  • SP Owned by each creator. 

If the average weight used increases with owned SP there would be a correlation.

Correlation coefficient = 0.1408 – suggests a weak positive correlation. Null hypothesis is therefore rejected.

 

Test 4:

Null Hypothesis: There is a strong positive correlation between the voting weight used by DTube and the account age for each creator.

Alternative Hypothesis: There is not a strong positive correlation between the voting weight used by DTube and the account age for each creator.

Variables: 

  • average voting weight used for each creator voted on by DTube.
  • account age in days for each creator. 

If the average weight used increases with account age in days there would be a correlation.

Correlation coefficient = 0.126 – suggests a weak positive correlation. Null hypothesis is therefore rejected.

 

Test 5:

Null Hypothesis: There is a strong positive correlation between the % of DTube posts supported for each creator and the % of posts made to the blockchain by the creators that are DTube posts.

Alternative Hypothesis: There is not a strong positive correlation between the % of DTube posts supported for each creator and the % of posts made to the blockchain by the creators that are DTube posts.

Variables: 

  • % of DTube posts supported for each creator by DTube.
  • % of all posts made to the blockchain by each creator that are DTube posts.

If the % of DTube posts supported for each creator by DTube increases with the % of all posts made to the blockchain by each creator that are DTube posts, there would be a correlation.

Correlation coefficient = 0.07 – suggests a very weak positive correlation. Null hypothesis is therefore rejected.

 

Test 6:

Null Hypothesis: There is a strong positive correlation between the % of DTube posts supported for each creator and the SP owned by each creator.

Alternative Hypothesis: There is not a strong positive correlation between the % of DTube posts supported for each creator and the SP Owned by each creator.

Variables: 

  • % of DTube posts supported for each creator by DTube.
  • SP Owned by each creator.

If the % of DTube posts supported for each creator by DTube increases with the SP owned there would be a correlation.

Correlation coefficient = -0.013 – suggests an extremely weak negative correlation. Null hypothesis is therefore rejected.

 

Test 7:

Null Hypothesis: There is a strong positive correlation between the % of DTube posts supported for each creator and the age of account in days.

Alternative Hypothesis: There is not a strong positive correlation between the % of DTube posts supported for each creator and the age of account in days.

Variables: 

  • % of DTube posts supported for each creator by DTube.
  • The age of account in days.

If the % of DTube posts supported for each creator by DTube increases with the age of account in days there would be a correlation. 

Correlation coefficient = -0.019 – suggests an extremely weak negative correlation. Null hypothesis is therefore rejected.

Conclusion

From the 7 hypothesis tests, we have not found any strong correlations between the share of posts supported or the weight of the vote given and any of the comparable data points. All of the null hypotheses have been rejected. This is a strong indication that DTube votes independently of these variables and is not biased towards particular account types.  

The correlations found are weak at most and below a threshold that would require further analysis. As all null hypotheses were rejected, no deeper analysis work is required.
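For a quick side-by-side view, the seven coefficients reported above can be collected and the largest one picked out. The Python sketch below only restates the numbers from Tests 1 to 7. Squaring r is an extra step added here for illustration: it gives the share of variation that a straight-line relationship would account for.

```python
# Correlation coefficients as reported in Tests 1 to 7 above.
reported_r = {
    "Test 1": 0.22,
    "Test 2": 0.105,
    "Test 3": 0.1408,
    "Test 4": 0.126,
    "Test 5": 0.07,
    "Test 6": -0.013,
    "Test 7": -0.019,
}

# The test with the largest magnitude, regardless of sign.
strongest = max(reported_r, key=lambda name: abs(reported_r[name]))
print(strongest, reported_r[strongest])        # Test 1 0.22

# r squared for that test: share of variation a linear fit would account for.
print(round(reported_r[strongest] ** 2, 3))    # 0.048
```

Even the strongest of the seven relationships would account for under 5% of the variation between creators.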

The Data and the queries

The data was collected, transformed and modeled using Power BI. However, Power BI does not include the functions needed for statistical analysis like the one above, so I transferred the modeled data into Excel to carry out the correlation calculations and produce the visualizations.

All data was collected using SteemSQL, which is held and managed by @arcange.

The M queries used to collect and transform the data are:

All Posts

let
    // Top-level posts only (depth = 0). BETWEEN is inclusive, so posts created on 2018-08-01 are included.
    Source = Sql.Database("vip.steemsql.com", "DBSteem", [Query="select id, author, created, category, total_payout_value from comments#(lf) where CONVERT(DATE,created) BETWEEN '2018-07-01' AND '2018-08-01' and depth = 0"]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"created", type date}})
in
    #"Changed Type"

 

Account Data

let
    // Strips the "VESTS" suffix, converts the share columns to numbers, then derives
    // controlling shares (own + received - delegated out) and SP using a fixed rate of 0.000495 SP per VEST.
    Source = Sql.Database("vip.steemsql.com", "DBSteem", [Query="select  name, created, vesting_shares, delegated_vesting_shares, received_vesting_shares from accounts#(lf) "]),
    #"Replaced Value" = Table.ReplaceValue(Source,"VESTS","",Replacer.ReplaceText,{"vesting_shares", "delegated_vesting_shares", "received_vesting_shares"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Replaced Value",{{"vesting_shares", type number}, {"delegated_vesting_shares", type number}, {"received_vesting_shares", type number}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Changed Type1",{{"created", type date}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "controlling vesting shares", each [vesting_shares]+[received_vesting_shares]-[delegated_vesting_shares]),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom",{{"controlling vesting shares", type number}}),
    #"Added Custom1" = Table.AddColumn(#"Changed Type2", "Controlling SP", each [controlling vesting shares]*.000495),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Owned SP", each [vesting_shares]*.000495),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom2",{{"Controlling SP", Currency.Type}, {"Owned SP", Currency.Type}})
in
    #"Changed Type3"
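The SP calculation in the Account Data query can be restated in a few lines of Python. This is only a sketch of the same arithmetic: the 0.000495 rate is the fixed value hard-coded in the query above, and the function names and sample balances are made up for the example.

```python
SP_PER_VEST = 0.000495  # fixed conversion rate copied from the query above

def parse_vests(text):
    """Turn a string such as '2000000.000000 VESTS' into a float."""
    return float(text.replace("VESTS", "").strip())

def steem_power(vesting_shares, delegated_vesting_shares, received_vesting_shares):
    """Return owned and controlling SP for one account."""
    own = parse_vests(vesting_shares)
    delegated_out = parse_vests(delegated_vesting_shares)
    received = parse_vests(received_vesting_shares)
    # Controlling shares: what the account owns, plus what it was lent, minus what it lent out.
    controlling = own + received - delegated_out
    return {
        "owned_sp": own * SP_PER_VEST,
        "controlling_sp": controlling * SP_PER_VEST,
    }

# Hypothetical account balances, for illustration only.
sp = steem_power("2000000.000000 VESTS", "500000.000000 VESTS", "0.000000 VESTS")
print(sp)
```

Tests 3 and 6 use the owned figure, so delegations in or out do not affect those two variables.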

Posts via APP/TAG

let
    // Note: Text.Contains is case-sensitive by default, so the filter below matches "Dtube" but not "dtube";
    // passing Comparer.OrdinalIgnoreCase as a third argument would make it case-insensitive.
    Source = Sql.Database("vip.steemsql.com", "DBSteem", [Query="select  id, author, created, category, total_payout_value, json_metadata from comments#(lf) where CONVERT(DATE,created) BETWEEN '2018-07-01' AND '2018-08-01' and depth = 0"]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"created", type date}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each Text.Contains([json_metadata], "Dtube"))
in
    #"Filtered Rows"
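The filter step above is a plain substring match on json_metadata. As a hedged illustration of how much the matching rule matters, the Python sketch below compares a case-insensitive substring check with a stricter check on an "app" field. The field names "app" and "tags" and the sample strings are assumptions made for this example, not values taken from the dataset.

```python
import json

def mentions_dtube(json_metadata):
    """Case-insensitive substring check on the raw metadata text."""
    return "dtube" in (json_metadata or "").lower()

def posted_via_dtube(json_metadata):
    """Stricter check: the (assumed) "app" field starts with "dtube"."""
    try:
        meta = json.loads(json_metadata)
    except (TypeError, ValueError):
        return False  # missing or malformed metadata
    if not isinstance(meta, dict):
        return False
    return str(meta.get("app", "")).lower().startswith("dtube")

# Hypothetical metadata strings, for illustration only.
via_app = '{"tags": ["dtube", "music"], "app": "dtube/0.7"}'
tag_only = '{"tags": ["dtube"], "app": "steemit/0.1"}'

print("Dtube" in via_app)          # False: a case-sensitive match misses lowercase text
print(mentions_dtube(via_app))     # True
print(mentions_dtube(tag_only))    # True: the tag alone is enough for a substring match
print(posted_via_dtube(tag_only))  # False: the post did not come from the app
```

Which rule is appropriate depends on whether "DTube posts" should mean posts made through the app or any post carrying the tag.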

Votes by APP

let
    // Votes cast by the dtube account. weight is stored in basis points (10000 = 100%), hence the division below.

    Source = Sql.Database("vip.steemsql.com", "DBSteem", [Query="select  *#(lf)from TXvotes#(lf) where CONVERT(DATE,timestamp) BETWEEN '2018-07-01' AND '2018-08-01'#(lf)and voter = 'dtube' "]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"timestamp", type date}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "% weight", each [weight]/10000),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"% weight", Percentage.Type}})
in
    #"Changed Type1"



Hi @paulag

Well this is good news!

Thank you for the detailed explanation of correlation and hypothesis, adding this detail in prior to the analysis certainly helps with understanding of the contribution.

I'm glad to observe that there are only weak and extremely weak correlations observed for all data points tested. No doubt @dtube staff have taken a look at this and are also happy with the results found.

I enjoyed this exploratory analysis contribution!

Asher [analysis - community manager]



Hot diggity damn, that is a wellspring of information there @paulag.

So the TLDR / my understanding is that dtube votes without any apparent bias?

Fantastic news! Would love to see this kind of analysis run on the top witnesses, looking at where their votes come from (for the witnesses) and the votes coming from them (in the form of outgoing votes), to see if there is any correlation there.

Much love.

You know me and my love of numbers lol, so I tried desperately to understand it all, and truthfully, skipped to the conclusion/ cut to the chase haha. I am seriously very happy to see these results though @paulag; it's nice to know there's no bias with their voting!


Interesting analysis. It would be great to see this data for all the big apps.

"however Power BI does not include the functions needed for Statistical analysis like the one above"

Sounds like a good use case to try out the R integration with Power BI 😀

One day I will be like you, @paulag: able to analyze various systems and share the results using utopian-io. That is my ambition. Even if it is only a dream, that's fine. Hehehehe
