Pearson correlation coefficients


Karl Pearson was born on March 27, 1857, in London, England. His interest in statistics arose as he developed mathematical methods to study heredity and Darwin's theory of evolution.

Between 1893 and 1912, Pearson produced his most influential work: a series of 18 articles entitled Mathematical Contributions to the Theory of Evolution, which contains contributions to regression analysis and the correlation coefficient and includes the chi-square test of statistical significance. Pearson also coined the term "standard deviation" and developed the method of moments for fitting probability distributions to data.

The concept of the correlation coefficient was developed by Pearson together with Galton, although they did not settle on a formal definition (STANTON, 2001). Following Moore (2007), we can define the correlation coefficient as a measure of "the direction and degree of the linear relationship between two quantitative variables".

The correlation coefficient can be summarised in a single sentence: "Pearson's correlation coefficient (r) is a measure of linear association between two variables". Association refers to how similarly the two variables' values are distributed, and linearity refers to the assumption that a one-unit change in the independent variable X produces a constant change in the dependent variable Y. Thus, Pearson's correlation coefficient requires that the variables share variance and that this shared variation is linear.
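As a quick illustration of the linearity requirement, the sketch below (in Python, using made-up data and NumPy's built-in `corrcoef`) shows that r is close to 1 for a perfectly linear relationship but close to 0 for a perfectly quadratic one, even though Y is completely determined by X in both cases.

```python
import numpy as np

# Illustration with made-up data: r captures only *linear* association.
x = np.linspace(-3, 3, 101)

y_linear = 2 * x + 1          # perfectly linear relationship
y_quadratic = x ** 2          # perfectly dependent, but not linear

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"r for a linear relationship:    {r_linear:+.3f}")    # ~ +1.000
print(f"r for a quadratic relationship: {r_quadratic:+.3f}")  # ~  0.000
```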

The correlation coefficient is given, following Pearson and Galton, by the formula

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \,\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$

where $X_i$ and $Y_i$ represent the sample data; $\bar{X}$ and $\bar{Y}$ are the means of the variables X and Y, respectively; and $n$ is the number of sample elements (the sums of squared deviations in the denominator are, up to a factor of $n-1$, the sample variances of X and Y).
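Below is a minimal sketch of this definitional formula in Python; the small sample and variable names are made-up assumptions for illustration only.

```python
import numpy as np

# Made-up sample data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

dx = x - x.mean()            # deviations of X from its mean
dy = y - y.mean()            # deviations of Y from its mean

# Definitional formula: sum of cross-deviations over the root of the
# product of the sums of squared deviations.
r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
print(f"r = {r:.4f}")        # close to +1: strong positive linear association
```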

Currently, after a simplification of the formula above, we can compute the correlation coefficient as

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s_X}\right)\left(\frac{Y_i - \bar{Y}}{s_Y}\right)$$

where $X_i$ and $Y_i$ represent the sample data; $\bar{X}$ and $\bar{Y}$ are the means of the variables X and Y, respectively; $n$ is the number of sample elements; and $s_X$ and $s_Y$ are the sample standard deviations of X and Y, respectively.
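The simplified (standardised-variables) form can be checked numerically against NumPy's built-in estimate, as in the sketch below, again with made-up data.

```python
import numpy as np

# Made-up sample data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
zx = (x - x.mean()) / x.std(ddof=1)   # standardise X with the sample std (n - 1)
zy = (y - y.mean()) / y.std(ddof=1)   # standardise Y with the sample std (n - 1)

r = np.sum(zx * zy) / (n - 1)
print(f"r (formula)  = {r:.6f}")
print(f"r (corrcoef) = {np.corrcoef(x, y)[0, 1]:.6f}")   # should match
```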

Pearson's correlation coefficient takes values in the interval [-1, 1]. Its sign indicates the direction (positive or negative) of the relationship between the variables, and its magnitude indicates the strength of that relationship.

In general, we can standardise the interpretation of the correlation coefficient as follows: the closer its absolute value is to 1, the stronger the dependence between the variables; the closer it is to zero, the weaker this dependence.
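The sketch below, using randomly generated data, illustrates how the sign and magnitude of r reflect the direction and strength of the relationship; the noise levels chosen are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
noise = rng.normal(size=200)

y_pos = x + 0.2 * noise       # strong positive linear relationship
y_neg = -x + 0.2 * noise      # strong negative linear relationship
y_weak = 0.1 * x + noise      # weak relationship, mostly noise

for label, y in [("strong positive", y_pos),
                 ("strong negative", y_neg),
                 ("weak", y_weak)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{label:>15}: r = {r:+.2f}")
```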

Pearson's correlation coefficient has some properties:

  1. Pearson's correlation coefficient does not distinguish between independent and dependent variables. The coefficient of X with Y is therefore the same as that of Y with X.
  2. The value of the coefficient does not change when the unit of measurement of the variables changes (properties 1 and 2 are illustrated in the sketch after this list).
  3. The coefficient is dimensionless, that is, it has no physical unit attached to it.
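The following sketch illustrates properties 1 and 2 numerically; the temperature readings and the Celsius-to-Fahrenheit conversion are made-up examples.

```python
import numpy as np

x_celsius = np.array([10.0, 15.0, 20.0, 25.0, 30.0])   # temperatures in °C (made up)
y = np.array([3.1, 4.0, 5.2, 5.9, 7.1])                # paired measurements (made up)

# Property 1: the coefficient is symmetric in X and Y.
r_xy = np.corrcoef(x_celsius, y)[0, 1]
r_yx = np.corrcoef(y, x_celsius)[0, 1]

# Property 2: changing units (°C -> °F) does not change r.
x_fahrenheit = x_celsius * 9 / 5 + 32
r_f = np.corrcoef(x_fahrenheit, y)[0, 1]

print(f"r(X, Y)             = {r_xy:.6f}")
print(f"r(Y, X)             = {r_yx:.6f}")   # identical
print(f"r after unit change = {r_f:.6f}")    # identical
```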

Also, some conditions must be satisfied for the correlation coefficient to be meaningful:

a) Correlation requires that the variables be quantitative (continuous or discrete).
b) The observed values should be normally distributed.
c) It is necessary to analyse outliers, since the correlation coefficient is strongly affected by their presence (see the sketch after this list).
d) Independence of observations is required, that is, the occurrence of one observation does not influence the occurrence of another.
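The sketch below illustrates condition (c): a single extreme outlier, appended to otherwise well-behaved made-up data, can change r drastically.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = x + 0.5 * rng.normal(size=30)       # moderately strong linear relationship

r_clean = np.corrcoef(x, y)[0, 1]

# Add a single extreme outlier and recompute.
x_out = np.append(x, 10.0)
y_out = np.append(y, -10.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(f"r without outlier:  {r_clean:+.2f}")     # fairly strong and positive
print(f"r with one outlier: {r_outlier:+.2f}")   # pulled sharply downward
```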

According to Osborne and Waters (2002), violating these conditions may lead the researcher to commit Type I errors (rejecting a true hypothesis) or Type II errors (failing to reject a false hypothesis), which are precisely the errors a researcher wants to minimise in an experiment.

REFERENCES
MOORE, D. S. The Basic Practice of Statistics. New York: Freeman, 2007.
OSBORNE, J.; WATERS, E. Four assumptions of multiple regression that researchers should always test. Practical Assessment, Research & Evaluation, v. 8, n. 2, 2002.
STANTON, J. M. Galton, Pearson, and the peas: A brief history of linear regression for statistics instructors. Journal of Statistics Education, v. 9, n. 3, 2001.
