Pearson Correlation Formula
The correlation coefficient measures the strength of the relationship between two variables. The Pearson correlation formula shows how closely two sets of data are correlated; specifically, the Pearson correlation coefficient checks for a linear dependency between them. It is also known as the Pearson product-moment correlation coefficient. Its value always lies between -1 and +1. A value of +1 indicates a perfect positive linear correlation, a value of -1 indicates a perfect negative linear correlation, and a value of zero means there is no linear relationship between the two data sets.
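To see these cases numerically, here is a minimal Python sketch (an illustration, not part of the original lesson). The helper name `pearson_r` and the three small data sets are hypothetical. The third data set, \(y = (x-3)^2\), is completely determined by \(x\), yet its coefficient is 0, because the Pearson coefficient only detects linear dependency.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))     # y rises exactly linearly: 1.0
print(pearson_r(xs, [10 - 3 * x for x in xs]))    # y falls exactly linearly: -1.0
print(pearson_r(xs, [(x - 3) ** 2 for x in xs]))  # U-shaped, no linear trend: 0.0
```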
What Is Pearson Correlation Formula?
The Pearson correlation coefficient is symbolised by the letter “r”. The Pearson correlation formula for the coefficient r is given by:
\(r=\frac{n\left(\sum x y\right)-\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n \sum x^{2}-\left(\sum x\right)^{2}\right]\left[n \sum y^{2}-\left(\sum y\right)^{2}\right]}}\)
Where,
\(r=\) Pearson correlation coefficient
\(x=\) Values in the first set of data
\(y=\) Values in the second set of data
\(n=\) Number of pairs of values
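The formula translates almost term by term into code. The Python sketch below is one possible version; the function name `pearson_r_sums` is hypothetical, and the zero-variance check is an added assumption (the formula divides by zero when all the \(x\) values, or all the \(y\) values, are equal).

```python
import math


def pearson_r_sums(xs, ys):
    """Pearson r computed from the raw sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Hypothetical guard: r is undefined when either data set has zero variance
        raise ValueError("r is undefined when either list has zero variance")
    return numerator / denominator


print(pearson_r_sums([1, 2, 3], [2, 4, 6]))  # 1.0
```

Working with integer sums keeps the numerator exact for whole-number data; only the final square root and division introduce rounding.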
Let's work through a few examples based on the Pearson correlation formula.
Solved Examples Using Pearson Correlation Formula
Example 1: A survey was conducted in your city. The following sample data gives each person's age and their corresponding income. Using the correlation coefficient formula, find out whether income increases with age. (Use \(\frac{1}{\sqrt{181}}\) as 0.074 and \(\frac{1}{\sqrt{209}}\) as 0.07.)
Age | 25 | 30 | 36 | 43 |
Income | 30000 | 44000 | 52000 | 70000 |
Solution:
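To simplify the calculation, we divide y by 1000. This does not change r, because the Pearson correlation coefficient is unaffected when a variable is multiplied by a positive constant.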
Age (\(x_i\)) | Income/1000 (\(y_i\)) | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i - \bar{x})^2\) | \((y_i - \bar{y})^2\) | \((x_i - \bar{x})(y_i - \bar{y})\) |
25 | 30 | -8.5 | -19 | 72.25 | 361 | 161.5 |
30 | 44 | -3.5 | -5 | 12.25 | 25 | 17.5 |
36 | 52 | 2.5 | 3 | 6.25 | 9 | 7.5 |
43 | 70 | 9.5 | 21 | 90.25 | 441 | 199.5 |
\(\bar{x} = 33.5\) | \(\bar{y} = 49\) | | | \(\Sigma (x_i - \bar{x})^2 = 181\) | \(\Sigma (y_i - \bar{y})^2 = 836\) | \(\Sigma(x_i-\bar{x})(y_i - \bar{y}) = 386\) |
Pearson correlation coefficient for the sample: \(r = \dfrac{\Sigma (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\Sigma (x_i - \bar{x})^2 \Sigma (y_i - \bar{y})^2 }}\) = \(\dfrac{386}{\sqrt{181}\sqrt{836}}\) = \(\dfrac{193}{\sqrt{181}\sqrt{209}}\) (since \(\sqrt{836} = 2\sqrt{209}\)) \(\approx 0.99\)
Answer: Yes. The Pearson correlation coefficient between age and income is very close to +1, so in this sample a person's income increases almost linearly with age.
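Example 1 can be checked with a short Python sketch. It assumes the last income is 70000 (the value 70 used in the table after dividing by 1000), and the helper name `deviation_sums` is hypothetical. It reproduces the column totals 181, 836 and 386, and confirms that dividing the incomes by 1000 leaves r unchanged.

```python
import math

ages = [25, 30, 36, 43]
incomes = [30000, 44000, 52000, 70000]  # assumed: last value is 70000


def deviation_sums(xs, ys):
    """Return (Σ(x-x̄)², Σ(y-ȳ)², Σ(x-x̄)(y-ȳ)) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


# Work with income in thousands, as in the table
scaled = [y / 1000 for y in incomes]
sxx, syy, sxy = deviation_sums(ages, scaled)
print(sxx, syy, sxy)  # 181.0 836.0 386.0
r_scaled = sxy / math.sqrt(sxx * syy)

# Same calculation on the unscaled incomes
sxx2, syy2, sxy2 = deviation_sums(ages, incomes)
r_raw = sxy2 / math.sqrt(sxx2 * syy2)

print(round(r_scaled, 4), round(r_raw, 4))  # 0.9923 0.9923
```

Without the rounded hints for \(\frac{1}{\sqrt{181}}\) and \(\frac{1}{\sqrt{209}}\), r comes out at about 0.992, which agrees with the answer of 0.99.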
Example 2: The marks obtained by 5 students in science and geometry are given below:
\(\begin{array}{|c|c|c|c|c|c|} \hline \text { Science } & 16 & 15 & 12 & 10 & 8 \\ \hline \text { Geometry } & 11 & 18 & 10 & 20 & 17 \\ \hline \end{array}\)
Calculate the Pearson correlation coefficient.
Solution:
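Construct the following table, taking \(x\) as the science marks and \(y\) as the geometry marks:
\(x\) | \(y\) | \(xy\) | \(x^2\) | \(y^2\) |
16 | 11 | 176 | 256 | 121 |
15 | 18 | 270 | 225 | 324 |
12 | 10 | 120 | 144 | 100 |
10 | 20 | 200 | 100 | 400 |
8 | 17 | 136 | 64 | 289 |
\(\Sigma x = 61\) | \(\Sigma y = 76\) | \(\Sigma xy = 902\) | \(\Sigma x^2 = 789\) | \(\Sigma y^2 = 1234\) |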
The formula for Pearson correlation coefficient is:
\(r=\frac{n\left(\sum x y\right)-\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n \sum x^{2}-\left(\sum x\right)^{2}\right]\left[n \sum y^{2}-\left(\sum y\right)^{2}\right]}}\)
\(r=\frac{5 \times 902-61 \times 76}{\sqrt{\left[5 \times 789-(61)^{2}\right]\left[5 \times 1234-(76)^{2}\right]}}=\frac{-126}{\sqrt{224 \times 394}}\)
\(r \approx -0.424\)
Answer: r = -0.424
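A short Python sketch can reproduce the column totals and the final value for Example 2. The variable names are illustrative; `science` and `geometry` simply hold the two rows of the marks table in order.

```python
import math

science = [16, 15, 12, 10, 8]    # x values
geometry = [11, 18, 10, 20, 17]  # y values
n = len(science)

# Column totals of the working table
sum_x = sum(science)
sum_y = sum(geometry)
sum_xy = sum(x * y for x, y in zip(science, geometry))
sum_x2 = sum(x * x for x in science)
sum_y2 = sum(y * y for y in geometry)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 61 76 902 789 1234

# Plug the totals into the Pearson formula
numerator = n * sum_xy - sum_x * sum_y  # 5*902 - 61*76 = -126
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))  # sqrt(224*394)
r = numerator / denominator
print(round(r, 3))  # -0.424
```

The negative sign means that, in this small sample, higher science marks tend to go with lower geometry marks, although the relationship is far from perfectly linear.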