Correlation Coefficient Formula
In statistics, correlation describes the relationship, or association, between two variables. The correlation coefficient formula is used to calculate the correlation coefficient, which measures the strength and direction of the relationship between the two variables. The correlation coefficient always lies between -1 and 1. A negative correlation coefficient indicates an inverse relationship: as one variable increases, the other tends to decrease. A positive correlation coefficient indicates a direct relationship: both variables tend to increase together. A correlation coefficient of zero indicates that there is no linear correlation between the two variables. There are many types of correlation coefficients. Among them, the Pearson correlation coefficient (PCC) is the most common. Let us explore how to calculate the correlation coefficient for a given population or sample below.
What Is the Correlation Coefficient Formula?
Pearson Correlation Coefficient Formula:
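1. Population Correlation Coefficient
The formula for the Pearson correlation coefficient for a population of size N (written as ρ_{X, Y}) is given as:
\( \rho_{X, Y} = \dfrac{\text{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{\sum^{N}_{i=1}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum^{N}_{i=1} (X_i - \bar{X})^2} \sqrt{\sum^{N}_{i=1} (Y_i - \bar{Y})^2}} \)
where cov(X, Y) is the covariance of X and Y, \( \text{cov}(X,Y) = \dfrac{\sum^{N}_{i=1}(X_i - \bar{X})(Y_i - \bar{Y})}{N} \), σ_{X} is the standard deviation of X, and σ_{Y} is the standard deviation of Y. Both standard deviations also use the divisor N, for example \( \sigma_X = \sqrt{\dfrac{\sum^{N}_{i=1}(X_i - \bar{X})^2}{N}} \), so the factors of N cancel and the ratio reduces to the second form of the formula.
Here X and Y are two random variables, and \(\bar{X}\) and \(\bar{Y}\) are their population means.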
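As a minimal sketch, the population formula can be written in Python by computing the covariance and both standard deviations with divisor N. The function name `population_pearson` is hypothetical, and the code assumes two equal-length lists of numbers in which neither list is constant (a constant list has zero standard deviation, so the coefficient is undefined).

```python
import math

def population_pearson(xs, ys):
    """Population Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y), all with divisor N."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance: mean of the products of deviations (divide by N).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations (also divide by N).
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sigma_x * sigma_y)

# Example 1's population data, after dividing both variables by 100.
print(population_pearson([6, 8, 10], [12, 10, 20]))
```

The printed value is about 0.756, matching 2/√7 from Example 1.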
2. Sample Correlation Coefficient
The formula for the Pearson correlation coefficient for a sample of size n (written as r_{xy}) is given as:
\( r_{xy} = \dfrac{\sum^{n}_{i=1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum^{n}_{i=1}(x_i - \bar{x})^2} \sqrt{ \sum^{n}_{i=1}(y_i - \bar{y})^2 }} \)
where n is the sample size, x_{i} and y_{i} are the i^{th} sample points, and x̄ and ȳ are the sample means of the random variables X and Y respectively.
3. Linear Correlation Coefficient
It uses Pearson's correlation coefficient to measure the linear relationship between two variables. Its value lies between -1 and 1. For a sample of n pairs, it is given as:
\(r = \dfrac{ n(\Sigma xy) - (\Sigma x)(\Sigma y) }{\sqrt{[n \Sigma x^2 - (\Sigma x)^2][n\Sigma y^2 - (\Sigma y)^2]}}\)
where n is the sample size, Σxy is the sum of the products x_{i}y_{i}, Σx and Σy are the sums of the x-values and y-values, and Σx^2 and Σy^2 are the sums of their squares. This is the sample formula rewritten in terms of raw sums, so it does not require computing the means x̄ and ȳ first.
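A sketch of the raw-sum form, assuming plain Python lists of numbers; `pearson_raw_sums` is a hypothetical name. It should agree with the deviation-based formula up to floating-point rounding.

```python
import math

def pearson_raw_sums(xs, ys):
    """Pearson r from the raw-sum form:
    (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx**2) * (n*Syy - Sy**2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)                  # Σx, Σy
    sxy = sum(x * y for x, y in zip(xs, ys))   # Σxy
    sxx = sum(x * x for x in xs)               # Σx²
    syy = sum(y * y for y in ys)               # Σy²
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

print(pearson_raw_sums([6, 8, 10], [12, 10, 20]))
```

One design note: with large values, the subtractions n Σxy - (Σx)(Σy) and n Σx² - (Σx)² can lose precision to cancellation in floating point, so code usually prefers the deviation form; the raw-sum form is mainly convenient for hand calculation.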
The sign of r indicates the direction of the linear relationship between the variables, and its magnitude indicates the strength.
If r is near 1, then the two variables have a strong positive linear relationship.
If r is near 0, then the two variables have little or no linear relationship.
If r is near -1, then the two variables have a strong negative linear relationship.
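A quick illustration of these three cases, using a small hypothetical helper `pearson` (deviation form) and made-up data chosen so the results are exact: points on an increasing line, points on a decreasing line, and points on the parabola y = x², which depend on x perfectly but not linearly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2 * x + 1 for x in xs]))         # increasing line: 1.0
print(pearson(xs, [-3 * x + 10 for x in xs]))       # decreasing line: -1.0
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # y = x**2: 0.0
```

The last case shows that r near 0 rules out only a linear relationship, not every kind of dependence.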
Let us see the applications of the correlation coefficient formula in the following section.

Example 1. Given the following population data, find the Pearson correlation coefficient between x and y. (Take \(\frac{1}{\sqrt{7}}\) as 0.378.)
| x | y |
|---|---|
| 600 | 1200 |
| 800 | 1000 |
| 1000 | 2000 |

Solution:
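To simplify the calculation, we divide both x and y by 100. This does not change the result, because the correlation coefficient is unaffected when either variable is multiplied by a positive constant.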
| x/100 (x_i) | y/100 (y_i) | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i - \bar{x})^2\) | \((y_i - \bar{y})^2\) | \((x_i - \bar{x})(y_i - \bar{y})\) |
|---|---|---|---|---|---|---|
| 6 | 12 | -2 | -2 | 4 | 4 | 4 |
| 8 | 10 | 0 | -4 | 0 | 16 | 0 |
| 10 | 20 | 2 | 6 | 4 | 36 | 12 |
| \(\bar{x} = 8\) | \(\bar{y} = 14\) | | | \(\Sigma (x_i - \bar{x})^2 = 8\) | \(\Sigma (y_i - \bar{y})^2 = 56\) | \(\Sigma (x_i - \bar{x})(y_i - \bar{y}) = 16\) |

Using the correlation coefficient formula,
Pearson correlation coefficient for the population = \(\dfrac{\Sigma (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\Sigma (x_i - \bar{x})^2 \Sigma (y_i - \bar{y})^2 }}\) = \(\dfrac{16}{\sqrt{8}\sqrt{56}}\) = \(\dfrac{2}{\sqrt{7}}\) = 0.756
Answer: Pearson correlation coefficient = 0.756
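Example 1 can be checked with a short Python sketch. The helper `pearson` is hypothetical; comparing the two calls confirms that dividing both variables by 100, as done in the solution, leaves the coefficient unchanged.

```python
import math

def pearson(xs, ys):
    """Pearson correlation from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [600, 800, 1000]
y = [1200, 1000, 2000]

r_raw = pearson(x, y)                                           # original units
r_scaled = pearson([v / 100 for v in x], [v / 100 for v in y])  # divided by 100
print(round(r_raw, 3), round(r_scaled, 3))  # 0.756 0.756
```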

Example 2.
A survey was conducted in your city. The following sample data gives each person's age and their corresponding income. Use the correlation coefficient formula to find out whether an increase in age is associated with an increase in income. (Use \(\frac{1}{\sqrt{181}}\) as 0.074 and \(\frac{1}{\sqrt{209}}\) as 0.07.)
| Age | Income |
|---|---|
| 25 | 30000 |
| 30 | 44000 |
| 36 | 52000 |
| 43 | 70000 |

Solution:
To simplify the calculation, we take age as x and income as y, and divide y by 1000.
| Age (x_i) | Income/1000 (y_i) | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i - \bar{x})^2\) | \((y_i - \bar{y})^2\) | \((x_i - \bar{x})(y_i - \bar{y})\) |
|---|---|---|---|---|---|---|
| 25 | 30 | -8.5 | -19 | 72.25 | 361 | 161.5 |
| 30 | 44 | -3.5 | -5 | 12.25 | 25 | 17.5 |
| 36 | 52 | 2.5 | 3 | 6.25 | 9 | 7.5 |
| 43 | 70 | 9.5 | 21 | 90.25 | 441 | 199.5 |
| \(\bar{x} = 33.5\) | \(\bar{y} = 49\) | | | \(\Sigma (x_i - \bar{x})^2 = 181\) | \(\Sigma (y_i - \bar{y})^2 = 836\) | \(\Sigma (x_i - \bar{x})(y_i - \bar{y}) = 386\) |

Pearson correlation coefficient for the sample = \(\dfrac{\Sigma (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\Sigma (x_i - \bar{x})^2 \Sigma (y_i - \bar{y})^2 }}\) = \(\dfrac{386}{\sqrt{181}\sqrt{836}}\) = \(\dfrac{193}{\sqrt{181}\sqrt{209}}\) = 0.99
Answer: Yes, with the increase in age a person's income increases as well, since the Pearson correlation coefficient between age and income is very close to 1.
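The table for Example 2 can be reproduced step by step in Python. This is a sketch only; variable names such as `incomes_k` (income in thousands) are illustrative.

```python
import math

ages = [25, 30, 36, 43]
incomes_k = [30, 44, 52, 70]  # income divided by 1000, as in the table

n = len(ages)
x_bar = sum(ages) / n        # 33.5
y_bar = sum(incomes_k) / n   # 49.0
dx = [x - x_bar for x in ages]
dy = [y - y_bar for y in incomes_k]

s_xx = sum(d * d for d in dx)               # 181.0
s_yy = sum(d * d for d in dy)               # 836.0
s_xy = sum(a * b for a, b in zip(dx, dy))   # 386.0

r = s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))
print(round(r, 2))  # 0.99
```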