The Product Moment Correlation Coefficient

Summary

  • Formula for correlation: r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, where S_{xy} = \sum xy - n\bar{x}\bar{y}, S_{xx} = \sum x^2 - n\bar{x}^2 and S_{yy} = \sum y^2 - n\bar{y}^2
  • r > 0: positive correlation
  • r < 0: negative correlation
  • r = 0: no (zero) linear correlation
  • r = +1 and r = -1: perfect positive and perfect negative correlation, respectively
  • A strong relationship between the variables means that the data points lie close to the line of best fit

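The interdependence of two variables is known as correlation. Correlation is measured by the coefficient of correlation, which is denoted by "r", and its numerical value ranges from -1 to +1.
It tells us the strength of the relationship and whether the relationship between the two variables is positive or negative. Its formula is:

r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left[\sum x^2 - n\bar{x}^2\right]\left[\sum y^2 - n\bar{y}^2\right]}}

where n is the number of paired observations and \bar{x} and \bar{y} are the means of x and y.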

1. Positive correlation:

  • Occurs when both variables move in the same direction: they rise together or fall together
  • E.g. the sale of ice cream and the temperature: as the temperature increases more ice cream is sold, and as the temperature decreases less ice cream is sold
  • The expenditure of a family depends on its income. If income falls, expenditure also falls, and vice versa.

2. Negative correlation:

  • Occurs when the variables move in opposite directions. They are considered to have an inverse relationship
  • E.g. as the amount of exercise you do increases, your weight tends to decrease
  • As the price of a product increases its demand falls, and as its price decreases its demand increases.

3. If the correlation is found to be 0, the variables X and Y are uncorrelated: they have no linear dependency on each other. This does not by itself prove that they are independent, because a non-linear relationship can also give r = 0.

  • E.g. the prices of shoes and jeans have nothing in common: if the price of jeans rises or falls, it has no effect on the price of shoes, so we can say that they have zero correlation with each other

4. If the coefficient of correlation is -1 it is considered a perfect negative correlation and if the correlation is +1 then it is considered a perfect positive correlation. The closer the value is to -1 or +1 the stronger the relationship is considered to be.
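
The four cases above can be illustrated with a short sketch. The helper name `pearson_r` and the small data sets are made up for illustration only; they are assumptions, not part of the original notes. The third data set shows that r = 0 only rules out a linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustration data (not from the worked example).
xs = [-2, -1, 0, 1, 2]
r_pos = pearson_r(xs, [2 * x + 1 for x in xs])   # points on a rising line
r_neg = pearson_r(xs, [5 - 3 * x for x in xs])   # points on a falling line
r_zero = pearson_r(xs, [x * x for x in xs])      # y depends on x, but not linearly

print(r_pos, r_neg, r_zero)
```

The first two values come out as +1 and -1 because the points lie exactly on a straight line; the third is 0 even though y is completely determined by x.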

Example #1

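For each of 10 weeks, x is the number of workers on duty and y is the loss to the company.

Week    1    2    3    4    5    6    7    8    9   10
x       9   11   12   13   15   18   16   14   12   10
y     420  350  360  300  225  200  230  280  315  410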

\sum { x } = 130

\sum { y } = 3090

\sum { xy } = 38305

\sum { { x }^{ 2 } } = 1760

\sum { { y }^{ 2 } } = 1007750
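
The totals can be checked directly from the weekly data. This is a minimal sketch in which the lists `x` and `y` are the table rows typed in by hand; week 1 is entered as y = 420, the value consistent with the totals listed above. The variable names are illustrative assumptions.

```python
# Weekly observations typed in from the table (week 1 taken as y = 420).
x = [9, 11, 12, 13, 15, 18, 16, 14, 12, 10]
y = [420, 350, 360, 300, 225, 200, 230, 280, 315, 410]

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
```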

a) Find the correlation coefficient.

We will first find the mean of x and y:

\bar{x} = \frac{130}{10} = 13

\bar{y} = \frac{3090}{10} = 309

Next, we substitute these values into the formula for the correlation coefficient given above:

r = \frac{38305 - 10(13)(309)}{\sqrt{\left[1760 - 10(13)^2\right]\left[1007750 - 10(309)^2\right]}} = \frac{-1865}{\sqrt{70 \times 52940}} = -0.9688
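
The same arithmetic can be reproduced from the five totals alone. This is a sketch under the assumption that the totals are entered by hand; the names `s_xy`, `s_xx` and `s_yy` are chosen here for the three corrected sums and are not taken from the original notes.

```python
import math

n = 10
sum_x, sum_y = 130, 3090
sum_xy, sum_x2, sum_y2 = 38305, 1760, 1007750

x_bar = sum_x / n                      # mean of x
y_bar = sum_y / n                      # mean of y

s_xy = sum_xy - n * x_bar * y_bar      # numerator of r
s_xx = sum_x2 - n * x_bar ** 2         # corrected sum of squares for x
s_yy = sum_y2 - n * y_bar ** 2         # corrected sum of squares for y

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))  # -0.9688
```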

b) Calculate the least square regression line.

Use the formula for the slope that we discussed in the linear regression chapter to calculate this line.

b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}

b = \frac{-1865}{70} = -26.64

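Now calculate "a". The regression line passes through the point of means (\bar{x}, \bar{y}) = (13, 309), so substitute these values into the equation below.

y = a + bx

309 = a - 26.64(13)

a = 655.36 (computed with the unrounded slope b = -1865/70; the rounded slope -26.64 gives 655.32)

y = 655.36 – 26.64x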
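
As a check on the slope and intercept, here is a minimal sketch that recomputes both from the totals. It assumes the totals are entered by hand and keeps full precision until the final rounding, which is why the intercept rounds to 655.36.

```python
n = 10
sum_x, sum_y = 130, 3090
sum_xy, sum_x2 = 38305, 1760

x_bar = sum_x / n
y_bar = sum_y / n

# Slope: corrected sum of products over corrected sum of squares of x.
b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
# Intercept: the line passes through the point of means.
a = y_bar - b * x_bar

print(round(b, 2), round(a, 2))  # -26.64 655.36
```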

c) Based on the regression line, what is the predicted loss to the company when there are 17 workers on duty? Would you trust this value? Justify your answer.

Since the number of workers is the x variable, we replace x with 17 in the regression equation and calculate the loss to the company.

y = 655.36 – 26.64(17)

y = 202.48

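The prediction can be trusted for two reasons. First, x = 17 lies within the range of observed x values (9 to 18), so the prediction is an interpolation, and the predicted y of 202.48 is not an outlier because it falls inside the range of the observed losses. Second, the correlation coefficient is close to -1, representing a strong negative linear correlation.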
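
One way to make this check explicit is sketched below. The function name `predict` and the flag `is_interpolation` are illustrative assumptions; the coefficients are the rounded values from part b).

```python
# Observed numbers of workers on duty (the x row of the data).
x_observed = [9, 11, 12, 13, 15, 18, 16, 14, 12, 10]

a, b = 655.36, -26.64   # rounded intercept and slope from part b)

def predict(x):
    """Predicted loss from the fitted line y = a + bx."""
    return a + b * x

x_new = 17
y_hat = predict(x_new)

# A prediction is an interpolation when x_new lies inside the observed x range.
is_interpolation = min(x_observed) <= x_new <= max(x_observed)

print(round(y_hat, 2), is_interpolation)  # 202.48 True
```

A value of x outside 9 to 18 would make `is_interpolation` false, and the prediction would then be an extrapolation that deserves less trust.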
