Linear Regression

Summary

  • A regression line helps us connect two or more variables; its equation is y = a + bx, where a is the y-intercept and b is the gradient
  • To find the value of b we use the following formula:
    b = \frac{ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} }{ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 }
  • An explanatory variable is an independent variable
  • A response variable is a dependent variable

Regression is a widely used data-analysis technique. It helps us understand the connection between two or more variables (dependent and independent) by means of a linear equation. When dealing with paired data, we draw a scatter diagram and study the graph to inspect any relationship between the two variables.

Example #1

Q.

Student   1     2     3     4     5     6     7     8
Height    65    64    62    69    67    65    61    67
Weight    105   115   110   140   120   135   104   130

We have been given the heights and weights of students from a school. We first draw the scatter diagram for this data, taking height on the X axis since it is the independent variable and weight on the Y axis since it is the dependent variable. The scatter diagram shows that height and weight have a positive linear relationship.

After drawing the scatter diagram, we draw a line of best fit by eye, passing as close as possible to the points. The best-fit line should also pass through the mean point, which in this case is approximately (65, 120).
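The mean point can be checked directly from the table. The following is a minimal Python sketch; the list names are chosen here for illustration and are not part of the original example.

```python
from statistics import mean

# Heights and weights of the eight students in Example #1
heights = [65, 64, 62, 69, 67, 65, 61, 67]
weights = [105, 115, 110, 140, 120, 135, 104, 130]

# The mean point is (mean of x, mean of y)
mean_point = (mean(heights), mean(weights))
print(mean_point)
```

The mean weight works out to 119.875, which rounds to the 120 used for the mean point.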

Another way of finding the regression line is to use the least squares regression formula. Using this formula ensures that the line is calculated from the data rather than estimated by eye. The regression line equation of y on x is as follows.

y = a + bx

Where a = y-intercept

Also the value of y when x = 0

b = gradient

Also the change in y when we increase the value of x by 1 unit

In order to find b in the regression equation we use the following formula:

b = \frac{ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} }{ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 }
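The formula for b translates almost line by line into code. Below is a small Python sketch, not a library routine: the function name `least_squares_line` and its error handling are assumptions made for illustration. It returns the y-intercept a and the gradient b, finding a from the fact that the fitted line passes through the mean point.

```python
def least_squares_line(xs, ys):
    """Return (a, b): a is the y-intercept and b is the gradient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of the products x_i * y_i
    sum_x2 = sum(x * x for x in xs)              # sum of the squares x_i^2

    denominator = sum_x2 - n * x_bar ** 2
    if denominator == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")

    b = (sum_xy - n * x_bar * y_bar) / denominator
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


# Usage with the height and weight data from Example #1
heights = [65, 64, 62, 69, 67, 65, 61, 67]
weights = [105, 115, 110, 140, 120, 135, 104, 130]
a, b = least_squares_line(heights, weights)
print(f"a = {a:.3f}, b = {b:.2f}")
```

For the Example #1 data the gradient comes out at about 3.98, which agrees with the positive relationship seen in the scatter diagram.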

We are also sometimes asked which of the two variables, x or y, is the explanatory variable and which is the response variable.

An explanatory variable is defined as an independent variable, whose value is not affected by any other variable.

On the other hand, a response variable is defined as a dependent variable, whose value depends on the other variables.

Example #2

Q. We are given the loss a company makes in each of 10 weeks as it changes the number of workers each week.

a) Find the regression line
b) Identify the dependent and independent variables
c) If the company has 20 workers, estimate the loss it makes

Week     Number of workers (xi)    Loss (yi)
1        9                         420
2        11                        350
3        12                        360
4        13                        300
5        15                        225
6        18                        200
7        16                        230
8        14                        280
9        12                        315
10       10                        410
Total    130                       3090

a) First, we add two more columns, xy and x^2, which will help us in finding the regression line.

Week     Number of workers (xi)    Loss (yi)    xy       x^2
1        9                         420          3780     81
2        11                        350          3850     121
3        12                        360          4320     144
4        13                        300          3900     169
5        15                        225          3375     225
6        18                        200          3600     324
7        16                        230          3680     256
8        14                        280          3920     196
9        12                        315          3780     144
10       10                        410          4100     100
Total    130                       3090         38305    1760
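The two extra columns and their totals can be computed directly from the data. A minimal Python sketch follows; the list names are chosen here for illustration.

```python
# Number of workers (x) and loss (y) for the 10 weeks in Example #2
workers = [9, 11, 12, 13, 15, 18, 16, 14, 12, 10]
losses = [420, 350, 360, 300, 225, 200, 230, 280, 315, 410]

# Build the xy and x^2 columns row by row
xy = [x * y for x, y in zip(workers, losses)]
x_squared = [x * x for x in workers]

print("sum of x   =", sum(workers))
print("sum of y   =", sum(losses))
print("sum of xy  =", sum(xy))
print("sum of x^2 =", sum(x_squared))
```

The totals needed by the formula for b are the sum of the xy column (38305) and the sum of the x^2 column (1760).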

Next we calculate “b” by substituting values into the formula for b. The formula needs the sum of the products, \sum x_i y_i = 38305, and the sum of the squares, \sum x_i^2 = 1760, not the product or the square of the column totals. We also need the two means, with n = 10:

\bar{x} = \frac{130}{10} = 13

\bar{y} = \frac{3090}{10} = 309

b = \frac{ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} }{ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 }

= \frac{ 38305 - 10(13)(309) }{ 1760 - 10(13)^2 }

= \frac{ 38305 - 40170 }{ 1760 - 1690 } = \frac{ -1865 }{ 70 } \approx -26.64

In order to find “a” we use the fact that the regression line passes through the mean point, which gives the following equation:

\bar{y} = a + b\bar{x}

Putting the values into this equation we can calculate “a”:

309 = a + (-\frac{1865}{70})(13)

a = 309 + 346.36

a \approx 655.36

We now put these values into the regression line equation:

y = 655.36 - 26.64x

The negative gradient means that the loss falls by about 26.64 for each additional worker.

b) From the definitions above, the independent variable is the explanatory variable and the dependent variable is the response variable.

Since the loss depends on the number of workers, the loss is the dependent variable.

Since the number of workers does not depend on the loss, the number of workers is the independent variable.

c) Now that we have our regression line we can estimate the loss of the company. We know that the number of workers (x) is 20, so putting it in the equation we get:

y = 655.36 - 26.64(20)

y = 655.36 - 532.80

y \approx 122.56

So the estimated loss with 20 workers is roughly 123. Note that x = 20 lies outside the observed range of 9 to 18 workers, so this estimate is an extrapolation and should be treated with caution.
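The arithmetic for parts (a) and (c) can be checked from the raw table with a short script. This is a minimal Python sketch under the same least squares formula; the variable names are chosen here for illustration.

```python
# Number of workers (x) and loss (y) for the 10 weeks in Example #2
workers = [9, 11, 12, 13, 15, 18, 16, 14, 12, 10]
losses = [420, 350, 360, 300, 225, 200, 230, 280, 315, 410]

n = len(workers)
x_bar = sum(workers) / n
y_bar = sum(losses) / n

# Sum of products and sum of squares, as required by the formula for b
sum_xy = sum(x * y for x, y in zip(workers, losses))
sum_x2 = sum(x * x for x in workers)

b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # gradient
a = y_bar - b * x_bar                                          # y-intercept

# Part (c): estimated loss when the company has 20 workers
predicted_loss = a + b * 20

print(f"b = {b:.2f}, a = {a:.2f}, loss at 20 workers = {predicted_loss:.2f}")
```

Because the script keeps b and a unrounded, its prediction can differ slightly from a hand calculation that rounds the coefficients first.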

Reference
  1. J. S. Abdey, Statistics 1