Linear regression in Excel – continued

Doing linear regression in Excel is simple. Now I’d like to show you how exactly this is done.

Let’s use the same data set:

x 9.42 2.78 5.87 5.49 6.48 4.91 7.82 2.70 2.65 8.42 5.85 0.93
y 52.24 4.15 13.72 7.00 5.28 6.98 42.00 7.15 3.41 31.00 12.59 1.09

Let’s say we want do a order-2 linear regression. That means, we want a function:

f(x)=c*x^2 + b*x+a

a, b and c should be chosen in a way such that if we put x into f(x), the difference/error between f(x) and y, will be minimized, in the least square sense.

So we have the difference R as a function of a,b,c:

R(a,b,c)=(c* 9.42^2 + b* 9.42 + a – 52.24)^2+(c*2.78^2+b*2.78+a-4.15)^2+…+(c*0.93^2+b*0.93+a)

R is smooth every where in space. So R is minimal where:

D(R)/D(a)=0; D(R)/D(b)=0; D(R)/D(c)=0

Now we’ll have to employ the compact form to avoid tedious typing:

equ1

In order for R to be minimal, we need:

equ2

Insert R into the equations and expand them, we get:

equ3

That is:

equ4

Or in matrix form:

equ5

Now this is a very simple equation, so we can simply solve it using Cramer’s rule:

equ6

It looks scary but actually not too complex. We need the sum of x, x^2, x^3, x^4, x*y and x^y. We can actually verify this in excel:

x y x^2 x^3 x^4 x*y x^2*y
9.42 52.24 88.69578 835.323 7866.942 491.974 4633.334
2.78 4.15 7.721688 21.45697 59.62447 11.53932 32.06537
5.87 13.72 34.43254 202.0476 1185.6 80.52797 472.5321
5.49 7.00 30.18894 165.8715 911.3719 38.47401 211.3934
6.48 5.28 41.99053 272.099 1763.204 34.19838 221.6059
4.91 6.98 24.08858 118.227 580.2595 34.23907 168.0457
7.82 42.00 61.22442 479.0568 3748.43 328.6334 2571.426
2.70 7.15 7.269068 19.59829 52.83935 19.27266 51.96143
2.65 3.41 7.003793 18.53532 49.05312 9.028243 23.89296
8.42 31.00 70.90802 597.0945 5027.948 261.0414 2198.149
5.85 12.59 34.27254 200.6409 1174.607 73.71933 431.5732
0.93 1.09 0.869064 0.810173 0.755272 1.014426 0.945685
63.32 186.61 408.665 2930.761 22420.63 1383.662 11016.92

So we have:

calculation_0

calculation_1calculation_2

calculation_3

So in the end we get:

cal_result

This is exactly what Excel displayed on the chart:

trend_line_res

I actually put all this in an Excel file so you can try it yourself. Please note that in the excel file we have a data set of 12 points. If your data set has a different size, you may have to adapt the formulas a little bit. 🙂

Linear regression in Excel

Excel has a very handy feature that does linear regression for you very intuitively.

Let’s say you have a data set like this:

x 9.42 2.78 5.87 5.49 6.48 4.91 7.82 2.70 2.65 8.42 5.85 0.93
y 52.24 4.15 13.72 7.00 5.28 6.98 42.00 7.15 3.41 31.00 12.59 1.09

First, you’ll have to insert a scatter chart:

insert_scatter_chart

You’ll get a chart like this (Excel is smart enough to figure out what should be the x-axis and what should be the y-axis):

scatter_chart_ex1

Now, right click on the dots and then you can select “Add Trendline…” from the pop-up menu.

add_trendline

Then you’ll be presented with a dialog like this:

trend_line_options

Note that Excel has different options for “Linear” and “Polynomial”. Here the option is about the relationship between the variables. All these different options are all Linear Regressions.

Let’s try a order 2 polynomial regression and check the option “Display Equation on chart” and “Display R-squared value on chart”. This is what you’ll get:

order_2_regression

Good thing about doing this in Excel is, you can always change the data set and your chart and equation will be updated automatically.

Depending on the nature of your data set, you may want to try different model to get a better fit, the procedure will be the same.