Doing linear regression in Excel is simple. Now I’d like to show you how exactly this is done.
Let’s use the same data set:
x | 9.42 | 2.78 | 5.87 | 5.49 | 6.48 | 4.91 | 7.82 | 2.70 | 2.65 | 8.42 | 5.85 | 0.93 |
y | 52.24 | 4.15 | 13.72 | 7.00 | 5.28 | 6.98 | 42.00 | 7.15 | 3.41 | 31.00 | 12.59 | 1.09 |
Let’s say we want do a order-2 linear regression. That means, we want a function:
f(x)=c*x^2 + b*x+a
a, b and c should be chosen in a way such that if we put x into f(x), the difference/error between f(x) and y, will be minimized, in the least square sense.
So we have the difference R as a function of a,b,c:
R(a,b,c)=(c* 9.42^2 + b* 9.42 + a – 52.24)^2+(c*2.78^2+b*2.78+a-4.15)^2+…+(c*0.93^2+b*0.93+a)
R is smooth every where in space. So R is minimal where:
D(R)/D(a)=0; D(R)/D(b)=0; D(R)/D(c)=0
Now we’ll have to employ the compact form to avoid tedious typing:
In order for R to be minimal, we need:
Insert R into the equations and expand them, we get:
That is:
Or in matrix form:
Now this is a very simple equation, so we can simply solve it using Cramer’s rule:
It looks scary but actually not too complex. We need the sum of x, x^2, x^3, x^4, x*y and x^y. We can actually verify this in excel:
x | y | x^2 | x^3 | x^4 | x*y | x^2*y |
9.42 | 52.24 | 88.69578 | 835.323 | 7866.942 | 491.974 | 4633.334 |
2.78 | 4.15 | 7.721688 | 21.45697 | 59.62447 | 11.53932 | 32.06537 |
5.87 | 13.72 | 34.43254 | 202.0476 | 1185.6 | 80.52797 | 472.5321 |
5.49 | 7.00 | 30.18894 | 165.8715 | 911.3719 | 38.47401 | 211.3934 |
6.48 | 5.28 | 41.99053 | 272.099 | 1763.204 | 34.19838 | 221.6059 |
4.91 | 6.98 | 24.08858 | 118.227 | 580.2595 | 34.23907 | 168.0457 |
7.82 | 42.00 | 61.22442 | 479.0568 | 3748.43 | 328.6334 | 2571.426 |
2.70 | 7.15 | 7.269068 | 19.59829 | 52.83935 | 19.27266 | 51.96143 |
2.65 | 3.41 | 7.003793 | 18.53532 | 49.05312 | 9.028243 | 23.89296 |
8.42 | 31.00 | 70.90802 | 597.0945 | 5027.948 | 261.0414 | 2198.149 |
5.85 | 12.59 | 34.27254 | 200.6409 | 1174.607 | 73.71933 | 431.5732 |
0.93 | 1.09 | 0.869064 | 0.810173 | 0.755272 | 1.014426 | 0.945685 |
63.32 | 186.61 | 408.665 | 2930.761 | 22420.63 | 1383.662 | 11016.92 |
So we have:
So in the end we get:
This is exactly what Excel displayed on the chart:
I actually put all this in an Excel file so you can try it yourself. Please note that in the excel file we have a data set of 12 points. If your data set has a different size, you may have to adapt the formulas a little bit. 🙂