Equations
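$$y_{prediction}(x_0, x_1, \ldots, x_{m-1}) = w_0 x_0 + w_1 x_1 + \ldots + w_{m-1} x_{m-1}$$

$$MSE = \frac{1}{2n} \sum_{i=0}^{n-1} \left(y_{actual} - (w_0 x_0 + w_1 x_1 + \ldots + w_{m-1} x_{m-1})\right)^2$$

Here $n$ is the number of samples, and each term inside the sum is evaluated at the $i$-th sample.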
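As a rough illustration of the two formulas above, here is a minimal pure-Python sketch. The function names `predict` and `mse` and the sample data are assumptions made up for this example; they are not part of the original text.

```python
def predict(weights, features):
    """Weighted sum w0*x0 + w1*x1 + ... + w(m-1)*x(m-1) for one sample."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights, samples, targets):
    """Half mean squared error: 1/(2n) times the sum of squared residuals."""
    n = len(samples)
    total = 0.0
    for features, y_actual in zip(samples, targets):
        residual = y_actual - predict(weights, features)
        total += residual ** 2
    return total / (2 * n)


# Made-up data (assumption): two samples with two features each.
samples = [[1.0, 2.0], [3.0, 4.0]]
targets = [5.0, 11.0]
weights = [1.0, 1.0]

loss = mse(weights, samples, targets)
print(loss)  # residuals are 2 and 4, so (4 + 16) / (2 * 2) = 5.0
```

The factor 1/2 in front of the mean is kept on purpose so that it matches the MSE definition used here.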
Gradient of the MSE:
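$$\nabla MSE(w_0, w_1, \ldots, w_{m-1}) = \left(\frac{\partial MSE}{\partial w_0}, \frac{\partial MSE}{\partial w_1}, \ldots, \frac{\partial MSE}{\partial w_{m-1}}\right)$$

Let's take the first partial derivative as an example. Define $u$ as the residual:

$$u = y_{actual} - (w_0 x_0 + w_1 x_1 + \ldots + w_{m-1} x_{m-1})$$

$$MSE = \frac{1}{2n} \sum_{i=0}^{n-1} u^2$$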
Using the chain rule:
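$$\frac{\partial MSE}{\partial w_0} = \frac{\partial MSE}{\partial u} \cdot \frac{\partial u}{\partial w_0}$$

The factor of 2 from the power rule cancels the 1/2 in the MSE:

$$\frac{\partial MSE}{\partial u} = \frac{1}{n} \sum_{i=0}^{n-1} u$$

When differentiating $u = y_{actual} - (w_0 x_0 + w_1 x_1 + \ldots + w_{m-1} x_{m-1})$ with respect to $w_0$, every term that does not contain $w_0$ is treated as a constant, so only $-x_0$ remains. The minus sign appears because the prediction is subtracted from $y_{actual}$.

$$\frac{\partial u}{\partial w_0} = -x_0$$

Back to the chain rule expression:

$$\frac{\partial MSE}{\partial w_0} = \frac{\partial MSE}{\partial u} \cdot \frac{\partial u}{\partial w_0}$$

$$\frac{\partial MSE}{\partial w_0} = -\frac{1}{n} \sum_{i=0}^{n-1} u \cdot x_0$$

Substitute $u$:

$$\frac{\partial MSE}{\partial w_0} = -\frac{1}{n} \sum_{i=0}^{n-1} \left(y_{actual} - (w_0 x_0 + w_1 x_1 + \ldots + w_{m-1} x_{m-1})\right) \cdot x_0$$

Substitute the $y_{prediction}$ function:

$$\frac{\partial MSE}{\partial w_0} = -\frac{1}{n} \sum_{i=0}^{n-1} (y_{actual} - y_{prediction}) \cdot x_0$$

Then do the same for all the weights:

$$\frac{\partial MSE}{\partial w_1} = -\frac{1}{n} \sum_{i=0}^{n-1} (y_{actual} - y_{prediction}) \cdot x_1$$

$$\vdots$$

$$\frac{\partial MSE}{\partial w_{m-1}} = -\frac{1}{n} \sum_{i=0}^{n-1} (y_{actual} - y_{prediction}) \cdot x_{m-1}$$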
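One way to sanity-check the partial derivatives is to compare them with a finite-difference approximation of the MSE. The sketch below is only an illustration: the names `mse_gradient` and `numerical_gradient`, the step size `eps`, and the data are assumptions for this example. Differentiating $u$ with respect to $w_j$ gives $-x_j$, so the analytic gradient in the code carries a leading minus sign, and the numerical comparison confirms that sign.

```python
def predict(weights, features):
    """Weighted sum of the features for one sample."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights, samples, targets):
    """Half mean squared error, 1/(2n) * sum of squared residuals."""
    n = len(samples)
    squared = ((y - predict(weights, f)) ** 2 for f, y in zip(samples, targets))
    return sum(squared) / (2 * n)


def mse_gradient(weights, samples, targets):
    """Analytic gradient: dMSE/dw_j = -(1/n) * sum(residual * x_j)."""
    n = len(samples)
    grad = [0.0] * len(weights)
    for features, y_actual in zip(samples, targets):
        residual = y_actual - predict(weights, features)
        for j, x_j in enumerate(features):
            grad[j] += -residual * x_j
    return [g / n for g in grad]


def numerical_gradient(weights, samples, targets, eps=1e-6):
    """Central finite differences, used only to cross-check the formula."""
    grad = []
    for j in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[j] += eps
        down[j] -= eps
        grad.append((mse(up, samples, targets) - mse(down, samples, targets)) / (2 * eps))
    return grad


# Made-up data (assumption): two samples with two features each.
samples = [[1.0, 2.0], [3.0, 4.0]]
targets = [5.0, 11.0]
weights = [1.0, 1.0]

analytic = mse_gradient(weights, samples, targets)
numeric = numerical_gradient(weights, samples, targets)
print(analytic)  # [-7.0, -10.0]
print(numeric)   # approximately the same values
```

Both gradients are negative here because the predictions (3 and 7) are below the targets (5 and 11), so increasing either weight lowers the error.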