Equations

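$$y_{\text{prediction}}(x_0, x_1, ..., x_{m-1}) = w_0 x_0 + w_1 x_1 + ... + w_{m-1} x_{m-1}$$
$$\text{MSE} = \frac{1}{2n} \sum_{i=0}^{n-1} \left( y_{\text{actual}} - (w_0 x_0 + w_1 x_1 + ... + w_{m-1} x_{m-1}) \right)^2$$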
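
The two formulas above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the names `predict` and `mse` and the toy data are assumptions made for this example, and `x0 = 1` is used as a hypothetical bias feature.

```python
def predict(weights, features):
    """y_prediction = w0*x0 + w1*x1 + ... + w_{m-1}*x_{m-1} for one sample."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights, samples, targets):
    """MSE = 1/(2n) * sum over the n samples of (y_actual - y_prediction)^2."""
    n = len(samples)
    total = sum((y - predict(weights, x)) ** 2 for x, y in zip(samples, targets))
    return total / (2 * n)


# Hypothetical toy data generated by y = 1*x0 + 2*x1, with x0 fixed at 1.
samples = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
targets = [3.0, 5.0, 7.0]

loss_exact = mse([1.0, 2.0], samples, targets)  # weights that fit the data exactly
loss_zero = mse([0.0, 0.0], samples, targets)   # all-zero weights
print(loss_exact, loss_zero)
```

With the exact weights every residual is zero, so the loss is zero; with all-zero weights the loss is (9 + 25 + 49) / 6.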

Gradient of the MSE:

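$$\nabla \text{MSE}(w_0, w_1, ..., w_{m-1}) = \left\langle \frac{\partial \text{MSE}}{\partial w_0}, \frac{\partial \text{MSE}}{\partial w_1}, ..., \frac{\partial \text{MSE}}{\partial w_{m-1}} \right\rangle$$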

Let’s take the first partial derivative as an example.

$$u = \left( y_{\text{actual}} - (w_0 x_0 + w_1 x_1 + ... + w_{m-1} x_{m-1}) \right)$$
$$\text{MSE} = \frac{1}{2n} \sum_{i=0}^{n-1} u^2$$

Using the chain rule:

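$$\frac{\partial \text{MSE}}{\partial w_0} = \frac{\partial \text{MSE}}{\partial u} \frac{\partial u}{\partial w_0}$$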

Differentiating $u^2$ brings down a factor of 2, which cancels the 2 in the denominator.

$$\frac{\partial \text{MSE}}{\partial u} = \frac{1}{n} \sum_{i=0}^{n-1} u$$

Only $-x_0$ remains when $u = \left( y_{\text{actual}} - (w_0 x_0 + w_1 x_1 + ... + w_{m-1} x_{m-1}) \right)$ is differentiated with respect to $w_0$, since every term that does not contain $w_0$ is treated as a constant.

$$\frac{\partial u}{\partial w_0} = -x_0$$

Returning to the chain-rule expression:

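$$\frac{\partial \text{MSE}}{\partial w_0} = \frac{\partial \text{MSE}}{\partial u} \frac{\partial u}{\partial w_0}$$
$$\frac{\partial \text{MSE}}{\partial w_0} = -\frac{1}{n} \sum_{i=0}^{n-1} (u) \, x_0$$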

Substitute u.

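$$\frac{\partial \text{MSE}}{\partial w_0} = -\frac{1}{n} \sum_{i=0}^{n-1} \left( y_{\text{actual}} - (w_0 x_0 + w_1 x_1 + ... + w_{m-1} x_{m-1}) \right) x_0$$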

Substitute the $y_{\text{prediction}}$ function.

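$$\frac{\partial \text{MSE}}{\partial w_0} = -\frac{1}{n} \sum_{i=0}^{n-1} \left( y_{\text{actual}} - y_{\text{prediction}} \right) x_0$$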
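
One way to sanity-check this partial derivative is to compare it with a central finite difference of the MSE. The sketch below does that in plain Python. The function names, the toy data, the starting weights, and the step size `eps` are all assumptions chosen for illustration. Note the leading minus sign, which comes from differentiating `u` with respect to `w0`.

```python
def predict(weights, features):
    """y_prediction = w0*x0 + w1*x1 + ... for one sample."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights, samples, targets):
    """MSE = 1/(2n) * sum of squared residuals."""
    n = len(samples)
    return sum((y - predict(weights, x)) ** 2 for x, y in zip(samples, targets)) / (2 * n)


def partial_w0(weights, samples, targets):
    """Analytic dMSE/dw0 = -(1/n) * sum((y_actual - y_prediction) * x0)."""
    n = len(samples)
    return -sum((y - predict(weights, x)) * x[0] for x, y in zip(samples, targets)) / n


# Hypothetical toy data and weights.
samples = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
targets = [3.0, 5.0, 7.0]
weights = [0.5, -1.0]

# Central difference: nudge only w0 up and down by eps.
eps = 1e-6
w_up = [weights[0] + eps, weights[1]]
w_down = [weights[0] - eps, weights[1]]
numeric = (mse(w_up, samples, targets) - mse(w_down, samples, targets)) / (2 * eps)

analytic = partial_w0(weights, samples, targets)
print(analytic, numeric)
```

For this data the residuals are 3.5, 6.5, and 9.5 and `x0` is always 1, so the analytic value is -(3.5 + 6.5 + 9.5) / 3 = -6.5. The numeric estimate should agree up to floating-point rounding.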

Then do this for all the weights:

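$$\frac{\partial \text{MSE}}{\partial w_1} = -\frac{1}{n} \sum_{i=0}^{n-1} \left( y_{\text{actual}} - y_{\text{prediction}} \right) x_1$$
...
$$\frac{\partial \text{MSE}}{\partial w_{m-1}} = -\frac{1}{n} \sum_{i=0}^{n-1} \left( y_{\text{actual}} - y_{\text{prediction}} \right) x_{m-1}$$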
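
Collecting all the partial derivatives gives the full gradient vector. A minimal sketch in plain Python follows, assuming the samples are stored as a list of feature lists. The function names and the toy data are hypothetical and exist only for this example.

```python
def predict(weights, features):
    """y_prediction = w0*x0 + w1*x1 + ... for one sample."""
    return sum(w * x for w, x in zip(weights, features))


def gradient(weights, samples, targets):
    """Return [dMSE/dw0, ..., dMSE/dw_{m-1}].

    Component j is -(1/n) * sum((y_actual - y_prediction) * x_j).
    """
    n = len(samples)
    m = len(weights)
    # Residuals y_actual - y_prediction are shared by every component.
    residuals = [y - predict(weights, x) for x, y in zip(samples, targets)]
    return [
        -sum(r * x[j] for r, x in zip(residuals, samples)) / n
        for j in range(m)
    ]


# Hypothetical toy data generated by y = 1*x0 + 2*x1, with x0 fixed at 1.
samples = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
targets = [3.0, 5.0, 7.0]

grad = gradient([0.5, -1.0], samples, targets)        # away from the minimum
grad_at_fit = gradient([1.0, 2.0], samples, targets)  # at the exact fit
print(grad, grad_at_fit)
```

At the weights that fit the data exactly, every residual is zero, so each component of the gradient vanishes. Away from that point, each component is the negated average of the residuals weighted by the corresponding feature.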