Multiple Linear Regression models

Steps for Multiple Linear Regression

1. Model

\(y_i = b_0+b_1x_{1i}+b_2x_{2i}+ \cdots + b_kx_{ki}+e_i\)
In matrix notation: \(y=Xb+e\)

2. Parameter estimation

\(b = (X^TX)^{-1}(X^Ty)\)

Remember to add a column of 1’s to the beginning of \(X\) so that \(b_0\) acts as the intercept.

3. Allocation of variation

\(SSY=\sum^n_{i=1}y^2_i\)
\(SS0 = n\bar{y}^2\)
\(SST=SSY-SS0\)
\(SSE = y^Ty-b^TX^Ty\)
\(SSR = SST-SSE\)

4. Coefficient of determination

\(R^2=\frac{SSR}{SST}=\frac{SST-SSE}{SST}\)

5. Coefficient of multiple correlation

\(R = \sqrt{\frac{SSR}{SST}}\)

6. Degrees of freedom

\(SST=SSY-SS0=SSR+SSE\)
\(n-1 = n - 1 = k + (n-k-1)\)

7. Analysis of variance

\(MSR=\frac{SSR}{k};MSE=\frac{SSE}{n-k-1}\)

8. Standard deviation of errors

\(s_e=\sqrt{MSE}\)

9. Standard deviation of parameters

\(s_{b_j}=s_e\sqrt{C_{jj}}\) where \(C_{jj}\) is the \(j\)th diagonal term of \(C=(X^TX)^{-1}\)

10. Prediction: Mean of m future observations

\(\hat{y}_p=b_0+b_1x_{1p}+b_2x_{2p}+ \cdots + b_kx_{kp}\) or \(\hat{y}_p=x^T_pb\). Remember to prepend the bias term 1 to \(x_p\).

11. Standard deviation of prediction

\(s_{\hat{y}_p}=s_e\sqrt{\frac{1}{m}+x^T_p(X^TX)^{-1}x_p}\)

12. All confidence intervals are computed using \(t_{[1-\alpha/2;n-k-1]}\)

13. Correlations among predictors

\(R_{x_1x_2}=\frac{\Sigma x_{1i}x_{2i}-n\bar{x}_1\bar{x}_2}{\sqrt{\Sigma x^2_{1i}-n\bar{x}^2_1}\sqrt{\Sigma x^2_{2i}-n\bar{x}^2_{2}}}\)
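The correlation formula in step 13 can be sketched in R as follows. This is a minimal illustration rather than a definitive implementation: it borrows the two predictor columns from Example 15.1, and the names `num`, `den`, `r_x1x2`, and `r_builtin` are introduced here purely for illustration. The built-in `cor()` is used only as a cross-check.

```r
# Predictor columns from Example 15.1
DiskIOs    <- c(14, 16, 27, 42, 39, 50, 83)
MemorySize <- c(70, 75, 144, 190, 210, 235, 400)
n <- length(DiskIOs)

# Step 13 formula written out term by term
num <- sum(DiskIOs * MemorySize) - n * mean(DiskIOs) * mean(MemorySize)
den <- sqrt(sum(DiskIOs^2) - n * mean(DiskIOs)^2) *
  sqrt(sum(MemorySize^2) - n * mean(MemorySize)^2)
r_x1x2 <- num / den

# Cross-check against R's built-in Pearson correlation
r_builtin <- cor(DiskIOs, MemorySize)
print(c(formula = r_x1x2, builtin = r_builtin))
```

A value close to 1 indicates that the two predictors are strongly correlated, in which case the individual parameter estimates should be interpreted with care.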

Example 15.1: Seven programs were monitored to observe their resource demands. In particular, the number of disk I/O’s, memory size (in kilobytes), and CPU time (in milliseconds) were observed. We would like to find a linear function to estimate the CPU time: CPU time = b0 + b1(number of disk I/O’s) + b2(memory size).

library(knitr)  # provides kable()

# Observations for the seven monitored programs
CPUTime <- c(2,5,7,9,10,13,20)
DiskIOs <- c(14,16,27,42,39,50,83)
MemorySize <- c(70,75,144,190,210,235,400)
data <- data.frame(CPUTime, DiskIOs, MemorySize)
kable(data)
CPUTime  DiskIOs  MemorySize
      2       14          70
      5       16          75
      7       27         144
      9       42         190
     10       39         210
     13       50         235
     20       83         400

First we do the parameter estimation.

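library(dplyr)  # provides select()

# Design matrix: the two predictors, with a leading column of 1's for b0
X <- as.matrix(select(data, DiskIOs, MemorySize))
y <- data$CPUTime
X <- cbind(rep(1, length(y)), X)
n <- length(y)

# Least-squares estimate b = (X^T X)^{-1} (X^T y)
b <- solve(t(X) %*% X) %*% (t(X) %*% y)
b0 <- b[1]
b1 <- b[2]
b2 <- b[3]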

So the fitted equation is CPU time = -0.1614 + 0.1182(number of disk I/O’s) + 0.0265(memory size).
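As a sanity check, the normal-equation estimate can be compared with R's built-in `lm()`. The sketch below redefines the data so that it runs on its own. The name `fit` is introduced here for illustration, and `lm()` is a swapped-in cross-check, not the method used in the text.

```r
# Data from Example 15.1
CPUTime    <- c(2, 5, 7, 9, 10, 13, 20)
DiskIOs    <- c(14, 16, 27, 42, 39, 50, 83)
MemorySize <- c(70, 75, 144, 190, 210, 235, 400)

# Normal equations, as in the text: solve (X^T X) b = X^T y
X <- cbind(1, DiskIOs, MemorySize)
b <- solve(t(X) %*% X, t(X) %*% CPUTime)

# Same model fitted with lm(); the intercept column is added automatically
fit <- lm(CPUTime ~ DiskIOs + MemorySize)

# Side-by-side comparison of the two sets of coefficients
print(cbind(normal_eq = as.vector(b), lm = unname(coef(fit))))
```

The two columns should agree up to floating-point error. For poorly conditioned design matrices, `lm()` is generally the safer choice because it avoids forming \(X^TX\) explicitly.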

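Next we allocate the variation and compute the coefficient of determination and the standard deviation of errors.

# as.numeric() drops the 1x1 matrix returned by the matrix products
SSE <- as.numeric(t(y) %*% y - t(b) %*% t(X) %*% y)
SSY <- sum(y^2)
SS0 <- n * mean(y)^2
SST <- SSY - SS0
SSR <- SST - SSE
R2 <- SSR / SST
R <- sqrt(R2)
se <- sqrt(SSE / (n - length(b)))  # length(b) = k + 1, so this is n - k - 1
measure <- c('SSE','SSY','SS0','SST','SSR','R2','R','se')
values <- c(SSE, SSY, SS0, SST, SSR, R2, R, se)
results <- data.frame(measure, values)
kable(results)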
measure       values
SSE        5.3000876
SSY      828.0000000
SS0      622.2857143
SST      205.7142857
SSR      200.4141981
R2         0.9742357
R          0.9870338
se         1.1510960
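The remaining steps (7 and 9 through 12) can be carried through on the same data. The sketch below is self-contained and makes several assumptions that are not in the example above: a 90% confidence level (`alpha <- 0.10`), a hypothetical prediction point `x_p` of 100 disk I/O’s and 550 kilobytes of memory, and `m <- 1` future observation. The names `s_b`, `t_crit`, `ci_b`, `y_hat`, `s_yp`, and `ci_yp` are likewise introduced only for illustration.

```r
# Data from Example 15.1
CPUTime    <- c(2, 5, 7, 9, 10, 13, 20)
DiskIOs    <- c(14, 16, 27, 42, 39, 50, 83)
MemorySize <- c(70, 75, 144, 190, 210, 235, 400)

X <- cbind(1, DiskIOs, MemorySize)   # leading column of 1's for b0
y <- CPUTime
n <- length(y)
k <- ncol(X) - 1                     # number of predictors

C   <- solve(t(X) %*% X)             # C = (X^T X)^{-1}
b   <- as.vector(C %*% t(X) %*% y)
SSE <- sum((y - X %*% b)^2)          # residual sum of squares, equal to y'y - b'X'y
SST <- sum(y^2) - n * mean(y)^2
SSR <- SST - SSE

# Step 7: mean squares
MSR <- SSR / k
MSE <- SSE / (n - k - 1)

# Steps 8-9: standard deviation of errors and of each parameter
se  <- sqrt(MSE)
s_b <- se * sqrt(diag(C))

# Step 12: confidence intervals from the t quantile (90% level is an assumption)
alpha  <- 0.10
t_crit <- qt(1 - alpha / 2, df = n - k - 1)
ci_b   <- cbind(lower = b - t_crit * s_b, upper = b + t_crit * s_b)

# Steps 10-11: mean of m future observations at a hypothetical point
x_p   <- c(1, 100, 550)              # bias 1, then assumed DiskIOs and MemorySize
m     <- 1
y_hat <- sum(x_p * b)
s_yp  <- se * sqrt(1 / m + as.numeric(t(x_p) %*% C %*% x_p))
ci_yp <- y_hat + c(-1, 1) * t_crit * s_yp

print(round(cbind(estimate = b, s_b, ci_b), 4))
print(round(c(y_hat = y_hat, s_yp = s_yp, lower = ci_yp[1], upper = ci_yp[2]), 4))
```

A parameter whose confidence interval includes zero is not significantly different from zero at the chosen level. Because the assumed `x_p` lies outside the observed range of both predictors, the resulting prediction interval is an extrapolation and should be expected to be wide.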