Steps for Multiple Linear Regression
\(y_i = b_0+b_1x_{1i}+b_2x_{2i}+ ... + b_kx_{ki}+e_i\)
In matrix notation this is: \(y=Xb+e\)
\(b = (X^TX)^{-1}(X^Ty)\)
Remember to prepend a column of 1’s to \(X\) so that \(b_0\) is estimated as the intercept.
\(SSY=\sum^n_{i=1}y^2_i\)
\(SS0 = n\bar{y}^2\)
\(SST=SSY-SS0\)
\(SSE = y^Ty-b^TX^Ty\)
\(SSR = SST-SSE\)
\(R^2=\frac{SSR}{SST}=\frac{SST-SSE}{SST}\)
\(R = \sqrt{\frac{SSR}{SST}}\)
\(SST=SSY-SS0=SSR+SSE\)
Degrees of freedom: \(n-1 = n - 1 = k + (n-k-1)\)
\(MSR=\frac{SSR}{k};MSE=\frac{SSE}{n-k-1}\)
\(s_e=\sqrt{MSE}\)
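\(s_{b_j}=s_e\sqrt{C_{jj}}\) where \(C_{jj}\) is the \(j\)th diagonal term of \(C=(X^TX)^{-1}\)
\(\hat{y}_p=b_0+b_1x_{1p}+b_2x_{2p}+ ... + b_kx_{kp}\) or, in vector form, \(\hat{y}_p=x^T_pb\). Remember to prepend a 1 to \(x_p\) for the intercept term.
\(s_{\hat{y}_p}=s_e\sqrt{\frac{1}{m}+x^T_p(X^TX)^{-1}x_p}\) where \(m\) is the number of future observations whose mean is being predicted (\(m=1\) for a single observation)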
\(R_{x_1x_2}=\frac{\Sigma x_{1i}x_{2i}-n\bar{x}_1\bar{x}_2}{\sqrt{\Sigma x^2_{1i}-n\bar{x}^2_1}\sqrt{\Sigma x^2_{2i}-n\bar{x}^2_{2}}}\)
Example 15.1. Seven programs were monitored to observe their resource demands. In particular, the number of disk I/O’s, memory size (in kilobytes), and CPU time (in milliseconds) were observed. We would like to find a linear function to estimate the CPU time: CPU time = b0 + b1(number of disk I/O’s) + b2(memory size).
library(knitr)  # kable()
library(dplyr)  # select()
CPUTime <- c(2,5,7,9,10,13,20)
DiskIOs <- c(14,16,27,42,39,50,83)
MemorySize <- c(70,75,144,190,210,235,400)
data <- data.frame(CPUTime, DiskIOs, MemorySize)
kable(data)
| CPUTime | DiskIOs | MemorySize |
|---|---|---|
| 2 | 14 | 70 |
| 5 | 16 | 75 |
| 7 | 27 | 144 |
| 9 | 42 | 190 |
| 10 | 39 | 210 |
| 13 | 50 | 235 |
| 20 | 83 | 400 |
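Before fitting, the correlation between the two predictors can be checked with the \(R_{x_1x_2}\) formula above. The following is a minimal sketch rather than part of the original example: it applies the formula term by term to the Example 15.1 data and compares the result with R's built-in `cor()`. The names `num`, `den`, and `r_x1x2` are introduced here for illustration only.

```r
# Example 15.1 predictors, repeated so the block runs on its own
DiskIOs <- c(14, 16, 27, 42, 39, 50, 83)
MemorySize <- c(70, 75, 144, 190, 210, 235, 400)
n <- length(DiskIOs)

# Numerator: sum of cross products minus n times the product of the means
num <- sum(DiskIOs * MemorySize) - n * mean(DiskIOs) * mean(MemorySize)

# Denominator: product of the square roots of the corrected sums of squares
den <- sqrt(sum(DiskIOs^2) - n * mean(DiskIOs)^2) *
  sqrt(sum(MemorySize^2) - n * mean(MemorySize)^2)

r_x1x2 <- num / den

# The hand-computed value should agree with the built-in Pearson correlation
print(c(formula = r_x1x2, builtin = cor(DiskIOs, MemorySize)))
```

A value close to 1 means the two predictors carry largely the same information, which is worth keeping in mind when interpreting the individual coefficients.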
First we do the parameter estimation.
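# Design matrix: the two predictors plus a leading column of 1's for the intercept
X <- as.matrix(select(data, DiskIOs, MemorySize))
y <- data$CPUTime
n <- length(y)
X <- cbind(rep(1, n), X)
# Least-squares estimate b = (X^T X)^{-1} X^T y
b <- solve(t(X) %*% X) %*% (t(X) %*% y)
b0 <- b[1]
b1 <- b[2]
b2 <- b[3]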
So the fitted equation is CPU time = -0.1614 + 0.1182(number of disk I/O’s) + 0.0265(memory size).
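As a sanity check on the matrix computation, the same coefficients can be obtained from R's built-in `lm()`. This is a sketch added for illustration, not part of the original example; `fit` and `b_manual` are names introduced here, and the data is repeated so the block runs on its own.

```r
# Example 15.1 data
CPUTime <- c(2, 5, 7, 9, 10, 13, 20)
DiskIOs <- c(14, 16, 27, 42, 39, 50, 83)
MemorySize <- c(70, 75, 144, 190, 210, 235, 400)

# Built-in least-squares fit; lm() adds the intercept column itself
fit <- lm(CPUTime ~ DiskIOs + MemorySize)

# Manual estimate from the normal equations, b = (X^T X)^{-1} X^T y.
# solve(A, B) solves A b = B directly instead of forming the inverse first.
X <- cbind(1, DiskIOs, MemorySize)
b_manual <- drop(solve(t(X) %*% X, t(X) %*% CPUTime))

print(coef(fit))
print(b_manual)
```

Both printouts should list the intercept first, followed by the coefficients for `DiskIOs` and `MemorySize`, and should agree up to floating-point rounding.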
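# Sums of squares and goodness-of-fit measures
SSY <- sum(y^2)
SS0 <- n * mean(y)^2
SST <- SSY - SS0
SSE <- as.numeric(t(y) %*% y - t(b) %*% t(X) %*% y)  # 1x1 matrix reduced to a scalar
SSR <- SST - SSE
R2 <- SSR / SST
R <- sqrt(R2)
se <- sqrt(SSE / (n - length(b)))  # length(b) = k + 1, so this is n - k - 1 degrees of freedom
measure <- c('SSE','SSY','SS0','SST','SSR','R2','R','se')
values <- c(SSE,SSY,SS0,SST,SSR,R2,R,se)
results <- data.frame(measure,values)
kable(results)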
| measure | values |
|---|---|
| SSE | 5.3000876 |
| SSY | 828.0000000 |
| SS0 | 622.2857143 |
| SST | 205.7142857 |
| SSR | 200.4141981 |
| R2 | 0.9742357 |
| R | 0.9870338 |
| se | 1.1510960 |
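The formulas for \(s_{b_j}\) and for the standard deviation of a prediction can be applied to the same data. The block below is a hedged sketch, not part of the original example. It assumes a hypothetical new program with 100 disk I/Os and 550 kilobytes of memory, and it takes \(m = 1\), that is, a single future observation. The names `C`, `s_b`, `x_p`, `y_p`, and `s_yp` are introduced here for illustration.

```r
# Example 15.1 data, repeated so the block runs on its own
CPUTime <- c(2, 5, 7, 9, 10, 13, 20)
DiskIOs <- c(14, 16, 27, 42, 39, 50, 83)
MemorySize <- c(70, 75, 144, 190, 210, 235, 400)

X <- cbind(1, DiskIOs, MemorySize)   # design matrix with intercept column
y <- CPUTime
n <- length(y)
k <- ncol(X) - 1                     # number of predictors

C <- solve(t(X) %*% X)               # C = (X^T X)^{-1}
b <- C %*% t(X) %*% y                # least-squares coefficients
SSE <- as.numeric(t(y) %*% y - t(b) %*% t(X) %*% y)
se <- sqrt(SSE / (n - k - 1))        # standard deviation of errors

# Standard deviation of each coefficient: s_e * sqrt(C_jj)
s_b <- se * sqrt(diag(C))

# Hypothetical new program (assumed values, not from the example)
x_p <- c(1, 100, 550)                # leading 1 for the intercept
m <- 1                               # assumption: one future observation

y_p <- as.numeric(t(x_p) %*% b)      # predicted CPU time
s_yp <- se * sqrt(1 / m + as.numeric(t(x_p) %*% C %*% x_p))

print(s_b)
print(c(prediction = y_p, std_dev = s_yp))
```

Since the assumed point lies outside the range of the observed data, the \(x^T_p(X^TX)^{-1}x_p\) term contributes noticeably to `s_yp`; predictions nearer the centre of the data would have a smaller standard deviation.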