STAT GU4205/GR5205 (Section 004) Linear Regression Models

HW2 by Dongrui Liu, UNI: dl3390

2.6

First, enter the data and compute the least-squares estimates, the MSE, and the sum of squared deviations of X (stored as k).

setwd("C:/Users/Dongrui/Desktop/21 Fall/Linear Regression Models/HW/HW2")
Y = c(16,9,17,12,22,13,8,15,19,11)
X = c(1,0,2,0,3,1,0,1,2,0)
b1 = cov(X, Y)/var(X)
b0 = mean(Y) - b1*mean(X)
n=length(X)
MSE = function(x, y, b1, b0) {
return(sum((y-b1*x-b0)^2)/(n-2))
}
MSE1=MSE(X, Y, b1, b0)
k=sum((X-mean(X))^2)

Part (a)

upper = b1 + qt(0.975, n-2)*sqrt(MSE1/k)   # sqrt(MSE1/k) is s{b1}
lower = b1 - qt(0.975, n-2)*sqrt(MSE1/k)

The 95% confidence interval for β1 is [2.9183882, 5.0816118].

With 95% confidence, each additional time a carton is transferred increases the expected number of broken ampules by between 2.9183882 and 5.0816118.
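
As a sanity check, the same interval can be obtained from R's built-in lm and confint (a quick verification, not part of the hand computation above):

fit = lm(Y ~ X)             # same least-squares fit as the hand computation
confint(fit, level = 0.95)  # the "X" row reproduces this interval; the "(Intercept)" row is the interval used in part (c)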

Part (b)

P_value = 2*pt(-abs(b1)/sqrt(MSE1/k), n-2)   # two-sided P-value for the t test of beta1 = 0

The null hypothesis H0 is that there is no linear association between the number of times a carton is transferred (X) and the number of broken ampules (Y), i.e. that β1 = 0. The alternative Ha is that such an association exists, i.e. that β1 ≠ 0.

Hence we are testing whether β1 equals 0. If the P-value of the test is less than .05, the difference between b1 and 0 is significant at level .05. Here the P-value is \(2.7486695 \times 10^{-5}\), so we reject the null hypothesis that β1 is 0.
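
Equivalently, the t statistic and two-sided P-value can be read off the summary of the fit from part (a) (a quick cross-check):

summary(fit)$coefficients["X", ]   # Estimate, Std. Error, t value, Pr(>|t|)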

Part (c)

sb0 = sqrt(MSE1*(1/n + mean(X)^2/k))   # s{b0}; mean(X) must be squared (here mean(X) = 1, so the value is unchanged)
upper = b0 + sb0*qt(0.975, n-2)
lower = b0 - sb0*qt(0.975, n-2)

The 95% confidence interval for β0 is [8.6703699, 11.7296301].

Hence, with 95% confidence, the expected number of broken ampules when no cartons are transferred is between 8.6703699 and 11.7296301.

Part (d)

sy = sqrt(MSE1*(1/n + (0 - mean(X))^2/k))            # s.e. of the estimated mean response at X = 0
P_value = pt((b0 - 9)/sy, n-2, lower.tail = FALSE)   # one-sided upper-tail P-value

We are now testing whether the mean number of broken ampules exceeds 9.0 when no cartons are transferred. H0 is that it does not exceed 9.0, while the alternative is that it does. Since the P-value = 0.0540223 is larger than α = .025, we fail to reject H0: at significance level .025 there is insufficient evidence that the mean number of broken ampules when no cartons are transferred exceeds 9.0.
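
The standard error sy above is the standard error of the estimated mean response at X = 0, which predict with se.fit = TRUE also provides (a sketch for cross-checking, reusing the fit from part (a)):

pr = predict(fit, newdata = data.frame(X = 0), se.fit = TRUE)
t_stat = (pr$fit - 9)/pr$se.fit         # same t statistic as above
pt(t_stat, n - 2, lower.tail = FALSE)   # one-sided P-value, about 0.054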

Part (e)

delta_b1 = abs(2 - 0)/.5     # noncentrality for the test on b1: |true beta1 - hypothesized 0|/sigma{b1}, with sigma{b1} = .5

delta_b0 = abs(11 - 9)/.75   # noncentrality for the test on b0: |true beta0 - hypothesized 9|/sigma{b0}, with sigma{b0} = .75

From the power table with α = .05, noncentrality δ = 4, and df = 8, the power of the test on b1 is 0.94.

From the same table with α = .05, δ = 2.6666667, and df = 8, the power of the test on b0 is 0.64.
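
Instead of reading the power from the table, it can be computed directly from the noncentral t distribution in R (a sketch: the test on b1 is two-sided at α = .05 and the test on b0 is one-sided at α = .025, so with 8 df both use the critical value t(.975, 8)):

tcrit = qt(0.975, 8)                                  # shared critical value
pt(tcrit, 8, ncp = delta_b1, lower.tail = FALSE) +
  pt(-tcrit, 8, ncp = delta_b1)                       # two-sided power for b1, about 0.94
pt(tcrit, 8, ncp = delta_b0, lower.tail = FALSE)      # one-sided power for b0, about 0.64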

2.23

Part (a)

data = read.table("CH01PR19.txt", header = FALSE)
x = data[,2]   # ACT test score
y = data[,1]   # grade point average (GPA)
n = length(x)
b1 = cov(x, y)/var(x)
b0 = mean(y) - b1*mean(x)
SSR = sum((b1*x + b0 - mean(y))^2)   # regression sum of squares
SSE = sum((y - b1*x - b0)^2)         # error sum of squares
SSTO = sum((y - mean(y))^2)          # total sum of squares
MSE = SSE/(n - 2)
MSR = SSR/1
df1 = data.frame(SS = c(SSR, SSE, SSTO),
                 df = c(1, n-2, n-1),
                 MS = c(MSR, MSE, NA),
                 F_value = c(MSR/MSE, NA, NA),
                 row.names = c("Regression", "Error", "Total"))
df1
##                   SS  df        MS  F_value
## Regression  3.587846   1 3.5878459 9.240243
## Error      45.817608 118 0.3882848       NA
## Total      49.405454 119        NA       NA

Above is the ANOVA table for the regression of GPA on ACT score.
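
The same decomposition can be obtained directly from R's built-in anova on an lm fit (a quick verification; anova labels the Regression row as x and the Error row as Residuals, and omits the Total row):

fit = lm(y ~ x)
anova(fit)   # SS, df, MS, and F value match df1 above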

Part (b)

The MSR measures the variance in the grade point average that is captured by the model, which is 3.5878459. The MSE measures the variance in the grade point average that is not captured by the model, which is 0.3882848.

From the F-test for \(\beta_1\), we know that when \(\beta_1 = 0\), MSE and MSR estimate the same quantity.
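
More precisely, the expected mean squares are

\[
E\{MSE\} = \sigma^2, \qquad E\{MSR\} = \sigma^2 + \beta_1^2 \sum_{i=1}^{n} (X_i - \bar{X})^2,
\]

so the two estimate the same quantity \(\sigma^2\) exactly when \(\beta_1 = 0\), and MSR tends to exceed MSE otherwise.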

Part (c)

PFb1 = pf(MSR/MSE, 1, 118, lower.tail = FALSE)   # P-value of the F statistic with (1, n-2) df

The H0 for the F-test is that β1 is zero, and the alternative is that β1 is not zero. The F statistic is 9.2402427 and its P-value is 0.0029166, which is smaller than the chosen α = 0.01. Hence, we reject H0 that β1 is zero.

Part (d)

The absolute reduction in the variation of Y when X is introduced into the regression model equals SSR, which is 3.5878459. The relative reduction is SSR/SSTO = 0.0726204. The latter measure is called the coefficient of determination (\(R^2\)).
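
This matches the value reported by R's summary (a one-line cross-check):

summary(lm(y ~ x))$r.squared   # equals SSR/SSTO = 0.0726204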

Part (e)

r = sign(b1)*sqrt(SSR/SSTO)   # correlation coefficient; takes the sign of b1

The r obtained for this model takes the same sign as b1 and equals 0.2694818.
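
The built-in cor gives the same value directly (a quick check):

cor(x, y)   # equals sign(b1)*sqrt(SSR/SSTO) = 0.2694818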

Part (f)

\(R^2\) has the more clear-cut operational interpretation: it gives the proportion of the variation in Y that is explained by the regression on X. In practice, companies care about how much of the variance of Y has been explained rather than about whether the slope of the fitted regression line is positive or negative.