STAT 408 Regression HW1

Problem 1 and 2.

Problem 3.

Problem 4.

Part A.

\[\hat{y} = 10 + 0.56X\] When X = 7, \[\hat{y} = 10 + 0.56(7) = 10 + 3.92 = 13.92\]

Part B.

The observed value is Y = 17 at X = 7, where the fitted value is \[\hat{y} = 13.92\] Residual: \[e = Y - \hat{y} = 17 - 13.92 = 3.08 \]

Part C.

With every 1-hour increase in training, the expected test score increases by 0.56.

Part D.

No, the test score would not necessarily be the same. The model describes Y given X: the test score depends on the training hours, but the training hours do not necessarily depend on the test score. The fitted line predicts the expected Y for a given number of training hours, and individual observations deviate from that prediction because of the error term.

Part E.

\[MSE = \frac{SSE}{n-2} = \frac{22}{18-2} = \frac{11}{8} = 1.375\]

Problem 5.

It should be written as \[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

Problem 6.

Part A.

The data obtained is observational.

Part B.

The study may show a negative relationship, but without knowing more about the design we cannot rule out bias in the results. First, there may be selection bias: only the more “capable” senior citizens may be able to work out, so the sample may not be representative of all senior citizens. Second, there may be measurement error, since individuals tend to overreport positive behaviors such as working out and eating healthily. Third, the report does not give the value of b_1, so the negative relationship may be small, in which case the conclusions may not be as general or as strong as reported. Finally, other factors may be contributing to the negative relationship.

Part C.

Diet and amount of sleep may affect both exercise and the frequency of colds. A healthier diet may lead to more exercise and fewer colds, and more sleep may have a similar effect.

Part D.

A causal relationship cannot be established from these data, but we could see more clearly how X and Y are related with more control over how the data are collected.

Problem 7.

Part A.

The least squares estimates of \(\beta_0\) and \(\beta_1\), respectively, are as follows: \[b_0 = 2.114049 \] and \[b_1 = 0.03882713 \]

The regression function is as follows: \[\hat{y} = 2.114049 + 0.03882713x\]

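n <- nrow(Data) # number of observations
X <- Data[,2] # predictor: ACT score
Y <- Data[,1] # response: grade point average
Xbar <- mean(X)
Ybar <- mean(Y)
b1 <- sum((X-Xbar)*(Y-Ybar))/((n-1)*var(X)) # slope; (n-1)*var(X) equals sum((X-Xbar)^2)
b0 <- Ybar-b1*Xbar # intercept; the fitted line passes through (Xbar, Ybar)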

b0

## [1] 2.114049

b1

## [1] 0.03882713
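As a sanity check on the closed-form formulas above, the same estimates can be compared against R's built-in `lm()`. The sketch below is only illustrative: the vectors `act` and `gpa` are small made-up values, not the homework data set, and the names `b0_manual`, `b1_manual`, and `b_lm` are hypothetical.

```r
# Hypothetical toy data (NOT the homework data set): ACT scores and GPAs.
act <- c(21, 14, 28, 22, 21, 31, 25, 18)
gpa <- c(3.9, 3.0, 3.6, 2.9, 2.4, 3.7, 3.2, 2.6)

# Closed-form least squares estimates, using the same formulas as above.
# (n - 1) * var(act) equals sum((act - mean(act))^2).
n <- length(act)
b1_manual <- sum((act - mean(act)) * (gpa - mean(gpa))) / ((n - 1) * var(act))
b0_manual <- mean(gpa) - b1_manual * mean(act)

# Built-in fit for comparison; coef() returns (intercept, slope).
fit <- lm(gpa ~ act)
b_lm <- unname(coef(fit))

# The two approaches are expected to agree up to floating-point error.
print(all.equal(c(b0_manual, b1_manual), b_lm))
```

With the actual data, replacing `act` and `gpa` by `X` and `Y` should reproduce the values of b0 and b1 reported above.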

Part B.

The plot below shows GPA against ACT score, with the fitted regression line overlaid. The line fits the data only moderately well: the points trend upward, but they are widely scattered around the line rather than clustered along it. Overall, the line captures the general direction of the relationship.

min(X)

## [1] 14

max(X) #Range of X

## [1] 35

plot(X,Y,xlab="ACT Score",
ylab="Grade Point Average",type="p",pch = 20,cex = 1)
lines(c(0,36),b0+b1*(c(0,36)),lwd=2)

Part C.

With an ACT score of X = 30, the predicted GPA is about 3.28.

Yhath <- b0 + b1 * 30 # predicted value for X_h = 30
Yhath

## [1] 3.278863

Part D.

When the ACT score increases by one point, the estimated mean GPA increases by 0.03882713 points.

Problem 8.

Part A.

The residuals and their sum are listed below. The sum is -2.620126*10^-14, which is very close to 0.

Yhat <- b0+b1*X
Res <- Y-Yhat
resid<-Res[1:120] # All 120 residuals
resid

##   [1]  0.96758105  1.22737094  0.57679116 -0.42824608  0.09858105
##   [6]  0.54730978 -0.39451735  0.79861829 -2.74003597  0.05444541
##  [11]  0.26409967  0.25913691  0.03709967 -0.03290033 -0.15034448
##  [16] -0.19938171  0.43727254 -0.30469022 -0.13772746 -0.77259183
##  [21] -0.48290033  0.42758105  0.52979116  0.76261829  0.35479116
##  [26] -0.02255459 -0.78120884 -0.38924608  0.74744541  0.13058105
##  [31]  0.84227254 -0.36028332 -0.27220884  0.25144541 -0.11124608
##  [36]  0.02609967  0.45158105  0.01113691  0.38661829  0.52244541
##  [41] -0.14555459 -0.62486309 -0.50590033 -0.87355459 -1.17103597
##  [46] -0.42890033 -1.13469022 -0.69645619  0.10023530  0.99306243
##  [51] -0.29138171  0.61671668  0.14261829 -0.17155459  0.50109967
##  [56]  0.41213691  0.23058105 -0.69659183  0.04413691  0.69596403
##  [61] -0.16272746 -0.29107321  0.28527254  0.59892679 -0.63686309
##  [66] -0.47741895 -0.39090033  0.35748265 -1.00693757  0.50892679
##  [71]  0.14840817 -0.04107321 -0.33093757 -0.11293757  0.67996403
##  [76] -0.05659183  0.21492679 -0.03955459  0.79879116  0.07682840
##  [81]  0.43240817  0.18140817 -1.04455459  0.51848265  0.12327254
##  [86] -0.24238171  0.18261829  0.71596403  0.95623530 -0.42341895
##  [91]  0.84009967 -0.97938171  0.34427254  0.21106243  0.50996403
##  [96]  0.78709967 -0.04938171 -0.05441895 -0.10476470 -0.50193757
## [101] -1.24372746 -1.22993757 -0.01159183  0.23448265 -0.13190033
## [106]  0.24300127 -0.28472746  0.41979116  0.59079116 -0.21772746
## [111]  0.45075392  0.32113691 -0.49659183 -0.60459183 -1.83169022
## [116]  0.99440817  0.55996403  0.71279116 -0.87528332 -0.25320884

sum(resid)

## [1] -2.620126e-14
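A sum this close to zero is expected. A short derivation, using only the definition of b_0 from Problem 7, shows that the least squares residuals e_i sum to exactly zero in theory, so the tiny nonzero value is floating-point rounding: \[\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = n\bar{Y} - n b_0 - n b_1 \bar{X} = n(\bar{Y} - b_0 - b_1 \bar{X}) = 0\] The last step follows because \[b_0 = \bar{Y} - b_1 \bar{X}\]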

Part B.

The estimate of sigma^2 is MSE = 0.3882848, and the estimate of sigma is sqrt(MSE) = 0.623125. The estimate of sigma is expressed in GPA points.

MSE <- sum(Res^2)/(n-2) #Formula
MSE

## [1] 0.3882848

sqrt(MSE)

## [1] 0.623125
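The hand-computed sqrt(MSE) corresponds to what `summary()` of an `lm` fit reports as the residual standard error. The sketch below illustrates that correspondence on made-up data; `act`, `gpa`, `mse_manual`, and `s_lm` are hypothetical names and values, not the homework data set.

```r
# Hypothetical toy data (NOT the homework data set): ACT scores and GPAs.
act <- c(21, 14, 28, 22, 21, 31, 25, 18)
gpa <- c(3.9, 3.0, 3.6, 2.9, 2.4, 3.7, 3.2, 2.6)

fit <- lm(gpa ~ act)
n <- length(act)

# MSE by hand: residual sum of squares divided by n - 2 degrees of freedom.
mse_manual <- sum(residuals(fit)^2) / (n - 2)

# Residual standard error as reported by summary.lm.
s_lm <- summary(fit)$sigma

# sqrt(MSE) and the reported residual standard error are expected to agree.
print(all.equal(sqrt(mse_manual), s_lm))
```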

Kajal Chokshi

9/6/2018
