1. (1.2) The members of a health spa pay annual membership dues of \(\$300\) plus a charge of \(\$2\) for each visit to the spa. Let Y denote the dollar cost for the year for a member and X the number of visits by the member during the year. Express the relation between X and Y mathematically. Is it a functional relation or a statistical relation (that is, is the relation deterministic or stochastic)?
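
A minimal worked form of the relation in problem 1.2, assuming every visit is charged the same \(\$2\) and there are no other fees:

\[
Y = 300 + 2X
\]

Each value of X determines Y exactly and there is no error term, so under this assumption the relation is functional (deterministic). For example, X = 25 visits gives Y = 300 + 2(25) = 350 dollars.
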
  1. (1.6) Suppose the regression parameters are \(\beta_0 = 200\) and \(\beta_1 = 5.0\).
  1. Plot the regression equation.

  1. Predict the response for X = 10, 20, and 40.

  1. Explain the meaning of parameters \(\beta_0\) and \(\beta_1\).
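
The first two parts of problem 1.6 can be sketched in R as follows. The function name `mean_response` and the plotting range 0 to 40 are assumptions made for illustration; the problem itself only fixes the two parameter values.

```r
# Regression parameters given in the problem
beta0 <- 200
beta1 <- 5.0

# Mean response E{Y} = beta0 + beta1 * X (hypothetical helper name)
mean_response <- function(x) beta0 + beta1 * x

# Plot the regression line over an assumed range of X
curve(mean_response(x), from = 0, to = 40, xlab = "X", ylab = "E{Y}")

# Mean response at the requested X values
x_new <- c(10, 20, 40)
mean_response(x_new)
## [1] 250 300 400
```

Here \(\beta_0 = 200\) is the mean response at X = 0 (meaningful only if X = 0 lies within the scope of the model), and \(\beta_1 = 5\) is the change in the mean response per unit increase in X.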
  1. (1.10) An analyst in a large corporation studied the relation between current annual salary (Y ) and age (X) for the 46 computer programmers presently employed in the company. The analyst concluded that the relation is curvilinear, reaching a maximum at 47 years. Does this imply that the salary for a programmer increases until age 47 and then decreases? Explain.
  1. The time it takes to transmit a file always depends on the file size. Suppose you transmitted 30 files, with an average size of 126 Kbytes and a standard deviation of 35 Kbytes. The average transmission time was 0.04 seconds, with a standard deviation of 0.01 seconds. The correlation coefficient between time and size was 0.86. Based on these data, fit a linear regression model and predict the time it will take to transmit a 400 Kbyte file.
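
A possible sketch of the file-transmission fit, using the same summary-statistic formulas as the gas-station problem (slope = r * sd(y) / sd(x), intercept = mean(y) - slope * mean(x)). The variable names are chosen here for illustration; the numbers are the ones stated in the problem.

```r
# Summary statistics stated in the problem
mean_size <- 126    # Kbytes
sd_size   <- 35     # Kbytes
mean_time <- 0.04   # seconds
sd_time   <- 0.01   # seconds
r         <- 0.86   # correlation between time and size

# Least squares slope and intercept from summary statistics
b1 <- r * sd_time / sd_size        # seconds per Kbyte
b0 <- mean_time - b1 * mean_size   # seconds

# Predicted transmission time for a 400 Kbyte file
pred_400 <- b0 + b1 * 400
pred_400   # about 0.107 seconds
```

A 400 Kbyte file is far larger than the sampled files (mean 126, standard deviation 35), so this prediction is an extrapolation and should be read with caution.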

  1. At a gas station, 180 drivers were asked to record the mileage of their cars and the number of miles per gallon. The results are summarized in the table.
  1. Compute the least squares regression line which describes how the number of miles per gallon depends on the mileage.

- Reference: https://youtu.be/yttN024P-Gg

# slope (b1) = r * sd(y) / sd(x), with r = -0.17, sd(y) = 3.4 mpg, sd(x) = 14634 miles
slope <- ((-0.17*3.4)/14634)
slope
## [1] -3.949706e-05
# intercept (b0) = mean(y) - slope * mean(x), with mean(y) = 23.8 mpg, mean(x) = 24598 miles
yIntercept <- 23.8-(slope*24598)
yIntercept
## [1] 24.77155
  1. What do the obtained slope and intercept mean in this situation?
  1. You purchase a used car with 35,000 miles on it. Predict the number of miles per gallon.
# y = b0+b1*X
predict35000 <- yIntercept + slope * 35000
predict35000 
## [1] 23.38915
  1. (Stat-615 only) Show that the sample intercept \(b_0\) is a linear and unbiased estimator of the population intercept \(\beta_0\).
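
One way to sketch the argument, assuming the simple linear regression model \(Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i\) with fixed \(X_i\) and \(E\{\varepsilon_i\} = 0\). The weights \(k_i\) and \(c_i\) are notation introduced here for illustration.

\[
b_1 = \sum_{i=1}^{n} k_i Y_i, \qquad k_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}, \qquad \sum_{i=1}^{n} k_i = 0, \qquad \sum_{i=1}^{n} k_i X_i = 1
\]

\[
b_0 = \bar{Y} - b_1 \bar{X} = \sum_{i=1}^{n} c_i Y_i, \qquad c_i = \frac{1}{n} - \bar{X} k_i
\]

\[
\sum_{i=1}^{n} c_i = 1 - \bar{X} \sum_{i=1}^{n} k_i = 1, \qquad \sum_{i=1}^{n} c_i X_i = \bar{X} - \bar{X} \sum_{i=1}^{n} k_i X_i = 0
\]

\[
E\{b_0\} = \sum_{i=1}^{n} c_i \left( \beta_0 + \beta_1 X_i \right) = \beta_0 \sum_{i=1}^{n} c_i + \beta_1 \sum_{i=1}^{n} c_i X_i = \beta_0
\]

The second line shows that \(b_0\) is a linear combination of the \(Y_i\) with weights that depend only on the \(X_i\); the last line shows that it is unbiased.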

  1. (Computer project - 1.19, 1.24). Grade point average. The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student's grade point average (GPA) at the end of the freshman year (Y) can be predicted from the ACT test score (X). The results of the study follow.
  1. Obtain the least squares estimates of \(\beta_0\) and \(\beta_1\) and state the estimated regression function.
asc <- read.table("./data/CH01PR19.txt")

reg <- lm(V1 ~ V2, data = asc)
summary(reg)
## 
## Call:
## lm(formula = V1 ~ V2, data = asc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.74004 -0.33827  0.04062  0.44064  1.22737 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.11405    0.32089   6.588  1.3e-09 ***
## V2           0.03883    0.01277   3.040  0.00292 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6231 on 118 degrees of freedom
## Multiple R-squared:  0.07262,    Adjusted R-squared:  0.06476 
## F-statistic:  9.24 on 1 and 118 DF,  p-value: 0.002917
  1. Plot the estimated regression function and the data. Does the estimated regression function appear to fit the data well?
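# attach() lets V1 (GPA) and V2 (ACT score) be referenced directly
attach(asc)
# Scatter plot of the data
plot(V2, V1, xlab = "ACT score (X)", ylab = "Freshman GPA (Y)")
reg <- lm(V1 ~ V2)
# Estimated regression line
abline(reg, col = "red", lwd = 3)
# Fitted values at the observed ACT scores
Yhat <- fitted(reg)
points(V2, Yhat, col = "blue")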

# summary(reg)
  1. Obtain a point estimate of the mean freshman GPA for students with ACT test score X = 30.
predict(reg, data.frame(V2 = 30))
##        1 
## 3.278863
  1. What is the point estimate of the change in the mean response when the entrance test score increases by one point?
summary(reg)
## 
## Call:
## lm(formula = V1 ~ V2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.74004 -0.33827  0.04062  0.44064  1.22737 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.11405    0.32089   6.588  1.3e-09 ***
## V2           0.03883    0.01277   3.040  0.00292 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6231 on 118 degrees of freedom
## Multiple R-squared:  0.07262,    Adjusted R-squared:  0.06476 
## F-statistic:  9.24 on 1 and 118 DF,  p-value: 0.002917
  1. Obtain the residuals \(e_i\) and the sum of the squared residuals.
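# Residuals: e_i = Y_i - Yhat_i = Y_i - (b0 + b1 * X_i)
e <- residuals(reg)
# Sum of squared residuals (SSE); it also appears as the Residuals "Sum Sq" in anova(reg)
SSE <- sum(e^2)
SSE
# Check: the fitted line passes through (mean of X, mean of Y),
# so the quantity below is zero up to rounding of the coefficients
meanGPA <- mean(asc$V1)
meanGPA
## [1] 3.07405
meanACT <- mean(asc$V2)
meanACT
## [1] 24.725
meanGPA - 2.11405 - 0.03883 * meanACT
## [1] -7.175e-05
anova(reg)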
  1. Obtain point estimates of \(\sigma^2\) and \(\sigma\). In what units is each of them expressed?
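anova(reg)
# Point estimate of sigma^2: the MSE (Residuals "Mean Sq" in the ANOVA table), 0.3883,
# expressed in squared grade points
# Point estimate of sigma: sqrt(MSE), expressed in grade points
sigma <- sqrt(0.3883)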
sigma
## [1] 0.6231372
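
The last two parts can also be computed directly from a fitted model rather than by copying numbers out of the printed tables. The sketch below uses simulated stand-in data (not the CH01PR19 values), so the object names `x`, `y`, and `fit` and the simulation settings are assumptions for illustration only.

```r
set.seed(1)

# Hypothetical stand-in data, NOT the CH01PR19 values
n <- 120
x <- round(runif(n, 14, 35))                # ACT-like scores
y <- 2.1 + 0.04 * x + rnorm(n, sd = 0.6)    # GPA-like responses
fit <- lm(y ~ x)

e   <- residuals(fit)            # e_i = Y_i - Yhat_i
sse <- sum(e^2)                  # sum of squared residuals
mse <- sse / df.residual(fit)    # point estimate of sigma^2, df = n - 2
s   <- sqrt(mse)                 # point estimate of sigma

# Both agree with what R reports for the fitted model
all.equal(mse, anova(fit)["Residuals", "Mean Sq"])
all.equal(s, sigma(fit))
```

Applied to the homework data, the same steps would use `reg` in place of `fit`; `sigma(reg)` returns the value that `summary(reg)` prints as the residual standard error.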