An F-statistic greater than the critical value is equivalent to a p-value less than \(\alpha\); both mean that we can reject the null hypothesis. With the small p-value 0.033, we have evidence to reject the null at any \(\alpha\) > 0.033. However, 0.033 is greater than 0.01, so we cannot reject the null at \(\alpha\) = 0.01.
Reference: https://stats.stackexchange.com/questions/50727/f-statistic-f-critical-value-and-p-value
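As a quick illustration of this equivalence (the F-statistic and degrees of freedom below are hypothetical, since only the p-value is given above), the two decision rules always agree:
# hypothetical values: decision from F vs. critical value matches p vs. alpha
f_obs <- 4.75 # hypothetical observed F-statistic
df1 <- 1; df2 <- 40 # hypothetical degrees of freedom
alpha <- 0.05
f_crit <- qf(1 - alpha, df1, df2) # critical value
p_val <- pf(f_obs, df1, df2, lower.tail = FALSE) # p-value
(f_obs > f_crit) == (p_val < alpha) # TRUE: the two rules agree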
The t-test for \(\beta_1\) can be one-sided (left-tailed or right-tailed) or two-sided. In contrast, the F-test (most notably in ANOVA) can only test \(H_0: \beta_1 = 0\) versus \(H_1: \beta_1 \neq 0\).
The F-statistic is always positive: large values support \(H_a\), while values near 1 support \(H_0\). In other words, the F-test detects \(\beta_1^2 > 0\), which combines the two cases \(\beta_1 > 0\) and \(\beta_1 < 0\), so it cannot distinguish the direction of the association.
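A small sketch with simulated data (not the homework data) makes the connection concrete: in simple linear regression, the square of the slope t-statistic equals the overall F-statistic, which is why the F-test cannot tell direction.
# simulated data: slope t-statistic squared equals the F-statistic
set.seed(1)
x <- rnorm(50)
y <- 2 - 0.5 * x + rnorm(50)
fit <- lm(y ~ x)
t_slope <- summary(fit)$coefficients["x", "t value"]
f_stat <- summary(fit)$fstatistic["value"]
c(t_squared = t_slope^2, F = unname(f_stat)) # the two entries match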
The sample correlation coefficient is r = -0.17. In the previous homework, we fit a regression model describing how miles per gallon depends on mileage.
The unbiased estimate of the standard deviation is 3.36.
\(R^2\) = 0.0289 has a straightforward interpretation: mileage explains 2.89% of the total variation in miles per gallon. With such a low \(R^2\), the regression does not fit the data well. We might try a different independent variable to better explain the dependent variable, miles per gallon.
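This value is consistent with the correlation above, since in simple linear regression \(R^2 = r^2\):
# in simple linear regression, R^2 is the square of the sample correlation
r <- -0.17
r^2 # 0.0289, matching the reported R-squared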
# read the data
gpa <- read.table("./data/CH01PR19.txt")
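# columns: V1 = freshman-year GPA (response), V2 = ACT score (predictor)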
reg <- lm(V1 ~ V2, data = gpa)
# call the regression model summary table
summary(reg)
##
## Call:
## lm(formula = V1 ~ V2, data = gpa)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.74004 -0.33827 0.04062 0.44064 1.22737
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.11405 0.32089 6.588 1.3e-09 ***
## V2 0.03883 0.01277 3.040 0.00292 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6231 on 118 degrees of freedom
## Multiple R-squared: 0.07262, Adjusted R-squared: 0.06476
## F-statistic: 9.24 on 1 and 118 DF, p-value: 0.002917
anova(reg)
| Source | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Regression | 3.588 | 1 | 3.5878 |
| Error | 45.818 | 118 | 0.3883 |
| Total | 49.406 | 119 | |
If \(\beta_1 = 0\), then MSR and MSE estimate the same quantity, \(\sigma^2\): in general \(E(MSE) = \sigma^2\) while \(E(MSR) = \sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2\), and the second term vanishes when \(\beta_1 = 0\). In this situation, X and Y have no linear association.
Reference: https://stats.stackexchange.com/questions/175437/mse-and-msr-in-regression-question
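The F-statistic is exactly the ratio of these two mean squares, which we can check from the table:
# F is the ratio of the two mean squares in the ANOVA table
MSR <- 3.5878
MSE <- 0.3883
MSR / MSE # about 9.24, matching the reported F-statistic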
Based on the ANOVA table above, the F-statistic is 9.2402 with p-value 0.0029, which falls between the 0.001 and 0.01 significance levels. We can therefore reject the null hypothesis in favor of the alternative at any level \(\alpha \geq 0.0029\), for example \(\alpha = 0.01\), but not at \(\alpha = 0.001\).
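We can verify this p-value directly from the F distribution with 1 and 118 degrees of freedom:
# upper-tail probability of F = 9.2402 on 1 and 118 df
pf(9.2402, 1, 118, lower.tail = FALSE) # about 0.0029, matching the summary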
SST = 49.406, SSE = 45.818, SSR = 3.588. The coefficient of determination is \(R^2 = SSR/SST = 3.588/49.406 \approx 0.0726\), meaning that about 7.26% of the total variation in GPA is explained by the ACT score.
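The same number follows directly from the decomposition:
# coefficient of determination from the ANOVA sums of squares
SSR <- 3.588
SST <- 49.406
SSR / SST # about 0.0726, matching Multiple R-squared in the summary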
First, the estimated slope \(b_1 = 0.03883\) is positive, so the correlation coefficient is positive. Thus the sample correlation coefficient is \(r = +\sqrt{R^2} = \sqrt{0.07262} \approx 0.27\).
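As a check, the correlation computed directly from the data should agree with the square root of \(R^2\):
# r shares the sign of the slope; compare with the square root of R-squared
sqrt(summary(reg)$r.squared) # about 0.2695
cor(gpa$V2, gpa$V1) # should match, since the slope is positive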
- the data
- the least squares regression line for ACT scores
- the 95 percent confidence band for the true regression line for ACT scores between 20 and 30
# confidence band for the entire population regression line
attach(gpa)
n = length(V2) #sample sizes
e = reg$residuals # residuals
s = sqrt(sum(e^2)/(n-2)) # estimated standard deviation = root MSE
s
## [1] 0.623125
W = sqrt(2*qf(0.95,2,n-2)) # Working-Hotelling multiplier from the F distribution
W
## [1] 2.479149
Yhat = fitted.values(reg) # Yhat = b0 + b1x = predict(reg)
Sxx = (n-1)*var(V2) # sum of the squares of the difference between each x and its mean
margin = W*s*sqrt(1/n + (V2 - mean(V2))^2/Sxx)
upper.band = Yhat + margin # 95% upper
lower.band = Yhat - margin # 95% lower
plot(V2, V1, xlab = "ACT Score", ylab = "Freshman-Year GPA", xlim = c(20,30))
abline(reg,col="red")
ord = order(V2) # sort by ACT score so the band lines draw left to right
lines(V2[ord], upper.band[ord], col="blue")
lines(V2[ord], lower.band[ord], col="blue")
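For comparison, pointwise 95% confidence limits for the mean response could be overlaid as well; they are narrower than the simultaneous Working-Hotelling band because they cover the line only at each individual X. A minimal sketch, assuming the plot above is still the active device (the color and line type are arbitrary choices):
# pointwise 95% confidence limits at each observed ACT score
pw = predict(reg, interval = "confidence", level = 0.95)
lines(V2[ord], pw[ord, "lwr"], col = "green", lty = 2)
lines(V2[ord], pw[ord, "upr"], col = "green", lty = 2)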