The test statistic is 3.386119.
mod<-lm(purity~hydro, data = oxygen)
n<-dim(oxygen)[1]
beta_1<-mod$coefficients[2]
ss_res <- sum(mod$residuals^2)
ms_res<- ss_res/(n-2)
se_b1<-sqrt(ms_res/sum((oxygen$hydro-mean(oxygen$hydro))^2)) # standard error of the slope
t_stat <- beta_1/se_b1
t_stat
## hydro
## 3.386119
The p-value for a two-sided t-test is 0.003291122.
# obtain the p value for a two sided test.
pt(abs(t_stat), df = n-2, lower.tail = FALSE)*2
## hydro
## 0.003291122
There are 18 degrees of freedom.
df <- n-2
df
## [1] 18
Confirm with the model summary table:
summary(mod)
##
## Call:
## lm(formula = purity ~ hydro, data = oxygen)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6724 -3.2113 -0.0626 2.5783 7.3037
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 77.863 4.199 18.544 3.54e-13 ***
## hydro 11.801 3.485 3.386 0.00329 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.597 on 18 degrees of freedom
## Multiple R-squared: 0.3891, Adjusted R-squared: 0.3552
## F-statistic: 11.47 on 1 and 18 DF, p-value: 0.003291
Hypothesis: The null hypothesis is H0: β1 = 0 (the slope for hydro is zero); the alternative hypothesis is HA: β1 ≠ 0. With a p-value of 0.003291122, we reject the null hypothesis at the 0.05 significance level. There is strong evidence that the percentage of hydrocarbons is linearly associated with the purity of oxygen.
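As a quick cross-check of this decision, the sketch below recomputes the test statistic, the critical value, and the two-sided p-value from the rounded estimate and standard error printed in the summary table. The variable names (est, se, df_res, alpha) are illustrative and not part of the original analysis, and because the inputs are rounded the results should agree with the output above only to about three decimal places.

```r
# Values copied from the summary(mod) table above (rounded as printed)
est    <- 11.801   # slope estimate for hydro
se     <- 3.485    # standard error of the slope
df_res <- 18       # residual degrees of freedom (n - 2)
alpha  <- 0.05     # significance level

t_stat  <- est / se                                             # test statistic for H0: beta_1 = 0
t_crit  <- qt(1 - alpha / 2, df = df_res)                       # two-sided critical value
p_value <- 2 * pt(abs(t_stat), df = df_res, lower.tail = FALSE) # two-sided p-value
reject  <- abs(t_stat) > t_crit                                 # same decision as p_value < alpha

# In simple linear regression the overall F statistic equals t^2
f_stat <- t_stat^2

c(t_stat = t_stat, t_crit = t_crit, p_value = p_value, f_stat = f_stat)
reject
```

The squared t statistic should land close to the F-statistic of 11.47 reported in the last line of the summary, which is why the two p-values in the table match.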
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Load data (make sure the URL is on one line)
nfl<-read.csv("https://raw.githubusercontent.com/kitadasmalley/sp21_MATH376LMT/main/data/nlf1976.csv", header = TRUE)
nfl_mod<-lm(y~x8, data = nfl)
summary(nfl_mod)
##
## Call:
## lm(formula = y ~ x8, data = nfl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.804 -1.591 -0.647 2.032 4.580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.788251 2.696233 8.081 1.46e-08 ***
## x8 -0.007025 0.001260 -5.577 7.38e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.393 on 26 degrees of freedom
## Multiple R-squared: 0.5447, Adjusted R-squared: 0.5272
## F-statistic: 31.1 on 1 and 26 DF, p-value: 7.381e-06
n<-dim(nfl)[1]
beta_1<-nfl_mod$coefficients[2]
nfl_mod$coefficients
## (Intercept) x8
## 21.7882509 -0.0070251
ss_res <- sum(nfl_mod$residuals^2)
ms_res<- ss_res/(n-2)
se_b1 <- sqrt(ms_res/sum((nfl$x8-mean(nfl$x8))^2)) # standard error
se_b1
## [1] 0.00125965
# Confidence Interval for Slope
crt_value<-qt(.975, df=n-2)
crt_value
## [1] 2.055529
# critical value
conf<-beta_1+c(-1,1)*crt_value*se_b1 # confidence interval
conf
## [1] -0.009614347 -0.004435854
beta_1 # point estimate
## x8
## -0.0070251
crt_value*se_b1 # margin of error
## [1] 0.002589247
(n-2) # degrees of freedom
## [1] 26
confint(nfl_mod)
## 2.5 % 97.5 %
## (Intercept) 16.246064040 27.330437725
## x8 -0.009614347 -0.004435854
With 26 degrees of freedom, we are 95% confident that the slope for x8 (the change in expected games won per additional rushing yard) lies between -0.009614347 and -0.004435854. The point estimate is -0.0070251, the critical value is 2.055529, the standard error is 0.00125965 (0.001260 in the summary), and the margin of error is 0.002589247.
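The same steps can be wrapped in a small function. This is only a sketch: slope_ci is a hypothetical helper name, and the built-in cars data set is used as a stand-in so the block runs without downloading the NFL file. It follows the point estimate ± critical value × standard error recipe used above and compares the result with confint().

```r
# Hypothetical helper: t-based confidence interval for the slope
# of a simple linear regression (one predictor plus intercept)
slope_ci <- function(model, level = 0.95) {
  x      <- model.matrix(model)[, 2]            # predictor column (column 1 is the intercept)
  n      <- length(x)
  b1     <- unname(coef(model)[2])              # point estimate of the slope
  ms_res <- sum(residuals(model)^2) / (n - 2)   # residual mean square
  se_b1  <- sqrt(ms_res / sum((x - mean(x))^2)) # standard error of the slope
  crit   <- qt(1 - (1 - level) / 2, df = n - 2) # critical value
  c(lower = b1 - crit * se_b1, upper = b1 + crit * se_b1)
}

# Stand-in data: the built-in cars data set, not the NFL data
fit <- lm(dist ~ speed, data = cars)
slope_ci(fit)            # manual interval
confint(fit)["speed", ]  # built-in interval for comparison
```

Applied to nfl_mod, the helper should return the same x8 row that confint(nfl_mod) prints above.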
beta_0<-nfl_mod$coefficients[1]
X8<-1800
beta_0+(beta_1*X8)
## (Intercept)
## 9.14307
The model estimates about 9 wins (9.14) when rushing is limited to 1800 yards.
n<-dim(nfl)[1]
ft_value<-beta_0+(beta_1*X8) # fitted value
ft_value
## (Intercept)
## 9.14307
crt_value<-qt(.95, df = n-2) # critical value
crt_value
## [1] 1.705618
x_bar<-mean(nfl$x8)
std_err<-sqrt(ms_res*(1+(1/n)+((X8-x_bar)^2/sum((nfl$x8-x_bar)^2)))) # standard error
std_err
## [1] 2.466366
marg<-crt_value*std_err # margin of error
marg
## [1] 4.206679
conf<-ft_value+(c(-1,1)*marg)
conf
## [1] 4.936392 13.349749
If rushing were limited to 1800 yards, then with 26 degrees of freedom we are 90% confident that the number of games won by an individual team would be between 4.936392 and 13.349749 (since games are whole numbers, roughly 5 to 13 games). This is a prediction interval. The margin of error is 4.206679, the critical value is 1.705618, and the standard error for the predicted value is 2.466366. The fitted value is 9.14307.
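The manual prediction interval can also be checked against predict(). The sketch below uses the built-in cars data set as a stand-in for the NFL data, and x_new = 15 is an arbitrary illustrative value; the formula is the same one used for std_err above.

```r
# Stand-in data: built-in cars data set (speed, dist), not the NFL data
fit   <- lm(dist ~ speed, data = cars)
x_new <- 15      # hypothetical predictor value
level <- 0.90    # confidence level

# Manual prediction interval, following the same formula as above
n       <- nrow(cars)
x_bar   <- mean(cars$speed)
ms_res  <- sum(residuals(fit)^2) / (n - 2)                   # residual mean square
fit_val <- unname(coef(fit)[1] + coef(fit)[2] * x_new)       # fitted value
se_pred <- sqrt(ms_res * (1 + 1 / n +
                 (x_new - x_bar)^2 / sum((cars$speed - x_bar)^2)))
crit    <- qt(1 - (1 - level) / 2, df = n - 2)               # critical value
manual  <- fit_val + c(-1, 1) * crit * se_pred

# Same interval from predict(), supplying the new x value explicitly
auto <- predict(fit, newdata = data.frame(speed = x_new),
                interval = "prediction", level = level)
manual
auto
```

For the NFL model, the equivalent call would be predict(nfl_mod, newdata = data.frame(x8 = 1800), interval = "prediction", level = 0.90).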
n<-dim(nfl)[1]
ft_value<-beta_0+(beta_1*X8) # fitted value
ft_value
## (Intercept)
## 9.14307
crt_value<-qt(.95, df = n-2) # critical value
crt_value
## [1] 1.705618
x_bar<-mean(nfl$x8)
std_err<-sqrt(ms_res*((1/n)+((X8-x_bar)^2/sum((nfl$x8-x_bar)^2)))) # standard error
std_err
## [1] 0.597594
marg<-crt_value*std_err # margin of error
marg
## [1] 1.019267
conf<-ft_value+(c(-1,1)*marg)
conf
## [1] 8.123803 10.162337
If rushing were limited to 1800 yards, then with 26 degrees of freedom we are 90% confident that the mean number of games won would be between 8.123803 and 10.162337 (about 8 to 10 games). The margin of error is 1.019267, the critical value is 1.705618, and the standard error of the estimated mean is 0.597594. The fitted value is 9.14307.
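The prediction interval is for a single new observation, so its standard error includes the residual variance of an individual response (the extra 1 inside the square root) on top of the uncertainty in the estimated mean. That gives a larger margin of error and therefore a wider interval. The confidence interval for the mean response reflects only the uncertainty in the fitted line, so it is narrower.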
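The relationship between the two standard errors can be checked with the numbers already reported: the prediction variance is the variance of the estimated mean plus the residual variance. The sketch below uses the rounded values printed above (variable names are illustrative), so it should match the earlier output only to about three decimal places.

```r
# Values reported above (rounded as printed)
se_mean <- 0.597594   # standard error of the estimated mean at x8 = 1800
rse     <- 2.393      # residual standard error from summary(nfl_mod)
crit    <- 1.705618   # qt(0.95, df = 26)

# Prediction variance = variance of the estimated mean + residual variance
se_pred <- sqrt(se_mean^2 + rse^2)

marg_mean <- crit * se_mean   # margin of error, mean response
marg_pred <- crit * se_pred   # margin of error, single new observation

c(se_pred = se_pred, marg_mean = marg_mean, marg_pred = marg_pred)
```

Because rse^2 is added under the square root, the prediction margin can never be smaller than the margin for the mean at the same x value.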
confBand<-predict(nfl_mod, interval="confidence")
predBand<-predict(nfl_mod, interval="predict")
## Warning in predict.lm(nfl_mod, interval = "predict"): predictions on current data refer to _future_ responses
colnames(predBand)<-c("fit2", "lwr2", "upr2")
newDF<-cbind(nfl, confBand, predBand)
ggplot(newDF, aes(x=x8, y=y))+ # refer to columns of newDF directly rather than nfl$...
geom_point()+
geom_abline(slope=nfl_mod$coefficients[2], intercept=nfl_mod$coefficients[1],
color="blue", lty=2, lwd=1)+
geom_line(aes(y=lwr), color="green", lty=2, lwd=1)+
geom_line(aes(y=upr), color="green", lty=2, lwd=1)+
geom_line(aes(y=lwr2), color="red", lty=2, lwd=1)+
geom_line(aes(y=upr2), color="red", lty=2, lwd=1)+
theme_bw()
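An alternative way to build the bands is to predict over an evenly spaced grid of x values. Supplying newdata avoids the "predictions on current data" warning, since predict.lm raises it only when newdata is omitted, and the grid gives smooth curves regardless of how the observations are ordered. This is a sketch using the built-in cars data set as a stand-in for the NFL data; the names grid and bands are illustrative, and the plot is drawn only if ggplot2 is installed.

```r
# Stand-in data: built-in cars data set, not the NFL data
fit <- lm(dist ~ speed, data = cars)

# Evenly spaced grid over the observed range of the predictor
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))

conf_band <- predict(fit, newdata = grid, interval = "confidence")
pred_band <- predict(fit, newdata = grid, interval = "prediction")
colnames(pred_band) <- c("fit2", "lwr2", "upr2")   # avoid duplicate column names
bands <- cbind(grid, conf_band, pred_band)

# Draw the plot only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(bands, aes(x = speed)) +
    geom_point(data = cars, aes(y = dist)) +
    geom_line(aes(y = fit),  color = "blue",  linetype = 2) +  # fitted line
    geom_line(aes(y = lwr),  color = "green", linetype = 2) +  # confidence band
    geom_line(aes(y = upr),  color = "green", linetype = 2) +
    geom_line(aes(y = lwr2), color = "red",   linetype = 2) +  # prediction band
    geom_line(aes(y = upr2), color = "red",   linetype = 2) +
    theme_bw()
  print(p)
}
```

In the resulting plot the red prediction band should sit outside the green confidence band at every x value, matching the comparison of the two intervals at 1800 yards.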