1. question
do our cars have a different mpg than 17?
Model c: \(MPG_i=17+\epsilon_i\)
Model a: \(MPG_i=\beta_0+\epsilon_i\)
PA-PC: 1
N-PA: 31
Null: \(\beta_0 = 17\)
2. code models
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
nrow(mtcars)
## [1] 32
mtcars$mpg17<- mtcars$mpg - 17
mod.c<- lm(mpg17~0, data = mtcars)
mod.a<- lm(mpg17~1, data = mtcars)
3. look at results
modelCompare(mod.c, mod.a)
## SSE (Compact) = 1431.71
## SSE (Augmented) = 1126.047
## Delta R-Squared = 0
## Partial Eta-Squared (PRE) = 0.2134949
## F(1,31) = 8.414876, p = 0.00678831
mcSummary(mod.c)
## lm(formula = mpg17 ~ 0, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0
## Error 1431.71 32 44.741
## Corr Total 1431.71 32 44.741
##
## RMSE AdjEtaSq
## 6.689 NA
##
## Coefficients: none
mcSummary(mod.a)
## lm(formula = mpg17 ~ 1, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.000 0 Inf 0
## Error 1126.047 31 36.324
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 6.027 0
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 3.091 1.065 2.901 305.663 0.213 NA 0.918 5.264 0.007
the SSR for the intercept here is just SSE(C) - SSE(A) = 1431.71 - 1126.047 = 305.663. we can get SSE(C) and SSE(A) from the individual mcSummaries, or recover SSE(C) by adding the intercept's SSR(3) to the SS Error from mcSummary(mod.a).
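the subtraction above can be checked without mcSummary. this is a minimal base R sketch; the variable names (sse_c, sse_a, ssr and so on) are ours and not part of the original notes.

```r
# Compact model: predict 17 for every car (0 estimated parameters).
sse_c <- sum((mtcars$mpg - 17)^2)

# Augmented model: predict the sample mean (1 estimated parameter).
sse_a <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# SSR is the reduction in error we get from estimating the mean.
ssr <- sse_c - sse_a

# Equivalent shortcut for this comparison: n * (mean - 17)^2.
ssr_shortcut <- nrow(mtcars) * (mean(mtcars$mpg) - 17)^2

# PRE and F from the two SSEs, with PA - PC = 1 and N - PA = 31.
pre <- ssr / sse_c
f_stat <- (pre / 1) / ((1 - pre) / 31)
p_val <- pf(f_stat, df1 = 1, df2 = 31, lower.tail = FALSE)

round(c(SSE_C = sse_c, SSE_A = sse_a, SSR = ssr, PRE = pre, F = f_stat, p = p_val), 4)
```

these values should line up with the modelCompare output above.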
4. conclude
we wanted to see if our cars have a different average miles per gallon than 17. we can conclude that our cars' average mpg is significantly different from, and greater than, 17 (M=20.091, F(1,31)=8.41, PRE=.213, p=.007). we reject the null that our cars have an average mpg of 17.
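this model comparison is the same test as a one-sample t test against 17, with F equal to t squared. a short base R sketch (t.test is from the stats package; the object names are ours):

```r
# One-sample t test of the null that the mean mpg is 17.
tt <- t.test(mtcars$mpg, mu = 17)

t_val <- unname(tt$statistic)

# Squaring t should reproduce the F from the model comparison,
# with the same error df and the same p value.
c(t = t_val, t_squared = t_val^2, df = unname(tt$parameter), p = tt$p.value)
```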
5. what was our power?
true PRE (the sample PRE adjusted for bias)
\(1- ((1-PRE)*\frac{n-P_C} {n-P_A})\)
f2 (for pwr calc), computed from the true PRE rather than the sample PRE
\(f^2= PRE_{true}/(1-PRE_{true})\)
library(pwr) # provides pwr.f2.test()
# n = 32, PC = 0, PA = 1, so (n-PC)/(n-PA) = 32/31
truePRE<- 1- ((1-.213)*((32)/(31)))
f2<- truePRE/(1-truePRE)
pwr.f2.test(u = 1, v = 31, f2 = f2, sig.level = 0.05)
##
## Multiple regression power calculation
##
## u = 1
## v = 31
## f2 = 0.2309403
## sig.level = 0.05
## power = 0.7624732
we had 76% power to detect a true effect of f2 = .23 at a significance level of 0.05 (and more power for any larger effect).
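the power number can also be sketched in base R with the noncentral F distribution. this is a sketch under an assumption: we take the noncentrality parameter to be f2 * (u + v + 1), which is the convention we believe pwr.f2.test uses. all object names here are ours.

```r
n <- 32      # sample size
p_c <- 0     # parameters in model C
p_a <- 1     # parameters in model A
pre_obs <- 0.213

# Bias-adjusted ("true") PRE and the matching f2 effect size.
pre_true <- 1 - (1 - pre_obs) * (n - p_c) / (n - p_a)
f2 <- pre_true / (1 - pre_true)

u <- p_a - p_c  # numerator df
v <- n - p_a    # denominator df

# Assumption: noncentrality parameter lambda = f2 * (u + v + 1).
lambda <- f2 * (u + v + 1)

# Power = probability that a noncentral F exceeds the critical F.
f_crit <- qf(0.05, u, v, lower.tail = FALSE)
power <- pf(f_crit, u, v, ncp = lambda, lower.tail = FALSE)

c(pre_true = pre_true, f2 = f2, power = power)
```

if the assumption about lambda holds, power should agree with the pwr.f2.test output above.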
1. question
does the number of cylinders significantly predict mpg in cars?
Model c: \(MPG_i=\beta_0+\epsilon_i\)
Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)
PA-PC: 1
N-PA: 30
Null: \(\beta_1 = 0\)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb mpg17
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4.0
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 4.0
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 5.8
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 4.4
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 1.7
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 1.1
mod.c<- lm(mpg~1,data= mtcars)
mod.a<- lm(mpg~cyl,data = mtcars)
2. look at results
modelCompare(mod.c, mod.a)
## SSE (Compact) = 1126.047
## SSE (Augmented) = 308.3342
## Delta R-Squared = 0.72618
## Partial Eta-Squared (PRE) = 0.72618
## F(1,30) = 79.56103, p = 6.112687e-10
mcSummary(mod.c)
## lm(formula = mpg ~ 1, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.000 0 Inf 0
## Error 1126.047 31 36.324
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 6.027 0
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 1.065 18.857 12916.26 0.92 NA 17.918 22.264 0
mcSummary(mod.a)
## lm(formula = mpg ~ cyl, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 817.713 1 817.713 0.726 79.561 0
## Error 308.334 30 10.278
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 3.206 0.717
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918 NA 33.649 42.120 0
## cyl -2.876 0.322 -8.920 817.713 0.726 NA -3.534 -2.217 0
the model comparison for the slope asks whether the slope is different from 0, i.e. it compares model A to a model with no slope, which is our model C. that’s why the slope's SSR(3) (817.713) is the same as the SS for Model in the omnibus ANOVA at the top.
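a base R check of that point, as a sketch: the drop in SSE from model C to model A is the slope's SSR, and PRE for this comparison equals the squared correlation between mpg and cyl. object names are ours.

```r
fit_c <- lm(mpg ~ 1, data = mtcars)    # model C: intercept only
fit_a <- lm(mpg ~ cyl, data = mtcars)  # model A: intercept + slope

# deviance() on an lm returns the residual sum of squares (SSE).
sse_c <- deviance(fit_c)
sse_a <- deviance(fit_a)

ssr_slope <- sse_c - sse_a
pre <- ssr_slope / sse_c

# With a single predictor, PRE is the squared correlation.
r_squared <- cor(mtcars$mpg, mtcars$cyl)^2

c(SSR = ssr_slope, PRE = pre, r_squared = r_squared)

# anova() runs the same model comparison and reports the F test.
anova(fit_c, fit_a)
```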
if we mean center cyl and re-run, the intercept equals the mean mpg. currently, the intercept is the predicted mpg for a car with 0 cylinders (not plausible).
1b. code/ run models
mtcars$cyl_mean<- mtcars$cyl-mean(mtcars$cyl)
mod.c<- lm(mpg~1,data= mtcars)
mod.a<- lm(mpg~cyl_mean,data = mtcars)
2b. look at results
modelCompare(mod.c, mod.a)
## SSE (Compact) = 1126.047
## SSE (Augmented) = 308.3342
## Delta R-Squared = 0.72618
## Partial Eta-Squared (PRE) = 0.72618
## F(1,30) = 79.56103, p = 6.112687e-10
mcSummary(mod.c)
## lm(formula = mpg ~ 1, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.000 0 Inf 0
## Error 1126.047 31 36.324
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 6.027 0
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 1.065 18.857 12916.26 0.92 NA 17.918 22.264 0
mcSummary(mod.a)
## lm(formula = mpg ~ cyl_mean, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 817.713 1 817.713 0.726 79.561 0
## Error 308.334 30 10.278
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 3.206 0.717
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 0.567 35.45 12916.263 0.977 NA 18.933 21.248 0
## cyl_mean -2.876 0.322 -8.92 817.713 0.726 NA -3.534 -2.217 0
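what changes? not much except the intercept in the mean centered model A (estimate, standard error and CIs included). why do the intercept's SSR, PRE etc. change? the test of the intercept compares model A to a model with the intercept fixed at 0, i.e. a line forced to predict 0 mpg when the predictor is 0. with raw cyl that means 0 mpg for a car with 0 cylinders; with mean centered cyl it means 0 mpg for a car with the average number of cylinders. neither comparison is really meaningful to us, but they are different comparisons, so the intercept's statistics differ.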
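the two intercept comparisons can be made explicit by fitting the no-intercept models ourselves. a base R sketch; dat, cyl_c and the fit names are ours, not from the original notes.

```r
# Work on a copy with a mean centered predictor.
dat <- transform(mtcars, cyl_c = cyl - mean(cyl))

fit_raw <- lm(mpg ~ cyl, data = dat)
fit_ctr <- lm(mpg ~ cyl_c, data = dat)

# Centering leaves the slope and the SSE untouched.
c(slope_raw = coef(fit_raw)[["cyl"]], slope_ctr = coef(fit_ctr)[["cyl_c"]])
c(sse_raw = deviance(fit_raw), sse_ctr = deviance(fit_ctr))

# The centered intercept is the mean mpg.
c(intercept_ctr = coef(fit_ctr)[["(Intercept)"]], mean_mpg = mean(dat$mpg))

# The intercept's test compares each model to one with the intercept fixed at 0.
noint_raw <- lm(mpg ~ 0 + cyl, data = dat)
noint_ctr <- lm(mpg ~ 0 + cyl_c, data = dat)

# SSR for the intercept = extra error when the intercept is dropped.
ssr_int_raw <- deviance(noint_raw) - deviance(fit_raw)
ssr_int_ctr <- deviance(noint_ctr) - deviance(fit_ctr)
c(ssr_int_raw = ssr_int_raw, ssr_int_ctr = ssr_int_ctr)
```

the two SSR values should match the intercept rows of the raw and mean centered mcSummary outputs.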
3. make a conclusion, this time talk about both estimates, use CIs and describe a p value
we wanted to see if the number of cylinders a car has significantly predicts mpg. we found the number of cylinders does significantly predict mpg (\(b_1=-2.876, t=-8.92, PRE=.726, p<.001\)). the average mpg of a car in our sample is 20.091, and as cylinders increase by 1, mpg decreases on average by 2.876, 95% CI = [-3.534, -2.217]. if we re-sampled this population over and over and built a 95% CI each time, we would expect about 95% of those intervals to contain the true effect; this interpretation does not assume the null. assuming the null (that the average effect a cylinder has on mpg is 0) is true, our p value tells us that we would expect to estimate an effect as large as or larger than what we found here (in either direction) less than 0.1% of the time.
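the CI and p value for the slope can be rebuilt by hand from the estimate and its standard error. a base R sketch with our own object names:

```r
fit <- lm(mpg ~ cyl, data = mtcars)

est <- coef(summary(fit))["cyl", "Estimate"]
se <- coef(summary(fit))["cyl", "Std. Error"]
df <- df.residual(fit)  # N - PA = 30

# 95% CI: estimate +/- critical t * standard error.
t_crit <- qt(0.975, df)
ci_manual <- c(lower = est - t_crit * se, upper = est + t_crit * se)

# Built-in version for comparison.
ci_builtin <- confint(fit, "cyl", level = 0.95)

# Two-sided p value: chance of a t this extreme or more if the null slope of 0 is true.
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df, lower.tail = FALSE)

ci_manual
ci_builtin
c(t = t_val, p = p_val)
```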
4. write a 5 o’clock news summary
we analyzed whether the number of cylinders a car has is related to its miles per gallon. we found the number of cylinders does significantly predict mpg: for each additional cylinder, the average mpg decreases by 2.876.
1. question: is the relationship between number of cylinders and mpgs in a car such that when you increase cyl by 1 the mpgs of a car increases 5?
Model c: \(MPG_i=\beta_0+5*cyl+\epsilon_i\)
Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)
PA-PC: 1
N-PA: 30
Null: \(\beta_1 = 5\)
mod.c<- lm(mpg~1, offset = 5 * cyl, data=mtcars) # ~1= estimate the intercept, offset= constrain slope
mod.a<- lm(mpg~cyl,data = mtcars) # ~cyl = estimate slope, intercept is implied when we do this
2. look at results
modelCompare(mod.c, mod.a)
## SSE (Compact) = 6441.36
## SSE (Augmented) = 308.3342
## Delta R-Squared = 0.72618
## Partial Eta-Squared (PRE) = 0.9521321
## F(1,30) = 596.7251, p = 2.346365e-21
mcSummary(mod.c)
## lm(formula = mpg ~ 1, data = mtcars, offset = 5 * cyl)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 2471.875 0 Inf 0.277
## Error 6441.360 31 207.786
## Corr Total 8913.235 31 287.524
##
## RMSE AdjEtaSq
## 14.415 0.277
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -10.847 2.548 -4.257 3764.95 0.369 NA -16.044 -5.65 0
mcSummary(mod.a)
## lm(formula = mpg ~ cyl, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 817.713 1 817.713 0.726 79.561 0
## Error 308.334 30 10.278
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 3.206 0.717
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918 NA 33.649 42.120 0
## cyl -2.876 0.322 -8.920 817.713 0.726 NA -3.534 -2.217 0
the output for model A here tells us there is a significant relationship between cyls and mpgs (t=-8.920, PRE=.726, p<.001) such that as cyls increase 1, mpgs decrease 2.876. BUT this doesn’t answer our model comparison from above! The model comparison for the slope of model a here is:
Model c: \(MPG_i=\beta_0+\epsilon_i\)
Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)
PA-PC: 1
N-PA: 30
Null: \(\beta_1 = 0\)
vs what we want to ask with this specific question:
Model c: \(MPG_i=\beta_0+5*cyl+\epsilon_i\)
Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)
PA-PC: 1
N-PA: 30
Null: \(\beta_1 = 5\)
you can see how they’re similar! however, since the slope's test in the model A output doesn’t give us this comparison, we need to calculate PRE, F etc. ourselves.
SSE.c<- 6441.360 # SS Error from mcSummary(mod.c), the model with the offset (slope fixed at 5)
SSE.a<- 308.334 # SS Error from mcSummary(mod.a). each mcSummary reports the SSE of the model it was given, so we pair the two by hand to get a comparison that differs from the default slope test (null: slope = 0)
(PRE<- (SSE.c-SSE.a)/SSE.c)
## [1] 0.9521322
PAminusPC<- 1
NminusPA<- 30
(F_stat<- (PRE/PAminusPC)/((1-PRE)/NminusPA))
## [1] 596.7256
# notice how our PRE and F line up with the modelCompare from earlier!
modelCompare(mod.c, mod.a)
## SSE (Compact) = 6441.36
## SSE (Augmented) = 308.3342
## Delta R-Squared = 0.72618
## Partial Eta-Squared (PRE) = 0.9521321
## F(1,30) = 596.7251, p = 2.346365e-21
# therefore, we reject the null that the relationship between mpgs and cyls is such that as cyls increase 1, mpgs increase 5 (F(1,30)=596.73, PRE=.952, p<.001).
# this makes sense because, as we said, as we increase cyls 1, mpgs actually decrease.
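another way to see the same test, as a sketch in base R: shift the usual t statistic so it measures the distance of the slope from 5 instead of from 0. squaring that t should reproduce the F from the model comparison. object names are ours.

```r
fit_a <- lm(mpg ~ cyl, data = mtcars)                  # slope estimated
fit_c <- lm(mpg ~ 1, offset = 5 * cyl, data = mtcars)  # slope fixed at 5

b1 <- coef(fit_a)[["cyl"]]
se1 <- coef(summary(fit_a))["cyl", "Std. Error"]

# t for the null that the slope is 5 (not the default null of 0).
t_5 <- (b1 - 5) / se1

# Same test as a model comparison, from the two SSEs.
sse_c <- deviance(fit_c)
sse_a <- deviance(fit_a)
pre <- (sse_c - sse_a) / sse_c
f_5 <- (sse_c - sse_a) / (sse_a / df.residual(fit_a))

c(t = t_5, t_squared = t_5^2, F = f_5, PRE = pre)
```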
1. is there a difference in mpgs between american made and non-american made cars?
table(mtcars$am) #19 zeroes (let's say american) and 13 ones (let's say non-american)
##
## 0 1
## 19 13
tapply(mtcars$mpg, mtcars$am, mean)
## 0 1
## 17.14737 24.39231
# tapply gives us the mean mpg stratified by american or not
# so the mean mpg of american (0) cars = 17.147
# and the mean mpg of non-am (1) cars = 24.392
mod.a<- lm(mpg~am, data=mtcars)
mcSummary(mod.a) # this model asks whether there is a relationship between american vs non-american cars and mpgs. the intercept also tells us the mean mpg for an american car, because the intercept is the predicted mpg when am = 0, which is the mean mpg for american cars.
## lm(formula = mpg ~ am, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 405.151 1 405.151 0.36 16.86 0
## Error 720.897 30 24.030
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 4.902 0.338
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 17.147 1.125 15.247 5586.613 0.886 NA 14.851 19.444 0
## am 7.245 1.764 4.106 405.151 0.360 NA 3.642 10.848 0
# we could write out the equation as mpgs=17.147+7.245*american
# therefore, the intercept 17.147 is the mean mpgs for american cars. we saw that above! remember
# the slope, when the predictor is coded as 0 and 1, is the difference between the two group means on our Y value.
# if we add the slope (7.245) to the intercept (17.147), we get 24.392, or the mean mpgs of non american cars, from above.
# the t, PRE and p from the slope of the output tells us there is a significant difference in mpgs between american and non american made cars (t=4.106, PRE=0.360, p<.001).
we could have written out that model comparison as:
Model c: \(MPG_i=\beta_0+\epsilon_i\)
Model a: \(MPG_i=\beta_0+\beta_1*am+\epsilon_i\)
PA-PC: 1
N-PA: 30
Null: \(\beta_1 = 0\)
that question might also be asked as: “do american and non-american made cars differ on their mpgs?”
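a base R sketch of why the 0/1 coding works this way, and of the link to the classic pooled-variance two-sample t test (t.test with var.equal = TRUE). object names are ours.

```r
# Group means computed directly.
mean_0 <- mean(mtcars$mpg[mtcars$am == 0])
mean_1 <- mean(mtcars$mpg[mtcars$am == 1])

fit <- lm(mpg ~ am, data = mtcars)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["am"]]

# Intercept = mean of the group coded 0; intercept + slope = mean of the group coded 1.
c(b0 = b0, mean_0 = mean_0, b0_plus_b1 = b0 + b1, mean_1 = mean_1)

# The pooled-variance t test is the same test as the slope
# (the sign flips because t.test subtracts group 1 from group 0).
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
t_val <- unname(tt$statistic)
c(t = t_val, t_squared = t_val^2, df = unname(tt$parameter), p = tt$p.value)
```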
#testing the explicit model C as well as the model A:
mod.c<- lm(mpg~1, data=mtcars)
mcSummary(mod.c) # remember the intercept of this model is just the mean mpgs, or mpgs of the average car in our sample
## lm(formula = mpg ~ 1, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 0.000 0 Inf 0
## Error 1126.047 31 36.324
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 6.027 0
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 1.065 18.857 12916.26 0.92 NA 17.918 22.264 0
mod.a<- lm(mpg~am, data=mtcars)
mcSummary(mod.a) # intercept here is now mpgs when am=0, so average mpg of an american made car
## lm(formula = mpg ~ am, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 405.151 1 405.151 0.36 16.86 0
## Error 720.897 30 24.030
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 4.902 0.338
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 17.147 1.125 15.247 5586.613 0.886 NA 14.851 19.444 0
## am 7.245 1.764 4.106 405.151 0.360 NA 3.642 10.848 0
modelCompare(mod.c, mod.a)
## SSE (Compact) = 1126.047
## SSE (Augmented) = 720.8966
## Delta R-Squared = 0.3597989
## Partial Eta-Squared (PRE) = 0.3597989
## F(1,30) = 16.86028, p = 0.0002850207
offsets let us ask about the relationship between two variables while holding the slope at a fixed value. so here, we might ask: is the relationship between mpgs and cylinders such that as cylinders increase 1, mpgs increase by 5?
or our null hypothesis is that for every additional cylinder, mpgs increase by 5.
mod.c<- lm(mpg~1,offset = 5*cyl, data=mtcars)
mcSummary(mod.c)
## lm(formula = mpg ~ 1, data = mtcars, offset = 5 * cyl)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 2471.875 0 Inf 0.277
## Error 6441.360 31 207.786
## Corr Total 8913.235 31 287.524
##
## RMSE AdjEtaSq
## 14.415 0.277
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) -10.847 2.548 -4.257 3764.95 0.369 NA -16.044 -5.65 0
mod.a<- lm(mpg~cyl, data=mtcars)
mcSummary(mod.a)
## lm(formula = mpg ~ cyl, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 817.713 1 817.713 0.726 79.561 0
## Error 308.334 30 10.278
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 3.206 0.717
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918 NA 33.649 42.120 0
## cyl -2.876 0.322 -8.920 817.713 0.726 NA -3.534 -2.217 0
modelCompare(mod.c, mod.a)
## SSE (Compact) = 6441.36
## SSE (Augmented) = 308.3342
## Delta R-Squared = 0.72618
## Partial Eta-Squared (PRE) = 0.9521321
## F(1,30) = 596.7251, p = 2.346365e-21
our model C doesn’t give output for the slope because we are holding it constant. so we are asking with that model c if cyls have a relationship with mpgs such that a 1 unit increase in cyls is associated with a 5 unit increase in mpgs, but R doesn’t explicitly give us those values.
mod.a<- lm(mpg~cyl, data=mtcars)
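mcSummary(mod.a) # for model a, we know the slope is really -2.876, or as cyls increase 1, the mpgs actually DECREASE 2.876. the slope's own test (t=-8.920, PRE=.726, p<.001) is against a null of 0, so it is not the test of our null. the test of our null (slope = 5) is the modelCompare above, and it lets us reject that null (F(1,30)=596.73, PRE=.952, p<.001).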
## lm(formula = mpg ~ cyl, data = mtcars)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 817.713 1 817.713 0.726 79.561 0
## Error 308.334 30 10.278
## Corr Total 1126.047 31 36.324
##
## RMSE AdjEtaSq
## 3.206 0.717
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918 NA 33.649 42.120 0
## cyl -2.876 0.322 -8.920 817.713 0.726 NA -3.534 -2.217 0
# AND we can see if we offset this model to hold the slope constant at our actual slope:
mod.a2<- lm(mpg~1, offset=-2.876*cyl, data=mtcars)
mcSummary(mod.a2) # now our intercept (37.886) and SS Error (308.334) are the same, up to rounding of the slope, as in the model where we estimated the slope and intercept.
## lm(formula = mpg ~ 1, data = mtcars, offset = -2.876 * cyl)
##
## Omnibus ANOVA
## SS df MS EtaSq F p
## Model 817.832 0 Inf 0.726
## Error 308.334 31 9.946
## Corr Total 1126.167 31 36.328
##
## RMSE AdjEtaSq
## 3.154 0.726
##
## Coefficients
## Est StErr t SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.886 0.558 67.955 45930.86 0.993 NA 36.749 39.023 0
# so when we are offsetting, the only thing estimated and tested is the intercept: the average of Y after subtracting the assumed slope times X (here, the average of mpg minus the fixed slope times cyl)
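a base R sketch of what the offset does: fitting with offset = 5 * cyl gives the same intercept and SSE as moving 5 * cyl to the left-hand side and fitting an intercept-only model. object names are ours.

```r
# Offset version, as used in the notes.
fit_offset <- lm(mpg ~ 1, offset = 5 * cyl, data = mtcars)

# Equivalent version: subtract the assumed slope * cyl from mpg first.
fit_shifted <- lm(I(mpg - 5 * cyl) ~ 1, data = mtcars)

# The intercept is just the mean of the shifted outcome.
b0_by_hand <- mean(mtcars$mpg) - 5 * mean(mtcars$cyl)

c(offset = coef(fit_offset)[[1]],
  shifted = coef(fit_shifted)[[1]],
  by_hand = b0_by_hand)

# Both versions leave the same residuals, so the SSE matches too.
c(sse_offset = deviance(fit_offset), sse_shifted = deviance(fit_shifted))
```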