review before exam 1

testing a simple model

1. question

do our cars have a mean mpg different from 17?

Model c: \(MPG_i=17+\epsilon_i\)

Model a: \(MPG_i=\beta_0+\epsilon_i\)

PA-PC: 1

N-PA: 31

Null: \(\beta_0 = 17\)

2. code models

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

nrow(mtcars)

## [1] 32

mtcars$mpg17<- mtcars$mpg - 17
mod.c<- lm(mpg17~0, data = mtcars)
mod.a<- lm(mpg17~1, data = mtcars)

3. look at results

modelCompare(mod.c, mod.a)

## SSE (Compact) =  1431.71 
## SSE (Augmented) =  1126.047 
## Delta R-Squared =  0 
## Partial Eta-Squared (PRE) =  0.2134949 
## F(1,31) = 8.414876, p = 0.00678831

mcSummary(mod.c)

## lm(formula = mpg17 ~ 0, data = mtcars)
## 
## Omnibus ANOVA
##                 SS df     MS EtaSq F p
## Model               0                 
## Error      1431.71 32 44.741          
## Corr Total 1431.71 32 44.741          
## 
##   RMSE AdjEtaSq
##  6.689       NA
## 
## Coefficients: none

mcSummary(mod.a)

## lm(formula = mpg17 ~ 1, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df     MS EtaSq F p
## Model         0.000  0    Inf     0    
## Error      1126.047 31 36.324          
## Corr Total 1126.047 31 36.324          
## 
##   RMSE AdjEtaSq
##  6.027        0
## 
## Coefficients
##               Est StErr     t  SSR(3) EtaSq tol CI_2.5 CI_97.5     p
## (Intercept) 3.091 1.065 2.901 305.663 0.213  NA  0.918   5.264 0.007

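the SSR for the intercept is just SSE(C) - SSE(A): 1431.71 - 1126.047 = 305.663. we can get both SSEs from the Error SS in the two mcSummaries, or recover SSE(C) from mcSummary(mod.a) alone by adding the intercept's SSR(3) to model A's Error SS (305.663 + 1126.047 = 1431.71).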
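as a cross-check, here is a minimal sketch that rebuilds the same comparison with base R only (no mcSummary/modelCompare), assuming the built-in mtcars data. it gets each SSE with deviance(), takes their difference as the SSR, and then computes PRE, F and p from the PA-PC and N-PA values above. the object names (d, mod_c, sse_c, etc.) are just illustrative.

```r
# base-R version of the simple-model comparison (model C: mpg = 17; model A: mpg = b0)
d <- data.frame(mpg17 = mtcars$mpg - 17)  # shift so model C predicts 0 for every car

mod_c <- lm(mpg17 ~ 0, data = d)  # model C: no estimated parameters
mod_a <- lm(mpg17 ~ 1, data = d)  # model A: estimates the intercept

sse_c <- deviance(mod_c)  # sum of squared residuals, model C
sse_a <- deviance(mod_a)  # sum of squared residuals, model A
ssr   <- sse_c - sse_a    # error reduced by estimating the intercept

n  <- nrow(mtcars)
pa <- 1
pc <- 0
pre    <- ssr / sse_c
f_stat <- (pre / (pa - pc)) / ((1 - pre) / (n - pa))
p_val  <- pf(f_stat, pa - pc, n - pa, lower.tail = FALSE)

round(c(SSE_C = sse_c, SSE_A = sse_a, SSR = ssr, PRE = pre, F = f_stat, p = p_val), 4)
```

the numbers should match the modelCompare() output above (SSE(C) = 1431.71, SSE(A) = 1126.047, F(1,31) = 8.41).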

4. conclude

we wanted to see if our cars have a different average miles per gallon than 17 mpgs. we can conclude from this that our cars do have a significantly different, and greater, mpg than 17 (M=20.091, F(1,31)=8.41, PRE=.213, p=.007). we can reject the null that our cars have an average mpg of 17.

5. what was our power?

true PRE

\(1- ((1-PRE)*\frac{n-P_C} {n-P_A})\)

f2 (for pwr calc)

\(f2= \eta^2/(1-\eta^2)\)

truePRE<- 1- ((1-.213)*((32)/(31))) # n - PC = 32 - 0, n - PA = 32 - 1
f2<- truePRE/(1-truePRE)            # convert true PRE to Cohen's f2

library(pwr) # pwr.f2.test() comes from the pwr package
pwr.f2.test(u = 1, v = 31, f2 = f2, sig.level = 0.05)

## 
##      Multiple regression power calculation 
## 
##               u = 1
##               v = 31
##              f2 = 0.2309403
##       sig.level = 0.05
##           power = 0.7624732

assuming the true effect is the size we estimated (true PRE of about .19, f2 = .23), we had about 76% power to detect it at a significance level of .05.
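to see what pwr.f2.test() is doing, here is a base-R sketch of the same power calculation using the noncentral F distribution. the assumption (labeled in the code too) is that pwr uses the noncentrality lambda = f2 * (u + v + 1); if exact agreement matters, check against pwr itself. variable names are illustrative.

```r
# post hoc power for the simple-model test, base R only
pre <- .213            # PRE from modelCompare(), rounded as in the notes
n <- 32; pc <- 0; pa <- 1

true_pre <- 1 - (1 - pre) * (n - pc) / (n - pa)  # adjusted ("true") PRE
f2 <- true_pre / (1 - true_pre)                  # Cohen's f2

u <- pa - pc           # numerator df
v <- n - pa            # denominator df
alpha <- 0.05

f_crit <- qf(alpha, u, v, lower.tail = FALSE)    # critical F if the null were true
# ASSUMPTION: same noncentrality convention as pwr.f2.test()
lambda <- f2 * (u + v + 1)
power  <- pf(f_crit, u, v, ncp = lambda, lower.tail = FALSE)  # P(F > critical | effect is real)

round(c(true_pre = true_pre, f2 = f2, power = power), 4)
```

power is the share of the noncentral F distribution (what F looks like when the effect is real) that lands beyond the critical value.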

testing a bivariate (2 variable) association

1. question

does the number of cylinders significantly predict mpg in cars?

Model c: \(MPG_i=\beta_0+\epsilon_i\)

Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)

PA-PC: 1

N-PA: 30

Null: \(\beta_1 = 0\)

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb mpg17
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   4.0
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   4.0
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   5.8
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   4.4
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   1.7
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   1.1

mod.c<- lm(mpg~1,data= mtcars)
mod.a<- lm(mpg~cyl,data = mtcars)

2. look at results

modelCompare(mod.c, mod.a)

## SSE (Compact) =  1126.047 
## SSE (Augmented) =  308.3342 
## Delta R-Squared =  0.72618 
## Partial Eta-Squared (PRE) =  0.72618 
## F(1,30) = 79.56103, p = 6.112687e-10

mcSummary(mod.c)

## lm(formula = mpg ~ 1, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df     MS EtaSq F p
## Model         0.000  0    Inf     0    
## Error      1126.047 31 36.324          
## Corr Total 1126.047 31 36.324          
## 
##   RMSE AdjEtaSq
##  6.027        0
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 1.065 18.857 12916.26  0.92  NA 17.918  22.264 0

mcSummary(mod.a)

## lm(formula = mpg ~ cyl, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model       817.713  1 817.713 0.726 79.561 0
## Error       308.334 30  10.278               
## Corr Total 1126.047 31  36.324               
## 
##   RMSE AdjEtaSq
##  3.206    0.717
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918  NA 33.649  42.120 0
## cyl         -2.876 0.322 -8.920  817.713 0.726  NA -3.534  -2.217 0

the test of the slope asks whether the slope is different from 0, i.e., it compares model A to a model with no slope, which is exactly our model C. that's why the slope's SSR(3) (817.713) is the same as the Model SS in the omnibus ANOVA at the top.
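a quick base-R check of that claim, as a sketch: anova() on the two nested lm fits gives the SSE change and F for the comparison, and for a one-parameter comparison the slope's t squared equals that F. names like ssr_slope and model_ss are illustrative.

```r
# slope SSR = Model SS, and t^2 = F, for the mpg ~ cyl comparison
mod_c <- lm(mpg ~ 1,   data = mtcars)  # model C: intercept only
mod_a <- lm(mpg ~ cyl, data = mtcars)  # model A: intercept + slope

ssr_slope <- deviance(mod_c) - deviance(mod_a)          # SSE(C) - SSE(A)
model_ss  <- sum((fitted(mod_a) - mean(mtcars$mpg))^2)  # omnibus Model SS

cmp     <- anova(mod_c, mod_a)                          # nested model comparison
f_stat  <- cmp$F[2]
t_slope <- coef(summary(mod_a))["cyl", "t value"]

round(c(SSR = ssr_slope, ModelSS = model_ss, F = f_stat, t_squared = t_slope^2), 3)
```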

if we mean center cyl and re-run the model, the intercept equals the mean mpg. right now, the intercept is the predicted mpg for a car with 0 cylinders, which isn't a plausible car.

1b. code/ run models

mtcars$cyl_mean<- mtcars$cyl-mean(mtcars$cyl)
mod.c<- lm(mpg~1,data= mtcars)
mod.a<- lm(mpg~cyl_mean,data = mtcars)

2b. look at results

modelCompare(mod.c, mod.a)

## SSE (Compact) =  1126.047 
## SSE (Augmented) =  308.3342 
## Delta R-Squared =  0.72618 
## Partial Eta-Squared (PRE) =  0.72618 
## F(1,30) = 79.56103, p = 6.112687e-10

mcSummary(mod.c)

## lm(formula = mpg ~ 1, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df     MS EtaSq F p
## Model         0.000  0    Inf     0    
## Error      1126.047 31 36.324          
## Corr Total 1126.047 31 36.324          
## 
##   RMSE AdjEtaSq
##  6.027        0
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 1.065 18.857 12916.26  0.92  NA 17.918  22.264 0

mcSummary(mod.a)

## lm(formula = mpg ~ cyl_mean, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model       817.713  1 817.713 0.726 79.561 0
## Error       308.334 30  10.278               
## Corr Total 1126.047 31  36.324               
## 
##   RMSE AdjEtaSq
##  3.206    0.717
## 
## Coefficients
##                Est StErr     t    SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 0.567 35.45 12916.263 0.977  NA 18.933  21.248 0
## cyl_mean    -2.876 0.322 -8.92   817.713 0.726  NA -3.534  -2.217 0

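what changes? only the intercept row of the centered model A (estimate, SE, t, SSR, PRE, and CI). the slope, the SSEs, and the model comparison are identical. the intercept's test compares model A to a model with no intercept, which forces the regression line through the point where the predictor = 0 and mpg = 0. with raw cyl, predictor = 0 means a car with 0 cylinders, so the intercept (37.885) is an extrapolation far from our data and has a large SE (2.074). after centering, predictor = 0 means a car with the average number of cylinders, so the intercept is the mean mpg (20.091) and is estimated more precisely (SE = 0.567). the question the intercept answers has changed, so its PRE, t, and CI change too.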
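here is a small base-R sketch of the raw vs. centered comparison. it builds the centered predictor in a local copy of the data (d is an illustrative name) and shows that both fits are the same line, that the centered intercept is the sample mean mpg, and that the raw intercept is the prediction for a 0-cylinder car.

```r
# raw vs. mean-centered cyl, side by side
d <- transform(mtcars, cyl_mean = cyl - mean(cyl))

raw      <- lm(mpg ~ cyl,      data = d)
centered <- lm(mpg ~ cyl_mean, data = d)

coef(summary(raw))[, c("Estimate", "Std. Error")]       # intercept SE is large
coef(summary(centered))[, c("Estimate", "Std. Error")]  # same slope, smaller intercept SE

c(deviance(raw), deviance(centered))       # identical SSEs: same fitted line
c(coef(centered)[1], mean(d$mpg))          # centered intercept = mean mpg
predict(raw, newdata = data.frame(cyl = 0))  # raw intercept = predicted mpg at 0 cylinders
```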

3. make a conclusion, this time talk about both estimates, use CIs and describe a p value

we wanted to see if the number of cylinders a car has significantly predicts mpg. it does (\(b_1=-2.876, t(30)=-8.92, PRE=.726, p<.001\)). the average mpg of a car in our sample is 20.091 (the intercept of the centered model), and for each additional cylinder, mpg decreases on average by 2.876 (95% CI = [-3.534, -2.217]). the CI does not assume the null: if we re-sampled from this population over and over and computed a 95% CI each time, about 95% of those intervals would contain the true slope. because our interval excludes 0, it agrees with rejecting the null. the p value does assume the null: if the true slope were 0, we would expect to estimate a slope at least as far from 0 as ours (in either direction) far less than 0.1% of the time (p = 6.1e-10).
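a simulation sketch of what "95% confidence" means. assumption (for illustration only): we pretend the fitted model is the truth (intercept 37.885, slope -2.876, residual SD 3.206), keep the observed cyl values fixed, and draw many new samples of mpg. the second part computes the two-sided p value directly from the observed t.

```r
# CI coverage by simulation, plus the two-sided p value
set.seed(1)
cyl <- mtcars$cyl
true_b0 <- 37.885; true_b1 <- -2.876; sigma <- 3.206  # ASSUMED "true" values
n_sims <- 2000

covered <- replicate(n_sims, {
  mpg_sim <- true_b0 + true_b1 * cyl + rnorm(length(cyl), sd = sigma)
  ci <- confint(lm(mpg_sim ~ cyl))["cyl", ]  # 95% CI for the slope in this sample
  ci[1] <= true_b1 && true_b1 <= ci[2]       # did this interval catch the true slope?
})
mean(covered)  # share of intervals containing the true slope (close to .95)

# p value: if the slope were truly 0, how often would t be at least this far from 0?
fit <- lm(mpg ~ cyl, data = mtcars)
t_obs <- coef(summary(fit))["cyl", "t value"]
p_two_sided <- 2 * pt(abs(t_obs), df = df.residual(fit), lower.tail = FALSE)
p_two_sided
```

the "95%" describes the long-run behavior of the interval procedure, not the probability that this one interval contains the truth.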

4. write a 5 o’clock news summary

we analyzed whether the number of cylinders a car has predicts its miles per gallon. cars with more cylinders got significantly fewer miles per gallon: each additional cylinder was associated with about 2.9 fewer mpg on average.

constraining the slope

1. question: is the relationship between number of cylinders and mpgs in a car such that when you increase cyl by 1, the mpg of a car increases by 5?

Model c: \(MPG_i=\beta_0+5*cyl+\epsilon_i\)

Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)

PA-PC: 1

N-PA: 30

Null: \(\beta_1 = 5\)

mod.c<- lm(mpg~1, offset = 5 * cyl, data=mtcars) # ~1= estimate the intercept, offset= constrain slope
mod.a<- lm(mpg~cyl,data = mtcars) # ~cyl = estimate slope, intercept is implied when we do this

2. look at results

modelCompare(mod.c, mod.a)

## SSE (Compact) =  6441.36 
## SSE (Augmented) =  308.3342 
## Delta R-Squared =  0.72618 
## Partial Eta-Squared (PRE) =  0.9521321 
## F(1,30) = 596.7251, p = 2.346365e-21

mcSummary(mod.c)

## lm(formula = mpg ~ 1, data = mtcars, offset = 5 * cyl)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq F p
## Model      2471.875  0     Inf 0.277    
## Error      6441.360 31 207.786          
## Corr Total 8913.235 31 287.524          
## 
##    RMSE AdjEtaSq
##  14.415    0.277
## 
## Coefficients
##                 Est StErr      t  SSR(3) EtaSq tol  CI_2.5 CI_97.5 p
## (Intercept) -10.847 2.548 -4.257 3764.95 0.369  NA -16.044   -5.65 0

mcSummary(mod.a)

## lm(formula = mpg ~ cyl, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model       817.713  1 817.713 0.726 79.561 0
## Error       308.334 30  10.278               
## Corr Total 1126.047 31  36.324               
## 
##   RMSE AdjEtaSq
##  3.206    0.717
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918  NA 33.649  42.120 0
## cyl         -2.876 0.322 -8.920  817.713 0.726  NA -3.534  -2.217 0

the output for model A here tells us there is a significant relationship between cyls and mpgs (t=-8.920, PRE=.726, p<.001) such that as cyls increase 1, mpgs decrease 2.876. BUT this doesn't answer our model comparison from above! the test of the slope that mcSummary reports for model A is:

Model c: \(MPG_i=\beta_0+\epsilon_i\)

Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)

PA-PC: 1

N-PA: 30

Null: \(\beta_1 = 0\)

vs what we want to ask with this specific question:

Model c: \(MPG_i=\beta_0+5*cyl+\epsilon_i\)

Model a: \(MPG_i=\beta_0+\beta_1*cyl+\epsilon_i\)

PA-PC: 1

N-PA: 30

Null: \(\beta_1 = 5\)

you can see how similar they are: only model C differs. since mcSummary's test of the slope always uses a null of 0, it doesn't give us this test, so we need to calculate PRE, F, etc. ourselves from the two SSEs.

SSE.c<- 6441.360 # SS Error from mcSummary(mod.c), the model with the slope fixed at 5 by the offset
SSE.a<- 308.334  # SS Error from mcSummary(mod.a), the model that estimates the slope
# note: "Error" in each mcSummary is the SSE of the model that was fit, whichever role (C or A) it plays in our comparison

(PRE<- (SSE.c-SSE.a)/SSE.c)

## [1] 0.9521322

PAminusPC<- 1
NminusPA<- 30
(F_stat<- (PRE/PAminusPC)/((1-PRE)/NminusPA))

## [1] 596.7256

# notice how our PRE and F line up with the modelCompare output from earlier (tiny differences come from rounding the SSEs)!

modelCompare(mod.c, mod.a)

## SSE (Compact) =  6441.36 
## SSE (Augmented) =  308.3342 
## Delta R-Squared =  0.72618 
## Partial Eta-Squared (PRE) =  0.9521321 
## F(1,30) = 596.7251, p = 2.346365e-21

# therefore, we can conclude that we reject the null that the relationship between mpgs and cyls is such that as cyls increase 1, mpgs increase 5.
# we can easily see this because as we said, as we increase cyls 1, mpgs actually decrease.
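a base-R sketch of the same test that avoids hard-coding the SSEs. it shows three equivalent routes: the nested anova() comparison, a t test of (estimate - 5) / SE whose square equals that F, and an ordinary slope-vs-0 test after shifting the outcome to mpg - 5*cyl. the shifted-outcome trick is our illustration, not something from the notes; object names are illustrative.

```r
# testing H0: slope = 5 three equivalent ways
mod_c <- lm(mpg ~ 1, offset = 5 * cyl, data = mtcars)  # slope fixed at 5
mod_a <- lm(mpg ~ cyl, data = mtcars)                  # slope estimated

# 1) nested comparison: SSE(C) = 6441.36, SSE(A) = 308.33
cmp <- anova(mod_c, mod_a)
cmp

# 2) t test against 5: t^2 equals the F from the comparison
est <- coef(summary(mod_a))["cyl", "Estimate"]
se  <- coef(summary(mod_a))["cyl", "Std. Error"]
t_vs_5 <- (est - 5) / se
c(t = t_vs_5, t_squared = t_vs_5^2, F = cmp$F[2])

# 3) shift the outcome so "slope = 5" becomes "slope = 0"
shifted <- lm(I(mpg - 5 * cyl) ~ cyl, data = mtcars)
coef(summary(shifted))["cyl", ]  # estimate = b1 - 5, same SE, same t as route 2
```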

categorical predictors

1. is there a difference in mpgs between american made and non-american made cars?

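table(mtcars$am) # 19 zeroes (let's say american) and 13 ones (let's say non-american); in mtcars, am really codes transmission (0 = automatic, 1 = manual), so the country labels are just for practice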

## 
##  0  1 
## 19 13

tapply(mtcars$mpg, mtcars$am, mean)

##        0        1 
## 17.14737 24.39231

# tapply gives us the mean mpg stratified by american or not
# so the mean mpg of american (0) cars = 17.147
# and the mean mpg of non-am (1) cars = 24.392

mod.a<- lm(mpg~am, data=mtcars)
mcSummary(mod.a) # this model asks whether american vs non-american cars differ in mpgs. more specifically, its intercept tells us the mean mpg for an american car, because the intercept is the predicted mpg when am = 0, which is the mean mpg for american cars.

## lm(formula = mpg ~ am, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq     F p
## Model       405.151  1 405.151  0.36 16.86 0
## Error       720.897 30  24.030              
## Corr Total 1126.047 31  36.324              
## 
##   RMSE AdjEtaSq
##  4.902    0.338
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 17.147 1.125 15.247 5586.613 0.886  NA 14.851  19.444 0
## am           7.245 1.764  4.106  405.151 0.360  NA  3.642  10.848 0

# we could write out the equation as predicted mpg = 17.147 + 7.245*am, where am = 0 for american and 1 for non-american cars
# so the intercept, 17.147, is the mean mpg for american cars, which we saw above with tapply!
# remember: when the predictor is coded 0 and 1, the slope is the difference between the two group means on Y.
# if we add the slope (7.245) to the intercept (17.147), we get 24.392, the mean mpg of non-american cars from above.

# the t, PRE and p for the slope tell us there is a significant difference in mpgs between american and non-american made cars (t=4.106, PRE=0.360, p<.001).
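a short base-R check of the 0/1 coding rules, using the notes' labels (0 = american, 1 = non-american): the intercept matches the mean of the am = 0 group, the slope matches the difference in group means, and intercept + slope matches the mean of the am = 1 group. names are illustrative.

```r
# 0/1 predictor: intercept and slope in terms of group means
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
fit_am <- lm(mpg ~ am, data = mtcars)

b0 <- unname(coef(fit_am)["(Intercept)"])
b1 <- unname(coef(fit_am)["am"])

c(intercept = b0, mean_am0 = unname(group_means["0"]))
c(slope = b1, mean_diff = unname(group_means["1"] - group_means["0"]))
c(b0_plus_b1 = b0 + b1, mean_am1 = unname(group_means["1"]))
```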

we could have written out that model comparison as:

Model c: \(MPG_i=\beta_0+\epsilon_i\)

Model a: \(MPG_i=\beta_0+\beta_1*am+\epsilon_i\)

PA-PC: 1

N-PA: 30

Null: \(\beta_1 = 0\)

that question might also be asked as: “do american and non-american made cars differ on their mpgs?”

#testing the explicit model C as well as the model A:
mod.c<- lm(mpg~1, data=mtcars)
mcSummary(mod.c) # remember the intercept of this model is just the mean mpgs, or mpgs of the average car in our sample

## lm(formula = mpg ~ 1, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df     MS EtaSq F p
## Model         0.000  0    Inf     0    
## Error      1126.047 31 36.324          
## Corr Total 1126.047 31 36.324          
## 
##   RMSE AdjEtaSq
##  6.027        0
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 20.091 1.065 18.857 12916.26  0.92  NA 17.918  22.264 0

mod.a<- lm(mpg~am, data=mtcars)
mcSummary(mod.a) # intercept here is now mpgs when am=0, so average mpg of an american made car

## lm(formula = mpg ~ am, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq     F p
## Model       405.151  1 405.151  0.36 16.86 0
## Error       720.897 30  24.030              
## Corr Total 1126.047 31  36.324              
## 
##   RMSE AdjEtaSq
##  4.902    0.338
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 17.147 1.125 15.247 5586.613 0.886  NA 14.851  19.444 0
## am           7.245 1.764  4.106  405.151 0.360  NA  3.642  10.848 0

modelCompare(mod.c, mod.a)

## SSE (Compact) =  1126.047 
## SSE (Augmented) =  720.8966 
## Delta R-Squared =  0.3597989 
## Partial Eta-Squared (PRE) =  0.3597989 
## F(1,30) = 16.86028, p = 0.0002850207

looking closer at offsets

offsets let us ask about the relationship between two variables while holding the slope fixed at a value we choose instead of estimating it. so here, we might ask: is the relationship between mpgs and cylinders such that as cylinders increase 1, mpgs increase by 5?

in other words, our null hypothesis is that for every additional cylinder, mpgs increase by 5.

mod.c<- lm(mpg~1,offset = 5*cyl, data=mtcars)
mcSummary(mod.c)

## lm(formula = mpg ~ 1, data = mtcars, offset = 5 * cyl)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq F p
## Model      2471.875  0     Inf 0.277    
## Error      6441.360 31 207.786          
## Corr Total 8913.235 31 287.524          
## 
##    RMSE AdjEtaSq
##  14.415    0.277
## 
## Coefficients
##                 Est StErr      t  SSR(3) EtaSq tol  CI_2.5 CI_97.5 p
## (Intercept) -10.847 2.548 -4.257 3764.95 0.369  NA -16.044   -5.65 0

mod.a<- lm(mpg~cyl, data=mtcars)
mcSummary(mod.a)

## lm(formula = mpg ~ cyl, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model       817.713  1 817.713 0.726 79.561 0
## Error       308.334 30  10.278               
## Corr Total 1126.047 31  36.324               
## 
##   RMSE AdjEtaSq
##  3.206    0.717
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918  NA 33.649  42.120 0
## cyl         -2.876 0.322 -8.920  817.713 0.726  NA -3.534  -2.217 0

modelCompare(mod.c, mod.a)

## SSE (Compact) =  6441.36 
## SSE (Augmented) =  308.3342 
## Delta R-Squared =  0.72618 
## Partial Eta-Squared (PRE) =  0.9521321 
## F(1,30) = 596.7251, p = 2.346365e-21

our model C doesn't report a slope because we fixed it at 5 instead of estimating it. so model C encodes the claim that a 1 unit increase in cyls goes with a 5 unit increase in mpgs, but mcSummary doesn't test that claim directly; the test comes from comparing model C to model A.

mod.a<- lm(mpg~cyl, data=mtcars)
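mcSummary(mod.a) # for model a, the estimated slope is -2.876: as cyls increase 1, the mpgs actually DECREASE 2.876. comparing this model to the offset model C, we reject the null that the slope is 5 (F(1,30)=596.73, PRE=.952, p<.001). note that the t=-8.920 and PRE=.726 in this output test a slope of 0, not 5.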

## lm(formula = mpg ~ cyl, data = mtcars)
## 
## Omnibus ANOVA
##                  SS df      MS EtaSq      F p
## Model       817.713  1 817.713 0.726 79.561 0
## Error       308.334 30  10.278               
## Corr Total 1126.047 31  36.324               
## 
##   RMSE AdjEtaSq
##  3.206    0.717
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.885 2.074 18.268 3429.836 0.918  NA 33.649  42.120 0
## cyl         -2.876 0.322 -8.920  817.713 0.726  NA -3.534  -2.217 0

# AND we can see if we offset this model to hold the slope constant at our actual slope:

mod.a2<- lm(mpg~1, offset=-2.876*cyl, data=mtcars)
mcSummary(mod.a2) # now the intercept in this offset model matches the intercept of the model where we estimated both the slope and intercept (37.886 vs 37.885; the small difference is from rounding the slope to -2.876).

## lm(formula = mpg ~ 1, data = mtcars, offset = -2.876 * cyl)
## 
## Omnibus ANOVA
##                  SS df     MS EtaSq F p
## Model       817.832  0    Inf 0.726    
## Error       308.334 31  9.946          
## Corr Total 1126.167 31 36.328          
## 
##   RMSE AdjEtaSq
##  3.154    0.726
## 
## Coefficients
##                Est StErr      t   SSR(3) EtaSq tol CI_2.5 CI_97.5 p
## (Intercept) 37.886 0.558 67.955 45930.86 0.993  NA 36.749  39.023 0

# so when we are offsetting, we are testing something about the intercept, or the average Y value assuming a certain relationship between Y and X (or here, mpgs and cyls)
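one way to see this, as a sketch: an offset model only estimates an intercept, and that intercept is the mean of what is left of Y after removing the fixed slope's contribution, mean(mpg - b*cyl). the helper offset_intercept() is hypothetical, written just for this check.

```r
# hypothetical helper: intercept of an offset model vs. the same value by hand
offset_intercept <- function(b) {
  fit <- lm(mpg ~ 1, offset = b * cyl, data = mtcars)  # slope fixed at b
  c(lm_intercept = unname(coef(fit)),
    by_hand      = mean(mtcars$mpg - b * mtcars$cyl))  # mean of the "leftover" mpg
}

offset_intercept(5)       # slope from the null: about -10.847, as in mcSummary(mod.c)
offset_intercept(-2.876)  # rounded estimated slope: about 37.886, as in mcSummary(mod.a2)
```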

review through ch 8

Claire Morrison

9/20/2022
