Exercise 1

Question: Create a table1 and use GGally::ggpairs to explore your data.

Table 1: Darwin's finches by species
conirostris
(N=185)
difficilis
(N=102)
fortis
(N=510)
fuliginosa
(N=318)
magnirostris
(N=90)
scandens
(N=229)
Beak height (mm)
Mean (SD) 14.9 (1.63) 8.17 (0.494) 12.3 (1.29) 8.00 (0.508) 19.9 (1.83) 9.56 (0.584)
Median [Min, Max] 15.0 [10.5, 18.7] 8.10 [7.10, 9.70] 12.0 [8.60, 16.6] 8.00 [6.70, 10.2] 20.0 [11.5, 23.2] 9.60 [8.20, 11.9]
Wing (lenght) (mm)
Mean (SD) 77.3 (2.94) 64.7 (3.35) 69.9 (3.45) 61.9 (2.39) 81.4 (3.92) 70.6 (2.20)
Median [Min, Max] 77.0 [68.0, 84.0] 64.0 [58.0, 71.0] 70.0 [60.0, 80.0] 62.0 [55.0, 70.0] 81.0 [70.0, 89.0] 71.0 [65.0, 76.0]
Upper beak (mm)
Mean (SD) 21.0 (1.24) 13.8 (0.720) 16.5 (1.21) 12.5 (0.649) 22.8 (1.64) 19.6 (1.09)
Median [Min, Max] 21.0 [18.1, 24.1] 13.8 [12.3, 15.5] 16.3 [13.4, 20.2] 12.5 [11.0, 15.0] 22.9 [15.0, 25.9] 19.6 [17.1, 22.5]
Sex
F 59 (31.9%) 29 (28.4%) 155 (30.4%) 128 (40.3%) 29 (32.2%) 72 (31.4%)
M 126 (68.1%) 73 (71.6%) 355 (69.6%) 190 (59.7%) 61 (67.8%) 157 (68.6%)

Exercise 2

Question: Create a table1 and use GGally::ggpairs to explore your data.

## 
## Call:
## lm(formula = beakh ~ wingl, data = finches)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9257 -1.0104  0.0236  1.0507  6.1625 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.397361   0.503009  -40.55   <2e-16 ***
## wingl         0.457641   0.007204   63.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.734 on 1432 degrees of freedom
## Multiple R-squared:  0.7381, Adjusted R-squared:  0.7379 
## F-statistic:  4035 on 1 and 1432 DF,  p-value: < 2.2e-16
## # A tibble: 1,434 × 8
##    beakh wingl .fitted .resid     .hat .sigma   .cooksd .std.resid
##    <dbl> <dbl>   <dbl>  <dbl>    <dbl>  <dbl>     <dbl>      <dbl>
##  1  10.8    69   11.2  -0.380 0.000702   1.73 0.0000169     -0.219
##  2  10.3    65    9.35  0.951 0.00105    1.73 0.000158       0.548
##  3  10      65    9.35  0.651 0.00105    1.73 0.0000742      0.375
##  4   9.5    68   10.7  -1.22  0.000738   1.73 0.000184      -0.705
##  5  11      66    9.81  1.19  0.000913   1.73 0.000216       0.688
##  6  10.4    68   10.7  -0.322 0.000738   1.73 0.0000128     -0.186
##  7  11.3    67   10.3   1.04  0.000808   1.73 0.000144       0.597
##  8  10.6    69   11.2  -0.580 0.000702   1.73 0.0000393     -0.334
##  9  10.8    67   10.3   0.535 0.000808   1.73 0.0000386      0.309
## 10  10.3    68   10.7  -0.422 0.000738   1.73 0.0000219     -0.244
## # ℹ 1,424 more rows

## 
## Call:
## lm(formula = beakh ~ wingl + species, data = finches)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.4837 -0.5140 -0.0194  0.5005  3.1530 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.425973   0.615416  -3.942 8.47e-05 ***
## wingl                0.224125   0.007919  28.301  < 2e-16 ***
## speciesdifficilis   -3.913210   0.149223 -26.224  < 2e-16 ***
## speciesfortis       -0.943352   0.097067  -9.719  < 2e-16 ***
## speciesfuliginosa   -3.444110   0.147575 -23.338  < 2e-16 ***
## speciesmagnirostris  4.048605   0.120771  33.523  < 2e-16 ***
## speciesscandens     -3.824673   0.103945 -36.795  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9042 on 1427 degrees of freedom
## Multiple R-squared:  0.929,  Adjusted R-squared:  0.9287 
## F-statistic:  3114 on 6 and 1427 DF,  p-value: < 2.2e-16
## # A tibble: 1,434 × 9
##    beakh wingl species .fitted .resid    .hat .sigma   .cooksd .std.resid
##    <dbl> <dbl> <chr>     <dbl>  <dbl>   <dbl>  <dbl>     <dbl>      <dbl>
##  1  10.8    69 fortis     12.1 -1.30  0.00202  0.904 0.000595      -1.43 
##  2  10.3    65 fortis     11.2 -0.899 0.00380  0.904 0.000540      -0.996
##  3  10      65 fortis     11.2 -1.20  0.00380  0.904 0.000961      -1.33 
##  4   9.5    68 fortis     11.9 -2.37  0.00224  0.902 0.00221       -2.63 
##  5  11      66 fortis     11.4 -0.423 0.00312  0.904 0.0000982     -0.468
##  6  10.4    68 fortis     11.9 -1.47  0.00224  0.904 0.000849      -1.63 
##  7  11.3    67 fortis     11.6 -0.347 0.00260  0.904 0.0000551     -0.384
##  8  10.6    69 fortis     12.1 -1.50  0.00202  0.904 0.000793      -1.66 
##  9  10.8    67 fortis     11.6 -0.847 0.00260  0.904 0.000328      -0.938
## 10  10.3    68 fortis     11.9 -1.57  0.00224  0.904 0.000969      -1.74 
## # ℹ 1,424 more rows

## 
## Call:
## lm(formula = beakh ~ wingl + species + wingl:species, data = finches)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7044 -0.4571 -0.0223  0.4289  3.3790 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -11.09673    1.68912  -6.570 7.06e-11 ***
## wingl                       0.33636    0.02185  15.395  < 2e-16 ***
## speciesdifficilis          12.21079    2.37861   5.134 3.24e-07 ***
## speciesfortis               5.83395    1.86176   3.134 0.001762 ** 
## speciesfuliginosa          12.52616    2.11235   5.930 3.80e-09 ***
## speciesmagnirostris         5.17641    2.55622   2.025 0.043051 *  
## speciesscandens            12.85324    2.50616   5.129 3.32e-07 ***
## wingl:speciesdifficilis    -0.22737    0.03384  -6.720 2.63e-11 ***
## wingl:speciesfortis        -0.08514    0.02455  -3.469 0.000539 ***
## wingl:speciesfuliginosa    -0.23016    0.02994  -7.686 2.81e-14 ***
## wingl:speciesmagnirostris  -0.01958    0.03212  -0.610 0.542181    
## wingl:speciesscandens      -0.22572    0.03414  -6.612 5.33e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.87 on 1422 degrees of freedom
## Multiple R-squared:  0.9345, Adjusted R-squared:  0.934 
## F-statistic:  1846 on 11 and 1422 DF,  p-value: < 2.2e-16
## # A tibble: 1,434 × 9
##    beakh wingl species .fitted .resid    .hat .sigma   .cooksd .std.resid
##    <dbl> <dbl> <chr>     <dbl>  <dbl>   <dbl>  <dbl>     <dbl>      <dbl>
##  1  10.8    69 fortis     12.1 -1.27  0.00209  0.870 0.000374      -1.46 
##  2  10.3    65 fortis     11.1 -0.766 0.00592  0.870 0.000387      -0.883
##  3  10      65 fortis     11.1 -1.07  0.00592  0.870 0.000750      -1.23 
##  4   9.5    68 fortis     11.8 -2.32  0.00255  0.868 0.00152       -2.67 
##  5  11      66 fortis     11.3 -0.317 0.00447  0.870 0.0000500     -0.366
##  6  10.4    68 fortis     11.8 -1.42  0.00255  0.869 0.000570      -1.63 
##  7  11.3    67 fortis     11.6 -0.269 0.00335  0.870 0.0000268     -0.309
##  8  10.6    69 fortis     12.1 -1.47  0.00209  0.869 0.000501      -1.69 
##  9  10.8    67 fortis     11.6 -0.769 0.00335  0.870 0.000219      -0.885
## 10  10.3    68 fortis     11.8 -1.52  0.00255  0.869 0.000653      -1.75 
## # ℹ 1,424 more rows

## 
## Call:
## lm(formula = beakh ~ wingl + species + wingl:species, data = finches)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7044 -0.4571 -0.0223  0.4289  3.3790 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -11.09673    1.68912  -6.570 7.06e-11 ***
## wingl                       0.33636    0.02185  15.395  < 2e-16 ***
## speciesdifficilis          12.21079    2.37861   5.134 3.24e-07 ***
## speciesfortis               5.83395    1.86176   3.134 0.001762 ** 
## speciesfuliginosa          12.52616    2.11235   5.930 3.80e-09 ***
## speciesmagnirostris         5.17641    2.55622   2.025 0.043051 *  
## speciesscandens            12.85324    2.50616   5.129 3.32e-07 ***
## wingl:speciesdifficilis    -0.22737    0.03384  -6.720 2.63e-11 ***
## wingl:speciesfortis        -0.08514    0.02455  -3.469 0.000539 ***
## wingl:speciesfuliginosa    -0.23016    0.02994  -7.686 2.81e-14 ***
## wingl:speciesmagnirostris  -0.01958    0.03212  -0.610 0.542181    
## wingl:speciesscandens      -0.22572    0.03414  -6.612 5.33e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.87 on 1422 degrees of freedom
## Multiple R-squared:  0.9345, Adjusted R-squared:  0.934 
## F-statistic:  1846 on 11 and 1422 DF,  p-value: < 2.2e-16
## # A tibble: 1,434 × 9
##    beakh wingl species .fitted .resid    .hat .sigma   .cooksd .std.resid
##    <dbl> <dbl> <chr>     <dbl>  <dbl>   <dbl>  <dbl>     <dbl>      <dbl>
##  1  10.8    69 fortis     12.1 -1.27  0.00209  0.870 0.000374      -1.46 
##  2  10.3    65 fortis     11.1 -0.766 0.00592  0.870 0.000387      -0.883
##  3  10      65 fortis     11.1 -1.07  0.00592  0.870 0.000750      -1.23 
##  4   9.5    68 fortis     11.8 -2.32  0.00255  0.868 0.00152       -2.67 
##  5  11      66 fortis     11.3 -0.317 0.00447  0.870 0.0000500     -0.366
##  6  10.4    68 fortis     11.8 -1.42  0.00255  0.869 0.000570      -1.63 
##  7  11.3    67 fortis     11.6 -0.269 0.00335  0.870 0.0000268     -0.309
##  8  10.6    69 fortis     12.1 -1.47  0.00209  0.869 0.000501      -1.69 
##  9  10.8    67 fortis     11.6 -0.769 0.00335  0.870 0.000219      -0.885
## 10  10.3    68 fortis     11.8 -1.52  0.00255  0.869 0.000653      -1.75 
## # ℹ 1,424 more rows

Answer: The limitation of the fixed effect model is that since we are estimating intercepts for each category, there will be an increase in the degrees of freedom of the model, which increases the risk of overfitting, increased variance in parameters, a higher complexity and less generalizability of the model.

Exercise 3

Question: Try to replicate as many figures above as you can. What do you notice about the intercepts? What do you notice about the slopes? How does this compare to the previous models we fit?

## $species
##              (Intercept)
## conirostris    1.3417493
## difficilis    -2.5605229
## fortis         0.4037595
## fuliginosa    -2.0911466
## magnirostris   5.3831675
## scandens      -2.4770068
## 
## with conditional variances for "species"
## [1]  1.3417493 -2.5605229  0.4037595 -2.0911466  5.3831675 -2.4770068
## [1] 1.341749

##               intercept      species
## conirostris   1.3417493  conirostris
## difficilis   -2.5605229   difficilis
## fortis        0.4037595       fortis
## fuliginosa   -2.0911466   fuliginosa
## magnirostris  5.3831675 magnirostris
## scandens     -2.4770068     scandens
## [1] 0
## (Intercept)       wingl 
##  -3.8188397   0.2247786
## [1] -3.81884
## [1] 0.2247786
## numeric(0)
## $species
##              (Intercept)     wingl
## conirostris    -2.477090 0.2247786
## difficilis     -6.379363 0.2247786
## fortis         -3.415080 0.2247786
## fuliginosa     -5.909986 0.2247786
## magnirostris    1.564328 0.2247786
## scandens       -6.295846 0.2247786
## 
## attr(,"class")
## [1] "coef.mer"
## lmer(formula = beakh ~ wingl + (1 | species), data = finches)
##             coef.est coef.se
## (Intercept) -3.82     1.38  
## wingl        0.22     0.01  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  species  (Intercept) 3.10    
##  Residual             0.90    
## ---
## number of obs: 1434, groups: species, 6
## AIC = 3838.8, DIC = 3819.5
## deviance = 3825.1
## Linear mixed model fit by REML ['lmerMod']
## Formula: beakh ~ wingl + (1 | species)
##    Data: finches
## 
## REML criterion at convergence: 3830.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.1590 -0.5683 -0.0215  0.5570  3.4892 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  species  (Intercept) 9.6096   3.0999  
##  Residual             0.8176   0.9042  
## Number of obs: 1434, groups:  species, 6
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) -3.818840   1.384784  -2.758
## wingl        0.224779   0.007912  28.409
## 
## Correlation of Fixed Effects:
##       (Intr)
## wingl -0.405

Answer: The general regression line in the first plot of exercise doesn’t include random intercepts for the species, that’s why it’s different to the other plots. Therefore its not a good model to predict the beak height by the wing length. The slopes in the model can be uniform across all categories, implying that all lines share the same slope. Alternatively, the slopes can be individually calculated for each category using a linear regression model, resulting in a unique slope for each category. What is the mean of these random intercepts? It is zero, because when the mean of the random intercepts is zero, it implies that, on average, there is no systematic difference between the groups beyond what is accounted for by the fixed effects. What do the model predictions mean? The model is estimating the fixed effect of wingl on the response variable while accounting for random variability in intercepts among different species.

Exercise 4

Question: How might we make the intercept more meaningful if we were to refit this model? (Hint: consider centering your predictor variable!)

Answer: We have to scale and center all variables.