27 Nov 2023Question: Create a table1 and use GGally::ggpairs to explore your data.
| conirostris (N=185) |
difficilis (N=102) |
fortis (N=510) |
fuliginosa (N=318) |
magnirostris (N=90) |
scandens (N=229) |
|
|---|---|---|---|---|---|---|
| Beak height (mm) | ||||||
| Mean (SD) | 14.9 (1.63) | 8.17 (0.494) | 12.3 (1.29) | 8.00 (0.508) | 19.9 (1.83) | 9.56 (0.584) |
| Median [Min, Max] | 15.0 [10.5, 18.7] | 8.10 [7.10, 9.70] | 12.0 [8.60, 16.6] | 8.00 [6.70, 10.2] | 20.0 [11.5, 23.2] | 9.60 [8.20, 11.9] |
| Wing (lenght) (mm) | ||||||
| Mean (SD) | 77.3 (2.94) | 64.7 (3.35) | 69.9 (3.45) | 61.9 (2.39) | 81.4 (3.92) | 70.6 (2.20) |
| Median [Min, Max] | 77.0 [68.0, 84.0] | 64.0 [58.0, 71.0] | 70.0 [60.0, 80.0] | 62.0 [55.0, 70.0] | 81.0 [70.0, 89.0] | 71.0 [65.0, 76.0] |
| Upper beak (mm) | ||||||
| Mean (SD) | 21.0 (1.24) | 13.8 (0.720) | 16.5 (1.21) | 12.5 (0.649) | 22.8 (1.64) | 19.6 (1.09) |
| Median [Min, Max] | 21.0 [18.1, 24.1] | 13.8 [12.3, 15.5] | 16.3 [13.4, 20.2] | 12.5 [11.0, 15.0] | 22.9 [15.0, 25.9] | 19.6 [17.1, 22.5] |
| Sex | ||||||
| F | 59 (31.9%) | 29 (28.4%) | 155 (30.4%) | 128 (40.3%) | 29 (32.2%) | 72 (31.4%) |
| M | 126 (68.1%) | 73 (71.6%) | 355 (69.6%) | 190 (59.7%) | 61 (67.8%) | 157 (68.6%) |
Question: Create a table1 and use GGally::ggpairs to explore your data.
##
## Call:
## lm(formula = beakh ~ wingl, data = finches)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9257 -1.0104 0.0236 1.0507 6.1625
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -20.397361 0.503009 -40.55 <2e-16 ***
## wingl 0.457641 0.007204 63.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.734 on 1432 degrees of freedom
## Multiple R-squared: 0.7381, Adjusted R-squared: 0.7379
## F-statistic: 4035 on 1 and 1432 DF, p-value: < 2.2e-16
## # A tibble: 1,434 × 8
## beakh wingl .fitted .resid .hat .sigma .cooksd .std.resid
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10.8 69 11.2 -0.380 0.000702 1.73 0.0000169 -0.219
## 2 10.3 65 9.35 0.951 0.00105 1.73 0.000158 0.548
## 3 10 65 9.35 0.651 0.00105 1.73 0.0000742 0.375
## 4 9.5 68 10.7 -1.22 0.000738 1.73 0.000184 -0.705
## 5 11 66 9.81 1.19 0.000913 1.73 0.000216 0.688
## 6 10.4 68 10.7 -0.322 0.000738 1.73 0.0000128 -0.186
## 7 11.3 67 10.3 1.04 0.000808 1.73 0.000144 0.597
## 8 10.6 69 11.2 -0.580 0.000702 1.73 0.0000393 -0.334
## 9 10.8 67 10.3 0.535 0.000808 1.73 0.0000386 0.309
## 10 10.3 68 10.7 -0.422 0.000738 1.73 0.0000219 -0.244
## # ℹ 1,424 more rows
##
## Call:
## lm(formula = beakh ~ wingl + species, data = finches)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4837 -0.5140 -0.0194 0.5005 3.1530
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.425973 0.615416 -3.942 8.47e-05 ***
## wingl 0.224125 0.007919 28.301 < 2e-16 ***
## speciesdifficilis -3.913210 0.149223 -26.224 < 2e-16 ***
## speciesfortis -0.943352 0.097067 -9.719 < 2e-16 ***
## speciesfuliginosa -3.444110 0.147575 -23.338 < 2e-16 ***
## speciesmagnirostris 4.048605 0.120771 33.523 < 2e-16 ***
## speciesscandens -3.824673 0.103945 -36.795 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9042 on 1427 degrees of freedom
## Multiple R-squared: 0.929, Adjusted R-squared: 0.9287
## F-statistic: 3114 on 6 and 1427 DF, p-value: < 2.2e-16
## # A tibble: 1,434 × 9
## beakh wingl species .fitted .resid .hat .sigma .cooksd .std.resid
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10.8 69 fortis 12.1 -1.30 0.00202 0.904 0.000595 -1.43
## 2 10.3 65 fortis 11.2 -0.899 0.00380 0.904 0.000540 -0.996
## 3 10 65 fortis 11.2 -1.20 0.00380 0.904 0.000961 -1.33
## 4 9.5 68 fortis 11.9 -2.37 0.00224 0.902 0.00221 -2.63
## 5 11 66 fortis 11.4 -0.423 0.00312 0.904 0.0000982 -0.468
## 6 10.4 68 fortis 11.9 -1.47 0.00224 0.904 0.000849 -1.63
## 7 11.3 67 fortis 11.6 -0.347 0.00260 0.904 0.0000551 -0.384
## 8 10.6 69 fortis 12.1 -1.50 0.00202 0.904 0.000793 -1.66
## 9 10.8 67 fortis 11.6 -0.847 0.00260 0.904 0.000328 -0.938
## 10 10.3 68 fortis 11.9 -1.57 0.00224 0.904 0.000969 -1.74
## # ℹ 1,424 more rows
##
## Call:
## lm(formula = beakh ~ wingl + species + wingl:species, data = finches)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7044 -0.4571 -0.0223 0.4289 3.3790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.09673 1.68912 -6.570 7.06e-11 ***
## wingl 0.33636 0.02185 15.395 < 2e-16 ***
## speciesdifficilis 12.21079 2.37861 5.134 3.24e-07 ***
## speciesfortis 5.83395 1.86176 3.134 0.001762 **
## speciesfuliginosa 12.52616 2.11235 5.930 3.80e-09 ***
## speciesmagnirostris 5.17641 2.55622 2.025 0.043051 *
## speciesscandens 12.85324 2.50616 5.129 3.32e-07 ***
## wingl:speciesdifficilis -0.22737 0.03384 -6.720 2.63e-11 ***
## wingl:speciesfortis -0.08514 0.02455 -3.469 0.000539 ***
## wingl:speciesfuliginosa -0.23016 0.02994 -7.686 2.81e-14 ***
## wingl:speciesmagnirostris -0.01958 0.03212 -0.610 0.542181
## wingl:speciesscandens -0.22572 0.03414 -6.612 5.33e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.87 on 1422 degrees of freedom
## Multiple R-squared: 0.9345, Adjusted R-squared: 0.934
## F-statistic: 1846 on 11 and 1422 DF, p-value: < 2.2e-16
## # A tibble: 1,434 × 9
## beakh wingl species .fitted .resid .hat .sigma .cooksd .std.resid
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10.8 69 fortis 12.1 -1.27 0.00209 0.870 0.000374 -1.46
## 2 10.3 65 fortis 11.1 -0.766 0.00592 0.870 0.000387 -0.883
## 3 10 65 fortis 11.1 -1.07 0.00592 0.870 0.000750 -1.23
## 4 9.5 68 fortis 11.8 -2.32 0.00255 0.868 0.00152 -2.67
## 5 11 66 fortis 11.3 -0.317 0.00447 0.870 0.0000500 -0.366
## 6 10.4 68 fortis 11.8 -1.42 0.00255 0.869 0.000570 -1.63
## 7 11.3 67 fortis 11.6 -0.269 0.00335 0.870 0.0000268 -0.309
## 8 10.6 69 fortis 12.1 -1.47 0.00209 0.869 0.000501 -1.69
## 9 10.8 67 fortis 11.6 -0.769 0.00335 0.870 0.000219 -0.885
## 10 10.3 68 fortis 11.8 -1.52 0.00255 0.869 0.000653 -1.75
## # ℹ 1,424 more rows
##
## Call:
## lm(formula = beakh ~ wingl + species + wingl:species, data = finches)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7044 -0.4571 -0.0223 0.4289 3.3790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.09673 1.68912 -6.570 7.06e-11 ***
## wingl 0.33636 0.02185 15.395 < 2e-16 ***
## speciesdifficilis 12.21079 2.37861 5.134 3.24e-07 ***
## speciesfortis 5.83395 1.86176 3.134 0.001762 **
## speciesfuliginosa 12.52616 2.11235 5.930 3.80e-09 ***
## speciesmagnirostris 5.17641 2.55622 2.025 0.043051 *
## speciesscandens 12.85324 2.50616 5.129 3.32e-07 ***
## wingl:speciesdifficilis -0.22737 0.03384 -6.720 2.63e-11 ***
## wingl:speciesfortis -0.08514 0.02455 -3.469 0.000539 ***
## wingl:speciesfuliginosa -0.23016 0.02994 -7.686 2.81e-14 ***
## wingl:speciesmagnirostris -0.01958 0.03212 -0.610 0.542181
## wingl:speciesscandens -0.22572 0.03414 -6.612 5.33e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.87 on 1422 degrees of freedom
## Multiple R-squared: 0.9345, Adjusted R-squared: 0.934
## F-statistic: 1846 on 11 and 1422 DF, p-value: < 2.2e-16
## # A tibble: 1,434 × 9
## beakh wingl species .fitted .resid .hat .sigma .cooksd .std.resid
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 10.8 69 fortis 12.1 -1.27 0.00209 0.870 0.000374 -1.46
## 2 10.3 65 fortis 11.1 -0.766 0.00592 0.870 0.000387 -0.883
## 3 10 65 fortis 11.1 -1.07 0.00592 0.870 0.000750 -1.23
## 4 9.5 68 fortis 11.8 -2.32 0.00255 0.868 0.00152 -2.67
## 5 11 66 fortis 11.3 -0.317 0.00447 0.870 0.0000500 -0.366
## 6 10.4 68 fortis 11.8 -1.42 0.00255 0.869 0.000570 -1.63
## 7 11.3 67 fortis 11.6 -0.269 0.00335 0.870 0.0000268 -0.309
## 8 10.6 69 fortis 12.1 -1.47 0.00209 0.869 0.000501 -1.69
## 9 10.8 67 fortis 11.6 -0.769 0.00335 0.870 0.000219 -0.885
## 10 10.3 68 fortis 11.8 -1.52 0.00255 0.869 0.000653 -1.75
## # ℹ 1,424 more rows
Answer: The limitation of the fixed effect model is
that since we are estimating intercepts for each category, there will be
an increase in the degrees of freedom of the model, which increases the
risk of overfitting, increased variance in parameters, a higher
complexity and less generalizability of the model.
Question: Try to replicate as many figures above as you can. What do you notice about the intercepts? What do you notice about the slopes? How does this compare to the previous models we fit?
## $species
## (Intercept)
## conirostris 1.3417493
## difficilis -2.5605229
## fortis 0.4037595
## fuliginosa -2.0911466
## magnirostris 5.3831675
## scandens -2.4770068
##
## with conditional variances for "species"
## [1] 1.3417493 -2.5605229 0.4037595 -2.0911466 5.3831675 -2.4770068
## [1] 1.341749
## intercept species
## conirostris 1.3417493 conirostris
## difficilis -2.5605229 difficilis
## fortis 0.4037595 fortis
## fuliginosa -2.0911466 fuliginosa
## magnirostris 5.3831675 magnirostris
## scandens -2.4770068 scandens
## [1] 0
## (Intercept) wingl
## -3.8188397 0.2247786
## [1] -3.81884
## [1] 0.2247786
## numeric(0)
## $species
## (Intercept) wingl
## conirostris -2.477090 0.2247786
## difficilis -6.379363 0.2247786
## fortis -3.415080 0.2247786
## fuliginosa -5.909986 0.2247786
## magnirostris 1.564328 0.2247786
## scandens -6.295846 0.2247786
##
## attr(,"class")
## [1] "coef.mer"
## lmer(formula = beakh ~ wingl + (1 | species), data = finches)
## coef.est coef.se
## (Intercept) -3.82 1.38
## wingl 0.22 0.01
##
## Error terms:
## Groups Name Std.Dev.
## species (Intercept) 3.10
## Residual 0.90
## ---
## number of obs: 1434, groups: species, 6
## AIC = 3838.8, DIC = 3819.5
## deviance = 3825.1
## Linear mixed model fit by REML ['lmerMod']
## Formula: beakh ~ wingl + (1 | species)
## Data: finches
##
## REML criterion at convergence: 3830.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -7.1590 -0.5683 -0.0215 0.5570 3.4892
##
## Random effects:
## Groups Name Variance Std.Dev.
## species (Intercept) 9.6096 3.0999
## Residual 0.8176 0.9042
## Number of obs: 1434, groups: species, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -3.818840 1.384784 -2.758
## wingl 0.224779 0.007912 28.409
##
## Correlation of Fixed Effects:
## (Intr)
## wingl -0.405
Answer: The general regression line in the first plot of exercise doesn’t include random intercepts for the species, that’s why it’s different to the other plots. Therefore its not a good model to predict the beak height by the wing length. The slopes in the model can be uniform across all categories, implying that all lines share the same slope. Alternatively, the slopes can be individually calculated for each category using a linear regression model, resulting in a unique slope for each category. What is the mean of these random intercepts? It is zero, because when the mean of the random intercepts is zero, it implies that, on average, there is no systematic difference between the groups beyond what is accounted for by the fixed effects. What do the model predictions mean? The model is estimating the fixed effect of wingl on the response variable while accounting for random variability in intercepts among different species.
Question: How might we make the intercept more meaningful if we were to refit this model? (Hint: consider centering your predictor variable!)
Answer: We have to scale and center all variables.