Lab 7: Random intercepts

Exercise 1

Question: Create a table1 and use GGally::ggpairs to explore your data.

	conirostris (N=185)	difficilis (N=102)	fortis (N=510)	fuliginosa (N=318)	magnirostris (N=90)	scandens (N=229)
Beak height (mm)
Mean (SD)	14.9 (1.63)	8.17 (0.494)	12.3 (1.29)	8.00 (0.508)	19.9 (1.83)	9.56 (0.584)
Median [Min, Max]	15.0 [10.5, 18.7]	8.10 [7.10, 9.70]	12.0 [8.60, 16.6]	8.00 [6.70, 10.2]	20.0 [11.5, 23.2]	9.60 [8.20, 11.9]
Wing (length) (mm)
Mean (SD)	77.3 (2.94)	64.7 (3.35)	69.9 (3.45)	61.9 (2.39)	81.4 (3.92)	70.6 (2.20)
Median [Min, Max]	77.0 [68.0, 84.0]	64.0 [58.0, 71.0]	70.0 [60.0, 80.0]	62.0 [55.0, 70.0]	81.0 [70.0, 89.0]	71.0 [65.0, 76.0]
Upper beak (mm)
Mean (SD)	21.0 (1.24)	13.8 (0.720)	16.5 (1.21)	12.5 (0.649)	22.8 (1.64)	19.6 (1.09)
Median [Min, Max]	21.0 [18.1, 24.1]	13.8 [12.3, 15.5]	16.3 [13.4, 20.2]	12.5 [11.0, 15.0]	22.9 [15.0, 25.9]	19.6 [17.1, 22.5]
Sex
F	59 (31.9%)	29 (28.4%)	155 (30.4%)	128 (40.3%)	29 (32.2%)	72 (31.4%)
M	126 (68.1%)	73 (71.6%)	355 (69.6%)	190 (59.7%)	61 (67.8%)	157 (68.6%)

function (data, mapping = NULL, columns = 1:ncol(data), title = NULL, 
    upper = list(continuous = "cor", combo = "box_no_facet", 
        discrete = "count", na = "na"), lower = list(continuous = "points", 
        combo = "facethist", discrete = "facetbar", na = "na"), 
    diag = list(continuous = "densityDiag", discrete = "barDiag", 
        na = "naDiag"), params = NULL, ..., xlab = NULL, ylab = NULL, 
    axisLabels = c("show", "internal", "none"), columnLabels = colnames(data[columns]), 
    labeller = "label_value", switch = NULL, showStrips = NULL, 
    legend = NULL, cardinality_threshold = 15, progress = NULL, 
    proportions = NULL, legends = stop("deprecated")) 
{
    warn_deprecated(!missing(legends), "legends")
    warn_if_args_exist(list(...))
    stop_if_params_exist(params)
    isSharedData <- inherits(data, "SharedData")
    data_ <- fix_data(data)
    data <- fix_data_slim(data_, isSharedData)
    if (!missing(mapping) & !is.list(mapping) & missing(columns)) {
        columns <- mapping
        mapping <- NULL
    }
    stop_if_bad_mapping(mapping)
    columns <- fix_column_values(data, columns, columnLabels, 
        "columns", "columnLabels")
    stop_if_high_cardinality(data, columns, cardinality_threshold)
    upper <- check_and_set_ggpairs_defaults("upper", upper, continuous = "cor", 
        combo = "box_no_facet", discrete = "count", na = "na")
    lower <- check_and_set_ggpairs_defaults("lower", lower, continuous = "points", 
        combo = "facethist", discrete = "facetbar", na = "na")
    diag <- check_and_set_ggpairs_defaults("diag", diag, continuous = "densityDiag", 
        discrete = "barDiag", na = "naDiag", isDiag = TRUE)
    axisLabels <- fix_axis_label_choice(axisLabels, c("show", 
        "internal", "none"))
    proportions <- ggmatrix_proportions(proportions, data, columns)
    dataTypes <- plot_types(data, columns, columns, allowDiag = TRUE)
    if (identical(axisLabels, "internal")) {
        dataTypes$plotType[dataTypes$posX == dataTypes$posY] <- "label"
    }
    ggpairsPlots <- lapply(seq_len(nrow(dataTypes)), function(i) {
        plotType <- dataTypes[i, "plotType"]
        posX <- dataTypes[i, "posX"]
        posY <- dataTypes[i, "posY"]
        xColName <- dataTypes[i, "xVar"]
        yColName <- dataTypes[i, "yVar"]
        if (posX > posY) {
            types <- upper
        }
        else if (posX < posY) {
            types <- lower
        }
        else {
            types <- diag
        }
        sectionAes <- add_and_overwrite_aes(add_and_overwrite_aes(aes_(x = as.name(xColName), 
            y = as.name(yColName)), mapping), types$mapping)
        args <- list(types = types, sectionAes = sectionAes)
        if (plotType == "label") {
            args$label <- columnLabels[posX]
        }
        plot_fn <- ggmatrix_plot_list(plotType)
        p <- do.call(plot_fn, args)
        return(p)
    })
    plotMatrix <- ggmatrix(plots = ggpairsPlots, byrow = TRUE, 
        nrow = length(columns), ncol = length(columns), xAxisLabels = (if (axisLabels == 
            "internal") 
            NULL
        else columnLabels), yAxisLabels = (if (axisLabels == 
            "internal") 
            NULL
        else columnLabels), labeller = labeller, switch = switch, 
        showStrips = showStrips, showXAxisPlotLabels = identical(axisLabels, 
            "show"), showYAxisPlotLabels = identical(axisLabels, 
            "show"), title = title, xlab = xlab, ylab = ylab, 
        data = data_, gg = NULL, progress = progress, legend = legend, 
        xProportions = proportions, yProportions = proportions)
    plotMatrix
}
<bytecode: 0x559e7ba3aed0>
<environment: namespace:GGally>

Exercise 2

Question: Try to replicate as many graphs as you can from those shown above, and try to explain what could be the limitations of a fixed effects models we created?

## 
## Call:
## lm(formula = beakh ~ wingl, data = finches)
## 
## Coefficients:
## (Intercept)        wingl  
##    -20.3974       0.4576

## 
## Call:
## lm(formula = beakh ~ wingl, data = finches)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9257 -1.0104  0.0236  1.0507  6.1625 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -20.397361   0.503009  -40.55   <2e-16 ***
## wingl         0.457641   0.007204   63.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.734 on 1432 degrees of freedom
## Multiple R-squared:  0.7381, Adjusted R-squared:  0.7379 
## F-statistic:  4035 on 1 and 1432 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = beakh ~ wingl + wingl2, data = finches)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6554 -1.0030 -0.1342  0.9923  6.6615 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.4954288  4.0697832   9.705   <2e-16 ***
## wingl       -1.2530210  0.1156634 -10.833   <2e-16 ***
## wingl2       0.0121132  0.0008176  14.815   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.615 on 1431 degrees of freedom
## Multiple R-squared:  0.7729, Adjusted R-squared:  0.7726 
## F-statistic:  2435 on 2 and 1431 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = beakh ~ wingl + wingl2 + species, data = finches)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5830 -0.4802 -0.0155  0.4421  3.1512 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         32.9526799  3.1630443  10.418  < 2e-16 ***
## wingl               -0.7882287  0.0892474  -8.832  < 2e-16 ***
## wingl2               0.0071660  0.0006295  11.384  < 2e-16 ***
## speciesdifficilis   -3.8668081  0.1429794 -27.045  < 2e-16 ***
## speciesfortis       -0.6566894  0.0963176  -6.818 1.36e-11 ***
## speciesfuliginosa   -3.6559863  0.1425631 -25.645  < 2e-16 ***
## speciesmagnirostris  3.4833861  0.1258761  27.673  < 2e-16 ***
## speciesscandens     -3.4835950  0.1039663 -33.507  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.866 on 1426 degrees of freedom
## Multiple R-squared:  0.935,  Adjusted R-squared:  0.9346 
## F-statistic:  2928 on 7 and 1426 DF,  p-value: < 2.2e-16

Answer: Since the independence assumption is violated, the data will explain variance that is not of interest. This is because birds of the same species are more similar than birds of different species. Also, we see heteroscedasticity in the graph, which is a violation of the linear regression assumption. The solution might be to create a mixed model with fixed effects for species. This will essentially add 5 dummy variables to our regression model which reflect the different intercepts for the different species. This will allow the model to explain the between-species variance for each species seperately. Our model will thus be a more accurate representation of realtiy.

Exercise 3

Question: Try to replicate as many figures above as you can. What do you notice about the intercepts? What do you notice about the slopes? How does this compare to the previous models we fit?

Answer: So we see that the intercept vary across species but the slopes are the same for all species. Compared to the previous model this model explains the variation that we are interested in: the relation between wing length and beakheight. It models the variation between species separately so we can distinguish between them. The previous model does not do this.

Exercise 4

Question: How might we make the intercept more meaningful if we were to refit this model? (Hint: consider centering your predictor variable!)

## $species
##              (Intercept)
## conirostris    1.3417493
## difficilis    -2.5605229
## fortis         0.4037595
## fuliginosa    -2.0911466
## magnirostris   5.3831675
## scandens      -2.4770068
## 
## with conditional variances for "species"

## [1]  1.3417493 -2.5605229  0.4037595 -2.0911466  5.3831675 -2.4770068

##               intercept      species
## conirostris   1.3417493  conirostris
## difficilis   -2.5605229   difficilis
## fortis        0.4037595       fortis
## fuliginosa   -2.0911466   fuliginosa
## magnirostris  5.3831675 magnirostris
## scandens     -2.4770068     scandens

## [1] 0

## (Intercept)       wingl 
##  -3.8188397   0.2247786

## [1] -3.81884

## [1] 0.2247786

## numeric(0)

## $species
##              (Intercept)     wingl
## conirostris    -2.477090 0.2247786
## difficilis     -6.379363 0.2247786
## fortis         -3.415080 0.2247786
## fuliginosa     -5.909986 0.2247786
## magnirostris    1.564328 0.2247786
## scandens       -6.295846 0.2247786
## 
## attr(,"class")
## [1] "coef.mer"

## Linear mixed model fit by REML ['lmerMod']
## Formula: beakh ~ wingl + (1 | species)
##    Data: finches
## 
## REML criterion at convergence: 3830.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.1590 -0.5683 -0.0215  0.5570  3.4892 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  species  (Intercept) 9.6096   3.0999  
##  Residual             0.8176   0.9042  
## Number of obs: 1434, groups:  species, 6
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) -3.818840   1.384784  -2.758
## wingl        0.224779   0.007912  28.409
## 
## Correlation of Fixed Effects:
##       (Intr)
## wingl -0.405

## lmer(formula = beakh ~ wingl + (1 | species), data = finches)
##             coef.est coef.se
## (Intercept) -3.82     1.38  
## wingl        0.22     0.01  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  species  (Intercept) 3.10    
##  Residual             0.90    
## ---
## number of obs: 1434, groups: species, 6
## AIC = 3838.8, DIC = 3819.5
## deviance = 3825.1

## [1] 0.2

Answer: In general our model predicts that beak height increases as wing length increases across all species. This is actually not the case; our mean is zero. This is because the Bj has a normal distribution where N(0, s^2).

We expected the mean intercept to be the same as the intercept in the fixed effect model since the mean intercept reflects the average intercept across species.

The negative correlation between the intercept and the slope tells us that within a species the predictor and outcome variable are related to each other. Thus their variation is actually different. This we don’t want in our model because we want to observe the difference in beak height independent of species.

To make the intercept more meaningful we could center the predictor. The variable wing length will then have a mean of 0. The intercept can then be predicted as the expected value when all predictors are at their mean. Additionally, we could also make the slopes random in order to account for the correlation between the slopes and intercepts to make further partition the variance within species.

Lab 7: Random intercepts

Group 2: Silke Hoekstra, Mengqian Zhao, Jesse Lindenhovius, Rina Sevcik

2023-11-15

Exercise 1

Exercise 2

Exercise 3

Exercise 4