This report presents two statistical analyses using mixed-effects models in R:
Mixed-effects models are useful because they allow the inclusion of:
# Libraries for data analysis and visualization
library(readxl) # Import Excel files
library(lme4) # Mixed effects models
library(lmerTest) # p-values for LMM
library(emmeans) # Estimated marginal means
library(ggplot2) # Visualization
library(dplyr) # Data manipulation
library(tidyr) # Data tidying
library(car) # ANOVA
library(DHARMa) # GLMM diagnostics
library(gridExtra) # Multiple plots
library(MASS) # Negative binomial models
library(multcomp)
library(multcompView)# Import greenhouse gas dataset
gge_data <- read_excel("greenhousegases.xlsx")
# Convert categorical variables into factors
gge_data$county <- as.factor(gge_data$county)
gge_data$sector <- as.factor(gge_data$sector)
gge_data$municipality <- as.factor(gge_data$municipality)
# Display structure
str(gge_data)## tibble [525 × 5] (S3: tbl_df/tbl/data.frame)
## $ municipality: Factor w/ 75 levels "Alingsås","Älmhult",..: 1 1 1 1 1 1 1 5 5 5 ...
## $ sector : Factor w/ 7 levels "Agriculture",..: 5 3 4 2 7 6 1 4 2 3 ...
## $ county : Factor w/ 19 levels "Blekinge län",..: 19 19 19 19 19 19 19 8 8 8 ...
## $ GG2017 : chr [1:525] "6579.075" "2232.759" "8300.543" "3228.022" ...
## $ pop2017 : chr [1:525] "40.39" "40.39" "40.39" "40.39" ...
## [1] 525 5
## [1] 19
## [1] 7
## [1] 75
Explanation:
sector_summary <- gge_data %>%
group_by(sector) %>%
summarise(
n = n(),
mean_emissions = mean(GG2017),
sd_emissions = sd(GG2017),
median_emissions = median(GG2017),
min_emissions = min(GG2017),
max_emissions = max(GG2017),
mean_population = mean(pop2017)
)## Warning: There were 14 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `mean_emissions = mean(GG2017)`.
## ℹ In group 1: `sector = Agriculture`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 13 remaining warnings.
## # A tibble: 7 × 8
## sector n mean_emissions sd_emissions median_emissions min_emissions
## <fct> <int> <dbl> <dbl> <chr> <chr>
## 1 Agriculture 75 NA 40422. 32719.17 10014.86
## 2 Heating 75 NA 2945. 3228.022 1204.654
## 3 Industry 75 NA 288889. 321047.9 1065.189
## 4 Machinery 75 NA 17397. 4003.197 10239.81
## 5 Product use 75 NA 5308. 2854.124 11208.1
## 6 Transport 75 NA 51268. 27927.11 10284.54
## 7 Waste 75 NA 4693. 399.7303 1056.943
## # ℹ 2 more variables: max_emissions <chr>, mean_population <dbl>
Interpretation:
The summary statistics show variation in emissions across sectors. Some sectors contribute substantially more greenhouse gas emissions than others.
# Convert emissions variable to numeric
gge_data$GG2017 <- as.numeric(as.character(gge_data$GG2017))
# Histogram of raw emissions
p1 <- ggplot(gge_data, aes(x = GG2017)) +
geom_histogram(bins = 30, fill = "steelblue", color = "black") +
labs(title = "Distribution of Greenhouse Gas Emissions")
# Histogram of log-transformed emissions
p2 <- ggplot(gge_data, aes(x = log10(GG2017))) +
geom_histogram(bins = 30, fill = "darkgreen", color = "black") +
labs(title = "Log10 Transformed Emissions")
grid.arrange(p1, p2, ncol = 2)Interpretation:
The raw emissions distribution is highly right-skewed.
Applying a log transformation produces a more
symmetrical distribution suitable for linear modeling.
gge_data$logGG <- log10(gge_data$GG2017)
ggplot(gge_data, aes(x = pop2017, y = logGG, color = sector)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
theme_bw() +
labs(
title = "Population vs Emissions by Sector",
x = "Population",
y = "Log Emissions"
)## `geom_smooth()` using formula = 'y ~ x'
Interpretation:
Municipalities with larger populations tend to produce higher greenhouse gas emissions.
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: logGG ~ sector + pop2017 + (1 | county)
## Data: gge_data
##
## REML criterion at convergence: 636.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7981 -0.4630 -0.0426 0.4287 6.1846
##
## Random effects:
## Groups Name Variance Std.Dev.
## county (Intercept) 0.0217 0.1473
## Residual 0.1676 0.4094
## Number of obs: 525, groups: county, 19
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3.865e+00 2.181e-01 7.174e-09 17.720 1.00000
## sectorHeating -7.222e-01 6.686e-02 4.440e+02 -10.802 < 2e-16 ***
## sectorIndustry -3.073e-01 6.686e-02 4.440e+02 -4.596 5.63e-06 ***
## sectorMachinery -2.003e-01 6.686e-02 4.440e+02 -2.996 0.00288 **
## sectorProduct use -6.195e-01 6.686e-02 4.440e+02 -9.266 < 2e-16 ***
## sectorTransport 4.767e-01 6.686e-02 4.440e+02 7.130 4.10e-12 ***
## sectorWaste -8.694e-01 6.686e-02 4.440e+02 -13.003 < 2e-16 ***
## pop201710.241 -1.419e-01 3.022e-01 6.608e-09 -0.469 1.00000
## pop201710.66 -3.597e-01 3.022e-01 6.608e-09 -1.190 1.00000
## pop201710.747 -1.142e-01 3.022e-01 6.608e-09 -0.378 1.00000
## pop201710.837 2.865e-01 3.022e-01 6.608e-09 0.948 1.00000
## pop201710.894 1.238e-01 3.022e-01 6.608e-09 0.410 1.00000
## pop201711.496 2.009e-01 3.022e-01 6.608e-09 0.665 1.00000
## pop201711.841 3.027e-01 3.022e-01 6.608e-09 1.002 1.00000
## pop201711.845 1.092e-01 3.022e-01 6.608e-09 0.361 1.00000
## pop201712.008 1.104e-01 3.022e-01 6.608e-09 0.365 1.00000
## pop201712.257 1.782e-01 3.022e-01 6.608e-09 0.590 1.00000
## pop201712.711 1.332e-01 3.022e-01 6.608e-09 0.441 1.00000
## pop201712.916 1.474e-02 3.022e-01 6.608e-09 0.049 1.00000
## pop201713.182 3.390e-02 2.188e-01 4.440e+02 0.155 0.87698
## pop201713.84 2.927e-01 3.022e-01 6.608e-09 0.969 1.00000
## pop2017140.927 9.971e-01 3.022e-01 6.608e-09 3.300 1.00000
## pop201715.429 2.099e-01 2.188e-01 4.440e+02 0.959 0.33792
## pop201715.64 1.270e-01 3.022e-01 6.608e-09 0.420 1.00000
## pop201715.79 6.336e-03 3.022e-01 6.608e-09 0.021 1.00000
## pop2017150.134 8.730e-01 3.022e-01 6.608e-09 2.889 1.00000
## pop2017158.52 9.760e-01 3.022e-01 6.608e-09 3.230 1.00000
## pop201716.169 3.716e-01 3.022e-01 6.608e-09 1.230 1.00000
## pop201716.478 1.138e-01 2.188e-01 4.440e+02 0.520 0.60339
## pop201717.148 3.772e-01 3.022e-01 6.608e-09 1.248 1.00000
## pop201717.416 2.834e-01 3.022e-01 6.608e-09 0.938 1.00000
## pop201717.455 2.149e-01 3.022e-01 6.608e-09 0.711 1.00000
## pop201717.462 4.894e-01 2.188e-01 4.440e+02 2.236 0.02584 *
## pop201717.825 6.802e-01 3.022e-01 6.608e-09 2.251 1.00000
## pop201718.073 1.347e-01 2.188e-01 4.440e+02 0.616 0.53854
## pop20172.516 -4.043e-01 3.022e-01 6.608e-09 -1.338 1.00000
## pop20172.809 -4.169e-01 3.022e-01 6.608e-09 -1.380 1.00000
## pop20172.821 -4.307e-01 3.022e-01 6.608e-09 -1.425 1.00000
## pop201720.93 4.356e-01 3.022e-01 6.608e-09 1.442 1.00000
## pop201721.074 2.331e-01 2.188e-01 4.440e+02 1.065 0.28738
## pop201721.577 3.662e-01 3.022e-01 6.608e-09 1.212 1.00000
## pop201721.927 3.405e-01 3.022e-01 6.608e-09 1.127 1.00000
## pop201723.256 4.753e-01 3.022e-01 6.608e-09 1.573 1.00000
## pop201724.264 1.697e-01 2.188e-01 4.440e+02 0.775 0.43847
## pop201724.65 5.100e-01 3.022e-01 6.608e-09 1.688 1.00000
## pop201725.147 5.566e-01 3.022e-01 6.608e-09 1.842 1.00000
## pop201726.224 5.474e-01 3.022e-01 6.608e-09 1.812 1.00000
## pop201726.918 3.415e-01 3.022e-01 6.608e-09 1.130 1.00000
## pop20173.133 -4.696e-01 3.022e-01 6.608e-09 -1.554 1.00000
## pop201732.2 5.346e-01 3.022e-01 6.608e-09 1.769 1.00000
## pop201733.077 5.848e-01 3.022e-01 6.608e-09 1.936 1.00000
## pop201733.236 7.497e-01 2.188e-01 4.440e+02 3.426 0.00067 ***
## pop201734.133 4.734e-01 3.022e-01 6.608e-09 1.567 1.00000
## pop201735.045 3.192e-01 3.022e-01 6.608e-09 1.056 1.00000
## pop201735.79 7.171e-02 2.188e-01 4.440e+02 0.328 0.74332
## pop201737.401 6.946e-01 3.022e-01 6.608e-09 2.299 1.00000
## pop201740.39 3.834e-01 3.022e-01 6.608e-09 1.269 1.00000
## pop201741.786 4.453e-01 2.188e-01 4.440e+02 2.035 0.04247 *
## pop201742.184 7.256e-01 3.022e-01 6.608e-09 2.401 1.00000
## pop201743.549 4.651e-01 3.022e-01 6.608e-09 1.539 1.00000
## pop201744.605 1.326e-01 3.022e-01 6.608e-09 0.439 1.00000
## pop201747.146 4.537e-01 3.022e-01 6.608e-09 1.502 1.00000
## pop201747.185 -7.379e-02 3.022e-01 6.608e-09 -0.244 1.00000
## pop20175.412 -1.219e-01 3.022e-01 6.608e-09 -0.403 1.00000
## pop20175.896 -2.805e-01 3.022e-01 6.608e-09 -0.928 1.00000
## pop201752.003 8.645e-01 2.188e-01 4.440e+02 3.950 9.09e-05 ***
## pop201755.763 4.461e-01 3.022e-01 6.608e-09 1.476 1.00000
## pop201758.595 1.242e+00 3.022e-01 6.608e-09 4.112 1.00000
## pop20176.837 -4.669e-02 3.022e-01 6.608e-09 -0.155 1.00000
## pop201766.121 3.755e-01 3.022e-01 6.608e-09 1.243 1.00000
## pop20177.109 -3.744e-03 3.022e-01 6.608e-09 -0.012 1.00000
## pop201772.723 9.866e-01 3.022e-01 6.608e-09 3.265 1.00000
## pop20178.603 -1.766e-01 3.022e-01 6.608e-09 -0.584 1.00000
## pop20178.776 -1.763e-01 3.022e-01 6.608e-09 -0.583 1.00000
## pop20179.377 5.494e-02 3.022e-01 6.608e-09 0.182 1.00000
## pop20179.485 9.467e-02 3.022e-01 6.608e-09 0.313 1.00000
## pop20179.668 6.871e-02 3.022e-01 6.608e-09 0.227 1.00000
## pop20179.805 -9.942e-02 3.022e-01 6.608e-09 -0.329 1.00000
## pop201791.925 4.041e-01 3.022e-01 6.608e-09 1.337 1.00000
## pop201798.81 9.791e-01 3.022e-01 6.608e-09 3.240 1.00000
## pop201799.752 9.467e-01 3.022e-01 6.608e-09 3.133 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation matrix not shown by default, as p = 81 > 12.
## Use print(x, correlation=TRUE) or
## vcov(x) if you need it
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## sector 97.865 16.3108 6 444 97.303 <2e-16 ***
## pop2017 52.632 0.7112 74 2 4.243 0.2094
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Explanation:
Model structure:
log emissions = sector + population + random county variation
The random effect allows each county to have its own baseline emission level.
residuals_df <- data.frame(
Fitted = fitted(model_lmm1),
Residuals = residuals(model_lmm1)
)
p_resid <- ggplot(residuals_df, aes(Fitted, Residuals)) +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed")
p_qq <- ggplot(residuals_df, aes(sample = Residuals)) +
stat_qq() +
stat_qq_line()
grid.arrange(p_resid, p_qq, ncol = 2)Interpretation:
Residual plots help verify assumptions such as:
## contrast estimate SE df t.ratio p.value
## Agriculture - Heating 0.722 0.0671 444 10.767 <0.0001
## Agriculture - Industry 0.307 0.0671 444 4.581 0.0001
## Agriculture - Machinery 0.200 0.0671 444 2.987 0.0466
## Agriculture - Product use 0.619 0.0671 444 9.235 <0.0001
## Agriculture - Transport -0.477 0.0671 444 -7.108 <0.0001
## Agriculture - Waste 0.869 0.0671 444 12.960 <0.0001
## Heating - Industry -0.415 0.0669 444 -6.206 <0.0001
## Heating - Machinery -0.522 0.0669 444 -7.805 <0.0001
## Heating - Product use -0.103 0.0669 444 -1.536 0.7230
## Heating - Transport -1.199 0.0669 444 -17.932 <0.0001
## Heating - Waste 0.147 0.0669 444 2.201 0.2968
## Industry - Machinery -0.107 0.0669 444 -1.599 0.6830
## Industry - Product use 0.312 0.0669 444 4.670 <0.0001
## Industry - Transport -0.784 0.0669 444 -11.726 <0.0001
## Industry - Waste 0.562 0.0669 444 8.407 <0.0001
## Machinery - Product use 0.419 0.0669 444 6.269 <0.0001
## Machinery - Transport -0.677 0.0669 444 -10.126 <0.0001
## Machinery - Waste 0.669 0.0669 444 10.007 <0.0001
## Product use - Transport -1.096 0.0669 444 -16.396 <0.0001
## Product use - Waste 0.250 0.0669 444 3.737 0.0039
## Transport - Waste 1.346 0.0669 444 20.133 <0.0001
##
## Results are averaged over the levels of: pop2017
## Degrees-of-freedom method: kenward-roger
## P value adjustment: tukey method for comparing a family of 7 estimates
Interpretation:
Post-hoc comparisons identify which sectors differ significantly in emission levels.
guppy_data <- read_excel("guppy.xls")
guppy_data$Family <- as.factor(guppy_data$Family)
guppy_data$Gender <- as.factor(guppy_data$Gender)
guppy_data$Treatment <- as.factor(guppy_data$Treatment)
str(guppy_data)## tibble [145 × 4] (S3: tbl_df/tbl/data.frame)
## $ Family : Factor w/ 19 levels "0A2-1","0A2-2",..: 7 7 7 7 10 10 10 10 10 10 ...
## $ Gender : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
## $ Treatment : Factor w/ 2 levels "0ng","20ng": 1 1 1 1 1 1 1 1 1 1 ...
## $ Transitions: num [1:145] 0 4 0 1 0 0 0 4 2 1 ...
##
## F M
## 0ng 32 42
## 20ng 31 40
Explanation:
ggplot(guppy_data, aes(x = Transitions, fill = Treatment)) +
geom_histogram(binwidth = 1, alpha = 0.7) +
facet_wrap(~Gender) +
theme_bw() +
labs(
title = "Distribution of Guppy Transitions",
x = "Number of Transitions",
y = "Count"
)Interpretation:
Transition counts are discrete and right-skewed, suggesting that a count model is appropriate.
glmm_poisson <- glmer(
Transitions ~ Treatment * Gender + (1 | Family),
data = guppy_data,
family = poisson
)
summary(glmm_poisson)## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: Transitions ~ Treatment * Gender + (1 | Family)
## Data: guppy_data
##
## AIC BIC logLik -2*log(L) df.resid
## 846.0 860.9 -418.0 836.0 140
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7489 -1.2678 -0.4613 0.9714 5.4208
##
## Random effects:
## Groups Name Variance Std.Dev.
## Family (Intercept) 0.3909 0.6253
## Number of obs: 145, groups: Family, 19
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.45785 0.21925 6.649 2.95e-11 ***
## Treatment20ng -0.01389 0.31712 -0.044 0.96505
## GenderM 0.25561 0.12507 2.044 0.04098 *
## Treatment20ng:GenderM -0.52644 0.17589 -2.993 0.00276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Trtm20 GendrM
## Tretmnt20ng -0.690
## GenderM -0.321 0.221
## Trtmnt20:GM 0.228 -0.298 -0.711
Explanation:
Poisson models are standard for count data, but they assume:
variance = mean
This assumption is often violated.
pearson_resid <- residuals(glmm_poisson, type = "pearson")
dispersion_ratio <- sum(pearson_resid^2) / df.residual(glmm_poisson)
dispersion_ratio## [1] 2.629349
If the dispersion ratio is greater than 1.5, overdispersion is present.
glmm_nb <- glmer.nb(
Transitions ~ Treatment * Gender + (1 | Family),
data = guppy_data
)
summary(glmm_nb)## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.941) ( log )
## Formula: Transitions ~ Treatment * Gender + (1 | Family)
## Data: guppy_data
##
## AIC BIC logLik -2*log(L) df.resid
## 747.9 765.8 -368.0 735.9 139
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2017 -0.7242 -0.2169 0.5586 3.4725
##
## Random effects:
## Groups Name Variance Std.Dev.
## Family (Intercept) 0.2882 0.5369
## Number of obs: 145, groups: Family, 19
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.45302 0.23808 6.103 1.04e-09 ***
## Treatment20ng 0.03918 0.34268 0.114 0.909
## GenderM 0.18220 0.22396 0.814 0.416
## Treatment20ng:GenderM -0.51811 0.32037 -1.617 0.106
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Trtm20 GendrM
## Tretmnt20ng -0.695
## GenderM -0.524 0.364
## Trtmnt20:GM 0.368 -0.513 -0.701
Explanation:
Negative binomial models handle overdispersion in count data.
Interpretation:
Diagnostic plots check whether model assumptions are satisfied.
emm_guppy <- emmeans(glmm_nb, ~ Treatment * Gender, type = "response")
emm_df <- as.data.frame(emm_guppy)
ggplot(emm_df, aes(x = Gender, y = response, color = Treatment)) +
geom_point(size = 4) +
geom_line(aes(group = Treatment)) +
theme_bw() +
labs(
title = "Predicted Transitions by Treatment and Gender",
y = "Predicted Number of Transitions"
)Interpretation:
This interaction plot shows how treatment and gender influence guppy behaviour.
R Core Team (2024).
R: A language and environment for statistical computing.