Introduction

This report presents two statistical analyses using mixed-effects models in R:

Greenhouse Gas Emissions Analysis
- Linear Mixed Effects Models (LMM)
Guppy Behaviour Analysis
- Generalized Linear Mixed Effects Models (GLMM)

Mixed-effects models are useful because they allow the inclusion of:

Fixed effects (predictor variables of interest)
Random effects (grouping variables such as counties or families)

Load Required Libraries

# Libraries for data analysis and visualization

library(readxl)     # Import Excel files
library(lme4)       # Mixed effects models
library(lmerTest)   # p-values for LMM
library(emmeans)    # Estimated marginal means
library(ggplot2)    # Visualization
library(dplyr)      # Data manipulation
library(tidyr)      # Data tidying
library(car)        # ANOVA
library(DHARMa)     # GLMM diagnostics
library(gridExtra)  # Multiple plots
library(MASS)       # Negative binomial models
library(multcomp)
library(multcompView)

Part 1: Greenhouse Gas Emissions Analysis

Load and Prepare Data

# Import greenhouse gas dataset
gge_data <- read_excel("greenhousegases.xlsx")

# Convert categorical variables into factors
gge_data$county <- as.factor(gge_data$county)
gge_data$sector <- as.factor(gge_data$sector)
gge_data$municipality <- as.factor(gge_data$municipality)

# Display structure
str(gge_data)

## tibble [525 × 5] (S3: tbl_df/tbl/data.frame)
##  $ municipality: Factor w/ 75 levels "Alingsås","Älmhult",..: 1 1 1 1 1 1 1 5 5 5 ...
##  $ sector      : Factor w/ 7 levels "Agriculture",..: 5 3 4 2 7 6 1 4 2 3 ...
##  $ county      : Factor w/ 19 levels "Blekinge län",..: 19 19 19 19 19 19 19 8 8 8 ...
##  $ GG2017      : chr [1:525] "6579.075" "2232.759" "8300.543" "3228.022" ...
##  $ pop2017     : chr [1:525] "40.39" "40.39" "40.39" "40.39" ...

# Dataset information
dim(gge_data)

## [1] 525   5

length(unique(gge_data$county))

## [1] 19

length(unique(gge_data$sector))

## [1] 7

length(unique(gge_data$municipality))

## [1] 75

Explanation:

County will be treated as a random effect
Sector and population are treated as fixed effects

Summary Statistics by Sector

sector_summary <- gge_data %>%
  group_by(sector) %>%
  summarise(
    n = n(),
    mean_emissions = mean(GG2017),
    sd_emissions = sd(GG2017),
    median_emissions = median(GG2017),
    min_emissions = min(GG2017),
    max_emissions = max(GG2017),
    mean_population = mean(pop2017)
  )

## Warning: There were 14 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `mean_emissions = mean(GG2017)`.
## ℹ In group 1: `sector = Agriculture`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 13 remaining warnings.

sector_summary

## # A tibble: 7 × 8
##   sector          n mean_emissions sd_emissions median_emissions min_emissions
##   <fct>       <int>          <dbl>        <dbl> <chr>            <chr>        
## 1 Agriculture    75             NA       40422. 32719.17         10014.86     
## 2 Heating        75             NA        2945. 3228.022         1204.654     
## 3 Industry       75             NA      288889. 321047.9         1065.189     
## 4 Machinery      75             NA       17397. 4003.197         10239.81     
## 5 Product use    75             NA        5308. 2854.124         11208.1      
## 6 Transport      75             NA       51268. 27927.11         10284.54     
## 7 Waste          75             NA        4693. 399.7303         1056.943     
## # ℹ 2 more variables: max_emissions <chr>, mean_population <dbl>

Interpretation:

The summary statistics show variation in emissions across sectors. Some sectors contribute substantially more greenhouse gas emissions than others.

Distribution of Emissions

# Convert emissions variable to numeric
gge_data$GG2017 <- as.numeric(as.character(gge_data$GG2017))

# Histogram of raw emissions
p1 <- ggplot(gge_data, aes(x = GG2017)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Greenhouse Gas Emissions")

# Histogram of log-transformed emissions
p2 <- ggplot(gge_data, aes(x = log10(GG2017))) +
  geom_histogram(bins = 30, fill = "darkgreen", color = "black") +
  labs(title = "Log10 Transformed Emissions")

grid.arrange(p1, p2, ncol = 2)

Interpretation:

The raw emissions distribution is highly right-skewed.
Applying a log transformation produces a more symmetrical distribution suitable for linear modeling.

Population and Emissions Relationship

gge_data$logGG <- log10(gge_data$GG2017)

ggplot(gge_data, aes(x = pop2017, y = logGG, color = sector)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  labs(
    title = "Population vs Emissions by Sector",
    x = "Population",
    y = "Log Emissions"
  )

## `geom_smooth()` using formula = 'y ~ x'

Interpretation:

Municipalities with larger populations tend to produce higher greenhouse gas emissions.

Linear Mixed Effects Model

model_lmm1 <- lmer(
  logGG ~ sector + pop2017 + (1 | county),
  data = gge_data
)

summary(model_lmm1)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: logGG ~ sector + pop2017 + (1 | county)
##    Data: gge_data
## 
## REML criterion at convergence: 636.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7981 -0.4630 -0.0426  0.4287  6.1846 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  county   (Intercept) 0.0217   0.1473  
##  Residual             0.1676   0.4094  
## Number of obs: 525, groups:  county, 19
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        3.865e+00  2.181e-01  7.174e-09  17.720  1.00000    
## sectorHeating     -7.222e-01  6.686e-02  4.440e+02 -10.802  < 2e-16 ***
## sectorIndustry    -3.073e-01  6.686e-02  4.440e+02  -4.596 5.63e-06 ***
## sectorMachinery   -2.003e-01  6.686e-02  4.440e+02  -2.996  0.00288 ** 
## sectorProduct use -6.195e-01  6.686e-02  4.440e+02  -9.266  < 2e-16 ***
## sectorTransport    4.767e-01  6.686e-02  4.440e+02   7.130 4.10e-12 ***
## sectorWaste       -8.694e-01  6.686e-02  4.440e+02 -13.003  < 2e-16 ***
## pop201710.241     -1.419e-01  3.022e-01  6.608e-09  -0.469  1.00000    
## pop201710.66      -3.597e-01  3.022e-01  6.608e-09  -1.190  1.00000    
## pop201710.747     -1.142e-01  3.022e-01  6.608e-09  -0.378  1.00000    
## pop201710.837      2.865e-01  3.022e-01  6.608e-09   0.948  1.00000    
## pop201710.894      1.238e-01  3.022e-01  6.608e-09   0.410  1.00000    
## pop201711.496      2.009e-01  3.022e-01  6.608e-09   0.665  1.00000    
## pop201711.841      3.027e-01  3.022e-01  6.608e-09   1.002  1.00000    
## pop201711.845      1.092e-01  3.022e-01  6.608e-09   0.361  1.00000    
## pop201712.008      1.104e-01  3.022e-01  6.608e-09   0.365  1.00000    
## pop201712.257      1.782e-01  3.022e-01  6.608e-09   0.590  1.00000    
## pop201712.711      1.332e-01  3.022e-01  6.608e-09   0.441  1.00000    
## pop201712.916      1.474e-02  3.022e-01  6.608e-09   0.049  1.00000    
## pop201713.182      3.390e-02  2.188e-01  4.440e+02   0.155  0.87698    
## pop201713.84       2.927e-01  3.022e-01  6.608e-09   0.969  1.00000    
## pop2017140.927     9.971e-01  3.022e-01  6.608e-09   3.300  1.00000    
## pop201715.429      2.099e-01  2.188e-01  4.440e+02   0.959  0.33792    
## pop201715.64       1.270e-01  3.022e-01  6.608e-09   0.420  1.00000    
## pop201715.79       6.336e-03  3.022e-01  6.608e-09   0.021  1.00000    
## pop2017150.134     8.730e-01  3.022e-01  6.608e-09   2.889  1.00000    
## pop2017158.52      9.760e-01  3.022e-01  6.608e-09   3.230  1.00000    
## pop201716.169      3.716e-01  3.022e-01  6.608e-09   1.230  1.00000    
## pop201716.478      1.138e-01  2.188e-01  4.440e+02   0.520  0.60339    
## pop201717.148      3.772e-01  3.022e-01  6.608e-09   1.248  1.00000    
## pop201717.416      2.834e-01  3.022e-01  6.608e-09   0.938  1.00000    
## pop201717.455      2.149e-01  3.022e-01  6.608e-09   0.711  1.00000    
## pop201717.462      4.894e-01  2.188e-01  4.440e+02   2.236  0.02584 *  
## pop201717.825      6.802e-01  3.022e-01  6.608e-09   2.251  1.00000    
## pop201718.073      1.347e-01  2.188e-01  4.440e+02   0.616  0.53854    
## pop20172.516      -4.043e-01  3.022e-01  6.608e-09  -1.338  1.00000    
## pop20172.809      -4.169e-01  3.022e-01  6.608e-09  -1.380  1.00000    
## pop20172.821      -4.307e-01  3.022e-01  6.608e-09  -1.425  1.00000    
## pop201720.93       4.356e-01  3.022e-01  6.608e-09   1.442  1.00000    
## pop201721.074      2.331e-01  2.188e-01  4.440e+02   1.065  0.28738    
## pop201721.577      3.662e-01  3.022e-01  6.608e-09   1.212  1.00000    
## pop201721.927      3.405e-01  3.022e-01  6.608e-09   1.127  1.00000    
## pop201723.256      4.753e-01  3.022e-01  6.608e-09   1.573  1.00000    
## pop201724.264      1.697e-01  2.188e-01  4.440e+02   0.775  0.43847    
## pop201724.65       5.100e-01  3.022e-01  6.608e-09   1.688  1.00000    
## pop201725.147      5.566e-01  3.022e-01  6.608e-09   1.842  1.00000    
## pop201726.224      5.474e-01  3.022e-01  6.608e-09   1.812  1.00000    
## pop201726.918      3.415e-01  3.022e-01  6.608e-09   1.130  1.00000    
## pop20173.133      -4.696e-01  3.022e-01  6.608e-09  -1.554  1.00000    
## pop201732.2        5.346e-01  3.022e-01  6.608e-09   1.769  1.00000    
## pop201733.077      5.848e-01  3.022e-01  6.608e-09   1.936  1.00000    
## pop201733.236      7.497e-01  2.188e-01  4.440e+02   3.426  0.00067 ***
## pop201734.133      4.734e-01  3.022e-01  6.608e-09   1.567  1.00000    
## pop201735.045      3.192e-01  3.022e-01  6.608e-09   1.056  1.00000    
## pop201735.79       7.171e-02  2.188e-01  4.440e+02   0.328  0.74332    
## pop201737.401      6.946e-01  3.022e-01  6.608e-09   2.299  1.00000    
## pop201740.39       3.834e-01  3.022e-01  6.608e-09   1.269  1.00000    
## pop201741.786      4.453e-01  2.188e-01  4.440e+02   2.035  0.04247 *  
## pop201742.184      7.256e-01  3.022e-01  6.608e-09   2.401  1.00000    
## pop201743.549      4.651e-01  3.022e-01  6.608e-09   1.539  1.00000    
## pop201744.605      1.326e-01  3.022e-01  6.608e-09   0.439  1.00000    
## pop201747.146      4.537e-01  3.022e-01  6.608e-09   1.502  1.00000    
## pop201747.185     -7.379e-02  3.022e-01  6.608e-09  -0.244  1.00000    
## pop20175.412      -1.219e-01  3.022e-01  6.608e-09  -0.403  1.00000    
## pop20175.896      -2.805e-01  3.022e-01  6.608e-09  -0.928  1.00000    
## pop201752.003      8.645e-01  2.188e-01  4.440e+02   3.950 9.09e-05 ***
## pop201755.763      4.461e-01  3.022e-01  6.608e-09   1.476  1.00000    
## pop201758.595      1.242e+00  3.022e-01  6.608e-09   4.112  1.00000    
## pop20176.837      -4.669e-02  3.022e-01  6.608e-09  -0.155  1.00000    
## pop201766.121      3.755e-01  3.022e-01  6.608e-09   1.243  1.00000    
## pop20177.109      -3.744e-03  3.022e-01  6.608e-09  -0.012  1.00000    
## pop201772.723      9.866e-01  3.022e-01  6.608e-09   3.265  1.00000    
## pop20178.603      -1.766e-01  3.022e-01  6.608e-09  -0.584  1.00000    
## pop20178.776      -1.763e-01  3.022e-01  6.608e-09  -0.583  1.00000    
## pop20179.377       5.494e-02  3.022e-01  6.608e-09   0.182  1.00000    
## pop20179.485       9.467e-02  3.022e-01  6.608e-09   0.313  1.00000    
## pop20179.668       6.871e-02  3.022e-01  6.608e-09   0.227  1.00000    
## pop20179.805      -9.942e-02  3.022e-01  6.608e-09  -0.329  1.00000    
## pop201791.925      4.041e-01  3.022e-01  6.608e-09   1.337  1.00000    
## pop201798.81       9.791e-01  3.022e-01  6.608e-09   3.240  1.00000    
## pop201799.752      9.467e-01  3.022e-01  6.608e-09   3.133  1.00000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 81 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

anova(model_lmm1)

## Type III Analysis of Variance Table with Satterthwaite's method
##         Sum Sq Mean Sq NumDF DenDF F value Pr(>F)    
## sector  97.865 16.3108     6   444  97.303 <2e-16 ***
## pop2017 52.632  0.7112    74     2   4.243 0.2094    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Explanation:

Model structure:

log emissions = sector + population + random county variation

The random effect allows each county to have its own baseline emission level.

Model Diagnostics

residuals_df <- data.frame(
  Fitted = fitted(model_lmm1),
  Residuals = residuals(model_lmm1)
)

p_resid <- ggplot(residuals_df, aes(Fitted, Residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")

p_qq <- ggplot(residuals_df, aes(sample = Residuals)) +
  stat_qq() +
  stat_qq_line()

grid.arrange(p_resid, p_qq, ncol = 2)

Interpretation:

Residual plots help verify assumptions such as:

Homoscedasticity
Normality of residuals

Post-Hoc Comparisons

emm_sector <- emmeans(model_lmm1, ~ sector)

pairs(emm_sector, adjust = "tukey")

##  contrast                  estimate     SE  df t.ratio p.value
##  Agriculture - Heating        0.722 0.0671 444  10.767 <0.0001
##  Agriculture - Industry       0.307 0.0671 444   4.581  0.0001
##  Agriculture - Machinery      0.200 0.0671 444   2.987  0.0466
##  Agriculture - Product use    0.619 0.0671 444   9.235 <0.0001
##  Agriculture - Transport     -0.477 0.0671 444  -7.108 <0.0001
##  Agriculture - Waste          0.869 0.0671 444  12.960 <0.0001
##  Heating - Industry          -0.415 0.0669 444  -6.206 <0.0001
##  Heating - Machinery         -0.522 0.0669 444  -7.805 <0.0001
##  Heating - Product use       -0.103 0.0669 444  -1.536  0.7230
##  Heating - Transport         -1.199 0.0669 444 -17.932 <0.0001
##  Heating - Waste              0.147 0.0669 444   2.201  0.2968
##  Industry - Machinery        -0.107 0.0669 444  -1.599  0.6830
##  Industry - Product use       0.312 0.0669 444   4.670 <0.0001
##  Industry - Transport        -0.784 0.0669 444 -11.726 <0.0001
##  Industry - Waste             0.562 0.0669 444   8.407 <0.0001
##  Machinery - Product use      0.419 0.0669 444   6.269 <0.0001
##  Machinery - Transport       -0.677 0.0669 444 -10.126 <0.0001
##  Machinery - Waste            0.669 0.0669 444  10.007 <0.0001
##  Product use - Transport     -1.096 0.0669 444 -16.396 <0.0001
##  Product use - Waste          0.250 0.0669 444   3.737  0.0039
##  Transport - Waste            1.346 0.0669 444  20.133 <0.0001
## 
## Results are averaged over the levels of: pop2017 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 7 estimates

Interpretation:

Post-hoc comparisons identify which sectors differ significantly in emission levels.

Part 2: Guppy Behaviour Analysis

Load Data

guppy_data <- read_excel("guppy.xls")

guppy_data$Family <- as.factor(guppy_data$Family)
guppy_data$Gender <- as.factor(guppy_data$Gender)
guppy_data$Treatment <- as.factor(guppy_data$Treatment)

str(guppy_data)

## tibble [145 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Family     : Factor w/ 19 levels "0A2-1","0A2-2",..: 7 7 7 7 10 10 10 10 10 10 ...
##  $ Gender     : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment  : Factor w/ 2 levels "0ng","20ng": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Transitions: num [1:145] 0 4 0 1 0 0 0 4 2 1 ...

table(guppy_data$Treatment, guppy_data$Gender)

##       
##         F  M
##   0ng  32 42
##   20ng 31 40

Explanation:

Family is used as a random effect
Treatment and Gender are fixed effects

Visualize Behaviour Data

ggplot(guppy_data, aes(x = Transitions, fill = Treatment)) +
  geom_histogram(binwidth = 1, alpha = 0.7) +
  facet_wrap(~Gender) +
  theme_bw() +
  labs(
    title = "Distribution of Guppy Transitions",
    x = "Number of Transitions",
    y = "Count"
  )

Interpretation:

Transition counts are discrete and right-skewed, suggesting that a count model is appropriate.

Poisson GLMM

glmm_poisson <- glmer(
  Transitions ~ Treatment * Gender + (1 | Family),
  data = guppy_data,
  family = poisson
)

summary(glmm_poisson)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: Transitions ~ Treatment * Gender + (1 | Family)
##    Data: guppy_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     846.0     860.9    -418.0     836.0       140 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7489 -1.2678 -0.4613  0.9714  5.4208 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  Family (Intercept) 0.3909   0.6253  
## Number of obs: 145, groups:  Family, 19
## 
## Fixed effects:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            1.45785    0.21925   6.649 2.95e-11 ***
## Treatment20ng         -0.01389    0.31712  -0.044  0.96505    
## GenderM                0.25561    0.12507   2.044  0.04098 *  
## Treatment20ng:GenderM -0.52644    0.17589  -2.993  0.00276 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Trtm20 GendrM
## Tretmnt20ng -0.690              
## GenderM     -0.321  0.221       
## Trtmnt20:GM  0.228 -0.298 -0.711

Explanation:

Poisson models are standard for count data, but they assume:

variance = mean

This assumption is often violated.

Check Overdispersion

pearson_resid <- residuals(glmm_poisson, type = "pearson")

dispersion_ratio <- sum(pearson_resid^2) / df.residual(glmm_poisson)

dispersion_ratio

## [1] 2.629349

If the dispersion ratio is greater than 1.5, overdispersion is present.

Negative Binomial GLMM

glmm_nb <- glmer.nb(
  Transitions ~ Treatment * Gender + (1 | Family),
  data = guppy_data
)

summary(glmm_nb)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.941)  ( log )
## Formula: Transitions ~ Treatment * Gender + (1 | Family)
##    Data: guppy_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     747.9     765.8    -368.0     735.9       139 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2017 -0.7242 -0.2169  0.5586  3.4725 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  Family (Intercept) 0.2882   0.5369  
## Number of obs: 145, groups:  Family, 19
## 
## Fixed effects:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            1.45302    0.23808   6.103 1.04e-09 ***
## Treatment20ng          0.03918    0.34268   0.114    0.909    
## GenderM                0.18220    0.22396   0.814    0.416    
## Treatment20ng:GenderM -0.51811    0.32037  -1.617    0.106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Trtm20 GendrM
## Tretmnt20ng -0.695              
## GenderM     -0.524  0.364       
## Trtmnt20:GM  0.368 -0.513 -0.701

Explanation:

Negative binomial models handle overdispersion in count data.

Model Diagnostics

simulationOutput <- simulateResiduals(glmm_nb)

plot(simulationOutput)

Interpretation:

Diagnostic plots check whether model assumptions are satisfied.

Predicted Behaviour

emm_guppy <- emmeans(glmm_nb, ~ Treatment * Gender, type = "response")

emm_df <- as.data.frame(emm_guppy)

ggplot(emm_df, aes(x = Gender, y = response, color = Treatment)) +
  geom_point(size = 4) +
  geom_line(aes(group = Treatment)) +
  theme_bw() +
  labs(
    title = "Predicted Transitions by Treatment and Gender",
    y = "Predicted Number of Transitions"
  )

Interpretation:

This interaction plot shows how treatment and gender influence guppy behaviour.

Conclusion

Greenhouse Gas Analysis

Sector significantly affects emissions.
Population size increases emission levels.
Counties show random variation in baseline emissions.

Guppy Behaviour Analysis

Behaviour varies with treatment and gender.
Data showed overdispersion.
Negative binomial GLMM provided the best fit.

References

R Core Team (2024).
R: A language and environment for statistical computing.

Assignment 2: Mixed Effects Models Analysis

Iman makhzoum

2026-03-15

Introduction

Load Required Libraries

Part 1: Greenhouse Gas Emissions Analysis

Load and Prepare Data

Summary Statistics by Sector

Distribution of Emissions

Population and Emissions Relationship

Linear Mixed Effects Model

Model Diagnostics

Post-Hoc Comparisons

Part 2: Guppy Behaviour Analysis

Load Data

Visualize Behaviour Data

Poisson GLMM

Check Overdispersion

Negative Binomial GLMM

Model Diagnostics

Predicted Behaviour

Conclusion

Greenhouse Gas Analysis

Guppy Behaviour Analysis

References