Introduction

This report presents two statistical analyses using mixed-effects models in R:

  1. Greenhouse Gas Emissions Analysis
    • Linear Mixed Effects Models (LMM)
  2. Guppy Behaviour Analysis
    • Generalized Linear Mixed Effects Models (GLMM)

Mixed-effects models are useful because they allow the inclusion of:

  • Fixed effects (predictor variables of interest)
  • Random effects (grouping variables such as counties or families)

Load Required Libraries

# Libraries for data analysis and visualization

library(readxl)     # Import Excel files
library(lme4)       # Mixed effects models
library(lmerTest)   # p-values for LMM
library(emmeans)    # Estimated marginal means
library(ggplot2)    # Visualization
library(dplyr)      # Data manipulation
library(tidyr)      # Data tidying
library(car)        # ANOVA
library(DHARMa)     # GLMM diagnostics
library(gridExtra)  # Multiple plots
library(MASS)       # Negative binomial models
library(multcomp)
library(multcompView)

Part 1: Greenhouse Gas Emissions Analysis

Load and Prepare Data

# Import greenhouse gas dataset
gge_data <- read_excel("greenhousegases.xlsx")

# Convert categorical variables into factors
gge_data$county <- as.factor(gge_data$county)
gge_data$sector <- as.factor(gge_data$sector)
gge_data$municipality <- as.factor(gge_data$municipality)

# Display structure
str(gge_data)
## tibble [525 × 5] (S3: tbl_df/tbl/data.frame)
##  $ municipality: Factor w/ 75 levels "Alingsås","Älmhult",..: 1 1 1 1 1 1 1 5 5 5 ...
##  $ sector      : Factor w/ 7 levels "Agriculture",..: 5 3 4 2 7 6 1 4 2 3 ...
##  $ county      : Factor w/ 19 levels "Blekinge län",..: 19 19 19 19 19 19 19 8 8 8 ...
##  $ GG2017      : chr [1:525] "6579.075" "2232.759" "8300.543" "3228.022" ...
##  $ pop2017     : chr [1:525] "40.39" "40.39" "40.39" "40.39" ...
# Dataset information
dim(gge_data)
## [1] 525   5
length(unique(gge_data$county))
## [1] 19
length(unique(gge_data$sector))
## [1] 7
length(unique(gge_data$municipality))
## [1] 75

Explanation:

  • County will be treated as a random effect
  • Sector and population are treated as fixed effects

Summary Statistics by Sector

sector_summary <- gge_data %>%
  group_by(sector) %>%
  summarise(
    n = n(),
    mean_emissions = mean(GG2017),
    sd_emissions = sd(GG2017),
    median_emissions = median(GG2017),
    min_emissions = min(GG2017),
    max_emissions = max(GG2017),
    mean_population = mean(pop2017)
  )
## Warning: There were 14 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `mean_emissions = mean(GG2017)`.
## ℹ In group 1: `sector = Agriculture`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 13 remaining warnings.
sector_summary
## # A tibble: 7 × 8
##   sector          n mean_emissions sd_emissions median_emissions min_emissions
##   <fct>       <int>          <dbl>        <dbl> <chr>            <chr>        
## 1 Agriculture    75             NA       40422. 32719.17         10014.86     
## 2 Heating        75             NA        2945. 3228.022         1204.654     
## 3 Industry       75             NA      288889. 321047.9         1065.189     
## 4 Machinery      75             NA       17397. 4003.197         10239.81     
## 5 Product use    75             NA        5308. 2854.124         11208.1      
## 6 Transport      75             NA       51268. 27927.11         10284.54     
## 7 Waste          75             NA        4693. 399.7303         1056.943     
## # ℹ 2 more variables: max_emissions <chr>, mean_population <dbl>

Interpretation:

The summary statistics show variation in emissions across sectors. Some sectors contribute substantially more greenhouse gas emissions than others.


Distribution of Emissions

# Convert emissions variable to numeric
gge_data$GG2017 <- as.numeric(as.character(gge_data$GG2017))

# Histogram of raw emissions
p1 <- ggplot(gge_data, aes(x = GG2017)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Greenhouse Gas Emissions")

# Histogram of log-transformed emissions
p2 <- ggplot(gge_data, aes(x = log10(GG2017))) +
  geom_histogram(bins = 30, fill = "darkgreen", color = "black") +
  labs(title = "Log10 Transformed Emissions")

grid.arrange(p1, p2, ncol = 2)

Interpretation:

The raw emissions distribution is highly right-skewed.
Applying a log transformation produces a more symmetrical distribution suitable for linear modeling.


Population and Emissions Relationship

gge_data$logGG <- log10(gge_data$GG2017)

ggplot(gge_data, aes(x = pop2017, y = logGG, color = sector)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  labs(
    title = "Population vs Emissions by Sector",
    x = "Population",
    y = "Log Emissions"
  )
## `geom_smooth()` using formula = 'y ~ x'

Interpretation:

Municipalities with larger populations tend to produce higher greenhouse gas emissions.


Linear Mixed Effects Model

model_lmm1 <- lmer(
  logGG ~ sector + pop2017 + (1 | county),
  data = gge_data
)

summary(model_lmm1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: logGG ~ sector + pop2017 + (1 | county)
##    Data: gge_data
## 
## REML criterion at convergence: 636.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.7981 -0.4630 -0.0426  0.4287  6.1846 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  county   (Intercept) 0.0217   0.1473  
##  Residual             0.1676   0.4094  
## Number of obs: 525, groups:  county, 19
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        3.865e+00  2.181e-01  7.174e-09  17.720  1.00000    
## sectorHeating     -7.222e-01  6.686e-02  4.440e+02 -10.802  < 2e-16 ***
## sectorIndustry    -3.073e-01  6.686e-02  4.440e+02  -4.596 5.63e-06 ***
## sectorMachinery   -2.003e-01  6.686e-02  4.440e+02  -2.996  0.00288 ** 
## sectorProduct use -6.195e-01  6.686e-02  4.440e+02  -9.266  < 2e-16 ***
## sectorTransport    4.767e-01  6.686e-02  4.440e+02   7.130 4.10e-12 ***
## sectorWaste       -8.694e-01  6.686e-02  4.440e+02 -13.003  < 2e-16 ***
## pop201710.241     -1.419e-01  3.022e-01  6.608e-09  -0.469  1.00000    
## pop201710.66      -3.597e-01  3.022e-01  6.608e-09  -1.190  1.00000    
## pop201710.747     -1.142e-01  3.022e-01  6.608e-09  -0.378  1.00000    
## pop201710.837      2.865e-01  3.022e-01  6.608e-09   0.948  1.00000    
## pop201710.894      1.238e-01  3.022e-01  6.608e-09   0.410  1.00000    
## pop201711.496      2.009e-01  3.022e-01  6.608e-09   0.665  1.00000    
## pop201711.841      3.027e-01  3.022e-01  6.608e-09   1.002  1.00000    
## pop201711.845      1.092e-01  3.022e-01  6.608e-09   0.361  1.00000    
## pop201712.008      1.104e-01  3.022e-01  6.608e-09   0.365  1.00000    
## pop201712.257      1.782e-01  3.022e-01  6.608e-09   0.590  1.00000    
## pop201712.711      1.332e-01  3.022e-01  6.608e-09   0.441  1.00000    
## pop201712.916      1.474e-02  3.022e-01  6.608e-09   0.049  1.00000    
## pop201713.182      3.390e-02  2.188e-01  4.440e+02   0.155  0.87698    
## pop201713.84       2.927e-01  3.022e-01  6.608e-09   0.969  1.00000    
## pop2017140.927     9.971e-01  3.022e-01  6.608e-09   3.300  1.00000    
## pop201715.429      2.099e-01  2.188e-01  4.440e+02   0.959  0.33792    
## pop201715.64       1.270e-01  3.022e-01  6.608e-09   0.420  1.00000    
## pop201715.79       6.336e-03  3.022e-01  6.608e-09   0.021  1.00000    
## pop2017150.134     8.730e-01  3.022e-01  6.608e-09   2.889  1.00000    
## pop2017158.52      9.760e-01  3.022e-01  6.608e-09   3.230  1.00000    
## pop201716.169      3.716e-01  3.022e-01  6.608e-09   1.230  1.00000    
## pop201716.478      1.138e-01  2.188e-01  4.440e+02   0.520  0.60339    
## pop201717.148      3.772e-01  3.022e-01  6.608e-09   1.248  1.00000    
## pop201717.416      2.834e-01  3.022e-01  6.608e-09   0.938  1.00000    
## pop201717.455      2.149e-01  3.022e-01  6.608e-09   0.711  1.00000    
## pop201717.462      4.894e-01  2.188e-01  4.440e+02   2.236  0.02584 *  
## pop201717.825      6.802e-01  3.022e-01  6.608e-09   2.251  1.00000    
## pop201718.073      1.347e-01  2.188e-01  4.440e+02   0.616  0.53854    
## pop20172.516      -4.043e-01  3.022e-01  6.608e-09  -1.338  1.00000    
## pop20172.809      -4.169e-01  3.022e-01  6.608e-09  -1.380  1.00000    
## pop20172.821      -4.307e-01  3.022e-01  6.608e-09  -1.425  1.00000    
## pop201720.93       4.356e-01  3.022e-01  6.608e-09   1.442  1.00000    
## pop201721.074      2.331e-01  2.188e-01  4.440e+02   1.065  0.28738    
## pop201721.577      3.662e-01  3.022e-01  6.608e-09   1.212  1.00000    
## pop201721.927      3.405e-01  3.022e-01  6.608e-09   1.127  1.00000    
## pop201723.256      4.753e-01  3.022e-01  6.608e-09   1.573  1.00000    
## pop201724.264      1.697e-01  2.188e-01  4.440e+02   0.775  0.43847    
## pop201724.65       5.100e-01  3.022e-01  6.608e-09   1.688  1.00000    
## pop201725.147      5.566e-01  3.022e-01  6.608e-09   1.842  1.00000    
## pop201726.224      5.474e-01  3.022e-01  6.608e-09   1.812  1.00000    
## pop201726.918      3.415e-01  3.022e-01  6.608e-09   1.130  1.00000    
## pop20173.133      -4.696e-01  3.022e-01  6.608e-09  -1.554  1.00000    
## pop201732.2        5.346e-01  3.022e-01  6.608e-09   1.769  1.00000    
## pop201733.077      5.848e-01  3.022e-01  6.608e-09   1.936  1.00000    
## pop201733.236      7.497e-01  2.188e-01  4.440e+02   3.426  0.00067 ***
## pop201734.133      4.734e-01  3.022e-01  6.608e-09   1.567  1.00000    
## pop201735.045      3.192e-01  3.022e-01  6.608e-09   1.056  1.00000    
## pop201735.79       7.171e-02  2.188e-01  4.440e+02   0.328  0.74332    
## pop201737.401      6.946e-01  3.022e-01  6.608e-09   2.299  1.00000    
## pop201740.39       3.834e-01  3.022e-01  6.608e-09   1.269  1.00000    
## pop201741.786      4.453e-01  2.188e-01  4.440e+02   2.035  0.04247 *  
## pop201742.184      7.256e-01  3.022e-01  6.608e-09   2.401  1.00000    
## pop201743.549      4.651e-01  3.022e-01  6.608e-09   1.539  1.00000    
## pop201744.605      1.326e-01  3.022e-01  6.608e-09   0.439  1.00000    
## pop201747.146      4.537e-01  3.022e-01  6.608e-09   1.502  1.00000    
## pop201747.185     -7.379e-02  3.022e-01  6.608e-09  -0.244  1.00000    
## pop20175.412      -1.219e-01  3.022e-01  6.608e-09  -0.403  1.00000    
## pop20175.896      -2.805e-01  3.022e-01  6.608e-09  -0.928  1.00000    
## pop201752.003      8.645e-01  2.188e-01  4.440e+02   3.950 9.09e-05 ***
## pop201755.763      4.461e-01  3.022e-01  6.608e-09   1.476  1.00000    
## pop201758.595      1.242e+00  3.022e-01  6.608e-09   4.112  1.00000    
## pop20176.837      -4.669e-02  3.022e-01  6.608e-09  -0.155  1.00000    
## pop201766.121      3.755e-01  3.022e-01  6.608e-09   1.243  1.00000    
## pop20177.109      -3.744e-03  3.022e-01  6.608e-09  -0.012  1.00000    
## pop201772.723      9.866e-01  3.022e-01  6.608e-09   3.265  1.00000    
## pop20178.603      -1.766e-01  3.022e-01  6.608e-09  -0.584  1.00000    
## pop20178.776      -1.763e-01  3.022e-01  6.608e-09  -0.583  1.00000    
## pop20179.377       5.494e-02  3.022e-01  6.608e-09   0.182  1.00000    
## pop20179.485       9.467e-02  3.022e-01  6.608e-09   0.313  1.00000    
## pop20179.668       6.871e-02  3.022e-01  6.608e-09   0.227  1.00000    
## pop20179.805      -9.942e-02  3.022e-01  6.608e-09  -0.329  1.00000    
## pop201791.925      4.041e-01  3.022e-01  6.608e-09   1.337  1.00000    
## pop201798.81       9.791e-01  3.022e-01  6.608e-09   3.240  1.00000    
## pop201799.752      9.467e-01  3.022e-01  6.608e-09   3.133  1.00000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 81 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it
anova(model_lmm1)
## Type III Analysis of Variance Table with Satterthwaite's method
##         Sum Sq Mean Sq NumDF DenDF F value Pr(>F)    
## sector  97.865 16.3108     6   444  97.303 <2e-16 ***
## pop2017 52.632  0.7112    74     2   4.243 0.2094    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Explanation:

Model structure:

log emissions = sector + population + random county variation

The random effect allows each county to have its own baseline emission level.


Model Diagnostics

residuals_df <- data.frame(
  Fitted = fitted(model_lmm1),
  Residuals = residuals(model_lmm1)
)

p_resid <- ggplot(residuals_df, aes(Fitted, Residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")

p_qq <- ggplot(residuals_df, aes(sample = Residuals)) +
  stat_qq() +
  stat_qq_line()

grid.arrange(p_resid, p_qq, ncol = 2)

Interpretation:

Residual plots help verify assumptions such as:

  • Homoscedasticity
  • Normality of residuals

Post-Hoc Comparisons

emm_sector <- emmeans(model_lmm1, ~ sector)

pairs(emm_sector, adjust = "tukey")
##  contrast                  estimate     SE  df t.ratio p.value
##  Agriculture - Heating        0.722 0.0671 444  10.767 <0.0001
##  Agriculture - Industry       0.307 0.0671 444   4.581  0.0001
##  Agriculture - Machinery      0.200 0.0671 444   2.987  0.0466
##  Agriculture - Product use    0.619 0.0671 444   9.235 <0.0001
##  Agriculture - Transport     -0.477 0.0671 444  -7.108 <0.0001
##  Agriculture - Waste          0.869 0.0671 444  12.960 <0.0001
##  Heating - Industry          -0.415 0.0669 444  -6.206 <0.0001
##  Heating - Machinery         -0.522 0.0669 444  -7.805 <0.0001
##  Heating - Product use       -0.103 0.0669 444  -1.536  0.7230
##  Heating - Transport         -1.199 0.0669 444 -17.932 <0.0001
##  Heating - Waste              0.147 0.0669 444   2.201  0.2968
##  Industry - Machinery        -0.107 0.0669 444  -1.599  0.6830
##  Industry - Product use       0.312 0.0669 444   4.670 <0.0001
##  Industry - Transport        -0.784 0.0669 444 -11.726 <0.0001
##  Industry - Waste             0.562 0.0669 444   8.407 <0.0001
##  Machinery - Product use      0.419 0.0669 444   6.269 <0.0001
##  Machinery - Transport       -0.677 0.0669 444 -10.126 <0.0001
##  Machinery - Waste            0.669 0.0669 444  10.007 <0.0001
##  Product use - Transport     -1.096 0.0669 444 -16.396 <0.0001
##  Product use - Waste          0.250 0.0669 444   3.737  0.0039
##  Transport - Waste            1.346 0.0669 444  20.133 <0.0001
## 
## Results are averaged over the levels of: pop2017 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 7 estimates

Interpretation:

Post-hoc comparisons identify which sectors differ significantly in emission levels.


Part 2: Guppy Behaviour Analysis

Load Data

guppy_data <- read_excel("guppy.xls")

guppy_data$Family <- as.factor(guppy_data$Family)
guppy_data$Gender <- as.factor(guppy_data$Gender)
guppy_data$Treatment <- as.factor(guppy_data$Treatment)

str(guppy_data)
## tibble [145 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Family     : Factor w/ 19 levels "0A2-1","0A2-2",..: 7 7 7 7 10 10 10 10 10 10 ...
##  $ Gender     : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Treatment  : Factor w/ 2 levels "0ng","20ng": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Transitions: num [1:145] 0 4 0 1 0 0 0 4 2 1 ...
table(guppy_data$Treatment, guppy_data$Gender)
##       
##         F  M
##   0ng  32 42
##   20ng 31 40

Explanation:

  • Family is used as a random effect
  • Treatment and Gender are fixed effects

Visualize Behaviour Data

ggplot(guppy_data, aes(x = Transitions, fill = Treatment)) +
  geom_histogram(binwidth = 1, alpha = 0.7) +
  facet_wrap(~Gender) +
  theme_bw() +
  labs(
    title = "Distribution of Guppy Transitions",
    x = "Number of Transitions",
    y = "Count"
  )

Interpretation:

Transition counts are discrete and right-skewed, suggesting that a count model is appropriate.


Poisson GLMM

glmm_poisson <- glmer(
  Transitions ~ Treatment * Gender + (1 | Family),
  data = guppy_data,
  family = poisson
)

summary(glmm_poisson)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: Transitions ~ Treatment * Gender + (1 | Family)
##    Data: guppy_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     846.0     860.9    -418.0     836.0       140 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7489 -1.2678 -0.4613  0.9714  5.4208 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  Family (Intercept) 0.3909   0.6253  
## Number of obs: 145, groups:  Family, 19
## 
## Fixed effects:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            1.45785    0.21925   6.649 2.95e-11 ***
## Treatment20ng         -0.01389    0.31712  -0.044  0.96505    
## GenderM                0.25561    0.12507   2.044  0.04098 *  
## Treatment20ng:GenderM -0.52644    0.17589  -2.993  0.00276 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Trtm20 GendrM
## Tretmnt20ng -0.690              
## GenderM     -0.321  0.221       
## Trtmnt20:GM  0.228 -0.298 -0.711

Explanation:

Poisson models are standard for count data, but they assume:

variance = mean

This assumption is often violated.


Check Overdispersion

pearson_resid <- residuals(glmm_poisson, type = "pearson")

dispersion_ratio <- sum(pearson_resid^2) / df.residual(glmm_poisson)

dispersion_ratio
## [1] 2.629349

If the dispersion ratio is greater than 1.5, overdispersion is present.


Negative Binomial GLMM

glmm_nb <- glmer.nb(
  Transitions ~ Treatment * Gender + (1 | Family),
  data = guppy_data
)

summary(glmm_nb)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: Negative Binomial(1.941)  ( log )
## Formula: Transitions ~ Treatment * Gender + (1 | Family)
##    Data: guppy_data
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     747.9     765.8    -368.0     735.9       139 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2017 -0.7242 -0.2169  0.5586  3.4725 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  Family (Intercept) 0.2882   0.5369  
## Number of obs: 145, groups:  Family, 19
## 
## Fixed effects:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            1.45302    0.23808   6.103 1.04e-09 ***
## Treatment20ng          0.03918    0.34268   0.114    0.909    
## GenderM                0.18220    0.22396   0.814    0.416    
## Treatment20ng:GenderM -0.51811    0.32037  -1.617    0.106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Trtm20 GendrM
## Tretmnt20ng -0.695              
## GenderM     -0.524  0.364       
## Trtmnt20:GM  0.368 -0.513 -0.701

Explanation:

Negative binomial models handle overdispersion in count data.


Model Diagnostics

simulationOutput <- simulateResiduals(glmm_nb)

plot(simulationOutput)

Interpretation:

Diagnostic plots check whether model assumptions are satisfied.


Predicted Behaviour

emm_guppy <- emmeans(glmm_nb, ~ Treatment * Gender, type = "response")

emm_df <- as.data.frame(emm_guppy)

ggplot(emm_df, aes(x = Gender, y = response, color = Treatment)) +
  geom_point(size = 4) +
  geom_line(aes(group = Treatment)) +
  theme_bw() +
  labs(
    title = "Predicted Transitions by Treatment and Gender",
    y = "Predicted Number of Transitions"
  )

Interpretation:

This interaction plot shows how treatment and gender influence guppy behaviour.


Conclusion

Greenhouse Gas Analysis

  • Sector significantly affects emissions.
  • Population size increases emission levels.
  • Counties show random variation in baseline emissions.

Guppy Behaviour Analysis

  • Behaviour varies with treatment and gender.
  • Data showed overdispersion.
  • Negative binomial GLMM provided the best fit.

References

R Core Team (2024).
R: A language and environment for statistical computing.