1 Introduction

This assignment is divided into two parts. Part I analyses the greenhouse gas emissions of Swedish municipalities in 2017, where emissions are classified by the number of inhabitants in each municipality and by economic sector. The main task in Part I is to analyse the effect of the number of inhabitants and of the economic sector on greenhouse gas emissions with a linear mixed-effects model (LMM). In Part II, the main task is to study the anxiety effects of treatment with the synthetic hormone 17α-ethinyl estradiol (EE2) on the guppy Poecilia reticulata with a generalized linear mixed-effects model (GLMM). Details of the experimental design for Part II can be found in the papers of (???) and (???).

1.1 Linear Mixed-Effect Model: Package lme4

1.2 Restricted Maximum Likelihood (REML) in LMM

  • For how to find \(\boldsymbol{T}\) and how to estimate \(\boldsymbol{\beta,\Sigma_{U}, \Sigma_{R} }\), see the book of Searle, Casella and McCulloch (1992) and the works of Harville (1977), Laird and Ware (1982), Pinheiro and Bates (2000), Faraway (2006), Crawley (2013) and Satoh (2018).

  • The advantages of the REML approach over ANOVA are that it:

    • can produce unbiased estimates of variance and covariance parameters;
    • can analyze unbalanced designs; and
    • has a powerful prediction algorithm that extends the ideas of regression prediction to cover random as well as fixed effects.
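
The unbiasedness point can be illustrated in the simplest possible setting. The sketch below is not an LMM fit: it assumes an intercept-only model with independent normal errors, where the maximum likelihood variance estimate divides the residual sum of squares by n and the REML estimate divides it by n - 1. The simulation settings (true variance, sample size, seed) are arbitrary choices made for illustration.

```python
import random

random.seed(1)
TRUE_VARIANCE = 4.0   # assumed value for the simulation
N_PER_SAMPLE = 5      # small samples make the bias easy to see
N_SAMPLES = 20000

ml_total = 0.0
reml_total = 0.0
for _ in range(N_SAMPLES):
    sample = [random.gauss(10.0, TRUE_VARIANCE ** 0.5) for _ in range(N_PER_SAMPLE)]
    mean = sum(sample) / N_PER_SAMPLE
    ss = sum((x - mean) ** 2 for x in sample)
    ml_total += ss / N_PER_SAMPLE          # ML: ignores the df used by the mean
    reml_total += ss / (N_PER_SAMPLE - 1)  # REML: corrects for the estimated mean

mean_ml = ml_total / N_SAMPLES
mean_reml = reml_total / N_SAMPLES
print(f"average ML estimate:   {mean_ml:.3f}")
print(f"average REML estimate: {mean_reml:.3f}")
```

Under these assumptions the ML average settles near (n - 1)/n times the true variance, while the REML average settles near the true variance itself.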

1.3 Laplace Approximation for maximum likelihood in GLMM

The Laplace approximation is one of the simplest approximations to the log-likelihood function of a generalized linear mixed model. It suits models with latent random variables, where the likelihood requires integrating over the random effects, and it is closely related to a Bayesian posterior calculation in which a prior distribution is placed on those latent variables. For those interested in the computation of the Laplace approximation, please refer to the paper of Kass and Steffey (1989).
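
To make the idea concrete, the following is a minimal sketch for a single cluster of Poisson counts with one normal random intercept u. It finds the mode of the joint log-density by Newton's method, replaces the integrand by a Gaussian at that mode, and compares the result with brute-force trapezoid integration. The counts, the intercept `beta0` and the random-effect standard deviation `sigma` are hypothetical values, and the function names are illustrative rather than part of any package.

```python
import math

def log_joint(u, y, beta0, sigma):
    """log p(y | u) + log p(u) for Poisson counts with a random intercept u."""
    eta = beta0 + u
    log_lik = sum(yi * eta - math.exp(eta) - math.lgamma(yi + 1) for yi in y)
    log_prior = -0.5 * math.log(2 * math.pi * sigma ** 2) - u ** 2 / (2 * sigma ** 2)
    return log_lik + log_prior

def laplace_marginal(y, beta0, sigma, n_iter=50):
    """Laplace approximation to the integral of exp(log_joint) over u."""
    n, total = len(y), sum(y)
    u = 0.0
    for _ in range(n_iter):  # Newton steps towards the mode of log_joint
        mu = math.exp(beta0 + u)
        gradient = total - n * mu - u / sigma ** 2
        curvature = -n * mu - 1 / sigma ** 2
        u -= gradient / curvature
    curvature = -n * math.exp(beta0 + u) - 1 / sigma ** 2
    return math.exp(log_joint(u, y, beta0, sigma)) * math.sqrt(2 * math.pi / -curvature)

def quadrature_marginal(y, beta0, sigma, lo=-5.0, hi=5.0, steps=20000):
    """Reference value by the trapezoid rule on a wide grid."""
    h = (hi - lo) / steps
    values = [math.exp(log_joint(lo + k * h, y, beta0, sigma)) for k in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

counts = [3, 5, 4]                    # hypothetical counts for one cluster
beta0, sigma = math.log(3.0), 0.5     # hypothetical parameter values
laplace = laplace_marginal(counts, beta0, sigma)
quadrature = quadrature_marginal(counts, beta0, sigma)
print(f"Laplace: {laplace:.6g}, quadrature: {quadrature:.6g}")
```

In a full GLMM fit this approximate marginal likelihood would be computed for every cluster and maximised over the fixed effects and variance parameters; the sketch only evaluates it at fixed parameter values.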

2 Part I: Greenhouse Gas Emissions: 2017 Sweden Economy

2.1 Research Questions

  • whether county and municipality have any random effects on GG2017

  • whether sector and pop2017 have any fixed effects on GG2017

  • whether the interaction between sector and pop2017 has any fixed effect on GG2017

2.2 Variable Definitions

2.3 Descriptive Statistics

2.4 Log Transformation

2.5 Data Visualization

2.5.1 Relationship of County Vs Sector

2.5.2 Sum of GG2017 by County X Sector

2.5.3 Boxplots: Log(GG2017) by Log(pop2017) by sector

2.5.4 Boxplots: Log(GG2017) by Log(pop2017) by county

2.5.5 Boxplots: Log(GG2017) by Log(pop2017) by sector by county

2.5.6 ScatterPlot: Log(GG2017) by Log(pop2017) by sector by county

2.6 Model Specification Strategies:

Table 6 lists the six model specifications used to test

  • whether county and municipality have any random effects on GG2017

  • whether sector and pop2017 have any fixed effects on GG2017

  • whether the interaction between sector and pop2017 has any fixed effect on GG2017
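
As a sketch of the kind of specification these questions lead to, a model with fixed effects for sector, log population and their interaction, plus random intercepts for county and municipality, could be written as below. The index notation, the symbols and the nesting of municipality within county are assumptions made for illustration; Table 6 defines the six models that were actually fitted.

\[
\log(\mathrm{GG2017}_{cms}) = \beta_0 + \beta_s + \beta_1 \log(\mathrm{pop2017}_{cm}) + \gamma_s \log(\mathrm{pop2017}_{cm}) + u_c + v_{cm} + \varepsilon_{cms}
\]

\[
u_c \sim N(0, \sigma_u^2), \qquad v_{cm} \sim N(0, \sigma_v^2), \qquad \varepsilon_{cms} \sim N(0, \sigma^2)
\]

Here \(c\) indexes counties, \(m\) municipalities within a county and \(s\) sectors. Setting every \(\gamma_s = 0\) removes the interaction, and dropping \(u_c\) or \(v_{cm}\) removes the corresponding random term.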

2.6.1 Fixed Effects Test

Type I sums of squares are “sequential”: they show how the RSS decreases as each predictor is added to the model, so the Type I result changes if the order of the predictors in the model changes. Type II ANOVA is “marginal”: it shows how the RSS would increase if each predictor were removed from the model one at a time, so the ordering of the predictors does not affect the Type II result. Hence Type II ANOVA with Satterthwaite’s method is used to test the fixed effects for the six model specifications.
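
The order dependence of sequential sums of squares can be seen in a small ordinary least squares example. The sketch below uses made-up data with two correlated predictors and no random effects, so it illustrates the sums-of-squares logic only, not the Satterthwaite-based mixed-model test; the helper `rss` and all data values are hypothetical.

```python
def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(k + 1, p):
            factor = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= factor * A[k][c]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))

# Made-up data: x1 and x2 are strongly correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 1.0, 2.0, 3.0, 3.0, 5.0]
y = [2.1, 2.9, 4.2, 6.1, 6.8, 9.9]

rss_null, rss_x1, rss_x2, rss_full = rss([], y), rss([x1], y), rss([x2], y), rss([x1, x2], y)

ss_x1_entered_first = rss_null - rss_x1   # sequential SS for x1 in the order x1, x2
ss_x1_entered_last = rss_x2 - rss_full    # sequential SS for x1 in the order x2, x1
ss_x1_marginal = rss_x2 - rss_full        # SS for x1 adjusted for x2 (order-free)

print(ss_x1_entered_first, ss_x1_entered_last, ss_x1_marginal)
```

The sequential value for x1 depends on whether it enters before or after x2, whereas the adjusted value is the same whichever order the model formula uses.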

The results in Table 7 indicate that the main effects of Logpop2017 and sector are highly statistically significant and that the interaction term (Logpop2017*sector) is less statistically significant for all six model specifications.

2.6.2 LR Tests for Random Terms

Table 8 indicates that the random term for county is highly statistically significant in Model.01 and Model.01NT.

2.6.3 Best Model Diagnostics Check

2.6.4 Effects Plot

2.6.5 Tukey Contrasts Test

Tables 7 and 8 indicate that only eight pairs have a mean difference that is highly statistically significantly different from zero. They are listed below:

  • (Product use and Agriculture)
  • (Product use and Machinery)
  • (Transport and Heating)
  • (Transport and Industry)
  • (Transport and Product use)
  • (Waste and Machinery)
  • (Waste and Agriculture)
  • (Waste and Transport)

2.6.6 Model Comparison with LR Test

Using the nested model comparison strategy, Table 9 shows that the best model is Model.01 and the second-best model is Model.01NT (which has no interaction term among the fixed effects). The summary output of the LMM Model.01 is shown in Figure 9, which is attached in Appendix A.
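
For readers unfamiliar with the mechanics, a likelihood ratio test between two nested models can be sketched as follows. The statistic is twice the difference in maximised log-likelihoods and is referred to a chi-square distribution whose degrees of freedom equal the number of extra parameters. The log-likelihood values and the degrees of freedom below are hypothetical and are not taken from Table 9; `chi2_sf` is an illustrative helper that uses the closed-form tail probability for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc term plus a finite sum over half-integer powers.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1)
    )
    return tail

def lr_test(loglik_reduced, loglik_full, df_diff):
    """Return the LR statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)

# Hypothetical values for illustration only.
stat, p_value = lr_test(loglik_reduced=-1250.0, loglik_full=-1244.0, df_diff=6)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.4f}")
```

When the two models differ in their fixed effects, the comparison is usually based on maximum likelihood fits rather than REML fits, because REML likelihoods are not comparable across different fixed-effect structures.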

2.7 Conclusion for Part I

The best model is Model.01 and the second-best model is Model.01NT (which has no interaction term among the fixed effects). County and municipality are found to have random effects on GG2017. Sector and pop2017 show positive fixed effects on GG2017. However, the interaction between sector and pop2017 contributes a less statistically significant fixed effect on GG2017.

3 Part II: Guppy Poecilia reticulata with GLMM (Laplace Approximation)

In this experimental design, the sample comprises 145 observations with one count variable (“Transition”), two 2-level factors (“Gender” and “Treatment”) and one categorical factor (“Family”).

  • The response variable “Transition” counts the transitions of a guppy from the lower to the upper half of the compartment.
  • The variable “Family” is a random categorical factor that controls for dependency among siblings within treatment groups. Transitions require a log transformation and are assumed to follow a Poisson distribution.
  • The fixed effects are the 2-level factor “Gender” and the 2-level factor “Treatment” (“0ng” for control and “20ng” EE2 in 10ppm acetone).
  • All experiments are conducted in small aquariums that are divided into two parts by a horizontal midline. Fish that experience anxiety stay at the bottom. The number of transitions from the lower to the upper compartment per fish is recorded as the variable “Transitions”.

3.1 Research Question

  • Anxiety treatment effects of the synthetic hormone EE2 on Guppy Poecilia reticulata with GLMM

3.2 Data Visualization

Table 10 displays the given dataset (guppy.xlsx), while Table 11 shows the total number of transitions grouped by Treatment and by Gender, which indicates that the experimental design is unbalanced.

Table 12 shows the unbalanced design matrix for reference.

Figure 10 is a scatter plot. The left panel is grouped by Treatment and the right panel is grouped by Gender.

Figure 11 is a boxplot of Transitions grouped by Treatment*Gender, which indicates that the main effects of both “Gender” and “Treatment” show little visible mean difference. This observation raises the question of whether the interaction term Treatment*Gender has a significant effect on guppy behaviour.

3.3 Model Specification Strategies:

Table 13 shows three GLMM models for studying the anxiety effects of the synthetic hormone 17α-ethinyl estradiol (EE2) on the guppy Poecilia reticulata. Model.P201 includes the two main fixed effects, whereas Model.P202 includes the main effects and their interaction term. Model.P203 is formulated like Model.P202 except that it does not include an intercept term.
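
As a sketch, a Poisson GLMM of the Model.P202 form with a log link and a random intercept for Family could be written as below. The indicator coding of Gender and Treatment and the symbols are assumptions made for illustration; Table 13 gives the specifications that were actually fitted.

\[
\mathrm{Transitions}_{ij} \mid u_j \sim \mathrm{Poisson}(\mu_{ij}), \qquad u_j \sim N(0, \sigma_{\mathrm{Family}}^2)
\]

\[
\log \mu_{ij} = \beta_0 + \beta_1 \mathrm{Gender}_{ij} + \beta_2 \mathrm{Treatment}_{ij} + \beta_3 \, \mathrm{Gender}_{ij} \mathrm{Treatment}_{ij} + u_j
\]

Here \(i\) indexes fish within family \(j\). Setting \(\beta_3 = 0\) gives a main-effects model of the Model.P201 form.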

Table 14 gives the result of the model comparison with the LR test. The test follows a two-step procedure: the first step tests all models, and the second step tests the best two models from step one. The final result indicates that the best model is Model.P202. Its computational output is attached in Appendix B.

3.4 Best Model Diagnostics Check: Model.P202

3.5 Effects Plots: Model.P202

3.6 All Effects Plots: Model.P202

3.7 Conclusion for Part II

From Table 13 and Table 14, the interaction term Gender*Treatment is found to play a more statistically significant role than the two main effects “Gender” and “Treatment”. Under the control treatment, male guppies show less anxiety than female guppies, whereas under the EE2 treatment (20ng) male guppies show more anxiety than female guppies.

4 Appendix A

5 Appendix B

References

Crawley, Michael J. 2013. “The R Book Second Edition.” John Wiley & Sons.

Harville, David A. 1977. “Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems.” Journal of the American Statistical Association 72 (358): 320–38. www.jstor.org/stable/2286796.

Faraway, Julian J. 2006. “Binomial Data. Extending the Linear Model with R.” Chapman & Hall/CRC.

Kass, Robert E, and Duane Steffey. 1989. “Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models).” Journal of the American Statistical Association 84 (407): 717–26.

Laird, Nan M., and James H. Ware. 1982. “Random-Effects Models for Longitudinal Data.” Biometrics 38 (4): 963–74. www.jstor.org/stable/2529876.

Pinheiro, José C., and Douglas M. Bates. 2000. “Mixed-Effects Models in S and S-Plus (Statistics and Computing).” Springer, New York.

Satoh, Masahiro. 2018. “An Alternative Derivation Method of Mixed Model Equations from Best Linear Unbiased Prediction (BLUP) and Restricted BLUP of Breeding Values Not Using Maximum Likelihood.” Animal Science Journal 89 (6): 876–79.

Searle, Shayle R., George Casella, and Charles E. McCulloch. 1992. Variance Components. New Jersey: Wiley-Interscience.