Part 2- Homework

Introduction

For this week’s assignment, I chose a data set of examination results consisting of scores on two components of a science examination in 1989 by 1905 students in 73 schools and colleges. The exam is the General Certificate of Secondary Education (GCSE) taken at the end of compulsory schooling in England, normally when students are 16 years old. The GCSE is an academic qualification, generally taken in a number of subjects by pupils in secondary education in England. The two components of the exam that are shown in the data set are the results of the traditional written question paper (marked out of a total score of 160) and coursework score (marked out of a total score of 108). The exam is taken by students after they have completed the coursework.

Research Questions and Hypotheses

The purpose of this analysis is to see the relationship between the marks of the written test at both the school and student level and whether there are gender differences in this relationship.The students are nested within schools and the schools provide a context for students. I used the written score as the dependent variable and gender and the coursework score as the predictor variables. The coursework score is taken as a measure of ability in obtaining high marks in the GCSE exam. In this case, in obtaining high marks on the written question paper. I used multilevel and random effect models to demonstrate my findings. I analyzed this data set using three competing approaches which include a school-level analysis (ecological regression), student-level analysis (complete-pooling) and another student-level analysis (no-pooling).

Defining Variables

school: ID for 73 different schools and colleges

student: ID for 1905 students in the different schools

gender: 0 is for male and 1 is for female

writtenscore: score obtained on the written question paper exam (marked out of a total score of 160)

courseworkscore: score obtained upon completing the course and before taking the GCSE exam (marked out of a total score of 160)

Importing and previewing the data set

library(readr)
scores<-read_csv("C:\\users\\Sangita Roy\\Desktop\\scores.csv")
head(scores)

Cleaning up the data set

library(dplyr)
scores2<-scores%>%
  rename(gender=`gender, girl=1`,
         writtenscore=`written score, missing=-1`,
         courseworkscore=`coursework score, missing=-1`)%>%
  filter(writtenscore>-1,
         courseworkscore>-1)%>%
  select(School, Student, gender, writtenscore, courseworkscore)
head(scores2)

After filtering missing values for both exams, there are now 1523 observations.

The number of schools

length(unique(scores2$School))
[1] 73

The number of students in each school

scores2 %>% 
  group_by(School) %>% 
  summarise(n_sch = n())

Ecological Analysis

schoold <- scores2 %>% 
  group_by(School) %>% 
  summarise(mean_s = mean(writtenscore, na.rm = TRUE), mean_g = mean(gender, na.rm = TRUE))
head(schoold)

The data set has been grouped by schools and summarized using the mean written score for the GCSE exam and the mean gender.

ecoreg <- lm(mean_s ~ mean_g, data = schoold)
summary(ecoreg)

Call:
lm(formula = mean_s ~ mean_g, data = schoold)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.2849  -4.6580   0.4681   3.9008  18.4128 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   57.062      3.243  17.598  < 2e-16 ***
mean_g       -14.699      5.299  -2.774  0.00707 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.958 on 71 degrees of freedom
Multiple R-squared:  0.09778,   Adjusted R-squared:  0.08507 
F-statistic: 7.695 on 1 and 71 DF,  p-value: 0.007071

This is a school-level analysis which shows the average written score for the schools is 57 and the average written score for females is 14 points lower than males. Ecological fallacy is a limitation of this approach because school-level association may or may not reflect student-level connection.

Complete-pooling model

cpooling <- lm(writtenscore ~ gender, data = scores2)
summary(cpooling)

Call:
lm(formula = writtenscore ~ gender, data = scores2)

Residuals:
    Min      1Q  Median      3Q     Max 
-45.782  -9.113   0.056   9.218  44.431 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  48.9073     0.5363  91.196  < 2e-16 ***
gender       -3.3379     0.6980  -4.782  1.9e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.4 on 1521 degrees of freedom
Multiple R-squared:  0.01481,   Adjusted R-squared:  0.01416 
F-statistic: 22.87 on 1 and 1521 DF,  p-value: 1.904e-06

This model is a student-level analysis in which schools are ignored. The average written exam score for females is 3 points lower than males. Average written exam for males is 48. The results are statistically significant, (p<.05).The limitation with the complete-pooling approach is that there may be important between-school variations in the relationship.

No-pooling model (The intercept)

library(ggplot2)
Need help getting started? Try the cookbook for R:
http://www.cookbook-r.com/Graphs/
dcoef <- scores2 %>% 
    group_by(School) %>% 
    do(mod = lm(writtenscore ~ gender, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()

This histogram shows the earlier linear model grouped by schools, one model per school. The intercept shows that most schools score between 40 to 60 marks on the written exam. However, there are a few schools that score lower on the written exam as show on the right and left side of the histogram.

The slope

dcoef <- scores2 %>% 
    group_by(School) %>% 
    do(mod = lm(writtenscore ~ gender, data = .))
coef <- dcoef %>% do(data.frame(genderc = coef(.$mod)[2]))
ggplot(coef, aes(x = genderc)) + geom_histogram()

The histogram of the slope is shown above and it is evident that males score much higher on the written exam. In summary, there is a great deal of between-school level variation in both the intercept and the slope parameters. The complete pooling model results of 48.9073 and -3.3379 are the results of many countervailing forces.

Partial-pooling model

Partial-pooling models allows between-school variation.

Multilevel model #1 (Random intercept)

library(nlme)
m1_lme <- lme(writtenscore ~ gender, data = scores2, random = ~1|School, method = "ML")
summary(m1_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: scores2 
       AIC      BIC   logLik
  11827.66 11848.97 -5909.83

Random effects:
 Formula: ~1 | School
        (Intercept) Residual
StdDev:     7.02069 11.17925

Fixed effects: writtenscore ~ gender 
               Value Std.Error   DF  t-value p-value
(Intercept) 49.61493 0.9734214 1449 50.96964       0
gender      -2.44861 0.6016647 1449 -4.06972       0
 Correlation: 
       (Intr)
gender -0.362

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.15425682 -0.67237074 -0.04393553  0.66460074  3.82829157 

Number of Observations: 1523
Number of Groups: 73 

Multilevel model #1 shows the effect of gender on the written exam score with random effects for the school. This is a random intercept model which does not allow gender difference in the written exam score to differ between schools. The average written exam score for males (reference group) is 49 and for females, the score is 2 points lower than males. The p-value for both results are 0 demonstrating statistical significance (p<.05).

Multilevel model #2

m2_lme <- lme(writtenscore ~ gender, data = scores2, random = ~ gender|School, method = "ML")
summary(m2_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: scores2 
    AIC      BIC    logLik
  11831 11862.97 -5909.501

Random effects:
 Formula: ~gender | School
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept)  7.204514 (Intr)
gender       1.854513 -0.239
Residual    11.143610       

Fixed effects: writtenscore ~ gender 
               Value Std.Error   DF  t-value p-value
(Intercept) 49.64272 0.9972701 1449 49.77861   0e+00
gender      -2.49802 0.6589697 1449 -3.79079   2e-04
 Correlation: 
       (Intr)
gender -0.415

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.12115295 -0.69057170 -0.05821439  0.65852908  3.81286240 

Number of Observations: 1523
Number of Groups: 73 

Multilevel model #2 shows the effect of gender on the written exam score with random effects for the gender and school nested within the school.There is not much of a difference between the first and second multilevel models.The average written exam score is approximately 50 for males and 2 points lower for females. Results are statistically significant (p<.05).

Model selection

AIC(cpooling, m1_lme, m2_lme)

Based on the AIC values, multilevel model #1 is the best fit because it has the lowest AIC value of 11827.66

Adding student-level covariate

m3_lme <- lme(writtenscore ~ gender + courseworkscore, data = scores2, random = ~ gender|School, method = "ML")
summary(m3_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: scores2 
      AIC     BIC    logLik
  11418.4 11455.7 -5702.202

Random effects:
 Formula: ~gender | School
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev   Corr  
(Intercept) 6.342501 (Intr)
gender      1.946438 -0.158
Residual    9.705215       

Fixed effects: writtenscore ~ gender + courseworkscore 
                    Value Std.Error   DF   t-value p-value
(Intercept)     21.475527 1.5607541 1448 13.759712       0
gender          -5.347579 0.6090938 1448 -8.779566       0
courseworkscore  0.404319 0.0185338 1448 21.815194       0
 Correlation: 
                (Intr) gender
gender          -0.036       
courseworkscore -0.827 -0.216

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.53113379 -0.66818742 -0.03719285  0.66189140  3.64597728 

Number of Observations: 1523
Number of Groups: 73 

I added a student-level covariate since I did not have any school-level variables in my data set. Model #3 shows the effect of gender and course work score obtained has on the written exam score with random effects for the gender and school nested within the school. Females receive 5 points lower on the written exam score than males, and for every increase in the course work score, the written exam score increases by 0.40. Results are statistically significant (p<.05).

Cross-level interaction

m4_lme <- lme(writtenscore ~ gender*courseworkscore, data = scores2, random = ~ gender|School, method = "ML")
summary(m4_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: scores2 
       AIC      BIC    logLik
  11413.16 11455.79 -5698.582

Random effects:
 Formula: ~gender | School
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev   Corr  
(Intercept) 6.402169 (Intr)
gender      2.103725 -0.185
Residual    9.673303       

Fixed effects: writtenscore ~ gender * courseworkscore 
                            Value Std.Error   DF   t-value p-value
(Intercept)             24.942316 2.0249013 1447 12.317794  0.0000
gender                 -11.974642 2.5288679 1447 -4.735179  0.0000
courseworkscore          0.354669 0.0261801 1447 13.547273  0.0000
gender:courseworkscore   0.090834 0.0337030 1447  2.695133  0.0071
 Correlation: 
                       (Intr) gender crswrk
gender                 -0.625              
courseworkscore        -0.900  0.649       
gender:courseworkscore  0.636 -0.970 -0.707

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.50405734 -0.66224240 -0.02831706  0.66668044  3.58806548 

Number of Observations: 1523
Number of Groups: 73 

I also ran a cross-level interaction between gender and course work score. The interaction is not statistically significant since (p>.05).

Best model

AIC(cpooling, m1_lme, m2_lme, m3_lme, m4_lme)

Based on the AIC values, Multilevel model #4 is the best fit model because it has the lowest AIC value of 11413.16

Centering covariates

library(magrittr)
scores2 %<>% mutate(ccourseworkscore = courseworkscore - mean(courseworkscore))

Refit the model

m4a_lme <- lme(writtenscore ~ gender*ccourseworkscore, data = scores2, random = ~ gender|School, method = "ML")
summary(m4a_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: scores2 
       AIC      BIC    logLik
  11413.16 11455.79 -5698.582

Random effects:
 Formula: ~gender | School
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev   Corr  
(Intercept) 6.402168 (Intr)
gender      2.103743 -0.185
Residual    9.673302       

Fixed effects: writtenscore ~ gender * ccourseworkscore 
                           Value Std.Error   DF  t-value p-value
(Intercept)             50.98535 0.8889876 1447 57.35215  0.0000
gender                  -5.30478 0.6186048 1447 -8.57540  0.0000
ccourseworkscore         0.35467 0.0261801 1447 13.54727  0.0000
gender:ccourseworkscore  0.09083 0.0337031 1447  2.69513  0.0071
 Correlation: 
                        (Intr) gender ccrswr
gender                  -0.406              
ccourseworkscore         0.113 -0.176       
gender:ccourseworkscore -0.081  0.037 -0.707

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.50405829 -0.66224197 -0.02831735  0.66668066  3.58806522 

Number of Observations: 1523
Number of Groups: 73 
library(texreg)
htmlreg(list(m4_lme, m4a_lme))
Statistical models
Model 1 Model 2
(Intercept) 24.94*** 50.99***
(2.02) (0.89)
gender -11.97*** -5.30***
(2.53) (0.62)
courseworkscore 0.35***
(0.03)
gender:courseworkscore 0.09**
(0.03)
ccourseworkscore 0.35***
(0.03)
gender:ccourseworkscore 0.09**
(0.03)
AIC 11413.16 11413.16
BIC 11455.79 11455.79
Log Likelihood -5698.58 -5698.58
Num. obs. 1523 1523
Num. groups 73 73
p < 0.001, p < 0.01, p < 0.05

These are two identical models, and despite the numerical differences in some coefficient, they tell the same story. Model 4 demonstrates a cross-level interaction between gender and course work score to see the effect on the written exam score. The second models (model 4a) refits the model by centering the covariates of the course work score variable and uses the centered covariates in the second model. Centering the independent variables (course work score) redefines the 0 point for the predictor variable to be the mean of the course work score. This shifts the scale over but retains the unit. The effect of the slope between course work score and written exam score does not change at all but the interpretation of the intercept does. The intercept of the second model is the mean written exam score when all the independent variables equal 0. Since the second model has turned “0” into the average score through centering,the intercept is now recognized as the value of the written exam score when the course work score is set to the average.

Intra-class correlation

m0_lme <- lme(writtenscore ~ 1, random = ~ 1|School, data = scores2, method = "ML")
summary(m0_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: scores2 
       AIC      BIC    logLik
  11842.15 11858.13 -5918.073

Random effects:
 Formula: ~1 | School
        (Intercept) Residual
StdDev:    7.124212 11.23567

Fixed effects: writtenscore ~ 1 
               Value Std.Error   DF  t-value p-value
(Intercept) 48.18298   0.91906 1450 52.42636       0

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-3.03956792 -0.66708524 -0.05519453  0.65553457  3.73019890 

Number of Observations: 1523
Number of Groups: 73 

Intra-class correlation measures the reliability of ratings or measurements for clusters - data that has been collected as groups or sorted into groups. The Intra-class correlation can be calculated using the following formula: (7.124212)/(7.124212 + 11.23567)= 0.3880314699. Since the ICC value is low and close to zero, it means that values (written exam score) from the same group (schools) are not similar.

Computing confidence intervals

intervals(m0_lme)
Approximate 95% confidence intervals

 Fixed effects:
               lower     est.    upper
(Intercept) 46.38074 48.18298 49.98521
attr(,"label")
[1] "Fixed effects:"

 Random Effects:
  Level: School 
                   lower     est.    upper
sd((Intercept)) 5.837725 7.124212 8.694208

 Within-group standard error:
   lower     est.    upper 
10.83388 11.23567 11.65235 

A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. Based on this same, the interval is very narrow and close to the estimates. (95% confident that the true written exam score in the population for fixed effects is between 46.3 and 49.9)

merTools (Model Effects)

I used the “lmer” function of the merTools package to generate quantities of interest for the random intercept model. This version of the model gives a nearly identical result to the “lme” random slope model used above.

library(merTools)
Mod_RandInctMer <- lmer(writtenscore ~ gender + (1|School), data= scores2)
summary(Mod_RandInctMer)
Linear mixed model fit by REML ['lmerMod']
Formula: writtenscore ~ gender + (1 | School)
   Data: scores2

REML criterion at convergence: 11817.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1531 -0.6717 -0.0434  0.6626  3.8270 

Random effects:
 Groups   Name        Variance Std.Dev.
 School   (Intercept)  50.16    7.082  
 Residual             125.06   11.183  
Number of obs: 1523, groups:  School, 73

Fixed effects:
            Estimate Std. Error t value
(Intercept)  49.6161     0.9791   50.67
gender       -2.4469     0.6015   -4.07

Correlation of Fixed Effects:
       (Intr)
gender -0.360
reEx <- REsim(Mod_RandInctMer)
p1 <- plotREsim(reEx)
p1

This plot demonstrates that there is variation in both directions when schools are used as the interest variable in our model.

confint.merMod(Mod_RandInctMer,method="profile")
                2.5 %    97.5 %
.sig01       5.779647  8.617223
.sigma      10.784234 11.598940
(Intercept) 47.691495 51.546352
gender      -3.628869 -1.268884

The confidence intervals obtained using the merTools package indicates there is not a wide variation in the random effect.

References

University of Bristol. “Centre for Multilevel Modelling.” Datasets Used in Multilevel Statistical Models; 3rd Edition 2003 | Centre for Multilevel Modelling | University of Bristol, University of Bristol, 5 Nov. 2014, www.bristol.ac.uk/cmm/team/hg/msm-3rd-ed/datasets.html.

Goldstein, Harvey. Multilevel Statistical Models. E. Arnold, 2003.

---
title: "#**Random Effect Models**"
author: "Sangita Roy"
date: "April 8, 2018"
output: html_notebook
---

###**Part 2- Homework**


##**Introduction**
For this week's assignment, I chose a data set of examination results consisting of scores on two components of a science examination in 1989 by 1905 students in 73 schools and colleges. The exam is the General Certificate of Secondary Education (GCSE) taken at the end of compulsory schooling in England, normally when students are 16 years old. The GCSE is an academic qualification, generally taken in a number of subjects by pupils in secondary education in England. The two components of the exam that are shown in the data set are the results of the traditional written question paper (marked out of a total score of 160) and coursework score (marked out of a total score of 108). The exam is taken by students after they have completed the coursework.  


##**Research Questions and Hypotheses**
The purpose of this analysis is to see the relationship between the marks of the written test at both the school and student level and whether there are gender differences in this relationship.The students are nested within schools and the schools provide a context for students. I used the written score as the dependent variable and gender and the coursework score as the predictor variables. The coursework score is taken as a measure of ability in obtaining high marks in the GCSE exam. In this case, in obtaining high marks on the written question paper. I used multilevel and random effect models to demonstrate my findings. I analyzed this data set using three competing approaches which include a school-level analysis (ecological regression), student-level analysis (complete-pooling) and another student-level analysis (no-pooling).

##Defining Variables

**school**: ID for 73 different schools and colleges

**student**: ID for 1905 students in the different schools

**gender**: 0 is for male and 1 is for female

**writtenscore**: score obtained on the written question paper exam (marked out of a total score of 160)

**courseworkscore**: score obtained upon completing the course and before taking the GCSE exam (marked out of a total score of 160)


##Importing and previewing the data set
```{r message=FALSE, warning=FALSE}
library(readr)

scores<-read_csv("C:\\users\\Sangita Roy\\Desktop\\scores.csv")
head(scores)
```


##Cleaning up the data set
```{r message=FALSE, warning=FALSE}
library(dplyr)
scores2<-scores%>%
  rename(gender=`gender, girl=1`,
         writtenscore=`written score, missing=-1`,
         courseworkscore=`coursework score, missing=-1`)%>%
  filter(writtenscore>-1,
         courseworkscore>-1)%>%
  select(School, Student, gender, writtenscore, courseworkscore)
head(scores2)

```
After filtering missing values for both exams, there are now 1523 observations.

##The number of schools
```{r}
length(unique(scores2$School))
```

##The number of students in each school
```{r}
scores2 %>% 
  group_by(School) %>% 
  summarise(n_sch = n())
```


##Ecological Analysis
```{r}
schoold <- scores2 %>% 
  group_by(School) %>% 
  summarise(mean_s = mean(writtenscore, na.rm = TRUE), mean_g = mean(gender, na.rm = TRUE))
head(schoold)
```
The data set has been grouped by schools and summarized using the mean written score for the GCSE exam and the mean gender.

```{r}
ecoreg <- lm(mean_s ~ mean_g, data = schoold)
summary(ecoreg)
```
This is a school-level analysis which shows the average written score for the schools is 57 and the average written score for females is 14 points lower than males. Ecological fallacy is a limitation of this approach because school-level association may or may not reflect student-level connection.

##Complete-pooling model
```{r}
cpooling <- lm(writtenscore ~ gender, data = scores2)
summary(cpooling)
```
This model is a student-level analysis in which schools are ignored. The average written exam score for females is 3 points lower than males. Average written exam for males is 48. The results are statistically significant, (p<.05).The limitation with the complete-pooling approach is that there may be important between-school variations in the relationship.

##No-pooling model (The intercept)
```{r}
library(ggplot2)
dcoef <- scores2 %>% 
    group_by(School) %>% 
    do(mod = lm(writtenscore ~ gender, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()
```
This histogram shows the earlier linear model grouped by schools, one model per school. The intercept shows that most schools score between 40 to 60 marks on the written exam. However, there are a few schools that score lower on the written exam as show on the right and left side of the histogram. 

##The slope
```{r}
dcoef <- scores2 %>% 
    group_by(School) %>% 
    do(mod = lm(writtenscore ~ gender, data = .))
coef <- dcoef %>% do(data.frame(genderc = coef(.$mod)[2]))
ggplot(coef, aes(x = genderc)) + geom_histogram()
```
The histogram of the slope is shown above and it is evident that males score much higher on the written exam. In summary, there is a great deal of between-school level variation in both the intercept and the slope parameters. The complete pooling model results of 48.9073 and -3.3379 are the results of many countervailing forces.

##Partial-pooling model
Partial-pooling models allows between-school variation.

##Multilevel model #1 (Random intercept)
```{r message=FALSE, warning=FALSE}
library(nlme)
m1_lme <- lme(writtenscore ~ gender, data = scores2, random = ~1|School, method = "ML")
summary(m1_lme)
```
Multilevel model #1 shows the effect of gender on the written exam score with random effects for the school. This is a random intercept model which does not allow gender difference in the written exam score to differ between schools. The average written exam score for males (reference group) is 49 and for females, the score is 2 points lower than males. The p-value for both results are 0 demonstrating statistical significance (p<.05).

##Multilevel model #2
```{r}
m2_lme <- lme(writtenscore ~ gender, data = scores2, random = ~ gender|School, method = "ML")
summary(m2_lme)
```
Multilevel model #2 shows the effect of gender on the written exam score with random effects for the gender and school nested within the school.There is not much of a difference between the first and second multilevel models.The average written exam score is approximately 50 for males and 2 points lower for females. Results are statistically significant (p<.05). 


##Model selection
```{r}
AIC(cpooling, m1_lme, m2_lme)
```
Based on the AIC values, multilevel model #1 is the best fit because it has the lowest AIC value of 11827.66

##Adding student-level covariate
```{r}
m3_lme <- lme(writtenscore ~ gender + courseworkscore, data = scores2, random = ~ gender|School, method = "ML")
summary(m3_lme)
```
I added a student-level covariate since I did not have any school-level variables in my data set. Model #3 shows the effect of gender and course work score obtained has on the written exam score with random effects for the gender and school nested within the school. Females receive 5 points lower on the written exam score than males, and for every increase in the course work score, the written exam score increases by 0.40. Results are statistically significant (p<.05).

##Cross-level interaction
```{r}
m4_lme <- lme(writtenscore ~ gender*courseworkscore, data = scores2, random = ~ gender|School, method = "ML")
summary(m4_lme)
```
I also ran a cross-level interaction between gender and course work score. The interaction is not statistically significant since (p>.05). 

##Best model
```{r}
AIC(cpooling, m1_lme, m2_lme, m3_lme, m4_lme)
```
Based on the AIC values, Multilevel model #4 is the best fit model because it has the lowest AIC value of 11413.16

##Centering covariates
```{r message=FALSE, warning=FALSE}
library(magrittr)
scores2 %<>% mutate(ccourseworkscore = courseworkscore - mean(courseworkscore))
```

##Refit the model
```{r}
m4a_lme <- lme(writtenscore ~ gender*ccourseworkscore, data = scores2, random = ~ gender|School, method = "ML")
summary(m4a_lme)
```

```{r message=FALSE, warning=FALSE, results='asis'}
library(texreg)
htmlreg(list(m4_lme, m4a_lme))
```
These are two identical models, and despite the numerical differences in some coefficient, they tell the same story. Model 4 demonstrates a cross-level interaction between gender and course work score to see the effect on the written exam score. The second models (model 4a) refits the model by centering the covariates of the course work score variable and uses the centered covariates in the second model. Centering the independent variables (course work score) redefines the 0 point for the predictor variable to be the mean of the course work score. This shifts the scale over but retains the unit. The effect of the slope between course work score and written exam score does not change at all but the interpretation of the intercept does. The intercept of the second model is the mean written exam score when all the independent variables equal 0. Since the second model has turned "0" into the average score through centering,the intercept is now recognized as the value of the written exam score when the course work score is set to the average.


##Intra-class correlation
```{r}
m0_lme <- lme(writtenscore ~ 1, random = ~ 1|School, data = scores2, method = "ML")
summary(m0_lme)
```
Intra-class correlation measures the reliability of ratings or measurements for clusters - data that has been collected as groups or sorted into groups.
The Intra-class correlation can be calculated using the following formula: (7.124212)/(7.124212 + 11.23567)= 0.3880314699. Since the ICC value is low and close to zero, it means that values (written exam score) from the same group (schools) are not similar.

##Computing confidence intervals
```{r}
intervals(m0_lme)
```
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. Based on this same, the interval is very narrow and close to the estimates. (95% confident that the true written exam score in the population for fixed effects is between 46.3 and 49.9)

##merTools (Model Effects)
I used the "lmer" function of the merTools package to generate quantities of interest for the random intercept model. This version of the model gives a nearly identical result to the "lme" random slope model used above.

```{r message=FALSE, warning=FALSE}
library(merTools)
Mod_RandInctMer <- lmer(writtenscore ~ gender + (1|School), data= scores2)
summary(Mod_RandInctMer)
```

```{r}
reEx <- REsim(Mod_RandInctMer)
p1 <- plotREsim(reEx)
p1
```
This plot demonstrates that there is variation in both directions when schools are used as the interest variable in our model.

```{r message=FALSE, warning=FALSE}
confint.merMod(Mod_RandInctMer,method="profile")
```
The confidence intervals obtained using the merTools package indicates there is not a wide variation in the random effect.

##**References**
University of Bristol. "Centre for Multilevel Modelling." Datasets Used in Multilevel Statistical Models; 3rd Edition 2003 | Centre for Multilevel Modelling | University of Bristol, University of Bristol, 5 Nov. 2014, www.bristol.ac.uk/cmm/team/hg/msm-3rd-ed/datasets.html.

Goldstein, Harvey. Multilevel Statistical Models. E. Arnold, 2003.
