library(lmerTest)
## Warning: package 'lmerTest' was built under R version 4.2.3
## Loading required package: lme4
## Warning: package 'lme4' was built under R version 4.2.3
## Loading required package: Matrix
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
library(languageR)
## Warning: package 'languageR' was built under R version 4.2.3
library(influence.ME)
## Warning: package 'influence.ME' was built under R version 4.2.3
## 
## Attaching package: 'influence.ME'
## The following object is masked from 'package:stats':
## 
##     influence
library(lattice)

1a)

Loading the data file

politeness_data2 <- read.csv("C:/Users/John Majoubi/Downloads/politeness_data2.csv", stringsAsFactors=TRUE)
str(politeness_data2)
## 'data.frame':    84 obs. of  5 variables:
##  $ subject  : Factor w/ 6 levels "F1","F2","F3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gender   : Factor w/ 2 levels "man","woman": 2 2 2 2 2 2 2 2 2 2 ...
##  $ scenario : int  1 1 2 2 3 3 4 4 5 5 ...
##  $ form     : Factor w/ 2 levels "informal","polite": 2 1 2 1 2 1 2 1 2 1 ...
##  $ frequency: num  213 204 285 260 204 ...

1b)

Exploratory Data Analysis

Visualizing Random Intercepts

#random intercept on participant
xyplot(frequency ~ form|as.factor(subject), data = politeness_data2, type = c("p", "g", "r"), col = "lightblue", col.line = "black")

#random intercept on scenario
xyplot(frequency ~ form|as.factor(scenario), data = politeness_data2, type = c("p", "g", "r"), col = "gold", col.line = "black")

#Just for fun
xyplot(frequency ~ scenario|as.factor(subject), data = politeness_data2, type = c("p", "g", "r"), col = "gold", col.line = "black")

1c)

Using Cook’s D for screening outliers

Speech.freq.Model = lmer(frequency ~ form + (1|scenario) + (1|subject), data = politeness_data2)
#influence estimates, refitting the model with each subject deleted in turn
estimated.speech.freq.model = influence(Speech.freq.Model, "subject")
#Cook's D values
summary(cooks.distance(estimated.speech.freq.model))
##        V1         
##  Min.   :0.01527  
##  1st Qu.:0.10051  
##  Median :0.12077  
##  Mean   :0.14636  
##  3rd Qu.:0.20354  
##  Max.   :0.29648

1d)

All assumptions are met; proceeding with NHST.
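A minimal sketch of the diagnostics behind this claim, assuming the standard LMM checks (homoscedastic residuals and residual normality):

#fitted vs. residuals: look for an even spread around zero
plot(Speech.freq.Model)
#normal Q-Q of residuals via lattice's qqmath: points should track a straight line
qqmath(resid(Speech.freq.Model))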

1e)

There are no outliers: the maximum Cook’s D value (0.296) is below the .50 cutoff.
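A small sketch that turns this eyeball check into code, using the .50 cutoff assumed above:

cd <- cooks.distance(estimated.speech.freq.model)
any(cd > 0.50) #expect FALSE here, since the maximum is about 0.30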

1f)

NHST

summary(Speech.freq.Model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: frequency ~ form + (1 | scenario) + (1 | subject)
##    Data: politeness_data2
## 
## REML criterion at convergence: 793.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2006 -0.5817 -0.0639  0.5625  3.4385 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  scenario (Intercept)  219     14.80   
##  subject  (Intercept) 4015     63.36   
##  Residual              646     25.42   
## Number of obs: 83, groups:  scenario, 7; subject, 6
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  202.588     26.754   5.575   7.572 0.000389 ***
## formpolite   -19.695      5.585  70.022  -3.527 0.000748 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## formpolite -0.103

1g)

ICCscenario = (219)/(219+646)
ICCscenario
## [1] 0.2531792
ICCParticipant = (4015)/(4015+646)
ICCParticipant
## [1] 0.8614031
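As an alternative to hand-typing the printed variances, the same ICCs can be pulled from the fitted model; a minimal sketch using lme4's as.data.frame() method for VarCorr objects (computed relative to the residual only, matching the hand calculations above):

vc <- as.data.frame(VarCorr(Speech.freq.Model))
v <- setNames(vc$vcov, vc$grp)
v["scenario"] / (v["scenario"] + v["Residual"]) #ICC for scenario
v["subject"] / (v["subject"] + v["Residual"]) #ICC for subject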

APA Summary

Using a linear mixed-effects model (LMM), we examined whether two forms of Korean speech (polite and informal) produced changes in vocal frequency. In the LMM we treated participants and scenarios as random intercepts on vocal frequency. The fixed intercept was significant (b = 202.59, SE = 26.75), t(5.58) = 7.57, p < .001, and the random intercepts accounted for substantial variability in vocal frequency, ICC\(_{scenario}\) = 0.253, ICC\(_{participant}\) = 0.861. Considering the fixed slope for the ‘form’ variable, the polite form of speech produced lower vocal frequency than the informal form (b = -19.70, SE = 5.59), t(70.02) = -3.53, p < .001.

Question 2: Multilevel Modeling

library(multilevel)
## Warning: package 'multilevel' was built under R version 4.2.3
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:lme4':
## 
##     lmList
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 4.2.3
library(haven)
imm23_1_ <- read_dta("C:/Users/John Majoubi/Downloads/imm23 (1).dta")
str(imm23_1_)
## tibble [519 × 18] (S3: tbl_df/tbl/data.frame)
##  $ schid   : num [1:519] 6053 6053 6053 6053 6053 ...
##   ..- attr(*, "label")= chr "School ID"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ stuid   : num [1:519] 1 2 4 11 12 13 18 22 23 24 ...
##   ..- attr(*, "label")= chr "Student ID"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ ses     : num [1:519] 0.85 0.43 -0.59 1.02 0.84 ...
##   ..- attr(*, "label")= chr "Socioecnonomic Status"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ meanses : num [1:519] 0.7 0.7 0.7 0.7 0.7 ...
##   ..- attr(*, "label")= chr "Mean SES for the school"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ homework: num [1:519] 1 1 3 1 1 1 1 4 1 5 ...
##   ..- attr(*, "label")= chr "Time spent on math homework each week"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ white   : num [1:519] 1 1 0 1 1 1 1 1 0 1 ...
##   ..- attr(*, "label")= chr "Race: 1=white, 0=non-white"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ parented: num [1:519] 4 3 3 5 5 6 3 4 5 6 ...
##   ..- attr(*, "label")= chr "Parents highest education level"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ public  : num [1:519] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "label")= chr "Public school: 1=public, 0=non-public"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ ratio   : num [1:519] 18 18 18 18 18 18 18 18 18 18 ...
##   ..- attr(*, "label")= chr "Student-Teacher ratio"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ percmin : num [1:519] 3 3 3 3 3 3 3 3 3 3 ...
##   ..- attr(*, "label")= chr "Percent minority in school"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ math    : num [1:519] 50 43 50 49 62 43 42 68 41 62 ...
##   ..- attr(*, "label")= chr "Math score"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ sex     : num [1:519] 2 2 2 2 1 2 1 1 1 1 ...
##   ..- attr(*, "label")= chr "Sex: 1=male, 2=female"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ race    : num [1:519] 4 4 1 4 4 4 4 4 3 4 ...
##   ..- attr(*, "label")= chr "race of student, 1=asian, 2=Hispanic, 3=Black, 4=White, 5=Native American"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ sctype  : num [1:519] 4 4 4 4 4 4 4 4 4 4 ...
##   ..- attr(*, "label")= chr "Type of school, 1=public, 2=catholic, 3=Private other religious, 4=Private non-r"
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ cstr    : num [1:519] 3 3 3 3 3 3 3 3 3 3 ...
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ scsize  : num [1:519] 3 3 3 3 3 3 3 3 3 3 ...
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ urban   : num [1:519] 1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "format.stata")= chr "%9.0g"
##  $ region  : num [1:519] 2 2 2 2 2 2 2 2 2 2 ...
##   ..- attr(*, "format.stata")= chr "%9.0g"

2a)

Grand-mean centering, rather than group-mean centering, is the sensible option here because the homework variable is self-reported and is best interpreted at that level, which grand-mean centering preserves. Group-mean centering instead compares each student’s self-reported homework score to the average score of their school; we would need a good a-priori reason for that, as it alters the interpretive meaning of the centered variable.
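For contrast, a minimal sketch of both options (hw.grand and hw.group are illustrative names; only grand-mean centering is carried forward in 2b):

#grand-mean centering: one constant subtracted from every student
imm23_1_$hw.grand <- imm23_1_$homework - mean(imm23_1_$homework, na.rm = TRUE)
#group-mean centering: each school's own mean subtracted instead
imm23_1_$hw.group <- imm23_1_$homework - ave(imm23_1_$homework, imm23_1_$schid, FUN = function(x) mean(x, na.rm = TRUE))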

2b)

Data Preparation: Grand mean centering

imm23_1_$Centered.Homework = imm23_1_$homework - mean(imm23_1_$homework, na.rm = TRUE)
#attach() so later model formulas can reference the variables directly
attach(imm23_1_)

2c)

EDA for Outlier Screening and Collinearity

Math.Model = lmer(math ~ ses*Centered.Homework + meanses + (Centered.Homework|schid), data = imm23_1_)
estimated.Math.Model = influence(Math.Model, "schid")
summary(cooks.distance(estimated.Math.Model))
##        V1         
##  Min.   :0.01137  
##  1st Qu.:0.01853  
##  Median :0.03639  
##  Mean   :0.05002  
##  3rd Qu.:0.05479  
##  Max.   :0.22130

There are no outliers by the 0.50 standard (max Cook’s D = 0.221).

#this function comes from: https://github.com/aufrank/R-hacks/blob/master/mer-utils.R


vif.mer <- function (fit) {
  ## adapted from rms::vif
  
  v <- vcov(fit)
  nam <- names(fixef(fit))
  
  ## exclude intercepts
  ns <- sum(1 * (nam == "Intercept" | nam == "(Intercept)"))
  if (ns > 0) {
    v <- v[-(1:ns), -(1:ns), drop = FALSE]
    nam <- nam[-(1:ns)]
  }
  
  d <- diag(v)^0.5
  v <- diag(solve(v/(d %o% d)))
  names(v) <- nam
  v
}

Getting the collinearity estimates as VIFs for the full model:

vif.mer(Math.Model)
##                   ses     Centered.Homework               meanses 
##              1.133999              1.001406              1.129989 
## ses:Centered.Homework 
##              1.019629

2d) and 2e)

No assumptions were violated. According to our Cook’s D standard, there are no outliers, and the variance inflation factors (VIFs) indicated acceptable collinearity across predictors: all VIFs were under 2.00.

2f)

Creating a Build-up Model Test Step 1: Random Intercept

#equation for the random intercept model
Randomintercept.model = lmer(math ~ (1|schid))
# Requesting the variance coefficients
print(VarCorr(Randomintercept.model), comp=c("Variance", "Std.Dev."))
##  Groups   Name        Variance Std.Dev.
##  schid    (Intercept) 26.124   5.1112  
##  Residual             81.244   9.0135

Fixed Intercept Model

Fixedintercept.Model = lm(math ~ 1)

Testing the difference between fixed and random intercept

anova(Randomintercept.model, Fixedintercept.Model, refit = F)
## Warning in anova.merMod(Randomintercept.model, Fixedintercept.Model, refit =
## F): some models fit with REML = TRUE, some not
## Data: NULL
## Models:
## Fixedintercept.Model: math ~ 1
## Randomintercept.model: math ~ (1 | schid)
##                       npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## Fixedintercept.Model     2 3937.1 3945.6 -1966.5   3933.1                     
## Randomintercept.model    3 3804.7 3817.4 -1899.3   3798.7 134.39  1  < 2.2e-16
##                          
## Fixedintercept.Model     
## Randomintercept.model ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Information has been gained by the random intercept, p < .05. Calculating the ICC:

ICC.RI.Model = 26.124/(26.124+81.244)
ICC.RI.Model
## [1] 0.2433127

The ICC shows that 24.33% of the variance in math scores is attributable to the school that students attend.

2g)

Creating a Build-up Model Test Step 2: Adding Fixed Slopes on Predictors

Math.Model.AddfixedSlopes = lmer(math ~ ses + Centered.Homework + ses:Centered.Homework + (1|schid))
#requesting the new variance coefficients
print(VarCorr(Math.Model.AddfixedSlopes), comp=c("Variance", "Std.Dev."))
##  Groups   Name        Variance Std.Dev.
##  schid    (Intercept) 12.092   3.4773  
##  Residual             67.100   8.1915
#NHST for variance gain between the two models
anova(Randomintercept.model, Math.Model.AddfixedSlopes, refit = F)
## Data: NULL
## Models:
## Randomintercept.model: math ~ (1 | schid)
## Math.Model.AddfixedSlopes: math ~ ses + Centered.Homework + ses:Centered.Homework + (1 | schid)
##                           npar    AIC    BIC  logLik deviance  Chisq Df
## Randomintercept.model        3 3804.7 3817.4 -1899.3   3798.7          
## Math.Model.AddfixedSlopes    6 3698.9 3724.4 -1843.5   3686.9 111.78  3
##                           Pr(>Chisq)    
## Randomintercept.model                   
## Math.Model.AddfixedSlopes  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding the fixed slopes is helpful: it accounts for significant variance (p < .05) compared to the random-intercept-only model.

2h)

Homework varying by school plot

xyplot(math ~ Centered.Homework|as.factor(schid), data = imm23_1_, type = c("p", "g", "r"), col = "blue", col.line = "black")

2i)

Math.Model.AddRandomSlope = lmer(math ~ ses + Centered.Homework + ses:Centered.Homework + (Centered.Homework|schid))
anova(Math.Model.AddfixedSlopes, Math.Model.AddRandomSlope, refit = F)
## Data: NULL
## Models:
## Math.Model.AddfixedSlopes: math ~ ses + Centered.Homework + ses:Centered.Homework + (1 | schid)
## Math.Model.AddRandomSlope: math ~ ses + Centered.Homework + ses:Centered.Homework + (Centered.Homework | schid)
##                           npar    AIC    BIC  logLik deviance  Chisq Df
## Math.Model.AddfixedSlopes    6 3698.9 3724.4 -1843.5   3686.9          
## Math.Model.AddRandomSlope    8 3620.3 3654.3 -1802.1   3604.3 82.647  2
##                           Pr(>Chisq)    
## Math.Model.AddfixedSlopes               
## Math.Model.AddRandomSlope  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Information is gained by adding the random slope, p < .0001.

2j)

Adding school-level SES to the model as a fixed slope:

Math.Model.Add.meanses = lmer(math ~ ses + Centered.Homework + meanses + ses:Centered.Homework + (Centered.Homework|schid))
print(VarCorr(Math.Model.Add.meanses), comp=c("Variance", "Std.Dev."))
##  Groups   Name              Variance Std.Dev. Corr 
##  schid    (Intercept)       12.726   3.5673        
##           Centered.Homework 15.294   3.9108   0.364
##  Residual                   51.304   7.1626
anova(Math.Model.AddRandomSlope, Math.Model.Add.meanses, refit = F)
## Data: NULL
## Models:
## Math.Model.AddRandomSlope: math ~ ses + Centered.Homework + ses:Centered.Homework + (Centered.Homework | schid)
## Math.Model.Add.meanses: math ~ ses + Centered.Homework + meanses + ses:Centered.Homework + (Centered.Homework | schid)
##                           npar    AIC    BIC  logLik deviance  Chisq Df
## Math.Model.AddRandomSlope    8 3620.3 3654.3 -1802.1   3604.3          
## Math.Model.Add.meanses       9 3616.2 3654.5 -1799.1   3598.2 6.0245  1
##                           Pr(>Chisq)  
## Math.Model.AddRandomSlope             
## Math.Model.Add.meanses       0.01411 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding the level-2 predictor to the new model provided significant information gain compared to the previous model, p < .05.

2k)

summary(Math.Model.Add.meanses)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: math ~ ses + Centered.Homework + meanses + ses:Centered.Homework +  
##     (Centered.Homework | schid)
## 
## REML criterion at convergence: 3598.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3705 -0.6814 -0.0086  0.6185  3.0151 
## 
## Random effects:
##  Groups   Name              Variance Std.Dev. Corr
##  schid    (Intercept)       12.73    3.567        
##           Centered.Homework 15.29    3.911    0.36
##  Residual                   51.30    7.163        
## Number of obs: 519, groups:  schid, 23
## 
## Fixed effects:
##                       Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)            50.9516     0.8530  20.3211  59.733  < 2e-16 ***
## ses                     2.4261     0.5228 478.1552   4.640  4.5e-06 ***
## Centered.Homework       1.8593     0.8671  19.8920   2.144   0.0445 *  
## meanses                 2.8900     1.5168  26.8710   1.905   0.0675 .  
## ses:Centered.Homework  -0.4243     0.3815 494.7001  -1.112   0.2667    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) ses    Cntr.H meanss
## ses          0.008                     
## Cntrd.Hmwrk  0.339 -0.036              
## meanses      0.169 -0.323  0.004       
## ss:Cntrd.Hm -0.036  0.088 -0.008  0.073

APA Conclusion

A four-step hierarchical linear model was used to estimate students’ math performance. The level-1 predictors were hours spent on homework and individual socioeconomic status (SES); the level-2 predictor was the average SES within a school. School membership was used to estimate a random intercept on math scores, and homework (hours spent) was given a random slope alongside that intercept. Finally, we included an interaction between homework and SES.

Comparing the fixed intercept model (AIC = 3937.10) to the random intercept model (AIC = 3804.70), there was significant model improvement from adding the random intercept across schools, χ2(1, N = 519) = 134.39, p < .001, with 24.33% of the variance in math scores accounted for by schools.

We then added the level-1 predictors of homework (hours spent) and SES, along with the SES × Homework interaction, as fixed slopes, to see whether these predictors together improved model fit.

The third step added a random slope for homework alongside the random intercept in the three-predictor model. The random-slope model (AIC = 3620.30) improved fit over the fixed-slopes model (AIC = 3698.90), χ2(2, N = 519) = 82.65, p < .001. In our fourth step, we added the ‘meanses’ predictor to determine whether there was further information gain: compared to the random-slope model with the three predictors as fixed slopes (AIC = 3620.30), the model that also included the level-2 predictor (AIC = 3616.20) fit significantly better, χ2(1, N = 519) = 6.02, p = .0141.

Homework (b = 1.86, SE = 0.87) showed a positive association with math performance, t(19.89) = 2.14, p = .0445, indicating that as homework hours increase, so does math performance. Additionally, SES (b = 2.43, SE = 0.52) positively predicted math performance, t(478.15) = 4.64, p < .001. Neither level-2 SES (meanses, df = 26.87, p = .0675) nor the interaction between SES and homework (df = 494.70, p = .2667) was a statistically significant predictor of math performance.

Question 3

library(car)
## Warning: package 'car' was built under R version 4.2.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.3
library(lmerTest)
library(mgcv)
## This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.2.3
mlm <- read.delim("C:/Users/John Majoubi/Downloads/mlm.dat", stringsAsFactors=TRUE)
View(mlm)
str(mlm)
## 'data.frame':    747 obs. of  9 variables:
##  $ subno   : int  201 201 201 201 201 201 201 201 201 201 ...
##  $ house   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ age     : int  71 71 71 71 71 71 71 71 71 71 ...
##  $ nights  : int  10 11 12 13 14 15 16 17 18 19 ...
##  $ nightleq: num  86.5 61.5 62.1 70 83.2 82.8 88.9 88.4 66.6 61.6 ...
##  $ latency : int  2 2 2 4 3 4 3 2 4 2 ...
##  $ annoy   : int  3 3 3 3 3 3 4 3 3 3 ...
##  $ site1   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ site2   : int  0 0 0 0 0 0 0 0 0 0 ...

3a)

mlm.sleep = na.exclude(mlm)
library(nlme)
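A quick sanity check on the listwise deletion (a sketch; complete.cases() is base R):

nrow(mlm) - nrow(mlm.sleep) #rows dropped by na.exclude
sum(!complete.cases(mlm)) #same count via complete.cases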

3b)

scatterplot(annoy ~ nights, data = mlm.sleep, xlab = "Nights", ylab = "Level of Annoyance")

scatterplot(annoy ~ nightleq, data = mlm.sleep, xlab = "Level of Noise", ylab = "Level of Annoyance")

scatterplot(annoy ~ latency, data = mlm.sleep, xlab = "Latency(Length of Time to Fall Asleep)", ylab = "Level of Annoyance")

xyplot(annoy ~ nights|as.factor(subno), data = mlm.sleep, type = c("p", "g", "r"), col = "pink", col.line = "blue", xlab = "Nights", ylab = "Level of Annoyance")

boxplot(annoy ~ nights, data = mlm.sleep)

There is evidence of participant-level variability, so we proceed with a random intercept model.

3c)

Two assumptions appear violated: there are multiple outliers, and the linearity of outcome prediction looks questionable because the annoyance-by-nights plot is essentially flat.

3d)

The plots suggest a possible polynomial trend over nights.
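A loess smoother makes the shape easier to judge than the straight per-panel regression lines above; a sketch using base R's scatter.smooth():

scatter.smooth(mlm.sleep$nights, mlm.sleep$annoy, lpars = list(col = "red", lwd = 2), xlab = "Nights", ylab = "Level of Annoyance")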

3e)

Creating the Build-up Model test

#Random Intercept Model
Random.intercept.mlm.sleep = lme(annoy ~ 1, random = ~1|subno, data = mlm.sleep, method = "ML", na.action = na.exclude)
#Fixed Intercept Model
Fixed.Intercept.mlm.sleep = gls(annoy ~ 1, data = mlm.sleep, method = "ML", na.action = na.exclude)

The random intercept model (AIC = 2311.27) loses less information than the fixed intercept model (AIC = 2548.57), \(\chi^2\)(1) = 239.30, p < .0001:

anova(Fixed.Intercept.mlm.sleep, Random.intercept.mlm.sleep)
##                            Model df      AIC      BIC    logLik   Test  L.Ratio
## Fixed.Intercept.mlm.sleep      1  2 2548.571 2557.798 -1272.286                
## Random.intercept.mlm.sleep     2  3 2311.268 2325.108 -1152.634 1 vs 2 239.3031
##                            p-value
## Fixed.Intercept.mlm.sleep         
## Random.intercept.mlm.sleep  <.0001
VarCorr(Random.intercept.mlm.sleep)
## subno = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept) 0.6855489 0.8279788
## Residual    1.1148921 1.0558845

Information is gained with the random intercept compared to the fixed intercept model.

ICC.Random.Intercept.mlm.sleep = 0.6855489/(0.6855489+1.1148921)
ICC.Random.Intercept.mlm.sleep
## [1] 0.3807672

38.08% of variability can be attributed to participants.
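Since nlme's VarCorr() returns a character matrix, the ICC can also be computed without retyping the variances; a sketch:

vc <- as.numeric(VarCorr(Random.intercept.mlm.sleep)[, "Variance"])
vc[1] / sum(vc) #intercept variance over intercept + residual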

3f)

Adding the time variable (nights) to the participant random-intercept model:

NightsTime.Random.intercept = update(Random.intercept.mlm.sleep, .~. + nights)
VarCorr(NightsTime.Random.intercept)
## subno = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept) 0.6870446 0.8288816
## Residual    1.1092021 1.0531867
anova(Random.intercept.mlm.sleep, NightsTime.Random.intercept)
##                             Model df      AIC      BIC    logLik   Test
## Random.intercept.mlm.sleep      1  3 2311.268 2325.108 -1152.634       
## NightsTime.Random.intercept     2  4 2309.774 2328.228 -1150.887 1 vs 2
##                              L.Ratio p-value
## Random.intercept.mlm.sleep                  
## NightsTime.Random.intercept 3.494147  0.0616
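For reference, the update() shorthand .~. + nights is equivalent to refitting with nights added as a fixed effect; a sketch of the explicit call:

lme(annoy ~ nights, random = ~1|subno, data = mlm.sleep, method = "ML", na.action = na.exclude)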

Descriptively, adding time shows less information loss, but the gain is not statistically significant, p > .05.

ICC.Nights.Time = 0.6870446/(0.6870446+1.1092021)
ICC.Nights.Time
## [1] 0.382489

Although NHST did not show significant (p < .05) model improvement from the linear nights trend, the xyplot displayed a curvilinear trend.

3g)

Creating the build-up model test by adding a polynomial trend on the nights variable:

nights.Poly.Trend = update(NightsTime.Random.intercept, .~poly(nights,3))
VarCorr(nights.Poly.Trend)
## subno = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept) 0.6627684 0.8141059
## Residual    1.0958180 1.0468133
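Note that poly(nights, 3) builds orthogonal linear, quadratic, and cubic terms rather than raw powers (use raw = TRUE for the latter); a quick sketch to inspect them:

head(poly(mlm.sleep$nights, 3)) #three orthogonal columns: linear, quadratic, cubic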

Comparing the cubic and linear models

anova(NightsTime.Random.intercept, nights.Poly.Trend)
##                             Model df      AIC      BIC    logLik   Test
## NightsTime.Random.intercept     1  4 2309.774 2328.228 -1150.887       
## nights.Poly.Trend               2  6 2303.693 2331.373 -1145.846 1 vs 2
##                              L.Ratio p-value
## NightsTime.Random.intercept                 
## nights.Poly.Trend           10.08121  0.0065

There is information gain by adding the cubic trend, p < .05.

ICC.NightsPolyTrend = 0.6627684/(0.6627684+1.0958180)
ICC.NightsPolyTrend
## [1] 0.3768757

The ICC remains at 37.69%: after adding the cubic trend for nights, this share of the remaining variance is still attributable to participants.

3h)

Build-up model test: adding latency

Latency.NightsPoly = update(nights.Poly.Trend, .~. + latency)
VarCorr(Latency.NightsPoly)
## subno = pdLogChol(1) 
##             Variance  StdDev   
## (Intercept) 0.6513842 0.8070838
## Residual    1.0871173 1.0426492

Comparing the models

anova(nights.Poly.Trend, Latency.NightsPoly)
##                    Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## nights.Poly.Trend      1  6 2303.693 2331.373 -1145.846                        
## Latency.NightsPoly     2  7 2299.348 2331.642 -1142.674 1 vs 2 6.345082  0.0118

Information is gained by adding latency, p < .05.

ICC.Latency.NightsPoly = 0.6513842/(0.6513842+1.0871173)
ICC.Latency.NightsPoly
## [1] 0.3746814

After adding latency, 37.47% of the remaining variability is attributable to participants.

3i)

Build-up model test by adding interior noise:

Noise.Latency.NightsPoly = update(Latency.NightsPoly, .~. + nightleq)
VarCorr(Noise.Latency.NightsPoly)
## subno = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept) 0.6808774 0.825153
## Residual    1.0256353 1.012737

Comparing the models:

anova(Latency.NightsPoly, Noise.Latency.NightsPoly)
##                          Model df      AIC      BIC    logLik   Test  L.Ratio
## Latency.NightsPoly           1  7 2299.348 2331.642 -1142.674                
## Noise.Latency.NightsPoly     2  8 2262.452 2299.359 -1123.226 1 vs 2 38.89594
##                          p-value
## Latency.NightsPoly              
## Noise.Latency.NightsPoly  <.0001
ICC.Noise.Latency.NightsPoly = 0.6808774/(0.6808774+1.0256353)
ICC.Noise.Latency.NightsPoly
## [1] 0.3989876

After adding interior noise, 39.90% of the remaining variability is attributable to participants.

APA Summary

A five-step model was used to assess how annoyed participants felt about air-traffic noise over 36 nights, with annoyance reported each night. Two further predictors, sleep latency and interior noise level, were considered, and individual differences were accounted for with a per-participant random intercept.

In step 1, adding individual differences significantly improved the model (AIC = 2548.571 vs. 2311.268, χ2(1, N = 745) = 239.30, p < .001, with 38.08% of the variance accounted for by participants). In step 2, adding a linear trend for nights showed a descriptive improvement that was not statistically significant (AIC = 2309.774 vs. 2311.268, χ2(1, N = 745) = 3.49, p = .0616, ICC = .3825).

Step 3 added a cubic trend for nights, which significantly improved the model (AIC = 2303.693 vs. 2309.774, χ2(2, N = 745) = 10.08, p = .0065, ICC = .3769). In step 4, adding latency improved the model significantly (AIC = 2299.348 vs. 2303.693, χ2(1, N = 745) = 6.35, p = .0118, ICC = .3747).

For the final step, adding nightleq further improved the model (AIC = 2262.452 vs. 2299.348, χ2(1, N = 745) = 38.90, p < .001, ICC = .3990). In the final model, four of the five fixed effects were statistically significant at p < .05: the quadratic and cubic trends for nights positively predicted annoyance ratings, while the linear trend was not significant (p = .0849).
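The term-level tests referenced above come from the final model's summary; a sketch of extracting nlme's fixed-effect table directly:

round(summary(Noise.Latency.NightsPoly)$tTable, 4) #estimate, SE, df, t, and p per term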