library(tidyverse) # for lots of things
setwd("D:\\r\\hlm") # Setup working directory
df <- read.csv("D:\\r\\hlm\\hw3.csv") # Load the data

# Check nuts and bolts
dim(df)
## [1] 80  4
head(df,3)
##   classroomid time ontaskbehavior group
## 1           1    0              5     0
## 2           1    1             12     0
## 3           1    2             13     0

1. Homework Description

In the SPSS file, you will find information for the following:

  1. Classroom ID
  2. Time (0 = Day One; 1 = Day Two; 2 = Day Three; 3 = Day Four)
  3. Observation Rating Scores for On-Task Behavior (0 to 20 points)
  4. Group (Type of Classroom: 0 = Reading and Writing; 1 = Mathematics and Problem Solving



Question 1. What is the Level One unit of analysis? (1 point)

Time. In this practice dataset, time is the most granular level of analysis. Each observation in the time column represents the on-task behavior score for a specific time within a classroom.


Question 2. What is the Level Two unit of analysis? (1 point)

Classroom. The on-task behavior score for each and every time point is nested within each classroom.


Question 3. Draw time trajectories separately for each group. Reading and Writing should have one plot. Mathematics and Problem Solving should have another plot. Compare and contrast the plots. (6 points)

First, let’s generate a graph for all different time points, as follows:

ggplot(df, aes(x = time, y = ontaskbehavior, group = interaction(classroomid, group), color = as.factor(group))) +
    geom_line() +
    geom_point() +
    scale_color_manual(values = c("blue", "green"), labels = c("Reading and Writing", "Mathematics and Problem Solving")) +
    scale_x_continuous(breaks = 0:3, labels = c("Day One", "Day Two", "Day Three", "Day Four")) +
    labs(title = "Comparison of On-Task Behavior Over Time by Group",
         x = "Time (Days)", y = "On-Task Behavior Score",
         color = "Group") +
    theme_minimal()

Now, let’s take a look at the comparison between the two classrooms, taking the average of different classrooms in each classroom type:

# Aggregating data
aggregated_data <- df %>%
    group_by(time, group) %>%
    dplyr::summarize(average_ontaskbehavior = mean(ontaskbehavior))
## `summarise()` has grouped output by 'time'. You can override using the
## `.groups` argument.
# Create the plot
ggplot(aggregated_data, aes(x = time, y = average_ontaskbehavior, group = as.factor(group), color = as.factor(group))) +
geom_line() +
geom_point() +
scale_color_manual(values = c("blue", "green"), labels = c("Reading and Writing", "Mathematics and Problem Solving")) +
scale_x_continuous(breaks = 0:3, labels = c("Day One", "Day Two", "Day Three", "Day Four")) +
labs(title = "Average On-Task Behavior Over Time by Group",
x = "Time (Days)", y = "Average On-Task Behavior Score",
color = "Group") +
theme_minimal()

From both plots, we can see that for both types of classrooms, the average on-task behavior score goes up as time goes by. However, we noticed two difference:

  1. The mathematics and problem solving group has a higher average on-task behavior score at each time point than the reading and writing group.

  2. The rate of increase of behavior score seems to be faster for the reading and writing group from day 1 to day 2, and from day 3 to day 4. The difference in rate of increase for the two groups is indistinguishable from day 2 to day 3.


Question 4. Are initial status points similar within plots? (1 point)

No. We can see from plot 1 than within each group, different classrooms have quite different initial status points.


Question 5. Are initial status points similar between plots? (1 point)

No. Initial starting point is higher for the mathematical and problem solving group, as you can see from both plots.


Question 6. Are linear growth trajectories similar within plots? (1point)

No. We can see from plot 1 that within each group, classrooms varies in growth trajectories.


Question 7. Are linear growth trajectories similar between plots? (1point)

Yes, as I have stated in Q3. The rate of increase is faster for the reading and writing group from day 1 to day 2, and from day 3 to day 4. The difference in rate of increase for the two groups is indistinguishable from day 2 to day 3.


Question 8. Run unconditional and conditional HLM models to support your responses for questions 4 through 7? (5 points)


a. Was the variance component for initial status significant before groups were compared? Why or why not? (1 point)

Yes. To answer this question, we need an unconditional model with random intercept:

library(lmerTest) # for HLM

lapply(df, class) # Check data format for each column
## $classroomid
## [1] "integer"
## 
## $time
## [1] "integer"
## 
## $ontaskbehavior
## [1] "integer"
## 
## $group
## [1] "integer"
# Change columns into proper data format to feed into HLM models
df$time <- as.numeric(df$time) # time as numeric
df$group <- as.factor(df$group) # group as factor

# Unconditional model
uncon_model <- lmerTest::lmer(ontaskbehavior ~ 1 + (1 | classroomid),
data = df, REML = FALSE)

summary(uncon_model)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ontaskbehavior ~ 1 + (1 | classroomid)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##    433.9    441.0   -213.9    427.9       77 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.01401 -0.61634 -0.06546  0.58085  3.06471 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  classroomid (Intercept) 3.399    1.844   
##  Residual                9.925    3.150   
## Number of obs: 80, groups:  classroomid, 20
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)   9.4750     0.5422 20.0000   17.47  1.4e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the model output, we can see that the random effect intercept has a variance of 3.399, which is much larger than 0 (the scale is 0 to 20 since we are talking about the behavior score). This means that the initial status of the behavior score is quite different among classrooms (Level 2).


b. Was the variance component for linear growth rate significant before groups were compared? Why or why not? (1 point)

Yes. To answer this question, we fit a mixed effect model with both a fixed effect estimate for time, and a random slope estimate for time:

uncon_model_growth_rate <- lmerTest::lmer(ontaskbehavior ~ time + (time | classroomid), 
                                          data = df, REML = FALSE)

summary(uncon_model_growth_rate)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ontaskbehavior ~ time + (time | classroomid)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##    344.6    358.8   -166.3    332.6       74 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.89209 -0.46781 -0.04962  0.43420  3.07415 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  classroomid (Intercept) 7.7805   2.7894        
##              time        0.5721   0.7564   -0.56
##  Residual                1.4100   1.1874        
## Number of obs: 80, groups:  classroomid, 20
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)   6.2800     0.6621 20.0002   9.485 7.64e-09 ***
## time          2.1300     0.2067 19.9998  10.307 1.90e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## time -0.586

From the model output, we can see that the variance is 0.572 for the growth rate. The variance component is not extremely large, but is still substantial, considering that it is a measure of variability in the rate of change of the on-task behavior score across classrooms.


c. Does Type of Classroom explain the variance in initial status? Why or why not? (2 points)

Yes. To answer this question, we need to fit a fixed effect model with a fixed effect estimate for classroom type (group).

model_with_group <- lmerTest::lmer(ontaskbehavior ~ group + (1 | classroomid), 
                                   data = df, REML = FALSE)

summary(model_with_group)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ontaskbehavior ~ group + (1 | classroomid)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##    425.4    434.9   -208.7    417.4       76 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.6838 -0.7824  0.0397  0.7027  3.3949 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  classroomid (Intercept) 0.9969   0.9984  
##  Residual                9.9250   3.1504  
## Number of obs: 80, groups:  classroomid, 20
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)   7.9250     0.5898 20.0000  13.438  1.8e-11 ***
## group1        3.1000     0.8340 20.0000   3.717  0.00136 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## group1 -0.707

Recall that our unconditional model has an intercept variance of 3.399. In this model, our variance component is reduced to 0.997. The means that the type of classrooms accounts for a significant proportion (about 70.671%) of the variance in the initial status of the behavior score.


d. Does Type of Classroom explain the variance in linear growth rate? Why or why not? (2 points)

No. To answer this question, the model should include group as a predictor for both the intercept and the slope of time-behavior score:

model_growth_rate <- lmer(ontaskbehavior ~ time * group + (time | classroomid), 
                          data = df, REML = FALSE)

summary(model_growth_rate)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: ontaskbehavior ~ time * group + (time | classroomid)
##    Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##    335.7    354.7   -159.8    319.7       72 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.88370 -0.51014 -0.06529  0.43689  3.06869 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  classroomid (Intercept) 3.6596   1.9130        
##              time        0.4697   0.6853   -0.40
##  Residual                1.4100   1.1874        
## Number of obs: 80, groups:  classroomid, 20
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)   4.2500     0.6817 20.0001   6.235 4.34e-06 ***
## time          2.4500     0.2742 20.0006   8.936 2.02e-08 ***
## group1        4.0600     0.9640 20.0001   4.212 0.000429 ***
## time:group1  -0.6400     0.3877 20.0006  -1.651 0.114431    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) time   group1
## time        -0.510              
## group1      -0.707  0.361       
## time:group1  0.361 -0.707 -0.510

From the model output, we can see that the interaction fixed effects between time and the type of classroom (group) is insignificant (p < .05) with a small effect size (-.6400). This suggests that the linear growth rate has no statistically significant difference between the two types of classrooms.


e. Summarize whether type of classroom improves explanation of the variance in the observation rating scores for on-task behavior.

Recall that in the unconditional model, the variance of the random intercept (initial status) was 3.399, indicating substantial variability in initial behavior scores across classrooms. In contrast, inclusion of the “group” variable reduced the variance of the random intercept to 0.997. This significant reduction suggests that the type of classroom explains a considerable portion of the variance in initial scores across classrooms.

On the other hand, both the unconditional model and the conditional model with classroom type as a predictor has a residual variance of 9.925, indicating that classroom type does not explain the variability of scores within each classroom.


i. Comment on the goodness-of-fit indices when comparing and contrasting models. (1 point)

  • First, our unconditional model ontaskbehavior ~ 1 + (1 | classroomid), as presented in question a, has AIC/BIC = 433.9/441.0. This is a baseline model.

  • Second, we included time as a predictor ontaskbehavior ~ time + (time | classroomid), and got AIC/BIC = 344.6/358.8. Both of them are much lower than the baseline model, indicating substantially improved goodness-of-fit. In addition, the significant time coefficient (beta = 2.130, p < 0.001) suggests that on-task behavior scores change significantly over time.

  • Third, we included group as a predictor ontaskbehavior ~ group + (1 | classroomid). This model fit with AIC/BIC = 425.4/434.9, indicating the model is a better fit to the data than the baseline model, but not as well as the model with time as a predictor. Additionally, The significant group fixed effects coefficient (beta = 3.100, p < 0.01) indicates that on-task behavior scores differ significantly between different types of classrooms.

  • Finally, we fit a model with both time and group as predictors while also estimating their interaction effects ontaskbehavior ~ time * group + (time | classroomid). This model achieve an AIC/BIC = 335.7/354.7, providing the best fit among the four models. The significant coefficients for time and classroom types remain, but the interaction between time and the classroom type is not significant at p < .05 level, indicating that the rate of change over time does not differ significantly between different types of classrooms.

  • In summary, both time and classroom type are important predictors for the behavior score, but time is a particularly strong predictor.


ii. Comment on the degree to which the variance components change when comparing the unconditional and conditional HLM model. (1 point)

First, our unconditional model ontaskbehavior ~ 1 + (1 | classroomid), the intercept variance is 3.709, indicating classrooms differ a lot in their average on-task behavior scores. The residual variance is 9.925, indicating that there is also substantial variability within each classroom.

Second, we included time as a predictor ontaskbehavior ~ time + (time | classroomid). the intercept variance (7.781) increases almost by half. This is expected because including time as a predictor allows our model to freely estimates the between-classroom variability of the growth rate over time. Therefore, the substantial increase of the intercept variance suggests that there’s substantial heterogeneity among classrooms both in terms of their initial status and in their rates of change over time.

Meanwhile, the residual variance is reduced from 9.925 to 1.410. This tells us that by modeling the change over time differently for each classroom, our model explains more of the within-classroom variability that was left as unexplained noise (residual) in the first model. In other words, the behavior score of students within classrooms also varies a lot over time.

Third, we included group as a predictor ontaskbehavior ~ group + (1 | classroomid). The intercept variance is reduced from 3.709 to 0.997, which means that classroom type explains much of the between-classroom variability in initial scores.The residual variance is still 9.925, meaning classroom type does not explain within-classroom variability in behavior scores.

Finally, we fit a model with both time and group as predictors while also estimating their interaction effects ontaskbehavior ~ time * group + (time | classroomid). the intercept variance is slightly increased to 3.660. This denotes that the variability in initial status is not heavily influenced by the two predictors time and classroom type.

The random slope for time has a variance of 0.470, indicating that there is significant variability in how behavior score changes over time across different classrooms. Also, the residual variance is substantially reduced to 1.410. This means that these two predictors explain a good portion of within-classroom variability in scores.

In summary, the conditional models, particularly by including time as a predictor, provide a better explanation of the variability in on-task behavior scores both within and between classrooms compared to the unconditional models.


Note: I also tried to fit a model that estimates the random effects for classroom types:

# group as a random effect
# model_group_random <- lmer(ontaskbehavior ~ time * group + (time + group |                                     classroomid), data = df, REML = FALSE)

#summary(model_group_random)

But this model is nearly unidentified. Statistically, I think it’s related to we have a simple dataset with n = 80, which does not work with a relatively complicated model. Empirically, I believe the classroom type should be viewed as a treatment variable instead of a variable that identifies a random sample.