Repeated Measures ANOVA

Summary Stats

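library(dplyr); library(magrittr); library(ggplot2); library(stringr)
# math_df %>% str
# gender is coded 1 = male, 2 = female; relabel it as a factor
math_df$gender %<>% factor(labels = c(`1` = "M", `2` = "F"))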

Summary statistics for male and female students, respectively, are below:

psych::describeBy(math_df %>% select(-cid, -gender), list(math_df$gender)) %>% purrr::walk(.f = function(.) stargazer::stargazer(., 
    summary = F, type = "html", rownames = T, colnames = T, header = F, digits = 3))


Males (gender = M)

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se

math_1	1	101	26.145	8.223	25.360	25.660	7.087	10.670	51.370	40.700	0.579	0.078	0.818
math_2	2	101	38.575	13.334	36.240	37.227	10.971	15.960	78.080	62.120	0.894	0.490	1.327
math_3	3	101	45.553	14.019	44.290	44.683	12.202	18.390	86.120	67.730	0.632	0.477	1.395
math_4	4	101	62.100	18.558	60.320	60.929	17.584	31.280	109.300	78.020	0.527	-0.377	1.847
math_8	5	101	97.389	20.311	101.480	98.597	21.379	48.790	132.040	83.250	-0.478	-0.473	2.021
math_12	6	101	119.242	19.468	123.500	121.381	16.575	60.670	148.850	88.180	-0.920	0.380	1.937


Females (gender = F)

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se

math_1	1	99	23.203	9.149	21.240	22.237	7.161	9.480	63.220	53.740	1.664	4.606	0.920
math_2	2	99	32.391	11.128	30.240	31.344	9.251	13.640	65.060	51.420	0.889	0.560	1.118
math_3	3	99	39.504	12.928	37.710	38.306	9.993	14.470	75.490	61.020	0.865	0.606	1.299
math_4	4	99	56.812	15.860	55.300	56.167	14.233	19.950	96.840	76.890	0.430	-0.188	1.594
math_8	5	99	89.142	21.246	88.280	89.076	19.407	35.770	138.670	102.900	0.013	-0.263	2.135
math_12	6	99	111.522	20.926	113.320	112.917	20.371	62.620	144.800	82.180	-0.524	-0.400	2.103


The groups are nearly balanced (101 males and 99 females). The math scores at time point 1 may not be normally distributed, particularly for females (skew = 1.664, kurtosis = 4.606).
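The `se` column is the standard error of the mean, sd / √n; for males at math_1, 8.223 / √101 ≈ 0.818, matching the table. As a cross-check on the tables, a minimal base-R sketch of the core columns is below. `describe_groups()` is a hypothetical helper, not part of psych, and its skewness is the plain moment ratio m3 / m2^1.5, which may differ slightly from the small-sample-adjusted skew that `psych::describeBy()` reports.

```r
# Hypothetical helper (not part of psych): per-group n, mean, sd, se and skew.
describe_groups <- function(df, vars, group) {
  rows <- list()
  for (g in levels(factor(df[[group]]))) {
    sub <- df[df[[group]] == g, , drop = FALSE]
    for (v in vars) {
      x <- sub[[v]][!is.na(sub[[v]])]
      n <- length(x)
      m <- mean(x)
      # moment-ratio skewness m3 / m2^1.5 (no small-sample adjustment)
      skew <- mean((x - m)^3) / mean((x - m)^2)^1.5
      rows[[length(rows) + 1]] <- data.frame(
        group = g, var = v, n = n, mean = m, sd = sd(x),
        se = sd(x) / sqrt(n),  # standard error of the mean
        skew = skew
      )
    }
  }
  do.call(rbind, rows)
}
```

On the real data this would be called as `describe_groups(math_df, grep("^math", names(math_df), value = TRUE), "gender")`.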

Graphical Exploration

# One histogram (density scale) with a density curve per time point, faceted by gender
g <- lapply(math_df %>% select(starts_with("math")) %>% names, function(nm) {
    ggplot(math_df, aes(x = .data[[nm]])) + 
        geom_histogram(aes(y = after_stat(density)), binwidth = 2) + 
        geom_density(col = 2) + facet_grid(~gender) + labs(x = nm)
})
gridExtra::grid.arrange(grobs = g, nrow = 3)


As the summary statistics suggested, time points 1 and 2 are right-skewed and show a few outliers. Over time the distributions shift from right skew to approximate normality to left skew, consistent with math scores gradually improving. We can check this observation by averaging scores within each gender and looking at trend lines.

# Average score per gender at each time point, reshaped to long format for plotting
math_df %>% select(-cid) %>% group_by(gender) %>% summarize_all(.funs = mean) %>% 
    as.data.frame() %>% reshape(varying = names(math_df)[-c(1, 2)], v.names = "avg", 
    timevar = "yr", times = names(math_df)[-c(1, 2)], direction = "long") %>% 
    # order time points numerically (1, 2, 3, 4, 8, 12); %>% rather than %T>% so the
    # factor is kept (a %T>% block returns its input and discards these changes)
    mutate(s = as.numeric(str_extract(yr, "\\d{1,2}")), 
        yr = factor(yr, levels = unique(yr[order(s)]))) %>% 
    ggplot(mapping = aes(y = avg, x = yr, color = gender, group = gender)) + 
    geom_line() + geom_smooth(method = "lm", alpha = 0.2) + labs(title = "Average Scores across Time", 
    x = "Time Points", y = "Average Score") + theme(plot.title = element_text(hjust = 0.5))


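As anticipated, average math scores rise steadily over time for both genders. Males score slightly above females at every time point, but the two lines run close together and roughly parallel, so a gender effect is not obvious from the plot; the ANOVA is better suited to make this determination.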
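The trend can be put into numbers using the group means from the summary tables. A minimal sketch, assuming a straight line is a reasonable summary: it regresses each gender's mean score on the actual grade number. `geom_smooth()` in the plot most likely fits against the evenly spaced factor positions 1 to 6, which treats the gap from grade 8 to 12 the same as the gap from grade 1 to 2. This describes the six means only; it is not a significance test on individual students.

```r
# Group means copied from the summary tables above
grade <- c(1, 2, 3, 4, 8, 12)
avg_m <- c(26.145, 38.575, 45.553, 62.100, 97.389, 119.242)  # males
avg_f <- c(23.203, 32.391, 39.504, 56.812, 89.142, 111.522)  # females

# Slope of mean score per grade, using the real grade spacing
slope_m <- unname(coef(lm(avg_m ~ grade))["grade"])
slope_f <- unname(coef(lm(avg_f ~ grade))["grade"])

# Pearson correlation between grade and mean score
r_m <- cor(grade, avg_m)
r_f <- cor(grade, avg_f)

round(c(slope_m = slope_m, slope_f = slope_f, r_m = r_m, r_f = r_f), 3)
```

The slopes come out at about 8.56 points per grade for males and 8.21 for females, with correlations close to 0.99.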

Hypotheses

\[\begin{aligned} \text{There is no difference between genders: }& \mu_{M} = \mu_{F} \\ \text{There is no change within subjects across time: }& \mu_{t_1} = \mu_{t_2} = \cdots = \mu_{t_{12}} \\ \text{There is no interaction of gender and time: }& \mu^{M}_{t_j} - \mu^{F}_{t_j} \text{ is the same for all } j \in \{1, 2, 3, 4, 8, 12\} \end{aligned}\]
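The interaction hypothesis concerns the gender gap rather than the raw cell means: it holds when the population difference between males and females is the same at every time point. The observed sample gaps, computed from the summary-table means in this short sketch, range from about 2.9 to 8.2 points; the interaction test asks whether that variation is larger than chance would produce.

```r
# Cell means copied from the summary tables above
grade <- c(1, 2, 3, 4, 8, 12)
avg_m <- c(26.145, 38.575, 45.553, 62.100, 97.389, 119.242)  # males
avg_f <- c(23.203, 32.391, 39.504, 56.812, 89.142, 111.522)  # females

# Observed gender gap (M - F) at each time point:
# 2.942, 6.184, 6.049, 5.288, 8.247, 7.720
gap <- setNames(avg_m - avg_f, paste0("t_", grade))
round(gap, 3)
```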

Assumption Testing

Normality

The summary statistics show that normality is violated at least at the first time point (skew = 1.664 and kurtosis = 4.606 for females) and likely at the last time point (skew = -0.920 for males), but given the nature of measuring math scores this is to be expected.
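A formal check is a Shapiro-Wilk test for each time point within each gender, using base R's `shapiro.test()`. The sketch below assumes a wide data frame like `math_df`; `normality_by_group()` is a hypothetical helper. With about 100 students per group the test tends to flag even mild departures, and ANOVA is usually described as fairly robust to moderate non-normality at these sample sizes, so the p-values are best read alongside the histograms.

```r
# Hypothetical helper: Shapiro-Wilk W and p-value for each variable within each group
normality_by_group <- function(df, vars, group) {
  out <- expand.grid(var = vars, group = levels(factor(df[[group]])),
                     stringsAsFactors = FALSE)
  out$W <- NA_real_
  out$p <- NA_real_
  for (i in seq_len(nrow(out))) {
    x <- df[[out$var[i]]][df[[group]] == out$group[i]]
    st <- shapiro.test(x)  # drops missing values itself; needs 3 to 5000 values
    out$W[i] <- unname(st$statistic)
    out$p[i] <- st$p.value
  }
  out
}
```

On the real data: `normality_by_group(math_df, grep("^math", names(math_df), value = TRUE), "gender")`.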

Sphericity ?
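Sphericity requires the variances of the differences between every pair of time points to be equal. One way to test it is Mauchly's test, which base R provides as `stats::mauchly.test()` for multivariate `lm` fits with one response column per time point; on the real data the fit would be `lm(cbind(math_1, math_2, math_3, math_4, math_8, math_12) ~ gender, data = math_df)`. The sketch below runs on simulated wide data (all values hypothetical) and also computes the Greenhouse-Geisser epsilon by hand, ε = tr(CSCᵀ)² / ((k − 1) tr((CSCᵀ)²)), where S is the pooled within-group covariance and C holds orthonormal contrasts; ε = 1 under sphericity and its lower bound is 1/(k − 1).

```r
set.seed(123)
n <- 30; k <- 6
# Hypothetical wide data: one row per student, one column per time point
Y <- matrix(rnorm(n * k, sd = 5), nrow = n, ncol = k)
Y <- Y + rnorm(n, sd = 5)  # adds the same random offset to every column of row i
Y <- sweep(Y, 2, seq(20, 120, length.out = k), "+")  # rising mean per time point
colnames(Y) <- paste0("math_", c(1, 2, 3, 4, 8, 12))
gender <- factor(rep(c("M", "F"), length.out = n))

# Multivariate lm: each column of Y is a response, gender is between-subjects
fit <- lm(Y ~ gender)

# Mauchly's test of sphericity for contrasts orthogonal to the mean (X = ~1)
mt <- mauchly.test(fit, X = ~1)

# Greenhouse-Geisser epsilon from a k x k covariance matrix
gg_epsilon <- function(S) {
  k <- ncol(S)
  C <- t(contr.poly(k))  # (k - 1) x k orthonormal contrasts
  Sc <- C %*% S %*% t(C)
  sum(diag(Sc))^2 / ((k - 1) * sum(diag(Sc %*% Sc)))
}
S <- crossprod(residuals(fit)) / df.residual(fit)  # pooled within-group covariance
eps <- gg_epsilon(S)
c(W = unname(mt$statistic), p = mt$p.value, GG_epsilon = eps)
```

A small Mauchly p-value, or an epsilon well below 1, would suggest correcting the within-subject degrees of freedom by multiplying them by ε.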

Repeated-Measures ANOVA

# Wide to long: one row per student (cid) and time point; factor_key keeps the column order
math_df.long <- math_df %>% tidyr::gather(key = "time", value = "math", math_1:math_12, 
    factor_key = TRUE)

ANOVA Simple Version

# time * gender expands to time + gender + time:gender, the same terms as before
math_long.aov <- anova(lm(math ~ time * gender, data = math_df.long))
math_long.aov %>% stargazer::stargazer(summary = F, type = "html", rownames = T, 
    colnames = T, header = F, digits = 3)


Term	Df	Sum Sq	Mean Sq	F value	Pr(> F)

time	5	1,262,337.000	252,467.400	979.192	< 0.001
gender	1	11,059.060	11,059.060	42.892	< 0.001
time:gender	5	893.274	178.655	0.693	0.629
Residuals	1,188	306,305.000	257.833

# https://stackoverflow.com/questions/35104052/how-to-remake-aov-to-car-package-anova-to-get-mauchlys-test-for-sphericity
# car::Anova(lm(math~time+gender+(time*gender)^2,idata =
# data.frame(gender=math_df.long$gender,math=math_df.long$math),idesign=~time*gender,data=math_df.long),type=3)
# aov(math ~ gender * time + Error(gender/time),data=math_df.long) %>%
# summary.aov

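The results indicate significant effects of grade level, F₍₅₎=979.19, p<.001, and of gender, F₍₁₎=42.89, p<.001, but no significant interaction between gender and grade level, F₍₅₎=0.69, p=.63. Note that this simple model has no term for the student (cid), so it treats the 1,200 scores as independent observations rather than as repeated measures on the same 200 students; its tests are therefore not true within-subjects and between-subjects tests.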
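The simple model above has no subject term, so it does not use the repeated-measures structure. In base R, `aov()` with an `Error()` term splits the variation into a between-student stratum, where gender is tested, and a within-student stratum, where time and time:gender are tested. A minimal sketch on simulated long data (hypothetical values, 20 students); on the real data it would be `aov(math ~ time * gender + Error(cid/time), data = math_df.long)` after converting `cid` to a factor. This univariate approach assumes sphericity.

```r
set.seed(1)
n <- 20
times <- c(1, 2, 3, 4, 8, 12)
# Hypothetical long data: one row per student (cid) per time point
long <- expand.grid(time = factor(times), cid = factor(seq_len(n)))
long$gender <- factor(ifelse(as.integer(long$cid) %% 2 == 0, "F", "M"))
student <- rnorm(n, sd = 5)  # student-level random offset
long$math <- 20 + 8 * times[as.integer(long$time)] +
  student[as.integer(long$cid)] + rnorm(nrow(long), sd = 5)

# Error(cid/time): gender is tested against between-student variation,
# time and time:gender against within-student variation
fit <- aov(math ~ time * gender + Error(cid/time), data = long)
s <- summary(fit)
s
```

With 20 students the within-student stratum has 5 df for time, 5 for time:gender and 90 residual df, compared with the 1,188 residual df of the model that ignores students.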

However, we might be more interested in math outcomes at the conclusion of high school. We can examine all possible interactions between gender and the earlier grade-level scores as predictors of the grade 12 math score.

ANOVA All Possible Interactions

# Predictors: gender and every earlier time point (all math_* columns except math_12)
prior <- math_df %>% select(starts_with("math")) %>% names %>% setdiff("math_12")
# Same formula as before: main effects plus all interactions among prior scores and gender
f <- as.formula(paste0("math_12 ~ gender + ", paste(prior, collapse = " + "), 
    " + (", paste(prior, collapse = " + "), " + gender)^7"))
math_aov <- anova(lm(f, data = math_df))
math_aov %>% stargazer::stargazer(summary = F, type = "html", rownames = T, colnames = T, 
    header = F, digits = 3)


Term	Df	Sum Sq	Mean Sq	F value	Pr(> F)

gender	1	2,979.206	2,979.206	35.643	< 0.001
math_1	1	30,059.270	30,059.270	359.629	< 0.001
math_2	1	5,577.759	5,577.759	66.732	< 0.001
math_3	1	6,842.475	6,842.475	81.863	< 0.001
math_4	1	5,959.024	5,959.024	71.294	< 0.001
math_8	1	16,577.280	16,577.280	198.330	< 0.001
math_1:math_2	1	873.738	873.738	10.453	0.002
math_1:math_3	1	565.361	565.361	6.764	0.010
math_1:math_4	1	0.020	0.020	0.0002	0.988
math_1:math_8	1	66.884	66.884	0.800	0.373
gender:math_1	1	107.013	107.013	1.280	0.260
math_2:math_3	1	116.006	116.006	1.388	0.241
math_2:math_4	1	270.144	270.144	3.232	0.074
math_2:math_8	1	34.018	34.018	0.407	0.525
gender:math_2	1	10.564	10.564	0.126	0.723
math_3:math_4	1	97.983	97.983	1.172	0.281
math_3:math_8	1	135.374	135.374	1.620	0.205
gender:math_3	1	5.105	5.105	0.061	0.805
math_4:math_8	1	15.410	15.410	0.184	0.668
gender:math_4	1	80.079	80.079	0.958	0.329
gender:math_8	1	43.907	43.907	0.525	0.470
math_1:math_2:math_3	1	16.431	16.431	0.197	0.658
math_1:math_2:math_4	1	2.169	2.169	0.026	0.872
math_1:math_2:math_8	1	55.019	55.019	0.658	0.419
gender:math_1:math_2	1	0.973	0.973	0.012	0.914
math_1:math_3:math_4	1	11.320	11.320	0.135	0.713
math_1:math_3:math_8	1	34.100	34.100	0.408	0.524
gender:math_1:math_3	1	3.778	3.778	0.045	0.832
math_1:math_4:math_8	1	23.061	23.061	0.276	0.600
gender:math_1:math_4	1	27.191	27.191	0.325	0.569
gender:math_1:math_8	1	66.106	66.106	0.791	0.375
math_2:math_3:math_4	1	2.802	2.802	0.034	0.855
math_2:math_3:math_8	1	178.046	178.046	2.130	0.147
gender:math_2:math_3	1	6.316	6.316	0.076	0.784
math_2:math_4:math_8	1	60.490	60.490	0.724	0.396
gender:math_2:math_4	1	6.604	6.604	0.079	0.779
gender:math_2:math_8	1	27.267	27.267	0.326	0.569
math_3:math_4:math_8	1	23.918	23.918	0.286	0.594
gender:math_3:math_4	1	1.004	1.004	0.012	0.913
gender:math_3:math_8	1	395.426	395.426	4.731	0.031
gender:math_4:math_8	1	32.679	32.679	0.391	0.533
math_1:math_2:math_3:math_4	1	21.389	21.389	0.256	0.614
math_1:math_2:math_3:math_8	1	62.383	62.383	0.746	0.389
gender:math_1:math_2:math_3	1	17.469	17.469	0.209	0.648
math_1:math_2:math_4:math_8	1	0.252	0.252	0.003	0.956
gender:math_1:math_2:math_4	1	8.015	8.015	0.096	0.757
gender:math_1:math_2:math_8	1	0.626	0.626	0.007	0.931
math_1:math_3:math_4:math_8	1	13.336	13.336	0.160	0.690
gender:math_1:math_3:math_4	1	22.650	22.650	0.271	0.604
gender:math_1:math_3:math_8	1	19.265	19.265	0.230	0.632
gender:math_1:math_4:math_8	1	65.717	65.717	0.786	0.377
math_2:math_3:math_4:math_8	1	69.702	69.702	0.834	0.363
gender:math_2:math_3:math_4	1	4.052	4.052	0.048	0.826
gender:math_2:math_3:math_8	1	7.349	7.349	0.088	0.767
gender:math_2:math_4:math_8	1	127.012	127.012	1.520	0.220
gender:math_3:math_4:math_8	1	32.594	32.594	0.390	0.533
math_1:math_2:math_3:math_4:math_8	1	61.319	61.319	0.734	0.393
gender:math_1:math_2:math_3:math_4	1	276.169	276.169	3.304	0.071
gender:math_1:math_2:math_3:math_8	1	128.143	128.143	1.533	0.218
gender:math_1:math_2:math_4:math_8	1	53.071	53.071	0.635	0.427
gender:math_1:math_3:math_4:math_8	1	1.780	1.780	0.021	0.884
gender:math_2:math_3:math_4:math_8	1	43.228	43.228	0.517	0.473
gender:math_1:math_2:math_3:math_4:math_8	1	0.068	0.068	0.001	0.977
Residuals	136	11,367.440	83.584
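`anova()` on an `lm` fit reports sequential (Type I) sums of squares: each term is tested after the terms listed before it. In the table above gender enters first and each interaction enters after its lower-order terms, so the F values depend on that ordering whenever predictors are correlated, as scores from successive grades are. The sketch below uses two hypothetical correlated predictors `x1` and `x2` to show that the sum of squares credited to `x1` changes with its position while the model and residual totals do not. Order-independent alternatives include Type II or III tests, for example `car::Anova(fit, type = 2)`.

```r
set.seed(99)
n <- 60
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.5)  # x2 is correlated with x1
y <- 1 + x1 + x2 + rnorm(n)

a12 <- anova(lm(y ~ x1 + x2))  # x1 entered first
a21 <- anova(lm(y ~ x2 + x1))  # x1 entered second
c(x1_first = a12["x1", "Sum Sq"], x1_second = a21["x1", "Sum Sq"])
```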

sl <- c(0.001, 0.01, 0.05, 0.1)  # upper bounds of the significance bands
for (i in seq_along(sl)) {
    cat("Variables with a significant influence on grade 12 scores at the ", sl[i], 
        " alpha level include: \n", sep = "\n")
    # band i holds terms with sl[i - 1] <= p < sl[i] (0 <= p < .001 for the first band)
    lower <- if (i == 1) 0 else sl[i - 1]
    l <- math_aov$`Pr(>F)` < sl[i] & math_aov$`Pr(>F)` >= lower
    # na.omit() drops the Residuals row, whose p-value is NA
    math_aov.sig <- math_aov[l, ] %>% na.omit
    # seq_len() skips empty bands instead of looping over 1:0;
    # apa_t() formats a row as APA-style F and p text (not defined in this document)
    for (rn in seq_len(nrow(math_aov.sig))) {
        paste(row.names(math_aov.sig)[rn], math_aov.sig[rn, ] %>% apa_t(type = "html"), 
            "\n") %>% cat(sep = "\n")
    }
}
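The nested loop above amounts to binning p-values into half-open intervals [0, .001), [.001, .01), [.01, .05) and [.05, .1). A sketch of the same binning with base R's `cut()`; `band_p()` is a hypothetical helper and the p-values are the rounded ones displayed in the table. Rounded values can land in a different band than the exact ones, so in practice the unrounded ``math_aov$`Pr(>F)` `` column would be passed in.

```r
# Assign each p-value to its band; p >= .1 becomes NA (not reported)
band_p <- function(p) {
  cut(p, breaks = c(0, 0.001, 0.01, 0.05, 0.1), right = FALSE,
      labels = c("< .001", "< .01", "< .05", "< .1"))
}

# A few p-values as rounded in the table above
p <- c(gender = 0, `math_1:math_2` = 0.002, `math_1:math_3` = 0.010,
       `gender:math_3:math_8` = 0.031, `math_2:math_4` = 0.074,
       `math_1:math_4` = 0.988)
split(names(p), band_p(p))  # NA (unreported) terms are dropped by split()
```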

Variables with a significant influence on grade 12 scores at the 0.001 alpha level include:

gender F₍₁₎=35.64, p<.001

math_1 F₍₁₎=359.63, p<.001

math_2 F₍₁₎=66.73, p<.001

math_3 F₍₁₎=81.86, p<.001

math_4 F₍₁₎=71.29, p<.001

math_8 F₍₁₎=198.33, p<.001

Variables with a significant influence on grade 12 scores at the 0.01 alpha level include:

math_1:math_2 F₍₁₎=10.45, p<.01

Variables with a significant influence on grade 12 scores at the 0.05 alpha level include:

math_1:math_3 F₍₁₎=6.76, p<.05

gender:math_3:math_8 F₍₁₎=4.73, p<.05

Variables with a significant influence on grade 12 scores at the 0.1 alpha level include:

math_2:math_4 F₍₁₎=3.23, p<.1

gender:math_1:math_2:math_3:math_4 F₍₁₎=3.3, p<.1


Gender is again a significant influence on grade 12 math scores, as are the scores at each earlier grade level. There are also significant interaction effects, the strongest of which are between early grade-level scores (grades 1 and 2, and grades 1 and 3), along with a three-way interaction between gender and the grade 3 and grade 8 scores. These findings suggest that early exposure to math in grades 1 through 3 may have a larger bearing on how students perform as they graduate high school. They also suggest that males and females may learn differently in grades 3 and 8, possibly due to developmental effects. Because 63 terms were tested, a few are expected to reach p < .05 by chance alone, so the interaction results should be treated as exploratory. Further research would be necessary to determine the nature of these interactions.
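As a rough sensitivity check, the interaction p-values can be adjusted for multiple testing with base R's `p.adjust()`, treating all 63 model terms as one family (a judgment call, not the only reasonable choice). The sketch uses the rounded p-values from the table; the main effects, all below .001, are left out.

```r
# Rounded p-values (from the table above) of the interactions reported at the .05 level
p_int <- c(`math_1:math_2` = 0.002, `math_1:math_3` = 0.010,
           `gender:math_3:math_8` = 0.031)
# Treat all 63 model terms as one family of tests
holm <- p.adjust(p_int, method = "holm", n = 63)
bonf <- p.adjust(p_int, method = "bonferroni", n = 63)
rbind(raw = p_int, holm = holm, bonferroni = bonf)
```

With these inputs Holm gives 0.126, 0.62 and 1, and Bonferroni gives 0.126, 0.63 and 1, so none of the three interactions stays below .05 after adjustment.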


Stephen Synchronicity

2018-11-14
