Broadly: For each of 200+ variables, four proportions were available, one per school type.
Detailed: Percentage of teachers endorsing each option for cause, diagnostic traits, diagnostic activities by school type (Preschool, Primary School, Middle School, High School).
| Cause | Preschool (n = 92) | Primary Sch. (n = 105) | Middle Sch. (n = 126) | High Sch. (n = 155) |
|---|---|---|---|---|
| Genetic | 56.6 | 67.6 | 27 | 38 |
| Neurological | 44.6 | 22.8 | 48.4 | 43.8 |
| Mental illness | 19.6 | 24.8 | 32.6 | 32.2 |
| Vaccinations | 10.8 | 26.6 | 30.2 | 29.6 |
| Environmental exposure | 22.8 | 21 | 14.2 | 15.4 |
| Malnutrition in pregnancy | 17.4 | 9.6 | 17.4 | 22 |
| Parenting | 18.4 | 10.4 | 16.6 | 7 |
| Dietary/nutritional issues | 3.2 | 8.6 | 8.8 | 11 |
| Drug use of mother | 6.6 | 8.6 | 4.8 | 0.6 |
For each of the 200+ variables, (a) run a chi-square test of independence and report the results, and (b) run pairwise comparisons of proportions with false discovery rate (FDR) adjusted p-values and report them.
First, create a CSV file with the percentages for each variable, from which the counts (proportion * n) are computed; see footnote 1.
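As a minimal sketch of the proportion * n step, the snippet below converts the "Genetic" row of the table above into counts and builds the 2 x 4 contingency table that a chi-square test expects. The object names (`pct`, `yes`, `no`) are illustrative and not taken from the original script; the percentages and group sizes are copied from the table.

```r
# Percentages for the "Genetic" row of the table above, by school type.
pct <- c(pre = 56.6, prim = 67.6, mid = 27, high = 38)
# Group sizes from the table header.
ns <- c(pre = 92, prim = 105, mid = 126, high = 155)

# Endorsing counts: percentage * n / 100, rounded to whole teachers.
yes <- as.integer(round(pct * ns / 100))
names(yes) <- names(pct)  # as.integer() drops names, so restore them
no <- ns - yes            # teachers who did not endorse the option

# 2 x 4 contingency table (endorsed vs. not endorsed, by school type).
M <- rbind(yes = yes, no = no)
print(M)
```

Because the published percentages are rounded, the recovered counts are approximations of the original data.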
Then use R in a loop to run analyses.
Chi-square tests of independence and pairwise comparisons of proportions were conducted in R using the stats package (R Core Team, 2016). We adjusted p-values for multiple comparisons using the false discovery rate (FDR) procedure (Benjamini & Hochberg, 1995).
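To illustrate what the FDR adjustment does, here is a small sketch using `p.adjust()` from the stats package. The raw p-values are hypothetical and chosen only for illustration; they are not results from this analysis. The hand computation mirrors the Benjamini-Hochberg step-up rule.

```r
# Hypothetical raw p-values from four tests (illustrative only, not study results).
p_raw <- c(0.001, 0.01, 0.03, 0.5)

# Benjamini-Hochberg adjustment; "fdr" is an alias for "BH" in p.adjust().
p_fdr <- p.adjust(p_raw, method = "fdr")

# By hand: sort descending, scale each p-value by n / rank,
# take the running minimum, then restore the original order.
n <- length(p_raw)
ord <- order(p_raw, decreasing = TRUE)
by_hand <- pmin(1, cummin(n / (n:1) * p_raw[ord]))[order(ord)]

print(rbind(raw = p_raw, fdr = p_fdr, by_hand = by_hand))
```

`pairwise.prop.test()` applies the same adjustment internally when called with `p.adjust.method = "fdr"`.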
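data_prop <- read.csv("table2.csv")  # column 1: variable name; columns 2-5: percentages
head(data_prop)
str(data_prop)
# omnibus comparison of the 4 school types
ns <- c(92, 105, 126, 155)  # group sizes: preschool, primary, middle, high
# one row per variable; the number of variables in Table 2 is 29
outmat <- matrix(0, ncol = 4, nrow = 29,
                 dimnames = list(NULL, c("xsq", "df", "p", "p_sim")))
for (i in 1:29)
{
  # convert percentages to counts of endorsing teachers
  x1 <- as.integer(round(unlist(data_prop[i, 2:5]) * ns / 100, 0))
  q1 <- ns - x1  # counts of non-endorsing teachers
  M <- as.table(rbind(x1, q1))
  dimnames(M) <- list(yes = c("1", "0"),
                      sch = c("pre", "prim", "mid", "high"))
  res <- chisq.test(M)  # run the asymptotic test once and reuse the result
  xsq <- res$statistic
  df <- res$parameter
  p1 <- round(res$p.value, 3)  # asymptotic ("book version") p-value
  p2 <- round(chisq.test(M, simulate.p.value = TRUE, B = 10000)$p.value, 3)  # simulation-based p-value
  outmat[i, ] <- c(xsq, df, p1, p2)
}
write.csv(outmat, file = "table2_all.csv")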
rm(list=ls())
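### pairwise comparisons
data_prop <- read.csv("table2.csv")
head(data_prop)
str(data_prop)
ns <- c(92, 105, 126, 155)   # group sizes: preschool, primary, middle, high
outp <- matrix(0, ncol = 3)  # placeholder first row, dropped below
for (i in 1:29)
{
  x1 <- as.integer(round(unlist(data_prop[i, 2:5]) * ns / 100, 0))
  # 3 x 3 lower-triangular matrix of FDR-adjusted p-values for the 6 pairs
  pw <- round(pairwise.prop.test(x1, ns, p.adjust.method = "fdr")$p.value, 3)
  # the literal "data_prop" row acts as a separator, so each variable fills 4 rows
  outp <- rbind(outp, pw, "data_prop")
}
# label every block of 4 rows with its variable name
outp2 <- cbind(as.character(rep(data_prop[, 1], each = 4)), outp[-1, ])
write.csv(outp2, file = "table2_pair.csv")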
200+ variables were analyzed and reported in less than 1 minute.
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289–300.
1. This sample has 29 variables.