Intro

Broadly: for each of 200+ variables, four proportions were available, one per school type.

Detailed: for each option under causes, diagnostic traits, and diagnostic activities, the percentage of teachers endorsing it, by school type (Preschool, Primary School, Middle School, High School). The excerpt below shows the cause items.

Cause (% endorsing)         Preschool  Primary Sch.  Middle Sch.  High Sch.
                            (n = 92)   (n = 105)     (n = 126)    (n = 155)
Genetic                     56.6       67.6          27.0         38.0
Neurological                44.6       22.8          48.4         43.8
Mental illness              19.6       24.8          32.6         32.2
Vaccinations                10.8       26.6          30.2         29.6
Environmental exposure      22.8       21.0          14.2         15.4
Malnutrition in pregnancy   17.4        9.6          17.4         22.0
Parenting                   18.4       10.4          16.6          7.0
Dietary/nutritional issues   3.2        8.6           8.8         11.0
Drug use of mother           6.6        8.6           4.8          0.6

Purpose

For each of the 200+ variables, (a) run a chi-square test of independence across the four school types and report the results, and (b) run pairwise comparisons of proportions with False Discovery Rate (FDR) adjusted p-values and report them.

Code

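First, create a CSV file (table2.csv) with one row per variable: the variable name in the first column and the four percentages in columns 2 to 5 (see note 1). The code converts each percentage back to a count (percentage * n / 100, rounded).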
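A minimal sketch of the file layout the code expects, assuming table2.csv holds the variable name in column 1 and the Preschool, Primary, Middle and High School percentages in columns 2 to 5. Only three rows of the table above are used, the column names (item, pre, prim, mid, high) are hypothetical, and the file goes to a temporary path so an existing table2.csv is not overwritten.

```r
# Hypothetical column names; the analysis loops only rely on column positions 1-5.
demo_prop <- data.frame(
  item = c("Genetic", "Neurological", "Vaccinations"),
  pre  = c(56.6, 44.6, 10.8),
  prim = c(67.6, 22.8, 26.6),
  mid  = c(27.0, 48.4, 30.2),
  high = c(38.0, 43.8, 29.6)
)

# Write to a temporary file so a real table2.csv is not overwritten.
path <- tempfile(fileext = ".csv")
write.csv(demo_prop, path, row.names = FALSE)
demo_prop <- read.csv(path)

ns <- c(92, 105, 126, 155)  # group sizes from the table header

# Percentages -> counts endorsing: percentage * n / 100, rounded to whole teachers.
demo_counts <- t(apply(demo_prop[, 2:5], 1, function(p) round(p * ns / 100)))
rownames(demo_counts) <- demo_prop$item
demo_counts
```

Because the percentages are themselves rounded, the reconstructed counts are approximations; if the raw counts are available, they can be entered directly instead.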

Then run the analyses in R, looping over the variables (rows).

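The chi-square tests of independence and the pairwise comparisons of proportions were conducted in R using the stats package (R Core Team, 2016). P-values for the pairwise comparisons were adjusted for multiple comparisons using the false discovery rate (FDR) procedure (Benjamini & Hochberg, 1995); the adjustment is applied within each variable, over its six pairwise comparisons.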
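To make the FDR step concrete, here is a small sketch of the Benjamini and Hochberg adjustment written out by hand and checked against stats::p.adjust (where method "fdr" is an alias of "BH"). The six p-values are made up, standing in for the six pairwise comparisons among four school types; bh_adjust is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: Benjamini-Hochberg adjusted p-values.
# For the i-th smallest of m p-values, the adjusted value is
# min over j >= i of (m / j) * p_(j), capped at 1.
bh_adjust <- function(p) {
  m  <- length(p)
  o  <- order(p, decreasing = TRUE)  # largest p-value first
  ro <- order(o)                     # permutation that restores the input order
  ranks <- m:1                       # rank j of each sorted value (largest has rank m)
  pmin(1, cummin(m / ranks * p[o]))[ro]
}

# Made-up raw p-values for 6 comparisons (4 groups -> choose(4, 2) = 6 pairs).
p_raw <- c(0.010, 0.040, 0.030, 0.200, 0.002, 0.500)

cbind(raw      = p_raw,
      by_hand  = bh_adjust(p_raw),
      p_adjust = p.adjust(p_raw, method = "fdr"))
```

The running minimum (cummin) from the largest p-value downward is what keeps the adjusted values monotone in the raw p-values.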

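data_prop=read.csv("table2.csv")
head(data_prop)
str(data_prop)

# chi-square test of independence across the 4 school types, one variable per row
ns=c(92,105,126,155)  # group sizes: preschool, primary, middle, high school
nv=nrow(data_prop)    # number of variables (29 in the sample file, 200+ in the full data)
outmat=matrix(0,ncol=4,nrow=nv,
              dimnames=list(data_prop[,1], c("chisq","df","p","p_sim")))
set.seed(2016)        # arbitrary seed so the simulated p values are reproducible
for(i in 1:nv)
{
x1=round(unlist(data_prop[i,2:5])*ns/100)   # percentages -> counts endorsing
q1=ns-x1                                    # counts not endorsing
M <- as.table(rbind(x1, q1))
dimnames(M) <- list(yes = c("1", "0"),
                    sch = c("pre","prim", "mid","high"))
res=chisq.test(M)                           # run the test once and reuse the result
p1=round(res$p.value,3)   # asymptotic (textbook) p value
p2=round(chisq.test(M,simulate.p.value = TRUE, B = 10000)$p.value,3) # Monte Carlo p value
outmat[i,]=c(res$statistic,res$parameter,p1,p2)
}
write.csv(outmat,file="table2_all.csv")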
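For a single variable, the loop body amounts to the sketch below, shown for the Genetic row of the table. It also recomputes the Pearson statistic from observed and expected counts to show what chisq.test reports for a 2 x 4 table (df = (2 - 1)(4 - 1) = 3; no continuity correction is applied beyond 2 x 2). The seed value is an assumption added only to make the Monte Carlo p-value reproducible.

```r
ns  <- c(92, 105, 126, 155)
pct <- c(56.6, 67.6, 27, 38)         # Genetic row (percent endorsing)
yes <- round(pct * ns / 100)         # 52, 71, 34, 59
M <- rbind(yes = yes, no = ns - yes)
colnames(M) <- c("pre", "prim", "mid", "high")

res <- chisq.test(M)                 # Pearson chi-square test of independence

# Same statistic by hand: sum over cells of (O - E)^2 / E,
# where E = row total * column total / grand total.
E <- outer(rowSums(M), colSums(M)) / sum(M)
x2_by_hand <- sum((M - E)^2 / E)

set.seed(1)                          # assumption: fixed seed for reproducibility
p_sim <- chisq.test(M, simulate.p.value = TRUE, B = 10000)$p.value

c(statistic = unname(res$statistic), df = unname(res$parameter),
  p = res$p.value, p_sim = p_sim, by_hand = x2_by_hand)
```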


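rm(list=ls())

### pairwise
data_prop=read.csv("table2.csv")
head(data_prop)
str(data_prop)

# pairwise comparisons of proportions, FDR-adjusted within each variable
ns=c(92,105,126,155)
sch=c("pre","prim","mid","high")
outp=NULL
for(i in 1:nrow(data_prop))
{
  x1=round(unlist(data_prop[i,2:5])*ns/100)   # percentages -> counts endorsing
  pm=pairwise.prop.test(x1, ns, p.adjust.method="fdr")$p.value   # 3 x 3 lower-triangular matrix
  dimnames(pm)=list(sch[2:4], sch[1:3])
  # 3 rows per variable; no separator row, so the matrix stays numeric
  outp=rbind(outp, data.frame(variable=as.character(data_prop[i,1]), group=rownames(pm), round(pm,3)))
}
write.csv(outp,file="table2_pair.csv",row.names=FALSE)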
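What pairwise.prop.test computes for one variable can be unpacked as follows, again for the Genetic row: a two-sample prop.test (with its default continuity correction) for each of the choose(4, 2) = 6 pairs of school types, then an FDR adjustment over those six p-values only. The final comparison against the lower triangle of the pairwise.prop.test matrix is a sanity check on that reading, not part of the original workflow.

```r
ns  <- c(92, 105, 126, 155)
yes <- round(c(56.6, 67.6, 27, 38) * ns / 100)   # Genetic row as counts
sch <- c("pre", "prim", "mid", "high")

# All 6 pairs of groups, as columns: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs <- combn(4, 2)
raw <- apply(pairs, 2, function(ij) prop.test(yes[ij], ns[ij])$p.value)
adj <- p.adjust(raw, method = "fdr")             # FDR within this variable only

# Same comparisons via pairwise.prop.test (3 x 3 lower-triangular matrix)
pm <- pairwise.prop.test(yes, ns, p.adjust.method = "fdr")$p.value
from_pairwise <- pm[lower.tri(pm, diag = TRUE)]  # column-major: same pair order as combn

data.frame(pair     = paste(sch[pairs[1, ]], sch[pairs[2, ]], sep = " vs "),
           raw      = round(raw, 3),
           by_hand  = round(adj, 3),
           pairwise = round(from_pairwise, 3))
```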

Result

All 200+ variables were analyzed, with results written to CSV, in less than one minute.

References

R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289–300.


  1. The sample file (table2.csv) used here has 29 variables.