Intro

Broadly: For each of 200+ variables, four proportions were available, one per school type.

Detailed: Percentage of teachers endorsing each option for causes, diagnostic traits, and diagnostic activities, by school type (Preschool, Primary School, Middle School, High School).

	Preschool	Primary Sch.	Middle Sch.	High Sch.
	(n = 92)	(n = 105)	(n = 126)	(n = 155)
Genetic	56.6	67.6	27	38
Neurological	44.6	22.8	48.4	43.8
Mental illness	19.6	24.8	32.6	32.2
Vaccinations	10.8	26.6	30.2	29.6
Environmental exposure	22.8	21	14.2	15.4
Malnutrition in pregnancy	17.4	9.6	17.4	22
Parenting	18.4	10.4	16.6	7
Dietary/nutritional issues	3.2	8.6	8.8	11
Drug use of mother	6.6	8.6	4.8	0.6

Purpose

For each of the 200+ variables, (a) run a chi-square test of independence and report the results, and (b) run pairwise comparisons of proportions with false discovery rate (FDR) adjusted p-values and report them.
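As a rough illustration of steps (a) and (b) for a single variable, the sketch below uses the Genetic row of the table. The counts are an assumption derived here as round(percentage × n / 100), and the object names (`yes`, `tab`, `omnibus`, `pairs`) are hypothetical.

```r
# Group sizes for Preschool, Primary, Middle, and High School
ns <- c(92, 105, 126, 155)

# "Yes" counts for the Genetic row, assumed to be round(percentage * n / 100)
yes <- c(52, 71, 34, 59)

# 2 x 4 contingency table: endorsed vs. not endorsed, by school type
tab <- rbind(yes = yes, no = ns - yes)
colnames(tab) <- c("pre", "prim", "mid", "high")

# (a) chi-square test of independence
omnibus <- chisq.test(tab)
omnibus$statistic
omnibus$parameter  # degrees of freedom: (2 - 1) * (4 - 1)
omnibus$p.value

# (b) pairwise comparisons of proportions with FDR-adjusted p-values
pairs <- pairwise.prop.test(yes, ns, p.adjust.method = "fdr")
pairs$p.value
```

Both functions come from the stats package, which is attached by default in R, so no extra library() call should be needed.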

Code

First, create a CSV file with the percentages, from which the code below derives the counts (proportion × n).¹
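A minimal sketch of what such a file might look like, using the first three rows of the table above. The column names and the file name `table2_demo.csv` are assumptions; the only layout the analysis code relies on is the variable name in column 1 and the four percentages in columns 2 to 5.

```r
# Percentages for the first three rows of the table (column names are hypothetical)
pct <- data.frame(
  variable = c("Genetic", "Neurological", "Mental illness"),
  pre  = c(56.6, 44.6, 19.6),
  prim = c(67.6, 22.8, 24.8),
  mid  = c(27.0, 48.4, 32.6),
  high = c(38.0, 43.8, 32.2)
)
ns <- c(92, 105, 126, 155)  # group sizes from the table header

# Write to a temporary demo file (hypothetical name) and read it back
csv_path <- file.path(tempdir(), "table2_demo.csv")
write.csv(pct, csv_path, row.names = FALSE)
demo <- read.csv(csv_path)

# proportion * n: multiply each percentage column by its group size
counts <- round(sweep(as.matrix(demo[, 2:5]), 2, ns, `*`) / 100)
rownames(counts) <- demo$variable
counts
```

Rounding is needed because the reported percentages do not always map to whole numbers of respondents.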

Then use R in a loop to run analyses.

Chi-square tests of independence and pairwise comparisons of proportions were conducted in R using the stats package (R Core Team, 2016). P-values were adjusted for multiple comparisons using the false discovery rate (FDR) procedure (Benjamini & Hochberg, 1995).
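To make the FDR adjustment concrete, here is a small sketch that computes Benjamini-Hochberg adjusted p-values by hand and compares them with `p.adjust`. The p-values are made up for illustration and are not results from this analysis.

```r
# Made-up p-values for illustration only
p <- c(0.001, 0.01, 0.03, 0.5)
m <- length(p)

# Benjamini-Hochberg: scale each p-value by m / rank, then enforce
# monotonicity by taking running minima from the largest p-value down
o  <- order(p, decreasing = TRUE)  # indices from largest to smallest p
ro <- order(o)                     # permutation that restores the input order
bh <- pmin(1, cummin(m / (m:1) * p[o]))[ro]
bh

# In p.adjust, method = "fdr" is an alias for "BH"
all.equal(bh, p.adjust(p, method = "fdr"))
```

With four school types there are six pairwise comparisons per variable, so `pairwise.prop.test` applies this adjustment to six p-values at a time.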

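# Column 1 holds the variable name, columns 2-5 the percentages per school type
data_prop <- read.csv("table2.csv")
head(data_prop)
str(data_prop)

# Omnibus comparison of the 4 school types
ns <- c(92, 105, 126, 155)  # group sizes: pre, prim, mid, high
n_vars <- 29                # number of variables in Table 2
outmat <- matrix(0, ncol = 4, nrow = n_vars,
                 dimnames = list(NULL, c("xsq", "df", "p", "p_sim")))
for (i in 1:n_vars) {
  # convert percentages to counts of "yes" answers
  x1 <- as.integer(round(unlist(data_prop[i, 2:5]) * ns / 100, 0))
  q1 <- ns - x1  # counts of "no" answers
  M <- as.table(rbind(x1, q1))
  dimnames(M) <- list(yes = c("1", "0"),
                      sch = c("pre", "prim", "mid", "high"))
  fit <- chisq.test(M)         # asymptotic test, run once and reused
  p1 <- round(fit$p.value, 3)  # book version p value
  # simulation based p value
  p2 <- round(chisq.test(M, simulate.p.value = TRUE, B = 10000)$p.value, 3)
  outmat[i, ] <- c(fit$statistic, fit$parameter, p1, p2)
}
write.csv(outmat, file = "table2_all.csv")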


rm(list=ls())

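### pairwise
data_prop <- read.csv("table2.csv")
head(data_prop)
str(data_prop)

# Pairwise comparisons of proportions with FDR-adjusted p values
ns <- c(92, 105, 126, 155)
outp <- matrix(0, ncol = 3)  # dummy first row, dropped below
for (i in 1:29) {
  x1 <- as.integer(round(unlist(data_prop[i, 2:5]) * ns / 100, 0))
  # 3 x 3 lower-triangular matrix of adjusted p values
  pvals <- round(pairwise.prop.test(x1, ns, p.adjust.method = "fdr")$p.value, 3)
  # the extra "data_prop" row is a separator, giving 4 rows per variable
  outp <- rbind(outp, pvals, "data_prop")
}
# label each block of 4 rows with its variable name
outp2 <- cbind(as.character(rep(data_prop[, 1], each = 4)), outp[-1, ])

write.csv(outp2, file = "table2_pair.csv")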
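For four groups, `pairwise.prop.test` returns a 3 × 3 lower-triangular matrix of adjusted p-values, which is why the loop stacks one block of rows per variable. If one row per pair is easier to report, the matrix can be reshaped to long format, as in the sketch below. This is an optional alternative rather than part of the original workflow; the counts are the Genetic row (assumed to be round(percentage × n / 100)) and the names `pm`, `long`, `group1`, `group2`, and `p_fdr` are hypothetical.

```r
ns  <- c(92, 105, 126, 155)
# Named counts so that the p-value matrix carries school-type labels
yes <- c(pre = 52, prim = 71, mid = 34, high = 59)

# 3 x 3 matrix; the upper triangle is NA
pm <- pairwise.prop.test(yes, ns, p.adjust.method = "fdr")$p.value

# One row per pair of school types, dropping the empty upper triangle
long <- na.omit(as.data.frame(as.table(pm)))
names(long) <- c("group1", "group2", "p_fdr")
long
```

This yields six rows per variable, one for each pair of school types.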

Result

200+ variables were analyzed and reported in less than 1 minute.

References

R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300.

¹ This sample has 29 variables.

Automated analyses on proportions

Burak AYDIN

Oct 31, 2016
