The purpose of this paper is to determine whether private schools outperform public schools on the Foundation Skills Assessment (FSA) tests. To answer this question, we collected and processed data from the provincial Ministry of Education website and conducted two-sample t-tests of the hypothesis that FSA results are the same for public and private schools in B.C.
First, we found the «Provincial Reports» web page (http://www.bced.gov.bc.ca/reporting/province.php), which contains links to the Achievement Reports, including the Foundation Skills Assessment. The reports on this page contain only summaries of the FSA tests, and the information in those documents is not sufficient for a statistical analysis. We therefore followed the link on the Provincial Reports page to the DataBC website, which stores historical data provided by the Ministry of Education (http://www.data.gov.bc.ca/dbc/catalogue/detail.page?config=dbc&P110=recorduid:178244&recorduid=178244&title=BC%20Schools%20-%20Foundation%20Skills%20Assessment%20(FSA)%202007-2013). There we found a document with the results of the Grade 4 and 7 BC Foundation Skills Assessments in Numeracy, Reading and Writing from 2007 to 2013. The data are available in spreadsheet and delimited text formats; the files are 27 MB for the spreadsheet and 44 MB for plain text. Excel does not handle documents of this size well; therefore, we chose the R software environment and programming language (available from CRAN) as our instrument for statistical analysis.
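Reading the data into a new dataset. The commented-out line reads the file directly from the Ministry of Education website; the active line reads a local copy of the same file: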
#dataset <- read.delim("http://www.bced.gov.bc.ca/reporting/odefiles/Foundation_Skills_Assessment_2012-2013.txt", header = TRUE, as.is = TRUE)
dataset <- read.delim("Foundation_Skills_Assessment_2012-2013.txt", header = TRUE, as.is = TRUE)
Selecting the rows where Sub Population is ALL STUDENTS and Data Level is SCHOOL LEVEL, then removing masked and missing values:
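# Keep only the rows that describe all students
L <- dataset$Sub.Population == "ALL STUDENTS"
dataset <- dataset[L, ]
# Keep only the school-level rows
L <- dataset$Data.Level == "SCHOOL LEVEL"
dataset <- dataset[L, ]
# Recode masked ("Msk"), dash ("-") and provincial-total entries as NA
dataset[dataset == "Msk" | dataset == "-" | dataset == "PROVINCE - TOTAL"] <- NA
# Drop rows without a usable score or without a school type
dataset <- dataset[!(is.na(dataset$Score) | dataset$Score == "" | dataset$Score == 0 | is.na(dataset$Public.Or.Independent)), ]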
Treat Percent.Meet.Or.Exceed and Score fields as numeric
dataset$Score <- as.numeric(dataset$Score)
dataset$Percent.Meet.Or.Exceed <- as.numeric(dataset$Percent.Meet.Or.Exceed)
We decided to conduct two-sample t-tests of the means for the following categories:
* Grade 4
    * Writing
    * Reading
    * Numeracy
* Grade 7
    * Writing
    * Reading
    * Numeracy
Here we need to write why we decided to test grade 4 and 7 separately and why we think that “Private schools might show better results in Numeracy compared to that of public schools and worse results for Reading.” (Kate) Also, we need to say why we’ve chosen to analyse Percent Meet or Exceed over Score
Overlaid Histograms and Density Plots with Means for Grade 4 Writing:
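library(ggplot2)  # provides ggplot(), geom_histogram(), geom_density(), geom_vline()
# dataset_test holds the Grade 4 Writing rows; cdf is assumed to hold the group means (column Score.mean)
# Histogram of Percent.Meet.Or.Exceed by school type, with dashed lines at the means
plot1 <- ggplot(dataset_test,
                aes(x = Percent.Meet.Or.Exceed, fill = Public.Or.Independent)) +
  geom_histogram(binwidth = 10, colour = "black", alpha = .5, position = "stack") +
  geom_vline(data = cdf, aes(xintercept = Score.mean), linetype = "dashed", size = 1) +
  ggtitle("Grade 4. Writing. Overlaid histograms")
# Density plot of the same variable, again with dashed lines at the means
plot2 <- ggplot(dataset_test,
                aes(x = Percent.Meet.Or.Exceed)) +
  geom_density(aes(fill = Public.Or.Independent), alpha = 0.7) +
  geom_vline(data = cdf, aes(xintercept = Score.mean),
             linetype = "dashed", size = 1) +
  ggtitle("Grade 4. Writing. Density plots with means")
# multiplot() is a helper defined outside ggplot2; it stacks the two plots in one column
multiplot(plot1, plot2, cols = 1)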
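The plotting code above uses a data frame `cdf` with a column `Score.mean`, but its construction is not shown. A minimal sketch of one way such a table could be built in base R follows. It is an assumption, not the authors' code: the data frame `toy`, its values, and the label "BC Independent School" are made up for illustration (only "BC Public School" appears in the code in this paper).

```r
# Toy stand-in for dataset_test; the values are invented for illustration only.
toy <- data.frame(
  Public.Or.Independent = c("BC Public School", "BC Public School",
                            "BC Independent School", "BC Independent School"),
  Percent.Meet.Or.Exceed = c(60, 80, 85, 95)
)

# One row per school type, holding the group mean that geom_vline() would draw.
cdf <- aggregate(Percent.Meet.Or.Exceed ~ Public.Or.Independent,
                 data = toy, FUN = mean)

# Rename the mean column to the name the plotting code expects.
names(cdf)[2] <- "Score.mean"
print(cdf)
```

With the real data, `toy` would be replaced by `dataset_test`, giving one dashed line per school type in each plot.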
Overlaid Histograms and Density Plots with Means for Grade 4 Numeracy:
Overlaid Histograms and Density Plots with Means for Grade 4 Reading:
Overlaid Histograms and Density Plots with Means for Grade 7 Writing:
# Histogram of Percent.Meet.Or.Exceed by school type, with dashed lines at the means
plot1 <- ggplot(dataset_test,
                aes(x = Percent.Meet.Or.Exceed, fill = Public.Or.Independent)) +
  geom_histogram(binwidth = 10, colour = "black", alpha = .5, position = "stack") +
  geom_vline(data = cdf, aes(xintercept = Score.mean), linetype = "dashed", size = 1) +
  ggtitle("Grade 7. Writing. Overlaid histograms")
# Density plot of the same variable, again with dashed lines at the means
plot2 <- ggplot(dataset_test,
                aes(x = Percent.Meet.Or.Exceed)) +
  geom_density(aes(fill = Public.Or.Independent), alpha = 0.7) +
  geom_vline(data = cdf, aes(xintercept = Score.mean),
             linetype = "dashed", size = 1) +
  ggtitle("Grade 7. Writing. Density plots with means")
multiplot(plot1, plot2, cols = 1)
Overlaid Histograms and Density Plots with Means for Grade 7 Numeracy:
Overlaid Histograms and Density Plots with Means for Grade 7 Reading:
The histograms and density plots suggest that the data are approximately normally distributed; therefore, we can use a t-test to compare the two sample means and draw a conclusion about the purported private/public discrepancy in FSA performance. Otherwise, we would have used the Mann–Whitney U test, which does not require the samples to be normally distributed.
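The fallback mentioned above can be sketched in R, where the Mann–Whitney U test is provided by `wilcox.test()` (R calls it the Wilcoxon rank-sum test). The vectors `sim.private` and `sim.public` below are simulated stand-ins, not FSA data; their sizes, means and spreads are arbitrary assumptions chosen only so that the example runs on its own.

```r
set.seed(1)

# Simulated stand-ins for data.private and data.public (not real FSA values).
sim.private <- rnorm(200, mean = 85, sd = 10)
sim.public  <- rnorm(800, mean = 70, sd = 15)

# Two-sided Mann-Whitney U test; no normality assumption is needed.
res <- wilcox.test(sim.private, sim.public)
print(res$p.value)
```

With the real data, the call would be `wilcox.test(data.private, data.public)`, mirroring the `t.test()` calls used in this paper.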
Null hypothesis (Grade 4, Writing): the mean percentage of students who met or exceeded expectations on the FSA test is the same for public and private schools in B.C.
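# Subset the Grade 4 Writing results
dataset_test <- dataset[which(dataset$Fsa.Skill.Code == "Writing" & dataset$Grade == "4"), ]
# Split Percent.Meet.Or.Exceed into public and non-public (private) schools
L <- dataset_test$Public.Or.Independent == "BC Public School"
data.public <- dataset_test[L, ]$Percent.Meet.Or.Exceed
data.private <- dataset_test[!L, ]$Percent.Meet.Or.Exceed
# Welch two-sample t-test (R's default; equal variances are not assumed)
t.test(data.private, data.public)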
##
## Welch Two Sample t-test
##
## data: data.private and data.public
## t = 29.2, df = 1357, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 15.31 17.51
## sample estimates:
## mean of x mean of y
## 86.34 69.93
Because the p-value is far less than 𝛼 = 0.05, there is sufficient evidence to reject the null hypothesis.
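For reference, the statistic reported by `t.test()` above is Welch's t, which does not assume equal variances in the two groups. In standard notation (the symbols for the sample means, sample variances and sample sizes of the private and public groups, indexed 1 and 2, are introduced here and do not appear in the output), the statistic and its approximate degrees of freedom are:

```latex
t = \frac{\bar{x}_{1} - \bar{x}_{2}}
         {\sqrt{\dfrac{s_{1}^{2}}{n_{1}} + \dfrac{s_{2}^{2}}{n_{2}}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_{1}^{2}}{n_{1}} + \dfrac{s_{2}^{2}}{n_{2}}\right)^{2}}
     {\dfrac{\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1}-1}
      + \dfrac{\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2}-1}}
```

As a consistency check on the output above, the difference in sample means is 86.34 - 69.93 = 16.41, which is the midpoint of the reported 95 percent confidence interval, (15.31 + 17.51) / 2 = 16.41.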
Null hypothesis: the mean percentage of students who met or exceeded expectations on the FSA test is the same for public and private schools in B.C.
##
## Welch Two Sample t-test
##
## data: data.private and data.public
## t = 29.09, df = 1311, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 17.01 19.48
## sample estimates:
## mean of x mean of y
## 83.42 65.18
Because the p-value is far less than 𝛼 = 0.05, there is sufficient evidence to reject the null hypothesis.
Null hypothesis: the mean percentage of students who met or exceeded expectations on the FSA test is the same for public and private schools in B.C.
##
## Welch Two Sample t-test
##
## data: data.private and data.public
## t = 29.97, df = 1358, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 15.20 17.32
## sample estimates:
## mean of x mean of y
## 84.93 68.68
Because the p-value is far less than 𝛼 = 0.05, there is sufficient evidence to reject the null hypothesis.
Null hypothesis (Grade 7, Writing): the mean percentage of students who met or exceeded expectations on the FSA test is the same for public and private schools in B.C.
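# Subset the Grade 7 Writing results
dataset_test <- dataset[which(dataset$Fsa.Skill.Code == "Writing" & dataset$Grade == "7"), ]
# Split Percent.Meet.Or.Exceed into public and non-public (private) schools
L <- dataset_test$Public.Or.Independent == "BC Public School"
data.public <- dataset_test[L, ]$Percent.Meet.Or.Exceed
data.private <- dataset_test[!L, ]$Percent.Meet.Or.Exceed
# Welch two-sample t-test (R's default; equal variances are not assumed)
t.test(data.private, data.public)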
##
## Welch Two Sample t-test
##
## data: data.private and data.public
## t = 30.39, df = 1503, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 16.04 18.25
## sample estimates:
## mean of x mean of y
## 87.83 70.68
Because the p-value is far less than 𝛼 = 0.05, there is sufficient evidence to reject the null hypothesis.
Null hypothesis: the mean percentage of students who met or exceeded expectations on the FSA test is the same for public and private schools in B.C.
##
## Welch Two Sample t-test
##
## data: data.private and data.public
## t = 30.82, df = 1381, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 18.99 21.57
## sample estimates:
## mean of x mean of y
## 83.26 62.98
Because the p-value is far less than 𝛼 = 0.05, there is sufficient evidence to reject the null hypothesis.
Null hypothesis: the mean percentage of students who met or exceeded expectations on the FSA test is the same for public and private schools in B.C.
##
## Welch Two Sample t-test
##
## data: data.private and data.public
## t = 35.55, df = 1410, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 18.46 20.62
## sample estimates:
## mean of x mean of y
## 84.82 65.28
Because the p-value is far less than 𝛼 = 0.05, there is sufficient evidence to reject the null hypothesis.
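Based on the results of the t-tests, there is sufficient evidence that private schools in B.C. had a higher mean percentage of students meeting or exceeding expectations on the FSA than public schools, in every skill and in both Grades 4 and 7. The estimated differences in means range from roughly 16 to 20 percentage points.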
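Because the six tests share the same structure, they could be run in a single loop rather than by repeating the subsetting code. The following is a minimal sketch under stated assumptions, not the code used for the results above: the data frame `sim` is simulated, the helper `run_test()` is hypothetical, and the labels "Numeracy", "Reading" and "BC Independent School" are assumed values (only "Writing" and "BC Public School" appear in the code in this paper).

```r
set.seed(42)

# Simulated stand-in for the cleaned `dataset`: 50 schools of each type
# for every grade/skill combination. These are not real FSA values.
sim <- expand.grid(
  Grade = c("4", "7"),
  Fsa.Skill.Code = c("Writing", "Numeracy", "Reading"),
  Public.Or.Independent = c("BC Public School", "BC Independent School"),
  school = 1:50,
  stringsAsFactors = FALSE
)
sim$Percent.Meet.Or.Exceed <- ifelse(
  sim$Public.Or.Independent == "BC Public School",
  rnorm(nrow(sim), mean = 68, sd = 15),
  rnorm(nrow(sim), mean = 85, sd = 10)
)

# Hypothetical helper: Welch t-test of private vs public for one grade and skill.
run_test <- function(d, grade, skill) {
  sub <- d[d$Grade == grade & d$Fsa.Skill.Code == skill, ]
  is.public <- sub$Public.Or.Independent == "BC Public School"
  tt <- t.test(sub$Percent.Meet.Or.Exceed[!is.public],
               sub$Percent.Meet.Or.Exceed[is.public])
  data.frame(Grade = grade, Skill = skill,
             t = unname(tt$statistic),
             df = unname(tt$parameter),
             p.value = tt$p.value)
}

# Run the test for every grade/skill combination and collect one row per test.
combos <- unique(sim[, c("Grade", "Fsa.Skill.Code")])
results <- NULL
for (i in seq_len(nrow(combos))) {
  results <- rbind(results,
                   run_test(sim, combos$Grade[i], combos$Fsa.Skill.Code[i]))
}
print(results)
```

Applied to the real data, `run_test(dataset, "4", "Writing")` would correspond to the first test reported above, and the resulting table would place all six t statistics and p-values side by side.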