Gap Analysis

The purpose of this paper is to determine whether private schools outperform public schools on the Foundation Skills Assessment (FSA) tests. To answer this question, we collected and processed data from the provincial Ministry of Education web site and conducted two-sample t-tests of the hypothesis that FSA results are the same for public and private schools in B.C.

Retrieving Data

First, we found the «Provincial Reports» web page (http://www.bced.gov.bc.ca/reporting/province.php), which contains links to Achievements Reports, including the Foundation Skills Assessment. The reports on this page contain only summaries of the FSA tests, and the information in those documents is not sufficient for a statistical analysis. We followed the link on the Provincial Reports page to the DataBC web site, which stores historical data provided by the Ministry of Education (http://www.data.gov.bc.ca/dbc/catalogue/detail.page?config=dbc&P110=recorduid:178244&recorduid=178244&title=BC%20Schools%20-%20Foundation%20Skills%20Assessment%20(FSA)%202007-2013). There we found a document with the results of the Grade 4 and 7 BC Foundation Skills Assessments in Numeracy, Reading and Writing from 2007 to 2013. The data are available in spreadsheet and delimited text formats; the files are 27 MB for the spreadsheet and 44 MB for plain text. Excel does not handle documents of this size well; therefore, we chose the R software environment and programming language as our instrument for statistical analysis.

Reading the data into a new dataset. The commented-out line reads the file directly from the web; the active line reads a local copy of the same file.

#dataset <- read.delim("http://www.bced.gov.bc.ca/reporting/odefiles/Foundation_Skills_Assessment_2012-2013.txt", header = TRUE, as.is = TRUE)

dataset <- read.delim("Foundation_Skills_Assessment_2012-2013.txt", header = TRUE, as.is = TRUE)
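The column names used in the rest of the code (Data.Level, Sub.Population and so on) contain dots because read.delim converts header names into syntactically valid R names by default. A minimal sketch of this behaviour, assuming the header row of the file uses spaces; the small tab-delimited sample below is invented for illustration and is not taken from the real file:

```r
# Invented two-row sample; only the header style is meant to mimic the real file
sample_text <- paste0(
  "Data Level\tSub Population\tScore\n",
  "SCHOOL LEVEL\tALL STUDENTS\t480\n",
  "SCHOOL LEVEL\tFEMALE\tMsk\n"
)

# Same arguments as in the call above, but reading from a string instead of a file
sample_data <- read.delim(text = sample_text, header = TRUE, as.is = TRUE)

# Spaces in the header become dots: "Data Level" -> "Data.Level"
print(names(sample_data))

# Score is read as character because of the masked value "Msk"
str(sample_data)
```

This also illustrates why the Score column has to be converted to numeric explicitly later on: a single non-numeric entry makes the whole column character.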

Processing data

Selecting the rows with the values SCHOOL LEVEL and ALL STUDENTS in the fields Data Level and Sub Population

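# Keep only rows that describe the whole student population
is_all_students <- dataset$Sub.Population == "ALL STUDENTS"
dataset <- dataset[is_all_students, ]

# Keep only school-level rows
is_school_level <- dataset$Data.Level == "SCHOOL LEVEL"
dataset <- dataset[is_school_level, ]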

Changing all entries that have «Msk» or «—» values to NA;
Removing all rows with the value «PROVINCE - TOTAL» in the «Public.Or.Independent» field;
Removing all rows where the Score field is missing, empty or zero.

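# Replace masked values and the province-wide total label with NA
dataset[dataset == "Msk" | dataset == "-" | dataset == "PROVINCE - TOTAL"] <- NA
# Drop rows with a missing, empty or zero Score, or with no school type
dataset <- dataset[!(is.na(dataset$Score) | dataset$Score == "" | dataset$Score == 0 | is.na(dataset$Public.Or.Independent)), ]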

Treating the Percent.Meet.Or.Exceed and Score fields as numeric

dataset$Score <- as.numeric(dataset$Score)
dataset$Percent.Meet.Or.Exceed <- as.numeric(dataset$Percent.Meet.Or.Exceed)
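To make the effect of the cleaning steps easier to follow, here is a minimal sketch that applies the same masking, filtering and conversion to a four-row toy data frame. The values, including the school-type labels PUBLIC and INDEPENDENT, are placeholders invented for illustration and are not taken from the FSA file:

```r
# Toy stand-in for the FSA data; every value here is invented
toy <- data.frame(
  Public.Or.Independent = c("PUBLIC", "INDEPENDENT", "PROVINCE - TOTAL", "PUBLIC"),
  Score = c("480", "Msk", "500", ""),
  Percent.Meet.Or.Exceed = c("75", "Msk", "80", "-"),
  stringsAsFactors = FALSE
)

# Step 1: masked values and the province total label become NA
toy[toy == "Msk" | toy == "-" | toy == "PROVINCE - TOTAL"] <- NA

# Step 2: drop rows with a missing, empty or zero Score, or no school type
toy <- toy[!(is.na(toy$Score) | toy$Score == "" | toy$Score == 0 |
             is.na(toy$Public.Or.Independent)), ]

# Step 3: convert the remaining text values to numbers
toy$Score <- as.numeric(toy$Score)
toy$Percent.Meet.Or.Exceed <- as.numeric(toy$Percent.Meet.Or.Exceed)

print(toy)
```

Only the first row survives: the second has a masked Score, the third is the province total, and the fourth has an empty Score.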

Analysing data

We decided to conduct two-sample t-tests of the means for the following categories:

* Grade 4
  * Writing
  * Reading
  * Numeracy
* Grade 7
  * Writing
  * Reading
  * Numeracy

Here we need to explain why we decided to test Grades 4 and 7 separately and why we think that “Private schools might show better results in Numeracy compared to that of public schools and worse results for Reading.” (Kate) We also need to explain why we chose to analyse Percent Meet or Exceed rather than Score.

Histogram and density plots.

Overlaid Histograms and Density Plots with Means for Grade 4 Writing:

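library(ggplot2)

# Histograms of both school types, with dashed lines at the values in cdf$Score.mean
# position="identity" draws the groups on top of each other (overlaid) instead of stacking them
plot1 <- ggplot(dataset_test,
                aes(x=Percent.Meet.Or.Exceed, fill=Public.Or.Independent)) +
  geom_histogram(binwidth = 10, colour = "black", alpha=.5, position="identity") +
  geom_vline(data=cdf, aes(xintercept=Score.mean), linetype="dashed", size=1) +
  ggtitle ("Grade 4. Writing. Overlaid histograms")

# Density curves for the same data
plot2 <- ggplot(dataset_test,
                aes(x=Percent.Meet.Or.Exceed)) +
  geom_density(aes(fill = Public.Or.Independent), alpha=0.7) +
  geom_vline(data=cdf, aes(xintercept=Score.mean),
             linetype="dashed", size=1) +
  ggtitle ("Grade 4. Writing. Density plots with means")

# multiplot() is not part of ggplot2; it has to be defined or loaded separately (for example from the Rmisc package)
multiplot(plot1, plot2, cols=1)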
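The plotting code relies on two objects, dataset_test and cdf, whose construction is not shown in this text. A minimal sketch of how they could be built with base R, under stated assumptions: the column names Grade and Skill are hypothetical (the real file may name them differently), the toy values are invented, and Score.mean is assumed to hold the group mean of the plotted variable, Percent.Meet.Or.Exceed:

```r
# Toy stand-in for the cleaned data; column names Grade and Skill are hypothetical
toy <- data.frame(
  Grade = c(4, 4, 4, 4, 7, 7),
  Skill = c("Writing", "Writing", "Writing", "Reading", "Writing", "Writing"),
  Public.Or.Independent = c("PUBLIC", "PUBLIC", "INDEPENDENT",
                            "PUBLIC", "PUBLIC", "INDEPENDENT"),
  Percent.Meet.Or.Exceed = c(60, 80, 90, 70, 50, 85),
  stringsAsFactors = FALSE
)

# Subset for one category, here Grade 4 Writing
dataset_test <- toy[toy$Grade == 4 & toy$Skill == "Writing", ]

# One row per school type with the group mean, used for the dashed vertical lines
cdf <- aggregate(Percent.Meet.Or.Exceed ~ Public.Or.Independent,
                 data = dataset_test, FUN = mean)
names(cdf)[2] <- "Score.mean"

print(cdf)
```

Repeating the subset step with a different grade or skill would give the inputs for the other five plots.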

Overlaid Histograms and Density Plots with Means for Grade 4 Numeracy:

Overlaid Histograms and Density Plots with Means for Grade 4 Reading:

Overlaid Histograms and Density Plots with Means for Grade 7 Writing:

# Same plots as for Grade 4, with dataset_test and cdf holding the Grade 7 Writing data
# position="identity" draws the groups on top of each other (overlaid) instead of stacking them
plot1 <- ggplot(dataset_test,
                aes(x=Percent.Meet.Or.Exceed, fill=Public.Or.Independent)) +
  geom_histogram(binwidth = 10, colour = "black", alpha=.5, position="identity") +
  geom_vline(data=cdf, aes(xintercept=Score.mean), linetype="dashed", size=1) +
  ggtitle ("Grade 7. Writing. Overlaid histograms")

plot2 <- ggplot(dataset_test,
                aes(x=Percent.Meet.Or.Exceed)) +
  geom_density(aes(fill = Public.Or.Independent), alpha=0.7) +
  geom_vline(data=cdf, aes(xintercept=Score.mean),
             linetype="dashed", size=1) +
  ggtitle ("Grade 7. Writing. Density plots with means")

multiplot(plot1, plot2, cols=1)

Overlaid Histograms and Density Plots with Means for Grade 7 Numeracy:

Overlaid Histograms and Density Plots with Means for Grade 7 Reading:

The histograms and density plots suggest that the data are approximately normally distributed; therefore, a t-test can be used to compare the two sample means and to draw a conclusion about the purported private/public discrepancy in FSA performance. Otherwise, we would have used the Mann–Whitney U test, which does not require the samples to be normally distributed.
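As an illustration of the two options mentioned above, the following sketch runs both tests on simulated data. The data are randomly generated and the labels PUBLIC and INDEPENDENT are placeholders, so the output says nothing about the real FSA results; note also that t.test performs Welch's unequal-variance version by default, which may or may not be the variant used in this analysis:

```r
# Simulated percentages for two groups of 30 schools each (not real FSA data)
set.seed(1)
toy <- data.frame(
  Public.Or.Independent = rep(c("PUBLIC", "INDEPENDENT"), each = 30),
  Percent.Meet.Or.Exceed = c(rnorm(30, mean = 70, sd = 10),
                             rnorm(30, mean = 75, sd = 10))
)

# Two-sample t-test of equal means (Welch's version by default)
tt <- t.test(Percent.Meet.Or.Exceed ~ Public.Or.Independent, data = toy)

# Mann-Whitney U test (called the Wilcoxon rank-sum test in R), the non-parametric alternative
mw <- wilcox.test(Percent.Meet.Or.Exceed ~ Public.Or.Independent, data = toy)

print(tt$p.value)
print(mw$p.value)
```

In both cases a small p-value would be evidence against the hypothesis that the two school types perform the same.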