MATH 1324: Assignment 2

Evaluation of Students density/ ratio of government and catholic schools across Victoria LGA’s, 2022

Amit Singh s3868685

Last updated: 29 May, 2023

Data Cont.

##Preparation of data sets and adding columns for ratio/ density ie. to calculate No. of students per school
school_ratio <- schools1 %>%  add_column(Gov_Schools_Ratio = (schools1$Gov_Schools_Enrols/ schools1$Gov_Schools_No.),.after = "Gov_Schools_No.") 
school_ratio1 <- school_ratio %>%  add_column(Catholic_Schools_Ratio = (school_ratio$Catholic_Schools_Enrols/ school_ratio$Catholic_Schools_No.),.after = "Catholic_Schools_No.") 
school_ratio2 <- school_ratio1 %>%  add_column(Indep_Schools_Ratio = (school_ratio1$Indep_Schools_Enrols/ school_ratio1$Indep_Schools_No.),.after = "Indep_Schools_No.") 
school_ratio3 <- school_ratio2 %>%  add_column(Total_Schools_Ratio = (school_ratio2$Total_Schools_Enrols/ school_ratio2$Total_Schools_No.),.after = "Total_Schools_No.") 

# identify NAs in full data frame and imputing them by 0
school_ratio3[is.na(school_ratio3)] <- 0
#Ranking the LGA's as per descending order of Victoria LGA
school_ratio_rank <- school_ratio3 %>%  add_column(Total_Schools_No._Rank = rank(desc(school_ratio3$Total_Schools_No.),ties.method = "min"),.before =  "LGA_name") 
#Reordering of columns for likewise categories
school_ratio_rank2 <- school_ratio_rank[, c(1,2,13,4,7,10,12,3,6,9,14,5,8,11)]

#Subsetting dataset
school_ratio_rank4 <- school_ratio_rank2[, c(1,2,11,12,13,14)]
school_ratio_rank4_long <- gather(school_ratio_rank4, key = "Variable", value = "Students_Ratio", Total_Schools_Ratio:Indep_Schools_Ratio)
#Conversion of data type
school_ratio_rank4_long$Variable <- as.factor(school_ratio_rank4_long$Variable)
school_ratio_rank4_long$Rank_labels <- c(rank(-school_ratio_rank4$Total_Schools_Ratio,ties.method = "min"),
                                         rank(-school_ratio_rank4$Gov_Schools_Ratio,ties.method = "min"), rank(-school_ratio_rank4$Catholic_Schools_Ratio,ties.method = "min"),
rank(-school_ratio_rank4$Indep_Schools_Ratio,ties.method = "min"))
school_ratio_rank4_long$Variable <- factor(school_ratio_rank4_long$Variable,levels=c("Total_Schools_Ratio","Gov_Schools_Ratio","Catholic_Schools_Ratio","Indep_Schools_Ratio"),labels = c("Average of Total Schools","Government Schools","Catholic Schools", "Independent Schools"))

Descriptive Statistics and Visualisation

#Interactive visualization for KPI across all 80 Victorian LGA's, 2022 
p1 <- plot_ly(data = school_ratio_rank4_long, x = ~Students_Ratio, y= ~LGA_name,size = ~Students_Ratio , alpha = 2, symbol = ~Variable, color = ~Variable, type = "scatter", mode = "markers", colors= c("#E7298A","#D95F02","#1B9E77","#7570B3")) %>% layout(title ="FTE Students Ratio (categorical), Victoria LGA - 2022",xaxis = list(zeroline = FALSE, title = "Ratio (No. of FTE Students per school)"), yaxis = list(zeroline = FALSE, title = "Victoria LGA"))
p1

Descriptive Statistics and Visualisation Cont.

#Rearranging dataset for statistical testing of government and catholic schools student ratios
school_test <- school_ratio_rank4[, c(4,5)]
school_test_long <- gather(school_test, key = "School_Type", value = "Students_Ratio", Gov_Schools_Ratio:Catholic_Schools_Ratio)

#Rounding off the ratios and assigning factors
school_test_long$Students_Ratio <- round(school_test_long$Students_Ratio,2)
school_test_long$School_Type <- factor(school_test_long$School_Type,labels = c("Catholic","Government"))

#Summarizing of test dataset for statistical analysis
school_test_long %>% group_by(School_Type) %>% summarise(Min = min(Students_Ratio,na.rm = TRUE),
                          Q1 = quantile(Students_Ratio,probs = .25,na.rm = TRUE),Median = median(Students_Ratio, na.rm = TRUE),Q3 = quantile(Students_Ratio,probs = .75,na.rm = TRUE),Max = max(Students_Ratio,na.rm = TRUE),
                          Mean = mean(Students_Ratio, na.rm = TRUE),SD = sd(Students_Ratio, na.rm = TRUE),n = n(),Missing = sum(is.na(Students_Ratio)))-> table1
knitr::kable(table1)
School_Type Min Q1 Median Q3 Max Mean SD n Missing
Catholic 0 165.2500 335.115 461.6375 711.40 331.3274 193.2953 80 0
Government 10 143.8375 310.705 499.5975 1039.89 336.0165 205.7019 80 0

Descriptive Statistics and Visualisation Cont.

#Statistically visualizing using box plot
school_test_long %>% boxplot(Students_Ratio ~ School_Type, data = ., ylab = "Students_Ratio")

Descriptive Statistics and Visualisation Cont.

#Student ratio distributions for catholic schools using the qqPlot() function.
school_Catholic <- school_test_long %>% filter(School_Type == "Catholic")
school_Catholic$Students_Ratio %>% qqPlot(dist="norm")

## [1] 47 64

Descriptive Statistics and Visualisation Cont.

#Student ratio distributions for government schools using the qqPlot() function.
school_Government <- school_test_long %>% filter(School_Type == "Government")
school_Government$Students_Ratio %>% qqPlot(dist="norm")

## [1] 77 45

Descriptive Statistics and Visualisation Cont.

Hypothesis Testing

leveneTest(Students_Ratio ~ School_Type, data = school_test_long)

Hypothesis Testing Cont.

#Two-sample t-test - Assuming Equal Variance
t.test(Students_Ratio ~ School_Type,
  data = school_test_long,
  var.equal = TRUE,
  alternative = "two.sided")
## 
##  Two Sample t-test
## 
## data:  Students_Ratio by School_Type
## t = -0.14858, df = 158, p-value = 0.8821
## alternative hypothesis: true difference in means between group Catholic and group Government is not equal to 0
## 95 percent confidence interval:
##  -67.02056  57.64231
## sample estimates:
##   mean in group Catholic mean in group Government 
##                 331.3274                 336.0165
qt(p = 0.25, df = 80 + 80 - 2)
## [1] -0.6760457

Hypothesis Testing Cont.

Critical value approach

As the test statistic t from the two-sample t-test assuming equal variance, is t=−0.148, which is NOT more extreme than −0.676, we FAIL to reject H0. According to the critical value method, there was NOT statistically significant difference between catholic and government schools student ratio means.

Using p-value method

The p-value of the two-sample t-test tell us the probability of observing a sample difference between the means of -4.6891, or one more extreme, assuming the difference was 0 in the population (i.e. H0 is true). The two-tailed p-value is reported to be p=.882. According to the p-value method, as p=.882 > α = 0.05, we FAIL to reject H0. Therefore, there was NOT statistically significant difference observed between the means of catholic and government schools student ratio.

Confidence Interval approach

The 95% CI of the difference between the means (- 4.689) was calculated ie. H0: μ = -4.689 Since, 95% CI [-67.02 57.64] interval does capture H0, we FAIL to reject it. Once again, there was NOT a significant difference between the means of student ratios of catholic and government schools.

Discussion

References

[1] Data Reference: Statistics on Victorian schools. 21/05/2023. Retrieved May 21, 2023 from Statistics on Victorian schools and teaching website https://www.vic.gov.au/statistics-victorian-schools-and-teaching

[2] Falling enrolments put Catholic schools at risk of closure. Cath News article. Accessed 10 May, 2023. https://cathnews.com/2023/05/10/falling-enrolments-put-catholic-schools-at-risk-of-closure/