With each passing year, attending university has become more important in modern society as technology advances. This fast-paced world has created a need for more graduates in STEM (science, technology, engineering and mathematics). Moreover, as computers make more ‘human’ jobs redundant, higher-level maths training is becoming more important. A mathematical background is therefore increasingly valuable in giving individuals an edge in the working world.
The results of this study are relevant to university students in general, since the trends are not exclusive to one faculty. However, we consider the main stakeholders to be prospective mathematics students. The information in this data set can inform future applicants about course statistics; while it does not describe each unit of study in detail, it gives a fair representation of the abilities of the student body. Combined with an understanding of one’s own capabilities and limits, it can be a powerful tool for choosing the right combination of classes.
library(tidyverse)
library(stringr)
library(gghighlight)
unit.ad <- read.csv("Student_data_advanced.csv")
unit.ad$Unit.of.Study.Grade <- factor(unit.ad$Unit.of.Study.Grade, levels = c("HD","DI", "CR", "PS", "FA"))
unit.ad$Unit.of.Study.Level <- factor(unit.ad$Unit.of.Study.Level, levels = c("Mainstream", "Fundamental", "Advanced"))
unit.ad %>% head()
## Student.Identifier Year Domestic.Intl Gender Mode Age
## 1 289703333 2014 I F Part time 19-21
## 2 188174541 2017 D M Full time 19-21
## 3 41036478 2016 D F Full time 19-21
## 4 29921215 2014 D F Full time 19-21
## 5 198560236 2013 D F Full time 19-21
## 6 268800920 2012 D F Full time 19-21
## Unit.of.Study Unit.of.Study.Level Unit.of.Study.Grade
## 1 Unit B Mainstream CR
## 2 Unit E Mainstream DI
## 3 Unit A Mainstream DI
## 4 Unit E Mainstream PS
## 5 Unit C Mainstream PS
## 6 Unit B Mainstream PS
This data set contains 64,486 anonymised student grades for 14 first-year Maths units of study from 2012 to 2017. It is real data from the University’s student system, provided by Institutional Analytics and Planning, the University department responsible for student data reporting and analysis.
unit.ad %>%
ggplot(aes(Unit.of.Study, fill = Unit.of.Study.Grade)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
scale_fill_discrete(name="Grade",
breaks=c("HD", "DI", "CR", "PS", "FA"),
labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)",
"Pass (50-64)", "Fail (0-49)")) +
labs(x = "Unit of Study") +
facet_grid(~Unit.of.Study.Level, scales = "free")
# Overlay the fails (red) on all enrolments (grey); the fill colours are fixed, so no fill scale is needed
ggplot() +
geom_bar(data = unit.ad,
aes(Unit.of.Study), fill = "grey") +
geom_bar(data = unit.ad %>% filter(Unit.of.Study.Grade == "FA"),
aes(Unit.of.Study), fill = "red") +
theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
labs(x = "Unit of Study", title = "How many students failed?") +
facet_grid(~Unit.of.Study.Level, scales = "free")
unit.ad %>%
ggplot(aes(Unit.of.Study, fill = Unit.of.Study.Grade)) +
geom_bar(position = "fill") +
theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
scale_fill_discrete(name="Grade",
breaks=c("HD", "DI", "CR", "PS", "FA"),
labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)",
"Pass (50-64)", "Fail (0-49)")) +
scale_y_continuous(labels = scales::percent) +
labs(x = "Unit of Study", y = "Percentage (%)") +
facet_grid(~Unit.of.Study.Level, scales = "free")
unit.ad %>%
ggplot(aes(Unit.of.Study, fill = Unit.of.Study.Grade)) +
geom_bar(position = "fill", show.legend = F) +
theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
scale_fill_manual(values = c("grey", "grey", "grey", "grey", "red"),
breaks=c("HD", "DI", "CR", "PS", "FA"),
labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)",
"Pass (50-64)", "Fail (0-49)")) +
scale_y_continuous(labels = scales::percent) +
labs(x = "Unit of Study", title = "How many students failed? (%)", y = "Percentage (%)") +
facet_grid(~Unit.of.Study.Level, scales = "free")
It is not surprising that more students are enrolled in the mainstream units, given that maths is not everybody’s passion and many students study it only as a requirement for their degree. What is interesting is that the mainstream and fundamental streams, which have a higher percentage of fails and passes, are spread fairly evenly across all grades, whereas students in the advanced stream tend to get better marks, with more than 50% receiving a distinction or a high distinction. A plausible explanation is that students in the advanced stream are the ones with a real passion for maths, which leads to better understanding and, in turn, better grades.
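The "more than 50%" reading of the chart can also be checked numerically. The following is a minimal sketch, assuming the column names used in this report; the helper `grade_share()` and the `toy` data frame are hypothetical and made up for illustration, not taken from the real data set.

```r
library(dplyr)

# Share of each grade within each study level.
# `data` is assumed to carry the column names used in this report.
grade_share <- function(data) {
  data %>%
    count(Unit.of.Study.Level, Unit.of.Study.Grade, name = "n") %>%
    group_by(Unit.of.Study.Level) %>%
    mutate(share = n / sum(n)) %>%
    ungroup()
}

# Toy rows, made up for illustration only (not real student records).
toy <- data.frame(
  Unit.of.Study.Level = c("Advanced", "Advanced", "Advanced", "Advanced",
                          "Mainstream", "Mainstream"),
  Unit.of.Study.Grade = c("HD", "DI", "DI", "CR", "PS", "FA")
)

# Combined share of Distinction and High Distinction per level.
top_share <- grade_share(toy) %>%
  group_by(Unit.of.Study.Level) %>%
  summarise(di_or_hd = sum(share[Unit.of.Study.Grade %in% c("DI", "HD")]))

print(top_share)
```

On the report's data, `grade_share(unit.ad)` would give the table behind the filled bar chart.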
unit.ad %>%
ggplot(aes(Mode, fill = Unit.of.Study.Grade)) +
geom_bar(position = "fill") +
scale_fill_discrete(name="Grade",
breaks=c("HD", "DI", "CR", "PS", "FA"),
labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)",
"Pass (50-64)", "Fail (0-49)")) +
scale_y_continuous(labels = scales::percent) +
labs(y = "Percentage (%)") +
facet_grid(~Unit.of.Study.Level, scales = "free")
Across all study levels, part-time students are more likely to fail than full-time students. Most notably, at the advanced level over 50% of students score 75 or above. This is not surprising, since these are likely the most passionate maths students.
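The fail rates behind this comparison can be tabulated directly. This is a hedged sketch, again assuming the report's column names; `fail_rate_by_mode()` and the `toy` rows are hypothetical names and made-up values used only to show the calculation.

```r
library(dplyr)

# Fail rate per study level and mode of attendance.
# mean() of a logical vector gives the proportion of TRUE values.
fail_rate_by_mode <- function(data) {
  data %>%
    group_by(Unit.of.Study.Level, Mode) %>%
    summarise(n = n(),
              fail_rate = mean(Unit.of.Study.Grade == "FA"),
              .groups = "drop")
}

# Toy rows, made up for illustration only (not real student records).
toy <- data.frame(
  Unit.of.Study.Level = rep("Mainstream", 6),
  Mode = c("Full time", "Full time", "Full time", "Full time",
           "Part time", "Part time"),
  Unit.of.Study.Grade = c("HD", "CR", "PS", "FA", "FA", "PS")
)

toy_rates <- fail_rate_by_mode(toy)
print(toy_rates)
```

Calling `fail_rate_by_mode(unit.ad)` would return one row per level and mode, which makes the gap between part-time and full-time students explicit.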
unit.ad %>%
ggplot(aes(Gender, fill = Unit.of.Study.Grade)) +
geom_bar() +
scale_fill_discrete(name="Grade",
breaks=c("HD", "DI", "CR", "PS", "FA"),
labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)",
"Pass (50-64)", "Fail (0-49)")) +
labs(y = "Number of students") +
facet_grid(~Unit.of.Study.Level, scales = "free")
unit.ad %>%
ggplot(aes(Gender, fill = Unit.of.Study.Grade)) +
geom_bar(position = "fill") +
scale_fill_discrete(name="Grade",
breaks=c("HD", "DI", "CR", "PS", "FA"),
labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)",
"Pass (50-64)", "Fail (0-49)")) +
scale_y_continuous(labels = scales::percent) +
labs(y = "Percentage (%)") +
facet_grid(~Unit.of.Study.Level, scales = "free")
Those graphs show that
unit.ad %>%
ggplot(aes(Mode, fill = Unit.of.Study.Grade)) +
geom_bar(position = "fill", show.legend = F) +
scale_fill_manual(values = c("grey", "grey", "grey", "grey", "red"),
breaks=c("HD", "DI", "CR", "PS", "FA"),
labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)",
"Pass (50-64)", "Fail (0-49)")) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Percentage of students who failed", y = "Percentage (%)")
As this graph shows, the percentage of fails differs between full-time and part-time students, so we decided to test the difference between the two categories. Specifically, we wanted to compare mean grade points. Because the data set records only grade bands rather than raw marks, we assigned each student a representative mark for their grade band (roughly its midpoint).
population <- unit.ad %>%
mutate(Grade.point =
case_when(
Unit.of.Study.Grade == "CR" ~ 70,
Unit.of.Study.Grade == "DI" ~ 80,
Unit.of.Study.Grade == "FA" ~ 25,
Unit.of.Study.Grade == "HD" ~ 93,
Unit.of.Study.Grade == "PS" ~ 57))
sample <- population %>%
filter(Mode == "Part time")
sample.size <-
sample %>% nrow()
sample.mean <-
sample$Grade.point %>% mean()
ggplot() +
geom_histogram(data = population, aes(Grade.point), binwidth = 25, fill = "grey") +
geom_histogram(data = sample, aes(Grade.point), binwidth = 25, fill = "red") +
# Vertical lines mark the population mean (black, 62.72698) and the part-time mean (red, 56.47816)
geom_vline(xintercept = mean(population$Grade.point), colour = "black") +
geom_vline(xintercept = sample.mean, colour = "red") +
labs(title = "Difference in mean grade points")
population.size <-
population %>% nrow()
population.mean <-
population$Grade.point %>% mean()
population.sd <- population$Grade.point %>% sd()
In this case, we treat the whole data set as the population and the part-time students as a sample.
mean(population$Grade.point)
## [1] 62.72698
mean(sample$Grade.point)
## [1] 56.47816
Z = (mean(sample$Grade.point) - mean(population$Grade.point)) / (sd(population$Grade.point) / sqrt(sample.size))
Z
## [1] -17.34057
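The calculation above can be read as a one-sample z-test. The following is a sketch of that reading, assuming a lower-tailed alternative (the hypotheses are not stated explicitly in the code). Here $\bar{x}$ is the part-time mean, $\mu_{\text{PT}}$ the true part-time mean, $\mu$ and $\sigma$ the mean and standard deviation of the whole data set, and $n$ the number of part-time records.

```latex
\[
H_0:\ \mu_{\text{PT}} = \mu
\qquad\text{versus}\qquad
H_1:\ \mu_{\text{PT}} < \mu
\]
\[
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
  = \frac{56.47816 - 62.72698}{\sigma / \sqrt{n}}
  = -17.34057
\]
```

The values of $\sigma$ and $n$ are computed by `sd(population$Grade.point)` and `sample.size` and are not printed in the output, so they are left symbolic here.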
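For a one-sided test at the 5% significance level, the critical value from the standard normal table is -1.64.
Z = -17.34057 < -1.64
The Z score falls in the rejection region, so we reject the null hypothesis that part-time students have the same mean grade point as the whole population. Based on this statistical evidence, we conclude that the mean grade point of part-time students is lower than the population mean. Since the population consists of part-time and full-time students, this also suggests that part-time students score lower on average than full-time students.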
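The decision rule can be wrapped in a small function that also reports a p-value, which avoids looking up the table by hand. This is a minimal sketch, assuming a lower-tailed test with known population mean and standard deviation; `z_test_lower()`, `alpha` and `toy_sample` are hypothetical names, and the toy numbers are made up for illustration.

```r
# One-sample, lower-tailed z-test against a known population mean and sd.
z_test_lower <- function(x, mu, sigma, alpha = 0.05) {
  n <- length(x)
  z <- (mean(x) - mu) / (sigma / sqrt(n))  # test statistic
  critical <- qnorm(alpha)                 # lower-tail critical value
  p_value <- pnorm(z)                      # P(Z <= z) under the null
  list(z = z, critical = critical, p_value = p_value, reject = z < critical)
}

# Toy grade points, made up for illustration only.
toy_sample <- c(57, 25, 70, 57, 25, 57, 70, 25, 57)
res <- z_test_lower(toy_sample, mu = 62, sigma = 21)
print(res)
```

On the report's objects, the equivalent call would be `z_test_lower(sample$Grade.point, population.mean, population.sd)`.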