library(readxl)
dataset <- read_excel("~/Desktop/dataset.xls")
The purpose of this report is to provide university officials, especially those involved in structuring and restructuring units of study, with an accurate and thorough understanding of student achievement in 14 first-year mathematics units. The stakeholders of this report are these university officials and the university administration. The report is designed to provide insight into which areas can be improved and which groups of students may require additional assistance when completing these units of study.
The dataset used in this report was collected at the University of Sydney between 2012 and 2017 and contains 64,486 student grades. The grades have been anonymised to protect student confidentiality. The data was provided by the Institutional Analytics and Planning Department for use in this report.
There are 9 variables in this dataset. The following code reports the dimensions of the dataset and the names of its variables.
dim(dataset)
## [1] 5652 9
names(dataset)
## [1] "Year" "Domestic/Intl" "Gender"
## [4] "Mode" "Age" "Unit of Study"
## [7] "Unit of Study Level" "Unit of Study Grade" "Count"
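The dimensions show 5652 rows, while the dataset is described as containing 64,486 grades, and one of the variables is `Count`. This suggests, as an assumption not confirmed elsewhere in this report, that each row summarises a group of students with the same characteristics and that `Count` records how many students are in the group. If so, `table()` counts rows rather than students, while `xtabs()` with `Count` as a weight counts students. The toy data below is hypothetical and only illustrates the difference:

```r
# Hypothetical toy data in the same shape as the dataset:
# each row is a group of students and Count is the group size (assumed meaning).
toy <- data.frame(
  Gender = c("F", "M", "F"),
  Grade  = c("PS", "PS", "HD"),
  Count  = c(10, 4, 3)
)

# table() counts rows, one per group
rows_per_gender <- table(toy$Gender)

# xtabs() sums the Count column, giving the number of students per gender
students_per_gender <- xtabs(Count ~ Gender, data = toy)

# The same idea gives a weighted gender-by-grade table
students_by_grade <- xtabs(Count ~ Gender + Grade, data = toy)

print(rows_per_gender)
print(students_per_gender)
print(students_by_grade)
```

On the real data the equivalent call would be `xtabs(Count ~ Gender, data = dataset)`; variables whose names contain spaces would need backticks inside the formula.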
This report aims to answer six research questions about aspects of study that may be associated with students' grades. One example is whether age affects the grades achieved. According to an article in The Australian written by Darragh O’Keefe in March 2017, “obtaining a qualification after the age of 24 is associated with wage increases of 9% for men, and 7% for women according to the study by Francisco Perales, a senior research fellow at the University of Queensland and Jenny Chesters, a research fellow at the University of Melbourne” (O’Keefe, 2017). Given this incentive, a substantial number of mature-aged students could be expected. This report aims to provide insight into whether mature-aged students achieve better, similar or worse results than younger students.
1. What proportion of students included in the dataset passed their unit of study?
2. On average, did male or female students perform better?
3. Do domestic students have an advantage over international students based on the dataset? If so, why may this be the case?
4. Has there been a positive change in student performance over time?
5. Does age have an influence on student results? Do mature-aged students have an advantage?
6. Do full-time students achieve higher grades than part-time students and, if so, how can this be addressed to allow better opportunities for part-time students?
According to the data, 4537 out of 5652 students passed their unit of study. The following barplot shows the distribution of grades.
grade <- c(1115, 1390, 1273, 1103, 771)
cols1 <- c("light pink", "light blue", "lavender", "light green", "light yellow")
table(dataset$`Unit of Study Grade`)
##
## CR DI FA HD PS
## 1273 1103 1115 771 1390
barplot(grade, names.arg = c("Fail", "Pass", "Credit", "Distinction", "High Distinction"), col = cols1, ylim = c(0, 1500))
Therefore, the proportion of students who achieved a Pass or higher is 4537/5652 ≈ 0.803, and the proportion who failed is 1115/5652 ≈ 0.197. The following box model simulation treats each result as a ticket drawn from a box of 0s (fail) and 1s (pass) in these proportions: it sums 5652 draws, repeats this 5652 times, and plots a histogram of the totals.
set.seed(1)
box = c(0, 1)
totals = replicate(5652, sum(sample(box, 5652, prob = c(0.197, 0.803), replace = TRUE)))
hist(totals)
According to an article by journalist Liz Burke on news.com.au, “1 in 5 commencing bachelor students left their original course in 2014, and about 2015 dropped out completely” (Burke, 2016). This may help explain the failing grades in this report: students who are not enjoying a unit of study may miss the census date for withdrawal or stop submitting assignments, and so receive a fail. Just under 1 in 5 students in this report (0.197) received a failing grade, which is very close to the rate Burke reported for 2014. However, Burke's figure measures students leaving their course rather than failing a unit, so the two rates are not directly comparable.
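The pass proportion and the box model can also be checked directly from the grade counts in the table above rather than typed in by hand. The sketch below hard-codes those counts (on the real data, the grade table produced by `table()` would supply them). For a box of 0s and 1s with pass proportion p, the sum of n draws has expected value n × p and standard error sqrt(n × p × (1 − p)), which is where the histogram of simulated totals should be centred and roughly how widely it should spread.

```r
# Grade counts copied from the table output above
grades <- c(CR = 1273, DI = 1103, FA = 1115, HD = 771, PS = 1390)

# Reorder from Fail to High Distinction, e.g. for barplot(grades_ordered)
grade_order <- c("FA", "PS", "CR", "DI", "HD")
grades_ordered <- grades[grade_order]

n <- sum(grades)                              # total number of results
n_pass <- sum(grades[names(grades) != "FA"])  # every grade except Fail
p_pass <- n_pass / n                          # proportion passing
p_fail <- 1 - p_pass

# Box model: expected value and standard error of the sum of n 0/1 draws
expected_total <- n * p_pass
se_total <- sqrt(n * p_pass * (1 - p_pass))

# Simulation check (1000 repetitions keeps it fast)
set.seed(1)
totals <- replicate(1000, sum(sample(c(0, 1), n, replace = TRUE,
                                     prob = c(p_fail, p_pass))))

round(c(p_pass = p_pass, p_fail = p_fail), 3)
c(expected = expected_total, se = se_total)
c(sim_mean = mean(totals), sim_sd = sd(totals))
```

The simulated mean and standard deviation should sit close to the formula values; the formulas make the simulation optional rather than necessary.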
According to The Atlantic, female students are more likely to achieve better results: “This finding is reflected in a recent study by psychology professors Daniel and Susan Voyer at the University of New Brunswick. The Voyers based their results on a meta-analysis of 369 studies involving the academic grades of over one million people from 30 different nations. The findings are unquestionably robust: girls earn higher grades in every subject.” (Gnaulati, 2014). This section compares the Voyers’ findings with the data used in this report.
The data shows that 2665 female students and 2987 male students took the units of study relevant to this report in 2012-2017.
table(dataset$Gender)
##
## F M
## 2665 2987
cols <- c("light pink", "light blue")
barplot(table(dataset$Gender), names.arg = c ("Female", "Male"), col = cols, main = "Gender of Students", xlab="Gender", ylab="Number of Students")
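The following table and barplot show that more male than female students received each grade. Because there are more male students overall, however, these raw counts do not show that male students achieved higher marks.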
col2 <- c("light pink", "light blue")
table(dataset$Gender, dataset$`Unit of Study Grade`)
##
## CR DI FA HD PS
## F 614 507 518 358 668
## M 659 596 597 413 722
barplot(table(dataset$Gender, dataset$`Unit of Study Grade`), main = "Female vs Male Results", ylab ="Number of Students", xlab="Grade Received", beside= TRUE, col = col2)
Therefore, the results of this report do not support the Voyers’ finding that female students earn higher grades. Although more male students were enrolled in these units, the proportion of students receiving each grade is almost identical for both genders (for example, about 19% of female and 20% of male students failed).
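Scaling the counts to proportions within each gender makes this comparison explicit. A minimal sketch, using the gender-by-grade counts printed above:

```r
# Gender-by-grade counts copied from the table above
gender_grades <- matrix(
  c(614, 507, 518, 358, 668,
    659, 596, 597, 413, 722),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("F", "M"),
                  Grade = c("CR", "DI", "FA", "HD", "PS"))
)

# Proportions within each row (gender), so each row sums to 1
gender_props <- prop.table(gender_grades, margin = 1)
round(gender_props, 3)
```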
We can test the hypothesis that gender is independent of the grades received.
library(MASS)
tbl = table(dataset$Gender, dataset$`Unit of Study Grade`)
tbl
##
## CR DI FA HD PS
## F 614 507 518 358 668
## M 659 596 597 413 722
chisq.test(tbl)
##
## Pearson's Chi-squared test
##
## data: tbl
## X-squared = 2.0527, df = 4, p-value = 0.7261
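The chi-squared statistic compares each observed count O with the count E expected if gender and grade were independent, E = (row total × column total) / grand total, and sums (O − E)^2 / E over all cells; the degrees of freedom are (rows − 1) × (columns − 1) = 4. A sketch that reproduces `chisq.test()` by hand for the table above:

```r
# Gender-by-grade counts copied from the table above
gender_grades <- matrix(
  c(614, 507, 518, 358, 668,
    659, 596, 597, 413, 722),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("F", "M"), c("CR", "DI", "FA", "HD", "PS"))
)

# Expected counts under independence: outer product of the margins / grand total
expected <- outer(rowSums(gender_grades), colSums(gender_grades)) / sum(gender_grades)

# Pearson statistic, degrees of freedom and upper-tail p-value
x2 <- sum((gender_grades - expected)^2 / expected)
df <- (nrow(gender_grades) - 1) * (ncol(gender_grades) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# Built-in test for comparison (no continuity correction is used beyond 2x2 tables)
builtin <- chisq.test(gender_grades)
c(manual = x2, builtin = unname(builtin$statistic))
c(manual = p_value, builtin = builtin$p.value)
```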
The p-value of 0.7261 is well above the 0.05 significance level, so we do not reject the hypothesis that gender is independent of the grade received, which is consistent with the similar grade distributions above.
3. Do domestic students have an advantage over international students based on the dataset? If so, why may this be the case?
The following table and barplot show that more domestic students than international students received every grade. However, these are raw counts rather than percentages, and there are about twice as many domestic students, so they do not show whether domestic students have an advantage.
col3 <- c("lavender", "light green")
counts <- table(dataset$`Domestic/Intl`, dataset$`Unit of Study Grade`)
table(dataset$`Domestic/Intl`, dataset$`Unit of Study Grade`)
##
## CR DI FA HD PS
## D 866 727 794 499 928
## I 407 376 321 272 462
barplot(counts, col = col3, main = "International Students vs Domestic Students", xlab = "Grade Received", ylab = "Number of Students", legend = rownames(counts), beside = TRUE, ylim = c(0, 1000))
According to the data, there were 1838 international students and 3814 domestic students.
table(dataset$`Domestic/Intl`)
##
## D I
## 3814 1838
slices <- c(866, 727, 794, 499, 928)
lbls <- c("Credit", "Distinction", "Fail", "High Distinction", "Pass")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep= "")
pie(slices, labels =lbls, col=rainbow(length(lbls)), main = "Domestic Student Results")
slices <- c(407, 376, 321, 272, 462)
lbls <- c("Credit", "Distinction", "Fail", "High Distinction", "Pass")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep= "")
pie(slices, labels =lbls, col=rainbow(length(lbls)), main = "International Student Results")
These pie charts show that 17% of international students failed the unit, compared with 21% of domestic students. For the other grades, 24% of domestic students received a Pass compared with 25% of international students; 23% of domestic students received a Credit compared with 22% of international students; 20% of international students received a Distinction compared with 19% of domestic students; and 15% of international students received a High Distinction compared with 13% of domestic students.
Therefore, these results suggest that, on average, international students achieved slightly higher grades than domestic students in these units at the University of Sydney from 2012 to 2017, and they give no indication that domestic students have an advantage over international students. Further research should be conducted in this area, using variables such as country of origin, employment rates, employment hours and age, to help understand why international students are performing better than domestic students. Both groups should be surveyed in this further research.
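The pie charts compare the two groups by eye. One way to check whether the difference is larger than chance variation would suggest is the same chi-squared test of independence used for gender. A minimal sketch with the counts from the domestic/international table above:

```r
# Domestic/international by grade counts copied from the table above
residency_grades <- matrix(
  c(866, 727, 794, 499, 928,
    407, 376, 321, 272, 462),
  nrow = 2, byrow = TRUE,
  dimnames = list(Residency = c("D", "I"),
                  Grade = c("CR", "DI", "FA", "HD", "PS"))
)

# Percentage of each group receiving each grade (rows sum to 100)
round(100 * prop.table(residency_grades, margin = 1), 1)

# Chi-squared test of independence between residency and grade
residency_test <- chisq.test(residency_grades)
residency_test
```

A small p-value would indicate that the grade distribution differs between domestic and international students; it would not, by itself, explain why.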
The following plot shows that the number of failing results rose slightly over time, from 176 in 2012 to 192 in 2017, and the number of high distinctions rose from 96 in 2012 to a peak of 156 in 2015 before settling at around 140. These increases may partly reflect the growing number of results each year, which rose from 863 in 2012 to 1013 in 2017. The number of students achieving a distinction increased steadily throughout the period, from 146 in 2012 to 221 in 2017.
col4 <- c("light blue", "light green", "light pink", "lavender", "light yellow")
counts2 <- table(dataset$`Unit of Study Grade`, dataset$Year)
table(dataset$`Unit of Study Grade`, dataset$Year)
##
## 2012 2013 2014 2015 2016 2017
## CR 206 186 214 215 230 222
## DI 146 156 178 196 206 221
## FA 176 182 184 196 185 192
## HD 96 107 131 156 140 141
## PS 239 221 232 222 239 237
barplot(table(dataset$`Unit of Study Grade`, dataset$Year), ylim =c(0,500), main = "Student Performance Over Time", beside = TRUE, col = col4 , legend = rownames(counts2), ylab ="Number of Students", xlab = "Year")
Therefore, student performance has remained broadly similar over time, with the exception of distinctions. One possible explanation is grade scaling (a bell-curve grading system), if it is applied in these units. Further research could examine how students are graded and whether the quality of work has actually improved, for example by cross-referencing and moderating work from different years that received the same grade.
In 2014, 184 of the 939 students in the data failed their unit of study, a proportion of 0.196. The aforementioned article by Liz Burke reported that one in five commencing students left their original course in 2014. The two figures are very close, which suggests that the fail rate in this data is in line with the national attrition rate, although leaving a course and failing a unit are not the same outcome.
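Because the total number of results grows each year, the counts can be scaled to proportions within each year, which also reproduces the 2014 fail proportion. A sketch using the year-by-grade counts from the table above:

```r
# Grade-by-year counts copied from the table above
grade_year <- matrix(
  c(206, 186, 214, 215, 230, 222,
    146, 156, 178, 196, 206, 221,
    176, 182, 184, 196, 185, 192,
     96, 107, 131, 156, 140, 141,
    239, 221, 232, 222, 239, 237),
  nrow = 5, byrow = TRUE,
  dimnames = list(Grade = c("CR", "DI", "FA", "HD", "PS"),
                  Year = as.character(2012:2017))
)

# Total number of results in each year
colSums(grade_year)

# Proportion of each year's results in each grade (columns sum to 1)
year_props <- prop.table(grade_year, margin = 2)
round(year_props, 3)

# Fail proportion in 2014
year_props["FA", "2014"]
```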
The following pie graph shows the percentage of each age range of students. It shows that only 17% of students were over 25 during 2012-2017.
table(dataset$Age)
##
## 18 and under 19-21 22-25 Over 25
## 1248 2047 1414 943
slices <- c(1248, 2047, 1414, 943)
lbls <- c("18 And Under", "19-21", "22-25", "Over 25")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep= "")
pie(slices, labels =lbls, col=rainbow(length(lbls)), main = "Age of Students")
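The same five lines of label code are repeated for every pie chart in this report. A small helper function is one way to remove the repetition while producing the same labels; the names `pct_labels` and `pie_pct` are hypothetical and not part of any package:

```r
# Build labels such as "Credit 23%" from counts and category names
pct_labels <- function(counts, labels) {
  pct <- round(counts / sum(counts) * 100)
  paste0(labels, " ", pct, "%")
}

# Draw a pie chart with percentage labels, mirroring the repeated code above
pie_pct <- function(counts, labels, main) {
  pie(counts, labels = pct_labels(counts, labels),
      col = rainbow(length(counts)), main = main)
}

# Example: the age pie chart
age_counts <- c(1248, 2047, 1414, 943)
age_labels <- c("18 And Under", "19-21", "22-25", "Over 25")
pct_labels(age_counts, age_labels)
pie_pct(age_counts, age_labels, main = "Age of Students")
```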
col5 <-c("lavender", "light pink", "light blue", "light green", "yellow")
counts3 <- table(dataset$Age, dataset$`Unit of Study Grade`)
table(dataset$Age, dataset$`Unit of Study Grade`)
##
## CR DI FA HD PS
## 18 and under 265 262 232 210 279
## 19-21 437 400 401 348 461
## 22-25 330 285 293 136 370
## Over 25 241 156 189 77 280
barplot(table(dataset$Age, dataset$`Unit of Study Grade`), beside =TRUE, legend= rownames(counts3), ylim=c(0,800), col= col5, ylab="Number of Students", xlab ="Student Results", main="Impact of Age on Student Performance")
Read as raw counts, the table and barplot above could suggest differences in performance between age groups, but the counts are not scaled to percentages and the age groups differ in size. To assess this further, the following pie charts show the results of two age groups: 18 and under, and over 25.
slices <- c(265, 262, 232, 210, 279)
lbls <- c("Credit","Distinction", "Fail", "High Distinction", "Pass")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep= "")
pie(slices, labels =lbls, col=rainbow(length(lbls)), main = "Results of Students Aged 18 and Under")
slices <- c(241, 156, 189, 77, 280)
lbls <- c("Credit","Distinction", "Fail", "High Distinction", "Pass")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep= "")
pie(slices, labels = lbls, col = rainbow(length(lbls)), main = "Results of Students Aged Over 25")
The results of students aged 18 and under are more evenly spread across the grades than those of mature-aged students. A higher percentage of students aged 18 and under received a high distinction (17% compared with 8%). However, the fail rates of the two groups were very similar (19% and 20%). Aside from the 9 percentage point difference in high distinctions, the two groups therefore performed similarly. The lower share of high distinctions among mature-aged students could be due to mature-aged students needing to work more often than students aged 18 and under.
We can conduct a chi-squared test to discover if age is independent of grades.
library(MASS)
tbl = table(dataset$Age, dataset$`Unit of Study Grade`)
tbl
##
## CR DI FA HD PS
## 18 and under 265 262 232 210 279
## 19-21 437 400 401 348 461
## 22-25 330 285 293 136 370
## Over 25 241 156 189 77 280
chisq.test(tbl)
##
## Pearson's Chi-squared test
##
## data: tbl
## X-squared = 95.237, df = 12, p-value = 4.745e-15
The p-value from this test is 4.745e-15, far below the 0.05 significance level, so we reject the hypothesis that grades are independent of age.
In conclusion, mature-aged students do not appear to have an advantage. The fail rates of the youngest and oldest groups are similar, but the chi-squared test gives strong evidence that grades and age are not independent, and the pie charts suggest that the lower share of high distinctions among students over 25 contributes to this. Further research is therefore encouraged to determine whether age and maturity have an impact on university studies. Such research should consider variables such as age, employment, employment hours, maturity level and family circumstances (e.g. mature-aged students may have children).
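A significant chi-squared test shows that age and grade are associated, but not where. The standardized residuals returned by `chisq.test()` (its `stdres` component) show which cells differ most from what independence would predict; values beyond roughly ±2 are commonly treated as notable. A sketch with the age-by-grade counts from the table above:

```r
# Age-by-grade counts copied from the table above
age_grades <- matrix(
  c(265, 262, 232, 210, 279,
    437, 400, 401, 348, 461,
    330, 285, 293, 136, 370,
    241, 156, 189,  77, 280),
  nrow = 4, byrow = TRUE,
  dimnames = list(Age = c("18 and under", "19-21", "22-25", "Over 25"),
                  Grade = c("CR", "DI", "FA", "HD", "PS"))
)

age_test <- chisq.test(age_grades)
age_test

# Percentage of each age group receiving each grade
round(100 * prop.table(age_grades, margin = 1), 1)

# Standardized residuals: positive means more results than independence predicts
round(age_test$stdres, 2)
```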
The following data shows that there were 4310 full time students and 1342 part time students.
col5 <-c("lavender", "light pink")
table(dataset$Mode)
##
## Full time Part time
## 4310 1342
barplot(table(dataset$Mode), ylim=c(0,4500), col= col5, main= "Mode of Study", ylab = "Number of Students", xlab="Mode" )
table(dataset$Mode, dataset$`Unit of Study Grade`)
##
## CR DI FA HD PS
## Full time 957 881 815 671 986
## Part time 316 222 300 100 404
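mode_counts <- table(dataset$Mode, dataset$`Unit of Study Grade`)
barplot(mode_counts, beside = TRUE, main = "Grades vs Mode of Study", xlab = "Grade Received", ylab = "Number of Students", legend = rownames(mode_counts), ylim = c(0, 1000), col = col5)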
slices <- c(957, 881, 815, 671, 986)
lbls <- c("Credit","Distinction", "Fail", "High Distinction", "Pass")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep= "")
pie(slices, labels =lbls, col=rainbow(length(lbls)), main = "Results of Full Time Students")
slices <- c(316, 222, 300, 100, 404)
lbls <- c("Credit","Distinction", "Fail", "High Distinction", "Pass")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep= "")
pie(slices, labels =lbls, col=rainbow(length(lbls)), main = "Results of Part Time Students")
These results show that part-time students were slightly more likely to fail the unit than full-time students (22% compared with 19%). A more striking difference is that a much higher percentage of full-time students received a high distinction (16% compared with 7%). This could be due to difficulties maintaining concentration when completing the unit part-time. Therefore, further engagement and study plans should be made accessible for part-time students to help them complete the unit to the best of their ability.
A chi-square test was conducted to test the dependence between the mode of study and the result achieved.
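The difference in high distinction rates can also be tested on its own. The sketch below uses `prop.test()`, a two-sample test of equal proportions, with the high distinction counts and group sizes for each mode of study from the tables above; this is one reasonable choice, not the method used elsewhere in this report.

```r
# High distinction counts and group sizes from the tables above
hd_counts <- c(full_time = 671, part_time = 100)
group_sizes <- c(full_time = 4310, part_time = 1342)

# Proportion of each group receiving a high distinction
hd_rates <- hd_counts / group_sizes
round(hd_rates, 3)

# Two-sample test for equality of proportions (continuity correction on by default)
hd_test <- prop.test(hd_counts, group_sizes)
hd_test
```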
library(MASS)
tbl = table(dataset$`Unit of Study Grade`, dataset$Mode)
tbl
##
## Full time Part time
## CR 957 316
## DI 881 222
## FA 815 300
## HD 671 100
## PS 986 404
chisq.test(tbl)
##
## Pearson's Chi-squared test
##
## data: tbl
## X-squared = 86.107, df = 4, p-value < 2.2e-16
The test produced a very small p-value (less than 2.2e-16), which gives strong evidence that grades are not independent of the mode of study. Further research should investigate the reasons for this, including variables such as contact hours, work hours, work-life balance and age.
To conclude this report, it should first be noted that further research with additional variables is needed. As mentioned above, possible further research questions are:
1. Why are international students performing better than domestic students?
2. How does maturity (rather than age) impact results?
3. Why are full-time students significantly more likely to receive high distinctions than part-time students?
Nonetheless, the main findings of this report are:
- The proportion of students who passed their unit was 0.803, so approximately 1 in 5 students failed.
- There were more male students enrolled in the units, but the distribution of grades is almost identical for both genders.
- International students, on average, performed slightly better in these units than domestic students.
- Student performance has largely remained the same over time, with the exception of the number of students achieving distinctions, which rose throughout the data period.
- There was very little difference in the fail rate between mature-aged students and students aged 18 and under; however, students aged 18 and under were much more likely to be awarded a high distinction.
- Part-time students were slightly more likely to fail units than full-time students but, more notably, full-time students were much more likely to receive a high distinction.
Burke, L. (2016). Why are uni students dropping out? [online] news.com.au. Available at: https://www.news.com.au/finance/work/careers/university-attrition-rates-why-are-so-many-students-dropping-out/news-story/3e491dd119e1249a5a3763ef8010f8b5 [Accessed 28 May 2018].
Gnaulati, E. (2014). Why Girls Tend to Get Better Grades Than Boys Do. [online] The Atlantic. Available at: https://www.theatlantic.com/education/archive/2014/09/why-girls-get-better-grades-than-boys-do/380318/ [Accessed 1 Jun. 2018].
O’Keefe, D. (2017). University Degree Pays Off For Older Students. [online] The Australian. Available at: https://www.theaustralian.com.au/higher-education/university-degree-pays-off-for-older-students/news-story/0336a7609ed6258cd92480e2e412c6aa [Accessed 1 Jun. 2018].