Project 3 University of Sydney | DATA1001 | November 2019
The aim of this report is to determine whether a student’s age and gender influence their grades. Using data from a junior Mathematics unit at the University of Sydney, it was found that gender has little apparent effect on a student’s grades, whereas age bracket does: the average mark declines steadily with age, with a slight increase for students over 25.
The main clients for the data analysis are the University of Sydney (USYD), specifically the mathematics department, and the higher education branch of the government. USYD can use the data to determine whether it needs to offer more help to particular age brackets or genders. It can also determine whether the course, or the way it is taught, needs any fine-tuning. The data may suggest whether gender-separated classes are warranted, or whether mature-age students should have separate tutorials. The education branch of the government can use the report to decide whether students finishing school should be encouraged to go straight to university, or whether taking one or more years off builds life experience and maturity that improve a student’s grades.
No ethical issues arise from the data, as its use was ethically approved by the DVC (Education), Prof. Pip Pattison. The data had very few limitations. One is that it is unknown whether a student had previously studied maths at school. This could have a drastic effect on the student’s final UoS.Mark, as base knowledge going into the course may give the student a better chance of obtaining a higher mark, biasing the comparison between groups.
library("tidyverse")
## -- Attaching packages ------------------------ tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts --------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("knitr")
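# Load the data; adjust the path to wherever project3.csv is stored.
data <- read.csv("/Users/liamr/Downloads/project3.csv")
# Open the spreadsheet viewer only in an interactive session.
if (interactive()) View(data)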
dim(data)
## [1] 10845 26
head(data)
## Fake.Student.Identifier Gender Age.at.Semester.Start
## 1 2171 M 19-21
## 2 2343 M 19-21
## 3 1385 F 19-21
## 4 1985 F 19-21
## 5 4038 M 19-21
## 6 3188 M 18 and under
## Domestic...international.status Mode.of.Study Semester
## 1 Domestic Full Time 1
## 2 International Full Time 2
## 3 Domestic Full Time 1
## 4 Domestic Full Time 1
## 5 International Full Time 1
## 6 Domestic Full Time 2
## Fake.UoS.Identifier UoS.Mark UoS.changes.before.Week.1
## 1 UNIT I 73 0
## 2 UNIT E 71 6
## 3 UNIT I 64 20
## 4 UNIT C 69 14
## 5 UNIT I 78 0
## 6 UNIT F 88 6
## UoS.changes.before.Week.2 UoS.changes.before.census.date
## 1 0 4
## 2 8 8
## 3 20 20
## 4 14 14
## 5 0 0
## 6 6 6
## Canvas.access.Week.1 Canvas.access.Week.2 Canvas.access.Week.3
## 1 1 1 1
## 2 1 1 1
## 3 1 1 1
## 4 1 1 1
## 5 1 1 1
## 6 1 1 1
## Canvas.access.Week.4 Canvas.access.Week.5 Canvas.access.Week.6
## 1 1 1 1
## 2 1 1 1
## 3 1 1 1
## 4 1 1 1
## 5 1 1 1
## 6 1 1 1
## Canvas.access.Week.7 Canvas.access.Week.8 Canvas.access.Week.9
## 1 1 1 1
## 2 1 1 0
## 3 1 1 1
## 4 1 1 1
## 5 1 1 1
## 6 1 1 0
## Canvas.access.Week.10 Canvas.access.Week.11 Canvas.access.Week.12
## 1 1 1 1
## 2 1 1 1
## 3 1 1 1
## 4 1 1 1
## 5 1 1 1
## 6 1 1 1
## Canvas.access.Week.13 Canvas.access.Mid.semester.break
## 1 1 0
## 2 0 1
## 3 1 1
## 4 1 1
## 5 1 1
## 6 1 1
## Canvas.access.STUVAC
## 1 0
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
# Convenience copies of the two columns of interest.
Mark <- data$UoS.Mark
Age <- data$Age.at.Semester.Start
# Treat the categorical columns as factors so they group correctly in plots.
data$Age.at.Semester.Start <- as.factor(data$Age.at.Semester.Start)
data$Gender <- as.factor(data$Gender)
data$Domestic...international.status <- as.factor(data$Domestic...international.status)
data$Semester <- as.factor(data$Semester)
ggplot(data, aes(x = Gender, y = UoS.Mark, fill = Age.at.Semester.Start)) +
  geom_boxplot() +
  theme_bw() +
  labs(title = "Graphical Analysis of the UoS Marks by Age Range and Gender")
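As a numeric companion to the box plot, the marks can also be summarised for each gender and age bracket. The following is a minimal sketch rather than part of the original analysis: `summarise_marks()` and `share_outliers()` are hypothetical helper names, and `toy` is a small made-up data frame that only borrows the report's column names so that the example runs on its own. The outlier rule used here (points more than 1.5 × IQR beyond the quartiles) is assumed to match the one the box plot draws.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the report's data frame;
# the values are made up purely so the sketch runs on its own.
toy <- data.frame(
  Gender = factor(c("F", "F", "F", "M", "M", "M", "M", "F")),
  Age.at.Semester.Start = factor(c("18 and under", "19-21", "19-21",
                                   "18 and under", "19-21", "19-21",
                                   "18 and under", "18 and under")),
  UoS.Mark = c(82, 64, 70, 88, 71, 73, 75, 79)
)

# Share of values lying more than 1.5 * IQR beyond the quartiles.
share_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  mean(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr, na.rm = TRUE)
}

# One row per Gender x Age cell: group size, mean, median and outlier share.
summarise_marks <- function(df) {
  df %>%
    group_by(Gender, Age.at.Semester.Start) %>%
    summarise(
      n = n(),
      mean_mark = mean(UoS.Mark, na.rm = TRUE),
      median_mark = median(UoS.Mark, na.rm = TRUE),
      outlier_share = share_outliers(UoS.Mark)
    ) %>%
    ungroup()
}

marks_summary <- summarise_marks(toy)
print(marks_summary)
```

Calling `summarise_marks(data)` on the real data frame would give the group sizes alongside the means and medians, which helps judge how much weight the smaller age brackets can bear.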
The box-and-whisker plot above shows how marks vary with age and gender. It indicates that gender has little to no effect on a student’s grade. This contradicts much of the literature, which reports that females tend to outperform males in science, technology, engineering and mathematics: female grades have slightly higher averages and are much more consistent across their studies. Interestingly, however, males tend to dominate the top 10% of grades in these subjects (O’Dea et al., 2018).
On the other hand, the plot clearly shows that a student’s age bracket has an effect on their UoS.Mark. Regardless of sex, students in the 18 and under category have a higher median mark (a box plot’s centre line shows the median rather than the mean), with a steady decline as age increases. This disagrees with research indicating that, on average, older students obtain higher grades than younger students (Birch & Miller, 2005). Borg et al. (1989) suggest that each one-year increase in a student’s age raises the average mark at university by 2-4%. This research may explain the slightly higher mark of students over 25 compared with those aged 22-25. With more data in the higher age brackets, this upturn might continue and agree with the research, but that is unknown at this point.
At first sight the graph shows a large number of outliers, yet they make up only a small proportion of the 10,845 records in the data set.
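The visual comparison could be backed by a formal test. The sketch below is one possible approach and was not part of the original analysis: it fits an additive two-way ANOVA of mark on gender and age bracket. `test_age_gender()` is a hypothetical helper name, and `sim` is simulated stand-in data with an age effect deliberately built in and no gender effect, so its output says nothing about the real students. ANOVA also assumes roughly normal residuals with equal variance, which would need checking on the actual marks.

```r
# Simulated stand-in data (NOT the report's data): same column names,
# made-up marks with an age effect built in and no gender effect.
set.seed(1)
n <- 400
sim <- data.frame(
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  Age.at.Semester.Start = factor(sample(c("18 and under", "19-21"), n,
                                        replace = TRUE))
)
sim$UoS.Mark <- 65 +
  ifelse(sim$Age.at.Semester.Start == "18 and under", 8, 0) +
  rnorm(n, sd = 10)

# Additive two-way ANOVA: does the mean mark differ by gender or age bracket?
test_age_gender <- function(df) {
  fit <- aov(UoS.Mark ~ Gender + Age.at.Semester.Start, data = df)
  # drop1() tests each term after adjusting for the other, so the result
  # does not depend on the order of the terms in an unbalanced design.
  drop1(fit, test = "F")
}

anova_table <- test_age_gender(sim)
print(anova_table)
```

Running `test_age_gender(data)` on the real data frame would give an F-test and p-value for each factor, which is the evidence needed before describing either effect as statistically significant.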
O’Dea et al. (2018). Gender differences in individual variation in academic grades fail to fit expected patterns for STEM. Nature Communications, Vol. 9.
Birch, E. & Miller, P. (2005). The Determinants of Students’ Tertiary Academic Success. Business School, University of Western Australia, pp. 1-43.
Borg, M., Mason, P. & Shapiro, S. (1989). The case of effort variables in student performance. Journal of Economic Education, Vol. 20 (3), pp. 308-313.