1 Introduction
- 1.1 Related Journal Article
2 Stakeholder
3 Research Questions
4 Preparation
5 Data content
6 Analysis
- 6.1 Difficulty difference
- 6.2 Grade difference by mode
  - 6.2.1 Plots
7 Gender Difference
8 Z testing (5% Significance Level)
- 8.1 Introduction
- 8.2 Hypothesis
- 8.3 Test Statistic
9 Conclusion

1 Introduction

With each passing year, attending university has become more important in modern society, in step with the advancement of technology. This fast-paced world has created a need for more graduates in STEM fields (science, technology, engineering and mathematics). Moreover, as computers make more ‘human’ jobs redundant, higher-level training in maths is becoming more important. Because of this, a mathematical background is increasingly essential for individuals who want an edge in the working world.

2 Stakeholder

The people affected by the results of this study are university students in general, as the data show trends that are not exclusive to one faculty. However, we have decided that the main stakeholders are prospective mathematics students. The information in this data set can help inform future applicants about course statistics. While it does not describe each unit of study in detail, it gives a fair representation of the abilities of the student body. Combined with an understanding of one’s own capabilities and limits, this can be a powerful tool for choosing the right combination of units.

3 Research Questions

Which units show a trend of higher passing marks?
Do part-time students have a different mark distribution compared to full-time students?
Is there a difference in mark distribution based on gender?
How does the unit of study level affect the grades earned by the students in that level?
How many students failed in each unit of study?

4 Preparation

# Load packages (tidyverse includes dplyr, ggplot2 and stringr)
library(tidyverse)
library(stringr)
library(gghighlight)

# Read the data, then order grades from best to worst and fix the facet order of study levels
unit.ad <- read.csv("Student_data_advanced.csv")
unit.ad$Unit.of.Study.Grade <- factor(unit.ad$Unit.of.Study.Grade, levels = c("HD","DI", "CR", "PS", "FA"))
unit.ad$Unit.of.Study.Level <- factor(unit.ad$Unit.of.Study.Level, levels = c("Mainstream", "Fundamental", "Advanced"))

5 Data content

unit.ad %>% head()

##   Student.Identifier Year Domestic.Intl Gender      Mode   Age
## 1          289703333 2014             I      F Part time 19-21
## 2          188174541 2017             D      M Full time 19-21
## 3           41036478 2016             D      F Full time 19-21
## 4           29921215 2014             D      F Full time 19-21
## 5          198560236 2013             D      F Full time 19-21
## 6          268800920 2012             D      F Full time 19-21
##   Unit.of.Study Unit.of.Study.Level Unit.of.Study.Grade
## 1        Unit B          Mainstream                  CR
## 2        Unit E          Mainstream                  DI
## 3        Unit A          Mainstream                  DI
## 4        Unit E          Mainstream                  PS
## 5        Unit C          Mainstream                  PS
## 6        Unit B          Mainstream                  PS

This data set contains 64,486 anonymised student grades for 14 first year Maths units of study from 2012 to 2017. The data provided is real data from the University’s student system and has been provided by Institutional Analytics and Planning, the department of the University responsible for student data reporting and analysis.

Student.Identifier: Unique number of each student
Year: The academic year in which the unit of study was run. This is an integer between 2012 and 2017. Both semesters 1 and 2 have been combined together.
Domestic/Intl: Whether the student is a domestic or an international student. “D” denotes a domestic student and “I” denotes an international student.
Gender: The gender of the student. “M” denotes male students and “F” denotes female students. To preserve anonymity, students who identify as neither male nor female have been coded as female. This is the same approach that is used by the Department of Education when reporting aggregated student statistics: see the notes on ‘Gender’ at http://highereducationstatistics.education.gov.au/DataNotes.aspx
Mode: Whether the student is full time or part time. Full time is defined as taking 18 or more credit points in the semester in which the student took the unit of study.
Age: The age of the student at the time that they undertook the unit of study. This is reported as one of four bands:
- 18 and under
- 19-21
- 22-25
- Over 25
Unit of Study: The “name” of the unit of study. To preserve anonymity, this is not the actual unit of study code for the unit but rather a made-up identifier such as “Unit A”, “Unit B” and so forth. You can assume that each of these identifiers relates to a junior unit of study offered by the School of Mathematics and Statistics between 2012 and 2017.
Unit of Study Level: The level of the unit of study - either fundamental, mainstream or advanced.
Unit of Study Grade: The final grade achieved by the student in the unit of study. Grade Interpretation:
- FA: Fail (0-49)
- PS: Pass (50-64)
- CR: Credit (65-74)
- DI: Distinction (75-84)
- HD: High Distinction (85-100)
- Grades other than these five descriptors (for example, discontinuations, withdrawals, absent fails and the like) have been removed from the dataset to preserve anonymity.
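The grade interpretation above can be written as a small lookup from marks to grade codes. This is a minimal sketch, assuming whole-number marks between 0 and 100. The helper `mark_to_grade` is hypothetical and was not part of our analysis, because the data set only contains grades, not marks.

```r
# Hypothetical helper (not part of the original analysis): map whole-number
# marks to the grade codes used in Unit.of.Study.Grade.
mark_to_grade <- function(mark) {
  if (any(mark < 0 | mark > 100, na.rm = TRUE)) {
    stop("marks must lie between 0 and 100")
  }
  # Intervals are closed on the right: (-Inf, 49] -> FA, (49, 64] -> PS, ...
  grade <- cut(mark,
               breaks = c(-Inf, 49, 64, 74, 84, 100),
               labels = c("FA", "PS", "CR", "DI", "HD"),
               right = TRUE)
  # Use the same level order as the report: best grade first
  factor(as.character(grade), levels = c("HD", "DI", "CR", "PS", "FA"))
}

mark_to_grade(c(0, 49, 50, 64, 65, 74, 75, 84, 85, 100))
```

The boundary marks 49/50, 64/65, 74/75 and 84/85 are the cases most worth checking, which is why the example call lists them.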

6 Analysis

6.1 Difficulty difference

# Number of enrolments in each unit, split by grade and faceted by study level
unit.ad %>% 
  ggplot(aes(Unit.of.Study, fill = Unit.of.Study.Grade)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
  scale_fill_discrete(name="Grade",
                      breaks=c("HD", "DI", "CR", "PS", "FA"),
                      labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)", 
                               "Pass (50-64)", "Fail (0-49)")) +
  labs(x = "Unit of Study") +
  facet_grid(~Unit.of.Study.Level, scales = "free")

# Grey bars: all enrolments per unit; red bars overlaid: enrolments graded FA
# (no fill aesthetic is mapped, so no fill scale is needed)
ggplot() +
  geom_bar(data = unit.ad,
           aes(Unit.of.Study), fill = "grey")  +
  geom_bar(data = unit.ad %>% filter(Unit.of.Study.Grade == "FA"),
           aes(Unit.of.Study), fill = "red")  +
  theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
  labs(x = "Unit of Study", title = "How many students failed?") +
  facet_grid(~Unit.of.Study.Level, scales = "free")

# Proportion of each grade within each unit; the y axis is shown as a percentage
unit.ad %>% 
  ggplot(aes(Unit.of.Study, fill = Unit.of.Study.Grade)) +
  geom_bar(position = "fill") +
  theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
  scale_fill_discrete(name="Grade",
                      breaks=c("HD", "DI", "CR", "PS", "FA"),
                      labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)", 
                               "Pass (50-64)", "Fail (0-49)")) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Unit of Study", y = "Percentage (%)") +
  facet_grid(~Unit.of.Study.Level, scales = "free")

# Same proportions, with only the FA share highlighted in red
unit.ad %>% 
  ggplot(aes(Unit.of.Study, fill = Unit.of.Study.Grade)) +
  geom_bar(position = "fill", show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
  scale_fill_manual(values = c(HD = "grey", DI = "grey", CR = "grey", PS = "grey", FA = "red")) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Unit of Study", title = "How many students failed? (%)", y = "Percentage (%)") +
  facet_grid(~Unit.of.Study.Level, scales = "free")

It’s not surprising that more students are enrolled in mainstream maths. Maths is not everybody’s passion, and many students study it simply because their degree requires it. What is interesting is that in both the mainstream and fundamental levels, which have higher percentages of fails and passes, grades are spread fairly evenly across all five bands.
In contrast, students enrolled in the advanced level tend to get better marks, with more than 50% receiving a distinction or a high distinction. A plausible explanation is that students in the advanced stream are the ones with a real passion for maths, resulting in better understanding and, in turn, better grades.
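The "more than 50% distinction or above" reading can also be checked numerically rather than by eye from the stacked bars. This is a minimal sketch on made-up toy data. The helper `share_di_hd` is hypothetical. On the report's data it would be called as `share_di_hd(unit.ad$Unit.of.Study.Level, unit.ad$Unit.of.Study.Grade)`.

```r
# Hypothetical helper: share of enrolments graded DI or HD within each level.
share_di_hd <- function(level, grade) {
  top <- grade %in% c("HD", "DI")   # TRUE for a distinction or better
  sapply(split(top, level), mean)   # mean of a logical vector = proportion
}

# Made-up toy data, for illustration only
toy <- data.frame(
  level = c("Advanced", "Advanced", "Advanced",
            "Mainstream", "Mainstream", "Mainstream", "Mainstream"),
  grade = c("HD", "DI", "CR", "PS", "FA", "DI", "CR")
)
share_di_hd(toy$level, toy$grade)
```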

6.2 Grade difference by mode

6.2.1 Plots

# Proportion of each grade by mode (full time vs part time), per study level
unit.ad %>% 
  ggplot(aes(Mode, fill = Unit.of.Study.Grade)) + 
  geom_bar(position = "fill") +
  scale_fill_discrete(name="Grade",
                      breaks=c("HD", "DI", "CR", "PS", "FA"),
                      labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)", 
                               "Pass (50-64)", "Fail (0-49)")) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percentage (%)") +
  facet_grid(~Unit.of.Study.Level, scales = "free")

Across all study levels, part-time students have a higher percentage of fails than full-time students. One of the most notable results is that, at the advanced level, over 50% of students score 75 or above (a distinction or a high distinction). This is not surprising, as these are likely the most passionate maths students.
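The fail-rate comparison between modes can be tabulated directly, giving one number per level and mode instead of reading bar heights. This is a minimal sketch on made-up toy data. The helper `fail_rate_table` is hypothetical. With the report's data the arguments would be `unit.ad$Unit.of.Study.Level`, `unit.ad$Mode` and `unit.ad$Unit.of.Study.Grade`.

```r
# Hypothetical helper: fail rate (share of FA grades) for every
# combination of study level (rows) and mode (columns).
fail_rate_table <- function(level, mode, grade) {
  tapply(grade == "FA", list(level, mode), mean)
}

# Made-up toy data, for illustration only
toy <- data.frame(
  level = c("Mainstream", "Mainstream", "Mainstream", "Mainstream", "Advanced", "Advanced"),
  mode  = c("Full time", "Full time", "Part time", "Part time", "Full time", "Part time"),
  grade = c("PS", "FA", "FA", "FA", "HD", "CR")
)
fail_rate_table(toy$level, toy$mode, toy$grade)
```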

7 Gender Difference

# Number of enrolments in each grade by gender (counts, not percentages)
unit.ad %>% 
  ggplot(aes(Gender, fill = Unit.of.Study.Grade)) +
  geom_bar() +
  scale_fill_discrete(name="Grade",
                      breaks=c("HD", "DI", "CR", "PS", "FA"),
                      labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)", 
                               "Pass (50-64)", "Fail (0-49)")) +
  labs(y = "Count") +
  facet_grid(~Unit.of.Study.Level, scales = "free")

# Proportion of each grade by gender, per study level
unit.ad %>% 
  ggplot(aes(Gender, fill = Unit.of.Study.Grade)) +
  geom_bar(position = "fill") +
  scale_fill_discrete(name="Grade",
                      breaks=c("HD", "DI", "CR", "PS", "FA"),
                      labels=c("High Distinction (85-100)", "Distinction (75-84)", "Credit (65-74)", 
                               "Pass (50-64)", "Fail (0-49)")) +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Percentage (%)") +
  facet_grid(~Unit.of.Study.Level, scales = "free")

Those graphs show that

8 Z testing (5% Significance Level)

8.1 Introduction

# Share of FA grades (red) by mode, all study levels combined
unit.ad %>% 
  ggplot(aes(Mode, fill = Unit.of.Study.Grade)) + 
  geom_bar(position = "fill", show.legend = FALSE) +
  scale_fill_manual(values = c(HD = "grey", DI = "grey", CR = "grey", PS = "grey", FA = "red")) +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Percentage of Fails", y = "Percentage (%)")

As this graph shows, the percentage of fails differs between full-time and part-time students, so we decided to test the difference between these two categories. Specifically, we wanted to compare mean grade points. To calculate them, we assigned each grade a representative point value (approximately the midpoint of its mark band):

High Distinction (85-100): 93 points
Distinction (75-84): 80 points
Credit (65-74): 70 points
Pass (50-64): 57 points
Fail (0-49): 25 points
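These point values coincide with the midpoint of each mark band, rounded half up. The sketch below only checks that arithmetic. It is not necessarily how the values were originally chosen.

```r
# Mark bands from the grade interpretation in Section 5
lower <- c(HD = 85, DI = 75, CR = 65, PS = 50, FA = 0)
upper <- c(HD = 100, DI = 84, CR = 74, PS = 64, FA = 49)

midpoint <- (lower + upper) / 2   # 92.5, 79.5, 69.5, 57, 24.5
# Round half up; base round() follows IEC 60559 ("go to the even digit"),
# which would turn 92.5 into 92 and 24.5 into 24
grade_point <- floor(midpoint + 0.5)
grade_point
```

The explicit `floor(x + 0.5)` is used because R's `round()` would not reproduce the 93 and 25 in the list above.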

# Convert each grade to its representative grade point
population <- unit.ad %>% 
  mutate(Grade.point = 
            case_when(
              Unit.of.Study.Grade == "CR" ~ 70,
              Unit.of.Study.Grade == "DI" ~ 80,
              Unit.of.Study.Grade == "FA" ~ 25,
              Unit.of.Study.Grade == "HD" ~ 93,
              Unit.of.Study.Grade == "PS" ~ 57))

# Part-time students are treated as the sample
sample <- population %>% 
  filter(Mode == "Part time")

sample.size <- 
  sample %>% nrow()

sample.mean <- 
  sample$Grade.point %>% mean()

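# Summary statistics for the whole data set (treated as the population)
population.size <- 
  population %>% nrow()

population.mean <- 
  population$Grade.point %>% mean()

population.sd <- population$Grade.point %>% sd()

# Grey: all students; red: part-time students. Vertical lines mark each group's mean.
# binwidth = 5 keeps each grade point (25, 57, 70, 80, 93) in its own bin
ggplot() + 
  geom_histogram(data = population, aes(Grade.point), binwidth = 5, fill = "grey") +
  geom_histogram(data = sample, aes(Grade.point), binwidth = 5, fill = "red") +
  geom_vline(xintercept = population.mean, colour = "black") +
  geom_vline(xintercept = sample.mean, colour = "red") +
  labs(title = "Difference in mean grade points")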

In this case, we consider the whole dataset as a population and part time students’ dataset as a sample.

mean(population$Grade.point)

## [1] 62.72698

mean(sample$Grade.point)

## [1] 56.47816
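Because the whole data set is treated as the population, the part-time students are included in it. The population mean is therefore a weighted average of the part-time and full-time means. Below is a toy check of that identity, using made-up grade points rather than the report's data.

```r
# Made-up grade points, for illustration only
part_time <- c(25, 57)
full_time <- c(70, 80, 93, 57)
overall   <- c(part_time, full_time)

p <- length(part_time) / length(overall)   # share of part-time students
weighted <- p * mean(part_time) + (1 - p) * mean(full_time)

# The overall mean equals the weighted average of the two group means
all.equal(mean(overall), weighted)
```

A consequence is that if the part-time mean lies below the overall mean, the full-time mean must lie above it.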

8.2 Hypothesis

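Null hypothesis: µ = 62.72698 (µ is the mean grade point of part-time students; it equals the overall mean)
Alternative hypothesis: µ < 62.72698 (part-time students have a lower mean grade point)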

8.3 Test Statistic

# One-sample z statistic: (sample mean - population mean) / standard error
Z <- (sample.mean - population.mean) / (population.sd / sqrt(sample.size))
Z

## [1] -17.34057
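The statistic above is the one-sample z statistic. It treats the standard deviation of the whole data set (population.sd) as the known population standard deviation σ, with n = sample.size part-time enrolments. Substituting the reported means shows the standard error implied by the output. This value is back-calculated from the printed figures, not read from the R output:

$$
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{56.47816 - 62.72698}{\sigma / \sqrt{n}} = -17.34057
\qquad \Longrightarrow \qquad
\frac{\sigma}{\sqrt{n}} = \frac{6.24882}{17.34057} \approx 0.3604
$$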

From the standard normal table, the critical value for a one-sided (lower-tail) test at the 5% significance level is z = -1.64.

Z = -17.34057 < -1.64

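The Z score falls in the rejection region (below the critical value of -1.64), so the null hypothesis is rejected at the 5% significance level.
Based on this statistical evidence, we conclude that the mean grade point of part-time students is lower than the overall mean. Since the overall mean is a weighted average of the part-time and full-time means, the full-time mean is also higher than the part-time mean.

Part-time students tend to receive lower grade points.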
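Instead of looking up the critical value in a table, it can be computed with base R's `qnorm()`, and `pnorm()` gives the one-sided p-value. This is a minimal sketch. The function name `lower_tail_z_test` is hypothetical, and the input is the Z value reported in Section 8.3.

```r
# Lower-tail one-sample z test: compare an observed Z with the critical value
# and report the one-sided p-value P(Z <= z) under the null hypothesis.
lower_tail_z_test <- function(z, alpha = 0.05) {
  critical <- qnorm(alpha)          # about -1.645 when alpha = 0.05
  list(z = z,
       critical = critical,
       p_value = pnorm(z),          # lower-tail probability
       reject = z < critical)
}

res <- lower_tail_z_test(-17.34057)   # Z statistic from Section 8.3
res$critical
res$reject
```

The exact critical value (about -1.645) agrees with the table value of -1.64. With Z around -17, the p-value is vanishingly small, so the conclusion does not depend on the rounding.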

DATA1001 Project3 Team B

Team B

6/1/2018
