# import necessary libraries
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
# read datasets
setwd('C:/Users/ChotC/OneDrive/Documents/UW/Sem3/AdvRViz/0710/datahw1/')
# dec = ',' is dropped: it clashed with sep = ',' and the scores are integers
df <- read.csv('exams.csv', header = TRUE, sep = ',')
# View head of the data
head(df)
## gender race.ethnicity parental.level.of.education lunch
## 1 male group A high school standard
## 2 female group D some high school free/reduced
## 3 male group E some college free/reduced
## 4 male group B high school standard
## 5 male group E associate's degree standard
## 6 female group D high school standard
## test.preparation.course math.score reading.score writing.score
## 1 completed 67 67 63
## 2 none 40 59 55
## 3 none 59 60 50
## 4 none 77 78 68
## 5 completed 78 73 68
## 6 none 63 77 76
Dataset description
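This dataset is ‘Students Performance in Exams’ by Aman Chauhan on kaggle.com. It includes student information such as gender, race/ethnicity, parental level of education, lunch type, and test preparation course status, together with each student's math, reading, and writing scores.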
c1 <- ggplot(data = df, aes(y = math.score, x = reading.score)) +
  geom_point(aes(color = gender)) +
  geom_smooth(aes(color = gender), se = FALSE) +
  ggtitle("Relation between reading score and math score")
c1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
In the first chart, students with higher reading scores tend to have higher math scores. At the same reading score, male students tend to score higher in math than female students.
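One way to check the gender gap at a fixed reading score is to fit a simple additive linear model. The sketch below is an assumption of this write-up, not the method used for the chart (which uses a loess smoother). It runs on the six rows printed by `head(df)` only so that it is self-contained; six rows are far too few to draw conclusions, and the full `df` should be passed in for real use. The names `sample_rows` and `fit` are hypothetical.

```r
# Six rows copied from the head(df) output above; replace with the full `df`.
sample_rows <- data.frame(
  gender        = c("male", "female", "male", "male", "male", "female"),
  reading.score = c(67, 59, 60, 78, 73, 77),
  math.score    = c(67, 40, 59, 77, 78, 63)
)

# Additive model: math score as a function of reading score and gender.
# The `gendermale` coefficient estimates the male-female difference in
# math score at the same reading score (female is the reference level).
fit <- lm(math.score ~ reading.score + gender, data = sample_rows)
coef(fit)
```

A positive `gendermale` coefficient on the full data would be consistent with the pattern seen in the chart.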
c2 <- ggplot(data = df, aes(y = math.score, x = writing.score)) +
  geom_point(aes(color = lunch)) +
  geom_smooth(aes(color = lunch), se = FALSE) +
  ggtitle("Relation between writing score and math score")
c2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
In the second chart, the same trend appears: students with higher writing scores tend to have higher math scores. There are more blue dots in the top right corner and more red dots in the bottom left corner, which suggests that students with a standard lunch, who likely come from a better financial background, perform better in both math and writing.
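The visual impression of the dots can be backed by group means per lunch type. The following is a minimal sketch using base R's `aggregate()`. It uses only the six rows printed by `head(df)` so that it runs on its own, and the names `sample_rows` and `lunch_means` are hypothetical; on the real data, pass the full `df` instead.

```r
# Six rows copied from the head(df) output above; replace with the full `df`.
sample_rows <- data.frame(
  lunch         = c("standard", "free/reduced", "free/reduced",
                    "standard", "standard", "standard"),
  math.score    = c(67, 40, 59, 77, 78, 63),
  writing.score = c(63, 55, 50, 68, 68, 76)
)

# Mean math and writing score for each lunch type.
lunch_means <- aggregate(
  cbind(math.score, writing.score) ~ lunch,
  data = sample_rows,
  FUN  = mean
)
lunch_means
```

Higher means in the `standard` row on the full data would support the reading of the chart.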
c3 <- ggplot(data = df, aes(y = reading.score, x = writing.score)) +
  geom_point(aes(color = test.preparation.course)) +
  geom_smooth(aes(color = test.preparation.course), se = FALSE) +
  ggtitle("Relation between writing score and reading score")
c3
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
In the third chart, the cloud of points is narrower than in the previous two charts. This means that the correlation between writing and reading is stronger than the correlation between reading and math or between writing and math. The chart also follows the same trend: students with higher writing scores tend to have higher reading scores. In addition, there are more red dots in the top right corner and fewer red dots in the bottom left corner, which suggests that students who completed the preparation course tend to perform better in writing and reading than students who did not complete it. However, for students in the middle of the chart, who are average in both reading and writing, it is not clear that completing the preparation course helps.
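The comparison of correlation strength across the three charts can be checked numerically with a correlation matrix. The sketch below assumes Pearson correlation is an appropriate measure and uses only the six rows printed by `head(df)` so that it is self-contained; the names `scores` and `score_cor` are hypothetical, and the full `df` should be used to compare the three pairs properly.

```r
# Six rows copied from the head(df) output above; on the full data use
# df[, c("math.score", "reading.score", "writing.score")] instead.
scores <- data.frame(
  math.score    = c(67, 40, 59, 77, 78, 63),
  reading.score = c(67, 59, 60, 78, 73, 77),
  writing.score = c(63, 55, 50, 68, 68, 76)
)

# Pairwise Pearson correlations between the three scores.
score_cor <- cor(scores, method = "pearson")
round(score_cor, 2)
```

If the reading of the charts is right, the `reading.score` and `writing.score` entry should be the largest off-diagonal value on the full data.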