# import necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)
# read the dataset
setwd('C:/Users/ChotC/OneDrive/Documents/UW/Sem3/AdvRViz/0710/datahw1/')
df <- read.csv('exams.csv', header = TRUE, sep = ',')
# view the first rows of the data
head(df)
##   gender race.ethnicity parental.level.of.education        lunch
## 1   male        group A                 high school     standard
## 2 female        group D            some high school free/reduced
## 3   male        group E                some college free/reduced
## 4   male        group B                 high school     standard
## 5   male        group E          associate's degree     standard
## 6 female        group D                 high school     standard
##   test.preparation.course math.score reading.score writing.score
## 1               completed         67            67            63
## 2                    none         40            59            55
## 3                    none         59            60            50
## 4                    none         77            78            68
## 5               completed         78            73            68
## 6                    none         63            77            76
1. Dataset description

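This dataset is 'Students Performance in Exams' by Aman Chauhan on kaggle.com. For each student it records gender, race/ethnicity, parental level of education, lunch type and whether a test preparation course was completed, together with the student's math, reading and writing scores.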
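To see how the student attributes are distributed, the categorical columns can be tabulated. This is a minimal sketch in base R. `describe_categories()` is a hypothetical helper name, and it assumes the column names that `read.csv()` produces for this file (for example `race.ethnicity` and `test.preparation.course`).

```r
# Minimal sketch (base R): count students in each level of every
# categorical column. describe_categories() is a hypothetical helper.
describe_categories <- function(data) {
  # keep only character/factor columns (the student attributes),
  # dropping the numeric score columns
  is_cat <- vapply(data, function(col) is.character(col) || is.factor(col),
                   logical(1))
  # one frequency table per categorical column
  lapply(data[is_cat], table)
}
```

Calling `describe_categories(df)` should return a named list with one frequency table per attribute, such as gender or lunch type.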

2. Scatter plots and comments
c1 <- ggplot(data = df, aes(x = reading.score, y = math.score)) +
  geom_point(aes(color = gender)) +
  geom_smooth(aes(color = gender), se = FALSE) +
  ggtitle("Relation between reading score and math score")
c1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

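In the first chart, students who score higher in reading also tend to score higher in math. In addition, the smoothed curve for male students lies above the curve for female students, so at the same reading score, male students tend to score higher in math than female students do.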
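Both observations can be checked numerically. The sketch below uses base R: a Pearson correlation for the overall trend and a linear model with a gender term for the gap between the curves. Note that this swaps the loess curves drawn by `geom_smooth()` for a straight-line model, which assumes a constant gap between genders; `reading_math_summary()` is a hypothetical helper name.

```r
# Minimal sketch (base R): quantify the two observations about chart c1.
# Assumption: a linear model is only a rough stand-in for the loess curves,
# so it summarises the average trend and a constant gender gap.
reading_math_summary <- function(data) {
  # Pearson correlation between reading and math scores
  r <- cor(data$reading.score, data$math.score)
  # math ~ reading + gender: the gender coefficient estimates the average
  # math-score gap between genders at the same reading score
  fit <- lm(math.score ~ reading.score + gender, data = data)
  list(correlation = r, coefficients = coef(fit))
}
```

With `reading_math_summary(df)`, a positive `gendermale` coefficient would support the observation that male students score higher in math at a given reading score. The coefficient carries that name because `lm()` treats "female", the first level alphabetically, as the baseline.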

c2 <- ggplot(data = df, aes(x = writing.score, y = math.score)) +
  geom_point(aes(color = lunch)) +
  geom_smooth(aes(color = lunch), se = FALSE) +
  ggtitle("Relation between writing score and math score")
c2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The second chart shows the same trend: students who score higher in writing tend to score higher in math. There are more blue dots (standard lunch) in the top right corner and more red dots (free/reduced lunch) in the bottom left corner. This suggests that students with a standard lunch, who are likely to come from a better-off financial background, perform better in both math and writing.
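The visual impression of more blue dots in the top right can be backed up by comparing average scores per lunch type. This is a minimal base R sketch; `mean_scores_by_group()` is a hypothetical helper, and it assumes the score column names produced by `read.csv()` above.

```r
# Minimal sketch (base R): mean score per level of a grouping column.
# mean_scores_by_group() is a hypothetical helper name.
mean_scores_by_group <- function(data, group) {
  score_cols <- c("math.score", "reading.score", "writing.score")
  # one row per level of `group`, one column per mean score
  aggregate(data[score_cols], by = list(group = data[[group]]), FUN = mean)
}
```

Running `mean_scores_by_group(df, "lunch")` gives one row per lunch type. Higher means for the standard-lunch row would be consistent with the chart, although averages alone cannot confirm the financial explanation, which is only suggested by the lunch type.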

c3 <- ggplot(data = df, aes(x = writing.score, y = reading.score)) +
  geom_point(aes(color = test.preparation.course)) +
  geom_smooth(aes(color = test.preparation.course), se = FALSE) +
  ggtitle("Relation between writing score and reading score")
c3
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

In the third chart, the point cloud is narrower than in the previous two charts. This means that the correlation between writing and reading is stronger than the correlation between reading and math or between writing and math. The chart also follows the same trend: students who are good at writing tend to be better at reading. In addition, there are more red dots in the top right corner and fewer red dots in the bottom left corner. This means that students who completed the preparation course tend to perform better in writing and reading than students who did not. However, for students in the middle of the chart, who are average in both reading and writing, it is not clear that completing the preparation course benefits them.
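The claim that a narrower point cloud means a stronger correlation can be checked directly by computing the correlation matrix of the three scores. The sketch uses base R `cor()`; `score_correlations()` is a hypothetical helper name.

```r
# Minimal sketch (base R): Pearson correlation matrix of the three scores.
# For a positive trend, a narrower point cloud corresponds to a
# correlation closer to 1.
score_correlations <- function(data) {
  scores <- data[c("math.score", "reading.score", "writing.score")]
  # rounded to two decimals for readability
  round(cor(scores), 2)
}
```

If the reading of chart c3 is right, `score_correlations(df)` should show the reading/writing entry as the largest off-diagonal value.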