This HTML file was created for my assessment. First, we need to install packages, load libraries, and create several folders for our project. Install the packages (if they are not already available in R):
* install.packages("tidyverse")
* install.packages("readr")
* install.packages("gapminder")
* install.packages("here")
* devtools::install_github("hadley/ggplot2")
* devtools::install_github("hrbrmstr/hrbrthemes")
* devtools::install_github("ropensci/plotly")
* devtools::install_github("hrbrmstr/markdowntemplates")

Load libraries:

library(readr)
library(ggplot2)

In this part, we work with the Student Survey data set. First, read the data set from StudentSurvey.csv:

SS = read_csv("StudentSurvey.csv.csv")
## Parsed with column specification:
## cols(
##   Year = col_character(),
##   Gender = col_character(),
##   Smoke = col_character(),
##   Award = col_character(),
##   HigherSAT = col_character(),
##   Exercise = col_double(),
##   TV = col_double(),
##   Height = col_double(),
##   Weight = col_double(),
##   Siblings = col_double(),
##   BirthOrder = col_double(),
##   VerbalSAT = col_double(),
##   MathSAT = col_double(),
##   SAT = col_double(),
##   GPA = col_double(),
##   Pulse = col_double(),
##   Piercings = col_double()
## )

Check the dataset structure

  str(SS)

Factorize variables

Several variables need recoding so that R treats them as categorical variables. We will convert the following variables to factors: Year, Smoke, Gender, Award, HigherSAT, BirthOrder, Siblings.

 SS$Year=ifelse(SS$Year == "FirstYear", 1, 
                      ifelse(SS$Year == "Sophomore", 2, 
                             ifelse(SS$Year == "Junior", 3, 
                                    ifelse(SS$Year == "Senior", 4, NA))))                            
  SS$Year=ordered(SS$Year, 
                    levels = c(1,2,3,4),
                    labels = c("Freshman","Sophomore","Junior","Senior"))
                     
  
  SS$Gender=ifelse(SS$Gender == "F", 1, ifelse(SS$Gender == "M", 2, NA))
  SS$Gender=factor(SS$Gender,
                      levels = c(1,2),
                      labels = c("Female", "Male"))

  SS$Smoke=ifelse(SS$Smoke == "Yes", 1, ifelse(SS$Smoke == "No", 2, NA))
  SS$Smoke=factor(SS$Smoke,
                      levels = c(1,2),
                      labels = c("Yes", "No"))
  
  SS$Award=ifelse(SS$Award == "Academy", 1, ifelse(SS$Award == "Nobel", 2, 
                                                       ifelse(SS$Award == "Olympic", 3, NA)))
  SS$Award=factor(SS$Award,
                      levels = c(1,2,3),
                      labels = c("Academy", "Nobel", "Olympic"))
  
  SS$HigherSAT=ifelse(SS$HigherSAT == "Math", 1, ifelse(SS$HigherSAT == "Verbal", 2, NA))
  SS$HigherSAT=factor(SS$HigherSAT,
                          levels = c(1,2),
                          labels = c("Math", "Verbal"))
  

  # NA is not listed as a level: factor() and ordered() exclude it by default
  SS$BirthOrder=ordered(SS$BirthOrder, levels = c(1,2,3,4,5,6,7,8))
  
  SS$Siblings=ordered(SS$Siblings, levels = c(0,1,2,3,4,5,6,7,8))
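
The nested ifelse() calls above first map each string to a number and then attach labels. A more compact alternative is to pass the original strings straight to factor() as levels. The sketch below uses a small made-up vector (year_raw is a hypothetical stand-in for SS$Year, not the real data) and assumes the same four category names as above. Any value that is not in levels becomes NA, which matches the NA fallback of the ifelse() version.

```r
# Hypothetical stand-in for SS$Year; the real column comes from the CSV file
year_raw <- c("Junior", "FirstYear", "Senior", "Sophomore", NA, "Other")

# Map the raw strings to an ordered factor in one step.
# Values not listed in `levels` (here "Other") and missing values become NA.
year <- factor(year_raw,
               levels  = c("FirstYear", "Sophomore", "Junior", "Senior"),
               labels  = c("Freshman", "Sophomore", "Junior", "Senior"),
               ordered = TRUE)

# Frequency table, including the NA count
table(year, useNA = "ifany")
```

The same pattern would apply to Gender, Smoke, Award and HigherSAT with ordered = FALSE.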

The question is the relationship between the combined SAT score (SAT) and the college grade point average (GPA). Since this data set is too large to handle comfortably in other software such as Excel, R is more suitable for the task. Specifically, we use R to plot the data and examine the relationship between the SAT score and the GPA of students on campus. A scatter plot is particularly useful for studying the relationship between two variables, and it is common to convey more information using colors or shapes.

Figure 1 shows the relationship between the SAT score and the GPA of students on campus. The results show that this relationship is weak. A student’s SAT score is of limited use in predicting that student’s GPA. Conversely, whether a student’s GPA is high or low says little about that student’s earlier SAT score. The correlation coefficient between SAT score and GPA is only about 0.37. The chart also shows little difference in SAT scores between male and female students. However, in terms of GPA, female students seem to perform better than their male counterparts.
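
The text does not show how the coefficient of about 0.37 was obtained. One way to compute such a value is cor() with use = "complete.obs", which drops rows where either variable is missing. The sketch below assumes a Pearson correlation (the default of cor()) and uses a small made-up data frame, toy, as a hypothetical stand-in for SS, so its result is not the value reported above.

```r
# Hypothetical toy data standing in for SS; not the real survey values
toy <- data.frame(
  SAT = c(1100, 1250, 1300, NA, 1400, 1150),
  GPA = c(3.0, 3.4, 2.9, 3.1, NA, 3.3)
)

# Pearson correlation on the rows where both SAT and GPA are present
r <- cor(toy$SAT, toy$GPA, use = "complete.obs")

# Rounded to two decimals, as it would be shown in the plot annotation
round(r, 2)
```

With the real data, the same call on SS$SAT and SS$GPA could feed the label in annotate() instead of a hard-coded number.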

Figure_1 <- ggplot(SS, aes(x = SAT, y = GPA)) + 
                geom_point(aes(color = Gender), alpha = 0.9) + 
                geom_smooth(method = "loess", 
                            se = TRUE, 
                            color = "#e7cdac", 
                            size = 0.6) + 
                xlim(c(600, 2000)) + 
                ylim(c(0, 4)) + 
                labs(y = "GPA", x = "SAT", 
                     title = "Figure 1. The correlation between students' SAT score\nand college GPA", 
                     subtitle = "A high SAT score does not mean you will get a high GPA in your college", 
                     caption = "Source: DataViz with R. 2019. 'The StudentSurvey data set'") +
                annotate(x = 1650, y = 1.3, 
                     label = paste("Cor = ", 0.37), 
                     geom = "text", size = 5, colour = "gray40") +
                theme(plot.title = element_text(size = 15),
                      plot.subtitle = element_text(size = 11))

  Figure_1 
## Warning: Removed 17 rows containing non-finite values (stat_smooth).
## Warning: Removed 17 rows containing missing values (geom_point).
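
The two warnings mean that some rows were left out of the plot. ggplot2 drops a row when SAT or GPA is missing, and xlim() and ylim() also turn values outside the given limits into missing values. The sketch below counts such rows on a small made-up data frame (toy and in_range are hypothetical names used for illustration only), using the same limits as Figure 1. It is an assumption that these two causes account for the 17 rows in the real data.

```r
# Hypothetical toy data standing in for SS; not the real survey values
toy <- data.frame(
  SAT = c(1100, NA, 2100, 1300, 550),
  GPA = c(3.0, 3.2, 3.5, NA, 2.8)
)

# TRUE when a value is present and lies inside [lo, hi]
in_range <- function(x, lo, hi) !is.na(x) & x >= lo & x <= hi

# A row is dropped when SAT or GPA is missing or outside the axis limits
dropped <- !(in_range(toy$SAT, 600, 2000) & in_range(toy$GPA, 0, 4))

# Number of rows that would not appear in the plot
sum(dropped)
```

If the limits should only zoom the view without discarding rows, coord_cartesian(xlim = ..., ylim = ...) is the usual ggplot2 alternative to xlim() and ylim().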