# Problem 1: Base R Graph
data <- read.csv("StudentData.csv")
#install.packages("Hmisc")
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
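# Scatterplot of exam score against hours studied
plot(data$Hours_Studied, data$Exam_Score,
     main = "Exam Performance", xlab = "Hours Studied", ylab = "Exam Score",
     las = 1, col.main = "blue")
# Add minor tick marks between the major axis ticks (from Hmisc)
minor.tick(nx = 3, ny = 3, tick.ratio = 0.5)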

## For this graph, I am using the dataset StudentData.csv from Kaggle.com. This dataset contains over 6,000 observations of student test performance, with 19 possible explanatory variables including hours studied, gender, and physical activity. Here I use hours studied as the explanatory variable to see whether it is correlated with exam performance. I chose a scatterplot because it makes it easy to judge visually whether two numeric variables are related. The graph shows a positive correlation: students who study for longer tend to score higher on the exam. There are also quite a few noticeable outliers, mostly students who studied less but still received a high score.
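## The visual impression of a positive correlation can be checked numerically. The lines below are a minimal sketch, not part of the original analysis: `toy` and its values are made up for illustration, and only the column names match StudentData.csv. Swapping `toy` for `data` would apply the same check to the real dataset.
toy <- data.frame(
  Hours_Studied = c(5, 10, 15, 20, 25, 30),   # hypothetical values
  Exam_Score    = c(62, 64, 66, 67, 70, 72)   # hypothetical values
)
# Pearson correlation: values near +1 indicate a strong positive linear relationship
r_toy <- cor(toy$Hours_Studied, toy$Exam_Score)
# Simple linear regression of exam score on hours studied
fit_toy <- lm(Exam_Score ~ Hours_Studied, data = toy)
# Redraw the scatterplot and overlay the fitted line
plot(toy$Hours_Studied, toy$Exam_Score, xlab = "Hours Studied", ylab = "Exam Score", las = 1)
abline(fit_toy, col = "red")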
# Problem 2: ggplot
library(ggplot2)
# Bars are grouped by teacher quality, colored by peer influence, and faceted by gender.
# labs() takes `title`, not `main`; with `main` the title is ignored and a warning is printed.
ggplot(data, aes(x = Teacher_Quality, y = Exam_Score, fill = Peer_Influence)) +
  geom_col(position = "dodge") +
  labs(title = "Student Performance: School Factors", x = "Teacher Quality", y = "Exam Score", fill = "Peer Influence") +
  facet_wrap(~Gender) +
  theme(
    strip.text = element_text(face = "bold", color = "white", hjust = 0, size = 9),
    strip.background = element_rect(fill = "black", linetype = "solid", color = "grey", linewidth = 1),
    panel.border = element_rect(fill = "transparent", color = "black", linewidth = 0.5)
  )

## For this graph, I used the same dataset, StudentData.csv from Kaggle.com, to investigate whether the school environment (teacher quality and peer influence) has any effect on exam scores. I chose a bar chart because it makes comparisons across categorical variables easier to see. The x axis is teacher quality (High, Low, Medium), and each group is split by peer influence (Negative, Neutral, and Positive). I also faceted the graph by gender to see whether performance differs between genders. From the graph, females with positive peer influence and high teacher quality appear to perform better overall on their test scores.
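## One caveat on reading the bars: as far as I understand geom_col(), it draws one bar per row, so rows in the same teacher quality / peer influence / gender group are drawn on top of each other and the visible height is the highest score in that group rather than a typical one. The lines below are a minimal sketch of comparing group means instead, not part of the original analysis: `toy_school` and its values are made up, and only the column names match StudentData.csv.
toy_school <- data.frame(
  Teacher_Quality = c("High", "High", "Low", "Low", "High", "High", "Low", "Low"),
  Peer_Influence  = c("Positive", "Positive", "Negative", "Negative",
                      "Positive", "Positive", "Negative", "Negative"),
  Gender          = c("Female", "Female", "Female", "Female",
                      "Male", "Male", "Male", "Male"),
  Exam_Score      = c(70, 74, 60, 64, 68, 72, 61, 63)   # hypothetical values
)
# Mean exam score for each combination of teacher quality, peer influence, and gender
group_means <- aggregate(Exam_Score ~ Teacher_Quality + Peer_Influence + Gender,
                         data = toy_school, FUN = mean)
print(group_means)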
# Problem 3: Plotly
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:Hmisc':
##
## subplot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
a_data <- read.csv("athletedata.csv")
a_data2 <- a_data[a_data$Year >= 2000 & !is.na(a_data$Medal),]
# `frame = Year` is picked up by ggplotly() to animate the plot year by year
gg <- ggplot(a_data2, aes(x = Height, y = Weight, color = Medal, size = Age, frame = Year)) +
  geom_point() +
  labs(title = "Height and Weight of Olympic Medalists", color = "Medal Type") +
  scale_color_manual(values = c("Bronze" = "#CE8946", "Gold" = "gold2", "Silver" = "#C0C0C0"))
ggplotly(gg)
## For this graph, I chose to examine the height and weight of Olympic medalists from 2000 to 2016, using the dataset athletedata.csv from Kaggle.com. The animation shows that athletes' heights and weights are typically positively correlated. Across the years there is usually a wide spread among those who receive a medal, but in some years the spread of height and weight is smaller. For example, in 2006 heights fall between 150 and 200 cm and weights do not exceed 120 kg, while other years show a larger spread. To conclude, the graph shows no clear relationship between height or weight and medal type for Olympic medalists.
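## The conclusion about medal type can be cross-checked by comparing average height and weight per medal. The lines below are a minimal sketch, not part of the original analysis: `toy_medals` and its values are made up, and only the column names match athletedata.csv. Swapping `toy_medals` for `a_data2` would run the same summary on the real data (the formula interface of aggregate() drops rows with missing values by default).
toy_medals <- data.frame(
  Medal  = c("Gold", "Gold", "Silver", "Silver", "Bronze", "Bronze"),
  Height = c(180, 170, 175, 185, 178, 172),   # hypothetical values, cm
  Weight = c(75, 65, 70, 80, 74, 68)          # hypothetical values, kg
)
# Mean height and weight for each medal type; similar means across medals
# would be consistent with "no clear relationship"
medal_means <- aggregate(cbind(Height, Weight) ~ Medal, data = toy_medals, FUN = mean)
print(medal_means)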