library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
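# ggplot2 is already attached by library(tidyverse), so it is not loaded again here.
students <- read.csv("StudentsPerformance.csv")
# Preview the first rows instead of printing the whole data frame.
head(students)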
Graph 1: Base Graph
# In this graph, I want to investigate whether lunch status (standard vs free/reduced) is linked to differences in student performance. Specifically, I am checking if students with standard lunch score higher in math and reading compared to those with free or reduced lunch.
students$math.score <- as.numeric(students$math.score)
students$reading.score <- as.numeric(students$reading.score)
cols <- ifelse(students$lunch == "standard", "blue", "orange")
plot(students$math.score, students$reading.score,
     col = cols, pch = 19,
     xlab = "Math Score", ylab = "Reading Score",
     main = "Math vs Reading by Lunch Status")
legend("topleft", legend = c("Standard Lunch", "Free/Reduced Lunch"),
       col = c("blue", "orange"), pch = 19)

# This scatter plot suggests that students with standard lunch tend to score higher in both math and reading than students with free or reduced lunch. The free/reduced lunch group appears more spread out with lower averages, while the standard lunch group clusters more tightly at higher scores. Because lunch status often reflects family resources, this pattern suggests that family resources may be linked to how well students perform in these subjects.
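# As a rough numeric check on the pattern described above, the mean and standard deviation of a score can be computed per lunch group. The sketch below is base R only. The data frame `toy` is made up for illustration and the helper `summarise_by_group()` is hypothetical; neither comes from the original analysis, and the numbers are not results from StudentsPerformance.csv.

```r
# Toy data for illustration only; these are NOT values from StudentsPerformance.csv.
toy <- data.frame(
  lunch = c("standard", "standard", "standard",
            "free/reduced", "free/reduced", "free/reduced"),
  math.score = c(70, 80, 90, 50, 60, 85)
)

# Hypothetical helper: mean and standard deviation of one score column per group.
summarise_by_group <- function(data, score, group) {
  means <- tapply(data[[score]], data[[group]], mean)
  sds <- tapply(data[[score]], data[[group]], sd)
  data.frame(group = names(means),
             mean = as.numeric(means),
             sd = as.numeric(sds))
}

lunch_math <- summarise_by_group(toy, "math.score", "lunch")
lunch_math
```

# On the real data, the equivalent call would be summarise_by_group(students, "math.score", "lunch"). A larger sd for one group would back up the visual impression of more spread.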
Graph 2: ggplot Graph
# Here I want to investigate how completing a test preparation course affects the relationship between reading and writing scores. My goal is to see if preparation helps students score higher and show a stronger connection between these two skills.
ggplot(students, aes(x = reading.score, y = writing.score,
                     color = test.preparation.course)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Reading vs Writing by Test Preparation",
       x = "Reading Score", y = "Writing Score",
       color = "Test Preparation") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

# The graph suggests that students who completed a test preparation course tend to do better in both reading and writing. Their scores are higher overall, and the line of best fit appears to show a stronger relationship between the two subjects, although the steepness of a fitted line reflects slope rather than strength of association. Students without preparation have lower scores and more variation. This is consistent with the idea that test preparation helps students improve and keeps their performance more consistent, though the plot alone does not establish cause.
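# To separate "higher scores" from "stronger relationship", it can help to compute the fitted slope, the Pearson correlation, and the mean writing score for each preparation group. The sketch below uses base R with a made-up data frame `toy` and a hypothetical helper `fit_by_group()`; these names and numbers are assumptions for illustration and are not taken from StudentsPerformance.csv.

```r
# Toy data for illustration only; these are NOT values from StudentsPerformance.csv.
toy <- data.frame(
  test.preparation.course = rep(c("completed", "none"), each = 4),
  reading.score = c(60, 70, 80, 90, 50, 60, 70, 80),
  writing.score = c(62, 71, 83, 92, 45, 62, 66, 79)
)

# Hypothetical helper: per-group linear fit of y on x, plus correlation and mean of y.
fit_by_group <- function(data, x, y, group) {
  pieces <- split(data, data[[group]])
  rows <- lapply(names(pieces), function(g) {
    d <- pieces[[g]]
    fit <- lm(d[[y]] ~ d[[x]])            # same model geom_smooth(method = "lm") fits
    data.frame(group = g,
               slope = unname(coef(fit)[2]),  # steepness of the fitted line
               r = cor(d[[x]], d[[y]]),       # strength of the linear association
               mean_y = mean(d[[y]]))         # overall level of the y scores
  })
  do.call(rbind, rows)
}

fits <- fit_by_group(toy, "reading.score", "writing.score",
                     "test.preparation.course")
fits
```

# On the real data, the call would be fit_by_group(students, "reading.score", "writing.score", "test.preparation.course"). A group can have a higher mean without having a larger r, so the three columns answer different questions.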
Graph 3: Plotly Interactive Graph
# This graph investigates whether gender plays a role in performance differences between math and writing scores. I also want to explore how other factors like race/ethnicity, lunch status, and test preparation interact with gender when looking at student outcomes.
plot_ly(students, x = ~math.score, y = ~writing.score,
        color = ~gender, type = "scatter", mode = "markers",
        text = ~paste("Race/Ethnicity:", race.ethnicity,
                      "<br>Lunch:", lunch,
                      "<br>Test Prep:", test.preparation.course)) %>%
  layout(title = "Math vs Writing by Gender",
         xaxis = list(title = "Math Score"),
         yaxis = list(title = "Writing Score"))
# Based on this interactive plot, female students generally score higher in writing, while male students show more spread in math scores. The hover labels show each student's race/ethnicity, lunch status, and test preparation, which makes it possible to check whether these factors line up with the differences. This suggests gender is part of the picture, but other factors like preparation and background may also matter. The interactive feature makes it easier to see how these details connect for each student.
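# Hovering over points one at a time makes it hard to judge how gender combines with another factor. One option is a two-way table of mean scores. The sketch below uses base R's aggregate() and xtabs() on a made-up data frame `toy`; the values are assumptions for illustration and are not results from StudentsPerformance.csv.

```r
# Toy data for illustration only; these are NOT values from StudentsPerformance.csv.
toy <- data.frame(
  gender = rep(c("female", "male"), each = 4),
  test.preparation.course = rep(c("completed", "completed", "none", "none"), times = 2),
  writing.score = c(80, 84, 70, 74, 72, 76, 60, 64)
)

# Mean writing score for every gender x test preparation combination (long format).
cell_means <- aggregate(writing.score ~ gender + test.preparation.course,
                        data = toy, FUN = mean)

# Reshape to a gender-by-preparation table; each cell holds exactly one mean,
# so summing within a cell (what xtabs does) leaves the value unchanged.
mean_table <- xtabs(writing.score ~ gender + test.preparation.course,
                    data = cell_means)
mean_table
```

# On the real data, replacing toy with students gives the same layout. If the gap between genders is similar in both columns, preparation and gender look roughly additive; if it differs, the two factors may interact.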
Graph 4: Plotly Animation Graph
# In this animated graph, I want to investigate how parental education level relates to student performance in math and reading. The goal is to see if higher parental education is connected to stronger scores, and how race/ethnicity and test preparation relate to results within each education group.
students %>%
  plot_ly(x = ~math.score, y = ~reading.score,
          # Set the trace type and mode explicitly so plotly does not have to guess them.
          type = "scatter", mode = "markers",
          frame = ~parental.level.of.education,
          # No ids mapping: each student belongs to exactly one education level,
          # so no point needs tracking across frames, and gender is not a unique id.
          size = ~writing.score, color = ~race.ethnicity,
          text = ~paste("Gender:", gender,
                        "<br>Lunch:", lunch,
                        "<br>Test Prep:", test.preparation.course)) %>%
  layout(title = "Math vs Reading by Parental Education",
         xaxis = list(title = "Math Score"),
         yaxis = list(title = "Reading Score"))
## Warning: `line.width` does not currently support multiple values.
# The animation suggests that students whose parents have higher education levels, such as bachelor’s or master’s degrees, tend to score higher in math and reading. Students whose parents have less education tend to have lower averages and more variation. Within each frame, scores also vary with race/ethnicity and test preparation. This suggests that parental education matters, but that it works together with other factors in shaping student performance.
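# Comparing education levels is easier when they appear from lowest to highest rather than alphabetically, which is the default for a character column. The sketch below converts the column to an ordered factor and computes the median math score per level. The level names in `edu_levels` are an assumption about how the column is coded, and `toy` is made-up data, not values from StudentsPerformance.csv.

```r
# Assumed coding of the education column, ordered from lowest to highest.
edu_levels <- c("some high school", "high school", "some college",
                "associate's degree", "bachelor's degree", "master's degree")

# Toy data for illustration only, deliberately listed from highest to lowest level.
toy <- data.frame(
  parental.level.of.education = rep(rev(edu_levels), each = 2),
  math.score = c(80, 90, 75, 85, 70, 80, 65, 75, 60, 70, 55, 65)
)

# An ordered factor makes summaries follow the education ordering, not the alphabet.
toy$parental.level.of.education <- factor(toy$parental.level.of.education,
                                          levels = edu_levels, ordered = TRUE)

# Median math score per education level, returned in factor-level order.
medians <- tapply(toy$math.score, toy$parental.level.of.education, median)
medians
```

# On the real data, the same factor() call could be applied to students$parental.level.of.education before summarising, after checking unique() of the column against the assumed level names.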