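# Load the student performance data set
student_per <- read.csv("student_performance.csv")
View(student_per)  # open the data in the viewer (interactive sessions only)

# Draw a reproducible random sample of 2,000 students
set.seed(123)
reduced_students <- student_per[sample(nrow(student_per), 2000), ]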

Base R Plot

For this graph, I wanted to analyze the relationship between the number of hours students study per day and their final exam score. I hypothesize that the more hours students study, the better their exam score. I chose a scatter plot because it shows the correlation between hours studied and exam score directly, which makes trends and patterns easier to see and interpret.

plot(reduced_students$study_hours_per_day, reduced_students$exam_score,
     pch = 19,
     col = "blue",
     xlab = "Hours Studied",
     ylab = "Exam Score",
     main = "Exam Score vs Hours Studied")

This graph does in fact support my hypothesis that the more hours a student studies, the better their exam score will be. The scatter plot shows a positive relationship between these two variables, with each point representing an individual student. There is some variation, though. Some students study for only a few hours, or not at all, and still receive a high score. Along the same lines, some students study for a long time and never achieve the high exam score they may be striving for. While the number of study hours is a big indicator of exam grade, there may be other influential factors that must be considered, for example prior academic knowledge, stress levels, or even priorities at home.
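To put a number on the positive relationship seen in the scatter plot, one option is a correlation coefficient together with the slope of a simple linear fit. The sketch below is not part of my original analysis: `summarize_relationship` is a hypothetical helper name, and the small `demo` data frame is made up purely so the block runs on its own. With the real data, `reduced_students` would be passed in instead.

```r
# Hypothetical helper: Pearson correlation plus the slope of a simple linear fit.
summarize_relationship <- function(data, x, y) {
  keep <- complete.cases(data[, c(x, y)])      # drop rows with missing values
  d <- data[keep, ]
  fit <- lm(reformulate(x, response = y), data = d)
  list(
    n = nrow(d),
    correlation = cor(d[[x]], d[[y]]),         # Pearson's r
    slope = unname(coef(fit)[2])               # change in y per one-unit change in x
  )
}

# Made-up demo data (NOT the student data set), only to show the call.
demo <- data.frame(
  study_hours_per_day = c(1, 2, 3, 4, 5, 6),
  exam_score          = c(55, 62, 60, 71, 78, 85)
)
res <- summarize_relationship(demo, "study_hours_per_day", "exam_score")
```

A correlation near 1 with a positive slope would agree with what the plot suggests, while a value close to 0 would mean the visual trend is weaker than it looks.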


ggplot Plot

For this graph, I wanted to see how students’ stress levels differ based on major. Here at RMU, many of my peers believe that their major is very stressful. However, it’s very hard to compare levels of stress among a small group of people. My data captures 6 majors: Arts, Biology, Business, Computer Science, Engineering, and Psychology. I chose to create a side-by-side box plot to get a concise summary of the distributions of stress levels. With each major having its own box, I hope to better understand whether certain majors are more stressful than others.

library(ggplot2)
ggplot(reduced_students, aes(x = major, y = stress_level, fill = major)) +
  geom_boxplot(alpha = 0.8) +
  labs(title = "Student Stress Level by Major",
       x = "Major",
       y = "Stress Level") +
  scale_y_continuous(breaks = seq(0, 10, by = 2), limits = c(0, 10)) +
  theme(legend.position = "none")

My plot above shows the distributions of stress levels (rated 0-10) for 6 different majors. We can see right away that the median stress level for every major falls around 5. Psychology has the lowest median at around 4.9 and Business has the highest at around 5.1. While all of the medians are very similar to one another, some majors show more variation in stress levels than others. Arts, Business, and Computer Science majors have greater variability in stress levels among students. There is even an outlier on the Psychology box plot, suggesting that some students can experience great stress while others manage their stress pretty well. On the other hand, Engineering has one of the least varied distributions of stress level. Even though engineering is very difficult, this graph suggests that students in this field are able to manage their workload efficiently.
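The medians and spreads read off the box plot can also be computed directly. This is a minimal sketch rather than part of my original analysis: `spread_by_group` is a hypothetical helper name, and the `demo` data frame is invented so the block is self-contained. On the real data the call would be `spread_by_group(reduced_students, "major", "stress_level")`.

```r
# Hypothetical helper: median and interquartile range of `value` within each `group`.
spread_by_group <- function(data, group, value) {
  groups <- split(data[[value]], data[[group]])
  data.frame(
    group  = names(groups),
    median = vapply(groups, median, numeric(1), na.rm = TRUE),
    iqr    = vapply(groups, IQR, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Made-up demo data (NOT the student data set), only to show the call.
demo <- data.frame(
  major        = c("Arts", "Arts", "Arts", "Biology", "Biology", "Biology"),
  stress_level = c(2, 5, 8, 4, 5, 6)
)
out <- spread_by_group(demo, "major", "stress_level")
```

The interquartile range is the height of each box, so a larger `iqr` corresponds to the majors described above as having greater variability.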


Trelliscope Plot

For this plot, I wanted to see how study hours and motivation level affect students’ exam scores, and also how these factors differ across study locations. In my first Base R graph, we saw that there is a relationship between study hours and exam score. However, I wanted to analyze how other factors contribute to students’ exam scores. I know that in my day-to-day life, I perform better on exams if I study in a quiet place, whether that be the library or a designated quiet room. I was curious to see if this was the same for other students. To analyze this, I created a trelliscope scatter plot with study hours per day on the x-axis and exam score on the y-axis. Using trelliscope allowed me to facet by study environment and color by motivation level.

library(tidyr)
library(dplyr)
library(ggplot2)
library(trelliscopejs)

reduced_students %>%
  ggplot(aes(x = study_hours_per_day, y = exam_score, color = motivation_level)) +
  geom_point(alpha = 0.8) +
  labs(title = "Study Hours vs Exam Score by Study Environment",
       x = "Study Hours",
       y = "Exam Score",
       color = "Motivation Level") +
  facet_trelliscope(~ study_environment,
                    name = "Study Hours vs Exam Score",
                    desc = paste("Study Hours vs Exam Score Faceted by Study",
                                 "Environment and Colored by Motivation Level"),
                    nrow = 1,
                    ncol = 2,
                    scales = "same",
                    path = "trelliscope")

This trelliscope plot reinforces the idea that as study hours per day increase, so does exam score. But now we can see that motivation level also plays a critical role. In each panel, students with more motivation (the lighter points) generally perform better on exams than less motivated students, regardless of study hours. This makes sense because a motivated student is willing to put in extra time and energy to study for exams. Trelliscope was useful for seeing not only how motivation affected exam scores, but how study environment did as well. Students who study in libraries and quiet spaces generally perform better on exams than those who study in a cafe or dorm room. This may suggest that students studying in a cafe or dorm room are easily distracted, which makes their study hours less productive. Last but not least, the co-learning group shows a different trend. For motivated students, the co-learning environment is very beneficial in increasing exam scores. However, it is not beneficial for unmotivated students: the environment may distract them easily, so they study longer and perform worse on exams. Overall, this trelliscope plot illustrates that study habits, environment, and motivation interact in how they affect exam scores.
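One way to check the environment and motivation comparison numerically is to tabulate the mean exam score for each combination. The sketch below is an illustration, not part of my original analysis. It assumes `motivation_level` is numeric and splits it at the overall median, which is my own simplification; `score_by_env_and_motivation` is a hypothetical helper name, and the `demo` values are invented so the block runs by itself.

```r
# Hypothetical helper: mean exam score for each study environment, split into
# lower and higher motivation at the overall median motivation level.
# Assumes motivation_level is numeric.
score_by_env_and_motivation <- function(data) {
  cutoff <- median(data$motivation_level, na.rm = TRUE)
  data$motivation_group <- ifelse(data$motivation_level > cutoff, "higher", "lower")
  aggregate(exam_score ~ study_environment + motivation_group,
            data = data, FUN = mean)
}

# Made-up demo data (NOT the student data set), only to show the call.
demo <- data.frame(
  study_environment = rep(c("Library", "Cafe"), each = 4),
  motivation_level  = c(2, 3, 8, 9, 2, 3, 8, 9),
  exam_score        = c(60, 64, 80, 84, 50, 54, 70, 74)
)
tab <- score_by_env_and_motivation(demo)
```

If the gap between the "higher" and "lower" rows differs noticeably from one environment to another, that would be consistent with the interaction described above.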


Plotly Graph

For this plotly graph, I wanted to examine how stress levels and extracurricular participation affect exam scores. In the ggplot graph, we saw that stress levels were very similar across the different majors. I wanted to expand on this to see how stress levels and extracurricular activities relate to exam scores. This semester specifically, I had a lot on my plate in terms of extracurriculars: I worked part time, participated in clubs on campus, and played club volleyball. Even though being involved in all of these extracurriculars helped me stay motivated, it definitely increased my stress level. With this plotly graph, I will be able to analyze how exam performance relates to extracurricular participation (shown by color) and to stress level. I will also be able to hover over each point and see that student’s data, which is very useful when interpreting trends and relationships.

library(plotly)
library(dplyr)

reduced_students %>%
  plot_ly(x = ~stress_level, y = ~exam_score, color = ~extracurricular_participation,
          type = "scatter", mode = "markers",  # state the trace type explicitly
          hoverinfo = "text",
          text = ~paste("Major:", major, "<br>",
                        "Stress Level:", stress_level, "<br>",
                        "Exam Score:", exam_score)) %>%
  layout(title = "Stress Level vs Exam Score by Extracurricular Participation",
         xaxis = list(title = "Stress Level"),
         yaxis = list(title = "Exam Score"))

From this graph, I wanted to analyze how stress levels and extracurricular activities affect exam scores. Students have a very wide range of stress levels, and the points form a bit of an inverted “U” shape. Students with low stress tend not to perform well on exams, possibly due to low motivation. Most of the students with moderate stress levels tend to have higher exam scores, generally 85% and above. Students with high stress levels also tend to perform worse on exams, maybe because of the pressure they are putting on themselves. Students who participate in extracurriculars span the full range of exam scores, but there is a pattern that they tend to have higher exam scores. This could be due to better time management, or even to grade eligibility requirements for participating in their extracurricular. Using plotly allows us to see both of these trends at once while viewing each student’s specific stress level and exam score. It also lets us zoom in interactively to explore new trends and patterns in the data.
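The rise-then-fall pattern in exam scores across stress levels could be checked with a quadratic fit. This is only a sketch under my own assumptions and not something I ran for the analysis above: `quadratic_stress_fit` is a hypothetical helper name, and the `demo` values are invented so the block is self-contained.

```r
# Hypothetical helper: fit exam_score as a quadratic function of stress_level
# and return the coefficient on the squared term.
# A negative value is consistent with scores peaking at moderate stress.
quadratic_stress_fit <- function(data) {
  fit <- lm(exam_score ~ stress_level + I(stress_level^2), data = data)
  unname(coef(fit)["I(stress_level^2)"])
}

# Made-up demo data (NOT the student data set), only to show the call.
demo <- data.frame(
  stress_level = c(0, 2, 4, 5, 6, 8, 10),
  exam_score   = c(60, 75, 86, 88, 85, 74, 61)
)
curvature <- quadratic_stress_fit(demo)
```

On the real data, `quadratic_stress_fit(reduced_students)` returning a value near zero would suggest that the curved shape is less pronounced than it appears in the plot.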