A tibble: 6,607 × 20 (first 10 rows and 4 columns shown)

| | Hours_Studied (int) | Attendance (int) | Parental_Involvement (chr) | Access_to_Resources (chr) |
|---|---|---|---|---|
| 1 | 23 | 84 | Low | High |
| 2 | 19 | 64 | Low | Medium |
| 3 | 24 | 98 | Medium | Medium |
| 4 | 29 | 89 | Low | Medium |
| 5 | 19 | 92 | Medium | Medium |
| 6 | 19 | 88 | Medium | Medium |
| 7 | 29 | 84 | Medium | Low |
| 8 | 25 | 78 | Low | High |
| 9 | 17 | 94 | Medium | High |
| 10 | 23 | 98 | Medium | Medium |

6,597 more rows and 16 more variables not shown: Extracurricular_Activities (chr), Sleep_Hours (int), Previous_Scores (int), Motivation_Level (chr), Internet_Access (chr), Tutoring_Sessions (int), Family_Income (chr), Teacher_Quality (chr), School_Type (chr), Peer_Influence (chr), Physical_Activity (int), Learning_Disabilities (chr), Parental_Education_Level (chr), Distance_from_Home (chr), Gender (chr), Exam_Score (int).
This data set contains many different and useful variables. We will treat every variable except Exam_Score as an explanatory (x) variable and Exam_Score as our response, and we will use graphs and summary tables to find which quantitative and categorical variables are most strongly associated with exam scores.
As a future teacher, I find this data set very interesting: understanding what drives exam performance can help me support my students, set them up for success on their exams, and better prepare them for the future.
### Graphing Quantitative Variables versus Exam Scores
There is a lot going on in these graphs, and it is hard to pinpoint where the means actually fall, so we also display the highest mean and median exam scores for each value of the quantitative variables.
### Displaying Tables of Mean and Median Exam Scores by Quantitative Variable
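A minimal sketch of how the per-value mean and median exam scores could be computed for all quantitative variables at once, assuming the data sit in a data frame with the column names shown above. The small `toy` data frame is hypothetical and stands in for the real 6,607-row data. Passing `.groups = "drop"` to `summarise()` returns an ungrouped result, which also stops dplyr from printing its regrouping message.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the real data set
toy <- tibble(
  Hours_Studied = c(10L, 10L, 20L, 20L, 30L),
  Attendance    = c(70L, 80L, 80L, 90L, 90L),
  Gender        = c("Male", "Female", "Male", "Female", "Male"),
  Exam_Score    = c(60L, 62L, 65L, 67L, 70L)
)

# Quantitative predictors: every numeric column except the response
quant_vars <- setdiff(names(toy)[sapply(toy, is.numeric)], "Exam_Score")

quant_summary <- toy %>%
  select(all_of(quant_vars), Exam_Score) %>%
  # Long format: one row per (student, variable) pair
  pivot_longer(-Exam_Score, names_to = "variable", values_to = "value") %>%
  group_by(variable, value) %>%
  summarise(
    mean_exam_score   = mean(Exam_Score),
    median_exam_score = median(Exam_Score),
    n                 = n(),
    .groups = "drop"  # ungroup the result and silence the regrouping message
  ) %>%
  arrange(desc(mean_exam_score))

print(quant_summary)
```

The same pattern, with `category` in place of `value`, would produce the categorical summaries further down.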
We notice that hours studied, tutoring sessions attended, and attendance appear at the top of the tables, with the highest mean and median exam scores. Toward the bottom of the tables, the lowest values of hours studied and attendance go along with the lowest exam scores as well.
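Reading the tables ranks individual values, so the "strongest correlation" is judged by eye. As a minimal sketch (not the method used in this analysis), Pearson's correlation coefficient gives one number per quantitative variable. The `toy` data frame below is hypothetical and only illustrates the mechanics; its coefficients say nothing about the real data.

```r
# Hypothetical toy data; in the analysis this would be the full data set
toy <- data.frame(
  Hours_Studied     = c(10, 15, 20, 25, 30),
  Attendance        = c(70, 75, 80, 85, 90),
  Tutoring_Sessions = c(2, 0, 1, 0, 2),
  Exam_Score        = c(61, 63, 65, 67, 69)
)

# Pearson correlation of each quantitative predictor with the response
predictors <- setdiff(names(toy), "Exam_Score")
r <- sapply(predictors, function(v) cor(toy[[v]], toy$Exam_Score))

# Rank predictors by the strength (absolute value) of their correlation
r_ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(r_ranked, 3))
```

Ranking by the absolute value keeps a strong negative relationship from being mistaken for a weak one.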
`cmean %>% arrange(desc(mean_exam_score))`
A tibble: 37 × 3, grouped by variable (13 groups); top 10 rows shown

| | variable (chr) | category (chr) | mean_exam_score (dbl) |
|---|---|---|---|
| 1 | Parental_Involvement | High | 68.1 |
| 2 | Access_to_Resources | High | 68.1 |
| 3 | Parental_Education_Level | Postgraduate | 68.0 |
| 4 | Family_Income | High | 67.8 |
| 5 | Motivation_Level | High | 67.7 |
| 6 | Teacher_Quality | High | 67.7 |
| 7 | Peer_Influence | Positive | 67.6 |
| 8 | Distance_from_Home | Near | 67.5 |
| 9 | Extracurricular_Activities | Yes | 67.4 |
| 10 | Learning_Disabilities | No | 67.3 |

27 more rows not shown.
`cmedian %>% arrange(desc(median_exam_score))`
A tibble: 37 × 3, grouped by variable (13 groups); top 10 rows shown

| | variable (chr) | category (chr) | median_exam_score (dbl) |
|---|---|---|---|
| 1 | Access_to_Resources | High | 68 |
| 2 | Family_Income | High | 68 |
| 3 | Parental_Education_Level | Postgraduate | 68 |
| 4 | Parental_Involvement | High | 68 |
| 5 | Teacher_Quality | High | 68 |
| 6 | Access_to_Resources | Medium | 67 |
| 7 | Distance_from_Home | Moderate | 67 |
| 8 | Distance_from_Home | Near | 67 |
| 9 | Extracurricular_Activities | No | 67 |
| 10 | Extracurricular_Activities | Yes | 67 |

27 more rows not shown.
We notice that when Access to Resources or Family Income is high, students' exam scores have a higher mean and median.
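The tables above rank single categories. To rank whole categorical variables instead, one simple measure (an illustrative choice, not part of the original analysis) is the gap between a variable's highest and lowest category means. A base-R sketch on hypothetical toy data, assuming the categorical columns are character vectors as in the tibble:

```r
# Hypothetical toy data with two categorical predictors
toy <- data.frame(
  Access_to_Resources = c("Low", "Low", "Medium", "High", "High"),
  Family_Income       = c("Low", "High", "Low", "High", "Medium"),
  Exam_Score          = c(60, 62, 65, 68, 70),
  stringsAsFactors    = FALSE
)

# Categorical predictors: the character columns
cat_vars <- names(toy)[sapply(toy, is.character)]

# For each variable: mean exam score per category, then the max - min gap
spread <- sapply(cat_vars, function(v) {
  cat_means <- tapply(toy$Exam_Score, toy[[v]], mean)
  max(cat_means) - min(cat_means)
})

# A larger gap means the categories differ more in average exam score
print(sort(spread, decreasing = TRUE))
```

This gap ignores how many students fall in each category, so a tiny category with an extreme mean can inflate it; checking the group sizes alongside it would be prudent.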
Now let's compare hours studied with exam score separately within each of the three levels of Access to Resources (High, Medium, Low) and Parental Involvement (High, Medium, Low).
There does seem to be a positive relationship between each of these three variables (hours studied, Access to Resources, and Parental Involvement) and exam score.
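One way to check this pattern numerically, sketched in base R under the assumption that the data sit in a data frame with these column names: split the rows by level, then compute the correlation and the fitted slope of exam score on hours studied inside each level. The toy values are hypothetical. A positive correlation in every level would match the pattern in the graphs, and similar slopes would suggest roughly parallel trends.

```r
# Hypothetical toy data: two levels of Access_to_Resources
toy <- data.frame(
  Hours_Studied       = c(10, 20, 30, 10, 20, 30),
  Access_to_Resources = c("Low", "Low", "Low", "High", "High", "High"),
  Exam_Score          = c(58, 61, 64, 62, 66, 70),
  stringsAsFactors    = FALSE
)

# One data frame per level (split() orders the levels alphabetically)
by_level <- split(toy, toy$Access_to_Resources)

# Correlation between hours studied and exam score inside each level
within_r <- sapply(by_level, function(d) cor(d$Hours_Studied, d$Exam_Score))

# Slope of exam score on hours studied inside each level
within_slope <- sapply(by_level, function(d) {
  unname(coef(lm(Exam_Score ~ Hours_Studied, data = d))["Hours_Studied"])
})

print(round(rbind(correlation = within_r, slope = within_slope), 3))
```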
Next we check whether the same pattern holds for Attendance, Family Income, and Parental Education Level. This comparison includes an extra graph because Parental Education Level has some NA (missing) values.
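Missing values need handling before grouping or plotting by Parental_Education_Level. A minimal base-R sketch, assuming the missing entries were read in as `NA` (readr's `read_csv()` treats empty cells and the string "NA" as missing by default); if they appear as empty strings instead, the filter would also need an `== ""` check. The toy rows are hypothetical.

```r
# Hypothetical toy data in which one Parental_Education_Level value is missing
toy <- data.frame(
  Attendance               = c(70, 80, 90, 85),
  Parental_Education_Level = c("High School", NA, "Postgraduate", "College"),
  Exam_Score               = c(62, 65, 69, 67),
  stringsAsFactors         = FALSE
)

# Count the rows missing an education level
n_missing <- sum(is.na(toy$Parental_Education_Level))

# Keep only rows with a recorded level before grouping or plotting by it
complete_edu <- toy[!is.na(toy$Parental_Education_Level), ]

cat("Dropped", n_missing, "row(s) with a missing education level\n")
```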
In every graph, exam scores go up as Attendance goes up, but there does not seem to be much difference between Parental Education Levels. Similarly, the Family Income groups are intermixed, with no group clearly on top when it comes to exam scores.
Based on these two most recent sets of graphs, it appears that our quantitative variables (Attendance and Hours Studied) are more strongly associated with exam scores than the categorical variables. This is consistent with what we saw when we computed the means and medians of both the quantitative and categorical variables.
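To compare a quantitative variable with a categorical one on the same scale, one option (a swapped-in technique, not part of the analysis above) is the R² of a simple linear model with that variable as the only predictor, that is, the share of exam-score variance it explains. `lm()` turns a character predictor into dummy variables automatically. The toy data below are hypothetical, so the resulting values only illustrate the comparison.

```r
# Hypothetical toy data with one quantitative and one categorical predictor
toy <- data.frame(
  Attendance    = c(60, 70, 80, 90, 60, 70, 80, 90),
  Family_Income = c("Low", "High", "Low", "High", "High", "Low", "High", "Low"),
  Exam_Score    = c(60, 63, 66, 69, 61, 62, 67, 68),
  stringsAsFactors = FALSE
)

# Share of exam-score variance explained by a single predictor
r2 <- function(predictor) {
  fit <- lm(reformulate(predictor, response = "Exam_Score"), data = toy)
  summary(fit)$r.squared
}

r2_values <- sapply(c("Attendance", "Family_Income"), r2)
print(round(r2_values, 3))
```

After replacing `toy` with the full data, `r2()` could be applied to every column except Exam_Score. Because a categorical variable with k levels uses k - 1 coefficients, `summary(fit)$adj.r.squared` may give a fairer comparison across variables with different numbers of levels.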