This data set provides a comprehensive overview of various factors affecting student performance in exams. It includes information on study habits, attendance, parental involvement, and other aspects influencing academic success.
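We start by installing and loading the packages ‘tidyverse’, ‘ggplot2’, ‘lubridate’, ‘dplyr’, ‘tidyr’, ‘here’, ‘skimr’, and ‘janitor’, which will help in cleaning, analyzing, and plotting our data. Note that ‘ggplot2’, ‘dplyr’, and ‘tidyr’ are already part of ‘tidyverse’, so loading them separately is optional.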
# Loading packages :
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)
students <- read.csv("C:/Users/saksh/Desktop/StudentPerformanceFactors.csv")
View(students)
glimpse(students)
## Rows: 6,607
## Columns: 20
## $ Hours_Studied <int> 23, 19, 24, 29, 19, 19, 29, 25, 17, 23, 17,…
## $ Attendance <int> 84, 64, 98, 89, 92, 88, 84, 78, 94, 98, 80,…
## $ Parental_Involvement <chr> "Low", "Low", "Medium", "Low", "Medium", "M…
## $ Access_to_Resources <chr> "High", "Medium", "Medium", "Medium", "Medi…
## $ Extracurricular_Activities <chr> "No", "No", "Yes", "Yes", "Yes", "Yes", "Ye…
## $ Sleep_Hours <int> 7, 8, 7, 8, 6, 8, 7, 6, 6, 8, 8, 6, 8, 8, 8…
## $ Previous_Scores <int> 73, 59, 91, 98, 65, 89, 68, 50, 80, 71, 88,…
## $ Motivation_Level <chr> "Low", "Low", "Medium", "Medium", "Medium",…
## $ Internet_Access <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
## $ Tutoring_Sessions <int> 0, 2, 2, 1, 3, 3, 1, 1, 0, 0, 4, 2, 2, 2, 1…
## $ Family_Income <chr> "Low", "Medium", "Medium", "Medium", "Mediu…
## $ Teacher_Quality <chr> "Medium", "Medium", "Medium", "Medium", "Hi…
## $ School_Type <chr> "Public", "Public", "Public", "Public", "Pu…
## $ Peer_Influence <chr> "Positive", "Negative", "Neutral", "Negativ…
## $ Physical_Activity <int> 3, 4, 4, 4, 4, 3, 2, 2, 1, 5, 4, 2, 4, 3, 4…
## $ Learning_Disabilities <chr> "No", "No", "No", "No", "No", "No", "No", "…
## $ Parental_Education_Level <chr> "High School", "College", "Postgraduate", "…
## $ Distance_from_Home <chr> "Near", "Moderate", "Near", "Moderate", "Ne…
## $ Gender <chr> "Male", "Female", "Male", "Male", "Female",…
## $ Exam_Score <int> 67, 61, 74, 71, 70, 71, 67, 66, 69, 72, 68,…
The output shows that the file was imported correctly, with 6,607 rows and 20 columns.
Here are some of the steps I followed to check the data, starting with a summary of the numeric columns:
students %>%
  select(Hours_Studied, Attendance, Sleep_Hours, Tutoring_Sessions, Physical_Activity, Exam_Score) %>%
  summary()
## Hours_Studied Attendance Sleep_Hours Tutoring_Sessions
## Min. : 1.00 Min. : 60.00 Min. : 4.000 Min. :0.000
## 1st Qu.:16.00 1st Qu.: 70.00 1st Qu.: 6.000 1st Qu.:1.000
## Median :20.00 Median : 80.00 Median : 7.000 Median :1.000
## Mean :19.98 Mean : 79.98 Mean : 7.029 Mean :1.494
## 3rd Qu.:24.00 3rd Qu.: 90.00 3rd Qu.: 8.000 3rd Qu.:2.000
## Max. :44.00 Max. :100.00 Max. :10.000 Max. :8.000
## Physical_Activity Exam_Score
## Min. :0.000 Min. : 55.00
## 1st Qu.:2.000 1st Qu.: 65.00
## Median :3.000 Median : 67.00
## Mean :2.968 Mean : 67.24
## 3rd Qu.:4.000 3rd Qu.: 69.00
## Max. :6.000 Max. :100.00
On average, students sleep about 7 hours a day (mean 7.03), and the average exam score is about 67 (mean 67.24).
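Since `summary()` only covers the numeric columns, a quick count of missing or blank entries per column can complement it. The sketch below is an illustration, not a step from the original analysis: `count_missing` is a hypothetical helper name, and it assumes that blank strings in the character columns should be treated as missing.

```r
# Hypothetical helper (not part of the original analysis):
# count NA or blank ("") entries in each column of a data frame.
count_missing <- function(data) {
  sapply(data, function(col) {
    # Convert to character so numeric and text columns are handled alike;
    # trimws() makes whitespace-only cells count as blank.
    sum(is.na(col) | trimws(as.character(col)) == "")
  })
}
```

Calling `count_missing(students)` would return a named vector with one count per column; any non-zero entry points to a column that may need cleaning before it is used in a comparison.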
After exploring the data, we arrived at the following insights:
-Student attendance is affected by the distance between home and school: students who live near the school have higher attendance than students who live far from it.
-Students are more likely to score well in exams when they have positive or neutral peer influence than when they have negative peer influence.
-Students with internet access and better access to resources study more than students who have no internet and less access to resources.
-Attendance and exam scores are also affected by parental involvement: students with less parental involvement are more likely to have lower attendance and lower exam scores.
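The group comparisons behind these insights can be sketched as a mean-per-group summary. This is a minimal sketch and not necessarily how the original figures were produced: `mean_by_group` is a hypothetical helper, and the output column names `n` and `mean_value` are chosen here for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# mean of one numeric column within each level of a grouping column.
mean_by_group <- function(data, group_col, value_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    summarise(
      n = n(),                                              # group size
      mean_value = mean(.data[[value_col]], na.rm = TRUE),  # group mean
      .groups = "drop"
    ) %>%
    arrange(desc(mean_value))                               # highest mean first
}
```

For example, `mean_by_group(students, "Distance_from_Home", "Attendance")` would compare attendance by distance, `mean_by_group(students, "Peer_Influence", "Exam_Score")` would compare exam scores by peer influence, and `mean_by_group(students, "Parental_Involvement", "Exam_Score")` would do the same for parental involvement. Reporting the group size next to each mean makes it easier to judge whether a difference rests on enough students.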
Thank you very much for your interest!
I would appreciate any comments and recommendations for improvement!