Description

This data set provides a comprehensive overview of various factors affecting student performance in exams. It includes information on study habits, attendance, parental involvement, and other aspects influencing academic success.

Tasks

Working on the dataset

Prepare Phase

Install and load the packages that will help with cleaning, analyzing, and plotting the data: ‘tidyverse’, ‘ggplot2’, ‘lubridate’, ‘dplyr’, ‘tidyr’, ‘here’, ‘skimr’, and ‘janitor’.

# Loading packages :

library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)
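
The library() calls above only load packages that are already installed. A minimal sketch of the installation step, assuming a CRAN mirror is configured; the names `pkgs`, `is_installed` and `missing_pkgs` are illustrative and not part of the original analysis:

```r
# Packages used in this analysis
pkgs <- c("tidyverse", "lubridate", "dplyr", "ggplot2",
          "tidyr", "here", "skimr", "janitor")

# Check which of them are already installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing, and only in an interactive session
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

Installing only the missing packages avoids re-downloading everything each time the script runs.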

Importing datasets

Importing StudentPerformanceFactors.csv

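# Read the CSV into a data frame (absolute path on the author's machine)
students <- read.csv("C:/Users/saksh/Desktop/StudentPerformanceFactors.csv")
View(students)     # open the data in the viewer (interactive sessions only)
glimpse(students)  # compact overview of column names, types and first values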
## Rows: 6,607
## Columns: 20
## $ Hours_Studied              <int> 23, 19, 24, 29, 19, 19, 29, 25, 17, 23, 17,…
## $ Attendance                 <int> 84, 64, 98, 89, 92, 88, 84, 78, 94, 98, 80,…
## $ Parental_Involvement       <chr> "Low", "Low", "Medium", "Low", "Medium", "M…
## $ Access_to_Resources        <chr> "High", "Medium", "Medium", "Medium", "Medi…
## $ Extracurricular_Activities <chr> "No", "No", "Yes", "Yes", "Yes", "Yes", "Ye…
## $ Sleep_Hours                <int> 7, 8, 7, 8, 6, 8, 7, 6, 6, 8, 8, 6, 8, 8, 8…
## $ Previous_Scores            <int> 73, 59, 91, 98, 65, 89, 68, 50, 80, 71, 88,…
## $ Motivation_Level           <chr> "Low", "Low", "Medium", "Medium", "Medium",…
## $ Internet_Access            <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
## $ Tutoring_Sessions          <int> 0, 2, 2, 1, 3, 3, 1, 1, 0, 0, 4, 2, 2, 2, 1…
## $ Family_Income              <chr> "Low", "Medium", "Medium", "Medium", "Mediu…
## $ Teacher_Quality            <chr> "Medium", "Medium", "Medium", "Medium", "Hi…
## $ School_Type                <chr> "Public", "Public", "Public", "Public", "Pu…
## $ Peer_Influence             <chr> "Positive", "Negative", "Neutral", "Negativ…
## $ Physical_Activity          <int> 3, 4, 4, 4, 4, 3, 2, 2, 1, 5, 4, 2, 4, 3, 4…
## $ Learning_Disabilities      <chr> "No", "No", "No", "No", "No", "No", "No", "…
## $ Parental_Education_Level   <chr> "High School", "College", "Postgraduate", "…
## $ Distance_from_Home         <chr> "Near", "Moderate", "Near", "Moderate", "Ne…
## $ Gender                     <chr> "Male", "Female", "Male", "Male", "Female",…
## $ Exam_Score                 <int> 67, 61, 74, 71, 70, 71, 67, 66, 69, 72, 68,…

The output shows 6,607 rows and 20 columns with the expected column names and types, so the file was imported correctly.
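
The absolute path used for the import only works on one machine. One possible alternative, sketched here with base R, reads the file from a path relative to the project folder and checks the dimensions reported by glimpse(). The helper name `read_and_check` and the `data` folder are assumptions made for illustration, not part of the original analysis.

```r
# Hypothetical helper: read a CSV and confirm it has the expected shape
read_and_check <- function(path, n_rows, n_cols) {
  df <- read.csv(path)
  stopifnot(nrow(df) == n_rows, ncol(df) == n_cols)
  df
}

# Assumed location of the file inside the project folder
csv_path <- file.path("data", "StudentPerformanceFactors.csv")

if (file.exists(csv_path)) {
  # 6,607 rows and 20 columns, as shown in the glimpse() output
  students <- read_and_check(csv_path, n_rows = 6607, n_cols = 20)
}
```

If the file changes shape later, the check stops the script early instead of letting a bad import pass unnoticed.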

Process Phase

Here are the cleaning steps I followed:

  • I did not find any spelling errors, extra or blank spaces, or duplicated values in the data.
  • I found an invalid value in the Exam_Score column: a score of 101, which is not possible, so I changed it to 100 to make it usable.
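
The checks described above are not shown as code. A sketch of how they could be done with dplyr follows. It runs on a small made-up data frame (`toy`) rather than the real data, and capping with pmin() is one possible way to apply the 101 to 100 correction, not necessarily the method used here.

```r
library(dplyr)

# Toy stand-in for `students`; the values are made up for illustration
toy <- data.frame(
  Gender     = c("Male", "Female ", "Male", "Female"),
  Exam_Score = c(67, 61, 101, 70),
  stringsAsFactors = FALSE
)

# Number of fully duplicated rows (0 means no duplicates)
n_duplicates <- sum(duplicated(toy))

# Rows whose score falls outside the valid 0-100 range
invalid_scores <- filter(toy, Exam_Score < 0 | Exam_Score > 100)

toy_clean <- toy %>%
  # Remove leading and trailing spaces from text columns
  mutate(across(where(is.character), trimws)) %>%
  # Cap scores above 100 at 100
  mutate(Exam_Score = pmin(Exam_Score, 100))
```

Keeping `invalid_scores` as a separate table makes it easy to see exactly which rows were changed.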

Analyze Phase

# Summary statistics for the numeric columns of interest
students %>%
  select(Hours_Studied, Attendance, Sleep_Hours, Tutoring_Sessions,
         Physical_Activity, Exam_Score) %>%
  summary()
##  Hours_Studied     Attendance      Sleep_Hours     Tutoring_Sessions
##  Min.   : 1.00   Min.   : 60.00   Min.   : 4.000   Min.   :0.000    
##  1st Qu.:16.00   1st Qu.: 70.00   1st Qu.: 6.000   1st Qu.:1.000    
##  Median :20.00   Median : 80.00   Median : 7.000   Median :1.000    
##  Mean   :19.98   Mean   : 79.98   Mean   : 7.029   Mean   :1.494    
##  3rd Qu.:24.00   3rd Qu.: 90.00   3rd Qu.: 8.000   3rd Qu.:2.000    
##  Max.   :44.00   Max.   :100.00   Max.   :10.000   Max.   :8.000    
##  Physical_Activity   Exam_Score    
##  Min.   :0.000     Min.   : 55.00  
##  1st Qu.:2.000     1st Qu.: 65.00  
##  Median :3.000     Median : 67.00  
##  Mean   :2.968     Mean   : 67.24  
##  3rd Qu.:4.000     3rd Qu.: 69.00  
##  Max.   :6.000     Max.   :100.00

On average, students sleep about 7 hours a day (mean 7.03) and score about 67 on the exam (mean 67.24). Exam scores are tightly clustered: half of the students score between 65 and 69.

Share Phase

Now let’s visualize some key relationships in the data.

Relationship between Hours Studied and Parental Involvement

# Number of students at each value of Hours_Studied, one panel per involvement level
ggplot(students) +
  geom_bar(mapping = aes(x = Hours_Studied)) +
  facet_grid(rows = vars(Parental_Involvement))

The plot suggests that students with a Medium level of parental involvement are likely to study more.
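
Bar heights in this plot are counts, so an involvement level with more students produces taller bars whatever its study habits. Comparing a per-group average is one way to check the reading above. The sketch uses a made-up `toy` data frame; on the real data, `toy` would be replaced by `students`.

```r
library(dplyr)

# Toy stand-in for `students`; the values are made up for illustration
toy <- data.frame(
  Parental_Involvement = c("Low", "Low", "Medium", "Medium", "Medium", "High"),
  Hours_Studied        = c(18, 22, 19, 21, 23, 20)
)

# Group size plus mean and median study hours per involvement level
hours_by_involvement <- toy %>%
  group_by(Parental_Involvement) %>%
  summarise(
    n_students   = n(),
    mean_hours   = mean(Hours_Studied),
    median_hours = median(Hours_Studied)
  )
```

Reporting `n_students` next to the averages shows at a glance how uneven the groups are.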

Relationship between Hours Studied and Access to Resources

# Number of students at each value of Hours_Studied, one panel per access level
ggplot(students) +
  geom_bar(mapping = aes(x = Hours_Studied)) +
  facet_grid(rows = vars(Access_to_Resources))

Students with Medium access to resources appear likely to study more than students with High or Low access.
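
If the access levels contain different numbers of students, the panels can be easier to compare when each one shows proportions instead of counts. A sketch with ggplot2 on made-up data, assuming ggplot2 3.3.0 or later for after_stat():

```r
library(ggplot2)

# Toy stand-in for `students`; the values are made up for illustration
toy <- data.frame(
  Access_to_Resources = c("Low", "Low", "Medium", "Medium",
                          "Medium", "Medium", "High", "High"),
  Hours_Studied       = c(18, 20, 18, 20, 20, 22, 20, 22)
)

# y = after_stat(prop) rescales the bars so that each panel sums to 1
p_prop <- ggplot(toy) +
  geom_bar(mapping = aes(x = Hours_Studied, y = after_stat(prop))) +
  facet_grid(rows = vars(Access_to_Resources)) +
  labs(y = "Share of students in panel")
```

With proportions on the y axis, a shift of the distribution to the right in one panel reflects study hours and not group size.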

Relationship between Attendance and Distance from Home

# Number of students at each attendance value, one panel per distance category
ggplot(students) +
  geom_bar(mapping = aes(x = Attendance)) +
  facet_grid(rows = vars(Distance_from_Home))

Students who live near the school have better attendance than students who live far from it.
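
read.csv() imports Distance_from_Home as text, so ggplot2 orders the groups alphabetically. Converting the column to a factor with explicit levels puts them in a natural order, and a boxplot then lets the attendance distributions be compared side by side. A sketch on made-up data; the level names "Near" and "Moderate" come from the glimpse() output, while "Far" is assumed from the text above.

```r
library(ggplot2)

# Toy stand-in for `students`; the values are made up for illustration
toy <- data.frame(
  Distance_from_Home = c("Near", "Far", "Moderate", "Near", "Far", "Moderate"),
  Attendance         = c(92, 70, 81, 88, 76, 85),
  stringsAsFactors   = FALSE
)

# Order the levels from closest to farthest
toy$Distance_from_Home <- factor(
  toy$Distance_from_Home,
  levels = c("Near", "Moderate", "Far")
)

# One box per distance category, drawn in the level order set above
p_box <- ggplot(toy) +
  geom_boxplot(mapping = aes(x = Distance_from_Home, y = Attendance))
```

The median line in each box gives a single value per group that does not depend on how many students the group contains.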

Relationship between Hours Studied and Internet Access

# Number of students at each value of Hours_Studied, split by internet access
ggplot(students) +
  geom_bar(mapping = aes(x = Hours_Studied)) +
  facet_grid(rows = vars(Internet_Access))

Students who have internet access are likely to study more than students without it.
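
How far this comparison can be trusted depends on how many students fall into each group, so it helps to look at the group sizes first. A sketch with dplyr on made-up data (the real shares are not known from the output shown here):

```r
library(dplyr)

# Toy stand-in for `students`; the values are made up for illustration
toy <- data.frame(
  Internet_Access = c("Yes", "Yes", "Yes", "No"),
  Hours_Studied   = c(20, 25, 18, 19)
)

# Number of students per group and each group's share of the total
group_sizes <- toy %>%
  count(Internet_Access) %>%
  mutate(share = n / sum(n))
```

If one share is very small, the panel for that group will look flat even when its students study just as much.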

Relationship between Exam Score and Peer Influence

# Number of students at each exam score, one panel per peer influence category
ggplot(students) +
  geom_bar(mapping = aes(x = Exam_Score)) +
  facet_grid(rows = vars(Peer_Influence))

The plot suggests that students with negative peer influence score lower than students with positive peer influence.
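
A more direct way to compare the groups is to plot one summary value per group. The sketch below draws the mean Exam_Score for each Peer_Influence level with stat_summary(), using made-up data and assuming ggplot2 3.3.0 or later for the `fun` argument.

```r
library(ggplot2)

# Toy stand-in for `students`; the values are made up for illustration
toy <- data.frame(
  Peer_Influence = c("Negative", "Negative", "Neutral",
                     "Neutral", "Positive", "Positive"),
  Exam_Score     = c(64, 66, 66, 68, 67, 71)
)

# One point per group, placed at the group's mean exam score
p_mean <- ggplot(toy, aes(x = Peer_Influence, y = Exam_Score)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  labs(y = "Mean exam score")
```

Swapping `mean` for `median` gives a version that is less sensitive to a few very high or very low scores.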

Conclusion

After looking at the data and the plots above, these are the main takeaways:

- Student attendance is affected by the distance between home and school: students who live near the school have better attendance than students who live far from it.

- Students are more likely to score well in exams when they have positive or neutral peer influence than when they have negative peer influence.

- Students with internet access study more than students without it, and students with Medium access to resources study more than students with Low or High access.

- Attendance and exam scores are also affected by parental involvement: students with less parental involvement are more likely to have lower attendance and lower exam scores.

Thank you very much for your interest!

And I would appreciate any comments and recommendations for improvement!