For this assignment, I used the “Students’ Academic Performance Dataset” from Kaggle. This is an educational data set collected from a learning management system (LMS) called Kalboard 360. The data were collected using a learning activity tracker tool called the Experience API (xAPI). The data set contains 480 student records, of which 305 are male and 175 are female. There are a total of 16 features in this data set, classified into three major categories: (1) demographic features such as gender and nationality, (2) academic background features such as educational stage, grade level, and section, and (3) behavioral features such as raising a hand in class, viewing resources, parents answering the survey, and school satisfaction.
The purpose of this analysis is to determine which factors influence student success in school. I am most interested in the following variables: gender, raisedhands, Relation, and StageID, to see which affects student grades the most. I believe students who raise their hands most often will have high grades, because the number of questions a student asks or answers in class reflects how well the student understands the subject. I expect the raisedhands effect to be greatest for students in high school, because students take their coursework more seriously at higher education levels, where the content is more intricate. In terms of gender, I hypothesize that males will raise their hands most often in class and will have high grades. Relation is also an interesting variable because I believe the parent responsible for the student plays an important role in the student’s academic success. I suspect students whose mother is listed as the responsible parent will have higher academic success, because I expect mothers to be stricter and to spend more time with their children on homework and academics in general.
grades1: academic performance of the student (binary dependent variable), where 1 indicates a high grade (Class “H”) and 0 indicates a middle or low grade (Class “M” or “L”)
raisedhands: the number of times the student raises a hand in class
gender: whether the student is male or female
StageID: educational level of the student (lowerlevel, MiddleSchool, or HighSchool)
Relation: parent responsible for the student, father (Father) or mother (Mum)
library(readr)
Student_Academic_Data<-read_csv("C:\\Users\\Sangita Roy\\Desktop\\Student_Academic_Data.csv")
head(Student_Academic_Data)
unique(Student_Academic_Data$Class)
[1] "M" "L" "H"
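library(dplyr)
# Rename inconsistently capitalized columns and create the binary outcome
Student_Data1 <- Student_Academic_Data %>%
  rename(Nationality = NationalITy,
         VisitedResources = VisITedResources,
         Grades = Class) %>%
  # grades1 = 1 for high performers (Class "H"), 0 for "M" or "L"
  mutate(grades1 = factor(ifelse(Grades == "H", 1, 0))) %>%
  mutate(gender = as.factor(gender)) %>%
  mutate(StageID = as.factor(StageID)) %>%
  mutate(Relation = as.factor(Relation))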
head(Student_Data1)
summary(Student_Data1)
 gender  Nationality        PlaceofBirth       StageID
 F:175   Length:480         Length:480         HighSchool  : 33
 M:305   Class :character   Class :character   lowerlevel  :199
         Mode  :character   Mode  :character   MiddleSchool:248
 GradeID            SectionID          Topic
 Length:480         Length:480         Length:480
 Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character
 Semester           Relation     raisedhands       VisitedResources
 Length:480         Father:283   Min.   :  0.00    Min.   : 0.0
 Class :character   Mum   :197   1st Qu.: 15.75    1st Qu.:20.0
 Mode  :character                Median : 50.00    Median :65.0
                                 Mean   : 46.77    Mean   :54.8
                                 3rd Qu.: 75.00    3rd Qu.:84.0
                                 Max.   :100.00    Max.   :99.0
 AnnouncementsView   Discussion      ParentAnsweringSurvey
 Min.   : 0.00       Min.   : 1.00   Length:480
 1st Qu.:14.00       1st Qu.:20.00   Class :character
 Median :33.00       Median :39.00   Mode  :character
 Mean   :37.92       Mean   :43.28
 3rd Qu.:58.00       3rd Qu.:70.00
 Max.   :98.00       Max.   :99.00
 ParentschoolSatisfaction   StudentAbsenceDays   Grades             grades1
 Length:480                 Length:480           Length:480         0:338
 Class :character           Class :character     Class :character   1:142
 Mode  :character           Mode  :character     Mode  :character
library(Zelig)
z_stud <- zlogit$new()
z_stud$zelig(grades1 ~ raisedhands + gender + Relation + StageID, data = Student_Data1)
summary(z_stud)
Model:
Call:
z_stud$zelig(formula = grades1 ~ raisedhands + gender + Relation +
StageID, data = Student_Data1)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8056  -0.6709  -0.2989   0.7113   2.6189
Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)         -2.89852    0.55236  -5.247 1.54e-07
raisedhands          0.04240    0.00525   8.076 6.69e-16
genderM             -0.63445    0.25208  -2.517   0.0118
RelationMum          1.17620    0.25077   4.690 2.73e-06
StageIDlowerlevel   -0.25898    0.49438  -0.524   0.6004
StageIDMiddleSchool -0.79609    0.49495  -1.608   0.1077
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 583.00 on 479 degrees of freedom
Residual deviance: 411.21 on 474 degrees of freedom
AIC: 423.21
Number of Fisher Scoring iterations: 5
Next step: Use 'setx' method
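The first model indicates that the number of times a student raises a hand in class, gender, and the parent responsible for the student are all associated with academic success. The coefficients are on the log-odds scale. Each additional raised hand increases the log-odds of a high grade by 0.042 (an odds ratio of about 1.04), so students who raise their hands more often are more likely to obtain high grades. Having the mother as the responsible parent increases the log-odds by 1.18 (an odds ratio of about 3.24). Being male decreases the log-odds by 0.63 (an odds ratio of about 0.53), so females are more likely than males to obtain high grades, which does not support my hypothesis about males. The intercept of -2.90 is the log-odds of a high grade for the reference group: a female high school student whose father is the responsible parent and who never raises her hand. The results for raisedhands, genderM, and RelationMum are significant (p < .05); the StageID coefficients are not.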
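Logit coefficients are easier to read as odds ratios. The following is a minimal sketch that copies three estimates by hand from the first model's summary output and exponentiates them; the object names coefs and odds_ratios are only illustrative and do not come from the original analysis.

```r
# Estimates copied by hand from the first model's summary output
coefs <- c(raisedhands = 0.04240, genderM = -0.63445, RelationMum = 1.17620)

# Exponentiating a logit coefficient gives the multiplicative change in the
# odds of a high grade for a one-unit increase in that predictor
odds_ratios <- exp(coefs)
print(round(odds_ratios, 2))
```

The rounded odds ratios are about 1.04 for raisedhands, 0.53 for genderM, and 3.24 for RelationMum.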
z_stud1 <- zlogit$new()
z_stud1$zelig(grades1 ~ raisedhands + gender*Relation + StageID, data = Student_Data1)
summary(z_stud1)
Model:
Call:
z_stud1$zelig(formula = grades1 ~ raisedhands + gender * Relation +
StageID, data = Student_Data1)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9534  -0.6602  -0.2969   0.6099   2.7115
Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         -3.514811   0.622313  -5.648 1.62e-08
raisedhands          0.042175   0.005266   8.009 1.16e-15
genderM              0.277576   0.417401   0.665  0.50604
RelationMum          2.150076   0.436102   4.930 8.21e-07
StageIDlowerlevel   -0.284190   0.504624  -0.563  0.57332
StageIDMiddleSchool -0.768138   0.504167  -1.524  0.12761
genderM:RelationMum -1.555217   0.538001  -2.891  0.00384
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 583.00 on 479 degrees of freedom
Residual deviance: 402.42 on 473 degrees of freedom
AIC: 416.42
Number of Fisher Scoring iterations: 5
Next step: Use 'setx' method
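The second model adds an interaction between gender and Relation while still controlling for education level and the number of times students raise their hands in class. The interaction is significant (p < .05), and the AIC falls from 423.21 to 416.42. With the interaction in the model, the RelationMum coefficient (2.15) is the effect of having the mother as the responsible parent for female students; for male students the effect is 2.15 - 1.56 = 0.59 on the log-odds scale. The genderM coefficient (0.28, not significant) is now the gender difference among students whose father is the responsible parent.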
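One way to check whether the interaction improves the fit is a likelihood ratio test on the two residual deviances. This test is not part of the original analysis; the sketch below copies the deviances and degrees of freedom by hand from the two summaries, and the object names are illustrative.

```r
# Residual deviances and degrees of freedom copied from the two summaries
dev_main <- 411.21; df_main <- 474   # model without the interaction
dev_int  <- 402.42; df_int  <- 473   # model with gender * Relation

# The drop in deviance is the likelihood ratio statistic; it is compared
# with a chi-squared distribution on the number of added parameters
lr_stat <- dev_main - dev_int
lr_df   <- df_main - df_int
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))
```

The statistic is 8.79 on 1 degree of freedom, with a p-value of about 0.003, which agrees with the Wald test on the interaction coefficient.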
z_stud$setrange(raisedhands=0:100)
z_stud$sim()
z_stud$graph()
The graph shows as the number of times students raise their hands in class increases, the probability of academic success also increases.
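The shape of this curve can be approximated by hand from the first model's coefficients. The sketch below assumes the other covariates are fixed at their most frequent levels in the data (male, Father, MiddleSchool); that profile is my assumption about what the simulation holds constant, and the object names are illustrative.

```r
# Estimates copied by hand from the first model's summary output
b <- c(intercept = -2.89852, raisedhands = 0.04240,
       genderM = -0.63445, StageIDMiddleSchool = -0.79609)

hands <- 0:100

# Assumed profile: male student, Father responsible (reference level),
# MiddleSchool; only raisedhands varies
eta <- b[["intercept"]] + b[["genderM"]] + b[["StageIDMiddleSchool"]] +
  b[["raisedhands"]] * hands

# The inverse logit turns the linear predictor into a probability
prob_high <- plogis(eta)
print(round(prob_high[c(1, 51, 101)], 3))
```

Under this profile the predicted probability of a high grade rises from roughly 1% at 0 raised hands to just under 50% at 100.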
z_stud1$setx(gender="M")
z_stud1$setx1(gender="F")
z_stud1$sim()
fd <- z_stud1$get_qi(xvalue="x1", qi="fd")
summary(fd)
V1
Min. :-0.1534174
1st Qu.:-0.0491164
Median :-0.0258543
Mean :-0.0230186
3rd Qu.: 0.0006766
Max. : 0.1719998
plot(z_stud1)
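Because gender was a significant factor in the first model, the plots illustrate the difference between males and females. The first difference is the expected probability of a high grade for females (x1) minus that for males (x), with the other covariates held at Zelig's default values. Its simulated mean is about -0.02 and its interquartile range (-0.05 to 0.001) includes zero, so under these settings the gender gap is small.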
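As a rough check on the simulated first difference, it can be computed directly from the second model's coefficients. The sketch below assumes that the unspecified covariates sit at raisedhands = 46.77 (the sample mean), Relation = Father, and StageID = MiddleSchool; these are my assumptions about the defaults, not values stated in the analysis, and the object names are illustrative.

```r
# Estimates copied by hand from the second model's summary output
b <- c(intercept = -3.514811, raisedhands = 0.042175,
       genderM = 0.277576, StageIDMiddleSchool = -0.768138)

mean_hands <- 46.77   # sample mean of raisedhands from summary()

# Linear predictors for a female and a male student with Father responsible,
# so neither RelationMum nor the interaction term enters
eta_female <- b[["intercept"]] + b[["StageIDMiddleSchool"]] +
  b[["raisedhands"]] * mean_hands
eta_male <- eta_female + b[["genderM"]]

# First difference: P(high grade | female) - P(high grade | male)
fd_point <- plogis(eta_female) - plogis(eta_male)
print(round(fd_point, 3))
```

The point estimate is about -0.026, close to the simulated median reported above.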
Lower Level
zstud.L <- zlogit$new()
zstud.L$zelig(grades1 ~ raisedhands+ Relation*gender + StageID, data = Student_Data1)
zstud.L$setx(gender = "M", StageID = "lowerlevel")
zstud.L$setx1(gender = "F", StageID = "lowerlevel")
zstud.L$sim()
plot(zstud.L)
These plots show the gender difference in academic success for the first education level in the data set, “lowerlevel”, comparing males (x) with females (x1).
Middle School
zstud.M <- zlogit$new()
zstud.M$zelig(grades1 ~ raisedhands+ Relation*gender + StageID, data = Student_Data1)
zstud.M$setx(gender = "M", StageID = "MiddleSchool")
zstud.M$setx1(gender = "F", StageID = "MiddleSchool")
zstud.M$sim()
plot(zstud.M)
These plots show the gender difference in academic success for students in “MiddleSchool”.
High School
zstud.H <- zlogit$new()
zstud.H$zelig(grades1 ~ raisedhands+ Relation*gender + StageID, data = Student_Data1)
zstud.H$setx(gender = "M", StageID = "HighSchool")
zstud.H$setx1(gender = "F", StageID = "HighSchool")
zstud.H$sim()
plot(zstud.H)
These plots show the gender difference in academic success for students in “HighSchool”.
d1 <- zstud.L$get_qi(xvalue="x1", qi="fd")
d2 <- zstud.M$get_qi(xvalue="x1", qi="fd")
d3 <- zstud.H$get_qi(xvalue="x1", qi="fd")
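# Name each column after its education level so later summaries and facets are labeled
dfd <- data.frame(lowerlevel = as.numeric(d1), MiddleSchool = as.numeric(d2), HighSchool = as.numeric(d3))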
head(dfd)
Reshape
library(tidyr)
tidd <- dfd %>%
gather(StageID, simv, 1:3)
head(tidd)
Group
tidd %>%
group_by(StageID) %>%
summarise(mean = mean(simv), sd = sd(simv))
Plot
library(ggplot2)
ggplot(tidd, aes(simv)) + geom_histogram() + facet_grid(~StageID)
The histograms of the simulated first differences show little change in the gender gap between the lowerlevel and MiddleSchool stages. The HighSchool group shows the greatest variation, which may reflect its small size (33 of the 480 students).
Data source: Elaf Abu Amrieh, Thair Hamtini, and Ibrahim Aljarah, The University of Jordan, Amman, Jordan, http://www.Ibrahimaljarah.com www.ju.edu.jo