This document presents an analysis of the behaviour of students in Sri Guru Tegh Bahadur Khalsa College, University of Delhi in terms of their attendance and marks. This is presented in various ways including year-wise and course-wise attendance/marks.
The data has been taken from https://sgtbkhalsa.online/uploadfiles/postings/notices, where it is available in the files 13.pdf, 14.pdf and 15.pdf.
The attendance data contains attendance of students who took admission in 2021, 2022, and 2023. It contains cumulative attendance of all students. Marks data contains marks of students who took admission in 2022, 2023, and 2024. It contains marks in the most recent semester at that time.
This document contains the source-code. To obtain CSV files, contact the authors. An interactive application pertaining to section 3 is linked in appendix D.
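We load the packages that provide the functions used below, then load and present the datasets:
library(tidyverse) # dplyr, tidyr and ggplot2
library(zoo) # na.locf()
library(reshape2) # melt()
attendance <- read.csv("attendance.csv", header=TRUE)
head(attendance)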
We add a new column that denotes the year that the student studied in the session 2023–24:
attendance <- attendance %>%
mutate(Year=case_when(
grepl("^2023", RollNo) ~ "1",
grepl("^2022", RollNo) ~ "2",
grepl("^2021", RollNo) ~ "3",
TRUE ~ NA_character_ # typed NA keeps the column character in all dplyr versions
))
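As a quick check of the rule above, the following sketch applies the same prefix test to a few made-up roll numbers. The roll numbers are hypothetical; only their first four digits matter here.

```r
library(dplyr)

# Hypothetical roll numbers: only the admission-year prefix is meaningful
demo <- data.frame(RollNo=c("2023ABC001", "2022ABC002", "2021ABC003", "2020ABC004"))

demo <- demo %>%
  mutate(Year=case_when(
    grepl("^2023", RollNo) ~ "1",
    grepl("^2022", RollNo) ~ "2",
    grepl("^2021", RollNo) ~ "3",
    TRUE ~ NA_character_   # any other prefix is left unclassified
  ))
demo
```

A roll number with an unexpected prefix ends up with a missing Year, which is worth counting before any year-wise summary is read.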
We show year-wise number of students present in the dataset:
yrwisestudents <- attendance %>%
group_by(Year) %>%
summarise(
Total=n()
)
yrwisestudents
ggplot(yrwisestudents, aes(x="",
y=Total,
fill=Year)) +
geom_bar(stat="identity",
width=1) +
labs(x="",
y="",
title="Number of Students",
caption="The data corresponds to the session 2023-24") +
coord_polar(theta="y")
ggsave("img/no.ofstudents.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
We add a new column that denotes the department of the student. This is done using a mapping (code \(\to\) department) called courses.
attendance <- attendance %>%
mutate(Department=courses[substr(RollNo, 5, 7)])
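The definition of courses is not shown in this document. The following is a minimal sketch of how such a lookup could work, assuming courses is a named character vector keyed by the characters at positions 5 to 7 of the roll number. The codes, departments and roll numbers below are invented for illustration.

```r
# Hypothetical mapping: names are course codes, values are department names
courses_demo <- c("501"="Physics", "502"="Botany", "503"="Economics")

# Hypothetical roll numbers; the last one uses a code missing from the mapping
roll_demo <- c("2023501007", "2022503012", "2021999001")

# substr() extracts characters 5 to 7; indexing a named vector by a
# character key returns NA for codes that are not in the mapping
dept_demo <- unname(courses_demo[substr(roll_demo, 5, 7)])
dept_demo
```

Under this assumption, any course code absent from the mapping produces a missing department rather than an error, so a count of NA values in Department is a cheap sanity check.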
We perform the same operations for the marks dataset. However, the marks data is from one year later than the attendance data, hence the shift in years:
marks <- read.csv("marks.csv",
header=T)
marks <- marks %>%
mutate(Year = case_when(
grepl("^2024", RollNo) ~ "1",
grepl("^2023", RollNo) ~ "2",
grepl("^2022", RollNo) ~ "3",
TRUE ~ NA_character_ # typed NA keeps the column character in all dplyr versions
))
marks <- marks %>%
mutate(Department = courses[substr(RollNo, 5, 7)])
head(marks)
marks %>%
group_by(Year) %>%
summarise(
Total=n()
)
We now add a new column corresponding to the paper type (AEC, SEC, GE, Core, VAC):
marks <- marks %>%
mutate(PaperType=case_when(
grepl("^SEC", PaperCode) ~ "SEC",
grepl("[A-Z]G-[0-9]+$", PaperCode) ~ "GE",
grepl("^AEC", PaperCode) ~ "AEC",
grepl("^VAC", PaperCode) ~ "VAC",
TRUE ~ "Core"
))
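To see how the patterns above behave, this sketch runs the same classification on a few made-up paper codes. The codes are hypothetical and may not match the real coding scheme.

```r
library(dplyr)

# Hypothetical paper codes, one per expected category
demo <- data.frame(PaperCode=c("SEC-101", "ECOG-21", "AEC-EVS-1", "VAC-7", "DSC-4"))

demo <- demo %>%
  mutate(PaperType=case_when(
    grepl("^SEC", PaperCode) ~ "SEC",
    grepl("[A-Z]G-[0-9]+$", PaperCode) ~ "GE",
    grepl("^AEC", PaperCode) ~ "AEC",
    grepl("^VAC", PaperCode) ~ "VAC",
    TRUE ~ "Core"
  ))
demo

# The GE pattern is not anchored at the start, so any code that ends in
# "<letter>G-<digits>" is classified as GE, e.g. this hypothetical code:
ge_quirk <- grepl("[A-Z]G-[0-9]+$", "ENG-101")
ge_quirk
```

Since case_when() uses the first matching condition, the order of the rules matters: a code is only tested against the GE pattern if it does not start with SEC.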
Since the maximum IA marks differ by paper type, we compute the percentage of IA marks separately for each type:
marks <- marks %>%
mutate(IAPercentage=case_when(
PaperType=="SEC" ~ NA_real_,
PaperType=="GE" ~ TotalScoreIA*100/30,
grepl("^AEC-EVS", PaperCode) ~ TotalScoreIA*10,
grepl("^AEC", PaperCode) ~ TotalScoreIA*5,
PaperType=="VAC" ~ TotalScoreIA*10,
TRUE ~ TotalScoreIA*100/30
))
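The multipliers above can be read as score * 100 / maximum. The sketch below makes that explicit; the maximum marks are inferred from the multipliers (30 for GE and Core, 10 for AEC-EVS and VAC, 20 for other AEC papers) and are not stated in the source data, and the helper name is hypothetical.

```r
# Maximum IA marks inferred from the multipliers used above (an assumption)
ia_max <- c(GE=30, Core=30, "AEC-EVS"=10, AEC=20, VAC=10)

# Hypothetical helper: convert a raw IA score to a percentage
to_percentage <- function(score, category) {
  score * 100 / ia_max[[category]]
}

to_percentage(24, "Core")  # 24 out of 30
to_percentage(15, "AEC")   # 15 out of 20
to_percentage(8, "VAC")    # 8 out of 10
```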
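Finally, we treat NA values in the IA and CA columns by replacing each NA with the last non-NA value before it (last observation carried forward). This way, an NA is replaced by a value that most likely belongs to the same course, paper, or perhaps student. Note that IAPercentage was computed before this step, so it is not affected by the fill.
# na.rm=FALSE keeps a leading NA in place, so the column length is unchanged
marks$TotalScoreIA <- na.locf(marks$TotalScoreIA, na.rm=FALSE)
marks$TotalScoreCA <- na.locf(marks$TotalScoreCA, na.rm=FALSE)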
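The direction of the fill is easy to get wrong, so here is a small sketch of how na.locf() from the zoo package behaves on a made-up vector. The vector is purely illustrative.

```r
library(zoo)

# Made-up scores with a leading, an interior and a trailing gap
x <- c(NA, 12, NA, NA, 20, NA)

# Default: previous value carried forward, leading NA dropped (length changes)
forward_default <- na.locf(x)

# na.rm=FALSE keeps the leading NA, so the length matches the input
forward_keep <- na.locf(x, na.rm=FALSE)

# fromLast=TRUE fills each gap with the next value instead of the previous one
backward_keep <- na.locf(x, na.rm=FALSE, fromLast=TRUE)

length(forward_default)
forward_keep
backward_keep
```

Because the default drops leading NA values, assigning its result back into a data frame column can fail when the first entry is missing; na.rm=FALSE avoids that.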
ggplot(attendance, aes(x="",
y=Percentage)) +
geom_boxplot(linewidth=1) +
labs(x="All Students",
y="Percentage",
title="Attendance of Students",
caption="This is for students of all years.")
ggsave("img/boxplotall.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
In numbers:
summary(attendance$Percentage)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 12.93 33.33 35.79 56.01 96.87
One interpretation is that the lowest attendance is 0%, while the highest is just under 97%. We show the students with the highest attendance (94% or more):
attendance %>%
filter(Percentage >= 94) %>%
select(StudentName,
Department,
Year,
Percentage) %>%
arrange(desc(Percentage))
Below, we show the year-wise attendance of the students. It shows a decline in the attendance percentage over the successive years as expected. Notice the change in number of outliers over the years.
ggplot(attendance, aes(x="",
y=Percentage,
col=Year)) +
geom_boxplot(linewidth=1) +
labs(x="",
y="Percentage",
title="Attendance of Students")
ggsave("img/boxplot3.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
In numbers:
attendance %>%
group_by(Year) %>%
summarise(
Avg=mean(Percentage),
Max=max(Percentage),
Median=median(Percentage)
)
For students having attendance of at least 66%:
attendance %>%
filter(Percentage>=66) %>%
count(Year, name="ef") %>%
right_join(yrwisestudents, by="Year") %>%
mutate(Percentage=ef/Total*100) %>%
select(Year, Total, Percentage)
For students having attendance of at least 85%:
attendance %>%
filter(Percentage>=85) %>%
count(Year, name="ef") %>%
right_join(yrwisestudents, by="Year") %>%
mutate(Percentage=ef/Total*100) %>%
select(Year, Total, Percentage)
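The two computations above differ only in the threshold. A minimal sketch of a reusable version follows, using a hypothetical helper name and made-up data. It takes the mean of a logical condition within each year, which gives 0 rather than NA for a year in which no student reaches the threshold.

```r
library(dplyr)

# Hypothetical helper: share of students per year at or above a threshold
share_above <- function(data, threshold) {
  data %>%
    group_by(Year) %>%
    summarise(Total=n(),
              Percentage=mean(Percentage >= threshold) * 100)
}

# Made-up data purely for illustration
demo <- data.frame(Year=c("1", "1", "1", "1", "2", "2"),
                   Percentage=c(90, 70, 50, 10, 86, 20))

r66 <- share_above(demo, 66)
r85 <- share_above(demo, 85)
r66
r85
```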
In this category, the 3rd year students seem to be leading. Overall, out of more than 3000 students, only 85 attended 85% or more of the classes.
Below we show the average number of lectures attended and the average number of lectures delivered for each department. We note that Zoology held the largest number of lectures, whereas B.Com (Prog.) held the fewest.
# One row per department: summarise() avoids repeating the averages on every student row
attendanceDept <- attendance %>%
group_by(Department) %>%
summarise(AvgAttendance=mean(Attended,
na.rm=TRUE),
AvgDelivered=mean(AdjustedDelivered,
na.rm=TRUE))
attendanceDept %>%
pivot_longer(., cols=c(AvgAttendance,
AvgDelivered),
names_to="Type",
values_to="Score") %>%
ggplot(., aes(x=Department,
y=Score,
fill=Type)) +
geom_bar(stat="identity",
position="dodge") +
labs(x="Department",
y="Attendance",
title="Department wise Attendance") +
theme(axis.text.x=element_text(angle=60,
hjust=1))
ggsave("img/attdept1.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
Naturally, departments holding fewer lectures record lower attendance counts, so we also look at the ratio between the two metrics, which allows further interpretation.
attendanceDept %>%
mutate(Ratio=AvgAttendance/AvgDelivered) %>%
ggplot(., aes(x=Department,
y=Ratio)) +
geom_bar(stat="identity",
position="dodge") +
labs(x="Department",
y="Attended / Delivered",
title="Department wise Attendance Ratio") +
theme(axis.text.x=element_text(angle=45,
hjust=1))
In this section, we analyse the marks obtained by all students. The analysis focuses on IA marks only; see appendix A for CA marks. First, we show a density plot of IA marks.
ggplot(marks, aes(x=IAPercentage,
col=Year)) +
geom_density(linewidth=1) +
labs(x="Marks",
y="Density",
title="IA Marks")
ggsave("img/densityIA.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
The distribution is more evenly spread for 1st and 2nd year students than for 3rd years.
Notice that 2500 rows are left out of this plot because their IA percentage is missing or not a finite number. The great majority of these rows pertain to SEC courses, where marks are not listed at all except in a few cases.
Below, we present the average IA percentage per department. Students in the Botany department scored the lowest average marks in internal assessment, whereas the highest average is in Economics.
marks %>%
group_by(Department) %>%
summarise(AvgMarks=mean(IAPercentage,
na.rm=T)) %>%
ggplot(., aes(x=Department,
y=AvgMarks)) +
geom_bar(stat="identity",
position="dodge") +
labs(x="Department",
y="Average Marks",
fill="Category",
title="IA Marks of Students by Department") +
theme(axis.text.x=element_text(angle=45,
hjust=1))
ggsave("img/marksdept.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
marks %>%
group_by(Year) %>%
summarise(AvgMarks=mean(IAPercentage,
na.rm=T)) %>%
ggplot(., aes(x=Year,
y=AvgMarks)) +
geom_bar(stat="identity",
position="dodge") +
labs(x="Year",
y="Average Marks",
fill="Category",
title="IA Marks of Students by Years")
ggsave("img/marksyear.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
No comments.
The papers listed below are those with the highest average IA percentage (first table) and the lowest average IA percentage (second table), 20 papers in each case.
avgbypaper <- marks %>%
aggregate(IAPercentage ~ PaperName,
data=.,
FUN=mean) %>%
melt(.,
id.vars="PaperName",
variable.name="MarksType",
value.name="AverageMarks")
avgbypaper %>%
filter(MarksType == "IAPercentage") %>%
select(PaperName, AverageMarks) %>%
arrange(desc(AverageMarks)) %>%
head(20)
avgbypaper %>%
filter(MarksType == "IAPercentage") %>%
select(PaperName, AverageMarks) %>%
arrange(AverageMarks) %>%
head(20)
Last but most interesting: in this subsection, we analyse the relation between students' attendance percentage and the marks they received. Throughout this section, the dataset includes only students who took admission in 2022 and 2023, as these are the batches present in both datasets, so there are only two batches of students.
marks_avg <- marks %>%
group_by(RollNo) %>%
summarise(
IA=mean(IAPercentage, na.rm=T),
CA=mean(TotalScoreCA[TotalScoreCA>0])
)
marksattendance <- merge(marks_avg, attendance, by="RollNo")
set.seed(1)
ggplot(
marksattendance[sample(nrow(marksattendance), 1000), ],
aes(x=as.numeric(Percentage),
y=as.numeric(IA),
color=Year)) +
geom_point(alpha=.5) +
labs(x="Attendance Percentage",
y="IA Marks Percentage",
color="Year",
title="Relation between Attendance and IA Marks") +
geom_smooth(method=loess,
se=F)
## `geom_smooth()` using formula = 'y ~ x'
ggsave("img/attvsIA.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
## `geom_smooth()` using formula = 'y ~ x'
The plot shows a positive relation between attendance and marks obtained. The same holds for CA (appendix A). See appendix C for a department-wise display. One observation is that 2nd year students have consistently scored more marks than 1st year students, especially in IA. See appendix D for an interactive application related to this.
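The relation is read here from the scatter plot alone. One way to put a number on it, sketched below on made-up data, is a rank correlation; this is an optional check and not something computed in this analysis. Spearman's coefficient is chosen as an assumption, since it does not require the relation to be linear.

```r
# Made-up attendance and IA percentages, purely for illustration
demo <- data.frame(Percentage=c(10, 25, 40, 55, 70, 85),
                   IA=c(35, 50, 48, 62, 75, 80))

# Spearman rank correlation; use="complete.obs" skips rows with missing values
rho <- cor(demo$Percentage, demo$IA, method="spearman", use="complete.obs")
rho

# The same call could be applied to the merged data, for example:
# cor(marksattendance$Percentage, marksattendance$IA,
#     method="spearman", use="complete.obs")
```

A value near 1 would support the reading of the plot, while a value near 0 would suggest the smoothed curve is driven by a few points.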
This document analysed student attendance and marks at Sri Guru Tegh Bahadur Khalsa College, University of Delhi, covering data from 2021–2024. It finds a positive relation between attendance and marks, with higher attendance linked to better performance; this is more a confirmation than a fresh fact, but it is one of the most interesting results of the analysis. Attendance declined over the years, and only a small share of students (on average 2.5%) had over 85% attendance. Parts of the analysis, such as the lists of papers with the highest and lowest average marks, can be especially helpful for students in making their choices.
The data for CA marks is not as reliable as is for IA marks, on the following counts:
The above facts must be kept in mind before making any interpretations from the following analysis. For example, departments of natural sciences such as Physics will record very low CA marks, as most of the papers taught there are practical in nature, whereas departments of humanities will record comparatively high CA marks, as no paper there is practical in nature.
We start with a plot that illustrates what has been stated above:
marks %>%
group_by(Department) %>%
summarise(AvgMarks=mean(TotalScoreCA,
na.rm=T)) %>%
ggplot(., aes(x=Department,
y=AvgMarks)) +
geom_bar(stat="identity",
position="dodge") +
labs(x="Department",
y="Average Marks",
fill="Category",
title="CA Marks of Students by Department") +
theme(axis.text.x=element_text(angle=45,
hjust=1))
ggsave("img/CAmarksdept.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
To make things a little better, we employ the following remedy: remove all zero entries in CA column.
marksCA <- marks %>% filter(TotalScoreCA>0)
With just that, we find that the previous graph has improved:
marksCA %>%
group_by(Department) %>%
summarise(AvgMarks=mean(TotalScoreCA,
na.rm=T)) %>%
ggplot(., aes(x=Department,
y=AvgMarks)) +
geom_bar(stat="identity",
position="dodge") +
labs(x="Department",
y="Average Marks",
fill="Category",
title="CA Marks of Students by Department") +
theme(axis.text.x=element_text(angle=45,
hjust=1))
ggsave("img/CAmarksdept_nonzero.png", last_plot(), dpi=300) # separate file name so the earlier plot is not overwritten
## Saving 7 x 5 in image
However, this is only a partial improvement. If we look at the number of entries from each department now, we find that it is worryingly uneven.
marksCA %>%
group_by(Department) %>%
summarise(Count=n()) %>%
mutate(Percentage=Count/sum(Count)*100) %>%
ggplot(., aes(x=Department,
y=Count)) +
geom_bar(stat="identity",
position="dodge") +
theme(axis.text.x=element_text(angle=45,
hjust=1))
Regardless, turning to the relation between attendance and CA marks, we find that the result is still the same: higher attendance goes with higher marks.
set.seed(1)
ggplot(
marksattendance[sample(nrow(marksattendance), 1000), ],
aes(x=as.numeric(Percentage),
y=as.numeric(CA),
color=Year)) +
geom_point(alpha=.5) +
labs(x="Attendance Percentage",
y="CA Marks",
color="Year",
title="Relation between Attendance and CA Marks") +
geom_smooth(method=loess,
se=F)
## `geom_smooth()` using formula = 'y ~ x'
ggsave("img/attvsCA.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
## `geom_smooth()` using formula = 'y ~ x'
As noted earlier, the attendance data concerns students who took admission in 2021, 2022, and 2023, whereas the marks data concerns students who took admission in 2022, 2023, and 2024. Furthermore, marks are listed for the most recent semester whereas attendance is listed cumulatively. Parts of this analysis, especially the relation between marks and attendance, therefore rely on the assumption that a student with good cumulative attendance who scored good marks in a certain semester will continue to do so in other semesters as well.
ggplot(
marksattendance,
aes(x=as.numeric(Percentage),
y=as.numeric(IA),
color=Year)) +
geom_point(alpha=.5) +
labs(x="Attendance Percentage",
y="IA Marks",
color="Year",
title="Attendance vs IA by Dept") +
facet_wrap(~Department)
ggsave("img/attvsIAdept.png",
last_plot(),
dpi=500,
width=12,
height=8.5)
ggplot(
marksattendance,
aes(x=as.numeric(Percentage),
y=as.numeric(CA),
color=Year)) +
geom_point(alpha=.5) +
labs(x="Attendance Percentage",
y="CA Marks",
color="Year",
title="Attendance vs CA by Dept") +
facet_wrap(~Department)
ggsave("img/attvsCAdept.png",
last_plot(),
dpi=500,
width=12,
height=8.5)
We created a Shiny application for interactively viewing the relation between attendance and IA marks. Students can open the application and see where they stand in the scatter plot: when they enter their roll number, their dot on the scatter plot is highlighted. The application is available at namantaggar.shinyapps.io/sgtb.
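The source of the application is not reproduced here. As an illustration of the highlighting idea only, the sketch below draws all students in grey and one chosen roll number in red. The helper name and the data are hypothetical, and this is not necessarily how the application itself is implemented.

```r
library(ggplot2)

# Hypothetical helper: scatter plot with one roll number highlighted
highlight_student <- function(data, roll) {
  is_target <- data$RollNo == roll
  ggplot(data, aes(x=Percentage, y=IA)) +
    geom_point(data=data[!is_target, ], colour="grey60", alpha=.5) +
    geom_point(data=data[is_target, ], colour="red", size=3) +
    labs(x="Attendance Percentage",
         y="IA Marks Percentage")
}

# Made-up data purely for illustration
demo <- data.frame(RollNo=c("A1", "A2", "A3"),
                   Percentage=c(30, 60, 90),
                   IA=c(40, 65, 85))

p <- highlight_student(demo, "A2")
p
```

In a Shiny application, a function like this could be called inside renderPlot() with the roll number taken from a text input.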