1. Introduction

This document analyses the attendance and marks of students at Sri Guru Tegh Bahadur Khalsa College, University of Delhi. The results are presented in several ways, including year-wise and course-wise attendance and marks.

The data were taken from the files 13.pdf, 14.pdf and 15.pdf available at https://sgtbkhalsa.online/uploadfiles/postings/notices.

The attendance data cover students admitted in 2021, 2022, and 2023, and record the cumulative attendance of every student. The marks data cover students admitted in 2022, 2023, and 2024, and record marks in the most recent semester at that time.

This document includes the source code. The CSV files can be obtained from the authors. An interactive application related to section 3 is linked in appendix D.
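The code in this document relies on several packages. Going by the session information in appendix E, the attached packages are dplyr, ggplot2, tidyr, reshape2, zoo and readr. The following is a minimal sketch of a setup step (an assumption, not the authors' original setup) that attaches whichever of them are installed and reports the rest.

```r
# Packages listed as attached in appendix E; the setup step itself is an assumption
pkgs <- c("dplyr", "ggplot2", "tidyr", "reshape2", "zoo", "readr")

# TRUE/FALSE per package: is it installed?
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach the installed packages; report the missing ones instead of failing
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```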

We load and present the datasets:

# Load the attendance data and preview the first rows
attendance <- read.csv("attendance.csv", header=TRUE)
head(attendance)

We add a new column giving each student's year of study in the session 2023–24:

attendance <- attendance %>%
  mutate(Year=case_when(
    grepl("^2023", RollNo) ~ "1",
    grepl("^2022", RollNo) ~ "2",
    grepl("^2021", RollNo) ~ "3",
    TRUE ~ NA
  ))
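The mapping from roll number to year of study can also be sketched arithmetically in base R. The roll numbers below are hypothetical; the only assumption taken from the patterns above is that the first four characters hold the admission year.

```r
# Hypothetical roll numbers; characters 1-4 are assumed to be the admission year
roll <- c("20235010001", "20225020002", "20215030003", "20195040004")

session_start <- 2023L                      # session 2023-24
admission <- as.integer(substr(roll, 1, 4))
year <- session_start - admission + 1L      # 2023 -> 1, 2022 -> 2, 2021 -> 3

# Anything outside a three-year course is treated as unknown
year[!year %in% 1:3] <- NA
year_of_study <- as.character(year)
year_of_study
```

Storing the year as a character keeps it discrete when it is later mapped to a colour or fill in a plot.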

We show the number of students in the dataset for each year:

yrwisestudents <- attendance %>%
  group_by(Year) %>%
  summarise(
    Total=n()
  )
yrwisestudents
ggplot(yrwisestudents, aes(x="",
                           y=Total,
                           fill=Year)) +
  geom_bar(stat="identity",
           width=1) +
  labs(x="",
       y="",
       title="Number of Students",
       caption="The data corresponds to the session 2023-24") +
  coord_polar(theta="y")

ggsave("img/no.ofstudents.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

We add a new column giving the department of each student. This is done with a lookup from course code to department, called courses, which is indexed by characters 5 to 7 of the roll number.

attendance <- attendance %>%
  mutate(Department=courses[substr(RollNo, 5, 7)])
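The definition of courses is not shown in this document. A minimal sketch of how such a lookup could work, assuming it is a named character vector, is given below; the codes, department assignments and roll numbers are all hypothetical.

```r
# Hypothetical code -> department lookup (the real `courses` object is not shown)
courses <- c("501" = "Physics", "502" = "Economics")

# Hypothetical roll numbers; characters 5-7 carry the course code
roll <- c("20235010001", "20225020002", "20219990003")

# Indexing a named vector by name returns NA for unknown codes
dept <- unname(courses[substr(roll, 5, 7)])
dept
```

A code that is absent from the lookup yields NA rather than an error, so it is worth checking for NA departments after this step.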

We perform the same operations on the marks dataset. Since the marks data are one year later than the attendance data, the admission years shift by one:

marks <- read.csv("marks.csv",
                  header=TRUE)
marks <- marks %>%
  mutate(Year = case_when(
    grepl("^2024", RollNo) ~ "1",
    grepl("^2023", RollNo) ~ "2",
    grepl("^2022", RollNo) ~ "3",
    TRUE ~ NA
  ))
marks <- marks %>%
  mutate(Department = courses[substr(RollNo, 5, 7)])
head(marks)
marks %>%
  group_by(Year) %>%
  summarise(
    Total=n()
  )

We now add a new column corresponding to the paper type (AEC, SEC, GE, Core, VAC):

marks <- marks %>%
  mutate(PaperType=case_when(
    grepl("^SEC", PaperCode) ~ "SEC",
    grepl("[A-Z]G-[0-9]+$", PaperCode) ~ "GE",
    grepl("^AEC", PaperCode) ~ "AEC",
    grepl("^VAC", PaperCode) ~ "VAC",
    TRUE ~ "Core"
  ))
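The order of the patterns matters, since the first match wins and everything unmatched falls through to "Core". The sketch below reproduces the same rules with nested ifelse calls in base R; the paper codes are hypothetical and serve only to show what each pattern matches.

```r
# Hypothetical paper codes, one per category
codes <- c("SEC-12", "AEC-EVS-1", "VAC-7", "ECOG-101", "PHY-201")

# Same rules as above, checked in the same order; the first match wins
paper_type <- ifelse(grepl("^SEC", codes), "SEC",
              ifelse(grepl("[A-Z]G-[0-9]+$", codes), "GE",
              ifelse(grepl("^AEC", codes), "AEC",
              ifelse(grepl("^VAC", codes), "VAC", "Core"))))
paper_type
```

The GE pattern looks for a capital letter followed by "G-" and digits at the end of the code, so "ECOG-101" matches while "PHY-201" does not.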

Since the maximum IA marks differ by paper type, we convert the IA marks to a percentage separately for each type:

marks <- marks %>%
  mutate(IAPercentage=case_when(
    PaperType=="SEC" ~ NA,
    PaperType=="GE" ~ TotalScoreIA*100/30,
    grepl("^AEC-EVS", PaperCode) ~ TotalScoreIA*10,
    grepl("^AEC", PaperCode) ~ TotalScoreIA*5,
    PaperType=="VAC" ~ TotalScoreIA*10,
    TRUE ~ TotalScoreIA*100/30
  ))
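The multipliers above imply a maximum IA score for each paper type: 30 for GE and Core, 10 for AEC-EVS and VAC, and 20 for the remaining AEC papers. These maxima are inferred from the code, not stated by the college. A small worked example with hypothetical scores:

```r
# Maximum IA marks per paper type, inferred from the multipliers above
max_ia <- c(GE = 30, Core = 30, "AEC-EVS" = 10, AEC = 20, VAC = 10)

# Hypothetical raw IA scores, one per paper type
score <- c(GE = 24, Core = 15, "AEC-EVS" = 7, AEC = 13, VAC = 9)

# Percentage = score / maximum * 100
pct <- score / max_ia * 100
pct
```

For instance, 13 out of 20 in an AEC paper is 13 * 5 = 65%, matching the TotalScoreIA*5 rule.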

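Finally, we treat the NA values in the IA and CA columns. We replace each NA with the most recent non-NA value above it (last observation carried forward, which is what na.locf does by default). This way, each NA is replaced by a value from a neighbouring row, which belongs to the same course, paper, or perhaps student.

# na.rm=FALSE keeps any leading NA, so the result has the same length as the column
marks$TotalScoreIA <- na.locf(marks$TotalScoreIA, na.rm=FALSE)
marks$TotalScoreCA <- na.locf(marks$TotalScoreCA, na.rm=FALSE)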
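By default, zoo's na.locf fills each NA with the previous non-NA value and drops leading NAs; filling from the next value instead requires fromLast=TRUE. The direction of the fill can be illustrated with a base-R stand-in (the helper locf below is hypothetical and is not part of zoo):

```r
# Hypothetical helper: last observation carried forward, keeping leading NAs
locf <- function(x) {
  idx <- cumsum(!is.na(x))          # index of the latest non-NA value so far
  c(NA, x[!is.na(x)])[idx + 1]      # position 1 is NA for leading gaps
}

x <- c(NA, 5, NA, 7)
filled_prev <- locf(x)              # fill from the previous value
filled_next <- rev(locf(rev(x)))    # fill from the next value
filled_prev
filled_next
```

The first result keeps the leading NA because there is no earlier value to carry forward; the second fills it from the value that follows.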

2. Introductory Analysis

ggplot(attendance, aes(x="",
                       y=Percentage)) +
  geom_boxplot(linewidth=1) +
  labs(x="All Students",
       y="Percentage",
       title="Attendance of Students",
       caption="This is for students of all years.")

ggsave("img/boxplotall.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

In numbers:

summary(attendance$Percentage)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   12.93   33.33   35.79   56.01   96.87

Attendance ranges from 0% up to 96.87%, with a median of 33.33% and a mean of 35.79%. We list the students with the highest attendance (94% or above):

attendance %>%
  filter(Percentage >= 94) %>%
  select(StudentName,
         Department,
         Year,
         Percentage) %>%
  arrange(desc(Percentage))

Below, we show the year-wise attendance of the students. As expected, the attendance percentage declines over successive years of study. Note also how the number of outliers changes across the years.

ggplot(attendance, aes(x="",
                       y=Percentage,
                       col=Year)) +
  geom_boxplot(linewidth=1) +
  labs(x="",
       y="Percentage",
       title="Attendance of Students")

ggsave("img/boxplot3.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

In numbers:

attendance %>%
  group_by(Year) %>%
  summarise(
    Avg=mean(Percentage),
    Max=max(Percentage),
    Median=median(Percentage)
  )

Share of students with attendance of at least 66%, by year:

attendance %>%
  filter(Percentage>=66) %>%
  count(Year, name="ef") %>%
  right_join(yrwisestudents, by="Year") %>%
  mutate(Percentage=ef/Total*100) %>%
  select(Year, Total, Percentage)

Share of students with attendance of at least 85%, by year:

attendance %>%
  filter(Percentage>=85) %>%
  count(Year, name="ef") %>%
  right_join(yrwisestudents, by="Year") %>%
  mutate(Percentage=ef/Total*100) %>%
  select(Year, Total, Percentage)

In this category, the 3rd year students seem to be leading. In total, out of over 3000 students, only 85 attended 85% or more of the classes.
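The share of students at or above a threshold is simply the mean of a logical condition within each year. A minimal base-R sketch on hypothetical data (the helper share is an illustrative name):

```r
# Hypothetical attendance data
toy <- data.frame(Year = c("1", "1", "1", "2", "2", "3"),
                  Percentage = c(90, 70, 20, 88, 10, 66))

# Percentage of students per year at or above a given attendance threshold
share <- function(threshold) {
  tapply(toy$Percentage >= threshold, toy$Year, mean) * 100
}

share66 <- share(66)
share85 <- share(85)
share85
```

Unlike the count-and-join approach, a year with no student above the threshold shows up here as 0 rather than NA.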

Course-Wise Attendance

Below we show the average number of lectures attended and delivered for each department. Zoology held the most lectures, whereas B.Com (Prog.) held the fewest.

# One row per department: average lectures attended and delivered
attendanceDept <- attendance %>%
  group_by(Department) %>%
  summarise(AvgAttendance=mean(Attended,
                               na.rm=TRUE),
            AvgDelivered=mean(AdjustedDelivered,
                              na.rm=TRUE))
attendanceDept %>% 
  pivot_longer(., cols=c(AvgAttendance,
                         AvgDelivered),
               names_to="Type",
               values_to="Score") %>% 
  ggplot(., aes(x=Department,
                y=Score,
                fill=Type)) +
  geom_bar(stat="identity",
           position="dodge") +
  labs(x="Department",
       y="Attendance",
       title="Department wise Attendance") +
  theme(axis.text.x=element_text(angle=60,
                                 hjust=1))

ggsave("img/attdept1.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

Naturally, departments that hold fewer lectures record lower attendance counts, so the ratio of the two metrics is more informative:

attendanceDept %>%
  mutate(Ratio=AvgAttendance/AvgDelivered) %>% 
  ggplot(., aes(x=Department,
                y=Ratio)) +
  geom_bar(stat="identity",
           position="dodge") +
  labs(x="Department",
       y="Attended / Delivered",
       title="Department wise Attendance Ratio") +
  theme(axis.text.x=element_text(angle=45,
                                 hjust=1))
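The ratio plotted here is the ratio of the two department averages. A minimal base-R sketch of the same computation, on hypothetical data with the same column names:

```r
# Hypothetical per-student records for two departments
toy <- data.frame(Department = c("A", "A", "B", "B"),
                  Attended = c(30, 50, 10, 20),
                  AdjustedDelivered = c(60, 60, 40, 40))

# Average both columns per department, then divide
avg <- aggregate(cbind(Attended, AdjustedDelivered) ~ Department,
                 data = toy, FUN = mean)
avg$Ratio <- avg$Attended / avg$AdjustedDelivered
avg
```

Department A attends 40 of 60 lectures on average and department B 15 of 40, so A has the higher ratio even though the raw counts are not directly comparable.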

3. Marks

In this section, we analyse the marks obtained by all students. The analysis focuses on IA marks only; see appendix A for CA marks. First, we show a density plot of the IA marks.

ggplot(marks, aes(x=IAPercentage,
                  col=Year)) +
  geom_density(linewidth=1) +
  labs(x="Marks",
       y="Density",
       title="IA Marks")

ggsave("img/densityIA.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

The distribution is more evenly spread for 1st and 2nd year students than for 3rd years.

Note that 2500 rows were left out of the plot because their IA percentage is not available. The great majority of these rows pertain to SEC papers, for which marks are not listed at all except in a few cases.

Department-wise Marks

Below, we present the average IA marks per department. Students in the Botany department scored the lowest marks in internal assessment on average, whereas the highest scorers are from Economics.

marks %>%
  group_by(Department) %>%
  summarise(AvgMarks=mean(IAPercentage,
                          na.rm=T)) %>% 
  ggplot(., aes(x=Department,
                y=AvgMarks)) + 
  geom_bar(stat="identity",
           position="dodge") +
  labs(x="Department",
       y="Average Marks",
       fill="Category",
       title="IA Marks of Students by Department") +
  theme(axis.text.x=element_text(angle=45,
                                 hjust=1))

ggsave("img/marksdept.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

Year-wise Marks

marks %>%
  group_by(Year) %>%
  summarise(AvgMarks=mean(IAPercentage,
                          na.rm=T)) %>% 
  ggplot(., aes(x=Year,
                y=AvgMarks)) +  
  geom_bar(stat="identity",
           position="dodge") +
  labs(x="Year",
       y="Average Marks",
       fill="Category",
       title="IA Marks of Students by Years")

ggsave("img/marksyear.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

No comments.

Paper-wise Marks

The tables below list the 20 papers with the highest and the 20 papers with the lowest average IA marks.

avgbypaper <- marks %>% 
  aggregate(IAPercentage ~ PaperName,
            data=.,
            FUN=mean) %>%
  melt(.,
       id.vars="PaperName",
       variable.name="MarksType",
       value.name="AverageMarks")

avgbypaper %>%
  filter(MarksType == "IAPercentage") %>%
  select(PaperName, AverageMarks) %>% 
  arrange(desc(AverageMarks)) %>%
  head(20)
avgbypaper %>%
  filter(MarksType == "IAPercentage") %>%
  select(PaperName, AverageMarks) %>% 
  arrange(AverageMarks) %>%
  head(20)
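The same ranking can be sketched in base R without reshaping. The data below are hypothetical. Note that aggregate with a formula drops rows whose value is NA before averaging.

```r
# Hypothetical marks; one IA percentage is missing
toy <- data.frame(PaperName = c("P1", "P1", "P2", "P3", "P3"),
                  IAPercentage = c(80, 60, 90, NA, 40))

# Average per paper (rows with NA are omitted by the formula interface)
avg <- aggregate(IAPercentage ~ PaperName, data = toy, FUN = mean)

# Highest and lowest averages first, respectively
top <- avg[order(-avg$IAPercentage), ]
bottom <- avg[order(avg$IAPercentage), ]
head(top, 2)
head(bottom, 2)
```

A paper with few listed marks can land at either extreme by chance, so the number of entries per paper is worth checking alongside the average.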

Relation between Marks and Attendance

Last but most interesting: in this subsection, we analyse the relation between students' attendance percentage and the marks they received. Throughout this subsection, the dataset includes only students admitted in 2022 and 2023, so there are only two batches of students.

marks_avg <- marks %>%
  group_by(RollNo) %>%
  summarise(
    IA=mean(IAPercentage, na.rm=T),
    CA=mean(TotalScoreCA[TotalScoreCA>0])
  )
marksattendance <- merge(marks_avg, attendance, by="RollNo")
set.seed(1)
ggplot(
  marksattendance[sample(nrow(marksattendance), 1000), ],
  aes(x=as.numeric(Percentage),
      y=as.numeric(IA),
      color=Year)) +
  geom_point(alpha=.5) +
  labs(x="Attendance Percentage",
       y="IA Marks Percentage",
       color="Year",
       title="Relation between Attendance and IA Marks") +
  geom_smooth(method=loess,
              se=F)
## `geom_smooth()` using formula = 'y ~ x'

ggsave("img/attvsIA.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
## `geom_smooth()` using formula = 'y ~ x'

The plot shows a direct relation between attendance and marks obtained. The same holds for CA (appendix A). See appendix C for a department-wise display. One observation is that 2nd year students have consistently scored more marks than 1st year students, especially in IA. See appendix D for an interactive application related to this.
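The relation is shown only graphically in this document. One way to put a number on it would be a correlation coefficient; the sketch below uses hypothetical data and is not a result from the actual dataset.

```r
# Hypothetical attendance and IA percentages for six students
toy <- data.frame(Percentage = c(10, 25, 40, 55, 70, 85),
                  IA = c(35, 50, 45, 60, 75, 80))

# Pearson measures linear association; Spearman uses ranks and is
# less sensitive to outliers and to non-linear but monotone trends
r_pearson <- cor(toy$Percentage, toy$IA, use = "complete.obs")
r_spearman <- cor(toy$Percentage, toy$IA, use = "complete.obs",
                  method = "spearman")
c(pearson = r_pearson, spearman = r_spearman)
```

Since the loess curves need not be straight lines, the rank-based coefficient may be the safer summary.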

4. Conclusion

This document analysed student attendance and marks at Sri Guru Tegh Bahadur Khalsa College, University of Delhi, covering data from 2021–2024. It finds a clear positive relation between attendance and marks: higher attendance goes with better performance. Attendance is lower in later years of study, and only a small share of students (about 2.5% on average) had attendance of 85% or more. Parts of the analysis, such as the lists of papers with the highest and lowest average marks, can be especially helpful to students in making their choices. The most interesting takeaway, though more a confirmation than a fresh fact, is the direct relation between attendance and marks.

A. CA Marks

The data for CA marks are not as reliable as those for IA marks, for the following reasons:

  1. For practical papers, the college does not provide the marks obtained in CA; it lists them as zeroes. This affects all analysis of such papers.
  2. For SEC papers, no marks are listed except for a handful of exceptional cases.

These facts must be kept in mind when interpreting the following analysis. For example, natural science departments such as Physics will record very low CA marks, as most of the papers taught there are practical in nature, whereas humanities departments will record comparatively high CA marks, as no paper there is practical in nature.
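How placeholder zeroes pull a department's average down can be seen in a small example with hypothetical marks:

```r
# Hypothetical CA marks for one department; zeroes stand for practical papers
ca <- c(0, 0, 0, 18, 20, 22)

avg_all <- mean(ca)             # average including placeholder zeroes
avg_listed <- mean(ca[ca > 0])  # average over the marks actually listed
c(all = avg_all, listed = avg_listed)
```

Half of the entries being zero halves the average, even though every listed mark is 18 or higher.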

We start with a plot that illustrates the point above:

marks %>%
  group_by(Department) %>%
  summarise(AvgMarks=mean(TotalScoreCA,
                          na.rm=T)) %>% 
  ggplot(., aes(x=Department,
                y=AvgMarks)) + 
  geom_bar(stat="identity",
           position="dodge") +
  labs(x="Department",
       y="Average Marks",
       fill="Category",
       title="CA Marks of Students by Department") +
  theme(axis.text.x=element_text(angle=45,
                                 hjust=1))

ggsave("img/CAmarksdept.png", last_plot(), dpi=300)
## Saving 7 x 5 in image

To improve matters a little, we apply the following remedy: remove all zero entries in the CA column.

marksCA <- marks %>% filter(TotalScoreCA>0)

With just that, the previous graph changes considerably:

marksCA %>%
  group_by(Department) %>%
  summarise(AvgMarks=mean(TotalScoreCA,
                          na.rm=T)) %>% 
  ggplot(., aes(x=Department,
                y=AvgMarks)) + 
  geom_bar(stat="identity",
           position="dodge") +
  labs(x="Department",
       y="Average Marks",
       fill="Category",
       title="CA Marks of Students by Department") +
  theme(axis.text.x=element_text(angle=45,
                                 hjust=1))

ggsave("img/CAmarksdept_nonzero.png", last_plot(), dpi=300)  # separate file so the earlier plot is not overwritten
## Saving 7 x 5 in image

However, this is only a partial improvement. The number of entries per department is now concerningly uneven:

marksCA %>%
  group_by(Department) %>%
  summarise(Count=n()) %>%
  mutate(Percentage=Count/sum(Count)*100) %>% 
  ggplot(., aes(x=Department,
                y=Count)) +
  geom_bar(stat="identity",
           position="dodge") +
  theme(axis.text.x=element_text(angle=45,
                                 hjust=1))

Regardless, the relation between attendance and CA marks is still the same: higher attendance goes with higher marks.

set.seed(1)
ggplot(
  marksattendance[sample(nrow(marksattendance), 1000), ],
  aes(x=as.numeric(Percentage),
      y=as.numeric(CA),
      color=Year)) +
  geom_point(alpha=.5) +
  labs(x="Attendance Percentage",
       y="CA Marks",
       color="Year",
       title="Relation between Attendance and CA Marks") +
  geom_smooth(method=loess,
              se=F)
## `geom_smooth()` using formula = 'y ~ x'

ggsave("img/attvsCA.png", last_plot(), dpi=300)
## Saving 7 x 5 in image
## `geom_smooth()` using formula = 'y ~ x'

B. Disclaimer

As noted earlier, the attendance data concern students admitted in 2021, 2022, and 2023, whereas the marks data concern students admitted in 2022, 2023, and 2024. Furthermore, marks are listed for the most recent semester only, whereas attendance is cumulative. Parts of this analysis, especially the relation between marks and attendance, therefore rely on the assumption that a student who has good cumulative attendance and scored good marks in one semester will continue to do so in other semesters as well.
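The overlap between the two datasets follows from how merge works: by default it keeps only the roll numbers present in both tables. A minimal sketch with hypothetical roll numbers, one per admission year:

```r
# Hypothetical tables: marks cover 2022-2024, attendance covers 2021-2023
toy_marks <- data.frame(RollNo = c("2024A", "2023A", "2022A"),
                        IA = c(70, 60, 80))
toy_att <- data.frame(RollNo = c("2023A", "2022A", "2021A"),
                      Percentage = c(40, 55, 30))

# Inner join: only roll numbers found in both tables survive
both <- merge(toy_marks, toy_att, by = "RollNo")
both
```

Only the 2022 and 2023 admissions remain, which is why the marks-versus-attendance plots show two batches.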

C. Detailed Marks vs. Attendance

ggplot(
  marksattendance,
  aes(x=as.numeric(Percentage),
      y=as.numeric(IA),
      color=Year)) +
  geom_point(alpha=.5) +
  labs(x="Attendance Percentage",
       y="IA Marks",
       color="Year",
       title="Attendance vs IA by Dept") +
  facet_wrap(~Department)

ggsave("img/attvsIAdept.png",
       last_plot(),
       dpi=500,
       width=12,
       height=8.5)
ggplot(
  marksattendance,
  aes(x=as.numeric(Percentage),
      y=as.numeric(CA),
      color=Year)) +
  geom_point(alpha=.5) +
  labs(x="Attendance Percentage",
       y="CA Marks",
       color="Year",
       title="Attendance vs CA by Dept") +
  facet_wrap(~Department)

ggsave("img/attvsCAdept.png",
       last_plot(),
       dpi=500,
       width=12,
       height=8.5)

D. Shiny App

We created a Shiny application for interactively viewing the relation between attendance and IA marks. Students can open the application and see where they stand in the scatter plot: on entering their roll number, their dot is highlighted. The application is available at namantaggar.shinyapps.io/sgtb.

E. Authors

marksattendance %>% 
  filter(StudentName=="Naman Taggar" | StudentName=="DIPANSH CHAUDHARY") %>% 
  select(StudentName,
         RollNo,
         IA,
         CA,
         Percentage) %>% 
  rename(
    "Name"=StudentName,
    "Roll Number"=RollNo,
    "IA (%)"=IA,
    "CA (Marks)"=CA,
    "Attendance (%)"=Percentage
  )

sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8   
## [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.utf8    
## 
## time zone: Asia/Calcutta
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] zoo_1.8-13     readr_2.1.5    reshape2_1.4.4 tidyr_1.3.1    ggplot2_3.5.1 
## [6] dplyr_1.1.4   
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9         generics_0.1.3     stringi_1.8.4      lattice_0.22-6    
##  [5] hms_1.1.3          digest_0.6.35      magrittr_2.0.3     evaluate_1.0.3    
##  [9] grid_4.4.1         RColorBrewer_1.1-3 fastmap_1.2.0      Matrix_1.7-3      
## [13] plyr_1.8.9         jsonlite_2.0.0     mgcv_1.9-1         purrr_1.0.4       
## [17] scales_1.3.0       textshaping_1.0.0  jquerylib_0.1.4    cli_3.6.2         
## [21] rlang_1.1.4        splines_4.4.1      munsell_0.5.1      withr_3.0.2       
## [25] cachem_1.1.0       yaml_2.3.8         tools_4.4.1        tzdb_0.5.0        
## [29] colorspace_2.1-0   vctrs_0.6.5        R6_2.6.1           lifecycle_1.0.4   
## [33] stringr_1.5.1      ragg_1.3.3         pkgconfig_2.0.3    pillar_1.10.1     
## [37] bslib_0.9.0        gtable_0.3.6       glue_1.7.0         Rcpp_1.0.14       
## [41] systemfonts_1.1.0  xfun_0.51          tibble_3.2.1       tidyselect_1.2.1  
## [45] highr_0.11         rstudioapi_0.17.1  knitr_1.48         farver_2.1.2      
## [49] nlme_3.1-167       htmltools_0.5.8.1  rmarkdown_2.29     labeling_0.4.3    
## [53] compiler_4.4.1