Introduction

This report analyses the current CGPA of a sample of students (\(n=67\)) from intake 51 of the CSE program at BUBT. It explores the distribution of CGPA both graphically and numerically, and compares the CGPA of the selected students across several demographic characteristics.

The data were collected using a Google Form. All analyses were carried out in the R statistical software (R-base?).

Importing data and required packages

library(readr)
library(tidyverse)
library(ggthemes)
library(patchwork)
library(kableExtra)
library(gtsummary)
library(broom)

CGPA_data<- read_csv("CGPA Analysis_cleaned.csv")
#View(CGPA_data)
glimpse(CGPA_data)
## Rows: 73
## Columns: 9
## $ Timestamp                                 <chr> "2023/07/16 10:50:51 PM GMT+…
## $ Intake                                    <chr> "51", "51", "51", "51", "51"…
## $ Section                                   <dbl> 2, 1, 1, 4, 4, 4, 1, 1, 1, 3…
## $ `Current CGPA`                            <dbl> 3.19, 2.40, 3.05, 3.54, 3.64…
## $ Gender                                    <chr> "Male", "Male", "Male", "Fem…
## $ `Come from ...`                           <chr> "Urban area", "Rural area", …
## $ `Live in or with`                         <chr> "With family", "Mess", "With…
## $ `Do you feel mentally healthy right now?` <chr> "Yes", "No", "Yes", "Yes", "…
## $ `Do you feel physically healthy?`         <chr> "Yes", "Yes", "Yes", "Yes", …
CGPA_data$Section<-factor(CGPA_data$Section)
sum(is.na(CGPA_data))# Total number of missing values
## [1] 7
colSums(is.na(CGPA_data))# Missing values in Each variable
##                               Timestamp                                  Intake 
##                                       0                                       0 
##                                 Section                            Current CGPA 
##                                       1                                       3 
##                                  Gender                           Come from ... 
##                                       0                                       0 
##                         Live in or with Do you feel mentally healthy right now? 
##                                       1                                       1 
##         Do you feel physically healthy? 
##                                       1
# Dropping the rows that contain missing values

CGPA_data %>% drop_na()->CGPA_data
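
The output above reports 7 missing cells, yet the row count falls only from 73 to 67, because `drop_na()` removes a row once no matter how many of its cells are `NA`. A minimal sketch of this behaviour on a made-up data frame (the name `toy` and its values are illustrative, not the survey data):

```r
library(tidyr)

# Hypothetical toy data: 5 rows, 3 missing cells spread over 2 rows
toy <- data.frame(
  cgpa    = c(3.1, NA, 3.4, 2.9, NA),
  section = c("1", "2", "3", "4", NA)
)

n_missing_cells <- sum(is.na(toy))            # counts missing cells
n_incomplete    <- sum(!complete.cases(toy))  # counts rows with at least one NA
toy_complete    <- drop_na(toy)               # keeps only fully observed rows

n_missing_cells
n_incomplete
nrow(toy_complete)
```

Comparing the number of missing cells with the number of incomplete rows before dropping shows how many observations will actually be lost.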

Characteristics of the students from raw data

The table below shows that, of the 67 students, only 6 (9%) said that they lived in a hostel and only 5 (7.5%) reported that they lived with a relative. Because these categories are small, each is merged with a related category: `Hostel` with `Mess`, and `With relative` with `With family`.

CGPA_data %>% select(Section,Gender,`Come from ...`,`Live in or with`,
                     `Do you feel mentally healthy right now?`,
                     `Do you feel physically healthy?`) %>%
  tbl_summary()
Characteristic                              N = 67¹
Section
    1                                       17 (25%)
    2                                       21 (31%)
    3                                       10 (15%)
    4                                       19 (28%)
Gender
    Female                                  19 (28%)
    Male                                    48 (72%)
Come from ...
    Rural area                              29 (43%)
    Urban area                              38 (57%)
Live in or with
    Hostel                                  6 (9.0%)
    Mess                                    26 (39%)
    With family                             30 (45%)
    With relative                           5 (7.5%)
Do you feel mentally healthy right now?     45 (67%)
Do you feel physically healthy?             45 (67%)
¹ n (%)
# Merging `With family` and `With relative` into `With family and relative`,
# and `Hostel` and `Mess` into `Hostel or mess`, in the variable `Live in or with`


CGPA_data$`Live in or with`<-case_when(
  CGPA_data$`Live in or with` %in% c("Hostel","Mess")~"Hostel or mess",
  CGPA_data$`Live in or with` %in% c("With family", "With relative")~"With family and relative"
)
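
Note that `case_when()` returns `NA` for any value that matches none of its conditions, so a misspelled or unexpected category would silently become missing. One possible check, sketched here on a made-up vector (`living` and the value "Other" are hypothetical and do not occur in the survey data), is to tabulate the recoded result with missing values shown:

```r
library(dplyr)

# Hypothetical responses; "Other" is not covered by either condition
living <- c("Hostel", "Mess", "With family", "With relative", "Other")

recoded <- case_when(
  living %in% c("Hostel", "Mess") ~ "Hostel or mess",
  living %in% c("With family", "With relative") ~ "With family and relative"
)

# useNA = "ifany" makes unmatched values visible in the frequency table
table(recoded, useNA = "ifany")
```

If the table shows no `NA` count, every original category was covered by the recoding.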

Characteristics of the students after recoding the live in/with variable

  • After recoding, 72% of the 67 students were male and 28% were female. Fifty-seven percent of the students reported that they came from an urban area, while 43% were from a rural area. The proportion of students who lived in a hostel or mess was 48%, and the remaining 52% lived with their family or a relative. About two-thirds (67%) of the students said that they felt mentally healthy, and the same proportion (67%) said that they felt physically healthy.
CGPA_data %>% select(Section,Gender,`Come from ...`,`Live in or with`,
                     `Do you feel mentally healthy right now?`,
                     `Do you feel physically healthy?`) %>%
  tbl_summary()
Characteristic                              N = 67¹
Section
    1                                       17 (25%)
    2                                       21 (31%)
    3                                       10 (15%)
    4                                       19 (28%)
Gender
    Female                                  19 (28%)
    Male                                    48 (72%)
Come from ...
    Rural area                              29 (43%)
    Urban area                              38 (57%)
Live in or with
    Hostel or mess                          32 (48%)
    With family and relative                35 (52%)
Do you feel mentally healthy right now?     45 (67%)
Do you feel physically healthy?             45 (67%)
¹ n (%)

Exploratory Analysis of CGPA

To explore the distribution of CGPA, a frequency histogram, a density plot and a box plot are used. The summary features of CGPA are described by the mean, median, standard deviation and IQR.

Data visualization of CGPA

The distribution of CGPA is explored using a density plot, a frequency histogram and a box plot.

  • All of the plots suggest that the distribution of CGPA was negatively skewed.

  • This means that most students’ CGPAs were concentrated at the upper end of the scale, with a longer tail towards lower values; that is, most students obtained a satisfactory CGPA.

  • The plots also imply that there were no substantial outliers in the CGPA.

p1<-CGPA_data %>% ggplot(aes(x=`Current CGPA`))+
  geom_histogram(fill="steelblue",bins = 6,color="black")+
  theme_clean()

# geom_density() has no `bins` argument (it belongs to geom_histogram()), so it is dropped here
p2<-CGPA_data %>% ggplot(aes(x=`Current CGPA`))+
  geom_density(fill="steelblue",color="black",alpha=0.5)+
  theme_clean()

p3<-CGPA_data %>%ggplot(aes(y=`Current CGPA`))+
  geom_boxplot(fill="steelblue",width=1)+
  scale_x_discrete()+
  coord_flip()+
  theme_clean()

p1/p2+p3
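
The outlier reading of the box plot depends on the rule used to draw the whiskers. A minimal sketch, assuming the default `coef = 1.5` of `geom_boxplot()`, under which a point is drawn as an outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile. The vector `x` is made up for illustration and is not the survey data:

```r
# Hypothetical CGPA-like values (not the survey data)
x <- c(2.0, 2.6, 2.9, 3.1, 3.3, 3.4, 3.5, 3.6, 3.8, 4.0)

q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# Fences: points outside these limits would be drawn as individual outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

flagged <- x[x < lower_fence | x > upper_fence]
flagged
```

Applying the same steps to `Current CGPA` would list any observations that the box plot marks as outliers.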

Summary statistics of CGPA

  • The CGPA ranged from 1.98 to 4.00.

  • The mean CGPA was 3.24 with a standard deviation of 0.46, while the median (Mdn=3.36, IQR=0.63) was higher than the mean, which is consistent with a negatively skewed distribution.

CGPA_data %>%
  drop_na() %>%
  summarise(Min = min(`Current CGPA`),
            Max = max(`Current CGPA`),
            Mean = mean(`Current CGPA`),
            SD = sd(`Current CGPA`),
            Median = median(`Current CGPA`),
            IQR = quantile(`Current CGPA`, 0.75) - quantile(`Current CGPA`, 0.25)) %>%
  kbl(caption = "Summary statistics of CGPA ($n=67$)", digits = 2) %>%
  kable_styling(c("striped", "bordered"))
Summary statistics of CGPA (\(n=67\))
Min Max Mean SD Median IQR
1.98 4 3.24 0.46 3.36 0.63
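
The direction of the skew can also be checked from the summary statistics alone. One rough measure, which is not part of the original analysis, is Pearson's second (median) skewness coefficient, 3(mean − median)/SD. A sketch using the rounded values from the table above:

```r
# Rounded summary statistics taken from the table above
cgpa_mean   <- 3.24
cgpa_median <- 3.36
cgpa_sd     <- 0.46

# Pearson's second skewness coefficient: negative values indicate left (negative) skew
pearson_skew <- 3 * (cgpa_mean - cgpa_median) / cgpa_sd
round(pearson_skew, 2)
```

The result is about -0.78, which agrees in sign with the negative skew seen in the plots.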

Some percentiles of CGPA

  • For a better understanding of CGPA, selected percentiles (\(P_{10}, P_{20}, P_{70}, P_{90}\)) are reported.
CGPA_data %>%drop_na()%>%
  summarise(P10=quantile(`Current CGPA`,0.10),
            P20=quantile(`Current CGPA`,0.20),
            P70=quantile(`Current CGPA`,0.70),
            P90=quantile(`Current CGPA`,0.90))%>%
  kbl(caption = "Some percentiles of CGPA ($n=67$)",digits = 2)%>%
  kable_styling(c("striped", "bordered"))
Some percentiles of CGPA (\(n=67\))
P10 P20 P70 P90
2.58 2.88 3.52 3.8
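
`quantile()` also accepts a vector of probabilities, so all percentiles can be requested in one call and the result is named by probability, which keeps each label tied to the value that was actually computed. A sketch on a made-up vector (`x` is illustrative, not the survey data), relying on R's default interpolation method (`type = 7`):

```r
# Hypothetical CGPA-like values (not the survey data)
x <- c(2.0, 2.5, 2.8, 3.0, 3.2, 3.4, 3.5, 3.6, 3.8, 4.0)

probs <- c(0.10, 0.20, 0.70, 0.90)

# One call returns a named vector; the names are "10%", "20%", "70%", "90%"
pct <- quantile(x, probs = probs)
pct
```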

Comparative analysis of CGPA according to some characteristics of the students

The overlaid density plot indicates that, for both male and female students, the density peaks at a relatively high CGPA (roughly 3.4 to 3.6).

#glimpse(CGPA_data)
CGPA_data[,-(1:3)]->D

D %>% ggplot(aes(x=`Current CGPA`,fill=Gender))+
  geom_density(alpha=0.4)+
  scale_x_continuous(breaks =seq(2,4,.2))+
  geom_vline(xintercept = 3.4,lwd=1)+
  geom_vline(xintercept = 3.6,lwd=1)+
  labs(title = "Figure 1: Overlay density plot of CGPA by gender")+
  theme_classic()

The comparative summary statistics in the following table also imply that the median CGPA of male (Mdn=3.36, IQR=0.66) and female (Mdn=3.37, IQR=0.56) students was almost the same, while the mean CGPA of female students (M=3.28, SD=0.41) was slightly higher than that of male students (M=3.22, SD=0.49). Overall, the variation in CGPA was somewhat higher among male students than among female students.

CGPA_data %>%
  group_by(Gender) %>%
  summarise(n = n(),
            Min = min(`Current CGPA`),
            Max = max(`Current CGPA`),
            Mean = mean(`Current CGPA`),
            SD = sd(`Current CGPA`),
            Median = median(`Current CGPA`),
            IQR = quantile(`Current CGPA`, 0.75) - quantile(`Current CGPA`, 0.25)) %>%
  kbl(caption = "Summary statistics of CGPA by gender ($n=67$)", digits = 2) %>%
  kable_styling(c("striped", "bordered"))
Summary statistics of CGPA by gender (\(n=67\))
Gender n Min Max Mean SD Median IQR
Female 19 2.51 3.85 3.28 0.41 3.37 0.56
Male 48 1.98 4.00 3.22 0.49 3.36 0.66
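
The relative spread of the two groups can also be expressed as a coefficient of variation (SD divided by mean). This measure is not part of the original analysis; the sketch below only reuses the rounded means and SDs from the table above:

```r
# Rounded group statistics taken from the table above
stats <- data.frame(
  Gender = c("Female", "Male"),
  Mean   = c(3.28, 3.22),
  SD     = c(0.41, 0.49)
)

# Coefficient of variation, expressed as a percentage of the group mean
stats$CV_percent <- round(100 * stats$SD / stats$Mean, 1)
stats
```

This gives roughly 12.5% for female and 15.2% for male students, in line with the larger SD and IQR observed for male students.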
D %>% ggplot(aes(y=`Current CGPA`,x=Gender))+
  geom_point(position = position_jitter(width = 0.06),pch=21,
             aes(fill=Gender))+
  geom_boxplot(aes(fill=Gender),width=0.4,alpha=0.5)+
  guides(fill="none")+
  #geom_hline(yintercept = 3.36)+
  labs(title = "The comparative boxplot of CGPA by gender")+
  theme_bw()->d1
d1

Next, the CGPA of the students was compared between those from urban and rural areas.

D %>% ggplot(aes(y=`Current CGPA`,x=`Come from ...`))+
  geom_point(position = position_jitter(width = 0.06),pch=21,
             aes(fill=`Come from ...`))+
  geom_boxplot(aes(fill=`Come from ...`),width=0.4,alpha=0.5)+
  guides(fill="none")+
  labs(title = "The parallel boxplot of CGPA of urban vs. rural area")+
  theme_bw()->d2
d2

D %>% ggplot(aes(y=`Current CGPA`,x=`Live in or with`))+
  geom_point(position = position_jitter(width = 0.06),pch=21,
             aes(fill=`Live in or with`))+
  geom_boxplot(aes(fill=`Live in or with`),width=0.3,alpha=0.5)+
  geom_hline(yintercept =3.4 )+
  annotate("text",x=1.5,y=3.45,label="CGPA=3.4",size=4)+
  guides(fill="none")+
  labs(subtitle = "")+
  theme_bw()->d3

d3

D %>% ggplot(aes(y=`Current CGPA`,x=`Do you feel mentally healthy right now?`))+
  geom_point(position = position_jitter(width = 0.06),pch=21,
             aes(fill=`Do you feel mentally healthy right now?`))+
  geom_boxplot(aes(fill=`Do you feel mentally healthy right now?`),width=0.3,alpha=0.5)+
  #geom_hline(yintercept =3.4 )+
  #annotate("text",x=2.5,y=3.45,label="CGPA=3.4",size=3)+
  guides(fill="none")+
  labs(subtitle =" ")+
  theme_bw()->d4

D %>% ggplot(aes(y=`Current CGPA`,x=`Do you feel physically healthy?`))+
  geom_point(position = position_jitter(width = 0.06),pch=21,
             aes(fill=`Do you feel physically healthy?`))+
  geom_boxplot(aes(fill=`Do you feel physically healthy?`),width=0.3,alpha=0.5)+
  #geom_hline(yintercept =3.4 )+
  #annotate("text",x=2.5,y=3.45,label="CGPA=3.4",size=3)+
  guides(fill="none")+
  labs(subtitle =" ")+
  theme_bw()->d5
d4+d5+plot_annotation(tag_levels = list(c('(A)', '(B)'), '1'))

Plausible reason

Students who are mentally or physically unwell are usually expected to have a lower CGPA than students who are well. Perhaps the students who said that they were not feeling mentally or physically sound tried to compensate for this by performing better academically, for example by obtaining a higher CGPA.

Conclusion

This report is based on only 67 selected students of BUBT, and only exploratory data analysis was performed. With a larger sample and inferential statistical analysis, we would have a better understanding of how several characteristics of the students affect their academic performance, as measured by CGPA.

Reference