Introduction
This report analyses the current CGPA of a sample of students (\(n=67\)) from intake 51 of the CSE program at BUBT. It explores the distribution of CGPA both graphically and numerically, and compares the CGPA of the selected students across several demographic features.
The data were collected using a Google Form. All analyses were carried out in the R statistical software.
Importing data and required packages
library(readr)
library(tidyverse)
library(ggthemes)
library(patchwork)
library(kableExtra)
library(gtsummary)
library(broom)
CGPA_data<- read_csv("CGPA Analysis_cleaned.csv")
#View(CGPA_data)
- Details of variables at a glance
## Rows: 73
## Columns: 9
## $ Timestamp <chr> "2023/07/16 10:50:51 PM GMT+…
## $ Intake <chr> "51", "51", "51", "51", "51"…
## $ Section <dbl> 2, 1, 1, 4, 4, 4, 1, 1, 1, 3…
## $ `Current CGPA` <dbl> 3.19, 2.40, 3.05, 3.54, 3.64…
## $ Gender <chr> "Male", "Male", "Male", "Fem…
## $ `Come from ...` <chr> "Urban area", "Rural area", …
## $ `Live in or with` <chr> "With family", "Mess", "With…
## $ `Do you feel mentally healthy right now?` <chr> "Yes", "No", "Yes", "Yes", "…
## $ `Do you feel physically healthy?` <chr> "Yes", "Yes", "Yes", "Yes", …
- Checking missing values
- Dropping missing values for the final analysis
## [1] 7
## Timestamp Intake
## 0 0
## Section Current CGPA
## 1 3
## Gender Come from ...
## 0 0
## Live in or with Do you feel mentally healthy right now?
## 1 1
## Do you feel physically healthy?
## 1
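The two steps listed above (counting missing values, then dropping incomplete rows) can be sketched as follows. This is a minimal illustration in base R on a small hypothetical data frame, `toy`, whose name and values are assumptions made for the example; it is not the code that produced the output above.

```r
# Hypothetical toy data standing in for the survey responses (not the real data)
toy <- data.frame(
  Section = c(2, 1, NA, 4),
  CGPA    = c(3.19, NA, 3.05, 3.54),
  Gender  = c("Male", "Male", "Female", "Female")
)

# Total number of missing cells in the whole data frame
total_missing <- sum(is.na(toy))

# Number of missing cells in each column
missing_by_column <- colSums(is.na(toy))

# Keep only the rows that have no missing value in any column
toy_complete <- na.omit(toy)

print(total_missing)
print(missing_by_column)
print(nrow(toy_complete))
```

In a tidyverse pipeline, `tidyr::drop_na()` plays the same role as `na.omit()` here.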
Characteristics of the students from raw data
The table below shows that, out of 67 students, only 6 (9.0%) lived in a hostel and only 5 (7.5%) lived with a relative. Because these groups are small, it is appropriate to merge each of them with a related larger category: hostel with mess, and relative with family.
CGPA_data %>%
  select(Section, Gender, `Come from ...`, `Live in or with`,
         `Do you feel mentally healthy right now?`,
         `Do you feel physically healthy?`) %>%
  tbl_summary()
| Characteristic | N = 67¹ |
|---|---|
| Section | |
| 1 | 17 (25%) |
| 2 | 21 (31%) |
| 3 | 10 (15%) |
| 4 | 19 (28%) |
| Gender | |
| Female | 19 (28%) |
| Male | 48 (72%) |
| Come from ... | |
| Rural area | 29 (43%) |
| Urban area | 38 (57%) |
| Live in or with | |
| Hostel | 6 (9.0%) |
| Mess | 26 (39%) |
| With family | 30 (45%) |
| With relative | 5 (7.5%) |
| Do you feel mentally healthy right now? | 45 (67%) |
| Do you feel physically healthy? | 45 (67%) |
| ¹ n (%) | |
# Merge `Hostel` and `Mess` into `Hostel or mess`, and `With family` and
# `With relative` into `With family and relative`, in "Live in or with"
CGPA_data$`Live in or with` <- case_when(
  CGPA_data$`Live in or with` %in% c("Hostel", "Mess") ~ "Hostel or mess",
  CGPA_data$`Live in or with` %in% c("With family", "With relative") ~ "With family and relative"
)
Characteristics of the students after recoding the live in/with variable
- After recoding, 72% of the 67 students were male and 28% were female. Fifty-seven percent of the students reported coming from an urban area, while 43% were from a rural area. The proportion of students who lived in a hostel or mess was 48%, and the remaining 52% lived with their family or a relative. About two-thirds (67%) of the students reported feeling mentally healthy, and the same proportion reported feeling physically healthy.
CGPA_data %>%
  select(Section, Gender, `Come from ...`, `Live in or with`,
         `Do you feel mentally healthy right now?`,
         `Do you feel physically healthy?`) %>%
  tbl_summary()
| Characteristic | N = 67¹ |
|---|---|
| Section | |
| 1 | 17 (25%) |
| 2 | 21 (31%) |
| 3 | 10 (15%) |
| 4 | 19 (28%) |
| Gender | |
| Female | 19 (28%) |
| Male | 48 (72%) |
| Come from ... | |
| Rural area | 29 (43%) |
| Urban area | 38 (57%) |
| Live in or with | |
| Hostel or mess | 32 (48%) |
| With family and relative | 35 (52%) |
| Do you feel mentally healthy right now? | 45 (67%) |
| Do you feel physically healthy? | 45 (67%) |
| ¹ n (%) | |
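As a quick consistency check on the recoding, the merged counts can be reproduced from the counts reported in the raw-data table. The sketch below uses only those reported numbers; the object names (`raw_counts`, `merged_label`) are illustrative assumptions, not part of the original analysis.

```r
# Counts reported in the raw-data table for "Live in or with"
raw_counts <- c("Hostel" = 6, "Mess" = 26, "With family" = 30, "With relative" = 5)

# Merged category for each raw category, in the same order as raw_counts
merged_label <- c("Hostel or mess", "Hostel or mess",
                  "With family and relative", "With family and relative")

# Sum the raw counts within each merged category
merged_counts <- tapply(raw_counts, merged_label, sum)

# Percentages out of all complete responses, rounded to whole numbers
merged_percent <- round(100 * merged_counts / sum(raw_counts))

print(merged_counts)
print(merged_percent)
```

The totals (32 and 35, or 48% and 52% of 67) agree with the recoded table.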
Exploratory Analysis of CGPA
To explore the distribution of CGPA, a frequency histogram, a density plot and a box plot are used. To summarise CGPA numerically, the mean, median, standard deviation and IQR are reported.
Data visualization of CGPA
The distribution of CGPA is explored with a density plot, a frequency histogram and a box plot.
All of the plots suggest that the distribution of CGPA was negatively skewed.
This means that the CGPA of the majority of the students lay in the upper part of the distribution, that is, they obtained a satisfactory CGPA.
The plots also imply that there was no substantial outlier in the CGPA.
# Frequency histogram of CGPA
p1 <- CGPA_data %>% ggplot(aes(x = `Current CGPA`)) +
  geom_histogram(fill = "steelblue", bins = 6, color = "black") +
  theme_clean()
# Density plot of CGPA
p2 <- CGPA_data %>% ggplot(aes(x = `Current CGPA`)) +
  geom_density(fill = "steelblue", color = "black", alpha = 0.5) +
  theme_clean()
# Horizontal box plot of CGPA
p3 <- CGPA_data %>% ggplot(aes(y = `Current CGPA`)) +
  geom_boxplot(fill = "steelblue", width = 1) +
  scale_x_discrete() +
  coord_flip() +
  theme_clean()
# Combine the three panels with patchwork
p1/p2+p3
Summary statistics of CGPA
The CGPA ranged from 1.98 to 4.00.
The average CGPA was 3.24 with a standard deviation of 0.46, while the median (Mdn=3.36, IQR=0.63) was higher than the average, which is consistent with a negatively skewed distribution.
CGPA_data %>%
  drop_na() %>%
  summarise(Min = min(`Current CGPA`),
            Max = max(`Current CGPA`),
            Mean = mean(`Current CGPA`),
            SD = sd(`Current CGPA`),
            Median = median(`Current CGPA`),
            IQR = quantile(`Current CGPA`, 0.75) - quantile(`Current CGPA`, 0.25)) %>%
  kbl(caption = "Summary statistics of CGPA ($n=67$)", digits = 2) %>%
  kable_styling(c("striped", "bordered"))
| Min | Max | Mean | SD | Median | IQR |
|---|---|---|---|---|---|
| 1.98 | 4 | 3.24 | 0.46 | 3.36 | 0.63 |
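A mean below the median is what one expects under negative skew. One rough way to put a number on this, using only the summary values reported in the table, is Pearson's second skewness coefficient, \(3(\text{mean} - \text{median})/\text{SD}\). The sketch below is an illustrative addition rather than part of the original analysis, and the variable names are assumptions.

```r
# Summary values taken from the table of summary statistics
cgpa_mean   <- 3.24
cgpa_median <- 3.36
cgpa_sd     <- 0.46

# Pearson's second skewness coefficient: 3 * (mean - median) / SD
pearson_skew <- 3 * (cgpa_mean - cgpa_median) / cgpa_sd

print(round(pearson_skew, 2))  # -0.78
```

The negative value points in the same direction as the plots: a longer tail towards low CGPA.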
Some percentiles of CGPA
- For a better understanding of CGPA, some percentiles, namely \(P_{10}, P_{20}, P_{70}, P_{90}\), are reported.
CGPA_data %>%
  drop_na() %>%
  summarise(P10 = quantile(`Current CGPA`, 0.10),
            P20 = quantile(`Current CGPA`, 0.20),
            P70 = quantile(`Current CGPA`, 0.70),
            P90 = quantile(`Current CGPA`, 0.90)) %>%
  kbl(caption = "Some percentiles of CGPA ($n=67$)", digits = 2) %>%
  kable_styling(c("striped", "bordered"))
| P10 | P20 | P70 | P90 |
|---|---|---|---|
| 2.58 | 2.88 | 3.52 | 3.8 |
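Several percentiles can also be obtained in a single call by passing a vector of probabilities to `quantile()`. The sketch below uses hypothetical, evenly spaced CGPA values (not the survey data) so that the result is easy to check by hand; the probabilities chosen are illustrative.

```r
# Hypothetical CGPA values on an evenly spaced grid (illustration only, not the survey data)
cgpa <- seq(2.0, 4.0, by = 0.1)

# Probabilities of the percentiles of interest
probs <- c(0.10, 0.20, 0.70, 0.80, 0.90)

# quantile() interpolates linearly between order statistics by default (type 7)
percentiles <- quantile(cgpa, probs = probs)

print(round(percentiles, 2))
```

With 21 evenly spaced values, each requested percentile falls exactly on a data point (2.2, 2.4, 3.4, 3.6 and 3.8), which makes the interpolation rule easy to verify.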
Comparative analysis of CGPA according to some characteristics of the students
The overlaid density plot indicates that, for both male and female students, CGPA was concentrated at the higher end, around 3.4 to 3.6.
#glimpse(CGPA_data)
# Drop the first three columns (Timestamp, Intake, Section)
D <- CGPA_data[, -(1:3)]
D %>% ggplot(aes(x = `Current CGPA`, fill = Gender)) +
  geom_density(alpha = 0.4) +
  scale_x_continuous(breaks = seq(2, 4, .2)) +
  # Vertical reference lines marking the 3.4 to 3.6 band
  geom_vline(xintercept = 3.4, lwd = 1) +
  geom_vline(xintercept = 3.6, lwd = 1) +
  labs(title = "Figure 1: Overlay density plot of CGPA by gender") +
  theme_classic()
The comparative summary statistics in the following table also imply that the median CGPA of male (Mdn=3.36, IQR=0.66) and female (Mdn=3.37, IQR=0.56) students was almost the same, while the mean CGPA of female students (M=3.28, SD=0.41) was slightly higher than that of male students (M=3.22, SD=0.49). Overall, the variation in CGPA was somewhat higher among male students than among female students.
CGPA_data %>%
  group_by(Gender) %>%
  summarise(n = n(),
            Min = min(`Current CGPA`),
            Max = max(`Current CGPA`),
            Mean = mean(`Current CGPA`),
            SD = sd(`Current CGPA`),
            Median = median(`Current CGPA`),
            IQR = quantile(`Current CGPA`, 0.75) - quantile(`Current CGPA`, 0.25)) %>%
  kbl(caption = "Summary statistics of CGPA by gender ($n=67$)", digits = 2) %>%
  kable_styling(c("striped", "bordered"))
| Gender | n | Min | Max | Mean | SD | Median | IQR |
|---|---|---|---|---|---|---|---|
| Female | 19 | 2.51 | 3.85 | 3.28 | 0.41 | 3.37 | 0.56 |
| Male | 48 | 1.98 | 4.00 | 3.22 | 0.49 | 3.36 | 0.66 |
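The comparison of variation between the two groups can be made relative to each group's mean with the coefficient of variation, \(\text{CV} = 100 \times \text{SD}/\text{mean}\). The sketch below uses only the means and standard deviations reported in the table; the data frame name `summary_by_gender` is an assumption for the example, and the CV itself is an illustrative addition to the analysis.

```r
# Means and standard deviations taken from the summary table by gender
summary_by_gender <- data.frame(
  Gender = c("Female", "Male"),
  Mean   = c(3.28, 3.22),
  SD     = c(0.41, 0.49)
)

# Coefficient of variation in percent: 100 * SD / mean
summary_by_gender$CV_percent <-
  round(100 * summary_by_gender$SD / summary_by_gender$Mean, 1)

print(summary_by_gender)
```

The CV is about 12.5% for female students and about 15.2% for male students, in line with the larger SD and IQR seen for male students.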
# Jittered points with a box plot overlaid, by gender
d1 <- D %>% ggplot(aes(y = `Current CGPA`, x = Gender)) +
  geom_point(position = position_jitter(width = 0.06), pch = 21,
             aes(fill = Gender)) +
  geom_boxplot(aes(fill = Gender), width = 0.4, alpha = 0.5) +
  guides(fill = "none") +
  #geom_hline(yintercept = 3.36)+
  labs(title = "The comparative boxplot of CGPA by gender") +
  theme_bw()
Next, the CGPA of the students was compared between urban and rural areas.
- In the following parallel box plot, the median CGPA of students from rural areas was higher than that of students from urban areas. The variation was almost the same for both areas. However, 2 students from urban areas appeared as outliers with very low CGPA.
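Points are drawn as outliers in a box plot when they fall more than 1.5 × IQR beyond the quartiles, which is the default rule in `geom_boxplot()`. The sketch below applies that rule by hand to a small hypothetical vector of CGPA values (not the survey data); the vector and variable names are assumptions for illustration.

```r
# Hypothetical CGPA values for one group (illustration only, not the survey data)
cgpa <- c(1.98, 2.40, 3.05, 3.19, 3.30, 3.36, 3.40, 3.54, 3.64, 3.80)

# Quartiles and interquartile range
q1  <- quantile(cgpa, 0.25, names = FALSE)
q3  <- quantile(cgpa, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences of the 1.5 * IQR rule
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers
outliers <- cgpa[cgpa < lower_fence | cgpa > upper_fence]

print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

Here the two lowest values (1.98 and 2.40) fall below the lower fence of about 2.46, so they would be drawn as individual points below the whisker.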
# Jittered points with a box plot overlaid, by area of origin
D %>% ggplot(aes(y = `Current CGPA`, x = `Come from ...`)) +
  geom_point(position = position_jitter(width = 0.06), pch = 21,
             aes(fill = `Come from ...`)) +
  geom_boxplot(aes(fill = `Come from ...`), width = 0.4, alpha = 0.5) +
  guides(fill = "none") +
  labs(title = "The parallel boxplot of CGPA of urban vs. rural area") +
  theme_bw() -> d2
d2
- The comparative box plots of CGPA for students living in a hostel or mess and students living with family or a relative show that those living with their family or a relative obtained a slightly higher CGPA (in terms of median) than those living in a mess or hostel. There were some outliers in the lower tail of the box plot for the hostel or mess group, which may reflect the hardships of student life away from home that can lead to a poor CGPA.
# Jittered points with a box plot overlaid, by living arrangement
D %>% ggplot(aes(y = `Current CGPA`, x = `Live in or with`)) +
  geom_point(position = position_jitter(width = 0.06), pch = 21,
             aes(fill = `Live in or with`)) +
  geom_boxplot(aes(fill = `Live in or with`), width = 0.3, alpha = 0.5) +
  # Horizontal reference line at CGPA = 3.4 with a text label
  geom_hline(yintercept = 3.4) +
  annotate("text", x = 1.5, y = 3.45, label = "CGPA=3.4", size = 4) +
  guides(fill = "none") +
  labs(subtitle = "") +
  theme_bw() -> d3
d3
# Jittered points with a box plot overlaid, by self-reported mental health
D %>% ggplot(aes(y = `Current CGPA`, x = `Do you feel mentally healthy right now?`)) +
  geom_point(position = position_jitter(width = 0.06), pch = 21,
             aes(fill = `Do you feel mentally healthy right now?`)) +
  geom_boxplot(aes(fill = `Do you feel mentally healthy right now?`), width = 0.3, alpha = 0.5) +
  #geom_hline(yintercept =3.4 )+
  #annotate("text",x=2.5,y=3.45,label="CGPA=3.4",size=3)+
  guides(fill = "none") +
  labs(subtitle = " ") +
  theme_bw() -> d4
# Jittered points with a box plot overlaid, by self-reported physical health
D %>% ggplot(aes(y = `Current CGPA`, x = `Do you feel physically healthy?`)) +
  geom_point(position = position_jitter(width = 0.06), pch = 21,
             aes(fill = `Do you feel physically healthy?`)) +
  geom_boxplot(aes(fill = `Do you feel physically healthy?`), width = 0.3, alpha = 0.5) +
  #geom_hline(yintercept =3.4 )+
  #annotate("text",x=2.5,y=3.45,label="CGPA=3.4",size=3)+
  guides(fill = "none") +
  labs(subtitle = " ") +
  theme_bw() -> d5
Although the median CGPA of the students who did not feel mentally healthy was slightly higher, their CGPA showed more variability than that of the students who felt mentally healthy.
Similarly, although the median CGPA of the students who did not feel physically healthy was higher, their CGPA showed more variability than that of the students who felt physically healthy.
Plausible reason
Students who are mentally or physically unwell are usually expected to have a lower CGPA than students who are well. One possible explanation for the opposite pattern seen here is that the students who reported not feeling mentally or physically sound tried to compensate by performing better academically, for example by obtaining a higher CGPA.
Conclusion
This report is based on only 67 selected students of BUBT, and only exploratory data analysis was performed. With a larger sample and inferential statistical analysis, it would be possible to assess more reliably how several characteristics of the students affect their academic performance, such as CGPA.