This document presents an introductory analysis of the CUET PG 2025 results in the Mathematics subject.
We load the dataset:
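library(dplyr)    # provides %>%, mutate(), group_by(), slice_max(), summarise()
library(ggplot2)  # provides ggplot() and the geoms used in the plots
msc <- read.csv("data/msc.csv", header = TRUE)
head(msc, 8)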
## Rank App.No. Roll.No. Candidate.Name Father.Name
## 1 1 253510253188 UP180606597 RITIK ROSHAN SHARMA UMESH SHARMA
## 2 2 253510121892 DL011001236 DIVYANSH MITTAL TIRUPATI GUPTA
## 3 3 253510307212 KL130101356 ADARSH V VINOD KUMAR P N
## 4 4 253510113180 WB130200647 PURABI MAHATA BHABESH CHANDRA MAHATA
## 5 5 253510100600 WB100207644 ISHAN CHAKRABORTY MONI CHAKRABORTY
## 6 6 253510169307 UP160100626 SANYAM TANEJA NARENDER KUMAR
## 7 7 253510156977 DL010803377 DEEPAK KUMAR SINGH KAMAL SINGH
## 8 8 253510245743 DL011213624 MAAHIR SADH KAPIL SADH
## Gender Marks
## 1 Male 239
## 2 Male 229
## 3 Male 227
## 4 Female 222
## 5 Male 216
## 6 Male 210
## 7 Male 208
## 8 Male 208
Congratulations to my batch-mate Maahir Sadh on securing AIR 8. On that note, we list the top-ranked candidate from each state. For this, we take the first two characters of each roll number, which correspond to the state that the candidate belongs to:
msc <- msc %>%
  mutate(State = substr(Roll.No., 1, 2))
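As a quick illustration of the slicing step, here is a minimal sketch on four roll numbers copied from the listing above. The check that each prefix consists of exactly two capital letters is an assumption about the roll number format, not something the dataset documents.

```r
# Roll numbers taken from the first rows of the listing above
rolls <- c("UP180606597", "DL011001236", "KL130101356", "WB130200647")

# The first two characters give the state code
state <- substr(rolls, 1, 2)

# Assumed format check: the prefix should be exactly two capital letters
valid <- grepl("^[A-Z]{2}$", state)

print(data.frame(Roll.No. = rolls, State = state, Valid = valid))
```

Running the same check on the full column, for example with `all(grepl("^[A-Z]{2}", msc$Roll.No.))`, would flag any roll number that does not fit this layout before it ends up as a spurious state.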
The top scorer from each state is then:
msc %>%
  group_by(State) %>%
  slice_max(Marks, n = 1, with_ties = FALSE) %>%
  select(State, Rank, Candidate.Name, Marks) %>%
  arrange(Rank)
## # A tibble: 34 × 4
## # Groups: State [34]
## State Rank Candidate.Name Marks
## <chr> <int> <chr> <int>
## 1 UP 1 RITIK ROSHAN SHARMA 239
## 2 DL 2 DIVYANSH MITTAL 229
## 3 KL 3 ADARSH V 227
## 4 WB 4 PURABI MAHATA 222
## 5 OR 15 CHANDRA SEKHAR MAHAPATRO 201
## 6 MP 16 ADARSH TIWARI 199
## 7 UK 17 ANSH GUPTA 198
## 8 BR 19 SHABD PRAKASH 194
## 9 MR 28 DHRUV SANTOSH SHAH 190
## 10 CG 29 PANKAJ KUMAR VERMA 190
## # ℹ 24 more rows
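Several candidates share the same marks (ranks 7 and 8 both scored 208), so selecting on `Marks` with `with_ties = FALSE` depends on which tied row comes first. A possible alternative, sketched here on toy data with hypothetical candidate names, is to select on `Rank`, which is unique:

```r
library(dplyr)

# Toy data (hypothetical names); rows are deliberately not sorted by Rank
toy <- tibble(
  State = c("DL", "DL", "UP", "UP"),
  Rank = c(8L, 7L, 1L, 6L),
  Candidate.Name = c("B", "A", "C", "D"),
  Marks = c(208L, 208L, 239L, 210L)
)

top_by_rank <- toy %>%
  group_by(State) %>%
  slice_min(Rank, n = 1) %>%  # Rank is unique, so exactly one row per state
  ungroup() %>%
  arrange(Rank)

print(top_by_rank)
```

Here candidate A (rank 7) is chosen for DL regardless of row order, even though B has the same marks.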
Number of candidates from each state:
table(msc$State)
##
## AL AM AN AP BR CG CH DL GJ GO HP HR JH JK KK KL
## 48 549 8 123 431 270 25 1383 56 12 386 433 205 294 197 609
## LK LL MG MN MP MR MZ NL OR PB RJ SM TA TL TN UK
## 2 6 253 173 234 123 43 23 719 74 680 46 162 350 342 254
## UP WB
## 2733 650
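The alphabetical table is easier to read when sorted by size and expressed as percentages. A small sketch in base R, using a toy vector in place of `msc$State`:

```r
# Toy state codes standing in for msc$State
state <- c("UP", "UP", "UP", "DL", "DL", "WB")

# Counts per state, largest first
counts <- sort(table(state), decreasing = TRUE)

# Share of all candidates, in percent, rounded to one decimal place
share <- round(100 * prop.table(counts), 1)

print(counts)
print(share)
```

Applied to the real column, this would put UP and DL at the top and make it obvious which states contribute only a handful of candidates.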
Number of male and female candidates:
table(msc$Gender)
##
## Female Male
## 6408 5488
Summary statistics of marks by gender:
msc %>%
  group_by(Gender) %>%
  summarise(
    Count = n(),
    Min = min(Marks),
    Average = mean(Marks),
    Median = median(Marks),
    Max = max(Marks)
  )
## # A tibble: 2 × 6
## Gender Count Min Average Median Max
## <chr> <int> <int> <dbl> <dbl> <int>
## 1 Female 6408 -31 46.9 40 222
## 2 Male 5488 -21 55.0 45 239
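The scatter plot of rank against marks traces a roughly S-shaped (reverse sigmoid) curve: rank changes slowly at the extremes of the marks range and fastest in the middle, where most candidates are concentrated.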
ggplot(msc, aes(x = Marks, y = Rank, col = Gender)) +
  geom_point()
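The S-shape is what one would expect, because a candidate's rank is essentially one plus the number of candidates with strictly higher marks, that is, a rescaled complementary empirical distribution function of the marks. A minimal sketch using the eight marks from the listing at the top:

```r
# Marks of the top eight candidates, copied from the listing above
marks <- c(239, 229, 227, 222, 216, 210, 208, 208)

# Competition-style rank: tied marks share the smallest rank
rank_from_marks <- rank(-marks, ties.method = "min")

# Same quantity computed directly: candidates with strictly higher marks, plus one
higher_plus_one <- sapply(marks, function(m) sum(marks > m)) + 1

print(rbind(marks, rank_from_marks, higher_plus_one))
```

Both computations give ranks 1 to 7, with the two candidates on 208 sharing rank 7. The official list assigns them ranks 7 and 8, so it evidently applies a tie-breaking rule that cannot be recovered from the marks alone.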
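The marks of all candidates follow a unimodal, roughly bell-shaped distribution, but one that is skewed to the right rather than Gaussian: for both genders the mean exceeds the median (46.9 vs. 40 for female and 55.0 vs. 45 for male candidates). Most candidates score in the lower range, with a long tail of high scorers, which is consistent with an exam of moderate difficulty.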
ggplot(msc, aes(x = Marks, color = Gender, fill = Gender)) +
  geom_density(alpha = 0.05, linewidth = 1)
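The direction of the skew can also be checked numerically instead of being read off the plot. The sketch below defines a hypothetical helper `skewness()` (the moment-based sample skewness, written in base R so no extra package is needed) and applies it to made-up marks with a long right tail:

```r
# Hypothetical helper: moment-based sample skewness
# (third central moment divided by the 1.5th power of the second)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Made-up marks with a long right tail, standing in for msc$Marks
toy_marks <- c(-5, 10, 20, 30, 35, 40, 45, 60, 90, 150, 220)

print(skewness(toy_marks))           # positive for a right-skewed sample
print(skewness(c(-2, -1, 0, 1, 2)))  # zero for a symmetric sample
print(mean(toy_marks) > median(toy_marks))
```

A positive value indicates a right skew, and a value near zero indicates symmetry. On the real data, `skewness(msc$Marks)` would be the corresponding check.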
Finally, we look at the general performance of candidates state-wise:
ggplot(msc, aes(y = Marks, color = Gender)) +
  geom_boxplot() +
  facet_wrap(~State)
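With 34 facets the boxplots are hard to compare by eye, and some states have very few candidates (LK has only 2), so a numeric table alongside the plot may help. A sketch on toy data, assuming the same `State` and `Marks` column names:

```r
library(dplyr)

# Toy data standing in for msc
toy <- tibble(
  State = c("DL", "DL", "DL", "UP", "UP", "UP", "WB", "WB"),
  Marks = c(40, 60, 80, 30, 35, 100, 50, 52)
)

state_summary <- toy %>%
  group_by(State) %>%
  summarise(
    Count = n(),              # group size, to spot states with too few candidates
    Median = median(Marks)    # robust centre, matching the line in each boxplot
  ) %>%
  arrange(desc(Median))

print(state_summary)
```

Keeping `Count` next to `Median` makes it easy to discount states whose summary rests on only a handful of candidates.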