First, the dataset has to be imported into RStudio. As it is a CSV file, the read.csv() function can be used:
Adata <- read.csv("Admission.csv")
Then, it is possible to check what the data looks like:
head(Adata)
## GPA GMAT Decision
## 1 2.96 596 admit
## 2 3.14 473 admit
## 3 3.22 482 admit
## 4 3.29 527 admit
## 5 3.69 505 admit
## 6 3.46 693 admit
str(Adata)
## 'data.frame': 85 obs. of 3 variables:
## $ GPA : num 2.96 3.14 3.22 3.29 3.69 3.46 3.03 3.19 3.63 3.59 ...
## $ GMAT : int 596 473 482 527 505 693 626 663 447 588 ...
## $ Decision: chr "admit" "admit" "admit" "admit" ...
The dataset contains the undergraduate GPA and GMAT score of 85 applicants to a university program, together with the admission decision for each applicant. Each row represents one applicant, which is the unit of observation here. Since the dataset contains 85 observations (applicants), the sample size is 85.
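Variables description:
GPA: undergraduate grade point average (numeric)
GMAT: GMAT score (integer)
Decision: admission decision, one of "admit", "notadmit" or "border"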
The dataset used in this analysis was sourced from Kaggle, an online platform for dataset sharing.
Now, the missing values have to be identified and removed:
Adata <- na.omit(Adata)
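na.omit() removes incomplete rows but does not report how many it found. A minimal sketch of the identification step, using a small hypothetical data frame (toy_df, not the real Admission.csv), counts missing values per column with is.na() and colSums() before dropping them:

```r
# Hypothetical toy data frame with the same column layout as Admission.csv
toy_df <- data.frame(
  GPA      = c(2.96, NA, 3.22, 3.29),
  GMAT     = c(596L, 473L, NA, 527L),
  Decision = c("admit", "admit", "admit", NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy_df))
print(na_per_column)

# Number of rows containing at least one missing value
incomplete_rows <- sum(!complete.cases(toy_df))
print(incomplete_rows)

# Drop incomplete rows, as na.omit() does for Adata
toy_clean <- na.omit(toy_df)
print(nrow(toy_clean))
```

On the real data, running colSums(is.na(Adata)) before na.omit() would show whether any rows are actually dropped. Since str() reports 85 observations before the removal and describe() later reports n = 85, it appears that no rows were lost.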
I also rename the first two columns to more descriptive names:
colnames(Adata)[1] <- "Undergrad GPA"
colnames(Adata)[2] <- "GMAT Score"
head(Adata)
## Undergrad GPA GMAT Score Decision
## 1 2.96 596 admit
## 2 3.14 473 admit
## 3 3.22 482 admit
## 4 3.29 527 admit
## 5 3.69 505 admit
## 6 3.46 693 admit
Next, I add a categorical variable for GPA level (Low: below 3.0; Medium: 3.0 to below 3.5; High: 3.5 and above):
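# Add an empty fourth column and name it "GPA Level"
Adata <- cbind(Adata, rep(NA, nrow(Adata)))
colnames(Adata)[4] <- "GPA Level"
# Classify each applicant: below 3.0 = Low, 3.0 to below 3.5 = Medium, 3.5 and above = High
for (i in seq_len(nrow(Adata))) {
  if (Adata[i, 1] < 3) {
    Adata[i, 4] <- "Low"
  } else if (Adata[i, 1] < 3.5) {
    Adata[i, 4] <- "Medium"
  } else {
    Adata[i, 4] <- "High"
  }
}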
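The loop classifies one row at a time. A vectorised alternative is base R's cut(); the sketch below runs both approaches on hypothetical GPA values (including values exactly on the 3.0 and 3.5 cut-offs) to check that they agree. With right = FALSE the intervals are closed on the left, matching the strict "<" comparisons in the loop.

```r
# Hypothetical GPA values, including values exactly on the cut-offs
gpa <- c(2.13, 2.99, 3.00, 3.49, 3.50, 3.80)

# Loop-based classification, mirroring the approach used for Adata
gpa_level_loop <- character(length(gpa))
for (i in seq_along(gpa)) {
  if (gpa[i] < 3) {
    gpa_level_loop[i] <- "Low"
  } else if (gpa[i] < 3.5) {
    gpa_level_loop[i] <- "Medium"
  } else {
    gpa_level_loop[i] <- "High"
  }
}

# Vectorised alternative: right = FALSE gives [-Inf, 3), [3, 3.5), [3.5, Inf)
gpa_level_cut <- cut(gpa,
                     breaks = c(-Inf, 3, 3.5, Inf),
                     labels = c("Low", "Medium", "High"),
                     right = FALSE)

print(data.frame(gpa, gpa_level_loop, gpa_level_cut))
```

Applied to the real data, the equivalent would be Adata$`GPA Level` <- cut(Adata$`Undergrad GPA`, breaks = c(-Inf, 3, 3.5, Inf), labels = c("Low", "Medium", "High"), right = FALSE), which returns a factor with the levels already in the intended order.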
Now I convert the Decision and GPA Level variables into factors:
Adata$Decision <- factor(Adata$Decision,
levels = c("admit", "notadmit", "border"))
Adata$`GPA Level` <- factor(Adata$`GPA Level`,
levels = c("Low", "Medium", "High"))
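Specifying levels explicitly fixes the category order, which ggplot2 uses for the facet and legend order of the plots later in this analysis. One caveat: any string not listed in levels silently becomes NA. A minimal sketch with hypothetical decision strings (one deliberately mis-capitalised) shows how table() with useNA = "ifany" can reveal such losses:

```r
# Hypothetical decision strings; "Admit" does not match any declared level
raw_decision <- c("border", "admit", "notadmit", "Admit")

decision <- factor(raw_decision, levels = c("admit", "notadmit", "border"))

# Level order used for facets and legends
print(levels(decision))

# useNA = "ifany" adds an <NA> count if any value failed to match a level
print(table(decision, useNA = "ifany"))
```

On the real data, table(Adata$Decision, useNA = "ifany") after the conversion would confirm that all 85 decisions matched one of the three levels.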
It is possible to create a new data frame containing only admitted applicants with a high GPA:
Adata2 <- Adata[Adata$Decision=="admit" & Adata$`GPA Level`=="High",]
head(Adata2)
## Undergrad GPA GMAT Score Decision GPA Level
## 5 3.69 505 admit High
## 9 3.63 447 admit High
## 10 3.59 588 admit High
## 13 3.50 572 admit High
## 14 3.78 591 admit High
## 22 3.58 564 admit High
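Logical indexing with [ behaves differently from subset() when the condition contains NA. This does not affect Adata, because na.omit() was applied earlier, but it matters for data with gaps. A minimal sketch on a hypothetical toy data frame:

```r
# Hypothetical toy data with one missing GPA level
toy <- data.frame(
  gpa_level = factor(c("High", "Low", NA, "High"),
                     levels = c("Low", "Medium", "High")),
  decision  = factor(c("admit", "admit", "admit", "border"),
                     levels = c("admit", "notadmit", "border"))
)

# `[` with a logical condition: an NA in the condition produces an all-NA row
by_bracket <- toy[toy$decision == "admit" & toy$gpa_level == "High", ]

# which() keeps only positions where the condition is TRUE
by_which <- toy[which(toy$decision == "admit" & toy$gpa_level == "High"), ]

# subset() also treats NA in the condition as FALSE
by_subset <- subset(toy, decision == "admit" & gpa_level == "High")

print(by_bracket)
print(by_subset)
```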
Next, the psych package is loaded, and its describe() function is used to summarise the two numeric variables:
library(psych)
describe(Adata[, c("Undergrad GPA", "GMAT Score")])
## vars n mean sd median trimmed mad min max range
## Undergrad GPA 1 85 2.97 0.43 3.01 2.97 0.52 2.13 3.8 1.67
## GMAT Score 2 85 488.45 81.52 482.00 484.36 84.51 313.00 693.0 380.00
## skew kurtosis se
## Undergrad GPA -0.05 -1.01 0.05
## GMAT Score 0.39 -0.14 8.84
The mean undergraduate GPA of the applicants was 2.97, while the median was slightly higher at 3.01, indicating a fairly symmetric distribution with minimal skew (skew = -0.05). The standard deviation of GPA was 0.43, indicating moderate variation in academic performance (observed values range from 2.13 to 3.8).
For the GMAT score, the mean was 488.45 and the median was 482, again showing a relatively balanced distribution with a slight right skew (skew = 0.39). The standard deviation of 81.52 points is much larger in absolute terms, but because GMAT and GPA are measured on different scales, the two standard deviations cannot be compared directly.
To compare the relative variability of two variables measured on different scales, their coefficients of variation (standard deviation divided by the mean) can be calculated:
cv_gpa <- (sd(Adata$`Undergrad GPA`) /
mean(Adata$`Undergrad GPA`))
cv_gmat <- (sd(Adata$`GMAT Score`) /
mean(Adata$`GMAT Score`))
cv_gpa
## [1] 0.1442201
cv_gmat
## [1] 0.1669011
GMAT scores therefore vary somewhat more relative to their mean (CV ≈ 0.167) than GPA does (CV ≈ 0.144), although the difference is modest.
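Since the same ratio is computed twice, it can be wrapped in a small helper. The function name cv() below is a hypothetical choice, not part of base R or psych; the sketch checks it on toy values rather than on the admission data. Note that sd() uses the sample (n - 1) denominator, as in the calculations above.

```r
# Coefficient of variation: sample standard deviation divided by the mean.
# Assumes a positive mean; the CV is not meaningful for data centred near zero.
cv <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Hypothetical toy scores (not from Admission.csv)
toy_scores <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv(toy_scores)  # sd = sqrt(32 / 7), mean = 5, so about 0.428
```

With this helper, cv(Adata$`Undergrad GPA`) and cv(Adata$`GMAT Score`) should reproduce cv_gpa and cv_gmat.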
describeBy(Adata[, c("Undergrad GPA", "GMAT Score")],
group = Adata$Decision)
##
## Descriptive statistics by group
## group: admit
## vars n mean sd median trimmed mad min max range
## Undergrad GPA 1 31 3.40 0.21 3.39 3.40 0.19 2.96 3.8 0.84
## GMAT Score 2 31 561.23 67.96 559.00 560.16 56.34 431.00 693.0 262.00
## skew kurtosis se
## Undergrad GPA 0.08 -0.56 0.04
## GMAT Score 0.17 -0.65 12.21
## ------------------------------------------------------------
## group: notadmit
## vars n mean sd median trimmed mad min max range
## Undergrad GPA 1 28 2.48 0.18 2.47 2.48 0.16 2.13 2.9 0.77
## GMAT Score 2 28 447.07 62.38 435.50 449.21 65.23 321.00 542.0 221.00
## skew kurtosis se
## Undergrad GPA 0.28 -0.27 0.03
## GMAT Score -0.07 -1.10 11.79
## ------------------------------------------------------------
## group: border
## vars n mean sd median trimmed mad min max range
## Undergrad GPA 1 26 2.99 0.17 3.01 2.98 0.18 2.73 3.5 0.77
## GMAT Score 2 26 446.23 47.40 446.00 448.32 42.25 313.00 546.0 233.00
## skew kurtosis se
## Undergrad GPA 0.81 0.82 0.03
## GMAT Score -0.50 0.76 9.30
Using the describeBy() function, I examined the descriptive statistics for undergraduate GPA and GMAT scores across the three admission decision groups: admit, notadmit, and border.
Admitted students had the highest average GPA (mean = 3.40) and GMAT score (mean = 561.23), indicating strong academic performance, although the minimum values in this group are as low as 2.96 for GPA and 431 for GMAT. In contrast, the not admitted group had considerably lower averages (GPA mean = 2.48, GMAT mean = 447.07), suggesting that both measures may have contributed to the negative admission decision.
The borderline group had an average GPA of 2.99, clearly higher than the not admitted group, but their average GMAT score (446.23) was almost identical to that of the not admitted group. This suggests that while their academic record was moderately strong, lower GMAT performance may have placed them in an uncertain decision category.
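The group means reported by describeBy() can also be obtained in base R without the psych package. A minimal sketch using aggregate() with a formula, on hypothetical toy values (not from Admission.csv):

```r
# Hypothetical toy data, values invented for illustration
toy <- data.frame(
  gpa      = c(3.4, 3.6, 2.4, 2.6, 3.0, 2.9),
  gmat     = c(560, 600, 440, 460, 450, 430),
  decision = factor(c("admit", "admit", "notadmit", "notadmit", "border", "border"),
                    levels = c("admit", "notadmit", "border"))
)

# Mean of each numeric column within each decision group
group_means <- aggregate(cbind(gpa, gmat) ~ decision, data = toy, FUN = mean)
print(group_means)

# tapply() gives the group means for a single variable
print(tapply(toy$gpa, toy$decision, mean))
```

On the real data, the formula would need backticks around the column names containing spaces, for example cbind(`Undergrad GPA`, `GMAT Score`) ~ Decision.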
library(ggplot2)
ggplot(Adata, aes(x = `Undergrad GPA`)) +
geom_histogram(binwidth = 0.1, fill = "darkslateblue", color = "white") +
facet_wrap(~ Decision, ncol = 1) +
labs(title = "Distribution of Undergraduate GPA by Admission Decision",
x = "GPA", y = "Count") +
theme_minimal()
The histograms illustrate the distribution of undergraduate GPA for each admission decision category: admit, notadmit, and border.
Admitted students show a clear concentration of GPAs in the higher range (around 3.2 to 3.6), reflecting strong academic performance. In contrast, not admitted students are heavily concentrated in the lower GPA range, particularly between 2.4 and 2.6, suggesting GPA may have played a major role in rejection. The borderline group has GPAs clustered around the mid-range (approximately 2.8 to 3.1), indicating that these applicants were neither clearly strong nor weak based on academic performance alone.
This visual comparison supports the conclusion that GPA is positively associated with admission success.
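The histogram reading can be backed with a number: the share of each decision group whose GPA falls inside a given range. The sketch below uses the 3.2 to 3.6 range mentioned above and hypothetical toy values; taking the mean of a logical vector gives the proportion of TRUE values.

```r
# Hypothetical toy data (not from Admission.csv)
gpa      <- c(3.3, 3.5, 3.7, 2.5, 2.4, 2.8, 2.9, 3.0)
decision <- factor(c("admit", "admit", "admit", "notadmit", "notadmit",
                     "border", "border", "border"),
                   levels = c("admit", "notadmit", "border"))

# TRUE where GPA lies in [3.2, 3.6]
in_range <- gpa >= 3.2 & gpa <= 3.6

# Proportion of applicants in each group with a GPA in that range
share_in_range <- tapply(in_range, decision, mean)
print(share_in_range)
```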
ggplot(Adata, aes(x = "", y = `GMAT Score`)) +
geom_boxplot(fill = "coral", color = "black") +
coord_flip() +
facet_wrap(~ Decision, ncol = 1) +
labs(
title = "GMAT Score Distribution by Admission Decision",
x = "",
y = "GMAT Score"
) +
theme_minimal()
The boxplots show the distribution of GMAT scores across the three admission decision categories.
Admitted students have the highest GMAT scores overall, with the middle 50% (interquartile range) falling roughly between 520 and 600, and some scores reaching as high as 690. This suggests that strong GMAT performance is closely linked to admission.
The not admitted group has a much lower median (435.5) than the admitted group (559). This is consistent with lower GMAT scores being a major factor in rejections.
The borderline group’s GMAT distribution is very similar to that of the not admitted group, centered around the mid-400s (median = 446). This indicates that GMAT scores alone may not have been strong enough to secure admission for these candidates, even if other aspects of their applications were competitive.
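The interquartile ranges above are read off the boxplot. They can also be computed directly with quantile() and IQR(), both using R's default type 7 definition. ggplot2 documents its box hinges as the first and third quartiles, so these values should correspond to the box edges. A sketch on hypothetical toy scores:

```r
# Hypothetical toy GMAT scores (not from Admission.csv)
gmat <- c(520, 560, 600, 640,   # admit
          430, 440, 450, 460,   # notadmit
          420, 445, 455, 470)   # border
decision <- factor(rep(c("admit", "notadmit", "border"), each = 4),
                   levels = c("admit", "notadmit", "border"))

# First and third quartiles per group (rows: 25%, 75%; columns: groups)
quartiles <- sapply(split(gmat, decision), quantile, probs = c(0.25, 0.75))
print(quartiles)

# Width of the interquartile range per group
iqr_by_group <- sapply(split(gmat, decision), IQR)
print(iqr_by_group)
```

For the real data, gmat and decision would be replaced by Adata$`GMAT Score` and Adata$Decision.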
ggplot(Adata, aes(x = `Undergrad GPA`, y = `GMAT Score`,
color = Decision)) +
geom_point(size = 2, alpha = 0.8) +
labs(
title = "Scatterplot of GPA and GMAT Score by Admission Decision",
x = "Undergraduate GPA",
y = "GMAT Score"
) +
theme_minimal()
The scatterplot visualizes the relationship between undergraduate GPA and GMAT scores, with data points colored by admission decision.
Admitted students (red) are clearly clustered in the upper-right corner, where both GPA and GMAT scores are high. This suggests that strong academic performance in both areas is associated with a higher likelihood of admission.
Not admitted students (green) mostly appear in the lower-left portion of the plot, indicating that lower scores in both GPA and GMAT are commonly associated with rejection.
Borderline applicants (blue) are concentrated in the mid-range of both variables, reflecting moderate performance that may have made their applications less decisive. Notably, some borderline applicants have a strong GPA or a strong GMAT score individually, but not both, which may explain why they did not fall clearly into the admit or not admit categories.
Overall, the plot suggests a positive relationship between GPA and GMAT scores, and indicates that both are associated with the final admission decision.
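The positive relationship seen in the scatterplot can be quantified with a correlation coefficient. The sketch below uses cor() (Pearson and Spearman) and cor.test() on hypothetical toy values; no result for the real admission data is implied.

```r
# Hypothetical toy values (not from Admission.csv)
gpa  <- c(2.2, 2.5, 2.8, 3.0, 3.3, 3.6)
gmat <- c(400, 430, 470, 460, 540, 600)

# Pearson correlation (linear association)
r <- cor(gpa, gmat)

# Spearman rank correlation (monotonic association, less sensitive to outliers)
rho <- cor(gpa, gmat, method = "spearman")

# Test of H0: true Pearson correlation = 0
ct <- cor.test(gpa, gmat)

print(c(pearson = r, spearman = rho))
print(ct$p.value)
```

On the real data, the same calls would take Adata$`Undergrad GPA` and Adata$`GMAT Score` as arguments.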