library(readxl)
Business_School <- read_excel("~/Downloads/R data/R Take Home Exam 2025/Task 2/Business School.xlsx")
head(Business_School, 10)
## # A tibble: 10 × 9
## `Student ID` `Undergrad Degree` `Undergrad Grade` `MBA Grade`
## <dbl> <chr> <dbl> <dbl>
## 1 1 Business 68.4 90.2
## 2 2 Computer Science 70.2 68.7
## 3 3 Finance 76.4 83.3
## 4 4 Business 82.6 88.7
## 5 5 Finance 76.9 75.4
## 6 6 Computer Science 83.3 82.1
## 7 7 Engineering 76 66.9
## 8 8 Engineering 82.8 76.8
## 9 9 Business 76 72.3
## 10 10 Finance 76.9 72.4
## # ℹ 5 more variables: `Work Experience` <chr>, `Employability (Before)` <dbl>,
## # `Employability (After)` <dbl>, Status <chr>, `Annual Salary` <dbl>
library(ggplot2)
ggplot(Business_School, aes(x = `Undergrad Degree`)) +
  geom_bar(colour = "navy", fill = "pink") +
  labs(title = "Distribution of Undergraduate Degrees among MBA Students",
       x = "Undergrad Degree",
       y = "Frequency") +
  theme_minimal() +
  # after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            vjust = -0.3)
The most common undergraduate degree among the MBA students is Business (35 students).
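The bar heights can be cross-checked with a plain frequency table. The sketch below is only an illustration: it uses the ten `Undergrad Degree` values visible in the head() output above, and the names `degrees_head` and `degree_counts` are assumptions, not part of the original analysis. On the full data the same call would presumably be applied to Business_School$`Undergrad Degree`.

```r
# First ten `Undergrad Degree` values, copied from the head() output above.
# `degrees_head` is an illustrative name, not part of the original analysis.
degrees_head <- c("Business", "Computer Science", "Finance", "Business",
                  "Finance", "Computer Science", "Engineering", "Engineering",
                  "Business", "Finance")

# Count each degree and sort from most to least frequent.
degree_counts <- sort(table(degrees_head), decreasing = TRUE)
print(degree_counts)
```

Because this covers only ten rows, the counts will not match the bar chart; the point is the table() and sort() pattern.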
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
describe(Business_School$`Annual Salary`)
## vars n mean sd median trimmed mad min max range skew
## X1 1 100 109058 41501.49 103500 104600.2 25945.5 20000 340000 320000 2.22
## kurtosis se
## X1 9.41 4150.15
ggplot(Business_School, aes(x = `Annual Salary`)) +
  geom_histogram(bins = 10, fill = "hotpink", color = "navy") +
  scale_x_continuous(labels = scales::comma) +
  labs(title = "Distribution of Annual Salary",
       x = "Annual Salary (dollars per year)",
       y = "Frequency") +
  theme_minimal(base_size = 13)
The distribution of Annual Salary is right-skewed (positively skewed).
This is confirmed by the summary statistics: the mean (109,058) is larger
than the median (103,500), and the skewness is 2.22.
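The mean-versus-median comparison can be turned into a single number. The sketch below computes Pearson's second (median) skewness coefficient from the values reported by describe() above. This is a different, cruder measure than the moment-based skew of 2.22 that describe() prints, so the two numbers are not expected to agree; only the sign is being checked. The variable names are illustrative.

```r
# Summary values copied from the describe() output above.
salary_mean   <- 109058
salary_median <- 103500
salary_sd     <- 41501.49

# Pearson's second (median) skewness coefficient: 3 * (mean - median) / sd.
# A positive value points to a longer right tail.
pearson_skew <- 3 * (salary_mean - salary_median) / salary_sd
print(round(pearson_skew, 2))
```

A positive result is consistent with the right skew seen in the histogram.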
t.test(Business_School$`MBA Grade`,
mu = 74,
alternative = "two.sided")
##
## One Sample t-test
##
## data: Business_School$`MBA Grade`
## t = 2.6587, df = 99, p-value = 0.00915
## alternative hypothesis: true mean is not equal to 74
## 95 percent confidence interval:
## 74.51764 77.56346
## sample estimates:
## mean of x
## 76.04055
H0: μ(MBA Grade) = 74 (the average MBA grade this year is the same as last year)
H1: μ(MBA Grade) ≠ 74 (the average MBA grade this year is different from last year)
Since the p-value (0.00915) is below 0.05, we reject the null hypothesis at the 5% significance level. The 95% confidence interval for the true average MBA grade this year runs from 74.52 to 77.56. Because the whole interval lies above 74, the data suggest that this year’s average is higher than last year’s average of 74.
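As a sanity check, the test statistic and p-value can be rebuilt from the numbers printed by t.test() alone. The sketch below assumes nothing beyond that output: it recovers the standard error from the half-width of the confidence interval, then recomputes t and the two-sided p-value. All variable names are illustrative.

```r
# Values copied from the t.test() output above.
x_bar <- 76.04055              # sample mean
mu0   <- 74                    # hypothesised mean under H0
ci    <- c(74.51764, 77.56346) # 95 percent confidence interval
df    <- 99                    # degrees of freedom (n - 1)

# Recover the standard error from the CI: half_width = t_crit * se.
t_crit <- qt(0.975, df)
se     <- diff(ci) / 2 / t_crit

# One-sample t statistic and two-sided p-value.
t_stat  <- (x_bar - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df)
print(c(t = t_stat, p = p_value))
```

The recomputed values should agree with the reported t and p-value up to rounding of the printed inputs.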
#install.packages("effectsize")
library(effectsize)
##
## Attaching package: 'effectsize'
## The following object is masked from 'package:psych':
##
## phi
cohens_d(Business_School$`MBA Grade`, mu=74)
## Cohen's d | 95% CI
## ------------------------
## 0.27 | [0.07, 0.46]
##
## - Deviation from a difference of 74.
This measures the size of the difference between this year’s average MBA grade and last year’s average (74), expressed in standard deviations. With 95% confidence, the true effect size lies between 0.07 and 0.46. Cohen’s d = 0.27 is positive, so this year’s average is above 74, and it indicates a small effect. The Sawilowsky (2009) rules also classify it as small.
effectsize::interpret_cohens_d(0.27, rules = "sawilowsky2009")
## [1] "small"
## (Rules: sawilowsky2009)
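The Cohen’s d above can also be cross-checked against the t-test. For a one-sample test, d = (mean - mu) / sd and t = (mean - mu) / (sd / sqrt(n)), so d = t / sqrt(n). A minimal sketch using only the reported values (variable names are illustrative):

```r
# Values copied from the t.test() output above.
t_stat <- 2.6587   # reported t statistic
n      <- 99 + 1   # sample size, from df = n - 1

# For a one-sample test, Cohen's d equals t / sqrt(n).
d <- t_stat / sqrt(n)
print(round(d, 2))
```

This should agree with the value reported by cohens_d() after rounding to two decimals.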