library(readxl)
Business_School <- read_excel("~/Downloads/R data/R Take Home Exam 2025/Task 2/Business School.xlsx")
head(Business_School, 10)
## # A tibble: 10 × 9
##    `Student ID` `Undergrad Degree` `Undergrad Grade` `MBA Grade`
##           <dbl> <chr>                          <dbl>       <dbl>
##  1            1 Business                        68.4        90.2
##  2            2 Computer Science                70.2        68.7
##  3            3 Finance                         76.4        83.3
##  4            4 Business                        82.6        88.7
##  5            5 Finance                         76.9        75.4
##  6            6 Computer Science                83.3        82.1
##  7            7 Engineering                     76          66.9
##  8            8 Engineering                     82.8        76.8
##  9            9 Business                        76          72.3
## 10           10 Finance                         76.9        72.4
## # ℹ 5 more variables: `Work Experience` <chr>, `Employability (Before)` <dbl>,
## #   `Employability (After)` <dbl>, Status <chr>, `Annual Salary` <dbl>

1. Graph the distribution of undergrad degrees using the ggplot function. Which degree is the most common?

library(ggplot2)
ggplot(Business_School, aes(x = `Undergrad Degree`)) + 
  geom_bar(colour = "navy", fill = "pink") + 
  labs(title = "Distribution of Undergraduate Degrees among MBA Students", 
       x = "Undergrad Degree", 
       y = "Frequency") + 
  theme_minimal() + 
  # Print the count above each bar; after_stat(count) replaces the deprecated ..count..
  geom_text(stat = "count", 
            aes(label = after_stat(count)),
            vjust = -0.3)

The bar chart shows that the most common undergraduate degree is Business (35 students).
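The bar heights can also be cross-checked with a plain frequency table. The sketch below is only illustrative: the `degrees` vector is a made-up stand-in for the `Undergrad Degree` column, so its counts do not match the real data.

```r
# Hypothetical stand-in for Business_School$`Undergrad Degree`;
# these values are invented for illustration only.
degrees <- c("Business", "Finance", "Business", "Engineering",
             "Computer Science", "Business", "Finance")

# Count each category and sort from most to least frequent
degree_counts <- sort(table(degrees), decreasing = TRUE)
print(degree_counts)

# The first name in the sorted table is the most common category
most_common <- names(degree_counts)[1]
print(most_common)
```

On the real data, the same two steps would be applied to the `Undergrad Degree` column instead of the toy vector.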

2. Show the descriptive statistics of the Annual Salary and its distribution with the histogram (use the ggplot function). Describe the distribution.

library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
describe(Business_School$`Annual Salary`)
##    vars   n   mean       sd median  trimmed     mad   min    max  range skew
## X1    1 100 109058 41501.49 103500 104600.2 25945.5 20000 340000 320000 2.22
##    kurtosis      se
## X1     9.41 4150.15
ggplot(Business_School, aes(x= `Annual Salary`)) + 
  geom_histogram(bins=10, fill= "hotpink", color = "navy") + 
  scale_x_continuous(labels = scales::comma) + 
  labs(title ="Distribution of Annual Salary", 
       x = "Annual Salary (dollars per year)", 
       y = "Frequency") + 
  theme_minimal(base_size = 13)

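The distribution of Annual Salary is right-skewed (positively skewed), with a long right tail reaching 340,000. The descriptive statistics confirm this: the skewness is positive (2.22) and the mean (109,058) is larger than the median (103,500).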
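As a rough check of the skew, the reported summary values can be compared directly. The sketch below only reuses numbers copied from the `describe()` output; the variable names are my own, and the z-score comparison is an informal illustration rather than a formal skewness test.

```r
# Summary values copied from the describe() output above
salary_mean   <- 109058
salary_median <- 103500
salary_sd     <- 41501.49
salary_min    <- 20000
salary_max    <- 340000

# Gap between mean and median, in dollars and relative to the median
mean_median_gap <- salary_mean - salary_median
gap_pct <- 100 * mean_median_gap / salary_median

# Distance of each extreme from the mean, in standard deviations
z_min <- (salary_min - salary_mean) / salary_sd
z_max <- (salary_max - salary_mean) / salary_sd

print(c(gap = mean_median_gap, gap_pct = round(gap_pct, 2)))
print(round(c(z_min = z_min, z_max = z_max), 2))
```

The maximum lies much further from the mean than the minimum does (in standard deviation units), which is what a long right tail looks like numerically.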

3. Test the following hypothesis: 𝐻0: 𝜇MBA Grade = 74. Explain the result and interpret the effect size.

t.test(Business_School$`MBA Grade`, 
       mu = 74, 
       alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  Business_School$`MBA Grade`
## t = 2.6587, df = 99, p-value = 0.00915
## alternative hypothesis: true mean is not equal to 74
## 95 percent confidence interval:
##  74.51764 77.56346
## sample estimates:
## mean of x 
##  76.04055

H0: 𝜇MBA Grade = 74 (the average MBA grade this year is the same as last year)

H1: 𝜇MBA Grade ≠ 74 (the average MBA grade this year is different from last year)

Since the p-value (0.00915) is below 0.05, we reject the null hypothesis at the 5% significance level. We are 95% confident that the true average MBA grade this year lies between 74.52 and 77.56. Because the whole interval lies above 74, the data suggest that the students’ performance is higher than last year’s average of 74.
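To see where the p-value and the confidence interval come from, they can be reconstructed from the reported mean, t statistic, and degrees of freedom. This is a minimal sketch using only the rounded values printed by `t.test()`, so the results agree with the output above only up to rounding; the variable names are my own.

```r
# Values copied from the t.test() output above
x_bar <- 76.04055   # sample mean
mu0   <- 74         # hypothesised mean under H0
t_obs <- 2.6587     # reported t statistic
df    <- 99         # degrees of freedom (n - 1)

# Standard error implied by t = (x_bar - mu0) / se
se <- (x_bar - mu0) / t_obs

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_obs), df)

# 95% confidence interval: x_bar +/- critical value * se
t_crit <- qt(0.975, df)
ci <- x_bar + c(-1, 1) * t_crit * se

print(round(p_value, 5))
print(round(ci, 2))
```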

#install.packages("effectsize")
library(effectsize)
## 
## Attaching package: 'effectsize'
## The following object is masked from 'package:psych':
## 
##     phi
cohens_d(Business_School$`MBA Grade`, mu=74)
## Cohen's d |       95% CI
## ------------------------
## 0.27      | [0.07, 0.46]
## 
## - Deviation from a difference of 74.

Cohen’s d measures the size of the difference between this year’s average MBA grade and last year’s average (74) in standard deviation units. With 95% confidence, the true effect size lies between 0.07 and 0.46. The point estimate, d = 0.27, is positive (this year’s mean is above 74) and indicates a small effect. The Sawilowsky (2009) rules of thumb give the same classification.

effectsize::interpret_cohens_d(0.27, rules = "sawilowsky2009")
## [1] "small"
## (Rules: sawilowsky2009)
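For a one-sample test, Cohen’s d is the mean difference divided by the sample standard deviation, which is the same as the t statistic divided by the square root of n. The sketch below recomputes d from the reported t statistic and sample size; the 0.2 and 0.5 cut-offs are the conventional bounds of the "small" band, written out here as an assumption rather than taken from the `effectsize` source code.

```r
# Values copied from the outputs above
t_obs <- 2.6587   # t statistic from t.test()
n     <- 100      # sample size from describe()

# One-sample Cohen's d: (mean - mu) / sd, equivalently t / sqrt(n)
d <- t_obs / sqrt(n)
print(round(d, 2))

# Conventional "small" band, assumed here as 0.2 <= d < 0.5
is_small <- d >= 0.2 && d < 0.5
print(is_small)
```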