A Dean’s Dilemma - Gender Gap

Introduction

In this R Markdown file, we continue exploring 'The Dean's Dilemma' case study. The objective of this report is to determine whether there is a gender gap in salaries among the students who were placed.

To start, we read the given dataset and then create a subset of it that contains only the students who were placed (Placement_B == 1).

The following code does this:

setwd("~/Muyeena/Internship/Case Studies/Deans dillemma") # folder containing the CSV; adjust to your own path
mbadata = read.csv("deans dilemma.csv")
#View(mbadata)
dim(mbadata) # a basic check to ensure all the data is loaded

## [1] 391  26

placed = mbadata[which(mbadata$Placement_B == 1),] # keep only the rows of placed students
dim(placed) # a basic check on the size of the subset of placed students

## [1] 312  26
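As a sketch of why which() is used in the filter above, the block below runs the same row selection on a small hypothetical data frame (toy values, not the case-study data). A plain logical index turns an NA in Placement_B into a row of NAs, whereas which() and base R's subset() both drop that row.

```r
# Hypothetical toy data standing in for "deans dilemma.csv" (not real values)
mbadata <- data.frame(
  Gender      = c("M", "F", "M", "F", "M"),
  Placement_B = c(1, 1, 0, 1, NA),
  Salary      = c(300000, 250000, 0, 260000, NA)
)

# Plain logical index: the NA comparison produces an extra all-NA row
naive <- mbadata[mbadata$Placement_B == 1, ]

# which() keeps only the positions where the condition is TRUE
placed_which <- mbadata[which(mbadata$Placement_B == 1), ]

# subset() treats an NA condition as FALSE, so it selects the same rows
placed_subset <- subset(mbadata, Placement_B == 1)

nrow(naive)          # 4 rows, the last one all NA
nrow(placed_which)   # 3 rows
nrow(placed_subset)  # 3 rows
```

Whether the real dataset contains missing placement values is not stated here; the point is only that which() is the safer choice if it does.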

Table showing the mean salary of placed male and female students

We use the aggregate() function to compute the mean salary for each gender.

msalary.sex = aggregate(placed$Salary, by=list(Gender = placed$Gender), mean) # mean salary per gender; the result column is named x
msalary.sex

##   Gender        x
## 1      F 253068.0
## 2      M 284241.9
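A possible alternative, sketched below on hypothetical toy salaries, is aggregate()'s formula interface, which names the result column after the summarised variable (Salary) instead of the generic x; tapply() returns the same group means as a named vector. Note that the formula interface drops rows with missing values by default (na.action = na.omit), whereas the by = interface passes them on to mean().

```r
# Hypothetical placed-student salaries (toy values, not the case-study data)
placed <- data.frame(
  Gender = c("F", "M", "F", "M", "M"),
  Salary = c(250000, 280000, 260000, 290000, 300000)
)

# Formula interface: result columns are named Gender and Salary
msalary.sex <- aggregate(Salary ~ Gender, data = placed, FUN = mean)
print(msalary.sex)

# tapply(): the same means as a vector named by gender
means <- tapply(placed$Salary, placed$Gender, mean)
print(means)
```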

The average salary of male MBAs who were placed

This only requires extracting a specific element from the table created above: row 2 (Gender M), column 2 (x, the mean salary).

msalary.sex[2,2]

## [1] 284241.9

As seen, the average salary of placed male students is approximately 284,241.9.
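Rather than indexing by row number, each mean can be selected by its Gender label, which keeps working if the row order of the table changes. The sketch below rebuilds msalary.sex from the printed aggregate() output above and formats the values with formatC(), so that a value printed inline reads as 284,241.9 rather than in scientific notation. The variable names male_mean and female_mean are illustrative only.

```r
# msalary.sex rebuilt from the printed aggregate() output above
msalary.sex <- data.frame(Gender = c("F", "M"), x = c(253068.0, 284241.9))

# Select by label instead of by row position
male_mean   <- msalary.sex$x[msalary.sex$Gender == "M"]
female_mean <- msalary.sex$x[msalary.sex$Gender == "F"]

# Fixed-point formatting with a thousands separator
formatC(male_mean, format = "f", digits = 1, big.mark = ",")    # "284,241.9"
formatC(female_mean, format = "f", digits = 1, big.mark = ",")  # "253,068.0"

# Gap between the two group means
male_mean - female_mean
```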

The average salary of female MBAs who were placed

As above, this only requires extracting a single value from the table: row 1 (Gender F), column 2.

msalary.sex[1,2]

## [1] 253068

As seen, the average salary of placed female students is approximately 253,068.0.

R code to run a t-test

The given hypothesis is:

The average salary of male MBAs is higher than the average salary of female MBAs.

The corresponding null hypothesis is:

There is no difference between the mean salaries of male MBAs and female MBAs.

To test the hypothesis, we use the t.test() function in R. With its default setting var.equal = FALSE, it performs a Welch two-sample t-test, which does not assume that the two groups have equal variances.
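To make the Welch test concrete, the sketch below computes its t statistic, Welch-Satterthwaite degrees of freedom and two-sided p-value by hand for two small hypothetical salary vectors, and checks them against t.test(). The helper welch_t() is hypothetical, written only for illustration.

```r
# Hypothetical toy salaries for two groups (not the case-study data)
f <- c(250000, 240000, 260000, 255000, 245000)
m <- c(280000, 290000, 300000, 275000, 285000)

# Hypothetical helper: Welch two-sample t-test from first principles
welch_t <- function(a, b) {
  va <- var(a) / length(a)          # squared standard error of mean(a)
  vb <- var(b) / length(b)          # squared standard error of mean(b)
  t  <- (mean(a) - mean(b)) / sqrt(va + vb)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
  p  <- 2 * pt(-abs(t), df)         # two-sided p-value
  c(t = t, df = df, p = p)
}

manual  <- welch_t(f, m)
builtin <- t.test(f, m)             # var.equal = FALSE by default: Welch test

print(manual)
print(c(builtin$statistic, builtin$parameter, p = builtin$p.value))
```

In the formula call Salary ~ Gender, the first factor level (F) plays the role of a, which is why the t statistic and confidence interval reported for the placed students are negative.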

test = t.test(Salary ~ Gender, data = placed)
test

## 
##  Welch Two Sample t-test
## 
## data:  Salary by Gender
## t = -3.0757, df = 243.03, p-value = 0.00234
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -51138.42 -11209.22
## sample estimates:
## mean in group F mean in group M 
##        253068.0        284241.9
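The hypothesis stated above is directional (male salaries higher), but the output shows that t.test() ran its default two-sided alternative. A one-sided version can be requested with alternative = "less": with Salary ~ Gender the tested difference is mean(F) minus mean(M), so "male salaries higher" corresponds to that difference being less than 0. The sketch below uses hypothetical toy data. When t is negative, the one-sided p-value is exactly half the two-sided one, which for the reported t = -3.0757 would be about 0.0012.

```r
# Hypothetical toy data (not the case-study data)
placed <- data.frame(
  Gender = rep(c("F", "M"), each = 5),
  Salary = c(250000, 240000, 260000, 255000, 245000,
             280000, 290000, 300000, 275000, 285000)
)

# Default: two-sided alternative (difference in means != 0)
two_sided <- t.test(Salary ~ Gender, data = placed)

# Directional alternative: mean(F) - mean(M) < 0, i.e. males earn more
one_sided <- t.test(Salary ~ Gender, data = placed, alternative = "less")

two_sided$p.value
one_sided$p.value   # half of the two-sided value, since t < 0
```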

p-value of the test

In R, t.test() returns an object of class "htest", which is a list, so the p-value can be extracted as one of its elements.

test$p.value # extract the p-value by name rather than by position (test[3])

## [1] 0.00234019
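Besides p.value, the same htest object holds the other quantities printed in the test summary. The sketch below, on hypothetical toy data, lists the component names and pulls out a few of them; the confidence level is stored as an attribute of conf.int.

```r
# Hypothetical toy salaries (not the case-study data)
f <- c(250000, 240000, 260000, 255000, 245000)
m <- c(280000, 290000, 300000, 275000, 285000)
test <- t.test(f, m)

class(test)                        # "htest"
names(test)                        # "statistic", "parameter", "p.value", "conf.int", ...
test$statistic                     # t statistic (named "t")
test$parameter                     # Welch degrees of freedom (named "df")
test$conf.int                      # 95% confidence interval for mean(f) - mean(m)
attr(test$conf.int, "conf.level")  # 0.95
test$estimate                      # the two group means
```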

The p-value of the test is therefore approximately 0.0023.

Interpret the meaning of the t-test

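The t-test gave a p-value of 0.0023, which is below the conventional significance level of 0.05, so we reject the null hypothesis. Because the p-value is also below 0.01, the null hypothesis is rejected even at the 1% significance level.

Therefore, from the above t-test we conclude that:

There is a significant difference between the salaries of male MBAs and female MBAs. On average, placed male MBAs earn more: the 95% confidence interval for the difference in mean salary (female minus male) runs from -51,138.42 to -11,209.22.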
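For a two-sided test, rejecting the null hypothesis at the 1% significance level is equivalent to the 99% confidence interval for the difference in means excluding 0. The sketch below checks this on hypothetical toy data; on the real subset the corresponding call would be t.test(Salary ~ Gender, data = placed, conf.level = 0.99), whose interval is not reported in this document.

```r
# Hypothetical toy salaries (not the case-study data)
f <- c(250000, 240000, 260000, 255000, 245000)
m <- c(280000, 290000, 300000, 275000, 285000)

# Request a 99% confidence interval instead of the default 95%
test99 <- t.test(f, m, conf.level = 0.99)
ci <- test99$conf.int

# Does the interval lie entirely on one side of 0?
ci_excludes_zero <- ci[1] > 0 || ci[2] < 0

# Is the two-sided p-value below the 1% significance level?
reject_at_1pct <- test99$p.value < 0.01

ci_excludes_zero
reject_at_1pct      # agrees with ci_excludes_zero
```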

This concludes our report.