In this R Markdown file, we continue exploring ‘The Dean’s Dilemma’ case study. The objective of this report is to determine whether there is a gender gap in salary among the placed students in the college.
To start with, we read the given dataset and then create a second data frame, a subset of the first that includes only the students who were placed.
The following code does that:
setwd("~/Muyeena/Internship/Case Studies/Deans dillemma")
mbadata = read.csv("deans dilemma.csv")
#View(mbadata)
dim(mbadata) # a basic check to ensure all the data is loaded
## [1] 391 26
placed = mbadata[which(mbadata$Placement_B == 1),]
dim(placed) # a basic check to ensure all placed students are in the sub dataset.
## [1] 312 26
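As a hedged aside, the same filter can be sketched on a small made-up data frame to show what the subsetting step does and how it can be checked. The data frame `toy` and its values are hypothetical stand-ins for `mbadata`; only the column names `Gender`, `Placement_B` and `Salary` come from the code above.

```r
# Toy stand-in for mbadata; every value here is invented for illustration.
toy <- data.frame(
  Gender      = c("F", "M", "M", "F", "M"),
  Placement_B = c(1, 0, 1, 1, 1),
  Salary      = c(250000, 0, 300000, 260000, 280000)
)

# Same pattern as in the report: which() returns the row indices where
# the condition is TRUE and silently skips rows where it is NA.
toy_placed <- toy[which(toy$Placement_B == 1), ]

# subset() expresses the same filter and also drops NA conditions.
toy_placed2 <- subset(toy, Placement_B == 1)

# Sanity check: the subset has one row per placed student.
table(toy$Placement_B)
nrow(toy_placed) == sum(toy$Placement_B == 1, na.rm = TRUE)
```

Comparing the row count of the subset with the count of `Placement_B == 1` in the full data is a slightly stronger check than `dim()` alone, because it confirms that no placed student was dropped.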
We use the aggregate() function to compute the mean salary of the placed students, grouped by gender.
msalary.sex = aggregate(placed$Salary, by=list(Gender = placed$Gender), mean)
msalary.sex
## Gender x
## 1 F 253068.0
## 2 M 284241.9
The average salary of the placed male students is a single element of the table created above: the second row, second column.
msalary.sex[2,2]
## [1] 284241.9
As seen, the average salary of the male students who were placed is 284241.9 (2.8424186 × 10^5).
Likewise, the average salary of the placed female students is the element in the first row, second column.
msalary.sex[1,2]
## [1] 253068
As seen, the average salary of the female students who were placed is 253068.0 (2.5306804 × 10^5).
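The positional lookups above depend on the row order of the aggregated table. A minimal sketch of a label-based alternative follows, assuming a made-up data frame `toy_placed` in place of `placed`; the salary values are hypothetical.

```r
# Toy stand-in for the placed students; values are invented.
toy_placed <- data.frame(
  Gender = c("F", "F", "M", "M", "M"),
  Salary = c(250000, 260000, 300000, 280000, 290000)
)

# Formula interface: the result keeps the column names Gender and Salary
# instead of the generic "x" produced by the by = list(...) form.
mean_by_gender <- aggregate(Salary ~ Gender, data = toy_placed, FUN = mean)

# Look a group up by its label rather than by row position.
mean_by_gender$Salary[mean_by_gender$Gender == "M"]

# tapply() returns a named vector, which also allows lookup by label.
mean_vec <- tapply(toy_placed$Salary, toy_placed$Gender, mean)
mean_vec[["F"]]
```

Looking values up by the label "M" or "F" keeps the code correct even if the groups are reordered or a new level is added.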
The given hypothesis is -
The average salary of the male MBAs is higher than the average salary of female MBAs.
Therefore, the null hypothesis will be -
There is no significant difference between the salaries of the male MBAs and those of the female MBAs.
To test the hypothesis, we call the t.test() function in R. With its default settings, it performs a Welch two-sample t-test, which does not assume equal variances in the two groups.
test = t.test(Salary ~ Gender, data = placed)
test
##
## Welch Two Sample t-test
##
## data: Salary by Gender
## t = -3.0757, df = 243.03, p-value = 0.00234
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -51138.42 -11209.22
## sample estimates:
## mean in group F mean in group M
## 253068.0 284241.9
The result of the test is returned as a list-like object in R (class htest). Therefore, we can extract the element that holds the p-value.
test[3]
## $p.value
## [1] 0.00234019
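For illustration, the components of a t.test() result can also be extracted by name. The sketch below runs on a small made-up data frame `toy`, not on the case-study data, so its numbers say nothing about the actual salaries.

```r
# Toy data; values are invented purely to obtain a test object.
toy <- data.frame(
  Gender = factor(rep(c("F", "M"), each = 5)),
  Salary = c(240000, 250000, 255000, 260000, 270000,
             270000, 280000, 285000, 295000, 310000)
)
res <- t.test(Salary ~ Gender, data = toy)

names(res)      # components stored in the htest object

# res[3] returns a one-element list; res$p.value (or res[[3]]) returns
# the bare number, which is easier to reuse in later calculations.
res[3]
res$p.value

res$estimate    # the two group means
res$conf.int    # confidence interval for the difference in means
```

Extracting by name (`res$p.value`) rather than by position (`res[3]`) makes the intent explicit and does not depend on the order of the components.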
Therefore, the p-value of the test is 0.0023402.
When the t-test was run to check the hypothesis, the p-value found was 0.0023402. This value is less than the significance level of 0.05, which leads us to reject the null hypothesis.
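Because the p-value is also below 0.01, the result is significant at the 1% level as well, and from the above t-test we can conclude that
There is a significant difference between the salaries of male MBAs and female MBAs, with the male MBAs earning more on average.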
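The hypothesis stated earlier is directional (male salaries are higher), while the test above is two-sided. A hedged sketch of how a one-sided Welch test and a 99% confidence interval could be requested is shown below on a made-up data frame `toy`; the values are hypothetical and the results are not those of the case study.

```r
# Toy data; values are invented for illustration only.
toy <- data.frame(
  Gender = factor(rep(c("F", "M"), each = 5)),
  Salary = c(240000, 250000, 255000, 260000, 270000,
             270000, 280000, 285000, 295000, 310000)
)

# Two-sided test, as in the report.
two_sided <- t.test(Salary ~ Gender, data = toy)

# With levels ordered F, M the tested difference is mean(F) - mean(M),
# so "male salaries are higher" corresponds to alternative = "less".
one_sided <- t.test(Salary ~ Gender, data = toy, alternative = "less")

two_sided$p.value
one_sided$p.value

# A 99% interval for the difference, instead of the default 95%.
ci_99 <- t.test(Salary ~ Gender, data = toy, conf.level = 0.99)$conf.int
ci_99
```

When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one, so a two-sided rejection implies a one-sided rejection at the same level.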
With this, we come to the conclusion of our report.