Setting the working directory along with creating and viewing the dataframe:
#1) Setting working directory
setwd("C:/Users/HP/Downloads/Intern/DeanDilemma")
#2) Reading the data using read.csv and creation of dataframe
dilemma.df <- read.csv("Data - Deans Dilemma.csv")
#3) Viewing the data frame in R
View(dilemma.df)
apply(dilemma.df[26], 2, median)
## Salary
## 240000
options(digits=5)
mytable<-xtabs(~Placement_B,data=dilemma.df)
mytable
## Placement_B
## 0 1
## 79 312
format(round(prop.table(mytable)*100, 2), nsmall = 2)
## Placement_B
## 0 1
## "20.20" "79.80"
3) Use R to create a dataframe called placed that contains only those students who were successfully placed.
placed.df <- dilemma.df[which(dilemma.df$Placement_B=='1'),]
apply(placed.df[26], 2, median)
## Salary
## 260000
aggregate(placed.df$Salary, by=list(Gender = placed.df$Gender), mean)
## Gender x
## 1 F 253068
## 2 M 284242
hist(placed.df$Percent_MBA, main="MBA Performance of placed students", xlab="MBA percentage", ylab="Count", breaks=3, col="lightblue")
notplaced.df <- dilemma.df[which(dilemma.df$Placement_B=='0'),]
par(mfrow=c(1, 2))
hist(placed.df$Percent_MBA, main="MBA Performance of placed students", xlab="MBA percentage", ylab="Count", breaks=3, col="lightblue")
hist(notplaced.df$Percent_MBA, main="MBA Performance of not placed students", xlab="MBA percentage", ylab="Count", breaks=3, col="lightblue")
par(mfrow=c(1,1))
boxplot(Salary~Gender.B, data=placed.df, horizontal = TRUE, yaxt="n", ylab="Gender", xlab="Salary", main="Comparison of Salaries of males and females")
# Gender.B is coded 0 = male, 1 = female (see the subsets below), so position 1 is males
axis(side=2, at=c(1,2), labels=c("Males", "Females"))
placedET.df <- dilemma.df[which(dilemma.df$Entrance_Test!="None" & dilemma.df$Experience_Yrs=='0'),]
library(car)
pairs(formula = ~Salary + Percent_MBA + Percentile_ET, cex=0.6, data=placedET.df)
Use R to create a table showing the average salary of the males and females who were placed.
aggregate(placed.df$Salary, list(placed.df$Gender), mean)
## Group.1 x
## 1 F 253068
## 2 M 284242
males_placed.df <- placed.df[which(placed.df$Gender.B=='0'),]
salary.males.placed = apply(males_placed.df[26], 2, mean)
salary.males.placed
## Salary
## 284242
females_placed.df <- placed.df[which(placed.df$Gender.B=='1'),]
salary.females.placed = apply(females_placed.df[26], 2, mean)
salary.females.placed
## Salary
## 253068
gap = salary.males.placed - salary.females.placed
gap
## Salary
## 31174
Yes, there is a gender gap in this dataset: among placed students, the average salary of males (284242) is higher than the average salary of females (253068) by 31174.
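The same gap can be computed without subsetting by column position. A minimal sketch, assuming a hypothetical toy.df with made-up salaries in place of placed.df (only the column names Gender and Salary are taken from the data above):

```r
# Hypothetical toy data standing in for placed.df; the values are made up
toy.df <- data.frame(
  Gender = c("F", "F", "F", "M", "M", "M"),
  Salary = c(240000, 250000, 260000, 270000, 280000, 290000)
)
# Mean salary per gender, indexed by the gender label
mean.salary <- tapply(toy.df$Salary, toy.df$Gender, mean)
# Male average minus female average
toy.gap <- mean.salary[["M"]] - mean.salary[["F"]]
toy.gap
```

Using the column name instead of the index 26 keeps the calculation working if the column order of the data file changes.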
Visualising data:
par(mfrow=c(1,1))
boxplot(placed.df$Salary~placed.df$Gender, horizontal=TRUE,las=1, xlab="Salary",ylab="Gender", main="Salary of Students (M/F)")
TASK 3c: Use R to run a t-test to test the following hypothesis:
H1: The average salary of the male MBAs is higher than the average salary of female MBAs.
t.test(placed.df$Salary~placed.df$Gender, data=placed.df)
##
## Welch Two Sample t-test
##
## data: placed.df$Salary by placed.df$Gender
## t = -3.08, df = 243, p-value = 0.0023
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -51138 -11209
## sample estimates:
## mean in group F mean in group M
## 253068 284242
The p-value is 0.0023. Because it is below 0.05, the difference between the mean salaries of males and females is statistically significant, and we can reject the null hypothesis that the two means are equal.
The test above is two-sided, while H1 is directional. Since the sample mean for males (284242) is higher than that for females (253068), the one-sided p-value for H1 is half the two-sided value, about 0.001, so the data support H1 as well.
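Since H1 is directional, a one-sided version of the test may match the hypothesis more closely. A minimal sketch, assuming a hypothetical toy.test.df with made-up salaries rather than the real data. With factor levels F and M, alternative = "less" tests whether mean(F) - mean(M) is below 0, i.e. whether males earn more on average:

```r
# Hypothetical toy data; the salaries are made up for illustration only
toy.test.df <- data.frame(
  Gender = factor(rep(c("F", "M"), each = 5)),
  Salary = c(230000, 240000, 250000, 260000, 270000,
             260000, 270000, 280000, 290000, 300000)
)
# Two-sided Welch test, as in the analysis above
two.sided <- t.test(Salary ~ Gender, data = toy.test.df)
# One-sided Welch test: alternative is mean(F) - mean(M) < 0
one.sided <- t.test(Salary ~ Gender, data = toy.test.df, alternative = "less")
# Compare the two p-values side by side
c(two.sided = two.sided$p.value, one.sided = one.sided$p.value)
```

When the observed difference points in the direction of the alternative, as it does here, the one-sided p-value is half the two-sided one.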