Setting the working directory along with creating and viewing the dataframe:

#1) Setting working directory
setwd("C:/Users/HP/Downloads/Intern/DeanDilemma")

#2) Reading the data using read.csv and creation of dataframe
dilemma.df <- read.csv("Data - Deans Dilemma.csv")

#3) Viewing the data frame in R
View(dilemma.df)
1) Use R to calculate the median salary of all the students in the data sample.
apply(dilemma.df["Salary"], 2, median)
## Salary 
## 240000
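`apply()` over a one-column data frame returns a named vector, which is why the output above is labelled `Salary`. The minimal sketch below uses a small hypothetical data frame (`toy.df` and its values are invented for illustration, not taken from the Dean's Dilemma file) to show that selecting the column by name gives the same median as calling `median()` on the vector, without relying on the column's position (26 in the original data).

```r
# Hypothetical toy data; in the real file Salary is column 26
toy.df <- data.frame(Student = 1:5,
                     Salary  = c(0, 200000, 240000, 260000, 300000))

# apply() over a one-column data frame returns a vector named "Salary"
by_apply <- apply(toy.df["Salary"], 2, median)

# Equivalent, position-independent alternative on the plain vector
by_vector <- median(toy.df$Salary)

print(by_apply)   # named vector with value 240000
print(by_vector)  # plain numeric 240000
```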
2) Use R to calculate the percentage of students who were placed, correct to 2 decimal places.
options(digits=5)

mytable <- xtabs(~Placement_B, data=dilemma.df)
mytable
## Placement_B
##   0   1 
##  79 312
format(round(prop.table(mytable)*100, 2), nsmall = 2)
## Placement_B
##       0       1 
## "20.20" "79.80"
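The same percentage calculation can be sketched on a hypothetical 0/1 placement vector (the values below are invented). `prop.table()` converts counts to proportions, `round(..., 2)` fixes the precision, and `format(..., nsmall = 2)` keeps trailing zeros when printing. For a 0/1 indicator, the placed share is also simply the mean.

```r
# Hypothetical placement indicator: 1 = placed, 0 = not placed
placement <- c(1, 1, 1, 0, 1, 1, 0, 1)

counts <- table(placement)                 # counts per category
pct <- round(prop.table(counts) * 100, 2)  # percentages to 2 decimal places

# nsmall = 2 keeps trailing zeros in the printed text, e.g. "25.00"
pct_text <- format(pct, nsmall = 2)
print(pct_text)

# For a 0/1 indicator the placed percentage is also 100 * mean()
placed_share <- round(mean(placement) * 100, 2)
```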

3) Use R to create a data frame called placed that contains a subset of only those students who were successfully placed.

placed.df <- dilemma.df[which(dilemma.df$Placement_B=='1'),]
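Wrapping the condition in `which()` matters when the column can contain missing values: plain logical indexing returns an all-`NA` row for every `NA` in the condition, while `which()` keeps only rows where the condition is `TRUE`. The sketch below uses hypothetical data with an invented `NA` purely to show the effect; whether the real file has missing placements is not stated. `subset()` behaves like the `which()` version here.

```r
# Hypothetical data; the NA is invented to illustrate the behaviour
toy.df <- data.frame(Placement_B = c(1, 0, NA, 1),
                     Salary      = c(250000, 0, NA, 270000))

# Logical indexing: the NA in the condition produces an extra all-NA row
by_logical <- toy.df[toy.df$Placement_B == 1, ]

# which() drops NA results, keeping only rows where the test is TRUE
by_which <- toy.df[which(toy.df$Placement_B == 1), ]

# subset() also drops NA results and avoids repeating the data frame name
by_subset <- subset(toy.df, Placement_B == 1)
```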
4) Use R to find the median salary of students who were placed.
apply(placed.df["Salary"], 2, median)
## Salary 
## 260000
5) Use R to create a table showing the mean salary of males and females who were placed.
aggregate(placed.df$Salary, by=list(Gender = placed.df$Gender), mean)
##   Gender      x
## 1      F 253068
## 2      M 284242
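The formula interface of `aggregate()` produces the same group means but keeps the variable name as the column header instead of the generic `x`; `tapply()` returns the means as a named vector. A minimal sketch on hypothetical salaries (not the real data):

```r
# Hypothetical salaries for illustration only
toy.df <- data.frame(Gender = c("F", "M", "F", "M"),
                     Salary = c(240000, 280000, 260000, 300000))

# Formula interface: the result column is named "Salary" rather than "x"
mean_by_gender <- aggregate(Salary ~ Gender, data = toy.df, FUN = mean)
print(mean_by_gender)

# tapply() gives the same means as a vector named by group
means_vec <- tapply(toy.df$Salary, toy.df$Gender, mean)
print(means_vec)
```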
6) Use R to generate a histogram showing a breakdown of the MBA performance of the students who were placed.
hist(placed.df$Percent_MBA, main="MBA Performance of placed students", xlab="MBA percentage", ylab="Count", breaks=3, col="lightblue")
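When `breaks` is a single number, `hist()` treats it only as a suggestion and chooses "pretty" cut points, so `breaks=3` does not guarantee exactly three bars. Calling `hist()` with `plot = FALSE` returns the bins without drawing them, which is a quick way to check what was used. The percentages below are hypothetical.

```r
# Hypothetical MBA percentages for illustration only
percent_mba <- c(52.4, 58.1, 61.0, 63.7, 66.2, 69.8, 71.5, 77.3)

h <- hist(percent_mba, breaks = 3, plot = FALSE)
h$breaks   # cut points actually chosen
h$counts   # number of students in each bin

# To force specific bins, pass the cut points explicitly
h_fixed <- hist(percent_mba, breaks = seq(50, 80, by = 10), plot = FALSE)
h_fixed$counts
```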

7) Create a data frame called notplaced that contains a subset of only those students who were NOT placed after their MBA.
notplaced.df <- dilemma.df[which(dilemma.df$Placement_B=='0'),]
8) Draw two histograms side by side, visually comparing the MBA performance of Placed and Not Placed students.
par(mfrow=c(1, 2))

hist(placed.df$Percent_MBA, main="MBA Performance of placed students", xlab="MBA percentage", ylab="Count", breaks=3, col="lightblue")

hist(notplaced.df$Percent_MBA, main="MBA Performance of not placed students", xlab="MBA percentage", ylab="Count", breaks=3, col="lightblue")

par(mfrow=c(1,1))
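Two histograms drawn independently can pick different bins and y-axis ranges, which makes a side-by-side comparison misleading. A sketch of one way to share the scale, on hypothetical percentages: compute common cut points with `pretty()`, bin both groups with `plot = FALSE`, then plot with the same `ylim`.

```r
# Hypothetical MBA percentages for illustration only
placed_mba    <- c(58.0, 62.5, 64.1, 67.9, 70.2, 74.8)
notplaced_mba <- c(52.3, 55.6, 57.9, 60.4, 63.0)

# Shared cut points covering both groups, and a shared y-axis limit
common_breaks <- pretty(range(c(placed_mba, notplaced_mba)), n = 5)
h_p  <- hist(placed_mba,    breaks = common_breaks, plot = FALSE)
h_np <- hist(notplaced_mba, breaks = common_breaks, plot = FALSE)
y_max <- max(h_p$counts, h_np$counts)

par(mfrow = c(1, 2))
plot(h_p,  main = "Placed",     xlab = "MBA percentage", ylim = c(0, y_max), col = "lightblue")
plot(h_np, main = "Not placed", xlab = "MBA percentage", ylim = c(0, y_max), col = "lightpink")
par(mfrow = c(1, 1))
```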
9) Use R to draw two boxplots, one below the other, comparing the distribution of salaries of males and females who were placed.
boxplot(Salary~Gender.B, data=placed.df, horizontal = TRUE, yaxt="n", ylab="Gender", xlab="Salary", main="Comparison of Salaries of males and females")
# Gender.B == 0 is male (see the male/female averages below), so the first (lower) box is Males
axis(side=2, at=c(1,2), labels=c("Males", "Females"))
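Labelling boxes with `axis()` depends on knowing the order of the groups. An alternative sketch converts the 0/1 code to a labelled factor so that `boxplot()` writes the labels itself. The coding 0 = male, 1 = female is an assumption inferred from the male and female averages in this analysis; the salaries and the `GenderLabel` column are hypothetical.

```r
# Hypothetical data for illustration only
toy.df <- data.frame(Gender.B = c(0, 1, 0, 1, 0, 1),
                     Salary   = c(280000, 250000, 300000, 240000, 270000, 265000))

# Assumed coding (inferred from the outputs): 0 = male, 1 = female
toy.df$GenderLabel <- factor(toy.df$Gender.B, levels = c(0, 1),
                             labels = c("Males", "Females"))

# Box labels now come from the factor levels, so they cannot be mismatched
boxplot(Salary ~ GenderLabel, data = toy.df, horizontal = TRUE, las = 1,
        xlab = "Salary", ylab = "Gender",
        main = "Comparison of Salaries of males and females")
```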

10) Create a data frame called placedET, representing students who were placed after the MBA and who also took an MBA entrance test before admission into the MBA program.
# Keep placed students (Placement_B == 1) whose Entrance_Test is not "None"
placedET.df <- dilemma.df[which(dilemma.df$Entrance_Test != "None" & dilemma.df$Placement_B == 1),]
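The task asks for students who were placed and who took an entrance test, so both conditions belong in the filter, joined with `&`. A sketch with `subset()` on hypothetical data (the test names `TestA`/`TestB` are invented placeholders; only `"None"` comes from the original filter):

```r
# Hypothetical data; test names are placeholders
toy.df <- data.frame(Placement_B   = c(1, 1, 0, 1),
                     Entrance_Test = c("TestA", "None", "TestB", "TestA"),
                     Salary        = c(260000, 240000, 0, 280000))

# Placed AND took some entrance test
placedET_toy <- subset(toy.df, Placement_B == 1 & Entrance_Test != "None")

# If a placed-only data frame already exists, one condition is enough:
# subset(placed.df, Entrance_Test != "None")
```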
11) Draw a scatter plot matrix for three variables, {Salary, Percent_MBA, Percentile_ET}, using the data frame placedET.
# pairs() is part of base R graphics, so no extra package is needed
pairs(~Salary + Percent_MBA + Percentile_ET, data=placedET.df, cex=0.6)
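`pairs()` draws every pairwise scatter plot of the listed variables. As a sketch on hypothetical values, the plot can be paired with `cor()`, which puts a number (Pearson correlation) on the linear association shown in each panel.

```r
# Hypothetical values for illustration only
toy.df <- data.frame(
  Salary        = c(240000, 255000, 260000, 275000, 290000, 300000),
  Percent_MBA   = c(58.2, 61.5, 60.1, 66.4, 68.0, 71.3),
  Percentile_ET = c(55, 70, 62, 81, 77, 90)
)

# Scatter plot matrix via the formula interface
pairs(~ Salary + Percent_MBA + Percentile_ET, data = toy.df, cex = 0.6)

# Pearson correlations for the same three variables
cor_matrix <- cor(toy.df[, c("Salary", "Percent_MBA", "Percentile_ET")])
round(cor_matrix, 2)
```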

WEEK 2 DAY 1: TASK 3b

Use R to create a table showing the average salary of males and females who were placed.

aggregate(placed.df$Salary, list(placed.df$Gender), mean)
##   Group.1      x
## 1       F 253068
## 2       M 284242
Average salary of male MBAs who were placed:
# Gender.B: 0 = male, 1 = female (consistent with the group means above)
males_placed.df <- placed.df[which(placed.df$Gender.B=='0'),]
salary.males.placed <- apply(males_placed.df["Salary"], 2, mean)
salary.males.placed
## Salary 
## 284242
Average salary of female MBAs who were placed:
females_placed.df <- placed.df[which(placed.df$Gender.B=='1'),]
salary.females.placed <- apply(females_placed.df["Salary"], 2, mean)
salary.females.placed
## Salary 
## 253068
Checking the gender gap:
gap <- salary.males.placed - salary.females.placed
gap
## Salary 
##  31174

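Yes, there is a gender gap: in this dataset the average salary of placed males (284,242) is 31,174 higher than that of placed females (253,068). This raw difference is descriptive; TASK 3c below tests whether it is statistically significant.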
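The gap can also be computed in one step from the group means, without building separate male and female data frames. A minimal sketch using `tapply()` on hypothetical salaries (so the numbers differ from the ones above):

```r
# Hypothetical salaries for illustration only
toy.df <- data.frame(Gender = c("M", "F", "M", "F", "M", "F"),
                     Salary = c(290000, 250000, 280000, 255000, 300000, 245000))

means <- tapply(toy.df$Salary, toy.df$Gender, mean)  # named by group: F, M
gap <- means[["M"]] - means[["F"]]                   # male mean minus female mean
print(gap)
```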

Visualising data:

par(mfrow=c(1,1))
boxplot(placed.df$Salary~placed.df$Gender, horizontal=TRUE,las=1, xlab="Salary",ylab="Gender", main="Salary of Students (M/F)")

TASK 3c: Use R to run a t-test to test the following hypothesis:

    H1: The average salary of the male MBAs is higher than the average salary of female MBAs.

t.test(placed.df$Salary~placed.df$Gender, data=placed.df)
## 
##  Welch Two Sample t-test
## 
## data:  placed.df$Salary by placed.df$Gender
## t = -3.08, df = 243, p-value = 0.0023
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -51138 -11209
## sample estimates:
## mean in group F mean in group M 
##          253068          284242

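The p-value is 0.0023. Since p < 0.05, the difference between the mean salaries of males and females is statistically significant at the 5% level, and we can reject the null hypothesis.

The output above indicates that we can reject the null hypothesis that the mean salary of males is equal to the mean salary of females; the 95% confidence interval for the difference (F minus M), from -51,138 to -11,209, excludes zero. Note that this Welch test is two-sided, whereas H1 is directional. Because the estimated difference lies in the direction H1 predicts (t = -3.08), the one-sided p-value is half the reported value (roughly 0.001), so H1 is also supported at the 5% level.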
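A one-sided Welch test matching H1 directly can be requested with the `alternative` argument. With the formula interface the difference is taken as the first factor level minus the second (F minus M, since F sorts first), so "male mean higher" corresponds to `alternative = "less"`. A sketch on hypothetical salaries:

```r
# Hypothetical salaries for illustration only
toy.df <- data.frame(
  Gender = rep(c("F", "M"), each = 5),
  Salary = c(240000, 250000, 255000, 260000, 245000,   # females
             280000, 290000, 275000, 300000, 285000)   # males
)

# Two-sided Welch test (the default, as in the analysis above)
two_sided <- t.test(Salary ~ Gender, data = toy.df)

# One-sided test of H1: mean(M) > mean(F), i.e. mean(F) - mean(M) < 0
one_sided <- t.test(Salary ~ Gender, data = toy.df, alternative = "less")

# When t points in the hypothesised direction, the one-sided
# p-value is half the two-sided one
c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```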