Helping a dean to resolve his dilemma.

1. Reading the dataset

setwd("C:/Users/CJ With HP/Desktop/IIM Lucknow/Datasets")
# paste() with a single string was redundant; the file name can be passed directly
dean.df <- read.csv("Data - Deans Dilemma.csv")

2. Median salary of all the students in the data sample

median(dean.df$Salary)
## [1] 240000

3. Percentage of students who were placed, correct to 2 decimal places

mytable <- with(dean.df,table(Placement))
format(round(prop.table(mytable)*100,2),nsmall=2)
## Placement
## Not Placed     Placed 
##    "20.20"    "79.80"
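
The percentage calculation above can be tried on a small vector, as a minimal sketch. The `placement` vector is hypothetical toy data standing in for `dean.df$Placement`; it is not taken from the dataset.

```r
# Hypothetical toy data standing in for dean.df$Placement
placement <- c("Placed", "Placed", "Not Placed", "Placed", "Placed")

counts <- table(placement)                  # frequency of each category
pct <- round(prop.table(counts) * 100, 2)   # share of each category, in percent

# Pull out a single category as a plain number
placed_pct <- as.numeric(pct["Placed"])
print(pct)
```

Note that `format(..., nsmall=2)` returns character strings, which is why the output above is quoted; keeping the rounded table numeric, as here, is more convenient if the value is reused in later calculations.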

4. Create a dataframe called placed (stored here as placed.df) that contains only those students who were successfully placed

placed.df <- dean.df[which(dean.df$Placement=="Placed"),]

5. Median salary of students who were placed

median(placed.df$Salary)
## [1] 260000
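
The median for placed students is higher than the median for the whole sample. One plausible reason, assuming unplaced students are recorded with a salary of 0 (an assumption, not confirmed by the output above), is that those zeros pull the overall median down. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data; the zero salaries for "Not Placed" are an assumption
toy <- data.frame(
  Placement = c("Placed", "Not Placed", "Placed", "Placed", "Not Placed"),
  Salary    = c(300000, 0, 250000, 200000, 0)
)

# Median over everyone, including the zero salaries
median_all <- median(toy$Salary)

# Median over placed students only
placed_toy <- toy[toy$Placement == "Placed", ]
median_placed <- median(placed_toy$Salary)

print(c(all = median_all, placed = median_placed))
```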

6. Mean salary of males and females who were placed

aggregate(placed.df$Salary,by=list(Gend = placed.df$Gender),mean)
##   Gend        x
## 1    F 253068.0
## 2    M 284241.9
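
The same group means can also be computed with the formula interface of `aggregate()` or with `tapply()`. The sketch below uses hypothetical toy data, not the dataset; the formula version has the side benefit of labelling the result column `Salary` instead of `x`.

```r
# Hypothetical toy data standing in for placed.df
toy <- data.frame(
  Gender = c("F", "M", "F", "M"),
  Salary = c(200000, 300000, 260000, 280000)
)

# Formula interface: result columns are named Gender and Salary
by_gender <- aggregate(Salary ~ Gender, data = toy, FUN = mean)

# tapply returns a named vector, one element per group
means <- tapply(toy$Salary, toy$Gender, mean)

print(by_gender)
print(means)
```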

7. Histogram showing a breakup of the MBA performance of the students who were placed

# hist() belongs to base graphics, so the lattice package is not needed here
hist(placed.df$Percent_MBA,main="MBA Performance of placed students",xlab = "Percent_MBA",ylab="Count",xlim=c(50,80))

8. Create a dataframe called notplaced (stored here as notplaced.df) that contains only those students who were NOT placed after their MBA

notplaced.df <- dean.df[which(dean.df$Placement=="Not Placed"),]

9. Histograms side-by-side, visually comparing the MBA performance of Placed and Not Placed students

par(mfrow=c(1,2))
hist(placed.df$Percent_MBA,main="MBA Performance of\nplaced students",xlab = "Percent_MBA",ylab="Count",xlim=c(50,80),breaks=3)

hist(notplaced.df$Percent_MBA,main="MBA Performance of\nnot placed students",xlab = "Percent_MBA",ylab="Count",xlim=c(50,80),breaks=3)
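
When `breaks` is a single number, `hist()` treats it only as a suggestion, so the two panels are not guaranteed to use the same bins. A minimal sketch of one way to force identical bins is to pass the same vector of break points to both calls. The vectors below are hypothetical toy data, and `plot = FALSE` is used so that only the bin counts are computed.

```r
# Hypothetical toy data standing in for the two Percent_MBA columns
placed_pct    <- c(55.2, 61.0, 63.5, 66.1, 70.4, 72.8)
notplaced_pct <- c(52.3, 58.9, 60.1, 64.7)

# One explicit set of bin edges shared by both histograms
shared_breaks <- seq(50, 80, by = 10)

h_placed    <- hist(placed_pct,    breaks = shared_breaks, plot = FALSE)
h_notplaced <- hist(notplaced_pct, breaks = shared_breaks, plot = FALSE)

print(h_placed$counts)
print(h_notplaced$counts)
```

Passing a vector such as `shared_breaks` to both `hist()` calls in the side-by-side plot would make the bars directly comparable between the two groups.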

10. Two boxplots, one below the other, comparing the distribution of salaries of males and females who were placed

par(mfrow=c(1,1))
boxplot(Salary ~ Gender,data=placed.df,horizontal = TRUE,xlab = "Salary",ylab="Gender")

11. Create a dataframe called placedET (stored here as placedET.df), representing students who were placed after the MBA and who also took an MBA entrance test before admission into the MBA program.

placedET.df <- dean.df[which(dean.df$Placement=="Placed" & dean.df$Entrance_Test!="None"),]
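
Wrapping the condition in `which()` matters when the columns contain missing values: a logical index with `NA` produces an all-`NA` row, while `which()` silently drops it. Whether the dataset actually has missing values is not shown here; the sketch below uses hypothetical toy data (including the test names) to illustrate the difference.

```r
# Hypothetical toy data with one missing Placement value
toy <- data.frame(
  Placement     = c("Placed", NA, "Not Placed", "Placed"),
  Entrance_Test = c("CAT", "MAT", "None", "None")
)

# Combined condition: evaluates to TRUE, NA, FALSE, FALSE
keep <- toy$Placement == "Placed" & toy$Entrance_Test != "None"

with_which    <- toy[which(keep), ]  # only the row where keep is TRUE
without_which <- toy[keep, ]         # also returns an all-NA row for the NA

print(nrow(with_which))
print(nrow(without_which))
```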

12. Scatter plot matrix for 3 variables – {Salary, Percent_MBA, Percentile_ET} using the dataframe placedET

library(car)
# car >= 3.0 takes the diagonal panels as a list; car 2.x used diagonal="density"
scatterplotMatrix(~Salary+Percent_MBA+Percentile_ET,data=placedET.df,cex=0.5,diagonal=list(method="density"))
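
A numeric companion to the scatter plot matrix is the correlation matrix of the same three variables. This is a minimal sketch on hypothetical toy data, not results from the dataset; the column names are the ones used above.

```r
# Hypothetical toy data standing in for placedET.df
toy <- data.frame(
  Salary        = c(200000, 250000, 300000, 280000, 320000),
  Percent_MBA   = c(58.1, 61.4, 66.0, 63.2, 70.5),
  Percentile_ET = c(55.0, 70.0, 62.5, 80.0, 91.0)
)

# Pairwise Pearson correlations between the three variables
cor_mat <- cor(toy[, c("Salary", "Percent_MBA", "Percentile_ET")])
print(round(cor_mat, 2))
```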