Read the Dean’s Dilemma case study dataset into a data frame for further investigation.
dd <- read.csv("DeansDilemmaData.csv")
Next, take a closer look at the dataset by computing summary statistics for the key variables.
summary(dd$Percent_SSC)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 37.00 56.00 64.50 64.65 74.00 87.20
summary(dd$Percent_HSC)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 40.0 54.0 63.0 63.8 72.0 94.7
summary(dd$Percent_Degree)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 35.00 57.52 63.00 62.98 69.00 89.00
library(psych)  # provides describe()
describe(dd$Salary)
## vars n mean sd median trimmed mad min max range skew
## X1 1 391 219078.3 138311.6 240000 217011.5 88956 0 940000 940000 0.24
## kurtosis se
## X1 1.74 6994.72
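The se column reported by describe() is the standard error of the mean, sd / sqrt(n). As a quick sanity check, here is a minimal sketch that recomputes it from the n and sd values printed above (the variable names are illustrative, not part of the original analysis):

```r
# Values copied from the describe(dd$Salary) output above
n_obs     <- 391
salary_sd <- 138311.6

# Standard error of the mean: sd / sqrt(n)
salary_se <- salary_sd / sqrt(n_obs)
round(salary_se, 2)
```

The result should agree with the reported se of 6994.72 up to rounding of the printed sd.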
Let's use R to calculate the median salary of all the students in the data sample.
median(dd$Salary)
## [1] 240000
Use R to tabulate how many students were placed; the percentage placed, correct to 2 decimal places, follows from these counts (312 / 391 = 79.80%).
table(dd$Placement)
##
## Not Placed Placed
## 79 312
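table() returns counts, not percentages. A minimal sketch of the conversion, using the two counts printed above (the vector name placement_counts is illustrative); on the real data frame the same result should come from round(100 * prop.table(table(dd$Placement)), 2):

```r
# Counts copied from the table(dd$Placement) output above
placement_counts <- c("Not Placed" = 79, "Placed" = 312)

# Convert counts to percentages of the total, rounded to 2 decimal places
placement_pct <- round(100 * placement_counts / sum(placement_counts), 2)
placement_pct
```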
Let's use R to create a data frame called newdata that contains only those students who were successfully placed.
# Keep only placed students (Placement_B == 1) and the columns of interest
newdata <- subset(dd, Placement_B == 1, select = c(SlNo, Gender, Percent_SSC,
Percent_HSC, S.TEST.SCORE, Percent_MBA, Placement, Placement_B, Specialization_MBA, Salary))
head(newdata)
## SlNo Gender Percent_SSC Percent_HSC S.TEST.SCORE Percent_MBA Placement
## 1 1 M 62.00 88.00 55.0 58.80 Placed
## 2 2 M 76.33 75.33 86.5 66.28 Placed
## 3 3 M 72.00 78.00 0.0 52.91 Placed
## 4 4 M 60.00 63.00 75.0 57.80 Placed
## 5 5 M 61.00 55.00 66.0 59.43 Placed
## 6 6 M 55.00 64.00 0.0 56.81 Placed
## Placement_B Specialization_MBA Salary
## 1 1 Marketing & HR 270000
## 2 1 Marketing & Finance 200000
## 3 1 Marketing & Finance 240000
## 4 1 Marketing & Finance 250000
## 5 1 Marketing & HR 180000
## 6 1 Marketing & Finance 300000
Let's use R to find the median salary of students who were placed.
median(newdata$Salary)
## [1] 260000
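The median for placed students (260000) is higher than the median for all students (240000) because students who were not placed appear to be recorded with a Salary of 0, which pulls the overall median down. A minimal sketch of the effect, using a small made-up data frame (toy and its values are hypothetical, not taken from the case study):

```r
# Hypothetical toy data: three placed students, two not placed (Salary 0)
toy <- data.frame(
  Placement_B = c(1, 1, 1, 0, 0),
  Salary      = c(200000, 260000, 300000, 0, 0)
)

# Median over everyone, zeros included
median_all <- median(toy$Salary)

# Median over placed students only
median_placed <- median(toy$Salary[toy$Placement_B == 1])

c(all = median_all, placed = median_placed)
```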
Let's use R to create a table showing the mean salary of males and females who were placed.
aggregate(newdata$Salary,by= list(Gender=newdata$Gender),mean)
## Gender x
## 1 F 253068.0
## 2 M 284241.9
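The result column above is labelled x because the salary vector was passed without a name. One possible alternative is the formula interface of aggregate(), which keeps the column name. A minimal sketch on a made-up data frame (toy and its values are hypothetical):

```r
# Hypothetical toy data, not taken from the case study
toy <- data.frame(
  Gender = c("F", "F", "M", "M"),
  Salary = c(240000, 260000, 280000, 300000)
)

# Mean salary per gender; the output columns are named Gender and Salary
mean_by_gender <- aggregate(Salary ~ Gender, data = toy, FUN = mean)
mean_by_gender
```

Applied to the placed students, the equivalent call would be aggregate(Salary ~ Gender, data = newdata, FUN = mean).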
Let's use R to generate a histogram showing the breakdown of MBA performance among the students who were placed.
hist(newdata$Percent_MBA,xlim=c(50,80),ylim=c(0,150),xlab="MBA Percentage",ylab="Count",breaks=3,main="MBA Performance of placed students",col=c("lightblue"))
Let's create a data frame called notplaced that contains only those students who were NOT placed after their MBA.
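# Keep only students who were not placed (Placement_B == 0) and the columns of interest
notplaced <- subset(dd, Placement_B == 0, select = c(SlNo, Gender, Percent_SSC,
Percent_HSC, S.TEST.SCORE, Percent_MBA, Placement, Placement_B, Specialization_MBA, Salary))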
head(notplaced)
## SlNo Gender Percent_SSC Percent_HSC S.TEST.SCORE Percent_MBA Placement
## 11 11 F 79.60 87.0 98.69 69.78 Not Placed
## 16 16 F 49.00 52.2 74.28 53.29 Not Placed
## 20 20 M 66.00 46.0 0.00 54.65 Not Placed
## 40 40 F 60.00 75.0 60.00 67.28 Not Placed
## 42 42 F 40.00 40.0 49.00 51.75 Not Placed
## 43 43 M 77.12 85.0 35.00 56.34 Not Placed
## Placement_B Specialization_MBA Salary
## 11 0 Marketing & HR 0
## 16 0 Marketing & Finance 0
## 20 0 Marketing & Finance 0
## 40 0 Marketing & Finance 0
## 42 0 Marketing & Finance 0
## 43 0 Marketing & IB 0
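The placed and not-placed subsets can also be produced in a single step. A minimal sketch, assuming a small made-up data frame (toy and its values are hypothetical), that uses base R's split() on the Placement column:

```r
# Hypothetical toy data, not taken from the case study
toy <- data.frame(
  SlNo      = 1:5,
  Placement = c("Placed", "Not Placed", "Placed", "Placed", "Not Placed"),
  Salary    = c(270000, 0, 200000, 240000, 0)
)

# split() returns a named list with one data frame per Placement value
groups <- split(toy, toy$Placement)

nrow(groups[["Placed"]])      # number of placed students
nrow(groups[["Not Placed"]])  # number of students not placed
```

Unlike the subset() calls above, this keeps every column; a select step would still be needed to narrow them down.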
Draw two histograms side by side to visually compare the MBA performance of placed and not-placed students:
par(mfrow=c(1,2))  # one row, two plots side by side
# Left panel: placed students
hist(newdata$Percent_MBA,xlim=c(50,80),ylim=c(0,150),xlab="MBA Percentage",
ylab="Count",
breaks=3,
main="MBA Performance of placed students",
col=c("lightblue"))
# Right panel: students who were not placed
hist(notplaced$Percent_MBA,xlim=c(50,80),ylim=c(0,150),xlab="MBA Percentage",
ylab="Count",
breaks=3,
main="MBA Performance of not placed students",
col=c("lightblue"))
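When breaks is a single number, hist() treats it only as a suggestion, so the two panels are not guaranteed to share bin edges. A minimal sketch of one way to force identical bins by passing an explicit vector of break points (the two score vectors and the names shared_breaks, h_placed and h_notplaced are hypothetical):

```r
# Hypothetical MBA percentages, not taken from the case study
placed_mba    <- c(55.2, 58.8, 61.4, 63.0, 66.3, 70.1, 74.5)
notplaced_mba <- c(51.8, 53.3, 54.7, 56.3, 67.3, 69.8)

# Explicit bin edges: (50,60], (60,70], (70,80]
shared_breaks <- seq(50, 80, by = 10)

# Compute the histograms without drawing them yet
h_placed    <- hist(placed_mba, breaks = shared_breaks, plot = FALSE)
h_notplaced <- hist(notplaced_mba, breaks = shared_breaks, plot = FALSE)

# Draw both with the same bins and the same y-axis range
y_max <- max(h_placed$counts, h_notplaced$counts)
op <- par(mfrow = c(1, 2))
plot(h_placed, main = "Placed", xlab = "MBA Percentage", ylab = "Count",
     ylim = c(0, y_max), col = "lightblue")
plot(h_notplaced, main = "Not placed", xlab = "MBA Percentage", ylab = "Count",
     ylim = c(0, y_max), col = "lightblue")
par(op)  # restore the previous plotting layout
```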
Let's use R to draw two boxplots, one below the other, comparing the salary distributions of males and females who were placed:
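# Horizontal boxplots; the default y axis is suppressed so custom labels can be added
boxplot(Salary ~ Gender, data=newdata, horizontal=TRUE, yaxt="n",
xlab="Salary", ylab="Gender",
main="Comparison of Salaries of Males and Females")
# Gender levels sort alphabetically: position 1 is F, position 2 is M
axis(side=2, at=c(1,2), labels=c("Females","Males"))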
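A boxplot is drawn from a five-number summary of each group, so the same comparison can be read off numerically. A minimal sketch using fivenum() on a made-up data frame (toy and its values are hypothetical):

```r
# Hypothetical toy data, not taken from the case study
toy <- data.frame(
  Gender = c("F", "F", "F", "M", "M", "M"),
  Salary = c(200000, 240000, 300000, 220000, 280000, 400000)
)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five_by_gender <- lapply(split(toy$Salary, toy$Gender), fivenum)
five_by_gender
```

On the placed students, the equivalent call would be lapply(split(newdata$Salary, newdata$Gender), fivenum).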
Let's create a data frame called placedET, representing students who were placed after the MBA and who also took an MBA entrance test before admission to the MBA program.
# Placed students (Placement_B == 1) who also took an entrance test (S.TEST == 1)
placedET <- subset(dd, Placement_B == 1 & S.TEST == 1, select = c(SlNo, Gender, Percent_SSC,
Percent_HSC, S.TEST.SCORE, Percent_MBA, Percentile_ET, Placement, Placement_B, Specialization_MBA, Salary))
head(placedET)
## SlNo Gender Percent_SSC Percent_HSC S.TEST.SCORE Percent_MBA
## 1 1 M 62.00 88.00 55.00 58.80
## 2 2 M 76.33 75.33 86.50 66.28
## 4 4 M 60.00 63.00 75.00 57.80
## 5 5 M 61.00 55.00 66.00 59.43
## 8 8 M 68.00 77.00 43.12 57.23
## 9 9 M 82.80 70.60 96.80 55.50
## Percentile_ET Placement Placement_B Specialization_MBA Salary
## 1 55.00 Placed 1 Marketing & HR 270000
## 2 86.50 Placed 1 Marketing & Finance 200000
## 4 75.00 Placed 1 Marketing & Finance 250000
## 5 66.00 Placed 1 Marketing & HR 180000
## 8 43.12 Placed 1 Marketing & Finance 235000
## 9 96.80 Placed 1 Marketing & Finance 425000
Draw a scatter plot matrix of three variables, {Salary, Percent_MBA, Percentile_ET}, using the data frame placedET.
library(car)  # provides scatterplotMatrix()
# scatterplot.matrix() is deprecated; scatterplotMatrix() is its replacement.
# Recent car versions expect diagonal as a list; older versions took diagonal="density".
scatterplotMatrix(formula= ~Salary + Percent_MBA + Percentile_ET,cex=0.6,
data=placedET,diagonal=list(method="density"))
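A correlation matrix is a natural numeric companion to a scatter plot matrix, since it puts a number on each pairwise relationship. A minimal sketch using cor() on a made-up data frame (toy and its values are hypothetical, so the coefficients say nothing about the real data):

```r
# Hypothetical toy data, not taken from the case study
toy <- data.frame(
  Salary        = c(200000, 250000, 270000, 300000, 425000),
  Percent_MBA   = c(57.8, 58.8, 59.4, 66.3, 55.5),
  Percentile_ET = c(55.0, 66.0, 75.0, 86.5, 96.8)
)

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(toy), 2)
cor_mat
```

On the real data, the equivalent call would be cor(placedET[, c("Salary", "Percent_MBA", "Percentile_ET")]).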