Reading the Dean’s Dilemma case study dataset into a data frame for further investigation and insights.

dd <- read.csv("DeansDilemmaData.csv")  # a plain path is enough; paste() is not needed
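
read.csv() accepts a file path directly. The sketch below is self-contained: instead of a file, it passes a small CSV string through the `text =` argument of read.csv(). The column names follow this dataset, but the three rows are hypothetical and exist only to show the structure. str() is a quick way to check how each column was read.

```r
# Hypothetical miniature of the CSV layout: real column names, invented values
csv_text <- "SlNo,Gender,Percent_MBA,Placement,Placement_B,Salary
1,M,60.00,Placed,1,250000
2,F,55.00,Not Placed,0,0
3,M,65.00,Placed,1,300000"

# `text =` parses a string instead of reading a file from disk
toy <- read.csv(text = csv_text)

str(toy)  # check the type R assigned to each column
```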

Next, let's look more closely at the dataset by computing summary statistics for the key variables.

summary(dd$Percent_SSC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   37.00   56.00   64.50   64.65   74.00   87.20
summary(dd$Percent_HSC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    40.0    54.0    63.0    63.8    72.0    94.7
summary(dd$Percent_Degree)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35.00   57.52   63.00   62.98   69.00   89.00
library(psych)  # describe() is provided by the psych package
describe(dd$Salary)
##    vars   n     mean       sd median  trimmed   mad min    max  range skew
## X1    1 391 219078.3 138311.6 240000 217011.5 88956   0 940000 940000 0.24
##    kurtosis      se
## X1     1.74 6994.72
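
describe() from psych bundles several standard statistics. As a rough sketch, the hypothetical helper describe_base() below recomputes most of them in base R on a made-up salary vector (skew and kurtosis are left out). The relationships can be checked against the output above: se is sd/√n, and 138311.6/√391 ≈ 6994.7; mad is the median absolute deviation scaled by 1.4826, and 88956 = 1.4826 × 60000.

```r
# Hypothetical toy salaries (not from the dataset), used only to show the arithmetic
salary <- c(0, 180000, 200000, 240000, 250000, 270000, 300000)

# Hypothetical helper: base-R versions of several psych::describe() columns
describe_base <- function(x) {
  x <- x[!is.na(x)]                    # describe() drops NAs by default
  n <- length(x)
  c(n       = n,
    mean    = mean(x),
    sd      = sd(x),                   # sample SD (n - 1 denominator)
    median  = median(x),
    trimmed = mean(x, trim = 0.1),     # assumes psych's default trim of 0.1
    mad     = mad(x),                  # median absolute deviation x 1.4826
    min     = min(x),
    max     = max(x),
    range   = max(x) - min(x),
    se      = sd(x) / sqrt(n))         # standard error of the mean
}

s <- describe_base(salary)
round(s, 2)
```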

Let's use R to calculate the median salary of all the students in the data sample.

median(dd$Salary)
## [1] 240000
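
Students who were not placed appear to be recorded with a Salary of 0 (the notplaced rows listed further down all show 0), so this median mixes those zeros with actual offers. The sketch below uses hypothetical numbers to show how zeros pull the median down; treating a zero salary as "not placed" is an assumption made only for this illustration.

```r
# Hypothetical salaries: two unplaced students recorded as 0
salary <- c(0, 0, 150000, 200000, 250000, 300000, 350000)
placed <- salary > 0   # assumption: a zero salary means "not placed"

median(salary)          # all students, zeros included: 200000
median(salary[placed])  # placed students only: 250000
```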

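Let's use R to calculate the percentage of students who were placed, correct to 2 decimal places. Tabulating Placement gives the counts: 312 of the 391 students were placed, i.e. 312/391 × 100 ≈ 79.80%.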

table(dd$Placement)
## 
## Not Placed     Placed 
##         79        312
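
table() returns counts and prop.table() turns them into proportions. A minimal sketch on hypothetical outcomes follows. Printing a rounded number drops trailing zeros (79.80 would print as 79.8), so sprintf() is used when exactly two decimals must be shown.

```r
# Hypothetical placement outcomes for eight students
placement <- c("Placed", "Placed", "Not Placed", "Placed",
               "Placed", "Not Placed", "Placed", "Placed")

counts <- table(placement)                    # counts per category
pct    <- round(100 * prop.table(counts), 2)  # percentages, 2 decimal places
pct

sprintf("%.2f%%", pct)  # always shows two decimals: "25.00%" "75.00%"
```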

Let's use R to create a data frame containing only those students who were successfully placed. The task calls it placed; in the code below it is stored as newdata.

newdata <- subset(dd, Placement_B == 1,
                  select = c(SlNo, Gender, Percent_SSC, Percent_HSC, S.TEST.SCORE,
                             Percent_MBA, Placement, Placement_B, Specialization_MBA, Salary))
head(newdata)
##   SlNo Gender Percent_SSC Percent_HSC S.TEST.SCORE Percent_MBA Placement
## 1    1      M       62.00       88.00         55.0       58.80    Placed
## 2    2      M       76.33       75.33         86.5       66.28    Placed
## 3    3      M       72.00       78.00          0.0       52.91    Placed
## 4    4      M       60.00       63.00         75.0       57.80    Placed
## 5    5      M       61.00       55.00         66.0       59.43    Placed
## 6    6      M       55.00       64.00          0.0       56.81    Placed
##   Placement_B  Specialization_MBA Salary
## 1           1      Marketing & HR 270000
## 2           1 Marketing & Finance 200000
## 3           1 Marketing & Finance 240000
## 4           1 Marketing & Finance 250000
## 5           1      Marketing & HR 180000
## 6           1 Marketing & Finance 300000
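
subset() combines a row condition with a column selection. The same rows and columns can be taken with ordinary bracket indexing, as this sketch shows on a hypothetical toy data frame (dd_toy). One difference worth knowing: subset() silently drops rows where the condition is NA, whereas bracket indexing returns them as rows of NAs.

```r
# Hypothetical toy version of dd
dd_toy <- data.frame(SlNo        = 1:5,
                     Gender      = c("M", "F", "M", "F", "M"),
                     Placement_B = c(1, 0, 1, 1, 0),
                     Salary      = c(270000, 0, 240000, 300000, 0))

# subset(): row condition and column selection in one call
a <- subset(dd_toy, Placement_B == 1, select = c(SlNo, Gender, Salary))

# Base indexing: logical row vector, character vector of column names
b <- dd_toy[dd_toy$Placement_B == 1, c("SlNo", "Gender", "Salary")]

all.equal(a, b)  # TRUE: same rows, columns and row names
```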

Let's use R to find the median salary of the students who were placed.

median(newdata$Salary)
## [1] 260000
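
To get the median salary of every placement group in one call, without building a subset first, tapply() can split Salary by Placement and apply median() to each piece. The data below are hypothetical.

```r
# Hypothetical toy data
dd_toy <- data.frame(Placement = c("Placed", "Not Placed", "Placed", "Placed", "Not Placed"),
                     Salary    = c(270000, 0, 240000, 300000, 0))

# One median per placement group
m <- tapply(dd_toy$Salary, dd_toy$Placement, median)
m
```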

Let's use R to create a table showing the mean salary of the males and females who were placed.

aggregate(newdata$Salary, by = list(Gender = newdata$Gender), FUN = mean)
##   Gender        x
## 1      F 253068.0
## 2      M 284241.9
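
aggregate() also has a formula interface, which names the result column after the summarised variable (Salary) instead of x. A sketch on hypothetical placed students:

```r
# Hypothetical placed students
placed_toy <- data.frame(Gender = c("M", "F", "M", "F", "M"),
                         Salary = c(300000, 250000, 240000, 270000, 270000))

# Formula form: summarise Salary within each level of Gender
res <- aggregate(Salary ~ Gender, data = placed_toy, FUN = mean)
res  # F: 260000, M: 270000
```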

Let's use R to draw a histogram showing the distribution of MBA percentages among the students who were placed.

hist(newdata$Percent_MBA, xlim = c(50, 80), ylim = c(0, 150), breaks = 3,
     xlab = "MBA Percentage", ylab = "Count", main = "MBA Performance of placed students", col = "lightblue")
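
When breaks is a single number, hist() treats it only as a suggestion and picks "pretty" cut points, so breaks = 3 need not produce exactly three bars. Calling hist() with plot = FALSE returns the chosen breaks and counts without drawing, which makes this easy to inspect; passing a vector of cut points fixes the bins exactly. The MBA percentages below are hypothetical.

```r
# Hypothetical MBA percentages
mba <- c(52, 55, 58, 61, 64, 67, 70, 73)

# A single number is only a suggestion; inspect what R actually chose
h1 <- hist(mba, breaks = 3, plot = FALSE)
h1$breaks
h1$counts

# An explicit vector of cut points gives exact control (right-closed bins)
h2 <- hist(mba, breaks = seq(50, 80, by = 5), plot = FALSE)
h2$counts  # 2 1 2 2 1 0
```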

Let's create a data frame called notplaced containing only the students who were not placed after their MBA.

notplaced <- subset(dd, Placement_B == 0,
                    select = c(SlNo, Gender, Percent_SSC, Percent_HSC, S.TEST.SCORE,
                               Percent_MBA, Placement, Placement_B, Specialization_MBA, Salary))
head(notplaced)
##    SlNo Gender Percent_SSC Percent_HSC S.TEST.SCORE Percent_MBA  Placement
## 11   11      F       79.60        87.0        98.69       69.78 Not Placed
## 16   16      F       49.00        52.2        74.28       53.29 Not Placed
## 20   20      M       66.00        46.0         0.00       54.65 Not Placed
## 40   40      F       60.00        75.0        60.00       67.28 Not Placed
## 42   42      F       40.00        40.0        49.00       51.75 Not Placed
## 43   43      M       77.12        85.0        35.00       56.34 Not Placed
##    Placement_B  Specialization_MBA Salary
## 11           0      Marketing & HR      0
## 16           0 Marketing & Finance      0
## 20           0 Marketing & Finance      0
## 40           0 Marketing & Finance      0
## 42           0 Marketing & Finance      0
## 43           0      Marketing & IB      0
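
The row labels 11, 16, 20 and so on in the output above are not new numbers: subset() keeps the original row names of dd. The sketch below, on a hypothetical toy data frame, shows this and how to renumber the rows if the old labels are not wanted.

```r
# Hypothetical toy data frame
dd_toy <- data.frame(SlNo = 1:6, Placement_B = c(1, 1, 0, 1, 0, 1))

notplaced_toy <- subset(dd_toy, Placement_B == 0)
kept <- rownames(notplaced_toy)
kept  # "3" "5": the original row positions are kept

rownames(notplaced_toy) <- NULL  # renumber rows 1..n
rownames(notplaced_toy)          # "1" "2"
```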

Let's draw two histograms side by side to compare visually the MBA performance of Placed and Not Placed students.

old_par <- par(mfrow = c(1, 2))  # two plots side by side; keep the old settings
hist(newdata$Percent_MBA, xlim = c(50, 80), ylim = c(0, 150),
     xlab = "MBA Percentage",
     ylab = "Count",
     breaks = 3,
     main = "MBA Performance of placed students",
     col = "lightblue")
hist(notplaced$Percent_MBA, xlim = c(50, 80), ylim = c(0, 150),
     xlab = "MBA Percentage",
     ylab = "Count",
     breaks = 3,
     main = "MBA Performance of not placed students",
     col = "lightblue")
par(old_par)                     # restore the single-plot layout
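
Because par() returns the previous settings, a plotting helper can restore the layout with on.exit() even if a plot call fails. The hypothetical compare_hist() below also gives both histograms the same explicit breaks so the bins line up between the two groups; that is a design choice of this sketch, not part of the original code. All values are made up and lie inside the break range, since hist() stops with an error when a value falls outside explicit breaks.

```r
# Hypothetical helper: two histograms side by side with shared bins,
# restoring the caller's graphics settings afterwards
compare_hist <- function(a, b, titles, breaks = seq(50, 80, by = 5), ...) {
  old <- par(mfrow = c(1, 2))  # par() returns the previous values
  on.exit(par(old))            # restore them even if a plot call fails
  hist(a, breaks = breaks, main = titles[1], ...)
  hist(b, breaks = breaks, main = titles[2], ...)
  invisible(NULL)
}

placed_mba    <- c(55, 58, 61, 64, 66, 70)  # hypothetical values
notplaced_mba <- c(52, 54, 57, 60, 63)      # hypothetical values
compare_hist(placed_mba, notplaced_mba,
             titles = c("Placed", "Not Placed"),
             xlab = "MBA Percentage", col = "lightblue")
par("mfrow")  # back to 1 1
```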

Let's use R to draw two horizontal boxplots, one below the other, comparing the salary distributions of the males and females who were placed.

boxplot(Salary ~ Gender, data = newdata, horizontal = TRUE, yaxt = "n",
        xlab = "Salary", ylab = "Gender",
        main = "Comparison of Salaries of Males and Females")
# Relabel the group axis; groups are ordered F, M, so F sits at position 1
axis(side = 2, at = c(1, 2), labels = c("Females", "Males"))
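
boxplot() with plot = FALSE returns the statistics it would draw. This sketch uses hypothetical salaries; it also confirms that the groups are ordered F, M, which is why the axis() call labels position 1 as Females. In the male group, 500000 lies more than 1.5 times the hinge spread above the upper hinge, so it is reported as an outlier instead of extending the whisker.

```r
# Hypothetical placed students
placed_toy <- data.frame(
  Gender = rep(c("F", "M"), each = 5),
  Salary = c(200000, 220000, 250000, 260000, 300000,
             210000, 240000, 270000, 300000, 500000))

b <- boxplot(Salary ~ Gender, data = placed_toy, plot = FALSE)
b$names  # group order: "F" "M"
b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # points drawn individually as outliers
```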

Let's create a data frame called placedET containing the students who were placed after the MBA and who also took an MBA entrance test before admission to the program.

placedET <- subset(dd, Placement_B == 1 & S.TEST == 1,
                   select = c(SlNo, Gender, Percent_SSC, Percent_HSC, S.TEST.SCORE, Percent_MBA,
                              Percentile_ET, Placement, Placement_B, Specialization_MBA, Salary))
head(placedET)
##   SlNo Gender Percent_SSC Percent_HSC S.TEST.SCORE Percent_MBA
## 1    1      M       62.00       88.00        55.00       58.80
## 2    2      M       76.33       75.33        86.50       66.28
## 4    4      M       60.00       63.00        75.00       57.80
## 5    5      M       61.00       55.00        66.00       59.43
## 8    8      M       68.00       77.00        43.12       57.23
## 9    9      M       82.80       70.60        96.80       55.50
##   Percentile_ET Placement Placement_B  Specialization_MBA Salary
## 1         55.00    Placed           1      Marketing & HR 270000
## 2         86.50    Placed           1 Marketing & Finance 200000
## 4         75.00    Placed           1 Marketing & Finance 250000
## 5         66.00    Placed           1      Marketing & HR 180000
## 8         43.12    Placed           1 Marketing & Finance 235000
## 9         96.80    Placed           1 Marketing & Finance 425000
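
The two conditions are joined with the vectorised `&`, which gives one TRUE/FALSE per row. The scalar `&&` is not a substitute here: it is meant for single values, and since R 4.3.0 it raises an error when given longer vectors. A sketch on a hypothetical toy data frame:

```r
# Hypothetical toy data frame
dd_toy <- data.frame(SlNo        = 1:5,
                     Placement_B = c(1, 1, 0, 1, 0),
                     S.TEST      = c(1, 0, 1, 1, 1))

# `&` combines the conditions element by element, one result per row
both <- subset(dd_toy, Placement_B == 1 & S.TEST == 1)
both$SlNo  # 1 4
```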

Let's draw a scatterplot matrix of three variables, Salary, Percent_MBA and Percentile_ET, from the placedET data frame.

library(car)  # scatterplotMatrix() is provided by the car package
# scatterplot.matrix() is deprecated; scatterplotMatrix() replaces it.
# car >= 3.0 takes diagonal as a list; older car 2.x accepted diagonal = "density".
scatterplotMatrix(~ Salary + Percent_MBA + Percentile_ET, data = placedET,
                  diagonal = list(method = "density"), cex = 0.6)
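
If the car package is unavailable, base R's pairs() draws a plain scatterplot matrix (without the density diagonal or fitted lines of scatterplotMatrix()), and cor() summarises the same pairwise relationships as numbers. The data below are simulated purely for illustration and say nothing about the real placedET.

```r
set.seed(1)
# Hypothetical stand-in for placedET: 30 simulated students
toy <- data.frame(Percent_MBA = runif(30, 50, 75))
toy$Percentile_ET <- runif(30, 40, 99)
toy$Salary <- round(150000 + 4000 * (toy$Percent_MBA - 50) + rnorm(30, 0, 30000))

# Base-R scatterplot matrix via the formula interface of pairs()
pairs(~ Salary + Percent_MBA + Percentile_ET, data = toy, cex = 0.6)

# Pearson correlation matrix of the same three variables
r <- cor(toy[, c("Salary", "Percent_MBA", "Percentile_ET")])
round(r, 2)
```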