The Dean’s Dilemma Case Study

Loading the dataset

admi.df <- read.csv("Data - Deans Dilemma.csv")
View(admi.df)

Summarize the dataset

summary(admi.df)

##       SlNo       Gender     Gender.B       Percent_SSC     Board_SSC  
##  Min.   :  1.0   F:127   Min.   :0.0000   Min.   :37.00   CBSE  :113  
##  1st Qu.: 98.5   M:264   1st Qu.:0.0000   1st Qu.:56.00   ICSE  : 77  
##  Median :196.0           Median :0.0000   Median :64.50   Others:201  
##  Mean   :196.0           Mean   :0.3248   Mean   :64.65               
##  3rd Qu.:293.5           3rd Qu.:1.0000   3rd Qu.:74.00               
##  Max.   :391.0           Max.   :1.0000   Max.   :87.20               
##                                                                       
##    Board_CBSE      Board_ICSE      Percent_HSC    Board_HSC  
##  Min.   :0.000   Min.   :0.0000   Min.   :40.0   CBSE  : 96  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:54.0   ISC   : 48  
##  Median :0.000   Median :0.0000   Median :63.0   Others:247  
##  Mean   :0.289   Mean   :0.1969   Mean   :63.8               
##  3rd Qu.:1.000   3rd Qu.:0.0000   3rd Qu.:72.0               
##  Max.   :1.000   Max.   :1.0000   Max.   :94.7               
##                                                              
##     Stream_HSC  Percent_Degree                Course_Degree
##  Arts    : 18   Min.   :35.00   Arts                 : 13  
##  Commerce:222   1st Qu.:57.52   Commerce             :117  
##  Science :151   Median :63.00   Computer Applications: 32  
##                 Mean   :62.98   Engineering          : 37  
##                 3rd Qu.:69.00   Management           :163  
##                 Max.   :89.00   Others               :  5  
##                                 Science              : 24  
##   Degree_Engg      Experience_Yrs   Entrance_Test     S.TEST      
##  Min.   :0.00000   Min.   :0.0000   MAT    :265   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   None   : 67   1st Qu.:1.0000  
##  Median :0.00000   Median :0.0000   K-MAT  : 24   Median :1.0000  
##  Mean   :0.09463   Mean   :0.4783   CAT    : 22   Mean   :0.8286  
##  3rd Qu.:0.00000   3rd Qu.:1.0000   PGCET  :  8   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :3.0000   GCET   :  2   Max.   :1.0000  
##                                     (Other):  3                   
##  Percentile_ET    S.TEST.SCORE    Percent_MBA   
##  Min.   : 0.00   Min.   : 0.00   Min.   :50.83  
##  1st Qu.:41.19   1st Qu.:41.19   1st Qu.:57.20  
##  Median :62.00   Median :62.00   Median :61.01  
##  Mean   :54.93   Mean   :54.93   Mean   :61.67  
##  3rd Qu.:78.00   3rd Qu.:78.00   3rd Qu.:66.02  
##  Max.   :98.69   Max.   :98.69   Max.   :77.89  
##                                                 
##            Specialization_MBA Marks_Communication Marks_Projectwork
##  Marketing & Finance:222      Min.   :50.00       Min.   :50.00    
##  Marketing & HR     :156      1st Qu.:53.00       1st Qu.:64.00    
##  Marketing & IB     : 13      Median :58.00       Median :69.00    
##                               Mean   :60.54       Mean   :68.36    
##                               3rd Qu.:67.00       3rd Qu.:74.00    
##                               Max.   :88.00       Max.   :87.00    
##                                                                    
##    Marks_BOCA         Placement    Placement_B        Salary      
##  Min.   :50.00   Not Placed: 79   Min.   :0.000   Min.   :     0  
##  1st Qu.:57.00   Placed    :312   1st Qu.:1.000   1st Qu.:172800  
##  Median :63.00                    Median :1.000   Median :240000  
##  Mean   :64.38                    Mean   :0.798   Mean   :219078  
##  3rd Qu.:72.50                    3rd Qu.:1.000   3rd Qu.:300000  
##  Max.   :96.00                    Max.   :1.000   Max.   :940000  
##

Median salary of all the students in the data sample

med <- median(admi.df$Salary)
med

## [1] 240000

Median salary of all the students in the data sample = 240000

Percentage of students who were Placed

perc <- (sum(admi.df$Placement_B)/nrow(admi.df))*100
round(perc, digits = 2)

## [1] 79.8

Percentage of students who were Placed = 79.80%
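As a quick cross-check, the same percentage can be recovered from the counts that summary() printed above (312 placed, 79 not placed). The sketch below also shows that the mean of a 0/1 indicator is the proportion of ones. The placement_b vector is a hypothetical stand-in rebuilt from those counts, not the actual file.

```r
# Counts taken from the summary() output above
n_placed <- 312
n_not_placed <- 79

# Hypothetical stand-in for admi.df$Placement_B, rebuilt from the counts
placement_b <- c(rep(1, n_placed), rep(0, n_not_placed))

# The mean of a 0/1 indicator is the proportion of ones
perc_check <- round(mean(placement_b) * 100, digits = 2)
perc_check
```

On the real data, round(mean(admi.df$Placement_B) * 100, 2) should give the same figure, assuming Placement_B contains no missing values.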

To create a new dataframe called Placed, containing only those students who were successfully placed

Placed <- admi.df[which(admi.df$Placement_B==1),]

Median salary of students who were placed

median(Placed$Salary)

## [1] 260000

Median salary of students who were placed = 260000
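The median rises from 240000 to 260000 once unplaced students are dropped. A likely reason, given that the minimum Salary in the summary is 0, is that unplaced students are recorded with a salary of 0 and pull the overall median down. The toy vectors below are hypothetical and only illustrate the effect.

```r
# Hypothetical salaries; zeros stand for students who were not placed
salary <- c(0, 0, 200000, 250000, 300000)
placed_flag <- c(0, 0, 1, 1, 1)

# Median over everyone, including the zero salaries
median_all <- median(salary)

# Median over placed students only
median_placed <- median(salary[placed_flag == 1])

c(all = median_all, placed = median_placed)
```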

To create a table showing the mean salary of males and females who were placed

aggregate(Placed$Salary, by=list(sex= Placed$Gender), mean)

##   sex        x
## 1   F 253068.0
## 2   M 284241.9
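The same table can also be written with the formula interface of aggregate(), which keeps the original column names in the result instead of the generic x. The sketch below uses a small hypothetical data frame (toy) with the same column names as Placed, so it runs without the CSV file.

```r
# Hypothetical toy data with the same column names as Placed
toy <- data.frame(
  Gender = c("F", "F", "M", "M", "M"),
  Salary = c(200000, 300000, 250000, 350000, 300000)
)

# Formula interface: mean Salary for each level of Gender
mean_by_gender <- aggregate(Salary ~ Gender, data = toy, FUN = mean)
mean_by_gender

# tapply() returns the same means as a named vector
tapply(toy$Salary, toy$Gender, mean)
```

On the real data the equivalent call would be aggregate(Salary ~ Gender, data = Placed, FUN = mean).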

To generate a histogram showing a breakdown of the MBA performance of the students who were placed

hist(Placed$Percent_MBA,
     main = "MBA Performance of Placed Students", 
     xlab = "MBA Percentage",
     breaks = 3,
     ylab = "Count",
     col = "darkgrey")

To create a dataframe called notplaced, that contains a subset of only those students who were NOT placed after their MBA

notplaced <- admi.df[which(admi.df$Placement_B==0),]

To create histograms comparing the MBA performance of Placed and Not Placed students

par(mfrow =c(1,2))
hist(Placed$Percent_MBA,
     main = "MBA Performance of Placed Students", 
     xlab = "MBA Percentage",
     breaks = 3,
     ylab = "Count",
     col = "darkgrey")
hist(notplaced$Percent_MBA,
     main = "MBA Performance of not Placed Students", 
     xlab = "MBA Percentage",
     breaks = 3,
     ylab = "Count",
     col = "darkgrey")

par(mfrow =c(1,1))
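A single number passed to breaks is only a suggestion to hist(), so the two panels may end up with different bins. One way to make them directly comparable is to pass the same vector of bin edges to both calls. The sketch below uses simulated values as a hypothetical stand-in for Percent_MBA. The edges from 50 to 80 are an assumption chosen to cover the range reported in the summary (50.83 to 77.89).

```r
# Hypothetical MBA percentages; the real values come from Percent_MBA
set.seed(1)
placed_mba <- runif(50, min = 50, max = 78)
notplaced_mba <- runif(20, min = 50, max = 78)

# One set of bin edges shared by both groups (50 to 80 in steps of 5)
common_breaks <- seq(50, 80, by = 5)

# Draw the two histograms side by side, then restore the old layout
old_par <- par(mfrow = c(1, 2))
h1 <- hist(placed_mba, breaks = common_breaks,
           main = "Placed", xlab = "MBA Percentage",
           ylab = "Count", col = "darkgrey")
h2 <- hist(notplaced_mba, breaks = common_breaks,
           main = "Not Placed", xlab = "MBA Percentage",
           ylab = "Count", col = "darkgrey")
par(old_par)
```

Because the groups differ in size, it may also help to compare proportions rather than raw counts, for example by adding freq = FALSE to both calls.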

To draw boxplots comparing the distribution of salaries of males and females who were placed

boxplot(Placed$Salary~Placed$Gender.B, horizontal = TRUE,
        yaxt="n", xlab= "Salary", ylab="Gender", las = 1)
axis(side = 2, at=c(1,2), labels=c("Males","Females"))
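The manual axis labels depend on how Gender.B is coded, so it is worth confirming the coding before trusting the plot. The summary above reports 127 females, 264 males, and a Gender.B mean of 0.3248. The short check below, based only on those printed counts, suggests that Gender.B equals 1 for females. Group 0 (males) is therefore drawn at position 1 and group 1 (females) at position 2.

```r
# Counts taken from the summary() output above
n_female <- 127
n_male <- 264
n_total <- n_female + n_male

# Share of each gender; the one that matches mean(Gender.B) is coded as 1
share_female <- round(n_female / n_total, 4)
share_male <- round(n_male / n_total, 4)

c(female = share_female, male = share_male)
```

On the real data, table(Placed$Gender, Placed$Gender.B) would confirm the coding directly, and boxplot(Salary ~ Gender, data = Placed, horizontal = TRUE) would avoid manual labels altogether.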

To create a dataframe called PlacedET, representing students who were placed after the MBA and who also took an MBA entrance test before admission into the MBA program

PlacedET <- admi.df[which(admi.df$Placement_B ==1 & admi.df$S.TEST == 1),]
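The same filter can be expressed with subset(), which lets the condition refer to columns by name. This is a minimal sketch on a hypothetical toy data frame that borrows the column names Placement_B, S.TEST and Salary. The values are made up for illustration.

```r
# Hypothetical toy data with the same column names as admi.df
toy <- data.frame(
  Placement_B = c(1, 1, 0, 1, 0),
  S.TEST      = c(1, 0, 1, 1, 0),
  Salary      = c(250000, 200000, 0, 300000, 0)
)

# Keep rows where the student was placed AND took an entrance test
placed_et_toy <- subset(toy, Placement_B == 1 & S.TEST == 1)
placed_et_toy
```

Here only the first and fourth rows satisfy both conditions.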

To draw a Scatter Plot Matrix for 3 variables: {Salary, Percent_MBA, Percentile_ET} using the dataframe PlacedET

library("car")

## Warning: package 'car' was built under R version 3.3.3

scatterplotMatrix(~ Salary + Percent_MBA + Percentile_ET, data = PlacedET)
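A correlation matrix is a natural numeric companion to the scatter plot matrix, since it summarises each pairwise panel as a single coefficient. The sketch below uses base R's cor() on simulated, hypothetical data with the same three column names. The numbers it prints say nothing about the real dataset.

```r
# Hypothetical data with the same column names as PlacedET
set.seed(42)
toy <- data.frame(
  Salary        = runif(30, min = 150000, max = 400000),
  Percent_MBA   = runif(30, min = 50, max = 78),
  Percentile_ET = runif(30, min = 0, max = 99)
)

# Pairwise Pearson correlations, rounded for readability
cor_mat <- round(cor(toy), 2)
cor_mat
```

On the real data the call would be cor(PlacedET[, c("Salary", "Percent_MBA", "Percentile_ET")]), assuming none of the three columns contains missing values.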


Anshul Nograiya

3 February 2018
