Dean’s Dilemma for Selection of Students into the MBA Program

Because MBA programs carry prestige through the exposure and placements they offer, the deans of almost all B-schools face the dilemma of setting qualitative and quantitative criteria for admitting students. A great deal of information about each applicant's academic record, co-curricular performance, and work experience is gathered to support the selection. The data analysed here is a CSV file with fields such as gender, percentage scored in the 10th and 12th board examinations, undergraduate performance, work experience, MBA entrance test taken, performance in the MBA, and salary after the MBA. The analysis is given below.

  1. READING DATA
setwd("C:/Users/Dell/Desktop/Project/Week 1/Day 6")
dilemma.df=read.csv("Data - Deans Dilemma.csv")
View(dilemma.df)
  2. SUMMARIZING DATA FOR CERTAIN VARIABLES

a.) Gender distribution of applicants

prop.table(table(dilemma.df$Gender),margin=NULL)*100
## 
##        F        M 
## 32.48082 67.51918

b.) Percentage of applicants who took an MBA entrance test

prop.table(table(dilemma.df$S.TEST),margin=NULL)*100
## 
##        0        1 
## 17.13555 82.86445

c.) Summary of students in CBSE board during their SSC

library(psych)
CBSE_SSC=dilemma.df[which(dilemma.df$Board_SSC=='CBSE'),"Percent_SSC"]
summary(CBSE_SSC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   40.00   55.00   61.00   62.92   70.00   85.80
describe(CBSE_SSC)
##    vars   n  mean    sd median trimmed   mad min  max range skew kurtosis
## X1    1 113 62.92 11.04     61   62.61 11.86  40 85.8  45.8 0.23    -0.74
##      se
## X1 1.04

d.) Summary of students in ICSE board during their SSC

ICSE_SSC=dilemma.df[which(dilemma.df$Board_SSC=='ICSE'),"Percent_SSC"]
summary(ICSE_SSC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    50.0    59.0    64.0    65.4    72.0    87.0
describe(ICSE_SSC)
##    vars  n mean   sd median trimmed mad min max range skew kurtosis se
## X1    1 77 65.4 8.78     64   64.93 8.9  50  87    37 0.44    -0.67  1

e.) Summary of students in Other boards during their SSC

Others_SSC=dilemma.df[which(dilemma.df$Board_SSC=='Others'),"Percent_SSC"]
summary(Others_SSC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   37.00   56.00   66.30   65.34   75.00   87.20
describe(Others_SSC)
##    vars   n  mean    sd median trimmed   mad min  max range  skew kurtosis
## X1    1 201 65.34 11.59   66.3    65.8 14.38  37 87.2  50.2 -0.28    -0.77
##      se
## X1 0.82

The summary statistics are similar across boards (mean SSC percentage of 62.92 for CBSE, 65.40 for ICSE, and 65.34 for Others), so applications from every board can be evaluated on the same footing, with nearly identical cut-offs.
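
One way to probe the "close enough" judgement is a one-way ANOVA rebuilt from the group sizes, means, and standard deviations printed by `describe()` above. This is only a sketch and not part of the original analysis: it relies on the rounded values from the output, so the result is approximate, and all variable names are illustrative.

```r
# Group sizes, means and standard deviations copied from the describe()
# output above (rounded values, so the result is approximate).
n <- c(CBSE = 113, ICSE = 77, Others = 201)
m <- c(CBSE = 62.92, ICSE = 65.40, Others = 65.34)
s <- c(CBSE = 11.04, ICSE = 8.78, Others = 11.59)

# Weighted mean of all applicants
grand_mean <- sum(n * m) / sum(n)

# Between-group and within-group sums of squares
ss_between <- sum(n * (m - grand_mean)^2)
ss_within  <- sum((n - 1) * s^2)

df_between <- length(n) - 1
df_within  <- sum(n) - length(n)

# F statistic and its upper-tail p-value
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 3)
```

A large p-value would support a common cut-off across boards, while a small one would argue for board-specific cut-offs. With the raw data at hand, `oneway.test(Percent_SSC ~ Board_SSC, data = dilemma.df)` is a more direct, related (Welch) test.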

f.) Summary of students who took Commerce during HSC

Commerce=dilemma.df[which(dilemma.df$Stream_HSC=='Commerce'),"Percent_HSC"]
summary(Commerce)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   40.00   57.00   66.72   66.52   75.15   94.00

g.) Summary of students who took Science during HSC

Science=dilemma.df[which(dilemma.df$Stream_HSC=='Science'),"Percent_HSC"]
summary(Science)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   42.00   52.00   58.00   59.76   67.65   94.70

h.) Summary of students who took Arts during HSC

Arts=dilemma.df[which(dilemma.df$Stream_HSC=='Arts'),"Percent_HSC"]
summary(Arts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   40.00   59.00   63.00   64.05   72.25   83.00

The streams, by contrast, performed differently in the HSC board examinations (mean percentage of 66.52 for Commerce, 59.76 for Science, and 64.05 for Arts), so the cut-offs should vary by stream.
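
The contrast between boards and streams can be made concrete by comparing how far apart the group means are. The sketch below uses only the means reported in the `summary()` outputs above; the helper `spread()` is an illustrative name, not something from the original analysis.

```r
# Means copied from the summary() outputs above
board_means  <- c(CBSE = 62.92, ICSE = 65.40, Others = 65.34)       # Percent_SSC
stream_means <- c(Commerce = 66.52, Science = 59.76, Arts = 64.05)  # Percent_HSC

# Gap between the highest and the lowest group mean
spread <- function(x) max(x) - min(x)

spread(board_means)   # gap across SSC boards
spread(stream_means)  # gap across HSC streams
```

The gap across streams is several times the gap across boards, which is the basis for treating boards alike but streams separately.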

i.) Summary of Test score in the MBA Entrance Tests

summary(dilemma.df$S.TEST.SCORE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   41.19   62.00   54.93   78.00   98.69
  3. MEDIAN SALARY OF ALL APPLICANTS
median(dilemma.df$Salary)
## [1] 240000
  4. PERCENTAGE OF STUDENTS WHO WERE PLACED
m=prop.table(xtabs(~Placement, dilemma.df),margin=NULL)
m*100
## Placement
## Not Placed     Placed 
##    20.2046    79.7954
  5. PLACED.df, A SUBSET OF ALL THOSE STUDENTS WHO WERE SUCCESSFULLY PLACED
PLACED.df=dilemma.df[which(dilemma.df$Placement_B==1),]
View(PLACED.df)
  6. MEDIAN SALARY OF ALL PLACED STUDENTS
median(PLACED.df$Salary)
## [1] 260000
  7. CREATING A TABLE SHOWING THE MEAN SALARY OF THE FEMALES AND MALES WHO WERE PLACED
Female_Salary=PLACED.df[which(PLACED.df$Gender.B==1),"Salary"]
Male_Salary=PLACED.df[which(PLACED.df$Gender.B==0),"Salary"]
Mean_Salary=c(mean(Female_Salary),mean(Male_Salary))
Mean_Salary
## [1] 253068.0 284241.9
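
The vector above is unlabeled, so the reader has to remember that the first value is for females and the second for males. A labeled table can be produced in one call with `tapply()`. The sketch below runs on a small made-up data frame (`toy_placed` and its values are hypothetical stand-ins for PLACED.df, not the real data).

```r
# Toy stand-in for PLACED.df; the values are made up for illustration only.
toy_placed <- data.frame(
  Gender = c("F", "M", "M", "F", "M", "F"),
  Salary = c(250000, 300000, 270000, 240000, 290000, 260000)
)

# Mean Salary for each level of Gender, returned with the level names attached
mean_salary <- tapply(toy_placed$Salary, toy_placed$Gender, mean)
mean_salary
```

On the real data the equivalent call would be `tapply(PLACED.df$Salary, PLACED.df$Gender, mean)`, assuming the `Gender` column holds the `F`/`M` labels seen in the gender distribution table.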
  8. GENERATING A HISTOGRAM SHOWING A BREAKUP OF THE MBA PERFORMANCE OF THE STUDENTS WHO WERE PLACED
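
The code and plot for this step are not shown. A minimal sketch of such a histogram with base R `hist()` follows; it runs on simulated values (`percent_mba` is a hypothetical stand-in for `PLACED.df$Percent_MBA`), so its shape says nothing about the real data.

```r
# Simulated stand-in for PLACED.df$Percent_MBA; not the real data.
set.seed(1)
percent_mba <- round(rnorm(100, mean = 62, sd = 6), 1)

# hist() draws the plot and also returns the bin edges and counts
h <- hist(percent_mba,
          breaks = 10,
          main = "MBA performance of placed students",
          xlab = "MBA percentage",
          ylab = "Number of students",
          col = "grey")
```

On the real data, `percent_mba` would be replaced by `PLACED.df$Percent_MBA`.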

  9. UNPLACED.df, A SUBSET OF ONLY THOSE STUDENTS WHO WERE NOT PLACED

UNPLACED.df=dilemma.df[which(dilemma.df$Placement_B==0),]
View(UNPLACED.df)
  10. DRAWING TWO HISTOGRAMS SIDE-BY-SIDE, VISUALLY COMPARING THE MBA PERFORMANCE OF PLACED AND NOT PLACED STUDENTS
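
The code and plot for this step are not shown. One way to draw the two panels is `par(mfrow = c(1, 2))` with shared bin edges, so that the bars line up across panels. The sketch uses simulated values (`placed_mba` and `unplaced_mba` are hypothetical stand-ins for the `Percent_MBA` columns of PLACED.df and UNPLACED.df).

```r
# Simulated stand-ins for the two Percent_MBA columns; not the real data.
set.seed(2)
placed_mba   <- rnorm(80, mean = 63, sd = 6)
unplaced_mba <- rnorm(20, mean = 61, sd = 6)

# Common bin edges covering both groups, so the panels are comparable
x_range <- range(c(placed_mba, unplaced_mba))
brks <- seq(floor(x_range[1]), ceiling(x_range[2]), length.out = 11)

# One row, two columns of plots; restore the old layout afterwards
old_par <- par(mfrow = c(1, 2))
h1 <- hist(placed_mba, breaks = brks,
           main = "Placed", xlab = "MBA percentage")
h2 <- hist(unplaced_mba, breaks = brks,
           main = "Not placed", xlab = "MBA percentage")
par(old_par)
```

Because the groups differ in size, passing `freq = FALSE` to both calls would compare densities rather than raw counts.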

  11. DRAWING TWO BOXPLOTS, ONE BELOW THE OTHER, COMPARING THE DISTRIBUTION OF SALARIES OF MALES AND FEMALES WHO WERE PLACED
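
The code and plot for this step are not shown. A possible layout stacks two horizontal boxplots with `par(mfrow = c(2, 1))` and gives them the same salary axis. The data frame `toy_placed` below is simulated and purely illustrative; it is not PLACED.df.

```r
# Simulated stand-in for PLACED.df; not the real data.
set.seed(3)
toy_placed <- data.frame(
  Gender = rep(c("F", "M"), times = c(30, 70)),
  Salary = round(c(rnorm(30, 250000, 40000), rnorm(70, 285000, 60000)))
)

# Same salary axis in both panels so the boxes can be compared by eye
sal_range <- range(toy_placed$Salary)

# Two rows, one column of plots; restore the old layout afterwards
old_par <- par(mfrow = c(2, 1))
b_m <- boxplot(toy_placed$Salary[toy_placed$Gender == "M"],
               horizontal = TRUE, ylim = sal_range,
               main = "Males", xlab = "Salary")
b_f <- boxplot(toy_placed$Salary[toy_placed$Gender == "F"],
               horizontal = TRUE, ylim = sal_range,
               main = "Females", xlab = "Salary")
par(old_par)
```

A single call such as `boxplot(Salary ~ Gender, data = toy_placed, horizontal = TRUE)` would put both boxes in one panel instead, which is often easier to read.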

  12. PlacedET.df, A SUBSET OF STUDENTS WHO WERE PLACED AFTER THE MBA AND WHO HAD ALSO TAKEN AN MBA ENTRANCE TEST BEFORE ADMISSION INTO THE MBA PROGRAM

PlacedET.df=dilemma.df[which(dilemma.df$S.TEST==1&dilemma.df$Placement_B==1),]
View(PlacedET.df)
  13. DRAWING A SCATTER PLOT MATRIX FOR 3 VARIABLES – {Salary, Percent_MBA, Percentile_ET} USING THE DATAFRAME PlacedET.df
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
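
The call that produced the plot is not shown. The message above suggests the `car` package was loaded, possibly for its `scatterplotMatrix()` function, but that is an assumption. The sketch below uses base R `pairs()` instead, on a simulated data frame (`toy_placed_et` is a hypothetical stand-in for PlacedET.df, and the relationship built into it is invented).

```r
# Simulated stand-in for PlacedET.df; not the real data.
set.seed(4)
n_students <- 60
toy_placed_et <- data.frame(
  Percent_MBA   = rnorm(n_students, mean = 62, sd = 6),
  Percentile_ET = runif(n_students, min = 30, max = 99)
)
# Invented relationship, only so that the toy plot has some structure
toy_placed_et$Salary <- 150000 + 2000 * toy_placed_et$Percent_MBA +
  rnorm(n_students, mean = 0, sd = 30000)

# Scatter plot matrix of the three variables
pairs(~ Salary + Percent_MBA + Percentile_ET,
      data = toy_placed_et,
      main = "Salary, MBA percentage and entrance test percentile")

# Pairwise correlations as a numeric companion to the plot
cor_mat <- cor(toy_placed_et[, c("Salary", "Percent_MBA", "Percentile_ET")])
round(cor_mat, 2)
```

With `car` installed, `scatterplotMatrix(~ Salary + Percent_MBA + Percentile_ET, data = PlacedET.df)` would draw a comparable matrix on the real data.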