A Dean’s Dilemma

This R Markdown report works through the “A Dean’s Dilemma” case study. It covers the tasks from the Week 1 Day 6 task list, each carried out on the case dataset.

Task 2b : Reading the dataset

setwd("~/Muyeena/Internship/Deans dillemma")
mbadata = read.csv("deans dilemma.csv")
#View(mbadata)
str(mbadata) ## To get a basic idea about the structure of the dataset
## 'data.frame':    391 obs. of  26 variables:
##  $ SlNo               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Gender             : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 1 2 2 1 ...
##  $ Gender.B           : int  0 0 0 0 0 0 1 0 0 1 ...
##  $ Percent_SSC        : num  62 76.3 72 60 61 ...
##  $ Board_SSC          : Factor w/ 3 levels "CBSE","ICSE",..: 3 2 3 1 1 2 3 2 1 1 ...
##  $ Board_CBSE         : int  0 0 0 1 1 0 0 0 1 1 ...
##  $ Board_ICSE         : int  0 1 0 0 0 1 0 1 0 0 ...
##  $ Percent_HSC        : num  88 75.3 78 63 55 ...
##  $ Board_HSC          : Factor w/ 3 levels "CBSE","ISC","Others": 3 3 3 1 2 1 3 2 1 1 ...
##  $ Stream_HSC         : Factor w/ 3 levels "Arts","Commerce",..: 2 3 2 1 3 2 3 2 2 1 ...
##  $ Percent_Degree     : num  52 75.5 66.6 58 54 ...
##  $ Course_Degree      : Factor w/ 7 levels "Arts","Commerce",..: 7 3 4 5 4 2 6 5 2 5 ...
##  $ Degree_Engg        : int  0 0 1 0 1 0 0 0 0 0 ...
##  $ Experience_Yrs     : int  0 1 0 0 1 0 2 0 0 1 ...
##  $ Entrance_Test      : Factor w/ 9 levels "CAT","G-MAT",..: 6 6 7 6 6 7 7 6 6 7 ...
##  $ S.TEST             : int  1 1 0 1 1 0 0 1 1 0 ...
##  $ Percentile_ET      : num  55 86.5 0 75 66 ...
##  $ S.TEST.SCORE       : num  55 86.5 0 75 66 ...
##  $ Percent_MBA        : num  58.8 66.3 52.9 57.8 59.4 ...
##  $ Specialization_MBA : Factor w/ 3 levels "Marketing & Finance",..: 2 1 1 1 2 1 2 1 1 2 ...
##  $ Marks_Communication: int  50 69 50 54 52 53 63 74 65 50 ...
##  $ Marks_Projectwork  : int  65 70 61 66 65 70 56 72 76 59 ...
##  $ Marks_BOCA         : int  74 75 59 62 67 53 50 50 70 77 ...
##  $ Placement          : Factor w/ 2 levels "Not Placed","Placed": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Placement_B        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Salary             : int  270000 200000 240000 250000 180000 300000 260000 235000 425000 240000 ...
head(mbadata) ## To display the first six rows of the dataset
##   SlNo Gender Gender.B Percent_SSC Board_SSC Board_CBSE Board_ICSE
## 1    1      M        0       62.00    Others          0          0
## 2    2      M        0       76.33      ICSE          0          1
## 3    3      M        0       72.00    Others          0          0
## 4    4      M        0       60.00      CBSE          1          0
## 5    5      M        0       61.00      CBSE          1          0
## 6    6      M        0       55.00      ICSE          0          1
##   Percent_HSC Board_HSC Stream_HSC Percent_Degree         Course_Degree
## 1       88.00    Others   Commerce          52.00               Science
## 2       75.33    Others    Science          75.48 Computer Applications
## 3       78.00    Others   Commerce          66.63           Engineering
## 4       63.00      CBSE       Arts          58.00            Management
## 5       55.00       ISC    Science          54.00           Engineering
## 6       64.00      CBSE   Commerce          50.00              Commerce
##   Degree_Engg Experience_Yrs Entrance_Test S.TEST Percentile_ET
## 1           0              0           MAT      1          55.0
## 2           0              1           MAT      1          86.5
## 3           1              0          None      0           0.0
## 4           0              0           MAT      1          75.0
## 5           1              1           MAT      1          66.0
## 6           0              0          None      0           0.0
##   S.TEST.SCORE Percent_MBA  Specialization_MBA Marks_Communication
## 1         55.0       58.80      Marketing & HR                  50
## 2         86.5       66.28 Marketing & Finance                  69
## 3          0.0       52.91 Marketing & Finance                  50
## 4         75.0       57.80 Marketing & Finance                  54
## 5         66.0       59.43      Marketing & HR                  52
## 6          0.0       56.81 Marketing & Finance                  53
##   Marks_Projectwork Marks_BOCA Placement Placement_B Salary
## 1                65         74    Placed           1 270000
## 2                70         75    Placed           1 200000
## 3                61         59    Placed           1 240000
## 4                66         62    Placed           1 250000
## 5                65         67    Placed           1 180000
## 6                70         53    Placed           1 300000
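
The str output above lists the text columns as Factor, which is how read.csv behaves in R versions before 4.0.0. From 4.0.0 onward the default is stringsAsFactors = FALSE, so the same columns would load as character vectors. A minimal sketch of requesting factors explicitly, using a small made-up CSV written to a temporary file (the three rows and the name demo are illustrative, not taken from the dataset):

```r
# Hypothetical three-row CSV, written to a temporary file for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(c("SlNo,Gender,Salary",
             "1,M,270000",
             "2,F,200000",
             "3,M,240000"), csv_path)

# Ask explicitly for factors so the result does not depend on the R version
demo <- read.csv(csv_path, stringsAsFactors = TRUE)
str(demo)
is.factor(demo$Gender)
levels(demo$Gender)
```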

Task 2c : Summarize the data

The summary function summarizes each variable in the data set: quartiles and the mean for numeric variables, and level counts for categorical variables.

The describe function presents the statistics for every variable in one compact table. Categorical variables are marked with an * after the variable name, and their statistics are computed on the numeric codes of the factor levels.

summary(mbadata)
##       SlNo       Gender     Gender.B       Percent_SSC     Board_SSC  
##  Min.   :  1.0   F:127   Min.   :0.0000   Min.   :37.00   CBSE  :113  
##  1st Qu.: 98.5   M:264   1st Qu.:0.0000   1st Qu.:56.00   ICSE  : 77  
##  Median :196.0           Median :0.0000   Median :64.50   Others:201  
##  Mean   :196.0           Mean   :0.3248   Mean   :64.65               
##  3rd Qu.:293.5           3rd Qu.:1.0000   3rd Qu.:74.00               
##  Max.   :391.0           Max.   :1.0000   Max.   :87.20               
##                                                                       
##    Board_CBSE      Board_ICSE      Percent_HSC    Board_HSC  
##  Min.   :0.000   Min.   :0.0000   Min.   :40.0   CBSE  : 96  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:54.0   ISC   : 48  
##  Median :0.000   Median :0.0000   Median :63.0   Others:247  
##  Mean   :0.289   Mean   :0.1969   Mean   :63.8               
##  3rd Qu.:1.000   3rd Qu.:0.0000   3rd Qu.:72.0               
##  Max.   :1.000   Max.   :1.0000   Max.   :94.7               
##                                                              
##     Stream_HSC  Percent_Degree                Course_Degree
##  Arts    : 18   Min.   :35.00   Arts                 : 13  
##  Commerce:222   1st Qu.:57.52   Commerce             :117  
##  Science :151   Median :63.00   Computer Applications: 32  
##                 Mean   :62.98   Engineering          : 37  
##                 3rd Qu.:69.00   Management           :163  
##                 Max.   :89.00   Others               :  5  
##                                 Science              : 24  
##   Degree_Engg      Experience_Yrs   Entrance_Test     S.TEST      
##  Min.   :0.00000   Min.   :0.0000   MAT    :265   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   None   : 67   1st Qu.:1.0000  
##  Median :0.00000   Median :0.0000   K-MAT  : 24   Median :1.0000  
##  Mean   :0.09463   Mean   :0.4783   CAT    : 22   Mean   :0.8286  
##  3rd Qu.:0.00000   3rd Qu.:1.0000   PGCET  :  8   3rd Qu.:1.0000  
##  Max.   :1.00000   Max.   :3.0000   GCET   :  2   Max.   :1.0000  
##                                     (Other):  3                   
##  Percentile_ET    S.TEST.SCORE    Percent_MBA   
##  Min.   : 0.00   Min.   : 0.00   Min.   :50.83  
##  1st Qu.:41.19   1st Qu.:41.19   1st Qu.:57.20  
##  Median :62.00   Median :62.00   Median :61.01  
##  Mean   :54.93   Mean   :54.93   Mean   :61.67  
##  3rd Qu.:78.00   3rd Qu.:78.00   3rd Qu.:66.02  
##  Max.   :98.69   Max.   :98.69   Max.   :77.89  
##                                                 
##            Specialization_MBA Marks_Communication Marks_Projectwork
##  Marketing & Finance:222      Min.   :50.00       Min.   :50.00    
##  Marketing & HR     :156      1st Qu.:53.00       1st Qu.:64.00    
##  Marketing & IB     : 13      Median :58.00       Median :69.00    
##                               Mean   :60.54       Mean   :68.36    
##                               3rd Qu.:67.00       3rd Qu.:74.00    
##                               Max.   :88.00       Max.   :87.00    
##                                                                    
##    Marks_BOCA         Placement    Placement_B        Salary      
##  Min.   :50.00   Not Placed: 79   Min.   :0.000   Min.   :     0  
##  1st Qu.:57.00   Placed    :312   1st Qu.:1.000   1st Qu.:172800  
##  Median :63.00                    Median :1.000   Median :240000  
##  Mean   :64.38                    Mean   :0.798   Mean   :219078  
##  3rd Qu.:72.50                    3rd Qu.:1.000   3rd Qu.:300000  
##  Max.   :96.00                    Max.   :1.000   Max.   :940000  
## 
library(psych) ## The describe function is present under the package psych
describe(mbadata)
##                     vars   n      mean        sd    median   trimmed
## SlNo                   1 391    196.00    113.02    196.00    196.00
## Gender*                2 391      1.68      0.47      2.00      1.72
## Gender.B               3 391      0.32      0.47      0.00      0.28
## Percent_SSC            4 391     64.65     10.96     64.50     64.76
## Board_SSC*             5 391      2.23      0.87      3.00      2.28
## Board_CBSE             6 391      0.29      0.45      0.00      0.24
## Board_ICSE             7 391      0.20      0.40      0.00      0.12
## Percent_HSC            8 391     63.80     11.42     63.00     63.34
## Board_HSC*             9 391      2.39      0.85      3.00      2.48
## Stream_HSC*           10 391      2.34      0.56      2.00      2.36
## Percent_Degree        11 391     62.98      8.92     63.00     62.91
## Course_Degree*        12 391      3.85      1.61      4.00      3.81
## Degree_Engg           13 391      0.09      0.29      0.00      0.00
## Experience_Yrs        14 391      0.48      0.67      0.00      0.36
## Entrance_Test*        15 391      5.85      1.35      6.00      6.08
## S.TEST                16 391      0.83      0.38      1.00      0.91
## Percentile_ET         17 391     54.93     31.17     62.00     56.87
## S.TEST.SCORE          18 391     54.93     31.17     62.00     56.87
## Percent_MBA           19 391     61.67      5.85     61.01     61.45
## Specialization_MBA*   20 391      1.47      0.56      1.00      1.42
## Marks_Communication   21 391     60.54      8.82     58.00     59.68
## Marks_Projectwork     22 391     68.36      7.15     69.00     68.60
## Marks_BOCA            23 391     64.38      9.58     63.00     64.08
## Placement*            24 391      1.80      0.40      2.00      1.87
## Placement_B           25 391      0.80      0.40      1.00      0.87
## Salary                26 391 219078.26 138311.65 240000.00 217011.50
##                          mad   min       max     range  skew kurtosis
## SlNo                  145.29  1.00    391.00    390.00  0.00    -1.21
## Gender*                 0.00  1.00      2.00      1.00 -0.75    -1.45
## Gender.B                0.00  0.00      1.00      1.00  0.75    -1.45
## Percent_SSC            12.60 37.00     87.20     50.20 -0.06    -0.72
## Board_SSC*              0.00  1.00      3.00      2.00 -0.45    -1.53
## Board_CBSE              0.00  0.00      1.00      1.00  0.93    -1.14
## Board_ICSE              0.00  0.00      1.00      1.00  1.52     0.31
## Percent_HSC            13.34 40.00     94.70     54.70  0.29    -0.67
## Board_HSC*              0.00  1.00      3.00      2.00 -0.83    -1.13
## Stream_HSC*             0.00  1.00      3.00      2.00 -0.12    -0.72
## Percent_Degree          8.90 35.00     89.00     54.00  0.05     0.24
## Course_Degree*          1.48  1.00      7.00      6.00  0.00    -1.08
## Degree_Engg             0.00  0.00      1.00      1.00  2.76     5.63
## Experience_Yrs          0.00  0.00      3.00      3.00  1.27     1.17
## Entrance_Test*          0.00  1.00      9.00      8.00 -2.52     7.04
## S.TEST                  0.00  0.00      1.00      1.00 -1.74     1.02
## Percentile_ET          25.20  0.00     98.69     98.69 -0.74    -0.69
## S.TEST.SCORE           25.20  0.00     98.69     98.69 -0.74    -0.69
## Percent_MBA             6.39 50.83     77.89     27.06  0.34    -0.52
## Specialization_MBA*     0.00  1.00      3.00      2.00  0.70    -0.56
## Marks_Communication     8.90 50.00     88.00     38.00  0.74    -0.25
## Marks_Projectwork       7.41 50.00     87.00     37.00 -0.26    -0.27
## Marks_BOCA             11.86 50.00     96.00     46.00  0.29    -0.85
## Placement*              0.00  1.00      2.00      1.00 -1.48     0.19
## Placement_B             0.00  0.00      1.00      1.00 -1.48     0.19
## Salary              88956.00  0.00 940000.00 940000.00  0.24     1.74
##                          se
## SlNo                   5.72
## Gender*                0.02
## Gender.B               0.02
## Percent_SSC            0.55
## Board_SSC*             0.04
## Board_CBSE             0.02
## Board_ICSE             0.02
## Percent_HSC            0.58
## Board_HSC*             0.04
## Stream_HSC*            0.03
## Percent_Degree         0.45
## Course_Degree*         0.08
## Degree_Engg            0.01
## Experience_Yrs         0.03
## Entrance_Test*         0.07
## S.TEST                 0.02
## Percentile_ET          1.58
## S.TEST.SCORE           1.58
## Percent_MBA            0.30
## Specialization_MBA*    0.03
## Marks_Communication    0.45
## Marks_Projectwork      0.36
## Marks_BOCA             0.48
## Placement*             0.02
## Placement_B            0.02
## Salary              6994.72
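
To see what the starred rows mean in practice, the sketch below rebuilds Gender from the counts reported by summary (127 F, 264 M) and averages the integer codes behind the factor. The result matches the 1.68 in the Gender* row. The vector gender is a reconstruction for illustration, not the original column.

```r
# Rebuild a Gender factor from the counts shown by summary(): 127 F, 264 M
gender <- factor(rep(c("F", "M"), times = c(127, 264)))

# Factor levels are stored as integer codes: F = 1, M = 2
codes <- as.integer(gender)

# Mean of the codes, i.e. 1 + proportion of "M"
round(mean(codes), 2)   # 1.68, matching the Gender* row of describe()

# For a categorical variable, a frequency table is the more useful summary
table(gender)
```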

Task 3a : Use R to calculate the median salary of all the students in the data sample

R's built-in median function returns the median of a numeric variable, so it is applied directly to the Salary column.

median(mbadata$Salary)
## [1] 240000

The median salary of all the students in the data set is 240000.
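
The summary output shows a minimum Salary of 0, which suggests that students without a placement are recorded with a salary of 0. This is an inference from the output, not something the report states. If it holds, those zeros pull the overall median down. A toy illustration with made-up salaries:

```r
# Made-up salaries; 0 stands in for a student without a placement
salary <- c(0, 0, 200000, 250000, 300000)

median(salary)               # 200000: zeros are counted
median(salary[salary > 0])   # 250000: only non-zero salaries
```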


Task 3b : Use R to calculate the percentage of students who were placed, correct to 2 decimal places.

This task requires:

  • Creating a frequency table of placement status.
  • Using the prop.table function to convert the counts to percentages.
  • Using the round function to round the results to 2 decimal places.

tplaced = table(mbadata$Placement)
tplaced
## 
## Not Placed     Placed 
##         79        312
p.tplaced = round(prop.table(tplaced)*100, 2)
p.tplaced
## 
## Not Placed     Placed 
##       20.2       79.8

Therefore, 79.80% of the students were placed.
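
As a quick arithmetic check, the percentage can be recomputed from the two counts in the frequency table. The round function drops trailing zeros when the result is printed, so sprintf is used here to show both decimal places. The vector counts simply restates the table output.

```r
# Counts copied from the frequency table above
counts <- c("Not Placed" = 79, "Placed" = 312)

# Share of each group as a percentage of all 391 students
pct <- counts / sum(counts) * 100

# Format to exactly two decimal places
sprintf("%.2f%%", pct[["Placed"]])   # "79.80%"
```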


Task 3c : Use R to create a dataframe called placed, that contains a subset of only those students who were successfully placed.

This task uses the which function to find the rows that satisfy a condition (Placement_B equal to 1), and those rows are copied into the new dataframe. The dim function then reports the dimensions of the new dataframe. As a basic check, its row count (312) should equal the number of placed students found in the frequency table above.

placed = mbadata[which(mbadata$Placement_B == 1),]
dim(placed)
## [1] 312  26
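
The which function turns a logical condition into row positions and drops rows where the condition is NA, whereas plain logical indexing returns an all-NA row for each of them. The sketch below shows the difference on a made-up data frame toy. The real data appears to have no missing values in Placement_B (describe reports n = 391), so both forms should give the same result there.

```r
# Made-up data with one missing placement flag
toy <- data.frame(SlNo = 1:4,
                  Placement_B = c(1, 0, NA, 1))

# which() keeps only rows where the condition is TRUE
via_which <- toy[which(toy$Placement_B == 1), ]
nrow(via_which)      # 2

# Logical indexing also returns a row of NAs for the missing flag
via_logical <- toy[toy$Placement_B == 1, ]
nrow(via_logical)    # 3

# subset() behaves like which() and drops the NA row
nrow(subset(toy, Placement_B == 1))   # 2
```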

Task 3d : Use R to find the median salary of students who were placed.

The same median function is used, but now we use the dataframe placed.

median(placed$Salary) ## Notice that we have used the "placed" dataset.
## [1] 260000

The median salary of all the students who were placed is 260000.


Task 3e : Use R to create a table showing the mean salary of males and females, who were placed.

For this task we use the aggregate function.

aggregate(placed$Salary, by=list(Gender = placed$Gender), mean)
##   Gender        x
## 1      F 253068.0
## 2      M 284241.9
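
The result column is named x because the salary vector was passed without a name. One alternative, sketched here on a made-up data frame toy_placed, is the formula interface of aggregate, which keeps the variable names. The tapply function gives the same group means as a named vector.

```r
# Made-up salaries for four placed students
toy_placed <- data.frame(Gender = c("F", "F", "M", "M"),
                         Salary = c(200000, 300000, 250000, 350000))

# Formula interface: the result columns are named Gender and Salary
means <- aggregate(Salary ~ Gender, data = toy_placed, FUN = mean)
means

# Same group means as a named vector
tapply(toy_placed$Salary, toy_placed$Gender, mean)
```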

Task 3f : Use R to generate the following histogram showing a breakup of the MBA performance of the students who were placed

The histogram is drawn with the hist function, with arguments set to match the format requested in the task.

hist(placed$Percent_MBA, ## The variable for which the histogram is required
     xlab = "MBA Percentage", ## x-axis label
     ylab = "Count", ## Y-axis Label
     breaks = 3, ## Suggested number of cells; R may pick a nearby "pretty" number
     col = "grey", ## Colour of the histogram
     main = "MBA Performance of placed Students") ##  Main title of the histogram
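
In hist, a single number passed to breaks is only a suggestion: R chooses "pretty" cut points near that count, so the number of bars can differ from the value given. Supplying a vector of cut points fixes the bars exactly. A sketch on made-up percentages (the vector pct and the cut points are illustrative only):

```r
# Made-up MBA percentages spanning roughly the range seen in the data
pct <- c(51, 54, 57, 58, 60, 61, 63, 66, 69, 72, 75, 77)

# A single number is a hint; inspect the cut points R actually chose
h_hint <- hist(pct, breaks = 3, plot = FALSE)
h_hint$breaks

# A vector of cut points is used exactly as given: three bars of width 10
h_exact <- hist(pct, breaks = c(50, 60, 70, 80), plot = FALSE)
h_exact$counts
```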


Task 3g : Create a dataframe called notplaced, that contains a subset of only those students who were NOT placed after their MBA.

This is similar to Task 3c. Once again, the dim function is used as a quick manual check that the subset dataframe is correct: its 79 rows match the number of students who were not placed.

notplaced = mbadata[which(mbadata$Placement_B == 0),]
dim(notplaced)
## [1] 79 26

Task 3h : Draw two histograms side-by-side, visually comparing the MBA performance of Placed and Not Placed students.

For multi-panel plots we use the par function with the mfrow argument.

par(mfrow=c(1,2), mai=c(1,1,1,1)) ## The first number '1' indicates one row and '2' indicates two columns. "mai" argument is the margin in inches
with(placed, hist(Percent_MBA,
                  xlab = "MBA Percentage",
                  ylab = "Count",
                  breaks = 3,
                  col = "grey",
                  main = "MBA Performance of placed Students"))
with(notplaced, hist(Percent_MBA,
                     xlab = "MBA Percentage",
                     ylab = "Count",
                     breaks = 3,
                     col = "grey",
                     main = "MBA Performance of not placed Students"))

par(mfrow=c(1,1))
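
When two histograms are compared side by side, giving both the same cut points and the same y-axis range makes the bars directly comparable. The sketch below does this on made-up vectors a and b, and restores the earlier graphics settings from the value that par returns. The call pdf(NULL) is there only so the example runs without opening a plot window.

```r
# Made-up percentages for two groups
a <- c(52, 55, 58, 61, 63, 64, 67, 70, 74, 77)
b <- c(51, 53, 54, 56, 58, 59, 62, 65)

# Shared cut points covering both groups
cuts <- seq(50, 80, by = 10)

# Largest bar in either group, used as the common y-axis limit
top <- max(hist(a, breaks = cuts, plot = FALSE)$counts,
           hist(b, breaks = cuts, plot = FALSE)$counts)

pdf(NULL)                      # null device so nothing is written to disk
old <- par(mfrow = c(1, 2))    # par() returns the previous settings
hist(a, breaks = cuts, ylim = c(0, top), col = "grey", main = "Group a")
hist(b, breaks = cuts, ylim = c(0, top), col = "grey", main = "Group b")
par(old)                       # restore the previous layout
dev.off()
```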

Task 3i : Use R to draw two boxplots, one below the other, comparing the distribution of salaries of males and females who were placed.

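We use the boxplot function with a formula relating Salary to gender in the placed dataset. The grouping variable is the binary column Gender.B (0 = male, 1 = female, as seen in the first rows of the data), so the axis labels are supplied by hand.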

boxplot(Salary ~ Gender.B, data = placed,
        horizontal = TRUE,
        yaxt = "n",
        ylab = "Gender",
        xlab = "Salary",
        main = "Comparison of Salaries of Males & Females")
axis(side = 2, at=c(1,2), labels = c("Males","Females"))


Task 3j : Create a dataframe called placedET

This dataframe should represent :

  • Students who were placed after the MBA and
  • Students who gave some MBA entrance test before admission into the MBA program.

We use the same which function to create the subset, this time starting from the placed dataframe.

placedET = placed[which(placed$S.TEST == 1),]
table(placed$S.TEST) ## To find the frequency of students in *placed* dataset who gave some test or the other
## 
##   0   1 
##  51 261
dim(placedET) ## The dimensions of the new dataframe so that we can manually compare it with the above value. 
## [1] 261  26
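
The same subset could also be built in one step from the full data by combining both conditions with &. A sketch on a made-up data frame toy, which borrows only the column names Placement_B and S.TEST:

```r
# Made-up data: five students with placement and entrance-test flags
toy <- data.frame(SlNo        = 1:5,
                  Placement_B = c(1, 1, 0, 1, 0),
                  S.TEST      = c(1, 0, 1, 1, 0))

# Two-step route, as in the report: placed first, then test takers
step1    <- toy[which(toy$Placement_B == 1), ]
two_step <- step1[which(step1$S.TEST == 1), ]

# One-step route: both conditions at once
one_step <- toy[which(toy$Placement_B == 1 & toy$S.TEST == 1), ]

# Both routes select the same students
identical(two_step$SlNo, one_step$SlNo)   # TRUE
```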

Task 3k : Draw a Scatter Plot Matrix for 3 variables – {Salary, Percent_MBA, Percentile_ET} using the dataframe placedET.

For this task we use the scatterplotMatrix function from the car package.

library(car)
scatterplotMatrix(formula = ~ Salary + Percent_MBA + Percentile_ET,
                  cex = 0.6,
                  data = placedET)
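
A scatter plot matrix pairs naturally with a correlation matrix of the same variables. As a base R sketch that does not need the car package, pairs draws a plain matrix of scatter plots and cor gives the pairwise correlations. The data frame toy is made up; only its column names follow the report.

```r
# Made-up values for five students
toy <- data.frame(Salary        = c(200000, 240000, 260000, 300000, 350000),
                  Percent_MBA   = c(55, 58, 62, 66, 71),
                  Percentile_ET = c(40, 75, 60, 85, 90))

pdf(NULL)   # null device so the example runs without a plot window
pairs(~ Salary + Percent_MBA + Percentile_ET, data = toy, cex = 0.6)
dev.off()

# Pairwise Pearson correlations, rounded for readability
cor_matrix <- cor(toy)
round(cor_matrix, 2)
```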

With this task, we have come to the end of the assignment.