PART 2

This R Markdown file contains the data analysis for the “MBA Starting Salaries” case study.

Possible questions before joining this particular B-school

Is it worth enrolling in the MBA program at this particular school? Did gender and/or age make a difference to starting salaries? Did students like the program? Did the GMAT score make a difference to grades?

Primary Variables for salary variation

GMAT Score, Gender, Age, WorkExperience

More chances of Enrollment

Better Salary, More Satisfaction Rating

Reading the data into R

MBA_Data <- read.csv("MBA Starting Salaries Data.csv")
dim(MBA_Data)
## [1] 274  13
View(MBA_Data)

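Creating summary statistics (e.g. mean, standard deviation, median) for the variables in the dataset. The salary and satis columns still contain the codes 998 and 999 at this point, so their summaries are not meaningful yet.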

library(psych)
describe(MBA_Data)
##          vars   n     mean       sd median  trimmed     mad min    max
## age         1 274    27.36     3.71     27    26.76    2.97  22     48
## sex         2 274     1.25     0.43      1     1.19    0.00   1      2
## gmat_tot    3 274   619.45    57.54    620   618.86   59.30 450    790
## gmat_qpc    4 274    80.64    14.87     83    82.31   14.83  28     99
## gmat_vpc    5 274    78.32    16.86     81    80.33   14.83  16     99
## gmat_tpc    6 274    84.20    14.02     87    86.12   11.86   0     99
## s_avg       7 274     3.03     0.38      3     3.03    0.44   2      4
## f_avg       8 274     3.06     0.53      3     3.09    0.37   0      4
## quarter     9 274     2.48     1.11      2     2.47    1.48   1      4
## work_yrs   10 274     3.87     3.23      3     3.29    1.48   0     22
## frstlang   11 274     1.12     0.32      1     1.02    0.00   1      2
## salary     12 274 39025.69 50951.56    999 33607.86 1481.12   0 220000
## satis      13 274   172.18   371.61      6    91.50    1.48   1    998
##           range  skew kurtosis      se
## age          26  2.16     6.45    0.22
## sex           1  1.16    -0.66    0.03
## gmat_tot    340 -0.01     0.06    3.48
## gmat_qpc     71 -0.92     0.30    0.90
## gmat_vpc     83 -1.04     0.74    1.02
## gmat_tpc     99 -2.28     9.02    0.85
## s_avg         2 -0.06    -0.38    0.02
## f_avg         4 -2.08    10.85    0.03
## quarter       3  0.02    -1.35    0.07
## work_yrs     22  2.78     9.80    0.20
## frstlang      1  2.37     3.65    0.02
## salary   220000  0.70    -1.05 3078.10
## satis       997  1.77     1.13   22.45

Structure of Data

str(MBA_Data)
## 'data.frame':    274 obs. of  13 variables:
##  $ age     : int  23 24 24 24 24 24 25 25 25 25 ...
##  $ sex     : int  2 1 1 1 2 1 1 2 1 1 ...
##  $ gmat_tot: int  620 610 670 570 710 640 610 650 630 680 ...
##  $ gmat_qpc: int  77 90 99 56 93 82 89 88 79 99 ...
##  $ gmat_vpc: int  87 71 78 81 98 89 74 89 91 81 ...
##  $ gmat_tpc: int  87 87 95 75 98 91 87 92 89 96 ...
##  $ s_avg   : num  3.4 3.5 3.3 3.3 3.6 3.9 3.4 3.3 3.3 3.45 ...
##  $ f_avg   : num  3 4 3.25 2.67 3.75 3.75 3.5 3.75 3.25 3.67 ...
##  $ quarter : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ work_yrs: int  2 2 2 1 2 2 2 2 2 2 ...
##  $ frstlang: int  1 1 1 1 1 1 1 1 2 1 ...
##  $ salary  : int  0 0 0 0 999 0 0 0 999 998 ...
##  $ satis   : int  7 6 6 7 5 6 5 6 4 998 ...

Converting the data type of some columns

Sex

MBA_Data$sex[MBA_Data$sex==1] <- "Male"
MBA_Data$sex[MBA_Data$sex==2] <- "Female"
MBA_Data$sex= factor(MBA_Data$sex)

First Language

MBA_Data$frstlang[MBA_Data$frstlang==1] <- "English"
MBA_Data$frstlang[MBA_Data$frstlang==2] <- "Other"
MBA_Data$frstlang= factor(MBA_Data$frstlang)
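
The recoding above first overwrites the integer codes with strings and then converts the column to a factor. A minimal sketch of doing both in one step is shown here, on a made-up vector named sex_code (an assumption for illustration, not a column from the case data). Listing level 2 first keeps "Female" as the first factor level, which matches the alphabetical level order the original code produces.

```r
# Toy vector using the same coding as the sex column (1 = Male, 2 = Female).
sex_code <- c(2, 1, 1, 1, 2)

# factor() maps each code in `levels` to the label in the same position.
# Level 2 is listed first so that "Female" stays the reference level.
sex <- factor(sex_code, levels = c(2, 1), labels = c("Female", "Male"))

print(levels(sex))
print(table(sex))
```

The same pattern would apply to frstlang with labels "English" and "Other".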

Structure of the Dataset.

str(MBA_Data)
## 'data.frame':    274 obs. of  13 variables:
##  $ age     : int  23 24 24 24 24 24 25 25 25 25 ...
##  $ sex     : Factor w/ 2 levels "Female","Male": 1 2 2 2 1 2 2 1 2 2 ...
##  $ gmat_tot: int  620 610 670 570 710 640 610 650 630 680 ...
##  $ gmat_qpc: int  77 90 99 56 93 82 89 88 79 99 ...
##  $ gmat_vpc: int  87 71 78 81 98 89 74 89 91 81 ...
##  $ gmat_tpc: int  87 87 95 75 98 91 87 92 89 96 ...
##  $ s_avg   : num  3.4 3.5 3.3 3.3 3.6 3.9 3.4 3.3 3.3 3.45 ...
##  $ f_avg   : num  3 4 3.25 2.67 3.75 3.75 3.5 3.75 3.25 3.67 ...
##  $ quarter : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ work_yrs: int  2 2 2 1 2 2 2 2 2 2 ...
##  $ frstlang: Factor w/ 2 levels "English","Other": 1 1 1 1 1 1 1 1 2 1 ...
##  $ salary  : int  0 0 0 0 999 0 0 0 999 998 ...
##  $ satis   : int  7 6 6 7 5 6 5 6 4 998 ...

Contingency Tables for MBA_Data Dataframe

Sex & FirstLanguage

sex_Flang <- xtabs(~MBA_Data$sex+MBA_Data$frstlang)
addmargins(sex_Flang)
##             MBA_Data$frstlang
## MBA_Data$sex English Other Sum
##       Female      60     8  68
##       Male       182    24 206
##       Sum        242    32 274

Satisfaction value

satisfaction <- xtabs(~MBA_Data$satis)
addmargins(satisfaction)
## MBA_Data$satis
##   1   2   3   4   5   6   7 998 Sum 
##   1   1   5  17  74  97  33  46 274
prop.table(satisfaction)*100
## MBA_Data$satis
##          1          2          3          4          5          6 
##  0.3649635  0.3649635  1.8248175  6.2043796 27.0072993 35.4014599 
##          7        998 
## 12.0437956 16.7883212

We can see that more than 74% of all 274 students gave a satisfaction rating above 4 (27.0% + 35.4% + 12.0% ≈ 74.5%). The value 998 (16.8%) is a code for students who did not answer the survey, not a rating.
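
As a quick check of this percentage, the sketch below recomputes it from the counts printed in the satisfaction table, once over all students and once over only those who answered (excluding the 998 code). The vector name satis_counts is introduced here for illustration.

```r
# Counts copied from the satisfaction table above; "998" means no answer.
satis_counts <- c("1" = 1, "2" = 1, "3" = 5, "4" = 17,
                  "5" = 74, "6" = 97, "7" = 33, "998" = 46)

# Students with a rating of 5, 6 or 7.
above_four <- sum(satis_counts[c("5", "6", "7")])

# Students who actually gave a rating.
respondents <- sum(satis_counts[names(satis_counts) != "998"])

share_all <- 100 * above_four / sum(satis_counts)
share_respondents <- 100 * above_four / respondents

print(round(c(all = share_all, respondents = share_respondents), 1))
```

Among the students who answered, the share with a rating above 4 is therefore higher than the 74% computed over everyone.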

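Mean salary of all MBA students by gender. These means still include the values 0, 998 and 999 in the salary column, so they are not average earnings.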

AllSalaryMean <- aggregate(MBA_Data$salary,list(Gender=MBA_Data$sex),mean)
AllSalaryMean
##   Gender        x
## 1 Female 45121.07
## 2   Male 37013.62

Mean salary of all MBA students by age (the values 0, 998 and 999 are still included).

AllSalaryMean2 <- aggregate(MBA_Data$salary,list(Age=MBA_Data$age),mean)
AllSalaryMean2
##    Age         x
## 1   22  42500.00
## 2   23  57282.00
## 3   24  49342.24
## 4   25  43395.55
## 5   26  35982.07
## 6   27  31499.37
## 7   28  39809.00
## 8   29  28067.95
## 9   30  55291.25
## 10  31  40599.40
## 11  32  13662.25
## 12  33 118000.00
## 13  34  26250.00
## 14  35      0.00
## 15  36      0.00
## 16  37      0.00
## 17  39  56000.00
## 18  40 183000.00
## 19  42      0.00
## 20  43      0.00
## 21  48      0.00

998 = did not answer the survey

999 = answered the survey but did not disclose salary data

0 = not yet placed

Any other value = starting salary of a student who was placed
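
Because 998 and 999 are codes rather than amounts, any average taken over the raw salary column mixes them in. A minimal sketch, assuming a made-up salary vector (not the case data), of recoding them to NA before summarising:

```r
# Made-up salaries: two not placed, one undisclosed, one no-answer, two placed.
salary <- c(0, 0, 999, 998, 85000, 96000)

# Replace the two survey codes with NA so they drop out of summaries.
salary_clean <- replace(salary, salary %in% c(998, 999), NA)

mean_raw <- mean(salary)                          # codes counted as money
mean_known <- mean(salary_clean, na.rm = TRUE)    # codes removed, zeros kept
mean_placed <- mean(salary_clean[salary_clean > 0], na.rm = TRUE)  # placed only

print(c(raw = mean_raw, known = mean_known, placed = mean_placed))
```

Which of the three means is appropriate depends on the question being asked; the placed-only mean corresponds to the Placed data frame built later.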

DataFrame-1: students who did not answer the survey

NotAnsweredSurvey <- MBA_Data[which(MBA_Data$salary==998),]
dim(NotAnsweredSurvey)
## [1] 46 13

DataFrame-2: students who answered the survey but did not disclose their salary

NotdisclosedSalary <- MBA_Data[which(MBA_Data$salary==999),]
NotdisclosedSalary
##     age    sex gmat_tot gmat_qpc gmat_vpc gmat_tpc s_avg f_avg quarter
## 5    24 Female      710       93       98       98  3.60  3.75       1
## 9    25   Male      630       79       91       89  3.30  3.25       1
## 21   27 Female      570       65       82       77  3.30  3.25       1
## 26   30   Male      620       82       84       87  3.40  2.80       1
## 30   32   Male      570       71       71        0  3.50  3.50       1
## 78   25   Male      690       87       98       98  3.00  3.00       2
## 87   26   Male      680       92       93       97  3.00  3.00       2
## 91   27   Male      740       99       98       99  3.10  3.50       2
## 99   28   Male      660       95       85       96  3.10  3.25       2
## 101  29   Male      580       91       50       80  3.10  2.67       2
## 105  29   Male      590       68       84       81  3.10  3.00       2
## 108  31   Male      670       83       98       96  3.20  3.40       2
## 145  24   Male      650       89       84       93  2.70  3.25       3
## 152  25   Male      660       95       84       94  2.70  3.00       3
## 158  26   Male      640       87       84       91  2.70  3.20       3
## 161  26   Male      600       97       45       83  2.70  3.00       3
## 166  27   Male      730       95       99       99  2.90  3.33       3
## 170  27 Female      620       97       54       87  2.70  2.75       3
## 179  28   Male      500       46       54       52  2.90  2.75       3
## 181  29   Male      560       57       74       73  2.80  3.00       3
## 212  25   Male      600       53       95       84  2.50  3.00       4
## 214  25 Female      650       87       91       93  2.50  2.50       4
## 217  25   Male      590       97       41       81  2.50  2.75       4
## 221  26   Male      560       87       45       72  2.60  3.00       4
## 223  26   Male      570       82       58       75  2.50  2.75       4
## 226  27   Male      660       97       81       94  2.50  2.50       4
## 228  27   Male      790       99       99       99  2.40  2.50       4
## 231  27   Male      620       85       85       89  3.30  3.00       4
## 235  28   Male      620       93       71       87  2.40  2.75       4
## 239  29   Male      690       99       87       97  2.30  2.25       4
## 240  29   Male      630       87       84       89  2.90  2.80       4
## 245  30   Male      550       79       45       69  2.45  2.75       4
## 246  30 Female      600       99       46       86  2.80  3.00       4
## 251  31   Male      640       79       92       92  2.70  2.75       4
## 252  32   Male      570       89       41       75  2.60  2.50       4
##     work_yrs frstlang salary satis
## 5          2  English    999     5
## 9          2    Other    999     4
## 21         4  English    999     4
## 26         5  English    999     6
## 30         4  English    999     4
## 78         3  English    999     5
## 87         3  English    999     1
## 91         2  English    999     4
## 99         4  English    999     3
## 101        4    Other    999     4
## 105        6  English    999     5
## 108        4  English    999     6
## 145        1  English    999     5
## 152        3  English    999     6
## 158        4  English    999     5
## 161        4    Other    999     6
## 166        0  English    999     5
## 170        2    Other    999     2
## 179        9  English    999     6
## 181        4  English    999     5
## 212        2  English    999     4
## 214        3  English    999     7
## 217        2    Other    999     4
## 221        3    Other    999     3
## 223        3  English    999     6
## 226        4  English    999     4
## 228        4  English    999     6
## 231        1  English    999     5
## 235        3  English    999     4
## 239        7  English    999     5
## 240        3  English    999     4
## 245        5    Other    999     4
## 246        6    Other    999     4
## 251        7  English    999     3
## 252        4    Other    999     3

DataFrame-3: students who are not yet placed

NotPlaced <- MBA_Data[which(MBA_Data$salary==0),]
NotPlaced
##     age    sex gmat_tot gmat_qpc gmat_vpc gmat_tpc s_avg f_avg quarter
## 1    23 Female      620       77       87       87  3.40  3.00       1
## 2    24   Male      610       90       71       87  3.50  4.00       1
## 3    24   Male      670       99       78       95  3.30  3.25       1
## 4    24   Male      570       56       81       75  3.30  2.67       1
## 6    24   Male      640       82       89       91  3.90  3.75       1
## 7    25   Male      610       89       74       87  3.40  3.50       1
## 8    25 Female      650       88       89       92  3.30  3.75       1
## 22   27   Male      740       99       96       99  3.50  3.50       1
## 23   27   Male      750       99       98       99  3.40  3.50       1
## 24   28 Female      540       75       50       65  3.60  4.00       1
## 25   29   Male      580       56       87       78  3.64  3.33       1
## 27   31 Female      560       60       78       72  3.30  3.75       1
## 28   32   Male      760       99       99       99  3.40  3.00       1
## 29   32   Male      640       79       91       91  3.60  3.75       1
## 31   34 Female      620       75       89       87  3.30  3.00       1
## 32   37 Female      560       43       87       72  3.40  3.50       1
## 33   42 Female      650       75       98       93  3.38  3.00       1
## 34   48   Male      590       84       62       81  3.80  4.00       1
## 70   22   Male      600       95       54       83  3.00  3.00       2
## 71   23   Male      640       89       87       92  3.00  3.00       2
## 72   24   Male      550       73       63       69  3.10  3.00       2
## 73   24   Male      570       82       58       75  3.09  3.50       2
## 74   24   Male      620       82       84       87  3.10  3.50       2
## 75   25 Female      570       61       81       76  3.00  3.25       2
## 76   25   Male      660       94       84       94  3.27  3.75       2
## 77   25   Male      680       94       92       97  3.17  3.50       2
## 88   26 Female      560       64       71       72  3.20  3.25       2
## 89   26   Male      560       87       41       72  3.00  3.00       2
## 90   26   Male      530       68       54       62  3.09  3.17       2
## 92   27   Male      720       99       95       99  3.10  3.25       2
## 93   27   Male      590       60       87       81  3.00  2.75       2
## 97   28   Male      620       81       90       89  3.20  3.00       2
## 98   28 Female      610       85       78       86  3.10  3.00       2
## 100  29   Male      660       94       87       94  3.00  3.00       2
## 102  29   Male      510       57       50       55  3.27  3.40       2
## 103  29 Female      640       90       84       92  3.20  3.00       2
## 104  29   Male      610       91       62       86  3.10  3.67       2
## 106  29   Male      580       79       67       78  3.00  3.25       2
## 107  30   Male      680       97       87       96  3.00  3.00       2
## 109  32 Female      610       64       89       86  3.25  0.00       2
## 110  35   Male      540       43       78       65  3.20  3.25       2
## 111  35   Male      630       66       95       90  3.08  3.25       2
## 112  36 Female      530       48       71       62  3.00  2.50       2
## 113  36   Male      650       87       89       93  3.00  3.20       2
## 114  43   Male      630       82       87       89  3.10  3.00       2
## 140  23   Male      720       95       98       99  2.80  2.50       3
## 141  24 Female      640       94       78       92  2.90  3.25       3
## 142  24   Male      710       96       97       99  2.80  2.75       3
## 143  24   Male      670       94       89       96  2.70  3.00       3
## 144  24 Female      710       97       97       99  2.80  3.00       3
## 146  24   Male      600       89       62       83  2.90  3.00       3
## 147  24 Female      640       96       71       91  2.70  2.50       3
## 150  25   Male      550       72       58       69  2.90  3.00       3
## 151  25   Male      710       99       91       98  2.90  3.25       3
## 159  26   Male      560       56       81       72  2.80  3.25       3
## 160  26   Male      540       52       71       65  2.70  2.75       3
## 162  26 Female      570       48       89       75  2.82  2.50       3
## 163  26   Male      610       82       81       86  2.90  2.75       3
## 164  27   Male      650       89       84       93  2.90  3.00       3
## 165  27 Female      550       66       63       69  2.90  3.00       3
## 167  27   Male      610       97       45       86  2.70  2.50       3
## 168  27 Female      630       82       89       89  2.70  3.25       3
## 169  27 Female      560       61       74       73  2.80  3.25       3
## 180  29   Male      590       92       58       81  2.80  2.75       3
## 182  32   Male      550       52       78       71  2.70  2.75       3
## 183  34   Male      610       79       81       86  2.80  3.00       3
## 184  34   Male      610       82       78       86  2.70  3.00       3
## 185  43   Male      480       49       41       45  2.90  3.25       3
## 213  25   Male      730       98       96       99  2.40  2.75       4
## 218  25   Male      700       99       87       98  2.00  2.00       4
## 219  26   Male      660       93       87       95  2.60  2.00       4
## 220  26   Male      450       28       46       34  2.10  2.00       4
## 222  26   Male      600       75       78       83  2.20  2.25       4
## 227  27 Female      560       59       74       73  2.40  2.50       4
## 229  27   Male      630       93       78       91  2.10  2.50       4
## 230  27   Male      580       84       58       78  2.70  2.75       4
## 232  27   Male      670       89       91       95  3.60  3.25       4
## 233  27   Male      580       74       70       78  3.40  3.25       4
## 234  28   Male      560       74       67       73  3.60  3.60       4
## 236  28   Male      710       94       98       99  3.40  3.75       4
## 237  28   Male      570       69       71        0  2.30  2.50       4
## 238  29   Male      530       35       81       62  3.30  2.75       4
## 241  29   Male      670       91       91       95  3.30  3.25       4
## 242  29   Male      630       99       50       89  2.90  3.25       4
## 243  29 Female      680       89       96       96  2.80  3.00       4
## 244  30   Male      650       88       92       93  3.45  3.83       4
## 250  31   Male      570       75       62       75  2.80  3.00       4
## 253  32   Male      510       79       22       54  2.30  2.25       4
## 254  35   Male      570       72       71       75  3.30  4.00       4
## 255  39 Female      700       89       98       98  3.30  3.25       4
##     work_yrs frstlang salary satis
## 1          2  English      0     7
## 2          2  English      0     6
## 3          2  English      0     6
## 4          1  English      0     7
## 6          2  English      0     6
## 7          2  English      0     5
## 8          2  English      0     6
## 22         3  English      0     6
## 23         1    Other      0     5
## 24         5  English      0     5
## 25         3  English      0     5
## 27        10  English      0     7
## 28         5  English      0     5
## 29         7  English      0     6
## 31         7  English      0     6
## 32         9  English      0     6
## 33        13  English      0     5
## 34        22  English      0     6
## 70         1  English      0     5
## 71         2  English      0     7
## 72         0    Other      0     5
## 73         2  English      0     6
## 74         1  English      0     5
## 75         3  English      0     4
## 76         2  English      0     5
## 77         2  English      0     6
## 88         3  English      0     6
## 89         3  English      0     6
## 90         4    Other      0     5
## 92         5  English      0     5
## 93         3  English      0     6
## 97         4  English      0     6
## 98         5  English      0     6
## 100        1  English      0     6
## 102        5  English      0     5
## 103        3  English      0     5
## 104        7  English      0     5
## 106        4  English      0     6
## 107        4  English      0     5
## 109       11  English      0     7
## 110        8  English      0     5
## 111       12  English      0     5
## 112        7  English      0     5
## 113       18  English      0     6
## 114       16  English      0     5
## 140        1  English      0     5
## 141        2    Other      0     4
## 142        2  English      0     7
## 143        2  English      0     7
## 144        2  English      0     7
## 146        1  English      0     6
## 147        2  English      0     6
## 150        3  English      0     6
## 151        1  English      0     6
## 159        4  English      0     6
## 160        2  English      0     6
## 162        3  English      0     5
## 163        3  English      0     6
## 164        2  English      0     6
## 165        3  English      0     4
## 167        4    Other      0     5
## 168        5  English      0     6
## 169        5  English      0     6
## 180        3    Other      0     5
## 182        7  English      0     6
## 183       11  English      0     6
## 184       12  English      0     5
## 185       22  English      0     5
## 213        2  English      0     6
## 218        1  English      0     7
## 219        2  English      0     5
## 220        4  English      0     6
## 222        2  English      0     6
## 227        2  English      0     5
## 229        4  English      0     5
## 230        1  English      0     5
## 232        5  English      0     6
## 233        3  English      0     6
## 234        5  English      0     5
## 236        6  English      0     6
## 237        5  English      0     5
## 238        6  English      0     7
## 241        3  English      0     5
## 242        1    Other      0     4
## 243        4  English      0     5
## 244        2  English      0     6
## 250        1  English      0     6
## 253        5    Other      0     5
## 254        8  English      0     6
## 255        5  English      0     5

DataFrame-4: students who are placed

Placed <- MBA_Data[which(MBA_Data$salary>999),]
dim(Placed)
## [1] 103  13
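
The four data frames above partition the students by salary code. A minimal sketch of the same split as a single status factor follows. The group sizes are taken from this report (46 and 103 from dim(), 35 and 90 counted from the two listings); the placeholder salary of 100000 for placed students and the name status are assumptions for illustration.

```r
# Stand-in salary column with the group sizes seen in this report.
# 100000 is a placeholder for "some real salary", not actual data.
salary <- c(rep(998, 46), rep(999, 35), rep(0, 90), rep(100000, 103))

# One label per student, derived from the salary codes.
status <- ifelse(salary == 998, "NoSurvey",
          ifelse(salary == 999, "NotDisclosed",
          ifelse(salary == 0,   "NotPlaced", "Placed")))
status <- factor(status,
                 levels = c("NoSurvey", "NotDisclosed", "NotPlaced", "Placed"))

counts <- table(status)
print(counts)

# split() returns one element per status, analogous to the four data frames.
groups <- split(salary, status)
```

With a real data frame, split(MBA_Data, status) would return the four subsets in one list.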

A random sample of rows from the Placed data frame (some() is from the car package).

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
some(Placed)
##     age    sex gmat_tot gmat_qpc gmat_vpc gmat_tpc s_avg f_avg quarter
## 38   25 Female      650       82       91       93   3.4  3.25       1
## 41   24   Male      670       84       96       95   3.3  3.25       1
## 120  24   Male      560       52       81       72   3.2  3.25       2
## 122  23   Male      590       72       81       81   3.2  3.25       2
## 125  28 Female      580       83       58       79   3.1  3.00       2
## 193  28   Male      580       72       71       78   2.8  3.00       3
## 196  25 Female      580       72       71       78   2.8  3.25       3
## 199  29   Male      710       93       98       99   2.9  3.25       3
## 200  24   Male      710       99       92       99   2.9  3.00       3
## 201  25 Female      630       84       87       89   2.8  2.75       3
##     work_yrs frstlang salary satis
## 38         3  English  88000     7
## 41         0  English  95000     4
## 120        2  English  96000     7
## 122        2  English  98000     6
## 125        5    Other  99000     6
## 193        3  English  97000     6
## 196        2  English  98000     6
## 199        7  English  98000     5
## 200        3  English 100000     6
## 201        2  English 100000     6

Mean salary of placed MBA students by gender.

PlacedSalaryMean <- aggregate(Placed$salary,list(Gender=Placed$sex),mean)
PlacedSalaryMean
##   Gender         x
## 1 Female  98524.39
## 2   Male 104970.97

Mean salary of placed MBA students by age.

AgeSalaryMean <- aggregate(Placed$salary,list(Age=Placed$age),mean)
AgeSalaryMean
##    Age         x
## 1   22  85000.00
## 2   23  91651.20
## 3   24 101518.75
## 4   25  99086.96
## 5   26 101665.00
## 6   27 102214.29
## 7   28 103625.00
## 8   29 102083.33
## 9   30 109916.67
## 10  31 100500.00
## 11  32 107300.00
## 12  33 118000.00
## 13  34 105000.00
## 14  39 112000.00
## 15  40 183000.00

Merging Placed, NotdisclosedSalary and NotPlaced to give KnownMBA_DATA

KnownMBA_DATA <- rbind(Placed,NotdisclosedSalary,NotPlaced)
View(KnownMBA_DATA)

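Creating the dummy variable GotPlaced, which is TRUE for students who got a job and FALSE otherwise. Students who did not disclose their salary (code 999) fall below the threshold, so they are counted together with the students who are not yet placed.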

KnownMBA_DATA$GotPlaced <- (KnownMBA_DATA$salary>1000)
View(KnownMBA_DATA)
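
To make the behaviour of this dummy variable concrete, here is a minimal sketch on a made-up salary vector. It shows the logical result, its 0/1 form, and one possible variant (an assumption, not the approach used in this analysis) that marks undisclosed salaries as NA instead of FALSE.

```r
# Made-up salaries: placed, undisclosed (999), not placed (0), placed.
salary <- c(85000, 999, 0, 96000)

# Logical dummy: TRUE only for real salaries.
got_placed <- salary > 999

# The same dummy as 0/1 integers.
got_placed_int <- as.integer(got_placed)

# Variant: treat "did not disclose" as unknown rather than not placed.
got_placed_na <- ifelse(salary == 999, NA, salary > 999)

print(got_placed)
print(got_placed_int)
print(got_placed_na)
```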

Structure of the KnownMBA_DATA DataFrame

str(KnownMBA_DATA)
## 'data.frame':    228 obs. of  14 variables:
##  $ age      : int  22 27 25 25 27 28 24 25 25 25 ...
##  $ sex      : Factor w/ 2 levels "Female","Male": 1 1 1 1 2 1 2 1 1 2 ...
##  $ gmat_tot : int  660 700 680 650 710 620 670 560 530 650 ...
##  $ gmat_qpc : int  90 94 87 82 96 52 84 52 50 79 ...
##  $ gmat_vpc : int  92 98 96 91 96 98 96 81 62 93 ...
##  $ gmat_tpc : int  94 98 96 93 98 87 95 72 61 93 ...
##  $ s_avg    : num  3.5 3.3 3.5 3.4 3.3 3.4 3.3 3.3 3.6 3.3 ...
##  $ f_avg    : num  3.75 3.25 2.67 3.25 3.5 3.75 3.25 3.5 3.67 3.5 ...
##  $ quarter  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ work_yrs : int  1 2 2 3 2 5 0 1 3 1 ...
##  $ frstlang : Factor w/ 2 levels "English","Other": 1 1 1 1 1 1 1 1 1 1 ...
##  $ salary   : int  85000 85000 86000 88000 92000 93000 95000 95000 95000 96000 ...
##  $ satis    : int  5 6 5 7 6 5 4 5 3 7 ...
##  $ GotPlaced: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

Number of males and females.

table(KnownMBA_DATA$sex)
## 
## Female   Male 
##     59    169

Contingency Tables for KnownMBA_DATA DataFrame

table1 <- xtabs(~GotPlaced,data=KnownMBA_DATA)
addmargins(table1)
## GotPlaced
## FALSE  TRUE   Sum 
##   125   103   228
prop.table(table1)
## GotPlaced
##     FALSE      TRUE 
## 0.5482456 0.4517544
table2 <- xtabs(~GotPlaced+sex,data=KnownMBA_DATA)
addmargins(table2)
##          sex
## GotPlaced Female Male Sum
##     FALSE     28   97 125
##     TRUE      31   72 103
##     Sum       59  169 228
prop.table(table2,2)
##          sex
## GotPlaced    Female      Male
##     FALSE 0.4745763 0.5739645
##     TRUE  0.5254237 0.4260355
table3 <- xtabs(~GotPlaced+frstlang,data=KnownMBA_DATA)
addmargins(table3)
##          frstlang
## GotPlaced English Other Sum
##     FALSE     108    17 125
##     TRUE       96     7 103
##     Sum       204    24 228
prop.table(table3,2)
##          frstlang
## GotPlaced   English     Other
##     FALSE 0.5294118 0.7083333
##     TRUE  0.4705882 0.2916667

Correlation & Visualization Matrix

Data <- KnownMBA_DATA[,c(1,10,3:9,12,13)]
CorrMatrix <- cor(Data)
round(CorrMatrix,2)
##            age work_yrs gmat_tot gmat_qpc gmat_vpc gmat_tpc s_avg f_avg
## age       1.00     0.86    -0.14    -0.21    -0.03    -0.16  0.16 -0.04
## work_yrs  0.86     1.00    -0.19    -0.25    -0.06    -0.17  0.14 -0.06
## gmat_tot -0.14    -0.19     1.00     0.72     0.74     0.83  0.11  0.10
## gmat_qpc -0.21    -0.25     0.72     1.00     0.14     0.65 -0.04  0.09
## gmat_vpc -0.03    -0.06     0.74     0.14     1.00     0.63  0.21  0.07
## gmat_tpc -0.16    -0.17     0.83     0.65     0.63     1.00  0.11  0.08
## s_avg     0.16     0.14     0.11    -0.04     0.21     0.11  1.00  0.54
## f_avg    -0.04    -0.06     0.10     0.09     0.07     0.08  0.54  1.00
## quarter  -0.07    -0.12    -0.07     0.05    -0.18    -0.07 -0.76 -0.41
## salary   -0.11    -0.03    -0.03    -0.03     0.02     0.04  0.16  0.04
## satis    -0.07     0.03     0.03    -0.10     0.19     0.09  0.06 -0.04
##          quarter salary satis
## age        -0.07  -0.11 -0.07
## work_yrs   -0.12  -0.03  0.03
## gmat_tot   -0.07  -0.03  0.03
## gmat_qpc    0.05  -0.03 -0.10
## gmat_vpc   -0.18   0.02  0.19
## gmat_tpc   -0.07   0.04  0.09
## s_avg      -0.76   0.16  0.06
## f_avg      -0.41   0.04 -0.04
## quarter     1.00  -0.19 -0.03
## salary     -0.19   1.00  0.28
## satis      -0.03   0.28  1.00
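
The salary column of KnownMBA_DATA still holds 0 for students who are not placed and 999 for undisclosed salaries, so the salary row of the matrix above mixes those values with real amounts. The sketch below, on made-up numbers (not the case data), illustrates how much such values can change a correlation and how use = "complete.obs" skips rows recoded to NA.

```r
# Made-up data: salary rises by 5000 per year of experience for placed
# students, with one undisclosed (999) and one not-placed (0) student mixed in.
work_yrs <- c(1, 2, 3, 4, 5, 6)
salary   <- c(90000, 95000, 999, 105000, 0, 115000)

# Correlation with the 0 and 999 values treated as salaries.
cor_all <- cor(work_yrs, salary)

# Recode non-salaries to NA and use only complete pairs.
salary_na <- replace(salary, salary <= 999, NA)
cor_placed <- cor(work_yrs, salary_na, use = "complete.obs")

print(round(c(all = cor_all, placed_only = cor_placed), 2))
```

Computing the matrix on the Placed data frame instead would be one way to apply this to the real data.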

corrplot of various variables

library(corrplot)
## corrplot 0.84 loaded
corrplot(corr = cor(Data),method="circle")

Scatterplots of salary against age, sex, work_yrs and first language

library(car)
scatterplotMatrix(~KnownMBA_DATA$age+KnownMBA_DATA$sex+KnownMBA_DATA$work_yrs+KnownMBA_DATA$frstlang+KnownMBA_DATA$salary)

Salary based on GMAT Score

scatterplotMatrix(~KnownMBA_DATA$gmat_tot+KnownMBA_DATA$gmat_qpc+KnownMBA_DATA$gmat_vpc+KnownMBA_DATA$gmat_tpc+KnownMBA_DATA$salary)

Salary based on Academic Performance (s_avg,f_avg,Quarter Ranking)

scatterplotMatrix(~KnownMBA_DATA$s_avg+KnownMBA_DATA$f_avg+KnownMBA_DATA$quarter+KnownMBA_DATA$salary)

Hypothesis Statement

Test 1 (table2), H1: The percentage of females placed is higher than the percentage of males placed.

Test 2 (table3), H2: The percentage placed among students whose first language is English is higher than the percentage placed among students with another first language.

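Pearson’s Chi-square test

chisq.test() tests whether placement is independent of sex (table2) and of first language (table3). In both results below the p-value is above 0.05, so neither H1 nor H2 is supported at the 5% level.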

chisq.test(table2)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table2
## X-squared = 1.366, df = 1, p-value = 0.2425
chisq.test(table3)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table3
## X-squared = 2.1002, df = 1, p-value = 0.1473
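
The two tests above can be reproduced without the CSV file by rebuilding the contingency tables from the printed counts. This sketch does that and also prints the expected counts, since the chi-square approximation is usually considered reliable only when they are not too small (a common rule of thumb is at least 5 per cell). The object names test2 and test3 are introduced here for illustration.

```r
# Cell counts copied from table2 and table3 above (columns filled first).
table2 <- matrix(c(28, 31, 97, 72), nrow = 2,
                 dimnames = list(GotPlaced = c("FALSE", "TRUE"),
                                 sex = c("Female", "Male")))
table3 <- matrix(c(108, 96, 17, 7), nrow = 2,
                 dimnames = list(GotPlaced = c("FALSE", "TRUE"),
                                 frstlang = c("English", "Other")))

# For 2x2 tables chisq.test() applies Yates' continuity correction by default.
test2 <- chisq.test(table2)
test3 <- chisq.test(table3)

print(round(test2$expected, 1))
print(round(test3$expected, 1))
print(c(sex = test2$p.value, frstlang = test3$p.value))
```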

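t-tests

Note: t.test() applied to a contingency table treats its four cell counts as one numeric sample and tests whether their mean (228/4 = 57) differs from zero. The two results below therefore do not test H1 or H2.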

t.test(table2)
## 
##  One Sample t-test
## 
## data:  table2
## t = 3.4156, df = 3, p-value = 0.04198
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##    3.89032 110.10968
## sample estimates:
## mean of x 
##        57
t.test(table3)
## 
##  One Sample t-test
## 
## data:  table3
## t = 2.1776, df = 3, p-value = 0.1176
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -26.30272 140.30272
## sample estimates:
## mean of x 
##        57
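
Since H1 is a directional statement about two proportions, one option (a suggestion, not part of the original analysis) is a one-sided two-sample proportion test with prop.test(). The sketch uses the placed and total counts from table2; the names placed, total and h1 are introduced here for illustration.

```r
# Placed students and group sizes by sex, copied from table2.
placed <- c(Female = 31, Male = 72)
total  <- c(Female = 59, Male = 169)

# H1: proportion placed among females > proportion placed among males.
h1 <- prop.test(x = placed, n = total, alternative = "greater")

print(h1$estimate)   # the two sample proportions
print(h1$p.value)    # one-sided p-value
```

The same call with x = c(96, 7) and n = c(204, 24) would address H2 for first language.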

Selecting a regression model of the form y = f(x), where the vector of predictors x differs between models

colnames(KnownMBA_DATA)
##  [1] "age"       "sex"       "gmat_tot"  "gmat_qpc"  "gmat_vpc" 
##  [6] "gmat_tpc"  "s_avg"     "f_avg"     "quarter"   "work_yrs" 
## [11] "frstlang"  "salary"    "satis"     "GotPlaced"

Model1: salary ~ work_yrs + s_avg + frstlang

Model1 <- salary ~  work_yrs + s_avg  + frstlang 
fit <- lm(Model1,data=Placed)
summary(fit)
## 
## Call:
## lm(formula = Model1, data = Placed)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -35244  -8500   -360   4263  78532 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    84104.3    13063.3   6.438 4.36e-09 ***
## work_yrs        2409.9      539.7   4.465 2.12e-05 ***
## s_avg           2948.6     4249.8   0.694   0.4894    
## frstlangOther  13843.9     6399.1   2.163   0.0329 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15780 on 99 degrees of freedom
## Multiple R-squared:  0.2433, Adjusted R-squared:  0.2203 
## F-statistic: 10.61 on 3 and 99 DF,  p-value: 4.134e-06
library(leaps)
leap <- regsubsets(Model1, data = Placed, nbest=1)
summary(leap)
## Subset selection object
## Call: regsubsets.formula(Model1, data = Placed, nbest = 1)
## 3 Variables  (and intercept)
##               Forced in Forced out
## work_yrs          FALSE      FALSE
## s_avg             FALSE      FALSE
## frstlangOther     FALSE      FALSE
## 1 subsets of each size up to 3
## Selection Algorithm: exhaustive
##          work_yrs s_avg frstlangOther
## 1  ( 1 ) "*"      " "   " "          
## 2  ( 1 ) "*"      " "   "*"          
## 3  ( 1 ) "*"      "*"   "*"
plot(leap,scale="adjr2")
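
As a worked reading of the Model1 coefficient table above, the sketch below plugs the printed estimates into the fitted equation. The function predict_salary and the example inputs (3 years of work experience, spring average of 3.0) are hypothetical and only illustrate how the coefficients combine.

```r
# Estimates copied from summary(fit) for Model1.
coefs <- c(intercept = 84104.3, work_yrs = 2409.9,
           s_avg = 2948.6, frstlangOther = 13843.9)

# Fitted salary = intercept + slopes * inputs; the frstlangOther term is
# added only when the first language is not English (dummy = 1).
predict_salary <- function(work_yrs, s_avg, other_language) {
  unname(coefs["intercept"] +
         coefs["work_yrs"] * work_yrs +
         coefs["s_avg"] * s_avg +
         coefs["frstlangOther"] * as.numeric(other_language))
}

english <- predict_salary(work_yrs = 3, s_avg = 3.0, other_language = FALSE)
other   <- predict_salary(work_yrs = 3, s_avg = 3.0, other_language = TRUE)
print(c(English = english, Other = other))
```

With the fitted object available, predict(fit, newdata = ...) would give the same values without copying coefficients by hand.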

Model2: salary ~ sex + gmat_qpc + gmat_vpc + gmat_tpc + work_yrs + frstlang + s_avg + f_avg + satis

Model2 <- salary ~  sex + gmat_qpc + gmat_vpc + gmat_tpc + work_yrs + frstlang +s_avg + f_avg + satis  
fit2 <- lm(Model2,data=Placed)
summary(fit2)
## 
## Call:
## lm(formula = Model2, data = Placed)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -31321  -8562   -877   5606  72768 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    97896.6    21628.0   4.526 1.77e-05 ***
## sexMale         5150.9     3479.3   1.480 0.142137    
## gmat_qpc         834.5      361.0   2.311 0.023022 *  
## gmat_vpc         646.9      358.4   1.805 0.074333 .  
## gmat_tpc       -1514.7      701.3  -2.160 0.033364 *  
## work_yrs        2273.7      575.5   3.951 0.000152 ***
## frstlangOther  13158.0     6527.4   2.016 0.046704 *  
## s_avg           5790.8     4948.5   1.170 0.244906    
## f_avg          -2288.3     3772.0  -0.607 0.545551    
## satis          -1340.1     2037.2  -0.658 0.512291    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15440 on 93 degrees of freedom
## Multiple R-squared:  0.3191, Adjusted R-squared:  0.2532 
## F-statistic: 4.843 on 9 and 93 DF,  p-value: 2.592e-05
library(leaps)
leap2 <- regsubsets(Model2,data=Placed,nbest=1)
summary(leap2)
## Subset selection object
## Call: regsubsets.formula(Model2, data = Placed, nbest = 1)
## 9 Variables  (and intercept)
##               Forced in Forced out
## sexMale           FALSE      FALSE
## gmat_qpc          FALSE      FALSE
## gmat_vpc          FALSE      FALSE
## gmat_tpc          FALSE      FALSE
## work_yrs          FALSE      FALSE
## frstlangOther     FALSE      FALSE
## s_avg             FALSE      FALSE
## f_avg             FALSE      FALSE
## satis             FALSE      FALSE
## 1 subsets of each size up to 8
## Selection Algorithm: exhaustive
##          sexMale gmat_qpc gmat_vpc gmat_tpc work_yrs frstlangOther s_avg
## 1  ( 1 ) " "     " "      " "      " "      "*"      " "           " "  
## 2  ( 1 ) " "     " "      " "      " "      "*"      "*"           " "  
## 3  ( 1 ) "*"     " "      " "      " "      "*"      "*"           " "  
## 4  ( 1 ) " "     "*"      "*"      "*"      "*"      " "           " "  
## 5  ( 1 ) " "     "*"      "*"      "*"      "*"      "*"           " "  
## 6  ( 1 ) "*"     "*"      "*"      "*"      "*"      "*"           " "  
## 7  ( 1 ) "*"     "*"      "*"      "*"      "*"      "*"           "*"  
## 8  ( 1 ) "*"     "*"      "*"      "*"      "*"      "*"           "*"  
##          f_avg satis
## 1  ( 1 ) " "   " "  
## 2  ( 1 ) " "   " "  
## 3  ( 1 ) " "   " "  
## 4  ( 1 ) " "   " "  
## 5  ( 1 ) " "   " "  
## 6  ( 1 ) " "   " "  
## 7  ( 1 ) " "   " "  
## 8  ( 1 ) " "   "*"
plot(leap2,scale="adjr2")

CONCLUSION

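Model 2 fits better than Model 1: its adjusted R-squared is higher (0.2532 vs 0.2203). Its explanatory variables for salary variation are sexMale, gmat_qpc, gmat_vpc, gmat_tpc, work_yrs, frstlangOther, s_avg, f_avg and satis, of which only gmat_qpc, gmat_tpc, work_yrs and frstlangOther are significant at the 5% level.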
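
The comparison rests on adjusted R-squared, which penalises the extra predictors in Model 2. As a sketch, the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1) can be checked against the printed summaries, using n = 103 placed students and p = 3 or 9 predictors. The helper adj_r2 is introduced here for illustration.

```r
# Adjusted R-squared from multiple R-squared, sample size n and
# number of predictors p (intercept not counted).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Multiple R-squared values copied from summary(fit) and summary(fit2).
model1 <- adj_r2(0.2433, n = 103, p = 3)
model2 <- adj_r2(0.3191, n = 103, p = 9)

print(round(c(Model1 = model1, Model2 = model2), 4))
```

Because Model1's predictors are a subset of Model2's, anova(fit, fit2) would be a further way to check whether the additional predictors improve the fit significantly.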