DeansDilemma

Tanushree
4th October 2017

Data Summary

dean = read.csv("DeansDilemmaData.csv")
summary(dean)

      SlNo       Gender   Percent_SSC     Board_SSC   Board_CBSE
 Min.   :  1.0   F:127   Min.   :37.00   CBSE  :113   No :278   
 1st Qu.: 98.5   M:264   1st Qu.:56.00   ICSE  : 77   Yes:113   
 Median :196.0           Median :64.50   Others:201             
 Mean   :196.0           Mean   :64.65                          
 3rd Qu.:293.5           3rd Qu.:74.00                          
 Max.   :391.0           Max.   :87.20                          

 Board_ICSE  Percent_HSC    Board_HSC      Stream_HSC  Percent_Degree 
 No :314    Min.   :40.0   CBSE  : 96   Arts    : 18   Min.   :35.00  
 Yes: 77    1st Qu.:54.0   ISC   : 48   Commerce:222   1st Qu.:57.52  
            Median :63.0   Others:247   Science :151   Median :63.00  
            Mean   :63.8                               Mean   :62.98  
            3rd Qu.:72.0                               3rd Qu.:69.00  
            Max.   :94.7                               Max.   :89.00  

               Course_Degree Degree_Engg Experience_Yrs   Entrance_Test
 Arts                 : 13   No :354     Min.   :0.0000   MAT    :265  
 Commerce             :117   Yes: 37     1st Qu.:0.0000   None   : 67  
 Computer Applications: 32               Median :0.0000   K-MAT  : 24  
 Engineering          : 37               Mean   :0.4783   CAT    : 22  
 Management           :163               3rd Qu.:1.0000   PGCET  :  8  
 Others               :  5               Max.   :3.0000   GCET   :  2  
 Science              : 24                                (Other):  3  
     S.TEST       Percentile_ET    S.TEST.SCORE    Percent_MBA   
 Min.   :0.0000   Min.   : 0.00   Min.   : 0.00   Min.   :50.83  
 1st Qu.:1.0000   1st Qu.:41.19   1st Qu.:41.19   1st Qu.:57.20  
 Median :1.0000   Median :62.00   Median :62.00   Median :61.01  
 Mean   :0.8286   Mean   :54.93   Mean   :54.93   Mean   :61.67  
 3rd Qu.:1.0000   3rd Qu.:78.00   3rd Qu.:78.00   3rd Qu.:66.02  
 Max.   :1.0000   Max.   :98.69   Max.   :98.69   Max.   :77.89  

           Specialization_MBA Marks_Communication Marks_Projectwork
 Marketing & Finance:222      Min.   :50.00       Min.   :50.00    
 Marketing & HR     :156      1st Qu.:53.00       1st Qu.:64.00    
 Marketing & IB     : 13      Median :58.00       Median :69.00    
                              Mean   :60.54       Mean   :68.36    
                              3rd Qu.:67.00       3rd Qu.:74.00    
                              Max.   :88.00       Max.   :87.00    

   Marks_BOCA         Placement       Salary      
 Min.   :50.00   Not Placed: 79   Min.   :     0  
 1st Qu.:57.00   Placed    :312   1st Qu.:172800  
 Median :63.00                    Median :240000  
 Mean   :64.38                    Mean   :219078  
 3rd Qu.:72.50                    3rd Qu.:300000  
 Max.   :96.00                    Max.   :940000

1. What is the median salary of all the students?

median(dean$Salary)

[1] 240000

2. What percentage of students were placed? Round the answer to 2 decimal places.

round(prop.table(table(dean$Placement)) * 100, 2)["Placed"]

Placed 
  79.8
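As a quick cross-check, the same percentage can be recomputed from the counts printed in the summary above (312 placed, 79 not placed). This is only a sketch of the arithmetic; the variable names are illustrative and not part of the original analysis.

```r
# Counts taken from the Placement column of summary(dean) above
n_placed <- 312
n_not_placed <- 79

# Share of placed students, as a percentage rounded to 2 decimals
pct_placed <- round(100 * n_placed / (n_placed + n_not_placed), 2)
pct_placed
```

R prints the result without the trailing zero, which is why the output above reads 79.8 rather than 79.80.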

3. Create a dataframe called 'placed', containing only those students who were successfully placed.

placed = subset(dean, Placement == 'Placed')

4. What is the median salary of students who were placed?

median(placed$Salary)

[1] 260000

5. Create a table showing the mean salary of males and females who were placed.

aggregate(Salary ~ Gender, data = placed , mean)

  Gender   Salary
1      F 253068.0
2      M 284241.9

6. Generate a histogram showing the distribution of MBA performance (Percent_MBA) of the students who were placed.

hist(placed$Percent_MBA, freq = FALSE)

[Plot: histogram of Percent_MBA for placed students, density scale]

7. Create a dataframe called 'notplaced', containing only those students who were NOT placed after their MBA.

notplaced = subset(dean, Placement == 'Not Placed')
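The two `subset()` calls can also be expressed as a single `split()` on the Placement column. The sketch below uses a small made-up data frame (`toy`, with invented values) so that it runs without the CSV file; it is an alternative illustration, not the method used in this report.

```r
# Toy stand-in for the real data; all values here are made up
toy <- data.frame(
  Placement = c("Placed", "Not Placed", "Placed", "Placed"),
  Salary    = c(250000, 0, 300000, 220000)
)

# split() returns a named list with one data frame per Placement level
groups <- split(toy, toy$Placement)
placed_toy    <- groups[["Placed"]]
notplaced_toy <- groups[["Not Placed"]]

nrow(placed_toy)
nrow(notplaced_toy)
```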

8. Generate two histograms side-by-side, visually comparing the MBA performance of Placed and Not Placed students, as shown below.

library(lattice)
histogram(~ Percent_MBA | Placement, data = dean)

[Plot: lattice histograms of Percent_MBA, one panel per Placement level]

9. Generate two boxplots, one below the other, comparing the distribution of salaries of males and females who were placed, as shown below.

boxplot(Salary ~ Gender, data = placed, horizontal = TRUE, col = c("lightpink", "skyblue"))

[Plot: horizontal boxplots of Salary by Gender for placed students]

Create a dataframe called 'placedET', representing students who were placed after the MBA and who also gave some MBA entrance test before admission into the MBA program.

placedET = subset(placed,Entrance_Test != 'None')

Draw a Scatter Plot Matrix for 3 variables -- {Salary, Percent_MBA, Percentile_ET} using the dataframe placedET

library(car)

attach(placedET)
scatterplotMatrix(~ Salary + Percent_MBA + Percentile_ET, data = placedET,
    main = "Three variable plot {Salary, Percent_MBA, Percentile_ET}")

[Plot: scatter plot matrix of Salary, Percent_MBA and Percentile_ET for placedET]

What is the mean salary of all the students who were placed?

mean(placed$Salary)

[1] 274550
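This figure can be cross-checked against other numbers printed in this report. The sketch below assumes group sizes of 97 women and 215 men among the placed students, which is what the degrees of freedom (96 and 214) reported by the F test later in this document imply. The second check additionally assumes that students who were not placed have a recorded salary of 0, which the summary's minimum of 0 suggests but does not prove.

```r
# Group means from the aggregate() table; group sizes inferred from the
# F-test degrees of freedom (assumption: n = df + 1 for each group)
mean_f <- 253068.0; n_f <- 97
mean_m <- 284241.9; n_m <- 215

# Weighted average of the two group means = mean salary of placed students
mean_placed <- (n_f * mean_f + n_m * mean_m) / (n_f + n_m)
mean_placed

# If the 79 not-placed students all have Salary 0 (assumption), the mean
# over all 391 students should match the 219078 shown in summary(dean)
mean_all <- mean_placed * (n_f + n_m) / 391
mean_all
```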

Test whether the distribution of the salaries is normal.

Shapiro Test

shapiro.test(dean$Salary)


    Shapiro-Wilk normality test

data:  dean$Salary
W = 0.89789, p-value = 1.619e-15

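The p-value (1.619e-15) is less than 0.05, so the null hypothesis of normality is rejected: there is strong evidence that the salaries are not normally distributed. Note that dean$Salary covers all 391 students, including the 79 who were not placed.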
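Because the test above runs on `dean$Salary`, whose minimum is 0, a block of zero salaries may by itself be enough to make the data look non-normal. The simulation below is a hedged illustration of that effect only: the data are randomly generated (mean, standard deviation and seed are arbitrary choices), not taken from the real file.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated "placed" salaries drawn from a normal distribution
placed_sim <- rnorm(312, mean = 275000, sd = 90000)

# Same values plus 79 zeros standing in for not-placed students
all_sim <- c(placed_sim, rep(0, 79))

# Shapiro-Wilk p-values with and without the block of zeros
p_placed_only <- shapiro.test(placed_sim)$p.value
p_with_zeros  <- shapiro.test(all_sim)$p.value
c(placed_only = p_placed_only, with_zeros = p_with_zeros)
```

On the real data, running `shapiro.test(placed$Salary)` would show whether salaries are non-normal among placed students alone.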

QQ Plot

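# Note: this plots the salaries in placedET (the data frame attached above),
# whereas the Shapiro-Wilk test was run on dean$Salary
qqnorm(placedET$Salary)
qqline(placedET$Salary)

[Plot: normal Q-Q plot of Salary with reference line]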

Histogram and Density Curve

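# Uses the salaries in placedET (the data frame attached above)
hist(placedET$Salary, freq = FALSE)
lines(density(placedET$Salary), lwd = 2)

[Plot: histogram of Salary with kernel density curve overlaid]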

Create a table giving the mean salary of the men and the women MBAs who were placed.

aggregate(Salary ~ Gender, data = placed , mean)

  Gender   Salary
1      F 253068.0
2      M 284241.9

Visualize the mean salary of men and women who were placed, using plotmeans()

library(gplots)

plotmeans(Salary ~ Gender, data = placed, frame = TRUE)

[Plot: mean Salary by Gender with confidence intervals, from plotmeans()]

Create a table giving the variance of the salary of the men and women MBAs who were placed.

aggregate(Salary ~ Gender, data = placed , var)

  Gender     Salary
1      F 5504236572
2      M 9886408520

Use a suitable statistical test to check whether the variances are significantly different.

var.test(Salary ~ Gender, data = placed, alternative = "two.sided")


    F test to compare two variances

data:  Salary by Gender
F = 0.55675, num df = 96, denom df = 214, p-value = 0.00135
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3999212 0.7927360
sample estimates:
ratio of variances 
         0.5567478

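The p-value (0.00135) is below 0.05, so we reject the null hypothesis of equal variances: the variance of salaries differs significantly between placed men and women. The estimated ratio of variances (F to M) is about 0.56, so women's salaries are less spread out than men's.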
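The F test output can be reproduced by hand from the two variances in the table above, which may help in reading it. The sketch below uses only base R distribution functions and takes the variances and degrees of freedom directly from the printed results; the variable names are illustrative.

```r
# Variances from the aggregate() table and degrees of freedom from var.test()
var_f <- 5504236572; df_f <- 96    # women (numerator)
var_m <- 9886408520; df_m <- 214   # men (denominator)

# Test statistic: ratio of the two sample variances
f_stat <- var_f / var_m

# Two-sided p-value: twice the smaller tail probability under F(df_f, df_m)
p_lower <- pf(f_stat, df_f, df_m)
p_value <- 2 * min(p_lower, 1 - p_lower)

# 95% confidence interval for the true ratio of variances
ci <- f_stat / qf(c(0.975, 0.025), df_f, df_m)

f_stat
p_value
ci
```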

Use a suitable statistical test to check whether there is a significant difference between the salaries of men and women.

t.test(Salary ~ Gender, data = placed)


    Welch Two Sample t-test

data:  Salary by Gender
t = -3.0757, df = 243.03, p-value = 0.00234
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -51138.42 -11209.22
sample estimates:
mean in group F mean in group M 
       253068.0        284241.9

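The p-value (0.00234) is below 0.05, so we reject the null hypothesis of equal means: the mean salaries of placed men and women differ significantly. Placed men earn about 31,174 more on average than placed women (95% confidence interval for the difference: 11,209 to 51,138).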
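For readers who want to see where these numbers come from, the Welch statistic can be rebuilt from the group means, variances and sizes reported earlier. This is a sketch under the assumption that the group sizes are 97 and 215 (degrees of freedom plus one from the F test output); because the inputs are rounded printouts, the results agree with `t.test()` only to a few decimal places.

```r
# Summary statistics copied from the tables above
mean_f <- 253068.0; var_f <- 5504236572; n_f <- 97   # women
mean_m <- 284241.9; var_m <- 9886408520; n_m <- 215  # men

# Squared standard error contributed by each group
se2_f <- var_f / n_f
se2_m <- var_m / n_m
se    <- sqrt(se2_f + se2_m)

# Welch t statistic (F minus M, matching the sign in the output above)
t_stat <- (mean_f - mean_m) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_f + se2_m)^2 /
  (se2_f^2 / (n_f - 1) + se2_m^2 / (n_m - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean_f - mean_m) + c(-1, 1) * qt(0.975, df_welch) * se

t_stat
df_welch
p_value
ci
```

The Welch form is what `t.test()` uses by default (`var.equal = FALSE`), which suits this data given that the F test above rejected equal variances.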