Dean's Dilemma

Om Joy
03-10-2017

What is the median salary of all the students?

DeansDilemma <- read.csv("DeansDilemmaData.csv")
median(DeansDilemma$Salary)
[1] 240000

What percentage of students were placed? Answer this after rounding off to 2 decimal places.

round(nrow(subset(DeansDilemma, Placement == "Placed"))/nrow(DeansDilemma), digits= 2)*100
[1] 80
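Note that the call above rounds the proportion to two decimals before multiplying by 100, which is why the result is a whole number. If the intent is a percentage rounded to two decimal places, the multiplication would come first. A minimal sketch on a made-up vector (the `placement` values are hypothetical, not taken from the dataset):

```r
# Hypothetical placement statuses, for illustration only (not the real data)
placement <- c("Placed", "Not Placed", "Placed")

# mean() of a logical vector gives the proportion of TRUE values;
# scale to a percentage first, then round to 2 decimal places
pct_placed <- round(100 * mean(placement == "Placed"), digits = 2)
pct_placed
```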

Create a dataframe called 'Placed', containing only those students who were successfully placed.

Placed <- subset(DeansDilemma, Placement == "Placed")

What is the median salary of students who were placed?

median(Placed$Salary)
[1] 260000

Create a table showing the mean salary of males and females who were placed.

mytable <- aggregate(Placed$Salary ~ Placed$Gender, FUN = mean)
mytable
  Placed$Gender Placed$Salary
1             F      253068.0
2             M      284241.9
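The column headers above carry the `Placed$` prefix because the formula refers to the columns through `$`. Passing the data frame through the `data` argument of `aggregate()` should give plainer headers. A sketch under the assumption of a small stand-in data frame (`placed_demo` and its values are made up for illustration):

```r
# Hypothetical stand-in for the Placed data frame (values are made up)
placed_demo <- data.frame(
  Gender = c("F", "F", "M", "M"),
  Salary = c(200000, 300000, 250000, 350000)
)

# With data = ..., the result columns are named Gender and Salary
mean_by_gender <- aggregate(Salary ~ Gender, data = placed_demo, FUN = mean)
mean_by_gender
```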

Generate the histogram as shown below, showing a breakup of the MBA performance of the students who were placed.

hist(Placed$Percent_MBA, main = "MBA performance of placed students", xlab = "MBA Percentage", ylab = "count", col = "Grey", breaks = 2)


Create a dataframe called 'notplaced' that contains only those students who were NOT placed after their MBA.

notplaced <- subset(DeansDilemma, Placement == "Not Placed")

Generate two histograms side-by-side, visually comparing the MBA performance of Placed and Not Placed students, as shown below.

par(mfrow=c(1,2))

hist(Placed$Percent_MBA, main = "Placed students",
     xlab = "MBA Percentage", ylab = "Count", col = "lightgreen",
     breaks = 2)
hist(notplaced$Percent_MBA, main = "Not placed students",
     xlab = "MBA Percentage", ylab = "Count", col = "grey",
     breaks = 2)


par(mfrow=c(1,1))
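Resetting the layout by hand works when the previous layout is known to be 1 x 1. A slightly more general pattern, sketched here with made-up numbers, is to keep the value that `par()` returns and hand it back afterwards. The `pdf(NULL)` call is only there so the sketch runs without opening a plot window, and the vectors passed to `hist()` are hypothetical:

```r
pdf(NULL)  # off-screen device so the sketch runs without opening a window

# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))

# Hypothetical percentages, for illustration only
hist(c(55, 60, 62, 68, 71), main = "Group A", xlab = "MBA Percentage")
hist(c(50, 54, 58, 63, 66), main = "Group B", xlab = "MBA Percentage")

par(old_par)                  # restore the saved layout
layout_after <- par("mfrow")  # layout in effect after the restore
invisible(dev.off())
```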

Generate two boxplots, one below the other, comparing the distribution of salaries of males and females who were placed.

boxplot(Salary ~ Gender, data = Placed, horizontal = TRUE, xlab= "Salary", ylab = "Gender",
        main = "Comparison of salaries of Males and Females", col = "pink")


Create a dataframe called 'placedET', representing students who were placed after the MBA and who also took an MBA entrance test before admission into the MBA program.

placedET <- subset(DeansDilemma, Placement == "Placed" & S.TEST == 1)

Draw a Scatter Plot Matrix for 3 variables - {Salary, Percent_MBA, Percentile_ET} using the dataframe placedET

library(car)
scatterplotMatrix(~Salary + Percent_MBA + Percentile_ET, data = placedET, main = "Scatter Plot Matrix")

Analyzing the Salaries of Men versus Women MBA students who were placed

QUESTION 12

What is the mean salary of all the students who were placed?

mean(Placed$Salary)
[1] 274550

Test whether the distribution of the salaries is normal.

Normality Check - QQ Plot

qqnorm(Placed$Salary)
qqline(Placed$Salary)


Normality Check - Histogram and Density Curve

hist(Placed$Salary,freq=FALSE)
lines(density(Placed$Salary), lwd=2)


Normality Check - Shapiro-Wilk's Test

shapiro.test(Placed$Salary)

    Shapiro-Wilk normality test

data:  Placed$Salary
W = 0.84168, p-value < 2.2e-16
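The printed p-value is far below 0.05, so the hypothesis of normally distributed salaries is rejected, which is what the later choice of a non-parametric test rests on. The same decision can be read from the returned object instead of the printout. A sketch on made-up, right-skewed data (`salary_demo` is hypothetical and only stands in for the salary column):

```r
set.seed(1)  # reproducible made-up data

# Hypothetical right-skewed salaries (not the real data)
salary_demo <- 200000 + rexp(100, rate = 1 / 80000)

sw <- shapiro.test(salary_demo)

# The result is an "htest" list; statistic and p.value can be read directly
sw$statistic
sw$p.value

# TRUE means normality is rejected at the 5% level
reject_normality <- sw$p.value < 0.05
reject_normality
```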

QUESTION 14

Create a table giving the mean salary of the men and the women MBAs who were placed.

mytable2 <- aggregate(Placed$Salary ~ Placed$Gender, FUN = mean)
mytable2
  Placed$Gender Placed$Salary
1             F      253068.0
2             M      284241.9

Visualize the mean salary of men and women who were placed, using plotmeans()

library(gplots)
plotmeans(Placed$Salary ~ Placed$Gender, xlab = "Gender", ylab =" Salary",
          main = "Mean salary of men and women who were placed")

QUESTION 16

Create a table giving the variance of the salary of the men and women MBAs who were placed.

mytable3 <- aggregate(Placed$Salary ~ Placed$Gender, FUN = var)
mytable3
  Placed$Gender Placed$Salary
1             F    5504236572
2             M    9886408520

Use a suitable statistical test to check whether the variances are significantly different.

var_test_gender <- var.test(Salary ~ Gender, data = Placed, alternative = "two.sided")
var_test_gender

    F test to compare two variances

data:  Salary by Gender
F = 0.55675, num df = 96, denom df = 214, p-value = 0.00135
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3999212 0.7927360
sample estimates:
ratio of variances 
         0.5567478 
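With a p-value of 0.00135, the variances of the two groups differ significantly at the 5% level. The F statistic itself is simply the ratio of the two group variances from the preceding table (female over male, following the order of the groups in the output). A quick arithmetic check using the printed values:

```r
# Variances copied from the aggregate() output above
var_f <- 5504236572
var_m <- 9886408520

# The F statistic of var.test() is the ratio of the two sample variances
f_ratio <- var_f / var_m
round(f_ratio, 5)
```

The degrees of freedom in the output (96 and 214) are each group size minus one, which corresponds to 97 women and 215 men among the placed students.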

QUESTION 18

Use a suitable statistical test to check whether there is a significant difference between the salaries of men and women.

## As the salary data are not normally distributed, we perform the non-parametric Wilcoxon test.
wilcox.test(Salary ~ Gender, data = Placed)

    Wilcoxon rank sum test with continuity correction

data:  Salary by Gender
W = 8146.5, p-value = 0.001939
alternative hypothesis: true location shift is not equal to 0

Since the p-value (0.001939) is below 0.05, we reject the null hypothesis that there is no difference between the salaries of men and women. In conclusion, there is a significant difference between the salaries of men and women.

QUESTION 19

Create a table giving the mean salary of the Engineer and Non-Engineer MBAs who were placed.

mytable4 <- aggregate(Placed$Salary ~ Placed$Degree_Engg, FUN = mean)
mytable4
  Placed$Degree_Engg Placed$Salary
1                 No      269161.7
2                Yes      325200.0

Visualize the mean salary of Engineer and Non-Engineer MBAs who were placed, using plotmeans()

plotmeans(Placed$Salary ~ Placed$Degree_Engg, xlab = "Engineering or not", ylab = "Salary",
          main = "Mean salary of Engineer and Non-Engineer MBAs who were placed")


Create a table giving the variance of the salary of the Engineer and Non-Engineer MBAs who were placed.

mytable5 <- aggregate(Placed$Salary ~ Placed$Degree_Engg, FUN = var)
mytable5
  Placed$Degree_Engg Placed$Salary
1                 No    7996660450
2                Yes   12994648276

Use a suitable statistical test to check whether the variances are significantly different.

var_test_engg <- var.test(Salary ~ Degree_Engg, data = Placed, alternative = "two.sided")
var_test_engg

    F test to compare two variances

data:  Salary by Degree_Engg
F = 0.61538, num df = 281, denom df = 29, p-value = 0.05124
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3341346 1.0025622
sample estimates:
ratio of variances 
         0.6153811 
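Here the p-value of 0.05124 sits just above 0.05, so the variances are not significantly different at the 5% level. The F test assumes normally distributed groups, which the Shapiro-Wilk result earlier calls into question. One alternative that is less sensitive to non-normality is the Fligner-Killeen test, available as `fligner.test()` in the stats package. This is a swapped-in technique, not the one used in this analysis, and the sketch below runs on made-up data (`engg_demo` is a hypothetical stand-in for the placed students):

```r
set.seed(2)  # reproducible made-up data

# Hypothetical stand-in for the Placed data frame (values are made up)
engg_demo <- data.frame(
  Degree_Engg = factor(rep(c("No", "Yes"), times = c(40, 20))),
  Salary = c(250000 + rexp(40, rate = 1 / 60000),
             300000 + rexp(20, rate = 1 / 90000))
)

# Fligner-Killeen test of homogeneity of variances across groups
fk <- fligner.test(Salary ~ Degree_Engg, data = engg_demo)
fk$p.value
```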

Use a suitable statistical test to check whether there is a significant difference between the salaries of Engineer and Non-Engineer MBAs.

## As the salary data are not normally distributed, we perform the non-parametric Wilcoxon test.
wilcox.test(Salary ~ Degree_Engg, data = Placed)

    Wilcoxon rank sum test with continuity correction

data:  Salary by Degree_Engg
W = 2828.5, p-value = 0.002793
alternative hypothesis: true location shift is not equal to 0

Since the p-value (0.002793) is below 0.05, we reject the null hypothesis that there is no difference between the salaries of Engineer and Non-Engineer MBAs. In conclusion, there is a significant difference between the salaries of Engineer and Non-Engineer MBAs.
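Because the Wilcoxon test compares the location of the two distributions rather than their means, group medians are a natural companion summary to report alongside it. A sketch on made-up data (the `demo` data frame and its values are hypothetical):

```r
# Hypothetical stand-in for the Placed data frame (values are made up)
demo <- data.frame(
  Degree_Engg = rep(c("No", "Yes"), each = 5),
  Salary = c(200000, 220000, 240000, 260000, 280000,
             300000, 320000, 340000, 360000, 380000)
)

# Median salary per group
median_by_degree <- aggregate(Salary ~ Degree_Engg, data = demo, FUN = median)
median_by_degree

# Rank sum test; the p-value can be read from the returned object
wt <- wilcox.test(Salary ~ Degree_Engg, data = demo)
wt$p.value
```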