Om Joy
03-10-2017
DeansDilemma <- read.csv("DeansDilemmaData.csv")
median(DeansDilemma$Salary)
[1] 240000
round(nrow(subset(DeansDilemma, Placement == "Placed"))/nrow(DeansDilemma), digits= 2)*100
[1] 80
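The placement percentage can also be computed without `subset()`, because the mean of a logical vector is the proportion of `TRUE` values. A minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not the Dean's Dilemma file):

```r
# Hypothetical toy data standing in for DeansDilemma (not the real file)
toy <- data.frame(
  Placement = c("Placed", "Placed", "Not Placed", "Placed", "Not Placed"),
  Salary    = c(250000, 300000, 0, 220000, 0)
)

# The mean of a logical vector is the share of TRUE values
pct_placed <- round(mean(toy$Placement == "Placed") * 100)
pct_placed
```

This avoids building an intermediate data frame just to count rows.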
Placed <- subset(DeansDilemma, Placement == "Placed")
median(Placed$Salary)
[1] 260000
mytable <- aggregate(Placed$Salary ~ Placed$Gender, FUN = mean)
mytable
  Placed$Gender Placed$Salary
1             F      253068.0
2             M      284241.9
hist(Placed$Percent_MBA, main = "MBA performance of placed students", xlab = "MBA Percentage", ylab = "count", col = "Grey", breaks = 2)
notplaced <- subset(DeansDilemma, Placement == "Not Placed")
par(mfrow=c(1,2))
hist(Placed$Percent_MBA, main = "Placed students",
xlab = "MBA Percentage", ylab = "Count", col= "lightgreen",
breaks = 2)
hist(notplaced$Percent_MBA, main = "Not placed students",
xlab = "MBA Percentage", ylab = "Count", col= "grey",
breaks = 2)
par(mfrow=c(1,1))
boxplot(Salary ~ Gender, data = Placed, horizontal = TRUE, xlab= "Salary", ylab = "Gender",
main = "Comparison of salaries of Males and Females", col = "pink")
placedET <-subset(DeansDilemma, Placement == "Placed" & S.TEST == 1)
library(car)
scatterplotMatrix(~Salary + Percent_MBA + Percentile_ET, data = placedET, main = "Scatter Plot Matrix")
Analyzing the Salaries of Men versus Women MBA students who were placed
QUESTION 12
mean(Placed$Salary)
[1] 274550
qqnorm(Placed$Salary)
qqline(Placed$Salary)
hist(Placed$Salary,freq=FALSE)
lines(density(Placed$Salary), lwd=2)
shapiro.test(Placed$Salary)
Shapiro-Wilk normality test
data: Placed$Salary
W = 0.84168, p-value < 2.2e-16
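The very small p-value above means normality of `Placed$Salary` is rejected, which is what later motivates a non-parametric test. The decision can be sketched in code as follows. This is only an illustration on simulated right-skewed values; `salary_sim` and `alpha` are assumptions introduced here, not part of the original analysis.

```r
# Simulated, right-skewed values standing in for Placed$Salary (assumption)
set.seed(1)
salary_sim <- rlnorm(300, meanlog = log(260000), sdlog = 0.4)

sw <- shapiro.test(salary_sim)
alpha <- 0.05  # conventional significance level, chosen for illustration

# A small p-value means the sample is unlikely to come from a normal distribution
if (sw$p.value < alpha) {
  message("Normality rejected: prefer a non-parametric test such as wilcox.test()")
} else {
  message("No evidence against normality: a t-test is a reasonable option")
}
```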
Create a table giving the mean salary of the men and the women MBAs who were placed.
mytable2<-aggregate(Placed$Salary~ Placed$Gender, FUN=mean )
mytable2
  Placed$Gender Placed$Salary
1             F      253068.0
2             M      284241.9
library(gplots)
plotmeans(Placed$Salary ~ Placed$Gender, xlab = "Gender", ylab =" Salary",
main = "Mean salary of men and women who were placed")
QUESTION 16
mytable3<-aggregate(Placed$Salary~Placed$Gender, FUN=var)
mytable3
  Placed$Gender Placed$Salary
1             F    5504236572
2             M    9886408520
t <- var.test(Salary ~ Gender, data=Placed, alternative = "two.sided")
t
F test to compare two variances
data: Salary by Gender
F = 0.55675, num df = 96, denom df = 214, p-value = 0.00135
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3999212 0.7927360
sample estimates:
ratio of variances
0.5567478
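In this output the ratio is the variance for F divided by the variance for M (5504236572 / 9886408520 ≈ 0.557), and the 95 percent confidence interval excludes 1, so equal variances are rejected at the 5% level. A minimal sketch of reading the same pieces from the test object, using simulated data (`grp` and its values are hypothetical):

```r
# Simulated salaries with unequal spread in two groups (not the real data)
set.seed(7)
grp <- data.frame(
  Salary = c(rnorm(100, 250000, 50000), rnorm(200, 280000, 100000)),
  Gender = rep(c("F", "M"), times = c(100, 200))
)

vt <- var.test(Salary ~ Gender, data = grp)

vt$estimate  # ratio of variances: first group level (F) over second (M)
vt$conf.int  # an interval that excludes 1 indicates unequal variances
vt$p.value   # p-value for the null hypothesis that the ratio equals 1
```

Note that the F test itself assumes normal data, so with salaries this skewed its result is best read as indicative only.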
QUESTION 18
## Salary is not normally distributed (Shapiro-Wilk p-value < 2.2e-16), so we use the non-parametric Wilcoxon rank sum test.
wilcox.test(Salary~Gender,data=Placed)
Wilcoxon rank sum test with continuity correction
data: Salary by Gender
W = 8146.5, p-value = 0.001939
alternative hypothesis: true location shift is not equal to 0
The p-value is below 0.05, so we reject the null hypothesis that there is no difference between the salaries of men and women. In conclusion, there is a significant difference between the salaries of placed men and women.
QUESTION 19
mytable4<-aggregate(Placed$Salary~Placed$Degree_Engg, FUN= mean)
mytable4
  Placed$Degree_Engg Placed$Salary
1                 No      269161.7
2                Yes      325200.0
plotmeans(Placed$Salary~Placed$Degree_Engg, xlab = "Engineering or not", ylab = "Salary",main = "mean salary of Engineer and Non-Engineer MBAs who were placed")
mytable5<-aggregate(Placed$Salary~Placed$Degree_Engg, FUN= var)
mytable5
  Placed$Degree_Engg Placed$Salary
1                 No    7996660450
2                Yes   12994648276
t <- var.test(Salary ~ Degree_Engg, data=Placed, alternative = "two.sided")
t
F test to compare two variances
data: Salary by Degree_Engg
F = 0.61538, num df = 281, denom df = 29, p-value = 0.05124
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3341346 1.0025622
sample estimates:
ratio of variances
0.6153811
## Salary is not normally distributed (Shapiro-Wilk p-value < 2.2e-16), so we use the non-parametric Wilcoxon rank sum test.
wilcox.test(Salary~Degree_Engg,data=Placed)
Wilcoxon rank sum test with continuity correction
data: Salary by Degree_Engg
W = 2828.5, p-value = 0.002793
alternative hypothesis: true location shift is not equal to 0
The p-value is below 0.05, so we reject the null hypothesis that there is no difference between the salaries of engineer and non-engineer MBAs. In conclusion, there is a significant difference between the salaries of placed engineer and non-engineer MBAs.
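The Wilcoxon test reports whether there is a location shift but not its direction, so it helps to show group medians next to the p-value. A minimal sketch on simulated data (`sim`, its group labels, and its values are hypothetical, not the Dean's Dilemma data):

```r
# Simulated salaries for two hypothetical groups (not the real data)
set.seed(42)
sim <- data.frame(
  Salary = c(rlnorm(100, log(250000), 0.3), rlnorm(100, log(350000), 0.3)),
  Group  = rep(c("A", "B"), each = 100)
)

res <- wilcox.test(Salary ~ Group, data = sim)

# Group medians make the direction of the shift visible
med <- aggregate(Salary ~ Group, data = sim, FUN = median)
med

# TRUE means the no-shift null hypothesis is rejected at the 5% level
res$p.value < 0.05
```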