Ramya L.
3rd October 2017
Dean<-read.csv("DeansDilemmaData.csv")
median(Dean$Salary)
[1] 240000
round(nrow(subset(Dean,Dean$Placement=="Placed"))/nrow(Dean),2)
[1] 0.8
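The same proportion can be computed more directly, because the mean of a logical vector is the share of TRUE values. A minimal sketch on a made-up five-row data frame (`toy` and its values are hypothetical, not taken from DeansDilemmaData.csv):

```r
# Hypothetical stand-in for the Dean data frame; values are invented for illustration
toy <- data.frame(
  Placement = c("Placed", "Placed", "Not Placed", "Placed", "Not Placed")
)

# The comparison yields TRUE/FALSE per row; mean() of that is the proportion placed
placement_rate <- round(mean(toy$Placement == "Placed"), 2)
placement_rate
```

On the real data this would be `round(mean(Dean$Placement == "Placed"), 2)`, which avoids building a subset only to count its rows.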
Placed<-data.frame(subset(Dean,Dean$Placement=="Placed"))
median(Placed$Salary)
[1] 260000
aggregate(Placed$Salary, by=list(Category=Placed$Gender), FUN=mean)
  Category        x
1        F 253068.0
2        M 284241.9
hist(Placed$Percent_MBA,breaks=c(50,60,70,80),main="MBA performance of Placed students", xlab="MBA Percentage", ylab="Count",col="misty rose")
notplaced<-data.frame(subset(Dean,Dean$Placement=="Not Placed"))
par(mfrow = c(1, 2))
hist(Placed$Percent_MBA,breaks=c(50,60,70,80),main="MBA performance of Placed students", xlab="MBA Percentage", ylab="Count",col="misty rose")
hist(notplaced$Percent_MBA,breaks=c(50,60,70,80),main="MBA performance of Not Placed students", xlab="MBA Percentage", ylab="Count",col="powder blue")
boxplot(Placed$Salary~Placed$Gender,horizontal = TRUE, col= c("powder blue", "misty rose"),xlab = "Salary",ylab = "Gender",main = "Comparison of Salaries of Males and Females")
placedET<-data.frame(subset(Dean,Dean$Placement=="Placed"&Dean$Entrance_Test!="None"))
library(car)
scatterplotMatrix(~ Salary+Percent_MBA+Percentile_ET, data=placedET,main="Scatter Plot Matrix")
mean(Placed$Salary)
[1] 274550
qqnorm(Placed$Salary)
qqline(Placed$Salary)
hist(Placed$Salary,freq=FALSE)
lines(density(Placed$Salary), lwd=2)
shapiro.test(Placed$Salary)
Shapiro-Wilk normality test
data: Placed$Salary
W = 0.84168, p-value < 2.2e-16
The p-value is below 0.05, so the null hypothesis that placed students' salaries are normally distributed is rejected.
aggregate(Placed$Salary, by=list(Category=Placed$Gender), FUN=mean)
  Category        x
1        F 253068.0
2        M 284241.9
library(gplots)
plotmeans(Placed$Salary~Placed$Gender)
aggregate(Placed$Salary, by=list(Category=Placed$Gender), FUN=var)
  Category          x
1        F 5504236572
2        M 9886408520
Placedftest <- var.test(Salary ~ Gender, data=Placed, alternative = "two.sided")
Placedftest
F test to compare two variances
data: Salary by Gender
F = 0.55675, num df = 96, denom df = 214, p-value = 0.00135
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3999212 0.7927360
sample estimates:
ratio of variances
0.5567478
The p-value is below 0.05, so the null hypothesis that the two variances are equal is rejected.
The data are neither normally distributed nor of equal variance across the two groups, so the non-parametric Wilcoxon rank-sum test is used.
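The F test used for the variances is itself sensitive to non-normality, so a rank-based cross-check may be worth running alongside it. One option, not part of the original analysis, is the Fligner-Killeen test from base R's stats package. The sketch below uses synthetic log-normal salaries (`sim` is an invented data frame, not the DeansDilemmaData values):

```r
set.seed(42)

# Synthetic, right-skewed salaries for two groups; NOT the real data
sim <- data.frame(
  Salary = c(rlnorm(80,  meanlog = 12.4, sdlog = 0.25),
             rlnorm(120, meanlog = 12.5, sdlog = 0.35)),
  Gender = factor(rep(c("F", "M"), times = c(80, 120)))
)

# Fligner-Killeen test: the null hypothesis is that the group variances are equal.
# It is rank-based, so it does not rely on the data being normal.
fk <- fligner.test(Salary ~ Gender, data = sim)
fk$p.value
```

On the real data the call would be `fligner.test(Salary ~ Gender, data = Placed)`; a small p-value would support the F test's conclusion.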
wilcox.test(Salary~Gender,data=Placed)
Wilcoxon rank sum test with continuity correction
data: Salary by Gender
W = 8146.5, p-value = 0.001939
alternative hypothesis: true location shift is not equal to 0
The p-value is below 0.05, so the null hypothesis that salaries are the same for both sexes is rejected.
Hence the salaries of placed men and women differ significantly.
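The reasoning followed above (check normality, check equality of variances, then pick a test) can be sketched as a small decision routine. This is an illustration under stated assumptions, not the author's exact procedure: the data are synthetic, and the names `sim`, `alpha`, `normal_ok` and `equal_var` are invented here.

```r
set.seed(1)

# Synthetic right-skewed salaries; NOT the DeansDilemmaData values
sim <- data.frame(
  Salary = c(rlnorm(60, meanlog = 12.3, sdlog = 0.3),
             rlnorm(90, meanlog = 12.5, sdlog = 0.4)),
  Gender = factor(rep(c("F", "M"), times = c(60, 90)))
)

alpha <- 0.05  # significance level used throughout

# Step 1: Shapiro-Wilk on the pooled salaries (as in the transcript)
normal_ok <- shapiro.test(sim$Salary)$p.value >= alpha

# Step 2: F test for equality of the two group variances
equal_var <- var.test(Salary ~ Gender, data = sim)$p.value >= alpha

# Step 3: non-normal data -> Wilcoxon rank-sum test;
# otherwise a t-test (pooled if variances look equal, Welch if not)
result <- if (!normal_ok) {
  wilcox.test(Salary ~ Gender, data = sim)
} else {
  t.test(Salary ~ Gender, data = sim, var.equal = equal_var)
}

result$method
result$p.value
```

Swapping `sim` for `Placed` would apply the same steps to the real data.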
aggregate(Placed$Salary, by=list(Category=Placed$Degree_Engg), FUN=mean)
  Category        x
1       No 269161.7
2      Yes 325200.0
library(gplots)
plotmeans(Placed$Salary~Placed$Degree_Engg)
aggregate(Placed$Salary, by=list(Category=Placed$Degree_Engg), FUN=var)
  Category           x
1       No  7996660450
2      Yes 12994648276
Enggftest <- var.test(Salary ~ Degree_Engg, data=Placed, alternative = "two.sided")
Enggftest
F test to compare two variances
data: Salary by Degree_Engg
F = 0.61538, num df = 281, denom df = 29, p-value = 0.05124
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.3341346 1.0025622
sample estimates:
ratio of variances
0.6153811
The p-value (0.051) is above 0.05, so the null hypothesis that the two variances are equal cannot be rejected.
The data are not normally distributed, so the non-parametric Wilcoxon rank-sum test is used even though equal variances cannot be ruled out.
wilcox.test(Salary~Degree_Engg,data=Placed)
Wilcoxon rank sum test with continuity correction
data: Salary by Degree_Engg
W = 2828.5, p-value = 0.002793
alternative hypothesis: true location shift is not equal to 0
The p-value is below 0.05, so the null hypothesis that salaries are the same for engineers and non-engineers is rejected.
Hence the salaries of placed engineers and non-engineers differ significantly.
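Because the Wilcoxon test compares locations rather than means, group medians may be a more natural companion summary than the group means reported earlier. A minimal sketch on invented numbers (`toy` and its six salaries are hypothetical, not the real data), using the formula interface of `aggregate`:

```r
# Hypothetical six-row example; salaries are invented for illustration
toy <- data.frame(
  Salary      = c(200000, 250000, 300000, 280000, 320000, 400000),
  Degree_Engg = c("No", "No", "No", "Yes", "Yes", "Yes")
)

# Median salary per group; the formula interface keeps the column names
med <- aggregate(Salary ~ Degree_Engg, data = toy, FUN = median)
med
```

The equivalent call on the real data would be `aggregate(Salary ~ Degree_Engg, data = Placed, FUN = median)`.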