Session 8_LRS

Ramya L.
3rd October 2017

1. What is the median salary of all the students?

Dean<-read.csv("DeansDilemmaData.csv")
median(Dean$Salary)

[1] 240000

2. What percentage of students were placed? Answer this after rounding off to 2 decimal places.

round(nrow(subset(Dean,Dean$Placement=="Placed"))/nrow(Dean),2)

[1] 0.8

This is a proportion, so about 80% of the students were placed.
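As a sketch of an equivalent idiom: the mean of a logical vector is the share of TRUE values, so the placement proportion can be computed in one step and scaled to a percentage. The five-element vector below is hypothetical toy data, not the course data; on the real data the same expression would be round(100 * mean(Dean$Placement == "Placed"), 2).

```r
# Hypothetical toy data standing in for Dean$Placement
placement <- c("Placed", "Placed", "Not Placed", "Placed", "Placed")

# mean() of a logical vector is the share of TRUE values
prop_placed <- mean(placement == "Placed")

# Express as a percentage rounded to 2 decimal places
pct_placed <- round(100 * prop_placed, 2)
print(pct_placed)
```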

3. Create a dataframe called 'Placed', containing only those students who were successfully placed.

Placed <- subset(Dean, Placement == "Placed")

4. What is the median salary of students who were placed?

median(Placed$Salary)

[1] 260000

5. Create a table showing the mean salary of males and females who were placed.

aggregate(Placed$Salary, by=list(Category=Placed$Gender), FUN=mean)

  Category        x
1        F 253068.0
2        M 284241.9
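The same group means can be obtained with the formula interface of aggregate(), which names the output columns after the variables instead of Category and x, or with tapply(), which returns a named vector. A minimal sketch on a hypothetical five-row data frame (not the course data):

```r
# Hypothetical toy data standing in for the Placed data frame
placed_toy <- data.frame(
  Gender = c("F", "M", "F", "M", "M"),
  Salary = c(200000, 300000, 250000, 280000, 260000)
)

# Formula interface: output columns are named after the variables
by_formula <- aggregate(Salary ~ Gender, data = placed_toy, FUN = mean)
print(by_formula)

# tapply() returns the same group means as a named vector
by_tapply <- tapply(placed_toy$Salary, placed_toy$Gender, mean)
print(by_tapply)
```

On the real data this would read aggregate(Salary ~ Gender, data = Placed, FUN = mean).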

6. Show the breakup of the MBA performance of the students who were placed.

hist(Placed$Percent_MBA,breaks=c(50,60,70,80),main="MBA performance of Placed students", xlab="MBA Percentage", ylab="Count",col="misty rose")

[Figure: histogram "MBA performance of Placed students" (x: MBA Percentage, y: Count)]
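The histogram's bin counts can also be read off as numbers: hist(..., plot = FALSE) returns the histogram object without drawing it, and table(cut(...)) gives the same breakup with interval labels. The percentages below are hypothetical toy values, not the course data.

```r
# Hypothetical MBA percentages standing in for Placed$Percent_MBA
pct_mba <- c(55, 62, 65, 71, 78, 59, 68)

# plot = FALSE returns the histogram object without drawing it
h <- hist(pct_mba, breaks = c(50, 60, 70, 80), plot = FALSE)
print(h$counts)

# Same breakup as a labelled table; hist() uses right-closed bins
# with the lowest break included, hence include.lowest = TRUE
breakup <- table(cut(pct_mba, breaks = c(50, 60, 70, 80), include.lowest = TRUE))
print(breakup)
```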

7. Create a dataframe called 'notplaced' containing the students who were NOT placed after their MBA.

notplaced <- subset(Dean, Placement == "Not Placed")

8. Generate two histograms side by side, visually comparing the MBA performance of Placed and Not Placed students, as shown below.

op <- par(mfrow = c(1, 2))  # one row, two panels; the old settings are saved in op
hist(Placed$Percent_MBA,breaks=c(50,60,70,80),main="MBA performance of Placed students", xlab="MBA Percentage", ylab="Count",col="misty rose")
hist(notplaced$Percent_MBA,breaks=c(50,60,70,80),main="MBA performance of Not Placed students", xlab="MBA Percentage", ylab="Count",col="powder blue")
par(op)  # restore the previous plotting layout

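[Figure: two side-by-side histograms, "MBA performance of Placed students" and "MBA performance of Not Placed students" (x: MBA Percentage, y: Count)]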
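A numeric counterpart to the two histograms is a cross-tabulation of MBA-percentage bins against placement status, using the same breaks. A sketch on hypothetical toy data (not the course data); on the real data the columns would be Dean$Percent_MBA and Dean$Placement.

```r
# Hypothetical toy data standing in for Dean (Percent_MBA, Placement)
dean_toy <- data.frame(
  Percent_MBA = c(55, 62, 65, 71, 78, 58, 61, 52),
  Placement   = c("Placed", "Placed", "Placed", "Placed", "Placed",
                  "Not Placed", "Not Placed", "Not Placed")
)

# Bin MBA percentages the same way as the histograms (right-closed bins)
mba_bin <- cut(dean_toy$Percent_MBA, breaks = c(50, 60, 70, 80),
               include.lowest = TRUE)

# Counts per bin for each placement group, side by side
mba_by_group <- table(mba_bin, dean_toy$Placement)
print(mba_by_group)
```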

9. Generate two boxplots, one below the other, comparing the distribution of salaries of males and females who were placed.

boxplot(Placed$Salary~Placed$Gender,horizontal = TRUE, col= c("powder blue", "misty rose"),xlab = "Salary",ylab = "Gender",main = "Comparison of Salaries of Males and Females")

[Figure: horizontal boxplots "Comparison of Salaries of Males and Females" (x: Salary, y: Gender)]
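The numbers behind each box can be retrieved with plot = FALSE: the stats matrix has one column per group and rows for the lower whisker, lower hinge, median, upper hinge and upper whisker. A sketch on hypothetical toy salaries (not the course data):

```r
# Hypothetical salaries standing in for Placed$Salary and Placed$Gender
placed_toy <- data.frame(
  Gender = c("F", "F", "F", "M", "M", "M", "M", "M"),
  Salary = c(200000, 250000, 300000, 240000, 280000, 320000, 360000, 400000)
)

# plot = FALSE returns the statistics the boxplot would draw
bp <- boxplot(Salary ~ Gender, data = placed_toy, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)
print(bp$names)
```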

10. Create a dataframe called 'placedET', representing students who were placed after the MBA and who also took an MBA entrance test before admission into the MBA program.

placedET <- subset(Dean, Placement == "Placed" & Entrance_Test != "None")

11. Draw a scatter plot matrix for three variables, {Salary, Percent_MBA, Percentile_ET}, using the dataframe placedET.

library(car)
scatterplotMatrix(~ Salary+Percent_MBA+Percentile_ET, data=placedET,main="Scatter Plot Matrix")

[Figure: scatter plot matrix of Salary, Percent_MBA and Percentile_ET]
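Each off-diagonal panel of the scatter plot matrix can be summarized by a Pearson correlation. A sketch with cor() on a hypothetical five-row data frame (not the course data); on the real data the argument would be placedET[, c("Salary", "Percent_MBA", "Percentile_ET")].

```r
# Hypothetical values standing in for placedET's three columns
placedET_toy <- data.frame(
  Salary        = c(250000, 270000, 300000, 240000, 320000),
  Percent_MBA   = c(60, 65, 70, 58, 75),
  Percentile_ET = c(70, 80, 90, 65, 95)
)

# Pairwise Pearson correlations, matching the panels of the plot matrix
r <- cor(placedET_toy[, c("Salary", "Percent_MBA", "Percentile_ET")])
print(round(r, 2))
```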

12. What is the mean salary of all the students who were placed?

mean(Placed$Salary)

[1] 274550

13. Test whether the distribution of the salaries is normal (1/3)

qqnorm(Placed$Salary)
qqline(Placed$Salary)

[Figure: normal Q-Q plot of Placed$Salary with qqline() reference line]

13. Test whether the distribution of the salaries is normal (2/3)

hist(Placed$Salary,freq=FALSE)
lines(density(Placed$Salary), lwd=2)

[Figure: density-scale histogram of Placed$Salary with kernel density curve]

13. Test whether the distribution of the salaries is normal (3/3)

shapiro.test(Placed$Salary)


    Shapiro-Wilk normality test

data:  Placed$Salary
W = 0.84168, p-value < 2.2e-16

The p-value is below 0.05, so the null hypothesis that the salaries are normally distributed is rejected.
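To see the decision rule at work, the sketch below applies shapiro.test() to two deterministic samples that are not the course data: ideal normal quantiles, which should not be rejected, and a strongly right-skewed log-normal transform of them, which should be.

```r
# Deterministic samples (not the course data): ideal normal quantiles
# and a strongly right-skewed log-normal transform of them
z <- qnorm(ppoints(50))
skewed <- exp(2 * z)

p_normal <- shapiro.test(z)$p.value
p_skewed <- shapiro.test(skewed)$p.value

# Decision rule used in the answer: reject normality when p < 0.05
alpha <- 0.05
cat("normal quantiles rejected:", p_normal < alpha, "\n")
cat("skewed sample rejected:   ", p_skewed < alpha, "\n")
```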

14. Create a table giving the mean salary of the men and the women MBAs who were placed.

aggregate(Placed$Salary, by=list(Category=Placed$Gender), FUN=mean)

  Category        x
1        F 253068.0
2        M 284241.9

15. Visualize the mean salary of men and women who were placed, using plotmeans().

library(gplots)
plotmeans(Placed$Salary~Placed$Gender)

[Figure: plotmeans() chart of mean salary by gender]
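As far as we can tell from the gplots documentation, plotmeans() by default draws each group mean with error bars for a 95% confidence interval based on the t distribution (check ?plotmeans for the exact defaults). A base-R sketch of those intervals, on hypothetical toy data rather than the course data, lets the plotted values be read off as numbers; mean_ci() is a helper written for this illustration.

```r
# Hypothetical toy data standing in for Placed$Salary and Placed$Gender
salary <- c(200000, 250000, 300000, 240000, 280000, 320000)
gender <- c("F", "F", "F", "M", "M", "M")

# Hypothetical helper: mean and t-based confidence interval for one group
mean_ci <- function(x, level = 0.95) {
  n <- length(x)
  half <- qt((1 + level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One column per group: mean, lower and upper limits
ci_table <- sapply(split(salary, gender), mean_ci)
print(ci_table)
```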

16. Create a table giving the variance of the salary of the men and women MBAs who were placed.

aggregate(Placed$Salary, by=list(Category=Placed$Gender), FUN=var)

  Category          x
1        F 5504236572
2        M 9886408520

17. Use a suitable statistical test to check whether the variances are significantly different.

Placedftest <- var.test(Salary ~ Gender, data=Placed, alternative = "two.sided")
Placedftest


    F test to compare two variances

data:  Salary by Gender
F = 0.55675, num df = 96, denom df = 214, p-value = 0.00135
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3999212 0.7927360
sample estimates:
ratio of variances 
         0.5567478

The p-value (0.00135) is below 0.05, so the null hypothesis that the variances are equal is rejected.
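As a cross-check, the F statistic is the ratio of the two sample variances printed in question 16, and the degrees of freedom are one less than each group size (so num df = 96 and denom df = 214 imply 97 women and 215 men in Placed). The sketch below reproduces the reported statistic and two-sided p-value from those printed numbers, using the rule that var.test() applies for a two-sided alternative: twice the smaller tail probability.

```r
# Group variances as printed in question 16 (rounded to whole units)
var_f <- 5504236572
var_m <- 9886408520

# Degrees of freedom reported by var.test(): n - 1 for each group
df_f <- 96
df_m <- 214

# F statistic: ratio of the two sample variances (F over M)
f_stat <- var_f / var_m

# Two-sided p-value: twice the smaller tail of the F distribution
p_two_sided <- 2 * min(pf(f_stat, df_f, df_m), 1 - pf(f_stat, df_f, df_m))

print(round(f_stat, 5))
print(signif(p_two_sided, 3))
```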

18. Use a suitable statistical test to check whether there is a significant difference between the salaries of men and women.

The data are neither normal nor of equal variance, so the non-parametric Wilcoxon rank-sum test is used.

wilcox.test(Salary~Gender,data=Placed)


    Wilcoxon rank sum test with continuity correction

data:  Salary by Gender
W = 8146.5, p-value = 0.001939
alternative hypothesis: true location shift is not equal to 0

The p-value (0.001939) is below 0.05, so the null hypothesis of no location shift between the salaries of men and women is rejected.

Hence the salaries of men and women differ significantly at the 5% level.
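One way to read W: for two groups x and y, the rank-sum statistic reported by wilcox.test() equals the number of pairs (x_i, y_j) with x_i > y_j, with ties counted as one half. The sketch checks this on a tiny deterministic example (not the course data), then applies it to the reported result. With the formula Salary ~ Gender the first group is level F, and the df values from question 17 give 97 women and 215 men, so W / (97 * 215) is the share of woman-man pairs in which the woman earns more.

```r
# Small deterministic example (not the course data), no ties
x <- c(1, 3, 5)   # plays the role of the first group (level "F")
y <- c(2, 4)      # plays the role of the second group (level "M")

# W counts the pairs (x_i, y_j) with x_i > y_j; ties would count 1/2
w_manual <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))
w_test <- unname(wilcox.test(x, y)$statistic)
print(c(manual = w_manual, wilcox = w_test))

# Applied to the reported result: W = 8146.5 over 97 * 215 F-M pairs
share_f_higher <- 8146.5 / (97 * 215)
print(round(share_f_higher, 3))
```

A share of about 0.39, below the 0.5 expected with no difference, is consistent with the lower mean salary of women in question 14.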

19. Create a table giving the mean salary of the Engineer MBAs and the Non-Engineer MBAs who were placed.

aggregate(Placed$Salary, by=list(Category=Placed$Degree_Engg), FUN=mean)

  Category        x
1       No 269161.7
2      Yes 325200.0

20. Visualize the mean salary of Engineer and Non-Engineer MBAs who were placed, using plotmeans().

library(gplots)
plotmeans(Placed$Salary~Placed$Degree_Engg)

[Figure: plotmeans() chart of mean salary by Degree_Engg]

21. Create a table giving the variance of the salary of the Engineer and Non-Engineer MBAs who were placed.

aggregate(Placed$Salary, by=list(Category=Placed$Degree_Engg), FUN=var)

  Category           x
1       No  7996660450
2      Yes 12994648276

22. Use a suitable statistical test to check whether the variances are significantly different.

Enggftest <- var.test(Salary ~ Degree_Engg, data=Placed, alternative = "two.sided")
Enggftest


    F test to compare two variances

data:  Salary by Degree_Engg
F = 0.61538, num df = 281, denom df = 29, p-value = 0.05124
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3341346 1.0025622
sample estimates:
ratio of variances 
         0.6153811

The p-value (0.05124) is above 0.05, so the null hypothesis that the variances are equal cannot be rejected.
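The F test assumes normally distributed data, which question 13 rejected for these salaries, so a p-value this close to 0.05 should be read with care. One rank-based alternative in base R is the Fligner-Killeen test, fligner.test(); note that this swaps in a different test from the one used above. A sketch on hypothetical toy data (not the course data) running both tests side by side:

```r
# Hypothetical toy data standing in for Placed$Salary and Placed$Degree_Engg
placed_toy <- data.frame(
  Degree_Engg = rep(c("No", "Yes"), each = 6),
  Salary = c(250000, 255000, 260000, 265000, 270000, 275000,
             180000, 230000, 280000, 330000, 380000, 430000)
)

# Classical F test (assumes normality) and the rank-based alternative
f_res  <- var.test(Salary ~ Degree_Engg, data = placed_toy)
fk_res <- fligner.test(Salary ~ Degree_Engg, data = placed_toy)

print(c(F_test = f_res$p.value, Fligner_Killeen = fk_res$p.value))
```

On the real data the call would be fligner.test(Salary ~ Degree_Engg, data = Placed).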

23. Use a suitable statistical test to check whether there is a significant difference between the salaries of Engineers and Non-Engineers.

The data are not normal (although the variances cannot be shown to differ), so the non-parametric Wilcoxon rank-sum test is used.

wilcox.test(Salary~Degree_Engg,data=Placed)


    Wilcoxon rank sum test with continuity correction

data:  Salary by Degree_Engg
W = 2828.5, p-value = 0.002793
alternative hypothesis: true location shift is not equal to 0

The p-value (0.002793) is below 0.05, so the null hypothesis of no location shift between the salaries of Engineers and Non-Engineers is rejected.

Hence the salaries of Engineers and Non-Engineers differ significantly at the 5% level.