-> Reading the dataset
dean.df <- read.csv("Data - Deans Dilemma.csv")
-> Viewing the data frame in R
View(dean.df)
-> Creating a data frame, placed.df, that contains only those students who were successfully placed
placed.df <- dean.df[which(dean.df$Placement_B == '1' & dean.df$Placement == "Placed"), ]
View(placed.df)
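-> The which() wrapper in the subset above matters when the filter column has missing values. A minimal sketch of the difference, using an invented toy data frame (toy.df and its values are assumptions for illustration, not rows from the dataset):

```r
# Toy data, invented for illustration; only the column names mirror the dataset.
toy.df <- data.frame(
  Placement_B = c(1, 0, 1, NA),
  Salary      = c(300000, NA, 250000, NA)
)

# which() returns the positions of TRUE values only, so NA comparisons are dropped.
with.which <- toy.df[which(toy.df$Placement_B == 1), ]

# A bare logical index keeps the NA comparison and yields an all-NA row.
without.which <- toy.df[toy.df$Placement_B == 1, ]

nrow(with.which)     # 2
nrow(without.which)  # 3
```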
-> Computing the mean salary of placed students for each gender
placed.salary.mean <- aggregate(placed.df$Salary, list(placed.df$Gender), mean)
placed.salary.mean
##   Group.1        x
## 1       F 253068.0
## 2       M 284241.9
View(placed.salary.mean)
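-> Attaching the mean salary of each student's gender group as a new column
placed.df$MeanSalary <- placed.salary.mean$x[match(placed.df$Gender, placed.salary.mean$Group.1)]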
View(placed.df)
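-> Attaching a group mean to every row can be done by looking up each row's group label with match(), or in one step with ave(). A small sketch on invented toy data (toy.df, group.mean and the salary values are assumptions for illustration):

```r
# Toy data, invented for illustration.
toy.df <- data.frame(
  Gender = factor(c("M", "F", "M", "F")),
  Salary = c(300000, 240000, 260000, 200000)
)

# One row per gender with its mean salary.
group.mean <- aggregate(Salary ~ Gender, data = toy.df, FUN = mean)

# match() finds, for each student, the row of group.mean with the same label.
toy.df$MeanSalary <- group.mean$Salary[match(toy.df$Gender, group.mean$Gender)]

# ave() computes the per-group mean and repeats it for every row of the group.
toy.df$MeanSalary2 <- ave(toy.df$Salary, toy.df$Gender)

toy.df
```

Looking up by label rather than by row position keeps the result correct even if the rows of the summary table are reordered.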
-> A dependent (paired) t-test assumes that the differences between the paired values are normally distributed.
-> t.test() needs both inputs as numeric vectors, so the Gender column is converted from factor to numeric.
-> Labeling: 1 = Female, 2 = Male
str(placed.df$Gender)
## Factor w/ 2 levels "F","M": 2 2 2 2 2 2 1 2 2 1 ...
placed.df$Gender <- as.numeric(placed.df$Gender)
str(placed.df$Gender)
## num [1:312] 2 2 2 2 2 2 1 2 2 1 ...
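-> The 1/2 labeling comes from the factor's level order: as.numeric() on a factor returns the integer code of each level, and levels are sorted alphabetically by default. A short sketch with an invented vector g:

```r
# Invented example vector, not taken from the dataset.
g <- factor(c("M", "F", "M"))

levels(g)      # "F" "M"  (alphabetical order, so F is level 1 and M is level 2)
as.numeric(g)  # 2 1 2
```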
t.test(placed.df$Gender, placed.df$MeanSalary, var.equal = TRUE, paired = TRUE)
##
## Paired t-test
##
## data: placed.df$Gender and placed.df$MeanSalary
## t = -335.56, df = 311, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -276158.2 -272938.5
## sample estimates:
## mean of the differences
## -274548.3
-> p-value < 2.2e-16, which is below the 0.05 significance level.
-> As the p-value is less than 0.05, the null hypothesis is rejected.
-> There is a significant difference between the average salary of male MBAs and female MBAs.
-> From the average salaries obtained (F: 253068.0, M: 284241.9), the hypothesis “The average salary of male MBAs is higher than the average salary of female MBAs” is supported.
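-> Male and female students are two separate groups rather than matched pairs, so an alternative way to test the same hypothesis is an independent two-sample (Welch) t-test on the individual salaries. The sketch below runs on invented toy data (toy.df and its values are assumptions for illustration); on the real data the same call would use placed.df, with Gender still a factor.

```r
# Toy data, invented for illustration; not rows from the dataset.
toy.df <- data.frame(
  Gender = factor(rep(c("F", "M"), each = 5)),
  Salary = c(240000, 250000, 260000, 245000, 255000,
             270000, 290000, 280000, 300000, 275000)
)

# Welch two-sample t-test (unequal variances is the default in t.test()).
# With the formula interface the difference is first level minus second level,
# i.e. F - M, so alternative = "less" tests whether the female mean is lower.
res <- t.test(Salary ~ Gender, data = toy.df, alternative = "less")

res$estimate  # group means
res$p.value   # one-sided p-value
```

A one-sided alternative matches the directional hypothesis stated above; a two-sided test would only ask whether the two means differ.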