Read the data set into RStudio
We do this by placing the file in the R folder and then passing its full path to read.csv().
dean.df <- read.csv("G:/R Intern/Data - Deans Dilemma.csv")
View(dean.df)
dim(dean.df)
## [1] 391 26
As the output shows, the dean.df data frame has 391 rows and 26 columns, which matches the data in the CSV file.
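The read-and-check step can be sketched so that it runs without the original file. This example writes a small made-up CSV to a temporary location, reads it back with read.csv(), and inspects the result. The toy columns and values, and the names toy, toy.path and toy.df, are illustrative assumptions and not the Dean's Dilemma data.

```r
# Toy data standing in for the real CSV (values are made up for illustration)
toy <- data.frame(
  Gender      = c("M", "F", "M", "F"),
  Placement_B = c(1, 1, 0, 1),
  Salary      = c(300000, 250000, 0, 260000)
)

# Write to a temporary file, then read it back the same way as the real file
toy.path <- tempfile(fileext = ".csv")
write.csv(toy, toy.path, row.names = FALSE)
toy.df <- read.csv(toy.path)

dim(toy.df)  # number of rows, then number of columns
str(toy.df)  # column names and types, a quick check that works outside RStudio
```

str() is a useful companion to View() because it also works in a non-interactive script.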
Use R to create a table showing the average salary of males and females who were placed. Review whether there is a gender gap in the data. In other words, observe whether the average salary of males is higher than the average salary of females in this dataset.
First, collect the data for the placed students in a new data frame.
placed.df <- dean.df[which(dean.df$Placement_B == 1), ]
View(placed.df)
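As a rough illustration of why which() is a reasonable choice for this filter, the sketch below compares three ways of selecting rows on a made-up data frame that contains a missing placement flag. The toy values and the names by.which, by.subset and by.logical are assumptions for illustration only.

```r
# Made-up data with one missing placement flag
toy.df <- data.frame(
  Gender      = c("M", "F", "M", "F", "M"),
  Placement_B = c(1, 1, 0, NA, 1),
  Salary      = c(300000, 250000, 0, NA, 280000)
)

# which() returns only the positions where the condition is TRUE, so NA is dropped
by.which <- toy.df[which(toy.df$Placement_B == 1), ]

# subset() also treats NA in the condition as FALSE
by.subset <- subset(toy.df, Placement_B == 1)

# A bare logical index keeps an extra all-NA row for the missing flag
by.logical <- toy.df[toy.df$Placement_B == 1, ]

nrow(by.which)
nrow(by.subset)
nrow(by.logical)
```

If Placement_B has no missing values, all three approaches select the same rows.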
Next, find the average salary of placed males and females.
placed.salary<-aggregate(placed.df$Salary,list(placed.df$Gender),mean)
placed.salary
##   Group.1        x
## 1       F 253068.0
## 2       M 284241.9
The average salary of placed males is 284241.9, and the average salary of placed females is 253068.0.
As the table above shows, placed males earn about 31,174 more on average than placed females in this sample, so the raw averages do show a gender gap.
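The default column names Group.1 and x are not very descriptive. A minimal sketch using the formula interface of aggregate(), which keeps the original column names, and then computes the gap directly. The toy.placed data frame and its values are made up for illustration.

```r
# Made-up salaries for placed students
toy.placed <- data.frame(
  Gender = c("F", "F", "M", "M"),
  Salary = c(240000, 260000, 280000, 300000)
)

# Formula interface: the result has columns named Gender and Salary
toy.salary <- aggregate(Salary ~ Gender, data = toy.placed, FUN = mean)
toy.salary

# Difference between the male and female group means
gap <- toy.salary$Salary[toy.salary$Gender == "M"] -
  toy.salary$Salary[toy.salary$Gender == "F"]
gap
```

The same call on the real data would be aggregate(Salary ~ Gender, data = placed.df, FUN = mean), assuming the column names used above.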
Use R to run a t-test to test the following hypothesis: H1: The average salary of the male MBAs is higher than the average salary of female MBAs.
t.test(Salary~Gender,data = placed.df)
##
## Welch Two Sample t-test
##
## data: Salary by Gender
## t = -3.0757, df = 243.03, p-value = 0.00234
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -51138.42 -11209.22
## sample estimates:
## mean in group F mean in group M
##        253068.0        284241.9
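The p-value is 0.00234, which is less than 0.01, so there is substantial evidence against the null hypothesis that the mean salaries of placed males and females are equal. The 95 percent confidence interval for the difference (female mean minus male mean) runs from -51138.42 to -11209.22 and does not include 0.
Note that the test above is two-sided, while H1 is one-sided. Because the observed difference is in the direction H1 predicts (t is negative, that is, the female mean is lower), the one-sided p-value is half the two-sided value, about 0.00117. The difference is therefore statistically significant and supports H1.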
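A one-sided version of the test can be requested directly with the alternative argument of t.test(). The sketch below uses made-up salaries, so the numbers are not those of the Dean's Dilemma data. It assumes the groups are ordered F then M (alphabetical order), so alternative = "less" tests whether the female mean is lower than the male mean.

```r
# Made-up salaries for placed students (illustration only)
toy.placed <- data.frame(
  Gender = rep(c("F", "M"), each = 5),
  Salary = c(240000, 250000, 260000, 255000, 245000,
             280000, 300000, 270000, 290000, 310000)
)

# Two-sided Welch test, as run above
two.sided <- t.test(Salary ~ Gender, data = toy.placed)

# One-sided Welch test: is mean(F) - mean(M) less than 0?
one.sided <- t.test(Salary ~ Gender, data = toy.placed, alternative = "less")

two.sided$p.value
one.sided$p.value  # half the two-sided value when t is negative
```

On the real data the corresponding call would be t.test(Salary ~ Gender, data = placed.df, alternative = "less").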