The following report is an analysis of data from the Professional Worker Career Experience Survey.
To begin, I loaded the required packages and read the CSV file from the web. I then took a glimpse of the data and removed every row with an NA in either variable.
require (datasets)
require (ggvis)
## Loading required package: ggvis
require (dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require (magrittr)
## Loading required package: magrittr
satisfaction<-read.csv(file = "http://www.personal.psu.edu/dlp/WFED540/pwces.csv", header = TRUE, sep = ",")
satisfaction<-tbl_df(satisfaction)
satisfaction
## Source: local data frame [756 x 2]
##
## gender lifesat
## 1 1 20
## 2 1 18
## 3 0 25
## 4 0 7
## 5 0 23
## 6 0 25
## 7 1 22
## 8 NA NA
## 9 0 21
## 10 0 29
## .. ... ...
glimpse(satisfaction)
## Observations: 756
## Variables:
## $ gender (int) 1, 1, 0, 0, 0, 0, 1, NA, 0, 0, 1, 0, 1, 0, 1, 1, NA, 0...
## $ lifesat (int) 20, 18, 25, 7, 23, 25, 22, NA, 21, 29, 26, 26, 18, 28,...
satisfaction1<- na.omit(satisfaction)
glimpse(satisfaction1)
## Observations: 675
## Variables:
## $ gender (int) 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, ...
## $ lifesat (int) 20, 18, 25, 7, 23, 25, 22, 21, 29, 26, 26, 18, 28, 21,...
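The two glimpses above show 756 observations before `na.omit()` and 675 after, so 81 rows were dropped. A minimal sketch of how that loss could be checked before removing anything is given here. The data frame `toy` and its values are made up for illustration and are not the survey data.

```r
# Toy data standing in for the survey file (illustrative values only)
toy <- data.frame(
  gender  = c(1, 1, 0, NA, 0, 1),
  lifesat = c(20, 18, NA, NA, 23, 22)
)

# Count missing values in each column before dropping anything
na_per_column <- colSums(is.na(toy))

# na.omit() keeps only rows with no NA in any column,
# which is what complete.cases() flags as TRUE
rows_kept    <- sum(complete.cases(toy))
rows_dropped <- nrow(toy) - rows_kept

toy_clean <- na.omit(toy)

print(na_per_column)
print(c(kept = rows_kept, dropped = rows_dropped))
```

Checking the per-column counts first shows whether the missing values are concentrated in one variable or spread across both.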
Hypothesis (null): There is no difference in the mean life satisfaction of males versus females.
Hypothesis (alternative): There is a difference in the mean life satisfaction of males versus females.
As a first step in testing the null hypothesis, I compared mean life satisfaction (the dependent variable) by gender (the independent variable) and plotted a histogram to check whether outliers were skewing the means.
satisfaction1 %>% group_by(gender) %>% summarise(avg_lifesat = mean(lifesat, na.rm = TRUE))
## Source: local data frame [2 x 2]
##
## gender avg_lifesat
## 1 0 21.23869
## 2 1 21.83755
satisfaction1 %>% ggvis(~lifesat) %>% layer_histograms()
## Guessing width = 1 # range / 25
The histogram did not reveal major outliers in the data.
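A visual check like this can be complemented with a numeric one. The sketch below flags values outside 1.5 times the interquartile range. This is a common convention that I am assuming here, not the method used in this report. The vector `toy_lifesat` is made up for illustration.

```r
# Illustrative scores only; not taken from the survey file
toy_lifesat <- c(7, 18, 20, 21, 22, 23, 25, 25, 26, 29)

# First and third quartiles (R's default quantile method, type 7)
q   <- unname(quantile(toy_lifesat, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values falling outside the fences are flagged as potential outliers
flagged <- toy_lifesat[toy_lifesat < lower_fence | toy_lifesat > upper_fence]

print(c(lower = lower_fence, upper = upper_fence))
print(flagged)
```

A flagged value is only a candidate for closer inspection; whether it is an error or a legitimate low score is a judgment about the data.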
I will conduct a two-sample t-test to analyze the relationship between mean life satisfaction and gender at an alpha level of .05.
t.test(lifesat~gender, satisfaction1, var.equal=TRUE)
##
## Two Sample t-test
##
## data: lifesat by gender
## t = -1.349, df = 673, p-value = 0.1778
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.4704615 0.2727582
## sample estimates:
## mean in group 0 mean in group 1
## 21.23869 21.83755
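Setting `var.equal=TRUE` requests the pooled-variance (Student) t-test, which is why the degrees of freedom equal n1 + n2 - 2 = 673. The following sketch, on made-up data, computes the pooled statistic by hand and compares it with `t.test()`. It also runs the Welch version (R's default, `var.equal = FALSE`) as a possible sensitivity check. The data frame `toy` is hypothetical.

```r
# Toy data (illustrative only): five respondents per gender code
toy <- data.frame(
  gender  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  lifesat = c(25, 7, 23, 25, 21, 20, 18, 22, 26, 18)
)

g0 <- toy$lifesat[toy$gender == 0]
g1 <- toy$lifesat[toy$gender == 1]
n0 <- length(g0)
n1 <- length(g1)

# Pooled variance: group variances weighted by their degrees of freedom
pooled_var <- ((n0 - 1) * var(g0) + (n1 - 1) * var(g1)) / (n0 + n1 - 2)

# Standard error of the difference in means under equal variances
se_diff <- sqrt(pooled_var * (1 / n0 + 1 / n1))

# t statistic (group 0 minus group 1, matching t.test's formula order)
t_manual  <- (mean(g0) - mean(g1)) / se_diff
df_manual <- n0 + n1 - 2
p_manual  <- 2 * pt(-abs(t_manual), df_manual)

# Same test via t.test(), plus the Welch variant for comparison
pooled <- t.test(lifesat ~ gender, data = toy, var.equal = TRUE)
welch  <- t.test(lifesat ~ gender, data = toy)

print(c(manual = t_manual, t.test = unname(pooled$statistic)))
print(c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
```

If the pooled and Welch results lead to the same decision, the equal-variance assumption is not driving the conclusion.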
The test failed to reject the null hypothesis at the .05 level, t(673) = -1.349, p = .178, 95% CI [-1.47, 0.27].
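The reported p-value and confidence interval can be recovered from the t statistic, the degrees of freedom, and the two group means printed in the output. The sketch below does that using only those printed (rounded) values, so the results agree with the output to about three decimal places. The variable names are my own.

```r
# Values copied from the t.test output above
t_stat <- -1.349
df     <- 673
mean_0 <- 21.23869
mean_1 <- 21.83755

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Difference in means and its standard error (t = difference / SE)
diff_means <- mean_0 - mean_1
se_diff    <- diff_means / t_stat

# 95% confidence interval: difference +/- critical t times SE
margin <- qt(0.975, df) * se_diff
ci     <- diff_means + c(-1, 1) * margin

print(round(p_value, 3))
print(round(ci, 2))
```

Because the interval contains 0, it is consistent with the p-value being above .05.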
Based on this result, the data do not provide sufficient evidence of a difference in mean life satisfaction between genders among working professionals. Failing to reject the null hypothesis does not show that the means are equal.