Data for this assignment came from the NLS Investigator.

An Analysis of the gender differences in the mean life satisfaction among working professionals

1.) To begin, I have created a new project in R and have saved the data file from Piazza into my project folder.

test_data <- read.csv(file = "pwces.csv")

I also loaded a few packages that I anticipated needing to work with:

require(dplyr)

## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

require(magrittr)

## Loading required package: magrittr

require(ggvis)

## Loading required package: ggvis

In order to look at the data to see what I am working with, I created a table data frame and executed the ‘glimpse’ command to view the variables and their names:

tbl_df(test_data)

## Source: local data frame [756 x 2]
## 
##    gender lifesat
##     (int)   (int)
## 1       1      20
## 2       1      18
## 3       0      25
## 4       0       7
## 5       0      23
## 6       0      25
## 7       1      22
## 8      NA      NA
## 9       0      21
## 10      0      29
## ..    ...     ...

glimpse(test_data)

## Observations: 756
## Variables: 2
## $ gender  (int) 1, 1, 0, 0, 0, 0, 1, NA, 0, 0, 1, 0, 1, 0, 1, 1, NA, 0...
## $ lifesat (int) 20, 18, 25, 7, 23, 25, 22, NA, 21, 29, 26, 26, 18, 28,...

2.) Next I filtered the data to eliminate the missing values, which are designated as “NA” in the .csv file. To do this I created a subset of the data that includes only ‘gender’ values of ‘0’ or ‘1’ and ‘lifesat’ values of ‘1’ or greater. Because any comparison with NA evaluates to NA, subset() drops every row that has a missing value in either variable:

trim_data <- subset(test_data, gender >= 0 & lifesat >= 1)
tbl_df(trim_data)

## Source: local data frame [675 x 2]
## 
##    gender lifesat
##     (int)   (int)
## 1       1      20
## 2       1      18
## 3       0      25
## 4       0       7
## 5       0      23
## 6       0      25
## 7       1      22
## 8       0      21
## 9       0      29
## 10      1      26
## ..    ...     ...

This subset includes a sample of 675 respondents. This will be the data used for the remainder of the analysis.
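The filtering step relies on how R treats NA in comparisons, which is easy to miss. The sketch below illustrates it on a small made-up data frame (the `toy` object and its values are hypothetical, not taken from pwces.csv) and compares it with `complete.cases()`, a more explicit way to keep only fully observed rows. In the actual data the same logic removes 756 - 675 = 81 rows.

```r
# Hypothetical toy data with the same column names as the real file.
toy <- data.frame(
  gender  = c(1, 0, NA, 0, 1, NA),
  lifesat = c(20, 25, NA, NA, 18, 22)
)

# A comparison involving NA evaluates to NA, and subset() keeps only rows
# where the condition is TRUE, so incomplete rows are dropped.
by_condition <- subset(toy, gender >= 0 & lifesat >= 1)

# A more explicit alternative: keep rows with no NA in any column.
by_complete <- toy[complete.cases(toy), ]

# Number of rows removed, and a check that both approaches keep the same rows.
n_dropped <- nrow(toy) - nrow(by_condition)
print(n_dropped)
print(identical(rownames(by_condition), rownames(by_complete)))
```

The two approaches agree here only because every observed value in the toy data satisfies the condition; a row with, say, a lifesat of 0 would be dropped by `subset()` but kept by `complete.cases()`.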

Just to be sure that my analysis to this point is on track, I ran a summary of the data to see that the min, max and mean information looked correct, indicating that I have trimmed the data correctly:

summary(trim_data)

##      gender          lifesat     
##  Min.   :0.0000   Min.   : 5.00  
##  1st Qu.:0.0000   1st Qu.:18.00  
##  Median :0.0000   Median :23.00  
##  Mean   :0.4104   Mean   :21.48  
##  3rd Qu.:1.0000   3rd Qu.:25.00  
##  Max.   :1.0000   Max.   :30.00

I can see that I have a minimum gender value of ‘0’ and a maximum gender value of ‘1’, which is as I would expect. The gender mean of .41 also looks about right; because gender is coded 0/1, it indicates that about 41 percent of the respondents are coded ‘1’. The lifesat values look accurate as well, with a minimum value of 5 and a maximum value of 30.
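The reading of the gender mean can be made concrete with a short sketch. The `gender` vector below is a hypothetical toy example; the last few lines apply the same idea to the figures reported in the summary above (a mean of 0.4104 over 675 rows) to back out approximate group sizes.

```r
# Hypothetical toy vector of 0/1 codes, not the real data.
gender <- c(0, 0, 1, 0, 1)

# The mean of a 0/1 variable is the proportion of 1s.
prop_ones <- mean(gender)
print(prop_ones)
print(table(gender))  # counts per code

# Applied to the reported summary: a mean of 0.4104 over 675 rows
# implies roughly this many respondents coded 1 and coded 0.
n_total <- 675
n_ones  <- round(0.4104 * n_total)
n_zeros <- n_total - n_ones
print(c(coded_1 = n_ones, coded_0 = n_zeros))
```

On the real data, `table(trim_data$gender)` would give the exact counts directly.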

With this confidence in my work to this point, I test the null hypothesis that there is no difference, by gender, in the mean life satisfaction of working professionals. My null and alternative hypotheses are stated below:

Null Hypothesis: There is no difference, by gender, between the population means of the life satisfaction survey responses.

Alternative Hypothesis: The population means of the life satisfaction survey responses differ by gender.

To test the null hypothesis I use a two-sample t test, which will help me evaluate any difference in the mean life satisfaction survey responses between males and females. In this case I set alpha to .05, meaning that I am willing to accept a Type I error rate of .05.

Note that when running a t test in R I organize the variables as (dependent variable ~ independent variable), with the ‘~’ meaning ‘depends on’. In other words, the t test helps me evaluate whether or not the dependent variable depends on the independent variable.

t.test(lifesat ~ gender, data = trim_data, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  lifesat by gender
## t = -1.349, df = 673, p-value = 0.1778
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.4704615  0.2727582
## sample estimates:
## mean in group 0 mean in group 1 
##        21.23869        21.83755

The results of the t test indicate that the t value is -1.349. The fact that it is negative isn’t a concern; it only reflects that the mean in group 1 is higher than the mean in group 0. The magnitude of about 1.35 means that the observed difference between the means (about 0.60 points) is only about 1.35 times its standard error, which is too small to distinguish from sampling variability. The 95 percent confidence interval for the difference (-1.47 to 0.27) includes 0, and the p-value is .1778, which is greater than my established alpha of .05. Given the results of this t test, I fail to reject the null hypothesis.
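As a rough check on this interpretation, the reported figures can be reconnected by hand. The sketch below uses only the rounded values printed in the t.test() output (the variable names are my own), so the recomputed p-value and confidence interval should come close to the reported ones but may differ slightly in the last digits.

```r
# Values copied from the t.test() output above (rounded as printed).
mean_0   <- 21.23869
mean_1   <- 21.83755
t_stat   <- -1.349
deg_free <- 673
alpha    <- 0.05

# Difference in means (group 0 minus group 1) and the implied standard error,
# using t = difference / standard error.
diff_means <- mean_0 - mean_1
std_error  <- diff_means / t_stat

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), deg_free)

# 95 percent confidence interval: difference +/- critical value * standard error.
crit <- qt(1 - alpha / 2, deg_free)
ci   <- diff_means + c(-1, 1) * crit * std_error

print(round(c(difference = diff_means, std_error = std_error), 4))
print(round(p_value, 4))
print(round(ci, 4))
print(p_value < alpha)  # FALSE means I fail to reject the null hypothesis
```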

WFED 540 Mid-Term Exam

Maria Spencer

October 22, 2015
