I first loaded the required packages.
Next, I read in the SPSS file and created a data frame called GSSdata.
Next, I examined the HRS1 variable using the summary function and the codebook to determine how to recode the negative (-1) and no-answer (99) codes as missing values.
summary(GSSdata$HRS1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.00 -1.00 25.00 24.08 40.00 99.00
GSSdata$HRS1 <- ifelse(GSSdata$HRS1 == 99, NA, GSSdata$HRS1) # 99 (no answer) -> NA
GSSdata$HRS1 <- ifelse(GSSdata$HRS1 == -1, NA, GSSdata$HRS1) # -1 (negative code) -> NA
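The two recoding steps above could also be written as a single step. The sketch below is only an illustration on a made-up vector (`hrs1` and `missing_codes` are hypothetical names, not part of the original analysis); it treats the same two codes, -1 and 99, as missing.

```r
# Toy vector standing in for GSSdata$HRS1 (values are made up for illustration).
hrs1 <- c(-1, 40, 99, 25, 60, -1)

# Codes treated as missing, matching the recoding above: -1 and 99.
missing_codes <- c(-1, 99)

# %in% never returns NA, so values that are already missing pass through safely.
hrs1_clean <- ifelse(hrs1 %in% missing_codes, NA, hrs1)
print(hrs1_clean)
```

Listing the codes in one vector makes it easier to see at a glance which values are being dropped.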
I then created a new dataframe to select the four variables chosen and adjust the names for ease in coding.
data <- GSSdata
data <- data.frame(cbind(GSSdata$HRS1,GSSdata$SEX,GSSdata$BORN,GSSdata$BABIES))
names(data)<-c("HRS","SEX","BORN","BABIES")
summary(data)
## HRS SEX BORN BABIES
## Min. : 1.0 Min. :1.000 Min. :1.000 Min. :0.0000
## 1st Qu.:35.0 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.0000
## Median :40.0 Median :2.000 Median :1.000 Median :0.0000
## Mean :40.3 Mean :1.558 Mean :1.121 Mean :0.3799
## 3rd Qu.:50.0 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:0.0000
## Max. :98.0 Max. :2.000 Max. :9.000 Max. :9.0000
## NA's :1964
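As a possible alternative to the `cbind` approach above, the four columns can be selected by name and then renamed. This is a minimal sketch on a made-up data frame (`toy`, `keep`, and `small` are hypothetical names, and the AGE column is invented only to show that unselected columns are dropped).

```r
# Toy data frame standing in for GSSdata (values are made up for illustration).
toy <- data.frame(
  HRS1   = c(40, NA, 35),
  SEX    = c(1, 2, 2),
  BORN   = c(1, 1, 2),
  BABIES = c(0, 1, 0),
  AGE    = c(30, 41, 52)  # an extra column that should not be carried over
)

# Select the four variables by name, then shorten the names.
keep  <- c("HRS1", "SEX", "BORN", "BABIES")
small <- toy[, keep]
names(small) <- c("HRS", "SEX", "BORN", "BABIES")
print(small)
```

Selecting by name avoids passing the columns through a matrix first, which is what `cbind` does.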
Using the codebook, I determined that the variables I chose also needed to be recoded as 0/1 indicators for use in the linear regression.
data$SEX <- ifelse(data$SEX == 1, 0, data$SEX) # male-0
data$SEX <- ifelse(data$SEX == 2, 1, data$SEX) # female-1
data$BORN <- ifelse(data$BORN == 1, 1, data$BORN) # US-1, others-0
data$BORN <- ifelse(data$BORN == 2, 0, data$BORN) # US-1, others-0
data$BORN <- ifelse(data$BORN == 9, NA, data$BORN) # US-1, others-0
data$BABIES <- ifelse(data$BABIES == 0, 0, data$BABIES) # no baby-0 baby-1
data$BABIES <- ifelse(data$BABIES == 1, 1, data$BABIES) # no baby-0 baby-1
data$BABIES <- ifelse(data$BABIES == 2, 1, data$BABIES) # no baby-0 baby-1
data$BABIES <- ifelse(data$BABIES == 3, 1, data$BABIES) # no baby-0 baby-1
data$BABIES <- ifelse(data$BABIES == 4, 1, data$BABIES) # no baby-0 baby-1
data$BABIES <- ifelse(data$BABIES == 5, 1, data$BABIES) # no baby-0 baby-1
summary(data)
## HRS SEX BORN BABIES
## Min. : 1.0 Min. :0.0000 Min. :0.0000 Min. :0.000
## 1st Qu.:35.0 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.000
## Median :40.0 Median :1.0000 Median :1.0000 Median :0.000
## Mean :40.3 Mean :0.5577 Mean :0.8852 Mean :0.327
## 3rd Qu.:50.0 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.000
## Max. :98.0 Max. :1.0000 Max. :1.0000 Max. :9.000
## NA's :1964 NA's :4
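The summary above still shows a maximum of 9 for BABIES, so the variable is not yet a pure 0/1 indicator: values above 5 were left unchanged by the recoding. One way this might be handled is sketched below on a made-up vector. It assumes, by analogy with BORN, that 9 is a no-answer code; that assumption should be checked against the codebook. The names `babies` and `babies_any` are hypothetical.

```r
# Toy vector standing in for data$BABIES (values are made up for illustration).
babies <- c(0, 1, 3, 0, 9, 2, 6)

# Assumption: 9 is a no-answer code (as it is for BORN); verify in the codebook.
# Any count of 1 or more becomes 1, zero stays 0, and 9 becomes NA.
babies_any <- ifelse(babies == 9, NA, ifelse(babies >= 1, 1, 0))
print(babies_any)
```

Applying a recode like this would change the BABIES summary statistics and regression estimates reported in this document, so the results would need to be rerun.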
I then computed the mean and standard deviation of each variable.
data %>% summarise(mean_hours=mean(HRS, na.rm=TRUE), sd_hours=sd(HRS, na.rm=TRUE), n=n())
## mean_hours sd_hours n
## 1 40.30427 15.54908 4820
data %>%
summarise(mean_SEX=mean(SEX, na.rm=TRUE), sd_SEX=sd(SEX, na.rm=TRUE), n=n())
## mean_SEX sd_SEX n
## 1 0.5576763 0.4967138 4820
data %>%
summarise(mean_BORN=mean(BORN, na.rm=TRUE), sd_BORN=sd(BORN, na.rm=TRUE), n=n())
## mean_BORN sd_BORN n
## 1 0.8851744 0.3188444 4820
data %>%
summarise(mean_BABIES=mean(BABIES, na.rm=TRUE), sd_BABIES=sd(BABIES, na.rm=TRUE), n=n())
## mean_BABIES sd_BABIES n
## 1 0.326971 1.29781 4820
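Note that `n()` counts every row, including rows where the variable is missing, which is why n is 4820 for HRS even though 1964 of its values are NA. The sketch below shows one way to report the number of non-missing values alongside the mean and standard deviation. It uses base R on a made-up data frame; `toy` and `describe` are hypothetical names.

```r
# Toy data frame standing in for data (values are made up for illustration).
toy <- data.frame(HRS = c(40, NA, 35, 50), SEX = c(0, 1, 1, 0))

# Mean, standard deviation, and count of non-missing values for one variable.
describe <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    n_valid = sum(!is.na(x)))
}

# One row per variable, one column per statistic.
stats <- t(sapply(toy, describe))
print(stats)
```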
I then created the two histograms of HRS1 and ln(HRS1).
data %>% ggvis(~HRS) %>% layer_histograms() %>%
add_axis("x", title = "Hours Per Week Worked", title_offset = 50) %>%
add_axis("y", title = "Number of People Working Full or Part-Time", title_offset = 50)
## Guessing width = 5 # range / 20
data$HRSlog <- ifelse(data$HRS < 0, NA, data$HRS)
summary(data$HRSlog)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.0 35.0 40.0 40.3 50.0 98.0 1964
data %>% ggvis(~HRSlog) %>% layer_histograms() %>%
add_axis("x", title = "Hours Per Week Worked", title_offset = 50) %>%
add_axis("y", title = "Number of People Working Full or Part-Time", title_offset = 50)
## Guessing width = 5 # range / 20
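The summary of HRSlog above is identical to the summary of HRS, which indicates that HRSlog holds the untransformed hours rather than their natural logarithm. A histogram of ln(HRS) would need the log to be taken explicitly. The following is a minimal sketch on a made-up vector, using base R's `hist` rather than ggvis; `hrs` and `log_hrs` are hypothetical names.

```r
# Toy vector standing in for data$HRS (values are made up for illustration).
hrs <- c(1, 20, 35, 40, 40, 50, 98, NA)

# Natural logarithm of hours; missing values stay missing.
log_hrs <- log(hrs)

# Histogram of the logged values (hist drops non-finite values before binning).
h <- hist(log_hrs,
          main = "Histogram of ln(HRS)",
          xlab = "ln(Hours Per Week Worked)",
          ylab = "Number of Respondents")
```

Because every non-missing HRS value is at least 1 after recoding, the log is defined for all of them.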
Regression Analysis and Hypothesis Testing
lmfit<-lm(HRS ~ SEX + BORN + BABIES, data=data)
lmfit
##
## Call:
## lm(formula = HRS ~ SEX + BORN + BABIES, data = data)
##
## Coefficients:
## (Intercept) SEX BORN BABIES
## 43.6090 -6.9161 0.3053 -0.1480
summary(lmfit)
##
## Call:
## lm(formula = HRS ~ SEX + BORN + BABIES, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.914 -5.998 1.234 6.391 61.307
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43.6090 0.8426 51.757 <2e-16 ***
## SEX -6.9161 0.5680 -12.175 <2e-16 ***
## BORN 0.3053 0.8455 0.361 0.718
## BABIES -0.1480 0.2263 -0.654 0.513
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.17 on 2851 degrees of freedom
## (1965 observations deleted due to missingness)
## Multiple R-squared: 0.04955, Adjusted R-squared: 0.04855
## F-statistic: 49.55 on 3 and 2851 DF, p-value: < 2.2e-16
confint(lmfit)
## 2.5 % 97.5 %
## (Intercept) 41.9568624 45.2610773
## SEX -8.0299379 -5.8022946
## BORN -1.3525053 1.9630244
## BABIES -0.5917954 0.2957156
loglmfit<-lm(log(HRS) ~ SEX + BORN + BABIES, data=data)
loglmfit
##
## Call:
## lm(formula = log(HRS) ~ SEX + BORN + BABIES, data = data)
##
## Coefficients:
## (Intercept) SEX BORN BABIES
## 3.712224 -0.206885 -0.023373 -0.003912
summary(loglmfit)
##
## Call:
## lm(formula = log(HRS) ~ SEX + BORN + BABIES, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6889 -0.0513 0.1329 0.2571 1.0796
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.712224 0.030246 122.736 <2e-16 ***
## SEX -0.206885 0.020391 -10.146 <2e-16 ***
## BORN -0.023373 0.030349 -0.770 0.441
## BABIES -0.003912 0.008124 -0.481 0.630
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5445 on 2851 degrees of freedom
## (1965 observations deleted due to missingness)
## Multiple R-squared: 0.03522, Adjusted R-squared: 0.0342
## F-statistic: 34.69 on 3 and 2851 DF, p-value: < 2.2e-16
confint(loglmfit)
## 2.5 % 97.5 %
## (Intercept) 3.65291859 3.77152965
## SEX -0.24686815 -0.16690267
## BORN -0.08288191 0.03613531
## BABIES -0.01984111 0.01201778
The null hypothesis is rejected for both the linear regression of HRS and the regression of the natural logarithm of HRS.
Hypothesis (null) There is no relationship between the dependent variable HRS1 and the three independent variables, SEX, BORN, and BABIES.
Hypothesis (alternative) There is a relationship between the dependent variable HRS1 and the three independent variables, SEX, BORN, and BABIES.
A linear regression analysis is used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, F(3, 2851) = 49.55, Adjusted R-squared = 0.05, p < .05.
Hypothesis (null) There is no relationship between the natural logarithm of the dependent variable HRS1 and the three independent variables, SEX, BORN, and BABIES.
Hypothesis (alternative) There is a relationship between the natural logarithm of the dependent variable HRS1 and the three independent variables, SEX, BORN, and BABIES.
A linear regression analysis is used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, F(3, 2851) = 34.69, Adjusted R-squared = 0.03, p < .05.
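The overall test for each model is the F-test reported at the bottom of the regression summary. As a rough check, the F-statistic can be recovered from the multiple R-squared, the number of predictors, and the residual degrees of freedom shown in the output. The sketch below does this arithmetic; the helper name `overall_f` is hypothetical.

```r
# F-statistic and p-value for the overall model test, computed from R-squared.
# k is the number of predictors; df_resid is the residual degrees of freedom.
overall_f <- function(r2, k, df_resid) {
  f_stat <- (r2 / k) / ((1 - r2) / df_resid)
  p_value <- pf(f_stat, k, df_resid, lower.tail = FALSE)
  c(F = f_stat, p = p_value)
}

# Values copied from the two summary() outputs above.
fit_hrs    <- overall_f(r2 = 0.04955, k = 3, df_resid = 2851)
fit_loghrs <- overall_f(r2 = 0.03522, k = 3, df_resid = 2851)
print(fit_hrs)
print(fit_loghrs)
```

Small differences from the printed F-statistics are expected because the R-squared values in the output are rounded.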
Independent Variable Hypothesis Testing-HRS1
Hypothesis (null) There is no relationship between the dependent variable HRS1 and the independent variable SEX.
Hypothesis (alternative) There is a relationship between the dependent variable HRS1 and the independent variable SEX.
The t-test on the SEX coefficient in the linear regression is used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, b = -6.92, t = -12.18, p < .05.
Hypothesis (null) There is no relationship between the dependent variable HRS1 and the independent variable BORN.
Hypothesis (alternative) There is a relationship between the dependent variable HRS1 and the independent variable BORN.
The t-test on the BORN coefficient in the linear regression is used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = 0.31, t = 0.36, p = .718.
Hypothesis (null) There is no relationship between the dependent variable HRS1 and the independent variable BABIES.
Hypothesis (alternative) There is a relationship between the dependent variable HRS1 and the independent variable BABIES.
The t-test on the BABIES coefficient in the linear regression is used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = -0.15, t = -0.65, p = .513.
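Each individual predictor is tested with the t-statistic in its row of the coefficient table, which is the estimate divided by its standard error. The sketch below reproduces that arithmetic for the HRS model from the values printed in the summary output; the vector names are hypothetical.

```r
# Estimates and standard errors copied from summary(lmfit) above.
est <- c(SEX = -6.9161, BORN = 0.3053, BABIES = -0.1480)
se  <- c(SEX =  0.5680, BORN = 0.8455, BABIES =  0.2263)
df_resid <- 2851

# t-statistic and two-sided p-value for each coefficient.
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df_resid)
print(round(cbind(t = t_val, p = p_val), 3))
```

Only SEX has a p-value below .05, which matches the significance stars in the summary table.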
Independent Variable Hypothesis Testing-Natural Logarithm of HRS1
Hypothesis (null) There is no relationship between the natural logarithm of the dependent variable HRS1 and the independent variable SEX.
Hypothesis (alternative) There is a relationship between the natural logarithm of the dependent variable HRS1 and the independent variable SEX.
The t-test on the SEX coefficient in the linear regression is used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, b = -0.21, t = -10.15, p < .05.
Hypothesis (null) There is no relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BORN.
Hypothesis (alternative) There is a relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BORN.
The t-test on the BORN coefficient in the linear regression is used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = -0.02, t = -0.77, p = .441.
Hypothesis (null) There is no relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BABIES.
Hypothesis (alternative) There is a relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BABIES.
The t-test on the BABIES coefficient in the linear regression is used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = -0.004, t = -0.48, p = .630.
Meaning of Two Regression Coefficients
SEX The estimate for the SEX variable in the regression of the natural log of HRS1 is -0.21, and it is statistically significant at the .05 level. In this variable, males were coded as zero and females as one, so the negative coefficient means that women work fewer hours than men. Because the dependent variable is on the log scale, the coefficient is read as a proportional difference rather than a difference in hours: holding BORN and BABIES constant, women are estimated to work about exp(-0.21) - 1 = -0.19, or roughly 19 percent, fewer hours per week than men.
BABIES The estimate for the BABIES variable is -0.004, which rounds to 0. Because the p-value (.630) is greater than .05, I fail to reject the null hypothesis; the data provide no statistically significant evidence of a relationship between hours worked per week and BABIES.
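Coefficients from a regression with a logged dependent variable can be converted to percentage differences by exponentiating them. The sketch below applies that conversion to the SEX estimate and its confidence interval, using the values printed by `loglmfit` and `confint(loglmfit)` above; the variable names are hypothetical.

```r
# SEX estimate and 95% confidence limits copied from the log(HRS) model output.
b_sex  <- -0.206885
ci_sex <- c(lower = -0.24686815, upper = -0.16690267)

# Percentage difference in hours for SEX = 1 (female) relative to SEX = 0 (male).
pct    <- 100 * (exp(b_sex) - 1)
pct_ci <- 100 * (exp(ci_sex) - 1)

print(round(pct, 1))
print(round(pct_ci, 1))
```

The same conversion applies to BORN and BABIES, although their intervals include zero.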