I first loaded in the required packages.

Next, I read in the SPSS file and created a table dataframe called GSSdata.

Next, I examined the HRS1 variable with the summary function and the codebook to decide how to recode the negative (-1) and no-answer (99) values as missing.

summary(GSSdata$HRS1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -1.00   -1.00   25.00   24.08   40.00   99.00
GSSdata$HRS1 <- ifelse (GSSdata$HRS1 == 99, NA, GSSdata$HRS1)
GSSdata$HRS1 <- ifelse (GSSdata$HRS1 == -1, NA, GSSdata$HRS1)
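
The two recoding steps above can also be written as a single pass over a set of sentinel codes. The sketch below is illustrative only: `hrs1` is a made-up toy vector and `missing_codes` is a hypothetical helper name, neither of which comes from the GSS file.

```r
# Toy vector standing in for GSSdata$HRS1 (illustrative values, not GSS data)
hrs1 <- c(-1, 40, 99, 25, -1, 60)

# Sentinel codes named in the text: -1 and 99
missing_codes <- c(-1, 99)

# Replace every sentinel with NA in a single pass
hrs1_clean <- ifelse(hrs1 %in% missing_codes, NA, hrs1)

print(hrs1_clean)
summary(hrs1_clean)
```

Using `%in%` keeps the list of missing codes in one place, which makes it easier to extend if the codebook lists further codes.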

I then created a new dataframe to select the four variables chosen and adjust the names for ease in coding.

# Bind the four variables into a plain numeric data frame, then shorten the names
data <- data.frame(cbind(GSSdata$HRS1, GSSdata$SEX, GSSdata$BORN, GSSdata$BABIES))
names(data) <- c("HRS", "SEX", "BORN", "BABIES")
summary(data)
##       HRS            SEX             BORN           BABIES      
##  Min.   : 1.0   Min.   :1.000   Min.   :1.000   Min.   :0.0000  
##  1st Qu.:35.0   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.0000  
##  Median :40.0   Median :2.000   Median :1.000   Median :0.0000  
##  Mean   :40.3   Mean   :1.558   Mean   :1.121   Mean   :0.3799  
##  3rd Qu.:50.0   3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:0.0000  
##  Max.   :98.0   Max.   :2.000   Max.   :9.000   Max.   :9.0000  
##  NA's   :1964

Using the codebook, I determined that the other variables I chose also needed to be recoded for easier analysis in the linear regression.

# SEX: 1 (male) -> 0, 2 (female) -> 1
data$SEX <- ifelse(data$SEX == 1, 0, data$SEX)
data$SEX <- ifelse(data$SEX == 2, 1, data$SEX)

# BORN: 1 (born in the US) stays 1, 2 (born elsewhere) -> 0, 9 -> NA
data$BORN <- ifelse(data$BORN == 2, 0, data$BORN)
data$BORN <- ifelse(data$BORN == 9, NA, data$BORN)

# BABIES: 0 (no babies) stays 0, 1 through 5 -> 1 (at least one baby)
# Code 9 is not recoded here, so it remains in the data (see Max. below)
data$BABIES <- ifelse(data$BABIES %in% 1:5, 1, data$BABIES)
summary(data)
##       HRS            SEX              BORN            BABIES     
##  Min.   : 1.0   Min.   :0.0000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:35.0   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.000  
##  Median :40.0   Median :1.0000   Median :1.0000   Median :0.000  
##  Mean   :40.3   Mean   :0.5577   Mean   :0.8852   Mean   :0.327  
##  3rd Qu.:50.0   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.000  
##  Max.   :98.0   Max.   :1.0000   Max.   :1.0000   Max.   :9.000  
##  NA's   :1964                    NA's   :4
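
The summary above still shows a maximum of 9 for BABIES, so the variable is not yet strictly 0/1. A minimal sketch of one way to finish the recode follows. It assumes that 9 is a missing-value code for BABIES in the same way it is treated for BORN; that assumption should be checked against the codebook. The vector `babies` holds made-up values.

```r
# Toy vector standing in for data$BABIES (illustrative values only)
babies <- c(0, 1, 3, 9, 0, 2, NA)

# Assumption: 9 is a missing-value code, as it is treated for BORN above
babies_binary <- ifelse(babies == 9, NA,       # set the sentinel to NA first
                 ifelse(babies >= 1, 1, 0))    # then collapse counts to 0/1

table(babies_binary, useNA = "ifany")
```

Applying a recode like this to the real data would change the BABIES mean, standard deviation, and regression estimates reported in this document, so the printed results would need to be regenerated.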

I then calculated the mean and standard deviation for each of the variables.

data %>% summarise(mean_hours=mean(HRS, na.rm=TRUE), sd_hours=sd(HRS, na.rm=TRUE), n=n())
##   mean_hours sd_hours    n
## 1   40.30427 15.54908 4820
data %>%
  summarise(mean_SEX=mean(SEX, na.rm=TRUE), sd_SEX=sd(SEX, na.rm=TRUE), n=n()) 
##    mean_SEX    sd_SEX    n
## 1 0.5576763 0.4967138 4820
data %>%
  summarise(mean_BORN=mean(BORN, na.rm=TRUE), sd_BORN=sd(BORN, na.rm=TRUE), n=n())
##   mean_BORN   sd_BORN    n
## 1 0.8851744 0.3188444 4820
data %>%
  summarise(mean_BABIES=mean(BABIES, na.rm=TRUE), sd_BABIES=sd(BABIES, na.rm=TRUE), n=n())
##   mean_BABIES sd_BABIES    n
## 1    0.326971   1.29781 4820
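
The four summaries above repeat the same calculation per column. As an illustrative alternative, the sketch below computes the same statistics for every column at once in base R. The `toy` data frame holds made-up values, and `n_nonmiss` is a hypothetical column name. Unlike `n()` above, which counts all rows, it counts only non-missing values.

```r
# Toy data frame with the same column names (illustrative values only)
toy <- data.frame(
  HRS    = c(40, NA, 35, 50),
  SEX    = c(0, 1, 1, 0),
  BORN   = c(1, 1, 0, NA),
  BABIES = c(0, 1, 0, 0)
)

# One row per variable: mean, sd, and the number of non-missing values
desc <- t(sapply(toy, function(x) c(
  mean      = mean(x, na.rm = TRUE),
  sd        = sd(x, na.rm = TRUE),
  n_nonmiss = sum(!is.na(x))
)))

print(round(desc, 3))
```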

I then created two histograms: one of HRS1 and one labeled as ln(HRS1).

data %>% ggvis(~HRS) %>% layer_histograms() %>%
  add_axis("x", title = "Hours Per Week Worked", title_offset = 50) %>%
  add_axis("y", title = "Number of People Working Full or Part-Time", title_offset = 50)
## Guessing width = 5 # range / 20
[Figure: HRS histogram]
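# Note: this step only sets negative values to NA; it does not apply log(),
# so HRSlog holds the same values as HRS (compare the summary below)
data$HRSlog <- ifelse(data$HRS < 0, NA, data$HRS)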
summary(data$HRSlog)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     1.0    35.0    40.0    40.3    50.0    98.0    1964
data %>% ggvis(~HRSlog) %>% layer_histograms() %>%
  add_axis("x", title = "Hours Per Week Worked", title_offset = 50) %>%
  add_axis("y", title = "Number of People Working Full or Part-Time", title_offset = 50)
## Guessing width = 5 # range / 20
[Figure: HRSlog histogram]
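
The summary of HRSlog above is identical to the summary of HRS, which indicates that no logarithm was taken before plotting. A minimal sketch of what a log-transformed column and its histogram bins could look like is shown below. The `hrs` vector holds made-up values, and base R `hist()` is swapped in for ggvis so the example runs without extra packages.

```r
# Toy hours vector (illustrative values only); includes a missing value
hrs <- c(1, 20, 35, 40, 40, 50, 98, NA)

# Natural log of hours; NA stays NA, and log() is finite here because hours > 0
hrs_log <- log(hrs)

summary(hrs_log)

# Bin counts without drawing (plot = FALSE), to show the shape numerically
h <- hist(hrs_log, plot = FALSE)
print(h$counts)
```

The regressions below apply `log(HRS)` directly in the model formula, so they do not depend on the HRSlog column.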

Regression Analysis and Hypothesis Testing

lmfit<-lm(HRS ~ SEX + BORN + BABIES, data=data)
lmfit
## 
## Call:
## lm(formula = HRS ~ SEX + BORN + BABIES, data = data)
## 
## Coefficients:
## (Intercept)          SEX         BORN       BABIES  
##     43.6090      -6.9161       0.3053      -0.1480
summary(lmfit)
## 
## Call:
## lm(formula = HRS ~ SEX + BORN + BABIES, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.914  -5.998   1.234   6.391  61.307 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  43.6090     0.8426  51.757   <2e-16 ***
## SEX          -6.9161     0.5680 -12.175   <2e-16 ***
## BORN          0.3053     0.8455   0.361    0.718    
## BABIES       -0.1480     0.2263  -0.654    0.513    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.17 on 2851 degrees of freedom
##   (1965 observations deleted due to missingness)
## Multiple R-squared:  0.04955,    Adjusted R-squared:  0.04855 
## F-statistic: 49.55 on 3 and 2851 DF,  p-value: < 2.2e-16
confint(lmfit)
##                  2.5 %     97.5 %
## (Intercept) 41.9568624 45.2610773
## SEX         -8.0299379 -5.8022946
## BORN        -1.3525053  1.9630244
## BABIES      -0.5917954  0.2957156

loglmfit<-lm(log(HRS) ~ SEX + BORN + BABIES, data=data)
loglmfit
## 
## Call:
## lm(formula = log(HRS) ~ SEX + BORN + BABIES, data = data)
## 
## Coefficients:
## (Intercept)          SEX         BORN       BABIES  
##    3.712224    -0.206885    -0.023373    -0.003912
summary(loglmfit)
## 
## Call:
## lm(formula = log(HRS) ~ SEX + BORN + BABIES, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6889 -0.0513  0.1329  0.2571  1.0796 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.712224   0.030246 122.736   <2e-16 ***
## SEX         -0.206885   0.020391 -10.146   <2e-16 ***
## BORN        -0.023373   0.030349  -0.770    0.441    
## BABIES      -0.003912   0.008124  -0.481    0.630    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5445 on 2851 degrees of freedom
##   (1965 observations deleted due to missingness)
## Multiple R-squared:  0.03522,    Adjusted R-squared:  0.0342 
## F-statistic: 34.69 on 3 and 2851 DF,  p-value: < 2.2e-16
confint(loglmfit)
##                   2.5 %      97.5 %
## (Intercept)  3.65291859  3.77152965
## SEX         -0.24686815 -0.16690267
## BORN        -0.08288191  0.03613531
## BABIES      -0.01984111  0.01201778

The null hypothesis is rejected for both the linear regression of HRS and the linear regression of the natural logarithm of HRS.

Hypothesis (null): There is no relationship between the dependent variable HRS1 and the three independent variables SEX, BORN, and BABIES.

Hypothesis (alternative): There is a relationship between the dependent variable HRS1 and at least one of the three independent variables SEX, BORN, and BABIES.

A linear regression analysis will be used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, F(3, 2851) = 49.55, Adjusted R-squared = 0.05, p < .05.
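
The overall test printed by `summary(lmfit)` is an F-test, and its p-value can be recovered from the F statistic and its degrees of freedom. The sketch below copies those numbers from the output above. The variable names are illustrative.

```r
# Values copied from the summary(lmfit) output above
f_stat   <- 49.55
df_model <- 3
df_resid <- 2851

# Upper-tail probability of the F distribution
p_overall <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

# Compare against the .05 level used in the text
alpha <- 0.05
reject_null <- p_overall < alpha

print(p_overall)
print(reject_null)
```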


Hypothesis (null): There is no relationship between the natural logarithm of the dependent variable HRS1 and the three independent variables SEX, BORN, and BABIES.

Hypothesis (alternative): There is a relationship between the natural logarithm of the dependent variable HRS1 and at least one of the three independent variables SEX, BORN, and BABIES.

A linear regression analysis will be used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, F(3, 2851) = 34.69, Adjusted R-squared = 0.03, p < .05.


Independent Variable Hypothesis Testing-HRS1

Hypothesis (null): There is no relationship between the dependent variable HRS1 and the independent variable SEX.

Hypothesis (alternative): There is a relationship between the dependent variable HRS1 and the independent variable SEX.

A linear regression analysis will be used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, b = -6.92, t = -12.18, p < .05 (model Adjusted R-squared = 0.05).


Hypothesis (null): There is no relationship between the dependent variable HRS1 and the independent variable BORN.

Hypothesis (alternative): There is a relationship between the dependent variable HRS1 and the independent variable BORN.

A linear regression analysis will be used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = 0.31, t = 0.36, p = .718 (model Adjusted R-squared = 0.05).


Hypothesis (null): There is no relationship between the dependent variable HRS1 and the independent variable BABIES.

Hypothesis (alternative): There is a relationship between the dependent variable HRS1 and the independent variable BABIES.

A linear regression analysis will be used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = -0.15, t = -0.65, p = .513 (model Adjusted R-squared = 0.05).
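
Each single-variable test above rests on the coefficient's t statistic, which is the estimate divided by its standard error. As a worked example, the sketch below reproduces the BORN row of the `summary(lmfit)` table from the printed estimate, standard error, and residual degrees of freedom. The variable names are illustrative.

```r
# BORN row from the summary(lmfit) coefficient table above
estimate  <- 0.3053
std_error <- 0.8455
df_resid  <- 2851

# t statistic and two-sided p-value from the t distribution
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df_resid)

print(round(t_value, 3))
print(round(p_value, 3))
```

The adjusted R-squared describes the whole model, so it is the same for every coefficient. The t statistic and p-value are what differ between SEX, BORN, and BABIES.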


Independent Variable Hypothesis Testing-Natural Logarithm of HRS1

Hypothesis (null): There is no relationship between the natural logarithm of the dependent variable HRS1 and the independent variable SEX.

Hypothesis (alternative): There is a relationship between the natural logarithm of the dependent variable HRS1 and the independent variable SEX.

A linear regression analysis will be used to test the null hypothesis.
The null hypothesis is rejected at the specified .05 level, b = -0.21, t = -10.15, p < .05 (model Adjusted R-squared = 0.03).


Hypothesis (null): There is no relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BORN.

Hypothesis (alternative): There is a relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BORN.

A linear regression analysis will be used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = -0.02, t = -0.77, p = .441 (model Adjusted R-squared = 0.03).


Hypothesis (null): There is no relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BABIES.

Hypothesis (alternative): There is a relationship between the natural logarithm of the dependent variable HRS1 and the independent variable BABIES.

A linear regression analysis will be used to test the null hypothesis.
I fail to reject the null hypothesis at the specified .05 level, b = -0.004, t = -0.48, p = .630 (model Adjusted R-squared = 0.03).


Meaning of Two Regression Coefficients

SEX: The estimate for the SEX variable is -0.21 in the regression of the natural log of HRS1, and the relationship is statistically significant at the .05 level. Males were coded as zero and females as one, so the negative coefficient means that women work fewer hours per week than men. Because the dependent variable is on the log scale, the coefficient is read as a proportional difference rather than a difference in hours: holding BORN and BABIES constant, women's weekly hours are about exp(-0.21) - 1, or roughly 19%, lower than men's.
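
A coefficient from a model with a logged outcome can be converted to a percent difference with 100 * (exp(b) - 1). The sketch below applies that conversion to the SEX estimate and its confidence limits, copied from the `summary(loglmfit)` and `confint(loglmfit)` output above. The variable names are illustrative.

```r
# SEX row of the log model, copied from summary(loglmfit) and confint(loglmfit)
b_sex  <- -0.206885
ci_sex <- c(-0.24686815, -0.16690267)

# Percent difference in weekly hours for SEX = 1 (female) versus SEX = 0 (male)
pct_change <- 100 * (exp(b_sex) - 1)
pct_ci     <- 100 * (exp(ci_sex) - 1)

print(round(pct_change, 1))
print(round(pct_ci, 1))
```

For small coefficients, 100 * b is a close approximation, but at -0.21 the exact conversion gives a noticeably smaller magnitude than 21%.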

BABIES: The estimate for the BABIES variable is approximately 0 (-0.004 in the regression of the natural log of HRS1). Because the p-value (.630) is greater than .05, I cannot conclude that there is a relationship between the hours worked per week and whether the respondent has babies.