WFED540, Assignment 4

I first load the packages I will need to complete the assignment.

require(corrplot)

## Loading required package: corrplot

require(dplyr)

## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

require(Ecdat)

## Loading required package: Ecdat
## Loading required package: Ecfun
## 
## Attaching package: 'Ecdat'
## 
## The following object is masked from 'package:datasets':
## 
##     Orange

require(ggvis)

## Loading required package: ggvis

The assignment requires us to use the Mroz dataset, so I load and review the data to select four continuous variables for analysis. I also convert Mroz to a table data frame (tbl_df).

summary(Mroz)

##   work         hoursw           child6          child618    
##  yes:325   Min.   :   0.0   Min.   :0.0000   Min.   :0.000  
##  no :428   1st Qu.:   0.0   1st Qu.:0.0000   1st Qu.:0.000  
##            Median : 288.0   Median :0.0000   Median :1.000  
##            Mean   : 740.6   Mean   :0.2377   Mean   :1.353  
##            3rd Qu.:1516.0   3rd Qu.:0.0000   3rd Qu.:2.000  
##            Max.   :4950.0   Max.   :3.0000   Max.   :8.000  
##       agew           educw           hearnw           wagew     
##  Min.   :30.00   Min.   : 5.00   Min.   : 0.000   Min.   :0.00  
##  1st Qu.:36.00   1st Qu.:12.00   1st Qu.: 0.000   1st Qu.:0.00  
##  Median :43.00   Median :12.00   Median : 1.625   Median :0.00  
##  Mean   :42.54   Mean   :12.29   Mean   : 2.375   Mean   :1.85  
##  3rd Qu.:49.00   3rd Qu.:13.00   3rd Qu.: 3.788   3rd Qu.:3.58  
##  Max.   :60.00   Max.   :17.00   Max.   :25.000   Max.   :9.98  
##      hoursh          ageh           educh           wageh        
##  Min.   : 175   Min.   :30.00   Min.   : 3.00   Min.   : 0.4121  
##  1st Qu.:1928   1st Qu.:38.00   1st Qu.:11.00   1st Qu.: 4.7883  
##  Median :2164   Median :46.00   Median :12.00   Median : 6.9758  
##  Mean   :2267   Mean   :45.12   Mean   :12.49   Mean   : 7.4822  
##  3rd Qu.:2553   3rd Qu.:52.00   3rd Qu.:15.00   3rd Qu.: 9.1667  
##  Max.   :5010   Max.   :60.00   Max.   :17.00   Max.   :40.5090  
##      income          educwm           educwf         unemprate     
##  Min.   : 1500   Min.   : 0.000   Min.   : 0.000   Min.   : 3.000  
##  1st Qu.:15428   1st Qu.: 7.000   1st Qu.: 7.000   1st Qu.: 7.500  
##  Median :20880   Median :10.000   Median : 7.000   Median : 7.500  
##  Mean   :23081   Mean   : 9.251   Mean   : 8.809   Mean   : 8.624  
##  3rd Qu.:28200   3rd Qu.:12.000   3rd Qu.:12.000   3rd Qu.:11.000  
##  Max.   :96000   Max.   :17.000   Max.   :17.000   Max.   :14.000  
##   city       experience   
##  no :269   Min.   : 0.00  
##  yes:484   1st Qu.: 4.00  
##            Median : 9.00  
##            Mean   :10.63  
##            3rd Qu.:15.00  
##            Max.   :45.00

Mroz <- tbl_df(Mroz)
Mroz

## Source: local data frame [753 x 18]
## 
##      work hoursw child6 child618  agew educw hearnw wagew hoursh  ageh
##    (fctr)  (int)  (int)    (int) (int) (int)  (dbl) (dbl)  (int) (int)
## 1      no   1610      1        0    32    12 3.3540  2.65   2708    34
## 2      no   1656      0        2    30    12 1.3889  2.65   2310    30
## 3      no   1980      1        3    35    12 4.5455  4.04   3072    40
## 4      no    456      0        3    34    12 1.0965  3.25   1920    53
## 5      no   1568      1        2    31    14 4.5918  3.60   2000    32
## 6      no   2032      0        0    54    12 4.7421  4.70   1040    57
## 7      no   1440      0        2    37    16 8.3333  5.95   2670    37
## 8      no   1020      0        0    54    12 7.8431  9.98   4120    53
## 9      no   1458      0        2    48    12 2.1262  0.00   1995    52
## 10     no   1600      0        2    39    12 4.6875  4.15   2100    43
## ..    ...    ...    ...      ...   ...   ...    ...   ...    ...   ...
## Variables not shown: educh (int), wageh (dbl), income (int), educwm (int),
##   educwf (int), unemprate (dbl), city (fctr), experience (int)

I decide to estimate Pearson product-moment correlations using the variables hoursw, income, hearnw, and experience, so I select those variables from the data and create a new data frame.

wifedata <- Mroz %>%
  select(hoursw, income, hearnw, experience)
wifedata

## Source: local data frame [753 x 4]
## 
##    hoursw income hearnw experience
##     (int)  (int)  (dbl)      (int)
## 1    1610  16310 3.3540         14
## 2    1656  21800 1.3889          5
## 3    1980  21040 4.5455         15
## 4     456   7300 1.0965          6
## 5    1568  27300 4.5918          7
## 6    2032  19495 4.7421         33
## 7    1440  21152 8.3333         11
## 8    1020  18900 7.8431         35
## 9    1458  20405 2.1262         24
## 10   1600  20425 4.6875         21
## ..    ...    ...    ...        ...

summary(wifedata)

##      hoursw           income          hearnw         experience   
##  Min.   :   0.0   Min.   : 1500   Min.   : 0.000   Min.   : 0.00  
##  1st Qu.:   0.0   1st Qu.:15428   1st Qu.: 0.000   1st Qu.: 4.00  
##  Median : 288.0   Median :20880   Median : 1.625   Median : 9.00  
##  Mean   : 740.6   Mean   :23081   Mean   : 2.375   Mean   :10.63  
##  3rd Qu.:1516.0   3rd Qu.:28200   3rd Qu.: 3.788   3rd Qu.:15.00  
##  Max.   :4950.0   Max.   :96000   Max.   :25.000   Max.   :45.00

I then compute Pearson product-moment correlations on the data to find the strength of the linear association between each pair of variables (r), which will serve as the basis for testing the null hypothesis. The rquery.cormat() function I use for this also produces a correlogram of my variables.

That function is not part of an installed package, so I must first load it by sourcing the script in the following code.

source("http://www.sthda.com/upload/rquery_cormat.r")
rquery.cormat(wifedata)

## $r
##            income experience hoursw hearnw
## income          1                         
## experience -0.028          1              
## hoursw       0.15        0.4      1       
## hearnw       0.23       0.25   0.42      1
## 
## $p
##             income experience hoursw hearnw
## income           0                         
## experience    0.45          0              
## hoursw     5.6e-05          0      0       
## hearnw     1.4e-10      3e-12      0      0
## 
## $sym
##            income experience hoursw hearnw
## income     1                              
## experience        1                       
## hoursw            .          1            
## hearnw                       .      1     
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
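As a cross-check that does not depend on the sourced script, here is a minimal sketch that builds the same kind of r and p matrices with base R's cor.test(). The helper name pairwise_cor is my own (hypothetical), and I assume the sourced function also reports two-sided Pearson tests; the diagonal p-values are left as NA rather than 0.

```r
# Hypothetical helper: pairwise Pearson r and two-sided p-values
# for every pair of columns in a numeric data frame.
pairwise_cor <- function(df) {
  vars <- names(df)
  k <- length(vars)
  r <- matrix(1, k, k, dimnames = list(vars, vars))
  p <- matrix(NA_real_, k, k, dimnames = list(vars, vars))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      ct <- cor.test(df[[vars[i]]], df[[vars[j]]], method = "pearson")
      r[i, j] <- r[j, i] <- unname(ct$estimate)
      p[i, j] <- p[j, i] <- ct$p.value
    }
  }
  list(r = r, p = p)
}

# Applied to the assignment variables when Ecdat is installed.
if (requireNamespace("Ecdat", quietly = TRUE)) {
  data(Mroz, package = "Ecdat")
  res <- pairwise_cor(Mroz[, c("hoursw", "income", "hearnw", "experience")])
  print(round(res$r, 2))
  print(signif(res$p, 2))
}
```

Because the p-values are kept unrounded, very small values are visible in scientific notation instead of being printed as 0.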

Using these results, I am ready to test my null hypothesis for four pairs of variables. The null hypothesis is \(H_{0}\): \(\rho\) = 0, and its alternative is \(H_{1}\): \(\rho\) \(\neq\) 0. In other words, under the null hypothesis there is no linear correlation between the two variables in the population (\(\rho\) = 0), while under the alternative hypothesis there is some linear correlation (\(\rho\) \(\neq\) 0).

I have set my Type I error rate at \(\alpha\) = 0.05.
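To make the decision rule concrete, here is a minimal sketch of how a p-value for \(H_{0}\): \(\rho\) = 0 follows from r and the sample size n, using the standard t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The function name p_from_r is hypothetical, and because I feed it the rounded r values from the printed table, the results only approximate the p-values reported above.

```r
# Hypothetical helper: two-sided p-value for H0: rho = 0,
# given a sample Pearson correlation r and sample size n.
p_from_r <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

alpha <- 0.05
n <- 753  # number of rows in Mroz

# Rounded r values copied from the correlation table (an approximation).
r_rounded <- c(income_hoursw = 0.15, income_hearnw = 0.23,
               experience_hoursw = 0.40, experience_hearnw = 0.25)

p_approx <- p_from_r(r_rounded, n)
print(signif(p_approx, 2))
print(p_approx < alpha)  # TRUE means H0 is rejected at alpha
```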

My test of correlation between income (family income in 1975 dollars) and hoursw (wife’s hours of work in 1975) shows a small positive correlation between the two variables (r = 0.15, p = 5.6e-05). Since p is less than \(\alpha\), I am able to reject the null hypothesis and state that there is a small positive correlation between family income and the number of hours a wife worked in 1975. In other words, wives who worked more hours tended to be in families with higher income.

My test of correlation between income and hearnw (wife’s average hourly earnings, in 1975 dollars) shows a small positive correlation between the two variables (r = 0.23, p = 1.4e-10). Since p is less than \(\alpha\), I am able to reject the null hypothesis and state that there is a small positive correlation between family income and the wife’s average hourly earnings. In other words, wives who earned more per hour tended to be in families with higher income.

My test of correlation between experience (actual years of wife’s previous labor market experience) and hoursw shows a medium positive correlation between the two variables (r = 0.40; p is printed as 0 because of rounding, so p < 0.001). Since p is less than \(\alpha\), I am able to reject the null hypothesis and state that there is a medium positive correlation between a wife’s work experience and the number of hours she worked. In other words, wives with more experience tended to work more hours.

My test of correlation between experience and hearnw shows a small positive correlation between the two variables (r = 0.25, p = 3e-12). Since p is less than \(\alpha\), I am able to reject the null hypothesis and state that there is a small positive correlation between a wife’s work experience and the amount she earned per hour. In other words, wives with more experience tended to earn more per hour.
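The labels "small" and "medium" used above can be made explicit. Here is a minimal sketch that assumes the commonly cited Cohen cutoffs for |r| (0.1, 0.3, 0.5); both the cutoffs and the function name effect_label are my assumptions for illustration, not something the assignment prescribes.

```r
# Hypothetical helper: label the size of a correlation by |r|,
# assuming cutoffs of 0.1 (small), 0.3 (medium), and 0.5 (large).
effect_label <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.1, 0.3, 0.5, 1),
      labels = c("negligible", "small", "medium", "large"),
      right = FALSE, include.lowest = TRUE)
}

# r values copied from the correlation table.
r_values <- c(income_hoursw = 0.15, income_hearnw = 0.23,
              experience_hoursw = 0.40, experience_hearnw = 0.25,
              income_experience = -0.028)

print(data.frame(r = r_values, size = effect_label(r_values)))
```

Under these cutoffs, the one pair I did not test (income and experience) falls in the negligible band, which agrees with its non-significant p-value of 0.45.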

Here is a heat map, which gives a visual representation of the correlations between the variables.

cormat <- rquery.cormat(wifedata, graphType = "heatmap")
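A similar picture can be drawn by calling the corrplot package directly. This is a minimal sketch, not the sourced script's own method: the wrapper name plot_cor_heatmap is hypothetical, and the plot is only drawn when corrplot is installed.

```r
# Hypothetical wrapper: compute the Pearson correlation matrix and,
# if corrplot is available, draw it as a colored lower triangle.
plot_cor_heatmap <- function(df) {
  m <- cor(df, method = "pearson")
  if (requireNamespace("corrplot", quietly = TRUE)) {
    corrplot::corrplot(m, method = "color", type = "lower",
                       addCoef.col = "black", tl.col = "black")
  }
  invisible(m)  # return the matrix so it can be inspected
}

# Applied to the assignment variables when Ecdat is installed.
if (requireNamespace("Ecdat", quietly = TRUE)) {
  data(Mroz, package = "Ecdat")
  m <- plot_cor_heatmap(Mroz[, c("hoursw", "income", "hearnw", "experience")])
  print(round(m, 2))
}
```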

Here are four scatterplots, each with a smoothed line, which also show these relationships.

wifedata %>%
  ggvis(~income, ~hoursw) %>%
  layer_points() %>%
  layer_smooths() %>%
  add_axis("x", title = "1975 family income", title_offset = 50) %>%
  add_axis("y", title = "Wife's hours of work", title_offset = 50)

wifedata %>%
  ggvis(~income, ~hearnw) %>%
  layer_points() %>%
  layer_smooths() %>%
  add_axis("x", title = "1975 family income", title_offset = 50) %>%
  add_axis("y", title = "Wife's average hourly earnings", title_offset = 50)

wifedata %>%
  ggvis(~experience, ~hoursw) %>%
  layer_points() %>%
  layer_smooths() %>%
  add_axis("x", title = "Wife's work experience", title_offset = 50) %>%
  add_axis("y", title = "Wife's hours of work", title_offset = 50)

wifedata %>%
  ggvis(~experience, ~hearnw) %>%
  layer_points() %>%
  layer_smooths() %>%
  add_axis("x", title = "Wife's work experience", title_offset = 50) %>%
  add_axis("y", title = "Wife's average hourly earnings", title_offset = 50)

Michael Zigner

November 22, 2015