The Assignment

Using the Mroz Labor Supply Dataset from the Ecdat Package, we are to select four continuous variables on which to estimate Pearson Product-Moment Correlations.

First, I load the packages I will need and examine the data.
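
The chunk that produced the output below is not shown. A minimal sketch of what it might look like, assuming only that the Ecdat package is installed, is:

```r
# Assumption: Ecdat is installed, e.g. via install.packages("Ecdat")
library(Ecdat)

# Load the Mroz labor supply data shipped with Ecdat
data(Mroz)

# List the variable names, then summarise every column
names(Mroz)
summary(Mroz)
```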

##  [1] "work"       "hoursw"     "child6"     "child618"   "agew"      
##  [6] "educw"      "hearnw"     "wagew"      "hoursh"     "ageh"      
## [11] "educh"      "wageh"      "income"     "educwm"     "educwf"    
## [16] "unemprate"  "city"       "experience"
##   work         hoursw           child6          child618    
##  yes:325   Min.   :   0.0   Min.   :0.0000   Min.   :0.000  
##  no :428   1st Qu.:   0.0   1st Qu.:0.0000   1st Qu.:0.000  
##            Median : 288.0   Median :0.0000   Median :1.000  
##            Mean   : 740.6   Mean   :0.2377   Mean   :1.353  
##            3rd Qu.:1516.0   3rd Qu.:0.0000   3rd Qu.:2.000  
##            Max.   :4950.0   Max.   :3.0000   Max.   :8.000  
##       agew           educw           hearnw           wagew     
##  Min.   :30.00   Min.   : 5.00   Min.   : 0.000   Min.   :0.00  
##  1st Qu.:36.00   1st Qu.:12.00   1st Qu.: 0.000   1st Qu.:0.00  
##  Median :43.00   Median :12.00   Median : 1.625   Median :0.00  
##  Mean   :42.54   Mean   :12.29   Mean   : 2.375   Mean   :1.85  
##  3rd Qu.:49.00   3rd Qu.:13.00   3rd Qu.: 3.788   3rd Qu.:3.58  
##  Max.   :60.00   Max.   :17.00   Max.   :25.000   Max.   :9.98  
##      hoursh          ageh           educh           wageh        
##  Min.   : 175   Min.   :30.00   Min.   : 3.00   Min.   : 0.4121  
##  1st Qu.:1928   1st Qu.:38.00   1st Qu.:11.00   1st Qu.: 4.7883  
##  Median :2164   Median :46.00   Median :12.00   Median : 6.9758  
##  Mean   :2267   Mean   :45.12   Mean   :12.49   Mean   : 7.4822  
##  3rd Qu.:2553   3rd Qu.:52.00   3rd Qu.:15.00   3rd Qu.: 9.1667  
##  Max.   :5010   Max.   :60.00   Max.   :17.00   Max.   :40.5090  
##      income          educwm           educwf         unemprate     
##  Min.   : 1500   Min.   : 0.000   Min.   : 0.000   Min.   : 3.000  
##  1st Qu.:15428   1st Qu.: 7.000   1st Qu.: 7.000   1st Qu.: 7.500  
##  Median :20880   Median :10.000   Median : 7.000   Median : 7.500  
##  Mean   :23081   Mean   : 9.251   Mean   : 8.809   Mean   : 8.624  
##  3rd Qu.:28200   3rd Qu.:12.000   3rd Qu.:12.000   3rd Qu.:11.000  
##  Max.   :96000   Max.   :17.000   Max.   :17.000   Max.   :14.000  
##   city       experience   
##  no :269   Min.   : 0.00  
##  yes:484   1st Qu.: 4.00  
##            Median : 9.00  
##            Mean   :10.63  
##            3rd Qu.:15.00  
##            Max.   :45.00

1. Select four continuous variables from Mroz

I selected the following:

  • hoursw - Wife’s hours of work in 1975
  • income - Family income, in 1975 dollars
  • educw - Wife’s educational attainment, in years
  • agew - Wife’s age
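
One way to build this four-variable subset is with base R indexing. This is a sketch only; the name `mroz4` is my own, and the original chunk may have used a different approach such as `dplyr::select()`.

```r
library(Ecdat)
data(Mroz)

# Names of the four selected variables
vars <- c("hoursw", "income", "educw", "agew")

# Keep only those columns (mroz4 is a hypothetical name)
mroz4 <- Mroz[, vars]

# Summarise the subset
summary(mroz4)
```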
##      hoursw           income          educw            agew      
##  Min.   :   0.0   Min.   : 1500   Min.   : 5.00   Min.   :30.00  
##  1st Qu.:   0.0   1st Qu.:15428   1st Qu.:12.00   1st Qu.:36.00  
##  Median : 288.0   Median :20880   Median :12.00   Median :43.00  
##  Mean   : 740.6   Mean   :23081   Mean   :12.29   Mean   :42.54  
##  3rd Qu.:1516.0   3rd Qu.:28200   3rd Qu.:13.00   3rd Qu.:49.00  
##  Max.   :4950.0   Max.   :96000   Max.   :17.00   Max.   :60.00

2. Estimate Pearson Product-Moment Correlations for four pairs of variables.
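
The output below comes from a helper that returns the correlations, p-values, and a symbolic summary in one list; that helper is not shown. As a hedged alternative, the same correlations can be estimated with base R's `cor()`. The sketch below is not necessarily the function I used, and its printed layout differs from the `$r` table.

```r
library(Ecdat)
data(Mroz)

# Same variable order as the correlation table in this section
mroz4 <- Mroz[, c("agew", "hoursw", "income", "educw")]

# Pearson product-moment correlation matrix
r <- cor(mroz4, method = "pearson")

# Round for display
round(r, 3)
```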

## $r
##          agew hoursw income educw
## agew        1                    
## hoursw -0.033      1             
## income  0.052   0.15      1      
## educw   -0.12   0.11   0.36     1
## 
## $p
##           agew  hoursw income educw
## agew         0                     
## hoursw    0.36       0             
## income    0.15 5.6e-05      0      
## educw  0.00095  0.0036      0     0
## 
## $sym
##        agew hoursw income educw
## agew   1                       
## hoursw      1                  
## income             1           
## educw              .      1    
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1

3. Test null hypotheses that the population correlations = 0 for the four pairs of variables you selected.

I use the table of p-values ($p) from the Pearson Product-Moment Correlations above to test, for each of the four pairs, whether the population correlation differs from zero.
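
The same tests can also be run pair by pair with `cor.test()` from base R. The sketch below assumes a two-sided test at \(\alpha\) = 0.05; the names `var_pairs` and `results` are hypothetical.

```r
library(Ecdat)
data(Mroz)

alpha <- 0.05

# The four pairs tested in this section
var_pairs <- list(
  c("income", "agew"),
  c("income", "hoursw"),
  c("educw",  "agew"),
  c("educw",  "hoursw")
)

# Run a two-sided Pearson correlation test for each pair
results <- do.call(rbind, lapply(var_pairs, function(p) {
  ct <- cor.test(Mroz[[p[1]]], Mroz[[p[2]]], method = "pearson")
  data.frame(
    x         = p[1],
    y         = p[2],
    r         = unname(ct$estimate),
    p_value   = ct$p.value,
    reject_H0 = ct$p.value < alpha
  )
}))

results
```

Each row gives the estimated correlation, its p-value, and whether the null hypothesis is rejected at the chosen \(\alpha\).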

Correlation between income (family income, in 1975 dollars) and agew (Wife’s age)

My hypotheses are as follows:

\(H_{0}: \rho_{\text{income, agew}} = 0\)

Alternatively

\(H_{1}: \rho_{\text{income, agew}} \neq 0\)

I set \(\alpha\) = 0.05.

I find that my p-value, 0.15, is greater than 0.05 (my \(\alpha\)). Therefore, I fail to reject the null hypothesis that there is no correlation between these two variables (r = 0.052).
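
To show where this p-value comes from, the sketch below recomputes it by hand from the rounded correlation in the table (r = 0.052) and the sample size (n = 753, the total of the `work` counts in the summary). It uses the standard result that \(t = r\sqrt{n-2}/\sqrt{1-r^{2}}\) follows a t distribution with n - 2 degrees of freedom under the null hypothesis. Because r is rounded, the result only approximates the reported value.

```r
# Rounded correlation between income and agew, taken from the table
r <- 0.052

# Sample size, taken from the data summary (325 + 428)
n <- 753

# t statistic for testing H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

t_stat
p_value
```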

Correlation between income (family income, in 1975 dollars) and hoursw (Wife’s hours of work in 1975)

My hypotheses are as follows:

\(H_{0}: \rho_{\text{income, hoursw}} = 0\)

Alternatively

\(H_{1}: \rho_{\text{income, hoursw}} \neq 0\)

I set \(\alpha\) = 0.05.

I find that my p-value, 5.6e-05, is less than 0.05 (my \(\alpha\)). Therefore, I reject the null hypothesis that there is no correlation between these two variables (r = 0.15).

Correlation between educw (Wife’s educational attainment, in years) and agew (Wife’s age)

My hypotheses are as follows:

\(H_{0}: \rho_{\text{educw, agew}} = 0\)

Alternatively

\(H_{1}: \rho_{\text{educw, agew}} \neq 0\)

I set \(\alpha\) = 0.05.

I find that my p-value, 0.00095, is less than 0.05 (my \(\alpha\)). Therefore, I reject the null hypothesis that there is no correlation between these two variables (r = -0.12).

Correlation between educw (Wife’s educational attainment, in years) and hoursw (Wife’s hours of work in 1975)

My hypotheses are as follows:

\(H_{0}: \rho_{\text{educw, hoursw}} = 0\)

Alternatively

\(H_{1}: \rho_{\text{educw, hoursw}} \neq 0\)

I set \(\alpha\) = 0.05.

I find that my p-value, 0.0036, is less than 0.05 (my \(\alpha\)). Therefore, I reject the null hypothesis that there is no correlation between these two variables (r = 0.11).

4. Using ggvis, plot scatterplots containing points and a smooth line for the four pairs of variables you selected.
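
The rendered plots are not reproduced here. A minimal sketch of one such scatterplot for income against agew, assuming the ggvis package is installed, is given below. The object name, opacity, and line colour are illustrative choices of mine.

```r
library(Ecdat)
library(ggvis)
data(Mroz)

# Scatterplot of family income against wife's age,
# with a loess smooth drawn over the points
p_income_agew <- Mroz %>%
  ggvis(x = ~agew, y = ~income) %>%
  layer_points(opacity := 0.4) %>%
  layer_smooths(stroke := "red")

# Printing the object renders the plot, e.g. print(p_income_agew)
```

The other three pairs follow the same pattern, with the `x` and `y` properties swapped for the relevant variables.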

5. Produce correlograms and heat maps for the four pairs of variables you selected.

Correlogram:
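
The figure is not reproduced here. One way to draw a correlogram of the four variables is with the corrplot package. This is a sketch under the assumption that corrplot is installed, and it is not necessarily the function used for the original figure.

```r
library(Ecdat)
library(corrplot)
data(Mroz)

# Pearson correlation matrix of the four selected variables
r <- cor(Mroz[, c("agew", "hoursw", "income", "educw")])

# Correlogram: circle size and colour encode each correlation;
# only the upper triangle is drawn
corrplot(r, method = "circle", type = "upper", tl.col = "black")
```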

Heat map:
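
The figure is not reproduced here. A heat map of the same correlation matrix can be sketched with base R's `heatmap()`. The colour palette and margins below are illustrative choices, not those of the original figure.

```r
library(Ecdat)
data(Mroz)

# Pearson correlation matrix of the four selected variables
r <- cor(Mroz[, c("agew", "hoursw", "income", "educw")])

# Blue-white-red palette for the correlation values
pal <- colorRampPalette(c("blue", "white", "red"))(25)

# Symmetric heat map; scale = "none" keeps the raw correlations
hm <- heatmap(r, symm = TRUE, scale = "none", col = pal, margins = c(7, 7))
```

Note that `heatmap()` reorders rows and columns by hierarchical clustering, so the variables may not appear in the order given.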