1.) To begin, I created a new project in R and loaded the packages and data I will be working with:
require(Ecdat)
## Loading required package: Ecdat
## Loading required package: Ecfun
##
## Attaching package: 'Ecdat'
##
## The following object is masked from 'package:datasets':
##
## Orange
require(corrplot)
## Loading required package: corrplot
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(ggvis)
## Loading required package: ggvis
require(magrittr)
## Loading required package: magrittr
data(Mroz)
names(Mroz)
## [1] "work" "hoursw" "child6" "child618" "agew"
## [6] "educw" "hearnw" "wagew" "hoursh" "ageh"
## [11] "educh" "wageh" "income" "educwm" "educwf"
## [16] "unemprate" "city" "experience"
summary(Mroz)
## work hoursw child6 child618
## yes:325 Min. : 0.0 Min. :0.0000 Min. :0.000
## no :428 1st Qu.: 0.0 1st Qu.:0.0000 1st Qu.:0.000
## Median : 288.0 Median :0.0000 Median :1.000
## Mean : 740.6 Mean :0.2377 Mean :1.353
## 3rd Qu.:1516.0 3rd Qu.:0.0000 3rd Qu.:2.000
## Max. :4950.0 Max. :3.0000 Max. :8.000
## agew educw hearnw wagew
## Min. :30.00 Min. : 5.00 Min. : 0.000 Min. :0.00
## 1st Qu.:36.00 1st Qu.:12.00 1st Qu.: 0.000 1st Qu.:0.00
## Median :43.00 Median :12.00 Median : 1.625 Median :0.00
## Mean :42.54 Mean :12.29 Mean : 2.375 Mean :1.85
## 3rd Qu.:49.00 3rd Qu.:13.00 3rd Qu.: 3.788 3rd Qu.:3.58
## Max. :60.00 Max. :17.00 Max. :25.000 Max. :9.98
## hoursh ageh educh wageh
## Min. : 175 Min. :30.00 Min. : 3.00 Min. : 0.4121
## 1st Qu.:1928 1st Qu.:38.00 1st Qu.:11.00 1st Qu.: 4.7883
## Median :2164 Median :46.00 Median :12.00 Median : 6.9758
## Mean :2267 Mean :45.12 Mean :12.49 Mean : 7.4822
## 3rd Qu.:2553 3rd Qu.:52.00 3rd Qu.:15.00 3rd Qu.: 9.1667
## Max. :5010 Max. :60.00 Max. :17.00 Max. :40.5090
## income educwm educwf unemprate
## Min. : 1500 Min. : 0.000 Min. : 0.000 Min. : 3.000
## 1st Qu.:15428 1st Qu.: 7.000 1st Qu.: 7.000 1st Qu.: 7.500
## Median :20880 Median :10.000 Median : 7.000 Median : 7.500
## Mean :23081 Mean : 9.251 Mean : 8.809 Mean : 8.624
## 3rd Qu.:28200 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:11.000
## Max. :96000 Max. :17.000 Max. :17.000 Max. :14.000
## city experience
## no :269 Min. : 0.00
## yes:484 1st Qu.: 4.00
## Median : 9.00
## Mean :10.63
## 3rd Qu.:15.00
## Max. :45.00
Also, as directed, I will load the rquery.cormat() function from STHDA:
source("http://www.sthda.com/upload/rquery_cormat.r")
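rquery.cormat() is downloaded from an external URL every time the script runs. If that URL is ever unavailable, base R can reproduce the correlation matrix and the p-value matrix. The following is a minimal sketch, not the STHDA implementation. cor_with_p() is a hypothetical helper name, and the demo data frame is simulated rather than taken from Mroz.

```r
# Hypothetical helper (not part of the STHDA script): returns the Pearson
# correlation matrix and a matrix of two-sided p-values from cor.test().
cor_with_p <- function(df) {
  vars <- names(df)
  k <- length(vars)
  r <- cor(df, method = "pearson")                 # k x k correlation matrix
  p <- matrix(0, k, k, dimnames = list(vars, vars)) # diagonal left at 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # cor.test() tests H0: rho = 0 for one pair of columns
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]], method = "pearson")$p.value
    }
  }
  list(r = r, p = p)
}

# Usage on simulated data (the real analysis would pass `variables`)
set.seed(1)
x <- rnorm(100)
demo <- data.frame(a = x, b = x + rnorm(100), c = rnorm(100))
res <- cor_with_p(demo)
round(res$r, 2)
signif(res$p, 2)
```

Because the loop fills both triangles, the p-value matrix is symmetric, like the "full" layout that rquery.cormat() prints.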
2.) Below, I estimate the Pearson product-moment correlations among four variables selected from the Mroz dataset. Together these form six pairs of variables:
First, I subset Mroz to the four variables I selected: income (family income), wageh (husband's hourly wage), hoursw (wife's hours worked) and hoursh (husband's hours worked):
variables <- Mroz %>%
select(income, wageh, hoursw, hoursh)
head(variables)
## income wageh hoursw hoursh
## 1 16310 4.0288 1610 2708
## 2 21800 8.4416 1656 2310
## 3 21040 3.5807 1980 3072
## 4 7300 3.5417 456 1920
## 5 27300 10.0000 1568 2000
## 6 19495 6.7106 2032 1040
Next, I estimated the Pearson product-moment correlations among these variables:
rquery.cormat(variables)
## $r
## income wageh hoursw hoursh
## income 1
## wageh 0.73 1
## hoursw 0.15 -0.099 1
## hoursh 0.13 -0.24 -0.056 1
##
## $p
## income wageh hoursw hoursh
## income 0
## wageh 0 0
## hoursw 5.6e-05 0.0068 0
## hoursh 0.00042 5.4e-11 0.12 0
##
## $sym
## income wageh hoursw hoursh
## income 1
## wageh , 1
## hoursw 1
## hoursh 1
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
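For reference, Pearson's r is the sum of the cross-products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The minimal sketch below computes r directly from that definition and agrees with base R's cor(). pearson_r() is a hypothetical helper, and x and y are made-up vectors rather than Mroz data.

```r
# Pearson product-moment correlation computed from its definition:
# r = sum((x - mean(x)) * (y - mean(y))) /
#     sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)   # deviations from the mean of x
  dy <- y - mean(y)   # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
pearson_r(x, y)   # 0.7745967 (= 6 / sqrt(60)), same as cor(x, y)
cor(x, y)
```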
3.) For each pair of the selected variables, I also tested the null hypothesis that the population correlation equals 0.
Null hypothesis (H0): the population correlation between the two variables in a pair is zero (ρ = 0).
Alternative hypothesis (H1): the population correlation between the two variables in a pair is not zero (ρ ≠ 0).
I set alpha = 0.05.
The p-values below are all smaller than alpha for five pairs: income and wageh (printed as 0, i.e. smaller than the displayed precision), income and hoursw (5.6e-05), wageh and hoursw (0.0068), income and hoursh (0.00042), and wageh and hoursh (5.4e-11). I therefore reject the null hypothesis for these five pairs.
## $p
##        income  wageh   hoursw hoursh
## income 0
## wageh  0       0
## hoursw 5.6e-05 0.0068  0
## hoursh 0.00042 5.4e-11 0.12   0
I fail to reject the null hypothesis for the correlation between hoursw and hoursh because its p-value (0.12) is larger than alpha (0.05).
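These p-values can be checked by hand. Under the usual assumptions of the Pearson test, the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom when ρ = 0. This is the test that cor.test() performs, and I am assuming that rquery.cormat() uses the same test. The sketch below uses n = 753 observations (325 + 428, from the work counts in summary()) together with the rounded r values printed above, so its p-values are approximate. cor_p_value() is a hypothetical helper.

```r
# Two-sided test of H0: rho = 0 for a Pearson correlation r from n pairs.
# Under H0, t = r * sqrt(n - 2) / sqrt(1 - r^2) ~ t with n - 2 df.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

n <- 753                 # 325 + 428 rows in Mroz
cor_p_value(-0.056, n)   # hoursw vs hoursh: about 0.12, above alpha = 0.05
cor_p_value(0.73, n)     # income vs wageh: far below 0.05 (printed as 0 above)
```

The second result explains the "0" entries in $p. The true p-value is positive but far smaller than the precision that is printed.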
4.) Using ggvis, I created scatterplots with points and a smoothed line for the four pairs of variables I selected.
Income and Wageh
variables %>% ggvis(~income, ~wageh) %>% layer_points() %>% layer_smooths() %>% add_axis("x", title = "income") %>% add_axis("y", title = "wageh")
Income and Hoursw
variables %>% ggvis(~income, ~hoursw) %>% layer_points() %>% layer_smooths() %>% add_axis("x", title = "income") %>% add_axis("y", title = "hoursw")
Income and Hoursh
variables %>% ggvis(~income, ~hoursh) %>% layer_points() %>% layer_smooths() %>% add_axis("x", title = "income") %>% add_axis("y", title = "hoursh")
Hoursw and Wageh
variables %>% ggvis(~hoursw, ~wageh) %>% layer_points() %>% layer_smooths() %>% add_axis("x", title = "hoursw") %>% add_axis("y", title = "wageh")
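layer_smooths() draws a locally weighted regression (loess) curve. As I understand the ggvis documentation, its default span is 0.75. The sketch below fits the same kind of curve with base R's loess() and plots it. It may help in reading the smooth lines above, but it is not the ggvis internals. x and y are simulated stand-ins, not Mroz variables.

```r
# Sketch of a loess smooth similar to what layer_smooths() draws,
# assuming a span of 0.75 (the ggvis default, as I understand it).
set.seed(3)
x <- runif(200, 0, 10)               # made-up predictor (stand-in for income)
y <- sin(x) + rnorm(200, sd = 0.3)   # made-up response (stand-in for wageh)

fit  <- loess(y ~ x, span = 0.75)    # local quadratic regression (degree = 2 by default)
grid <- data.frame(x = seq(min(x), max(x), length.out = 80))
smooth_y <- predict(fit, newdata = grid)   # fitted curve on 80 evenly spaced points

plot(x, y, pch = 16, col = "grey50")
lines(grid$x, smooth_y, col = "blue", lwd = 2)
```

A smaller span follows the data more closely. A larger span gives a flatter curve that is closer to a straight line.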
5.) Finally, I have produced some visual representations of the variable correlations:
First, two correlograms:
rquery.cormat(variables)
## $r
## income wageh hoursw hoursh
## income 1
## wageh 0.73 1
## hoursw 0.15 -0.099 1
## hoursh 0.13 -0.24 -0.056 1
##
## $p
## income wageh hoursw hoursh
## income 0
## wageh 0 0
## hoursw 5.6e-05 0.0068 0
## hoursh 0.00042 5.4e-11 0.12 0
##
## $sym
## income wageh hoursw hoursh
## income 1
## wageh , 1
## hoursw 1
## hoursh 1
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
rquery.cormat(variables, type="full")
## $r
## income wageh hoursw hoursh
## income 1.00 0.730 0.150 0.130
## wageh 0.73 1.000 -0.099 -0.240
## hoursw 0.15 -0.099 1.000 -0.056
## hoursh 0.13 -0.240 -0.056 1.000
##
## $p
## income wageh hoursw hoursh
## income 0.0e+00 0.0e+00 5.6e-05 4.2e-04
## wageh 0.0e+00 0.0e+00 6.8e-03 5.4e-11
## hoursw 5.6e-05 6.8e-03 0.0e+00 1.2e-01
## hoursh 4.2e-04 5.4e-11 1.2e-01 0.0e+00
##
## $sym
## income wageh hoursw hoursh
## income 1
## wageh , 1
## hoursw 1
## hoursh 1
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
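The $sym component replaces the absolute value of each correlation with a symbol. Its legend matches the defaults of base R's stats::symnum(), which I assume rquery.cormat() uses internally. The sketch below rebuilds the matrix from the rounded $r values above and applies symnum(). Only the income and wageh pair (|r| = 0.73, in the 0.6 to 0.8 band) gets a ','. Every other off-diagonal |r| is below 0.3, so it prints as a blank.

```r
# Rounded correlations reported by rquery.cormat() above
vars <- c("income", "wageh", "hoursw", "hoursh")
r_reported <- matrix(c(1.00,  0.73,   0.15,   0.13,
                       0.73,  1.00,  -0.099, -0.24,
                       0.15, -0.099,  1.00,  -0.056,
                       0.13, -0.24,  -0.056,  1.00),
                     nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# symnum() codes |r| using the cutpoints 0, 0.3, 0.6, 0.8, 0.9, 0.95, 1,
# shows the diagonal as "1" and blanks out the upper triangle
sym <- symnum(r_reported, abbr.colnames = FALSE)
sym
```

This explains why the hoursw and hoursh rows of $sym show only the "1" on the diagonal.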
Now a heatmap:
cormat<-rquery.cormat(variables, graphType="heatmap")