The CSV file is hosted on GitHub.
library(tidyr)
library(dplyr)
library(lattice)
require(knitr)
wage <- read.csv(
  "https://raw.githubusercontent.com/xkong100/IS607/master/Project2/Wages1.csv",
  stringsAsFactors = FALSE, check.names = FALSE, na.strings = c("", "NA")
)
kable(head(wage))
| | exper | sex | school | wage |
|---|---|---|---|---|
| 1 | 9 | female | 13 | 6.315296 |
| 2 | 12 | female | 12 | 5.479770 |
| 3 | 11 | female | 11 | 3.642170 |
| 4 | 9 | female | 14 | 4.593337 |
| 5 | 8 | female | 14 | 2.418157 |
| 6 | 9 | female | 14 | 2.094058 |
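# Re-wrap as a data frame; the unnamed row-id column is renamed "Var.1"
wage <- data.frame(wage)
# One wage column per sex; in each row the column for the other sex is NA
wage1 <- wage %>% spread(sex, wage)
# Replace those NAs with 0
wage1$female[is.na(wage1$female)] <- 0
wage1$male[is.na(wage1$male)] <- 0
# Keep workers with 10 years of experience and 14 years of schooling
wage2 <- wage1 %>% filter(exper == 10)
wage3 <- wage2 %>% filter(school == 14)
wage4 <- wage3 %>% arrange(exper)
colnames(wage4)[1] <- "Worker_id"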
kable(head(wage4))
| Worker_id | exper | school | female | male |
|---|---|---|---|---|
| 11 | 10 | 14 | 6.736894 | 0 |
| 26 | 10 | 14 | 11.330230 | 0 |
| 31 | 10 | 14 | 4.287114 | 0 |
| 321 | 10 | 14 | 5.756982 | 0 |
| 572 | 10 | 14 | 9.834421 | 0 |
| 575 | 10 | 14 | 15.127389 | 0 |
wage4 %>% summarise(
  Mean_femalewage_10yrs = mean(female),
  max_femalewage_10yrs = max(female),
  mean_malewage_10yrs = mean(male),
  max_malewage_10yrs = max(male)
)
## Mean_femalewage_10yrs max_femalewage_10yrs mean_malewage_10yrs
## 1 3.79048 15.12739 3.283094
## max_malewage_10yrs
## 1 16.65645
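Holding experience at 10 years and schooling at 14 years, the mean female wage (3.79) is higher than the mean male wage (3.28). Note that both means are taken over all rows, including the zeros filled in for workers of the other sex, so they are lower than the average wage within each group.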
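Because `spread()` followed by zero-filling leaves a zero in the other sex's column for every worker, one alternative is to compute each group's statistics directly on the long data with `group_by()`. The following is a minimal sketch, assuming a small made-up data frame `toy` (hypothetical values, not taken from Wages1.csv) with the same columns as `wage`:

```r
library(dplyr)

# Hypothetical rows for illustration only; not taken from Wages1.csv
toy <- data.frame(
  exper  = c(10, 10, 10, 10, 9),
  sex    = c("female", "female", "male", "male", "male"),
  school = c(14, 14, 14, 14, 12),
  wage   = c(6, 8, 5, 7, 4),
  stringsAsFactors = FALSE
)

# Keep workers with 10 years of experience and 14 years of schooling,
# then summarise each sex separately so no zero-filled rows enter the mean
by_sex <- toy %>%
  filter(exper == 10, school == 14) %>%
  group_by(sex) %>%
  summarise(n = n(), mean_wage = mean(wage), max_wage = max(wage))

print(by_sex)
```

On the real data, the same pipeline could be applied to `wage` in place of `toy`; the `n` column shows how many workers each mean is based on.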
wage5 <- wage %>% filter(exper == 10)
wage6 <- wage5 %>% arrange(school)
kable(head(wage6))
| Var.1 | exper | sex | school | wage |
|---|---|---|---|---|
| 495 | 10 | female | 5 | 0.6031654 |
| 2122 | 10 | male | 7 | 4.0355397 |
| 3038 | 10 | male | 7 | 1.9324717 |
| 3058 | 10 | male | 7 | 3.8663337 |
| 3200 | 10 | male | 7 | 3.2663727 |
| 434 | 10 | female | 8 | 6.1480953 |
# Wage (y-axis) against years of schooling (x-axis); no log transform is applied
xyplot(wage ~ school, data = wage6,
       xlab = "Education (years)",
       ylab = "Wage",
       main = "Education and wage"
)
The xy-plot suggests a positive relationship between education and wage among workers with 10 years of experience: workers with more years of schooling tend to earn higher wages.
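One way to support the visual impression with a number is to compute a correlation and fit a simple linear regression of wage on schooling. The following is a minimal sketch in base R, assuming a made-up data frame `toy` (hypothetical values, not taken from Wages1.csv):

```r
# Hypothetical values for illustration only; not taken from Wages1.csv
toy <- data.frame(
  school = c(8, 10, 12, 14, 16),
  wage   = c(3, 4, 6, 7, 9)
)

# Pearson correlation between years of schooling and wage
r <- cor(toy$school, toy$wage)

# Linear model: the slope estimates the change in wage per extra year of schooling
fit <- lm(wage ~ school, data = toy)
slope <- coef(fit)[["school"]]

print(r)
print(slope)
```

A positive `r` and a positive `slope` would be consistent with the pattern in the plot; on the real data, `wage6` would take the place of `toy`.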