(1) Create a .CSV file that includes all of the information above.

The CSV file is saved on GitHub.

(2) Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.

library(tidyr)
library(dplyr)
library(lattice)
library(knitr)
wage <- read.csv(
  "https://raw.githubusercontent.com/xkong100/IS607/master/Project2/Wages1.csv",
  stringsAsFactors = FALSE, check.names = FALSE, na.strings = c("", "NA")
)
kable(head(wage))
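|   | exper | sex    | school | wage     |
|:--|------:|:-------|-------:|---------:|
| 1 |     9 | female |     13 | 6.315296 |
| 2 |    12 | female |     12 | 5.479770 |
| 3 |    11 | female |     11 | 3.642170 |
| 4 |     9 | female |     14 | 4.593337 |
| 5 |     8 | female |     14 | 2.418157 |
| 6 |     9 | female |     14 | 2.094058 |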

To compare wages by gender for workers with 10 years of experience and 14 years of education, we first need to reorganize the data.

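# data.frame() renames the unnamed first column (the row ID) to "Var.1"
wage <- data.frame(wage)
# One wage column per gender; each worker has a value in only one of them
wage1 <- wage %>% spread(sex, wage)
# Replace the resulting NAs with 0 (these zeros are included in any later mean)
wage1$female[is.na(wage1$female)] <- 0
wage1$male[is.na(wage1$male)] <- 0
wage2 <- wage1 %>% filter(exper == 10)
wage3 <- wage2 %>% filter(school == 14)
wage4 <- wage3 %>% arrange(exper)
colnames(wage4)[1] <- "Worker_id"
kable(head(wage4))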
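| Worker_id | exper | school | female    | male |
|----------:|------:|-------:|----------:|-----:|
|        11 |    10 |     14 |  6.736894 |    0 |
|        26 |    10 |     14 | 11.330230 |    0 |
|        31 |    10 |     14 |  4.287114 |    0 |
|       321 |    10 |     14 |  5.756982 |    0 |
|       572 |    10 |     14 |  9.834421 |    0 |
|       575 |    10 |     14 | 15.127389 |    0 |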
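
For readers on a recent version of tidyr, the reshaping step above can also be sketched with `pivot_wider()`, which supersedes `spread()`. The snippet below is only an illustration: the `toy` data frame, its `id` column, and all of its values are made up and are not taken from Wages1.csv.

```r
library(tidyr)

# Hypothetical toy data in the same long layout as `wage` (values are made up)
toy <- data.frame(
  id     = c(1, 2, 3),
  exper  = c(10, 10, 9),
  school = c(14, 14, 12),
  sex    = c("female", "male", "female"),
  wage   = c(4.5, 6.0, 3.2),
  stringsAsFactors = FALSE
)

# One column per gender; a worker's wage appears only in their own column,
# and the other column is left as NA instead of being filled with 0
wide <- pivot_wider(toy, names_from = sex, values_from = wage)

print(wide)
```

Leaving the empty cells as `NA` keeps them from being counted as zero wages in later summaries, provided those summaries use `na.rm = TRUE`.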

After we organize our data, we can use ‘summarise’ to analyze the data.

wage4 %>% summarise(
  Mean_femalewage_10yrs = mean(female),
  max_femalewage_10yrs = max(female),
  mean_malewage_10yrs = mean(male),
  max_malewage_10yrs = max(male)
)
##   Mean_femalewage_10yrs max_femalewage_10yrs mean_malewage_10yrs
## 1               3.79048             15.12739            3.283094
##   max_malewage_10yrs
## 1           16.65645

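With experience fixed at 10 years and education at 14 years, the computed mean for females (3.79) is higher than that for males (3.28), although the maximum male wage (16.66) exceeds the maximum female wage (15.13). Because the missing values were replaced with 0 earlier, each mean is taken over every worker in the subset rather than only over workers of that gender, so these figures understate the actual mean wages and should be compared with caution.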
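
One way to avoid the zero-filling issue is to skip the reshaping and summarise by group instead. The following is a minimal sketch under stated assumptions, not the analysis used in this report: the `toy` data frame and its values are made up, and only the column names match `wage`.

```r
library(dplyr)

# Hypothetical toy data with the same columns as `wage` (values are made up)
toy <- data.frame(
  exper  = c(10, 10, 10, 10, 9),
  sex    = c("female", "female", "male", "male", "male"),
  school = c(14, 14, 14, 14, 12),
  wage   = c(4, 6, 3, 9, 5),
  stringsAsFactors = FALSE
)

# Keep the long layout and compute the statistics within each gender
by_sex <- toy %>%
  filter(exper == 10, school == 14) %>%
  group_by(sex) %>%
  summarise(n = n(), mean_wage = mean(wage), max_wage = max(wage))

# For comparison: the female mean when male rows are counted as zero wages
sub <- toy %>% filter(exper == 10, school == 14)
zero_filled_female_mean <- mean(ifelse(sub$sex == "female", sub$wage, 0))

print(by_sex)
print(zero_filled_female_mean)
```

In this toy example the grouped female mean is 5, while the zero-filled version gives 2.5, which shows how counting the other gender's rows as zeros pulls the mean down.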

We also want to find out whether education affects the wage rate, so we again fix experience at 10 years.

wage5 <- wage %>% filter(exper == 10)
wage6 <- wage5 %>% arrange(school)
kable(head(wage6))
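| Var.1 | exper | sex    | school | wage      |
|------:|------:|:-------|-------:|----------:|
|   495 |    10 | female |      5 | 0.6031654 |
|  2122 |    10 | male   |      7 | 4.0355397 |
|  3038 |    10 | male   |      7 | 1.9324717 |
|  3058 |    10 | male   |      7 | 3.8663337 |
|  3200 |    10 | male   |      7 | 3.2663727 |
|   434 |    10 | female |      8 | 6.1480953 |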
# Formula is y ~ x: wage on the vertical axis, years of education on the horizontal axis
xyplot(wage ~ school, data = wage6,
  xlab = "Education (years)",
  ylab = "Wage",
  main = "Education and wage"
)

The xy-plot suggests a positive relationship between education and wage: workers with more years of schooling tend to earn higher wages.
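
A visual impression like this can be backed up with a number. The sketch below shows one possible check using base R's `cor()` and `lm()`; it was not run as part of this report, and the `toy` data frame is made up purely for illustration. With the real data, `wage6` would take the place of `toy`.

```r
# Hypothetical toy data (values are made up for illustration only)
toy <- data.frame(
  school = c(8, 10, 12, 14, 16),
  wage   = c(2, 3, 5, 6, 9)
)

# Pearson correlation between years of schooling and wage
r <- cor(toy$school, toy$wage)

# Simple linear regression of wage on schooling; the slope estimates
# the change in wage associated with one more year of school
fit <- lm(wage ~ school, data = toy)
slope <- unname(coef(fit)["school"])

print(r)
print(slope)
```

A positive correlation and a positive slope would be consistent with the pattern seen in the plot, although neither shows on its own that education causes higher wages.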