STAT545A Homework 5

Jack Ni

I will be using the Labour Force Survey (LFS) data for this homework. The LFS is a national survey that records many characteristics of employed and unemployed people, including sex, age group, education, hourly earnings, and job tenure. I will be working with this data to look for meaningful relationships between these variables.

lfs <- read.csv("LFS.csv")
str(lfs)
## 'data.frame':    136407 obs. of  12 variables:
##  $ YEAR     : int  1997 1997 1997 1997 1997 1997 1997 1997 1997 1997 ...
##  $ PROV     : Factor w/ 4 levels "AL","BC","ON",..: 3 1 2 3 3 3 3 4 3 3 ...
##  $ SEX      : Factor w/ 2 levels "Female","Male": 2 1 1 1 1 1 2 2 2 1 ...
##  $ MARRIED  : Factor w/ 2 levels "No","Yes": 2 2 1 2 2 1 2 1 2 2 ...
##  $ AGE      : Factor w/ 5 levels "15-24","25-34",..: 3 3 1 3 4 4 3 1 1 1 ...
##  $ EDUCATION: int  3 2 2 3 2 1 3 3 2 3 ...
##  $ FULLTIME : Factor w/ 2 levels "FT","PT": 1 1 2 1 1 1 1 1 1 1 ...
##  $ HOURS    : num  16 47 24 4 0 37.5 40 35 64 30 ...
##  $ TENURE   : int  140 61 22 80 240 204 74 8 32 33 ...
##  $ HRLYEARN : num  15 21.1 7 21 21.6 ...
##  $ UNION    : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 1 2 1 1 ...
##  $ FWEIGHT  : int  66 310 257 541 86 267 726 680 335 164 ...
library(ggplot2)

Here, I make four panels of boxplots, one for each year, comparing hourly earnings across age groups. There don't seem to be many notable differences between years, but the general trend is a slight increase in median earnings as the year increases. The lowest-earning age group is 15-24, while the highest looks to be 45-54.

ggplot(lfs, aes(x = AGE, y = HRLYEARN)) + geom_boxplot() + facet_grid(~YEAR)

[Figure: boxplots of hourly earnings (HRLYEARN) by age group (AGE), one panel per YEAR]
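The trend in medians is hard to read off a boxplot exactly, so a small table of medians may help. The following is only a sketch on simulated data: the `toy` data frame, the years 2002 and 2007, and the age levels other than 15-24, 25-34, and 45-54 are assumptions for illustration, not values taken from the LFS file.

```r
# Toy stand-in for the LFS data: the column names match the real data,
# but every value (and some of the levels) is simulated for illustration.
set.seed(1)
n <- 400
toy <- data.frame(
  YEAR = sample(c(1997L, 2002L, 2007L, 2012L), n, replace = TRUE),
  AGE = factor(sample(c("15-24", "25-34", "35-44", "45-54", "55-64"),
                      n, replace = TRUE)),
  HRLYEARN = round(runif(n, min = 7, max = 40), 2)
)

# Median hourly earnings for every AGE x YEAR combination
med <- aggregate(HRLYEARN ~ AGE + YEAR, data = toy, FUN = median)

# Lay the medians out with age groups as rows and years as columns
# (each cell holds exactly one median, so the sum in xtabs is that median)
med_wide <- xtabs(HRLYEARN ~ AGE + YEAR, data = med)
print(round(med_wide, 2))
```

Swapping `toy` for `lfs` would give the same table for the real data, which could then be compared against the boxplots.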

I also wanted to see if being in a union makes a difference in the number of hours worked. To do this, I made a density plot of hours for union and non-union workers. There doesn't look to be much of a difference between the two groups.

ggplot(lfs, aes(x = HOURS, color = UNION)) + geom_density()

[Figure: density of hours worked (HOURS), coloured by UNION]
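One way to back up the visual impression is to compare quartiles of hours for the two groups. This is a minimal sketch on simulated data; the `toy` data frame and its values are assumptions and do not come from the LFS file.

```r
# Toy stand-in for the LFS data: same column names, simulated values.
set.seed(2)
n <- 300
toy <- data.frame(
  UNION = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  HOURS = round(runif(n, min = 0, max = 60), 1)
)

# Quartiles of hours worked within each union group;
# rows are the quartiles, columns are the UNION levels
hours_by_union <- sapply(split(toy$HOURS, toy$UNION),
                         quantile, probs = c(0.25, 0.5, 0.75))
print(hours_by_union)
```

If the columns of this table are close for the real data, that would agree with the overlapping densities.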

Here I made a bar graph of level of education for men and women. There seem to be more men than women at a pre-high-school level of education. The other education levels (high school diploma, some undergraduate experience, graduate) look to be evenly split.

ggplot(lfs, aes(x = EDUCATION, color = SEX)) + geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust
## this.

[Figure: histogram of EDUCATION, outlined by SEX]
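Because EDUCATION is stored as an integer code, a cross-tabulation may be an easier way to check the split between men and women at each level. The sketch below uses simulated data; the `toy` data frame and the assumption that the codes run from 1 to 4 are mine, not confirmed by the LFS file.

```r
# Toy stand-in for the LFS data: same column names, simulated values.
# The education codes 1:4 are an assumption for illustration.
set.seed(3)
n <- 300
toy <- data.frame(
  EDUCATION = sample(1:4, n, replace = TRUE),
  SEX = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Number of respondents for each education level and sex
counts <- table(EDUCATION = toy$EDUCATION, SEX = toy$SEX)
print(counts)

# Share of each sex within every education level (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))
```

For the plot itself, treating EDUCATION as a factor and using `fill = SEX` with `geom_bar(position = "dodge")` would probably give side-by-side bars that are easier to compare than an outlined histogram.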

This is a smoothed line graph of hourly earnings against job tenure for 2012. It makes sense that earnings generally rise with job tenure, but the curve looks roughly flat for TENURE values between 150 and 200.

ggplot(subset(lfs, YEAR == 2012), aes(x = TENURE, y = HRLYEARN)) + geom_smooth()
## geom_smooth: method="auto" and size of largest group is >=1000, so using
## gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the
## smoothing method.

[Figure: smoothed hourly earnings (HRLYEARN) against job tenure (TENURE), 2012 only]
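To look at the flat stretch without relying on the smoother, one option is to bin tenure and compare mean earnings per bin. This is only a sketch on simulated data: the `toy` data frame, the linear relationship used to generate it, and the band width of 50 are all assumptions for illustration.

```r
# Toy stand-in for the 2012 LFS subset: same column names, simulated values.
set.seed(4)
n <- 500
toy <- data.frame(TENURE = sample(1:240, n, replace = TRUE))
# Simulated earnings that rise with tenure plus noise (an assumption)
toy$HRLYEARN <- round(10 + 0.05 * toy$TENURE + rnorm(n, sd = 3), 2)

# Group tenure into bands of width 50: (0,50], (50,100], ..., (200,250]
toy$TENURE_BAND <- cut(toy$TENURE, breaks = seq(0, 250, by = 50))

# Mean hourly earnings within each tenure band
band_means <- tapply(toy$HRLYEARN, toy$TENURE_BAND, mean)
print(round(band_means, 2))
```

On the real data, similar means in neighbouring bands around 150-200 would be consistent with the plateau seen in the smoothed curve.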