The relationship between income and education for each category generated from crossing the factor prestige with the type of occupation

The structure of the data

The data set has 102 observations of 6 variables.
There are three types of occupation: “bc” means “blue collar”, “prof” means “professional”, and “wc” means “white collar”. The prestige scores, and the relationship between education and income within each type of occupation, are discussed below.

	education	income	women	prestige	census	type
gov.administrators	13.11	12351	11.16	68.8	1113	prof
general.managers	12.26	25879	4.02	69.1	1130	prof
accountants	12.77	9271	15.70	63.4	1171	prof
purchasing.officers	11.42	8865	9.11	56.8	1175	prof
chemists	14.62	8403	11.68	73.5	2111	prof
physicists	15.64	11030	5.13	77.6	2113	prof

## Classes 'tbl_df', 'tbl' and 'data.frame':    102 obs. of  6 variables:
##  $ education: num  13.1 12.3 12.8 11.4 14.6 ...
##  $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
##  $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
##  $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
##  $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
##  $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...

Comments on the code chunk

Group the data by type of occupation.
Summarize the median prestige score within each type of occupation, dropping missing prestige values via na.rm.
The “professional” type has the highest median prestige score (68.4) of the three types of occupation. Occupations with no recorded type appear as a separate <NA> group.

dta %>% group_by(type) %>% summarize(prestige_median = median(prestige, na.rm = TRUE))

## Warning: Factor `type` contains implicit NA, consider using
## `forcats::fct_explicit_na`

## # A tibble: 4 x 2
##   type  prestige_median
##   <fct>           <dbl>
## 1 bc               35.9
## 2 prof             68.4
## 3 wc               41.5
## 4 <NA>             35
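The <NA> row appears because some occupations have no recorded type. As a minimal sketch, assuming one wants medians only for the three known types, the rows with a missing type can be dropped before summarizing. The example uses base R and a small made-up data frame (`toy`, not the report's `dta`) so that it runs without extra packages.

```r
# Made-up data for illustration only; not the values from the report.
toy <- data.frame(
  type     = factor(c("bc", "bc", "prof", "prof", "wc", NA)),
  prestige = c(30, 40, 70, NA, 45, 35)
)

# Keep only rows whose occupation type is recorded.
known <- toy[!is.na(toy$type), ]

# Median prestige per type; na.rm = TRUE drops missing prestige values.
medians <- tapply(known$prestige, known$type, median, na.rm = TRUE)
print(medians)
```

The same idea can be expressed in the dplyr pipeline by filtering on `!is.na(type)` before `group_by(type)`.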

Comments on the code chunk

Select the rows with type “bc”.
Use the median prestige score within the type to define two levels of prestige: Low and High.
Name the new variable “prestige_f”.
Draw the scatter plot of income against education, with a regression line for each level of prestige.

Conclusion: For the “blue collar” category, at the high level of prestige there is a moderate positive relationship between education and income (r = 0.45), while at the low level there is almost no linear relationship (r = 0.04).

dta1 <- dta %>%
  filter(type == "bc") %>%
  mutate(prestige_f = cut(prestige, breaks = quantile(prestige, probs = c(0, .50, 1)),
                          labels = c("Low", "High"), ordered_result = TRUE))
dta1 %>%
  xyplot(income ~ education, groups = prestige_f, type = c("p", "g", "r"), data = .,
         xlab = "education", ylab = "income", auto.key = list(columns = 2))

dta1 %>%
  group_by(prestige_f) %>%
  dplyr::summarize(r=cor(education, income))

## Warning: Factor `prestige_f` contains implicit NA, consider using
## `forcats::fct_explicit_na`

## # A tibble: 3 x 2
##   prestige_f       r
##   <ord>        <dbl>
## 1 Low         0.0403
## 2 High        0.450 
## 3 <NA>       NA
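The <NA> group in this output appears to come from how `cut()` handles its lowest break. By default the intervals are open on the left, so the observation equal to the minimum prestige score falls outside every bin. The sketch below shows the effect on made-up scores (not the report's data) and how `include.lowest = TRUE` keeps the minimum in the “Low” group. Applying this option to the report's code would change the “Low” group and therefore its correlation, so the numbers above would not be reproduced exactly.

```r
# Made-up prestige scores for illustration only.
prestige <- c(20, 25, 30, 35, 40, 45)
breaks <- quantile(prestige, probs = c(0, 0.5, 1))

# Default: bins are (lo, hi], so the minimum value is assigned NA.
default_split <- cut(prestige, breaks = breaks,
                     labels = c("Low", "High"), ordered_result = TRUE)

# include.lowest = TRUE closes the first bin on the left: [lo, hi].
closed_split <- cut(prestige, breaks = breaks,
                    labels = c("Low", "High"), ordered_result = TRUE,
                    include.lowest = TRUE)

print(table(default_split, useNA = "ifany"))
print(table(closed_split, useNA = "ifany"))
```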

Conclusion: For the “professional” category, at the high level of prestige there is essentially no linear relationship between education and income (r ≈ 0), while at the low level there is a mild positive relationship (r = 0.38). Interestingly, this is the opposite of the pattern in the blue collar category.

dta2 <- dta %>%
  filter(type == "prof") %>%
  mutate(prestige_m = cut(prestige, breaks = quantile(prestige, probs = c(0, .50, 1)),
                          labels = c("Low", "High"), ordered_result = TRUE))
dta2 %>%
  xyplot(income ~ education, groups = prestige_m, type = c("p", "g", "r"), data = .,
         xlab = "education", ylab = "income", auto.key = list(columns = 2))

dta2 %>%
  group_by(prestige_m) %>%
  dplyr::summarize(r=cor(education, income))

## Warning: Factor `prestige_m` contains implicit NA, consider using
## `forcats::fct_explicit_na`

## # A tibble: 3 x 2
##   prestige_m           r
##   <ord>            <dbl>
## 1 Low         0.375     
## 2 High        0.00000166
## 3 <NA>       NA

Conclusion: For the “white collar” category, at the high level of prestige there is a weak positive relationship between education and income (r = 0.28), while at the low level there is a weak negative relationship (r = -0.16).

dta3 <- dta %>%
  filter(type == "wc") %>%
  mutate(prestige_f = cut(prestige, breaks = quantile(prestige, probs = c(0, .50, 1)),
                          labels = c("Low", "High"), ordered_result = TRUE))
dta3 %>%
  xyplot(income ~ education, groups = prestige_f, type = c("p", "g", "r"), data = .,
         xlab = "education", ylab = "income", auto.key = list(columns = 2))

dta3 %>%
  group_by(prestige_f) %>%
  dplyr::summarize(r=cor(education, income))

## Warning: Factor `prestige_f` contains implicit NA, consider using
## `forcats::fct_explicit_na`

## # A tibble: 3 x 2
##   prestige_f      r
##   <ord>       <dbl>
## 1 Low        -0.155
## 2 High        0.277
## 3 <NA>       NA
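The three analyses above repeat the same steps for each type of occupation: filter, split prestige at the within-type median, and correlate education with income in each half. As a minimal sketch, these steps could be wrapped in one helper. The function name `cor_by_prestige_half`, the made-up data frame `toy`, and the use of `include.lowest = TRUE` are assumptions for illustration and are not part of the report's code.

```r
# Hypothetical helper: correlation between education and income within the
# Low and High prestige halves of a single occupation type.
cor_by_prestige_half <- function(data, occupation) {
  sub <- data[!is.na(data$type) & data$type == occupation, ]
  sub$prestige_half <- cut(sub$prestige,
                           breaks = quantile(sub$prestige, probs = c(0, 0.5, 1)),
                           labels = c("Low", "High"),
                           ordered_result = TRUE,
                           include.lowest = TRUE)  # keep the minimum in "Low"
  sapply(split(sub, sub$prestige_half),
         function(g) cor(g$education, g$income))
}

# Made-up data for illustration only; not the values from the report.
toy <- data.frame(
  type      = factor(c(rep("bc", 6), rep("wc", 6))),
  education = c(6, 7, 8, 9, 10, 11, 10, 11, 12, 13, 14, 15),
  income    = c(3000, 3200, 3100, 4000, 4500, 5200,
                5000, 4800, 4600, 5200, 5600, 6100),
  prestige  = c(20, 25, 30, 35, 40, 45, 30, 35, 40, 45, 50, 55)
)

# One column per occupation type, one row per prestige half.
res <- sapply(levels(toy$type), function(t) cor_by_prestige_half(toy, t))
print(round(res, 2))
```

Putting the correlations in one table makes it easier to compare the Low and High rows across types, which is the comparison the three conclusions draw.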

Chen Meng-ting (Eunice Chen)