The structure of the data
| occupation | education | income | women | prestige | census | type |
|---|---|---|---|---|---|---|
| gov.administrators | 13.11 | 12351 | 11.16 | 68.8 | 1113 | prof |
| general.managers | 12.26 | 25879 | 4.02 | 69.1 | 1130 | prof |
| accountants | 12.77 | 9271 | 15.70 | 63.4 | 1171 | prof |
| purchasing.officers | 11.42 | 8865 | 9.11 | 56.8 | 1175 | prof |
| chemists | 14.62 | 8403 | 11.68 | 73.5 | 2111 | prof |
| physicists | 15.64 | 11030 | 5.13 | 77.6 | 2113 | prof |
## Classes 'tbl_df', 'tbl' and 'data.frame': 102 obs. of 6 variables:
## $ education: num 13.1 12.3 12.8 11.4 14.6 ...
## $ income : int 12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
## $ women : num 11.16 4.02 15.7 9.11 11.68 ...
## $ prestige : num 68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
## $ census : int 1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
## $ type : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
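Median prestige by occupation type (`bc` = blue collar, `prof` = professional, `wc` = white collar). The `<NA>` row collects occupations with no recorded type.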
library(dplyr)  # provides %>%, group_by(), summarize()
dta %>% group_by(type) %>% summarize(prestige_median = median(prestige, na.rm = TRUE))
## Warning: Factor `type` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 4 x 2
## type prestige_median
## <fct> <dbl>
## 1 bc 35.9
## 2 prof 68.4
## 3 wc 41.5
## 4 <NA> 35
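The warning above appears because some occupations have no `type`, and `group_by()` gathers them into an implicit `NA` group. A minimal base-R sketch of the same idea, using a small illustrative vector rather than the real data: `addNA()` turns missing values into an explicit factor level, which is the role `forcats::fct_explicit_na` plays in the tidyverse.

```r
# Illustrative values only (not taken from the data above)
type <- factor(c("bc", "prof", "wc", NA, "bc"), levels = c("bc", "prof", "wc"))
prestige <- c(30, 70, 40, 35, 40)

# By default NA is not a level, so a missing type is only implicit
print(levels(type))

# addNA() adds NA as an explicit level
type_explicit <- addNA(type)
print(levels(type_explicit))

# Median prestige per type, with the missing-type group kept as its own entry
medians <- tapply(prestige, type_explicit, median)
print(medians)
```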
Education and income within each occupation type, with prestige split at the type's median into "Low" and "High" halves
Conclusion: For the "blue collar" category, the high-prestige half shows a moderate positive correlation between education and income (r ≈ 0.45), while the low-prestige half shows almost no linear relationship (r ≈ 0.04).
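library(lattice)  # provides xyplot()
# Blue-collar occupations, split at the median prestige. With the default include.lowest = FALSE,
# cut() leaves the minimum outside (min, median], so that occupation is coded NA.
dta1 <- dta %>% filter(type == "bc") %>%
  mutate(prestige_f = cut(prestige, breaks = quantile(prestige, probs = c(0, .50, 1)),
                          labels = c("Low", "High"), ordered_result = TRUE))
dta1 %>% xyplot(income ~ education, groups = prestige_f, type = c("p", "g", "r"), data = ., xlab = "education", ylab = "income", auto.key = list(columns = 2))
dta1 %>%
  group_by(prestige_f) %>%
  dplyr::summarize(r = cor(education, income))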
## Warning: Factor `prestige_f` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 3 x 2
## prestige_f r
## <ord> <dbl>
## 1 Low 0.0403
## 2 High 0.450
## 3 <NA> NA
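The `<NA>` row is not a missing `prestige` value. `cut()` builds right-closed intervals `(a, b]` and, with the default `include.lowest = FALSE`, the lowest break is excluded, so the occupation sitting exactly at the 0% quantile (the group minimum) is coded `NA`; its correlation is `NA` because it is a single observation. The same happens in the professional and white-collar chunks. A minimal sketch on an illustrative vector shows the effect and the fix:

```r
# Illustrative values only
x <- c(10, 20, 30, 40, 50)
br <- quantile(x, probs = c(0, .5, 1))   # breaks at 10, 30, 50

# Default cut(): intervals (10,30] and (30,50], so the minimum (10) is outside both
default_cut <- cut(x, breaks = br, labels = c("Low", "High"), ordered_result = TRUE)
print(default_cut)   # first element is NA

# include.lowest = TRUE closes the first interval to [10,30]
fixed_cut <- cut(x, breaks = br, labels = c("Low", "High"),
                 include.lowest = TRUE, ordered_result = TRUE)
print(fixed_cut)     # no NA; the minimum joins the Low half
```

Adding `include.lowest = TRUE` to the `mutate()` calls would remove the `<NA>` rows, and the Low-half correlations would then include that extra occupation.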
Conclusion: For the "professional" category, the high-prestige half shows essentially no linear relationship between education and income (r ≈ 0), while the low-prestige half shows a mild positive relationship (r ≈ 0.38). Interestingly, this is the opposite of the blue-collar pattern.
# Professional occupations, split at the median prestige (the minimum is coded NA
# because cut() uses the default include.lowest = FALSE)
dta2 <- dta %>% filter(type == "prof") %>%
  mutate(prestige_m = cut(prestige, breaks = quantile(prestige, probs = c(0, .50, 1)),
                          labels = c("Low", "High"), ordered_result = TRUE))
dta2 %>% xyplot(income ~ education, groups = prestige_m, type = c("p", "g", "r"), data = ., xlab = "education", ylab = "income", auto.key = list(columns = 2))
dta2 %>%
  group_by(prestige_m) %>%
  dplyr::summarize(r = cor(education, income))
## Warning: Factor `prestige_m` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 3 x 2
## prestige_m r
## <ord> <dbl>
## 1 Low 0.375
## 2 High 0.00000166
## 3 <NA> NA
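The three chunks differ only in which `type` is filtered. One way to avoid repeating the code is a small helper; the sketch below is base R, the name `median_split_cor()` is hypothetical, the toy data frame is illustrative, and it uses `include.lowest = TRUE` so the minimum is not dropped.

```r
# Hypothetical helper: split rows at the median of `by` and return the
# Pearson correlation between columns `x` and `y` within each half.
median_split_cor <- function(df, x, y, by) {
  br <- quantile(df[[by]], probs = c(0, .5, 1), na.rm = TRUE)
  half <- cut(df[[by]], breaks = br, labels = c("Low", "High"),
              include.lowest = TRUE, ordered_result = TRUE)
  # split() drops rows whose half is NA (e.g. missing `by` values)
  sapply(split(df, half), function(d) cor(d[[x]], d[[y]]))
}

# Illustrative data only, not the occupational data above
toy <- data.frame(
  prestige  = 1:6,
  education = c(1, 2, 3, 4, 5, 6),
  income    = c(2, 4, 6, 6, 5, 4)
)
r <- median_split_cor(toy, x = "education", y = "income", by = "prestige")
print(r)  # Low about 1, High about -1
```

Applied per type it might look like `lapply(split(dta, dta$type), median_split_cor, x = "education", y = "income", by = "prestige")`; the values may differ slightly from the tables in this section because the minimum-prestige occupation is now kept in the Low half.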
Conclusion: For the "white collar" category, the high-prestige half shows a weak positive correlation between education and income (r ≈ 0.28), while the low-prestige half shows a weak negative one (r ≈ -0.16).
# White-collar occupations, split at the median prestige (the minimum is coded NA
# because cut() uses the default include.lowest = FALSE)
dta3 <- dta %>% filter(type == "wc") %>%
  mutate(prestige_f = cut(prestige, breaks = quantile(prestige, probs = c(0, .50, 1)),
                          labels = c("Low", "High"), ordered_result = TRUE))
dta3 %>% xyplot(income ~ education, groups = prestige_f, type = c("p", "g", "r"), data = ., xlab = "education", ylab = "income", auto.key = list(columns = 2))
dta3 %>%
  group_by(prestige_f) %>%
  dplyr::summarize(r = cor(education, income))
## Warning: Factor `prestige_f` contains implicit NA, consider using
## `forcats::fct_explicit_na`
## # A tibble: 3 x 2
## prestige_f r
## <ord> <dbl>
## 1 Low -0.155
## 2 High 0.277
## 3 <NA> NA
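Each correlation in this section comes from only half of one occupation type, so the estimates are noisy and contrasts such as "positive vs. negative" deserve caution. One hedged way to gauge that uncertainty is the confidence interval from base R's `cor.test()`; the vectors below are illustrative, not values from the data.

```r
# Illustrative vectors only
education <- c(8.5, 9.1, 10.2, 10.8, 11.5, 12.0, 12.4, 13.1)
income    <- c(4200, 5100, 4800, 6000, 5600, 6900, 6100, 7200)

ct <- cor.test(education, income, method = "pearson", conf.level = 0.95)
print(ct$estimate)   # sample correlation r
print(ct$conf.int)   # 95% confidence interval (Fisher z) for the population correlation
```

On the real data, a wide interval that spans zero would suggest the difference between the Low and High halves may not be reliable.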