Lab Exercise

Convert the class column in the mpg data set into a factor whose levels are ordered by vehicle size, from small to large: 2seater, subcompact, compact, midsize, suv, minivan, pickup.

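library(tidyverse)  # loads ggplot2 (mpg), forcats (gss_cat), dplyr and tidyr
mpg$class <- factor(mpg$class, levels = c('2seater', 'subcompact', 'compact', 'midsize', 'suv', 'minivan', 'pickup'))
levels(mpg$class)  # no sort(): sorting would put the levels back in alphabetical order
## [1] "2seater"    "subcompact" "compact"    "midsize"    "suv"
## [6] "minivan"    "pickup"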
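Calling `sort()` on the result of `levels()` defeats the purpose of this exercise: `levels()` returns the levels in the order stored in the factor, but `sort()` re-orders that character vector alphabetically. The sketch below uses a hypothetical toy vector `sizes` (not the full mpg column) to show the difference, and `fct_relevel()` from forcats as one alternative way to set the order.

```r
library(forcats)

# Hypothetical toy vector of vehicle classes (not the full mpg column)
sizes <- c("suv", "compact", "2seater", "subcompact", "midsize")
size_levels <- c("2seater", "subcompact", "compact", "midsize", "suv")

# Explicit levels fix the order, independent of the alphabet
size_fct <- factor(sizes, levels = size_levels)
stored <- levels(size_fct)

# sort() alphabetizes the character vector and loses the size order
sorted <- sort(levels(size_fct))

# fct_relevel() moves the named levels to the front, in the order given
relevelled <- fct_relevel(factor(sizes), size_levels)

print(stored)
print(sorted)
print(levels(relevelled))
```

On the real data, something like `mpg %>% mutate(class = fct_relevel(class, size_levels))`, with `size_levels` extended to all seven classes, should produce the same level order as the `factor()` call.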

What are the levels of marital in the gss_cat data set? Which level is the most common?

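levels(gss_cat$marital)  # all levels, in their stored order
ggplot(gss_cat, aes(x = fct_infreq(marital))) +  # bars sorted by frequency
  geom_bar() +
  scale_x_discrete(drop = FALSE) +  # keep levels that have zero counts
  labs(x = "marital")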

Married is the most common level: it has the tallest bar, which fct_infreq() places first.
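The bar chart answers the second question visually. To check it numerically, note that `fct_infreq()` orders levels by descending frequency, so the first level of the reordered factor is the most common one, and `fct_count()` tabulates a factor into counts. This is a minimal sketch on a hypothetical toy factor `status` with made-up values; the equivalent call on `gss_cat` is left as a comment because its counts are not reproduced here.

```r
library(forcats)

# Hypothetical toy data: "Married" occurs most often
status <- factor(c("Married", "Divorced", "Married", "Widowed", "Married", "Divorced"))

# fct_infreq() reorders levels from most to least frequent
by_freq <- levels(fct_infreq(status))
most_common <- by_freq[1]

# fct_count() returns a tibble with columns f (level) and n (count)
counts <- fct_count(status, sort = TRUE)

print(by_freq)
print(counts)

# On the real data: gss_cat %>% count(marital, sort = TRUE)
```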

In the flights data set, plot the average arrival delay for each destination airport, with the destinations reordered by that average.

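library(nycflights13)  # provides the flights data set

flights %>%
  group_by(dest) %>%
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  drop_na(mean_arr_delay) %>%  # drop destinations whose arrival delays are all missing (NaN mean)
  ggplot(aes(y = fct_reorder(dest, mean_arr_delay), x = mean_arr_delay)) +
  geom_col() +
  labs(x = "Average arrival delay (minutes)", y = "Destination")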
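`fct_reorder(.f, .x)` orders the levels of `.f` by a summary of `.x`, the median by default. Because the flights data are summarised to one mean per destination before plotting, each group's median is simply that mean. The sketch below uses a hypothetical three-row data frame `delays` with invented airport codes and values, not real flights results.

```r
library(forcats)

# Hypothetical summary table: one average delay per destination
delays <- data.frame(dest = c("AAA", "BBB", "CCC"),
                     mean_arr_delay = c(5, -2, 10))

# Default: ascending, so the smallest average delay becomes the first level
asc <- levels(fct_reorder(delays$dest, delays$mean_arr_delay))

# .desc = TRUE puts the largest average delay first instead
descending <- levels(fct_reorder(delays$dest, delays$mean_arr_delay, .desc = TRUE))

print(asc)
print(descending)
```

In the plot, ggplot2 draws the first level at the bottom of a discrete y axis, so with the default ascending order the destination with the largest average delay appears at the top.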

Collapse the levels of rincome in gss_cat into three categories: $10000 or more, less than $10000, and Others.

gss_cat %>%
  mutate(rincome = fct_collapse(rincome,
    "$10000 or more" = c("$25000 or more", "$20000 - 24999", "$15000 - 19999", "$10000 - 14999"),
    "less than $10000" = c("Lt $1000", "$1000 to 2999", "$3000 to 3999", "$4000 to 4999",
                           "$5000 to 5999", "$6000 to 6999", "$7000 to 7999", "$8000 to 9999"),
    "Others" = c("No answer", "Don't know", "Refused", "Not applicable")
  )) %>%
  count(rincome, sort = TRUE)
## # A tibble: 3 × 2
##   rincome              n
##   <fct>            <int>
## 1 $10000 or more   10862
## 2 Others            8468
## 3 less than $10000  2153
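Because Others is simply every level not assigned to one of the two income groups, `fct_collapse()` can also build it through its `other_level` argument instead of listing the four non-answer labels by hand. A sketch on a hypothetical toy factor `income` that reuses a few of the real `rincome` labels:

```r
library(forcats)

# Hypothetical toy factor using a few of the rincome labels
income <- factor(c("$25000 or more", "Lt $1000", "Refused",
                   "$10000 - 14999", "Not applicable"))

# Any level not named in a group is pooled into other_level
collapsed <- fct_collapse(income,
                          "$10000 or more" = c("$25000 or more", "$10000 - 14999"),
                          "less than $10000" = "Lt $1000",
                          other_level = "Others")

print(as.character(collapsed))
print(table(collapsed))
```

The trade-off is that a level you forget to list silently lands in Others, whereas the explicit version above leaves a forgotten level as its own category, where it shows up in the count.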