Let’s revisit the roughly 60,000 San Francisco OkCupid users in 2012 and consider the variable drugs, which records users’ answers to a question on their drug use.
Type in a series of commands that will output how many men and women there are.
## # A tibble: 2 × 2
## sex count
## <chr> <int>
## 1 f 24117
## 2 m 35829
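One way to produce a table like the one above is sketched here. It assumes the profiles are available as the `profiles` data frame in the `okcupiddata` package; the helper name `count_by_sex` is made up for illustration and is not part of the problem set.

```r
library(dplyr)

# Count the rows for each value of `sex`.
# `df` can be any data frame with a `sex` column.
count_by_sex <- function(df) {
  df %>%
    group_by(sex) %>%
    summarise(count = n())
}

# Assumption: the OkCupid data live in the okcupiddata package as `profiles`.
# The guard simply skips this step if the package is not installed.
if (requireNamespace("okcupiddata", quietly = TRUE)) {
  data(profiles, package = "okcupiddata")
  print(count_by_sex(profiles))
}
```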
Create a visualization that shows the distribution of the different responses in the variable drugs.
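A bar chart is a natural choice here because drugs is categorical. The following is a minimal sketch, again assuming the data are the `profiles` data frame from the `okcupiddata` package; the helper name `plot_drugs` and the axis labels are illustrative choices.

```r
library(ggplot2)

# Bar chart with one bar per response in `drugs`.
# `df` can be any data frame with a `drugs` column.
plot_drugs <- function(df) {
  ggplot(df, aes(x = drugs)) +
    geom_bar() +
    labs(x = "Drug use response", y = "Number of users")
}

# Assumption: the OkCupid data live in the okcupiddata package as `profiles`.
if (requireNamespace("okcupiddata", quietly = TRUE)) {
  data(profiles, package = "okcupiddata")
  print(plot_drugs(profiles))
}
```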
Create a visualization that shows the same information as in part a), but for men and women separately. Is a man more likely to never use drugs? Or a woman?
Among users who answered the drug question, roughly 78% of women reported never using drugs, compared with roughly 86% of men. A man is therefore more likely to never use drugs.
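Adding `facet_wrap(~ sex)` to a bar chart of drugs shows the counts for men and women separately, but the comparison rests on proportions among those who answered. A sketch of that calculation, assuming the same `profiles` data frame from the `okcupiddata` package (the helper name `drug_shares_by_sex` is hypothetical):

```r
library(dplyr)

# Share of each `drugs` response within each sex,
# computed only over users who answered the question.
drug_shares_by_sex <- function(df) {
  df %>%
    filter(!is.na(drugs)) %>%
    group_by(sex, drugs) %>%
    summarise(count = n(), .groups = "drop") %>%
    group_by(sex) %>%
    mutate(prop = count / sum(count)) %>%
    ungroup()
}

# Assumption: the OkCupid data live in the okcupiddata package as `profiles`.
if (requireNamespace("okcupiddata", quietly = TRUE)) {
  data(profiles, package = "okcupiddata")
  print(drug_shares_by_sex(profiles))
}
```

The `prop` values in the rows where drugs is "never" are the two percentages being compared.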
In lines 54-55 above we made sure to convert all missing values, encoded in R as NA, to a specific response value “N/A”, i.e. not answered. Why do you think it was important to do so? So that non-responses are counted under an explicit category, the counts in the bar plot are accurate, and the plot isn’t visually misleading.
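The conversion itself is not reproduced here, so the following is only one plausible way to do it, not necessarily the code from lines 54-55. It assumes a `drugs` column stored as character or factor; the helper name `label_missing_drugs` is made up.

```r
library(dplyr)

# Replace missing `drugs` answers with the explicit label "N/A",
# so that non-responses form their own category.
label_missing_drugs <- function(df) {
  df %>%
    mutate(drugs = ifelse(is.na(drugs), "N/A", as.character(drugs)))
}

# Small illustrative input (made-up values, not the OkCupid data).
demo <- data.frame(drugs = c("never", NA, "sometimes"), stringsAsFactors = FALSE)
label_missing_drugs(demo)
```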
We consider the Gapminder data from Problem Set 04 again.
Output a table that allows you to answer the following two questions on the GDP per capita of countries in the year 2007:
## # A tibble: 5 × 3
## continent mean std_dev
## <fctr> <dbl> <dbl>
## 1 Africa 3089.033 3618.163
## 2 Americas 11003.032 9713.209
## 3 Asia 12473.027 14154.937
## 4 Europe 25054.482 11800.340
## 5 Oceania 29810.188 6540.991
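A table of this shape can be produced by filtering to 2007 and summarising by continent. The sketch below assumes the data are the `gapminder` data frame from the `gapminder` package and that the summarised variable is `gdpPercap`; the object name `gdp_2007` is illustrative.

```r
library(dplyr)
library(gapminder)  # assumption: the Gapminder data come from this package

# Mean and standard deviation of GDP per capita by continent in 2007.
gdp_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(mean = mean(gdpPercap), std_dev = sd(gdpPercap))

gdp_2007
```

The `mean` column speaks to typical GDP per capita in each continent, and `std_dev` to how much it varies between countries within a continent.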
Finally, we consider survivor data from the Titanic. This is a data set that comes with R by default, so we don’t need to install any packages.
# Do not modify any code in this block, however you still need to run it in your
# console to load the Titanic data set, which is built into R by default.
library(dplyr)  # provides the %>% pipe used below
data(Titanic)
# Convert the Titanic data to data frame format
Titanic <- Titanic %>%
  as.data.frame()
What demographic attributes can be used to describe each passenger before they boarded the ship? Class, sex, and age.
Output tables that compare survivor counts by sex and by class.
## Source: local data frame [4 x 3]
## Groups: Sex [?]
##
## Sex Survived count
## <fctr> <fctr> <dbl>
## 1 Male No 1364
## 2 Male Yes 367
## 3 Female No 126
## 4 Female Yes 344
## Source: local data frame [8 x 3]
## Groups: Class [?]
##
## Class Survived count
## <fctr> <fctr> <dbl>
## 1 1st No 122
## 2 1st Yes 203
## 3 2nd No 167
## 4 2nd Yes 118
## 5 3rd No 528
## 6 3rd Yes 178
## 7 Crew No 673
## 8 Crew Yes 212
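The two tables above can be reproduced along these lines. This is a sketch rather than the original solution code; the object names `titanic_df`, `by_sex` and `by_class` are illustrative. Because each row of the converted data describes a group of people, the counts come from summing `Freq` rather than from counting rows.

```r
library(dplyr)

data(Titanic)
# Columns after conversion: Class, Sex, Age, Survived, Freq
titanic_df <- as.data.frame(Titanic)

# Survivor counts by sex: sum Freq within each Sex/Survived combination.
by_sex <- titanic_df %>%
  group_by(Sex, Survived) %>%
  summarise(count = sum(Freq), .groups = "drop")

# Survivor counts by class.
by_class <- titanic_df %>%
  group_by(Class, Survived) %>%
  summarise(count = sum(Freq), .groups = "drop")

by_sex
by_class
```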
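For each comparison in part b), indicate who was most likely to survive. Females were more likely to survive than males because a larger proportion of females survived: 344 of 470 (about 73%) versus 367 of 1731 (about 21%). First class passengers were the ones most likely to survive: 203 of 325 (about 62%), compared with about 41% in second class, 25% in third class, and 24% of the crew.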
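Since the groups differ greatly in size, the comparison is easier to make with survival rates than with raw counts. A minimal sketch, with illustrative object names (`titanic_df`, `rate_by_sex`, `rate_by_class`):

```r
library(dplyr)

data(Titanic)
titanic_df <- as.data.frame(Titanic)

# Survival rate = survivors / everyone in the group, weighted by Freq.
rate_by_sex <- titanic_df %>%
  group_by(Sex) %>%
  summarise(survival_rate = sum(Freq[Survived == "Yes"]) / sum(Freq))

rate_by_class <- titanic_df %>%
  group_by(Class) %>%
  summarise(survival_rate = sum(Freq[Survived == "Yes"]) / sum(Freq))

rate_by_sex
rate_by_class
```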