Question 1: Drug Use Amongst OkCupid Users

Let’s revisit the roughly 60,000 San Francisco OkCupid users from 2012 and consider the variable drugs, which records users’ answers to a question about their drug use.

a)

Type in a series of commands that will output how many men and women there are.

## # A tibble: 2 × 2
##     sex count
##   <chr> <int>
## 1     f 24117
## 2     m 35829
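
A table like the one above can be produced along the following lines. This is a minimal sketch, not necessarily the commands originally used: it assumes the data are available as the `profiles` data frame from the `okcupiddata` package and that `dplyr` is installed. The name `sex_counts` is introduced here for illustration only.

```r
library(dplyr)
library(okcupiddata)  # assumption: supplies the `profiles` data frame

# Count the number of profiles for each value of sex
sex_counts <- profiles %>%
  group_by(sex) %>%
  summarise(count = n())

sex_counts
```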

b)

Create a visualization that shows the distribution of the different responses in the variable drugs.
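
One possible visualization is a bar plot of the response counts. The sketch below assumes the `profiles` data frame from the `okcupiddata` package and uses `ggplot2`. It first recodes missing answers as “N/A” so that non-responses get their own bar. The names `profiles_drugs` and `drugs_plot` are illustrative.

```r
library(dplyr)
library(ggplot2)
library(okcupiddata)  # assumption: supplies the `profiles` data frame

# Recode missing answers as an explicit "N/A" category
profiles_drugs <- profiles %>%
  mutate(drugs = ifelse(is.na(drugs), "N/A", as.character(drugs)))

# One bar per response, with bar height equal to the number of users
drugs_plot <- ggplot(profiles_drugs, aes(x = drugs)) +
  geom_bar() +
  labs(x = "Reported drug use", y = "Number of users")

drugs_plot
```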

c)

Create a visualization that shows the same information as in part b), but for men and women separately. Who is more likely to never use drugs: a man or a woman?

Roughly 78% of the women who answered the drug question never use drugs, versus roughly 86% of the men who answered it. A man is therefore more likely to never use drugs.
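
A comparison of this kind could be sketched as follows, again assuming the `profiles` data frame from the `okcupiddata` package and assuming that the response of interest is coded as `"never"`. The stacked bars are scaled to proportions so that the two sexes can be compared despite their different group sizes. The summary table gives the share of “never” among users who answered the question. The names `profiles_drugs`, `drugs_by_sex_plot`, and `never_by_sex` are illustrative.

```r
library(dplyr)
library(ggplot2)
library(okcupiddata)  # assumption: supplies the `profiles` data frame

# Recode missing answers as an explicit "N/A" category
profiles_drugs <- profiles %>%
  mutate(drugs = ifelse(is.na(drugs), "N/A", as.character(drugs)))

# One bar per sex, split by response and scaled to proportions
drugs_by_sex_plot <- ggplot(profiles_drugs, aes(x = sex, fill = drugs)) +
  geom_bar(position = "fill") +
  labs(x = "Sex", y = "Proportion of users", fill = "Reported drug use")

drugs_by_sex_plot

# Share of "never" among users who answered the question, by sex
never_by_sex <- profiles_drugs %>%
  filter(drugs != "N/A") %>%
  group_by(sex) %>%
  summarise(prop_never = mean(drugs == "never"))

never_by_sex
```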

d)

In lines 54-55 above we made sure to convert all missing values, encoded in R as NA, to a specific response value “N/A”, i.e. not answered. Why do you think it was important to do so?

So that non-responses are counted as a category of their own: the bar plot then reflects the actual counts for all users and isn’t visually misleading.
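
The effect of the recoding can be illustrated on a small example. The vector below is hypothetical toy data, not the OkCupid responses. It shows that base R’s `table()` drops NA values by default, whereas an explicit “N/A” value is counted like any other response.

```r
# Hypothetical toy answers, not the OkCupid data
answers <- c("never", "sometimes", NA, "never", NA)

# table() ignores NA by default, so the two non-responses are not counted
counts_na_dropped <- table(answers)

# Recode NA as an explicit "N/A" value so that it is counted
answers_recoded <- ifelse(is.na(answers), "N/A", answers)
counts_recoded <- table(answers_recoded)

sum(counts_na_dropped)  # 3 responses counted
sum(counts_recoded)     # all 5 responses counted
```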

Question 2: Gapminder

We consider the Gapminder data from Problem Set 04 again.

a)

Output a table that allows you to answer the following two questions on the GDP of countries in the year 2007:

  1. What is the richest continent per capita in 2007? Oceania, which has the highest mean GDP per capita.
  2. Which continent seems to have the most consistent GDP per capita across its constituent countries in 2007? Africa, which has the smallest standard deviation.

## # A tibble: 5 × 3
##   continent      mean   std_dev
##      <fctr>     <dbl>     <dbl>
## 1    Africa  3089.033  3618.163
## 2  Americas 11003.032  9713.209
## 3      Asia 12473.027 14154.937
## 4    Europe 25054.482 11800.340
## 5   Oceania 29810.188  6540.991
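
A table with this shape can be computed by grouping the 2007 rows by continent and summarizing GDP per capita. The sketch below assumes the `gapminder` data frame from the `gapminder` package, with columns `year`, `continent`, and `gdpPercap`. The name `gdp_2007` is illustrative.

```r
library(dplyr)
library(gapminder)  # assumption: supplies the `gapminder` data frame

# Mean and standard deviation of GDP per capita by continent in 2007
gdp_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(mean = mean(gdpPercap), std_dev = sd(gdpPercap))

gdp_2007
```

The continent with the largest `mean` answers the first question, and the one with the smallest `std_dev` answers the second.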

Question 3: Titanic

Finally, we consider survivor data from the Titanic. This data set comes with R by default, so we don’t need to install any packages to access it.

# Do not modify any code in this block, however you still need to run it in your
# console to load the Titanic data set, which is built into R by default.
library(dplyr)  # provides the %>% pipe used below
data(Titanic)

# Convert the Titanic data to data frame format
Titanic <- Titanic %>% 
  as.data.frame()

a)

What demographic attributes can be used to describe each passenger before they boarded the ship? Class, sex, and age

b)

Output tables that compare survivor counts

  1. between men and women
  2. between the different classes

## Source: local data frame [4 x 3]
## Groups: Sex [?]
## 
##      Sex Survived count
##   <fctr>   <fctr> <dbl>
## 1   Male       No  1364
## 2   Male      Yes   367
## 3 Female       No   126
## 4 Female      Yes   344

## Source: local data frame [8 x 3]
## Groups: Class [?]
## 
##    Class Survived count
##   <fctr>   <fctr> <dbl>
## 1    1st       No   122
## 2    1st      Yes   203
## 3    2nd       No   167
## 4    2nd      Yes   118
## 5    3rd       No   528
## 6    3rd      Yes   178
## 7   Crew       No   673
## 8   Crew      Yes   212
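
Tables like the two above can be obtained by summing the `Freq` column within each group. The sketch below reloads the built-in `Titanic` table and stores it under the illustrative name `titanic_df`. The names `survival_by_sex` and `survival_by_class` are also illustrative, and the exact commands originally used may differ.

```r
library(dplyr)

# Reload the built-in contingency table and convert it to a data frame
data(Titanic)
titanic_df <- as.data.frame(Titanic)

# Survivor counts by sex
survival_by_sex <- titanic_df %>%
  group_by(Sex, Survived) %>%
  summarise(count = sum(Freq), .groups = "drop")

# Survivor counts by class
survival_by_class <- titanic_df %>%
  group_by(Class, Survived) %>%
  summarise(count = sum(Freq), .groups = "drop")

survival_by_sex
survival_by_class
```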

c)

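For each comparison in part b), indicate who was most likely to survive.

Females were more likely to survive than males: 344 of 470 females (about 73%) survived, compared with 367 of 1,731 males (about 21%). Among the classes, first class passengers were the most likely to survive: 203 of 325 (about 62%) survived, versus about 41% in second class, 25% in third class, and 24% of the crew.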
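
Because the groups differ greatly in size, the comparison rests on survival proportions rather than raw counts. A minimal sketch of that calculation follows, using the built-in `Titanic` table. The names `titanic_df`, `rate_by_sex`, and `rate_by_class` are illustrative.

```r
library(dplyr)

# Reload the built-in contingency table and convert it to a data frame
data(Titanic)
titanic_df <- as.data.frame(Titanic)

# Proportion of each sex that survived
rate_by_sex <- titanic_df %>%
  group_by(Sex) %>%
  summarise(prop_survived = sum(Freq[Survived == "Yes"]) / sum(Freq))

# Proportion of each class that survived
rate_by_class <- titanic_df %>%
  group_by(Class) %>%
  summarise(prop_survived = sum(Freq[Survived == "Yes"]) / sum(Freq))

rate_by_sex
rate_by_class
```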