For this problem set I worked with: myself
Load necessary packages:
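A minimal sketch of this step, assuming the three packages the rest of the solution relies on are already installed: dplyr for the pipe and the wrangling verbs, ggplot2 for the plot, and fivethirtyeight for the congress_age data.

```r
# Assumption: these are the packages used in the rest of this problem set
library(dplyr)            # %>%, filter(), group_by(), summarize(), mutate(), glimpse()
library(ggplot2)          # ggplot(), geom_line(), labs(), scale_color_manual()
library(fivethirtyeight)  # provides the congress_age data frame
```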
In this question, you’ll be analyzing the age of members of the United States Congress over the years. The data of interest is saved in the congress_age data frame included in the fivethirtyeight package. To understand the data’s context, first read:
The help file for the congress_age data frame, by running ?congress_age in the console.

Take the congress_age data frame and perform the data wrangling necessary to create the first visualization in the article. Save the output in a data frame called avg_congress_age. Hint: avg_congress_age should have 68 rows and 3 variables: termstart, party, and mean_age.
avg_congress_age <- congress_age %>%
  # Keep only Republicans and Democrats
  filter(party == "R" | party == "D") %>%
  # One mean age per term start date and party
  group_by(termstart, party) %>%
  summarize(mean_age = mean(age))
glimpse(avg_congress_age)
## Observations: 68
## Variables: 3
## Groups: termstart [34]
## $ termstart <date> 1947-01-03, 1947-01-03, 1949-01-03, 1949-01-03, 1951-…
## $ party <chr> "D", "R", "D", "R", "D", "R", "D", "R", "D", "R", "D",…
## $ mean_age <dbl> 52.00688, 52.96961, 51.43030, 54.60357, 52.29589, 54.3…
Take the avg_congress_age data frame and perform the data wrangling necessary to create a new variable mean_age_months that has the mean age of congress members in months. Overwrite the original avg_congress_age data frame that had 68 rows and 3 variables with this new data frame that has 68 rows and 4 variables.
avg_congress_age <- avg_congress_age %>%
  mutate(mean_age_months = mean_age * 12)
glimpse(avg_congress_age)
## Observations: 68
## Variables: 4
## Groups: termstart [34]
## $ termstart <date> 1947-01-03, 1947-01-03, 1949-01-03, 1949-01-03,…
## $ party <chr> "D", "R", "D", "R", "D", "R", "D", "R", "D", "R"…
## $ mean_age <dbl> 52.00688, 52.96961, 51.43030, 54.60357, 52.29589…
## $ mean_age_months <dbl> 624.0826, 635.6353, 617.1636, 655.2429, 627.5507…
Using the avg_congress_age data frame, use the ggplot2 package to recreate the first visualization in the article as follows:
In other words, your plot should look like this.
ggplot(data = avg_congress_age, mapping = aes(x = termstart, y = mean_age, color = party)) +
  geom_line() +
  # Axis, legend, and title labels
  labs(x = "Date", y = "Average age", color = "party", title = "Average Age of Members of Congress", subtitle = "At start of term, 1947-2013") +
  ylim(40, 60.3) +
  # Colors are assigned in alphabetical order of party: "D" blue, "R" red
  scale_color_manual(values = c("#384AFB", "#f03b20"))

Load the titanic dataset from the internet and take a look at its contents. It contains demographic information about the 2201 passengers in the Titanic disaster and information on whether they survived. Note that there are 2201 rows in this data, one for each passenger:
Using dplyr commands, output a table that displays the counts of survived & died. Your code should print out a table with two columns and two rows of data, along with a “header” row of the variable names.
## # A tibble: 2 x 2
## Survived count
## <chr> <int>
## 1 No 1490
## 2 Yes 711
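One way to produce a table like the one above is to group by Survived and count the rows in each group. The sketch below is hedged: because the original data source is not shown here, it builds a stand-in data frame (the hypothetical name titanic_demo) by expanding base R's built-in Titanic contingency table to one row per passenger. The stand-in has no passenger_number column, and the column names Class, Sex, Age, and Survived are assumed to match the course data.

```r
library(dplyr)

# Stand-in data (assumption): expand the built-in Titanic table so that
# each of the 2201 passengers gets one row.
titanic_counts <- as.data.frame(Titanic, stringsAsFactors = FALSE)
titanic_demo <- as_tibble(
  titanic_counts[rep(seq_len(nrow(titanic_counts)), titanic_counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]
)

# Count passengers in each survival category
survival_counts <- titanic_demo %>%
  group_by(Survived) %>%
  summarize(count = n())

survival_counts
```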
Survival split by sex. Using dplyr commands, output a single table that displays the overall survival & death counts of the disaster split by sex (as recorded at the time). Your code should print out a table with three columns and four rows of data, along with a “header” row of the variable names.
## # A tibble: 4 x 3
## # Groups: Survived [2]
## Survived Sex count
## <chr> <chr> <int>
## 1 No Female 126
## 2 No Male 1364
## 3 Yes Female 344
## 4 Yes Male 367
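The "Groups: Survived [2]" line in the output above is consistent with grouping by both Survived and Sex and then summarizing, since summarize() drops only the last grouping variable. A minimal sketch under the same assumption as before: titanic_demo is a hypothetical stand-in built from base R's Titanic table, not the data file used in the course.

```r
library(dplyr)

# Stand-in data (assumption): one row per passenger from the built-in table
titanic_counts <- as.data.frame(Titanic, stringsAsFactors = FALSE)
titanic_demo <- as_tibble(
  titanic_counts[rep(seq_len(nrow(titanic_counts)), titanic_counts$Freq),
                 c("Class", "Sex", "Age", "Survived")]
)

# Count passengers for every combination of survival status and sex
survival_by_sex <- titanic_demo %>%
  group_by(Survived, Sex) %>%
  summarize(count = n())

survival_by_sex
```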
Using dplyr commands, output a table that displays only the passenger_number of all 17 3rd class female children aboard the ship who died. Your code should print out a table with one column and 17 rows, along with a “header” row of the variable names. Hint: skim through ModernDive Chapter 3 on how to do this.
titanicnew <- titanic %>%
  # Keep only third-class female children who did not survive
  filter(Class == "3rd" & Sex == "Female" & Age == "Child" & Survived == "No") %>%
  select(passenger_number) %>%
  # Print all 17 rows
  print(n = 17)
## # A tibble: 17 x 1
## passenger_number
## <dbl>
## 1 37
## 2 199
## 3 246
## 4 286
## 5 303
## 6 329
## 7 902
## 8 1046
## 9 1240
## 10 1242
## 11 1352
## 12 1374
## 13 1398
## 14 1602
## 15 1661
## 16 1871
## 17 2026