Administrative

Please indicate

  • Roughly how much time you spent on this HW so far: about 4 hours
  • The URL of your RPubs publication:
  • What gave you the most trouble: formatting grid lines and making arrows for labels.
  • Any comments you have:

Question 1:

Use the mlb_teams.csv data set to create an informative data graphic that illustrates the relationship between winning percentage (WPct) and payroll in context.

# Load packages before reading the data
library(tidyverse)
library(ggthemes)
mlb_teams <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv")
winpct <- ggplot(mlb_teams, aes(x = payroll/1000, y = WPct*100))+
  geom_point()+
  labs(y = "Winning percentage", x = "Payroll ($ thousands)")+
  ggtitle("MLB Winning Percentage and Payroll", subtitle = "2008-2014")+
  geom_smooth(color = "red", method = "lm", se = FALSE)+
  theme_economist()+
  theme(axis.line.x = element_blank())+
  ylim(30,65)
winpct
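
One detail of the plot above may be worth checking: ylim() sets scale limits, so any observation outside 30-65 is dropped before geom_smooth() fits its line, whereas coord_cartesian() only zooms the view. The sketch below illustrates the difference on a made-up data frame (toy and its values are hypothetical, not taken from mlb_teams.csv).

```r
library(ggplot2)

# Hypothetical toy data (not from mlb_teams.csv): one point lies far above the rest.
toy <- data.frame(x = 1:10, y = c(1:9, 100))

base_plot <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# ylim() sets scale limits: the outlier is dropped (with a warning) before the fit.
clipped <- base_plot + ylim(0, 20)

# coord_cartesian() only zooms the view: the fit still uses all ten rows.
zoomed <- base_plot + coord_cartesian(ylim = c(0, 20))
```

Printing clipped and zoomed side by side should show two different fitted lines. If every team's winning percentage already falls inside 30-65, the two approaches give the same result.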

Question 2:

Using data from the nasaweather R package, use the path geometry (i.e. use a geom_path layer) to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.

Hint: Don’t forget to install and load the nasaweather R package!

library("nasaweather")
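# Use the nasaweather table explicitly; dplyr (loaded with tidyverse) also exports a data set named storms
storms <- nasaweather::storms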
tropical_storms <- storms %>% filter(type == "Tropical Storm")
storm_plot <- ggplot(tropical_storms, aes(y=lat, x=long))+
  geom_path(aes(colour = name))+
  facet_wrap(~ year)+
  theme_stata()+
  theme(legend.position = "none", axis.title.y = element_text(vjust = 2))+
  labs(x = "Longitude", y = "Latitude", title = "Paths of Tropical Storms")
storm_plot
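
As a minimal sketch of how the path geometry behaves, the example below uses a small made-up data frame (tracks and its values are hypothetical, not nasaweather data). geom_path() joins observations in the order they appear in the data, and mapping colour to a discrete variable also splits the rows into one path per group.

```r
library(ggplot2)

# Hypothetical track positions for two made-up storms (not nasaweather data).
tracks <- data.frame(
  name = rep(c("A", "B"), each = 3),
  year = rep(c(2001, 2002), each = 3),
  long = c(-60, -62, -65, -70, -72, -71),
  lat  = c(15, 17, 20, 25, 27, 30)
)

# geom_path() joins rows in data order; colour = name gives one path per storm,
# and facet_wrap() places each year in its own panel.
track_plot <- ggplot(tracks, aes(x = long, y = lat)) +
  geom_path(aes(colour = name)) +
  facet_wrap(~ year)
```

Because the rows are joined in data order, the observations should be sorted by time within each storm before plotting.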

Question 3:

Using the data set Top25CommonFemaleNames.csv, recreate the “Median Ages for Females With the 25 Most Common Names” graphic from FiveThirtyEight (link to graphic; link to full article).

female_names <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/Top25CommonFemaleNames.csv")
name_plot <- ggplot(female_names, aes(x= reorder(name, -median_age)))+
  geom_linerange(aes(ymin=q1_age, ymax=q3_age), color = "#f3d478", size = 4, alpha = 0.85)+
  coord_flip()+
  geom_point(aes(y = median_age), color = "red")+
  labs(title = "Median Ages for Females With the 25 Most\nCommon Names", subtitle = "Among Americans estimated to be alive as of Jan. 1, 2014")+
  xlab(NULL) + ylab(NULL)+
  scale_y_continuous(breaks=seq(15, 75, 10), position = "right")+
  # annotate() draws each label once; geom_text() with constant values repeats it for every row
  annotate("text", x = 16, y = 26, label = "25th", size = 3)+
  annotate("text", x = 16, y = 51, label = "75th percentile", size = 3)+
  annotate(geom = "point", x = 23, y = 62, color = "red")+
  annotate("text", x = 23.1, y = 65.5, label = "median", size = 3.5)+
  theme_fivethirtyeight()+
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major.x = element_line(linetype = "dotted", color = "black"),
        panel.background = element_blank())+
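  # Short arrows pointing outward from the percentile labels, drawn once each as explicit segments
  annotate("segment", x = 16, xend = 16, y = 24.25, yend = 23.25, arrow = arrow(length = unit(0.22, "cm"), type = "closed"))+
  annotate("segment", x = 16, xend = 16, y = 55.5, yend = 56.5, arrow = arrow(length = unit(0.22, "cm"), type = "closed"))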
name_plot