Administrative

Please indicate

  • Roughly how much time you spent on this HW so far:
  • The URL of your published RPubs document:
  • What gave you the most trouble:
  • Any comments you have:

Question 1:

Use the mlb_teams.csv data set to create an informative data graphic that illustrates the relationship between winning percentage (WPct) and payroll in context.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
mlb_data <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv", as.is = TRUE)
# Bar chart of payroll, with the bars ordered by winning percentage
ggplot(mlb_data, aes(x = reorder(WPct, WPct), y = payroll)) +
  geom_bar(stat = "identity") +
  coord_flip()

The bar chart does not show a clear trend between winning percentage and payroll. For a clearer view, we created a scatter plot with a trend line.

ggplot(mlb_data, aes(x = WPct * 100, y = payroll)) +
  geom_point(alpha = 0.5, color = "red") +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess'

We do see a relationship, but it does not appear to be very strong, because the points deviate considerably from the trend line.
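One way to put a number on "not very strong" is the Pearson correlation between the two variables. The helper below is only a sketch: the name payroll_wpct_cor is hypothetical, and it assumes the data frame has the WPct and payroll columns used in the plots.

```r
# Hypothetical helper: Pearson correlation between winning percentage
# and payroll. Assumes columns named WPct and payroll, as in the plots.
payroll_wpct_cor <- function(data) {
  # use = "complete.obs" drops rows where either value is missing
  cor(data$WPct, data$payroll, use = "complete.obs")
}
```

Calling payroll_wpct_cor(mlb_data) returns a value between -1 and 1. Values near 0 would support the reading that the relationship is weak.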

Because the dataset contains more than 200 teams, we concluded that a more controlled dataset is needed, since the teams may be in different leagues. Teams in a higher league may have a lower winning percentage despite a high payroll, because they are more likely to compete against stronger teams.
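A rough way to control for league is to draw the same scatter plot in one panel per league. The sketch below assumes the data has a league column. The name lgID is a guess, not something confirmed above, and plot_by_group is a hypothetical helper, so check names(mlb_data) first.

```r
library(ggplot2)

# Hypothetical helper: the payroll scatter plot, split into one panel
# per value of a grouping column. The default "lgID" is an ASSUMED
# column name; pass whichever column actually identifies the league.
plot_by_group <- function(data, group_col = "lgID") {
  stopifnot(group_col %in% names(data))
  ggplot(data, aes(x = WPct * 100, y = payroll)) +
    geom_point(alpha = 0.5, color = "red") +
    geom_smooth(method = "lm", se = FALSE) +  # straight-line trend per panel
    facet_wrap(group_col)
}
```

If such a column exists, plot_by_group(mlb_data) gives one panel per league, which makes it easier to see whether the trend differs between them.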

Question 2:

Using data from the nasaweather R package, use the path geometry (i.e. use a geom_path layer) to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.

Hint: Don’t forget to install and load the nasaweather R package!

library(nasaweather)
library(tidyverse)
storms
## # A tibble: 2,747 × 11
##       name  year month   day  hour   lat  long pressure  wind
##      <chr> <int> <int> <int> <int> <dbl> <dbl>    <int> <int>
## 1  Allison  1995     6     3     0  17.4 -84.3     1005    30
## 2  Allison  1995     6     3     6  18.3 -84.9     1004    30
## 3  Allison  1995     6     3    12  19.3 -85.7     1003    35
## 4  Allison  1995     6     3    18  20.6 -85.8     1001    40
## 5  Allison  1995     6     4     0  22.0 -86.0      997    50
## 6  Allison  1995     6     4     6  23.3 -86.3      995    60
## 7  Allison  1995     6     4    12  24.7 -86.2      987    65
## 8  Allison  1995     6     4    18  26.2 -86.2      988    65
## 9  Allison  1995     6     5     0  27.6 -86.1      988    65
## 10 Allison  1995     6     5     6  28.5 -85.6      990    60
## # ... with 2,737 more rows, and 2 more variables: type <chr>,
## #   seasday <int>
tropical <- filter(storms, type == "Tropical Storm")
# Longitude on x and latitude on y, so each path reads like a map
ggplot(tropical, aes(x = long, y = lat)) +
  geom_path(aes(colour = name)) +
  facet_wrap(~year, scales = "free", ncol = 3)
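The question can also be read as asking for every storm in the table, not only the rows typed "Tropical Storm". A minimal sketch under that reading follows. The function name plot_storm_paths is hypothetical, and it assumes the name, year, lat and long columns shown in the printed tibble.

```r
library(ggplot2)

# Hypothetical helper: one path per storm, one panel per year.
# Assumes columns name, year, lat, long as in the storms tibble.
plot_storm_paths <- function(storm_data) {
  ggplot(storm_data, aes(x = long, y = lat, colour = name,
                         # explicit grouping keeps a reused name in a
                         # different year from joining the same path
                         group = interaction(name, year))) +
    geom_path() +
    facet_wrap(~year)
}
```

geom_path() connects points in the order they appear in the data, so this relies on the rows being in chronological order within each storm, as they appear to be in the printed output.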

Question 3:

Using the data set Top25CommonFemaleNames.csv, recreate the “Median Ages For Females With the 25 Most Common Names” graphic from FiveThirtyEight (link to graphic; link to full article).

library(tidyverse)
old_name <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/Top25CommonFemaleNames.csv")
# Boxes are drawn from precomputed quartiles; ymin/ymax equal to q1/q3 hides the whiskers
ggplot(old_name, aes(x = reorder(name, -median_age), ymin = q1_age, lower = q1_age,
                     middle = median_age, upper = q3_age, ymax = q3_age)) +
  geom_boxplot(stat = "identity", color = "lightgoldenrod", fill = "lightgoldenrod") +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "Median Ages For Females With the 25 Most Common Names",
       subtitle = "Among Americans estimated to be alive as of Jan. 1, 2014") +
  geom_point(aes(y = median_age), shape = 21, fill = "red", color = "white", size = 2)

Note: I tried adding theme_pander() at the end to remove the grey background, but the R Markdown file would not knit when I did.
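A possible cause, offered as a guess because the error message is not shown: theme_pander() comes from the ggthemes package, not ggplot2 itself, so knitting fails unless ggthemes is installed and loaded with library(ggthemes). A sketch of an alternative that needs no extra package is to use one of ggplot2's built-in themes. The helper name strip_grey_background is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: swap the default grey theme for theme_minimal(),
# which ships with ggplot2 and has no panel background.
# p is any ggplot object, for example the boxplot built above.
strip_grey_background <- function(p) {
  p + theme_minimal()
}
```

Adding + theme_minimal() directly to the end of the plot code has the same effect. theme_bw() is another built-in option if a panel border is wanted.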