Use the mlb_teams.csv data set to create an informative data graphic that illustrates the relationship between winning percentage (WPct) and payroll in context.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
mlb_data <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv", as.is = TRUE)
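# Bars sorted by winning percentage; team-seasons with identical WPct stack in one bar
ggplot(mlb_data, aes(x = reorder(WPct, WPct), y = payroll)) +
  geom_col() +
  coord_flip() +
  labs(x = "Winning percentage", y = "Payroll")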
The bar chart does not reveal an obvious trend between winning percentage and payroll. To see the relationship more clearly, we made a scatter plot with a smoothed trend line.
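# Payroll against winning percentage (as a percent), with a smoothed trend line
ggplot(mlb_data, aes(x = WPct * 100, y = payroll)) +
  geom_point(alpha = 0.5, color = "red") +
  geom_smooth(se = FALSE) +
  labs(x = "Winning percentage (%)", y = "Payroll")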
## `geom_smooth()` using method = 'loess'
The scatter plot does show a relationship, but it does not appear very strong: many points lie far from the trend line.
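To put a number on "not very strong", one option is to compute the Pearson correlation between WPct and payroll together with the R-squared of a simple linear fit. The helper `fit_strength()` below is a hypothetical name used only for this sketch; it assumes `mlb_data` has the numeric columns `WPct` and `payroll` used in the plots above.

```r
# Hypothetical helper: quantify how strongly payroll tracks winning percentage.
# Returns the Pearson correlation r and the R-squared of the linear fit
# payroll ~ WPct (for a single predictor, R-squared equals r^2).
fit_strength <- function(df) {
  r <- cor(df$WPct, df$payroll, use = "complete.obs")  # ignore rows with NA
  fit <- lm(payroll ~ WPct, data = df)                 # lm drops NA rows too
  c(r = r, r_squared = summary(fit)$r.squared)
}

# Usage: fit_strength(mlb_data)
```

A value of r close to 0 (and a small R-squared) would support the impression that payroll explains only a small share of the variation in winning percentage.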
The data set contains more than 200 team-seasons, and the teams play in different leagues and divisions, so we concluded that a more controlled comparison is needed. A team in a stronger league or division may have a lower winning percentage despite a high payroll, because it competes more often against strong opponents.
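One way to try a "more controlled" comparison is to compute the correlation separately within each league (or each season). The sketch below assumes the CSV has a league column named `lgID` and a season column named `yearID`, as in the `MLB_teams` data from the mdsr package; those column names are an assumption, and `cor_by_group()` is a hypothetical helper.

```r
# Hypothetical helper: correlation between WPct and payroll computed
# separately within each level of a grouping column (e.g. league or season).
cor_by_group <- function(df, group) {
  pieces <- split(df, df[[group]])  # one data frame per group level
  vapply(pieces,
         function(d) cor(d$WPct, d$payroll, use = "complete.obs"),
         numeric(1))
}

# Usage (assumes columns `lgID` and `yearID` exist in mlb_data):
# cor_by_group(mlb_data, "lgID")
# cor_by_group(mlb_data, "yearID")
# ggplot(mlb_data, aes(x = WPct, y = payroll)) +
#   geom_point(alpha = 0.5) +
#   geom_smooth(method = "lm", se = FALSE) +
#   facet_wrap(~lgID)
```

If the within-group correlations are noticeably stronger than the overall one, that would suggest league or season effects were masking the relationship.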
Using data from the nasaweather R package, use the path geometry (i.e. use a geom_path layer) to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
Hint: Don’t forget to install and load the nasaweather R package!
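library(nasaweather)
library(tidyverse)
# Recent dplyr versions also ship a data set called `storms` that can mask the
# nasaweather one; select the nasaweather version explicitly
storms <- nasaweather::storms
storms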
## # A tibble: 2,747 × 11
## name year month day hour lat long pressure wind
## <chr> <int> <int> <int> <int> <dbl> <dbl> <int> <int>
## 1 Allison 1995 6 3 0 17.4 -84.3 1005 30
## 2 Allison 1995 6 3 6 18.3 -84.9 1004 30
## 3 Allison 1995 6 3 12 19.3 -85.7 1003 35
## 4 Allison 1995 6 3 18 20.6 -85.8 1001 40
## 5 Allison 1995 6 4 0 22.0 -86.0 997 50
## 6 Allison 1995 6 4 6 23.3 -86.3 995 60
## 7 Allison 1995 6 4 12 24.7 -86.2 987 65
## 8 Allison 1995 6 4 18 26.2 -86.2 988 65
## 9 Allison 1995 6 5 0 27.6 -86.1 988 65
## 10 Allison 1995 6 5 6 28.5 -85.6 990 60
## # ... with 2,737 more rows, and 2 more variables: type <chr>,
## # seasday <int>
tropical <- filter(storms, type == "Tropical Storm")
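# Longitude on the x-axis and latitude on the y-axis so tracks read like a map;
# one colour per storm, one panel per year
ggplot(tropical, aes(x = long, y = lat)) +
  geom_path(aes(colour = name)) +
  facet_wrap(~year, scales = "free", ncol = 3) +
  labs(x = "Longitude", y = "Latitude", colour = "Storm")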
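`geom_path()` connects points in the order they appear in the data (unlike `geom_line()`, which sorts by x), so each storm's observations should be in chronological order. The storms table appears to be listed that way already, but a small safeguard is to sort before plotting. `order_tracks()` is a hypothetical helper written in base R; `dplyr::arrange(tropical, year, name, month, day, hour)` does the same thing.

```r
# Hypothetical helper: sort storm observations by year, storm name and time,
# so geom_path() joins consecutive positions of each storm in time order.
order_tracks <- function(df) {
  df[order(df$year, df$name, df$month, df$day, df$hour), , drop = FALSE]
}

# Usage:
# ggplot(order_tracks(tropical), aes(x = long, y = lat)) +
#   geom_path(aes(colour = name)) +
#   facet_wrap(~year, scales = "free", ncol = 3)
```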
Using the data set Top25CommonFemaleNames.csv, recreate the “Median Names for Females with the 25 Most Common Names” graphic from FiveThirtyEight (link to graphic; link to full article).
library(tidyverse)
old_name <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/Top25CommonFemaleNames.csv")
# Each name gets a bar from the first to the third quartile of age; with
# stat = "identity", geom_boxplot() uses the supplied quartiles as-is
ggplot(old_name, aes(x = reorder(name, -median_age),
                     ymin = q1_age, lower = q1_age, middle = median_age,
                     upper = q3_age, ymax = q3_age)) +
  geom_boxplot(stat = "identity", color = "lightgoldenrod", fill = "lightgoldenrod") +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "Median Ages For Females With the 25 Most Common Names",
       subtitle = "Among Americans estimated to be alive as of Jan. 1, 2014") +
  # Red dot marks the median age for each name
  geom_point(aes(y = median_age), shape = 21, fill = "red", color = "white", size = 2)
** I tried adding “theme_pander()” at the end of the plot to remove the grey background, but the R Markdown file then failed to knit.
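`theme_pander()` is not part of ggplot2; it comes from the ggthemes package, so a likely cause of the knit failure is that ggthemes was not installed or not loaded with `library(ggthemes)` in the R Markdown file. A minimal sketch, assuming that is the problem: the hypothetical helper `pick_theme()` uses `ggthemes::theme_pander()` when ggthemes is available and otherwise falls back to ggplot2's built-in `theme_minimal()`, which also drops the grey panel background.

```r
library(ggplot2)

# Hypothetical helper: prefer ggthemes::theme_pander() if ggthemes is
# installed; otherwise use ggplot2's theme_minimal(), which has no grey
# panel background either.
pick_theme <- function() {
  if (requireNamespace("ggthemes", quietly = TRUE)) {
    ggthemes::theme_pander()
  } else {
    theme_minimal()
  }
}

# Usage: append to the names plot above, e.g.
# last_plot() + pick_theme()
```

ggthemes also provides `theme_fivethirtyeight()`, which may bring the plot closer to the original FiveThirtyEight styling.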