Please indicate
Use the mlb_teams.csv data set to create an informative data graphic that illustrates the relationship between winning percentage (WPct) and payroll in context.
Here is a simple scatter plot of winning percentage (WPct) against payroll, with a smoothed (loess) trend line through the data. It shows that as payroll increases, WPct generally increases.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
# Read the team-season data
mlb_teams <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv")
# Scatter plot of win percentage against payroll, with a smoothed trend line
scatter <- ggplot(mlb_teams, mapping = aes(x = payroll, y = WPct)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Win Percentage vs. Payroll", x = "Payroll", y = "Win %")
scatter
## `geom_smooth()` using method = 'loess'
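Payrolls grow from season to season, so one way to show the relationship "in context" is to compare teams within the same season. The sketch below assumes the CSV has a yearID column identifying the season (as in the MLB_teams table from the mdsr package); it rescales payroll to millions of dollars and fits a separate straight-line trend for each season.

```r
library(tidyverse)

mlb_teams <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv")

# Assumption: the data include a yearID column identifying the season
mlb_context <- mlb_teams %>%
  mutate(payroll_millions = payroll / 1e6,  # easier-to-read axis units
         season = factor(yearID))           # treat season as a category for color

context_plot <- ggplot(mlb_context, aes(x = payroll_millions, y = WPct, color = season)) +
  geom_point(alpha = 0.6) +
  # one linear fit per season, without confidence bands to reduce clutter
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Win Percentage vs. Payroll by Season",
       x = "Payroll (millions of dollars)", y = "Win %", color = "Season")
context_plot
```

Coloring by season keeps each year's trend line on its own, so a rise in payroll across all teams over time is not mistaken for a payroll effect on winning.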
A more elegant attempt, although unfortunately I couldn’t get the text labels to display cleanly. I plotted each team’s payroll and wanted to show its win percentage next to it. I think the problem is that the data span many years, and I don’t know how to separate the different years. Looking forward to learning how.
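library(ggplot2)
library(ggthemes)
mlb_teams <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv")
# Bar chart of payroll by team, one panel per league (lgID).
# Every season's payroll for a team is stacked into the same bar.
mlb_plot <-
  ggplot(mlb_teams, aes(x = reorder(name, payroll), y = payroll, fill = lgID)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(WPct, digits = 2))) +  # one label per team-season, so labels overlap
  facet_wrap(~lgID, scales = "free_y", ncol = 1) +
  coord_flip() +
  # Values beyond $225M are censored, which drops many of the stacked bars (see warnings)
  ylim(0, 225000000) +
  theme_fivethirtyeight()
mlb_plot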
## Warning: Removed 1 rows containing missing values (position_stack).
## Warning: Removed 129 rows containing missing values (geom_bar).
## Warning: Removed 1 rows containing missing values (geom_text).
##geom_text(aes(label=round(WPct, digits = 2)))
##geom_text(aes(label = round(WPct*100, digits=2)), hjust =-.4) +
##ylim(0,100)
#geom_text(aes(label = pct_dead), hjust = -.1)
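One way to address the overlapping labels described above is to keep a single season, so each team contributes exactly one bar and one label. This is a sketch, assuming the yearID column from mdsr's MLB_teams is present and ggplot2 3.3.0 or later (for horizontal bars without coord_flip()); it keeps the most recent season in the file, converts payroll to millions, and places each team's winning percentage just past the end of its bar.

```r
library(tidyverse)
library(ggthemes)

mlb_teams <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv")

# Assumption: yearID identifies the season; keep only the most recent one
latest <- mlb_teams %>%
  filter(yearID == max(yearID, na.rm = TRUE), !is.na(payroll)) %>%
  mutate(payroll_millions = payroll / 1e6)

season_plot <- ggplot(latest, aes(x = payroll_millions,
                                  y = reorder(name, payroll_millions),
                                  fill = lgID)) +
  geom_col() +                                     # horizontal bars: y is discrete
  geom_text(aes(label = round(WPct, digits = 3)),  # winning percentage as a label
            hjust = -0.1, size = 3) +              # just past the end of each bar
  facet_wrap(~lgID, scales = "free_y", ncol = 1) + # each panel lists only its league
  expand_limits(x = max(latest$payroll_millions) * 1.15) +  # room for the labels
  labs(x = "Payroll (millions of dollars)", y = NULL) +
  theme_fivethirtyeight()
season_plot
```

Mapping team names to y directly (instead of using coord_flip()) means scales = "free_y" frees the team axis, so each league panel shows only its own teams.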
Using data from the nasaweather R package, use the path geometry (i.e., use a geom_path layer) to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
Hint: Don’t forget to install and load the nasaweather R package!
I think this is roughly what the question is looking for, but I know it isn’t exactly what you want. I did my best with this one.
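library(nasaweather)
# Keep only the observations recorded while a storm was at tropical-storm strength
tropical_storm <- filter(storms, type == "Tropical Storm")
# Map color to the storm name inside aes() so each storm gets its own color and path;
# a fixed color set in geom_path() would override that mapping and merge the storms
trp_strm_plot <- ggplot(tropical_storm, mapping = aes(x = long, y = lat, color = name)) +
  geom_path() +
  facet_wrap(~year)
trp_strm_plot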
##ggplot(data = storms, mapping = aes(x = long, y= lat)) + geom_path() + facet_wrap(~year)
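The filter on type keeps only the observations taken while a storm was at tropical-storm strength, so a track that strengthened into a hurricane is cut into pieces that geom_path() then joins with straight lines. A sketch of an alternative, assuming the question means every storm in the table and that storms has month, day and hour columns: it sorts each track into time order, groups by storm name, and hides the legend because there are dozens of names. It refers to nasaweather::storms explicitly because dplyr ships a different data set also called storms.

```r
library(tidyverse)
library(nasaweather)

# Assumption: month, day and hour columns exist; sort so each path is drawn in time order
storm_tracks <- nasaweather::storms %>%
  arrange(year, name, month, day, hour)

all_storm_plot <- ggplot(storm_tracks, aes(x = long, y = lat, color = name, group = name)) +
  geom_path() +
  facet_wrap(~year) +                    # one panel per season
  labs(x = "Longitude", y = "Latitude") +
  theme(legend.position = "none")        # too many storm names for a readable legend
all_storm_plot
```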
Using the data set Top25CommonFemaleNames.csv, recreate the “Median Ages for Females With the 25 Most Common Names” graphic from FiveThirtyEight (link to graphic; link to full article).
For some reason the “fatten” and “lwd” arguments don’t seem to have any effect for me. I’m hoping it’s just a display quirk on my computer and that it will look right when you open it on yours.
library(ggthemes)
names <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/Top25CommonFemaleNames.csv")
# Each box runs from the first to the third quartile of age; stat = "identity"
# tells geom_boxplot() to use these precomputed summaries instead of raw data
names_plot <-
  ggplot(names, mapping = aes(x = reorder(name, -median_age), ymin = q1_age, ymax = q3_age,
                              lower = q1_age, middle = median_age, upper = q3_age)) +
  geom_boxplot(stat = "identity", fill = "yellow", fatten = 0.01, lwd = .01) +
  geom_point(aes(y = median_age), color = "red") +  # red dot marks the median age
  coord_flip() +
  labs(x = NULL, y = NULL, title = "Median Ages For Females With the 25 Most\nCommon Names",
       subtitle = "Among Americans estimated to be alive as of Jan. 1, 2014") +
  # Age is continuous, so use a continuous scale; under coord_flip() the y axis's
  # "right" position is drawn along the top of the plot
  scale_y_continuous(position = "right", name = NULL, breaks = seq(15, 75, by = 10)) +
  geom_hline(yintercept = c(15, 25, 35, 45, 55, 65), linetype = "dotted") +
  theme_fivethirtyeight()
names_plot
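The FiveThirtyEight chart draws each name's interquartile range as a thick bar with the median as a dot, so one sketch that sidesteps geom_boxplot() (and with it fatten and lwd) is to draw the bar with geom_segment(). It uses the same columns as the code above (name, q1_age, median_age, q3_age) and assumes ggplot2 3.4.0 or later for the linewidth argument; the colors, bar width and break positions are guesses at the original styling, not taken from it.

```r
library(tidyverse)
library(ggthemes)

female_names <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/Top25CommonFemaleNames.csv") %>%
  mutate(name = reorder(name, -median_age))  # same ordering as the boxplot version

segment_plot <- ggplot(female_names, aes(y = name)) +
  # interquartile range as a thick horizontal bar from q1 to q3
  geom_segment(aes(x = q1_age, xend = q3_age, yend = name),
               color = "gold", linewidth = 4) +
  # median age as a dot on top of the bar
  geom_point(aes(x = median_age), color = "red", size = 2) +
  scale_x_continuous(position = "top", breaks = seq(15, 75, by = 10)) +
  labs(x = NULL, y = NULL,
       title = "Median Ages For Females With the 25 Most\nCommon Names",
       subtitle = "Among Americans estimated to be alive as of Jan. 1, 2014") +
  theme_fivethirtyeight()
segment_plot
```

Because names are mapped to y directly, no coord_flip() is needed, and the age axis can be moved to the top with an ordinary x-scale position.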