Administrative

Please indicate

  • Roughly how much time you spent on this HW so far: Between 2 and 3 hours, probably closer to 3.
  • The URL of the RPubs published URL here.
  • What gave you the most trouble: I couldn’t find a way to fix the text display for problem 1. I also couldn’t think of a better way to display that data; I’m sure there is a much more elegant solution than what I’ve done. I didn’t fully understand what the plot for Q2 should look like.
  • Any comments you have: I was really looking forward to this class, and now I am even more so. I’m looking forward to applying the skills I gain here to real life. Although my solutions aren’t perfect, they are a step in the right direction, and I’m excited for the future assignments and projects this class has to offer.

Question 1:

Use the mlb_teams.csv data set to create an informative data graphic that illustrates the relationship between winning percentage (WPct) and payroll in context.

Here is a simple scatter plot of WPct against payroll, with a smoothed trend line through the data. It shows that as payroll increases, WPct generally increases.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
mlb_teams <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv") 

scatter <- ggplot(mlb_teams, mapping = aes(x = payroll, y = WPct)) + 
  geom_point() + 
  geom_smooth() +
  # x is payroll and y is WPct, so the axis labels follow that order
  labs(title = "Win Percentage vs. Payroll", x = "Payroll", y = "Win %")
scatter
## `geom_smooth()` using method = 'loess'

A more elegant attempt. Unfortunately, I couldn’t get the text labels to display cleanly. I plotted payroll as bars and wanted the winning percentage shown next to each team. I think the problem is that the data cover many years, so each team has several rows, and I don’t know how to separate the different years. Looking forward to learning how.

library(ggplot2)
library(ggthemes)
mlb_teams <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv") 

mlb_plot <- 
ggplot(mlb_teams, aes(x = reorder(name, payroll), y = payroll, fill = lgID)) + 
geom_bar(stat = "identity") + 
geom_text(aes(label=round(WPct,digits = 2))) +
facet_wrap(~lgID, scales = "free_y", ncol = 1) + 
coord_flip() +
ylim(0,225000000) + 
theme_fivethirtyeight()

mlb_plot
## Warning: Removed 1 rows containing missing values (position_stack).
## Warning: Removed 129 rows containing missing values (geom_bar).
## Warning: Removed 1 rows containing missing values (geom_text).

##geom_text(aes(label=round(WPct, digits = 2)))
##geom_text(aes(label = round(WPct*100, digits=2)), hjust =-.4) + 
##ylim(0,100)
#geom_text(aes(label = pct_dead), hjust = -.1)
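
One possible way to separate the years, sketched with a small made-up data frame rather than the real file: keep a single season before plotting, so each team contributes exactly one bar and one label. The column name `yearID` and every value below are assumptions for illustration only; `names(mlb_teams)` would show what the year column is actually called.

```r
library(ggplot2)

# Toy stand-in for mlb_teams; yearID and all values here are made up
toy_teams <- data.frame(
  name    = c("Team A", "Team B", "Team C", "Team A", "Team B", "Team C"),
  yearID  = c(2013, 2013, 2013, 2014, 2014, 2014),
  payroll = c(90e6, 150e6, 60e6, 95e6, 160e6, 70e6),
  WPct    = c(0.52, 0.58, 0.45, 0.50, 0.60, 0.47)
)

# Keep one season so each team has exactly one row
one_year <- subset(toy_teams, yearID == 2014)

one_year_plot <- ggplot(one_year, aes(x = reorder(name, payroll), y = payroll)) +
  geom_col() +
  # hjust = -0.1 nudges each label just past the end of its bar
  geom_text(aes(label = round(WPct, 2)), hjust = -0.1) +
  coord_flip() +
  # leave headroom on the payroll axis so the labels are not clipped
  expand_limits(y = max(one_year$payroll) * 1.1)

one_year_plot
```

With one row per team, the bars no longer stack across seasons and the labels no longer pile on top of each other. Faceting by the year column instead of filtering would be another option if all seasons need to stay visible.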

Question 2:

Using data from the nasaweather R package, use the path geometry (i.e. use a geom_path layer) to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.

Hint: Don’t forget to install and load the nasaweather R package!

I think this is kind of what this question is looking for, but I know it isn’t 100% what you want. I did my best with this one.

library(nasaweather)
tropical_storm <- filter(storms, type == "Tropical Storm")
trp_strm_plot <- ggplot(tropical_storm, mapping = aes(x = long, y = lat, color = name)) +
  # no fixed color here, so the color = name mapping distinguishes the storms
  geom_path() +
  facet_wrap(~year)
trp_strm_plot

##ggplot(data = storms, mapping = aes(x = long, y= lat)) + geom_path() + facet_wrap(~year)
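
A minimal sketch of what the question seems to ask for, using made-up tracks so it runs without the nasaweather package: color is mapped to the storm name inside `aes()`, `geom_path()` gets no constant color, and each year gets its own panel. The storm names and coordinates are hypothetical.

```r
library(ggplot2)

# Made-up tracks for three hypothetical storms; not taken from nasaweather
toy_storms <- data.frame(
  name = rep(c("Alpha", "Beta", "Gamma"), each = 4),
  year = rep(c(1995, 1995, 1996), each = 4),
  long = c(-80, -78, -75, -71, -90, -88, -85, -83, -70, -68, -65, -61),
  lat  = c(20, 23, 27, 32, 18, 21, 25, 28, 15, 19, 24, 30)
)

toy_path_plot <- ggplot(toy_storms, aes(x = long, y = lat, color = name)) +
  # a constant color argument here would override the color = name mapping
  geom_path() +
  # one panel per year
  facet_wrap(~year)

toy_path_plot
```

Because `name` is a discrete variable, mapping it to color also groups the rows, so each storm is drawn as its own connected path.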

Question 3:

Using the data set Top25CommonFemaleNames.csv, recreate the “Median Names for Females with the 25 Most Common Names” graphic from FiveThirtyEight (link to graphic; link to full article).

For some reason, the “fatten” and “lwd” arguments don’t seem to be working for me. I’m hoping it’s just a display quirk on my computer and that it will look right when you open it on yours.

library(ggthemes)
names <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/Top25CommonFemaleNames.csv")

names_plot <- 
ggplot(names, mapping = aes(x=reorder(name, -median_age), ymin = q1_age, ymax = q3_age, lower = q1_age, middle = median_age, upper = q3_age)) +
geom_boxplot(stat = "identity", fill = "yellow", fatten=0.01, lwd=.01) +
geom_point(aes(y=median_age), color = "red") + 
coord_flip() + 
labs(x = NULL, y= NULL, title = "Median Ages For Females With the 25 Most\nCommon Names", subtitle = "Among Americans estimated to be alive as of Jan. 1, 2014") +
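# median_age is numeric, so use a continuous scale; breaks set the tick marks,
# and with coord_flip() position = "right" draws this axis along the top
scale_y_continuous(position = "right", name = NULL, breaks = c(15, 25, 35, 45, 55, 65, 75)) +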
geom_hline(yintercept = c(15,25,35,45,55,65), linetype="dotted") +
theme_fivethirtyeight()


names_plot
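
An alternative sketch for the quartile bars, in case the box outline and median line keep getting in the way: draw each interquartile range with `geom_linerange()` and place the median on top with `geom_point()`, so there is no median line to hide. This swaps `geom_boxplot()` for a different geom and is not necessarily how the original graphic was built. The names and ages below are made up, the column names mirror the ones used above, and the `linewidth` argument assumes ggplot2 3.4.0 or later (older versions use `size` for line thickness).

```r
library(ggplot2)

# Made-up ages; column names mirror the ones used above
toy_names <- data.frame(
  name       = c("Name A", "Name B", "Name C"),
  q1_age     = c(20, 35, 50),
  median_age = c(27, 44, 61),
  q3_age     = c(36, 55, 70)
)

toy_names_plot <- ggplot(toy_names, aes(x = reorder(name, -median_age))) +
  # one thick segment per name spanning the first to third quartile
  geom_linerange(aes(ymin = q1_age, ymax = q3_age), color = "gold", linewidth = 4) +
  # the median as a red dot drawn over the segment
  geom_point(aes(y = median_age), color = "red") +
  coord_flip() +
  # ages are numeric, so tick marks are set with breaks on a continuous scale
  scale_y_continuous(breaks = seq(15, 75, by = 10)) +
  labs(x = NULL, y = NULL)

toy_names_plot
```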