Administrative

Please indicate

  • Roughly how much time you spent on this HW so far: about two and a half hours
  • The URL of the published RPubs document: http://rpubs.com/aphalin11/A_Aron
  • What gave you the most trouble: For question three, I didn’t know the geom_linerange function when I started the assignment, so I had trouble making the lines different colors because I was initially building the plot as a scatterplot. For question two, I overcomplicated the graph, not realizing what some of the arguments for geom_path could do.
  • Any comments you have: nope!

Question 1:

Use the mlb_teams.csv data set to create an informative data graphic that illustrates the relationship between winning percentage (WPct) and payroll in context.

Loading the data set

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.2.5
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Warning: package 'ggplot2' was built under R version 3.2.5
## Warning: package 'tibble' was built under R version 3.2.5
## Warning: package 'tidyr' was built under R version 3.2.5
## Warning: package 'readr' was built under R version 3.2.5
## Warning: package 'purrr' was built under R version 3.2.5
## Warning: package 'dplyr' was built under R version 3.2.5
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
MLB_Data <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/mlb_teams.csv")
head(MLB_Data)
##   yearID teamID lgID  W  L      WPct attendance normAttend   payroll
## 1   2008    ARI   NL 82 80 0.5061728    2509924  0.5838859  66202712
## 2   2008    ATL   NL 72 90 0.4444444    2532834  0.5892155 102365683
## 3   2008    BAL   AL 68 93 0.4223602    1950075  0.4536477  67196246
## 4   2008    BOS   AL 95 67 0.5864198    3048250  0.7091172 133390035
## 5   2008    CHA   AL 89 74 0.5460123    2500648  0.5817280 121189332
## 6   2008    CHN   NL 97 64 0.6024845    3300200  0.7677285 118345833
##   metroPop                 name
## 1  4489109 Arizona Diamondbacks
## 2  5614323       Atlanta Braves
## 3  2785874    Baltimore Orioles
## 4  4732161       Boston Red Sox
## 5  9554598    Chicago White Sox
## 6  9554598         Chicago Cubs
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.2.5

breaking down the problem -

Variables: winning percentage (quantitative) and payroll (quantitative)

Graph: scatterplot, with winning percentage on the x-axis and payroll on the y-axis

compiled coding -

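# Each row of MLB_Data is one team in one season, so the labels refer to teams, not players
ggplot(MLB_Data, aes(x = WPct, y = payroll)) +
  geom_point(alpha = 0.7, color = "orange") +
  labs(x = "Team winning percentage",
       y = "Team payroll",
       title = "Winning Percentage and Payroll of MLB Teams")
# scale_color_solarized() was dropped: no variable is mapped to color, so it had no effect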
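
A possible variation on the plot above, offered only as a sketch: put payroll on the x-axis, rescale it to millions, and add a linear trend line so the direction of the relationship is easier to read. The data frame six_teams is just the six rows printed by head(MLB_Data) above, used so the example runs without the download. Treating payroll as dollars is an assumption (the data set does not state the unit here), and six rows are far too few to draw any conclusion.

```r
library(ggplot2)

# The six rows shown by head(MLB_Data) above, limited to the columns needed here
six_teams <- data.frame(
  teamID  = c("ARI", "ATL", "BAL", "BOS", "CHA", "CHN"),
  WPct    = c(0.5061728, 0.4444444, 0.4223602, 0.5864198, 0.5460123, 0.6024845),
  payroll = c(66202712, 102365683, 67196246, 133390035, 121189332, 118345833)
)

# Assumption: payroll is recorded in dollars, so dividing by 1e6 gives millions
payroll_plot <- ggplot(six_teams, aes(x = payroll / 1e6, y = WPct)) +
  geom_point(alpha = 0.7, color = "orange") +
  # Straight-line fit without a confidence band, purely as a visual guide
  geom_smooth(method = "lm", se = FALSE, color = "grey40") +
  labs(x = "Team payroll (millions of dollars, assumed unit)",
       y = "Team winning percentage",
       title = "Winning Percentage and Payroll of MLB Teams")

print(payroll_plot)
```

To apply the same idea to the full data, replace six_teams with MLB_Data.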

Question 2:

Using data from the nasaweather R package, use the path geometry (i.e. use a geom_path layer) to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.

Hint: Don’t forget to install and load the nasaweather R package!

Opening and looking at the storms data from nasaweather

Storms <- nasaweather::storms
Storms
## # A tibble: 2,747 × 11
##       name  year month   day  hour   lat  long pressure  wind
##      <chr> <int> <int> <int> <int> <dbl> <dbl>    <int> <int>
## 1  Allison  1995     6     3     0  17.4 -84.3     1005    30
## 2  Allison  1995     6     3     6  18.3 -84.9     1004    30
## 3  Allison  1995     6     3    12  19.3 -85.7     1003    35
## 4  Allison  1995     6     3    18  20.6 -85.8     1001    40
## 5  Allison  1995     6     4     0  22.0 -86.0      997    50
## 6  Allison  1995     6     4     6  23.3 -86.3      995    60
## 7  Allison  1995     6     4    12  24.7 -86.2      987    65
## 8  Allison  1995     6     4    18  26.2 -86.2      988    65
## 9  Allison  1995     6     5     0  27.6 -86.1      988    65
## 10 Allison  1995     6     5     6  28.5 -85.6      990    60
## # ... with 2,737 more rows, and 2 more variables: type <chr>,
## #   seasday <int>

making plot - compiled coding

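# Longitude goes on the x-axis and latitude on the y-axis so each path is oriented like a map
ggplot(Storms, aes(x = long, y = lat)) +
  geom_path(aes(col = name)) +
  facet_wrap(~year)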

Question 3:

Using the data set Top25CommonFemaleNames.csv, recreate the “Median Ages for Females with the 25 Most Common Names” graphic from FiveThirtyEight (link to graphic; link to full article).

Loading the data set

Female_Names <- read.csv("https://raw.githubusercontent.com/cmsc205/data/master/Top25CommonFemaleNames.csv")

Making the plot (a line range from the first to the third quartile, with a point at the median)

ggplot(Female_Names, aes(x = reorder(name, -median_age), y = median_age)) +
  # Map the quartiles inside aes() so each range stays tied to its own name
  geom_linerange(aes(ymin = q1_age, ymax = q3_age), col = "orange", size = 4, alpha = 0.7) +
  geom_point(col = "red") +
  ylim(9, 70) +
  coord_flip() +
  labs(x = NULL, y = NULL,
       title = "Median Ages for Females With the 25 Most\nCommon Names",
       subtitle = "Among Americans estimated to be alive as of Jan. 1, 2014")