Tidy Tuesday: A weekly data science challenge

Week 21

How A Booming Population And Climate Change Made California’s Wildfires Worse Than Ever

This is my take on the August 21, 2018 dataset provided by rfordatascience/tidytuesday.


The data for this study can be found here:

https://github.com/rfordatascience/tidytuesday/tree/master/data/week21


All of the code for this exercise can be found in my GitHub repo here:

https://github.com/jasonmstevensphd/tidytuesday/tree/2018_08_21


Lastly, the corresponding article from BuzzFeed News can be found here:

https://www.buzzfeednews.com/article/peteraldhous/california-wildfires-people-climate


Here we go!

Initial Exploration of the Dataset


To start, I imported calfires_week21_frap.csv and used the same case_when call that the original author used to assign cause_2, since it’s not clear what the numeric codes in the dataset correspond to. This was a nice example of case_when that I’ll definitely add to my repertoire.
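For readers who haven't used case_when before, here is a minimal sketch of the idea on a toy data frame. The specific mapping of numeric codes to labels (1 and 17 as "Natural", 14 or missing as "Unknown", everything else as "Human") is an assumption made for illustration and should be checked against the original analysis; only the column name cause_2 comes from the text above.

```r
library(dplyr)

# Toy stand-in for the imported fire data; the real file has many more columns.
fires <- tibble(cause = c(1, 14, 7, NA, 17))

# case_when evaluates the conditions top to bottom and uses the first match.
# NOTE: the code-to-label mapping below is assumed, not taken from the dataset docs.
fires <- fires %>%
  mutate(cause_2 = case_when(
    cause %in% c(1, 17)          ~ "Natural",
    cause == 14 | is.na(cause)   ~ "Unknown",
    TRUE                         ~ "Human"
  ))

print(fires)
```

The final TRUE branch acts as a catch-all, which is easier to read than listing every remaining code by hand.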


Tidy Tuesday Week 22: NFL Stats

Bringing in the Data

First I loaded the libraries and files, then cleaned up the data by converting blanks to NAs. During this transformation I also grouped the data by year and team and removed the columns containing player name, game week, and position. I noticed that several teams were not represented and that a large number of NAs were present. To dig a little deeper, I then grouped all the teams into conferences.

## # A tibble: 6 x 20
## # Groups:   game_year [1]
##   game_year team  rush_att rush_yds rush_avg rush_tds rush_fumbles   rec
##       <int> <chr>    <dbl>    <dbl>    <dbl>    <dbl>        <dbl> <dbl>
## 1      2000 ARI        322     1174     171.        6           11   302
## 2      2000 ATL        314     1083     114.        6           10   265
## 3      2000 BAL        485     2135     275.        9            8   276
## 4      2000 BUF        438     1709     320.        8            7   292
## 5      2000 CAR        351     1107     163.        7            2   318
## 6      2000 CHI        391     1631     259.        6           10   285
## # ... with 12 more variables: rec_yds <dbl>, rec_avg <dbl>, rec_tds <dbl>,
## #   rec_fumbles <dbl>, pass_att <dbl>, pass_yds <dbl>, pass_tds <dbl>,
## #   int <dbl>, sck <dbl>, pass_fumbles <dbl>, rate <dbl>, Conference <chr>
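The clean-up that produced a table like the one above could look roughly like this. It is a sketch on toy data rather than the actual code from the repo: the column names name, game_week, and position are assumed (only game_year, team, and the stat columns appear in the output), and summing within each year and team is my guess at the aggregation.

```r
library(dplyr)

# Toy stand-in for the raw player-level file; all values are made up.
raw <- tibble(
  name      = c("Player A", "Player B", "Player C", "Player D"),
  team      = c("ATL", "ATL", "", "BAL"),
  game_year = c(2000L, 2000L, 2000L, 2000L),
  game_week = c(1L, 1L, 2L, 1L),
  position  = c("RB", "WR", "RB", "RB"),
  rush_att  = c(20, NA, 15, 25),
  rush_yds  = c(80, NA, 60, 110)
)

team_totals <- raw %>%
  # Turn blank strings in character columns into NA
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Drop the player-level columns (names assumed)
  select(-name, -game_week, -position) %>%
  # Aggregate every remaining stat by year and team, ignoring missing values
  group_by(game_year, team) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(team_totals)
```

In this toy example the player with a blank team ends up in an NA group, which mirrors how unassigned players show up as missing teams after the transformation.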

Digging a bit deeper

The initial plot of this data made it clear that Jacksonville (JAC / JAX, AFC South), San Diego (SD / LAC, AFC West), and the Rams (STL / LA, NFC West) were the teams that had no players assigned to them. These teams were therefore excluded from further analysis, which is unfortunate, as an analysis of “The Greatest Show on Turf” would have been interesting. As a side note, I don’t have a good way to simply exclude teams that match a character string. Stack Overflow mentioned creating a reverse %in% operator. Nevertheless, I took the end-around approach and got the job done in an effective yet clunky manner.

Rushing Efficiency

An interesting aspect of this plot is observing how a team’s rushing performance has changed over time, especially the Atlanta Falcons (can you figure out when Julio Jones was drafted?).
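On the side note about excluding teams by name: one possible approach, sketched here on toy data, is either to negate %in% inline or to define the reverse operator that Stack Overflow suggests. The operator name %notin% is just a name picked for this example, and the yardage values are placeholders; the team codes are the ones listed above.

```r
library(dplyr)

# A "reverse %in%" operator; the name %notin% is arbitrary.
`%notin%` <- Negate(`%in%`)

excluded <- c("JAC", "JAX", "SD", "LAC", "STL", "LA")

# Toy table; rush_yds values are placeholders, not real stats.
teams <- tibble(
  team     = c("ATL", "JAX", "STL", "BAL", "LA"),
  rush_yds = c(1000, 1500, 2000, 2100, 1200)
)

# Option 1: use the custom operator
kept <- teams %>% filter(team %notin% excluded)

# Option 2: negate %in% inline, no helper needed
kept_alt <- teams %>% filter(!team %in% excluded)

print(kept)
```

Both options return the same rows, so the choice is mostly a matter of readability.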

Passing Performance

The same analysis was then repeated for passing. Interestingly, most teams have increased how much they pass over time. Again, Atlanta was an interesting case, effectively doubling its yearly passing yardage over a period of 10 years.
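A claim like "doubling over 10 years" can be checked by comparing each team's first and last season. The sketch below does this on made-up numbers with placeholder team codes (AAA, BBB); only the column names game_year, team, and pass_yds come from the summary table, and the real figures would have to come from the aggregated data.

```r
library(dplyr)

# Placeholder data: two fictional teams, two seasons each.
passing <- tibble(
  game_year = c(2000L, 2010L, 2000L, 2010L),
  team      = c("AAA", "AAA", "BBB", "BBB"),
  pass_yds  = c(2000, 4000, 3000, 3300)
)

pass_change <- passing %>%
  group_by(team) %>%
  # Sort seasons within each team so first() and last() are chronological
  arrange(game_year, .by_group = TRUE) %>%
  summarise(
    first_year = first(game_year),
    last_year  = last(game_year),
    ratio      = last(pass_yds) / first(pass_yds),
    .groups    = "drop"
  )

print(pass_change)
```

A ratio near 2 would correspond to the doubling described above.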

Rushing vs Passing Play Selection

