Notes Mar 1

Harold Nelson

3/1/2021

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Get Data

Go to Tidy Tuesday and get the wind turbine data from Oct. 27, 2020.

wind_turbine <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-27/wind-turbine.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   objectid = col_double(),
##   province_territory = col_character(),
##   project_name = col_character(),
##   total_project_capacity_mw = col_double(),
##   turbine_identifier = col_character(),
##   turbine_number_in_project = col_character(),
##   turbine_rated_capacity_k_w = col_double(),
##   rotor_diameter_m = col_double(),
##   hub_height_m = col_double(),
##   manufacturer = col_character(),
##   model = col_character(),
##   commissioning_date = col_character(),
##   latitude = col_double(),
##   longitude = col_double(),
##   notes = col_character()
## )

Issues

  1. Missing values in commissioning_date.

  2. Missing values in turbine_rated_capacity_k_w.

  3. Gaps in time series plots.

  4. No value for Ontario in current installed capacity.

Missing Dates

Why are they missing?

Can we fix the problem?

What did Julia Silge do?

Answer

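# as.numeric() returns NA for any date that is not a single number, so this keeps the problem rows
bad_dates <- wind_turbine %>% 
  filter(is.na(as.numeric(commissioning_date)))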
## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
table(bad_dates$commissioning_date)
## 
##      2000/2001      2001/2003      2002/2006      2004/2005 2005/2006/2012 
##             59             16              3             47             73 
##      2006/2007      2006/2008      2011/2012      2013/2014      2014/2015 
##             35            133            141            154            207

The problem is that multiple years are listed, separated by “/”.
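Since the slash is the tell-tale sign, one way to find these rows without triggering the coercion warning is to search for it directly. The sketch below uses a small made-up tibble named toy (an assumption for illustration, not the real data) and str_detect() from stringr.

```r
library(tidyverse)

# Toy stand-in for the commissioning_date column (illustrative values only)
toy <- tibble(commissioning_date = c("2014", "2000/2001", "2005/2006/2012", "2015"))

# Keep only the rows whose date string contains a slash
multi_year <- toy %>%
  filter(str_detect(commissioning_date, "/"))

multi_year$commissioning_date
```

On the real data, the same filter() call applied to wind_turbine should pick out the multi-year rows, assuming a slash is the only thing that makes a date non-numeric.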

Julia used parse_number(). What does this do?

Answer

parse_number("2000/2001")
## [1] 2000
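So parse_number() keeps the first number it finds and drops the rest. A minimal sketch of using it to repair a whole column follows. The tibble toy and the new column name commissioning_year are assumptions made up for this example, and Julia's actual code may differ in detail.

```r
library(tidyverse)

# Toy stand-in for the commissioning_date column (illustrative values only)
toy <- tibble(commissioning_date = c("2014", "2000/2001", "2005/2006/2012"))

# parse_number() keeps the first number in each string,
# so a multi-year entry collapses to its earliest listed year
fixed <- toy %>%
  mutate(commissioning_year = parse_number(commissioning_date))

fixed$commissioning_year
```

Note the design choice this implies: a project commissioned over several years is assigned to its first year.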

Missing Capacity

Look at these records.

Answer

# Keep only the turbines with no rated capacity recorded
bad_capacity <- wind_turbine %>% 
  filter(is.na(turbine_rated_capacity_k_w))

head(bad_capacity)
## # A tibble: 6 x 15
##   objectid province_territ… project_name total_project_c… turbine_identif…
##      <dbl> <chr>            <chr>                   <dbl> <chr>           
## 1     3347 Ontario          Skyway 8                 9.48 SKY1            
## 2     3348 Ontario          Skyway 8                 9.48 SKY2            
## 3     3349 Ontario          Skyway 8                 9.48 SKY3            
## 4     3350 Ontario          Skyway 8                 9.48 SKY4            
## 5     3351 Ontario          Skyway 8                 9.48 SKY5            
## 6     3362 Ontario          South Kent …           270    SKW1            
## # … with 10 more variables: turbine_number_in_project <chr>,
## #   turbine_rated_capacity_k_w <dbl>, rotor_diameter_m <dbl>,
## #   hub_height_m <dbl>, manufacturer <chr>, model <chr>,
## #   commissioning_date <chr>, latitude <dbl>, longitude <dbl>, notes <chr>
table(bad_capacity$province_territory)
## 
## Ontario 
##     220
table(bad_capacity$commissioning_date)
## 
## 2014 2015 
##  129   91

What did Julia do?

She dropped them.
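Dropping the rows can be sketched as a filter on the capacity column. This is one way to do it, not necessarily the exact code Julia used. The tibble toy and its project names are hypothetical.

```r
library(tidyverse)

# Toy data: two turbines with no rated capacity, one with a value (hypothetical)
toy <- tibble(
  project_name = c("Project A", "Project A", "Project B"),
  turbine_rated_capacity_k_w = c(NA, NA, 1800)
)

# Keep only the rows where the capacity is present
kept <- toy %>%
  filter(!is.na(turbine_rated_capacity_k_w))

nrow(kept)
```

tidyr's drop_na(turbine_rated_capacity_k_w) does the same job.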

Alternatives? Let’s think about it and discuss on Wednesday.

What else do you see in her code?

  1. transmute()

  2. fct_lump_n()
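As a starting point, here is a small sketch of what the two functions do. The tibble toy, its values, and the choice of model as the column to lump are assumptions for illustration and are not taken from her code.

```r
library(tidyverse)

# Toy data with four models of unequal frequency (hypothetical values)
toy <- tibble(
  model = c("A", "A", "A", "B", "B", "C", "D"),
  hub_height_m = c(80, 80, 80, 100, 100, 90, 95),
  notes = "ignored"
)

# transmute() works like mutate() but keeps only the columns named in the call;
# fct_lump_n() keeps the n most frequent levels and lumps the rest into "Other"
lumped <- toy %>%
  transmute(
    hub_height_m,
    model = fct_lump_n(model, n = 2)
  )

count(lumped, model)
```

The notes column disappears because it is not named inside transmute(), and the two rare models end up together in "Other".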

Google!!!