Harold Nelson
3/1/2021
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Go to Tidy Tuesday and get the wind turbine data from Oct. 27, 2020.
wind_turbine <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-27/wind-turbine.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## objectid = col_double(),
## province_territory = col_character(),
## project_name = col_character(),
## total_project_capacity_mw = col_double(),
## turbine_identifier = col_character(),
## turbine_number_in_project = col_character(),
## turbine_rated_capacity_k_w = col_double(),
## rotor_diameter_m = col_double(),
## hub_height_m = col_double(),
## manufacturer = col_character(),
## model = col_character(),
## commissioning_date = col_character(),
## latitude = col_double(),
## longitude = col_double(),
## notes = col_character()
## )
Missing values in commissioning_date.
Missing values in turbine_rated_capacity_k_w.
Gaps in time series plots.
No value for Ontario in current installed capacity.
Why are they missing?
Can we fix the problem?
What did Julia Silge do?
## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
##
## 2000/2001 2001/2003 2002/2006 2004/2005 2005/2006/2012
## 59 16 3 47 73
## 2006/2007 2006/2008 2011/2012 2013/2014 2014/2015
## 35 133 141 154 207
The problem is that multiple years are listed, separated by “/”.
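A table like the one above can be produced by keeping only the values that as.numeric() cannot convert. This is a minimal sketch on a toy tibble; the name `toy` and its values are made up for illustration and are not the real data.

```r
library(dplyr)
library(tibble)

# Toy stand-in for wind_turbine (illustrative values only).
toy <- tibble(
  commissioning_date = c("1993", "2000/2001", "2014/2015", "2014/2015", "2019")
)

# as.numeric() gives NA for strings such as "2000/2001".
# suppressWarnings() hides the "NAs introduced by coercion" warning.
bad_dates <- toy %>%
  filter(is.na(suppressWarnings(as.numeric(commissioning_date)))) %>%
  pull(commissioning_date)

# Count how often each unconvertible value occurs.
table(bad_dates)
```

Only the entries containing “/” survive the filter, which is what points to the multiple-year problem.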
Julia used parse_number(). What does this do?
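A quick way to find out is to try readr::parse_number() on a few strings shaped like the problem values. The strings below are chosen for illustration. parse_number() keeps the first number it finds and drops everything after it, so each entry should reduce to its first year.

```r
library(readr)

# Strings shaped like the commissioning_date values (illustrative).
dates <- c("2019", "2000/2001", "2005/2006/2012")

# parse_number() reads the first number in each string and
# ignores the "/" and anything that follows it.
parsed <- parse_number(dates)
parsed
```

Note what this choice means: a turbine listed as “2005/2006/2012” is treated as commissioned in 2005.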
Look at the records with a missing turbine_rated_capacity_k_w.
## # A tibble: 6 x 15
## objectid province_territ… project_name total_project_c… turbine_identif…
## <dbl> <chr> <chr> <dbl> <chr>
## 1 3347 Ontario Skyway 8 9.48 SKY1
## 2 3348 Ontario Skyway 8 9.48 SKY2
## 3 3349 Ontario Skyway 8 9.48 SKY3
## 4 3350 Ontario Skyway 8 9.48 SKY4
## 5 3351 Ontario Skyway 8 9.48 SKY5
## 6 3362 Ontario South Kent … 270 SKW1
## # … with 10 more variables: turbine_number_in_project <chr>,
## # turbine_rated_capacity_k_w <dbl>, rotor_diameter_m <dbl>,
## # hub_height_m <dbl>, manufacturer <chr>, model <chr>,
## # commissioning_date <chr>, latitude <dbl>, longitude <dbl>, notes <chr>
##
## Ontario
## 220
##
## 2014 2015
## 129 91
What did Julia do?
She dropped them.
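Dropping could be sketched as below, assuming the records in question are the ones with a missing turbine_rated_capacity_k_w. The tibble `toy` and its values are hypothetical, and this is not a copy of Julia's code.

```r
library(dplyr)
library(tibble)

# Toy stand-in for wind_turbine (illustrative values only).
toy <- tibble(
  province_territory = c("Alberta", "Ontario", "Ontario", "Quebec"),
  turbine_rated_capacity_k_w = c(150, NA, NA, 2000)
)

# Where are the missing values?
missing_by_province <- toy %>%
  filter(is.na(turbine_rated_capacity_k_w)) %>%
  count(province_territory)
missing_by_province

# Drop the rows with a missing rated capacity.
kept <- toy %>%
  filter(!is.na(turbine_rated_capacity_k_w))
kept
```

Counting before dropping shows what is being lost. In this toy example every Ontario row disappears.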
Alternatives? Let’s think about it and discuss Wednesday.
What else do you see in her code?
transmute()
fct_lump_n()
Google!!!
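As a starting point, here is a small sketch of both functions on made-up data. The tibble `toy`, the new column name `capacity_mw`, and the choice of n = 2 are assumptions for illustration, not taken from Julia's code.

```r
library(dplyr)
library(forcats)
library(tibble)

# Toy data (illustrative values only).
toy <- tibble(
  manufacturer = c("Vestas", "Vestas", "Vestas", "GE", "GE", "Enercon", "Siemens"),
  turbine_rated_capacity_k_w = c(1800, 1800, 3000, 1500, 1500, 2300, 2300)
)

result <- toy %>%
  # transmute() works like mutate(), but keeps only the columns named here.
  transmute(
    capacity_mw = turbine_rated_capacity_k_w / 1000,
    # fct_lump_n() keeps the n most frequent levels and
    # lumps the rest into a single "Other" level.
    manufacturer = fct_lump_n(manufacturer, n = 2)
  )

result
count(result, manufacturer)
```

Here the two most common manufacturers keep their names and the remaining ones become “Other”.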