Project 2: Wind Turbines

Author

D Shima

Wind Turbines Project

Introduction

The topic of my project is wind turbines. I found the dataset on the United States Geological Survey (USGS) website, and it is also part of our class Google Drive selection. The topic fascinates me because wind turbines improve quality of life worldwide, from creating jobs and empowering local communities to providing clean air. Wind turbines produce renewable electricity and therefore reduce reliance on fossil fuels and cut greenhouse gas emissions.

The dataset includes variables for the state, county, year, turbine capacity, turbine hub height, turbine rotor diameter, turbine swept area, turbine total height, project capacity, number of turbines in the project, latitude and longitude. For this assignment, I focused on seven of those variables: state, county, year, turbine hub height, turbine rotor diameter, latitude and longitude.

After finding my working directory, loading the libraries I needed and loading the dataset, I cleaned the data. I converted all the variable names to lowercase using names with tolower, and replaced each “.” with “_” using gsub. I also renamed the variables site_longitude and site_latitude to lon and lat respectively using rename. After that, I filtered to the state CA, the years 1985, 1995, 2005 and 2015, and the counties Kern and Riverside. Then, I removed the NA values from the variables I wanted to focus on, renamed CA to California and dropped the variables that I would not work with.

Finally, I created my plot and my map to show the turbine hub height and turbine rotor diameter in the two counties of California across the selected years.

Describe your actions in each chunk

Load the necessary libraries

In these chunks, I load the libraries that I need for this assignment.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout

Find my working directory

My first step was to find my working directory using getwd.

getwd()
[1] "/Users/doriashima/Downloads"

Load your dataset

Then I was able to load my dataset using setwd and read_csv.

#setwd("/Users/doriashima/Desktop/data visualization course")
setwd("/Users/doriashima/Downloads")
windturbinesdataset <- read_csv("wind_turbines_USGS.csv")
Rows: 63961 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): Site.State, Site.County
dbl (10): Year, Turbine.Capacity, Turbine.Hub_Height, Turbine.Rotor_Diameter...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(windturbinesdataset)
# A tibble: 6 × 12
  Site.State Site.County    Year Turbine.Capacity Turbine.Hub_Height
  <chr>      <chr>         <dbl>            <dbl>              <dbl>
1 IA         Story County   2017             3000               87.5
2 IA         Hardin County  2017             3000               87.5
3 IA         Story County   2017             3000               87.5
4 IA         Story County   2017             3000               87.5
5 IA         Story County   2017             3000               87.5
6 IA         Story County   2017             3000               87.5
# ℹ 7 more variables: Turbine.Rotor_Diameter <dbl>, Turbine.Swept_Area <dbl>,
#   Turbine.Total_Height <dbl>, Project.Capacity <dbl>,
#   Project.Number_Turbines <dbl>, Site.Latitude <dbl>, Site.Longitude <dbl>

Clean the dataset

The next chunks clean the dataset. I convert all the variable names to lowercase (names with tolower), replace each “.” with “_” (gsub), and rename site_longitude and site_latitude to lon and lat (rename). I then filter to the state CA, the years 1985, 1995, 2005 and 2015, and the counties Kern and Riverside. Finally, I remove the NA values from the variables I focus on, rename CA to California and drop the variables I will not use.

names(windturbinesdataset) <- tolower(names(windturbinesdataset))
head(windturbinesdataset)
# A tibble: 6 × 12
  site.state site.county    year turbine.capacity turbine.hub_height
  <chr>      <chr>         <dbl>            <dbl>              <dbl>
1 IA         Story County   2017             3000               87.5
2 IA         Hardin County  2017             3000               87.5
3 IA         Story County   2017             3000               87.5
4 IA         Story County   2017             3000               87.5
5 IA         Story County   2017             3000               87.5
6 IA         Story County   2017             3000               87.5
# ℹ 7 more variables: turbine.rotor_diameter <dbl>, turbine.swept_area <dbl>,
#   turbine.total_height <dbl>, project.capacity <dbl>,
#   project.number_turbines <dbl>, site.latitude <dbl>, site.longitude <dbl>
summary(windturbinesdataset$year)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1983    2008    2012    2012    2017    2021 
names(windturbinesdataset) <- gsub("\\.","_",names(windturbinesdataset))
wind <- windturbinesdataset |>
  # longitude becomes lon and latitude becomes lat
  rename(lon = site_longitude, lat = site_latitude)
head(wind)
# A tibble: 6 × 12
  site_state site_county    year turbine_capacity turbine_hub_height
  <chr>      <chr>         <dbl>            <dbl>              <dbl>
1 IA         Story County   2017             3000               87.5
2 IA         Hardin County  2017             3000               87.5
3 IA         Story County   2017             3000               87.5
4 IA         Story County   2017             3000               87.5
5 IA         Story County   2017             3000               87.5
6 IA         Story County   2017             3000               87.5
# ℹ 7 more variables: turbine_rotor_diameter <dbl>, turbine_swept_area <dbl>,
#   turbine_total_height <dbl>, project_capacity <dbl>,
#   project_number_turbines <dbl>, lon <dbl>, lat <dbl>
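
The renaming steps above can be tried on a small example first. This is a minimal sketch in base R, assuming a toy data frame with made-up values (the `toy` object is hypothetical and not part of the USGS data). The dot is escaped in `gsub` because an unescaped "." in a regular expression matches any character.

```r
# Toy data frame with hypothetical values; only the column names matter here
toy <- data.frame(
  Site.State = "CA",
  Turbine.Hub_Height = 40,
  Site.Latitude = 35.1,
  Site.Longitude = -118.3
)

names(toy) <- tolower(names(toy))           # "Site.State" becomes "site.state"
names(toy) <- gsub("\\.", "_", names(toy))  # "site.state" becomes "site_state"

# Base R equivalent of rename(): latitude to lat, longitude to lon
names(toy)[names(toy) == "site_latitude"] <- "lat"
names(toy)[names(toy) == "site_longitude"] <- "lon"

print(names(toy))
# "site_state" "turbine_hub_height" "lat" "lon"
```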

Filtering

windturbinesCA <- wind |>
  filter(site_state == "CA") |>
  # year is numeric, so compare against numbers rather than strings
  filter(year %in% c(1985, 1995, 2005, 2015)) |>
  filter(site_county %in% c("Kern County", "Riverside County")) |>
  # is.na() needs bare column names; a quoted name is a string and is never NA
  filter(!is.na(site_county), !is.na(turbine_capacity), !is.na(turbine_hub_height),
         !is.na(turbine_rotor_diameter), !is.na(turbine_swept_area),
         !is.na(lat), !is.na(lon))
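
One detail of NA filtering is easy to miss: inside `filter`, `is.na` has to receive the bare column name. A quoted name is just a character string, which is never NA, so the condition is always TRUE and no rows are removed. The sketch below shows the difference on a hypothetical three-row data frame (the values are made up for illustration).

```r
library(dplyr)

# Hypothetical rows; two of the three hub heights are missing
toy <- data.frame(
  site_county = c("Kern County", "Kern County", "Riverside County"),
  turbine_hub_height = c(80, NA, NA)
)

# Quoted name: tests the string itself, so every row is kept
kept_quoted <- toy |> filter(!is.na("turbine_hub_height"))

# Bare name: tests the column values, so the NA rows are dropped
kept_bare <- toy |> filter(!is.na(turbine_hub_height))

nrow(kept_quoted)  # 3
nrow(kept_bare)    # 1
```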

Renaming CA to California

windturbinesCA$site_state[windturbinesCA$site_state== "CA"] <- "California"

Removing variables that will not be used

clean_CA <- windturbinesCA |>
  select(-turbine_total_height,-project_capacity,-project_number_turbines)
head(clean_CA)
# A tibble: 6 × 9
  site_state site_county       year turbine_capacity turbine_hub_height
  <chr>      <chr>            <dbl>            <dbl>              <dbl>
1 California Riverside County  1995              225                 40
2 California Riverside County  1995              225                 40
3 California Riverside County  1995              225                 40
4 California Riverside County  1995              225                 40
5 California Riverside County  1995              225                 40
6 California Riverside County  1995              225                 40
# ℹ 4 more variables: turbine_rotor_diameter <dbl>, turbine_swept_area <dbl>,
#   lon <dbl>, lat <dbl>

Plot 1:

My first plot compares turbine hub height and turbine rotor diameter between Riverside County and Kern County in California for the years 1985, 1995, 2005, and 2015. Hub height is on the y-axis, and rotor diameter sets the size of each point.

p1 <- ggplot(clean_CA, aes(x = year, y = turbine_hub_height, color = site_county)) +
  # one jittered layer; adding geom_point() as well would draw every turbine twice
  geom_jitter(aes(size = turbine_rotor_diameter), alpha = 0.7) +
  scale_color_brewer(palette = "Set1") +
  labs(title = "Turbine Hub Height in Kern County and Riverside County in California",
       caption = "USGS",
       y = "Turbine Hub Height",
       x = "Year",
       color = "County",
       size = "Turbine Rotor Diameter") +
  theme_bw(base_family = "serif")
p1

I then made this graph interactive using ggplotly from plotly.

ggplotly(p1)

Correlation

I wanted to see how turbine rotor diameter relates to turbine hub height. The correlation is strong, with a coefficient of 0.991. In the linear model, the slope for rotor diameter has a p-value below 2e-16 (marked with three asterisks), so it is statistically significant, and the adjusted R-squared of 0.98 means that the model explains about 98% of the variation in hub height.

cor(clean_CA$turbine_rotor_diameter, clean_CA$turbine_hub_height)
[1] 0.9910903
fit1 <- lm(turbine_hub_height ~ turbine_rotor_diameter, data = clean_CA)
summary(fit1)

Call:
lm(formula = turbine_hub_height ~ turbine_rotor_diameter, data = clean_CA)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.2212  0.5027  0.5027  1.7666  1.7666 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            23.855462   0.532455   44.80   <2e-16 ***
turbine_rotor_diameter  0.532516   0.007502   70.98   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.743 on 91 degrees of freedom
Multiple R-squared:  0.9823,    Adjusted R-squared:  0.9821 
F-statistic:  5039 on 1 and 91 DF,  p-value: < 2.2e-16
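
The correlation and the R-squared above are two views of the same relationship: in a regression with a single predictor, the multiple R-squared equals the square of the correlation coefficient. The sketch below checks this on hypothetical toy values (not the USGS data) and then on the numbers reported in the output.

```r
# Toy values, hypothetical and unrelated to the USGS data
rotor_diameter <- c(27, 47, 77, 82, 100)
hub_height <- c(40, 50, 65, 67, 80)

r <- cor(rotor_diameter, hub_height)
fit <- lm(hub_height ~ rotor_diameter)

# In simple linear regression, multiple R-squared equals r squared
same <- isTRUE(all.equal(r^2, summary(fit)$r.squared))
print(same)

# Check against the reported values: 0.9910903 squared rounds to 0.9823
reported <- round(0.9910903^2, 4)
print(reported)
```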

Map visualization using leaflet

In this section, I have created a map visualization that includes a popup tooltip for points in my map.

popupturbines <- paste0(
  "<b>Turbine Height: </b>", clean_CA$turbine_hub_height, "<br>",
  "<b>Turbine Rotor Diameter: </b>", clean_CA$turbine_rotor_diameter)
leaflet() |>
  setView(lng = -117.417931, lat = 34, zoom = 7) |>
  addProviderTiles("Esri.NatGeoWorldMap") |>
  addCircles(
    lat = ~lat,
    lng = ~lon,
    data = clean_CA,
    color = "#2ca25f",
    fillColor = "#feb24c",
    radius = ~turbine_hub_height * 150,
    fillOpacity = 0.75,
    # popup must be an argument of addCircles() to attach to the circles
    popup = popupturbines)
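
The popup text is built with `paste0`, which is vectorized, so it returns one HTML string per turbine. The sketch below uses two hypothetical rows standing in for clean_CA. Note that the closing bold tag is written `</b>`; in an R string, `\b` is the backspace escape and would not close the tag.

```r
# Hypothetical rows standing in for clean_CA
toy <- data.frame(
  turbine_hub_height = c(40, 84),
  turbine_rotor_diameter = c(27, 100)
)

popup_text <- paste0(
  "<b>Turbine Height: </b>", toy$turbine_hub_height, "<br>",
  "<b>Turbine Rotor Diameter: </b>", toy$turbine_rotor_diameter
)

length(popup_text)  # one string per row: 2
popup_text[1]
# "<b>Turbine Height: </b>40<br><b>Turbine Rotor Diameter: </b>27"
```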

Adding a legend focusing on turbine hub height

# a single colour would make every level look the same, so use a sequential palette
pal <- colorFactor(palette = "Purples",
            domain = clean_CA$turbine_hub_height, levels = c("40", "65", "84"))
map2 <- leaflet(clean_CA) |> 
  setView(lng = -117.417931, lat = 34, zoom = 7) |> 
  addProviderTiles("Esri.NatGeoWorldMap") |> 
  addCircles(
    fillColor = ~pal(turbine_hub_height), 
    radius = ~turbine_hub_height * 150, 
    fillOpacity = 0.75, 
    color = ~pal(turbine_hub_height), 
    popup = ~popupturbines 
  ) |> 
  addLegend(
    position = "topright", pal=pal, values = clean_CA$turbine_hub_height, 
    title = "Turbine Hub Height", opacity = 1)
Assuming "lon" and "lat" are longitude and latitude, respectively
map2

Concluding Essay

While researching wind turbines, I learned that the taller the hub, the better a turbine can reach the stronger and more consistent winds found at higher altitudes, which increases energy production. As for the turbine rotor diameter, I learned that a larger rotor sweeps a larger area and can therefore capture more wind energy.
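
The link between rotor diameter and swept area is simple geometry: the rotor sweeps a circle, so the area is pi times the radius squared. The sketch below uses two hypothetical diameters; `swept_area` is a helper written for this illustration, not a function from the dataset or a package.

```r
# Area swept by a rotor of a given diameter (in the diameter's unit, squared)
swept_area <- function(diameter) pi * (diameter / 2)^2

diameters <- c(40, 80)  # hypothetical rotor diameters
areas <- swept_area(diameters)

# Doubling the diameter quadruples the swept area
ratio <- areas[2] / areas[1]
print(ratio)
```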

Both Plot 1 and the map show that Kern County's wind turbines have higher hub heights and larger rotor diameters than Riverside County's, which suggests that Kern County's energy production and efficiency are higher than Riverside County's.

We can also observe that between 1990 and 2000 there are no data for Kern County, and between 2000 and 2010 there are no data for Riverside County. Only between 2010 and 2020 are there data for both counties. Because I selected only four years, these gaps describe the selected years rather than the full record. While doing research, I could not find a historical event that would explain the lack of data in the earlier years, except perhaps the fact that wind energy only started in the 1980s. The data, and the funding to manage them, may not yet have been in place in those early years.

Sources: USGS