Add message = FALSE and warning = FALSE to your chunk options so that extra output is not included in your final document. This keeps the messages that come from loading packages out of the knitted file.
In your setup chunk, you can also add error = TRUE so that the document will knit even if some of the code doesn’t work.
Make sure to include:
Don’t forget to leave a space between the # and the words for your headers!
Hyperlinks: See the example in the markdown file for the next sentence. Put square brackets [] around the words you want to be the link and parentheses () around the URL.
The ERCOT power data is publicly available from ERCOT’s website and the EIA’s energy tracking website. Power output, in megawatts, is recorded hourly for multiple categories of energy sources.
Policy Context: Texas Governor Greg Abbott claimed that solar and wind were to blame for the power outages and that fossil fuels were the only reliable way to power electric grids. Other conservative leaders and influential commentators made similar claims across various media outlets, including Fox News host Tucker Carlson:
“Unbeknownst to most people, the Green New Deal came to Texas, the power grid in the state became totally reliant on windmills,” Carlson said Feb. 16. “Then it got cold, and the windmills broke, because that’s what happens in the Green New Deal.”
Carlson also warned that “the same energy policies that have wrecked Texas this week are going nationwide.”
Communication Plan:
Show change over time for each energy source. A double y-axis could add temperature change over time, but it must be used with caution: graphs with two y-axes have been used to misrepresent data because the two axes do not share a common scale. Show that natural gas output decreased by the largest amount (both in raw megawatts and as a percentage) and that “green energy sources” were relatively reliable during the freezing weather. Calculate the percent of total energy that comes from each source, similar to HW 1.
Reminder: Leave a space between your headings’ ## and your words so that it formats correctly!
Data source: U.S. Energy Information Administration (EIA), Hourly Electric Grid Monitor (government website)
- Hourly measurements of power output for different energy sources.
Originally, I downloaded the data as a large CSV file and kept only the data I needed. Later, I figured out how to use the EIA API to make the data collection reproducible, so anyone can download the data once they have their own EIA API key (see the Methodology section). The dataset covers January 1 to August 13, 2021, the day the data was originally downloaded through the API.
Set echo = FALSE if you do not want the code to appear in the document alongside the output.
The raw Texas power data originally contained 5,382 rows, where each row represented an hourly update on power output. Dates begin on January 1 and end on August 13 (when the data was originally downloaded). The columns were a time stamp plus one column per energy category (solar, wind, coal, etc.). The original structure was wide, with each energy source as its own variable. I wanted to transform the data into a longer format that only contains data from February 1 to February 28, with all the energy sources combined into one variable named “source”. To do this, I filtered for a range of dates and used pivot_longer().
Texas_wide also does not separate dates and times; both are stored in one variable named date. For summarizing and graphing purposes, I may want date and time as separate variables, so I rename the original variable to datetime since it contains both.
Texas_feb <- Texas_wide %>%
arrange(date) %>%
filter(date >= as.Date("2021-02-01") & date <= as.Date("2021-02-28") ) %>%
rename(datetime = date) %>%
mutate(date = as_date(datetime), # a new column appeared! stored as date
time = hms::as_hms(datetime) ) # stored as S3:hms (format for time)
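The date/time split above relies on lubridate::as_date() and hms::as_hms(). As a package-free illustration of the same idea, here is a minimal base R sketch; the two timestamps are hypothetical and the UTC time zone is an assumption made only to keep the example deterministic.

```r
# Hypothetical hourly timestamps (UTC), for illustration only
datetime <- as.POSIXct(c("2021-02-15 00:00:00", "2021-02-15 04:00:00"),
                       tz = "UTC")

# Calendar date (class Date), comparable to lubridate::as_date()
date <- as.Date(datetime, tz = "UTC")

# Time of day as "HH:MM:SS" text; hms::as_hms() would store it as seconds instead
time_of_day <- format(datetime, "%H:%M:%S")

print(date)
print(time_of_day)
```

Formatting the time as text is fine for tables; for plotting hours on an axis, the numeric hms class used in the document is the better choice.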
Texas_long <- Texas_wide %>%
arrange(date) %>%
filter(date >= as.Date("2021-02-01") & date <= as.Date("2021-02-28") ) %>%
pivot_longer(!date,
names_to = "source",
values_to = "power") %>%
rename(datetime = date) %>%
mutate(date = as_date(datetime), # a new column appeared! stored as date
time = hms::as_hms(datetime) ) # stored as S3:hms (format for time)
str(Texas_long)
## tibble [3,894 x 5] (S3: tbl_df/tbl/data.frame)
## $ datetime: POSIXct[1:3894], format: "2021-02-01 00:00:00" "2021-02-01 00:00:00" ...
## $ source : chr [1:3894] "Natural Gas" "Wind" "Coal" "Solar" ...
## $ power : num [1:3894] 13969 8957 7845 1697 50 ...
## $ date : Date[1:3894], format: "2021-02-01" "2021-02-01" ...
## $ time : 'hms' num [1:3894] 00:00:00 00:00:00 00:00:00 00:00:00 ...
## ..- attr(*, "units")= chr "secs"
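My new data frame containing information only for February, Texas_long, has 3,894 rows and 5 columns. Each row represents the hourly output from one energy source, so each hourly update spans six rows (one per source). Because the filter compares against midnight on February 28, only the 00:00 reading from that day is kept: 27 full days × 24 hours + 1 = 649 timestamps, and 649 × 6 = 3,894 rows.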
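To make the reshaping step concrete, here is a base R sketch of what pivot_longer(!date, names_to = "source", values_to = "power") does, using a hypothetical two-hour, two-source table. The key check is that the long table has (number of hours) × (number of sources) rows, which is the same arithmetic as 649 × 6 = 3,894 for Texas_long.

```r
# Hypothetical wide table: one row per hour, one column per source
wide <- data.frame(
  datetime = as.POSIXct(c("2021-02-01 00:00:00", "2021-02-01 01:00:00"), tz = "UTC"),
  Wind = c(100, 110),
  Coal = c(200, 190)
)
sources <- c("Wind", "Coal")

long <- data.frame(
  datetime = rep(wide$datetime, times = length(sources)),  # every hour, once per source
  source   = rep(sources, each = nrow(wide)),              # one block of rows per source
  power    = unlist(wide[sources], use.names = FALSE)      # source columns stacked in order
)

# Row-count sanity check: hours x sources
stopifnot(nrow(long) == nrow(wide) * length(sources))
print(long)
```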
load("C:/Users/aleaw/OneDrive/Desktop/cuppackage/data/texastemperature.rda")
as_tibble(texastemperature)
# date stored as character, temp as double
# 57 rows, 2 columns
temp <- as_tibble(texastemperature) %>%
mutate(date = mdy(date)) %>% # date now stored as date instead of character
filter(date >= as.Date("2021-02-01") & date <= as.Date("2021-02-28") ) # filter Feb.
temp
temp %>%
ggplot() +
geom_line(aes(date, temp_daily_avg)) +
theme_bw()
Describe the data: How many observations? How many variables are there? What kind of variables are they (categorical, continuous, or ordinal)? How are they stored when you read your data into R? Was your data tidy? Do you want it in long or wide format (or both)?
What are the main variables you are using for your example? Provide appropriate descriptive statistics given the variable type. (Range, median, mean, distribution shape, count, etc.).
Do not just run a command and include the output. Interpret/summarize the key statistics in a sentence or two! Communicate to the reader!
summary(Texas_feb)
## datetime Natural Gas Wind Coal
## Min. :2021-02-01 00:00:00 Min. : 4758 Min. : 649 Min. : 3873
## 1st Qu.:2021-02-07 18:00:00 1st Qu.:11625 1st Qu.: 4409 1st Qu.: 6964
## Median :2021-02-14 12:00:00 Median :16479 Median : 7416 Median : 8855
## Mean :2021-02-14 12:00:00 Mean :20087 Mean : 8608 Mean : 8446
## 3rd Qu.:2021-02-21 06:00:00 3rd Qu.:29812 3rd Qu.:12358 3rd Qu.:10477
## Max. :2021-02-28 00:00:00 Max. :43967 Max. :22415 Max. :11693
## Solar Water Nuclear date
## Min. : 0.0 Min. : 43.00 Min. :3780 Min. :2021-02-01
## 1st Qu.: 0.0 1st Qu.: 48.00 1st Qu.:5100 1st Qu.:2021-02-07
## Median : 2.0 Median : 72.00 Median :5115 Median :2021-02-14
## Mean : 981.1 Mean : 79.76 Mean :4958 Mean :2021-02-14
## 3rd Qu.:1680.0 3rd Qu.: 97.00 3rd Qu.:5136 3rd Qu.:2021-02-21
## Max. :4957.0 Max. :343.00 Max. :5149 Max. :2021-02-28
## time
## Length:649
## Class1:hms
## Class2:difftime
## Mode :numeric
##
##
Texas_feb %>%
select(`Natural Gas`:Nuclear) %>%
psych::describe(fast = TRUE) # fast = TRUE returns a reduced set of statistics (no median, skew, or kurtosis)
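The prompt above asks for range, median, mean, and distribution shape. A minimal base R sketch of a per-source statistics table is below; the data frame is hypothetical. Comparing the mean with the median is a quick check on skew: in the summary output above, natural gas has a mean of 20,087 MW and a median of 16,479 MW, which points to a right-skewed distribution.

```r
# Hypothetical hourly output (MW) for two sources, for illustration only
feb <- data.frame(Wind = c(10, 20, 30, 40),
                  Coal = c(5, 5, 6, 8))

# One column per source, one row per statistic
desc_stats <- sapply(feb, function(x) c(n      = length(x),
                                        mean   = mean(x),
                                        median = median(x),
                                        sd     = sd(x),
                                        min    = min(x),
                                        max    = max(x)))
round(desc_stats, 2)
```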
ggplot(Texas_long) +
geom_line(aes(datetime, power, color= source)) +
theme_classic()
The exploratory graph above shows the hourly power output (in megawatts) from each energy source from February 1 to February 28.
Texas_long %>%
group_by(source) %>%
summarize(feb_sum = sum(power)) %>%
ggplot(aes(source, feb_sum)) +
geom_col() + theme_minimal()
ggplot(Texas_feb, aes(Solar)) +
geom_histogram()
Texas_feb %>%
ggplot(aes(`Natural Gas`)) +
geom_histogram()
ggplot(Texas_feb, aes(Wind)) +
geom_density()
ggplot(Texas_feb, aes(Nuclear)) +
geom_density() # bimodal: nuclear plants usually run at near-constant output, so the lower mode likely reflects hours when a unit was offline
ggplot(Texas_feb, aes(Coal)) +
geom_density()
The table below shows each source’s total output for February, in megawatts summed over all hours (which gives megawatthours), and as a percentage of total output from all energy sources:
Texas_long %>%
group_by(source) %>%
summarize(Megawatts = sum(power)) %>%
mutate(Percent = round(100 * prop.table(Megawatts), digits = 1)) %>% # share of the February total, in %
knitr::kable() %>% # kable_classic() needs a kable object, not a raw tibble
kable_classic(full_width = FALSE) # from kableExtra
Not complete.
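The share calculation behind the table can be sketched in base R as follows, on a hypothetical four-row long table. Note that prop.table() returns proportions that sum to 1, so the result has to be multiplied by 100 before it can be labeled a percent.

```r
# Hypothetical long-format data (MW per hour), for illustration only
long <- data.frame(source = c("Wind", "Coal", "Wind", "Coal"),
                   power  = c(100, 200, 300, 400))

# Total per source; summing hourly MW readings gives MWh
totals <- tapply(long$power, long$source, sum)

# Percentage of the overall total, rounded to one decimal place
share <- round(100 * totals / sum(totals), 1)
print(share)
```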
To calculate the difference in energy output on February
HourlyChange <- Texas_long %>%
group_by(source) %>%
arrange(datetime, .by_group = TRUE) %>%
mutate(pct_change = (power/lag(power) - 1) * 100) %>% # % change from the previous hour; Inf/NaN for Solar when the previous hour was 0
ungroup()
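The HourlyChange calculation divides by the previous hour’s output, so Solar produces Inf or NaN overnight, when the previous value is 0. Below is a base R sketch, on hypothetical data, that returns NA in that case instead, and applies the function within each source using ave(); as in the dplyr version, rows must already be sorted by time within each source.

```r
# Percent change from the previous value; NA when the previous value is 0 or missing
pct_change <- function(x) {
  prev <- c(NA, head(x, -1))            # previous hour's value (NA for the first hour)
  out <- (x / prev - 1) * 100
  out[!is.na(prev) & prev == 0] <- NA   # undefined when the previous hour was 0
  out
}

# Hypothetical hourly output, already sorted by time within each source
hourly <- data.frame(source = rep(c("Solar", "Natural Gas"), each = 3),
                     power  = c(0, 5, 10, 100, 77, 80))

# Apply the function separately within each source
hourly$pct_change <- ave(hourly$power, hourly$source, FUN = pct_change)
print(hourly)
```

For example, a drop from 100 to 77 shows up as a change of about -23%, the same kind of figure quoted for natural gas on February 15.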
Texas_long %>%
filter(date == as.Date("2021-02-15")) %>%
group_by(source) %>%
summarize(Megawatts = sum(power)) %>%
mutate(Percent = round(100 * prop.table(Megawatts), digits = 1))
Texas_long %>%
filter(date == as.Date("2021-02-25")) %>%
group_by(source) %>%
summarize(Megawatts = sum(power)) %>%
mutate(Percent = round(100 * prop.table(Megawatts), digits = 1))
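The communication plan calls for showing which source dropped the most, both in raw numbers and as a percentage. Given two daily totals like the ones above, one way to compute both is sketched below in base R; the daily totals are hypothetical, and which day counts as the baseline is a choice the analysis would need to justify.

```r
# Hypothetical daily totals (MWh), for illustration only
day_a <- c("Natural Gas" = 400, Wind = 100, Coal = 200)  # baseline day
day_b <- c(Coal = 150, "Natural Gas" = 200, Wind = 90)   # comparison day

day_b <- day_b[names(day_a)]         # align sources by name before subtracting
raw_diff <- day_b - day_a            # change in MWh
pct_diff <- 100 * raw_diff / day_a   # change as a percentage of the baseline day

comparison <- data.frame(source   = names(day_a),
                         raw_diff = unname(raw_diff),
                         pct_diff = unname(pct_diff))
comparison[order(comparison$raw_diff), ]  # largest drop first
```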
To calculate a simple moving average, we can use the rollmean() function from the zoo package. Its argument k is an integer giving the width of the rolling window; because the data are hourly, k is measured in hours.
The code below calculates 3-, 7-, and 12-hour rolling averages of coal power output.
# install.packages("zoo")
library(zoo)
coal <- Texas_long %>%
filter(source == "Coal") %>% # keep only coal observations
arrange(datetime) %>% # start with Feb 1st
mutate(source_03hr = rollmean(power, k = 3, fill = NA), # centered 3-hour average
source_07hr = rollmean(power, k = 7, fill = NA), # centered 7-hour average
source_12hr = rollmean(power, k = 12, fill = NA)) # centered 12-hour average; NA where the window is incomplete
coal
This adds three new columns with rolling averages of coal power output; only the 12-hour average is used in the graph below.
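For readers without zoo, a centered moving average can be sketched in base R with stats::filter(). The name stats::filter is written out in full because dplyr::filter masks it once the tidyverse is loaded. For odd k this should line up with rollmean(x, k, fill = NA); for even windows such as k = 12, check the alignment before comparing the two. Padding the window edges with NA (rather than 0) keeps the ends of a plotted line from dropping to zero. The data below are hypothetical.

```r
# Centered k-point moving average; NA where the window runs past either end
rolling_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

power <- c(10, 20, 30, 40, 50)   # hypothetical hourly output (MW)
rolling_mean(power, 3)

# Same idea applied within each source (rows assumed sorted by time)
hourly <- data.frame(source = rep(c("Coal", "Wind"), each = 3),
                     power  = c(10, 20, 30, 5, 10, 15))
hourly$avg_3hr <- ave(hourly$power, hourly$source,
                      FUN = function(x) rolling_mean(x, 3))
print(hourly)
```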
Doing it again for natural gas and storing it as its own object named “naturalgas”:
naturalgas <- Texas_long %>%
filter(source == "Natural Gas") %>%
arrange(datetime) %>% # same chronological order as coal
mutate(source_03hr = rollmean(power, k = 3, fill = NA),
source_07hr = rollmean(power, k = 7, fill = NA),
source_12hr = rollmean(power, k = 12, fill = NA))
naturalgas
Now to graph the hourly megawatts from coal as columns and the 12-hour rolling average as a line in one graph:
coal %>%
ggplot(aes(x = datetime,
y = power)) +
geom_col(fill = "lightgray") +
geom_line(aes(y = source_12hr),
color = "red") + # a fixed color goes outside aes(); inside aes() it becomes a legend label
theme_minimal() +
labs(title = "Power output from Coal",
x = "",
y = "Megawatts")
mov.avg <- Texas_long %>%
group_by(source) %>%
arrange(datetime, .by_group = TRUE) %>%
mutate( # mutate() keeps one row per hour; multi-row summarise() is deprecated since dplyr 1.1.0
source_03hr = rollmean(power, k = 3, fill = NA),
source_07hr = rollmean(power, k = 7, fill = NA),
source_12hr = rollmean(power, k = 12, fill = NA)) %>%
ungroup()
mov.avg
The “thermal unit” category includes natural gas, coal, and nuclear power. Below, the sources are recoded into thermal units and green energy sources (wind, water, solar), just for fun.
Texas_long %>%
mutate(
Energy = case_when(
source %in% c("Wind", "Water", "Solar") ~ "Green",
source %in% c("Coal", "Nuclear", "Natural Gas") ~ "ThermalUnit")) %>%
group_by(Energy) %>%
summarize(Megawatts = sum(power))
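The same recoding can be sketched in base R with %in%, followed by each group’s share of the total. Any source name that appears in neither list becomes NA, which is a cheap way to catch a typo in the category lists. The power values are hypothetical.

```r
thermal <- c("Natural Gas", "Coal", "Nuclear")
green   <- c("Wind", "Water", "Solar")

# Hypothetical totals (MWh) per source, for illustration only
long <- data.frame(
  source = c("Natural Gas", "Wind", "Coal", "Solar", "Water", "Nuclear"),
  power  = c(400, 150, 200, 50, 10, 100)
)

long$Energy <- ifelse(long$source %in% thermal, "ThermalUnit",
                      ifelse(long$source %in% green, "Green", NA))

totals <- tapply(long$power, long$Energy, sum)   # MWh per category
round(100 * totals / sum(totals), 1)             # % of total output
```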
cor(Texas_feb$Wind, Texas_feb$`Natural Gas`) # 2 variables
## [1] -0.7450777
# correlation matrix
Texas_feb %>%
select(`Natural Gas`:Nuclear) %>%
cor(use = "pairwise.complete.obs")
## Natural Gas Wind Coal Solar Water
## Natural Gas 1.0000000 -0.74507769 0.54098879 -0.179418037 0.473690492
## Wind -0.7450777 1.00000000 -0.53655719 -0.063957320 -0.276937349
## Coal 0.5409888 -0.53655719 1.00000000 -0.038273160 0.155804984
## Solar -0.1794180 -0.06395732 -0.03827316 1.000000000 0.009598617
## Water 0.4736905 -0.27693735 0.15580498 0.009598617 1.000000000
## Nuclear -0.3271192 0.40138951 0.14874660 0.034718663 -0.281910443
## Nuclear
## Natural Gas -0.32711919
## Wind 0.40138951
## Coal 0.14874660
## Solar 0.03471866
## Water -0.28191044
## Nuclear 1.00000000
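The strongest negative pair in the matrix is Wind and Natural Gas (-0.745). The sketch below computes Pearson’s r by hand on hypothetical numbers and checks it against cor(), to show what the coefficient measures: how consistently one variable sits above its mean when the other sits below its own. A correlation alone does not say which source caused the other to change.

```r
# Pearson correlation from its definition
pearson <- function(x, y) {
  dx <- x - mean(x)   # deviations from the mean
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Hypothetical hourly output (MW): as wind falls, gas rises
wind <- c(120, 90, 60, 30)
gas  <- c(20, 35, 45, 70)

pearson(wind, gas)
all.equal(pearson(wind, gas), cor(wind, gas))
```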
Discuss the correlations: which source goes up the most as another goes down? Work that into the discussion of the topic and the assumptions that were made at the time.
In the early morning hours of Feb. 15, natural gas generation dropped 23% by 4 a.m., a total of about 10,000 megawatts on a system that was running about 65,000 megawatts in total at midnight. That morning ERCOT started rolling blackouts.
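Assuming the 23% drop is measured against natural gas output at midnight, the two figures in that sentence imply roughly how large natural gas generation was before the drop. This is a back-of-the-envelope check, not a result computed from the EIA data:

$$
G_{\text{midnight}} \approx \frac{10{,}000\ \text{MW}}{0.23} \approx 43{,}500\ \text{MW},
\qquad
\frac{G_{\text{midnight}}}{65{,}000\ \text{MW}} \approx 0.67
$$

So natural gas would have supplied about two-thirds of total generation at midnight, in the same range as the February maximum of 43,967 MW for natural gas in the summary statistics above.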
So, it’s true that wind plays a significant role in Texas’ power supply (the state generates more wind energy than any other state in the nation), but there’s no indication that wind energy was the primary cause of the power outages in Texas.
To do: add more discussion that relates the graphs and tables to the argument.
U.S. Energy Information Administration (EIA). “Hourly Electric Grid Monitor.”
FactCheck.org. “Wind Turbines Didn’t Cause Texas Energy Crisis.”
PolitiFact. “How Fox News, far-right TV blamed green energy for Texas’ power outages.”
#' Texas's Energy Output
#'
#' Hourly energy output (megawatthours) from each source for Texas, January 1 to August 13, 2021, including the February winter storm.
#' Data are from the U.S. Energy Information Administration.
#' You will need your own EIA API key to download the data through their API.
#' Get an API key here: https://www.eia.gov/developer/
#' Check out the Hourly Electric Grid Monitor here: https://www.eia.gov/electricity/gridmonitor/dashboard/electric_overview/US48/US48
#' @format A data frame with 5382 rows and 7 variables:
#' \describe{
#' \item{date}{Date and time of measurement}
#' \item{Natural Gas}{Supply of energy in megawatthours from Natural Gas}
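#' \item{Wind}{Supply of energy in megawatthours from Wind}
#' \item{Coal}{Supply of energy in megawatthours from Coal}
#' \item{Solar}{Supply of energy in megawatthours from Solar}
#' \item{Water}{Supply of energy in megawatthours from Water}
#' \item{Nuclear}{Supply of energy in megawatthours from Nuclear}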
#' }
#' @source \url{https://www.eia.gov/}
library("eia") # works with API for downloading data from the EIA
## reproducible way
# Use your own EIA API key (never commit a real key); here it is read from the EIA_KEY environment variable
eia_set_key(Sys.getenv("EIA_KEY"))
# Series IDs: hourly net generation for Texas by energy source
variables <- c("EBA.TEX-ALL.NG.NG.H",
"EBA.TEX-ALL.NG.WND.H",
"EBA.TEX-ALL.NG.COL.H",
"EBA.TEX-ALL.NG.SUN.H",
"EBA.TEX-ALL.NG.WAT.H",
"EBA.TEX-ALL.NG.NUC.H")
eia_list <- eia_series(variables, start = 2021) # avoid naming objects `list`, which masks base::list()
# Downloaded on August 13th, the last day of data included.
eia_list$data[[3]] # 3,438 x 5 tibble
storm_wide <- unnest(eia_list, cols = data) %>%
select(date, series_id, value) %>%
pivot_wider(names_from = series_id,
values_from = value)
storm <- storm_wide %>% rename("Natural Gas" = "EBA.TEX-ALL.NG.NG.H",
"Wind" = "EBA.TEX-ALL.NG.WND.H",
"Coal" = "EBA.TEX-ALL.NG.COL.H",
"Solar" = "EBA.TEX-ALL.NG.SUN.H",
"Water" = "EBA.TEX-ALL.NG.WAT.H",
"Nuclear" = "EBA.TEX-ALL.NG.NUC.H")
load("C:/Users/aleaw/OneDrive/Desktop/cuppackage/data/storm.rda")
Texas_wide <- storm
Texas_feb <- Texas_wide %>%
arrange(date) %>%
filter(date >= as.Date("2021-02-01") & date <= as.Date("2021-02-28") ) %>%
rename(datetime = date) %>%
mutate(date = as_date(datetime), # a new column appeared! stored as date
time = hms::as_hms(datetime) ) # stored as S3:hms (format for time)
Texas_long <- Texas_wide %>%
arrange(date) %>%
filter(date >= as.Date("2021-02-01") & date <= as.Date("2021-02-28") ) %>%
pivot_longer(!date,
names_to = "source",
values_to = "power") %>%
rename(datetime = date) %>%
mutate(date = as_date(datetime), # a new column appeared! stored as date
time = hms::as_hms(datetime) ) # stored as S3:hms (format for time)
str(Texas_long)
load("C:/Users/aleaw/OneDrive/Desktop/cuppackage/data/texastemperature.rda")
as_tibble(texastemperature)
# date stored as character, temp as double
# 57 rows, 2 columns
temp <- as_tibble(texastemperature) %>%
mutate(date = mdy(date)) %>% # date now stored as date instead of character
filter(date >= as.Date("2021-02-01") & date <= as.Date("2021-02-28") ) # filter Feb.
temp
temp %>%
ggplot() +
geom_line(aes(date, temp_daily_avg))