Select a column of your data that encodes time (e.g., “date”, “timestamp”, “year”, etc.). Convert this into a Date in R.
I am going to use the year column as my time variable. I think it would be interesting to compare the time series of total energy usage in the United States with its renewable energy usage. I will need to keep only the rows where the country code matches that of the USA.
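The row filter described above can be sketched on a small stand-in table. The column names `country_code` and `ren_energy_cons`, and every value in `toy`, are made up for illustration and are not taken from the real file.

```r
library(dplyr)

# Hypothetical toy rows; names and values are assumptions, not the real data
toy <- tibble(
  country_code    = c("USA", "USA", "CAN", "MEX"),
  year            = c(1990L, 1991L, 1990L, 1990L),
  ren_energy_cons = c(4.1, 4.3, 22.0, 14.5)
)

# Keep only the U.S. rows, then keep the time and response columns
usa <- toy |>
  filter(country_code == "USA") |>
  select(year, ren_energy_cons)

print(usa)
```

Filtering on the code rather than the country name avoids mismatches from spelling variants, assuming the code column is populated for every row.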
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
#time series
library(xts)
## Warning: package 'xts' was built under R version 4.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.3.2
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(tsibble)
## Warning: package 'tsibble' was built under R version 4.3.2
##
## Attaching package: 'tsibble'
##
## The following object is masked from 'package:zoo':
##
## index
##
## The following object is masked from 'package:lubridate':
##
## interval
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
setwd("C:/Users/kaitl/OneDrive/Documents/590_Working")
#update data types of dataframe
energy <- read_delim("./590_FinalData1.csv", delim = ",", col_types = "icciiciiiiiiii")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
energy1 <- energy
energy1[energy1 == '..'] <- NA
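# year is assumed to hold a four-digit year, so build a full date (January 1 of that year) before parsing
energy1$year <- as.Date(paste0(energy1$year, "-01-01"), format = "%Y-%m-%d")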
#print(energy1$year)
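A minimal sketch of turning a bare year into a Date, assuming the column holds four-digit years (the vector `years` below is a made-up stand-in). A number passed straight to `as.Date()` is interpreted as a count of days rather than as a year, so the sketch pastes on a month and day first and gives an explicit format.

```r
# Hypothetical stand-in for the year column, including a missing value
years <- c(1990L, 1991L, NA, 2000L)

# Build "YYYY-01-01" strings and parse them with an explicit format;
# entries that cannot be parsed (such as "NA-01-01") become NA
year_date <- as.Date(paste0(years, "-01-01"), format = "%Y-%m-%d")

print(year_date)
print(class(year_date))
```

Anchoring every year to January 1 is a convention, not something the data dictates; any fixed month and day would serve equally well for annual data.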
Choose a column of data to analyze over time. This should be a “response-like” variable that is of particular interest.
Renewable Energy Usage
Create a tsibble object of just the date and response variable. Then, plot your data over time. Consider different windows of time. What stands out?
# # create a tsibble of renewable energy consumption for the United States
# energy_ <- energy1 |>
# filter(country_name == "United States") |>
# select(year, ren_energy_cons) |>
# distinct()
#
# energy_ts <- as_tsibble(energy_, index=year) |>
# index_by(date = date(year))
#
# energy_ts
# #plot data over time
# # an "xts" object separate from the original
# energy_xts <- xts(x = energy_ts$ren_energy_cons,
# order.by = energy_ts$date)
#
# energy_xts <- setNames(energy_xts, "ren. eng. use")
#
# #loess method
# energy_ts |>
# filter_index("1975" ~ "2000") |>
# drop_na() |>
# ggplot(mapping = aes(x = year, y = ren_energy_cons)) +
# geom_point(size=1, shape='O') +
# geom_smooth(span=0.2, color = 'blue', se=FALSE) + #adjust span to see changes in plot 0.1, 0.2, 0.3..
# labs(title = "U.S. Renewable Energy Consumption, 1975-2000") +
# theme_hc()
I do notice an oscillation in the data, but while it oscillates it also trends upward.
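The tsibble-and-plot step can be sketched end to end on simulated data. Everything in `toy` is synthetic (an upward trend plus a 10-year cycle plus noise, chosen only to mimic the pattern described above), and using the integer year directly as the tsibble index is an assumption that suits annual data.

```r
library(dplyr)
library(tsibble)
library(ggplot2)

# Synthetic stand-in for the U.S. series; not real energy data
set.seed(1)
yrs <- 1971:2010
toy <- tibble(
  year = yrs,
  ren_energy_cons = 5 + 0.05 * (yrs - 1971) + sin(2 * pi * yrs / 10) +
    rnorm(length(yrs), sd = 0.2)
)

# For annual data an integer year can serve as the tsibble index
toy_ts <- as_tsibble(toy, index = year)

# Restrict to one window of time; change the bounds to try other windows
toy_window <- toy_ts |>
  filter(year >= 1975, year <= 2000)

# Points plus a loess smoother; a smaller span follows the wiggles more closely
p <- ggplot(toy_window, aes(x = year, y = ren_energy_cons)) +
  geom_point(size = 1) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3, se = FALSE) +
  labs(title = "Simulated series, 1975-2000") +
  theme_minimal()

print(p)
```

Rerunning with different filter bounds or a different `span` is a quick way to check whether an apparent cycle persists across windows or only shows up in one.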
# {r}
# model <- lm(ren_energy_cons ~ year, data = energy1, na.action = na.omit)
#
# rsquared <- summary(model)$r.squared
#
# # year goes on the x-axis so the plot matches the model and the title
# energy1 |>
#   ggplot(mapping = aes(x = year,
#                        y = ren_energy_cons)) +
#   geom_point() +
#   geom_smooth(method = 'lm', color = 'gray', linetype = 'dashed',
#               se = FALSE) +
#   geom_smooth(se = FALSE) +
#   labs(title = "Renewable Energy Consumption vs. Year",
#        subtitle = paste("Linear Fit R-Squared =", round(rsquared, 3))) +
#   theme_classic()
# energy_ts |>
# mutate(years = lag(year, 7)) |>
# drop_na()
# acf(energy_ts, ci = 0.95, na.action = na.exclude)
The ACF appears to show a repeating pattern with a period of about 10 lags, which for annual data would correspond to a cycle of roughly 10 years.
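As a rough check on how a period shows up in an autocorrelation function, the sketch below runs base R's `acf()` on a synthetic series with a known 10-step cycle. The series `x` is made up, and passing a plain numeric vector (rather than a whole tsibble) is an assumption about what `acf()` needs here.

```r
# Synthetic series with an exact period of 10; not real energy data
t <- 1:60
x <- sin(2 * pi * t / 10)

# Compute the ACF without plotting; na.pass lets acf() tolerate missing values
res <- acf(x, lag.max = 20, plot = FALSE, na.action = na.pass)

lags <- as.numeric(res$lag)
vals <- as.numeric(res$acf)

# Look for the strongest positive autocorrelation away from lag 0
candidate <- lags >= 5 & lags <= 15
peak_lag <- lags[candidate][which.max(vals[candidate])]

# Approximate 95% bounds under the white-noise assumption
bound <- 1.96 / sqrt(length(x))

print(peak_lag)
print(bound)
```

On real data, a peak that clears the bounds at some lag and again near its multiples is stronger evidence of a cycle than a single isolated spike, and a trend left in the series can inflate the autocorrelations at every lag.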