This post uses a dataset of aggregated electricity load of consumers from an anonymous area. The time series spans 17 weeks.
First, let’s load all of the packages needed for data analysis, modeling, and visualization.
library(feather) # data import
library(data.table) # data handling
library(rpart) # decision tree method
library(party) # decision tree method
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
library(forecast) # forecasting methods
library(randomForest) # ensemble learning method
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
library(ggplot2) # visualizations
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
##
## margin
Now read the time series data with read_feather into a data.table. The dataset is available in my GitHub repo; the file is named DT_load_17weeks.
DT <- as.data.table(read_feather("DT_load_17weeks"))
Then store the dates and the period of the time series, which is 48 (the number of measurements per day):
n_date <- unique(DT[, date])
period <- 48
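A quick sanity check on the period can be useful at this point. The following is only a sketch and not part of the original analysis: it builds a small synthetic table (DT_demo and its dates are hypothetical, the values are random) with the same date and value columns, and verifies that every day contains exactly 48 observations.

```r
library(data.table)

# Synthetic stand-in for DT: 3 days with 48 observations each
# (assumption for illustration only, not the real dataset)
period <- 48
dates <- as.Date("2016-01-01") + 0:2  # hypothetical dates
DT_demo <- data.table(date = rep(dates, each = period),
                      value = runif(length(dates) * period))

# Count observations per day; every count should equal the period
obs_per_day <- DT_demo[, .N, by = date]
all_complete <- all(obs_per_day$N == period)
print(obs_per_day)
print(all_complete)
```

Running the same count on the real DT would reveal days with missing or duplicated measurements before any model is trained.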
For the visualizations, store my favorite ggplot theme settings with the function theme.
theme_ts <- theme(panel.border = element_rect(fill = NA,
                                              colour = "grey10"),
                  panel.background = element_blank(),
                  panel.grid.minor = element_line(colour = "grey85"),
                  panel.grid.major = element_line(colour = "grey85"),
                  panel.grid.major.x = element_line(colour = "grey85"),
                  axis.text = element_text(size = 13, face = "bold"),
                  axis.title = element_text(size = 15, face = "bold"),
                  plot.title = element_text(size = 16, face = "bold"),
                  strip.text = element_text(size = 16, face = "bold"),
                  strip.background = element_rect(colour = "black"),
                  legend.text = element_text(size = 15),
                  legend.title = element_text(size = 16, face = "bold"),
                  legend.background = element_rect(fill = "white"),
                  legend.key = element_rect(fill = "white"))
I will use three weeks of electricity consumption data to train the regression tree methods. Forecasts will be made one day ahead. Let’s extract the train and test sets from the dataset.
data_train <- DT[date %in% n_date[1:21]]
data_test <- DT[date %in% n_date[22]]
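To make the sizes of the two sets concrete, here is a minimal sketch of the same split on synthetic data. The names DT_demo, n_date_demo, train_demo and test_demo are hypothetical and the values are random; only the splitting logic mirrors the code above.

```r
library(data.table)

period <- 48
# Synthetic stand-in for DT covering 22 days (assumption for illustration only)
dates <- as.Date("2016-01-01") + 0:21  # hypothetical dates
DT_demo <- data.table(date = rep(dates, each = period),
                      value = runif(length(dates) * period))
n_date_demo <- unique(DT_demo[, date])

# Same split as above: first 21 days for training, day 22 for testing
train_demo <- DT_demo[date %in% n_date_demo[1:21]]
test_demo <- DT_demo[date %in% n_date_demo[22]]

# 21 days * 48 observations = 1008 training rows; 48 test rows
split_sizes <- c(train = nrow(train_demo), test = nrow(test_demo))
print(split_sizes)
```

Because the test day follows the training window directly, the split respects the time order of the series, which matters for one-day-ahead forecasting.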
Then visualize the training set:
ggplot(data_train, aes(date_time, value)) +
  geom_line() +
  labs(x = "Date", y = "Load (kW)") +
  theme_ts