This example is adapted from the following article: https://www.business-science.io/code-tools/2018/04/08/introducing-anomalize.html
## Example
# ---
# Find the anomalies in the given time series dataset.
# ---
# OUR CODE BELOW
#
# Installing anomalize package
# ---
#
install.packages("anomalize")
## Installing package into 'C:/Users/RoySambu/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'anomalize' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\RoySambu\AppData\Local\Temp\RtmpcvS9bR\downloaded_packages
# Load tidyverse and anomalize
# ---
#
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.5
## Warning: package 'tibble' was built under R version 4.0.5
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'readr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## Warning: package 'forcats' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(anomalize)
## == Use anomalize to improve your Forecasts by 50%! =============================
## Business Science offers a 1-hour course - Lab #18: Time Series Anomaly Detection!
## </> Learn more at: https://university.business-science.io/p/learning-labs-pro </>
# Collect our time series data
# ---
#
tidyverse_cran_downloads
## # A tibble: 6,375 x 3
## # Groups: package [15]
## date count package
## <date> <dbl> <chr>
## 1 2017-01-01 873 tidyr
## 2 2017-01-02 1840 tidyr
## 3 2017-01-03 2495 tidyr
## 4 2017-01-04 2906 tidyr
## 5 2017-01-05 2847 tidyr
## 6 2017-01-06 2756 tidyr
## 7 2017-01-07 1439 tidyr
## 8 2017-01-08 1556 tidyr
## 9 2017-01-09 3678 tidyr
## 10 2017-01-10 7086 tidyr
## # ... with 6,365 more rows
# Detecting our anomalies
# ----
# We use the following functions to detect and visualize anomalies:
#
# time_decompose() -
# We decompose the “count” column into “observed”, “season”, “trend”,
# and “remainder” columns. The default is method = "stl", which is
# seasonal decomposition using a Loess smoother (refer to stats::stl()).
# The frequency and trend parameters are set automatically from the
# time scale (or periodicity) of the series, using tibbletime-based
# functions under the hood.
#
# anomalize() -
# We perform anomaly detection on the remainder column of the decomposed
# data using the anomalize() function, which produces three new columns:
# “remainder_l1” (lower limit), “remainder_l2” (upper limit), and
# “anomaly” (Yes/No flag).
# The default is method = "iqr", which is fast and relatively
# accurate at detecting anomalies.
# The alpha parameter defaults to alpha = 0.05. Decreasing alpha widens
# the anomaly bands, making it harder for a point to be flagged as
# anomalous; increasing alpha narrows them.
# The max_anoms parameter defaults to max_anoms = 0.2, meaning that at
# most 20% of the data can be flagged as anomalous.
#
# time_recompose() -
# The time_recompose() function recomposes the lower and upper bounds
# of the anomalies around the “observed” values.
# It adds two new columns: “recomposed_l1” (lower limit)
# and “recomposed_l2” (upper limit).
#
# plot_anomalies() -
# Finally, we plot the result with plot_anomalies() to visualize our data.
#
# ---
#
tidyverse_cran_downloads %>%
  time_decompose(count) %>%
  anomalize(remainder) %>%
  time_recompose() %>%
  plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.5)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Warning: `type_convert()` only converts columns of type 'character'.
## - `df` has no columns of type 'character'
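To make the three steps described above more concrete, here is a minimal base-R sketch of the same idea: decompose with stats::stl(), put IQR-based limits on the remainder, then add season and trend back to get bounds around the observed values. It is not the anomalize implementation. The built-in `nottem` series, the helper name `iqr_limits()`, and the IQR factor of 0.15 / alpha (a factor of 3 at alpha = 0.05) are assumptions made for illustration, and the sketch ignores max_anoms entirely.

```r
# Illustrative sketch only: base R, no anomalize required.
# `nottem` is a monthly temperature series from R's datasets package,
# used here as a stand-in for a single package's download counts.
x <- nottem

# Step 1 (decompose): stats::stl() splits the series into
# seasonal, trend and remainder components.
fit <- stl(x, s.window = "periodic")
observed  <- as.numeric(x)
season    <- as.numeric(fit$time.series[, "seasonal"])
trend     <- as.numeric(fit$time.series[, "trend"])
remainder <- as.numeric(fit$time.series[, "remainder"])

# Step 2 (limits on the remainder): hypothetical helper that widens the
# 25%/75% quantiles by an assumed factor of 0.15 / alpha times the IQR.
iqr_limits <- function(r, alpha = 0.05) {
  q <- quantile(r, probs = c(0.25, 0.75), names = FALSE)
  iqr_factor <- 0.15 / alpha
  spread <- q[2] - q[1]
  c(lower = q[1] - iqr_factor * spread,
    upper = q[2] + iqr_factor * spread)
}

limits <- iqr_limits(remainder)

result <- data.frame(
  observed, season, trend, remainder,
  remainder_l1 = limits[["lower"]],
  remainder_l2 = limits[["upper"]]
)

# Flag points whose remainder falls outside the limits.
outside <- result$remainder < result$remainder_l1 |
  result$remainder > result$remainder_l2
result$anomaly <- ifelse(outside, "Yes", "No")

# Step 3 (recompose): move the limits back onto the scale of the
# observed values by adding season and trend.
result$recomposed_l1 <- result$season + result$trend + result$remainder_l1
result$recomposed_l2 <- result$season + result$trend + result$remainder_l2

head(result)
table(result$anomaly)
```

Calling `iqr_limits(remainder, alpha = 0.025)` gives wider limits than the default, which mirrors the effect of lowering alpha described above.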
## Challenge
# ---
# Find the anomalies in the given time series dataset.
# ---
# OUR CODE GOES BELOW
#
# logs_path <- "http://bit.ly/LogsDataset"
# Grouping by server and converting to tibbletime
# security_access_logs <- read_csv(logs_path) %>%
#   group_by(server) %>%
#   as_tbl_time(date)
# security_access_logs