Anomaly Detection in R

I am a data analyst at Carrefour Kenya, currently working on a project to inform the marketing department about which marketing strategies are likely to produce the highest sales (total price including tax). I’ll explore a recent marketing dataset using various unsupervised learning techniques and then provide recommendations based on my insights. In this section I check whether the sales dataset contains any anomalies, with fraud detection as the objective.

A sales dataset has been provided for anomaly detection. We begin by loading and previewing it.

Install the anomalize and tibbletime packages.

install.packages("anomalize")

## Installing package into '/home/binti/R/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)

install.packages("tibbletime")

## Installing package into '/home/binti/R/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)

Let’s load the libraries we’ll need for this task.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(anomalize)

Loading the tibbletime and dplyr libraries.

library(tibbletime)

## 
## Attaching package: 'tibbletime'

## The following object is masked from 'package:stats':
## 
##     filter

library(dplyr)

Loading the dataset

sales <- read.csv("http://bit.ly/CarreFourSalesDataset")

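# Parse the dates, then reorder whole rows so each Sales value stays with its own date
# (sorting the Date column on its own would detach dates from their sales figures)
sales$Date <- as.Date(sales$Date, format = "%m/%d/%Y")
sales <- sales[order(sales$Date), ]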

sales <- as_tbl_time(sales, index = Date)

sales <- sales %>%
    as_period("daily")
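
As far as I understand tibbletime, as_period("daily") keeps a single row per day (the first one by default) rather than summing that day's transactions. If daily totals are what the analysis needs, one option is to aggregate before converting to a time tibble. The sketch below uses base R on a small made-up data frame (the `toy` object and its values are illustrative, not taken from the Carrefour data).

```r
# Toy transactions (made-up values): several per day, not in date order
toy <- data.frame(
  Date  = c("1/2/2019", "1/1/2019", "1/2/2019", "1/1/2019", "1/3/2019"),
  Sales = c(100, 50, 25, 10, 70)
)
toy$Date <- as.Date(toy$Date, format = "%m/%d/%Y")

# Sum Sales within each Date; the result has one row per day, ordered by Date
daily_totals <- aggregate(Sales ~ Date, data = toy, FUN = sum)
daily_totals
```

The resulting data frame could then be passed to as_tbl_time(index = Date) in the same way as above.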

Previewing the dataset

Checking for the dataset’s dimensions

dim(sales)

## [1] 89  2

Let’s preview the top of our dataset

head(sales)

## # A time tibble: 6 × 2
## # Index: Date
##   Date       Sales
##   <date>     <dbl>
## 1 2019-01-01  549.
## 2 2019-01-02  246.
## 3 2019-01-03  452.
## 4 2019-01-04  464.
## 5 2019-01-05  418.
## 6 2019-01-06  536.

Checking the bottom of our dataset

tail(sales)

## # A time tibble: 6 × 2
## # Index: Date
##   Date       Sales
##   <date>     <dbl>
## 1 2019-03-25 361. 
## 2 2019-03-26 188. 
## 3 2019-03-27  43.9
## 4 2019-03-28 271. 
## 5 2019-03-29 244. 
## 6 2019-03-30 633.
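
The series runs from 2019-01-01 to 2019-03-30, which is 31 + 28 + 30 = 89 days, matching the 89 rows reported by dim(). That suggests there are no missing days. A quick way to check this kind of thing explicitly, sketched here in base R on a made-up vector of dates (`dates` is illustrative; with the real data it would be `sales$Date`):

```r
# Made-up dates with one day missing (2019-01-03)
dates <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-04", "2019-01-05"))

# Build the full daily calendar between the first and last date
full_range <- seq(min(dates), max(dates), by = "day")

# Days in the calendar that never appear in the data
missing_days <- full_range[!full_range %in% dates]
missing_days
```

An empty result means every day in the range is present, which matters because the seasonal decomposition below assumes a regular daily series.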

Detecting anomalies. Let’s decompose the series and plot it to visualize any flagged points.

library(anomalize)
library(dplyr)

sales %>%
    time_decompose(Sales) %>%
    anomalize(remainder) %>%
    time_recompose() %>%
    plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.5)

## frequency = 7 days

## trend = 30 days

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

With the default settings used above, no points are flagged as anomalies in the daily sales series.
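
For intuition about what the pipeline above checks, here is a rough base-R sketch of the same idea: an STL decomposition with a 7-day season, followed by interquartile-range fences on the remainder. It runs on synthetic data with one injected spike, not on the Carrefour series. The fence multiplier 0.15 / alpha (3 when alpha = 0.05) reflects my understanding of anomalize's default IQR method and should be treated as an assumption; the result will not match anomalize exactly.

```r
# Synthetic daily series (illustrative only): weekly pattern plus noise
set.seed(42)
n <- 89
day <- seq(as.Date("2019-01-01"), by = "day", length.out = n)
weekly <- rep(c(0, 10, 20, 30, 20, 10, 0), length.out = n)
value <- 300 + weekly + rnorm(n, sd = 5)
value[45] <- value[45] + 400  # injected spike to be detected

# Split the series into seasonal, trend and remainder components
series <- ts(value, frequency = 7)
fit <- stl(series, s.window = "periodic")
remainder <- as.numeric(fit$time.series[, "remainder"])

# IQR fences on the remainder (multiplier is an assumption, see above)
alpha <- 0.05
iqr_factor <- 0.15 / alpha
q <- unname(quantile(remainder, c(0.25, 0.75)))
lower <- q[1] - iqr_factor * (q[2] - q[1])
upper <- q[2] + iqr_factor * (q[2] - q[1])

# Indices and dates of points outside the fences
flagged <- which(remainder < lower | remainder > upper)
day[flagged]
```

A smaller alpha widens the fences and flags fewer points; a larger alpha does the opposite. This is the knob to revisit if a result of "no anomalies" seems too lenient for a fraud-detection use case.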

Matilda Kadzo

2022-06-11
