This notebook, “1-e1-moving-averages.Rmd” uses original content from the Data Analyst Nanodegree by Udacity, used under CC BY-NC-ND 4.0. “1-e1-moving-averages.Rmd” is licensed under CC0 by Clare Gibson

All supporting files can be found in the Github repo

1 Moving averages

Moving averages are used to smooth out time-based data which has a tendency to show peaks and valleys. For example, a bricks-and-mortar clothing store may have more sales on a weekend compared to a weekday. Another contemporary example is the daily reported COVID-19 case numbers in the UK. Typically more cases are reported on a weekday compared to a weekend, and there can be dips on other days such as public holidays.

In these examples, if you visualise only the daily data, it is hard to interpret the overall trend because of the volatility of the data. Using a moving average you can smooth out some of the daily volatility while allowing you to view the long term trend.

2 Calculating the moving average

The data for this exercise comes from the CSV file titled 1-e1-moving-average.csv in the data-in folder. Let’s read it in and view the head.

daily_sales <- read_csv("data-in/1-e1-moving-average.csv") %>% 
  janitor::clean_names()

kable(head(daily_sales), caption = "Table 1: Daily sales data")
Table 1: Daily sales data
date sales
1/1/2009 7855
1/2/2009 12329
1/3/2009 11617
1/4/2009 7684
1/5/2009 8448
1/6/2009 8022

The 7-day moving average for any given day is calculated by taking the average sales of that day and the 6 previous days. The 14-day moving average is calculated by taking the average sales of the day in question and the previous 13 days. In R we can calculate this with a function called rollmean from the zoo package.

daily_sales <- daily_sales %>% 
  mutate(sales_07da = rollmean(sales, k = 7, align = "right", fill = NA),
         sales_14da = rollmean(sales, k = 14, align = "right", fill = NA))

kable(head(daily_sales, 20),
      caption = "Table 2: Daily sales data including 7-day and 14-day moving averages")
Table 2: Daily sales data including 7-day and 14-day moving averages
date sales sales_07da sales_14da
1/1/2009 7855 NA NA
1/2/2009 12329 NA NA
1/3/2009 11617 NA NA
1/4/2009 7684 NA NA
1/5/2009 8448 NA NA
1/6/2009 8022 NA NA
1/7/2009 8410 9195.000 NA
1/8/2009 7850 9194.286 NA
1/9/2009 12399 9204.286 NA
1/10/2009 11919 9247.429 NA
1/11/2009 7767 9259.286 NA
1/12/2009 8199 9223.714 NA
1/13/2009 8522 9295.143 NA
1/14/2009 8339 9285.000 9240.000
1/15/2009 8112 9322.429 9258.357
1/16/2009 12486 9334.857 9269.571
1/17/2009 12371 9399.429 9323.429
1/18/2009 8180 9458.429 9358.857
1/19/2009 8081 9441.571 9332.643
1/20/2009 7814 9340.429 9317.786

3 Questions

  1. What is the 7-day moving average for January 11, 2009?
daily_sales %>% 
  filter(date == "1/11/2009") %>% 
  pull(sales_07da)
## [1] 9259.286
  1. What is the 14-day moving average for March 19, 2009?
daily_sales %>% 
  filter(date == "3/19/2009") %>% 
  pull(sales_14da)
## [1] 9981.071