This notebook, “1-e1-moving-averages.Rmd” uses original content from the Data Analyst Nanodegree by Udacity, used under CC BY-NC-ND 4.0. “1-e1-moving-averages.Rmd” is licensed under CC0 by Clare Gibson
All supporting files can be found in the Github repo
Moving averages are used to smooth out time-based data which has a tendency to show peaks and valleys. For example, a bricks-and-mortar clothing store may have more sales on a weekend compared to a weekday. Another contemporary example is the daily reported COVID-19 case numbers in the UK. Typically more cases are reported on a weekday compared to a weekend, and there can be dips on other days such as public holidays.
In these examples, if you visualise only the daily data, it is hard to interpret the overall trend because of the volatility of the data. Using a moving average you can smooth out some of the daily volatility while allowing you to view the long term trend.
The data for this exercise comes from the CSV file titled 1-e1-moving-average.csv in the data-in folder. Let’s read it in and view the head.
daily_sales <- read_csv("data-in/1-e1-moving-average.csv") %>%
janitor::clean_names()
kable(head(daily_sales), caption = "Table 1: Daily sales data")
| date | sales |
|---|---|
| 1/1/2009 | 7855 |
| 1/2/2009 | 12329 |
| 1/3/2009 | 11617 |
| 1/4/2009 | 7684 |
| 1/5/2009 | 8448 |
| 1/6/2009 | 8022 |
The 7-day moving average for any given day is calculated by taking the average sales of that day and the 6 previous days. The 14-day moving average is calculated by taking the average sales of the day in question and the previous 13 days. In R we can calculate this with a function called rollmean from the zoo package.
daily_sales <- daily_sales %>%
mutate(sales_07da = rollmean(sales, k = 7, align = "right", fill = NA),
sales_14da = rollmean(sales, k = 14, align = "right", fill = NA))
kable(head(daily_sales, 20),
caption = "Table 2: Daily sales data including 7-day and 14-day moving averages")
| date | sales | sales_07da | sales_14da |
|---|---|---|---|
| 1/1/2009 | 7855 | NA | NA |
| 1/2/2009 | 12329 | NA | NA |
| 1/3/2009 | 11617 | NA | NA |
| 1/4/2009 | 7684 | NA | NA |
| 1/5/2009 | 8448 | NA | NA |
| 1/6/2009 | 8022 | NA | NA |
| 1/7/2009 | 8410 | 9195.000 | NA |
| 1/8/2009 | 7850 | 9194.286 | NA |
| 1/9/2009 | 12399 | 9204.286 | NA |
| 1/10/2009 | 11919 | 9247.429 | NA |
| 1/11/2009 | 7767 | 9259.286 | NA |
| 1/12/2009 | 8199 | 9223.714 | NA |
| 1/13/2009 | 8522 | 9295.143 | NA |
| 1/14/2009 | 8339 | 9285.000 | 9240.000 |
| 1/15/2009 | 8112 | 9322.429 | 9258.357 |
| 1/16/2009 | 12486 | 9334.857 | 9269.571 |
| 1/17/2009 | 12371 | 9399.429 | 9323.429 |
| 1/18/2009 | 8180 | 9458.429 | 9358.857 |
| 1/19/2009 | 8081 | 9441.571 | 9332.643 |
| 1/20/2009 | 7814 | 9340.429 | 9317.786 |
daily_sales %>%
filter(date == "1/11/2009") %>%
pull(sales_07da)
## [1] 9259.286
daily_sales %>%
filter(date == "3/19/2009") %>%
pull(sales_14da)
## [1] 9981.071