Plotting Time Series: Seasonal Plots

Author

AS

Economy class passengers: Melbourne-Sydney

# install.packages("fpp2") # all data in book
library(fpp2) # reference data
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
✔ ggplot2   3.5.2      ✔ fma       2.5   
✔ forecast  8.24.0     ✔ expsmooth 2.3   
?fpp2

plot(melsyd[,"Economy.Class"],
     main ="Economy class passengers: Melbourne-Sydney",
     xlab="Year",
     ylab="Thousands"
     )

Passenger traffic shows several notable anomalies. In 1989, no passengers were carried for a period due to an industrial dispute. In 1992, load factors were reduced during a trial that converted some economy seats to business class. A sharp increase in passenger numbers occurred in the second half of 1991, while holiday effects produced large dips around the start of each year.

The series also exhibits long-term fluctuations: rising through 1987, falling in 1989, and then increasing again across 1990–1991. Some observations are missing, and in certain periods, values even drop to zero.

Antidiabetic drug sales

This time series plot of antidiabetic drug sales gives a clear visual of the data’s structure over time.

The data is from the fpp2 package, specifically the a10 dataset, which contains monthly sales of antidiabetic drugs in Australia from 1991 to 2008. The plot below illustrates the sales trend and seasonal patterns in the data.

plot(x = a10, 
     ylab="$ million", 
     xlab="Year", 
     main="Antidiabetic drug sales"
     )

Interpretation of the Time Series Plot

  1. Clear upward trend: The sales are increasing steadily from the early 1990s to the late 2000s, reflecting exponential growth, not just a linear increase.

  2. Strong seasonal pattern: There are regular spikes at the end of each year — likely December — followed by a drop, consistent with end-of-year stockpiling behavior due to government subsidies.

  3. Increasing seasonal amplitude: The size of the seasonal fluctuations grows with the level of the series, suggesting a multiplicative seasonal effect.

  4. Heteroskedasticity: The variance increases over time, which violates assumptions of constant variance — a log transformation may stabilize this.

Implications for Modeling

  • Use a multiplicative decomposition or fit a model on the log-transformed data.

  • Models like ETS(M,A,M) (multiplicative error, additive trend, multiplicative seasonality) would be appropriate.

  • Any forecasting model should account for both the nonlinear growth trend and the seasonality tied to policy-driven behavior.

Seasonal Plot

A seasonal plot shows how a variable behaves across the same cycle (e.g., months of the year), repeated for multiple years. It helps you:

  • Compare patterns across different years

  • Identify consistent seasonality (e.g., always low in Feb, high in Dec)

  • Detect unusual years that deviate from typical patterns

  • Spot upward or downward trends within seasons

DATA: Monthly anti-diabetic drug subsidy in Australia from 1991 to 2008 (a10)

Monthly government expenditure (millions of dollars) as part of the Pharmaceutical Benefit Scheme for products falling under ATC code A10 as recorded by the Australian Health Insurance Commission. July 1991 - June 2008.

?a10
autoplot(a10)

?seasonplot
seasonplot(a10,
           ylab="$ million", 
           xlab="Year", 
           main="Seasonal plot: antidiabetic drug sales",
           year.labels=TRUE, 
           year.labels.left=TRUE, 
           col=1:20, 
           pch=19
           )

ggseasonplot(a10) 

ggseasonplot(a10) +
  labs(title = "Seasonal Plot: Antidiabetic Drug Sales",
       x = "Month", 
       y = "Sales ($ million)"
       ) 

Interpreting seasonal plot:

  • X-axis: Months (Jan–Dec)

  • Y-axis: Sales in millions of dollars

  • Lines: Each colored line is a year (1991 to 2008)

  • Insight: Sales are consistently lowest in February, with a clear upward trend over time — especially steep in 2006–2008.

When to Use a Seasonal Plot

Use it to visualize and analyze seasonal patterns in your time series data. Seasonal plots are particularly useful for identifying how different years compare in terms of seasonal behavior. They are especially helpful when you want to understand how a variable behaves during specific months across multiple years.

  • You have monthly or quarterly time series data

  • You suspect seasonal effects (e.g., sales, demand, prices)

  • You want to compare how each year behaves month by month

Month Plot

The monthplot() function in R is used to visualize seasonal effects in a time series, typically for monthly data. It helps you see how each month behaves on average across multiple years, highlighting patterns such as seasonality or outliers.

Purpose

  • Isolate and display seasonality — especially useful for monthly time series data.

  • Compare the average deviation of each month from the overall trend.

  • Visually assess which months are consistently higher or lower than the yearly average.

How It Works

  • monthplot() is most often applied to time series objects (e.g., ts, tslm, decompose, stl).

  • It usually works on the seasonal component obtained after decomposing a time series (e.g., via decompose()).

  • It creates a small line chart for each month across years, or plots the average seasonal effect for each month.

?monthplot
monthplot(a10,
          ylab = "$ million", 
          xlab = "Month", 
          xaxt = "n", 
          main = "Seasonal deviation plot: antidiabetic drug sales")

axis(1, 
     at = 1:12, 
     labels = month.abb, 
     cex = 0.8)

Understand jitter() in a Plot

Sometimes in a scatterplot, multiple data points have the same or similar values on one axis — causing them to overlap and hide patterns. jitter() adds small noise to make these points visible.

Example: Students’ Scores in Two Subjects

Let’s simulate student scores, where many students got the same grade:

# Create example data
set.seed(123)
students <- data.frame(
  Math = sample(x = c(70, 75, 80, 85, 90), size = 100, replace = TRUE),
  English = sample(x = c(70, 75, 80, 85, 90), size = 100, replace = TRUE)
)

# Without jitter (many overlapping points)
plot(students$Math, students$English,
     xlab = "Math Score", ylab = "English Score",
     main = "Without jitter: overlapping points",
     pch = 19, col = "steelblue")

You’ll see clusters of points overplotted, especially where many students got the same pair of scores.

Now: Apply jitter() to reveal structure

# Add jitter to reveal overlapping points
plot(jitter(students$Math), jitter(students$English),
     xlab = "Math Score", ylab = "English Score",
     main = "With jitter: hidden points revealed",
     pch = 19, col = "darkorange")

Effect:

  • Each point is nudged slightly so identical scores become separate dots, making it easier to see data density.
Feature Without jitter() With jitter()
Overlapping points Hidden Spread out
Pattern visibility Low High
Data distortion None Slight (controlled)

Use jitter() when:

  • You’re plotting categorical or discrete variables (test scores, ratings, MPG, etc.).

  • You want to see all the observations, even when values repeat.

#install.packages("fpp_0.5.tar.gz", repos = NULL, type = "source")
library(fpp) # data archived here
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Loading required package: tseries

Attaching package: 'fpp'
The following objects are masked from 'package:fpp2':

    ausair, ausbeer, austa, austourists, debitcards, departures,
    elecequip, euretail, guinearice, oil, sunspotarea, usmelec
str(fuel) # from fpp that is archived
'data.frame':   134 obs. of  8 variables:
 $ Model    : Factor w/ 110 levels "Chevrolet Aveo",..: 1 2 3 4 4 5 6 6 7 8 ...
 $ Cylinders: num  4 4 4 4 5 5 4 5 4 4 ...
 $ Litres   : num  1.6 1.6 2.2 2.9 3.7 3.7 2.9 3.7 2 2 ...
 $ Barrels  : num  12.2 12.2 12.7 17.1 18 20.1 17.1 18 14.9 14.9 ...
 $ City     : num  25 25 24 18 17 15 18 17 19 19 ...
 $ Highway  : num  34 34 33 24 23 20 24 23 29 29 ...
 $ Cost     : num  1012 1012 1049 1418 1491 ...
 $ Carbon   : num  6.6 6.6 6.8 9.2 9.6 10.8 9.2 9.6 8 8 ...
plot(x = fpp::fuel[,5],
     y =  fpp::fuel[,8],
     xlab="City mpg",
     ylab="Carbon footprint"
     )

plot(x = jitter(fpp::fuel[,5]), 
     y = jitter(fpp::fuel[,8]), 
     xlab="City mpg", 
     ylab="Carbon footprint"
     )

Scatterplot Matrix: Exploring Fuel Efficiency and Emissions

To explore relationships between fuel efficiency, engine size, and emissions, we use a scatterplot matrix based on the fuel dataset. This dataset contains information about various vehicle models, including their engine specifications and environmental impact.

We focus on four key quantitative variables:

  • Litres – Engine size in litres
  • City – City fuel efficiency in miles per gallon (MPG)
  • Highway – Highway fuel efficiency in MPG
  • Carbon – Estimated CO₂ emissions in metric tons per year
# pairs(fpp::fuel[,-c(1:2,4,7)], pch=19)

# Create scatterplot matrix with selected numeric variables
pairs(fuel[, c("Litres", "City", "Highway", "Carbon")],
      pch = 19,
      main = "Scatterplot Matrix: Engine Size, MPG, and CO₂ Emissions")

?pairs

Key Observations

  • Engine Size vs MPG
    Vehicles with larger engines (Litres) tend to have lower city and highway MPG. This inverse relationship is expected: bigger engines typically consume more fuel.

  • Fuel Efficiency vs Emissions
    Both City and Highway fuel economy are negatively correlated with Carbon. That is, fuel-efficient vehicles emit less CO₂ per year.

  • Engine Size vs Carbon Emissions
    There is a strong positive correlation between Litres and Carbon. Larger engines result in higher carbon output due to more fuel burned.

  • City vs Highway MPG
    These two metrics are strongly positively correlated—cars that are efficient in the city tend to be efficient on highways too.


Why Use a Scatterplot Matrix?

The pairs() function in R lets us:

  • Visualize pairwise relationships across multiple numeric variables
  • Detect correlations, outliers, and clustering
  • Quickly screen for non-linear relationships or redundancy

This tool is especially helpful during exploratory data analysis (EDA), where understanding the structure and interaction between variables is key before modeling.

Enhanced Pairwise Plot: Fuel Dataset

The kdpairs() function from the car package provides an enhanced version of the classic scatterplot matrix. It adds:

  • 2D density contour plots (lower triangle)
  • Correlation values and smoothed trend lines (upper triangle)
  • Histograms with density overlays (diagonal)

This allows us to explore pairwise relationships between variables more effectively than with a standard pairs() plot.

We use the fuel dataset (from the archived fpp package), which includes data on vehicle engine size, fuel efficiency, and carbon emissions.

# Load the required package library(car)
library(ResourceSelection)
??kdepairs
kdepairs(fpp::fuel[, c("Litres", "City", "Highway", "Carbon")])

Note that I am controlling the figure width and breadth in the R code chunk options above. See the code options for more details and play around..

Chart Interpretation

  • Lower triangle:
    Contour density plots highlight regions where data points are concentrated.
    Darker rings = higher density.

  • Upper triangle:
    Scatterplots with a LOESS smoother, plus correlation coefficient.
    For example:

    • City vs Highway: +0.821 → strong positive relationship
    • Highway vs Carbon: –0.927 → strong negative relationship
  • Diagonal:
    Histograms overlaid with kernel density curves show distribution shape.

These plots help detect non-linear relationships, outliers, and redundancy among predictors.

myfun=function(x){
  m=mean(x)
  md=median(x)
  s=sd(x)
  f=fivenum(x)
  par(mfrow=c(2,1))
  h=hist(x, main="Hist")
  b=boxplot(x, main="Boxplot")
  all=as.list(c(m,s,f))
  names(all)=c("Mean","sd","min","first","median","third","max")
  return(all)
}

cardata=c(4,4.4,5.9,5.9,6.1,6.1,6.1,6.3,6.3,6.3,6.3,
          rep(6.6,7), rep(6.8,3))

myfun(cardata)

$Mean
[1] 6.204762

$sd
[1] 0.7255868

$min
[1] 4

$first
[1] 6.1

$median
[1] 6.3

$third
[1] 6.6

$max
[1] 6.8
par(mfrow=c(2,2))
plot(anscombe$y1 ~ anscombe$x1)
plot(anscombe$y2 ~ anscombe$x2)
plot(anscombe$y3 ~ anscombe$x3)
plot(anscombe$y4 ~ anscombe$x4)

cor(anscombe$y1, anscombe$x1, method="pearson")
[1] 0.8164205
cor(anscombe$y2, anscombe$x2, method="spearman")
[1] 0.6909091
cor(anscombe)
           x1         x2         x3         x4         y1         y2         y3
x1  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365  0.8162867
x2  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365  0.8162867
x3  1.0000000  1.0000000  1.0000000 -0.5000000  0.8164205  0.8162365  0.8162867
x4 -0.5000000 -0.5000000 -0.5000000  1.0000000 -0.5290927 -0.7184365 -0.3446610
y1  0.8164205  0.8164205  0.8164205 -0.5290927  1.0000000  0.7500054  0.4687167
y2  0.8162365  0.8162365  0.8162365 -0.7184365  0.7500054  1.0000000  0.5879193
y3  0.8162867  0.8162867  0.8162867 -0.3446610  0.4687167  0.5879193  1.0000000
y4 -0.3140467 -0.3140467 -0.3140467  0.8165214 -0.4891162 -0.4780949 -0.1554718
           y4
x1 -0.3140467
x2 -0.3140467
x3 -0.3140467
x4  0.8165214
y1 -0.4891162
y2 -0.4780949
y3 -0.1554718
y4  1.0000000
cor.test
function (x, ...) 
UseMethod("cor.test")
<bytecode: 0x1040bc808>
<environment: namespace:stats>
kdepairs(anscombe)
Warning in par(usr): argument 1 does not name a graphical parameter
Warning in par(usr): argument 1 does not name a graphical parameter
Warning in par(usr): argument 1 does not name a graphical parameter
Warning in par(usr): argument 1 does not name a graphical parameter
Warning in par(usr): argument 1 does not name a graphical parameter
Warning in par(usr): argument 1 does not name a graphical parameter
Warning in par(usr): argument 1 does not name a graphical parameter
Warning in par(usr): argument 1 does not name a graphical parameter