© 2026 Dr. Debashis Chatterjee. All rights reserved.
Prepared exclusively for educational use in MJST09B Time Series
Analysis (Practical/Lab Work), Department of Statistics,
Visva-Bharati University.
pkgs <- c("ggplot2","dplyr","tidyr","readr","knitr","forecast","tseries")
to_install <- pkgs[!sapply(pkgs, requireNamespace, quietly = TRUE)]
if(length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(knitr)
library(forecast)
library(tseries)
set.seed(2026)k_tbl <- function(x, caption=NULL, digits=4){
knitr::kable(as.data.frame(x), caption = caption, digits = digits)
}
# Convert a ts object to a data frame for ggplot
ts_df <- function(x, value_name="value"){
tt <- time(x)
data.frame(
time = as.numeric(tt),
value = as.numeric(x),
stringsAsFactors = FALSE
) |>
dplyr::rename(!!value_name := value)
}
# A simple ggplot line plot for a ts object
plot_ts <- function(x, title="", ylab="Value"){
df <- ts_df(x)
ggplot(df, aes(time, value)) +
geom_line(color="#2c7fb8", linewidth=0.7) +
labs(title=title, x="Time", y=ylab) +
theme_minimal(base_size=12)
}A time series is a sequence of observations ordered in time: \[ \{X_t\}_{t=1}^T. \] Typical components:
Two common decomposition models:
This is a classic real dataset (monthly totals, 1949–1960).
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
## [1] 12
## [1] 1949 1
## [1] 1960 12
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 104.0 180.0 265.5 280.3 360.5 622.0
ggseasonplot(AirPassengers, year.labels=TRUE) +
labs(title="Seasonal plot (AirPassengers)", y="Passengers") +
theme_minimal(base_size=12)ap_stl <- stl(AirPassengers, s.window="periodic")
plot(ap_stl, main="STL Decomposition: AirPassengers")ap_mult <- decompose(AirPassengers, type="multiplicative")
plot(ap_mult)
title(main="Classical Decomposition (Multiplicative): AirPassengers", line=0.5)ap_add <- decompose(AirPassengers, type="additive")
plot(ap_add)
title(main="Classical Decomposition (Additive): AirPassengers", line=0.5)For monthly data, a common choice is a 12-month moving average.
ap <- AirPassengers
ma12 <- ma(ap, order=12, centre=TRUE) # from forecast package
df <- ts_df(ap, "y") |>
mutate(ma12 = as.numeric(ma12))
ggplot(df, aes(time)) +
geom_line(aes(y=y), color="#2c7fb8", linewidth=0.6) +
geom_line(aes(y=ma12), color="#d95f0e", linewidth=1.0, na.rm=TRUE) +
labs(title="AirPassengers with 12-month moving average trend",
x="Time", y="Passengers") +
theme_minimal(base_size=12)What students should write:
- Moving average smooths short-term fluctuations and highlights the
trend.
- Trend line is “slow” and ignores seasonality.
We use ChickWeight (real dataset): chick weights
measured over time under different diets.
This is ideal for growth curves: Modified Exponential,
Gompertz, Logistic.
We take the average weight over chicks for Diet 1 at each time.
data(ChickWeight)
cw <- ChickWeight |>
filter(Diet == 1) |>
group_by(Time) |>
summarise(weight = mean(weight), .groups="drop") |>
arrange(Time)
k_tbl(head(cw, 10), caption="ChickWeight (Diet 1): average weight by Time", digits=2)| Time | weight |
|---|---|
| 0 | 41.40 |
| 2 | 47.25 |
| 4 | 56.47 |
| 6 | 66.79 |
| 8 | 79.68 |
| 10 | 93.05 |
| 12 | 108.53 |
| 14 | 123.39 |
| 16 | 144.65 |
| 18 | 158.94 |
ggplot(cw, aes(Time, weight)) +
geom_point(color="#2c7fb8", size=2) +
geom_line(color="#2c7fb8", linewidth=0.6) +
labs(title="Growth data: average chick weight vs time (Diet 1)",
x="Time", y="Average weight") +
theme_minimal(base_size=12)A simple modified exponential curve (one common form) is: \[ y(t) = a + b e^{ct}, \] where \(a,b,c\) are parameters to be estimated.
t <- cw$Time
y <- cw$weight
modexp <- nls(y ~ a + b*exp(c*t),
start = list(a = min(y), b = 1, c = 0.02))
summary(modexp)##
## Formula: y ~ a + b * exp(c * t)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a -1.299e+02 5.919e+01 -2.195 0.05583 .
## b 1.672e+02 5.736e+01 2.914 0.01719 *
## c 2.960e-02 7.519e-03 3.937 0.00342 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.459 on 9 degrees of freedom
##
## Number of iterations to convergence: 10
## Achieved convergence tolerance: 2.257e-06
cw$fit_modexp <- predict(modexp)
ggplot(cw, aes(Time, weight)) +
geom_point(color="#2c7fb8", size=2) +
geom_line(color="#2c7fb8", linewidth=0.5) +
geom_line(aes(y=fit_modexp), color="#d95f0e", linewidth=1.0) +
labs(title="Modified Exponential Fit: y = a + b exp(ct)",
x="Time", y="Average weight") +
theme_minimal(base_size=12)Gompertz growth model: \[ y(t)=a\exp\{-b\exp(-ct)\}, \] where \(a>0\) is the upper asymptote.
gompertz <- nls(y ~ a*exp(-b*exp(-c*t)),
start = list(a = max(y)*1.05, b = 2, c = 0.2))
summary(gompertz)##
## Formula: y ~ a * exp(-b * exp(-c * t))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 5.165e+02 1.200e+02 4.305 0.001977 **
## b 2.614e+00 2.014e-01 12.981 3.93e-07 ***
## c 4.329e-02 6.615e-03 6.545 0.000106 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.738 on 9 degrees of freedom
##
## Number of iterations to convergence: 11
## Achieved convergence tolerance: 8.284e-06
cw$fit_gompertz <- predict(gompertz)
ggplot(cw, aes(Time, weight)) +
geom_point(color="#2c7fb8", size=2) +
geom_line(color="#2c7fb8", linewidth=0.5) +
geom_line(aes(y=fit_gompertz), color="#1b9e77", linewidth=1.0) +
labs(title="Gompertz Fit: y = a exp{-b exp(-ct)}",
x="Time", y="Average weight") +
theme_minimal(base_size=12)Logistic growth model: \[ y(t)=\frac{a}{1+b\exp(-ct)}, \] where \(a>0\) is the carrying capacity.
logistic <- nls(y ~ a/(1 + b*exp(-c*t)),
start = list(a = max(y)*1.1, b = 5, c = 0.2))
summary(logistic)##
## Formula: y ~ a/(1 + b * exp(-c * t))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 2.734e+02 1.735e+01 15.76 7.34e-08 ***
## b 6.081e+00 3.358e-01 18.11 2.18e-08 ***
## c 1.165e-01 5.873e-03 19.84 9.76e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.092 on 9 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 1.2e-06
cw$fit_logistic <- predict(logistic)
ggplot(cw, aes(Time, weight)) +
geom_point(color="#2c7fb8", size=2) +
geom_line(color="#2c7fb8", linewidth=0.5) +
geom_line(aes(y=fit_logistic), color="#7570b3", linewidth=1.0) +
labs(title="Logistic Fit: y = a / (1 + b exp(-ct))",
x="Time", y="Average weight") +
theme_minimal(base_size=12)We compute a simple fit comparison using SSE (sum of squared errors).
SSE <- function(obs, fit) sum((obs - fit)^2, na.rm=TRUE)
cmp <- data.frame(
Model = c("Modified Exponential", "Gompertz", "Logistic"),
SSE = c(SSE(y, cw$fit_modexp), SSE(y, cw$fit_gompertz), SSE(y, cw$fit_logistic))
) |>
arrange(SSE)
k_tbl(cmp, caption="Fit comparison (smaller SSE is better)", digits=2)| Model | SSE |
|---|---|
| Logistic | 39.39 |
| Gompertz | 67.45 |
| Modified Exponential | 107.69 |
cw_long <- cw |>
select(Time, weight, fit_modexp, fit_gompertz, fit_logistic) |>
pivot_longer(cols = starts_with("fit_"), names_to="model", values_to="fit")
ggplot() +
geom_point(data=cw, aes(Time, weight), color="black", size=2) +
geom_line(data=cw_long, aes(Time, fit, color=model), linewidth=1.0) +
labs(title="Growth curve fits (all on one plot)", x="Time", y="Average weight", color="Model") +
theme_minimal(base_size=12)We demonstrate three standard seasonal index methods:
We use multiplicative seasonality for AirPassengers.
Seasonal index for month \(m\): \[ SI(m)=\frac{\text{mean of month }m}{\text{overall mean}}\times 100. \]
ap_df <- data.frame(
year = as.integer(floor(time(AirPassengers))),
month = cycle(AirPassengers),
y = as.numeric(AirPassengers)
)
monthly_means <- ap_df |>
group_by(month) |>
summarise(month_mean = mean(y), .groups="drop")
overall_mean <- mean(ap_df$y)
si_simple <- monthly_means |>
mutate(SI = 100 * month_mean / overall_mean)
k_tbl(si_simple, caption="Seasonal indices by Simple Averages (percent)", digits=2)| month | month_mean | SI |
|---|---|---|
| 1 | 241.75 | 86.25 |
| 2 | 235.00 | 83.84 |
| 3 | 270.17 | 96.39 |
| 4 | 267.08 | 95.29 |
| 5 | 271.83 | 96.98 |
| 6 | 311.67 | 111.19 |
| 7 | 351.33 | 125.34 |
| 8 | 351.08 | 125.25 |
| 9 | 302.42 | 107.89 |
| 10 | 266.58 | 95.11 |
| 11 | 232.83 | 83.07 |
| 12 | 261.83 | 93.41 |
ggplot(si_simple, aes(month, SI)) +
geom_line(color="#2c7fb8", linewidth=0.8) +
geom_point(color="#2c7fb8", size=2) +
labs(title="Seasonal Indices (Simple Averages)", x="Month (1=Jan,...,12=Dec)", y="SI (%)") +
theme_minimal(base_size=12)Steps: 1. Compute 12-month centered moving average (trend-cycle
estimate).
2. Compute ratios \(100\times
y_t/\text{CMA}_t\).
3. Average ratios by month and normalize so mean index = 100.
cma <- ma(AirPassengers, order=12, centre=TRUE)
ratio <- 100 * (AirPassengers / cma)
ratio_df <- data.frame(
month = cycle(AirPassengers),
ratio = as.numeric(ratio)
) |>
filter(is.finite(ratio))
si_rma <- ratio_df |>
group_by(month) |>
summarise(SI_raw = mean(ratio), .groups="drop") |>
mutate(SI = SI_raw * (1200 / sum(SI_raw))) # normalize mean to 100
k_tbl(si_rma, caption="Seasonal indices by Ratio-to-Moving-Average (normalized)", digits=2)| month | SI_raw | SI |
|---|---|---|
| 1 | 90.86 | 91.02 |
| 2 | 88.21 | 88.36 |
| 3 | 100.56 | 100.74 |
| 4 | 97.42 | 97.59 |
| 5 | 97.96 | 98.14 |
| 6 | 111.08 | 111.28 |
| 7 | 122.44 | 122.66 |
| 8 | 121.78 | 121.99 |
| 9 | 105.86 | 106.05 |
| 10 | 92.01 | 92.18 |
| 11 | 79.98 | 80.12 |
| 12 | 89.72 | 89.88 |
ggplot(si_rma, aes(month, SI)) +
geom_line(color="#d95f0e", linewidth=0.8) +
geom_point(color="#d95f0e", size=2) +
labs(title="Seasonal Indices (Ratio-to-Moving-Average)", x="Month", y="SI (%)") +
theme_minimal(base_size=12)Steps (monthly): 1. Compute link relatives: \(LR_t = 100\times y_t/y_{t-1}\).
2. For each month, average link relatives.
3. Convert to chain relatives (cumulative product).
4. Normalize to get seasonal indices with mean 100.
yap <- as.numeric(AirPassengers)
m <- cycle(AirPassengers)
LR <- 100 * yap[-1] / yap[-length(yap)]
month_for_LR <- m[-1] # month of current observation
lr_df <- data.frame(month = month_for_LR, LR = LR)
avg_lr <- lr_df |>
group_by(month) |>
summarise(avgLR = mean(LR), .groups="drop") |>
arrange(month)
# Chain relatives (start month 1 at 100)
CR <- numeric(12)
CR[1] <- 100
for(i in 2:12) CR[i] <- CR[i-1] * avg_lr$avgLR[i] / 100
# Normalize chain relatives to mean 100
SI_lr <- 100 * CR / mean(CR)
si_link <- data.frame(month = 1:12, SI = SI_lr)
k_tbl(si_link, caption="Seasonal indices by Link Relative method (normalized)", digits=2)| month | SI |
|---|---|
| 1 | 85.72 |
| 2 | 84.83 |
| 3 | 97.70 |
| 4 | 95.74 |
| 5 | 96.56 |
| 6 | 110.32 |
| 7 | 123.69 |
| 8 | 123.84 |
| 9 | 108.35 |
| 10 | 95.34 |
| 11 | 83.43 |
| 12 | 94.49 |
ggplot(si_link, aes(month, SI)) +
geom_line(color="#1b9e77", linewidth=0.8) +
geom_point(color="#1b9e77", size=2) +
labs(title="Seasonal Indices (Link Relative)", x="Month", y="SI (%)") +
theme_minimal(base_size=12)A (weakly) stationary series has constant mean and autocovariance depending only on lag.
AirPassengers is non-stationary. We often stabilize it by: - log transform - differencing
ap_log <- log(AirPassengers)
ap_d1 <- diff(ap_log) # remove trend
ap_d12 <- diff(ap_d1, 12) # remove seasonality
par(mfrow=c(3,1))
plot(ap_log, main="log(AirPassengers)")
plot(ap_d1, main="diff(log(AirPassengers))")
plot(ap_d12, main="seasonal diff of diff(log(AirPassengers))")ar1 <- arima.sim(model=list(ar=0.7), n=200)
ar2 <- arima.sim(model=list(ar=c(0.6,-0.2)), n=200)
ma1 <- arima.sim(model=list(ma=0.6), n=200)
ma2 <- arima.sim(model=list(ma=c(0.5,0.3)), n=200)
par(mfrow=c(2,2))
plot(ar1, main="Simulated AR(1)")
plot(ar2, main="Simulated AR(2)")
plot(ma1, main="Simulated MA(1)")
plot(ma2, main="Simulated MA(2)")fit_ar1_yw <- ar.yw(ar1, order.max=1)
fit_ar2_yw <- ar.yw(ar2, order.max=2)
k_tbl(data.frame(model="AR(1)", phi1=fit_ar1_yw$ar[1]), caption="Yule–Walker estimate (AR1)", digits=3)| model | phi1 |
|---|---|
| AR(1) | 0.691 |
k_tbl(data.frame(model="AR(2)", phi1=fit_ar2_yw$ar[1], phi2=fit_ar2_yw$ar[2]), caption="Yule–Walker estimate (AR2)", digits=3)| model | phi1 | phi2 |
|---|---|---|
| AR(2) | 0.509 | -0.188 |
##
## Call:
## arima(x = ar1, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.6937 0.1735
## s.e. 0.0507 0.2207
##
## sigma^2 estimated as 0.9343: log likelihood = -277.32, aic = 560.65
##
## Call:
## arima(x = ma1, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 0.6278 0.2035
## s.e. 0.0594 0.1172
##
## sigma^2 estimated as 1.04: log likelihood = -288, aic = 582
fit_ses <- ses(Nile, h=10)
autoplot(fit_ses) + labs(title="Simple Exponential Smoothing forecast (Nile)") + theme_minimal(base_size=12)fit_holt <- holt(Nile, h=10)
autoplot(fit_holt) + labs(title="Holt (trend) forecast (Nile)") + theme_minimal(base_size=12)fit_hw <- hw(AirPassengers, seasonal="multiplicative", h=24)
autoplot(fit_hw) + labs(title="Holt–Winters forecast (AirPassengers)") + theme_minimal(base_size=12)We use auto.arima() for a student-friendly workflow.
## Series: log(AirPassengers)
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 = 0.001371: log likelihood = 244.7
## AIC=-483.4 AICc=-483.21 BIC=-474.77
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(0,1,1)[12]
## Q* = 26.446, df = 22, p-value = 0.233
##
## Model df: 2. Total lags used: 24
fc_arima <- forecast(fit_arima, h=24)
autoplot(fc_arima) + labs(title="ARIMA forecast on log(AirPassengers)") + theme_minimal(base_size=12)Discounted (weighted) regression gives more weight to recent data. We fit: \[ y_t = \beta_0 + \beta_1 t + \varepsilon_t \] with weights \(w_t=\delta^{T-t}\) for some discount \(\delta\in(0,1)\).
discounted_lm_forecast <- function(y, h=10, delta=0.97){
T <- length(y)
t <- 1:T
w <- delta^(T - t)
fit <- lm(y ~ t, weights=w)
t_future <- (T+1):(T+h)
pred <- predict(fit, newdata=data.frame(t=t_future))
list(fit=fit, t=t, y=y, t_future=t_future, pred=pred)
}
y <- as.numeric(Nile)
dr <- discounted_lm_forecast(y, h=10, delta=0.97)
df <- data.frame(t=dr$t, y=dr$y)
df_f <- data.frame(t=dr$t_future, pred=dr$pred)
ggplot() +
geom_line(data=df, aes(t, y), color="#2c7fb8", linewidth=0.7) +
geom_point(data=df, aes(t, y), color="#2c7fb8", size=1.8) +
geom_line(data=df_f, aes(t, pred), color="#d95f0e", linewidth=1.0) +
labs(title="Discounted regression forecast (Nile)", x="Time index", y="Value") +
theme_minimal(base_size=12)##
## Call:
## lm(formula = y ~ t, weights = w)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -192.062 -47.739 -1.712 52.748 295.854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 970.6552 43.4036 22.363 <2e-16 ***
## t -1.3278 0.5683 -2.336 0.0215 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 75.34 on 98 degrees of freedom
## Multiple R-squared: 0.05277, Adjusted R-squared: 0.0431
## F-statistic: 5.459 on 1 and 98 DF, p-value: 0.0215
For each practical item in the syllabus, submit:
nottem (monthly temperature).UKgas), and compare accuracy.Using the dataset AirPassengers:
\[ MA_t = \frac{1}{k} \sum_{i=-m}^{m} X_{t+i} \]
Using the ChickWeight dataset (Diet 1 average):
\[ y(t) = a + b e^{ct} \]
\[ y(t)=a\exp\{-b\exp(-ct)\} \]
\[ y(t)=\frac{a}{1+b\exp(-ct)} \]
Provide:
Which model fits best? Why?
Interpret parameters in biological context.
Using AirPassengers dataset:
Compute seasonal indices using:
For each method:
Explain the normalization condition:
\[ \frac{1}{12}\sum_{m=1}^{12} SI_m = 100 \]
Explain weak stationarity: \[ \mathbb{E}(X_t)=\mu,\quad \text{Var}(X_t)=\sigma^2,\quad \gamma(h)=Cov(X_t,X_{t+h}) \]
Using AirPassengers:
Plot ACF and PACF.
Perform ADF test and interpret.
AR(1): \[ X_t = \phi X_{t-1} + \varepsilon_t \]
MA(1): \[ X_t = \varepsilon_t + \theta \varepsilon_{t-1} \]
Compare ACF and PACF patterns.
Estimate AR parameters using Yule–Walker equations.
Using Nile dataset:
Fit weighted regression: \[ y_t = \beta_0 + \beta_1 t + \varepsilon_t \]
Use weights: \[ w_t = \delta^{T-t} \]
Compare forecast with Holt’s method.
Write 1–2 pages on:
End of manual.