Smoothing methods such as moving averages, kernel smoothing, and LOWESS (locally weighted scatterplot smoothing) are commonly used in time series analysis to reduce noise, dampen the influence of outliers, and highlight underlying trends or patterns in the data.
In moving average smoothing, each data point is replaced by the average of nearby observations within a specified window. The average can be simple, where all observations within the window are weighted equally, or weighted, where some observations count more than others, for example by giving more importance to recent data points. Moving averages are effective at filtering out short-term fluctuations and noise, providing a clearer view of the overall trend.
A symmetric moving average \(\left(m_t\right)\) for the observation \(x_t\) is given by \[ m_t=\sum_{i=-k}^k a_i x_{t-i}, \] where \(a_i=a_{-i}\) and \(\sum_{i=-k}^k a_i=1\).
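As a small worked illustration of the formula above, the following sketch applies a symmetric three-point moving average to a short toy series. The weights \(a_{-1}=a_1=1/4\), \(a_0=1/2\) and the data values are arbitrary choices made for this example; they are not taken from the temperature data used below. The sketch relies on stats::filter() with sides = 2, which centres the window on each observation, and compares the result with a direct evaluation of the sum.

```r
# Toy series (hypothetical values, for illustration only)
x <- c(2, 4, 6, 5, 3, 7, 9)

# Symmetric weights for k = 1: a_{-1} = a_1 = 0.25, a_0 = 0.5 (they sum to 1)
a <- c(0.25, 0.5, 0.25)

# sides = 2 centres the filter on each observation;
# the first and last k values are NA because the window is incomplete there
m <- stats::filter(x, filter = a, sides = 2)

# The same quantity computed directly from the formula for t = 2, ..., 6:
# a_{-1} * x_{t+1} + a_0 * x_t + a_1 * x_{t-1}
m_manual <- sapply(2:6, function(t) sum(a * x[(t + 1):(t - 1)]))

print(as.numeric(m))
print(m_manual)
```

Both approaches should agree on the interior points. How the endpoints are treated (here they are left as NA) is a design choice that differs between functions.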
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(zoo)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/Earth_Surface_Temperature.RData"))
str(t_global)
## An xts object on Jan 1750 / Jun 2022 containing:
## Data: double [3270, 2]
## Columns: Monthly_Anomaly_Global, Monthly_Unc_Global
## Index: yearmon [3270] (TZ: "UTC")
# Subset the global monthly temperature anomalies from 1950 onwards
temp_global <- t_global["1950/2025", "Monthly_Anomaly_Global"]
# Save the subset to an RData file
save(temp_global, file = "Earth_Surface_Temperature_Global.RData")
# Load the data from the RData file
load("Earth_Surface_Temperature_Global.RData")
# Calculate 5-year and 10-year moving averages with rollmean();
# the window width is given in months because the series is monthly
temp_global_f5 <- rollmean(temp_global, 12 * 5)
temp_global_f10 <- rollmean(temp_global, 12 * 10)
#plot Earth Surface temperature
plot.zoo(cbind(
temp_global,
temp_global_f5,
temp_global_f10
),
plot.type = "single",
col = c("gray", "green", "red"),
main = "Annual Earth Surface Temperature Variations (Moving Average)",
ylab = "", xlab = ""
)
legend("topleft",
legend = c(
"Original",
"five-year average",
"ten-year average"
),
col = c("gray", "green", "red"),
lty = 1, cex = 0.65
)
Kernel smoothing is a moving average smoother that uses a weight function, also referred to as a kernel, to average the observations. The kernel smoothing function is estimated by \[ \hat{f}_t=\sum_{i=1}^n w_i(t) x_i, \] where \(n\) is the number of observations, \[ w_i(t)=\frac{K\left(\frac{t-i}{b}\right)}{\sum_{j=1}^n K\left(\frac{t-j}{b}\right)} \] are the weights, and \(K(\cdot)\) is a kernel function, typically the normal kernel, \(K(z)=\frac{1}{\sqrt{2 \pi}} \exp \left(-z^2 / 2\right)\). The wider the bandwidth \(b\), the smoother the result. In R, kernel smoothing is available through the ksmooth() function.
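To make the weight formula concrete, here is a minimal sketch that evaluates it directly with the normal kernel, summing over all observations of the series. The helper kernel_smooth(), the simulated series, and the roughness measure are assumptions introduced for this illustration only; none of them come from a package or from the data used in this section.

```r
# Normal-kernel smoother written directly from the weight formula.
# kernel_smooth() is a hypothetical helper, not part of any package.
kernel_smooth <- function(x, b) {
  idx <- seq_along(x)
  sapply(idx, function(t) {
    k <- dnorm((t - idx) / b)  # K((t - i) / b) with the standard normal density
    w <- k / sum(k)            # normalise so that the weights sum to one
    sum(w * x)                 # weighted average of all observations
  })
}

# Simulated toy series: a sine wave plus noise (for illustration only)
set.seed(1)
x <- sin(seq(0, 2 * pi, length.out = 100)) + rnorm(100, sd = 0.3)

f_narrow <- kernel_smooth(x, b = 2)
f_wide <- kernel_smooth(x, b = 10)

# Roughness measured by the sum of squared second differences;
# the wider bandwidth is expected to give the smaller value
roughness <- function(f) sum(diff(f, differences = 2)^2)
print(c(narrow = roughness(f_narrow), wide = roughness(f_wide)))
```

Note that ksmooth() scales its bandwidth argument differently (its documentation states that the kernel quartiles lie at ±0.25 × bandwidth), so the values of b in this sketch are not directly comparable to the bandwidth values passed to ksmooth().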
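# Extract the time index and convert it to numeric format (decimal years)
dt <- index(temp_global)
dt_numeric <- as.numeric(dt)
# Extract the observations as a plain numeric vector
y <- as.numeric(coredata(temp_global))
plot(dt, y,
type = "l",
col = "gray", xlab = "", ylab = "",
main = "Annual Earth Surface Temperature Variations (Kernel Smoothing)"
)
# Narrow bandwidth; the bandwidth is in units of the numeric index (years)
lines(ksmooth(dt_numeric, y, "normal", bandwidth = 5),
col = "red", type = "l"
)
# Wide bandwidth, giving a smoother curve
lines(ksmooth(dt_numeric, y, "normal", bandwidth = 50),
col = "green", type = "l"
)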
legend("topleft",
legend = c("b = 5", "b = 50"),
col = c("red", "green"),
lty = 1,
cex = 0.6
)
LOWESS (locally weighted scatterplot smoothing) is a non-parametric smoothing method that uses points nearby in time to obtain the smoothed estimate of \(f_t\).
First, a certain proportion of the nearest neighbors of \(x_t\) is selected and weighted, so that values closer to \(x_t\) in time receive more weight. Then, a robust weighted regression is fitted to these points to predict \(x_t\), which gives the smoothed estimate of \(f_t\). The degree of smoothing is controlled by the size of the neighborhood: the larger the fraction of nearest neighbors included, the smoother the estimate.
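The two steps described above can be sketched for a single target time as follows. This is a simplified illustration, not the algorithm used by any particular function. The helper local_fit() is hypothetical, the tricube weight function is an assumption of this sketch, and the robustness iterations that downweight outliers are omitted. It also assumes that the fraction f is large enough for at least four neighbors to be selected.

```r
# One locally weighted linear fit at a target time t0.
# local_fit() is a hypothetical helper; robustness iterations are omitted.
local_fit <- function(time, x, t0, f = 0.5) {
  q <- floor(f * length(time))  # number of nearest neighbours to use
  stopifnot(q >= 4)             # the sketch needs several points with positive weight
  d <- abs(time - t0)           # distance in time to the target
  h <- sort(d)[q]               # distance to the q-th nearest neighbour
  # Tricube weights (assumed here): closer points get more weight,
  # points at distance h or beyond get weight zero
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)
  fit <- lm(x ~ time, weights = w)  # weighted least-squares line
  unname(predict(fit, newdata = data.frame(time = t0)))
}

# Simulated series with a linear trend plus noise (for illustration only)
set.seed(42)
time <- 1:100
x <- 0.05 * time + rnorm(100, sd = 0.5)

# Smoothed estimate at t0 = 50 using 30% of the observations
print(local_fit(time, x, t0 = 50, f = 0.3))
```

Repeating this fit for every time point yields a smooth curve. A larger f widens the neighborhood and therefore flattens the curve.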
In R, LOWESS is implemented in the lowess() function. The argument f gives the proportion of data points that influence the smooth at each value; larger values give a smoother curve.
dt <- index(temp_global)
y <- coredata(temp_global)
plot(dt, y, type = "l", col = "gray", xlab = "", ylab = "",
main = "Annual Earth Surface Temperature Variations (LOWESS)", cex.main = 0.85)
lines(lowess(dt, y, f = 0.1), col = "green", type = "l")
lines(lowess(dt, y, f = 0.01), col = "red", type = "l")
legend("topleft", legend = c("f = 0.1", "f = 0.01"),
col = c("green", "red"), lty = 1, cex = 0.55)