Time Series Analysis in R: Step by Step

I. Introduction

Time series analysis examines data points collected or recorded at specific time intervals. It helps reveal patterns, trends, and other behavior in the data over time, and it is also used to forecast future values from past patterns.

II. Steps for Time Series Analysis

a. Collect Data

Gather your data points over time.

b. Prepare Data

Handle missing data and outliers.
Make sure the data is in chronological order.
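
As a minimal sketch of this step, the base R code below sorts a small made-up data frame by year and fills one missing value by linear interpolation. The object names (toy, filled, toy_ts) and the numbers are hypothetical, and linear interpolation is only one of several reasonable ways to handle a gap.

```r
# Hypothetical toy data: yearly values, out of order, with one missing value
toy <- data.frame(
  year  = c(2003, 2001, 2005, 2002, 2004),
  value = c(3.0, 1.0, 5.0, NA, 4.0)
)

# 1. Put the rows in chronological order
toy <- toy[order(toy$year), ]

# 2. Fill the missing value by linear interpolation between its neighbours
#    (approx() ignores the NA pair and interpolates at the requested years)
filled <- approx(x = toy$year, y = toy$value, xout = toy$year)$y

# 3. Convert the result to a yearly time series
toy_ts <- ts(filled, start = min(toy$year), frequency = 1)
toy_ts
```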

c. Explore the Data

Plot the data to see any trends or patterns.
Check for any repeating patterns (seasonality).
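
One possible way to look for seasonality is a classical decomposition. The sketch below uses AirPassengers, a monthly series that ships with base R, rather than the yearly growth data used later in this tutorial (yearly data has frequency 1, so it has no within-year seasonal pattern to extract). The choice of a multiplicative decomposition is an assumption for this example.

```r
# AirPassengers: monthly airline passenger counts from the datasets package
parts <- decompose(AirPassengers, type = "multiplicative")

# Plot the observed series with its trend, seasonal, and random components
plot(parts)

# One seasonal index per month; values above 1 mark months above the trend
round(parts$figure, 2)
```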

d. Check for Stationarity

Make sure the statistical properties of the data, such as its mean and variance, do not change over time.
If they do, transform the data, for example by differencing.
Stationarity can be checked with the Augmented Dickey-Fuller (ADF) test.
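
To illustrate what differencing does, here is a small sketch on simulated data: a random walk is non-stationary, but its first difference recovers the original random steps. The series is simulated and the names (steps, walk, d_walk) are hypothetical. The ADF test is run only if the tseries package is installed.

```r
set.seed(42)

# A random walk is the running sum of random steps, so it wanders over time
steps <- rnorm(200)
walk  <- ts(cumsum(steps))

# First difference: each value minus the previous one
d_walk <- diff(walk)

# Differencing undoes the cumulative sum, giving back the steps (minus the first)
all.equal(as.numeric(d_walk), steps[-1])

# Optional: compare ADF tests before and after differencing
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(walk))
  print(tseries::adf.test(d_walk))
}
```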

e. Choose a Model

Pick a model such as ARIMA or Exponential Smoothing based on the pattern of your data.

f. Fit the Model

Apply the chosen model to your data and find the best-fit parameters.
In R, we can use auto.arima() from the forecast package to select a model automatically.
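
The idea behind automatic selection can be sketched by hand: fit a few candidate models and compare an information criterion. The example below uses base R's arima() and AIC on a simulated MA(1) series. It is a simplified illustration, not a reproduction of what auto.arima() does internally, and the candidate list is an arbitrary assumption.

```r
set.seed(1)

# Simulated series with one moving average term (true model: MA(1))
y <- arima.sim(model = list(ma = -0.6), n = 200)

# Hypothetical shortlist of (p, d, q) orders to compare
candidates <- list(c(0, 0, 0), c(1, 0, 0), c(0, 0, 1), c(1, 0, 1))

# Fit each candidate and record its AIC (lower is better)
aics <- sapply(candidates, function(ord) AIC(arima(y, order = ord)))
names(aics) <- sapply(candidates, paste, collapse = ",")
aics

# Order with the smallest AIC among the candidates
best <- candidates[[which.min(aics)]]
best
```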

g. Evaluate the Model

Check if the model is accurate by comparing its predictions to actual data.
Make sure the model’s errors are random (no patterns).
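
A minimal sketch of both checks, using base R and a simulated AR(1) series: the last 20 points are held out, the model is fitted on the rest, and the forecasts are compared with the held-out values. The Ljung-Box test then checks whether the residuals still show autocorrelation. The split point and the lag of 10 are assumptions chosen for illustration.

```r
set.seed(7)

# Simulated AR(1) series of length 120
y <- arima.sim(model = list(ar = 0.7), n = 120)

# Hold out the last 20 observations for evaluation
train <- window(y, end = 100)
test  <- window(y, start = 101)

# Fit on the training part and forecast over the held-out period
fit <- arima(train, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = length(test))$pred

# Forecast accuracy on the held-out data
rmse <- sqrt(mean((test - fc)^2))
mae  <- mean(abs(test - fc))
c(RMSE = rmse, MAE = mae)

# Ljung-Box test on the residuals; a large p-value is consistent with
# residuals that behave like random noise (fitdf = 1 for the one AR term)
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
```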

h. Forecast

Use the model to predict future values.
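
Forecasts are more useful with a measure of uncertainty. The sketch below fits an AR(1) model to lh, a small series that ships with base R, and builds approximate 95% intervals from the standard errors returned by predict(). The intervals assume roughly normal forecast errors, which is an assumption rather than something checked here.

```r
# lh: a short hormone-level series from the datasets package
fit <- arima(lh, order = c(1, 0, 0))

# Point forecasts and their standard errors for the next 5 periods
p <- predict(fit, n.ahead = 5)

# Approximate 95% prediction intervals under a normality assumption
z     <- qnorm(0.975)
lower <- p$pred - z * p$se
upper <- p$pred + z * p$se

cbind(forecast = p$pred, lower = lower, upper = upper)
```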

III. Time Series Analysis in R

Import libraries

Several packages are needed for time series analysis. If any of them is not yet installed, install it with install.packages(), passing the package name in quotes, for example install.packages("forecast").
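
As a convenience, the following sketch checks which of the packages used in this tutorial are missing. The vector name pkgs is hypothetical, and the install line is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Packages loaded in this tutorial
pkgs <- c("tidyr", "dplyr", "ggplot2", "ggfortify", "ggpubr", "patchwork",
          "forecast", "tseries", "rio", "broom", "lubridate", "rstatix", "vars")

# Keep only the packages that cannot be found on this machine
available  <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!available]
to_install

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```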

After installation, load the libraries as follows:


library(tidyr) ## For tidying the data
library(dplyr) ## For data manipulation

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2) ## For advanced graphs
library(ggfortify) ## For supporting ggplot2
library(ggpubr) ## For supporting ggplot2
library(patchwork) ## For combining graphs
library(forecast) ## For time series prediction

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Registered S3 methods overwritten by 'forecast':
##   method                 from     
##   autoplot.Arima         ggfortify
##   autoplot.acf           ggfortify
##   autoplot.ar            ggfortify
##   autoplot.bats          ggfortify
##   autoplot.decomposed.ts ggfortify
##   autoplot.ets           ggfortify
##   autoplot.forecast      ggfortify
##   autoplot.stl           ggfortify
##   autoplot.ts            ggfortify
##   fitted.ar              ggfortify
##   fortify.ts             ggfortify
##   residuals.ar           ggfortify

## 
## Attaching package: 'forecast'

## The following object is masked from 'package:ggpubr':
## 
##     gghistogram

library(tseries) ## For time series tests such as adf.test()
library(rio) ## For importing the data from Excel
library(broom) ## For tidying output into data frames
library(lubridate) ## For handling dates

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(rstatix) ## For descriptive statistics

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

Import the data from Excel

growth <- import("Time series.xlsx")

## Show the data
head(growth)

##   year Cambodia   France  Germany Thailand Uganda
## 1 1960       NA       NA       NA       NA     NA
## 2 1961       NA 4.803832 4.298440 5.362146     NA
## 3 1962       NA 6.871699 4.623471 7.554254     NA
## 4 1963       NA 6.198635 2.735296 7.999831     NA
## 5 1964       NA 6.425865 6.639470 6.830996     NA
## 6 1965       NA 4.807918 5.244164 8.181662     NA

## Check the data
summary(growth)

##       year         Cambodia           France          Germany      
##  Min.   :1960   Min.   :-34.809   Min.   :-7.441   Min.   :-5.545  
##  1st Qu.:1976   1st Qu.:  3.720   1st Qu.: 1.330   1st Qu.: 1.013  
##  Median :1992   Median :  6.081   Median : 2.443   Median : 2.287  
##  Mean   :1992   Mean   :  4.560   Mean   : 2.696   Mean   : 2.260  
##  3rd Qu.:2007   3rd Qu.:  8.019   3rd Qu.: 4.428   3rd Qu.: 3.807  
##  Max.   :2023   Max.   : 21.532   Max.   : 7.113   Max.   : 7.418  
##                 NA's   :16        NA's   :1        NA's   :1       
##     Thailand          Uganda      
##  Min.   :-7.634   Min.   :-3.306  
##  1st Qu.: 4.183   1st Qu.: 3.962  
##  Median : 5.534   Median : 5.638  
##  Mean   : 5.552   Mean   : 5.637  
##  3rd Qu.: 8.102   3rd Qu.: 6.807  
##  Max.   :13.288   Max.   :11.523  
##  NA's   :1        NA's   :23

## Count missing values per column
tidy(colSums(is.na(growth)))

## Warning in tidy.numeric(colSums(is.na(growth))): 'tidy.numeric' is deprecated.
## See help("Deprecated")

## # A tibble: 6 × 2
##   names        x
##   <chr>    <dbl>
## 1 year         0
## 2 Cambodia    16
## 3 France       1
## 4 Germany      1
## 5 Thailand     1
## 6 Uganda      23

## Storage type of the data (a data frame is stored as a list)
typeof(growth)

## [1] "list"

## Names of columns
colnames(growth)

## [1] "year"     "Cambodia" "France"   "Germany"  "Thailand" "Uganda"

## Count frequency
growth %>% freq_table(year)

## # A tibble: 64 × 3
##     year     n  prop
##    <dbl> <int> <dbl>
##  1  1960     1   1.6
##  2  1961     1   1.6
##  3  1962     1   1.6
##  4  1963     1   1.6
##  5  1964     1   1.6
##  6  1965     1   1.6
##  7  1966     1   1.6
##  8  1967     1   1.6
##  9  1968     1   1.6
## 10  1969     1   1.6
## # ℹ 54 more rows

IV. Time Series Analysis

4.1 Example for one country

## Turn the data for Cambodia into a yearly time series
Cam <- ts(growth$Cambodia, start = 1960, frequency = 1)

## Check the data
typeof(Cam)

## [1] "double"

## Plot a graph
autoplot(Cam)

## Check stationarity (ADF test; a large p-value suggests non-stationarity)
Cam1 <- na.remove(Cam)
adf.test(Cam1)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Cam1
## Dickey-Fuller = -2.6215, Lag order = 3, p-value = 0.3259
## alternative hypothesis: stationary

## Difference the data; ndiffs() estimates how many differences are needed
d_opt <- ndiffs(Cam1)
diff_auto <- diff(Cam1, differences = d_opt)

## Check stationarity again
adf.test(diff_auto)

## Warning in adf.test(diff_auto): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff_auto
## Dickey-Fuller = -4.9353, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

## Check number of lags 
acf(diff_auto) ## Base plot

ggAcf(diff_auto) ## By ggplot2

## q = 1 → One moving average (MA) term is the best option.
pacf(diff_auto) ## Base plot

ggPacf(diff_auto) ## By ggplot2

## p = 0 → No autoregressive (AR) terms.

## Note: 
## ACF: Helps determine the Moving Average (q) order.
## PACF: Helps determine the Auto-Regressive (p) order.
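
To see why the ACF points to the MA order, it can help to look at a case where the answer is known. The sketch below uses base R's ARMAacf() to compute the theoretical ACF and PACF of an MA(1) process, then compares them with sample estimates from a simulated series. The coefficient 0.8 and the name sim_ma are assumptions for illustration.

```r
# Theoretical ACF of an MA(1) process: non-zero at lag 1, zero afterwards
theo_acf <- ARMAacf(ma = 0.8, lag.max = 5)
round(theo_acf, 3)

# Theoretical PACF of the same process: decays gradually instead of cutting off
theo_pacf <- ARMAacf(ma = 0.8, lag.max = 5, pacf = TRUE)
round(theo_pacf, 3)

# Sample ACF from a simulated MA(1) series, for comparison
set.seed(123)
sim_ma <- arima.sim(model = list(ma = 0.8), n = 500)
samp   <- acf(sim_ma, lag.max = 5, plot = FALSE)
round(as.numeric(samp$acf), 3)

# Approximate 95% band used in ACF plots to judge which lags stand out
qnorm(0.975) / sqrt(length(sim_ma))
```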

## Use auto.arima() to select the model automatically
ma <- auto.arima(Cam)
ma

## Series: Cam 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.7770
## s.e.   0.1353
## 
## sigma^2 = 69.57:  log likelihood = -166.34
## AIC=336.69   AICc=336.96   BIC=340.39

## Prediction
pre_cam <- forecast(ma, h=10)
autoplot(pre_cam)

## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_line()`).

4.2 Example for multiple countries

## Turn the data for all countries into a multivariate time series
## (drop the first column, which holds the year)
growth_rate <- ts(growth[, -1], start = 1960, frequency = 1)

## Plot the graph
autoplot(growth_rate, facets = TRUE, scales="free_y",
         color="blue", lwd=0.8)

## Make predictions for all countries
library(vars)

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:rstatix':
## 
##     select

## The following object is masked from 'package:patchwork':
## 
##     area

## The following object is masked from 'package:dplyr':
## 
##     select

## Loading required package: strucchange

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: sandwich

## Loading required package: urca

## Loading required package: lmtest

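## Remove missing values before fitting
growth_rate1 <- na.remove(growth_rate)

## Choose the lag order (first element of $selection, the AIC choice), then fit the VAR
new <- VARselect(growth_rate1, lag.max = 3, type = 'const')$selection[1]
new.pre <- VAR(growth_rate1, p = new, type = 'const')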

## ------------------------------------------
## Calculate the predicted values for 20 periods
ma1 <- predict(new.pre, n.ahead = 20)

## Plot the graph
autoplot(ma1, ts.colour = "blue",
         predict.colour = 'red') + 
  labs(x="Year", y="GDP growth rate", title = "GDP growth rate of five countries\n from 1980 to 2023 and prediction to 2043")+
  theme(text = element_text(size=14))
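
The same select, fit, and predict workflow can be tried on a data set that needs no Excel file. The sketch below uses Canada, a quarterly macroeconomic data set bundled with the vars package, and runs only if vars is installed. The lag limit of 3 and the 4-period horizon are assumptions for this example, and the object names (sel, var_fit, var_fc) are hypothetical.

```r
if (requireNamespace("vars", quietly = TRUE)) {
  # Example data shipped with the vars package
  data("Canada", package = "vars")

  # Lag order suggested by each information criterion
  sel <- vars::VARselect(Canada, lag.max = 3, type = "const")
  print(sel$selection)

  # Fit the VAR with the lag order chosen by AIC
  var_fit <- vars::VAR(Canada, p = sel$selection[["AIC(n)"]], type = "const")

  # Forecast 4 periods ahead; $fcst holds one matrix per variable with
  # the point forecast, lower and upper bounds, and interval half-width
  var_fc <- predict(var_fit, n.ahead = 4)
  print(var_fc$fcst[[1]])
}
```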

V. Conclusion

Time series analysis is a valuable tool for examining data points ordered in time, enabling the identification of trends, seasonal patterns, and irregularities. By analyzing these components, it provides insight into historical data and supports forecasts of future values, whose accuracy should always be checked against actual data.

It is widely used in fields such as economics, finance, and environmental studies, where forecasting and understanding temporal patterns are crucial. Proper handling of the data, such as addressing missing values and ensuring stationarity, is essential for obtaining reliable results.