Time series analysis examines data points collected or recorded at regular time intervals. It helps reveal patterns, trends, and other behavior in the data over time, and it is also used to forecast future values from past patterns.
Several packages are needed for time series analysis. If they are not yet installed in your R library, install them by typing install.packages() with the package names, in quotes, inside the parentheses.
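As a minimal sketch, the snippet below installs only the packages that are missing. The package list is taken from the library() calls in this tutorial; the variable names `pkgs` and `to_install` are illustrative, and the `interactive()` guard is an added assumption so that nothing is installed during a non-interactive run.

```r
## Packages loaded later in this tutorial (CRAN names)
pkgs <- c("tidyr", "dplyr", "ggplot2", "ggfortify", "ggpubr", "patchwork",
          "forecast", "tseries", "rio", "broom", "lubridate", "rstatix", "vars")

## Keep only the packages that are not installed yet
to_install <- setdiff(pkgs, rownames(installed.packages()))

## Install them (only when R is used interactively, e.g. in RStudio)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Calling install.packages() with a character vector installs all listed packages in one step.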
After installation, load the libraries as follows:
library(tidyr) ## For tidying the data
library(dplyr) ## For data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) ## For advanced graphs
library(ggfortify) ## For autoplot() support of time series and model objects
library(ggpubr) ## For publication-ready ggplot2 helpers
library(patchwork) ## For combining graphs
library(forecast) ## For time series prediction
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
## method from
## autoplot.Arima ggfortify
## autoplot.acf ggfortify
## autoplot.ar ggfortify
## autoplot.bats ggfortify
## autoplot.decomposed.ts ggfortify
## autoplot.ets ggfortify
## autoplot.forecast ggfortify
## autoplot.stl ggfortify
## autoplot.ts ggfortify
## fitted.ar ggfortify
## fortify.ts ggfortify
## residuals.ar ggfortify
##
## Attaching package: 'forecast'
## The following object is masked from 'package:ggpubr':
##
## gghistogram
library(tseries) ## For time series data handling
library(rio) ## For importing the data from Excel
library(broom) ## For tidying model and summary output
library(lubridate) ## For working with dates
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(rstatix) ## For descriptive statistics
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
## Import the data (the Excel file must be in the working directory)
growth <- import("Time series.xlsx")
## Show the data
head(growth)
## year Cambodia France Germany Thailand Uganda
## 1 1960 NA NA NA NA NA
## 2 1961 NA 4.803832 4.298440 5.362146 NA
## 3 1962 NA 6.871699 4.623471 7.554254 NA
## 4 1963 NA 6.198635 2.735296 7.999831 NA
## 5 1964 NA 6.425865 6.639470 6.830996 NA
## 6 1965 NA 4.807918 5.244164 8.181662 NA
## Check the data
summary(growth)
## year Cambodia France Germany
## Min. :1960 Min. :-34.809 Min. :-7.441 Min. :-5.545
## 1st Qu.:1976 1st Qu.: 3.720 1st Qu.: 1.330 1st Qu.: 1.013
## Median :1992 Median : 6.081 Median : 2.443 Median : 2.287
## Mean :1992 Mean : 4.560 Mean : 2.696 Mean : 2.260
## 3rd Qu.:2007 3rd Qu.: 8.019 3rd Qu.: 4.428 3rd Qu.: 3.807
## Max. :2023 Max. : 21.532 Max. : 7.113 Max. : 7.418
## NA's :16 NA's :1 NA's :1
## Thailand Uganda
## Min. :-7.634 Min. :-3.306
## 1st Qu.: 4.183 1st Qu.: 3.962
## Median : 5.534 Median : 5.638
## Mean : 5.552 Mean : 5.637
## 3rd Qu.: 8.102 3rd Qu.: 6.807
## Max. :13.288 Max. :11.523
## NA's :1 NA's :23
## Check missing data: count the NAs in each column
colSums(is.na(growth))
## year Cambodia France Germany Thailand Uganda
## 0 16 1 1 1 23
## Type of data
typeof(growth)
## [1] "list"
## Names of columns
colnames(growth)
## [1] "year" "Cambodia" "France" "Germany" "Thailand" "Uganda"
## Count frequency
growth %>% freq_table(year)
## # A tibble: 64 × 3
## year n prop
## <dbl> <int> <dbl>
## 1 1960 1 1.6
## 2 1961 1 1.6
## 3 1962 1 1.6
## 4 1963 1 1.6
## 5 1964 1 1.6
## 6 1965 1 1.6
## 7 1966 1 1.6
## 8 1967 1 1.6
## 9 1968 1 1.6
## 10 1969 1 1.6
## # ℹ 54 more rows
## Turn the data for Cambodia into a time series
Cam <- ts(growth$Cambodia, start = 1960, frequency = 1)
## Check the data
typeof(Cam)
## [1] "double"
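typeof() reports only the storage type, so it cannot confirm that an object is a time series. As a small sketch, the toy series below (the first four France values printed by head(growth); the object names `x` and `recent` are illustrative) shows checks that do: class(), is.ts(), time(), and window().

```r
## Toy annual series: first four France values from head(growth)
x <- ts(c(NA, 4.803832, 6.871699, 6.198635), start = 1960, frequency = 1)

typeof(x)   ## storage type only: "double"
class(x)    ## "ts"
is.ts(x)    ## TRUE

## The time index implied by start = 1960 and frequency = 1
time(x)

## Keep the observations from 1961 onward (drops the leading NA)
recent <- window(x, start = 1961)
start(recent)
```

The same checks can be applied to Cam to confirm that the years line up with the values.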
## Plot a graph
autoplot(Cam)
## Check stationarity with the ADF test (null hypothesis: unit root, i.e. non-stationary)
Cam1 <- na.remove(Cam)
adf.test(Cam1)
##
## Augmented Dickey-Fuller Test
##
## data: Cam1
## Dickey-Fuller = -2.6215, Lag order = 3, p-value = 0.3259
## alternative hypothesis: stationary
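## p-value > 0.05: non-stationarity cannot be rejected, so difference the data
## (ndiffs() estimates the number of differences needed)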
d_opt <- ndiffs(Cam1)
diff_auto <- diff(Cam1, differences = d_opt)
## Check stationarity again
adf.test(diff_auto)
## Warning in adf.test(diff_auto): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff_auto
## Dickey-Fuller = -4.9353, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
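To illustrate why differencing helps, here is a sketch on simulated data (not the Cambodia series; the seed and object names are assumptions). A random walk is non-stationary, and its first difference recovers the stationary shocks that generated it.

```r
set.seed(123)

## White-noise shocks and the random walk they accumulate into
shocks <- rnorm(200)
walk <- ts(cumsum(shocks))

## First difference: walk[t] - walk[t - 1]
d_walk <- diff(walk, differences = 1)

## One observation is lost by differencing
length(d_walk)

## The differences equal the original shocks (from the second one onward)
all.equal(as.numeric(d_walk), shocks[-1])
```

This is the same operation that diff(Cam1, differences = d_opt) applies to the Cambodia series.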
## diff_auto is stationary (ADF p-value < 0.05); check the number of lags
acf(diff_auto) ## Base plot
ggAcf(diff_auto) ## By ggplot2
## q = 1 → One moving average (MA) term is the best option.
pacf(diff_auto) ## Base plot
ggPacf(diff_auto) ## By ggplot2
## p = 0 → No autoregressive (AR) terms.
## Note:
## ACF: Helps determine the Moving Average (q) order.
## PACF: Helps determine the Auto-Regressive (p) order.
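The note above can be checked against theory with ARMAacf() from the stats package, which returns the theoretical ACF or PACF of a given ARMA model. The coefficients below (-0.7 and 0.6) are arbitrary illustrative values, not estimates from the data.

```r
## Theoretical ACF of an MA(1) process: non-zero at lag 1, zero afterwards
ma_acf <- ARMAacf(ma = -0.7, lag.max = 5)
round(ma_acf, 3)

## Theoretical PACF of an AR(1) process: non-zero at lag 1, zero afterwards
ar_pacf <- ARMAacf(ar = 0.6, lag.max = 5, pacf = TRUE)
round(ar_pacf, 3)
```

In sample plots the cut-off is less sharp, which is why the lag counts read from acf() and pacf() are usually compared with an automatic selection.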
## Use auto.arima() to select the best model
ma <- auto.arima(Cam)
ma
## Series: Cam
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.7770
## s.e. 0.1353
##
## sigma^2 = 69.57: log likelihood = -166.34
## AIC=336.69 AICc=336.96 BIC=340.39
## Forecast the next 10 years
pre_cam <- forecast(ma, h=10)
autoplot(pre_cam)
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_line()`).
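An ARIMA(0,1,1) model without drift, as selected above, produces the same point forecast at every horizon; only the prediction intervals widen. The sketch below shows this on simulated data with base R's arima() and predict() instead of the forecast package; the seed, the coefficient -0.7, and the object names are assumptions for illustration.

```r
set.seed(42)

## Simulate an ARIMA(0,1,1) series with an MA coefficient of -0.7
sim <- arima.sim(model = list(order = c(0, 1, 1), ma = -0.7), n = 300)

## Fit the same model structure and forecast 10 steps ahead
fit <- arima(sim, order = c(0, 1, 1))
fc <- predict(fit, n.ahead = 10)

## Point forecasts are constant across horizons
range(fc$pred)

## Standard errors grow with the horizon
fc$se
```

A flat forecast line in the plot is therefore expected for this model class and does not indicate a plotting problem.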
## Turn the data for all countries into a multivariate time series
growth_rate <- ts(growth[, -1], start = 1960, frequency = 1)
## Plot the graph
autoplot(growth_rate, facets = TRUE, scales = "free_y",
color = "blue", lwd = 0.8)
## Make predictions for all countries
library(vars)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:rstatix':
##
## select
## The following object is masked from 'package:patchwork':
##
## area
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: urca
## Loading required package: lmtest
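## Keep only the years with no missing value in any country
growth_rate1 <- na.remove(growth_rate)
## Choose the lag order by AIC (first element of $selection), then fit the VAR
new <- VARselect(growth_rate1, lag.max = 3, type = 'const')$selection[1]
new.pre <- VAR(growth_rate1, p = new, type = 'const')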
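Removing missing values from a multivariate series drops every year in which at least one country is missing, so the common sample starts at the first fully observed year. A toy sketch with base R's na.omit() (used here in place of tseries::na.remove(); the two columns A and B and their values are made up for illustration):

```r
## Two toy annual series with different numbers of leading NAs
m <- ts(cbind(A = c(NA, NA, 1.2, 0.8, 1.5),
              B = c(NA, 4.8, 6.9, 6.2, 6.4)),
        start = 1960, frequency = 1)

## Drop the leading rows that contain any NA
m_clean <- na.omit(m)

start(m_clean)  ## first year with no missing value in any column
nrow(m_clean)   ## number of fully observed years
```

Checking start() on the cleaned series before fitting helps keep plot titles and reported sample periods consistent with the data actually used.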
## ------------------------------------------
## Calculate the predicted values for 20 periods
ma1 <- predict(new.pre, n.ahead = 20)
## Plot the graph
autoplot(ma1, ts.colour = "blue",
predict.colour = 'red') +
labs(x="Year", y="GDP growth rate", title = "GDP growth rate of five countries\n from 1980 to 2023 and prediction to 2043")+
theme(text = element_text(size=14))
Time series analysis is a valuable tool for examining data points ordered in time, enabling the identification of trends, seasonal patterns, and irregularities. By analyzing these components, it provides insight into historical data and supports forecasts of future values.
It is widely used in fields such as economics, finance, and environmental studies, where forecasting and understanding temporal patterns are crucial. Proper handling of the data, such as addressing missing values and ensuring stationarity, is essential for obtaining reliable results.