Packages for time series analysis

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## Loading required package: TTR
library(xts)
library(PerformanceAnalytics)
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
## 
##     legend
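The xts warning above is worth acting on: once dplyr is attached, a bare lag() call no longer refers to the base R function. A minimal sketch, using only base R and the built-in AirPassengers data, of calling the stats version explicitly so the result does not depend on the order in which packages were loaded:

```r
# Call the stats version explicitly so dplyr's masking cannot change the result.
ap <- datasets::AirPassengers

# stats::lag() shifts the time index of a ts object;
# k = -1 delays the series by one observation (one month here).
ap_lag1 <- stats::lag(ap, k = -1)

start(ap)       # first time point of the original series
start(ap_lag1)  # same values, time index moved one month later
```

The same pattern applies to stats::filter(), which dplyr also masks.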

data("AirPassengers")
ap<-AirPassengers
class(ap)
## [1] "ts"
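Because ap is a ts object, it carries its own time index. A short sketch of how to inspect that index with base R before plotting or modelling:

```r
ap <- datasets::AirPassengers

start(ap)      # first time point as c(year, month)
end(ap)        # last time point as c(year, month)
frequency(ap)  # observations per year (12 = monthly data)
length(ap)     # total number of observations
```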

Time series plots

plot(ap)

plot(decompose(ap))

From these plots we can see an upward trend and a clear seasonal pattern, so a seasonal ARIMA (SARIMA) model is a natural candidate for this series.

Removing seasonality

Decomposing data in time series analysis involves breaking down a time series dataset into its constituent components. These components provide insight into the underlying patterns, trends, and variations present in the data. The main purpose of decomposing time series data is to better understand and model the data, making it easier to analyze, forecast, and interpret. The decomposition process typically involves separating the time series into three main components: trend, seasonality, and noise (or residuals).

  1. Trend: The trend component represents the long-term, overall movement or direction of the time series data. It captures the underlying growth or decline in the data over time. By isolating the trend, analysts can identify the general pattern or trajectory of the data, which can help in understanding the underlying behavior and making long-term predictions.

  2. Seasonality: Seasonality refers to the repeating patterns that occur at regular intervals within the time series data. These patterns might correspond to seasonal effects, such as monthly or yearly cycles, and can be caused by factors like holidays, weather, or other calendar-related events. By separating out the seasonality component, analysts can better identify and analyze the recurring patterns within the data, which is essential for short- to medium-term forecasting.

  3. Noise (Residuals): The noise component, also known as residuals or errors, represents the random fluctuations or irregularities that are not captured by the trend and seasonality components. Noise can be caused by various factors like measurement errors, random events, or other external influences that are difficult to model explicitly. Decomposing the data helps separate these random variations from the underlying patterns, making it easier to spot anomalies and assess the quality of the model’s fit.

Decomposing time series data provides several benefits:

A. Modeling and Forecasting: Once the trend and seasonality are isolated, it becomes easier to model and forecast the data. Trends can help in understanding long-term behavior, while seasonality aids in short- to medium-term predictions.

B. Anomaly Detection: By removing the regular patterns and focusing on the noise component, analysts can more easily identify anomalies or unusual events that deviate from the expected behavior.

C. Data Interpretation: Decomposition enhances data interpretation by breaking down the complex time series into its fundamental components. This can reveal insights about the driving factors behind the data’s behavior.

D. Model Improvement: Understanding the different components of the data can guide the selection and fine-tuning of appropriate forecasting models that effectively capture the various patterns.

E. Data Cleaning: Decomposition can help identify and handle data quality issues such as missing values, outliers, or erroneous measurements, leading to more accurate analysis and forecasts.

Overall, decomposing time series data is a fundamental step in time series analysis that facilitates a deeper understanding of the data’s underlying structure, aiding in better forecasting, decision-making, and insight extraction.
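The anomaly-detection idea can be sketched with base R alone: decompose the series, standardise the remainder, and flag points that sit far from its mean. The threshold of 2.5 standard deviations is an arbitrary assumption for illustration, not a recommended rule.

```r
ap  <- datasets::AirPassengers
dec <- decompose(ap, type = "multiplicative")

# The remainder is NA at both ends because the trend is a centred moving average.
rem <- dec$random

# Standardise the remainder and flag unusually large deviations.
z         <- (rem - mean(rem, na.rm = TRUE)) / sd(rem, na.rm = TRUE)
threshold <- 2.5  # assumption: chosen only for this illustration
flagged   <- which(abs(z) > threshold)

time(ap)[flagged]  # time points (in decimal years) of the flagged observations
```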

ap.decomp<-decompose(ap,type = "mult")
plot(ap.decomp)

plot(ap.decomp$trend,main="Trend")

plot(ap.decomp$seasonal,main="seasonality")

plot(ap.decomp$random,main="Random (remainder)")
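In a multiplicative decomposition the three components multiply back to the original series. A small check of that identity, using base R only (the variable names here are illustrative):

```r
ap  <- datasets::AirPassengers
dec <- decompose(ap, type = "multiplicative")

# Rebuild the series from its components: trend * seasonal * random.
recon <- dec$trend * dec$seasonal * dec$random

# The first and last six months are NA because the trend is a centred
# 12-month moving average, so compare only where the trend exists.
ok <- !is.na(recon)
max(abs(recon[ok] - ap[ok]))  # should be numerically zero

# The 12 seasonal factors are scaled to average 1.
mean(dec$figure)
```

For an additive decomposition the same check would use a sum instead of a product.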

Checking whether the data are normally distributed

hist(ap,main = "HISTOGRAM")
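A histogram gives only a visual impression. As a rough numerical complement, one could run a Shapiro-Wilk test on the raw and log-transformed values. This is only a sketch: the test assumes independent observations, which a time series does not satisfy, so the p-values should be read as indicative at best.

```r
ap <- datasets::AirPassengers

# Shapiro-Wilk test: the null hypothesis is that the values are normally distributed.
sw_raw <- shapiro.test(as.numeric(ap))
sw_log <- shapiro.test(as.numeric(log(ap)))

c(raw = sw_raw$p.value, log = sw_log$p.value)
```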

Making the data stationary

Using log transformation

AP<-log(ap)
plot(AP)
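The reason for taking logs is that the seasonal swings grow with the level of the series. A quick way to see this, sketched with base R, is to compare the within-year standard deviation at the start and end of the sample before and after the transformation:

```r
ap <- datasets::AirPassengers

# aggregate() on a monthly ts collapses it to one value per year by default.
sd_raw <- aggregate(ap, FUN = sd)
sd_log <- aggregate(log(ap), FUN = sd)

# Ratio of the last year's spread (1960) to the first year's spread (1949).
ratio_raw <- as.numeric(sd_raw[12] / sd_raw[1])
ratio_log <- as.numeric(sd_log[12] / sd_log[1])

c(raw = ratio_raw, log = ratio_log)
```

A ratio much closer to 1 on the log scale indicates that the transformation has roughly stabilised the variance.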

acf(AP)

pacf(AP)

The ACF decays very slowly, which shows that the data are non-stationary, so we need to make them stationary.

Differencing

d<-diff(AP)
plot(d,main="Differenced plot")

acf(d)

pacf(d)

hist(d)

qqnorm(d)
qqline(d)

The differenced series still shows strong autocorrelation at the seasonal lags, so a seasonal model (SARIMA) is appropriate.
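The effect of differencing can also be checked numerically rather than by eye. The sketch below, base R only, compares the lag-1 autocorrelation before and after differencing and looks at what remains at the seasonal lag; the helper lag1_acf() is introduced here purely for illustration.

```r
AP <- log(datasets::AirPassengers)

d1    <- diff(AP)             # first difference: removes the trend
d1_12 <- diff(d1, lag = 12)   # additional seasonal difference at lag 12

# Helper (illustrative): lag-1 autocorrelation of a series.
lag1_acf <- function(x) acf(x, plot = FALSE)$acf[2]

c(level = lag1_acf(AP), differenced = lag1_acf(d1))

# Autocorrelation of the first-differenced series at lag 12 (one year).
acf(d1, lag.max = 12, plot = FALSE)$acf[13]
```

A large value at lag 12 after first differencing is what motivates the extra seasonal difference in a SARIMA model.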

Finding the best ARIMA model

model<-auto.arima(AP)
summary(model)
## Series: AP 
## ARIMA(0,1,1)(0,1,1)[12] 
## 
## Coefficients:
##           ma1     sma1
##       -0.4018  -0.5569
## s.e.   0.0896   0.0731
## 
## sigma^2 = 0.001371:  log likelihood = 244.7
## AIC=-483.4   AICc=-483.21   BIC=-474.77
## 
## Training set error measures:
##                        ME       RMSE        MAE        MPE      MAPE      MASE
## Training set 0.0005730622 0.03504883 0.02626034 0.01098898 0.4752815 0.2169522
##                    ACF1
## Training set 0.01443892
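The selected model, ARIMA(0,1,1)(0,1,1)[12], has one regular and one seasonal difference plus one regular and one seasonal moving-average term. As a cross-check, the same specification can be fitted directly with base R's arima(); this sketch does not use auto.arima(), and its coefficients should land close to the ma1 and sma1 values reported above.

```r
AP <- log(datasets::AirPassengers)

# order    = (p, d, q) for the non-seasonal part
# seasonal = (P, D, Q) with a period of 12 months
fit <- arima(AP,
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

round(coef(fit), 4)
```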

Checking the ACF and PACF for residuals

acf(model$residuals)

pacf(model$residuals)

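Box-Pierce test

Note that Box.test() runs the Box-Pierce test at lag 1 by default, as the output below shows; pass type = "Ljung-Box" to get the Ljung-Box version.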

Box.test(model$residuals)
## 
##  Box-Pierce test
## 
## data:  model$residuals
## X-squared = 0.030021, df = 1, p-value = 0.8624

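The p-value (0.8624) is greater than 0.05, so we cannot reject the null hypothesis that the residuals are uncorrelated (tested here at lag 1 only). The model can therefore be used.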
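A stricter version of this check tests several lags at once and corrects the degrees of freedom for the estimated parameters. The sketch below refits the same model with base R's arima() so that it is self-contained; the choice of 24 lags is an assumption (two seasonal periods), and fitdf = 2 accounts for the ma1 and sma1 coefficients.

```r
AP  <- log(datasets::AirPassengers)
fit <- arima(AP,
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Ljung-Box test over the first 24 lags, with degrees of freedom
# reduced by the number of estimated ARMA coefficients (2).
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 2)
lb
```

As before, a p-value above 0.05 would indicate no evidence of remaining autocorrelation in the residuals.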

Histogram plots for residuals

hist(model$residuals,col = "lightblue")
lines(density(model$residuals))

Forecasting

f<-forecast(model,50)
autoplot(f)

Checking the forecast values

# Print f to see the point forecasts and prediction intervals
f
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1961       6.110186 6.062729 6.157642 6.037607 6.182764
## Feb 1961       6.053775 5.998476 6.109074 5.969203 6.138347
## Mar 1961       6.171715 6.109555 6.233874 6.076650 6.266779
## Apr 1961       6.199300 6.130966 6.267635 6.094792 6.303809
## May 1961       6.232556 6.158560 6.306552 6.119388 6.345724
## Jun 1961       6.368779 6.289524 6.448033 6.247569 6.489988
## Jul 1961       6.507294 6.423109 6.591479 6.378544 6.636044
## Aug 1961       6.502906 6.414064 6.591749 6.367034 6.638779
## Sep 1961       6.324698 6.231431 6.417965 6.182058 6.467338
## Oct 1961       6.209008 6.111516 6.306500 6.059908 6.358109
## Nov 1961       6.063487 5.961947 6.165028 5.908195 6.218780
## Dec 1961       6.168025 6.062591 6.273459 6.006778 6.329272
## Jan 1962       6.206435 6.089996 6.322874 6.028358 6.384512
## Feb 1962       6.150025 6.026590 6.273459 5.961248 6.338801
## Mar 1962       6.267964 6.137910 6.398018 6.069064 6.466865
## Apr 1962       6.295550 6.159197 6.431903 6.087016 6.504084
## May 1962       6.328805 6.186432 6.471179 6.111064 6.546547
## Jun 1962       6.465028 6.316878 6.613177 6.238453 6.691603
## Jul 1962       6.603543 6.449834 6.757252 6.368466 6.838620
## Aug 1962       6.599156 6.440082 6.758229 6.355874 6.842438
## Sep 1962       6.420947 6.256684 6.585211 6.169728 6.672167
## Oct 1962       6.305257 6.135963 6.474552 6.046344 6.564171
## Nov 1962       6.159737 5.985557 6.333917 5.893352 6.426122
## Dec 1962       6.264274 6.085342 6.443206 5.990621 6.537927
## Jan 1963       6.302684 6.113318 6.492050 6.013074 6.592295
## Feb 1963       6.246274 6.049484 6.443063 5.945310 6.547238
## Mar 1963       6.364213 6.160270 6.568157 6.052309 6.676118
## Apr 1963       6.391799 6.180945 6.602653 6.069325 6.714273
## May 1963       6.425054 6.207509 6.642600 6.092347 6.757762
## Jun 1963       6.561277 6.337239 6.785315 6.218641 6.903913
## Jul 1963       6.699792 6.469446 6.930139 6.347508 7.052077
## Aug 1963       6.695405 6.458918 6.931892 6.333729 7.057081
## Sep 1963       6.517197 6.274724 6.759669 6.146367 6.888026
## Oct 1963       6.401507 6.153193 6.649820 6.021744 6.781269
## Nov 1963       6.255986 6.001966 6.510006 5.867496 6.644476
## Dec 1963       6.360523 6.100922 6.620125 5.963497 6.757550
## Jan 1964       6.398933 6.128835 6.669032 5.985853 6.812014
## Feb 1964       6.342523 6.064450 6.620597 5.917246 6.767800
## Mar 1964       6.460463 6.174637 6.746289 6.023329 6.897596
## Apr 1964       6.488048 6.194674 6.781422 6.039372 6.936725
## May 1964       6.521304 6.220572 6.822036 6.061374 6.981234
## Jun 1964       6.657526 6.349612 6.965441 6.186612 7.128441
## Jul 1964       6.796042 6.481108 7.110975 6.314392 7.277691
## Aug 1964       6.791654 6.469855 7.113453 6.299505 7.283804
## Sep 1964       6.613446 6.284924 6.941968 6.111016 7.115876
## Oct 1964       6.497756 6.162647 6.832865 5.985251 7.010261
## Nov 1964       6.352235 6.010666 6.693805 5.829850 6.874621
## Dec 1964       6.456773 6.108863 6.804683 5.924690 6.988855
## Jan 1965       6.495183 6.136525 6.853841 5.946663 7.043703
## Feb 1965       6.438772 6.071582 6.805962 5.877204 7.000341
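The Lo and Hi columns are prediction intervals built from the point forecast and its standard error. A sketch of that calculation with base R's arima() and predict(), not the forecast package itself: the point forecasts should match the table closely, while the interval bounds may differ slightly if the two packages estimate the residual variance differently.

```r
AP  <- log(datasets::AirPassengers)
fit <- arima(AP,
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Point forecasts and standard errors for the next 50 months (log scale).
p <- predict(fit, n.ahead = 50)

# 95% interval: point forecast +/- z * standard error.
z95     <- qnorm(0.975)
lower95 <- p$pred - z95 * p$se
upper95 <- p$pred + z95 * p$se

head(cbind(mean = p$pred, lower95, upper95), 3)
```

The standard error grows with the horizon, which is why the intervals widen further into the future.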

All of these values are on the log scale, so we must convert them back to the original scale.

Converting the forecast back to the original scale

f$x<-exp(f$x)
f$mean<-exp(f$mean)
f$lower<-exp(f$lower)
f$upper<-exp(f$upper)
autoplot(f)

f
##          Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## Jan 1961       450.4224 429.5461  472.3132 418.8895  484.3289
## Feb 1961       425.7172 402.8146  449.9219 391.1938  463.2874
## Mar 1961       479.0068 450.1386  509.7265 435.5677  526.7781
## Apr 1961       492.4045 459.8801  527.2290 443.5416  546.6503
## May 1961       509.0550 472.7467  548.1518 454.5866  570.0497
## Jun 1961       583.3449 538.8968  631.4591 516.7552  658.5155
## Jul 1961       670.0108 615.9149  728.8579 589.0693  762.0740
## Aug 1961       667.0776 610.3693  729.0546 582.3280  764.1613
## Sep 1961       558.1894 508.4826  612.7552 483.9871  643.7679
## Oct 1961       497.2078 451.0221  548.1230 428.3358  577.1537
## Nov 1961       429.8720 388.3656  475.8144 368.0412  502.0903
## Dec 1961       477.2426 429.4869  530.3083 406.1725  560.7482
## Jan 1962       495.9301 441.4199  557.1717 415.0328  592.5957
## Feb 1962       468.7289 414.3000  530.3084 388.0942  566.1170
## Mar 1962       527.4025 463.0847  600.6535 432.2757  643.4631
## Apr 1962       542.1538 473.0479  621.3551 440.1061  667.8634
## May 1962       560.4865 486.1084  646.2450 450.8180  696.8336
## Jun 1962       642.2823 553.8414  744.8460 512.0656  805.6127
## Jul 1962       737.7043 632.5975  860.2747 583.1625  933.2006
## Aug 1962       734.4748 626.4582  861.1161 575.8651  936.7700
## Sep 1962       614.5852 521.4868  724.3039 478.0561  790.1058
## Oct 1962       547.4424 462.1839  648.4284 422.5653  709.2234
## Nov 1962       473.3034 397.6439  563.3587 362.6186  617.7735
## Dec 1962       525.4600 439.3700  628.4185 399.6627  690.8531
## Jan 1963       546.0356 451.8355  659.8749 408.7378  729.4528
## Feb 1963       516.0862 423.8943  628.3287 381.9577  697.3152
## Mar 1963       580.6879 473.5560  712.0560 425.0935  793.2335
## Apr 1963       596.9295 483.4484  737.0482 432.3888  824.0842
## May 1963       617.1144 496.4628  767.0871 442.4585  860.7139
## Jun 1963       707.1743 565.2338  884.7586 502.0206  996.1653
## Jul 1963       812.2371 645.1260 1022.6360 571.0676 1155.2557
## Aug 1963       808.6813 638.3697 1024.4306 563.2530 1161.0511
## Sep 1963       676.6788 530.9799  862.3569 467.0177  980.4645
## Oct 1963       602.7524 470.2164  772.6452 412.2969  881.1864
## Nov 1963       521.1229 404.2226  671.8305 353.3629  768.5275
## Dec 1963       578.5491 446.2690  750.0388 388.9680  860.5310
## Jan 1964       601.2035 458.9012  787.6328 397.7618  908.6987
## Feb 1964       568.2282 430.2857  750.3926 371.3876  869.3969
## Mar 1964       639.3568 480.4084  850.8949 412.9512  989.8919
## Apr 1964       657.2393 490.1318  881.3210 419.6293 1029.3931
## May 1964       679.4636 502.9907  917.8517 428.9643 1076.2454
## Jun 1964       778.6226 572.2705 1059.3821 486.1959 1246.9319
## Jul 1964       894.3002 652.6938 1225.3417 552.4662 1447.6412
## Aug 1964       890.3852 645.3901 1228.3823 544.3022 1456.5176
## Sep 1964       745.0460 536.4238 1034.8042 450.7963 1231.3622
## Oct 1964       663.6506 474.6828  927.8451 397.5222 1107.9434
## Nov 1964       573.7738 407.7547  807.3882 340.3076  967.4083
## Dec 1964       637.0019 449.8268  902.0614 374.1625 1084.4790
## Jan 1965       661.9452 462.4437  947.5130 382.4748 1145.6216
## Feb 1965       625.6382 433.3659  903.2165 356.8101 1097.0070
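One subtlety of this back-transformation: taking exp() of a log-scale point forecast gives the median of the forecast distribution on the original scale, not its mean. A sketch using the Jan 1961 row from the log-scale table above, under the usual assumption that the forecast errors are normal on the log scale:

```r
# Jan 1961 values copied from the log-scale forecast table.
mu   <- 6.110186   # point forecast
lo95 <- 6.037607   # lower 95% bound
hi95 <- 6.182764   # upper 95% bound

# Recover the forecast standard error from the width of the 95% interval.
se <- (hi95 - lo95) / (2 * qnorm(0.975))

median_fc <- exp(mu)             # what exp() of the point forecast gives
mean_fc   <- exp(mu + se^2 / 2)  # lognormal mean (bias-adjusted)
interval  <- exp(c(lo95, hi95))  # interval bounds transform directly

c(median = median_fc, mean = mean_fc, lower = interval[1], upper = interval[2])
```

The forecast package can also handle this without editing the forecast object by hand: passing lambda = 0 to auto.arima() fits the model on the log scale and lets forecast() back-transform automatically, with a biasadj option for the mean.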

We can see from the graph and the table that the values are now back on the original scale (monthly passenger totals, in thousands).

Sources for time series data

There are several sources where you can obtain time series data for various purposes, including research, analysis, and modeling. Here are some popular sources you can explore:

1. Government and Economic Organizations:

a. U.S. Federal Reserve Economic Data (FRED): Provides a wide range of economic data for the United States.
b. Bureau of Economic Analysis (BEA): Offers economic data on GDP, income, trade, and more.
c. European Central Bank (ECB): Provides economic and financial data for the European Union.
d. World Bank Data: Offers global economic, social, and environmental data.
e. U.S. Census Bureau: Provides demographic and economic data for the United States.

2. Financial Markets and Investing:

a. Yahoo Finance: Offers historical stock prices, indices, and other financial data.
b. Quandl (now Nasdaq Data Link): Provides a vast collection of financial, economic, and alternative data.
c. Alpha Vantage: Offers free API access to historical and real-time financial data.
d. Investing.com: Provides a variety of financial data and charts.

3. Academic and Research Databases:

a. UCI Machine Learning Repository: Offers various datasets, including time series data, for machine learning and research.
b. Kaggle Datasets: A platform that hosts various datasets, including time series data, often used in data science competitions.
c. Time Series Data Library (TSDL): A collection of time series datasets for research purposes, originally compiled by Rob Hyndman.

4. Climate and Environmental Data:

a. National Oceanic and Atmospheric Administration (NOAA): Offers climate, weather, and environmental data.
b. Climate Data Online (CDO): Provides access to historical weather and climate data.
c. Global Historical Climatology Network (GHCN): Offers global climate data.

5. Health and Medical Data:

a. World Health Organization (WHO): Offers health-related data, including disease statistics and health indicators.
b. National Institutes of Health (NIH): Provides medical research data and clinical trial information.

6. Social Media and Web Data:

a. Twitter API: Offers access to real-time and historical tweets, which can be used for sentiment analysis and trend detection.
b. Reddit API: Provides access to Reddit posts and comments for analysis.

7. Energy and Power Data:

a. U.S. Energy Information Administration (EIA): Offers energy-related data, including consumption, production, and prices.

8. Transportation and Mobility Data:

a. Uber Movement: Provides anonymized data related to urban mobility and travel times.
b. Transit Agencies: Many cities offer open data portals with transportation-related information.

When using data from these sources, make sure to review the terms of use, data formats, and any restrictions associated with the data. Additionally, if you’re looking for specific types of time series data, consider searching for dedicated repositories or APIs related to your field of interest.
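Data from sources like these usually arrives as a table of dates and values rather than as a ts object. A minimal sketch of the conversion step with base R; the data frame below is a placeholder with made-up numbers standing in for a downloaded file, and the column names are assumptions.

```r
# Placeholder data: 24 monthly rows standing in for a downloaded series.
df <- data.frame(
  month = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  value = 100 + seq_len(24)   # made-up numbers, not real data
)

# Build a monthly ts object: give the first period and the number of
# observations per year.
x <- ts(df$value, start = c(2020, 1), frequency = 12)

# Select a sub-period by date.
x_2021 <- window(x, start = c(2021, 1))
x_2021
```

Once the series is a ts object, the same plot(), decompose(), and auto.arima() steps used for AirPassengers apply.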

Published by
AARON CHOLA