library(dplyr)
library(ggplot2)
library(forecast)
library(quantmod)
library(xts)
library(PerformanceAnalytics)
data("AirPassengers")
ap<-AirPassengers
class(ap)
## [1] "ts"
plot(ap)
plot(decompose(ap))
From these plots we can see a clear upward trend and strong yearly seasonality. Hence a seasonal ARIMA (SARIMA) model is a good candidate for this series.
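As an optional aside (not part of the original workflow), the forecast package can display the seasonality directly:
ggseasonplot(ap)    # one line per year, months on the x-axis
ggsubseriesplot(ap) # one mini-plot per month across the years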
Decomposing data in time series analysis involves breaking down a time series dataset into its constituent components. These components provide insight into the underlying patterns, trends, and variations present in the data. The main purpose of decomposing time series data is to better understand and model the data, making it easier to analyze, forecast, and interpret. The decomposition process typically involves separating the time series into three main components: trend, seasonality, and noise (or residuals).
Trend: The trend component represents the long-term, overall movement or direction of the time series data. It captures the underlying growth or decline in the data over time. By isolating the trend, analysts can identify the general pattern or trajectory of the data, which can help in understanding the underlying behavior and making long-term predictions.
Seasonality: Seasonality refers to the repeating patterns that occur at regular intervals within the time series data. These patterns might correspond to seasonal effects, such as monthly or yearly cycles, and can be caused by factors like holidays, weather, or other calendar-related events. By separating out the seasonality component, analysts can better identify and analyze the recurring patterns within the data, which is essential for short- to medium-term forecasting.
Noise (Residuals): The noise component, also known as residuals or errors, represents the random fluctuations or irregularities that are not captured by the trend and seasonality components. Noise can be caused by various factors like measurement errors, random events, or other external influences that are difficult to model explicitly. Decomposing the data helps separate these random variations from the underlying patterns, making it easier to spot anomalies and assess the quality of the model’s fit.
Decomposing time series data provides several benefits:
A. Modeling and Forecasting: Once the trend and seasonality are isolated, it becomes easier to model and forecast the data. Trends help in understanding long-term behavior, while seasonality aids in short- to medium-term predictions.
B. Anomaly Detection: By removing the regular patterns and focusing on the noise component, analysts can more easily identify anomalies or unusual events that deviate from the expected behavior.
C. Data Interpretation: Decomposition enhances data interpretation by breaking down the complex time series into its fundamental components. This can reveal insights about the driving factors behind the data's behavior.
D. Model Improvement: Understanding the different components of the data can guide the selection and fine-tuning of appropriate forecasting models that effectively capture the various patterns.
E. Data Cleaning: Decomposition can help identify and handle data quality issues such as missing values, outliers, or erroneous measurements, leading to more accurate analysis and forecasts.
Overall, decomposing time series data is a fundamental step in time series analysis that facilitates a deeper understanding of the data's underlying structure, aiding in better forecasting, decision-making, and insight extraction. A sketch of an alternative, loess-based decomposition follows below.
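As a sketch of how decomposition is used in practice, stl() offers a loess-based alternative to decompose(), and forecast::seasadj() returns the seasonally adjusted series (this example is an addition, not from the original post):
# Optional: loess-based decomposition of the log series, then seasonal adjustment
ap_stl <- stl(log(ap), s.window = "periodic")
plot(ap_stl)
plot(seasadj(ap_stl), main = "Seasonally adjusted log series")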
ap.decomp <- decompose(ap, type = "mult") # multiplicative: seasonal swings grow with the level
plot(ap.decomp)
plot(ap.decomp$trend, main = "Trend")
plot(ap.decomp$seasonal, main = "Seasonality")
plot(ap.decomp$random, main = "Random (residuals)")
hist(ap, main = "Histogram of AirPassengers")
AP <- log(ap) # log transform to stabilise the growing variance
plot(AP)
acf(AP)
pacf(AP)
The slowly decaying ACF shows that the data are non-stationary, so we difference the series to make it stationary.
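Before differencing by hand, forecast can suggest how many regular and seasonal differences are needed (an optional check; auto.arima() runs the same unit-root tests internally):
ndiffs(AP)  # suggested number of regular (first) differences
nsdiffs(AP) # suggested number of seasonal differences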
d <- diff(AP) # first (regular) difference of the log series
plot(d,main="Differenced plot")
acf(d)
pacf(d)
hist(d)
qqnorm(d)
qqline(d)
The differenced series now fluctuates around a constant mean, while seasonal correlation remains at lag 12, so a SARIMA model is appropriate.
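A formal unit-root test can back up the visual check; a minimal sketch using the tseries package (an addition here: tseries is not loaded above and is assumed to be installed):
library(tseries)
adf.test(d) # augmented Dickey-Fuller test; a small p-value supports stationarity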
model<-auto.arima(AP)
summary(model)
## Series: AP
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 = 0.001371: log likelihood = 244.7
## AIC=-483.4 AICc=-483.21 BIC=-474.77
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0005730622 0.03504883 0.02626034 0.01098898 0.4752815 0.2169522
## ACF1
## Training set 0.01443892
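The error measures above are computed on the training data and can therefore be optimistic. A minimal out-of-sample check, holding back the last three years as a test set (the split point is my choice, not from the original post):
# Optional: refit on 1949-1957 and evaluate on 1958-1960
train <- window(AP, end = c(1957, 12))
test <- window(AP, start = c(1958, 1))
fit <- auto.arima(train)
accuracy(forecast(fit, h = length(test)), test) # reports test-set RMSE, MAE, MAPE, etc.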
acf(model$residuals)
pacf(model$residuals)
Box.test(model$residuals)
##
## Box-Pierce test
##
## data: model$residuals
## X-squared = 0.030021, df = 1, p-value = 0.8624
Since the p-value (0.8624) is well above 0.05, we fail to reject the null hypothesis of independent residuals: the residuals behave like white noise, so the model is adequate for forecasting.
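The forecast package bundles these diagnostics into one call: checkresiduals() plots the residuals, their ACF, and a histogram, and runs a Ljung-Box test whose degrees of freedom account for the number of estimated parameters:
checkresiduals(model)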
hist(model$residuals, col = "lightblue", freq = FALSE) # freq = FALSE puts the histogram on a density scale
lines(density(model$residuals)) # so the density curve overlays correctly
f <- forecast(model, h = 50) # forecast 50 months ahead
autoplot(f)
# Print the point forecasts with their 80% and 95% prediction intervals
f
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1961 6.110186 6.062729 6.157642 6.037607 6.182764
## Feb 1961 6.053775 5.998476 6.109074 5.969203 6.138347
## Mar 1961 6.171715 6.109555 6.233874 6.076650 6.266779
## Apr 1961 6.199300 6.130966 6.267635 6.094792 6.303809
## May 1961 6.232556 6.158560 6.306552 6.119388 6.345724
## Jun 1961 6.368779 6.289524 6.448033 6.247569 6.489988
## Jul 1961 6.507294 6.423109 6.591479 6.378544 6.636044
## Aug 1961 6.502906 6.414064 6.591749 6.367034 6.638779
## Sep 1961 6.324698 6.231431 6.417965 6.182058 6.467338
## Oct 1961 6.209008 6.111516 6.306500 6.059908 6.358109
## Nov 1961 6.063487 5.961947 6.165028 5.908195 6.218780
## Dec 1961 6.168025 6.062591 6.273459 6.006778 6.329272
## Jan 1962 6.206435 6.089996 6.322874 6.028358 6.384512
## Feb 1962 6.150025 6.026590 6.273459 5.961248 6.338801
## Mar 1962 6.267964 6.137910 6.398018 6.069064 6.466865
## Apr 1962 6.295550 6.159197 6.431903 6.087016 6.504084
## May 1962 6.328805 6.186432 6.471179 6.111064 6.546547
## Jun 1962 6.465028 6.316878 6.613177 6.238453 6.691603
## Jul 1962 6.603543 6.449834 6.757252 6.368466 6.838620
## Aug 1962 6.599156 6.440082 6.758229 6.355874 6.842438
## Sep 1962 6.420947 6.256684 6.585211 6.169728 6.672167
## Oct 1962 6.305257 6.135963 6.474552 6.046344 6.564171
## Nov 1962 6.159737 5.985557 6.333917 5.893352 6.426122
## Dec 1962 6.264274 6.085342 6.443206 5.990621 6.537927
## Jan 1963 6.302684 6.113318 6.492050 6.013074 6.592295
## Feb 1963 6.246274 6.049484 6.443063 5.945310 6.547238
## Mar 1963 6.364213 6.160270 6.568157 6.052309 6.676118
## Apr 1963 6.391799 6.180945 6.602653 6.069325 6.714273
## May 1963 6.425054 6.207509 6.642600 6.092347 6.757762
## Jun 1963 6.561277 6.337239 6.785315 6.218641 6.903913
## Jul 1963 6.699792 6.469446 6.930139 6.347508 7.052077
## Aug 1963 6.695405 6.458918 6.931892 6.333729 7.057081
## Sep 1963 6.517197 6.274724 6.759669 6.146367 6.888026
## Oct 1963 6.401507 6.153193 6.649820 6.021744 6.781269
## Nov 1963 6.255986 6.001966 6.510006 5.867496 6.644476
## Dec 1963 6.360523 6.100922 6.620125 5.963497 6.757550
## Jan 1964 6.398933 6.128835 6.669032 5.985853 6.812014
## Feb 1964 6.342523 6.064450 6.620597 5.917246 6.767800
## Mar 1964 6.460463 6.174637 6.746289 6.023329 6.897596
## Apr 1964 6.488048 6.194674 6.781422 6.039372 6.936725
## May 1964 6.521304 6.220572 6.822036 6.061374 6.981234
## Jun 1964 6.657526 6.349612 6.965441 6.186612 7.128441
## Jul 1964 6.796042 6.481108 7.110975 6.314392 7.277691
## Aug 1964 6.791654 6.469855 7.113453 6.299505 7.283804
## Sep 1964 6.613446 6.284924 6.941968 6.111016 7.115876
## Oct 1964 6.497756 6.162647 6.832865 5.985251 7.010261
## Nov 1964 6.352235 6.010666 6.693805 5.829850 6.874621
## Dec 1964 6.456773 6.108863 6.804683 5.924690 6.988855
## Jan 1965 6.495183 6.136525 6.853841 5.946663 7.043703
## Feb 1965 6.438772 6.071582 6.805962 5.877204 7.000341
All of these forecasts are on the log scale, so we transform them back to the original scale:
f$x<-exp(f$x)
f$mean<-exp(f$mean)
f$lower<-exp(f$lower)
f$upper<-exp(f$upper)
autoplot(f)
f
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1961 450.4224 429.5461 472.3132 418.8895 484.3289
## Feb 1961 425.7172 402.8146 449.9219 391.1938 463.2874
## Mar 1961 479.0068 450.1386 509.7265 435.5677 526.7781
## Apr 1961 492.4045 459.8801 527.2290 443.5416 546.6503
## May 1961 509.0550 472.7467 548.1518 454.5866 570.0497
## Jun 1961 583.3449 538.8968 631.4591 516.7552 658.5155
## Jul 1961 670.0108 615.9149 728.8579 589.0693 762.0740
## Aug 1961 667.0776 610.3693 729.0546 582.3280 764.1613
## Sep 1961 558.1894 508.4826 612.7552 483.9871 643.7679
## Oct 1961 497.2078 451.0221 548.1230 428.3358 577.1537
## Nov 1961 429.8720 388.3656 475.8144 368.0412 502.0903
## Dec 1961 477.2426 429.4869 530.3083 406.1725 560.7482
## Jan 1962 495.9301 441.4199 557.1717 415.0328 592.5957
## Feb 1962 468.7289 414.3000 530.3084 388.0942 566.1170
## Mar 1962 527.4025 463.0847 600.6535 432.2757 643.4631
## Apr 1962 542.1538 473.0479 621.3551 440.1061 667.8634
## May 1962 560.4865 486.1084 646.2450 450.8180 696.8336
## Jun 1962 642.2823 553.8414 744.8460 512.0656 805.6127
## Jul 1962 737.7043 632.5975 860.2747 583.1625 933.2006
## Aug 1962 734.4748 626.4582 861.1161 575.8651 936.7700
## Sep 1962 614.5852 521.4868 724.3039 478.0561 790.1058
## Oct 1962 547.4424 462.1839 648.4284 422.5653 709.2234
## Nov 1962 473.3034 397.6439 563.3587 362.6186 617.7735
## Dec 1962 525.4600 439.3700 628.4185 399.6627 690.8531
## Jan 1963 546.0356 451.8355 659.8749 408.7378 729.4528
## Feb 1963 516.0862 423.8943 628.3287 381.9577 697.3152
## Mar 1963 580.6879 473.5560 712.0560 425.0935 793.2335
## Apr 1963 596.9295 483.4484 737.0482 432.3888 824.0842
## May 1963 617.1144 496.4628 767.0871 442.4585 860.7139
## Jun 1963 707.1743 565.2338 884.7586 502.0206 996.1653
## Jul 1963 812.2371 645.1260 1022.6360 571.0676 1155.2557
## Aug 1963 808.6813 638.3697 1024.4306 563.2530 1161.0511
## Sep 1963 676.6788 530.9799 862.3569 467.0177 980.4645
## Oct 1963 602.7524 470.2164 772.6452 412.2969 881.1864
## Nov 1963 521.1229 404.2226 671.8305 353.3629 768.5275
## Dec 1963 578.5491 446.2690 750.0388 388.9680 860.5310
## Jan 1964 601.2035 458.9012 787.6328 397.7618 908.6987
## Feb 1964 568.2282 430.2857 750.3926 371.3876 869.3969
## Mar 1964 639.3568 480.4084 850.8949 412.9512 989.8919
## Apr 1964 657.2393 490.1318 881.3210 419.6293 1029.3931
## May 1964 679.4636 502.9907 917.8517 428.9643 1076.2454
## Jun 1964 778.6226 572.2705 1059.3821 486.1959 1246.9319
## Jul 1964 894.3002 652.6938 1225.3417 552.4662 1447.6412
## Aug 1964 890.3852 645.3901 1228.3823 544.3022 1456.5176
## Sep 1964 745.0460 536.4238 1034.8042 450.7963 1231.3622
## Oct 1964 663.6506 474.6828 927.8451 397.5222 1107.9434
## Nov 1964 573.7738 407.7547 807.3882 340.3076 967.4083
## Dec 1964 637.0019 449.8268 902.0614 374.1625 1084.4790
## Jan 1965 661.9452 462.4437 947.5130 382.4748 1145.6216
## Feb 1965 625.6382 433.3659 903.2165 356.8101 1097.0070
We can see from the graph that the values are now back on the original scale.
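One caveat: exponentiating the mean of a log-scale forecast gives the median, not the mean, on the original scale. The forecast package can handle the back-transformation for you (with an optional bias adjustment) if you fit with a Box-Cox lambda of 0, which is equivalent to the log transform:
# Optional: let forecast() handle the log transform and the back-transformation
model2 <- auto.arima(ap, lambda = 0) # lambda = 0 is the log transform
f2 <- forecast(model2, h = 50, biasadj = TRUE) # bias-adjusted mean forecasts
autoplot(f2) # already on the original scale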
There are several sources where you can obtain time series data for various purposes, including research, analysis, and modeling. Here are some popular sources you can explore:
1. Government and Economic Organizations:
a. U.S. Federal Reserve Economic Data (FRED): provides a wide range of economic data for the United States.
b. Bureau of Economic Analysis (BEA): offers economic data on GDP, income, trade, and more.
c. European Central Bank (ECB): provides economic and financial data for the European Union.
d. World Bank Data: offers global economic, social, and environmental data.
e. U.S. Census Bureau: provides demographic and economic data for the United States.
2. Financial Markets and Investing:
a. Yahoo Finance: offers historical stock prices, indices, and other financial data.
b. Quandl: provides a vast collection of financial, economic, and alternative data.
c. Alpha Vantage: offers free API access to historical and real-time financial data.
d. Investing.com: provides a variety of financial data and charts.
3. Academic and Research Databases:
a. UCI Machine Learning Repository: offers various datasets, including time series data, for machine learning and research.
b. Kaggle Datasets: hosts various datasets, including time series data, often used in data science competitions.
c. Time Series Data Library (TSDL): a collection of time series datasets for research purposes.
4. Climate and Environmental Data:
a. National Oceanic and Atmospheric Administration (NOAA): offers climate, weather, and environmental data.
b. Climate Data Online (CDO): provides access to historical weather and climate data.
c. Global Historical Climatology Network (GHCN): offers global climate data.
5. Health and Medical Data:
a. World Health Organization (WHO): offers health-related data, including disease statistics and health indicators.
b. National Institutes of Health (NIH): provides medical research data and clinical trial information.
6. Social Media and Web Data:
a. Twitter API: offers access to real-time and historical tweets, useful for sentiment analysis and trend detection.
b. Reddit API: provides access to Reddit posts and comments for analysis.
7. Energy and Power Data:
a. U.S. Energy Information Administration (EIA): offers energy-related data, including consumption, production, and prices.
8. Transportation and Mobility Data:
a. Uber Movement: provides anonymized data related to urban mobility and travel times.
b. Transit Agencies: many cities offer open data portals with transportation-related information.
When using data from these sources, make sure to review the terms of use, data formats, and any restrictions associated with the data. If you are looking for a specific type of time series data, consider searching for dedicated repositories or APIs related to your field of interest. A quick example of pulling a financial series into R follows below.
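Since quantmod is already loaded above, here is a minimal sketch of fetching one of these series directly into R (the ticker and source are illustrative, and the call needs internet access):
# Minimal sketch: daily prices from Yahoo Finance via quantmod
getSymbols("AAPL", src = "yahoo") # creates an xts object named AAPL in the workspace
chartSeries(AAPL)                # quick price/volume chart
head(AAPL)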
Published by
AARON CHOLA