The main objective of this paper is to model two kinds of datasets, one seasonal and one non-seasonal, using time series techniques. The goal is to arrive at the best time series model for each dataset using the techniques discussed below. Time series analysis is the branch of data science that deals with observations indexed by date and time; here we work with univariate series. It is particularly useful for data that are serially correlated. Handling time series data is challenging because the trend in a series is often hard to identify: for some datasets it is purely random, while for others it is seasonal or cyclic. The models used in this paper require the series to be stationary. Throughout the paper, we first preprocess the dataset, reduce it to a univariate series, and set the date-time column as the index. The next steps involve drawing the time series plot and the ACF, PACF, and EACF plots, which help identify the model to be used. We also check whether the series is stationary using the Dickey-Fuller test. If it is stationary, we can analyze the bars in the ACF and EACF directly and propose an AR or MA model. If it is not stationary, we apply techniques such as differencing, transformation, and detrending to make it stationary. We then fit ARIMA models and compare candidate parameters using AIC, BIC, and related criteria. Residual analysis follows, using the ACF plot, histogram, Q-Q plot, Shapiro-Wilk test, and Ljung-Box test. Finally, we forecast future values on the original dataset and assess how the time series model performs.
Keywords: Dickey-Fuller test, AIC, BIC, ACF, PACF, EACF, forecasting, time series modelling
Seasonal dataset identification: The seasonal dataset is a temperature change dataset, reported by month, obtained from Kaggle. Kaggle is a convenient source because its datasets come with a clear description of the fields and are readily available as CSV files.
Non-seasonal dataset identification: The non-seasonal dataset was collected from the official FRED website, which offers a large collection of time series datasets across many categories to choose from. Each univariate series is ready for time series analysis and comes with a clear description. The site also shows a time series plot for each series, so a dataset with a particular trend can be chosen by inspection. I found FRED, which hosts both financial and non-financial datasets, an easy and instructive place to obtain data.
The techniques followed in the project are based on the Box-Jenkins approach. Every time series project has six steps that need to be followed to achieve the desired goal. The initial step is a check for stationarity, followed by identifying candidate model parameters from the ACF, PACF, and EACF plots. Candidate models are then fitted and the one with the lowest AIC is selected. Finally, the selected model is used to forecast future values.
Dataset Description

Seasonal Dataset: The FAOSTAT temperature change dataset contains the mean temperature change by country, updated annually. It covers the period 1961-2019 and provides statistics for monthly, seasonal, and annual mean temperatures. For the analysis, we converted the year columns into a single column, kept only the temperature change records, and ignored the entries relating to global warming and climate change. By the nature of the problem, temperature change varies from month to month within each year, which is the seasonal component and is clearly visible in the time series plot.
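The reshaping step described above, turning one column per year into a single year column, could be sketched in base R roughly as follows. The column names (`Area`, `Months`, `Y1961`, `Y1962`) and all values are made up for illustration and may differ from the actual Kaggle file; `year` and `temp_change` are hypothetical names chosen here.

```r
# Toy wide-format table: one column per year (illustrative values only).
wide <- data.frame(
  Area   = c("A", "B"),
  Months = c("January", "January"),
  Y1961  = c(0.10, -0.20),
  Y1962  = c(0.30, 0.05),
  stringsAsFactors = FALSE
)

# Pick out the year columns by their assumed naming pattern.
year_cols <- grep("^Y[0-9]{4}$", names(wide), value = TRUE)

# Stack the year columns into a single value column plus a 'year' column.
long <- reshape(
  wide,
  direction = "long",
  varying   = year_cols,
  v.names   = "temp_change",
  timevar   = "year",
  times     = as.integer(sub("^Y", "", year_cols)),
  idvar     = c("Area", "Months")
)

# Tidy up: sort by area and year, drop the generated row names.
long <- long[order(long$Area, long$year), ]
rownames(long) <- NULL
print(long)
```

Each row of `long` then holds one observation per area, month, and year, which is the shape needed before building a univariate series.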
Non-Seasonal Dataset: The dataset records the unemployment rate over more than 20 years. It is derived from a household survey of the population and distributed as a .csv file containing the date and the unemployment rate as a percentage. The data originate from the US Bureau of Labor Statistics and are published on the FRED website. The series describes the employment situation in the USA; it is monthly and seasonally adjusted.
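As a minimal sketch of how a date-and-value table like this one can be turned into a monthly time series object in base R, consider the snippet below. The data frame is synthetic and the column names `date` and `rate` are assumptions, not the names used in the FRED file.

```r
# Synthetic monthly data standing in for the downloaded CSV (not real values).
df <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "month", length.out = 24),
  rate = round(5 + sin(seq_len(24) / 4), 1)
)

# Derive the start year and month from the first date.
first <- as.POSIXlt(df$date[1])

# Build a univariate monthly series indexed by time.
y <- ts(df$rate,
        start     = c(first$year + 1900, first$mon + 1),
        frequency = 12)
print(y)
```

With real data, `df` would come from `read.csv()` on the downloaded file, with the date column converted using `as.Date()`.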
Step 1: Stationarity Check. Every time series must be checked for stationarity before a model is fitted. In R, this can be done with the Dickey-Fuller test, read through its p-value: if the p-value is greater than 0.05, we cannot reject the null hypothesis of a unit root and conclude that the data are not stationary. In that case, techniques such as differencing, detrending, or transformation are applied and the test is re-run. The number of times the series is differenced before the desired p-value is reached gives the d parameter of the model: after first differencing d = 1, and so on.
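The test-and-difference loop in Step 1 might look roughly like the sketch below. It runs on a simulated random walk, not on the paper's data. It uses `adf.test()` from the `tseries` package when that package is installed and otherwise falls back to the Phillips-Perron test `PP.test()` from base R, which is a different unit-root test used here only as a stand-in. The helper `unit_root_p` and the cap of two differences are assumptions made for illustration.

```r
# Simulated random walk: non-stationary by construction (illustrative only).
set.seed(42)
y <- cumsum(rnorm(200))

# Hypothetical helper: p-value of a unit-root test.
# ADF from 'tseries' if available, otherwise Phillips-Perron from 'stats'.
unit_root_p <- function(x) {
  if (requireNamespace("tseries", quietly = TRUE)) {
    suppressWarnings(tseries::adf.test(x)$p.value)
  } else {
    stats::PP.test(x)$p.value
  }
}

# Difference until the test rejects a unit root at the 5% level,
# counting the number of differences as d (capped at 2 here).
d <- 0
x <- y
while (unit_root_p(x) > 0.05 && d < 2) {
  x <- diff(x)
  d <- d + 1
}

cat("Number of differences applied (d):", d, "\n")
```

The resulting `d` is the value that would be passed as the middle element of the `order` argument when fitting an ARIMA model.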
Step 2: Model Selection. Model selection is based on the lags in the ACF and PACF plots whose bars extend beyond the confidence bounds. The parameters p and q are read from these plots (p from the PACF, q from the ACF). If the plots are ambiguous, the EACF can be consulted for clearer values.
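To illustrate how significant lags can be read off numerically rather than by eye, the sketch below simulates an AR(2) series (an assumption, not the paper's data) and lists the lags whose sample ACF and PACF exceed the approximate 95% bounds of plus or minus 1.96 divided by the square root of n. The EACF is not part of base R; it is provided by `eacf()` in the `TSA` package, which is not used here.

```r
# Simulated AR(2) series (illustrative only).
set.seed(1)
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)

# Approximate 95% bound, as drawn on the default ACF/PACF plots.
bound <- qnorm(0.975) / sqrt(length(x))

# Sample ACF (lag 0 dropped) and PACF for lags 1..20, without plotting.
acf_vals  <- as.vector(acf(x, lag.max = 20, plot = FALSE)$acf)[-1]
pacf_vals <- as.vector(pacf(x, lag.max = 20, plot = FALSE)$acf)

# Lags whose bars would cross the bounds.
sig_acf  <- which(abs(acf_vals) > bound)
sig_pacf <- which(abs(pacf_vals) > bound)

cat("Significant ACF lags: ", sig_acf, "\n")
cat("Significant PACF lags:", sig_pacf, "\n")
```

For a pure AR(p) process the PACF is expected to cut off after lag p, while for a pure MA(q) process the ACF is expected to cut off after lag q; isolated significant bars at high lags can occur by chance.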
Step 3: Parameter Estimation. Several candidate models and parameter combinations are fitted and compared by AIC, BIC, and log-likelihood, and the best parameters are chosen for the time series model.
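One way to carry out this comparison is a small grid search over p and q, sketched below with base R's `arima()`. The series is simulated, the grid limits (0 to 2) are arbitrary, and d is fixed at 0 for simplicity; all of these are assumptions for illustration.

```r
# Simulated AR(1) series (illustrative only).
set.seed(2)
x <- arima.sim(model = list(ar = 0.7), n = 300)

# Candidate (p, q) combinations; d is fixed at 0 in this sketch.
grid <- expand.grid(p = 0:2, q = 0:2)

# Fit each candidate by maximum likelihood; failed fits become NULL.
fits <- lapply(seq_len(nrow(grid)), function(i) {
  tryCatch(
    arima(x, order = c(grid$p[i], 0, grid$q[i]), method = "ML"),
    error = function(e) NULL
  )
})

# Collect the comparison criteria for every fitted model.
grid$logLik <- sapply(fits, function(f) if (is.null(f)) NA else as.numeric(logLik(f)))
grid$AIC    <- sapply(fits, function(f) if (is.null(f)) NA else AIC(f))
grid$BIC    <- sapply(fits, function(f) if (is.null(f)) NA else BIC(f))

# Rank by AIC; the first row is the preferred candidate under this criterion.
ranked <- grid[order(grid$AIC), ]
best   <- ranked[1, ]
print(ranked)
```

AIC and BIC can disagree, since BIC penalizes extra parameters more heavily; in that case the simpler model is often preferred if its residuals pass the checks in the next step.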
Step 4: Residual Analysis. The chosen model must be checked for adequacy, and this is where residual analysis plays a major role. Several techniques are available: the ACF plot of the residuals, followed by normality checks (Q-Q plot, histogram, and Shapiro-Wilk test). To verify whether the residuals are white noise, we can perform the Ljung-Box test.
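These checks can be sketched in base R as follows. The model and data are simulated stand-ins, and the choice of 10 lags for the Ljung-Box test is an assumption. The plots are only drawn in an interactive session.

```r
# Simulated AR(1) series and a matching fit (illustrative only).
set.seed(3)
x   <- arima.sim(model = list(ar = 0.5), n = 300)
fit <- arima(x, order = c(1, 0, 0))
res <- as.numeric(residuals(fit))

# Ljung-Box test for remaining autocorrelation; fitdf is the number of
# estimated ARMA coefficients (one AR term here).
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)

# Shapiro-Wilk test for normality of the residuals.
sw <- shapiro.test(res)

cat("Ljung-Box p-value:   ", lb$p.value, "\n")
cat("Shapiro-Wilk p-value:", sw$p.value, "\n")

# Visual checks: residual ACF, histogram, and normal Q-Q plot.
if (interactive()) {
  acf(res, main = "ACF of residuals")
  hist(res, main = "Histogram of residuals")
  qqnorm(res)
  qqline(res)
}
```

Large p-values in both tests are consistent with residuals that behave like Gaussian white noise; a small Ljung-Box p-value suggests the model has left autocorrelation unexplained.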
Step 5: Forecasting. The best model chosen in the steps above is used to forecast future values. Forecasting is done on the actual (raw) dataset rather than on the differenced one, which is an important step for good forecasts.
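A minimal sketch of forecasting on the raw scale is shown below: the model is fitted to the undifferenced series with d = 1 in the `order` argument, so `arima()` handles the differencing internally and `predict()` returns forecasts in the original units. The series, the ARIMA(1,1,0) order, and the 12-month horizon are assumptions for illustration.

```r
# Simulated monthly series with drift (illustrative only).
set.seed(4)
y <- ts(cumsum(rnorm(120, mean = 0.1)), start = c(2000, 1), frequency = 12)

# Fit on the raw series; the d = 1 differencing happens inside arima().
fit <- arima(y, order = c(1, 1, 0))

# Forecast 12 months ahead, on the original scale.
fc <- predict(fit, n.ahead = 12)

# Approximate 95% prediction interval from the forecast standard errors.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```

Fitting a model to an already differenced series would instead produce forecasts of the differences, which would then have to be cumulated back to the original scale by hand.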
```
##     ds           y
## 1 1961  0.14303154
## 2 1962 -0.02839765
## 3 1963 -0.02629724
## 4 1964 -0.12286501
## 5 1965 -0.22415410
## 6 1966  0.09506954
```