LSTM Neural Network for Time Series Forecasting

Contact : adrian.trujillo@speedy.com.ar

In machine learning, the long short-term memory (LSTM) network is a type of artificial recurrent neural network (RNN) architecture.

An LSTM cell is made up of an input gate, an output gate, and a forget gate. Unlike a standard feedforward neural network, it has feedback connections.
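As a rough illustration of how the three gates interact, the following is a minimal NumPy sketch of a single LSTM time step. It is not the code used in this project: the function name `lstm_step`, the stacked weight layout (input, forget, candidate, output) and the random weights are assumptions made only for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """Run one LSTM time step.

    x: input vector (n_in,); h_prev, c_prev: previous hidden and cell state (n_hid,).
    W: (4*n_hid, n_in), U: (4*n_hid, n_hid), b: (4*n_hid,).
    Assumed row layout of W, U, b: input gate, forget gate, candidate, output gate.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # the feedback connection enters through U @ h_prev
    i = sigmoid(z[0:n])              # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])          # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * n:3 * n])      # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])      # output gate: how much of the cell state to expose
    c = f * c_prev + i * g           # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

# Process a toy sequence of 7 steps with 3 inputs per step (random, illustrative data).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h = np.zeros(n_hid)
c = np.zeros(n_hid)
for x in rng.normal(size=(7, n_in)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The hidden state `h` and cell state `c` are carried from one step to the next, which is what lets the cell keep information through time.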

LSTM networks are used to classify and process images, speech, video and time series.

The Long Short-Term Memory (LSTM) cell can process data sequentially and keep its hidden state through time. https://en.wikipedia.org/wiki/Long_short-term_memory

In this case we use it to forecast a time series. A time series data set has one observation per row and one variable per column, one of which is time.

Data description:

The data set was downloaded from the Mendeley Data repository: “Data for: Machine learning for inventory management: Analyzing two concepts to get from data to decisions” (Taigel and Meller, 2018), available at https://data.mendeley.com/datasets/7s3ys9ntnz/1

“Over two years of daily sales data from a restaurant in Stuttgart, Germany. Number of servings sold for 7 products (CALAMARI, FISCH, GARNELEN, HAEHNCHEN, KOEFTE, LAMM, STEAK). Features include calendar informations such as weekday or special events, information derived from historical sales as well as weather from the local weather station.” (Taigel and Meller, 2018)

In this example we will try to forecast ‘FISCH_DEMAND_T1’ (label), with the following features:

Sales of substitute or complementary products: prawns, chicken, köfte, lamb and steak (‘GARNELEN’, ‘HAEHNCHEN’, ‘KOEFTE’, ‘LAMM’, ‘STEAK’)

The day of the week and the month of the observation date: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday (‘Montag’, ‘Dienstag’, ‘Mittwoch’, ‘Donnerstag’, ‘Freitag’, ‘Samstag’, ‘Sonntag’) and ‘MONTH_JAN’, ‘MONTH_FEB’, ‘MONTH_MAR’, ‘MONTH_APR’, ‘MONTH_MAY’, ‘MONTH_JUN’, ‘MONTH_JUL’, ‘MONTH_AUG’, ‘MONTH_SEP’, ‘MONTH_OCT’, ‘MONTH_NOV’, ‘MONTH_DEC’

Whether the observation date falls on a holiday or a weekend: ‘ISHOLIDAY’, ‘WEEKEND’

The weather conditions on the observation date: wind, cloud cover, precipitation, sunshine and air temperature (‘WIND’, ‘BEWOELKUNG’, ‘NIEDERSCHLAG’, ‘SONNE’, ‘LUFTTEMPERATUR’)
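The weekday and month features are indicator columns. The data set already provides them, so the sketch below is only meant to show what such columns encode. It assumes they are 0/1 flags and that ‘WEEKEND’ means Saturday or Sunday; the data set may define it differently. The helper `calendar_features` is hypothetical, and ‘ISHOLIDAY’ is left out because it cannot be derived from the date alone.

```python
from datetime import date

WEEKDAYS = ["Montag", "Dienstag", "Mittwoch", "Donnerstag",
            "Freitag", "Samstag", "Sonntag"]
MONTHS = ["MONTH_JAN", "MONTH_FEB", "MONTH_MAR", "MONTH_APR",
          "MONTH_MAY", "MONTH_JUN", "MONTH_JUL", "MONTH_AUG",
          "MONTH_SEP", "MONTH_OCT", "MONTH_NOV", "MONTH_DEC"]

def calendar_features(d):
    """Return 0/1 indicator values for the weekday, month and weekend of date d."""
    row = {name: 0 for name in WEEKDAYS + MONTHS}
    row[WEEKDAYS[d.weekday()]] = 1      # date.weekday(): Monday is 0, Sunday is 6
    row[MONTHS[d.month - 1]] = 1
    # Assumption: weekend means Saturday or Sunday.
    row["WEEKEND"] = 1 if d.weekday() >= 5 else 0
    return row

features = calendar_features(date(2013, 10, 4))
active = [name for name, value in features.items() if value == 1]
```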

Exploratory Data Analysis (EDA):

Pearson Correlation:

We analyze how the variables are statistically related by computing the Pearson correlation between the variable to forecast, ‘FISCH_DEMAND_T1’ (label), and each of its inputs (features). Pearson’s correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The coefficient ranges from -1 to 1. A value close to 1 indicates a strong positive correlation: when the feature increases, the label tends to increase as well, and vice versa. A value close to -1 indicates a strong negative correlation: when the feature increases, the label tends to decrease, and vice versa. A value close to 0 indicates little or no linear relationship.

In the following figure, the variable to forecast, ‘FISCH_DEMAND_T1’ (label), shows no strong correlation, positive or negative, with any of the features.
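The definition above (covariance divided by the product of the standard deviations) can be written out directly. The sketch below uses small made-up arrays, not the restaurant data, and the function name `pearson` is chosen for the example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # The 1/n factors of the covariance and the standard deviations cancel out.
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Illustrative data: one label that rises with the feature, one that falls.
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
label_up = 2.0 * feature + 1.0
label_down = -3.0 * feature + 10.0

r_up = pearson(feature, label_up)      # close to +1: strong positive correlation
r_down = pearson(feature, label_down)  # close to -1: strong negative correlation
```

For a full table, `np.corrcoef` computes the same coefficient for every pair of columns at once.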

Correlation Plot

Components:

With the Facebook Prophet algorithm we extract the trend and seasonal components of ‘FISCH_DEMAND_T1’. As shown in the following figure, the series has a slightly decreasing year-over-year trend and a weekly variation; Prophet does not show a significant variation within the year. This helps us set the window of our LSTM: we take 7 days as input data (features) to predict the next day (label).
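The 7-day window can be built by sliding over the series, as in the minimal sketch below. The helper `make_windows` and the toy arrays are assumptions for illustration. The sketch also assumes the label at a given row is that row's own value, so the target is taken from the row after the window; if the label column is already shifted one day ahead, the target index would need to be adjusted.

```python
import numpy as np

def make_windows(features, label, window=7):
    """Turn a daily series into (samples, window, n_features) inputs and next-day targets."""
    features = np.asarray(features, dtype=float)
    label = np.asarray(label, dtype=float)
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start:start + window])   # days start .. start+window-1
        y.append(label[start + window])            # the day right after the window
    return np.array(X), np.array(y)

# Toy data: 10 days, 3 features per day (not the restaurant data).
n_days, n_features = 10, 3
data = np.arange(n_days * n_features, dtype=float).reshape(n_days, n_features)
target = np.arange(n_days, dtype=float)

X, y = make_windows(data, target, window=7)
```

The resulting `X` has the three-dimensional shape (samples, time steps, features) that recurrent layers typically expect.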

LSTM neural network architecture

There are many ways to configure an LSTM neural network, depending on the data to be processed and the desired outputs. In this example we use a stacked LSTM: a first LSTM layer with 56 units, which represents two months of 4 weeks each; a second LSTM layer whose units represent one 4-week month; a Dropout layer to reduce overfitting; and finally a fully connected dense layer to obtain the output value.
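To get a feel for the size of such a stack, the trainable parameters can be counted layer by layer. The sketch below uses the standard LSTM parameter formula (four weight blocks, each with input weights, recurrent weights and a bias). Two values are assumptions, not taken from the project: 31 input features (the count of the feature columns listed earlier) and 28 units in the second layer (one reading of "a 4-week month").

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer with bias."""
    # Four blocks (input gate, forget gate, candidate, output gate), each with
    # input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Trainable parameters of a fully connected layer with bias."""
    return input_dim * units + units

n_features = 31      # assumption: number of feature columns listed in this document
first_units = 56     # from the text: two months of 4 weeks
second_units = 28    # assumption: one 4-week month

layers = [
    ("lstm_1", lstm_params(n_features, first_units)),
    ("lstm_2", lstm_params(first_units, second_units)),
    ("dropout", 0),                                   # dropout has no trainable weights
    ("dense", dense_params(second_units, 1)),
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:8s} {count:6d}")
print(f"{'total':8s} {total:6d}")
```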

A summary of the configuration used is provided below:

Training data:

The original data runs from 10.4.2013 to 11.7.2015 (month.day.year). We use the data from 10.4.2013 to 11.6.2014, which ends one year before the last observation, to train the neural network, and we predict from 11.7.2014 to 11.7.2015, so that one year of predictions can be compared with the real values in the plot at the end of this document.
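A chronological split like this one keeps every training observation strictly before the forecast period. The sketch below reads the dates as month.day.year (training ends on 6 November 2014) and uses a synthetic series with one row per calendar day. The helper `split_by_date` is hypothetical, and the real data may have missing days.

```python
from datetime import date, timedelta

TRAIN_END = date(2014, 11, 6)   # last training day, reading the dates as month.day.year

def split_by_date(rows, train_end=TRAIN_END):
    """Split (date, value) rows into a training part and a forecast part by date."""
    train = [r for r in rows if r[0] <= train_end]
    test = [r for r in rows if r[0] > train_end]
    return train, test

# Synthetic daily series over the same period (placeholder values, not real sales).
start = date(2013, 10, 4)
end = date(2015, 11, 7)
n = (end - start).days + 1
rows = [(start + timedelta(days=k), float(k)) for k in range(n)]

train, test = split_by_date(rows)
```

Splitting by date rather than at random avoids leaking future observations into the training set.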

Neural network training results

The stacked LSTM was compiled with optimizer = ‘adam’ and loss function = ‘mse’ (mean squared error). These settings determine how the weights of the neural network are fitted.

The loss is calculated on both the training data and the validation data. Ideally, the loss decreases after each iteration or every few iterations; the lower the loss, the better the model fits.

When overfitting occurs, the training loss is much lower than the validation loss. This is not the case here.
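The ‘mse’ loss and the comparison between training and validation loss can be sketched as follows. The loss values are made-up numbers chosen to show what overfitting would look like, not results from this model, and the helper names are assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def generalization_gap(train_loss, val_loss):
    """Validation loss minus training loss, per epoch."""
    return np.asarray(val_loss, dtype=float) - np.asarray(train_loss, dtype=float)

# Illustrative loss curves (hypothetical): training keeps improving while
# validation starts to get worse, which is the pattern overfitting produces.
train_loss = [0.90, 0.50, 0.30, 0.20, 0.15]
val_loss = [0.95, 0.60, 0.45, 0.50, 0.60]

gap = generalization_gap(train_loss, val_loss)
best_epoch = int(np.argmin(val_loss))   # epoch with the lowest validation loss
```

A gap that keeps growing after the validation loss has bottomed out is the usual sign of overfitting; when both curves fall together, as reported here, the model is not overfitting.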

Forecasting:

We plot the actual value of ‘FISCH_DEMAND_T1’ in green and its forecast in blue. As can be seen, the two curves are quite close, although there is a small lag between the two signals.
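One way to put a number on such a lag is to shift the forecast against the actual series and keep the shift with the highest correlation. The sketch below is not part of the original analysis: the function `estimate_lag` is an assumption, and the data are synthetic, with a forecast built to trail the actual series by exactly one day.

```python
import numpy as np

def estimate_lag(actual, forecast, max_lag=7):
    """Return the delay (in days) at which the forecast best matches the actual series."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        # Compare forecast[t] with actual[t - lag].
        a = actual[:len(actual) - lag] if lag else actual
        f = forecast[lag:]
        r = np.corrcoef(a, f)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, float(best_r)

# Synthetic example: the forecast repeats the previous day's actual value.
rng = np.random.default_rng(1)
actual = rng.normal(size=200)
forecast = np.concatenate([[actual[0]], actual[:-1]])

lag, r = estimate_lag(actual, forecast)
```

A best lag of one day would suggest the model is largely repeating the most recent observation, which is a common pattern in one-step-ahead forecasts.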

Acknowledgements

We thank Mr. Fabian Taigel and Mr. Jan Meller (both University of Wuerzburg, Chair of Logistics and Quantitative Methods) for the work carried out in their research and for making the data available, and Mendeley Data for hosting it.

Bibliography:

Taigel, Fabian; Meller, Jan (2020), “Data for: Machine learning for inventory management: Analyzing two concepts to get from data to decisions”, Mendeley Data, V1, doi: 10.17632/7s3ys9ntnz.1

Meller, Jan and Taigel, Fabian, Machine Learning for Inventory Management: Analyzing Two Concepts to Get From Data to Decisions (November 11, 2019). Available at SSRN: https://ssrn.com/abstract=3256643 or http://dx.doi.org/10.2139/ssrn.3256643