1 Introduction

Providing electricity to billions of people in developing and emerging economies while transitioning to clean, low-carbon energy sources is a paramount challenge. The global population has grown exponentially over the past centuries to around eight billion[1] today, and approximately 774 million[2] people still lacked access to electricity in 2022. Bangladesh, a South Asian nation, stands out for its significant natural gas reserves and its strategic location for regional energy trade. Although 98.99%[4] of Bangladeshis had access to electricity in 2021, per capita consumption remains exceptionally low at 449 kWh[5], despite a population of over 169.4 million[3], high population density, and substantial energy resources.

The global demand for energy has doubled over the last five and a half decades, reaching 100.17 quadrillion Btu[6] in 2019, and it is projected to double again by 2030[6]. This rapid increase in energy consumption poses significant challenges, including environmental consequences and equitable access for the poor. To address these challenges, countries must focus on capacity building and planning for both supply- and demand-side strategies. Renewable energy sources and redesigned subsidy policies are becoming core strategies to tackle this issue.

Natural gas is the primary indigenous source of commercial energy in Bangladesh, supplemented by oil and coal, so understanding how energy demand will evolve in response to rapid economic growth is of utmost importance. Energy demand models tailored to forecast energy requirements can help the government plan capacity development and investment in the natural gas sector. However, existing energy demand projections for the country have traditionally relied on simple trend extrapolation, which lacks methodological sophistication.

1.1 Objective

Energy demand forecasting is a valuable tool for governments to anticipate future consumption trends, allowing them to plan and expand power generation capacity proactively. Although Bangladesh is rich in natural gas, it relies heavily on imported fossil fuels such as coal and oil, sourced primarily from countries such as India, Indonesia, South Africa[7], and the United Arab Emirates[8]. Insight into future demand supports prudent financial management and the timely commissioning of new power plants, preventing a generation shortfall when consumption outgrows current capacity.

The objective of this project is to forecast energy demand in Bangladesh over the 10 years from 2022 to 2032 using a SARIMA model.

2 Literature Review

Forecasting plays a pivotal role in planning and decision-making, offering insights into future uncertainties by analyzing past and present data. In the energy sector, accurate load forecasting is crucial as it directly influences planning and management strategies. Inaccurate forecasts can result in significant financial and time losses for power companies. Three main categories of approaches are commonly used for load forecasting: traditional methods, machine learning techniques, and support vector regression approaches[9].

One widely adopted traditional approach is the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, which follows the Box-Jenkins methodology. This model has been applied to hourly electrical load data for short-term predictions, demonstrating its reliability and importance in load forecasting.

Linear regression is a prominent machine learning technique in energy demand forecasting. Linear regression models relate demand to explanatory factors such as GDP, population, and fuel prices[6]: they use a country's Gross Domestic Product (GDP), its population, and the prevailing market prices of oil, coal, and gas as predictors. Their main merits are a simple model structure and the need for only a small set of input variables.

Time series models, specifically SARIMA, have also emerged as a viable alternative for forecasting heat demand[10]. These models analyze past heat consumption behavior and incorporate exogenous variables to produce future projections. Operating within the ARMA (Autoregressive Moving Average) framework, SARIMA models emphasize transparency and accuracy. Heteroscedasticity, a statistical phenomenon often encountered in real-world data, can be handled by combining SARIMA with GARCH (Generalized Autoregressive Conditional Heteroskedasticity) models. The modeling is sequential: weather dependence is first removed from the predictions, and the remaining components are modeled afterwards. Despite these strengths, time series models have an important limitation: they require a long, continuous historical record, which is not always available in practice.

Load forecasting plays a fundamental role in ensuring that the power grid functions effectively[11]. Its objective is to provide accurate and reliable predictions of future load demand. Depending on the application, forecasts are categorized as very short-term, short-term, medium-term, or long-term. As power utilities expand to meet growing demand, the resulting complex and highly stressed power systems become vulnerable to cascading outages[12].

In 2013, Henao et al.[13] introduced a hybrid forecasting model that combines SARIMA and a generalized single neuron model to predict nonlinear time series data, particularly the monthly electricity demand in the Colombian energy market. The hybrid model leverages SARIMA for capturing linear components in the time series and employs artificial neural networks to forecast nonlinear elements within the SARIMA model’s shocks. The advantages of this approach include its ability to handle nonlinear behavior in data, reliance on the well-established SARIMA methodology for model specification, and avoidance of the need for heuristics or expert knowledge to configure the neural network. The model’s performance is assessed by comparing its electricity demand forecasts to those generated by SARIMA and a multiplicative single neuron model. The results indicate that the proposed hybrid approach outperforms these models individually.

In 2017, Mathiyalagan and colleagues[14] discussed strategies for reducing electricity consumption, emphasizing the significance of smart electricity meters. They highlighted the potential for various analytics on the data these meters generate, including electricity demand, time of use, and tariff rates. Their study integrated Apache Hadoop and R for time series analysis, employing autoregressive integrated moving average (ARIMA) and autoregressive moving average (ARMA) models evaluated with the Akaike information criterion (AIC) and root mean square error (RMSE). In 2018, Do et al.[15] explored the possibilities presented by smart grids, focusing on energy consumption prediction techniques and energy efficiency calculations, and provided a case study demonstrating the practicality of their proposed solution. In the same year, Yildiz et al.[16] examined smart grid technologies and electricity consumption, introducing a novel electrical load forecasting model based on clustering and classification techniques that proved particularly efficient in forecasting performance and in the distribution of load profiles. Suresh et al.[17] (2019) used smart meter data for electricity demand prediction, employing particle swarm optimization and the k-means algorithm.

Also in 2017, García et al.[18] focused on the prediction of demand time series, emphasizing the importance of probabilistic forecasts rather than point forecasts. The study employed a nonlinear autoregressive neural network (NAR24) and a seasonal ARIMA (SARIMA) model to analyze and forecast three demand series, and computed naïve prediction intervals (PIs) to assess forecast accuracy, achieving generally satisfactory results with MAPE values under 4%.

In 2021, Andoh et al.[19] explored time series models for predicting electricity demand in Ghana's Western Regions. Secondary data from the regional headquarters of the Electricity Company of Ghana (ECG) were analyzed using a SARIMA model. At the 95% confidence level, the models provided reliable insights for estimating future electricity demand, projecting revenue, and planning customer network expansion.

3 Methodology

3.1 Data Description

The data set used for energy demand prediction was sourced from Kaggle[17] and comprises 128 variables covering energy production, energy use, and related factors, including biofuel consumption and electricity generation, carbon intensity, coal use, electricity sources, greenhouse gas emissions, energy efficiency, per capita electricity consumption, population, primary energy consumption, and renewable sources such as solar and wind power. For energy demand prediction, the key variables are those describing energy consumption and fuel types. In Bangladesh, energy supply is dominated by fossil fuels, namely oil, gas, and coal.

Energy consumption growth rate by fuel.

Figure 1 shows the annual growth rates of energy consumption by fuel type in Bangladesh. Gas has the highest growth rate, above 8%, meaning that energy consumption from gas has increased by more than 8% per year. Coal and oil follow at approximately 7.5% and 4.25%, respectively. Renewable energy sources contribute to the energy mix with a growth rate of nearly 2%, while hydro energy has the lowest growth rate, at around 1.25%.

Energy consumption of coal, gas and oil.

Figure 2 shows Bangladesh's consumption of coal, gas, and oil from 1971 to 2020. Coal consumption rose from 25 terawatt-hours in 1971 to a peak of 450 terawatt-hours in 2018. Gas consumption followed closely, slightly exceeding 400 terawatt-hours in 2018, while oil had the lowest consumption, at 100 terawatt-hours in 2018. Energy consumption peaked in 2018 and declined thereafter, possibly owing in part to the COVID-19 pandemic, which began in late 2019.

Per capita energy consumption in Bangladesh.

Figure 3 shows per capita energy consumption, which is used to assess the energy demand situation in Bangladesh. The violet line represents total per capita energy consumption: an individual in Bangladesh consumed only 250 kilowatt-hours of energy in 1971, rising to nearly 2750 kilowatt-hours by 2020. The green, blue, and red lines show per capita consumption of gas, oil, and coal, respectively. Per capita energy use is the variable analyzed in depth to predict energy demand in Bangladesh.

3.2 Time Series Analysis

3.2.1 R Libraries for Time Series Analysis

The time series analysis was carried out in R using the following libraries, each covering a different aspect of the workflow:

dplyr: This library is used for data manipulation and transformation. It provides functions for filtering, summarizing, and arranging data, making it easier to work with time series data.
ggplot2: A popular data visualization library for creating customizable, publication-quality plots and charts, making it easier to visualize time series data.
stats: Part of base R, the stats library contains statistical functions and methods used in time series analysis, including ts(), decompose(), and arima(), as well as functions for basic statistics and hypothesis tests.
forecast: This library is designed specifically for time series forecasting. It provides functions for modeling and forecasting time series data with methods such as ARIMA and exponential smoothing.
imputeTS: Provides tools for imputing missing values in time series data, so that the analysis is based on complete data.
zoo: Provides data structures and functions for ordered observations, including irregular time series with non-uniform time intervals.
TTR (Technical Trading Rules): This library offers a wide range of technical indicators commonly used in financial time series analysis, including moving averages, oscillators, and other technical indicators.
lubridate: Simplifies working with date and time objects in R. It provides functions to parse, manipulate, and extract information from date-time data, which is essential for time series analysis involving date-time variables.
reshape2: Helps with reshaping and restructuring data. It is often used to transform data into the format required for time series analysis and visualization.

3.2.2 Observing the Data

Line plot of the time series data.

Figure 4 shows monthly per capita energy consumption (kWh) in Bangladesh. The line plot reveals clear seasonality: consumption rises during the summer months, peaking in summer, and falls during winter. The data also exhibit an upward trend, with each year's consumption exceeding that of the previous year. Over the 50-year period, the gap between summer and winter consumption has widened as the overall level has risen, which suggests that the seasonal variation scales with the level of the series, that is, that the series is multiplicative rather than additive.
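The multiplicative reading can be illustrated with a small synthetic example, sketched in Python rather than the R used in this study. The trend, growth rate, seasonal factors, and function names below are invented for illustration and are not the Bangladesh data. When the seasonal effect is a fixed proportion of the level, the absolute gap between the highest and lowest month widens as the level rises, while their ratio stays constant; a constant absolute gap would instead point to an additive series.

```python
# Synthetic illustration only: the trend, seasonal factors and number of years
# are hypothetical and are not taken from the study's data set.

def synthetic_multiplicative(years, base=100.0, growth=0.05):
    """Monthly series y = trend * seasonal, with seasonal factors averaging 1."""
    seasonal = [0.8, 0.85, 1.0, 1.2, 1.15, 1.1, 1.05, 1.0, 0.95, 0.95, 0.9, 1.05]
    series = []
    for year in range(years):
        trend = base * (1 + growth) ** year  # level grows 5% per year
        series.extend(trend * s for s in seasonal)
    return series


def yearly_gap_and_ratio(series, period=12):
    """Return (max - min) and max / min for each complete year."""
    gaps, ratios = [], []
    for start in range(0, len(series) - period + 1, period):
        year = series[start:start + period]
        gaps.append(max(year) - min(year))
        ratios.append(max(year) / min(year))
    return gaps, ratios


gaps, ratios = yearly_gap_and_ratio(synthetic_multiplicative(years=5))
print([round(g, 1) for g in gaps])    # absolute gap widens every year
print([round(r, 3) for r in ratios])  # ratio stays at 1.5 (= 1.2 / 0.8)
```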

3.2.3 Decomposition of the Data

Decomposing a time series means breaking it down into its individual components. The goal of decomposition is to better understand and analyze the underlying patterns, trends, seasonality, and irregularities in the series. A time series is typically decomposed into the following three main components:

Trend, seasonal, and residual components of the time series data.

Trend Component: This component represents the long-term progression or underlying trend in the data. It helps identify whether the time series is generally increasing, decreasing, or following a specific pattern over time. Removing the trend component can help isolate seasonality and irregularities. Figure 5 displays the decomposition of the energy usage per capita data set. It prominently exhibits an ascending trend, indicating a consistent yearly increase in energy consumption within Bangladesh.
Seasonal Component: The seasonal component captures the regular, repeating patterns that occur at fixed intervals within the time series. These patterns may be daily, weekly, monthly, or follow any other seasonality that repeats over a known period. Seasonal decomposition helps in understanding the seasonal variations in the data. In Figure 5, the monthly data show a seasonal pattern that repeats every 12 months.
Residual Component (or Error Component): The residual component represents the random noise or irregular fluctuations in the time series data that cannot be attributed to the trend or seasonality. It includes any unexpected or unexplained variations in the data. Analyzing the residual component can help detect anomalies or unusual events.

3.2.4 Trend Analysis

Trend analysis is a statistical technique used to examine and identify patterns, tendencies, or movements in data over time. It involves analyzing historical data points or observations to uncover underlying trends or long-term changes in a data set. This analysis helps in understanding whether there is a consistent upward or downward movement, stability, or any other discernible pattern in the data over a specified time period. Trend analysis is commonly used in various fields, including finance, economics, epidemiology, and environmental science, to make informed decisions, predictions, or forecasts based on historical data trends.

Methods of trend analysis are given below:

Method of Semi Average: The Method of Semi Average involves calculating the average of data points for two consecutive time periods to identify trends. This method helps smooth out fluctuations and highlights the overall direction of change.
Method of Least Squares: The Method of Least Squares is a mathematical approach to finding the best-fitting line (usually a straight line) through a set of data points. It minimizes the sum of the squared differences between the observed data points and the points predicted by the line. This method is often used for linear regression analysis to estimate trends and make predictions.
Method of Simple Moving Average: The Simple Moving Average (SMA) method calculates the average of a fixed number of the most recent data points. It is used to smooth out short-term fluctuations and highlight underlying trends. The number of data points included in the average is called the “window” or “period.”
Exponential Moving Average: The Exponential Moving Average (EMA) is similar to the SMA but gives more weight to recent data points, making it more responsive to recent changes. It assigns exponentially decreasing weights to older data points. EMA is useful for capturing trends with changing dynamics.
Weighted Moving Average: The Weighted Moving Average assigns different weights to each data point in the average calculation. This means that some data points have a more significant impact on the average than others. It allows for customization of the smoothing process based on the importance of each data point.

3.2.4.1 Method of Semi Average

To apply the Method of Semi Average, the averages of consecutive pairs of observations were calculated. Let Y denote the time series data, where Y₁, Y₂, Y₃, … are the values at successive time periods.

The semi-averages are calculated as follows:

Semi-Average 1: (SA₁) = (Y₁ + Y₂) / 2

Semi-Average 2: (SA₂) = (Y₂ + Y₃) / 2

Semi-Average 3: (SA₃) = (Y₃ + Y₄) / 2 … and so on.

Trend analysis by method of semi average

Figure 6 shows the original data in red and the semi-average trend in blue. The Method of Semi Average provides a quick and simple way to identify potential trends in time series data, but it has limitations with non-linear data, noisy data, and outliers. It typically serves as an initial exploratory tool before more advanced time series methods are applied.
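A note on terminology, with a minimal Python sketch: the pairwise averages defined above are equivalent to a two-period moving average. In the textbook method of semi-averages, the series is instead split into two equal halves (dropping the middle observation when the count is odd), each half is averaged, and a straight trend line is drawn through the two averages placed at the centre of each half. The sketch below implements both on invented data; the function names are illustrative.

```python
def pairwise_averages(y):
    """(Y1+Y2)/2, (Y2+Y3)/2, ...: equivalent to a 2-period moving average."""
    return [(a + b) / 2 for a, b in zip(y, y[1:])]


def semi_average_trend(y):
    """Textbook method of semi-averages; returns (intercept, slope) of the trend line.

    The series is split into two equal halves (the middle observation is
    dropped when the length is odd). Each half's mean is placed at the centre
    of that half, and a straight line is drawn through the two points.
    Time is indexed 0, 1, 2, ...
    """
    n = len(y)
    half = n // 2
    first, second = y[:half], y[n - half:]
    mean1 = sum(first) / half
    mean2 = sum(second) / half
    x1 = (half - 1) / 2                # centre index of the first half
    x2 = (n - half) + (half - 1) / 2   # centre index of the second half
    slope = (mean2 - mean1) / (x2 - x1)
    intercept = mean1 - slope * x1
    return intercept, slope


y = [10, 12, 11, 15, 14, 18, 17, 21]  # invented values for illustration
print(pairwise_averages(y))   # [11.0, 11.5, 13.0, 14.5, 16.0, 17.5, 19.0]
print(semi_average_trend(y))  # (9.9375, 1.375)
```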

3.2.4.2 Method of Least Squares

The Method of Least Squares is a statistical technique used for fitting a mathematical model to a set of data points in such a way that it minimizes the sum of the squared differences between the observed and predicted values. It is commonly used for linear regression analysis, where the goal is to find the best-fitting linear relationship between variables.

The method assumes that the relationship between the independent variable (x) and the dependent variable (y) is linear. It can be expressed as:

\[ y = a + bx \tag{3.1} \]

Where:

y is the dependent variable.
x is the independent variable.
a is the intercept (the value of y when x is 0).
b is the slope (the rate of change of y with respect to x).

Trend analysis by method of least squares

Figure 7 depicts the original data with a red line and the trend line, computed using the least squares method, represented by a blue line. While the Method of Least Squares is a powerful and widely used technique, its applicability and accuracy depend on the underlying assumptions of linearity and independence of errors. It is essential to assess these assumptions and consider potential drawbacks when using this method for regression analysis.
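For the straight line in equation (3.1), the least-squares estimates have a closed form, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The Python sketch below computes them, assuming time is coded as x = 0, 1, 2, …; how the study coded time is not stated, and the sample values are invented so that the exact answer is known.

```python
def least_squares_line(y, x=None):
    """Fit y = a + b*x by ordinary least squares; x defaults to 0, 1, 2, ..."""
    if x is None:
        x = list(range(len(y)))
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


y = [3.0, 5.0, 7.0, 9.0, 11.0]   # lies exactly on y = 3 + 2x
a, b = least_squares_line(y)
fitted = [a + b * xi for xi in range(len(y))]
print(a, b)  # 3.0 2.0
```

With noisy data the same formulas give the line that minimizes the sum of squared residuals; the fitted values then no longer reproduce the observations exactly.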

3.2.4.3 Method of Simple Moving Average

The method of Simple Moving Average (SMA) is a common technique in time series analysis for smoothing data and identifying trends over a specific time period. It calculates the average of the data points within a moving window of fixed size n. For each data point except the first n−1 (which lack a full window), the average of the n most recent data points, including the current one, is computed and becomes the smoothed value for that time point.

The formula for calculating the Simple Moving Average at time ‘t’ (SMA_t) is:

\[ SMA_t = \frac{X_t + X_{t-1} + X_{t-2} + \ldots + X_{t-(n-1)}}{n} \tag{3.2} \]

Where:

SMA_t is the Simple Moving Average at time t.
X_t represents the data point at time t.
‘n’ is the window size (the number of data points to include in the average).

Trend analysis by method of simple moving average

Figure 8 displays the original data represented by the red line and the trend line, computed using the Simple Moving Average method, depicted by the blue line. The Simple Moving Average is a straightforward method for smoothing time series data and revealing underlying trends. However, it has limitations, particularly in situations where more advanced techniques might be needed to capture complex patterns or respond to rapid changes in the data.
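A minimal Python sketch of equation (3.2) follows. It keeps a running sum so each step costs O(1), and it leaves the first n − 1 positions empty (None) because they lack a full window. The window used in the study is not stated; for monthly data a 12-month window is a natural choice because it spans one full seasonal cycle, but that is an assumption here.

```python
def simple_moving_average(x, n):
    """Trailing SMA with window n (eq. 3.2); the first n-1 positions are None."""
    if n < 1:
        raise ValueError("window size n must be at least 1")
    sma = [None] * len(x)
    window_sum = 0.0
    for t, value in enumerate(x):
        window_sum += value
        if t >= n:
            window_sum -= x[t - n]   # drop the value that has left the window
        if t >= n - 1:
            sma[t] = window_sum / n
    return sma


print(simple_moving_average([2, 4, 6, 8, 10], 3))  # [None, None, 4.0, 6.0, 8.0]
```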

3.2.4.4 Method of Exponential Moving Average

The Exponential Moving Average (EMA) is a method used in time series analysis to calculate a smoothed average of a sequence of data points, with greater weight given to more recent observations. It is particularly useful for capturing short-term trends and responding quickly to changes in the data. The EMA is calculated using the following formula:

\[ EMA_t = αX_t + (1-α)EMA_{t-1} \tag{3.3} \]

Where:

EMA_t is the Exponential Moving Average at time period t.
X_t is the observed data point at time period t.
EMA_{t−1} is the Exponential Moving Average at the previous time period (t−1).
α is the smoothing factor, which is a constant value between 0 and 1. It determines the weight assigned to the most recent observation. A higher α gives more weight to recent data, making the EMA more responsive to short-term fluctuations.

Trend analysis by method of exponential moving average

Figure 9 shows the original data as the red line and the trend line generated with the Exponential Moving Average method as the blue line. The Exponential Moving Average is a versatile method for smoothing time series data, and its responsiveness to recent observations is a key advantage. However, its performance may vary with the characteristics of the data and the chosen smoothing factor.
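Equation (3.3) is recursive, so it needs a starting value. The Python sketch below seeds the recursion with the first observation, which is one common convention; other implementations seed it differently (for example, with the average of the first few observations), so early values can differ. A common way to relate α to a window length n is α = 2/(n + 1). The α used in the study is not stated.

```python
def exponential_moving_average(x, alpha):
    """EMA_t = alpha * X_t + (1 - alpha) * EMA_{t-1} (eq. 3.3), seeded with X_1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ema = [float(x[0])]  # assumption: EMA_1 = X_1
    for value in x[1:]:
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema


print(exponential_moving_average([10, 20, 20, 20], alpha=0.5))  # [10.0, 15.0, 17.5, 18.75]
```

The example shows the EMA closing half of the remaining gap to the new level at each step when α = 0.5; a larger α closes it faster.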

3.2.4.5 Weighted Moving Average

Weighted Moving Average (WMA) is a time series forecasting method that assigns different weights to different data points in the time series. This allows recent observations to have a stronger influence on the forecast compared to older observations. The formula for calculating the Weighted Moving Average at time t is as follows:

\[ WMA_t = (w_1 X_{t-1}) + (w_2 X_{t-2}) + \ldots + (w_n X_{t-n}) \tag{3.4} \]

Where:

WMA_t is the Weighted Moving Average at time t.
X_{t−1}, X_{t−2}, …, X_{t−n} are the data points in the time series at times t−1, t−2, …, t−n, respectively.
w_1, w_2, …, w_n are the corresponding weights, typically chosen to sum to 1, with larger weights on more recent observations.

Trend analysis by method of weighted moving average

Figure 10 shows the original data as the red line, alongside a trend line generated using weighted moving averages as the blue line. Weighted moving averages offer flexibility and responsiveness to recent data but require careful weight selection and can be sensitive to outliers. Their effectiveness depends on the specific characteristics of the time series and the goals of the analysis.
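A minimal Python sketch of equation (3.4) follows. Weights are listed from the most recent lag backwards (w_1 multiplies X_{t−1}), and the weighted sum is divided by the total weight so the result stays on the scale of the data even if the weights do not sum to 1. The linear weights 3, 2, 1 in the example are an illustrative assumption; the weights used in the study are not stated.

```python
def weighted_moving_average(x, weights):
    """WMA_t = sum_i w_i * X_{t-i} / sum_i w_i (eq. 3.4, normalised).

    weights[0] multiplies X_{t-1} (the most recent lag), weights[1]
    multiplies X_{t-2}, and so on. Positions with fewer than len(weights)
    earlier observations are None.
    """
    n = len(weights)
    total = sum(weights)
    wma = [None] * len(x)
    for t in range(n, len(x)):
        wma[t] = sum(w * x[t - i] for i, w in enumerate(weights, start=1)) / total
    return wma


wma = weighted_moving_average([1, 2, 3, 4, 5], [3, 2, 1])
print([None if v is None else round(v, 3) for v in wma])  # [None, None, None, 2.333, 3.333]
```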

3.2.4.6 Detrend the data set by using Weighted Moving Average

Detrending a time series data set involves removing the underlying trend or long-term systematic patterns from the data, leaving behind the shorter-term fluctuations and noise. The goal of detrending is to isolate the cyclical and irregular components of the time series for further analysis or modeling.

Detrending the energy per capita dataset

The blue line in Figure 11 shows the detrended series, obtained by dividing the original observations by the weighted moving average. The transformed series retains the seasonal pattern, cyclical components, and irregular errors. Because the trend has been removed, the series no longer drifts upward and instead fluctuates around a constant level, although its seasonal pattern remains and is removed in a later step.
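Multiplicative detrending is a point-by-point division. In the Python sketch below the true trend of a synthetic series is known, so the ratios recover the seasonal pattern exactly; in practice the trend comes from a smoother such as the weighted moving average, so the ratios also carry the irregular component and some smoothing error. Positions where the smoother is undefined are passed through as None. The series and function name are illustrative.

```python
def ratio_to_trend(y, trend):
    """Divide each observation by its trend value (multiplicative detrending).

    Entries where the trend is undefined (None), e.g. at the start of a
    moving average, are returned as None.
    """
    return [None if tr is None else obs / tr for obs, tr in zip(y, trend)]


# Synthetic multiplicative series: linear trend times a repeating 4-period pattern.
seasonal = [0.9, 1.1, 1.2, 0.8]
trend = [100 + 5 * t for t in range(12)]
y = [trend[t] * seasonal[t % 4] for t in range(12)]

detrended = ratio_to_trend(y, trend)
print([round(v, 3) for v in detrended])  # the seasonal pattern, with no upward drift
```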

3.2.5 Seasonal Index

A seasonal index, also known as a seasonal factor or seasonal multiplier, is a statistical measure used in time series analysis and forecasting. It quantifies the relative impact of seasonal patterns or fluctuations in a time series data set. Seasonal indices are typically expressed as ratios or percentages and provide information about how a particular time period’s value compares to the average or baseline value for that same period across multiple seasons or time periods.

Many methods exist for calculating seasonal indices. Some important methods are given below:

Method of Simple Averages: This method calculates seasonal indices by computing the average value of the data for each season over multiple years or time periods.
Ratio to Trend Method: In this method, seasonal indices are determined by comparing the actual data to a trend line. It’s often used when there’s a clear trend in the data along with seasonality.
Ratio-to-Moving-Average Method: This method uses a moving average to smooth out the data and calculate seasonal indices.
Method of Link Relatives: In this method, seasonal indices are computed by comparing the current period’s data to the data from a previous period (usually the same season in the previous year).

3.2.5.1 Method of Simple Averages

The Method of Simple Averages is a technique used to calculate seasonal indices for time series data. Seasonal indices are values that represent the relative strength or impact of each season (e.g., quarters, months) within a time series. This method provides a straightforward way to identify seasonal patterns in data.

The seasonal index (SI) for season m (for example, a calendar month) is the average value for that season, expressed relative to the overall average:

\[ SI_m = \frac{\bar{X}_m}{\bar{X}} \times 100 \tag{3.5}\]

Where:

SI_m is the seasonal index for season m, in percent.
\({\bar{X}_m}\) is the average of all observations that fall in season m, across all years.
\({\bar{X}}\) is the grand average, i.e. the mean of the seasonal averages \({\bar{X}_m}\).

Seasonal index analysis by method of simple average

Figure 12 shows the seasonal indices of per capita energy consumption calculated with the method of simple averages. The indices trace a pattern that repeats every year: consumption peaks in April, is generally higher during the summer season, and is lower during winter.
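The textbook method of simple averages can be sketched in Python as follows: average each season across all years, then divide each seasonal average by the grand average of the seasonal averages and multiply by 100, so the indices average exactly 100. The quarterly data (period 4, to keep the example short; monthly data would use period 12) and the function name are invented for illustration. The method ignores trend, so with a strong upward trend the later seasons of each year are slightly overstated.

```python
def seasonal_indices_simple_average(series, period=12):
    """Method of simple averages: SI_m = (mean of season m) / (grand mean) * 100.

    series must contain whole cycles, ordered from the first season (e.g. January).
    """
    if len(series) % period != 0:
        raise ValueError("series must contain a whole number of cycles")
    cycles = len(series) // period
    season_means = [
        sum(series[m + k * period] for k in range(cycles)) / cycles
        for m in range(period)
    ]
    grand_mean = sum(season_means) / period
    return [100 * sm / grand_mean for sm in season_means]


# Invented quarterly data (period = 4) over three years.
data = [80, 120, 110, 90,
        84, 126, 115, 95,
        88, 132, 120, 100]
print([round(si, 1) for si in seasonal_indices_simple_average(data, period=4)])
# [80.0, 120.0, 109.5, 90.5]
```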

3.2.5.2 Deseasonalize the Detrended Data by Method of Ratio to Moving Average

To “deseasonalize” a data set means to remove or adjust for the seasonal patterns and variations within the data. Seasonal patterns are regular fluctuations or cycles that occur at fixed intervals, such as daily, monthly, or yearly, and are often associated with external factors like weather, holidays, or business cycles. Deseasonalizing data is done to isolate the underlying trend or non-seasonal component, making it easier to analyze and forecast.

The Ratio to Moving Average method is a technique used in time series analysis to calculate seasonal indices or factors. These indices help deseasonalize the data, allowing analysts to isolate and study the non-seasonal components of a time series.

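In this method, each observation is divided by a moving average centred on the same period and spanning one full season, and the resulting ratios are averaged by season:

\[ r_t = \frac{X_t}{MA_t}, \qquad SI_m = \frac{1}{|T_m|} \sum_{t \in T_m} r_t \]

where MA_t is the moving average centred on period t, T_m is the set of periods belonging to season m, and the indices SI_m are usually rescaled so that they average 1 (or 100%).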

Line plot of deseasonalized data

Figure 13 shows the deseasonalized data set as the blue line. Seasonal indices were calculated with the ratio to moving average method and used to divide the detrended data. With both trend and seasonality removed, the data no longer exhibit seasonal patterns, and only the irregular (residual) component remains.
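The ratio to moving average steps can be sketched in Python as follows. For an even seasonal period the usual choice is a centred 2 × s moving average (half weight on the two end points), which is assumed here; the study's exact moving average is not stated. The ratios y_t / MA_t are averaged by season and rescaled to average 1, and each observation is then divided by the index of its season. The sketch deseasonalizes a raw synthetic series (function names and data are illustrative); dividing an already detrended series works the same way.

```python
def centred_moving_average(y, period):
    """Centred moving average over one season; a 2 x period MA when period is even."""
    half = period // 2
    cma = [None] * len(y)  # undefined where the window would run off either end
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # period + 1 points: both end points fall in the same season, so each gets half weight
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)  # exactly `period` points
        cma[t] = total / period
    return cma


def ratio_to_moving_average_indices(y, period):
    """Average y_t / CMA_t by season, then rescale so the indices average 1."""
    cma = centred_moving_average(y, period)
    ratios = [[] for _ in range(period)]
    for t, (obs, ma) in enumerate(zip(y, cma)):
        if ma is not None:
            ratios[t % period].append(obs / ma)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)
    return [r * scale for r in raw]


def deseasonalize(y, indices):
    """Divide each observation by the seasonal index of its season."""
    period = len(indices)
    return [obs / indices[t % period] for t, obs in enumerate(y)]


# Synthetic quarterly series (period 4): rising level times a fixed seasonal pattern.
pattern = [0.8, 1.2, 1.1, 0.9]
y = [(100 + t) * pattern[t % 4] for t in range(24)]

indices = ratio_to_moving_average_indices(y, period=4)
adjusted = deseasonalize(y, indices)
print([round(i, 3) for i in indices])       # close to the true pattern
print([round(v, 1) for v in adjusted[:6]])  # close to the rising level 100, 101, ...
```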

3.3 Forecasting Model: SARIMA

This study uses the Seasonal Autoregressive Integrated Moving Average (SARIMA) forecasting technique, typically written as ARIMA(p, d, q)(P, D, Q)s.

Here:

p is the order of the non-seasonal autoregressive (AR) part.
q is the order of the non-seasonal moving average (MA) part.
d is the order of non-seasonal differencing.
P is the order of the seasonal autoregressive part.
Q is the order of the seasonal moving average part.
D is the order of seasonal differencing.
s is the length of the seasonal period, e.g. s = 12 for monthly data with a yearly cycle[21].

This model combines non-seasonal and seasonal factors in a multiplicative framework[19]. Using the backward shift operator B, the SARIMA(p, d, q)(P, D, Q)s model can be written as:

\[ \phi(B)\,\Phi(B^s)\,(1-B)^d\,(1-B^s)^D\,y_t = \theta(B)\,\Theta(B^s)\,\varepsilon_t \tag{3.6} \]

Where,

\[ \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \tag{3.7} \]

\[ \Phi(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps} \tag{3.8} \]

\[ \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q \tag{3.9} \]

\[ \Theta(B^s) = 1 + \Theta_1 B^s + \Theta_2 B^{2s} + \cdots + \Theta_Q B^{Qs} \tag{3.10} \]

y_t represents the observation of the time series at time t.

B denotes the backward shift operator, defined by B y_t = y_{t−1} (so that B^s y_t = y_{t−s}).

ε_t is a white-noise sequence of error terms with mean zero and constant variance σ².
φ_i and Φ_j are the non-seasonal and seasonal autoregressive (AR) coefficients, and θ_i and Θ_j are the non-seasonal and seasonal moving average (MA) coefficients.

The non-seasonal components are:

\[ \text{AR: } \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p \tag{3.11} \]

\[ \text{MA: } \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q \tag{3.12} \]

The seasonal components are:

\[ \text{Seasonal AR: } \Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps} \tag{3.13} \]

\[ \text{Seasonal MA: } \Theta(B^s) = 1 + \Theta_1 B^s + \cdots + \Theta_Q B^{Qs} \tag{3.14} \]
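To make the differencing operator (1 − B)^d (1 − B^s)^D concrete: with d = 1, D = 1 and s = 12, as in the model reported in Table 1, it expands to y_t − y_{t−1} − y_{t−12} + y_{t−13}. The Python sketch below applies it to a synthetic series made of a linear trend plus a fixed monthly pattern (values and function names are invented). Seasonal differencing removes the pattern and leaves a constant equal to 12 times the slope; the extra first difference removes that constant. The order in which the two differences are applied does not matter.

```python
def difference(y, lag=1):
    """Apply (1 - B^lag): returns y_t - y_{t-lag} for t = lag, ..., len(y) - 1."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def sarima_differencing(y, d=1, D=1, s=12):
    """Apply (1 - B)^d (1 - B^s)^D to y, the differencing part of a SARIMA model."""
    w = list(y)
    for _ in range(D):
        w = difference(w, lag=s)
    for _ in range(d):
        w = difference(w, lag=1)
    return w


# Linear trend (slope 2) plus a fixed monthly pattern; synthetic values.
pattern = [5, 3, 8, 12, 10, 9, 7, 6, 4, 2, 1, 6]
y = [2 * t + pattern[t % 12] for t in range(48)]
print(difference(y, lag=12)[:3])   # [24, 24, 24]: seasonal differencing leaves 12 x slope
print(sarima_differencing(y)[:3])  # [0, 0, 0]: the extra first difference removes it
```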

3.3.1 auto.arima(): R Function for Finding Best SARIMA Model

The ‘auto.arima()’ function in R's ‘forecast’ package automates the identification of a well-fitting ARIMA model for a given time series. It chooses the differencing orders d and D using statistical tests, then searches over the orders (p, q, P, Q) with a stepwise algorithm, evaluating each candidate with an information criterion: the corrected AIC (AICc) by default, with the AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) available through the ‘ic’ argument. The candidate with the lowest criterion value is selected.

The ‘auto.arima()’ function offers several advantages for time series analysis. First, automating model selection saves considerable time, especially when many time series must be modeled. Second, it searches a broad space of candidate models and ranks them with an information criterion that penalizes complexity, which makes the choice systematic and reproducible. Third, it handles seasonal data by identifying and incorporating seasonal components automatically. Finally, it is easy to use for both beginners and experienced time series analysts.

However, there are certain limitations to consider. With long series, or when the stepwise search is switched off, the function can be computationally intensive because it fits many candidate models, which lengthens processing times. Additionally, although it simplifies model selection, the selected model may not capture the full complexity of a particular series, so manual checks and adjustments may still be needed. Furthermore, there is a risk of overfitting: the function may occasionally select a model that fits the training data too closely, resulting in poorer out-of-sample performance.

4 Result and Discussion

Table 1: Summary of the result of auto.arima()
Quantity	Value
Model	ARIMA(5,1,1)(2,1,2)[12]
AR1	1.080398
AR2	-0.518415
AR3	0.5163269
AR4	-0.7643631
AR5	0.2916679
MA1	-0.8257045
SAR1	-0.2177961
SAR2	-0.7157057
SMA1	-0.559535
SMA2	0.3690222
sigma2 (σ²)	1430.423
Log likelihood	-3032.415
AIC	6086.83
RMSE	37.82093

As Table 1 shows, ‘auto.arima()’ selected an ARIMA(5,1,1)(2,1,2)[12] model: a seasonal ARIMA model with non-seasonal orders (p, d, q) = (5, 1, 1), seasonal orders (P, D, Q) = (2, 1, 2), and a seasonal period of 12. AR1 to AR5 are the non-seasonal autoregressive coefficients and MA1 is the non-seasonal moving average coefficient; SAR1 and SAR2 are the seasonal autoregressive coefficients, and SMA1 and SMA2 are the seasonal moving average coefficients. The estimated error variance \(\sigma^2\) is approximately 1430.423, the log-likelihood is -3032.415, the Akaike Information Criterion (AIC) is 6086.83, and the Root Mean Squared Error (RMSE) is 37.82093. Together, these metrics describe the model's structure, goodness of fit, and in-sample accuracy.
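Reading Table 1 through the sign convention of equations (3.11) to (3.14), which is also the convention documented for R's ‘arima()’ (AR coefficients subtracted, MA coefficients added), gives the fitted model below. Coefficients are rounded to four decimals; this restatement is derived from the table rather than reported separately by the study.

\[ (1 - 1.0804B + 0.5184B^2 - 0.5163B^3 + 0.7644B^4 - 0.2917B^5)(1 + 0.2178B^{12} + 0.7157B^{24})(1 - B)(1 - B^{12})\, y_t = (1 - 0.8257B)(1 - 0.5595B^{12} + 0.3690B^{24})\, \varepsilon_t \]

Here \(\varepsilon_t\) has estimated variance \(\sigma^2 \approx 1430.423\), and the factors \((1 - B)\) and \((1 - B^{12})\) correspond to d = 1 and D = 1.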

Estimated Variance of Model’s Errors (\(\sigma^2\)): This value, approximately 1430.423, is the estimated variance of the model’s residuals, that is, of its one-step-ahead prediction errors. It indicates how far the observed values typically deviate from the values fitted by the model: a higher \(\sigma^2\) means larger and more variable errors, while a lower value means the model tracks the data more closely.
Log-Likelihood Value: The log-likelihood of -3032.415 measures how probable the observed data are under the fitted model. A higher log-likelihood indicates a better fit, and a lower one a poorer fit. Because the log-likelihood itself does not penalize additional parameters, it is mainly used as an input to criteria such as the AIC rather than on its own for model comparison.
Akaike Information Criterion (AIC): The AIC value of 6086.83 is a model-selection criterion defined as \(\mathrm{AIC} = 2k - 2\ln L\), where k is the number of estimated parameters and L is the maximized likelihood. It balances goodness of fit against model complexity, so lower values indicate a better trade-off. AIC values are compared across candidate models fitted to the same data, and the model with the lowest AIC is preferred.
Root Mean Squared Error (RMSE): The RMSE of 37.82093 measures prediction accuracy as the square root of the mean squared difference between the observed values and the model’s fitted values. A lower RMSE means the predictions are, on average, closer to the observations. It is one of the most widely used metrics for assessing time series forecasts.
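The reported values can be cross-checked with a short Python sketch. Assuming, as in R's ‘arima()’, that the parameter count k includes the error variance alongside the ten ARMA coefficients, k = 11 and \(2k - 2\ln L\) reproduces the AIC in Table 1. The square root of \(\sigma^2\) is also close to the reported RMSE. The `rmse` helper shows the usual definition applied to a list of residuals.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def rmse(residuals):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Values from Table 1
log_lik = -3032.415
sigma2 = 1430.423
# 5 AR + 1 MA + 2 seasonal AR + 2 seasonal MA coefficients, plus sigma^2 (assumed)
k = 5 + 1 + 2 + 2 + 1

print(round(aic(log_lik, k), 2))    # 6086.83
print(round(math.sqrt(sigma2), 3))  # 37.821
```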

These values together describe how well the time series model fits the data, how complex it is, and how accurately it predicts. When models fitted to the same data are compared, a lower \(\sigma^2\), a higher log-likelihood, and lower AIC and RMSE values generally indicate a better-performing model.

Forecasting energy demand of Bangladesh for 20 years.

Figure 14 illustrates the projected energy demand in Bangladesh from 2022 to 2042. The blue line shows the forecast, and the red line shows the actual observations. According to the forecast, energy demand is expected to reach 4000 kilowatt-hours per capita in 2042.

For comparison, the highest recorded energy consumption, in 2019, was approximately 3500 kilowatt-hours per capita. The projection therefore implies an increase of roughly 14% (from about 3500 to 4000 kilowatt-hours per capita) between 2019 and 2042.
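The percentage change can be checked from the two approximate per-capita figures quoted in this paragraph. The Python sketch below computes the total change between 2019 and 2042 and, as an added illustration not reported by the study, the equivalent compound annual growth rate.

```python
def percent_change(start, end):
    """Total percentage change from start to end."""
    return (end / start - 1) * 100


def annual_growth_rate(start, end, years):
    """Compound annual growth rate (CAGR), in percent per year."""
    return ((end / start) ** (1 / years) - 1) * 100


# Approximate per-capita figures quoted in the text (kWh per capita)
consumption_2019 = 3500
forecast_2042 = 4000

print(round(percent_change(consumption_2019, forecast_2042), 1))  # 14.3
print(round(annual_growth_rate(consumption_2019, forecast_2042, 2042 - 2019), 2))  # 0.58
```

Spread over the 23 years from 2019 to 2042, the projected rise corresponds to growth of roughly 0.6% per year.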

5 Conclusion

The objective of this project was to forecast energy demand in Bangladesh from 2022 to 2042 using the SARIMA (Seasonal Autoregressive Integrated Moving Average) model, a widely used method for time series forecasting.

The dataset for this analysis was extensive, comprising 128 variables related to energy production, consumption, and various associated factors. Notable observations from the data include the strong growth of gas consumption, the utilization of coal and oil, and a steady increase in energy consumption per capita over the years.

In-depth time series analysis and data preprocessing involved techniques such as detrending and deseasonalizing to isolate underlying patterns. Seasonal indices were computed to understand recurring patterns, which were then used for deseasonalization.
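The seasonal-index step summarised above can be sketched with the ratio-to-moving-average method, assuming a multiplicative seasonal pattern with period m (m = 12 for monthly data). Each observation is divided by a centred moving average (a 2 × m average when m is even), the ratios are averaged by season, and the averages are rescaled so that they average to 1. This is an illustrative Python sketch with hypothetical helper names, not the code used in the study.

```python
def centered_moving_average(y, m):
    """Centred moving average of span m (a 2 x m average when m is even).

    Returns a list the same length as y, with None where the window does not fit.
    """
    n = len(y)
    out = [None] * n
    half = m // 2
    for t in range(half, n - half):
        if m % 2 == 1:
            window = sum(y[t - half:t + half + 1])
        else:
            # Half weight on the two end points so the window is centred on t
            window = 0.5 * y[t - half] + sum(y[t - half + 1:t + half]) + 0.5 * y[t + half]
        out[t] = window / m
    return out


def seasonal_indices(y, m):
    """Multiplicative seasonal indices by the ratio-to-moving-average method."""
    cma = centered_moving_average(y, m)
    ratios = [[] for _ in range(m)]
    for t, (value, avg) in enumerate(zip(y, cma)):
        if avg is not None:
            ratios[t % m].append(value / avg)
    raw = [sum(r) / len(r) for r in ratios]
    scale = m / sum(raw)  # rescale so the indices average to 1
    return [r * scale for r in raw]


def deseasonalize(y, indices):
    """Divide each observation by the seasonal index of its season."""
    m = len(indices)
    return [value / indices[t % m] for t, value in enumerate(y)]


# Hypothetical toy series: constant level 100 with quarterly factors 0.8, 1.2, 1.0, 1.0
y = [100 * f for _ in range(3) for f in (0.8, 1.2, 1.0, 1.0)]
idx = seasonal_indices(y, 4)
print([round(i, 3) for i in idx])                       # [0.8, 1.2, 1.0, 1.0]
print([round(v, 3) for v in deseasonalize(y, idx)][:4])  # [100.0, 100.0, 100.0, 100.0]
```

On this toy series the recovered indices equal the factors used to build it, and dividing by them returns the constant level.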

The SARIMA model selected by ‘auto.arima()’ was SARIMA(5,1,1)(2,1,2)[12]. It contains five non-seasonal autoregressive terms, one non-seasonal moving average term, two seasonal autoregressive terms, and two seasonal moving average terms.

The model evaluation metrics, namely the estimated variance of model errors (\(\sigma^2\)), log-likelihood, AIC, and RMSE, describe the model’s goodness of fit and predictive accuracy. With an in-sample RMSE of 37.82, the model showed a reasonable fit to the data.

The projection indicates that energy demand in Bangladesh will reach about 4000 kilowatt-hours per capita by 2042, an increase of roughly 14% over the 2019 level of about 3500 kilowatt-hours per capita.

In conclusion, this study emphasizes the pressing need for robust energy demand forecasting in Bangladesh to address the challenges of rapid population growth and rising energy consumption. The SARIMA model offers a promising tool for policymakers and energy planners to manage capacity development and investment proactively. Effective use of these forecasts in planning will be vital in steering Bangladesh toward a cleaner, more sustainable, and accessible energy future while meeting the demands of a growing population and economy.

Reference

[1] “Population Matters.” https://populationmatters.org/lp-thefacts/ (accessed Sep. 14, 2023).

[2] “IEA.” https://www.iea.org/commentaries/for-the-first-time-in-decades-the-number-of-people-without-access-to-electricity-is-set-to-increase-in-2022 (accessed Sep. 14, 2023).

[3] “Data Commons.” https://datacommons.org/place/country/BGD/?utm_medium=explore&mprop=count&popt=Person&hl=en (accessed Sep. 14, 2023).

[4] “Trading Economics.” https://tradingeconomics.com/bangladesh/access-to-electricity-percent-of-population-wb-data.html (accessed Sep. 14, 2023).

[5] “World Data.” https://www.worlddata.info/asia/bangladesh/energy-consumption.php (accessed Sep. 14, 2023).

[6] A. R. Anik and S. Rahman, “Commercial energy demand forecasting in Bangladesh,” Energies (Basel), vol. 14, no. 19, Oct. 2021, doi: 10.3390/en14196394.

[7] “Volza Grow Global.” https://www.volza.com/p/coal/import/import-in-bangladesh/ (accessed Sep. 14, 2023).

[8] “OEC.” https://oec.world/en/profile/bilateral-product/crude-petroleum/reporter/bgd (accessed Sep. 14, 2023).

[9] H. Musbah and M. El-Hawary, “SARIMA Model Forecasting of Short-Term Electrical Load Data Augmented by Fast Fourier Transform Seasonality Detection.”

[10] T. Fang and R. Lahdelma, “Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system,” Appl Energy, vol. 179, pp. 544–552, Oct. 2016, doi: 10.1016/j.apenergy.2016.06.133.

[11] K. Goswami and A. B. Kandali, “Electricity Demand Prediction using Data Driven Forecasting Scheme: ARIMA and SARIMA for Real-Time Load Data of Assam.”

[12] D. Saxena, S. N. Singh, and K. S. Verma, “Application of computational intelligence in emerging power systems,” 2010. [Online]. Available: www.ijest-ng.com

[13] V. M. Rueda Mejía, J. D. Velásquez Henao, and C. J. Franco Cardona, “Electricity demand forecasting using a SARIMA-multiplicative single neuron hybrid model,” vol. 80, pp. 4–8, 2013.

[14] P. Mathiyalagan, A. Shanmugapriya, and A. V. Geethu, “Smart Meter Data Analytics using R and Hadoop.”

[15] H. L. M. Do Amaral, J. A. G. Maginador, R. M. J. Ayres, A. N. De Souza, and D. S. Gastaldello, “Integration of consumption forecasting in Smart Meters and Smart Home Management Systems.”

[16] B. Yildiz, J. I. Bilbao, J. Dore, and A. Sproul, “Household electricity load forecasting using historical smart meter data with clustering and classification techniques.”

[17] M. Suresh, Anbarasi, R. Jayasree, C. Shivani, and P. Sowmiya, “Smart Meter Data Analytics Using Particle Swarm Optimization,” in Proceeding of International Conference on Systems Computation Automation and Networking, 2019.

[18] J. L. Tena García, E. Cadenas Calderón, E. Rangel Heras, and C. Morales Ontiveros, “Generating electrical demand time series applying SRA technique to complement NAR and sARIMA models,” Energy Effic, vol. 12, no. 7, pp. 1751–1769, Oct. 2019, doi: 10.1007/s12053-019-09774-2.

[19] P. Y. Andoh, C. K. K. Sekyere, L. D. Mensah, and D. E. Dzebre, “Forecasting electricity demand in Ghana with the SARIMA model,” The Brew Hammond Energy Centre.

[20] “Energy Dataset Country-Wise (1900-2021),” Kaggle; sources: Our World in Data, BP Statistical Review of World Energy, SHIFT Data Portal, Ember – Data Explorer, and the Ember European Electricity Review. https://www.kaggle.com/datasets/pranjalverma08/energy-dataset-countrywise-19002021 (accessed Sep. 15, 2023).

[21] J. Shi, X. Qu, and S. Zeng, “Short-term wind power generation forecasting: Direct versus indirect ARIMA-based approaches,” Int J Green Energy, vol. 8, no. 1, pp. 100–112, Jan. 2011, doi: 10.1080/15435075.2011.546755.

Time Series Analysis and Forecasting of Energy Demand in Bangladesh by Using SARIMA Model

Md. Makfidunnabi

2023-10-07

1 Introduction

1.1 Objective

2 Literature Review

3 Methodology

3.1 Data Description

3.2 Time Series Analysis

3.2.1 R Libraries for Time Series Analysis

3.2.2 Observing the Data

3.2.3 Decomposition of the Data

3.2.4 Trend Analysis

3.2.4.1 Method of Semi Average

3.2.4.2 Method of Least Squares

3.2.4.3 Method of Simple Moving Average

3.2.4.4 Method of Exponential Moving Average

3.2.4.5 Weighted Moving Average

3.2.4.6 Detrend the data set by using Weighted Moving Average

3.2.5 Seasonal Index

3.2.5.1 Method of Simple Averages

3.2.5.2 Deseasonalize the Detrended Data by Method of Ratio to Moving Average

3.3 Forecasting Model: SARIMA

3.3.1 auto.arima(): R Function for Finding Best SARIMA Model

4 Result and Discussion

5 Conclusion

Reference