Quantitative Forecasting

Part 1: Data Wrangling/Visualization

R Batzinger
Mini-workshop for IT340

Forecasting

Using historical data to suggest future outcomes based on repeating patterns

Why do we attempt to forecast?

  • Scheduling of staff, equipment, and maintenance procedures
  • Acquiring resources (such as limited/seasonal supply)
  • Determing resource requirements based on typical needs

Defining the issues

  • Data Requirements

    • Required reliability
    • Required advanced warning
    • Speed of data flow
  • Forecast Range

    • Short Term - scheduling of peersonnel, production and transportation
    • Medium Term - purchase of raw materials, hire personnel, buy machinery
    • Long Term - strategic planning, market opportunities, environmental factors
  • Skills Required

    • Identification of key forecasting parameters
    • Ability to calculate forecasts
    • Means to test and choose appropriate forecasting methods
    • Management support for the use of forecasts

Essential Requirements for Forecasting

  • Information about the past is available
  • Information is quantified as numeric values
  • It can be assumed that the past patterns will continue

Basic Strategy

  • Acquire the data

  • Standardize and normalize the data

  • Handle outliers (Points beyond 3 std dev)

  • Visualize the data

  • Look for trends and seasonal variations

  • Develop the model

    • Create a model
    • Test the model for accuracy
    • Make corrections
    • Try again
  • Put the model into use

  • Monitor and adjust

Basic models

  • Tomorrow is a function of today

\[V_{(t+1)} = V_{(t)} + \alpha F_{(t+1)} + \epsilon\]

  • Tomorrow is related to today’s changes in the components

\[V_{(t+1)} = V_{(t)} + \Delta v_a + \Delta v_b + \Delta v_c+ \epsilon\]

  • Tomorrow is a moving average of the last several data points

\[ V_{(t+1)} = \left(\frac{V_{(t-1)} + V_{(t)}}{2}\right)\]

  • Trend and Seasonal variation

\[V_{(t+1)} = V_{(trend)} \times (seasonal\_factor)[t+1]\]

Computing environment

  • Excel - often used for prototyping (proof of concept)
  • Python - Common programming environment for creating smart agents
  • R - tool of choice for creating statistically-based production systems
  • Dashboard tools - MS BI, Tableau

Google Sheets spreadsheet

  • Solver - optimizer

Setup of Add-ons

  • Statistics Package

Visualization

  • value over time
  • value over 1 seasonal cycle
  • Trend analysis
  • Decompostion

Some Datasets

Data that tell as story

Load Factor for US Air carriers

Shows effects

  • Recession
  • Covid 19 emergency methods
  • Seasonal effects

China GDP per Capita

\[\tiny\matrix{&\rm\hbox{GDP per capita}\\ \rm\hbox{Year}&\rm\hbox{(In USD})\\ 2013 &7,202\\ 2014 &7,759\\ 2015 &8,103\\ 2016 & 8,258\\ 2017 &8,982\\ 2018 &10,091\\ 2019 &10,344\\ 2020 &10,628\\ 2021 &12,885\\ 2022 &12,983\\ 2023 &12,959\\ 2024 & 13,306\\ }\]

Thai GDP per capita

Thai Electrical Consumption

  • Source: Energy Policy and Planning office at EPPO, Bangkok, Thailand
  • https://www.eppo.go.th/index.php/en/en-energystatistics/electricity-statistic

BangChak Petroleum Prices

  • https://www.bangchak.co.th/en/oilprice/historical

  • Weekly listing of consumer petroleum prices

  • Download years: 2018-2025

CNX Airport Traffic Planes and Passengers

  • Source : Air Transport Information Division, Airports of Thailand.

  • URL: https://investor-th.airportthai.co.th/transport.html

  • Downloaded: 2017-2025

Thai Government Data Center

  • https://data.go.th/en
  • Source of data posted by the various ministries of the Thai govenment

Solver

An search function for optimized solutions

What is Solver?

  • an optimization tool that searchs for the best possible solution to a problem

  • Works by adjusting values to reach a specific objective, subject to a set of constraints.

  • Useful for tasks like:

    * Maximizing profit
    * Minimizing costs
    * Achieving a target value by changing the input values
    * Modelling a trend by minimizing error in the model

Solver requirements

  • an objective cell (the cell to be optimized)
  • the variable cells (list of the values Solver can change)
  • list of constraints (limitations or rules) that must be followed.

Step-by-Step Installation

  1. Open a Google Sheets spreadsheet.

  2. In the top menu bar, click on Extensions.

  3. From the dropdown menu, select Add-ons, then click Get add-ons.

  4. In the search bar of the Google Workspace Marketplace, type “Solver” and press enter.

  5. Select the Solver add-on by Frontline Systems, Inc. and click on it.

  6. Click the Install button and follow the prompts to grant the necessary permissions.

  7. Once the installation is complete, you can access Solver by going to Extensions > Solver > Start.

Forecasting Thai monthly electricity consumption

Steps

  1. Download the dataset from the Data.go.th

  2. Edit and trim the file to a simple time sequence of monthly consumption figures

  3. Cut and paste the time range of interest

  4. Rearrange the date: rows by year; Months by column

  5. Calculate the total consumption per year

  6. Develop a linear model to predict the total consumption by year

  7. Calculate the monthly fraction of consumption based on actual totals

  8. Average the fractions by month

  9. Multiply the average monthly fractions by the estimated annual consumption

  10. Convert the monthly estimates into a time series

  11. Plot the actual and estimated monthly consumption

Your Challenge

  1. Use the electricity consumption figures for Jan 2020 to Dec 2024.
  2. Follow the procedure discussed on the last slide to predict the consumption Jan 2025 to Dec 2026.
  3. Compare the accuracy of this estimate. Has the accuracy increased?
  4. Suggest estimates came out this way.