Module: Introduction to FinTech Using R

Session 1: Introduction to Financial Analysis Using R

Topics

  • Overview of financial data types (stocks, bonds, forex, etc.)
  • Introduction to key R packages (quantmod)
  • Basics of importing financial data using APIs and from financial databases

Practical Exercise

Import stock data for a list of companies using quantmod and visualize the closing prices over time.

Load necessary packages

library(quantmod)

Importing GOLD stock data using quantmod

getSymbols("GOLD", src = "yahoo", from = "2025-11-01", to = "2026-02-01")
## [1] "GOLD"
chartSeries(GOLD, theme = chartTheme("black"))

chartSeries(GOLD, theme = chartTheme("white"))

Green = gains; Red = losses

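To import a different stock, look up its ticker symbol on https://finance.yahoo.com/ and pass that symbol to getSymbols(). Yahoo shows an exchange code next to each listing, for example: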

  • NMS: NASDAQ Global Market System
  • NYQ: New York Stock Exchange (NYSE)
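
The exercise asks for a list of companies, while the code above imports a single ticker. One possible way to extend it is sketched below. The helper name fetch_closes() and the example tickers are assumptions made for illustration, and the download itself needs internet access, so the call is left commented out.

```r
# Hedged sketch: download closing prices for several tickers.
# fetch_closes() is a hypothetical helper, not part of quantmod.
fetch_closes <- function(tickers, from, to) {
  if (!requireNamespace("quantmod", quietly = TRUE)) {
    stop("Package 'quantmod' is required.")
  }
  closes <- lapply(tickers, function(sym) {
    # auto.assign = FALSE returns the xts object instead of creating a variable
    px <- quantmod::getSymbols(sym, src = "yahoo", from = from, to = to,
                               auto.assign = FALSE)
    quantmod::Cl(px)
  })
  out <- do.call(merge, closes)   # align all series on their dates
  colnames(out) <- tickers
  out
}

# Usage (needs internet access; tickers are examples only):
# closes <- fetch_closes(c("AAPL", "MSFT"), "2025-11-01", "2026-02-01")
# plot(closes, main = "Closing prices", legend.loc = "topleft")
```

Returning one merged object keeps the closing prices aligned by date, which makes a combined plot straightforward.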

Re-importing GOLD (same window as above; set an earlier from date to lengthen it)

getSymbols("GOLD", src = "yahoo", from = "2025-11-01", to = "2026-02-01")
## [1] "GOLD"
chartSeries(GOLD, theme = chartTheme("white"))

Session 2: Manipulating and Analyzing Financial Time Series

Topics

  • Time series data manipulation (transformations, frequency changes)
  • Calculating returns and other financial metrics
  • Visualizing financial time series

Practical Exercise

Calculate daily and monthly returns for different stocks and create plots showing price trends and returns.

Calculate daily returns

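The dailyReturn() function computes the simple (arithmetic) return between the closing prices of two consecutive days, (C_t - C_{t-1}) / C_{t-1}. The result is a decimal fraction (0.01 means 1%); pass type = "log" for log returns instead.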

GOLD_daily_returns <- dailyReturn(GOLD)
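
To make the calculation concrete, the sketch below reproduces the simple-return formula in base R on a few made-up closing prices. The vector close_px is hypothetical, not GOLD data, and the log-return line is included only for comparison.

```r
# Hypothetical closing prices for four consecutive days (not GOLD data)
close_px <- c(100, 102, 101, 103.02)

# Simple (arithmetic) return: (P_t - P_{t-1}) / P_{t-1}
simple_ret <- diff(close_px) / head(close_px, -1)

# Log return: log(P_t / P_{t-1}); close to the simple return when moves are small
log_ret <- diff(log(close_px))

print(round(cbind(simple_ret, log_ret), 4))
```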

Calculate monthly returns

monthlyReturn() first collapses the daily series to one price per month (the last close of each month) and then computes returns between those month-end prices.

GOLD_monthly_returns <- monthlyReturn(GOLD)
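
The two steps can be sketched in base R on synthetic data. The dates and prices below are invented for illustration. This sketch yields one value fewer than the number of months, whereas quantmod also reports a leading value for the first period (its leading = TRUE default).

```r
# Synthetic daily closes for 1 Jan to 31 Mar 2025 (calendar days, made-up values)
dates    <- as.Date("2025-01-01") + 0:89
close_px <- 100 + 0.1 * seq_along(dates)

# Step 1: keep the last close of each calendar month
month_id     <- format(dates, "%Y-%m")
month_end_px <- as.numeric(tapply(close_px, month_id, function(p) p[length(p)]))

# Step 2: simple returns between consecutive month-end prices
monthly_ret <- diff(month_end_px) / head(month_end_px, -1)

print(round(monthly_ret, 4))
```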

Visualizing returns

barChart(GOLD_daily_returns, name = "Daily Returns for GOLD")

barChart(GOLD_monthly_returns, theme = chartTheme("white"), name = "Monthly Returns for GOLD")

barChart(GOLD_daily_returns, theme = chartTheme("black", grid.col = "gray"), name = "Daily Returns for GOLD")

Session 3: Basic Financial Calculations

Topics

  • Risk and return analysis
  • Moving averages, volatility computation
  • Technical indicators (MACD, RSI)

Practical Exercise

Compute and plot moving averages and volatility for selected stocks, and apply technical indicators to generate trading signals.

Moving averages

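A simple moving average (SMA) is the arithmetic mean of the series over the past n observations. The charts below add volume and a 20-day or 50-day SMA overlay. With roughly three months of data (about 60 trading days), the 50-day SMA is defined only for the last few observations.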

chartSeries(GOLD, TA = "addVo();addSMA(20)", theme = chartTheme("white"))

chartSeries(GOLD, TA = "addVo();addSMA(50)", theme = chartTheme("white"))
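
As a rough illustration of what the SMA overlay computes, the base-R sketch below averages the current and previous n - 1 values of a short, made-up price vector. It does not call quantmod or TTR.

```r
px <- c(10, 11, 12, 13, 14, 15)   # hypothetical prices
n  <- 3

# One-sided filter: each value is the mean of the current and previous n - 1 points
sma <- as.numeric(stats::filter(px, rep(1 / n, n), sides = 1))
print(sma)
```

The first n - 1 entries are NA because a full window is not yet available, which is why a long SMA on a short sample shows so few points.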

Volatility (standard deviation of daily returns)

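Volatility is a statistical measure of the dispersion of returns for a given security or market index, typically used to quantify risk. Here it is estimated with runSD() as the standard deviation of daily returns over a rolling 20-day window, so the first 19 values are NA.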

GOLD_volatility <- runSD(GOLD_daily_returns, n = 20)
plot(GOLD_volatility, main = "20-Day Rolling Volatility of GOLD")
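
The rolling calculation can be sketched in base R as follows. The returns are simulated, not GOLD data, and the sqrt(252) annualization is a common convention that assumes about 252 trading days per year.

```r
set.seed(1)
ret <- rnorm(60, mean = 0, sd = 0.01)   # synthetic daily returns
n   <- 20

# embed() lays out every n-length window as one row of a matrix
windows <- embed(ret, n)
roll_sd <- c(rep(NA, n - 1), apply(windows, 1, sd))

# Convert daily volatility to an annualized figure (assumes ~252 trading days)
annualized <- roll_sd * sqrt(252)

print(round(tail(cbind(roll_sd, annualized), 3), 4))
```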

Technical indicators

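  • MACD (Moving Average Convergence/Divergence): difference between a fast and a slow exponential moving average of price (TTR defaults: 12 and 26 periods), compared against a 9-period signal line.
  • RSI (Relative Strength Index): ratio of recent upward price movements to absolute price movement, scaled to 0–100 (TTR default: 14 periods).
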
GOLD_macd_vals <- MACD(Cl(GOLD))
GOLD_rsi_vals  <- RSI(Cl(GOLD))
plot(GOLD_macd_vals, main = "MACD GOLD")

plot(GOLD_rsi_vals,  main = "RSI GOLD")
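
The exercise also asks for trading signals, which the plots alone do not produce. The sketch below shows one naive way to derive them from an RSI series using the conventional 30/70 thresholds. simple_rsi() and rsi_signal() are hypothetical helpers, the RSI uses plain moving averages rather than TTR's default smoothing (so its values will differ from RSI()), and the price path is simulated.

```r
# Simplified RSI: share of upward movement in total absolute movement, 0 to 100.
# Assumption: plain moving averages, not TTR's default smoothing.
simple_rsi <- function(px, n = 14) {
  chg  <- diff(px)
  up   <- pmax(chg, 0)    # upward moves
  down <- pmax(-chg, 0)   # downward moves, as positive numbers
  avg_up   <- as.numeric(stats::filter(up,   rep(1 / n, n), sides = 1))
  avg_down <- as.numeric(stats::filter(down, rep(1 / n, n), sides = 1))
  c(NA, 100 * avg_up / (avg_up + avg_down))  # pad so length matches px
}

# Map RSI values to a naive signal
rsi_signal <- function(rsi, lower = 30, upper = 70) {
  ifelse(is.na(rsi), NA_character_,
         ifelse(rsi > upper, "sell", ifelse(rsi < lower, "buy", "hold")))
}

set.seed(42)
px  <- 35 + cumsum(rnorm(60, sd = 0.5))  # synthetic price path
rsi <- simple_rsi(px)
print(table(rsi_signal(rsi), useNA = "ifany"))
```

A threshold rule like this is only a starting point; in practice it would be combined with other indicators before acting on it.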

Plotting with technical indicators

chartSeries(GOLD,
            TA = "addVo();addMACD();addRSI();addSMA(20);addBBands()",
            theme = chartTheme("white"))

Session 4: Time Series Forecasting

Topics

  • Introduction to forecasting models (ARIMA, exponential smoothing)
  • Building and validating a time series model
  • Forecasting future stock prices or economic indicators

Practical Exercise

Build a simple ARIMA model to forecast next month’s prices for a chosen stock, and evaluate the model’s accuracy.

Background on ARIMA

ARIMA stands for AutoRegressive Integrated Moving Average:

  • AR (AutoRegressive): leverages the dependent relationship between an observation and a number of lagged observations.
  • I (Integrated): differences raw observations to make the series stationary.
  • MA (Moving Average): models the error term as a linear combination of past error terms.

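The auto.arima() function from the forecast package selects an ARIMA model automatically. It chooses the differencing order d with a unit-root test (KPSS by default), then searches over the AR (p) and MA (q) orders, by default with a stepwise search rather than every combination, and keeps the model with the lowest information criterion (AICc by default; AIC or BIC can be requested through the ic argument).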
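
The idea of ranking candidate models by an information criterion can be illustrated with a brute-force grid in base R. This is not auto.arima()'s algorithm: d is fixed at 1 as an assumption, every (p, q) pair is fitted, and plain AIC is used, so the winner on this grid need not match auto.arima()'s choice.

```r
# Compare ARIMA(p, 1, q) fits by AIC on the built-in BJsales series.
# Assumption: d = 1 is fixed here rather than chosen by a unit-root test.
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- mapply(function(p, q) {
  fit <- tryCatch(arima(BJsales, order = c(p, 1, q)), error = function(e) NULL)
  if (is.null(fit)) NA_real_ else AIC(fit)
}, grid$p, grid$q)

best <- grid[which.min(grid$aic), ]   # lowest AIC among the fitted candidates
print(grid[order(grid$aic), ])
```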

Load forecasting library

library(forecast)

Load BJsales time series data

data("BJsales")
BJsales
## Time Series:
## Start = 1 
## End = 150 
## Frequency = 1 
##   [1] 200.1 199.5 199.4 198.9 199.0 200.2 198.6 200.0 200.3 201.2 201.6 201.5
##  [13] 201.5 203.5 204.9 207.1 210.5 210.5 209.8 208.8 209.5 213.2 213.7 215.1
##  [25] 218.7 219.8 220.5 223.8 222.8 223.8 221.7 222.3 220.8 219.4 220.1 220.6
##  [37] 218.9 217.8 217.7 215.0 215.3 215.9 216.7 216.7 217.7 218.7 222.9 224.9
##  [49] 222.2 220.7 220.0 218.7 217.0 215.9 215.8 214.1 212.3 213.9 214.6 213.6
##  [61] 212.1 211.4 213.1 212.9 213.3 211.5 212.3 213.0 211.0 210.7 210.1 211.4
##  [73] 210.0 209.7 208.8 208.8 208.8 210.6 211.9 212.8 212.5 214.8 215.3 217.5
##  [85] 218.8 220.7 222.2 226.7 228.4 233.2 235.7 237.1 240.6 243.8 245.3 246.0
##  [97] 246.3 247.7 247.6 247.8 249.4 249.0 249.9 250.5 251.5 249.0 247.6 248.8
## [109] 250.4 250.7 253.0 253.7 255.0 256.2 256.0 257.4 260.4 260.0 261.3 260.4
## [121] 261.6 260.8 259.8 259.0 258.9 257.4 257.7 257.9 257.4 257.3 257.6 258.9
## [133] 257.8 257.7 257.2 257.5 256.8 257.5 257.0 257.6 257.3 257.5 259.6 261.1
## [145] 262.9 263.3 262.8 261.8 262.2 262.7
plot(BJsales, main = "BJSales without forecasting")

Fit an ARIMA model

BJfit <- auto.arima(BJsales)
summary(BJfit)
## Series: BJsales 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1      ma1
##       0.8800  -0.6415
## s.e.  0.0644   0.1035
## 
## sigma^2 = 1.8:  log likelihood = -254.37
## AIC=514.74   AICc=514.9   BIC=523.75
## 
## Training set error measures:
##                     ME     RMSE    MAE        MPE      MAPE      MASE
## Training set 0.1457572 1.328119 1.0447 0.06512899 0.4600935 0.8997702
##                     ACF1
## Training set -0.02622396

We have an ARIMA(1,1,1) model:

  • 1 for the AR part — 1 lagged term of the variable used.
  • 1 for the I part — data has been differenced once to achieve stationarity.
  • 1 for the MA part — model uses one lagged forecast error term.
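
To see what these three terms do, the sketch below rebuilds the one-step-ahead forecast by hand from the fitted coefficients and compares it with predict(). It refits the same order with base R's arima() instead of the forecast package so that it runs on its own; the variable names are illustrative.

```r
fit   <- arima(BJsales, order = c(1, 1, 1))
phi   <- unname(coef(fit)["ar1"])   # AR coefficient
theta <- unname(coef(fit)["ma1"])   # MA coefficient

y <- as.numeric(BJsales)
e <- as.numeric(residuals(fit))     # one-step-ahead forecast errors
n <- length(y)

# Differenced form: dy[t] = phi * dy[t-1] + e[t] + theta * e[t-1],
# so the next value is the last value plus the predicted change.
manual_fc <- y[n] + phi * (y[n] - y[n - 1]) + theta * e[n]
model_fc  <- as.numeric(predict(fit, n.ahead = 1)$pred)

print(c(manual = manual_fc, predict = model_fc))
```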

Check model diagnostics

checkresiduals(BJfit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1)
## Q* = 5.8814, df = 8, p-value = 0.6605
## 
## Model df: 2.   Total lags used: 10
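
The test reported above can be approximated with base R's Box.test(), which makes the role of the two numbers in the last line visible: 10 lags are used and 2 degrees of freedom are subtracted for the estimated coefficients. The model is refitted here with arima() so the sketch is self-contained.

```r
fit <- arima(BJsales, order = c(1, 1, 1))

# Ljung-Box test on the residuals: 10 lags, minus 2 estimated coefficients (ar1, ma1)
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb)

# A large p-value means no evidence of leftover autocorrelation in the residuals
no_autocorrelation_found <- lb$p.value > 0.05
print(no_autocorrelation_found)
```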

Create a forecast

BJ_future_values <- forecast(BJfit, h = 10)  # next 10 observations (BJsales has frequency 1, so periods are not calendar months)
plot(BJ_future_values, main = "BJSales with forecasting")
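
The shaded bands in the forecast plot are prediction intervals. A minimal sketch of how such bands arise, assuming normally distributed forecast errors and using base R's predict() on an arima() refit, is:

```r
fit <- arima(BJsales, order = c(1, 1, 1))
fc  <- predict(fit, n.ahead = 10)      # point forecasts and standard errors

pred <- as.numeric(fc$pred)
se   <- as.numeric(fc$se)

# Normal-theory intervals: point forecast +/- z * standard error
intervals <- data.frame(
  h    = 1:10,
  mean = pred,
  lo80 = pred - qnorm(0.90)  * se,
  hi80 = pred + qnorm(0.90)  * se,
  lo95 = pred - qnorm(0.975) * se,
  hi95 = pred + qnorm(0.975) * se
)
print(round(intervals, 2))
```

The standard error grows with the horizon h, which is why the bands fan out.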

Forecasting GOLD Open Price

plot(GOLD$GOLD.Open, main = "GOLD Open Price without forecasting")

Fit an ARIMA model on GOLD Open

GOLD_OPfit <- auto.arima(GOLD$GOLD.Open)
summary(GOLD_OPfit)
## Series: GOLD$GOLD.Open 
## ARIMA(0,1,2) with drift 
## 
## Coefficients:
##          ma1      ma2   drift
##       0.1815  -0.5136  0.4211
## s.e.  0.1120   0.1103  0.1250
## 
## sigma^2 = 2.117:  log likelihood = -106.49
## AIC=220.97   AICc=221.7   BIC=229.35
## 
## Training set error measures:
##                       ME    RMSE       MAE        MPE     MAPE      MASE
## Training set -0.02100261 1.40664 0.9474574 -0.3898475 2.739125 0.9228481
##                    ACF1
## Training set 0.02547112

Check model diagnostics

checkresiduals(GOLD_OPfit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2) with drift
## Q* = 4.2909, df = 8, p-value = 0.83
## 
## Model df: 2.   Total lags used: 10

Create a forecast

GOLD_OP_future_values <- forecast(GOLD_OPfit, h = 10)  # next 10 trading days
plot(GOLD_OP_future_values, main = "GOLD Open Price with forecasting")
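
The error measures printed by summary() are in-sample. To evaluate accuracy as the exercise asks, one option is a hold-out test: fit on the early part of the series and score the forecast against the observations that were held back. The sketch below uses the built-in BJsales series as a stand-in so that it runs offline; the 10-observation hold-out and the reuse of order (1,1,1) are assumptions.

```r
y     <- as.numeric(BJsales)
h     <- 10                                  # size of the hold-out set
train <- y[1:(length(y) - h)]
test  <- y[(length(y) - h + 1):length(y)]

# Fit on the training part only, then forecast the held-out observations
fit  <- arima(train, order = c(1, 1, 1))
pred <- as.numeric(predict(fit, n.ahead = h)$pred)

err  <- test - pred
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- 100 * mean(abs(err / test))

print(round(c(RMSE = rmse, MAE = mae, MAPE = mape), 3))
```

The same split could be applied to the GOLD opening prices once they have been downloaded.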

Interpretation of the GOLD Stock Forecast

Memo to Prof. Colin

From: Bhargav Patel

Date: April 29, 2026

Subject: ARIMA forecast (HW15)

Dear Professor,

Summary:

This study applies R-based financial analytics to Barrick Gold Corporation (ticker: GOLD) using the quantmod and forecast packages, covering data import, return calculations, technical indicators, and ARIMA-based time series forecasting.

Historical price data was pulled from Yahoo Finance for the period November 1, 2025 to February 1, 2026, providing roughly three months of daily trading observations.

The analysis is structured around four sessions: (1) data import and visualization, (2) return computation, (3) volatility and technical indicators, and (4) ARIMA forecasting of future prices.

Daily and monthly returns were calculated using dailyReturn() and monthlyReturn(), and 20-day rolling volatility was computed via runSD() to quantify short-term risk.

Technical indicators including SMA-20, SMA-50, MACD, RSI, and Bollinger Bands were overlaid on price charts to identify trend direction, momentum, and potential overbought/oversold conditions.

The auto.arima() function was applied to both the benchmark BJsales dataset and the GOLD opening price series to select model orders by minimizing an information criterion (AICc by default).

The BJsales benchmark fit produced an ARIMA(1,1,1) model — one autoregressive term, one differencing step for stationarity, and one moving average term — confirming the procedure works correctly before applying it to GOLD.

Forecast Output and Diagnostics (Figures 1–4):

The price chart series (black and white themes) shows GOLD’s daily candles with green/red bars indicating gains and losses, while the SMA-20 and SMA-50 overlays reveal short- and medium-term trend direction.

The 20-day rolling volatility plot highlights periods where the standard deviation of daily returns spiked, signaling elevated market risk that aligned with broader downturns in the price series.

MACD and RSI plots show momentum shifts; RSI values approaching 70 or 30 in the observation window suggest brief overbought/oversold episodes worth flagging.

The ARIMA forecast plot projects the GOLD opening price over the next 10 trading periods. The solid line shows the point forecast, and the shaded 80% (dark) and 95% (light) prediction intervals widen as the horizon extends — a textbook reflection of compounding uncertainty.

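The checkresiduals() diagnostic tests whether the model has left predictable structure in the residuals. For the GOLD model the Ljung-Box test gives Q* = 4.29 on 8 degrees of freedom (p = 0.83), and for BJsales Q* = 5.88 (p = 0.66). Neither rejects the null of no residual autocorrelation at the 5% level, so both models are adequate on this criterion. Residual normality should still be judged from the histogram that checkresiduals() draws.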

Interpretation:

The ARIMA forecast should be treated as a baseline statistical expectation, not a trading signal. GOLD’s price responds heavily to macroeconomic factors (interest rate decisions, inflation expectations, USD strength, and geopolitical risk), none of which a univariate ARIMA model can capture, since it learns only from the price’s own past.

The prediction intervals are the most actionable output of the model. They quantify the range of plausible outcomes under normal market conditions and can directly inform position sizing, stop-loss placement, and portfolio-level volatility budgets.

Recommendations:

Re-estimate the model frequently as new data arrives. Forecast reliability degrades quickly beyond 5–10 periods, especially in volatile markets, so the model should be refitted on a rolling basis rather than used as a static prediction.

Combine the ARIMA output with complementary methods. For volatility modeling, a GARCH specification would better capture the volatility clustering visible in the rolling-SD plot. For incorporating macroeconomic drivers, an ARIMAX model with exogenous regressors (e.g., USD index, real interest rates) would improve directional accuracy.

Use technical indicators (MACD, RSI, SMA, Bollinger Bands) alongside — not in place of — the forecast. They are most useful for short-term timing decisions and confirmation signals, particularly when multiple indicators align.

Validate residual assumptions before trusting the prediction intervals. The intervals assume uncorrelated, roughly normal residuals. If checkresiduals() shows significant autocorrelation or clear non-normality, the stated 80% and 95% coverage no longer holds (the intervals are often too narrow), and risk metrics derived from them will understate true uncertainty.

Bhargav Patel