1. Choose and Customize an Investment Strategy

For this exercise, we will use the Moving Average Crossover Strategy, a widely used technical analysis strategy. Investment Strategy: Moving Average Crossover Short-Term Moving Average (SMA50): We will use the 50-day SMA as our short-term moving average. Long-Term Moving Average (SMA200): We will use the 200-day SMA as our long-term moving average.

Strategy Logic: Buy when the 50-day SMA crosses above the 200-day SMA (bullish crossover). Sell when the 50-day SMA crosses below the 200-day SMA (bearish crossover). We’ll backtest this strategy using GOLD (XAU/USD) data from January 2020 to January 2025.

  1. Backtest the Strategy Backtesting Steps: Download Historical Data: We’ll download the historical price data for GOLD (XAU/USD) using yfinance in Python or tidyquant in R. Calculate Moving Averages: We will calculate the 50-day and 200-day simple moving averages (SMA) for the data. Create Buy and Sell Signals: We will create buy signals when the short-term moving average crosses above the long-term moving average, and sell signals when the short-term moving average crosses below. Calculate Daily Returns: We will calculate the daily returns of GOLD and the strategy returns based on our buy/sell signals. Cumulative Returns: We will calculate the cumulative returns for both the strategy and the market (buy and hold). Visualize the Performance: We will visualize the cumulative returns for the strategy vs. the market.

  2. Implement the Strategy in R

Step 3.1: Install Packages

install.packages("quantmod")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("tidyquant")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(tidyquant)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.11 ──
## ✔ PerformanceAnalytics 2.0.8      ✔ TTR                  0.24.4
## ✔ quantmod             0.4.28     ✔ xts                  0.14.1
## ── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
## ✖ zoo::as.Date()                 masks base::as.Date()
## ✖ zoo::as.Date.numeric()         masks base::as.Date.numeric()
## ✖ PerformanceAnalytics::legend() masks graphics::legend()
## ✖ quantmod::summary()            masks base::summary()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(dplyr)
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:xts':
## 
##     first, last
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Step 3.2: Download Historical Data # Download historical data for GOLD (XAU/USD) and check the structure of the gold_data object

gold_data <- tq_get("GC=F", from = "2020-01-01", to = "2025-01-01")
str(gold_data)
## tibble [1,259 × 8] (S3: tbl_df/tbl/data.frame)
##  $ symbol  : chr [1:1259] "GC=F" "GC=F" "GC=F" "GC=F" ...
##  $ date    : Date[1:1259], format: "2020-01-02" "2020-01-03" ...
##  $ open    : num [1:1259] 1518 1530 1580 1558 1580 ...
##  $ high    : num [1:1259] 1529 1553 1580 1576 1604 ...
##  $ low     : num [1:1259] 1518 1530 1560 1558 1552 ...
##  $ close   : num [1:1259] 1524 1549 1566 1572 1557 ...
##  $ volume  : num [1:1259] 214 107 416 47 236 54 16 48 32 11 ...
##  $ adjusted: num [1:1259] 1524 1549 1566 1572 1557 ...

Remove rows with NA values in adjusted price

gold_data <- gold_data %>%
  drop_na(adjusted)
head(gold_data)
## # A tibble: 6 × 8
##   symbol date        open  high   low close volume adjusted
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
## 1 GC=F   2020-01-02 1518. 1529. 1518  1524.    214    1524.
## 2 GC=F   2020-01-03 1530. 1553. 1530. 1549.    107    1549.
## 3 GC=F   2020-01-06 1580  1580  1560. 1566.    416    1566.
## 4 GC=F   2020-01-07 1558. 1576. 1558. 1572.     47    1572.
## 5 GC=F   2020-01-08 1580. 1604. 1552. 1557.    236    1557.
## 6 GC=F   2020-01-09 1556. 1556. 1543. 1552.     54    1552.

Step 3.3: Calculate the Moving Averages # Calculate short-term (50-day) and long-term (200-day) moving averages

gold_data <- gold_data %>%
  tq_mutate(select = adjusted, 
            mutate_fun = SMA, 
            n = 50, col_rename = "SMA50") %>%
  tq_mutate(select = adjusted, 
            mutate_fun = SMA, 
            n = 200, col_rename = "SMA200")
head(gold_data)
## # A tibble: 6 × 10
##   symbol date        open  high   low close volume adjusted SMA50 SMA200
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl>  <dbl>
## 1 GC=F   2020-01-02 1518. 1529. 1518  1524.    214    1524.    NA     NA
## 2 GC=F   2020-01-03 1530. 1553. 1530. 1549.    107    1549.    NA     NA
## 3 GC=F   2020-01-06 1580  1580  1560. 1566.    416    1566.    NA     NA
## 4 GC=F   2020-01-07 1558. 1576. 1558. 1572.     47    1572.    NA     NA
## 5 GC=F   2020-01-08 1580. 1604. 1552. 1557.    236    1557.    NA     NA
## 6 GC=F   2020-01-09 1556. 1556. 1543. 1552.     54    1552.    NA     NA

Step 3.4: Create Buy and Sell Signals # Create Buy/Sell signals (1 = Buy, 0 = Sell)

gold_data <- gold_data %>%
  mutate(Signal = ifelse(SMA50 > SMA200, 1, 0))  # 1 = Buy, 0 = Sell
head(gold_data)
## # A tibble: 6 × 11
##   symbol date        open  high   low close volume adjusted SMA50 SMA200 Signal
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl>  <dbl>  <dbl>
## 1 GC=F   2020-01-02 1518. 1529. 1518  1524.    214    1524.    NA     NA     NA
## 2 GC=F   2020-01-03 1530. 1553. 1530. 1549.    107    1549.    NA     NA     NA
## 3 GC=F   2020-01-06 1580  1580  1560. 1566.    416    1566.    NA     NA     NA
## 4 GC=F   2020-01-07 1558. 1576. 1558. 1572.     47    1572.    NA     NA     NA
## 5 GC=F   2020-01-08 1580. 1604. 1552. 1557.    236    1557.    NA     NA     NA
## 6 GC=F   2020-01-09 1556. 1556. 1543. 1552.     54    1552.    NA     NA     NA

Step 3.5: Calculate Daily Returns and Strategy Returns # Calculate daily returns

gold_data <- gold_data %>%
  mutate(Daily_Return = adjusted / lag(adjusted) - 1)

Calculate strategy returns based on the signals

gold_data <- gold_data %>%
  mutate(Strategy_Return = Daily_Return * lag(Signal, 1))

View the updated data with returns

head(gold_data)
## # A tibble: 6 × 13
##   symbol date        open  high   low close volume adjusted SMA50 SMA200 Signal
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl>  <dbl>  <dbl>
## 1 GC=F   2020-01-02 1518. 1529. 1518  1524.    214    1524.    NA     NA     NA
## 2 GC=F   2020-01-03 1530. 1553. 1530. 1549.    107    1549.    NA     NA     NA
## 3 GC=F   2020-01-06 1580  1580  1560. 1566.    416    1566.    NA     NA     NA
## 4 GC=F   2020-01-07 1558. 1576. 1558. 1572.     47    1572.    NA     NA     NA
## 5 GC=F   2020-01-08 1580. 1604. 1552. 1557.    236    1557.    NA     NA     NA
## 6 GC=F   2020-01-09 1556. 1556. 1543. 1552.     54    1552.    NA     NA     NA
## # ℹ 2 more variables: Daily_Return <dbl>, Strategy_Return <dbl>

Calculate cumulative returns

gold_data <- gold_data %>%
  mutate(Cumulative_Strategy_Return = cumprod(1 + Strategy_Return) - 1,
         Cumulative_Market_Return = cumprod(1 + Daily_Return) - 1)
head(gold_data)
## # A tibble: 6 × 15
##   symbol date        open  high   low close volume adjusted SMA50 SMA200 Signal
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl>  <dbl>  <dbl>
## 1 GC=F   2020-01-02 1518. 1529. 1518  1524.    214    1524.    NA     NA     NA
## 2 GC=F   2020-01-03 1530. 1553. 1530. 1549.    107    1549.    NA     NA     NA
## 3 GC=F   2020-01-06 1580  1580  1560. 1566.    416    1566.    NA     NA     NA
## 4 GC=F   2020-01-07 1558. 1576. 1558. 1572.     47    1572.    NA     NA     NA
## 5 GC=F   2020-01-08 1580. 1604. 1552. 1557.    236    1557.    NA     NA     NA
## 6 GC=F   2020-01-09 1556. 1556. 1543. 1552.     54    1552.    NA     NA     NA
## # ℹ 4 more variables: Daily_Return <dbl>, Strategy_Return <dbl>,
## #   Cumulative_Strategy_Return <dbl>, Cumulative_Market_Return <dbl>

Check for missing values in important columns and remove rows with NA values in critical columns (Daily_Return or Strategy_Return)

sum(is.na(gold_data$Daily_Return))
## [1] 1
sum(is.na(gold_data$Strategy_Return))
## [1] 200
gold_data <- gold_data %>%
  filter(!is.na(Daily_Return) & !is.na(Strategy_Return))

Step 3.6: Calculate Cumulative Returns # Calculate cumulative returns again

gold_data <- gold_data %>%
  mutate(Cumulative_Strategy_Return = cumprod(1 + Strategy_Return) - 1,
         Cumulative_Market_Return = cumprod(1 + Daily_Return) - 1)

Step 3.7: Visualize the Results # Plot cumulative returns

ggplot(gold_data, aes(x = date)) +
  geom_line(aes(y = Cumulative_Strategy_Return, color = "Strategy Return")) +
  geom_line(aes(y = Cumulative_Market_Return, color = "Market Return")) +
  labs(title = "Cumulative Returns: Moving Average Crossover Strategy vs Market",
       x = "Date", y = "Cumulative Return") +
  scale_color_manual(values = c("blue", "red")) +
  theme_minimal() +
  theme(legend.title = element_blank())

4. Show Investment Performance Results

After running the code, you will get two main results: Cumulative Strategy Return: Shows how much the moving average strategy would have grown your capital. Cumulative Market Return: Shows how much a buy-and-hold strategy in GOLD would have returned over the same period.

Step 4.1: Display the Final Performance # Final performance results

final_strategy_return <- tail(gold_data$Cumulative_Strategy_Return, 1)
final_market_return <- tail(gold_data$Cumulative_Market_Return, 1)

cat("Final Strategy Return:", round(final_strategy_return * 100, 2), "%\n")
## Final Strategy Return: 5.65 %
cat("Final Market Return:", round(final_market_return * 100, 2), "%\n")
## Final Market Return: 38.15 %