For this exercise, we will use the Moving Average Crossover Strategy, a widely used technical analysis strategy. Investment Strategy: Moving Average Crossover Short-Term Moving Average (SMA50): We will use the 50-day SMA as our short-term moving average. Long-Term Moving Average (SMA200): We will use the 200-day SMA as our long-term moving average.
Strategy Logic: Buy when the 50-day SMA crosses above the 200-day SMA (bullish crossover). Sell when the 50-day SMA crosses below the 200-day SMA (bearish crossover). We’ll backtest this strategy using GOLD (XAU/USD) data from January 2020 to January 2025.
Backtest the Strategy Backtesting Steps: Download Historical Data: We’ll download the historical price data for GOLD (XAU/USD) using yfinance in Python or tidyquant in R. Calculate Moving Averages: We will calculate the 50-day and 200-day simple moving averages (SMA) for the data. Create Buy and Sell Signals: We will create buy signals when the short-term moving average crosses above the long-term moving average, and sell signals when the short-term moving average crosses below. Calculate Daily Returns: We will calculate the daily returns of GOLD and the strategy returns based on our buy/sell signals. Cumulative Returns: We will calculate the cumulative returns for both the strategy and the market (buy and hold). Visualize the Performance: We will visualize the cumulative returns for the strategy vs. the market.
Implement the Strategy in R
Step 3.1: Install Packages
install.packages("quantmod")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("tidyquant")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(tidyquant)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.11 ──
## ✔ PerformanceAnalytics 2.0.8 ✔ TTR 0.24.4
## ✔ quantmod 0.4.28 ✔ xts 0.14.1
## ── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
## ✖ zoo::as.Date() masks base::as.Date()
## ✖ zoo::as.Date.numeric() masks base::as.Date.numeric()
## ✖ PerformanceAnalytics::legend() masks graphics::legend()
## ✖ quantmod::summary() masks base::summary()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(dplyr)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:xts':
##
## first, last
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
Step 3.2: Download Historical Data # Download historical data for GOLD (XAU/USD) and check the structure of the gold_data object
gold_data <- tq_get("GC=F", from = "2020-01-01", to = "2025-01-01")
str(gold_data)
## tibble [1,259 × 8] (S3: tbl_df/tbl/data.frame)
## $ symbol : chr [1:1259] "GC=F" "GC=F" "GC=F" "GC=F" ...
## $ date : Date[1:1259], format: "2020-01-02" "2020-01-03" ...
## $ open : num [1:1259] 1518 1530 1580 1558 1580 ...
## $ high : num [1:1259] 1529 1553 1580 1576 1604 ...
## $ low : num [1:1259] 1518 1530 1560 1558 1552 ...
## $ close : num [1:1259] 1524 1549 1566 1572 1557 ...
## $ volume : num [1:1259] 214 107 416 47 236 54 16 48 32 11 ...
## $ adjusted: num [1:1259] 1524 1549 1566 1572 1557 ...
gold_data <- gold_data %>%
drop_na(adjusted)
head(gold_data)
## # A tibble: 6 × 8
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GC=F 2020-01-02 1518. 1529. 1518 1524. 214 1524.
## 2 GC=F 2020-01-03 1530. 1553. 1530. 1549. 107 1549.
## 3 GC=F 2020-01-06 1580 1580 1560. 1566. 416 1566.
## 4 GC=F 2020-01-07 1558. 1576. 1558. 1572. 47 1572.
## 5 GC=F 2020-01-08 1580. 1604. 1552. 1557. 236 1557.
## 6 GC=F 2020-01-09 1556. 1556. 1543. 1552. 54 1552.
Step 3.3: Calculate the Moving Averages # Calculate short-term (50-day) and long-term (200-day) moving averages
gold_data <- gold_data %>%
tq_mutate(select = adjusted,
mutate_fun = SMA,
n = 50, col_rename = "SMA50") %>%
tq_mutate(select = adjusted,
mutate_fun = SMA,
n = 200, col_rename = "SMA200")
head(gold_data)
## # A tibble: 6 × 10
## symbol date open high low close volume adjusted SMA50 SMA200
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GC=F 2020-01-02 1518. 1529. 1518 1524. 214 1524. NA NA
## 2 GC=F 2020-01-03 1530. 1553. 1530. 1549. 107 1549. NA NA
## 3 GC=F 2020-01-06 1580 1580 1560. 1566. 416 1566. NA NA
## 4 GC=F 2020-01-07 1558. 1576. 1558. 1572. 47 1572. NA NA
## 5 GC=F 2020-01-08 1580. 1604. 1552. 1557. 236 1557. NA NA
## 6 GC=F 2020-01-09 1556. 1556. 1543. 1552. 54 1552. NA NA
Step 3.4: Create Buy and Sell Signals # Create Buy/Sell signals (1 = Buy, 0 = Sell)
gold_data <- gold_data %>%
mutate(Signal = ifelse(SMA50 > SMA200, 1, 0)) # 1 = Buy, 0 = Sell
head(gold_data)
## # A tibble: 6 × 11
## symbol date open high low close volume adjusted SMA50 SMA200 Signal
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GC=F 2020-01-02 1518. 1529. 1518 1524. 214 1524. NA NA NA
## 2 GC=F 2020-01-03 1530. 1553. 1530. 1549. 107 1549. NA NA NA
## 3 GC=F 2020-01-06 1580 1580 1560. 1566. 416 1566. NA NA NA
## 4 GC=F 2020-01-07 1558. 1576. 1558. 1572. 47 1572. NA NA NA
## 5 GC=F 2020-01-08 1580. 1604. 1552. 1557. 236 1557. NA NA NA
## 6 GC=F 2020-01-09 1556. 1556. 1543. 1552. 54 1552. NA NA NA
Step 3.5: Calculate Daily Returns and Strategy Returns # Calculate daily returns
gold_data <- gold_data %>%
mutate(Daily_Return = adjusted / lag(adjusted) - 1)
gold_data <- gold_data %>%
mutate(Strategy_Return = Daily_Return * lag(Signal, 1))
head(gold_data)
## # A tibble: 6 × 13
## symbol date open high low close volume adjusted SMA50 SMA200 Signal
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GC=F 2020-01-02 1518. 1529. 1518 1524. 214 1524. NA NA NA
## 2 GC=F 2020-01-03 1530. 1553. 1530. 1549. 107 1549. NA NA NA
## 3 GC=F 2020-01-06 1580 1580 1560. 1566. 416 1566. NA NA NA
## 4 GC=F 2020-01-07 1558. 1576. 1558. 1572. 47 1572. NA NA NA
## 5 GC=F 2020-01-08 1580. 1604. 1552. 1557. 236 1557. NA NA NA
## 6 GC=F 2020-01-09 1556. 1556. 1543. 1552. 54 1552. NA NA NA
## # ℹ 2 more variables: Daily_Return <dbl>, Strategy_Return <dbl>
gold_data <- gold_data %>%
mutate(Cumulative_Strategy_Return = cumprod(1 + Strategy_Return) - 1,
Cumulative_Market_Return = cumprod(1 + Daily_Return) - 1)
head(gold_data)
## # A tibble: 6 × 15
## symbol date open high low close volume adjusted SMA50 SMA200 Signal
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GC=F 2020-01-02 1518. 1529. 1518 1524. 214 1524. NA NA NA
## 2 GC=F 2020-01-03 1530. 1553. 1530. 1549. 107 1549. NA NA NA
## 3 GC=F 2020-01-06 1580 1580 1560. 1566. 416 1566. NA NA NA
## 4 GC=F 2020-01-07 1558. 1576. 1558. 1572. 47 1572. NA NA NA
## 5 GC=F 2020-01-08 1580. 1604. 1552. 1557. 236 1557. NA NA NA
## 6 GC=F 2020-01-09 1556. 1556. 1543. 1552. 54 1552. NA NA NA
## # ℹ 4 more variables: Daily_Return <dbl>, Strategy_Return <dbl>,
## # Cumulative_Strategy_Return <dbl>, Cumulative_Market_Return <dbl>
sum(is.na(gold_data$Daily_Return))
## [1] 1
sum(is.na(gold_data$Strategy_Return))
## [1] 200
gold_data <- gold_data %>%
filter(!is.na(Daily_Return) & !is.na(Strategy_Return))
Step 3.6: Calculate Cumulative Returns # Calculate cumulative returns again
gold_data <- gold_data %>%
mutate(Cumulative_Strategy_Return = cumprod(1 + Strategy_Return) - 1,
Cumulative_Market_Return = cumprod(1 + Daily_Return) - 1)
Step 3.7: Visualize the Results # Plot cumulative returns
ggplot(gold_data, aes(x = date)) +
geom_line(aes(y = Cumulative_Strategy_Return, color = "Strategy Return")) +
geom_line(aes(y = Cumulative_Market_Return, color = "Market Return")) +
labs(title = "Cumulative Returns: Moving Average Crossover Strategy vs Market",
x = "Date", y = "Cumulative Return") +
scale_color_manual(values = c("blue", "red")) +
theme_minimal() +
theme(legend.title = element_blank())
4. Show Investment Performance Results
After running the code, you will get two main results: Cumulative Strategy Return: Shows how much the moving average strategy would have grown your capital. Cumulative Market Return: Shows how much a buy-and-hold strategy in GOLD would have returned over the same period.
Step 4.1: Display the Final Performance # Final performance results
final_strategy_return <- tail(gold_data$Cumulative_Strategy_Return, 1)
final_market_return <- tail(gold_data$Cumulative_Market_Return, 1)
cat("Final Strategy Return:", round(final_strategy_return * 100, 2), "%\n")
## Final Strategy Return: 5.65 %
cat("Final Market Return:", round(final_market_return * 100, 2), "%\n")
## Final Market Return: 38.15 %