Business Question: Determine which major forex pairs(EUR/USD, GBP/USD, USD/JPY) offered the best balance of profitability.
Retail traders often struggle to decide which currency pairs to focus
on, since each has different volatility and profitability
characteristics.
The objective of this case study is to identify which among the three
most traded pairs — EUR/USD, GBP/USD, and
USD/JPY — provided the best balance of profitability and
consistency between 2008 and 2023.
A clear statement of which pair(s) provide the best balance of profitability and consistency, supported by descriptive statistics, visualizations, and recommendations.
library(tidyverse)
library(lubridate)
library(knitr)
# List all csv files
files <- list.files(pattern = "*.csv")
# Read and combine all into one dataframe
forex <- map_dfr(files, ~ read_csv(.x), .id = "source")
# Quick check
kable(head(forex,10))
| source | Ticker | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|---|
| 1 | GBP=X | 2008-08-01 | 0.50441 | 0.50680 | 0.50396 | 0.50651 | 0.50651 | 0 |
| 1 | AUD=X | 2008-08-01 | 1.06430 | 1.07650 | 1.06390 | 1.07270 | 1.07270 | 0 |
| 1 | CAD=X | 2008-08-01 | 1.02340 | 1.02940 | 1.02290 | 1.02460 | 1.02460 | 0 |
| 1 | CHF=X | 2008-08-01 | 1.04800 | 1.05120 | 1.04640 | 1.05040 | 1.05040 | 0 |
| 1 | CNY=X | 2008-08-01 | 6.83600 | 6.85540 | 6.82100 | 6.84700 | 6.84700 | 0 |
| 1 | SGD=X | 2008-08-01 | 1.36780 | 1.37250 | 1.36770 | 1.37250 | 1.37250 | 0 |
| 1 | GBP=X | 2008-08-04 | 0.50632 | 0.51015 | 0.50607 | 0.50989 | 0.50989 | 0 |
| 1 | AUD=X | 2008-08-04 | 1.07460 | 1.07690 | 1.07000 | 1.07690 | 1.07690 | 0 |
| 1 | CAD=X | 2008-08-04 | 1.02800 | 1.03680 | 1.02620 | 1.03640 | 1.03640 | 0 |
| 1 | CHF=X | 2008-08-04 | 1.04850 | 1.05020 | 1.04320 | 1.04880 | 1.04880 | 0 |
For this case study, I am using publicly available historical forex
data covering the period 2008–2023.
The dataset was obtained from Kaggle (Daily Currency Exchange Rates
2008–Present), which aggregates official exchange rate data.
Each year of data was provided as a separate CSV file (2008_data.csv,
2009_data.csv, …, 2023_data.csv).
The files contain daily exchange rates for multiple major and minor
currencies.
This dataset is appropriate for the business task because it
provides:
- Long-term coverage (15+ years) to capture market cycles
- Daily granularity, which allows aggregation into Hourly, 4-Hour, and
Daily timeframes
- Multiple pairs for comparative analysis
Before analysis, the data must be combined into a single dataset and checked for missing values, consistency in column names, and correct date formatting.
# Load libraries
library(tidyverse)
library(lubridate)
# Step 1: List all CSV files in the working directory
files <- list.files(pattern = "*.csv")
# Step 2: Read and combine all yearly CSVs into one dataframe
forex <- map_dfr(files, ~ read_csv(.x), .id = "source")
# Step 3: Preview structure
colnames(forex)
## [1] "source" "Ticker" "Date" "Open" "High" "Low"
## [7] "Close" "Adj Close" "Volume"
glimpse(forex)
## Rows: 38,982
## Columns: 9
## $ source <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
## $ Ticker <chr> "GBP=X", "AUD=X", "CAD=X", "CHF=X", "CNY=X", "SGD=X", "GBP…
## $ Date <date> 2008-08-01, 2008-08-01, 2008-08-01, 2008-08-01, 2008-08-0…
## $ Open <dbl> 0.50441, 1.06430, 1.02340, 1.04800, 6.83600, 1.36780, 0.50…
## $ High <dbl> 0.50680, 1.07650, 1.02940, 1.05120, 6.85540, 1.37250, 0.51…
## $ Low <dbl> 0.50396, 1.06390, 1.02290, 1.04640, 6.82100, 1.36770, 0.50…
## $ Close <dbl> 0.50651, 1.07270, 1.02460, 1.05040, 6.84700, 1.37250, 0.50…
## $ `Adj Close` <dbl> 0.50651, 1.07270, 1.02460, 1.05040, 6.84700, 1.37250, 0.50…
## $ Volume <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
# Step 4: Show a clean preview (first 10 rows)
library(knitr)
kable(head(forex, 10))
| source | Ticker | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|---|
| 1 | GBP=X | 2008-08-01 | 0.50441 | 0.50680 | 0.50396 | 0.50651 | 0.50651 | 0 |
| 1 | AUD=X | 2008-08-01 | 1.06430 | 1.07650 | 1.06390 | 1.07270 | 1.07270 | 0 |
| 1 | CAD=X | 2008-08-01 | 1.02340 | 1.02940 | 1.02290 | 1.02460 | 1.02460 | 0 |
| 1 | CHF=X | 2008-08-01 | 1.04800 | 1.05120 | 1.04640 | 1.05040 | 1.05040 | 0 |
| 1 | CNY=X | 2008-08-01 | 6.83600 | 6.85540 | 6.82100 | 6.84700 | 6.84700 | 0 |
| 1 | SGD=X | 2008-08-01 | 1.36780 | 1.37250 | 1.36770 | 1.37250 | 1.37250 | 0 |
| 1 | GBP=X | 2008-08-04 | 0.50632 | 0.51015 | 0.50607 | 0.50989 | 0.50989 | 0 |
| 1 | AUD=X | 2008-08-04 | 1.07460 | 1.07690 | 1.07000 | 1.07690 | 1.07690 | 0 |
| 1 | CAD=X | 2008-08-04 | 1.02800 | 1.03680 | 1.02620 | 1.03640 | 1.03640 | 0 |
| 1 | CHF=X | 2008-08-04 | 1.04850 | 1.05020 | 1.04320 | 1.04880 | 1.04880 | 0 |
After gathering the data, the next step was to ensure that it is
clean, consistent, and analysis-ready. The dataset
originally came in multiple yearly CSV files (2008–2023), which were
combined into a single dataframe. Each row represents the exchange rate
data for one currency pair (Ticker) on a given date,
including the open, high, low, and close prices.
Steps taken in this phase:
Date column into a
proper Date format using the lubridate
package.EUR/USD (EUR=X), GBP/USD (GBP=X),
and USD/JPY (JPY=X), as these are among the most traded
forex pairs globally.# Step 1: Check column names
colnames(forex)
## [1] "source" "Ticker" "Date" "Open" "High" "Low"
## [7] "Close" "Adj Close" "Volume"
# Step 2: Convert Date to proper format
forex <- forex %>%
mutate(Date = ymd(Date)) %>%
arrange(Date)
# Step 3: Handle missing values (drop rows with NA)
forex <- forex %>%
drop_na()
# Step 4: Filter only major pairs (e.g., EUR/USD, GBP/USD, USD/JPY)
# Note: in Yahoo Finance tickers, EUR/USD = "EUR=X", GBP/USD = "GBP=X", JPY/USD = "JPY=X"
forex_selected <- forex %>%
filter(Ticker %in% c("EUR=X", "GBP=X", "JPY=X"))
# Step 5: Check cleaned dataset
glimpse(forex_selected)
## Rows: 11,686
## Columns: 9
## $ source <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
## $ Ticker <chr> "GBP=X", "GBP=X", "GBP=X", "GBP=X", "GBP=X", "JPY=X", "EUR…
## $ Date <date> 2008-08-01, 2008-08-04, 2008-08-05, 2008-08-06, 2008-08-0…
## $ Open <dbl> 0.50441, 0.50632, 0.50991, 0.51135, 0.51335, 109.42000, 0.…
## $ High <dbl> 0.50680, 0.51015, 0.51216, 0.51362, 0.51488, 110.36200, 0.…
## $ Low <dbl> 0.50396, 0.50607, 0.50989, 0.51047, 0.51180, 109.28000, 0.…
## $ Close <dbl> 0.50651, 0.50989, 0.51135, 0.51329, 0.51451, 110.17000, 0.…
## $ `Adj Close` <dbl> 0.50651, 0.50989, 0.51135, 0.51329, 0.51451, 110.17000, 0.…
## $ Volume <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
kable(head(forex_selected, 10))
| source | Ticker | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|---|
| 1 | GBP=X | 2008-08-01 | 0.50441 | 0.50680 | 0.50396 | 0.50651 | 0.50651 | 0 |
| 1 | GBP=X | 2008-08-04 | 0.50632 | 0.51015 | 0.50607 | 0.50989 | 0.50989 | 0 |
| 1 | GBP=X | 2008-08-05 | 0.50991 | 0.51216 | 0.50989 | 0.51135 | 0.51135 | 0 |
| 1 | GBP=X | 2008-08-06 | 0.51135 | 0.51362 | 0.51047 | 0.51329 | 0.51329 | 0 |
| 1 | GBP=X | 2008-08-07 | 0.51335 | 0.51488 | 0.51180 | 0.51451 | 0.51451 | 0 |
| 1 | JPY=X | 2008-08-08 | 109.42000 | 110.36200 | 109.28000 | 110.17000 | 110.17000 | 0 |
| 1 | EUR=X | 2008-08-08 | 0.65214 | 0.66667 | 0.65206 | 0.66450 | 0.66450 | 0 |
| 1 | GBP=X | 2008-08-08 | 0.51454 | 0.52223 | 0.51454 | 0.52091 | 0.52091 | 0 |
| 1 | GBP=X | 2008-08-11 | 0.52219 | 0.52422 | 0.51926 | 0.52304 | 0.52304 | 0 |
| 1 | GBP=X | 2008-08-12 | 0.52293 | 0.52757 | 0.52293 | 0.52748 | 0.52748 | 0 |
With the dataset cleaned and filtered, the next step is to
analyze performance metrics for the selected currency
pairs: EUR/USD (EUR=X), GBP/USD (GBP=X), and
USD/JPY (JPY=X).
The analysis in this phase includes:
1. Calculating daily returns from closing prices using
percentage change.
2. Measuring volatility as the standard deviation of
returns.
3. Summarizing performance with descriptive statistics
(mean return, volatility, min/max).
4. Identifying patterns in returns across different
pairs before aggregating into timeframes.
# Step 1: Calculate daily returns for each pair
forex_returns <- forex_selected %>%
group_by(Ticker) %>%
arrange(Date) %>%
mutate(Daily_Return = (Close - lag(Close)) / lag(Close)) %>%
ungroup()
# Step 2: Summarize returns and volatility
summary_stats <- forex_returns %>%
group_by(Ticker) %>%
summarise(
Mean_Return = mean(Daily_Return, na.rm = TRUE),
Volatility = sd(Daily_Return, na.rm = TRUE),
Min_Return = min(Daily_Return, na.rm = TRUE),
Max_Return = max(Daily_Return, na.rm = TRUE),
.groups = "drop"
)
# Step 3: Preview results
library(knitr)
kable(summary_stats, digits = 6)
| Ticker | Mean_Return | Volatility | Min_Return | Max_Return |
|---|---|---|---|---|
| EUR=X | 0.000108 | 0.007322 | -0.147542 | 0.154103 |
| GBP=X | 0.000131 | 0.006267 | -0.034248 | 0.082296 |
| JPY=X | 0.000088 | 0.007589 | -0.155044 | 0.176902 |
GBP/USD is more volatile with bigger swings compared to
USD/JPY.USD/JPY appears steadier, with smaller movements.GBP/USD has fatter tails — it experiences more extreme
positive and negative moves than the others.USD/JPY is tightly centered, confirming its reputation
as a relatively stable pair.GBP/USD has the widest spread, making it riskier but
potentially more profitable.EUR/USD sits in the middle with balanced
volatility.USD/JPY has the smallest spread, suggesting it may be
best for traders who prefer stability and consistency.Deliverables from Share phase:
- Three clear visualizations.
- Narrative insights in plain language.
- Connection back to the business question (“Which
timeframe/pair provides the best balance of profitability and
consistency?”).
Having analyzed and visualized the forex dataset (2008–2023), I can now summarize the high-level insights and translate them into practical recommendations.
GBP/USD shows the highest volatility, making it
attractive for traders who seek larger price swings but also exposing
them to higher risk.USD/JPY has the lowest volatility, offering more
stability and consistency.EUR/USD lies between the two, balancing both risk and
reward.GBP/USD) creates more opportunities
for short-term gains but increases the likelihood of losses.USD/JPY) provides steadier but
smaller returns.USD/JPY or EUR/USD, where returns are steadier
and risks lower.GBP/USD, but use strict risk management to avoid large
drawdowns.This case study demonstrates how historical forex data (2008–2023)
can be analyzed to compare risk and return across major pairs. By
calculating daily returns, volatility, and distributions, I identified
clear differences in stability and risk among EUR/USD,
GBP/USD, and USD/JPY.
The findings highlight the importance of aligning trading strategy
with volatility profiles. This work shows my ability to:
- Frame a business problem clearly
- Gather, clean, and analyze real-world data
- Create effective visualizations
- Translate technical results into actionable insights for
decision-makers