Ask Phase (Forex Case Study )

Business Question: Determine which major forex pairs(EUR/USD, GBP/USD, USD/JPY) offered the best balance of profitability.

Retail traders often struggle to decide which currency pairs to focus on, since each has different volatility and profitability characteristics.
The objective of this case study is to identify which among the three most traded pairs — EUR/USD, GBP/USD, and USD/JPY — provided the best balance of profitability and consistency between 2008 and 2023.

Guiding Questions

  1. What are the characteristics of the selected pairs in terms of returns and volatility?
  2. Which pair offers higher profitability but also greater risk?
  3. Which pair offers more consistent and stable returns?
  4. How do these differences translate into practical choices for retail traders?

Deliverable

A clear statement of which pair(s) provide the best balance of profitability and consistency, supported by descriptive statistics, visualizations, and recommendations.

library(tidyverse)
library(lubridate)
library(knitr)

# List all csv files
files <- list.files(pattern = "*.csv")

# Read and combine all into one dataframe
forex <- map_dfr(files, ~ read_csv(.x), .id = "source")

# Quick check
kable(head(forex,10))
source Ticker Date Open High Low Close Adj Close Volume
1 GBP=X 2008-08-01 0.50441 0.50680 0.50396 0.50651 0.50651 0
1 AUD=X 2008-08-01 1.06430 1.07650 1.06390 1.07270 1.07270 0
1 CAD=X 2008-08-01 1.02340 1.02940 1.02290 1.02460 1.02460 0
1 CHF=X 2008-08-01 1.04800 1.05120 1.04640 1.05040 1.05040 0
1 CNY=X 2008-08-01 6.83600 6.85540 6.82100 6.84700 6.84700 0
1 SGD=X 2008-08-01 1.36780 1.37250 1.36770 1.37250 1.37250 0
1 GBP=X 2008-08-04 0.50632 0.51015 0.50607 0.50989 0.50989 0
1 AUD=X 2008-08-04 1.07460 1.07690 1.07000 1.07690 1.07690 0
1 CAD=X 2008-08-04 1.02800 1.03680 1.02620 1.03640 1.03640 0
1 CHF=X 2008-08-04 1.04850 1.05020 1.04320 1.04880 1.04880 0

Prepare (Forex Case Study )

For this case study, I am using publicly available historical forex data covering the period 2008–2023.
The dataset was obtained from Kaggle (Daily Currency Exchange Rates 2008–Present), which aggregates official exchange rate data.

Each year of data was provided as a separate CSV file (2008_data.csv, 2009_data.csv, …, 2023_data.csv).
The files contain daily exchange rates for multiple major and minor currencies.

This dataset is appropriate for the business task because it provides:
- Long-term coverage (15+ years) to capture market cycles
- Daily granularity, which allows aggregation into Hourly, 4-Hour, and Daily timeframes
- Multiple pairs for comparative analysis

Before analysis, the data must be combined into a single dataset and checked for missing values, consistency in column names, and correct date formatting.

# Load libraries
library(tidyverse)
library(lubridate)

# Step 1: List all CSV files in the working directory
files <- list.files(pattern = "*.csv")

# Step 2: Read and combine all yearly CSVs into one dataframe
forex <- map_dfr(files, ~ read_csv(.x), .id = "source")

# Step 3: Preview structure
colnames(forex)
## [1] "source"    "Ticker"    "Date"      "Open"      "High"      "Low"      
## [7] "Close"     "Adj Close" "Volume"
glimpse(forex)
## Rows: 38,982
## Columns: 9
## $ source      <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
## $ Ticker      <chr> "GBP=X", "AUD=X", "CAD=X", "CHF=X", "CNY=X", "SGD=X", "GBP…
## $ Date        <date> 2008-08-01, 2008-08-01, 2008-08-01, 2008-08-01, 2008-08-0…
## $ Open        <dbl> 0.50441, 1.06430, 1.02340, 1.04800, 6.83600, 1.36780, 0.50…
## $ High        <dbl> 0.50680, 1.07650, 1.02940, 1.05120, 6.85540, 1.37250, 0.51…
## $ Low         <dbl> 0.50396, 1.06390, 1.02290, 1.04640, 6.82100, 1.36770, 0.50…
## $ Close       <dbl> 0.50651, 1.07270, 1.02460, 1.05040, 6.84700, 1.37250, 0.50…
## $ `Adj Close` <dbl> 0.50651, 1.07270, 1.02460, 1.05040, 6.84700, 1.37250, 0.50…
## $ Volume      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
# Step 4: Show a clean preview (first 10 rows)
library(knitr)
kable(head(forex, 10))
source Ticker Date Open High Low Close Adj Close Volume
1 GBP=X 2008-08-01 0.50441 0.50680 0.50396 0.50651 0.50651 0
1 AUD=X 2008-08-01 1.06430 1.07650 1.06390 1.07270 1.07270 0
1 CAD=X 2008-08-01 1.02340 1.02940 1.02290 1.02460 1.02460 0
1 CHF=X 2008-08-01 1.04800 1.05120 1.04640 1.05040 1.05040 0
1 CNY=X 2008-08-01 6.83600 6.85540 6.82100 6.84700 6.84700 0
1 SGD=X 2008-08-01 1.36780 1.37250 1.36770 1.37250 1.37250 0
1 GBP=X 2008-08-04 0.50632 0.51015 0.50607 0.50989 0.50989 0
1 AUD=X 2008-08-04 1.07460 1.07690 1.07000 1.07690 1.07690 0
1 CAD=X 2008-08-04 1.02800 1.03680 1.02620 1.03640 1.03640 0
1 CHF=X 2008-08-04 1.04850 1.05020 1.04320 1.04880 1.04880 0

Process (Forex Case Study )

After gathering the data, the next step was to ensure that it is clean, consistent, and analysis-ready. The dataset originally came in multiple yearly CSV files (2008–2023), which were combined into a single dataframe. Each row represents the exchange rate data for one currency pair (Ticker) on a given date, including the open, high, low, and close prices.

Steps taken in this phase:

  1. Checked column names and structure to confirm consistency across all CSVs.
  2. Converted the Date column into a proper Date format using the lubridate package.
  3. Removed missing values to ensure data quality.
  4. Filtered relevant pairs by selecting only the tickers for EUR/USD (EUR=X), GBP/USD (GBP=X), and USD/JPY (JPY=X), as these are among the most traded forex pairs globally.
  5. Sorted the dataset chronologically to prepare for later timeframe analysis (Hourly, 4-Hour, Daily aggregation).
  6. Documented each cleaning step to make the process transparent and reproducible.
# Step 1: Check column names
colnames(forex)
## [1] "source"    "Ticker"    "Date"      "Open"      "High"      "Low"      
## [7] "Close"     "Adj Close" "Volume"
# Step 2: Convert Date to proper format
forex <- forex %>%
  mutate(Date = ymd(Date)) %>%
  arrange(Date)

# Step 3: Handle missing values (drop rows with NA)
forex <- forex %>%
  drop_na()

# Step 4: Filter only major pairs (e.g., EUR/USD, GBP/USD, USD/JPY)
# Note: in Yahoo Finance tickers, EUR/USD = "EUR=X", GBP/USD = "GBP=X", JPY/USD = "JPY=X"
forex_selected <- forex %>%
  filter(Ticker %in% c("EUR=X", "GBP=X", "JPY=X"))

# Step 5: Check cleaned dataset
glimpse(forex_selected)
## Rows: 11,686
## Columns: 9
## $ source      <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
## $ Ticker      <chr> "GBP=X", "GBP=X", "GBP=X", "GBP=X", "GBP=X", "JPY=X", "EUR…
## $ Date        <date> 2008-08-01, 2008-08-04, 2008-08-05, 2008-08-06, 2008-08-0…
## $ Open        <dbl> 0.50441, 0.50632, 0.50991, 0.51135, 0.51335, 109.42000, 0.…
## $ High        <dbl> 0.50680, 0.51015, 0.51216, 0.51362, 0.51488, 110.36200, 0.…
## $ Low         <dbl> 0.50396, 0.50607, 0.50989, 0.51047, 0.51180, 109.28000, 0.…
## $ Close       <dbl> 0.50651, 0.50989, 0.51135, 0.51329, 0.51451, 110.17000, 0.…
## $ `Adj Close` <dbl> 0.50651, 0.50989, 0.51135, 0.51329, 0.51451, 110.17000, 0.…
## $ Volume      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
kable(head(forex_selected, 10))
source Ticker Date Open High Low Close Adj Close Volume
1 GBP=X 2008-08-01 0.50441 0.50680 0.50396 0.50651 0.50651 0
1 GBP=X 2008-08-04 0.50632 0.51015 0.50607 0.50989 0.50989 0
1 GBP=X 2008-08-05 0.50991 0.51216 0.50989 0.51135 0.51135 0
1 GBP=X 2008-08-06 0.51135 0.51362 0.51047 0.51329 0.51329 0
1 GBP=X 2008-08-07 0.51335 0.51488 0.51180 0.51451 0.51451 0
1 JPY=X 2008-08-08 109.42000 110.36200 109.28000 110.17000 110.17000 0
1 EUR=X 2008-08-08 0.65214 0.66667 0.65206 0.66450 0.66450 0
1 GBP=X 2008-08-08 0.51454 0.52223 0.51454 0.52091 0.52091 0
1 GBP=X 2008-08-11 0.52219 0.52422 0.51926 0.52304 0.52304 0
1 GBP=X 2008-08-12 0.52293 0.52757 0.52293 0.52748 0.52748 0

Analyze (Forex Case Study )

With the dataset cleaned and filtered, the next step is to analyze performance metrics for the selected currency pairs: EUR/USD (EUR=X), GBP/USD (GBP=X), and USD/JPY (JPY=X).

The analysis in this phase includes:
1. Calculating daily returns from closing prices using percentage change.
2. Measuring volatility as the standard deviation of returns.
3. Summarizing performance with descriptive statistics (mean return, volatility, min/max).
4. Identifying patterns in returns across different pairs before aggregating into timeframes.

# Step 1: Calculate daily returns for each pair
forex_returns <- forex_selected %>%
  group_by(Ticker) %>%
  arrange(Date) %>%
  mutate(Daily_Return = (Close - lag(Close)) / lag(Close)) %>%
  ungroup()

# Step 2: Summarize returns and volatility
summary_stats <- forex_returns %>%
  group_by(Ticker) %>%
  summarise(
    Mean_Return = mean(Daily_Return, na.rm = TRUE),
    Volatility = sd(Daily_Return, na.rm = TRUE),
    Min_Return = min(Daily_Return, na.rm = TRUE),
    Max_Return = max(Daily_Return, na.rm = TRUE),
    .groups = "drop"
  )

# Step 3: Preview results
library(knitr)
kable(summary_stats, digits = 6)
Ticker Mean_Return Volatility Min_Return Max_Return
EUR=X 0.000108 0.007322 -0.147542 0.154103
GBP=X 0.000131 0.006267 -0.034248 0.082296
JPY=X 0.000088 0.007589 -0.155044 0.176902

Share (Forex Case Study )

Now that daily returns and volatility have been calculated, the next step is to communicate the findings in a way that is clear, visual, and accessible.

For this phase, I created visualizations to highlight:
1. Trends in returns over time (line chart of daily returns).
2. Distribution of returns (histogram for each currency pair).
3. Volatility comparison (boxplot across pairs).

The goal is to answer the guiding question:
- How do the three selected pairs (EUR/USD, GBP/USD, USD/JPY) compare in terms of profitability and risk?

Visualizations are paired with plain-language insights so that any reader, regardless of technical expertise, can understand the results.

library(ggplot2)

# 1. Line chart of daily returns
ggplot(forex_returns, aes(x = Date, y = Daily_Return, color = Ticker)) +
  geom_line(alpha = 0.5) +
  labs(title = "Daily Returns Over Time",
       y = "Daily Return (%)",
       x = "Date") +
  theme_minimal()

# 2. Histogram of returns by pair
ggplot(forex_returns, aes(x = Daily_Return, fill = Ticker)) +
  geom_histogram(bins = 100, alpha = 0.6, position = "identity") +
  labs(title = "Distribution of Daily Returns",
       x = "Daily Return (%)",
       y = "Frequency") +
  theme_minimal()

# 3. Boxplot of volatility across pairs
ggplot(forex_returns, aes(x = Ticker, y = Daily_Return, fill = Ticker)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Volatility Comparison Across Currency Pairs",
       x = "Currency Pair",
       y = "Daily Return (%)") +
  theme_minimal()

Insights for Average Readers

  1. Daily Returns Over Time (Line Chart)
    • All three pairs show sharp spikes during major global financial events (e.g., 2008 crisis, COVID-19 in 2020).
    • GBP/USD is more volatile with bigger swings compared to USD/JPY.
    • USD/JPY appears steadier, with smaller movements.
  2. Distribution of Daily Returns (Histogram)
    • Most daily returns cluster around zero, meaning currencies often move only slightly.
    • GBP/USD has fatter tails — it experiences more extreme positive and negative moves than the others.
    • USD/JPY is tightly centered, confirming its reputation as a relatively stable pair.
  3. Volatility Comparison (Boxplot)
    • GBP/USD has the widest spread, making it riskier but potentially more profitable.
    • EUR/USD sits in the middle with balanced volatility.
    • USD/JPY has the smallest spread, suggesting it may be best for traders who prefer stability and consistency.

Deliverables from Share phase:
- Three clear visualizations.
- Narrative insights in plain language.
- Connection back to the business question (“Which timeframe/pair provides the best balance of profitability and consistency?”).

Act (Forex Case Study )

Having analyzed and visualized the forex dataset (2008–2023), I can now summarize the high-level insights and translate them into practical recommendations.

Final Conclusions

  1. Volatility Patterns:
    • GBP/USD shows the highest volatility, making it attractive for traders who seek larger price swings but also exposing them to higher risk.
    • USD/JPY has the lowest volatility, offering more stability and consistency.
    • EUR/USD lies between the two, balancing both risk and reward.
  2. Profitability vs. Consistency:
    • Higher volatility (GBP/USD) creates more opportunities for short-term gains but increases the likelihood of losses.
    • Lower volatility (USD/JPY) provides steadier but smaller returns.
    • For many retail traders, the trade-off depends on whether they prioritize big opportunities or stable outcomes.
  3. Market Events:
    • Global crises (2008 financial crash, COVID-19 in 2020) caused significant spikes across all pairs.
    • This confirms that macroeconomic events strongly impact forex markets, regardless of the pair.

Recommendations

  • For new or risk-averse traders: Focus on USD/JPY or EUR/USD, where returns are steadier and risks lower.
  • For experienced or aggressive traders: Consider GBP/USD, but use strict risk management to avoid large drawdowns.
  • For trading educators/prop firms: Emphasize the importance of volatility awareness — traders should choose pairs that align with their risk tolerance.
  • For further study: Extend this analysis by resampling into Hourly and 4-Hour data to identify which timeframe delivers the best risk/reward balance.

Portfolio-Ready Summary

This case study demonstrates how historical forex data (2008–2023) can be analyzed to compare risk and return across major pairs. By calculating daily returns, volatility, and distributions, I identified clear differences in stability and risk among EUR/USD, GBP/USD, and USD/JPY.

The findings highlight the importance of aligning trading strategy with volatility profiles. This work shows my ability to:
- Frame a business problem clearly
- Gather, clean, and analyze real-world data
- Create effective visualizations
- Translate technical results into actionable insights for decision-makers