📄 COURSE DETAILS

Course: Data Exploration and Preparation
Course Code: CAP482
Assessment: CA-3


📌 PROJECT DETAILS

Project Title: Forex Market Analysis using Technical Indicators


👨‍🎓 SUBMITTED BY

Section: DE553
Group: (G1)


👩‍🏫 SUBMITTED TO

Ms. Ranjit Kaur Walia
Assistant Professor, SCA, LPU


🏫 INSTITUTION

Lovely Faculty of Technology & Sciences
School of Computer Applications
Lovely Professional University, Punjab


📊 Introduction

This project analyzes forex market data using statistical methods, visualization techniques, and technical indicators such as Moving Average and RSI. It also integrates trading strategies, machine learning, and forecasting methods to generate meaningful insights and support decision-making.

The analysis focuses on understanding price trends, market volatility, and relationships between key variables such as Open, High, Low, Close, and Volume. By combining data analysis with predictive techniques, the project demonstrates how data-driven approaches can enhance trading strategies and provide a deeper understanding of forex market behavior.


🎯 Objectives


🔷 Load Libraries

#The required libraries are loaded to perform data manipulation, visualization, statistical analysis, and forecasting.

library(ggplot2)
library(dplyr)
library(lubridate)
library(zoo)
library(GGally)
library(forecast)
library(corrplot)

#The dataset contains 19,310 rows of forex data, one row per currency pair per day, with seven variables: Date, Pair, Open, High, Low, Close, and Volume.

data <- read.csv("C:/Users/HP/Downloads/forex_5_years_data.csv")

head(data)
##         Date   Pair    Open    High     Low   Close Volume
## 1 2021-01-01 EURUSD 1.20596 1.20928 1.19042 1.19336 152386
## 2 2021-01-02 EURUSD 1.21153 1.21379 1.18937 1.19752 130263
## 3 2021-01-03 EURUSD 1.19056 1.20314 1.17688 1.19066  95658
## 4 2021-01-04 EURUSD 1.17966 1.19232 1.17686 1.19191 129150
## 5 2021-01-05 EURUSD 1.20938 1.21477 1.20777 1.21192  81394
## 6 2021-01-06 EURUSD 1.19465 1.20476 1.17447 1.18811 157313
str(data)
## 'data.frame':    19310 obs. of  7 variables:
##  $ Date  : chr  "2021-01-01" "2021-01-02" "2021-01-03" "2021-01-04" ...
##  $ Pair  : chr  "EURUSD" "EURUSD" "EURUSD" "EURUSD" ...
##  $ Open  : num  1.21 1.21 1.19 1.18 1.21 ...
##  $ High  : num  1.21 1.21 1.2 1.19 1.21 ...
##  $ Low   : num  1.19 1.19 1.18 1.18 1.21 ...
##  $ Close : num  1.19 1.2 1.19 1.19 1.21 ...
##  $ Volume: int  152386 130263 95658 129150 81394 157313 118555 137121 115758 102606 ...

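#Summary statistics give an overview of each column. Because the rows pool every currency pair, the gap between the median Close (about 1.51) and the mean Close (about 135.85) mainly reflects pairs quoted at very different price levels rather than market volatility.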

summary(data)
##      Date               Pair                Open                High          
##  Length:19310       Length:19310       Min.   :   0.1417   Min.   :   0.1431  
##  Class :character   Class :character   1st Qu.:   0.8627   1st Qu.:   0.8763  
##  Mode  :character   Mode  :character   Median :   1.5055   Median :   1.5285  
##                                        Mean   : 135.8935   Mean   : 138.0268  
##                                        3rd Qu.:  94.3094   3rd Qu.:  95.6762  
##                                        Max.   :2055.6127   Max.   :2101.4820  
##       Low                Close               Volume      
##  Min.   :   0.1403   Min.   :   0.1421   Min.   : 70005  
##  1st Qu.:   0.8482   1st Qu.:   0.8617   1st Qu.: 92061  
##  Median :   1.4799   Median :   1.5076   Median :115053  
##  Mean   : 133.7152   Mean   : 135.8532   Mean   :114851  
##  3rd Qu.:  92.5719   3rd Qu.:  94.1732   3rd Qu.:137364  
##  Max.   :2040.0934   Max.   :2068.3385   Max.   :159976

#Cleaning removes rows with missing values and exact duplicate rows, then converts Date from character to the Date class so it can be used for time-series analysis.

data <- na.omit(data)
data <- unique(data)
data$Date <- as.Date(data$Date)
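
#Before dropping rows, it can help to count how many values are missing and how many rows are duplicated. The following is a minimal sketch on a made-up data frame (the object toy and its values are assumptions for illustration, not part of the project data):

```r
# Toy data frame standing in for the forex file (values are made up).
toy <- data.frame(
  Date  = c("2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03"),
  Pair  = c("EURUSD", "EURUSD", "EURUSD", "EURUSD"),
  Close = c(1.19, 1.20, 1.20, NA),
  stringsAsFactors = FALSE
)

n_missing   <- colSums(is.na(toy))   # missing values per column
n_duplicate <- sum(duplicated(toy))  # exact duplicate rows

# Same cleaning steps as above: drop NAs, drop duplicates, convert dates.
cleaned <- unique(na.omit(toy))
cleaned$Date <- as.Date(cleaned$Date)

n_missing
n_duplicate
nrow(toy) - nrow(cleaned)            # rows removed by cleaning
```

#Comparing the row count before and after cleaning shows how much data the cleaning step actually removed.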

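#These statistics describe the central tendency and spread of Close across all pairs combined. The standard deviation (330.33) and the Open/Close correlation (0.9999) are both dominated by differences in price level between pairs, so they say little about the volatility or day-to-day dependence within any single pair.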

mean(data$Close)
## [1] 135.8532
sd(data$Close)
## [1] 330.325
max(data$Close)
## [1] 2068.338
min(data$Close)
## [1] 0.14205
median(data$Close)
## [1] 1.507585
cor(data$Open, data$Close)
## [1] 0.9998951
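
#Pooled statistics can be compared with per-pair statistics to see how much of the correlation comes from price level alone. A minimal sketch in base R, using made-up values for two pairs (the object names toy, per_pair, and pooled_cor are assumptions for illustration):

```r
# Two pairs quoted at very different price levels (values are made up).
toy <- data.frame(
  Pair  = rep(c("EURUSD", "USDJPY"), each = 4),
  Open  = c(1.20, 1.19, 1.21, 1.18, 110.0, 111.5, 109.8, 110.9),
  Close = c(1.19, 1.21, 1.18, 1.20, 111.5, 109.8, 110.9, 110.2)
)

# Mean, standard deviation, and Open/Close correlation within each pair.
per_pair <- do.call(rbind, lapply(split(toy, toy$Pair), function(d) {
  data.frame(
    Pair           = d$Pair[1],
    mean_close     = mean(d$Close),
    sd_close       = sd(d$Close),
    cor_open_close = cor(d$Open, d$Close)
  )
}))

# The same correlation computed over both pairs at once.
pooled_cor <- cor(toy$Open, toy$Close)

per_pair
pooled_cor
```

#In this toy example the pooled correlation is close to 1 only because the two pairs sit at different price levels; the within-pair correlations are the more informative figures.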

#The EUR/USD closing price fluctuates continuously over the sample, alternating between bullish phases, in which the Euro strengthens against the US Dollar, and bearish phases, in which it weakens. Such movements are typically driven by macroeconomic factors such as interest rate changes, geopolitical events, and economic data releases.

eurusd <- data[data$Pair == "EURUSD", ]

ggplot(eurusd, aes(x = Date, y = Close)) +
  geom_line(color = "blue") +
  ggtitle("EURUSD Price Trend")

#The comparison of EURUSD, GBPUSD, and USDJPY shows how trend and volatility differ across currency pairs. Each series is divided by its first closing price, so every pair starts at 1 and the lines show relative change rather than raw price. Pairs with sharper swings are more volatile, while others follow smoother trends. #insight: The normalized view lets traders compare relative performance across pairs and see which pair shows stronger growth or higher volatility.

subset_data <- data[data$Pair %in% c("EURUSD","GBPUSD","USDJPY"), ]
subset_data <- subset_data[order(subset_data$Date), ]
subset_data <- subset_data %>%
  group_by(Pair) %>%
  mutate(Normalized_Close = Close / first(Close))
ggplot(subset_data, aes(x = Date, y = Normalized_Close, color = Pair)) +
  geom_line(linewidth = 1.2) +
  labs(title = "Forex Pair Comparison (Normalized)",
       subtitle = "All pairs aligned for fair comparison",
       x = "Date",
       y = "Normalized Price",
       color = "Currency Pair") +
  scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    legend.position = "top"
  )
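
#The normalization step can be checked by hand on a small example. This sketch uses base R's ave() in place of the dplyr pipeline, with made-up prices (the object toy is an assumption for illustration):

```r
# Made-up closing prices for two pairs at different price levels.
toy <- data.frame(
  Pair  = c("EURUSD", "EURUSD", "EURUSD", "USDJPY", "USDJPY", "USDJPY"),
  Close = c(1.20, 1.26, 1.14, 110, 121, 99)
)

# Divide each Close by the first Close of its own pair.
toy$Normalized_Close <- ave(toy$Close, toy$Pair, FUN = function(x) x / x[1])

toy
```

#Both pairs start at 1, and a value of 1.05 reads as a 5% rise from the first observation regardless of the pair's price level.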

#The bar chart shows the average closing price of each currency pair over the dataset period, sorted by value. Large differences between the averages reflect the price level at which each pair is quoted, not its performance. #insight: Sorting the bars makes it easy to see which pairs trade at the highest and lowest price levels.

avg_price <- aggregate(Close ~ Pair, data, mean)

ggplot(avg_price, aes(x = reorder(Pair, Close), y = Close)) +
  geom_bar(stat = "identity", fill = "#4e79a7", width = 0.6) +
  geom_text(aes(label = round(Close,2)), 
            hjust = -0.1, size = 4) +
  coord_flip() +
  labs(title = "Average Closing Price by Currency Pair",
       subtitle = "Properly aligned and readable comparison",
       x = "Currency Pair",
       y = "Average Closing Price") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5),
    axis.text.y = element_text(size = 11),
    axis.text.x = element_text(size = 10),
    plot.margin = margin(10, 30, 10, 10)
  ) +
  expand_limits(y = max(avg_price$Close) * 1.1)

#The pie chart shows the share of observations that belongs to each currency pair. #insight: A balanced distribution means that pooled results are not dominated by any single pair, which makes comparisons across pairs fairer and the findings easier to generalize.

pair_df <- as.data.frame(table(data$Pair))

ggplot(pair_df, aes(x = "", y = Freq, fill = Var1)) +
  geom_bar(stat = "identity", width = 3, color = "black") +
  coord_polar("y") +
  geom_text(aes(label = paste0(round(Freq/sum(Freq)*100,1), "%")),
            position = position_stack(vjust = 0.5), size = 4) +
  labs(title = "Distribution of Currency Pairs",
       fill = "Currency Pair") +
  scale_fill_viridis_d() +
  theme_void() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    legend.position = "right"
  )

#The scatter plots show Open against Close for each currency pair, with a fitted regression line in each panel. #insight: All currency pairs show a strong positive relationship between opening and closing prices. This reflects that the close stays near the open relative to each pair's overall price range; it does not by itself show that the intraday direction persists.

ggplot(data, aes(x = Open, y = Close)) +
  geom_point(color = "#2ca02c", alpha = 0.4, size = 1.5) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  facet_wrap(~Pair, scales = "free") +
  labs(
    title = "Open vs Close Price (By Currency Pair)",
    x = "Opening Price",
    y = "Closing Price"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  )
## `geom_smooth()` using formula = 'y ~ x'

#Strong correlations between price variables matter when choosing predictors for a model. This scatter plot pools all currency pairs on a single set of axes and plots Open against Close; because the pairs trade at very different price levels, the points fall almost exactly on the fitted line.

ggplot(data, aes(Open, Close)) +
  geom_point(alpha = 0.2, color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
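
#To look at all four price variables at once, a pair plot and a correlation matrix can be used. The sketch below uses base R's pairs() and cor() on simulated prices; the data, the seed, and the object names toy and ohlc_cor are assumptions for illustration, and GGally::ggpairs() could be substituted for pairs().

```r
set.seed(1)
n     <- 50
open  <- 1.20 + cumsum(rnorm(n, sd = 0.005))  # simulated opening prices
close <- open + rnorm(n, sd = 0.005)          # simulated closing prices

toy <- data.frame(
  Open  = open,
  High  = pmax(open, close) + abs(rnorm(n, sd = 0.002)),  # at or above both
  Low   = pmin(open, close) - abs(rnorm(n, sd = 0.002)),  # at or below both
  Close = close
)

ohlc_cor <- cor(toy)                                  # 4 x 4 correlation matrix
pairs(toy, main = "OHLC pair plot (simulated data)")  # all pairwise scatter plots
round(ohlc_cor, 3)
```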

#The boxplot shows the spread of closing prices for each pair and marks outliers, which represent sudden spikes or drops, often caused by unexpected market events. #insight: Price distributions differ widely across currency pairs; the log scale allows a balanced comparison despite the large differences in price level.

ggplot(data, aes(x = Pair, y = Close)) +
  geom_boxplot(fill = "#4e79a7") +
  scale_y_log10() +
  coord_flip() +
  labs(
    title = "Price Distribution by Currency Pair (Log Scale)",
    x = "Currency Pair",
    y = "Closing Price (Log Scale)"
  ) +
  theme_minimal()

#The empirical cumulative distribution function (ECDF) gives, for each price level, the proportion of closing prices at or below that level. #insight: The ECDF does not involve time. It is read by picking a price on the x-axis and reading off the share of observations at or below it. Because all pairs are pooled here, the shape of the curve is driven largely by pairs that trade at different price levels.

ggplot(data, aes(x = Close)) +
  stat_ecdf(color = "#1f77b4", linewidth = 1) +
  labs(
    title = "Cumulative Distribution of Closing Prices",
    subtitle = "ECDF showing probability distribution",
    x = "Closing Price",
    y = "Cumulative Probability"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5),
    axis.text = element_text(size = 10)
  )
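
#How an ECDF value is read can be shown with a handful of numbers. A minimal sketch using base R's ecdf() on made-up closing prices (the names close and F_hat are assumptions for illustration):

```r
# Five made-up closing prices.
close <- c(1.18, 1.19, 1.20, 1.21, 1.25)

F_hat <- ecdf(close)  # step function built from the sample

# Share of observations at or below 1.20: three of five values qualify.
F_hat(1.20)
```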

#The boxplot shows how trading volume is distributed within each currency pair. Higher volume indicates greater liquidity and market participation. #insight: Trading volume varies across currency pairs, and the log scale makes differences in activity levels easier to compare.

ggplot(data, aes(x = Pair, y = Volume)) +
  geom_boxplot(fill = "#1f77b4") +
  scale_y_log10() +
  coord_flip() +
  labs(
    title = "Volume Distribution by Currency Pair",
    x = "Currency Pair",
    y = "Volume (Log Scale)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "none"
  )

#The 50-day moving average smooths short-term fluctuations and highlights the underlying trend in EUR/USD. A price above the moving average suggests bullish momentum, while a price below it suggests bearish conditions.

# align = "right" makes this a trailing average of the current and previous 49 closes
eurusd$MA50 <- zoo::rollmean(eurusd$Close, 50, fill = NA, align = "right")

ggplot(eurusd, aes(x = Date)) +
  geom_line(aes(y = Close), color = "#1f77b4", linewidth = 0.8) +
  geom_line(aes(y = MA50), color = "#d62728", linewidth = 1) +
  labs(
    title = "EUR/USD Price with 50-Day Moving Average",
    x = "Date",
    y = "Price"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5)
  )
## Warning: Removed 49 rows containing missing values or values outside the scale range
## (`geom_line()`).
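
#zoo::rollmean() centers its window by default (align = "center"), which means each average also includes later prices. The difference between a centered and a trailing window can be seen in a short base R sketch using stats::filter(); the prices and the window length k = 3 are made up for illustration.

```r
close <- c(10, 11, 12, 13, 14, 15, 16)  # made-up prices
k <- 3                                  # window length

# sides = 1: average of the current and previous k - 1 values (trailing).
trailing <- as.numeric(stats::filter(close, rep(1 / k, k), sides = 1))

# sides = 2: average centered on the current value (uses later values too).
centered <- as.numeric(stats::filter(close, rep(1 / k, k), sides = 2))

rbind(close, trailing, centered)
```

#The trailing version leaves the first k - 1 positions empty and never looks ahead, which is the usual choice when a moving average is used as a trading indicator.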

#The Relative Strength Index (RSI) measures market momentum on a scale from 0 to 100 by comparing average gains with average losses over the last n periods (14 here). Values above 70 indicate overbought conditions and a potential price correction, while values below 30 indicate oversold conditions and a possible rebound. #insight: The 70 and 30 thresholds help detect potential trend reversals in the market.

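# RSI computed from simple rolling averages of gains and losses over n periods.
RSI <- function(price, n = 14){
  delta <- diff(price)                  # period-to-period price change
  gain <- ifelse(delta > 0, delta, 0)   # upward moves only
  loss <- ifelse(delta < 0, -delta, 0)  # downward moves, as positive numbers
  
  # rollmean() centers its window by default, so each average also uses
  # later prices; align = "right" would give a trailing window instead.
  avg_gain <- rollmean(gain, n, fill = NA)
  avg_loss <- rollmean(loss, n, fill = NA)
  
  rs <- avg_gain / avg_loss             # relative strength
  rsi <- 100 - (100 / (1 + rs))
  
  rsi <- c(NA, rsi)                     # pad: diff() drops one observation
  return(rsi)
}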

eurusd$RSI <- RSI(eurusd$Close)

ggplot(eurusd, aes(x = Date, y = RSI)) +
  geom_line(color = "blue") +
  geom_hline(yintercept = 70, color = "red") +
  geom_hline(yintercept = 30, color = "green") +
  ggtitle("RSI Indicator")
## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_line()`).
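
#An RSI that uses only past prices can be sketched with a trailing window. The version below is an illustrative base R variant, not the function used for the results in this report: the name rsi_trailing is an assumption, and it uses simple averages rather than Wilder's smoothing.

```r
rsi_trailing <- function(price, n = 14) {
  delta <- diff(price)
  gain  <- pmax(delta, 0)    # upward moves only
  loss  <- pmax(-delta, 0)   # downward moves, as positive numbers

  # sides = 1 averages the current and previous n - 1 changes (no look-ahead).
  w <- rep(1 / n, n)
  avg_gain <- as.numeric(stats::filter(gain, w, sides = 1))
  avg_loss <- as.numeric(stats::filter(loss, w, sides = 1))

  rsi <- 100 - 100 / (1 + avg_gain / avg_loss)
  c(NA, rsi)                 # pad so the result lines up with `price`
}

# Strictly rising prices have no losses, so RSI sits at 100.
up <- rsi_trailing(1:20, n = 5)

# Alternating moves of equal size give equal gains and losses, so RSI is 50.
mixed <- rsi_trailing(c(10, 11, 10, 11, 10, 11, 10), n = 2)

up
mixed
```

#A flat price series would give 0 / 0 in the ratio and therefore NaN; a production version would need to handle that case explicitly.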

#Trading signals mark possible entry and exit points: Buy when RSI is below 30, Sell when it is above 70, and Hold otherwise. The frequency table shows how often each signal occurs. Hold accounts for 1445 of the 1917 days with an RSI value (about 75%), against 219 Buy and 253 Sell signals, so overbought and oversold readings are relatively rare and the series spends most of its time between the two thresholds.

eurusd$Signal <- ifelse(eurusd$RSI < 30, "Buy",
                       ifelse(eurusd$RSI > 70, "Sell", "Hold"))

table(eurusd$Signal)
## 
##  Buy Hold Sell 
##  219 1445  253

#The strategy return measures the profit or loss from applying the RSI signals. On a Buy day the return is the day's change in Close, on a Sell day it is the negative of that change, and on a Hold day it is zero. Returns are measured as price differences, not percentages. The total of -7.37 means the rule lost money over the sample. Note that each signal is matched with the price change of the same day on which it is generated; a tradable rule would apply the signal to the following day's change.

eurusd$Return <- c(NA, diff(eurusd$Close))

eurusd$Strategy_Return <- ifelse(eurusd$Signal == "Buy", eurusd$Return,
                           ifelse(eurusd$Signal == "Sell", -eurusd$Return, 0))

sum(eurusd$Strategy_Return, na.rm = TRUE)
## [1] -7.37135
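
#Applying each signal to the next day's price change, rather than the same day's, can be sketched as follows. The prices and signals are made up, and the variable names (position, lagged_position, total_lagged, total_same_day) are assumptions for illustration.

```r
close  <- c(1.00, 0.98, 0.97, 1.01, 1.03, 1.02)             # made-up prices
signal <- c("Hold", "Buy", "Hold", "Sell", "Hold", "Hold")  # hypothetical signals

ret <- c(NA, diff(close))  # day-over-day price change

# Translate signals into positions: long = 1, short = -1, flat = 0.
position <- ifelse(signal == "Buy", 1, ifelse(signal == "Sell", -1, 0))

# Shift positions forward one day: yesterday's signal earns today's change.
lagged_position <- c(NA, head(position, -1))

total_lagged   <- sum(lagged_position * ret, na.rm = TRUE)
total_same_day <- sum(position * ret, na.rm = TRUE)

c(lagged = total_lagged, same_day = total_same_day)
```

#The two totals differ because the same-day version credits the strategy with a price move that had already happened when the signal appeared.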

#The linear regression model explains the EUR/USD closing price from the same day's Open, High, Low, and Volume. The model summary reports the coefficients, their significance levels, and R-squared, which measures how much of the variation in Close the model explains. In the output below, High and Low are highly significant, Volume is not (p = 0.83), and R-squared is 0.9988. Because High and Low are only known once the day is over, the fit describes how Close relates to the day's range rather than forecasting it in advance.

model <- lm(Close ~ Open + High + Low + Volume, data = eurusd)

summary(model)
## 
## Call:
## lm(formula = Close ~ Open + High + Low + Volume, data = eurusd)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.125330 -0.013036 -0.000476  0.013364  0.128454 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.060e-03  3.229e-03   0.948   0.3434    
## Open        -7.141e-02  2.923e-02  -2.443   0.0147 *  
## High         5.165e-01  1.968e-02  26.243   <2e-16 ***
## Low          5.542e-01  2.073e-02  26.729   <2e-16 ***
## Volume      -4.835e-09  2.267e-08  -0.213   0.8312    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02604 on 1926 degrees of freedom
## Multiple R-squared:  0.9988, Adjusted R-squared:  0.9988 
## F-statistic: 3.969e+05 on 4 and 1926 DF,  p-value: < 2.2e-16
eurusd$Predicted_Close <- predict(model, eurusd)
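
#To judge predictive ability, a model can be fitted on information available before the close and scored on data it has not seen. The sketch below does this on simulated random-walk prices with the previous day's close as the only predictor; the data, the 200-row training split, and all object names are assumptions for illustration.

```r
set.seed(42)
n <- 300
close <- 1.20 + cumsum(rnorm(n, sd = 0.005))  # simulated random-walk prices

toy <- data.frame(
  Close     = close[-1],
  Lag_Close = close[-n]   # previous day's close, known before trading starts
)

train <- toy[1:200, ]           # earlier observations for fitting
test  <- toy[201:nrow(toy), ]   # later observations for evaluation

fit  <- lm(Close ~ Lag_Close, data = train)
pred <- predict(fit, newdata = test)

# Out-of-sample error of the model and of a "no change" baseline.
rmse_model <- sqrt(mean((test$Close - pred)^2))
rmse_naive <- sqrt(mean((test$Close - test$Lag_Close)^2))

c(model = rmse_model, naive = rmse_naive)
```

#Comparing the model with the naive baseline shows whether the regression adds anything beyond assuming tomorrow's price equals today's.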

#The forecasting model uses ARIMA (AutoRegressive Integrated Moving Average) to predict future EUR/USD prices from patterns in the historical series. auto.arima() selects the model order automatically, and the plot shows the forecast for the next 10 periods together with prediction intervals, which represent the uncertainty of the forecast.

ts_data <- ts(eurusd$Close)

fit <- auto.arima(ts_data)

future <- forecast(fit, h = 10)

plot(future)
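
#Forecast accuracy can be checked by holding back the last few observations and comparing them with the forecast. The sketch below uses base R's arima() and predict() on a simulated series instead of auto.arima(); the ARIMA(1,1,0) order, the data, and the 1.96 multiplier for an approximate 95% interval are assumptions for illustration.

```r
set.seed(7)
x <- 1.20 + cumsum(rnorm(210, sd = 0.005))  # simulated price series

train <- x[1:200]    # used to fit the model
test  <- x[201:210]  # held back for evaluation

fit <- arima(train, order = c(1, 1, 0))  # order chosen by hand for this sketch
fc  <- predict(fit, n.ahead = 10)        # point forecasts and standard errors

# Approximate 95% prediction interval around each point forecast.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

# Root mean squared error on the held-back observations.
rmse <- sqrt(mean((test - as.numeric(fc$pred))^2))

data.frame(forecast = as.numeric(fc$pred),
           lower = as.numeric(lower),
           upper = as.numeric(upper),
           actual = test)
rmse
```

#The interval widens with the horizon, which is why forecasts further ahead carry more uncertainty.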

#The correlation matrix shows the strength and direction of the linear relationships among Open, High, Low, Close, and Volume, computed over all pairs combined. Open, High, Low, and Close are strongly positively correlated (values close to +1), so they move together; pooling pairs with very different price levels also pushes these values toward 1. Volume may show a weaker correlation with the price variables, suggesting it behaves differently.

corr <- cor(data[,c("Open","High","Low","Close","Volume")])

corrplot(
  corr,
  method = "square",
  type = "upper",
  diag = FALSE,
  tl.col = "black",
  tl.srt = 0,
  tl.cex = 1,
  addCoef.col = "black",
  number.cex = 0.8,
  col = colorRampPalette(c("#2166ac", "white", "#b2182b"))(100)
)

#The session-wise analysis compares average trading volume across three session labels: Asia, London, and New York. The dataset is daily and has no intraday timestamps, so the labels are assigned from the day of the month modulo 3 as a placeholder; the chart therefore illustrates the method and does not measure real session activity. #insight: With true intraday data, sessions with higher average volume would indicate greater market participation and liquidity, helping traders identify the most active trading periods.

data$day <- lubridate::day(data$Date)

# Placeholder labels: day of month modulo 3, not actual trading hours
data$session <- ifelse(data$day %% 3 == 0, "Asia",
                 ifelse(data$day %% 3 == 1, "London", "NewYork"))


session_vol <- data %>%
  group_by(session) %>%
  summarise(avg_vol = mean(Volume))

ggplot(session_vol, aes(x = session, y = avg_vol)) +
  geom_col(fill = "#4e79a7", width = 0.6) +
  geom_text(aes(label = round(avg_vol,2)), vjust = -0.5, size = 4) +
  labs(
    title = "Average Volume by Trading Session",
    x = "Session",
    y = "Average Volume"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    axis.text = element_text(size = 11)
  ) +
  expand_limits(y = max(session_vol$avg_vol) * 1.1)
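
#If intraday timestamps were available, sessions could be assigned from the hour of each observation instead. The sketch below is illustrative only: the function name session_from_hour and the UTC hour boundaries (0, 8, 13) are simplifying assumptions, not official session times, and real sessions overlap.

```r
# Map an hour of day (0-23, UTC) to a session label.
# The boundaries are hypothetical and chosen only for illustration.
session_from_hour <- function(hour_utc) {
  cut(hour_utc,
      breaks = c(0, 8, 13, 24),
      labels = c("Asia", "London", "NewYork"),
      right  = FALSE)   # intervals are [0, 8), [8, 13), [13, 24)
}

hours  <- c(0, 7, 8, 12, 13, 23)
labels <- as.character(session_from_hour(hours))
data.frame(hour_utc = hours, session = labels)
```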

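#This project demonstrates the application of statistical analysis, technical indicators, and machine learning (linear regression) to forex markets. The moving average and RSI summarize trend and momentum, although the simple RSI rule tested here produced a negative total return (-7.37). The regression explains closing prices well from same-day prices (R-squared 0.9988), and ARIMA provides short-horizon forecasts with prediction intervals. However, market volatility limits prediction accuracy.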