Course: Data Exploration and Preparation
Course Code: CAP482
Assessment: CA-3
Project Title: Forex Market Analysis using Technical Indicators
Prince Kumar Rohit
Reg No: 12403063
Chaitanya Pandey
Reg No: 12400382
Section: DE553
Group: (G1)
Ms. Ranjit Kaur Walia
Assistant Professor, SCA, LPU
Lovely Faculty of Technology & Sciences
School of Computer Applications
Lovely Professional University, Punjab
This project analyzes forex market data using statistical methods, visualization techniques, and technical indicators such as Moving Average and RSI. It also integrates trading strategies, machine learning, and forecasting methods to generate meaningful insights and support decision-making.
The analysis focuses on understanding price trends, market volatility, and relationships between key variables such as Open, High, Low, Close, and Volume. By combining data analysis with predictive techniques, the project demonstrates how data-driven approaches can enhance trading strategies and provide a deeper understanding of forex market behavior.
#The required libraries are loaded to perform data manipulation, visualization, statistical analysis, and forecasting.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.5.3
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(zoo)
## Warning: package 'zoo' was built under R version 4.5.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(GGally)
## Warning: package 'GGally' was built under R version 4.5.3
library(forecast)
## Warning: package 'forecast' was built under R version 4.5.3
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.5.3
## corrplot 0.95 loaded
#The dataset contains forex data with variables like Date, Open, High, Low, Close, Volume, and Pair.
data <- read.csv("C:/Users/HP/Downloads/forex_5_years_data.csv")
head(data)
## Date Pair Open High Low Close Volume
## 1 2021-01-01 EURUSD 1.20596 1.20928 1.19042 1.19336 152386
## 2 2021-01-02 EURUSD 1.21153 1.21379 1.18937 1.19752 130263
## 3 2021-01-03 EURUSD 1.19056 1.20314 1.17688 1.19066 95658
## 4 2021-01-04 EURUSD 1.17966 1.19232 1.17686 1.19191 129150
## 5 2021-01-05 EURUSD 1.20938 1.21477 1.20777 1.21192 81394
## 6 2021-01-06 EURUSD 1.19465 1.20476 1.17447 1.18811 157313
str(data)
## 'data.frame': 19310 obs. of 7 variables:
## $ Date : chr "2021-01-01" "2021-01-02" "2021-01-03" "2021-01-04" ...
## $ Pair : chr "EURUSD" "EURUSD" "EURUSD" "EURUSD" ...
## $ Open : num 1.21 1.21 1.19 1.18 1.21 ...
## $ High : num 1.21 1.21 1.2 1.19 1.21 ...
## $ Low : num 1.19 1.19 1.18 1.18 1.21 ...
## $ Close : num 1.19 1.2 1.19 1.19 1.21 ...
## $ Volume: int 152386 130263 95658 129150 81394 157313 118555 137121 115758 102606 ...
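#Summary statistics give an overview of each variable's range and central tendency. Because the dataset pools currency pairs quoted on very different scales (Close ranges from 0.1421 to 2068.3385, with a median of 1.5076 and a mean of 135.8532), the pooled figures mainly reflect differences in price level between pairs rather than market volatility.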
summary(data)
## Date Pair Open High
## Length:19310 Length:19310 Min. : 0.1417 Min. : 0.1431
## Class :character Class :character 1st Qu.: 0.8627 1st Qu.: 0.8763
## Mode :character Mode :character Median : 1.5055 Median : 1.5285
## Mean : 135.8935 Mean : 138.0268
## 3rd Qu.: 94.3094 3rd Qu.: 95.6762
## Max. :2055.6127 Max. :2101.4820
## Low Close Volume
## Min. : 0.1403 Min. : 0.1421 Min. : 70005
## 1st Qu.: 0.8482 1st Qu.: 0.8617 1st Qu.: 92061
## Median : 1.4799 Median : 1.5076 Median :115053
## Mean : 133.7152 Mean : 135.8532 Mean :114851
## 3rd Qu.: 92.5719 3rd Qu.: 94.1732 3rd Qu.:137364
## Max. :2040.0934 Max. :2068.3385 Max. :159976
#Cleaning ensures accuracy by removing missing and duplicate values and converting dates for time-series analysis.
data <- na.omit(data)
data <- unique(data)
data$Date <- as.Date(data$Date)
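Before dropping rows it can help to count how many are affected. The sketch below uses a small made-up data frame (the values are illustrative and not taken from the dataset) to report missing values per column and duplicate rows, and then applies the same cleaning steps.

```r
# Hypothetical sample: one duplicated row and one missing Close value
raw <- data.frame(
  Date  = c("2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03"),
  Close = c(1.19, 1.20, 1.20, NA),
  stringsAsFactors = FALSE
)

n_missing    <- colSums(is.na(raw))   # missing values per column
n_duplicated <- sum(duplicated(raw))  # rows that repeat an earlier row

# Same cleaning steps as above: drop NAs, drop duplicates, parse dates
clean <- unique(na.omit(raw))
clean$Date <- as.Date(clean$Date, format = "%Y-%m-%d")

print(n_missing)
print(n_duplicated)
print(nrow(clean))
```

Comparing the row count before and after cleaning shows whether the cleaning step changed the dataset at all.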
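#The statistics below describe the central tendency and spread of Close across all pairs combined. The large gap between the mean (135.85) and the median (1.51), and the standard deviation of 330.33, result from pooling pairs with very different price levels, so they should not be read as the volatility of any single pair. The Open-Close correlation of 0.9999 is likewise dominated by these level differences: each day's open and close lie close together relative to the spread between pairs.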
mean(data$Close)
## [1] 135.8532
sd(data$Close)
## [1] 330.325
max(data$Close)
## [1] 2068.338
min(data$Close)
## [1] 0.14205
median(data$Close)
## [1] 1.507585
cor(data$Open, data$Close)
## [1] 0.9998951
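Computing the same statistics separately for each pair avoids mixing price scales. The following is a minimal sketch on simulated data; the two pairs, their price levels, and the random-walk parameters are assumptions chosen only to illustrate the effect.

```r
# Simulated closes for two pairs quoted on very different scales (hypothetical)
set.seed(1)
toy <- data.frame(
  Pair  = rep(c("EURUSD", "USDJPY"), each = 100),
  Close = c(1.10 + cumsum(rnorm(100, 0, 0.005)),
            110  + cumsum(rnorm(100, 0, 0.5)))
)

# Mean, standard deviation, and coefficient of variation within each pair
stats_by_pair <- do.call(rbind, lapply(split(toy, toy$Pair), function(d) {
  data.frame(Pair = d$Pair[1],
             mean = mean(d$Close),
             sd   = sd(d$Close),
             cv   = sd(d$Close) / mean(d$Close))
}))

# Pooled standard deviation, dominated by the gap between the two price levels
pooled_sd <- sd(toy$Close)

print(stats_by_pair)
print(pooled_sd)
```

The coefficient of variation (standard deviation divided by mean) is scale-free, so it can be compared across pairs where raw standard deviations cannot.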
#The EUR/USD closing price fluctuates continuously over the sample period. Upward movements correspond to bullish phases in which the Euro strengthens against the US Dollar, and downward movements to bearish phases. In real markets such movements are driven by macroeconomic factors such as interest rate changes, geopolitical events, and economic data releases.
eurusd <- data[data$Pair == "EURUSD", ]
ggplot(eurusd, aes(x = Date, y = Close)) +
geom_line(color = "blue") +
ggtitle("EURUSD Price Trend")
#The comparison of EURUSD, GBPUSD, and USDJPY highlights differences in trend and volatility across currency pairs. Each series is divided by its first closing price, so all three start at 1 and can be compared on a common scale despite different quote levels. Pairs with sharper swings in the normalized series are more volatile, while smoother lines indicate steadier behavior.
#Insight: Normalization lets traders compare relative performance across pairs and identify which pair shows stronger growth or higher volatility.
subset_data <- data[data$Pair %in% c("EURUSD","GBPUSD","USDJPY"), ]
subset_data <- subset_data[order(subset_data$Date), ]
subset_data <- subset_data %>%
group_by(Pair) %>%
mutate(Normalized_Close = Close / first(Close))
ggplot(subset_data, aes(x = Date, y = Normalized_Close, color = Pair)) +
geom_line(linewidth = 1.2) +
labs(title = "Forex Pair Comparison (Normalized)",
subtitle = "All pairs aligned for fair comparison",
x = "Date",
y = "Normalized Price",
color = "Currency Pair") +
scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold"),
legend.position = "top"
)
#The bar chart shows the average closing price of each currency pair over the dataset period, sorted by value. The large differences between averages reflect the quotation scale of each pair, not differences in performance.
#Insight: Sorting makes the price-level ranking easy to read, but comparing performance across pairs requires normalized prices or returns.
avg_price <- aggregate(Close ~ Pair, data, mean)
ggplot(avg_price, aes(x = reorder(Pair, Close), y = Close)) +
geom_bar(stat = "identity", fill = "#4e79a7", width = 0.6) +
geom_text(aes(label = round(Close,2)),
hjust = -0.1, size = 4) +
coord_flip() +
labs(title = "Average Closing Price by Currency Pair",
subtitle = "Properly aligned and readable comparison",
x = "Currency Pair",
y = "Average Closing Price") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 12, hjust = 0.5),
axis.text.y = element_text(size = 11),
axis.text.x = element_text(size = 10),
plot.margin = margin(10, 30, 10, 10)
) +
expand_limits(y = max(avg_price$Close) * 1.1)
#The pie chart shows the share of observations contributed by each currency pair. EURUSD, the pair analyzed in detail later, contributes 1,931 rows, about 10% of the 19,310 rows loaded.
#Insight: When every pair contributes a similar share, pooled results are not dominated by any single pair, which makes comparisons across pairs fairer.
pair_df <- as.data.frame(table(data$Pair))
ggplot(pair_df, aes(x = "", y = Freq, fill = Var1)) +
geom_bar(stat = "identity", width = 3, color = "black") +
coord_polar("y") +
geom_text(aes(label = paste0(round(Freq/sum(Freq)*100,1), "%")),
position = position_stack(vjust = 0.5), size = 4) +
labs(title = "Distribution of Currency Pairs",
fill = "Currency Pair") +
scale_fill_viridis_d() +
theme_void() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
legend.position = "right"
)
#The scatter plots show a strong positive linear relationship between opening and closing prices for every currency pair. This mainly reflects that a day's close stays near that day's open relative to the pair's overall price range; it does not by itself show that the direction of movement persists through the session.
#Insight: All currency pairs show a tight Open-Close relationship, so price levels are highly persistent from open to close.
ggplot(data, aes(x = Open, y = Close)) +
geom_point(color = "#2ca02c", alpha = 0.4, size = 1.5) +
geom_smooth(method = "lm", color = "red", se = FALSE) +
facet_wrap(~Pair, scales = "free") +
labs(
title = "Open vs Close Price (By Currency Pair)",
x = "Opening Price",
y = "Closing Price"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
)
## `geom_smooth()` using formula = 'y ~ x'
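A high correlation between price levels can coexist with intraday moves that are unrelated from one day to the next. The sketch below illustrates this on simulated data; the random-walk opens and independent intraday moves are assumptions, not properties measured in the dataset.

```r
# Simulated prices: opens follow a random walk, intraday moves are independent
set.seed(7)
n     <- 500
open  <- 1.10 + cumsum(rnorm(n, 0, 0.005))  # hypothetical opening prices
close <- open + rnorm(n, 0, 0.005)          # close = open plus an intraday move

# Correlation between price levels
cor_levels <- cor(open, close)

# Correlation between today's intraday move and yesterday's
move      <- close - open
cor_moves <- cor(move[-1], move[-n])

print(cor_levels)
print(cor_moves)
```

Checking the correlation of moves (or returns) alongside the correlation of levels gives a clearer picture of whether direction actually persists.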
#The scatter plot below pools all currency pairs and shows the same near-linear relationship between Open and Close, with a fitted regression line and its confidence band. Open, High, Low, and Close move together closely, so they carry largely overlapping information when used together as predictors. Only the Open-Close relationship is drawn here.
library(ggplot2)
ggplot(data, aes(Open, Close)) +
geom_point(alpha = 0.2, color = "blue") +
geom_smooth(method = "lm", color = "red") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
#The boxplot shows the spread of closing prices for each currency pair, with points beyond the whiskers marking closes that lie far from the pair's typical range. Such extremes can follow sudden spikes or drops caused by unexpected market events.
#Insight: Price distributions differ widely between pairs, and the log scale allows them to be compared on one axis despite large differences in price level.
ggplot(data, aes(x = Pair, y = Close)) +
geom_boxplot(fill = "#4e79a7") +
scale_y_log10() +
coord_flip() +
labs(
title = "Price Distribution by Currency Pair (Log Scale)",
x = "Currency Pair",
y = "Closing Price (Log Scale)"
) +
theme_minimal()
#The empirical cumulative distribution function (ECDF) gives, for each price level, the proportion of closing prices at or below that level. Because all pairs are pooled here, the shape of the curve is driven largely by the different price levels of the pairs.
#Insight: The ECDF does not involve time; it is read by picking a price on the x-axis and reading off the fraction of observations that do not exceed it.
ggplot(data, aes(x = Close)) +
stat_ecdf(color = "#1f77b4", linewidth = 1) +
labs(
title = "Cumulative Distribution of Closing Prices",
subtitle = "ECDF showing probability distribution",
x = "Closing Price",
y = "Cumulative Probability"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
axis.text = element_text(size = 10)
)
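The same quantity can be evaluated numerically with base R's ecdf(). The sketch below uses five made-up closing prices for a single pair, chosen only so the result is easy to check by hand.

```r
# Hypothetical closing prices for one pair
close <- c(1.10, 1.12, 1.15, 1.18, 1.20)

# ecdf() returns a function giving the share of values at or below its argument
F_close <- ecdf(close)

share_below <- F_close(1.15)   # 3 of 5 values are <= 1.15, so 0.6
print(share_below)

# The inverse question: which price level splits the sample in half?
print(quantile(close, 0.5))
```

Applying ecdf() to one pair at a time keeps the result interpretable as a statement about that pair's prices.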
#The volume boxplot shows how trading activity varies across currency pairs. Higher volume indicates higher liquidity and market participation.
#Insight: Plotting each pair's volume distribution on a log scale makes differences in activity levels between pairs easier to compare.
ggplot(data, aes(x = Pair, y = Volume)) +
geom_boxplot(fill = "#1f77b4") +
scale_y_log10() +
coord_flip() +
labs(
title = "Volume Distribution by Currency Pair",
x = "Currency Pair",
y = "Volume (Log Scale)"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
legend.position = "none"
)
#The moving average smooths short-term fluctuations and highlights the underlying trend. When the price stays above the moving average it indicates bullish momentum, while a price below it indicates bearish conditions.
#Insight: The 50-day moving average helps identify the overall trend direction in the EUR/USD market.
# align = "right" gives a trailing average over the current and previous 49 closes;
# the default (align = "center") would also average in later prices
eurusd$MA50 <- zoo::rollmean(eurusd$Close, 50, fill = NA, align = "right")
ggplot(eurusd, aes(x = Date)) +
geom_line(aes(y = Close), color = "#1f77b4", linewidth = 0.8) +
geom_line(aes(y = MA50), color = "#d62728", linewidth = 1) +
labs(
title = "EUR/USD Price with 50-Day Moving Average",
x = "Date",
y = "Price"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", hjust = 0.5)
)
## Warning: Removed 49 rows containing missing values or values outside the scale range
## (`geom_line()`).
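A trailing window uses only the current and earlier closes, which matters when the average is meant as a trading indicator. The sketch below shows this with base R's stats::filter() instead of zoo; the helper name trailing_mean, the 3-period window, and the prices are assumptions for illustration.

```r
# Trailing simple moving average over k periods (hypothetical helper).
# stats::filter is called explicitly because dplyr masks filter().
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Made-up closing prices
close <- c(1.10, 1.11, 1.12, 1.11, 1.09, 1.08, 1.10, 1.13)
ma3   <- trailing_mean(close, 3)

# Position of the price relative to its moving average
position <- ifelse(is.na(ma3), NA,
                   ifelse(close > ma3, "above", "below"))

print(data.frame(close, ma3 = round(ma3, 4), position))
```

The first k - 1 values are NA because a full window is not yet available, which is the same reason the plot above drops 49 rows.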
#The Relative Strength Index (RSI) measures momentum as RSI = 100 - 100 / (1 + RS), where RS is the ratio of the average gain to the average loss over a window of n = 14 periods. Values above 70 indicate overbought conditions, suggesting a potential price correction, while values below 30 indicate oversold conditions, signaling a possible rebound. The function below averages gains and losses with a simple rolling mean rather than Wilder's smoothing, and because rollmean() centers its window by default, each value also draws on prices that come after it.
#Insight: RSI flags overbought (above 70) and oversold (below 30) conditions, helping detect potential trend reversals in the market.
RSI <- function(price, n = 14){
delta <- diff(price)
gain <- ifelse(delta > 0, delta, 0)
loss <- ifelse(delta < 0, -delta, 0)
avg_gain <- rollmean(gain, n, fill = NA)
avg_loss <- rollmean(loss, n, fill = NA)
rs <- avg_gain / avg_loss
rsi <- 100 - (100 / (1 + rs))
rsi <- c(NA, rsi)
return(rsi)
}
eurusd$RSI <- RSI(eurusd$Close)
ggplot(eurusd, aes(x = Date, y = RSI)) +
geom_line(color = "blue") +
geom_hline(yintercept = 70, color = "red") +
geom_hline(yintercept = 30, color = "green") +
ggtitle("RSI Indicator")
## Warning: Removed 14 rows containing missing values or values outside the scale range
## (`geom_line()`).
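For comparison, a trailing RSI can be sketched with Wilder's smoothing, where the first average is a simple mean of the first n changes and later averages are updated recursively. The function name rsi_wilder and the sample prices are assumptions for illustration; this is not the function that produced the results in this report.

```r
# Trailing RSI with Wilder's smoothing (illustrative sketch)
rsi_wilder <- function(price, n = 14) {
  delta <- diff(price)
  gain  <- pmax(delta, 0)
  loss  <- pmax(-delta, 0)
  m     <- length(delta)
  rsi   <- rep(NA_real_, length(price))
  if (m < n) return(rsi)

  # Seed the averages with a simple mean of the first n changes
  avg_gain <- mean(gain[1:n])
  avg_loss <- mean(loss[1:n])

  for (i in n:m) {
    if (i > n) {
      # Recursive update: each new change gets weight 1/n
      avg_gain <- (avg_gain * (n - 1) + gain[i]) / n
      avg_loss <- (avg_loss * (n - 1) + loss[i]) / n
    }
    # delta[i] ends at price[i + 1]; with no losses RSI is set to 100
    rsi[i + 1] <- if (avg_loss == 0) 100 else 100 - 100 / (1 + avg_gain / avg_loss)
  }
  rsi
}

rising  <- seq(1.00, 1.29, by = 0.01)   # hypothetical steadily rising prices
falling <- rev(rising)                  # and steadily falling prices
print(rsi_wilder(rising))
print(rsi_wilder(falling))
```

Every value depends only on prices up to that point, so signals derived from it do not use information from the future.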
#Trading signals mark potential entry and exit points based on momentum: Buy when RSI is below 30, Sell when RSI is above 70, and Hold otherwise. Of the 1,917 days with a defined RSI, 1,445 (75.4%) are Hold, 219 (11.4%) are Buy, and 253 (13.2%) are Sell. Extreme overbought or oversold readings are therefore relatively rare, and by this measure the market spends most of its time in neutral conditions rather than at momentum extremes.
eurusd$Signal <- ifelse(eurusd$RSI < 30, "Buy",
ifelse(eurusd$RSI > 70, "Sell", "Hold"))
table(eurusd$Signal)
##
## Buy Hold Sell
## 219 1445 253
#The strategy return measures the profit or loss from applying the RSI-based signals. On a Buy signal the return is the day's price change, on a Sell signal it is the negative of that change, and Hold contributes zero. Summed over the sample, the strategy loses 7.37 price units. Two caveats apply: returns are absolute price differences rather than percentages, and each signal is paired with the price change of the same day, which has already occurred by the time the signal is known, so the total is not a realistic estimate of tradable performance.
eurusd$Return <- c(NA, diff(eurusd$Close))
eurusd$Strategy_Return <- ifelse(eurusd$Signal == "Buy", eurusd$Return,
ifelse(eurusd$Signal == "Sell", -eurusd$Return, 0))
sum(eurusd$Strategy_Return, na.rm = TRUE)
## [1] -7.37135
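One way to avoid pairing a signal with a move that has already happened is to act on the previous period's signal and to measure percentage returns. The following is a minimal sketch under those assumptions; the function name backtest_lagged and the sample prices and signals are hypothetical.

```r
# Apply yesterday's signal to today's percentage return (illustrative sketch)
backtest_lagged <- function(close, signal) {
  # Simple return from t-1 to t
  ret <- c(NA, diff(close) / head(close, -1))
  # +1 for Buy, -1 for Sell, 0 for Hold
  position <- ifelse(signal == "Buy", 1, ifelse(signal == "Sell", -1, 0))
  # Shift positions by one period so each return uses the prior signal
  lagged_position <- c(NA, head(position, -1))
  lagged_position * ret
}

close  <- c(100, 102, 101, 103, 104)              # made-up prices
signal <- c("Buy", "Hold", "Sell", "Buy", "Hold") # made-up signals

strat <- backtest_lagged(close, signal)
print(strat)
print(sum(strat, na.rm = TRUE))
```

Here the Buy on day 1 earns the 2% move into day 2, and the Sell on day 3 loses when the price rises into day 4.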
#The linear regression model explains the EUR/USD closing price using Open, High, Low, and Volume from the same day. High and Low are highly significant (p < 2e-16), Open has a small negative coefficient (p = 0.0147), and Volume is not significant (p = 0.83). The R-squared of 0.9988 shows that the model accounts for almost all variation in Close, which is expected because the close always lies between the day's low and high. Since High and Low are only known once the day has ended, the model describes relationships within a day rather than forecasting future prices.
model <- lm(Close ~ Open + High + Low + Volume, data = eurusd)
summary(model)
##
## Call:
## lm(formula = Close ~ Open + High + Low + Volume, data = eurusd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.125330 -0.013036 -0.000476 0.013364 0.128454
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.060e-03 3.229e-03 0.948 0.3434
## Open -7.141e-02 2.923e-02 -2.443 0.0147 *
## High 5.165e-01 1.968e-02 26.243 <2e-16 ***
## Low 5.542e-01 2.073e-02 26.729 <2e-16 ***
## Volume -4.835e-09 2.267e-08 -0.213 0.8312
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02604 on 1926 degrees of freedom
## Multiple R-squared: 0.9988, Adjusted R-squared: 0.9988
## F-statistic: 3.969e+05 on 4 and 1926 DF, p-value: < 2.2e-16
eurusd$Predicted_Close <- predict(model, eurusd)
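To judge predictive value, a model can be fitted on earlier data using only information available beforehand and then scored on later data. The sketch below does this on simulated prices with the previous close as the single predictor; the data, the 200-row training split, and the variable names are assumptions for illustration.

```r
# Simulated random-walk closes (hypothetical data)
set.seed(42)
n     <- 300
close <- 1.10 + cumsum(rnorm(n, 0, 0.005))

# Pair each close with the previous day's close (known in advance)
df <- data.frame(Close = close[-1], Prev_Close = close[-n])

# Chronological split: fit on the first 200 rows, evaluate on the rest
train <- df[1:200, ]
test  <- df[201:nrow(df), ]

fit  <- lm(Close ~ Prev_Close, data = train)
pred <- predict(fit, newdata = test)

# Out-of-sample error of the model versus a naive "no change" forecast
rmse_model <- sqrt(mean((test$Close - pred)^2))
rmse_naive <- sqrt(mean((test$Close - test$Prev_Close)^2))

print(c(model = rmse_model, naive = rmse_naive))
```

If the model's out-of-sample error is not clearly below the naive benchmark, a high in-sample R-squared says little about forecasting ability.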
#The forecasting model uses ARIMA (AutoRegressive Integrated Moving Average), with the order selected automatically by auto.arima(), to project EUR/USD closing prices from historical patterns. The plot shows point forecasts for the next 10 periods together with shaded prediction intervals, which widen as uncertainty grows with the forecast horizon.
ts_data <- ts(eurusd$Close)
fit <- auto.arima(ts_data)
future <- forecast(fit, h = 10)
plot(future)
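How prediction intervals grow with the horizon can be seen in a simple case. The sketch below fits a random-walk model, ARIMA(0,1,0), with base R's stats::arima() rather than auto.arima(), on simulated prices; the model order, the data, and the 1.96 multiplier for an approximate 95% interval are assumptions for illustration.

```r
# Simulated closing prices (hypothetical data)
set.seed(123)
x <- ts(1.10 + cumsum(rnorm(200, 0, 0.005)))

# Random-walk model: ARIMA(0,1,0), fitted with base R
fit_rw <- stats::arima(x, order = c(0, 1, 0))

# Point forecasts and standard errors for the next 10 periods
fc <- predict(fit_rw, n.ahead = 10)

# Approximate 95% prediction interval
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
interval_width <- as.numeric(upper - lower)

print(round(cbind(forecast = fc$pred, lower, upper), 4))
```

For a random walk the forecast stays at the last observed value while the interval widens with each step ahead.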
#The correlation matrix visualizes the strength and direction of the linear relationships between Open, High, Low, Close, and Volume. Strong positive correlations (values close to +1) are observed among Open, High, Low, and Close, indicating that these variables move together. Volume may show a weaker or moderate correlation with the price variables, suggesting it behaves differently. Because the matrix is computed on all pairs pooled, the price correlations also reflect differences in price level between pairs.
corr <- cor(data[,c("Open","High","Low","Close","Volume")])
corrplot(
corr,
method = "square",
type = "upper",
diag = FALSE,
tl.col = "black",
tl.srt = 0,
tl.cex = 1,
addCoef.col = "black",
number.cex = 0.8,
col = colorRampPalette(c("#2166ac", "white", "#b2182b"))(100)
)
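Computing one matrix per currency pair removes the effect of price-level differences between pairs. The following is a minimal sketch on simulated data; the generator make_pair, the two pairs, and all parameter values are assumptions for illustration.

```r
# Simulated OHLCV rows for one pair at a given price level (hypothetical helper)
make_pair <- function(name, level, n = 200) {
  open  <- level * (1 + cumsum(rnorm(n, 0, 0.004)))
  close <- open * (1 + rnorm(n, 0, 0.004))
  data.frame(Pair   = name,
             Open   = open,
             High   = pmax(open, close) * 1.002,
             Low    = pmin(open, close) * 0.998,
             Close  = close,
             Volume = round(runif(n, 70000, 160000)))
}

set.seed(11)
toy <- rbind(make_pair("EURUSD", 1.1), make_pair("USDJPY", 110))

# One correlation matrix per pair instead of a single pooled matrix
cols <- c("Open", "High", "Low", "Close", "Volume")
cor_by_pair <- lapply(split(toy[, cols], toy$Pair), cor)

print(round(cor_by_pair$EURUSD, 3))
```

Each element of cor_by_pair can be passed to corrplot() in the same way as the pooled matrix.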
#The session-wise analysis compares average trading volume across three session labels: Asia, London, and New York. The dataset contains one row per day and no time-of-day information, so the code assigns these labels from the day of the month modulo 3. The labels are therefore placeholders that split the days into three groups, not actual trading sessions.
#Insight: With intraday data, sessions with higher average volume would indicate greater market participation and liquidity and help traders identify the most active periods. With the daily data used here, the chart compares three arbitrary groups of days.
data$day <- lubridate::day(data$Date)
data$session <- ifelse(data$day %% 3 == 0, "Asia",
ifelse(data$day %% 3 == 1, "London", "NewYork"))
session_vol <- data %>%
group_by(session) %>%
summarise(avg_vol = mean(Volume))
ggplot(session_vol, aes(x = session, y = avg_vol)) +
geom_col(fill = "#4e79a7", width = 0.6) +
geom_text(aes(label = round(avg_vol,2)), vjust = -0.5, size = 4) +
labs(
title = "Average Volume by Trading Session",
x = "Session",
y = "Average Volume"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
axis.text = element_text(size = 11)
) +
expand_limits(y = max(session_vol$avg_vol) * 1.1)
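With intraday data, a session label could instead be derived from the hour of each timestamp. The sketch below assumes a hypothetical UTC timestamp column, which the dataset does not contain, and uses simplified, non-overlapping hour boundaries chosen only for illustration.

```r
# Map an hour of day (UTC) to a session label.
# The boundaries 8 and 16 are illustrative assumptions, not official hours.
session_from_hour <- function(hour_utc) {
  ifelse(hour_utc < 8, "Asia",
         ifelse(hour_utc < 16, "London", "NewYork"))
}

# Made-up intraday timestamps
stamps <- as.POSIXct(c("2021-01-04 03:00:00",
                       "2021-01-04 09:00:00",
                       "2021-01-04 18:00:00"), tz = "UTC")

hours    <- as.integer(format(stamps, "%H"))
sessions <- session_from_hour(hours)
print(data.frame(stamps, hours, sessions))
```

Real sessions overlap (for example London and New York), so a production version would need a more careful mapping than this three-way split.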
#This project demonstrates the application of statistical analysis, technical indicators, and machine learning (linear regression) to forex data, together with ARIMA forecasting. The moving average and RSI summarize trend and momentum, although the simple RSI strategy tested here produced a negative total return (-7.37). Regression and forecasting add predictive tools, but market volatility limits prediction accuracy.