In this article, I use stock market data, which contains several highly correlated variables and is therefore difficult to analyze without reducing its complexity. Dimensionality reduction techniques such as PCA help simplify stock market data while preserving its essential patterns. Here, I apply PCA to a single company's stock data.
The article covers two main topics:
- The impact of the different stock attributes, read from the PCA loadings.
- Potential applications for financial analysts and traders.
The dataset belongs to one of the most important game publishers, the company behind titles such as Grand Theft Auto and Red Dead Redemption.
The Take-Two Interactive dataset was the best fit for this article.
To understand our data, we start by cleaning the dataset.
library(dplyr)
library(ggplot2)
library(FactoMineR)
library(factoextra)
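take_two_data <- read.csv("TTWO.csv")
pca <- take_two_data %>% select(Open, High, Low, Close, Volume)
# Drop incomplete rows from both data frames so Date stays aligned with the PCA scores
keep <- complete.cases(pca)
take_two_data <- take_two_data[keep, ]
pca <- pca[keep, ]
# Min-max scale each column to the range [0, 1]
normalizeData <- function(x) {(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))}
pca <- as.data.frame(lapply(pca, normalizeData))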
result <- PCA(pca, graph = FALSE)
fviz_eig(result, addlabels = TRUE, barfill = "steelblue", barcolor = "black") +
  ggtitle("Scree Plot of PCA Components") +
  theme_minimal()
The scree plot shows the proportion of variance retained by each principal component. In practice, we keep the first few components that together explain most of the variance; a cumulative share of around 80% is a common rule of thumb.
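As a rough illustration of where the percentages in a scree plot come from, the sketch below runs base R's `prcomp()` on synthetic data and picks the smallest number of components that reaches a cumulative variance threshold. The data, the 90% threshold, and the names `toy`, `var_share`, and `n_keep` are assumptions made for this example, and `prcomp()` is only a stand-in for FactoMineR's `PCA()`. It is not the TTWO analysis itself.

```r
# Synthetic stand-in for OHLCV data (hypothetical values, not TTWO.csv)
set.seed(42)
n <- 500
trend <- cumsum(rnorm(n))              # shared random-walk "price" trend
toy <- data.frame(
  Open   = trend + rnorm(n, sd = 0.1),
  High   = trend + rnorm(n, sd = 0.1),
  Low    = trend + rnorm(n, sd = 0.1),
  Close  = trend + rnorm(n, sd = 0.1),
  Volume = rnorm(n)                    # generated independently of the trend
)

# PCA on centered and standardized columns
fit <- prcomp(toy, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- fit$sdev^2
var_share   <- eigenvalues / sum(eigenvalues)   # the bars of a scree plot
cum_share   <- cumsum(var_share)

# Smallest number of components whose cumulative share reaches the threshold
threshold <- 0.90                      # assumed cut-off; adjust as needed
n_keep <- which(cum_share >= threshold)[1]

print(round(100 * var_share, 2))
print(n_keep)
```

Because the four price columns share one trend and the volume column is generated separately, most of the variance should concentrate in the first two components, which mirrors the structure of the stock data used here.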
result$var$coord
##             Dim.1        Dim.2         Dim.3         Dim.4         Dim.5
## Open   0.99983872 -0.012464946 -1.243063e-02  1.875914e-03  3.020986e-03
## High   0.99990507 -0.009549432 -2.036661e-03 -9.166785e-03 -3.237898e-03
## Low    0.99983071 -0.015422098  2.812081e-03  9.081251e-03 -3.215054e-03
## Close  0.99984386 -0.012705769  1.165445e-02 -1.792359e-03  3.432100e-03
## Volume 0.05019802  0.998739283  1.707238e-05  5.319126e-05  7.618834e-07
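The coordinate table above can also be read numerically. For a standardized PCA, the variable coordinates are the correlations between each variable and each component, so the sum of squared coordinates in a column gives that component's eigenvalue. The sketch below retypes the printed values by hand; the names `coords`, `eig`, and `pct` are introduced only for this illustration.

```r
# Variable coordinates copied from the printed result$var$coord output
coords <- matrix(
  c(0.99983872, -0.012464946, -1.243063e-02,  1.875914e-03,  3.020986e-03,
    0.99990507, -0.009549432, -2.036661e-03, -9.166785e-03, -3.237898e-03,
    0.99983071, -0.015422098,  2.812081e-03,  9.081251e-03, -3.215054e-03,
    0.99984386, -0.012705769,  1.165445e-02, -1.792359e-03,  3.432100e-03,
    0.05019802,  0.998739283,  1.707238e-05,  5.319126e-05,  7.618834e-07),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Open", "High", "Low", "Close", "Volume"),
                  paste0("Dim.", 1:5))
)

# Column-wise sum of squared coordinates = eigenvalue of each component
eig <- colSums(coords^2)

# Share of total variance per component, in percent
pct <- 100 * eig / sum(eig)

print(round(eig, 4))
print(round(pct, 2))
```

With these coordinates, Dim.1 accounts for roughly 80% of the total variance and Dim.2 for roughly 20%. The four price variables load almost entirely on Dim.1, while Volume loads almost entirely on Dim.2, so the first component summarizes the price level and the second summarizes trading volume.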
fviz_pca_var(result, col.var = "contrib") +
  ggtitle("Variable Contributions to the Principal Components") +
  theme_minimal()
To see how the individual observations (one per date in the dataset) behave in a lower-dimensional space, we plot them on the first two principal components.
fviz_pca_ind(result, label = "none", geom = "point") +
  ggtitle("Take-Two Interactive Stock Data: PCA of Individuals") +
  theme_minimal()
Next, we analyze how the first two principal components change over time by projecting each of them onto a time-series plot.
take_two_data$Date <- as.Date(take_two_data$Date)
time <- data.frame(Date = take_two_data$Date, PC1 = result$ind$coord[, 1])
ggplot(time, aes(x = Date, y = PC1)) +
  geom_line(color = "blue") +
  ggtitle("Time-Series Projection of PC1") +
  xlab("Date") + ylab("PC1 Value") +
  theme_minimal()
time <- data.frame(Date = take_two_data$Date, PC2 = result$ind$coord[, 2])
ggplot(time, aes(x = Date, y = PC2)) +
  geom_line(color = "blue") +
  ggtitle("Time-Series Projection of PC2") +
  xlab("Date") + ylab("PC2 Value") +
  theme_minimal()
These visualizations help traders identify trends in stock price movement over time using PCA.
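To make the idea of a time-series projection more concrete, the sketch below shows how component scores can be obtained: each observation is standardized and then multiplied by the component loadings. It uses base R's `prcomp()` on synthetic data as a stand-in; FactoMineR's `PCA()` may use a different scaling convention and may flip component signs, so the numbers are not expected to match `result$ind$coord`. The data and the names `prices`, `scores_manual`, and `pc1_trend_cor` are assumptions for illustration only.

```r
# Synthetic price-like data (hypothetical, not TTWO.csv)
set.seed(1)
n <- 200
trend <- cumsum(rnorm(n))              # underlying random-walk price trend
prices <- data.frame(
  Open   = trend + rnorm(n, sd = 0.1),
  Close  = trend + rnorm(n, sd = 0.1),
  Volume = rnorm(n)
)

# PCA on centered and standardized columns
fit <- prcomp(prices, center = TRUE, scale. = TRUE)

# Manual projection: standardize, then multiply by the loading matrix
z <- scale(prices)                     # mean 0, standard deviation 1 per column
scores_manual <- z %*% fit$rotation

# The manual scores should match the scores stored by prcomp()
max_diff <- max(abs(scores_manual - fit$x))

# PC1 should follow the shared price trend (the sign is arbitrary, hence abs)
pc1_trend_cor <- abs(cor(fit$x[, 1], trend))

print(max_diff)
print(pc1_trend_cor)
```

Plotting `fit$x[, 1]` against a date column would give the same kind of chart as the PC1 time series above: when the price variables dominate the first component, its score over time traces the overall price trend.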
To sum up, this study applied PCA to analyze Take-Two Interactive's stock price movements over time. The key findings include:
- PCA extracted a small number of key components, which simplifies complex stock market data.
- The scree plot helped determine how many components to retain.
- The PCA loadings revealed which stock attributes drive the variance.
- The time-series projection of PC1 highlighted stock trends over time.
- The PCA plots visualized the individual observations alongside the feature contributions.