Principal Component Analysis (PCA) on Take-Two Interactive Stock Data: A Dimensionality Reduction Approach

Introduction

Stock market data contains many highly correlated variables, which makes it difficult to analyze without reducing its complexity. Dimensionality reduction techniques such as PCA simplify the data while preserving its essential patterns. In this article, I apply PCA to Take-Two Interactive stock data.

The article covers a few main topics:

  • The impact of the different stock attributes, read from the PCA loadings.
  • Potential applications for financial analysts and traders.
  • Visualizations of the stock's price movements over time.

About the Dataset

The dataset covers Take-Two Interactive, one of the most important game publishers, known for titles such as Grand Theft Auto and Red Dead Redemption. Its stock data was a good fit for this article.

To understand the data, we start by loading the required packages and cleaning the dataset.

library(dplyr)
library(ggplot2)
library(FactoMineR)
library(factoextra)

Load the dataset and select the relevant numerical features

take_two_data <- read.csv("TTWO.csv")
pca <- take_two_data %>% select(Open, High, Low, Close,Volume)

Handling missing values and normalizing the stock data

# Drop rows with missing values, then rescale each column to [0, 1]
pca <- na.omit(pca)
normalizeData <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
pca <- as.data.frame(lapply(pca, normalizeData))
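
To make the rescaling step concrete, here is a small sketch on made-up numbers (the `toy` data frame is hypothetical, not taken from TTWO.csv). It also uses base R's `prcomp` rather than FactoMineR to illustrate a side point: when the PCA itself standardizes each column, which `prcomp(scale. = TRUE)` does and which `PCA()` does with its default `scale.unit = TRUE`, min-max rescaling beforehand should not change the components, because standardizing is unaffected by shifting a column or multiplying it by a positive constant.

```r
# Min-max rescaling, as defined in the article
normalizeData <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Hypothetical toy values (not real TTWO data)
toy <- data.frame(
  Close  = c(100, 110, 105, 120, 130),
  Volume = c(2.0e6, 1.5e6, 3.0e6, 2.5e6, 1.0e6)
)

toy_scaled <- as.data.frame(lapply(toy, normalizeData))
print(toy_scaled)  # every column now runs from 0 to 1

# PCA with per-column standardization, on raw and on rescaled data
pca_raw    <- prcomp(toy, scale. = TRUE)
pca_scaled <- prcomp(toy_scaled, scale. = TRUE)

# Compare the component standard deviations of the two fits
print(all.equal(pca_raw$sdev, pca_scaled$sdev))
```

So the rescaling is harmless here, but it is mainly useful for methods that do not standardize on their own.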

Now we apply PCA to extract the principal components from the Take-Two stock data. The share of variance explained by each component tells us how much information that component retains.

result <- PCA(pca, graph = FALSE)
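
For readers who want to see what this step computes, the sketch below reproduces the core of a standardized PCA in base R on synthetic data: the eigenvalues of the correlation matrix are the component variances, and the eigenvectors give the component directions. The simulated `toy` prices and volumes are assumptions made up for illustration, and base R's `eigen` and `prcomp` stand in for `PCA()` here.

```r
set.seed(42)
n <- 250

# Synthetic price path and derived columns (illustrative only, not TTWO data)
close <- 100 + cumsum(rnorm(n))
toy <- data.frame(
  Open   = close + rnorm(n, sd = 0.3),
  High   = close + abs(rnorm(n, sd = 0.5)),
  Low    = close - abs(rnorm(n, sd = 0.5)),
  Close  = close,
  Volume = rnorm(n, mean = 1e6, sd = 1e5)
)

# Eigen-decomposition of the correlation matrix
eig <- eigen(cor(toy))

# Share of total variance explained by each component, in percent
explained <- 100 * eig$values / sum(eig$values)
print(round(explained, 2))

# Component scores: standardized data projected onto the eigenvectors
scores <- scale(toy) %*% eig$vectors

# Cross-check the eigenvalues against prcomp with standardization
check <- prcomp(toy, scale. = TRUE)
print(all.equal(eig$values, check$sdev^2))
```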

Plot Interpretation

fviz_eig(result, addlabels = TRUE, barfill = "steelblue", barcolor = "black") +
  ggtitle("Scree Plot of PCA Components") +
  theme_minimal()

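The scree plot shows the proportion of variance explained by each principal component. In practice, we keep the first few components that together explain most of the variance. Here the first component explains roughly 80% of the variance and the second roughly 20%, so two components are enough.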

Next, we use the PCA loading table to see how much each feature contributes to the principal components.

result$var$coord
##             Dim.1        Dim.2         Dim.3         Dim.4         Dim.5
## Open   0.99983872 -0.012464946 -1.243063e-02  1.875914e-03  3.020986e-03
## High   0.99990507 -0.009549432 -2.036661e-03 -9.166785e-03 -3.237898e-03
## Low    0.99983071 -0.015422098  2.812081e-03  9.081251e-03 -3.215054e-03
## Close  0.99984386 -0.012705769  1.165445e-02 -1.792359e-03  3.432100e-03
## Volume 0.05019802  0.998739283  1.707238e-05  5.319126e-05  7.618834e-07
fviz_pca_var(result, col.var = "contrib") +
  ggtitle("Variable contributions to components (PCA)") +
  theme_minimal()
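
The loading table printed above also tells us how much variance each component carries. With standardized variables, these coordinates are correlations between variables and components, so the sum of squared entries in a column gives that component's eigenvalue. The sketch below retypes the table by hand (values as printed) and does the arithmetic in base R; on the fitted object, `result$eig` holds the same information.

```r
# Loadings copied from the printed result$var$coord table
coord <- matrix(
  c(0.99983872, -0.012464946, -1.243063e-02,  1.875914e-03,  3.020986e-03,
    0.99990507, -0.009549432, -2.036661e-03, -9.166785e-03, -3.237898e-03,
    0.99983071, -0.015422098,  2.812081e-03,  9.081251e-03, -3.215054e-03,
    0.99984386, -0.012705769,  1.165445e-02, -1.792359e-03,  3.432100e-03,
    0.05019802,  0.998739283,  1.707238e-05,  5.319126e-05,  7.618834e-07),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Open", "High", "Low", "Close", "Volume"),
                  paste0("Dim.", 1:5))
)

# Eigenvalue of each component = column sum of squared loadings
eigenvalues <- colSums(coord^2)

# Percentage of total variance (five standardized variables, so the total is 5)
pct_variance <- 100 * eigenvalues / nrow(coord)
print(round(pct_variance, 2))
print(round(cumsum(pct_variance), 2))
```

Under these assumptions the first component comes out at roughly 80% of the variance and the second at roughly 20%, so the first two components capture nearly everything.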

Interpretation of the PCA components

  • Component 1: Dominated by the four price variables (Open, High, Low, Close), so it captures the stock's overall price level, which can be read as a rough proxy for sentiment toward the stock.
  • Component 2: Driven almost entirely by trading volume, indicating fluctuations in trading activity.
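
This reading of the two components can be checked numerically. A variable's contribution to a component is its squared loading divided by the sum of squared loadings on that component, which is how FactoMineR defines `result$var$contrib`. The sketch below retypes the first two columns of the loading table by hand and computes the contributions in base R.

```r
# First two columns of the printed loading table (retyped by hand)
coord <- cbind(
  Dim.1 = c(Open = 0.99983872, High = 0.99990507, Low = 0.99983071,
            Close = 0.99984386, Volume = 0.05019802),
  Dim.2 = c(-0.012464946, -0.009549432, -0.015422098,
            -0.012705769, 0.998739283)
)

# Percentage contribution of each variable to each component
contrib <- 100 * sweep(coord^2, 2, colSums(coord^2), "/")
print(round(contrib, 2))
```

The four price variables share the first component almost equally, while Volume accounts for nearly all of the second.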

Visualizing the PCA results

To see how the individual observations (one per date) are distributed in a lower-dimensional space, we plot them on the first two principal components.

fviz_pca_ind(result, label = "none", geom = "point") +
  ggtitle("Take-Two Interactive stock data: PCA individuals") +
  theme_minimal()

Biplot for deeper analysis
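
A biplot overlays the observations and the variable arrows in one figure. As a minimal sketch, the code below draws one with base R's `biplot()` on synthetic data (the `toy` data frame is made up for illustration). With the article's fitted object, factoextra's `fviz_pca_biplot(result)` would be the closer equivalent, though it is not run here.

```r
set.seed(7)
n <- 100

# Synthetic stand-in for price and volume columns (not TTWO data)
close <- 100 + cumsum(rnorm(n))
toy <- data.frame(
  Open   = close + rnorm(n, sd = 0.3),
  Close  = close,
  Volume = rnorm(n, mean = 1e6, sd = 1e5)
)

fit <- prcomp(toy, scale. = TRUE)

# Points are observations; arrows show how each variable loads on PC1 and PC2
biplot(fit, cex = 0.6, main = "Biplot of synthetic price and volume data")
```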

Time series projection of PCA components

We now look at how the first two principal components change over time by plotting each of them as a time series.

take_two_data$Date <- as.Date(take_two_data$Date)

# Keep only the dates of rows that entered the PCA (rows with NAs were dropped)
keep <- complete.cases(take_two_data[, c("Open", "High", "Low", "Close", "Volume")])
pc_ts <- data.frame(Date = take_two_data$Date[keep],
                    PC1 = result$ind$coord[, 1],
                    PC2 = result$ind$coord[, 2])

ggplot(pc_ts, aes(x = Date, y = PC1)) +
  geom_line(color = "blue") +
  ggtitle("Time-Series Projection of PC1") +
  xlab("Date") + ylab("PC1 Value") +
  theme_minimal()

ggplot(pc_ts, aes(x = Date, y = PC2)) +
  geom_line(color = "blue") +
  ggtitle("Time-Series Projection of PC2") +
  xlab("Date") + ylab("PC2 Value") +
  theme_minimal()
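
One practical detail when pairing dates with component scores: if rows were dropped for missing values before the PCA, the dates have to be filtered the same way, otherwise the two columns have different lengths. Here is a toy sketch of that alignment, using hypothetical data and base R's `prcomp` in place of `PCA()`:

```r
# Hypothetical daily data with one missing Close value
toy <- data.frame(
  Date   = as.Date("2024-01-01") + 0:5,
  Close  = c(100, 102, NA, 105, 103, 108),
  Volume = c(2.0e6, 1.8e6, 2.2e6, 3.0e6, 1.5e6, 2.6e6)
)

# Flag the rows that are complete in the PCA input columns
keep <- complete.cases(toy[, c("Close", "Volume")])

fit <- prcomp(toy[keep, c("Close", "Volume")], scale. = TRUE)

# Use the same filter on the dates so each score keeps its own date
pc_ts <- data.frame(Date = toy$Date[keep], PC1 = fit$x[, 1])
print(pc_ts)
```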

These plots help traders identify trends in the stock's price and trading activity over time using PCA.

To sum up, this study applied PCA to analyze Take-Two Interactive's stock price movements over time. The key findings are:

  • PCA extracted a small number of key components, simplifying complex stock market data.
  • The scree plot helped determine how many components to retain.
  • The PCA loadings revealed which stock attributes drive the variance.
  • The time-series projection of PC1 highlighted stock trends over time.
  • The PCA plots visualized the individual observations alongside the feature contributions.