1. Brief Info

This R Markdown document accompanies the Master's thesis titled ANALYZING STOCK MARKET BEHAVIOR AND PRICE PREDICTIONS ACROSS GLOBAL MARKETS. It covers the following steps of the analysis: data selection, data preparation, and exploratory data analysis. The subsequent steps, modeling and results, are not included in this file; they are implemented in a Python environment.

2. Library Installation

# Install packages if you haven't already
# install.packages("quantmod")
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("tidyverse")
# install.packages("tidyquant")
# install.packages("readxl")

# Load packages into R environment
library(quantmod)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(tidyquant)
library(readxl)

3. Data Import (from stockanalysis.com, retrieved January 2025)

stocks_details <- read_excel("C:/Users/User/Desktop/Master Thesis/All Stocks.xlsx")
#View(stocks_details)
colnames(stocks_details)
## [1] "No"                 "Symbol"             "Company Name"      
## [4] "Sector"             "Market Cap"         "% Change"          
## [7] "Volume"             "Revenue"            "Market Cap Numeric"

View of the data

head(stocks_details)
## # A tibble: 6 × 9
##      No Symbol `Company Name`      Sector `Market Cap` `% Change` Volume Revenue
##   <dbl> <chr>  <chr>               <chr>  <chr>        <chr>      <chr>  <chr>  
## 1     1 AAPL   Apple Inc.          Techn… 3,698.01B    -0.0014    10694… 391.04B
## 2     1 GE     General Electric C… Indus… 187.38B      0.0039     24289… 69.95B 
## 3     1 LLY    Eli Lilly and Comp… Healt… 707.21B      0.0245     41386… 40.86B 
## 4     1 BRK.B  Berkshire Hathaway… Finan… 973.14B      0.0087     32551  369.89B
## 5     1 WMT    Walmart Inc.        Consu… 730.40B      -0.0056    27341… 673.82B
## 6     1 XOM    Exxon Mobil Corpor… Energ… 477.57B      -0.0135    13925… 343.82B
## # ℹ 1 more variable: `Market Cap Numeric` <dbl>
nrow(stocks_details)
## [1] 3225
ncol(stocks_details)
## [1] 9

Initially, 3,225 companies are taken from stockanalysis.com.
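
The dataset carries the market capitalization twice: as text (`Market Cap`, e.g. "3,698.01B") and as a number (`Market Cap Numeric`). The source does not show how the numeric column was produced. The sketch below is one possible conversion, written in base R; the function name `parse_market_cap` and the set of suffix letters are assumptions made for illustration.

```r
# Hypothetical helper: convert strings such as "3,698.01B" to plain numbers.
parse_market_cap <- function(x) {
  # Assumed suffix multipliers; extend if the data uses other letters
  multipliers <- c(T = 1e12, B = 1e9, M = 1e6, K = 1e3)

  # Drop surrounding whitespace and thousands separators
  cleaned <- gsub(",", "", trimws(x), fixed = TRUE)

  # The last character decides the scale, if it is a known suffix
  suffix <- toupper(substr(cleaned, nchar(cleaned), nchar(cleaned)))
  has_suffix <- suffix %in% names(multipliers)

  number <- as.numeric(ifelse(has_suffix,
                              substr(cleaned, 1, nchar(cleaned) - 1),
                              cleaned))
  scale <- ifelse(has_suffix, multipliers[suffix], 1)
  unname(number * scale)
}

caps <- parse_market_cap(c("3,698.01B", "187.38B", "950.5M"))
print(caps)
```

Comparing such a conversion against the existing `Market Cap Numeric` column would be a cheap way to confirm that the two columns agree.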

Number of stocks per sector

stocksbysector <- stocks_details %>% 
  group_by(Sector) %>% 
  summarize(count = n(), .groups = "drop")

stocksbysector
## # A tibble: 6 × 2
##   Sector                 count
##   <chr>                  <int>
## 1 Consumer Staple Sector   245
## 2 Energy Sector            250
## 3 Financial Sector         603
## 4 Healthcare Sector       1164
## 5 Industrials Sector       453
## 6 Technology Sector        510

Nine symbols are removed because their data cannot be fetched from Yahoo Finance.

# Remove rows with problematic symbols
# They must be removed because they are not available on Yahoo Finance

stocks_details <- stocks_details %>%
  filter(!Symbol %in% c("BRK.B", "PBR.A", "AGM.A", "VG", "AKO.A", "MKC.V","BF.A","DRDB","HEI.A"))

# Print the updated dataframe to confirm the removal
head(stocks_details)
## # A tibble: 6 × 9
##      No Symbol `Company Name`      Sector `Market Cap` `% Change` Volume Revenue
##   <dbl> <chr>  <chr>               <chr>  <chr>        <chr>      <chr>  <chr>  
## 1     1 AAPL   Apple Inc.          Techn… 3,698.01B    -0.0014    10694… 391.04B
## 2     1 GE     General Electric C… Indus… 187.38B      0.0039     24289… 69.95B 
## 3     1 LLY    Eli Lilly and Comp… Healt… 707.21B      0.0245     41386… 40.86B 
## 4     1 WMT    Walmart Inc.        Consu… 730.40B      -0.0056    27341… 673.82B
## 5     1 XOM    Exxon Mobil Corpor… Energ… 477.57B      -0.0135    13925… 343.82B
## 6     2 NVDA   NVIDIA Corporation  Techn… 3,494.48B    -0.0451    16666… 113.27B
## # ℹ 1 more variable: `Market Cap Numeric` <dbl>
# Optionally, confirm the removal by checking that none of these symbols remain
cat("Checking for presence of problematic symbols in the dataset:\n")
## Checking for presence of problematic symbols in the dataset:
if (any(stocks_details$Symbol %in% c("BRK.B", "PBR.A", "AGM.A", "VG", "AKO.A", "MKC.V", "BF.A", "DRDB", "HEI.A"))) {
  cat("Problematic Symbols are still present.\n")
} else {
  cat("Problematic Symbols have been successfully removed.\n")
}
## Problematic Symbols have been successfully removed.
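
Several of the removed symbols are share classes written with a dot (for example BRK.B). Yahoo Finance generally writes share classes with a hyphen instead (BRK-B), so an alternative to dropping them is to translate the symbol before fetching. This is not what the analysis above does; it is a minimal sketch, and the helper name `to_yahoo_symbol` is hypothetical. It has not been verified that every removed symbol would resolve this way, and symbols without a dot (VG, DRDB) are not affected by it.

```r
# Hypothetical helper: replace the dot in share-class tickers with a hyphen
to_yahoo_symbol <- function(symbol) {
  gsub(".", "-", symbol, fixed = TRUE)
}

yahoo_symbols <- to_yahoo_symbol(c("BRK.B", "BF.A", "AAPL"))
print(yahoo_symbols)
```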

4. Defining Market Leaders

### ADDING MARKET LEADER VARIABLE TO THE DATASET BASED ON MEGA and LARGE CAP CRITERION

# Define a new column 'MarketLeader' to indicate whether a stock is a Mega Cap (market cap above 200 billion)
stocks_details <- stocks_details %>%
  mutate(MarketLeader = `Market Cap Numeric` > 200e9)  # 200 billion represented in scientific notation

# Print the updated dataframe to check the new column
head(stocks_details)
## # A tibble: 6 × 10
##      No Symbol `Company Name`      Sector `Market Cap` `% Change` Volume Revenue
##   <dbl> <chr>  <chr>               <chr>  <chr>        <chr>      <chr>  <chr>  
## 1     1 AAPL   Apple Inc.          Techn… 3,698.01B    -0.0014    10694… 391.04B
## 2     1 GE     General Electric C… Indus… 187.38B      0.0039     24289… 69.95B 
## 3     1 LLY    Eli Lilly and Comp… Healt… 707.21B      0.0245     41386… 40.86B 
## 4     1 WMT    Walmart Inc.        Consu… 730.40B      -0.0056    27341… 673.82B
## 5     1 XOM    Exxon Mobil Corpor… Energ… 477.57B      -0.0135    13925… 343.82B
## 6     2 NVDA   NVIDIA Corporation  Techn… 3,494.48B    -0.0451    16666… 113.27B
## # ℹ 2 more variables: `Market Cap Numeric` <dbl>, MarketLeader <lgl>
#view(stocks_details)


### COUNTING MARKET LEADERS IN EACH SECTOR

# Calculate the number of market leaders in each sector
market_leader_count <- stocks_details %>%
  group_by(Sector) %>%
  summarise(NumberOfMarketLeaders = sum(MarketLeader, na.rm = TRUE),
            .groups = 'drop')  # This line ensures that the grouping is dropped after summarise

# Print the table of market leaders count by sector
print(market_leader_count)
## # A tibble: 6 × 2
##   Sector                 NumberOfMarketLeaders
##   <chr>                                  <int>
## 1 Consumer Staple Sector                     5
## 2 Energy Sector                              3
## 3 Financial Sector                           8
## 4 Healthcare Sector                         12
## 5 Industrials Sector                         0
## 6 Technology Sector                         14

Lowering the threshold from 200B to 100B.

# Lower the threshold from 200B to 100B

# Redefine 'MarketLeader' using the 100 billion cutoff
stocks_details <- stocks_details %>%
  mutate(MarketLeader = `Market Cap Numeric` > 100e9)  # 100 billion represented in scientific notation

### COUNTING MARKET LEADERS IN EACH SECTOR

# Calculate the number of market leaders in each sector
market_leader_count <- stocks_details %>%
  group_by(Sector) %>%
  summarise(NumberOfMarketLeaders = sum(MarketLeader, na.rm = TRUE),
            .groups = 'drop')  # This line ensures that the grouping is dropped after summarise

# Print the table of market leaders count by sector
print(market_leader_count)
## # A tibble: 6 × 2
##   Sector                 NumberOfMarketLeaders
##   <chr>                                  <int>
## 1 Consumer Staple Sector                     7
## 2 Energy Sector                              5
## 3 Financial Sector                          23
## 4 Healthcare Sector                         25
## 5 Industrials Sector                        10
## 6 Technology Sector                         33

Lowering the threshold from 100B to 50B.

# Lower the threshold from 100B to 50B

# Redefine 'MarketLeader' using the 50 billion cutoff
stocks_details <- stocks_details %>%
  mutate(MarketLeader = `Market Cap Numeric` > 50e9)  # 50 billion represented in scientific notation

### COUNTING MARKET LEADERS IN EACH SECTOR

# Calculate the number of market leaders in each sector
market_leader_count <- stocks_details %>%
  group_by(Sector) %>%
  summarise(NumberOfMarketLeaders = sum(MarketLeader, na.rm = TRUE),
            .groups = 'drop')  # This line ensures that the grouping is dropped after summarise

# Print the table of market leaders count by sector
print(market_leader_count)
## # A tibble: 6 × 2
##   Sector                 NumberOfMarketLeaders
##   <chr>                                  <int>
## 1 Consumer Staple Sector                    15
## 2 Energy Sector                             21
## 3 Financial Sector                          52
## 4 Healthcare Sector                         44
## 5 Industrials Sector                        33
## 6 Technology Sector                         53

Lowering the threshold from 50B to 10B.

# Lower the threshold from 50B to 10B

# Redefine 'MarketLeader' using the 10 billion cutoff
stocks_details <- stocks_details %>%
  mutate(MarketLeader = `Market Cap Numeric` > 10e9)  # 10 billion represented in scientific notation

### COUNTING MARKET LEADERS IN EACH SECTOR

# Calculate the number of market leaders in each sector
market_leader_count <- stocks_details %>%
  group_by(Sector) %>%
  summarise(NumberOfMarketLeaders = sum(MarketLeader, na.rm = TRUE),
            .groups = 'drop')  # This line ensures that the grouping is dropped after summarise

# Print the table of market leaders count by sector
print(market_leader_count)
## # A tibble: 6 × 2
##   Sector                 NumberOfMarketLeaders
##   <chr>                                  <int>
## 1 Consumer Staple Sector                    53
## 2 Energy Sector                             50
## 3 Financial Sector                         134
## 4 Healthcare Sector                         85
## 5 Industrials Sector                       114
## 6 Technology Sector                        150

The $10 billion cutoff is appropriate for this analysis and is consistent with the literature: companies with a market capitalization of $10 billion and above fall into the large-cap and mega-cap categories.
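
The four cutoffs tried above (200B, 100B, 50B, 10B) can also be compared in a single pass instead of redefining the column each time. The following is a minimal base R sketch on made-up toy data; the data frame `toy`, its column `MarketCapNumeric`, and the sector labels are illustrative and not taken from the dataset.

```r
# Toy data standing in for stocks_details (values are invented)
toy <- data.frame(
  Sector = c("Tech", "Tech", "Tech", "Energy", "Energy"),
  MarketCapNumeric = c(3.6e12, 1.2e11, 8e9, 4.7e11, 2e10)
)

# Cutoffs examined in the text
thresholds <- c(`200B` = 200e9, `100B` = 100e9, `50B` = 50e9, `10B` = 10e9)

# For each cutoff, count the stocks per sector strictly above it
leader_counts <- sapply(thresholds, function(cutoff) {
  tapply(toy$MarketCapNumeric > cutoff, toy$Sector, sum, na.rm = TRUE)
})

# Rows are sectors, columns are cutoffs
print(leader_counts)
```

Laying the counts side by side makes it easier to see at which cutoff every sector has enough market leaders.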

### COUNTING MARKET LEADERS AND NON-MARKET LEADERS IN EACH SECTOR

# Calculate the number of market leaders and non-market leaders in each sector
sector_leader_counts <- stocks_details %>%
  group_by(Sector) %>%
  summarise(NumberOfMarketLeaders = sum(MarketLeader, na.rm = TRUE),
            NumberOfNonMarketLeaders = sum(!MarketLeader, na.rm = TRUE),
            .groups = 'drop')  # This line ensures that the grouping is dropped after summarise

# Print the table of market leaders and non-market leaders count by sector
print(sector_leader_counts)
## # A tibble: 6 × 3
##   Sector                 NumberOfMarketLeaders NumberOfNonMarketLeaders
##   <chr>                                  <int>                    <int>
## 1 Consumer Staple Sector                    53                      189
## 2 Energy Sector                             50                      198
## 3 Financial Sector                         134                      466
## 4 Healthcare Sector                         85                     1079
## 5 Industrials Sector                       114                      338
## 6 Technology Sector                        150                      360
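
As a quick consistency check, the counts in the table above can be summed and compared with the initial 3,225 rows. The numbers below are copied from the printed output; the variable names are illustrative. The check assumes that no row has a missing `Market Cap Numeric`, because such rows would be dropped from both counts by `na.rm = TRUE`.

```r
sector <- c("Consumer Staple", "Energy", "Financial",
            "Healthcare", "Industrials", "Technology")
leaders     <- c(53, 50, 134, 85, 114, 150)
non_leaders <- c(189, 198, 466, 1079, 338, 360)

# Rows remaining after the filter step
total_after_filter <- sum(leaders) + sum(non_leaders)
print(total_after_filter)  # 3216

# Rows removed relative to the initial 3225
removed <- 3225 - total_after_filter
print(removed)             # 9

# Share of market leaders within each sector, in percent
leader_share <- round(100 * leaders / (leaders + non_leaders), 1)
names(leader_share) <- sector
print(leader_share)
```

The difference of nine rows matches the nine symbols listed in the `filter()` call.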

Market leaders are now defined. As a next step, the volatility levels will be defined…

————————————————————————————-

5. Importing all defined stocks from Yahoo Finance

# Load necessary libraries
library(quantmod)
library(lubridate)

# Ensure all stock symbols from stocks_details are unique and non-NA
symbols <- unique(na.omit(stocks_details$Symbol))

# Initialize a list to store the stock data
symbol_data_list <- list()

# Set the date range explicitly: from 2000-01-01 up to today
# (unused alternative: a rolling 25-year window ending today)
#end_date <- Sys.Date()
#start_date <- end_date %m-% years(25)
start_date <- as.Date("2000-01-01")
end_date <- Sys.Date()



# Loop through each symbol and fetch the data
for (symbol in symbols) {
  # Print the symbol being fetched for tracking progress
  cat("Fetching data for:", symbol, "\n")
  
  # Try to fetch the data, continue even if some symbols fail
  tryCatch({
    # Fetch stock data from Yahoo Finance
    stock_data <- getSymbols(symbol, src = 'yahoo', from = start_date, to = end_date, auto.assign = FALSE)
    # Store the data in the list with the symbol as the key
    symbol_data_list[[symbol]] <- stock_data
  }, error = function(e) {
    cat("Error fetching data for symbol:", symbol, "- Error message:", e$message, "\n")
  })
}

Data is requested for all defined stocks; some symbols could not be fetched because of errors.
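
To see exactly which symbols failed, the requested symbols can be compared with the names stored in the result list. The sketch below shows the pattern without network access: `mock_fetch` is a hypothetical stand-in for `getSymbols()` that fails for any symbol containing a dot, and the three symbols are only examples.

```r
# Hypothetical stand-in for getSymbols(); fails for symbols containing a dot
mock_fetch <- function(symbol) {
  if (grepl(".", symbol, fixed = TRUE)) {
    stop("symbol not found: ", symbol)
  }
  data.frame(Close = c(1, 2, 3))
}

symbols <- c("AAPL", "BRK.B", "GE")
fetched <- list()

for (symbol in symbols) {
  # Return NULL on error so the loop continues with the next symbol
  result <- tryCatch(mock_fetch(symbol), error = function(e) NULL)
  if (!is.null(result)) {
    fetched[[symbol]] <- result
  }
}

# Symbols that were requested but never stored
failed <- setdiff(symbols, names(fetched))
print(failed)
```

Applied to the real loop, `setdiff(symbols, names(symbol_data_list))` would list the symbols that are missing from the fetched data.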

# Check the contents of one of the symbols to verify (pick a symbol known to be in the list)
if(length(symbol_data_list) > 0) {
  print(head(symbol_data_list[[names(symbol_data_list)[1]]]))
} else {
  cat("No data fetched. Check symbol validity or connection settings.")
}
##            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
## 2000-01-03  0.936384  1.004464 0.907924   0.999442   535796800     0.8421506
## 2000-01-04  0.966518  0.987723 0.903460   0.915179   512377600     0.7711489
## 2000-01-05  0.926339  0.987165 0.919643   0.928571   778321600     0.7824333
## 2000-01-06  0.947545  0.955357 0.848214   0.848214   767972800     0.7147226
## 2000-01-07  0.861607  0.901786 0.852679   0.888393   460734400     0.7485783
## 2000-01-10  0.910714  0.912946 0.845982   0.872768   505064000     0.7354123
# Check and print the number of stocks fetched
cat("Number of stocks successfully fetched:", length(symbol_data_list), "\n")
## Number of stocks successfully fetched: 3185
# Optionally, you can also list the symbols for which data has been fetched
if(length(symbol_data_list) > 0) {
  cat("Symbols fetched successfully include:\n")
  print(names(symbol_data_list))
} else {
  cat("No stocks have been successfully fetched.")
}
## Symbols fetched successfully include:
##    [1] "AAPL"  "GE"    "LLY"   "WMT"   "XOM"   "NVDA"  "CAT"   "UNH"   "JPM"  
##   [10] "COST"  "CVX"   "MSFT"  "RTX"   "NVO"   "V"     "PG"    "SHEL"  "AVGO" 
##   [19] "HON"   "JNJ"   "MA"    "KO"    "TTE"   "TSM"   "UNP"   "ABBV"  "BAC"  
##   [28] "PEP"   "COP"   "ORCL"  "ETN"   "MRK"   "WFC"   "PM"    "ENB"   "CRM"  
##   [37] "BA"    "TMO"   "AXP"   "UL"    "ASML"  "DE"    "ABT"   "BX"    "BUD"  
##   [46] "PBR"   "SAP"   "LMT"   "AZN"   "MS"    "MO"    "BP"    "CSCO"  "UPS"  
##   [55] "ISRG"  "GS"    "BTI"   "EOG"   "ACN"   "TT"    "NVS"   "HSBC"  "MDLZ" 
##   [64] "EPD"   "NOW"   "RELX"  "DHR"   "RY"    "CL"    "WMB"   "IBM"   "PH"   
##   [73] "SYK"   "SPGI"  "DEO"   "ET"    "AMD"   "WM"    "BSX"   "BLK"   "TGT"  
##   [82] "KMI"   "ADBE"  "CTAS"  "AMGN"  "HDB"   "MNST"  "CNQ"   "QCOM"  "ITW"  
##   [91] "PFE"   "MUFG"  "KR"    "EQNR"  "TXN"   "TRI"   "SNY"   "PGR"   "KMB"  
##  [100] "OKE"   "INTU"  "MMM"   "BMY"   "C"     "KDP"   "SLB"   "PLTR"  "GD"   
##  [109] "GILD"  "SCHW"  "STZ"   "ARM"   "CP"    "MDT"   "KKR"   "KVUE"  "MPLX" 
##  [118] "ANET"  "TDG"   "VRTX"  "CB"    "SYY"   "LNG"   "AMAT"  "EMR"   "ELV"  
##  [127] "IBN"   "KHC"   "FANG"  "SHOP"  "FDX"   "CI"    "MMC"   "CCEP"  "PSX"  
##  [136] "UBER"  "NOC"   "HCA"   "UBS"   "GIS"   "TRP"   "SONY"  "CNI"   "ZTS"  
##  [145] "SMFG"  "HSY"   "SU"    "ADP"   "RSG"   "MCK"   "TD"    "ABEV"  "MPC"  
##  [154] "MU"    "CSX"   "REGN"  "APO"   "FMX"   "OXY"   "FI"    "CARR"  "BDX"  
##  [163] "PYPL"  "K"     "TRGP"  "PANW"  "PCAR"  "GSK"   "BN"    "EL"    "BKR"  
##  [172] "ADI"   "CPRT"  "CVS"   "MCO"   "CHD"   "HES"   "APP"   "NSC"   "COR"  
##  [181] "ICE"   "ADM"   "VLO"   "MRVL"  "JCI"   "ALC"   "CME"   "TSN"   "E"    
##  [190] "LRCX"  "GWW"   "A"     "PNC"   "MKC"   "IMO"   "KLAC"  "CMI"   "HLN"  
##  [199] "USB"   "EQT"   "INFY"  "VRT"   "TAK"   "AON"   "CLX"   "TPL"   "CRWD" 
##  [208] "PWR"   "EW"    "SAN"   "CQP"   "INTC"  "HWM"   "GEHC"  "AJG"   "HRL"  
##  [217] "WDS"   "MSTR"  "URI"   "ARGX"  "BMO"   "DG"    "CVE"   "DELL"  "WCN"  
##  [226] "IQV"   "COF"   "DLTR"  "EXE"   "APH"   "AXON"  "RMD"   "COIN"  "KOF"  
##  [235] "HAL"   "CDNS"  "AME"   "VEEV"  "BNS"   "USFD"  "CCJ"   "MSI"   "FAST" 
##  [244] "ALNY"  "MFG"   "SFM"   "DVN"   "SNPS"  "DAL"   "IDXX"  "CM"    "PFGC" 
##  [253] "PBA"   "FTNT"  "LHX"   "HUM"   "TFC"   "CAG"   "CTRA"  "WDAY"  "VRSK" 
##  [262] "DXCM"  "BBVA"  "CPB"   "TS"    "TEAM"  "ODFL"  "CNC"   "AFL"   "PRMB" 
##  [271] "YPF"   "ADSK"  "OTIS"  "CAH"   "MET"   "BJ"    "EC"    "TTD"   "IR"   
##  [280] "BNTX"  "BK"    "SJM"   "WES"   "NXPI"  "FERG"  "MTD"   "ARES"  "TAP"  
##  [289] "PAA"   "WAB"   "PHG"   "TRV"   "FTI"   "ROP"   "UAL"   "WST"   "MFC"  
##  [298] "AR"    "COKE"  "SNOW"  "HEI"   "TEVA"  "NU"    "OVV"   "ACI"   "PAYX" 
##  [307] "ROK"   "WAT"   "AMP"   "DTM"   "BG"    "DDOG"  "FER"   "ONC"   "ING"  
##  [316] "VNOM"  "PPC"   "FICO"  "EFX"   "NTRA"  "ALL"   "KNTK"  "EDU"   "TEL"  
##  [325] "GPN"   "ZBH"   "BCS"   "RRC"   "BRBR"  "FIS"   "XYL"   "ILMN"  "ITUB" 
##  [334] "APA"   "LW"    "GLW"   "STE"   "MSCI"  "HESM"  "INGR"  "GRMN"  "DOV"  
##  [343] "BIIB"  "AIG"   "CHRD"  "ELF"   "NET"   "VLTO"  "LH"    "DFS"   "AM"   
##  [352] "FRPT"  "IT"    "HUBB"  "PODD"  "NDAQ"  "MTDR"  "CELH"  "CTSH"  "LII"  
##  [361] "COO"   "PRU"   "SUN"   "OLLI"  "HUBS"  "EME"   "RPRX"  "LYG"   "POST" 
##  [370] "WIT"   "RYAAY" "SMMT"  "NWG"   "DINO"  "BRFS"  "HPQ"   "LUV"   "ALGN" 
##  [379] "HOOD"  "NOV"   "COTY"  "MCHP"  "JBHT"  "MOH"   "ACGL"  "TGS"   "HIMS" 
##  [388] "MPWR"  "DGX"   "DB"    "LB"    "TAL"   "HPE"   "AER"   "BAX"   "OWL"  
##  [397] "NFG"   "DAR"   "ANSS"  "SNA"   "UTHR"  "SLF"   "NE"    "CALM"  "KEYS" 
##  [406] "WSO"   "ICLR"  "RJF"   "WFRD"  "LANC"  "ZS"    "GFL"   "HOLX"  "MTB"  
##  [415] "CHX"   "LOPE"  "ON"    "CSL"   "MRNA"  "HIG"   "VIST"  "LRN"   "GDDY" 
##  [424] "RBA"   "NBIX"  "TW"    "CRK"   "FLO"   "ERIC"  "BLDR"  "AVTR"  "WTW"  
##  [433] "CRC"   "NWL"   "FTV"   "BAH"   "RVTY"  "FCNCA" "WHD"   "IPAR"  "BR"   
##  [442] "J"     "FMS"   "FCNCO" "IEP"   "FIZZ"  "ZM"    "PNR"   "INSM"  "BRO"  
##  [451] "SM"    "GHC"   "GIB"   "FTAI"  "INCY"  "STT"   "MGY"   "SMPL"  "CDW"  
##  [460] "FIX"   "DVA"   "FITB"  "CIVI"  "ATGE"  "GFS"   "XPO"   "ITCI"  "SYF"  
##  [469] "AROC"  "TBBB"  "IOT"   "IEX"   "GMAB"  "TROW"  "GLNG"  "SAM"   "TYL"  
##  [478] "EXPD"  "VTRS"  "LPLA"  "MUR"   "JJSF"  "CPAY"  "MAS"   "SOLV"  "IX"   
##  [487] "CNX"   "RLX"   "NTAP"  "SYM"   "THC"   "HBAN"  "PAGP"  "PSMT"  "NOK"  
##  [496] "ZTO"   "GMED"  "TPG"   "NXE"   "LAUR"  "STM"   "RKLB"  "UHS"   "BAM"  
##  [505] "CRGY"  "NOMD"  "ASX"   "OC"    "TECH"  "CINF"  "NOG"   "TER"   "ACM"  
##  [514] "RDY"   "MKL"   "KGS"   "WDC"   "CNH"   "BMRN"  "WRB"   "RIG"   "CENT" 
##  [523] "PTC"   "TXT"   "PCVX"  "KB"    "LBRT"  "SPB"   "ALAB"  "GGG"   "SRPT" 
##  [532] "RF"    "FRO"   "TR"    "TDY"   "UHAL"  "SNN"   "RKT"   "ARLP"  "STRA" 
##  [541] "UI"    "CW"    "MEDP"  "ERIE"  "PTEN"  "CENTA" "TOST"  "POOL"  "DOCS" 
##  [550] "PUK"   "VRN"   "CCU"   "FSLR"  "ATR"   "IBKR"  "VAL"   "UTZ"   "SMCI" 
##  [559] "SWK"   "PEN"   "NTRS"  "HP"    "CHEF"  "PSTG"  "CLH"   "EXAS"  "CBOE" 
##  [568] "UEC"   "COCO"  "ZBRA"  "CHRW"  "WBA"   "CFG"   "GPOR"  "WMK"   "VRSN" 
##  [577] "RTO"   "QGEN"  "BBD"   "PBF"   "THS"   "CHKP"  "SAIA"  "CG"    "BSM"  
##  [586] "UNFI"  "AFRM"  "ESLT"  "BIO"   "PFG"   "STR"   "PRDO"  "LDOS"  "NDSN" 
##  [595] "EHC"   "L"     "DNUT"  "GRAB"  "ITT"   "HSIC"  "FDS"   "EE"    "VITL" 
##  [604] "STX"   "AAL"   "RGEN"  "TRU"   "TDW"   "GO"    "SSNC"  "NVT"   "HQY"  
##  [613] "CRBG"  "UGP"   "EPC"   "MDB"   "TFII"  "EXEL"  "NMR"   "USAC"  "KLG"  
##  [622] "KSPI"  "MTZ"   "MASI"  "KEY"   "OII"   "FDP"   "DOCU"  "ALLE"  "GKOS" 
##  [631] "SHG"   "CSAN"  "ANDE"  "TRMB"  "TTEK"  "BRKR"  "JEF"   "SDRL"  "UTI"  
##  [640] "JBL"   "WWD"   "CRL"   "SOFI"  "AESI"  "COUR"  "CYBR"  "BWXT"  "TFX"  
##  [649] "RYAN"  "STNG"  "HELE"  "TWLO"  "LECO"  "CHE"   "EG"    "BTU"   "AFYA" 
##  [658] "NTNX"  "RRX"   "ROIV"  "FNF"   "DKL"   "UVV"   "GEN"   "APG"   "TEM"  
##  [667] "EQH"   "BTE"   "FLEX"  "CNM"   "ENSG"  "BAP"   "SOC"   "UDMY"  "MANH" 
##  [676] "AOS"   "ASND"  "BSBR"  "NEXT"  "DOLE"  "UMC"   "ULS"   "RVMD"  "RGA"  
##  [685] "TRMD"  "IMKTA" "AZPN"  "AAON"  "JAZZ"  "ARCC"  "CMBT"  "JBSS"  "DT"   
##  [694] "ARMK"  "BPMC"  "MORN"  "BKV"   "TPB"   "ENTG"  "GNRC"  "MDGL"  "UNM"  
##  [703] "CLMT"  "AGRO"  "COHR"  "AIT"   "BBIO"  "EWBC"  "DNN"   "NGVC"  "AUR"  
##  [712] "PAC"   "HALO"  "RNR"   "CVI"   "AVO"   "FFIV"  "AYI"   "LNTH"  "CNA"  
##  [721] "HPK"   "STKL"  "SWKS"  "RBC"   "LEGN"  "GGAL"  "INSW"  "MGPI"  "BSY"  
##  [730] "BLD"   "WAY"   "HLI"   "SEI"   "DAO"   "OKTA"  "MLI"   "CORT"  "BCH"  
##  [739] "KOS"   "DDL"   "GWRE"  "CRS"   "MMSI"  "AFG"   "TALO"  "HLF"   "AKAM" 
##  [748] "WCC"   "BLCO"  "EVR"   "MNR"   "USNA"  "DUOL"  "WMS"   "CYTK"  "SF"   
##  [757] "WTTR"  "BRCC"  "EPAM"  "CR"    "ELAN"  "ALLY"  "DMLP"  "HNST"  "LOGI" 
##  [766] "STN"   "STVN"  "FUTU"  "WKC"   "SPTN"  "JNPR"  "KNX"   "NUVL"  "FHN"  
##  [775] "DHT"   "WEST"  "JKHY"  "FLR"   "GH"    "AIZ"   "GLP"   "BGS"   "CRDO" 
##  [784] "DRS"   "TLX"   "SEIC"  "VET"   "SENEB" "WIX"   "FBIN"  "GRFS"  "BEN"  
##  [793] "KRP"   "GOTU"  "RBRK"  "GTLS"  "INSP"  "KNSL"  "XPRO"  "SENEA" "CIEN" 
##  [802] "ALK"   "IONS"  "AEG"   "NRP"   "SNDL"  "CLS"   "LTM"   "OPCH"  "BNT"  
##  [811] "TNK"   "HAIN"  "PAYC"  "DCI"   "ALKS"  "WBS"   "HLX"   "FC"    "DAY"  
##  [820] "ATI"   "AXSM"  "GL"    "FLNG"  "LINC"  "MNDY"  "TTC"   "ITGR"  "PRI"  
##  [829] "LEU"   "LMNR"  "PCOR"  "ASR"   "RDNT"  "UWMC"  "RES"   "VLGEA" "KVYO" 
##  [838] "JOBY"  "NARI"  "WAL"   "GEL"   "CVGW"  "PCTY"  "FLS"   "VRNA"  "PNFP" 
##  [847] "EFXT"  "OTLY"  "IONQ"  "SARO"  "TGTX"  "BSAC"  "ACDC"  "CDXC"  "YMM"  
##  [856] "KBR"   "KRYS"  "CFR"   "VTLE"  "LND"   "NICE"  "CAE"   "ICUI"  "ORI"  
##  [865] "DK"    "APEI"  "ENPH"  "LPX"   "OGN"   "BMA"   "NVGS"  "EWCZ"  "SNX"  
##  [874] "HII"   "OSCR"  "WTFC"  "UUUU"  "NUS"   "ESTC"  "DLB"   "ACHC"  "MKTX" 
##  [883] "MRC"   "LWAY"  "GTLB"  "MIDD"  "SRRK"  "CBSH"  "LPG"   "ZVIA"  "MTSI" 
##  [892] "TREX"  "RARE"  "CMA"   "INVX"  "YSG"   "DSGX"  "ESAB"  "BHVN"  "HLNE" 
##  [901] "PUMP"  "ISPR"  "U"     "RHI"   "PRCT"  "ZION"  "VTOL"  "NATR"  "PSN"  
##  [910] "ERJ"   "BTSG"  "CIB"   "BORR"  "AFRI"  "ONTO"  "LOAR"  "ALVO"  "FRHC" 
##  [919] "SBR"   "MAMA"  "DOX"   "AGCO"  "XRAY"  "WF"    "PARR"  "NAMI"  "CFLT" 
##  [928] "FCN"   "SHC"   "IVZ"   "PDS"   "BYND"  "FOUR"  "WTS"   "VKTX"  "SNV"  
##  [937] "CLB"   "GLOB"  "SPXC"  "ADMA"  "AXS"   "GRNT"  "SKIL"  "ALTR"  "AZEK" 
##  [946] "PBH"   "SSB"   "DEC"   "ALCO"  "CACI"  "CWST"  "APLS"  "RLI"   "CAPL" 
##  [955] "WILC"  "DBX"   "R"     "CRSP"  "ONB"   "NESR"  "SKIN"  "APPF"  "NVST" 
##  [964] "PB"    "VTS"   "VSTA"  "TTAN"  "CRNX"  "BOKF"  "REPX"  "BILL"  "MSA"  
##  [973] "PTCT"  "STEP"  "TXO"   "CHGG"  "KD"    "ADT"   "HAE"   "BPOP"  "EU"   
##  [982] "FTLF"  "FN"    "WSC"   "RYTM"  "JHG"   "CLNE"  "PHH"   "PEGA"  "BECN" 
##  [991] "RNA"   "VOYA"  "NPKI"  "HFFG"  "VERX"  "ZWS"   "IRTC"  "JXN"   "NGL"  
## [1000] "ACU"   "INFA"  "VMI"   "ACLX"  "OMF"   "TK"    "CLEU"  "LSCC"  "AWI"  
## [1009] "VCYT"  "CADE"  "NBR"   "DSY"   "OSK"   "IMVT"  "PJT"   "NOA"   "QSG"  
## [1018] "MKSI"  "KEX"   "PRGO"  "MARA"  "NAT"   "BRLS"  "G"     "CSWI"  "TWST" 
## [1027] "FAF"   "PBT"   "COE"   "OTEX"  "LSTR"  "DNLI"  "XP"    "TEN"   "LGCY" 
## [1036] "PATH"  "SITE"  "WRBY"  "FSK"   "GPRK"  "STG"   "OLED"  "BE"    "ACAD" 
## [1045] "MTG"   "TTI"   "BRID"  "S"     "FSS"   "LFST"  "COOP"  "HNRG"  "PAVS" 
## [1054] "WEX"   "SMR"   "XENE"  "SLM"   "GFR"   "IH"    "EXLS"  "GXO"   "AMED" 
## [1063] "OBDC"  "CLCO"  "VFF"   "CCCS"  "GATX"  "CON"   "ESNT"  "EGY"   "DIT"  
## [1072] "HCP"   "AL"    "QDEL"  "COLB"  "SD"    "LSF"   "QRVO"  "AMTM"  "RXRX" 
## [1081] "CACC"  "OBE"   "MYND"  "SMTC"  "HRI"   "MLTX"  "GBCI"  "URG"   "BEDU" 
## [1090] "SPSC"  "GTES"  "KYMR"  "QFIN"  "SMC"   "BRFH"  "CVLT"  "HXL"   "FOLD" 
## [1099] "MC"    "SGU"   "GROV"  "ASTS"  "DY"    "ALHC"  "BBAR"  "PNRG"  "UG"   
## [1108] "AMKR"  "TKR"   "ZLAB"  "CRVL"  "RNGR"  "CLNN"  "SOUN"  "SNDR"  "NVCR" 
## [1117] "SIGI"  "NGS"   "GNS"   "OS"    "STRL"  "BHC"   "AMG"   "BRY"   "FARM" 
## [1126] "CWAN"  "ROAD"  "VCEL"  "HOMB"  "OIS"   "SHOT"  "QXO"   "ACHR"  "MRUS" 
## [1135] "UPST"  "UROY"  "ORIS"  "SATS"  "BBU"   "PDCO"  "THG"   "TBN"   "COOT" 
## [1144] "CGNX"  "AEIS"  "PRVA"  "VIRT"  "REI"   "SOWG"  "NVMI"  "ACA"   "CPRX" 
## [1153] "LNC"   "FTK"   "DTCK"  "NXT"   "MMS"   "AMRX"  "UMBF"  "GTE"   "AACG" 
## [1162] "BMI"   "AVAV"  "SWTX"  "DNB"   "EP"    "WVVI"  "RMBS"  "IESC"  "EWTX" 
## [1171] "FG"    "AMPY"  "NAII"  "WK"    "MATX"  "LIVN"  "FNB"   "WTI"   "MSS"  
## [1180] "LFUS"  "TNET"  "ARWR"  "MAIN"  "NC"    "MTEX"  "LITE"  "MSM"   "SGRY" 
## [1189] "PFSI"  "KGEI"  "RAY"   "ARW"   "EXPO"  "JANX"  "FFIN"  "FET"   "FEDU" 
## [1198] "LYFT"  "FELE"  "SEM"   "UBSI"  "DLNG"  "SANW"  "QTWO"  "KTOS"  "NAMS" 
## [1207] "VLY"   "SJT"   "JVA"   "TSEM"  "SKYW"  "NEOG"  "OZK"   "EPM"   "LOCL" 
## [1216] "SAIC"  "PRIM"  "AZTA"  "ACT"   "PROP"  "SDOT"  "VNT"   "KAI"   "CLOV" 
## [1225] "WTM"   "BROG"  "BOF"   "CRUS"  "RXO"   "TNDM"  "ESGR"  "PHX"   "EDTK" 
## [1234] "SRAD"  "CBZ"   "CGON"  "HWC"   "AMTX"  "YHC"   "NOVT"  "VRRM"  "ADUS" 
## [1243] "RDN"   "EPSN"  "RMCF"  "SITM"  "BCO"   "HCM"   "FCFS"  "MMLP"  "YQ"   
## [1252] "PONY"  "SPR"   "IBRX"  "PIPR"  "PTLE"  "NCRA"  "ACIW"  "GEO"   "MESO" 
## [1261] "AGO"   "GEOS"  "DDC"   "GDS"   "GVA"   "APGE"  "SFBS"  "DTI"   "EEIQ" 
## [1270] "RGTI"  "CPA"   "MIRM"  "CNS"   "ABVE"  "QLYS"  "UNF"   "PTGX"  "BGC"  
## [1279] "SLNG"  "VINE"  "VRNS"  "ENS"   "PACS"  "LAZ"   "SND"   "SBEV"  "NSIT" 
## [1288] "MDU"   "LMAT"  "BWIN"  "LSE"   "CTCX"  "IDCC"  "MIR"   "WGS"   "ABCB" 
## [1297] "KAVL"  "ITRI"  "NPO"   "LGND"  "KMPR"  "IMPP"  "WAFU"  "FRSH"  "ECG"  
## [1306] "CNTA"  "RIOT"  "KLXE"  "IMG"   "INTA"  "OMAB"  "HRMY"  "AB"    "VOC"  
## [1315] "RKDA"  "INGM"  "BRC"   "PINC"  "GBDC"  "NCSM"  "IBG"   "ASAN"  "CXT"  
## [1324] "VERA"  "AX"    "PED"   "TWG"   "BRZE"  "MWA"   "CERT"  "IBOC"  "RCON" 
## [1333] "AMBO"  "TENB"  "KFY"   "SUPN"  "VCTR"  "CRT"   "ATPC"  "CLBT"  "GFF"  
## [1342] "CNMD"  "FLG"   "USEG"  "GV"    "ALGM"  "REZI"  "SLNO"  "ASB"   "NINE" 
## [1351] "LXEH"  "AI"    "ATMU"  "BEAM"  "CNO"   "PVL"   "GSUN"  "BDC"   "CAAP" 
## [1360] "UFPT"  "GSHD"  "DWSN"  "EDBL"  "AVT"   "HAYW"  "IDYA"  "NNI"   "PRT"  
## [1369] "CASK"  "RELY"  "GMS"   "TMDX"  "BANF"  "NRT"   "BTTR"  "BOX"   "ABM"  
## [1378] "ARDT"  "UCB"   "PXS"   "EEFT"  "FA"    "OMCL"  "AUB"   "VIVK"  "AGRI" 
## [1387] "ZETA"  "ATKR"  "IART"  "MCY"   "INDO"  "AQB"   "ST"    "TEX"   "ATRC" 
## [1396] "TCBI"  "TOPS"  "TCTM"  "PLXS"  "CAR"   "AGIO"  "WU"    "CGBS"  "SNAX" 
## [1405] "PI"    "LUNR"  "NEO"   "FULT"  "BATL"  "SISI"  "SLAB"  "TRN"   "IOVA" 
## [1414] "TFSL"  "BANL"  "XXII"  "SANM"  "POWL"  "GDRX"  "EBC"   "CKX"   "FAMI" 
## [1423] "CAMT"  "ATS"   "PGNY"  "FIBK"  "MXC"   "GNLN"  "GBTG"  "HAFN"  "TARS" 
## [1432] "IFS"   "SKYQ"  "TANH"  "PYCR"  "NSP"   "BLTE"  "CATY"  "HUSA"  "STKH" 
## [1441] "CORZ"  "ENVX"  "WVE"   "APAM"  "BRN"   "PAY"   "PLUG"  "AKRO"  "SNEX" 
## [1450] "MTR"   "NCNO"  "MGRC"  "ASTH"  "FHB"   "BPT"   "RUM"   "HUBG"  "GERN" 
## [1459] "HGTY"  "MARPS" "ASGN"  "MAN"   "SDGR"  "CBU"   "EONR"  "YOU"   "JBLU" 
## [1468] "TXG"   "HTGC"  "TPET"  "WRD"   "AZZ"   "TVTX"  "CLSK"  "BLKB"  "MRCY" 
## [1477] "AMPH"  "WD"    "DXC"   "ENOV"  "TDOC"  "WSFS"  "CLVT"  "ZIM"   "IRON" 
## [1486] "FHI"   "PAYO"  "ENR"   "DVAX"  "BFH"   "FORM"  "MNKD"  "FBP"   "ALIT" 
## [1495] "CXW"   "IMCR"  "GNW"   "BL"    "SEB"   "PHR"   "CVBF"  "POWI"  "MYRG" 
## [1504] "ARQT"  "BUR"   "ALKT"  "NPWR"  "GPCR"  "BHF"   "ZI"    "HNI"   "FTRE" 
## [1513] "BKU"   "AGYS"  "SXI"   "CLDX"  "NMIH"  "FROG"  "ARCB"  "BCRX"  "PLMR" 
## [1522] "ESE"   "WERN"  "GLPG"  "BOH"   "DLO"   "EPAC"  "ATEC"  "PRK"   "LIF"  
## [1531] "VSTS"  "EVO"   "SFNC"  "DV"    "ICFI"  "AGL"   "INDB"  "IPGP"  "HI"   
## [1540] "INDV"  "BANC"  "AMBA"  "AIR"   "PRAX"  "WAFD"  "DOCN"  "ALG"   "AAPG" 
## [1549] "LMND"  "SYNA"  "AGX"   "NHC"   "LPL"   "XMTR"  "NVAX"  "ENVA"  "RNG"  
## [1558] "WOR"   "VIR"   "IREN"  "AVPT"  "CDLR"  "VRDN"  "FFBC"  "DIOD"  "HURN" 
## [1567] "INMD"  "TBBK"  "GRND"  "GBX"   "KNSA"  "TOWN"  "CNXC"  "VSEC"  "MRVI" 
## [1576] "AVAL"  "TDC"   "ULCC"  "NRIX"  "SPNT"  "BTDR"  "HLMN"  "DYN"   "PFS"  
## [1585] "ALRM"  "KMT"   "XNCR"  "HUT"   "FIVN"  "PCT"   "SPRY"  "PPBI"  "PRGS" 
## [1594] "CMPR"  "GRDN"  "WULF"  "OSIS"  "GOGL"  "AHCO"  "MRX"   "KLIC"  "ROCK" 
## [1603] "USPH"  "GCMG"  "RUN"   "PRG"   "RXST"  "FBK"   "PAR"   "ALGT"  "ARDX" 
## [1612] "BANR"  "APPN"  "BWLP"  "MDXG"  "FRME"  "TTMI"  "SBLK"  "BLFS"  "RNST" 
## [1621] "ODD"   "HEES"  "AORT"  "LION"  "STNE"  "EVEX"  "SYRE"  "SBCF"  "KC"   
## [1630] "REVG"  "OCUL"  "NBTB"  "RPD"   "CODI"  "ARVN"  "TRMK"  "LSPD"  "DSGR" 
## [1639] "DAWN"  "WSBC"  "ACLS"  "DAC"   "SNDX"  "LU"    "NATL"  "TNC"   "LQDA" 
## [1648] "AGM"   "BB"    "CMRE"  "MD"    "EFSC"  "SWI"   "VVX"   "STAA"  "SYBT" 
## [1657] "VSH"   "HLIO"  "RCUS"  "CALX"  "KRNT"  "CDNA"  "TFIN"  "FLYW"  "AMRC" 
## [1666] "HROW"  "TRUP"  "VICR"  "ATSG"  "ADPT"  "PWP"   "VIAV"  "BV"    "WEAV" 
## [1675] "TSLX"  "EXTR"  "CMPO"  "INVA"  "INTR"  "QUBT"  "TGI"   "MYGN"  "OFG"  
## [1684] "EVTC"  "SFL"   "EVH"   "FIHL"  "CXM"   "NMM"   "OMI"   "HG"    "QBTS" 
## [1693] "WLFC"  "AUPH"  "PSEC"  "AVDX"  "DNOW"  "ANIP"  "CIFR"  "APLD"  "TILE" 
## [1702] "PCRX"  "HTH"   "IBTA"  "APOG"  "NTLA"  "SUPV"  "VYX"   "BLBD"  "RCKT" 
## [1711] "LC"    "WNS"   "LZ"    "PLSE"  "SKWD"  "PAGS"  "PBI"   "EMBC"  "CASH" 
## [1720] "MQ"    "CDRE"  "TLRY"  "STC"   "PLUS"  "LNN"   "ELVN"  "FBNC"  "RAMP" 
## [1729] "DXPE"  "AMN"   "PAX"   "EVCM"  "NSSC"  "COLL"  "LOB"   "DAVA"  "TPC"  
## [1738] "REPL"  "GB"    "MRTN"  "BGM"   "FINV"  "ROG"   "PL"    "ABCL"  "LKFN" 
## [1747] "SIMO"  "HTZ"   "PHVS"  "CHCO"  "JAMF"  "NVEE"  "ZYME"  "BBUC"  "CNXN" 
## [1756] "CRAI"  "OPK"   "FCF"   "KN"    "EOSE"  "BKDT"  "NWBI"  "SPT"   "UP"   
## [1765] "BKD"   "MBIN"  "GDYN"  "NNE"   "HSTM"  "CLBK"  "SONO"  "AMSC"  "FNA"  
## [1774] "NBHC"  "UCTT"  "ARLO"  "GRAL"  "NTB"   "NABL"  "RDW"   "GYRE"  "TWFG" 
## [1783] "MXL"   "CECO"  "PNTG"  "NIC"   "NN"    "ULH"   "OCS"   "HMN"   "SEMR" 
## [1792] "CTOS"  "AVAH"  "CUBI"  "DFIN"  "ASPN"  "COGT"  "VRTS"  "BHE"   "NX"   
## [1801] "ESTA"  "PX"    "VRNT"  "BBSI"  "AVBP"  "STEL"  "VECO"  "CCEC"  "EOLS" 
## [1810] "SASR"  "UPBD"  "LMB"   "BCYC"  "HOPE"  "PD"    "JBI"   "AVXL"  "SEZL" 
## [1819] "DBD"   "CMCO"  "PAHC"  "NAVI"  "PLAB"  "KFRC"  "CVAC"  "GSBD"  "CTS"  
## [1828] "DLX"   "NUVB"  "STBA"  "MTTR"  "GRC"   "PRTA"  "VBTX"  "INFN"  "THR"  
## [1837] "ETNB"  "SRCE"  "HLIT"  "TRNS"  "TYRA"  "TCBK"  "NTCT"  "EH"    "BFLY" 
## [1846] "WT"    "TH"    "HCSG"  "WABC"  "ADEA"  "CRESY" "CBLL"  "BLX"   "AAOI" 
## [1855] "PRLB"  "SEPN"  "DCOM"  "MLNK"  "DCO"   "RLAY"  "QCRH"  "JKS"   "FWRD" 
## [1864] "AXGN"  "BUSE"  "TASK"  "GIC"   "RBCAA" "KARO"  "FBYD"  "IRMD"  "CET"  
## [1873] "SPNS"  "HSII"  "QURE"  "MFIC"  "VNET"  "VLRS"  "OPT"   "OCSL"  "HIMX" 
## [1882] "HY"    "AVDL"  "BY"    "CSGS"  "ERII"  "SANA"  "TIGR"  "AMPL"  "FIP"  
## [1891] "CRMD"  "EIG"   "DQ"    "HTLD"  "DH"    "NMFC"  "FSLY"  "SERV"  "CSTL" 
## [1900] "SAFT"  "SABR"  "SNCY"  "CTKB"  "HCI"   "FORTY" "BXC"   "ORIC"  "BHLB" 
## [1909] "DCBO"  "PSIX"  "CRON"  "CCB"   "ATEN"  "SKYH"  "OFIX"  "GABC"  "ENFN" 
## [1918] "SPLP"  "TECX"  "PFBC"  "VSAT"  "EVTL"  "AVNS"  "ROOT"  "COHU"  "MATW" 
## [1927] "SIBN"  "BCSF"  "OLO"   "GSL"   "DNA"   "ECPG"  "CRCT"  "ADSE"  "SLP"  
## [1936] "FSUN"  "NYAX"  "GLDD"  "LENZ"  "FBMS"  "SMWB"  "SWIM"  "CAPR"  "PEBO" 
## [1945] "RCAT"  "ASTE"  "SENS"  "SII"   "SCSC"  "WNC"   "VERV"  "IGIC"  "AOSL" 
## [1954] "MEG"   "IMTX"  "CSWC"  "DSP"   "MVST"  "PLRX"  "BOW"   "ICHR"  "CVLG" 
## [1963] "BVS"   "OCFC"  "INOD"  "ECO"   "KIDS"  "OBK"   "TIXT"  "ZIP"   "EBS"  
## [1972] "BRKL"  "PDFS"  "NVRI"  "BCAX"  "AMAL"  "COMM"  "NPK"   "AUTL"  "BBDC" 
## [1981] "VTEX"  "JELD"  "ABUS"  "BFC"   "DAVE"  "MAGN"  "DNTH"  "AMSF"  "XRX"  
## [1990] "ZJK"   "IMNM"  "TMP"   "BELFA" "PLPC"  "KURA"  "LX"    "VMEO"  "BYRN" 
## [1999] "PHAR"  "SBSI"  "DGII"  "GNK"   "CRGX"  "BRDG"  "EXOD"  "BLDP"  "CGEM" 
## [2008] "CTBI"  "PENG"  "RGR"   "LAB"   "PFLT"  "ARRY"  "RYI"   "AUNA"  "NCDL" 
## [2017] "SEDG"  "EVLV"  "EYPT"  "CGBD"  "SHLS"  "CASS"  "IRWD"  "AMTB"  "PSFE" 
## [2026] "ATRO"  "HUMA"  "PFC"   "BELFB" "FLX"   "CCRN"  "BHRB"  "PRO"   "SBC"  
## [2035] "VREX"  "SLRC"  "DMRC"  "EBF"   "SVRA"  "FUFU"  "BBAI"  "ASC"   "TMCI" 
## [2044] "GHLD"  "ACMR"  "WLDN"  "STOK"  "MTAL"  "WOLF"  "KODK"  "ACCD"  "CNOB" 
## [2053] "BLND"  "KELYB" "UPB"   "TRIN"  "ML"    "KELYA" "ERAS"  "FMBH"  "TUYA" 
## [2062] "IIIN"  "SRDX"  "BITF"  "INDI"  "ACCO"  "TALK"  "UVSP"  "RDWR"  "GHM"  
## [2071] "FLGT"  "HFWA"  "ETWO"  "AIRJ"  "AKBA"  "ATLC"  "CRNC"  "NWPX"  "TBPH" 
## [2080] "PRAA"  "CSIQ"  "KE"    "XERS"  "OSBC"  "PRTH"  "SWBI"  "SMLR"  "PRA"  
## [2089] "CINT"  "BWMN"  "ANAB"  "MCBS"  "BASE"  "EAF"   "MNMD"  "NBN"   "GCT"  
## [2098] "BOC"   "DCTH"  "TYG"   "HCKT"  "TWI"   "RAPP"  "EGBN"  "YEXT"  "TRC"  
## [2107] "INNV"  "CFFN"  "AIOT"  "OFLX"  "MXCT"  "AC"    "CRSR"  "ACTG"  "PROK" 
## [2116] "NBBK"  "RZLV"  "CYRX"  "DCGO"  "TCPC"  "IIIV"  "FREY"  "ALT"   "CPF"  
## [2125] "DJCO"  "RR"    "MLYS"  "TIPT"  "DAKT"  "SHYF"  "RNAC"  "FCBC"  "RSKD" 
## [2134] "AMPX"  "ANGO"  "CFB"   "CEVA"  "NL"    "CLPT"  "BFST"  "NTGR"  "LNZA" 
## [2143] "MGTX"  "ABL"   "MLAB"  "SB"    "OMER"  "FDUS"  "NVTS"  "PAMT"  "NNOX" 
## [2152] "EQBK"  "ADTN"  "BLDE"  "KROS"  "IBCP"  "BBCP"  "QTRX"  "SLQT"  "PGY"  
## [2161] "SPIR"  "PHAT"  "CCAP"  "OUST"  "MTRX"  "ARCT"  "MBWM"  "SKYT"  "LXFR" 
## [2170] "PACB"  "UFCS"  "PUBM"  "PKOH"  "SIGA"  "AACT"  "RPAY"  "QUAD"  "ABSI" 
## [2179] "ORRF"  "KULR"  "ASLE"  "GHRS"  "GSBC"  "OSPN"  "MEC"   "TRDA"  "HBNC" 
## [2188] "IMOS"  "TITN"  "YI"    "NOAH"  "SSYS"  "SATL"  "MREO"  "HAFC"  "YALA" 
## [2197] "CIX"   "CYH"   "HBT"   "CTLP"  "MTW"   "TRML"  "LPRO"  "IMXI"  "RLGT" 
## [2206] "ORGO"  "OPY"   "ITRN"  "ARQ"   "OLMA"  "EZPW"  "CGNT"  "RGP"   "ANNX" 
## [2215] "MCB"   "CNDT"  "FSTR"  "NPCE"  "AMRK"  "CAN"   "TATT"  "PSNL"  "GLAD" 
## [2224] "MGIC"  "PKE"   "SAGE"  "VINP"  "NOVA"  "ORN"   "URGN"  "SMBC"  "HKD"  
## [2233] "NVX"   "CVRX"  "FSBC"  "CLMB"  "FORR"  "SNDA"  "INV"   "LASR"  "MG"   
## [2242] "ETON"  "ESQ"   "LYTS"  "AMBI"  "ZVRA"  "HIPO"  "AEHR"  "FCEL"  "TKNO" 
## [2251] "OPFI"  "RXT"   "TG"    "KALV"  "VEL"   "NNDM"  "PANL"  "CELC"  "CAC"  
## [2260] "BIGC"  "ESEA"  "NRC"   "TRST"  "FARO"  "FLYX"  "KOD"   "WRLD"  "PRCH" 
## [2269] "AZUL"  "PRTC"  "CCBG"  "CLFD"  "GENC"  "ATXS"  "CION"  "DDD"   "QSI"  
## [2278] "ALTI"  "RDVT"  "TCMD"  "WASH"  "MITK"  "KMDA"  "LDI"   "ARQQ"  "ESPR" 
## [2287] "MOFG"  "XPER"  "EHAB"  "ACIC"  "ATOM"  "PGEN"  "AMBC"  "CCSI"  "ALLO" 
## [2296] "HTBI"  "MTLS"  "ABVX"  "UVE"   "BAND"  "PRME"  "HTBK"  "AIP"   "CDXS" 
## [2305] "HIFS"  "CRNT"  "ZIMV"  "SPFI"  "UIS"   "NYXH"  "MPB"   "POET"  "AURA" 
## [2314] "BTBT"  "MEI"   "RIGL"  "PGC"   "ALNT"  "RGNX"  "NVEC"  "VALN"  "THFF" 
## [2323] "API"   "OABI"  "TRAK"  "TREE"  "LSAK"  "TERN"  "QD"    "ILLR"  "AHG"  
## [2332] "SMBK"  "KLTR"  "CERS"  "FISI"  "ARBE"  "ENGN"  "GDOT"  "OOMA"  "CMPX" 
## [2341] "BCAL"  "BKKT"  "ALMS"  "SHBI"  "GILT"  "MBX"   "FFWM"  "LGTY"  "LXRX" 
## [2350] "FMNB"  "IBEX"  "CMRX"  "DGICA" "HCAT"  "PFIS"  "EGHT"  "KRRO"  "CCNE" 
## [2359] "BKSY"  "TRVI"  "PSBD"  "EB"    "NMRA"  "DGICB" "FRGT"  "CRVS"  "GBLI" 
## [2368] "MVIS"  "GLUE"  "MSBI"  "SMRT"  "TBRG"  "HONE"  "BLZE"  "SPOK"  "NFBK" 
## [2377] "VUZI"  "VMD"   "GAIN"  "NRDY"  "TNGX"  "ALRS"  "VLN"   "SMTI"  "GLRE" 
## [2386] "TSSI"  "ZBIO"  "VALU"  "VPG"   "XOMA"  "EBTC"  "LAW"   "TSHA"  "PNNT" 
## [2395] "EXFY"  "VYGR"  "VBNK"  "AUDC"  "ALDX"  "BHB"   "PERF"  "IMMP"  "CBNK" 
## [2404] "ASUR"  "ATYR"  "GCBC"  "AEVA"  "OSUR"  "ANSC"  "IMMR"  "AIRS"  "AROW" 
## [2413] "ONTF"  "CADL"  "HIVE"  "DOMO"  "CTNM"  "EQV"   "PDYN"  "ELMD"  "AAM"  
## [2422] "ICG"   "DRTS"  "KRNY"  "TSAT"  "AQST"  "TCBX"  "ALLT"  "DSGN"  "WDH"  
## [2431] "ATGL"  "YMAB"  "ITIC"  "SVCO"  "ITOS"  "YRD"   "MAPS"  "SOPH"  "DHIL" 
## [2440] "RZLT"  "FFIC"  "HRTX"  "NRIM"  "BDMD"  "UNTY"  "INGN"  "NETD"  "VSTM" 
## [2449] "RWAY"  "CGC"   "BSVN"  "CATX"  "ASA"   "ACIU"  "BSRR"  "DMAC"  "CARE" 
## [2458] "SLN"   "WALD"  "ACRS"  "XYF"   "AVIR"  "GNTY"  "ZYXI"  "CCIX"  "CMPS" 
## [2467] "MFH"   "ATAI"  "BMRC"  "ELDN"  "SCM"   "MPLN"  "FMAO"  "AMRN"  "FBIZ" 
## [2476] "VNDA"  "HRZN"  "CCCC"  "GPAT"  "MNPR"  "ALF"   "BNTC"  "MBAV"  "RCEL" 
## [2485] "BWB"   "AMLX"  "MBI"   "THRD"  "CWBC"  "SGMO"  "HBCP"  "LUNG"  "RBB"  
## [2494] "NAUT"  "WTBA"  "HITI"  "RRBI"  "ANIK"  "USCB"  "TVGN"  "PMTS"  "LFCR" 
## [2503] "FRBA"  "NUTX"  "HRTG"  "ARAY"  "ACNB"  "ABEO"  "RM"    "LRMR"  "GIG"  
## [2512] "OCGN"  "NEWT"  "CRDF"  "SAR"   "DRUG"  "GOCO"  "FHTX"  "VACH"  "FULC" 
## [2521] "HYAC"  "NGNE"  "FSBW"  "PRQR"  "CCIR"  "LFMD"  "CIVB"  "SERA"  "NODK" 
## [2530] "CGEN"  "SSBK"  "FDMT"  "CUB"   "MDWD"  "BEAG"  "UTMD"  "SIMA"  "ACB"  
## [2539] "COFS"  "PROF"  "EHTH"  "GOSS"  "TPVG"  "IVA"   "OBT"   "CDTX"  "MLAC" 
## [2548] "BWAY"  "SFST"  "AVR"   "ALDF"  "MGNX"  "POLE"  "OBIO"  "VCIC"  "ZYBT" 
## [2557] "PDLB"  "MYO"   "FNLC"  "CYBN"  "FOA"   "BTMD"  "GRAF"  "OGI"   "GSRT" 
## [2566] "SLRN"  "INBK"  "ACTU"  "HOND"  "MOLN"  "CZFS"  "BIOA"  "LIEN"  "NVRO" 
## [2575] "LPBB"  "INBX"  "BCML"  "PHLT"  "PBFS"  "KRMD"  "SNFCA" "IKT"   "JMSB" 
## [2584] "INMB"  "LPAA"  "PROC"  "CZNC"  "STXS"  "BRBS"  "LXEO"  "NECB"  "HURA" 
## [2593] "FRST"  "VOR"   "HIT"   "PLX"   "PCB"   "GNFT"  "CBAN"  "VXRT"  "LNKB" 
## [2602] "VERU"  "TDAC"  "LYEL"  "PLBC"  "NKTX"  "ALEC"  "FDBC"  "INFU"  "FLIC" 
## [2611] "GLSI"  "MVBF"  "ADCT"  "GHI"   "FENC"  "HLXB"  "ACRV"  "SBXD"  "JYNT" 
## [2620] "SAMG"  "NBTX"  "BACQ"  "AMWL"  "SBT"   "STRO"  "SCPH"  "CLLS"  "NKTR" 
## [2629] "BMEA"  "RAPT"  "VTYX"  "CSBR"  "ZNTL"  "TLSI"  "KYTX"  "TIL"   "AKYA" 
## [2638] "FATE"  "ADAP"  "IPHA"  "BDSX"  "TELO"  "SGHT"  "MCRB"  "CKPT"  "PBYI" 
## [2647] "ARTV"  "CHRS"  "TCRX"  "RENB"  "SEER"  "BDTX"  "LCTX"  "ZJYL"  "NOTV" 
## [2656] "IFRX"  "CRBU"  "MIST"  "SGMT"  "GNLX"  "TSVT"  "PDEX"  "QIPT"  "SLDB" 
## [2665] "CRBP"  "ANRO"  "NVCT"  "CABA"  "CLYM"  "ZOM"   "SAVA"  "CRDL"  "EUDA" 
## [2674] "ZTEK"  "ENTA"  "ATOS"  "SRTS"  "EDIT"  "ALGS"  "ZURA"  "ECOR"  "XBIT" 
## [2683] "MGX"   "LNSR"  "ELUT"  "EPRX"  "ALLK"  "TELA"  "MLSS"  "CAMP"  "ACHV" 
## [2692] "FONR"  "PETS"  "GUTS"  "IGMS"  "OPRX"  "ABOS"  "COYA"  "MGRM"  "FBRX" 
## [2701] "TNYA"  "PYXS"  "HLVX"  "SY"    "NTRB"  "MDXH"  "CUE"   "DBVT"  "ASMB" 
## [2710] "TARA"  "ADVM"  "MNOV"  "IMUX"  "ORMP"  "MASS"  "ANIX"  "INZY"  "ME"   
## [2719] "VIGL"  "CTOR"  "RVPH"  "ENTX"  "KPTI"  "IMAB"  "DERM"  "TLSA"  "RANI" 
## [2728] "JSPR"  "ADAG"  "RBOT"  "SKYE"  "AGEN"  "ELTX"  "LPTX"  "VTGN"  "CBUS" 
## [2737] "XFOR"  "RGLS"  "XTNT"  "SCLX"  "EDAP"  "ASRT"  "ALXO"  "MRSN"  "THTX" 
## [2746] "AVTX"  "INCR"  "INO"   "EPIX"  "ARMP"  "AADI"  "GALT"  "IPSC"  "ANL"  
## [2755] "ACET"  "GNTA"  "HBIO"  "BLUE"  "OKUR"  "BNR"   "AVTE"  "MODV"  "APLT" 
## [2764] "STIM"  "NSPR"  "VANI"  "OWLT"  "IKNA"  "HYPR"  "TNXP"  "RMTI"  "SLS"  
## [2773] "PIII"  "PMVP"  "XGN"   "CLSD"  "WOK"   "QNCX"  "CTMX"  "BYSI"  "FORA" 
## [2782] "ICCM"  "ANVS"  "CNTX"  "MURA"  "PRE"   "PRLD"  "OPTN"  "VRCA"  "ABP"  
## [2791] "IMRX"  "ICAD"  "PEPG"  "KRON"  "UNCY"  "SGN"   "GBIO"  "RGC"   "IOBT" 
## [2800] "HOWL"  "CCEL"  "NEUE"  "ONCY"  "NVNO"  "FBLG"  "GANX"  "RPID"  "LTRN" 
## [2809] "PDSB"  "VNRX"  "TOI"   "OSTX"  "CTSO"  "IVVD"  "BEAT"  "IMMX"  "CCLD" 
## [2818] "CELU"  "APYX"  "STTK"  "CNTB"  "BOLD"  "MODD"  "OVID"  "RPTX"  "FGEN" 
## [2827] "PTHL"  "OMIC"  "ALVR"  "FBIO"  "LSB"   "LUCD"  "OTLK"  "SHLT"  "MAIA" 
## [2836] "RNTX"  "ACHL"  "ATRA"  "SPRO"  "JUNS"  "KZR"   "ICCC"  "OM"    "VTVT" 
## [2845] "SER"   "DYAI"  "QTTB"  "CLGN"  "ITRM"  "TPST"  "CASI"  "ESLA"  "VYNE" 
## [2854] "BRNS"  "SCYX"  "ATNM"  "UBX"   "ANEB"  "DXR"   "GRCE"  "CMMB"  "ELEV" 
## [2863] "DTIL"  "NRXP"  "SCNX"  "SABS"  "ONMD"  "MDAI"  "PASG"  "KALA"  "ANTX" 
## [2872] "RADX"  "INKT"  "NMTC"  "CALC"  "SRZN"  "INTS"  "XCUR"  "RLYB"  "CPIX" 
## [2881] "XLO"   "IRD"   "DHAI"  "ZCMD"  "OKYO"  "NXL"   "CVKD"  "MBOT"  "BIVI" 
## [2890] "OCX"   "ENZ"   "RNXT"  "COCH"  "MOVE"  "POCI"  "SNYR"  "XAIR"  "CGTX" 
## [2899] "IXHL"  "CVM"   "LEXX"  "MLEC"  "LGHL"  "GDTC"  "RAIN"  "AXDX"  "PMN"  
## [2908] "ERNA"  "BCAB"  "CRIS"  "AKTX"  "NSYS"  "CTXR"  "NRXS"  "OMGA"  "IBIO" 
## [2917] "DARE"  "CARA"  "IRIX"  "ATHE"  "DRIO"  "GELS"  "PULM"  "FEMY"  "HOOK" 
## [2926] "DRRX"  "ENLV"  "IGC"   "VSEE"  "CODX"  "NRSN"  "RVP"   "LPCN"  "NXGL" 
## [2935] "LVTX"  "LGVN"  "PLUR"  "SLGL"  "COEP"  "EQ"    "CYTH"  "NXTC"  "VVOS" 
## [2944] "MRKR"  "APRE"  "BLRX"  "ALUR"  "LSTA"  "PTN"   "TENX"  "PYPD"  "SNTI" 
## [2953] "COCP"  "ATHA"  "BFRG"  "CCM"   "AMS"   "TRAW"  "IBO"   "INAB"  "CARM" 
## [2962] "GOVX"  "BOLT"  "IPA"   "KAPA"  "CRVO"  "MEIP"  "MIRA"  "NKGN"  "CDIO" 
## [2971] "IINN"  "AFMD"  "BCTX"  "GLYC"  "STRM"  "BSGM"  "NEPH"  "LFWD"  "BTAI" 
## [2980] "SYBX"  "MTVA"  "COSM"  "CING"  "BRTX"  "FLGC"  "NNVC"  "CANF"  "SPRB" 
## [2989] "XTLB"  "HCWB"  "DWTX"  "NERV"  "PHGE"  "ACXP"  "CHRO"  "ADGM"  "OCEA" 
## [2998] "CLDI"  "LUCY"  "EKSO"  "IMNN"  "AIM"   "TRIB"  "SNSE"  "INDP"  "TXMD" 
## [3007] "PRPH"  "BCLI"  "APTO"  "SSKN"  "BNGO"  "IMRN"  "LYRA"  "EVGN"  "RLMD" 
## [3016] "TLPH"  "KPRX"  "PMCB"  "BIAF"  "AIMD"  "DOMH"  "BMRA"  "CJJD"  "CLRB" 
## [3025] "TSBX"  "MNDR"  "AYTU"  "APLM"  "MBIO"  "JAGX"  "CSCI"  "BCDA"  "ENSC" 
## [3034] "POAI"  "PRPO"  "APDN"  "ELAB"  "HOTH"  "CUTR"  "VRAX"  "PPBT"  "EVOK" 
## [3043] "KLTO"  "MYNZ"  "AEMD"  "SILO"  "SYRA"  "NURO"  "BFRI"  "MHUA"  "RDHL" 
## [3052] "KDLY"  "EDSA"  "CHEK"  "MGRX"  "GLTO"  "KZIA"  "OTRK"  "PSTV"  "APM"  
## [3061] "PAVM"  "NEUP"  "EVAX"  "STRR"  "ICU"   "AWH"   "ADXN"  "VIRX"  "CYCN" 
## [3070] "XBIO"  "IMCC"  "SNGX"  "ABVC"  "EYEN"  "QNTM"  "INBS"  "ALZN"  "SSY"  
## [3079] "ADIL"  "XWEL"  "PHIO"  "CNSP"  "PLRZ"  "ALLR"  "CMND"  "GNPX"  "SBFM" 
## [3088] "SYRS"  "AEON"  "PBM"   "TNON"  "SONN"  "GRI"   "CDT"   "LIXT"  "GTBP" 
## [3097] "TTOO"  "ONVO"  "MBRX"  "NUWE"  "FOXO"  "CERO"  "PRTG"  "SNPX"  "HCTI" 
## [3106] "NCNA"  "GLMD"  "SINT"  "CELZ"  "TOVX"  "ONCO"  "CPHI"  "ARTL"  "SNOA" 
## [3115] "OGEN"  "SLRX"  "RNAZ"  "BPTH"  "ATXI"  "NIVF"  "AMIX"  "NBY"   "VINC" 
## [3124] "ZVSA"  "LIPO"  "NAYA"  "STSS"  "XYLO"  "TTNP"  "HSCS"  "PTPI"  "VTAK" 
## [3133] "TNFA"  "YCBD"  "SHPH"  "WORX"  "RSLS"  "ATNF"  "XRTX"  "QLGN"  "ENVB" 
## [3142] "MTNB"  "KTTA"  "THAR"  "INM"   "PTIX"  "NDRA"  "ICCT"  "SCNI"  "TCRT" 
## [3151] "NLSP"  "TCBP"  "BDRX"  "BBLG"  "HSDT"  "VERO"  "LGMK"  "DRMA"  "BJDX" 
## [3160] "ENTO"  "APVO"  "AKAN"  "CYCC"  "TIVC"  "PCSA"  "SPRC"  "ISPC"  "SXTP" 
## [3169] "AZTR"  "PALI"  "PRFX"  "QNRX"  "AVGR"  "ADTX"  "WINT"  "GCTK"  "NAOV" 
## [3178] "SXTC"  "BACK"  "VRPX"  "SCPX"  "HEPA"  "SLXN"  "REVB"  "ACON"

Three example companies are printed to verify the fetched data.

# VIEW FOR 3 EXAMPLE STOCKS

# Print the first few rows of data for Apple, Microsoft, and Medpace
if("AAPL" %in% names(symbol_data_list)) {
  cat("Data for Apple (AAPL):\n")
  print(head(symbol_data_list[["AAPL"]]))
} else {
  cat("Data for AAPL not fetched or does not exist.\n")
}
## Data for Apple (AAPL):
##            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
## 2000-01-03  0.936384  1.004464 0.907924   0.999442   535796800     0.8421506
## 2000-01-04  0.966518  0.987723 0.903460   0.915179   512377600     0.7711489
## 2000-01-05  0.926339  0.987165 0.919643   0.928571   778321600     0.7824333
## 2000-01-06  0.947545  0.955357 0.848214   0.848214   767972800     0.7147226
## 2000-01-07  0.861607  0.901786 0.852679   0.888393   460734400     0.7485783
## 2000-01-10  0.910714  0.912946 0.845982   0.872768   505064000     0.7354123
if("MSFT" %in% names(symbol_data_list)) {
  cat("Data for Microsoft (MSFT):\n")
  print(head(symbol_data_list[["MSFT"]]))
} else {
  cat("Data for MSFT not fetched or does not exist.\n")
}
## Data for Microsoft (MSFT):
##            MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2000-01-03  58.68750  59.31250 56.00000   58.28125    53228400      35.79232
## 2000-01-04  56.78125  58.56250 56.12500   56.31250    54119000      34.58323
## 2000-01-05  55.56250  58.18750 54.68750   56.90625    64059600      34.94787
## 2000-01-06  56.09375  56.93750 54.18750   55.00000    54976600      33.77720
## 2000-01-07  54.31250  56.12500 53.65625   55.71875    62013600      34.21859
## 2000-01-10  56.71875  56.84375 55.68750   56.12500    44963600      34.46809
if("MEDP" %in% names(symbol_data_list)) {
  cat("Data for Medpace (MEDP):\n")
  print(head(symbol_data_list[["MEDP"]]))
} else {
  cat("Data for MEDP not fetched or does not exist.\n")
}
## Data for Medpace (MEDP):
##            MEDP.Open MEDP.High MEDP.Low MEDP.Close MEDP.Volume MEDP.Adjusted
## 2016-08-11     28.15    28.740   27.100      27.79     5356300         27.79
## 2016-08-12     27.55    28.500   27.130      28.04      472700         28.04
## 2016-08-15     28.29    29.600   27.907      29.22      620600         29.22
## 2016-08-16     29.56    29.980   29.200      29.32      315200         29.32
## 2016-08-17     29.43    29.790   28.120      28.14      507900         28.14
## 2016-08-18     28.23    28.843   27.720      27.77      330700         27.77
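
The three near-identical if/else blocks above could be folded into one helper. The following is only a minimal sketch: the function name `preview_symbols` is hypothetical, and a small toy list with made-up numbers stands in for `symbol_data_list`.

```r
# Hypothetical helper (not part of the original script): print the first rows
# for each requested symbol, or a notice if the symbol was not fetched.
preview_symbols <- function(data_list, symbols, n = 6) {
  found <- symbols %in% names(data_list)
  for (i in seq_along(symbols)) {
    if (found[i]) {
      cat("Data for", symbols[i], ":\n")
      print(head(data_list[[symbols[i]]], n))
    } else {
      cat("Data for", symbols[i], "not fetched or does not exist.\n")
    }
  }
  # Return which symbols were found, invisibly, so the call can be checked
  invisible(stats::setNames(found, symbols))
}

# Toy stand-in for symbol_data_list (made-up numbers, illustration only)
toy_list <- list(
  AAA = data.frame(Close = c(10, 11, 12)),
  BBB = data.frame(Close = c(20, 19, 21))
)
result <- preview_symbols(toy_list, c("AAA", "BBB", "ZZZ"))
```

With the real data the call would presumably look like `preview_symbols(symbol_data_list, c("AAPL", "MSFT", "MEDP"))`, since `head()` also works on the fetched price objects.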

Finding the stocks that failed in the fetching process and eliminating them

# Get the list of successfully fetched symbols
fetched_symbols <- names(symbol_data_list)

# Find symbols that failed to fetch
failed_symbols <- setdiff(symbols, fetched_symbols)

# Print them
cat("Failed to fetch data for the following symbols:\n")
## Failed to fetch data for the following symbols:
print(failed_symbols)
##  [1] "SQ"     "TAP.A"  "ENLC"   "WSO.B"  "AKO.B"  "UHAL.B" "BIO.B"  "CEIX"  
##  [9] "ASAI"   "MOG.B"  "MOG.A"  "SMAR"   "PFIE"   "AE"     "EAST"   "B"     
## [17] "HTLF"   "OBDE"   "ZUO"    "CDMO"   "SCWX"   "CRD.A"  "CRD.B"  "RVNC"  
## [25] "HEAR"   "MRNS"   "QTI"    "RHE"
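
The `setdiff()` check above works because a symbol that fails to download never becomes an element of the list. The sketch below illustrates that mechanism under stated assumptions: `fake_fetch` is a hypothetical stand-in for the real download step (which is not shown in this part of the file), and the symbols that fail in it are chosen only for the demonstration.

```r
# Hypothetical downloader used only for this illustration; in the real script
# this role is played by the quantmod-based fetch step.
fake_fetch <- function(symbol) {
  if (symbol %in% c("TAP.A", "SQ")) stop("no data for ", symbol)
  data.frame(Close = c(1, 2, 3))
}

symbols_demo <- c("AAPL", "SQ", "MSFT", "TAP.A")
fetched <- list()

for (s in symbols_demo) {
  # tryCatch returns NULL on error, so failed symbols never enter the list
  res <- tryCatch(fake_fetch(s), error = function(e) NULL)
  if (!is.null(res)) fetched[[s]] <- res
}

# Symbols requested but absent from the list are the failed ones
failed_demo <- setdiff(symbols_demo, names(fetched))
print(failed_demo)
```

Because failed symbols are simply absent from the list, no separate removal step is needed before the later loops over `names(symbol_data_list)`.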

The 28 symbols listed above could not be fetched; all remaining stocks were fetched and prepared successfully.

More than 3,000 stocks were fetched, but the majority of them do not have data going back to 2000 and contain missing values. I now apply a filter that keeps only the companies whose price history starts on 2000-01-03, the first trading day of 2000.

# Filter stocks that start exactly on 2000-01-03
filtered_symbol_data_list <- list()

for (symbol in names(symbol_data_list)) {
  stock_data <- symbol_data_list[[symbol]]
  
  # Get the first available date
  start_date <- index(stock_data)[1]
  
  # Keep only those starting exactly on 2000-01-03
  if (start_date == as.Date("2000-01-03")) {
    filtered_symbol_data_list[[symbol]] <- stock_data
  }
}

# Show how many passed the filter
cat("✅ Number of stocks starting exactly on 2000-01-03:", length(filtered_symbol_data_list), "\n")
## ✅ Number of stocks starting exactly on 2000-01-03: 967
# Show a few for verification
cat("📅 Start dates preview:\n")
## 📅 Start dates preview:
for (symbol in head(names(filtered_symbol_data_list), 10)) {
  cat(symbol, "starts at:", as.character(index(filtered_symbol_data_list[[symbol]])[1]), "\n")
}
## AAPL starts at: 2000-01-03 
## GE starts at: 2000-01-03 
## LLY starts at: 2000-01-03 
## WMT starts at: 2000-01-03 
## XOM starts at: 2000-01-03 
## NVDA starts at: 2000-01-03 
## CAT starts at: 2000-01-03 
## UNH starts at: 2000-01-03 
## JPM starts at: 2000-01-03 
## COST starts at: 2000-01-03
# Replace the original list if everything looks fine
symbol_data_list <- filtered_symbol_data_list

More than 3,000 stocks were fetched; after applying the filter and keeping only the stocks whose data start on 2000-01-03, 967 remain.
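
The same start-date filter can also be written without an explicit loop. This is a minimal sketch on toy data, not the code used for the results above: the list `toy_data` and its element names are made up, and each element is a plain data frame with a `Date` column, so `x$Date[1]` plays the role that `index(stock_data)[1]` plays for the fetched price objects.

```r
# Toy stand-in for symbol_data_list: each element only carries its dates.
# With xts objects, index(x)[1] would replace x$Date[1].
toy_data <- list(
  OLD1 = data.frame(Date = as.Date(c("2000-01-03", "2000-01-04"))),
  NEW1 = data.frame(Date = as.Date(c("2016-08-11", "2016-08-12"))),
  OLD2 = data.frame(Date = as.Date(c("2000-01-03", "2000-01-05")))
)

target_start <- as.Date("2000-01-03")

# Keep only the elements whose first date equals the target start date;
# the nrow() guard skips empty elements instead of failing on them
kept <- Filter(function(x) nrow(x) > 0 && x$Date[1] == target_start, toy_data)

print(names(kept))
```

`Filter()` keeps the element names, so the result can replace the original list in the same way as `filtered_symbol_data_list`.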

————————————————————————

6. Defining the Volatility

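The volatility of each stock is defined as the annualized volatility of its daily log returns over the most recent year: the standard deviation of the daily log returns multiplied by the square root of 252 trading days.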

# Load necessary libraries
library(quantmod)
library(dplyr)
library(lubridate)

# Initialize the volatility data frame
volatility_data <- data.frame(Symbol = character(), 
                              Sector = character(), 
                              MarketLeader = logical(), 
                              Volatility = numeric(), 
                              stringsAsFactors = FALSE)

# Loop through each symbol to calculate 1-year annualized volatility
for (symbol in names(symbol_data_list)) {
  # Extract the stock data
  stock_data <- symbol_data_list[[symbol]]
  
  # Calculate daily returns using logarithmic return
  daily_returns <- dailyReturn(Cl(stock_data), type = 'log')
  
  # Filter the last 1 year of returns data
  last_1_year <- daily_returns[index(daily_returns) >= (Sys.Date() %m-% years(1))]
  
  # Compute annualized volatility
  annualized_volatility <- sd(last_1_year, na.rm = TRUE) * sqrt(252)
  
  # Get sector and market leader status from 'stocks_details'
  stock_info <- stocks_details %>% 
    filter(Symbol == symbol) %>% 
    select(Sector, MarketLeader) %>% 
    slice(1)
  
  # Append to the volatility data frame
  volatility_data <- rbind(volatility_data, data.frame(Symbol = symbol,
                                                       Sector = stock_info$Sector,
                                                       MarketLeader = stock_info$MarketLeader,
                                                       Volatility = annualized_volatility))
}

# Calculate the mean annualized volatility
mean_volatility <- mean(volatility_data$Volatility, na.rm = TRUE)
print(paste("Mean Annualized Volatility (Last 1 Year):", round(mean_volatility, 4)))
## [1] "Mean Annualized Volatility (Last 1 Year): 0.4167"
# View the first few rows of the volatility data frame
head(volatility_data)
##   Symbol                 Sector MarketLeader Volatility
## 1   AAPL      Technology Sector         TRUE  0.3220947
## 2     GE     Industrials Sector         TRUE  0.3511154
## 3    LLY      Healthcare Sector         TRUE  0.3581632
## 4    WMT Consumer Staple Sector         TRUE  0.2475146
## 5    XOM          Energy Sector         TRUE  0.2396942
## 6   NVDA      Technology Sector         TRUE  0.6019358
nrow(volatility_data)
## [1] 967
ncol(volatility_data)
## [1] 4
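
The calculation inside the loop can be followed step by step on a short price series. The sketch below uses base R only and made-up closing prices; the real script gets its returns from `dailyReturn()` and restricts them to the last year, which is omitted here.

```r
# Made-up closing prices, for illustration only
close_prices <- c(100, 102, 101, 103, 104, 102, 105)

# Daily log returns: r_t = log(P_t / P_{t-1})
log_returns <- diff(log(close_prices))

# Annualized volatility: standard deviation of daily log returns,
# scaled by the square root of 252 trading days
annualized_vol <- sd(log_returns) * sqrt(252)
print(annualized_vol)
```

The square-root scaling assumes that daily returns are roughly independent, so that the variance grows linearly with the number of trading days.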

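The average annualized volatility of the 967 stocks is 0.4167, i.e. about 42% per year. This number serves as a reference point for the classification below; on its own it does not show that the stock market is one of the safest markets to invest in, because no other market is measured here for comparison.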

The first division uses three classes: Stable (volatility below 0.2), Moderate (0.2 up to but not including 0.5), and Volatile (0.5 and above).

# Load the dplyr library if not already loaded
library(dplyr)

# Add a new column for volatility classification
volatility_data <- volatility_data %>%
  mutate(VolatilityClass = case_when(
    Volatility < 0.2 ~ "Stable",
    Volatility >= 0.2 & Volatility < 0.5 ~ "Moderate",
    Volatility >= 0.5 ~ "Volatile"
  ))

# View the first few rows of the updated volatility data frame
head(volatility_data)
##   Symbol                 Sector MarketLeader Volatility VolatilityClass
## 1   AAPL      Technology Sector         TRUE  0.3220947        Moderate
## 2     GE     Industrials Sector         TRUE  0.3511154        Moderate
## 3    LLY      Healthcare Sector         TRUE  0.3581632        Moderate
## 4    WMT Consumer Staple Sector         TRUE  0.2475146        Moderate
## 5    XOM          Energy Sector         TRUE  0.2396942        Moderate
## 6   NVDA      Technology Sector         TRUE  0.6019358        Volatile
# Load the dplyr library if not already loaded
library(dplyr)

# Group by Sector, MarketLeader, and VolatilityClass, then count the number of stocks
sector_leader_volatility_counts <- volatility_data %>%
  group_by(Sector, MarketLeader, VolatilityClass) %>%
  summarise(StockCount = n(), .groups = 'drop')  # 'drop' ensures the grouping is dropped after summarise

# View the grouped and counted data
# Print all rows of the grouped and counted data
print(sector_leader_volatility_counts, n = Inf)  # 'Inf' indicates to print all rows
## # A tibble: 31 × 4
##    Sector                 MarketLeader VolatilityClass StockCount
##    <chr>                  <lgl>        <chr>                <int>
##  1 Consumer Staple Sector FALSE        Moderate                36
##  2 Consumer Staple Sector FALSE        Volatile                12
##  3 Consumer Staple Sector TRUE         Moderate                27
##  4 Consumer Staple Sector TRUE         Stable                  10
##  5 Consumer Staple Sector TRUE         Volatile                 1
##  6 Energy Sector          FALSE        Moderate                30
##  7 Energy Sector          FALSE        Volatile                24
##  8 Energy Sector          TRUE         Moderate                27
##  9 Energy Sector          TRUE         Stable                   2
## 10 Energy Sector          TRUE         Volatile                 1
## 11 Financial Sector       FALSE        Moderate               162
## 12 Financial Sector       FALSE        Stable                   2
## 13 Financial Sector       FALSE        Volatile                 8
## 14 Financial Sector       TRUE         Moderate                66
## 15 Financial Sector       TRUE         Stable                   3
## 16 Healthcare Sector      FALSE        Moderate                59
## 17 Healthcare Sector      FALSE        Stable                   2
## 18 Healthcare Sector      FALSE        Volatile                78
## 19 Healthcare Sector      TRUE         Moderate                25
## 20 Healthcare Sector      TRUE         Stable                   3
## 21 Healthcare Sector      TRUE         Volatile                 2
## 22 Industrials Sector     FALSE        Moderate               108
## 23 Industrials Sector     FALSE        Volatile                37
## 24 Industrials Sector     TRUE         Moderate                73
## 25 Industrials Sector     TRUE         Stable                   2
## 26 Industrials Sector     TRUE         Volatile                 1
## 27 Technology Sector      FALSE        Moderate                69
## 28 Technology Sector      FALSE        Volatile                29
## 29 Technology Sector      TRUE         Moderate                53
## 30 Technology Sector      TRUE         Stable                   2
## 31 Technology Sector      TRUE         Volatile                13
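
The three-class rule can also be expressed with base R's `cut()`, which makes the interval boundaries explicit. This is a sketch on made-up volatility values, not a replacement for the `case_when()` call above.

```r
# Made-up volatility values, for illustration only
vol <- c(0.15, 0.20, 0.32, 0.50, 0.61)

# right = FALSE makes the intervals [a, b), matching the case_when() rules:
# < 0.2 -> Stable, [0.2, 0.5) -> Moderate, >= 0.5 -> Volatile
vol_class <- cut(vol,
                 breaks = c(-Inf, 0.2, 0.5, Inf),
                 labels = c("Stable", "Moderate", "Volatile"),
                 right = FALSE)
print(as.character(vol_class))
```

Note that `cut()` returns a factor with the levels in the order given, whereas `case_when()` returns a character vector.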

The second division, which is the one used for the rest of the analysis, has two classes: stocks with a volatility of 0.25 or below are stable, and stocks with a volatility above 0.25 are volatile.

# Reclassify volatility using updated thresholds
volatility_data <- volatility_data %>%
  mutate(VolatilityClass = case_when(
    Volatility <= 0.25 ~ "Stable",
    Volatility > 0.25 ~ "Volatile"
  ))
# Recalculate stock counts based on the new VolatilityClass
sector_leader_volatility_counts <- volatility_data %>%
  group_by(Sector, MarketLeader, VolatilityClass) %>%
  summarise(StockCount = n(), .groups = 'drop')

# Print all rows of the grouped and counted data
print(sector_leader_volatility_counts, n = Inf)
## # A tibble: 24 × 4
##    Sector                 MarketLeader VolatilityClass StockCount
##    <chr>                  <lgl>        <chr>                <int>
##  1 Consumer Staple Sector FALSE        Stable                   3
##  2 Consumer Staple Sector FALSE        Volatile                45
##  3 Consumer Staple Sector TRUE         Stable                  24
##  4 Consumer Staple Sector TRUE         Volatile                14
##  5 Energy Sector          FALSE        Stable                   2
##  6 Energy Sector          FALSE        Volatile                52
##  7 Energy Sector          TRUE         Stable                   7
##  8 Energy Sector          TRUE         Volatile                23
##  9 Financial Sector       FALSE        Stable                  11
## 10 Financial Sector       FALSE        Volatile               161
## 11 Financial Sector       TRUE         Stable                  22
## 12 Financial Sector       TRUE         Volatile                47
## 13 Healthcare Sector      FALSE        Stable                  11
## 14 Healthcare Sector      FALSE        Volatile               128
## 15 Healthcare Sector      TRUE         Stable                  10
## 16 Healthcare Sector      TRUE         Volatile                20
## 17 Industrials Sector     FALSE        Stable                   6
## 18 Industrials Sector     FALSE        Volatile               139
## 19 Industrials Sector     TRUE         Stable                  20
## 20 Industrials Sector     TRUE         Volatile                56
## 21 Technology Sector      FALSE        Stable                   1
## 22 Technology Sector      FALSE        Volatile                97
## 23 Technology Sector      TRUE         Stable                  13
## 24 Technology Sector      TRUE         Volatile                55
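
A long table with 24 rows is hard to scan, so a wide cross-tabulation can help. The sketch below retypes the counts from the table above into a data frame (with the "Sector" suffix dropped for brevity) and reshapes them with base R's `xtabs()`; the object names are illustrative only.

```r
# Counts copied from the 24-row table above ("Sector" suffix dropped)
counts <- data.frame(
  Sector = rep(c("Consumer Staple", "Energy", "Financial",
                 "Healthcare", "Industrials", "Technology"), each = 4),
  MarketLeader = rep(c(FALSE, FALSE, TRUE, TRUE), times = 6),
  VolatilityClass = rep(c("Stable", "Volatile"), times = 12),
  StockCount = c( 3,  45, 24, 14,
                  2,  52,  7, 23,
                 11, 161, 22, 47,
                 11, 128, 10, 20,
                  6, 139, 20, 56,
                  1,  97, 13, 55)
)

# Wide view: one row per sector, one column per volatility class
by_sector <- xtabs(StockCount ~ Sector + VolatilityClass, data = counts)
print(by_sector)

# Wide view restricted to market leaders
leaders <- xtabs(StockCount ~ Sector + VolatilityClass,
                 data = counts[counts$MarketLeader, ])
print(leaders)
```

The grand total of the first table is 967, which matches the number of stocks that passed the start-date filter.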

—————————————————————————————–

7. EXPLORATORY DATA ANALYSIS

Table 1

This table shows the average volatility of each sector.

# Load necessary library
library(dplyr)

# Calculate average annualized volatility score by sector
average_volatility_by_sector <- volatility_data %>%
  group_by(Sector) %>%
  summarise(AverageVolatility = mean(Volatility, na.rm = TRUE))

# View the result
print(average_volatility_by_sector)
## # A tibble: 6 × 2
##   Sector                 AverageVolatility
##   <chr>                              <dbl>
## 1 Consumer Staple Sector             0.342
## 2 Energy Sector                      0.430
## 3 Financial Sector                   0.330
## 4 Healthcare Sector                  0.583
## 5 Industrials Sector                 0.385
## 6 Technology Sector                  0.447
Plot 1

This plot is the bar chart version of Table 1: it shows the average volatility score of the companies in each sector.

# Load necessary libraries
library(ggplot2)
library(dplyr)

# Create a bar plot with black background and professional style
ggplot(average_volatility_by_sector, aes(x = reorder(Sector, AverageVolatility), y = AverageVolatility, fill = Sector)) +
  geom_bar(stat = "identity", color = "gray30", show.legend = FALSE) +  # Bars with dark gray borders
  geom_text(aes(label = round(AverageVolatility, 2)), vjust = -0.5, color = "white", size = 4) +  # White labels on top
  theme_minimal(base_size = 14) +  # Use minimal theme with a professional font size
  theme(
    panel.background = element_rect(fill = "#121212"),  # Darker background for the plot panel
    plot.background = element_rect(fill = "#121212"),   # Dark background for the whole plot
    axis.text.x = element_text(color = "white"),  # White axis text
    axis.text.y = element_text(color = "white"),  # White axis text
    axis.title.x = element_text(color = "white", size = 14),  # White axis titles
    axis.title.y = element_text(color = "white", size = 14),  # White axis titles
    plot.title = element_text(color = "white", size = 16, face = "bold", hjust = 0.5),  # Title centered with bold style
    panel.grid.major = element_line(color = "gray40", linewidth = 0.5),  # Gray major grid lines (linewidth replaces the deprecated size argument in ggplot2 >= 3.4.0)
    panel.grid.minor = element_line(color = "gray40", linewidth = 0.25),  # Gray minor grid lines
    axis.ticks = element_line(color = "white"),  # White axis ticks
    axis.line = element_line(color = "white")  # White axis line
  ) +
  labs(
    title = "Average Volatility by Sector",
    x = "Sector",
    y = "Average Volatility"
  ) +
  coord_flip()  # Flip the coordinates for a more professional horizontal bar chart

Plot 2

This code chunk again shows the average volatility of each sector, but this time it is also grouped by market leaders and non-market leaders.

# 2. Calculate average volatility score by sector and market leader status
average_volatility_by_sector_leader <- volatility_data %>%
  group_by(Sector, MarketLeader) %>%
  summarise(AverageVolatility = mean(Volatility, na.rm = TRUE), .groups = 'drop')

# View the result
print(average_volatility_by_sector_leader)
## # A tibble: 12 × 3
##    Sector                 MarketLeader AverageVolatility
##    <chr>                  <lgl>                    <dbl>
##  1 Consumer Staple Sector FALSE                    0.410
##  2 Consumer Staple Sector TRUE                     0.256
##  3 Energy Sector          FALSE                    0.495
##  4 Energy Sector          TRUE                     0.315
##  5 Financial Sector       FALSE                    0.349
##  6 Financial Sector       TRUE                     0.280
##  7 Healthcare Sector      FALSE                    0.642
##  8 Healthcare Sector      TRUE                     0.313
##  9 Industrials Sector     FALSE                    0.427
## 10 Industrials Sector     TRUE                     0.304
## 11 Technology Sector      FALSE                    0.490
## 12 Technology Sector      TRUE                     0.387
# Load necessary libraries for plotting
library(ggplot2)

# Plot 2.1: Market Leaders' average volatility
market_leader_plot <- average_volatility_by_sector_leader %>%
  filter(MarketLeader == TRUE) %>%
  ggplot(aes(x = Sector, y = AverageVolatility, fill = Sector)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_text(aes(label = round(AverageVolatility, 2)), vjust = -0.3, size = 5, fontface = "bold", color = "white") +  # White labels so they stay visible on the black panel
  theme_minimal(base_size = 15) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
        plot.background = element_rect(fill = "black", color = "black"),
        panel.background = element_rect(fill = "black"),
        axis.text = element_text(color = "white"),
        axis.title = element_text(color = "white"),
        plot.title = element_text(color = "white", size = 16)) +
  labs(title = "Market Leaders' Average Volatility",
       x = "Sector",
       y = "Average Volatility")

# Show the plot
print(market_leader_plot)

# Plot 2.2: Non-Market Leaders' average volatility with large labels
non_market_leader_plot <- average_volatility_by_sector_leader %>%
  filter(MarketLeader == FALSE) %>%
  ggplot(aes(x = Sector, y = AverageVolatility, fill = Sector)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_text(aes(label = round(AverageVolatility, 2)), vjust = -0.3, size = 8, fontface = "bold", color = "white") +  # White labels so they stay visible on the black panel
  theme_minimal(base_size = 15) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 14),
        plot.background = element_rect(fill = "black", color = "black"),
        panel.background = element_rect(fill = "black"),
        axis.text = element_text(color = "white"),
        axis.title = element_text(color = "white"),
        plot.title = element_text(color = "white", size = 16)) +
  labs(title = "Non-Market Leaders' Average Volatility",
       x = "Sector",
       y = "Average Volatility")

# Show the plot
print(non_market_leader_plot)
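
The two plots are easier to compare when the gap between non-leaders and leaders is computed explicitly. The sketch below retypes the averages from the 12-row table above (with the "Sector" suffix dropped); the column names `Gap` and `Ratio` are illustrative and do not appear elsewhere in the analysis.

```r
# Averages copied from the 12-row table above ("Sector" suffix dropped)
avg_vol <- data.frame(
  Sector    = c("Consumer Staple", "Energy", "Financial",
                "Healthcare", "Industrials", "Technology"),
  NonLeader = c(0.410, 0.495, 0.349, 0.642, 0.427, 0.490),
  Leader    = c(0.256, 0.315, 0.280, 0.313, 0.304, 0.387)
)

# Absolute and relative gap between non-leaders and leaders
avg_vol$Gap   <- avg_vol$NonLeader - avg_vol$Leader
avg_vol$Ratio <- round(avg_vol$NonLeader / avg_vol$Leader, 2)

# Sort so the sector with the widest gap comes first
avg_vol <- avg_vol[order(-avg_vol$Gap), ]
print(avg_vol, row.names = FALSE)
```

In every sector of the table the non-leaders have the higher average volatility; the gap is widest in Healthcare (0.642 against 0.313) and narrowest in the Financial sector (0.349 against 0.280).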

Plot 3

This plot shows the share of market leaders and non-market leaders in each sector (the bars are stacked to 100%); the table printed before it lists the underlying counts.

# Calculate the count of market leaders and non-leaders per sector
market_leader_distribution <- volatility_data %>%
  group_by(Sector, MarketLeader) %>%
  summarise(Count = n(), .groups = 'drop')

# View the result
print(market_leader_distribution)
## # A tibble: 12 × 3
##    Sector                 MarketLeader Count
##    <chr>                  <lgl>        <int>
##  1 Consumer Staple Sector FALSE           48
##  2 Consumer Staple Sector TRUE            38
##  3 Energy Sector          FALSE           54
##  4 Energy Sector          TRUE            30
##  5 Financial Sector       FALSE          172
##  6 Financial Sector       TRUE            69
##  7 Healthcare Sector      FALSE          139
##  8 Healthcare Sector      TRUE            30
##  9 Industrials Sector     FALSE          145
## 10 Industrials Sector     TRUE            76
## 11 Technology Sector      FALSE           98
## 12 Technology Sector      TRUE            68
# Load the necessary library
library(ggplot2)

# Bar plot for market leader distribution
ggplot(market_leader_distribution, aes(x = Sector, y = Count, fill = as.factor(MarketLeader))) +
  geom_bar(stat = "identity", position = "fill", color = "white") +  # stacked to 100%
  scale_fill_manual(values = c("red", "darkblue"), labels = c("Non-Leader", "Leader")) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  theme_minimal(base_size = 16) +
  theme(
    plot.background = element_rect(fill = "black"),
    panel.background = element_rect(fill = "black", color = "black"),
    text = element_text(color = "white"),
    axis.title = element_text(color = "white"),
    axis.text = element_text(color = "white"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.title = element_text(color = "white"),
    legend.text = element_text(color = "white")
  ) +
  labs(title = "Market Leadership Distribution by Sector",
       x = "Sector",
       y = "Percentage of Stocks",
       fill = "Market Leader")
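
Because the bars are stacked to 100%, the exact shares are hard to read off the plot. As a minimal sketch, the within-sector share of leaders can be computed directly from the counts printed above. The names `leader_counts` and `leader_share` are illustrative and not part of the original analysis; the counts are copied from the `market_leader_distribution` output.

```r
library(dplyr)

# Counts copied from the market_leader_distribution output above
# (leader_counts is a hypothetical name used only for this sketch)
leader_counts <- data.frame(
  Sector = rep(c("Consumer Staple Sector", "Energy Sector", "Financial Sector",
                 "Healthcare Sector", "Industrials Sector", "Technology Sector"),
               each = 2),
  MarketLeader = rep(c(FALSE, TRUE), times = 6),
  Count = c(48, 38, 54, 30, 172, 69, 139, 30, 145, 76, 98, 68)
)

# Share of leaders within each sector, sorted from largest to smallest
leader_share <- leader_counts %>%
  group_by(Sector) %>%
  mutate(Share = Count / sum(Count)) %>%  # within-sector proportion
  ungroup() %>%
  filter(MarketLeader) %>%
  select(Sector, Leaders = Count, Share) %>%
  arrange(desc(Share))

print(leader_share)
```

With these counts, Consumer Staple has the largest leader share (38 of 86 stocks) and Healthcare the smallest (30 of 169).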

Plot 4

Volatile Stocks

This plot shows the proportion of stable companies and risky (volatile) companies in each sector; the underlying counts are printed in the table below.

# Calculate the count of volatile and stable stocks per sector
volatility_distribution <- volatility_data %>%
  group_by(Sector, VolatilityClass) %>%
  summarise(Count = n(), .groups = 'drop')

# View the result
print(volatility_distribution)
## # A tibble: 12 × 3
##    Sector                 VolatilityClass Count
##    <chr>                  <chr>           <int>
##  1 Consumer Staple Sector Stable             27
##  2 Consumer Staple Sector Volatile           59
##  3 Energy Sector          Stable              9
##  4 Energy Sector          Volatile           75
##  5 Financial Sector       Stable             33
##  6 Financial Sector       Volatile          208
##  7 Healthcare Sector      Stable             21
##  8 Healthcare Sector      Volatile          148
##  9 Industrials Sector     Stable             26
## 10 Industrials Sector     Volatile          195
## 11 Technology Sector      Stable             14
## 12 Technology Sector      Volatile          152
# Bar plot for volatility distribution
ggplot(volatility_distribution, aes(x = Sector, y = Count, fill = as.factor(VolatilityClass))) +
  geom_bar(stat = "identity", position = "fill", color = "white") +
  scale_fill_manual(values = c("green", "darkblue"), labels = c("Stable", "Volatile")) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_minimal(base_size = 16) +
  theme(
    plot.background = element_rect(fill = "black"),
    panel.background = element_rect(fill = "black", color = "black"),
    text = element_text(color = "white"),
    axis.title = element_text(color = "white"),
    axis.text = element_text(color = "white"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.title = element_text(color = "white"),
    legend.text = element_text(color = "white")
  ) +
  labs(title = "Volatility Distribution by Sector",
       x = "Sector",
       y = "Percentage",
       fill = "Volatility Class")

Table 5

It is important to identify the market giants, which rank at the top by market capitalization, and also to be aware of the most stable companies in each industry. This table shows the top 3 and bottom 3 companies by market capitalization in each of the 6 sectors.

library(dplyr)
# Merge the market capitalization data with the original volatility data
full_data <- left_join(volatility_data, stocks_details, by = "Symbol")

# Remove duplicate columns after the merge and keep only the necessary ones
full_data <- full_data %>%
  select(-c(Sector.x, MarketLeader.x)) %>%  # Remove Sector.x and MarketLeader.x
  rename(Sector = Sector.y, MarketLeader = MarketLeader.y)  # Rename Sector.y and MarketLeader.y

# Now, find the top 3 and bottom 3 Market Capitalizations per sector
max_min_market_cap <- full_data %>%
  group_by(Sector) %>%
  arrange(Sector, desc(`Market Cap Numeric`)) %>%
  slice_head(n = 3) %>%
  bind_rows(
    full_data %>%
      group_by(Sector) %>%
      arrange(Sector, `Market Cap Numeric`) %>%
      slice_head(n = 3)
  ) %>%
  select(Sector, `Company Name`, Symbol, `Market Cap Numeric`) %>%
  arrange(Sector, `Market Cap Numeric`)

# Print the results for market capitalization
print(max_min_market_cap, n = Inf)
## # A tibble: 36 × 4
## # Groups:   Sector [6]
##    Sector                 `Company Name`             Symbol `Market Cap Numeric`
##    <chr>                  <chr>                      <chr>                 <dbl>
##  1 Consumer Staple Sector Rocky Mountain Chocolate … RMCF               15450000
##  2 Consumer Staple Sector Mannatech, Incorporated    MTEX               23090000
##  3 Consumer Staple Sector Natural Alternatives Inte… NAII               25020000
##  4 Consumer Staple Sector The Procter & Gamble Comp… PG             380390000000
##  5 Consumer Staple Sector Costco Wholesale Corporat… COST           408390000000
##  6 Consumer Staple Sector Walmart Inc.               WMT            730400000000
##  7 Energy Sector          Marine Petroleum Trust     MARPS               8180000
##  8 Energy Sector          BP Prudhoe Bay Royalty Tr… BPT                11660000
##  9 Energy Sector          Mesa Royalty Trust         MTR                11720000
## 10 Energy Sector          Shell plc                  SHEL           202800000000
## 11 Energy Sector          Chevron Corporation        CVX            279720000000
## 12 Energy Sector          Exxon Mobil Corporation    XOM            477570000000
## 13 Financial Sector       Greystone Housing Impact … GHI               259460000
## 14 Financial Sector       The First of Long Island … FLIC              260380000
## 15 Financial Sector       Fidelity D & D Bancorp, I… FDBC              265120000
## 16 Financial Sector       Wells Fargo & Company      WFC            241850000000
## 17 Financial Sector       Bank of America Corporati… BAC            355560000000
## 18 Financial Sector       JPMorgan Chase & Co.       JPM            687570000000
## 19 Healthcare Sector      Windtree Therapeutics, In… WINT                1610000
## 20 Healthcare Sector      Titan Pharmaceuticals, In… TTNP                3310000
## 21 Healthcare Sector      Becton, Dickinson and Com… BDX                 3920000
## 22 Healthcare Sector      Novo Nordisk A/S           NVO            389380000000
## 23 Healthcare Sector      UnitedHealth Group Incorp… UNH            490060000000
## 24 Healthcare Sector      Eli Lilly and Company      LLY            707210000000
## 25 Industrials Sector     Tredegar Corporation       TG                254590000
## 26 Industrials Sector     FuelCell Energy, Inc.      FCEL              266660000
## 27 Industrials Sector     Forrester Research, Inc.   FORR              279240000
## 28 Industrials Sector     RTX Corporation            RTX            152670000000
## 29 Industrials Sector     Caterpillar Inc.           CAT            176130000000
## 30 Industrials Sector     General Electric Company   GE             187380000000
## 31 Technology Sector      Immersion Corporation      IMMR              278860000
## 32 Technology Sector      Asure Software, Inc.       ASUR              288770000
## 33 Technology Sector      AudioCodes Ltd.            AUDC              302490000
## 34 Technology Sector      Microsoft Corporation      MSFT          3175510000000
## 35 Technology Sector      NVIDIA Corporation         NVDA          3494480000000
## 36 Technology Sector      Apple Inc.                 AAPL          3698010000000
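
The same top-n and bottom-n selection can also be written with dplyr's `slice_max()` and `slice_min()`, which avoids sorting the data twice. The sketch below is an alternative formulation and not the code used for the table above. The data frame `toy` and the column `MarketCap` are hypothetical and the values are illustrative only.

```r
library(dplyr)

# Hypothetical toy data; values are illustrative only
toy <- data.frame(
  Sector = rep(c("A", "B"), each = 5),
  Symbol = paste0("S", 1:10),
  MarketCap = c(50, 10, 40, 20, 30, 5, 25, 15, 45, 35)
)

# Largest 2 and smallest 2 companies per sector
# (with_ties = FALSE keeps exactly n rows per group)
top_bottom <- bind_rows(
  toy %>% group_by(Sector) %>% slice_max(MarketCap, n = 2, with_ties = FALSE),
  toy %>% group_by(Sector) %>% slice_min(MarketCap, n = 2, with_ties = FALSE)
) %>%
  ungroup() %>%
  arrange(Sector, MarketCap)

print(top_bottom)
```

With either formulation, a sector with fewer than 2n companies would contribute some rows to both the top and the bottom set, so a `distinct()` step may be needed in that case.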

Table 6

This table shows the 3 most stable and the 3 most volatile companies in each sector.

# Find the top 3 and bottom 3 Volatility scores per sector
max_min_volatility <- full_data %>%
  group_by(Sector) %>%
  arrange(Sector, desc(Volatility)) %>%
  slice_head(n = 3) %>%
  bind_rows(
    full_data %>%
      group_by(Sector) %>%
      arrange(Sector, Volatility) %>%
      slice_head(n = 3)
  ) %>%
  select(Sector, `Company Name`, Symbol, Volatility) %>%
  arrange(Sector, Volatility)

# Print the results for volatility
print(max_min_volatility, n = Inf)
## # A tibble: 36 × 4
## # Groups:   Sector [6]
##    Sector                 `Company Name`                      Symbol Volatility
##    <chr>                  <chr>                               <chr>       <dbl>
##  1 Consumer Staple Sector The Coca-Cola Company               KO          0.164
##  2 Consumer Staple Sector The Procter & Gamble Company        PG          0.185
##  3 Consumer Staple Sector Altria Group, Inc.                  MO          0.190
##  4 Consumer Staple Sector Newell Brands Inc.                  NWL         0.698
##  5 Consumer Staple Sector Mannatech, Incorporated             MTEX        0.719
##  6 Consumer Staple Sector Lifeway Foods, Inc.                 LWAY        0.749
##  7 Energy Sector          Enbridge Inc.                       ENB         0.174
##  8 Energy Sector          Enterprise Products Partners L.P.   EPD         0.186
##  9 Energy Sector          National Fuel Gas Company           NFG         0.206
## 10 Energy Sector          BP Prudhoe Bay Royalty Trust        BPT         0.881
## 11 Energy Sector          U.S. Energy Corp.                   USEG        0.894
## 12 Energy Sector          Centrus Energy Corp.                LEU         0.962
## 13 Financial Sector       Enstar Group Limited                ESGR        0.146
## 14 Financial Sector       Central Securities Corporation      CET         0.153
## 15 Financial Sector       Marsh & McLennan Companies, Inc.    MMC         0.168
## 16 Financial Sector       MBIA Inc.                           MBI         0.620
## 17 Financial Sector       Banco BBVA Argentina S.A.           BBAR        0.679
## 18 Financial Sector       TeraWulf Inc.                       WULF        1.19 
## 19 Healthcare Sector      Amedisys, Inc.                      AMED        0.137
## 20 Healthcare Sector      Utah Medical Products, Inc.         UTMD        0.191
## 21 Healthcare Sector      Cencora, Inc.                       COR         0.191
## 22 Healthcare Sector      Soligenix, Inc.                     SNGX        1.64 
## 23 Healthcare Sector      Windtree Therapeutics, Inc.         WINT        2.01 
## 24 Healthcare Sector      Kazia Therapeutics Limited          KZIA        2.06 
## 25 Industrials Sector     Waste Connections, Inc.             WCN         0.170
## 26 Industrials Sector     Republic Services, Inc.             RSG         0.178
## 27 Industrials Sector     Waste Management, Inc.              WM          0.202
## 28 Industrials Sector     American Superconductor Corporation AMSC        0.889
## 29 Industrials Sector     Plug Power Inc.                     PLUG        0.918
## 30 Industrials Sector     FuelCell Energy, Inc.               FCEL        1.04 
## 31 Technology Sector      Juniper Networks, Inc.              JNPR        0.146
## 32 Technology Sector      Automatic Data Processing, Inc.     ADP         0.189
## 33 Technology Sector      Amdocs Limited                      DOX         0.202
## 34 Technology Sector      MicroStrategy Incorporated          MSTR        1.02 
## 35 Technology Sector      Innodata Inc.                       INOD        1.21 
## 36 Technology Sector      Wolfspeed, Inc.                     WOLF        1.46

————————————————————————————–

8. DATA SELECTION

A simple view of the full data

head(full_data)
##   Symbol Volatility VolatilityClass No             Company Name
## 1   AAPL  0.3220947        Volatile  1               Apple Inc.
## 2     GE  0.3511154        Volatile  1 General Electric Company
## 3    LLY  0.3581632        Volatile  1    Eli Lilly and Company
## 4    WMT  0.2475146          Stable  1             Walmart Inc.
## 5    XOM  0.2396942          Stable  1  Exxon Mobil Corporation
## 6   NVDA  0.6019358        Volatile  2       NVIDIA Corporation
##                   Sector Market Cap % Change    Volume Revenue
## 1      Technology Sector  3,698.01B  -0.0014  10694412 391.04B
## 2     Industrials Sector    187.38B   0.0039   2428942  69.95B
## 3      Healthcare Sector    707.21B   0.0245   4138602  40.86B
## 4 Consumer Staple Sector    730.40B  -0.0056   2734176 673.82B
## 5          Energy Sector    477.57B  -0.0135  13925409 343.82B
## 6      Technology Sector  3,494.48B  -0.0451 166664997 113.27B
##   Market Cap Numeric MarketLeader
## 1        3.69801e+12         TRUE
## 2        1.87380e+11         TRUE
## 3        7.07210e+11         TRUE
## 4        7.30400e+11         TRUE
## 5        4.77570e+11         TRUE
## 6        3.49448e+12         TRUE

The column names of the full data

colnames(full_data)
##  [1] "Symbol"             "Volatility"         "VolatilityClass"   
##  [4] "No"                 "Company Name"       "Sector"            
##  [7] "Market Cap"         "% Change"           "Volume"            
## [10] "Revenue"            "Market Cap Numeric" "MarketLeader"

Random selection of 3 stocks from each group.

There are 24 groups (6 sectors × 2 market-leadership statuses × 2 volatility classes), so 24 × 3 = 72 stocks are expected.

# Set the seed for reproducibility
set.seed(42)

# Load required libraries
library(dplyr)

# Convert the grouping columns to character so the groups are formed consistently
full_data <- full_data %>%
  mutate(
    MarketLeader = as.character(MarketLeader), # or as.factor if you prefer
    VolatilityClass = as.character(VolatilityClass),
    Sector = as.character(Sector)
  )

# Group by Sector, MarketLeader, VolatilityClass and sample 3 from each group
selected_stocks <- full_data %>%
  group_by(Sector, MarketLeader, VolatilityClass) %>%
  slice_sample(n = 3) %>%
  ungroup()

# View the result
print(selected_stocks, n = Inf)
## # A tibble: 69 × 12
##    Symbol Volatility VolatilityClass    No `Company Name`    Sector `Market Cap`
##    <chr>       <dbl> <chr>           <dbl> <chr>             <chr>  <chr>       
##  1 INGR        0.248 Stable             58 Ingredion Incorp… Consu… 8.74B       
##  2 TR          0.244 Stable             91 Tootsie Roll Ind… Consu… 2.23B       
##  3 FLO         0.206 Stable             73 Flowers Foods, I… Consu… 4.27B       
##  4 STKL        0.511 Volatile          123 SunOpta Inc.      Consu… 879.90M     
##  5 JJSF        0.284 Volatile           82 J&J Snack Foods … Consu… 2.87B       
##  6 WILC        0.387 Volatile          162 G. Willi-Food In… Consu… 226.03M     
##  7 TSN         0.220 Stable             32 Tyson Foods, Inc. Consu… 20.15B      
##  8 CHD         0.192 Stable             30 Church & Dwight … Consu… 25.07B      
##  9 GIS         0.220 Stable             24 General Mills, I… Consu… 34.15B      
## 10 ADM         0.272 Volatile           31 Archer-Daniels-M… Consu… 24.10B      
## 11 EL          0.529 Volatile           29 The Estée Laude…  Consu… 27.15B      
## 12 HSY         0.268 Volatile           25 The Hershey Comp… Consu… 33.98B      
## 13 NFG         0.206 Stable             68 National Fuel Ga… Energ… 5.53B       
## 14 SBR         0.232 Stable            156 Sabine Royalty T… Energ… 960.27M     
## 15 NRT         0.519 Volatile          233 North European O… Energ… 42.84M      
## 16 LEU         0.962 Volatile          141 Centrus Energy C… Energ… 1.31B       
## 17 REPX        0.486 Volatile          165 Riley Exploratio… Energ… 693.68M     
## 18 SHEL        0.230 Stable              3 Shell plc         Energ… 202.80B     
## 19 TTE         0.231 Stable              4 TotalEnergies SE  Energ… 132.32B     
## 20 XOM         0.240 Stable              1 Exxon Mobil Corp… Energ… 477.57B     
## 21 EOG         0.289 Volatile           10 EOG Resources, I… Energ… 73.82B      
## 22 WMB         0.259 Volatile           12 The Williams Com… Energ… 72.62B      
## 23 VLO         0.376 Volatile           31 Valero Energy Co… Energ… 44.20B      
## 24 MTG         0.248 Stable            178 MGIC Investment … Finan… 5.97B       
## 25 RLI         0.226 Stable            162 RLI Corp.         Finan… 7.22B       
## 26 ORI         0.209 Stable            147 Old Republic Int… Finan… 8.62B       
## 27 PFBC        0.315 Volatile          364 Preferred Bank    Finan… 1.14B       
## 28 UVSP        0.326 Volatile          401 Univest Financia… Finan… 834.84M     
## 29 GL          0.323 Volatile          140 Globe Life Inc.   Finan… 9.39B       
## 30 RY          0.190 Stable             12 Royal Bank of Ca… Finan… 171.61B     
## 31 CM          0.215 Stable             43 Canadian Imperia… Finan… 59.41B      
## 32 AFL         0.217 Stable             46 Aflac Incorporat… Finan… 57.31B      
## 33 SAN         0.337 Volatile           36 Banco Santander,… Finan… 71.59B      
## 34 ALL         0.256 Volatile           55 The Allstate Cor… Finan… 49.41B      
## 35 BLK         0.254 Volatile           14 BlackRock, Inc.   Finan… 153.97B     
## 36 LH          0.237 Stable             60 Labcorp Holdings… Healt… 20.58B      
## 37 UTMD        0.191 Stable            558 Utah Medical Pro… Healt… 213.40M     
## 38 HOLX        0.224 Stable             71 Hologic, Inc.     Healt… 16.03B      
## 39 INCY        0.345 Volatile           78 Incyte Corporati… Healt… 13.98B      
## 40 CYTH        0.948 Volatile          918 Cyclo Therapeuti… Healt… 23.62M      
## 41 ASRT        0.718 Volatile          720 Assertio Holding… Healt… 80.49M      
## 42 SYK         0.220 Stable             13 Stryker Corporat… Healt… 150.32B     
## 43 NVS         0.195 Stable             11 Novartis AG       Healt… 199.45B     
## 44 ABT         0.206 Stable              8 Abbott Laborator… Healt… 216.86B     
## 45 CNMD        0.409 Volatile          228 CONMED Corporati… Healt… 2.12B       
## 46 NVO         0.426 Volatile            3 Novo Nordisk A/S  Healt… 389.38B     
## 47 RDNT        0.451 Volatile          142 RadNet, Inc.      Healt… 4.74B       
## 48 NPK         0.248 Stable            379 National Presto … Indus… 674.53M     
## 49 MSA         0.234 Stable            166 MSA Safety Incor… Indus… 6.30B       
## 50 RGR         0.248 Stable            387 Sturm, Ruger & C… Indus… 579.79M     
## 51 WCC         0.444 Volatile          128 WESCO Internatio… Indus… 8.95B       
## 52 FSTR        0.544 Volatile          439 L.B. Foster Comp… Indus… 289.71M     
## 53 MTRX        0.501 Volatile          426 Matrix Service C… Indus… 345.54M     
## 54 GGG         0.227 Stable             91 Graco Inc.        Indus… 13.88B      
## 55 UNP         0.232 Stable              5 Union Pacific Co… Indus… 140.40B     
## 56 HON         0.235 Stable              4 Honeywell Intern… Indus… 143.98B     
## 57 LECO        0.336 Volatile          112 Lincoln Electric… Indus… 10.49B      
## 58 FIX         0.603 Volatile           79 Comfort Systems … Indus… 15.99B      
## 59 MMM         0.334 Volatile           18 3M Company        Indus… 72.06B      
## 60 DOX         0.202 Stable            157 Amdocs Limited    Techn… 9.52B       
## 61 MKSI        0.608 Volatile          174 MKS Instruments,… Techn… 7.69B       
## 62 PRGS        0.315 Volatile          282 Progress Softwar… Techn… 2.76B       
## 63 UPBD        0.386 Volatile          337 Upbound Group, I… Techn… 1.61B       
## 64 PAYX        0.227 Stable             53 Paychex, Inc.     Techn… 50.17B      
## 65 FI          0.246 Stable             28 Fiserv, Inc.      Techn… 116.64B     
## 66 GIB         0.219 Stable             78 CGI Inc.          Techn… 24.48B      
## 67 ADBE        0.375 Volatile           15 Adobe Inc.        Techn… 188.72B     
## 68 HPQ         0.395 Volatile           65 HP Inc.           Techn… 31.96B      
## 69 INTU        0.334 Volatile           18 Intuit Inc.       Techn… 173.34B     
## # ℹ 5 more variables: `% Change` <chr>, Volume <chr>, Revenue <chr>,
## #   `Market Cap Numeric` <dbl>, MarketLeader <chr>

Although 72 stocks were expected, only 69 were selected. In the output above, the Energy non-leader stable group contains only 2 stocks and the Technology non-leader stable group only 1, and slice_sample() returns all available rows when a group has fewer than the requested 3.
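
The shortfall from 72 to 69 can be checked by counting the rows in each group before sampling. The following is a minimal sketch on hypothetical toy data (`toy`, `Group`, and `Symbol` are illustrative names). It shows that `slice_sample(n = 3)` silently truncates to the group size, so the sample size equals the sum of `min(group size, 3)` over all groups.

```r
library(dplyr)

# Hypothetical toy data: group "B" has only 2 rows and group "C" only 1
toy <- data.frame(
  Group = c(rep("A", 5), rep("B", 2), "C"),
  Symbol = paste0("S", 1:8)
)

set.seed(42)
sampled <- toy %>%
  group_by(Group) %>%
  slice_sample(n = 3) %>%  # returns fewer than 3 rows for small groups
  ungroup()

# Rows available per group and rows that sampling can actually return
group_sizes <- toy %>%
  count(Group, name = "Available") %>%
  mutate(Sampled = pmin(Available, 3))

# Groups that cannot supply the requested 3 rows
small_groups <- filter(group_sizes, Available < 3)

print(group_sizes)
print(small_groups)
```

Applied to `full_data`, the same check would use `count(Sector, MarketLeader, VolatilityClass)` in place of `count(Group)`.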

library(dplyr)

# 1. See all unique company names
unique_companies <- unique(selected_stocks$`Company Name`)
print(unique_companies)
##  [1] "Ingredion Incorporated"                
##  [2] "Tootsie Roll Industries, Inc."         
##  [3] "Flowers Foods, Inc."                   
##  [4] "SunOpta Inc."                          
##  [5] "J&J Snack Foods Corp."                 
##  [6] "G. Willi-Food International Ltd."      
##  [7] "Tyson Foods, Inc."                     
##  [8] "Church & Dwight Co., Inc."             
##  [9] "General Mills, Inc."                   
## [10] "Archer-Daniels-Midland Company"        
## [11] "The Estée Lauder Companies Inc."       
## [12] "The Hershey Company"                   
## [13] "National Fuel Gas Company"             
## [14] "Sabine Royalty Trust"                  
## [15] "North European Oil Royalty Trust"      
## [16] "Centrus Energy Corp."                  
## [17] "Riley Exploration Permian, Inc."       
## [18] "Shell plc"                             
## [19] "TotalEnergies SE"                      
## [20] "Exxon Mobil Corporation"               
## [21] "EOG Resources, Inc."                   
## [22] "The Williams Companies, Inc."          
## [23] "Valero Energy Corporation"             
## [24] "MGIC Investment Corporation"           
## [25] "RLI Corp."                             
## [26] "Old Republic International Corporation"
## [27] "Preferred Bank"                        
## [28] "Univest Financial Corporation"         
## [29] "Globe Life Inc."                       
## [30] "Royal Bank of Canada"                  
## [31] "Canadian Imperial Bank of Commerce"    
## [32] "Aflac Incorporated"                    
## [33] "Banco Santander, S.A."                 
## [34] "The Allstate Corporation"              
## [35] "BlackRock, Inc."                       
## [36] "Labcorp Holdings Inc."                 
## [37] "Utah Medical Products, Inc."           
## [38] "Hologic, Inc."                         
## [39] "Incyte Corporation"                    
## [40] "Cyclo Therapeutics, Inc."              
## [41] "Assertio Holdings, Inc."               
## [42] "Stryker Corporation"                   
## [43] "Novartis AG"                           
## [44] "Abbott Laboratories"                   
## [45] "CONMED Corporation"                    
## [46] "Novo Nordisk A/S"                      
## [47] "RadNet, Inc."                          
## [48] "National Presto Industries, Inc."      
## [49] "MSA Safety Incorporated"               
## [50] "Sturm, Ruger & Company, Inc."          
## [51] "WESCO International, Inc."             
## [52] "L.B. Foster Company"                   
## [53] "Matrix Service Company"                
## [54] "Graco Inc."                            
## [55] "Union Pacific Corporation"             
## [56] "Honeywell International Inc."          
## [57] "Lincoln Electric Holdings, Inc."       
## [58] "Comfort Systems USA, Inc."             
## [59] "3M Company"                            
## [60] "Amdocs Limited"                        
## [61] "MKS Instruments, Inc."                 
## [62] "Progress Software Corporation"         
## [63] "Upbound Group, Inc."                   
## [64] "Paychex, Inc."                         
## [65] "Fiserv, Inc."                          
## [66] "CGI Inc."                              
## [67] "Adobe Inc."                            
## [68] "HP Inc."                               
## [69] "Intuit Inc."
# 2. Create the summary table
company_table <- selected_stocks %>%
  select(Sector, MarketLeader, VolatilityClass, `Company Name`, Symbol) %>%
  mutate(
    # MarketLeader was converted to character above, so compare against the string "TRUE"
    `Market leadership status` = ifelse(MarketLeader == "TRUE", "Leader", "Non-Leader"),
    `Volatility status` = VolatilityClass
  ) %>%
  select(Sector, `Market leadership status`, `Volatility status`, `Company Name`, Symbol) %>%
  distinct()  # remove duplicates if any

# View the summary table
print(company_table, n = Inf)
## # A tibble: 69 × 5
##    Sector       Market leadership st…¹ `Volatility status` `Company Name` Symbol
##    <chr>        <chr>                  <chr>               <chr>          <chr> 
##  1 Consumer St… Non-Leader             Stable              Ingredion Inc… INGR  
##  2 Consumer St… Non-Leader             Stable              Tootsie Roll … TR    
##  3 Consumer St… Non-Leader             Stable              Flowers Foods… FLO   
##  4 Consumer St… Non-Leader             Volatile            SunOpta Inc.   STKL  
##  5 Consumer St… Non-Leader             Volatile            J&J Snack Foo… JJSF  
##  6 Consumer St… Non-Leader             Volatile            G. Willi-Food… WILC  
##  7 Consumer St… Leader                 Stable              Tyson Foods, … TSN   
##  8 Consumer St… Leader                 Stable              Church & Dwig… CHD   
##  9 Consumer St… Leader                 Stable              General Mills… GIS   
## 10 Consumer St… Leader                 Volatile            Archer-Daniel… ADM   
## 11 Consumer St… Leader                 Volatile            The Estée La…  EL    
## 12 Consumer St… Leader                 Volatile            The Hershey C… HSY   
## 13 Energy Sect… Non-Leader             Stable              National Fuel… NFG   
## 14 Energy Sect… Non-Leader             Stable              Sabine Royalt… SBR   
## 15 Energy Sect… Non-Leader             Volatile            North Europea… NRT   
## 16 Energy Sect… Non-Leader             Volatile            Centrus Energ… LEU   
## 17 Energy Sect… Non-Leader             Volatile            Riley Explora… REPX  
## 18 Energy Sect… Leader                 Stable              Shell plc      SHEL  
## 19 Energy Sect… Leader                 Stable              TotalEnergies… TTE   
## 20 Energy Sect… Leader                 Stable              Exxon Mobil C… XOM   
## 21 Energy Sect… Leader                 Volatile            EOG Resources… EOG   
## 22 Energy Sect… Leader                 Volatile            The Williams … WMB   
## 23 Energy Sect… Leader                 Volatile            Valero Energy… VLO   
## 24 Financial S… Non-Leader             Stable              MGIC Investme… MTG   
## 25 Financial S… Non-Leader             Stable              RLI Corp.      RLI   
## 26 Financial S… Non-Leader             Stable              Old Republic … ORI   
## 27 Financial S… Non-Leader             Volatile            Preferred Bank PFBC  
## 28 Financial S… Non-Leader             Volatile            Univest Finan… UVSP  
## 29 Financial S… Non-Leader             Volatile            Globe Life In… GL    
## 30 Financial S… Leader                 Stable              Royal Bank of… RY    
## 31 Financial S… Leader                 Stable              Canadian Impe… CM    
## 32 Financial S… Leader                 Stable              Aflac Incorpo… AFL   
## 33 Financial S… Leader                 Volatile            Banco Santand… SAN   
## 34 Financial S… Leader                 Volatile            The Allstate … ALL   
## 35 Financial S… Leader                 Volatile            BlackRock, In… BLK   
## 36 Healthcare … Non-Leader             Stable              Labcorp Holdi… LH    
## 37 Healthcare … Non-Leader             Stable              Utah Medical … UTMD  
## 38 Healthcare … Non-Leader             Stable              Hologic, Inc.  HOLX  
## 39 Healthcare … Non-Leader             Volatile            Incyte Corpor… INCY  
## 40 Healthcare … Non-Leader             Volatile            Cyclo Therape… CYTH  
## 41 Healthcare … Non-Leader             Volatile            Assertio Hold… ASRT  
## 42 Healthcare … Leader                 Stable              Stryker Corpo… SYK   
## 43 Healthcare … Leader                 Stable              Novartis AG    NVS   
## 44 Healthcare … Leader                 Stable              Abbott Labora… ABT   
## 45 Healthcare … Leader                 Volatile            CONMED Corpor… CNMD  
## 46 Healthcare … Leader                 Volatile            Novo Nordisk … NVO   
## 47 Healthcare … Leader                 Volatile            RadNet, Inc.   RDNT  
## 48 Industrials… Non-Leader             Stable              National Pres… NPK   
## 49 Industrials… Non-Leader             Stable              MSA Safety In… MSA   
## 50 Industrials… Non-Leader             Stable              Sturm, Ruger … RGR   
## 51 Industrials… Non-Leader             Volatile            WESCO Interna… WCC   
## 52 Industrials… Non-Leader             Volatile            L.B. Foster C… FSTR  
## 53 Industrials… Non-Leader             Volatile            Matrix Servic… MTRX  
## 54 Industrials… Leader                 Stable              Graco Inc.     GGG   
## 55 Industrials… Leader                 Stable              Union Pacific… UNP   
## 56 Industrials… Leader                 Stable              Honeywell Int… HON   
## 57 Industrials… Leader                 Volatile            Lincoln Elect… LECO  
## 58 Industrials… Leader                 Volatile            Comfort Syste… FIX   
## 59 Industrials… Leader                 Volatile            3M Company     MMM   
## 60 Technology … Non-Leader             Stable              Amdocs Limited DOX   
## 61 Technology … Non-Leader             Volatile            MKS Instrumen… MKSI  
## 62 Technology … Non-Leader             Volatile            Progress Soft… PRGS  
## 63 Technology … Non-Leader             Volatile            Upbound Group… UPBD  
## 64 Technology … Leader                 Stable              Paychex, Inc.  PAYX  
## 65 Technology … Leader                 Stable              Fiserv, Inc.   FI    
## 66 Technology … Leader                 Stable              CGI Inc.       GIB   
## 67 Technology … Leader                 Volatile            Adobe Inc.     ADBE  
## 68 Technology … Leader                 Volatile            HP Inc.        HPQ   
## 69 Technology … Leader                 Volatile            Intuit Inc.    INTU  
## # ℹ abbreviated name: ¹​`Market leadership status`

Although 72 stocks were expected, 3 could not be selected because two groups contain fewer than 3 stocks, which leaves 69. The remaining parts of the analysis (data preparation for modeling, modeling, model evaluation, and results) are conducted in Python.