Loading data

Date      Ticker  Open Price  Close Price  High Price  Low Price  Volume Traded   Market Cap  PE Ratio  Dividend Yield    EPS  52 Week High  52 Week Low  Sector
1/6/2025  SLH          34.92        34.53       35.22      34.38        2966611  57381363839     29.63            2.85   1.17         39.39        28.44  Industrials
1/6/2025  WGB         206.50       208.45      210.51     205.12        1658738  52747068434     13.03            2.73  16.00        227.38       136.79  Energy
1/6/2025  ZIN         125.10       124.03      127.40     121.77       10709898  55969486767     29.19            2.64   4.25        138.35       100.69  Healthcare
1/6/2025  YPY         260.55       265.28      269.99     256.64       14012358  79640893981     19.92            1.29  13.32        317.57       178.26  Industrials
1/6/2025  VKD         182.43       186.89      189.40     179.02       14758143  72714371067     40.18            1.17   4.65        243.54       165.53  Technology
1/6/2025  PJO          71.47        71.20       72.79      71.07        5719915  85866943557     30.51            2.71   2.33         82.93        56.81  Consumer Staples

We study a stock market dataset with several risk factors, analyzing market capitalization and traded volume by sector and by ticker.

Interpretation & Reporting

1. Which variables matter most for Close Price?

2. Does Volume significantly explain movements?

3. Are there sector differences?

4. Are returns predictable at all, or mostly random?

Visualizations

library(tidyverse)  # provides %>%, ggplot2, dplyr and tidyr used throughout
data %>% ggplot(aes(x=Sector,y=`Market Cap`))+
  geom_violin(col="violet")

This violin plot shows the distribution of market capitalization within each sector. The range is wider for the Materials, Real Estate, and Energy sectors than for Financials. Observations are most concentrated in the Technology sector, even though its range is limited.

ggplot(data,aes(x=`Volume Traded`))+
  geom_density(fill="lightblue",color="darkblue")+
  labs(title="Density Plot",x="Volume Traded",y="Density")+
  theme_classic()+
  theme(legend.position="none")

This density plot shows the distribution of traded volume. The highest peaks lie at volumes between e+ and 2e+, where the density reaches about 8e-08, so trading is most concentrated in this range. The distribution is bimodal and its spread is moderate.

data %>% ggplot(aes(x=`PE Ratio`,y=Sector))+
  geom_jitter(alpha=0.5,color="lightblue")

Based on the plot, the highest PE ratios are in the Technology sector and the lowest are in the Energy and Utilities sectors. Investors tend to reward growth sectors (Technology) with higher valuations, while low-PE sectors (Energy, Utilities) are considered “value” sectors: less growth, more stable dividends.

~ The PE ratio reflects the market’s valuation of a company relative to its earnings.

We therefore compare each company’s PE ratio with its sector average to assess whether the stock is overvalued or undervalued.

We also compute the average EPS per ticker and per sector, which later helps in understanding each company’s profitability and its potential for future growth.

~ EPS is a key indicator of a company’s financial performance and its ability to generate profits.

Financial Analysis

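Data preparation

library(lubridate)  # provides dmy()
# dmy() reads dates as day/month/year; use mdy() instead if the file stores the month first
data$Date <- dmy(data$Date)
data <- data %>% arrange(Date)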

EPS & PE Ratio Averages

sector_summary <- data %>% group_by(Sector) %>% 
  summarize(Average_PE_Ratio_Sector=mean(`PE Ratio`,na.rm=TRUE),
            Average_EPS_Sector=mean(EPS,na.rm=TRUE))

ticker_summary <-data %>% group_by(Ticker) %>% 
  summarize(Average_PE_Ratio_Ticker=mean(`PE Ratio`,na.rm=TRUE),
            Average_EPS_Ticker=mean(EPS,na.rm=TRUE))

Joining these new measures with our data.

data <- left_join(data,sector_summary,by="Sector")
data <-left_join(data,ticker_summary,by="Ticker")

Company’s Valuation

data <- data %>% mutate(PE_eval=if_else(`PE Ratio` > Average_PE_Ratio_Sector,
                                          "Overvalued","Undervalued"))
# historical_volatility is the 52-week price range (high minus low)
# Strength_percentage is the distance of the close from the 52-week high, as a % of that range
data <- data %>% mutate(historical_volatility=`52 Week High` - `52 Week Low`,
                        Strength_percentage=((`52 Week High`-`Close Price`)/historical_volatility)*100,
                        Stock_Strength=case_when(
                          Strength_percentage < 50 ~ "Bullish Momentum",
                          Strength_percentage == 50 ~ "Middle range",
                          Strength_percentage > 50 ~ "Bearish Momentum"))

We have now calculated several measures. PE_eval compares each company’s PE ratio with its sector average, which reflects investor sentiment about the company’s future earnings. Stock_Strength classifies each stock by where its closing price sits within its 52-week range. Next, we filter the data to build an index for the Technology sector.
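As a quick check of the stock-strength formula, the SLH row from the data preview can be worked through by hand. The variable names below are illustrative only and are not part of the analysis.

```r
# Worked example of Strength_percentage using the SLH row from the data preview.
close_price <- 34.53
high_52w    <- 39.39
low_52w     <- 28.44

# 52-week range (called historical_volatility in the analysis)
range_52w <- high_52w - low_52w

# Distance of the close from the 52-week high, as a share of the range
strength_pct <- (high_52w - close_price) / range_52w * 100

# Same thresholds as the case_when() call, written in base R
stock_strength <- if (strength_pct < 50) {
  "Bullish Momentum"
} else if (strength_pct > 50) {
  "Bearish Momentum"
} else {
  "Middle range"
}

round(strength_pct, 2)   # 44.38
stock_strength           # "Bullish Momentum"
```

A value below 50 means the close sits in the upper half of the 52-week range, which is why it is labeled bullish.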

Filtering the Technology sector and visualizing its average Close Price

tech_index <- data %>%
  filter(Sector == "Technology") %>%
  group_by(Date) %>%
  summarise(Sector_Avg = mean(`Close Price`, na.rm = TRUE))

Plot aggregated index

ggplot(tech_index, aes(x = Date, y = Sector_Avg,group=1)) +
  geom_line(color = "blue", linewidth = 1) +
  scale_x_date(date_labels="%d/%m/%Y",date_breaks="1 day")+
  labs(title = "Average Stock Price Over Time-Technology Sector",
       y = "Average Close Price", x = "Date") +
  theme_minimal()

Time Series Analysis

Preparing data

techdata <- data %>% filter(Sector=="Technology")

library(zoo)
library(xts)
techdata <- na.omit(techdata)
stock_xts <- xts(techdata$`Close Price`,order.by=techdata$Date)
colnames(stock_xts)<- c("Close")

Deep Volatility

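# Note: this is the rolling standard deviation of Dividend Yield over 21 rows of techdata,
# not the volatility of price returns
rolling_volatility <- rollapply(techdata$`Dividend Yield`,
                                width=21,FUN=sd,fill=NA)
plot(rolling_volatility,main="Rolling SD of Dividend Yield in Tech",col="red")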
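Volatility is more commonly measured as the rolling standard deviation of log returns than of a level series such as dividend yield. The following is a minimal base R sketch on made-up prices; the `prices` vector and the window of 5 are assumptions for illustration, not values from the dataset.

```r
# Hypothetical closing prices for a single ticker (illustration only)
prices <- c(100, 101.5, 99.8, 102.3, 103.1, 101.9, 104.4, 105.0, 103.7, 106.2)

# Daily log returns: r_t = log(P_t / P_{t-1})
log_returns <- diff(log(prices))

# Rolling standard deviation over a window of `width` returns
width <- 5
n_windows <- length(log_returns) - width + 1
rolling_sd <- sapply(seq_len(n_windows), function(i) {
  sd(log_returns[i:(i + width - 1)])
})

rolling_sd
```

With real data this would be applied per ticker, since mixing rows from different companies in one window measures cross-sectional dispersion rather than volatility over time.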

Seasonal Analysis

decomposed <- decompose(ts(stock_xts$Close,frequency=9))
plot(decomposed)

We have here:

1. The original time series: shows the overall behavior of the stock price in the Technology sector, including its ups and downs over the recorded dates. The goal is to understand what drives the price movement.

2. Trend component: represents the long-term direction of the time series.

3. Seasonal component: helps to identify recurring patterns in the data. In our data, built with frequency=9, it repeats every 9 observations and fluctuates several times within each cycle.

4. Random component: represents the unpredictable or random variations in the data.
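The additive decomposition splits the observed series so that trend + seasonal + random adds back up to the original values. A small sketch on a synthetic series (all numbers here are made up; only `decompose()` and the period of 9 come from the analysis):

```r
# Synthetic series: linear trend + repeating pattern of period 9 + noise
set.seed(1)
period <- 9
n_obs  <- period * 6
pattern <- rep(c(-2, -1, 0, 1, 2, 1, 0, -1, 0), times = 6)
x <- ts(50 + 0.3 * seq_len(n_obs) + pattern + rnorm(n_obs, sd = 0.5),
        frequency = period)

parts <- decompose(x)  # additive decomposition by default

# The three components add back up to the observed series
rebuilt <- parts$trend + parts$seasonal + parts$random
ok <- !is.na(rebuilt)  # the moving-average trend is NA at both ends
all.equal(as.numeric(rebuilt[ok]), as.numeric(x[ok]))

# One seasonal value per position in the cycle
length(parts$figure)
```

The seasonal component is simply the per-position average deviation from the trend, so its meaning depends entirely on what one cycle of 9 observations represents in the data.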

Time Series Modeling and Forecasting

library(forecast)
stock_xts$Close <- as.numeric(stock_xts$Close)
stock_ts <- ts(stock_xts$Close)

arima_model <- auto.arima(stock_ts)
forecasted_values <- forecast(arima_model,h=1)
plot(forecasted_values,xaxt="n",yaxt="n",main="Forecast of Stock Prices In Technology",col="red")
axis(1,at=seq(10,11,by=1),labels=seq(10,11,by=1))

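Our ARIMA model gives an approximate one-step-ahead forecast (the 10th trading day) for the Technology sector. Note that stock_ts stacks the closing prices of all Technology tickers (companies) in date order, so the forecast is for the next observation of that pooled series rather than for a single company.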
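A useful baseline when judging an ARIMA forecast is the random walk, ARIMA(0,1,0), whose one-step forecast is simply the last observed value. The sketch below uses base `stats::arima` rather than `auto.arima`, on made-up prices (the `avg_close` vector is an assumption for illustration).

```r
# Hypothetical daily average closing prices (illustration only)
avg_close <- c(182.4, 183.1, 181.7, 184.0, 185.2, 184.6, 186.1, 185.5, 187.0)

# ARIMA(0,1,0) is a random walk without drift: differences are white noise
rw_fit <- arima(avg_close, order = c(0, 1, 0))

# One-step-ahead forecast and its standard error
rw_pred <- predict(rw_fit, n.ahead = 1)

as.numeric(rw_pred$pred)  # equals the last observed value
as.numeric(rw_pred$se)    # one-step forecast standard error
```

If a fitted model cannot beat this baseline out of sample, the series is behaving much like a random walk over the observed window.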

Correlation Analysis

numeric_cols <- sapply(data,is.numeric)
data_numeric <- data[,numeric_cols]

correlation_matrix <- cor(data_numeric,use="pairwise.complete.obs")
library(corrplot)
corrplot(correlation_matrix,method="color",type="upper",tl.col="black",tl.srt=45)

Close Price, Open Price, Low Price, and High Price are highly correlated with one another. There also appears to be a fairly high correlation between these four price variables and Dividend Yield.

Correlation between sectors

Do the returns of sector A and sector B move together?

sector_returns <- data %>%
  arrange(Sector,Date) %>%
  group_by(Sector,Date) %>%
  summarise(Avg_Close=mean(`Close Price`), .groups="drop") %>%
  group_by(Sector) %>%
  mutate(Return=log(Avg_Close/lag(Avg_Close))) %>%
  dplyr::select(Date, Sector, Return)

sector_matrix <- sector_returns %>%
  pivot_wider(names_from=Sector,values_from=Return)

cor_matrix <-cor(sector_matrix[,-1], use="complete.obs")
library(corrplot)
corrplot(cor_matrix,method="color",type="upper",tl.col="black",tl.srt=45)

From this plot, we observe only weak correlations between sectors, so there seems to be no strong association or influence between sectors.
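With so few return observations, only fairly large correlations can be distinguished from zero. The sketch below computes the smallest absolute correlation that is significant at the 5% level (two-sided). The value n = 19 is an assumption, inferred from the VAR sample size of 18 plus one lag, and the helper name `critical_r` is hypothetical.

```r
# Smallest absolute correlation significant at level alpha (two-sided),
# from the t-test for a Pearson correlation with n paired observations.
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(df + t_crit^2)
}

critical_r(19)   # about 0.46
critical_r(100)  # about 0.20
```

Under that assumption, sector pairs with correlations below roughly 0.46 in absolute value cannot be told apart from uncorrelated series.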

Dynamic influence (VAR model)

library(vars)
sector_ts <- sector_matrix %>% 
  dplyr::select(all_of(c("Technology","Financials")))

sector_ts_num <- sector_ts %>% 
  dplyr::select(where(is.numeric)) %>% 
  na.omit()

var_model <- VAR(sector_ts_num, p=1, type="const")
summary(var_model)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Technology, Financials 
## Deterministic variables: const 
## Sample size: 18 
## Log Likelihood: -3.72 
## Roots of the characteristic polynomial:
## 0.465 0.465
## Call:
## VAR(y = sector_ts_num, p = 1, type = "const")
## 
## 
## Estimation results for equation Technology: 
## =========================================== 
## Technology = Technology.l1 + Financials.l1 + const 
## 
##               Estimate Std. Error t value Pr(>|t|)  
## Technology.l1 -0.37453    0.19575  -1.913   0.0750 .
## Financials.l1 -0.64558    0.23977  -2.692   0.0167 *
## const         -0.03940    0.06839  -0.576   0.5731  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.2882 on 15 degrees of freedom
## Multiple R-Squared: 0.4507,  Adjusted R-squared: 0.3775 
## F-statistic: 6.154 on 2 and 15 DF,  p-value: 0.01118 
## 
## 
## Estimation results for equation Financials: 
## =========================================== 
## Financials = Technology.l1 + Financials.l1 + const 
## 
##               Estimate Std. Error t value Pr(>|t|)  
## Technology.l1  0.04951    0.20432   0.242   0.8118  
## Financials.l1 -0.49203    0.25028  -1.966   0.0681 .
## const          0.01104    0.07138   0.155   0.8791  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.3008 on 15 degrees of freedom
## Multiple R-Squared: 0.2049,  Adjusted R-squared: 0.09886 
## F-statistic: 1.932 on 2 and 15 DF,  p-value: 0.1792 
## 
## 
## 
## Covariance matrix of residuals:
##            Technology Financials
## Technology   0.083056  -0.007271
## Financials  -0.007271   0.090492
## 
## Correlation matrix of residuals:
##            Technology Financials
## Technology    1.00000   -0.08387
## Financials   -0.08387    1.00000

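The VAR adds some nuance. In the Technology equation, the lagged Financials return has a negative coefficient that is significant at the 5% level (-0.646, p = 0.0167), and the equation as a whole is significant (F-test p = 0.011). In the Financials equation, no coefficient is significant at the 5% level (F-test p = 0.179). The residual correlation is close to zero (-0.084), so there is no evidence of a contemporaneous relation. With a sample size of only 18, these results should be read with caution.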

causality(var_model, cause="Technology")
## $Granger
## 
##  Granger causality H0: Technology do not Granger-cause Financials
## 
## data:  VAR object var_model
## F-Test = 0.058709, df1 = 1, df2 = 30, p-value = 0.8102
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: Technology and Financials
## 
## data:  VAR object var_model
## Chi-squared = 0.12573, df = 1, p-value = 0.7229

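Neither null hypothesis is rejected: Technology does not Granger-cause Financials (p = 0.81), and there is no evidence of instantaneous causality between Technology and Financials (p = 0.72). Note that this call only tests Technology as the cause; the reverse direction would be tested with cause="Financials".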
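The Granger test amounts to asking whether lagged values of one series improve a regression of the other series on its own lags. A minimal base R sketch on simulated data follows; the series `x` and `y` and their coefficients are made up, and this mirrors the idea rather than the exact statistic computed by `causality()`.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- numeric(n)
for (i in 2:n) {
  # y depends on its own lag and on lagged x, so x should Granger-cause y
  y[i] <- 0.3 * y[i - 1] + 0.5 * x[i - 1] + rnorm(1)
}

# Align current values with one-period lags
lagged <- data.frame(
  y      = y[-1],
  y_lag1 = y[-n],
  x_lag1 = x[-n]
)

restricted   <- lm(y ~ y_lag1, data = lagged)           # own lag only
unrestricted <- lm(y ~ y_lag1 + x_lag1, data = lagged)  # adds lagged x

# F-test of the restriction "coefficient on x_lag1 is zero"
granger_test <- anova(restricted, unrestricted)
granger_test$`Pr(>F)`[2]
```

A small p-value rejects the hypothesis that lagged `x` adds no predictive information for `y`.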

Checking Relations & Associations within variables

cor(techdata$`Dividend Yield`,techdata$historical_volatility)
## [1] 0.06508285

The correlation between Dividend Yield and historical volatility is very weak (about 0.07).

Studying risk factors

library(rpart)
library(rpart.plot)
rpart(`Close Price` ~ ., data=data)
## n= 1762 
## 
## node), split, n, deviance, yval
##       * denotes terminal node
## 
##  1) root 1762 11906740.00 157.56710  
##    2) Low Price< 155.99 902  1516341.00  88.43042  
##      4) Low Price< 86.395 450   198331.10  52.96964  
##        8) Low Price< 52.26 225    30838.72  35.21293 *
##        9) Low Price>=52.26 225    25607.00  70.72636 *
##      5) Low Price>=86.395 452   188793.90 123.73430  
##       10) Low Price< 119.57 203    18471.75 104.07440 *
##       11) Low Price>=119.57 249    27893.27 139.76220 *
##    3) Low Price>=155.99 860  1556950.00 230.08010  
##      6) Low Price< 225.42 434   190932.20 194.47820  
##       12) Low Price< 189.355 201    17849.26 174.75570 *
##       13) Low Price>=189.355 233    27452.62 211.49200 *
##      7) Low Price>=225.42 426   255498.40 266.35070  
##       14) Low Price< 261.855 224    24629.84 248.35800 *
##       15) Low Price>=261.855 202    77937.65 286.30290 *
model <- rpart(`Close Price` ~ ., data=data)
rpart.plot(model)

Based on the regression tree fitted on the full dataset, Low Price is the only variable used for splitting, which makes it the most important factor for the close price.

Modeling

Visualizing

data %>% ggplot(aes(x=`Low Price`,y=`Close Price`))+
  geom_point(color="blue")+
  geom_smooth(method="lm",
                se=FALSE,
                color="lightblue")

So yes, there is clearly a strong linear relationship between Low Price and Close Price. Let’s model it.

Regression Prediction Model across techdata

Model1 <- lm(`Close Price`~`Low Price`,data=techdata)
summary(Model1)
## 
## Call:
## lm(formula = `Close Price` ~ `Low Price`, data = techdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8454 -0.9995 -0.1417  0.7071  5.8020 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.29098    0.32700   -0.89    0.375    
## `Low Price`  1.02053    0.00189  539.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.873 on 153 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 2.916e+05 on 1 and 153 DF,  p-value: < 2.2e-16

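The Low Price coefficient (1.02) is highly significant, and the model explains about 99.95% of the variance in Close Price.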

Model2 <- lm(`Close Price`~`Low Price`+`High Price`+`Open Price`,data=techdata)
summary(Model2)
## 
## Call:
## lm(formula = `Close Price` ~ `Low Price` + `High Price` + `Open Price`, 
##     data = techdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.02469 -0.69506  0.08188  0.62233  2.42472 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.25991    0.19618  -1.325    0.187    
## `Low Price`   0.80533    0.05348  15.060   <2e-16 ***
## `High Price`  0.87161    0.05370  16.232   <2e-16 ***
## `Open Price` -0.67619    0.05511 -12.269   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.115 on 151 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9998 
## F-statistic: 2.744e+05 on 3 and 151 DF,  p-value: < 2.2e-16

All three price predictors are highly significant, and the residual standard error drops from 1.873 to 1.115.

Model3 <- lm(`Close Price`~`Low Price`+`High Price`+`Open Price`+`Dividend Yield`
             +EPS+`PE Ratio`+`historical_volatility`,data=techdata)
summary(Model3)
## 
## Call:
## lm(formula = `Close Price` ~ `Low Price` + `High Price` + `Open Price` + 
##     `Dividend Yield` + EPS + `PE Ratio` + historical_volatility, 
##     data = techdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.96431 -0.73228  0.06593  0.69880  2.50560 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -1.140662   0.846017  -1.348    0.180    
## `Low Price`            0.803045   0.054404  14.761   <2e-16 ***
## `High Price`           0.866563   0.054655  15.855   <2e-16 ***
## `Open Price`          -0.674863   0.055841 -12.085   <2e-16 ***
## `Dividend Yield`       0.072147   0.159594   0.452    0.652    
## EPS                    0.159989   0.135958   1.177    0.241    
## `PE Ratio`             0.024839   0.025678   0.967    0.335    
## historical_volatility  0.002091   0.004560   0.459    0.647    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.122 on 147 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9998 
## F-statistic: 1.162e+05 on 7 and 147 DF,  p-value: < 2.2e-16

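None of the additional financial indicators (Dividend Yield, EPS, PE Ratio, historical_volatility) is significant, and the residual standard error does not improve (1.122 versus 1.115 for Model2).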
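Whether extra indicators add anything can also be tested jointly with an F-test between nested models. The sketch below runs on simulated data, so every variable in it is made up; with the fitted objects from this section the analogous call would be `anova(Model2, Model3)`.

```r
set.seed(7)
n <- 150
low_price   <- runif(n, 50, 300)
high_price  <- low_price * runif(n, 1.01, 1.05)
noise_indicator <- rnorm(n)  # unrelated to the response by construction
close_price <- 0.5 * low_price + 0.5 * high_price + rnorm(n)

small_model <- lm(close_price ~ low_price + high_price)
big_model   <- lm(close_price ~ low_price + high_price + noise_indicator)

# Joint F-test: does the larger model reduce the residual sum of squares
# by more than chance would explain?
comparison <- anova(small_model, big_model)
comparison

# Adding a predictor can never increase the residual sum of squares
comparison$RSS
```

A large p-value in the second row supports keeping the smaller model.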

Final check: the model on the overall data

Model <- lm(`Close Price` ~`Low Price`+`High Price`+`Open Price`+
              `Dividend Yield`+`Average_EPS_Ticker`+`Average_EPS_Sector`+
              `Average_PE_Ratio_Sector`+
               `Average_PE_Ratio_Ticker`+`historical_volatility`+
               `Strength_percentage`,
                data=data)
summary(Model)
## 
## Call:
## lm(formula = `Close Price` ~ `Low Price` + `High Price` + `Open Price` + 
##     `Dividend Yield` + Average_EPS_Ticker + Average_EPS_Sector + 
##     Average_PE_Ratio_Sector + Average_PE_Ratio_Ticker + historical_volatility + 
##     Strength_percentage, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0484 -0.7011  0.0193  0.6680  5.1191 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -1.278461   0.929307  -1.376   0.1691    
## `Low Price`              0.836431   0.017446  47.944   <2e-16 ***
## `High Price`             0.837718   0.017350  48.284   <2e-16 ***
## `Open Price`            -0.676160   0.018186 -37.180   <2e-16 ***
## `Dividend Yield`         0.025081   0.030118   0.833   0.4051    
## Average_EPS_Ticker       0.021772   0.015298   1.423   0.1549    
## Average_EPS_Sector       0.051889   0.049570   1.047   0.2953    
## Average_PE_Ratio_Sector  0.025952   0.023743   1.093   0.2745    
## Average_PE_Ratio_Ticker  0.002693   0.007095   0.380   0.7043    
## historical_volatility    0.002689   0.001382   1.946   0.0518 .  
## Strength_percentage      0.001934   0.001835   1.054   0.2922    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.27 on 1751 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9998 
## F-statistic: 7.385e+05 on 10 and 1751 DF,  p-value: < 2.2e-16

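For the full trading data, with all tickers and sectors, the primary price indicators (Low, High, Open) remain the only clearly significant predictors. historical_volatility is only marginally significant (p = 0.052), so its influence on the close price is slight at best. Because Low, High, and Open are same-day prices, this fit describes how the close sits within the day’s range rather than forecasting it.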