R for statistics

Sohan444664

2024-02-18

Data Source

The data used in this report was collected by me for the purpose of studying and analysing some questions related to this project. The data is available on the Kaggle website (https://www.kaggle.com/). It was originally in JSON form, which I converted to CSV using the ConvertCSV website (https://www.convertcsv.com/json-to-csv.htm). All of the data is also available on the DSE website. The data covers stock prices for 2022 only. Data for most of Bangladesh’s IPO-listed companies is available in this file.

Exploratory Analysis

First of all, we read the CSV file.

# Read the data
data <- read.csv("Squar_2022.csv")

The data set contains 112714 entries in total, which represent the historical trading data of the Bangladesh market for the year 2022. The first column holds the date and the second the stock code; the third gives the last traded price of the stock, followed by the highest price and the lowest price respectively, and so on.
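The size and layout of a data frame can be checked directly in R. The following is a minimal sketch that uses a small made-up data frame in place of `Squar_2022.csv`; the values are invented for illustration only, and the column names are a subset of those used elsewhere in this report.

```r
# Hypothetical stand-in for the real file; all values are invented
toy <- data.frame(
  date = c("2022-02-01", "2022-02-02", "2022-02-03"),
  opening_price = c(210.0, 212.5, 211.0),
  high = c(213.0, 214.0, 212.0),
  low = c(209.8, 211.0, 209.8),
  volume = c(1000, 2500, 1800)
)

n_rows <- nrow(toy)          # number of observations (rows)
n_cols <- ncol(toy)          # number of variables (columns)
n_values <- n_rows * n_cols  # total number of cells

str(toy)   # column names and types
head(toy)  # first few rows
```

Running the same calls on `data` would show how many rows and columns the real file has.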

Let’s look at the summary of the trading data.

summary(data$trade)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0     624     972    1137    1451    6084

The mean number of trades is 1137, while the median is somewhat lower at 972. The mean sits above the median because a few observations are much higher than the rest (the maximum is 6084), which pulls the mean upward.
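The effect of a few large values on the mean can be seen with a small example. This is only a sketch with made-up trade counts, not values taken from the data set.

```r
# Made-up trade counts: most days are moderate, one day is very busy
trades <- c(600, 800, 900, 1000, 1200, 6000)

m <- mean(trades)     # pulled upward by the single large value
md <- median(trades)  # middle of the sorted values, barely affected by it

m > md  # TRUE when the large values dominate, as in a right-skewed sample
```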

weighted.mean(data$opening_price, w=data$volume)
## [1] 216.871

The volume-weighted mean of the opening price is 216.871, that is, the average opening price of the stock when each observation is weighted by its trading volume.
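A weighted mean is the sum of each value times its weight, divided by the sum of the weights. The sketch below checks this against `weighted.mean()` using made-up prices and volumes (these numbers are assumptions for illustration, not values from the file).

```r
# Made-up opening prices and trading volumes
price <- c(210, 215, 220)
volume <- c(100, 300, 600)

# Weighted mean computed by hand: sum(w * x) / sum(w)
vwap_manual <- sum(price * volume) / sum(volume)

# The same quantity from the built-in function
vwap_builtin <- weighted.mean(price, w = volume)

all.equal(vwap_manual, vwap_builtin)
```

Because the highest price carries the largest volume in this toy example, the weighted mean lies above the plain mean of the three prices.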

quantile(data$opening_price, probs=c(0.05,0.25,0.5,0.75,0.95))
##     5%    25%    50%    75%    95% 
## 209.80 209.80 214.75 220.00 226.95

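The quantiles show that 95% of the opening prices were at or below 226.95. The 5% and 25% quantiles are both 209.80, so a large share of the opening prices sit at the lower end of the range. The median opening price is 214.75. Next, we look at the trading density.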
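Two quantiles can coincide when many observations share the same value. The following sketch illustrates this with made-up prices in which the lowest value is repeated several times; the numbers are assumptions chosen for illustration.

```r
# Made-up opening prices; the lowest value appears four times out of ten
price <- c(209.8, 209.8, 209.8, 209.8, 215, 218, 220, 224, 227, 230)

q <- quantile(price, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))

# Share of observations sitting exactly at the lowest price
share_at_min <- mean(price == min(price))

q
share_at_min
```

Here the 5% and 25% quantiles are equal because both fall inside the run of repeated values.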

hist(data$high, breaks=10, freq=TRUE)

hist(data$low, breaks=10, freq=TRUE)

The histograms show that the higher price ranges occur less frequently than the lower ones. This is plausible in the sense that traders tend to buy more when the price is low and sell less when the price is high.

library(ggplot2)
# stat_binhex() needs the hexbin package to be installed
ggplot(data, aes(x=date, y=volume)) +
  stat_binhex(colour="blue") +
  theme_classic() +
  scale_fill_gradient(low = "green", high = "red") +
  labs(x="Date", y="Volume")

Trading volume by date, from February 2022 to December 2022. Distribution of trading.

The distribution is concentrated on the left with a tail extending to the right, which means it is not normally distributed.
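One way to put a number on this asymmetry is the sample skewness, which is positive when the tail extends to the right and close to zero for a symmetric sample. Base R has no built-in skewness function, so the sketch below defines a hypothetical helper, `skewness_simple()`, and applies it to made-up values rather than to the real volume column.

```r
# Hypothetical helper: moment-based sample skewness
skewness_simple <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Made-up volumes: most values are small, one is very large
vol_skewed <- c(1, 2, 2, 3, 3, 3, 4, 20)
# Made-up symmetric values for comparison
vol_symmetric <- c(1, 2, 3, 4, 5)

skew_right <- skewness_simple(vol_skewed)     # positive: tail to the right
skew_none <- skewness_simple(vol_symmetric)   # zero: symmetric sample
```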

ggplot(data, aes(trade, volume)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)

The scatter plot shows that the points are densest at the lower end of both axes, that is, most observations combine a low number of trades with a low volume.

library(ggplot2)
qqnorm(data$volume)
qqline(data$volume)

hist(data$volume)

Regression

library(tidyverse)
data_lm <- lm(opening_price ~
               last_traded_price + high + low +
               closing_price + trade + value_mn + volume,
             data=data, na.action=na.omit)
data_lm
## 
## Call:
## lm(formula = opening_price ~ last_traded_price + high + low + 
##     closing_price + trade + value_mn + volume, data = data, na.action = na.omit)
## 
## Coefficients:
##       (Intercept)  last_traded_price               high                low  
##         2.393e+00         -5.589e-01          9.045e-01          6.550e-01  
##     closing_price              trade           value_mn             volume  
##        -1.128e-02          2.479e-04          2.605e-03         -2.344e-06
summary(data_lm)
## 
## Call:
## lm(formula = opening_price ~ last_traded_price + high + low + 
##     closing_price + trade + value_mn + volume, data = data, na.action = na.omit)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.52300 -0.33428 -0.01826  0.47293  2.16531 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.393e+00  4.088e+00   0.585   0.5588    
## last_traded_price -5.589e-01  7.533e-02  -7.420 2.11e-12 ***
## high               9.045e-01  5.652e-02  16.004  < 2e-16 ***
## low                6.550e-01  6.364e-02  10.293  < 2e-16 ***
## closing_price     -1.128e-02  1.961e-02  -0.575   0.5657    
## trade              2.479e-04  1.108e-04   2.238   0.0262 *  
## value_mn           2.605e-03  3.549e-02   0.073   0.9415    
## volume            -2.344e-06  7.714e-06  -0.304   0.7615    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.811 on 236 degrees of freedom
## Multiple R-squared:  0.9972, Adjusted R-squared:  0.9971 
## F-statistic: 1.19e+04 on 7 and 236 DF,  p-value: < 2.2e-16
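The residual degrees of freedom in the output equal the number of observations used in the fit minus the number of estimated coefficients. With 236 degrees of freedom and 8 coefficients (intercept plus 7 predictors), the model above was fitted on 244 observations. The sketch below shows the same relationship on simulated data; the variables `x1`, `x2` and `y` are made up for illustration.

```r
# Simulated data, for illustration only
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(20, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = toy)

n_used <- nobs(fit)           # observations actually used in the fit
n_coef <- length(coef(fit))   # intercept plus two slopes
df_resid <- df.residual(fit)  # residual degrees of freedom

df_resid == n_used - n_coef
```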