Data Source
The data used in this analysis was collected by me to study and answer some questions for this project. It is available on the Kaggle website (https://www.kaggle.com/). The data was in JSON format, which I converted to CSV using the ConvertCSV website (https://www.convertcsv.com/json-to-csv.htm). All of the data is also available on the DSE website. The data set covers stock prices for 2022 only and includes data for most of Bangladesh’s IPO-listed companies.
Exploratory Analysis
First, we read the CSV file.
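The read step itself is not shown, so here is a minimal sketch of what it might look like. The two rows are made up and are read from an inline string so the example runs on its own. The column name trading_code and the column order are assumptions; the other column names are taken from the regression later in this report.

```r
# Two made-up rows standing in for the real file. trading_code is a
# hypothetical name for the stock code column.
csv_text <- "date,trading_code,last_traded_price,high,low,opening_price,closing_price,trade,value_mn,volume
2022-02-01,ABC,210.5,212.0,209.8,210.0,210.4,624,1.25,5940
2022-02-02,ABC,214.7,215.0,210.1,210.4,214.8,972,2.10,9800"

# With the real file this would be read.csv("<path to the CSV file>")
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the date column so that plots treat it as a time axis
data$date <- as.Date(data$date)

str(data)   # column names and types
nrow(data)  # number of records
```

Converting the date column to the Date class up front is a design choice that keeps later date-based plots from treating the dates as plain text.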
There are 112,714 entries in total. These represent the historical trading data of the Bangladesh market for the year 2022. The dates are in the first column and the stock code in the second; the third column gives the last traded price of the stock, followed by the highest price and the lowest price respectively, and so on.
Let’s look at the summary of the trading data.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##       0     624     972    1137    1451    6084
The mean number of trades is 1137, while the median is somewhat lower at 972. A few values are much higher than the rest (the maximum is 6084), which pulls the mean above the median.
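The comparison of mean and median above can be reproduced on a small example. The sketch below uses made-up trade counts, not the project data, to show how a single very busy day lifts the mean above the median.

```r
# Made-up trade counts: most days are moderate, one day is very busy
trade <- c(0, 600, 650, 900, 1000, 1400, 1500, 6000)

# Five-number summary plus the mean, as in the output above
summary(trade)

mean_trade   <- mean(trade)
median_trade <- median(trade)

# A mean above the median points to a long right tail
mean_trade > median_trade
```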
## [1] 216.871
The weighted mean is 216.871, that is, the weighted average price of the stock is 216.871.
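The weights behind this figure are not stated. As an illustration only, the sketch below assumes the traded volume is used as the weight and applies it to made-up prices, computing the weighted mean both with the built-in function and by hand.

```r
# Made-up prices and volumes; using volume as the weight is an assumption
price  <- c(210, 215, 225)
volume <- c(5000, 3000, 2000)

# Built-in weighted mean
wm <- weighted.mean(price, w = volume)

# Same calculation by hand: sum(weight * value) / sum(weight)
wm_manual <- sum(price * volume) / sum(volume)

wm
wm_manual
```

Both values agree, and the result sits closer to 210 than the plain average of the three prices does, because the lowest price carries the largest volume.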
##     5%    25%    50%    75%    95%
## 209.80 209.80 214.75 220.00 226.95
The quantiles show that 90% of the prices lie between 209.80 and 226.95: only 5% of the prices were above 226.95, and the lowest prices were around 209.80. Next, we look at the trading density.
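Quantiles like the ones above can be computed with quantile(). The sketch below uses a made-up price vector, so its numbers will not match the report; the probabilities are the ones shown in the output.

```r
# Made-up prices, for illustration only
price <- c(209.8, 209.8, 210.5, 214.75, 216.0, 220.0, 223.4, 226.95)

# 5th, 25th, 50th, 75th and 95th percentiles
q <- quantile(price, probs = c(0.05, 0.25, 0.50, 0.75, 0.95))
q

# Width of the range that holds the central 90% of prices
range_90 <- unname(q["95%"] - q["5%"])
range_90
```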
The histogram shows that the stock trades less frequently at high prices
than at low prices. This is plausible, as traders tend to buy more when
the price is low and sell less when the price is high.
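library(ggplot2)  # stat_binhex() also needs the hexbin package installed
# Hexagonal-bin heatmap of traded volume by date
ggplot(data, aes(x = date, y = volume)) +
  stat_binhex(colour = "blue") +
  theme_classic() +
  scale_fill_gradient(low = "green", high = "red") +
  labs(x = "Date", y = "Volume")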
Trading volume by date, from February 2022 to December 2022.
Distribution of trading.
Most of the data sits on the left and the tail extends to the right, meaning it is not a normal distribution.
If we look at the data, it shows that most of the trading occurs at
the lowest price, as the density of trading and the volume are highest
at the lowest point.
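The departure from normality described above is read off the plot. One way to back it up with numbers is sketched below on made-up volumes: a moment-based skewness value and a Shapiro-Wilk test. Neither check is part of the original analysis, and the data here is purely illustrative.

```r
# Made-up volumes with a long right tail
volume <- c(100, 120, 130, 150, 160, 200, 250, 400, 900, 2500)

# Moment-based sample skewness: mean of the cubed standardized deviations
centered <- volume - mean(volume)
z <- centered / sqrt(mean(centered^2))
skewness <- mean(z^3)

# A positive value means the tail extends to the right
skewness > 0

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(volume)
sw$p.value
```

shapiro.test() accepts between 3 and 5000 values, so on the full data set it would have to be run on a random sample.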
Regression
library(tidyverse)
# Linear model of the opening price on the other price and activity columns
data_lm <- lm(opening_price ~
                last_traded_price + high + low +
                closing_price + trade + value_mn + volume,
              data = data, na.action = na.omit)
data_lm
##
## Call:
## lm(formula = opening_price ~ last_traded_price + high + low +
## closing_price + trade + value_mn + volume, data = data, na.action = na.omit)
##
## Coefficients:
##       (Intercept)  last_traded_price               high                low
##         2.393e+00         -5.589e-01          9.045e-01          6.550e-01
##     closing_price              trade           value_mn             volume
##        -1.128e-02          2.479e-04          2.605e-03         -2.344e-06
##
summary(data_lm)
## Call:
## lm(formula = opening_price ~ last_traded_price + high + low +
## closing_price + trade + value_mn + volume, data = data, na.action = na.omit)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.52300 -0.33428 -0.01826  0.47293  2.16531
##
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)        2.393e+00  4.088e+00   0.585   0.5588
## last_traded_price -5.589e-01  7.533e-02  -7.420 2.11e-12 ***
## high               9.045e-01  5.652e-02  16.004  < 2e-16 ***
## low                6.550e-01  6.364e-02  10.293  < 2e-16 ***
## closing_price     -1.128e-02  1.961e-02  -0.575   0.5657
## trade              2.479e-04  1.108e-04   2.238   0.0262 *
## value_mn           2.605e-03  3.549e-02   0.073   0.9415
## volume            -2.344e-06  7.714e-06  -0.304   0.7615
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.811 on 236 degrees of freedom
## Multiple R-squared: 0.9972, Adjusted R-squared: 0.9971
## F-statistic: 1.19e+04 on 7 and 236 DF, p-value: < 2.2e-16