This R Markdown file is about the analysis of the NSE Stock Exchange dataset which shows the market prices of various companies at various equity values and at different time, and on different days.
Stock brokers and traders buy and sell shares of stock, bonds and other securities in a stock Exchange. For trading a security, it must be listed on the stock exchange.Participants in the market include retail investors,mutual funds, banks, insurance companies and hedge funds. The roles of stock exchange are raising capital for business ,mobilizing savings for investment, facilitating company growth, profit sharing, corporate governance and creation of investment opportunities.
This study comprises of the variation of market prices of various companies over a particular time period. The National Stock Exchange located in Mumbai, India has a market capitalization of $ 1.41 Trillion. The National Stock Exchange offers trading and investment in equities, derivatives and debt. NIFTY 50, the index of National Stock Exchange,is used as a measure of Indian capital markets by many investors. This dataset comprises of values like opening and closing market prices, highest and lowest market prices, last market price and the total traded values and quantitites of various companies on the given date (TIMESTAMP). A regression analysis was conducted to evaluate the dependence of Total traded quantities and Total traded values on other factors like Opening and Closing Market prices, Highest and lowest Market prices.
The aim of this study was to analyse the various market prices of the National Stock Exchange of the companies listed. The analysis reveals that the mean of total traded quanities is 698380. Also, it seems that the average of total trades is 5014.( https://en.wikipedia.org/wiki/National_Stock_Exchange_of_India#Markets )
Hypothesis H1: Variables like Total Traded Quantity,Total Traded Values and Total Trades depend on variables like HIGH, CLOSE, LAST, OPEN, LOW and PREVCLOSE.
For this analysis, I collected data from https://www.kaggle.com/minatverma/nse-stocks-data. This dataset comprised a big list of companies and they were classified under the column “SERIES” as per the series of equity values. Then, the timestamp of each share of a company is available in the dataset, and it is useful to analyse the total trades by date. Also, the output variables like Total Traded Quantity and Total Traded Value could be analysed accurately using Timestamp.Also, the analysis reveals that, the average values of Opening and Closing Market prices turn out to be 561.2588 and 560.8152 respectively.
The data is of National Stock Exchange of India’s stock listings for each trading day of 2016 and 2017. The columns of the dataset are described below:
SYMBOL: Symbol of the listed company. SERIES: Series of the equity. Values are [EQ, BE, BL, BT, GC and IL] OPEN: The opening market price of the equity symbol on the date. HIGH: The highest market price of the equity symbol on the date. LOW: The lowest recorded market price of the equity symbol on the date. CLOSE: The closing recorded price of the equity symbol on the date. LAST: The last traded price of the equity symbol on the date. PREVCLOSE: The previous day closing price of the equity symbol on the date. TOTTRDQTY: Total traded quantity of the equity symbol on the date. TOTTRDVAL: Total traded volume of the equity symbol on the date. TIMESTAMP: Date of record. TOTALTRADES: Total trades executed on the day. ISIN: International Securities Identification Number.
The following linear regression model was carried out to test the H1 Hypothesis.
setwd("F:/R-Internship/Course related files")
nse_stocks.df<-read.csv(paste("NSE_Stocks.csv",sep=""))
View(nse_stocks.df)
LIN_REG<-lm(TOTTRDQTY+TOTTRDVAL+TOTALTRADES~
OPEN+HIGH+CLOSE+LOW+LAST+PREVCLOSE,data=nse_stocks.df)
summary(LIN_REG)
##
## Call:
## lm(formula = TOTTRDQTY + TOTTRDVAL + TOTALTRADES ~ OPEN + HIGH +
## CLOSE + LOW + LAST + PREVCLOSE, data = nse_stocks.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.677e+09 -1.228e+08 -1.203e+08 -9.421e+07 1.425e+11
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 121278956 712153 170.299 < 2e-16 ***
## OPEN -93570 39786 -2.352 0.0187 *
## HIGH 843552 41068 20.540 < 2e-16 ***
## CLOSE 431112 51880 8.310 < 2e-16 ***
## LOW -953328 42049 -22.672 < 2e-16 ***
## LAST -168407 25761 -6.537 6.27e-11 ***
## PREVCLOSE -60214 8408 -7.162 7.97e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 629400000 on 846397 degrees of freedom
## Multiple R-squared: 0.006842, Adjusted R-squared: 0.006835
## F-statistic: 971.8 on 6 and 846397 DF, p-value: < 2.2e-16
The result of the regression model tells us that the all the independent variables like OPEN, HIGH, CLOSE, LOW, LAST, PREVCLOSE have a great influence on the dependent variables. This reveals that the Total Traded Values and Total Traded Quantities vary greatly based on t*hese independent variables.
The result of the empirical study reveals that, there is a strong effect of variables like LOW, HIGH, CLOSE, LAST, OPEN on the other variables like TOTTRDQTY, TOTTRDVAL.The regression model produced the value of p-value < 0.05, and the values of Multiple R-squared and Adjusted R-squared are 0.006842 and 0.006835 respectively.
I always had a significant interest in Stock-markets and shares. So, this dataset seemed to be suitable to work with. This study helped me to understand the range of market prices within which the shares of each company varied. It looks like, the closing market prices, lowest and highest market prices, and the last traded price of every company prove to be statistically significant. Many companies seem to have their lowest market prices in the range of 1000 to 1100.
Appendix 1:(Descriptive Statistics)
#Q1: Read your dataset in R and visualize the length and breadth of your dataset.
setwd("F:/R-Internship/Course related files")
nse_stocks.df<-read.csv(paste("NSE_Stocks.csv",sep=""))
View(nse_stocks.df)
dim(nse_stocks.df)
## [1] 846404 13
#Q2: Create a descriptive statistics (min, max, median etc) of each variable.
library(psych)
summary(nse_stocks.df)
## SYMBOL SERIES OPEN
## SRTRANSFIN: 4709 EQ :739199 Min. : 0.05
## IDFCBANK : 4379 BE : 37428 1st Qu.: 39.90
## IRFC : 4362 SM : 11229 Median : 139.20
## NHAI : 3360 N6 : 5080 Mean : 561.26
## RECLTD : 2919 N2 : 4828 3rd Qu.: 490.00
## L&TINFRA : 2760 N4 : 3722 Max. :119990.00
## (Other) :823915 (Other): 44918
## HIGH LOW CLOSE
## Min. : 0.05 Min. : 0.05 Min. : 0.05
## 1st Qu.: 40.85 1st Qu.: 38.85 1st Qu.: 39.75
## Median : 142.05 Median : 136.15 Median : 138.90
## Mean : 568.68 Mean : 553.86 Mean : 560.82
## 3rd Qu.: 499.55 3rd Qu.: 481.10 3rd Qu.: 489.55
## Max. :119990.00 Max. :119990.00 Max. :119990.00
##
## LAST PREVCLOSE TOTTRDQTY
## Min. : 0.0 Min. : 0.05 Min. : 1
## 1st Qu.: 39.6 1st Qu.: 39.70 1st Qu.: 6628
## Median : 138.4 Median : 138.75 Median : 44292
## Mean : 560.2 Mean : 560.23 Mean : 698380
## 3rd Qu.: 488.5 3rd Qu.: 489.15 3rd Qu.: 275732
## Max. :119990.0 Max. :119990.00 Max. :781836478
##
## TOTTRDVAL TIMESTAMP TOTALTRADES
## Min. :0.000e+00 26-12-17: 1860 Min. : 1
## 1st Qu.:6.072e+05 29-12-17: 1850 1st Qu.: 84
## Median :5.321e+06 27-12-17: 1846 Median : 600
## Mean :1.335e+08 22-12-17: 1836 Mean : 5014
## 3rd Qu.:3.711e+07 24-11-17: 1830 3rd Qu.: 3028
## Max. :1.426e+11 27-11-17: 1830 Max. :1192900
## (Other) :835352
## ISIN
## INE040A01026: 640
## INE237A01028: 508
## INE043D01016: 505
## INE092A01019: 504
## INE979R01011: 504
## INE095A01012: 501
## (Other) :843242
describe(nse_stocks.df)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## vars n mean sd min max
## SYMBOL* 1 846404 NaN NA Inf -Inf
## SERIES* 2 846404 NaN NA Inf -Inf
## OPEN 3 846404 561.26 2005.76 0.05 119990
## HIGH 4 846404 568.68 2027.16 0.05 119990
## LOW 5 846404 553.86 1983.41 0.05 119990
## CLOSE 6 846404 560.82 2004.58 0.05 119990
## LAST 7 846404 560.18 2003.90 0.00 119990
## PREVCLOSE 8 846404 560.23 2000.36 0.05 119990
## TOTTRDQTY 9 846404 698380.05 4055629.69 1.00 781836478
## TOTTRDVAL 10 846404 133459225.97 629696792.93 0.05 142640000000
## TIMESTAMP* 11 846404 NaN NA Inf -Inf
## TOTALTRADES 12 846404 5013.82 14901.12 1.00 1192900
## ISIN* 13 846404 NaN NA Inf -Inf
## range se
## SYMBOL* -Inf NA
## SERIES* -Inf NA
## OPEN 1.199899e+05 2.18
## HIGH 1.199899e+05 2.20
## LOW 1.199899e+05 2.16
## CLOSE 1.199899e+05 2.18
## LAST 1.199900e+05 2.18
## PREVCLOSE 1.199899e+05 2.17
## TOTTRDQTY 7.818365e+08 4408.28
## TOTTRDVAL 1.426400e+11 684451.42
## TIMESTAMP* -Inf NA
## TOTALTRADES 1.192899e+06 16.20
## ISIN* -Inf NA
#Q3: Create one-way contingency tables for the categorical variables in your dataset.
with(nse_stocks.df,table(SERIES))
## SERIES
## BE BL BT BZ D1 DR EQ GB H6
## 2457 37428 227 1 2825 48 495 739199 2343 7
## HA HB HC HE IL IT IV MF N1 N2
## 1 7 20 1 205 300 299 473 3576 4828
## N3 N4 N5 N6 N7 N8 N9 NB NC ND
## 3098 3722 3661 5080 1658 2884 2021 1467 1195 1046
## NE NF NG NH NI NJ NK NL NM NN
## 1602 716 450 506 477 851 471 826 198 383
## NO NP NQ NR NS NT NU NV NW NX
## 544 410 19 86 266 120 341 183 476 375
## NY NZ P1 P2 Q1 Q2 SM W2 Y1 Y2
## 120 452 432 923 298 30 11229 402 310 212
## Y3 Y4 Y5 Y6 Y7 Y8 Y9 YA YB YC
## 239 77 77 151 227 257 372 197 230 107
## YD YG
## 60 130
#Q4: Create two-way contingency tables for the categorical variables in your dataset.
nse_TAB<-nse_stocks.df[c(1:25,2500:2525),]
table(droplevels(nse_TAB$SYMBOL))
##
## 20MICRONS 3IINFOTECH 3MINDIA 63MOONS 8KMILES A2ZINFRA
## 1 1 1 1 1 1
## AARTIDRUGS AARTIIND AARVEEDEN ABAN ABB ABBOTINDIA
## 1 1 1 1 1 1
## ABFRL ABGSHIP ABIRLANUVO ABMINTLTD ACC ACCELYA
## 1 1 1 1 1 1
## ACE ADANIENT ADANIPORTS ADANIPOWER ADANITRANS ADFFOODS
## 1 1 1 1 1 1
## ADHUNIK ITI IVC IVP IVRCLINFRA IVZINGOLD
## 1 1 1 1 1 1
## IVZINNIFTY IZMO J&KBANK JAGRAN JAGSNPHARM JAIBALAJI
## 1 1 1 1 1 1
## JAICORPLTD JAINSTUDIO JALAN JAMNAAUTO JASH JAYAGROGN
## 1 1 1 1 1 1
## JAYBARMARU JAYNECOIND JAYSREETEA JBCHEPHARM JBFIND JBMA
## 1 1 1 1 1 1
## JCHAC JENSONICOL JETAIRWAYS
## 1 1 1
table(droplevels(nse_TAB$SERIES))
##
## BE BZ EQ SM
## 4 1 44 2
Appendix 2: (PLOTS)
#Q5: Draw a boxplot of the variables that belong to your study.
nse_TAB3<-nse_stocks.df[(nse_stocks.df$SERIES=="Y2") |
(nse_stocks.df$SERIES=="Y7"),]
nse_TAB3$S1<-droplevels(nse_TAB3$SERIES)
boxplot(HIGH~S1,data=nse_TAB3,xlab="Highest Market Price",ylab="Equity values",main="Variation of Highest Market Price of Y2 and Y7", yaxt="n",horizontal=TRUE)
axis(side=2,at=c(1,2),labels=c("Y2","Y7"))
boxplot(LAST~S1,data=nse_TAB3,xlab="Last Traded Price",
ylab="Equity Value",main="Variation of Last Traded Price of Y2 and Y7",
yaxt="n",horizontal=TRUE)
axis(side=2,at=c(1,2),labels=c("Y2","Y7"))
#Q6: Draw Histograms for your suitable data fields.
hist(nse_TAB3$OPEN,xlab="Opening Market Price",
ylab="Count",main="Histogram of Opening Market Price",col=c("light blue"))
hist(nse_TAB3$PREVCLOSE,xlab="Closing Market Price on the previous day",
ylab="Count",main="Histogram of Previous Day's Closing Market Prices",col=c("gold 1"))
#Q7: Draw suitable plot for your data fields.
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(nse_TAB3$LOW,nse_TAB3$TOTTRDQTY,xlab = "Lowest Market Price on a day",
ylab= "Total Trade Quantity",
main="Scatterplot of Total Trade Quantity vs Lowest Market Price",
ylim=c(0,1000))
scatterplot(nse_TAB3$CLOSE,nse_TAB3$TOTTRDQTY,xlab = "Closing Market Price on a day", ylab= "Total Trade Quantity",
main="Scatterplot of Total Trade Quantity vs Closing Market Price", ylim=c(0,1000))
Appendix 3: (Correlations and Hypothesis tests)
#Q8: Create a correlation matrix.
library(psych)
nse<-nse_stocks.df[,c(3:10,12)]
corr.test(nse,use="complete")
## Call:corr.test(x = nse, use = "complete")
## Correlation matrix
## OPEN HIGH LOW CLOSE LAST PREVCLOSE TOTTRDQTY TOTTRDVAL
## OPEN 1.00 1.00 1.00 1.00 1.00 1.00 -0.03 0.06
## HIGH 1.00 1.00 1.00 1.00 1.00 1.00 -0.03 0.06
## LOW 1.00 1.00 1.00 1.00 1.00 1.00 -0.03 0.06
## CLOSE 1.00 1.00 1.00 1.00 1.00 1.00 -0.03 0.06
## LAST 1.00 1.00 1.00 1.00 1.00 1.00 -0.03 0.06
## PREVCLOSE 1.00 1.00 1.00 1.00 1.00 1.00 -0.03 0.06
## TOTTRDQTY -0.03 -0.03 -0.03 -0.03 -0.03 -0.03 1.00 0.45
## TOTTRDVAL 0.06 0.06 0.06 0.06 0.06 0.06 0.45 1.00
## TOTALTRADES 0.02 0.02 0.02 0.02 0.02 0.02 0.48 0.86
## TOTALTRADES
## OPEN 0.02
## HIGH 0.02
## LOW 0.02
## CLOSE 0.02
## LAST 0.02
## PREVCLOSE 0.02
## TOTTRDQTY 0.48
## TOTTRDVAL 0.86
## TOTALTRADES 1.00
## Sample Size
## [1] 846404
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## OPEN HIGH LOW CLOSE LAST PREVCLOSE TOTTRDQTY TOTTRDVAL
## OPEN 0 0 0 0 0 0 0 0
## HIGH 0 0 0 0 0 0 0 0
## LOW 0 0 0 0 0 0 0 0
## CLOSE 0 0 0 0 0 0 0 0
## LAST 0 0 0 0 0 0 0 0
## PREVCLOSE 0 0 0 0 0 0 0 0
## TOTTRDQTY 0 0 0 0 0 0 0 0
## TOTTRDVAL 0 0 0 0 0 0 0 0
## TOTALTRADES 0 0 0 0 0 0 0 0
## TOTALTRADES
## OPEN 0
## HIGH 0
## LOW 0
## CLOSE 0
## LAST 0
## PREVCLOSE 0
## TOTTRDQTY 0
## TOTTRDVAL 0
## TOTALTRADES 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
#Q9: Visualize your correlation matrix using corrgram.
nse4<-nse_stocks.df[1:1000,]
library(corrgram)
corrgram(nse4,order=TRUE,lower.panel = panel.shade, upper.panel = panel.pie,
text.panel = panel.txt,main="Corrgram of nse_stocks intercorrelations")
#Q10: Create a scatter plot matrix for your data set.
library(car)
scatterplotMatrix(nse4[,c("OPEN","TOTTRDVAL","CLOSE","HIGH","LOW","LAST",
"TOTTRDQTY")],spread=FALSE,smoother.args = list(lty=2),
main="Scatter-plot Matrix of variables of nse_stocks dataset")
#Q11: Run a suitable test to check your hypothesis for your suitable assumptions.
library(MASS)
library(psych)
t.test(nse_stocks.df$OPEN,nse_stocks.df$LAST)
##
## Welch Two Sample t-test
##
## data: nse_stocks.df$OPEN and nse_stocks.df$LAST
## t = 0.34862, df = 1692800, p-value = 0.7274
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.965846 7.114581
## sample estimates:
## mean of x mean of y
## 561.2588 560.1844
t.test(nse_stocks.df$LAST,nse_stocks.df$TOTALTRADES)
##
## Welch Two Sample t-test
##
## data: nse_stocks.df$LAST and nse_stocks.df$TOTALTRADES
## t = -272.52, df = 877010, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4485.670 -4421.608
## sample estimates:
## mean of x mean of y
## 560.1844 5013.8232
#Q12: Run a t-test to analyse your hypothesis.
library(MASS)
library(psych)
t.test(nse_stocks.df$TOTTRDVAL,nse_stocks.df$OPEN)
##
## Welch Two Sample t-test
##
## data: nse_stocks.df$TOTTRDVAL and nse_stocks.df$OPEN
## t = 194.99, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 132117163 134800167
## sample estimates:
## mean of x mean of y
## 1.334592e+08 5.612588e+02
t.test(nse_stocks.df$TOTTRDVAL,nse_stocks.df$CLOSE)
##
## Welch Two Sample t-test
##
## data: nse_stocks.df$TOTTRDVAL and nse_stocks.df$CLOSE
## t = 194.99, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 132117163 134800167
## sample estimates:
## mean of x mean of y
## 1.334592e+08 5.608152e+02
t.test(nse_stocks.df$TOTTRDQTY,nse_stocks.df$LAST)
##
## Welch Two Sample t-test
##
## data: nse_stocks.df$TOTTRDQTY and nse_stocks.df$LAST
## t = 158.3, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 689179.8 706460.0
## sample estimates:
## mean of x mean of y
## 698380.0461 560.1844
t.test(nse_stocks.df$TOTTRDQTY,nse_stocks.df$OPEN)
##
## Welch Two Sample t-test
##
## data: nse_stocks.df$TOTTRDQTY and nse_stocks.df$OPEN
## t = 158.3, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 689178.7 706458.9
## sample estimates:
## mean of x mean of y
## 698380.0461 561.2588