Project Title: An analysis of NSE Stock Exchange dataset

NAME: SHIVAPRAKASH.B

EMAIL: shivaprakash.balakrishnan@gmail.com

COLLEGE: NATIONAL INSTITUTE OF TECHNOLOGY, TIRUCHIRAPPALLI

This R Markdown file is about the analysis of the NSE Stock Exchange dataset which shows the market prices of various companies at various equity values and at different time, and on different days.

1. INTRODUCTION:

Stock brokers and traders buy and sell shares of stock, bonds and other securities in a stock Exchange. For trading a security, it must be listed on the stock exchange.Participants in the market include retail investors,mutual funds, banks, insurance companies and hedge funds. The roles of stock exchange are raising capital for business ,mobilizing savings for investment, facilitating company growth, profit sharing, corporate governance and creation of investment opportunities.

2. OVERVIEW of the study:

This study comprises of the variation of market prices of various companies over a particular time period. The National Stock Exchange located in Mumbai, India has a market capitalization of $ 1.41 Trillion. The National Stock Exchange offers trading and investment in equities, derivatives and debt. NIFTY 50, the index of National Stock Exchange,is used as a measure of Indian capital markets by many investors. This dataset comprises of values like opening and closing market prices, highest and lowest market prices, last market price and the total traded values and quantitites of various companies on the given date (TIMESTAMP). A regression analysis was conducted to evaluate the dependence of Total traded quantities and Total traded values on other factors like Opening and Closing Market prices, Highest and lowest Market prices.

3. An empirical field study of the stock-market prices of various companies listed in National Stock Exchange:

3.1 Overview:

The aim of this study was to analyse the various market prices of the National Stock Exchange of the companies listed. The analysis reveals that the mean of total traded quanities is 698380. Also, it seems that the average of total trades is 5014.( https://en.wikipedia.org/wiki/National_Stock_Exchange_of_India#Markets )

Hypothesis H1: Variables like Total Traded Quantity,Total Traded Values and Total Trades depend on variables like HIGH, CLOSE, LAST, OPEN, LOW and PREVCLOSE.

3.2 Data:

For this analysis, I collected data from https://www.kaggle.com/minatverma/nse-stocks-data. This dataset comprised a big list of companies and they were classified under the column “SERIES” as per the series of equity values. Then, the timestamp of each share of a company is available in the dataset, and it is useful to analyse the total trades by date. Also, the output variables like Total Traded Quantity and Total Traded Value could be analysed accurately using Timestamp.Also, the analysis reveals that, the average values of Opening and Closing Market prices turn out to be 561.2588 and 560.8152 respectively.

The data is of National Stock Exchange of India’s stock listings for each trading day of 2016 and 2017. The columns of the dataset are described below:

SYMBOL: Symbol of the listed company. SERIES: Series of the equity. Values are [EQ, BE, BL, BT, GC and IL] OPEN: The opening market price of the equity symbol on the date. HIGH: The highest market price of the equity symbol on the date. LOW: The lowest recorded market price of the equity symbol on the date. CLOSE: The closing recorded price of the equity symbol on the date. LAST: The last traded price of the equity symbol on the date. PREVCLOSE: The previous day closing price of the equity symbol on the date. TOTTRDQTY: Total traded quantity of the equity symbol on the date. TOTTRDVAL: Total traded volume of the equity symbol on the date. TIMESTAMP: Date of record. TOTALTRADES: Total trades executed on the day. ISIN: International Securities Identification Number.

3.3 Model:

The following linear regression model was carried out to test the H1 Hypothesis.

setwd("F:/R-Internship/Course related files")
nse_stocks.df<-read.csv(paste("NSE_Stocks.csv",sep=""))
View(nse_stocks.df)
LIN_REG<-lm(TOTTRDQTY+TOTTRDVAL+TOTALTRADES~
              OPEN+HIGH+CLOSE+LOW+LAST+PREVCLOSE,data=nse_stocks.df)

summary(LIN_REG)

## 
## Call:
## lm(formula = TOTTRDQTY + TOTTRDVAL + TOTALTRADES ~ OPEN + HIGH + 
##     CLOSE + LOW + LAST + PREVCLOSE, data = nse_stocks.df)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.677e+09 -1.228e+08 -1.203e+08 -9.421e+07  1.425e+11 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 121278956     712153 170.299  < 2e-16 ***
## OPEN           -93570      39786  -2.352   0.0187 *  
## HIGH           843552      41068  20.540  < 2e-16 ***
## CLOSE          431112      51880   8.310  < 2e-16 ***
## LOW           -953328      42049 -22.672  < 2e-16 ***
## LAST          -168407      25761  -6.537 6.27e-11 ***
## PREVCLOSE      -60214       8408  -7.162 7.97e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 629400000 on 846397 degrees of freedom
## Multiple R-squared:  0.006842,   Adjusted R-squared:  0.006835 
## F-statistic: 971.8 on 6 and 846397 DF,  p-value: < 2.2e-16

The result of the regression model tells us that the all the independent variables like OPEN, HIGH, CLOSE, LOW, LAST, PREVCLOSE have a great influence on the dependent variables. This reveals that the Total Traded Values and Total Traded Quantities vary greatly based on t*hese independent variables.

3.4. Results:

The result of the empirical study reveals that, there is a strong effect of variables like LOW, HIGH, CLOSE, LAST, OPEN on the other variables like TOTTRDQTY, TOTTRDVAL.The regression model produced the value of p-value < 0.05, and the values of Multiple R-squared and Adjusted R-squared are 0.006842 and 0.006835 respectively.

4. Conclusion:

I always had a significant interest in Stock-markets and shares. So, this dataset seemed to be suitable to work with. This study helped me to understand the range of market prices within which the shares of each company varied. It looks like, the closing market prices, lowest and highest market prices, and the last traded price of every company prove to be statistically significant. Many companies seem to have their lowest market prices in the range of 1000 to 1100.

5. References:

Appendix 1:(Descriptive Statistics)

#Q1: Read your dataset in R and visualize the length and breadth of your dataset.

setwd("F:/R-Internship/Course related files")
nse_stocks.df<-read.csv(paste("NSE_Stocks.csv",sep=""))
View(nse_stocks.df)
dim(nse_stocks.df)

## [1] 846404     13

#Q2: Create a descriptive statistics (min, max, median etc) of each variable.

library(psych)
summary(nse_stocks.df)

##         SYMBOL           SERIES            OPEN          
##  SRTRANSFIN:  4709   EQ     :739199   Min.   :     0.05  
##  IDFCBANK  :  4379   BE     : 37428   1st Qu.:    39.90  
##  IRFC      :  4362   SM     : 11229   Median :   139.20  
##  NHAI      :  3360   N6     :  5080   Mean   :   561.26  
##  RECLTD    :  2919   N2     :  4828   3rd Qu.:   490.00  
##  L&TINFRA  :  2760   N4     :  3722   Max.   :119990.00  
##  (Other)   :823915   (Other): 44918                      
##       HIGH                LOW                CLOSE          
##  Min.   :     0.05   Min.   :     0.05   Min.   :     0.05  
##  1st Qu.:    40.85   1st Qu.:    38.85   1st Qu.:    39.75  
##  Median :   142.05   Median :   136.15   Median :   138.90  
##  Mean   :   568.68   Mean   :   553.86   Mean   :   560.82  
##  3rd Qu.:   499.55   3rd Qu.:   481.10   3rd Qu.:   489.55  
##  Max.   :119990.00   Max.   :119990.00   Max.   :119990.00  
##                                                             
##       LAST            PREVCLOSE           TOTTRDQTY        
##  Min.   :     0.0   Min.   :     0.05   Min.   :        1  
##  1st Qu.:    39.6   1st Qu.:    39.70   1st Qu.:     6628  
##  Median :   138.4   Median :   138.75   Median :    44292  
##  Mean   :   560.2   Mean   :   560.23   Mean   :   698380  
##  3rd Qu.:   488.5   3rd Qu.:   489.15   3rd Qu.:   275732  
##  Max.   :119990.0   Max.   :119990.00   Max.   :781836478  
##                                                            
##    TOTTRDVAL            TIMESTAMP       TOTALTRADES     
##  Min.   :0.000e+00   26-12-17:  1860   Min.   :      1  
##  1st Qu.:6.072e+05   29-12-17:  1850   1st Qu.:     84  
##  Median :5.321e+06   27-12-17:  1846   Median :    600  
##  Mean   :1.335e+08   22-12-17:  1836   Mean   :   5014  
##  3rd Qu.:3.711e+07   24-11-17:  1830   3rd Qu.:   3028  
##  Max.   :1.426e+11   27-11-17:  1830   Max.   :1192900  
##                      (Other) :835352                    
##            ISIN       
##  INE040A01026:   640  
##  INE237A01028:   508  
##  INE043D01016:   505  
##  INE092A01019:   504  
##  INE979R01011:   504  
##  INE095A01012:   501  
##  (Other)     :843242

describe(nse_stocks.df)

## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning
## Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

##             vars      n         mean           sd  min          max
## SYMBOL*        1 846404          NaN           NA  Inf         -Inf
## SERIES*        2 846404          NaN           NA  Inf         -Inf
## OPEN           3 846404       561.26      2005.76 0.05       119990
## HIGH           4 846404       568.68      2027.16 0.05       119990
## LOW            5 846404       553.86      1983.41 0.05       119990
## CLOSE          6 846404       560.82      2004.58 0.05       119990
## LAST           7 846404       560.18      2003.90 0.00       119990
## PREVCLOSE      8 846404       560.23      2000.36 0.05       119990
## TOTTRDQTY      9 846404    698380.05   4055629.69 1.00    781836478
## TOTTRDVAL     10 846404 133459225.97 629696792.93 0.05 142640000000
## TIMESTAMP*    11 846404          NaN           NA  Inf         -Inf
## TOTALTRADES   12 846404      5013.82     14901.12 1.00      1192900
## ISIN*         13 846404          NaN           NA  Inf         -Inf
##                    range        se
## SYMBOL*             -Inf        NA
## SERIES*             -Inf        NA
## OPEN        1.199899e+05      2.18
## HIGH        1.199899e+05      2.20
## LOW         1.199899e+05      2.16
## CLOSE       1.199899e+05      2.18
## LAST        1.199900e+05      2.18
## PREVCLOSE   1.199899e+05      2.17
## TOTTRDQTY   7.818365e+08   4408.28
## TOTTRDVAL   1.426400e+11 684451.42
## TIMESTAMP*          -Inf        NA
## TOTALTRADES 1.192899e+06     16.20
## ISIN*               -Inf        NA

#Q3: Create one-way contingency tables for the categorical variables in your dataset.

with(nse_stocks.df,table(SERIES))

## SERIES
##            BE     BL     BT     BZ     D1     DR     EQ     GB     H6 
##   2457  37428    227      1   2825     48    495 739199   2343      7 
##     HA     HB     HC     HE     IL     IT     IV     MF     N1     N2 
##      1      7     20      1    205    300    299    473   3576   4828 
##     N3     N4     N5     N6     N7     N8     N9     NB     NC     ND 
##   3098   3722   3661   5080   1658   2884   2021   1467   1195   1046 
##     NE     NF     NG     NH     NI     NJ     NK     NL     NM     NN 
##   1602    716    450    506    477    851    471    826    198    383 
##     NO     NP     NQ     NR     NS     NT     NU     NV     NW     NX 
##    544    410     19     86    266    120    341    183    476    375 
##     NY     NZ     P1     P2     Q1     Q2     SM     W2     Y1     Y2 
##    120    452    432    923    298     30  11229    402    310    212 
##     Y3     Y4     Y5     Y6     Y7     Y8     Y9     YA     YB     YC 
##    239     77     77    151    227    257    372    197    230    107 
##     YD     YG 
##     60    130

#Q4: Create two-way contingency tables for the categorical variables in your dataset.

nse_TAB<-nse_stocks.df[c(1:25,2500:2525),]
table(droplevels(nse_TAB$SYMBOL))

## 
##  20MICRONS 3IINFOTECH    3MINDIA    63MOONS    8KMILES   A2ZINFRA 
##          1          1          1          1          1          1 
## AARTIDRUGS   AARTIIND  AARVEEDEN       ABAN        ABB ABBOTINDIA 
##          1          1          1          1          1          1 
##      ABFRL    ABGSHIP ABIRLANUVO  ABMINTLTD        ACC    ACCELYA 
##          1          1          1          1          1          1 
##        ACE   ADANIENT ADANIPORTS ADANIPOWER ADANITRANS   ADFFOODS 
##          1          1          1          1          1          1 
##    ADHUNIK        ITI        IVC        IVP IVRCLINFRA  IVZINGOLD 
##          1          1          1          1          1          1 
## IVZINNIFTY       IZMO    J&KBANK     JAGRAN JAGSNPHARM  JAIBALAJI 
##          1          1          1          1          1          1 
## JAICORPLTD JAINSTUDIO      JALAN  JAMNAAUTO       JASH  JAYAGROGN 
##          1          1          1          1          1          1 
## JAYBARMARU JAYNECOIND JAYSREETEA JBCHEPHARM     JBFIND       JBMA 
##          1          1          1          1          1          1 
##      JCHAC JENSONICOL JETAIRWAYS 
##          1          1          1

table(droplevels(nse_TAB$SERIES))

## 
## BE BZ EQ SM 
##  4  1 44  2

Appendix 2: (PLOTS)

#Q5: Draw a  boxplot of the variables that belong to your study.

nse_TAB3<-nse_stocks.df[(nse_stocks.df$SERIES=="Y2") | 
                          (nse_stocks.df$SERIES=="Y7"),]
nse_TAB3$S1<-droplevels(nse_TAB3$SERIES)

boxplot(HIGH~S1,data=nse_TAB3,xlab="Highest Market Price",ylab="Equity values",main="Variation of Highest Market Price of Y2 and Y7", yaxt="n",horizontal=TRUE)
axis(side=2,at=c(1,2),labels=c("Y2","Y7"))

boxplot(LAST~S1,data=nse_TAB3,xlab="Last Traded Price",
        ylab="Equity Value",main="Variation of Last Traded Price of Y2 and Y7",
        yaxt="n",horizontal=TRUE)
axis(side=2,at=c(1,2),labels=c("Y2","Y7"))

#Q6: Draw Histograms for your suitable data fields.

hist(nse_TAB3$OPEN,xlab="Opening Market Price",
     ylab="Count",main="Histogram of Opening Market Price",col=c("light blue"))

hist(nse_TAB3$PREVCLOSE,xlab="Closing Market Price on the previous day",
     ylab="Count",main="Histogram of Previous Day's Closing Market Prices",col=c("gold 1"))

#Q7: Draw suitable plot for your data fields.

library(car)

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

scatterplot(nse_TAB3$LOW,nse_TAB3$TOTTRDQTY,xlab = "Lowest Market Price on a day",
            ylab= "Total Trade Quantity",
            main="Scatterplot of Total Trade Quantity vs Lowest Market Price",
            ylim=c(0,1000))

scatterplot(nse_TAB3$CLOSE,nse_TAB3$TOTTRDQTY,xlab = "Closing Market Price on a               day", ylab= "Total Trade Quantity",
            main="Scatterplot of Total Trade Quantity vs Closing Market Price",               ylim=c(0,1000))

Appendix 3: (Correlations and Hypothesis tests)

#Q8: Create a correlation matrix.

library(psych)
nse<-nse_stocks.df[,c(3:10,12)]
corr.test(nse,use="complete")

## Call:corr.test(x = nse, use = "complete")
## Correlation matrix 
##              OPEN  HIGH   LOW CLOSE  LAST PREVCLOSE TOTTRDQTY TOTTRDVAL
## OPEN         1.00  1.00  1.00  1.00  1.00      1.00     -0.03      0.06
## HIGH         1.00  1.00  1.00  1.00  1.00      1.00     -0.03      0.06
## LOW          1.00  1.00  1.00  1.00  1.00      1.00     -0.03      0.06
## CLOSE        1.00  1.00  1.00  1.00  1.00      1.00     -0.03      0.06
## LAST         1.00  1.00  1.00  1.00  1.00      1.00     -0.03      0.06
## PREVCLOSE    1.00  1.00  1.00  1.00  1.00      1.00     -0.03      0.06
## TOTTRDQTY   -0.03 -0.03 -0.03 -0.03 -0.03     -0.03      1.00      0.45
## TOTTRDVAL    0.06  0.06  0.06  0.06  0.06      0.06      0.45      1.00
## TOTALTRADES  0.02  0.02  0.02  0.02  0.02      0.02      0.48      0.86
##             TOTALTRADES
## OPEN               0.02
## HIGH               0.02
## LOW                0.02
## CLOSE              0.02
## LAST               0.02
## PREVCLOSE          0.02
## TOTTRDQTY          0.48
## TOTTRDVAL          0.86
## TOTALTRADES        1.00
## Sample Size 
## [1] 846404
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##             OPEN HIGH LOW CLOSE LAST PREVCLOSE TOTTRDQTY TOTTRDVAL
## OPEN           0    0   0     0    0         0         0         0
## HIGH           0    0   0     0    0         0         0         0
## LOW            0    0   0     0    0         0         0         0
## CLOSE          0    0   0     0    0         0         0         0
## LAST           0    0   0     0    0         0         0         0
## PREVCLOSE      0    0   0     0    0         0         0         0
## TOTTRDQTY      0    0   0     0    0         0         0         0
## TOTTRDVAL      0    0   0     0    0         0         0         0
## TOTALTRADES    0    0   0     0    0         0         0         0
##             TOTALTRADES
## OPEN                  0
## HIGH                  0
## LOW                   0
## CLOSE                 0
## LAST                  0
## PREVCLOSE             0
## TOTTRDQTY             0
## TOTTRDVAL             0
## TOTALTRADES           0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option

#Q9: Visualize your correlation matrix using corrgram.

nse4<-nse_stocks.df[1:1000,]
library(corrgram)
corrgram(nse4,order=TRUE,lower.panel = panel.shade, upper.panel = panel.pie,
        text.panel = panel.txt,main="Corrgram of nse_stocks intercorrelations")

#Q10: Create a scatter plot matrix for your data set.

library(car)
scatterplotMatrix(nse4[,c("OPEN","TOTTRDVAL","CLOSE","HIGH","LOW","LAST",
                  "TOTTRDQTY")],spread=FALSE,smoother.args = list(lty=2),
                  main="Scatter-plot Matrix of variables of nse_stocks dataset")

#Q11: Run a suitable test to check your hypothesis for your suitable assumptions.

library(MASS)
library(psych)
t.test(nse_stocks.df$OPEN,nse_stocks.df$LAST)

## 
##  Welch Two Sample t-test
## 
## data:  nse_stocks.df$OPEN and nse_stocks.df$LAST
## t = 0.34862, df = 1692800, p-value = 0.7274
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.965846  7.114581
## sample estimates:
## mean of x mean of y 
##  561.2588  560.1844

t.test(nse_stocks.df$LAST,nse_stocks.df$TOTALTRADES)

## 
##  Welch Two Sample t-test
## 
## data:  nse_stocks.df$LAST and nse_stocks.df$TOTALTRADES
## t = -272.52, df = 877010, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4485.670 -4421.608
## sample estimates:
## mean of x mean of y 
##  560.1844 5013.8232

#Q12: Run a t-test to analyse your hypothesis.
library(MASS)
library(psych)
t.test(nse_stocks.df$TOTTRDVAL,nse_stocks.df$OPEN)

## 
##  Welch Two Sample t-test
## 
## data:  nse_stocks.df$TOTTRDVAL and nse_stocks.df$OPEN
## t = 194.99, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  132117163 134800167
## sample estimates:
##    mean of x    mean of y 
## 1.334592e+08 5.612588e+02

t.test(nse_stocks.df$TOTTRDVAL,nse_stocks.df$CLOSE)

## 
##  Welch Two Sample t-test
## 
## data:  nse_stocks.df$TOTTRDVAL and nse_stocks.df$CLOSE
## t = 194.99, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  132117163 134800167
## sample estimates:
##    mean of x    mean of y 
## 1.334592e+08 5.608152e+02

t.test(nse_stocks.df$TOTTRDQTY,nse_stocks.df$LAST)

## 
##  Welch Two Sample t-test
## 
## data:  nse_stocks.df$TOTTRDQTY and nse_stocks.df$LAST
## t = 158.3, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  689179.8 706460.0
## sample estimates:
##   mean of x   mean of y 
## 698380.0461    560.1844

t.test(nse_stocks.df$TOTTRDQTY,nse_stocks.df$OPEN)

## 
##  Welch Two Sample t-test
## 
## data:  nse_stocks.df$TOTTRDQTY and nse_stocks.df$OPEN
## t = 158.3, df = 846400, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  689178.7 706458.9
## sample estimates:
##   mean of x   mean of y 
## 698380.0461    561.2588

NSE Stock Exchange dataset

Shivaprakash.B

January 27, 2018