Advanced Statistical Inference
JPMorgan Chase & Co. Stock Analysis
2019.12.17
😎
Getting the data needed…
library(lubridate)  # year(), month()
Stock = as.data.frame(tidyquant::tq_get("JPM", get = "stock.prices"))
# keep trading days from 2015 onwards
Stock = Stock[year(Stock$date) > 2014, ]
str(Stock)
## 'data.frame': 1271 obs. of 7 variables:
## $ date : Date, format: "2015-01-02" "2015-01-05" ...
## $ open : num 62.2 62.1 60.6 59.9 60 ...
## $ high : num 63 62.3 60.8 59.9 60.9 ...
## $ low : num 62.1 60.2 58.3 58.7 60 ...
## $ close : num 62.5 60.5 59 59.1 60.4 ...
## $ volume : num 12600000 20100600 29074100 23843200 16971100 ...
## $ adjusted: num 54.5 52.8 51.4 51.5 52.6 ...
Some data cleaning…
library(plyr)  # ddply(), .()
library(zoo)   # as.yearmon()
# finding the number of the day within a week (as.POSIXlt: 0 = Sunday, 1 = Monday, ..., 6 = Saturday)
Stock$weekday = as.POSIXlt(Stock$date)$wday
# Sunday is coded 0, so it must be listed as 0 (not 7) in the levels
Stock$weekdayf <- factor(Stock$weekday, levels = rev(c(1:6, 0)),
  labels = rev(c("M","T","W","R","F","Sa","Su")), ordered = TRUE)
# finding the week of the year for each date (%W: weeks start on Monday)
Stock$week <- as.numeric(format(Stock$date, "%W"))
# finding the month
Stock$monthf <- factor(month(Stock$date), levels = 1:12,
  labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"), ordered = TRUE)
# finding the year and the month from the date
Stock$yearmonth <- factor(as.yearmon(Stock$date))
# normalizing the week to start at 1 for every month
Stock <- ddply(Stock, .(yearmonth), transform, monthweek = 1 + week - min(week))
What does the data frame look like now?
str(Stock)
## 'data.frame': 1271 obs. of 14 variables:
## $ date : Date, format: "2015-01-02" "2015-01-05" ...
## $ open : num 62.2 62.1 60.6 59.9 60 ...
## $ high : num 63 62.3 60.8 59.9 60.9 ...
## $ low : num 62.1 60.2 58.3 58.7 60 ...
## $ close : num 62.5 60.5 59 59.1 60.4 ...
## $ volume : num 12600000 20100600 29074100 23843200 16971100 ...
## $ adjusted : num 54.5 52.8 51.4 51.5 52.6 ...
## $ weekday : int 5 1 2 3 4 5 1 2 3 4 ...
## $ weekdayf : Ord.factor w/ 7 levels "Su"<"Sa"<"F"<..: 3 7 6 5 4 3 7 6 5 4 ...
## $ week : num 0 1 1 1 1 1 2 2 2 2 ...
## $ monthf : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ yearmonth: Factor w/ 61 levels "Jan 2015","Feb 2015",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ monthweek: num 1 2 2 2 2 2 3 3 3 3 ...
## $ percent : num 0.499 2.433 2.737 1.369 0.7 ...
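The calendar columns can be sanity-checked on a few dates without downloading any prices. This base R sketch uses hand-picked example dates (an assumption for illustration, not rows from the data set). It confirms that `as.POSIXlt()$wday` codes Sunday as 0, so Sunday has to appear as 0 in the factor levels, and it recomputes `monthweek` with base R's `ave()` as a stand-in for `plyr::ddply()`.

```r
# Example dates chosen for illustration (not taken from the JPM data)
dates <- as.Date(c("2015-01-02", "2015-01-05", "2015-01-12", "2015-02-02", "2015-02-08"))

# as.POSIXlt()$wday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
weekday <- as.POSIXlt(dates)$wday

# Same ordered weekday factor as in the report, with Sunday coded as 0
weekdayf <- factor(weekday, levels = rev(c(1:6, 0)),
                   labels = rev(c("M", "T", "W", "R", "F", "Sa", "Su")),
                   ordered = TRUE)

# Week of the year, with Monday as the first day of the week
week <- as.numeric(format(dates, "%W"))

# Base R equivalent of ddply(..., monthweek = 1 + week - min(week)):
# subtract the smallest week number within each year-month group
yearmonth <- format(dates, "%Y-%m")
monthweek <- 1 + week - ave(week, yearmonth, FUN = min)

print(data.frame(dates, weekday, weekdayf, week, monthweek))
```

Jan 2, 2015 falls in week 0 because the first Monday of 2015 is Jan 5. This matches the `week` and `monthweek` values in the `str()` output.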
HYPOTHESIS 1
Null Hypothesis: The mean adjusted stock price is the same on every day of the week.
Alternative Hypothesis: The mean adjusted stock price differs on at least one day of the week.
ANOVA Test
## Df Sum Sq Mean Sq F value Pr(>F)
## weekdayf 4 126 31.6 0.053 0.995
## Residuals 1266 760954 601.1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = adjusted ~ weekdayf, data = Stock)
##
## $weekdayf
## diff lwr upr p adj
## R-F -0.5542334 -6.468069 5.359602 0.9990546
## W-F -0.8893065 -6.791737 5.013124 0.9939818
## T-F -0.3072013 -6.198380 5.583977 0.9999075
## M-F -0.1453322 -6.175816 5.885151 0.9999957
## W-R -0.3350731 -6.231726 5.561579 0.9998697
## T-R 0.2470321 -5.638357 6.132421 0.9999610
## M-R 0.4089012 -5.615927 6.433730 0.9997362
## T-W 0.5821051 -5.291823 6.456034 0.9988223
## M-W 0.7439742 -5.269659 6.757608 0.9971935
## M-T 0.1618691 -5.840721 6.164459 0.9999933
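The report's code for this step is not shown, but the `Fit:` line in the Tukey output indicates a one-way `aov(adjusted ~ weekdayf, data = Stock)` model followed by `TukeyHSD()`. A minimal sketch of that workflow on synthetic prices is below. The data frame `toy` and its values are made up for illustration, and every weekday is given the same true mean.

```r
set.seed(42)  # reproducible synthetic data
days <- c("M", "T", "W", "R", "F")

# Synthetic prices: 50 observations per weekday, same true mean for every day
toy <- data.frame(
  weekdayf = factor(rep(days, each = 50), levels = rev(days), ordered = TRUE),
  adjusted = rnorm(250, mean = 85, sd = 25)
)

# One-way ANOVA of price on weekday
fit <- aov(adjusted ~ weekdayf, data = toy)
print(summary(fit))     # Df, Sum Sq, Mean Sq, F value, Pr(>F)

# All pairwise differences with family-wise 95% confidence intervals
tukey <- TukeyHSD(fit)
print(tukey)
```

With 5 trading days there are choose(5, 2) = 10 pairwise comparisons, which matches the 10 rows in the Tukey table above.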
RESULTS FROM ANOVA TEST
P-value is not significant: 0.995, very close to 1
All pairwise Tukey comparisons have adjusted p-values above 0.99, so the observed differences between days are consistent with chance
Fail to reject the null hypothesis
The data provide no evidence that the mean adjusted stock price differs across the days of the week
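The ANOVA p-value can be reproduced from the table's mean squares. F is the weekday mean square divided by the residual mean square, and the p-value is the upper tail of the F(4, 1266) distribution beyond that ratio. This base R check uses the rounded values printed in the table, so the last digit may differ slightly.

```r
# Mean squares and degrees of freedom as printed in the ANOVA table
ms_between <- 31.6    # weekdayf
ms_within  <- 601.1   # Residuals
df_between <- 4
df_within  <- 1266

# F statistic: ratio of between-group to within-group mean squares
f_value <- ms_between / ms_within

# p-value: upper-tail probability of the F(df_between, df_within) distribution
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

cat(sprintf("F = %.3f, p = %.3f\n", f_value, p_value))
```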
HYPOTHESIS 2
Null Hypothesis: There is no correlation between percentage change in stock price and trading volume.
Alternative Hypothesis: There is a correlation between percentage change in stock price and trading volume.
Checking whether there are any non-meaningful 0 values…
## date open high low
## Min. :2015-01-02 Min. : 53.90 Min. : 53.91 Min. : 50.07
## 1st Qu.:2016-04-07 1st Qu.: 65.96 1st Qu.: 66.36 1st Qu.: 65.50
## Median :2017-07-12 Median : 91.25 Median : 91.85 Median : 90.84
## Mean :2017-07-11 Mean : 89.65 Mean : 90.38 Mean : 88.95
## 3rd Qu.:2018-10-13 3rd Qu.:109.70 3rd Qu.:110.80 3rd Qu.:108.60
## Max. :2020-01-21 Max. :139.90 Max. :141.10 Max. :139.26
##
## close volume adjusted weekday
## Min. : 53.07 Min. : 3324300 Min. : 47.39 Min. :1.000
## 1st Qu.: 65.89 1st Qu.:10918500 1st Qu.: 58.81 1st Qu.:2.000
## Median : 91.28 Median :13440900 Median : 85.12 Median :3.000
## Mean : 89.67 Mean :14737441 Mean : 84.34 Mean :3.025
## 3rd Qu.:109.74 3rd Qu.:16892150 3rd Qu.:105.74 3rd Qu.:4.000
## Max. :141.09 Max. :56192300 Max. :140.19 Max. :5.000
##
## weekdayf week monthf yearmonth monthweek
## Su: 0 Min. : 0.00 Jan :114 Aug 2016: 23 Min. :1.000
## Sa: 0 1st Qu.:13.00 Aug :112 Mar 2017: 23 1st Qu.:2.000
## F :256 Median :26.00 Oct :111 Aug 2017: 23 Median :3.000
## R :257 Mean :26.14 Mar :109 Aug 2018: 23 Mean :2.956
## W :259 3rd Qu.:39.00 May :107 Oct 2018: 23 3rd Qu.:4.000
## T :261 Max. :53.00 Jun :107 Oct 2019: 23 Max. :5.000
## M :238 (Other):611 (Other) :1133
## percent
## Min. :0.0000
## 1st Qu.:0.2566
## Median :0.5695
## Mean :0.7581
## 3rd Qu.:1.0072
## Max. :5.0555
##
open, close, adjusted, and volume all have positive minimums, so none of them contains a non-meaningful 0 value
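Reading minimums off `summary()` works. The same check can also be made explicit by counting exact zeros per column. Here is a small sketch on made-up rows, with two zeros injected on purpose. The data frame `prices` is hypothetical.

```r
# Hypothetical rows with two zeros injected deliberately
prices <- data.frame(
  open     = c(62.2, 62.1, 0),
  close    = c(62.5, 60.5, 59.0),
  adjusted = c(54.5, 52.8, 51.4),
  volume   = c(12600000, 0, 29074100)
)
cols <- c("open", "close", "adjusted", "volume")

# Count exact zeros in each column; a zero price or volume is not meaningful
zero_counts <- colSums(prices[cols] == 0)
print(zero_counts)
```

Applied to the report's data, the same idea would be `colSums(Stock[cols] == 0)`.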
Linear Regression
##
## Call:
## lm(formula = percent ~ volume, data = Stock)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7783 -0.3814 -0.1168 0.2823 3.2708
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.301e-01 4.691e-02 -2.773 0.00564 **
## volume 6.026e-08 2.957e-09 20.377 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6182 on 1269 degrees of freedom
## Multiple R-squared: 0.2465, Adjusted R-squared: 0.246
## F-statistic: 415.2 on 1 and 1269 DF, p-value: < 2.2e-16
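The headline numbers in this summary are linked. The slope's t value is the estimate divided by its standard error. With a single predictor, the overall F statistic is that t value squared. Adjusted R-squared rescales R-squared by (n - 1) / (n - k - 1). A base R check using the rounded values printed above:

```r
# Values copied from the lm() summary
estimate  <- 6.026e-08   # slope for volume
std_error <- 2.957e-09   # its standard error
r_squared <- 0.2465      # multiple R-squared
n <- 1271                # observations
k <- 1                   # number of predictors

# t statistic for the slope
t_value <- estimate / std_error
# With one predictor, the overall F statistic equals t squared
f_value <- t_value^2
# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

cat(sprintf("t = %.2f, F = %.1f, adjusted R^2 = %.3f\n", t_value, f_value, adj_r_squared))
```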
Post Hoc Power Analysis
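library(pwr)  # pwr.f2.test()
# Cohen's f2 effect size from the adjusted R-squared
f2 = 0.246 / (1 - 0.246)
f2 = round(f2, digits = 5)
# u = number of predictors; v = n - u - 1 = 1271 - 1 - 1 = 1269 residual degrees of freedom
pwr.f2.test(u = 1, v = 1269, f2 = f2, power = NULL)
##
## Multiple regression power calculation
##
## u = 1
## v = 1269
## f2 = 0.32626
## sig.level = 0.05
## power = 1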
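The power figure comes from a noncentral F distribution. The noncentrality parameter is lambda = f2 × (u + v + 1), and power is the probability that such an F exceeds the critical value of the null F(u, v) distribution. To our understanding this is the calculation `pwr.f2.test()` performs. The base R sketch below assumes u = 1 and v = n - u - 1 = 1269 for the 1271 observations.

```r
u     <- 1                     # numerator df: number of predictors
v     <- 1269                  # denominator df: n - u - 1 (assumed n = 1271)
f2    <- 0.246 / (1 - 0.246)   # Cohen's f2 from the adjusted R-squared
alpha <- 0.05                  # significance level

# Noncentrality parameter of the F test under the alternative
lambda <- f2 * (u + v + 1)
# Critical value of the central F(u, v) distribution at level alpha
f_crit <- qf(alpha, u, v, lower.tail = FALSE)
# Power: probability that the noncentral F exceeds the critical value
power <- pf(f_crit, u, v, ncp = lambda, lower.tail = FALSE)
print(power)
```

With lambda of about 415, the power is indistinguishable from 1. This is why the rounded output reports `power = 1`.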
RESULTS FROM LINEAR REGRESSION TEST
P-value is significant: < 2.2e-16
Adjusted R-squared: 0.246
Estimated coefficient for predictor (volume): 6.026e-08
Power: 1
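We can reject the null hypothesis with high confidence and conclude that percentage change in stock price is correlated with trading volume. The estimated coefficient looks minimal only because volume is measured in single shares: 10 million additional shares traded are associated with about 0.6 percentage points more change (6.026e-08 × 10,000,000 ≈ 0.60), and volume explains roughly a quarter of the variation (adjusted R-squared 0.246).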
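The size of a regression slope depends on the units of the predictor. A sketch on synthetic data, where `volume` and `percent` are made up and only roughly on the scale of the fitted model, shows the effect of expressing volume in millions of shares. The coefficient is multiplied by 10^6, while its t statistic, and therefore its p-value, stays the same.

```r
set.seed(1)
# Synthetic data (not the JPM series): volume in shares, noisy linear response
volume  <- runif(200, min = 3e6, max = 5.6e7)
percent <- -0.13 + 6e-08 * volume + rnorm(200, sd = 0.6)

# Same model with volume in shares and in millions of shares
fit_shares   <- lm(percent ~ volume)
fit_millions <- lm(percent ~ I(volume / 1e6))

# Row 2 of the coefficient table is the slope: Estimate, Std. Error, t value, Pr(>|t|)
coef_shares   <- coef(summary(fit_shares))[2, ]
coef_millions <- coef(summary(fit_millions))[2, ]
print(rbind(shares = coef_shares, millions = coef_millions))
```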
How has the adjusted stock price of JPMorgan Chase & Co. been changing over the past 5 years?
One visualization that explains it all!