Advanced Statistical Inference
Alphabet Inc. Stock Analysis
December 17, 2019
Tuesday
😎
Getting the data needed…
library(lubridate)  # year(), month()
Stock = as.data.frame(tidyquant::tq_get(c("GOOGL"), get = "stock.prices"))
Stock = Stock[year(Stock$date) > 2014, ]
str(Stock)
## 'data.frame': 1265 obs. of 7 variables:
## $ date : Date, format: "2015-01-02" "2015-01-05" ...
## $ open : num 533 527 520 511 502 ...
## $ high : num 536 528 521 511 508 ...
## $ low : num 528 518 506 504 495 ...
## $ close : num 530 519 507 505 507 ...
## $ volume : num 1324000 2059100 2722800 2345900 3652700 ...
## $ adjusted: num 530 519 507 505 507 ...
Some data cleaning…
library(zoo)   # as.yearmon()
library(plyr)  # ddply()
# day of the week as a number (0 = Sunday, ..., 6 = Saturday; trading days are 1-5)
Stock$weekday = as.POSIXlt(Stock$date)$wday
Stock$weekdayf <- factor(Stock$weekday, levels = rev(1:7),
                         labels = rev(c("M","T","W","R","F","Sa","Su")), ordered = TRUE)
# week of the year for each date
Stock$week <- as.numeric(format(Stock$date, "%W"))
# month as an ordered factor
Stock$monthf <- factor(month(Stock$date), levels = as.character(1:12),
                       labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"), ordered = TRUE)
# year and month of each date
Stock$yearmonth <- factor(as.yearmon(Stock$date))
# normalize the week to start at 1 within every month
Stock <- ddply(Stock, .(yearmonth), transform, monthweek = 1 + week - min(week))
What does the data frame look like now?
## 'data.frame': 1265 obs. of 14 variables:
## $ date : Date, format: "2015-01-02" "2015-01-05" ...
## $ open : num 533 527 520 511 502 ...
## $ high : num 536 528 521 511 508 ...
## $ low : num 528 518 506 504 495 ...
## $ close : num 530 519 507 505 507 ...
## $ volume : num 1324000 2059100 2722800 2345900 3652700 ...
## $ adjusted : num 530 519 507 505 507 ...
## $ weekday : int 5 1 2 3 4 5 1 2 3 4 ...
## $ weekdayf : Ord.factor w/ 7 levels "Su"<"Sa"<"F"<..: 3 7 6 5 4 3 7 6 5 4 ...
## $ week : num 0 1 1 1 1 1 2 2 2 2 ...
## $ monthf : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 1 1 1 1 1 1 1 1 1 1 ...
## $ yearmonth: Factor w/ 61 levels "Jan 2015","Feb 2015",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ monthweek: num 1 2 2 2 2 2 3 3 3 3 ...
## $ percent : num 0.573 1.459 2.663 1.135 1.077 ...
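The `percent` column appears in the structure above, but the code that creates it is not shown. A minimal sketch, assuming it is the absolute open-to-close move expressed as a percentage of the open. This definition is an assumption, although it is roughly in line with the rounded prices in the first rows. The toy prices below are hypothetical and are not taken from the GOOGL data.

```r
# Toy prices (hypothetical values, not taken from the GOOGL data)
toy <- data.frame(
  open  = c(100, 200, 50),
  close = c(101, 196, 50)
)

# Assumed definition: absolute open-to-close move as a percentage of the open
toy$percent <- abs(toy$close - toy$open) / toy$open * 100

print(toy)
```

Under this definition a day that closes exactly at its open gets a value of 0, which would explain a minimum of 0 being a meaningful value rather than a data error.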
HYPOTHESIS 1
Null Hypothesis: There is no correlation between adjusted stock price and different days within a week.
Alternative Hypothesis: There is a correlation between adjusted stock price and different days in a week.
Anova Test
##               Df   Sum Sq Mean Sq F value Pr(>F)
## weekdayf       4     9853    2463   0.047  0.996
## Residuals   1260 65711693   52152
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = adjusted ~ weekdayf, data = Stock)
##
## $weekdayf
##          diff       lwr      upr     p adj
## R-F -4.374492 -59.56880 50.81982 0.9995122
## W-F -7.931393 -63.01884 47.15606 0.9949471
## T-F -5.793433 -60.82799 49.24113 0.9985070
## M-F -2.027067 -58.31460 54.26047 0.9999788
## W-R -3.556901 -58.59021 51.47641 0.9997827
## T-R -1.418940 -56.39931 53.56143 0.9999944
## M-R  2.347425 -53.88713 58.58198 0.9999619
## T-W  2.137961 -52.73514 57.01106 0.9999710
## M-W  5.904326 -50.22535 62.03400 0.9985113
## M-T  3.766365 -52.31140 59.84413 0.9997469
RESULTS FROM ANOVA TEST
P-value is not significant (0.996, very close to 1)
Every pairwise difference between days is consistent with chance (all Tukey adjusted p-values exceed 0.99)
Cannot reject the null hypothesis
Conclude that there is no evidence of a correlation between adjusted stock price and the day of the week
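The code behind the ANOVA and Tukey output is not shown, but the `Fit:` line indicates a call of the form `aov(adjusted ~ weekdayf, data = Stock)`. A minimal sketch of that workflow on toy data follows. The prices are simulated and hypothetical, all drawn from one distribution, so no weekday effect is expected.

```r
set.seed(1)

# Toy data (hypothetical): 20 simulated prices per trading day, same distribution for all days
toy <- data.frame(
  weekdayf = factor(rep(c("M", "T", "W", "R", "F"), each = 20),
                    levels = c("M", "T", "W", "R", "F")),
  adjusted = rnorm(100, mean = 900, sd = 200)
)

# One-way ANOVA: does the mean adjusted price differ by weekday?
fit <- aov(adjusted ~ weekdayf, data = toy)
print(summary(fit))

# Tukey's HSD: all pairwise differences with 95% family-wise confidence intervals
tukey <- TukeyHSD(fit)
print(tukey)
```

With five weekdays, Tukey's procedure reports ten pairwise comparisons, matching the ten rows in the output above.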
HYPOTHESIS 2
Null Hypothesis: There is no correlation between percentage change in stock price and trading volume.
Alternative Hypothesis: There is a correlation between percentage change in stock price and trading volume.
Checking for non-meaningful 0 values…
##       date                 open             high             low
##  Min.   :2015-01-02   Min.   : 499.2   Min.   : 500.3   Min.   : 490.9
##  1st Qu.:2016-04-06   1st Qu.: 749.0   1st Qu.: 755.3   1st Qu.: 743.6
##  Median :2017-07-07   Median : 948.0   Median : 954.2   Median : 941.0
##  Mean   :2017-07-07   Mean   : 929.5   Mean   : 937.4   Mean   : 921.3
##  3rd Qu.:2018-10-08   3rd Qu.:1120.2   3rd Qu.:1134.0   3rd Qu.:1112.0
##  Max.   :2020-01-10   Max.   :1429.5   Max.   :1434.9   Max.   :1419.6
##
##      close            volume            adjusted         weekday
##  Min.   : 497.1   Min.   :  520600   Min.   : 497.1   Min.   :1.000
##  1st Qu.: 750.4   1st Qu.: 1312900   1st Qu.: 750.4   1st Qu.:2.000
##  Median : 948.5   Median : 1636400   Median : 948.5   Median :3.000
##  Mean   : 929.7   Mean   : 1861701   Mean   : 929.7   Mean   :3.026
##  3rd Qu.:1122.9   3rd Qu.: 2098000   3rd Qu.:1122.9   3rd Qu.:4.000
##  Max.   :1429.0   Max.   :12858100   Max.   :1429.0   Max.   :5.000
##
##  weekdayf      week           monthf       yearmonth      monthweek
##  Su:  0   Min.   : 0.00   Aug    :112   Aug 2016:  23   Min.   :1.000
##  Sa:  0   1st Qu.:13.00   Oct    :111   Mar 2017:  23   1st Qu.:2.000
##  F :255   Median :26.00   Mar    :109   Aug 2017:  23   Median :3.000
##  R :256   Mean   :26.26   Jan    :108   Aug 2018:  23   Mean   :2.955
##  W :258   3rd Qu.:39.00   May    :107   Oct 2018:  23   3rd Qu.:4.000
##  T :259   Max.   :53.00   Jun    :107   Oct 2019:  23   Max.   :5.000
##  M :237                   (Other):611   (Other) :1127
##     percent
##  Min.   :0.0000
##  1st Qu.:0.2788
##  Median :0.6233
##  Mean   :0.8490
##  3rd Qu.:1.1881
##  Max.   :5.6368
##
open, close, adjusted, and volume all look good: every minimum is above 0
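The check above reads the minimums off `summary()`. As an alternative, zeros can be counted directly per column. The sketch below uses a small hypothetical data frame, not the GOOGL data, with one zero planted in `volume` so the count is visible.

```r
# Toy data frame (hypothetical values); one volume entry is deliberately 0
toy <- data.frame(
  open   = c(10.0, 10.5, 11.0),
  close  = c(10.2, 10.4, 11.3),
  volume = c(1500, 0, 1800)
)

# Count exact zeros in every numeric column
is_num <- vapply(toy, is.numeric, logical(1))
zero_counts <- colSums(toy[is_num] == 0, na.rm = TRUE)

print(zero_counts)
```

Any column with a nonzero count would then need a closer look to decide whether the zero is a real observation or a missing value in disguise.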
Linear Regression
##
## Call:
## lm(formula = percent ~ volume, data = Stock)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.4050 -0.4468 -0.1284  0.3084  4.0987
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.707e-02  4.476e-02   2.169   0.0303 *
## volume      4.039e-07  2.146e-08  18.822   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7176 on 1263 degrees of freedom
## Multiple R-squared: 0.219, Adjusted R-squared: 0.2184
## F-statistic: 354.3 on 1 and 1263 DF, p-value: < 2.2e-16
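The `Call:` line shows the model was fitted with `lm(percent ~ volume, data = Stock)`. A minimal sketch of the same steps on simulated data follows, including how the slope, R-squared, and residual degrees of freedom can be read from the fitted object instead of being copied by hand. The toy data and all names other than `lm`, `summary`, and `coef` are hypothetical.

```r
set.seed(42)

# Toy data (hypothetical): percent move loosely increasing with volume
toy <- data.frame(volume = runif(200, min = 5e5, max = 5e6))
toy$percent <- 0.1 + 4e-7 * toy$volume + abs(rnorm(200, sd = 0.5))

# Simple linear regression of percent change on volume
fit <- lm(percent ~ volume, data = toy)
s <- summary(fit)
print(s)

# Quantities needed for reporting and for a power analysis
slope  <- coef(fit)[["volume"]]  # estimated coefficient for volume
r2     <- s$r.squared            # multiple R-squared
df_res <- fit$df.residual        # residual degrees of freedom (n - 2 here)

# Cohen's f2 effect size computed from R-squared
f2 <- r2 / (1 - r2)
print(c(slope = slope, r2 = r2, df_res = df_res, f2 = f2))
```

Reading these values from the model object keeps the reported numbers and any follow-up calculation in step with the fit if the data are refreshed.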
Post Hoc Power Analysis
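library(pwr)  # pwr.f2.test()
# Cohen's f2 from R-squared (the regression summary above prints 0.219 on 1263 residual df)
f2 = 0.2196 / (1 - 0.2196)
f2 = round(f2, digits = 5)
pwr.f2.test(u = 1, v = 1242, f2 = f2, power = NULL)  # u, v: numerator and denominator df
##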
## Multiple regression power calculation
##
## u = 1
## v = 1242
## f2 = 0.28139
## sig.level = 0.05
## power = 1
RESULTS FROM LINEAR REGRESSION TEST
P-value is significant: < 2.2e-16
R-squared: 0.219
Estimated coefficient for predictor (volume): 4.039e-07
Power: 1
We can reject the null hypothesis with high confidence and conclude that there is a correlation between percentage change in stock price and trading volume
How has the adjusted stock price of Alphabet Inc. been changing over the past 5 years?
one visualization that explains all!
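The plot itself did not survive in the text. The columns built during data cleaning (`weekdayf`, `monthweek`, `monthf`) suggest a calendar heat map, so the sketch below assumes that layout and uses ggplot2. The choice of plot, the colours, the `year` column, and the toy data are all assumptions for illustration.

```r
library(ggplot2)

# Toy data (hypothetical): one simulated price per weekday, week of month, month, and year
toy <- expand.grid(
  weekdayf  = factor(c("M", "T", "W", "R", "F"),
                     levels = rev(c("M", "T", "W", "R", "F"))),
  monthweek = 1:4,
  monthf    = factor(c("Jan", "Feb"), levels = c("Jan", "Feb")),
  year      = 2015:2016
)
set.seed(7)
toy$adjusted <- 500 + cumsum(rnorm(nrow(toy), mean = 1, sd = 5))

# Calendar heat map: weeks across, weekdays down, one panel per month and year
p <- ggplot(toy, aes(x = monthweek, y = weekdayf, fill = adjusted)) +
  geom_tile(colour = "white") +
  facet_grid(year ~ monthf) +
  scale_fill_gradient(low = "red", high = "green") +
  labs(x = "Week of month", y = "", fill = "Adjusted price")

print(p)
```

With the real data, a `year` column would have to be derived from `date` first, for example with `lubridate::year()`.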
thank u, next
:)