This problem set covers simple categorical predictors, the link between regression and ANOVA, and post-hoc tests.`
On each of day of one week, we sampled 100 random company stocks and examined their trading price. Each day a different set of stocks was sampled at random from the NYSE and NASDAQ published prices.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.2
data <- read.csv("ps10data.csv")
head(data)
## day sector stockprice
## 1 Mo automotive 55.11
## 2 Mo automotive 85.10
## 3 Mo health 99.67
## 4 Mo automotive 19.79
## 5 Mo automotive 69.68
## 6 Mo automotive 61.97
This is stored in a ‘long’ format data frame (day and sector are columns) instead of a matrix (with sector and day as rows and columns). For a regression or ANOVA, we really need the long data format, like we would get from ‘aggregate’ instead of ‘tapply’.
ggplot(data,aes(x=day,y=stockprice)) + geom_point(aes(color=sector)) +theme_minimal()
For this problem, we want to determine, using a number of methods, which days differed from which other days. In each case, run the test, and answer the question in 1-2 sentences describing what you found. Use a p=.05 as a criterion for determining whether an effect is statistically significant.
First use a contrast that will compare each day to Monday, and report which of the days had prices significantly higher than monday (report the test obtained directly from the coefficients of lm by doing summary() on the results of lm()).
stocks = data$stockprice
days = factor(data$day)
days <- factor(days, levels = c("Mo","Tu","We","Th","Fr","Sa","Su"))
summary(lm(stocks~days))
##
## Call:
## lm(formula = stocks ~ days)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90.497 -33.876 -0.053 36.118 103.973
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.5618 4.2292 14.320 < 2e-16 ***
## daysTu -1.0436 5.9809 -0.174 0.861533
## daysWe -0.3232 5.9809 -0.054 0.956920
## daysTh 20.2005 5.9809 3.377 0.000772 ***
## daysFr 21.0045 5.9809 3.512 0.000474 ***
## daysSa 17.8423 5.9809 2.983 0.002953 **
## daysSu 23.2153 5.9809 3.882 0.000114 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared: 0.05869, Adjusted R-squared: 0.05054
## F-statistic: 7.202 on 6 and 693 DF, p-value: 1.808e-07
Stock prices during Thursday, Friday, Saturday, and Sunday were significantly higher than Monday
Then, use successive difference coding of the day variable to determine which days of the week differed significantly from the previous day.
library(MASS)
## Warning: package 'MASS' was built under R version 4.0.3
contrasts(days) <- contr.sdif(levels(days))
summary(lm(stocks~days))
##
## Call:
## lm(formula = stocks ~ days)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90.497 -33.876 -0.053 36.118 103.973
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.1183 1.5985 45.117 < 2e-16 ***
## daysTu-Mo -1.0436 5.9809 -0.174 0.861533
## daysWe-Tu 0.7204 5.9809 0.120 0.904162
## daysTh-We 20.5237 5.9809 3.432 0.000636 ***
## daysFr-Th 0.8040 5.9809 0.134 0.893104
## daysSa-Fr -3.1622 5.9809 -0.529 0.597174
## daysSu-Sa 5.3730 5.9809 0.898 0.369309
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared: 0.05869, Adjusted R-squared: 0.05054
## F-statistic: 7.202 on 6 and 693 DF, p-value: 1.808e-07
If we use successive difference contrasts 2 days are statistically different by each other. The stock prices in Thusday are way higher than those in Wednesday.
Use pairwise.t.test function to compute all pairwise t-tests and the holm correction between days of the week. Describe concisely which days differed from which other days.
pairwise.t.test(stocks,days,p.adj="holm")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: stocks and days
##
## Mo Tu We Th Fr Sa
## Tu 1.0000 - - - - -
## We 1.0000 1.0000 - - - -
## Th 0.0100 0.0066 0.0089 - - -
## Fr 0.0071 0.0044 0.0066 1.0000 - -
## Sa 0.0295 0.0199 0.0272 1.0000 1.0000 -
## Su 0.0022 0.0012 0.0018 1.0000 1.0000 1.0000
##
## P value adjustment method: holm
Monday, Tuesday, Wednesday stock prices did not differ between them (p > 0.05). While on Thursday, Friday, Saturday, and Sunday stock prices were significantly higher than the remaining days (p < 0.05) and did not significantly different between them.
Use an aov() model to predict stock price by day, and then compute Tukey HSD test on all pairwise comparisons using the Tukey test. Do the result differ from part 3?
summary(aov(stocks~days))
## Df Sum Sq Mean Sq F value Pr(>F)
## days 6 77286 12881 7.202 1.81e-07 ***
## Residuals 693 1239488 1789
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov(stocks~days))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = stocks ~ days)
##
## $days
## diff lwr upr p adj
## Tu-Mo -1.0436 -18.729386 16.64219 0.9999976
## We-Mo -0.3232 -18.008986 17.36259 1.0000000
## Th-Mo 20.2005 2.514714 37.88629 0.0135627
## Fr-Mo 21.0045 3.318714 38.69029 0.0085579
## Sa-Mo 17.8423 0.156514 35.52809 0.0463876
## Su-Mo 23.2153 5.529514 40.90109 0.0021786
## We-Tu 0.7204 -16.965386 18.40619 0.9999997
## Th-Tu 21.2441 3.558314 38.92989 0.0074315
## Fr-Tu 22.0481 4.362314 39.73389 0.0045691
## Sa-Tu 18.8859 1.200114 36.57169 0.0275391
## Su-Tu 24.2589 6.573114 41.94469 0.0010864
## Th-We 20.5237 2.837914 38.20949 0.0112984
## Fr-We 21.3277 3.641914 39.01349 0.0070715
## Sa-We 18.1655 0.479714 35.85129 0.0396268
## Su-We 23.5385 5.852714 41.22429 0.0017622
## Fr-Th 0.8040 -16.881786 18.48979 0.9999995
## Sa-Th -2.3582 -20.043986 15.32759 0.9997075
## Su-Th 3.0148 -14.670986 20.70059 0.9988009
## Sa-Fr -3.1622 -20.847986 14.52359 0.9984289
## Su-Fr 2.2108 -15.474986 19.89659 0.9997990
## Su-Sa 5.3730 -12.312786 23.05879 0.9727959
Our one-way-Anova model predicts that there are significantly different stock prices in different days (p = 1.81e-07). The Tukey’s range test gives us the exact same results as the pairwise test even though the p values slightly differ (because they are corrected using a different “logic”)
Compute a kruskall-wallis test to see if the non-parametric test shows stock price depended on day-of-week.
kruskal.test(stocks~days)
##
## Kruskal-Wallis rank sum test
##
## data: stocks by days
## Kruskal-Wallis chi-squared = 36.113, df = 6, p-value = 2.621e-06
Similarly to the previous one-way-Anova, the kruskall-wallis test detects statistically significant differences of stock prices within the week (p = 2.621e-06). ## 6. Bayes Factor ANOVA Compute a one-way BayesFactor ANOVA and report the Bayes factor score determining if day-of-week impacted stock price.
#options("contrasts")
library(BayesFactor)
## Warning: package 'BayesFactor' was built under R version 4.0.2
## Loading required package: coda
## Warning: package 'coda' was built under R version 4.0.2
## Loading required package: Matrix
## ************
## Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
##
## Type BFManual() to open the manual.
## ************
dat = data.frame(stocks,days)
bf = anovaBF(stocks~days,data = dat)
chains <- posterior(bf,iterations=10000)
chains[1:5,]
## mu days-Mo days-Tu days-We days-Th days-Fr days-Sa
## [1,] 72.12510 -11.480711 -12.439192 -11.724618 8.543681 9.217479 6.194580
## [2,] 69.38709 -7.245098 -8.955825 -10.983874 5.969557 7.050878 10.426345
## [3,] 68.38855 -18.813957 -5.743782 -11.992990 5.194182 18.558269 4.291178
## [4,] 72.53927 -14.820836 -10.026128 -7.317038 10.247065 5.603544 6.816419
## [5,] 71.88645 -14.616078 -12.834101 -4.542650 8.779043 6.500464 6.701760
## days-Su sig2 g_days
## [1,] 11.688781 1611.467 0.0648013
## [2,] 3.738015 1777.272 0.1786362
## [3,] 8.507100 1735.346 0.1251258
## [4,] 9.496974 1899.829 0.1907061
## [5,] 10.011563 2046.888 0.1736627
new_levels = c('na',levels(days))
library(RColorBrewer)
par(mfrow=c(2,3))
for(i in 3:8){
hist(chains[,i]-chains[,i-1],breaks=100,main = paste(new_levels[i],'-', new_levels[i-1],'at 1000 iterations','P(',new_levels[i],'>',new_levels[i-1],')','=',mean(chains[,i]>chains[,i-1])*100,'%'), xlab = 'stock price difference (10^3 $)' ,ylab = 'likelihood', col = brewer.pal(7, "Set1")[i])
abline(v = 0,lwd = 3,col = 'red')
abline(v = mean(chains[,i])-mean(chains[,i-1]),lwd=3,col = 'blue')
legend('topright', legend = c('0 point','mean'), lty = 1, lwd = 3, col = c('red','blue'),bty = "n")
}
The one-way BayesFactor ANOVA reports an extreme evidence for the data to be under the Ha than the H0 (BF = 41865.66). Meaning that stock prices significantly different within the week (which is our alternative hypothesis Ha).