This problem set covers simple categorical predictors, the link between regression and ANOVA, and post-hoc tests.`

Categorical Predictors

On each of day of one week, we sampled 100 random company stocks and examined their trading price. Each day a different set of stocks was sampled at random from the NYSE and NASDAQ published prices.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.2
data <- read.csv("ps10data.csv")

head(data)
##   day     sector stockprice
## 1  Mo automotive      55.11
## 2  Mo automotive      85.10
## 3  Mo     health      99.67
## 4  Mo automotive      19.79
## 5  Mo automotive      69.68
## 6  Mo automotive      61.97

This is stored in a ‘long’ format data frame (day and sector are columns) instead of a matrix (with sector and day as rows and columns). For a regression or ANOVA, we really need the long data format, like we would get from ‘aggregate’ instead of ‘tapply’.

ggplot(data,aes(x=day,y=stockprice)) + geom_point(aes(color=sector)) +theme_minimal()

For this problem, we want to determine, using a number of methods, which days differed from which other days. In each case, run the test, and answer the question in 1-2 sentences describing what you found. Use a p=.05 as a criterion for determining whether an effect is statistically significant.

1. Contrasts

First use a contrast that will compare each day to Monday, and report which of the days had prices significantly higher than monday (report the test obtained directly from the coefficients of lm by doing summary() on the results of lm()).

stocks = data$stockprice
days = factor(data$day)
days <- factor(days, levels = c("Mo","Tu","We","Th","Fr","Sa","Su"))

summary(lm(stocks~days))
## 
## Call:
## lm(formula = stocks ~ days)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.497 -33.876  -0.053  36.118 103.973 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  60.5618     4.2292  14.320  < 2e-16 ***
## daysTu       -1.0436     5.9809  -0.174 0.861533    
## daysWe       -0.3232     5.9809  -0.054 0.956920    
## daysTh       20.2005     5.9809   3.377 0.000772 ***
## daysFr       21.0045     5.9809   3.512 0.000474 ***
## daysSa       17.8423     5.9809   2.983 0.002953 ** 
## daysSu       23.2153     5.9809   3.882 0.000114 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared:  0.05869,    Adjusted R-squared:  0.05054 
## F-statistic: 7.202 on 6 and 693 DF,  p-value: 1.808e-07

Stock prices during Thursday, Friday, Saturday, and Sunday were significantly higher than Monday

2. Successive difference contrasts

Then, use successive difference coding of the day variable to determine which days of the week differed significantly from the previous day.

library(MASS)
## Warning: package 'MASS' was built under R version 4.0.3
contrasts(days) <- contr.sdif(levels(days))
summary(lm(stocks~days))
## 
## Call:
## lm(formula = stocks ~ days)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.497 -33.876  -0.053  36.118 103.973 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  72.1183     1.5985  45.117  < 2e-16 ***
## daysTu-Mo    -1.0436     5.9809  -0.174 0.861533    
## daysWe-Tu     0.7204     5.9809   0.120 0.904162    
## daysTh-We    20.5237     5.9809   3.432 0.000636 ***
## daysFr-Th     0.8040     5.9809   0.134 0.893104    
## daysSa-Fr    -3.1622     5.9809  -0.529 0.597174    
## daysSu-Sa     5.3730     5.9809   0.898 0.369309    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared:  0.05869,    Adjusted R-squared:  0.05054 
## F-statistic: 7.202 on 6 and 693 DF,  p-value: 1.808e-07

If we use successive difference contrasts 2 days are statistically different by each other. The stock prices in Thusday are way higher than those in Wednesday.

3. Pairwise t-tests

Use pairwise.t.test function to compute all pairwise t-tests and the holm correction between days of the week. Describe concisely which days differed from which other days.

pairwise.t.test(stocks,days,p.adj="holm")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  stocks and days 
## 
##    Mo     Tu     We     Th     Fr     Sa    
## Tu 1.0000 -      -      -      -      -     
## We 1.0000 1.0000 -      -      -      -     
## Th 0.0100 0.0066 0.0089 -      -      -     
## Fr 0.0071 0.0044 0.0066 1.0000 -      -     
## Sa 0.0295 0.0199 0.0272 1.0000 1.0000 -     
## Su 0.0022 0.0012 0.0018 1.0000 1.0000 1.0000
## 
## P value adjustment method: holm

Monday, Tuesday, Wednesday stock prices did not differ between them (p > 0.05). While on Thursday, Friday, Saturday, and Sunday stock prices were significantly higher than the remaining days (p < 0.05) and did not significantly different between them.

4. Oneway Anova and post-hoc tests

Use an aov() model to predict stock price by day, and then compute Tukey HSD test on all pairwise comparisons using the Tukey test. Do the result differ from part 3?

summary(aov(stocks~days))
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## days          6   77286   12881   7.202 1.81e-07 ***
## Residuals   693 1239488    1789                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov(stocks~days))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = stocks ~ days)
## 
## $days
##          diff        lwr      upr     p adj
## Tu-Mo -1.0436 -18.729386 16.64219 0.9999976
## We-Mo -0.3232 -18.008986 17.36259 1.0000000
## Th-Mo 20.2005   2.514714 37.88629 0.0135627
## Fr-Mo 21.0045   3.318714 38.69029 0.0085579
## Sa-Mo 17.8423   0.156514 35.52809 0.0463876
## Su-Mo 23.2153   5.529514 40.90109 0.0021786
## We-Tu  0.7204 -16.965386 18.40619 0.9999997
## Th-Tu 21.2441   3.558314 38.92989 0.0074315
## Fr-Tu 22.0481   4.362314 39.73389 0.0045691
## Sa-Tu 18.8859   1.200114 36.57169 0.0275391
## Su-Tu 24.2589   6.573114 41.94469 0.0010864
## Th-We 20.5237   2.837914 38.20949 0.0112984
## Fr-We 21.3277   3.641914 39.01349 0.0070715
## Sa-We 18.1655   0.479714 35.85129 0.0396268
## Su-We 23.5385   5.852714 41.22429 0.0017622
## Fr-Th  0.8040 -16.881786 18.48979 0.9999995
## Sa-Th -2.3582 -20.043986 15.32759 0.9997075
## Su-Th  3.0148 -14.670986 20.70059 0.9988009
## Sa-Fr -3.1622 -20.847986 14.52359 0.9984289
## Su-Fr  2.2108 -15.474986 19.89659 0.9997990
## Su-Sa  5.3730 -12.312786 23.05879 0.9727959

Our one-way-Anova model predicts that there are significantly different stock prices in different days (p = 1.81e-07). The Tukey’s range test gives us the exact same results as the pairwise test even though the p values slightly differ (because they are corrected using a different “logic”)

5. Kruskall-Wallis Test

Compute a kruskall-wallis test to see if the non-parametric test shows stock price depended on day-of-week.

kruskal.test(stocks~days)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  stocks by days
## Kruskal-Wallis chi-squared = 36.113, df = 6, p-value = 2.621e-06

Similarly to the previous one-way-Anova, the kruskall-wallis test detects statistically significant differences of stock prices within the week (p = 2.621e-06). ## 6. Bayes Factor ANOVA Compute a one-way BayesFactor ANOVA and report the Bayes factor score determining if day-of-week impacted stock price.

#options("contrasts")
library(BayesFactor)
## Warning: package 'BayesFactor' was built under R version 4.0.2
## Loading required package: coda
## Warning: package 'coda' was built under R version 4.0.2
## Loading required package: Matrix
## ************
## Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
## 
## Type BFManual() to open the manual.
## ************
dat = data.frame(stocks,days)
bf = anovaBF(stocks~days,data = dat)

chains <- posterior(bf,iterations=10000)
chains[1:5,]
##            mu    days-Mo    days-Tu    days-We   days-Th   days-Fr   days-Sa
## [1,] 72.12510 -11.480711 -12.439192 -11.724618  8.543681  9.217479  6.194580
## [2,] 69.38709  -7.245098  -8.955825 -10.983874  5.969557  7.050878 10.426345
## [3,] 68.38855 -18.813957  -5.743782 -11.992990  5.194182 18.558269  4.291178
## [4,] 72.53927 -14.820836 -10.026128  -7.317038 10.247065  5.603544  6.816419
## [5,] 71.88645 -14.616078 -12.834101  -4.542650  8.779043  6.500464  6.701760
##        days-Su     sig2    g_days
## [1,] 11.688781 1611.467 0.0648013
## [2,]  3.738015 1777.272 0.1786362
## [3,]  8.507100 1735.346 0.1251258
## [4,]  9.496974 1899.829 0.1907061
## [5,] 10.011563 2046.888 0.1736627
new_levels = c('na',levels(days))
library(RColorBrewer)


par(mfrow=c(2,3))
for(i in 3:8){
  hist(chains[,i]-chains[,i-1],breaks=100,main = paste(new_levels[i],'-', new_levels[i-1],'at 1000 iterations','P(',new_levels[i],'>',new_levels[i-1],')','=',mean(chains[,i]>chains[,i-1])*100,'%'), xlab = 'stock price difference (10^3 $)' ,ylab = 'likelihood', col = brewer.pal(7, "Set1")[i])
  abline(v = 0,lwd = 3,col = 'red')
  abline(v = mean(chains[,i])-mean(chains[,i-1]),lwd=3,col = 'blue')
  legend('topright', legend = c('0 point','mean'), lty = 1, lwd = 3, col = c('red','blue'),bty = "n")
}

The one-way BayesFactor ANOVA reports an extreme evidence for the data to be under the Ha than the H0 (BF = 41865.66). Meaning that stock prices significantly different within the week (which is our alternative hypothesis Ha).