Data:

library(ggplot2)
data <- read.csv("ps10data.csv")
head(data)
ggplot(data,aes(x=day,y=stockprice)) + geom_point(aes(color=sector)) +theme_minimal()

DATAMEANS<- aggregate(data$stockprice,list(data$day),mean)
DATAMEANS## means data set
VECTOROFdays<-c("Mo","Tu","We","Th","Fr","Sa","Su") ##new order
factordays<-factor(data$day,levels = VECTOROFdays)

Thedays<-factor(VECTOROFdays,levels = VECTOROFdays)
values<-c(60.5618,59.5182,60.2386,80.7623,81.5663,78.4041,83.7771) ##mean stock price per day, Mo to Su (numeric)

Problem 1:

  1. First use a contrast that compares each day to Monday, and report which days had prices significantly higher than Monday (report the test obtained directly from the coefficients of lm by calling summary() on the results of lm()).

Solution:

Treatment contrasts are used, with Monday as the baseline level. The days with prices significantly higher than Monday are Thursday, Friday, Saturday and Sunday: their coefficients are positive and their p-values are below 0.05. Tuesday and Wednesday do not differ significantly from Monday (p = 0.86 and p = 0.96).

## Q1

contrasts(factordays)<-contr.treatment(levels(factordays))
summary(lm(data$stockprice~factordays))
## 
## Call:
## lm(formula = data$stockprice ~ factordays)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.497 -33.876  -0.053  36.118 103.973 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   60.5618     4.2292  14.320  < 2e-16 ***
## factordaysTu  -1.0436     5.9809  -0.174 0.861533    
## factordaysWe  -0.3232     5.9809  -0.054 0.956920    
## factordaysTh  20.2005     5.9809   3.377 0.000772 ***
## factordaysFr  21.0045     5.9809   3.512 0.000474 ***
## factordaysSa  17.8423     5.9809   2.983 0.002953 ** 
## factordaysSu  23.2153     5.9809   3.882 0.000114 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared:  0.05869,    Adjusted R-squared:  0.05054 
## F-statistic: 7.202 on 6 and 693 DF,  p-value: 1.808e-07
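
How the treatment-coded coefficients relate to the group means can be checked on a small simulated data set. The sketch below uses made-up data (the `toy` data frame and its values are assumptions, not taken from ps10data.csv) and shows that the intercept equals the baseline mean and each other coefficient equals that day's mean minus the Monday mean.

```r
# Simulated data (hypothetical); three days with known population means.
set.seed(42)
toy <- data.frame(
  day   = factor(rep(c("Mo", "Tu", "We"), each = 20),
                 levels = c("Mo", "Tu", "We")),
  price = c(rnorm(20, 60, 5), rnorm(20, 60, 5), rnorm(20, 80, 5))
)

# Treatment (dummy) coding: the first level, Mo, is the baseline.
contrasts(toy$day) <- contr.treatment(levels(toy$day))
fit <- lm(price ~ day, data = toy)

# Per-day sample means as a plain named vector.
group_means <- sapply(split(toy$price, toy$day), mean)
coefs <- coef(fit)

# Intercept = Mo mean; dayTu and dayWe = difference from the Mo mean.
print(coefs)
print(group_means - group_means["Mo"])
```

The p-value on each coefficient is therefore a test of "this day versus Monday", which is why summary() answers the question directly.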
  2. Then, use successive difference coding of the day variable to determine which days of the week differed significantly from the previous day.

Solution:

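Only Thursday differed significantly from the previous day (Th-We: estimate 20.52, p = 0.0006).

None of the other successive differences (Tu-Mo, We-Tu, Fr-Th, Sa-Fr, Su-Sa) has a p-value below 0.05. The significant intercept is the mean of the seven day means (72.12), not a Monday-Sunday comparison.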

library(MASS)

contrasts(factordays) <- contr.sdif(levels(factordays))

summary(lm(data$stockprice~factordays))
## 
## Call:
## lm(formula = data$stockprice ~ factordays)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.497 -33.876  -0.053  36.118 103.973 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      72.1183     1.5985  45.117  < 2e-16 ***
## factordaysTu-Mo  -1.0436     5.9809  -0.174 0.861533    
## factordaysWe-Tu   0.7204     5.9809   0.120 0.904162    
## factordaysTh-We  20.5237     5.9809   3.432 0.000636 ***
## factordaysFr-Th   0.8040     5.9809   0.134 0.893104    
## factordaysSa-Fr  -3.1622     5.9809  -0.529 0.597174    
## factordaysSu-Sa   5.3730     5.9809   0.898 0.369309    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared:  0.05869,    Adjusted R-squared:  0.05054 
## F-statistic: 7.202 on 6 and 693 DF,  p-value: 1.808e-07
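
The meaning of the successive-difference coefficients can be verified on simulated data. This is a minimal sketch with a made-up `toy` data frame (an assumption for illustration): each coefficient equals the difference between adjacent day means, and the intercept equals the unweighted mean of the day means.

```r
library(MASS)  # provides contr.sdif

# Simulated data (hypothetical); the jump is placed between We and Th.
set.seed(7)
toy <- data.frame(
  day   = factor(rep(c("Mo", "Tu", "We", "Th"), each = 15),
                 levels = c("Mo", "Tu", "We", "Th")),
  price = c(rnorm(15, 60, 5), rnorm(15, 60, 5),
            rnorm(15, 60, 5), rnorm(15, 80, 5))
)

# Successive difference coding: level k is compared with level k - 1.
contrasts(toy$day) <- contr.sdif(levels(toy$day))
fit_sdif <- lm(price ~ day, data = toy)

day_means  <- sapply(split(toy$price, toy$day), mean)
sdif_coefs <- coef(fit_sdif)

print(sdif_coefs)        # intercept, then Tu-Mo, We-Tu, Th-We
print(diff(day_means))   # adjacent differences of the day means
print(mean(day_means))   # matches the intercept
```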
##Q2
  3. Use the pairwise.t.test function to compute all pairwise t-tests with the Holm correction between days of the week. Describe concisely which days differed from which other days.

Solution:

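The days fall into two groups: Monday, Tuesday and Wednesday each differ significantly from Thursday, Friday, Saturday and Sunday (pairs Mo-Th, Mo-Fr, Mo-Sa, Mo-Su, Tu-Th, Tu-Fr, Tu-Sa, Tu-Su, We-Th, We-Fr, We-Sa, We-Su). No pair within either group differs (adjusted p = 1).

Since the Holm-adjusted p-values for these twelve pairs are all less than 0.05, the differences are significant.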

pairwise.t.test(data$stockprice,factordays)
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  data$stockprice and factordays 
## 
##    Mo     Tu     We     Th     Fr     Sa    
## Tu 1.0000 -      -      -      -      -     
## We 1.0000 1.0000 -      -      -      -     
## Th 0.0100 0.0066 0.0089 -      -      -     
## Fr 0.0071 0.0044 0.0066 1.0000 -      -     
## Sa 0.0295 0.0199 0.0272 1.0000 1.0000 -     
## Su 0.0022 0.0012 0.0018 1.0000 1.0000 1.0000
## 
## P value adjustment method: holm
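
The Holm adjustment reported above can be reproduced by hand on a few p-values. The sketch below uses three made-up raw p-values (assumptions, not values from this data set) and compares base R's p.adjust() with a manual step-down calculation.

```r
# Hypothetical raw p-values from three comparisons.
raw_p <- c(0.01, 0.04, 0.03)

# Built-in Holm adjustment (the default method in pairwise.t.test()).
holm_builtin <- p.adjust(raw_p, method = "holm")

# By hand: multiply the i-th smallest p-value by (m - i + 1),
# keep the sequence non-decreasing with a running maximum, cap at 1.
m       <- length(raw_p)
ord     <- order(raw_p)
stepped <- pmin(1, cummax((m - seq_len(m) + 1) * raw_p[ord]))

# Put the adjusted values back in the original order.
holm_manual <- numeric(m)
holm_manual[ord] <- stepped

print(holm_builtin)  # 0.03 0.06 0.06
print(holm_manual)   # same values
```

Because the smallest p-value is multiplied by the full number of comparisons, only the strongest effects survive when many pairs are tested, as with the 21 day pairs here.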
  4. Use an aov() model to predict stock price by day, and then compute the Tukey HSD test on all pairwise comparisons. Do the results differ from part 3?

Solution:

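No, the conclusions do not differ from part 3. The Tukey-adjusted p-values are numerically different from the Holm-adjusted ones (for example, Sa-Mo is 0.0464 here versus 0.0295 in part 3), but the same twelve pairs are significant at the 0.05 level.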

aov(data$stockprice~factordays)##aov test
## Call:
##    aov(formula = data$stockprice ~ factordays)
## 
## Terms:
##                 factordays Residuals
## Sum of Squares     77286.5 1239488.1
## Deg. of Freedom          6       693
## 
## Residual standard error: 42.29164
## Estimated effects may be unbalanced
TukeyHSD(aov(data$stockprice~factordays))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = data$stockprice ~ factordays)
## 
## $factordays
##          diff        lwr      upr     p adj
## Tu-Mo -1.0436 -18.729386 16.64219 0.9999976
## We-Mo -0.3232 -18.008986 17.36259 1.0000000
## Th-Mo 20.2005   2.514714 37.88629 0.0135627
## Fr-Mo 21.0045   3.318714 38.69029 0.0085579
## Sa-Mo 17.8423   0.156514 35.52809 0.0463876
## Su-Mo 23.2153   5.529514 40.90109 0.0021786
## We-Tu  0.7204 -16.965386 18.40619 0.9999997
## Th-Tu 21.2441   3.558314 38.92989 0.0074315
## Fr-Tu 22.0481   4.362314 39.73389 0.0045691
## Sa-Tu 18.8859   1.200114 36.57169 0.0275391
## Su-Tu 24.2589   6.573114 41.94469 0.0010864
## Th-We 20.5237   2.837914 38.20949 0.0112984
## Fr-We 21.3277   3.641914 39.01349 0.0070715
## Sa-We 18.1655   0.479714 35.85129 0.0396268
## Su-We 23.5385   5.852714 41.22429 0.0017622
## Fr-Th  0.8040 -16.881786 18.48979 0.9999995
## Sa-Th -2.3582 -20.043986 15.32759 0.9997075
## Su-Th  3.0148 -14.670986 20.70059 0.9988009
## Sa-Fr -3.1622 -20.847986 14.52359 0.9984289
## Su-Fr  2.2108 -15.474986 19.89659 0.9997990
## Su-Sa  5.3730 -12.312786 23.05879 0.9727959
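
Rather than comparing the two tables by eye, the set of significant pairs can be extracted from both results and compared. The following is a sketch on simulated data (the `toy` data frame is an assumption for illustration); the same steps should apply to the stock price data with `factordays` as the grouping factor.

```r
# Simulated data (hypothetical): two low days and two high days.
set.seed(3)
toy <- data.frame(
  day   = factor(rep(c("Mo", "Tu", "Th", "Fr"), each = 25),
                 levels = c("Mo", "Tu", "Th", "Fr")),
  price = c(rnorm(25, 60, 10), rnorm(25, 60, 10),
            rnorm(25, 80, 10), rnorm(25, 80, 10))
)

# Tukey HSD: named vector of adjusted p-values, names like "Tu-Mo".
tukey_p <- TukeyHSD(aov(price ~ day, data = toy))$day[, "p adj"]

# Holm: lower-triangular matrix; flatten it using the same pair names.
holm_mat <- pairwise.t.test(toy$price, toy$day)$p.value
idx      <- which(!is.na(holm_mat), arr.ind = TRUE)
holm_p   <- holm_mat[idx]
names(holm_p) <- paste(rownames(holm_mat)[idx[, "row"]],
                       colnames(holm_mat)[idx[, "col"]], sep = "-")

# Pairs flagged at the 0.05 level by each method.
tukey_sig <- sort(names(tukey_p)[tukey_p < 0.05])
holm_sig  <- sort(names(holm_p)[holm_p < 0.05])
print(tukey_sig)
print(holm_sig)
print(setequal(tukey_sig, holm_sig))
```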
  5. Compute a Kruskal-Wallis test to see whether the non-parametric test shows that stock price depends on day of week.

Solution:

Since the p-value of the test (2.621e-06) is less than 0.05, it can be concluded that stock price depends on day of the week.

kruskal.test(data$stockprice~data$day)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  data$stockprice by data$day
## Kruskal-Wallis chi-squared = 36.113, df = 6, p-value = 2.621e-06
  6. Compute a one-way BayesFactor ANOVA and report the Bayes factor score determining if day-of-week impacted stock price.

Solution:

The Bayes factor is 41865.66 in favor of the model with day over the intercept-only model. As the score is above 150, it can be concluded that there is very strong evidence that day of week affects stock price.

library(BayesFactor)
## Warning: package 'BayesFactor' was built under R version 3.6.2
## Loading required package: coda
## Warning: package 'coda' was built under R version 3.6.2
## Loading required package: Matrix
## ************
## Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
## 
## Type BFManual() to open the manual.
## ************
DF<-data.frame(data$stockprice,factordays)
colnames(DF)<-c('STOCKPRICE','DAYZ')
anovaBF(STOCKPRICE~DAYZ,data=DF)
## Bayes factor analysis
## --------------
## [1] DAYZ : 41865.66 ±0%
## 
## Against denominator:
##   Intercept only 
## ---
## Bayes factor type: BFlinearModel, JZS

Problem 2:

  1. The effect of sector on its own (a one-way test).

Solution:

Since the p-value (3.624e-05) is less than 0.05, it can be concluded that sector has an effect on stock price.

The F-value of the test = 17.282.

oneway.test(data$stockprice~data$sector)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  data$stockprice and data$sector
## F = 17.282, num df = 1.00, denom df = 697.67, p-value = 3.624e-05
model1<-lm(data$stockprice~data$sector)
anova(model1)
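
Note that oneway.test() defaults to the Welch test, which does not assume equal variances, so its F-value need not match the one from anova() on the lm fit. The sketch below uses simulated data (the `toy` data frame and the sector names are assumptions) to show that setting var.equal = TRUE reproduces the classical F statistic.

```r
# Simulated data (hypothetical): two sectors with unequal spread and size.
set.seed(11)
toy <- data.frame(
  sector = factor(rep(c("tech", "energy"), times = c(30, 40))),
  price  = c(rnorm(30, 70, 8), rnorm(40, 75, 15))
)

# Default: Welch test, variances not assumed equal.
welch <- oneway.test(price ~ sector, data = toy)

# Classical one-way ANOVA F test with a pooled variance.
pooled <- oneway.test(price ~ sector, data = toy, var.equal = TRUE)

# F statistic from the linear model for comparison.
lm_f <- anova(lm(price ~ sector, data = toy))[["F value"]][1]

print(unname(welch$statistic))   # Welch F
print(unname(pooled$statistic))  # pooled F
print(lm_f)                      # equals the pooled F
```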
  2. Whether sector has an effect after day-of-week is considered: lm(stockprice~day+sector)

Solution:

Sector has an effect after day of the week is considered, as its p-value is less than 0.05. Also, the F-value of sector is higher than that of day of week.

model2<-lm(data$stockprice~data$day+data$sector)
anova(model2)
  3. Whether the results differ if sector is included in the model first (lm(stockprice~sector+day)).

Solution:

Although the F-values and p-values differ slightly, sector still has a stronger effect than day.

Sector also has an effect when it is entered before day of the week, as its p-value is less than 0.05, and its F-value is again higher than that of day of week.

model3<-lm(data$stockprice~data$sector+data$day)
anova(model3)

  4. Then compare the results of the three tests, including the sum-squared deviations and the results of the F test. Are the results of the tests identical or do they differ? Why? Pick which one you would prefer to use to test the effect, and describe why you feel it is better than the others.

Solution:

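The three tests lead to the same conclusion, that sector has a significant effect, but the values differ slightly, so the results are not numerically identical. The one-way test ignores day entirely, while anova() on an lm fit reports sequential sums of squares: each term is tested after the terms listed before it, so the sum of squares and F-value for sector depend on whether day enters the model first. The order changes the values only to the extent that day and sector overlap in the data.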

Picking lm(data$stockprice~data$day+data$sector) is better: it tests sector after day of week has been accounted for, and its F-value and sum of squares are higher than in the other models.
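
Whether term order matters in anova() can be checked on a small example. anova() on an lm fit reports sequential (Type I) sums of squares, so when the predictors are unbalanced the sum of squares for a term changes with its position. The sketch below uses simulated, deliberately unbalanced data (the `toy` data frame and effect sizes are assumptions, not the stock data).

```r
# Simulated data (hypothetical): sector B is over-represented on We,
# so day and sector overlap in what they explain.
set.seed(1)
toy <- data.frame(
  day    = factor(rep(c("Mo", "Tu", "We"), times = c(10, 20, 30))),
  sector = factor(c(rep("A", 8),  rep("B", 2),
                    rep("A", 10), rep("B", 10),
                    rep("A", 5),  rep("B", 25)))
)
toy$price <- 50 + 5 * (toy$day == "We") + 10 * (toy$sector == "B") +
  rnorm(nrow(toy), sd = 5)

# Same predictors, different order.
day_first    <- anova(lm(price ~ day + sector, data = toy))
sector_first <- anova(lm(price ~ sector + day, data = toy))

# Sum of squares credited to sector in each ordering.
ss_sector_last  <- day_first["sector", "Sum Sq"]
ss_sector_first <- sector_first["sector", "Sum Sq"]
print(c(after_day = ss_sector_last, before_day = ss_sector_first))

# The total (explained + residual) is the same either way.
print(c(sum(day_first[["Sum Sq"]]), sum(sector_first[["Sum Sq"]])))
```

If the design were perfectly balanced, the two orderings would give the same sum of squares for each term, which is one reason the differences in a near-balanced data set are small.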