Data:
library(ggplot2)
data <- read.csv("ps10data.csv")
head(data)
ggplot(data,aes(x=day,y=stockprice)) + geom_point(aes(color=sector)) +theme_minimal()
DATAMEANS<- aggregate(data$stockprice,list(data$day),mean)
DATAMEANS## means data set
VECTOROFdays<-c("Mo","Tu","We","Th","Fr","Sa","Su") ##new order
factordays<-factor(data$day,levels = VECTOROFdays)
Thedays<-factor(VECTOROFdays,levels = VECTOROFdays)
values<-c(60.5618,59.5182,60.2386,80.7623,81.5663,78.4041,83.7771) ##mean stock price per day, Mo to Su (numeric, not character)
Problem 1:
Solution:
Treatment contrasts are used, with Monday as the baseline level. The days with prices significantly higher than Monday are Thursday, Friday, Saturday and Sunday: each has a positive estimate and a p-value below 0.05. Tuesday and Wednesday do not differ significantly from Monday.
## Q1
contrasts(factordays)<-contr.treatment(levels(factordays))
summary(lm(data$stockprice~factordays))
##
## Call:
## lm(formula = data$stockprice ~ factordays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90.497 -33.876 -0.053 36.118 103.973
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.5618 4.2292 14.320 < 2e-16 ***
## factordaysTu -1.0436 5.9809 -0.174 0.861533
## factordaysWe -0.3232 5.9809 -0.054 0.956920
## factordaysTh 20.2005 5.9809 3.377 0.000772 ***
## factordaysFr 21.0045 5.9809 3.512 0.000474 ***
## factordaysSa 17.8423 5.9809 2.983 0.002953 **
## factordaysSu 23.2153 5.9809 3.882 0.000114 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared: 0.05869, Adjusted R-squared: 0.05054
## F-statistic: 7.202 on 6 and 693 DF, p-value: 1.808e-07
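As a small illustration of how treatment-contrast coefficients are read, the sketch below fits the same kind of model to made-up data. The `toy` data frame and its prices are hypothetical and are not taken from ps10data.csv. Under treatment contrasts the intercept should equal the baseline (Monday) mean, and each remaining coefficient should equal that day's mean minus Monday's mean.

```r
# Hypothetical toy data: 3 days with 4 prices each (not the homework data)
toy <- data.frame(
  day   = factor(rep(c("Mo", "Tu", "We"), each = 4), levels = c("Mo", "Tu", "We")),
  price = c(10, 12, 11, 13,  20, 22, 21, 23,  15, 17, 16, 18)
)

# Fit with treatment contrasts requested explicitly
fit <- lm(price ~ day, data = toy, contrasts = list(day = "contr.treatment"))

# Group means as a plain named vector
group_means <- sapply(split(toy$price, toy$day), mean)

coef(fit)                          # intercept, then Tu and We relative to Mo
group_means - group_means[["Mo"]]  # the same differences computed by hand
```

The same reading applies to the output above: 60.5618 is the Monday mean and, for example, 20.2005 is the Thursday mean minus the Monday mean.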
Solution:
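Only Thursday differed significantly from the previous day (Th-We: estimate 20.5237, p = 0.000636). All other successive differences have p-values above 0.05, so they are not significant.
The Monday versus Sunday comparison is not one of the successive-difference contrasts; the intercept row is the overall mean, not that comparison.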
library(MASS)
contrasts(factordays) <- contr.sdif(levels(factordays))
summary(lm(data$stockprice~factordays))
##
## Call:
## lm(formula = data$stockprice ~ factordays)
##
## Residuals:
## Min 1Q Median 3Q Max
## -90.497 -33.876 -0.053 36.118 103.973
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.1183 1.5985 45.117 < 2e-16 ***
## factordaysTu-Mo -1.0436 5.9809 -0.174 0.861533
## factordaysWe-Tu 0.7204 5.9809 0.120 0.904162
## factordaysTh-We 20.5237 5.9809 3.432 0.000636 ***
## factordaysFr-Th 0.8040 5.9809 0.134 0.893104
## factordaysSa-Fr -3.1622 5.9809 -0.529 0.597174
## factordaysSu-Sa 5.3730 5.9809 0.898 0.369309
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.29 on 693 degrees of freedom
## Multiple R-squared: 0.05869, Adjusted R-squared: 0.05054
## F-statistic: 7.202 on 6 and 693 DF, p-value: 1.808e-07
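The successive-difference estimates can be checked by hand from the daily means listed in the `values` vector above. The sketch below is only arithmetic on those seven numbers; the name `day_means` is introduced here for illustration. It assumes an equal number of observations per day, in which case the intercept under these contrasts is the plain average of the day means.

```r
# Daily mean prices, Mo to Su, copied from the values vector above
day_means <- c(Mo = 60.5618, Tu = 59.5182, We = 60.2386, Th = 80.7623,
               Fr = 81.5663, Sa = 78.4041, Su = 83.7771)

# Each day minus the previous day
succ_diff <- diff(day_means)
names(succ_diff) <- paste(names(day_means)[-1], names(day_means)[-7], sep = "-")
round(succ_diff, 4)

# Average of the seven day means (compare with the intercept row)
round(mean(day_means), 4)
```

The differences reproduce the Estimate column (for example Th-We = 20.5237), and the average reproduces the intercept 72.1183.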
##Q2
Solution:
The pairs that differ significantly are: Mo-Th, Mo-Fr, Mo-Sa, Mo-Su, Tu-Th, Tu-Fr, Tu-Sa, Tu-Su, We-Th, We-Fr, We-Sa, We-Su.
For each of these pairs the Holm-adjusted p-value is less than 0.05, so the difference is significant.
pairwise.t.test(data$stockprice,factordays)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: data$stockprice and factordays
##
## Mo Tu We Th Fr Sa
## Tu 1.0000 - - - - -
## We 1.0000 1.0000 - - - -
## Th 0.0100 0.0066 0.0089 - - -
## Fr 0.0071 0.0044 0.0066 1.0000 - -
## Sa 0.0295 0.0199 0.0272 1.0000 1.0000 -
## Su 0.0022 0.0012 0.0018 1.0000 1.0000 1.0000
##
## P value adjustment method: holm
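To show what the Holm adjustment reported above does, here is a minimal sketch using `p.adjust()` on three made-up raw p-values. These are hypothetical and are not the raw p-values behind the table. Holm multiplies the smallest p-value by the number of tests, the next smallest by one fewer, and so on, then keeps the adjusted values from decreasing.

```r
# Hypothetical raw p-values for three comparisons
raw_p  <- c(0.01, 0.04, 0.03)
holm_p <- p.adjust(raw_p, method = "holm")
holm_p   # 0.03 0.06 0.06

# Number of pairwise comparisons among 7 days
n_pairs <- choose(7, 2)
n_pairs  # 21
```

With seven days there are 21 comparisons, which is why the adjusted p-values in the table are larger than unadjusted ones would be.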
Solution:
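No, the result does not differ from Part 3. The same 12 pairs have adjusted p-values below 0.05 under Tukey's HSD, although the adjusted p-values themselves are slightly different.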
aov(data$stockprice~factordays)##aov test
## Call:
## aov(formula = data$stockprice ~ factordays)
##
## Terms:
## factordays Residuals
## Sum of Squares 77286.5 1239488.1
## Deg. of Freedom 6 693
##
## Residual standard error: 42.29164
## Estimated effects may be unbalanced
TukeyHSD(aov(data$stockprice~factordays))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = data$stockprice ~ factordays)
##
## $factordays
## diff lwr upr p adj
## Tu-Mo -1.0436 -18.729386 16.64219 0.9999976
## We-Mo -0.3232 -18.008986 17.36259 1.0000000
## Th-Mo 20.2005 2.514714 37.88629 0.0135627
## Fr-Mo 21.0045 3.318714 38.69029 0.0085579
## Sa-Mo 17.8423 0.156514 35.52809 0.0463876
## Su-Mo 23.2153 5.529514 40.90109 0.0021786
## We-Tu 0.7204 -16.965386 18.40619 0.9999997
## Th-Tu 21.2441 3.558314 38.92989 0.0074315
## Fr-Tu 22.0481 4.362314 39.73389 0.0045691
## Sa-Tu 18.8859 1.200114 36.57169 0.0275391
## Su-Tu 24.2589 6.573114 41.94469 0.0010864
## Th-We 20.5237 2.837914 38.20949 0.0112984
## Fr-We 21.3277 3.641914 39.01349 0.0070715
## Sa-We 18.1655 0.479714 35.85129 0.0396268
## Su-We 23.5385 5.852714 41.22429 0.0017622
## Fr-Th 0.8040 -16.881786 18.48979 0.9999995
## Sa-Th -2.3582 -20.043986 15.32759 0.9997075
## Su-Th 3.0148 -14.670986 20.70059 0.9988009
## Sa-Fr -3.1622 -20.847986 14.52359 0.9984289
## Su-Fr 2.2108 -15.474986 19.89659 0.9997990
## Su-Sa 5.3730 -12.312786 23.05879 0.9727959
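The width of the Tukey intervals can be roughly reproduced from the aov output. The sketch below assumes a balanced design with 100 observations per day. That figure is an assumption, inferred from 693 residual degrees of freedom plus 7 groups and from every interval having the same width. The variable names are illustrative only.

```r
k        <- 7          # number of days
df_resid <- 693        # residual degrees of freedom from the aov output
sigma    <- 42.29164   # residual standard error from the aov output
n_per    <- 100        # assumption: 700 observations split evenly over 7 days

# Half-width of a 95% Tukey interval for a difference of two means
half_width <- qtukey(0.95, nmeans = k, df = df_resid) * sigma / sqrt(n_per)
half_width

# Half-width read from the Tu-Mo row of the table: upr - diff
16.64219 - (-1.0436)
```

If the assumption holds, the two numbers should agree closely, and any pair whose absolute difference exceeds this half-width has an interval that excludes zero.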
Solution:
Since the p-value of the Kruskal-Wallis test (2.621e-06) is less than 0.05, it can be concluded that price depends on the day of the week.
kruskal.test(data$stockprice~data$day)
##
## Kruskal-Wallis rank sum test
##
## data: data$stockprice by data$day
## Kruskal-Wallis chi-squared = 36.113, df = 6, p-value = 2.621e-06
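The reported p-value can be checked from the test statistic. This is a small sketch with illustrative variable names: the Kruskal-Wallis statistic is compared with a chi-squared distribution with 6 degrees of freedom (7 days minus 1), and the p-value is the upper-tail probability.

```r
kw_stat <- 36.113   # Kruskal-Wallis chi-squared from the output above
kw_df   <- 6        # degrees of freedom from the output above

# Upper-tail probability of the chi-squared distribution
p_val <- pchisq(kw_stat, df = kw_df, lower.tail = FALSE)
p_val
```

The result should match the printed p-value of 2.621e-06 up to rounding of the statistic.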
Solution:
The Bayes factor is 41865.66 in favor of the day-of-week model over the intercept-only model. As this is well above 150, it can be concluded that there is very strong evidence that price depends on day of the week.
library(BayesFactor)
DF<-data.frame(data$stockprice,factordays)
colnames(DF)<-c('STOCKPRICE','DAYZ')
anovaBF(STOCKPRICE~DAYZ,data=DF)
## Bayes factor analysis
## --------------
## [1] DAYZ : 41865.66 ±0%
##
## Against denominator:
## Intercept only
## ---
## Bayes factor type: BFlinearModel, JZS
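To put the size of this Bayes factor in perspective, the sketch below converts it to a log10 scale and to a posterior probability. The posterior probability assumes equal prior odds for the two models, which is an assumption made here for illustration and not something stated in the analysis.

```r
bf <- 41865.66          # Bayes factor from the output above

# Orders of magnitude of evidence
log10(bf)

# Posterior probability of the day-of-week model,
# assuming equal prior odds (illustrative assumption)
post_prob <- bf / (1 + bf)
post_prob

# Comparison with the 150 cutoff used in the solution
bf > 150
```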
Problem 2:
Solution:
Since the p-value (3.624e-05) is less than 0.05, it can be concluded that sector has an effect on stock price.
The F-value of the test is 17.282. Note that oneway.test() does not assume equal variances by default.
oneway.test(data$stockprice~data$sector)
##
## One-way analysis of means (not assuming equal variances)
##
## data: data$stockprice and data$sector
## F = 17.282, num df = 1.00, denom df = 697.67, p-value = 3.624e-05
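With num df = 1 there are only two sectors, and in that case the unequal-variance one-way test is equivalent to a Welch two-sample t-test, with F equal to t squared. The sketch below illustrates this on made-up data; the `toy` data frame and its sector labels are hypothetical.

```r
# Hypothetical toy data: two sectors with unequal group sizes
toy <- data.frame(
  sector = factor(rep(c("A", "B"), times = c(5, 6))),
  price  = c(10, 12, 9, 14, 11,  15, 18, 13, 20, 16, 17)
)

welch_f <- oneway.test(price ~ sector, data = toy)  # var.equal = FALSE by default
welch_t <- t.test(price ~ sector, data = toy)       # Welch t-test by default

unname(welch_f$statistic)              # F statistic
unname(welch_t$statistic)^2            # t squared, should match F
unname(welch_f$parameter["denom df"])  # denominator df
unname(welch_t$parameter)              # Welch df, should match
```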
model1<-lm(data$stockprice~data$sector)
anova(model1)
Solution:
Sector has an effect after day of the week is taken into account, as its p-value is less than 0.05. Also, the F-value for sector is higher than that for day of the week.
model2<-lm(data$stockprice~data$day+data$sector)
anova(model2)
Solution:
Although the F-values and p-values differ slightly from those of the previous model, sector still has a larger effect than day: its p-value is less than 0.05 and its F-value is higher than that for day of the week.
model3<-lm(data$stockprice~data$sector+data$day)
anova(model3)
4. Then compare results of the three tests, including the sum-squared deviations and the results of the F test. Are the results of the tests identical or do they differ? Why? Pick which one you would prefer to use to test the effect, and describe why you feel it is better than the others.
Solution:
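The tests lead to the same conclusion, although the sums of squares and F-values differ slightly. model2 and model3 contain the same predictors, so they give the same fit; the values differ because anova() on an lm fit reports sequential sums of squares, which depend on the order in which the terms are entered.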
lm(data$stockprice~data$day+data$sector) is the preferred model: entering day first means sector is tested after day of the week has been accounted for, and its F-value and sum of squares are higher than in the other models.
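The effect of term order on anova() can be seen on a small unbalanced example. The sketch below uses made-up data; the `toy` data frame and the factors `a` and `b` are hypothetical stand-ins for sector and day. The residual sum of squares is the same for both orders, but the sum of squares credited to each factor changes.

```r
# Hypothetical unbalanced toy data with two crossed factors
toy <- data.frame(
  a = factor(c("x", "x", "x", "y", "y", "y", "y", "y")),
  b = factor(c("p", "p", "q", "p", "q", "q", "q", "p")),
  y = c(1, 2, 4, 3, 6, 7, 5, 4)
)

fit_ab <- lm(y ~ a + b, data = toy)   # a entered first
fit_ba <- lm(y ~ b + a, data = toy)   # b entered first

ss_a_first <- anova(fit_ab)["a", "Sum Sq"]          # a ignoring b
ss_a_last  <- anova(fit_ba)["a", "Sum Sq"]          # a after adjusting for b
rss_ab     <- anova(fit_ab)["Residuals", "Sum Sq"]
rss_ba     <- anova(fit_ba)["Residuals", "Sum Sq"]

c(a_first = ss_a_first, a_last = ss_a_last)  # 13.33 versus 7
c(rss_ab = rss_ab, rss_ba = rss_ba)          # both 3
```

In a perfectly balanced design the two orders would give the same sums of squares, which may be why the differences in the homework data are only slight.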