Question1) Explore the data.
1-c) From the visualization alone, do you feel that media type makes
a difference on intention to share?
Although the average response leans toward agreeing to share for three
of the media types, and sits at roughly 50/50 for the fourth, I do not
observe a meaningful difference across the four media types.
This could have several causes; for example, the video may not have
been engaging enough for the participants. In short, people appear
fairly neutral when deciding whether to share this information with
someone else, regardless of media type.
Question2) Traditional one-way ANOVA.
2-a) State the null and alternative hypotheses when comparing
INTEND.0 across four groups in ANOVA.
Null Hypothesis (H0): The mean INTEND.0 is the same for all four media
types (μ1 = μ2 = μ3 = μ4).
Alternative Hypothesis (H1): At least one media type has a mean
INTEND.0 that differs from the others.
2-b-i) Show the code and results of computing MSTR, MSE, and F.
# Combine the four samples into one data frame
n_i <- c(length(pls1$INTEND.0), length(pls2$INTEND.0),
         length(pls3$INTEND.0), length(pls4$INTEND.0))  # 42, 38, 40, 46
data <- data.frame(value = c(pls1$INTEND.0, pls2$INTEND.0, pls3$INTEND.0, pls4$INTEND.0),
                   group = rep(1:4, n_i))
# Calculate the overall (grand) mean
grand_mean <- mean(data$value)
# Sum of squares due to treatments (SSTR): size-weighted squared deviations of the group means
SSTR <- sum((tapply(data$value, data$group, mean) - grand_mean)^2 * n_i)
# Mean square due to treatments (MSTR)
k <- length(unique(data$group))
df_mstr <- k - 1
MSTR <- SSTR / df_mstr
# Sum of squares due to error (SSE): pooled within-group variation
SSE <- sum(tapply(data$value, data$group, var) * (n_i - 1))
# Mean square due to error (MSE)
nT <- length(data$value)
df_mse <- nT - k
MSE <- SSE / df_mse
# F-statistic
F_stat <- MSTR / MSE
# Print the results
cat("MSTR:", MSTR, "\n")
## MSTR: 7.507617
cat("MSE:", MSE, "\n")
## MSE: 2.869151
cat("F:", F_stat, "\n")
## F: 2.616669
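For reference, the code above implements the standard one-way ANOVA decomposition, with k = 4 groups, n_i observations, mean ȳ_i and standard deviation s_i in group i, grand mean ȳ, and n_T = 166 observations in total. Plugging in the printed mean squares recovers the sums of squares and F:

```latex
\begin{aligned}
\mathrm{SSTR} &= \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2, \qquad
\mathrm{MSTR} = \frac{\mathrm{SSTR}}{k-1} = \frac{\mathrm{SSTR}}{3} = 7.5076
  \;\Rightarrow\; \mathrm{SSTR} \approx 22.52 \\
\mathrm{SSE} &= \sum_{i=1}^{k} (n_i - 1)s_i^2, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n_T-k} = \frac{\mathrm{SSE}}{162} = 2.8692
  \;\Rightarrow\; \mathrm{SSE} \approx 464.8 \\
F &= \frac{\mathrm{MSTR}}{\mathrm{MSE}} = \frac{7.5076}{2.8692} \approx 2.617
\end{aligned}
```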
2-b-ii) Compute the p-value of F, from the null F-distribution; is
the F-value significant? If so, state your conclusion for the
hypotheses.
p_value <- pf(F_stat, df_mstr, df_mse, lower.tail=FALSE);p_value
## [1] 0.05289015
Under the null hypothesis the statistic follows an F(3, 162)
distribution. The observed F = 2.617 is below the critical value at the
0.05 significance level, and equivalently its p-value (0.0529) is
greater than 0.05. Therefore, we cannot reject the null hypothesis at
the 5% level, although the result is borderline.
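The comparison with the critical value can be made explicit. A minimal sketch, with the F statistic and degrees of freedom copied from the results above and an assumed significance level of alpha = 0.05:

```r
# Observed statistic and degrees of freedom copied from the output above
F_obs <- 2.616669
df1   <- 3      # k - 1
df2   <- 162    # n_T - k
alpha <- 0.05   # assumed significance level

# Reject H0 only if F_obs exceeds the (1 - alpha) quantile of F(3, 162)
F_crit <- qf(1 - alpha, df1, df2)
# Upper-tail p-value; F_obs < F_crit is equivalent to p_val > alpha
p_val  <- pf(F_obs, df1, df2, lower.tail = FALSE)

cat("F critical:", F_crit, " p-value:", p_val, "\n")
```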
2-c) Conduct the same one-way ANOVA using the aov() function in R –
confirm that you got similar results.
anova_model <- aov( data$value ~ factor(data$group))
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(data$group) 3 22.5 7.508 2.617 0.0529 .
## Residuals 162 464.8 2.869
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
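The aov() table reproduces the manual calculation: Mean Sq 7.508 and 2.869, F = 2.617 and p = 0.0529 match MSTR, MSE, F and the p-value above up to rounding. The check can also be done programmatically. The sketch below wraps the manual formulas in a hypothetical helper `manual_anova()` and compares it with `summary(aov())`; it uses the built-in `PlantGrowth` data and a simulated unbalanced sample as stand-ins, since the media data frames are not reproduced here.

```r
# Hypothetical helper: one-way ANOVA mean squares and F computed by hand
manual_anova <- function(value, group) {
  group <- factor(group)
  n_i   <- tapply(value, group, length)
  k     <- nlevels(group)
  n_T   <- length(value)
  SSTR  <- sum(n_i * (tapply(value, group, mean) - mean(value))^2)
  SSE   <- sum((n_i - 1) * tapply(value, group, var))
  MSTR  <- SSTR / (k - 1)
  MSE   <- SSE / (n_T - k)
  c(MSTR = MSTR, MSE = MSE, F = MSTR / MSE)
}

# Compare with the "Mean Sq" and "F value" columns of summary(aov());
# rows are indexed by position because aov() pads the row names
check_against_aov <- function(value, group) {
  tab <- summary(aov(value ~ factor(group)))[[1]]
  aov_vals <- c(MSTR = tab[1, "Mean Sq"], MSE = tab[2, "Mean Sq"],
                F = tab[1, "F value"])
  all.equal(manual_anova(value, group), aov_vals)
}

# Stand-in data: built-in PlantGrowth, plus a simulated unbalanced design
check_against_aov(PlantGrowth$weight, PlantGrowth$group)
set.seed(1)
v <- rnorm(30)
g <- rep(1:3, c(8, 10, 12))
check_against_aov(v, g)
```

For this analysis, `check_against_aov(data$value, data$group)` should likewise return TRUE.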
2-d) Conduct a post-hoc Tukey test to see if any pairs of media have
significantly different means – what do you find?
TukeyHSD(anova_model, conf.level = 0.05)
## Tukey multiple comparisons of means
## 5% family-wise confidence level
##
## Fit: aov(formula = data$value ~ factor(data$group))
##
## $`factor(data$group)`
## diff lwr upr p adj
## 2-1 -0.86215539 -1.06562977 -0.6586810 0.1085727
## 3-1 -0.08452381 -0.28530983 0.1162622 0.9959223
## 4-1 0.08178054 -0.11218249 0.2757436 0.9959032
## 3-2 0.77763158 0.57175512 0.9835080 0.1825044
## 4-2 0.94393593 0.74470805 1.1431638 0.0573229
## 4-3 0.16630435 -0.03017708 0.3627858 0.9687417
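Based on the adjusted p-values, no pair of media types differs
significantly at the 0.05 level. The largest differences all involve
media 2, which has the lowest mean (4-2: diff = 0.944, p adj = 0.057;
2-1: p adj = 0.109; 3-2: p adj = 0.183), so media 2 tends to score
lower, but not significantly so. Note that conf.level = 0.05 requests
5% rather than 95% intervals, which is why the lwr/upr bounds above are
so narrow; the p adj column does not depend on conf.level.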
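A minimal sketch of the call with the conventional 95% family-wise level, shown on the built-in `PlantGrowth` data as a stand-in. It also checks two properties of `TukeyHSD()`: the `p adj` column is the same whatever `conf.level` is, and at `conf.level = 0.95` an interval excludes zero exactly when its adjusted p-value is below 0.05.

```r
# One-way ANOVA on built-in data (stand-in for the media data)
fit <- aov(weight ~ group, data = PlantGrowth)

# Conventional 95% family-wise intervals
tk95 <- TukeyHSD(fit, conf.level = 0.95)$group
# A 5% level only narrows the intervals
tk05 <- TukeyHSD(fit, conf.level = 0.05)$group

# Adjusted p-values do not depend on conf.level
same_p <- isTRUE(all.equal(tk95[, "p adj"], tk05[, "p adj"]))

# At the 95% level, "interval excludes 0" agrees with "p adj < 0.05"
excludes_zero <- tk95[, "lwr"] > 0 | tk95[, "upr"] < 0
agree <- identical(unname(excludes_zero), unname(tk95[, "p adj"] < 0.05))

print(tk95)
```

For the media data the corresponding call would be `TukeyHSD(anova_model, conf.level = 0.95)`; its `p adj` column would be unchanged from the output above.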
2-e) Do you feel the classic requirements of one-way ANOVA were
met?
shapiro.test(pls1$INTEND.0)
##
## Shapiro-Wilk normality test
##
## data: pls1$INTEND.0
## W = 0.91279, p-value = 0.003557
shapiro.test(pls2$INTEND.0)
##
## Shapiro-Wilk normality test
##
## data: pls2$INTEND.0
## W = 0.92974, p-value = 0.01969
shapiro.test(pls3$INTEND.0)
##
## Shapiro-Wilk normality test
##
## data: pls3$INTEND.0
## W = 0.88247, p-value = 0.0006139
shapiro.test(pls4$INTEND.0)
##
## Shapiro-Wilk normality test
##
## data: pls4$INTEND.0
## W = 0.89611, p-value = 0.0006242
A valid one-way ANOVA assumes independent observations, equal variances
across groups, and approximately normally distributed data within each
group; here I check the normality assumption.
However, the Shapiro-Wilk test (which tests the null hypothesis that a
sample comes from a normal distribution) gives p-values below 0.05 for
all four groups (0.0036, 0.0197, 0.0006, and 0.0006), so we reject
normality in every group.
Therefore, since the normality assumption is violated, a nonparametric
alternative such as the Kruskal-Wallis test may be more appropriate than
ANOVA.
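The checks above test normality group by group. Equal variances across groups is also a classic one-way ANOVA requirement, and normality can alternatively be checked once on the pooled residuals. A minimal sketch using base R `stats` functions, on `PlantGrowth` as stand-in data; for this analysis one would substitute `residuals(anova_model)` and `value ~ factor(group), data = data`.

```r
# Stand-in one-way ANOVA fit (replace with the media data)
fit <- aov(weight ~ group, data = PlantGrowth)

# Normality of the pooled residuals (one test instead of four)
sw <- shapiro.test(residuals(fit))

# Homogeneity of variances: Bartlett's test assumes normality,
# the Fligner-Killeen test is rank-based and more robust to non-normality
bt <- bartlett.test(weight ~ group, data = PlantGrowth)
fk <- fligner.test(weight ~ group, data = PlantGrowth)

# p-values of the three checks
sapply(list(shapiro = sw, bartlett = bt, fligner = fk), function(t) t$p.value)
```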
Question3) Non-parametric Kruskal-Wallis test.
3-a) State the null and alternative hypotheses
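Null Hypothesis (H0): All four media types have the same distribution
of INTEND.0, so a value drawn at random from one group is equally likely
to be larger or smaller than a value drawn at random from another.
Alternative Hypothesis (H1): At least one group tends to produce larger
values than another when values are drawn at random.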
3-b-i) Compute (an approximate) Kruskal-Wallis H ourselves. Show the
code and results of computing H.
media_ranks <- rank(data$value)
group_ranks <- split(media_ranks,data$group)
sapply(group_ranks, sum)
## 1 2 3 4
## 3693.5 2421.0 3556.0 4190.5
N <- length(data$value)
H <- 12/(N*(N+1)) * sum(tapply(media_ranks,data$group,sum)^2/tapply(data$value,data$group,FUN = length)) - 3*(N+1);H
## [1] 8.45466
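For reference, the code above evaluates the Kruskal-Wallis statistic without a tie correction (tied values receive mid-ranks), where R_i is the rank sum and n_i the size of group i, and N = 166. Plugging in the rank sums printed above, which add up to N(N+1)/2 = 13861 as they should:

```latex
\begin{aligned}
H &= \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \\
\sum_{i=1}^{4} \frac{R_i^2}{n_i}
  &= \frac{3693.5^2}{42} + \frac{2421^2}{38} + \frac{3556^2}{40} + \frac{4190.5^2}{46} \\
  &\approx 324808.15 + 154243.18 + 316128.40 + 381745.44 = 1176925.17 \\
H &\approx \frac{12}{166 \cdot 167} \times 1176925.17 - 3 \cdot 167
   \approx 509.455 - 501 = 8.455
\end{aligned}
```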
3-b-ii) Compute the p-value of H, from the null chi-square
distribution; is the H value significant?
kw_p <- pchisq(H, df = k - 1, lower.tail = FALSE); kw_p
## [1] 0.03749292
The p-value of H is 0.0375, which is smaller than 0.05, so H is
significant at the 5% level.
Therefore, we reject the null hypothesis and conclude that at least one
media type tends to produce larger INTEND.0 values than another.
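The same decision can be read off the critical value of the null chi-square distribution with k - 1 = 3 degrees of freedom: H is significant at the 5% level exactly when it exceeds that value. A minimal sketch, with H copied from the output above:

```r
# Uncorrected H and degrees of freedom from the computation above
H_obs <- 8.45466
df_h  <- 3   # k - 1

# 95% quantile of the chi-square(3) distribution (about 7.815)
chi_crit <- qchisq(0.95, df_h)
# Upper-tail p-value; H_obs > chi_crit is equivalent to p_val < 0.05
p_val <- pchisq(H_obs, df_h, lower.tail = FALSE)

cat("chi-square critical:", chi_crit, " p-value:", p_val, "\n")
```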
3-c) Conduct the same test using the kruskal.test() function in
R.
kruskal.test(value ~ group, data = data)
##
## Kruskal-Wallis rank sum test
##
## data: value by group
## Kruskal-Wallis chi-squared = 8.8283, df = 3, p-value = 0.03166
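kruskal.test() reports 8.8283 (p = 0.0317) rather than the 8.4547 computed by hand because it divides H by a correction for tied values, 1 - sum(t_j^3 - t_j) / (N^3 - N), where t_j is the size of the j-th set of tied values. A minimal sketch of the corrected statistic (the helper name `kw_h_tie_corrected` is hypothetical), checked against `kruskal.test()` on a small tied stand-in sample:

```r
# Hypothetical helper: Kruskal-Wallis H with the tie correction
kw_h_tie_corrected <- function(value, group) {
  N   <- length(value)
  r   <- rank(value)               # mid-ranks for tied values
  R_i <- tapply(r, group, sum)     # rank sum per group
  n_i <- tapply(r, group, length)  # group sizes
  H   <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
  t_j <- table(value)              # sizes of the sets of tied values
  H / (1 - sum(t_j^3 - t_j) / (N^3 - N))
}

# Small tied stand-in sample (not the media data)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6)
g <- rep(c("a", "b", "c"), each = 4)
c(manual = kw_h_tie_corrected(x, g),
  kruskal.test = unname(kruskal.test(x, g)$statistic))
```

Applied to the media data, `kw_h_tie_corrected(data$value, data$group)` should reproduce the chi-squared value reported by kruskal.test().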
3-d) Conduct a post-hoc Dunn test to see if the values of any pairs
of media are significantly different – what are your conclusions?
require(FSA)
## Loading required package: FSA
## Warning: package 'FSA' was built under R version 4.2.3
## ## FSA v0.9.4. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.
dunnTest(value ~ group, data = data, method = "bonferroni")
## Warning: group was coerced to a factor.
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Bonferroni method.
## Comparison Z P.unadj P.adj
## 1 1 - 2 2.30087819 0.021398517 0.12839110
## 2 1 - 3 -0.09233644 0.926430736 1.00000000
## 3 2 - 3 -2.36408588 0.018074622 0.10844773
## 4 1 - 4 -0.31452459 0.753122646 1.00000000
## 5 2 - 4 -2.65613380 0.007904225 0.04742535
## 6 3 - 4 -0.21613379 0.828883460 1.00000000
Based on the Bonferroni-adjusted p-values, only the comparison between
media 2 and media 4 is significant at the 0.05 level (Z = -2.656,
adjusted p = 0.047), and only marginally so; media 2 has the lower mean
rank. No other pair of media types differs significantly.
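The Bonferroni adjustment in the table multiplies each unadjusted p-value by the number of comparisons (here 6) and caps the result at 1. A minimal sketch reproducing the P.adj column from P.unadj with base R `p.adjust()`, using the values copied from the dunnTest() output above:

```r
# Unadjusted Dunn p-values copied from the dunnTest() output above
p_unadj <- c("1 - 2" = 0.021398517, "1 - 3" = 0.926430736,
             "2 - 3" = 0.018074622, "1 - 4" = 0.753122646,
             "2 - 4" = 0.007904225, "3 - 4" = 0.828883460)

# Bonferroni via p.adjust() (keeps the comparison names)
p_bonf <- p.adjust(p_unadj, method = "bonferroni")
# Same thing by hand: multiply by the number of comparisons, cap at 1
manual <- pmin(p_unadj * length(p_unadj), 1)

round(p_bonf, 4)
names(p_bonf)[p_bonf < 0.05]   # "2 - 4"
```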
In conclusion, intention to share is broadly similar across the four
media types; apart from the marginal difference between media 2 and
media 4, no media type stands out.