pls1 <- read.csv("C:/R-language/BACS/pls-media/pls-media1.csv")
pls2 <- read.csv("C:/R-language/BACS/pls-media/pls-media2.csv")
pls3 <- read.csv("C:/R-language/BACS/pls-media/pls-media3.csv")
pls4 <- read.csv("C:/R-language/BACS/pls-media/pls-media4.csv")

Question1) Explore the data.

1-a) What are the means of viewers’ intentions to share (INTEND.0) on each of the four media types?

cat("The means of viewers' intentions on the media1:",mean(pls1$INTEND.0))
## The means of viewers' intentions on the media1: 4.809524
cat("The means of viewers' intentions on the media2:",mean(pls2$INTEND.0))
## The means of viewers' intentions on the media2: 3.947368
cat("The means of viewers' intentions on the media3:",mean(pls3$INTEND.0))
## The means of viewers' intentions on the media3: 4.725
cat("The means of viewers' intentions on the media4:",mean(pls4$INTEND.0))
## The means of viewers' intentions on the media4: 4.891304
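
The four calls above can also be collapsed into a single one. This is a minimal sketch, assuming the four data frames are gathered into a named list; the small vectors below are made-up stand-ins for the INTEND.0 columns, not the real data.

```r
# Hypothetical stand-ins for pls1..pls4; the real data come from the CSV files.
media <- list(
  media1 = data.frame(INTEND.0 = c(5, 6, 4, 5)),
  media2 = data.frame(INTEND.0 = c(3, 4, 5, 4)),
  media3 = data.frame(INTEND.0 = c(6, 4, 5, 5)),
  media4 = data.frame(INTEND.0 = c(5, 5, 6, 4))
)

# One mean per media type, returned as a named numeric vector.
intend_means <- sapply(media, function(df) mean(df$INTEND.0))
print(intend_means)
```

With the real data, the list would be built as list(media1 = pls1, media2 = pls2, media3 = pls3, media4 = pls4).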

1-b) Visualize the distribution and mean of intention to share, across all four media.

plot(density(pls1$INTEND.0), col="black", lwd=2, xlim=c(1, 7),ylim=c(0,0.5),main="The distribution of the four media")
lines(density(pls2$INTEND.0), col="coral3", lwd=2)
lines(density(pls3$INTEND.0), col="yellow3", lwd=2)
lines(density(pls4$INTEND.0), col="cornflowerblue", lwd=2)
legend(1,0.5, lty=1, c("media1", "media2", "media3", "media4"), col=c("black","coral3","yellow3","cornflowerblue"))
abline(v=mean(pls1$INTEND.0),col="black",lwd=3)
abline(v=mean(pls2$INTEND.0),col="coral3",lwd=3)
abline(v=mean(pls3$INTEND.0),col="yellow3",lwd=3)
abline(v=mean(pls4$INTEND.0),col="cornflowerblue",lwd=3)

1-c) From the visualization alone, do you feel that media type makes a difference on intention to share?

Three of the media types have means slightly above the midpoint of the plotted 1-to-7 range (roughly 4.7 to 4.9), while media2 sits almost exactly at the midpoint (3.95). From the plot alone, however, the four distributions overlap heavily, and I do not see a clear difference across the four media types.

Many factors could contribute to this; for example, the video may not have been attractive enough to the participants. In short, people appear to be roughly neutral about sharing this information with others, regardless of media type.

Question2) Traditional one-way ANOVA.

2-a) State the null and alternative hypotheses when comparing INTEND.0 across four groups in ANOVA.

Null Hypothesis (H0): The mean intention to share (INTEND.0) is the same for all four media types.

Alternative Hypothesis (H1): At least one media type has a mean intention to share that differs from the others.

2-b-i) Show the code and results of computing MSTR, MSE, and F.

# Combine the datasets into one data frame
data <- data.frame(value = c(pls1$INTEND.0, pls2$INTEND.0, pls3$INTEND.0, pls4$INTEND.0),
                   group = rep(1:4, c(42, 38, 40, 46)))

# Calculate the overall mean
grand_mean <- mean(data$value)

# Calculate the sum of squares due to treatments (SSTR)
SSTR <- sum((tapply(data$value, data$group, mean) - grand_mean)^2 * c(42, 38, 40, 46))

# Calculate the mean square due to treatments (MSTR)
k <- length(unique(data$group))
df_mstr <- k - 1
MSTR <- SSTR / df_mstr

# Calculate the sum of squares due to error (SSE)
SSE <- sum((tapply(data$value, data$group, sd)^2) * (c(42, 38, 40, 46) - 1))

# Calculate the mean square due to error (MSE)
nT <- length(data$value)
df_mse <- nT - k
MSE <- SSE / df_mse

# Calculate the F-statistic
F_stat <- MSTR / MSE

# Print the results
cat("MSTR:", MSTR, "\n")
## MSTR: 7.507617
cat("MSE:", MSE, "\n")
## MSE: 2.869151
cat("F:", F_stat, "\n")
## F: 2.616669
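
In formula form, the quantities computed above can be written as follows. The notation is my own shorthand rather than anything given in the assignment: n_j, x̄_j and s_j² are the size, mean and sample variance of group j, x̄ is the grand mean, k = 4 is the number of groups, and n_T = 166 is the total number of observations.

```latex
\begin{aligned}
\mathrm{SSTR} &= \sum_{j=1}^{k} n_j \,(\bar{x}_j - \bar{x})^2,
  & \mathrm{MSTR} &= \frac{\mathrm{SSTR}}{k - 1}, \\
\mathrm{SSE} &= \sum_{j=1}^{k} (n_j - 1)\, s_j^2,
  & \mathrm{MSE} &= \frac{\mathrm{SSE}}{n_T - k}, \\
F &= \frac{\mathrm{MSTR}}{\mathrm{MSE}}.
\end{aligned}
```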

2-b-ii) Compute the p-value of F, from the null F-distribution; is the F-value significant? If so, state your conclusion for the hypotheses.

p_value <- pf(F_stat, df_mstr, df_mse, lower.tail=FALSE);p_value
## [1] 0.05289015

Under the null hypothesis, F follows an F-distribution with (3, 162) degrees of freedom. The observed F-value of 2.617 is less than the 5% critical value, and equivalently its p-value (0.0529) is greater than the 0.05 significance level we set. Therefore, we cannot reject the null hypothesis.
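
The comparison with the critical value can be checked directly. This is a small sketch that plugs in the F-value and degrees of freedom reported above; the variable names F_crit, df1 and df2 are introduced here only for illustration.

```r
# Values reported above: F = 2.616669 with (3, 162) degrees of freedom.
F_stat <- 2.616669
df1 <- 3
df2 <- 162

# 5% critical value of the null F-distribution.
F_crit <- qf(0.95, df1, df2)

# Upper-tail p-value for the observed F.
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)

cat("Critical F:", F_crit, "\n")
cat("Reject H0 at 5%:", F_stat > F_crit, "\n")
```

The two decision rules always agree: F exceeds the critical value exactly when the p-value falls below 0.05.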

2-c) Conduct the same one-way ANOVA using the aov() function in R – confirm that you got similar results.

anova_model <- aov( data$value ~ factor(data$group))
summary(anova_model)
##                     Df Sum Sq Mean Sq F value Pr(>F)  
## factor(data$group)   3   22.5   7.508   2.617 0.0529 .
## Residuals          162  464.8   2.869                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2-d) Conduct a post-hoc Tukey test to see if any pairs of media have significantly different means – what do you find?

TukeyHSD(anova_model, conf.level = 0.05)
##   Tukey multiple comparisons of means
##     5% family-wise confidence level
## 
## Fit: aov(formula = data$value ~ factor(data$group))
## 
## $`factor(data$group)`
##            diff         lwr        upr     p adj
## 2-1 -0.86215539 -1.06562977 -0.6586810 0.1085727
## 3-1 -0.08452381 -0.28530983  0.1162622 0.9959223
## 4-1  0.08178054 -0.11218249  0.2757436 0.9959032
## 3-2  0.77763158  0.57175512  0.9835080 0.1825044
## 4-2  0.94393593  0.74470805  1.1431638 0.0573229
## 4-3  0.16630435 -0.03017708  0.3627858 0.9687417

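None of the pairwise differences is significant at the 5% level: every adjusted p-value is above 0.05. The closest is media4 vs. media2 (p adj = 0.057), followed by media2 vs. media1 (0.109) and media3 vs. media2 (0.183), so media2 tends to score lower than the other media, but not significantly so. Note that the lwr and upr columns above are 5% intervals, because the call sets conf.level = 0.05; they should not be read as 95% confidence intervals. The adjusted p-values do not depend on conf.level.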
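
TukeyHSD() takes the confidence level of the intervals as conf.level, not the significance level, so conventional 95% intervals correspond to conf.level = 0.95 (the default). The sketch below illustrates the difference on synthetic data; the data frame demo and its values are made up and are not the media data.

```r
set.seed(1)
# Synthetic stand-in data (not the media data): four groups of 30 values.
demo <- data.frame(
  value = c(rnorm(30, 5), rnorm(30, 4), rnorm(30, 5), rnorm(30, 5)),
  group = factor(rep(1:4, each = 30))
)
fit <- aov(value ~ group, data = demo)

tukey_95 <- TukeyHSD(fit, conf.level = 0.95)  # conventional 95% intervals
tukey_05 <- TukeyHSD(fit, conf.level = 0.05)  # much narrower 5% intervals

# Adjusted p-values are identical; only lwr/upr change with conf.level.
print(all.equal(tukey_95$group[, "p adj"], tukey_05$group[, "p adj"]))
print(tukey_95)
```

Significance should be judged from the p adj column or from the 95% intervals, since a 5% interval can exclude zero even when the difference is far from significant.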

2-e) Do you feel the classic requirements of one-way ANOVA were met?

shapiro.test(pls1$INTEND.0)
## 
##  Shapiro-Wilk normality test
## 
## data:  pls1$INTEND.0
## W = 0.91279, p-value = 0.003557
shapiro.test(pls2$INTEND.0)
## 
##  Shapiro-Wilk normality test
## 
## data:  pls2$INTEND.0
## W = 0.92974, p-value = 0.01969
shapiro.test(pls3$INTEND.0)
## 
##  Shapiro-Wilk normality test
## 
## data:  pls3$INTEND.0
## W = 0.88247, p-value = 0.0006139
shapiro.test(pls4$INTEND.0)
## 
##  Shapiro-Wilk normality test
## 
## data:  pls4$INTEND.0
## W = 0.89611, p-value = 0.0006242

A classic assumption of one-way ANOVA, alongside independent observations and similar variances across groups, is that the data within each group are normally distributed.

However, the Shapiro-Wilk test (a simple test of whether data are normally distributed) gives p-values below 0.05 for all four groups, so normality is rejected in every group.

Since this assumption is violated, an alternative to ANOVA, such as a nonparametric test, may be more appropriate.
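
The four Shapiro-Wilk calls can also be run in one step on a long-format data frame. This is a minimal sketch on synthetic data; demo stands in for the combined data frame used in Question 2, and its values are made up.

```r
set.seed(42)
# Synthetic stand-in: a long-format data frame with a value and a group column.
demo <- data.frame(
  value = c(rnorm(40), rexp(40), rnorm(40), runif(40)),
  group = rep(1:4, each = 40)
)

# Shapiro-Wilk p-value for each group in one call.
sw_p <- tapply(demo$value, demo$group, function(x) shapiro.test(x)$p.value)
print(round(sw_p, 4))

# Groups whose normality is rejected at the 5% level.
print(names(sw_p)[sw_p < 0.05])
```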

Question3) Non-parametric Kruskal Wallis test.

3-a) State the null and alternative hypotheses

Null Hypothesis (H0): All groups would give you a similar value if randomly drawn from them.

Alternative Hypothesis (H1): At least one group would give you a larger value than another if randomly drawn.

3-b-i) Compute (an approximate) Kruskal Wallis H ourselves. Show the code and results of computing H

media_ranks <- rank(data$value)
group_ranks <- split(media_ranks,data$group)
sapply(group_ranks, sum)
##      1      2      3      4 
## 3693.5 2421.0 3556.0 4190.5
N <- length(data$value)
H <- 12/(N*(N+1)) * sum(tapply(media_ranks,data$group,sum)^2/tapply(data$value,data$group,FUN = length)) - 3*(N+1);H
## [1] 8.45466

3-b-ii) Compute the p-value of H, from the null chi-square distribution; is the H value significant?

kw_p <- 1 - pchisq(H, df=k-1);kw_p
## [1] 0.03749292

The p-value of H is 0.0375, which is smaller than 0.05, so the H value is significant.

Therefore, we reject the null hypothesis and conclude that at least one group would give you a larger value than another if randomly drawn.

3-c) Conduct the same test using the kruskal.test() function in R.

kruskal.test(value ~ group, data = data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by group
## Kruskal-Wallis chi-squared = 8.8283, df = 3, p-value = 0.03166
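
The statistic reported here (8.83) is larger than the hand-computed H (8.45) because kruskal.test() applies a correction for ties, which the approximate formula omits. The sketch below shows the correction on synthetic data with many ties; demo and the helper names H_raw, H_tie and t_sizes are introduced here for illustration only.

```r
set.seed(7)
# Synthetic stand-in: integer scores from 1 to 7, so ties are frequent.
demo <- data.frame(
  value = sample(1:7, 120, replace = TRUE),
  group = rep(1:4, each = 30)
)

N <- nrow(demo)
r <- rank(demo$value)                      # mid-ranks for tied values
rank_sums <- tapply(r, demo$group, sum)
n_j <- tapply(r, demo$group, length)

# Uncorrected H, the same formula as the hand computation.
H_raw <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_j) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N),
# where t is the size of each set of tied values.
t_sizes <- table(demo$value)
H_tie <- H_raw / (1 - sum(t_sizes^3 - t_sizes) / (N^3 - N))

cat("H (raw):", H_raw, "\n")
cat("H (tie-corrected):", H_tie, "\n")
cat("kruskal.test():", kruskal.test(value ~ group, data = demo)$statistic, "\n")
```

Because the correction factor is at most 1, the corrected statistic is never smaller than the raw one, which matches the direction of the gap seen above.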

3-d) Conduct a post-hoc Dunn test to see if the values of any pairs of media are significantly different – what are your conclusions?

require(FSA)
## Loading required package: FSA
## Warning: package 'FSA' was built under R version 4.2.3
## ## FSA v0.9.4. See citation('FSA') if used in publication.
## ## Run fishR() for related website and fishR('IFAR') for related book.
dunnTest(value ~ group, data = data, method = "bonferroni")
## Warning: group was coerced to a factor.
## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Bonferroni method.
##   Comparison           Z     P.unadj      P.adj
## 1      1 - 2  2.30087819 0.021398517 0.12839110
## 2      1 - 3 -0.09233644 0.926430736 1.00000000
## 3      2 - 3 -2.36408588 0.018074622 0.10844773
## 4      1 - 4 -0.31452459 0.753122646 1.00000000
## 5      2 - 4 -2.65613380 0.007904225 0.04742535
## 6      3 - 4 -0.21613379 0.828883460 1.00000000

After the Bonferroni adjustment, the only pair that differs significantly at the 5% level is media2 vs. media4 (P.adj = 0.047), and only marginally so. No other pair of media shows a significant difference.
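
The P.adj column can be reproduced from P.unadj: with six pairwise comparisons, the Bonferroni method multiplies each p-value by 6 and caps the result at 1. A short sketch using the unadjusted p-values from the output above:

```r
# Unadjusted p-values from the Dunn test output above, in the same row order.
p_unadj <- c(0.021398517, 0.926430736, 0.018074622,
             0.753122646, 0.007904225, 0.828883460)
names(p_unadj) <- c("1 - 2", "1 - 3", "2 - 3", "1 - 4", "2 - 4", "3 - 4")

# Bonferroni: multiply by the number of comparisons (6) and cap at 1.
p_adj <- p.adjust(p_unadj, method = "bonferroni")
print(p_adj)

# Comparisons that stay below 0.05 after adjustment.
print(names(p_adj)[p_adj < 0.05])
```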

In conclusion, I would say that the intention to share is broadly similar across the groups, with media2 vs. media4 the only pairwise difference that reaches significance.