ken <- c(21, 32, 38, 40, 48, 55, 63, 66, 70, 75, 80, 84, 86, 90, 90, 93, 95, 98, 100, 105, 106, 108, 115, 118, 126, 128, 130, 142, 145, 155)
ken
## [1] 21 32 38 40 48 55 63 66 70 75 80 84 86 90 90 93 95 98 100
## [20] 105 106 108 115 118 126 128 130 142 145 155
Null Hypothesis: The data is normally distributed.
Alternative Hypothesis: The data is not normally distributed.
mean(ken)
## [1] 90.06667
sd(ken)
## [1] 34.79292
ks.test(ken, "pnorm")
## Warning in ks.test.default(ken, "pnorm"): ties should not be present for the
## Kolmogorov-Smirnov test
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: ken
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided
Since the p-value is less than 0.05, we reject the null hypothesis; that is, the data are not normally distributed.
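As an optional cross-check (a sketch; output not shown here), the same data could be compared against a normal distribution with the sample mean and standard deviation rather than the standard normal that "pnorm" defaults to; note that estimating these parameters from the same data makes the resulting p-value only approximate.
# Sketch: test against N(mean(ken), sd(ken)) instead of N(0, 1)
ks.test(ken, "pnorm", mean = mean(ken), sd = sd(ken))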
kenneth <- c(9, 10, 8, 4, 8, 3, 0, 10, 15, 9)
kenneth
## [1] 9 10 8 4 8 3 0 10 15 9
ks.test(kenneth, "pnorm")
## Warning in ks.test.default(kenneth, "pnorm"): ties should not be present for the
## Kolmogorov-Smirnov test
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: kenneth
## D = 0.89865, p-value = 1.934e-07
## alternative hypothesis: two-sided
The same justification as in item 1 applies: the p-value is less than 0.05, so we reject the null hypothesis and conclude that the data are not normally distributed.
Null Hypothesis: The median number of times he sees each of his patients during the year is five.
Alternative Hypothesis: The median number of times he sees each of his patients during the year is not five.
hist(kenneth)
boxplot(kenneth, ylab = "Score")
round(summary(kenneth), digits = 2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 5.00 8.50 7.60 9.75 15.00
From the boxplot and the descriptive statistics above, we see that the mean and median of the scores in our sample are 7.60 and 8.50, respectively.
wilcox.test(kenneth,
  mu = 5 # hypothesized median (the default is mu = 0)
)
## Warning in wilcox.test.default(kenneth, mu = 5): cannot compute exact p-value
## with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: kenneth
## V = 44, p-value = 0.1016
## alternative hypothesis: true location is not equal to 5
Based on the results of the test (at the 0.05 significance level), we do not reject the null hypothesis that the median number of times he sees a patient is five; we cannot conclude that this median is significantly different from 5.
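The same decision rule can be written in code, mirroring the if/else pattern used for the normality checks later in this document; this is a sketch and its printed message is not part of the output above.
# Sketch: automate the decision at the 0.05 significance level
wsr <- wilcox.test(kenneth, mu = 5)
if (wsr$p.value > 0.05) {
  print("Do not reject the null hypothesis: the median is not significantly different from 5.")
} else {
  print("Reject the null hypothesis: the median is significantly different from 5.")
}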
library(readxl)
library(rmarkdown)
Mann <- read_excel("Finals3.xlsx")
paged_table(Mann)
normality_check <- shapiro.test(Mann$Group1)
if (normality_check$p.value > 0.05){
  print("The data comes from a population that is normally distributed. Please check the result of two sample t test.")
} else {
  print("The data is not normally distributed. Please check the result of Mann-Whitney U test.")
}
## [1] "The data is not normally distributed. Please check the result of Mann-Whitney U test."
normality_check <- shapiro.test(Mann$Group2)
if (normality_check$p.value > 0.05){
  print("The data comes from a population that is normally distributed. Please check the result of two sample t test.")
} else {
  print("The data is not normally distributed. Please check the result of Mann-Whitney U test.")
}
## [1] "The data comes from a population that is normally distributed. Please check the result of two sample t test."
Null Hypothesis: The median of the population that Group 1 represents equals the median of the population that Group 2 represents.
Alternative Hypothesis: The median of the population that Group 1 represents does not equal the median of the population that Group 2 represents.
head(Mann)
## # A tibble: 5 × 2
## Group1 Group2
## <dbl> <dbl>
## 1 11 11
## 2 1 11
## 3 0 5
## 4 2 8
## 5 0 4
str(Mann)
## tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
## $ Group1: num [1:5] 11 1 0 2 0
## $ Group2: num [1:5] 11 11 5 8 4
summary(Mann)
## Group1 Group2
## Min. : 0.0 Min. : 4.0
## 1st Qu.: 0.0 1st Qu.: 5.0
## Median : 1.0 Median : 8.0
## Mean : 2.8 Mean : 7.8
## 3rd Qu.: 2.0 3rd Qu.:11.0
## Max. :11.0 Max. :11.0
wilcox.test(Mann$Group1, Mann$Group2)
## Warning in wilcox.test.default(Mann$Group1, Mann$Group2): cannot compute exact
## p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: Mann$Group1 and Mann$Group2
## W = 4, p-value = 0.08969
## alternative hypothesis: true location shift is not equal to 0
Since the p-value (0.08969) is greater than the significance level of 0.05, we do not have sufficient evidence to say that the median level of depression of the patients administered the antidepressant drug differs from that of the placebo group.
wilcox.test(Mann$Group1, Mann$Group2, alternative = "less")
## Warning in wilcox.test.default(Mann$Group1, Mann$Group2, alternative = "less"):
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: Mann$Group1 and Mann$Group2
## W = 4, p-value = 0.04484
## alternative hypothesis: true location shift is less than 0
Since the p-value (0.04484) is less than the significance level of 0.05, we have sufficient evidence to say that the level of depression of the patients administered the antidepressant drug was lower than that of the patients in the placebo group. Hence, the antidepressant drug appears to be effective.
The null hypothesis can only be rejected if the researcher employs a directional alternative hypothesis that predicts a lower degree of depression in the group receiving the antidepressant medication (Group 1).
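To make that contrast explicit, the two p-values can be pulled out and placed side by side (a sketch; output not shown).
# Sketch: compare the two-sided and directional p-values
p_two <- wilcox.test(Mann$Group1, Mann$Group2)$p.value
p_less <- wilcox.test(Mann$Group1, Mann$Group2, alternative = "less")$p.value
c(two_sided = p_two, directional = p_less)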
library(readxl)
library(rmarkdown)
library(dplyr)   # %>% and group_by()
library(tidyr)   # gather()
library(rstatix) # convert_as_factor(), friedman_test(), wilcox_test(), etc.
library(ggpubr)  # ggboxplot(), stat_pvalue_manual()
Friedman <- read_excel("Finals4.xlsx")
paged_table(Friedman)
ks.test(Friedman$Condition1, "pnorm")
## Warning in ks.test.default(Friedman$Condition1, "pnorm"): ties should not be
## present for the Kolmogorov-Smirnov test
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: Friedman$Condition1
## D = 1, p-value = 1.229e-05
## alternative hypothesis: two-sided
ks.test(Friedman$Condition2, "pnorm")
## Warning in ks.test.default(Friedman$Condition2, "pnorm"): ties should not be
## present for the Kolmogorov-Smirnov test
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: Friedman$Condition2
## D = 1, p-value = 1.229e-05
## alternative hypothesis: two-sided
ks.test(Friedman$Condition3, "pnorm")
## Warning in ks.test.default(Friedman$Condition3, "pnorm"): ties should not be
## present for the Kolmogorov-Smirnov test
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: Friedman$Condition3
## D = 0.97725, p-value = 2.108e-05
## alternative hypothesis: two-sided
Null Hypothesis: The populations that Condition 1, Condition 2, and Condition 3 represent all have the same median.
Alternative Hypothesis: There is a difference between at least two of the three population medians.
kyle <- Friedman %>%
  gather(key = "Condition", value = "score", Condition1, Condition2, Condition3) %>%
  convert_as_factor(Subject, Condition)
head(kyle,6)
## # A tibble: 6 × 3
## Subject Condition score
## <fct> <fct> <dbl>
## 1 1 Condition1 9
## 2 2 Condition1 10
## 3 3 Condition1 7
## 4 4 Condition1 10
## 5 5 Condition1 7
## 6 6 Condition1 8
summary(Friedman)
## Subject Condition1 Condition2 Condition3
## Min. :1.00 Min. : 7.00 Min. :5.00 Min. :2.000
## 1st Qu.:2.25 1st Qu.: 7.25 1st Qu.:5.25 1st Qu.:3.250
## Median :3.50 Median : 8.50 Median :6.50 Median :5.000
## Mean :3.50 Mean : 8.50 Mean :6.50 Mean :4.833
## 3rd Qu.:4.75 3rd Qu.: 9.75 3rd Qu.:7.75 3rd Qu.:6.750
## Max. :6.00 Max. :10.00 Max. :8.00 Max. :7.000
kyle %>%
  group_by(Condition) %>%
  get_summary_stats(score, type = "common")
## # A tibble: 3 × 11
## Condition variable n min max median iqr mean sd se ci
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Condition1 score 6 7 10 8.5 2.5 8.5 1.38 0.563 1.45
## 2 Condition2 score 6 5 8 6.5 2.5 6.5 1.38 0.563 1.45
## 3 Condition3 score 6 2 7 5 3.5 4.83 2.14 0.872 2.24
ggboxplot(kyle, x = "Condition", y = "score", add = "jitter")
res.fried <- kyle %>% friedman_test(score ~ Condition |Subject)
res.fried
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <dbl> <dbl> <chr>
## 1 score 6 11.6 2 0.00308 Friedman test
There was a statistically significant difference in how much learning was inhibited depending on which type of noise was introduced while the subjects memorized the nonsense syllables, χ²(2) = 11.57, p = 0.0031.
kyle %>% friedman_effsize(score ~ Condition |Subject)
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 score 6 0.964 Kendall W large
A large effect size is detected, W = 0.9637681.
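For reference, Kendall's W can also be recovered from the Friedman statistic, since W = chi-squared / (n * (k - 1)); a quick check (a sketch; output not shown):
# Sketch: 11.56522 / (6 * (3 - 1)) is approximately 0.964
res.fried$statistic / (6 * (3 - 1))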
pwc <- kyle %>% wilcox_test(score ~ Condition, paired = TRUE, p.adjust.method = "bonferroni")
pwc
## # A tibble: 3 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 score Condition1 Condition2 6 6 21 0.02 0.059 ns
## 2 score Condition1 Condition3 6 6 21 0.035 0.105 ns
## 3 score Condition2 Condition3 6 6 15 0.057 0.17 ns
Pairwise Wilcoxon signed rank tests between conditions revealed no statistically significant differences in score after Bonferroni adjustment: Condition1 vs Condition2 (adjusted p = 0.059), Condition1 vs Condition3 (adjusted p = 0.105), and Condition2 vs Condition3 (adjusted p = 0.170).
# pairwise comparisons using sign test
pwc2 <- kyle %>% sign_test(score ~ Condition, p.adjust.method = "bonferroni")
pwc2
## # A tibble: 3 × 10
## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.si…¹
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 score Condition1 Condition2 6 6 6 6 0.031 0.094 ns
## 2 score Condition1 Condition3 6 6 6 6 0.031 0.094 ns
## 3 score Condition2 Condition3 6 6 5 5 0.062 0.188 ns
## # … with abbreviated variable name ¹p.adj.signif
# visualization: box plots with p-values
pwc <- pwc %>% add_xy_position(x = "Condition")
ggboxplot(kyle, x = "Condition", y = "score", add = "point") +
  stat_pvalue_manual(pwc, hide.ns = TRUE) +
  labs(subtitle = get_test_label(res.fried, detailed = TRUE),
       caption = get_pwc_label(pwc)
  )
Therefore, the data indicate that noise influenced subjects’ performance.
In this problem, the smaller margin by which the computed Friedman test statistic exceeds its tabled critical value reflects the fact that, as a general rule (assuming that none of the assumptions of the analysis of variance is saliently violated), the Friedman test is less powerful against an alternative hypothesis than the analysis of variance.
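For comparison only (a sketch; output not shown, and not part of the analysis above), the parametric counterpart would be a repeated-measures ANOVA on the same long-format data, which rstatix can fit as follows:
# Sketch: repeated-measures ANOVA with Subject as the within-subjects identifier
kyle %>% anova_test(dv = score, wid = Subject, within = Condition)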
library(readxl)
ruaya <- read_excel("Finals5.xlsx")
ruaya
## # A tibble: 5 × 4
## Subject Group1 Group2 Group3
## <dbl> <dbl> <dbl> <dbl>
## 1 1 8 7 4
## 2 2 10 8 8
## 3 3 9 5 7
## 4 4 10 8 5
## 5 5 9 5 7
shapiro.test(ruaya$Group1)
##
## Shapiro-Wilk normality test
##
## data: ruaya$Group1
## W = 0.88104, p-value = 0.314
shapiro.test(ruaya$Group2)
##
## Shapiro-Wilk normality test
##
## data: ruaya$Group2
## W = 0.80299, p-value = 0.08569
shapiro.test(ruaya$Group3)
##
## Shapiro-Wilk normality test
##
## data: ruaya$Group3
## W = 0.91367, p-value = 0.4899
kenken <- ruaya %>%
  gather(key = "Group", value = "score", Group1, Group2, Group3) %>%
  convert_as_factor(Subject, Group)
head(kenken,5)
## # A tibble: 5 × 3
## Subject Group score
## <fct> <fct> <dbl>
## 1 1 Group1 8
## 2 2 Group1 10
## 3 3 Group1 9
## 4 4 Group1 10
## 5 5 Group1 9
head(kenken)
## # A tibble: 6 × 3
## Subject Group score
## <fct> <fct> <dbl>
## 1 1 Group1 8
## 2 2 Group1 10
## 3 3 Group1 9
## 4 4 Group1 10
## 5 5 Group1 9
## 6 1 Group2 7
summary(ruaya)
## Subject Group1 Group2 Group3
## Min. :1 Min. : 8.0 Min. :5.0 Min. :4.0
## 1st Qu.:2 1st Qu.: 9.0 1st Qu.:5.0 1st Qu.:5.0
## Median :3 Median : 9.0 Median :7.0 Median :7.0
## Mean :3 Mean : 9.2 Mean :6.6 Mean :6.2
## 3rd Qu.:4 3rd Qu.:10.0 3rd Qu.:8.0 3rd Qu.:7.0
## Max. :5 Max. :10.0 Max. :8.0 Max. :8.0
set.seed(12345)
kenken %>% sample_n_by(Group, size = 5)
## # A tibble: 15 × 3
## Subject Group score
## <fct> <fct> <dbl>
## 1 3 Group1 9
## 2 4 Group1 10
## 3 2 Group1 10
## 4 5 Group1 9
## 5 1 Group1 8
## 6 2 Group2 8
## 7 1 Group2 7
## 8 3 Group2 5
## 9 5 Group2 5
## 10 4 Group2 8
## 11 2 Group3 8
## 12 5 Group3 7
## 13 3 Group3 7
## 14 4 Group3 5
## 15 1 Group3 4
kenken1 <- kenken %>%
  group_by(Group) %>%
  get_summary_stats(score, type = "common")
kenken1
## # A tibble: 3 × 11
## Group variable n min max median iqr mean sd se ci
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Group1 score 5 8 10 9 1 9.2 0.837 0.374 1.04
## 2 Group2 score 5 5 8 7 3 6.6 1.52 0.678 1.88
## 3 Group3 score 5 4 8 7 2 6.2 1.64 0.735 2.04
ggboxplot(kenken, x = "Group", y = "score")
Null Hypothesis: The populations that Group 1, Group 2, and Group 3 represent all have the same median.
Alternative Hypothesis: There is a difference between at least two of the three population medians.
res.kruskal <- kenken %>% kruskal_test(score ~ Group)
res.kruskal
## # A tibble: 1 × 6
## .y. n statistic df p method
## * <chr> <int> <dbl> <int> <dbl> <chr>
## 1 score 15 8.75 2 0.0126 Kruskal-Wallis
Since the p-value (0.0126) is less than 0.05, we conclude that there is a difference between at least two of the three population medians.
kenken %>% kruskal_effsize(score ~ Group)
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 score 15 0.562 eta2[H] large
A large effect size is detected, eta2[H] = 0.562.
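For reference, eta2[H] can be recovered from the Kruskal-Wallis statistic as (H - k + 1) / (n - k); a quick check (a sketch; output not shown):
# Sketch: (8.75 - 3 + 1) / (15 - 3) is approximately 0.562
(res.kruskal$statistic - 3 + 1) / (15 - 3)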
pwc <- kenken %>%
  dunn_test(score ~ Group, p.adjust.method = "bonferroni")
pwc
## # A tibble: 3 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 score Group1 Group2 5 5 -2.34 0.0193 0.0578 ns
## 2 score Group1 Group3 5 5 -2.74 0.00621 0.0186 *
## 3 score Group2 Group3 5 5 -0.396 0.692 1 ns
Only the Group1 vs Group3 comparison is statistically significant, with an adjusted p-value of 0.0186.
# pairwise comparisons using Wilcoxon's test
pwc2 <- kenken %>%
  wilcox_test(score ~ Group, p.adjust.method = "bonferroni")
pwc2
## # A tibble: 3 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 score Group1 Group2 5 5 24 0.019 0.057 ns
## 2 score Group1 Group3 5 5 24.5 0.015 0.045 *
## 3 score Group2 Group3 5 5 15 0.664 1 ns
There was a statistically significant difference in learning depending on which noise was present during memorization, as assessed using the Kruskal-Wallis test (p = 0.013). Pairwise Wilcoxon tests between groups showed a significant difference between Group1 and Group3 (adjusted p = 0.045).
# visualization: box plots with p-values
pwc <- pwc %>% add_xy_position(x = "Group")
ggboxplot(kenken, x = "Group", y = "score", add = "point") +
  stat_pvalue_manual(pwc, hide.ns = TRUE) +
  labs(subtitle = get_test_label(res.kruskal, detailed = TRUE),
       caption = get_pwc_label(pwc)
  )
It should be noted that when the data in this problem are evaluated with a single-factor between-subjects analysis of variance, the null hypothesis can be rejected at both the .05 and .01 levels (although it barely achieves significance at the .01 level, whereas the Kruskal-Wallis result just falls short of significance at that level). The slight discrepancy between the two tests reflects the fact that, as a general rule (assuming that none of the assumptions of the analysis of variance is saliently violated), the Kruskal-Wallis one-way analysis of variance by ranks provides a less powerful test of an alternative hypothesis than the single-factor between-subjects analysis of variance.
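For comparison only (a sketch; output not shown, and not part of the analysis above), the single-factor between-subjects ANOVA referred to here can be fitted directly on the long-format data:
# Sketch: one-way between-subjects ANOVA
summary(aov(score ~ Group, data = kenken))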
library(readxl)
kyubs <- read_excel("FInal6.xlsx")
kyubs
## # A tibble: 5 × 3
## Child NumberofOunces NumberofCavities
## <dbl> <dbl> <dbl>
## 1 1 20 7
## 2 2 0 0
## 3 3 1 2
## 4 4 12 5
## 5 5 3 3
summary(kyubs)
## Child NumberofOunces NumberofCavities
## Min. :1 Min. : 0.0 Min. :0.0
## 1st Qu.:2 1st Qu.: 1.0 1st Qu.:2.0
## Median :3 Median : 3.0 Median :3.0
## Mean :3 Mean : 7.2 Mean :3.4
## 3rd Qu.:4 3rd Qu.:12.0 3rd Qu.:5.0
## Max. :5 Max. :20.0 Max. :7.0
library(ggplot2)
ggplot(kyubs, aes(x = NumberofOunces, y = NumberofCavities)) +
  geom_point(color = '#2980B9', size = 4) +
  geom_smooth(method = lm, se = FALSE, fullrange = TRUE, color = '#2C3E50')
## `geom_smooth()` using formula = 'y ~ x'
Null Hypothesis: There is no association between the two variables,
Number of Ounces and Number of Cavities.
Alternative Hypothesis: There is an association between the two variables, Number of Ounces and Number of Cavities.
kevin <- cor.test(x = kyubs$NumberofOunces, y = kyubs$NumberofCavities, method = 'spearman')
kevin
##
## Spearman's rank correlation rho
##
## data: kyubs$NumberofOunces and kyubs$NumberofCavities
## S = 4.4409e-15, p-value = 0.01667
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 1
Since the p-value (0.01667) is less than the 0.05 significance level, we have enough evidence to reject the null hypothesis. Thus, there is an association between the two variables, Number of Ounces and Number of Cavities; the estimated rho of 1 indicates a perfect positive monotonic relationship in this sample.
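As a quick illustration of what the test measures (a sketch; output not shown), Spearman's rho is simply the Pearson correlation of the ranked values, which is why a perfectly monotonic pattern yields rho = 1.
# Sketch: Spearman's rho as the Pearson correlation of the ranks
cor(rank(kyubs$NumberofOunces), rank(kyubs$NumberofCavities))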
library(readxl)
kyla <- read_excel("FInals7.xlsx")
kyla
## # A tibble: 6 × 2
## Day Frequency
## <chr> <dbl>
## 1 Monday 20
## 2 Tuesday 14
## 3 Wednesday 18
## 4 Thursday 17
## 5 Friday 22
## 6 Saturday 29
summary(kyla)
## Day Frequency
## Length:6 Min. :14.00
## Class :character 1st Qu.:17.25
## Mode :character Median :19.00
## Mean :20.00
## 3rd Qu.:21.50
## Max. :29.00
observedfreq <- kyla$Frequency
observedfreq
## [1] 20 14 18 17 22 29
expectedprop <- c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
expectedprop
## [1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
Null Hypothesis: There is no difference with respect to the number of
books taken out on different days of the week.
Alternative Hypothesis: There is a difference with respect to the number of books taken out on different days of the week.
buit <- chisq.test(x = observedfreq, p = expectedprop)
buit
##
## Chi-squared test for given probabilities
##
## data: observedfreq
## X-squared = 6.7, df = 5, p-value = 0.2439
Here, the p-value is 0.2439 and the Chi-square test statistic is 6.7. We cannot reject the null hypothesis since the p-value is greater than 0.05. This means that we don’t have enough evidence to conclude that there is a difference with respect to the number of books taken out on different days of the week.
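As a quick check of the statistic (a sketch; output not shown), the chi-square value can be computed directly from its definition, sum((observed - expected)^2 / expected), with each expected count equal to the total number of books divided by six; this reproduces the X-squared value of 6.7 reported above.
expected <- sum(observedfreq) * expectedprop
# (0 + 36 + 4 + 9 + 4 + 81) / 20 = 6.7
sum((observedfreq - expected)^2 / expected)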