library(readxl)
library(rmarkdown)
Kolmogorov <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Kolmogorov.xlsx")
paged_table(Kolmogorov)
mean(Kolmogorov$Scores)
[1] 90.06667
sd(Kolmogorov$Scores)
[1] 34.79292
ks.test(Kolmogorov,"pnorm")
Warning in ks.test.default(Kolmogorov, "pnorm"): ties should not be present for
the Kolmogorov-Smirnov test
Asymptotic one-sample Kolmogorov-Smirnov test
data: Kolmogorov
D = 0.96532, p-value < 2.2e-16
alternative hypothesis: two-sided
Since the p-value (< 2.2e-16) is less than the 0.05 level of significance, we reject the null
hypothesis. Thus, there is sufficient evidence to reject the claim that the data come from
the specified distribution (here the standard normal N(0, 1), because no mean or sd was passed to ks.test).
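To test normality against a normal distribution whose mean and standard deviation are estimated from the scores themselves, one option is to pass them to `ks.test()` explicitly. The sketch below is a minimal illustration under that assumption: the helper name `ks_normal_fitted` and the simulated `sim_scores` are hypothetical and are not the Kolmogorov.xlsx data. On the real data the call would be `ks_normal_fitted(Kolmogorov$Scores)`. Because the parameters are estimated from the same sample, the resulting p-value is only approximate (conservative); the Lilliefors test, e.g. `nortest::lillie.test()`, is designed for that situation.

```r
# Hypothetical helper: one-sample Kolmogorov-Smirnov test of x against a
# normal distribution whose mean and sd are estimated from x itself.
ks_normal_fitted <- function(x) {
  x <- x[!is.na(x)]                              # drop missing values
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
}

# Simulated stand-in data (NOT the Kolmogorov.xlsx scores)
set.seed(54)
sim_scores <- rnorm(30, mean = 90, sd = 35)

fitted_res   <- ks_normal_fitted(sim_scores)     # reference: N(mean, sd) of the data
standard_res <- ks.test(sim_scores, "pnorm")     # reference: N(0, 1), as in the call above

fitted_res$statistic     # typically small: the fitted normal matches the data
standard_res$statistic   # close to 1: N(0, 1) is far from data centred near 90
```

The contrast between the two D values shows why the original call rejects so decisively: it is testing against the wrong location and scale, not only against the wrong shape.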
library(ggplot2)
ggplot(Kolmogorov, aes(Scores)) +
geom_density()
ggplot(data = Kolmogorov) +
geom_histogram(aes(Scores, after_stat(density))) +
geom_density(aes(Scores, after_stat(density))) +
geom_rug(aes(Scores))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The histogram and density curves show that the data are not normally distributed.
library(readxl)
library(rmarkdown)
Wilcox <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Wilcox.xlsx")
paged_table(Wilcox)
summary(Wilcox$Frequency)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 5.00 8.50 7.60 9.75 15.00
library(ggpubr)
ggboxplot(Wilcox$Frequency,
ylab = "Frequency", xlab = FALSE,
ggtheme = theme_minimal())
res <- wilcox.test(Wilcox$Frequency, mu = 5)
Warning in wilcox.test.default(Wilcox$Frequency, mu = 5): cannot compute exact
p-value with ties
res
Wilcoxon signed rank test with continuity correction
data: Wilcox$Frequency
V = 44, p-value = 0.1016
alternative hypothesis: true location is not equal to 5
res$p.value
[1] 0.1015756
Since the p-value of the test (0.1015756) is greater than the 0.05
level of significance, we do not reject the null hypothesis. Thus,
there is not enough evidence to conclude that the median number of
times a physician sees each of his patients during the year differs
from five (V = 44, p-value = 0.1016).
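The statistic V reported by `wilcox.test()` is the sum of the ranks of |x - mu| over the observations lying above mu, after zero differences are dropped; tied |x - mu| values share an average rank, which is why R cannot compute an exact p-value above. The sketch below reproduces V by hand. The vector `visits` is hypothetical illustration data, not the Wilcox.xlsx frequencies.

```r
# Hypothetical visit counts (NOT the Wilcox.xlsx data)
visits <- c(2, 9, 7, 12, 4, 8, 10, 6)
mu0 <- 5

d <- visits - mu0
d <- d[d != 0]            # zero differences are dropped, as wilcox.test() does
r <- rank(abs(d))         # tied |d| values receive average ranks
V <- sum(r[d > 0])        # signed-rank statistic: ranks of positive differences
V                         # 30

# Same statistic from wilcox.test(); ties trigger the same warning as above
V_test <- unname(suppressWarnings(wilcox.test(visits, mu = mu0))$statistic)
V_test
```

The ranks of the positive and negative differences always add up to n(n + 1)/2 (here 36), so V far from half of that total is what signals a median different from mu.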
library(readxl)
library(rmarkdown)
Mann <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Mann.xlsx")
paged_table(Mann)
normality_check <- shapiro.test(Mann$Group1)
if (normality_check$p.value > 0.05){
print("The data comes from a population that is normally distributed. Please check the result of two sample t test.")
} else {
print("The data is not normally distributed. Please check the result of Mann-Whitney U test.")
}
[1] "The data is not normally distributed. Please check the result of Mann-Whitney U test."
normality_check <- shapiro.test(Mann$Group2)
if (normality_check$p.value > 0.05){
print("The data comes from a population that is normally distributed. Please check the result of two sample t test.")
} else {
print("The data is not normally distributed. Please check the result of Mann-Whitney U test.")
}
[1] "The data comes from a population that is normally distributed. Please check the result of two sample t test."
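The two checks above examine each group separately, but the choice between the two-sample t test and the Mann-Whitney U test depends on both groups: if either one fails the Shapiro-Wilk check, the rank-based test is the safer choice. A minimal sketch that makes this decision in one step, assuming alpha = 0.05 and using the five values per group printed by `head(Mann)` and `str(Mann)`; the function name `choose_two_sample_test` is hypothetical.

```r
# Data as printed by head(Mann) / str(Mann)
group1 <- c(11, 1, 0, 2, 0)
group2 <- c(11, 11, 5, 8, 4)

# Hypothetical helper: use the t test only if BOTH samples pass Shapiro-Wilk
choose_two_sample_test <- function(x, y, alpha = 0.05) {
  both_normal <- shapiro.test(x)$p.value > alpha &&
                 shapiro.test(y)$p.value > alpha
  if (both_normal) {
    t.test(x, y)
  } else {
    wilcox.test(x, y)   # Mann-Whitney U (Wilcoxon rank sum) test
  }
}

# suppressWarnings(): tied values make wilcox.test() warn about exact p-values
res_mw <- suppressWarnings(choose_two_sample_test(group1, group2))
res_mw$method
res_mw$statistic
```

Since Group1 fails the normality check, the helper falls through to the rank sum test, matching the choice made in the rest of this section.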
head(Mann)
# A tibble: 5 × 2
Group1 Group2
<dbl> <dbl>
1 11 11
2 1 11
3 0 5
4 2 8
5 0 4
str(Mann)
tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
$ Group1: num [1:5] 11 1 0 2 0
$ Group2: num [1:5] 11 11 5 8 4
summary(Mann)
Group1 Group2
Min. : 0.0 Min. : 4.0
1st Qu.: 0.0 1st Qu.: 5.0
Median : 1.0 Median : 8.0
Mean : 2.8 Mean : 7.8
3rd Qu.: 2.0 3rd Qu.:11.0
Max. :11.0 Max. :11.0
wilcox.test(Mann$Group1, Mann$Group2)
Wilcoxon rank sum test with continuity correction
data: Mann$Group1 and Mann$Group2
W = 4, p-value = 0.08969
alternative hypothesis: true location shift is not equal to 0
Since the p-value of the test (0.08969) is greater than the 0.05
level of significance, we fail to reject the null hypothesis; the
two-sided test does not show a significant difference between the two groups.
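The W in R's output is the rank-sum form of the Mann-Whitney U statistic: rank all ten observations together (ties get average ranks), add up the ranks of the first sample, and subtract n1(n1 + 1)/2. A short check with the data from `str(Mann)`:

```r
group1 <- c(11, 1, 0, 2, 0)
group2 <- c(11, 11, 5, 8, 4)

pooled_ranks <- rank(c(group1, group2))      # average ranks for tied values
n1 <- length(group1)
R1 <- sum(pooled_ranks[seq_len(n1)])         # rank sum of Group1: 19
W  <- R1 - n1 * (n1 + 1) / 2                 # 19 - 15 = 4
W

# Compare with wilcox.test() (ties trigger a warning, suppressed here)
W_test <- unname(suppressWarnings(wilcox.test(group1, group2))$statistic)
W_test
```

W can range from 0 to n1 * n2 = 25 with null mean 12.5, so W = 4 indicates that Group1 values tend to rank below Group2 values.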
wilcox.test(Mann$Group1, Mann$Group2, alternative = "less")
Warning in wilcox.test.default(Mann$Group1, Mann$Group2, alternative = "less"):
cannot compute exact p-value with ties
Wilcoxon rank sum test with continuity correction
data: Mann$Group1 and Mann$Group2
W = 4, p-value = 0.04484
alternative hypothesis: true location shift is less than 0
Since the p-value (0.04484) is less than the 0.05 level of
significance, we have sufficient evidence to say that the level of
depression of depressed patients given the antidepressant drug (Group1)
was lower than that of the patients in the placebo group (Group2). Hence,
the antidepressant drug appears to be effective.
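Here the one-sided p-value is exactly half of the two-sided one, because W = 4 lies below its null mean n1 * n2 / 2 = 12.5, that is, in the direction stated by `alternative = "less"`, and both tests use the same normal approximation with continuity correction. A quick check with the data from `str(Mann)`:

```r
group1 <- c(11, 1, 0, 2, 0)
group2 <- c(11, 11, 5, 8, 4)

# Normal approximation with continuity correction (ties rule out an exact test)
p_two  <- suppressWarnings(wilcox.test(group1, group2))$p.value
p_less <- suppressWarnings(wilcox.test(group1, group2, alternative = "less"))$p.value

c(two_sided = p_two, less = p_less, ratio = p_less / p_two)   # ratio is 0.5
```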
library(readxl)
library(rmarkdown)
Friedman <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Friedman.xlsx")
paged_table(Friedman)
library(tidyverse)
library(rstatix)
Fried <- Friedman %>%
gather(key = "Condition", value = "score", Condition1, Condition2, Condition3) %>%
convert_as_factor(Subject, Condition)
head(Fried, 6)
summary(Friedman)
Subject Condition1 Condition2 Condition3
Min. :1.00 Min. : 7.00 Min. :5.00 Min. :2.000
1st Qu.:2.25 1st Qu.: 7.25 1st Qu.:5.25 1st Qu.:3.250
Median :3.50 Median : 8.50 Median :6.50 Median :5.000
Mean :3.50 Mean : 8.50 Mean :6.50 Mean :4.833
3rd Qu.:4.75 3rd Qu.: 9.75 3rd Qu.:7.75 3rd Qu.:6.750
Max. :6.00 Max. :10.00 Max. :8.00 Max. :7.000
ggboxplot(Fried, x = "Condition", y = "score", add = "jitter")
res.fried <- Fried %>% friedman_test(score ~ Condition |Subject)
res.fried
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <dbl> <dbl> <chr>
1 score 6 11.6 2 0.00308 Friedman test
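Since the p-value (0.00308) is less than the 0.05 level of significance,
there is a statistically significant difference in learning inhibition
depending on the type of noise present while the subjects memorized the
nonsense syllables (Friedman chi-square = 11.6, df = 2, p = 0.003).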
Fried %>% friedman_effsize(score ~ Condition |Subject)
# A tibble: 1 × 5
.y. n effsize method magnitude
* <chr> <int> <dbl> <chr> <ord>
1 score 6 0.964 Kendall W large
A large effect size is detected, W = 0.9637681.
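Kendall's W reported by `friedman_effsize()` can be obtained from the Friedman statistic as W = chi-square / (n(k - 1)), with n subjects and k conditions; for the output above, 0.9637681 × 6 × 2 ≈ 11.57, which rounds to the reported 11.6. The base-R sketch below uses a small hypothetical 6 × 3 score matrix (not the Friedman.xlsx data) and checks that formula against Kendall's rank-sum definition W = 12S / (n²(k³ - k)), which holds exactly when there are no ties within a subject.

```r
# Hypothetical scores: rows = subjects (blocks), columns = conditions.
# No ties within a row, so the untied formulas below apply exactly.
scores <- matrix(c( 9, 7, 4,
                   10, 8, 6,
                    8, 6, 5,
                    7, 5, 2,
                   10, 6, 7,
                    7, 8, 3),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, c("Condition1", "Condition2", "Condition3")))
n <- nrow(scores)
k <- ncol(scores)

fr    <- friedman.test(scores)          # rows are blocks, columns are groups
chisq <- unname(fr$statistic)           # 25/3 for these data
W_from_chisq <- chisq / (n * (k - 1))

# Kendall's W from within-subject rank sums;
# apply() returns one column of ranks per subject
rank_sums <- rowSums(apply(scores, 1, rank))      # 17, 12, 7
S <- sum((rank_sums - mean(rank_sums))^2)
W_kendall <- 12 * S / (n^2 * (k^3 - k))           # 25/36

c(chisq = chisq, W_from_chisq = W_from_chisq, W_kendall = W_kendall)
```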
# pairwise comparisons
pwc <- Fried %>%
wilcox_test(score ~ Condition, paired = TRUE, p.adjust.method = "bonferroni")
pwc
# A tibble: 3 × 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
1 score Condition1 Condition2 6 6 21 0.02 0.059 ns
2 score Condition1 Condition3 6 6 21 0.035 0.105 ns
3 score Condition2 Condition3 6 6 15 0.057 0.17 ns
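After Bonferroni adjustment, none of the pairwise differences is statistically significant at the 0.05 level:
Condition1 vs Condition2 (p.adj = 0.059), Condition1 vs Condition3 (p.adj = 0.105) and Condition2 vs Condition3 (p.adj = 0.17).
Before adjustment, Condition1 vs Condition2 (p = 0.020) and Condition1 vs Condition3 (p = 0.035) would be significant, while Condition2 vs Condition3 (p = 0.057) would not.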
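Bonferroni adjustment multiplies each raw p-value by the number of comparisons m (here m = 3) and caps the result at 1. Applying base R's `p.adjust()` to the rounded raw p-values printed in the table approximately reproduces its `p.adj` column; the small last-digit differences come from the rounding of `p`.

```r
# Raw p-values as printed in the pairwise table (already rounded)
raw_p <- c("C1 vs C2" = 0.02, "C1 vs C3" = 0.035, "C2 vs C3" = 0.057)

adj_p <- p.adjust(raw_p, method = "bonferroni")   # min(1, 3 * p)
adj_p                                             # 0.060, 0.105, 0.171
adj_p < 0.05                                      # FALSE for every pair
```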
# Visualization: box plots with p-values
pwc <- pwc %>% add_xy_position(x = "Condition")
ggboxplot(Fried, x = "Condition", y = "score", add = "point") +
stat_pvalue_manual(pwc, hide.ns = TRUE) +
labs(
subtitle = get_test_label(res.fried, detailed = TRUE),
caption = get_pwc_label(pwc)
)
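Based on the results above, the answer is affirmative: the type of noise significantly affects learning (Friedman test, p = 0.003), although no single pair of conditions differs significantly after Bonferroni adjustment.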
library(readxl)
library(rmarkdown)
Kruskal <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Kruskal.xlsx")
paged_table(Kruskal)
Krus <- Kruskal %>%
gather(key = "Condition", value = "score", Condition1, Condition2, Condition3) %>%
convert_as_factor(Subject, Condition)
head(Krus, 5)
# A tibble: 5 × 3
Subject Condition score
<fct> <fct> <dbl>
1 1 Condition1 8
2 2 Condition1 10
3 3 Condition1 9
4 4 Condition1 10
5 5 Condition1 9
head(Krus)
# A tibble: 6 × 3
Subject Condition score
<fct> <fct> <dbl>
1 1 Condition1 8
2 2 Condition1 10
3 3 Condition1 9
4 4 Condition1 10
5 5 Condition1 9
6 1 Condition2 7
summary(Kruskal)
Subject Condition1 Condition2 Condition3
Min. :1 Min. : 8.0 Min. :5.0 Min. :4.0
1st Qu.:2 1st Qu.: 9.0 1st Qu.:5.0 1st Qu.:5.0
Median :3 Median : 9.0 Median :7.0 Median :7.0
Mean :3 Mean : 9.2 Mean :6.6 Mean :6.2
3rd Qu.:4 3rd Qu.:10.0 3rd Qu.:8.0 3rd Qu.:7.0
Max. :5 Max. :10.0 Max. :8.0 Max. :8.0
set.seed(12345)
Krus %>% sample_n_by(Condition, size = 5)
# A tibble: 15 × 3
Subject Condition score
<fct> <fct> <dbl>
1 3 Condition1 9
2 4 Condition1 10
3 2 Condition1 10
4 5 Condition1 9
5 1 Condition1 8
6 2 Condition2 8
7 1 Condition2 7
8 3 Condition2 5
9 5 Condition2 5
10 4 Condition2 8
11 2 Condition3 8
12 5 Condition3 7
13 3 Condition3 7
14 4 Condition3 5
15 1 Condition3 4
Krus1 <- Krus %>%
reorder_levels(Condition, order = c("Condition1", "Condition2", "Condition3"))
Krus %>%
group_by(Condition) %>%
get_summary_stats(score, type = "common")
# A tibble: 3 × 11
Condition variable n min max median iqr mean sd se ci
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Condition1 score 5 8 10 9 1 9.2 0.837 0.374 1.04
2 Condition2 score 5 5 8 7 3 6.6 1.52 0.678 1.88
3 Condition3 score 5 4 8 7 2 6.2 1.64 0.735 2.04
# Visualization
ggboxplot(Krus, x = "Condition", y = "score")
# Computation
res.kruskal <- Krus %>% kruskal_test(score ~ Condition)
res.kruskal
# A tibble: 1 × 6
.y. n statistic df p method
* <chr> <int> <dbl> <int> <dbl> <chr>
1 score 15 8.75 2 0.0126 Kruskal-Wallis
# Effect size
Krus %>% kruskal_effsize(score ~ Condition)
# A tibble: 1 × 5
.y. n effsize method magnitude
* <chr> <int> <dbl> <chr> <ord>
1 score 15 0.562 eta2[H] large
A large effect size is detected, eta2[H] = 0.562284.
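The `sample_n_by()` listing shows all 15 observations, so the Kruskal-Wallis statistic and its effect size can be checked in base R. The effect size used here is eta2[H] = (H - k + 1)/(n - k), with k groups and n observations, which is the formula given in the rstatix documentation for `kruskal_effsize()`. A sketch:

```r
# Scores per condition, read off the sample_n_by() listing
krus_scores <- list(
  Condition1 = c(9, 10, 10, 9, 8),
  Condition2 = c(8, 7, 5, 5, 8),
  Condition3 = c(8, 7, 7, 5, 4)
)

kw <- kruskal.test(krus_scores)        # H is corrected for ties
H  <- unname(kw$statistic)             # about 8.747
k  <- length(krus_scores)              # 3 groups
n  <- length(unlist(krus_scores))      # 15 observations

eta2_H <- (H - k + 1) / (n - k)        # about 0.562
c(H = H, p = kw$p.value, eta2_H = eta2_H)
```

The unrounded H (about 8.747) is what makes eta2[H] come out as 0.562284 rather than the 0.5625 obtained from the rounded 8.75.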
# Pairwise comparisons
pwc <- Krus %>%
dunn_test(score ~ Condition, p.adjust.method = "bonferroni")
pwc
# A tibble: 3 × 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
1 score Condition1 Condition2 5 5 -2.34 0.0193 0.0578 ns
2 score Condition1 Condition3 5 5 -2.74 0.00621 0.0186 *
3 score Condition2 Condition3 5 5 -0.396 0.692 1 ns
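Dunn's test compares mean ranks taken from the pooled ranking of all observations: z = (mean rank_j - mean rank_i) / sqrt((N(N + 1)/12 - T/(12(N - 1))) (1/n_i + 1/n_j)), where T = Σ(t³ - t) over groups of tied values. The minimal base-R sketch below, using the same 15 observations, reproduces the statistic and Bonferroni-adjusted p-values in the table above. The sign convention (second group minus first) is an assumption chosen to match that output.

```r
scores <- c(9, 10, 10, 9, 8,   8, 7, 5, 5, 8,   8, 7, 7, 5, 4)
groups <- factor(rep(c("Condition1", "Condition2", "Condition3"), each = 5))

N   <- length(scores)
r   <- rank(scores)                          # pooled ranks, ties averaged
mr  <- sapply(split(r, groups), mean)        # mean rank per group
n_i <- lengths(split(r, groups))             # group sizes

ties     <- table(scores)
tie_term <- sum(ties^3 - ties)               # T in the formula above
s2       <- N * (N + 1) / 12 - tie_term / (12 * (N - 1))

pairs <- combn(levels(groups), 2)            # C1-C2, C1-C3, C2-C3
z <- vapply(seq_len(ncol(pairs)), function(j) {
  g1 <- pairs[1, j]
  g2 <- pairs[2, j]
  unname((mr[g2] - mr[g1]) / sqrt(s2 * (1 / n_i[g1] + 1 / n_i[g2])))
}, numeric(1))

p_raw <- 2 * pnorm(-abs(z))                  # two-sided normal p-values
p_adj <- p.adjust(p_raw, method = "bonferroni")
round(cbind(z, p_raw, p_adj), 4)
```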
# Pairwise comparisons using Wilcoxon test:
pwc2 <- Krus %>%
wilcox_test(score ~ Condition, p.adjust.method = "bonferroni")
pwc2
# A tibble: 3 × 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
1 score Condition1 Condition2 5 5 24 0.019 0.057 ns
2 score Condition1 Condition3 5 5 24.5 0.015 0.045 *
3 score Condition2 Condition3 5 5 15 0.664 1 ns
There are statistically significant differences in learning inhibition
depending on the noise present during memorization, as assessed by the
Kruskal-Wallis test (H = 8.75, df = 2, p = 0.013). Moreover, pairwise
Wilcoxon tests with Bonferroni adjustment showed that only the difference
between Condition1 and Condition3 was significant (p.adj = 0.045); Dunn's
test leads to the same conclusion (p.adj = 0.0186).
# Visualization: box plots with p-values
pwc <- pwc %>% add_xy_position(x = "Condition")
ggboxplot(Krus, x = "Condition", y = "score") +
stat_pvalue_manual(pwc, hide.ns = TRUE) +
labs(
subtitle = get_test_label(res.kruskal, detailed = TRUE),
caption = get_pwc_label(pwc)
)
library(readxl)
library(rmarkdown)
Spearman <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Spearman.xlsx")
paged_table(Spearman)
head(Spearman)
# A tibble: 5 × 3
Child NumberofOunces NumberofCavities
<dbl> <dbl> <dbl>
1 1 20 7
2 2 0 0
3 3 1 2
4 4 12 5
5 5 3 3
summary(Spearman)
Child NumberofOunces NumberofCavities
Min. :1 Min. : 0.0 Min. :0.0
1st Qu.:2 1st Qu.: 1.0 1st Qu.:2.0
Median :3 Median : 3.0 Median :3.0
Mean :3 Mean : 7.2 Mean :3.4
3rd Qu.:4 3rd Qu.:12.0 3rd Qu.:5.0
Max. :5 Max. :20.0 Max. :7.0
library(ggplot2)
ggplot(Spearman, aes(x=NumberofOunces, y=NumberofCavities)) +
geom_point(color='#2980B9', size = 4) +
geom_smooth(method=lm, se=FALSE, fullrange=TRUE, color='#2C3E50')
`geom_smooth()` using formula = 'y ~ x'
corr <- cor.test(x=Spearman$NumberofOunces, y=Spearman$NumberofCavities, method = 'spearman')
corr
Spearman's rank correlation rho
data: Spearman$NumberofOunces and Spearman$NumberofCavities
S = 4.4409e-15, p-value = 0.01667
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
1
Since the p-value (0.01667) is less than the 0.05 level of significance, we have
enough evidence to reject the null hypothesis. Hence, there is a significant
association between NumberofOunces and NumberofCavities; with rho = 1, the two
variables rank the five children in exactly the same order.
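Without ties, Spearman's rho is 1 - 6Σd²/(n(n² - 1)), where d is the difference between each child's two ranks, and R's S statistic equals Σd². The ounce and cavity rankings coincide for all five children, so Σd² = 0 (printed as 4.4409e-15, which is zero up to floating-point rounding) and rho = 1. Under the null hypothesis all 5! = 120 orderings are equally likely and only two of them give |rho| = 1, so the exact two-sided p-value is 2/120 ≈ 0.01667, the value reported above. A check with the data from `head(Spearman)`:

```r
ounces   <- c(20, 0, 1, 12, 3)   # NumberofOunces
cavities <- c(7, 0, 2, 5, 3)     # NumberofCavities
n <- length(ounces)

d   <- rank(ounces) - rank(cavities)     # rank difference for each child
S   <- sum(d^2)                          # 0: identical rankings
rho <- 1 - 6 * S / (n * (n^2 - 1))       # 1

p_exact <- 2 / factorial(n)              # orderings with |rho| = 1 out of 120
ct <- cor.test(ounces, cavities, method = "spearman")
c(rho = rho, p_exact = p_exact, p_cor_test = ct$p.value)
```

With only five children, a perfect ranking agreement is the strongest evidence possible, which is why the smallest attainable p-value here is about 0.0167.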