STAT 54 - FINAL EXAM

Answer:

library(readxl)
library(rmarkdown)
Kolmogorov <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Kolmogorov.xlsx")
paged_table(Kolmogorov)

Null Hypothesis: The data come from the specified distribution.

Alternative Hypothesis: The data do not come from the specified distribution.

mean(Kolmogorov$Scores)

[1] 90.06667

sd(Kolmogorov$Scores)

[1] 34.79292

ks.test(Kolmogorov,"pnorm")

Warning in ks.test.default(Kolmogorov, "pnorm"): ties should not be present for
the Kolmogorov-Smirnov test


    Asymptotic one-sample Kolmogorov-Smirnov test

data:  Kolmogorov
D = 0.96532, p-value < 2.2e-16
alternative hypothesis: two-sided

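Since the p-value (< 2.2e-16) is less than the 0.05 significance level, we reject the null hypothesis. Thus, there is sufficient evidence that the data do not come from the specified distribution, which here is the standard normal (mean 0, standard deviation 1), the default for pnorm when no parameters are given.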
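Because ks.test(x, "pnorm") compares against the standard normal unless mean and sd are supplied, a test against a normal distribution on the scale of the data needs those arguments. The following is a minimal sketch on simulated scores (the vector `scores` is hypothetical, not the exam data). Note that estimating the mean and standard deviation from the same sample makes the reported p-value only approximate.

```r
# Hypothetical scores, simulated only for illustration (not the exam data)
set.seed(54)
scores <- rnorm(30, mean = 90, sd = 35)

# Default call: compares the sample against N(0, 1)
ks_default <- ks.test(scores, "pnorm")

# Compare against a normal with the sample's own mean and standard deviation
ks_fitted <- ks.test(scores, "pnorm", mean = mean(scores), sd = sd(scores))

ks_default$statistic  # large D: the scores are nowhere near N(0, 1)
ks_fitted$statistic   # much smaller D once location and scale match
ks_fitted$p.value
```

On data measured on a scale like these scores, the default call rejects almost automatically, so the fitted version is usually the more informative check.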

library(ggplot2)
ggplot(Kolmogorov, aes(Scores)) +
  geom_density()

ggplot(data = Kolmogorov) +
  geom_histogram(aes(Scores, after_stat(density))) +
  geom_density(aes(Scores, after_stat(density))) +
  geom_rug(aes(Scores))

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The graphs show that the data are not normally distributed.

Answer:

library(readxl)
library(rmarkdown)
Wilcox <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Wilcox.xlsx")
paged_table(Wilcox)

Null Hypothesis : The median number of times a physician sees each of his patients during the year is five.

Alternative Hypothesis: The median number of times a physician sees each of his patients during the year is not equal to 5.

summary(Wilcox$Frequency)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    5.00    8.50    7.60    9.75   15.00

library(ggpubr)
ggboxplot(Wilcox$Frequency, 
          ylab = "Frequency", xlab = FALSE,
          ggtheme = theme_minimal())

res <- wilcox.test(Wilcox$Frequency, mu = 5)

Warning in wilcox.test.default(Wilcox$Frequency, mu = 5): cannot compute exact
p-value with ties

res


    Wilcoxon signed rank test with continuity correction

data:  Wilcox$Frequency
V = 44, p-value = 0.1016
alternative hypothesis: true location is not equal to 5

res$p.value

[1] 0.1015756

Since the p-value of the test (0.1015756) is greater than the 0.05 level of significance, we do not reject the null hypothesis. Thus, there is insufficient evidence that the median number of times a physician sees each of his patients during the year differs from five.
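The statistic V reported by wilcox.test is the sum of the ranks of the positive differences from the hypothesized median, after zero differences are dropped. A small sketch on hypothetical visit counts (the vector `visits` is made up for illustration) shows the computation:

```r
# Hypothetical visit counts, for illustration only (not the exam data)
visits <- c(3, 7, 8, 5, 9, 10, 2, 12)
mu0 <- 5

# Differences from the hypothesized median; zero differences are dropped
d <- visits - mu0
d <- d[d != 0]

# Rank the absolute differences (ties get average ranks),
# then sum the ranks that belong to positive differences
r <- rank(abs(d))
v_manual <- sum(r[d > 0])

# Same statistic from wilcox.test (normal approximation, since ties are present)
v_r <- unname(wilcox.test(visits, mu = mu0, exact = FALSE)$statistic)

v_manual  # 23
v_r       # 23
```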

Answer:

library(readxl)
library(rmarkdown)
Mann <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Mann.xlsx")
paged_table(Mann)

normality_check <- shapiro.test(Mann$Group1)
if (normality_check$p.value > 0.05){
  print("The data comes from a population that is normally distributed. Please check the result of two sample t test.")
} else {
  print("The data is not normally distributed. Please check the result of Mann-Whitney U test.")
}

[1] "The data is not normally distributed. Please check the result of Mann-Whitney U test."

normality_check <- shapiro.test(Mann$Group2)
if (normality_check$p.value > 0.05){
  print("The data comes from a population that is normally distributed. Please check the result of two sample t test.")
} else {
  print("The data is not normally distributed. Please check the result of Mann-Whitney U test.")
}

[1] "The data comes from a population that is normally distributed. Please check the result of two sample t test."

Null Hypothesis: The level of depression of patients given the antidepressant drug is not different from that of the placebo group.

Alternative Hypothesis: The level of depression of patients given the antidepressant drug is different from that of the placebo group.

head(Mann)

# A tibble: 5 × 2
  Group1 Group2
   <dbl>  <dbl>
1     11     11
2      1     11
3      0      5
4      2      8
5      0      4

str(Mann)

tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
 $ Group1: num [1:5] 11 1 0 2 0
 $ Group2: num [1:5] 11 11 5 8 4

summary(Mann)

     Group1         Group2    
 Min.   : 0.0   Min.   : 4.0  
 1st Qu.: 0.0   1st Qu.: 5.0  
 Median : 1.0   Median : 8.0  
 Mean   : 2.8   Mean   : 7.8  
 3rd Qu.: 2.0   3rd Qu.:11.0  
 Max.   :11.0   Max.   :11.0

wilcox.test(Mann$Group1, Mann$Group2)


    Wilcoxon rank sum test with continuity correction

data:  Mann$Group1 and Mann$Group2
W = 4, p-value = 0.08969
alternative hypothesis: true location shift is not equal to 0

Since the p-value of the two-sided test (0.08969) is greater than the 0.05 significance level, we fail to reject the null hypothesis.

wilcox.test(Mann$Group1, Mann$Group2, alternative = "less")

Warning in wilcox.test.default(Mann$Group1, Mann$Group2, alternative = "less"):
cannot compute exact p-value with ties


    Wilcoxon rank sum test with continuity correction

data:  Mann$Group1 and Mann$Group2
W = 4, p-value = 0.04484
alternative hypothesis: true location shift is less than 0

Since the p-value of the one-sided test (0.04484) is less than the 0.05 significance level, we have sufficient evidence to say that the level of depression of patients given the antidepressant drug is lower than that of the patients in the placebo group. Hence, the data suggest that the antidepressant drug is effective.
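The statistic W = 4 can be reproduced by hand from the ten values printed by str(Mann): rank the pooled sample, sum the ranks of Group1, and subtract n1(n1 + 1)/2. The sketch below retypes the data from that output; the variable names are chosen here for illustration.

```r
# Data transcribed from the str(Mann) output above
group1 <- c(11, 1, 0, 2, 0)
group2 <- c(11, 11, 5, 8, 4)

# Rank the pooled sample (ties get average ranks)
r <- rank(c(group1, group2))
n1 <- length(group1)

# W = (sum of Group1 ranks) - n1 * (n1 + 1) / 2
w_manual <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Normal approximation with continuity correction, as in the output above
two_sided <- wilcox.test(group1, group2, exact = FALSE)
one_sided <- wilcox.test(group1, group2, alternative = "less", exact = FALSE)

w_manual             # 4
two_sided$p.value    # about 0.0897
one_sided$p.value    # half of the two-sided p-value
```

Because W falls below its expected value under the null hypothesis, the one-sided p-value for alternative = "less" is exactly half the two-sided one, which is why the same data are significant at 0.05 in one test and not in the other.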

Answer:

library(readxl)
library(rmarkdown)
Friedman <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Friedman.xlsx")
paged_table(Friedman)

Null Hypothesis: The number of nonsense syllables correctly recalled by the six subjects has the same distribution under the three experimental conditions.

Alternative Hypothesis: At least one of the three experimental conditions tends to yield a different number of correctly recalled nonsense syllables.

library(tidyverse)
library(rstatix)

Fried <- Friedman %>%
  gather(key = "Condition", value = "score", Condition1, Condition2, Condition3) %>%
  convert_as_factor(Subject, Condition)
head(Fried, 6)

summary(Friedman)

    Subject       Condition1      Condition2     Condition3   
 Min.   :1.00   Min.   : 7.00   Min.   :5.00   Min.   :2.000  
 1st Qu.:2.25   1st Qu.: 7.25   1st Qu.:5.25   1st Qu.:3.250  
 Median :3.50   Median : 8.50   Median :6.50   Median :5.000  
 Mean   :3.50   Mean   : 8.50   Mean   :6.50   Mean   :4.833  
 3rd Qu.:4.75   3rd Qu.: 9.75   3rd Qu.:7.75   3rd Qu.:6.750  
 Max.   :6.00   Max.   :10.00   Max.   :8.00   Max.   :7.000

ggboxplot(Fried, x = "Condition", y = "score", add = "jitter")

res.fried <- Fried %>% friedman_test(score ~ Condition |Subject)
res.fried

# A tibble: 1 × 6
  .y.       n statistic    df       p method       
* <chr> <int>     <dbl> <dbl>   <dbl> <chr>        
1 score     6      11.6     2 0.00308 Friedman test

There is a statistically significant difference in the number of nonsense syllables recalled across the three noise conditions (Friedman statistic = 11.6, df = 2, p = 0.00308).

Fried %>% friedman_effsize(score ~ Condition |Subject)

# A tibble: 1 × 5
  .y.       n effsize method    magnitude
* <chr> <int>   <dbl> <chr>     <ord>    
1 score     6   0.964 Kendall W large

A large effect size is detected, W = 0.9637681.
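Kendall's W is the Friedman statistic rescaled to the range 0 to 1: W = statistic / (n(k - 1)), where n is the number of subjects and k the number of conditions. With the values above, 0.9637681 x 6 x 2 is about 11.565, which rounds to the reported 11.6. The sketch below works through the same relationship on a hypothetical 6 x 3 score matrix with no ties within a subject (the numbers are made up for illustration).

```r
# Hypothetical scores: 6 subjects (rows) x 3 conditions (columns)
scores <- matrix(
  c( 8, 7, 4,
    10, 8, 5,
     9, 5, 3,
    10, 6, 7,
     7, 6, 2,
     9, 5, 6),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Condition1", "Condition2", "Condition3"))
)
n <- nrow(scores)
k <- ncol(scores)

# Rank the conditions within each subject, then sum the ranks per condition
ranks <- t(apply(scores, 1, rank))
rank_sums <- colSums(ranks)

# Friedman statistic (form without tie correction) and Kendall's W
chi2_manual <- 12 / (n * k * (k + 1)) * sum(rank_sums^2) - 3 * n * (k + 1)
kendall_w <- chi2_manual / (n * (k - 1))

# Cross-check against base R
chi2_r <- unname(friedman.test(scores)$statistic)

rank_sums    # 18 10 8
chi2_manual  # 9.3333
kendall_w    # 0.7778
```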

# pairwise comparisons
pwc <- Fried %>%
  wilcox_test(score ~ Condition, paired = TRUE, p.adjust.method = "bonferroni")
pwc

# A tibble: 3 × 9
  .y.   group1     group2        n1    n2 statistic     p p.adj p.adj.signif
* <chr> <chr>      <chr>      <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 score Condition1 Condition2     6     6        21 0.02  0.059 ns          
2 score Condition1 Condition3     6     6        21 0.035 0.105 ns          
3 score Condition2 Condition3     6     6        15 0.057 0.17  ns

Before adjustment, Condition1 vs. Condition2 (p = 0.020) and Condition1 vs. Condition3 (p = 0.035) fall below 0.05, while Condition2 vs. Condition3 (p = 0.057) does not.

After the Bonferroni adjustment, none of the pairwise differences is statistically significant at the 0.05 level (adjusted p = 0.059, 0.105, and 0.17, respectively).
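The Bonferroni adjustment multiplies each raw p-value by the number of comparisons (three here) and caps the result at 1. A quick check with base R, using the rounded p-values from the table above (so the last digit can differ slightly from the p.adj column, which was computed from unrounded values):

```r
# Raw p-values as printed (rounded) in the pairwise table above
p_raw <- c(C1_vs_C2 = 0.020, C1_vs_C3 = 0.035, C2_vs_C3 = 0.057)

# Bonferroni: multiply by the number of comparisons, cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

p_bonf         # 0.060 0.105 0.171
p_bonf < 0.05  # all FALSE
```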

# Visualization: box plots with p-values
pwc <- pwc %>% add_xy_position(x = "Condition")
ggboxplot(Fried, x = "Condition", y = "score", add = "point") +
  stat_pvalue_manual(pwc, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.fried,  detailed = TRUE),
    caption = get_pwc_label(pwc)
  )

Do the data indicate that noise influenced subjects’ performance?

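Yes. The Friedman test is significant (p = 0.00308) with a large effect size (Kendall's W = 0.964), so the data indicate that noise influenced the subjects' performance, although no single pairwise difference remains significant after the Bonferroni adjustment.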

Answer:

library(readxl)
library(rmarkdown)
Kruskal <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Kruskal.xlsx")
paged_table(Kruskal)

Krus <- Kruskal %>%
  gather(key = "Condition", value = "score", Condition1, Condition2, Condition3) %>%
  convert_as_factor(Subject, Condition)
head(Krus, 5)

# A tibble: 5 × 3
  Subject Condition  score
  <fct>   <fct>      <dbl>
1 1       Condition1     8
2 2       Condition1    10
3 3       Condition1     9
4 4       Condition1    10
5 5       Condition1     9

head(Krus)

# A tibble: 6 × 3
  Subject Condition  score
  <fct>   <fct>      <dbl>
1 1       Condition1     8
2 2       Condition1    10
3 3       Condition1     9
4 4       Condition1    10
5 5       Condition1     9
6 1       Condition2     7

summary(Kruskal)

    Subject    Condition1     Condition2    Condition3 
 Min.   :1   Min.   : 8.0   Min.   :5.0   Min.   :4.0  
 1st Qu.:2   1st Qu.: 9.0   1st Qu.:5.0   1st Qu.:5.0  
 Median :3   Median : 9.0   Median :7.0   Median :7.0  
 Mean   :3   Mean   : 9.2   Mean   :6.6   Mean   :6.2  
 3rd Qu.:4   3rd Qu.:10.0   3rd Qu.:8.0   3rd Qu.:7.0  
 Max.   :5   Max.   :10.0   Max.   :8.0   Max.   :8.0

set.seed(12345)
Krus %>% sample_n_by(Condition, size = 5)

# A tibble: 15 × 3
   Subject Condition  score
   <fct>   <fct>      <dbl>
 1 3       Condition1     9
 2 4       Condition1    10
 3 2       Condition1    10
 4 5       Condition1     9
 5 1       Condition1     8
 6 2       Condition2     8
 7 1       Condition2     7
 8 3       Condition2     5
 9 5       Condition2     5
10 4       Condition2     8
11 2       Condition3     8
12 5       Condition3     7
13 3       Condition3     7
14 4       Condition3     5
15 1       Condition3     4

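# Keep the conditions in their natural order (level names must match the data)
Krus <- Krus %>%
  reorder_levels(Condition, order = c("Condition1", "Condition2", "Condition3"))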

Krus %>% 
  group_by(Condition) %>%
  get_summary_stats(score, type = "common")

# A tibble: 3 × 11
  Condition  variable     n   min   max median   iqr  mean    sd    se    ci
  <fct>      <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Condition1 score        5     8    10      9     1   9.2 0.837 0.374  1.04
2 Condition2 score        5     5     8      7     3   6.6 1.52  0.678  1.88
3 Condition3 score        5     4     8      7     2   6.2 1.64  0.735  2.04

# Visualization

ggboxplot(Krus, x = "Condition", y = "score")

# Computation

res.kruskal <- Krus %>% kruskal_test(score ~ Condition)
res.kruskal

# A tibble: 1 × 6
  .y.       n statistic    df      p method        
* <chr> <int>     <dbl> <int>  <dbl> <chr>         
1 score    15      8.75     2 0.0126 Kruskal-Wallis

# Effect size

Krus %>% kruskal_effsize(score ~ Condition)

# A tibble: 1 × 5
  .y.       n effsize method  magnitude
* <chr> <int>   <dbl> <chr>   <ord>    
1 score    15   0.562 eta2[H] large

A large effect size is detected, eta2[H] = 0.562284.
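Both the Kruskal-Wallis statistic and eta2[H] can be reproduced in base R from the 15 scores printed above. The sketch below retypes those scores and applies the usual formula eta2[H] = (H - k + 1) / (n - k), where k is the number of groups and n the total sample size; the variable names are chosen here for illustration.

```r
# Scores transcribed from the printed data above (subjects 1 to 5 per condition)
score <- c(8, 10, 9, 10, 9,   # Condition1
           7,  8, 5,  8, 5,   # Condition2
           4,  8, 7,  5, 7)   # Condition3
condition <- factor(rep(c("Condition1", "Condition2", "Condition3"), each = 5))

# Kruskal-Wallis test (base R applies the tie correction)
kw <- kruskal.test(score ~ condition)
H <- unname(kw$statistic)

# Effect size based on H
k <- nlevels(condition)
n <- length(score)
eta2_h <- (H - k + 1) / (n - k)

H           # about 8.747
kw$p.value  # about 0.0126
eta2_h      # about 0.562
```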

# Pairwise comparisons

pwc <- Krus %>% 
  dunn_test(score ~ Condition, p.adjust.method = "bonferroni") 
pwc

# A tibble: 3 × 9
  .y.   group1     group2        n1    n2 statistic       p  p.adj p.adj.signif
* <chr> <chr>      <chr>      <int> <int>     <dbl>   <dbl>  <dbl> <chr>       
1 score Condition1 Condition2     5     5    -2.34  0.0193  0.0578 ns          
2 score Condition1 Condition3     5     5    -2.74  0.00621 0.0186 *           
3 score Condition2 Condition3     5     5    -0.396 0.692   1      ns

# Pairwise comparisons using Wilcoxon test:

pwc2 <- Krus %>% 
  wilcox_test(score ~ Condition, p.adjust.method = "bonferroni")
pwc2

# A tibble: 3 × 9
  .y.   group1     group2        n1    n2 statistic     p p.adj p.adj.signif
* <chr> <chr>      <chr>      <int> <int>     <dbl> <dbl> <dbl> <chr>       
1 score Condition1 Condition2     5     5      24   0.019 0.057 ns          
2 score Condition1 Condition3     5     5      24.5 0.015 0.045 *           
3 score Condition2 Condition3     5     5      15   0.664 1     ns

There are statistically significant differences in the number of syllables recalled depending on the noise present during memorizing, as assessed by the Kruskal-Wallis test (p = 0.013). Moreover, pairwise Wilcoxon tests with Bonferroni adjustment showed that only the difference between Condition1 and Condition3 was significant (adjusted p = 0.045); Dunn's test leads to the same conclusion (adjusted p = 0.019).

# Visualization: box plots with p-values

pwc <- pwc %>% add_xy_position(x = "Condition")
ggboxplot(Krus, x = "Condition", y = "score") +
  stat_pvalue_manual(pwc, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.kruskal, detailed = TRUE),
    caption = get_pwc_label(pwc)
    )

Answer:

library(readxl)
library(rmarkdown)
Spearman <- read_excel("D:/COLLEGE 4TH YEAR/2nd SEMESTER/STAT 54 NONPARAMETRIC STATISTICS/FINAL/Spearman.xlsx")
paged_table(Spearman)

Null Hypothesis: There is no association between the two variables, NumberofOunces and NumberofCavities.

Alternative Hypothesis: There is an association between the two variables, NumberofOunces and NumberofCavities.

head(Spearman)

# A tibble: 5 × 3
  Child NumberofOunces NumberofCavities
  <dbl>          <dbl>            <dbl>
1     1             20                7
2     2              0                0
3     3              1                2
4     4             12                5
5     5              3                3

summary(Spearman)

     Child   NumberofOunces NumberofCavities
 Min.   :1   Min.   : 0.0   Min.   :0.0     
 1st Qu.:2   1st Qu.: 1.0   1st Qu.:2.0     
 Median :3   Median : 3.0   Median :3.0     
 Mean   :3   Mean   : 7.2   Mean   :3.4     
 3rd Qu.:4   3rd Qu.:12.0   3rd Qu.:5.0     
 Max.   :5   Max.   :20.0   Max.   :7.0

library(ggplot2)
ggplot(Spearman, aes(x=NumberofOunces, y=NumberofCavities)) + 
  geom_point(color='#2980B9', size = 4) + 
  geom_smooth(method=lm, se=FALSE, fullrange=TRUE, color='#2C3E50')

`geom_smooth()` using formula = 'y ~ x'

corr <- cor.test(x=Spearman$NumberofOunces, y=Spearman$NumberofCavities, method = 'spearman')
corr


    Spearman's rank correlation rho

data:  Spearman$NumberofOunces and Spearman$NumberofCavities
S = 4.4409e-15, p-value = 0.01667
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho 
  1

Since the p-value (0.01667) is less than the 0.05 significance level, we have enough evidence to reject the null hypothesis. Hence, there is an association between the two variables, NumberofOunces and NumberofCavities; with rho = 1, the association is a perfect positive monotonic one.
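The values rho = 1 and p = 0.01667 can be checked by hand from the five rows printed above. Spearman's rho is the Pearson correlation of the ranks, and with n = 5 and no ties only 2 of the 5! = 120 equally likely rank orderings are as extreme as a perfect match (rho = 1 or rho = -1), giving a two-sided p-value of 2/120. A sketch with the data retyped from the head(Spearman) output:

```r
# Data transcribed from the head(Spearman) output above
ounces   <- c(20, 0, 1, 12, 3)
cavities <- c(7, 0, 2, 5, 3)

# Spearman's rho = Pearson correlation of the ranks
rho <- cor(rank(ounces), rank(cavities))

# Exact two-sided p-value for a perfect ranking with n = 5 and no ties
n <- length(ounces)
p_exact <- 2 / factorial(n)

# Cross-check against cor.test
p_r <- cor.test(ounces, cavities, method = "spearman")$p.value

rho      # 1
p_exact  # 0.01667
```

With only five children, 0.01667 is the smallest two-sided p-value this test can produce, so the result rests on very little data.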

Ian Duhaylungsod

2023-05-24
