library(ggplot2)
  1. A physician states that the median number of times he sees each of his patients during the year is five. In order to evaluate the validity of this statement, he randomly selects ten of his patients and determines the number of office visits each of them made during the past year.
library(readxl)
Wilcoxon <- read_excel("D:/MARV BS MATH/4th year, 2nd sem/Nonparametric Statistics/Final Exam/Wilcoxon.xlsx")
Wilcoxon
# A tibble: 10 × 2
   Patient Frequency
     <dbl>     <dbl>
 1       1         9
 2       2        10
 3       3         8
 4       4         4
 5       5         8
 6       6         3
 7       7         0
 8       8        10
 9       9        15
10      10         9

Null Hypothesis: The median number of times a physician sees each of his patients during the year is five.

Alternative Hypothesis: The median number of times a physician sees each of his patients during the year is not equal to five.
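In symbols, with M the population median number of office visits per patient per year, the hypotheses and the signed-rank statistic that R reports as V can be written as below. This is the standard textbook formulation: zero differences are dropped, tied |d_i| receive average ranks, and the variance shown is for the no-ties case (R subtracts a tie correction when ties are present).

$$
H_0: M = 5 \qquad \text{vs.} \qquad H_1: M \neq 5
$$

$$
d_i = x_i - 5, \qquad V = \sum_{i:\, d_i > 0} \operatorname{rank}\left(|d_i|\right), \qquad
E_{H_0}[V] = \frac{n(n+1)}{4}, \qquad \operatorname{Var}_{H_0}(V) = \frac{n(n+1)(2n+1)}{24}
$$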

To perform the Wilcoxon signed-rank test, the following assumptions should hold:

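  1. The variable of interest, here the number of office visits per patient during the year, is continuous. Answer: Approximately. Visit counts are discrete, which is why tied values occur and why R falls back to a normal approximation below.
  2. The distribution of the variable is symmetric about its median. The test does not require normality, which is why it is used in place of a one-sample t-test when the data do not look normal.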
ks.test(Wilcoxon,"pnorm")
Warning in ks.test.default(Wilcoxon, "pnorm"): ties should not be present for
the Kolmogorov-Smirnov test

    Asymptotic one-sample Kolmogorov-Smirnov test

data:  Wilcoxon
D = 0.87725, p-value = 8.56e-14
alternative hypothesis: two-sided

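Since the p-value (8.56e-14) is far below the significance level α = 0.05, we reject the null hypothesis that the data follow the specified distribution. Note, however, that this call passes the whole data frame, so ks.test() pools the Patient column (1 to 10) with Frequency and compares all 20 values with a standard normal N(0, 1). The rejection therefore says little about whether Frequency alone is normally distributed.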
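As a sketch of why the test above rejects so strongly, the pooled call can be reproduced from the printed values, alongside a check on the Frequency column alone. The visit counts are re-entered by hand from the tibble above (an assumption standing in for the Excel file), and shapiro.test() is swapped in for the KS test because the KS p-value is not valid when the normal mean and sd are estimated from the same sample (the Lilliefors problem).

```r
# Visit counts re-entered from the tibble printed above (assumed identical)
frequency <- c(9, 10, 8, 4, 8, 3, 0, 10, 15, 9)
patient <- 1:10

# Passing the whole data frame pools the Patient IDs and the Frequency values
# (20 numbers) and compares them with N(0, 1); this reproduces D = 0.87725.
# suppressWarnings() hides the "ties should not be present" warning.
pooled <- suppressWarnings(ks.test(c(patient, frequency), "pnorm"))
pooled$statistic

# Normality check on the Frequency column alone, using Shapiro-Wilk
sw <- shapiro.test(frequency)
sw
```

The largest gap occurs at x = 2, where the standard normal CDF is about 0.977 while only 2 of the 20 pooled values lie below 2.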

This observation is shown in the graph below.

ggplot(Wilcoxon, aes(Frequency)) +
  geom_density()

ggplot(data = Wilcoxon) +
  # histogram on the density scale; binwidth = 1 suits integer visit counts
  geom_histogram(aes(Frequency, after_stat(density)), binwidth = 1) +
  # geom_density() uses the density scale by default
  geom_density(aes(Frequency)) +
  geom_rug(aes(Frequency))

The plots above suggest that the visit counts are not normally distributed, so rather than a one-sample t-test we perform the nonparametric Wilcoxon signed-rank test.
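The signed-rank test assumes the distribution is roughly symmetric about its median. A quick, informal check compares the mean with the median and computes the moment-based sample skewness g1 = m3 / m2^(3/2). This is a minimal sketch: the counts are re-entered by hand from the tibble above, and "values near 0 suggest rough symmetry" is a heuristic, not a formal test.

```r
# Visit counts re-entered from the tibble printed above (assumed identical)
frequency <- c(9, 10, 8, 4, 8, 3, 0, 10, 15, 9)

# Central moments (population form, dividing by n)
dev <- frequency - mean(frequency)
m2 <- mean(dev^2)
m3 <- mean(dev^3)

# Moment-based sample skewness: negative means a longer left tail
g1 <- m3 / m2^1.5

c(mean = mean(frequency), median = median(frequency), skewness = g1)
```

For these data the mean (7.6) sits below the median (8.5) and g1 is about -0.22, which points to a mild left tail rather than a strong asymmetry.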

Do the data support his contention that the median number of times he sees a patient is five?

summary(Wilcoxon$Frequency)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    5.00    8.50    7.60    9.75   15.00 
library(ggpubr)
ggboxplot(Wilcoxon$Frequency, 
          ylab = "Frequency", xlab = FALSE,
          ggtheme = theme_minimal())

res <- wilcox.test(Wilcoxon$Frequency, mu = 5)
Warning in wilcox.test.default(Wilcoxon$Frequency, mu = 5): cannot compute exact
p-value with ties
res 

    Wilcoxon signed rank test with continuity correction

data:  Wilcoxon$Frequency
V = 44, p-value = 0.1016
alternative hypothesis: true location is not equal to 5
res$p.value
[1] 0.1015756
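To see where V = 44 and the p-value come from, the sketch below reproduces the normal approximation that wilcox.test() falls back to when ties are present: differences from 5, average ranks for tied absolute differences, the tie-corrected variance, and the 0.5 continuity correction. The counts are re-entered from the tibble above; the variable names are illustrative.

```r
# Visit counts re-entered from the tibble printed above (assumed identical)
frequency <- c(9, 10, 8, 4, 8, 3, 0, 10, 15, 9)
mu0 <- 5

# 1. Differences from the hypothesised median; zero differences are dropped
d <- frequency - mu0
d <- d[d != 0]
n <- length(d)

# 2. Rank the absolute differences (tied values get average ranks)
r <- rank(abs(d))

# 3. V is the sum of the ranks belonging to positive differences
V <- sum(r[d > 0])

# 4. Normal approximation with tie correction and continuity correction
ties <- table(r)
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
z <- V - n * (n + 1) / 4
z <- (z - sign(z) * 0.5) / sigma
p_value <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

c(V = V, z = z, p = p_value)
```

There are no zero differences, so n = 10; the positive ranks sum to 44 and the negative ranks to 11, and the tie correction lowers the variance from 96.25 to 95.5.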

Since the p-value of the test, 0.1015756, is greater than the significance level α = 0.05, we do not reject the null hypothesis. The data provide no significant evidence that the median number of office visits per patient differs from five, so they are consistent with the physician's claim. Note that failing to reject the null hypothesis does not prove that the median is exactly five; it only means this sample of ten patients is consistent with that value.