The data table used here describes the hours of relief provided by two analgesic drugs in 12 patients suffering from arthritis. We want to analyze the effects of the two drugs on each of the 12 patients.
Verification of each assumption for the test used in this project: the Wilcoxon signed-rank test can be used when the two samples are paired (here, both drugs are measured on the same patients) and the paired differences can be ranked. We will carry out a comparative analysis of the two treatments on paired data.
#load drug data
library(readxl)
Drug_Data <- read_excel("Drug Data/Drug Data.xlsx")
#select the DrugA and DrugB readings (columns 2 and 3)
DrugA = Drug_Data[,2]
DrugB = Drug_Data[,3]
We have assumed: (I) we are dealing with non-parametric data, i.e., the data are not normally distributed; (II) the distribution of the differences between DrugA and DrugB is not assumed to be normal (the signed-rank test only requires it to be roughly symmetric about its median); (III) the pairs, one per patient, are independent and identically distributed. The Wilcoxon test is a non-parametric test that does not require the assumption of normality. Let's take a look at the data to verify our assumptions.
library(ggplot2)
# take a look at the data
par(mfrow=c(2,2))
diff1 <- Drug_Data$DrugA - Drug_Data$DrugB
hist(diff1, col = c("green"), main = "Histogram for differences")
hist(Drug_Data$DrugA, col = c("red"), main = "Histogram for Drug A")
hist(Drug_Data$DrugB, col = c("darkgrey"), main = "Histogram for Drug B")
#check the distribution
boxplot(diff1, col = c("red"), main = "Boxplot")
#Q-Q plot (DrugA against DrugB, with the identity line as reference)
#bare column names inside aes() avoid the "Use of `Drug_Data$DrugA` is discouraged" warnings
qqplot = ggplot(data = Drug_Data, aes(x = DrugA, y = DrugB)) + geom_point(shape = 1)
qqplot = qqplot + xlab("Expected DrugA~DrugB Values") + ylab("Observed DrugA~DrugB Values")
qqplot = qqplot + geom_abline(intercept = 0, slope = 1, color = "red")
qqplot
Description: A histogram lets us inspect the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc. From the histograms above, the data appear skewed. A boxplot gives a good indication of how the values in the data are spread out. The boxplot above shows some outliers, and the upper whisker is longer than the lower whisker. Finally, it is important to know whether the distribution is normal. The QQ plot above supports our assumption that the data are not normally distributed.
The QQ plot shows that the data are not close to normal, and the boxplot of the differences shows that they are not very symmetric. Because the normality assumption of the paired t-test is doubtful, we decide to use the Wilcoxon signed-rank test instead of the t-test.
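A normal Q-Q plot of the paired differences is another way to look at the normality question discussed above. The following is a minimal sketch in base R; the helper name `qq_differences` and the vector `d` are assumptions made for illustration, and the values in `d` are hypothetical, not read from the spreadsheet.

```r
# Draw a normal Q-Q plot of paired differences and return the plotted quantiles.
# qq_differences is a hypothetical helper, not part of the original analysis.
qq_differences <- function(d) {
  qq <- qqnorm(d, main = "Normal Q-Q plot of differences")  # sample vs. theoretical quantiles
  qqline(d, col = "red")                                    # reference line through the quartiles
  invisible(qq)
}

# Hypothetical differences in hours (DrugA - DrugB), for illustration only
d <- c(-1.5, -2.1, -0.3, 0.2, -2.6, 0.1, -1.8, 0.6, -1.5, -2.0, -2.3, -12.4)
qq <- qq_differences(d)
```

Points that bend away from the red line, especially in the tails, suggest a departure from normality. With the real data, `d` would be replaced by `diff1`.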
Shapiro-Wilk test for normality: We can set up a formal hypothesis test to check normality with the Shapiro-Wilk test. Formal hypothesis testing:
Null Hypothesis: The data are from a normal distribution
Alternative Hypothesis: The data are not from a normal distribution
shapiro.test(diff1)
##
## Shapiro-Wilk normality test
##
## data: diff1
## W = 0.62683, p-value = 0.0001813
As the p-value is less than the significance level (\(\alpha\) = 0.05), we can reject the null hypothesis. So, there is evidence that the differences are not normally distributed.
Research objective for this data set: With the given data, we can state our research objective as: Is there any evidence that one drug provides longer relief than the other?
Develop null and alternative hypotheses that are related to the research objective: Here, we are assuming a significance level of \(\alpha\) = 0.05. We have paired data, so we can carry out a two-tailed hypothesis test to address the research question:
Hypothesis:
Null Hypothesis, \(H_0\): \(\text{Median difference} = 0\)
Alternative Hypothesis, \(H_a\): \(\text{Median difference} \neq 0\)
Wilcoxon Signed-Rank Test for a paired data set with ties:
wilcox.test(Drug_Data$DrugA, Drug_Data$DrugB, alternative = "two.sided", paired = TRUE, exact = FALSE, correct = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: Drug_Data$DrugA and Drug_Data$DrugB
## V = 7, p-value = 0.01344
## alternative hypothesis: true location shift is not equal to 0
Test Statistic: The test statistic for the Wilcoxon Signed Rank Test is W, conventionally defined as the smaller of W+ (sum of the positive ranks) and W- (sum of the negative ranks). For the normal approximation either sum can be used, because W+ and W- lie at the same distance from the mean and give the same two-tailed p-value. If the null hypothesis is true, we expect W+ and W- to be similar. If the research hypothesis is true, we expect one of the two sums to be much larger than the other.
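Calculating W+ and W− from the ranks (with the differences taken as DrugB − DrugA, so that W− matches the V = 7 reported by R for DrugA − DrugB) gives:
W− = 1 + 2 + 4 = 7
W+ = 3 + 5.5 + 5.5 + 7 + 8 + 9 + 10 + 11 + 12 = 71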
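The rank sums above can be sketched in a few lines of R. The helper name `signed_rank_sums` and the vectors `drug_a` and `drug_b` are assumptions for illustration. The readings are hypothetical values chosen so that the ranks match those listed above; they are not read from the spreadsheet.

```r
# Sum of the positive and negative signed ranks for paired differences d.
# signed_rank_sums is a hypothetical helper, not part of the original analysis.
signed_rank_sums <- function(d, digits = 8) {
  d <- d[d != 0]                      # zero differences are dropped
  r <- rank(round(abs(d), digits))    # average ranks for ties; rounding guards against floating-point noise
  c(W_plus = sum(r[d > 0]), W_minus = sum(r[d < 0]))
}

# Hypothetical hours of relief, for illustration only
drug_a <- c(2.0, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2.0, 6.8, 8.5)
drug_b <- c(3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6.0, 3.8, 4.0, 9.1, 20.9)

sums <- signed_rank_sums(drug_b - drug_a)
print(sums)   # W_plus = 71, W_minus = 7 for these illustrative values
```

Swapping the order of subtraction swaps the two sums, which is why R reports V = 7 for `wilcox.test(DrugA, DrugB, paired = TRUE)`.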
As a check, the two rank sums add up to \[W^{+} + W^{-} = \frac{n(n+1)}{2} = \frac{12 \times 13}{2} = 78\] If n(n+1)/2 is greater than 20, a normal approximation can be used. Here, n(n+1)/2 = 78, so we use the normal approximation.
\[W = \max(W^{-}, W^{+}) = 71.\]
The mean value: \[\mu_w = \frac{n(n+1)}{4} = \frac{12 \times 13}{4} = 39\]
The variance: \[\sigma_w^2 = \frac{n(n+1)(2n+1)}{24} = \frac{12 \times 13 \times 25}{24} = 162.5\]
We have one group of 2 tied ranks, t = 2, so we must reduce the variance by \[\frac{t^3-t}{48} = \frac{8-2}{48} = 0.125\]
So the z-value will be: \[z = \frac{\max(W^{-}, W^{+}) - \mu_w}{\sqrt{\sigma_w^2 - 0.125}} = \frac{71-39}{\sqrt{162.5-0.125}} \approx 2.511\]
Looking up this score in the z-table, the area between -z and z is 0.9880, which corresponds to a two-tailed p-value of 0.012.
p-value: From the calculation above, we got p-value = 0.012, and the R code gives p-value = 0.01344. The R value is slightly larger because wilcox.test was called with correct = TRUE, which applies a continuity correction. Both values are less than the significance level \(\alpha = 0.05\), which indicates that the median difference is significantly different from zero.
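The normal approximation, and the effect of the continuity correction, can be sketched as follows. The helper name `signed_rank_z` and its arguments are assumptions for illustration; the inputs (W = 71, n = 12, one tie group of size 2) are the values from the calculation above.

```r
# Normal approximation for the Wilcoxon signed-rank statistic with a tie correction.
# signed_rank_z is a hypothetical helper, not part of the original analysis.
signed_rank_z <- function(w, n, tie_sizes = integer(0), correct = FALSE) {
  mu <- n * (n + 1) / 4                                   # mean of W under H0
  v <- n * (n + 1) * (2 * n + 1) / 24 -
    sum(tie_sizes^3 - tie_sizes) / 48                     # variance, reduced for ties
  cc <- if (correct) 0.5 else 0                           # continuity correction
  z <- (abs(w - mu) - cc) / sqrt(v)
  c(z = z, p_value = 2 * pnorm(-z))                       # two-tailed p-value
}

plain <- signed_rank_z(71, 12, tie_sizes = 2)
corrected <- signed_rank_z(71, 12, tie_sizes = 2, correct = TRUE)
print(round(plain, 4))       # z about 2.51, p about 0.012 (hand calculation)
print(round(corrected, 4))   # p about 0.0134 (matches the wilcox.test output)
```

Subtracting 0.5 from the numerator lowers z slightly, which accounts for the gap between the hand-calculated 0.012 and the 0.01344 reported by R.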
Decision about the null hypothesis: As the p-value is less than the significance level, we can reject the null hypothesis.
Conclusion with respect to the research objective:
As the median difference is significantly different from zero, and the rank sums favor Drug B (W+ = 71 against W− = 7), we can say that there is evidence that Drug B provides longer relief than Drug A.
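A two-sided test does not by itself say which drug is better, so the direction stated in the conclusion can be checked separately. The sketch below is one way to do that, assuming a hypothetical helper `direction_summary` and hypothetical readings that are not read from the spreadsheet. It uses the `conf.int = TRUE` argument of `wilcox.test` to obtain a pseudomedian estimate of the difference.

```r
# Summarise the direction of paired differences (b - a).
# direction_summary is a hypothetical helper, not part of the original analysis.
direction_summary <- function(a, b) {
  d <- b - a
  fit <- wilcox.test(b, a, paired = TRUE, exact = FALSE,
                     correct = TRUE, conf.int = TRUE)
  list(
    median_diff  = median(d),                 # sample median of b - a
    pseudomedian = unname(fit$estimate),      # Hodges-Lehmann type estimate
    conf_int     = as.numeric(fit$conf.int),  # interval for the location shift
    n_b_longer   = sum(d > 0),                # pairs where b gave longer relief
    n_a_longer   = sum(d < 0)                 # pairs where a gave longer relief
  )
}

# Hypothetical hours of relief, for illustration only
drug_a <- c(2.0, 3.6, 2.6, 2.6, 7.3, 3.4, 14.9, 6.6, 2.3, 2.0, 6.8, 8.5)
drug_b <- c(3.5, 5.7, 2.9, 2.4, 9.9, 3.3, 16.7, 6.0, 3.8, 4.0, 9.1, 20.9)

res <- direction_summary(drug_a, drug_b)
print(res)
```

A positive median difference and pseudomedian, together with more pairs favoring the second argument, would support the statement that Drug B gives longer relief.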