This article is part of the basic statistical analysis topic of the FETP
training, Thailand. It aims to provide basic R code for comparing
non-parametric (non-normally distributed) continuous data, for those who
are not familiar with R.
The data set for this article is not provided.
The term non-parametric refers here to data that follow a distribution
other than the normal distribution, e.g. a skewed distribution. For this
kind of comparison we have the Wilcoxon signed rank test for paired
(dependent) variables and the Wilcoxon rank sum test (Mann-Whitney U test)
for independent variables. In R we can use the wilcox.test function for
both tests. Setting the argument paired = TRUE in this function tells R
that our test is for dependent (paired) variables; the default,
paired = FALSE, gives the rank sum test for independent variables.
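Since the data set is not provided, the two uses of wilcox.test can be tried on a small example. The following is a minimal sketch: the vectors before, after, group_a and group_b are hypothetical values invented for illustration and are not part of the training data.

```r
# Hypothetical measurements, invented for illustration only
before <- c(125, 98, 153, 110, 142, 105, 137, 91)
after  <- c(108, 93, 129, 102, 121, 109, 118, 88)

# Paired (dependent) samples: Wilcoxon signed rank test
paired_result <- wilcox.test(x = before, y = after, paired = TRUE)

# Two independent groups (also hypothetical): Wilcoxon rank sum test,
# which is what wilcox.test runs when paired is left at its default, FALSE
group_a <- c(125, 98, 153, 110, 142, 105, 137, 91)
group_b <- c(84, 101, 79, 95, 112, 88, 99)
unpaired_result <- wilcox.test(x = group_a, y = group_b)

# The method element reports which test was actually run
print(paired_result$method)
print(unpaired_result$method)
```

Note that with paired = TRUE the two vectors must have the same length, because each element of x is matched with the element of y in the same position.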
library(readxl)
zinc_water <- read_xlsx("dataset_basic_2.xlsx",
sheet = "Zinc in water")
head(zinc_water)
## # A tibble: 6 × 3
## Location Bottom surface
## <dbl> <dbl> <dbl>
## 1 1 0.43 0.415
## 2 2 0.266 0.238
## 3 3 0.567 0.39
## 4 4 0.531 0.41
## 5 5 0.707 0.605
## 6 6 0.716 0.609
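# Difference in zinc concentration (bottom minus surface) at each location
zinc_diff <- zinc_water$Bottom - zinc_water$surface
# Show the histogram and the normal Q-Q plot side by side
par(mfrow = c(1, 2))
hist(zinc_diff,
     main = "Histogram of zinc concentration difference",
     xlab = "zinc concentration difference (bottom - surface)")
qqnorm(zinc_diff)
qqline(zinc_diff,
       col = "red")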
The zinc_water dataset records the zinc concentration at the surface and
at the bottom of the water at 10 different locations.
If we want to know “Does the zinc concentration at the bottom differ from
that at the surface of the water?”, we can set the null hypothesis to “The
zinc concentration at the surface of the water is equal to that at the
bottom” and then perform the Wilcoxon signed rank test, because the plots
of the difference in zinc concentration between the bottom and the surface
do not look approximately normal. We set the argument paired = TRUE
because the surface and bottom measurements come from the same locations
and are therefore dependent.
wilcox.test(x = zinc_water$Bottom,
y = zinc_water$surface,
paired = TRUE)
##
## Wilcoxon signed rank exact test
##
## data: zinc_water$Bottom and zinc_water$surface
## V = 55, p-value = 0.001953
## alternative hypothesis: true location shift is not equal to 0
Because the p-value (0.001953) is less than 0.05, we can reject the null hypothesis and conclude that there is a statistically significant difference in zinc concentration between the bottom and the surface of the water.
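The reported V = 55 and p-value can be checked by hand. The sketch below assumes the 10 locations stated in the text, with no zero differences and no tied differences (which is consistent with R reporting an exact test). V is the sum of the ranks of the positive differences, so its largest possible value for 10 pairs is 1 + 2 + ... + 10 = 55; a result of V = 55 therefore means that the bottom concentration was higher than the surface concentration at every location.

```r
n <- 10                     # number of locations, as stated in the text
v_max <- n * (n + 1) / 2    # largest possible V: all differences positive
print(v_max)

# Under the null hypothesis each of the 2^n sign patterns is equally likely
# and only one of them gives V = v_max, so the two-sided exact p-value is
# twice the probability of that single pattern
p_exact <- 2 * dsignrank(v_max, n)
print(p_exact)

# Compare with the 0.05 significance level used in the text
print(p_exact < 0.05)
```

Here dsignrank is the base R density function of the signed rank statistic; the value 2 / 2^10 agrees with the p-value of 0.001953 in the output above.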