library(tidyverse)
library(skimr)
library(tibble)
library(psych)
library(patchwork)
Gibberellic acid (GA) is thought to elongate the stems of plants. Researchers conducted an experiment to investigate the effect of GA on a mutant strain of the genus Brassica called ros. They applied GA to 17 plants and applied water to 15 control plants. After 14 days they measured the growth of each of the 32 plants. In this experiment, (1) the researchers were trying to establish whether GA affects the growth rate of ros; (2) the response variable is the 14-day growth of ros, which is numeric; (3) the predictor variable is group membership (GA group or control group), which is categorical; and (4) the two groups are independent of one another.
control <- c(3,2,34,12,6,118,14,107,30,9,3,3,49,4,6)
GA <- c(71,87,117,80,112,66,128,153,131,45,38,137,57,163,47,108,35)
summary(control)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 3.50 9.00 26.67 32.00 118.00
summary(GA)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 35.00 57.00 87.00 92.65 128.00 163.00
After 14 days, the mean growth of the control plants was 26.67 mm, well below the GA mean of 92.65 mm. The medians tell the same story (9 mm versus 87 mm). The GA group also has the larger maximum (163 mm versus 118 mm) and the larger minimum (35 mm versus 2 mm). These summaries suggest that applying GA (gibberellic acid) increases growth relative to the control (water only), although descriptive statistics alone do not establish that the difference is statistically significant.
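One way to check whether the gap between the groups is larger than chance variation is a two-sample test. The sketch below is not part of the original analysis: it assumes the same two vectors, stacks them into a hypothetical long-format data frame called `growth`, and runs a Wilcoxon rank-sum test (chosen here because the control data are strongly skewed) alongside a Welch t-test for comparison.

```r
# Same data as above
control <- c(3, 2, 34, 12, 6, 118, 14, 107, 30, 9, 3, 3, 49, 4, 6)
GA <- c(71, 87, 117, 80, 112, 66, 128, 153, 131, 45, 38, 137, 57, 163, 47, 108, 35)

# Long format: one row per plant (the name `growth` is illustrative)
growth <- data.frame(
  group = rep(c("control", "GA"), times = c(length(control), length(GA))),
  mm    = c(control, GA)
)

# Observed difference in mean growth (GA minus control)
mean_diff <- mean(GA) - mean(control)

# Rank-based test; exact = FALSE because the control data contain ties
rank_test <- wilcox.test(mm ~ group, data = growth, exact = FALSE)

# Welch t-test (does not assume equal variances), shown for comparison
welch_test <- t.test(GA, control)

mean_diff
rank_test$p.value
welch_test$p.value
```

A small p-value from either test would support the descriptive finding that GA-treated plants grew more than the controls.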
describe(control)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 15 26.67 37.52 9 21.54 8.9 2 118 116 1.51 0.82 9.69
describe(GA)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 17 92.65 41.67 87 91.8 59.3 35 163 128 0.12 -1.48 10.11
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
getmode(control)
## [1] 3
getmode(GA)
## [1] 71
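Note that `getmode()` returns the first value of the vector when no value repeats, which is why it reports 71 for GA even though every GA value occurs exactly once. A minimal sketch of a stricter helper follows; the name `all_modes` is hypothetical, and it returns `NA` when there is no repeated value and every tied mode otherwise.

```r
control <- c(3, 2, 34, 12, 6, 118, 14, 107, 30, 9, 3, 3, 49, 4, 6)
GA <- c(71, 87, 117, 80, 112, 66, 128, 153, 131, 45, 38, 137, 57, 163, 47, 108, 35)

# Return every most-frequent value, or NA if all values are unique
all_modes <- function(v) {
  counts <- table(v)
  if (max(counts) == 1) {
    return(NA_real_)  # no value repeats, so there is no meaningful mode
  }
  as.numeric(names(counts)[counts == max(counts)])
}

all_modes(control)  # 3 occurs three times, more than any other value
all_modes(GA)       # NA: all 17 values are distinct
```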
#creating a histogram with density of the growth of ros plants (mm) after 14 days
control_dist <- ggplot(mapping = aes(control))+
geom_histogram(aes(y = after_stat(density)), fill ="blue", color = "blue", alpha = .7, binwidth = 10)+
geom_density( fill = "skyblue", alpha = 0.8, color = "skyblue", adjust = .8)+
geom_vline(aes(xintercept = mean(control), color = "mean"), lty = 2, lwd = 1)+
geom_vline(aes(xintercept = median(control), color = "median"), lty = 2, lwd = 1)+
geom_vline(aes(xintercept = getmode(control), color = "mode"), lty = 2, lwd = 1)+
scale_color_manual(name = "Line Type",
breaks = c("mean", "median", "mode"),
values = c("mean" = "blue", "median" = "red", "mode" = "yellow"))+
theme_bw()+
labs(title = "Control Distribution",
subtitle = "Skewed to the right",
x = "growth of ros plants (mm)")+
theme( plot.title = element_text(size = 10,
face = "bold",
hjust = 0.5),
legend.position = "bottom")
GA_dist<- ggplot(mapping = aes(GA))+
geom_histogram(aes(y = after_stat(density)), fill ="#0c6124", color ="#0c6124",alpha = .5, binwidth =10)+
geom_density( fill = "#2fed33", alpha = 0.5, color = "#2fed33", adjust = .7)+
geom_vline(aes(xintercept = mean(GA), color = "mean"), lty = 2, lwd = 1, show.legend = F)+
geom_vline(aes(xintercept = median(GA), color = "median"), lty = 2, lwd = 1, show.legend = F)+
geom_vline(aes(xintercept = getmode(GA), color = "mode"), lty = 2, lwd = 1, show.legend = F)+
scale_color_manual(breaks = c("mean", "median", "mode"),
values = c("mean" = "blue", "median" = "red", "mode" = "yellow"))+
theme_bw()+
labs(title = "Gibberellic acid(GA) Distribution",
subtitle = "Bimodal distribution",
x = "growth of ros plants (mm)")+
theme( plot.title = element_text(size = 10,
face = "bold",
hjust = 0.5))
(control_dist | GA_dist)+
plot_annotation(
title = "Growth of ros plants (mm) after 14 days Distribution",
theme = theme(plot.title = element_text(size = 15,
color = "blue"))
)
The 14-day growth of the control plants has a positively skewed (right-skewed) distribution: the mean (26.67) is greater than the median (9), which in turn is greater than the mode (3).
The 14-day growth of the GA plants looks bimodal: the histogram has two peaks, meaning the growth values cluster around two separate levels. With only 17 plants this shape is tentative. Note also that every GA value is unique, so the "mode" of 71 returned by getmode() is simply the first value in the vector and does not mark a peak.
#boxplot of control
control_bxp <- ggplot(mapping = aes(control))+
geom_boxplot(fill = "skyblue", color = "blue")+
theme_bw() +
labs(title = ("Control Boxplot"))+
theme( plot.title = element_text(size = 15L,
face = "bold",
hjust = 0.5))
GA_bxp <- ggplot(mapping = aes(GA))+
geom_boxplot(fill = "orange", color = "orange", alpha = 0.5)+
theme_bw()+
labs(title = ("Gibberellic acid(GA) Boxplot"))+
theme( plot.title = element_text(size = 15L,
face = "bold",
hjust = 0.5))
(control_bxp | GA_bxp)+
plot_annotation(
title = "Growth of ros plants (mm) after 14 days Boxplot",
theme = theme(plot.title = element_text(size = 15,
color = "blue"))
)
The box-and-whisker plot above shows that the control values spread much farther to the right of the median line than to the left, so the control data are skewed to the right. There are also two outliers (107 and 118). These extreme values inflate the standard deviation to 37.52, which is larger than the mean of 26.67, so the data are widely dispersed.
The box-and-whisker plot of the GA data looks roughly symmetric, as if the data were normally distributed, but the histogram shown previously suggests the distribution is actually bimodal. A boxplot cannot reveal this: it summarizes the data with the median and quartiles, which say nothing about the number of peaks. For a bimodal distribution the mean and median are also not very informative, because they fall between the two peaks.
Histograms and boxplots each have pros and cons. A histogram shows the shape of the distribution clearly, but it does not flag outliers explicitly. A boxplot hides the detailed shape of the distribution, but it marks outliers explicitly.
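The outliers drawn by a boxplot follow the usual 1.5 × IQR rule: any point more than 1.5 interquartile ranges below the first quartile or above the third quartile is flagged. The sketch below applies that rule by hand; the helper name `boxplot_outliers` is hypothetical, and it assumes the default quartile definition used by `quantile()`.

```r
control <- c(3, 2, 34, 12, 6, 118, 14, 107, 30, 9, 3, 3, 49, 4, 6)
GA <- c(71, 87, 117, 80, 112, 66, 128, 153, 131, 45, 38, 137, 57, 163, 47, 108, 35)

# Return the values lying outside the 1.5 * IQR fences, in ascending order
boxplot_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # first and third quartiles
  fence <- k * (q[2] - q[1])                      # k times the IQR
  sort(x[x < q[1] - fence | x > q[2] + fence])
}

boxplot_outliers(control)  # 107 and 118 lie above the upper fence of 74.75
boxplot_outliers(GA)       # empty: no GA value lies outside the fences
```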
1. A sample of 15 patients was randomly split into two groups as part of a double-blind experiment to compare two pain relievers. The 7 patients in the first group were given Demerol and reported the following numbers of hours of pain relief: 2, 6, 4, 13, 5, 8, 4. The 8 patients in the second group were given an experimental drug and reported the following numbers of hours of pain relief: 0, 8, 1, 4, 2, 2, 1, 3. How might these data be analyzed?
first_group <- c(2,6,4,13,5,8,4,NA)
second_group <- c(0,8,1,4,2,2,1,3)
patient <- data.frame(first_group,second_group)
describe(patient)
## vars n mean sd median trimmed mad min max range skew kurtosis
## first_group 1 7 6.00 3.61 5 6.00 1.48 2 13 11 0.82 -0.71
## second_group 2 8 2.62 2.50 2 2.62 1.48 0 8 8 1.04 -0.14
## se
## first_group 1.36
## second_group 0.89
ratio <- var(first_group, na.rm = T) / var(second_group)
ratio
## [1] 2.074074
Since this ratio is less than 4, a common rule of thumb, we can assume that the variances of the two groups are approximately equal.
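The rule of thumb can be supplemented with a formal F test of equal variances. The sketch below is an addition rather than part of the original analysis; it uses `var.test()` from base R, and it assumes approximately normal data, an assumption that is doubtful for samples this small and skewed, so the result should be read with caution.

```r
first_group <- c(2, 6, 4, 13, 5, 8, 4)
second_group <- c(0, 8, 1, 4, 2, 2, 1, 3)

# Ratio of sample variances, as computed above
ratio <- var(first_group) / var(second_group)

# F test of H0: the two population variances are equal
f_test <- var.test(first_group, second_group)

ratio            # about 2.07
f_test$estimate  # the same ratio, as reported by var.test()
f_test$p.value   # a large p-value is consistent with equal variances
```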
fg_density <- patient %>%
drop_na(first_group) %>%
ggplot(aes(first_group))+
geom_density(fill = "orange", color = "orange", alpha = .5 )+
geom_vline(xintercept = 6, color = "orange", lty = 2, lwd =1)+
annotate("text", x = 8.5, y = 0.14, label = "Mean of hours of relief\n6.00 hours", size = 3)+
theme_bw()+
labs(
title = "first Group Distribution",
subtitle = "skew = 0.82\nThe distribution is moderately\nright-skewed (skew is well above 0)",
x = "Hours of Relief"
)
sg_density <- patient %>%
ggplot(aes(second_group))+
geom_density(fill = "blue", color = "blue", alpha = .5 )+
geom_vline(xintercept = mean(second_group), color = "blue", lty = 2, lwd =1)+
annotate("text", x = 4.5, y = 0.20, label = "Mean of hours of relief\n2.62 hours", size = 3)+
theme_bw()+
labs(
title = "Second Group Distribution",
subtitle = "skew = 1.04\nThe distribution is not normal:\nit is clearly right-skewed (skew > 1)",
x = "Hours of Relief"
)
(fg_density | sg_density) +
plot_annotation(
title = "Distribution of Hours Of Relief in Two Groups ",
theme = theme(plot.title = element_text(size = 15,
color = "blue"))
)
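The density plots and skew values can be supplemented with a formal normality check. The sketch below is an addition, not part of the original analysis: it runs the Shapiro-Wilk test from base R on each group. With only 7 and 8 observations the test has little power, so a non-significant result would not by itself show that the data are normal.

```r
first_group <- c(2, 6, 4, 13, 5, 8, 4)
second_group <- c(0, 8, 1, 4, 2, 2, 1, 3)

# Shapiro-Wilk test of H0: the sample comes from a normal distribution
sw_first <- shapiro.test(first_group)
sw_second <- shapiro.test(second_group)

sw_first$p.value
sw_second$p.value
```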
Since the normality assumption is not met, we perform the Mann-Whitney U test (also called the Wilcoxon rank-sum test).
my_test <-wilcox.test(first_group, second_group, exact = F)
my_test
##
## Wilcoxon rank sum test with continuity correction
##
## data: first_group and second_group
## W = 46.5, p-value = 0.03556
## alternative hypothesis: true location shift is not equal to 0
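Hypothesis testing: H0: hours of pain relief follow the same distribution in both groups (the location shift is 0). Ha: the two distributions differ by a location shift. The Mann-Whitney U test compares ranks, so it does not test the equality of means.
Since the p-value (0.03556) is less than 0.05, we reject the null hypothesis. At the 5% significance level there is evidence that the two pain relievers differ in hours of relief: the first group (Demerol) reported longer relief on average (mean 6.00 hours versus 2.62 hours for the experimental drug). With only 15 patients, this result should be interpreted with caution.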
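To see where the reported statistic W = 46.5 comes from, it can be reproduced by hand. The sketch below ranks all 15 observations together (ties receive the average rank), sums the ranks of the first group, and subtracts the smallest possible rank sum for a group of that size, n1(n1 + 1)/2. Variable names other than the two data vectors are illustrative.

```r
first_group <- c(2, 6, 4, 13, 5, 8, 4)
second_group <- c(0, 8, 1, 4, 2, 2, 1, 3)

# Rank the pooled sample; tied values share their average rank
pooled_ranks <- rank(c(first_group, second_group))

n1 <- length(first_group)
rank_sum_first <- sum(pooled_ranks[seq_len(n1)])  # 74.5

# W = rank sum of the first group minus its minimum possible value
W <- rank_sum_first - n1 * (n1 + 1) / 2
W  # 46.5, matching the wilcox.test() output
```

A value of W near 0 would mean the first group holds nearly all the lowest ranks, and a value near n1 × n2 = 56 would mean it holds nearly all the highest. Here 46.5 sits toward the upper end, consistent with longer relief in the first group.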
A researcher was interested in the relationship between forearm length and height. He measured the forearm lengths and heights of a sample of 16 women and obtained the following data. How might these data be (i) visualized and (ii) analyzed?
height <- c(163,161,151,163,166,168,170,163,175,178,163,161,173,160,158,170)
length <- c(25.5,26,25,25,27.2,26,26,26,26,27,24.5,26,28,24.5,25,26)
forearm <- data.frame(height, length)
forearm %>%
ggplot(aes(height, length, color = height))+
scale_color_viridis_c()+
geom_point(size = 3)+
geom_smooth(method = lm, se = F)+
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
### Analyze
The graph above shows a positive linear trend: forearm length tends to increase with height. A few points lie far from the fitted line and may be outliers.
cor(forearm$length, forearm$height)
## [1] 0.656379
The r value is 0.66, which indicates a moderate positive linear relationship between height and forearm length: taller women tend to have longer forearms.
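The correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below recomputes r from that definition and then uses `cor.test()` from base R to check whether the correlation differs significantly from zero. The significance test is an addition to the original analysis, and the vector is named `forearm_length` here (rather than `length`) only to avoid masking the base function.

```r
height <- c(163, 161, 151, 163, 166, 168, 170, 163, 175, 178, 163, 161, 173, 160, 158, 170)
forearm_length <- c(25.5, 26, 25, 25, 27.2, 26, 26, 26, 26, 27, 24.5, 26, 28, 24.5, 25, 26)

# Pearson's r from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(height, forearm_length) / (sd(height) * sd(forearm_length))

# Test of H0: the population correlation is 0
r_test <- cor.test(height, forearm_length)

r_manual         # about 0.656, matching cor()
r_manual^2       # share of variance in one variable explained linearly by the other
r_test$p.value
```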