Hide all messages and warnings for the RMarkdown file.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
The data is available directly in RStudio.
{data(PlantGrowth)}
PlantGrowth is results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control(ctrl) and two different treatment conditions (trt1 and trt2).
library(dplyr)
library(tidyverse)
library(ggplot2)
Each treatment group has 10 data points, and the summary is calculated using the weights of plants from all 3 treatment groups.
summary(PlantGrowth)
## weight group
## Min. :3.590 ctrl:10
## 1st Qu.:4.550 trt1:10
## Median :5.155 trt2:10
## Mean :5.073
## 3rd Qu.:5.530
## Max. :6.310
The ‘head’ syntax shows the first 6 rows of data from PlantGrowth. This allows us to see that the data is formatted in 2 columns titled ‘weight’ and ‘group’.
head(PlantGrowth)
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
str shows us the structure of the data. It is in a 30x2 dataframe with ‘Weight’ as numerical data and ‘Group’ as Factor with 3 categories.
str(PlantGrowth)
## 'data.frame': 30 obs. of 2 variables:
## $ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
## $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
QQplot of PlantGrowth to check the normality of the data. It seems like it is mostly normal with the points following the reference line of a normal distribution.
qqnorm(PlantGrowth$weight, main="QQ plot of Plant Growth")
qqline(PlantGrowth$weight, col="red")
A graph is plotted using ggplot to compare the three different treatments.
PlantGrowth %>% ggplot() +
# Set aesthetics (x, y, fill)
geom_histogram(aes(x = weight,
fill = group),
alpha = 0.5, bins = 15, color = "black") +
# Split the graph by species
facet_grid(rows = vars(group)) +
# Format axis labels
labs(x = "Weight (g)", y = "Count",
title = "Weight of Plants by Treatment Group") +
# Add colours manually
scale_fill_manual(values = c("darkorange",
"purple",
"cyan4")) +
# Set plot theme
theme_minimal() +
# Remove legend
theme(legend.position = "none")
From the histogram, we can see the spread of data across the different weights. The control has an even spread throughout the different weights. Treatment 1 mostly has lower plant weights with the data on the left and two outliers, and treatment 2 has much higher plant weights with the data on the right of the graph.
PlantGrowth %>% ggplot() +
geom_boxplot(aes(x = group, y = weight,
colour = group),
size = 1, outlier.color = "red") +
labs(x = "Treatment Group", y = "Weight (g)",
title = "Boxplot of Plants by Treatment Group") +
scale_colour_manual(values = c("darkorange",
"purple",
"cyan4")) +
theme_minimal() +
theme(legend.position = "none")
The boxplot also allows us to confirm the outliers; there are 2 outliers for treatment group 1, and the box for treatment 2 is much higher compared to treatment 1 and the control. Hence, different treatments does affect plant weight.
The mean, sum of squared deviations (ssd), standard error (se), lower and upper confidence intervals are calculated per treatment type and placed into a dataframe in order to be retrieved later on.
# Using dplyr functions
plantgrowth_summary <-
PlantGrowth %>%
group_by(group) %>%
summarise(count = n(),
mean = mean(weight, na.rm = TRUE),
ssd = sd(weight, na.rm = TRUE)) %>%
mutate(se = ssd / sqrt(count),
lower_ci = mean - qt(1 - (0.05 / 2), count - 1) * se,
upper_ci = mean + qt(1 - (0.05 / 2), count - 1) * se)
# Nice table output
knitr::kable(plantgrowth_summary)
| group | count | mean | ssd | se | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| ctrl | 10 | 5.032 | 0.5830914 | 0.1843897 | 4.614882 | 5.449118 |
| trt1 | 10 | 4.661 | 0.7936757 | 0.2509823 | 4.093239 | 5.228761 |
| trt2 | 10 | 5.526 | 0.4425733 | 0.1399540 | 5.209402 | 5.842598 |
The summary dataframe created above is now used to find out which treatment has the highest average growth and plotted into a graph for visual comparison.
plantgrowth_summary %>% ggplot() +
geom_point(aes(x = group, y = mean,
colour = group),
size = 3) +
geom_errorbar(aes(x = group,
ymin = lower_ci,
ymax = upper_ci,
colour = group),
width = 0.2, linewidth = 1) +
labs(x = "Treatment Group", y = "Weight (g)",
title = "Mean Weight of Plants by Treatment Group") +
scale_colour_manual(values = c("darkorange",
"purple",
"cyan4")) +
theme_minimal() +
theme(legend.position = "none")
At a glance, treatment 2 has the highest average growth, followed by the control and finally treatment 1. The lines extending from the data point are the confidence intervals, whereby there is a 95% confidence level that the true population mean weight falls within the range shown.