One-way: To analyze the differences in < SOMETHING > (example:
% T cells) between your experimental groups (example: clean, dirty,
co-housed) we can use a one-way ANOVA.
Two-way: A two-way ANOVA will allow us to test the main effects of each
factor group as well as their interaction effect, such a group:sex, to
determine if there’s a difference between the groups and the sexes.
The assumptions of ANOVA are as follows: Normality: The data within
each group should be normally distributed.
Homogeneity of variance: The variance of the data within each group
should be equal.
Independence: The observations within each group should be
independent.
First, the data was added to the cluster and put in the appropriate
project folder. You can use scp in the terminal.
In a terminal that is NOT logged into the mayo cluster run the
following:
scp /PathTo/Spleen_General_males_1_Results.xls m####@mforgehn4:/research/labs/neurology/fryer/m####/ProjectFolder/
# read in the excel spreed sheet
df <- read_excel("Spleen_General_males_1_Results.xls")
## New names:
## • `` -> `...1`
# We need a group column so we know which samples belong to which group!
# First, rename the first column to sampleID
df <- df %>% rename(sampleID = 1)
# oh my, there's two rows that are not needed and will mess up our dataframe, let's remove mean and SD
df <- df %>% filter(!sampleID %in% c("Mean", "SD"))
# Create the new column group based on the conditions
df <- df %>% mutate(group = case_when(
startsWith(sampleID, "B") ~ "bedding",
startsWith(sampleID, "CH") ~ "co_housed",
startsWith(sampleID, "C") ~ "clean",
startsWith(sampleID, "D") ~ "dirty",
TRUE ~ NA_character_ # Assign NA if none of the conditions are met
))
# Display the resulting data frame
print(df) # Inspect
## # A tibble: 25 × 49
## sampleID `% live Cells` % Granulocytes of Li…¹ %B Cells of Live Cel…²
## <chr> <chr> <chr> <chr>
## 1 B_1_M general.f… 79.3 % 3.22 % 52.2 %
## 2 B_2_M general.f… 73.2 % 4.30 % 46.0 %
## 3 B_3_M general.f… 13.1 % 13.0 % 4.62 %
## 4 B_4_M general.f… 60.7 % 7.96 % 13.2 %
## 5 B_5_M general.f… 71.9 % 29.2 % 33.3 %
## 6 B_6_M general.f… 67.4 % 20.7 % 35.7 %
## 7 B_10_M general.… 64.3 % 9.26 % 22.2 %
## 8 CH_1_M general.… 57.7 % 16.1 % 35.2 %
## 9 CH_2_M general.… 60.8 % 9.87 % 49.9 %
## 10 CH_3_M general.… 69.5 % 18.8 % 31.1 %
## # ℹ 15 more rows
## # ℹ abbreviated names: ¹​`% Granulocytes of Live Cells`,
## # ²​`%B Cells of Live Cells`
## # ℹ 45 more variables: `MFI CD19 of B Cells` <dbl>,
## # `MFI CD45R of B Cells` <dbl>, `% NK Cells of Live Cells` <chr>,
## # `MFI of CD161 of NK Cells` <dbl>, `% NKT Cells of Live Cells` <chr>,
## # `MFI CD3 of NKT Cells` <dbl>, `MFI CD161 of NKT Cells` <dbl>, …
# R will not recognize numbers as numeric if that string contains characters.
is.numeric(df$`% live Cells`) # FALSE
## [1] FALSE
typeof(df$`% live Cells`) # Check the typeof data is % live Cells, see how it reports back as character.
## [1] "character"
# We will need to remove all % signs and convert from character to numeric.
# Remove all "%" characters in the data frame except for in the column names.
df <- df %>% mutate_all(~ gsub("%", "", .))
typeof(df$`% live Cells`) # Recheck the % live Cells. It still shows as character. Next convert to numeric
## [1] "character"
is.numeric(df$`% live Cells`) # FALSE
## [1] FALSE
# Convert all columns to numeric except for columns sampleID and group.
df <- df %>% mutate_at(vars(-sampleID, -group), as.numeric)
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `MFI CD62L of Naive CD4 = .Primitive("as.double")(`MFI CD62L of
## Naive CD4`)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
# The warning messages are due to the fact there there are NAs in some of the columns.
# You can't convert NA to numeric, hence the warning.
is.numeric(df$`% live Cells`) # Recheck the % live Cells. It now shows TRUE
## [1] TRUE
# Define the order of the groups
df$group <- factor(df$group, levels = c("clean", "bedding", "co_housed", "dirty"))
df$group # Inspect
## [1] bedding bedding bedding bedding bedding bedding bedding
## [8] co_housed co_housed co_housed co_housed co_housed co_housed clean
## [15] clean clean clean clean clean dirty dirty
## [22] dirty dirty dirty dirty
## Levels: clean bedding co_housed dirty
# To avoid some of these above pre-set up sets, think carefully about how R will handle column names, strings, and numbers when setting up your dataframe.
Shapiro test - testing for normality.
Levene Test - homogeneity of variance across groups.
clean_values <- df[df$group == "clean",]
bedding_values <- df[df$group == "bedding",]
co_housed_values <- df[df$group == "co_housed",]
dirty_values <- df[df$group == "dirty",]
# Shapiro-Wilk test, normality test for the % live cells
clean_shapiro <- shapiro.test(clean_values$`% live Cells`)
bedding_shapiro <- shapiro.test(bedding_values$`% live Cells`) # We reject the NULL
co_housed_shapiro <- shapiro.test(co_housed_values$`% live Cells`)
dirty_shapiro <- shapiro.test(dirty_values$`% live Cells`)
# Let us inspect the bedding values distribution
# Q-Q plots (or quantile-quantile plot) draws the correlation between a given sample and the normal distribution.
ggqqplot(bedding_values$`% live Cells`)
# There's an outlier sample which explains why we reject the null of normality
# Levene's test, performed using all samples from all the groups.
levTest <- leveneTest(df$`% live Cells` ~ df$group)
# Fit the model
model <- aov(`% live Cells` ~ group, data = df)
model # inspect
## Call:
## aov(formula = `% live Cells` ~ group, data = df)
##
## Terms:
## group Residuals
## Sum of Squares 2698.595 4621.305
## Deg. of Freedom 3 21
##
## Residual standard error: 14.83449
## Estimated effects may be unbalanced
summary(model) # summary
## Df Sum Sq Mean Sq F value Pr(>F)
## group 3 2699 899.5 4.088 0.0197 *
## Residuals 21 4621 220.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(df,aes(x=group,y=`% live Cells`,colour=group)) +
geom_violin() +
geom_point(position = position_dodge(0.3)) +
scale_shape_manual(values=seq(0,15)) +
scale_color_manual(values=c("blue", "purple", "red", "black")) +
theme_bw() +
labs(title = "% live Cells", y = "% live Cells", x = "Group") +
stat_compare_means(method = "anova", paired = FALSE, size = 4, label.x = 1.25)
TukeyHSD(model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = `% live Cells` ~ group, data = df)
##
## $group
## diff lwr upr p adj
## bedding-clean 18.630952 -4.37328653 41.63519 0.1405537
## co_housed-clean 27.616667 3.74403974 51.48929 0.0196100
## dirty-clean 23.800000 -0.07262692 47.67263 0.0508840
## co_housed-bedding 8.985714 -14.01852463 31.98995 0.7000276
## dirty-bedding 5.169048 -17.83519129 28.17329 0.9224471
## dirty-co_housed -3.816667 -27.68929359 20.05596 0.9697457