Unraveling the Phenotypic Tapestry of Silkworm Hybrids: A Comprehensive Multi-Trait, Multi-Location Analysis: This study embarks on a rigorous exploration of phenotypic diversity within a population of silkworm (Bombyx mori) hybrids, reared across two distinct locations in the Philippines. By employing a robust statistical and multivariate framework, we aim to transcend mere description and delve into the intricate interplay of genetic and environmental factors that orchestrate the expression of key traits critical for both fundamental research and applied breeding programs.
Delving into the Data: A Multifaceted Approach
Our investigation is structured around four key analytical pillars:
1. Data Exploration & Assumption Validation: We begin by meticulously examining the distributional properties of each phenotypic trait across the hybrid strains. This involves constructing histograms and density plots to visualize the shape of the distributions, assess central tendency and variability, and identify potential outliers that might warrant further investigation or specialized statistical treatment. Recognizing that many statistical tests rely on specific assumptions about the data, we rigorously assess the normality of the trait distributions using the Shapiro-Wilk test and evaluate the homogeneity of variance across strains using Levene’s test. These assessments guide our choice of subsequent statistical tests and ensure the validity of our inferences.
2. ANOVA & Non-Parametric Testing: Discerning Strain Differences: To discern statistically significant differences in trait means among the silkworm hybrids, we employ Analysis of Variance (ANOVA), a powerful parametric test that compares means across multiple groups. However, the validity of ANOVA hinges on the fulfillment of the normality and homogeneity of variance assumptions. In cases where these assumptions are violated, we strategically employ log transformations to potentially rectify deviations from normality or homoscedasticity. If transformations fail to remedy the violations, we adopt the non-parametric Kruskal-Wallis test, which compares medians across groups based on ranks, making it suitable for data that doesn’t conform to the assumptions of ANOVA. To pinpoint specific strain pairs exhibiting significant differences, we employ post-hoc tests, namely Tukey’s Honestly Significant Difference (HSD) test for ANOVA and Dunn’s test for Kruskal-Wallis. These tests allow for pairwise comparisons between strains while controlling for the increased risk of Type I errors (false positives) associated with multiple testing.
3. Multivariate Analysis: Unveiling Hidden Patterns: To move beyond univariate analysis and explore the interrelationships among multiple traits, we harness the power of multivariate techniques. Principal Component Analysis (PCA) is employed to unveil the underlying structure of phenotypic variation, effectively reducing the dimensionality of the data and identifying key patterns. By scrutinizing scree plots and biplots, we ascertain the contribution of each trait to the principal components and visualize the relationships between individuals (silkworms) and variables (traits). Furthermore, we apply hierarchical clustering and k-means clustering on the principal component scores to classify silkworms into distinct groups based on their phenotypic similarities. This clustering analysis can reveal natural groupings of strains with similar trait profiles, potentially suggesting shared genetic backgrounds or responses to environmental factors.
4. Correlation Analysis: Deciphering Trait Interdependencies: To quantify and visualize the relationships between traits, we compute and visualize both Pearson and Spearman correlation matrices. The Pearson correlation coefficient captures linear relationships between traits, while the Spearman correlation coefficient assesses monotonic relationships, accommodating both linear and non-linear associations. Scrutinizing correlograms allows us to discern clusters of highly correlated traits and identify potential multicollinearity, which can inform subsequent analyses. To rigorously assess the strength and significance of associations between specific trait pairs, we conduct individual correlation tests (Pearson and Spearman). Furthermore, we generate scatterplot matrices to visualize all pairwise relationships between traits, aiding in the identification of patterns, outliers, and potential non-linear associations that might not be captured by simple correlation coefficients. Finally, we explore trait correlations within the context of strain variation using mixed-effects models, enabling us to discern whether correlations are consistent across strains or if they vary depending on the genetic background. This can provide insights into the genetic and environmental factors that influence trait co-variation.
Bridging the Gap: Genotype to Phenotype: By integrating these diverse analytical approaches, this study strives to transcend mere description and provide a comprehensive portrait of the phenotypic landscape within a population of silkworm hybrids. The insights gleaned from this investigation will not only deepen our understanding of the genetic architecture underlying silkworm traits but also furnish invaluable information for targeted breeding efforts to enhance silk production and other economically important characteristics. Moreover, this study contributes to the broader field of evolutionary biology by shedding light on the interplay between genetic variation and phenotypic diversity in a model organism with profound implications for both scientific inquiry and industrial applications.
# Read the data with subsamples
data_subsamples <- read_excel("Hybrid.xls")
# Convert 'Strain' to a factor in the subsample data
data_subsamples$Strain <- as.factor(data_subsamples$Strain)
# Convert 'Location' to a factor in the subsample data
data_subsamples$Location <- as.factor(data_subsamples$Location)
# List of traits in the subsample dataset
traits_subsamples <- c("Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Cocoon_Shell_Percentage")
# Remove rows with missing values for any of the traits in the subsample dataset
data_subsamples_complete <- data_subsamples %>% filter(complete.cases(data_subsamples[, traits_subsamples]))
# Load the data without subsamples
data_no_subsamples <- read_excel("Hybrid_new.xls")
# Convert 'Strain' to a factor in the no subsample data
data_no_subsamples$Strain <- as.factor(data_no_subsamples$Strain)
# Convert 'Location' to a factor in the no subsample data
data_no_subsamples$Location <- as.factor(data_no_subsamples$Location)
# --- Merge the two datasets ---
# Aggregate the data with subsamples by Strain and Location using 'aggregate()'
data_agg <- aggregate(cbind(Single_Cocoon_Weight, Cocoon_Shell_Weight, Cocoon_Shell_Percentage) ~ Strain + Location,
data = data_subsamples_complete,
FUN = mean)
# Merge the aggregated data with the data without subsamples
combined_data <- merge(data_agg, data_no_subsamples, by = c("Strain", "Location"))
# Reorder columns to place 'Replicate' after 'Strain'
combined_data <- combined_data %>% dplyr::select(Strain, Replicate, everything()) # Use dplyr::select
# List of all traits
all_traits <- c(traits_subsamples, "Hatching_Rate", "Larval_Weight", "Live_Pupa_Percentage")
We begin by exploring the distributions of each trait and checking the assumptions of normality and homogeneity of variance, crucial for subsequent analyses.
Data Exploration and Assumption Checking
# Loop through each trait
for (trait in all_traits) {
cat("\n\n--- Analyzing", trait, "---\n")
# Use the appropriate dataframe for each trait
if (trait %in% traits_subsamples) {
# Use data_subsamples_complete for traits with subsamples
data_trait <- data_subsamples_complete
} else {
# Use combined_data for traits without subsamples
data_trait <- combined_data
}
# Fit a temporary linear mixed-effects model to check assumptions
temp_model <- lm(as.formula(paste(trait, "~ Strain + Location")), data = data_trait)
# Boxplot with outliers, filled by Location
cat("\nBoxplot for", trait, ":\n")
boxplot_outliers <- ggplot(data_trait, aes(x = Strain, y = .data[[trait]], fill = Location)) +
geom_boxplot(outlier.shape = 16, outlier.size = 2) +
labs(title = paste("Boxplot of", trait, "by Strain and Location"),
x = "Strain", y = trait) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
print(boxplot_outliers)
# Outlier Detection and Treatment (using Tukey's fences on residuals)
resid_trait <- residuals(temp_model)
Q1 <- quantile(resid_trait, 0.25)
Q3 <- quantile(resid_trait, 0.75)
IQR <- Q3 - Q1
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
outliers <- data_trait %>%
mutate(residual = resid_trait) %>%
filter(residual < lower_bound | residual > upper_bound) %>%
dplyr::select(Strain, Replicate, Location, all_of(trait))
if (nrow(outliers) > 0) {
cat("\nOutliers for", trait, "(based on residuals):\n")
print(outliers)
} else {
cat("\nNo outliers found for", trait, "\n")
}
# Normality test (Shapiro-Wilk) on residuals
normality_test <- shapiro.test(resid_trait)
cat("\nNormality Test (Shapiro-Wilk) on Residuals:\n")
print(normality_test)
# Homogeneity of variance test (Levene's)
variance_test <- leveneTest(resid_trait ~ data_trait$Strain * data_trait$Location)
cat("\nHomogeneity of Variance Test (Levene's):\n")
print(variance_test)
# Distribution plots
cat("\nDistribution Plots for", trait, ":\n")
# Combined distribution plot
p1 <- ggplot(data_trait, aes(x = .data[[trait]], fill = Location)) +
geom_density(alpha = 0.5) +
labs(title = paste("Distribution of", trait, "(All Strains)"),
x = trait, y = "Density") +
theme_bw()
# Distribution plots by strain and location
p2 <- ggplot(data_trait, aes(x = .data[[trait]], fill = Location)) +
geom_density(alpha = 0.5) +
labs(x = trait, y = "Density") +
facet_wrap(~ Strain, ncol = 6) +
theme_bw() +
theme(strip.text = element_text(size = 8))
# Arrange and print the plots
print(ggarrange(p1, p2, nrow = 2, heights = c(1, 3)))
}
##
##
## --- Analyzing Single_Cocoon_Weight ---
##
## Boxplot for Single_Cocoon_Weight :
##
## Outliers for Single_Cocoon_Weight (based on residuals):
## # A tibble: 8 × 4
## Strain Replicate Location Single_Cocoon_Weight
## <fct> <dbl> <fct> <dbl>
## 1 SW 101 2 BALUBAL 2.27
## 2 SW 102 3 BALUBAL 2.29
## 3 SW 107 3 BALUBAL 2.30
## 4 SW 109 1 BALUBAL 0.359
## 5 SW 110 2 BALUBAL 2.32
## 6 SW 111 2 BALUBAL 2.8
## 7 SW 115 2 BALUBAL 0.823
## 8 SW 115 3 BALUBAL 0.708
##
## Normality Test (Shapiro-Wilk) on Residuals:
##
## Shapiro-Wilk normality test
##
## data: resid_trait
## W = 0.98536, p-value = 5.174e-08
##
##
## Homogeneity of Variance Test (Levene's):
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 31 2.0195 0.000899 ***
## 896
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Distribution Plots for Single_Cocoon_Weight :
##
##
## --- Analyzing Cocoon_Shell_Weight ---
##
## Boxplot for Cocoon_Shell_Weight :
##
## Outliers for Cocoon_Shell_Weight (based on residuals):
## # A tibble: 23 × 4
## Strain Replicate Location Cocoon_Shell_Weight
## <fct> <dbl> <fct> <dbl>
## 1 SW 104 2 TCMO 0.77
## 2 SW 111 2 TCMO 0.39
## 3 SW 111 2 TCMO 0.4
## 4 SW 111 2 TCMO 0.39
## 5 SW 113 3 TCMO 0.414
## 6 SW 101 2 BALUBAL 0.468
## 7 SW 101 2 BALUBAL 0.187
## 8 SW 101 2 BALUBAL 0.21
## 9 SW 102 3 BALUBAL 0.443
## 10 SW 103 3 BALUBAL 0.213
## # ℹ 13 more rows
##
## Normality Test (Shapiro-Wilk) on Residuals:
##
## Shapiro-Wilk normality test
##
## data: resid_trait
## W = 0.92283, p-value < 2.2e-16
##
##
## Homogeneity of Variance Test (Levene's):
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 31 2.3735 4.36e-05 ***
## 896
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Distribution Plots for Cocoon_Shell_Weight :
##
##
## --- Analyzing Cocoon_Shell_Percentage ---
##
## Boxplot for Cocoon_Shell_Percentage :
##
## Outliers for Cocoon_Shell_Percentage (based on residuals):
## # A tibble: 13 × 4
## Strain Replicate Location Cocoon_Shell_Percentage
## <fct> <dbl> <fct> <dbl>
## 1 SW 104 2 TCMO 65.8
## 2 SW 106 2 TCMO 28.9
## 3 SW 111 2 TCMO 29.1
## 4 SW 113 2 TCMO 28.2
## 5 SW 115 2 TCMO 27.6
## 6 SW 103 1 BALUBAL 33.4
## 7 SW 103 3 BALUBAL 11.7
## 8 SW 109 1 BALUBAL 85.5
## 9 SW 111 2 BALUBAL 13.1
## 10 SW 114 1 BALUBAL 12.3
## 11 SW 114 2 BALUBAL 12.9
## 12 SW 115 1 BALUBAL 10.7
## 13 SW 115 2 BALUBAL 30.4
##
## Normality Test (Shapiro-Wilk) on Residuals:
##
## Shapiro-Wilk normality test
##
## data: resid_trait
## W = 0.64692, p-value < 2.2e-16
##
##
## Homogeneity of Variance Test (Levene's):
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 31 0.9305 0.5772
## 896
##
## Distribution Plots for Cocoon_Shell_Percentage :
##
##
## --- Analyzing Hatching_Rate ---
##
## Boxplot for Hatching_Rate :
##
## Outliers for Hatching_Rate (based on residuals):
## Strain Replicate Location Hatching_Rate
## 1 SW 107 3 BALUBAL 91.86992
## 2 SW 108 3 BALUBAL 92.15686
##
## Normality Test (Shapiro-Wilk) on Residuals:
##
## Shapiro-Wilk normality test
##
## data: resid_trait
## W = 0.98544, p-value = 0.3701
##
##
## Homogeneity of Variance Test (Levene's):
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 31 0.5917 0.9445
## 64
##
## Distribution Plots for Hatching_Rate :
##
##
## --- Analyzing Larval_Weight ---
##
## Boxplot for Larval_Weight :
##
## Outliers for Larval_Weight (based on residuals):
## Strain Replicate Location Larval_Weight
## 1 SW 101 2 TCMO 32.95233
## 2 SW 108 3 BALUBAL 26.34000
##
## Normality Test (Shapiro-Wilk) on Residuals:
##
## Shapiro-Wilk normality test
##
## data: resid_trait
## W = 0.98383, p-value = 0.2864
##
##
## Homogeneity of Variance Test (Levene's):
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 31 0.6008 0.9391
## 64
##
## Distribution Plots for Larval_Weight :
##
##
## --- Analyzing Live_Pupa_Percentage ---
##
## Boxplot for Live_Pupa_Percentage :
##
## Outliers for Live_Pupa_Percentage (based on residuals):
## Strain Replicate Location Live_Pupa_Percentage
## 1 SW 109 2 BALUBAL 72.61905
## 2 SW 113 1 BALUBAL 72.22222
##
## Normality Test (Shapiro-Wilk) on Residuals:
##
## Shapiro-Wilk normality test
##
## data: resid_trait
## W = 0.97226, p-value = 0.03928
##
##
## Homogeneity of Variance Test (Levene's):
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 31 0.9019 0.6155
## 64
##
## Distribution Plots for Live_Pupa_Percentage :
Technical Discussion Points and Considerations:
Boxplots: 1. Central Tendency and Spread: Examine the medians (horizontal lines within the boxes) to compare the typical values of each trait across strains. Observe the interquartile ranges (IQR, the height of the boxes) to gauge the spread of the data within each strain. 2. Outliers: Identify any points plotted individually outside the whiskers. These outliers might indicate unusual observations or potential errors in data collection. Consider their impact on 3. subsequent analyses and whether they should be handled (e.g., removal, winsorization) if they are deemed to be influential. 4. Strain Differences: Visually compare the boxplots across strains to get a preliminary sense of which traits might show significant differences between strains.
Normality Tests (Shapiro-Wilk): 1. P-values: Assess the p-values for each trait. If p < 0.05, it suggests a significant departure from normality. This might necessitate data transformation or the use of non-parametric tests. 2. Visual Inspection of Q-Q Plots: Complement the p-values by examining the Q-Q plots generated by the check_model function. If the points deviate substantially from the straight line, it further supports the evidence of non-normality.
Homogeneity of Variance Tests (Levene’s): 1. P-values: Evaluate the p-values for each trait. If p < 0.05, it suggests significant heterogeneity of variance (unequal variances) across strains. This might also necessitate data transformation or the use of alternative statistical tests. 2. Visual Inspection of Residuals vs. Fitted Plots: Examine the Residuals vs. Fitted plots from check_model. If the spread of the residuals is not roughly constant across the range of fitted values, it further supports the evidence of heteroscedasticity.
Distribution Plots: 1. Shape of Distributions: Observe the shape of the density curves for each trait, both for all strains combined and for individual strains. Look for deviations from a normal bell-shaped curve (e.g., skewness, multiple peaks). 2. Strain Comparisons: Compare the density curves across strains to visually assess differences in central tendency (mean or median) and spread (variance or interquartile range).
# Loop through each trait
for (trait in all_traits) {
cat("\n\n--- Analyzing", trait, "---\n")
# Use the appropriate dataframe for each trait
if (trait %in% traits_subsamples) {
# Use data_subsamples_complete for traits with subsamples
data_trait <- data_subsamples_complete
} else {
# Use combined_data for traits without subsamples
data_trait <- combined_data
}
# Fit the linear mixed-effects model (LMM) with 'Location' and interaction
lmer_model <- lm(as.formula(paste(trait, "~ Strain + Location")), data = data_trait)
# Check for non-normality or heteroscedasticity
normality_p_value <- shapiro.test(residuals(lmer_model))$p.value
levene_result <- leveneTest(residuals(lmer_model) ~ data_trait$Strain * data_trait$Location)
homogeneity_p_value <- levene_result$"Pr(>F)"[1]
# Create means dataframe for CLDs and boxplot (outside the conditional blocks)
means <- aggregate(as.formula(paste(trait, "~ Strain")), data = data_trait, FUN = mean)
colnames(means)[colnames(means) == trait] <- paste0("Mean_", trait)
if (normality_p_value >= 0.05 & homogeneity_p_value >= 0.05) {
# Perform ANOVA directly on original data if both assumptions are met
cat("\n**Performing ANOVA on original data:**\n")
anova_result <- anova(lmer_model)
print(anova_result)
# Calculate and print eta-squared for the LMM
eta_squared <- performance::r2(lmer_model)$Marginal$R2
cat("Eta-squared (Marginal R-squared):", eta_squared, "\n")
# Tukey's HSD if ANOVA is significant
if (anova_result$`Pr(>F)`[1] < 0.05) {
tukey_result <- emmeans(lmer_model, pairwise ~ Strain | Location, adjust = "tukey")
print(tukey_result)
# Extract CLD from Tukey's results
cld_data <- as.data.frame(tukey_result$contrasts) %>%
mutate(sig = ifelse(p.value < 0.05, "*", " ")) %>%
dplyr::select(contrast, estimate, SE, df, t.ratio, p.value, sig)
# Create CLDs using 'multcompLetters4'
cld_letters <- multcompLetters4(tukey_result$emmeans, tukey_result$contrasts)
# Print the compact letter display
cat("\n**Compact Letter Displays (CLDs):**\n")
print(cld_letters$LetterMatrix)
# Boxplot with CLDs, faceted by Location
# Ensure complete cases for plotting
data_complete_plot <- data_trait %>% filter(!is.na(.data[[trait]]))
# Base boxplot
p <- ggplot(data_complete_plot, aes(x = Strain, y = .data[[trait]], fill = Location)) +
geom_boxplot() +
labs(title = paste("Boxplot of", trait, "by Strain and Location (ANOVA with Tukey HSD)"),
x = "Strain", y = trait) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~ Location) # Facet by Location
# Add labels from cld_letters
for (loc in levels(data_trait$Location)) { # Use data_trait$Location here
cld_data_loc <- cld_letters$LetterMatrix[, loc] %>%
data.frame(Letters = .) %>%
rownames_to_column("Strain") %>%
mutate(y_pos = aggregate(as.formula(paste(trait, "~ Strain")), data = data_trait %>% filter(Location == loc), FUN = mean)[[2]] + 0.1)
p <- p + geom_text(data = cld_data_loc,
aes(x = Strain, y = y_pos, label = Letters),
vjust = -0.5, size = 3)
}
# Print the plot
print(p)
} else {
cat("\nANOVA not significant. No post-hoc test needed.\n")
}
} else {
# Attempt log transformation if at least one assumption is not met
cat("\n**Attempting log transformation...**\n")
transformed_data <- data_trait
transformed_data[[trait]] <- log1p(transformed_data[[trait]])
transformed_model <- lmer(as.formula(paste(trait, "~ Strain * Location + (1 | Replicate)")), data = transformed_data)
check_model(transformed_model, transformed_data) # Pass transformed_data to check_model
# Re-check assumptions after transformation
normality_p_value_transformed <- shapiro.test(residuals(transformed_model))$p.value
levene_result_transformed <- leveneTest(residuals(transformed_model) ~ transformed_data$Strain * transformed_data$Location)
homogeneity_p_value_transformed <- levene_result_transformed$"Pr(>F)"[1]
# Print the results of assumption checks after transformation
cat("\nAssumption Checks after Transformation:\n")
cat(" Normality (Shapiro-Wilk) p-value:", normality_p_value_transformed, "\n")
cat(" Homogeneity of Variance (Levene's) p-value:", homogeneity_p_value_transformed, "\n")
if (normality_p_value_transformed >= 0.05 & homogeneity_p_value_transformed >= 0.05) {
# Use ANOVA on transformed data if both assumptions are now met
cat("\n**Performing ANOVA on transformed data:**\n")
anova_result <- anova(transformed_model)
print(anova_result)
# Tukey's HSD if ANOVA is significant
if (anova_result$`Pr(>F)`[1] < 0.05) {
tukey_result <- emmeans(transformed_model, pairwise ~ Strain | Location, adjust = "tukey")
print(tukey_result)
# Extract CLD from Tukey's results
cld_data <- as.data.frame(tukey_result$contrasts) %>%
mutate(sig = ifelse(p.value < 0.05, "*", " ")) %>%
dplyr::select(contrast, estimate, SE, df, t.ratio, p.value, sig)
# Create CLDs using 'multcompLetters4'
cld_letters <- multcompLetters4(tukey_result$emmeans, tukey_result$contrasts)
# Print the compact letter display
cat("\n**Compact Letter Displays (CLDs):**\n")
print(cld_letters$LetterMatrix)
# Boxplot with CLDs, faceted by Location
# Ensure complete cases for plotting
data_complete_plot <- transformed_data %>% filter(!is.na(.data[[trait]]))
# Base boxplot
p <- ggplot(data_complete_plot, aes(x = Strain, y = .data[[trait]], fill = Location)) +
geom_boxplot() +
labs(title = paste("Boxplot of", trait, "by Strain and Location (ANOVA with Tukey HSD)"),
x = "Strain", y = trait) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~ Location) # Facet by Location
# Add labels from cld_letters
for (loc in levels(data_complete$Location)) {
cld_data_loc <- cld_letters$LetterMatrix[, loc] %>%
data.frame(Letters = .) %>%
rownames_to_column("Strain") %>%
mutate(y_pos = aggregate(as.formula(paste(trait, "~ Strain")), data = transformed_data %>% filter(Location == loc), FUN = mean)[[2]] + 0.1)
p <- p + geom_text(data = cld_data_loc,
aes(x = Strain, y = y_pos, label = Letters),
vjust = -0.5, size = 3)
}
# Print the plot
print(p)
} else {
cat("\nANOVA not significant. No post-hoc test needed.\n")
}
} else {
# Use Kruskal-Wallis if assumptions are still violated after transformation
cat("\n**Performing Kruskal-Wallis test:**\n")
kruskal_result <- kruskal.test(as.formula(paste(trait, "~ Strain")), data = data_trait)
print(kruskal_result)
# Calculate and print epsilon-squared (effect size for Kruskal-Wallis)
k <- length(unique(data_trait$Strain)) # Number of groups
n <- nrow(data_trait) # Total number of observations
epsilon_squared <- (kruskal_result$statistic - (k - 1)) / (n - k)
cat("Epsilon-squared:", epsilon_squared, "\n")
# Dunn's Test if Kruskal-Wallis is significant
if (kruskal_result$p.value < 0.05) {
cat("\n**Dunn's Test (Post-hoc with Holm correction):**\n")
dunn_result <- dunnTest(as.formula(paste(trait, "~ Strain")), data = data_trait, method="holm")
print(dunn_result)
# Get unique strain names
strains <- unique(data_trait$Strain)
# Initialize a vector to store group letters
group_letters <- rep("", length(strains))
names(group_letters) <- strains
# Assign group letters based on adjusted p-values (example threshold of 0.05)
for (i in 1:nrow(dunn_result$res)) {
comparison <- strsplit(dunn_result$res$Comparison[i], " - ")
strain1 <- comparison[[1]][1]
strain2 <- comparison[[1]][2]
if (dunn_result$res$P.adj[i] < 0.05) {
# Strains are significantly different, assign different letters
if (any(group_letters[strain1] == "") && any(group_letters[strain2] == "")) {
# Both strains have no group yet
group_letters[strain1] <- "a"
group_letters[strain2] <- "b"
} else if (any(group_letters[strain1] == "")) {
# Only strain1 has no group
group_letters[strain1] <- setdiff(letters, group_letters[strain2])[1]
} else if (any(group_letters[strain2] == "")) {
# Only strain2 has no group
group_letters[strain2] <- setdiff(letters, group_letters[strain1])[1]
} else {
# Both strains have groups, ensure they are different
if (group_letters[strain1] == group_letters[strain2]) {
group_letters[strain2] <- setdiff(letters, group_letters[strain1])[1]
}
}
} else {
# Strains are not significantly different
if (any(group_letters[strain1] == "") && any(group_letters[strain2] == "")) {
# Both strains have no group yet
group_letters[strain1] <- "a"
group_letters[strain2] <- "a"
} else if (any(group_letters[strain1] == "")) {
# Only strain1 has no group
group_letters[strain1] <- group_letters[strain2]
} else if (any(group_letters[strain2] == "")) {
# Only strain2 has no group
group_letters[strain2] <- group_letters[strain1]
}
}
}
# Create the cld_data dataframe
cld_data <- data.frame(Strain = strains,
Trait = means[[2]],
.group = group_letters)
# Rename the trait column
colnames(cld_data)[colnames(cld_data) == "Trait"] <- trait
cat("\n**Compact Letter Displays (CLDs - Holm):**\n")
print(cld_data)
# Boxplot of data with group labels
plotting_data <- merge(data_trait, cld_data, by = "Strain") # Use data_trait for merging
# Use 'plotting_data' in ggplot to access the trait and .group columns
ggplot(plotting_data, aes(x = Strain, y = .data[[trait]], fill = .group)) +
geom_boxplot() +
labs(title = paste("Boxplot of", trait, "by Strain (Kruskal-Wallis with Dunn's Test)"),
x = "Strain", y = trait) +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
} else {
cat("\nKruskal-Wallis not significant. No post-hoc test needed.\n")
}
}
}
}
##
##
## --- Analyzing Single_Cocoon_Weight ---
##
## **Attempting log transformation...**
##
## Assumption Checks after Transformation:
## Normality (Shapiro-Wilk) p-value: 1.449919e-10
## Homogeneity of Variance (Levene's) p-value: 0.01517713
##
## **Performing Kruskal-Wallis test:**
##
## Kruskal-Wallis rank sum test
##
## data: Single_Cocoon_Weight by Strain
## Kruskal-Wallis chi-squared = 51.435, df = 15, p-value = 7e-06
##
## Epsilon-squared: 0.03995115
##
## **Dunn's Test (Post-hoc with Holm correction):**
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Holm method.
## Comparison Z P.unadj P.adj
## 1 SW 101 - SW 102 1.72201660 8.506651e-02 1.0000000000
## 2 SW 101 - SW 103 1.31188029 1.895605e-01 1.0000000000
## 3 SW 102 - SW 103 -0.46411450 6.425657e-01 1.0000000000
## 4 SW 101 - SW 104 3.57289381 3.530580e-04 0.0395424918
## 5 SW 102 - SW 104 1.73468765 8.279615e-02 1.0000000000
## 6 SW 103 - SW 104 2.28146404 2.252100e-02 1.0000000000
## 7 SW 101 - SW 105 2.69628915 7.011677e-03 0.7151910833
## 8 SW 102 - SW 105 0.85637203 3.917920e-01 1.0000000000
## 9 SW 103 - SW 105 1.37862842 1.680094e-01 1.0000000000
## 10 SW 104 - SW 105 -0.93368133 3.504683e-01 1.0000000000
## 11 SW 101 - SW 106 1.03196001 3.020909e-01 1.0000000000
## 12 SW 102 - SW 106 -0.73808134 4.604650e-01 1.0000000000
## 13 SW 103 - SW 106 -0.28426540 7.762070e-01 1.0000000000
## 14 SW 104 - SW 106 -2.56845769 1.021522e-02 1.0000000000
## 15 SW 105 - SW 106 -1.66867425 9.518195e-02 1.0000000000
## 16 SW 101 - SW 107 -0.30873655 7.575219e-01 1.0000000000
## 17 SW 102 - SW 107 -2.01638535 4.375970e-02 1.0000000000
## 18 SW 103 - SW 107 -1.61931689 1.053791e-01 1.0000000000
## 19 SW 104 - SW 107 -3.87339590 1.073293e-04 0.0125575239
## 20 SW 105 - SW 107 -3.00372575 2.666956e-03 0.2880312599
## 21 SW 106 - SW 107 -1.34069656 1.800190e-01 1.0000000000
## 22 SW 101 - SW 108 -0.07567349 9.396789e-01 1.0000000000
## 23 SW 102 - SW 108 -1.73717793 8.235577e-02 1.0000000000
## 24 SW 103 - SW 108 -1.33991654 1.802725e-01 1.0000000000
## 25 SW 104 - SW 108 -3.52321059 4.263524e-04 0.0473251209
## 26 SW 105 - SW 108 -2.67436684 7.487050e-03 0.7561920704
## 27 SW 106 - SW 108 -1.07009503 2.845765e-01 1.0000000000
## 28 SW 107 - SW 108 0.22183250 8.244443e-01 1.0000000000
## 29 SW 101 - SW 109 -0.56842946 5.697434e-01 1.0000000000
## 30 SW 102 - SW 109 -2.26399282 2.357456e-02 1.0000000000
## 31 SW 103 - SW 109 -1.87791635 6.039261e-02 1.0000000000
## 32 SW 104 - SW 109 -4.12616241 3.688668e-05 0.0043526284
## 33 SW 105 - SW 109 -3.26232521 1.105023e-03 0.1204475030
## 34 SW 106 - SW 109 -1.60038946 1.095122e-01 1.0000000000
## 35 SW 107 - SW 109 -0.25969291 7.951007e-01 1.0000000000
## 36 SW 108 - SW 109 -0.47207884 6.368705e-01 1.0000000000
## 37 SW 101 - SW 110 -0.20893523 8.344988e-01 1.0000000000
## 38 SW 102 - SW 110 -1.87980234 6.013502e-02 1.0000000000
## 39 SW 103 - SW 110 -1.48527451 1.374711e-01 1.0000000000
## 40 SW 104 - SW 110 -3.68607099 2.277428e-04 0.0261904273
## 41 SW 105 - SW 110 -2.83305722 4.610513e-03 0.4887143988
## 42 SW 106 - SW 110 -1.21337135 2.249879e-01 1.0000000000
## 43 SW 107 - SW 110 0.09156686 9.270422e-01 1.0000000000
## 44 SW 108 - SW 110 -0.12792055 8.982119e-01 1.0000000000
## 45 SW 109 - SW 110 0.34433337 7.305956e-01 1.0000000000
## 46 SW 101 - SW 111 -0.89879289 3.687630e-01 1.0000000000
## 47 SW 102 - SW 111 -2.57898200 9.909195e-03 0.9909195130
## 48 SW 103 - SW 111 -2.20688878 2.732183e-02 1.0000000000
## 49 SW 104 - SW 111 -4.44771457 8.678875e-06 0.0010327861
## 50 SW 105 - SW 111 -3.59129763 3.290356e-04 0.0371810247
## 51 SW 106 - SW 111 -1.93075290 5.351362e-02 1.0000000000
## 52 SW 107 - SW 111 -0.59005634 5.551529e-01 1.0000000000
## 53 SW 108 - SW 111 -0.79042502 4.292796e-01 1.0000000000
## 54 SW 109 - SW 111 -0.33036343 7.411254e-01 1.0000000000
## 55 SW 110 - SW 111 -0.66588553 5.054843e-01 1.0000000000
## 56 SW 101 - SW 112 -1.08628265 2.773540e-01 1.0000000000
## 57 SW 102 - SW 112 -2.75774647 5.820132e-03 0.6111138451
## 58 SW 103 - SW 112 -2.39358911 1.668443e-02 1.0000000000
## 59 SW 104 - SW 112 -4.63020370 3.653062e-06 0.0004383674
## 60 SW 105 - SW 112 -3.77799796 1.580942e-04 0.0183389242
## 61 SW 106 - SW 112 -2.11824266 3.415452e-02 1.0000000000
## 62 SW 107 - SW 112 -0.77754610 4.368366e-01 1.0000000000
## 63 SW 108 - SW 112 -0.97109467 3.315011e-01 1.0000000000
## 64 SW 109 - SW 112 -0.51785320 6.045607e-01 1.0000000000
## 65 SW 110 - SW 112 -0.84837466 3.962293e-01 1.0000000000
## 66 SW 111 - SW 112 -0.18748976 8.512766e-01 1.0000000000
## 67 SW 101 - SW 113 0.27552992 7.829091e-01 1.0000000000
## 68 SW 102 - SW 113 -1.45930913 1.444800e-01 1.0000000000
## 69 SW 103 - SW 113 -1.03751051 2.994980e-01 1.0000000000
## 70 SW 104 - SW 113 -3.30471268 9.507381e-04 0.1045811922
## 71 SW 105 - SW 113 -2.42191936 1.543878e-02 1.0000000000
## 72 SW 106 - SW 113 -0.75643009 4.493914e-01 1.0000000000
## 73 SW 107 - SW 113 0.58426647 5.590410e-01 1.0000000000
## 74 SW 108 - SW 113 0.34118076 7.329675e-01 1.0000000000
## 75 SW 109 - SW 113 0.84395937 3.986921e-01 1.0000000000
## 76 SW 110 - SW 113 0.47711636 6.332793e-01 1.0000000000
## 77 SW 111 - SW 113 1.17432281 2.402658e-01 1.0000000000
## 78 SW 112 - SW 113 1.36181257 1.732570e-01 1.0000000000
## 79 SW 101 - SW 114 -0.04785161 9.618345e-01 1.0000000000
## 80 SW 102 - SW 114 -1.76764132 7.712089e-02 1.0000000000
## 81 SW 103 - SW 114 -1.35953042 1.739786e-01 1.0000000000
## 82 SW 104 - SW 114 -3.61946914 2.952080e-04 0.0336537128
## 83 SW 105 - SW 114 -2.74393928 6.070677e-03 0.6313503685
## 84 SW 106 - SW 114 -1.07981162 2.802261e-01 1.0000000000
## 85 SW 107 - SW 114 0.26088494 7.941812e-01 1.0000000000
## 86 SW 108 - SW 114 0.02956252 9.764160e-01 1.0000000000
## 87 SW 109 - SW 114 0.52057785 6.026609e-01 1.0000000000
## 88 SW 110 - SW 114 0.16235990 8.710224e-01 1.0000000000
## 89 SW 111 - SW 114 0.85094128 3.948020e-01 1.0000000000
## 90 SW 112 - SW 114 1.03843104 2.990694e-01 1.0000000000
## 91 SW 113 - SW 114 -0.32338153 7.464063e-01 1.0000000000
## 92 SW 101 - SW 115 1.15984812 2.461106e-01 1.0000000000
## 93 SW 102 - SW 115 -0.61614481 5.377990e-01 1.0000000000
## 94 SW 103 - SW 115 -0.15691577 8.753112e-01 1.0000000000
## 95 SW 104 - SW 115 -2.44398054 1.452621e-02 1.0000000000
## 96 SW 105 - SW 115 -1.54132462 1.232378e-01 1.0000000000
## 97 SW 106 - SW 115 0.12788811 8.982375e-01 1.0000000000
## 98 SW 107 - SW 115 1.46858467 1.419455e-01 1.0000000000
## 99 SW 108 - SW 115 1.19333110 2.327397e-01 1.0000000000
## 100 SW 109 - SW 115 1.72827758 8.393847e-02 1.0000000000
## 101 SW 110 - SW 115 1.33784850 1.809458e-01 1.0000000000
## 102 SW 111 - SW 115 2.05864101 3.952864e-02 1.0000000000
## 103 SW 112 - SW 115 2.24613077 2.469563e-02 1.0000000000
## 104 SW 113 - SW 115 0.88431820 3.765245e-01 1.0000000000
## 105 SW 114 - SW 115 1.20769973 2.271628e-01 1.0000000000
## 106 SW 101 - SW 116 1.82721632 6.766725e-02 1.0000000000
## 107 SW 102 - SW 116 0.02016580 9.839111e-01 0.9839111085
## 108 SW 103 - SW 116 0.50764245 6.117041e-01 1.0000000000
## 109 SW 104 - SW 116 -1.79441204 7.274743e-02 1.0000000000
## 110 SW 105 - SW 116 -0.87676641 3.806135e-01 1.0000000000
## 111 SW 106 - SW 116 0.79525631 4.264644e-01 1.0000000000
## 112 SW 107 - SW 116 2.13595287 3.268326e-02 1.0000000000
## 113 SW 108 - SW 116 1.83642318 6.629509e-02 1.0000000000
## 114 SW 109 - SW 116 2.39564577 1.659112e-02 1.0000000000
## 115 SW 110 - SW 116 1.98741700 4.687621e-02 1.0000000000
## 116 SW 111 - SW 116 2.72600921 6.410521e-03 0.6602836639
## 117 SW 112 - SW 116 2.91349897 3.574030e-03 0.3824212260
## 118 SW 113 - SW 116 1.55168640 1.207373e-01 1.0000000000
## 119 SW 114 - SW 116 1.87506793 6.078338e-02 1.0000000000
## 120 SW 115 - SW 116 0.66736820 5.045370e-01 1.0000000000
##
## **Compact Letter Displays (CLDs - Holm):**
## Strain Single_Cocoon_Weight .group
## SW 101 SW 101 1.483800 a
## SW 102 SW 102 1.404540 a
## SW 103 SW 103 1.419119 a
## SW 104 SW 104 1.315407 b
## SW 105 SW 105 1.363203 a
## SW 106 SW 106 1.431433 a
## SW 107 SW 107 1.501850 a
## SW 108 SW 108 1.465827 a
## SW 109 SW 109 1.496533 a
## SW 110 SW 110 1.484426 a
## SW 111 SW 111 1.532417 b
## SW 112 SW 112 1.532417 b
## SW 113 SW 113 1.459350 a
## SW 114 SW 114 1.473400 a
## SW 115 SW 115 1.420700 a
## SW 116 SW 116 1.402300 a
##
##
## --- Analyzing Cocoon_Shell_Weight ---
##
## **Attempting log transformation...**
##
## Assumption Checks after Transformation:
## Normality (Shapiro-Wilk) p-value: 2.21599e-18
## Homogeneity of Variance (Levene's) p-value: 8.501889e-05
##
## **Performing Kruskal-Wallis test:**
##
## Kruskal-Wallis rank sum test
##
## data: Cocoon_Shell_Weight by Strain
## Kruskal-Wallis chi-squared = 36.722, df = 15, p-value = 0.00139
##
## Epsilon-squared: 0.02381752
##
## **Dunn's Test (Post-hoc with Holm correction):**
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Holm method.
## Comparison Z P.unadj P.adj
## 1 SW 101 - SW 102 2.047025256 0.0406556048 1.00000000
## 2 SW 101 - SW 103 -0.587136437 0.5571120778 1.00000000
## 3 SW 102 - SW 103 -2.599211805 0.0093438101 1.00000000
## 4 SW 101 - SW 104 1.723660739 0.0847691065 1.00000000
## 5 SW 102 - SW 104 -0.349819939 0.7264738344 1.00000000
## 6 SW 103 - SW 104 2.288385944 0.0221150555 1.00000000
## 7 SW 101 - SW 105 0.680251852 0.4963450056 1.00000000
## 8 SW 102 - SW 105 -1.390342645 0.1644248541 1.00000000
## 9 SW 103 - SW 105 1.262096456 0.2069140802 1.00000000
## 10 SW 104 - SW 105 -1.054527881 0.2916413027 1.00000000
## 11 SW 101 - SW 106 0.048705804 0.9611537507 1.00000000
## 12 SW 102 - SW 106 -2.000586095 0.0454370133 1.00000000
## 13 SW 103 - SW 106 0.635637163 0.5250129416 1.00000000
## 14 SW 104 - SW 106 -1.676253991 0.0936884527 1.00000000
## 15 SW 105 - SW 106 -0.631751126 0.5275495144 1.00000000
## 16 SW 101 - SW 107 -0.550069041 0.5822720199 1.00000000
## 17 SW 102 - SW 107 -2.571495509 0.0101260329 1.00000000
## 18 SW 103 - SW 107 0.039383486 0.9685846459 1.00000000
## 19 SW 104 - SW 107 -2.259058629 0.0238797381 1.00000000
## 20 SW 105 - SW 107 -1.228004803 0.2194451651 1.00000000
## 21 SW 106 - SW 107 -0.598774845 0.5493230390 1.00000000
## 22 SW 101 - SW 108 0.055997743 0.9553436051 1.00000000
## 23 SW 102 - SW 108 -1.925432728 0.0541752445 1.00000000
## 24 SW 103 - SW 108 0.621727410 0.5341211247 1.00000000
## 25 SW 104 - SW 108 -1.609485045 0.1075103226 1.00000000
## 26 SW 105 - SW 108 -0.599925201 0.5485560865 1.00000000
## 27 SW 106 - SW 108 0.009063656 0.9927683475 0.99276835
## 28 SW 107 - SW 108 0.586057534 0.5578368520 1.00000000
## 29 SW 101 - SW 109 0.413488431 0.6792487968 1.00000000
## 30 SW 102 - SW 109 -1.652779506 0.0983757464 1.00000000
## 31 SW 103 - SW 109 0.998883856 0.3178509575 1.00000000
## 32 SW 104 - SW 109 -1.321200654 0.1864344683 1.00000000
## 33 SW 105 - SW 109 -0.268504432 0.7883110636 1.00000000
## 34 SW 106 - SW 109 0.364782627 0.7152736741 1.00000000
## 35 SW 107 - SW 109 0.963557472 0.3352678369 1.00000000
## 36 SW 108 - SW 109 0.342449679 0.7320125075 1.00000000
## 37 SW 101 - SW 110 0.246942377 0.8049528194 1.00000000
## 38 SW 102 - SW 110 -1.761196199 0.0782052012 1.00000000
## 39 SW 103 - SW 110 0.817560074 0.4136084331 1.00000000
## 40 SW 104 - SW 110 -1.439327018 0.1500578913 1.00000000
## 41 SW 105 - SW 110 -0.416297988 0.6771919645 1.00000000
## 42 SW 106 - SW 110 0.199535629 0.8418437757 1.00000000
## 43 SW 107 - SW 110 0.782340267 0.4340146283 1.00000000
## 44 SW 108 - SW 110 0.183801244 0.8541693799 1.00000000
## 45 SW 109 - SW 110 -0.155517708 0.8764131946 1.00000000
## 46 SW 101 - SW 111 -1.559948117 0.1187721423 1.00000000
## 47 SW 102 - SW 111 -3.534377427 0.0004087370 0.04863970
## 48 SW 103 - SW 111 -0.966243449 0.3339223764 1.00000000
## 49 SW 104 - SW 111 -3.242002742 0.0011869288 0.13887067
## 50 SW 105 - SW 111 -2.233631737 0.0255073139 1.00000000
## 51 SW 106 - SW 111 -1.608653921 0.1076920368 1.00000000
## 52 SW 107 - SW 111 -1.009879076 0.3125532286 1.00000000
## 53 SW 108 - SW 111 -1.559201362 0.1189487286 1.00000000
## 54 SW 109 - SW 111 -1.973436548 0.0484458470 1.00000000
## 55 SW 110 - SW 111 -1.765284380 0.0775159834 1.00000000
## 56 SW 101 - SW 112 -1.018904977 0.3082480809 1.00000000
## 57 SW 102 - SW 112 -3.018513034 0.0025401847 0.28958106
## 58 SW 103 - SW 112 -0.427478394 0.6690309186 1.00000000
## 59 SW 104 - SW 112 -2.715390019 0.0066197760 0.73479513
## 60 SW 105 - SW 112 -1.694866683 0.0901007141 1.00000000
## 61 SW 106 - SW 112 -1.067610780 0.2856961188 1.00000000
## 62 SW 107 - SW 112 -0.468835935 0.6391869117 1.00000000
## 63 SW 108 - SW 112 -1.037839146 0.2993449486 1.00000000
## 64 SW 109 - SW 112 -1.432393408 0.1520312641 1.00000000
## 65 SW 110 - SW 112 -1.238671657 0.2154671192 1.00000000
## 66 SW 111 - SW 112 0.541043141 0.5884778478 1.00000000
## 67 SW 101 - SW 113 -1.180860289 0.2376582291 1.00000000
## 68 SW 102 - SW 113 -3.172931365 0.0015090820 0.17505351
## 69 SW 103 - SW 113 -0.588751786 0.5560277928 1.00000000
## 70 SW 104 - SW 113 -2.873025745 0.0040656098 0.45534830
## 71 SW 105 - SW 113 -1.856140075 0.0634335949 1.00000000
## 72 SW 106 - SW 113 -1.229566092 0.2188596339 1.00000000
## 73 SW 107 - SW 113 -0.630791247 0.5281770270 1.00000000
## 74 SW 108 - SW 113 -1.193903190 0.2325158396 1.00000000
## 75 SW 109 - SW 113 -1.594348720 0.1108579512 1.00000000
## 76 SW 110 - SW 113 -1.396307382 0.1626219517 1.00000000
## 77 SW 111 - SW 113 0.379087828 0.7046226444 1.00000000
## 78 SW 112 - SW 113 -0.161955312 0.8713410449 1.00000000
## 79 SW 101 - SW 114 0.031846102 0.9745947809 1.00000000
## 80 SW 102 - SW 114 -2.016661189 0.0437308827 1.00000000
## 81 SW 103 - SW 114 0.618848450 0.5360162011 1.00000000
## 82 SW 104 - SW 114 -1.692664020 0.0905194377 1.00000000
## 83 SW 105 - SW 114 -0.648539839 0.5166358524 1.00000000
## 84 SW 106 - SW 114 -0.016859701 0.9865485419 1.00000000
## 85 SW 107 - SW 114 0.581915144 0.5606238395 1.00000000
## 86 SW 108 - SW 114 -0.025310071 0.9798076410 1.00000000
## 87 SW 109 - SW 114 -0.381642329 0.7027266827 1.00000000
## 88 SW 110 - SW 114 -0.215945657 0.8290301073 1.00000000
## 89 SW 111 - SW 114 1.591794220 0.1114309494 1.00000000
## 90 SW 112 - SW 114 1.050751079 0.2933729296 1.00000000
## 91 SW 113 - SW 114 1.212706391 0.2252420954 1.00000000
## 92 SW 101 - SW 115 2.077149256 0.0377877856 1.00000000
## 93 SW 102 - SW 115 -0.066541148 0.9469469986 1.00000000
## 94 SW 103 - SW 115 2.655539763 0.0079181598 0.87099758
## 95 SW 104 - SW 115 0.298087886 0.7656360872 1.00000000
## 96 SW 105 - SW 115 1.388151474 0.1650909252 1.00000000
## 97 SW 106 - SW 115 2.028443452 0.0425150091 1.00000000
## 98 SW 107 - SW 115 2.627218297 0.0086086088 0.93833836
## 99 SW 108 - SW 115 1.945593364 0.0517036048 1.00000000
## 100 SW 109 - SW 115 1.663660825 0.0961802292 1.00000000
## 101 SW 110 - SW 115 1.774806248 0.0759298843 1.00000000
## 102 SW 111 - SW 115 3.637097373 0.0002757277 0.03308733
## 103 SW 112 - SW 115 3.096054233 0.0019611449 0.22553166
## 104 SW 113 - SW 115 3.258009545 0.0011219663 0.13239203
## 105 SW 114 - SW 115 2.045303154 0.0408249800 1.00000000
## 106 SW 101 - SW 116 1.388115404 0.1651019066 1.00000000
## 107 SW 102 - SW 116 -0.723509149 0.4693671354 1.00000000
## 108 SW 103 - SW 116 1.969407119 0.0489063589 1.00000000
## 109 SW 104 - SW 116 -0.372568418 0.7094696820 1.00000000
## 110 SW 105 - SW 116 0.702018830 0.4826674204 1.00000000
## 111 SW 106 - SW 116 1.339409601 0.1804373656 1.00000000
## 112 SW 107 - SW 116 1.938184446 0.0526007237 1.00000000
## 113 SW 108 - SW 116 1.281623730 0.1999746716 1.00000000
## 114 SW 109 - SW 116 0.974626973 0.3297453288 1.00000000
## 115 SW 110 - SW 116 1.104149944 0.2695280990 1.00000000
## 116 SW 111 - SW 116 2.948063522 0.0031977135 0.36134163
## 117 SW 112 - SW 116 2.407020381 0.0160832723 1.00000000
## 118 SW 113 - SW 116 2.568975693 0.0101999612 1.00000000
## 119 SW 114 - SW 116 1.356269302 0.1750134990 1.00000000
## 120 SW 115 - SW 116 -0.689033852 0.4908019650 1.00000000
##
## **Compact Letter Displays (CLDs - Holm):**
## Strain Cocoon_Shell_Weight .group
## SW 101 SW 101 0.2937500 a
## SW 102 SW 102 0.2738000 a
## SW 103 SW 103 0.2980847 a
## SW 104 SW 104 0.2824630 a
## SW 105 SW 105 0.2858644 a
## SW 106 SW 106 0.2898333 a
## SW 107 SW 107 0.3016167 a
## SW 108 SW 108 0.2937115 a
## SW 109 SW 109 0.2868167 a
## SW 110 SW 110 0.2900185 a
## SW 111 SW 111 0.3048333 b
## SW 112 SW 112 0.3018033 a
## SW 113 SW 113 0.3026000 a
## SW 114 SW 114 0.2910000 a
## SW 115 SW 115 0.2727167 a
## SW 116 SW 116 0.2795000 a
##
##
## --- Analyzing Cocoon_Shell_Percentage ---
##
## **Attempting log transformation...**
## boundary (singular) fit: see help('isSingular')
##
## Assumption Checks after Transformation:
## Normality (Shapiro-Wilk) p-value: 3.532175e-24
## Homogeneity of Variance (Levene's) p-value: 0.4632928
##
## **Performing Kruskal-Wallis test:**
##
## Kruskal-Wallis rank sum test
##
## data: Cocoon_Shell_Percentage by Strain
## Kruskal-Wallis chi-squared = 45.229, df = 15, p-value = 7.046e-05
##
## Epsilon-squared: 0.03314579
##
## **Dunn's Test (Post-hoc with Holm correction):**
## Dunn (1964) Kruskal-Wallis multiple comparison
## p-values adjusted with the Holm method.
## Comparison Z P.unadj P.adj
## 1 SW 101 - SW 102 0.79234032 4.281623e-01 1.000000000
## 2 SW 101 - SW 103 -2.49137857 1.272485e-02 1.000000000
## 3 SW 102 - SW 103 -3.16564982 1.547370e-03 0.173305388
## 4 SW 101 - SW 104 -2.22753971 2.591123e-02 1.000000000
## 5 SW 102 - SW 104 -2.90203420 3.707481e-03 0.404115384
## 6 SW 103 - SW 104 0.20681503 8.361543e-01 1.000000000
## 7 SW 101 - SW 105 -2.26530087 2.349422e-02 1.000000000
## 8 SW 102 - SW 105 -2.95001080 3.177628e-03 0.349539099
## 9 SW 103 - SW 105 0.22513374 8.218752e-01 1.000000000
## 10 SW 104 - SW 105 0.01328153 9.894032e-01 1.000000000
## 11 SW 101 - SW 106 -0.82999180 4.065434e-01 1.000000000
## 12 SW 102 - SW 106 -1.58370645 1.132605e-01 1.000000000
## 13 SW 103 - SW 106 1.66488149 9.593640e-02 1.000000000
## 14 SW 104 - SW 106 1.41968501 1.556994e-01 1.000000000
## 15 SW 105 - SW 106 1.43880379 1.502061e-01 1.000000000
## 16 SW 101 - SW 107 -0.75727811 4.488833e-01 1.000000000
## 17 SW 102 - SW 107 -1.51437667 1.299303e-01 1.000000000
## 18 SW 103 - SW 107 1.73728901 8.233617e-02 1.000000000
## 19 SW 104 - SW 107 1.49045932 1.361035e-01 1.000000000
## 20 SW 105 - SW 107 1.51121131 1.307346e-01 1.000000000
## 21 SW 106 - SW 107 0.07271368 9.420340e-01 1.000000000
## 22 SW 101 - SW 108 -0.22926627 8.186620e-01 1.000000000
## 23 SW 102 - SW 108 -0.98531935 3.244672e-01 1.000000000
## 24 SW 103 - SW 108 2.17310382 2.977250e-02 1.000000000
## 25 SW 104 - SW 108 1.92698513 5.398148e-02 1.000000000
## 26 SW 105 - SW 108 1.95518449 5.056130e-02 1.000000000
## 27 SW 106 - SW 108 0.57053384 5.683157e-01 1.000000000
## 28 SW 107 - SW 108 0.50046518 6.167476e-01 1.000000000
## 29 SW 101 - SW 109 1.49463234 1.350104e-01 1.000000000
## 30 SW 102 - SW 109 0.63273570 5.269063e-01 1.000000000
## 31 SW 103 - SW 109 3.97971770 6.899715e-05 0.008279658
## 32 SW 104 - SW 109 3.68230800 2.311319e-04 0.027042436
## 33 SW 105 - SW 109 3.75364000 1.742851e-04 0.020739930
## 34 SW 106 - SW 109 2.32462414 2.009207e-02 1.000000000
## 35 SW 107 - SW 109 2.25191045 2.432793e-02 1.000000000
## 36 SW 108 - SW 109 1.66953003 9.501238e-02 1.000000000
## 37 SW 101 - SW 110 0.59017261 5.550749e-01 1.000000000
## 38 SW 102 - SW 110 -0.20900053 8.344478e-01 1.000000000
## 39 SW 103 - SW 110 3.01328393 2.584370e-03 0.286865070
## 40 SW 104 - SW 110 2.74636625 6.025945e-03 0.644776134
## 41 SW 105 - SW 110 2.79318738 5.219145e-03 0.563667661
## 42 SW 106 - SW 110 1.39802730 1.621049e-01 1.000000000
## 43 SW 107 - SW 110 1.32725300 1.844250e-01 1.000000000
## 44 SW 108 - SW 110 0.79334862 4.275747e-01 1.000000000
## 45 SW 109 - SW 110 -0.86459569 3.872607e-01 1.000000000
## 46 SW 101 - SW 111 -0.79235778 4.281521e-01 1.000000000
## 47 SW 102 - SW 111 -1.54782382 1.216647e-01 1.000000000
## 48 SW 103 - SW 111 1.70235705 8.868846e-02 1.000000000
## 49 SW 104 - SW 111 1.45631527 1.453055e-01 1.000000000
## 50 SW 105 - SW 111 1.47627935 1.398689e-01 1.000000000
## 51 SW 106 - SW 111 0.03763401 9.699795e-01 1.000000000
## 52 SW 107 - SW 111 -0.03507967 9.720162e-01 1.000000000
## 53 SW 108 - SW 111 -0.53426880 5.931556e-01 1.000000000
## 54 SW 109 - SW 111 -2.28699012 2.219640e-02 1.000000000
## 55 SW 110 - SW 111 -1.36139704 1.733883e-01 1.000000000
## 56 SW 101 - SW 112 0.14985490 8.808791e-01 1.000000000
## 57 SW 102 - SW 112 -0.64945928 5.160416e-01 1.000000000
## 58 SW 103 - SW 112 2.64060250 8.275875e-03 0.868966911
## 59 SW 104 - SW 112 2.37339776 1.762527e-02 1.000000000
## 60 SW 105 - SW 112 2.41452480 1.575575e-02 1.000000000
## 61 SW 106 - SW 112 0.97984670 3.271618e-01 1.000000000
## 62 SW 107 - SW 112 0.90713302 3.643365e-01 1.000000000
## 63 SW 108 - SW 112 0.37367007 7.086498e-01 1.000000000
## 64 SW 109 - SW 112 -1.34477744 1.786971e-01 1.000000000
## 65 SW 110 - SW 112 -0.44431456 6.568152e-01 1.000000000
## 66 SW 111 - SW 112 0.94221268 3.460838e-01 1.000000000
## 67 SW 101 - SW 113 -1.96378035 4.955557e-02 1.000000000
## 68 SW 102 - SW 113 -2.66473142 7.704983e-03 0.816728226
## 69 SW 103 - SW 113 0.53586681 5.920506e-01 1.000000000
## 70 SW 104 - SW 113 0.31613627 7.518991e-01 1.000000000
## 71 SW 105 - SW 113 0.30978911 7.567213e-01 1.000000000
## 72 SW 106 - SW 113 -1.13378855 2.568833e-01 1.000000000
## 73 SW 107 - SW 113 -1.20650224 2.276239e-01 1.000000000
## 74 SW 108 - SW 113 -1.66307983 9.629646e-02 1.000000000
## 75 SW 109 - SW 113 -3.45841269 5.433684e-04 0.062487361
## 76 SW 110 - SW 113 -2.50157604 1.236419e-02 1.000000000
## 77 SW 111 - SW 113 -1.17142257 2.414290e-01 1.000000000
## 78 SW 112 - SW 113 -2.11363525 3.454643e-02 1.000000000
## 79 SW 101 - SW 114 -0.33478947 7.377839e-01 1.000000000
## 80 SW 102 - SW 114 -1.11154955 2.663319e-01 1.000000000
## 81 SW 103 - SW 114 2.15799875 3.092793e-02 1.000000000
## 82 SW 104 - SW 114 1.90167957 5.721306e-02 1.000000000
## 83 SW 105 - SW 114 1.93192105 5.336925e-02 1.000000000
## 84 SW 106 - SW 114 0.49520233 6.204573e-01 1.000000000
## 85 SW 107 - SW 114 0.42248864 6.726684e-01 1.000000000
## 86 SW 108 - SW 114 -0.09334494 9.256295e-01 1.000000000
## 87 SW 109 - SW 114 -1.82942181 6.733644e-02 1.000000000
## 88 SW 110 - SW 114 -0.91603275 3.596497e-01 1.000000000
## 89 SW 111 - SW 114 0.45756831 6.472626e-01 1.000000000
## 90 SW 112 - SW 114 -0.48464437 6.279286e-01 1.000000000
## 91 SW 113 - SW 114 1.62899088 1.033149e-01 1.000000000
## 92 SW 101 - SW 115 1.21688991 2.236461e-01 1.000000000
## 93 SW 102 - SW 115 0.36791868 7.129339e-01 1.000000000
## 94 SW 103 - SW 115 3.70314471 2.129433e-04 0.025127313
## 95 SW 104 - SW 115 3.41197337 6.449441e-04 0.073523632
## 96 SW 105 - SW 115 3.47706701 5.069311e-04 0.058804008
## 97 SW 106 - SW 115 2.04688170 4.066970e-02 1.000000000
## 98 SW 107 - SW 115 1.97416802 4.836264e-02 1.000000000
## 99 SW 108 - SW 115 1.40189072 1.609479e-01 1.000000000
## 100 SW 109 - SW 115 -0.27774243 7.812101e-01 1.000000000
## 101 SW 110 - SW 115 0.59426105 5.523375e-01 1.000000000
## 102 SW 111 - SW 115 2.00924769 4.451087e-02 1.000000000
## 103 SW 112 - SW 115 1.06703501 2.859560e-01 1.000000000
## 104 SW 113 - SW 115 3.18067026 1.469348e-03 0.166036291
## 105 SW 114 - SW 115 1.55167938 1.207390e-01 1.000000000
## 106 SW 101 - SW 116 -0.32286919 7.467943e-01 1.000000000
## 107 SW 102 - SW 116 -1.10018402 2.712520e-01 1.000000000
## 108 SW 103 - SW 116 2.16986884 3.001678e-02 1.000000000
## 109 SW 104 - SW 116 1.91328191 5.571197e-02 1.000000000
## 110 SW 105 - SW 116 1.94379113 5.192064e-02 1.000000000
## 111 SW 106 - SW 116 0.50712261 6.120688e-01 1.000000000
## 112 SW 107 - SW 116 0.43440892 6.639915e-01 1.000000000
## 113 SW 108 - SW 116 -0.08185827 9.347594e-01 1.000000000
## 114 SW 109 - SW 116 -1.81750153 6.914035e-02 1.000000000
## 115 SW 110 - SW 116 -0.90443040 3.657672e-01 1.000000000
## 116 SW 111 - SW 116 0.46948859 6.387204e-01 1.000000000
## 117 SW 112 - SW 116 -0.47272409 6.364100e-01 1.000000000
## 118 SW 113 - SW 116 1.64091116 1.008159e-01 1.000000000
## 119 SW 114 - SW 116 0.01192028 9.904892e-01 0.990489221
## 120 SW 115 - SW 116 -1.53975910 1.236191e-01 1.000000000
##
## **Compact Letter Displays (CLDs - Holm):**
## Strain Cocoon_Shell_Percentage .group
## SW 101 SW 101 19.90443 a
## SW 102 SW 102 19.53279 a
## SW 103 SW 103 21.18908 a
## SW 104 SW 104 21.71367 a
## SW 105 SW 105 21.13654 a
## SW 106 SW 106 20.38933 a
## SW 107 SW 107 20.20448 a
## SW 108 SW 108 20.12382 a
## SW 109 SW 109 20.20702 b
## SW 110 SW 110 19.70253 a
## SW 111 SW 111 20.22120 a
## SW 112 SW 112 19.83908 a
## SW 113 SW 113 20.87511 a
## SW 114 SW 114 19.92945 a
## SW 115 SW 115 19.42583 b
## SW 116 SW 116 20.07143 a
##
##
## --- Analyzing Hatching_Rate ---
##
## **Performing ANOVA on original data:**
## Analysis of Variance Table
##
## Response: Hatching_Rate
## Df Sum Sq Mean Sq F value Pr(>F)
## Strain 15 59.064 3.9376 1.6219 0.08659 .
## Location 1 8.520 8.5204 3.5096 0.06471 .
## Residuals 79 191.790 2.4277
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Eta-squared (Marginal R-squared):
##
## ANOVA not significant. No post-hoc test needed.
##
##
## --- Analyzing Larval_Weight ---
##
## **Performing ANOVA on original data:**
## Analysis of Variance Table
##
## Response: Larval_Weight
## Df Sum Sq Mean Sq F value Pr(>F)
## Strain 15 155.39 10.36 1.6854 0.07097 .
## Location 1 598.31 598.31 97.3423 2.003e-15 ***
## Residuals 79 485.57 6.15
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Eta-squared (Marginal R-squared):
##
## ANOVA not significant. No post-hoc test needed.
##
##
## --- Analyzing Live_Pupa_Percentage ---
##
## **Attempting log transformation...**
## boundary (singular) fit: see help('isSingular')
##
## Assumption Checks after Transformation:
## Normality (Shapiro-Wilk) p-value: 5.605831e-07
## Homogeneity of Variance (Levene's) p-value: 0.185054
##
## **Performing Kruskal-Wallis test:**
##
## Kruskal-Wallis rank sum test
##
## data: Live_Pupa_Percentage by Strain
## Kruskal-Wallis chi-squared = 8.2778, df = 15, p-value = 0.9122
##
## Epsilon-squared: -0.08402715
##
## Kruskal-Wallis not significant. No post-hoc test needed.
We’ll now explore the relationships among the traits and strains using multivariate techniques.
1. Principal Component Analysis (PCA): PCA helps us understand the main sources of variation in the data and visualize the relationships between individuals (silkworms) and variables (traits).
# Subset the data for multivariate analysis
# Include all traits except "Larval_Weight" since it has missing data for one location
traits_multivariate <- c("Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Larval_Weight",
"Cocoon_Shell_Percentage", "Live_Pupa_Percentage", "Hatching_Rate")
# Filter combined_data to include only the 16 hybrid strains
hybrid_strains <- unique(combined_data$Strain) # Get the unique hybrid strains
data_hybrids <- combined_data %>% filter(Strain %in% hybrid_strains)
# Reshape the data to wide format for location comparison
data_wide <- data_hybrids %>%
pivot_wider(id_cols = Strain,
names_from = Location,
values_from = all_of(traits_multivariate),
names_sep = "_", # Separate location names in column names with "_"
values_fn = mean) # Use the mean value for each trait in each location
# 1. Principal Component Analysis (PCA)
# Perform PCA (scaling the data is generally recommended)
pca_result <- PCA(data_wide[, -1], scale.unit = TRUE, graph = FALSE) # Exclude the 'Strain' column
# Scree plot
fviz_eig(pca_result, addlabels = TRUE, ylim = c(0, 80)) + # Adjust ylim as needed
labs(x = "Principal Component", y = "Percentage of Explained Variance")
# Individuals plot (colored by Strain)
fviz_pca_ind(pca_result, repel = TRUE,
col.ind = data_wide$Strain,
palette = colorRampPalette(brewer.pal(12, "Paired"))(16),# Use a larger color palette
addEllipses = TRUE, ellipse.type = "confidence",
legend.title = "Strain",
title = "PCA - Individuals Plot (Hybrids Only, Colored by Strain)")
# Variables plot
fviz_pca_var(pca_result, repel = TRUE,
col.var = "contrib", gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
title = "PCA - Variables Plot")
# Biplot with both individuals and variables
fviz_pca_biplot(pca_result, repel = TRUE,
col.ind = data_wide$Strain,
palette = colorRampPalette(brewer.pal(12, "Paired"))(16),# Use a larger color palette
addEllipses = TRUE, ellipse.type = "confidence",
legend.title = "Strain",
title = "PCA Biplot (Hybrids Only, Colored by Strain)")
# 2. Hierarchical Clustering
# Extract the principal components
pca_coords <- pca_result$ind$coord
# Perform hierarchical clustering (using first few PCs - adjust as needed)
hc_result <- hclust(dist(pca_coords[, 1:2]), method = "ward.D2")
# Visualize the dendrogram
plot(hc_result, hang = -1, main = "Hierarchical Clustering on PCs",
xlab = "", sub = "")
# 3. K-means Clustering
# Determine optimal number of clusters (Elbow method) using data_hybrids
set.seed(123)
wss <- fviz_nbclust(data_hybrids[, traits_multivariate], kmeans, method = "wss") # Use data_hybrids
# Perform k-means clustering
k <- 3 # Adjust based on the elbow plot or other criteria
kmeans_result <- kmeans(pca_coords, centers = k, nstart = 25)
# Visualize clusters
fviz_cluster(kmeans_result, data = pca_coords, geom = "point", ellipse.type = "convex",
ggtheme = theme_minimal())
# 4. Factor Analysis
# Load necessary library
library(psych)
# Determine the appropriate number of factors (e.g., using scree plot or parallel analysis)
# For this example, let's assume you've decided to extract 2 factors
nfactors <- 4
# Perform Factor Analysis (determine the number of factors first)
fa_model <- fa(data_hybrids[, traits_multivariate], nfactors = 4, rotate = "varimax") # Adjust nfactors based on your analysis
# Print the results
print(fa_model, sort = TRUE)
## Factor Analysis using method = minres
## Call: fa(r = data_hybrids[, traits_multivariate], nfactors = 4, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## item MR1 MR2 MR4 MR3 h2 u2 com
## Single_Cocoon_Weight 1 0.96 0.01 0.25 -0.01 0.98 0.0212 1.1
## Cocoon_Shell_Weight 2 0.82 0.52 0.23 0.03 1.00 0.0049 1.9
## Live_Pupa_Percentage 5 -0.67 -0.40 -0.25 0.01 0.67 0.3258 1.9
## Cocoon_Shell_Percentage 4 0.20 0.95 0.14 0.05 0.97 0.0334 1.1
## Larval_Weight 3 0.40 0.20 0.88 0.18 1.00 0.0050 1.6
## Hatching_Rate 6 -0.01 0.02 0.08 0.65 0.43 0.5687 1.0
##
## MR1 MR2 MR4 MR3
## SS loadings 2.24 1.37 0.97 0.46
## Proportion Var 0.37 0.23 0.16 0.08
## Cumulative Var 0.37 0.60 0.76 0.84
## Proportion Explained 0.44 0.27 0.19 0.09
## Cumulative Proportion 0.44 0.72 0.91 1.00
##
## Mean item complexity = 1.5
## Test of the hypothesis that 4 factors are sufficient.
##
## df null model = 15 with the objective function = 5.41 with Chi Square = 498.7
## df of the model are -3 and the objective function was 0
##
## The root mean square of the residuals (RMSR) is 0
## The df corrected root mean square of the residuals is NA
##
## The harmonic n.obs is 96 with the empirical chi square 0 with prob < NA
## The total n.obs was 96 with Likelihood Chi Square = 0 with prob < NA
##
## Tucker Lewis Index of factoring reliability = 1.032
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR2 MR4 MR3
## Correlation of (regression) scores with factors 0.99 0.99 0.98 0.66
## Multiple R square of scores with factors 0.98 0.97 0.96 0.44
## Minimum correlation of possible factor scores 0.97 0.94 0.93 -0.13
# Visualize factor loadings
fa.diagram(fa_model, simple = FALSE)
# 1. Multidimensional Scaling (MDS)
# Calculate a distance matrix (using Euclidean distance as an example)
dist_matrix <- dist(data_hybrids[, traits_multivariate], method = "euclidean")
# Perform classical MDS
mds_result <- cmdscale(dist_matrix, k = 2) # k = number of dimensions
# Convert to dataframe for plotting
mds_df <- as.data.frame(mds_result)
colnames(mds_df) <- c("MDS1", "MDS2")
mds_df$Strain <- data_hybrids$Strain
mds_df$Location <- data_hybrids$Location
# Visualize MDS results
ggplot(mds_df, aes(x = MDS1, y = MDS2, color = Strain, shape = Location)) +
geom_point(size = 3) +
labs(title = "MDS Plot of Silkworm Strains") +
theme_bw()
# 2. Principal Coordinate Analysis (PCoA)
# Perform PCoA (using the same distance matrix)
pcoa_result <- cmdscale(dist_matrix, k = 2, eig = TRUE) # eig = TRUE to get eigenvalues
# Convert to dataframe for plotting
pcoa_df <- as.data.frame(pcoa_result$points)
colnames(pcoa_df) <- c("PCoA1", "PCoA2")
pcoa_df$Strain <- data_hybrids$Strain
pcoa_df$Location <- data_hybrids$Location
# Visualize PCoA results
ggplot(pcoa_df, aes(x = PCoA1, y = PCoA2, color = Strain, shape = Location)) +
geom_point(size = 3) +
labs(title = "PCoA Plot of Silkworm Strains") +
theme_bw()
# 3. Non-metric Multidimensional Scaling (NMDS)
# Perform NMDS
nmds_result <- metaMDS(data_hybrids[, traits_multivariate], k = 2) # k = number of dimensions
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.0309942
## Run 1 stress 0.03620601
## Run 2 stress 0.03569862
## Run 3 stress 0.0418888
## Run 4 stress 0.041855
## Run 5 stress 0.04107385
## Run 6 stress 0.03712024
## Run 7 stress 0.04048451
## Run 8 stress 0.03620584
## Run 9 stress 0.03724261
## Run 10 stress 0.03304121
## Run 11 stress 0.03960414
## Run 12 stress 0.03858935
## Run 13 stress 0.04412229
## Run 14 stress 0.04147472
## Run 15 stress 0.04014125
## Run 16 stress 0.03299432
## Run 17 stress 0.03099445
## ... Procrustes: rmse 4.890779e-05 max resid 0.0004212855
## ... Similar to previous best
## Run 18 stress 0.03620604
## Run 19 stress 0.03839966
## Run 20 stress 0.04480203
## *** Best solution repeated 1 times
# Visualize NMDS results (using 'stressplot' to assess goodness-of-fit)
stressplot(nmds_result)
# Plot NMDS results with strain and location information
ordiplot(nmds_result, type = "n") # Create an empty plot
points(nmds_result, col = as.numeric(data_hybrids$Strain), pch = as.numeric(data_hybrids$Location))
ordihull(nmds_result, groups = data_hybrids$Strain, draw = "polygon", col = "grey90") # Add convex hulls for strains
# 6. Discriminant Analysis
# Load necessary library
library(MASS)
# Perform Linear Discriminant Analysis
lda_model <- lda(Strain ~ ., data = data_hybrids[, c(traits_multivariate, "Strain")]) # Use data_hybrids
# Print the results
print(lda_model)
## Call:
## lda(Strain ~ ., data = data_hybrids[, c(traits_multivariate,
## "Strain")])
##
## Prior probabilities of groups:
## SW 101 SW 102 SW 103 SW 104 SW 105 SW 106 SW 107 SW 108 SW 109 SW 110 SW 111
## 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625 0.0625
## SW 112 SW 113 SW 114 SW 115 SW 116
## 0.0625 0.0625 0.0625 0.0625 0.0625
##
## Group means:
## Single_Cocoon_Weight Cocoon_Shell_Weight Larval_Weight
## SW 101 1.483800 0.2937500 29.43856
## SW 102 1.433683 0.2817167 28.82072
## SW 103 1.419659 0.2983770 28.98178
## SW 104 1.328917 0.2838542 29.02983
## SW 105 1.364420 0.2863379 28.13722
## SW 106 1.431433 0.2898333 31.15083
## SW 107 1.501850 0.3016167 28.59839
## SW 108 1.479129 0.2995985 29.56856
## SW 109 1.496533 0.2868167 30.23294
## SW 110 1.487437 0.2914792 29.30011
## SW 111 1.532417 0.3048333 32.25856
## SW 112 1.532417 0.3018033 32.28139
## SW 113 1.459350 0.3026000 31.27800
## SW 114 1.473400 0.2910000 31.38083
## SW 115 1.420700 0.2727167 30.67006
## SW 116 1.402300 0.2795000 30.81650
## Cocoon_Shell_Percentage Live_Pupa_Percentage Hatching_Rate
## SW 101 19.90443 67.90551 97.70825
## SW 102 19.69722 60.93521 97.07138
## SW 103 21.20281 65.48781 97.05508
## SW 104 21.59885 68.26253 96.97524
## SW 105 21.15305 64.27633 96.52421
## SW 106 20.38933 56.39149 97.05680
## SW 107 20.20448 58.76280 95.38672
## SW 108 20.32851 47.36697 95.98854
## SW 109 20.20702 55.66281 94.85372
## SW 110 19.77045 52.02084 96.51966
## SW 111 20.22120 61.29672 96.17741
## SW 112 19.83908 63.69991 95.84273
## SW 113 20.87511 71.05085 96.53398
## SW 114 19.92945 75.21564 95.70890
## SW 115 19.42583 67.40499 97.23525
## SW 116 20.07143 69.04902 97.64132
##
## Coefficients of linear discriminants:
## LD1 LD2 LD3 LD4
## Single_Cocoon_Weight -45.662383413 28.75622882 -6.421795663 1.39797864
## Cocoon_Shell_Weight 193.042705416 -197.11676454 -4.769458117 -37.14424759
## Larval_Weight -0.050401573 -0.09756293 -0.004739677 0.33472202
## Cocoon_Shell_Percentage -1.867591352 2.80706699 -0.992320812 0.34293096
## Live_Pupa_Percentage -0.001484765 -0.03305863 -0.047030577 0.01490563
## Hatching_Rate 0.363148961 -0.01672590 0.300879586 0.06963535
## LD5 LD6
## Single_Cocoon_Weight -20.21647918 2.05199921
## Cocoon_Shell_Weight 65.87357128 -5.34658662
## Larval_Weight 0.15211049 0.07033239
## Cocoon_Shell_Percentage -1.10394258 0.24364051
## Live_Pupa_Percentage -0.02089249 -0.01680327
## Hatching_Rate -0.47004763 0.12758767
##
## Proportion of trace:
## LD1 LD2 LD3 LD4 LD5 LD6
## 0.4994 0.2530 0.1413 0.0746 0.0259 0.0057
# Predict group membership for new data (optional)
# new_data <- data.frame( ... ) # Replace ... with your new data
# predictions <- predict(lda_model, newdata = new_data)
# Visualize LDA results
# Scatterplot of discriminant scores
lda_scores <- predict(lda_model)$x
# Create a data frame with the discriminant scores, Strain, and Location information
lda_df <- data.frame(lda_scores, Strain = data_hybrids$Strain, Location = data_hybrids$Location)
# Plot the discriminant scores, with a custom color palette for better distinction
ggplot(lda_df, aes(x = LD1, y = LD2, color = Strain, shape = Location)) +
geom_point(size = 3) + # Increase the 'size' argument
labs(title = "Scatterplot of Discriminant Scores", x = "LD1", y = "LD2") +
theme_bw() +
scale_color_manual(values = rainbow(nlevels(data_hybrids$Strain))) + # Use a rainbow color palette
guides(color = guide_legend(ncol = 1)) # Display Strain legend in 1 column
Technical Discussion Points and Considerations:
Scree Plot: 1. Examine the proportion of variance explained by each principal component (PC). 2. Look for the “elbow” point where the explained variance starts to level off, indicating the number of PCs to retain for further analysis.
Individuals Plot: 1. Observe how the individual silkworms are distributed in the PCA space based on their trait values. 2. Look for clusters or patterns that might indicate similarities or differences between strains. 3. Consider the implications of the positioning of individuals relative to the PC axes and the ellipses representing strain confidence intervals.
Variables Plot: 1. Interpret the contribution of each trait to the principal components. 2. Arrows pointing in the same direction suggest positive correlations between traits, while arrows pointing in opposite directions suggest negative correlations. 3. The length of the arrow represents the strength of the contribution.
2. Hierarchical Clustering on Principal Components: We’ll perform hierarchical clustering on the first few principal components to group similar individuals together.
Technical Discussion Points and Considerations:
3. K-means Clustering: K-means clustering will partition the individuals into a predefined number of clusters based on their principal component scores.
Technical Discussion Points and Considerations:
Elbow Plot: 1. Use the elbow plot to help determine the optimal number of clusters (k) for k-means clustering. 2. Look for the point where the within-cluster sum of squares (WSS) starts to level off, indicating diminishing returns from adding more clusters.
Cluster Plot: 1. Observe how the individuals are partitioned into clusters in the PCA space. 2. Interpret the meaning of each cluster in terms of the phenotypic traits and strains. 3. Consider the separation between clusters and the tightness within clusters.
We will explore the relationships between the phenotypic traits using both Pearson and Spearman correlation coefficients.
# 1. Correlation Matrices and Correlograms
# Calculate Pearson and Spearman correlation matrices
cor_pearson <- cor(combined_data[, all_traits], method = "pearson") # Use all_traits
cor_spearman <- cor(combined_data[, all_traits], method = "spearman") # Use all_traits
# Visualize correlation matrices using heatmaps
heatmap(cor_pearson, Rowv = NA, Colv = NA,
col = colorRampPalette(c("blue", "white", "red"))(20),
scale = "none", margins = c(5, 10), cexRow = 0.8, cexCol = 0.8)
heatmap(cor_spearman, Rowv = NA, Colv = NA,
col = colorRampPalette(c("blue", "white", "red"))(20),
scale = "none", margins = c(5, 10), cexRow = 0.8, cexCol = 0.8)
# Visualize correlograms
corrplot(cor_pearson, method = "color", type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45, sig.level = 0.05,
title = "Pearson Correlation", mar = c(0, 0, 2, 0))
corrplot(cor_spearman, method = "color", type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45, sig.level = 0.05,
title = "Spearman Correlation", mar = c(0, 0, 2, 0))
# 2. Individual Correlation Tests and Scatterplots
# Function to perform correlation test and plot
plot_correlation <- function(trait1, trait2, data, method = "pearson") {
cor_test <- cor.test(data[[trait1]], data[[trait2]], method = method)
p_value <- cor_test$p.value
cor_coef <- cor_test$estimate
ggplot(data, aes_string(x = trait1, y = trait2, color = "Location")) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = paste("Correlation between", trait1, "and", trait2),
subtitle = paste(method, "correlation:", round(cor_coef, 3), ", p-value:", round(p_value, 5))) +
facet_wrap(~ Location) # Facet by Location
}
# Example usage (replace with your actual trait names)
plot_correlation("Single_Cocoon_Weight", "Cocoon_Shell_Weight", combined_data, method = "pearson")
## `geom_smooth()` using formula = 'y ~ x'
plot_correlation("Single_Cocoon_Weight", "Cocoon_Shell_Weight", combined_data, method = "spearman")
## `geom_smooth()` using formula = 'y ~ x'
# 3. Scatterplot Matrix
# Scatterplot matrix with Pearson correlations
ggpairs(combined_data[, c(all_traits, "Location")], # Include 'Location' in the subset
aes(color = Location, alpha = 0.5),
upper = list(continuous = wrap("cor", method = "pearson", size = 3))) +
theme(axis.text = element_text(size = 8)) +
theme(plot.margin = margin(5, 10, 5, 5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Scatterplot matrix with Spearman correlations
ggpairs(combined_data[, c(all_traits, "Location")], # Include 'Location' in the subset
aes(color = Location, alpha = 0.5),
upper = list(continuous = wrap("cor", method = "spearman", size = 3))) +
theme(axis.text = element_text(size = 8)) +
theme(plot.margin = margin(5, 10, 5, 5))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# 4. Partial Correlation Analysis
# Get the list of traits for partial correlation analysis
traits_partial <- c("Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Larval_Weight",
"Cocoon_Shell_Percentage", "Hatching_Rate", "Live_Pupa_Percentage")
# Generate all combinations of 3 traits
trait_combinations <- combn(traits_partial, 3, simplify = FALSE)
# Loop through each location
for (loc in levels(combined_data$Location)) { # Use combined_data$Location
cat("\n\n--- Analyzing Partial Correlations for", loc, "---\n")
# Subset data by location first
data_loc <- combined_data %>% filter(Location == loc)
# Loop through each combination of traits
for (combination in trait_combinations) {
trait1 <- combination[1]
trait2 <- combination[2]
trait3 <- combination[3] # Controlling variable
# Check for missing values in the selected traits within the location subset
if (any(is.na(data_loc[[trait1]])) || any(is.na(data_loc[[trait2]])) || any(is.na(data_loc[[trait3]]))) {
cat("\n\nMissing values found for", trait1, ",", trait2, ", or", trait3, "in", loc, ". Imputing missing values.\n")
# Impute missing values using the mean for each trait within the location
data_loc <- data_loc %>%
mutate(across(c(trait1, trait2, trait3), ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
}
# Calculate partial correlation using 'psych' package
partial_cor <- partial.r(data_loc[, c(trait1, trait2, trait3)]) # Calculate partial correlation matrix
# Extract the relevant partial correlation coefficient
partial_cor_coef <- partial_cor[trait1, trait2]
# Perform correlation test to get p-value
cor_test_result <- cor.test(data_loc[[trait1]], data_loc[[trait2]], method = "pearson") # Use appropriate method
p_value <- cor_test_result$p.value
# Print the results
cat("\n\nPartial Correlation between", trait1, "and", trait2,
"controlling for", trait3, "in", loc, ":\n")
cat(" Correlation Coefficient:", partial_cor_coef, "\n")
cat(" P-value:", p_value, "\n") # Print the p-value
}
}
##
##
## --- Analyzing Partial Correlations for BALUBAL ---
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Larval_Weight in BALUBAL :
## Correlation Coefficient: 0.7023777
## P-value: 3.15263e-08
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Cocoon_Shell_Percentage in BALUBAL :
## Correlation Coefficient: 0.8801142
## P-value: 3.15263e-08
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Hatching_Rate in BALUBAL :
## Correlation Coefficient: 0.6820058
## P-value: 3.15263e-08
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: 0.6859562
## P-value: 3.15263e-08
##
##
## Partial Correlation between Single_Cocoon_Weight and Larval_Weight controlling for Cocoon_Shell_Percentage in BALUBAL :
## Correlation Coefficient: -0.03890379
## P-value: 0.8983348
##
##
## Partial Correlation between Single_Cocoon_Weight and Larval_Weight controlling for Hatching_Rate in BALUBAL :
## Correlation Coefficient: 0.02922613
## P-value: 0.8983348
##
##
## Partial Correlation between Single_Cocoon_Weight and Larval_Weight controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: 0.03335183
## P-value: 0.8983348
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Percentage controlling for Hatching_Rate in BALUBAL :
## Correlation Coefficient: -0.5263022
## P-value: 0.000309444
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Percentage controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: -0.5077487
## P-value: 0.000309444
##
##
## Partial Correlation between Single_Cocoon_Weight and Hatching_Rate controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: -0.1604772
## P-value: 0.1388483
##
##
## Partial Correlation between Cocoon_Shell_Weight and Larval_Weight controlling for Cocoon_Shell_Percentage in BALUBAL :
## Correlation Coefficient: -0.114652
## P-value: 0.4273099
##
##
## Partial Correlation between Cocoon_Shell_Weight and Larval_Weight controlling for Hatching_Rate in BALUBAL :
## Correlation Coefficient: -0.06250522
## P-value: 0.4273099
##
##
## Partial Correlation between Cocoon_Shell_Weight and Larval_Weight controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: -0.05151845
## P-value: 0.4273099
##
##
## Partial Correlation between Cocoon_Shell_Weight and Cocoon_Shell_Percentage controlling for Hatching_Rate in BALUBAL :
## Correlation Coefficient: 0.1029392
## P-value: 0.4305926
##
##
## Partial Correlation between Cocoon_Shell_Weight and Cocoon_Shell_Percentage controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: 0.1207397
## P-value: 0.4305926
##
##
## Partial Correlation between Cocoon_Shell_Weight and Hatching_Rate controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: -0.1944656
## P-value: 0.06232077
##
##
## Partial Correlation between Larval_Weight and Cocoon_Shell_Percentage controlling for Hatching_Rate in BALUBAL :
## Correlation Coefficient: -0.01603855
## P-value: 0.8416214
##
##
## Partial Correlation between Larval_Weight and Cocoon_Shell_Percentage controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: -0.03087665
## P-value: 0.8416214
##
##
## Partial Correlation between Larval_Weight and Hatching_Rate controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: 0.1303172
## P-value: 0.1406125
##
##
## Partial Correlation between Cocoon_Shell_Percentage and Hatching_Rate controlling for Live_Pupa_Percentage in BALUBAL :
## Correlation Coefficient: -0.07004473
## P-value: 0.6616764
##
##
## --- Analyzing Partial Correlations for TCMO ---
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Larval_Weight in TCMO :
## Correlation Coefficient: 0.5127208
## P-value: 1.959491e-05
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Cocoon_Shell_Percentage in TCMO :
## Correlation Coefficient: 0.9967735
## P-value: 1.959491e-05
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Hatching_Rate in TCMO :
## Correlation Coefficient: 0.5626946
## P-value: 1.959491e-05
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: 0.540071
## P-value: 1.959491e-05
##
##
## Partial Correlation between Single_Cocoon_Weight and Larval_Weight controlling for Cocoon_Shell_Percentage in TCMO :
## Correlation Coefficient: 0.4529217
## P-value: 0.001023493
##
##
## Partial Correlation between Single_Cocoon_Weight and Larval_Weight controlling for Hatching_Rate in TCMO :
## Correlation Coefficient: 0.455567
## P-value: 0.001023493
##
##
## Partial Correlation between Single_Cocoon_Weight and Larval_Weight controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: 0.4546899
## P-value: 0.001023493
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Percentage controlling for Hatching_Rate in TCMO :
## Correlation Coefficient: -0.3907053
## P-value: 0.008486627
##
##
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Percentage controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: -0.3746729
## P-value: 0.008486627
##
##
## Partial Correlation between Single_Cocoon_Weight and Hatching_Rate controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: -0.05502732
## P-value: 0.3154037
##
##
## Partial Correlation between Cocoon_Shell_Weight and Larval_Weight controlling for Cocoon_Shell_Percentage in TCMO :
## Correlation Coefficient: 0.4396895
## P-value: 0.03382506
##
##
## Partial Correlation between Cocoon_Shell_Weight and Larval_Weight controlling for Hatching_Rate in TCMO :
## Correlation Coefficient: 0.3005679
## P-value: 0.03382506
##
##
## Partial Correlation between Cocoon_Shell_Weight and Larval_Weight controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: 0.2932895
## P-value: 0.03382506
##
##
## Partial Correlation between Cocoon_Shell_Weight and Cocoon_Shell_Percentage controlling for Hatching_Rate in TCMO :
## Correlation Coefficient: 0.5384771
## P-value: 7.42543e-05
##
##
## Partial Correlation between Cocoon_Shell_Weight and Cocoon_Shell_Percentage controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: 0.5752916
## P-value: 7.42543e-05
##
##
## Partial Correlation between Cocoon_Shell_Weight and Hatching_Rate controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: -0.1275126
## P-value: 0.1982872
##
##
## Partial Correlation between Larval_Weight and Cocoon_Shell_Percentage controlling for Hatching_Rate in TCMO :
## Correlation Coefficient: -0.1173128
## P-value: 0.4466857
##
##
## Partial Correlation between Larval_Weight and Cocoon_Shell_Percentage controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: -0.1062923
## P-value: 0.4466857
##
##
## Partial Correlation between Larval_Weight and Hatching_Rate controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: -0.03882875
## P-value: 0.656431
##
##
## Partial Correlation between Cocoon_Shell_Percentage and Hatching_Rate controlling for Live_Pupa_Percentage in TCMO :
## Correlation Coefficient: -0.09105911
## P-value: 0.6554186
# 5. Correlation with Location
# Calculate ANOVA for each trait with Location as a factor and store the results
anova_results <- list()
for (trait in all_traits) { # Loop through all traits
anova_result <- aov(as.formula(paste(trait, "~ Location")), data = combined_data)
anova_results[[trait]] <- anova_result # Store the entire ANOVA table
print(summary(anova_result))
}
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 1 0.5895 0.5895 103.6 <2e-16 ***
## Residuals 94 0.5351 0.0057
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 1 0.0610 0.06100 271.7 <2e-16 ***
## Residuals 94 0.0211 0.00022
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 1 51.73 51.73 61.39 7.07e-12 ***
## Residuals 94 79.20 0.84
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 1 8.52 8.520 3.193 0.0772 .
## Residuals 94 250.85 2.669
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 1 598.3 598.3 87.75 4.07e-15 ***
## Residuals 94 641.0 6.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Df Sum Sq Mean Sq F value Pr(>F)
## Location 1 66639 66639 343.8 <2e-16 ***
## Residuals 94 18220 194
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Create a list to store the boxplots
boxplots <- list()
# Generate boxplots for each trait with ANOVA results
for (trait in all_traits) { # Loop through all traits
# Extract p-value from ANOVA result
p_value <- summary(anova_results[[trait]])[[1]][["Pr(>F)"]][1]
# Format p-value for display
p_value_label <- if (p_value < 0.001) {
"< 0.001" # Display as "< 0.001" if very small
} else {
paste("=", round(p_value, 3)) # Otherwise, round to 3 decimal places
}
# Create boxplot with p-value annotation
boxplots[[trait]] <- ggplot(combined_data, aes(x = Location, y = .data[[trait]])) +
geom_boxplot() +
labs(title = paste("Boxplot of", trait, "by Location"),
x = "Location", y = trait) +
theme_bw() +
annotate("text", x = 1.5, y = max(combined_data[[trait]]),
label = paste("ANOVA p-value:", p_value_label)) # Use the formatted p-value label
}
# Arrange all boxplots in a single figure
ggarrange(plotlist = boxplots, ncol = 2, nrow = 3) # Adjust ncol and nrow as needed
## Advanced Correlation Analysis (Combined Locations)
# 1. Distance Correlation
library(energy) # Load the 'energy' package
# Calculate distance correlation between all pairs of traits
dist_cor_matrix <- matrix(nrow = length(all_traits), ncol = length(all_traits))
rownames(dist_cor_matrix) <- all_traits
colnames(dist_cor_matrix) <- all_traits
for (i in 1:length(all_traits)) {
for (j in i:length(all_traits)) {
trait1 <- all_traits[i]
trait2 <- all_traits[j]
dist_cor <- dcor(combined_data[[trait1]], combined_data[[trait2]])
dist_cor_matrix[i, j] <- dist_cor_matrix[j, i] <- dist_cor
}
}
# Print the distance correlation matrix
print(dist_cor_matrix)
## Single_Cocoon_Weight Cocoon_Shell_Weight
## Single_Cocoon_Weight 1.0000000 0.8441943
## Cocoon_Shell_Weight 0.8441943 1.0000000
## Cocoon_Shell_Percentage 0.4568003 0.7403779
## Hatching_Rate 0.1329776 0.1771922
## Larval_Weight 0.5845634 0.6584816
## Live_Pupa_Percentage 0.7107905 0.8266545
## Cocoon_Shell_Percentage Hatching_Rate Larval_Weight
## Single_Cocoon_Weight 0.4568003 0.1329776 0.5845634
## Cocoon_Shell_Weight 0.7403779 0.1771922 0.6584816
## Cocoon_Shell_Percentage 1.0000000 0.1509995 0.4720629
## Hatching_Rate 0.1509995 1.0000000 0.2280294
## Larval_Weight 0.4720629 0.2280294 1.0000000
## Live_Pupa_Percentage 0.6059396 0.1627910 0.6082860
## Live_Pupa_Percentage
## Single_Cocoon_Weight 0.7107905
## Cocoon_Shell_Weight 0.8266545
## Cocoon_Shell_Percentage 0.6059396
## Hatching_Rate 0.1627910
## Larval_Weight 0.6082860
## Live_Pupa_Percentage 1.0000000
library(heatmaply)
## Loading required package: plotly
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:MASS':
##
## select
##
## The following object is masked from 'package:Hmisc':
##
## subplot
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
##
## Loading required package: viridis
## Loading required package: viridisLite
## Registered S3 method overwritten by 'dendextend':
## method from
## rev.hclust vegan
## Registered S3 method overwritten by 'seriation':
## method from
## reorder.hclust vegan
##
## ======================
## Welcome to heatmaply version 1.5.0
##
## Type citation('heatmaply') for how to cite the package.
## Type ?heatmaply for the main documentation.
##
## The github page is: https://github.com/talgalili/heatmaply/
## Please submit your suggestions and bug-reports at: https://github.com/talgalili/heatmaply/issues
## You may ask questions at stackoverflow, use the r and heatmaply tags:
## https://stackoverflow.com/questions/tagged/heatmaply
## ======================
heatmaply(dist_cor_matrix,
col = colorRampPalette(c("white", "blue"))(20),
main = "Distance Correlation Matrix")
# 2. Canonical Correlation Analysis (CCA)
# Define the two sets of variables (adjust based on your traits)
set1 <- combined_data[, c("Single_Cocoon_Weight", "Cocoon_Shell_Weight")]
set2 <- combined_data[, c("Cocoon_Shell_Percentage", "Live_Pupa_Percentage")]
# Perform CCA
cca_result <- cancor(set1, set2)
# Print and visualize the results
print(cca_result)
## $cor
## [1] 0.9580569 0.7017367
##
## $xcoef
## [,1] [,2]
## Single_Cocoon_Weight -1.106622 -1.372460
## Cocoon_Shell_Weight 6.177256 2.101308
##
## $ycoef
## [,1] [,2]
## Cocoon_Shell_Percentage 0.0803646770 0.067002971
## Live_Pupa_Percentage -0.0004624349 0.004083874
##
## $xcenter
## Single_Cocoon_Weight Cocoon_Shell_Weight
## 1.4529653 0.2916146
##
## $ycenter
## Cocoon_Shell_Percentage Live_Pupa_Percentage
## 20.30114 62.79934
# Display the canonical correlations
cat("\nCanonical Correlations:\n")
##
## Canonical Correlations:
print(cca_result$cor)
## [1] 0.9580569 0.7017367
# Calculate the canonical variates
cv1 <- as.matrix(set1) %*% cca_result$xcoef[, 1] # First canonical variate for set1
cv2 <- as.matrix(set2) %*% cca_result$ycoef[, 1] # First canonical variate for set2
# Visualize the canonical variates
plot(cv1, cv2, # Plot the canonical variates
xlab = "Canonical Variate 1 (Set 1)",
ylab = "Canonical Variate 2 (Set 2)",
main = "Canonical Correlation Analysis")
# 3. Copula-Based Correlation
# Load necessary library
library(copula)
##
## Attaching package: 'copula'
##
## The following object is masked from 'package:lubridate':
##
## interval
# Select two traits for copula analysis (replace with your actual trait names)
trait1 <- "Single_Cocoon_Weight"
trait2 <- "Cocoon_Shell_Weight"
# Create a copula object (using Gaussian copula as an example)
copula_data <- cbind(combined_data[[trait1]], combined_data[[trait2]])
copula_model <- normalCopula(dim = 2)
# Transform the data to pseudo-observations using pobs()
copula_data <- pobs(cbind(combined_data[[trait1]], combined_data[[trait2]]))
# Fit the copula model to the pseudo-observations
fit_result <- fitCopula(copula_model, data = copula_data, method = "ml") # Use maximum likelihood estimation
# Extract the estimated correlation parameter
copula_cor <- coef(fit_result)[1]
# Print the results
cat("\nCopula-Based Correlation between", trait1, "and", trait2, ":\n")
##
## Copula-Based Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight :
cat(" Correlation Coefficient:", copula_cor, "\n")
## Correlation Coefficient: 0.852632
# You can further analyze and visualize the copula model using functions like 'contourplot' or 'persp' from the 'copula' package
1. Correlation Matrices and Correlograms:
Technical Discussion Points and Considerations:
Correlation Matrices (Pearson and Spearman): 1. Correlation Coefficients: Examine the strength and direction of the correlations between traits. 2. P-values: Assess the statistical significance of the correlations. 3. Comparison: Compare the Pearson and Spearman correlations to see if there are any notable differences, which might indicate non-linear relationships between traits.
Correlograms: 1. Visual Patterns: Observe the overall patterns in the correlograms. 2. Clusters: Identify clusters of traits that are highly correlated with each other. 3. Potential Multicollinearity: Be mindful of strong correlations between multiple traits, as this might indicate multicollinearity, which can affect the interpretation of regression or other multivariate analyses.
2. Individual Correlation Tests and Scatterplots: We will visualize and test the correlations between specific pairs of traits.
Technical Discussion Points and Considerations:
3. Scatterplot Matrix: Visualize all pairwise relationships between traits using a scatterplot matrix.
Technical Discussion Points and Considerations:
4. Heatmap of Trait Values by Strain:
This comprehensive analysis has provided insights into the distribution, relationships, and differences in silkworm phenotypic traits across various strains. We’ve addressed issues of non-normality and heteroscedasticity using transformations and non-parametric tests where appropriate. Multivariate techniques like PCA and clustering helped uncover underlying patterns in the data, while correlation analysis revealed the interdependencies between traits. These findings can guide further research and breeding efforts in silkworms.
Remember:
Interpret the results in the context of your specific research questions and biological knowledge. Consider exploring additional analyses or visualizations to gain further insights into your data. Clearly document your findings and conclusions, highlighting the key takeaways and their implications.
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.