Multivariate Analysis of Silkworm Hybrid Phenotypes Across Two Locations in the Philippines

INTRODUCTION

Unraveling the Phenotypic Tapestry of Silkworm Hybrids: A Comprehensive Multi-Trait, Multi-Location Analysis: This study embarks on a rigorous exploration of phenotypic diversity within a population of silkworm (Bombyx mori) hybrids, reared across two distinct locations in the Philippines. By employing a robust statistical and multivariate framework, we aim to transcend mere description and delve into the intricate interplay of genetic and environmental factors that orchestrate the expression of key traits critical for both fundamental research and applied breeding programs.

Delving into the Data: A Multifaceted Approach

Our investigation is structured around four key analytical pillars:

1. Data Exploration & Assumption Validation: We begin by meticulously examining the distributional properties of each phenotypic trait across the hybrid strains. This involves constructing histograms and density plots to visualize the shape of the distributions, assess central tendency and variability, and identify potential outliers that might warrant further investigation or specialized statistical treatment. Recognizing that many statistical tests rely on specific assumptions about the data, we rigorously assess the normality of the trait distributions using the Shapiro-Wilk test and evaluate the homogeneity of variance across strains using Levene’s test. These assessments guide our choice of subsequent statistical tests and ensure the validity of our inferences.

2. ANOVA & Non-Parametric Testing: Discerning Strain Differences: To discern statistically significant differences in trait means among the silkworm hybrids, we employ Analysis of Variance (ANOVA), a powerful parametric test that compares means across multiple groups. However, the validity of ANOVA hinges on the fulfillment of the normality and homogeneity of variance assumptions. In cases where these assumptions are violated, we strategically employ log transformations to potentially rectify deviations from normality or homoscedasticity. If transformations fail to remedy the violations, we adopt the non-parametric Kruskal-Wallis test, which compares medians across groups based on ranks, making it suitable for data that doesn’t conform to the assumptions of ANOVA. To pinpoint specific strain pairs exhibiting significant differences, we employ post-hoc tests, namely Tukey’s Honestly Significant Difference (HSD) test for ANOVA and Dunn’s test for Kruskal-Wallis. These tests allow for pairwise comparisons between strains while controlling for the increased risk of Type I errors (false positives) associated with multiple testing.

3. Multivariate Analysis: Unveiling Hidden Patterns: To move beyond univariate analysis and explore the interrelationships among multiple traits, we harness the power of multivariate techniques. Principal Component Analysis (PCA) is employed to unveil the underlying structure of phenotypic variation, effectively reducing the dimensionality of the data and identifying key patterns. By scrutinizing scree plots and biplots, we ascertain the contribution of each trait to the principal components and visualize the relationships between individuals (silkworms) and variables (traits). Furthermore, we apply hierarchical clustering and k-means clustering on the principal component scores to classify silkworms into distinct groups based on their phenotypic similarities. This clustering analysis can reveal natural groupings of strains with similar trait profiles, potentially suggesting shared genetic backgrounds or responses to environmental factors.

4. Correlation Analysis: Deciphering Trait Interdependencies: To quantify and visualize the relationships between traits, we compute and visualize both Pearson and Spearman correlation matrices. The Pearson correlation coefficient captures linear relationships between traits, while the Spearman correlation coefficient assesses monotonic relationships, accommodating both linear and non-linear associations. Scrutinizing correlograms allows us to discern clusters of highly correlated traits and identify potential multicollinearity, which can inform subsequent analyses. To rigorously assess the strength and significance of associations between specific trait pairs, we conduct individual correlation tests (Pearson and Spearman). Furthermore, we generate scatterplot matrices to visualize all pairwise relationships between traits, aiding in the identification of patterns, outliers, and potential non-linear associations that might not be captured by simple correlation coefficients. Finally, we explore trait correlations within the context of strain variation using mixed-effects models, enabling us to discern whether correlations are consistent across strains or if they vary depending on the genetic background. This can provide insights into the genetic and environmental factors that influence trait co-variation.

Bridging the Gap: Genotype to Phenotype: By integrating these diverse analytical approaches, this study strives to transcend mere description and provide a comprehensive portrait of the phenotypic landscape within a population of silkworm hybrids. The insights gleaned from this investigation will not only deepen our understanding of the genetic architecture underlying silkworm traits but also furnish invaluable information for targeted breeding efforts to enhance silk production and other economically important characteristics. Moreover, this study contributes to the broader field of evolutionary biology by shedding light on the interplay between genetic variation and phenotypic diversity in a model organism with profound implications for both scientific inquiry and industrial applications.

Data Loading and Pre-processing

# Read the data 
data <- read_excel("Hybrid.xls")

# Pre-process the 'Strain' column
data$Strain <- as.factor(data$Strain)

# Convert 'Location' to a factor
data$Location <- as.factor(data$Location)

# List of traits to analyze
traits <- c("Larval_Weight", "Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Cocoon_Shell_Percentage")

Data Exploration and Assumption Checking

We begin by exploring the distributions of each trait and checking the assumptions of normality and homogeneity of variance, crucial for subsequent analyses.

Data Exploration and Assumption Checking for Larval_Weight (TCMO only)

# Function to visualize model diagnostics 
check_model <- function(model) {
  
   # Add Residuals vs Fitted plot
  plot(model, which=1)  # Generate Residuals vs Fitted plot
  
  # Create a layout with 2 rows and 2 columns
  par(mfrow = c(2, 2)) 

   # Add Residuals vs Fitted plot
  plot(model, which=1)  # Generate Residuals vs Fitted plot
  
  # Reduce margins (adjust values as needed)
  par(mar = c(3, 3, 1, 1) + 0.1)  # bottom, left, top, right

  # Plot model diagnostics
  plot(model, which = 1, sub.caption = "Residuals vs Fitted")
  plot(model, which = 2, sub.caption = "Normal Q-Q")
  plot(model, which = 3, sub.caption = "Scale-Location")
  plot(model, which = 5, sub.caption = "Residuals vs Leverage")

  # Reset the plot layout
  par(mfrow = c(1, 1)) 
}

# Subset data for TCMO location
data_tcmo <- data %>% filter(Location == "TCMO")

trait_LW <- "Larval_Weight" 

# Fit a temporary linear mixed-effects model to check assumptions (TCMO data only)
temp_model_LW <- lmer(Larval_Weight ~ Strain + (1 | Replicate), data = data_tcmo)

# Visualize model diagnostics
check_model(temp_model_LW)

# Boxplot with outliers
cat("\nBoxplot for Larval_Weight (TCMO):\n")

## 
## Boxplot for Larval_Weight (TCMO):

boxplot_LW <- ggplot(data_tcmo, aes(x = Strain, y = Larval_Weight)) +
  geom_boxplot(outlier.shape = 16, outlier.size = 2) + 
  labs(title = "Boxplot of Larval_Weight by Strain (TCMO)",
       x = "Strain", y = "Larval_Weight (g)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 
print(boxplot_LW)

# Outlier Detection and Treatment (using Tukey's fences on residuals)
resid_LW <- residuals(temp_model_LW) 
Q1_LW <- quantile(resid_LW, 0.25)
Q3_LW <- quantile(resid_LW, 0.75)
IQR_LW <- Q3_LW - Q1_LW

lower_bound_LW <- Q1_LW - 1.5 * IQR_LW
upper_bound_LW <- Q3_LW + 1.5 * IQR_LW

# Use 'data_tcmo' within the pipe, not 'data'
outliers_LW <- data_tcmo %>% 
  mutate(residual = resid_LW) %>%  # Add residuals to the 'data_tcmo' dataframe
  filter(residual < lower_bound_LW | residual > upper_bound_LW) %>% 
  dplyr::select(Strain, Replicate, Larval_Weight) 

if (nrow(outliers_LW) > 0) {
  cat("\nOutliers for Larval_Weight (based on residuals):\n")
  print(outliers_LW)
} else {
  cat("\nNo outliers found for Larval_Weight \n")
}

## 
## Outliers for Larval_Weight (based on residuals):
## # A tibble: 10 × 3
##    Strain Replicate Larval_Weight
##    <fct>      <dbl>         <dbl>
##  1 SW 101         2          3.89
##  2 SW 101         2          3.85
##  3 SW 103         1          3.47
##  4 SW 106         2          3.69
##  5 SW 106         2          3.75
##  6 SW 112         1          3.76
##  7 SW 113         1          3.97
##  8 SW 113         3          3.99
##  9 SW 113         3          3.95
## 10 SW 113         3          3.96

# Normality test (Shapiro-Wilk) on residuals
normality_test_LW <- shapiro.test(resid_LW)
cat("\nNormality Test (Shapiro-Wilk) on Residuals:\n")

## 
## Normality Test (Shapiro-Wilk) on Residuals:

print(normality_test_LW)

## 
##  Shapiro-Wilk normality test
## 
## data:  resid_LW
## W = 0.98465, p-value = 5.876e-05

# Distribution plots 
cat("\nDistribution Plots for Larval_Weight (TCMO):\n")

## 
## Distribution Plots for Larval_Weight (TCMO):

# Combined distribution plot 
p1_LW <- ggplot(data_tcmo, aes(x = Larval_Weight)) + 
  geom_density(fill = "lightblue", alpha = 0.5) + 
  labs(title = "Distribution of Larval_Weight (All Strains, TCMO)",
       x = "Larval_Weight", y = "Density") +
  theme_bw()

# Distribution plots by strain 
p2_LW <- ggplot(data_tcmo, aes(x = Larval_Weight)) + 
  geom_density(fill = "lightblue", alpha = 0.5) + 
  labs(x = "Larval_Weight", y = "Density") +
  facet_wrap(~ Strain, ncol = 6) +  
  theme_bw() +
  theme(strip.text = element_text(size = 8)) 

# Arrange and print the plots
print(ggarrange(p1_LW, p2_LW, nrow = 2, heights = c(1, 3)))

Technical Discussion Points and Considerations:

Boxplots: 1. Central Tendency and Spread: Examine the medians (horizontal lines within the boxes) to compare the typical values of each trait across strains. Observe the interquartile ranges (IQR, the height of the boxes) to gauge the spread of the data within each strain. 2. Outliers: Identify any points plotted individually outside the whiskers. These outliers might indicate unusual observations or potential errors in data collection. Consider their impact on 3. subsequent analyses and whether they should be handled (e.g., removal, winsorization) if they are deemed to be influential. 4. Strain Differences: Visually compare the boxplots across strains to get a preliminary sense of which traits might show significant differences between strains.

Normality Tests (Shapiro-Wilk): 1. P-values: Assess the p-values for each trait. If p < 0.05, it suggests a significant departure from normality. This might necessitate data transformation or the use of non-parametric tests. 2. Visual Inspection of Q-Q Plots: Complement the p-values by examining the Q-Q plots generated by the check_model function. If the points deviate substantially from the straight line, it further supports the evidence of non-normality.

Homogeneity of Variance Tests (Levene’s): 1. P-values: Evaluate the p-values for each trait. If p < 0.05, it suggests significant heterogeneity of variance (unequal variances) across strains. This might also necessitate data transformation or the use of alternative statistical tests. 2. Visual Inspection of Residuals vs. Fitted Plots: Examine the Residuals vs. Fitted plots from check_model. If the spread of the residuals is not roughly constant across the range of fitted values, it further supports the evidence of heteroscedasticity.

Distribution Plots: 1. Shape of Distributions: Observe the shape of the density curves for each trait, both for all strains combined and for individual strains. Look for deviations from a normal bell-shaped curve (e.g., skewness, multiple peaks). 2. Strain Comparisons: Compare the density curves across strains to visually assess differences in central tendency (mean or median) and spread (variance or interquartile range).

ANOVA, Transformation, and Non-Parametric Testing

Based on the assumption checks, we’ll perform either ANOVA (if assumptions are met) or Kruskal-Wallis test (if assumptions are violated), along with appropriate post-hoc analyses. We’ll attempt a log transformation if needed to address non-normality or heteroscedasticity.

We’ll check the assumptions of normality and homogeneity of variance for all traits. If both `assumptions are met for a trait, we’ll perform ANOVA directly on the original data, followed by Tukey’s HSD for post-hoc comparisons if the ANOVA is significant. If any trait violates at least one assumption, we’ll attempt a log transformation and re-check the assumptions. If the transformed data meets both assumptions, we’ll perform ANOVA on the transformed data, again followed by Tukey’s HSD if significant. If the transformed data still violates at least one assumption, we’ll resort to the non-parametric Kruskal-Wallis test, and if that’s significant, we’ll conduct Dunn’s test for post-hoc comparisons.

ANOVA and Post-hoc Analysis for Larval_Weight (TCMO only)

## ANOVA and Post-hoc Analysis for Larval_Weight (TCMO only)

# Subset data for TCMO location
data_tcmo <- data %>% filter(Location == "TCMO")

# Fit the linear mixed-effects model (LMM)
lmer_model_LW <- lmer(Larval_Weight ~ Strain + (1|Replicate), data = data_tcmo)  # Use TCMO data

# Check model assumptions
check_model(lmer_model_LW)

# Normality test (Shapiro-Wilk) on residuals
normality_p_value <- shapiro.test(residuals(lmer_model_LW))$p.value
cat("\nNormality Test (Shapiro-Wilk) on Residuals:\n")

## 
## Normality Test (Shapiro-Wilk) on Residuals:

print(normality_test_LW)

## 
##  Shapiro-Wilk normality test
## 
## data:  resid_LW
## W = 0.98465, p-value = 5.876e-05

# Homogeneity of variance test (Levene's)
variance_test_LW <- leveneTest(residuals(lmer_model_LW), data_tcmo$Strain)  # Provide Strain as a separate argument
cat("\nHomogeneity of Variance Test (Levene's):\n")

## 
## Homogeneity of Variance Test (Levene's):

print(variance_test_LW)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group  15   3.259 3.522e-05 ***
##       464                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Q-Q plot of residuals (create a separate plot)
qqnorm(residuals(lmer_model_LW))  
qqline(residuals(lmer_model_LW))

# Check if both normality and homogeneity assumptions are met
if (normality_p_value >= 0.05 & variance_test_LW$`Pr(>F)`[1] >= 0.05) {
  # Perform ANOVA on the original data
  cat("\n**Performing ANOVA on original data:**\n")
  anova_result_LW <- anova(lmer_model_LW)
  print(anova_result_LW)

  # Calculate and print eta-squared for the LMM (using 'performance' package)
  eta_squared_larval_weight <- performance::r2(lmer_model_LW)$Marginal$R2
  cat("Eta-squared (Marginal R-squared):", eta_squared_larval_weight, "\n")

  # Tukey's HSD if ANOVA is significant
  if (anova_result_LW$`Pr(>F)`[1] < 0.05) {
    tukey_result_LW <- emmeans(lmer_model_LW, pairwise ~ Strain, adjust = "tukey")
    print(tukey_result_LW)

    # ... (Code for CLD generation and boxplot for TCMO data only)
  } else {
    cat("\nANOVA not significant. No post-hoc test needed.\n")
  }

} else {
  # Attempt log transformation if at least one assumption is not met
  cat("\n**Attempting log transformation...**\n")
  transformed_data_LW <- data_tcmo  # Use data_tcmo for transformation
  transformed_data_LW$Larval_Weight <- log1p(transformed_data_LW$Larval_Weight)
  transformed_model_LW <- lmer(Larval_Weight ~ Strain + (1|Replicate), data = transformed_data_LW)
  check_model(transformed_model_LW)

  # Re-check assumptions after transformation
  normality_p_value_transformed <- shapiro.test(residuals(transformed_model_LW))$p.value
  variance_test_LW_transformed <- leveneTest(residuals(transformed_model_LW), transformed_data_LW$Strain)

  # Print the results of assumption checks after transformation
  cat("\nAssumption Checks after Transformation:\n")
  cat("  Normality (Shapiro-Wilk) p-value:", normality_p_value_transformed, "\n")
  cat("  Homogeneity of Variance (Levene's) p-value:", variance_test_LW_transformed$`Pr(>F)`[1], "\n")

  if (normality_p_value_transformed >= 0.05 & variance_test_LW_transformed$`Pr(>F)`[1] >= 0.05) {
    # Use ANOVA on transformed data if both assumptions are now met
    cat("\n**Performing ANOVA on transformed data:**\n")
    anova_result_LW <- anova(transformed_model_LW)
    print(anova_result_LW)

    # Tukey's HSD if ANOVA is significant
    if (anova_result_LW$`Pr(>F)`[1] < 0.05) {
      tukey_result_LW <- emmeans(transformed_model_LW, pairwise ~ Strain, adjust = "tukey")
      print(tukey_result_LW)

      # ... (Code for CLD generation and boxplot for transformed TCMO data)
    } else {
      cat("\nANOVA not significant. No post-hoc test needed.\n")
    }

  } else {
    # Use Kruskal-Wallis if assumptions are still violated after transformation
    cat("\n**Performing Kruskal-Wallis test:**\n")
    kruskal_result_LW <- kruskal.test(Larval_Weight ~ Strain, data = data_tcmo)  # Use data_tcmo
    print(kruskal_result_LW)

    # Calculate and print epsilon-squared (effect size for Kruskal-Wallis)
    k <- length(unique(data_tcmo$Strain)) # Number of groups in data_tcmo
    n <- nrow(data_tcmo) # Total number of observations in data_tcmo
    epsilon_squared <- (kruskal_result_LW$statistic - (k - 1)) / (n - k)
    cat("Epsilon-squared:", epsilon_squared, "\n")

    # Dunn's Test if Kruskal-Wallis is significant
    if (kruskal_result_LW$p.value < 0.05) {
      cat("\n**Dunn's Test (Post-hoc with Holm correction):**\n")
      dunn_result_holm_LW <- dunnTest(Larval_Weight ~ Strain, data = data_tcmo, method="holm")  # Use data_tcmo
      print(dunn_result_holm_LW)

      # ... (Code for CLD generation and boxplot for TCMO data)
    } else {
      cat("\nKruskal-Wallis not significant. No post-hoc test needed.\n")
    }
  }
}

## 
## **Attempting log transformation...**
## 
## Assumption Checks after Transformation:
##   Normality (Shapiro-Wilk) p-value: 0.02751013 
##   Homogeneity of Variance (Levene's) p-value: 0.0002378355 
## 
## **Performing Kruskal-Wallis test:**
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Larval_Weight by Strain
## Kruskal-Wallis chi-squared = 102.98, df = 15, p-value = 3.55e-15
## 
## Epsilon-squared: 0.1896043 
## 
## **Dunn's Test (Post-hoc with Holm correction):**

## Dunn (1964) Kruskal-Wallis multiple comparison

##   p-values adjusted with the Holm method.

##          Comparison           Z      P.unadj        P.adj
## 1   SW 101 - SW 102  2.99787866 2.718659e-03 2.446793e-01
## 2   SW 101 - SW 103  2.00153308 4.533497e-02 1.000000e+00
## 3   SW 102 - SW 103 -0.99634558 3.190823e-01 1.000000e+00
## 4   SW 101 - SW 104  5.32190939 1.026837e-07 1.170594e-05
## 5   SW 102 - SW 104  2.32403074 2.012385e-02 1.000000e+00
## 6   SW 103 - SW 104  3.32037631 8.989618e-04 8.540137e-02
## 7   SW 101 - SW 105  3.68661824 2.272539e-04 2.272539e-02
## 8   SW 102 - SW 105  0.68873959 4.909872e-01 1.000000e+00
## 9   SW 103 - SW 105  1.68508516 9.197214e-02 1.000000e+00
## 10  SW 104 - SW 105 -1.63529115 1.019880e-01 1.000000e+00
## 11  SW 101 - SW 106  1.18062995 2.377498e-01 1.000000e+00
## 12  SW 102 - SW 106 -1.81724870 6.917904e-02 1.000000e+00
## 13  SW 103 - SW 106 -0.82090313 4.117014e-01 1.000000e+00
## 14  SW 104 - SW 106 -4.14127944 3.453739e-05 3.660963e-03
## 15  SW 105 - SW 106 -2.50598829 1.221097e-02 1.000000e+00
## 16  SW 101 - SW 107  1.66600521 9.571238e-02 1.000000e+00
## 17  SW 102 - SW 107 -1.33187344 1.829018e-01 1.000000e+00
## 18  SW 103 - SW 107 -0.33552787 7.372269e-01 1.000000e+00
## 19  SW 104 - SW 107 -3.65590418 2.562771e-04 2.537143e-02
## 20  SW 105 - SW 107 -2.02061303 4.331984e-02 1.000000e+00
## 21  SW 106 - SW 107  0.48537526 6.274102e-01 1.000000e+00
## 22  SW 101 - SW 108 -0.37834141 7.051770e-01 1.000000e+00
## 23  SW 102 - SW 108 -3.37622006 7.348911e-04 7.128444e-02
## 24  SW 103 - SW 108 -2.37987449 1.731854e-02 1.000000e+00
## 25  SW 104 - SW 108 -5.70025080 1.196313e-08 1.399686e-06
## 26  SW 105 - SW 108 -4.06495965 4.804077e-05 5.044281e-03
## 27  SW 106 - SW 108 -1.55897136 1.190032e-01 1.000000e+00
## 28  SW 107 - SW 108 -2.04434662 4.091932e-02 1.000000e+00
## 29  SW 101 - SW 109  0.68455130 4.936271e-01 1.000000e+00
## 30  SW 102 - SW 109 -2.31332735 2.070465e-02 1.000000e+00
## 31  SW 103 - SW 109 -1.31698178 1.878447e-01 1.000000e+00
## 32  SW 104 - SW 109 -4.63735809 3.528907e-06 3.917087e-04
## 33  SW 105 - SW 109 -3.00206694 2.681532e-03 2.440194e-01
## 34  SW 106 - SW 109 -0.49607865 6.198389e-01 1.000000e+00
## 35  SW 107 - SW 109 -0.98145391 3.263690e-01 1.000000e+00
## 36  SW 108 - SW 109  1.06289271 2.878306e-01 1.000000e+00
## 37  SW 101 - SW 110 -1.22158204 2.218657e-01 1.000000e+00
## 38  SW 102 - SW 110 -4.21946069 2.448874e-05 2.620295e-03
## 39  SW 103 - SW 110 -3.22311512 1.268046e-03 1.191963e-01
## 40  SW 104 - SW 110 -6.54349143 6.009893e-11 7.151773e-09
## 41  SW 105 - SW 110 -4.90820028 9.191597e-07 1.029459e-04
## 42  SW 106 - SW 110 -2.40221199 1.629626e-02 1.000000e+00
## 43  SW 107 - SW 110 -2.88758725 3.882089e-03 3.455059e-01
## 44  SW 108 - SW 110 -0.84324063 3.990939e-01 1.000000e+00
## 45  SW 109 - SW 110 -1.90613334 5.663291e-02 1.000000e+00
## 46  SW 101 - SW 111 -0.35274636 7.242786e-01 1.000000e+00
## 47  SW 102 - SW 111 -3.35062501 8.062941e-04 7.740423e-02
## 48  SW 103 - SW 111 -2.35427944 1.855865e-02 1.000000e+00
## 49  SW 104 - SW 111 -5.67465575 1.389678e-08 1.612027e-06
## 50  SW 105 - SW 111 -4.03936460 5.359620e-05 5.574005e-03
## 51  SW 106 - SW 111 -1.53337631 1.251832e-01 1.000000e+00
## 52  SW 107 - SW 111 -2.01875157 4.351305e-02 1.000000e+00
## 53  SW 108 - SW 111  0.02559505 9.795803e-01 9.795803e-01
## 54  SW 109 - SW 111 -1.03729766 2.995972e-01 1.000000e+00
## 55  SW 110 - SW 111  0.86883568 3.849370e-01 1.000000e+00
## 56  SW 101 - SW 112 -1.38445964 1.662178e-01 1.000000e+00
## 57  SW 102 - SW 112 -4.38233830 1.174123e-05 1.279794e-03
## 58  SW 103 - SW 112 -3.38599272 7.092126e-04 6.950284e-02
## 59  SW 104 - SW 112 -6.70636903 1.995268e-11 2.394321e-09
## 60  SW 105 - SW 112 -5.07107788 3.955688e-07 4.469928e-05
## 61  SW 106 - SW 112 -2.56508959 1.031492e-02 8.664530e-01
## 62  SW 107 - SW 112 -3.05046485 2.284874e-03 2.102084e-01
## 63  SW 108 - SW 112 -1.00611823 3.143587e-01 1.000000e+00
## 64  SW 109 - SW 112 -2.06901094 3.854506e-02 1.000000e+00
## 65  SW 110 - SW 112 -0.16287760 8.706148e-01 1.000000e+00
## 66  SW 111 - SW 112 -1.03171328 3.022065e-01 1.000000e+00
## 67  SW 101 - SW 113 -0.21965208 8.261421e-01 1.000000e+00
## 68  SW 102 - SW 113 -3.21753074 1.292992e-03 1.202482e-01
## 69  SW 103 - SW 113 -2.22118516 2.633842e-02 1.000000e+00
## 70  SW 104 - SW 113 -5.54156148 2.997863e-08 3.447542e-06
## 71  SW 105 - SW 113 -3.90627033 9.373165e-05 9.560628e-03
## 72  SW 106 - SW 113 -1.40028204 1.614289e-01 1.000000e+00
## 73  SW 107 - SW 113 -1.88565730 5.934115e-02 1.000000e+00
## 74  SW 108 - SW 113  0.15868932 8.739137e-01 1.000000e+00
## 75  SW 109 - SW 113 -0.90420339 3.658876e-01 1.000000e+00
## 76  SW 110 - SW 113  1.00192995 3.163774e-01 1.000000e+00
## 77  SW 111 - SW 113  0.13309427 8.941188e-01 1.000000e+00
## 78  SW 112 - SW 113  1.16480756 2.440969e-01 1.000000e+00
## 79  SW 101 - SW 114 -0.73992969 4.593427e-01 1.000000e+00
## 80  SW 102 - SW 114 -3.73780835 1.856314e-04 1.874877e-02
## 81  SW 103 - SW 114 -2.74146277 6.116629e-03 5.321467e-01
## 82  SW 104 - SW 114 -6.06183908 1.345738e-09 1.587971e-07
## 83  SW 105 - SW 114 -4.42654793 9.575311e-06 1.053284e-03
## 84  SW 106 - SW 114 -1.92055964 5.478725e-02 1.000000e+00
## 85  SW 107 - SW 114 -2.40593490 1.613114e-02 1.000000e+00
## 86  SW 108 - SW 114 -0.36158828 7.176597e-01 1.000000e+00
## 87  SW 109 - SW 114 -1.42448099 1.543073e-01 1.000000e+00
## 88  SW 110 - SW 114  0.48165235 6.300529e-01 1.000000e+00
## 89  SW 111 - SW 114 -0.38718333 6.986205e-01 1.000000e+00
## 90  SW 112 - SW 114  0.64452995 5.192318e-01 1.000000e+00
## 91  SW 113 - SW 114 -0.52027761 6.028701e-01 1.000000e+00
## 92  SW 101 - SW 115  1.39144011 1.640920e-01 1.000000e+00
## 93  SW 102 - SW 115 -1.60643855 1.081776e-01 1.000000e+00
## 94  SW 103 - SW 115 -0.61009297 5.418002e-01 1.000000e+00
## 95  SW 104 - SW 115 -3.93046929 8.478022e-05 8.732363e-03
## 96  SW 105 - SW 115 -2.29517813 2.172292e-02 1.000000e+00
## 97  SW 106 - SW 115  0.21081016 8.330354e-01 1.000000e+00
## 98  SW 107 - SW 115 -0.27456511 7.836504e-01 1.000000e+00
## 99  SW 108 - SW 115  1.76978152 7.676354e-02 1.000000e+00
## 100 SW 109 - SW 115  0.70688880 4.796356e-01 1.000000e+00
## 101 SW 110 - SW 115  2.61302214 8.974549e-03 7.718112e-01
## 102 SW 111 - SW 115  1.74418646 8.112658e-02 1.000000e+00
## 103 SW 112 - SW 115  2.77589975 5.504918e-03 4.844328e-01
## 104 SW 113 - SW 115  1.61109219 1.071596e-01 1.000000e+00
## 105 SW 114 - SW 115  2.13136980 3.305869e-02 1.000000e+00
## 106 SW 101 - SW 116  1.08429948 2.782320e-01 1.000000e+00
## 107 SW 102 - SW 116 -1.91357917 5.567395e-02 1.000000e+00
## 108 SW 103 - SW 116 -0.91723360 3.590202e-01 1.000000e+00
## 109 SW 104 - SW 116 -4.23760991 2.259118e-05 2.439848e-03
## 110 SW 105 - SW 116 -2.60231876 9.259574e-03 7.870638e-01
## 111 SW 106 - SW 116 -0.09633047 9.232581e-01 1.000000e+00
## 112 SW 107 - SW 116 -0.58170573 5.607649e-01 1.000000e+00
## 113 SW 108 - SW 116  1.46264089 1.435657e-01 1.000000e+00
## 114 SW 109 - SW 116  0.39974818 6.893420e-01 1.000000e+00
## 115 SW 110 - SW 116  2.30588152 2.111725e-02 1.000000e+00
## 116 SW 111 - SW 116  1.43704584 1.507050e-01 1.000000e+00
## 117 SW 112 - SW 116  2.46875912 1.355824e-02 1.000000e+00
## 118 SW 113 - SW 116  1.30395157 1.922501e-01 1.000000e+00
## 119 SW 114 - SW 116  1.82422917 6.811743e-02 1.000000e+00
## 120 SW 115 - SW 116 -0.30714063 7.587363e-01 1.000000e+00

Technical Discussion Points and Considerations:

ANOVA Results: 1. F-statistic and P-value: If ANOVA is performed, examine the F-statistic and its associated p-value. A significant p-value (p < 0.05) indicates that there are statistically significant differences in the means of the trait across at least some of the strains. 2. Eta-squared: Interpret the eta-squared value as the proportion of variance in the trait that is explained by the Strain factor. A higher eta-squared indicates a stronger effect of strain on the trait.

Tukey’s HSD Test: 1. Pairwise Comparisons: If ANOVA is significant, examine the Tukey’s HSD results to identify which specific strain pairs have significantly different means for the trait. 2. Confidence Intervals: Look at the confidence intervals for the mean differences. If the interval doesn’t include zero, it indicates a significant difference. 3. Compact Letter Displays (CLDs): Use the CLDs to visually summarize the results of Tukey’s HSD, grouping strains that are not significantly different from each other.

Kruskal-Wallis Test: 1. Chi-squared Statistic and P-value: If the Kruskal-Wallis test is performed, examine the chi-squared statistic and its associated p-value. A significant p-value (p < 0.05) indicates that there are statistically significant differences in the medians of the trait across at least some of the strains. 2. Epsilon-squared: Interpret epsilon-squared as the proportion of variance in the ranked data that is explained by the Strain factor.

Dunn’s Test: 1. Pairwise Comparisons: If the Kruskal-Wallis test is significant, examine the Dunn’s test results to identify which specific strain pairs have significantly different medians for the trait. 2. Adjusted P-values: Focus on the adjusted p-values (either Benjamini-Hochberg or Bonferroni) to account for multiple comparisons and control the risk of Type I errors.

# List of traits to analyze (excluding "Larval_Weight")
traits <- c("Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Cocoon_Shell_Percentage")  

## Data Exploration and Assumption Checking for Other Traits (Both Locations)

# Choose a missing data handling strategy (imputation or removal)
# Example: Removing rows with missing values for any of the traits
data_complete <- data %>% filter(complete.cases(data[, traits]))

# Loop through each trait
for (trait in traits) {
  cat("\n\n--- Analyzing", trait, "---\n")

  # Fit a temporary linear mixed-effects model to check assumptions (using data_complete)
  temp_model <- lmer(as.formula(paste(trait, "~ Strain * Location + (1 | Replicate)")), data = data_complete)

  # Boxplot with outliers, filled by Location
  cat("\nBoxplot for", trait, ":\n")
  boxplot_outliers <- ggplot(data_complete, aes(x = Strain, y = .data[[trait]], fill = Location)) +
    geom_boxplot(outlier.shape = 16, outlier.size = 2) + 
    labs(title = paste("Boxplot of", trait, "by Strain and Location"),
         x = "Strain", y = trait) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) 
  print(boxplot_outliers) 

  # Outlier Detection and Treatment (using Tukey's fences on residuals)
  resid_trait <- residuals(temp_model) 
  Q1 <- quantile(resid_trait, 0.25)
  Q3 <- quantile(resid_trait, 0.75)
  IQR <- Q3 - Q1

  lower_bound <- Q1 - 1.5 * IQR
  upper_bound <- Q3 + 1.5 * IQR

  outliers <- data_complete %>%  # Use data_complete here
    mutate(residual = resid_trait) %>% 
    filter(residual < lower_bound | residual > upper_bound) %>% 
    dplyr::select(Strain, Replicate, Location, all_of(trait)) 

  if (nrow(outliers) > 0) {
    cat("\nOutliers for", trait, "(based on residuals):\n")
    print(outliers)
  } else {
    cat("\nNo outliers found for", trait, "\n")
  }

  # Normality test (Shapiro-Wilk) on residuals
  normality_test <- shapiro.test(resid_trait)
  cat("\nNormality Test (Shapiro-Wilk) on Residuals:\n")
  print(normality_test)

  # Distribution plots 
  cat("\nDistribution Plots for", trait, ":\n")

  # Combined distribution plot 
  p1 <- ggplot(data_complete, aes(x = .data[[trait]], fill = Location)) +  # Use data_complete here
    geom_density(alpha = 0.5) + 
    labs(title = paste("Distribution of", trait, "(All Strains)"),
         x = trait, y = "Density") +
    theme_bw()

  # Distribution plots by strain and location
  p2 <- ggplot(data_complete, aes(x = .data[[trait]], fill = Location)) +  # Use data_complete here
    geom_density(alpha = 0.5) + 
    labs(x = trait, y = "Density") +
    facet_wrap(~ Strain, ncol = 6) +  
    theme_bw() +
    theme(strip.text = element_text(size = 8)) 

  # Arrange and print the plots
  print(ggarrange(p1, p2, nrow = 2, heights = c(1, 3)))  
}

## 
## 
## --- Analyzing Single_Cocoon_Weight ---
## 
## Boxplot for Single_Cocoon_Weight :

## 
## Outliers for Single_Cocoon_Weight (based on residuals):
## # A tibble: 9 × 4
##   Strain Replicate Location Single_Cocoon_Weight
##   <fct>      <dbl> <fct>                   <dbl>
## 1 SW 101         2 BALUBAL                 2.27 
## 2 SW 101         2 BALUBAL                 0.946
## 3 SW 102         3 BALUBAL                 2.29 
## 4 SW 104         3 BALUBAL                 0.816
## 5 SW 109         1 BALUBAL                 0.359
## 6 SW 110         2 BALUBAL                 2.32 
## 7 SW 111         2 BALUBAL                 2.8  
## 8 SW 114         3 BALUBAL                 0.914
## 9 SW 115         3 BALUBAL                 0.708
## 
## Normality Test (Shapiro-Wilk) on Residuals:
## 
##  Shapiro-Wilk normality test
## 
## data:  resid_trait
## W = 0.98377, p-value = 1.247e-08
## 
## 
## Distribution Plots for Single_Cocoon_Weight :

## 
## 
## --- Analyzing Cocoon_Shell_Weight ---
## 
## Boxplot for Cocoon_Shell_Weight :

## 
## Outliers for Cocoon_Shell_Weight (based on residuals):
## # A tibble: 26 × 4
##    Strain Replicate Location Cocoon_Shell_Weight
##    <fct>      <dbl> <fct>                  <dbl>
##  1 SW 104         2 TCMO                   0.77 
##  2 SW 111         2 TCMO                   0.4  
##  3 SW 113         3 TCMO                   0.414
##  4 SW 101         2 BALUBAL                0.468
##  5 SW 101         2 BALUBAL                0.187
##  6 SW 101         2 BALUBAL                0.21 
##  7 SW 101         3 BALUBAL                0.226
##  8 SW 102         1 BALUBAL                0.219
##  9 SW 102         3 BALUBAL                0.443
## 10 SW 103         1 BALUBAL                0.407
## # ℹ 16 more rows
## 
## Normality Test (Shapiro-Wilk) on Residuals:
## 
##  Shapiro-Wilk normality test
## 
## data:  resid_trait
## W = 0.92155, p-value < 2.2e-16
## 
## 
## Distribution Plots for Cocoon_Shell_Weight :

## 
## 
## --- Analyzing Cocoon_Shell_Percentage ---

## boundary (singular) fit: see help('isSingular')

## 
## Boxplot for Cocoon_Shell_Percentage :

## 
## Outliers for Cocoon_Shell_Percentage (based on residuals):
## # A tibble: 12 × 4
##    Strain Replicate Location Cocoon_Shell_Percentage
##    <fct>      <dbl> <fct>                      <dbl>
##  1 SW 104         2 TCMO                        65.8
##  2 SW 106         2 TCMO                        28.9
##  3 SW 111         2 TCMO                        29.1
##  4 SW 113         2 TCMO                        28.2
##  5 SW 115         2 TCMO                        27.6
##  6 SW 103         1 BALUBAL                     33.4
##  7 SW 103         3 BALUBAL                     11.7
##  8 SW 109         1 BALUBAL                     85.5
##  9 SW 114         1 BALUBAL                     12.3
## 10 SW 115         1 BALUBAL                     10.7
## 11 SW 115         2 BALUBAL                     30.4
## 12 SW 116         2 BALUBAL                     13.5
## 
## Normality Test (Shapiro-Wilk) on Residuals:
## 
##  Shapiro-Wilk normality test
## 
## data:  resid_trait
## W = 0.64682, p-value < 2.2e-16
## 
## 
## Distribution Plots for Cocoon_Shell_Percentage :

ANOVA, Transformation, and Non-Parametric Testing for Other Traits (Both Locations)

# Loop through each trait
for (trait in traits) {
  cat("\n\n--- Analyzing", trait, "---\n")
  
  # Fit the linear mixed-effects model (LMM) with 'Location' and interaction
  lmer_model <- lmer(as.formula(paste(trait, "~ Strain * Location + (1 | Replicate)")), data = data_complete) 
  
  # Check model assumptions
  check_model(lmer_model) 
  
  # Check for non-normality or heteroscedasticity
  normality_p_value <- shapiro.test(residuals(lmer_model))$p.value
  levene_result <- leveneTest(residuals(lmer_model) ~ data_complete$Strain * data_complete$Location)  
  homogeneity_p_value <- levene_result$"Pr(>F)"[1] 
  
  # Create means dataframe for CLDs and boxplot (outside the conditional blocks)
  means <- aggregate(as.formula(paste(trait, "~ Strain")), data = data_complete, FUN = mean)
  colnames(means)[colnames(means) == trait] <- paste0("Mean_", trait)
  
  if (normality_p_value >= 0.05 & homogeneity_p_value >= 0.05) {
    # Perform ANOVA directly on original data if both assumptions are met
    cat("\n**Performing ANOVA on original data:**\n")
    anova_result <- anova(lmer_model)
    print(anova_result)
    
    # Calculate and print eta-squared for the LMM
    eta_squared <- performance::r2(lmer_model)$Marginal$R2
    cat("Eta-squared (Marginal R-squared):", eta_squared, "\n")
    
    # Tukey's HSD if ANOVA is significant
    if (anova_result$`Pr(>F)`[1] < 0.05) {
      tukey_result <- emmeans(lmer_model, pairwise ~ Strain | Location, adjust = "tukey")
      print(tukey_result)
      
      # Extract CLD from Tukey's results
      cld_data <- as.data.frame(tukey_result$contrasts) %>%
        mutate(sig = ifelse(p.value < 0.05, "*", " ")) %>%
        dplyr::select(contrast, estimate, SE, df, t.ratio, p.value, sig)
      
      # Create CLDs using 'multcompLetters4' 
      cld_letters <- multcompLetters4(tukey_result$emmeans, tukey_result$contrasts)
      
      # Print the compact letter display
      cat("\n**Compact Letter Displays (CLDs):**\n")
      print(cld_letters$LetterMatrix)
      
      # Boxplot with CLDs, faceted by Location
      # Ensure complete cases for plotting
      data_complete_plot <- data_complete %>% filter(!is.na(.data[[trait]]))
      
      # Base boxplot
      p <- ggplot(data_complete_plot, aes(x = Strain, y = .data[[trait]], fill = Location)) + 
        geom_boxplot() +
        labs(title = paste("Boxplot of", trait, "by Strain and Location (ANOVA with Tukey HSD)"),
             x = "Strain", y = trait) +
        theme_bw() +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        facet_wrap(~ Location)  # Facet by Location
      
      # Add labels from cld_letters
      for (loc in levels(data_complete$Location)) {
        cld_data_loc <- cld_letters$LetterMatrix[, loc] %>% 
          data.frame(Letters = .) %>% 
          rownames_to_column("Strain") %>%
          mutate(y_pos = aggregate(as.formula(paste(trait, "~ Strain")), data = data_complete %>% filter(Location == loc), FUN = mean)[[2]] + 0.1)
        
        p <- p + geom_text(data = cld_data_loc, 
                           aes(x = Strain, y = y_pos, label = Letters),
                           vjust = -0.5, size = 3)
      }
      
      # Print the plot
      print(p)
      
    } else {
      cat("\nANOVA not significant. No post-hoc test needed.\n")
    }
    
  } else {
    # Attempt log transformation if at least one assumption is not met
    cat("\n**Attempting log transformation...**\n")
    transformed_data <- data_complete 
    transformed_data[[trait]] <- log1p(transformed_data[[trait]])
    transformed_model <- lmer(as.formula(paste(trait, "~ Strain * Location + (1 | Replicate)")), data = transformed_data)
    check_model(transformed_model)
    
    # Re-check assumptions after transformation
    normality_p_value_transformed <- shapiro.test(residuals(transformed_model))$p.value
    levene_result_transformed <- leveneTest(residuals(transformed_model) ~ transformed_data$Strain * transformed_data$Location)
    homogeneity_p_value_transformed <- levene_result_transformed$"Pr(>F)"[1] 
    
    # Print the results of assumption checks after transformation
    cat("\nAssumption Checks after Transformation:\n")
    cat("  Normality (Shapiro-Wilk) p-value:", normality_p_value_transformed, "\n")
    cat("  Homogeneity of Variance (Levene's) p-value:", homogeneity_p_value_transformed, "\n")
    
    if (normality_p_value_transformed >= 0.05 & homogeneity_p_value_transformed >= 0.05) { 
      # Use ANOVA on transformed data if both assumptions are now met
      cat("\n**Performing ANOVA on transformed data:**\n")
      anova_result <- anova(transformed_model)
      print(anova_result)
      
      # Tukey's HSD if ANOVA is significant
      if (anova_result$`Pr(>F)`[1] < 0.05) {
        tukey_result <- emmeans(transformed_model, pairwise ~ Strain | Location, adjust = "tukey")
        print(tukey_result)
        
        # Extract CLD from Tukey's results
        cld_data <- as.data.frame(tukey_result$contrasts) %>%
          mutate(sig = ifelse(p.value < 0.05, "*", " ")) %>%
          dplyr::select(contrast, estimate, SE, df, t.ratio, p.value, sig)
        
        # Create CLDs using 'multcompLetters4' 
        cld_letters <- multcompLetters4(tukey_result$emmeans, tukey_result$contrasts)
        
        # Print the compact letter display
        cat("\n**Compact Letter Displays (CLDs):**\n")
        print(cld_letters$LetterMatrix)
        
        # Boxplot with CLDs, faceted by Location
        # Ensure complete cases for plotting
        data_complete_plot <- transformed_data %>% filter(!is.na(.data[[trait]]))
        
        # Base boxplot
        p <- ggplot(data_complete_plot, aes(x = Strain, y = .data[[trait]], fill = Location)) + 
          geom_boxplot() +
          labs(title = paste("Boxplot of", trait, "by Strain and Location (ANOVA with Tukey HSD)"),
               x = "Strain", y = trait) +
          theme_bw() +
          theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
          facet_wrap(~ Location)  # Facet by Location
        
        # Add labels from cld_letters
        for (loc in levels(data_complete$Location)) {
          cld_data_loc <- cld_letters$LetterMatrix[,loc] %>% 
            data.frame(Letters = .) %>% 
            rownames_to_column("Strain") %>%
            mutate(y_pos = aggregate(as.formula(paste(trait, "~ Strain")), data = transformed_data %>% filter(Location == loc), FUN = mean)[[2]] + 0.1)
          
          p <- p + geom_text(data = cld_data_loc, 
                             aes(x = Strain, y = y_pos, label = Letters),
                             vjust = -0.5, size = 3)
        }
        
        # Print the plot
        print(p)
        
      } else {
        cat("\nANOVA not significant. No post-hoc test needed.\n")
      }
      
    } else {
      # Use Kruskal-Wallis if assumptions are still violated after transformation
      cat("\n**Performing Kruskal-Wallis test:**\n")
      kruskal_result <- kruskal.test(as.formula(paste(trait, "~ Strain")), data = data_complete)  
      print(kruskal_result)
      
      # Calculate and print epsilon-squared (effect size for Kruskal-Wallis)
      k <- length(unique(data_complete$Strain)) # Number of groups
      n <- nrow(data_complete) # Total number of observations
      epsilon_squared <- (kruskal_result$statistic - (k - 1)) / (n - k)
      cat("Epsilon-squared:", epsilon_squared, "\n")
      
      # Dunn's Test if Kruskal-Wallis is significant
      if (kruskal_result$p.value < 0.05) {
        cat("\n**Dunn's Test (Post-hoc with Holm correction):**\n")
        dunn_result <- dunnTest(as.formula(paste(trait, "~ Strain")), data = data_complete, method="holm") 
        print(dunn_result)
        
        # Get unique strain names
        strains <- unique(data_complete$Strain)
        
        # Initialize a vector to store group letters
        group_letters <- rep("", length(strains))
        names(group_letters) <- strains
        
        # Assign group letters based on adjusted p-values (example threshold of 0.05)
        for (i in 1:nrow(dunn_result$res)) {
          comparison <- strsplit(dunn_result$res$Comparison[i], " - ")
          strain1 <- comparison[[1]][1]
          strain2 <- comparison[[1]][2]
          
          if (dunn_result$res$P.adj[i] < 0.05) {
            # Strains are significantly different, assign different letters
            if (any(group_letters[strain1] == "") && any(group_letters[strain2] == "")) {
              # Both strains have no group yet
              group_letters[strain1] <- "a"
              group_letters[strain2] <- "b"
            } else if (any(group_letters[strain1] == "")) {
              # Only strain1 has no group
              group_letters[strain1] <- setdiff(letters, group_letters[strain2])[1]
            } else if (any(group_letters[strain2] == "")) {
              # Only strain2 has no group
              group_letters[strain2] <- setdiff(letters, group_letters[strain1])[1]
            } else {
              # Both strains have groups, ensure they are different
              if (group_letters[strain1] == group_letters[strain2]) {
                group_letters[strain2] <- setdiff(letters, group_letters[strain1])[1]
              }
            }
          } else {
            # Strains are not significantly different
            if (any(group_letters[strain1] == "") && any(group_letters[strain2] == "")) {
              # Both strains have no group yet
              group_letters[strain1] <- "a"
              group_letters[strain2] <- "a"
            } else if (any(group_letters[strain1] == "")) {
              # Only strain1 has no group
              group_letters[strain1] <- group_letters[strain2]
            } else if (any(group_letters[strain2] == "")) {
              # Only strain2 has no group
              group_letters[strain2] <- group_letters[strain1]
            }
          }
        }
        
        # Create the cld_data dataframe
        cld_data <- data.frame(Strain = strains,
                               Trait = means[[2]],
                               .group = group_letters)
        
        # Rename the trait column
        colnames(cld_data)[colnames(cld_data) == "Trait"] <- trait
        
        cat("\n**Compact Letter Displays (CLDs - Holm):**\n")
        print(cld_data)
        
        # Boxplot of data with group labels
        plotting_data <- merge(data_complete, cld_data, by = "Strain")
        
        # Use 'data_complete' in ggplot to access the trait column
        ggplot(data_complete, aes(x = Strain, y = .data[[trait]], fill = plotting_data$.group[match(Strain, plotting_data$Strain)])) + 
          geom_boxplot() +
          labs(title = paste("Boxplot of", trait, "by Strain (Kruskal-Wallis with Dunn's Test)"),
               x = "Strain", y = trait) +
          theme_bw() +
          theme(axis.text.x = element_text(angle = 45, hjust = 1))
      } else {
        cat("\nKruskal-Wallis not significant. No post-hoc test needed.\n")
      }
    }
  }
}

## 
## 
## --- Analyzing Single_Cocoon_Weight ---
## 
## **Attempting log transformation...**
## 
## Assumption Checks after Transformation:
##   Normality (Shapiro-Wilk) p-value: 1.449919e-10 
##   Homogeneity of Variance (Levene's) p-value: 0.01517713 
## 
## **Performing Kruskal-Wallis test:**
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Single_Cocoon_Weight by Strain
## Kruskal-Wallis chi-squared = 51.435, df = 15, p-value = 7e-06
## 
## Epsilon-squared: 0.03995115 
## 
## **Dunn's Test (Post-hoc with Holm correction):**

## Dunn (1964) Kruskal-Wallis multiple comparison

##   p-values adjusted with the Holm method.

##          Comparison           Z      P.unadj        P.adj
## 1   SW 101 - SW 102  1.72201660 8.506651e-02 1.0000000000
## 2   SW 101 - SW 103  1.31188029 1.895605e-01 1.0000000000
## 3   SW 102 - SW 103 -0.46411450 6.425657e-01 1.0000000000
## 4   SW 101 - SW 104  3.57289381 3.530580e-04 0.0395424918
## 5   SW 102 - SW 104  1.73468765 8.279615e-02 1.0000000000
## 6   SW 103 - SW 104  2.28146404 2.252100e-02 1.0000000000
## 7   SW 101 - SW 105  2.69628915 7.011677e-03 0.7151910833
## 8   SW 102 - SW 105  0.85637203 3.917920e-01 1.0000000000
## 9   SW 103 - SW 105  1.37862842 1.680094e-01 1.0000000000
## 10  SW 104 - SW 105 -0.93368133 3.504683e-01 1.0000000000
## 11  SW 101 - SW 106  1.03196001 3.020909e-01 1.0000000000
## 12  SW 102 - SW 106 -0.73808134 4.604650e-01 1.0000000000
## 13  SW 103 - SW 106 -0.28426540 7.762070e-01 1.0000000000
## 14  SW 104 - SW 106 -2.56845769 1.021522e-02 1.0000000000
## 15  SW 105 - SW 106 -1.66867425 9.518195e-02 1.0000000000
## 16  SW 101 - SW 107 -0.30873655 7.575219e-01 1.0000000000
## 17  SW 102 - SW 107 -2.01638535 4.375970e-02 1.0000000000
## 18  SW 103 - SW 107 -1.61931689 1.053791e-01 1.0000000000
## 19  SW 104 - SW 107 -3.87339590 1.073293e-04 0.0125575239
## 20  SW 105 - SW 107 -3.00372575 2.666956e-03 0.2880312599
## 21  SW 106 - SW 107 -1.34069656 1.800190e-01 1.0000000000
## 22  SW 101 - SW 108 -0.07567349 9.396789e-01 1.0000000000
## 23  SW 102 - SW 108 -1.73717793 8.235577e-02 1.0000000000
## 24  SW 103 - SW 108 -1.33991654 1.802725e-01 1.0000000000
## 25  SW 104 - SW 108 -3.52321059 4.263524e-04 0.0473251209
## 26  SW 105 - SW 108 -2.67436684 7.487050e-03 0.7561920704
## 27  SW 106 - SW 108 -1.07009503 2.845765e-01 1.0000000000
## 28  SW 107 - SW 108  0.22183250 8.244443e-01 1.0000000000
## 29  SW 101 - SW 109 -0.56842946 5.697434e-01 1.0000000000
## 30  SW 102 - SW 109 -2.26399282 2.357456e-02 1.0000000000
## 31  SW 103 - SW 109 -1.87791635 6.039261e-02 1.0000000000
## 32  SW 104 - SW 109 -4.12616241 3.688668e-05 0.0043526284
## 33  SW 105 - SW 109 -3.26232521 1.105023e-03 0.1204475030
## 34  SW 106 - SW 109 -1.60038946 1.095122e-01 1.0000000000
## 35  SW 107 - SW 109 -0.25969291 7.951007e-01 1.0000000000
## 36  SW 108 - SW 109 -0.47207884 6.368705e-01 1.0000000000
## 37  SW 101 - SW 110 -0.20893523 8.344988e-01 1.0000000000
## 38  SW 102 - SW 110 -1.87980234 6.013502e-02 1.0000000000
## 39  SW 103 - SW 110 -1.48527451 1.374711e-01 1.0000000000
## 40  SW 104 - SW 110 -3.68607099 2.277428e-04 0.0261904273
## 41  SW 105 - SW 110 -2.83305722 4.610513e-03 0.4887143988
## 42  SW 106 - SW 110 -1.21337135 2.249879e-01 1.0000000000
## 43  SW 107 - SW 110  0.09156686 9.270422e-01 1.0000000000
## 44  SW 108 - SW 110 -0.12792055 8.982119e-01 1.0000000000
## 45  SW 109 - SW 110  0.34433337 7.305956e-01 1.0000000000
## 46  SW 101 - SW 111 -0.89879289 3.687630e-01 1.0000000000
## 47  SW 102 - SW 111 -2.57898200 9.909195e-03 0.9909195130
## 48  SW 103 - SW 111 -2.20688878 2.732183e-02 1.0000000000
## 49  SW 104 - SW 111 -4.44771457 8.678875e-06 0.0010327861
## 50  SW 105 - SW 111 -3.59129763 3.290356e-04 0.0371810247
## 51  SW 106 - SW 111 -1.93075290 5.351362e-02 1.0000000000
## 52  SW 107 - SW 111 -0.59005634 5.551529e-01 1.0000000000
## 53  SW 108 - SW 111 -0.79042502 4.292796e-01 1.0000000000
## 54  SW 109 - SW 111 -0.33036343 7.411254e-01 1.0000000000
## 55  SW 110 - SW 111 -0.66588553 5.054843e-01 1.0000000000
## 56  SW 101 - SW 112 -1.08628265 2.773540e-01 1.0000000000
## 57  SW 102 - SW 112 -2.75774647 5.820132e-03 0.6111138451
## 58  SW 103 - SW 112 -2.39358911 1.668443e-02 1.0000000000
## 59  SW 104 - SW 112 -4.63020370 3.653062e-06 0.0004383674
## 60  SW 105 - SW 112 -3.77799796 1.580942e-04 0.0183389242
## 61  SW 106 - SW 112 -2.11824266 3.415452e-02 1.0000000000
## 62  SW 107 - SW 112 -0.77754610 4.368366e-01 1.0000000000
## 63  SW 108 - SW 112 -0.97109467 3.315011e-01 1.0000000000
## 64  SW 109 - SW 112 -0.51785320 6.045607e-01 1.0000000000
## 65  SW 110 - SW 112 -0.84837466 3.962293e-01 1.0000000000
## 66  SW 111 - SW 112 -0.18748976 8.512766e-01 1.0000000000
## 67  SW 101 - SW 113  0.27552992 7.829091e-01 1.0000000000
## 68  SW 102 - SW 113 -1.45930913 1.444800e-01 1.0000000000
## 69  SW 103 - SW 113 -1.03751051 2.994980e-01 1.0000000000
## 70  SW 104 - SW 113 -3.30471268 9.507381e-04 0.1045811922
## 71  SW 105 - SW 113 -2.42191936 1.543878e-02 1.0000000000
## 72  SW 106 - SW 113 -0.75643009 4.493914e-01 1.0000000000
## 73  SW 107 - SW 113  0.58426647 5.590410e-01 1.0000000000
## 74  SW 108 - SW 113  0.34118076 7.329675e-01 1.0000000000
## 75  SW 109 - SW 113  0.84395937 3.986921e-01 1.0000000000
## 76  SW 110 - SW 113  0.47711636 6.332793e-01 1.0000000000
## 77  SW 111 - SW 113  1.17432281 2.402658e-01 1.0000000000
## 78  SW 112 - SW 113  1.36181257 1.732570e-01 1.0000000000
## 79  SW 101 - SW 114 -0.04785161 9.618345e-01 1.0000000000
## 80  SW 102 - SW 114 -1.76764132 7.712089e-02 1.0000000000
## 81  SW 103 - SW 114 -1.35953042 1.739786e-01 1.0000000000
## 82  SW 104 - SW 114 -3.61946914 2.952080e-04 0.0336537128
## 83  SW 105 - SW 114 -2.74393928 6.070677e-03 0.6313503685
## 84  SW 106 - SW 114 -1.07981162 2.802261e-01 1.0000000000
## 85  SW 107 - SW 114  0.26088494 7.941812e-01 1.0000000000
## 86  SW 108 - SW 114  0.02956252 9.764160e-01 1.0000000000
## 87  SW 109 - SW 114  0.52057785 6.026609e-01 1.0000000000
## 88  SW 110 - SW 114  0.16235990 8.710224e-01 1.0000000000
## 89  SW 111 - SW 114  0.85094128 3.948020e-01 1.0000000000
## 90  SW 112 - SW 114  1.03843104 2.990694e-01 1.0000000000
## 91  SW 113 - SW 114 -0.32338153 7.464063e-01 1.0000000000
## 92  SW 101 - SW 115  1.15984812 2.461106e-01 1.0000000000
## 93  SW 102 - SW 115 -0.61614481 5.377990e-01 1.0000000000
## 94  SW 103 - SW 115 -0.15691577 8.753112e-01 1.0000000000
## 95  SW 104 - SW 115 -2.44398054 1.452621e-02 1.0000000000
## 96  SW 105 - SW 115 -1.54132462 1.232378e-01 1.0000000000
## 97  SW 106 - SW 115  0.12788811 8.982375e-01 1.0000000000
## 98  SW 107 - SW 115  1.46858467 1.419455e-01 1.0000000000
## 99  SW 108 - SW 115  1.19333110 2.327397e-01 1.0000000000
## 100 SW 109 - SW 115  1.72827758 8.393847e-02 1.0000000000
## 101 SW 110 - SW 115  1.33784850 1.809458e-01 1.0000000000
## 102 SW 111 - SW 115  2.05864101 3.952864e-02 1.0000000000
## 103 SW 112 - SW 115  2.24613077 2.469563e-02 1.0000000000
## 104 SW 113 - SW 115  0.88431820 3.765245e-01 1.0000000000
## 105 SW 114 - SW 115  1.20769973 2.271628e-01 1.0000000000
## 106 SW 101 - SW 116  1.82721632 6.766725e-02 1.0000000000
## 107 SW 102 - SW 116  0.02016580 9.839111e-01 0.9839111085
## 108 SW 103 - SW 116  0.50764245 6.117041e-01 1.0000000000
## 109 SW 104 - SW 116 -1.79441204 7.274743e-02 1.0000000000
## 110 SW 105 - SW 116 -0.87676641 3.806135e-01 1.0000000000
## 111 SW 106 - SW 116  0.79525631 4.264644e-01 1.0000000000
## 112 SW 107 - SW 116  2.13595287 3.268326e-02 1.0000000000
## 113 SW 108 - SW 116  1.83642318 6.629509e-02 1.0000000000
## 114 SW 109 - SW 116  2.39564577 1.659112e-02 1.0000000000
## 115 SW 110 - SW 116  1.98741700 4.687621e-02 1.0000000000
## 116 SW 111 - SW 116  2.72600921 6.410521e-03 0.6602836639
## 117 SW 112 - SW 116  2.91349897 3.574030e-03 0.3824212260
## 118 SW 113 - SW 116  1.55168640 1.207373e-01 1.0000000000
## 119 SW 114 - SW 116  1.87506793 6.078338e-02 1.0000000000
## 120 SW 115 - SW 116  0.66736820 5.045370e-01 1.0000000000
## 
## **Compact Letter Displays (CLDs - Holm):**
##        Strain Single_Cocoon_Weight .group
## SW 101 SW 101             1.483800      a
## SW 102 SW 102             1.404540      a
## SW 103 SW 103             1.419119      a
## SW 104 SW 104             1.315407      b
## SW 105 SW 105             1.363203      a
## SW 106 SW 106             1.431433      a
## SW 107 SW 107             1.501850      a
## SW 108 SW 108             1.465827      a
## SW 109 SW 109             1.496533      a
## SW 110 SW 110             1.484426      a
## SW 111 SW 111             1.532417      b
## SW 112 SW 112             1.532417      b
## SW 113 SW 113             1.459350      a
## SW 114 SW 114             1.473400      a
## SW 115 SW 115             1.420700      a
## SW 116 SW 116             1.402300      a
## 
## 
## --- Analyzing Cocoon_Shell_Weight ---
## 
## **Attempting log transformation...**
## 
## Assumption Checks after Transformation:
##   Normality (Shapiro-Wilk) p-value: 2.21599e-18 
##   Homogeneity of Variance (Levene's) p-value: 8.501889e-05 
## 
## **Performing Kruskal-Wallis test:**
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Cocoon_Shell_Weight by Strain
## Kruskal-Wallis chi-squared = 36.722, df = 15, p-value = 0.00139
## 
## Epsilon-squared: 0.02381752 
## 
## **Dunn's Test (Post-hoc with Holm correction):**

## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Holm method.

##          Comparison            Z      P.unadj      P.adj
## 1   SW 101 - SW 102  2.047025256 0.0406556048 1.00000000
## 2   SW 101 - SW 103 -0.587136437 0.5571120778 1.00000000
## 3   SW 102 - SW 103 -2.599211805 0.0093438101 1.00000000
## 4   SW 101 - SW 104  1.723660739 0.0847691065 1.00000000
## 5   SW 102 - SW 104 -0.349819939 0.7264738344 1.00000000
## 6   SW 103 - SW 104  2.288385944 0.0221150555 1.00000000
## 7   SW 101 - SW 105  0.680251852 0.4963450056 1.00000000
## 8   SW 102 - SW 105 -1.390342645 0.1644248541 1.00000000
## 9   SW 103 - SW 105  1.262096456 0.2069140802 1.00000000
## 10  SW 104 - SW 105 -1.054527881 0.2916413027 1.00000000
## 11  SW 101 - SW 106  0.048705804 0.9611537507 1.00000000
## 12  SW 102 - SW 106 -2.000586095 0.0454370133 1.00000000
## 13  SW 103 - SW 106  0.635637163 0.5250129416 1.00000000
## 14  SW 104 - SW 106 -1.676253991 0.0936884527 1.00000000
## 15  SW 105 - SW 106 -0.631751126 0.5275495144 1.00000000
## 16  SW 101 - SW 107 -0.550069041 0.5822720199 1.00000000
## 17  SW 102 - SW 107 -2.571495509 0.0101260329 1.00000000
## 18  SW 103 - SW 107  0.039383486 0.9685846459 1.00000000
## 19  SW 104 - SW 107 -2.259058629 0.0238797381 1.00000000
## 20  SW 105 - SW 107 -1.228004803 0.2194451651 1.00000000
## 21  SW 106 - SW 107 -0.598774845 0.5493230390 1.00000000
## 22  SW 101 - SW 108  0.055997743 0.9553436051 1.00000000
## 23  SW 102 - SW 108 -1.925432728 0.0541752445 1.00000000
## 24  SW 103 - SW 108  0.621727410 0.5341211247 1.00000000
## 25  SW 104 - SW 108 -1.609485045 0.1075103226 1.00000000
## 26  SW 105 - SW 108 -0.599925201 0.5485560865 1.00000000
## 27  SW 106 - SW 108  0.009063656 0.9927683475 0.99276835
## 28  SW 107 - SW 108  0.586057534 0.5578368520 1.00000000
## 29  SW 101 - SW 109  0.413488431 0.6792487968 1.00000000
## 30  SW 102 - SW 109 -1.652779506 0.0983757464 1.00000000
## 31  SW 103 - SW 109  0.998883856 0.3178509575 1.00000000
## 32  SW 104 - SW 109 -1.321200654 0.1864344683 1.00000000
## 33  SW 105 - SW 109 -0.268504432 0.7883110636 1.00000000
## 34  SW 106 - SW 109  0.364782627 0.7152736741 1.00000000
## 35  SW 107 - SW 109  0.963557472 0.3352678369 1.00000000
## 36  SW 108 - SW 109  0.342449679 0.7320125075 1.00000000
## 37  SW 101 - SW 110  0.246942377 0.8049528194 1.00000000
## 38  SW 102 - SW 110 -1.761196199 0.0782052012 1.00000000
## 39  SW 103 - SW 110  0.817560074 0.4136084331 1.00000000
## 40  SW 104 - SW 110 -1.439327018 0.1500578913 1.00000000
## 41  SW 105 - SW 110 -0.416297988 0.6771919645 1.00000000
## 42  SW 106 - SW 110  0.199535629 0.8418437757 1.00000000
## 43  SW 107 - SW 110  0.782340267 0.4340146283 1.00000000
## 44  SW 108 - SW 110  0.183801244 0.8541693799 1.00000000
## 45  SW 109 - SW 110 -0.155517708 0.8764131946 1.00000000
## 46  SW 101 - SW 111 -1.559948117 0.1187721423 1.00000000
## 47  SW 102 - SW 111 -3.534377427 0.0004087370 0.04863970
## 48  SW 103 - SW 111 -0.966243449 0.3339223764 1.00000000
## 49  SW 104 - SW 111 -3.242002742 0.0011869288 0.13887067
## 50  SW 105 - SW 111 -2.233631737 0.0255073139 1.00000000
## 51  SW 106 - SW 111 -1.608653921 0.1076920368 1.00000000
## 52  SW 107 - SW 111 -1.009879076 0.3125532286 1.00000000
## 53  SW 108 - SW 111 -1.559201362 0.1189487286 1.00000000
## 54  SW 109 - SW 111 -1.973436548 0.0484458470 1.00000000
## 55  SW 110 - SW 111 -1.765284380 0.0775159834 1.00000000
## 56  SW 101 - SW 112 -1.018904977 0.3082480809 1.00000000
## 57  SW 102 - SW 112 -3.018513034 0.0025401847 0.28958106
## 58  SW 103 - SW 112 -0.427478394 0.6690309186 1.00000000
## 59  SW 104 - SW 112 -2.715390019 0.0066197760 0.73479513
## 60  SW 105 - SW 112 -1.694866683 0.0901007141 1.00000000
## 61  SW 106 - SW 112 -1.067610780 0.2856961188 1.00000000
## 62  SW 107 - SW 112 -0.468835935 0.6391869117 1.00000000
## 63  SW 108 - SW 112 -1.037839146 0.2993449486 1.00000000
## 64  SW 109 - SW 112 -1.432393408 0.1520312641 1.00000000
## 65  SW 110 - SW 112 -1.238671657 0.2154671192 1.00000000
## 66  SW 111 - SW 112  0.541043141 0.5884778478 1.00000000
## 67  SW 101 - SW 113 -1.180860289 0.2376582291 1.00000000
## 68  SW 102 - SW 113 -3.172931365 0.0015090820 0.17505351
## 69  SW 103 - SW 113 -0.588751786 0.5560277928 1.00000000
## 70  SW 104 - SW 113 -2.873025745 0.0040656098 0.45534830
## 71  SW 105 - SW 113 -1.856140075 0.0634335949 1.00000000
## 72  SW 106 - SW 113 -1.229566092 0.2188596339 1.00000000
## 73  SW 107 - SW 113 -0.630791247 0.5281770270 1.00000000
## 74  SW 108 - SW 113 -1.193903190 0.2325158396 1.00000000
## 75  SW 109 - SW 113 -1.594348720 0.1108579512 1.00000000
## 76  SW 110 - SW 113 -1.396307382 0.1626219517 1.00000000
## 77  SW 111 - SW 113  0.379087828 0.7046226444 1.00000000
## 78  SW 112 - SW 113 -0.161955312 0.8713410449 1.00000000
## 79  SW 101 - SW 114  0.031846102 0.9745947809 1.00000000
## 80  SW 102 - SW 114 -2.016661189 0.0437308827 1.00000000
## 81  SW 103 - SW 114  0.618848450 0.5360162011 1.00000000
## 82  SW 104 - SW 114 -1.692664020 0.0905194377 1.00000000
## 83  SW 105 - SW 114 -0.648539839 0.5166358524 1.00000000
## 84  SW 106 - SW 114 -0.016859701 0.9865485419 1.00000000
## 85  SW 107 - SW 114  0.581915144 0.5606238395 1.00000000
## 86  SW 108 - SW 114 -0.025310071 0.9798076410 1.00000000
## 87  SW 109 - SW 114 -0.381642329 0.7027266827 1.00000000
## 88  SW 110 - SW 114 -0.215945657 0.8290301073 1.00000000
## 89  SW 111 - SW 114  1.591794220 0.1114309494 1.00000000
## 90  SW 112 - SW 114  1.050751079 0.2933729296 1.00000000
## 91  SW 113 - SW 114  1.212706391 0.2252420954 1.00000000
## 92  SW 101 - SW 115  2.077149256 0.0377877856 1.00000000
## 93  SW 102 - SW 115 -0.066541148 0.9469469986 1.00000000
## 94  SW 103 - SW 115  2.655539763 0.0079181598 0.87099758
## 95  SW 104 - SW 115  0.298087886 0.7656360872 1.00000000
## 96  SW 105 - SW 115  1.388151474 0.1650909252 1.00000000
## 97  SW 106 - SW 115  2.028443452 0.0425150091 1.00000000
## 98  SW 107 - SW 115  2.627218297 0.0086086088 0.93833836
## 99  SW 108 - SW 115  1.945593364 0.0517036048 1.00000000
## 100 SW 109 - SW 115  1.663660825 0.0961802292 1.00000000
## 101 SW 110 - SW 115  1.774806248 0.0759298843 1.00000000
## 102 SW 111 - SW 115  3.637097373 0.0002757277 0.03308733
## 103 SW 112 - SW 115  3.096054233 0.0019611449 0.22553166
## 104 SW 113 - SW 115  3.258009545 0.0011219663 0.13239203
## 105 SW 114 - SW 115  2.045303154 0.0408249800 1.00000000
## 106 SW 101 - SW 116  1.388115404 0.1651019066 1.00000000
## 107 SW 102 - SW 116 -0.723509149 0.4693671354 1.00000000
## 108 SW 103 - SW 116  1.969407119 0.0489063589 1.00000000
## 109 SW 104 - SW 116 -0.372568418 0.7094696820 1.00000000
## 110 SW 105 - SW 116  0.702018830 0.4826674204 1.00000000
## 111 SW 106 - SW 116  1.339409601 0.1804373656 1.00000000
## 112 SW 107 - SW 116  1.938184446 0.0526007237 1.00000000
## 113 SW 108 - SW 116  1.281623730 0.1999746716 1.00000000
## 114 SW 109 - SW 116  0.974626973 0.3297453288 1.00000000
## 115 SW 110 - SW 116  1.104149944 0.2695280990 1.00000000
## 116 SW 111 - SW 116  2.948063522 0.0031977135 0.36134163
## 117 SW 112 - SW 116  2.407020381 0.0160832723 1.00000000
## 118 SW 113 - SW 116  2.568975693 0.0101999612 1.00000000
## 119 SW 114 - SW 116  1.356269302 0.1750134990 1.00000000
## 120 SW 115 - SW 116 -0.689033852 0.4908019650 1.00000000
## 
## **Compact Letter Displays (CLDs - Holm):**
##        Strain Cocoon_Shell_Weight .group
## SW 101 SW 101           0.2937500      a
## SW 102 SW 102           0.2738000      a
## SW 103 SW 103           0.2980847      a
## SW 104 SW 104           0.2824630      a
## SW 105 SW 105           0.2858644      a
## SW 106 SW 106           0.2898333      a
## SW 107 SW 107           0.3016167      a
## SW 108 SW 108           0.2937115      a
## SW 109 SW 109           0.2868167      a
## SW 110 SW 110           0.2900185      a
## SW 111 SW 111           0.3048333      b
## SW 112 SW 112           0.3018033      a
## SW 113 SW 113           0.3026000      a
## SW 114 SW 114           0.2910000      a
## SW 115 SW 115           0.2727167      a
## SW 116 SW 116           0.2795000      a
## 
## 
## --- Analyzing Cocoon_Shell_Percentage ---

## boundary (singular) fit: see help('isSingular')

## 
## **Attempting log transformation...**

## boundary (singular) fit: see help('isSingular')

## 
## Assumption Checks after Transformation:
##   Normality (Shapiro-Wilk) p-value: 3.532175e-24 
##   Homogeneity of Variance (Levene's) p-value: 0.4632928 
## 
## **Performing Kruskal-Wallis test:**
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Cocoon_Shell_Percentage by Strain
## Kruskal-Wallis chi-squared = 45.229, df = 15, p-value = 7.046e-05
## 
## Epsilon-squared: 0.03314579 
## 
## **Dunn's Test (Post-hoc with Holm correction):**

## Dunn (1964) Kruskal-Wallis multiple comparison
##   p-values adjusted with the Holm method.

##          Comparison           Z      P.unadj       P.adj
## 1   SW 101 - SW 102  0.79234032 4.281623e-01 1.000000000
## 2   SW 101 - SW 103 -2.49137857 1.272485e-02 1.000000000
## 3   SW 102 - SW 103 -3.16564982 1.547370e-03 0.173305388
## 4   SW 101 - SW 104 -2.22753971 2.591123e-02 1.000000000
## 5   SW 102 - SW 104 -2.90203420 3.707481e-03 0.404115384
## 6   SW 103 - SW 104  0.20681503 8.361543e-01 1.000000000
## 7   SW 101 - SW 105 -2.26530087 2.349422e-02 1.000000000
## 8   SW 102 - SW 105 -2.95001080 3.177628e-03 0.349539099
## 9   SW 103 - SW 105  0.22513374 8.218752e-01 1.000000000
## 10  SW 104 - SW 105  0.01328153 9.894032e-01 1.000000000
## 11  SW 101 - SW 106 -0.82999180 4.065434e-01 1.000000000
## 12  SW 102 - SW 106 -1.58370645 1.132605e-01 1.000000000
## 13  SW 103 - SW 106  1.66488149 9.593640e-02 1.000000000
## 14  SW 104 - SW 106  1.41968501 1.556994e-01 1.000000000
## 15  SW 105 - SW 106  1.43880379 1.502061e-01 1.000000000
## 16  SW 101 - SW 107 -0.75727811 4.488833e-01 1.000000000
## 17  SW 102 - SW 107 -1.51437667 1.299303e-01 1.000000000
## 18  SW 103 - SW 107  1.73728901 8.233617e-02 1.000000000
## 19  SW 104 - SW 107  1.49045932 1.361035e-01 1.000000000
## 20  SW 105 - SW 107  1.51121131 1.307346e-01 1.000000000
## 21  SW 106 - SW 107  0.07271368 9.420340e-01 1.000000000
## 22  SW 101 - SW 108 -0.22926627 8.186620e-01 1.000000000
## 23  SW 102 - SW 108 -0.98531935 3.244672e-01 1.000000000
## 24  SW 103 - SW 108  2.17310382 2.977250e-02 1.000000000
## 25  SW 104 - SW 108  1.92698513 5.398148e-02 1.000000000
## 26  SW 105 - SW 108  1.95518449 5.056130e-02 1.000000000
## 27  SW 106 - SW 108  0.57053384 5.683157e-01 1.000000000
## 28  SW 107 - SW 108  0.50046518 6.167476e-01 1.000000000
## 29  SW 101 - SW 109  1.49463234 1.350104e-01 1.000000000
## 30  SW 102 - SW 109  0.63273570 5.269063e-01 1.000000000
## 31  SW 103 - SW 109  3.97971770 6.899715e-05 0.008279658
## 32  SW 104 - SW 109  3.68230800 2.311319e-04 0.027042436
## 33  SW 105 - SW 109  3.75364000 1.742851e-04 0.020739930
## 34  SW 106 - SW 109  2.32462414 2.009207e-02 1.000000000
## 35  SW 107 - SW 109  2.25191045 2.432793e-02 1.000000000
## 36  SW 108 - SW 109  1.66953003 9.501238e-02 1.000000000
## 37  SW 101 - SW 110  0.59017261 5.550749e-01 1.000000000
## 38  SW 102 - SW 110 -0.20900053 8.344478e-01 1.000000000
## 39  SW 103 - SW 110  3.01328393 2.584370e-03 0.286865070
## 40  SW 104 - SW 110  2.74636625 6.025945e-03 0.644776134
## 41  SW 105 - SW 110  2.79318738 5.219145e-03 0.563667661
## 42  SW 106 - SW 110  1.39802730 1.621049e-01 1.000000000
## 43  SW 107 - SW 110  1.32725300 1.844250e-01 1.000000000
## 44  SW 108 - SW 110  0.79334862 4.275747e-01 1.000000000
## 45  SW 109 - SW 110 -0.86459569 3.872607e-01 1.000000000
## 46  SW 101 - SW 111 -0.79235778 4.281521e-01 1.000000000
## 47  SW 102 - SW 111 -1.54782382 1.216647e-01 1.000000000
## 48  SW 103 - SW 111  1.70235705 8.868846e-02 1.000000000
## 49  SW 104 - SW 111  1.45631527 1.453055e-01 1.000000000
## 50  SW 105 - SW 111  1.47627935 1.398689e-01 1.000000000
## 51  SW 106 - SW 111  0.03763401 9.699795e-01 1.000000000
## 52  SW 107 - SW 111 -0.03507967 9.720162e-01 1.000000000
## 53  SW 108 - SW 111 -0.53426880 5.931556e-01 1.000000000
## 54  SW 109 - SW 111 -2.28699012 2.219640e-02 1.000000000
## 55  SW 110 - SW 111 -1.36139704 1.733883e-01 1.000000000
## 56  SW 101 - SW 112  0.14985490 8.808791e-01 1.000000000
## 57  SW 102 - SW 112 -0.64945928 5.160416e-01 1.000000000
## 58  SW 103 - SW 112  2.64060250 8.275875e-03 0.868966911
## 59  SW 104 - SW 112  2.37339776 1.762527e-02 1.000000000
## 60  SW 105 - SW 112  2.41452480 1.575575e-02 1.000000000
## 61  SW 106 - SW 112  0.97984670 3.271618e-01 1.000000000
## 62  SW 107 - SW 112  0.90713302 3.643365e-01 1.000000000
## 63  SW 108 - SW 112  0.37367007 7.086498e-01 1.000000000
## 64  SW 109 - SW 112 -1.34477744 1.786971e-01 1.000000000
## 65  SW 110 - SW 112 -0.44431456 6.568152e-01 1.000000000
## 66  SW 111 - SW 112  0.94221268 3.460838e-01 1.000000000
## 67  SW 101 - SW 113 -1.96378035 4.955557e-02 1.000000000
## 68  SW 102 - SW 113 -2.66473142 7.704983e-03 0.816728226
## 69  SW 103 - SW 113  0.53586681 5.920506e-01 1.000000000
## 70  SW 104 - SW 113  0.31613627 7.518991e-01 1.000000000
## 71  SW 105 - SW 113  0.30978911 7.567213e-01 1.000000000
## 72  SW 106 - SW 113 -1.13378855 2.568833e-01 1.000000000
## 73  SW 107 - SW 113 -1.20650224 2.276239e-01 1.000000000
## 74  SW 108 - SW 113 -1.66307983 9.629646e-02 1.000000000
## 75  SW 109 - SW 113 -3.45841269 5.433684e-04 0.062487361
## 76  SW 110 - SW 113 -2.50157604 1.236419e-02 1.000000000
## 77  SW 111 - SW 113 -1.17142257 2.414290e-01 1.000000000
## 78  SW 112 - SW 113 -2.11363525 3.454643e-02 1.000000000
## 79  SW 101 - SW 114 -0.33478947 7.377839e-01 1.000000000
## 80  SW 102 - SW 114 -1.11154955 2.663319e-01 1.000000000
## 81  SW 103 - SW 114  2.15799875 3.092793e-02 1.000000000
## 82  SW 104 - SW 114  1.90167957 5.721306e-02 1.000000000
## 83  SW 105 - SW 114  1.93192105 5.336925e-02 1.000000000
## 84  SW 106 - SW 114  0.49520233 6.204573e-01 1.000000000
## 85  SW 107 - SW 114  0.42248864 6.726684e-01 1.000000000
## 86  SW 108 - SW 114 -0.09334494 9.256295e-01 1.000000000
## 87  SW 109 - SW 114 -1.82942181 6.733644e-02 1.000000000
## 88  SW 110 - SW 114 -0.91603275 3.596497e-01 1.000000000
## 89  SW 111 - SW 114  0.45756831 6.472626e-01 1.000000000
## 90  SW 112 - SW 114 -0.48464437 6.279286e-01 1.000000000
## 91  SW 113 - SW 114  1.62899088 1.033149e-01 1.000000000
## 92  SW 101 - SW 115  1.21688991 2.236461e-01 1.000000000
## 93  SW 102 - SW 115  0.36791868 7.129339e-01 1.000000000
## 94  SW 103 - SW 115  3.70314471 2.129433e-04 0.025127313
## 95  SW 104 - SW 115  3.41197337 6.449441e-04 0.073523632
## 96  SW 105 - SW 115  3.47706701 5.069311e-04 0.058804008
## 97  SW 106 - SW 115  2.04688170 4.066970e-02 1.000000000
## 98  SW 107 - SW 115  1.97416802 4.836264e-02 1.000000000
## 99  SW 108 - SW 115  1.40189072 1.609479e-01 1.000000000
## 100 SW 109 - SW 115 -0.27774243 7.812101e-01 1.000000000
## 101 SW 110 - SW 115  0.59426105 5.523375e-01 1.000000000
## 102 SW 111 - SW 115  2.00924769 4.451087e-02 1.000000000
## 103 SW 112 - SW 115  1.06703501 2.859560e-01 1.000000000
## 104 SW 113 - SW 115  3.18067026 1.469348e-03 0.166036291
## 105 SW 114 - SW 115  1.55167938 1.207390e-01 1.000000000
## 106 SW 101 - SW 116 -0.32286919 7.467943e-01 1.000000000
## 107 SW 102 - SW 116 -1.10018402 2.712520e-01 1.000000000
## 108 SW 103 - SW 116  2.16986884 3.001678e-02 1.000000000
## 109 SW 104 - SW 116  1.91328191 5.571197e-02 1.000000000
## 110 SW 105 - SW 116  1.94379113 5.192064e-02 1.000000000
## 111 SW 106 - SW 116  0.50712261 6.120688e-01 1.000000000
## 112 SW 107 - SW 116  0.43440892 6.639915e-01 1.000000000
## 113 SW 108 - SW 116 -0.08185827 9.347594e-01 1.000000000
## 114 SW 109 - SW 116 -1.81750153 6.914035e-02 1.000000000
## 115 SW 110 - SW 116 -0.90443040 3.657672e-01 1.000000000
## 116 SW 111 - SW 116  0.46948859 6.387204e-01 1.000000000
## 117 SW 112 - SW 116 -0.47272409 6.364100e-01 1.000000000
## 118 SW 113 - SW 116  1.64091116 1.008159e-01 1.000000000
## 119 SW 114 - SW 116  0.01192028 9.904892e-01 0.990489221
## 120 SW 115 - SW 116 -1.53975910 1.236191e-01 1.000000000
## 
## **Compact Letter Displays (CLDs - Holm):**
##        Strain Cocoon_Shell_Percentage .group
## SW 101 SW 101                19.90443      a
## SW 102 SW 102                19.53279      a
## SW 103 SW 103                21.18908      a
## SW 104 SW 104                21.71367      a
## SW 105 SW 105                21.13654      a
## SW 106 SW 106                20.38933      a
## SW 107 SW 107                20.20448      a
## SW 108 SW 108                20.12382      a
## SW 109 SW 109                20.20702      b
## SW 110 SW 110                19.70253      a
## SW 111 SW 111                20.22120      a
## SW 112 SW 112                19.83908      a
## SW 113 SW 113                20.87511      a
## SW 114 SW 114                19.92945      a
## SW 115 SW 115                19.42583      b
## SW 116 SW 116                20.07143      a

Multivariate Analysis

We’ll now explore the relationships among the traits and strains using multivariate techniques.

1. Principal Component Analysis (PCA): PCA helps us understand the main sources of variation in the data and visualize the relationships between individuals (silkworms) and variables (traits).

Multivariate Analysis (Combined Locations, Excluding Larval_Weight)

# Subset the data for multivariate analysis, excluding "Larval_Weight"
traits_multivariate <- c("Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Cocoon_Shell_Percentage")

# Filter data_complete to include only the 16 hybrid strains
hybrid_strains <- unique(data_complete$Strain)  # Get the unique hybrid strains
data_hybrids <- data_complete %>% filter(Strain %in% hybrid_strains)

# Reshape the data to wide format for location comparison
data_wide <- data_hybrids %>%
  pivot_wider(id_cols = Strain, 
              names_from = Location, 
              values_from = all_of(traits_multivariate),
              names_sep = "_",  # Separate location names in column names with "_"
              values_fn = mean)  # Use the mean value for each trait in each location


# Perform PCA (scaling the data is generally recommended)
pca_result <- PCA(data_wide[, -1], scale.unit = TRUE, graph = FALSE)  # Exclude the 'Strain' column

# Scree plot
fviz_eig(pca_result, addlabels = TRUE, ylim = c(0, 80)) + 
  labs(x = "Principal Component", y = "Percentage of Explained Variance")

# Individuals plot (colored by Strain)
fviz_pca_ind(pca_result, repel = TRUE, 
             col.ind = data_wide$Strain,
             palette = colorRampPalette(brewer.pal(12, "Paired"))(16),# Use a larger color palette
             addEllipses = TRUE, ellipse.type = "confidence",
             legend.title = "Strain", 
             title = "PCA - Individuals Plot (Hybrids Only, Colored by Strain)")

# Variables plot
fviz_pca_var(pca_result, repel = TRUE, 
             col.var = "contrib", gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             title = "PCA - Variables Plot")

# Biplot with both individuals and variables
fviz_pca_biplot(pca_result, repel = TRUE, 
                col.ind = data_wide$Strain, 
            palette = colorRampPalette(brewer.pal(12, "Paired"))(16),# Use a larger color palette
                addEllipses = TRUE, ellipse.type = "confidence",
                legend.title = "Strain", 
                title = "PCA Biplot (Hybrids Only, Colored by Strain)")

# Extract the principal components
pca_coords <- pca_result$ind$coord

# Perform hierarchical clustering (using first few PCs - adjust as needed)
hc_result <- hclust(dist(pca_coords[, 1:2]), method = "ward.D2") 

# Visualize the dendrogram
plot(hc_result, hang = -1, main = "Hierarchical Clustering on PCs", 
     xlab = "", sub = "")

# Determine optimal number of clusters (Elbow method) using data_wide
set.seed(123) 
wss <- fviz_nbclust(data_wide[, -1], kmeans, method = "wss")  # Use data_wide

# Perform k-means clustering 
k <- 3  # Adjust based on the elbow plot or other criteria
kmeans_result <- kmeans(pca_coords, centers = k, nstart = 25)

# Visualize clusters
fviz_cluster(kmeans_result, data = pca_coords, geom = "point", ellipse.type = "convex", 
             ggtheme = theme_minimal())

# Factor Analysis

# Determine the appropriate number of factors (e.g., using scree plot or parallel analysis)
# For this example, let's assume you've decided to extract 2 factors
nfactors <- 4

# Perform Factor Analysis with varimax rotation
fa_model <- fa(data_wide[, -1], nfactors = nfactors, rotate = "varimax")  # Exclude the 'Strain' column

# Print the results (sorted by loading)
print(fa_model, sort = TRUE)

## Factor Analysis using method =  minres
## Call: fa(r = data_wide[, -1], nfactors = nfactors, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                                 item   MR1   MR2   MR3   MR4   h2     u2 com
## Cocoon_Shell_Weight_BALUBAL        4  0.93 -0.05 -0.12  0.13 0.90 0.0965 1.1
## Single_Cocoon_Weight_BALUBAL       2  0.82  0.04 -0.11 -0.56 0.99 0.0050 1.8
## Single_Cocoon_Weight_TCMO          1  0.02  0.97 -0.24 -0.03 1.00 0.0040 1.1
## Cocoon_Shell_Weight_TCMO           3 -0.12  0.75  0.64 -0.10 1.00 0.0038 2.0
## Cocoon_Shell_Percentage_TCMO       5 -0.15 -0.15  0.97 -0.08 1.00 0.0040 1.1
## Cocoon_Shell_Percentage_BALUBAL    6 -0.02 -0.06 -0.11  0.88 0.80 0.2032 1.0
## 
##                        MR1  MR2  MR3  MR4
## SS loadings           1.57 1.53 1.45 1.13
## Proportion Var        0.26 0.25 0.24 0.19
## Cumulative Var        0.26 0.52 0.76 0.95
## Proportion Explained  0.28 0.27 0.26 0.20
## Cumulative Proportion 0.28 0.55 0.80 1.00
## 
## Mean item complexity =  1.4
## Test of the hypothesis that 4 factors are sufficient.
## 
## df null model =  15  with the objective function =  7.53 with Chi Square =  91.66
## df of  the model are -3  and the objective function was  0.26 
## 
## The root mean square of the residuals (RMSR) is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## The harmonic n.obs is  16 with the empirical chi square  0  with prob <  NA 
## The total n.obs was  16  with Likelihood Chi Square =  2.46  with prob <  NA 
## 
## Tucker Lewis Index of factoring reliability =  1.482
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR2  MR3  MR4
## Correlation of (regression) scores with factors   0.98 1.00 1.00 0.95
## Multiple R square of scores with factors          0.95 1.00 0.99 0.91
## Minimum correlation of possible factor scores     0.90 0.99 0.98 0.81

# Visualize factor loadings
fa.diagram(fa_model, simple = FALSE)

## Additional Multivariate Analyses (Combined Locations, Excluding Larval_Weight)

# Subset the data for multivariate analysis, excluding "Larval_Weight"
traits_multivariate <- c("Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Cocoon_Shell_Percentage")

# Filter data_complete to include only the 16 hybrid strains
hybrid_strains <- unique(data_complete$Strain)  # Get the unique hybrid strains
data_hybrids <- data_complete %>% filter(Strain %in% hybrid_strains)

# 1. Multidimensional Scaling (MDS)

# Calculate a distance matrix (using Euclidean distance as an example)
dist_matrix <- dist(data_hybrids[, traits_multivariate], method = "euclidean")

# Perform classical MDS
mds_result <- cmdscale(dist_matrix, k = 2)  # k = number of dimensions

# Convert to dataframe for plotting
mds_df <- as.data.frame(mds_result)
colnames(mds_df) <- c("MDS1", "MDS2")
mds_df$Strain <- data_hybrids$Strain
mds_df$Location <- data_hybrids$Location

# Visualize MDS results
ggplot(mds_df, aes(x = MDS1, y = MDS2, color = Strain, shape = Location)) + 
  geom_point(size = 3) +
  labs(title = "MDS Plot of Silkworm Strains") +
  theme_bw()

# 2. Principal Coordinate Analysis (PCoA)

# Perform PCoA (using the same distance matrix)
pcoa_result <- cmdscale(dist_matrix, k = 2, eig = TRUE)  # eig = TRUE to get eigenvalues

# Convert to dataframe for plotting
pcoa_df <- as.data.frame(pcoa_result$points)
colnames(pcoa_df) <- c("PCoA1", "PCoA2")
pcoa_df$Strain <- data_hybrids$Strain
pcoa_df$Location <- data_hybrids$Location

# Visualize PCoA results
ggplot(pcoa_df, aes(x = PCoA1, y = PCoA2, color = Strain, shape = Location)) + 
  geom_point(size = 3) +
  labs(title = "PCoA Plot of Silkworm Strains") +
  theme_bw()

# 3. Non-metric Multidimensional Scaling (NMDS)

# Perform NMDS
nmds_result <- metaMDS(data_hybrids[, traits_multivariate], k = 2)  # k = number of dimensions

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.03399699 
## Run 1 stress 0.0498962 
## Run 2 stress 0.03817507 
## Run 3 stress 0.04180507 
## Run 4 stress 0.03793114 
## Run 5 stress 0.05032314 
## Run 6 stress 0.03912307 
## Run 7 stress 0.05186972 
## Run 8 stress 0.04359579 
## Run 9 stress 0.03923626 
## Run 10 stress 0.04468165 
## Run 11 stress 0.04703207 
## Run 12 stress 0.03776965 
## Run 13 stress 0.03400293 
## ... Procrustes: rmse 0.001080288  max resid 0.02931923 
## Run 14 stress 0.03817205 
## Run 15 stress 0.0447264 
## Run 16 stress 0.04952129 
## Run 17 stress 0.03792953 
## Run 18 stress 0.03429895 
## ... Procrustes: rmse 0.002983378  max resid 0.08511436 
## Run 19 stress 0.0382202 
## Run 20 stress 0.03996661 
## *** Best solution was not repeated -- monoMDS stopping criteria:
##     13: no. of iterations >= maxit
##      7: scale factor of the gradient < sfgrmin

# Visualize NMDS results (using 'stressplot' to assess goodness-of-fit)
stressplot(nmds_result)

# Plot NMDS results with strain and location information
ordiplot(nmds_result, type = "n")  # Create an empty plot
points(nmds_result, col = as.numeric(data_hybrids$Strain), pch = as.numeric(data_hybrids$Location))
ordihull(nmds_result, groups = data_hybrids$Strain, draw = "polygon", col = "grey90")  # Add convex hulls for strains

Technical Discussion Points and Considerations:

Scree Plot: 1. Examine the proportion of variance explained by each principal component (PC). 2. Look for the “elbow” point where the explained variance starts to level off, indicating the number of PCs to retain for further analysis.

Individuals Plot: 1. Observe how the individual silkworms are distributed in the PCA space based on their trait values. 2. Look for clusters or patterns that might indicate similarities or differences between strains. 3. Consider the implications of the positioning of individuals relative to the PC axes and the ellipses representing strain confidence intervals.

Variables Plot: 1. Interpret the contribution of each trait to the principal components. 2. Arrows pointing in the same direction suggest positive correlations between traits, while arrows pointing in opposite directions suggest negative correlations. 3. The length of the arrow represents the strength of the contribution.

2. Hierarchical Clustering on Principal Components: We’ll perform hierarchical clustering on the first few principal components to group similar individuals together.

Technical Discussion Points and Considerations:

Examine the branching pattern of the dendrogram to understand how individuals are grouped based on their similarity in the PCA space.
The height of the branches represents the dissimilarity between clusters.
Look for clusters that correspond to specific strains or groups of strains, providing insights into phenotypic relationships.

3. K-means Clustering: K-means clustering will partition the individuals into a predefined number of clusters based on their principal component scores.

Technical Discussion Points and Considerations:

Elbow Plot: 1. Use the elbow plot to help determine the optimal number of clusters (k) for k-means clustering. 2. Look for the point where the within-cluster sum of squares (WSS) starts to level off, indicating diminishing returns from adding more clusters.

Cluster Plot: 1. Observe how the individuals are partitioned into clusters in the PCA space. 2. Interpret the meaning of each cluster in terms of the phenotypic traits and strains. 3. Consider the separation between clusters and the tightness within clusters.

Correlation Analysis

We will explore the relationships between the phenotypic traits using both Pearson and Spearman correlation coefficients.

Correlation Analysis (Combined Locations)

# Subset the data for correlation analyses (excluding "Replicate" and "Larval_Weight")
analysis_data <- data_complete %>% dplyr::select(all_of(traits_multivariate), Location)

# 1. Correlation Matrices and Correlograms

# Calculate Pearson and Spearman correlation matrices
cor_pearson <- cor(analysis_data[, traits_multivariate], method = "pearson")  # Use traits_multivariate
cor_spearman <- cor(analysis_data[, traits_multivariate], method = "spearman")  # Use traits_multivariate

# Visualize correlation matrices using heatmaps
heatmap(cor_pearson, Rowv = NA, Colv = NA, 
        col = colorRampPalette(c("blue", "white", "red"))(20), 
        scale = "none", margins = c(5, 10), cexRow = 0.8, cexCol = 0.8)

heatmap(cor_spearman, Rowv = NA, Colv = NA, 
        col = colorRampPalette(c("blue", "white", "red"))(20), 
        scale = "none", margins = c(5, 10), cexRow = 0.8, cexCol = 0.8) 

# Visualize correlograms
corrplot(cor_pearson, method = "color", type = "upper", order = "hclust", 
         tl.col = "black", tl.srt = 45, sig.level = 0.05, 
         title = "Pearson Correlation", mar = c(0, 0, 2, 0))

corrplot(cor_spearman, method = "color", type = "upper", order = "hclust", 
         tl.col = "black", tl.srt = 45, sig.level = 0.05, 
         title = "Spearman Correlation", mar = c(0, 0, 2, 0))

# 2. Individual Correlation Tests and Scatterplots

# Function to perform correlation test and plot
plot_correlation <- function(trait1, trait2, data, method = "pearson") {
  cor_test <- cor.test(data[[trait1]], data[[trait2]], method = method)
  p_value <- cor_test$p.value
  cor_coef <- cor_test$estimate
  
  ggplot(data, aes_string(x = trait1, y = trait2, color = "Location")) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    labs(title = paste("Correlation between", trait1, "and", trait2),
         subtitle = paste(method, "correlation:", round(cor_coef, 3), ", p-value:", round(p_value, 5))) +
    facet_wrap(~ Location)  # Facet by Location
}

# Example usage (replace with your actual trait names)
plot_correlation("Single_Cocoon_Weight", "Cocoon_Shell_Weight", data_complete, method = "pearson")

## `geom_smooth()` using formula = 'y ~ x'

plot_correlation("Single_Cocoon_Weight", "Cocoon_Shell_Weight", data_complete, method = "spearman")

## `geom_smooth()` using formula = 'y ~ x'

# 3. Scatterplot Matrix

# Scatterplot matrix with Pearson correlations
ggpairs(analysis_data, 
        aes(color = Location, alpha = 0.5), 
        upper = list(continuous = wrap("cor", method = "pearson", size = 3))) + 
  theme(axis.text = element_text(size = 8)) + 
  theme(plot.margin = margin(5, 10, 5, 5))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# Scatterplot matrix with Spearman correlations
ggpairs(analysis_data, 
        aes(color = Location, alpha = 0.5), 
        upper = list(continuous = wrap("cor", method = "spearman", size = 3))) + 
  theme(axis.text = element_text(size = 8)) + 
  theme(plot.margin = margin(5, 10, 5, 5))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# 4. Partial Correlation Analysis

## Partial Correlation Analysis (with p-values)

# Get the list of traits for partial correlation analysis
traits_partial <- c("Single_Cocoon_Weight", "Cocoon_Shell_Weight", "Cocoon_Shell_Percentage")

# Generate all combinations of 3 traits
trait_combinations <- combn(traits_partial, 3, simplify = FALSE)

# Loop through each location
for (loc in levels(data_complete$Location)) {
  cat("\n\n--- Analyzing Partial Correlations for", loc, "---\n")

  # Subset data by location first
  data_loc <- data_complete %>% filter(Location == loc)

  # Loop through each combination of traits
  for (combination in trait_combinations) {
    trait1 <- combination[1]
    trait2 <- combination[2]
    trait3 <- combination[3]  # Controlling variable

    # Check for strains with all missing values for any of the three traits
    strains_to_exclude <- c()
    for (tr in c(trait1, trait2, trait3)) {
      missing_strains <- data_loc %>%
        group_by(Strain) %>%
        filter(all(is.na(.data[[tr]]))) %>%
        pull(Strain)
      strains_to_exclude <- c(strains_to_exclude, missing_strains)
    }

    # If any strains have all missing values for a trait, exclude them
    if (length(strains_to_exclude) > 0) {
      cat("\n\nExcluding strains", strains_to_exclude, "from partial correlation in", loc, "due to all missing values for one or more traits.\n")
      data_loc <- data_loc %>% filter(!Strain %in% strains_to_exclude)
    }

    # Check for missing values in the selected traits within the location subset (after excluding strains)
    if (any(is.na(data_loc[[trait1]])) || any(is.na(data_loc[[trait2]])) || any(is.na(data_loc[[trait3]]))) {
      cat("\n\nMissing values found for", trait1, ",", trait2, ", or", trait3, "in", loc, ". Imputing missing values.\n")

      # Impute missing values using the mean for each trait within the location
      data_loc <- data_loc %>%
        mutate(across(c(trait1, trait2, trait3), ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
    }

    # Calculate partial correlation using 'psych' package
    partial_cor <- partial.r(data_loc[, c(trait1, trait2, trait3)])  # Calculate partial correlation matrix

    # Extract the relevant partial correlation coefficient
    partial_cor_coef <- partial_cor[trait1, trait2]

    # Perform correlation test to get p-value
    cor_test_result <- cor.test(data_loc[[trait1]], data_loc[[trait2]], method = "pearson")  # Use appropriate method
    p_value <- cor_test_result$p.value

    # Print the results
    cat("\n\nPartial Correlation between", trait1, "and", trait2, 
        "controlling for", trait3, "in", loc, ":\n")
    cat("  Correlation Coefficient:", partial_cor_coef, "\n") 
    cat("  P-value:", p_value, "\n")  # Print the p-value
  }
}

## 
## 
## --- Analyzing Partial Correlations for BALUBAL ---
## 
## 
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Cocoon_Shell_Percentage in BALUBAL :
##   Correlation Coefficient: 0.86665 
##   P-value: 4.494006e-54 
## 
## 
## --- Analyzing Partial Correlations for TCMO ---
## 
## 
## Partial Correlation between Single_Cocoon_Weight and Cocoon_Shell_Weight controlling for Cocoon_Shell_Percentage in TCMO :
##   Correlation Coefficient: 0.9886461 
##   P-value: 1.538226e-28

# 5. Correlation with Location

# Calculate ANOVA for each trait with Location as a factor
for (trait in traits_multivariate) {
  anova_result <- aov(as.formula(paste(trait, "~ Location")), data = data_complete)
  print(summary(anova_result))
}

##              Df Sum Sq Mean Sq F value Pr(>F)    
## Location      1   5.69   5.693   101.9 <2e-16 ***
## Residuals   926  51.74   0.056                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##              Df Sum Sq Mean Sq F value Pr(>F)    
## Location      1 0.5883  0.5883   273.3 <2e-16 ***
## Residuals   926 1.9934  0.0022                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##              Df Sum Sq Mean Sq F value  Pr(>F)    
## Location      1    507   506.8   36.17 2.6e-09 ***
## Residuals   926  12976    14.0                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Visualize the relationship between each trait and Location using boxplots
for (trait in traits_multivariate) {
  boxplot(as.formula(paste(trait, "~ Location")), data = data_complete,
          main = paste("Boxplot of", trait, "by Location"),
          xlab = "Location", ylab = trait)
}

1. Correlation Matrices and Correlograms:

Technical Discussion Points and Considerations:

Correlation Matrices (Pearson and Spearman): 1. Correlation Coefficients: Examine the strength and direction of the correlations between traits. 2. P-values: Assess the statistical significance of the correlations. 3. Comparison: Compare the Pearson and Spearman correlations to see if there are any notable differences, which might indicate non-linear relationships between traits.

Correlograms: 1. Visual Patterns: Observe the overall patterns in the correlograms. 2. Clusters: Identify clusters of traits that are highly correlated with each other. 3. Potential Multicollinearity: Be mindful of strong correlations between multiple traits, as this might indicate multicollinearity, which can affect the interpretation of regression or other multivariate analyses.

2. Individual Correlation Tests and Scatterplots: We will visualize and test the correlations between specific pairs of traits.

Technical Discussion Points and Considerations:

Scatterplots: Visually inspect the scatterplots to assess the linearity or monotonicity of the relationships between trait pairs.
Correlation Coefficients and P-values: Interpret the correlation coefficients and p-values to quantify the strength and significance of the relationships.

3. Scatterplot Matrix: Visualize all pairwise relationships between traits using a scatterplot matrix.

Technical Discussion Points and Considerations:

Overall Patterns: Gain a comprehensive overview of the pairwise relationships between all traits.
Outliers: Identify any potential outliers that might be influencing the correlations.
Non-linear Relationships: Look for any patterns in the scatterplots that suggest non-linear relationships between traits.

4. Heatmap of Trait Values by Strain:

CONCLUSION

This comprehensive analysis has provided insights into the distribution, relationships, and differences in silkworm phenotypic traits across various strains. We’ve addressed issues of non-normality and heteroscedasticity using transformations and non-parametric tests where appropriate. Multivariate techniques like PCA and clustering helped uncover underlying patterns in the data, while correlation analysis revealed the interdependencies between traits. These findings can guide further research and breeding efforts in silkworms.

Remember:

Interpret the results in the context of your specific research questions and biological knowledge. Consider exploring additional analyses or visualizations to gain further insights into your data. Clearly document your findings and conclusions, highlighting the key takeaways and their implications.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Multivariate Analysis of Silkworm Hybrid Phenotypes Across Two Locations in the Philippines

Jeshurun Asher Tarun

2024-10-28

INTRODUCTION

Data Loading and Pre-processing

Data Exploration and Assumption Checking

ANOVA, Transformation, and Non-Parametric Testing

ANOVA, Transformation, and Non-Parametric Testing for Other Traits (Both Locations)

Multivariate Analysis

Multivariate Analysis (Combined Locations, Excluding Larval_Weight)

Correlation Analysis

Correlation Analysis (Combined Locations)

CONCLUSION