Question 1: (60 points) A researcher is interested in studying three types of fertilization methods (100 lb., 150 lb., and 200 lb.) and two levels of irrigation (A and B) on biomass yield. The possible treatment combinations were randomly assigned to 30 plots of land, where each treatment was assigned the same number of plots. You can find the data you need for this exercise in the file Biomass.csv (Canvas/ Module 3/ Resources). Answer the following questions using a significance level of 0.05. Note: You MUST read the Biomass.csv file into R using the following way:

Open a code chunk in your notebook, insert the following code, and run the code chunk:

getwd()
## [1] "C:/Users/cerpa/OneDrive/Documents/CAP3330_Fall2023"
Biomass_df = read.csv ("Biomass.csv",colClasses = c("numeric","factor","factor"))

Once you do that, a data frame called “Biomass_df” will be available in R with the data you need to answer this question.

If you attempt to read the file in a different way (for example, the way we do it in class using the Import option), you will read the file incorrectly and your results will be wrong.

  1. How many replications were taken in this study? Show how you got the answer.
# Count the number of replications
replications <- table(Biomass_df$fertilizer, Biomass_df$irrigation)
replications
##      
##       A B
##   100 5 5
##   150 5 5
##   200 5 5
# There are 5 replications: each of the 6 treatment combinations (3 fertilizer levels x 2 irrigation levels) was assigned to 5 plots, and 6 x 5 = 30 plots in total.
  2. Identify:
     The outcome variable #Biomass yield
     The factor (or factors) #Fertilizer and irrigation
     The levels of the factor (or levels of the factors) #Fertilizer levels: 100, 150, 200. Irrigation levels: A, B

  3. Set up all the hypotheses that should be set up to run this ANOVA.

#For Fertilization:

#Ho: The mean biomass yield is the same for all fertilizer levels.
#Ha: The mean biomass yield differs for at least two fertilizer levels.

#For Irrigation:

#Ho: The mean biomass yield is the same for both irrigation levels.
#Ha: The mean biomass yield differs between the two irrigation levels.

#For Interaction between Fertilization and Irrigation:

#Ho: There is no interaction effect (the effect of fertilizer on biomass yield does not depend on irrigation).
#Ha: There is an interaction effect (the effect of fertilizer on biomass yield depends on irrigation).

  4. Analyze the results and discuss which effects (among all possible effects) are statistically significant. Justify your answer.
# Run ANOVA
anova_biomass <- aov(biomass ~ fertilizer * irrigation, data = Biomass_df)

# Summary of ANOVA
summary(anova_biomass)
##                       Df  Sum Sq Mean Sq F value  Pr(>F)   
## fertilizer             2  500454  250227   3.536 0.04508 * 
## irrigation             1  707175  707175   9.994 0.00422 **
## fertilizer:irrigation  2  557130  278565   3.937 0.03322 * 
## Residuals             24 1698195   70758                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#The effect of fertilizer on biomass yield is statistically significant (F = 3.536, p = 0.04508 < 0.05).
#The effect of irrigation on biomass yield is statistically significant (F = 9.994, p = 0.00422 < 0.05).
#The interaction effect between fertilizer and irrigation is statistically significant (F = 3.937, p = 0.03322 < 0.05).
#In summary, all three effects are significant at the 0.05 level. Because the interaction is significant, the main effects should be interpreted with caution: the effect of fertilizer on biomass yield depends on the irrigation level.
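The comparison of each p-value against the significance level can also be done in code rather than by reading the table. The following is a minimal sketch, not part of the required solution. It runs on a synthetic stand-in data frame (`demo_df`, made-up random numbers with the same layout as the assignment data), and the names `demo_df`, `demo_fit`, `effects`, and `decision` are hypothetical.

```r
# Synthetic stand-in for Biomass_df (made-up numbers, NOT the assignment data):
# 3 fertilizer levels x 2 irrigation levels x 5 replications = 30 rows
set.seed(1)
demo_df <- expand.grid(rep = 1:5,
                       fertilizer = factor(c(100, 150, 200)),
                       irrigation = factor(c("A", "B")))
demo_df$biomass <- rnorm(nrow(demo_df), mean = 1000, sd = 250)

demo_fit <- aov(biomass ~ fertilizer * irrigation, data = demo_df)

# summary() of an aov fit is a list; its first element is the ANOVA table
anova_table <- summary(demo_fit)[[1]]

# Keep only the rows that have a p-value (this drops the Residuals row)
effects <- anova_table[!is.na(anova_table[["Pr(>F)"]]), ]

alpha <- 0.05
decision <- data.frame(
  effect      = trimws(rownames(effects)),
  p_value     = effects[["Pr(>F)"]],
  significant = effects[["Pr(>F)"]] < alpha
)
print(decision)
```

To apply the same steps to the assignment data, replace `demo_fit` with `anova_biomass`.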

  5. Construct the interaction plot and explain why it clearly shows the presence of interaction in this problem.

Note: Stating that “the interaction is evident because the lines are not parallel” does NOT count as a valid explanation.

# Construct the interaction plot
interaction.plot(
  x.factor = Biomass_df$fertilizer,
  trace.factor = Biomass_df$irrigation,
  response = Biomass_df$biomass,
  fun = mean
)

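# Explanation:
# The interaction plot shows the mean biomass yield at each fertilizer level, with one line for Irrigation A and one for Irrigation B.
# Interaction is present because the difference in mean yield between Irrigation A and B is not the same at every fertilizer level: the size (and possibly the direction) of the irrigation effect changes as the fertilizer level changes.
# This agrees with the significant fertilizer:irrigation term in the ANOVA results (p = 0.03322).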
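The explanation can be backed with numbers by computing the cell means that the plot draws and the irrigation gap at each fertilizer level. The sketch below uses a small made-up data frame (`toy_biomass`, a hypothetical name, not the assignment data) so that it runs on its own; `cell_means` and `irrigation_gap` are also illustrative names.

```r
# Made-up toy data (NOT the assignment data): 2 plots per treatment combination
toy_biomass <- data.frame(
  biomass    = c(10, 12, 20, 22, 30, 32, 11, 13, 15, 17, 12, 14),
  fertilizer = factor(rep(c(100, 150, 200, 100, 150, 200), each = 2)),
  irrigation = factor(rep(c("A", "B"), each = 6))
)

# Mean response for every fertilizer x irrigation combination;
# these are the points that interaction.plot() connects
cell_means <- with(toy_biomass,
                   tapply(biomass, list(fertilizer, irrigation), mean))
print(cell_means)

# Irrigation effect (B minus A) at each fertilizer level.
# With no interaction these gaps would be roughly equal.
irrigation_gap <- cell_means[, "B"] - cell_means[, "A"]
print(irrigation_gap)
```

In this toy example the gaps are 1, -5, and -18, so the irrigation effect clearly depends on the fertilizer level. Running the same two `tapply` steps on `Biomass_df` gives the corresponding gaps for the assignment data.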

Question 2: (40 points) A study is done to determine if there is a difference in the average strength of a filament fiber produced by three machines. Researchers are also interested in studying the possible effect of changing the filament diameter on strength. Researchers decided to do the analysis using an alpha of 0.10.

The data for this exercise can be found in the file Filament.csv (Canvas/ Module 3/ Resources). You MUST read the Filament.csv file into R using the following way:

Open a code chunk in your notebook, insert the following code, and run the code chunk:

Filament_df = read.csv ("Filament.csv",colClasses = c("numeric","factor","factor"))
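  1. Identify:
     The outcome variable #Strength of the filament fiber
     The factor (or factors) #Machine used to produce the filament, and filament diameter
     The levels of the factor (or levels of the factors) #Machine has 3 levels: A, B, C #Diameter is listed here with 4 levels (0.5, 0.6, 0.7, and 0.8), but the ANOVA table reports 2 degrees of freedom for diameter, which corresponds to 3 levels; levels(Filament_df$diameter) shows the levels actually present in the data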

  2. Analyze the results and discuss which effects (among all possible effects) are statistically significant. Justify your answer.

# Run ANOVA with alpha = 0.10
alpha_level = 0.10
filament_anova = aov(strength ~ machine * diameter, data = Filament_df)
summary(filament_anova)
##                  Df Sum Sq Mean Sq F value Pr(>F)  
## machine           2 140.40   70.20   5.044 0.0519 .
## diameter          2  19.27    9.63   0.692 0.5364  
## machine:diameter  4 103.23   25.81   1.854 0.2377  
## Residuals         6  83.50   13.92                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

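At the 0.10 significance level, only the machine effect is statistically significant (F = 5.044, p = 0.0519 < 0.10), so the mean filament strength differs for at least two machines. The diameter effect (p = 0.5364) and the machine:diameter interaction (p = 0.2377) are not significant, so there is no evidence that filament diameter, alone or in combination with machine, affects mean strength.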

  3. Obtain an estimate of the average strength produced by each of the three machines.
# Calculate the average strength for each machine
aggregate(Filament_df$strength, by=list(Filament_df$machine), FUN=mean)
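The degrees of freedom in the ANOVA table sum to 14, which suggests 15 observations spread over the machine and diameter combinations, so the machines may not all have the same number of observations. A hedged sketch of reporting the group size next to each mean is shown below. It uses made-up numbers (`toy_filament`, a hypothetical name, not the assignment data), and the standard error is computed from each machine's own standard deviation, which is an assumption of this sketch rather than the pooled ANOVA estimate.

```r
# Made-up toy data (NOT the assignment data)
toy_filament <- data.frame(
  strength = c(40, 42, 44, 50, 52, 36, 38),
  machine  = factor(c("A", "A", "A", "B", "B", "C", "C"))
)

# Group size, mean, and standard deviation of strength for each machine
group_n    <- as.vector(tapply(toy_filament$strength, toy_filament$machine, length))
group_mean <- as.vector(tapply(toy_filament$strength, toy_filament$machine, mean))
group_sd   <- as.vector(tapply(toy_filament$strength, toy_filament$machine, sd))

machine_summary <- data.frame(
  machine = levels(toy_filament$machine),
  n       = group_n,
  mean    = group_mean,
  se      = group_sd / sqrt(group_n)  # per-group standard error of the mean
)
print(machine_summary)
```

The same steps apply to the assignment data by substituting `Filament_df` for `toy_filament`.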