Here I examine data that I collected on the amount of the organic solvent HFP retained in electrospun fibers after fabrication.
Below are the first and last six rows of the CSV file that contains all of the data.
HFP<-read.csv("C:/Users/Anthony/Desktop/HFP_CSV.csv",header=TRUE)
# To define the Day after fabrication of electrospun fibers as a factor
HFP$Day=as.factor(HFP$Day)
head(HFP)
## Treatment Day Absorbance
## 1 Air 0 0.9894
## 2 Desiccator 0 0.9877
## 3 Ethanol 0 0.9899
## 4 Hot Plate 0 0.8449
## 5 Incubator 0 0.9914
## 6 Vacuum 0 0.9760
tail(HFP)
## Treatment Day Absorbance
## 457 Desiccator 14 0.8942
## 458 Ethanol 14 0.7901
## 459 Hot Plate 14 0.8805
## 460 Incubator 14 0.8753
## 461 Vacuum 14 0.8949
## 462 Water 14 0.8848
As part of a two-factor, multi-level analysis, I will be focusing on the response variable of ‘Absorbance’.
The first factor that I will analyze is ‘Treatment’. There were seven levels of this factor (seven treatment methods) involved in this experiment. They were Air, Desiccator, Ethanol, Hot Plate, Incubator, Vacuum, and Water.
The second factor that I will consider is the number of days after electrospun fiber fabrication which is denoted as ‘Day’ in the following analysis. Absorbance readings were collected at time points of 0 days, 1 day, 2 days, 3 days, 7 days, and 14 days after electrospun fiber fabrication. Thus there are 6 levels to this second factor.
summary(HFP)
## Treatment Day Absorbance
## Air :66 0 :77 Min. :0.771
## Desiccator:66 1 :77 1st Qu.:0.875
## Ethanol :66 2 :77 Median :0.903
## Hot Plate :66 3 :77 Mean :0.909
## Incubator :66 7 :77 3rd Qu.:0.940
## Vacuum :66 14:77 Max. :1.029
## Water :66
The only continuous variable in this experiment was the response variable, ‘Absorbance’. The variables ‘Treatment’ and ‘Day’ were both categorical.
As previously mentioned, the response variable of this experimentation is ‘Absorbance’. This variable refers to the infrared absorbance of the HFP solvent that was found using Fourier Transform Infrared Spectroscopy. The purpose of this experiment was to monitor the presence of HFP in the electrospun fibers after fabrication. The different treatment methods were used to determine an effective way of solvent removal. Data was taken at different time points (listed as ‘Day’ in the data set) to also determine how long the solvent would be retained in the fibers with and without certain treatments.
This data set contains three columns: ‘Treatment’, ‘Day’, and ‘Absorbance’. As previously discussed, ‘Absorbance’ is the response variable, which is analyzed with respect to the two factors ‘Treatment’ and ‘Day’ and their respective levels.
The data from this experiment was collected in a random manner. Electrospun fiber samples were fabricated 15 at a time and certain samples were randomly selected for FTIR experimentation. Out of the samples that were originally selected at random there were subsets of samples that were randomly selected for certain treatments, and to be analyzed at certain time points.
How will the experiment be organized and conducted to test the hypothesis?
This experiment was organized by first analyzing solvent retention in fibers using FTIR. Next, the electrospun fibers were treated in a number of different ways to determine the most efficient method of solvent removal. FTIR was then performed again at further time points to determine the efficiency of solvent removal methods compared to simply allowing for a sufficient amount of time to pass to allow for further solvent diffusion out of the fibers.
In this statistical analysis I will begin by analyzing all levels of the factor ‘Treatment’. Six of the levels are experimental treatments, and ‘Air’ was the control group for the experiment. I am doing this to see which of the experimental treatments have statistically significant effects on solvent removal from the electrospun fibers.
Next, I will look at the second factor which is days after fabrication to determine if there is a difference in the amount of solvent retained immediately after fabrication and at a certain time point post fabrication. Thus, I will analyze the amount of solvent retention at all time points.
What is the rationale for this design?
The experiment was designed this way to analyze the two most important factors that affect the amount of solvent retained in electrospun fibers: treatment method and time after fabrication.
Randomize: What is the Randomization Scheme?
Randomization was performed as previously described. Electrospun fiber samples were fabricated 15 at a time, and samples were randomly selected from each batch for FTIR experimentation. From those selected samples, subsets were then randomly assigned to the treatments and to the time points at which they would be analyzed.
Replicate: Are there replicates and/or repeated measures?
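Each treatment method had 11 replicates (n = 11) at each time point, which is consistent with the 462 rows in the data set (7 treatments × 6 days × 11 replicates). The data in the csv file are the averages of the 11 groups. There were no repeated measures.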
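The replicate structure can be checked by counting the rows in every Treatment × Day cell. The summary above shows 66 rows per treatment and 77 rows per day, which corresponds to 11 rows per cell if the design is balanced. The sketch below is only an illustration: because the CSV is not bundled with this report, it builds a simulated stand-in data frame (`HFP_sim`, a hypothetical name with made-up absorbance values) that has the same layout as `HFP`.

```r
# Simulated stand-in with the same layout as HFP.
# The absorbance values are random numbers, NOT the real measurements.
set.seed(1)
HFP_sim <- expand.grid(
  Treatment = c("Air", "Desiccator", "Ethanol", "Hot Plate",
                "Incubator", "Vacuum", "Water"),
  Day = factor(c(0, 1, 2, 3, 7, 14)),
  Replicate = 1:11
)
HFP_sim$Absorbance <- rnorm(nrow(HFP_sim), mean = 0.9, sd = 0.04)

# Count the observations in every Treatment x Day cell
cell_counts <- table(HFP_sim$Treatment, HFP_sim$Day)
print(cell_counts)

# TRUE when every cell holds exactly 11 readings (a balanced design)
balanced <- all(cell_counts == 11)
print(balanced)
```

Running the same `table()` call on the real `HFP` data frame would show whether every treatment and day combination actually has 11 readings.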
Block: Did you use blocking in the design?
The experimental units in this experiment were ‘blocked’ according to their treatment groups.
Below are boxplots of the Absorbances of all levels of the two factors of interest.
boxplot(Absorbance~Treatment,data=HFP,ylab="Absorbance",xlab="Treatment",las=1,names=c("Air","Desiccator","Ethanol","Hot Plate","Incubator","Vacuum","Water"))
boxplot(Absorbance~Day,data=HFP,ylab="Absorbance",xlab="Day")
Looking at the box plots above, it appears that the absorbance value, which represents the amount of solvent retained, is lowest for the experimental units treated in ethanol. It also seems that the amount of solvent in the fibers decreases as time progresses. There are also some outlying data points in several groups.
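The visual impression from the box plots can be backed up with the group means themselves. The sketch below shows one way to tabulate them with `aggregate()`. It runs on a simulated stand-in (`HFP_sim`, a hypothetical name) in which I have deliberately lowered the Ethanol values, so its numbers illustrate the method only and say nothing about the real data.

```r
# Simulated stand-in with the same layout as HFP (values are made up;
# the Ethanol group is shifted down on purpose for illustration).
set.seed(2)
HFP_sim <- expand.grid(
  Treatment = c("Air", "Desiccator", "Ethanol", "Hot Plate",
                "Incubator", "Vacuum", "Water"),
  Day = factor(c(0, 1, 2, 3, 7, 14)),
  Replicate = 1:11
)
HFP_sim$Absorbance <- 0.9 - 0.07 * (HFP_sim$Treatment == "Ethanol") +
  rnorm(nrow(HFP_sim), sd = 0.03)

# Mean absorbance for each level of each factor
means_by_treatment <- aggregate(Absorbance ~ Treatment, data = HFP_sim, FUN = mean)
means_by_day <- aggregate(Absorbance ~ Day, data = HFP_sim, FUN = mean)

# Sort treatments from lowest to highest mean absorbance
ranked <- means_by_treatment[order(means_by_treatment$Absorbance), ]
print(ranked)
print(means_by_day)
```

Replacing `HFP_sim` with `HFP` would give the actual means that the box plots summarize.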
Here the first two Analyses of Variance (ANOVA) are used to analyze the differences in the amount of solvent retention with regards to the treatment and time after fabrication. The third ANOVA test analyzes the interaction effect between the two factors.
# Assign models for the data of interest
model_Treatment=aov(Absorbance~Treatment,data=HFP)
model_Day=aov(Absorbance~Day,data=HFP)
model_Treatment_Day=aov(Absorbance~Treatment*Day,data=HFP)
# Perform ANOVA on the three models
anova(model_Treatment)
## Analysis of Variance Table
##
## Response: Absorbance
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 6 0.356 0.0593 34.6 <2e-16 ***
## Residuals 455 0.779 0.0017
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model_Day)
## Analysis of Variance Table
##
## Response: Absorbance
## Df Sum Sq Mean Sq F value Pr(>F)
## Day 5 0.458 0.0917 61.8 <2e-16 ***
## Residuals 456 0.677 0.0015
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model_Treatment_Day)
## Analysis of Variance Table
##
## Response: Absorbance
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 6 0.356 0.0593 156.3 <2e-16 ***
## Day 5 0.458 0.0917 241.7 <2e-16 ***
## Treatment:Day 30 0.162 0.0054 14.2 <2e-16 ***
## Residuals 420 0.159 0.0004
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA of absorbance by treatment method returned a p-value below 2e-16. A p-value this small means that, if treatment method had no effect, differences in mean absorbance as large as those observed would almost never arise by chance alone. I therefore reject the null hypothesis that all treatment methods produce the same mean absorbance.
The ANOVA of absorbance by time point of analysis also returned a p-value below 2e-16. R prints <2e-16 for any p-value smaller than about 2.2e-16 (its machine precision), so the exact value is not displayed. Again, there is a very small probability that the variation in absorbance across time points is due to chance alone, so I reject the null hypothesis that mean absorbance is the same at all time points.
Because both one-factor ANOVAs indicated that each factor affects the absorbance of the samples, I then performed a two-factor ANOVA that includes the interaction between them. The p-value for the interaction term was once again below 2e-16, which indicates that the effect of treatment on absorbance depends on the time point, and that this dependence is very unlikely to be due to chance alone.
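The printed ANOVA tables only show that the p-values are very small. If the unrounded values are of interest, they can be read directly from the table object that `anova()` returns. The sketch below is a minimal illustration on a simulated stand-in (`HFP_sim`, a hypothetical name with made-up values and an artificial Ethanol effect), so the p-values it prints are not those of the real experiment.

```r
# Simulated stand-in with the same layout as HFP (values are made up)
set.seed(42)
HFP_sim <- expand.grid(
  Treatment = c("Air", "Desiccator", "Ethanol", "Hot Plate",
                "Incubator", "Vacuum", "Water"),
  Day = factor(c(0, 1, 2, 3, 7, 14)),
  Replicate = 1:11
)
HFP_sim$Absorbance <- 0.9 - 0.07 * (HFP_sim$Treatment == "Ethanol") +
  rnorm(nrow(HFP_sim), sd = 0.03)

# Two-factor model with interaction, as in model_Treatment_Day
fit <- aov(Absorbance ~ Treatment * Day, data = HFP_sim)
tab <- anova(fit)

# The table is a data frame; the p-value column is named "Pr(>F)".
# The Residuals row has no test, so its entry is NA.
p_values <- setNames(tab[["Pr(>F)"]], rownames(tab))
print(p_values)

# Machine precision, roughly 2.2e-16
print(.Machine$double.eps)
```

Applied to `model_Treatment_Day`, the same extraction would give the stored p-values for Treatment, Day, and their interaction.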
To expand on the results of the ANOVA tests, I performed Tukey’s Honest Significant Difference (HSD) test to determine which levels of each factor are statistically different from each other.
TukeyHSD(model_Treatment,conf.level=0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Absorbance ~ Treatment, data = HFP)
##
## $Treatment
## diff lwr upr p adj
## Desiccator-Air 0.0096848 -0.0116523 0.03102 0.8306
## Ethanol-Air -0.0654909 -0.0868281 -0.04415 0.0000
## Hot Plate-Air -0.0449773 -0.0663144 -0.02364 0.0000
## Incubator-Air -0.0375788 -0.0589160 -0.01624 0.0000
## Vacuum-Air 0.0065561 -0.0147811 0.02789 0.9709
## Water-Air 0.0002864 -0.0210508 0.02162 1.0000
## Ethanol-Desiccator -0.0751758 -0.0965129 -0.05384 0.0000
## Hot Plate-Desiccator -0.0546621 -0.0759993 -0.03332 0.0000
## Incubator-Desiccator -0.0472636 -0.0686008 -0.02593 0.0000
## Vacuum-Desiccator -0.0031288 -0.0244660 0.01821 0.9995
## Water-Desiccator -0.0093985 -0.0307357 0.01194 0.8497
## Hot Plate-Ethanol 0.0205136 -0.0008235 0.04185 0.0686
## Incubator-Ethanol 0.0279121 0.0065749 0.04925 0.0023
## Vacuum-Ethanol 0.0720470 0.0507098 0.09338 0.0000
## Water-Ethanol 0.0657773 0.0444401 0.08711 0.0000
## Incubator-Hot Plate 0.0073985 -0.0139387 0.02874 0.9477
## Vacuum-Hot Plate 0.0515333 0.0301962 0.07287 0.0000
## Water-Hot Plate 0.0452636 0.0239265 0.06660 0.0000
## Vacuum-Incubator 0.0441348 0.0227977 0.06547 0.0000
## Water-Incubator 0.0378652 0.0165280 0.05920 0.0000
## Water-Vacuum -0.0062697 -0.0276069 0.01507 0.9768
TukeyHSD(model_Day,conf.level=0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Absorbance ~ Day, data = HFP)
##
## $Day
## diff lwr upr p adj
## 1-0 -0.064205 -0.08197 -0.046437 0.0000
## 2-0 -0.075462 -0.09323 -0.057694 0.0000
## 3-0 -0.078261 -0.09603 -0.060493 0.0000
## 7-0 -0.085862 -0.10363 -0.068094 0.0000
## 14-0 -0.097036 -0.11480 -0.079268 0.0000
## 2-1 -0.011257 -0.02903 0.006511 0.4583
## 3-1 -0.014056 -0.03182 0.003713 0.2112
## 7-1 -0.021657 -0.03943 -0.003889 0.0070
## 14-1 -0.032831 -0.05060 -0.015063 0.0000
## 3-2 -0.002799 -0.02057 0.014970 0.9977
## 7-2 -0.010400 -0.02817 0.007368 0.5490
## 14-2 -0.021574 -0.03934 -0.003806 0.0074
## 7-3 -0.007601 -0.02537 0.010167 0.8249
## 14-3 -0.018775 -0.03654 -0.001007 0.0314
## 14-7 -0.011174 -0.02894 0.006594 0.4669
Above are the results of Tukey’s HSD test, which compares the mean absorbance values for every pair of treatments and every pair of time points. For any pair with an adjusted p-value < 0.05, the difference between the two mean absorbances is statistically significant at a family-wise confidence level of 95%, meaning it is unlikely to be due to chance alone.
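With 21 treatment pairs and 15 day pairs, it can help to list only the comparisons that fall below the 0.05 threshold. The object returned by `TukeyHSD()` holds one matrix per factor, with the adjusted p-values in the column named `p adj`, so the rows can be filtered directly. The sketch below shows this on a simulated stand-in (`HFP_sim`, a hypothetical name with made-up values and an artificial Ethanol effect); the pairs it flags are not results from the real data.

```r
# Simulated stand-in with the same layout as HFP (values are made up)
set.seed(7)
HFP_sim <- expand.grid(
  Treatment = c("Air", "Desiccator", "Ethanol", "Hot Plate",
                "Incubator", "Vacuum", "Water"),
  Day = factor(c(0, 1, 2, 3, 7, 14)),
  Replicate = 1:11
)
HFP_sim$Absorbance <- 0.9 - 0.07 * (HFP_sim$Treatment == "Ethanol") +
  rnorm(nrow(HFP_sim), sd = 0.03)

# One-factor model and Tukey comparisons, as for model_Treatment
fit_treatment <- aov(Absorbance ~ Treatment, data = HFP_sim)
tukey <- TukeyHSD(fit_treatment, conf.level = 0.95)

# Matrix with columns diff, lwr, upr and "p adj"; one row per pair
pairs_all <- tukey$Treatment

# Keep only the pairs whose adjusted p-value is below 0.05
significant <- pairs_all[pairs_all[, "p adj"] < 0.05, , drop = FALSE]
print(round(significant, 4))
```

The same filter applied to `TukeyHSD(model_Treatment)` and `TukeyHSD(model_Day)` would reduce the tables above to the significant pairs.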
To check the adequacy of ANOVA as a means of analyzing this data set, I created Quantile-Quantile (Q-Q) plots of the residuals to determine whether they follow a normal distribution. I also created an interaction plot to see if there was an interaction effect between the two factors.
The nearly linear fit of the residuals in the two Q-Q plots is an indication that the ANOVA model may be adequate for this analysis.
The interaction plot following the Q-Q plots suggests that the two factors interact with each other in their effect on the response variable wherever curves on the plot intersect.
The third type of plot is a Residuals vs. Fits plot, which is used to check whether the residuals are scattered evenly around zero and to identify any outlying values. Because the residuals appear to be centered around zero more closely for the ‘Day’ model than for the ‘Treatment’ model, the model used in this analysis may not be completely accurate for determining the effect of ‘Treatment’ on the amount of solvent retention.
# QQ plot for residuals of the Treatment model
qqnorm(residuals(model_Treatment))
qqline(residuals(model_Treatment))
# QQ plot for residuals of the Day model
qqnorm(residuals(model_Day))
qqline(residuals(model_Day))
interaction.plot(HFP$Treatment,HFP$Day,HFP$Absorbance)
plot(fitted(model_Treatment),residuals(model_Treatment))
plot(fitted(model_Day),residuals(model_Day))
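The plots above are judged by eye and cover the two one-factor models. As a possible complement, the residuals of the full interaction model could be checked with formal tests. The sketch below uses the Shapiro-Wilk test for normality and Bartlett's test for equal variances across the Treatment × Day cells; choosing these two tests is my own assumption rather than part of the original analysis. It runs on a simulated stand-in (`HFP_sim`, a hypothetical name with made-up values), so its output says nothing about the real data.

```r
# Simulated stand-in with the same layout as HFP (values are made up)
set.seed(11)
HFP_sim <- expand.grid(
  Treatment = c("Air", "Desiccator", "Ethanol", "Hot Plate",
                "Incubator", "Vacuum", "Water"),
  Day = factor(c(0, 1, 2, 3, 7, 14)),
  Replicate = 1:11
)
HFP_sim$Absorbance <- 0.9 - 0.07 * (HFP_sim$Treatment == "Ethanol") +
  rnorm(nrow(HFP_sim), sd = 0.03)

# Full two-factor model with interaction
fit_full <- aov(Absorbance ~ Treatment * Day, data = HFP_sim)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
normality <- shapiro.test(residuals(fit_full))
print(normality)

# Bartlett's test: a small p-value suggests unequal variances
# across the 42 Treatment x Day cells
equal_var <- bartlett.test(Absorbance ~ interaction(Treatment, Day),
                           data = HFP_sim)
print(equal_var)
```

With 462 observations, these tests can flag very small departures from the assumptions, so they are best read alongside the Q-Q and Residuals vs. Fits plots rather than in place of them.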