Recipe 4: Completely Randomized Block Designs from Literature


Recipes for the Design of Experiments: Recipe Outline

as of August 28, 2014, superseding the version of August 24. Always use the most recent version.

Fruit Intake Analysis

Caroline Hsia

Rensselaer Polytechnic Institute

October 16, 2014 V1

1. Setting

System under test

This dataset was gathered from a study conducted by researchers in the United Kingdom. Inadequate intake of fruits and vegetables affects populations worldwide and has been estimated to account for 5% of deaths in the world. Although people have been informed of the health risks of not eating enough fruits and vegetables, inadequate intake remains a problem today. The researchers wished to test a number of hypotheses relating three conventional socioeconomic status indicators and three financial hardship measures to both the variety and the quantity of fruits and vegetables eaten by British men and women.

The study design used three multi-level factors to define socioeconomic status: Social Class, Education, and Home Ownership. Gender (Men and Women) was an additional factor. The response variables were fruit variety, vegetable variety, fruit intake, and vegetable intake.

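remove(list = ls())   # clear the workspace

# Choose the CSV file interactively. stringsAsFactors = TRUE keeps Education and
# Gender as factors, as in the output below (the default became FALSE in R 4.0.0)
x <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE)
attach(x)             # lets later code refer to columns such as Fruit.Quantity by name
head(x)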
##   X Education Gender Fruit.Quantity Fruit.Variety
## 1 1    Degree  Women            318           8.2
## 2 2    Degree  Women            317           8.2
## 3 3   A-level  Women            307           8.0
## 4 4   A-level  Women            306           7.9
## 5 5   O-level  Women            291           7.8
## 6 6   O-level  Women            291           7.8
summary(x)
##        X                    Education   Gender  Fruit.Quantity
##  Min.   : 1.00   A-level         :4   Men  :8   Min.   :231   
##  1st Qu.: 4.75   Degree          :4   Women:8   1st Qu.:234   
##  Median : 8.50   No qualification:4             Median :276   
##  Mean   : 8.50   O-level         :4             Mean   :271   
##  3rd Qu.:12.25                                  3rd Qu.:296   
##  Max.   :16.00                                  Max.   :318   
##  Fruit.Variety 
##  Min.   :6.30  
##  1st Qu.:6.78  
##  Median :7.40  
##  Mean   :7.32  
##  3rd Qu.:7.83  
##  Max.   :8.20

Factors and Levels

Because of the way the data were collected and presented, this analysis focuses on only the Education and Gender factors. The Education factor has 4 levels: Degree (>= 16 years of schooling), A-level (Advanced level; >= 13 years of schooling with no degree), O-level (11 years of schooling), and No qualification (< 11 years of schooling). The Gender factor has two levels: Men and Women. The response variable is Fruit Quantity, which was calculated from the number of pieces of fruit consumed over three months, converted into grams and expressed as grams per day.

tail(x)
##     X        Education Gender Fruit.Quantity Fruit.Variety
## 11 11          A-level    Men            233           6.8
## 12 12          A-level    Men            233           6.8
## 13 13          O-level    Men            239           6.7
## 14 14          O-level    Men            234           6.7
## 15 15 No qualification    Men            231           6.3
## 16 16 No qualification    Men            233           6.3

Continuous variables (if any)

The continuous variables measured in the study were fruit intake, vegetable intake, fruit variety, and vegetable variety. The data set analyzed here contains only two of them: Fruit.Quantity and Fruit.Variety.

Response variables

The response variables were also the continuous variables: fruit intake, vegetable intake, fruit variety, and vegetable variety.

The Data: How is it organized and what does it look like?

str(x)
## 'data.frame':    16 obs. of  5 variables:
##  $ X             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Education     : Factor w/ 4 levels "A-level","Degree",..: 2 2 1 1 4 4 3 3 2 2 ...
##  $ Gender        : Factor w/ 2 levels "Men","Women": 2 2 2 2 2 2 2 2 1 1 ...
##  $ Fruit.Quantity: int  318 317 307 306 291 291 292 291 261 258 ...
##  $ Fruit.Variety : num  8.2 8.2 8 7.9 7.8 7.8 7.3 7.4 7.5 7.4 ...

The published data were displayed in a single table that did not contain a value for every factor at every data point. This made the data hard to use for a joint analysis of all the socioeconomic status factors, so the data were subset to the Education and Gender factors. The resulting data set has 16 observations: two values of Fruit Quantity for each of the 4 × 2 = 8 combinations of education level and gender.
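Because head(), str() and tail() together print all 16 values of Fruit.Quantity, the subset can be rebuilt directly in R. The sketch below does that, assuming the CSV holds exactly these rows in this order; the data frame name `fruit` is hypothetical (the recipe itself reads the file into `x`). Cross-tabulating the two factors shows the balanced layout of two observations per cell.

```r
# Data reconstructed from the head(), str() and tail() output above
# (assumption: the original CSV holds exactly these 16 rows in this order)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

# Cross-tabulate the two factors: a balanced layout has the same count in every cell
cell_counts <- table(fruit$Education, fruit$Gender)
print(cell_counts)
```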

Randomization

The analysis uses a factorial, randomized block design.

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

These data are publicly available for anyone to use and analyze. The original analysis and exploration of the data were performed by the researchers in the United Kingdom.

The null hypothesis that will be tested is:

The variation in Fruit Quantity intake cannot be explained by anything other than random variation; that is, Gender, Education, and their interaction have no effect on mean fruit intake.
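One way to make this null hypothesis precise is the standard two-way fixed-effects model with interaction, which is what model3 fits later in this recipe. The notation is ours rather than the source's: $y_{ijk}$ is the k-th fruit quantity observation for Gender level $i$ and Education level $j$, $\tau_i$ and $\beta_j$ are the Gender and Education effects, $(\tau\beta)_{ij}$ is their interaction, the effects are assumed to sum to zero, and the errors $\varepsilon_{ijk}$ are assumed independent $N(0, \sigma^2)$. Each part of $H_0$ is tested by its own F test in the ANOVA table.

```latex
\[
y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1, 2;\; j = 1, \dots, 4;\; k = 1, 2
\]
\[
H_0:\ \tau_1 = \tau_2 = 0, \qquad \beta_1 = \dots = \beta_4 = 0, \qquad
(\tau\beta)_{ij} = 0 \ \text{for all } i, j
\]
```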

What is the rationale for this design?

The rationale for collecting the data was simply to gather information about the socioeconomic status factors that might contribute to malnutrition in the United Kingdom.

Randomize: What is the Randomization Scheme?

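Gender and education level cannot be assigned at random, so no randomization scheme was applied. The data are observational and are analyzed as if they came from a factorial, randomized block design.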

Replicate: Are there replicates and/or repeated measures?

There was only one replicate for each combination of education level and gender, giving two observations per cell.

Block: Did you use blocking in the design?

I used blocking on socioeconomic status in my analysis, choosing Education as the blocking factor.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

# Exploratory plots: a histogram of the response, then boxplots by each factor
xsub <- subset(x, X <= 17)   # X runs from 1 to 16, so this keeps every row
par(mfrow = c(1, 1))

hist(Fruit.Quantity)

[Figure: histogram of Fruit.Quantity]

boxplot(Fruit.Quantity~Gender, data = x)

[Figure: boxplot of Fruit.Quantity by Gender]

boxplot(Fruit.Quantity~Education)

[Figure: boxplot of Fruit.Quantity by Education]
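By default R orders factor levels alphabetically, so the Education boxplot shows A-level, Degree, No qualification, O-level, which does not follow years of schooling. A minimal sketch of reordering the levels before plotting, using the 16 values reconstructed from the printed output (the name `fruit` is hypothetical, and the level order is our choice based on the schooling definitions given under Factors and Levels):

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

# Put Education levels in order of increasing schooling instead of alphabetical order
schooling_order <- c("No qualification", "O-level", "A-level", "Degree")
fruit$Education <- factor(fruit$Education, levels = schooling_order)

# Group medians and boxes now appear from least to most schooling
meds <- tapply(fruit$Fruit.Quantity, fruit$Education, median)
print(meds)
boxplot(Fruit.Quantity ~ Education, data = fruit,
        xlab = "Education", ylab = "Fruit quantity (g/day)")
```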

# Make sure both predictors are factors (aov() below reads them from x)
x$Gender <- as.factor(x$Gender)
x$Education <- as.factor(x$Education)

Testing

A two-way analysis of variance (ANOVA) was performed to analyze the effects of gender and education level on fruit quantity intake. One-way ANOVAs for each factor were fitted first (model1 and model2), followed by the two-way model with interaction (model3), which is used to test the null hypothesis stated above.
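For a balanced design like this one (two observations in every Gender × Education cell), the sums of squares in the two-way table can be computed directly from marginal and cell means. The sketch below does this on the 16 values reconstructed from the printed output (hypothetical name `fruit`); the simple subtraction used for the interaction term relies on the design being balanced and would not give the sequential sums of squares for unbalanced data.

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

y <- fruit$Fruit.Quantity
grand <- mean(y)

# ave() replaces each observation by the mean of its group
gender_mean <- ave(y, fruit$Gender)
educ_mean   <- ave(y, fruit$Education)
cell_mean   <- ave(y, fruit$Gender, fruit$Education)

ss_gender <- sum((gender_mean - grand)^2)
ss_educ   <- sum((educ_mean - grand)^2)
ss_cells  <- sum((cell_mean - grand)^2)
ss_inter  <- ss_cells - ss_gender - ss_educ  # valid here because the design is balanced
ss_error  <- sum((y - cell_mean)^2)          # within-cell variation
ss_total  <- sum((y - grand)^2)

print(c(Gender = ss_gender, Education = ss_educ, Interaction = ss_inter,
        Residuals = ss_error, Total = ss_total))
```

The printed values should match the Sum Sq column of the two-way table (15068, 1784, 209 and 21) after rounding, and the four components add up to the total sum of squares.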

One-way ANOVA model for the effect of Gender on Fruit Quantity.

model1 <- aov(Fruit.Quantity ~ Gender, data = x)
anova(model1)
## Analysis of Variance Table
## 
## Response: Fruit.Quantity
##           Df Sum Sq Mean Sq F value Pr(>F)    
## Gender     1  15068   15068     105  7e-08 ***
## Residuals 14   2013     144                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
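With only two groups, a one-way ANOVA is equivalent to a pooled-variance two-sample t test: the F statistic equals t squared and the p-values coincide. This short check on the 16 values reconstructed from the printed output (hypothetical name `fruit`) is one way to see that model1 is simply a comparison of the women's and men's means, which differ by 61.375 g/day.

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

# Pooled-variance (var.equal = TRUE) two-sample t test of Men vs Women
tt <- t.test(Fruit.Quantity ~ Gender, data = fruit, var.equal = TRUE)

# One-way ANOVA with the same single factor, as in model1
tab1 <- anova(aov(Fruit.Quantity ~ Gender, data = fruit))
f_value <- tab1[["F value"]][1]
p_anova <- tab1[["Pr(>F)"]][1]

print(c(t_squared = unname(tt$statistic)^2, F = f_value))
print(c(t_test_p = tt$p.value, anova_p = p_anova))
print(diff(unname(tt$estimate)))  # Women mean minus Men mean
```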

One-way ANOVA model for the effect of Education (the socioeconomic status blocking factor) on Fruit Quantity.

model2 <- aov(Fruit.Quantity ~ Education, data = x)
anova(model2)
## Analysis of Variance Table
## 
## Response: Fruit.Quantity
##           Df Sum Sq Mean Sq F value Pr(>F)
## Education  3   1784     595    0.47   0.71
## Residuals 12  15297    1275

Two-way ANOVA model for the effects of Gender and Education (the socioeconomic status blocking factor), including their interaction, on Fruit Quantity.

model3 <- aov(Fruit.Quantity ~ Gender * Education, data = x)
anova(model3)
## Analysis of Variance Table
## 
## Response: Fruit.Quantity
##                  Df Sum Sq Mean Sq F value  Pr(>F)    
## Gender            1  15068   15068  5880.0 9.3e-13 ***
## Education         3   1784     595   232.0 4.1e-08 ***
## Gender:Education  3    209      70    27.2 0.00015 ***
## Residuals         8     21       3                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the two-way ANOVA (model3), all three terms are significant at the 0.05 level: Gender (p = 9.3e-13), Education (p = 4.1e-08), and the Gender:Education interaction (p = 0.00015). The probability of observing variation this large if fruit intake depended on neither factor is therefore very small, so we reject the null hypothesis: the variation in fruit intake quantity is likely associated with gender, education level, and their interaction. The one-way ANOVAs tell a different story for Education. Gender alone is significant (p = 7e-08), but Education alone is not (p = 0.71), because in the one-way model the large gender effect remains in the residuals and swamps the education effect. Education appears as significant only once gender is accounted for.
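Why is Education not significant in model2 but highly significant in model3? In a balanced design, the residual sum of squares of the Education-only model equals the Gender, interaction and residual sums of squares of the two-way model added together, so the gender effect inflates the error term of model2. A sketch on the 16 values reconstructed from the printed output (hypothetical name `fruit`):

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

m_educ <- aov(Fruit.Quantity ~ Education, data = fruit)           # like model2
m_full <- aov(Fruit.Quantity ~ Gender * Education, data = fruit)  # like model3

rss_educ_only <- deviance(m_educ)   # residual sum of squares of the one-way model
ss_full <- anova(m_full)[["Sum Sq"]]  # rows: Gender, Education, Gender:Education, Residuals
absorbed <- sum(ss_full[c(1, 3, 4)])  # Gender + interaction + residual

print(c(educ_only_residual_SS = rss_educ_only, gender_interaction_residual_SS = absorbed))
print(c(MS_residual_model2 = rss_educ_only / df.residual(m_educ),
        MS_residual_model3 = ss_full[4] / df.residual(m_full)))
```

Both sums come out at 15297.25. Divided by 12 residual degrees of freedom this is a mean square of about 1275, against about 2.56 in the two-way model, which is why the same Education sum of squares (1784) gives F = 0.47 in model2 and F = 232 in model3.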

Diagnostics/Model Adequacy Checking

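Normal QQ plots are used to check the normality assumption. The assumption concerns the model errors, so the QQ plot of the model3 residuals is the relevant one; the QQ plot of the raw Fruit.Quantity values is shown for comparison.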
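With exactly two observations per cell, each fitted value in model3 is the cell mean, so the two residuals in a cell are equal in size and opposite in sign. The residual QQ plot is therefore symmetric by construction and rests on only 8 residual degrees of freedom, which is worth keeping in mind when judging normality. A small sketch on the 16 values reconstructed from the printed output (hypothetical name `fruit`):

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

m_full <- aov(Fruit.Quantity ~ Gender * Education, data = fruit)  # like model3
res <- residuals(m_full)

# Sum the two residuals within each Gender x Education cell
cell <- interaction(fruit$Gender, fruit$Education)
pair_sums <- tapply(res, cell, sum)
print(round(pair_sums, 10))

# Sorted residuals mirror each other around zero
print(sort(round(res, 2)))
```

Every within-cell sum is zero, and the largest residual in absolute value is 2.5 g/day (the Men, O-level cell).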

qqnorm(x$Fruit.Quantity)
qqline(x$Fruit.Quantity)

[Figure: normal QQ plot of Fruit.Quantity]

qqnorm(residuals(model3))
qqline(residuals(model3))

[Figure: normal QQ plot of the model3 residuals]

plot(fitted(model3))   # fitted values in observation order

[Figure: fitted values of model3 by observation index]

plot(fitted(model3), residuals(model3))

[Figure: residuals versus fitted values for model3]

interaction.plot(Gender, Education, Fruit.Quantity)

[Figure: interaction plot of mean Fruit.Quantity by Gender, one line per Education level]

For the model adequacy checks, the QQ plot of the model3 residuals is nearly linear, so the assumption of normally distributed errors appears reasonable and the ANOVA model seems adequate for this analysis. The interaction plot shows lines that are not parallel, and some of them intersect, which indicates that the effect of gender on fruit intake depends on education level; this agrees with the significant Gender:Education term. The residuals-versus-fitted plot is used to check for constant variance and to identify outlying values. Because the residuals are scattered around zero, the model appears adequate for assessing the effect of gender and education on fruit intake quantity. With only two observations per cell, however, these graphical checks are based on very few points.
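The interaction plot can also be read numerically from the cell means. If Gender and Education did not interact, the Women-minus-Men gap would be roughly the same at every education level. A sketch on the 16 values reconstructed from the printed output (hypothetical name `fruit`):

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

# Cell means: rows = Education levels, columns = Gender levels
cell_means <- tapply(fruit$Fruit.Quantity, list(fruit$Education, fruit$Gender), mean)
print(cell_means)

# Women-minus-Men gap at each education level; parallel lines would give equal gaps
gap <- cell_means[, "Women"] - cell_means[, "Men"]
print(gap)
```

The gaps are 73.5 (A-level), 58 (Degree), 59.5 (No qualification) and 54.5 (O-level) g/day. These unequal gaps are what the non-parallel lines in the interaction plot show, and what the Gender:Education term in model3 tests.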

Tukey’s Test

To control the chance of false positives when making many pairwise comparisons (the family-wise error rate), Tukey’s HSD test is performed.
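TukeyHSD() builds each interval as the difference in means plus or minus a studentized-range quantile times a standard error. The sketch below recomputes the Women-Men interval from the two-way model by hand, on the 16 values reconstructed from the printed output (hypothetical name `fruit`), and compares it with TukeyHSD(). The residual mean square comes from the two-way model, which is why this interval (59.53 to 63.22) is much narrower than the one from model1 (48.51 to 74.24).

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

m_full <- aov(Fruit.Quantity ~ Gender * Education, data = fruit)  # like model3

mse <- deviance(m_full) / df.residual(m_full)   # residual mean square (2.5625 on 8 df)
n_per_gender <- 8
diff_wm <- with(fruit, mean(Fruit.Quantity[Gender == "Women"]) -
                       mean(Fruit.Quantity[Gender == "Men"]))

# Half-width: studentized range quantile / sqrt(2) times the SE of a difference of means
q_crit <- qtukey(0.95, nmeans = 2, df = df.residual(m_full))
half_width <- q_crit / sqrt(2) * sqrt(mse * (1 / n_per_gender + 1 / n_per_gender))
manual <- c(diff = diff_wm, lwr = diff_wm - half_width, upr = diff_wm + half_width)

builtin <- TukeyHSD(m_full, which = "Gender")$Gender[1, c("diff", "lwr", "upr")]
print(rbind(manual, builtin))
```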

TukeyHSD(model1, ordered=FALSE, conf.level=0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Fruit.Quantity ~ Gender, data = x)
## 
## $Gender
##            diff   lwr   upr p adj
## Women-Men 61.38 48.51 74.24     0
TukeyHSD(model2, ordered=FALSE, conf.level=0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Fruit.Quantity ~ Education, data = x)
## 
## $Education
##                            diff     lwr   upr  p adj
## Degree-A-level            18.75  -56.20 93.70 0.8780
## No qualification-A-level  -8.00  -82.95 66.95 0.9884
## O-level-A-level           -6.00  -80.95 68.95 0.9950
## No qualification-Degree  -26.75 -101.70 48.20 0.7192
## O-level-Degree           -24.75  -99.70 50.20 0.7630
## O-level-No qualification   2.00  -72.95 76.95 0.9998
TukeyHSD(model3, ordered=FALSE, conf.level=0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Fruit.Quantity ~ Gender * Education, data = x)
## 
## $Gender
##            diff   lwr   upr p adj
## Women-Men 61.38 59.53 63.22     0
## 
## $Education
##                            diff     lwr     upr  p adj
## Degree-A-level            18.75  15.125  22.375 0.0000
## No qualification-A-level  -8.00 -11.625  -4.375 0.0005
## O-level-A-level           -6.00  -9.625  -2.375 0.0032
## No qualification-Degree  -26.75 -30.375 -23.125 0.0000
## O-level-Degree           -24.75 -28.375 -21.125 0.0000
## O-level-No qualification   2.00  -1.625   5.625 0.3536
## 
## $`Gender:Education`
##                                              diff     lwr     upr  p adj
## Women:A-level-Men:A-level                    73.5  67.166  79.834 0.0000
## Men:Degree-Men:A-level                       26.5  20.166  32.834 0.0000
## Women:Degree-Men:A-level                     84.5  78.166  90.834 0.0000
## Men:No qualification-Men:A-level             -1.0  -7.334   5.334 0.9972
## Women:No qualification-Men:A-level           58.5  52.166  64.834 0.0000
## Men:O-level-Men:A-level                       3.5  -2.834   9.834 0.4415
## Women:O-level-Men:A-level                    58.0  51.666  64.334 0.0000
## Men:Degree-Women:A-level                    -47.0 -53.334 -40.666 0.0000
## Women:Degree-Women:A-level                   11.0   4.666  17.334 0.0018
## Men:No qualification-Women:A-level          -74.5 -80.834 -68.166 0.0000
## Women:No qualification-Women:A-level        -15.0 -21.334  -8.666 0.0002
## Men:O-level-Women:A-level                   -70.0 -76.334 -63.666 0.0000
## Women:O-level-Women:A-level                 -15.5 -21.834  -9.166 0.0002
## Women:Degree-Men:Degree                      58.0  51.666  64.334 0.0000
## Men:No qualification-Men:Degree             -27.5 -33.834 -21.166 0.0000
## Women:No qualification-Men:Degree            32.0  25.666  38.334 0.0000
## Men:O-level-Men:Degree                      -23.0 -29.334 -16.666 0.0000
## Women:O-level-Men:Degree                     31.5  25.166  37.834 0.0000
## Men:No qualification-Women:Degree           -85.5 -91.834 -79.166 0.0000
## Women:No qualification-Women:Degree         -26.0 -32.334 -19.666 0.0000
## Men:O-level-Women:Degree                    -81.0 -87.334 -74.666 0.0000
## Women:O-level-Women:Degree                  -26.5 -32.834 -20.166 0.0000
## Women:No qualification-Men:No qualification  59.5  53.166  65.834 0.0000
## Men:O-level-Men:No qualification              4.5  -1.834  10.834 0.2145
## Women:O-level-Men:No qualification           59.0  52.666  65.334 0.0000
## Men:O-level-Women:No qualification          -55.0 -61.334 -48.666 0.0000
## Women:O-level-Women:No qualification         -0.5  -6.834   5.834 1.0000
## Women:O-level-Men:O-level                    54.5  48.166  60.834 0.0000
#plot the results 
a1 = TukeyHSD(model3, which = "Education", ordered = FALSE)
a2 = TukeyHSD(model3, which = "Gender", ordered = FALSE)
plot(a1)

[Figure: Tukey 95% family-wise confidence intervals for the Education comparisons]

plot(a2)

[Figure: Tukey 95% family-wise confidence interval for the Women-Men difference]

From these plots, every Gender and Education interval excludes 0 except the O-level-No qualification comparison (-1.625 to 5.625, p adj = 0.3536). An interval that excludes 0, or equivalently a p adj of 0.05 or lower, indicates a statistically significant difference between the mean responses of the two levels; that is, the difference is unlikely to be explained by chance alone. All comparisons use a 95% family-wise confidence level. Among the Gender:Education comparisons, four pairs are not significant: Men:No qualification vs Men:A-level, Men:O-level vs Men:A-level, Men:O-level vs Men:No qualification, and Women:O-level vs Women:No qualification.
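Rather than reading significance off the plots, the comparisons whose adjusted p-value exceeds 0.05 can be listed directly from the TukeyHSD() result. A sketch on the 16 values reconstructed from the printed output (hypothetical name `fruit`):

```r
# Same reconstructed data as printed by head(), str() and tail() (assumption)
fruit <- data.frame(
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Education = factor(rep(rep(c("Degree", "A-level", "O-level", "No qualification"),
                             each = 2), times = 2)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

m_full <- aov(Fruit.Quantity ~ Gender * Education, data = fruit)  # like model3
tk <- TukeyHSD(m_full, conf.level = 0.95)

# For each term, keep only the rows whose adjusted p-value exceeds 0.05
not_significant <- lapply(tk, function(m) m[m[, "p adj"] > 0.05, , drop = FALSE])
print(not_significant)
```

This returns no Gender comparison, one Education comparison (O-level vs No qualification) and four Gender:Education comparisons.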

4. References to the literature

Annalijn I. Conklin, Nita G. Forouhi, Marc Suhrcke, Paul Surtees, Nicholas J. Wareham, Pablo Monsivais, Variety more than quantity of fruit and vegetable intake varies by socioeconomic status and financial hardship. Findings from older adults in the EPIC cohort, Appetite, Volume 83, 1 December 2014, Pages 248-255, ISSN 0195-6663, http://dx.doi.org/10.1016/j.appet.2014.08.038. (http://www.sciencedirect.com/science/article/pii/S0195666314004413) Keywords: Healthy eating; Variety; Socioeconomic inequality; Financial hardship; Aging; UK

5. Appendices

A summary of, or pointer to, the raw data

complete and documented R code

Code can be seen above.