This dataset was gathered from a study conducted by researchers in the United Kingdom. Inadequate intake of fruits and vegetables affects populations worldwide and has been linked to 5% of deaths in the world. Although people have been informed about the health risks of not eating enough fruits and vegetables, inadequate intake remains a problem today. The researchers wished to test a number of hypotheses relating three conventional socioeconomic status indicators and three financial hardship measures to both the variety and the quantity of fruit and vegetable intake in British men and women.
The study design included three multi-level factors that defined the socioeconomic status indicators: Social Class, Education, and Home ownership. Gender (Men and Women) was an additional factor. The response variables were fruit variety, vegetable variety, fruit intake, and vegetable intake.
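remove(list = ls())
# Read the data; stringsAsFactors = TRUE keeps Education and Gender as factors (the default changed in R 4.0.0)
x <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE)
# attach() lets later lines refer to columns by name (e.g. Fruit.Quantity)
attach(x)
head(x)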
## X Education Gender Fruit.Quantity Fruit.Variety
## 1 1 Degree Women 318 8.2
## 2 2 Degree Women 317 8.2
## 3 3 A-level Women 307 8.0
## 4 4 A-level Women 306 7.9
## 5 5 O-level Women 291 7.8
## 6 6 O-level Women 291 7.8
summary(x)
## X Education Gender Fruit.Quantity
## Min. : 1.00 A-level :4 Men :8 Min. :231
## 1st Qu.: 4.75 Degree :4 Women:8 1st Qu.:234
## Median : 8.50 No qualification:4 Median :276
## Mean : 8.50 O-level :4 Mean :271
## 3rd Qu.:12.25 3rd Qu.:296
## Max. :16.00 Max. :318
## Fruit.Variety
## Min. :6.30
## 1st Qu.:6.78
## Median :7.40
## Mean :7.32
## 3rd Qu.:7.83
## Max. :8.20
Because of the way the data was collected and presented, we focus only on the Education and Gender factors. The Education factor has 4 levels: Degree (>= 16 years of schooling), A-level (>= 13 years of schooling with no degree), O-level (11 years of schooling), and No qualification (< 11 years of schooling). The Gender factor has two levels: Men and Women. The response variable is Fruit Quantity, which was calculated from the number of pieces of fruit consumed over three months, converted into grams, and expressed in grams per day.
tail(x)
## X Education Gender Fruit.Quantity Fruit.Variety
## 11 11 A-level Men 233 6.8
## 12 12 A-level Men 233 6.8
## 13 13 O-level Men 239 6.7
## 14 14 O-level Men 234 6.7
## 15 15 No qualification Men 231 6.3
## 16 16 No qualification Men 233 6.3
The continuous variables in the study were fruit intake, vegetable intake, fruit variety, and vegetable variety; these were also the response variables. Only the two fruit variables (Fruit.Quantity and Fruit.Variety) are included in this dataset.
str(x)
## 'data.frame': 16 obs. of 5 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Education : Factor w/ 4 levels "A-level","Degree",..: 2 2 1 1 4 4 3 3 2 2 ...
## $ Gender : Factor w/ 2 levels "Men","Women": 2 2 2 2 2 2 2 2 1 1 ...
## $ Fruit.Quantity: int 318 317 307 306 291 291 292 291 261 258 ...
## $ Fruit.Variety : num 8.2 8.2 8 7.9 7.8 7.8 7.3 7.4 7.5 7.4 ...
The data was displayed in a single table that did not contain values for every factor at each data point. This made it hard to analyze several socioeconomic status factors at once; however, by subsetting the data we were still able to analyze more than one factor.
The design chosen used a factorial, randomized block design.
This data is publicly available for anyone to use and analyze. The original analysis and exploration of the complete data were performed by the researchers in the United Kingdom.
The null hypothesis to be tested is:
The variation in Fruit Quantity intake can be explained by randomization alone; that is, neither Gender, Education, nor their interaction affects mean fruit quantity intake.
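Written as a statistical model, the hypothesis can be sketched in standard two-way ANOVA notation (the symbols are ours, not the study's): y_ijk is the fruit quantity of the k-th observation for gender i and education level j, τ_i is the Gender effect, β_j the Education effect, and (τβ)_ij their interaction, with the usual sum-to-zero constraints. The null hypothesis then splits into three parts, one for each F test in a two-way ANOVA table:

```latex
y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \qquad
i = 1, 2,\quad j = 1, \dots, 4,\quad k = 1, 2,\quad \varepsilon_{ijk} \sim N(0, \sigma^2)

H_0^{G}: \tau_1 = \tau_2 = 0, \qquad
H_0^{E}: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0, \qquad
H_0^{GE}: (\tau\beta)_{ij} = 0 \ \text{for all } i, j
```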
The rationale for collecting the data was simply to gather information about the socioeconomic status factors that might contribute to malnutrition in the United Kingdom.
Only one additional replicate was collected for each combination of socioeconomic status and gender, so each cell contains two observations.
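The replication structure can be checked with a cross-tabulation. This is a minimal, self-contained sketch: it rebuilds the 16 observations by hand from the printed head(), tail() and str() output (rows 7 to 10 are read off the str() listing), assumes those printed values are the complete data, and uses the illustrative name `dat`. A balanced design with one additional replicate should show exactly two observations in every Education × Gender cell.

```r
# Rebuild the 16 observations from the printed head(), tail() and str() output
# (illustrative name `dat`; assumes the printed values are the complete data)
dat <- data.frame(
  Education = factor(rep(c("Degree", "A-level", "O-level", "No qualification"),
                         each = 2, times = 2)),
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

# Count the observations in each Education x Gender cell
cell_counts <- table(dat$Education, dat$Gender)
print(cell_counts)
```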
In the analysis, I used Education as the blocking factor for socioeconomic status.
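# Exploratory plots: histogram of the response, then boxplots by each factor
# (every row has X <= 17, so xsub is identical to x)
xsub <- subset(x, X <= 17)
par(mfrow = c(1, 1))
hist(x$Fruit.Quantity, main = "Fruit quantity", xlab = "Fruit quantity (g/day)")
boxplot(Fruit.Quantity ~ Gender, data = x, ylab = "Fruit quantity (g/day)")
boxplot(Fruit.Quantity ~ Education, data = x, ylab = "Fruit quantity (g/day)")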
# Ensure both predictors are treated as factors before fitting the ANOVA models
x$Gender <- as.factor(x$Gender)
x$Education <- as.factor(x$Education)
A two-way analysis of variance (ANOVA) was performed to analyze the effects of gender and education level on mean fruit quantity intake. The ANOVA was used to test the null hypothesis stated above.
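Because every Gender × Education cell has the same number of observations, the sums of squares in the two-way ANOVA can be computed by hand from the marginal and cell means. The sketch below is self-contained: it rebuilds the 16 observations from the printed head(), tail() and str() output (assuming these are the complete data), and all variable names are illustrative. It then checks the hand-computed values against anova().

```r
# Rebuild the 16 observations from the printed output (illustrative name `dat`)
dat <- data.frame(
  Education = factor(rep(c("Degree", "A-level", "O-level", "No qualification"),
                         each = 2, times = 2)),
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

y <- dat$Fruit.Quantity
grand <- mean(y)
n <- 2  # observations per Gender x Education cell (balanced design)

# Marginal means for each factor and the 2 x 4 table of cell means
m_gender <- tapply(y, dat$Gender, mean)
m_edu <- tapply(y, dat$Education, mean)
m_cell <- tapply(y, list(dat$Gender, dat$Education), mean)

# Sums of squares for a balanced two-way layout
ss_gender <- length(m_edu) * n * sum((m_gender - grand)^2)
ss_edu <- length(m_gender) * n * sum((m_edu - grand)^2)
ss_inter <- n * sum((m_cell - grand)^2) - ss_gender - ss_edu
ss_error <- sum((y - ave(y, dat$Gender, dat$Education))^2)  # within-cell variation

manual <- c(Gender = ss_gender, Education = ss_edu,
            "Gender:Education" = ss_inter, Residuals = ss_error)

# Compare with the sequential sums of squares reported by anova()
fit <- aov(Fruit.Quantity ~ Gender * Education, data = dat)
print(cbind(manual = manual, anova = anova(fit)[["Sum Sq"]]))
```

Because the design is balanced, these sums of squares do not depend on the order in which Gender and Education enter the formula; with unbalanced data, the sequential sums of squares from anova() would change with that order.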
One-way ANOVA model for the effect of Gender on Fruit Quantity (Education not included).
model1 <- aov(Fruit.Quantity ~ Gender, data = x)
anova(model1)
## Analysis of Variance Table
##
## Response: Fruit.Quantity
## Df Sum Sq Mean Sq F value Pr(>F)
## Gender 1 15068 15068 105 7e-08 ***
## Residuals 14 2013 144
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
One-way ANOVA model for the effect of Education on Fruit Quantity (Gender not included).
model2 <- aov(Fruit.Quantity ~ Education, data = x)
anova(model2)
## Analysis of Variance Table
##
## Response: Fruit.Quantity
## Df Sum Sq Mean Sq F value Pr(>F)
## Education 3 1784 595 0.47 0.71
## Residuals 12 15297 1275
Two-way factorial ANOVA model for the effects of Gender, Education (the socioeconomic status factor), and their interaction on Fruit Quantity.
model3 <- aov(Fruit.Quantity ~ Gender * Education, data = x)
anova(model3)
## Analysis of Variance Table
##
## Response: Fruit.Quantity
## Df Sum Sq Mean Sq F value Pr(>F)
## Gender 1 15068 15068 5880.0 9.3e-13 ***
## Education 3 1784 595 232.0 4.1e-08 ***
## Gender:Education 3 209 70 27.2 0.00015 ***
## Residuals 8 21 3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the two-way model (model3), the Gender:Education interaction has a p-value of about 0.00015, and the main effects of Gender (p = 9.3e-13) and Education (p = 4.1e-08) are also highly significant. A p-value this small means that, if the null hypothesis were true, differences as large as those observed would be very unlikely to arise from randomization alone, so we reject the null hypothesis: the variation in fruit intake quantity is associated with gender, education level, and their interaction. The one-way models tell a partly different story. Gender alone is significant (p = 7e-08), but Education alone is not (p = 0.71), because in model2 the large gender difference is left in the residual mean square (1275, versus 3 in model3), which masks the education effect.
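The p-values reported by anova() are upper-tail probabilities of the F distribution. The self-contained sketch below (same 16 observations rebuilt from the printed output, illustrative names) recomputes the Education F test for the one-way model (as in model2) and the two-way model (as in model3) with pf(). It shows that the Education mean square is identical in both models; only the residual mean square in the denominator changes.

```r
# Rebuild the 16 observations from the printed output (illustrative name `dat`)
dat <- data.frame(
  Education = factor(rep(c("Degree", "A-level", "O-level", "No qualification"),
                         each = 2, times = 2)),
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

fit_edu <- aov(Fruit.Quantity ~ Education, data = dat)            # like model2
fit_full <- aov(Fruit.Quantity ~ Gender * Education, data = dat)  # like model3
tab_edu <- anova(fit_edu)
tab_full <- anova(fit_full)

# Numerator: the Education mean square (the same in both models here)
ms_edu <- tab_full["Education", "Mean Sq"]
df_edu <- tab_full["Education", "Df"]

# Denominator: the residual mean square, which differs hugely between models
f_oneway <- ms_edu / tab_edu["Residuals", "Mean Sq"]
f_full <- ms_edu / tab_full["Residuals", "Mean Sq"]

# p-value = probability of an F at least this large under the null hypothesis
p_oneway <- pf(f_oneway, df1 = df_edu, df2 = tab_edu["Residuals", "Df"], lower.tail = FALSE)
p_full <- pf(f_full, df1 = df_edu, df2 = tab_full["Residuals", "Df"], lower.tail = FALSE)

print(c(F_oneway = f_oneway, p_oneway = p_oneway, F_full = f_full, p_full = p_full))
```

The recomputed values should agree with the printed tables (F of about 0.47 in the one-way model and about 232 in the two-way model).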
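Normal Q-Q plots are used to check the normality assumption, first for the raw response and then for the residuals of model3. A residuals-versus-fitted plot and an interaction plot follow.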
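# Q-Q plot of the raw response
qqnorm(x$Fruit.Quantity)
qqline(x$Fruit.Quantity)
# Q-Q plot of the residuals from the two-way model
qqnorm(residuals(model3))
qqline(residuals(model3))
# Residuals versus fitted values, with a dashed reference line at zero
plot(fitted(model3), residuals(model3), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Interaction plot: one line per education level across the two genders
with(x, interaction.plot(Gender, Education, Fruit.Quantity))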
For our test of model adequacy, the residuals can be assumed to be approximately normally distributed, as shown by the nearly linear pattern of the points in the residual Q-Q plot. This supports the adequacy of the ANOVA model for this analysis. In the interaction plot, the lines are not parallel and some of them cross, which indicates that Gender and Education interact: the effect of one factor on fruit intake depends on the level of the other. The residuals-versus-fitted plot is used to check for constant variance and to identify outlying values. Because the residuals are scattered around zero, the model used in our analysis appears adequate for determining the effect of gender and education on fruit intake quantity.
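The interaction plot is a picture of the eight cell means. This self-contained sketch (the 16 observations are rebuilt from the printed output; names are illustrative) prints those means and the Women minus Men gap at each education level. If there were no interaction, the lines would be parallel and these gaps would be roughly equal.

```r
# Rebuild the 16 observations from the printed output (illustrative name `dat`)
dat <- data.frame(
  Education = factor(rep(c("Degree", "A-level", "O-level", "No qualification"),
                         each = 2, times = 2)),
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

# Cell means: rows are education levels, columns are genders
cell_means <- with(dat, tapply(Fruit.Quantity, list(Education, Gender), mean))
print(cell_means)

# Gender gap (Women minus Men) at each education level
gap <- cell_means[, "Women"] - cell_means[, "Men"]
print(gap)
```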
To control the chance of at least one false positive across multiple pairwise comparisons (the family-wise error rate), Tukey’s HSD test is performed.
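Tukey’s HSD interval for a pair of means is the difference ± qtukey(0.95, k, df) × sqrt(MSE/2 × (1/n_i + 1/n_j)), where k is the number of means being compared, df and MSE are the residual degrees of freedom and mean square, and n_i, n_j are the group sizes. The self-contained sketch below (16 observations rebuilt from the printed output; names are illustrative) applies this formula to the Women-Men comparison in the two-way model and checks it against TukeyHSD().

```r
# Rebuild the 16 observations from the printed output (illustrative name `dat`)
dat <- data.frame(
  Education = factor(rep(c("Degree", "A-level", "O-level", "No qualification"),
                         each = 2, times = 2)),
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

fit <- aov(Fruit.Quantity ~ Gender * Education, data = dat)
mse <- sum(residuals(fit)^2) / fit$df.residual  # residual mean square

means <- tapply(dat$Fruit.Quantity, dat$Gender, mean)
n_g <- as.vector(table(dat$Gender))  # group sizes (8 per gender)
k <- length(means)                   # number of means being compared

# Difference, standard error and half-width of the family-wise 95% interval
d <- unname(means["Women"] - means["Men"])
se <- sqrt(mse / 2 * (1 / n_g[1] + 1 / n_g[2]))
half_width <- qtukey(0.95, nmeans = k, df = fit$df.residual) * se

manual <- c(diff = d, lwr = d - half_width, upr = d + half_width)
print(manual)
print(TukeyHSD(fit, which = "Gender")$Gender)
```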
TukeyHSD(model1, ordered=FALSE, conf.level=0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Fruit.Quantity ~ Gender, data = x)
##
## $Gender
## diff lwr upr p adj
## Women-Men 61.38 48.51 74.24 0
TukeyHSD(model2, ordered=FALSE, conf.level=0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Fruit.Quantity ~ Education, data = x)
##
## $Education
## diff lwr upr p adj
## Degree-A-level 18.75 -56.20 93.70 0.8780
## No qualification-A-level -8.00 -82.95 66.95 0.9884
## O-level-A-level -6.00 -80.95 68.95 0.9950
## No qualification-Degree -26.75 -101.70 48.20 0.7192
## O-level-Degree -24.75 -99.70 50.20 0.7630
## O-level-No qualification 2.00 -72.95 76.95 0.9998
TukeyHSD(model3, ordered=FALSE, conf.level=0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Fruit.Quantity ~ Gender * Education, data = x)
##
## $Gender
## diff lwr upr p adj
## Women-Men 61.38 59.53 63.22 0
##
## $Education
## diff lwr upr p adj
## Degree-A-level 18.75 15.125 22.375 0.0000
## No qualification-A-level -8.00 -11.625 -4.375 0.0005
## O-level-A-level -6.00 -9.625 -2.375 0.0032
## No qualification-Degree -26.75 -30.375 -23.125 0.0000
## O-level-Degree -24.75 -28.375 -21.125 0.0000
## O-level-No qualification 2.00 -1.625 5.625 0.3536
##
## $`Gender:Education`
## diff lwr upr p adj
## Women:A-level-Men:A-level 73.5 67.166 79.834 0.0000
## Men:Degree-Men:A-level 26.5 20.166 32.834 0.0000
## Women:Degree-Men:A-level 84.5 78.166 90.834 0.0000
## Men:No qualification-Men:A-level -1.0 -7.334 5.334 0.9972
## Women:No qualification-Men:A-level 58.5 52.166 64.834 0.0000
## Men:O-level-Men:A-level 3.5 -2.834 9.834 0.4415
## Women:O-level-Men:A-level 58.0 51.666 64.334 0.0000
## Men:Degree-Women:A-level -47.0 -53.334 -40.666 0.0000
## Women:Degree-Women:A-level 11.0 4.666 17.334 0.0018
## Men:No qualification-Women:A-level -74.5 -80.834 -68.166 0.0000
## Women:No qualification-Women:A-level -15.0 -21.334 -8.666 0.0002
## Men:O-level-Women:A-level -70.0 -76.334 -63.666 0.0000
## Women:O-level-Women:A-level -15.5 -21.834 -9.166 0.0002
## Women:Degree-Men:Degree 58.0 51.666 64.334 0.0000
## Men:No qualification-Men:Degree -27.5 -33.834 -21.166 0.0000
## Women:No qualification-Men:Degree 32.0 25.666 38.334 0.0000
## Men:O-level-Men:Degree -23.0 -29.334 -16.666 0.0000
## Women:O-level-Men:Degree 31.5 25.166 37.834 0.0000
## Men:No qualification-Women:Degree -85.5 -91.834 -79.166 0.0000
## Women:No qualification-Women:Degree -26.0 -32.334 -19.666 0.0000
## Men:O-level-Women:Degree -81.0 -87.334 -74.666 0.0000
## Women:O-level-Women:Degree -26.5 -32.834 -20.166 0.0000
## Women:No qualification-Men:No qualification 59.5 53.166 65.834 0.0000
## Men:O-level-Men:No qualification 4.5 -1.834 10.834 0.2145
## Women:O-level-Men:No qualification 59.0 52.666 65.334 0.0000
## Men:O-level-Women:No qualification -55.0 -61.334 -48.666 0.0000
## Women:O-level-Women:No qualification -0.5 -6.834 5.834 1.0000
## Women:O-level-Men:O-level 54.5 48.166 60.834 0.0000
# Plot the Tukey family-wise 95% confidence intervals for the Education and Gender main effects
a1 <- TukeyHSD(model3, which = "Education", ordered = FALSE)
a2 <- TukeyHSD(model3, which = "Gender", ordered = FALSE)
plot(a1)
plot(a2)
From these plots, every confidence interval for a difference in means excludes 0 except the O-level vs. No qualification comparison (interval -1.625 to 5.625, p adj = 0.354), so all other pairs of levels differ in mean fruit intake. Each comparison with an adjusted p-value of 0.05 or lower indicates a statistically significant difference between the mean responses of the two levels, that is, a difference unlikely to be explained by randomization alone. All adjusted p-values are interpreted at the 95% family-wise confidence level (α = 0.05).
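The same reading can be done programmatically by filtering the TukeyHSD() result for intervals that contain 0 (equivalently, for this two-sided test, p adj above 0.05). This is a self-contained sketch: the 16 observations are rebuilt from the printed output and the names are illustrative.

```r
# Rebuild the 16 observations from the printed output (illustrative name `dat`)
dat <- data.frame(
  Education = factor(rep(c("Degree", "A-level", "O-level", "No qualification"),
                         each = 2, times = 2)),
  Gender = factor(rep(c("Women", "Men"), each = 8)),
  Fruit.Quantity = c(318, 317, 307, 306, 291, 291, 292, 291,
                     261, 258, 233, 233, 239, 234, 231, 233)
)

fit <- aov(Fruit.Quantity ~ Gender * Education, data = dat)
tk <- TukeyHSD(fit, conf.level = 0.95)

# For each term, keep only the comparisons whose 95% interval contains 0:
# these pairs are NOT significantly different at the family-wise 5% level
covers_zero <- lapply(unclass(tk), function(m) m[m[, "lwr"] < 0 & m[, "upr"] > 0, , drop = FALSE])
print(covers_zero)
```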
Annalijn I. Conklin, Nita G. Forouhi, Marc Suhrcke, Paul Surtees, Nicholas J. Wareham, Pablo Monsivais. Variety more than quantity of fruit and vegetable intake varies by socioeconomic status and financial hardship. Findings from older adults in the EPIC cohort. Appetite, Volume 83, 1 December 2014, Pages 248-255, ISSN 0195-6663, http://dx.doi.org/10.1016/j.appet.2014.08.038 (http://www.sciencedirect.com/science/article/pii/S0195666314004413).
Code can be seen above.