Mothers and Length of Breastfeeding
The length of breastfeeding was recorded along with several factors pertaining to the mother’s pregnancy and lifestyle. The data consists of 9 variables and 927 observations (the CSV file also carries a row-index column, X).
1. duration: Number of weeks breastfeeding
2. race: Race of mother (1=white, 2=black, 3=other)
3. poverty: Mother in poverty (1=yes, 0=no)
4. smoke: Mother smoked at birth of child (1=yes, 0=no)
5. alcohol: Mother used alcohol at birth of child (1=yes, 0=no)
6. agemth: Age of mother at birth of child
7. ybirth: Year of birth
8. yschool: Education level of mother (years in school)
9. pc3mth: Prenatal care after 3rd month
bfeed = read.csv("bfeed.csv")
Below are the first 6 observations:
head(bfeed)
##   X duration race poverty smoke alcohol agemth ybirth yschool pc3mth
## 1 1       16    1       0     0       1     24     82      14      0
## 2 2        1    1       0     1       0     26     85      12      0
## 3 3        4    1       0     0       0     25     85      12      0
## 4 4        3    1       0     1       1     21     85       9      0
## 5 5       36    1       0     1       0     22     82      12      0
## 6 6       36    1       0     0       0     18     82      11      0
The following four factors will be used in the model:
1. race: 3 levels
2. poverty: 2 levels
3. smoke: 2 levels
4. agemth: 14 levels
str(bfeed)
## 'data.frame': 927 obs. of 10 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ duration: int 16 1 4 3 36 36 16 8 20 44 ...
## $ race : int 1 1 1 1 1 1 1 1 1 1 ...
## $ poverty : int 0 0 0 0 0 0 1 0 1 0 ...
## $ smoke : int 0 1 0 1 1 0 1 1 0 0 ...
## $ alcohol : int 1 0 0 1 0 0 0 0 0 0 ...
## $ agemth : int 24 26 25 21 22 18 20 24 24 24 ...
## $ ybirth : int 82 85 85 85 82 82 81 85 85 82 ...
## $ yschool : int 14 12 12 9 12 11 9 12 12 14 ...
## $ pc3mth : int 0 0 0 0 0 0 0 0 0 0 ...
As you can see above, the factors are currently defined as integers. We will change them to be factors before we start our analysis.
bfeed$race <- as.factor(bfeed$race)
str(bfeed$race)
## Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
bfeed$poverty <- as.factor(bfeed$poverty)
str(bfeed$poverty)
## Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 2 1 ...
bfeed$smoke <- as.factor(bfeed$smoke)
str(bfeed$smoke)
## Factor w/ 2 levels "0","1": 1 2 1 2 2 1 2 2 1 1 ...
bfeed$agemth <- as.factor(bfeed$agemth)
str(bfeed$agemth)
## Factor w/ 14 levels "15","16","17",..: 10 12 11 7 8 4 6 10 10 10 ...
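The levels are shown below. Note that alcohol was not converted to a factor (it is not one of the four factors), so levels() returns NULL for it; the 14 levels of agemth appear in the str() output above.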
levels(bfeed$race)
## [1] "1" "2" "3"
levels(bfeed$poverty)
## [1] "0" "1"
levels(bfeed$smoke)
## [1] "0" "1"
levels(bfeed$alcohol)
## NULL
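The NULL above comes from calling levels() on a column that is still an integer. The following is a minimal sketch of that behaviour and of one way to convert several coded columns at once with readable labels. It assumes a small made-up data frame, `toy`, standing in for bfeed; the labels are taken from the variable list at the top of this report.

```r
# Hypothetical toy data standing in for bfeed (values are made up)
toy <- data.frame(
  race    = c(1L, 2L, 3L, 1L),
  poverty = c(0L, 1L, 0L, 0L),
  smoke   = c(0L, 1L, 1L, 0L),
  alcohol = c(1L, 0L, 0L, 0L)
)

# An integer column has no levels attribute, so levels() returns NULL
alcohol_levels_before <- levels(toy$alcohol)

# Convert the 0/1 columns in one step, attaching readable labels
binary_cols <- c("poverty", "smoke", "alcohol")
toy[binary_cols] <- lapply(toy[binary_cols], factor,
                           levels = c(0, 1), labels = c("no", "yes"))

# race uses the coding given in the variable list (1=white, 2=black, 3=other)
toy$race <- factor(toy$race, levels = 1:3,
                   labels = c("white", "black", "other"))

alcohol_levels_after <- levels(toy$alcohol)
str(toy)
```

Labelled levels are a design choice rather than a requirement: they make boxplot axes and ANOVA tables easier to read than bare 0/1 codes.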
In the data set we have three continuous variables:
* duration: number of weeks breastfeeding
* agemth: age at giving birth (converted to a factor above for this analysis)
* yschool: years in school
The response variable for this experiment is duration, the number of weeks that the mothers spent breastfeeding.
As said above, the data consists of 9 variables and 927 observations (mothers). Again, here are the first six rows of data:
head(bfeed)
##   X duration race poverty smoke alcohol agemth ybirth yschool pc3mth
## 1 1       16    1       0     0       1     24     82      14      0
## 2 2        1    1       0     1       0     26     85      12      0
## 3 3        4    1       0     0       0     25     85      12      0
## 4 4        3    1       0     1       1     21     85       9      0
## 5 5       36    1       0     1       0     22     82      12      0
## 6 6       36    1       0     0       0     18     82      11      0
And here is a general summary of the data:
summary(bfeed)
## X duration race poverty smoke
## Min. : 1.0 Min. : 1.00 1:662 0:756 0:657
## 1st Qu.:232.5 1st Qu.: 4.00 2:117 1:171 1:270
## Median :464.0 Median : 10.00 3:148
## Mean :464.0 Mean : 16.18
## 3rd Qu.:695.5 3rd Qu.: 24.00
## Max. :927.0 Max. :192.00
##
## alcohol agemth ybirth yschool
## Min. :0.00000 21 :135 Min. :78.00 Min. : 3.00
## 1st Qu.:0.00000 20 :123 1st Qu.:80.00 1st Qu.:12.00
## Median :0.00000 22 :120 Median :82.00 Median :12.00
## Mean :0.08522 19 :112 Mean :81.97 Mean :12.21
## 3rd Qu.:0.00000 23 : 99 3rd Qu.:84.00 3rd Qu.:13.00
## Max. :1.00000 24 : 84 Max. :86.00 Max. :19.00
## (Other):254
## pc3mth
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1769
## 3rd Qu.:0.0000
## Max. :1.0000
##
This experiment will use four factors, each with two or more levels. It will look at the effects of these factors (and their interaction effects) on the duration of breastfeeding. The null hypothesis is that the duration of breastfeeding is not affected by race, poverty, smoking, age of mother, or any two-way interaction of these factors. This experiment is set up as a mixed-effects experiment: some of the factors have levels that cannot represent the entire population, while others do. For example, race and age of mother are treated as fixed-effect factors whose levels do not represent the entire population (there are more races and ages than those chosen). On the other hand, smoke and poverty are treated as random factors (they are binary variables: you either smoke or do not, and you are either in poverty or not). Note that the aov() models fitted below contain no random-effect terms, so in the analysis itself every factor enters as a fixed effect.
This data set was retrieved as is from the internet, so the factors shown in the data had already been chosen. There are many other factors that could have been used in this experiment, including employment status, number of other children, or the mother’s current health status (healthy vs. unhealthy). It is impossible for us to know why the included factors were chosen.
No information submitted with the data gave insight into the randomization scheme. Looking at the data, it seems to have been collected somewhat randomly: there are no large “groupings” of durations.
In this dataset there are no replicates and/or repeated measures. The only measures that I could think of as possibly being repeated measures are alcohol and smoking, but they are not.
The data set had more than the four factors that I included in this experiment, so in that sense I used blocking to disregard these variables. There may be some variation in the response variable caused by a variable that is not one of the four I chose. This makes that variable a nuisance factor, and by not including it in the experiment, I am blocking.
Below you can see the summary of our data again, as well as a histogram of the response variable, duration, to get a better feel for its distribution.
summary(bfeed)
## X duration race poverty smoke
## Min. : 1.0 Min. : 1.00 1:662 0:756 0:657
## 1st Qu.:232.5 1st Qu.: 4.00 2:117 1:171 1:270
## Median :464.0 Median : 10.00 3:148
## Mean :464.0 Mean : 16.18
## 3rd Qu.:695.5 3rd Qu.: 24.00
## Max. :927.0 Max. :192.00
##
## alcohol agemth ybirth yschool
## Min. :0.00000 21 :135 Min. :78.00 Min. : 3.00
## 1st Qu.:0.00000 20 :123 1st Qu.:80.00 1st Qu.:12.00
## Median :0.00000 22 :120 Median :82.00 Median :12.00
## Mean :0.08522 19 :112 Mean :81.97 Mean :12.21
## 3rd Qu.:0.00000 23 : 99 3rd Qu.:84.00 3rd Qu.:13.00
## Max. :1.00000 24 : 84 Max. :86.00 Max. :19.00
## (Other):254
## pc3mth
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1769
## 3rd Qu.:0.0000
## Max. :1.0000
##
hist(bfeed$duration, breaks = 50, main = "Frequencies of Duration of Breast Feeding", xlab = "Duration")
After some trial and error, I decided to use 50 breaks to make the distribution easier to visualize. Duration is clearly positively skewed, with a few outliers extending to 100 weeks and beyond.
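The skew can also be read off the summary output, where the mean of duration (16.18) sits well above the median (10.00). As a rough sketch of how one might quantify this, the block below computes a moment-based sample skewness. It assumes a short made-up vector, `dur`, rather than the real duration column, and the helper name `sample_skewness` is hypothetical.

```r
# Hypothetical durations (made up); a long right tail like the histogram
dur <- c(1, 2, 3, 4, 4, 8, 10, 16, 24, 36, 100)

# Moment-based skewness: third central moment over the cubed standard deviation
sample_skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

skew <- sample_skewness(dur)

# For a positively skewed sample the mean is pulled above the median
mean_above_median <- mean(dur) > median(dur)

print(skew)
print(mean_above_median)
```

A value above zero points to a right (positive) skew; on the real data one would pass bfeed$duration instead of `dur`.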
I will use boxplots to examine the main effects visually. The boxplots give a visual indication of whether a factor affects duration; formal significance is tested with ANOVA further below.
boxplot(bfeed$duration~bfeed$race, xlab = "Race", ylab = "Duration of Breast Feeding", main = "Race: 1= white, 2= black, 3= other")
boxplot(bfeed$duration~bfeed$poverty, xlab = "Poverty", ylab = "Duration of Breast Feeding", main = "Mother in Poverty: 1= Yes, 0= No")
boxplot(bfeed$duration~bfeed$smoke, xlab = "Smoke", ylab = "Duration of Breast Feeding", main = "Mother Smoked at Birth of Child:1= Yes, 0= No")
boxplot(bfeed$duration~bfeed$agemth, xlab = "Age of Mother", ylab = "Duration of Breast Feeding", main = "Age of Mother at Birth of Child")
Based on these boxplots, we can say that all four factors had some effect on the duration of breastfeeding. The factor that seemed to show the biggest difference between its levels was smoke.
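A visual impression from boxplots can be backed up with per-group numbers. The sketch below computes the median duration within each smoke level. It uses only the six rows printed by head(bfeed) earlier, so the figures are illustrative and are not results for the full data set; the name `first6` is an assumption made for this example.

```r
# First six rows as printed by head(bfeed); far too few for real conclusions
first6 <- data.frame(
  duration = c(16, 1, 4, 3, 36, 36),
  smoke    = factor(c(0, 1, 0, 1, 1, 0))
)

# Median duration within each smoke level (the line inside each box)
med_by_smoke <- tapply(first6$duration, first6$smoke, median)

# Group sizes, useful for judging how much weight each box deserves
n_by_smoke <- table(first6$smoke)

print(med_by_smoke)
print(n_by_smoke)
```

On the full data the same two calls would take bfeed$duration and bfeed$smoke (or any of the other three factors) as arguments.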
I will use ANOVA in this section to determine the statistical significance of each factor. Comparing the explained and unexplained variation tells us whether a factor has a significant effect on duration.
ANOVA1 = aov(bfeed$duration~bfeed$race)
anova(ANOVA1)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$race 2 1991 995.47 3.1139 0.0449 *
## Residuals 924 295394 319.69
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ANOVA2 = aov(bfeed$duration~bfeed$poverty)
anova(ANOVA2)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$poverty 1 378 378.50 1.1788 0.2779
## Residuals 925 297006 321.09
ANOVA3 = aov(bfeed$duration~bfeed$smoke)
anova(ANOVA3)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$smoke 1 2422 2421.85 7.5949 0.005968 **
## Residuals 925 294963 318.88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ANOVA4 = aov(bfeed$duration~bfeed$agemth)
anova(ANOVA4)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$agemth 13 5206 400.46 1.2513 0.2374
## Residuals 913 292179 320.02
With this outcome, we are not able to reject the null hypothesis for every factor, because not all of the p-values are less than .05: race (p = 0.0449) and smoke (p = 0.005968) fall below .05, while poverty (p = 0.2779) and agemth (p = 0.2374) do not. On top of this, the F values for agemth and poverty are not much larger than 1, so the explained and unexplained variation are similar, meaning we cannot confidently say that these factors are significant.
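To make the link between the table columns concrete, the sketch below recomputes the F value and p-value for race from the mean squares and degrees of freedom printed in the first ANOVA table. The inputs are the rounded figures from that table, so the results should agree with the printed ones only up to rounding.

```r
# Values copied from the race ANOVA table above
ms_race  <- 995.47   # Mean Sq for race (explained variation per df)
ms_resid <- 319.69   # Mean Sq for residuals (unexplained variation per df)
df_race  <- 2
df_resid <- 924

# F is the ratio of explained to unexplained mean squares
f_value <- ms_race / ms_resid

# The upper-tail probability of the F distribution gives Pr(>F)
p_value <- pf(f_value, df_race, df_resid, lower.tail = FALSE)

print(round(f_value, 4))
print(round(p_value, 4))
```

An F value near 1 means the factor explains about as much variation per degree of freedom as the residuals do, which is why the poverty and agemth rows give large p-values.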
Below are the ANOVA results for the interaction effects.
ANOVA_12 <- aov(bfeed$duration~bfeed$race*bfeed$poverty)
anova(ANOVA_12)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$race 2 1991 995.47 3.1119 0.04499 *
## bfeed$poverty 1 619 618.90 1.9347 0.16458
## bfeed$race:bfeed$poverty 2 153 76.58 0.2394 0.78715
## Residuals 921 294622 319.89
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(bfeed$race, bfeed$poverty, bfeed$duration)
Race x Poverty: p > .05, no evidence of interaction effect. No interaction shown in plot.
ANOVA_13 <- aov(bfeed$duration~bfeed$race*bfeed$smoke)
anova(ANOVA_13)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$race 2 1991 995.5 3.1473 0.0434299 *
## bfeed$smoke 1 3633 3633.3 11.4871 0.0007306 ***
## bfeed$race:bfeed$smoke 2 455 227.5 0.7193 0.4873755
## Residuals 921 291306 316.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(bfeed$race, bfeed$smoke, bfeed$duration)
Race x Smoke: p > 0.05, no evidence of interaction effect.
ANOVA_14 <- aov(bfeed$duration~bfeed$race*bfeed$agemth)
anova(ANOVA_14)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$race 2 1991 995.47 3.0960 0.04572 *
## bfeed$agemth 13 5543 426.38 1.3261 0.19125
## bfeed$race:bfeed$agemth 23 4326 188.11 0.5850 0.94000
## Residuals 888 285525 321.54
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(bfeed$race, bfeed$agemth, bfeed$duration)
Race x Age Mother: p > 0.05, no evidence shown of interaction effects.
ANOVA_23 <- aov(bfeed$duration~bfeed$poverty*bfeed$smoke)
anova(ANOVA_23)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$poverty 1 378 378.50 1.1870 0.276210
## bfeed$smoke 1 2626 2625.95 8.2355 0.004202 **
## bfeed$poverty:bfeed$smoke 1 76 75.90 0.2380 0.625737
## Residuals 923 294305 318.86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(bfeed$poverty, bfeed$smoke, bfeed$duration)
Poverty x Smoke: p > 0.05, no evidence shown of interaction effects. The lines in the plot are almost parallel.
ANOVA_24 <- aov(bfeed$duration~bfeed$poverty*bfeed$agemth)
anova(ANOVA_24)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$poverty 1 378 378.50 1.1709 0.2795
## bfeed$agemth 13 5126 394.31 1.2199 0.2592
## bfeed$poverty:bfeed$agemth 13 1287 99.03 0.3064 0.9912
## Residuals 899 290593 323.24
interaction.plot(bfeed$poverty, bfeed$agemth, bfeed$duration)
Poverty x Age Mother: p > 0.05, no interaction effect.
ANOVA_34 <- aov(bfeed$duration~bfeed$smoke*bfeed$agemth)
anova(ANOVA_34)
## Analysis of Variance Table
##
## Response: bfeed$duration
## Df Sum Sq Mean Sq F value Pr(>F)
## bfeed$smoke 1 2422 2421.85 7.6008 0.005953 **
## bfeed$agemth 13 5359 412.20 1.2936 0.210336
## bfeed$smoke:bfeed$agemth 13 3154 242.63 0.7615 0.701673
## Residuals 899 286450 318.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
interaction.plot(bfeed$smoke, bfeed$agemth, bfeed$duration)
Smoke x Age Mother: p > 0.05, no evidence of interaction effect shown.
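The six pairwise models above could also be written as a single model. The sketch below shows one possible form, using the `data =` argument of aov() and the `( ... )^2` formula shorthand for all main effects plus all two-way interactions. It runs on simulated data in a made-up data frame, `toy`, not on bfeed, and leaves out agemth to keep the example small; none of its output should be read as a result about breastfeeding.

```r
set.seed(1)  # reproducible made-up data

# Hypothetical balanced design: every race/poverty/smoke combination, 5 times
toy <- expand.grid(
  race    = factor(1:3),
  poverty = factor(0:1),
  smoke   = factor(0:1),
  rep     = 1:5
)
toy$duration <- rexp(nrow(toy), rate = 1 / 16)  # simulated, not real durations

# (a + b + c)^2 expands to all main effects plus all two-way interactions
fit <- aov(duration ~ (race + poverty + smoke)^2, data = toy)
tab <- anova(fit)
print(tab)
```

Fitting one model means every term is tested against the same residual variation. With unbalanced data such as bfeed, the sums of squares reported by anova() are sequential, so the order of terms in the formula can change the individual rows.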
http://vincentarelbundock.github.io/Rdatasets/doc/KMsurv/bfeed.html
Klein and Moeschberger (1997). Survival Analysis: Techniques for Censored and Truncated Data. Springer.
National Longitudinal Survey of Youth Handbook. The Ohio State University, 1995.
Complete R Code:
#Read in Breast Feeding Data
bfeed = read.csv("bfeed.csv")
# Display first 6 rows of data
head(bfeed)
#Display Summary of data
summary(bfeed)
#Display the structure of data
str(bfeed)
#Change the factors from integers to factors, display new structures
bfeed$race <- as.factor(bfeed$race)
str(bfeed$race)
bfeed$poverty <- as.factor(bfeed$poverty)
str(bfeed$poverty)
bfeed$smoke <- as.factor(bfeed$smoke)
str(bfeed$smoke)
bfeed$agemth <- as.factor(bfeed$agemth)
str(bfeed$agemth)
#Display the levels of four factors
levels(bfeed$race)
levels(bfeed$poverty)
levels(bfeed$smoke)
levels(bfeed$agemth)
#Display Histogram of Duration
hist(bfeed$duration, breaks = 50, main = "Frequencies of Duration of Breast Feeding", xlab = "Duration")
#Display boxplots of duration by each of the four factors
boxplot(bfeed$duration~bfeed$race, xlab = "Race", ylab = "Duration of Breast Feeding", main = "Race: 1= white, 2= black, 3= other")
boxplot(bfeed$duration~bfeed$poverty, xlab = "Poverty", ylab = "Duration of Breast Feeding", main = "Mother in Poverty: 1= Yes, 0= No")
boxplot(bfeed$duration~bfeed$smoke, xlab = "Smoke", ylab = "Duration of Breast Feeding", main = "Mother Smoked at Birth of Child:1= Yes, 0= No")
boxplot(bfeed$duration~bfeed$agemth, xlab = "Age of Mother", ylab = "Duration of Breast Feeding", main = "Age of Mother at Birth of Child")
#Perform Analysis of Variance test for Main Effects
ANOVA1 = aov(bfeed$duration~bfeed$race)
anova(ANOVA1)
ANOVA2 = aov(bfeed$duration~bfeed$poverty)
anova(ANOVA2)
ANOVA3 = aov(bfeed$duration~bfeed$smoke)
anova(ANOVA3)
ANOVA4 = aov(bfeed$duration~bfeed$agemth)
anova(ANOVA4)
#Perform ANOVA for Interaction Effects
ANOVA_12 <- aov(bfeed$duration~bfeed$race*bfeed$poverty)
anova(ANOVA_12)
interaction.plot(bfeed$race, bfeed$poverty, bfeed$duration)
ANOVA_13 <- aov(bfeed$duration~bfeed$race*bfeed$smoke)
anova(ANOVA_13)
interaction.plot(bfeed$race, bfeed$smoke, bfeed$duration)
ANOVA_14 <- aov(bfeed$duration~bfeed$race*bfeed$agemth)
anova(ANOVA_14)
interaction.plot(bfeed$race, bfeed$agemth, bfeed$duration)
ANOVA_23 <- aov(bfeed$duration~bfeed$poverty*bfeed$smoke)
anova(ANOVA_23)
interaction.plot(bfeed$poverty, bfeed$smoke, bfeed$duration)
ANOVA_24 <- aov(bfeed$duration~bfeed$poverty*bfeed$agemth)
anova(ANOVA_24)
interaction.plot(bfeed$poverty, bfeed$agemth, bfeed$duration)
ANOVA_34 <- aov(bfeed$duration~bfeed$smoke*bfeed$agemth)
anova(ANOVA_34)
interaction.plot(bfeed$smoke, bfeed$agemth, bfeed$duration)