This study analyzes data from birth weights of children in North Carolina in 2001. In this experiment, there are two 2-level factors and two 3-level factors; thus, the full factorial design is \(2^2\) * \(3^2\). Gender of the baby (sex) has two levels: male (1) or female (2). Mother smoking status (smoke) has two levels: yes (1) or no (2). The race of the mom (race) has three level: white (1), black (2), or other (3). Completed weeks of gestations (weeks) has 3 levels: pre-mature (<=36), ~full term (37-40), and past full term (>40). The dependent variable is the birth weight (weight) of the baby in ounces.
Exploratory analysis will first be conducted. ANOVA will be conducted along with alternatives to null hypothesis statistical testing. The data will be checked for aliasing. The experiment will culminate with model validation.
Randomization: As there was no control over randomization in data collection or assignment to treatments, the data will be randomized without replacement for analysis.
Replication: This study contains replicates as different mothers had children for the same duration of gestation, of the same race, babies of the same gender, and same mother smoking status.
Blocking: Blocking is used to reduce the variability of a sample. No blocking was used with this dataset. The effect of all factors is studied.
Week levels
## [1] "<=36" "37-40" ">40"
Race levels
## [1] "1" "2" "3"
Smoke levels
## [1] "1" "0"
Sex levels
## [1] "1" "2"
Head of table
## ID weeks race sex smoke weight
## 1 1 37-40 1 1 0 111
## 2 2 37-40 1 2 0 116
## 3 3 37-40 1 1 0 138
First, exploratory boxplots were conducted to study the individual levels of all factors.
The dataset was randomized.
## [1] 1445
## ID weeks race sex smoke weight
## 450 450 37-40 1 2 0 139
## 451 451 <=36 1 1 0 120
## 1091 1091 37-40 2 1 0 138
Fractional Factorial Design
A fractional factorial design was first done for the dataset.
## Warning: package 'FrF2' was built under R version 3.3.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.3.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.3.2
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
## creating full factorial with 64 runs ...
## A B C D E F
## 1 1 1 -1 1 1 1
## 2 -1 1 -1 1 1 -1
## 3 1 1 1 -1 -1 1
## 4 1 -1 1 1 1 -1
## 5 1 1 1 -1 1 1
## 6 -1 -1 -1 1 1 1
## 7 -1 -1 -1 1 -1 -1
## 8 -1 -1 1 1 -1 1
## 9 -1 -1 1 1 1 -1
## 10 -1 -1 -1 1 -1 1
## 11 1 1 -1 1 1 -1
## 12 1 -1 1 1 -1 -1
## 13 -1 1 -1 -1 1 1
## 14 1 -1 -1 -1 1 -1
## 15 -1 1 1 -1 -1 -1
## 16 1 1 1 -1 1 -1
## 17 -1 1 -1 1 -1 -1
## 18 1 -1 -1 -1 -1 -1
## 19 -1 1 -1 -1 -1 -1
## 20 -1 -1 1 -1 -1 -1
## 21 -1 -1 1 1 1 1
## 22 1 1 1 -1 -1 -1
## 23 -1 1 -1 -1 1 -1
## 24 1 1 1 1 1 1
## 25 -1 1 1 1 -1 -1
## 26 1 -1 -1 -1 1 1
## 27 -1 1 -1 1 1 1
## 28 -1 1 -1 -1 -1 1
## 29 1 -1 1 1 -1 1
## 30 1 -1 -1 1 -1 -1
## 31 -1 -1 -1 1 1 -1
## 32 -1 1 1 -1 1 -1
## 33 -1 1 1 1 1 -1
## 34 1 -1 1 1 1 1
## 35 -1 1 1 1 -1 1
## 36 -1 1 1 -1 -1 1
## 37 1 -1 1 -1 -1 1
## 38 -1 -1 -1 -1 1 -1
## 39 1 1 1 1 -1 1
## 40 1 -1 -1 1 1 -1
## 41 1 1 -1 -1 1 -1
## 42 -1 -1 1 -1 1 -1
## 43 -1 -1 -1 -1 -1 1
## 44 1 -1 -1 1 -1 1
## 45 1 -1 -1 1 1 1
## 46 1 -1 1 -1 1 -1
## 47 1 1 -1 1 -1 -1
## 48 -1 1 1 -1 1 1
## 49 1 1 -1 -1 -1 -1
## 50 1 -1 1 -1 -1 -1
## 51 -1 -1 -1 -1 -1 -1
## 52 -1 -1 1 1 -1 -1
## 53 1 1 1 1 1 -1
## 54 -1 1 1 1 1 1
## 55 -1 1 -1 1 -1 1
## 56 -1 -1 1 -1 -1 1
## 57 1 1 -1 -1 1 1
## 58 -1 -1 1 -1 1 1
## 59 1 -1 1 -1 1 1
## 60 -1 -1 -1 -1 1 1
## 61 1 1 1 1 -1 -1
## 62 1 -1 -1 -1 -1 1
## 63 1 1 -1 -1 -1 1
## 64 1 1 -1 1 -1 1
## class=design, type= full factorial
The 3-levels factors were each broken into two-two level factors. Below is the fractional factorial design, containing \(2^6\) or 64 runs, that would be used to study this dataset. The low, med, and high criteria were later used to select data for the factorial design.
## run A_sex B_smoke C_race D_race E_weeks F_weeks A_sex.1 B_smoke.1
## 1 1 0 1 0 0 0 1 low high
## 2 2 1 0 0 0 1 0 high low
## 3 3 1 0 1 0 1 1 high low
## 4 4 0 1 1 0 0 0 low high
## 5 5 1 0 1 1 1 0 high low
## 6 6 0 1 1 1 1 0 low high
## 7 7 0 1 1 1 1 1 low high
## 8 8 1 1 1 0 0 1 high high
## 9 9 0 0 1 0 1 0 low low
## 10 10 0 1 1 0 1 0 low high
## 11 11 0 1 1 0 0 1 low high
## 12 12 0 1 0 1 1 0 low high
## 13 13 0 0 1 0 0 0 low low
## 14 14 1 0 0 0 0 0 high low
## 15 15 1 1 1 0 1 1 high high
## 16 16 0 1 1 1 0 0 low high
## 17 17 0 0 1 1 0 0 low low
## 18 18 1 1 1 1 1 0 high high
## 19 19 0 1 1 1 0 1 low high
## 20 20 0 0 1 0 0 1 low low
## 21 21 0 1 0 1 1 1 low high
## 22 22 1 1 1 0 0 0 high high
## 23 23 1 1 0 1 1 1 high high
## 24 24 1 0 1 1 0 0 high low
## 25 25 1 1 0 0 0 1 high high
## 26 26 1 0 1 0 0 0 high low
## 27 27 1 1 1 1 0 1 high high
## 28 28 1 1 1 1 0 0 high high
## 29 29 1 0 0 1 0 0 high low
## 30 30 1 1 1 1 1 1 high high
## 31 31 1 0 1 0 1 0 high low
## 32 32 0 0 0 0 0 1 low low
## 33 33 1 0 0 1 1 1 high low
## 34 34 1 1 0 1 1 0 high high
## 35 35 1 1 0 1 0 1 high high
## 36 36 1 0 1 1 1 1 high low
## 37 37 1 1 1 0 1 0 high high
## 38 38 0 1 0 1 0 0 low high
## 39 39 0 0 0 0 0 0 low low
## 40 40 0 0 0 0 1 1 low low
## 41 41 0 1 0 0 0 0 low high
## 42 42 1 1 0 0 0 0 high high
## 43 43 0 0 1 0 1 1 low low
## 44 44 0 1 0 0 1 1 low high
## 45 45 0 0 1 1 1 1 low low
## 46 46 1 1 0 1 0 0 high high
## 47 47 1 1 0 0 1 1 high high
## 48 48 0 0 0 1 1 0 low low
## 49 49 1 0 0 0 1 1 high low
## 50 50 0 0 0 1 1 1 low low
## 51 51 0 0 0 0 1 0 low low
## 52 52 0 0 0 1 0 0 low low
## 53 53 0 0 1 1 1 0 low low
## 54 54 1 0 1 0 0 1 high low
## 55 55 0 1 0 0 1 0 low high
## 56 56 1 0 1 1 0 1 high low
## 57 57 1 1 0 0 1 0 high high
## 58 58 0 0 1 1 0 1 low low
## 59 59 1 0 0 1 0 1 high low
## 60 60 1 0 0 0 0 1 high low
## 61 61 0 1 1 0 1 1 low high
## 62 62 0 1 0 1 0 1 low high
## 63 63 1 0 0 1 1 0 high low
## 64 64 0 0 0 1 0 1 low low
## X1_race X2_weeks
## 1 low med
## 2 low med
## 3 med high
## 4 med low
## 5 high med
## 6 high med
## 7 high high
## 8 med med
## 9 med med
## 10 med med
## 11 med med
## 12 med med
## 13 med low
## 14 low low
## 15 med high
## 16 high low
## 17 high low
## 18 high med
## 19 high med
## 20 med med
## 21 med high
## 22 med low
## 23 med high
## 24 high low
## 25 low med
## 26 med low
## 27 high med
## 28 high low
## 29 med low
## 30 high high
## 31 med med
## 32 low med
## 33 med high
## 34 med med
## 35 med med
## 36 high high
## 37 med med
## 38 med low
## 39 low low
## 40 low high
## 41 low low
## 42 low low
## 43 med high
## 44 low high
## 45 high high
## 46 med low
## 47 low high
## 48 med med
## 49 low high
## 50 med high
## 51 low med
## 52 med low
## 53 high med
## 54 med med
## 55 low med
## 56 high med
## 57 low med
## 58 high med
## 59 med med
## 60 low med
## 61 med high
## 62 med med
## 63 med med
## 64 med med
Data were all first converted to either 0 and 1 for 2-level factors and 0, 1, and 2 for 3-level factors.
A for-loop was used to match levels in the fractional factorial design with levels of the actual dataset. For each of the 64 runs, one data point was randomly selected for each criterion. The main issue with the use of the fractional factorial design is that the combination of levels for run 30 did not exist within the actual dataset. The birth weights selected for each run are displayed below.
## [,1]
## [1,] 113
## [2,] 127
## [3,] 112
## [4,] 35
## [5,] 139
## [6,] 105
## [7,] 110
## [8,] 107
## [9,] 123
## [10,] 95
## [11,] 95
## [12,] 117
## [13,] 32
## [14,] 53
## [15,] 129
## [16,] 96
## [17,] 131
## [18,] 123
## [19,] 105
## [20,] 87
## [21,] 133
## [22,] 84
## [23,] 129
## [24,] 120
## [25,] 111
## [26,] 93
## [27,] 123
## [28,] 88
## [29,] 36
## [30,] NA
## [31,] 119
## [32,] 136
## [33,] 116
## [34,] 98
## [35,] 128
## [36,] 114
## [37,] 83
## [38,] 73
## [39,] 94
## [40,] 116
## [41,] 115
## [42,] 78
## [43,] 122
## [44,] 90
## [45,] 102
## [46,] 101
## [47,] 118
## [48,] 130
## [49,] 148
## [50,] 103
## [51,] 137
## [52,] 128
## [53,] 112
## [54,] 117
## [55,] 134
## [56,] 153
## [57,] 102
## [58,] 118
## [59,] 100
## [60,] 131
## [61,] 115
## [62,] 109
## [63,] 102
## [64,] 116
1/8 Fractional Factorial Design
As 64 runs is costly and time-consuming, this experiment can be further reduced using a 1/8 fractional factorial design. Thus, the resulting experiment is a \(2^{6-3}\) fractional factorial design experiment. In this experiment, there are 6 factors, each with 2 levels, and 8 runs total. This is a resolution III experiment, where both me and 2fi are confounded by 2-factor (or greater) interactions. The aliasing is also displayed below.
## sex smoke race1 race2 weeks1 weeks2
## 1 1 1 -1 1 -1 -1
## 2 1 -1 -1 -1 -1 1
## 3 1 -1 1 -1 1 -1
## 4 1 1 1 1 1 1
## 5 -1 -1 1 1 -1 -1
## 6 -1 -1 -1 1 1 1
## 7 -1 1 1 -1 -1 1
## 8 -1 1 -1 -1 1 -1
## class=design, type= FrF2
## $legend
## [1] "A=sex" "B=smoke" "C=race1" "D=race2" "E=weeks1" "F=weeks2"
##
## $main
## [1] "A=BD=CE" "B=AD=CF" "C=AE=BF" "D=AB=EF" "E=AC=DF" "F=BC=DE"
##
## $fi2
## [1] "AF=BE=CD"
The run selection was repeated per the procedure for the fractional factorial design, where actual treatment combinations are see broken up by two levels.
## run sex smoke race1 race2 weeks1 weeks2 sex.1 smoke.1 race weeks
## 1 1 0 0 1 1 0 0 low low high low
## 2 2 0 1 0 0 1 0 low high low med
## 3 3 0 1 1 0 0 1 low high med med
## 4 4 0 0 0 1 1 1 low low med high
## 5 5 1 0 0 0 0 1 high low low med
## 6 6 1 1 0 1 0 0 high high med low
## 7 7 1 1 1 1 1 1 high high high high
## 8 8 1 0 1 0 1 0 high low med med
After the for-loop was used to match levels in the 1/8 fractional factorial design with levels of the actual dataset, the output is shown below. Again, the issue is that the levels of one of the experimental runs has no match within the actual dataset. This is a major flaw with fractional designs and suggests they can only be used for certain datasets.
## [,1]
## [1,] 113
## [2,] 127
## [3,] 112
## [4,] 35
## [5,] 139
## [6,] 105
## [7,] NA
## [8,] 107
Function Generator
To calculate the function generation for the 1/8 fractional factorial design, after the 6 levels are randomly chosen using the FrF2 function, ABC is determined by multiplying the signs (+,-). All levels are then multiplied to determine the signs of the function generator, which are all positive.
## run sex_A smoke_B race1_C race2_AB weeks1_AC weeks2_BC ABC I
## 1 1 0 0 1 1 0 0 0 1
## 2 2 0 1 0 0 1 0 1 1
## 3 3 0 1 1 0 0 1 1 1
## 4 4 0 0 0 1 1 1 0 1
## 5 5 1 0 0 0 0 1 1 1
## 6 6 1 1 0 1 0 0 0 1
## 7 7 1 1 1 1 1 1 0 1
## 8 8 1 0 1 0 1 0 1 1
Main and Interaction Effects
Main effects were then calculated for the 6 factors. The aliasing combinations were used to also determine the interaction effects, listed in a table below.
## effects
## me_sex 78.75
## me_smoke 73.50
## me_race1 64.50
## me_race2 5.25
## me_weeks1 17.25
## me_weeks2 30.00
## if_AB 5.25
## if_AC 17.25
## if_AD 73.50
## if_AE 64.50
## if_BC 30.00
## if_BD 78.75
## if_BF 64.50
## if_CE 78.75
## if_CF 73.50
## if_DE 30.00
## if_DF 17.25
## if_EF 5.25
ANOVA
Finally, an ANOVA was run based on the main effect data. As all factors had quite large effects, all were included in the linear model. By ANOVA, all factors had signficant effects on child birth weight.
## Df Sum Sq Mean Sq F value Pr(>F)
## df$sex 1 2000 2000 5.492 0.01924 *
## df$smoke 1 12597 12597 34.591 5.05e-09 ***
## df$race 1 3177 3177 8.724 0.00319 **
## df$weeks 2 178151 89075 244.599 < 2e-16 ***
## Residuals 1439 524039 364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model Adequacy Checking
Model adequacy checking was done to determine the legitmacy of the model. The residuals vs. fitted plot shows a relatively uniform distribution of values. The QQ-plot only has significant deviation for negative theoretical quantiles suggesting a relatively good model but one that still could be improved.
Conclusions
Ultimately, these results suggest mothers smoking during pregnancy, baby gender, race, and weeks of gestation all have significant effects on the birth weight of babies. However, the model could be improved a bit more. As aliasing played a role in and we are unable to determine whether effects are solely due to main effects, 2 factor interactions, or even higher order interactions, main effects were included in this model. Thus, future models would benefit from closer consideration of 2fi. Additionally, fractional factorial design was not an appropriate method for analysis for this dataset, as not all of the runs were represented in the dataset, requiring further analysis via different methods.
This section contains all code used to analyze this data.
#**Week levels**
week_levels = c("<=36", "37-40", ">40")
week_levels
## [1] "<=36" "37-40" ">40"
#**Race levels**
Race_levels = c("1", "2", "3")
Race_levels
## [1] "1" "2" "3"
#**Smoke levels**
smoke_levels = c("1", "0")
smoke_levels
## [1] "1" "0"
#**Sex levels**
sex_levels = c("1", "2")
sex_levels
## [1] "1" "2"
#**Head of dataset**
#Load in data using Utils package
library(utils)
setwd("C:/Users/Alexis/Documents/R/win-library/3.3")
df <- read.csv("120316_project3birthweightdata.csv", header = TRUE)
df <- df[complete.cases(df),]
#Show first 3 rows of data table
head(df, 3)
## ID weeks race sex smoke weight
## 1 1 37-40 1 1 0 111
## 2 2 37-40 1 2 0 116
## 3 3 37-40 1 1 0 138
#first, exploratory boxplots were conducted to study the individual levels of all factors
#Boxplot examining levels of each factor for exploratory analysis
boxplotweeks <- boxplot(df$weight~df$weeks, xlab = "Weeks of Gestation", ylab = "Birth Weight", main = "Birth Weight vs. Weeks of Gestation")
boxplotrace <- boxplot(df$weight~df$race, xlab = "Race of mother", ylab = "Birth Weight", main = "Birth Weight vs. Race of Mother")
boxplotsmoke <- boxplot(df$weight~df$smoke, xlab = "Mother Smoking Status", ylab = "Birth Weight", main = "Birth Weight vs. Mother Smoking Status")
boxplotsex <- boxplot(df$weight~df$sex, xlab = "Gender of Baby", ylab = "Birth Weight", main = "Birth Weight vs. Gender of Baby")
#the dataset was randomized
#To randomize data, data were randomly selected and combined to form a new matrix.
sample_size = length(1:nrow(df))
sample_size
## [1] 1445
dfrand = df[sample(1:nrow(df), size = sample_size, replace = FALSE),]
head(dfrand,3)
## ID weeks race sex smoke weight
## 78 78 37-40 1 1 0 114
## 921 921 37-40 1 1 0 140
## 194 194 >40 3 2 0 110
#a fractional factorial design was first done for the dataset
library(FrF2)
FrF2(64,nfactors=6,estimable=formula("~A+B+C+D+A:(B+C+D)"), clear=FALSE)
## creating full factorial with 64 runs ...
## A B C D E F
## 1 1 -1 -1 -1 1 -1
## 2 -1 -1 -1 -1 1 -1
## 3 1 1 1 1 1 1
## 4 1 -1 -1 -1 -1 1
## 5 -1 -1 1 1 1 -1
## 6 1 -1 1 1 -1 -1
## 7 -1 1 -1 1 1 -1
## 8 1 -1 -1 -1 1 1
## 9 -1 -1 1 1 -1 -1
## 10 1 -1 -1 1 1 -1
## 11 1 -1 -1 1 -1 -1
## 12 -1 -1 -1 1 1 -1
## 13 -1 1 1 1 1 1
## 14 1 1 -1 -1 -1 -1
## 15 1 1 1 -1 -1 1
## 16 -1 1 -1 -1 -1 1
## 17 1 1 1 1 -1 1
## 18 -1 -1 1 1 1 1
## 19 1 1 -1 -1 -1 1
## 20 1 -1 1 -1 -1 1
## 21 1 1 1 1 1 -1
## 22 1 1 -1 -1 1 -1
## 23 -1 1 1 -1 1 -1
## 24 -1 -1 1 -1 1 -1
## 25 -1 -1 -1 1 -1 1
## 26 -1 1 -1 -1 1 1
## 27 1 1 -1 -1 1 1
## 28 -1 -1 -1 -1 1 1
## 29 -1 1 -1 1 1 1
## 30 1 -1 1 1 -1 1
## 31 1 1 1 -1 -1 -1
## 32 -1 -1 1 -1 -1 1
## 33 -1 -1 -1 -1 -1 -1
## 34 1 -1 1 1 1 -1
## 35 1 1 -1 1 1 1
## 36 1 1 1 -1 1 1
## 37 -1 1 -1 -1 -1 -1
## 38 -1 1 1 1 -1 1
## 39 -1 1 1 -1 -1 -1
## 40 -1 1 -1 1 -1 -1
## 41 -1 1 1 -1 1 1
## 42 -1 -1 1 -1 1 1
## 43 -1 -1 -1 -1 -1 1
## 44 -1 1 1 -1 -1 1
## 45 1 -1 -1 1 -1 1
## 46 -1 1 1 1 -1 -1
## 47 1 -1 -1 1 1 1
## 48 -1 -1 1 1 -1 1
## 49 -1 -1 -1 1 -1 -1
## 50 1 -1 1 -1 -1 -1
## 51 -1 -1 1 -1 -1 -1
## 52 1 1 -1 1 -1 -1
## 53 1 1 -1 1 -1 1
## 54 1 -1 1 -1 1 1
## 55 1 -1 1 -1 1 -1
## 56 -1 1 -1 -1 1 -1
## 57 1 1 -1 1 1 -1
## 58 -1 1 1 1 1 -1
## 59 -1 1 -1 1 -1 1
## 60 1 1 1 1 -1 -1
## 61 -1 -1 -1 1 1 1
## 62 1 -1 1 1 1 1
## 63 1 1 1 -1 1 -1
## 64 1 -1 -1 -1 -1 -1
## class=design, type= full factorial
#load in fractional factorial design table
df2 <- read.csv("120716_DoE_FractionalFactorialDesign.csv", header = TRUE)
df2
## run A_sex B_smoke C_race D_race E_weeks F_weeks A_sex.1 B_smoke.1
## 1 1 0 1 0 0 0 1 low high
## 2 2 1 0 0 0 1 0 high low
## 3 3 1 0 1 0 1 1 high low
## 4 4 0 1 1 0 0 0 low high
## 5 5 1 0 1 1 1 0 high low
## 6 6 0 1 1 1 1 0 low high
## 7 7 0 1 1 1 1 1 low high
## 8 8 1 1 1 0 0 1 high high
## 9 9 0 0 1 0 1 0 low low
## 10 10 0 1 1 0 1 0 low high
## 11 11 0 1 1 0 0 1 low high
## 12 12 0 1 0 1 1 0 low high
## 13 13 0 0 1 0 0 0 low low
## 14 14 1 0 0 0 0 0 high low
## 15 15 1 1 1 0 1 1 high high
## 16 16 0 1 1 1 0 0 low high
## 17 17 0 0 1 1 0 0 low low
## 18 18 1 1 1 1 1 0 high high
## 19 19 0 1 1 1 0 1 low high
## 20 20 0 0 1 0 0 1 low low
## 21 21 0 1 0 1 1 1 low high
## 22 22 1 1 1 0 0 0 high high
## 23 23 1 1 0 1 1 1 high high
## 24 24 1 0 1 1 0 0 high low
## 25 25 1 1 0 0 0 1 high high
## 26 26 1 0 1 0 0 0 high low
## 27 27 1 1 1 1 0 1 high high
## 28 28 1 1 1 1 0 0 high high
## 29 29 1 0 0 1 0 0 high low
## 30 30 1 1 1 1 1 1 high high
## 31 31 1 0 1 0 1 0 high low
## 32 32 0 0 0 0 0 1 low low
## 33 33 1 0 0 1 1 1 high low
## 34 34 1 1 0 1 1 0 high high
## 35 35 1 1 0 1 0 1 high high
## 36 36 1 0 1 1 1 1 high low
## 37 37 1 1 1 0 1 0 high high
## 38 38 0 1 0 1 0 0 low high
## 39 39 0 0 0 0 0 0 low low
## 40 40 0 0 0 0 1 1 low low
## 41 41 0 1 0 0 0 0 low high
## 42 42 1 1 0 0 0 0 high high
## 43 43 0 0 1 0 1 1 low low
## 44 44 0 1 0 0 1 1 low high
## 45 45 0 0 1 1 1 1 low low
## 46 46 1 1 0 1 0 0 high high
## 47 47 1 1 0 0 1 1 high high
## 48 48 0 0 0 1 1 0 low low
## 49 49 1 0 0 0 1 1 high low
## 50 50 0 0 0 1 1 1 low low
## 51 51 0 0 0 0 1 0 low low
## 52 52 0 0 0 1 0 0 low low
## 53 53 0 0 1 1 1 0 low low
## 54 54 1 0 1 0 0 1 high low
## 55 55 0 1 0 0 1 0 low high
## 56 56 1 0 1 1 0 1 high low
## 57 57 1 1 0 0 1 0 high high
## 58 58 0 0 1 1 0 1 low low
## 59 59 1 0 0 1 0 1 high low
## 60 60 1 0 0 0 0 1 high low
## 61 61 0 1 1 0 1 1 low high
## 62 62 0 1 0 1 0 1 low high
## 63 63 1 0 0 1 1 0 high low
## 64 64 0 0 0 1 0 1 low low
## X1_race X2_weeks
## 1 low med
## 2 low med
## 3 med high
## 4 med low
## 5 high med
## 6 high med
## 7 high high
## 8 med med
## 9 med med
## 10 med med
## 11 med med
## 12 med med
## 13 med low
## 14 low low
## 15 med high
## 16 high low
## 17 high low
## 18 high med
## 19 high med
## 20 med med
## 21 med high
## 22 med low
## 23 med high
## 24 high low
## 25 low med
## 26 med low
## 27 high med
## 28 high low
## 29 med low
## 30 high high
## 31 med med
## 32 low med
## 33 med high
## 34 med med
## 35 med med
## 36 high high
## 37 med med
## 38 med low
## 39 low low
## 40 low high
## 41 low low
## 42 low low
## 43 med high
## 44 low high
## 45 high high
## 46 med low
## 47 low high
## 48 med med
## 49 low high
## 50 med high
## 51 low med
## 52 med low
## 53 high med
## 54 med med
## 55 low med
## 56 high med
## 57 low med
## 58 high med
## 59 med med
## 60 low med
## 61 med high
## 62 med med
## 63 med med
## 64 med med
#use for loop to select one data point for each fractional factorial design run
vars <- cbind.data.frame(df$sex, df$smoke, df$race, df$weeks)
combos <- cbind.data.frame(df2$A_sex, df2$B_smoke, df2$X1_race, df2$X2_weeks)
vars[,4] <- as.character(vars[,4])
vars[which(vars[,1]==1),1] <- 0
vars[which(vars[,1]==2),1] <- 1
vars[which(vars[,2]==0),2] <- 0
vars[which(vars[,2]==1),2] <- 1
vars[which(vars[,3]==1),3] <- 0
vars[which(vars[,3]==2),3] <- 1
vars[which(vars[,3]==3),3] <- 2
vars[which(vars[,4]=="<=36"),4] <- 0
vars[which(vars[,4]=="37-40"),4] <- 1
vars[which(vars[,4]==">40"),4] <- 2
combos[,1] <- as.character(combos[,1])
combos[which(combos[,1]=="low"),1] <- 0
combos[which(combos[,1]=="high"),1] <- 1
combos[,2] <- as.character(combos[,2])
combos[which(combos[,2]=="low"),2] <- 0
combos[which(combos[,2]=="high"),2] <- 1
combos[,3] <- as.character(combos[,3])
combos[which(combos[,3]=="low"),3] <- 0
combos[which(combos[,3]=="med"),3] <- 1
combos[which(combos[,3]=="high"),3] <- 2
combos[,4] <- as.character(combos[,4])
combos[which(combos[,4]=="low"),4] <- 0
combos[which(combos[,4]=="med"),4] <- 1
combos[which(combos[,4]=="high"),4] <- 2
combos[,1] <- as.numeric(combos[,1])
combos[,2] <- as.numeric(combos[,2])
combos[,3] <- as.numeric(combos[,3])
combos[,4] <- as.numeric(combos[,4])
good <- matrix(0,nrow(combos))
for (i in 1:nrow(combos))
{
row_i <- combos[i,]
row_i <- as.numeric(row_i)
indexes <- which(vars[,1]==row_i[1] & vars[,2]==row_i[2] & vars[,3]==row_i[3] & vars[,4]==row_i[4])
if(length(indexes)>0)
{
temp <- sample(1:length(indexes),size = 1, replace=FALSE)
good[i] <- indexes[temp]
}
}
#save response variable for each fractional factorial design run in a single matrix
ans<-df$weight[good]
ans <- c(ans[1:29],NA,ans[30:length(ans)])
ans <- as.matrix(ans)
ans
## [,1]
## [1,] 116
## [2,] 123
## [3,] 125
## [4,] 66
## [5,] 129
## [6,] 105
## [7,] 110
## [8,] 111
## [9,] 153
## [10,] 98
## [11,] 108
## [12,] 159
## [13,] 91
## [14,] 102
## [15,] 130
## [16,] 96
## [17,] 70
## [18,] 123
## [19,] 105
## [20,] 94
## [21,] 122
## [22,] 100
## [23,] 129
## [24,] 93
## [25,] 133
## [26,] 36
## [27,] 123
## [28,] 88
## [29,] 87
## [30,] NA
## [31,] 97
## [32,] 134
## [33,] 125
## [34,] 109
## [35,] 108
## [36,] 110
## [37,] 128
## [38,] 19
## [39,] 121
## [40,] 125
## [41,] 51
## [42,] 78
## [43,] 141
## [44,] 127
## [45,] 102
## [46,] 101
## [47,] 134
## [48,] 111
## [49,] 102
## [50,] 147
## [51,] 158
## [52,] 94
## [53,] 158
## [54,] 161
## [55,] 118
## [56,] 133
## [57,] 107
## [58,] 121
## [59,] 118
## [60,] 125
## [61,] 122
## [62,] 109
## [63,] 109
## [64,] 121
#randomly design 1/8 fractional factorial run and determine aliasing
eightdesign = FrF2(8,6,factor.names = c("sex","smoke","race1","race2","weeks1", "weeks2"))
eightdesign
## sex smoke race1 race2 weeks1 weeks2
## 1 -1 1 -1 -1 1 -1
## 2 -1 -1 1 1 -1 -1
## 3 1 -1 1 -1 1 -1
## 4 1 1 -1 1 -1 -1
## 5 -1 1 1 -1 -1 1
## 6 1 1 1 1 1 1
## 7 -1 -1 -1 1 1 1
## 8 1 -1 -1 -1 -1 1
## class=design, type= FrF2
design.info(eightdesign)$aliased
## $legend
## [1] "A=sex" "B=smoke" "C=race1" "D=race2" "E=weeks1" "F=weeks2"
##
## $main
## [1] "A=BD=CE" "B=AD=CF" "C=AE=BF" "D=AB=EF" "E=AC=DF" "F=BC=DE"
##
## $fi2
## [1] "AF=BE=CD"
#load in 1/8 fractional factorial design
df3 <- read.csv("120916_DoE_EighthFractionalFactorialDesign.csv", header = TRUE)
df3
## run sex smoke race1 race2 weeks1 weeks2 sex.1 smoke.1 race weeks
## 1 1 0 0 1 1 0 0 low low high low
## 2 2 0 1 0 0 1 0 low high low med
## 3 3 0 1 1 0 0 1 low high med med
## 4 4 0 0 0 1 1 1 low low med high
## 5 5 1 0 0 0 0 1 high low low med
## 6 6 1 1 0 1 0 0 high high med low
## 7 7 1 1 1 1 1 1 high high high high
## 8 8 1 0 1 0 1 0 high low med med
#for loop selection of data for each run
vars2 <- cbind.data.frame(df$sex, df$smoke, df$race, df$weeks)
combos2 <- cbind.data.frame(df3$sex, df3$smoke, df3$race, df3$weeks)
vars2[,4] <- as.character(vars2[,4])
vars2[which(vars2[,1]==1),1] <- 0
vars2[which(vars2[,1]==2),1] <- 1
vars2[which(vars2[,2]==0),2] <- 0
vars2[which(vars2[,2]==1),2] <- 1
vars2[which(vars2[,3]==1),3] <- 0
vars2[which(vars2[,3]==2),3] <- 1
vars2[which(vars2[,3]==3),3] <- 2
vars2[which(vars2[,4]=="<=36"),4] <- 0
vars2[which(vars2[,4]=="37-40"),4] <- 1
vars2[which(vars2[,4]==">40"),4] <- 2
combos2[,1] <- as.character(combos2[,1])
combos2[which(combos2[,1]=="low"),1] <- 0
combos2[which(combos2[,1]=="high"),1] <- 1
combos2[,2] <- as.character(combos2[,2])
combos2[which(combos2[,2]=="low"),2] <- 0
combos[which(combos[,2]=="high"),2] <- 1
combos2[,3] <- as.character(combos2[,3])
combos2[which(combos2[,3]=="low"),3] <- 0
combos2[which(combos2[,3]=="med"),3] <- 1
combos2[which(combos2[,3]=="high"),3] <- 2
combos2[,4] <- as.character(combos2[,4])
combos2[which(combos2[,4]=="low"),4] <- 0
combos2[which(combos2[,4]=="med"),4] <- 1
combos2[which(combos2[,4]=="high"),4] <- 2
combos2[,1] <- as.numeric(combos2[,1])
combos2[,2] <- as.numeric(combos2[,2])
combos2[,3] <- as.numeric(combos2[,3])
combos2[,4] <- as.numeric(combos2[,4])
good2 <- matrix(0,nrow(combos2))
for (i in 1:nrow(combos2))
{
row_i <- combos2[i,]
row_i <- as.numeric(row_i)
indexes <- which(vars2[,1]==row_i[1] & vars2[,2]==row_i[2] & vars2[,3]==row_i[3] & vars2[,4]==row_i[4])
if(length(indexes)>0)
{
temp <- sample(1:length(indexes),size = 1, replace=FALSE)
good2[i] <- indexes[temp]
}
}
#Save data from corresponding runs in a matrix
ans2 <- df$weight[good2]
ans2 <- c(ans[1:6],NA,ans[8])
ans2 <- as.matrix(ans2)
ans2
## [,1]
## [1,] 116
## [2,] 123
## [3,] 125
## [4,] 66
## [5,] 129
## [6,] 105
## [7,] NA
## [8,] 111
#loaded function generator table
df4 <- read.csv("121016_DoE_EighthFractionalFactorialDesignGenerator.csv", header = TRUE)
df4
## run sex_A smoke_B race1_C race2_AB weeks1_AC weeks2_BC ABC I
## 1 1 0 0 1 1 0 0 0 1
## 2 2 0 1 0 0 1 0 1 1
## 3 3 0 1 1 0 0 1 1 1
## 4 4 0 0 0 1 1 1 0 1
## 5 5 1 0 0 0 0 1 1 1
## 6 6 1 1 0 1 0 0 0 1
## 7 7 1 1 1 1 1 1 0 1
## 8 8 1 0 1 0 1 0 1 1
## Main effect calculations
me_sex <- 1/4 * ((ans2[5] + ans2[6] + ans2[8] + mean(ans2[5] + ans2[6] + ans2[8])) - (ans2[1] + ans2[2] + ans2[3] + ans2[4]))
me_sex
## [1] 65
me_smoke <- 1/4 * ((ans2[2] + ans2[3] + ans2[6] + mean(ans2[2] + ans2[3] + ans2[6])) - (ans2[1] + ans2[4] + ans2[5] + ans2[8]))
me_smoke
## [1] 71
me_race1 <- 1/4 * ((ans2[1] + ans2[3] + mean(ans2[1] + ans2[3] +ans2[8]) + ans2[8]) - (ans2[2] + ans2[4] + ans2[5] + ans2[6]))
me_race1
## [1] 70.25
me_race2<- 1/4 * ((ans2[1] + ans2[4] + ans2[6] + mean(ans2[1] + ans2[4] + ans2[6])) - (ans2[2] + ans2[3] + ans2[5] + ans2[8]))
me_race2
## [1] 21.5
me_weeks1 <- 1/4 * ((ans2[2] + ans2[4] + mean(ans2[2] + ans2[4] + ans2[8]) + ans2[8]) - (ans2[1] + ans2[3] + ans2[5] + ans2[6]))
me_weeks1
## [1] 31.25
me_weeks2 <- 1/4 * ((ans2[3] + ans2[4] + ans2[5] + mean(ans2[3] + ans2[4] + ans2[5])) - (ans2[1] + ans2[2] + ans2[6] + ans2[8]))
me_weeks2
## [1] 46.25
#main effects and interaction effects
meif <- as.matrix(c(me_sex,me_smoke,me_race1,me_race2,me_weeks1,me_weeks2, me_race2, me_weeks1, me_smoke, me_race1, me_weeks2, me_sex, me_race1, me_sex, me_smoke, me_weeks2, me_weeks1, me_race2))
rownames(meif) <- c("me_sex","me_smoke","me_race1","me_race2","me_weeks1","me_weeks2","if_AB","if_AC","if_AD","if_AE","if_BC","if_BD","if_BF","if_CE","if_CF","if_DE","if_DF","if_EF")
colnames(meif) <- c("effects")
meif
## effects
## me_sex 65.00
## me_smoke 71.00
## me_race1 70.25
## me_race2 21.50
## me_weeks1 31.25
## me_weeks2 46.25
## if_AB 21.50
## if_AC 31.25
## if_AD 71.00
## if_AE 70.25
## if_BC 46.25
## if_BD 65.00
## if_BF 70.25
## if_CE 65.00
## if_CF 71.00
## if_DE 46.25
## if_DF 31.25
## if_EF 21.50
#ANOVA
fit <- aov(df$weight~df$sex + df$smoke + df$race + df$weeks)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## df$sex 1 2000 2000 5.492 0.01924 *
## df$smoke 1 12597 12597 34.591 5.05e-09 ***
## df$race 1 3177 3177 8.724 0.00319 **
## df$weeks 2 178151 89075 244.599 < 2e-16 ***
## Residuals 1439 524039 364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#**Model Adequacy Checking**
#residuals vs. fitted and normal Q-Q plot
plot(fit, which = c(1:2), caption = list("Residuals vs Fitted", "Normal Q-Q"))