1. Setting

This study analyzes data from birth weights of children in North Carolina in 2001. In this experiment, there are two 2-level factors and two 3-level factors; thus, the full factorial design is \(2^2\) * \(3^2\). Gender of the baby (sex) has two levels: male (1) or female (2). Mother smoking status (smoke) has two levels: yes (1) or no (2). The race of the mom (race) has three level: white (1), black (2), or other (3). Completed weeks of gestations (weeks) has 3 levels: pre-mature (<=36), ~full term (37-40), and past full term (>40). The dependent variable is the birth weight (weight) of the baby in ounces.

2. Design

Exploratory analysis will first be conducted. ANOVA will be conducted along with alternatives to null hypothesis statistical testing. The data will be checked for aliasing. The experiment will culminate with model validation.

Randomization: As there was no control over randomization in data collection or assignment to treatments, the data will be randomized without replacement for analysis.

Replication: This study contains replicates as different mothers had children for the same duration of gestation, of the same race, babies of the same gender, and same mother smoking status.

Blocking: Blocking is used to reduce the variability of a sample. No blocking was used with this dataset. The effect of all factors is studied.

3. Statistical Analysis

Exploratory Boxplots

Week levels

## [1] "<=36"  "37-40" ">40"

Race levels

## [1] "1" "2" "3"

Smoke levels

## [1] "1" "0"

Sex levels

## [1] "1" "2"

Head of table

##   ID weeks race sex smoke weight
## 1  1 37-40    1   1     0    111
## 2  2 37-40    1   2     0    116
## 3  3 37-40    1   1     0    138

First, exploratory boxplots were conducted to study the individual levels of all factors.

The dataset was randomized.

## [1] 1445
##        ID weeks race sex smoke weight
## 450   450 37-40    1   2     0    139
## 451   451  <=36    1   1     0    120
## 1091 1091 37-40    2   1     0    138

Fractional Factorial Design

A fractional factorial design was first done for the dataset.

## Warning: package 'FrF2' was built under R version 3.3.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.3.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.3.2
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths
## creating full factorial with 64 runs ...
##     A  B  C  D  E  F
## 1   1  1 -1  1  1  1
## 2  -1  1 -1  1  1 -1
## 3   1  1  1 -1 -1  1
## 4   1 -1  1  1  1 -1
## 5   1  1  1 -1  1  1
## 6  -1 -1 -1  1  1  1
## 7  -1 -1 -1  1 -1 -1
## 8  -1 -1  1  1 -1  1
## 9  -1 -1  1  1  1 -1
## 10 -1 -1 -1  1 -1  1
## 11  1  1 -1  1  1 -1
## 12  1 -1  1  1 -1 -1
## 13 -1  1 -1 -1  1  1
## 14  1 -1 -1 -1  1 -1
## 15 -1  1  1 -1 -1 -1
## 16  1  1  1 -1  1 -1
## 17 -1  1 -1  1 -1 -1
## 18  1 -1 -1 -1 -1 -1
## 19 -1  1 -1 -1 -1 -1
## 20 -1 -1  1 -1 -1 -1
## 21 -1 -1  1  1  1  1
## 22  1  1  1 -1 -1 -1
## 23 -1  1 -1 -1  1 -1
## 24  1  1  1  1  1  1
## 25 -1  1  1  1 -1 -1
## 26  1 -1 -1 -1  1  1
## 27 -1  1 -1  1  1  1
## 28 -1  1 -1 -1 -1  1
## 29  1 -1  1  1 -1  1
## 30  1 -1 -1  1 -1 -1
## 31 -1 -1 -1  1  1 -1
## 32 -1  1  1 -1  1 -1
## 33 -1  1  1  1  1 -1
## 34  1 -1  1  1  1  1
## 35 -1  1  1  1 -1  1
## 36 -1  1  1 -1 -1  1
## 37  1 -1  1 -1 -1  1
## 38 -1 -1 -1 -1  1 -1
## 39  1  1  1  1 -1  1
## 40  1 -1 -1  1  1 -1
## 41  1  1 -1 -1  1 -1
## 42 -1 -1  1 -1  1 -1
## 43 -1 -1 -1 -1 -1  1
## 44  1 -1 -1  1 -1  1
## 45  1 -1 -1  1  1  1
## 46  1 -1  1 -1  1 -1
## 47  1  1 -1  1 -1 -1
## 48 -1  1  1 -1  1  1
## 49  1  1 -1 -1 -1 -1
## 50  1 -1  1 -1 -1 -1
## 51 -1 -1 -1 -1 -1 -1
## 52 -1 -1  1  1 -1 -1
## 53  1  1  1  1  1 -1
## 54 -1  1  1  1  1  1
## 55 -1  1 -1  1 -1  1
## 56 -1 -1  1 -1 -1  1
## 57  1  1 -1 -1  1  1
## 58 -1 -1  1 -1  1  1
## 59  1 -1  1 -1  1  1
## 60 -1 -1 -1 -1  1  1
## 61  1  1  1  1 -1 -1
## 62  1 -1 -1 -1 -1  1
## 63  1  1 -1 -1 -1  1
## 64  1  1 -1  1 -1  1
## class=design, type= full factorial

The 3-levels factors were each broken into two-two level factors. Below is the fractional factorial design, containing \(2^6\) or 64 runs, that would be used to study this dataset. The low, med, and high criteria were later used to select data for the factorial design.

##    run A_sex B_smoke C_race D_race E_weeks F_weeks A_sex.1 B_smoke.1
## 1    1     0       1      0      0       0       1     low      high
## 2    2     1       0      0      0       1       0    high       low
## 3    3     1       0      1      0       1       1    high       low
## 4    4     0       1      1      0       0       0     low      high
## 5    5     1       0      1      1       1       0    high       low
## 6    6     0       1      1      1       1       0     low      high
## 7    7     0       1      1      1       1       1     low      high
## 8    8     1       1      1      0       0       1    high      high
## 9    9     0       0      1      0       1       0     low       low
## 10  10     0       1      1      0       1       0     low      high
## 11  11     0       1      1      0       0       1     low      high
## 12  12     0       1      0      1       1       0     low      high
## 13  13     0       0      1      0       0       0     low       low
## 14  14     1       0      0      0       0       0    high       low
## 15  15     1       1      1      0       1       1    high      high
## 16  16     0       1      1      1       0       0     low      high
## 17  17     0       0      1      1       0       0     low       low
## 18  18     1       1      1      1       1       0    high      high
## 19  19     0       1      1      1       0       1     low      high
## 20  20     0       0      1      0       0       1     low       low
## 21  21     0       1      0      1       1       1     low      high
## 22  22     1       1      1      0       0       0    high      high
## 23  23     1       1      0      1       1       1    high      high
## 24  24     1       0      1      1       0       0    high       low
## 25  25     1       1      0      0       0       1    high      high
## 26  26     1       0      1      0       0       0    high       low
## 27  27     1       1      1      1       0       1    high      high
## 28  28     1       1      1      1       0       0    high      high
## 29  29     1       0      0      1       0       0    high       low
## 30  30     1       1      1      1       1       1    high      high
## 31  31     1       0      1      0       1       0    high       low
## 32  32     0       0      0      0       0       1     low       low
## 33  33     1       0      0      1       1       1    high       low
## 34  34     1       1      0      1       1       0    high      high
## 35  35     1       1      0      1       0       1    high      high
## 36  36     1       0      1      1       1       1    high       low
## 37  37     1       1      1      0       1       0    high      high
## 38  38     0       1      0      1       0       0     low      high
## 39  39     0       0      0      0       0       0     low       low
## 40  40     0       0      0      0       1       1     low       low
## 41  41     0       1      0      0       0       0     low      high
## 42  42     1       1      0      0       0       0    high      high
## 43  43     0       0      1      0       1       1     low       low
## 44  44     0       1      0      0       1       1     low      high
## 45  45     0       0      1      1       1       1     low       low
## 46  46     1       1      0      1       0       0    high      high
## 47  47     1       1      0      0       1       1    high      high
## 48  48     0       0      0      1       1       0     low       low
## 49  49     1       0      0      0       1       1    high       low
## 50  50     0       0      0      1       1       1     low       low
## 51  51     0       0      0      0       1       0     low       low
## 52  52     0       0      0      1       0       0     low       low
## 53  53     0       0      1      1       1       0     low       low
## 54  54     1       0      1      0       0       1    high       low
## 55  55     0       1      0      0       1       0     low      high
## 56  56     1       0      1      1       0       1    high       low
## 57  57     1       1      0      0       1       0    high      high
## 58  58     0       0      1      1       0       1     low       low
## 59  59     1       0      0      1       0       1    high       low
## 60  60     1       0      0      0       0       1    high       low
## 61  61     0       1      1      0       1       1     low      high
## 62  62     0       1      0      1       0       1     low      high
## 63  63     1       0      0      1       1       0    high       low
## 64  64     0       0      0      1       0       1     low       low
##    X1_race X2_weeks
## 1      low      med
## 2      low      med
## 3      med     high
## 4      med      low
## 5     high      med
## 6     high      med
## 7     high     high
## 8      med      med
## 9      med      med
## 10     med      med
## 11     med      med
## 12     med      med
## 13     med      low
## 14     low      low
## 15     med     high
## 16    high      low
## 17    high      low
## 18    high      med
## 19    high      med
## 20     med      med
## 21     med     high
## 22     med      low
## 23     med     high
## 24    high      low
## 25     low      med
## 26     med      low
## 27    high      med
## 28    high      low
## 29     med      low
## 30    high     high
## 31     med      med
## 32     low      med
## 33     med     high
## 34     med      med
## 35     med      med
## 36    high     high
## 37     med      med
## 38     med      low
## 39     low      low
## 40     low     high
## 41     low      low
## 42     low      low
## 43     med     high
## 44     low     high
## 45    high     high
## 46     med      low
## 47     low     high
## 48     med      med
## 49     low     high
## 50     med     high
## 51     low      med
## 52     med      low
## 53    high      med
## 54     med      med
## 55     low      med
## 56    high      med
## 57     low      med
## 58    high      med
## 59     med      med
## 60     low      med
## 61     med     high
## 62     med      med
## 63     med      med
## 64     med      med

Data were all first converted to either 0 and 1 for 2-level factors and 0, 1, and 2 for 3-level factors.

A for-loop was used to match levels in the fractional factorial design with levels of the actual dataset. For each of the 64 runs, one data point was randomly selected for each criterion. The main issue with the use of the fractional factorial design is that the combination of levels for run 30 did not exist within the actual dataset. The birth weights selected for each run are displayed below.

##       [,1]
##  [1,]  113
##  [2,]  127
##  [3,]  112
##  [4,]   35
##  [5,]  139
##  [6,]  105
##  [7,]  110
##  [8,]  107
##  [9,]  123
## [10,]   95
## [11,]   95
## [12,]  117
## [13,]   32
## [14,]   53
## [15,]  129
## [16,]   96
## [17,]  131
## [18,]  123
## [19,]  105
## [20,]   87
## [21,]  133
## [22,]   84
## [23,]  129
## [24,]  120
## [25,]  111
## [26,]   93
## [27,]  123
## [28,]   88
## [29,]   36
## [30,]   NA
## [31,]  119
## [32,]  136
## [33,]  116
## [34,]   98
## [35,]  128
## [36,]  114
## [37,]   83
## [38,]   73
## [39,]   94
## [40,]  116
## [41,]  115
## [42,]   78
## [43,]  122
## [44,]   90
## [45,]  102
## [46,]  101
## [47,]  118
## [48,]  130
## [49,]  148
## [50,]  103
## [51,]  137
## [52,]  128
## [53,]  112
## [54,]  117
## [55,]  134
## [56,]  153
## [57,]  102
## [58,]  118
## [59,]  100
## [60,]  131
## [61,]  115
## [62,]  109
## [63,]  102
## [64,]  116

1/8 Fractional Factorial Design

As 64 runs is costly and time-consuming, this experiment can be further reduced using a 1/8 fractional factorial design. Thus, the resulting experiment is a \(2^{6-3}\) fractional factorial design experiment. In this experiment, there are 6 factors, each with 2 levels, and 8 runs total. This is a resolution III experiment, where both me and 2fi are confounded by 2-factor (or greater) interactions. The aliasing is also displayed below.

##   sex smoke race1 race2 weeks1 weeks2
## 1   1     1    -1     1     -1     -1
## 2   1    -1    -1    -1     -1      1
## 3   1    -1     1    -1      1     -1
## 4   1     1     1     1      1      1
## 5  -1    -1     1     1     -1     -1
## 6  -1    -1    -1     1      1      1
## 7  -1     1     1    -1     -1      1
## 8  -1     1    -1    -1      1     -1
## class=design, type= FrF2
## $legend
## [1] "A=sex"    "B=smoke"  "C=race1"  "D=race2"  "E=weeks1" "F=weeks2"
## 
## $main
## [1] "A=BD=CE" "B=AD=CF" "C=AE=BF" "D=AB=EF" "E=AC=DF" "F=BC=DE"
## 
## $fi2
## [1] "AF=BE=CD"

The run selection was repeated per the procedure for the fractional factorial design, where actual treatment combinations are see broken up by two levels.

##   run sex smoke race1 race2 weeks1 weeks2 sex.1 smoke.1 race weeks
## 1   1   0     0     1     1      0      0   low     low high   low
## 2   2   0     1     0     0      1      0   low    high  low   med
## 3   3   0     1     1     0      0      1   low    high  med   med
## 4   4   0     0     0     1      1      1   low     low  med  high
## 5   5   1     0     0     0      0      1  high     low  low   med
## 6   6   1     1     0     1      0      0  high    high  med   low
## 7   7   1     1     1     1      1      1  high    high high  high
## 8   8   1     0     1     0      1      0  high     low  med   med

After the for-loop was used to match levels in the 1/8 fractional factorial design with levels of the actual dataset, the output is shown below. Again, the issue is that the levels of one of the experimental runs has no match within the actual dataset. This is a major flaw with fractional designs and suggests they can only be used for certain datasets.

##      [,1]
## [1,]  113
## [2,]  127
## [3,]  112
## [4,]   35
## [5,]  139
## [6,]  105
## [7,]   NA
## [8,]  107

Function Generator

To calculate the function generation for the 1/8 fractional factorial design, after the 6 levels are randomly chosen using the FrF2 function, ABC is determined by multiplying the signs (+,-). All levels are then multiplied to determine the signs of the function generator, which are all positive.

##   run sex_A smoke_B race1_C race2_AB weeks1_AC weeks2_BC ABC I
## 1   1     0       0       1        1         0         0   0 1
## 2   2     0       1       0        0         1         0   1 1
## 3   3     0       1       1        0         0         1   1 1
## 4   4     0       0       0        1         1         1   0 1
## 5   5     1       0       0        0         0         1   1 1
## 6   6     1       1       0        1         0         0   0 1
## 7   7     1       1       1        1         1         1   0 1
## 8   8     1       0       1        0         1         0   1 1

Main and Interaction Effects

Main effects were then calculated for the 6 factors. The aliasing combinations were used to also determine the interaction effects, listed in a table below.

##           effects
## me_sex      78.75
## me_smoke    73.50
## me_race1    64.50
## me_race2     5.25
## me_weeks1   17.25
## me_weeks2   30.00
## if_AB        5.25
## if_AC       17.25
## if_AD       73.50
## if_AE       64.50
## if_BC       30.00
## if_BD       78.75
## if_BF       64.50
## if_CE       78.75
## if_CF       73.50
## if_DE       30.00
## if_DF       17.25
## if_EF        5.25

ANOVA

Finally, an ANOVA was run based on the main effect data. As all factors had quite large effects, all were included in the linear model. By ANOVA, all factors had signficant effects on child birth weight.

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## df$sex         1   2000    2000   5.492  0.01924 *  
## df$smoke       1  12597   12597  34.591 5.05e-09 ***
## df$race        1   3177    3177   8.724  0.00319 ** 
## df$weeks       2 178151   89075 244.599  < 2e-16 ***
## Residuals   1439 524039     364                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model Adequacy Checking

Model adequacy checking was done to determine the legitmacy of the model. The residuals vs. fitted plot shows a relatively uniform distribution of values. The QQ-plot only has significant deviation for negative theoretical quantiles suggesting a relatively good model but one that still could be improved.

Conclusions

Ultimately, these results suggest mothers smoking during pregnancy, baby gender, race, and weeks of gestation all have significant effects on the birth weight of babies. However, the model could be improved a bit more. As aliasing played a role in and we are unable to determine whether effects are solely due to main effects, 2 factor interactions, or even higher order interactions, main effects were included in this model. Thus, future models would benefit from closer consideration of 2fi. Additionally, fractional factorial design was not an appropriate method for analysis for this dataset, as not all of the runs were represented in the dataset, requiring further analysis via different methods.

4. References

  1. D. C. Montgomery, Design and Analysis of Experiments, 8th ed. Hoboken, NJ: John Wiley & Sons, Inc., 2013.
  2. J. Holcomb, “R: NCbirths,” R datasets, 2001. [Online]. Available: https://vincentarelbundock.github.io/Rdatasets/doc/Stat2Data/NCbirths.html. [Accessed: 04-Dec-2016].

5. Appendix

This section contains all code used to analyze this data.

#**Week levels**
week_levels = c("<=36", "37-40", ">40")
week_levels
## [1] "<=36"  "37-40" ">40"
#**Race levels**
Race_levels = c("1", "2", "3")
Race_levels
## [1] "1" "2" "3"
#**Smoke levels**
smoke_levels = c("1", "0")
smoke_levels
## [1] "1" "0"
#**Sex levels**
sex_levels = c("1", "2")
sex_levels
## [1] "1" "2"
#**Head of dataset**
#Load in data using Utils package
library(utils)
setwd("C:/Users/Alexis/Documents/R/win-library/3.3")
df <- read.csv("120316_project3birthweightdata.csv", header = TRUE)
df <- df[complete.cases(df),]
#Show first 3 rows of data table
head(df, 3)
##   ID weeks race sex smoke weight
## 1  1 37-40    1   1     0    111
## 2  2 37-40    1   2     0    116
## 3  3 37-40    1   1     0    138
#first, exploratory boxplots were conducted to study the individual levels of all factors

#Boxplot examining levels of each factor for exploratory analysis
boxplotweeks <- boxplot(df$weight~df$weeks, xlab = "Weeks of Gestation", ylab = "Birth Weight", main = "Birth Weight vs. Weeks of Gestation")

boxplotrace <- boxplot(df$weight~df$race, xlab = "Race of mother", ylab = "Birth Weight", main = "Birth Weight vs. Race of Mother")

boxplotsmoke <- boxplot(df$weight~df$smoke, xlab = "Mother Smoking Status", ylab = "Birth Weight", main = "Birth Weight vs. Mother Smoking Status")

boxplotsex <- boxplot(df$weight~df$sex, xlab = "Gender of Baby", ylab = "Birth Weight", main = "Birth Weight vs. Gender of Baby")

#the dataset was randomized
#To randomize data, data were randomly selected and combined to form a new matrix.
sample_size = length(1:nrow(df))
sample_size
## [1] 1445
dfrand = df[sample(1:nrow(df), size = sample_size, replace = FALSE),]
head(dfrand,3)
##      ID weeks race sex smoke weight
## 78   78 37-40    1   1     0    114
## 921 921 37-40    1   1     0    140
## 194 194   >40    3   2     0    110
#a fractional factorial design was first done for the dataset
library(FrF2)
FrF2(64,nfactors=6,estimable=formula("~A+B+C+D+A:(B+C+D)"), clear=FALSE)
## creating full factorial with 64 runs ...
##     A  B  C  D  E  F
## 1   1 -1 -1 -1  1 -1
## 2  -1 -1 -1 -1  1 -1
## 3   1  1  1  1  1  1
## 4   1 -1 -1 -1 -1  1
## 5  -1 -1  1  1  1 -1
## 6   1 -1  1  1 -1 -1
## 7  -1  1 -1  1  1 -1
## 8   1 -1 -1 -1  1  1
## 9  -1 -1  1  1 -1 -1
## 10  1 -1 -1  1  1 -1
## 11  1 -1 -1  1 -1 -1
## 12 -1 -1 -1  1  1 -1
## 13 -1  1  1  1  1  1
## 14  1  1 -1 -1 -1 -1
## 15  1  1  1 -1 -1  1
## 16 -1  1 -1 -1 -1  1
## 17  1  1  1  1 -1  1
## 18 -1 -1  1  1  1  1
## 19  1  1 -1 -1 -1  1
## 20  1 -1  1 -1 -1  1
## 21  1  1  1  1  1 -1
## 22  1  1 -1 -1  1 -1
## 23 -1  1  1 -1  1 -1
## 24 -1 -1  1 -1  1 -1
## 25 -1 -1 -1  1 -1  1
## 26 -1  1 -1 -1  1  1
## 27  1  1 -1 -1  1  1
## 28 -1 -1 -1 -1  1  1
## 29 -1  1 -1  1  1  1
## 30  1 -1  1  1 -1  1
## 31  1  1  1 -1 -1 -1
## 32 -1 -1  1 -1 -1  1
## 33 -1 -1 -1 -1 -1 -1
## 34  1 -1  1  1  1 -1
## 35  1  1 -1  1  1  1
## 36  1  1  1 -1  1  1
## 37 -1  1 -1 -1 -1 -1
## 38 -1  1  1  1 -1  1
## 39 -1  1  1 -1 -1 -1
## 40 -1  1 -1  1 -1 -1
## 41 -1  1  1 -1  1  1
## 42 -1 -1  1 -1  1  1
## 43 -1 -1 -1 -1 -1  1
## 44 -1  1  1 -1 -1  1
## 45  1 -1 -1  1 -1  1
## 46 -1  1  1  1 -1 -1
## 47  1 -1 -1  1  1  1
## 48 -1 -1  1  1 -1  1
## 49 -1 -1 -1  1 -1 -1
## 50  1 -1  1 -1 -1 -1
## 51 -1 -1  1 -1 -1 -1
## 52  1  1 -1  1 -1 -1
## 53  1  1 -1  1 -1  1
## 54  1 -1  1 -1  1  1
## 55  1 -1  1 -1  1 -1
## 56 -1  1 -1 -1  1 -1
## 57  1  1 -1  1  1 -1
## 58 -1  1  1  1  1 -1
## 59 -1  1 -1  1 -1  1
## 60  1  1  1  1 -1 -1
## 61 -1 -1 -1  1  1  1
## 62  1 -1  1  1  1  1
## 63  1  1  1 -1  1 -1
## 64  1 -1 -1 -1 -1 -1
## class=design, type= full factorial
#load in fractional factorial design table
df2 <- read.csv("120716_DoE_FractionalFactorialDesign.csv", header = TRUE)
df2
##    run A_sex B_smoke C_race D_race E_weeks F_weeks A_sex.1 B_smoke.1
## 1    1     0       1      0      0       0       1     low      high
## 2    2     1       0      0      0       1       0    high       low
## 3    3     1       0      1      0       1       1    high       low
## 4    4     0       1      1      0       0       0     low      high
## 5    5     1       0      1      1       1       0    high       low
## 6    6     0       1      1      1       1       0     low      high
## 7    7     0       1      1      1       1       1     low      high
## 8    8     1       1      1      0       0       1    high      high
## 9    9     0       0      1      0       1       0     low       low
## 10  10     0       1      1      0       1       0     low      high
## 11  11     0       1      1      0       0       1     low      high
## 12  12     0       1      0      1       1       0     low      high
## 13  13     0       0      1      0       0       0     low       low
## 14  14     1       0      0      0       0       0    high       low
## 15  15     1       1      1      0       1       1    high      high
## 16  16     0       1      1      1       0       0     low      high
## 17  17     0       0      1      1       0       0     low       low
## 18  18     1       1      1      1       1       0    high      high
## 19  19     0       1      1      1       0       1     low      high
## 20  20     0       0      1      0       0       1     low       low
## 21  21     0       1      0      1       1       1     low      high
## 22  22     1       1      1      0       0       0    high      high
## 23  23     1       1      0      1       1       1    high      high
## 24  24     1       0      1      1       0       0    high       low
## 25  25     1       1      0      0       0       1    high      high
## 26  26     1       0      1      0       0       0    high       low
## 27  27     1       1      1      1       0       1    high      high
## 28  28     1       1      1      1       0       0    high      high
## 29  29     1       0      0      1       0       0    high       low
## 30  30     1       1      1      1       1       1    high      high
## 31  31     1       0      1      0       1       0    high       low
## 32  32     0       0      0      0       0       1     low       low
## 33  33     1       0      0      1       1       1    high       low
## 34  34     1       1      0      1       1       0    high      high
## 35  35     1       1      0      1       0       1    high      high
## 36  36     1       0      1      1       1       1    high       low
## 37  37     1       1      1      0       1       0    high      high
## 38  38     0       1      0      1       0       0     low      high
## 39  39     0       0      0      0       0       0     low       low
## 40  40     0       0      0      0       1       1     low       low
## 41  41     0       1      0      0       0       0     low      high
## 42  42     1       1      0      0       0       0    high      high
## 43  43     0       0      1      0       1       1     low       low
## 44  44     0       1      0      0       1       1     low      high
## 45  45     0       0      1      1       1       1     low       low
## 46  46     1       1      0      1       0       0    high      high
## 47  47     1       1      0      0       1       1    high      high
## 48  48     0       0      0      1       1       0     low       low
## 49  49     1       0      0      0       1       1    high       low
## 50  50     0       0      0      1       1       1     low       low
## 51  51     0       0      0      0       1       0     low       low
## 52  52     0       0      0      1       0       0     low       low
## 53  53     0       0      1      1       1       0     low       low
## 54  54     1       0      1      0       0       1    high       low
## 55  55     0       1      0      0       1       0     low      high
## 56  56     1       0      1      1       0       1    high       low
## 57  57     1       1      0      0       1       0    high      high
## 58  58     0       0      1      1       0       1     low       low
## 59  59     1       0      0      1       0       1    high       low
## 60  60     1       0      0      0       0       1    high       low
## 61  61     0       1      1      0       1       1     low      high
## 62  62     0       1      0      1       0       1     low      high
## 63  63     1       0      0      1       1       0    high       low
## 64  64     0       0      0      1       0       1     low       low
##    X1_race X2_weeks
## 1      low      med
## 2      low      med
## 3      med     high
## 4      med      low
## 5     high      med
## 6     high      med
## 7     high     high
## 8      med      med
## 9      med      med
## 10     med      med
## 11     med      med
## 12     med      med
## 13     med      low
## 14     low      low
## 15     med     high
## 16    high      low
## 17    high      low
## 18    high      med
## 19    high      med
## 20     med      med
## 21     med     high
## 22     med      low
## 23     med     high
## 24    high      low
## 25     low      med
## 26     med      low
## 27    high      med
## 28    high      low
## 29     med      low
## 30    high     high
## 31     med      med
## 32     low      med
## 33     med     high
## 34     med      med
## 35     med      med
## 36    high     high
## 37     med      med
## 38     med      low
## 39     low      low
## 40     low     high
## 41     low      low
## 42     low      low
## 43     med     high
## 44     low     high
## 45    high     high
## 46     med      low
## 47     low     high
## 48     med      med
## 49     low     high
## 50     med     high
## 51     low      med
## 52     med      low
## 53    high      med
## 54     med      med
## 55     low      med
## 56    high      med
## 57     low      med
## 58    high      med
## 59     med      med
## 60     low      med
## 61     med     high
## 62     med      med
## 63     med      med
## 64     med      med
#use for loop to select one data point for each fractional factorial design run
vars <- cbind.data.frame(df$sex, df$smoke, df$race, df$weeks)
combos <- cbind.data.frame(df2$A_sex, df2$B_smoke, df2$X1_race, df2$X2_weeks)

vars[,4] <- as.character(vars[,4])

vars[which(vars[,1]==1),1] <- 0
vars[which(vars[,1]==2),1] <- 1

vars[which(vars[,2]==0),2] <- 0
vars[which(vars[,2]==1),2] <- 1

vars[which(vars[,3]==1),3] <- 0
vars[which(vars[,3]==2),3] <- 1
vars[which(vars[,3]==3),3] <- 2

vars[which(vars[,4]=="<=36"),4] <- 0
vars[which(vars[,4]=="37-40"),4] <- 1
vars[which(vars[,4]==">40"),4] <- 2

combos[,1] <- as.character(combos[,1])

combos[which(combos[,1]=="low"),1] <- 0
combos[which(combos[,1]=="high"),1] <- 1

combos[,2] <- as.character(combos[,2])

combos[which(combos[,2]=="low"),2] <- 0
combos[which(combos[,2]=="high"),2] <- 1

combos[,3] <- as.character(combos[,3])

combos[which(combos[,3]=="low"),3] <- 0
combos[which(combos[,3]=="med"),3] <- 1
combos[which(combos[,3]=="high"),3] <- 2

combos[,4] <- as.character(combos[,4])

combos[which(combos[,4]=="low"),4] <- 0
combos[which(combos[,4]=="med"),4] <- 1
combos[which(combos[,4]=="high"),4] <- 2

combos[,1] <- as.numeric(combos[,1])
combos[,2] <- as.numeric(combos[,2])
combos[,3] <- as.numeric(combos[,3])
combos[,4] <- as.numeric(combos[,4])

good <- matrix(0,nrow(combos))

for (i in 1:nrow(combos))
{
  row_i <- combos[i,]
  row_i <- as.numeric(row_i)
  indexes <- which(vars[,1]==row_i[1] & vars[,2]==row_i[2] & vars[,3]==row_i[3] & vars[,4]==row_i[4])
  if(length(indexes)>0)
  {
    temp <- sample(1:length(indexes),size = 1, replace=FALSE)
    good[i] <- indexes[temp]
  }
}
#save response variable for each fractional factorial design run in a single matrix

ans<-df$weight[good]
ans <- c(ans[1:29],NA,ans[30:length(ans)])
ans <- as.matrix(ans)
ans
##       [,1]
##  [1,]  116
##  [2,]  123
##  [3,]  125
##  [4,]   66
##  [5,]  129
##  [6,]  105
##  [7,]  110
##  [8,]  111
##  [9,]  153
## [10,]   98
## [11,]  108
## [12,]  159
## [13,]   91
## [14,]  102
## [15,]  130
## [16,]   96
## [17,]   70
## [18,]  123
## [19,]  105
## [20,]   94
## [21,]  122
## [22,]  100
## [23,]  129
## [24,]   93
## [25,]  133
## [26,]   36
## [27,]  123
## [28,]   88
## [29,]   87
## [30,]   NA
## [31,]   97
## [32,]  134
## [33,]  125
## [34,]  109
## [35,]  108
## [36,]  110
## [37,]  128
## [38,]   19
## [39,]  121
## [40,]  125
## [41,]   51
## [42,]   78
## [43,]  141
## [44,]  127
## [45,]  102
## [46,]  101
## [47,]  134
## [48,]  111
## [49,]  102
## [50,]  147
## [51,]  158
## [52,]   94
## [53,]  158
## [54,]  161
## [55,]  118
## [56,]  133
## [57,]  107
## [58,]  121
## [59,]  118
## [60,]  125
## [61,]  122
## [62,]  109
## [63,]  109
## [64,]  121
#randomly design 1/8 fractional factorial run and determine aliasing
eightdesign = FrF2(8,6,factor.names = c("sex","smoke","race1","race2","weeks1", "weeks2"))
eightdesign
##   sex smoke race1 race2 weeks1 weeks2
## 1  -1     1    -1    -1      1     -1
## 2  -1    -1     1     1     -1     -1
## 3   1    -1     1    -1      1     -1
## 4   1     1    -1     1     -1     -1
## 5  -1     1     1    -1     -1      1
## 6   1     1     1     1      1      1
## 7  -1    -1    -1     1      1      1
## 8   1    -1    -1    -1     -1      1
## class=design, type= FrF2
design.info(eightdesign)$aliased
## $legend
## [1] "A=sex"    "B=smoke"  "C=race1"  "D=race2"  "E=weeks1" "F=weeks2"
## 
## $main
## [1] "A=BD=CE" "B=AD=CF" "C=AE=BF" "D=AB=EF" "E=AC=DF" "F=BC=DE"
## 
## $fi2
## [1] "AF=BE=CD"
#load in 1/8 fractional factorial design
df3 <- read.csv("120916_DoE_EighthFractionalFactorialDesign.csv", header = TRUE)
df3
##   run sex smoke race1 race2 weeks1 weeks2 sex.1 smoke.1 race weeks
## 1   1   0     0     1     1      0      0   low     low high   low
## 2   2   0     1     0     0      1      0   low    high  low   med
## 3   3   0     1     1     0      0      1   low    high  med   med
## 4   4   0     0     0     1      1      1   low     low  med  high
## 5   5   1     0     0     0      0      1  high     low  low   med
## 6   6   1     1     0     1      0      0  high    high  med   low
## 7   7   1     1     1     1      1      1  high    high high  high
## 8   8   1     0     1     0      1      0  high     low  med   med
#for loop selection of data for each run
vars2 <- cbind.data.frame(df$sex, df$smoke, df$race, df$weeks)
combos2 <- cbind.data.frame(df3$sex, df3$smoke, df3$race, df3$weeks)

vars2[,4] <- as.character(vars2[,4])

vars2[which(vars2[,1]==1),1] <- 0
vars2[which(vars2[,1]==2),1] <- 1

vars2[which(vars2[,2]==0),2] <- 0
vars2[which(vars2[,2]==1),2] <- 1

vars2[which(vars2[,3]==1),3] <- 0
vars2[which(vars2[,3]==2),3] <- 1
vars2[which(vars2[,3]==3),3] <- 2

vars2[which(vars2[,4]=="<=36"),4] <- 0
vars2[which(vars2[,4]=="37-40"),4] <- 1
vars2[which(vars2[,4]==">40"),4] <- 2

combos2[,1] <- as.character(combos2[,1])

combos2[which(combos2[,1]=="low"),1] <- 0
combos2[which(combos2[,1]=="high"),1] <- 1

combos2[,2] <- as.character(combos2[,2])

combos2[which(combos2[,2]=="low"),2] <- 0
combos[which(combos[,2]=="high"),2] <- 1

combos2[,3] <- as.character(combos2[,3])

combos2[which(combos2[,3]=="low"),3] <- 0
combos2[which(combos2[,3]=="med"),3] <- 1
combos2[which(combos2[,3]=="high"),3] <- 2

combos2[,4] <- as.character(combos2[,4])

combos2[which(combos2[,4]=="low"),4] <- 0
combos2[which(combos2[,4]=="med"),4] <- 1
combos2[which(combos2[,4]=="high"),4] <- 2

combos2[,1] <- as.numeric(combos2[,1])
combos2[,2] <- as.numeric(combos2[,2])
combos2[,3] <- as.numeric(combos2[,3])
combos2[,4] <- as.numeric(combos2[,4])

good2 <- matrix(0,nrow(combos2))

for (i in 1:nrow(combos2))
{
  row_i <- combos2[i,]
  row_i <- as.numeric(row_i)
  indexes <- which(vars2[,1]==row_i[1] & vars2[,2]==row_i[2] & vars2[,3]==row_i[3] & vars2[,4]==row_i[4])
  if(length(indexes)>0)
  {
    temp <- sample(1:length(indexes),size = 1, replace=FALSE)
    good2[i] <- indexes[temp]
  }
}
#Save data from corresponding runs in a matrix
ans2 <- df$weight[good2]
ans2 <- c(ans[1:6],NA,ans[8])
ans2 <- as.matrix(ans2)
ans2
##      [,1]
## [1,]  116
## [2,]  123
## [3,]  125
## [4,]   66
## [5,]  129
## [6,]  105
## [7,]   NA
## [8,]  111
#loaded function generator table
df4 <- read.csv("121016_DoE_EighthFractionalFactorialDesignGenerator.csv", header = TRUE)
df4
##   run sex_A smoke_B race1_C race2_AB weeks1_AC weeks2_BC ABC I
## 1   1     0       0       1        1         0         0   0 1
## 2   2     0       1       0        0         1         0   1 1
## 3   3     0       1       1        0         0         1   1 1
## 4   4     0       0       0        1         1         1   0 1
## 5   5     1       0       0        0         0         1   1 1
## 6   6     1       1       0        1         0         0   0 1
## 7   7     1       1       1        1         1         1   0 1
## 8   8     1       0       1        0         1         0   1 1
## Main effect calculations
me_sex <- 1/4 * ((ans2[5] + ans2[6] + ans2[8] + mean(ans2[5] + ans2[6] + ans2[8])) - (ans2[1] + ans2[2] + ans2[3] + ans2[4]))
me_sex
## [1] 65
me_smoke <- 1/4 * ((ans2[2] + ans2[3] + ans2[6] + mean(ans2[2] + ans2[3] + ans2[6])) - (ans2[1] + ans2[4] + ans2[5] + ans2[8]))
me_smoke
## [1] 71
me_race1 <- 1/4 * ((ans2[1] + ans2[3] + mean(ans2[1] + ans2[3] +ans2[8]) + ans2[8]) - (ans2[2] + ans2[4] + ans2[5] + ans2[6]))
me_race1
## [1] 70.25
me_race2<- 1/4 * ((ans2[1] + ans2[4] + ans2[6] + mean(ans2[1] + ans2[4] + ans2[6])) - (ans2[2] + ans2[3] + ans2[5] + ans2[8]))
me_race2
## [1] 21.5
me_weeks1 <- 1/4 * ((ans2[2] + ans2[4] + mean(ans2[2] + ans2[4] + ans2[8]) + ans2[8]) - (ans2[1] + ans2[3] + ans2[5] + ans2[6]))
me_weeks1
## [1] 31.25
me_weeks2 <- 1/4 * ((ans2[3] + ans2[4] + ans2[5] + mean(ans2[3] + ans2[4] + ans2[5])) - (ans2[1] + ans2[2] + ans2[6] + ans2[8]))
me_weeks2
## [1] 46.25
#main effects and interaction effects
meif <- as.matrix(c(me_sex,me_smoke,me_race1,me_race2,me_weeks1,me_weeks2, me_race2, me_weeks1, me_smoke, me_race1, me_weeks2, me_sex, me_race1, me_sex, me_smoke, me_weeks2, me_weeks1, me_race2))
rownames(meif) <- c("me_sex","me_smoke","me_race1","me_race2","me_weeks1","me_weeks2","if_AB","if_AC","if_AD","if_AE","if_BC","if_BD","if_BF","if_CE","if_CF","if_DE","if_DF","if_EF")
colnames(meif) <- c("effects")
meif
##           effects
## me_sex      65.00
## me_smoke    71.00
## me_race1    70.25
## me_race2    21.50
## me_weeks1   31.25
## me_weeks2   46.25
## if_AB       21.50
## if_AC       31.25
## if_AD       71.00
## if_AE       70.25
## if_BC       46.25
## if_BD       65.00
## if_BF       70.25
## if_CE       65.00
## if_CF       71.00
## if_DE       46.25
## if_DF       31.25
## if_EF       21.50
#ANOVA
fit <- aov(df$weight~df$sex + df$smoke + df$race + df$weeks)
summary(fit)
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## df$sex         1   2000    2000   5.492  0.01924 *  
## df$smoke       1  12597   12597  34.591 5.05e-09 ***
## df$race        1   3177    3177   8.724  0.00319 ** 
## df$weeks       2 178151   89075 244.599  < 2e-16 ***
## Residuals   1439 524039     364                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#**Model Adequacy Checking**
#residuals vs. fitted and normal Q-Q plot
plot(fit, which = c(1:2), caption = list("Residuals vs Fitted", "Normal Q-Q"))