Recipe 8: Fractional Factorial Design for Diabetes Dataset

Design of Experiments

ISYE 6020

Trevor Manzanares

Rensselaer Polytechnic Institute

11/20/14

The dataset under analysis includes data on potential medical indicators of type 2 diabetes for 403 patients. Using a fractional factorial design, the experiment will test which of 6 factors suggest a predisposition to the onset of type 2 diabetes as measured by the response variable, percent glycosolcated hemoglobin. The World Health Organization suggests using a glycosolated hemoglobin threshold of at least 6.5% to determine the presence of type 2 diabetes.

remove(list=ls())
diabetes = read.csv("C:/Users/Trevor/Documents/diabetes.csv", header=TRUE) 

#view first few lines
head(diabetes)
##   glyhb newchol newstabglu newhdl newage newgender newheight chol stabglu
## 1  7.87       1          1      1      1         1         1  292     235
## 2  4.87       1          1      1      1         1         1  293     115
## 3  9.18       1          1      1      1         1        -1  219     112
## 4  5.23       1          1      1      1         1        -1  219     112
## 5  7.40       1         -1      1      1        -1        -1  246     104
## 6 13.70       1          1      1      1        -1        -1  237     233
##   hdl age gender height ratio   location weight  frame bp.1s bp.1d bp.2s
## 1  55  79   male     70   5.3 Buckingham    165          170    90   170
## 2  54  50   male     71   5.4 Buckingham    170 medium   131    75    NA
## 3  73  59   male     66   3.0 Buckingham    170 medium   146    92   168
## 4  73  76   male     64   3.0 Buckingham    105 medium   125    82    NA
## 5  62  66 female     66   4.0     Louisa    189 medium   200    94   208
## 6  58  49 female     62   4.1 Buckingham    189  large   130    90    NA
##   bp.2d waist hip time.ppn
## 1   100    39  41      240
## 2    NA    34  39      120
## 3    98    37  40      120
## 4    NA    29  33       60
## 5    90    45  46      195
## 6    NA    43  47      195
#observe the structure of the data
str(diabetes)
## 'data.frame':    403 obs. of  24 variables:
##  $ glyhb     : num  7.87 4.87 9.18 5.23 7.4 ...
##  $ newchol   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ newstabglu: int  1 1 1 1 -1 1 1 1 1 1 ...
##  $ newhdl    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ newage    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ newgender : int  1 1 1 1 -1 -1 -1 -1 -1 -1 ...
##  $ newheight : int  1 1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ chol      : int  292 293 219 219 246 237 296 213 232 209 ...
##  $ stabglu   : int  235 115 112 112 104 233 262 203 184 113 ...
##  $ hdl       : int  55 54 73 73 62 58 60 75 114 65 ...
##  $ age       : int  79 50 59 76 66 49 74 71 91 76 ...
##  $ gender    : Factor w/ 2 levels "female","male": 2 2 2 2 1 1 1 1 1 1 ...
##  $ height    : int  70 71 66 64 66 62 63 63 61 60 ...
##  $ ratio     : num  5.3 5.4 3 3 4 ...
##  $ location  : Factor w/ 2 levels "Buckingham","Louisa": 1 1 1 1 2 1 1 1 1 1 ...
##  $ weight    : int  165 170 170 105 189 189 183 165 127 143 ...
##  $ frame     : Factor w/ 4 levels "","large","medium",..: 1 3 3 3 3 2 2 3 1 2 ...
##  $ bp.1s     : int  170 131 146 125 200 130 159 150 170 156 ...
##  $ bp.1d     : int  90 75 92 82 94 90 99 80 82 78 ...
##  $ bp.2s     : int  170 NA 168 NA 208 NA 160 145 NA 144 ...
##  $ bp.2d     : int  100 NA 98 NA 90 NA 103 80 NA 76 ...
##  $ waist     : int  39 34 37 29 45 43 42 34 35 35 ...
##  $ hip       : int  41 39 40 33 46 47 48 42 38 40 ...
##  $ time.ppn  : int  240 120 120 60 195 195 300 960 120 1200 ...
#403 observations of 24 variables

The data are tabluated into 24 columns, with detailed measurements of various vital functions and characteristics of each patient. This project will test the effects of 6 factors on the response. Due to the nature of the experimental design (2^k), the 6 predictor variables needed to be typecasted as factors and reduced to two levels for use in the Analysis of Variance model. This was performed by defining values greater and less than the column means as “1” and “-1”, respectively.

summary(diabetes)
##      glyhb          newchol          newstabglu         newhdl      
##  Min.   : 2.68   Min.   :-1.0000   Min.   :-1.000   Min.   :-1.000  
##  1st Qu.: 4.38   1st Qu.:-1.0000   1st Qu.:-1.000   1st Qu.:-1.000  
##  Median : 4.84   Median :-1.0000   Median :-1.000   Median :-1.000  
##  Mean   : 5.59   Mean   :-0.0918   Mean   :-0.509   Mean   :-0.186  
##  3rd Qu.: 5.60   3rd Qu.: 1.0000   3rd Qu.:-1.000   3rd Qu.: 1.000  
##  Max.   :16.11   Max.   : 1.0000   Max.   : 1.000   Max.   : 1.000  
##  NA's   :13                                                         
##      newage          newgender        newheight           chol    
##  Min.   :-1.0000   Min.   :-1.000   Min.   :-1.000   Min.   : 78  
##  1st Qu.:-1.0000   1st Qu.:-1.000   1st Qu.:-1.000   1st Qu.:179  
##  Median :-1.0000   Median :-1.000   Median :-1.000   Median :204  
##  Mean   :-0.0571   Mean   :-0.161   Mean   :-0.117   Mean   :208  
##  3rd Qu.: 1.0000   3rd Qu.: 1.000   3rd Qu.: 1.000   3rd Qu.:230  
##  Max.   : 1.0000   Max.   : 1.000   Max.   : 1.000   Max.   :443  
##                                                      NA's   :1    
##     stabglu         hdl             age          gender        height  
##  Min.   : 48   Min.   : 12.0   Min.   :19.0   female:234   Min.   :52  
##  1st Qu.: 81   1st Qu.: 38.0   1st Qu.:34.0   male  :169   1st Qu.:63  
##  Median : 89   Median : 46.0   Median :45.0                Median :66  
##  Mean   :107   Mean   : 50.5   Mean   :46.9                Mean   :66  
##  3rd Qu.:106   3rd Qu.: 59.0   3rd Qu.:60.0                3rd Qu.:69  
##  Max.   :385   Max.   :120.0   Max.   :92.0                Max.   :76  
##                NA's   :1                                   NA's   :5   
##      ratio             location       weight       frame         bp.1s    
##  Min.   : 1.50   Buckingham:200   Min.   : 99         : 12   Min.   : 90  
##  1st Qu.: 3.20   Louisa    :203   1st Qu.:151   large :103   1st Qu.:121  
##  Median : 4.20                    Median :172   medium:184   Median :136  
##  Mean   : 4.52                    Mean   :178   small :104   Mean   :137  
##  3rd Qu.: 5.40                    3rd Qu.:200                3rd Qu.:147  
##  Max.   :19.30                    Max.   :325                Max.   :250  
##  NA's   :1                        NA's   :1                  NA's   :5    
##      bp.1d           bp.2s         bp.2d           waist           hip    
##  Min.   : 48.0   Min.   :110   Min.   : 60.0   Min.   :26.0   Min.   :30  
##  1st Qu.: 75.0   1st Qu.:138   1st Qu.: 84.0   1st Qu.:33.0   1st Qu.:39  
##  Median : 82.0   Median :149   Median : 92.0   Median :37.0   Median :42  
##  Mean   : 83.3   Mean   :152   Mean   : 92.5   Mean   :37.9   Mean   :43  
##  3rd Qu.: 90.0   3rd Qu.:161   3rd Qu.:100.0   3rd Qu.:41.0   3rd Qu.:46  
##  Max.   :124.0   Max.   :238   Max.   :124.0   Max.   :56.0   Max.   :64  
##  NA's   :5       NA's   :262   NA's   :262     NA's   :2      NA's   :2   
##     time.ppn   
##  Min.   :   5  
##  1st Qu.:  90  
##  Median : 240  
##  Mean   : 341  
##  3rd Qu.: 518  
##  Max.   :1560  
##  NA's   :3

Randomization is not applicable here due to the nature of the data. 1046 subjects were initially interviewed in a study to understand the prevalence of obesity and diabetes in central Virginia for African Americans. The 403 of these subjects included in this dataset were the onces that were actually screened for diabetes.

The null hypothesis is that each of the 6 predictor variabes has no effect on the response, percent glycosolated hemoglobin. Analysis of Variance (ANOVA) will be used to understand the main effects of cholesterol (“newchol”, in mg/dL), stabilized glucose (“newstabglu”, in mg/dL), high density lipoprotein (“newhdl”, in mg/dL), age (“newage”, in years), gender (“newgender”, male or female), and height (“newheight”, in inches) on the response.

For these data, there are no replicates. Furthermore, the background information does not expressly state that repeated measures were used in collecting the data from each patient.

Blocking will not be utilized here for simplicity’s sake and because the objective is to focus on the design of the experiment.

Convert variables of interest from integers to factors:

diabetes$newchol=as.factor(diabetes$newchol)
diabetes$newstabglu=as.factor(diabetes$newstabglu)
diabetes$newhdl=as.factor(diabetes$newhdl)
diabetes$newage=as.factor(diabetes$newage)
diabetes$newgender=as.factor(diabetes$newgender)
diabetes$newheight=as.factor(diabetes$newheight)

Graph showing trends in the data:

par(mfrow=c(2,3))
plot(diabetes$glyhb~diabetes$newchol,xlab="newchol (mg/dL)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newstabglu,xlab="newstabglu (mg/dL)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newhdl,xlab="newhdl (mg/dL)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newage,xlab="newage (years)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$gender,xlab="newgender (male or female)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newheight,xlab="newheight (inches)",ylab="level of glycosolated hemoglobin (%)")

plot of chunk unnamed-chunk-4

Create Initial ANOVA model:

model1=lm(glyhb~newchol+newstabglu+newhdl+newage+newgender+newheight, data=diabetes)
anova(model1)
## Analysis of Variance Table
## 
## Response: glyhb
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## newchol      1     63      63   21.86 4.1e-06 ***
## newstabglu   1    687     687  239.34 < 2e-16 ***
## newhdl       1      8       8    2.78  0.0965 .  
## newage       1     70      70   24.49 1.1e-06 ***
## newgender    1      0       0    0.01  0.9039    
## newheight    1     28      28    9.89  0.0018 ** 
## Residuals  383   1100       3                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this particular model based on a full factorial design, we reject the H0 that “newchol”, “newstabglu”, “newage”, and “newheight” have no effect on the response. However, we fail to reject the H0 that “newhdl” and “newgender” have no effect on the response. Thus, variation in the response can be attributed to the main effects of “newchol”, “newstabglu”, “newage”, and “newheight”.

Next we contruct a design matrix for a 2^6-1 fraction experimental design. Based on the boxplots above, it is hypothesized that the factor “newstabglu” might have interaction effects when compared with the other variables. Compared to the other variables, there is a significant difference between the median % of glycosolated hemoglobin of the two levels of “newstabglu”.

library(FrF2)
## Warning: package 'FrF2' was built under R version 3.1.2
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
## 
## Attaching package: 'DoE.base'
## 
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## 
## The following object is masked from 'package:graphics':
## 
##     plot.design
halffrac = FrF2(32,nfactors=6,estimable=formula("~newchol+newstabglu+newhdl+newage+newgender+newheight+newstabglu:(newchol+newhdl+newage+newgender+newheight)"),factor.names=c("newchol","newstabglu","newhdl","newage","newgender","newheight"),res4=TRUE,clear=FALSE)
halffrac
##    newchol newstabglu newhdl newage newgender newheight
## 1        1          1      1     -1         1        -1
## 2        1          1      1      1         1         1
## 3        1          1      1     -1        -1         1
## 4       -1          1      1      1         1        -1
## 5        1          1     -1      1        -1         1
## 6       -1          1      1     -1         1         1
## 7       -1         -1     -1      1         1        -1
## 8       -1         -1      1     -1         1        -1
## 9       -1         -1      1      1        -1        -1
## 10       1         -1      1      1        -1         1
## 11      -1         -1     -1     -1        -1        -1
## 12      -1          1     -1     -1        -1         1
## 13       1         -1     -1      1         1         1
## 14       1         -1      1      1         1        -1
## 15      -1          1      1     -1        -1        -1
## 16      -1         -1      1     -1        -1         1
## 17      -1         -1     -1      1        -1         1
## 18      -1          1     -1      1         1         1
## 19       1         -1     -1      1        -1        -1
## 20      -1         -1      1      1         1         1
## 21       1         -1      1     -1         1         1
## 22       1         -1      1     -1        -1        -1
## 23      -1          1     -1     -1         1        -1
## 24       1         -1     -1     -1        -1         1
## 25      -1          1     -1      1        -1        -1
## 26       1          1      1      1        -1        -1
## 27      -1         -1     -1     -1         1         1
## 28       1          1     -1     -1        -1        -1
## 29       1          1     -1     -1         1         1
## 30      -1          1      1      1        -1         1
## 31       1         -1     -1     -1         1        -1
## 32       1          1     -1      1         1        -1
## class=design, type= FrF2.estimable
aliasprint(halffrac)
## $legend
## [1] A=newchol    B=newstabglu C=newhdl     D=newage     E=newgender 
## [6] F=newheight 
## 
## [[2]]
## [1] no aliasing among main effects and 2fis

Because we are interested in the main factor effects as well as interaction effects between “newstabglu” and the other factors of interest, no main effect can be aliased with any other main effect or with any 2 factor interaction. For this reason, res4 is set to TRUE in the design matrix setup. Higher resolutions are not necessary because we are not concerned with aliasing among two factor interactions or aliasing with other higher factor interactions.

Here we use the design matrix to select only the experimental runs from the diabetes dataset that correspond with the exact row value combinations found in the design matrix:

newdataset = merge(halffrac, diabetes, by=c("newchol","newstabglu","newhdl","newage","newgender","newheight"), all = FALSE)
head(newdataset)
##   newchol newstabglu newhdl newage newgender newheight glyhb chol stabglu
## 1      -1         -1     -1     -1        -1        -1  4.52  172     101
## 2      -1         -1     -1     -1        -1        -1  3.96  135      88
## 3      -1         -1     -1     -1        -1        -1  4.75  179      75
## 4      -1         -1     -1     -1        -1        -1  5.55  190      84
## 5      -1         -1     -1     -1        -1        -1  4.64  160      71
## 6      -1         -1     -1     -1        -1        -1  4.75  197      92
##   hdl age gender height ratio   location weight  frame bp.1s bp.1d bp.2s
## 1  46  42 female     65   3.7     Louisa    165  small   118    68    NA
## 2  34  29 female     65   4.0 Buckingham    123  small   118    61    NA
## 3  36  23 female     65   5.0 Buckingham    183 medium   120    80    NA
## 4  44  43 female     62   4.3 Buckingham    163  large   135    88    NA
## 5  44  36 female     64   3.6     Louisa    185 medium   110    80    NA
## 6  46  36 female     64   4.3 Buckingham    136  small    NA    NA    NA
##   bp.2d waist hip time.ppn
## 1    NA    33  45      150
## 2    NA    26  37      240
## 3    NA    43  45      720
## 4    NA    40  45      720
## 5    NA    39  45      300
## 6    NA    32  37       NA
#remove duplicate rows with same value combinations existing in range of columns 1:6
unique = unique( newdataset[ , 1:6])
unique
##     newchol newstabglu newhdl newage newgender newheight
## 1        -1         -1     -1     -1        -1        -1
## 43       -1         -1     -1     -1         1         1
## 65       -1         -1     -1      1        -1         1
## 66       -1         -1     -1      1         1        -1
## 69       -1         -1      1     -1        -1         1
## 78       -1         -1      1     -1         1        -1
## 82       -1         -1      1      1        -1        -1
## 92       -1         -1      1      1         1         1
## 103      -1          1     -1     -1        -1         1
## 106      -1          1     -1      1        -1        -1
## 109      -1          1     -1      1         1         1
## 125      -1          1      1     -1        -1        -1
## 128       1         -1     -1     -1        -1         1
## 130       1         -1     -1     -1         1        -1
## 133       1         -1     -1      1        -1        -1
## 157       1         -1     -1      1         1         1
## 164       1         -1      1     -1        -1        -1
## 173       1         -1      1     -1         1         1
## 187       1         -1      1      1         1        -1
## 190       1          1     -1     -1        -1        -1
## 193       1          1     -1     -1         1         1
## 197       1          1     -1      1        -1         1
## 200       1          1     -1      1         1        -1
## 201       1          1      1     -1        -1         1
## 202       1          1      1      1        -1        -1
## 217       1          1      1      1         1         1
#find values of response variable corresponding to the indices of unique row values
rownames(unique)
##  [1] "1"   "43"  "65"  "66"  "69"  "78"  "82"  "92"  "103" "106" "109"
## [12] "125" "128" "130" "133" "157" "164" "173" "187" "190" "193" "197"
## [23] "200" "201" "202" "217"
glyhb = newdataset$glyhb[index=c(1,43,65,66,69,78,82,92,103,106,109,125,128,130,133,157,164,173,187,190,193,197,200,201,202,217)]
glyhb
##  [1]  4.52  4.21  5.66  4.82  4.18  3.67  4.03  4.88 12.74  4.44  6.13
## [12]  8.06  3.55    NA  5.23  4.98  4.25  6.42  4.66  4.97  4.87  9.82
## [23]  5.60  4.40 11.41  7.87
#append values of response variables to appropriate unique rows
newdesignmatrix = cbind(unique,glyhb)
newdesignmatrix
##     newchol newstabglu newhdl newage newgender newheight glyhb
## 1        -1         -1     -1     -1        -1        -1  4.52
## 43       -1         -1     -1     -1         1         1  4.21
## 65       -1         -1     -1      1        -1         1  5.66
## 66       -1         -1     -1      1         1        -1  4.82
## 69       -1         -1      1     -1        -1         1  4.18
## 78       -1         -1      1     -1         1        -1  3.67
## 82       -1         -1      1      1        -1        -1  4.03
## 92       -1         -1      1      1         1         1  4.88
## 103      -1          1     -1     -1        -1         1 12.74
## 106      -1          1     -1      1        -1        -1  4.44
## 109      -1          1     -1      1         1         1  6.13
## 125      -1          1      1     -1        -1        -1  8.06
## 128       1         -1     -1     -1        -1         1  3.55
## 130       1         -1     -1     -1         1        -1    NA
## 133       1         -1     -1      1        -1        -1  5.23
## 157       1         -1     -1      1         1         1  4.98
## 164       1         -1      1     -1        -1        -1  4.25
## 173       1         -1      1     -1         1         1  6.42
## 187       1         -1      1      1         1        -1  4.66
## 190       1          1     -1     -1        -1        -1  4.97
## 193       1          1     -1     -1         1         1  4.87
## 197       1          1     -1      1        -1         1  9.82
## 200       1          1     -1      1         1        -1  5.60
## 201       1          1      1     -1        -1         1  4.40
## 202       1          1      1      1        -1        -1 11.41
## 217       1          1      1      1         1         1  7.87

Create Secondary ANOVA model testing main factor effects and interaction effects of “newstabglu” with other factors of interest:

model2=lm(glyhb~newstabglu*newchol+newstabglu*newhdl+newstabglu*newage+newstabglu*newgender+newstabglu*newheight, data=newdesignmatrix)
anova(model2)
## Analysis of Variance Table
## 
## Response: glyhb
##                      Df Sum Sq Mean Sq F value Pr(>F)  
## newstabglu            1   43.4    43.4    8.18  0.013 *
## newchol               1    0.2     0.2    0.03  0.867  
## newhdl                1    0.8     0.8    0.15  0.707  
## newage                1    2.0     2.0    0.37  0.553  
## newgender             1    3.2     3.2    0.61  0.450  
## newheight             1    4.9     4.9    0.92  0.354  
## newstabglu:newchol    1    2.4     2.4    0.45  0.516  
## newstabglu:newhdl     1    1.1     1.1    0.21  0.651  
## newstabglu:newage     1    0.1     0.1    0.02  0.895  
## newstabglu:newgender  1    7.6     7.6    1.44  0.252  
## newstabglu:newheight  1    2.6     2.6    0.49  0.496  
## Residuals            13   68.9     5.3                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this particular model based on a half fraction factorial design, we reject the H0 that “newstabglu” has no effect on the response. However, we fail to reject the H0 that “newage”, “newchol”, “newhdl”, “newgender”, and “newheight” have no effect on the response. Thus, variation in the response can be attributed to the main effect of “newstabglu”. Furthermore, there were no significant interaction effects between “newstabglu” and other factors of interest.

Estimate Parameters using Shapiro-Wilk test of normality

shapiro.test(newdesignmatrix$glyhb)
## 
##  Shapiro-Wilk normality test
## 
## data:  newdesignmatrix$glyhb
## W = 0.7676, p-value = 6.851e-05

We reject the H0 that the response variable, percent glycosolcated hemoglobin, is normally distributed. Thus, we attempt to transform the data:

shapiro.test(newdesignmatrix$glyhb^2) 
## 
##  Shapiro-Wilk normality test
## 
## data:  newdesignmatrix$glyhb^2
## W = 0.6585, p-value = 2.114e-06
shapiro.test(sqrt(newdesignmatrix$glyhb)) 
## 
##  Shapiro-Wilk normality test
## 
## data:  sqrt(newdesignmatrix$glyhb)
## W = 0.8222, p-value = 0.0005451
shapiro.test(log(newdesignmatrix$glyhb)) 
## 
##  Shapiro-Wilk normality test
## 
## data:  log(newdesignmatrix$glyhb)
## W = 0.8723, p-value = 0.004814
shapiro.test(log10(newdesignmatrix$glyhb)) 
## 
##  Shapiro-Wilk normality test
## 
## data:  log10(newdesignmatrix$glyhb)
## W = 0.8723, p-value = 0.004814
shapiro.test(1/newdesignmatrix$glyhb) 
## 
##  Shapiro-Wilk normality test
## 
## data:  1/newdesignmatrix$glyhb
## W = 0.9461, p-value = 0.2049

It appears that the data are readily transformed into a normal distribution by using the logarithm of the response variable. Thus, we permanently transform the response in the design matrix and re-run the ANOVA.

newdesignmatrix$glyhb = log(newdesignmatrix$glyhb)
newdesignmatrix
##     newchol newstabglu newhdl newage newgender newheight glyhb
## 1        -1         -1     -1     -1        -1        -1 1.509
## 43       -1         -1     -1     -1         1         1 1.437
## 65       -1         -1     -1      1        -1         1 1.733
## 66       -1         -1     -1      1         1        -1 1.573
## 69       -1         -1      1     -1        -1         1 1.430
## 78       -1         -1      1     -1         1        -1 1.300
## 82       -1         -1      1      1        -1        -1 1.394
## 92       -1         -1      1      1         1         1 1.585
## 103      -1          1     -1     -1        -1         1 2.545
## 106      -1          1     -1      1        -1        -1 1.491
## 109      -1          1     -1      1         1         1 1.813
## 125      -1          1      1     -1        -1        -1 2.087
## 128       1         -1     -1     -1        -1         1 1.267
## 130       1         -1     -1     -1         1        -1    NA
## 133       1         -1     -1      1        -1        -1 1.654
## 157       1         -1     -1      1         1         1 1.605
## 164       1         -1      1     -1        -1        -1 1.447
## 173       1         -1      1     -1         1         1 1.859
## 187       1         -1      1      1         1        -1 1.539
## 190       1          1     -1     -1        -1        -1 1.603
## 193       1          1     -1     -1         1         1 1.583
## 197       1          1     -1      1        -1         1 2.284
## 200       1          1     -1      1         1        -1 1.723
## 201       1          1      1     -1        -1         1 1.482
## 202       1          1      1      1        -1        -1 2.434
## 217       1          1      1      1         1         1 2.063
model3=lm(glyhb~newstabglu*newchol+newstabglu*newhdl+newstabglu*newage+newstabglu*newgender+newstabglu*newheight, data=newdesignmatrix)
anova(model3)
## Analysis of Variance Table
## 
## Response: glyhb
##                      Df Sum Sq Mean Sq F value Pr(>F)   
## newstabglu            1  0.962   0.962    9.19 0.0096 **
## newchol               1  0.000   0.000    0.00 0.9697   
## newhdl                1  0.013   0.013    0.13 0.7292   
## newage                1  0.090   0.090    0.86 0.3693   
## newgender             1  0.030   0.030    0.29 0.5987   
## newheight             1  0.101   0.101    0.96 0.3441   
## newstabglu:newchol    1  0.051   0.051    0.49 0.4970   
## newstabglu:newhdl     1  0.037   0.037    0.35 0.5636   
## newstabglu:newage     1  0.000   0.000    0.00 0.9542   
## newstabglu:newgender  1  0.109   0.109    1.04 0.3264   
## newstabglu:newheight  1  0.027   0.027    0.26 0.6214   
## Residuals            13  1.360   0.105                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this transformed model based on a half fraction factorial design, we reject the H0 that “newstabglu” has no effect on the response. However, we fail to reject the H0 that “newchol”, “newhdl”, “newage”, “newgender”, and “newheight” have no effect on the response. Thus variation in the response can be attributed to the main effect of “newstabglu”. Furthermore, there was no significant interaction effect between “newstabglu” and other factors of interest. This model is the most valid of the three models since the response variable has been transformed to be normally distributed.

Diagnostics and Model Adequacy Checking:

par(mfrow=c(2,2))
plot(model3)

plot of chunk unnamed-chunk-12

The model appears to have a very good fit to the data since the points are evenly dispersed across the Cartesian plane of residual error and the fitted model.

References:

Department of Biostatistics, Vanderbilt University. 2014. Originally obtained from Dr. John Schorling, Department of Medicine, University of Virginia School of Medicine. “http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/diabetes.html

Willems JP, Saunders JT, DE Hunt, JB Schorling: Prevalence of coronary heart disease risk factors among rural blacks: A community-based study. Southern Medical Journal 90:814-820; 1997