The dataset under analysis includes data on potential medical indicators of type 2 diabetes for 403 patients. Using a fractional factorial design, the experiment will test which of 6 factors suggest a predisposition to the onset of type 2 diabetes as measured by the response variable, percent glycosolcated hemoglobin. The World Health Organization suggests using a glycosolated hemoglobin threshold of at least 6.5% to determine the presence of type 2 diabetes.
remove(list=ls())
diabetes = read.csv("C:/Users/Trevor/Documents/diabetes.csv", header=TRUE)
#view first few lines
head(diabetes)
## glyhb newchol newstabglu newhdl newage newgender newheight chol stabglu
## 1 7.87 1 1 1 1 1 1 292 235
## 2 4.87 1 1 1 1 1 1 293 115
## 3 9.18 1 1 1 1 1 -1 219 112
## 4 5.23 1 1 1 1 1 -1 219 112
## 5 7.40 1 -1 1 1 -1 -1 246 104
## 6 13.70 1 1 1 1 -1 -1 237 233
## hdl age gender height ratio location weight frame bp.1s bp.1d bp.2s
## 1 55 79 male 70 5.3 Buckingham 165 170 90 170
## 2 54 50 male 71 5.4 Buckingham 170 medium 131 75 NA
## 3 73 59 male 66 3.0 Buckingham 170 medium 146 92 168
## 4 73 76 male 64 3.0 Buckingham 105 medium 125 82 NA
## 5 62 66 female 66 4.0 Louisa 189 medium 200 94 208
## 6 58 49 female 62 4.1 Buckingham 189 large 130 90 NA
## bp.2d waist hip time.ppn
## 1 100 39 41 240
## 2 NA 34 39 120
## 3 98 37 40 120
## 4 NA 29 33 60
## 5 90 45 46 195
## 6 NA 43 47 195
#observe the structure of the data
str(diabetes)
## 'data.frame': 403 obs. of 24 variables:
## $ glyhb : num 7.87 4.87 9.18 5.23 7.4 ...
## $ newchol : int 1 1 1 1 1 1 1 1 1 1 ...
## $ newstabglu: int 1 1 1 1 -1 1 1 1 1 1 ...
## $ newhdl : int 1 1 1 1 1 1 1 1 1 1 ...
## $ newage : int 1 1 1 1 1 1 1 1 1 1 ...
## $ newgender : int 1 1 1 1 -1 -1 -1 -1 -1 -1 ...
## $ newheight : int 1 1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ chol : int 292 293 219 219 246 237 296 213 232 209 ...
## $ stabglu : int 235 115 112 112 104 233 262 203 184 113 ...
## $ hdl : int 55 54 73 73 62 58 60 75 114 65 ...
## $ age : int 79 50 59 76 66 49 74 71 91 76 ...
## $ gender : Factor w/ 2 levels "female","male": 2 2 2 2 1 1 1 1 1 1 ...
## $ height : int 70 71 66 64 66 62 63 63 61 60 ...
## $ ratio : num 5.3 5.4 3 3 4 ...
## $ location : Factor w/ 2 levels "Buckingham","Louisa": 1 1 1 1 2 1 1 1 1 1 ...
## $ weight : int 165 170 170 105 189 189 183 165 127 143 ...
## $ frame : Factor w/ 4 levels "","large","medium",..: 1 3 3 3 3 2 2 3 1 2 ...
## $ bp.1s : int 170 131 146 125 200 130 159 150 170 156 ...
## $ bp.1d : int 90 75 92 82 94 90 99 80 82 78 ...
## $ bp.2s : int 170 NA 168 NA 208 NA 160 145 NA 144 ...
## $ bp.2d : int 100 NA 98 NA 90 NA 103 80 NA 76 ...
## $ waist : int 39 34 37 29 45 43 42 34 35 35 ...
## $ hip : int 41 39 40 33 46 47 48 42 38 40 ...
## $ time.ppn : int 240 120 120 60 195 195 300 960 120 1200 ...
#403 observations of 24 variables
The data are tabluated into 24 columns, with detailed measurements of various vital functions and characteristics of each patient. This project will test the effects of 6 factors on the response. Due to the nature of the experimental design (2^k), the 6 predictor variables needed to be typecasted as factors and reduced to two levels for use in the Analysis of Variance model. This was performed by defining values greater and less than the column means as “1” and “-1”, respectively.
summary(diabetes)
## glyhb newchol newstabglu newhdl
## Min. : 2.68 Min. :-1.0000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 4.38 1st Qu.:-1.0000 1st Qu.:-1.000 1st Qu.:-1.000
## Median : 4.84 Median :-1.0000 Median :-1.000 Median :-1.000
## Mean : 5.59 Mean :-0.0918 Mean :-0.509 Mean :-0.186
## 3rd Qu.: 5.60 3rd Qu.: 1.0000 3rd Qu.:-1.000 3rd Qu.: 1.000
## Max. :16.11 Max. : 1.0000 Max. : 1.000 Max. : 1.000
## NA's :13
## newage newgender newheight chol
## Min. :-1.0000 Min. :-1.000 Min. :-1.000 Min. : 78
## 1st Qu.:-1.0000 1st Qu.:-1.000 1st Qu.:-1.000 1st Qu.:179
## Median :-1.0000 Median :-1.000 Median :-1.000 Median :204
## Mean :-0.0571 Mean :-0.161 Mean :-0.117 Mean :208
## 3rd Qu.: 1.0000 3rd Qu.: 1.000 3rd Qu.: 1.000 3rd Qu.:230
## Max. : 1.0000 Max. : 1.000 Max. : 1.000 Max. :443
## NA's :1
## stabglu hdl age gender height
## Min. : 48 Min. : 12.0 Min. :19.0 female:234 Min. :52
## 1st Qu.: 81 1st Qu.: 38.0 1st Qu.:34.0 male :169 1st Qu.:63
## Median : 89 Median : 46.0 Median :45.0 Median :66
## Mean :107 Mean : 50.5 Mean :46.9 Mean :66
## 3rd Qu.:106 3rd Qu.: 59.0 3rd Qu.:60.0 3rd Qu.:69
## Max. :385 Max. :120.0 Max. :92.0 Max. :76
## NA's :1 NA's :5
## ratio location weight frame bp.1s
## Min. : 1.50 Buckingham:200 Min. : 99 : 12 Min. : 90
## 1st Qu.: 3.20 Louisa :203 1st Qu.:151 large :103 1st Qu.:121
## Median : 4.20 Median :172 medium:184 Median :136
## Mean : 4.52 Mean :178 small :104 Mean :137
## 3rd Qu.: 5.40 3rd Qu.:200 3rd Qu.:147
## Max. :19.30 Max. :325 Max. :250
## NA's :1 NA's :1 NA's :5
## bp.1d bp.2s bp.2d waist hip
## Min. : 48.0 Min. :110 Min. : 60.0 Min. :26.0 Min. :30
## 1st Qu.: 75.0 1st Qu.:138 1st Qu.: 84.0 1st Qu.:33.0 1st Qu.:39
## Median : 82.0 Median :149 Median : 92.0 Median :37.0 Median :42
## Mean : 83.3 Mean :152 Mean : 92.5 Mean :37.9 Mean :43
## 3rd Qu.: 90.0 3rd Qu.:161 3rd Qu.:100.0 3rd Qu.:41.0 3rd Qu.:46
## Max. :124.0 Max. :238 Max. :124.0 Max. :56.0 Max. :64
## NA's :5 NA's :262 NA's :262 NA's :2 NA's :2
## time.ppn
## Min. : 5
## 1st Qu.: 90
## Median : 240
## Mean : 341
## 3rd Qu.: 518
## Max. :1560
## NA's :3
Randomization is not applicable here due to the nature of the data. 1046 subjects were initially interviewed in a study to understand the prevalence of obesity and diabetes in central Virginia for African Americans. The 403 of these subjects included in this dataset were the onces that were actually screened for diabetes.
The null hypothesis is that each of the 6 predictor variabes has no effect on the response, percent glycosolated hemoglobin. Analysis of Variance (ANOVA) will be used to understand the main effects of cholesterol (“newchol”, in mg/dL), stabilized glucose (“newstabglu”, in mg/dL), high density lipoprotein (“newhdl”, in mg/dL), age (“newage”, in years), gender (“newgender”, male or female), and height (“newheight”, in inches) on the response.
For these data, there are no replicates. Furthermore, the background information does not expressly state that repeated measures were used in collecting the data from each patient.
Blocking will not be utilized here for simplicity’s sake and because the objective is to focus on the design of the experiment.
Convert variables of interest from integers to factors:
diabetes$newchol=as.factor(diabetes$newchol)
diabetes$newstabglu=as.factor(diabetes$newstabglu)
diabetes$newhdl=as.factor(diabetes$newhdl)
diabetes$newage=as.factor(diabetes$newage)
diabetes$newgender=as.factor(diabetes$newgender)
diabetes$newheight=as.factor(diabetes$newheight)
Graph showing trends in the data:
par(mfrow=c(2,3))
plot(diabetes$glyhb~diabetes$newchol,xlab="newchol (mg/dL)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newstabglu,xlab="newstabglu (mg/dL)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newhdl,xlab="newhdl (mg/dL)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newage,xlab="newage (years)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$gender,xlab="newgender (male or female)",ylab="level of glycosolated hemoglobin (%)")
plot(diabetes$glyhb~diabetes$newheight,xlab="newheight (inches)",ylab="level of glycosolated hemoglobin (%)")
Create Initial ANOVA model:
model1=lm(glyhb~newchol+newstabglu+newhdl+newage+newgender+newheight, data=diabetes)
anova(model1)
## Analysis of Variance Table
##
## Response: glyhb
## Df Sum Sq Mean Sq F value Pr(>F)
## newchol 1 63 63 21.86 4.1e-06 ***
## newstabglu 1 687 687 239.34 < 2e-16 ***
## newhdl 1 8 8 2.78 0.0965 .
## newage 1 70 70 24.49 1.1e-06 ***
## newgender 1 0 0 0.01 0.9039
## newheight 1 28 28 9.89 0.0018 **
## Residuals 383 1100 3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this particular model based on a full factorial design, we reject the H0 that “newchol”, “newstabglu”, “newage”, and “newheight” have no effect on the response. However, we fail to reject the H0 that “newhdl” and “newgender” have no effect on the response. Thus, variation in the response can be attributed to the main effects of “newchol”, “newstabglu”, “newage”, and “newheight”.
Next we contruct a design matrix for a 2^6-1 fraction experimental design. Based on the boxplots above, it is hypothesized that the factor “newstabglu” might have interaction effects when compared with the other variables. Compared to the other variables, there is a significant difference between the median % of glycosolated hemoglobin of the two levels of “newstabglu”.
library(FrF2)
## Warning: package 'FrF2' was built under R version 3.1.2
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
##
## The following objects are masked from 'package:stats':
##
## aov, lm
##
## The following object is masked from 'package:graphics':
##
## plot.design
halffrac = FrF2(32,nfactors=6,estimable=formula("~newchol+newstabglu+newhdl+newage+newgender+newheight+newstabglu:(newchol+newhdl+newage+newgender+newheight)"),factor.names=c("newchol","newstabglu","newhdl","newage","newgender","newheight"),res4=TRUE,clear=FALSE)
halffrac
## newchol newstabglu newhdl newage newgender newheight
## 1 1 1 1 -1 1 -1
## 2 1 1 1 1 1 1
## 3 1 1 1 -1 -1 1
## 4 -1 1 1 1 1 -1
## 5 1 1 -1 1 -1 1
## 6 -1 1 1 -1 1 1
## 7 -1 -1 -1 1 1 -1
## 8 -1 -1 1 -1 1 -1
## 9 -1 -1 1 1 -1 -1
## 10 1 -1 1 1 -1 1
## 11 -1 -1 -1 -1 -1 -1
## 12 -1 1 -1 -1 -1 1
## 13 1 -1 -1 1 1 1
## 14 1 -1 1 1 1 -1
## 15 -1 1 1 -1 -1 -1
## 16 -1 -1 1 -1 -1 1
## 17 -1 -1 -1 1 -1 1
## 18 -1 1 -1 1 1 1
## 19 1 -1 -1 1 -1 -1
## 20 -1 -1 1 1 1 1
## 21 1 -1 1 -1 1 1
## 22 1 -1 1 -1 -1 -1
## 23 -1 1 -1 -1 1 -1
## 24 1 -1 -1 -1 -1 1
## 25 -1 1 -1 1 -1 -1
## 26 1 1 1 1 -1 -1
## 27 -1 -1 -1 -1 1 1
## 28 1 1 -1 -1 -1 -1
## 29 1 1 -1 -1 1 1
## 30 -1 1 1 1 -1 1
## 31 1 -1 -1 -1 1 -1
## 32 1 1 -1 1 1 -1
## class=design, type= FrF2.estimable
aliasprint(halffrac)
## $legend
## [1] A=newchol B=newstabglu C=newhdl D=newage E=newgender
## [6] F=newheight
##
## [[2]]
## [1] no aliasing among main effects and 2fis
Because we are interested in the main factor effects as well as interaction effects between “newstabglu” and the other factors of interest, no main effect can be aliased with any other main effect or with any 2 factor interaction. For this reason, res4 is set to TRUE in the design matrix setup. Higher resolutions are not necessary because we are not concerned with aliasing among two factor interactions or aliasing with other higher factor interactions.
Here we use the design matrix to select only the experimental runs from the diabetes dataset that correspond with the exact row value combinations found in the design matrix:
newdataset = merge(halffrac, diabetes, by=c("newchol","newstabglu","newhdl","newage","newgender","newheight"), all = FALSE)
head(newdataset)
## newchol newstabglu newhdl newage newgender newheight glyhb chol stabglu
## 1 -1 -1 -1 -1 -1 -1 4.52 172 101
## 2 -1 -1 -1 -1 -1 -1 3.96 135 88
## 3 -1 -1 -1 -1 -1 -1 4.75 179 75
## 4 -1 -1 -1 -1 -1 -1 5.55 190 84
## 5 -1 -1 -1 -1 -1 -1 4.64 160 71
## 6 -1 -1 -1 -1 -1 -1 4.75 197 92
## hdl age gender height ratio location weight frame bp.1s bp.1d bp.2s
## 1 46 42 female 65 3.7 Louisa 165 small 118 68 NA
## 2 34 29 female 65 4.0 Buckingham 123 small 118 61 NA
## 3 36 23 female 65 5.0 Buckingham 183 medium 120 80 NA
## 4 44 43 female 62 4.3 Buckingham 163 large 135 88 NA
## 5 44 36 female 64 3.6 Louisa 185 medium 110 80 NA
## 6 46 36 female 64 4.3 Buckingham 136 small NA NA NA
## bp.2d waist hip time.ppn
## 1 NA 33 45 150
## 2 NA 26 37 240
## 3 NA 43 45 720
## 4 NA 40 45 720
## 5 NA 39 45 300
## 6 NA 32 37 NA
#remove duplicate rows with same value combinations existing in range of columns 1:6
unique = unique( newdataset[ , 1:6])
unique
## newchol newstabglu newhdl newage newgender newheight
## 1 -1 -1 -1 -1 -1 -1
## 43 -1 -1 -1 -1 1 1
## 65 -1 -1 -1 1 -1 1
## 66 -1 -1 -1 1 1 -1
## 69 -1 -1 1 -1 -1 1
## 78 -1 -1 1 -1 1 -1
## 82 -1 -1 1 1 -1 -1
## 92 -1 -1 1 1 1 1
## 103 -1 1 -1 -1 -1 1
## 106 -1 1 -1 1 -1 -1
## 109 -1 1 -1 1 1 1
## 125 -1 1 1 -1 -1 -1
## 128 1 -1 -1 -1 -1 1
## 130 1 -1 -1 -1 1 -1
## 133 1 -1 -1 1 -1 -1
## 157 1 -1 -1 1 1 1
## 164 1 -1 1 -1 -1 -1
## 173 1 -1 1 -1 1 1
## 187 1 -1 1 1 1 -1
## 190 1 1 -1 -1 -1 -1
## 193 1 1 -1 -1 1 1
## 197 1 1 -1 1 -1 1
## 200 1 1 -1 1 1 -1
## 201 1 1 1 -1 -1 1
## 202 1 1 1 1 -1 -1
## 217 1 1 1 1 1 1
#find values of response variable corresponding to the indices of unique row values
rownames(unique)
## [1] "1" "43" "65" "66" "69" "78" "82" "92" "103" "106" "109"
## [12] "125" "128" "130" "133" "157" "164" "173" "187" "190" "193" "197"
## [23] "200" "201" "202" "217"
glyhb = newdataset$glyhb[index=c(1,43,65,66,69,78,82,92,103,106,109,125,128,130,133,157,164,173,187,190,193,197,200,201,202,217)]
glyhb
## [1] 4.52 4.21 5.66 4.82 4.18 3.67 4.03 4.88 12.74 4.44 6.13
## [12] 8.06 3.55 NA 5.23 4.98 4.25 6.42 4.66 4.97 4.87 9.82
## [23] 5.60 4.40 11.41 7.87
#append values of response variables to appropriate unique rows
newdesignmatrix = cbind(unique,glyhb)
newdesignmatrix
## newchol newstabglu newhdl newage newgender newheight glyhb
## 1 -1 -1 -1 -1 -1 -1 4.52
## 43 -1 -1 -1 -1 1 1 4.21
## 65 -1 -1 -1 1 -1 1 5.66
## 66 -1 -1 -1 1 1 -1 4.82
## 69 -1 -1 1 -1 -1 1 4.18
## 78 -1 -1 1 -1 1 -1 3.67
## 82 -1 -1 1 1 -1 -1 4.03
## 92 -1 -1 1 1 1 1 4.88
## 103 -1 1 -1 -1 -1 1 12.74
## 106 -1 1 -1 1 -1 -1 4.44
## 109 -1 1 -1 1 1 1 6.13
## 125 -1 1 1 -1 -1 -1 8.06
## 128 1 -1 -1 -1 -1 1 3.55
## 130 1 -1 -1 -1 1 -1 NA
## 133 1 -1 -1 1 -1 -1 5.23
## 157 1 -1 -1 1 1 1 4.98
## 164 1 -1 1 -1 -1 -1 4.25
## 173 1 -1 1 -1 1 1 6.42
## 187 1 -1 1 1 1 -1 4.66
## 190 1 1 -1 -1 -1 -1 4.97
## 193 1 1 -1 -1 1 1 4.87
## 197 1 1 -1 1 -1 1 9.82
## 200 1 1 -1 1 1 -1 5.60
## 201 1 1 1 -1 -1 1 4.40
## 202 1 1 1 1 -1 -1 11.41
## 217 1 1 1 1 1 1 7.87
Create Secondary ANOVA model testing main factor effects and interaction effects of “newstabglu” with other factors of interest:
model2=lm(glyhb~newstabglu*newchol+newstabglu*newhdl+newstabglu*newage+newstabglu*newgender+newstabglu*newheight, data=newdesignmatrix)
anova(model2)
## Analysis of Variance Table
##
## Response: glyhb
## Df Sum Sq Mean Sq F value Pr(>F)
## newstabglu 1 43.4 43.4 8.18 0.013 *
## newchol 1 0.2 0.2 0.03 0.867
## newhdl 1 0.8 0.8 0.15 0.707
## newage 1 2.0 2.0 0.37 0.553
## newgender 1 3.2 3.2 0.61 0.450
## newheight 1 4.9 4.9 0.92 0.354
## newstabglu:newchol 1 2.4 2.4 0.45 0.516
## newstabglu:newhdl 1 1.1 1.1 0.21 0.651
## newstabglu:newage 1 0.1 0.1 0.02 0.895
## newstabglu:newgender 1 7.6 7.6 1.44 0.252
## newstabglu:newheight 1 2.6 2.6 0.49 0.496
## Residuals 13 68.9 5.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this particular model based on a half fraction factorial design, we reject the H0 that “newstabglu” has no effect on the response. However, we fail to reject the H0 that “newage”, “newchol”, “newhdl”, “newgender”, and “newheight” have no effect on the response. Thus, variation in the response can be attributed to the main effect of “newstabglu”. Furthermore, there were no significant interaction effects between “newstabglu” and other factors of interest.
Estimate Parameters using Shapiro-Wilk test of normality
shapiro.test(newdesignmatrix$glyhb)
##
## Shapiro-Wilk normality test
##
## data: newdesignmatrix$glyhb
## W = 0.7676, p-value = 6.851e-05
We reject the H0 that the response variable, percent glycosolcated hemoglobin, is normally distributed. Thus, we attempt to transform the data:
shapiro.test(newdesignmatrix$glyhb^2)
##
## Shapiro-Wilk normality test
##
## data: newdesignmatrix$glyhb^2
## W = 0.6585, p-value = 2.114e-06
shapiro.test(sqrt(newdesignmatrix$glyhb))
##
## Shapiro-Wilk normality test
##
## data: sqrt(newdesignmatrix$glyhb)
## W = 0.8222, p-value = 0.0005451
shapiro.test(log(newdesignmatrix$glyhb))
##
## Shapiro-Wilk normality test
##
## data: log(newdesignmatrix$glyhb)
## W = 0.8723, p-value = 0.004814
shapiro.test(log10(newdesignmatrix$glyhb))
##
## Shapiro-Wilk normality test
##
## data: log10(newdesignmatrix$glyhb)
## W = 0.8723, p-value = 0.004814
shapiro.test(1/newdesignmatrix$glyhb)
##
## Shapiro-Wilk normality test
##
## data: 1/newdesignmatrix$glyhb
## W = 0.9461, p-value = 0.2049
It appears that the data are readily transformed into a normal distribution by using the logarithm of the response variable. Thus, we permanently transform the response in the design matrix and re-run the ANOVA.
newdesignmatrix$glyhb = log(newdesignmatrix$glyhb)
newdesignmatrix
## newchol newstabglu newhdl newage newgender newheight glyhb
## 1 -1 -1 -1 -1 -1 -1 1.509
## 43 -1 -1 -1 -1 1 1 1.437
## 65 -1 -1 -1 1 -1 1 1.733
## 66 -1 -1 -1 1 1 -1 1.573
## 69 -1 -1 1 -1 -1 1 1.430
## 78 -1 -1 1 -1 1 -1 1.300
## 82 -1 -1 1 1 -1 -1 1.394
## 92 -1 -1 1 1 1 1 1.585
## 103 -1 1 -1 -1 -1 1 2.545
## 106 -1 1 -1 1 -1 -1 1.491
## 109 -1 1 -1 1 1 1 1.813
## 125 -1 1 1 -1 -1 -1 2.087
## 128 1 -1 -1 -1 -1 1 1.267
## 130 1 -1 -1 -1 1 -1 NA
## 133 1 -1 -1 1 -1 -1 1.654
## 157 1 -1 -1 1 1 1 1.605
## 164 1 -1 1 -1 -1 -1 1.447
## 173 1 -1 1 -1 1 1 1.859
## 187 1 -1 1 1 1 -1 1.539
## 190 1 1 -1 -1 -1 -1 1.603
## 193 1 1 -1 -1 1 1 1.583
## 197 1 1 -1 1 -1 1 2.284
## 200 1 1 -1 1 1 -1 1.723
## 201 1 1 1 -1 -1 1 1.482
## 202 1 1 1 1 -1 -1 2.434
## 217 1 1 1 1 1 1 2.063
model3=lm(glyhb~newstabglu*newchol+newstabglu*newhdl+newstabglu*newage+newstabglu*newgender+newstabglu*newheight, data=newdesignmatrix)
anova(model3)
## Analysis of Variance Table
##
## Response: glyhb
## Df Sum Sq Mean Sq F value Pr(>F)
## newstabglu 1 0.962 0.962 9.19 0.0096 **
## newchol 1 0.000 0.000 0.00 0.9697
## newhdl 1 0.013 0.013 0.13 0.7292
## newage 1 0.090 0.090 0.86 0.3693
## newgender 1 0.030 0.030 0.29 0.5987
## newheight 1 0.101 0.101 0.96 0.3441
## newstabglu:newchol 1 0.051 0.051 0.49 0.4970
## newstabglu:newhdl 1 0.037 0.037 0.35 0.5636
## newstabglu:newage 1 0.000 0.000 0.00 0.9542
## newstabglu:newgender 1 0.109 0.109 1.04 0.3264
## newstabglu:newheight 1 0.027 0.027 0.26 0.6214
## Residuals 13 1.360 0.105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this transformed model based on a half fraction factorial design, we reject the H0 that “newstabglu” has no effect on the response. However, we fail to reject the H0 that “newchol”, “newhdl”, “newage”, “newgender”, and “newheight” have no effect on the response. Thus variation in the response can be attributed to the main effect of “newstabglu”. Furthermore, there was no significant interaction effect between “newstabglu” and other factors of interest. This model is the most valid of the three models since the response variable has been transformed to be normally distributed.
Diagnostics and Model Adequacy Checking:
par(mfrow=c(2,2))
plot(model3)
The model appears to have a very good fit to the data since the points are evenly dispersed across the Cartesian plane of residual error and the fitted model.
References:
Department of Biostatistics, Vanderbilt University. 2014. Originally obtained from Dr. John Schorling, Department of Medicine, University of Virginia School of Medicine. “http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/diabetes.html”
Willems JP, Saunders JT, DE Hunt, JB Schorling: Prevalence of coronary heart disease risk factors among rural blacks: A community-based study. Southern Medical Journal 90:814-820; 1997