Data are obtained from the ‘fueleconomy’ dataset. Five factors that might influence city fuel economy (in mpg) are analyzed using a Taguchi design. A quick view of the dataset is given below.
remove(list=ls())
#Loading raw data
library("fueleconomy")
## Warning: package 'fueleconomy' was built under R version 3.1.2
#Quick view and data summary
head(vehicles)
## id make model year class
## 1 27550 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5 1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6 1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
## trans drive cyl displ fuel hwy cty
## 1 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 2 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 3 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 4 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 5 Automatic 3-spd Rear-Wheel Drive 4 2.5 Regular 17 16
## 6 Automatic 3-spd Rear-Wheel Drive 6 4.2 Regular 13 13
tail(vehicles)
## id make model year class
## 33437 31064 smart fortwo electric drive cabriolet 2011 Two Seaters
## 33438 33305 smart fortwo electric drive convertible 2013 Two Seaters
## 33439 34393 smart fortwo electric drive convertible 2014 Two Seaters
## 33440 31065 smart fortwo electric drive coupe 2011 Two Seaters
## 33441 33306 smart fortwo electric drive coupe 2013 Two Seaters
## 33442 34394 smart fortwo electric drive coupe 2014 Two Seaters
## trans drive cyl displ fuel hwy cty
## 33437 Automatic (A1) Rear-Wheel Drive NA NA Electricity 79 94
## 33438 Automatic (A1) Rear-Wheel Drive NA NA Electricity 93 122
## 33439 Automatic (A1) Rear-Wheel Drive NA NA Electricity 93 122
## 33440 Automatic (A1) Rear-Wheel Drive NA NA Electricity 79 94
## 33441 Automatic (A1) Rear-Wheel Drive NA NA Electricity 93 122
## 33442 Automatic (A1) Rear-Wheel Drive NA NA Electricity 93 122
summary(vehicles)
## id make model year
## Min. : 1 Length:33442 Length:33442 Min. :1984
## 1st Qu.: 8361 Class :character Class :character 1st Qu.:1991
## Median :16724 Mode :character Mode :character Median :1999
## Mean :17038 Mean :1999
## 3rd Qu.:25265 3rd Qu.:2008
## Max. :34932 Max. :2015
##
## class trans drive cyl
## Length:33442 Length:33442 Length:33442 Min. : 2.00
## Class :character Class :character Class :character 1st Qu.: 4.00
## Mode :character Mode :character Mode :character Median : 6.00
## Mean : 5.77
## 3rd Qu.: 6.00
## Max. :16.00
## NA's :58
## displ fuel hwy cty
## Min. :0.00 Length:33442 Min. : 9.0 Min. : 6.0
## 1st Qu.:2.30 Class :character 1st Qu.: 19.0 1st Qu.: 15.0
## Median :3.00 Mode :character Median : 23.0 Median : 17.0
## Mean :3.35 Mean : 23.6 Mean : 17.5
## 3rd Qu.:4.30 3rd Qu.: 27.0 3rd Qu.: 20.0
## Max. :8.40 Max. :109.0 Max. :138.0
## NA's :57
Five factors that may influence vehicle city fuel economy are considered in this analysis: model year (year), drive train (drive), vehicle size class (class), fuel type (fuel) and number of cylinders (cyl).
unique(vehicles$year)
## [1] 1984 1985 1987 1997 1998 1999 1995 1996 2001 2002 2003 2000 2004 2013
## [15] 2014 1986 1988 1989 1990 1991 1992 1993 1994 2005 2006 2007 2008 2009
## [29] 2010 2011 2012 2015
unique(vehicles$drive)
## [1] "2-Wheel Drive" "Rear-Wheel Drive"
## [3] "Front-Wheel Drive" "4-Wheel or All-Wheel Drive"
## [5] "All-Wheel Drive" "4-Wheel Drive"
## [7] "Part-time 4-Wheel Drive"
unique(vehicles$class)
## [1] "Special Purpose Vehicle 2WD"
## [2] "Midsize Cars"
## [3] "Subcompact Cars"
## [4] "Compact Cars"
## [5] "Sport Utility Vehicle - 4WD"
## [6] "Small Sport Utility Vehicle 2WD"
## [7] "Small Sport Utility Vehicle 4WD"
## [8] "Two Seaters"
## [9] "Sport Utility Vehicle - 2WD"
## [10] "Special Purpose Vehicles"
## [11] "Special Purpose Vehicle 4WD"
## [12] "Small Station Wagons"
## [13] "Minicompact Cars"
## [14] "Midsize-Large Station Wagons"
## [15] "Midsize Station Wagons"
## [16] "Large Cars"
## [17] "Standard Sport Utility Vehicle 4WD"
## [18] "Standard Sport Utility Vehicle 2WD"
## [19] "Minivan - 4WD"
## [20] "Minivan - 2WD"
## [21] "Vans"
## [22] "Vans, Cargo Type"
## [23] "Vans, Passenger Type"
## [24] "Standard Pickup Trucks 2WD"
## [25] "Standard Pickup Trucks"
## [26] "Standard Pickup Trucks/2wd"
## [27] "Small Pickup Trucks 2WD"
## [28] "Standard Pickup Trucks 4WD"
## [29] "Small Pickup Trucks 4WD"
## [30] "Small Pickup Trucks"
## [31] "Vans Passenger"
## [32] "Special Purpose Vehicle"
## [33] "Special Purpose Vehicles/2wd"
## [34] "Special Purpose Vehicles/4wd"
unique(vehicles$fuel)
## [1] "Regular" "Premium"
## [3] "Diesel" "Premium or E85"
## [5] "Electricity" "Gasoline or E85"
## [7] "Premium Gas or Electricity" "Gasoline or natural gas"
## [9] "CNG" "Midgrade"
## [11] "Regular Gas and Electricity" "Gasoline or propane"
## [13] "Premium and Electricity"
unique(vehicles$cyl)
## [1] 4 6 5 8 12 10 NA 16 3 2
The original factors have too many levels. For convenience, and to prepare for the Taguchi design, we reduce each factor to a small number of categorical levels.
Years are grouped into three periods (1984 to 1999, 2000 to 2009 and 2010 to 2015), roughly by decade, which is a typical way to describe when a vehicle was manufactured.
#Categories of years
vehicles$year[as.numeric(vehicles$year) >= 1984 & as.numeric(vehicles$year) <= 1999] = "1"
vehicles$year[as.numeric(vehicles$year) >= 2000 & as.numeric(vehicles$year) <= 2009] = "2"
vehicles$year[as.numeric(vehicles$year) >= 2010 & as.numeric(vehicles$year) <= 2015] = "3"
unique(vehicles$year)
## [1] "1" "2" "3"
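The same binning can be written more compactly with base R's cut(). This is only a sketch on a small hypothetical vector of years, not the code used in this analysis; the break points are chosen to reproduce the three periods above.

```r
# Hypothetical sample of model years (not the full dataset)
years <- c(1984, 1999, 2000, 2009, 2010, 2015)

# cut() uses right-closed intervals by default:
# (1983, 1999] -> "1", (1999, 2009] -> "2", (2009, 2015] -> "3"
year_cat <- cut(years,
                breaks = c(1983, 1999, 2009, 2015),
                labels = c("1", "2", "3"))

# Cross-tabulate to confirm every year landed in the intended bin
print(table(years, year_cat))
```

Unlike the repeated assignments above, cut() returns a factor, so the column type does not change halfway through the recoding.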
Drive train is categorized into 2-wheel drive, 4-wheel drive and all-wheel drive as below. This is also a typical distinction made when buying a vehicle.
#Categories of drive train
vehicles$drive[ (vehicles$drive) == "2-Wheel Drive" | (vehicles$drive) == "Rear-Wheel Drive" | (vehicles$drive) == "Front-Wheel Drive"] = "1"
vehicles$drive[ (vehicles$drive) == "Part-time 4-Wheel Drive" | (vehicles$drive) == "4-Wheel Drive"] = "2"
vehicles$drive[ (vehicles$drive) == "All-Wheel Drive" | (vehicles$drive) == "4-Wheel or All-Wheel Drive"] = "3"
unique(vehicles$drive)
## [1] "1" "3" "2"
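Long chains of == comparisons are easy to get wrong. One alternative, sketched here under the assumption that the mapping is exactly the one used above, is a named lookup vector. The names drive_map, drive_raw and drive_cat are illustrative only.

```r
# Lookup table: original drive labels (from the output above) -> level codes
drive_map <- c(
  "2-Wheel Drive"              = "1",
  "Rear-Wheel Drive"           = "1",
  "Front-Wheel Drive"          = "1",
  "Part-time 4-Wheel Drive"    = "2",
  "4-Wheel Drive"              = "2",
  "All-Wheel Drive"            = "3",
  "4-Wheel or All-Wheel Drive" = "3"
)

# Hypothetical sample of raw labels
drive_raw <- c("Front-Wheel Drive", "4-Wheel Drive",
               "All-Wheel Drive", "2-Wheel Drive")

# Index the named vector by label; unname() drops the labels from the result
drive_cat <- unname(drive_map[drive_raw])
print(drive_cat)
```

A label missing from the lookup table would come back as NA, which makes an incomplete mapping easy to spot.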
Vehicle size class is grouped into 3 categories: commonly used classes, passenger and cargo vehicles, and special purpose vehicles.
#Categories of class
#Commonly used classes - "1"
vehicles$class[ (vehicles$class) == "Compact Cars" | (vehicles$class) == "Subcompact Cars" | (vehicles$class) == "Two Seaters" | (vehicles$class) == "Small Station Wagons" | (vehicles$class) == "Minicompact Cars" | (vehicles$class) == "Sport Utility Vehicle - 4WD" | (vehicles$class) == "Small Sport Utility Vehicle 2WD" | (vehicles$class) == "Small Sport Utility Vehicle 4WD"| (vehicles$class) == "Sport Utility Vehicle - 2WD"| (vehicles$class) == "Standard Sport Utility Vehicle 4WD"| (vehicles$class) == "Standard Sport Utility Vehicle 2WD" | (vehicles$class) == "Midsize Cars"] = "1"
#For passenger and cargo - "2"
vehicles$class[(vehicles$class) == "Vans, Cargo Type"| (vehicles$class) == "Standard Pickup Trucks 2WD"| (vehicles$class) == "Standard Pickup Trucks"| (vehicles$class) == "Standard Pickup Trucks/2wd"| (vehicles$class) == "Standard Pickup Trucks 4WD" | (vehicles$class) == "Midsize-Large Station Wagons"| (vehicles$class) == "Midsize Station Wagons"| (vehicles$class) == "Large Cars"| (vehicles$class) == "Minivan - 4WD"| (vehicles$class) == "Minivan - 2WD"| (vehicles$class) == "Vans"| (vehicles$class) == "Vans, Passenger Type"| (vehicles$class) == "Small Pickup Trucks 2WD"| (vehicles$class) == "Small Pickup Trucks 4WD"| (vehicles$class) == "Small Pickup Trucks"| (vehicles$class) == "Vans Passenger"] = "2"
#Special purpose - "3"
vehicles$class[(vehicles$class) == "Special Purpose Vehicle 2WD" | (vehicles$class) == "Special Purpose Vehicles" | (vehicles$class) == "Special Purpose Vehicle 4WD" | (vehicles$class) == "Special Purpose Vehicle" | (vehicles$class) == "Special Purpose Vehicles/2wd" | (vehicles$class) == "Special Purpose Vehicles/4wd"] = "3"
unique(vehicles$class)
## [1] "3" "1" "2"
Fuel types are divided into 4 categories. Note that there are some overlaps; for example, ‘Gasoline or natural gas’ is placed in category ‘1’ instead of ‘4’. The choice of levels is arbitrary, and a different categorization could lead to different results.
#Categories of fuel types
#Common gasoline - "1"
vehicles$fuel[(vehicles$fuel) == "Gasoline or E85" | (vehicles$fuel) == "Gasoline or natural gas" | (vehicles$fuel) == "Gasoline or propane" | (vehicles$fuel) == "Midgrade" | (vehicles$fuel) == "Regular" | (vehicles$fuel) == "Regular Gas and Electricity"] = "1"
#Premium gasoline - "2"
vehicles$fuel[(vehicles$fuel) == "Premium" | (vehicles$fuel) == "Premium or E85"] = "2"
#Electricity - "3"
vehicles$fuel[(vehicles$fuel) == "Premium and Electricity" | (vehicles$fuel) == "Premium Gas or Electricity" | (vehicles$fuel) == "Electricity"] = "3"
#Others - '4'
vehicles$fuel[(vehicles$fuel) == "CNG" | (vehicles$fuel) == "Diesel" ] = "4"
unique(vehicles$fuel)
## [1] "1" "2" "4" "3"
Number of cylinders has 3 categories as below, which could reflect the power of a vehicle.
#Categories of numbers of cylinder
vehicles$cyl[as.numeric(vehicles$cyl) >= 2 & as.numeric(vehicles$cyl) <= 4] = "1"
vehicles$cyl[as.numeric(vehicles$cyl) > 4 & as.numeric(vehicles$cyl) < 8] = "2"
vehicles$cyl[as.numeric(vehicles$cyl) >=8 & as.numeric(vehicles$cyl) <= 16] = "3"
unique(vehicles$cyl)
## [1] "1" "2" "3" NA
All factors are categorical variables. The response variable ‘cty’, city fuel economy in mpg (miles per gallon), is numeric and takes integer values from 6 to 138.
The ‘fueleconomy’ dataset provides detailed information, including almost everything needed to evaluate a vehicle. Five factors are selected in this analysis: model year (year), drive train (drive), vehicle size class (class), fuel type (fuel) and number of cylinders (cyl). We convert these into categorical factors, each with 3 levels except for fuel, which has 4 levels.
str(vehicles)
## Classes 'tbl_df', 'tbl' and 'data.frame': 33442 obs. of 12 variables:
## $ id : int 27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
## $ make : chr "AM General" "AM General" "AM General" "AM General" ...
## $ model: chr "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
## $ year : chr "1" "1" "1" "1" ...
## $ class: chr "3" "3" "3" "3" ...
## $ trans: chr "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
## $ drive: chr "1" "1" "1" "1" ...
## $ cyl : chr "1" "1" "2" "2" ...
## $ displ: num 2.5 2.5 4.2 4.2 2.5 4.2 3.8 2.2 2.2 3 ...
## $ fuel : chr "1" "1" "1" "1" ...
## $ hwy : int 17 17 13 13 17 13 21 26 28 26 ...
## $ cty : int 18 18 13 13 16 13 14 20 22 18 ...
Fuel economy data are the result of vehicle testing done at the Environmental Protection Agency’s National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan, and by vehicle manufacturers with oversight by EPA. We therefore assume that the data can be treated as completely randomized.
There are no replications or repeated measures in the raw data or in this analysis.
The levels of all five factors are set by the categorization described above.
The procedure is as follows: obtain the data from the ‘fueleconomy’ dataset, categorize the selected factors, and carry out exploratory data analysis and ANOVA to test the effect of the individual factors on vehicle fuel economy. Next, create an orthogonal array for the Taguchi design using oa.design from the DoE.base package, find the optimal combination, and finally check model adequacy. The null hypothesis is H0: the variation in vehicle city fuel economy can be explained by randomization alone. The alternative hypothesis is Ha: the variation in vehicle city fuel economy can be explained by something other than randomization (one or more of these 5 factors).
The Taguchi method is a structured approach for determining the best combination of inputs to produce a product or service. With a Taguchi design it is possible to find the optimal combination using only a small number of runs. The aim of this design is to find the combination of factor levels that gives the best city fuel economy.
#Histogram of response variable
hist(vehicles$cty, xlab="City Fuel Economy (mpg)")
#Boxplot
boxplot(cty~year, data=vehicles,xlab='year',ylab='city fuel economy',main='Made year-city fuel economy boxplot')
boxplot(cty~drive, data=vehicles,xlab='drive',ylab='city fuel economy',main='Drive train-city fuel economy boxplot')
boxplot(cty~class, data=vehicles,xlab='size class',ylab='city fuel economy',main='Size class-city fuel economy boxplot')
boxplot(cty~fuel, data=vehicles, xlab='fuel type',ylab='city fuel economy',main='Fuel type-city fuel economy boxplot')
boxplot(cty~cyl, data=vehicles, xlab='number of cylinders',ylab='city fuel economy',main='Number of cylinders-city fuel economy boxplot')
From the plots above, the influence of most factors is not obvious, except for fuel type and number of cylinders.
#Create linear model and conduct ANOVA
model1=lm(cty~year+drive+class+fuel+cyl,data=vehicles)
anova(model1)
## Analysis of Variance Table
##
## Response: cty
## Df Sum Sq Mean Sq F value Pr(>F)
## year 2 21887 10943 1462 <2e-16 ***
## drive 2 46327 23164 3095 <2e-16 ***
## class 2 78989 39494 5277 <2e-16 ***
## fuel 3 54626 18209 2433 <2e-16 ***
## cyl 2 249119 124560 16644 <2e-16 ***
## Residuals 33372 249742 7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA result differs from the impression given by the boxplots: all 5 factors return p-values below 2e-16. With more than 33,000 observations, even differences that are hard to see in a boxplot are statistically significant. From the ANOVA result we reject the null hypothesis in favor of the alternative, namely that the variation in city fuel economy can be explained by something other than randomization.
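Because p-values this small say little about how much each factor matters, it can help to look at each term's share of the total sum of squares (eta squared). This is an addition for illustration, not part of the original analysis. The numbers are copied from the ANOVA table above; anova() reports sequential sums of squares, so the shares depend on the order of the terms in the model.

```r
# Sums of squares copied from the ANOVA table above
ss <- c(year = 21887, drive = 46327, class = 78989,
        fuel = 54626, cyl = 249119, Residuals = 249742)

# Eta squared: share of the total sum of squares attributed to each term
eta_sq <- ss / sum(ss)
print(round(eta_sq, 3))
```

On these figures, number of cylinders accounts for roughly 36% of the total variation and model year for about 3%, even though both have the same reported p-value.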
#Loading packages
library(qualityTools)
## Warning: package 'qualityTools' was built under R version 3.1.2
library(DoE.base)
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
##
## The following objects are masked from 'package:stats':
##
## aov, lm
##
## The following object is masked from 'package:graphics':
##
## plot.design
#Create orthogonal array for Taguchi design
oa = oa.design(factor.names=c("year","drive","class","fuel","cyl"), nlevels=c(3,3,3,4,3), columns="min3")
#With columns="min3", aliasing between main effects and 2-factor interactions is kept to a minimum
oa
## year drive class fuel cyl
## 1 1 1 2 3 2
## 2 1 3 1 4 2
## 3 1 3 2 3 1
## 4 1 3 3 3 3
## 5 3 1 1 2 1
## 6 3 2 2 3 3
## 7 2 1 3 1 2
## 8 3 3 2 2 2
## 9 3 1 2 1 3
## 10 3 2 1 1 2
## 11 2 3 1 1 3
## 12 2 1 1 3 3
## 13 2 1 3 3 1
## 14 1 3 2 1 2
## 15 1 1 1 2 3
## 16 3 1 3 4 3
## 17 2 3 2 4 3
## 18 1 2 2 2 1
## 19 2 2 3 3 2
## 20 3 3 1 3 2
## 21 3 2 3 4 2
## 22 2 2 2 1 1
## 23 2 1 2 4 2
## 24 1 1 3 2 2
## 25 2 2 1 2 2
## 26 1 2 3 1 3
## 27 2 3 3 2 1
## 28 1 1 1 1 1
## 29 2 2 2 2 3
## 30 2 3 1 4 1
## 31 3 2 1 3 1
## 32 1 2 1 4 3
## 33 3 1 2 4 1
## 34 3 3 3 2 3
## 35 1 2 3 4 1
## 36 3 3 3 1 1
## class=design, type= oa
There are several functions for creating an orthogonal array for a Taguchi design, including param.design, oa.design and taguchiDesign. At first I wanted to use taguchiDesign; however, it can only create certain predefined arrays, and unfortunately an array matching this design (four three-level factors and one four-level factor) is not among them.
Pick up data according to the orthogonal array.
#Create new dataset according to orthogonal array
vehicles1 = merge(oa, vehicles, by=c("year","drive","class","fuel","cyl"), all=FALSE)
head(vehicles1)
## year drive class fuel cyl id make model trans
## 1 1 1 1 1 1 11739 Mitsubishi Mirage Manual 5-spd
## 2 1 1 1 1 1 6484 Nissan Sentra Coupe Manual 5-spd
## 3 1 1 1 1 1 15022 Oldsmobile Alero Automatic 4-spd
## 4 1 1 1 1 1 6526 Honda Civic Manual 4-spd
## 5 1 1 1 1 1 1958 Chevrolet Cavalier Automatic 3-spd
## 6 1 1 1 1 1 29090 Mitsubishi Mirage Manual 5-spd
## displ hwy cty
## 1 1.8 30 23
## 2 1.6 33 24
## 3 2.4 28 19
## 4 1.5 33 28
## 5 2.0 26 20
## 6 1.8 27 22
tail(vehicles1)
## year drive class fuel cyl id make model
## 6357 3 3 2 2 2 33852 Cadillac XTS AWD
## 6358 3 3 2 2 2 28701 Audi A6 Avant quattro
## 6359 3 3 2 2 2 33491 BMW X1 xDrive35i
## 6360 3 3 2 2 2 33096 BMW 740Li xDrive
## 6361 3 3 2 2 2 33091 BMW 535i xDrive Gran Turismo
## 6362 3 3 2 2 2 34355 Porsche Panamera 4
## trans displ hwy cty
## 6357 Automatic (S6) 3.6 24 16
## 6358 Automatic (S6) 3.0 26 18
## 6359 Automatic (S6) 3.0 27 18
## 6360 Automatic (S8) 3.0 28 19
## 6361 Automatic (S8) 3.0 26 18
## 6362 Auto(AM-S7) 3.6 27 18
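With all=FALSE, merge() performs an inner join, so any run of the array for which no vehicle exists is dropped silently. A minimal sketch of how the unmatched runs could be listed is given below. It uses small hypothetical data frames (design, obs) rather than oa and vehicles.

```r
# Toy design with two factors (hypothetical, much smaller than the 36-run array)
design <- data.frame(run   = 1:4,
                     year  = c("1", "1", "2", "2"),
                     drive = c("1", "2", "1", "2"))

# Toy observations: no vehicle has year "2" together with drive "2"
obs <- data.frame(year  = c("1", "1", "1", "2"),
                  drive = c("1", "1", "2", "1"),
                  cty   = c(23, 24, 15, 19))

# all.x = TRUE keeps every design run; runs without a match get cty = NA
merged <- merge(design, obs, by = c("year", "drive"), all.x = TRUE)

# Runs of the design that have no matching observation
unmatched <- merged[is.na(merged$cty), c("run", "year", "drive")]
print(unmatched)
```

Listing the unmatched runs shows how much of the orthogonal array is actually covered by the data.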
Then remove rows with the same factor combination, keeping only one for each.
#Create unique combination
removal=unique(vehicles1[,1:5])
removal
## year drive class fuel cyl
## 1 1 1 1 1 1
## 4089 1 1 1 2 3
## 4603 1 1 3 2 2
## 4611 1 3 2 1 2
## 5223 2 3 1 1 3
## 5561 2 3 1 4 1
## 5563 2 3 2 4 3
## 5565 3 1 1 2 1
## 5982 3 1 2 1 3
## 6231 3 1 3 4 3
## 6233 3 2 1 1 2
## 6336 3 3 2 2 2
rownames(removal)
## [1] "1" "4089" "4603" "4611" "5223" "5561" "5563" "5565" "5982" "6231"
## [11] "6233" "6336"
Create new orthogonal array based on selected rownames.
newcty=vehicles1$cty[c(1,4089,4603,4611,5223,5561,5563,5565,5982,6231,6233,6336)]
newoa=cbind(removal,newcty)
newoa
## year drive class fuel cyl newcty
## 1 1 1 1 1 1 23
## 4089 1 1 1 2 3 12
## 4603 1 1 3 2 2 15
## 4611 1 3 2 1 2 15
## 5223 2 3 1 1 3 12
## 5561 2 3 1 4 1 19
## 5563 2 3 2 4 3 8
## 5565 3 1 1 2 1 25
## 5982 3 1 2 1 3 12
## 6231 3 1 3 4 3 11
## 6233 3 2 1 1 2 16
## 6336 3 3 2 2 2 19
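Taking the first vehicle of each combination makes the response depend on row order. One alternative, shown here only as a sketch on hypothetical values and not as the method used in this analysis, is to average the response within each combination with aggregate().

```r
# Toy observations (hypothetical values): two combinations, several vehicles each
obs <- data.frame(year = c("1", "1", "1", "3", "3"),
                  cyl  = c("1", "1", "1", "1", "1"),
                  cty  = c(23, 24, 19, 25, 27))

# First observation per combination, as in the approach above
first_obs <- obs[!duplicated(obs[, c("year", "cyl")]), ]

# Alternative: mean response per combination
mean_obs <- aggregate(cty ~ year + cyl, data = obs, FUN = mean)

print(first_obs)
print(mean_obs)
```

The two summaries can differ noticeably when the vehicles within a combination vary, which is one reason the Taguchi results may depend on how the rows are picked.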
The signal-to-noise (S/N) ratio is calculated as below. Because a higher city fuel economy is better, the larger-the-better formula is used: S/N = -10*log10(1/Yi^2), where Yi is the response of run i (one observation per run).
#Calculate S/N
SN=-10*log10(1/newoa$newcty^2)
SN
## [1] 27.23 21.58 23.52 23.52 21.58 25.58 18.06 27.96 21.58 20.83 24.08
## [12] 25.58
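The formula above is the single-observation case of the usual larger-the-better S/N ratio, which averages 1/y^2 over the replicates of a run. A small sketch follows; the function name sn_larger_better and the replicate values are assumptions for illustration.

```r
# Larger-the-better S/N ratio for a vector of replicate responses y:
#   SN = -10 * log10( mean(1 / y^2) )
# With a single observation this reduces to -10 * log10(1 / y^2).
sn_larger_better <- function(y) {
  -10 * log10(mean(1 / y^2))
}

# Single observation: same as the first value computed above (cty = 23)
print(round(sn_larger_better(23), 2))

# Hypothetical replicates for one run
print(round(sn_larger_better(c(23, 24, 19)), 2))
```

With replicates available, low responses pull the ratio down more strongly than high responses pull it up, which is the intended behavior of this criterion.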
#Find optimized combination
optimal=which(SN==max(-10*log10(1/newoa$newcty^2)))
optimal
## [1] 8
newoa[optimal,]
## year drive class fuel cyl newcty
## 5565 3 1 1 2 1 25
According to the result above, the highest S/N appears at the 8th combination (row 5565), which is therefore the optimal one among the runs tested. It represents vehicles made from 2010 onward, with 2-wheel, front-wheel or rear-wheel drive, in a commonly used size class, using premium gasoline, and with 2 to 4 cylinders.
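Picking the single best run is one way to read the result. A common complement in Taguchi analysis is a main-effects table: the mean S/N at each level of each factor. The sketch below is an addition, not part of the original analysis. The data frame is copied from the newoa table above. Because only 12 of the 36 runs are present, the array is no longer balanced and the level means are indicative only.

```r
# Runs and responses copied from the newoa table above
newoa <- data.frame(
  year   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  drive  = c(1, 1, 1, 3, 3, 3, 3, 1, 1, 1, 2, 3),
  class  = c(1, 1, 3, 2, 1, 1, 2, 1, 2, 3, 1, 2),
  fuel   = c(1, 2, 2, 1, 1, 4, 4, 2, 1, 4, 1, 2),
  cyl    = c(1, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2),
  newcty = c(23, 12, 15, 15, 12, 19, 8, 25, 12, 11, 16, 19)
)

# Larger-the-better S/N for a single observation per run
newoa$SN <- -10 * log10(1 / newoa$newcty^2)

# Mean S/N at each level of each factor (main-effects table)
factors <- c("year", "drive", "class", "fuel", "cyl")
main_effects <- lapply(factors, function(f) tapply(newoa$SN, newoa[[f]], mean))
names(main_effects) <- factors
print(lapply(main_effects, round, 2))

# Level with the highest mean S/N for each factor
best_levels <- sapply(main_effects, function(m) names(m)[which.max(m)])
print(best_levels)
```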
ANOVA to test the main effect of each factor in the Taguchi design.
model2=lm(newcty~year+drive+class+fuel+cyl, data=newoa)
anova(model2)
## Analysis of Variance Table
##
## Response: newcty
## Df Sum Sq Mean Sq F value Pr(>F)
## year 2 27.0 13.5 13.48 0.189
## drive 2 1.1 0.6 0.56 0.687
## class 2 131.3 65.7 65.66 0.087 .
## fuel 2 0.8 0.4 0.40 0.747
## cyl 2 123.7 61.9 61.86 0.090 .
## Residuals 1 1.0 1.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
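No factor returns a p-value smaller than 0.05. Class (p = 0.087) and number of cylinders (p = 0.090) are significant only at the 0.1 level, while year (p = 0.189), drive and fuel are not significant. With only 12 runs there is a single residual degree of freedom, so these tests have very little power. This does not correspond to the ANOVA result from the raw data.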
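The degrees of freedom in the table can be checked by counting the levels that actually occur in the 12 runs. The sketch below copies the factor columns from the newoa table above. It also shows why fuel has 2 degrees of freedom rather than 3: only three of its four levels are present.

```r
# Factor columns copied from the newoa table above
runs <- data.frame(
  year  = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  drive = c(1, 1, 1, 3, 3, 3, 3, 1, 1, 1, 2, 3),
  class = c(1, 1, 3, 2, 1, 1, 2, 1, 2, 3, 1, 2),
  fuel  = c(1, 2, 2, 1, 1, 4, 4, 2, 1, 4, 1, 2),
  cyl   = c(1, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2)
)

# Number of distinct levels actually present for each factor
n_levels <- sapply(runs, function(x) length(unique(x)))
print(n_levels)

# Each factor uses (levels - 1) degrees of freedom; one more goes to the intercept
df_factors <- sum(n_levels - 1)
df_resid <- nrow(runs) - 1 - df_factors
print(df_resid)
```

A single residual degree of freedom means the error variance is estimated from almost no information, so the F tests above should be read with caution.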
#shapiro test
shapiro.test(newoa$newcty)
##
## Shapiro-Wilk normality test
##
## data: newoa$newcty
## W = 0.9464, p-value = 0.5849
The Shapiro-Wilk test returns a p-value of 0.5849, so the null hypothesis of the test cannot be rejected; that is, there is no evidence that the response variable departs from a normal distribution.
#qqplot
qqnorm(residuals(model2))
qqline(residuals(model2))
#fitted-residual plot
plot(fitted(model2),residuals(model2))
The Q-Q plot of the residuals follows the Q-Q line closely, especially in the middle part, which supports the validity of the new model.
The fitted-versus-residuals plot also looks good, with most points lying close to zero, indicating the model is adequate to explain the variation in cty. Note that with 11 parameters fitted to 12 observations, small residuals are to be expected.
1. In this analysis, the optimal combination agrees with common sense, so the result is plausible in practical terms, although the Taguchi ANOVA offers only weak statistical support for it (no factor is significant at the 0.05 level).
2. The result suggests that the Taguchi method can be a good guide. However, the ANOVA of the Taguchi design returns a different result from the ANOVA of the raw data. The way the levels were categorized and the way duplicate combinations were removed when the new dataset was built from the orthogonal array may both have influenced the result. After all, a perfectly fitted model will not appear under practical conditions.