1. Introduction

Topics about properties have been analyzed in great detail by researchers in the past decades. Many factors can affect the value of housing, making the research more complex. In 100+ Interesting Data Sets for Statistics, a data set called Ecdat is introduced. It is a wealth of data sets available for R, containing gobs of econometric data.In this project, we study sales prices of houses in the City of Windsor. The data used in this recipe is found using the “100+ interesting data sets” webpage, and it is publicly available in R package named Ecdat. A summary of this package is available at https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf.

In this study, We apply Taguchi design (FFD) among Housing dataset in Ecdat. The variables of study include two 2-level factors and two 3-level factors that may influence the price. Taguchi Design was used to perform this analysis.

2. Setting

System Under Test

In the first step, the data is downloaded from Ecdat package.the Housing dataframe was imported and assigned to data dataframe:

library("Ecdat")

## Loading required package: Ecfun

## 
## Attaching package: 'Ecfun'

## The following object is masked from 'package:base':
## 
##     sign

## 
## Attaching package: 'Ecdat'

## The following object is masked from 'package:datasets':
## 
##     Orange

data<- Housing

The structure, heading and tail of the data frame is shown below to analyze the data innitially.

str(data)

## 'data.frame':    546 obs. of  12 variables:
##  $ price   : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
##  $ lotsize : num  5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
##  $ bedrooms: num  3 2 3 3 2 3 3 3 3 3 ...
##  $ bathrms : num  1 1 1 1 1 1 2 1 1 2 ...
##  $ stories : num  2 1 1 2 1 1 2 3 1 4 ...
##  $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
##  $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
##  $ gashw   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ airco   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
##  $ garagepl: num  1 0 0 0 0 0 2 0 0 1 ...
##  $ prefarea: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

head(data)

##   price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
## 1 42000    5850        3       1       2      yes      no      yes    no
## 2 38500    4000        2       1       1      yes      no       no    no
## 3 49500    3060        3       1       1      yes      no       no    no
## 4 60500    6650        3       1       2      yes     yes       no    no
## 5 61000    6360        2       1       1      yes      no       no    no
## 6 66000    4160        3       1       1      yes     yes      yes    no
##   airco garagepl prefarea
## 1    no        1       no
## 2    no        0       no
## 3    no        0       no
## 4    no        0       no
## 5    no        0       no
## 6   yes        0       no

tail(data)

##      price lotsize bedrooms bathrms stories driveway recroom fullbase
## 541  85000    6525        3       2       4      yes      no       no
## 542  91500    4800        3       2       4      yes     yes       no
## 543  94000    6000        3       2       4      yes      no       no
## 544 103000    6000        3       2       4      yes     yes       no
## 545 105000    6000        3       2       2      yes     yes       no
## 546 105000    6000        3       1       2      yes      no       no
##     gashw airco garagepl prefarea
## 541    no    no        1       no
## 542    no   yes        0       no
## 543    no   yes        0       no
## 544    no   yes        1       no
## 545    no   yes        1       no
## 546    no   yes        1       no

In this study, we intend to perform a statistical analysis of sales prices of house in the City of Windsor. The main question that we are addressing is whether the price of house is significantly affected by the four factors we concern. The sales price of the house is our variable of interest, named price. There are many factors that may affect the price. In this study, one of the factors we select is bedrooms, and we categorize the number of bedrooms into three levels (“1”, “2”, “3”). The second variable selected was the number of bathrooms, garagepl. It is the second most important factor that determine the price of house. We also categorize the number of garage place into three levels (“1”, “2”, “3”). The third and fourth variable we choose are binary variables - prefarea and airco. prefarea indicates that the location of the house may affect the value of house, and airco indicates that the installation of air conditioner may affect the value of house. A summary of the variable is listed below.

data<-na.omit(data)
summary(data)

##      price           lotsize         bedrooms        bathrms     
##  Min.   : 25000   Min.   : 1650   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 49125   1st Qu.: 3600   1st Qu.:2.000   1st Qu.:1.000  
##  Median : 62000   Median : 4600   Median :3.000   Median :1.000  
##  Mean   : 68122   Mean   : 5150   Mean   :2.965   Mean   :1.286  
##  3rd Qu.: 82000   3rd Qu.: 6360   3rd Qu.:3.000   3rd Qu.:2.000  
##  Max.   :190000   Max.   :16200   Max.   :6.000   Max.   :4.000  
##     stories      driveway  recroom   fullbase  gashw     airco    
##  Min.   :1.000   no : 77   no :449   no :355   no :521   no :373  
##  1st Qu.:1.000   yes:469   yes: 97   yes:191   yes: 25   yes:173  
##  Median :2.000                                                    
##  Mean   :1.808                                                    
##  3rd Qu.:2.000                                                    
##  Max.   :4.000                                                    
##     garagepl      prefarea 
##  Min.   :0.0000   no :418  
##  1st Qu.:0.0000   yes:128  
##  Median :0.0000            
##  Mean   :0.6923            
##  3rd Qu.:1.0000            
##  Max.   :3.0000

data$bedrooms[data$bedrooms <= 2] = 1
data$bedrooms[(data$bedrooms > 2) & (data$bedrooms <= 3) & (data$bedrooms != 1)] = 2
data$bedrooms[(data$bedrooms != 2) & (data$bedrooms != 1)] = 3
data$bedrooms<-as.factor(data$bedrooms)
data$garagepl[data$garagepl == 1] = 1
data$garagepl[(data$garagepl == 2) & (data$garagepl != "1")] = 2
data$garagepl[(data$garagepl != "1") & (data$garagepl != "2")] = 3
data$garagepl<-as.factor(data$garagepl)
levels(data$prefarea)<-c(1,2)
levels(data$airco)<-c(1,2)
head(data)

##   price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
## 1 42000    5850        2       1       2      yes      no      yes    no
## 2 38500    4000        1       1       1      yes      no       no    no
## 3 49500    3060        2       1       1      yes      no       no    no
## 4 60500    6650        2       1       2      yes     yes       no    no
## 5 61000    6360        1       1       1      yes      no       no    no
## 6 66000    4160        2       1       1      yes     yes      yes    no
##   airco garagepl prefarea
## 1     1        1        1
## 2     1        3        1
## 3     1        3        1
## 4     1        3        1
## 5     1        3        1
## 6     2        3        1

str(data)

## 'data.frame':    546 obs. of  12 variables:
##  $ price   : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
##  $ lotsize : num  5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
##  $ bedrooms: Factor w/ 3 levels "1","2","3": 2 1 2 2 1 2 2 2 2 2 ...
##  $ bathrms : num  1 1 1 1 1 1 2 1 1 2 ...
##  $ stories : num  2 1 1 2 1 1 2 3 1 4 ...
##  $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
##  $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
##  $ gashw   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ airco   : Factor w/ 2 levels "1","2": 1 1 1 1 1 2 1 1 1 2 ...
##  $ garagepl: Factor w/ 3 levels "1","2","3": 1 3 3 3 3 3 2 3 3 1 ...
##  $ prefarea: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...

3. Experimental Design

The experiment seeks to observe four factors that impact the price of houses. Statistical and visual analysis can be used to assess the value of houses. We do not capture all the factors in the dataset, but it does not impact the purpose of the experiment. Our purpose is to see if the four factors we observed have impact on the value of houses.

What is the rationale for this design?

Using both full factorial design and Taguchi Design in this experiment can allow us a better understanding of the cost saving effect of the Taguchi Design. In reality, researchers always face the problem of how to design an efficient experimental design. If we have many factors, the experimental runs for full factorial design will grow exponentially. Taguchi Design can help to solve this problem with higher efficiency.

We use the tools in R package to create an orthogonal array for a taguchi design of experiment. The design is shown as follows:

#Loading packages
library(qualityTools)

## Loading required package: Rsolnp

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:Ecdat':
## 
##     SP500

## 
## Attaching package: 'qualityTools'

## The following object is masked from 'package:stats':
## 
##     sigma

library(DoE.base)

## Loading required package: grid

## Loading required package: conf.design

## 
## Attaching package: 'DoE.base'

## The following objects are masked from 'package:stats':
## 
##     aov, lm

## The following object is masked from 'package:graphics':
## 
##     plot.design

## The following object is masked from 'package:base':
## 
##     lengths

library(knitr)
library(tidyr)
library(kimisc)

# Taguchi Design
taguchiChoose(level1 = 2, factors1 = 6)

## 6 factors on 2 levels and 0 factors on 0 levels with 0 desired interactions to be estimated
## 
## Possible Designs:
## 
## L8_2 L12_2 L16_2 L32_2
## 
## Use taguchiDesign("L8_2") or different to create a taguchi design object

From the design above, we can choose L8_2, L12_2, L16_2 or L32_2. In this project, we are interest in L8_2 deisgn. Firstly, it is a fraction of 2⁶ factorial design, so that we can compare it with the results in project 3. Secondly, we can minimize the amount of runs to 8 in our experiment.

Is there any randomization, replication, or blocking used in the experiment?

Replication, replication and blocking are important in a design of experiment. Using these technics can help to reduce the bias caused by nuisance factors and increase the precision.

Randomization

Randamization Should be used in experiment in three ways,random selection, random assignment, and random execution.

In this dataset, we do not have certain information about randomization. However, the dataset should meet the assumptions required for truly random design. Also, with randomly sub-setting and ordering the data in our analysis, we can further make sure that we meet the requirement of randomization. In our experiment, randomization is applied in the taghchiDesin().

Replication

“The repeated measures design (also known as a within-subjects design) uses the same subjects with every condition of the research, including the control, while replication reflects sources of variability both between runs and (potentially) within runs.”[4]

Repeated measures involves measuring the same cases multiple times, and Replication involves running the same study on different subjects but identical conditions.In this dataset, we do not have any replication or repeated measures.

Blocking

“In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) that are similar to one another.”[5]

The nuisance factor may have effect on the experiment but it is not the main facor that interests the researcher. In this study, blocking is not included in the experiment because we does not find the nuisance factors that may affect the results in the dataset.

Continuous variables

In the dataset, we take continuous variable - price as our response variable. price is the sale price of each house.

4.Statistical Analysis

Descriptive Analysis of Data

The data set we use in this project is a cross-section data from 1987. There are 546 observations in the dataset. from the summary of the statistics, the prices of house range from $25000 to $190000, and the mean of the price is $68120. The price distribution is shown in the histogram below.

hist(data$price,  main= "Housing Price", xlab="Price")

Plotting the histograms of all four input variables:

par(mfrow=c(2,2))
barplot(table(data$bedrooms), xlab="Bedrooms", ylab="Frequency", main="Histogram of Bedrooms")
barplot(table(data$garagepl), xlab="Garage Place", ylab="Frequency", main="Histogram of Garage Place")
barplot(table(data$prefarea), xlab="Preferred Area", ylab="Frequency", main="Histogram of Preferred Area")
barplot(table(data$airco), xlab="Air-Conditions", ylab="Frequency", main="Histogram of Air-Conditions")

The boxplots are shown below to indicate the price differences among different groups.

#bloxplots 
par(mfrow=c(2,2))
boxplot(data$price~data$bedrooms, xlab="No. of Bedrooms", ylab="House Prices")
means1 <- by(data$price,data$bedrooms, mean)
points(1:3, means1,pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(data$price~data$garagepl, xlab="No. of Garage Places", ylab="House Prices")
means2 <- by(data$price,data$garagepl, mean)
points(1:3, means2,pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means2,labels = format(means2, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(data$price~data$prefarea, xlab="Preferred Community", ylab="House Prices")
means3 <- by(data$price,data$prefarea, mean)
points(1:2, means3,pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means3,labels = format(means3, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(data$price~data$airco, xlab="Air Conditioner", ylab="House Prices")
means4 <- by(data$price,data$airco, mean)
points(1:2, means4,pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means4,labels = format(means4, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")

Taguchi Design

plan<-taguchiDesign("L8_2")

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

k <- as.data.frame(plan)
A <- c(as.integer(k[,4]))
B <- c(as.integer(k[,5]))
C <- c(as.integer(k[,6]))
D <- c(as.integer(k[,7]))
CD <- C*D
E <- c(as.integer(k[,8]))
F <- c(as.integer(k[,9]))
EF <- E*F
p <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(k)
for (i in 1:n){
  if (p$A[i] == 1){p$airco[i] <- "1"}
  if (p$A[i] == 2){p$airco[i] <- "2"}
 
  if (p$B[i] == 1){p$prefarea[i] <- "1"}
  if (p$B[i] == 2){p$prefarea[i] <- "2"}

  if (p$CD[i] == 1){p$bedrooms[i] <- "1"}
  if (p$CD[i] == 2){p$bedrooms[i] <- "2"}
  if (p$CD[i] == 4){p$bedrooms[i] <- "3"}
  
  if (p$EF[i] == 1){p$garagepl[i] <- "1"}
  if (p$EF[i] == 2){p$garagepl[i] <- "2"}
  if (p$EF[i] == 4){p$garagepl[i] <- "3"}
}
pp <- cbind(k,p$airco,p$prefarea,p$bedrooms,p$garagepl)
summary(pp)

##    StandOrder      RunOrder      Replicate       A             B      
##  Min.   :1.00   Min.   :1.00   Min.   :1   Min.   :1.0   Min.   :1.0  
##  1st Qu.:2.75   1st Qu.:2.75   1st Qu.:1   1st Qu.:1.0   1st Qu.:1.0  
##  Median :4.50   Median :4.50   Median :1   Median :1.5   Median :1.5  
##  Mean   :4.50   Mean   :4.50   Mean   :1   Mean   :1.5   Mean   :1.5  
##  3rd Qu.:6.25   3rd Qu.:6.25   3rd Qu.:1   3rd Qu.:2.0   3rd Qu.:2.0  
##  Max.   :8.00   Max.   :8.00   Max.   :1   Max.   :2.0   Max.   :2.0  
##        C             D             E             F             G      
##  Min.   :1.0   Min.   :1.0   Min.   :1.0   Min.   :1.0   Min.   :1.0  
##  1st Qu.:1.0   1st Qu.:1.0   1st Qu.:1.0   1st Qu.:1.0   1st Qu.:1.0  
##  Median :1.5   Median :1.5   Median :1.5   Median :1.5   Median :1.5  
##  Mean   :1.5   Mean   :1.5   Mean   :1.5   Mean   :1.5   Mean   :1.5  
##  3rd Qu.:2.0   3rd Qu.:2.0   3rd Qu.:2.0   3rd Qu.:2.0   3rd Qu.:2.0  
##  Max.   :2.0   Max.   :2.0   Max.   :2.0   Max.   :2.0   Max.   :2.0  
##     y           p$airco p$prefarea p$bedrooms p$garagepl
##  Mode:logical   1:4     1:4        1:2        1:2       
##  NA's:8         2:4     2:4        2:4        2:4       
##                                    3:2        3:2       
##                                                         
##                                                         
##

pp2<-unite(pp, refcol , c(12,13,14,15), remove=FALSE)
kable(pp2, align = 'c')

StandOrder	RunOrder	Replicate	A	B	C	D	E	F	G	y	refcol	p$airco	p$prefarea	p$bedrooms	p$garagepl
2	1	1	1	1	1	2	2	2	2	NA	1_1_2_3	1	1	2	3
7	2	1	2	2	1	1	2	2	1	NA	2_2_1_3	2	2	1	3
3	3	1	1	2	2	1	1	2	2	NA	1_2_2_2	1	2	2	2
6	4	1	2	1	2	2	1	2	1	NA	2_1_3_2	2	1	3	2
4	5	1	1	2	2	2	2	1	1	NA	1_2_3_2	1	2	3	2
5	6	1	2	1	2	1	2	1	2	NA	2_1_2_2	2	1	2	2
8	7	1	2	2	1	2	1	1	2	NA	2_2_2_1	2	2	2	1
1	8	1	1	1	1	1	1	1	1	NA	1_1_1_1	1	1	1	1

p.refcol <- c(pp2$refcol)

data1<-unite(data, refcol , c(10,12,3,11), remove=FALSE)
head(data1)

##   price lotsize  refcol bedrooms bathrms stories driveway recroom fullbase
## 1 42000    5850 1_1_2_1        2       1       2      yes      no      yes
## 2 38500    4000 1_1_1_3        1       1       1      yes      no       no
## 3 49500    3060 1_1_2_3        2       1       1      yes      no       no
## 4 60500    6650 1_1_2_3        2       1       2      yes     yes       no
## 5 61000    6360 1_1_1_3        1       1       1      yes      no       no
## 6 66000    4160 2_1_2_3        2       1       1      yes     yes      yes
##   gashw airco garagepl prefarea
## 1    no     1        1        1
## 2    no     1        3        1
## 3    no     1        3        1
## 4    no     1        3        1
## 5    no     1        3        1
## 6    no     2        3        1

tail(data1)

##      price lotsize  refcol bedrooms bathrms stories driveway recroom
## 541  85000    6525 1_1_2_1        2       2       4      yes      no
## 542  91500    4800 2_1_2_3        2       2       4      yes     yes
## 543  94000    6000 2_1_2_3        2       2       4      yes      no
## 544 103000    6000 2_1_2_1        2       2       4      yes     yes
## 545 105000    6000 2_1_2_1        2       2       2      yes     yes
## 546 105000    6000 2_1_2_1        2       1       2      yes      no
##     fullbase gashw airco garagepl prefarea
## 541       no    no     1        1        1
## 542       no    no     2        3        1
## 543       no    no     2        3        1
## 544       no    no     2        1        1
## 545       no    no     2        1        1
## 546       no    no     2        1        1

for (j in 1:546){if (pp2$refcol[1] == data1$refcol[j]){data1$RO1[j] <- data1$price[j]}else{data1$RO1[j] <- 0}}
for (j in 1:546){if (pp2$refcol[2] == data1$refcol[j]){data1$RO2[j] <- data1$price[j]}else{data1$RO2[j] <- 0}}
for (j in 1:546){if (pp2$refcol[3] == data1$refcol[j]){data1$RO3[j] <- data1$price[j]}else{data1$RO3[j] <- 0}}
for (j in 1:546){if (pp2$refcol[4] == data1$refcol[j]){data1$RO4[j] <- data1$price[j]}else{data1$RO4[j] <- 0}}
for (j in 1:546){if (pp2$refcol[5] == data1$refcol[j]){data1$RO5[j] <- data1$price[j]}else{data1$RO5[j] <- 0}}
for (j in 1:546){if (pp2$refcol[6] == data1$refcol[j]){data1$RO6[j] <- data1$price[j]}else{data1$RO6[j] <- 0}}
for (j in 1:546){if (pp2$refcol[7] == data1$refcol[j]){data1$RO7[j] <- data1$price[j]}else{data1$RO7[j] <- 0}}
for (j in 1:546){if (pp2$refcol[8] == data1$refcol[j]){data1$RO8[j] <- data1$price[j]}else{data1$RO8[j] <- 0}}
run1 <- sample.rows(subset(data1, RO1 > 0), 1)
run2 <- sample.rows(subset(data1, RO2 > 0), 1)
run3 <- sample.rows(subset(data1, RO3 > 0), 1)
run4 <- sample.rows(subset(data1, RO4 > 0), 1)
run5 <- sample.rows(subset(data1, RO5 > 0), 1)
run6 <- sample.rows(subset(data1, RO6 > 0), 1)
run7 <- sample.rows(subset(data1, RO7 > 0), 1)
run8 <- sample.rows(subset(data1, RO8 > 0), 1)

RV <- c(run1$RO1,run2$RO2,run3$RO3,run4$RO4,run5$RO5,run6$RO6,run7$RO7,run8$RO8)
barplot(RV, names.arg = c("ER1","ER2","ER3","ER4","ER5","ER6","ER7","ER8"),xlab = "Exprimental Runs",ylab = "Observations")

response(plan) = RV
summary(plan)

## Taguchi SINGLE Design
## Information about the factors:
## 
##               A       B       C       D       E       F       G
## value 1       1       1       1       1       1       1       1
## value 2       2       2       2       2       2       2       2
## name                                                           
## unit                                                           
## type    numeric numeric numeric numeric numeric numeric numeric
## 
## -----------
## 
##   StandOrder RunOrder Replicate A B C D E F G     RV
## 1          2        1         1 1 1 1 2 2 2 2  60000
## 2          7        2         1 2 2 1 1 2 2 1  69000
## 3          3        3         1 1 2 2 1 1 2 2  85000
## 4          6        4         1 2 1 2 2 1 2 1  64000
## 5          4        5         1 1 2 2 2 2 1 1  97000
## 6          5        6         1 2 1 2 1 2 1 2  84900
## 7          8        7         1 2 2 1 2 1 1 2 100000
## 8          1        8         1 1 1 1 1 1 1 1  37000
## 
## -----------

effectPlot(plan, col=4,lty=3, main="Taguchi Design")

oa <- as.data.frame(plan)
oa

##   StandOrder RunOrder Replicate A B C D E F G     RV
## 1          2        1         1 1 1 1 2 2 2 2  60000
## 2          7        2         1 2 2 1 1 2 2 1  69000
## 3          3        3         1 1 2 2 1 1 2 2  85000
## 4          6        4         1 2 1 2 2 1 2 1  64000
## 5          4        5         1 1 2 2 2 2 1 1  97000
## 6          5        6         1 2 1 2 1 2 1 2  84900
## 7          8        7         1 2 2 1 2 1 1 2 100000
## 8          1        8         1 1 1 1 1 1 1 1  37000

airco <- c(as.integer(oa[,4]))
prefarea <- c(as.integer(oa[,5]))
bd1 <- c(as.integer(oa[,6]))
bd2 <- c(as.integer(oa[,7]))
gp1 <- c(as.integer(oa[,8]))
gp2 <- c(as.integer(oa[,9]))

price <- c(as.integer(oa[,11]))

oa1 <- as.data.frame(cbind(airco,prefarea,bd1,bd2,gp1,gp2,price))
oa1

##   airco prefarea bd1 bd2 gp1 gp2  price
## 1     1        1   1   2   2   2  60000
## 2     2        2   1   1   2   2  69000
## 3     1        2   2   1   1   2  85000
## 4     2        1   2   2   1   2  64000
## 5     1        2   2   2   2   1  97000
## 6     2        1   2   1   2   1  84900
## 7     2        2   1   2   1   1 100000
## 8     1        1   1   1   1   1  37000

oa1$airco <- as.factor(oa1$airco)
oa1$prefarea <- as.factor(oa1$prefarea)
oa1$bd1 <- as.factor(oa1$bd1)
oa1$bd2 <- as.factor(oa1$bd2)
oa1$gp1 <- as.factor(oa1$gp1)
oa1$gp2 <- as.factor(oa1$gp2)

summary(oa1)

##  airco prefarea bd1   bd2   gp1   gp2       price       
##  1:4   1:4      1:4   1:4   1:4   1:4   Min.   : 37000  
##  2:4   2:4      2:4   2:4   2:4   2:4   1st Qu.: 63000  
##                                         Median : 76950  
##                                         Mean   : 74613  
##                                         3rd Qu.: 88000  
##                                         Max.   :100000

model=lm(price~airco+prefarea+bd1+bd2+gp1+gp2,data=oa1)
anova(model)

## Analysis of Variance Table
## 
## Response: price
##           Df     Sum Sq    Mean Sq F value Pr(>F)
## airco      1  189151250  189151250  0.3825 0.6474
## prefarea   1 1380751250 1380751250  2.7919 0.3433
## bd1        1  526501250  526501250  1.0646 0.4900
## bd2        1  254251250  254251250  0.5141 0.6040
## gp1        1   77501250   77501250  0.1567 0.7600
## gp2        1  209101250  209101250  0.4228 0.6330
## Residuals  1  494551250  494551250

Our result is similar with the result generated in project 3. In project 3, we use fractional factorial design, and we do not find all of these effects show significance. However, when wen perform full factorial desin analysis, the factors are all statistically significant. The full factorial design is shown in model1. We can see that all the factors are statistically significant.

#We perform the analysis of variance (ANOVA) performed on the full factorial set of data.
model1= lm(price~bedrooms+garagepl+prefarea+airco, data=data)
anova(model1)

## Analysis of Variance Table
## 
## Response: price
##            Df     Sum Sq    Mean Sq F value    Pr(>F)    
## bedrooms    2 5.9217e+10 2.9609e+10  72.118 < 2.2e-16 ***
## garagepl    2 4.0097e+10 2.0049e+10  48.833 < 2.2e-16 ***
## prefarea    1 2.6403e+10 2.6403e+10  64.309 6.656e-15 ***
## airco       1 4.1597e+10 4.1597e+10 101.320 < 2.2e-16 ***
## Residuals 539 2.2129e+11 4.1055e+08                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results of the ANOVA test show that all the factors in the full factorial design have statistically significant effects on the value of house. From the results, we can conclude that the significance can be likely attribute to other than randomization. So we reject the null hypothesis that the four main factor does not have effect on the value of house.

Diagnostics/Model Adequacy Checking

In order to check the adequacy of the ANOVA test. We perform Quantile-Quantile (Q-Q) tests on the residual errors of the original data to determine if the residuals followed a normal distribution.

par(mfrow=c(2,2))
qqnorm(residuals(model))
qqline(residuals(model))
plot(fitted(model),residuals(model))
qqnorm(residuals(model1))
qqline(residuals(model1))
plot(fitted(model1),residuals(model1))

If the resonse ins normal distributed, the points should be shown in a straight dotted line. So in our study, the response may not be strictly normal distributed.

5. Conclusion

In this project, we use Taguchi Design to test the factors that may affect the value of houses. Taguchi Design can help reseachers run the experiment with lower cost and higher efficiency. However, we should also be aware that sometimes it may affect the accuracy. In our project, we do not find any significant effects. Both Taguchi designs and previously generated fractional factorial analysis have similiar findings.However, in the full factorial design, all these factors are significant. So we need to take this risk into account when we use Taguchi Design.

6. Reference

[1] Anglin, P.M. and R. Gencay (1996) “Semiparametric estimation of a hedonic price function”, Journal of Applied Econometrics, 11(6), 633-648.

[2] Cran.r Project (2015), Package ‘Ecdat’, Accessed: 11-02-2016 https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf

[3] Montgomery, D. (2013), Design and Analysis of Experiments, Wiley and Sons, 8th Edition, 752p.

[4] https://en.wikipedia.org/wiki/Replication_(statistics)

[5] https://en.wikipedia.org/wiki/Blocking_(statistics)

Design of Experiments - Project 4

Liang Zhao

December 8, 2016