Topics about properties have been analyzed in great detail by researchers in the past decades. Many factors can affect the value of housing, making the research more complex. In 100+ Interesting Data Sets for Statistics, a data set called Ecdat is introduced. It is a wealth of data sets available for R, containing gobs of econometric data.In this project, we study sales prices of houses in the City of Windsor. The data used in this recipe is found using the “100+ interesting data sets” webpage, and it is publicly available in R package named Ecdat. A summary of this package is available at https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf.
In this study, We apply Taguchi design (FFD) among Housing dataset in Ecdat. The variables of study include two 2-level factors and two 3-level factors that may influence the price. Taguchi Design was used to perform this analysis.
In the first step, the data is downloaded from Ecdat package.the Housing dataframe was imported and assigned to data dataframe:
library("Ecdat")
## Loading required package: Ecfun
##
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
##
## sign
##
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
##
## Orange
data<- Housing
The structure, heading and tail of the data frame is shown below to analyze the data innitially.
str(data)
## 'data.frame': 546 obs. of 12 variables:
## $ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
## $ lotsize : num 5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
## $ bedrooms: num 3 2 3 3 2 3 3 3 3 3 ...
## $ bathrms : num 1 1 1 1 1 1 2 1 1 2 ...
## $ stories : num 2 1 1 2 1 1 2 3 1 4 ...
## $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
## $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
## $ gashw : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ airco : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
## $ garagepl: num 1 0 0 0 0 0 2 0 0 1 ...
## $ prefarea: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
head(data)
## price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
## 1 42000 5850 3 1 2 yes no yes no
## 2 38500 4000 2 1 1 yes no no no
## 3 49500 3060 3 1 1 yes no no no
## 4 60500 6650 3 1 2 yes yes no no
## 5 61000 6360 2 1 1 yes no no no
## 6 66000 4160 3 1 1 yes yes yes no
## airco garagepl prefarea
## 1 no 1 no
## 2 no 0 no
## 3 no 0 no
## 4 no 0 no
## 5 no 0 no
## 6 yes 0 no
tail(data)
## price lotsize bedrooms bathrms stories driveway recroom fullbase
## 541 85000 6525 3 2 4 yes no no
## 542 91500 4800 3 2 4 yes yes no
## 543 94000 6000 3 2 4 yes no no
## 544 103000 6000 3 2 4 yes yes no
## 545 105000 6000 3 2 2 yes yes no
## 546 105000 6000 3 1 2 yes no no
## gashw airco garagepl prefarea
## 541 no no 1 no
## 542 no yes 0 no
## 543 no yes 0 no
## 544 no yes 1 no
## 545 no yes 1 no
## 546 no yes 1 no
In this study, we intend to perform a statistical analysis of sales prices of house in the City of Windsor. The main question that we are addressing is whether the price of house is significantly affected by the four factors we concern. The sales price of the house is our variable of interest, named price. There are many factors that may affect the price. In this study, one of the factors we select is bedrooms, and we categorize the number of bedrooms into three levels (“1”, “2”, “3”). The second variable selected was the number of bathrooms, garagepl. It is the second most important factor that determine the price of house. We also categorize the number of garage place into three levels (“1”, “2”, “3”). The third and fourth variable we choose are binary variables - prefarea and airco. prefarea indicates that the location of the house may affect the value of house, and airco indicates that the installation of air conditioner may affect the value of house. A summary of the variable is listed below.
data<-na.omit(data)
summary(data)
## price lotsize bedrooms bathrms
## Min. : 25000 Min. : 1650 Min. :1.000 Min. :1.000
## 1st Qu.: 49125 1st Qu.: 3600 1st Qu.:2.000 1st Qu.:1.000
## Median : 62000 Median : 4600 Median :3.000 Median :1.000
## Mean : 68122 Mean : 5150 Mean :2.965 Mean :1.286
## 3rd Qu.: 82000 3rd Qu.: 6360 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :190000 Max. :16200 Max. :6.000 Max. :4.000
## stories driveway recroom fullbase gashw airco
## Min. :1.000 no : 77 no :449 no :355 no :521 no :373
## 1st Qu.:1.000 yes:469 yes: 97 yes:191 yes: 25 yes:173
## Median :2.000
## Mean :1.808
## 3rd Qu.:2.000
## Max. :4.000
## garagepl prefarea
## Min. :0.0000 no :418
## 1st Qu.:0.0000 yes:128
## Median :0.0000
## Mean :0.6923
## 3rd Qu.:1.0000
## Max. :3.0000
data$bedrooms[data$bedrooms <= 2] = 1
data$bedrooms[(data$bedrooms > 2) & (data$bedrooms <= 3) & (data$bedrooms != 1)] = 2
data$bedrooms[(data$bedrooms != 2) & (data$bedrooms != 1)] = 3
data$bedrooms<-as.factor(data$bedrooms)
data$garagepl[data$garagepl == 1] = 1
data$garagepl[(data$garagepl == 2) & (data$garagepl != "1")] = 2
data$garagepl[(data$garagepl != "1") & (data$garagepl != "2")] = 3
data$garagepl<-as.factor(data$garagepl)
levels(data$prefarea)<-c(1,2)
levels(data$airco)<-c(1,2)
head(data)
## price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
## 1 42000 5850 2 1 2 yes no yes no
## 2 38500 4000 1 1 1 yes no no no
## 3 49500 3060 2 1 1 yes no no no
## 4 60500 6650 2 1 2 yes yes no no
## 5 61000 6360 1 1 1 yes no no no
## 6 66000 4160 2 1 1 yes yes yes no
## airco garagepl prefarea
## 1 1 1 1
## 2 1 3 1
## 3 1 3 1
## 4 1 3 1
## 5 1 3 1
## 6 2 3 1
str(data)
## 'data.frame': 546 obs. of 12 variables:
## $ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
## $ lotsize : num 5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
## $ bedrooms: Factor w/ 3 levels "1","2","3": 2 1 2 2 1 2 2 2 2 2 ...
## $ bathrms : num 1 1 1 1 1 1 2 1 1 2 ...
## $ stories : num 2 1 1 2 1 1 2 3 1 4 ...
## $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
## $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
## $ gashw : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ airco : Factor w/ 2 levels "1","2": 1 1 1 1 1 2 1 1 1 2 ...
## $ garagepl: Factor w/ 3 levels "1","2","3": 1 3 3 3 3 3 2 3 3 1 ...
## $ prefarea: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
The experiment seeks to observe four factors that impact the price of houses. Statistical and visual analysis can be used to assess the value of houses. We do not capture all the factors in the dataset, but it does not impact the purpose of the experiment. Our purpose is to see if the four factors we observed have impact on the value of houses.
Using both full factorial design and Taguchi Design in this experiment can allow us a better understanding of the cost saving effect of the Taguchi Design. In reality, researchers always face the problem of how to design an efficient experimental design. If we have many factors, the experimental runs for full factorial design will grow exponentially. Taguchi Design can help to solve this problem with higher efficiency.
We use the tools in R package to create an orthogonal array for a taguchi design of experiment. The design is shown as follows:
#Loading packages
library(qualityTools)
## Loading required package: Rsolnp
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:Ecdat':
##
## SP500
##
## Attaching package: 'qualityTools'
## The following object is masked from 'package:stats':
##
## sigma
library(DoE.base)
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
library(knitr)
library(tidyr)
library(kimisc)
# Taguchi Design
taguchiChoose(level1 = 2, factors1 = 6)
## 6 factors on 2 levels and 0 factors on 0 levels with 0 desired interactions to be estimated
##
## Possible Designs:
##
## L8_2 L12_2 L16_2 L32_2
##
## Use taguchiDesign("L8_2") or different to create a taguchi design object
From the design above, we can choose L8_2, L12_2, L16_2 or L32_2. In this project, we are interest in L8_2 deisgn. Firstly, it is a fraction of 26 factorial design, so that we can compare it with the results in project 3. Secondly, we can minimize the amount of runs to 8 in our experiment.
Replication, replication and blocking are important in a design of experiment. Using these technics can help to reduce the bias caused by nuisance factors and increase the precision.
Randamization Should be used in experiment in three ways,random selection, random assignment, and random execution.
In this dataset, we do not have certain information about randomization. However, the dataset should meet the assumptions required for truly random design. Also, with randomly sub-setting and ordering the data in our analysis, we can further make sure that we meet the requirement of randomization. In our experiment, randomization is applied in the taghchiDesin().
“The repeated measures design (also known as a within-subjects design) uses the same subjects with every condition of the research, including the control, while replication reflects sources of variability both between runs and (potentially) within runs.”[4]
Repeated measures involves measuring the same cases multiple times, and Replication involves running the same study on different subjects but identical conditions.In this dataset, we do not have any replication or repeated measures.
“In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups (blocks) that are similar to one another.”[5]
The nuisance factor may have effect on the experiment but it is not the main facor that interests the researcher. In this study, blocking is not included in the experiment because we does not find the nuisance factors that may affect the results in the dataset.
In the dataset, we take continuous variable - price as our response variable. price is the sale price of each house.
The data set we use in this project is a cross-section data from 1987. There are 546 observations in the dataset. from the summary of the statistics, the prices of house range from $25000 to $190000, and the mean of the price is $68120. The price distribution is shown in the histogram below.
hist(data$price, main= "Housing Price", xlab="Price")
Plotting the histograms of all four input variables:
par(mfrow=c(2,2))
barplot(table(data$bedrooms), xlab="Bedrooms", ylab="Frequency", main="Histogram of Bedrooms")
barplot(table(data$garagepl), xlab="Garage Place", ylab="Frequency", main="Histogram of Garage Place")
barplot(table(data$prefarea), xlab="Preferred Area", ylab="Frequency", main="Histogram of Preferred Area")
barplot(table(data$airco), xlab="Air-Conditions", ylab="Frequency", main="Histogram of Air-Conditions")
The boxplots are shown below to indicate the price differences among different groups.
#bloxplots
par(mfrow=c(2,2))
boxplot(data$price~data$bedrooms, xlab="No. of Bedrooms", ylab="House Prices")
means1 <- by(data$price,data$bedrooms, mean)
points(1:3, means1,pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(data$price~data$garagepl, xlab="No. of Garage Places", ylab="House Prices")
means2 <- by(data$price,data$garagepl, mean)
points(1:3, means2,pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means2,labels = format(means2, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(data$price~data$prefarea, xlab="Preferred Community", ylab="House Prices")
means3 <- by(data$price,data$prefarea, mean)
points(1:2, means3,pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means3,labels = format(means3, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(data$price~data$airco, xlab="Air Conditioner", ylab="House Prices")
means4 <- by(data$price,data$airco, mean)
points(1:2, means4,pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means4,labels = format(means4, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
plan<-taguchiDesign("L8_2")
## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated
## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated
k <- as.data.frame(plan)
A <- c(as.integer(k[,4]))
B <- c(as.integer(k[,5]))
C <- c(as.integer(k[,6]))
D <- c(as.integer(k[,7]))
CD <- C*D
E <- c(as.integer(k[,8]))
F <- c(as.integer(k[,9]))
EF <- E*F
p <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(k)
for (i in 1:n){
if (p$A[i] == 1){p$airco[i] <- "1"}
if (p$A[i] == 2){p$airco[i] <- "2"}
if (p$B[i] == 1){p$prefarea[i] <- "1"}
if (p$B[i] == 2){p$prefarea[i] <- "2"}
if (p$CD[i] == 1){p$bedrooms[i] <- "1"}
if (p$CD[i] == 2){p$bedrooms[i] <- "2"}
if (p$CD[i] == 4){p$bedrooms[i] <- "3"}
if (p$EF[i] == 1){p$garagepl[i] <- "1"}
if (p$EF[i] == 2){p$garagepl[i] <- "2"}
if (p$EF[i] == 4){p$garagepl[i] <- "3"}
}
pp <- cbind(k,p$airco,p$prefarea,p$bedrooms,p$garagepl)
summary(pp)
## StandOrder RunOrder Replicate A B
## Min. :1.00 Min. :1.00 Min. :1 Min. :1.0 Min. :1.0
## 1st Qu.:2.75 1st Qu.:2.75 1st Qu.:1 1st Qu.:1.0 1st Qu.:1.0
## Median :4.50 Median :4.50 Median :1 Median :1.5 Median :1.5
## Mean :4.50 Mean :4.50 Mean :1 Mean :1.5 Mean :1.5
## 3rd Qu.:6.25 3rd Qu.:6.25 3rd Qu.:1 3rd Qu.:2.0 3rd Qu.:2.0
## Max. :8.00 Max. :8.00 Max. :1 Max. :2.0 Max. :2.0
## C D E F G
## Min. :1.0 Min. :1.0 Min. :1.0 Min. :1.0 Min. :1.0
## 1st Qu.:1.0 1st Qu.:1.0 1st Qu.:1.0 1st Qu.:1.0 1st Qu.:1.0
## Median :1.5 Median :1.5 Median :1.5 Median :1.5 Median :1.5
## Mean :1.5 Mean :1.5 Mean :1.5 Mean :1.5 Mean :1.5
## 3rd Qu.:2.0 3rd Qu.:2.0 3rd Qu.:2.0 3rd Qu.:2.0 3rd Qu.:2.0
## Max. :2.0 Max. :2.0 Max. :2.0 Max. :2.0 Max. :2.0
## y p$airco p$prefarea p$bedrooms p$garagepl
## Mode:logical 1:4 1:4 1:2 1:2
## NA's:8 2:4 2:4 2:4 2:4
## 3:2 3:2
##
##
##
pp2<-unite(pp, refcol , c(12,13,14,15), remove=FALSE)
kable(pp2, align = 'c')
| StandOrder | RunOrder | Replicate | A | B | C | D | E | F | G | y | refcol | p$airco | p$prefarea | p$bedrooms | p$garagepl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | NA | 1_1_2_3 | 1 | 1 | 2 | 3 |
| 7 | 2 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | NA | 2_2_1_3 | 2 | 2 | 1 | 3 |
| 3 | 3 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | NA | 1_2_2_2 | 1 | 2 | 2 | 2 |
| 6 | 4 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | NA | 2_1_3_2 | 2 | 1 | 3 | 2 |
| 4 | 5 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | NA | 1_2_3_2 | 1 | 2 | 3 | 2 |
| 5 | 6 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | NA | 2_1_2_2 | 2 | 1 | 2 | 2 |
| 8 | 7 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | NA | 2_2_2_1 | 2 | 2 | 2 | 1 |
| 1 | 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NA | 1_1_1_1 | 1 | 1 | 1 | 1 |
p.refcol <- c(pp2$refcol)
data1<-unite(data, refcol , c(10,12,3,11), remove=FALSE)
head(data1)
## price lotsize refcol bedrooms bathrms stories driveway recroom fullbase
## 1 42000 5850 1_1_2_1 2 1 2 yes no yes
## 2 38500 4000 1_1_1_3 1 1 1 yes no no
## 3 49500 3060 1_1_2_3 2 1 1 yes no no
## 4 60500 6650 1_1_2_3 2 1 2 yes yes no
## 5 61000 6360 1_1_1_3 1 1 1 yes no no
## 6 66000 4160 2_1_2_3 2 1 1 yes yes yes
## gashw airco garagepl prefarea
## 1 no 1 1 1
## 2 no 1 3 1
## 3 no 1 3 1
## 4 no 1 3 1
## 5 no 1 3 1
## 6 no 2 3 1
tail(data1)
## price lotsize refcol bedrooms bathrms stories driveway recroom
## 541 85000 6525 1_1_2_1 2 2 4 yes no
## 542 91500 4800 2_1_2_3 2 2 4 yes yes
## 543 94000 6000 2_1_2_3 2 2 4 yes no
## 544 103000 6000 2_1_2_1 2 2 4 yes yes
## 545 105000 6000 2_1_2_1 2 2 2 yes yes
## 546 105000 6000 2_1_2_1 2 1 2 yes no
## fullbase gashw airco garagepl prefarea
## 541 no no 1 1 1
## 542 no no 2 3 1
## 543 no no 2 3 1
## 544 no no 2 1 1
## 545 no no 2 1 1
## 546 no no 2 1 1
for (j in 1:546){if (pp2$refcol[1] == data1$refcol[j]){data1$RO1[j] <- data1$price[j]}else{data1$RO1[j] <- 0}}
for (j in 1:546){if (pp2$refcol[2] == data1$refcol[j]){data1$RO2[j] <- data1$price[j]}else{data1$RO2[j] <- 0}}
for (j in 1:546){if (pp2$refcol[3] == data1$refcol[j]){data1$RO3[j] <- data1$price[j]}else{data1$RO3[j] <- 0}}
for (j in 1:546){if (pp2$refcol[4] == data1$refcol[j]){data1$RO4[j] <- data1$price[j]}else{data1$RO4[j] <- 0}}
for (j in 1:546){if (pp2$refcol[5] == data1$refcol[j]){data1$RO5[j] <- data1$price[j]}else{data1$RO5[j] <- 0}}
for (j in 1:546){if (pp2$refcol[6] == data1$refcol[j]){data1$RO6[j] <- data1$price[j]}else{data1$RO6[j] <- 0}}
for (j in 1:546){if (pp2$refcol[7] == data1$refcol[j]){data1$RO7[j] <- data1$price[j]}else{data1$RO7[j] <- 0}}
for (j in 1:546){if (pp2$refcol[8] == data1$refcol[j]){data1$RO8[j] <- data1$price[j]}else{data1$RO8[j] <- 0}}
run1 <- sample.rows(subset(data1, RO1 > 0), 1)
run2 <- sample.rows(subset(data1, RO2 > 0), 1)
run3 <- sample.rows(subset(data1, RO3 > 0), 1)
run4 <- sample.rows(subset(data1, RO4 > 0), 1)
run5 <- sample.rows(subset(data1, RO5 > 0), 1)
run6 <- sample.rows(subset(data1, RO6 > 0), 1)
run7 <- sample.rows(subset(data1, RO7 > 0), 1)
run8 <- sample.rows(subset(data1, RO8 > 0), 1)
RV <- c(run1$RO1,run2$RO2,run3$RO3,run4$RO4,run5$RO5,run6$RO6,run7$RO7,run8$RO8)
barplot(RV, names.arg = c("ER1","ER2","ER3","ER4","ER5","ER6","ER7","ER8"),xlab = "Exprimental Runs",ylab = "Observations")
response(plan) = RV
summary(plan)
## Taguchi SINGLE Design
## Information about the factors:
##
## A B C D E F G
## value 1 1 1 1 1 1 1 1
## value 2 2 2 2 2 2 2 2
## name
## unit
## type numeric numeric numeric numeric numeric numeric numeric
##
## -----------
##
## StandOrder RunOrder Replicate A B C D E F G RV
## 1 2 1 1 1 1 1 2 2 2 2 60000
## 2 7 2 1 2 2 1 1 2 2 1 69000
## 3 3 3 1 1 2 2 1 1 2 2 85000
## 4 6 4 1 2 1 2 2 1 2 1 64000
## 5 4 5 1 1 2 2 2 2 1 1 97000
## 6 5 6 1 2 1 2 1 2 1 2 84900
## 7 8 7 1 2 2 1 2 1 1 2 100000
## 8 1 8 1 1 1 1 1 1 1 1 37000
##
## -----------
effectPlot(plan, col=4,lty=3, main="Taguchi Design")
oa <- as.data.frame(plan)
oa
## StandOrder RunOrder Replicate A B C D E F G RV
## 1 2 1 1 1 1 1 2 2 2 2 60000
## 2 7 2 1 2 2 1 1 2 2 1 69000
## 3 3 3 1 1 2 2 1 1 2 2 85000
## 4 6 4 1 2 1 2 2 1 2 1 64000
## 5 4 5 1 1 2 2 2 2 1 1 97000
## 6 5 6 1 2 1 2 1 2 1 2 84900
## 7 8 7 1 2 2 1 2 1 1 2 100000
## 8 1 8 1 1 1 1 1 1 1 1 37000
airco <- c(as.integer(oa[,4]))
prefarea <- c(as.integer(oa[,5]))
bd1 <- c(as.integer(oa[,6]))
bd2 <- c(as.integer(oa[,7]))
gp1 <- c(as.integer(oa[,8]))
gp2 <- c(as.integer(oa[,9]))
price <- c(as.integer(oa[,11]))
oa1 <- as.data.frame(cbind(airco,prefarea,bd1,bd2,gp1,gp2,price))
oa1
## airco prefarea bd1 bd2 gp1 gp2 price
## 1 1 1 1 2 2 2 60000
## 2 2 2 1 1 2 2 69000
## 3 1 2 2 1 1 2 85000
## 4 2 1 2 2 1 2 64000
## 5 1 2 2 2 2 1 97000
## 6 2 1 2 1 2 1 84900
## 7 2 2 1 2 1 1 100000
## 8 1 1 1 1 1 1 37000
oa1$airco <- as.factor(oa1$airco)
oa1$prefarea <- as.factor(oa1$prefarea)
oa1$bd1 <- as.factor(oa1$bd1)
oa1$bd2 <- as.factor(oa1$bd2)
oa1$gp1 <- as.factor(oa1$gp1)
oa1$gp2 <- as.factor(oa1$gp2)
summary(oa1)
## airco prefarea bd1 bd2 gp1 gp2 price
## 1:4 1:4 1:4 1:4 1:4 1:4 Min. : 37000
## 2:4 2:4 2:4 2:4 2:4 2:4 1st Qu.: 63000
## Median : 76950
## Mean : 74613
## 3rd Qu.: 88000
## Max. :100000
model=lm(price~airco+prefarea+bd1+bd2+gp1+gp2,data=oa1)
anova(model)
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## airco 1 189151250 189151250 0.3825 0.6474
## prefarea 1 1380751250 1380751250 2.7919 0.3433
## bd1 1 526501250 526501250 1.0646 0.4900
## bd2 1 254251250 254251250 0.5141 0.6040
## gp1 1 77501250 77501250 0.1567 0.7600
## gp2 1 209101250 209101250 0.4228 0.6330
## Residuals 1 494551250 494551250
Our result is similar with the result generated in project 3. In project 3, we use fractional factorial design, and we do not find all of these effects show significance. However, when wen perform full factorial desin analysis, the factors are all statistically significant. The full factorial design is shown in model1. We can see that all the factors are statistically significant.
#We perform the analysis of variance (ANOVA) performed on the full factorial set of data.
model1= lm(price~bedrooms+garagepl+prefarea+airco, data=data)
anova(model1)
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## bedrooms 2 5.9217e+10 2.9609e+10 72.118 < 2.2e-16 ***
## garagepl 2 4.0097e+10 2.0049e+10 48.833 < 2.2e-16 ***
## prefarea 1 2.6403e+10 2.6403e+10 64.309 6.656e-15 ***
## airco 1 4.1597e+10 4.1597e+10 101.320 < 2.2e-16 ***
## Residuals 539 2.2129e+11 4.1055e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results of the ANOVA test show that all the factors in the full factorial design have statistically significant effects on the value of house. From the results, we can conclude that the significance can be likely attribute to other than randomization. So we reject the null hypothesis that the four main factor does not have effect on the value of house.
In order to check the adequacy of the ANOVA test. We perform Quantile-Quantile (Q-Q) tests on the residual errors of the original data to determine if the residuals followed a normal distribution.
par(mfrow=c(2,2))
qqnorm(residuals(model))
qqline(residuals(model))
plot(fitted(model),residuals(model))
qqnorm(residuals(model1))
qqline(residuals(model1))
plot(fitted(model1),residuals(model1))
If the resonse ins normal distributed, the points should be shown in a straight dotted line. So in our study, the response may not be strictly normal distributed.
In this project, we use Taguchi Design to test the factors that may affect the value of houses. Taguchi Design can help reseachers run the experiment with lower cost and higher efficiency. However, we should also be aware that sometimes it may affect the accuracy. In our project, we do not find any significant effects. Both Taguchi designs and previously generated fractional factorial analysis have similiar findings.However, in the full factorial design, all these factors are significant. So we need to take this risk into account when we use Taguchi Design.
[1] Anglin, P.M. and R. Gencay (1996) “Semiparametric estimation of a hedonic price function”, Journal of Applied Econometrics, 11(6), 633-648.
[2] Cran.r Project (2015), Package ‘Ecdat’, Accessed: 11-02-2016 https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf
[3] Montgomery, D. (2013), Design and Analysis of Experiments, Wiley and Sons, 8th Edition, 752p.