The Housing dataset records the sale prices of houses in Windsor. It is a dataframe consisting of a cross-section from 1987 with 546 observations. The factors we will focus on are listed below:
Two-Level Independent Variable: driveway - Does the house have a driveway?
Two-Level Independent Variable: fullbase - Does the house have a full finished basement?
Three-Level Independent Variable: stories - Number of stories, excluding the basement
Three-Level Independent Variable: bathrms - Number of full bathrooms
This data is from the Ecdat package. I downloaded this package along with Ecfun.
library("Ecdat")
## Loading required package: Ecfun
##
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
##
## sign
##
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
##
## Orange
data("Housing")
Below are the first 6 rows of the data:
head(Housing)
## price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
## 1 42000 5850 3 1 2 yes no yes no
## 2 38500 4000 2 1 1 yes no no no
## 3 49500 3060 3 1 1 yes no no no
## 4 60500 6650 3 1 2 yes yes no no
## 5 61000 6360 2 1 1 yes no no no
## 6 66000 4160 3 1 1 yes yes yes no
## airco garagepl prefarea
## 1 no 1 no
## 2 no 0 no
## 3 no 0 no
## 4 no 0 no
## 5 no 0 no
## 6 yes 0 no
Below is the summary of the Housing dataframe:
summary(Housing)
## price lotsize bedrooms bathrms
## Min. : 25000 Min. : 1650 Min. :1.000 Min. :1.000
## 1st Qu.: 49125 1st Qu.: 3600 1st Qu.:2.000 1st Qu.:1.000
## Median : 62000 Median : 4600 Median :3.000 Median :1.000
## Mean : 68122 Mean : 5150 Mean :2.965 Mean :1.286
## 3rd Qu.: 82000 3rd Qu.: 6360 3rd Qu.:3.000 3rd Qu.:2.000
## Max. :190000 Max. :16200 Max. :6.000 Max. :4.000
## stories driveway recroom fullbase gashw airco
## Min. :1.000 no : 77 no :449 no :355 no :521 no :373
## 1st Qu.:1.000 yes:469 yes: 97 yes:191 yes: 25 yes:173
## Median :2.000
## Mean :1.808
## 3rd Qu.:2.000
## Max. :4.000
## garagepl prefarea
## Min. :0.0000 no :418
## 1st Qu.:0.0000 yes:128
## Median :0.0000
## Mean :0.6923
## 3rd Qu.:1.0000
## Max. :3.0000
For this project we needed 2 two-level factors and 2 three-level factors. As you can see below, the two “three-level” factors that I chose originally had four levels each.
str(Housing)
## 'data.frame': 546 obs. of 12 variables:
## $ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
## $ lotsize : num 5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
## $ bedrooms: num 3 2 3 3 2 3 3 3 3 3 ...
## $ bathrms : num 1 1 1 1 1 1 2 1 1 2 ...
## $ stories : num 2 1 1 2 1 1 2 3 1 4 ...
## $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
## $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
## $ gashw : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ airco : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
## $ garagepl: num 1 0 0 0 0 0 2 0 0 1 ...
## $ prefarea: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
Housing$bathrms <- as.factor(Housing$bathrms)
Housing$stories <- as.factor(Housing$stories)
levels(Housing$bathrms)
## [1] "1" "2" "3" "4"
levels(Housing$stories)
## [1] "1" "2" "3" "4"
In order to get a dataframe that would fulfill the requirements of this project, I created two subsets. First, I removed all the observations with 4 stories and then all the observations with 4 bathrooms, to finally end up with the dataframe I will use: Housing3.
Housing2 = subset(Housing, stories != "4")
Housing3 = subset(Housing2, bathrms != "4")
Now when we set these variables to factors and check the levels, we should see the updated number of levels. Because subset() keeps unused factor levels, R still reports the same four levels as before, so it is important to use the “droplevels” function.
Housing3$bathrms <- as.factor(Housing3$bathrms)
Housing3$stories <- as.factor(Housing3$stories)
levels(droplevels(Housing3$bathrms))
## [1] "1" "2" "3"
levels(droplevels(Housing3$stories))
## [1] "1" "2" "3"
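Note that the calls above only print the reduced set of levels; they do not change Housing3 itself. A minimal sketch of how the unused levels could be dropped permanently is given here, using a small made-up data frame (`toy`, `toy3`, and their values are illustrative assumptions, not the Housing data):

```r
# Toy stand-in for Housing: 'stories' is a factor with four levels
toy <- data.frame(
  price   = c(42000, 38500, 49500, 88500),
  stories = factor(c(2, 1, 3, 4))
)

# Subsetting removes the row but keeps the unused level "4"
toy3 <- subset(toy, stories != "4")
levels(toy3$stories)

# Assigning the result of droplevels() makes the change permanent
toy3 <- droplevels(toy3)
levels(toy3$stories)
```

Applied to the report, the same idea would amount to something like `Housing3 <- droplevels(Housing3)`, after which later calls would not need to wrap Housing3 in droplevels() again.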
There are two continuous variables in this dataframe: the sale price of the house in dollars, price, and the lot size of the property in square feet, lotsize.
The response variable for this dataframe is the sale price of the house measured in dollars, price.
The dataframe is now condensed so that it contains only the response variable, price, and the independent variables bathrms, stories, driveway, and fullbase.
Housing3$lotsize <- NULL
Housing3$bedrooms <- NULL
Housing3$recroom <- NULL
Housing3$gashw <- NULL
Housing3$airco <- NULL
Housing3$garagepl <- NULL
Housing3$prefarea <- NULL
head(droplevels(Housing3))
## price bathrms stories driveway fullbase
## 1 42000 1 2 yes yes
## 2 38500 1 1 yes no
## 3 49500 1 1 yes no
## 4 60500 1 2 yes no
## 5 61000 1 1 yes no
## 6 66000 1 1 yes yes
summary(droplevels(Housing3))
## price bathrms stories driveway fullbase
## Min. : 25000 1:390 1:227 no : 77 no :317
## 1st Qu.: 48000 2:106 2:238 yes:428 yes:188
## Median : 60000 3: 9 3: 40
## Mean : 65292
## 3rd Qu.: 77500
## Max. :190000
str(droplevels(Housing3))
## 'data.frame': 505 obs. of 5 variables:
## $ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 90000 ...
## $ bathrms : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 2 1 1 2 ...
## $ stories : Factor w/ 3 levels "1","2","3": 2 1 1 2 1 1 2 3 1 1 ...
## $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 2 ...
In order to use these factors in a factorial design, we need to define the factors with numeric low and high levels. The 2 three-level factors are already numeric (the code below only shifts them to 0, 1, 2), but the 2 two-level factors must be recoded so that yes = 1 and no = 0.
# Find length of frame
l = nrow(Housing3)
# Create four new frames for writing numbers instead of factors
drivewaynum = data.frame(l)
fullbasenum = data.frame(l)
storiesnum = data.frame(l)
bathrmsnum = data.frame(l)
## Replacement loop
for (i in 1:l){
# driveway replace yes with 1 and no with 0
if (Housing3$driveway[i] == "yes"){
drivewaynum[i,1] <- 1
} else {
drivewaynum[i,1] <- 0
}
# fullbase replace yes with 1 and no with 0
if (Housing3$fullbase[i] == "yes"){
fullbasenum[i,1] <- 1
} else {
fullbasenum[i,1] <- 0
}
#stories 1 = 0, 2 = 1, 3 = 2
if (Housing3$stories[i] == "1"){
storiesnum[i,1] <- 0
}
if (Housing3$stories[i] == "2") {
storiesnum[i,1] <- 1
}
if (Housing3$stories[i] == "3") {
storiesnum[i,1] <- 2
}
#bathrms 1 = 0, 2 = 1, 3 = 2
if (Housing3$bathrms[i] == "1"){
bathrmsnum[i,1] <- 0
}
if (Housing3$bathrms[i] == "2") {
bathrmsnum[i,1] <- 1
}
if (Housing3$bathrms[i] == "3") {
bathrmsnum[i,1] <- 2
}
}
# Create numbered data frame out of column vectors and response variable
Housing4 <- cbind(drivewaynum,fullbasenum,storiesnum,bathrmsnum,Housing3$price)
# Title columns of data frame appropriately
colnames(Housing4) <- c("driveway","full basement","stories","bathrooms","price")
head(Housing4)
## driveway full basement stories bathrooms price
## 1 1 1 1 0 42000
## 2 1 0 0 0 38500
## 3 1 0 0 0 49500
## 4 1 0 1 0 60500
## 5 1 0 0 0 61000
## 6 1 1 0 0 66000
str(Housing4)
## 'data.frame': 505 obs. of 5 variables:
## $ driveway : num 1 1 1 1 1 1 1 1 1 1 ...
## $ full basement: num 1 0 0 0 0 1 1 0 1 1 ...
## $ stories : num 1 0 0 1 0 0 1 2 0 0 ...
## $ bathrooms : num 0 0 0 0 0 0 1 0 0 1 ...
## $ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 90000 ...
summary(Housing4)
## driveway full basement stories bathrooms
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.8475 Mean :0.3723 Mean :0.6297 Mean :0.2455
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :2.0000 Max. :2.0000
## price
## Min. : 25000
## 1st Qu.: 48000
## Median : 60000
## Mean : 65292
## 3rd Qu.: 77500
## Max. :190000
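The replacement loop above can also be written without a loop, because comparisons and type conversions in R work on whole columns at once. The following is a minimal sketch on a made-up data frame (`toy` and `toy_num` are hypothetical names, and the values are not taken from Housing3):

```r
# Toy stand-in for Housing3 (values are made up for illustration)
toy <- data.frame(
  price    = c(42000, 38500, 49500, 60500),
  driveway = factor(c("yes", "no", "yes", "yes")),
  fullbase = factor(c("yes", "no", "no", "no")),
  stories  = factor(c(2, 1, 3, 2)),
  bathrms  = factor(c(1, 1, 2, 3))
)

# Vectorized recoding: yes/no become 1/0, and levels 1, 2, 3 become 0, 1, 2
toy_num <- data.frame(
  driveway  = as.numeric(toy$driveway == "yes"),
  fullbase  = as.numeric(toy$fullbase == "yes"),
  # as.character() first: as.numeric() on a factor returns level codes, not labels
  stories   = as.numeric(as.character(toy$stories)) - 1,
  bathrooms = as.numeric(as.character(toy$bathrms)) - 1,
  price     = toy$price
)
toy_num
```

This sketch names the basement column `fullbase` rather than `full basement`, which avoids the need for backticks or quotes when the column is referenced later.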
The design of this experiment is a 2^(6-3) fractional factorial design. We represent each of the three-level factors as 2 two-level factors, which gives 6 two-level factors in total. The fractional factorial design is generated at the lowest possible resolution (resolution III). Using the main effects and interaction effects, we then fit a model and use ANOVA to arrive at the final model.
The reason one would use a fractional factorial design is not much different from that of any experimental design: to understand effects and relationships in an experimental setting. The difference with this design is that it is more economical than a full design. For fractional factorials it is important to know the resolution of the design, because lower-resolution designs are used as screening designs, while higher-resolution designs can be used to estimate a full second-order model.
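The economy of the fractional design comes down to simple run-count arithmetic. As a rough sketch in base R (the names `k`, `p`, `full`, `n_full`, and `n_frac` are chosen here for illustration), six two-level factors give 2^6 runs in a full factorial, while a fraction with three generators needs only 2^(6-3):

```r
k <- 6  # number of two-level factors
p <- 3  # number of generators in the fraction

# Full factorial: every combination of -1/+1 across the six factors
full <- expand.grid(rep(list(c(-1, 1)), k))
names(full) <- LETTERS[1:k]

n_full <- nrow(full)   # 2^6 runs
n_frac <- 2^(k - p)    # 2^(6-3) runs
c(full = n_full, fraction = n_frac)
```

Each row of `full` is a distinct treatment combination, which is the property that defines a full factorial.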
Below you can see the summary of our data again, as well as boxplots of the response variable, price against the four IVs:
summary(Housing4)
## driveway full basement stories bathrooms
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.8475 Mean :0.3723 Mean :0.6297 Mean :0.2455
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :2.0000 Max. :2.0000
## price
## Min. : 25000
## 1st Qu.: 48000
## Median : 60000
## Mean : 65292
## 3rd Qu.: 77500
## Max. :190000
boxplot(Housing4$price ~ Housing4$`full basement` + Housing4$driveway + Housing4$stories + Housing4$bathrooms, xlab= "fullbasement + driveway + stories + bathrooms",ylab="price of house", main="Boxplot of Price by the four IVs")
We will test this experiment using the following hypotheses:
Null: The different levels of the IVs do not significantly affect the response variable.
Alternate: The different levels of the IVs do significantly affect the response variable.
Full Factorial design would have 64 runs:
| Run | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 1 | -1 | 1 | 1 | 1 | 1 | -1 |
| 2 | 1 | -1 | 1 | 1 | -1 | -1 |
| 3 | 1 | 1 | 1 | 1 | -1 | -1 |
| 4 | 1 | -1 | -1 | -1 | -1 | 1 |
| 5 | -1 | -1 | 1 | -1 | -1 | 1 |
| 6 | -1 | 1 | 1 | -1 | -1 | -1 |
| 7 | -1 | 1 | 1 | 1 | 1 | 1 |
| 8 | -1 | -1 | -1 | -1 | -1 | -1 |
| 9 | 1 | 1 | -1 | 1 | -1 | -1 |
| 10 | -1 | 1 | 1 | 1 | -1 | 1 |
| 11 | -1 | -1 | 1 | -1 | 1 | -1 |
| 12 | -1 | 1 | -1 | 1 | 1 | 1 |
| 13 | 1 | 1 | -1 | 1 | -1 | 1 |
| 14 | -1 | 1 | -1 | -1 | -1 | -1 |
| 15 | -1 | -1 | -1 | -1 | -1 | 1 |
| 16 | 1 | -1 | -1 | 1 | 1 | -1 |
| 17 | -1 | 1 | 1 | -1 | 1 | -1 |
| 18 | 1 | -1 | 1 | 1 | 1 | -1 |
| 19 | -1 | 1 | -1 | -1 | 1 | 1 |
| 20 | 1 | 1 | 1 | -1 | -1 | -1 |
| 21 | 1 | -1 | -1 | -1 | 1 | 1 |
| 22 | -1 | -1 | 1 | -1 | -1 | -1 |
| 23 | -1 | -1 | 1 | 1 | -1 | 1 |
| 24 | 1 | -1 | 1 | -1 | -1 | 1 |
| 25 | 1 | -1 | -1 | 1 | -1 | 1 |
| 26 | 1 | -1 | 1 | -1 | -1 | -1 |
| 27 | -1 | -1 | 1 | 1 | 1 | 1 |
| 28 | -1 | -1 | -1 | 1 | 1 | 1 |
| 29 | 1 | 1 | 1 | 1 | 1 | 1 |
| 30 | 1 | -1 | 1 | -1 | 1 | -1 |
| 31 | -1 | -1 | -1 | -1 | 1 | 1 |
| 32 | -1 | 1 | 1 | 1 | -1 | -1 |
| 33 | -1 | -1 | 1 | 1 | -1 | -1 |
| 34 | 1 | -1 | -1 | -1 | -1 | -1 |
| 35 | 1 | 1 | 1 | -1 | 1 | -1 |
| 36 | 1 | 1 | 1 | 1 | -1 | 1 |
| 37 | 1 | -1 | -1 | 1 | -1 | -1 |
| 38 | -1 | 1 | -1 | -1 | -1 | 1 |
| 39 | 1 | 1 | 1 | -1 | -1 | 1 |
| 40 | -1 | 1 | -1 | 1 | -1 | -1 |
| 41 | 1 | -1 | 1 | 1 | 1 | 1 |
| 42 | 1 | 1 | -1 | -1 | 1 | 1 |
| 43 | 1 | -1 | -1 | -1 | 1 | -1 |
| 44 | 1 | 1 | -1 | -1 | -1 | 1 |
| 45 | 1 | 1 | 1 | 1 | 1 | -1 |
| 46 | -1 | 1 | 1 | -1 | -1 | 1 |
| 47 | -1 | 1 | 1 | -1 | 1 | 1 |
| 48 | -1 | -1 | -1 | 1 | -1 | -1 |
| 49 | 1 | 1 | -1 | -1 | -1 | -1 |
| 50 | 1 | 1 | -1 | 1 | 1 | 1 |
| 51 | -1 | -1 | 1 | 1 | 1 | -1 |
| 52 | -1 | -1 | -1 | 1 | 1 | -1 |
| 53 | 1 | -1 | -1 | 1 | 1 | 1 |
| 54 | -1 | -1 | 1 | -1 | 1 | 1 |
| 55 | 1 | 1 | -1 | -1 | 1 | -1 |
| 56 | 1 | 1 | 1 | -1 | 1 | 1 |
| 57 | 1 | -1 | 1 | 1 | -1 | 1 |
| 58 | -1 | 1 | -1 | 1 | -1 | 1 |
| 59 | -1 | 1 | -1 | -1 | 1 | -1 |
| 60 | 1 | -1 | 1 | -1 | 1 | 1 |
| 61 | -1 | 1 | -1 | 1 | 1 | -1 |
| 62 | -1 | -1 | -1 | 1 | -1 | 1 |
| 63 | -1 | -1 | -1 | -1 | 1 | -1 |
| 64 | 1 | 1 | 1 | 1 | 1 | -1 |
library("FrF2")
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
#f = FrF2(64,6)
On the other hand, the fractional factorial design would have only 8 runs and can be seen below:
b = FrF2(8,6,res3 = T)
print(b)
## A B C D E F
## 1 1 -1 1 -1 1 -1
## 2 -1 1 -1 -1 1 -1
## 3 -1 -1 1 1 -1 -1
## 4 1 1 1 1 1 1
## 5 -1 1 1 -1 -1 1
## 6 -1 -1 -1 1 1 1
## 7 1 1 -1 1 -1 -1
## 8 1 -1 -1 -1 -1 1
## class=design, type= FrF2
Below you can see which factors are confounded (the aliasing structure). Because this is a resolution III design, the main effects are confounded with 2-factor interactions, and some 2-factor interactions are confounded with each other.
aliasprint(b)
## $legend
## [1] A=A B=B C=C D=D E=E F=F
##
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
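The alias table above is consistent with the generators D = AB, E = AC, and F = BC. Assuming those generators (they are read off the output, not taken from the FrF2 call), the confounding can be checked by hand in base R: two aliased effects have identical contrast columns, so the data cannot tell them apart. A minimal sketch, with `base` and `design` as illustrative names:

```r
# Start from a full factorial in the three base factors A, B, C
base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Assumed generators, read off the alias table: D = AB, E = AC, F = BC
design <- transform(base, D = A * B, E = A * C, F = B * C)

# Aliased effects have identical contrast columns, so they cannot be separated
a_eq_bd  <- all(design$A == design$B * design$D)              # A = BD
a_eq_ce  <- all(design$A == design$C * design$E)              # A = CE
af_eq_be <- all(design$A * design$F == design$B * design$E)   # AF = BE
af_eq_cd <- all(design$A * design$F == design$C * design$D)   # AF = CD
c(a_eq_bd, a_eq_ce, af_eq_be, af_eq_cd)
```

The run order here differs from the randomized order printed by FrF2, but the set of eight treatment combinations follows the same pattern.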
Below are some boxplots of the response variable against the four IVs:
boxplot(Housing4$price~Housing4$driveway, main="Effect of Driveway on House Price", ylab="House Price", xlab="1 = Driveway, 0 = No Driveway")
There is a noticeable difference above in the median house price between factor levels.
boxplot(Housing4$price~Housing4$`full basement`, main="Effect of Full Basement on House Price", ylab="House Price", xlab="1 = Full Basement, 0 = No Full Basement")
There is a noticeable increase in the median price of a house with a full basement versus a house without a full basement.
boxplot(Housing4$price~Housing4$stories, main="Effect of Number of Stories on House Price", ylab="House Price", xlab="0 = One Story, 1 = Two Stories, 2 = Three Stories")
The difference in prices between one and two stories is slight, but noticeable. The increase in house price when the number of stories increases to three is quite large.
boxplot(Housing4$price~Housing4$bathrooms, main="Effect of Number of Bathrooms on House Price", ylab="House Price", xlab="0 = One Bathroom, 1 = Two Bathrooms, 2 = Three Bathrooms")
It is clear that the median house price increases as the number of bathrooms increases.
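The differences read off the boxplots can also be put into numbers, since the centre line of each box is the group median. A small sketch using tapply() on made-up values (`toy` and `med` are hypothetical names, not the Housing4 data):

```r
# Toy stand-in for Housing4 (made-up values, for illustration only)
toy <- data.frame(
  price    = c(30000, 40000, 50000, 60000, 70000, 90000),
  driveway = c(0, 0, 1, 1, 1, 1)
)

# Median price within each factor level: the centre line of each box
med <- tapply(toy$price, toy$driveway, median)
med

# Difference between the two group medians
unname(med["1"] - med["0"])
```

On the real data, the same call with Housing4$price and any of the four IVs would give the medians behind each plot.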
I will be using a linear model and an ANOVA in order to estimate the main and interaction effects.
model1 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms)^2)
anova(model1)
## Analysis of Variance Table
##
## Response: Housing4$price
## Df Sum Sq Mean Sq
## Housing4$driveway 1 2.5447e+10 2.5447e+10
## Housing4$"full basement" 1 2.2698e+10 2.2698e+10
## Housing4$stories 1 2.0159e+10 2.0159e+10
## Housing4$bathrooms 1 4.0875e+10 4.0875e+10
## Housing4$driveway:Housing4$"full basement" 1 6.2472e+08 6.2472e+08
## Housing4$driveway:Housing4$stories 1 2.5928e+08 2.5928e+08
## Housing4$driveway:Housing4$bathrooms 1 3.4304e+09 3.4304e+09
## Housing4$"full basement":Housing4$stories 1 9.9197e+08 9.9197e+08
## Housing4$"full basement":Housing4$bathrooms 1 3.4163e+09 3.4163e+09
## Housing4$stories:Housing4$bathrooms 1 6.2293e+08 6.2293e+08
## Residuals 494 1.9799e+11 4.0079e+08
## F value Pr(>F)
## Housing4$driveway 63.4927 1.121e-14 ***
## Housing4$"full basement" 56.6338 2.505e-13 ***
## Housing4$stories 50.2979 4.599e-12 ***
## Housing4$bathrooms 101.9864 < 2.2e-16 ***
## Housing4$driveway:Housing4$"full basement" 1.5587 0.212443
## Housing4$driveway:Housing4$stories 0.6469 0.421596
## Housing4$driveway:Housing4$bathrooms 8.5592 0.003596 **
## Housing4$"full basement":Housing4$stories 2.4751 0.116304
## Housing4$"full basement":Housing4$bathrooms 8.5241 0.003665 **
## Housing4$stories:Housing4$bathrooms 1.5543 0.213099
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
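The row labels in the table above are long because every term is written as Housing4$.... One possible alternative, sketched here on simulated data (the data frame `toy`, its coefficients, and the column name `fullbase` are assumptions for illustration), is to pass the data frame through the `data` argument so the formula can use bare column names:

```r
set.seed(1)  # simulated data, for illustration only
n <- 200
toy <- data.frame(
  driveway  = rbinom(n, 1, 0.8),
  fullbase  = rbinom(n, 1, 0.4),
  stories   = sample(0:2, n, replace = TRUE),
  bathrooms = sample(0:2, n, replace = TRUE)
)
toy$price <- 40000 + 10000 * toy$driveway + 8000 * toy$fullbase +
  6000 * toy$stories + 12000 * toy$bathrooms + rnorm(n, sd = 5000)

# (a + b + c + d)^2 expands to the 4 main effects plus all 6 two-way interactions
fit <- lm(price ~ (driveway + fullbase + stories + bathrooms)^2, data = toy)
tab <- anova(fit)
rownames(tab)
```

As in the report, stories and bathrooms enter as numeric variables here, so each takes a single degree of freedom; wrapping them in factor() would instead estimate a separate effect per level.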
ANOVA for main effects:
model2 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms))
anova(model2)
## Analysis of Variance Table
##
## Response: Housing4$price
## Df Sum Sq Mean Sq F value Pr(>F)
## Housing4$driveway 1 2.5447e+10 2.5447e+10 61.367 2.863e-14 ***
## Housing4$"full basement" 1 2.2698e+10 2.2698e+10 54.738 5.863e-13 ***
## Housing4$stories 1 2.0159e+10 2.0159e+10 48.614 9.903e-12 ***
## Housing4$bathrooms 1 4.0875e+10 4.0875e+10 98.572 < 2.2e-16 ***
## Residuals 500 2.0733e+11 4.1467e+08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This table shows that all of the variables have a significant effect on the price of a house.
The models above were fit with lm, which assumes that the residuals are normally distributed. Therefore we draw a normal Q-Q plot to check for normality.
qqnorm(residuals(model2), main = "Normal Q-Q Plot")
qqline(residuals(model2))
As you can see above, the residuals fit the normal line quite well, which is what we want. Below, the residuals are plotted against the fitted values.
plot(fitted(model2),residuals(model2))
Although there is some linear structure, the residuals are mostly randomly dispersed, similar to a “snow storm”, and do not increase significantly with the fitted values.
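A visual check such as the Q-Q plot can be complemented with a numeric one. As a hedged sketch on simulated data (not the housing model; `x`, `y`, `fit`, and `sw` are illustrative names), the Shapiro-Wilk test from base R tests the null hypothesis that the residuals are normally distributed:

```r
set.seed(1)  # simulated data, for illustration only
x <- runif(100)
y <- 2 + 3 * x + rnorm(100)
fit <- lm(y ~ x)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(fit))
sw$p.value
```

A small p-value would suggest a departure from normality; with several hundred observations, as in this dataset, even minor departures can register as significant, so the plot and the test are best read together.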
#Clear workspace
rm(list=ls())
#Read in housing data
library("Ecdat")
data(Housing)
#Summary of data
head(Housing)
summary(Housing)
#structure of data and levels of factors
str(Housing)
Housing$bathrms <- as.factor(Housing$bathrms)
Housing$stories <- as.factor(Housing$stories)
levels(Housing$bathrms)
levels(Housing$driveway)
levels(Housing$fullbase)
levels(Housing$stories)
#Make subsets to delete all rows with 4 stories or 4 bathrooms
Housing2 = subset(Housing, stories != "4")
Housing3 = subset(Housing2, bathrms != "4")
#Make bathrms and stories factors
Housing3$bathrms <- as.factor(Housing3$bathrms)
Housing3$stories <- as.factor(Housing3$stories)
#Drop old levels
levels(droplevels(Housing3$bathrms))
levels(droplevels(Housing3$stories))
#condense to have only 4 IVs and the response
Housing3$lotsize <- NULL
Housing3$bedrooms <- NULL
Housing3$recroom <- NULL
Housing3$gashw <- NULL
Housing3$airco <- NULL
Housing3$garagepl <- NULL
Housing3$prefarea <- NULL
#Look at newly condensed dataframe
head(droplevels(Housing3))
summary(droplevels(Housing3))
str(droplevels(Housing3))
# Find length of frame
l = nrow(Housing3)
# Create four new frames for writing numbers instead of factors
drivewaynum = data.frame(l)
fullbasenum = data.frame(l)
storiesnum = data.frame(l)
bathrmsnum = data.frame(l)
## Replacement loop
for (i in 1:l){
# driveway replace yes with 1 and no with 0
if (Housing3$driveway[i] == "yes"){
drivewaynum[i,1] <- 1
} else {
drivewaynum[i,1] <- 0
}
# fullbase replace yes with 1 and no with 0
if (Housing3$fullbase[i] == "yes"){
fullbasenum[i,1] <- 1
} else {
fullbasenum[i,1] <- 0
}
#stories 1 = 0, 2 = 1, 3 = 2
if (Housing3$stories[i] == "1"){
storiesnum[i,1] <- 0
}
if (Housing3$stories[i] == "2") {
storiesnum[i,1] <- 1
}
if (Housing3$stories[i] == "3") {
storiesnum[i,1] <- 2
}
#bathrms 1 = 0, 2 = 1, 3 = 2
if (Housing3$bathrms[i] == "1"){
bathrmsnum[i,1] <- 0
}
if (Housing3$bathrms[i] == "2") {
bathrmsnum[i,1] <- 1
}
if (Housing3$bathrms[i] == "3") {
bathrmsnum[i,1] <- 2
}
}
# Create numbered data frame out of column vectors and response variable
Housing4 <- cbind(drivewaynum,fullbasenum,storiesnum,bathrmsnum,Housing3$price)
# Title columns of data frame appropriately
colnames(Housing4) <- c("driveway","full basement","stories","bathrooms","price")
#Summary of new dataframe
head(Housing4)
str(Housing4)
summary(Housing4)
#Boxplot of price and four IVs
boxplot(Housing4$price ~ Housing4$`full basement` + Housing4$driveway +
        Housing4$stories + Housing4$bathrooms,
        xlab="fullbasement + driveway + stories + bathrooms",
        ylab="price of house",
        main="Boxplot of Price by the four IVs")
#Install FrF2
install.packages("FrF2")
library("FrF2")
#64 Full Factorial Runs
f = FrF2(64,6)
print(f)
#FrF2
b <- FrF2(8,6,res3 = T)
print(b)
#Alias structure
aliasprint(b)
#Boxplots of four IVs and response price
boxplot(Housing4$price~Housing4$driveway,
        main="Effect of Driveway on House Price",
        ylab="House Price", xlab="1 = Driveway, 0 = No Driveway")
boxplot(Housing4$price~Housing4$`full basement`,
        main="Effect of Full Basement on House Price",
        ylab="House Price", xlab="1 = Full Basement, 0 = No Full Basement")
boxplot(Housing4$price~Housing4$stories,
        main="Effect of Number of Stories on House Price",
        ylab="House Price", xlab="0 = One Story, 1 = Two Stories, 2 = Three Stories")
boxplot(Housing4$price~Housing4$bathrooms,
        main="Effect of Number of Bathrooms on House Price",
        ylab="House Price", xlab="0 = One Bathroom, 1 = Two Bathrooms, 2 = Three Bathrooms")
#Interaction Effects and Main Effects Model
model1 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms)^2)
anova(model1)
#Main Effects Model
model2 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms))
anova(model2)
# Model Adequacy Checking
qqnorm(residuals(model2), main = "Normal Q-Q Plot")
qqline(residuals(model2))
plot(fitted(model2),residuals(model2))