1. Setting

The Housing data set contains the sale prices of houses in Windsor. It is a data frame consisting of a cross-section from 1987 with 546 observations. The factors we will focus on are:

- Two-level independent variable: driveway - Does the house have a driveway?
- Two-level independent variable: fullbase - Does the house have a full finished basement?
- Three-level independent variable: stories - Number of stories, excluding the basement
- Three-level independent variable: bathrms - Number of full bathrooms

This data set comes from the Ecdat package. Loading Ecdat also loads the Ecfun package it depends on.

library("Ecdat")
## Loading required package: Ecfun
## 
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
## 
##     sign
## 
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
## 
##     Orange
data("Housing")

Below are the first 6 rows of the data:

head(Housing)
##   price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
## 1 42000    5850        3       1       2      yes      no      yes    no
## 2 38500    4000        2       1       1      yes      no       no    no
## 3 49500    3060        3       1       1      yes      no       no    no
## 4 60500    6650        3       1       2      yes     yes       no    no
## 5 61000    6360        2       1       1      yes      no       no    no
## 6 66000    4160        3       1       1      yes     yes      yes    no
##   airco garagepl prefarea
## 1    no        1       no
## 2    no        0       no
## 3    no        0       no
## 4    no        0       no
## 5    no        0       no
## 6   yes        0       no

Below is the summary of the Housing data frame:

summary(Housing)
##      price           lotsize         bedrooms        bathrms     
##  Min.   : 25000   Min.   : 1650   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 49125   1st Qu.: 3600   1st Qu.:2.000   1st Qu.:1.000  
##  Median : 62000   Median : 4600   Median :3.000   Median :1.000  
##  Mean   : 68122   Mean   : 5150   Mean   :2.965   Mean   :1.286  
##  3rd Qu.: 82000   3rd Qu.: 6360   3rd Qu.:3.000   3rd Qu.:2.000  
##  Max.   :190000   Max.   :16200   Max.   :6.000   Max.   :4.000  
##     stories      driveway  recroom   fullbase  gashw     airco    
##  Min.   :1.000   no : 77   no :449   no :355   no :521   no :373  
##  1st Qu.:1.000   yes:469   yes: 97   yes:191   yes: 25   yes:173  
##  Median :2.000                                                    
##  Mean   :1.808                                                    
##  3rd Qu.:2.000                                                    
##  Max.   :4.000                                                    
##     garagepl      prefarea 
##  Min.   :0.0000   no :418  
##  1st Qu.:0.0000   yes:128  
##  Median :0.0000            
##  Mean   :0.6923            
##  3rd Qu.:1.0000            
##  Max.   :3.0000

Factors and Levels

For this project we needed two two-level factors and two three-level factors. As shown below, the two “three-level” factors that I chose were originally four-level factors.

str(Housing)
## 'data.frame':    546 obs. of  12 variables:
##  $ price   : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
##  $ lotsize : num  5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
##  $ bedrooms: num  3 2 3 3 2 3 3 3 3 3 ...
##  $ bathrms : num  1 1 1 1 1 1 2 1 1 2 ...
##  $ stories : num  2 1 1 2 1 1 2 3 1 4 ...
##  $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
##  $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
##  $ gashw   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ airco   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
##  $ garagepl: num  1 0 0 0 0 0 2 0 0 1 ...
##  $ prefarea: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
Housing$bathrms <- as.factor(Housing$bathrms)
Housing$stories <- as.factor(Housing$stories)
levels(Housing$bathrms)
## [1] "1" "2" "3" "4"
levels(Housing$stories)
## [1] "1" "2" "3" "4"

To obtain a data frame that meets the requirements of this project, I created two subsets: first I removed all observations with 4 stories, and then all observations with 4 bathrooms. The result, Housing3, is the data frame used from here on.

Housing2 = subset(Housing, stories != "4")
Housing3 = subset(Housing2, bathrms != "4")

After subsetting, the factors do not automatically show the reduced set of levels. subset() keeps unused factor levels, so both factors still list level "4", and calling as.factor() on a column that is already a factor leaves it unchanged. The droplevels() function removes the unused levels.

Housing3$bathrms <- droplevels(Housing3$bathrms)
Housing3$stories <- droplevels(Housing3$stories)
levels(Housing3$bathrms)
## [1] "1" "2" "3"
levels(Housing3$stories)
## [1] "1" "2" "3"
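A minimal sketch of this behaviour on a hypothetical toy factor (not the Housing data): subsetting keeps the unused level, as.factor() leaves it in place, and only droplevels() removes it. The object names `stories`, `kept` and `df` are illustrative.

```r
# Hypothetical toy factor standing in for a column like Housing$stories
stories <- factor(c("1", "2", "3", "4", "2"))

kept <- stories[stories != "4"]   # subsetting keeps the now-unused level "4"
levels(kept)                      # "1" "2" "3" "4"
levels(as.factor(kept))           # as.factor() on a factor changes nothing
levels(droplevels(kept))          # "1" "2" "3"

# droplevels() also accepts a whole data frame and cleans every factor column
df <- data.frame(stories = stories, price = 1:5)
df <- droplevels(subset(df, stories != "4"))
levels(df$stories)                # "1" "2" "3"
```

Assigning the result back (as in `df <- droplevels(...)`) is what makes the change stick; calling droplevels() only inside levels() prints the reduced levels but leaves the stored factor unchanged.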

Continuous Variables

There are two continuous variables in this data frame: the sale price of the house in dollars (price) and the lot size of the property in square feet (lotsize).

Response Variable

The response variable for this dataframe is the sale price of the house measured in dollars, price.

Next, the data frame is condensed so that it contains only the response variable, price, and the independent variables bathrms, stories, driveway, and fullbase.

Housing3$lotsize <- NULL
Housing3$bedrooms <- NULL
Housing3$recroom <- NULL
Housing3$gashw <- NULL
Housing3$airco <- NULL
Housing3$garagepl <- NULL
Housing3$prefarea <- NULL
head(droplevels(Housing3))
##   price bathrms stories driveway fullbase
## 1 42000       1       2      yes      yes
## 2 38500       1       1      yes       no
## 3 49500       1       1      yes       no
## 4 60500       1       2      yes       no
## 5 61000       1       1      yes       no
## 6 66000       1       1      yes      yes
summary(droplevels(Housing3))
##      price        bathrms stories driveway  fullbase 
##  Min.   : 25000   1:390   1:227   no : 77   no :317  
##  1st Qu.: 48000   2:106   2:238   yes:428   yes:188  
##  Median : 60000   3:  9   3: 40                      
##  Mean   : 65292                                      
##  3rd Qu.: 77500                                      
##  Max.   :190000
str(droplevels(Housing3))
## 'data.frame':    505 obs. of  5 variables:
##  $ price   : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 90000 ...
##  $ bathrms : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 2 1 1 2 ...
##  $ stories : Factor w/ 3 levels "1","2","3": 2 1 1 2 1 1 2 3 1 1 ...
##  $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 2 ...

To use these factors in a factorial design, their levels must be coded numerically, from low to high. The two two-level factors are recoded below so that yes = 1 and no = 0, and the two three-level factors are recoded so that levels 1, 2, and 3 become 0, 1, and 2.

# Find length of frame
l = nrow(Housing3) 

# Create four new frames for writing numbers instead of factors 
drivewaynum = data.frame(l)
fullbasenum = data.frame(l)
storiesnum = data.frame(l)
bathrmsnum = data.frame(l)

## Replacement loop
for (i in 1:l){
  
  # driveway replace yes with 1 and no with 0
  if (Housing3$driveway[i] == "yes"){
    drivewaynum[i,1] <- 1
  }  else {
    drivewaynum[i,1] <- 0
  }
  
  # fullbase replace yes with 1 and no with 0
  if (Housing3$fullbase[i] == "yes"){
    fullbasenum[i,1] <- 1
  }  else {
    fullbasenum[i,1] <- 0
  }
  
  #stories 1 = 0, 2 = 1, 3 = 2
  if (Housing3$stories[i] == "1"){
    storiesnum[i,1] <- 0
  }
  
  if (Housing3$stories[i] == "2") {
    storiesnum[i,1] <- 1
  } 
  if (Housing3$stories[i] == "3") {
    storiesnum[i,1] <- 2
  }

  #bathrms 1 = 0, 2 = 1, 3 = 2
  if (Housing3$bathrms[i] == "1"){
    bathrmsnum[i,1] <- 0
  }
  
  if (Housing3$bathrms[i] == "2") {
    bathrmsnum[i,1] <- 1
  } 
  if (Housing3$bathrms[i] == "3") {
    bathrmsnum[i,1] <- 2
  }
}

# Create numbered data frame out of column vectors and response variable 
Housing4 <- cbind(drivewaynum,fullbasenum,storiesnum,bathrmsnum,Housing3$price)

# Title columns of data frame appropriately
colnames(Housing4) <- c("driveway","full basement","stories","bathrooms","price")
head(Housing4)
##   driveway full basement stories bathrooms price
## 1        1             1       1         0 42000
## 2        1             0       0         0 38500
## 3        1             0       0         0 49500
## 4        1             0       1         0 60500
## 5        1             0       0         0 61000
## 6        1             1       0         0 66000
str(Housing4)
## 'data.frame':    505 obs. of  5 variables:
##  $ driveway     : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ full basement: num  1 0 0 0 0 1 1 0 1 1 ...
##  $ stories      : num  1 0 0 1 0 0 1 2 0 0 ...
##  $ bathrooms    : num  0 0 0 0 0 0 1 0 0 1 ...
##  $ price        : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 90000 ...
summary(Housing4)
##     driveway      full basement       stories         bathrooms     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.8475   Mean   :0.3723   Mean   :0.6297   Mean   :0.2455  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :2.0000   Max.   :2.0000  
##      price       
##  Min.   : 25000  
##  1st Qu.: 48000  
##  Median : 60000  
##  Mean   : 65292  
##  3rd Qu.: 77500  
##  Max.   :190000
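The replacement loop above can also be written with vectorized operations. The sketch below runs on a hypothetical four-row stand-in for Housing3 (the values are made up; only the column names and the coding follow the text) and is intended to produce the same coding as the loop: yes = 1, no = 0, and levels 1, 2, 3 mapped to 0, 1, 2. The helper names `yes_to_one` and `level_to_score` are illustrative, not part of the original analysis.

```r
# Hypothetical stand-in for Housing3: same column names, made-up values
toy <- data.frame(
  price    = c(50000, 60000, 70000, 80000),
  bathrms  = factor(c("1", "1", "2", "3")),
  stories  = factor(c("2", "1", "1", "3")),
  driveway = factor(c("yes", "yes", "no", "yes")),
  fullbase = factor(c("yes", "no", "yes", "no"))
)

# yes/no factor -> 1/0
yes_to_one <- function(x) as.numeric(x == "yes")

# Levels "1", "2", "3" -> 0, 1, 2, mapped by label rather than internal code
level_to_score <- function(x) as.numeric(as.character(x)) - 1

toy_num <- data.frame(
  driveway        = yes_to_one(toy$driveway),
  `full basement` = yes_to_one(toy$fullbase),
  stories         = level_to_score(toy$stories),
  bathrooms       = level_to_score(toy$bathrms),
  price           = toy$price,
  check.names     = FALSE   # keep the space in "full basement"
)
toy_num
```

Converting through as.character() maps each level by its label, so the scores stay correct even if a level is absent from the data, which would shift R's internal factor codes.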

2. Experimental Design

How will the experiment be organized and conducted to test the hypothesis?

The design of this experiment is a 2^(6-3) fractional factorial design. Each three-level factor is represented by two two-level factors, so the four independent variables become six two-level factors. A fractional factorial saves runs at the cost of resolution; the 8-run design used here has resolution III. The main and interaction effects are then estimated with a linear model and tested with ANOVA to arrive at the final model.
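The run counts follow from counting two-level factors, with each three-level factor written as two two-level factors:

```latex
m = \underbrace{1 + 1}_{\text{driveway, fullbase}} + \underbrace{2 + 2}_{\text{stories, bathrms}} = 6,
\qquad 2^{m} = 2^{6} = 64 \text{ runs (full factorial)},
\qquad 2^{m-3} = 2^{3} = 8 \text{ runs } \left(\tfrac{1}{8} \text{ fraction}\right)
```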

What is the rationale for this design?

The reason to use a fractional factorial design is not much different from that of any experimental design: to understand effects and relationships in an experimental setting. The difference is that a fractional design is more economical than a full factorial, because it uses only a fraction of the runs. For fractional factorials it is important to know the resolution of the design, because it determines which effects are confounded. Lower-resolution designs (such as resolution III) are used as screening designs, while higher-resolution designs (resolution V and above) can estimate all main effects and two-factor interactions, that is, a full second-order model.
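The resolution of a two-level fractional factorial is the length of the shortest word in its defining relation. As a sketch, assuming the generators D = AB, E = AC and F = BC (which is what the alias structure of the 8-run FrF2 design in this report implies), the defining relation and resolution can be computed in base R. The names `generators`, `word_product` and `defining_words` are illustrative.

```r
# Generator words: D = AB, E = AC, F = BC  <=>  I = ABD = ACE = BCF
generators <- list(c("A", "B", "D"), c("A", "C", "E"), c("B", "C", "F"))

# Multiplying two words cancels repeated letters (X^2 = I): a symmetric difference
word_product <- function(x, y) sort(union(setdiff(x, y), setdiff(y, x)))

# The defining relation contains the product of every non-empty subset of generators
defining_words <- list()
for (k in seq_along(generators)) {
  for (idx in combn(length(generators), k, simplify = FALSE)) {
    defining_words[[length(defining_words) + 1]] <- Reduce(word_product, generators[idx])
  }
}

word_labels <- vapply(defining_words, paste, character(1), collapse = "")
word_labels   # "ABD" "ACE" "BCF" "BCDE" "ACDF" "ABEF" "DEF"

# Resolution = length of the shortest word
resolution <- min(vapply(defining_words, length, integer(1)))
resolution    # 3
```

Four of the seven words have length three, which is why this design is resolution III.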

3. Statistical Analysis

Exploratory Data Analysis: Graphics and Descriptive Summary

Below is the summary of our data again, as well as a boxplot of the response variable, price, for each combination of the four IVs:

summary(Housing4)
##     driveway      full basement       stories         bathrooms     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.8475   Mean   :0.3723   Mean   :0.6297   Mean   :0.2455  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :2.0000   Max.   :2.0000  
##      price       
##  Min.   : 25000  
##  1st Qu.: 48000  
##  Median : 60000  
##  Mean   : 65292  
##  3rd Qu.: 77500  
##  Max.   :190000
boxplot(Housing4$price ~ Housing4$`full basement` + Housing4$driveway + Housing4$stories + Housing4$bathrooms, xlab = "full basement + driveway + stories + bathrooms", ylab = "price of house", main = "Boxplot of Price by Four IVs")

Testing

We will test the following hypotheses:

Null: Changing the level of an IV does not change the mean sale price.

Alternative: Changing the level of an IV does change the mean sale price.

Treatment Structure

A full factorial design in the six two-level factors would have 2^6 = 64 runs:

Run A B C D E F
1 -1 1 1 1 1 -1
2 1 -1 1 1 -1 -1
3 1 1 1 1 -1 -1
4 1 -1 -1 -1 -1 1
5 -1 -1 1 -1 -1 1
6 -1 1 1 -1 -1 -1
7 -1 1 1 1 1 1
8 -1 -1 -1 -1 -1 -1
9 1 1 -1 1 -1 -1
10 -1 1 1 1 -1 1
11 -1 -1 1 -1 1 -1
12 -1 1 -1 1 1 1
13 1 1 -1 1 -1 1
14 -1 1 -1 -1 -1 -1
15 -1 -1 -1 -1 -1 1
16 1 -1 -1 1 1 -1
17 -1 1 1 -1 1 -1
18 1 -1 1 1 1 -1
19 -1 1 -1 -1 1 1
20 1 1 1 -1 -1 -1
21 1 -1 -1 -1 1 1
22 -1 -1 1 -1 -1 -1
23 -1 -1 1 1 -1 1
24 1 -1 1 -1 -1 1
25 1 -1 -1 1 -1 1
26 1 -1 1 -1 -1 -1
27 -1 -1 1 1 1 1
28 -1 -1 -1 1 1 1
29 1 1 1 1 1 1
30 1 -1 1 -1 1 -1
31 -1 -1 -1 -1 1 1
32 -1 1 1 1 -1 -1
33 -1 -1 1 1 -1 -1
34 1 -1 -1 -1 -1 -1
35 1 1 1 -1 1 -1
36 1 1 1 1 -1 1
37 1 -1 -1 1 -1 -1
38 -1 1 -1 -1 -1 1
39 1 1 1 -1 -1 1
40 -1 1 -1 1 -1 -1
41 1 -1 1 1 1 1
42 1 1 -1 -1 1 1
43 1 -1 -1 -1 1 -1
44 1 1 -1 -1 -1 1
45 1 1 1 1 1 -1
46 -1 1 1 -1 -1 1
47 -1 1 1 -1 1 1
48 -1 -1 -1 1 -1 -1
49 1 1 -1 -1 -1 -1
50 1 1 -1 1 1 1
51 -1 -1 1 1 1 -1
52 -1 -1 -1 1 1 -1
53 1 -1 -1 1 1 1
54 -1 -1 1 -1 1 1
55 1 1 -1 -1 1 -1
56 1 1 1 -1 1 1
57 1 -1 1 1 -1 1
58 -1 1 -1 1 -1 1
59 -1 1 -1 -1 1 -1
60 1 -1 1 -1 1 1
61 -1 1 -1 1 1 -1
62 -1 -1 -1 1 -1 1
63 -1 -1 -1 -1 1 -1
64 1 1 1 1 1 -1
library("FrF2")
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths
#f = FrF2(64,6)

In contrast, the fractional factorial design has only 8 runs, shown below:

b <- FrF2(8, 6, res3 = TRUE)
print(b) 
##    A  B  C  D  E  F
## 1  1 -1  1 -1  1 -1
## 2 -1  1 -1 -1  1 -1
## 3 -1 -1  1  1 -1 -1
## 4  1  1  1  1  1  1
## 5 -1  1  1 -1 -1  1
## 6 -1 -1 -1  1  1  1
## 7  1  1 -1  1 -1 -1
## 8  1 -1 -1 -1 -1  1
## class=design, type= FrF2

Below is the aliasing structure, which shows the effects that are confounded. Because this is a resolution III design, main effects are confounded with two-factor interactions, and two-factor interactions are confounded with one another.

aliasprint(b)
## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
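To see what confounding means concretely, the same eight runs can be rebuilt by hand from a full 2^3 factorial in A, B and C, assuming the generators D = AB, E = AC and F = BC read off the alias structure above. In the rebuilt design the contrast column of A is identical to those of BD and CE, so the data cannot separate those effects. The rows should match the FrF2 output above up to run order, since FrF2 randomizes the run order by default. The object name `design` is illustrative.

```r
# Full 2^3 factorial in the base factors A, B, C (-1/+1 coding), standard order
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Added factors from the generators implied by the alias structure
design$D <- design$A * design$B
design$E <- design$A * design$C
design$F <- design$B * design$C
design

# Main effect A is aliased with BD and CE: the contrast columns coincide
all(design$A == design$B * design$D)
all(design$A == design$C * design$E)

# The two-factor interactions AF, BE and CD also share one column
all(design$A * design$F == design$B * design$E)
all(design$A * design$F == design$C * design$D)

# The six main-effect columns are still mutually orthogonal (8 on the diagonal, 0 elsewhere)
crossprod(as.matrix(design))
```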

Below are boxplots of the response variable against each of the four IVs:

boxplot(Housing4$price~Housing4$driveway, main="Effect of Driveway on House Price", ylab="House Price", xlab="1 = Driveway, 0 = No Driveway")

There is a noticeable difference in the median house price between the two factor levels.

boxplot(Housing4$price~Housing4$`full basement`, main="Effect of Full Basement on House Price", ylab="House Price", xlab="1 = Full Basement, 0 = No Full Basement")

The median price of a house with a full basement is noticeably higher than that of a house without one.

boxplot(Housing4$price~Housing4$stories, main="Effect of Number of Stories on House Price", ylab="House Price", xlab="0 = One Story, 1 = Two Stories, 2 = Three Stories")

The difference in price between one and two stories is slight but noticeable. The increase in house price when the number of stories rises to three is much larger.

boxplot(Housing4$price~Housing4$bathrooms, main="Effect of Number of Bathrooms on House Price", ylab="House Price", xlab="0 = One Bathroom, 1 = Two Bathrooms, 2 = Three Bathrooms")

The median house price clearly increases as the number of bathrooms increases.

Testing: Model

I will be using a linear model and an ANOVA in order to estimate the main and interaction effects.
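As a sketch of what model1 below fits (the notation is an assumption for illustration, not taken from the original analysis), let x1 = driveway, x2 = full basement, x3 = stories and x4 = bathrooms, coded as in Housing4. Because stories and bathrooms enter as numeric 0/1/2 scores, each contributes a single linear term with 1 degree of freedom. With 4 main-effect and 6 interaction terms, the residual degrees of freedom and each F statistic can be checked against the ANOVA table; anova() tests H0: β = 0 for each term in turn using sequential sums of squares.

```latex
\text{price}_i = \beta_0 + \sum_{j=1}^{4} \beta_j x_{ij}
  + \sum_{1 \le j < k \le 4} \beta_{jk}\, x_{ij} x_{ik} + \varepsilon_i,
  \qquad \varepsilon_i \sim N(0, \sigma^2)

\text{residual df} = 505 - 1 - (4 + 6) = 494,
\qquad F_{\text{driveway}} = \frac{\text{MS}_{\text{driveway}}}{\text{MS}_{\text{res}}}
  = \frac{2.5447 \times 10^{10}}{4.0079 \times 10^{8}} \approx 63.49
```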

model1 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms)^2)
anova(model1)
## Analysis of Variance Table
## 
## Response: Housing4$price
##                                              Df     Sum Sq    Mean Sq
## Housing4$driveway                             1 2.5447e+10 2.5447e+10
## Housing4$"full basement"                      1 2.2698e+10 2.2698e+10
## Housing4$stories                              1 2.0159e+10 2.0159e+10
## Housing4$bathrooms                            1 4.0875e+10 4.0875e+10
## Housing4$driveway:Housing4$"full basement"    1 6.2472e+08 6.2472e+08
## Housing4$driveway:Housing4$stories            1 2.5928e+08 2.5928e+08
## Housing4$driveway:Housing4$bathrooms          1 3.4304e+09 3.4304e+09
## Housing4$"full basement":Housing4$stories     1 9.9197e+08 9.9197e+08
## Housing4$"full basement":Housing4$bathrooms   1 3.4163e+09 3.4163e+09
## Housing4$stories:Housing4$bathrooms           1 6.2293e+08 6.2293e+08
## Residuals                                   494 1.9799e+11 4.0079e+08
##                                              F value    Pr(>F)    
## Housing4$driveway                            63.4927 1.121e-14 ***
## Housing4$"full basement"                     56.6338 2.505e-13 ***
## Housing4$stories                             50.2979 4.599e-12 ***
## Housing4$bathrooms                          101.9864 < 2.2e-16 ***
## Housing4$driveway:Housing4$"full basement"    1.5587  0.212443    
## Housing4$driveway:Housing4$stories            0.6469  0.421596    
## Housing4$driveway:Housing4$bathrooms          8.5592  0.003596 ** 
## Housing4$"full basement":Housing4$stories     2.4751  0.116304    
## Housing4$"full basement":Housing4$bathrooms   8.5241  0.003665 ** 
## Housing4$stories:Housing4$bathrooms           1.5543  0.213099    
## Residuals                                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA for main effects:

model2 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms))
anova(model2)
## Analysis of Variance Table
## 
## Response: Housing4$price
##                           Df     Sum Sq    Mean Sq F value    Pr(>F)    
## Housing4$driveway          1 2.5447e+10 2.5447e+10  61.367 2.863e-14 ***
## Housing4$"full basement"   1 2.2698e+10 2.2698e+10  54.738 5.863e-13 ***
## Housing4$stories           1 2.0159e+10 2.0159e+10  48.614 9.903e-12 ***
## Housing4$bathrooms         1 4.0875e+10 4.0875e+10  98.572 < 2.2e-16 ***
## Residuals                500 2.0733e+11 4.1467e+08                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This table shows that all four main effects have a significant effect on the price of a house; every p-value is below 1e-11.

Model Adequacy Checking

The models above were fit with lm, which assumes that the errors are independent and normally distributed with constant variance. A normal Q-Q plot of the residuals checks the normality assumption.

qqnorm(residuals(model2), main = "Normal Q-Q Plot")
qqline(residuals(model2))

As you can see above, the residuals follow the reference line quite well, which supports the normality assumption. Below, the residuals are plotted against the fitted values to check for constant variance.

plot(fitted(model2),residuals(model2))

Although there is some linear pattern, the residuals are mostly randomly dispersed, like a “snow storm”, and their spread does not increase noticeably with the fitted values.

4. Appendix: Full R Code

# Clear the workspace
rm(list=ls())

# Read in the housing data
library("Ecdat")
data(Housing)

#Summary of data
head(Housing)
summary(Housing)

#structure of data and levels of factors
str(Housing)
Housing$bathrms <- as.factor(Housing$bathrms)
Housing$stories <- as.factor(Housing$stories)
levels(Housing$bathrms)
levels(Housing$driveway)
levels(Housing$fullbase)
levels(Housing$stories)

#Make subsets to delete all rows with 4 stories or 4 bathrooms 
Housing2 = subset(Housing, stories != "4")
Housing3 = subset(Housing2, bathrms != "4")

#Drop the unused level "4" from bathrms and stories
Housing3$bathrms <- droplevels(Housing3$bathrms)
Housing3$stories <- droplevels(Housing3$stories)
levels(Housing3$bathrms)
levels(Housing3$stories)

#Condense to keep only the 4 IVs and the response
Housing3$lotsize <- NULL
Housing3$bedrooms <- NULL
Housing3$recroom <- NULL
Housing3$gashw <- NULL
Housing3$airco <- NULL
Housing3$garagepl <- NULL
Housing3$prefarea <- NULL

#Look at newly condensed dataframe
head(droplevels(Housing3))
summary(droplevels(Housing3))
str(droplevels(Housing3))

# Find length of frame
l = nrow(Housing3) 

# Create four new frames for writing numbers instead of factors 
drivewaynum = data.frame(l)
fullbasenum = data.frame(l)
storiesnum = data.frame(l)
bathrmsnum = data.frame(l)

## Replacement loop
for (i in 1:l){
  
  # driveway replace yes with 1 and no with 0
  if (Housing3$driveway[i] == "yes"){
    drivewaynum[i,1] <- 1
  }  else {
    drivewaynum[i,1] <- 0
  }
  
  # fullbase replace yes with 1 and no with 0
  if (Housing3$fullbase[i] == "yes"){
    fullbasenum[i,1] <- 1
  }  else {
    fullbasenum[i,1] <- 0
  }
  
  #stories 1 = 0, 2 = 1, 3 = 2
  if (Housing3$stories[i] == "1"){
    storiesnum[i,1] <- 0
  }
  
  if (Housing3$stories[i] == "2") {
    storiesnum[i,1] <- 1
  } 
  if (Housing3$stories[i] == "3") {
    storiesnum[i,1] <- 2
  }

  #bathrms 1 = 0, 2 = 1, 3 = 2
  if (Housing3$bathrms[i] == "1"){
    bathrmsnum[i,1] <- 0
  }
  
  if (Housing3$bathrms[i] == "2") {
    bathrmsnum[i,1] <- 1
  } 
  if (Housing3$bathrms[i] == "3") {
    bathrmsnum[i,1] <- 2
  }
}

  

# Create numbered data frame out of column vectors and response variable 
Housing4 <- cbind(drivewaynum,fullbasenum,storiesnum,bathrmsnum,Housing3$price)

# Title columns of data frame appropriately
colnames(Housing4) <- c("driveway","full basement","stories","bathrooms","price")

#Summary of new dataframe
head(Housing4)
str(Housing4)
summary(Housing4)

#Boxplot of price and four IVs
boxplot(Housing4$price ~ Housing4$`full basement` + Housing4$driveway +
          Housing4$stories + Housing4$bathrooms,
        xlab = "full basement + driveway + stories + bathrooms",
        ylab = "price of house", main = "Boxplot of Price by Four IVs")

# Install FrF2 (run once if it is not already installed)
# install.packages("FrF2")
library("FrF2")

#64 Full Factorial Runs
f = FrF2(64,6)
print(f)

#FrF2
b <- FrF2(8, 6, res3 = TRUE)
print(b) 
#Alias structure
aliasprint(b) 

#Boxplots of Four IVS and response price
boxplot(Housing4$price~Housing4$driveway, 
        main="Effect of Driveway on House Price", 
        ylab="House Price", xlab="1 = Driveway, 0 = No Driveway")
boxplot(Housing4$price~Housing4$`full basement`, 
        main="Effect of Full Basement on House Price", 
        ylab="House Price", xlab="1 = Full Basement, 0 = No Full Basement")
boxplot(Housing4$price~Housing4$stories, 
        main="Effect of Number of Stories on House Price", 
        ylab="House Price", xlab="0 = One Story, 1 = Two Stories, 2 = Three Stories")
boxplot(Housing4$price~Housing4$bathrooms, 
        main="Effect of Number of Bathrooms on House Price", 
        ylab="House Price", xlab="0 = One Bathroom, 1 = Two Bathrooms, 2 = Three Bathrooms")


#Interaction Effects and Main Effects Model
model1 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms)^2)
anova(model1)

#Main Effects Model
model2 = lm(Housing4$price ~ (Housing4$driveway + Housing4$'full basement' + Housing4$stories + Housing4$bathrooms))
anova(model2)

# Model Adequacy Checking
qqnorm(residuals(model2), main = "Normal Q-Q Plot")
qqline(residuals(model2))
plot(fitted(model2),residuals(model2))
