“If you can tame a bull elephant, don’t waste your skill by making it seek alms on the road.” - Tamil Epigram.

Data

url <- "http://latul.be/mbaa_531/data/nanaimo.csv"

nanaimo <- read.csv(url)

nanaimo <- nanaimo[!(is.na(nanaimo$price)), ]
nanaimo <- nanaimo[!(is.na(nanaimo$age)), ]

# Details of the data frame

ncol(nanaimo)
## [1] 40
names(nanaimo)
##  [1] "X"            "address"      "price"        "mls"          "lat"         
##  [6] "lng"          "bed"          "area"         "bath"         "type"        
## [11] "landarea"     "lotwidth"     "lotdepth"     "water"        "lotshape"    
## [16] "sewer"        "access"       "zoning"       "mobile"       "stratainfo"  
## [21] "style"        "age"          "construction" "foundation"   "exterior"    
## [26] "bsmttype"     "bsmtdev"      "insulceil"    "insulwalls"   "roof"        
## [31] "heating"      "fuel"         "aircond"      "parking"      "title"       
## [36] "restrictions" "taxes"        "taxyear"      "stratafee"    "insulation"
nrow(nanaimo)
## [1] 246

Dummy Variable

The dummy variable “builtAfter2000” will hold 1 if the house was built in or after 2000. It will hold 0 for all houses built before 2000.

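The variables used for this analysis are “price”, “bed”, “bath”, “age” (the year the house was built), “builtAfter1995” and “builtAfter2000”. “builtAfter1995” is coded the same way as “builtAfter2000”, with 1995 as the cutoff year.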

# age holds the year built, so compare against numbers rather than strings
builtAfter2000 <- factor(ifelse( nanaimo$age >= 2000, 1, 0))

builtAfter1995 <- factor(ifelse( nanaimo$age >= 1995, 1, 0))

nanaimo <- cbind(nanaimo, builtAfter1995)

nanaimo <- cbind(nanaimo, builtAfter2000)

head(nanaimo[,c("price","bed", "age", "builtAfter1995", "builtAfter2000")])
##    price bed  age builtAfter1995 builtAfter2000
## 2  19900   2 1975              0              0
## 3  25000   1 2008              1              1
## 5  32500   2 1980              0              0
## 6  33000   2 1983              0              0
## 7  33500   2 2007              1              1
## 21 54000   2 2005              1              1
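
A quick way to sanity-check dummy coding like this is to build it on a tiny data frame and cross-tabulate the two flags. The sketch below is only an illustration: `toy` and its year values are made up for the example and are not part of the Nanaimo data.

```r
# Toy data: a hypothetical stand-in for nanaimo$age (year built)
toy <- data.frame(age = c(1975, 2008, 1980, 1999, 2000, 1995))

# Numeric comparison, stored as a factor with levels "0" and "1"
toy$builtAfter2000 <- factor(as.integer(toy$age >= 2000), levels = c(0, 1))
toy$builtAfter1995 <- factor(as.integer(toy$age >= 1995), levels = c(0, 1))

# Cross-tabulate: no house should be flagged post-2000 but not post-1995
table(builtAfter1995 = toy$builtAfter1995, builtAfter2000 = toy$builtAfter2000)
```

Because the cutoffs are inclusive (`>=`), a house built exactly in 1995 or 2000 lands in the "1" group for that flag.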

Geom_point() and Geom_smooth()

Using geom_point() and geom_smooth()… you know, just 'cause ;D… to visualize how price relates to bed and bath, with one fitted line per level of each dummy variable.

library( ggplot2 )


ggplot( data = nanaimo, mapping = aes( x = bed, y = price, colour =  builtAfter2000  ) ) +
    geom_point( alpha = .4 ) +
    geom_smooth( method = 'lm',formula = y ~ x, se = FALSE )

ggplot( data = nanaimo, mapping = aes( x = bath, y = price, colour =  builtAfter1995  ) ) +
    geom_point( alpha = .4 ) +
    geom_smooth( method = 'lm',formula = y ~ x, se = FALSE )

Multiple Regression using lm() function

A linear model is generated using the lm() function to check the effect of the dummy variable and the number of beds on the housing prices in Nanaimo.
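
Written out, a model with a dummy variable and an interaction term has the form below. The symbol D is a label introduced here for the builtAfter2000 indicator (0 or 1); the betas are the coefficients that lm() estimates.

```latex
\text{price} = \beta_0 + \beta_1 D + \beta_2\,\text{bed} + \beta_3\,(D \times \text{bed}) + \varepsilon

% Slope of price with respect to bed in each group:
D = 0:\quad \frac{\partial\,\text{price}}{\partial\,\text{bed}} = \beta_2
\qquad
D = 1:\quad \frac{\partial\,\text{price}}{\partial\,\text{bed}} = \beta_2 + \beta_3
```

So the coefficient on bed is the per-bedroom slope for the older houses, and the interaction coefficient is the extra slope for the newer ones.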

mhousing <- lm(formula = price ~ builtAfter2000 + bed + bed * builtAfter2000, data = nanaimo)

summary(mhousing)
## 
## Call:
## lm(formula = price ~ builtAfter2000 + bed + bed * builtAfter2000, 
##     data = nanaimo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -391777 -130880  -37568   96451  622067 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           146423      48940   2.992  0.00306 ** 
## builtAfter20001       -16219      71996  -0.225  0.82196    
## bed                    60170      15164   3.968 9.55e-05 ***
## builtAfter20001:bed    50125      23646   2.120  0.03504 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 196400 on 242 degrees of freedom
## Multiple R-squared:  0.2192, Adjusted R-squared:  0.2095 
## F-statistic: 22.65 on 3 and 242 DF,  p-value: 5.881e-13
mhousing$coefficients
##         (Intercept)     builtAfter20001                 bed builtAfter20001:bed 
##           146422.46           -16218.79            60170.24            50124.49
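# Price change per extra bedroom for houses built before 2000 (reference level)
bedBefore2000 <- coef(mhousing)["bed"]

# For houses built in or after 2000: main effect plus interaction term
bedAfter2000 <- sum(coef(mhousing)[c("bed", "builtAfter20001:bed")])

# Same model with bath and the 1995 cutoff (this overwrites mhousing)
mhousing <- lm(formula = price ~ builtAfter1995 + bath + bath * builtAfter1995, data = nanaimo)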

summary(mhousing)
## 
## Call:
## lm(formula = price ~ builtAfter1995 + bath + bath * builtAfter1995, 
##     data = nanaimo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -344914 -112674  -40624   72355  682391 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            136236      37324   3.650 0.000321 ***
## builtAfter19951        -39627      55714  -0.711 0.477612    
## bath                    96549      16394   5.889 1.29e-08 ***
## builtAfter19951:bath    40104      23425   1.712 0.088181 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 183900 on 242 degrees of freedom
## Multiple R-squared:  0.3152, Adjusted R-squared:  0.3067 
## F-statistic: 37.13 on 3 and 242 DF,  p-value: < 2.2e-16
mhousing$coefficients
##          (Intercept)      builtAfter19951                 bath 
##            136235.49            -39626.85             96548.93 
## builtAfter19951:bath 
##             40103.62
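# Price change per extra bath for houses built before 1995 (reference level)
bathBefore1995 <- coef(mhousing)["bath"]

# For houses built in or after 1995: main effect plus interaction term
bathAfter1995 <- sum(coef(mhousing)[c("bath", "builtAfter19951:bath")])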

# Cost of an extra bath

bathBefore1995
##     bath 
## 96548.93
bathAfter1995
## [1] 136652.6
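
Adding the interaction coefficient to the main effect, as done above, gives the same slope as fitting the newer houses on their own. The sketch below checks this on simulated data. Everything in it (`d`, `newer`, the "true" slopes of 100 and 140) is made up for illustration and has nothing to do with the actual Nanaimo listings.

```r
set.seed(1)  # reproducible simulated data, not the Nanaimo listings

n <- 200
d <- data.frame(
  bath  = sample(1:4, n, replace = TRUE),
  newer = factor(rep(c(0, 1), each = n / 2))
)
# Assumed "true" slopes: 100 per bath for older houses, 140 for newer ones
d$price <- 50 + 100 * d$bath + 40 * d$bath * (d$newer == 1) + rnorm(n, sd = 10)

fit <- lm(price ~ bath * newer, data = d)

# Slope for the reference group (newer = 0) is the main effect of bath
slope_old <- unname(coef(fit)["bath"])
# Slope for newer = 1 adds the interaction term to the main effect
slope_new <- unname(coef(fit)["bath"] + coef(fit)["bath:newer1"])

# Cross-check: fit each group separately and compare the slopes
sep_old <- unname(coef(lm(price ~ bath, data = d[d$newer == 0, ]))["bath"])
sep_new <- unname(coef(lm(price ~ bath, data = d[d$newer == 1, ]))["bath"])

all.equal(slope_old, sep_old)
all.equal(slope_new, sep_new)
```

The interaction model is handy because one fit gives both slopes, plus a t-test on whether they differ.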

Finally,

Observations from the geom_point() and geom_smooth() plots, together with the lm() fits on the selected variables:

  1. Houses built in or after 2000 have from 0 to 5 bedrooms.
  2. Houses built before 2000 have 1 to 6 bedrooms.
  3. An additional bedroom in a house built before 2000 is associated with a price about 60170.24 higher.
  4. An additional bedroom in a house built in or after 2000 is associated with a price about 110294.7 higher (60170.24 + 50124.49, the bed coefficient plus the interaction term).
  5. Houses built in or after 2000 therefore show a steeper price increase per bedroom, which is the steeper fitted line for builtAfter2000 = 1 in the first plot.

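And also, an extra bath costs more in a newer house: it is associated with a price about 136652.6 higher in houses built in or after 1995, against 96548.93 in older houses. The interaction term behind that difference is only significant at the 10% level (p = 0.088), so treat the gap with some caution.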