“If you can tame a bull elephant, don’t waste your skill by making it seek alms on the road.” - Tamil Epigram.
url <- "http://latul.be/mbaa_531/data/nanaimo.csv"
nanaimo <- read.csv(url)
# Drop rows with a missing price or a missing year built (age)
nanaimo <- nanaimo[!is.na(nanaimo$price), ]
nanaimo <- nanaimo[!is.na(nanaimo$age), ]
# Details of the data frame
ncol(nanaimo)
## [1] 40
names(nanaimo)
## [1] "X" "address" "price" "mls" "lat"
## [6] "lng" "bed" "area" "bath" "type"
## [11] "landarea" "lotwidth" "lotdepth" "water" "lotshape"
## [16] "sewer" "access" "zoning" "mobile" "stratainfo"
## [21] "style" "age" "construction" "foundation" "exterior"
## [26] "bsmttype" "bsmtdev" "insulceil" "insulwalls" "roof"
## [31] "heating" "fuel" "aircond" "parking" "title"
## [36] "restrictions" "taxes" "taxyear" "stratafee" "insulation"
nrow(nanaimo)
## [1] 246
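The two NA filters above can also be written as a single step with base R's complete.cases(), restricted to the columns that must be present. Here is a minimal sketch on a small made-up data frame. The values are hypothetical and not from the Nanaimo data.

```r
# Hypothetical toy data: one missing price, one missing age, one missing bed
toy <- data.frame(
  price = c(100000, NA, 300000, 400000),
  age   = c(1990, 2005, NA, 2010),
  bed   = c(2, 3, 5, NA)
)

# Two-step filter, as in the code above
two_step <- toy[!is.na(toy$price), ]
two_step <- two_step[!is.na(two_step$age), ]

# One-step filter: keep rows where both price and age are non-missing
one_step <- toy[complete.cases(toy[, c("price", "age")]), ]

print(one_step)
```

The row with a missing bed is kept, because only price and age are checked.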
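The dummy variable “builtAfter2000” holds 1 if the house was built in or after 2000 and 0 if it was built before 2000. Likewise, “builtAfter1995” holds 1 for houses built in or after 1995 and 0 otherwise. (Despite its name, the “age” column holds the year the house was built.)
The variables used for this analysis are “price”, “bed”, “bath”, “builtAfter1995” and “builtAfter2000”.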
# Compare the year built against numbers; '2000' (a string) would force a text comparison
nanaimo$builtAfter1995 <- factor(ifelse(nanaimo$age >= 1995, 1, 0))
nanaimo$builtAfter2000 <- factor(ifelse(nanaimo$age >= 2000, 1, 0))
head(nanaimo[,c("price","bed", "age", "builtAfter1995", "builtAfter2000")])
## price bed age builtAfter1995 builtAfter2000
## 2 19900 2 1975 0 0
## 3 25000 1 2008 1 1
## 5 32500 2 1980 0 0
## 6 33000 2 1983 0 0
## 7 33500 2 2007 1 1
## 21 54000 2 2005 1 1
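The same kind of dummy can be built from a numeric comparison and coded explicitly as a 0/1 factor. This is a minimal sketch on hypothetical years, not taken from the data. It shows that a house built exactly in 2000 gets 1 and that a missing year stays missing.

```r
# Hypothetical years built (the 'age' column holds the year built)
year_built <- c(1975, 2008, 1980, 2000, 1999, NA)

# Numeric comparison; 2000 itself counts as "in or after 2000"
built_after_2000 <- factor(as.integer(year_built >= 2000), levels = c(0, 1))

# Counts per level, including the missing year
table(built_after_2000, useNA = "ifany")
```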
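geom_point() and geom_smooth() are used (you know, just 'cuz ;D) to visualize the relationships between the selected variables: price against bed, coloured by builtAfter2000, and price against bath, coloured by builtAfter1995.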
library(ggplot2)
# Price vs. number of bedrooms, one colour (and one fitted line) per builtAfter2000 group
ggplot(data = nanaimo, mapping = aes(x = bed, y = price, colour = builtAfter2000)) +
  geom_point(alpha = .4) +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE)
# Price vs. number of bathrooms, one colour (and one fitted line) per builtAfter1995 group
ggplot(data = nanaimo, mapping = aes(x = bath, y = price, colour = builtAfter1995)) +
  geom_point(alpha = .4) +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE)
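When a factor is mapped to colour, the data are split into groups, and geom_smooth(method = 'lm') fits a separate straight line within each group. Below is a minimal sketch of the same fits done by hand. The Nanaimo CSV is not bundled with R, so it uses R's built-in mtcars data as a stand-in: mpg against wt, grouped by the 0/1 column am.

```r
# Stand-in data: built-in mtcars, with am (0/1) playing the role of the dummy
groups <- split(mtcars, factor(mtcars$am))

# One simple regression per group, like one geom_smooth line per colour
per_group <- lapply(groups, function(d) coef(lm(mpg ~ wt, data = d)))

per_group
```

A single lm() with a dummy-by-slope interaction gives the same per-group intercepts and slopes. That is why the models that follow include an interaction term.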
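A linear model is fitted with lm() to check the effect of the builtAfter2000 dummy and the number of bedrooms on housing prices in Nanaimo. The bed * builtAfter2000 term adds an interaction, so the price of an extra bedroom is allowed to differ between houses built before 2000 and those built in or after 2000.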
mhousing <- lm(formula = price ~ builtAfter2000 + bed + bed * builtAfter2000, data = nanaimo)
summary(mhousing)
##
## Call:
## lm(formula = price ~ builtAfter2000 + bed + bed * builtAfter2000,
## data = nanaimo)
##
## Residuals:
## Min 1Q Median 3Q Max
## -391777 -130880 -37568 96451 622067
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 146423 48940 2.992 0.00306 **
## builtAfter20001 -16219 71996 -0.225 0.82196
## bed 60170 15164 3.968 9.55e-05 ***
## builtAfter20001:bed 50125 23646 2.120 0.03504 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 196400 on 242 degrees of freedom
## Multiple R-squared: 0.2192, Adjusted R-squared: 0.2095
## F-statistic: 22.65 on 3 and 242 DF, p-value: 5.881e-13
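In R's formula syntax, a * b already expands to a + b + a:b. Listing the main effects again (builtAfter2000 + bed + bed * builtAfter2000) therefore fits the same model as bed * builtAfter2000. Here is a quick check on the built-in mtcars data, used as a stand-in, with am converted to a factor dummy.

```r
# Stand-in data: built-in mtcars, with the 0/1 column am turned into a factor dummy
mt <- mtcars
mt$am_f <- factor(mt$am)

# Main effects written out plus the product, mirroring the model above
long_form <- lm(mpg ~ am_f + wt + wt * am_f, data = mt)
# Compact form: am_f * wt expands to am_f + wt + am_f:wt
short_form <- lm(mpg ~ am_f * wt, data = mt)

# Same terms, same estimates
coef(long_form)
coef(short_form)
```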
mhousing$coefficients
## (Intercept) builtAfter20001 bed builtAfter20001:bed
## 146422.46 -16218.79 60170.24 50124.49
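# Cost of an extra bedroom: bed slope alone (before 2000), bed slope plus interaction (2000 onwards)
bedBefore2000 <- coef(mhousing)["bed"]
bedAfter2000 <- sum(coef(mhousing)[c("bed", "builtAfter20001:bed")])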
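bedBefore2000 and bedAfter2000 are not printed above. The arithmetic behind them can be reproduced from the coefficients printed by mhousing$coefficients. For houses built before 2000, the price of an extra bedroom is the bed coefficient alone. For houses built in or after 2000, it is bed plus the builtAfter20001:bed interaction. This sketch hard-codes the printed, rounded values.

```r
# Coefficients copied from the printed output of mhousing$coefficients
b <- c(
  "(Intercept)"         = 146422.46,
  "builtAfter20001"     = -16218.79,
  "bed"                 = 60170.24,
  "builtAfter20001:bed" = 50124.49
)

# Extra bedroom, house built before 2000: just the bed slope
bed_before_2000 <- unname(b["bed"])
# Extra bedroom, house built in or after 2000: bed slope plus interaction
bed_after_2000 <- unname(b["bed"] + b["builtAfter20001:bed"])

c(before = bed_before_2000, after = bed_after_2000)
# before = 60170.24, after = 110294.73
```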
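A second model does the same for the number of bathrooms, using the builtAfter1995 dummy. It reuses the name mhousing, so the bedroom model above is overwritten.
mhousing <- lm(formula = price ~ builtAfter1995 + bath + bath * builtAfter1995, data = nanaimo)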
summary(mhousing)
##
## Call:
## lm(formula = price ~ builtAfter1995 + bath + bath * builtAfter1995,
## data = nanaimo)
##
## Residuals:
## Min 1Q Median 3Q Max
## -344914 -112674 -40624 72355 682391
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 136236 37324 3.650 0.000321 ***
## builtAfter19951 -39627 55714 -0.711 0.477612
## bath 96549 16394 5.889 1.29e-08 ***
## builtAfter19951:bath 40104 23425 1.712 0.088181 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 183900 on 242 degrees of freedom
## Multiple R-squared: 0.3152, Adjusted R-squared: 0.3067
## F-statistic: 37.13 on 3 and 242 DF, p-value: < 2.2e-16
mhousing$coefficients
## (Intercept) builtAfter19951 bath
## 136235.49 -39626.85 96548.93
## builtAfter19951:bath
## 40103.62
bathBefore1995 <- coef(mhousing)["bath"]
bathAfter1995 <- sum(coef(mhousing)[c("bath", "builtAfter19951:bath")])
# Cost of an extra bath
bathBefore1995
## bath
## 96548.93
bathAfter1995
## [1] 136652.6
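The observations made from the geom_point() and geom_smooth() plots, together with the lm() models on the selected variables, are:
An extra bath adds more to the predicted price of a house built in or after 1995 (about 136,653 per bath) than to one built before 1995 (about 96,549 per bath). Because the post-1995 line starts lower (builtAfter19951 = -39,627), post-1995 houses are only predicted to be pricier once they have at least one bath. The builtAfter19951:bath interaction is significant only at the 10% level (p ≈ 0.088), so this difference should be read with some caution.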
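The observation about extra baths can be checked against the two fitted lines of the bath model. In the printed coefficients, a house built in or after 1995 starts lower (builtAfter19951 = -39626.85) but gains 40103.62 more per bath. Its predicted price therefore overtakes that of an otherwise comparable pre-1995 house at just under one bath. This sketch hard-codes the printed, rounded coefficients.

```r
# Coefficients copied from the printed output of the bath model
intercept   <- 136235.49
shift_1995  <- -39626.85  # builtAfter19951
bath_slope  <- 96548.93   # bath
interaction <- 40103.62   # builtAfter19951:bath

# Predicted price as a function of the number of baths
price_before_1995 <- function(baths) intercept + bath_slope * baths
price_after_1995  <- function(baths) {
  (intercept + shift_1995) + (bath_slope + interaction) * baths
}

# Number of baths at which the two lines cross (about 0.988)
crossover <- -shift_1995 / interaction

# Gap between the lines at 1, 2 and 3 baths (post-1995 minus pre-1995)
gap <- price_after_1995(1:3) - price_before_1995(1:3)
gap
```

These are point predictions from the fitted lines only; they say nothing about uncertainty.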