Introduction

The assignment uses three variables from the Nanaimo dataset (area, number of bedrooms, and age) to explain house prices.

Creating a New Data Frame

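A new data frame was created containing the variables “price”, “area”, “bed” (number of bedrooms) and “age”. Rows with missing values were removed, and “age” was recoded into two groups, “after 2000” and “before 2000”.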

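# Store the URL of the data file in url
url <- "http://latul.be/mbaa_531/data/nanaimo.csv"

# Read nanaimo.csv and store all the observations in a data frame named nanaimo
nanaimo <- read.csv(url, stringsAsFactors = TRUE)

# Keep only the four variables of interest
nanaimo <- nanaimo[, c("price", "area", "bed", "age")]
# Drop rows that have a missing value in any of these variables
nanaimo <- na.omit(nanaimo)
# Recode age into two groups; values strictly greater than 2000 become "after 2000"
nanaimo$age <- ifelse(nanaimo$age > 2000, "after 2000", "before 2000")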
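The recoding rule is easy to misread at the boundary. A minimal sketch on a few hypothetical values (not taken from the dataset), assuming, as the threshold of 2000 suggests, that `age` holds a year: a value of exactly 2000 lands in "before 2000", and a missing value stays `NA`, which is why `na.omit()` runs before the recode.

```r
# Hypothetical age values chosen to show the boundary and missing cases
age <- c(1985, 2000, 2001, NA)

# Same rule as in the report: strictly greater than 2000 means "after 2000"
age_group <- ifelse(age > 2000, "after 2000", "before 2000")

# 2000 itself falls into "before 2000"; NA is passed through unchanged
print(age_group)
```

If the boundary year should count as "after 2000", the comparison would need `>=` instead of `>`.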

Dataset

head(nanaimo, n=4)
##   price area bed         age
## 2 19900  720   2 before 2000
## 3 25000  946   1  after 2000
## 5 32500  672   2 before 2000
## 6 33000  784   2 before 2000
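Before fitting a line per age group, it can help to check how many houses fall into each group, since a group with few observations gives a less stable fitted line. A minimal sketch that re-enters by hand only the four rows printed above; in practice `table(nanaimo$age)` would be run on the full data frame.

```r
# The four rows shown by head(nanaimo, n = 4), re-entered by hand
rows <- data.frame(
  price = c(19900, 25000, 32500, 33000),
  area  = c(720, 946, 672, 784),
  bed   = c(2, 1, 2, 2),
  age   = c("before 2000", "after 2000", "before 2000", "before 2000")
)

# Number of houses in each age group
counts <- table(rows$age)
print(counts)
```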
library(ggplot2)

# Scatterplot of price against area, coloured by age group,
# with a separate least-squares line for each group
ggplot(data = nanaimo, mapping = aes(x = area, y = price, colour = age)) +
  geom_point(alpha = 0.6) +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE)
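Because `colour = age` splits the points into groups, `geom_smooth(method = "lm")` fits one straight line of price on area for each age group. The sketch below reproduces that per-group fit with `lm()`. It uses a small hypothetical data frame (made-up numbers, not Nanaimo data) in which each group lies exactly on a known line, so the recovered coefficients can be checked.

```r
# Hypothetical toy data: each age group follows an exact linear rule
toy <- data.frame(
  area = c(700, 800, 900, 1000, 700, 800, 900, 1000),
  age  = rep(c("before 2000", "after 2000"), each = 4)
)
toy$price <- ifelse(toy$age == "before 2000",
                    10000 + 30 * toy$area,
                    20000 + 40 * toy$area)

# Fit price ~ area separately within each age group,
# mirroring what geom_smooth() draws for each colour
fits <- lapply(split(toy, toy$age), function(d) lm(price ~ area, data = d))

# Columns are age groups; rows are the intercept and the slope on area
coefs <- sapply(fits, coef)
print(coefs)
```

On the real data the same `lapply(split(nanaimo, nanaimo$age), ...)` call would give the intercept and slope of each line in the plot.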

Conclusion

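The graph suggests that house prices are associated with both the area and the age of a building: price varies with area, and the fitted lines differ between the two age groups. Because the plot is descriptive, it shows an association rather than a causal effect.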
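The plot does not separate the contribution of each variable, and it leaves out the number of bedrooms. One way to estimate all three together, offered here as an assumption rather than the assignment's method, is a multiple linear regression; on the real data the call would be `lm(price ~ area + bed + age, data = nanaimo)`. To stay self-contained, the sketch uses hypothetical numbers generated from a known rule, so the estimated coefficients can be checked against it.

```r
# Hypothetical data: price is an exact linear function of the three variables
toy <- data.frame(
  area = c(700, 800, 950, 1100, 1200, 900),
  bed  = c(1, 2, 2, 3, 2, 3),
  age  = factor(c("before 2000", "before 2000", "after 2000",
                  "before 2000", "after 2000", "after 2000"),
                levels = c("before 2000", "after 2000"))
)
toy$price <- 5000 + 35 * toy$area + 2000 * toy$bed +
  8000 * (toy$age == "after 2000")

# Multiple regression; "before 2000" is the reference level for age,
# so the age coefficient is the price difference for "after 2000"
fit <- lm(price ~ area + bed + age, data = toy)
print(coef(fit))
```

On the real data, `summary(fit)` would add standard errors and p-values for each coefficient.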