This assignment uses three variables from the Nanaimo dataset to explain house prices.
A new data frame was created containing the following variables: "price", "area", "bed" (number of bedrooms) and "age".
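# Store the dataset URL in the variable url
url <- "http://latul.be/mbaa_531/data/nanaimo.csv"
# Read nanaimo.csv and store all observations in a data.frame named nanaimo
nanaimo <- read.csv(url, stringsAsFactors = TRUE)
# Keep only the response (price) and the three explanatory variables
nanaimo <- nanaimo[, c("price", "area", "bed", "age")]
# Drop rows with missing values in any of the four columns
nanaimo <- na.omit(nanaimo)
# Recode age into two groups: values above 2000 become "after 2000", all others "before 2000"
nanaimo$age <- ifelse(nanaimo$age > 2000, "after 2000", "before 2000")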
head(nanaimo, n=4)
## price area bed age
## 2 19900 720 2 before 2000
## 3 25000 946 1 after 2000
## 5 32500 672 2 before 2000
## 6 33000 784 2 before 2000
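The recoding rule used in the data preparation can be checked on a few hand-picked values. The sketch below uses a made-up vector (`years`, which is not taken from nanaimo.csv) and assumes, as the threshold suggests, that `age` holds a construction year. Note that a value of exactly 2000 lands in the "before 2000" group.

```r
# Hypothetical values chosen only to exercise the threshold; not from the dataset.
years <- c(1995, 2000, 2001, 2010)

# Same rule as in the data preparation: strictly greater than 2000 is "after 2000".
group <- ifelse(years > 2000, "after 2000", "before 2000")

# Show each value next to its assigned group.
print(data.frame(years, group))
```

If homes built in 2000 should count as "after 2000", the comparison would need to be `>=` instead of `>`.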
library(ggplot2)
# Scatter plot of price against area, with one fitted linear trend (y ~ x) per age group
ggplot(data = nanaimo, mapping = aes(x = area, y = price, colour = age)) +
  geom_point(alpha = .6) +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE)
The graph suggests that both the area and the age group of a building are associated with house prices: price varies with area, and the fitted lines differ between the two age groups.
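One way to put numbers on this visual impression is a linear regression of price on the three explanatory variables. The following is only a minimal sketch: the data frame `toy` and all of its values are invented for illustration and are not rows from nanaimo.csv, and the additive model form is an assumption rather than the specification used in the assignment.

```r
# Toy data invented purely for illustration; these are NOT rows from nanaimo.csv.
toy <- data.frame(
  price = c(150000, 210000, 180000, 320000, 275000, 390000, 240000, 305000),
  area  = c(800, 1100, 950, 1600, 1400, 2000, 1200, 1500),
  bed   = c(2, 3, 2, 4, 3, 4, 3, 3),
  age   = factor(c("before 2000", "before 2000", "before 2000", "after 2000",
                   "before 2000", "after 2000", "after 2000", "after 2000"))
)

# Additive linear model: price explained by area, bedrooms and age group.
fit <- lm(price ~ area + bed + age, data = toy)

# Coefficient table; the age row is the shift relative to the baseline level,
# which is "after 2000" because factor levels are sorted alphabetically.
print(summary(fit)$coefficients)
```

With the real data, replacing `toy` by `nanaimo` in the `lm()` call would fit the same model form. Writing `area * age` instead of `area + age` would let the slope on area differ by age group, which is what the separate lines in the plot correspond to.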