Let’s load the houseprice data.
For this practice, I will use the houseprices dataset. This dataset includes the floor area, the price, and the number of bedrooms for a sample of houses sold in Aranda in 1999. Aranda is a suburb of Canberra, Australia.
# Packages used in this practice
list.of.packages <- c("Stat2Data", "datasets", "boot", "HistData")
# Install only the packages that are not already installed
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
library(HistData)
library(Stat2Data)
houseprices <- read.csv("houseprices.csv")
dim(houseprices)
## [1] 15 3
This dataset is at the house level (1 observation = 1 house). It has 15 observations and 3 variables. I used R code to compute these numbers and display them inline (i.e. calling dim(houseprices)[.]), but in case you don’t believe them, you can verify them with a call to str().
str(houseprices)
## 'data.frame': 15 obs. of 3 variables:
## $ area : int 694 905 802 1366 716 963 821 714 1018 887 ...
## $ bedrooms : int 4 4 4 4 4 4 4 4 4 4 ...
## $ sale.price: num 192 215 215 274 113 ...
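The observation and variable counts quoted above can be computed rather than typed by hand. The following is a minimal sketch, not the code behind this document: `demo` is a stand-in data frame built from the first five rows as printed by str() above (with sale.price rounded as displayed), and `n_obs` and `n_vars` are names chosen here for illustration. With the real data you would use houseprices in place of `demo`.

```r
# Stand-in for houseprices: the first five rows as printed by str() above.
# sale.price is rounded as displayed, so this is for illustration only.
demo <- data.frame(
  area       = c(694L, 905L, 802L, 1366L, 716L),
  bedrooms   = c(4L, 4L, 4L, 4L, 4L),
  sale.price = c(192, 215, 215, 274, 113)
)

n_obs  <- nrow(demo)  # same value as dim(demo)[1]
n_vars <- ncol(demo)  # same value as dim(demo)[2]

cat("The data frame has", n_obs, "observations and", n_vars, "variables.\n")
```

In an R Markdown document, the same expressions can be placed in inline code (a backtick, the letter r, the expression, and a closing backtick) so that the text updates whenever the data change.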
As the str() output above shows, this dataset has the variables we need to examine the determinants of house prices (sale price, floor area, and number of bedrooms), which makes it well suited to this exercise.
summary(houseprices)
## area bedrooms sale.price
## Min. : 694.0 Min. :4.000 Min. :112.7
## 1st Qu.: 743.5 1st Qu.:4.000 1st Qu.:213.5
## Median : 821.0 Median :4.000 Median :221.5
## Mean : 889.3 Mean :4.333 Mean :237.7
## 3rd Qu.: 984.5 3rd Qu.:4.500 3rd Qu.:267.0
## Max. :1366.0 Max. :6.000 Max. :375.0
I use inline R syntax to compute the minimum, median, and maximum of “sale.price”: 112.7, 221.5, and 375.0, respectively.
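The three sale.price statistics can also be collected in one named vector, which is convenient for inline reporting. This is a hedged sketch: `demo_price` holds only the first five sale prices as printed (rounded) in the str() output, so its results differ from the full-data summary above, and `price_stats` is a name introduced here for illustration. With the real data you would pass houseprices$sale.price instead.

```r
# First five sale prices as printed (rounded) by str(); illustration only.
demo_price <- c(192, 215, 215, 274, 113)

# Collect the three statistics in one named vector.
price_stats <- c(
  min    = min(demo_price),
  median = median(demo_price),
  max    = max(demo_price)
)

print(round(price_stats, 1))
```

Individual values can then be pulled out by name, for example price_stats[["median"]].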
Here’s a pairs plot of the data.
pairs(houseprices)
Here’s a regression model of sale price on area and bedrooms.
fit <- lm(sale.price ~ area + bedrooms, data = houseprices)
summary(fit)
##
## Call:
## lm(formula = sale.price ~ area + bedrooms, data = houseprices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -80.897 -4.247 1.539 13.249 42.027
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -141.76132 67.87204 -2.089 0.05872 .
## area 0.14255 0.04697 3.035 0.01038 *
## bedrooms 58.32375 14.75962 3.952 0.00192 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.06 on 12 degrees of freedom
## Multiple R-squared: 0.731, Adjusted R-squared: 0.6861
## F-statistic: 16.3 on 2 and 12 DF, p-value: 0.0003792
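Both the number of bedrooms and the floor area are statistically significant, at the 1% and 5% levels respectively (p = 0.0019 for bedrooms and p = 0.0104 for area). Holding the other predictor fixed, each additional bedroom is associated with a sale price about 58.3 units higher, and each additional unit of floor area with a sale price about 0.14 units higher. The intercept is not significant at the 5% level (p = 0.059).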
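The significance statement can be checked mechanically against the coefficient table. The sketch below copies the estimates and p-values from the summary(fit) output above into a small data frame; with a fitted model, the same table is available from coef(summary(fit)). The column names `sig_5pct` and `sig_1pct`, and the example house (floor area 900, 4 bedrooms), are assumptions made here for illustration.

```r
# Estimates and p-values copied from the summary(fit) output above.
coefs <- data.frame(
  term     = c("(Intercept)", "area", "bedrooms"),
  estimate = c(-141.76132, 0.14255, 58.32375),
  p_value  = c(0.05872, 0.01038, 0.00192)
)

# Flag each term at the conventional 5% and 1% thresholds.
coefs$sig_5pct <- coefs$p_value < 0.05
coefs$sig_1pct <- coefs$p_value < 0.01
print(coefs)

# Fitted sale price for a hypothetical house: floor area 900, 4 bedrooms.
b <- setNames(coefs$estimate, coefs$term)
pred <- b[["(Intercept)"]] + b[["area"]] * 900 + b[["bedrooms"]] * 4
print(pred)
```

The flags show bedrooms passing both thresholds and area passing only the 5% threshold, in line with the statement above. With the fitted model at hand, predict(fit, newdata = data.frame(area = 900, bedrooms = 4)) would give the same fitted value up to rounding of the printed coefficients.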