Boston Housing Data

Find an interesting dataset and prepare a short report (in R Markdown) that consists of:
- a short description of the dataset,
- 3 scatterplots that present interesting relationships between variables,
- brief comments that describe the results obtained.

Then edit the theme and all scales of the graphs to prepare publication-ready plots.

Introduction

The Boston Housing Data records house prices in the suburbs of Boston. The median home value, MEDV, is the dependent variable. It may depend on some or all of the other variables in the dataset, such as the local crime rate, accessibility (distance to employment centres and highways), and pollution levels.
The Boston Housing Data ships with the MASS library for R.

Source: MASS library
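As a minimal sketch of loading the copy bundled with MASS instead of a local file: the object name `boston_mass` is hypothetical, and the renaming step assumes the goal is to match the uppercase column headers used in this report (MASS stores the names in lowercase and calls the B column `black`).

```r
# Load the copy of the data bundled with MASS (requires the MASS package).
data(Boston, package = "MASS")

# MASS uses lowercase names and calls the B column `black`;
# rename them to match the uppercase headers used in this report.
boston_mass <- Boston                       # hypothetical name, for illustration
names(boston_mass) <- toupper(names(boston_mass))
names(boston_mass)[names(boston_mass) == "BLACK"] <- "B"

dim(boston_mass)   # number of rows and columns
```

Loading from the package avoids depending on a machine-specific file path, at the cost of small differences in column names and types compared with the CSV.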

Data Variables

1. CRIM | per capita crime rate by town
2. ZN | proportion of residential land zoned for lots over 25,000 sq. ft
3. INDUS | proportion of non-retail business acres per town
4. CHAS | Charles River dummy variable (1 if tract bounds river; else 0)
5. NOX | nitric oxides concentration (parts per 10 million)
6. RM | average number of rooms per dwelling
7. AGE | proportion of owner-occupied units built prior to 1940
8. DIS | weighted distances to five Boston employment centres
9. RAD | index of accessibility to radial highways
10. TAX | full-value property-tax rate per $10,000
11. PTRATIO | pupil-teacher ratio by town
12. B | 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT | % lower status of the population
14. MEDV | median value of owner-occupied homes in $1000s

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
# Read the CSV with ';' as the field separator and ',' as the decimal mark
boston <- read.csv('/Users/jeank4723/Desktop/Advance VR/1/Data/boston.csv', header = TRUE, dec = ',', sep = ';')

head(boston)

##      CRIM ZN INDUS CHAS   NOX    RM  AGE    DIS RAD TAX PTRATIO      B LSTAT
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03
## 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94
## 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33
## 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21
##   MEDV
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7

str(boston)

## 'data.frame':    506 obs. of  14 variables:
##  $ CRIM   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ ZN     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ INDUS  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ CHAS   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ NOX    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ RM     : num  6.58 6.42 7.18 7 7.15 ...
##  $ AGE    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ DIS    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ RAD    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ TAX    : int  296 242 242 222 222 222 311 311 311 311 ...
##  $ PTRATIO: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ B      : num  397 397 393 395 397 ...
##  $ LSTAT  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ MEDV   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

3 Scatterplots

1. TAX and CRIM. Most towns have a low per capita crime rate. However, where the full-value property-tax rate per $10,000 exceeds 650, the per capita crime rate rises sharply, reaching values above 75.

p1 <- ggplot(data = boston, aes(x = TAX,
                           y = CRIM
                           ))
p1 + 
  geom_point() +
  theme_classic()
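Because most CRIM values sit close to zero while a few are very large, a logarithmic y-axis may make the pattern easier to read. The sketch below is one possible variant, not the plot used in this report; it assumes the MASS copy of the data (lowercase column names), and the names `crime` and `p1_log` are hypothetical.

```r
library(ggplot2)

# Assumption: MASS::Boston stands in for the CSV; its columns are lowercase.
crime <- MASS::Boston

# Same TAX vs CRIM scatterplot, with the crime rate on a log10 scale.
# All crim values are strictly positive, so the log transform is defined.
p1_log <- ggplot(crime, aes(x = tax, y = crim)) +
  geom_point(alpha = 0.6) +
  scale_y_log10() +
  labs(x = "Full-value property-tax rate per $10,000",
       y = "Per capita crime rate (log scale)") +
  theme_classic()

p1_log
```

The log scale spreads out the dense cluster of low-crime towns, which a linear axis compresses against the bottom of the panel.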

2. LSTAT and CRIM. The plot suggests a weak positive relationship between the crime rate and the percentage of lower-status population. Once the lower-status share exceeds 10%, the per capita crime rate rises rapidly.

p2 <- ggplot(data = boston, aes(x = LSTAT,
                           y = CRIM,
                           color = TAX
                           ))
p2 + geom_point()

3. TAX and RM. The plot shows that the average number of rooms per dwelling and the full-value property-tax rate per $10,000 are only weakly related. In other words, a high number of rooms is not associated with a high full-value property-tax rate.

library(RColorBrewer)
p3 <- ggplot(data = boston, aes(x = TAX,
                           y = RM,
                           color = ZN
                           ))
p3 + 
  geom_point() +
  labs(x = "full-value property-tax rate per $10,000",
       y = "average number of rooms per dwelling",
       title = "TAX and Average Number of rooms relation")+
  theme(axis.title = element_text(color = "black"), 
    panel.background = element_rect(fill = "gray")) +
  guides(color = guide_legend(label.theme = element_text(size = 10,
                                                         colour = "brown",
                                                         angle = 0),
                              label.position = "left"))
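A visual impression of a weak relationship can be cross-checked with a correlation coefficient. The following is a minimal sketch, assuming the MASS copy of the data (lowercase column names); the names `tax_rm`, `pearson`, and `spearman` are hypothetical.

```r
# Assumption: MASS::Boston stands in for the CSV used in this report.
tax_rm <- MASS::Boston[, c("tax", "rm")]

# Pearson measures linear association; Spearman is rank-based and
# less sensitive to the cluster of towns with very high tax rates.
pearson  <- cor(tax_rm$tax, tax_rm$rm)
spearman <- cor(tax_rm$tax, tax_rm$rm, method = "spearman")

round(c(pearson = pearson, spearman = spearman), 2)
```

Coefficients close to zero would support the reading of the scatterplot, while values further from zero would suggest the relationship deserves a closer look.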

Boston Housing Data

Min-Jhen Wu

2021/10/21
