Find an interesting dataset and prepare a short report (in R Markdown) consisting of:
* a short description of the dataset,
* 3 scatterplots presenting interesting relationships between variables,
* brief comments describing the obtained results.
Then edit the theme and all scales of the graphs to prepare publication-ready plots.
The Boston Housing data describe house prices in the suburbs of Boston. The median value variable ‘medv’ is the dependent variable; it may depend on some or all of the other variables in the dataset, such as the crime rate in the vicinity, accessibility in terms of distance, and pollution levels.
The Boston Housing data ship with the MASS package.
Source: Library
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
# Semicolon-separated file with decimal commas; adjust the path to your local copy
boston <- read.csv('/Users/jeank4723/Desktop/Advance VR/1/Data/boston.csv', header = TRUE, dec = ',', sep = ';')
head(boston)
## CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
## MEDV
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
str(boston)
## 'data.frame': 506 obs. of 14 variables:
## $ CRIM : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ ZN : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ INDUS : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ CHAS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ NOX : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ RM : num 6.58 6.42 7.18 7 7.15 ...
## $ AGE : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ DIS : num 4.09 4.97 4.97 6.06 6.06 ...
## $ RAD : int 1 2 2 3 3 3 5 5 5 5 ...
## $ TAX : int 296 242 242 222 222 222 311 311 311 311 ...
## $ PTRATIO: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ B : num 397 397 393 395 397 ...
## $ LSTAT : num 4.98 9.14 4.03 2.94 5.33 ...
## $ MEDV : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
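Since the dataset also ships with MASS, the same data can probably be obtained without a local CSV file. The sketch below is one way to do that. It assumes the MASS copy matches the CSV used in this report; `boston_mass` is a name introduced here for illustration, and the column names are upper-cased only to mimic the CSV (MASS calls the twelfth column `black` rather than `B`).

```r
# Sketch: load the Boston data from MASS instead of a local CSV file.
# Assumption: the MASS copy holds the same values as the CSV read above.
data("Boston", package = "MASS")

boston_mass <- Boston

# MASS uses lower-case names; upper-case them to mimic the CSV columns
names(boston_mass) <- toupper(names(boston_mass))

# MASS names this column `black`; the CSV calls it `B`
names(boston_mass)[names(boston_mass) == "BLACK"] <- "B"

dim(boston_mass)
str(boston_mass)
```

With this variant the report no longer depends on an absolute path, which makes it easier to knit on another machine.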
# Plot 1: per-capita crime rate (CRIM) against property-tax rate (TAX)
p1 <- ggplot(data = boston, aes(x = TAX, y = CRIM))
p1 +
  geom_point() +
  theme_classic()
2. LSTAT and CRIM. According to the plot, the per-capita crime rate increases slightly with the percentage of lower-status population. Once the lower-status share exceeds about 10%, the per-capita crime rate by town rises rapidly.
# Plot 2: CRIM against LSTAT, with points colored by TAX
p2 <- ggplot(data = boston, aes(x = LSTAT, y = CRIM, color = TAX))
p2 + geom_point()
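The visual impression from the LSTAT and CRIM plot can be backed up with a simple group comparison. The sketch below splits towns at the 10% LSTAT threshold mentioned in the comment and compares the typical crime rate on each side. It uses the MASS copy of the data (lower-case column names), which is assumed to match the CSV; `lstat_group`, `crim_median` and `crim_mean` are names introduced here for illustration.

```r
# Assumption: MASS::Boston holds the same data as the CSV (lower-case names)
data("Boston", package = "MASS")

# Split towns at the 10% lower-status threshold
lstat_group <- ifelse(Boston$lstat > 10, "LSTAT > 10%", "LSTAT <= 10%")

# Typical per-capita crime rate within each group
crim_median <- tapply(Boston$crim, lstat_group, median)
crim_mean   <- tapply(Boston$crim, lstat_group, mean)

print(rbind(median = crim_median, mean = crim_mean))
```

The median is reported next to the mean because CRIM is strongly right-skewed, so a few high-crime towns can dominate the mean.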
3. The plot shows that the average number of rooms per dwelling and the full-value property-tax rate per $10,000 are only weakly related: a high number of rooms is not associated with a high property-tax rate.
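A correlation coefficient gives a rough numerical check of the weak relationship described above. This is a minimal sketch using the MASS copy of the data (lower-case column names), assumed to match the CSV; `r_pearson` and `r_spearman` are names introduced here for illustration.

```r
# Assumption: MASS::Boston holds the same data as the CSV (lower-case names)
data("Boston", package = "MASS")

# Linear (Pearson) and rank-based (Spearman) correlation between RM and TAX
r_pearson  <- cor(Boston$rm, Boston$tax)
r_spearman <- cor(Boston$rm, Boston$tax, method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))
```

Values close to zero support the reading that the two variables are only weakly related. Spearman is included because TAX takes few distinct values and is far from normally distributed.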
library(RColorBrewer)
# Plot 3: RM against TAX, with points colored by ZN
p3 <- ggplot(data = boston, aes(x = TAX, y = RM, color = ZN))
p3 +
  geom_point() +
  labs(x = "Full-value property-tax rate per $10,000",
       y = "Average number of rooms per dwelling",
       title = "TAX and average number of rooms") +
  theme(axis.title = element_text(color = "black"),
        panel.background = element_rect(fill = "gray")) +
  # Legend labels: size 10, brown, placed to the left of the keys
  guides(color = guide_legend(label.theme = element_text(size = 10,
                                                          colour = "brown",
                                                          angle = 0),
                              label.position = "left"))
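The task also asks for the theme and all scales to be edited. One possible publication-ready variant of the third plot is sketched below. It is an illustration, not the only reasonable styling: the break positions, palette, and font size are choices made here, `p_pub` is a hypothetical name, and the MASS copy of the data (lower-case column names) is assumed to match the CSV.

```r
library(ggplot2)

# Assumption: MASS::Boston holds the same data as the CSV (lower-case names)
data("Boston", package = "MASS")

p_pub <- ggplot(Boston, aes(x = tax, y = rm, color = zn)) +
  geom_point(alpha = 0.7, size = 2) +
  # Explicit x, y and color scales with readable names and breaks
  scale_x_continuous(name = "Full-value property-tax rate per $10,000",
                     breaks = seq(200, 700, by = 100)) +
  scale_y_continuous(name = "Average number of rooms per dwelling",
                     breaks = 3:9) +
  scale_color_distiller(name = "Residential land zoned\nfor large lots (%)",
                        palette = "YlGnBu", direction = 1) +
  labs(title = "Property-tax rate and average number of rooms") +
  # Clean theme with a bold title and the legend on the right
  theme_classic(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "right")

p_pub
```

A continuous color scale is used here because ZN is numeric. The plot object can then be written to disk with `ggsave()` at the size and resolution a journal requires.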