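For this project I am using the following dataset: “Hedonic Prices of Census Tracts in Boston”. Dataset documentation is available at: http://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Hedonic.html. Note that the code below loads the MASS `Boston` CSV, a version of the same Boston housing data, so the column names used here (for example `crim` and `medv`) follow that file.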
library(RCurl)  # package name is case-sensitive: RCurl, not Rcurl
library(ggplot2)
url<-"http://vincentarelbundock.github.io/Rdatasets/csv/MASS/Boston.csv"
rawdata <- read.table(url, header=TRUE, sep=",")
knitr::kable(head(rawdata))
| X | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |
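If the URL above is unreachable, the same table can probably be obtained offline. The following is a minimal sketch, assuming the `MASS` package is installed (it is normally bundled with R) and that its `Boston` data frame is what the CSV was exported from. Unlike the CSV, it has no row-index column `X`. The name `boston_local` is illustrative only and does not appear in the original analysis.

```r
# Assumption: the MASS package is installed; MASS::Boston holds the same
# columns as the CSV, minus the row-index column X.
boston_local <- MASS::Boston  # hypothetical name, used only in this sketch

# Quick structural checks before plotting
dim(boston_local)    # number of rows and columns
str(boston_local)    # column types
head(boston_local)   # first six census tracts
```

To reuse the plotting code unchanged, one could assign `rawdata <- boston_local`.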
Crime rate versus proportion of owner-occupied units built prior to 1940. The following scatterplot shows that crime rates tend to be higher in areas where a larger proportion of units were built prior to 1940.
ggplot(rawdata, aes(x=age, y=crim)) + geom_point() +xlab('Proportion of owner units built prior to 1940') +ylab('Crime Rate')
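The visual impression from this scatterplot can be backed by a number. As a rough sketch, a Spearman rank correlation summarises whether crime tends to rise with the share of older units without assuming a straight-line relationship. The choice of Spearman and the use of `MASS::Boston` as a stand-in for `rawdata` are assumptions of this example, not part of the original analysis.

```r
# Assumption: MASS::Boston stands in for rawdata (same columns, no X index).
boston <- MASS::Boston

# Rank correlation between share of pre-1940 units and per-capita crime rate.
# A positive value means crime tends to increase with the share of old units.
age_crim <- cor(boston$age, boston$crim, method = "spearman")
age_crim
```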
Crime rate versus weighted distance to five employment centers in the Boston area. The following scatterplot shows that the crime rate is generally lower farther from Boston's five employment centers, and that the highest crime rates occur within about 2.5 miles of them.
ggplot(rawdata, aes(x=dis, y=crim)) + geom_point() +xlab('weighted distances to five employment centers in the Boston area') + ylab('Crime Rate')
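One way to check the 2.5 threshold read off the plot is to compare tracts on either side of it and to look at where the highest-crime tracts sit. This is only a sketch: the cutoff is the value quoted in the text, `MASS::Boston` stands in for `rawdata`, and the names `near` and `top10` are hypothetical.

```r
# Assumption: MASS::Boston stands in for rawdata.
boston <- MASS::Boston

# TRUE for tracts closer than the 2.5 cutoff quoted in the text
near <- boston$dis < 2.5

# Median crime rate for far (FALSE) and near (TRUE) tracts
median_by_near <- tapply(boston$crim, near, median)
median_by_near

# The ten tracts with the highest crime rate, with their weighted distance
top10 <- boston[order(-boston$crim), c("crim", "dis")][1:10, ]
top10
```

If the claim holds, the `TRUE` median should exceed the `FALSE` median and the `dis` values in `top10` should mostly fall below the cutoff.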
The following histogram shows the distribution of the median value of owner-occupied homes.
ggplot(rawdata, aes(x=medv)) + geom_histogram(binwidth=4) +xlab('median value of owner-occupied homes')
The following histogram shows the distribution of the proportion of units that were built prior to 1940.
ggplot(rawdata, aes(x=age)) + geom_histogram(binwidth=4) + xlab('Proportion of owner units built prior to 1940')
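The following boxplot shows the relationship of median value to crime. Because the crime rate is continuous, it is binned into four equal-sized groups so that each group gets its own box.
ggplot(rawdata, aes(x=cut_number(crim, 4), y=medv)) + geom_boxplot() + xlab('Crime rate (quartile bins)') + ylab('Median value of owner-occupied homes')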
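The following scatter plot shows that areas with rad = 24 have the highest crime rates. Here rad is the index of accessibility to radial highways (an index, not a highway number), and 24 is its highest value. Point shape encodes chas, which indicates whether the tract bounds the Charles River.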
ggplot(rawdata, aes(x=age, y=crim, shape=as.factor(chas), colour=as.factor(rad))) + geom_point() + xlab('Proportion of owner units built prior to 1940') + ylab('Crime')
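Since colour differences among the values of `rad` can be hard to judge by eye, a per-group summary may help. The sketch below computes the median crime rate for each `rad` value with `aggregate()`. Using the median rather than the mean is a choice made for this example, and `MASS::Boston` again stands in for `rawdata`.

```r
# Assumption: MASS::Boston stands in for rawdata.
boston <- MASS::Boston

# Median per-capita crime rate for each value of the highway-access index
crim_by_rad <- aggregate(crim ~ rad, data = boston, FUN = median)

# Sort so the rad value with the highest median crime rate comes first
crim_by_rad <- crim_by_rad[order(-crim_by_rad$crim), ]
crim_by_rad
```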
The following scatter plots show the relationship of the median value of owner-occupied homes to the number of rooms, faceted by rad (index of accessibility to radial highways) and coloured by chas.
ggplot(rawdata, aes(x=rm, y=medv, colour=as.factor(chas))) + geom_point() + facet_grid( rad ~ .) + xlab('Number of Rooms') + ylab('Median value of owner-occupied homes')