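For this project I am using the following dataset: “Hedonic Prices of Census Tracts in Boston”. Dataset documentation is available at: http://vincentarelbundock.github.io/Rdatasets/doc/Ecdat/Hedonic.html. The CSV loaded below is the closely related `Boston` data frame from the MASS package, so its column names (shown in the table below) follow the MASS documentation.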

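library(ggplot2)  # used for all plots below
# read.csv() reads directly from a URL, so RCurl (misspelled "Rcurl" before) is not needed
url <- "http://vincentarelbundock.github.io/Rdatasets/csv/MASS/Boston.csv"
rawdata <- read.csv(url, header = TRUE)
knitr::kable(head(rawdata))  # preview the first six rows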
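| X | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |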
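If the Rdatasets URL is unreachable, the same data can be loaded offline. The sketch below assumes the MASS package is installed (it ships with standard R distributions) and that the CSV is a direct export of `MASS::Boston`; the `X` column is recreated here as a plain row index to match the table above.

```r
# Offline alternative (assumption: the CSV mirrors MASS::Boston exactly)
rawdata <- MASS::Boston

# Recreate the row-index column X that the CSV carries
rawdata <- cbind(X = seq_len(nrow(rawdata)), rawdata)

# 506 census tracts; X plus the 14 documented variables
print(dim(rawdata))
print(head(rawdata))
```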

Scatterplot of crime rate against the proportion of owner-occupied units built prior to 1940. It suggests that crime rates are highest in tracts where most owner-occupied units were built before 1940, while tracts with few pre-1940 units have consistently low crime rates.

ggplot(rawdata, aes(x=age, y=crim)) + geom_point() +xlab('Proportion of owner units built prior to 1940') +ylab('Crime Rate')
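To put a number on the pattern in this scatterplot, the sketch below computes a rank correlation and compares median crime rates on either side of a cutoff. It assumes MASS is installed and uses `MASS::Boston` (the same columns as the CSV) so that it runs without network access. The 50% cutoff is an arbitrary illustrative choice, not something taken from the dataset documentation.

```r
rawdata <- MASS::Boston

# Spearman's rank correlation, used because crim is strongly right-skewed
rho <- cor(rawdata$age, rawdata$crim, method = "spearman")

# Median crime rate below vs. at or above a hypothetical cutoff of
# 50% owner-occupied units built before 1940
age_group <- ifelse(rawdata$age >= 50, "age >= 50", "age < 50")
crime_by_age <- tapply(rawdata$crim, age_group, median)

print(round(rho, 2))
print(crime_by_age)
```

Medians are used rather than means because a handful of tracts with very high crime rates would dominate a mean.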

Scatterplot of crime rate against the weighted distance to five Boston employment centers. The crime rate is generally lower farther from the employment centers, and the highest crime rates occur at weighted distances below about 2.5.

ggplot(rawdata, aes(x=dis, y=crim)) + geom_point() +xlab('weighted distances to five employment centers in the Boston area') + ylab('Crime Rate')
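The 2.5 threshold read off this scatterplot can be checked numerically. This is a sketch assuming MASS is installed (`MASS::Boston` has the same columns as the CSV); the "high crime" cutoff of `crim > 10` is a hypothetical choice made only for illustration.

```r
rawdata <- MASS::Boston

# Split tracts at the weighted distance of 2.5 seen in the plot
near <- rawdata$dis < 2.5
crime_by_dist <- tapply(rawdata$crim,
                        ifelse(near, "dis < 2.5", "dis >= 2.5"),
                        median)

# Among tracts with crim > 10 (hypothetical "high crime" cutoff),
# the share lying at a weighted distance below 2.5
share_near <- mean(near[rawdata$crim > 10])

print(crime_by_dist)
print(round(share_near, 2))
```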

The following histogram shows the distribution of the median value of owner-occupied homes (medv, in $1000s).

ggplot(rawdata, aes(x=medv)) + geom_histogram(binwidth=4) +xlab('median value of owner-occupied homes')
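A numeric summary complements the histogram. The sketch below assumes MASS is installed and uses `MASS::Boston`. It also counts how many tracts share the maximum value; a pile-up at the top of the range is commonly attributed to the values having been capped, which is worth keeping in mind when reading the rightmost bar.

```r
rawdata <- MASS::Boston

# Five-number summary plus mean of medv (in $1000s)
print(summary(rawdata$medv))

# How many tracts sit exactly at the maximum value
top <- max(rawdata$medv)
n_at_top <- sum(rawdata$medv == top)
cat("maximum:", top, "| tracts at the maximum:", n_at_top, "\n")
```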

The following histogram shows the distribution, across tracts, of the proportion of owner-occupied units built prior to 1940 (age).

ggplot(rawdata, aes(x=age)) + geom_histogram(binwidth=4) + xlab('Proportion of owner units built prior to 1940')
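The same distribution can be tabulated directly. This sketch assumes MASS is installed and uses `MASS::Boston`; it bins age into 4-point intervals starting at 0, which may not match the exact bin edges ggplot2 chooses for `binwidth=4`, and reports the share of tracts where at least 90% of owner-occupied units predate 1940.

```r
rawdata <- MASS::Boston

# Counts in 4-point bins (0, 4], (4, 8], ..., (96, 100]
age_bins <- cut(rawdata$age, breaks = seq(0, 100, by = 4))
age_counts <- table(age_bins)

# Share of tracts where at least 90% of owner-occupied units are pre-1940
share_old <- mean(rawdata$age >= 90)

print(age_counts)
print(round(share_old, 2))
```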

The following boxplot shows how the median value of owner-occupied homes varies with the crime rate.

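# crim is continuous, so bin it into quartiles to get one box per bin;
# with a continuous x and no group, geom_boxplot() draws a single box
ggplot(rawdata, aes(x=cut_number(crim, 4), y=medv)) + geom_boxplot() + xlab('Crime rate (quartile bins)') + ylab('Median value of owner-occupied homes')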
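The medians behind a quartile-binned boxplot can be computed in base R. This sketch assumes MASS is installed and uses `MASS::Boston`; it builds the crime-rate quartiles with `quantile()` and `cut()`, which should give the same four groups as `cut_number(crim, 4)` up to how the interval labels are printed.

```r
rawdata <- MASS::Boston

# Quartile breaks of the crime rate; include.lowest keeps the minimum
breaks <- quantile(rawdata$crim, probs = seq(0, 1, by = 0.25))
crime_q <- cut(rawdata$crim, breaks = breaks, include.lowest = TRUE)

# Median home value (in $1000s) within each crime-rate quartile
medv_by_crime <- tapply(rawdata$medv, crime_q, median)
print(medv_by_crime)
```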

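The following scatterplot colours each tract by rad, an index of accessibility to radial highways (not a highway number), and uses point shape for chas, the Charles River dummy variable. Tracts with rad = 24, the highest value of the index, have the highest crime rates.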

ggplot(rawdata, aes(x=age, y=crim, shape=as.factor(chas), colour=as.factor(rad))) + geom_point() + xlab('Proportion of owner units built prior to 1940') + ylab('Crime')
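The rad = 24 pattern can be confirmed by summarising crime rate per level of the index. This is a sketch assuming MASS is installed (`MASS::Boston` has the same columns as the CSV); it also counts tracts per level, since a group's median is only as reliable as the number of tracts behind it.

```r
rawdata <- MASS::Boston

# Median crime rate for each value of the highway accessibility index
crime_by_rad <- tapply(rawdata$crim, rawdata$rad, median)

# Number of tracts at each value of rad
tracts_by_rad <- table(rawdata$rad)

print(round(crime_by_rad, 3))
print(tracts_by_rad)
print(names(which.max(crime_by_rad)))
```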

The following scatterplots show the median value of owner-occupied homes against the average number of rooms per dwelling (rm), with one panel per value of rad (highway accessibility index) and points coloured by chas.

ggplot(rawdata, aes(x=rm, y=medv, colour=as.factor(chas))) + geom_point() + facet_grid(rad ~ .) + xlab('Average number of rooms per dwelling') + ylab('Median value of owner-occupied homes')
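The strength of the rooms/value relationship can be summarised overall and within each rad panel. This sketch assumes MASS is installed and uses `MASS::Boston`; the simple linear fit is an illustrative choice for describing the overall trend, not a model proposed in this analysis.

```r
rawdata <- MASS::Boston

# Overall linear trend of medv (in $1000s) on average rooms per dwelling
fit <- lm(medv ~ rm, data = rawdata)
print(coef(fit))

# Pearson correlation overall and within each rad group (one per facet)
r_all <- cor(rawdata$rm, rawdata$medv)
r_by_rad <- sapply(split(rawdata, rawdata$rad),
                   function(d) cor(d$rm, d$medv))

print(round(r_all, 2))
print(round(r_by_rad, 2))
```

Comparing `r_by_rad` against `r_all` shows whether the overall trend holds within individual panels or is driven by differences between them.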