In this section, we look at Boston in the sense of the greater Boston area. We will study the pollution emission level in different parts of the city and try to find out whether there is any relationship between property valuation and pollution level. The data comes from the paper “Hedonic Housing Prices and the Demand for Clean Air,” which has been cited more than 1,000 times. It was written in the late 1970s by David Harrison of Harvard and Daniel Rubinfeld of the University of Michigan, and it examines the relationship between house prices and clean air. Let us first understand the data. Each entry in the data set corresponds to a census tract, a statistical division of the area that researchers use to break down towns and cities. As a result, there will usually be multiple census tracts per town.
Here we will use only a few variables to understand the relationship: longitude (LON) and latitude (LAT) to plot the location, MEDV for the median value of owner-occupied homes (in thousands of dollars), and NOX for the concentration of nitric oxides in the air (parts per 10 million). Let us first look at the relationship between the NOX emission level and the property valuation. We will plot the data using ggplot.
#Read the dataset
boston=read.csv("boston.csv")
## Load the ggplot2
library(ggplot2)
ggplot(data=boston,aes(x=MEDV,y=NOX))+
  geom_point(alpha=0.6,colour="green")+
  theme_bw()+
  theme(panel.border=element_blank(),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),axis.line=element_line(colour="black"))+
  xlab("Property Value (thousand dollars)")+
  ylab("NOX Emission")
We can clearly see a negative correlation between the two. We will add a smooth curve to see the trend. There are places in greater Boston where the emission level is as high as 0.8 and the property value is low. These are industrial zones, and we will look at the high and low industrial belts later.
ggplot(data=boston,aes(x=MEDV,y=NOX))+
  geom_point(alpha=0.6,colour="green")+
  theme_bw()+
  theme(panel.border=element_blank(),panel.grid.major=element_blank(),
        panel.grid.minor=element_blank(),axis.line=element_line(colour="black"))+
  geom_smooth(colour="#996600",span=0.6)+
  xlab("Property Value (thousand dollars)")+
  ylab("NOX Emission")
## `geom_smooth()` using method = 'loess'
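The visual impression of a negative relationship can also be checked with a number. The sketch below uses base R's `cor()` to compute a Pearson and a Spearman correlation. The `toy` data frame holds made-up values and only stands in for `boston` so that the block runs without boston.csv; it is an assumption for illustration, not data from the paper.

```r
# Toy stand-in for the boston data frame (made-up values, illustration only)
toy <- data.frame(
  MEDV = c(50, 36, 28, 24, 21, 17, 13, 10),
  NOX  = c(0.40, 0.43, 0.47, 0.52, 0.54, 0.61, 0.70, 0.77)
)

# Pearson correlation measures linear association;
# use = "complete.obs" drops rows where either column is NA
pearson <- cor(toy$MEDV, toy$NOX, use = "complete.obs")

# Spearman rank correlation only assumes a monotonic relationship
spearman <- cor(toy$MEDV, toy$NOX, method = "spearman", use = "complete.obs")

print(c(pearson = pearson, spearman = spearman))
```

Replacing `toy` with `boston` would give the figures for the real data. The Spearman version is shown alongside Pearson because the smooth curve suggests the relationship may not be strictly linear.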
# Plot the emission level with the ggmap()
Here we first find the mean level of pollution. The mean NOX emission level is about 0.5547, and for clarity we will treat this mean as the benchmark for an acceptable NOX emission level.
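The benchmark can be computed from the data instead of being typed in by hand. The following is a minimal sketch, assuming a data frame with the same column names as `boston`. The `toy` data frame and the names `nox_mean`, `is_high`, `above_mean` and `below_mean` are hypothetical and exist only to make the example self-contained.

```r
# Toy stand-in for boston (made-up values); replace with the real data frame
toy <- data.frame(
  LON  = c(-71.10, -71.05, -71.00, -70.95),
  LAT  = c(42.30, 42.35, 42.33, 42.28),
  NOX  = c(0.40, 0.50, 0.60, 0.70),
  MEDV = c(30, 25, 18, 12)
)

# Mean NOX level, ignoring missing values; this is the benchmark
nox_mean <- mean(toy$NOX, na.rm = TRUE)

# Logical flag: TRUE when a tract is at or above the benchmark
is_high <- toy$NOX >= nox_mean

# which() drops NA flags, so the two groups never overlap
# and never contain all-NA rows
above_mean <- toy[which(is_high), ]
below_mean <- toy[which(!is_high), ]

print(nox_mean)
```

Computing the threshold this way keeps the split consistent if the data set changes, and using one flag with its negation guarantees that no tract lands in both groups.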
## Load the necessary Packages
library(ggmap)
## Create the data frame for tracts where NOX is at or above the mean value
AirPollution_abovemean<-boston[boston$NOX>=0.5547,c("LON","LAT","NOX","MEDV")]
colnames(AirPollution_abovemean)=c("LON","LAT","NOX","Price")
names(AirPollution_abovemean)
## [1] "LON" "LAT" "NOX" "Price"
## Create the data frame for tracts where NOX is below the mean value
## (strictly below, so that no tract appears in both data frames)
AirPollution_belowmean<-boston[boston$NOX<0.5547,c("LON","LAT","NOX","MEDV")]
colnames(AirPollution_belowmean)=c("LON","LAT","NOX","Price")
The command qmap() is a short form of ggmap(get_map("", ...)).
## Create the map for emission level above 0.55
Boston=qmap("boston",zoom=10,legend='topleft',extent='device')
Boston+geom_point(aes(x=LON,y=LAT,size=NOX),alpha=0.1,colour="#330000",data=AirPollution_abovemean)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=boston&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=boston&sensor=false
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
The dark zone is the high-pollution zone. We can also see that the NOX emission levels shown on this map start at 0.6.
## Create the map for emission level below 0.55
Boston+geom_point(aes(x=LON,y=LAT,size=NOX),colour="#66FF00",
data=AirPollution_belowmean)
## Warning: Removed 1 rows containing missing values (geom_point).
The green zone is the low-pollution zone.
## Combined plot of Pollution zone
Boston+geom_point(aes(x=LON,y=LAT,size=NOX),alpha=0.1,colour="#FF3300",
data=AirPollution_abovemean)+
geom_point(aes(x=LON,y=LAT,size=NOX),colour="#99CC00",
data=AirPollution_belowmean)
## Warning: Removed 1 rows containing missing values (geom_point).
## Pollution zones: areas with high and low NOX emission levels
## Create the density plot for the high Pollution zone
High_NOX<-stat_density2d(aes(x=LON,y=LAT,fill=..level..,alpha=..level..),
size=3,bins=8,data=AirPollution_abovemean,
geom='polygon')
Boston+scale_fill_continuous(guide = guide_legend(title = "Zone Level"))+High_NOX
## Create the density plot for the low Pollution zone
## alpha is a fixed value here, so it is set outside aes()
Low_NOX<-stat_density2d(aes(x=LON,y=LAT,fill=..level..),alpha=0.1,colour="#66CC66",
size=3,bins=3,data=AirPollution_belowmean,
geom='polygon')
Boston+scale_fill_continuous(guide = guide_legend(title = "Zone Level"))+Low_NOX
## Combine both zone
CZ<-Boston+scale_fill_continuous(guide = guide_legend(title = "Zone Level"))+High_NOX+Low_NOX
CZ
## Warning: Removed 1 rows containing non-finite values (stat_density2d).
## Warning: Removed 1 rows containing non-finite values (stat_density2d).
## Plot the Property Value for the Greater Boston City
Here we will plot the property value for the entire Boston area and see how the valuation differs between the two pollution zones.
combined_df<-rbind(AirPollution_belowmean,AirPollution_abovemean)
## Plot the valuation of property for both the zone
CZ+geom_point(aes(x=LON,y=LAT,size=Price),
data=combined_df,colour="#990099",alpha=0.18)+scale_fill_continuous(guide = guide_legend(title = ""))
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
## Warning: Removed 1 rows containing non-finite values (stat_density2d).
## Warning: Removed 1 rows containing missing values (geom_point).
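The map shows the difference in valuation visually; a short summary per zone makes the comparison concrete. The sketch below labels each group before stacking and then takes the median price per zone with base R's `tapply()`. The data frames `above` and `below` contain made-up values standing in for `AirPollution_abovemean` and `AirPollution_belowmean`, and the `Zone` column is a hypothetical addition.

```r
# Toy stand-ins for the two data frames built earlier (made-up values)
above <- data.frame(NOX = c(0.60, 0.70, 0.65), Price = c(15, 12, 18))
below <- data.frame(NOX = c(0.40, 0.45, 0.50), Price = c(30, 24, 27))

# Label each group before stacking so the zone survives rbind()
above$Zone <- "above mean"
below$Zone <- "below mean"
zones <- rbind(below, above)

# Median price and number of tracts per zone
price_by_zone <- tapply(zones$Price, zones$Zone, median)
count_by_zone <- table(zones$Zone)

print(price_by_zone)
print(count_by_zone)
```

The median is used rather than the mean because a few very high or very low valuations would otherwise dominate the comparison.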
There are other styles of map in ggmap. Please check the other maps by using the commands below,
Longest form
You can use the same data set and plot it on a different style of map. Remember that each style has its own purpose. For instance, you can use the INDUS and ZN variables with the stamen style to better understand the industrial zones of Boston.