A study on the relationship between the clean Air and valuation of House Property

In real estate, there is a famous saying, the most important thing is location, location, location.

In this section, we will be talking about Boston in a sense of the greater Boston area. We will make an attempt to study the pollution emission level of different parts of the city and will try to find out that whether or not there is any relationship between Property Valuation and Pollution Level ? This data comes from a paper, “Hedonic Housing Prices uci and the Demand for Clean Air,” which has been cited more than 1,000 times. This paper was written on a relationship between house prices and clean air in the late 1970s by David Harrison of Harvard and Daniel Rubinfeld of the University of Michigan. Let us first understand the data. Each entry of this data set corresponds to a census tract, a statistical division of the area that is used by researchers to break down towns and cities. As a result, there will usually be multiple census tracts per town.

  • LON and LAT are the longitude and latitude of the center of the census tract.
  • MEDV is the median value of owner-occupied homes, measured in thousands of dollars.
  • CRIM is the per capita crime rate.
  • ZN is related to how much of the land is zoned for large residential properties.
  • INDUS is the proportion of the area used for industry.
  • NOX is the concentration of nitrous oxides in the air, a measure of air pollution.
  • RM is the average number of rooms per dwelling.
  • AGE is the proportion of owner-occupied units built before 1940.
  • DIS is a measure of how far the tract is from centers of employment in Boston.
  • RAD is a measure of closeness to important highways.
  • TAX is the property tax per $10,000 of value.
  • PTRATIO is the pupil to teacher ratio by town.

Here we will only use few parameters in order to understand the relationship. We will consider Longitude and latitude to plot the location, MEDV for the valuation of owner occupied house, NOX the level of nitrous oxide in the air. Let us first see the relationship between the NOX emission level and the valuation of the property. We will plot the data using ggplot.

#Read the dataset

boston=read.csv("boston.csv")

## Load the ggplot2

library(ggplot2)

ggplot(data=boston,aes(x=MEDV,y=NOX))+geom_point(alpha=0.6,colour="green")+theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))+xlab("Property Value(thousand dollars)")+ylab("Nox Emission")

## We can clearly see that there is negative correlation between the two,

We will plot a smooth curve to see the slope of the line. There are places in greater Boston where the emission level as high as 0.8 and the property value is low. These are the industry zone and we will later see the high and low industrial belt.

ggplot(data=boston,aes(x=MEDV,y=NOX))+geom_point(alpha=0.6,colour="green")+theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))+geom_smooth(colour="#996600",span=0.6)+xlab("Property Value(thousand dollars)")+ylab("Nox Emission")
## `geom_smooth()` using method = 'loess'

# Plot the emission level with the ggmap()

Here we first find out the mean or the average level of pollution levels. So we will get the mean of NOX emission level and for our clear understanding,

Let us assume the mean as the benchmark of appropriate level of NOX emission level.

## Load the necessary Packages

library(ggmap)

## Create dataframe for above mean of Air Pollution

AirPollution_abovemean<-data.frame(boston$LON[boston$NOX>=0.5547],
                                   boston$LAT[boston$NOX>=0.5547],
                                   boston$NOX[boston$NOX>=0.5547],
                                  boston$MEDV[boston$NOX>=0.5547])
colnames(AirPollution_abovemean)=c("LON","LAT","NOX","Price")
names(AirPollution_abovemean)



## Create the dataframe where the nox emission is lower than mean value

AirPollution_belowmean=data.frame(boston$LON[boston$NOX<=0.5547],
                                  boston$LAT[boston$NOX<=0.5547],
                                  boston$NOX[boston$NOX<=0.5547],
                                  boston$MEDV[boston$NOX<=0.5547])

colnames(AirPollution_belowmean)=c("LON","LAT","NOX","Price")
## [1] "LON"   "LAT"   "NOX"   "Price"

Note

The command qmap is the short form for ggmap(get_map(“”,…))

## Create the map for emission level above 0.55

Boston=qmap("boston",zoom=10,legend='topleft',extent='device')
Boston+geom_point(aes(x=LON,y=LAT,size=NOX),alpha=0.1,colour="#330000",data=AirPollution_abovemean)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=boston&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=boston&sensor=false
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead

The black zone is the high pollution zone and we can also see the minimum NOX emission level is starting from 0.6

## Create the map for emission level below 0.55

Boston+geom_point(aes(x=LON,y=LAT,size=NOX),colour="#66FF00",
                  data=AirPollution_belowmean)
## Warning: Removed 1 rows containing missing values (geom_point).

The green zone is the low pollution zone

## Combined plot of Pollution zone

Boston+geom_point(aes(x=LON,y=LAT,size=NOX),alpha=0.1,colour="#FF3300",
                  data=AirPollution_abovemean)+
  geom_point(aes(x=LON,y=LAT,size=NOX),colour="#99CC00",
             data=AirPollution_belowmean)
                  
## Warning: Removed 1 rows containing missing values (geom_point).

## Pollution Zone # Area with High and low NOX emission level

## Create the density plot for the high Pollution zone

High_NOX<-stat_density2d(aes(x=LON,y=LAT,fill=..level..,alpha=..level..),
                         size=3,bins=8,data=AirPollution_abovemean,
                         geom='polygon')
Boston+scale_fill_continuous(guide = guide_legend(title = "Zone Level"))+High_NOX
  

## Create the density plot for the low Pollution zone

Low_NOX<-stat_density2d(aes(x=LON,y=LAT,fill=..level..,alpha=0.1),colour="#66CC66",
                        size=3,bins=3,data=AirPollution_belowmean,
                        geom='polygon')
Boston+scale_fill_continuous(guide = guide_legend(title = "Zone Level"))+Low_NOX

## Combine both zone

CZ<-Boston+scale_fill_continuous(guide = guide_legend(title = "Zone Level"))+High_NOX+Low_NOX
CZ

## Warning: Removed 1 rows containing non-finite values (stat_density2d).

## Warning: Removed 1 rows containing non-finite values (stat_density2d).

## Plot the Property Value for the Greater Boston City Here we will plot the property value for the entire Boston city and we will see the different valuation in two different zones of Pollution.

combined_df<-rbind(AirPollution_belowmean,AirPollution_abovemean)

## Plot the valuation of property for both the zone

CZ+geom_point(aes(x=LON,y=LAT,size=Price),
                                   data=combined_df,colour="#990099",alpha=0.18)+scale_fill_continuous(guide = guide_legend(title = ""))
## Scale for 'fill' is already present. Adding another scale for 'fill',
## which will replace the existing scale.
## Warning: Removed 1 rows containing non-finite values (stat_density2d).
## Warning: Removed 1 rows containing missing values (geom_point).

There are other style of map in ggmap(). Please check the other maps by using the below commands,

Longest form

You can use the same data set and plot it on the differnt style of map. Remember each style has it own purpose. For instance, you can use the Indus and ZN parameter in #stamen style to better understand the industrial zone of the Boston city.