Cities all over North America contain “historic” neighborhoods. Historic neighborhoods are areas where the buildings have historical importance not because of their individual significance but because, as a collection, they represent the architectural sensibilities of a particular time period. In Boulder, for example, the Martin Acres subdivision was recently surveyed for possible inclusion in a historic district. Establishing the “significance” of a neighborhood's historical character is a matter for historians, not statisticians. However, there is an important question about historic preservation: does it help or hurt property values?
Once an area is designated a historic district, development restrictions come into force. Typically, these restrictions aim to preserve the historic “look and feel” of the buildings in a neighborhood and thus restrict major modifications to the buildings. The owners of property in historic districts often face significant expenses as a result. For example, they might have to maintain the original facade on their building instead of replacing it with something more economical, such as vinyl siding. Because of these restrictions, residents often oppose historic district designation.
In Manhattan there are a number of historic districts. In this exercise we will use a database describing every building on the island of Manhattan. Your job is to answer the question: does the designation of historic districts affect the value of buildings? This question has important policy significance. The Fifth Amendment to the U.S. Constitution states, “… nor shall private property be taken for public use without just compensation.” If designating an area as a historic district reduces property values, owners might be able to sue the government for compensation.
To answer this question you will have to identify all of the buildings that are part of a historic district and compare the value of those historic properties to properties outside of historic districts. You have to be careful to construct a meaningful comparison. Many factors influence the value of a property, and you must take those factors into account as best you can.
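A small simulation shows why the comparison must be built carefully. The sketch below uses entirely made-up data. The two zones, the designation rates and the prices are assumptions, not PLUTO values. In this setup, historic designation has no effect on value, but designation is more common in the cheaper zone. A naive comparison of means still finds a large gap, while a comparison within each zone does not.

```r
# Hypothetical simulation (not PLUTO data): designation has NO effect on value,
# but it is more common in the lower-priced zone, so location confounds the
# naive comparison.
set.seed(42)
n <- 2000
zone <- sample(c("A", "B"), n, replace = TRUE)       # A = high-priced zone
hd <- rbinom(n, 1, ifelse(zone == "A", 0.1, 0.5))   # designation rate by zone
value <- ifelse(zone == "A", 3e6, 1e6) + rnorm(n, sd = 1e5)

# Naive comparison: mean value inside minus outside historic districts
naive_diff <- mean(value[hd == 1]) - mean(value[hd == 0])

# Within-zone comparisons hold location fixed
within_diff <- sapply(c("A", "B"), function(z) {
  mean(value[zone == z & hd == 1]) - mean(value[zone == z & hd == 0])
})

naive_diff   # large and negative, driven entirely by location
within_diff  # small in both zones relative to the naive gap
```

The same reasoning motivates the later step of restricting the comparison to buildings on the same block as a historic district.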
# Install the 'foreign' package once with install.packages('foreign')
# Load the 'foreign' library in order to read dBase (.dbf) files
library(foreign)
# Import the .dbf data
MN <- read.dbf("C:/Users/Li Xu/Documents/aaa/CU Boulder/GEOG5023/Homework 2 Hypothesis Testing/mnmappluto.dbf")
# or pick the file interactively: MN <- read.dbf(choose.files())
# (choose.files() is Windows-only; file.choose() works on all platforms)
# Remove buildings with missing or invalid (non-positive) coordinates;
# which() drops NA comparisons instead of creating all-NA rows
MN <- MN[which(MN$YCoord > 0 & MN$XCoord > 0), ]
# Plot building locations
plot(MN$YCoord ~ MN$XCoord)
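The coordinate filter has one subtlety. If a coordinate is NA, the comparison `> 0` returns NA. Indexing a data frame with a logical vector that contains NA then produces an all-NA row instead of dropping the building. The toy example below uses hypothetical data to compare indexing with the logical vector directly against wrapping it in which().

```r
# Hypothetical toy data: the third building has a missing Y coordinate
pts <- data.frame(XCoord = c(1, -5, 3), YCoord = c(2, 4, NA))
keep <- pts$XCoord > 0 & pts$YCoord > 0   # TRUE, FALSE, NA

bad  <- pts[keep, ]         # the NA in 'keep' yields an all-NA row
good <- pts[which(keep), ]  # which() keeps only the TRUE positions

nrow(bad)   # 2: one real row plus one all-NA row
nrow(good)  # 1
```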
# Create a dummy variable HD (1 = in a historic district, 0 = not in a
# historic district)
MN$HD <- ifelse(is.na(MN[, "HistDist"]), 0, 1)
# convert MN$HD to a factor
MN$HD <- as.factor(MN$HD)
# note how the summary changes after changing the 'HD' column to a factor
summary(MN$HD)
## 0 1
## 34024 9294
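The ifelse()/is.na() construction above can be checked on a small hypothetical vector. 'District A' and 'District B' are placeholder names. NA stands for a lot outside any historic district, which is how the code above interprets a missing HistDist value.

```r
# Hypothetical toy column: NA means the lot is not in any historic district
HistDist <- c(NA, "District A", NA, NA, "District B")
HD <- as.factor(ifelse(is.na(HistDist), 0, 1))

summary(HD)  # counts per level: three 0s and two 1s
levels(HD)   # "0" "1"
```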
# Draw a map of historic districts (red = in an HD, black = not in an HD)
# 'col' colors each dot according to the value in the 'HD' column
# 'pch' sets the symbol to a solid dot
# 'cex' makes each dot half the normal size
plot(y = MN$YCoord, x = MN$XCoord, col = MN$HD, pch = 16, cex = 0.5)
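Passing the factor MN$HD as col works because the factor's integer codes are used as indices into the current color palette: 1 for level "0" and 2 for level "1". In R 4.0 and later, the second default palette color is a muted red ("#DF536B") rather than "red". The sketch below makes the mapping explicit, assuming the same two levels. The toy factor is hypothetical.

```r
# Hypothetical toy factor standing in for MN$HD
HD <- factor(c(0, 1, 1, 0), levels = c(0, 1))

# Explicit color lookup: level "0" -> black, level "1" -> red
hd_cols <- c("black", "red")[as.integer(HD)]
hd_cols  # "black" "red" "red" "black"

# With the real data, a vector like this could be passed as col = hd_cols
```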
# Split 'MN' by whether a building is in or out of a historic district.
# inHD stores all buildings in a historic district (MN$HD == 1)
inHD <- MN[MN$HD == 1, ]
# outHD stores all buildings outside of a historic district (MN$HD == 0)
outHD <- MN[MN$HD == 0, ]
# Hypothesis Test 1: Welch two-sample t-test. Null hypothesis: the mean
# assessed value (AssessTot) of buildings in historic districts equals that
# of buildings outside of historic districts
t.test(x = inHD$AssessTot, y = outHD$AssessTot) #Hypothesis Test 1
##
## Welch Two Sample t-test
##
## data: inHD$AssessTot and outHD$AssessTot
## t = -15.05, df = 43286, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1724546 -1327117
## sample estimates:
## mean of x mean of y
## 1233743 2759574
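The statistic reported above is Welch's t. It is the difference in sample means divided by the unpooled standard error. Its degrees of freedom come from the Welch-Satterthwaite approximation, which is why df is not simply n_x + n_y - 2. The sketch below computes both quantities by hand on small hypothetical samples and compares them with t.test().

```r
# Hypothetical toy samples (not PLUTO data)
x <- c(12, 15, 9, 20, 14, 11)
y <- c(22, 30, 18, 25, 27, 35, 29)

nx <- length(x); ny <- length(y)
vx <- var(x) / nx   # squared standard error of mean(x)
vy <- var(y) / ny   # squared standard error of mean(y)

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

res <- t.test(x, y)  # Welch is the default (var.equal = FALSE)
c(manual = t_stat, t.test = unname(res$statistic))
c(manual = df_welch, t.test = unname(res$parameter))
```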
# Hypothesis Test 2: Welch two-sample t-test. Null hypothesis: the mean
# building area (BldgArea) in historic districts equals that outside of
# historic districts
t.test(x = inHD$BldgArea, y = outHD$BldgArea) #Hypothesis Test 2
##
## Welch Two Sample t-test
##
## data: inHD$BldgArea and outHD$BldgArea
## t = -9.037, df = 15819, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -25103 -16154
## sample estimates:
## mean of x mean of y
## 22050 42678
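Property values and floor areas tend to be strongly right-skewed. A few very large buildings pull the mean far above the typical building. The summary of HDB_in$BldgArea further below is an example, with a median of 6370 and a mean of 22100. One possible robustness check, not part of the original analysis, is a rank-based Wilcoxon rank-sum test or a comparison of means on the log scale. The sketch below uses simulated lognormal data. The parameters and the two extreme outliers are assumptions chosen only for illustration.

```r
# Hypothetical right-skewed data: lognormal values, plus two huge outliers
set.seed(1)
in_hd  <- rlnorm(300, meanlog = 13.5, sdlog = 1)
out_hd <- c(rlnorm(300, meanlog = 13.5, sdlog = 1), 5e9, 8e9)

# Welch t-test on raw values: the outliers inflate both mean and variance
p_t <- t.test(in_hd, out_hd)$p.value

# Rank-based alternative: uses ranks, so raw magnitudes matter less
p_w <- wilcox.test(in_hd, out_hd)$p.value

# Another common option: compare means on the log scale
p_log <- t.test(log(in_hd), log(out_hd))$p.value

c(welch = p_t, wilcoxon = p_w, welch_log = p_log)
```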
Location is an important component of a property's value. To isolate the impact of historic district designation, we should revise our test to compare only buildings with similar locations. One way to do this is to identify buildings that are close to, but outside of, historic districts. Each building in the database has a block number, so let's build a new comparison group that includes only buildings on the same block as a historic district but outside of the district boundaries.
# Get a list of all blocks that contain historic buildings
blocks <- inHD$Block
# Select all buildings (from MN) that are on the same block as historic
# buildings: the line below keeps all rows whose Block value appears in our
# list of blocks, and saves the result as a new object
HDB <- MN[MN$Block %in% blocks, ]
# Create the object HDB_out: buildings outside of HDs but on the same block
# as at least one HD building
HDB_out <- HDB[HDB$HD == 0, ]
# Create the object HDB_in: buildings in HDs. Every HD building lies on an
# HD block, so this is the same set of buildings as inHD
HDB_in <- HDB[HDB$HD == 1, ]
# Hypothesis Test 3: t-test restricted to historic-district blocks, a rough
# control for location
t.test(x = HDB_in$AssessTot, y = HDB_out$AssessTot) #Hypothesis Test 3
##
## Welch Two Sample t-test
##
## data: HDB_in$AssessTot and HDB_out$AssessTot
## t = -9.728, df = 4349, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1507426 -1001727
## sample estimates:
## mean of x mean of y
## 1233743 2488319
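The block-matching step can be traced on a handful of hypothetical buildings. Every historic building sits on a block in the list by construction, so the in-district group is unchanged. This is why the mean of x (1233743) is identical in Tests 1 and 3. Only the comparison group shrinks to buildings that share a block with a historic district.

```r
# Hypothetical toy buildings: block numbers and historic-district flags
bldg <- data.frame(Block = c(10, 10, 11, 12, 12, 13),
                   HD    = c(1,  0,  0,  1,  1,  0))

# Blocks that contain at least one historic-district building
hd_blocks <- unique(bldg$Block[bldg$HD == 1])  # 10, 12

# Keep every building on those blocks, then split by HD status
same_block     <- bldg[bldg$Block %in% hd_blocks, ]
same_block_in  <- same_block[same_block$HD == 1, ]
same_block_out <- same_block[same_block$HD == 0, ]

nrow(same_block_in)   # 3: all HD buildings are retained
nrow(same_block_out)  # 1: only the non-HD building on block 10
```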
The size of a building is an important determinant of its value. Hypothesis Test 3 did not control for building size; we can do so by calculating the assessed value per square foot:
# We have a problem. Some buildings have 0 area (square footage).
summary(HDB_in$BldgArea)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 4160 6370 22100 13000 17600000
# A zero area could mean the lot is vacant, or it could be a data error.
# Either way, dividing by zero makes price per square foot undefined, so we
# exclude these zero-area buildings from our t-test
# Calculate price per square foot for historic buildings, using only
# buildings with an area greater than 0
HDB_in_pos <- HDB_in[HDB_in$BldgArea > 0, ]
HDB_in_sqft <- HDB_in_pos$AssessTot / HDB_in_pos$BldgArea
# Calculate price per square foot for non-historic buildings
HDB_out_pos <- HDB_out[HDB_out$BldgArea > 0, ]
HDB_out_sqft <- HDB_out_pos$AssessTot / HDB_out_pos$BldgArea
# Perform the t-test
t.test(x = HDB_in_sqft, y = HDB_out_sqft) #Hypothesis Test 4
##
## Welch Two Sample t-test
##
## data: HDB_in_sqft and HDB_out_sqft
## t = -1.664, df = 4521, p-value = 0.09614
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -36.413 2.976
## sample estimates:
## mean of x mean of y
## 66.76 83.48
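The code above computes the ratio only for rows with a positive area. An alternative sketch keeps one value per building and returns NA where the area is zero or missing. Because t.test() drops NA values, a test on this vector would use essentially the same buildings. The helper price_per_sqft() is hypothetical, and the toy values are made up.

```r
# Hypothetical helper (not from the original analysis): value per square foot,
# returning NA where the floor area is zero or missing instead of Inf or NaN
price_per_sqft <- function(value, area) {
  ifelse(!is.na(area) & area > 0, value / area, NA_real_)
}

# Toy example: the second lot is vacant (area 0), the fourth has no area
value <- c(500000, 200000, 1200000, 300000)
area  <- c(2500,   0,      6000,    NA)

pps <- price_per_sqft(value, area)
pps                      # 200 NA 200 NA
mean(pps, na.rm = TRUE)  # 200
```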
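In Test 4 the p-value (0.096) is above the conventional 5% significance level. Once location (block) and building size are taken into account, we therefore cannot reject the null hypothesis that historic and nearby non-historic buildings have the same mean assessed value per square foot.
Next, we examine whether building value is correlated with north-south position. First, we need to confirm that 'YCoord' is numeric, as cor() requires, and understand how it relates to location.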
summary(MN$YCoord)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 189000 208000 220000 220000 231000 259000
The summary shows that 'YCoord' is a numeric variable, so it can be used with the cor() function. Although we don't know the coordinate system and projection used in this dataset, 'YCoord' is clearly not latitude. It is very likely a projected Cartesian coordinate measured in linear units (for example, a State Plane coordinate in feet). Greater values therefore correspond to locations farther north.
Plotting building values against YCoord does not reveal an obvious correlation between the two variables. However, building values peak around YCoord values of 210000 to 220000, in the south-central part of the island.
plot(MN$YCoord, MN$AssessTot)
Next, a Pearson correlation test gives a closer look at the relationship.
cor.test(MN$YCoord, MN$AssessTot)
##
## Pearson's product-moment correlation
##
## data: MN$YCoord and MN$AssessTot
## t = -12.43, df = 43316, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.06902 -0.05025
## sample estimates:
## cor
## -0.05964
The results (p < 2.2e-16) indicate that the correlation between north-south position and building value is significantly different from 0. The correlation itself is very close to zero (r = -0.05964), but in a dataset this large even a very small correlation can be highly significant. Downtown buildings have smaller YCoord values and uptown buildings have larger ones. The negative correlation therefore indicates that building values tend to rise slightly from uptown to downtown. The relationship is very weak, however: YCoord accounts for less than 1% of the variance in building value.
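The values reported above show concretely how a tiny correlation can still produce a tiny p-value. The sketch below plugs r = -0.05964 and n = 43318 (df = 43316 in the output) into the formula for Pearson's test statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2). It also computes r^2 and inverts the formula to find the smallest correlation that would be significant at the 5% level with this many buildings.

```r
# Values reported by cor.test() above
r <- -0.05964
n <- 43318          # df = n - 2 = 43316

# t statistic implied by r and n for Pearson's test
t_val <- r * sqrt(n - 2) / sqrt(1 - r^2)
t_val               # about -12.43, matching the output

# Share of variance in building value associated with YCoord
r^2                 # about 0.0036

# Smallest |r| significant at the 5% level (two-sided) with this n,
# from solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(n - 2 + t_crit^2)
r_crit              # about 0.0094
```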
Comparing Tests 3 and 4 suggests that building size is an important component of building value. It is therefore worth removing the effect of building size before drawing a conclusion.
# create an object 'MN_sqft' to store price per square foot for all
# buildings with size>0
MN_sqft <- MN[MN$BldgArea > 0, "AssessTot"]/MN[MN$BldgArea > 0, "BldgArea"]
# create an object 'MN_Y' to store YCoords for all buildings with size>0
MN_Y <- MN[MN$BldgArea > 0, "YCoord"]
# Run the correlation test on price per square foot, a simple control for
# building size
cor.test(MN_Y, MN_sqft)
##
## Pearson's product-moment correlation
##
## data: MN_Y and MN_sqft
## t = 0.2044, df = 40826, p-value = 0.8381
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.008689 0.010711
## sample estimates:
## cor
## 0.001011
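The p-value (0.838) means we cannot reject the null hypothesis that the true correlation is 0. Once building size is controlled for by using price per square foot, there is no evidence of a linear relationship between north-south position and building value.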
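Dividing value by floor area is one way to control for size. A common alternative, not used in the original analysis, is a partial correlation. Regress both position and value on building area, then correlate the residuals. The sketch below uses simulated data in which value depends on area but not on position. The variable names and parameters are hypothetical. It also checks the residual approach against the textbook partial-correlation formula.

```r
# Hypothetical simulated data: value depends on area; position does not
set.seed(7)
n     <- 500
area  <- rlnorm(n, meanlog = 9, sdlog = 0.8)
ycoor <- runif(n, 189000, 259000)
value <- 150 * area + rnorm(n, sd = 2e5)

# Partial correlation via residuals: remove the linear effect of area
res_y <- resid(lm(ycoor ~ area))
res_v <- resid(lm(value ~ area))
r_partial <- cor(res_y, res_v)

# Same quantity from pairwise correlations
r_yv <- cor(ycoor, value); r_ya <- cor(ycoor, area); r_va <- cor(value, area)
r_formula <- (r_yv - r_ya * r_va) / sqrt((1 - r_ya^2) * (1 - r_va^2))

c(residuals = r_partial, formula = r_formula)  # equal up to rounding
```

Unlike the ratio, this approach assumes value is linearly related to area, but it does not require dropping zero-area lots.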
# Arrange plots in a 2 x 2 grid: the top row holds the two plots below and
# the bottom row (0 = no plot) is left empty
layout(matrix(c(1, 2, 0, 0), 2, 2, byrow = TRUE))
plot(y = MN$YCoord, x = MN$XCoord, col = MN$HD, pch = 16, cex = 0.5, main = "Map of Manhattan Buildings",
    xlab = "X Coordinate", ylab = "Y Coordinate")
plot(y = MN$AssessTot, x = MN$YCoord, main = "Building Value vs Y Coordinate\n(North-South Position)",
    xlab = "Y Coordinate", ylab = "Building Value")
Created by: Li Xu; Created on: 02/05/2013; Updated on: 02/06/2013