1995 Census
2015 Census
Intensity
Complete Spatial Randomness(CSR) and Quadrat Counting
Smoothing Estimation of Intensity Function
Kernel Estimation
Contour map of Intensity
Perspective plot of intensity
Standard error of intensity
I am interested how dense NYC is with trees. I came up with two data sets which are about NYC trees and very interesting. Both data sets are in public domain(https://data.cityofnewyork.us/browse?q=tree+census). There are two censuses about NYC trees, one from 1995 and onother from 2015, which is 20 years later after 1995 census. Let’s find out whether NYC is much more dense with tress or not!
There are about 516989 entries with 30 marks. I will just mention only important ones.
Site: This field clarifies the position relative to the address for trees not located in the front of buildings.
Species: Species of the tree using a four-letter code, comprised of the first two letters of the genus followed by the first two letters of the species.
Condition: Overall tree condition. (Critical, Dead, Excellent, Fair, Good, Planting Space, Poor, Shaft, Stump, Unknown)
Wires: Whether the tree is located under utility wires. (None, Secendary, Yes)
Sidewalk_Condition: Whether the tree roots have lifted the adjacent sidewalk. (Good or Raised)
Support_Structure: Whether the tree has support structures installed. (Guard Strangling, None, Perimeter Guard and Stakes / Wires )
Borough: Borough where tree was recorded.(Bronx, Brooklyn, Manhattan, Queens and Staten Island)
Longitude: Longitude of point
Latitude: Latitude of point
There are about 683788 entries with 45 marks. Again i will mention the important ones
curb_loc: Location of tree bed in relationship to the curb; trees are either along the curb (OnCurb) or offset from the curb (OffsetFromCurb)
status: Indicates whether the tree is alive, standing dead, or a stump.
health: Indicates the user’s perception of tree health. (Fair, Good and Poor)
guards: Indicates whether a guard is present, and if the user felt it was a helpful or harmful guard. Not recorded for dead trees and stumps.(Harmful, Helpful, None and Unsure)
sidewalk: Indicates whether one of the sidewalk flags immediately adjacent to the tree was damaged, cracked, or lifted. Not recorded for dead trees and stumps. (Damage or NoDamage)
problems
postcode: Five-digit zipcode in which tree is located
borough: Name of borough in which tree point is located. (Bronx, Brooklyn, Manhattan, Queens and Staten Island)
latitude: Latitude of point
longitude: Longitude of point
Dividing the total number of points by the area of the survey region gives the average density of points per unit area. In different contexts, the average number of points per unit area could be a measure of abundance (for example, the abundance of virus particles on a cell surface), density (of light-sensitive cells in the retina) and productivity (of crops) etc. The standard generic term is intensity. In our case, number of points(latitude and longitude of tree’s location) per certain area.
In order to do spatial point pattern analysis we need CSR. A simple way to check inhomogeneity or not CSR (complete spatial randomness) or to see the number of points are not uniformly distributed is Quadrat Counting. If it is homogeneous, we no longer can use other spatial test on the data sets.
In quadrat counting, the observation window W is divided into subregions \(B_{1},...,B_{m}\) called quadrats. For simplicity, suppose the regions have equal area. We count the numbers of points falling in each quadrat,\(n_{j} = n(x\bigcap B_{j})\)for j = 1, . . . , m. Since these counts are unbiased estimators of the corresponding expected values \(E[n(x\bigcap B_{j})]\), they should be equal ‘on average’ if the intensity is homogeneous. In particular, any apparent spatial trend in the counts \(n_{j}\) suggests that the intensity is inhomogeneous.
Lets check quadrat counting for 1995 census
Hypothesis Testing
Null Hypothesis: CSR
Alternative Hypothesis: not CSR or inhomogeneity
##
## Chi-squared test of CSR using quadrat counts
## Pearson X2 statistic
##
## data: objrescale95
## X2 = 156270, df = 3, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
## Quadrats: 2 by 2 grid of tiles
Since p-value is close to Zero \(\le\) significance level= 0.05 we reject the null hypothesis. Thus, it is an inhomogeneous process.
Kernel Estimation: If the point process has an intensity function \(\lambda(u)\), this function can be estimated nonparametrically by kernel estimation.
The usual kernel estimators of the intensity function are:
uncorrected: \[\lambda^{(0)}(u)= \sum_{i=1}^{n}K(u-x_{i})\]
uniformly corrected: \[\lambda^{(U)}(u)= \frac{1}{e(u)}\sum_{i=1}^{n}K(u-x_{i}) \]
Diggle’s correction:\[\lambda^{(D)}(u)= \sum_{i=1}^{n}\frac{1}{e(x_{i})}K(u-x_{i}) \]
for any spatial location u inside the window W , where K(u) is the kernel function and \[ e(u)=\int_W K(u-v)dv\]
is a correction for bias due to edge effects. Outside the window W , the estimated intensity is zero. The kernel K must be a probabiity density, that is, K(u)\(\ge\) 0 for all locations u, and \(\int_{R^2}K(u)du=1\). A common choice is the isotropic Gaussian (Normal distribution) probability density.
If we look at intensity map, we can see that the more yellow represents more density. Perspective map visualizes intensity in three dimentinal ways, the higher the peak the more density it has and contour map visualizes the intensity in 2D ways.
We did a good estimation since the standard error is very low.
Lets look at 2015 census
Again this is a good estimation since the standard error is very low.
## Getting 1995 data set
columname <- c("RecordId","Address","House_Number","Postcode_Original","Community",
"Board_Original","Species","Diameter","Condition","Wires","Sidewalk_Condition",
"Support_Structure","Borough","X","Y","Longitude","Latitude", "CB_New",
"Zip_New","CensusTract_2010", "CensusBlock_2010","NTA_2010",
"SegmentID","Spc_Common","Spc_Latin","Location","Council District","BIN","BBL")
type_col <- c("integer","character", "integer", "character", "integer","integer",
"factor","factor","integer",rep("factor",5),rep("integer",2),
rep("character",2), rep("integer",4), "character",
"integer", rep("character",3),rep("integer",3))
trees1995 <- read.csv(file="/Users/azizur/Desktop/stat 790/data/1995_Street_Tree_Census.csv")
trees1995.1 <- trees1995[c(7,8,10,11,12,13,14,17,18)]
trees1995.final <- na.exclude(trees1995.1)
## Getting 2015 data set
columname <- c("tree_id", "block_id","created_at","tree_dbh","stump_diam","curb_loc","status","health",
"spc_latin","spc_common","steward","guards","sidewalk","user_type",
"problems","root_stone","root_grate", "root_other",
"trunk_wire","trnk_light", "trnk_other","brch_light",
"brch_shoe","brch_other","address","postcode","zip_city","community board",
"borocode","borough","cncldist","st_assem","st_senate",
"nta","nta_name","boro_ct","state","latitude","longitude","x_sp",
"y_sp","council district","census tract","bin","bbl")
type_col <- c("integer","interger","character", "integer", "integer",
"factor","factor", rep("factor",17),
"factor","integer","factor", rep("integer",2), "factor",
rep("integer",3), rep("factor",2),"integer",
"factor",rep("character",2), rep("integer",6))
trees2015 <- read.csv(file="/Users/azizur/Desktop/stat 790/data/2015_Street_Tree_Census_-_Tree_Data.csv")
trees2015.1 <- trees2015[c(6,7,8,12,13,15,26,27,30,38,39)]
trees2015.final.2 <- na.exclude(trees2015.1)
## Creating a ppp object which will help us to do spatatial statistics tests
library(rgdal)
library(sp)
data95 = data.frame(long= trees1995.final$Longitude, lat=trees1995.final$Latitude)
coordinates(data95) <- ~ long+lat
proj4string(data95) <- CRS("+init=epsg:4326")
data.proj95 <- spTransform(data95, CRS("+init=epsg:2831"))
tj95 <- as.data.frame(data.proj95)
#install.packages("/Users/azizur/Desktop/spatstat 3", repos = NULL, type="source")
library(spatstat)
pppobj95 <- ppp(tj95$long, tj95$lat, c(278389, 325297), c(36872, 82874))
#### Quadrat Counting and inhomogeneity #####
objrescale95 <- rescale(pppobj95)
Quadrat_counting95 <- quadratcount(objrescale95, nx=2, y=2)
plot(Quadrat_counting95)
plot(intensity(Quadrat_counting95, image=TRUE))
quadrat.test(objrescale95,2,2) # gives p vale ###
#### Kernel estimation####
intensity_map95 <- density(pppobj95, sigma= 500)
plot(intensity_map95)
##Perspective Plot of Intensity Function, 1995
persp.im(intensity_map95)
##Contour Map of Intensity Function, 1995
contour.im(intensity_map95)
##Standard Error for Intensity, 1995
std.err.for.intsity95 <- density(pppobj95, 80, se=TRUE)$SE
plot(std.err.for.intsity95 )
## 2015 data set
library(rgdal)
library(sp)
data = data.frame(long= trees2015.final.2$longitude, lat=trees2015.final.2$latitude)
coordinates(data) <- ~ long+lat
proj4string(data) <- CRS("+init=epsg:4326")
data.proj <- spTransform(data, CRS("+init=epsg:2831"))
tj <- as.data.frame(data.proj)
# install.packages("devtools")
# library(devtools)
install.packages("/Users/azizur/Desktop/spatstat 3", repos = NULL, type="source")
library(spatstat)
pppobj <- ppp(tj$long, tj$lat, c(278389, 325297), c(36872, 82874))
##Intensity Map of Trees in 2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)
##Perspective Plot of Intensity Function, 2015
persp.im(intensity_map)
##Contour Map of Intensity Function, 2015
contour.im(intensity_map)
####Standard Error for Intensity, 2015
std.err.for.intsity <- density(pppobj, 80, se=TRUE)$SE
plot(std.err.for.intsity )
## 2015
library(rgdal)
library(sp)
data = data.frame(long= trees2015.final.2$longitude, lat=trees2015.final.2$latitude)
coordinates(data) <- ~ long+lat
proj4string(data) <- CRS("+init=epsg:4326")
data.proj <- spTransform(data, CRS("+init=epsg:2831"))
tj <- as.data.frame(data.proj)
# install.packages("devtools")
# library(devtools)
#install.packages("/Users/azizur/Desktop/spatstat 3", repos = NULL, type="source")
library(spatstat)
pppobj <- ppp(tj$long, tj$lat, c(278389, 325297), c(36872, 82874))
##Intensity Map of Trees in 2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)
##Perspective Plot of Intensity Function, 2015
persp.im(intensity_map)
##Contour Map of Intensity Function, 2015
contour.im(intensity_map)
####Standard Error for Intensity, 2015
std.err.for.intsity <- density(pppobj, 80, se=TRUE)$SE
plot(std.err.for.intsity )
## Comparing trees density changed over 20 years
##1995
intensity_map95 <- density(pppobj95, sigma= 500)
plot(intensity_map95)
##2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)
##1995
intensity_map95 <- density(pppobj95, sigma= 500)
plot(intensity_map95)
##2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)
Faraway, Julian James. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. CRC Press Taylor & Francis Group, 2016.
Dobson, A. and Barnett, A. (2008). An introduction to generalized linear models. Boca Raton: CRC Press.
Baddeley, Adrian, et al. Spatial Point Patterns: Methodology and Applications with R. CRC Press/Taylor & Francis Group, 2016.
Montero, José-María, et al. Spatial and spatio-Temporal geostatistical modeling and kriging. Wiley, 2015.
ANDRIENKO, NATALIA. EXPLORATORY ANALYSIS OF SPATIAL AND TEMPORAL DATA: a systematic approach. SPRINGER, 2016.
Data Source: NYC Open Data (https://data.cityofnewyork.us/browse?q=tree+census)