Contents

About Data

Spatial Point Pattern Analysis

  1. Kernel Estimation

  2. Contour map of Intensity

  3. Perspective plot of intensity

  4. Standard error of intensity

Intensity/density of trees in NYC over 20 years

References


About Data

I am interested how dense NYC is with trees. I came up with two data sets which are about NYC trees and very interesting. Both data sets are in public domain(https://data.cityofnewyork.us/browse?q=tree+census). There are two censuses about NYC trees, one from 1995 and onother from 2015, which is 20 years later after 1995 census. Let’s find out whether NYC is much more dense with tress or not!

1995 Census

There are about 516989 entries with 30 marks. I will just mention only important ones.

2015 Census

There are about 683788 entries with 45 marks. Again i will mention the important ones


Spatial Point Pattern Analysis

Intensity

Dividing the total number of points by the area of the survey region gives the average density of points per unit area. In different contexts, the average number of points per unit area could be a measure of abundance (for example, the abundance of virus particles on a cell surface), density (of light-sensitive cells in the retina) and productivity (of crops) etc. The standard generic term is intensity. In our case, number of points(latitude and longitude of tree’s location) per certain area.

Complete Spatial Randomness(CSR) and Quadrat Counting

In order to do spatial point pattern analysis we need CSR. A simple way to check inhomogeneity or not CSR (complete spatial randomness) or to see the number of points are not uniformly distributed is Quadrat Counting. If it is homogeneous, we no longer can use other spatial test on the data sets.

In quadrat counting, the observation window W is divided into subregions \(B_{1},...,B_{m}\) called quadrats. For simplicity, suppose the regions have equal area. We count the numbers of points falling in each quadrat,\(n_{j} = n(x\bigcap B_{j})\)for j = 1, . . . , m. Since these counts are unbiased estimators of the corresponding expected values \(E[n(x\bigcap B_{j})]\), they should be equal ‘on average’ if the intensity is homogeneous. In particular, any apparent spatial trend in the counts \(n_{j}\) suggests that the intensity is inhomogeneous.

Lets check quadrat counting for 1995 census

Hypothesis Testing

## 
##  Chi-squared test of CSR using quadrat counts
##  Pearson X2 statistic
## 
## data:  objrescale95
## X2 = 156270, df = 3, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 2 by 2 grid of tiles

Since p-value is close to Zero \(\le\) significance level= 0.05 we reject the null hypothesis. Thus, it is an inhomogeneous process.


Smoothing Estimation of Intensity Function

Kernel Estimation: If the point process has an intensity function \(\lambda(u)\), this function can be estimated nonparametrically by kernel estimation.

The usual kernel estimators of the intensity function are:

uncorrected: \[\lambda^{(0)}(u)= \sum_{i=1}^{n}K(u-x_{i})\]

uniformly corrected: \[\lambda^{(U)}(u)= \frac{1}{e(u)}\sum_{i=1}^{n}K(u-x_{i}) \]

Diggle’s correction:\[\lambda^{(D)}(u)= \sum_{i=1}^{n}\frac{1}{e(x_{i})}K(u-x_{i}) \]

for any spatial location u inside the window W , where K(u) is the kernel function and \[ e(u)=\int_W K(u-v)dv\]

is a correction for bias due to edge effects. Outside the window W , the estimated intensity is zero. The kernel K must be a probabiity density, that is, K(u)\(\ge\) 0 for all locations u, and \(\int_{R^2}K(u)du=1\). A common choice is the isotropic Gaussian (Normal distribution) probability density.

Intensity Map of trees in 1995

If we look at intensity map, we can see that the more yellow represents more density. Perspective map visualizes intensity in three dimentinal ways, the higher the peak the more density it has and contour map visualizes the intensity in 2D ways.

Perspective Plot of Intensity Function, 1995

Contour Map of Intensity Function, 1995

Standard Error for Intensity, 1995

We did a good estimation since the standard error is very low.


Lets look at 2015 census

Intensity Map of Trees in 2015

Perspective Plot of Intensity Function, 2015

Contour Map of Intensity Function, 2015

Standard Error for Intensity, 2015

Again this is a good estimation since the standard error is very low.

Now lets compare how the trees density changed over 20 years

1995

2015

We can see that NYC is much more densed with trees now compare to 20 years ago


Code/Appendix

## Getting 1995 data set
columname <- c("RecordId","Address","House_Number","Postcode_Original","Community",
        "Board_Original","Species","Diameter","Condition","Wires","Sidewalk_Condition",
      
        "Support_Structure","Borough","X","Y","Longitude","Latitude", "CB_New", 
        "Zip_New","CensusTract_2010", "CensusBlock_2010","NTA_2010", 
        "SegmentID","Spc_Common","Spc_Latin","Location","Council District","BIN","BBL")
               
type_col <- c("integer","character", "integer", "character", "integer","integer", 
              "factor","factor","integer",rep("factor",5),rep("integer",2),
              rep("character",2), rep("integer",4), "character",
              "integer", rep("character",3),rep("integer",3))

trees1995 <- read.csv(file="/Users/azizur/Desktop/stat 790/data/1995_Street_Tree_Census.csv")
trees1995.1 <- trees1995[c(7,8,10,11,12,13,14,17,18)]
trees1995.final <- na.exclude(trees1995.1)

## Getting 2015 data set
columname <- c("tree_id", "block_id","created_at","tree_dbh","stump_diam","curb_loc","status","health",
               "spc_latin","spc_common","steward","guards","sidewalk","user_type",
               "problems","root_stone","root_grate", "root_other",
               "trunk_wire","trnk_light", "trnk_other","brch_light",
               "brch_shoe","brch_other","address","postcode","zip_city","community board",
               "borocode","borough","cncldist","st_assem","st_senate",
               "nta","nta_name","boro_ct","state","latitude","longitude","x_sp",
               "y_sp","council district","census tract","bin","bbl")

type_col <- c("integer","interger","character", "integer", "integer",
              "factor","factor", rep("factor",17),
              "factor","integer","factor", rep("integer",2), "factor",
              rep("integer",3), rep("factor",2),"integer",
              "factor",rep("character",2), rep("integer",6))

trees2015 <- read.csv(file="/Users/azizur/Desktop/stat 790/data/2015_Street_Tree_Census_-_Tree_Data.csv")
trees2015.1 <- trees2015[c(6,7,8,12,13,15,26,27,30,38,39)]
trees2015.final.2 <- na.exclude(trees2015.1)

## Creating a ppp object which will help us to do spatatial statistics tests 
library(rgdal)
library(sp)
data95 = data.frame(long= trees1995.final$Longitude, lat=trees1995.final$Latitude)
coordinates(data95) <- ~ long+lat
proj4string(data95) <- CRS("+init=epsg:4326")
data.proj95 <- spTransform(data95, CRS("+init=epsg:2831"))
tj95 <- as.data.frame(data.proj95)
#install.packages("/Users/azizur/Desktop/spatstat 3", repos = NULL, type="source")
library(spatstat)

pppobj95 <- ppp(tj95$long, tj95$lat, c(278389, 325297), c(36872, 82874))
#### Quadrat Counting and inhomogeneity #####
objrescale95 <- rescale(pppobj95)
Quadrat_counting95 <- quadratcount(objrescale95, nx=2, y=2)

plot(Quadrat_counting95)
plot(intensity(Quadrat_counting95, image=TRUE))

quadrat.test(objrescale95,2,2) # gives p vale ###

#### Kernel estimation####
intensity_map95 <- density(pppobj95, sigma= 500)
plot(intensity_map95)

##Perspective Plot of Intensity Function, 1995 
persp.im(intensity_map95)

##Contour Map of Intensity Function, 1995 
contour.im(intensity_map95)

##Standard Error for Intensity, 1995
std.err.for.intsity95 <- density(pppobj95, 80, se=TRUE)$SE
plot(std.err.for.intsity95 )

## 2015 data set
library(rgdal)
library(sp)
data = data.frame(long= trees2015.final.2$longitude, lat=trees2015.final.2$latitude)
coordinates(data) <- ~ long+lat
proj4string(data) <- CRS("+init=epsg:4326")
data.proj <- spTransform(data, CRS("+init=epsg:2831"))
tj <- as.data.frame(data.proj)

# install.packages("devtools")
# library(devtools)
install.packages("/Users/azizur/Desktop/spatstat 3", repos = NULL, type="source")
library(spatstat)

pppobj <- ppp(tj$long, tj$lat, c(278389, 325297), c(36872, 82874))

##Intensity Map of Trees in 2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)

##Perspective Plot of Intensity Function, 2015
persp.im(intensity_map)

##Contour Map of Intensity Function, 2015
contour.im(intensity_map)

####Standard Error for Intensity, 2015
std.err.for.intsity <- density(pppobj, 80, se=TRUE)$SE
plot(std.err.for.intsity )

## 2015
library(rgdal)
library(sp)
data = data.frame(long= trees2015.final.2$longitude, lat=trees2015.final.2$latitude)
coordinates(data) <- ~ long+lat
proj4string(data) <- CRS("+init=epsg:4326")
data.proj <- spTransform(data, CRS("+init=epsg:2831"))
tj <- as.data.frame(data.proj)

# install.packages("devtools")
# library(devtools)
#install.packages("/Users/azizur/Desktop/spatstat 3", repos = NULL, type="source")
library(spatstat)

pppobj <- ppp(tj$long, tj$lat, c(278389, 325297), c(36872, 82874))

##Intensity Map of Trees in 2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)

##Perspective Plot of Intensity Function, 2015
persp.im(intensity_map)

##Contour Map of Intensity Function, 2015
contour.im(intensity_map)

####Standard Error for Intensity, 2015
std.err.for.intsity <- density(pppobj, 80, se=TRUE)$SE
plot(std.err.for.intsity )

## Comparing trees density changed over 20 years 
##1995
intensity_map95 <- density(pppobj95, sigma= 500)
plot(intensity_map95)
##2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)

##1995
intensity_map95 <- density(pppobj95, sigma= 500)
plot(intensity_map95)

##2015
intensity_map <- density(pppobj, sigma= 500)
plot(intensity_map)

References