Chapter 1 Introduction

In the contemporary urban landscape, the analysis of spatial point processes has become increasingly important, with Uber pick-up points serving as a paradigmatic example of such patterns. This thesis focuses on estimating the intensity function of spatial point processes over diverse state spaces, such as R^2 and linear networks. A central theme is the exploration and visualization of the spatial dynamics that characterize these phenomena.

Accurately estimating the intensity function is essential for understanding the underlying distribution of Uber pick-up points, and it supports informed decision-making in urban planning and resource allocation. Traditional analyses often fall short when faced with irregular windows and complex linear networks. This research confronts these challenges directly: by examining irregular windows and time-ordered point patterns, the study aims to provide a comprehensive understanding of the spatial processes inherent in services like Uber.

A distinctive aspect of this research involves the detailed discussion of various kernel-based intensity estimators within the statistical computing environment R. Kernel-based methods offer a versatile and flexible approach to estimating the intensity function by assigning weights to neighboring events. The thesis meticulously examines these estimators, providing an in-depth analysis of their strengths, weaknesses, and applications within the context of spatial point processes. Through this exploration, the research contributes to the methodological advancements in spatial statistics, offering practitioners a comprehensive toolkit for analyzing complex point patterns.

The visual representation of intensity functions emerges as a crucial facet of this investigation. Visualization serves not only as a means of presenting the intricate spatial dynamics but also as a tool for gaining intuitive insights into the underlying patterns. The thesis emphasizes the importance of visualization techniques in conveying the intensity of Uber pick-up points across different state spaces and temporal sequences. By leveraging advanced visualization methods, this research aims to enhance the interpretability of intensity functions, providing a visually compelling narrative that aids both researchers and urban planners in grasping the complexities of spatial point processes.

In summary, this thesis pioneers an in-depth exploration of Uber pick-up point patterns, extending its analysis to irregular windows, time-ordered point patterns, and diverse state spaces. Through a detailed discussion of kernel-based intensity estimators in R and a focus on innovative visualization techniques, the research endeavors to not only contribute to the theoretical understanding of spatial point processes but also to provide practical tools for effective decision-making in urban planning.

1.1 References

Mateu, J. (Department of Mathematics, University Jaume I, Castellón, Spain) and Moradi, M. (Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden). Non-parametric intensity estimation for spatial point patterns with R.

Chapter 2 Dataset and Preprocessing

For this project, I will use two datasets:

2.1 Uber data

This dataset contains raw records of Uber pickups in New York City from April 2014. The data has the following columns:

  • Date/Time : The date and time of the Uber pickup

  • Lat : The latitude of the Uber pickup

  • Lon : The longitude of the Uber pickup

  • Base : The TLC base company code affiliated with the Uber pickup
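
The code chunks in this document rely on a number of packages that are never loaded explicitly. A minimal setup chunk along these lines should suffice; the exact package set is inferred from the function calls, not taken from the original source:

library(here)       # here(): project-relative file paths
library(dplyr)      # filter(), mutate(), %>%
library(lubridate)  # day(), hour(), weekdays(), week()
library(readr)      # read_rds()
library(sf)         # read_sf(), st_transform(), st_as_sf(), as_Spatial()
library(sp)         # coordinates(), over(), spplot()
library(raster)     # crs(), raster(), stack()
library(spatstat)   # ppp/lpp objects, density.ppp(), bw.* selectors
library(maptools)   # as.linnet.SpatialLines() (older sp/spatstat bridge)
library(viridis)    # viridis() color palettes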

# Uber pickups in April 2014 in New York City
uber = read.csv(here("data/uber-raw-data-apr14.csv"))

Once loaded, we can take a glimpse at the Uber table:

str(uber)
## 'data.frame':    564516 obs. of  4 variables:
##  $ Date.Time: chr  "4/1/2014 0:11:00" "4/1/2014 0:17:00" "4/1/2014 0:21:00" "4/1/2014 0:28:00" ...
##  $ Lat      : num  40.8 40.7 40.7 40.8 40.8 ...
##  $ Lon      : num  -74 -74 -74 -74 -74 ...
##  $ Base     : chr  "B02512" "B02512" "B02512" "B02512" ...
head(uber)
##          Date.Time     Lat      Lon   Base
## 1 4/1/2014 0:11:00 40.7690 -73.9549 B02512
## 2 4/1/2014 0:17:00 40.7267 -74.0345 B02512
## 3 4/1/2014 0:21:00 40.7316 -73.9873 B02512
## 4 4/1/2014 0:28:00 40.7588 -73.9776 B02512
## 5 4/1/2014 0:33:00 40.7594 -73.9722 B02512
## 6 4/1/2014 0:33:00 40.7383 -74.0403 B02512

We can see that 564,516 Uber pickups were recorded during April 2014 in New York.

2.2 Brooklyn spatial data

This section prepares the spatial data for Brooklyn, New York. The code loads geographic data for New York State, extracts the Brooklyn area, and plots spatial features such as places, roads, and land use; the x- and y-axis ranges and the Uber pickup points are then restricted to the Brooklyn area.

# New York State places (polygons)
newyork.place = sf::read_sf(dsn="data/new-york-latest-free.shp/",
                            layer="gis_osm_places_a_free_1")
# extract the Brooklyn area
br.polygon = newyork.place %>% filter(name=="Brooklyn") 

# convert to an sp object
br.polygon.sp = as_Spatial(br.polygon)

In the next step, I filtered the Uber pickups dataset to retain only those occurring within the Brooklyn area, leaving 61,668 pickups that fall inside the Brooklyn boundary.

# keep only the pickups that fall within Brooklyn
uber.br = uber
coordinates(uber.br) = c("Lon","Lat")
crs(uber.br) = crs(br.polygon)
uber.br$id = over(uber.br, br.polygon.sp)
uber.br = data.frame(uber.br)
uber.br = uber.br[!is.na(uber.br$id.osm_id),]
# 61,668 points remain in the Brooklyn area

# ensure English names from weekdays()
Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"
uber.br$Date.Time <- as.POSIXct(uber.br$Date.Time, format = "%m/%d/%Y %H:%M:%S")

uber.br=uber.br %>%
  mutate(day=day(Date.Time),
         hour=hour(Date.Time),
         weekdays=weekdays(Date.Time),
         week=week(Date.Time))

head(uber.br)
##               Date.Time     Lat      Lon   Base id.osm_id id.code id.fclass
## 23  2014-04-01 04:15:00 40.6561 -73.9531 B02512   9691750    1010    suburb
## 65  2014-04-01 06:15:00 40.7204 -73.9545 B02512   9691750    1010    suburb
## 89  2014-04-01 06:44:00 40.7231 -73.9442 B02512   9691750    1010    suburb
## 102 2014-04-01 07:00:00 40.6990 -73.9951 B02512   9691750    1010    suburb
## 178 2014-04-01 08:19:00 40.7214 -73.9475 B02512   9691750    1010    suburb
## 193 2014-04-01 08:32:00 40.6819 -73.9358 B02512   9691750    1010    suburb
##     id.population  id.name optional day hour weekdays week
## 23        2736074 Brooklyn     TRUE   1    4  Tuesday   14
## 65        2736074 Brooklyn     TRUE   1    6  Tuesday   14
## 89        2736074 Brooklyn     TRUE   1    6  Tuesday   14
## 102       2736074 Brooklyn     TRUE   1    7  Tuesday   14
## 178       2736074 Brooklyn     TRUE   1    8  Tuesday   14
## 193       2736074 Brooklyn     TRUE   1    8  Tuesday   14
br.polygon <- st_transform(br.polygon, crs = 6257) # transform coordinates
# convert to owin class
winn.br <- as.owin(br.polygon) 

coordinates(uber.br)=c("Lon","Lat")
uber.br <- st_as_sf(uber.br)
uber.br <- st_set_crs(uber.br, 4326)
uber.br <- st_transform(uber.br, crs = 6257)
xy_uber <- st_coordinates(uber.br) # retrieve coordinates
# convert to ppp class
X_uber <- as.ppp(xy_uber, winn.br)
plot(X_uber, pch = 20, main = "Uber Pick-up Points", lwd = 1, cex = 0.3)
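
With the point pattern constructed, a useful baseline before any smoothing is the overall average intensity, i.e. the number of points per unit area of the window; the kernel estimates in the next chapter vary spatially around this constant level. A quick sketch:

# Constant (homogeneous) intensity estimate: points per unit area.
npoints(X_uber) / area(Window(X_uber))   # equivalently, intensity(X_uber)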

Chapter 3 Spatial point patterns on R^2

3.1 Kernel-based bandwidth selection

Below we estimate the smoothing bandwidth for the Uber pick-up data using the reviewed methods:

  • Scott's Rule : A heuristic bandwidth selection method based on the coordinate-wise sample standard deviation and the number of data points, aiming to find a reasonable smoothing parameter for kernel density estimation (a minimal manual computation is sketched after this list).

  • Likelihood Cross-Validation : A technique that maximizes the likelihood function while leaving out each data point iteratively, helping to select a bandwidth that balances model fit and complexity.

  • Diggle's Approach : A method proposed by Diggle (1985) that selects the bandwidth by minimizing an estimate of the mean squared error of the intensity estimator.

  • Cronie and van Lieshout's Criterion : A criterion that selects the bandwidth by matching the window area to the sum of reciprocals of the estimated intensity at the data points, aiming to strike a balance between over-smoothing and under-smoothing.
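
To make Scott's rule concrete: for a point pattern in R^2 (d = 2), the rule of thumb sets the bandwidth for each coordinate to n^(-1/(d+4)) = n^(-1/6) times that coordinate's standard deviation. The following manual computation is a sketch and should only roughly reproduce bw.scott(X_uber), up to spatstat's internal refinements:

# Scott's rule by hand (a sketch, not spatstat's exact code):
# n^(-1/6) times the coordinate-wise standard deviation.
n <- npoints(X_uber)
c(sigma.x = sd(X_uber$x) * n^(-1/6),
  sigma.y = sd(X_uber$y) * n^(-1/6))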

# Scott’s rule
scott.uber <- bw.scott(X_uber) 

# Likelihood cross validation; running time is long
ppl.uber <- bw.ppl(X_uber) 

# Diggle's approach
diggle.uber <- bw.diggle(X_uber) 

# Cronie and van Lieshout's criterion; running time is also long
CvL.uber <- bw.CvL(X_uber) 

parameters_list <- list(scott.uber,
                        ppl.uber,
                        diggle.uber,
                        CvL.uber)
# save to an RDS file
#saveRDS(parameters_list, file = "output/parameter_values.rds")
a = read_rds("output/parameter_values.rds")
a
## [[1]]
##  sigma.x  sigma.y 
## 324.3484 455.0993 
## 
## [[2]]
##    sigma 
## 21.98301 
## 
## [[3]]
##    sigma 
## 1.113344 
## 
## [[4]]
##    sigma 
## 4228.733

Bandwidths from the different methods:

  • Scott's rule: 324.3484 m, 455.0993 m

  • Likelihood cross-validation (ppl): 21.98301 m

  • Diggle's approach: 1.113344 m

  • Cronie and van Lieshout's criterion (CvL): 4228.733 m

The selected bandwidths span almost four orders of magnitude, from about 1 m (Diggle's approach) to over 4 km (Cronie and van Lieshout's criterion), showing how strongly the choice of selection method determines the degree of smoothing.

3.2 Smoothed Intensity estimation

Having estimated the bandwidth with different methods, I next estimate the intensity of the Uber pick-up point data using kernel intensity estimators with both the Gaussian and the Epanechnikov kernel. The function density.ppp provides kernel-based intensity estimation for point patterns on R^2.
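
For orientation, the estimator computed by density.ppp with the default (uniform) edge correction has the standard form

$$\hat\lambda_U(u) = \frac{1}{c_W(u)} \sum_{i=1}^{n} \kappa_\sigma(u - x_i), \qquad c_W(u) = \int_W \kappa_\sigma(u - v)\,\mathrm{d}v,$$

where $\kappa_\sigma$ is the kernel with bandwidth $\sigma$, $x_1, \ldots, x_n$ are the observed points, and $W$ is the observation window; this is the textbook form and should match spatstat's output up to discretization.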

3.2.1 Gaussian kernel and uniformly-edge-corrected estimator

Employing the Gaussian kernel and the uniformly-edge-corrected estimator, I examine the spatial patterns of Uber pick-up points under the four bandwidths obtained above: Scott's rule, likelihood cross-validation, Diggle's approach, and Cronie and van Lieshout's criterion.

scott.uber <- density.ppp(X_uber,  
                            sigma = a[[1]],
                            leaveoneout = FALSE,
                            positive = TRUE)

ppl.uber <- density.ppp(X_uber,  
                          sigma = a[[2]], 
                          leaveoneout = FALSE,
                          positive = TRUE)

diggle.uber <- density.ppp(X_uber,  
                             sigma =a[[3]],
                             leaveoneout = FALSE,
                             positive = TRUE)

CvL.uber <- density.ppp(X_uber,  
                          sigma = a[[4]], 
                          leaveoneout = FALSE,
                          positive = TRUE)
inten.gua.uber <- stack(raster(scott.uber), 
                     raster(ppl.uber),
                     raster(diggle.uber), 
                     raster(CvL.uber))

inten.gua.uber <- inten.gua.uber*10^3

names(inten.gua.uber) <- c("scott_gaus_U",
                        "ppl_gaus_U", 
                        "diggle_gaus_U",
                        "CvL_gaus_U")
                        
at <- c(seq(0, 1.4, 0.2), 2,4,7)

# Plot the intensity using spplot
spplot(inten.gua.uber, at = at, scales = list(draw = FALSE),
       col.regions = rev(viridis(20,option = "A")),
       colorkey = list(labels = list(cex = 3)),
       par.strip.text = list(cex = 2),
       layout = c(2, 2),
       main = list(label = "Uber Pick-up Intensity with Gaussian kernel and uniformly-edge-corrected estimator", cex = 1.5))

# Save the plot
dev.copy(png, "pic/gua.intensity.png", width = 800, height = 800)
dev.off()
Kernel-based intensity estimation for Uber pick-up points in Brooklyn during April 2014.

In the “Gaussian kernel and uniformly-edge-corrected estimator” plot, a distinct contrast is evident among the results obtained using the four different bandwidth selection methods:

  • Scott’s rule: The map reveals two areas of high intensity situated to the north and north-west of Brooklyn. This highlights a non-uniform intensity distribution in the Uber pick-up locations data.

  • Likelihood cross-validation and Diggle’s approach: With bandwidths of 21.98301 meters and 1.113344 meters, respectively, these methods generate under-smoothed maps. The similarity between the two maps is striking, with little discernible difference. Both maps exhibit finer details, indicating a higher resolution due to the relatively smaller bandwidths.

  • Cronie and Van Lieshout’s criterion: Utilizing the largest bandwidth of 4228.733 meters, this approach results in over-smoothed maps. The map loses some details about the intensity distribution, presenting a generalized view of the spatial pattern. The larger bandwidth leads to a more aggregated representation, potentially obscuring localized variations in the intensity of the events.

In this case, I therefore recommend the Scott's rule map for its balanced degree of smoothing, avoiding both under-smoothing and over-smoothing. This choice preserves essential details while capturing the overall tendency in the spatial distribution of events.

This comparative analysis emphasizes the importance of choosing an appropriate bandwidth selection method, as it significantly influences the interpretation of spatial patterns and the level of detail captured in the resulting intensity maps.

3.2.2 Trying different kernels and estimators

  • Gaussian kernel and Jones-Diggle estimator

Now I turn to estimating the intensity with the Jones-Diggle estimator (diggle = TRUE in density.ppp), which applies the edge correction at each data point rather than uniformly at the evaluation location, and thereby accounts for boundary effects in the intensity estimate.
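
In the notation used above for the uniform correction, the Jones-Diggle estimator moves the edge-correction factor inside the sum:

$$\hat\lambda_{JD}(u) = \sum_{i=1}^{n} \frac{\kappa_\sigma(u - x_i)}{c_W(x_i)}.$$

Unlike the uniform version, this estimator is mass-preserving: its integral over the window $W$ equals the number of observed points $n$.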

scott_dig.uber <- density.ppp(X_uber,  
                                sigma =a[[1]],
                                leaveoneout = FALSE,
                                positive = TRUE,
                                diggle = TRUE)

ppl_dig.uber <- density.ppp(X_uber,  
                              sigma = a[[2]],
                              leaveoneout = FALSE,
                              positive = TRUE,  
                              diggle = TRUE)


diggle_dig.uber <- density.ppp(X_uber,  
                                 sigma =a[[3]],
                                 leaveoneout = FALSE,
                                 positive = TRUE, 
                                 diggle = TRUE)

cvl_dig.uber <- density.ppp(X_uber,  
                              sigma =a[[4]],
                              leaveoneout = FALSE,
                              positive = TRUE, 
                              diggle = TRUE)

  • Epanechnikov kernel and uniformly-edge-corrected estimator

The Epanechnikov kernel is another smoothing function used in kernel density estimation. We can compare the default Gaussian kernel with the Epanechnikov kernel to see the difference.
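
For reference, in one dimension the Epanechnikov kernel is the truncated parabola

$$\kappa(t) = \tfrac{3}{4}\,(1 - t^2) \;\; \text{for } |t| \le 1, \qquad \kappa(t) = 0 \;\; \text{otherwise},$$

so, unlike the Gaussian kernel, it has compact support; spatstat uses the correspondingly rescaled two-dimensional version.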

scott_epa.uber <- density.ppp(X_uber,  
                            sigma = a[[1]],
                            leaveoneout = FALSE,
                            positive = TRUE,
                            kernel = "epanechnikov")

ppl_epa.uber <- density.ppp(X_uber,  
                          sigma = a[[2]], 
                          leaveoneout = FALSE,
                          positive = TRUE,
                          kernel = "epanechnikov")

diggle_epa.uber <- density.ppp(X_uber,  
                             sigma =a[[3]],
                             leaveoneout = FALSE,
                             positive = TRUE,
                             kernel = "epanechnikov")

cvL_epa.uber <- density.ppp(X_uber,  
                          sigma = a[[4]], 
                          leaveoneout = FALSE,
                          positive = TRUE,
                          kernel = "epanechnikov")

  • Epanechnikov kernel and Jones-Diggle estimator

scott_dig_epa.uber <- density.ppp(X_uber,  
                                sigma =a[[1]],
                                leaveoneout = FALSE,
                                positive = TRUE,
                                diggle = TRUE,
                                kernel = "epanechnikov")

ppl_dig_epa.uber <- density.ppp(X_uber,  
                              sigma = a[[2]],
                              leaveoneout = FALSE,
                              positive = TRUE,  
                              diggle = TRUE,
                              kernel = "epanechnikov")


diggle_dig_epa.uber <- density.ppp(X_uber,  
                                 sigma =a[[3]],
                                 leaveoneout = FALSE,
                                 positive = TRUE, 
                                 diggle = TRUE,
                                 kernel = "epanechnikov")

cvl_dig_epa.uber <- density.ppp(X_uber,  
                              sigma =a[[4]],
                              leaveoneout = FALSE,
                              positive = TRUE, 
                              diggle = TRUE,
                              kernel = "epanechnikov")

3.2.3 Visualizing different kernel and estimator combinations

Having employed diverse bandwidth selection approaches and kernel functions, and having obtained varied intensity estimates as a result, I now present these estimates visually and analyze them, highlighting potential discrepancies.

sp_int.uber <- stack(raster(scott.uber), 
                     raster(ppl.uber),
                     raster(diggle.uber), 
                     raster(CvL.uber), 
                     
                     raster(scott_dig.uber),
                     raster(ppl_dig.uber),
                     raster(diggle_dig.uber), 
                     raster(cvl_dig.uber),
                     
                     raster(scott_epa.uber),
                     raster(ppl_epa.uber),
                     raster(diggle_epa.uber),
                     raster(cvL_epa.uber),
                     
                     raster(scott_dig_epa.uber),
                     raster(ppl_dig_epa.uber),
                     raster(diggle_dig_epa.uber),
                     raster(cvl_dig_epa.uber))

sp_int.uber <- sp_int.uber*10^3

names(sp_int.uber) <- c("scott_gaus_U",
                        "ppl_gaus_U", 
                        "diggle_gaus_U",
                        "CvL_gaus_U",
                        
                        "scott_gaus_JD",  
                        "ppl_gaus_JD", 
                        "diggle_gaus_JD",
                         "CvL_gaus_JD",
                        
                        "scott_epa_U",
                        "ppl_epa_U", 
                        "diggle_epa_U",
                        "CvL_epa_U",
                        
                        "scott_epa_JD", 
                        "ppl_epa_JD", 
                        "diggle_epa_JD", 
                        "CvL_epa_JD")
                        
at <- c(seq(0, 1.4, 0.2), 2, 4, 7,12)
spplot(sp_int.uber,  at = at, scales = list(draw = FALSE), 
       col.regions = rev(viridis(20,option = "A")),
       colorkey = list(labels = list(cex = 3)),
       par.strip.text = list(cex = 2),
         layout = c(4, NA),
       main = list(label = "Uber Pick-up Intensity with different kernels and estimators", cex = 1.5))

# Save the plot
dev.copy(png, "pic/compare.estimator.png", width = 1000, height = 1200)
dev.off()
Kernel-based intensity estimation for Uber pick-up points in Brooklyn during April 2014. Layer names start with the bandwidth selection approach, followed by the employed kernel and the edge correction.

From this “Different kernels and estimators” plot, we can see:

  • Scott's rule: There is no discernible difference between the Gaussian and Epanechnikov kernels, nor between the uniformly-edge-corrected and Jones-Diggle estimators. This implies that, at this bandwidth, the choice of kernel and estimator has minimal impact on the resulting intensity maps.

  • Likelihood cross-validation and Diggle's approach: Despite employing various kernels and estimators, these methods consistently produce under-smoothed maps, a consequence of the small bandwidths they select. The limited smoothing yields intensity maps that retain detailed structures but may overlook broader trends. While these methods offer flexibility in capturing intricate spatial features, their performance is highly sensitive to the bandwidth, requiring careful consideration to balance local detail against global trend representation.

  • Cronie and Van Lieshout's criterion: Noticeable disparities appear between the uniformly-edge-corrected and Jones-Diggle estimators. However, determining the superior estimator is difficult given the substantial bandwidth of 4228.733 meters, which over-smooths the resulting intensity maps, obscures finer spatial details, and potentially hides localized variations in the point pattern.

This comparative analysis underscores the critical role of selecting an appropriate bandwidth and kernel-estimator combination, impacting the interpretation of spatial patterns and level of detail in intensity maps. The study highlights the nuanced trade-offs between under-smoothing and over-smoothing, emphasizing the need for careful consideration to strike a balance tailored to the specific characteristics of the dataset and the analytical objectives.

Chapter 4 Spatial point patterns on linear networks

The relationship between Uber pick-up points and the Brooklyn main road network introduces a novel perspective into urban transportation dynamics. Understanding how these pick-up points interact with the intricate structure of the road network sheds light on mobility patterns, congestion hotspots, and potential areas of high demand. This concept not only explores the spatial distribution of pick-up points but also delves into the network’s influence on user behavior and the efficiency of the overall transportation system. It provides valuable insights for optimizing ride-sharing services and improving the urban mobility experience.

4.1 Preparing data: point pattern on linear network

#the boundary
winn.br
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
# the road linestrings
net.br = sf::read_sf(dsn="data/new-york-latest-free.shp/",
                            layer="gis_osm_roads_free_1")
net.br = st_transform(net.br,crs=6257)
# keep only the main road classes
net.br = net.br %>% filter(fclass %in% c("motorway","primary","residential"))
net.br = as.linnet.SpatialLines(as_Spatial(net.br))
## Warning: Repeated vertices (on the same line) were removed
## Warning: Duplicated segments were ignored
Window(net.br) = winn.br
# networks on the brooklyn boundary window
net.br = as.linnet(net.br, W= winn.br)
net.br
## Linear network with 42938 vertices and 47333 lines
## Line segments are marked
## Enclosing window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units

The linear network (the Brooklyn road network) consists of 42,938 vertices and 47,333 line segments, each marked with specific attributes. The enclosing window is the polygonal Brooklyn boundary, with a bounding rectangle spanning [963014, 981944.6] x [4977346, 4998278] units.
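
As a quick sanity check (a sketch; in spatstat, volume() applied to a linnet returns the total edge length), the total street length of the network is the natural denominator when interpreting linear-network intensities as points per unit length:

# Total length of the Brooklyn network, in units of the projected CRS.
volume(net.br)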

# uber points
uber.br
## Simple feature collection with 61668 features and 12 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 964406.4 ymin: 4979850 xmax: 979539.2 ymax: 4998196
## Projected CRS: MAGNA-SIRGAS / Medellin urban grid
## First 10 features:
##               Date.Time   Base id.osm_id id.code id.fclass id.population
## 23  2014-04-01 04:15:00 B02512   9691750    1010    suburb       2736074
## 65  2014-04-01 06:15:00 B02512   9691750    1010    suburb       2736074
## 89  2014-04-01 06:44:00 B02512   9691750    1010    suburb       2736074
## 102 2014-04-01 07:00:00 B02512   9691750    1010    suburb       2736074
## 178 2014-04-01 08:19:00 B02512   9691750    1010    suburb       2736074
## 193 2014-04-01 08:32:00 B02512   9691750    1010    suburb       2736074
## 219 2014-04-01 09:10:00 B02512   9691750    1010    suburb       2736074
## 284 2014-04-01 10:53:00 B02512   9691750    1010    suburb       2736074
## 318 2014-04-01 11:24:00 B02512   9691750    1010    suburb       2736074
## 333 2014-04-01 11:44:00 B02512   9691750    1010    suburb       2736074
##      id.name optional day hour weekdays week                 geometry
## 23  Brooklyn     TRUE   1    4  Tuesday   14 POINT (971721.3 4989061)
## 65  Brooklyn     TRUE   1    6  Tuesday   14   POINT (971472 4996172)
## 89  Brooklyn     TRUE   1    6  Tuesday   14 POINT (972336.9 4996473)
## 102 Brooklyn     TRUE   1    7  Tuesday   14 POINT (968083.4 4993797)
## 178 Brooklyn     TRUE   1    8  Tuesday   14 POINT (972061.5 4996284)
## 193 Brooklyn     TRUE   1    8  Tuesday   14 POINT (973131.6 4991918)
## 219 Brooklyn     TRUE   1    9  Tuesday   14 POINT (971878.3 4995222)
## 284 Brooklyn     TRUE   1   10  Tuesday   14 POINT (970414.6 4992984)
## 318 Brooklyn     TRUE   1   11  Tuesday   14 POINT (972408.7 4993055)
## 333 Brooklyn     TRUE   1   11  Tuesday   14 POINT (968188.1 4986541)
# uber points on net.br linear networks
X_uber.linear <- lpp(st_coordinates(uber.br), L = net.br)
## Warning: data contain duplicated points
#X_uber.linear <- unique(X_uber.linear)
X_uber.linear
## Point pattern on linear network
## 61668 points
## Linear network with 42938 vertices and 47333 lines
## Line segments are marked
## Enclosing window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units

4.1.1 Visualization: Point pattern on linear network

# plot boundary
plot(br.polygon$geometry, lwd = 2, col = "lightgray")

# add points
plot(X_uber.linear, pch = 20, main = "", lwd = 1, cex = 0.5, add = TRUE, 
     cols = "red", col = "steelblue")

# title and legend
title("Uber Pick-Up Points on Linear Network", cex.main = 1.5)
legend("topright", legend = c("Linear Network", "Uber Pick-Up Points"), 
       pch = c(-1, 20), col = c("steelblue", "red"), lty = c(1, 0), bty = "n")

axis(1, cex.axis = 0.8)
axis(2, cex.axis = 0.8)

grid(col = "lightgray", lty = 2)

# save
dev.copy(png, "pic/uber_pickup_linear_network.png", width = 1200, height = 800)
dev.off()
Uber pick-up points on the Brooklyn road network. The boundary of Brooklyn is in gray, the street network in blue, and red dots mark the Uber pick-up locations.

4.2 Intensity estimation: Point pattern on linear network

Utilizing uniformly-edge-corrected estimators, I focus on the Uber pick-up points in Brooklyn during April 2014, applying two prominent bandwidth selection methods on the linear network: likelihood cross-validation and Scott's rule.

Likelihood cross-validation offers a robust approach by assessing the model’s predictive performance, optimizing bandwidth selection to achieve reliable intensity estimates.
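
For reference, the criterion maximized by likelihood cross-validation (the standard form, as documented for bw.ppl and bw.lppl in spatstat) is

$$\mathrm{LCV}(\sigma) = \sum_{i=1}^{n} \log \hat\lambda_{-i}^{\sigma}(x_i) - \int \hat\lambda^{\sigma}(u)\,\mathrm{d}u,$$

where $\hat\lambda_{-i}^{\sigma}$ is the leave-one-out estimate at data point $x_i$ and the integral is taken over the window (or over the network in the linear-network case).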

On the other hand, Scott’s rule, a widely used heuristic, provides a straightforward criterion for bandwidth determination based on the sample’s standard deviation and sample size.

scott_uber <- bw.scott.iso(X_uber.linear) # Scott's rule
ppl_uber <- bw.lppl(X_uber.linear) # likelihood cross validation

parameters_list02 <- list(scott_uber,
                          ppl_uber)
#saveRDS(parameters_list02, file = "output/parameter_values02.rds")
b = read_rds("output/parameter_values02.rds")

Bandwidths from these two methods:

  • Scott's rule: 384.2227 m

  • Likelihood cross validation (ppl): 38.4227 m

d_scott_uber <- densityQuick.lpp(X_uber.linear,  sigma = b[[1]],
                                     leaveoneout = FALSE, positive = TRUE,dimyx = 1024)

d_ppl_uber <- densityQuick.lpp(X_uber.linear,  sigma = b[[2]],
                                   leaveoneout = FALSE,  positive = TRUE,
                               dimyx = 1024)

d_scott_uber02 <- d_scott_uber*20
(d_ppl_uber02 <- d_ppl_uber*20)
## Image on linear network
## Linear network with 42938 vertices and 47333 lines
## Line segments are marked
## Enclosing window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
## real-valued pixel image
## 1024 x 1024 pixel array (ny, nx)
## enclosing rectangle: [963010, 981940] x [4977300, 4998300] units
##  Data frame: 125948 sample points along network
##  Average density: one sample point per 17 units

4.2.1 Visualization: Intensity estimation

# 'pal' is not defined earlier in this document; assuming the palette
# used for the previous intensity plots:
pal <- rev(viridis(20, option = "A"))

par(mfrow=c(1,2))

plot(raster(as.im(d_scott_uber02)),
     main = "Scott’s rule for bandwidth selection", 
     legend = FALSE,
     col =pal,
     zlim = c(0,6),
     cex.main = 1.5)
plot(br.polygon$geometry, add=TRUE, lwd=1)

plot(raster(as.im(d_ppl_uber02 )), 
     main = "Likelihood cross validation bandwidth selection",
     axis.args = list(cex.axis = 4),
     legend.width = 2, 
     col=pal,
     zlim = c(0,6),
     cex.main = 1.5)
plot(br.polygon$geometry, add = TRUE, lwd = 1)

# save
dev.copy(png, "pic/scottPpl.estimator.png", width = 1200, height = 800)
dev.off()
Kernel-based intensity estimation, using the uniformly-edge-corrected estimator, for the Uber pick-up locations in Brooklyn, New York, during April 2014.

In the “Kernel-based intensity estimation for Spatial point patterns on linear networks” plot, distinct patterns of high and low intensity are discernible along the road networks. However, the differentiation between Scott’s rule and Likelihood cross-validation appears subtle, suggesting similar intensity distributions. This similarity may stem from comparable bandwidth selections or inherent characteristics of the spatial point pattern. Despite the overlap, the plot effectively visualizes the spatial variations in intensity, offering valuable insights into the underlying point process on the linear networks.

The findings contribute to the broader understanding of kernel-based intensity estimation for spatial point patterns on linear networks, providing a foundation for refining analytical approaches in similar contexts.

Chapter 5 Time series of point patterns

I now consider a series of time-ordered spatial point patterns of Uber pickup points in Brooklyn. Such data typically include patterns observed hourly, daily, weekly, or by weekday, and these patterns are expected to be correlated. Here, one might be interested in studying the temporal evolution of the underlying process while accounting for local spatial variation.

In this context, I will examine a time series of point patterns representing Uber pickup points in Brooklyn. The variations in these point patterns can be observed in terms of both daily and hourly patterns.

X_uber.time <- ppp(xy_uber[,1], xy_uber[,2], window = winn.br,
                     marks = data.frame(day = uber.br$day,
                                        hour= uber.br$hour,
                                        weekdays = uber.br$weekdays,
                                        week = uber.br$week))
X_uber.time
## Marked planar point pattern: 61668 points
## Mark variables: day, hour, weekdays, week 
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units

5.1 Weekly patterns

X_uber.time.week <- split(X_uber.time, 
                          f = as.factor(X_uber.time$marks$week))
X_uber.time.week
## Point pattern split by factor 
## 
## 14:
## Marked planar point pattern: 14708 points
## Mark variables: day, hour, weekdays, week 
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
## 
## 15:
## Marked planar point pattern: 13601 points
## Mark variables: day, hour, weekdays, week 
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
## 
## 16:
## Marked planar point pattern: 13659 points
## Mark variables: day, hour, weekdays, week 
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
## 
## 17:
## Marked planar point pattern: 15241 points
## Mark variables: day, hour, weekdays, week 
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
## 
## 18:
## Marked planar point pattern: 4459 points
## Mark variables: day, hour, weekdays, week 
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
par(mfrow=c(2,3))

for (i in 1:length(X_uber.time.week)) {
  plot(unmark(X_uber.time.week[[i]]), pch = 20, cex = 2, main = paste("Week", i + 13),lwd = 2, cex.main = 6)
}

# save
dev.copy(png, "pic/weekly.pattern.png", width = 800, height = 600)
dev.off()
Weekly Uber pick-up locations, week 14 to week 18, in Brooklyn during April 2014.

After splitting the data into time-ordered weekly point patterns, I next estimate a bandwidth for each pattern using Scott's rule. To focus on the temporal evolution of the patterns, I use the geometric mean of the selected bandwidths as a common bandwidth for all weeks.
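
The index-based computation below works but is easy to get wrong when the number of patterns changes; a compact equivalent sketch (assuming bw.scott returns a named (sigma.x, sigma.y) pair per pattern) is:

# One row of (sigma.x, sigma.y) per weekly pattern, then column-wise
# geometric means as the common bandwidth.
bw.mat <- t(sapply(X_uber.time.week, bw.scott))
exp(colMeans(log(bw.mat)))   # named: sigma.x, sigma.y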

scott_uber.week <- lapply(X = 1:length(X_uber.time.week), function(i){
  bw.scott(X_uber.time.week[[i]])
})
scott_uber.week  <- unlist(scott_uber.week )  
sigma.x <- exp(mean(log(scott_uber.week[c(1,3,5,7,9)])))
sigma.y <- exp(mean(log(scott_uber.week[c(2,4,6,8,10)])))

d_uber.week <- lapply(X = 1:length(X_uber.time.week), function(i){
  d <- density.ppp(X_uber.time.week[[i]],
                   sigma = c(sigma.x = sigma.x, 
                             sigma.y = sigma.y), 
               diggle = TRUE, 
               positive = TRUE, 
               leaveoneout = FALSE)
  raster(d)
})

d_uber.week  <- stack(d_uber.week)
d_uber.week02  <- d_uber.week *10^3

names <- expand.grid(c(14:18))

spplot(d_uber.week02, col.regions = rev(viridis(20,option = "A")),
colorkey = list(labels = list(cex = 3)), scales = list(draw = FALSE),
par.strip.text = list(cex = 3), names.attr = paste("Week", names[,1])
) 

# save
dev.copy(png, "pic/weekly.intensity.png", width = 800, height = 600)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott's rule bandwidth, for the weekly Uber pick-up locations, week 14 to week 18, in Brooklyn during April 2014.

In the “Weekly Intensity Estimation Plot,” which employs the Jones-Diggle estimator with Scott’s rule bandwidth, depicting Uber pick-up locations in Brooklyn from week 14 to week 18 in April 2014, the intensity pattern remains relatively consistent across weeks. The apparent difference in week 18 may be attributed to insufficient data for the entire week. Two prominent high-intensity areas are concentrated in the North and North-West regions, revealing spatial concentrations of Uber activities. The stability in the weekly patterns suggests a certain regularity in pick-up point distribution, while the localized intensity concentrations indicate specific hotspots that consistently attract high Uber pick-up activity.

5.2 Weekday patterns

X_uber.time.weekdays <- split(X_uber.time, 
                          f = as.factor(X_uber.time$marks$weekdays))
# reorder from alphabetical order to Monday-Sunday
X_uber.time.weekdays <- X_uber.time.weekdays[c(2,6,7,5,1,3,4)]
X_uber.time.weekdays


names=  expand.grid(c("Monday","Tuesday","Wednesday","Thursday","Friday",
                      "Saturday","Sunday"))
par(mfrow=c(2,4))
for (i in 1:length(X_uber.time.weekdays)) {
  plot(unmark(X_uber.time.weekdays[[i]]), pch = 20, cex = 2, main = names[,1][i],lwd = 2, cex.main = 6)
}

# save
dev.copy(png, "pic/weekdays.pattern.png", width = 800, height = 600)
dev.off()
Weekday Uber pick-up locations, Monday to Sunday, in Brooklyn during April 2014.

After splitting the data into time-ordered weekday point patterns, I next estimate a bandwidth for each pattern using Scott's rule. To focus on the temporal evolution of the patterns, I use the geometric mean of the selected bandwidths as a common bandwidth for all days.

scott_uber.weekdays <- lapply(X = 1:length(X_uber.time.weekdays), function(i){
  bw.scott(X_uber.time.weekdays[[i]])
})
scott_uber.weekdays  <- unlist(scott_uber.weekdays ) 
sigma.x <- exp(mean(log(scott_uber.weekdays[c(1,3,5,7,9,11,13)])))
sigma.y <- exp(mean(log(scott_uber.weekdays[c(2,4,6,8,10,12,14)])))

d_uber.weekdays <- lapply(X = 1:length(X_uber.time.weekdays), function(i){
  d <- density.ppp(X_uber.time.weekdays[[i]],
                   sigma = c(sigma.x = sigma.x, 
                             sigma.y = sigma.y), 
               diggle = TRUE, 
               positive = TRUE, 
               leaveoneout = FALSE)
  raster(d)
})

d_uber.weekdays  <- stack(d_uber.weekdays)
d_uber.weekdays02  <- d_uber.weekdays *10^3

names=  expand.grid(c("Monday","Tuesday","Wednesday","Thursday","Friday",
                      "Saturday","Sunday"))

spplot(d_uber.weekdays02, 
       col.regions = rev(viridis(20,option = "A")),
       colorkey = list(labels = list(cex = 3)), 
       scales = list(draw = FALSE),
       par.strip.text = list(cex = 3), 
       names.attr = names[,1],
       layout=c(4,2)) 

# save
dev.copy(png, "pic/weekdays.intensity.png", width = 1000, height = 800)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott's rule bandwidth, for the weekday Uber pick-up locations, Monday to Sunday, in Brooklyn during April 2014.

In the “Weekdays Intensity Estimation Plot,” utilizing the Jones-Diggle estimator with Scott’s rule bandwidth for Uber pick-up locations in Brooklyn from Monday to Sunday in April 2014, notable variations in intensity patterns are observed. Wednesdays and Fridays exhibit relatively higher intensities compared to Mondays, Tuesdays, and Thursdays, which show lower intensities. The peak intensity is observed on Saturdays, while Sundays display a relatively lower intensity. These patterns suggest a dynamic spatial-temporal distribution of Uber pick-up points throughout the weekdays, with specific days experiencing higher or lower concentrations of activity. Understanding such variations is crucial for optimizing service allocation and meeting user demand during different times of the week.

5.3 Daily patterns

X_uber.time.daily <- split(X_uber.time, 
                          f = as.factor(X_uber.time$marks$day))
X_uber.time.daily[1:6]

names=  expand.grid(c(1:30))
par(mfrow=c(4,8))
for (i in 1:length(X_uber.time.daily)) {
  plot(unmark(X_uber.time.daily[[i]]), pch = 20, cex = 2, main = paste0("Day",names[,1][i]),lwd = 2, cex.main = 6)
}

# save
dev.copy(png, "pic/daily.pattern.png", width = 1200, height = 900)
dev.off()
Daily Uber pick-up locations, Day 1 to Day 30, in Brooklyn during April 2014.

After splitting the data into time-ordered daily point patterns, I next estimate a bandwidth for each pattern using Scott's rule. To focus on the temporal evolution of the patterns, I use the geometric mean of the selected bandwidths as a common bandwidth for all days.

scott_uber.daily <- lapply(X = 1:length(X_uber.time.daily), function(i){
  bw.scott(X_uber.time.daily[[i]])
})
scott_uber.daily  <- unlist(scott_uber.daily ) 
sigma.x <- exp(mean(log(scott_uber.daily[c(seq(1,59,by=2))])))
sigma.y <- exp(mean(log(scott_uber.daily[c(seq(2,60,by=2))])))

d_uber.daily <- lapply(X = 1:length(X_uber.time.daily), function(i){
  d <- density.ppp(X_uber.time.daily[[i]],
                   sigma = c(sigma.x = sigma.x, 
                             sigma.y = sigma.y), 
               diggle = TRUE, 
               positive = TRUE, 
               leaveoneout = FALSE)
  raster(d)
})

d_uber.daily  <- stack(d_uber.daily)
d_uber.daily02  <- d_uber.daily *10^3

names=  expand.grid(paste0("Day",c(1:30)))

spplot(d_uber.daily02, 
       col.regions = rev(viridis(20,option = "A")),
       colorkey = list(labels = list(cex = 3)), 
       scales = list(draw = FALSE),
       par.strip.text = list(cex = 2), 
       names.attr = names[,1],
       layout=c(8,4)) 

# save
dev.copy(png, "pic/daily.intensity.png", width = 1200, height = 900)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott's rule bandwidth, for the daily Uber pick-up locations, Day 1 to Day 30, in Brooklyn during April 2014.

In the "Daily Intensity Estimation Plot" depicting Uber pick-up locations in Brooklyn from Day 1 to Day 30 in April 2014, significant variations in intensity patterns are discerned. Notably, high-intensity peaks recur every 7 days, on Day 5, Day 12, Day 19, and Day 26, the Saturdays of April 2014. This recurrent pattern indicates a weekly cycle, with the intensity rising from low to high and returning to a lower level within each week. Understanding these weekly variations is crucial for comprehending the temporal dynamics of Uber pick-up activity, allowing more informed decisions on resource allocation, service planning, and meeting the evolving demands of users throughout the month.

5.4 Hourly patterns

X_uber.time.hourly <- split(X_uber.time, 
                          f = as.factor(X_uber.time$marks$hour))

X_uber.time.hourly[1:6]

names=  expand.grid(paste0("Hour",c(0:23)))
par(mfrow=c(4,6))
for (i in 1:length(X_uber.time.hourly)) {
  plot(unmark(X_uber.time.hourly[[i]]), pch = 20, cex = 2, main = names[,1][i],lwd = 2, cex.main = 6)
}

# save
dev.copy(png, "pic/hourly.pattern.png", width = 1000, height = 800)
dev.off()
Hourly Uber pick-up locations, Hour 0 to Hour 23, in Brooklyn during April 2014.

After splitting the data into time-ordered hourly point patterns, I next estimate a bandwidth for each pattern using Scott's rule. To focus on the temporal evolution of the patterns, I use the geometric mean of the selected bandwidths as a common bandwidth for all hours.

scott_uber.hourly <- lapply(X = 1:length(X_uber.time.hourly), function(i){
  bw.scott(X_uber.time.hourly[[i]])
})
scott_uber.hourly  <- unlist(scott_uber.hourly ) 

# 24 hourly patterns give 48 values: odd positions are sigma.x, even sigma.y
sigma.x <- exp(mean(log(scott_uber.hourly[seq(1, 47, by = 2)])))
sigma.y <- exp(mean(log(scott_uber.hourly[seq(2, 48, by = 2)])))

d_uber.hourly <- lapply(X = 1:length(X_uber.time.hourly), function(i){
  d <- density.ppp(X_uber.time.hourly[[i]],
                   sigma = c(sigma.x = sigma.x, 
                             sigma.y = sigma.y), 
               diggle = TRUE, 
               positive = TRUE, 
               leaveoneout = FALSE)
  raster(d)
})

d_uber.hourly  <- stack(d_uber.hourly)
d_uber.hourly02  <- d_uber.hourly *10^3

names=  expand.grid(paste0("Hour",c(0:23)))

spplot(d_uber.hourly02, 
       col.regions = rev(viridis(20,option = "A")),
       colorkey = list(labels = list(cex = 3)), 
       scales = list(draw = FALSE),
       par.strip.text = list(cex = 2), 
       names.attr = names[,1],
       layout=c(6,4)) 

# save
dev.copy(png, "pic/hourly.intensity.png", width = 1000, height = 800)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott's rule bandwidth, for the hourly Uber pick-up locations, Hour 0 to Hour 23, in Brooklyn during April 2014.

The "Hourly Intensity Estimation Plot" for Uber pickup locations in Brooklyn, Hour 0 to Hour 23 in April 2014, reveals notable changes in intensity patterns throughout the day. A distinct peak is observed in the morning, particularly around Hours 7-8, reflecting the heightened demand of the morning commute. Another substantial peak occurs between Hour 17 and Hour 22, corresponding to evening activity and the evening commute. Conversely, the lowest intensity is recorded between Hour 1 and Hour 3, with a relatively sparse distribution and only one inconspicuous region in the north. The remaining hours exhibit higher intensity concentrations, notably in the north and north-west. Understanding these temporal variations is essential for optimizing Uber services, ensuring efficient resource allocation during peak demand, and enhancing the user experience throughout the day.