In the contemporary urban landscape, the analysis of spatial point processes has become increasingly crucial, with Uber pick-up points serving as a paradigmatic example of such patterns. This thesis focuses on estimating the intensity function of spatial point processes over diverse state spaces such as R^2 and linear networks. A central theme of this research lies in the exploration and visualization of the intricate spatial dynamics characterizing these phenomena.
Accurately estimating the intensity function is paramount for understanding the underlying distribution of Uber pick-up points, facilitating informed decision-making in urban planning and resource allocation. Traditional analyses often fall short when faced with irregular windows and complex linear networks. This research confronts these challenges head-on: by delving into irregular windows and time-ordered point patterns, the study aims to provide a comprehensive understanding of the nuanced spatial processes inherent in services like Uber.
A distinctive aspect of this research involves the detailed discussion of various kernel-based intensity estimators within the statistical computing environment R. Kernel-based methods offer a versatile and flexible approach to estimating the intensity function by assigning weights to neighboring events. The thesis meticulously examines these estimators, providing an in-depth analysis of their strengths, weaknesses, and applications within the context of spatial point processes. Through this exploration, the research contributes to the methodological advancements in spatial statistics, offering practitioners a comprehensive toolkit for analyzing complex point patterns.
The visual representation of intensity functions emerges as a crucial facet of this investigation. Visualization serves not only as a means of presenting the intricate spatial dynamics but also as a tool for gaining intuitive insights into the underlying patterns. The thesis emphasizes the importance of visualization techniques in conveying the intensity of Uber pick-up points across different state spaces and temporal sequences. By leveraging advanced visualization methods, this research aims to enhance the interpretability of intensity functions, providing a visually compelling narrative that aids both researchers and urban planners in grasping the complexities of spatial point processes.
In summary, this thesis pioneers an in-depth exploration of Uber pick-up point patterns, extending its analysis to irregular windows, time-ordered point patterns, and diverse state spaces. Through a detailed discussion of kernel-based intensity estimators in R and a focus on innovative visualization techniques, the research endeavors to not only contribute to the theoretical understanding of spatial point processes but also to provide practical tools for effective decision-making in urban planning.
Non-parametric intensity estimation for spatial point patterns with R
authors:
address: Department of Mathematics, University Jaume I, Castellón, Spain.
email: mateu@uji.es
address: Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, Sweden.
email: mehdi.moradi@umu.se
For this project, I will use two datasets:
for Uber, retrieve the data via the download link;
for OpenStreetMap, I use the GeoFabrik website to download the new-york-latest-free.shp.zip shapefile.
These are raw data on Uber pickups in New York City from April 2014. The data has the following columns:
Date/Time : The date and time of the Uber pickup
Lat : The latitude of the Uber pickup
Lon : The longitude of the Uber pickup
Base : The TLC base company code affiliated with the Uber pickup
Once loaded, we can take a glimpse at the Uber table:
## 'data.frame': 564516 obs. of 4 variables:
## $ Date.Time: chr "4/1/2014 0:11:00" "4/1/2014 0:17:00" "4/1/2014 0:21:00" "4/1/2014 0:28:00" ...
## $ Lat : num 40.8 40.7 40.7 40.8 40.8 ...
## $ Lon : num -74 -74 -74 -74 -74 ...
## $ Base : chr "B02512" "B02512" "B02512" "B02512" ...
## Date.Time Lat Lon Base
## 1 4/1/2014 0:11:00 40.7690 -73.9549 B02512
## 2 4/1/2014 0:17:00 40.7267 -74.0345 B02512
## 3 4/1/2014 0:21:00 40.7316 -73.9873 B02512
## 4 4/1/2014 0:28:00 40.7588 -73.9776 B02512
## 5 4/1/2014 0:33:00 40.7594 -73.9722 B02512
## 6 4/1/2014 0:33:00 40.7383 -74.0403 B02512
We can see 564,516 Uber pickup locations occurred in New York during April 2014.
Next, we visualize the data for Brooklyn, New York. The code loads geographic data for New York State, extracts the Brooklyn area, and plots spatial features such as places, roads, and land use. It also limits the x- and y-axis ranges and the Uber pickup points to focus on the Brooklyn area.
# New York State places layer (OSM)
newyork.place = sf::read_sf(dsn="data/new-york-latest-free.shp/",
layer="gis_osm_places_a_free_1")
# specifically the Brooklyn area
br.polygon = newyork.place %>% filter(name=="Brooklyn")
# convert to sp
br.polygon.sp = as_Spatial(br.polygon)
In the next step, I filtered the Uber pickups dataset to retain only those occurring within the Brooklyn area, leaving 61,668 Uber pickups that fall within the Brooklyn polygon.
# Secondly, keep only the pickups within Brooklyn
uber.br=uber
coordinates(uber.br)=c("Lon","Lat")
crs(uber.br)=crs(br.polygon)
uber.br$id=over(uber.br,br.polygon.sp)
uber.br=data.frame(uber.br)
uber.br[!is.na(uber.br$id.osm_id),]->uber.br
#61,668 points into brooklyn area
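As a side note, the same filter can be sketched with sf alone, avoiding the sp/over() round trip (a hedged alternative to the code above, not what the analysis uses; uber.sf and uber.br2 are illustrative names):
# illustrative sf-only equivalent of the over() filter above
uber.sf <- st_as_sf(uber, coords = c("Lon", "Lat"), crs = st_crs(br.polygon))
uber.br2 <- st_filter(uber.sf, br.polygon)  # keep points intersecting Brooklyn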
Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"
uber.br$Date.Time <- as.POSIXct(uber.br$Date.Time, format = "%m/%d/%Y %H:%M:%S")
uber.br=uber.br %>%
mutate(day=day(Date.Time),
hour=hour(Date.Time),
weekdays=weekdays(Date.Time),
week=week(Date.Time))
head(uber.br)
## Date.Time Lat Lon Base id.osm_id id.code id.fclass
## 23 2014-04-01 04:15:00 40.6561 -73.9531 B02512 9691750 1010 suburb
## 65 2014-04-01 06:15:00 40.7204 -73.9545 B02512 9691750 1010 suburb
## 89 2014-04-01 06:44:00 40.7231 -73.9442 B02512 9691750 1010 suburb
## 102 2014-04-01 07:00:00 40.6990 -73.9951 B02512 9691750 1010 suburb
## 178 2014-04-01 08:19:00 40.7214 -73.9475 B02512 9691750 1010 suburb
## 193 2014-04-01 08:32:00 40.6819 -73.9358 B02512 9691750 1010 suburb
## id.population id.name optional day hour weekdays week
## 23 2736074 Brooklyn TRUE 1 4 Tuesday 14
## 65 2736074 Brooklyn TRUE 1 6 Tuesday 14
## 89 2736074 Brooklyn TRUE 1 6 Tuesday 14
## 102 2736074 Brooklyn TRUE 1 7 Tuesday 14
## 178 2736074 Brooklyn TRUE 1 8 Tuesday 14
## 193 2736074 Brooklyn TRUE 1 8 Tuesday 14
br.polygon <- st_transform(br.polygon, crs = 6257) # transform coordinates
# convert to owin class
winn.br <- as.owin(br.polygon)
coordinates(uber.br)=c("Lon","Lat")
uber.br <- st_as_sf(uber.br)
uber.br <- st_set_crs(uber.br, 4326)
uber.br <- st_transform(uber.br, crs = 6257)
xy_uber <- st_coordinates(uber.br) # retrieve coordinates
# convert to ppp class
X_uber <- as.ppp(xy_uber, winn.br)
plot(X_uber, pch = 20, main = "Uber Pick-up Points", lwd = 1, cex = 0.3)
Below we estimate the smoothing bandwidth parameter for the Uber pick-up data using the reviewed methods:
Scott's Rule : A bandwidth selection method based on
a heuristic formula that considers the sample variance and the number of
data points, aiming to find an optimal smoothing parameter for kernel
density estimation.
Likelihood Cross-Validation : A technique that
maximizes the likelihood function while leaving out each data point
iteratively, helping to select a bandwidth that balances model fit and
complexity.
Diggle's Approach : A method proposed by Diggle for
bandwidth selection in kernel density estimation.
Cronie and van Lieshout's Criterion : A criterion
for bandwidth selection that minimizes an objective function related to
the integrated mean squared error, aiming to strike a balance between
over-smoothing and under-smoothing in kernel density
estimation.
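As a hedged illustration of the Cronie–van Lieshout criterion (not part of the original analysis; cvl_criterion and the sigma grid are hypothetical names, and bw.CvL() performs this search internally):
cvl_criterion <- function(X, sigma) {
  # intensity estimate evaluated at the data points (non-leave-one-out,
  # matching the Cronie-van Lieshout construction)
  lam <- density.ppp(X, sigma = sigma, at = "points",
                     leaveoneout = FALSE, positive = TRUE)
  # squared discrepancy between |W| and the sum of reciprocal intensities
  (area.owin(Window(X)) - sum(1 / lam))^2
}
sigmas <- seq(100, 5000, by = 100)                # candidate bandwidths (metres)
vals <- sapply(sigmas, function(s) cvl_criterion(X_uber, s))  # slow for 61k points
sigmas[which.min(vals)]                           # bandwidth minimising the criterion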
# Scott’s rule
scott.uber <- bw.scott(X_uber)
# Likelihood cross validation; running time is very long
ppl.uber <- bw.ppl(X_uber)
# Diggle's approach
diggle.uber <- bw.diggle(X_uber)
# Cronie and van Lieshout's criterion; running time is also long
CvL.uber <- bw.CvL(X_uber)
parameters_list <- list(scott.uber,
ppl.uber,
diggle.uber,
CvL.uber)
# Save the list to an RDS file
#saveRDS(parameters_list, file = "output/parameter_values.rds")
a= read_rds("output/parameter_values.rds")
a
## [[1]]
## sigma.x sigma.y
## 324.3484 455.0993
##
## [[2]]
## sigma
## 21.98301
##
## [[3]]
## sigma
## 1.113344
##
## [[4]]
## sigma
## 4228.733
Bandwidth from different methods:
Scott's rule: 324.3484 m (x), 455.0993 m (y)
Likelihood cross validation (ppl): 21.98301 m
Diggle's approach: 1.113344 m
Cronie and van Lieshout's criterion (CvL): 4228.733 m
The selected bandwidths span more than three orders of magnitude, from about 1 m (Diggle's approach) to over 4 km (Cronie and van Lieshout's criterion), reflecting the very different optimality criteria underlying these bandwidth selection methods.
Having estimated the bandwidth with these different methods, I next estimate the intensity of the Uber pick-up points by employing the reviewed intensity estimators with both the Gaussian and Epanechnikov kernels. The function density.ppp provides kernel-based intensity estimation for point patterns on R^2.
First, I employ the Gaussian kernel and the uniformly-edge-corrected estimator to delve into the spatial patterns of Uber pick-up points, with four maps representing the different bandwidth approaches: Scott's rule, likelihood cross-validation, Diggle's approach, and Cronie and van Lieshout's criterion.
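For reference, the uniformly-edge-corrected estimator computed by density.ppp can be written, with $\kappa_\sigma$ the chosen kernel with bandwidth $\sigma$, $W$ the observation window, and $x_1, \ldots, x_n$ the observed points, as
$$\hat\lambda_U(u) = \frac{1}{c_W(u)} \sum_{i=1}^{n} \kappa_\sigma(u - x_i), \qquad c_W(u) = \int_W \kappa_\sigma(u - v)\,\mathrm{d}v.$$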
scott.uber <- density.ppp(X_uber,
sigma = a[[1]],
leaveoneout = FALSE,
positive = TRUE)
ppl.uber <- density.ppp(X_uber,
sigma = a[[2]],
leaveoneout = FALSE,
positive = TRUE)
diggle.uber <- density.ppp(X_uber,
sigma =a[[3]],
leaveoneout = FALSE,
positive = TRUE)
CvL.uber <- density.ppp(X_uber,
sigma = a[[4]],
leaveoneout = FALSE,
positive = TRUE)
inten.gua.uber <- stack(raster(scott.uber),
raster(ppl.uber),
raster(diggle.uber),
raster(CvL.uber))
inten.gua.uber <- inten.gua.uber*10^3
names(inten.gua.uber) <- c("scott_gaus_U",
"ppl_gaus_U",
"diggle_gaus_U",
"CvL_gaus_U")
at <- c(seq(0, 1.4, 0.2), 2,4,7)
# Plot the intensity using spplot
spplot(inten.gua.uber, at = at, scales = list(draw = FALSE),
col.regions = rev(viridis(20,option = "A")),
colorkey = list(labels = list(cex = 3)),
par.strip.text = list(cex = 2),
layout = c(2, 2),
main = list(label = "Uber Pick-up Intensity with Gaussian kernel and uniformly-edge-corrected estimator", cex = 1.5))
# Save the plot
dev.copy(png, "pic/gua.intensity.png", width = 800, height = 800)
dev.off()
Kernel-based intensity estimation for Uber Pick-Up points in Brooklyn during 04.2014.
In the “Gaussian kernel and uniformly-edge-corrected estimator” plot, a distinct contrast is evident among the results obtained using the four different bandwidth selection methods:
Scott’s rule: The map reveals two areas of high intensity situated to the north and north-west of Brooklyn. This highlights a non-uniform intensity distribution in the Uber pick-up locations data.
Likelihood cross-validation and Diggle’s approach: With bandwidths of 21.98301 meters and 1.113344 meters, respectively, these methods generate under-smoothed maps. The similarity between the two maps is striking, with little discernible difference. Both maps exhibit finer details, indicating a higher resolution due to the relatively smaller bandwidths.
Cronie and Van Lieshout’s criterion: Utilizing the largest bandwidth of 4228.733 meters, this approach results in over-smoothed maps. The map loses some details about the intensity distribution, presenting a generalized view of the spatial pattern. The larger bandwidth leads to a more aggregated representation, potentially obscuring localized variations in the intensity of the events.
So in this case, I recommend the map based on Scott's rule for its balanced approach, avoiding both under-smoothing and over-smoothing. This choice preserves essential details and effectively captures the overall tendency in the spatial distribution of events.
This comparative analysis emphasizes the importance of choosing an appropriate bandwidth selection method, as it significantly influences the interpretation of spatial patterns and the level of detail captured in the resulting intensity maps.
Now I turn to estimating the intensity with the Jones-Diggle estimator. Rather than correcting for edge effects at the evaluation location, this estimator applies the edge correction at each data point, which makes it mass-preserving over the observation window.
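In the same notation as above, the Jones-Diggle estimator moves the edge correction from the evaluation location to the data points:
$$\hat\lambda_{JD}(u) = \sum_{i=1}^{n} \frac{\kappa_\sigma(u - x_i)}{c_W(x_i)},$$
which integrates to $n$ over $W$, i.e. the estimated intensity conserves the total number of observed points.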
scott_dig.uber <- density.ppp(X_uber,
sigma =a[[1]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE)
ppl_dig.uber <- density.ppp(X_uber,
sigma = a[[2]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE)
diggle_dig.uber <- density.ppp(X_uber,
sigma =a[[3]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE)
cvl_dig.uber <- density.ppp(X_uber,
sigma =a[[4]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE)
The Epanechnikov kernel is another common smoothing function used in kernel density estimation. We can compare the default Gaussian kernel with the Epanechnikov kernel to see the difference.
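In two dimensions the Epanechnikov kernel has compact support: with support radius $h$,
$$\kappa(u) = \frac{2}{\pi h^2}\Big(1 - \frac{\|u\|^2}{h^2}\Big), \qquad \|u\| \le h,$$
and zero outside the disc of radius $h$. (In density.ppp, sigma is interpreted as the standard deviation of the kernel, so bandwidths remain comparable across kernel types.)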
scott_epa.uber <- density.ppp(X_uber,
sigma = a[[1]],
leaveoneout = FALSE,
positive = TRUE,
kernel = "epanechnikov")
ppl_epa.uber <- density.ppp(X_uber,
sigma = a[[2]],
leaveoneout = FALSE,
positive = TRUE,
kernel = "epanechnikov")
diggle_epa.uber <- density.ppp(X_uber,
sigma =a[[3]],
leaveoneout = FALSE,
positive = TRUE,
kernel = "epanechnikov")
cvL_epa.uber <- density.ppp(X_uber,
sigma = a[[4]],
leaveoneout = FALSE,
positive = TRUE,
kernel = "epanechnikov")scott_dig_epa.uber <- density.ppp(X_uber,
sigma =a[[1]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE,
kernel = "epanechnikov")
ppl_dig_epa.uber <- density.ppp(X_uber,
sigma = a[[2]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE,
kernel = "epanechnikov")
diggle_dig_epa.uber <- density.ppp(X_uber,
sigma =a[[3]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE,
kernel = "epanechnikov")
cvl_dig_epa.uber <- density.ppp(X_uber,
sigma =a[[4]],
leaveoneout = FALSE,
positive = TRUE,
diggle = TRUE,
kernel = "epanechnikov")After employing diverse bandwidth selection approaches and kernel functions, resulting in varied intensity estimates, Then I want to visually present and analyze these estimates, highlighting potential discrepancies.
sp_int.uber <- stack(raster(scott.uber),
raster(ppl.uber),
raster(diggle.uber),
raster(CvL.uber),
raster(scott_dig.uber),
raster(ppl_dig.uber),
raster(diggle_dig.uber),
raster(cvl_dig.uber),
raster(scott_epa.uber),
raster(ppl_epa.uber),
raster(diggle_epa.uber),
raster(cvL_epa.uber),
raster(scott_dig_epa.uber),
raster(ppl_dig_epa.uber),
raster(diggle_dig_epa.uber),
raster(cvl_dig_epa.uber))
sp_int.uber <- sp_int.uber*10^3
names(sp_int.uber) <- c("scott_gaus_U",
"ppl_gaus_U",
"diggle_gaus_U",
"CvL_gaus_U",
"scott_gaus_JD",
"ppl_gaus_JD",
"diggle_gaus_JD",
"CvL_gaus_JD",
"scott_epa_U",
"ppl_epa_U",
"diggle_epa_U",
"CvL_epa_U",
"scott_epa_JD",
"ppl_epa_JD",
"diggle_epa_JD",
"CvL_epa_JD")
at <- c(seq(0, 1.4, 0.2), 2, 4, 7,12)
spplot(sp_int.uber, at = at, scales = list(draw = FALSE),
col.regions = rev(viridis(20,option = "A")),
colorkey = list(labels = list(cex = 3)),
par.strip.text = list(cex = 2),
layout = c(4, NA),
main = list(label = "Uber Pick-up Intensity with different kernels and estimators", cex = 1.5))
# Save the plot
dev.copy(png, "pic/compare.estimator.png", width = 1000, height = 1200)
dev.off()
Kernel-based intensity estimation for Uber Pick-Up points in Brooklyn during 04.2014. Layer names start with the bandwidth selection approach, followed by the employed kernel and the edge correction.
From this “Different kernels and estimators” plot, we can see:
Scott’s rule: There is no discernible difference between the Gaussian and Epanechnikov kernels, as well as the uniformly-edge-corrected and Jones-Diggle estimators. This lack of distinction implies that, under the specified bandwidth, the choice of kernel type and estimator has minimal impact on the resulting intensity maps.
Likelihood cross-validation and Diggle's approach: despite employing various kernels and estimators, these methods consistently produce under-smoothed maps. This outcome is attributed to the small bandwidths chosen by these methods. The limited smoothing results in intensity maps that retain detailed structures but may overlook broader trends or patterns. While these methods offer flexibility in capturing intricate spatial features, their performance is highly sensitive to the choice of bandwidth, necessitating careful consideration to strike a balance between local detail preservation and global trend representation in the resulting intensity maps.
Cronie and Van Lieshout's criterion: noticeable disparities emerge between the uniformly-edge-corrected and Jones-Diggle estimators. However, determining the superior estimator is challenging given the substantial bandwidth of 4228.733 meters: this approach tends to over-smooth the resulting intensity maps, obscuring finer spatial details and potentially diminishing the ability to capture localized variations in point patterns.
This comparative analysis underscores the critical role of selecting an appropriate bandwidth and kernel-estimator combination, impacting the interpretation of spatial patterns and level of detail in intensity maps. The study highlights the nuanced trade-offs between under-smoothing and over-smoothing, emphasizing the need for careful consideration to strike a balance tailored to the specific characteristics of the dataset and the analytical objectives.
The relationship between Uber pick-up points and the Brooklyn main road network introduces a novel perspective into urban transportation dynamics. Understanding how these pick-up points interact with the intricate structure of the road network sheds light on mobility patterns, congestion hotspots, and potential areas of high demand. This concept not only explores the spatial distribution of pick-up points but also delves into the network’s influence on user behavior and the efficiency of the overall transportation system. It provides valuable insights for optimizing ride-sharing services and improving the urban mobility experience.
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
#the road-linestring
net.br = sf::read_sf(dsn="data/new-york-latest-free.shp/",
layer="gis_osm_roads_free_1")
net.br = st_transform(net.br,crs=6257)
#filter the main road
net.br %>% filter(fclass %in% c("motorway","primary","residential"))->net.br
net.br = as.linnet.SpatialLines(as_Spatial(net.br))
## Warning: Repeated vertices (on the same line) were removed
## Warning: Duplicated segments were ignored
Window(net.br) = winn.br
# networks on the brooklyn boundary window
net.br = as.linnet(net.br, W= winn.br)
net.br
## Linear network with 42938 vertices and 47333 lines
## Line segments are marked
## Enclosing window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
The linear network (the Brooklyn roads) consists of 42,938 vertices and 47,333 lines, each line segment marked with specific attributes. The enclosing window is the polygonal Brooklyn boundary, and the bounding rectangle spans the coordinate range [963014, 981944.6] x [4977346, 4998278] units.
## Simple feature collection with 61668 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 964406.4 ymin: 4979850 xmax: 979539.2 ymax: 4998196
## Projected CRS: MAGNA-SIRGAS / Medellin urban grid
## First 10 features:
## Date.Time Base id.osm_id id.code id.fclass id.population
## 23 2014-04-01 04:15:00 B02512 9691750 1010 suburb 2736074
## 65 2014-04-01 06:15:00 B02512 9691750 1010 suburb 2736074
## 89 2014-04-01 06:44:00 B02512 9691750 1010 suburb 2736074
## 102 2014-04-01 07:00:00 B02512 9691750 1010 suburb 2736074
## 178 2014-04-01 08:19:00 B02512 9691750 1010 suburb 2736074
## 193 2014-04-01 08:32:00 B02512 9691750 1010 suburb 2736074
## 219 2014-04-01 09:10:00 B02512 9691750 1010 suburb 2736074
## 284 2014-04-01 10:53:00 B02512 9691750 1010 suburb 2736074
## 318 2014-04-01 11:24:00 B02512 9691750 1010 suburb 2736074
## 333 2014-04-01 11:44:00 B02512 9691750 1010 suburb 2736074
## id.name optional day hour weekdays week geometry
## 23 Brooklyn TRUE 1 4 Tuesday 14 POINT (971721.3 4989061)
## 65 Brooklyn TRUE 1 6 Tuesday 14 POINT (971472 4996172)
## 89 Brooklyn TRUE 1 6 Tuesday 14 POINT (972336.9 4996473)
## 102 Brooklyn TRUE 1 7 Tuesday 14 POINT (968083.4 4993797)
## 178 Brooklyn TRUE 1 8 Tuesday 14 POINT (972061.5 4996284)
## 193 Brooklyn TRUE 1 8 Tuesday 14 POINT (973131.6 4991918)
## 219 Brooklyn TRUE 1 9 Tuesday 14 POINT (971878.3 4995222)
## 284 Brooklyn TRUE 1 10 Tuesday 14 POINT (970414.6 4992984)
## 318 Brooklyn TRUE 1 11 Tuesday 14 POINT (972408.7 4993055)
## 333 Brooklyn TRUE 1 11 Tuesday 14 POINT (968188.1 4986541)
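The point pattern on the linear network printed below is assumed to be built from the projected pickup coordinates and the Brooklyn network via spatstat's lpp(); a minimal sketch, since the original construction code is not shown:
# assumed construction of the network point pattern
X_uber.linear <- lpp(xy_uber, net.br)
X_uber.linear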
## Warning: data contain duplicated points
## Point pattern on linear network
## 61668 points
## Linear network with 42938 vertices and 47333 lines
## Line segments are marked
## Enclosing window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
# plot boundary
plot(br.polygon$geometry, lwd = 2, col = "lightgray")
# add points
plot(X_uber.linear, pch = 20, main = "", lwd = 1, cex = 0.5, add = TRUE,
cols = "red", col = "steelblue")
# title and legend
title("Uber Pick-Up Points on Linear Network", cex.main = 1.5)
legend("topright", legend = c("Linear Network", "Uber Pick-Up Points"),
pch = c(-1, 20), col = c("steelblue", "red"), lty = c(1, 0), bty = "n")
axis(1, cex.axis = 0.8)
axis(2, cex.axis = 0.8)
grid(col = "lightgray", lty = 2)
# save
dev.copy(png, "pic/uber_pickup_linear_network.png", width = 1200, height = 800)
dev.off()
Uber Pick-Up Points on Brooklyn roads. The boundary of Brooklyn is in gray, the street network in blue, and the red dots mark the Uber pick-up locations.
Utilizing uniformly-edge-corrected estimators, I now focus on analyzing Uber pick-up points on the Brooklyn network during April 2014, applying two prominent methods for bandwidth selection: likelihood cross-validation and Scott's rule.
Likelihood cross-validation offers a robust approach by assessing the model’s predictive performance, optimizing bandwidth selection to achieve reliable intensity estimates.
On the other hand, Scott’s rule, a widely used heuristic, provides a straightforward criterion for bandwidth determination based on the sample’s standard deviation and sample size.
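For a planar pattern of $n$ points, Scott's rule of thumb as implemented in bw.scott is
$$\sigma_j = n^{-1/6}\, s_j, \qquad j \in \{x, y\},$$
where $s_j$ is the sample standard deviation of the $j$-th coordinate; bw.scott.iso is the isotropic variant returning a single bandwidth.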
scott_uber <- bw.scott.iso(X_uber.linear) # Scott's rule
ppl_uber <- bw.lppl(X_uber.linear) # likelihood cross validation
parameters_list02 <- list(scott_uber,
ppl_uber)
#saveRDS(parameters_list02, file = "output/parameter_values02.rds")
b = read_rds("output/parameter_values02.rds")
Bandwidth for these two methods:
Scott’s rule: 384.2227 m
Likelihood cross validation (ppl): 38.4227 m
d_scott_uber <- densityQuick.lpp(X_uber.linear, sigma = b[[1]],
leaveoneout = FALSE, positive = TRUE,dimyx = 1024)
d_ppl_uber <- densityQuick.lpp(X_uber.linear, sigma = b[[2]],
leaveoneout = FALSE, positive = TRUE,
dimyx = 1024)
d_scott_uber02 <- d_scott_uber*20
(d_ppl_uber02 <- d_ppl_uber*20)
## Image on linear network
## Linear network with 42938 vertices and 47333 lines
## Line segments are marked
## Enclosing window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
## real-valued pixel image
## 1024 x 1024 pixel array (ny, nx)
## enclosing rectangle: [963010, 981940] x [4977300, 4998300] units
## Data frame: 125948 sample points along network
## Average density: one sample point per 17 units
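The colour palette pal used below is not defined in the code shown; presumably something like the following (an assumption, matching the viridis scheme used elsewhere):
# assumed palette definition
pal <- rev(viridis(20, option = "A"))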
par(mfrow=c(1,2))
plot(raster(as.im(d_scott_uber02)),
main = "Scott’s rule for bandwidth selection",
legend = FALSE,
col =pal,
zlim = c(0,6),
cex.main = 1.5)
plot(br.polygon$geometry, add=TRUE, lwd=1)
plot(raster(as.im(d_ppl_uber02 )),
main = "Likelihood cross validation bandwidth selection",
axis.args = list(cex.axis = 4),
legend.width = 2,
col=pal,
zlim = c(0,6),
cex.main = 1.5)
plot(br.polygon$geometry, add = TRUE, lwd = 1)
# save
dev.copy(png, "pic/scottPpl.estimator.png", width = 1200, height = 800)
dev.off()
Kernel-based intensity estimation, using the uniformly-edge-corrected estimator, for Uber pick-up locations data in Brooklyn, New York, during 04.2014.
In the “Kernel-based intensity estimation for Spatial point patterns on linear networks” plot, distinct patterns of high and low intensity are discernible along the road networks. However, the differentiation between Scott’s rule and Likelihood cross-validation appears subtle, suggesting similar intensity distributions. This similarity may stem from comparable bandwidth selections or inherent characteristics of the spatial point pattern. Despite the overlap, the plot effectively visualizes the spatial variations in intensity, offering valuable insights into the underlying point process on the linear networks.
The findings contribute to the broader understanding of kernel-based intensity estimation for spatial point patterns on linear networks, providing a foundation for refining analytical approaches in similar contexts.
I now consider a series of time-ordered spatial point patterns of Uber pickup points in Brooklyn. This type of data typically includes patterns observed hourly, daily, weekly, or by weekday, which are expected to show correlation. Here, one might be interested in studying the temporal evolution of the underlying process while considering local spatial variation.
In this context, I will examine a time series of point patterns representing Uber pickup points in Brooklyn. The variations in these point patterns can be observed in terms of both daily and hourly patterns.
X_uber.time <- ppp(xy_uber[,1], xy_uber[,2], window = winn.br,
marks = data.frame(day = uber.br$day,
hour= uber.br$hour,
weekdays = uber.br$weekdays,
week = uber.br$week))
X_uber.time
## Marked planar point pattern: 61668 points
## Mark variables: day, hour, weekdays, week
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
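The weekly split printed below is assumed to be produced with split(), mirroring the weekday, daily, and hourly splits shown later (a sketch; the original line is not shown):
# assumed split of the marked pattern by the week mark
X_uber.time.week <- split(X_uber.time, f = as.factor(X_uber.time$marks$week))
X_uber.time.week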
## Point pattern split by factor
##
## 14:
## Marked planar point pattern: 14708 points
## Mark variables: day, hour, weekdays, week
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
##
## 15:
## Marked planar point pattern: 13601 points
## Mark variables: day, hour, weekdays, week
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
##
## 16:
## Marked planar point pattern: 13659 points
## Mark variables: day, hour, weekdays, week
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
##
## 17:
## Marked planar point pattern: 15241 points
## Mark variables: day, hour, weekdays, week
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
##
## 18:
## Marked planar point pattern: 4459 points
## Mark variables: day, hour, weekdays, week
## window: polygonal boundary
## enclosing rectangle: [963014, 981944.6] x [4977346, 4998278] units
par(mfrow=c(2,3))
for (i in 1:length(X_uber.time.week)) {
plot(unmark(X_uber.time.week[[i]]), pch = 20, cex = 2, main = paste("Week", i + 13),lwd = 2, cex.main = 6)
}
# save
dev.copy(png, "pic/weekly.pattern.png", width = 800, height = 600)
dev.off()
Weekly Uber pick-up locations data from week 14 to week 18 in Brooklyn during April 2014.
After splitting the data into time-ordered weekly point patterns, my next step is to estimate the bandwidth using Scott's rule. Focusing on the temporal evolution of the patterns, we compute the geometric mean of the chosen bandwidths as a common bandwidth for all instances.
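Recall that the geometric mean is conveniently computed on the log scale, which is exactly what the code below does:
$$\bar\sigma = \Big(\prod_{j=1}^{k} \sigma_j\Big)^{1/k} = \exp\Big(\frac{1}{k}\sum_{j=1}^{k}\log \sigma_j\Big).$$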
scott_uber.week <- lapply(X = 1:length(X_uber.time.week), function(i){
bw.scott(X_uber.time.week[[i]])
})
scott_uber.week <- unlist(scott_uber.week )
sigma.x <- exp(mean(log(scott_uber.week[c(1,3,5,7,9)])))
sigma.y <- exp(mean(log(scott_uber.week[c(2,4,6,8,10)])))
d_uber.week <- lapply(X = 1:length(X_uber.time.week), function(i){
d <- density.ppp(X_uber.time.week[[i]],
sigma = c(sigma.x = sigma.x,
sigma.y = sigma.y),
diggle = TRUE,
positive = TRUE,
leaveoneout = FALSE)
raster(d)
})
d_uber.week <- stack(d_uber.week)
d_uber.week02 <- d_uber.week *10^3
names <- expand.grid(c(14:18))
spplot(d_uber.week02, col.regions = rev(viridis(20,option = "A")),
colorkey = list(labels = list(cex = 3)), scales = list(draw = FALSE),
par.strip.text = list(cex = 3), names.attr = paste("Week", names[,1])
)
# save
dev.copy(png, "pic/weekly.intensity.png", width = 800, height = 600)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott's rule bandwidth, for weekly Uber pick-up locations data from week 14 to week 18 in Brooklyn during April 2014.
In the "Weekly Intensity Estimation Plot," which employs the Jones-Diggle estimator with Scott's rule bandwidth to depict Uber pick-up locations in Brooklyn from week 14 to week 18 of April 2014, the intensity pattern remains relatively consistent across weeks. The apparent difference in week 18 is likely because the data only partially cover that week. Two prominent high-intensity areas are concentrated in the North and North-West regions, revealing spatial concentrations of Uber activities. The stability in the weekly patterns suggests a certain regularity in pick-up point distribution, while the localized intensity concentrations indicate specific hotspots that consistently attract high Uber pick-up activity.
X_uber.time.weekdays <- split(X_uber.time,
f = as.factor(X_uber.time$marks$weekdays))
X_uber.time.weekdays[c(2,6,7,5,1,3,4)] -> X_uber.time.weekdays
X_uber.time.weekdays
names= expand.grid(c("Monday","Tuesday","Wednesday","Thursday","Friday",
"Saturday","Sunday"))
par(mfrow=c(2,4))
for (i in 1:length(X_uber.time.weekdays)) {
plot(unmark(X_uber.time.weekdays[[i]]), pch = 20, cex = 2, main = names[,1][i],lwd = 2, cex.main = 6)
}
# save
dev.copy(png, "pic/weekdays.pattern.png", width = 800, height = 600)
dev.off()
Weekday Uber pick-up locations data from Monday to Sunday in Brooklyn during April 2014.
After splitting the data into time-ordered weekday point patterns, my next step is to estimate the bandwidth using Scott's rule. As before, we compute the geometric mean of the chosen bandwidths as a common bandwidth for all instances.
scott_uber.weekdays <- lapply(X = 1:length(X_uber.time.weekdays), function(i){
bw.scott(X_uber.time.weekdays[[i]])
})
scott_uber.weekdays <- unlist(scott_uber.weekdays )
sigma.x <- exp(mean(log(scott_uber.weekdays[c(1,3,5,7,9,11,13)])))
sigma.y <- exp(mean(log(scott_uber.weekdays[c(2,4,6,8,10,12,14)])))
d_uber.weekdays <- lapply(X = 1:length(X_uber.time.weekdays), function(i){
d <- density.ppp(X_uber.time.weekdays[[i]],
sigma = c(sigma.x = sigma.x,
sigma.y = sigma.y),
diggle = TRUE,
positive = TRUE,
leaveoneout = FALSE)
raster(d)
})
d_uber.weekdays <- stack(d_uber.weekdays)
d_uber.weekdays02 <- d_uber.weekdays *10^3
names= expand.grid(c("Monday","Tuesday","Wednesday","Thursday","Friday",
"Saturday","Sunday"))
spplot(d_uber.weekdays02,
col.regions = rev(viridis(20,option = "A")),
colorkey = list(labels = list(cex = 3)),
scales = list(draw = FALSE),
par.strip.text = list(cex = 3),
names.attr = names[,1],
layout=c(4,2))
# save
dev.copy(png, "pic/weekdays.intensity.png", width = 1000, height = 800)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott's rule bandwidth, for weekday Uber pick-up locations data from Monday to Sunday in Brooklyn during April 2014.
In the “Weekdays Intensity Estimation Plot,” utilizing the Jones-Diggle estimator with Scott’s rule bandwidth for Uber pick-up locations in Brooklyn from Monday to Sunday in April 2014, notable variations in intensity patterns are observed. Wednesdays and Fridays exhibit relatively higher intensities compared to Mondays, Tuesdays, and Thursdays, which show lower intensities. The peak intensity is observed on Saturdays, while Sundays display a relatively lower intensity. These patterns suggest a dynamic spatial-temporal distribution of Uber pick-up points throughout the weekdays, with specific days experiencing higher or lower concentrations of activity. Understanding such variations is crucial for optimizing service allocation and meeting user demand during different times of the week.
X_uber.time.daily <- split(X_uber.time,
f = as.factor(X_uber.time$marks$day))
X_uber.time.daily[1:6]
names= expand.grid(c(1:30))
par(mfrow=c(4,8))
for (i in 1:length(X_uber.time.daily)) {
plot(unmark(X_uber.time.daily[[i]]), pch = 20, cex = 2, main = paste0("Day",names[,1][i]),lwd = 2, cex.main = 6)
}
# save
dev.copy(png, "pic/daily.pattern.png", width = 1200, height = 900)
dev.off()
Daily Uber pick-up locations data from Day 1 to Day 30 in Brooklyn during April 2014.
After splitting the data into time-ordered daily point patterns, my next step is to estimate the bandwidth using Scott's rule. As before, we compute the geometric mean of the chosen bandwidths as a common bandwidth for all instances.
scott_uber.daily <- lapply(X = 1:length(X_uber.time.daily), function(i){
bw.scott(X_uber.time.daily[[i]])
})
scott_uber.daily <- unlist(scott_uber.daily )
sigma.x <- exp(mean(log(scott_uber.daily[c(seq(1,59,by=2))])))
sigma.y <- exp(mean(log(scott_uber.daily[c(seq(2,60,by=2))])))
d_uber.daily <- lapply(X = 1:length(X_uber.time.daily), function(i){
d <- density.ppp(X_uber.time.daily[[i]],
sigma = c(sigma.x = sigma.x,
sigma.y = sigma.y),
diggle = TRUE,
positive = TRUE,
leaveoneout = FALSE)
raster(d)
})
d_uber.daily <- stack(d_uber.daily)
d_uber.daily02 <- d_uber.daily *10^3
names= expand.grid(paste0("Day",c(1:30)))
spplot(d_uber.daily02,
col.regions = rev(viridis(20,option = "A")),
colorkey = list(labels = list(cex = 3)),
scales = list(draw = FALSE),
par.strip.text = list(cex = 2),
names.attr = names[,1],
layout=c(8,4))
# save
dev.copy(png, "pic/daily.intensity.png", width = 1200, height = 900)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott’s rule bandwidth, for daily Uber pick-up locations data from Day1-Day30 in Brooklyn during April 2014.
In the "Daily Intensity Estimation Plot" depicting Uber pick-up locations in Brooklyn from Day 1 to Day 30 in April 2014, significant variations in intensity patterns are discerned. Notably, intensity peaks recur every 7 days, specifically on Day 5, Day 12, Day 19, and Day 26 (all Saturdays, consistent with the Saturday peak in the weekday analysis above). This recurrent pattern suggests a weekly cycle, where the intensity fluctuates from low to high and then returns to a lower level within each week. Understanding these weekly variations is crucial for comprehending the temporal dynamics of Uber pick-up activities, allowing for more informed decision-making in terms of resource allocation, service planning, and meeting the evolving demands of users throughout the month.
X_uber.time.hourly <- split(X_uber.time,
f = as.factor(X_uber.time$marks$hour))
X_uber.time.hourly[1:6]
names= expand.grid(paste0("Hour",c(0:23)))
par(mfrow=c(4,6))
for (i in 1:length(X_uber.time.hourly)) {
plot(unmark(X_uber.time.hourly[[i]]), pch = 20, cex = 2, main = names[,1][i],lwd = 2, cex.main = 6)
}
# save
dev.copy(png, "pic/hourly.pattern.png", width = 1000, height = 800)
dev.off()
Hourly Uber pick-up locations data from Hour 0 to Hour 23 in Brooklyn during April 2014.
After splitting the data into time-ordered hourly point patterns, my next step is to estimate the bandwidth using Scott's rule. As before, we compute the geometric mean of the chosen bandwidths as a common bandwidth for all instances.
scott_uber.hourly <- lapply(X = 1:length(X_uber.time.hourly), function(i){
bw.scott(X_uber.time.hourly[[i]])
})
scott_uber.hourly <- unlist(scott_uber.hourly )
sigma.x <- exp(mean(log(scott_uber.hourly[c(seq(1,47,by=2))]))) # 24 patterns give 48 values; odd indices are sigma.x
sigma.y <- exp(mean(log(scott_uber.hourly[c(seq(2,48,by=2))])))
d_uber.hourly <- lapply(X = 1:length(X_uber.time.hourly), function(i){
d <- density.ppp(X_uber.time.hourly[[i]],
sigma = c(sigma.x = sigma.x,
sigma.y = sigma.y),
diggle = TRUE,
positive = TRUE,
leaveoneout = FALSE)
raster(d)
})
d_uber.hourly <- stack(d_uber.hourly)
d_uber.hourly02 <- d_uber.hourly *10^3
names= expand.grid(paste0("Hour",c(0:23)))
spplot(d_uber.hourly02,
col.regions = rev(viridis(20,option = "A")),
colorkey = list(labels = list(cex = 3)),
scales = list(draw = FALSE),
par.strip.text = list(cex = 2),
names.attr = names[,1],
layout=c(6,4))
# save
dev.copy(png, "pic/hourly.intensity.png", width = 1000, height = 800)
dev.off()
Kernel-based intensity estimation, using the Jones-Diggle estimator and Scott's rule bandwidth, for hourly Uber pick-up locations data from Hour 0 to Hour 23 in Brooklyn during April 2014.
The "Hourly Intensity Estimation Plot" for Uber pickup locations in Brooklyn from Hour 0 to Hour 23 in April 2014 reveals notable changes in intensity patterns throughout the day. A distinct peak in intensity is observed during the morning hours, particularly at Hour 7-8, reflecting the heightened demand associated with the morning commute. Following this, another substantial peak occurs between Hour 17 and Hour 22, corresponding to increased activity and the evening commute. Conversely, the lowest intensity is recorded between Hour 1 and Hour 3, characterized by a relatively sparse distribution with only one unobtrusive region in the north. Other hours exhibit higher intensity concentrations, notably in the north and north-west regions. Understanding these temporal variations is essential for optimizing Uber services, ensuring efficient resource allocation during peak demand periods, and enhancing the overall user experience throughout the day.