This research proposes an analytical framework built around urban vitality, an important index for evaluating urban space, measured here by four indicators: POI density, POI diversity, road network density and population density. The framework is used to test whether the security of a neighbourhood comes from the surveillance provided by people and shops on the street. In other words, this research asks: what is the relationship between crime rate and urban vitality? In addition, to better analyse and summarise the spatial relationship between crime and urban vitality, clustering analysis is used to map how different levels of this relationship are distributed across the city.
library(sp)
library(sf)
library(tmap)
library(tmaptools)
library(geojsonio)
library(dplyr)
library(ggplot2)
library(ggthemes)
This study uses LSOA geographical boundaries and processes five factors: 1) crime rate, the dependent variable, and urban vitality, the independent variable, which is in turn composed of four indicators: 2) road network density, 3) POI density, 4) POI diversity and 5) population density.
The London LSOA boundary is chosen as the geographical unit for spatial analysis. Lower Layer Super Output Areas (LSOAs) are a geographic hierarchy; Greater London contains 4,835 of them. Import the LSOA shapefile, which can be downloaded from https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london
# Get the London LSOA Boundaries
LD_LSOA <- st_read(
'D:/F_SCUA-UCL/CASA05/workshop_1/statistical-gis-boundaries-london/statistical-gis-boundaries-london/ESRI/LSOA_2011_London_gen_MHW.shp') #You can change it to your own path
The next step is to calculate the area of each region for further analysis.
# Calculate area (square metres)
LD_LSOA_area <- st_area(LD_LSOA)
LD_LSOA_area <- as.numeric(LD_LSOA_area)
# Add a column of area to LD_LSOA
LD_LSOA$area <- LD_LSOA_area
# Transform the coordinate reference system to WGS 84 (EPSG:4326)
LD_LSOA <- LD_LSOA %>%
st_transform(4326)
# Keep only the columns needed for the analysis
LD_LSOA <- LD_LSOA %>%
select('LSOA11CD', 'LSOA11NM','POPDEN', 'geometry', 'area')
head(LD_LSOA)
## Simple feature collection with 6 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -0.0997488 ymin: 51.50974 xmax: 0.09360014 ymax: 51.54279
## geographic CRS: WGS 84
## LSOA11CD LSOA11NM POPDEN area
## 1 E01000001 City of London 001A 112.9 133320.77
## 2 E01000002 City of London 001B 62.9 226191.27
## 3 E01000003 City of London 001C 227.7 57302.97
## 4 E01000005 City of London 001E 52.0 190738.76
## 5 E01000006 Barking and Dagenham 016A 116.2 144195.85
## 6 E01000007 Barking and Dagenham 015A 69.6 198134.81
## geometry
## 1 MULTIPOLYGON (((-0.09728867...
## 2 MULTIPOLYGON (((-0.08812915...
## 3 MULTIPOLYGON (((-0.09678574...
## 4 MULTIPOLYGON (((-0.07323094...
## 5 MULTIPOLYGON (((0.09115207 ...
## 6 MULTIPOLYGON (((0.07774095 ...
Crime data is collected from data.police.uk (https://data.police.uk), which provides street-level crime information by month, broken down by police force and Lower Layer Super Output Area. Two police forces cover Greater London: the City of London Police and the Metropolitan Police.
This study uses the London crime data for September 2019, which contains 88,195 cases.
There are two steps to process the crime data: 1) combine the two datasets and attach them to the LSOA boundaries; 2) calculate a crime index from the number of cases relative to the local population.
# Read csv of crime
Crime_MtroP <- read.csv('D:/F_SCUA-UCL/Final/GIS_Final/data/2019-09/2019-09-metropolitan-street.csv')
Crime_CityofLD <- read.csv('D:/F_SCUA-UCL/Final/GIS_Final/data/2019-09/2019-09-city-of-london-street.csv')
# Bind the two datasets into a single Greater London dataset
Crime_LD <- rbind(Crime_MtroP, Crime_CityofLD)
# Data cleaning: keep only the LSOA and crime type columns (columns 8-10)
Crime_LD_Cleaning <- select(Crime_LD, 8:10)
# Count crimes per LSOA
LSOA_Crime <- Crime_LD_Cleaning %>%
group_by(LSOA.code) %>%
summarise(n=n())
# Join the crime counts to the LSOA dataset
LSOA_Crime <- left_join(LD_LSOA, LSOA_Crime, by=c('LSOA11CD'= 'LSOA.code'))
# Add a crime index column (cases scaled by population density and area)
LSOA_Crime$denpop <- LSOA_Crime$n / LSOA_Crime$POPDEN * LSOA_Crime$area
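As a side note, the denpop index above scales the crime count by population density and area. If a literal per-capita rate were preferred, it could be derived from POPDEN, assuming POPDEN is persons per hectare and area is in square metres (both assumptions about the units of these fields); a minimal sketch, whose columns are illustrative and not used later:
# Sketch only: per-capita crime rate, assuming POPDEN is persons per hectare and area is in m2
population <- LSOA_Crime$POPDEN * LSOA_Crime$area / 10000 # hectares = m2 / 10000 (units assumed)
crime_per_capita <- LSOA_Crime$n / population
summary(crime_per_capita)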
# plot
tmap_mode('plot')
# plot
tm_shape(LSOA_Crime)+
## boundaries
tm_borders(col = "grey",alpha = 0.1, lwd = 0.01)+
## fill color
tm_fill(col = 'denpop',n = 5,style = 'quantile', palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'Crime_index')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("right", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("right", "bottom"), text.size = 0.5) ## bar
Road network data comes from OpenStreetMap, downloaded from http://download.geofabrik.de/europe/great-britain/england/greater-london.html. The road density of a region is calculated by dividing the total road length (metres) by the area of the region (square metres).
# Read data
LD_road <- st_read('D:/F_SCUA-UCL/Final/GIS_Final/data/street/gis_osm_roads_free_1.shp')
## Reading layer `gis_osm_roads_free_1' from data source `D:\F_SCUA-UCL\Final\GIS_Final\data\street\gis_osm_roads_free_1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 325369 features and 10 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: -0.5177851 ymin: 51.28324 xmax: 0.3385625 ymax: 51.69869
## geographic CRS: WGS 84
# Spatially join the roads to the LSOA boundaries (each segment is attached to every LSOA it intersects)
road_intersect <- LD_road %>%
st_join(., LD_LSOA)
# Calculate the length of each road segment
length <- st_length(road_intersect)
length <- as.numeric(length)
road_intersect$length <- length
# Sum the road length within each LSOA
road_intersect <- road_intersect %>%
group_by(LSOA11CD) %>%
summarise(sum(length))
# Remove segments that did not match any LSOA
road_intersect <- road_intersect%>%
filter(!is.na(LSOA11CD) & LSOA11CD != '')%>%
st_drop_geometry()
# Join the road data to LD_LSOA
LSOA_road_account <- left_join(LD_LSOA, road_intersect, by=c('LSOA11CD'= 'LSOA11CD'))
# Calculate road density
LSOA_road_account$density <- LSOA_road_account$`sum(length)` / LSOA_road_account$area
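Note that st_join above attaches each whole road segment to every LSOA it intersects, so a segment crossing a boundary contributes its full length to more than one area. If clipped lengths are preferred, st_intersection could be used instead; a minimal sketch (slower on the full dataset, and not the approach used for the results below):
# Sketch: clip road segments to LSOA boundaries before measuring their length
road_clipped <- st_intersection(LD_road, LD_LSOA)
road_clipped$length <- as.numeric(st_length(road_clipped))
road_length_clipped <- road_clipped %>%
st_drop_geometry() %>%
group_by(LSOA11CD) %>%
summarise(length_total = sum(length))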
# plot
tm_shape(LSOA_road_account)+
## boundaries
tm_borders(col = "white",alpha = 0.01, lwd = 0.01)+
## fill color
tm_fill(col = 'density',n = 5,style = 'quantile', palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'Road Density')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("left", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("left", "bottom"), text.size = 0.5)+ ## bar
# tm_xlab("Longitude") + tm_ylab("Latitude") # coordinate
tm_layout(title = "a.Density of Road Network(per m2)",
main.title = "", title.size = 0.77)
The POI data is for September 2019 and was downloaded from Digimap (https://digimap.edina.ac.uk/, Ordnance Survey > Data Download > Boundary and Location Data), which provides a full range of Ordnance Survey topographic data. After filtering to Greater London there are 353,532 points.
1) Process the POI data
# Read data
poi <- st_read('D:/F_SCUA-UCL/Final/GIS_Final/data/POI/POI.shp')
## Reading layer `POI' from data source `D:\F_SCUA-UCL\Final\GIS_Final\data\POI\POI.shp' using driver `ESRI Shapefile'
## Simple feature collection with 661547 features and 30 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 480235 ymin: 133176 xmax: 575542 ymax: 227187
## projected CRS: OSGB 1936 / British National Grid
## Data Copyright Note
## Ordnance Survey Digimap: Licensed Data: © Crown copyright and database rights 2020 Ordnance Survey (100025252).
## This material includes data licensed from PointX© Database Right/Copyright 2020.
# Keep only the columns needed
poi_LD <- poi %>%
select('ref_no', 'name','groupname', 'geographic', 'categoryna', 'classname', 'geometry')
# Plot a heat map of the points to inspect the data
poi_LD <- poi_LD %>%
st_transform(4326) %>% # transform to WGS 84 to match the LSOA boundaries
cbind(st_coordinates(.))
### Add borough boundaries for context (LSOAs are too small to display clearly)
Londonborough <- st_read(
'D:/F_SCUA-UCL/CASA05/workshop_1/statistical-gis-boundaries-london/statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp')
## Reading layer `London_Borough_Excluding_MHW' from data source `D:\F_SCUA-UCL\CASA05\workshop_1\statistical-gis-boundaries-london\statistical-gis-boundaries-london\ESRI\London_Borough_Excluding_MHW.shp' using driver `ESRI Shapefile'
## Simple feature collection with 33 features and 7 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 503568.2 ymin: 155850.8 xmax: 561957.5 ymax: 200933.9
## projected CRS: OSGB 1936 / British National Grid
### plot heatmap
ggplot() +
geom_bin2d(data = poi_LD, aes(X, Y), binwidth = c(0.01, 0.01)) + # heatmap of 2d bin counts
geom_sf(data = Londonborough, col="white", size=0.001, alpha = 0.01) +
theme_tufte() +
scale_fill_viridis_c(option = "plasma") +
labs(x="", y="")
Clean the data: remove points outside Greater London and join the remaining points to the LSOA boundaries.
# Remove points outside of London
poi_LD_clean <- poi_LD %>%
filter(geographic == 'Greater London')
poi_LSOA <- poi_LD_clean %>%
st_join(., LD_LSOA)
Step 1: Divide the number of POIs in each LSOA by its area to obtain POI density.
# count points in each LSOA
poi_count <- poi_LSOA %>%
group_by(LSOA11CD) %>%
summarise(n=n())
poi_count <- poi_count %>%
st_drop_geometry()
# add the counts to the LSOA dataset (to calculate density and plot)
LSOA_POI_account <- left_join(LD_LSOA, poi_count, by=c('LSOA11CD'= 'LSOA11CD'))
LSOA_POI_account$density <- LSOA_POI_account$n / LSOA_POI_account$area
Plot POI density
tm_shape(LSOA_POI_account)+
tm_borders(col = "white",alpha = 0.01, lwd = 0.01)+
tm_fill(col = 'density',n = 5,style = 'quantile', palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'POI Density')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("left", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("left", "bottom"), text.size = 0.5)+ ## bar
tm_layout(title = "b.Density of Points of Interest (per m2)",
main.title = "", title.size = 0.77)
Step 2: Count the number of distinct POI types in each LSOA to obtain POI diversity.
# Create a 'Richness' metric by 'Group', 'Category' and 'Class'
div_tab <- poi_LSOA %>%
st_drop_geometry() %>%
group_by(LSOA11CD) %>%
summarise(rich_g = n_distinct(groupname),
rich_cat = n_distinct(categoryna),
rich_cla = n_distinct(classname))
# Merge
diversity <- LSOA_POI_account %>%
merge(., div_tab)
diversity$rich_cla_den <- diversity$rich_cla / diversity$area
Plot POI diversity
## plot
tm_shape(diversity)+
## boundaries
tm_borders(col = "white",alpha = 0.01, lwd = 0.01)+
## fill color
tm_fill(col = 'rich_cla_den',n = 5,style = 'quantile', palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'POI_Diversity')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("left", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("left", "bottom"), text.size = 0.5)+ ## bar
# tm_xlab("Longitude") + tm_ylab("Latitude") # coordinate
tm_layout(title = "c.Diversity of Points of Interest(per m2)",
main.title = "", title.size = 0.77)
Population density comes directly from the POPDEN field in the LSOA dataset. Plot the population density.
# plot
tm_shape(LSOA_road_account)+
## boundaries
tm_borders(col = "white",alpha = 0.01, lwd = 0.01)+
## fill color
tm_fill(col = 'POPDEN',n = 5,style = 'quantile', palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'Population Density')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("left", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("left", "bottom"), text.size = 0.5)+ ## bar
# tm_xlab("Longitude") + tm_ylab("Latitude") # coordinate
tm_layout(title = "d. Population Density",
main.title = "", title.size = 0.77)
# Remove outliers by z-score
LSOA_Crime$Zscore <- scale(LSOA_Crime$denpop,center = TRUE, scale = TRUE)
LSOA_Crime_cleaning <- LSOA_Crime %>%
filter(Zscore < 3) %>%
filter(Zscore > -3)
# Inspect the distribution (check whether it is approximately normal)
par(mfrow=c(1, 2),pin = c(3,2.5))
boxplot(LSOA_Crime_cleaning$denpop, main="Crime")
hist(LSOA_Crime_cleaning$denpop, main="Crime")
# Normalise by taking the log
LSOA_Crime_cleaning$denpoplog <- log(LSOA_Crime_cleaning$denpop)
boxplot(LSOA_Crime_cleaning$denpoplog, main="Crime",col = 'orange')
Dist <- LSOA_Crime_cleaning %>%
ggplot(aes(x=denpoplog)) +
geom_histogram(position="identity",
alpha=0.5,
bins=15,
fill="orange", col="black")+
geom_vline(aes(xintercept=mean(denpoplog)),
color="darkblue",
linetype="dashed")+
labs(title="log of Crime",
x="log of Crime",
y="Frequency")+
theme_classic()+
theme(plot.title = element_text(hjust = 0.5))
Dist
# remove outliers by z-score
LSOA_Crime_cleaning$Zscorelog <- scale(LSOA_Crime_cleaning$denpoplog,center = TRUE, scale = TRUE)
LSOA_Crime_cleaning <- LSOA_Crime_cleaning %>%
filter(Zscorelog < 3) %>%
filter(Zscorelog > -3)
## plot
tm_shape(LSOA_Crime_cleaning)+
## boundaries
tm_borders(col = "grey",alpha = 0.1, lwd = 0.01)+
## fill color
tm_fill(col = 'denpoplog',n = 5,style = 'quantile', palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'Crime')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("right", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("right", "bottom"), text.size = 0.5) ## bar
# Select POI_density
Independentdata <- select(LSOA_POI_account, LSOA11CD, density, POPDEN) %>%
rename(.,POI_density = density)
# Select POI_diversity
POI_diversity <- select(diversity,LSOA11CD,rich_cla_den) %>%
st_drop_geometry()
Independentdata <- left_join(Independentdata, POI_diversity, by=c('LSOA11CD'= 'LSOA11CD'))
Independentdata <- rename(Independentdata,POI_diversity = rich_cla_den)
# Select Road_density
Road_density <- select(LSOA_road_account,LSOA11CD,density) %>%
st_drop_geometry()
Independentdata <- left_join(Independentdata, Road_density, by=c('LSOA11CD'= 'LSOA11CD'))
Independentdata <-rename(Independentdata, Road_density = density)
# drop geo inf
Independentdata <- Independentdata %>%
st_drop_geometry()
None of the variables are normally distributed, so we log-transform them.
#log
Independentdata$log_POI_density <- log(Independentdata$POI_density)
Independentdata$log_POI_diversity <- log(Independentdata$POI_diversity)
Independentdata$log_Road_density <- log(Independentdata$Road_density)
Independentdata$log_popden <- log(Independentdata$POPDEN)
#combine independent data and dependent data
##both datasets have POPDEN column so remove one
Independentdata <- Independentdata %>%
select(.,-POPDEN)
##left join them
Variables <- left_join(LSOA_Crime_cleaning, Independentdata, by = c('LSOA11CD'= 'LSOA11CD'))
Variables <- Variables %>%
rename(crime =denpop ,log_crime = denpoplog)
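Before relying on the log-transformed columns, it is worth checking for zero or missing values, since log(0) is -Inf and the joins above can introduce NAs; a quick diagnostic sketch (not part of the original workflow):
# Count non-positive or missing values in each raw variable before trusting the log transform
sapply(st_drop_geometry(Variables)[, c("crime", "POI_density", "POI_diversity", "Road_density", "POPDEN")],
function(x) sum(is.na(x) | x <= 0))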
Show the data in boxplots.
par(mfrow=c(2, 5)) # arrange the plots in a 2 x 5 grid
# The original data
boxplot(LSOA_Crime_cleaning$denpop, main="Crime",col = 'orange')
boxplot(Independentdata$POI_density, main="POI_density", col = '#b0c7f0')
boxplot(Independentdata$POI_diversity, main="POI_diversity",col = '#b0c7f0')
boxplot(Independentdata$Road_density, main="Road_density", col = '#b0c7f0')
boxplot(Variables$POPDEN, main="Population_density", col = '#b0c7f0')
#The transformed data
boxplot(Variables$log_crime, main="log_Crime", col = 'orange')
boxplot(Variables$log_POI_density, main="log_POI_density",col = '#b0c7f0')
boxplot(Variables$log_POI_diversity, main="log_POI_diversity", col = '#b0c7f0')
boxplot(Variables$log_Road_density, main="log_Road_density", col = '#b0c7f0')
boxplot(Variables$log_popden, main="log_Population_density",col = '#b0c7f0')
Show the data distribution in histograms.
par(mfrow=c(1, 5), pin =c(1,0.7)) # arrange the plots in a 1 x 5 row
hist(Variables$log_crime, main="log_Crime", col = 'orange')
hist(Variables$log_POI_density, main="log_POI_density",col = '#b0c7f0')
hist(Variables$log_POI_diversity, main="log_POI_diversity", col = '#b0c7f0')
hist(Variables$log_Road_density, main="log_Road_density", col = '#b0c7f0')
hist(Variables$log_popden, main="log_Population_density",col = '#b0c7f0')
The vitality index weights the four indicators equally. Unity-based (min-max) normalisation is used to bring all values into the range [0, 1]. After the four indicators are normalised onto the same range, their average is taken to obtain the vitality index. The normalisation formula is: x' = (x - x_min) / (x_max - x_min).
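As a side note, the repeated normalisation steps below could be wrapped in a small helper function; a minimal sketch (the name min_max is illustrative only, the original script writes each step out explicitly):
# Unity-based (min-max) normalisation: x' = (x - x_min) / (x_max - x_min)
min_max <- function(x) {
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
# e.g. Variables$pd_norm <- min_max(Variables$log_POI_density)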
#POI density
minPd <- min(Variables$log_POI_density)
Pd <- max(Variables$log_POI_density) - min(Variables$log_POI_density)
pd_norm <- Variables$log_POI_density - minPd
Variables$pd_norm <- pd_norm / Pd
#POI diversity
minPdiv <- min(Variables$log_POI_diversity)
Pdiv <- max(Variables$log_POI_diversity) - min(Variables$log_POI_diversity)
pdiv_norm <- Variables$log_POI_diversity - minPdiv
Variables$pdiv_norm <- pdiv_norm / Pdiv
#Road density
minR <- min(Variables$log_Road_density)
R <- max(Variables$log_Road_density) - min(Variables$log_Road_density)
r_norm <- Variables$log_Road_density - minR
Variables$r_norm <- r_norm / R
#Population density
minPp <- min(Variables$log_popden)
Pp <- max(Variables$log_popden) - min(Variables$log_popden)
Pp_norm <- Variables$log_popden - minPp
Variables$Pp_norm <- Pp_norm / Pp
Variables$vitality <- Variables$pd_norm + Variables$pdiv_norm + Variables$r_norm + Variables$Pp_norm
Variables$vitality <- Variables$vitality / 4 * 100
tm_shape(Variables)+
## boundaries
tm_borders(col = "grey",alpha = 0.1, lwd = 0.01)+
## fill color
tm_fill(col = 'vitality',n = 5,style = 'quantile', palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'Urban Vitality')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("right", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("right", "bottom"), text.size = 0.5) ## bar
Check whether the vitality index is approximately normally distributed using a histogram.
Vitality_Dist <- Variables %>%
ggplot(aes(x=vitality)) +
geom_histogram(position="identity",
alpha=0.5,
bins=15,
fill="#b0c7f0", col="black")+
geom_vline(aes(xintercept=mean(vitality)),
color="darkblue",
linetype="dashed")+
labs(title="Vitality",
x="log_vitality",
y="Frequency")+
theme_classic()+
theme(plot.title = element_text(hjust = 0.5))
Vitality_Dist
Pearson’s correlation coefficient is used to measure the correlation between crime, urban vitality and the component indicators. The correlation coefficient (r) and p-value indicate the strength and significance of each relationship.
# Select variables
corre_variable <- Variables %>%
select(., log_crime, vitality, log_POI_density,log_POI_diversity,log_Road_density) %>%
st_drop_geometry()
# Plot it
library(corrplot)
cormat <- cor(corre_variable, use="complete.obs", method="pearson")
corrplot(cormat)
# Get the result of coefficient
library(Hmisc)
res2 <- corre_variable
res2 <- dplyr::mutate_all(res2,as.integer)
res <- rcorr(as.matrix(res2))
res
## log_crime vitality log_POI_density log_POI_diversity
## log_crime 1.00 -0.49 -0.18 -0.32
## vitality -0.49 1.00 0.86 0.89
## log_POI_density -0.18 0.86 1.00 0.85
## log_POI_diversity -0.32 0.89 0.85 1.00
## log_Road_density -0.32 0.60 0.39 0.43
## log_Road_density
## log_crime -0.32
## vitality 0.60
## log_POI_density 0.39
## log_POI_diversity 0.43
## log_Road_density 1.00
##
## n= 4750
##
##
## P
## log_crime vitality log_POI_density log_POI_diversity
## log_crime 0 0 0
## vitality 0 0 0
## log_POI_density 0 0 0
## log_POI_diversity 0 0 0
## log_Road_density 0 0 0 0
## log_Road_density
## log_crime 0
## vitality 0
## log_POI_density 0
## log_POI_diversity 0
## log_Road_density
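For a single pair of variables, base R's cor.test() also reports the Pearson correlation together with a confidence interval, which complements the matrix above; a brief sketch:
# Pearson correlation between log crime and the vitality index, with a 95% confidence interval
cor.test(corre_variable$log_crime, corre_variable$vitality, method = "pearson")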
Establish the space weight matrix by the k-nearest neighbours.
# Calculate the centroids of all LSOAs in London
coordsLSOA <- Variables %>%
st_centroid()%>%
st_geometry()
## Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
## geometries of x
## Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
## of_largest_polygon): st_centroid does not give correct centroids for longitude/
## latitude data
# Generate a spatial weights matrix
library(spdep)
#Nearest neighbours
knn_LSOA <-coordsLSOA %>%
knearneigh(., k=4)
LSOA_knn <- knn_LSOA %>%
knn2nb()
LSOA.knn_4_weight <- LSOA_knn %>%
nb2listw(., style="C")
library(broom)
slag_dv_model_knn4 <- lagsarlm(log_crime ~ vitality,
data = corre_variable,
listw = LSOA.knn_4_weight, # reuse the KNN-4 weights built above
method = "eigen")
## Warning: Function lagsarlm moved to the spatialreg package
#what do the outputs show?
tidy(slag_dv_model_knn4)
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 rho 0.321 0.0170 18.9 0
## 2 (Intercept) 9.77 0.230 42.5 0
## 3 vitality -0.0506 0.00167 -30.3 0
glance(slag_dv_model_knn4)
## # A tibble: 1 x 6
## r.squared AIC BIC deviance logLik nobs
## <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 0.320 15640. 15666. 7300. -7816. 4750
The p-values are all close to 0 and well below 0.05, so the results are significant. The regression coefficient of urban vitality is about -0.051, meaning that urban vitality has a significant negative association with the crime rate.
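For reference, the fitted model has the spatial lag form log_crime_i = rho * sum_j(w_ij * log_crime_j) + b0 + b1 * vitality_i + e_i, with rho = 0.321 and b1 = -0.0506. Because the outcome is on a log scale, the coefficient can be read as an approximate percentage change, ignoring the spatial multiplier implied by rho; a rough back-of-the-envelope sketch:
# Approximate direct effect, ignoring spatial spillovers: a 10-point increase in
# vitality is associated with roughly 40% lower values of the crime index
b1 <- coef(slag_dv_model_knn4)["vitality"]
(exp(10 * b1) - 1) * 100 # about -40%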
Use Moran’s I to check for spatial autocorrelation in the residuals of the spatial lag model.
#write out the residuals
modelresiduals <- corre_variable %>%
mutate(slag_dv_model_knn_resids = residuals(slag_dv_model_knn4))
KNN4Moran <- modelresiduals %>%
dplyr::select(slag_dv_model_knn_resids)%>%
pull()%>%
moran.test(., LSOA.knn_4_weight)%>%
tidy()
KNN4Moran
## # A tibble: 1 x 7
## estimate1 estimate2 estimate3 statistic p.value method alternative
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.0176 -0.000211 0.0000962 1.82 0.0344 Moran I test unde~ greater
The Moran’s I statistic is 0.0176, which is very close to zero, so the residuals show almost no spatial autocorrelation.
# Select data
Cluster_data <- corre_variable %>%
select(log_crime, vitality)
# Convert the variables to integers
Cluster_data <- dplyr::mutate_all(Cluster_data,as.integer)
fit <- Cluster_data %>%
kmeans(., 3, nstart=50)
# get cluster means
library(tidymodels)
centroid <- tidy(fit)%>%
#print the results of the cluster groupings
print()%>%
dplyr::select(log_crime, vitality)
## # A tibble: 3 x 5
## log_crime vitality size withinss cluster
## <dbl> <dbl> <int> <dbl> <fct>
## 1 9.81 50.1 2179 42530. 1
## 2 11.1 35.0 1014 49499. 2
## 3 9.25 64.7 1557 45710. 3
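The number of clusters is fixed at k = 3 here; as a quick sanity check, one could inspect how the total within-cluster sum of squares falls as k increases (the usual elbow heuristic). A minimal sketch, not run in the original analysis:
# Elbow-method check: total within-cluster sum of squares for k = 1 to 8
set.seed(42) # kmeans uses random starting centres
wss <- sapply(1:8, function(k) kmeans(Cluster_data, centers = k, nstart = 25)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "Number of clusters k", ylab = "Total within-cluster sum of squares")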
# Plot the clusters on a graph
p <- ggplot(Cluster_data,aes(log_crime, vitality))+
geom_point(aes(colour=factor(fit$cluster)))+
geom_point(data=centroid,aes(log_crime, vitality), size=7, shape=18)+ theme(legend.position="none")
LD_LSOA_dep <- fit %>%
# attach each LSOA's cluster assignment to the cleaned crime data
augment(., LSOA_Crime_cleaning)%>%
dplyr::select(LSOA11CD, .cluster)%>%
#make sure the .cluster column is numeric
mutate(across(.cluster, as.numeric))%>%
# join the .cluster to our sf layer
left_join(LSOA_Crime_cleaning,
.,
by = c("LSOA11CD" = "LSOA11CD"))
## make the scatter plot
Variables$cluster <- LD_LSOA_dep$.cluster
cluster_scatter <- ggplot(Variables, aes(x=vitality, y=log_crime, color = factor(cluster)))+
geom_point()+
labs(color = "cluster")
cluster_scatter
tm_shape(LD_LSOA_dep)+
tm_borders(col = "white",alpha = 0.01, lwd = 0.01)+
tm_fill(col = '.cluster',n = 3,breaks=c(1,2,3,4), palette = "YlGnBu",colorNA = "gray",
legend.show = TRUE,legend.hist = FALSE,
title = 'Cluster')+
tm_compass(size = 2.5, text.size = 0.7, type = "arrow", position=c("right", "bottom"))+ #compass
tm_scale_bar(width = 0.15, position=c("right", "bottom"), text.size = 0.5)+ ## bar
tm_layout(title = "Cluster",
main.title = "", title.size = 0.77)
The spatial distribution of crime rate and urban vitality shows a fairly clear pattern: areas with a high crime rate and low vitality tend to lie on the edge of the city, while areas with a low crime rate and high vitality are concentrated in the city centre.
This research discusses the spatial distribution of crime and urban vitality. From the results, we can see that:
The vitality of the city is related to the crime rate.
Where urban vitality is high, the crime rate is low.
The distribution of crime rate and urban vitality shows a fairly clear pattern: areas with a high crime rate and low vitality tend to lie on the edge of the city, while areas with a low crime rate and high vitality are concentrated in the city centre.
However, this study has several limitations:
Urban vitality might be better explored with street-level data.
The calculation of urban vitality could also be refined. This study simply uses equal-weight min-max normalisation, but future work could weight the indicators using coefficients obtained from correlation analysis.
Street accessibility could also be measured more precisely. This research simply uses road network density; further research could consider more advanced methods such as space syntax.
If you have any feedback or comments please contact me at sihan.chen.20@ucl.ac.uk