---
title: 'Synthetic cities report #1'
author: "Ido Klein"
date: "27/12/2020"
output:
  html_notebook
editor_options: 
  chunk_output_type: inline
---

## General flow
I use both R and Python, drawing on the strengths of each.  
Python is used to download the extent of the data and the data itself, and to simplify the network (consolidating intersections and removing pseudo links).  
R is used for data manipulation and visualization. The sfnetworks package is still under development, and simplifying the network with it is not ideal. 
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(reticulate)
library(tidyverse)
library(sf)
library(igraph)
library(tidygraph)
library(sfnetworks)
library(ggspatial)
library(dbscan)
library(patchwork)
library(mapview)
use_condaenv("oxtwo",required = TRUE)
```
## Getting the data
I use the Haifa network to get a first impression. I know Haifa well, and its network is very diverse, with distinct areas. The data is acquired using osmnx (a Python library). The network went through intersection consolidation and link simplification (removal of interstitial/pseudo nodes). 
```{python getting the data and simplifications, echo=FALSE}
import osmnx as ox
import pandas as pd
import geopandas as gpd
import networkx as nx
import requests
import collections
from shapely.geometry import shape,JOIN_STYLE, Polygon, MultiPolygon
import shapely
import shapely.wkb  # imported explicitly because shapely.wkb.dumps is used below
import matplotlib.pyplot as plt
# download the bounding polygon directly from OSM via Nominatim, by OSM relation id (Haifa)
params = collections.OrderedDict()
params['osmtype'] = 'R'
params['osmid'] = '1387888'
params['format'] = 'json'
params['polygon_geojson'] = '1'
headers = ox.downloader._get_http_headers()
url = ox.settings.nominatim_endpoint.rstrip("/") + "/" + "details"
response = requests.get(url, params=params, timeout=ox.settings.timeout, headers=headers)
response_json = response.json()
bounding = shape(response_json["geometry"])
city = ox.graph_from_polygon(bounding,network_type = "drive_service")
# project to the Israeli TM Grid (EPSG:2039), so that distances are in metres
city = ox.projection.project_graph(city,2039)
# consolidate intersections with a 15 m tolerance, including dead ends
city = ox.simplification.consolidate_intersections(city,15,dead_ends=True)
# convert the graph into node and edge GeoDataFrames
gdf_nodes, gdf_edges = ox.graph_to_gdfs(city)
# encode the geometries as hex WKB strings so they can be handed over to R
nodes_wkb_geom = gdf_nodes.geometry.apply(lambda x: shapely.wkb.dumps(x,hex = True))
gdf_nodes = pd.concat([gdf_nodes.drop(columns="geometry"),nodes_wkb_geom],axis = 1)
edges_wkb_geom = gdf_edges.geometry.apply(lambda x: shapely.wkb.dumps(x,hex = True))
gdf_edges = pd.concat([gdf_edges.drop(columns="geometry"),edges_wkb_geom],axis = 1)
# quick visual check of the simplified network
fig, ax = ox.plot_graph(city)

```
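
The consolidation step can be made more concrete with a minimal sketch of the geometric idea: each node is buffered by the tolerance, and nodes whose buffers overlap (directly or through a chain of neighbours) collapse into a single node at their centroid. This is an illustration only, not osmnx's implementation. The `consolidate` function and the toy coordinates are hypothetical, the pairwise comparison is brute force, and network topology is ignored, whereas `consolidate_intersections` can take it into account when it rebuilds the graph.

```python
from itertools import combinations
from math import hypot


def consolidate(points, tolerance):
    """Merge points whose tolerance-radius buffers overlap (transitively).

    points: dict of node id -> (x, y) in projected units (e.g. metres).
    Returns a sorted list of (member_ids, centroid) tuples, one per merged node.
    """
    parent = {n: n for n in points}

    def find(n):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Two buffers of radius `tolerance` overlap when the centres are
    # closer than 2 * tolerance.
    for a, b in combinations(points, 2):
        (ax, ay), (bx, by) = points[a], points[b]
        if hypot(ax - bx, ay - by) < 2 * tolerance:
            parent[find(a)] = find(b)

    clusters = {}
    for n in points:
        clusters.setdefault(find(n), []).append(n)

    merged = []
    for members in clusters.values():
        cx = sum(points[m][0] for m in members) / len(members)
        cy = sum(points[m][1] for m in members) / len(members)
        merged.append((sorted(members), (cx, cy)))
    return sorted(merged)


# Hypothetical coordinates: a, b and c form one complex junction, d is far away.
toy_nodes = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (20.0, 0.0), "d": (200.0, 0.0)}
merged_nodes = consolidate(toy_nodes, tolerance=15)
```

With a tolerance of 15, the three nearby nodes end up as one merged node and the distant node stays on its own, which is the behaviour the 15 m setting above is meant to produce at complex junctions.
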
<!-- <img src="pyplot.png" alt="MyImage"> -->


## Converting the data into R
After this, we convert the graph into two tables (edges and nodes), each with a geometry column. 
We rebuild the graph with sfnetworks using only the edges: including the nodes uses too much RAM for a reason I have not yet identified, and extracting the node data is also difficult. 
```{r showing conversion to R, echo=FALSE, fig.height=5, fig.width=5}
gdf_edges <- py$gdf_edges 
edges <- gdf_edges %>% 
  mutate(geometry = st_as_sfc(sf:::as_wkb(geometry), EWKB=TRUE,crs = 2039),
         from = as.character( u),
         to = as.character( v)) %>% 
  st_as_sf()
gdf_nodes <- py$gdf_nodes
nodes <- gdf_nodes %>% 
  mutate(geometry = st_as_sfc(sf:::as_wkb(geometry), EWKB=TRUE,crs = 2039),
         osmid = as.character(osmid)) %>% 
  st_as_sf()
ggplot(nodes) + 
  geom_sf() +
  geom_sf(data = edges)
```
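
The geometry handoff between the two languages relies on hex-encoded WKB (well-known binary) strings. As a rough illustration of what such a string contains, the sketch below encodes and decodes a single 2D point with the standard library only. The helper names are hypothetical, and only the plain little-endian Point layout is covered (no SRID, no lines or polygons), so it is not a substitute for `shapely.wkb` or `st_as_sfc`.

```python
import struct

# WKB layout for a 2D point, little-endian:
#   1 byte   byte-order flag (1 = little-endian)
#   4 bytes  geometry type   (1 = Point)
#   8 bytes  x coordinate (double)
#   8 bytes  y coordinate (double)
POINT_LAYOUT = "<BIdd"


def point_to_wkb_hex(x, y):
    """Encode a 2D point as an upper-case hex WKB string."""
    return struct.pack(POINT_LAYOUT, 1, 1, x, y).hex().upper()


def wkb_hex_to_point(hex_string):
    """Decode a hex WKB string that holds a little-endian 2D point."""
    byte_order, geom_type, x, y = struct.unpack(POINT_LAYOUT, bytes.fromhex(hex_string))
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2D points are handled in this sketch")
    return x, y


encoded = point_to_wkb_hex(1.0, 2.0)
decoded = wkb_hex_to_point(encoded)
```
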
## Network scope issues
The extent of the network is chosen arbitrarily, so it might be too big or too small. However, I try to use scale-independent metrics, such as degree, neighborhood size, and mean edge length per network size.  

## Calculating indices
For each node, I calculate the size of its neighborhood for a given neighborhood order, the mean edge length within that neighborhood, and the mean circuity. The neighborhood order used is 10. I filtered out extreme circuity values: self-loops produce infinite values, and I also dropped values of 2 or more, which are very rare. 
```{r echo=FALSE,message=FALSE}
net_by_edge <- as_sfnetwork(edges,directed = FALSE)
neigh_size <- 10
net_edges <- net_by_edge %>% 
  activate(edges) %>% 
  mutate(weight = edge_length()%>% as.numeric(),
         circuity = edge_circuity()%>% as.numeric()) %>% 
  as_tibble() %>% 
  st_drop_geometry()

net <- net_by_edge %>% 
  as_sfnetwork() %>% 
  activate(nodes) %>% 
  mutate(members= local_members(neigh_size),
         size = local_size(neigh_size),
         edgeW = map_dbl(members, function (x) {
           res <- net_edges %>% 
             filter(from %in% x, to %in% x) %>% 
             summarise(mean(weight)) %>% 
             pull()
           return(res)}),
         edgeC = map_dbl(members, function (x) {
           res <- net_edges %>% 
             filter(from %in% x, to %in% x,!is.infinite(circuity), circuity < 2) %>% 
             summarise(mean(circuity,na.rm = TRUE)) %>% 
             pull()
           return(res)})
  ) %>% 
  select(-members)
```
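
The three indices can be traced by hand on a very small example. The sketch below is a standard-library illustration of the same logic, not a translation of the sfnetworks code: the toy coordinates, edge lengths, and function names are all hypothetical. The neighborhood is every node reachable within `order` hops (the node itself included), the mean edge length is taken over edges with both endpoints inside the neighborhood, and circuity is edge length divided by the straight-line distance between the endpoints, with infinite values and values of 2 or more dropped.

```python
from collections import deque
from math import hypot, inf
from statistics import mean

# Hypothetical toy network: node coordinates in metres, edges as (from, to, length).
coords = {1: (0, 0), 2: (100, 0), 3: (200, 0), 4: (300, 0)}
edges = [
    (1, 2, 100.0),  # straight edge, circuity 1.0
    (2, 3, 150.0),  # curved edge, circuity 1.5
    (3, 4, 100.0),  # straight edge, circuity 1.0
    (3, 3, 40.0),   # self-loop, infinite circuity
]


def circuity(u, v, length):
    """Edge length divided by the straight-line distance between its endpoints."""
    (ux, uy), (vx, vy) = coords[u], coords[v]
    straight = hypot(ux - vx, uy - vy)
    return inf if straight == 0 else length / straight


def neighborhood(node, order):
    """Nodes reachable within `order` hops, the node itself included."""
    adjacency = {n: set() for n in coords}
    for u, v, _ in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, queue = {node}, deque([(node, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == order:
            continue
        for nxt in adjacency[current] - seen:
            seen.add(nxt)
            queue.append((nxt, depth + 1))
    return seen


def node_indices(node, order, max_circuity=2.0):
    members = neighborhood(node, order)
    inside = [(u, v, w) for u, v, w in edges if u in members and v in members]
    ratios = [circuity(u, v, w) for u, v, w in inside]
    ratios = [r for r in ratios if r < max_circuity]  # also drops inf from self-loops
    return {
        "size": len(members),
        "mean_edge_length": mean(w for _, _, w in inside),
        "mean_circuity": mean(ratios),
    }


indices_node_1 = node_indices(1, order=1)
indices_node_2 = node_indices(2, order=1)
```

For node 2 with order 1, the neighborhood is nodes 1, 2 and 3. The self-loop at node 3 still counts towards the mean edge length but is excluded from the mean circuity, mirroring the filter described above.
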


## Visualization

```{r echo=FALSE}
df <- net %>% 
  as_tibble() 
p1 <- df %>% 
  ggplot(aes(color =size)) + 
  geom_sf() + 
  ggtitle("Node neighborhood size \norder 10")+
  scale_color_viridis_c()
p2 <- df %>% 
  ggplot(aes(color =edgeW)) + 
  geom_sf()+
  ggtitle("Node neighborhood mean edge length \norder 10")+
  scale_color_viridis_c()
p3 <- df %>% 
  ggplot(aes(color =edgeC)) + 
  geom_sf()+
  ggtitle("Node neighborhood mean circuity \norder 10")+
  scale_color_viridis_c()
p1
p2
p3
```

## What's next?
- Trying this on other cities
- Thinking about how to perform spatial autocorrelation or other spatial statistical analysis on the results
- Trying these measures on groups of the graph, segmented by the Louvain method (see more in the appendix)
- Clustering the nodes according to edge length distribution, thus getting a new feature for each node.

## Appendix: graph groups
Using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_method), I have clustered the graph into groups.  
It is possible to compute the aforementioned measures on the communities found in the graph. This is a work in progress.
```{r}
memberships <- net_by_edge %>% 
  activate(edges) %>% 
  mutate(weight = 1/edge_length()) %>%
  cluster_louvain() %>%
  membership()
net_by_edge %>% 
  activate(nodes) %>% 
  mutate(memberships = memberships) %>% 
  st_as_sf() %>% 
  ggplot(aes(color = factor(memberships))) + 
  geom_sf(data = net_by_edge %>% 
  activate(edges) %>% 
    st_as_sf(), color = "black") + 
  geom_sf() + 
  scale_color_manual(values = randomcoloR::randomColor(memberships %>% unique() %>% length()))


```
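
The Louvain method greedily searches for a partition with high modularity. The sketch below does not run that search. It only evaluates weighted modularity for two hand-picked partitions of a hypothetical toy graph, using inverse edge length as the weight as in the chunk above, to show why short, dense clusters joined by a long link score well as separate communities. The `modularity` function, the toy lengths, and the labels are assumptions made for illustration.

```python
def modularity(edges, membership):
    """Weighted modularity Q = sum over communities of L_c / m - (d_c / (2 m))^2.

    edges: list of (u, v, weight); membership: dict of node -> community label.
    L_c is the total weight of edges inside community c, d_c the total weighted
    degree of its nodes, and m the total edge weight.
    """
    total = sum(w for _, _, w in edges)
    internal, degree = {}, {}
    for u, v, w in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / total - (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Hypothetical toy graph: two triangles of 50 m edges joined by one 500 m link.
lengths = [
    (1, 2, 50.0), (2, 3, 50.0), (1, 3, 50.0),
    (4, 5, 50.0), (5, 6, 50.0), (4, 6, 50.0),
    (3, 4, 500.0),
]
weighted = [(u, v, 1.0 / length) for u, v, length in lengths]

two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in two_groups}

q_two = modularity(weighted, two_groups)
q_one = modularity(weighted, one_group)
```

Splitting the graph at the long link gives a clearly positive modularity, while lumping everything into one community gives zero, so a modularity-maximising method would prefer the split.
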

