Geoprocesos

GEOPROCESOS

¿Los proyectos de vivienda con requerimientos de inclusión social en San Francisco (“Residential Projects With Inclusionary Requirements”) tienen acceso a servicios?

A fin de contestar la pregunta, se utilizarán los dataset de:
- Vivienda con Zonificación inclusiva
- Colegios
- Barrios

Todos los datos se obtuvieron del Portal de datos abiertos de San Francsico

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.1.3     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1

## Warning: package 'readr' was built under R version 4.0.5

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(sf)

## Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1

Carga de datos

barrios <- st_read("https://raw.githubusercontent.com/paulavidela/utdt_cienciadedatos/main/data/sf_neighborhoods.geojson")

## Reading layer `sf_neighborhoods' from data source `https://raw.githubusercontent.com/paulavidela/utdt_cienciadedatos/main/data/sf_neighborhoods.geojson' using driver `GeoJSON'
## Simple feature collection with 41 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.5149 ymin: 37.70813 xmax: -122.357 ymax: 37.8333
## geographic CRS: WGS 84

colegios <- read.csv("https://raw.githubusercontent.com/paulavidela/utdt_cienciadedatos/main/data/sf_schools.csv", 
                     encoding =  "UTF-8")
proyectos_zi <- read.csv("https://raw.githubusercontent.com/paulavidela/utdt_cienciadedatos/main/data/sf_zi_residential_projects.csv", encoding =  "UTF-8")

Primero, vamos a realizar realizar un análisis exploratorio.

head(barrios)

## Simple feature collection with 6 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.4543 ymin: 37.70822 xmax: -122.357 ymax: 37.80602
## geographic CRS: WGS 84
##                            nhood                       geometry
## 1          Bayview Hunters Point MULTIPOLYGON (((-122.3816 3...
## 2                 Bernal Heights MULTIPOLYGON (((-122.4036 3...
## 3            Castro/Upper Market MULTIPOLYGON (((-122.4266 3...
## 4                      Chinatown MULTIPOLYGON (((-122.4062 3...
## 5                      Excelsior MULTIPOLYGON (((-122.424 37...
## 6 Financial District/South Beach MULTIPOLYGON (((-122.3875 3...

Es un dataset espacial que contiene los nombres de los barrios y sus geometrías.

str(colegios)

## 'data.frame':    445 obs. of  21 variables:
##  $ Campus.Name              : chr  "Milk, Harvey Milk Childrens Center" "Mckinley Elementary School" "Jewish Community Center San Francisco - Rosenberg Early Childhood Center" "Eureka Learning Center" ...
##  $ CCSF.Entity              : chr  "SFUSD" "SFUSD" "Private" "Private" ...
##  $ Lower.Grade              : int  -2 0 -2 -2 -2 0 0 0 -2 -2 ...
##  $ Upper.Grade              : int  -1 5 -1 -1 5 8 1 8 -1 -1 ...
##  $ Grade.Range              : chr  "PK" "K-5" "PK" "PK" ...
##  $ Category                 : chr  "USD PreK" "USD Grades K-5" "Independent / Private" "Independent / Private" ...
##  $ Map.Label                : chr  "CDC095" "PS075" "CDC058" "CDC035" ...
##  $ Lower.Age                : int  3 5 3 3 3 5 5 5 3 3 ...
##  $ Upper.Age                : int  4 10 4 4 10 13 6 13 4 4 ...
##  $ General.Type             : chr  "CDC" "PS" "CDC" "CDC" ...
##  $ CDS.Code                 : num  NA 3.87e+13 3.81e+08 3.87e+13 3.87e+13 ...
##  $ Campus.Address           : chr  "841 ELLIS ST, SAN FRANCISCO CA 94117" "1025 14TH ST, San Francisco, CA 94114" "325 ARGUELLO BLVD, SAN FRANCISCO, CA 94118" "464 DIAMOND ST, SAN FRANCISCO, CA 94114" ...
##  $ Supervisor.District      : int  6 8 1 8 4 8 5 7 10 3 ...
##  $ County.FIPS              : int  6075 6075 6075 6075 6075 6075 6075 6075 6075 6075 ...
##  $ County.Name              : chr  "SAN FRANCISCO" "SAN FRANCISCO" "SAN FRANCISCO" "SAN FRANCISCO" ...
##  $ Location.1               : chr  "CA\n(37.783802, -122.420105)" "CA\n(37.766884, -122.436279)" "CA\n(37.784588, -122.459488)" "CA\n(37.754967, -122.437004)" ...
##  $ Neighborhoods..old.      : int  36 3 11 22 35 3 41 40 39 4 ...
##  $ Zip.Codes                : int  28858 28862 54 28862 56 28862 28858 29491 309 308 ...
##  $ Fire.Prevention.Districts: int  7 15 11 2 1 2 13 1 9 3 ...
##  $ Police.Districts         : int  9 5 6 4 8 4 9 8 7 1 ...
##  $ Supervisor.Districts     : int  9 5 2 5 3 5 11 4 8 10 ...

El dataset de colegios contiene 21 variables. Aparentemente la variable “Location.1” (de tipo character) es la única que contiene los valores de las coordenadas espaciales. Sin embargo, no está escrita en un formato estandar.
Exploremos un poco más la variable a partir de una muestra de 20 valores

set.seed(1)
sample_n(colegios, 20) %>%
  .$Location.1

##  [1] "CA\n(37.783512, -122.40081)"  "CA\n(37.763493, -122.450424)"
##  [3] "CA\n(37.772942, -122.426147)" "CA\n(37.771988, -122.421211)"
##  [5] "CA\n(37.73621, -122.424728)"  "CA\n(37.763557, -122.399467)"
##  [7] "CA\n(37.753468, -122.416717)" "CA\n(37.738415, -122.4907)"  
##  [9] "CA\n(37.763454, -122.430359)" "CA\n(37.792713, -122.435089)"
## [11] "CA\n(37.731804, -122.38958)"  "CA\n(37.775456, -122.421577)"
## [13] "CA\n(37.787209, -122.42234)"  "CA\n(37.761398, -122.403702)"
## [15] "CA\n(37.792866, -122.433601)" "CA\n(37.743015, -122.500076)"
## [17] "CA\n(37.734238, -122.380966)" "CA\n(37.79372, -122.409294)" 
## [19] "CA\n(37.777092, -122.461357)" "CA\n(37.764423, -122.438148)"

Efectivamente es necesario transformar la variable. Para ello, vamos a obtener la latitud y la longitud en dos variables separadas.
Podemos observar que la latitud (primer valor) y la longitud (segundo valor) no poseen la misma cantidad de caracteres. No podemos asumir valores según posición. Dado que la coma , separa la latitud de la longitud, podemos utilizar la función gsub() para reemplazar los valores antes y después de la coma. El parámetro .* significa “todos los caracteres”.

colegios <- colegios %>%
  mutate(lat_int = gsub(",.*", "", Location.1), 
         lon_int = gsub(".*,", "", Location.1))

Veamos el resultado de esta primer transformación

set.seed(1)
colegios %>% 
  select(lon_int, lat_int) %>% 
  sample_n(10)

##          lon_int        lat_int
## 1    -122.40081) CA\n(37.783512
## 2   -122.450424) CA\n(37.763493
## 3   -122.426147) CA\n(37.772942
## 4   -122.421211) CA\n(37.771988
## 5   -122.424728)  CA\n(37.73621
## 6   -122.399467) CA\n(37.763557
## 7   -122.416717) CA\n(37.753468
## 8     -122.4907) CA\n(37.738415
## 9   -122.430359) CA\n(37.763454
## 10  -122.435089) CA\n(37.792713

Estamos más cerca. Ahora vemos que para la longitud nos sobra un paréntesis, mientras que para la latitud hay un indicador “CA\n” que no corresponde. Vamos a quitar esos textos con la función str_remove(). Además como la variable es de tipo caracter, vamos a transformarla a numérica.
Dado que los patrones a reemplazar contienen caracteres especiales que permiten operar dentro de los caracteres, es necesario “escapar” el efecto especial. Por eso se agregan barras: \ .

colegios <- colegios %>%
  mutate(LATITUDE = as.numeric(str_remove(lat_int, "CA\\n\\(")), 
         LONGITUDE = as.numeric(str_remove(lon_int, "\\)")))

Veamos el resultado.

set.seed(1)
colegios %>% 
  select(LONGITUDE, LATITUDE) %>% 
  sample_n(10)

##    LONGITUDE LATITUDE
## 1  -122.4008 37.78351
## 2  -122.4504 37.76349
## 3  -122.4261 37.77294
## 4  -122.4212 37.77199
## 5  -122.4247 37.73621
## 6  -122.3995 37.76356
## 7  -122.4167 37.75347
## 8  -122.4907 37.73842
## 9  -122.4304 37.76345
## 10 -122.4351 37.79271

Ahora si! Nuestras variables de latitud y longitud tienen el formato correcto para transformarlas espacialmente.

El dataset de colegios ya limpiado se encuentra aqui.

##colegios <- read.csv("https://raw.githubusercontent.com/paulavidela/utdt_cienciadedatos/main/data/sf_colegios_clean.csv")

Por último exploremos el dataset de proyectos de viviendas

summary(proyectos_zi)

##   Project.ID        Project.Status     Project.Name       Street.Number     
##  Length:406         Length:406         Length:406         Length:406        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Street.Name        Street.Type           Zip.Code     Housing.Tenure    
##  Length:406         Length:406         Min.   :94102   Length:406        
##  Class :character   Class :character   1st Qu.:94103   Class :character  
##  Mode  :character   Mode  :character   Median :94107   Mode  :character  
##                                        Mean   :94111                     
##                                        3rd Qu.:94111                     
##                                        Max.   :94410                     
##  Section.415.Declaration Entitlement.Approval.Date
##  Length:406              Length:406               
##  Class :character        Class :character         
##  Mode  :character        Mode  :character         
##                                                   
##                                                   
##                                                   
##  Actual.Estimated.Completion.Date Planning.Case.Number
##  Length:406                       Length:406          
##  Class :character                 Class :character    
##  Mode  :character                 Mode  :character    
##                                                       
##                                                       
##                                                       
##  Property.Information.Map.Link Planning.Entitlements Project.Units    
##  Length:406                    Length:406            Min.   :   5.00  
##  Class :character              Class :character      1st Qu.:  21.25  
##  Mode  :character              Mode  :character      Median :  43.00  
##                                                      Mean   : 112.28  
##                                                      3rd Qu.: 124.00  
##                                                      Max.   :3522.00  
##  Affordable.Units Units.Subject.to.Section.415 On.Site.Affordable.Units
##  Min.   :  0.0    Min.   :   0.0               Min.   :  0.000         
##  1st Qu.:  0.0    1st Qu.:  21.0               1st Qu.:  0.000         
##  Median :  3.0    Median :  42.0               Median :  3.000         
##  Mean   : 11.2    Mean   : 111.5               Mean   :  9.896         
##  3rd Qu.: 11.0    3rd Qu.: 120.8               3rd Qu.: 10.000         
##  Max.   :176.0    Max.   :3522.0               Max.   :175.924         
##  Off.Site.Affordable.Units Off.Site.Affordable.Units.at.This.Site
##  Min.   :  0.00            Min.   :  0.000                       
##  1st Qu.:  0.00            1st Qu.:  0.000                       
##  Median :  0.00            Median :  0.000                       
##  Mean   :  1.69            Mean   :  1.099                       
##  3rd Qu.:  0.00            3rd Qu.:  0.000                       
##  Max.   :164.00            Max.   :164.000                       
##    SRO.Units         Studio.Units      X1bd.Units       X2bd.Units    
##  Min.   : 0.00000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 0.00000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median : 0.00000   Median : 0.000   Median : 0.000   Median : 1.000  
##  Mean   : 0.07389   Mean   : 1.537   Mean   : 3.446   Mean   : 3.155  
##  3rd Qu.: 0.00000   3rd Qu.: 1.000   3rd Qu.: 3.000   3rd Qu.: 3.000  
##  Max.   :14.00000   Max.   :56.000   Max.   :66.000   Max.   :94.000  
##    X3bd.Units        X4bd.Units        X30..AMI          X50..AMI       
##  Min.   : 0.0000   Min.   :0.0000   Min.   : 0.0000   Min.   :0.000000  
##  1st Qu.: 0.0000   1st Qu.:0.0000   1st Qu.: 0.0000   1st Qu.:0.000000  
##  Median : 0.0000   Median :0.0000   Median : 0.0000   Median :0.000000  
##  Mean   : 0.6305   Mean   :0.0197   Mean   : 0.2364   Mean   :0.007389  
##  3rd Qu.: 0.0000   3rd Qu.:0.0000   3rd Qu.: 0.0000   3rd Qu.:0.000000  
##  Max.   :25.0000   Max.   :5.0000   Max.   :50.0000   Max.   :3.000000  
##     X55..AMI          X60..AMI          X80..AMI         X90..AMI     
##  Min.   :  0.000   Min.   : 0.0000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:  0.000   1st Qu.: 0.0000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median :  0.000   Median : 0.0000   Median : 0.000   Median : 0.000  
##  Mean   :  4.537   Mean   : 0.4507   Mean   : 0.133   Mean   : 1.791  
##  3rd Qu.:  0.000   3rd Qu.: 0.0000   3rd Qu.: 0.000   3rd Qu.: 0.000  
##  Max.   :158.000   Max.   :48.0000   Max.   :23.000   Max.   :72.000  
##    X100..AMI         X120..AMI         X150..AMI        Supervisor.District
##  Min.   :  0.000   Min.   : 0.0000   Min.   : 0.00000   Min.   : 1.000     
##  1st Qu.:  0.000   1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.: 5.000     
##  Median :  0.000   Median : 0.0000   Median : 0.00000   Median : 6.000     
##  Mean   :  2.446   Mean   : 0.2438   Mean   : 0.03941   Mean   : 6.451     
##  3rd Qu.:  1.000   3rd Qu.: 0.0000   3rd Qu.: 0.00000   3rd Qu.: 9.000     
##  Max.   :170.000   Max.   :24.0000   Max.   :10.00000   Max.   :11.000     
##  Neighborhood       Planning.Neighborhood  Plan.Area        
##  Length:406         Length:406            Length:406        
##  Class :character   Class :character      Class :character  
##  Mode  :character   Mode  :character      Mode  :character  
##                                                             
##                                                             
##                                                             
##  Off.Site.Principal.Project.ID Off.Site.Principal.Project Off.Site.Project.ID
##  Length:406                    Length:406                 Length:406         
##  Class :character              Class :character           Class :character   
##  Mode  :character              Mode  :character           Mode  :character   
##                                                                              
##                                                                              
##                                                                              
##  Off.Site.Project      Latitude       Longitude        Location        
##  Length:406         Min.   :37.71   Min.   :-122.5   Length:406        
##  Class :character   1st Qu.:37.76   1st Qu.:-122.4   Class :character  
##  Mode  :character   Median :37.78   Median :-122.4   Mode  :character  
##                     Mean   :37.77   Mean   :-122.4                     
##                     3rd Qu.:37.78   3rd Qu.:-122.4                     
##                     Max.   :37.81   Max.   :-122.4

Este dataset cuenta con información de latitud y longitud.

¿Cómo se ubican los datos en el mapa?

ggplot() + 
  geom_sf(data = barrios) + 
  geom_point(data = colegios, aes(x = LONGITUDE, y = LATITUDE), alpha = 0.5) +
  geom_point(data = proyectos_zi, aes(x = Longitude, y = Latitude), color = "blue", size = 2) + 
  theme_void()

Bounding box o caja de coordenadas: st_bbox()

Para descargar un mapa base, vamos a utilizar la librería ggmap. Esta librería nos permite generar mapas utilizando imágenes como fondo. ggmap se conecta a diferentes servidores de mapas (OpenStreetMap, StamenMap, GoogleMaps entre otros) y descarga los mosaicos (los recortes de las imágenes).

library(ggmap)

## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.

## Please cite ggmap if you use it! See citation("ggmap") for details.

En este ejemplo, vamos a descargarnos el mapa base de Stamen Map.

Con el objetivo de incorporar un mapa base, es necesario primero contar con una caja de coordenadas que defina el recorte del mapa base.
Para obtener la caja de coordenadas del mapa base, vamos a utilizar los datos de los barrios (la geometría que mayor abarca el territorio). Podemos utilizar la función st_bbox para calcular el bounding box.

bbox <- st_bbox(barrios)
class(bbox)

## [1] "bbox"

La función st_bbox devuelve un objeto de tipo “bbox”.
Para descargar un mapa base vamos a utilizar la función get_stamenmap(). Esta función toma múltiples parámetros, en particular el parámetro bbox que refiere al bounding box o caja de coordenadas. Este parámetro, de acuerdo a la documentación, tiene que ser un vector numérico con 4 valores. Como nuestro bbox está en formato bbox, vamos a transformalo a numérico.

bbox <- as.numeric(bbox)
class(bbox)

## [1] "numeric"

Ahora vamos a descargar nuestro mapa base.

mapa_base <- get_stamenmap(bbox = bbox, 
                           maptype = "terrain", 
                           zoom = 12)

## Source : http://tile.stamen.com/terrain/12/654/1582.png

## Source : http://tile.stamen.com/terrain/12/655/1582.png

## Source : http://tile.stamen.com/terrain/12/654/1583.png

## Source : http://tile.stamen.com/terrain/12/655/1583.png

## Source : http://tile.stamen.com/terrain/12/654/1584.png

## Source : http://tile.stamen.com/terrain/12/655/1584.png

Realicemos una visualización.

ggmap(mapa_base) + 
  geom_sf(data = barrios, inherit.aes = FALSE, fill = NA) + 
  geom_point(data = colegios, aes(x = LONGITUDE, y = LATITUDE), alpha = 0.5) +
  geom_point(data = proyectos_zi, aes(x = Longitude, y = Latitude), color = "blue", size = 2) + 
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Como vamos a realizar geoprocesos, convertimos los datos de colegios y de proyectos_zi a datos espaciales.

colegios_geo <- st_as_sf(colegios, coords = c("LONGITUDE", "LATITUDE"), crs = 4326)
proyectos_zi_geo <- st_as_sf(proyectos_zi, coords = c("Longitude", "Latitude"), crs=4326)

Se verifica que estén bien cargados

ggmap(mapa_base) + 
  geom_sf(data = barrios, inherit.aes = FALSE, fill = NA, size = 1.2) + 
  geom_sf(data = colegios_geo,  alpha = 0.5, inherit.aes = FALSE) +
  geom_sf(data = proyectos_zi_geo,  color = "blue", size = 2, inherit.aes = FALSE) + 
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Dado que nuestro bounding box no es más que una caja de coordenadas, podríamos ajustar los números para poder descargar un mapa mayor Veamos como es nuestro objeto bbox

bbox

## [1] -122.51495   37.70813 -122.35697   37.83330

Podemos acceder a cada elemento del objeto indicando la posición entre corchetes. Por ejemplo:

bbox[2]

## [1] 37.70813

Podemos hacer un bounding box más amplio.

bbox_large <- c(bbox[1] -0.01, bbox[2] - 0.01, bbox[3] + 0.01 , bbox[4] + 0.01)

mapa_base_large <- get_stamenmap(bbox = bbox_large, 
                           maptype = "terrain", 
                           zoom = 12)

## Source : http://tile.stamen.com/terrain/12/653/1582.png

## Source : http://tile.stamen.com/terrain/12/653/1583.png

## Source : http://tile.stamen.com/terrain/12/653/1584.png

ggmap(mapa_base_large) + 
  geom_sf(data = barrios, inherit.aes = FALSE, fill = NA, size = 1.2) + 
  geom_sf(data = colegios_geo,  alpha = 0.5, inherit.aes = FALSE) +
  geom_sf(data = proyectos_zi_geo,  color = "blue", size = 2, inherit.aes = FALSE) + 
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Sistemas de coordenadas: st_crs()

El sistema de referencia espacial (SRS: spatial reference system) o el sistema de referencia de coordenadas (CRS: coordinate reference system) es un marco utilizado para medir con precisión ubicaciones en la superficie de la Tierra como coordenadas. (Fuente: Wikipedia)
Para transformar nuestros datos anteriores utilizamos crs=4326 que corresponde a WGS84 (World Geodetic System) o Mercator. Existen algunos sistemas proyectados Pseudo-Mercator o Transverse Mercator (que dependen de las zonas).

Verifiquemos el crs de nuestros datos espaciales.

st_crs(colegios_geo)

## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World"],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

st_crs(proyectos_zi_geo)

## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World"],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

Ambos dataset tienen el mismo CRS.

st_crs(colegios_geo)$units

## NULL

Este sistema de proyección no cuenta con unidades de medida. Vamos a reproyectar las coordenadas para que se encuentren en metros. Para San Francisco, vamos a utilizar WGS 84 / UTM zone 10N (crs = 32610). Los sistemas reproyectados dependen de cada ciudad. Pueden encontrar aqui más información sobre los sistemas de coordenadas disponibles.

colegios_reproyectados <- colegios_geo %>%
  st_transform(crs = 32610)
proyectos_zi_reproyectados <- proyectos_zi_geo %>%
  st_transform(crs = 32610)

Revisemos la unidad de medida

st_crs(colegios_reproyectados)$units

## [1] "m"

Buffers o áreas de influencia: st_buffer() y st_intersection()

¿Cuantas escuelas hay en un radio de 1km de los proyectos con zonificación inclusiva?

En primer lugar, determinamos un buffer de 1km desde cada proyecto. Ahora que nuestros datos están reproyectados, podemos generar un radio de 1000m.

buffer_proyecto_zi <- st_buffer(proyectos_zi_reproyectados, 1000)

Se unen los colegios con los buffers. Para eso vamos a utilizar la función st_intersection. Esta funcion devuelve un dataset con una línea para cada pareja colegio-buffer. En caso de que un colegio se encuentre en más de un buffer, entonces dicho colegio se repetira para cada buffer.

colegios_en_buffer <- st_intersection(colegios_reproyectados, buffer_proyecto_zi)

## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

Ahora se determinan la cantidad de colegios por Proyecto ZI

colegios_x_proyecto_zi <- colegios_en_buffer %>%
  st_set_geometry(NULL) %>%
  group_by(Project.ID) %>%
  summarise(cantidad =n())
head(colegios_x_proyecto_zi)

## # A tibble: 6 × 2
##   Project.ID cantidad
##   <chr>         <int>
## 1 1989-001         18
## 2 1990-019         30
## 3 1991-001         10
## 4 1991-002         16
## 5 1991-003         13
## 6 1992-001         11

Se une a la geometría de proyectos zi. Vamos a utilizar los datos sin reproyectar para poder luego mapearlos.

proyectos_zi_geo <- left_join(proyectos_zi_geo, colegios_x_proyecto_zi)

## Joining, by = "Project.ID"

Queremos mostrar los resultados en un mapa. Dado que nuestro mapa basa se encuentra en WSG84 así como los proyectos, los colegios y los barrios, vamos a reproyectar el buffer para poder visualizarlo también en el mapa.

buffer_reproyectado <- buffer_proyecto_zi %>%
  st_transform(crs = 4236)

ggmap(mapa_base) + 
  geom_sf(data = barrios, 
          inherit.aes = FALSE, fill = NA) + 
  geom_sf(data = buffer_reproyectado,  
          fill= "green", size = 0.1, alpha = 0.1, color ="green", inherit.aes = FALSE) + 
  geom_sf(data = colegios_geo,  
          alpha = 0.5, inherit.aes = FALSE) +
  geom_sf(data = proyectos_zi_geo,  
          color = "blue", size = 2, inherit.aes = FALSE) + 
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

A continuación, se muestra en el mapa cuales son los proyectos ZI con más colegios próximos:

ggmap(mapa_base) + 
  geom_sf(data = barrios, inherit.aes = FALSE, fill = NA) + 
  geom_sf(data = colegios_geo,  alpha = 0.2, inherit.aes = FALSE, color = "grey20") +
  geom_sf(data = proyectos_zi_geo,  aes(color = cantidad), size = 2, inherit.aes = FALSE) + 
  scale_color_viridis_c( direction = -1) + 
  labs(title = "Cantidad de colegios a menos de 1km" ) +
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Si bien hay una concentración de proyectos en el centro, los proyectos más perífericos también tienen colegios cercanos.

¿Cuáles son los colegios que se encuentran a una distancia a pie de más proyectos ZI ?

En primer lugar, determinamos un buffer de 1km desde cada colegio considerando que 10 cuadras es una distancia fácil de recorrer de forma diara a pie. Vamos a utilizar los datos reproyectados.

buffer_colegios <- st_buffer(colegios_reproyectados, 1000)

Unimos ahora los buffers de los colegios con los proyectos ZI.

proyectos_en_buffer <- st_intersection(proyectos_zi_reproyectados, buffer_colegios)

## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

Ahora se determinan la cantidad de proyectos ZI por colegio.

proyecto_zi_x_colegio <- proyectos_en_buffer %>%
  st_set_geometry(NULL) %>%
  group_by(Campus.Name) %>%
  summarise(cantidad =n())
head(proyecto_zi_x_colegio)

## # A tibble: 6 × 2
##   Campus.Name               cantidad
##   <chr>                        <int>
## 1 ABC Preschool                    3
## 2 Adda Clevenger School           17
## 3 Alamo Elementary School          2
## 4 Alt School - Alamo Square       42
## 5 Alt School - Dogpatch 1         30
## 6 Alt School - Dogpatch 2         32

Se une a la geometría de colegios. Vamos a utilizar los datos sin reproyectar para poder luego mapearlos.

colegios_geo<- left_join(colegios_geo, proyecto_zi_x_colegio)

## Joining, by = "Campus.Name"

A continuación, se muestra en el mapa cuales son colegios con más proyectos ZI próximos:

ggmap(mapa_base) + 
  geom_sf(data = barrios, inherit.aes = FALSE, fill = NA) + 
  geom_sf(data = proyectos_zi_geo,alpha = 0.8, inherit.aes = FALSE, color = "grey20") + 
  geom_sf(data = colegios_geo,  aes(color = cantidad), size = 2, inherit.aes = FALSE) +
  scale_color_viridis_c( direction = -1) + 
  labs(title = "Cantidad de colegios a menos de 1km" ) +
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Observamos que en el centro de la ciudad se concentran los colegios que más proyectos de zonificacion inclusiva abastecen. ¿Cuál es el colegio que se encuentra más próximo a más proyecto de zonificación inclusiva?

Objetos más próximos: st_nearest_feature()

Colegio más proximo

Se busca determinar cuál es el colegio que tiene más viviendas de inclusión social próximas a él.
Se incorpora al dataframe de colegios, el id de la fila. Esto nos va permitir identificar el colegio más adelante.

colegios_geo <- colegios_geo %>%
  mutate(id = row_number())

Para localizar el colegio más próximo a cada proyecto ZI, utilizamos la función st_nearest_feature(). Esta función busca para cada observación del dataset de proyectos, cual es el objeto linealmente más cercano del dataset de colegios.

colegios_proximos <- st_nearest_feature(proyectos_zi_geo, colegios_geo)

## although coordinates are longitude/latitude, st_nearest_points assumes that they are planar

Como podemos observar, esta función devuelve una lista con el número de fila de cada colegio, en la posición correspondiente a cada proyecto. Para simplificar la lectura, vamos a generar esta información dentro de nuestro dataset de proyectos.

Se genera una nueva columna con el colegio más cercano. Podríamos reproyectar los datos para tener distancias exactas en m. Dado el contexto reducido de datos, y que buscamos calcular relaciones, vamos a mantener los datos en WGS84.

proyectos_zi_geo_colegio <- proyectos_zi_geo %>%
  mutate(id_colegio = st_nearest_feature(proyectos_zi_geo, colegios_geo ))

## although coordinates are longitude/latitude, st_nearest_points assumes that they are planar

¿Cuál es el id de colegio más repetido?

id_mas_repetido <- proyectos_zi_geo_colegio %>%
  st_set_geometry(NULL) %>%
  group_by(id_colegio) %>%
  summarise(repeticiones = n()) %>%
  arrange(desc(repeticiones)) 

head(id_mas_repetido)

## # A tibble: 6 × 2
##   id_colegio repeticiones
##        <int>        <int>
## 1        386           16
## 2        133           10
## 3        155           10
## 4        251           10
## 5        203            8
## 6        234            8

¿Cuál es ese colegio?

colegios_geo %>%
  filter(id == 386)

## Simple feature collection with 1 feature and 25 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.3956 ymin: 37.78529 xmax: -122.3956 ymax: 37.78529
## geographic CRS: WGS 84
##                Campus.Name CCSF.Entity Lower.Grade Upper.Grade Grade.Range
## 1 Bright Horizons - 2Nd St     Private          -2           4        PK-4
##                Category Map.Label Lower.Age Upper.Age General.Type  CDS.Code
## 1 Independent / Private    IND217         3         9          IND 384002149
##                                 Campus.Address Supervisor.District County.FIPS
## 1 303 02ND ST STE 250, SAN FRANCISCO, CA 94107                   6        6075
##     County.Name                   Location.1 Neighborhoods..old. Zip.Codes
## 1 SAN FRANCISCO CA\n(37.785294, -122.395569)                   6     28856
##   Fire.Prevention.Districts Police.Districts Supervisor.Districts
## 1                         6                2                    9
##          lat_int       lon_int cantidad                   geometry  id
## 1 CA\n(37.785294  -122.395569)       57 POINT (-122.3956 37.78529) 386

Es el colegio Bright Horizons. Tiene 57 proyectos ZI dentro su radio de 1km, y es el más próximo a 16 proyectos ZI. ¿Donde se ubica?

ggmap(mapa_base) + 
  geom_sf(data = barrios, inherit.aes = FALSE, fill = NA) + 
  geom_sf(data = proyectos_zi_geo, size = 2, inherit.aes = FALSE) + 
  geom_sf(data = colegios_geo %>% filter(id == 386),  size = 10, inherit.aes = FALSE, color = "red", shape = 3) +
  scale_color_viridis_c( direction = -1) + 
  labs(title = "Colegio con Proyectos ZI más próximos" ) +
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

¿Tal vez la política pública debería estar orientada a ese colegio?

Distancias: st_distance()

Distancias a los colegios más proximos

El dataset que generamos anteriormente proyectos_zi_geo_colegio contiene para cada proyecto, el id del colegio más cercano.

str(proyectos_zi_geo_colegio)

## Classes 'sf' and 'data.frame':   406 obs. of  47 variables:
##  $ Project.ID                            : chr  "2010-007" "2002-006" "2016-078" "2005-011" ...
##  $ Project.Status                        : chr  "(6) Complete" "(6) Complete" "(4) Site Work Permit Issued" "(6) Complete" ...
##  $ Project.Name                          : chr  "301-303, 307-309 & 313-315 Moraga" "1828 Geneva" "2600 Harrison" "333 Fremont" ...
##  $ Street.Number                         : chr  "1701" "1828" "2600" "333" ...
##  $ Street.Name                           : chr  "09th" "Geneva" "Harrison" "Fremont" ...
##  $ Street.Type                           : chr  "Ave" "Ave" "St" "St" ...
##  $ Zip.Code                              : int  94122 94134 94110 94105 94103 94103 94103 94102 94103 94110 ...
##  $ Housing.Tenure                        : chr  "Unknown" "Rental" "Unknown" "Rental" ...
##  $ Section.415.Declaration               : chr  "Fee Payment" "On-site BMR Project" "Fee Payment" "Fee Payment" ...
##  $ Entitlement.Approval.Date             : chr  "01/01/2010" "01/10/2002" "08/11/2016" "06/16/2005" ...
##  $ Actual.Estimated.Completion.Date      : chr  "01/08/2013" "03/01/2011" "12/28/2020" "06/19/2014" ...
##  $ Planning.Case.Number                  : chr  "2009.0904" "2000.1191" "2014.0503" "2002.1263" ...
##  $ Property.Information.Map.Link         : chr  "http://propertymap.sfplanning.org/?search=2009.0904" "http://propertymap.sfplanning.org/?search=2000.1191" "http://propertymap.sfplanning.org/?search=2014.0503" "http://propertymap.sfplanning.org/?search=2002.1263" ...
##  $ Planning.Entitlements                 : chr  "2009.0904D" "03-100,03-101,2000.1191C,2000.1191E,2000.1191S" "2014.0503E,2014.0503ENX,2014.0503PPA,2014.0503U" "2002.1263C,2002.1263E,2002.1263K,2002.1263Q,2002.1263S,2002.1263U,2002.1263V" ...
##  $ Project.Units                         : int  6 48 19 83 14 160 15 330 11 36 ...
##  $ Affordable.Units                      : int  0 6 0 0 2 0 3 50 2 8 ...
##  $ Units.Subject.to.Section.415          : int  6 48 19 83 14 160 15 330 11 36 ...
##  $ On.Site.Affordable.Units              : num  0 6 0 0 2 0 3 50 2 8 ...
##  $ Off.Site.Affordable.Units             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Off.Site.Affordable.Units.at.This.Site: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ SRO.Units                             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Studio.Units                          : int  0 0 0 0 0 0 0 12 2 0 ...
##  $ X1bd.Units                            : int  0 1 0 0 1 0 0 23 0 8 ...
##  $ X2bd.Units                            : int  0 4 0 0 1 0 0 15 0 0 ...
##  $ X3bd.Units                            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ X4bd.Units                            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ X30..AMI                              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ X50..AMI                              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ X55..AMI                              : int  0 0 0 0 0 0 0 50 0 0 ...
##  $ X60..AMI                              : int  0 6 0 0 0 0 0 0 0 0 ...
##  $ X80..AMI                              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ X90..AMI                              : int  0 0 0 0 0 0 3 0 2 0 ...
##  $ X100..AMI                             : int  0 0 0 0 2 0 0 0 0 4 ...
##  $ X120..AMI                             : int  0 0 0 0 0 0 0 0 0 4 ...
##  $ X150..AMI                             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Supervisor.District                   : int  7 11 9 6 6 6 6 8 6 9 ...
##  $ Neighborhood                          : chr  "Inner Sunset" "Crocker Amazon" "Mission" "South of Market" ...
##  $ Planning.Neighborhood                 : chr  "Inner Sunset" "Excelsior" "Mission" "Financial District/South Beach" ...
##  $ Plan.Area                             : chr  "" "" "Mission (EN)" "Rincon Hill" ...
##  $ Off.Site.Principal.Project.ID         : chr  "0" "0" "0" "0" ...
##  $ Off.Site.Principal.Project            : chr  "0" "0" "0" "0" ...
##  $ Off.Site.Project.ID                   : chr  "0" "0" "0" "0" ...
##  $ Off.Site.Project                      : chr  "0" "0" "0" "0" ...
##  $ Location                              : chr  "(37.75638263, -122.4660624)" "(37.70990597, -122.4248645)" "(37.75537821, -122.4124064)" "(37.78768215, -122.3926945)" ...
##  $ cantidad                              : int  10 4 27 9 24 24 21 26 26 18 ...
##  $ geometry                              :sfc_POINT of length 406; first list element:  'XY' num  -122.5 37.8
##  $ id_colegio                            : int  328 280 52 253 315 418 251 129 224 291 ...
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:46] "Project.ID" "Project.Status" "Project.Name" "Street.Number" ...

La variable ‘id_colegio’ indica el colegio más cercano para ese proyecto.
Para calcular la distancia entre el proyecto de zonificación inclusiva y el colegio, vamos a utilizar la función st_distance() que toma como parámetro dos objetos de tipo sf. En nuestro caso, el primer objeto es la ubicación del proyecto y el segundo objeto es la ubicación del colegio. Vamos a entonces armar un sf con la ubicación de los colegios, considerando los colegios más próximos.

colegios_proximos <- proyectos_zi_geo_colegio %>%
  st_set_geometry(NULL) %>%
  select(id_colegio) %>%
  left_join(colegios_geo, by = c("id_colegio" = "id")) %>%
  st_as_sf(sf_column_name = "geometry")

Veamos nuestro dataset

head(arrange(colegios_proximos, id_colegio))

## Simple feature collection with 6 features and 25 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.5037 ymin: 37.75383 xmax: -122.4201 ymax: 37.78505
## geographic CRS: WGS 84
##   id_colegio                        Campus.Name CCSF.Entity Lower.Grade
## 1          1 Milk, Harvey Milk Childrens Center       SFUSD          -2
## 2          1 Milk, Harvey Milk Childrens Center       SFUSD          -2
## 3          1 Milk, Harvey Milk Childrens Center       SFUSD          -2
## 4          1 Milk, Harvey Milk Childrens Center       SFUSD          -2
## 5          5     Noriega Early Education School       SFUSD          -2
## 6          7       Montessori House Of Children     Private           0
##   Upper.Grade Grade.Range              Category Map.Label Lower.Age Upper.Age
## 1          -1          PK              USD PreK    CDC095         3         4
## 2          -1          PK              USD PreK    CDC095         3         4
## 3          -1          PK              USD PreK    CDC095         3         4
## 4          -1          PK              USD PreK    CDC095         3         4
## 5           5        PK-5         USD PreK/TK-5     PS085         3        10
## 6           1         K-1 Independent / Private    IND276         5         6
##   General.Type     CDS.Code                            Campus.Address
## 1          CDC           NA      841 ELLIS ST, SAN FRANCISCO CA 94117
## 2          CDC           NA      841 ELLIS ST, SAN FRANCISCO CA 94117
## 3          CDC           NA      841 ELLIS ST, SAN FRANCISCO CA 94117
## 4          CDC           NA      841 ELLIS ST, SAN FRANCISCO CA 94117
## 5           PS 3.868478e+13    1775 44TH AVE, San Francisco, CA 94122
## 6          IND 3.805013e+08 1187 FRANKLIN ST, SAN FRANCISCO, CA 94109
##   Supervisor.District County.FIPS   County.Name                   Location.1
## 1                   6        6075 SAN FRANCISCO CA\n(37.783802, -122.420105)
## 2                   6        6075 SAN FRANCISCO CA\n(37.783802, -122.420105)
## 3                   6        6075 SAN FRANCISCO CA\n(37.783802, -122.420105)
## 4                   6        6075 SAN FRANCISCO CA\n(37.783802, -122.420105)
## 5                   4        6075 SAN FRANCISCO CA\n(37.753834, -122.503654)
## 6                   5        6075 SAN FRANCISCO CA\n(37.785046, -122.423416)
##   Neighborhoods..old. Zip.Codes Fire.Prevention.Districts Police.Districts
## 1                  36     28858                         7                9
## 2                  36     28858                         7                9
## 3                  36     28858                         7                9
## 4                  36     28858                         7                9
## 5                  35        56                         1                8
## 6                  41     28858                        13                9
##   Supervisor.Districts        lat_int       lon_int cantidad
## 1                    9 CA\n(37.783802  -122.420105)       74
## 2                    9 CA\n(37.783802  -122.420105)       74
## 3                    9 CA\n(37.783802  -122.420105)       74
## 4                    9 CA\n(37.783802  -122.420105)       74
## 5                    3 CA\n(37.753834  -122.503654)        3
## 6                   11 CA\n(37.785046  -122.423416)       51
##                     geometry
## 1  POINT (-122.4201 37.7838)
## 2  POINT (-122.4201 37.7838)
## 3  POINT (-122.4201 37.7838)
## 4  POINT (-122.4201 37.7838)
## 5 POINT (-122.5037 37.75383)
## 6 POINT (-122.4234 37.78505)

El listado contiene colegios repetidos: representa el colegio más cercano al proyecto de zonificación inclusiva.
Agregamos la distancia al sf de proyectos de zonificación inclusiva. El parámetro by_element = TRUE permite calcular la distancia elemento con elemento. De lo contrario, se calcula la matriz de distancia de todos los elementos contra todos los elementos.
Nuevamente podríamos reproyectar las coordenadas, pero vamos a mantener el crs actual.

proyectos_zi_geo_colegio2 <- proyectos_zi_geo_colegio %>%
  select(Project.ID, Project.Status, Project.Name, Project.Units, id_colegio) %>%
  mutate(distancia= st_distance(proyectos_zi_geo_colegio, colegios_proximos, by_element = TRUE))

Veamos el resultado

head(proyectos_zi_geo_colegio2)

## Simple feature collection with 6 features and 6 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.4661 ymin: 37.70991 xmax: -122.3927 ymax: 37.78768
## geographic CRS: WGS 84
##   Project.ID                         Project.Status
## 1   2010-007                           (6) Complete
## 2   2002-006                           (6) Complete
## 3   2016-078            (4) Site Work Permit Issued
## 4   2005-011                           (6) Complete
## 5   2003-024                           (6) Complete
## 6   2016-023 (5) First Construction Document Issued
##                        Project.Name Project.Units id_colegio
## 1 301-303, 307-309 & 313-315 Moraga             6        328
## 2                       1828 Geneva            48        280
## 3                     2600 Harrison            19         52
## 4                       333 Fremont            83        253
## 5           1277 Howard /776 Tehama            14        315
## 6                       1699 Market           160        418
##                     geometry     distancia
## 1 POINT (-122.4661 37.75638) 228.77921 [m]
## 2 POINT (-122.4249 37.70991) 387.30658 [m]
## 3 POINT (-122.4124 37.75538) 152.42529 [m]
## 4 POINT (-122.3927 37.78768) 311.56146 [m]
## 5 POINT (-122.4125 37.77502) 216.17237 [m]
## 6 POINT (-122.4219 37.77224)  64.92708 [m]

Transformamos la variable ‘distancia’ a numérica.

proyectos_zi_geo_colegio2 <- proyectos_zi_geo_colegio2 %>%
  mutate(dist_m = as.numeric(distancia))

¿Cómo se distribuye la distancia entre proyectos de zonificación inclusiva y colegios?

proyectos_zi_geo_colegio2 %>%
  ggplot() + 
  geom_histogram(aes(x = dist_m)) + 
  labs(title = "Acceso a la educación en proyectos ZI", 
       x = "Distancia (m)", 
       y = "Cantidad de proyectos") + 
  scale_y_continuous(breaks = seq(0, 60, 10)) + 
  theme_minimal()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Veamoslo en un boxplot según estado del proyecto

proyectos_zi_geo_colegio2 %>%
  ggplot() + 
  geom_boxplot(aes(x = dist_m, y = Project.Status, fill = Project.Status)) + 
  labs(title = "Acceso a la educación en proyectos ZI según estado", 
       x = "Distancia (m)", 
       y = "Estado del proyecto") + 
  scale_x_continuous(breaks = seq(0, 1000, 100)) + 
  theme_minimal() + 
  theme(legend.position = "none")

Se observa que los proyectos ya completos se encuentran a una distancia mayor de establecimientos educativos. ¿Que políticas públicas urbanas pueden incentivar el desarrollo de proyectos zi en zonas mejor abastecidas?

Áreas: st_area()

Queremos evaluar si la cantidad de colegios por barrio tiene alguna relación con la superficie del barrio. Para eso, necesitamos calcular las superficies de los barrios.

head(barrios)

## Simple feature collection with 6 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.4543 ymin: 37.70822 xmax: -122.357 ymax: 37.80602
## geographic CRS: WGS 84
##                            nhood                       geometry
## 1          Bayview Hunters Point MULTIPOLYGON (((-122.3816 3...
## 2                 Bernal Heights MULTIPOLYGON (((-122.4036 3...
## 3            Castro/Upper Market MULTIPOLYGON (((-122.4266 3...
## 4                      Chinatown MULTIPOLYGON (((-122.4062 3...
## 5                      Excelsior MULTIPOLYGON (((-122.424 37...
## 6 Financial District/South Beach MULTIPOLYGON (((-122.3875 3...

Podemos usar la función mutate() para generar una nueva columna con el área de cada barrio.

barrios <- barrios %>%
  mutate(area = st_area(geometry))

Veamos los resultados

head(barrios)

## Simple feature collection with 6 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.4543 ymin: 37.70822 xmax: -122.357 ymax: 37.80602
## geographic CRS: WGS 84
##                            nhood                       geometry
## 1          Bayview Hunters Point MULTIPOLYGON (((-122.3816 3...
## 2                 Bernal Heights MULTIPOLYGON (((-122.4036 3...
## 3            Castro/Upper Market MULTIPOLYGON (((-122.4266 3...
## 4                      Chinatown MULTIPOLYGON (((-122.4062 3...
## 5                      Excelsior MULTIPOLYGON (((-122.424 37...
## 6 Financial District/South Beach MULTIPOLYGON (((-122.3875 3...
##               area
## 1 13400197.5 [m^2]
## 2  2792220.2 [m^2]
## 3  2220409.2 [m^2]
## 4   581896.7 [m^2]
## 5  3606239.5 [m^2]
## 6  2910823.7 [m^2]

¿Cuál es la clase del resultado?

class(barrios$area)

## [1] "units"

Vamos a transformarlo en una variable numérica, y que mida el área en hectáreas (¡m2 es muy pequeño para un barrio!)

barrios <- barrios %>%
  mutate(area_km2 = round(as.numeric(area)/ 1000000, digits = 2))

head(barrios)

## Simple feature collection with 6 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.4543 ymin: 37.70822 xmax: -122.357 ymax: 37.80602
## geographic CRS: WGS 84
##                            nhood                       geometry
## 1          Bayview Hunters Point MULTIPOLYGON (((-122.3816 3...
## 2                 Bernal Heights MULTIPOLYGON (((-122.4036 3...
## 3            Castro/Upper Market MULTIPOLYGON (((-122.4266 3...
## 4                      Chinatown MULTIPOLYGON (((-122.4062 3...
## 5                      Excelsior MULTIPOLYGON (((-122.424 37...
## 6 Financial District/South Beach MULTIPOLYGON (((-122.3875 3...
##               area area_km2
## 1 13400197.5 [m^2]    13.40
## 2  2792220.2 [m^2]     2.79
## 3  2220409.2 [m^2]     2.22
## 4   581896.7 [m^2]     0.58
## 5  3606239.5 [m^2]     3.61
## 6  2910823.7 [m^2]     2.91

Uniones espaciales: st_join()

Vamos a incluir el barrio de cada colegio a partir de una unión espacial.

colegios_con_barrios <- colegios_geo %>%
  select(Campus.Name) %>%
  st_intersection(barrios %>% select(nhood))

## although coordinates are longitude/latitude, st_intersection assumes that they are planar

## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

Veamos el resultado

head(colegios_con_barrios)

## Simple feature collection with 6 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.4053 ymin: 37.71953 xmax: -122.381 ymax: 37.73866
## geographic CRS: WGS 84
##                                                       Campus.Name
## 12                        Marshall, Thurgood Marshall High School
## 36 Economic Opportunity Council Sf - Martin Luther King Childcare
## 37                                    Malcolm X Elementary School
## 54                                           Kipp Bayview Academy
## 60     Faces Sf Child Development Program - Bayview-Hunters Point
## 94                                     Big City Montessori School
##                    nhood                   geometry
## 12 Bayview Hunters Point POINT (-122.4016 37.73631)
## 36 Bayview Hunters Point  POINT (-122.386 37.73704)
## 37 Bayview Hunters Point  POINT (-122.381 37.73424)
## 54 Bayview Hunters Point POINT (-122.3966 37.71953)
## 60 Bayview Hunters Point POINT (-122.3842 37.73404)
## 94 Bayview Hunters Point POINT (-122.4053 37.73866)

Ahora vamos a realizar un conteo de colegios para cada barrio.

colegios_x_barrio <- colegios_con_barrios %>%
  st_set_geometry(NULL) %>%
  group_by(nhood) %>%
  summarise(cantidad_colegios =n())

Vamos a realizar una unión simple con los barrios.

barrios <- left_join(barrios, colegios_x_barrio, by = "nhood")

Ahora que tenemos el área del barrio y la cantidad de colegios, vamos a calcular el ratio de colegios por km2.

barrios <- barrios %>%
  mutate(ratio = round(cantidad_colegios / area_km2, digits = 2))

head(barrios)

## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.4543 ymin: 37.70822 xmax: -122.357 ymax: 37.80602
## geographic CRS: WGS 84
##                            nhood             area area_km2 cantidad_colegios
## 1          Bayview Hunters Point 13400197.5 [m^2]    13.40                21
## 2                 Bernal Heights  2792220.2 [m^2]     2.79                 9
## 3            Castro/Upper Market  2220409.2 [m^2]     2.22                12
## 4                      Chinatown   581896.7 [m^2]     0.58                16
## 5                      Excelsior  3606239.5 [m^2]     3.61                10
## 6 Financial District/South Beach  2910823.7 [m^2]     2.91                 9
##                         geometry ratio
## 1 MULTIPOLYGON (((-122.3816 3...  1.57
## 2 MULTIPOLYGON (((-122.4036 3...  3.23
## 3 MULTIPOLYGON (((-122.4266 3...  5.41
## 4 MULTIPOLYGON (((-122.4062 3... 27.59
## 5 MULTIPOLYGON (((-122.424 37...  2.77
## 6 MULTIPOLYGON (((-122.3875 3...  3.09

Vamos a realizar un mapa coroplético

ggmap(mapa_base) + 
  geom_sf(data = barrios, aes(fill = ratio), inherit.aes = FALSE, alpha = 0.5) + 
  scale_fill_viridis_c(option = "magma", direction = -1) +
  labs(title = "Cobertura de colegios por barrio", 
       fill = "Colegios/km2")+ 
  theme_void() + 
  theme(legend.position = "bottom")

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

En el centro de la ciudad se observa una mayor cobertura de colegios.

Union de polígonos: st_union()

Vamos a utilizar la capa de los barrios para generar una unión de todos los barrios (obteniendo el perímetro de San Francisco). Dado que los barrios no se encuentran perfectamente dibujados, vamos a hacer un pequeño buffer para garantizar la unión.

san_francisco <- barrios %>%
  st_buffer(0.001) %>%
  mutate(ciudad = "SF") %>%
  st_make_valid() %>% 
  group_by(ciudad) %>%
  summarise(geometry = st_union(geometry)) %>%
  ungroup()

## Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
## endCapStyle, : st_buffer does not correctly buffer longitude/latitude data

## dist is assumed to be in decimal degrees (arc_degrees).

ggplot() + 
  geom_sf(data = san_francisco)

Vemos que quedaron algunas geometrías no resueltas. Vamos a transformar el objeto (multipolígono) en un solo objeto, vamos a calcular el área, y vamos a quedarnos con el polígono de mayor área

Obteniendo coordenadas: st_coordinates()

Por último, nos interesaría generar un mapa de densidad de colegios utilizando el mapa base. Para eso, vamos a obtener las coordenadas de los colegios.
La función cbind() nos permite unir columnas.

colegios_coord <- colegios_geo %>%
  select(Campus.Name) %>%
  cbind(st_coordinates(colegios_geo))

Con eso vamos a generar el mapa de densidad

ggmap(mapa_base)+
stat_density2d(data = colegios_geo%>% cbind(., st_coordinates(colegios_geo)), 
               aes(x = X, y = Y, fill = ..density..), 
                geom = 'tile', contour = san_francisco, alpha = 0.5)+ 
    geom_sf(data = barrios,
            fill = NA, alpha = 0.5, inherit.aes = FALSE) +
  labs(title = "Concentración de colegios",
       fill = "Densidad") + 
  scale_fill_viridis_c(option = "A", direction = -1) + 
  theme_void()

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Conclusión

Estos geoprocesos permiten trabajar y manipular datos espaciales. Existen muchas más funciones dentro de la librería sf. ¡Los invito a seguir explorandolas!