Vector data represent the world using points (e.g., buildings, houses), lines (e.g., rivers, roads), and polygons (e.g., states, counties, ZCTAs, tracts).
Raster data divide the Earth’s surface into cells of constant size. They have been a vital source of geographic data since the origins of aerial photography and satellite-based remote sensing devices, and they are more computationally expensive to work with.
Social scientists rely more often on vector data because they capture social dynamics (buildings, settlements) more accurately. Raster data are more useful for modeling nature (mountains, elevation).
This course relies on vector data, unless only raster data are available to address our research questions.
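To make the contrast concrete, here is a minimal base R sketch that stores the same three made-up point features first as vector data (exact coordinates) and then as a small raster (counts per cell). The coordinates, the grid extent, and the cell size are illustrative assumptions, not course data.

```r
# Vector representation: three hypothetical point features with exact coordinates
pts <- data.frame(
  id  = c("a", "b", "c"),
  lon = c(0.5, 2.2, 3.7),
  lat = c(0.4, 1.6, 2.9)
)

# Raster representation: a 3 x 4 grid of constant-size cells (1 x 1 units)
# covering x in [0, 4) and y in [0, 3); row 1 holds the smallest y values
cell_size <- 1
r <- matrix(0L, nrow = 3, ncol = 4)

# Assign each point to the cell it falls in and count points per cell
col_idx <- floor(pts$lon / cell_size) + 1
row_idx <- floor(pts$lat / cell_size) + 1
for (i in seq_len(nrow(pts))) {
  r[row_idx[i], col_idx[i]] <- r[row_idx[i], col_idx[i]] + 1L
}

n_vector_values <- nrow(pts) * 2  # coordinates stored for the vector version
n_raster_values <- length(r)      # cells stored for the raster version, most of them empty
print(r)
```

The raster keeps a value for every cell whether or not anything is there, which is one reason raster data tend to be heavier to process.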
Georeferenced objects add a distinct perspective on the world: a unique lens through which to examine events, patterns, and processes.
They concern what happens where, and make use of geographic information that links features and phenomena on the Earth’s surface to their locations.
Place: neighbourhood, the city, the state, or the country.
Attributes: any recorded characteristic or property of a place (e.g., name, number of crimes, GDP).
Objects: the operationalization of places. They are part of a coordinate reference system (CRS) and represent geolocated or georeferenced points that can be linked together to form more complex geometries such as lines and polygons.
Points: pairs of coordinates, in latitude/longitude or some other standard system
Lines: ordered sequences of points connected by straight lines
Areas or polygons: ordered rings of points, also connected by straight lines to form polygons.
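The three object types above can be sketched in base R as plain coordinate matrices, with an attribute table attached to the place they represent. This is only an illustration under simple assumptions: the coordinates are made up and planar, and the helper names (`is_closed`, `line_length`, `ring_area`) are hypothetical, not functions from any package used in the course.

```r
# A point: one pair of coordinates (x = longitude, y = latitude)
pt <- c(lon = 1.5, lat = 2.0)

# A line: an ordered sequence of points, one row per vertex
ln <- rbind(c(0, 0), c(1, 1), c(2, 1))

# A polygon: an ordered ring of points whose last vertex repeats the first
ring <- rbind(c(0, 0), c(4, 0), c(4, 3), c(0, 3), c(0, 0))

# Hypothetical helper: a ring is closed when first and last vertices coincide
is_closed <- function(m) all(m[1, ] == m[nrow(m), ])

# Hypothetical helper: length of a line as the sum of its straight segments
line_length <- function(m) sum(sqrt(rowSums(diff(m)^2)))

# Hypothetical helper: polygon area via the shoelace formula (planar units)
ring_area <- function(m) {
  n <- nrow(m)
  x <- m[, 1]
  y <- m[, 2]
  abs(sum(x[-n] * y[-1] - x[-1] * y[-n])) / 2
}

# Attributes: recorded characteristics of the place the polygon represents
places <- data.frame(name = "toy district", n_crimes = 12, area = ring_area(ring))
print(places)
```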
Object representation in vector data; see Medina and Solymosi (2019).
Identify any location on the Earth’s surface using two values — longitude and latitude
Latitude (y-axis) and longitude (x-axis). Source: Illinois State University.
Example of intersections. Source: Encyclopaedia Britannica.
Map projections try to portray the surface of the earth or a portion of the earth on a flat piece of paper or computer screen.
The decision as to which map projection and coordinate reference system to use depends on the regional extent of the area.
Maps are representations of reality.
* For data modeling purposes, as conducted in this class, most projections will already come in planar form.
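As a rough illustration of what a projection does, the base R sketch below converts longitude/latitude in degrees to planar x/y coordinates with the spherical Mercator formula. Mercator is chosen here only because its formula is short; it is an assumption for the example, not the projection used by the course data, and the function name `mercator` and the sphere radius are likewise illustrative.

```r
# Spherical Mercator: project longitude/latitude (degrees) to planar x/y (metres)
# R is the assumed radius of the sphere, in metres
mercator <- function(lon, lat, R = 6378137) {
  lambda <- lon * pi / 180  # longitude in radians
  phi    <- lat * pi / 180  # latitude in radians
  data.frame(x = R * lambda, y = R * log(tan(pi / 4 + phi / 2)))
}

# Five test locations: the origin, one degree east, and three points due north
xy <- mercator(lon = c(0, 1, 0, 0, 0), lat = c(0, 0, 1, 60, 61))

# The same one-degree step in latitude covers more planar distance near the poles
step_equator <- xy$y[3] - xy$y[1]  # from 0 to 1 degree north
step_60      <- xy$y[5] - xy$y[4]  # from 60 to 61 degrees north
print(c(step_equator = step_equator, step_60 = step_60))
```

The growing step size is the distortion a flat map introduces, which is why the choice of projection depends on the extent of the area being mapped.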
For this section we will rely on the following packages
install.packages("tigris")
install.packages("acs")
install.packages("spdep")
install.packages("tmaptools")
library(spdep)
library(tigris)
options(tigris_use_cache = TRUE)
library(acs)
library(stringr)
library(tmaptools)
## Loading required package: sp
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
## Loading required package: stringr
## Loading required package: XML
##
## Attaching package: 'acs'
## The following object is masked from 'package:base':
##
## apply
Recall that we have the following georeferenced objects:
Simple feature types and their respective ‘multi’ versions; see Lovelace, Nowosad, and Muenchow (2019).
However, lines are seldom used in social science research, so we will focus on polygons and points. The following commands build upon Walker (2020).
## Warning in proj4string(obj): CRS object has comment, which is lost in output
Limit to the contiguous USA
# `%ni%` ("not in") is not part of base R, so define it before use
`%ni%` <- Negate(`%in%`)
states<-states[states$NAME %ni% c('Alaska','American Samoa','Commonwealth of the Northern Mariana Islands','Guam','Hawaii','United States Virgin Islands','Puerto Rico'),]
plot(states)
## Warning in proj4string(obj): CRS object has comment, which is lost in output
co@data<-cbind(co@data,coordinates(co))
colnames(co@data)[(ncol(co@data)-1):ncol(co@data)]<-c("lon","lat")
plot(co)
## Warning in proj4string(obj): CRS object has comment, which is lost in output
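The county step above attaches one representative lon/lat point to each polygon. As an illustration of what such a point can be, the base R sketch below computes the area-weighted centroid of a single closed ring and binds it to an attribute table in the same way. The helper `ring_centroid` and the toy rectangle are assumptions for the example, and the result is not guaranteed to match the exact point that `coordinates()` returns for a real county.

```r
# Hypothetical helper: area-weighted centroid of one closed ring (planar coordinates)
ring_centroid <- function(m) {
  n  <- nrow(m)
  x0 <- m[-n, 1]; y0 <- m[-n, 2]  # start vertex of each edge
  x1 <- m[-1, 1]; y1 <- m[-1, 2]  # end vertex of each edge
  cross <- x0 * y1 - x1 * y0
  a <- sum(cross) / 2             # signed area of the ring
  c(lon = sum((x0 + x1) * cross) / (6 * a),
    lat = sum((y0 + y1) * cross) / (6 * a))
}

# Toy polygon: a 4 x 2 rectangle with its lower-left corner at the origin
rect <- rbind(c(0, 0), c(4, 0), c(4, 2), c(0, 2), c(0, 0))
ctr  <- ring_centroid(rect)

# Attach the centroid to the attribute table as lon/lat columns
toy <- data.frame(NAME = "toy county")
toy <- cbind(toy, t(ctr))
print(toy)
```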
Trim to the contiguous USA
zip$lat<-as.numeric(as.character(zip$INTPTLAT10))
zip$lon<-as.numeric(as.character(zip$INTPTLON10))
length(zip$lon)
## [1] 33144
zip<-zip[zip$lon > -124.848 &
zip$lon < -66.886 &
zip$lat > 24.3964 &
zip$lat < 49.3844, ]
plot(zip)
Reading ZIP codes for a particular state or group of states
# For NY State you can use the starts_with argument.
# Note: the "06" prefix is mostly Connecticut; it is included here for 06390 (Fishers Island, NY).
zip_NY<-zctas(starts_with = c("06","10","11","12","13","14"), class="sp")
## Warning in proj4string(obj): CRS object has comment, which is lost in output
## Warning in proj4string(obj): CRS object has comment, which is lost in output
## Warning in proj4string(obj): CRS object has comment, which is lost in output
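The two ZCTA filters used above, the bounding-box trim and the prefix match, can be tried on a toy table without downloading anything. In this base R sketch the rows and their coordinates are approximate, illustrative values, and the prefix match with `substr()` is only a stand-in for what `starts_with` does, not the tigris implementation.

```r
# Toy ZCTA table with approximate, illustrative internal-point coordinates
zip_toy <- data.frame(
  ZCTA5CE10 = c("12222", "10001", "96701", "00601", "06390"),
  lon = c(-73.8, -74.0, -157.9, -66.7, -72.0),
  lat = c(42.7, 40.8, 21.4, 18.2, 41.3),
  stringsAsFactors = FALSE
)

# Bounding-box trim: keep rows whose point lies inside the contiguous-USA box
in_bbox <- zip_toy$lon > -124.848 & zip_toy$lon < -66.886 &
  zip_toy$lat > 24.3964 & zip_toy$lat < 49.3844
contiguous <- zip_toy[in_bbox, ]

# Prefix match: keep rows whose first two digits are in the requested set
prefixes <- c("06", "10", "11", "12", "13", "14")
by_prefix <- zip_toy[substr(zip_toy$ZCTA5CE10, 1, 2) %in% prefixes, ]

print(contiguous$ZCTA5CE10)
print(by_prefix$ZCTA5CE10)
```

Keeping the ZCTA codes as character strings matters here, because converting them to numbers would drop the leading zeros that the prefix match relies on.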
Back in the day, geocoding was sort of straightforward…
Nowadays it is cumbersome and mostly disappointing.
Since inline geocoding in R isn’t good, I usually rely on free resources that are quite powerful, such as GPS Visualizer.
However, let’s try out the newest (not so great) tool.
library(tmaptools)
# Download the IPEDS HD2018 file into the working directory
a<-getwd()
url18 <- 'http://nces.ed.gov/ipeds/datacenter/data/HD2018.zip'
download.file(url18, destfile = paste(a,"HD2018.zip",sep="/"))
d18<- read.csv(unz("HD2018.zip", "hd2018.csv"))
# Keep a reproducible random sample of 20 institutions
set.seed(47)
d18<-d18[sample(nrow(d18), 20), c("UNITID", "INSTNM", "ADDR", "CITY", "STABBR", "ZIP", "LONGITUD", "LATITUDE")]
# Build one address string per row; the rows are not re-sampled here, so each address stays aligned with its institution
d18$address <- apply( d18[, c("ADDR", "CITY", "STABBR", "ZIP") ] , 1 , paste , collapse = ", ")
head(d18)
## [1] "404 South Upham St., Lakewood, CO, 80226"
## [2] "1400 Washington Avenue, Albany, NY, 12222"
## [3] "476 Hubbard Drive, Lancaster, SC, 29720-0889"
## [4] "1910 University Dr, Boise, ID, 83725"
## [5] "2000 Westmoreland Street, Suite A, Richmond, VA, 23230"
## [6] "100 College Boulevard, Niceville, FL, 32578-1295"
## [7] "6191 Kraft Avenue S.E., Grand Rapids, MI, 49512-9396"
## [8] "2383 Cherry Road, Rock Hill, SC, 29730"
## [9] "639 38th St, Rock Island, IL, 61201-2296"
## [10] "1156 Barranca, El Paso, TX, 79935-5538"
## [11] "1624 Woodworth NE, Grand Rapids, MI, 49525-2473"
## [12] "15400 Sherman Way, Suite 101, Van Nuys, CA, 91406"
## [13] "2211 W Germann Road, Chandler, AZ, 85286"
## [14] "1111 Hwy 75, Macy, NE, 68039-0428"
## [15] "2345 Southwest 3rd Street, Suite101, Grand Prairie, TX, 75051-4892"
## [16] "14555 Potomac Mills Rd, Woodbridge, VA, 22192-6808"
## [17] "One University Plaza, Youngstown, OH, 44555-0001"
## [18] "5171 Eisenhower Parkway, Macon, GA, 31206-5309"
## [19] "1000 S. Fremont Ave. Mailbox #45, Bldg A10, 4th Floor, Suite 10402, Alhambra, CA, 91803"
## [20] "501 West College Drive, Brainerd, MN, 56401-3900"
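The address strings listed above are built by pasting four columns together row by row. A small base R sketch with made-up institutions shows the same step and checks that each address stays on the row of its own institution; the table `inst` and its values are hypothetical.

```r
# Hypothetical institutions using the same column names as the IPEDS file
inst <- data.frame(
  INSTNM = c("College A", "College B", "College C"),
  ADDR   = c("1 Main St", "2 Oak Ave", "3 Elm Rd"),
  CITY   = c("Albany", "Boise", "Macon"),
  STABBR = c("NY", "ID", "GA"),
  ZIP    = c("12222", "83725", "31206"),
  stringsAsFactors = FALSE
)

# Vectorised version: paste() works column-wise and keeps the row order
inst$address <- paste(inst$ADDR, inst$CITY, inst$STABBR, inst$ZIP, sep = ", ")

# Row-wise version with apply(), as long as the rows are taken in their existing order
addr_apply <- apply(inst[, c("ADDR", "CITY", "STABBR", "ZIP")], 1, paste, collapse = ", ")

print(inst[, c("INSTNM", "address")])
```

Both versions give the same strings; the vectorised `paste()` is simply shorter and avoids converting the data frame to a matrix first.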
## The following with address did NOT EVEN WORK
# tests <- geocode_OSM(paste(d18$address, "bar", sep = " "), details = TRUE, as.data.frame = TRUE)
# tests
# disappointing :(
tests <- geocode_OSM(paste(d18$INSTNM, "bar", sep = " "), details = TRUE, as.data.frame = TRUE)
## No results found for "Ross Medical Education Center-Grand Rapids North bar".
## No results found for "Kenneth Shuler School of Cosmetology-Rock Hill bar".
## No results found for "International Baptist College and Seminary bar".
## No results found for "Fortis College-Richmond bar".
## No results found for "University of South Carolina-Lancaster bar".
## No results found for "Colorado Media School bar".
## No results found for "California Institute of Advanced Management bar".
## No results found for "Brightwood College-Los Angeles-Van Nuys bar".
## No results found for "Nebraska Indian Community College bar".
## No results found for "Mid Cities Barber College bar".
## No results found for "Altierus Career College-Woodbridge bar".
## No results found for "International Business College-El Paso bar".
## [1] 0.4
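The 0.4 printed above is consistent with the messages before it: 12 of the 20 queries returned no result, leaving 8 matches, and 8 / 20 = 0.4. A minimal base R sketch of that calculation follows, using a hypothetical data frame in place of real geocoder output and assuming that only matched queries come back as rows.

```r
# Hypothetical queries sent to a geocoder
queries <- c("College A", "College B", "College C", "College D", "College E")

# Hypothetical stand-in for geocoder output: one row per matched query only
matched <- data.frame(
  query = c("College B", "College E"),
  lat   = c(43.6, 32.8),
  lon   = c(-116.2, -83.6),
  stringsAsFactors = FALSE
)

# Share of queries that returned coordinates
success_rate <- nrow(matched) / length(queries)

# Queries that still need to be geocoded some other way
unmatched <- setdiff(queries, matched$query)

print(success_rate)
print(unmatched)
```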
# our success rate is 40%, NO BUENO
# extracting from the result only coordinates and address
tests <- tests[, c("lat", "lon", "display_name")]
# Print the results
tests
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. CRC Press. https://geocompr.robinlovelace.net/.
Medina, Juanjo, and Reka Solymosi. 2019. Crime Mapping in R. Open Access rmarkdown/bookdown. https://maczokni.github.io/crimemapping_textbook_bookdown/.
Walker, Kyle. 2020. Tigris: Load Census Tiger/Line Shapefiles. https://CRAN.R-project.org/package=tigris.