As was mentioned earlier, Socrata is a platform that is used by many local governments to publish open data. The Socrata Open Data API (SODA API) allows us to access these data resources from an R script or notebook in a standardized way. In the video you watched, the API was likened to a waiter or liaison who carries requests and responses back and forth between a client computer (i.e., you) and server computer (i.e., the open data portal). The documentation for the SODA API can be found here and begins with API endpoints, which are essentially Uniform Resource Locators (URLs) that provide access to data. Without getting into too much detail at the moment, we use the Hypertext Transfer Protocol (HTTP) to send requests and receive responses. Most often, we are using the GET method to send queries and retrieve data.
In some cases, an R package exists that allows us to interact with an API without having to construct queries that conform to the API’s requirements and pass them using a package like httr (constructing queries by hand is what the ungraded exercise we started in class last time asked you to do from your web browser). The RSocrata package that we installed and loaded above does this “dirty work” of formatting the underlying HTTP queries, allowing us to interact with open data portals that use the Socrata platform.
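To make the waiter analogy concrete, here is a rough sketch of the kind of raw request that RSocrata formats on our behalf. The endpoint is the SFPD incident dataset used later in this notebook, $limit is a standard SODA query parameter, and the resp and incidents names are just illustrative.

library(httr)
library(jsonlite)
# Hand-build a SODA request: the endpoint URL plus a $limit parameter
resp <- GET("https://data.sfgov.org/resource/wg3w-h783.json",
            query = list("$limit" = 10))
# The portal answers with JSON, which we parse into a data frame
incidents <- fromJSON(content(resp, "text"))
head(incidents)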
Let’s take a closer look…
help(package="RSocrata")
The “Inspect RSocrata Documentation” code chunk simply opens a page listing the seven functions included in this package. Take a look at this page (i.e., in the Help tab in the lower-right corner), then proceed to Exercise 1 below.
Review the documentation page that appears after running the code chunk above.
sf_data <- ls.socrata("https://data.sfgov.org/browse")
head(sf_data)
typeof(sf_data)
## [1] "list"
dim(sf_data)
## [1] 650 14
There are 650 rows and 14 columns.
colnames(sf_data)
## [1] "accessLevel" "landingPage" "issued" "@type" "modified"
## [6] "keyword" "contactPoint" "publisher" "identifier" "description"
## [11] "title" "distribution" "license" "theme"
There are more detailed instructions on what to submit and how at the end of this notebook, but essentially you should add code chunks and text chunks (i.e., Markdown sections) to your locally saved copy of this R Notebook that perform the tasks and answer the questions posed above.
Next, return to the documentation page for the RSocrata package. The ls.socrata function lists every dataset available on a portal, and the read.socrata function can then be used to retrieve a particular dataset from the San Francisco open data portal. In addition to indexing objects like lists or data frames by position using bracket notation, we can also refer to variables by name, if they exist, using the $ operator. Let’s sort the inventory and peruse the types of information available. The table function provides a count of instances within each category.
sf_contents <- ls.socrata("https://data.sfgov.org/limitTo=datasets")
names(sf_contents)
head(sf_contents)
sort(sf_contents$title)
table(sf_contents$theme)
Execute the code chunk above and make sure you understand the code, then proceed with the tasks below:
# Locate the dataset in the inventory by its title
police_calls_2 <- sf_contents[sf_contents$title == "Police Department Incident Reports: 2018 to Present", ]
# Pull the records from the dataset's JSON endpoint (fromJSON requires
# jsonlite; filter() and ggplot() below come from dplyr and ggplot2)
library(jsonlite)
police_calls_2 <- fromJSON("https://data.sfgov.org/resource/wg3w-h783.json")
police_calls_TEND <- filter(police_calls_2, analysis_neighborhood == "Tenderloin")
ggplot(data = police_calls_TEND) +
  geom_bar(mapping = aes(x = incident_year, fill = incident_day_of_week), position = "dodge")
ggsave("Tenderloin-Police-Calls-byYear-byDay.png", units = "in", width = 16, height = 8)
Run table(police_calls_2$analysis_neighborhood) to see how the calls are distributed across neighborhoods, pick another neighborhood, and compare it with the Tenderloin in terms of police calls.
ggplot(data = police_calls_TEND) +
geom_bar(mapping = aes(x = incident_year), position = "dodge") +
ggtitle("Police Calls to the Tenderloin")
ggsave("TEND-Police-Calls-byYear.png", units = "in", width = 16, height = 8)
police_calls_SOMA <- filter(police_calls_2, analysis_neighborhood == "South of Market")
ggplot(data = police_calls_SOMA) +
geom_bar(mapping = aes(x = incident_year), position = "dodge") +
ggtitle("Police Calls to South of Market")
ggsave("SOMA-Police-Calls-byYear.png", units = "in", width = 16, height = 8)
The number of police calls to the Tenderloin over these three years has been much more volatile than the number of calls to the South of Market (SoMa) neighborhood, which recorded the same total number of calls in each of the three years captured. The Tenderloin had the highest number of calls (30) in 2018, dipped to fewer than 15 in 2019, and then rebounded in 2020 to a total (27) that nearly matched the 2018 figure.
Take some time to reflect on this portion of the exercise, then proceed.
This portion of the exercise was relatively straightforward, but it made me realize that I have a lot to learn to use ggplot more effectively. I struggled, and ultimately failed, to figure out how to sort the days of the week into the correct order; I hope we will learn how to do so when working with dates. Until then, my bar chart displays the days of the week in an arbitrary order beginning with Friday! Spending more time learning how to add value labels and chart labels, adjust values, draw trend lines, and so on will also be very valuable.
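For the record, one common fix, sketched here under the assumption that incident_day_of_week stores full day names, is to recode the column as a factor with explicitly ordered levels before plotting:

# Define the desired ordering, then recode the column as a factor
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
police_calls_TEND$incident_day_of_week <- factor(
  police_calls_TEND$incident_day_of_week, levels = day_levels)
# ggplot will now draw the bars and legend in Monday-to-Sunday order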
Rather than downloading data from the official U.S. Census Bureau website, we can also access data programmatically from R. The first step is to take a quick look at the API documentation, then request an API key by visiting this site. You should receive a response within a few minutes, but be sure to safeguard your API key because it is tied to each individual user.
In order to get a sense for how API queries work, open a web browser and type the following into the search bar, inserting your own personal Census API key where indicated:
https://api.census.gov/data/2017/acs/acs5?key=7f5b3f883e24d9d3a7f207f76bfb731d19124299&get=B01003_001E&for=zip%20code%20tabulation%20area:94114,94110
The resulting browser window should contain the results of the query above in JavaScript Object Notation (JSON) format. This is a compact way to store and transmit information over the internet that we will return to later in the semester. For now, we just need to know that the response reports the total population (variable B01003_001E) for each of the two ZCTAs in our request.
So, we can send queries to the Census API for the 2013-2017 ACS 5-Year Estimates directly from a web browser, but it is more efficient to do this from R. But first of all, what are the different components of the query we just executed?
- The https://api.census.gov/data/2017/acs/acs5? component is the base URL. It tells the Census servers that we are interested in 5-Year ACS data that ends in 2017 (i.e., ACS 2013-2017).
- The key=[YOUR_API_KEY] component is the unique identifier for the client (i.e., your browser/script) that is making the request.
- The get=B01003_001E component is the variable name (Total population). For a full list, see here.
- The for=zip%20code%20tabulation%20area:94114,94110 component specifies the census geography: the ZCTAs for the Castro/Noe Valley (94114) and Inner Mission/Bernal Heights (94110) areas of San Francisco. Note that these areas are generally southwest of the Tenderloin. The %20 is a hexadecimal representation of the space character; there are quite a few special characters that need to be properly encoded if they are being passed to the server.
- The components are chained together with the & character.

This structure is unique to Census APIs and endpoints. If you want to interact with other APIs, you will first have to refer to their documentation and understand how to properly format a query. It is also possible that APIs change over time, causing your code to stop working. Luckily, it is usually not onerous to make the necessary tweaks once you understand how the API has changed.
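To see the encoding at work, here is a quick sketch that assembles the same query string in R. URLencode comes from the built-in utils package, and [YOUR_API_KEY] remains a placeholder:

# Build the 2017 ACS 5-year query piece by piece
base.url <- "https://api.census.gov/data/2017/acs/acs5?"
geo <- URLencode("zip code tabulation area")  # spaces become %20
query <- paste0(base.url, "key=", "[YOUR_API_KEY]",
                "&get=B01003_001E",
                "&for=", geo, ":94114,94110")
query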
Still using a web browser, practice what we have learned here by constructing the proper API query in the search bar to:
RESPONSE: '[["B01003_001E","state","zip code tabulation area"], ["73737","06","94110"], ["34561","06","94114"]]'
2017 Population Estimates
94114 - 34,561 ; 94110 - 73,737
RESPONSE: '[["B01003_001E","state","zip code tabulation area"], ["34754","06","94114"], ["74161","06","94110"]]'
2018 Population Estimates: 94114 - 34,754 ; 94110 - 74,161
"DP03_0005PE" which is located in a different API with a slightly different base URL hereRESPONSE: [[“DP03_0005PE”,“state”,“zip code tabulation area”], [“3.2”,“06”,“94114”], [“3.1”,“06”,“94110”]]
2018 Percent Unemployed 94114 - 3.2% ; 94110 - 3.1%
To query the City of San Francisco as a whole place rather than by ZCTA, the geography components become &for=place:67000&in=state:06.
RESPONSE: '[["DP03_0005PE","state","place"], ["3.3","06","67000"]]'
2018 Percent Unemployed: San Francisco - 3.3%
Take some time to experiment with one or more of the other variables, then proceed with the exercise.
Using the examples provided by the Census, we can retrieve a variety of variables for San Francisco, Oakland, and Berkeley. The requests below are written out one at a time; a compact loop version is sketched at the end of this section.
install.packages("httr")
## Installing package into 'C:/Users/Student/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'httr' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Student\AppData\Local\Temp\RtmpA5vUGU\downloaded_packages
library(httr)
## Warning: package 'httr' was built under R version 4.0.5
library(jsonlite)
baseurl <- "https://api.census.gov/data/2018/acs/acs5?"
param1 <- "get="
param2 <- "&for=place:"
param3 <- "&in=state:06"
key <- "&key=f9f31e09fed0f44d23a8a354469e461df98f34cb"
# These are the place FIPS codes for San Francisco, Oakland, and Berkeley
places.list <- paste("67000", "53000", "06000", sep = ",")
# Total population and median household income
vars.list <- c("B01003_001E", "B19013_001E")
vars.names <- c("Total population", "Median household income")
# Request the first variable for all three places, parse the JSON
# response, and promote its first row to column names
req <- httr::GET(paste0(baseurl, param1, vars.list[1],
    param2, places.list, param3, key))
req.json <- fromJSON(content(req, "text"), flatten=TRUE)
req.df <- as.data.frame(req.json)
col.names <- req.json[1,]
df <- req.df[2:4, ]   # rows 2-4 hold the data; row 1 is the header
colnames(df) <- col.names
df
# Repeat the same steps for the second variable
req.2 <- httr::GET(paste0(baseurl, param1, vars.list[2],
    param2, places.list, param3, key))
req.json.2 <- fromJSON(content(req.2, "text"), flatten=TRUE)
req.df.2 <- as.data.frame(req.json.2)
col.names.2 <- req.json.2[1,]
df.2 <- req.df.2[2:4, ]
colnames(df.2) <- col.names.2
df.2
# Drop the duplicated state column, then join the two results by place
df.2 <- subset(df.2, select = -state)
merged.df <- merge(df, df.2, by = "place")
merged.df
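As promised above, the repeated request/parse/clean steps can be collapsed into a loop over vars.list. This is a sketch built from the objects already defined; dfs and merged.loop are my own names:

# Loop over the variable list, repeating the GET/parse/clean steps,
# then merge the per-variable results by place
dfs <- list()
for (i in seq_along(vars.list)) {
  r <- httr::GET(paste0(baseurl, param1, vars.list[i],
                        param2, places.list, param3, key))
  j <- fromJSON(content(r, "text"), flatten = TRUE)
  d <- as.data.frame(j)
  colnames(d) <- j[1, ]
  d <- d[-1, ]                                # drop the header row
  if (i > 1) d <- subset(d, select = -state)  # keep state only once
  dfs[[i]] <- d
}
merged.loop <- Reduce(function(x, y) merge(x, y, by = "place"), dfs)
merged.loop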
This lab was a useful introduction to navigating the Census API and working in the tidyverse, as well as structuring a digestible markdown document.
Navigating the Census documentation was rather confusing, given the repetitive structure of their site and the many, many different datasets and geographic levels that one can query. I was surprised that I could not find a direct reference anywhere on the site to querying by ZCTA; their 53 examples of geographic calls excluded this particular level. And why does a call for a ZCTA require a state code when a zip code should be a unique identifier nationally?
I’ve certainly learned I have a long way to go with ggplot, as I noted in an earlier reflection. I look forward to developing a much greater command of this tool.
I believe I’m getting the hang of the basics of building a digestible markdown document, and I look forward to learning more. I’d also like to develop a better command of navigating a longer markdown notebook within RStudio to be able to quickly make adjustments in specific areas of the notebook.