The mass power outages on the U.S. west coast provided an opportunity to test out my R 3.4.0 installation and share a quick idiom for turning portions of an XML document into a data frame — generically — using xml2.
Packages we’ll need:
library(xml2)
library(httr)
library(leaflet)
library(htmltools)
library(widgetframe)
library(tidyverse)
This is where the power outage data is. You can grab this on your own and make your own map whenver you want!
res <- GET("https://m.sce.com/nrc/AOC/AOC_Location_Report.xml")
doc <- content(res, as="parsed", encoding="UTF-8")
Here’s what that document looks like. Scroll around a bit to see that there are definitely groups of nodes we can capture separately but it’d be a royal pain to write extraction code for every set of nodes.
To setup the generic extraction idiom, we first setup two helper functions.
The first is just a shorthand way to extract all targeted nodes into a vector.
xtrct <- function(doc, target) { xml_find_all(doc, target) %>% xml_text() %>% trimws() }
The next one will take a document or node list and a target node then extract the first one and find the children. Then it will use those child node names to extract them each into a separate vector (which becomes a named list) and then turns those into a data frame and makes sane types.
There is a big assumption that all child nodes exist and there aren’t multiple descendents. This can be made more generic, but this idiom handles many, many use-caes.
xtrct_df <- function(doc, top) {
xml_find_first(doc, sprintf(".//%s", top)) %>%
xml_children() %>%
xml_name() %>%
map(~{
xtrct(doc, sprintf(".//%s/%s", top, .x)) %>%
list() %>%
set_names(tolower(.x))
}) %>%
flatten_df() %>%
readr::type_convert()
}
Now, we can use that to extract data frames from that XML document:
county_df <- xtrct_df(doc, "COUNTY")
city_df <- xtrct_df(doc, "CITY")
zipcode_df <- xtrct_df(doc, "ZIPCODE")
district_df <- xtrct_df(doc, "DISTRICT")
sector_df <- xtrct_df(doc, "SECTOR")
incident_df <- xtrct_df(doc, "INCIDENT")
We can peek at a couple of them:
glimpse(county_df)
## Observations: 9
## Variables: 5
## $ county_name <chr> "Kern", "Los Angeles", "Not Available", "Ora...
## $ nbr_incidents <int> 1, 20, 1, 5, 5, 8, 1, 5, 2
## $ nbr_cust_affected <int> 54, 856, 0, 314, 126, 134, 19, 88, 26
## $ centroid_x <dbl> -118.7291, -118.3511, NA, -117.7793, -115.99...
## $ centroid_y <dbl> 35.34358, 33.81972, NA, 33.21832, 33.74648, ...
glimpse(incident_df)
## Observations: 48
## Variables: 23
## $ incident_id <int> 116558720, 116585597, 116586143, 1165567...
## $ incident_type <chr> "PO", "Not Available", "QI", "PO", "PO",...
## $ fac_job_status_cd <chr> "D", "W", "D", "D", "D", "D", "D", "W", ...
## $ oan_no <int> 818762, 820791, 820777, 818542, 818645, ...
## $ outage_start_datetime <chr> "4/21/2017 8:00:56 AM", "4/21/2017 6:41:...
## $ version_dt <chr> "4/21/2017 1:24:00 PM", "4/21/2017 1:24:...
## $ last_chng_datetime <chr> "4/21/2017 1:12:50 PM", "4/21/2017 12:54...
## $ est_clu_datetime <chr> "4/21/2017 2:15:00 PM", "4/21/2017 4:00:...
## $ memo_cause_cd <chr> "07", "16", "16", "07", "07", "07", "07"...
## $ memo_cause_cd_desc <chr> "Upgrading Equipment", "Equipment Proble...
## $ crew_status_cd <chr> "04", "04", "04", "04", "04", "04", "04"...
## $ crew_status_cd_desc <chr> "Work is in progress.", "Work is in prog...
## $ result_cd <chr> "01", "0", "0", "01", "01", "01", "01", ...
## $ result_cd_desc <chr> "Enhancing Reliability", "Not Available"...
## $ nbr_cust_affected <int> 23, 15, 168, 56, 15, 66, 50, 17, 5, 21, ...
## $ zip_code <chr> "90043", "92562", "91765", "92404", "917...
## $ county_name <chr> "Los Angeles", "Riverside", "Los Angeles...
## $ city_name <chr> "View Park-Windsor Hills", "Murrieta", "...
## $ district_no <int> 44, 88, 26, 31, 22, 42, 27, 32, 39, 39, ...
## $ sector_no <chr> "EL NIDO", "VALLEY", "MIRALOMA", "VISTA"...
## $ ert_cd <chr> "07", "06", "06", "06", "06", "07", "06"...
## $ centroid_x <dbl> -118.3328, -117.2026, -117.7970, -117.24...
## $ centroid_y <dbl> 33.99711, 33.60503, 34.02160, 34.15461, ...
And, now we’ll use Leaflet to make our own outage map:
incident_df %>%
mutate(popup_text =
sprintf("%s<br/><br/>%s / %s County<br/><br/><b>Customers impacted:</b> %s<br/><b>Cause:</b> %s<br/><b>Status:</b> %s",
htmlEscape(outage_start_datetime),
htmlEscape(city_name),
htmlEscape(county_name),
htmlEscape(nbr_cust_affected),
htmlEscape(memo_cause_cd_desc),
htmlEscape(crew_status_cd_desc))) %>%
rename(lat=centroid_y, lng=centroid_x) %>%
leaflet(height="600px") %>%
addProviderTiles(providers$Esri.WorldStreetMap) %>%
addCircleMarkers(weight=1, radius=~sqrt(nbr_cust_affected)*2, color = "#bd0026",
fillColor="#fc4e2a", opacity=1, fillOpacity=0.5, popup = ~popup_text)%>%
setView(mean(range(incident_df$centroid_x)),
mean(range(incident_df$centroid_y)), zoom=7) %>%
frameWidget()