April 28, 2016

Problem

Why would someone live near a volcano?

Identify investigate the risk posed by volcanoes on nearby cities

Data Utilized:

  • List of volcanoes (Smithsonian)
  • List of cities (GeoNames)
    • Country data
  • Nearby places API (GeoNames)

Data Management

Data stored in MySQL database

  • volcanoes

  • cities

  • countries

  • near: instances of cities located near volcanoes

Data Acquisition

GVP <- read.csv('https://github.com/dsmilo/DATA607/raw/master/Projects/Data/
                GVP_Volcano_List.csv', stringsAsFactors = FALSE)[1:10]

cities1000 <- read.table('https://raw.githubusercontent.com/dsmilo/DATA607/master/
                         Projects/Data/cities1000.txt', header = FALSE, sep = "\t",
                         fill = TRUE, stringsAsFactors = FALSE, quote = "")

countries <- read.table('http://download.geonames.org/export/dump/countryInfo.txt',
                        header = TRUE, skip = 50, sep = "\t", fill = TRUE,
                        stringsAsFactors = FALSE,quote = "", comment.char = "",
                        check.names = FALSE)

dbWriteTable(volcano_db, "volcanoes", GVP, append = TRUE, row.names = FALSE)
dbWriteTable(volcano_db, "cities", cities1000, append = TRUE, row.names = FALSE)
dbWriteTable(volcano_db, "countries", countries, append = TRUE, row.names = FALSE)

Data Acquisition

Function to query API:

get.places <- function(lat, lng, maxRows = 10000, radius = 161) {
  username <- getOption('geonamesUsername')
  api_query <- paste0('http://api.geonames.org/findNearbyPlaceNameJSON?style=short&
cities=cities1000&username=', username, '&lat=', lat, '&lng=', lng, '&maxRows=',
maxRows, '&radius=', radius)
  places <- fromJSON(api_query)$geonames
  if (class(places) == "data.frame"){
    places <- places %>% select(geonameId, distance)
  }
  else{places <- list(NULL)}
  places
}

Data Acquisition

Loop to identify instances:

for (i in 1:nrow(GVP)) {
  nearby <- data.frame(NULL)
  
  nearby <- get.places(GVP$latitude[i], GVP$longitude[i])
  
  if(class(nearby) == "list") next
  
  nearby$near_id <- c(seq(near_count + 1, near_count + nrow(nearby), 1))
  nearby$volc_id <- rep(GVP$volc_id[i], nrow(nearby))
  nearby <- nearby %>% select(near_id, geonameId, volc_id, distance)
  
  dbWriteTable(volcano_db, "near", nearby, append = TRUE, row.names = FALSE)
  
  near_count <- near_count + nrow(nearby)
}

Processing Results

Querying Database

near_query <- dbSendQuery(volcano_db,
                          'SELECT c.asciiname, co.Country, c.latitude, c.longitude,
                          c.population, v.eruption, n.distance
                          FROM near AS n
                          INNER JOIN cities AS c
                          ON c.geonameId = n.geonameId
                          INNER JOIN countries AS co
                          ON c.country_code = co.country_code
                          INNER JOIN volcanoes AS v
                          ON n.volc_id = v.volc_id;')

at_risk <- dbFetch(near_query, n = -1)

at_risk$distance <- as.numeric(at_risk$distance) * 0.621371
at_risk$eruption <- 2016 - at_risk$eruption

Quantifying Risk

\[risk = \frac{\log_{10} (population)}{\log_{10} (eruption) \times distance^2}\]

at_risk <- at_risk %>%
  mutate(risk = log10(population + 2) / log10(eruption + 2) / ((distance + 1)^2))

city_risk <- at_risk %>% mutate(City = paste(asciiname, Country, sep = ", ")) %>%
  group_by(City) %>% summarise(lat = mean(latitude), lng = mean(longitude),
                               pop = mean(population), risk = sum(risk)) %>%
  arrange(desc(risk))

max_risk <- max(city_risk$risk)
city_risk$risk <- city_risk$risk / max_risk * 1000

Results

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
   0.0058    0.2350    0.9656    7.0330    4.2160 1000.0000 
                       City      lat      lng        pop      risk
1    San Roque, Philippines 13.00413 123.0549   3181.258 1000.0000
2        Nindiri, Nicaragua 12.00390 -86.1213   7073.000  808.3912
3        Tomohon, Indonesia  1.31678 124.8040  27624.000  747.2402
4        Lembang, Indonesia -6.81167 107.6180 183130.000  658.4757
5   Vestmannaeyjar, Iceland 63.44270 -20.2734   4219.000  602.8504
6       Monbon, Philippines 12.73070 124.0190   3408.000  592.2178
7         Masaya, Nicaragua 11.97440 -86.0942 130113.000  581.0437
8  La Concepcion, Nicaragua 11.93710 -86.1898   6946.000  579.5223
9     Ticuantepe, Nicaragua 12.02260 -86.2049  13209.000  578.3507
10     Almolonga, Guatemala 14.81670 -91.5000  11913.000  557.9093

Results

Findings

  • Most cities located face a very low level of risk

  • Cities in Central America, Indonesia, the Philippines, and Japan face higher risk

Country Median Mean
Nicaragua 23.4 85.3
Guatemala 42.0 66.4
Costa Rica 45.3 48.3
Japan 9.6 18.5
Philippines 4.1 13.0
Honduras 2.4 5.2
Indonesia 1.2 4.5
Panama 0.2 1.4

Challenges

  • NYT Geographic did not peform as expected

  • GeoNames API has transaction limits

  • GeoNames API returns list, not data frame, for empty results

  • ggmap plot difficult to evaluate

  • Standard ggplot2 plot subject to distortion near poles

Caveats

  • Assumptions for quantification of risk should be investigated

    • Number of cities and degree of risk may change
  • Ratings not meant to serve as predictor of future volcanic activity

References and full report available on RPubs