Why would someone live near a volcano?
Identify investigate the risk posed by volcanoes on nearby cities
Data Utilized:
- List of volcanoes (Smithsonian)
- List of cities (GeoNames)
- Country data
- Nearby places API (GeoNames)
April 28, 2016
Why would someone live near a volcano?
Identify investigate the risk posed by volcanoes on nearby cities
Data Utilized:
Data stored in MySQL database
volcanoes
cities
countries
near: instances of cities located near volcanoes
GVP <- read.csv('https://github.com/dsmilo/DATA607/raw/master/Projects/Data/
GVP_Volcano_List.csv', stringsAsFactors = FALSE)[1:10]
cities1000 <- read.table('https://raw.githubusercontent.com/dsmilo/DATA607/master/
Projects/Data/cities1000.txt', header = FALSE, sep = "\t",
fill = TRUE, stringsAsFactors = FALSE, quote = "")
countries <- read.table('http://download.geonames.org/export/dump/countryInfo.txt',
header = TRUE, skip = 50, sep = "\t", fill = TRUE,
stringsAsFactors = FALSE,quote = "", comment.char = "",
check.names = FALSE)
dbWriteTable(volcano_db, "volcanoes", GVP, append = TRUE, row.names = FALSE)
dbWriteTable(volcano_db, "cities", cities1000, append = TRUE, row.names = FALSE)
dbWriteTable(volcano_db, "countries", countries, append = TRUE, row.names = FALSE)
Function to query API:
get.places <- function(lat, lng, maxRows = 10000, radius = 161) {
username <- getOption('geonamesUsername')
api_query <- paste0('http://api.geonames.org/findNearbyPlaceNameJSON?style=short&
cities=cities1000&username=', username, '&lat=', lat, '&lng=', lng, '&maxRows=',
maxRows, '&radius=', radius)
places <- fromJSON(api_query)$geonames
if (class(places) == "data.frame"){
places <- places %>% select(geonameId, distance)
}
else{places <- list(NULL)}
places
}
Loop to identify instances:
for (i in 1:nrow(GVP)) {
nearby <- data.frame(NULL)
nearby <- get.places(GVP$latitude[i], GVP$longitude[i])
if(class(nearby) == "list") next
nearby$near_id <- c(seq(near_count + 1, near_count + nrow(nearby), 1))
nearby$volc_id <- rep(GVP$volc_id[i], nrow(nearby))
nearby <- nearby %>% select(near_id, geonameId, volc_id, distance)
dbWriteTable(volcano_db, "near", nearby, append = TRUE, row.names = FALSE)
near_count <- near_count + nrow(nearby)
}
Querying Database
near_query <- dbSendQuery(volcano_db,
'SELECT c.asciiname, co.Country, c.latitude, c.longitude,
c.population, v.eruption, n.distance
FROM near AS n
INNER JOIN cities AS c
ON c.geonameId = n.geonameId
INNER JOIN countries AS co
ON c.country_code = co.country_code
INNER JOIN volcanoes AS v
ON n.volc_id = v.volc_id;')
at_risk <- dbFetch(near_query, n = -1)
at_risk$distance <- as.numeric(at_risk$distance) * 0.621371
at_risk$eruption <- 2016 - at_risk$eruption
\[risk = \frac{\log_{10} (population)}{\log_{10} (eruption) \times distance^2}\]
at_risk <- at_risk %>%
mutate(risk = log10(population + 2) / log10(eruption + 2) / ((distance + 1)^2))
city_risk <- at_risk %>% mutate(City = paste(asciiname, Country, sep = ", ")) %>%
group_by(City) %>% summarise(lat = mean(latitude), lng = mean(longitude),
pop = mean(population), risk = sum(risk)) %>%
arrange(desc(risk))
max_risk <- max(city_risk$risk)
city_risk$risk <- city_risk$risk / max_risk * 1000
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.0058 0.2350 0.9656 7.0330 4.2160 1000.0000
City lat lng pop risk 1 San Roque, Philippines 13.00413 123.0549 3181.258 1000.0000 2 Nindiri, Nicaragua 12.00390 -86.1213 7073.000 808.3912 3 Tomohon, Indonesia 1.31678 124.8040 27624.000 747.2402 4 Lembang, Indonesia -6.81167 107.6180 183130.000 658.4757 5 Vestmannaeyjar, Iceland 63.44270 -20.2734 4219.000 602.8504 6 Monbon, Philippines 12.73070 124.0190 3408.000 592.2178 7 Masaya, Nicaragua 11.97440 -86.0942 130113.000 581.0437 8 La Concepcion, Nicaragua 11.93710 -86.1898 6946.000 579.5223 9 Ticuantepe, Nicaragua 12.02260 -86.2049 13209.000 578.3507 10 Almolonga, Guatemala 14.81670 -91.5000 11913.000 557.9093
Most cities located face a very low level of risk
Cities in Central America, Indonesia, the Philippines, and Japan face higher risk
| Country | Median | Mean |
|---|---|---|
| Nicaragua | 23.4 | 85.3 |
| Guatemala | 42.0 | 66.4 |
| Costa Rica | 45.3 | 48.3 |
| Japan | 9.6 | 18.5 |
| Philippines | 4.1 | 13.0 |
| Honduras | 2.4 | 5.2 |
| Indonesia | 1.2 | 4.5 |
| Panama | 0.2 | 1.4 |
NYT Geographic did not peform as expected
GeoNames API has transaction limits
GeoNames API returns list, not data frame, for empty results
ggmap plot difficult to evaluate
Standard ggplot2 plot subject to distortion near poles
Assumptions for quantification of risk should be investigated
Ratings not meant to serve as predictor of future volcanic activity
References and full report available on RPubs