Motivation

The goal of this analysis is to find an interesting spatial dataset from the web and then:

use the techniques covered in a spatial statistics class to make some informative plots of the data
identify potential questions that could be addressed with the data going forward

The Data

Dataset name: Population Exposure Estimates in Proximity to Nuclear Power Plants, Locations
Source: NASA Socioeconomic Data and Applications Center (SEDAC). The zipped data can be downloaded from the SEDAC website, although you must create an account there before doing so.
Purpose: To provide a global dataset of point locations and attributes describing nuclear power plants and reactors.
Abstract: This dataset combines information from a global dataset developed by Declan Butler of Nature News and the Power Reactor Information System (PRIS), an up-to-date database of nuclear reactors maintained by the International Atomic Energy Agency (IAEA). The locations of nuclear reactors around the world are represented as point features associated with reactor specification and performance history attributes as of March 2012.
Codebook: For info on variable names/descriptions and the methodology used to collect the data click here.

Reading in the Data

The following code reads in the power plant dataset, which is saved in my working directory.

locations <- read.csv("energy-pop-exposure-nuclear-plants-locations_plants.csv")

The data contain 276 observations, each representing a different nuclear power plant, and 61 variables.

Load the packages used in this analysis:
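A quick way to confirm counts like these after reading a file is to inspect its dimensions. The sketch below uses a tiny stand-in CSV written to a temporary file; its three rows and their values are invented for illustration and are not taken from the SEDAC data. With the real file, the same `dim()` call on `locations` should report 276 rows and 61 columns.

```r
# Toy stand-in for the real CSV; the values are invented for illustration
toy.csv <- tempfile(fileext = ".csv")
writeLines(c("Plant,Country,Latitude,Longitude",
             "PLANT A,UNITED STATES OF AMERICA,40.0,-80.0",
             "PLANT B,FRANCE,47.5,2.5",
             "PLANT C,JAPAN,35.5,136.0"),
           toy.csv)

# stringsAsFactors = FALSE keeps text columns as character vectors
toy.locations <- read.csv(toy.csv, stringsAsFactors = FALSE)

# number of rows (plants) and columns (variables)
toy.dims <- dim(toy.locations)
print(toy.dims)

# column names and types at a glance
str(toy.locations)
```

Reading text columns as character from the start would also remove the need to convert factor columns before filtering later on.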

library(dplyr)   
library(fields) 
library(maps)
library(ggmap)

Making the Plots

Since the dataset is global, the first plot I make is global to get an idea of where the nuclear power plants are concentrated.

# get range of latitudes in the dataset for plotting
world.lat.lims <- range(locations$Latitude) + c(-3,3)

# get range of longitudes in the dataset for plotting
world.lon.lims <- range(locations$Longitude) + c(-3,3)

# plot world map with specified lat/lon limits
map('world',xlim=world.lon.lims,ylim=world.lat.lims)
title('Global Locations of Nuclear Power Plants')
points(locations$Longitude, locations$Latitude, cex=0.60, col='red')

It’s clear that the majority of nuclear power plants are located in the Eastern United States, Europe, and Eastern Asia. For now, I will reduce the scope of the analysis to population exposure estimates in the Eastern United States. Going forward, I will update this document as more attributes of the data for this region are explored, formal statistical tests are performed, and similar analyses for Europe and Japan are completed.
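The visual impression from the map can be backed up with a simple count of plants per country. The following is a minimal sketch on a made-up vector standing in for `locations$Country`; the country counts here are invented and say nothing about the real data.

```r
# Toy stand-in for locations$Country; the counts are invented for illustration
toy.country <- c("UNITED STATES OF AMERICA", "FRANCE", "JAPAN",
                 "UNITED STATES OF AMERICA", "FRANCE",
                 "UNITED STATES OF AMERICA")

# tabulate plants per country and sort from most to fewest
plants.per.country <- sort(table(toy.country), decreasing = TRUE)
print(plants.per.country)

# share of all plants located in each country
plant.share <- round(prop.table(plants.per.country), 2)
print(plant.share)
```

On the real data, replacing `toy.country` with `locations$Country` would give the per-country totals directly.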

Reduce Data to the Eastern United States

There are 85 nuclear power plants total in this dataset for the USA, and the following commands narrow that number down to 68 in the Eastern USA. Lastly, I split this region into two halves (Northeastern & Southeastern) in order to zoom in more. The split occurs approximately at the northern borders of North Carolina, Tennessee, and Arkansas.

# make factor variable character in order to use filter()
locations$Country <- as.character(locations$Country)

# reduce locations dataset to the Northeastern USA
northeast.usa <- filter(locations,Country=="UNITED STATES OF AMERICA" & Longitude > -94 & Latitude > 36.5)

# reduce locations dataset to the Southeastern USA (the bottom half of the eastern region)
east.usa.bottom <- filter(locations,Country=="UNITED STATES OF AMERICA" & Longitude > -94 & Latitude < 36.5)

# remove one outlier in Puerto Rico labeled as "BONUS"
east.usa.bottom$Plant <- as.character(east.usa.bottom$Plant)
east.usa.bottom <- filter(east.usa.bottom,Plant!="BONUS")
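Because both halves use strict inequalities on latitude, a plant sitting exactly at 36.5 would land in neither half. Whether any plant in the real data lies exactly on that latitude is not verified here. The sketch below reproduces the split in base R on a made-up data frame (all coordinates are invented) and lists any eastern plant that falls through the gap.

```r
# Toy stand-in for the locations data; coordinates are invented for illustration
toy <- data.frame(
  Plant     = c("A", "B", "C", "D", "BONUS"),
  Country   = "UNITED STATES OF AMERICA",
  Longitude = c(-80, -85, -120, -90, -67),
  Latitude  = c(40, 33, 35, 36.5, 18),
  stringsAsFactors = FALSE
)

# same conditions as the filter() calls, written with base R subsetting
is.east   <- toy$Country == "UNITED STATES OF AMERICA" & toy$Longitude > -94
toy.north <- toy[is.east & toy$Latitude > 36.5, ]
toy.south <- toy[is.east & toy$Latitude < 36.5 & toy$Plant != "BONUS", ]

# eastern plants (excluding BONUS) that ended up in neither half
toy.east <- toy[is.east & toy$Plant != "BONUS", ]
dropped  <- setdiff(toy.east$Plant, c(toy.north$Plant, toy.south$Plant))
print(dropped)
```

If `dropped` is non-empty on the real data, changing one of the comparisons to `>=` or `<=` would assign the boundary plants to a half.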

Plot Nuclear Power Plant Locations/Population Exposures in the Eastern USA

The size and color of each point are determined by the 2010 population living within 30 kilometers of that plant (30 km is the smallest radius in the dataset).

2010

top.lon.lims <- range(northeast.usa$Longitude) + c(-2,2)
top.lat.lims <- range(northeast.usa$Latitude) + c(-1,1)

# create bounding box for lowerleftlon, lowerleftlat, upperrightlon, upperrightlat
extent <- c(top.lon.lims[1],top.lat.lims[1],top.lon.lims[2],top.lat.lims[2])

# obtain the map
mtop <- get_map(extent,source="stamen",maptype="toner")

# plot the raster and add points with lat/lon coordinates
options(scipen=999) # prevent scientific notation in legend
north2010 <- ggmap(mtop) + 
    geom_point(aes(x=Longitude,y=Latitude,colour=p10_30,size=p10_30),data=northeast.usa) + 
    scale_color_gradient(low="yellow",high="red",limits=c(min(northeast.usa$p90_30),max(northeast.usa$p10_30))) + 
    ggtitle("Northeastern Nuclear Power Plants in 2010 \n Size/Color by # Population Within 30 KM Exposed") +
    xlab("Longitude") + ylab("Latitude")+
    guides(colour=guide_colorbar(title="Population")) + 
    scale_size(guide = guide_legend(direction = "vertical",title="Population"),limits=c(min(northeast.usa$p90_30),max(northeast.usa$p10_30)))

#################################################################

bottom.lon.lims <- range(east.usa.bottom$Longitude) + c(-2,2)
bottom.lat.lims <- range(east.usa.bottom$Latitude) + c(-1,1)

# create bounding box for lowerleftlon, lowerleftlat, upperrightlon, upperrightlat
extent2 <- c(bottom.lon.lims[1],bottom.lat.lims[1],bottom.lon.lims[2],bottom.lat.lims[2])

# obtain the map
mbot <- get_map(extent2,source="stamen",maptype="toner")

# plot the raster and add points with lat/lon coordinates
options(scipen=999) # prevent scientific notation in legend
south2010 <- ggmap(mbot) + 
    geom_point(aes(x=Longitude,y=Latitude,colour=p10_30,size=p10_30),data=east.usa.bottom) + 
    scale_color_gradient(low="yellow",high="red",limits=c(min(east.usa.bottom$p90_30),max(east.usa.bottom$p10_30))) + 
    ggtitle("Southeastern Nuclear Power Plants in 2010 \n Size/Color by # Population Within 30 KM Exposed") + 
    xlab("Longitude") + 
    ylab("Latitude") + 
    guides(colour=guide_colorbar(title="Population")) + 
    scale_size(guide = guide_legend(direction = "vertical",title="Population"),limits=c(min(east.usa.bottom$p90_30),max(east.usa.bottom$p10_30)))
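The limits in both plots run from the smallest `p90_30` value to the largest `p10_30` value, presumably so that the same scale could be reused for plots of other years (that reading is an assumption; the text does not say). A minimal sketch, with invented numbers, of how those limits compare with limits pooled over both columns:

```r
# Toy stand-in with the two population columns used above; values are invented
toy <- data.frame(p90_30 = c(12000, 250000, 90000),
                  p10_30 = c(15000, 240000, 110000))

# limits as written in the plots: smallest 1990 value to largest 2010 value
limits.as.plotted <- c(min(toy$p90_30), max(toy$p10_30))

# limits taken over both columns together, so no value can fall outside
limits.pooled <- range(toy$p90_30, toy$p10_30)

print(limits.as.plotted)
print(limits.pooled)
```

In ggplot2, values outside a continuous scale's limits are treated as missing by default, so pooled limits are the safer choice if the 1990 maximum could exceed the 2010 maximum, as it does in this toy example.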

Arrange Plots by Region

north2010

south2010

Questions the Data Can Answer Going Forward

This data can be used to answer several interesting questions, but for the sake of brevity here are just a few:

In what locations of the United States is the population exposed to nuclear power plants the highest?
How have population exposures to nuclear power plants in the United States changed over time between 1990, 2000, and 2010?
How do population exposures to nuclear power plants in the United States compare to those in Europe and Asia?
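The first two questions could be approached with a ranking and a simple difference between years. The sketch below is one possible starting point, using a made-up data frame with the `p90_30` and `p10_30` columns from the plots; the plant names and population values are invented, and this is not a formal statistical test.

```r
# Toy stand-in for the plant data; names and populations are invented
toy <- data.frame(
  Plant  = c("A", "B", "C"),
  p90_30 = c(100000, 50000, 200000),
  p10_30 = c(150000, 45000, 260000),
  stringsAsFactors = FALSE
)

# question 1: plants ordered by 2010 population within 30 km, largest first
ranked <- toy[order(toy$p10_30, decreasing = TRUE), ]

# question 2: absolute and percent change from 1990 to 2010
toy$change     <- toy$p10_30 - toy$p90_30
toy$pct.change <- 100 * toy$change / toy$p90_30

print(ranked$Plant)
print(toy[, c("Plant", "change", "pct.change")])
```

The same columns computed on `northeast.usa` or on the southeastern subset could then be mapped in place of the raw 2010 population.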

Exploratory Analysis of Nuclear Power Plants

Bryan Cole

Saturday, January 16, 2016