Motivation

The goal of this analysis is to find an interesting spatial dataset from the web and

The Data

Reading in the Data

The following code reads in the power plant dataset, which is saved in my working directory.

locations <- read.csv("energy-pop-exposure-nuclear-plants-locations_plants.csv")

The data contain 276 observations, each representing a different nuclear power plant, and 61 variables.

Load the packages used in this analysis:

library(dplyr)   
library(fields) 
library(maps)
library(ggmap)

Making the Plots

Since the dataset is global, the first plot I make is global to get an idea of where the nuclear power plants are concentrated.

# get range of latitudes in the dataset for plotting
world.lat.lims <- range(locations$Latitude) + c(-3,3)

# get range of longitudes in the dataset for plotting
world.lon.lims <- range(locations$Longitude) + c(-3,3)

# plot world map with specified lat/lon limits
map('world',xlim=world.lon.lims,ylim=world.lat.lims)
title('Global Locations of Nuclear Power Plants')
points(locations$Longitude, locations$Latitude, cex=0.60, col='red')
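As a small illustration of how the plotting limits above are built, the sketch below pads a range by 3 degrees on each side. The coordinates are made up and only stand in for `locations$Latitude`; the clamping step is an optional addition, not part of the original analysis.

```r
# Hypothetical latitudes standing in for locations$Latitude
lat <- c(-33.9, 12.5, 51.2, 67.4)

# range() returns c(min, max); adding c(-3, 3) widens each end by 3 degrees
lat.lims <- range(lat) + c(-3, 3)

# Optional: keep the padded limits inside valid latitudes
lat.lims <- pmin(pmax(lat.lims, -90), 90)

print(lat.lims)
```

The same pattern applies to longitude, with -180 and 180 as the clamping bounds.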

It’s clear that the majority of nuclear power plants are located in the Eastern United States, Europe, and Eastern Asia. For now, I will reduce the scope of the analysis to population exposure estimates in the Eastern United States. Going forward, I will update this document as more attributes of the data for this region are explored, formal statistical tests are performed, and similar analyses for Europe and Japan are completed.

Reduce Data to the Eastern United States

There are 85 nuclear power plants total in this dataset for the USA, and the following commands narrow that number down to 68 in the Eastern USA. Lastly, I split this region into two halves (Northeastern & Southeastern) in order to zoom in more. The split occurs approximately at the northern borders of North Carolina, Tennessee, and Arkansas.

# convert the factor variable to character in order to use filter()
locations$Country <- as.character(locations$Country)

# reduce locations dataset to the Northeastern USA
northeast.usa <- filter(locations,Country=="UNITED STATES OF AMERICA" & Longitude > -94 & Latitude > 36.5)

# reduce locations dataset to the Southeastern USA
# (named east.usa.bottom because the plotting code refers to it by that name)
east.usa.bottom <- filter(locations,Country=="UNITED STATES OF AMERICA" & Longitude > -94 & Latitude < 36.5)

# remove one outlier in Puerto Rico labeled as "Bonus"
east.usa.bottom$Plant <- as.character(east.usa.bottom$Plant)
east.usa.bottom <- filter(east.usa.bottom,Plant!="BONUS")
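To make the region split concrete, here is a minimal sketch on toy data. The plant names and coordinates are invented for illustration; only the column names and cutoffs (-94 longitude, 36.5 latitude) come from the code above. It uses base R `subset()` in place of `dplyr::filter()` so that it runs without extra packages.

```r
# Toy data: names and coordinates are made up, column names mirror the dataset
plants <- data.frame(
  Plant     = c("A", "B", "C", "BONUS", "D"),
  Country   = c(rep("UNITED STATES OF AMERICA", 4), "FRANCE"),
  Longitude = c(-75.0, -81.5, -120.0, -67.2, 2.3),
  Latitude  = c(41.0, 33.0, 35.2, 18.4, 47.5),
  stringsAsFactors = FALSE
)

# Eastern USA: US plants east of longitude -94
east <- subset(plants, Country == "UNITED STATES OF AMERICA" & Longitude > -94)

# Split at latitude 36.5 and drop the Puerto Rico outlier from the southern half
north <- subset(east, Latitude > 36.5)
south <- subset(east, Latitude < 36.5 & Plant != "BONUS")

print(north$Plant)
print(south$Plant)
```

Note that with strict inequalities on both sides, a plant located exactly at latitude 36.5 would fall into neither half.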

Plot Nuclear Power Plant Locations/Population Exposures in the Eastern USA

The size and color of each point both encode the 2010 population living within 30 kilometers of that plant (30 kilometers is the smallest radius in the dataset).

2010

top.lon.lims <- range(northeast.usa$Longitude) + c(-2,2)
top.lat.lims <- range(northeast.usa$Latitude) + c(-1,1)

# create bounding box for lowerleftlon, lowerleftlat, upperrightlon, upperrightlat
extent <- c(top.lon.lims[1],top.lat.lims[1],top.lon.lims[2],top.lat.lims[2])

# obtain the map
mtop <- get_map(extent,source="stamen",maptype="toner")

# plot the raster and add points with lat/lon coordinates
options(scipen=999) # prevent scientific notation in legend
north2010 <- ggmap(mtop) + 
    geom_point(aes(x=Longitude,y=Latitude,colour=p10_30,size=p10_30),data=northeast.usa) + 
    scale_color_gradient(low="yellow",high="red",limits=c(min(northeast.usa$p90_30),max(northeast.usa$p10_30))) + 
    ggtitle("Northeastern Nuclear Power Plants in 2010 \n Size/Color by # Population Within 30 KM Exposed") +
    xlab("Longitude") + ylab("Latitude")+
    guides(colour=guide_colorbar(title="Population")) + 
    scale_size(guide = guide_legend(direction = "vertical",title="Population"),limits=c(min(northeast.usa$p90_30),max(northeast.usa$p10_30)))
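The scale limits above combine the minimum of `p90_30` with the maximum of `p10_30`, presumably so that plots for different years share one scale. A slightly more defensive way to get shared limits is to take the range over both columns together. The sketch below uses made-up populations and assumes that `p90_30` and `p10_30` hold the 1990 and 2010 populations within 30 km.

```r
# Made-up populations; p90_30 / p10_30 assumed to be 1990 / 2010 counts within 30 km
pop <- data.frame(
  p90_30 = c(12000, 450000, 80000),
  p10_30 = c(15000, 430000, 95000)
)

# Pool both years so the limits cover every value in either column
shared.lims <- range(c(pop$p90_30, pop$p10_30))

print(shared.lims)
```

In this toy case the largest value sits in `p90_30`, so limits built from `max(p10_30)` alone would not cover it.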

#################################################################

bottom.lon.lims <- range(east.usa.bottom$Longitude) + c(-2,2)
bottom.lat.lims <- range(east.usa.bottom$Latitude) + c(-1,1)

# create bounding box for lowerleftlon, lowerleftlat, upperrightlon, upperrightlat
extent2 <- c(bottom.lon.lims[1],bottom.lat.lims[1],bottom.lon.lims[2],bottom.lat.lims[2])

# obtain the map
mbot <- get_map(extent2,source="stamen",maptype="toner")

# plot the raster and add points with lat/lon coordinates
options(scipen=999) # prevent scientific notation in legend
south2010 <- ggmap(mbot) + 
    geom_point(aes(x=Longitude,y=Latitude,colour=p10_30,size=p10_30),data=east.usa.bottom) + 
    scale_color_gradient(low="yellow",high="red",limits=c(min(east.usa.bottom$p90_30),max(east.usa.bottom$p10_30))) + 
    ggtitle("Southeastern Nuclear Power Plants in 2010 \n Size/Color by # Population Within 30 KM Exposed") + 
    xlab("Longitude") + 
    ylab("Latitude") + 
    guides(colour=guide_colorbar(title="Population")) + 
    scale_size(guide = guide_legend(direction = "vertical",title="Population"),limits=c(min(east.usa.bottom$p90_30),max(east.usa.bottom$p10_30)))

Arrange Plots by Region

north2010

south2010

Questions the Data Can Answer Going Forward

This data can be used to answer several interesting questions, but for the sake of brevity here are just a few: