1 The IAEA open data bank and ChatGPT

The International Atomic Energy Agency provides information about nuclear power plants by country and region in this database: https://pris.iaea.org/pris/CountryStatistics/

Unfortunately, the datasets cannot be downloaded directly, so we asked ChatGPT to write some R code to help us extract the required information. The generated code produced the desired results.

2 R libraries

In this exercise we will use the following packages:

  • rvest: Easily Harvest (Scrape) Web Pages
  • tidyverse: A collection of R packages designed for data science
  • janitor: Simple Tools for Examining and Cleaning Dirty Data
  • skimr: Compact and Flexible Summaries of Data
  • ggblanket: Simplify ggplot2 Visualisation
  • ggbeeswarm: Categorical Scatter (Violin Point) Plots
  • hrbrthemes: Additional Themes, Theme Components and Utilities for ggplot2
  • scales: Scale Functions for Visualization
  • knitr: A General-Purpose Package for Dynamic Report Generation in R
  • ggthemes: Extra Themes, Scales and Geoms for ggplot2
  • patchwork: The Composer of Plots
  • ggmosaic: Mosaic Plots in the ggplot2 Framework
  • CGPfunctions: Package to Draw Slopegraphs in R
  • ggbump: Bump Chart and Sigmoid Curves
  • wesanderson: A Wes Anderson Palette Generator
  • leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library
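
As a convenience, the packages listed above can be checked and attached in one go. The following is only a sketch: the variable names (`pkgs`, `is_installed`, `missing_pkgs`) are ours, and the installation line is left commented out on purpose.

```r
# Packages named in the list above.
pkgs <- c(
  "rvest", "tidyverse", "janitor", "skimr", "ggblanket", "ggbeeswarm",
  "hrbrthemes", "scales", "knitr", "ggthemes", "patchwork", "ggmosaic",
  "CGPfunctions", "ggbump", "wesanderson", "leaflet"
)

# Check which ones are available without attaching anything yet.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}

# Attach the packages that are available.
for (p in pkgs[is_installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```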

3 Data extraction

The code produced by ChatGPT worked flawlessly. We only had to add the country name to each subset so that the information could be used in further analysis.
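
The original code is not reproduced here, so what follows is only a minimal sketch of how such a scraper could look with rvest. The helper `extract_tables()` is hypothetical, the inline page contains placeholder rows, and the commented URL pattern (including the two-letter country code) is an assumption that should be checked against the live site.

```r
library(rvest)

# Hypothetical helper: pull every HTML table from a parsed page and add
# the country name to each one, as described in the text.
extract_tables <- function(page, country) {
  tables <- html_table(html_elements(page, "table"))
  lapply(tables, function(tbl) {
    tbl$country <- rep(country, nrow(tbl))
    tbl
  })
}

# Offline demonstration on a tiny inline page (placeholder values).
demo_page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Type</th><th>Status</th></tr>
    <tr><td>Unit A</td><td>PHWR</td><td>Operational</td></tr>
    <tr><td>Unit B</td><td>PHWR</td><td>Operational</td></tr>
  </table>')
demo <- extract_tables(demo_page, "Canada")[[1]]
print(demo)

# Against the live site the call might look like this (assumed URL pattern):
# page <- read_html(
#   "https://pris.iaea.org/PRIS/CountryStatistics/CountryDetails.aspx?current=CA"
# )
# canada <- extract_tables(page, "Canada")
```

Keeping the parsing step separate from the download makes it easy to test the helper offline and to loop over several countries afterwards.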

4 Data cleaning

More often than not, the initial datasets need to be checked, revised and cleaned before they can be analysed. Several good libraries make this work easier. In this case, we will use the janitor and dplyr (part of the tidyverse) packages to clean the dataset.
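
To give an idea of what this cleaning step can involve, here is a small sketch using janitor and dplyr. The rows in `raw` are placeholders that imitate typical problems in a scraped table (awkward column names, numbers stored as text, an empty row, a duplicate); they are not real PRIS records, and the actual cleaning steps used for this dataset may differ.

```r
library(dplyr)
library(janitor)

# Placeholder rows imitating a messy scraped table.
raw <- tibble(
  `Name` = c("Unit A", "Unit B", "Unit B", NA),
  `Reactor Type` = c("PHWR", "PWR", "PWR", NA),
  `Capacity (MW)` = c("881", "1,089", "1,089", NA),
  country = c("Canada", "China", "China", NA)
)

clean <- raw |>
  clean_names() |>                 # snake_case column names
  remove_empty(which = "rows") |>  # drop rows that are entirely NA
  distinct() |>                    # drop exact duplicate rows
  mutate(capacity_mw = as.numeric(gsub(",", "", capacity_mw)))

print(clean)
```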

5 Quick exploratory data analysis

Once we have cleaned the dataset, we need to run an exploratory analysis to check for inconsistencies, errors, missing values, data distributions, relationships among variables, etc.

Out of the many packages out there, we will use (again) janitor and skimr.

5.1 Using janitor::tabyl to create frequency tables

For instance, we can use janitor::tabyl to create a frequency table for the nuclear plants in operation or under construction.

The output of tabyl can be “piped” to knitr::kable in order to automatically create formatted tables in PDF, Word or HTML.

Source: IAEA

| country | BWR | FBR | HTGR | LWGR | PHWR | PWR | Total |
|---|---|---|---|---|---|---|---|
| Canada | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 19 (100.0%) | 0 (0.0%) | 19 (100.0%) |
| China | 0 (0.0%) | 3 (3.9%) | 1 (1.3%) | 0 (0.0%) | 2 (2.6%) | 71 (92.2%) | 77 (100.0%) |
| France | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 57 (100.0%) | 57 (100.0%) |
| Russia | 0 (0.0%) | 3 (7.5%) | 0 (0.0%) | 11 (27.5%) | 0 (0.0%) | 26 (65.0%) | 40 (100.0%) |
| United States | 31 (33.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 63 (67.0%) | 94 (100.0%) |
| Total | 31 (10.8%) | 6 (2.1%) | 1 (0.3%) | 11 (3.8%) | 21 (7.3%) | 217 (75.6%) | 287 (100.0%) |
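
A table of this shape could be produced along the following lines. This is a sketch rather than the original chunk: the `reactors` data frame is rebuilt from the counts printed above (one row per unit), and the particular chain of `adorn_*()` calls is our guess at how the counts and row percentages were combined.

```r
library(janitor)

# Rebuild one row per reactor from the counts printed in the table above.
counts <- data.frame(
  country = c("Canada", rep("China", 4), rep("Russia", 3), "France",
              rep("United States", 2)),
  type = c("PHWR", "FBR", "HTGR", "PHWR", "PWR", "FBR", "LWGR", "PWR",
           "PWR", "BWR", "PWR"),
  n = c(19, 3, 1, 2, 71, 3, 11, 26, 57, 31, 63)
)
reactors <- counts[rep(seq_len(nrow(counts)), counts$n), c("country", "type")]

freq <- reactors |>
  tabyl(country, type) |>             # two-way frequency table
  adorn_totals(c("row", "col")) |>    # add Total row and Total column
  adorn_percentages("row") |>         # convert counts to row shares
  adorn_pct_formatting(digits = 1) |> # format shares as percentages
  adorn_ns(position = "front")        # put the counts back in front

knitr::kable(freq, caption = "Source: IAEA")
```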

5.2 Using skimr::skim for a quick exploratory analysis

Let’s say we are interested in a quick analysis of the installed capacity (MW) by country for plants under construction or in operation. We can use the skimr::skim() function.

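Source: IAEA

| type | var | country | missing | complete | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| numeric | MW | Canada | 0 | 1 | 769.9 | 167.6 | 540 | 542 | 868 | 881.5 | 934 |
| numeric | MW | China | 0 | 1 | 1046.0 | 290.4 | 25 | 1060 | 1089 | 1200.0 | 1750 |
| numeric | MW | France | 0 | 1 | 1152.5 | 239.6 | 917 | 951 | 956 | 1381.0 | 1650 |
| numeric | MW | Russia | 0 | 1 | 810.1 | 387.2 | 12 | 440 | 1000 | 1000.0 | 1255 |
| numeric | MW | United States | 0 | 1 | 1090.2 | 217.8 | 560 | 922 | 1192 | 1250.0 | 1500 |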
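
The grouped summary can be sketched as follows. Since the unit-level data are not reproduced in this text, the `plants` data frame below is synthetic: it draws normal values using the per-country unit counts, means and standard deviations reported above, so its quantiles will not match the real table. The column name `mw` is also our choice.

```r
library(dplyr)
library(skimr)

# Synthetic capacities (NOT the real PRIS records): normal draws based on
# the unit counts, means and standard deviations reported above.
set.seed(1)
n <- c(Canada = 19, China = 77, France = 57, Russia = 40, `United States` = 94)
mu <- c(769.9, 1046.0, 1152.5, 810.1, 1090.2)
sigma <- c(167.6, 290.4, 239.6, 387.2, 217.8)
plants <- data.frame(
  country = rep(names(n), n),
  mw = rnorm(sum(n), rep(mu, n), rep(sigma, n))
)

# One summary row per country for the mw column.
summary_tbl <- plants |>
  group_by(country) |>
  skim(mw)

print(summary_tbl)
```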

6 Plots, charts and tables

Once we are sure that our dataset is finally clean, we can start preparing visualizations, including plots, charts and tables.

The main graphics library in R is ggplot2, which is part of the tidyverse. Within the ggplot2 ecosystem, developers have created many compatible extensions.

For an exhaustive list of extensions, refer to:

Apart from static plots, we can easily produce dynamic or animated plots with packages such as plotly, gganimate or ggiraph.

6.1 Column chart

For instance, let’s use ggplot, ggblanket and ggthemes to produce a nice-looking column chart showing the evolution of installed MW by year and technology.

We can repeat the plot showing the evolution by country instead of by technology.
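
A stripped-down version of such a chart, written with plain ggplot2 rather than ggblanket or ggthemes, might look like the sketch below. The `capacity` data frame is invented purely for illustration (generic country labels, made-up years and MW values), and `col_by()` is a hypothetical helper that lets us switch the fill variable between technology and country.

```r
library(ggplot2)

# Invented placeholder data: NOT taken from PRIS.
capacity <- data.frame(
  year = rep(c(2000, 2010, 2020), each = 2),
  type = rep(c("PHWR", "PWR"), times = 3),
  country = rep(c("Country A", "Country B"), times = 3),
  mw = c(900, 1000, 900, 3000, 0, 5000)
)

# Hypothetical helper: stacked columns of MW per year, filled by `fill_var`.
col_by <- function(data, fill_var) {
  ggplot(data, aes(x = factor(year), y = mw, fill = .data[[fill_var]])) +
    geom_col() +
    labs(x = "Year", y = "Installed capacity (MW)", fill = fill_var)
}

p_type <- col_by(capacity, "type")        # evolution by technology
p_country <- col_by(capacity, "country")  # same plot, by country
```

Printing `p_type` or `p_country` in an interactive session draws the chart.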

6.2 Density plots

Interested in plotting the density distribution of NPPs by country and technology? Easy!

What we can see is that there are only PHWR plants in Canada, while the PWR technology dominates in France.

The dominant installed capacities in France are ~900 MW and ~1,200 MW, while the MW in Russia, China and the US are much more dispersed.
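
A density plot by country can be sketched as below. As the unit-level capacities are not included in this text, the example uses synthetic normal draws based on the unit counts, means and standard deviations from the skimr summary, so it will not show the real shapes (for example, the two French peaks) and it omits the technology dimension.

```r
library(ggplot2)

# Synthetic capacities (NOT the real PRIS records), based on the counts,
# means and standard deviations reported in the summary table.
set.seed(1)
n <- c(Canada = 19, China = 77, France = 57, Russia = 40, `United States` = 94)
mu <- c(769.9, 1046.0, 1152.5, 810.1, 1090.2)
sigma <- c(167.6, 290.4, 239.6, 387.2, 217.8)
plants <- data.frame(
  country = rep(names(n), n),
  mw = rnorm(sum(n), rep(mu, n), rep(sigma, n))
)

# One density curve per country, stacked vertically for easy comparison.
p_density <- ggplot(plants, aes(x = mw, fill = country)) +
  geom_density(alpha = 0.6) +
  facet_wrap(~country, ncol = 1) +
  labs(x = "Capacity (MW)", y = "Density") +
  theme(legend.position = "none")
```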

6.3 Column chart using ggblanket

Another variation of a column chart using ggblanket.

6.4 Tile geoms

The chart below shows the total installed capacity by country and technology for plants in operation or under construction. PWR is the dominant technology, particularly in China and France, while Canada relies exclusively on the PHWR technology.
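
The idea behind a tile chart can be sketched with `geom_tile()`. Because capacity by country and technology is not tabulated in this text, the example fills the tiles with the number of units from the frequency table instead of installed MW; the colour scale is an arbitrary choice.

```r
library(ggplot2)

# Unit counts by country and technology, copied from the frequency table.
counts <- data.frame(
  country = c("Canada", rep("China", 4), rep("Russia", 3), "France",
              rep("United States", 2)),
  type = c("PHWR", "FBR", "HTGR", "PHWR", "PWR", "FBR", "LWGR", "PWR",
           "PWR", "BWR", "PWR"),
  n = c(19, 3, 1, 2, 71, 3, 11, 26, 57, 31, 63)
)

# One tile per country/technology pair, shaded and labelled by unit count.
p_tile <- ggplot(counts, aes(x = type, y = country, fill = n)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = n)) +
  scale_fill_gradient(low = "grey90", high = "steelblue") +
  labs(x = "Technology", y = NULL, fill = "Units")
```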

6.5 Box plot

6.6 Beeswarm plots

A beeswarm plot conveys the size of a group of items by visually clustering the individual data points. Here we can use ggbeeswarm along with ggplot.
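
A minimal beeswarm sketch with `geom_beeswarm()` follows. The `plants` data are synthetic normal draws based on the unit counts, means and standard deviations from the skimr summary, not the real unit capacities, so a few simulated values may fall outside the real range.

```r
library(ggplot2)
library(ggbeeswarm)

# Synthetic capacities (NOT the real PRIS records).
set.seed(1)
n <- c(Canada = 19, China = 77, France = 57, Russia = 40, `United States` = 94)
mu <- c(769.9, 1046.0, 1152.5, 810.1, 1090.2)
sigma <- c(167.6, 290.4, 239.6, 387.2, 217.8)
plants <- data.frame(
  country = rep(names(n), n),
  mw = rnorm(sum(n), rep(mu, n), rep(sigma, n))
)

# Each point is one unit; points are nudged sideways so they do not overlap.
p_bee <- ggplot(plants, aes(x = country, y = mw, colour = country)) +
  geom_beeswarm() +
  labs(x = NULL, y = "Capacity (MW)") +
  theme(legend.position = "none")
```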

6.7 Patchwork library

patchwork is a package that expands ggplot to allow for arbitrarily complex composition of plots by, among others, providing mathematical operators for combining multiple plots.

For instance, if p1 and p2 are ggplot objects, then p1 / p2 produces the following plot:

While p2 + p3 yields:
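
The two operators can be tried out with the sketch below. The definitions of `p1`, `p2` and `p3` are ours (simple column charts built from the unit counts in the frequency table); the original plots may have been different.

```r
library(ggplot2)
library(patchwork)

# Unit counts by country and technology, copied from the frequency table.
counts <- data.frame(
  country = c("Canada", rep("China", 4), rep("Russia", 3), "France",
              rep("United States", 2)),
  type = c("PHWR", "FBR", "HTGR", "PHWR", "PWR", "FBR", "LWGR", "PWR",
           "PWR", "BWR", "PWR"),
  n = c(19, 3, 1, 2, 71, 3, 11, 26, 57, 31, 63)
)

p1 <- ggplot(counts, aes(x = country, y = n)) +
  geom_col() +
  labs(title = "Units by country")
p2 <- ggplot(counts, aes(x = type, y = n)) +
  geom_col() +
  labs(title = "Units by technology")
p3 <- ggplot(counts, aes(x = country, y = n, fill = type)) +
  geom_col() +
  labs(title = "Units by country and technology")

stacked <- p1 / p2       # p1 on top of p2
side_by_side <- p2 + p3  # p2 next to p3
```

In short, `/` stacks plots vertically and `+` places them next to each other.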

6.8 Mosaic plots

Designed to create visualizations of categorical data, geom_mosaic() has the capability to produce bar charts, stacked bar charts, mosaic plots, and double decker plots and therefore offers a wide range of potential plots.

6.9 Slopegraphs

Slopegraphs are simplified line graphs in which neighboring points are connected by a straight line. The package CGPfunctions draws slopegraphs in R.

6.10 Bump charts

Bump charts are used to plot rankings over time, or in other cases where the path between two nodes has no statistical significance. The package ggbump draws bump charts in R.
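
A bare-bones bump chart with `geom_bump()` could look like this. The `ranks` data frame is invented for illustration only (generic country labels and made-up rankings), since no ranking over time is tabulated in this text.

```r
library(ggplot2)
library(ggbump)

# Invented placeholder rankings: NOT taken from PRIS.
ranks <- data.frame(
  year = rep(c(2000, 2010, 2020), times = 3),
  country = rep(c("Country A", "Country B", "Country C"), each = 3),
  rank = c(1, 2, 3,  2, 1, 1,  3, 3, 2)
)

# Smooth sigmoid curves connect each country's rank from year to year.
p_bump <- ggplot(ranks, aes(x = year, y = rank, colour = country)) +
  geom_bump() +
  geom_point(size = 3) +
  scale_y_reverse(breaks = 1:3) +  # rank 1 at the top
  labs(x = "Year", y = "Rank", colour = NULL)
```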

7 Maps

It is always advisable to draw maps: the audience loves maps! For illustrative purposes, we will generate an interactive map with the locations and characteristics of nuclear power plants in Canada, with the help of the AI and the leaflet package.
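
A minimal leaflet sketch is shown below. The site names and coordinates in `sites` were typed in by hand and are approximate; they are our assumption and should be verified against an authoritative source before reuse. The original map probably carried more plant characteristics in its popups.

```r
library(leaflet)

# Approximate site coordinates, entered by hand for illustration only.
sites <- data.frame(
  site = c("Bruce", "Darlington", "Pickering", "Point Lepreau"),
  lat = c(44.33, 43.87, 43.81, 45.07),
  lng = c(-81.60, -78.72, -79.07, -66.45)
)

# Base map tiles plus one clickable marker per site.
npp_map <- leaflet(sites) |>
  addTiles() |>
  addCircleMarkers(lng = ~lng, lat = ~lat, popup = ~site, radius = 6)

# Type `npp_map` in an interactive session to display the map.
```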