The International Atomic Energy Agency provides information about nuclear power plants by country and region in this database: https://pris.iaea.org/pris/CountryStatistics/
Unfortunately, it is not possible to download the datasets, so we asked ChatGPT to write some R code to help us extract the required information. The code generated by the robot produced the desired results.
In this exercise we will use the following packages:
- rvest: Easily Harvest (Scrape) Web Pages
- tidyverse: A collection of R packages designed for data science
- janitor: Simple Tools for Examining and Cleaning Dirty Data
- skimr: Compact and Flexible Summaries of Data
- ggblanket: Simplify ggplot2 Visualisation
- ggbeeswarm: Categorical Scatter (Violin Point) Plots
- hrbrthemes: Additional Themes, Theme Components and Utilities for ggplot2
- scales: Scale Functions for Visualization
- knitr: A General-Purpose Package for Dynamic Report Generation in R
- ggthemes: Extra Themes, Scales and Geoms for ggplot2
- patchwork: The Composer of Plots
- ggmosaic: Mosaic Plots in the ggplot2 Framework
- CGPfunctions: Package to Draw Slopegraphs in R
- ggbump: Bump Chart and Sigmoid Curves
- wesanderson: A Wes Anderson Palette Generator
- leaflet: Create Interactive Web Maps with the JavaScript ‘Leaflet’ Library

The code produced by ChatGPT worked flawlessly. We just had to add the country names to each subset in order to use this information for further analysis.
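The scraping code itself is not reproduced here, so the following is only a minimal sketch of the general approach with rvest: read an HTML table from each country page, then add the country name as a column before stacking the subsets. The inline HTML, the column headers and the unit names and values are hypothetical stand-ins; in practice each page would come from `rvest::read_html()` applied to the relevant country URL.

```r
library(rvest)
library(dplyr)
library(purrr)

# Hypothetical stand-ins for two downloaded country pages (made-up content).
# In a real run each element would be rvest::read_html(<country page URL>).
pages <- list(
  Canada = minimal_html("<table>
    <tr><th>Name</th><th>Type</th><th>Capacity (MW)</th></tr>
    <tr><td>UNIT-A</td><td>PHWR</td><td>800</td></tr>
    <tr><td>UNIT-B</td><td>PHWR</td><td>900</td></tr>
  </table>"),
  France = minimal_html("<table>
    <tr><th>Name</th><th>Type</th><th>Capacity (MW)</th></tr>
    <tr><td>UNIT-C</td><td>PWR</td><td>1300</td></tr>
  </table>")
)

# Extract the first table on each page and tag every row with its country
plants <- imap(pages, function(page, country_name) {
  page |>
    html_element("table") |>
    html_table() |>
    mutate(country = country_name, .before = 1)
}) |>
  bind_rows()

plants
```

Keeping the country as an explicit column is what makes it possible to group, tabulate and plot by country later on.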
More often than not, the initial datasets need to be checked, revised
and cleaned in order to facilitate further analysis. There are good
libraries that make this work easier. In this case, we will use the
janitor and dplyr (part of the tidyverse) packages
to clean the dataset.
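As an illustration of this kind of cleaning, here is a minimal sketch on a small made-up table; the column names and values are assumptions, not the actual scraped data. `janitor::clean_names()` standardises the headers and dplyr handles the type conversion and filtering.

```r
library(janitor)
library(dplyr)

# Hypothetical raw table with untidy headers, as scraping often produces
raw <- data.frame(
  `Country Name`  = c("Canada", "France", "France"),
  `Reactor Type`  = c("PHWR", "PWR", "PWR"),
  `Capacity (MW)` = c("800", "1,300", NA),
  check.names = FALSE
)

clean <- raw |>
  clean_names() |>   # headers become country_name, reactor_type, capacity_mw
  mutate(capacity_mw = as.numeric(gsub(",", "", capacity_mw))) |>  # "1,300" -> 1300
  filter(!is.na(capacity_mw))  # drop rows without a capacity

clean
```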
Once we have cleaned the dataset, we need to carry out an exploratory analysis to check for inconsistencies, errors, missing values, data distributions, relationships among variables, etc.
Out of the many packages out there, we will use (again)
janitor and skimr.
janitor::tabyl to create frequency tables

For instance, we can use janitor::tabyl to create a
frequency table for the nuclear plants in operation or under
construction.
The output of tabyl can be “piped” to
knitr::kable in order to automatically create formatted
tables in PDF, Word or HTML.
| country | BWR | FBR | HTGR | LWGR | PHWR | PWR | Total |
|---|---|---|---|---|---|---|---|
| Canada | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 19 (100.0%) | 0 (0.0%) | 19 (100.0%) |
| China | 0 (0.0%) | 3 (3.9%) | 1 (1.3%) | 0 (0.0%) | 2 (2.6%) | 71 (92.2%) | 77 (100.0%) |
| France | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 57 (100.0%) | 57 (100.0%) |
| Russia | 0 (0.0%) | 3 (7.5%) | 0 (0.0%) | 11 (27.5%) | 0 (0.0%) | 26 (65.0%) | 40 (100.0%) |
| United States | 31 (33.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 63 (67.0%) | 94 (100.0%) |
| Total | 31 (10.8%) | 6 (2.1%) | 1 (0.3%) | 11 (3.8%) | 21 (7.3%) | 217 (75.6%) | 287 (100.0%) |
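A table with this layout (counts followed by row percentages, plus totals) could be built along the following lines. This is a sketch on a tiny illustrative data frame, not the real IAEA data, and the column names `country` and `type` are assumptions.

```r
library(janitor)

# Illustrative data only: one row per reactor
reactors <- data.frame(
  country = c("Canada", "Canada", "Canada", "France", "France"),
  type    = c("PHWR", "PHWR", "PHWR", "PWR", "PWR")
)

freq <- reactors |>
  tabyl(country, type) |>             # two-way frequency table
  adorn_totals(c("row", "col")) |>    # add a Total row and a Total column
  adorn_percentages("row") |>         # convert counts to row-wise shares
  adorn_pct_formatting(digits = 1) |> # format shares as percentages
  adorn_ns(position = "front")        # put the counts in front of the percentages

knitr::kable(freq)
```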
skimr::skim for a quick exploratory analysis

Let’s say we are interested in a quick analysis of the installed
capacity (MW) by country for plants under construction or in operation.
We can use the skimr::skim() function.
| type | var | country | missing | complete | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| numeric | MW | Canada | 0 | 1 | 769.9 | 167.6 | 540 | 542 | 868 | 881.5 | 934 |
| numeric | MW | China | 0 | 1 | 1046.0 | 290.4 | 25 | 1060 | 1089 | 1200.0 | 1750 |
| numeric | MW | France | 0 | 1 | 1152.5 | 239.6 | 917 | 951 | 956 | 1381.0 | 1650 |
| numeric | MW | Russia | 0 | 1 | 810.1 | 387.2 | 12 | 440 | 1000 | 1000.0 | 1255 |
| numeric | MW | United States | 0 | 1 | 1090.2 | 217.8 | 560 | 922 | 1192 | 1250.0 | 1500 |
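A grouped summary of this kind can be sketched as follows. The data frame is illustrative and its column names (`country`, `MW`) are assumptions; skimr labels its output columns slightly differently from the table above (for example `n_missing`, `complete_rate`, `numeric.mean`).

```r
library(dplyr)
library(skimr)

# Illustrative values only, not IAEA figures
reactors <- data.frame(
  country = c("Canada", "Canada", "Canada", "France", "France"),
  MW      = c(500, 800, 900, 900, 1600)
)

# One summary row per country for the MW column
capacity_summary <- reactors |>
  group_by(country) |>
  skim(MW)

capacity_summary
```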
Once we are sure that our dataset is finally clean, we can start preparing visualizations, including plots, charts and tables.
The main graphics library in R is ggplot2,
which is part of the tidyverse. Within the
ggplot2 ecosystem, developers have created many
compatible extensions.
For an exhaustive list of extensions, refer to:
Apart from static plots, we can easily produce dynamic or animated
plots with packages such as plotly, gganimate
or ggiraph.
For instance, let’s use ggplot, ggblanket
and ggthemes to produce a nice-looking column chart showing
the evolution of installed MW by year and technology.
We can repeat the plot showing the evolution by country instead of by technology.
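The two column charts described above could be sketched with plain ggplot2 and a ggthemes theme as shown below. The data frame, its column names and the helper `plot_capacity()` are hypothetical, and ggblanket is left out of this sketch.

```r
library(ggplot2)
library(ggthemes)

# Illustrative values only, not IAEA figures
capacity <- data.frame(
  year    = rep(c(2000, 2010, 2020), each = 2),
  type    = rep(c("PWR", "PHWR"), times = 3),
  country = c("France", "Canada", "China", "Canada", "China", "Canada"),
  MW      = c(1500, 900, 2000, 700, 3300, 800)
)

# Stacked columns of MW per year, coloured by the chosen grouping column
plot_capacity <- function(data, group_col) {
  ggplot(data, aes(x = factor(year), y = MW, fill = .data[[group_col]])) +
    geom_col() +
    labs(x = "Year", y = "Installed capacity (MW)", fill = group_col) +
    theme_few()
}

by_technology <- plot_capacity(capacity, "type")
by_country    <- plot_capacity(capacity, "country")
```

Wrapping the plot in a small function means that switching from technology to country only requires changing the grouping column.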
Interested in plotting the density distribution of NPPs by country and technology? Easy!
What we can see is that there are only PHWR plants in Canada, while the PWR technology dominates in France.
The dominant installed capacities in France are ~900 MW and ~1,200 MW, while unit capacities in Russia, China and the US are much more dispersed.
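A density plot of this kind can be sketched as follows. The capacities are simulated purely for illustration (they are not the IAEA data), and the column names are assumptions.

```r
library(ggplot2)

set.seed(1)
# Simulated unit capacities, for illustration only
reactors <- data.frame(
  country = rep(c("Canada", "France"), each = 30),
  type    = rep(c("PHWR", "PWR"), each = 30),
  MW      = c(rnorm(30, mean = 750, sd = 150),
              rnorm(15, mean = 900, sd = 30), rnorm(15, mean = 1300, sd = 40))
)

# One panel per country, one density curve per technology
density_plot <- ggplot(reactors, aes(x = MW, fill = type)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ country) +
  labs(x = "Capacity (MW)", y = "Density", fill = "Technology")

density_plot
```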
ggblanket

Another variation of a column chart using ggblanket.
The chart below shows the total installed capacity by country and technology for plants in operation or under construction. PWR is the dominant technology, particularly in China and France, while Canada relies exclusively on the PHWR technology.
A beeswarm plot conveys the size of a group of items by visually
clustering each individual data point. Here we can use
ggbeeswarm along with ggplot2.
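A minimal sketch of a beeswarm plot of unit capacity by country is shown below, again on simulated data with assumed column names.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(42)
# Simulated unit capacities, for illustration only
reactors <- data.frame(
  country = rep(c("Canada", "France"), each = 25),
  type    = rep(c("PHWR", "PWR"), each = 25),
  MW      = c(rnorm(25, mean = 750, sd = 150), rnorm(25, mean = 1100, sd = 200))
)

# Each point is one reactor; points are spread sideways to avoid overlap
swarm_plot <- ggplot(reactors, aes(x = country, y = MW, colour = type)) +
  geom_beeswarm(cex = 3) +
  labs(x = NULL, y = "Capacity (MW)", colour = "Technology")

swarm_plot
```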
patchwork is a package that expands ggplot
to allow for arbitrarily complex composition of plots by, among others,
providing mathematical operators for combining multiple plots.
For instance, if p1 and p2 are
ggplot objects, then p1 / p2 produces the
following plot:
While p2 + p3 yields:
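The two operators can be tried out with any three ggplot objects. The sketch below uses the built-in `mtcars` dataset as a stand-in, so `p1`, `p2` and `p3` here are not the plots from this exercise.

```r
library(ggplot2)
library(patchwork)

# Three placeholder plots built from a dataset that ships with R
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

stacked      <- p1 / p2   # p1 on top of p2
side_by_side <- p2 + p3   # p2 next to p3

stacked
side_by_side
```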
Designed to create visualizations of categorical data,
geom_mosaic() has the capability to produce bar charts,
stacked bar charts, mosaic plots, and double decker plots and therefore
offers a wide range of potential plots.
Slopegraphs are simplified line graphs in which neighboring points
are connected by a straight line. The package CGPfunctions
draws slopegraphs in R.
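As a rough sketch, a slopegraph can be drawn with `CGPfunctions::newggslopegraph()`, which takes a data frame followed by the time, measurement and grouping columns. The data, the generic country labels and the titles below are made up for illustration.

```r
library(CGPfunctions)

# Illustrative values only; the time column is supplied as an ordered factor
capacity <- data.frame(
  year    = factor(rep(c("2010", "2020"), each = 3), ordered = TRUE),
  country = rep(c("Country A", "Country B", "Country C"), times = 2),
  MW      = c(50, 30, 10, 60, 45, 80)
)

slope_plot <- newggslopegraph(
  dataframe   = capacity,
  Times       = year,
  Measurement = MW,
  Grouping    = country,
  Title       = "Installed capacity",
  SubTitle    = "Illustrative data",
  Caption     = NULL
)

slope_plot
```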
Bump charts are used to plot rankings over time, or in other cases
where the path between two nodes has no statistical significance. The
package ggbump draws bump charts in R.
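A minimal sketch of a bump chart is shown below: countries are ranked by capacity within each year and the ranks are then connected with `geom_bump()`. The data and the generic country labels are illustrative assumptions.

```r
library(dplyr)
library(ggplot2)
library(ggbump)

# Illustrative values only
capacity <- data.frame(
  year    = rep(c(2000, 2010, 2020), each = 3),
  country = rep(c("A", "B", "C"), times = 3),
  MW      = c(50, 30, 10, 55, 35, 40, 60, 45, 80)
)

# Rank countries within each year (1 = largest capacity)
ranked <- capacity |>
  group_by(year) |>
  mutate(rank = rank(-MW, ties.method = "first")) |>
  ungroup()

bump_plot <- ggplot(ranked, aes(x = year, y = rank, colour = country)) +
  geom_bump() +
  geom_point(size = 3) +
  scale_y_reverse(breaks = 1:3) +  # rank 1 at the top
  labs(x = "Year", y = "Rank", colour = "Country")

bump_plot
```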
It is always advisable to draw maps: the audience loves maps! For
illustrative purposes, we will generate an interactive map with the
locations and characteristics of nuclear power plants in Canada, with
the help of the AI and the leaflet package.
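A minimal sketch of such a map with leaflet follows. The site coordinates are approximate and entered by hand for illustration; they are not taken from the IAEA database, and the column names are assumptions.

```r
library(leaflet)

# Approximate locations of Canadian nuclear sites (hand-entered, illustrative)
plants <- data.frame(
  name = c("Bruce", "Darlington", "Pickering", "Point Lepreau"),
  type = "PHWR",
  lat  = c(44.33, 43.87, 43.81, 45.07),
  lng  = c(-81.60, -78.72, -79.07, -66.45)
)

plant_map <- leaflet(plants) |>
  addTiles() |>                       # default OpenStreetMap background
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    radius = 6,
    popup = ~paste0(name, " (", type, ")")  # shown when a marker is clicked
  )

plant_map
```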