library(tidyverse)
library(ggplot2)
library(knitr)
radiation <- read.csv("~/BUA 681/provisional-2016-bi-annual-aquatic-and-terrestrial-monitoring-results-26062017-raw-results.csv")

Requirements

2016 Radiological Monitoring Data from the UK

This data was obtained from data.gov.uk, a government-run site that provides open-use data. The licence is stated explicitly on the site and confirmed at the bottom of the webpage, which says that “All content is available under the Open Government Licence v3.0, except where otherwise stated”. A link to the page where the data was downloaded is available here.

Graph 1: Highest Radiation by Sample

This graph shows the five individual samples with the highest amount of radiation.

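# Result_SFG is read in as character, so convert it to numeric before ranking;
# otherwise the sort is alphabetical and top_n() picks its own ranking column
topradiation <- radiation %>%
  drop_na() %>%
  mutate(Result_SFG = as.numeric(Result_SFG)) %>%
  arrange(desc(Result_SFG)) %>%
  top_n(5, Result_SFG) %>%
  rename(sample_type = DESCRIPTION, radiation_amount = Result_SFG)
ggplot(topradiation, aes(x = fct_reorder(sample_type, radiation_amount, .desc = FALSE), y = radiation_amount)) +
  geom_bar(position = "dodge", stat = "identity", fill = "deepskyblue1") +
  coord_flip() +
  ggtitle("Individual Samples with Highest Radiation")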

Table 1: Average Radiation by Sample Type

The table below shows the various sample types that were tested, the number of each sample type (n), and the average amount of radiation in Bq/kg or Bq/L:

radiation %>%
  drop_na() %>%
  group_by(DESCRIPTION) %>%
  summarize(n = n(), average_radiation = mean(as.numeric(Result_SFG), na.rm = TRUE)) %>%
  arrange(desc(average_radiation)) %>%
  top_n(20, average_radiation) %>%  # rank explicitly by the average
  kable()
| DESCRIPTION | n | average_radiation |
|:---|---:|---:|
| SOI - Soil | 13 | 9395.39962 |
| PEE - Edible winkle | 155 | 1104.23357 |
| RAB - Rabbit | 9 | 776.95974 |
| MUS - Blue (edible) mussel | 105 | 550.57173 |
| LAM - Lamb muscle | 32 | 329.32360 |
| DER - Deer | 6 | 263.71813 |
| COC - Common cockle | 11 | 196.40618 |
| LBE - European lobster | 64 | 190.03128 |
| LLK - Lamb liver & kidney | 40 | 118.54764 |
| CRE - Edible crab | 74 | 102.39907 |
| NEP - Norway lobster | 21 | 100.10405 |
| LUC - Lucerne | 2 | 93.41200 |
| GRS - Grass | 58 | 86.01175 |
| MSH - Mushroom | 7 | 82.14471 |
| BAR - Barley | 3 | 80.21395 |
| SOL - Sole (Dover sole) | 3 | 71.14200 |
| ESB - European sea bass | 2 | 67.96000 |
| WHT - Wheat | 21 | 62.21095 |
| HAD - Haddock | 2 | 61.29000 |
| POK - Saithe | 2 | 59.42900 |

Graph 2: Average Radiation by Site Location

In addition to which types of samples have the highest radiation, we can also look at WHERE the highest radiation occurs by grouping by collection site.

radlocation <- radiation %>%
  drop_na() %>%
  group_by(SITE_NAME) %>%
  summarize(n = n(), average_radiation = mean(as.numeric(Result_SFG), na.rm = TRUE)) %>%
  arrange(desc(average_radiation)) %>%
  top_n(5, average_radiation) %>%  # rank explicitly by the average
  rename(Site_Name = SITE_NAME, radiation_amount = average_radiation)
ggplot(radlocation, aes(x = fct_reorder(Site_Name, radiation_amount, .desc = FALSE), y = radiation_amount)) +
  geom_bar(position = "dodge", stat = "identity", fill = "deepskyblue1") +
  coord_flip() +
  ggtitle("Locations with Highest Average Radiation")

Table 2: Locations with Highest Average Radiation, including sample sizes

Something important to consider with the above graph is the sample size for each location. Sellafield has a disproportionately high average radiation compared to the other locations, but the number of samples, n, in the table below shows that nearly 4,600 samples were tested from this location. That is more than 27 times as many as the next most-sampled location in the top five (Harwell, with 170):

radiation %>%
  group_by(SITE_NAME) %>%
  summarize(n = n(), average_radiation = mean(as.numeric(Result_SFG), na.rm = TRUE)) %>%
  arrange(desc(average_radiation)) %>%
  rename(Site_Name = SITE_NAME) %>%
  top_n(5, average_radiation) %>%  # rank explicitly by the average
  kable()
| Site_Name | n | average_radiation |
|:---|---:|---:|
| Sellafield | 4599 | 368.93201 |
| Ascot | 32 | 141.04380 |
| Capenhurst | 128 | 94.07750 |
| Vickers shipyard | 32 | 85.76607 |
| Harwell | 170 | 68.43296 |

Discussion

I chose this data set because I thought it was interesting that many commonly consumed food items contain at least some level of radiation. The data set measures radiation in either Bq/kg or Bq/L and contains measurements from over 10,000 samples taken in 2016. I would have expected more of a trend in radiation levels, but the data shows that the highest radiation was found in an escallop, followed by potatoes and deer, which span seafood, crops, and meat. Looking at types of sample instead of individual samples, soil had the highest average radiation, which is what we would expect if radiation is picked up from the environment.

When shifting focus to the location of irradiated samples, it initially looks as though Sellafield contained the most irradiated samples, with more than double the average of the next highest location; however, the sample counts show how unevenly the sites were sampled, so the averages are not directly comparable. Sellafield's average rests on nearly 4,600 samples, while Ascot, the second highest in average radiation, had only 0.7% of Sellafield's sample size. Considering the number of samples, I would be far less likely to want to consume something from Ascot. Yikes!
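
As a quick check on the comparisons in this section, the ratios can be recomputed from the values printed in Table 2. This is only a minimal sketch in base R: the data frame `sites` and the variable names `avg_ratio`, `ascot_share`, and `n_ratio` are introduced here for illustration and are not part of the original analysis.

```r
# Values copied by hand from Table 2 (n and average_radiation per site)
sites <- data.frame(
  Site_Name = c("Sellafield", "Ascot", "Capenhurst", "Vickers shipyard", "Harwell"),
  n = c(4599, 32, 128, 32, 170),
  average_radiation = c(368.93201, 141.04380, 94.07750, 85.76607, 68.43296)
)

sellafield <- sites[sites$Site_Name == "Sellafield", ]
ascot <- sites[sites$Site_Name == "Ascot", ]
others <- sites[sites$Site_Name != "Sellafield", ]

# Sellafield's average relative to the next highest average (Ascot)
avg_ratio <- sellafield$average_radiation / ascot$average_radiation

# Ascot's sample count as a percentage of Sellafield's
ascot_share <- 100 * ascot$n / sellafield$n

# Sellafield's sample count relative to the next most-sampled site in the table
n_ratio <- sellafield$n / max(others$n)

print(round(c(avg_ratio = avg_ratio, ascot_share = ascot_share, n_ratio = n_ratio), 2))
```

The three numbers correspond to "more than double the average", "0.7% of the sample size", and "more than 27 times" in the text.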

Something I had trouble with when working with the data was deciphering what each column represented, as there was no guide to break down the acronyms. Many of the columns were not needed for the analysis that I performed, and there were multiple columns that had slightly different results (Non-Amended Results, Amended Results, Results SFG). All analysis was done using the Result_SFG column, as this appeared to be the result after accounting for slight variations in testing. Another struggle was manipulating the Result_SFG data to show the information I wanted. While the values were numerical, R read them in as a character vector and thus would not accept functions such as mean() to find the average value for groups. I needed to use the as.numeric() function several times to transform the data into something that could be manipulated in this manner.
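
The character-versus-numeric problem can be reproduced on a small scale. The sketch below uses a made-up vector, `result_chr`, as a stand-in for the Result_SFG column. The "<0.1" entry is an assumption about the kind of value that might force a column to be read as text; it is not taken from the actual file.

```r
# Hypothetical stand-in for Result_SFG as read from the CSV (not real data)
result_chr <- c("12.5", "3.2", "<0.1", "40")

# mean() on a character vector returns NA (with a warning) instead of an average
mean_of_text <- suppressWarnings(mean(result_chr))

# as.numeric() converts the numeric-looking strings; entries it cannot parse,
# such as "<0.1", become NA (R warns about this, suppressed here)
result_num <- suppressWarnings(as.numeric(result_chr))

# na.rm = TRUE drops the unparsed entries so the mean can be computed
mean_of_numbers <- mean(result_num, na.rm = TRUE)

print(mean_of_text)
print(result_num)
print(mean_of_numbers)
```

Converting the column once, right after reading the file (for example `radiation$Result_SFG <- as.numeric(radiation$Result_SFG)`), would avoid repeating as.numeric() in every summary. Any value that fails to parse is then excluded by na.rm = TRUE, so it is worth checking how many rows that affects.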