library(tidyverse)
library(ggplot2)
library(knitr)
radiation <- read.csv("~/BUA 681/provisional-2016-bi-annual-aquatic-and-terrestrial-monitoring-results-26062017-raw-results.csv")
This data was obtained from data.gov.uk, a government-run site that provides open data. The site states this explicitly, and the footer of the webpage confirms that “All content is available under the Open Government Licence v3.0, except where otherwise stated”. A link to the page where the data was downloaded is available here.
The chart below shows the 5 individual samples with the highest amount of radiation:
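# Result_SFG is read in as character, so convert it to numeric before ranking; top_n() is given the ranking column explicitly
topradiation <- radiation %>% drop_na() %>% mutate(Result_SFG = as.numeric(Result_SFG)) %>% top_n(5, Result_SFG) %>% arrange(desc(Result_SFG)) %>% rename(sample_type = DESCRIPTION, radiation_amount = Result_SFG)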
ggplot(topradiation, aes(x = fct_reorder(sample_type, radiation_amount, .desc = FALSE), y = radiation_amount)) +
geom_bar(position = "dodge", stat = "identity", fill = "deepskyblue1") +
coord_flip() +
ggtitle("Individual Samples with Highest Radiation")
The table below shows the 20 sample types with the highest average radiation, the number of samples of each type (n), and the average amount of radiation in Bq/kg or Bq/L:
radiation %>% drop_na() %>% group_by(DESCRIPTION) %>% summarize(n = n(), average_radiation = mean(as.numeric(Result_SFG), na.rm = TRUE)) %>% arrange(desc(average_radiation)) %>% top_n(20, average_radiation) %>% kable()
| DESCRIPTION | n | average_radiation |
|---|---|---|
| SOI - Soil | 13 | 9395.39962 |
| PEE - Edible winkle | 155 | 1104.23357 |
| RAB - Rabbit | 9 | 776.95974 |
| MUS - Blue (edible) mussel | 105 | 550.57173 |
| LAM - Lamb muscle | 32 | 329.32360 |
| DER - Deer | 6 | 263.71813 |
| COC - Common cockle | 11 | 196.40618 |
| LBE - European lobster | 64 | 190.03128 |
| LLK - Lamb liver & kidney | 40 | 118.54764 |
| CRE - Edible crab | 74 | 102.39907 |
| NEP - Norway lobster | 21 | 100.10405 |
| LUC - Lucerne | 2 | 93.41200 |
| GRS - Grass | 58 | 86.01175 |
| MSH - Mushroom | 7 | 82.14471 |
| BAR - Barley | 3 | 80.21395 |
| SOL - Sole (Dover sole) | 3 | 71.14200 |
| ESB - European sea bass | 2 | 67.96000 |
| WHT - Wheat | 21 | 62.21095 |
| HAD - Haddock | 2 | 61.29000 |
| POK - Saithe | 2 | 59.42900 |
In addition to which types of samples have the highest radiation, we can also look at where the highest radiation occurs by grouping by collection site.
radlocation <- radiation %>% drop_na() %>% group_by(SITE_NAME) %>% summarize(n = n(), average_radiation = mean(as.numeric(Result_SFG), na.rm = TRUE)) %>% arrange(desc(average_radiation)) %>% top_n(5, average_radiation) %>% rename(Site_Name = SITE_NAME, radiation_amount = average_radiation)
ggplot(radlocation, aes(x = fct_reorder(Site_Name, radiation_amount, .desc = FALSE), y = radiation_amount)) +
geom_bar(position = "dodge", stat = "identity", fill = "deepskyblue1") +
coord_flip() +
ggtitle("Locations with Highest Average Radiation")
Something important to consider with the above graph is the sample size for each location. Sellafield has a disproportionately high average compared to the other locations, but the number of samples, n, in the table below shows that nearly 4,600 samples were tested from this location, more than 27 times as many as at Harwell (170), the next most-sampled location among the top five by average radiation:
radiation %>% group_by(SITE_NAME) %>% summarize(n = n(), average_radiation = mean(as.numeric(Result_SFG), na.rm = TRUE)) %>% arrange(desc(average_radiation)) %>% rename(Site_Name = SITE_NAME) %>% top_n(5, average_radiation) %>% kable()
| Site_Name | n | average_radiation |
|---|---|---|
| Sellafield | 4599 | 368.93201 |
| Ascot | 32 | 141.04380 |
| Capenhurst | 128 | 94.07750 |
| Vickers shipyard | 32 | 85.76607 |
| Harwell | 170 | 68.43296 |
I chose this data set because I thought it was interesting that many commonly consumed food items contain at least some level of radiation. The data set reports radiation in either Bq/kg or Bq/L and contains measurements from over 10,000 samples in 2016. I would have expected more of a trend in radiation levels, but the data shows that the highest radiation was found in an escallop, followed by potatoes and deer, which span seafood, crops, and meat. Looking at sample types rather than individual samples, soil had the highest average radiation, which is what we would expect for something that is picked up from the environment.
When shifting focus to the location of irradiated samples, it initially looks as though Sellafield contained the most irradiated samples, with an average more than double that of the next highest location; however, looking at the data another way shows that this outlier is most likely due to the very large sample size. Considering the number of samples, I would be far less likely to want to consume something from Ascot, which had the second highest average radiation with only 0.7% of the sample size of Sellafield. Yikes!
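The comparisons quoted above can be checked with a few lines of arithmetic. This is only a minimal sketch: it retypes the values from the site table by hand rather than recomputing them from the raw file, and the `sites` data frame and variable names are illustrative, not part of the original analysis.

```r
# Values copied by hand from the site table (top five sites by average radiation).
sites <- data.frame(
  Site_Name = c("Sellafield", "Ascot", "Capenhurst", "Vickers shipyard", "Harwell"),
  n = c(4599, 32, 128, 32, 170),
  average_radiation = c(368.93201, 141.04380, 94.07750, 85.76607, 68.43296)
)

sellafield <- sites[sites$Site_Name == "Sellafield", ]
ascot      <- sites[sites$Site_Name == "Ascot", ]
others     <- sites[sites$Site_Name != "Sellafield", ]

# Sellafield's average relative to the next highest average (Ascot).
avg_ratio <- sellafield$average_radiation / ascot$average_radiation

# Ascot's sample count as a percentage of Sellafield's.
n_share_pct <- 100 * ascot$n / sellafield$n

# Sellafield's sample count relative to the largest count among the other four sites.
n_ratio <- sellafield$n / max(others$n)

print(round(c(avg_ratio = avg_ratio, n_share_pct = n_share_pct, n_ratio = n_ratio), 2))
```

Working from the printed table keeps the check independent of how the raw file is read in, at the cost of having to retype the values if the table changes.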
Something I had trouble with when working with the data was deciphering what each column represented, as there was no guide to break down the acronyms. Many of the columns were not needed for the analysis that I performed, and there were multiple columns that had slightly different results (Non-Amended Results, Amended Results, Results SFG). All analysis was done using the Result_SFG column, as this appeared to be the result of accounting for slight variations in testing. Another struggle was manipulating the Result_SFG data to show the information I wanted. While the values were numerical, R read them in as character vectors and thus would not accept functions such as mean() to find the average value for groups. I needed to use the as.numeric() function several times in order to transform the data into something that could be manipulated in this manner.
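One way to avoid repeating the conversion is to do it once, up front, and then summarize the numeric column. The sketch below uses a small made-up data frame (`toy`) in place of the real file, and the "not detected" entry is a hypothetical example of a non-numeric value, not something confirmed to be in the data.

```r
library(dplyr)

# Hypothetical stand-in for the real data: Result_SFG arrives as character.
toy <- tibble(
  DESCRIPTION = c("SOI - Soil", "SOI - Soil", "GRS - Grass", "GRS - Grass"),
  Result_SFG  = c("10.5", "20.5", "3", "not detected")
)

# Convert once. Strings that are not numbers become NA; as.numeric() would
# normally warn about this, so the warning is suppressed here on purpose.
toy_numeric <- toy %>%
  mutate(Result_SFG = suppressWarnings(as.numeric(Result_SFG)))

# mean() now works directly on the numeric column.
toy_summary <- toy_numeric %>%
  group_by(DESCRIPTION) %>%
  summarize(n = n(), average_radiation = mean(Result_SFG, na.rm = TRUE))

print(toy_summary)
```

Note that `n()` counts every row in the group, including rows whose value became `NA`, while `mean(..., na.rm = TRUE)` averages only the values that converted cleanly, so the two columns can be based on different numbers of rows.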