Exploring the Periodic Table with R

	BoilingPoint	Density	MeltingPoint
Mean	2513.14	9.23	1281.48
Std.Dev	1601.90	8.63	903.69
Min	4.22	0.00	14.18
Q1	1040.00	2.70	505.21
Median	2767.00	7.29	1204.15
Q3	3609.00	12.00	1812.15
Max	5869.00	41.00	3948.15
MAD	1948.88	6.89	955.15
IQR	2527.75	9.30	1300.46
CV	0.64	0.93	0.71
Skewness	0.11	1.68	0.71
SE.Skewness	0.24	0.24	0.24
Kurtosis	-1.01	3.17	0.07
N.Valid	98	105	98
Pct.Valid	83.05	88.98	83.05
---
title: "Exploring the Periodic Table with R"
output: 
  flexdashboard::flex_dashboard:
    social: [ "twitter", "facebook", "menu" ]
    orientation: rows
    theme: cerulean
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(ggplot2)
library(readxl)
library(tidyverse)
library(plotly)
library(ggmap)
library(tmaptools)
library(DT)
library(lubridate)
library(leaflet)
library(countrycode)
library(png)
library(htmlwidgets)
library(htmltools)
library(wordcloud2)
library(summarytools)


require(knitr)
```

```{r prepare_data}
# Read data

ptable <- read_csv("https://gist.githubusercontent.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee/raw/1d92663004489a5b6926e944c1b3d9ec5c40900e/Periodic%2520Table%2520of%2520Elements.csv")

# Read country_data
# Created from https://www.smithsonianmag.com/smart-news/the-periodic-table-of-elemental-discoveries-1773011/

pcountries <- read_csv("data/pt_countries.csv")

ptable_countries <- ptable %>% 
  left_join(pcountries, by = c("AtomicNumber" = "atomic_number")) %>% 
  pivot_longer(cols = starts_with("country"),
               names_to = "country_list",
               names_prefix = "country",
               values_to = "country", 
               values_drop_na = TRUE) %>% 
  mutate(country_name = countrycode(country, origin = "iso3c", destination = "country.name")) %>% 
  mutate(address = case_when(country == "aut" ~ paste("Vienna, ", country_name),
                          country == "che" ~ paste("Bern, ", country_name),
                          country == "deu" ~ paste("Berlin, ", country_name),
                          country == "dnk" ~ paste("Copenhagen, ", country_name),
                          country == "esp" ~ paste("Madrid, ", country_name),
                          country == "fin" ~ paste("Helsinki, ", country_name),
                          country == "fra" ~ paste("Paris, ", country_name),
                          country == "gbr" ~ paste("London, ", country_name),
                          country == "ita" ~ paste("Rome, ", country_name),
                          country == "rou" ~ paste("Bucharest, ", country_name),
                          country == "rus" ~ paste("Moscow, ", country_name),
                          country == "swe" ~ paste("Stockholm, ", country_name),
                          country == "usa" ~ paste("Washington DC, ", country_name)))
# Geocode data
# Run only once to get geocode, then load from RDA file
# source("scripts/geocode_countries.R")
# geocoding <- geocode_countries(ptable_countries, "address")
# save(geocoding, file = "data/country_geocoding.rda")

load("data/country_geocoding.rda")

ptable_countries <- ptable_countries %>% 
  left_join(geocoding, by = c("address" = "query")) 

# Clean up
remove(geocoding)

```

About {data-icon="fa-address-card"}
=============================

Row {data-height=400}
-----------------------------

### About this dashboard

This dashboard was developed to showcase a number of awesome features of R (and other programming languages used in data analysis today). Many people still think programming is used by developers to build systems and is out of reach of everyone, but with this dashboard we'd like to show non-programmers how powerful, versatile and fun it is to learn tools like R or Python. 

The dashboard was developed for Chemistry Honours students in South Africa hence the focus on the periodic table. 

We hope this resource will assist others to build lessons around R (or other programming languages and data tools such as OpenRefine, FrictionlessData, and more).

### Reproducible data analysis is fun and good!

```{r words, fig.align="center"}

count_words <- c(8, 4, 8, 7, 5, 
                 3, 6, 3, 2, 4, 
                 3, 1, 2, 5, 2, 
                 4, 5, 6, 3, 2, 
                 2, 3, 4, 5, 6,
                 1, 2, 6, 4, 5,
                 5, 3, 2, 1, 2,
                 5, 3, 2, 1, 2,
                 5, 3, 2, 1, 2, 
                 2, 3, 4, 5, 6, 
                 5, 3, 2, 1, 2, 
                 2, 3, 4, 2, 1, 
                 2, 3, 2, 1, 2, 
                 2, 3, 2, 1, 2, 
                 2, 3, 2, 1, 2, 
                 2, 3, 2, 1, 2, 
                 2, 3, 2, 1, 2, 
                 2, 3, 2, 1, 2, 
                 2, 3, 2, 1, 2)
count_words <- count_words * count_words

words_df <- data.frame(words = c("fun", "powerful", "reproducible", "transferrable",
                                 "community", 
                                 "data science", "collaboration", "free",
                                 "open source", "R", 
                                 "Python", "OpenRefine", "FrictionlessData", "tidy data", "employability", 
                                 "learn", "courses", "online", "you can!", "possibilities", 
                                 "opportunities", "analysis", "visualisation", "dashboards", "mapping", 
                                 "gis", "presentations", "reports", "plots", "statistics", 
                                 "analysis", "manipulation", "wrangling", "cleaning", "combining", 
                                 "websites", "blogs", "books", "databases", "impress",
                                 "web apps", "interactive", "documentation", "AI", "machine learning",
                                 "APIs", "serve", "games", "cloud", "modeling", 
                                 "programming", "scripts", "calculate", "mean", "descriptive", 
                                 "present", "teach", "share", "open", "data",
                                 "geography", "chemistry", "science", "humanities", "social", 
                                 "grow", "chemistry", "biology", "genomics", "stats",
                                 "analysis", "learn", "new", "skills", "wonderful",
                                 "amazing", "cool", "interesting", "novel", "exciting",
                                 "will", "growth", "mindset", "can", "do",
                                 "challenge", "adventure", "achieve", "achievement", "CV",
                                 "resume", "expand", "imagine", "super", "strength"),
                       freq = count_words)


cloud <- wordcloud2(data = words_df, size=0.3, shape = "circle", color = rep_len(c("#ffbe0b", "#fb5607", "#ff006e", "#8338ec", "#3a86ff"), nrow(words_df)), fontFamily= "mono")

# Clean up 
remove(count_words, words_df, cloud)

```





Row
----------------------------

### Resources for learning R

Here are a few good resources if you'd like to jump in:

- RStudio suggests [6 Ways to Begin with R](https://education.rstudio.com/learn/beginner/)
- We've found that it's really important to find a community. You can join the [Carpentries community](https://carpentries.org) to learn about upcoming workshops in your region and to get access to awesome resources
- Twitter is very powerful and there are loads of handles to follow:
    - [rstudio](https://twitter.com/rstudio)
    - [AfricaRUsers](https://twitter.com/AfricaRUsers)
    - [WeAreRLadies](https://twitter.com/WeAreRLadies)
    - many, many more!
    
### Dashboard Instructions

The dashboard is like a storyboard - different pages can be accessed by clicking on the tabs at the top of the page (i.e. [About](#about), [Periodic Table](#periodic-table), [Data](#data), [Analysis](#analysis), [Visualisation](visualisation), [Maps](maps), and [Timeline](timeline)).

##### Data

- Comma-seperated value data for the periodic table was obtained from https://gist.github.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee.
- Country data was obtained from https://www.jamiebgall.co.uk/periodic-table?pgid=jry6l6ed-6997541b-20fd-431a-86ca-b27a46663898 by Dr Jamie Gallagher. The data is not based on the nationality of the researcher(s) but on the institution/funder.

##### Code
The project lives in a [Github repository](https://github.com/anelda/data-intro) and is published under an open license to encourage reuse.

##### Under the hood
The data and process along with more information about the packages that were used to create the different pages, are described in the [README file](https://github.com/anelda/showcaseR-periodic-table/blob/master/README.md) in the Github repository. 

_Please cite as_:  [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3878642.svg)](https://doi.org/10.5281/zenodo.3878642)

Anelda van der Walt. (2020, June 5). Exploring the Periodic Table with R (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.3878642




  

Periodic Table {data-icon="fa-columns"}
=============================

### Periodic Table

![Periodic Table - Image by Gerd Altmann from Pixabay](img/per_table.png)



Data {data-icon="fa-info-circle"}
=============================
Row {data-width=150}
--------------------------------------
### Total number of elements
```{r num_elements}
num_elements <- nrow(ptable)
valueBox(value = num_elements,icon = "fa-list-ol",caption = "Number of elements",color = "#89ff89")

# Clean up
remove(num_elements)
```

### Artificial
```{r artificial}
artificial <- ptable %>% filter(Phase == "artificial") %>% nrow()
valueBox(value = artificial,icon = "fa-vial", caption = "Artificial elements", color = "orange")

# Clean up
remove(artificial)
```

### Radioactive
```{r radioactive}
radioactive <- ptable %>% filter(Radioactive == "yes") %>% nrow()
valueBox(value = radioactive,icon = "fa-exclamation-triangle",caption = "Radioactive elements", color = "#ff4e4e")

# Clean up
remove(radioactive)
```


Row
----------------------------------

### Periodic table
```{r periodic_table}
datatable(ptable, class = 'cell-border stripe', caption="Source: https://gist.github.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee")
```

Analysis {data-icon="fa-binoculars"}
===================================

Row
----------------------

### Descriptive statistics

```{r statistics}

descr_table <- ptable %>% 
  select(Density, MeltingPoint, BoilingPoint)

print(descr(descr_table), method = 'render', table.classes = 'st-small')

# Clean up
remove(descr_table)
```

Visualisation {data-icon="fa-chart-bar"}
================================

Row
----------------------

### Density distribution for different element types

```{r results='asis'}

analysis1 <- ptable %>% 
  select(Element, Type, Density) %>% 
  filter(!is.na(Type), !is.na(Density)) %>% 
  group_by(Type) %>% 
  ggplot(aes(x=Type, y=Density, color=Type)) +
  geom_boxplot() +
  geom_jitter(position=position_jitter(0.2)) +
  labs(y="Density", x="Element type") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

analysis1

# Clean up
remove(analysis1)
```

### Bubble plot - Boiling point vs Melting point vs Density

```{r }

analysis2 <- ptable %>%
  # Remove NAs
  filter(!is.na(MeltingPoint), !is.na(BoilingPoint), !is.na(Density)) %>% 
  # Reorder countries to having big bubbles on top
  arrange(desc(Density)) %>%
  mutate(Element = factor(Element, Element)) %>%
  # prepare text for tooltip
  mutate(text = paste("Element: ", Element, "\nBoilingPoint: ", BoilingPoint, "\nMeltingPoint: ", MeltingPoint, "\nDensity: ", Density, sep="")) %>%
  # Classic ggplot
  ggplot( aes(x=BoilingPoint, y=MeltingPoint, size = Density, color = Type, text=text)) +
    geom_point(alpha=0.7) +
    scale_size(range = c(1.4, 14), name="Boiling Point") +
    theme(legend.position="none")

# turn ggplot interactive with plotly
analysis2_plot <- ggplotly(analysis2, tooltip="text")
analysis2_plot

# Clean up
remove(analysis2_plot, analysis2)

```

Maps {data-icon="fa-globe-africa"}
==================

Row
----------

### Where were funders/institutions based that were involved in discovery of elements?

This map was created by combining data from https://gist.github.com/GoodmanSciences/c2dd862cd38f21b0ad36b8f96b4bf1ee and  https://www.jamiebgall.co.uk/periodic-table?pgid=jry6l6ed-6997541b-20fd-431a-86ca-b27a46663898 and using OpenStreetMap geocoding through the [`tmaptools`](https://github.com/mtennekes/tmaptools) R package.

```{r maps}

# From https://www.earthdatascience.org/courses/earth-analytics/get-data-using-apis/leaflet-r/

map_count <- ptable_countries %>% 
  group_by(Element) %>% 
  tally(name = "count")

map_table <- ptable_countries %>% 
  left_join(map_count, by="Element") %>% 
  mutate(count = factor(count))

new <- c("lightblue", "lightgray","lightgreen")[map_table$count]

icons <- awesomeIcons(
  icon = 'ios-close',
  iconColor = 'black',
  library = 'ion',
  markerColor = new
)

unique_markers_map <- leaflet(map_table) %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(lng = -31, lat = 42, zoom = 2) %>% 
  addAwesomeMarkers(lng=~lon, lat=~lat, icon=icons,
                    popup=~paste(Element, " discovered by ", Discoverer),
                    label=~paste(Element, " funder/institution based in ", country_name),
                   clusterOptions = markerClusterOptions()) %>% 
  addLegend("topright", 
            colors =c("lightblue", "lightgray","lightgreen"),
            labels= c("1", "2","3"),
            title= "Number of countries involved in funding/discovery",
            opacity = 1) 

unique_markers_map

# Clean up
remove(new, icons, unique_markers_map, map_count, map_table)

```

Timeline {data-icon="fa-calendar-alt"}
====================

Row{data-height=250}
--------

### Earliest discovery
```{r}
earliest_year  <-  sort(ptable$Year)[1]
earliest_element <-  ptable %>% filter(Year == earliest_year)

valueBox(earliest_element$Element, caption = paste0("Earliest element discovered"," (", earliest_year, ")"),color = "#70A2F2")

# Clean up
remove(earliest_year, earliest_element)

```

### Most recent discovery
```{r}
last_year  <-  sort(ptable$Year, decreasing = TRUE)[1]
last_elements <-  ptable %>% filter(Year == last_year)
  
valueBox(last_elements$Element[2],caption = paste0("Most recent element discovered", " (", last_year, ")"),color = "#89ff89")

# Clean up
remove(last_year, last_elements)

```

### Discovery per century
```{r}
ptable  <-  ptable %>% 
  mutate(Century = case_when(Year >= 1200 & Year < 1300 ~ "13th Century",
                             Year >= 1600 & Year < 1700 ~ "17th Century",
                             Year >= 1700 & Year < 1800 ~ "18th Century",
                             Year >= 1800 & Year < 1900 ~ "19th Century",
                             Year >= 1900 & Year < 2000 ~ "20th Century",
                             Year >= 2000 & Year < 2100 ~ "21th Century",
                             is.na(Year) ~ "Unknown"))
century_table <- ptable %>% 
  group_by(Century) %>% 
  summarise(count=n())

century_plot <- plot_ly(century_table) %>% 
  add_pie(labels=century_table$Century,
          values=century_table$count,
          hole=0.6,
          textinfo="label+percent") %>% 
  layout(showlegend = FALSE)


century_plot

# Clean up
remove(century_table, century_plot)

```


Row
----------
### Timeline of discoveries since 1950
```{r}

# From https://benalexkeen.com/creating-a-timeline-graphic-using-r-and-ggplot2/

# Add column with proper date
ptable$date <- lubridate::ymd(ptable$Year, truncated = 2L)

ptable <- ptable %>% 
  filter(Year >= "1950") %>% 
  mutate(Type = case_when(is.na(Type) ~ "Not available",
                          TRUE ~ as.character(Type))
  )

# Add column with colour per country
type_levels <- as.character(unique(ptable$Type))
# Colour palette from http://vrl.cs.brown.edu/color
type_colors <- c("#0070C0", "#00B050", "#FFC000", "#C00000")

ptable$Type <- factor(ptable$Type, levels=type_levels, ordered=TRUE)

positions <- c(0.5, -0.5, 1.0, -1.0, 1.5, -1.5, 2.0, -2.0, 2.5, -2.5)
directions <- c(1, -1)

line_pos <- data.frame(
  "date"=unique(ptable$date),
  "position"=rep(positions, length.out=length(unique(ptable$date))),
  "direction"=rep(directions, length.out=length(unique(ptable$date)))
)

ptable <- merge(x=ptable, y=line_pos, by="date", all = TRUE)
ptable <- ptable[with(ptable, order(date, Type)), ]

text_offset <- 0.1

ptable$year_count <- ave(ptable$date==ptable$date, ptable$date, FUN=cumsum)
ptable$text_position <- (ptable$year_count * text_offset * ptable$direction) + ptable$position



year_buffer <- 2
sort_date <- ptable %>%
  filter(!is.na(date)) %>% 
  arrange(date) 
earliest_date <- sort_date$date[1]
latest_date <- sort_date$date[21]



year_date_range <- seq(earliest_date - years(year_buffer),latest_date + years(year_buffer), by=1825)

year_format <- format(year_date_range, '%Y')
year_df <- data.frame(year_date_range, year_format)

timeline_plot<-ggplot(ptable,aes(x=date,y=0, col=Type, label=Element))+
  labs(col="Type") +
  scale_color_manual(values=type_colors, labels=type_levels, drop = FALSE) + 
  theme_classic()

# Plot horizontal black line for timeline
timeline_plot<-timeline_plot+geom_hline(yintercept=0,
                                        color = "black", size=0.3)

# Plot vertical segment lines for milestones
timeline_plot<-timeline_plot +
  geom_segment(data=ptable[ptable$year_count == 1,],
               aes(y=position,yend=0,xend=date), 
               color='black', size=0.2)

# Plot scatter points at zero and date
timeline_plot<-timeline_plot +
  geom_point(aes(y=0), size=3)

# Don't show axes, appropriately position legend
timeline_plot<-timeline_plot+theme(axis.line.y=element_blank(),
                                   axis.text.y=element_blank(),
                                   axis.title.x=element_blank(),
                                   axis.title.y=element_blank(),
                                   axis.ticks.y=element_blank(),
                                   axis.text.x =element_blank(),
                                   axis.ticks.x =element_blank(),
                                   axis.line.x =element_blank(),
                                   legend.position = "bottom"
)

# Show text for each month
timeline_plot<-timeline_plot+geom_text(data=year_df, aes(x=year_date_range,y=-0.2,label=year_format, fontface = "bold"),size=2.5,vjust=0.5, color='black', angle=90)

# Show text for each milestone
timeline_plot<-timeline_plot+geom_text(aes(y=text_position,label=Element),size=2.5)
print(timeline_plot)

# Clean up
remove(timeline_plot, year_df, year_date_range, year_format, year_buffer, sort_date, text_offset, line_pos, type_levels, type_colors)
```
About

Row

About this dashboard

Reproducible data analysis is fun and good!

Row

Resources for learning R

Dashboard Instructions

Data

Code

Under the hood

Periodic Table

Periodic Table

Data

Row

Total number of elements

Artificial

Radioactive

Row

Periodic table

Analysis

Row

Descriptive statistics

Descriptive Statistics

descr_table

Visualisation

Row

Density distribution for different element types

Bubble plot - Boiling point vs Melting point vs Density

Maps

Row

Where were funders/institutions based that were involved in discovery of elements?

Timeline

Row

Earliest discovery

Most recent discovery

Discovery per century

Row

Timeline of discoveries since 1950