7-19-2022 Tidy Tuesday: Technology Consumption

The Data

This week’s Tidy Tuesday data is about technology adoption and comes from data.nber.org. It details technology production and consumption over time for different countries. For more information, read the working paper.

Loading the Data

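library(tidyverse)
library(countrycode)
library(praise)
# park_palette() is assumed to come from the nationalparkcolors package (installed from GitHub)
library(nationalparkcolors)

technology <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-07-19/technology.csv')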

Glimpse of what the data looks like:

head(technology)

## # A tibble: 6 × 7
##   variable label                                 iso3c  year group categ…¹ value
##   <chr>    <chr>                                 <chr> <dbl> <chr> <chr>   <dbl>
## 1 BCG      % children who received a BCG immuni… AFG    1982 Cons… Vaccin…    10
## 2 BCG      % children who received a BCG immuni… AFG    1983 Cons… Vaccin…    10
## 3 BCG      % children who received a BCG immuni… AFG    1984 Cons… Vaccin…    11
## 4 BCG      % children who received a BCG immuni… AFG    1985 Cons… Vaccin…    17
## 5 BCG      % children who received a BCG immuni… AFG    1986 Cons… Vaccin…    18
## 6 BCG      % children who received a BCG immuni… AFG    1987 Cons… Vaccin…    27
## # … with abbreviated variable name ¹category

technology %>%
  filter(category == "Energy") %>%
  group_by(group) %>%
  count()

## # A tibble: 1 × 2
## # Groups:   group [1]
##   group          n
##   <chr>      <int>
## 1 Production 66748

I then used the countrycode package to convert the ISO3c codes to country names. A few codes had no match, so I added those names manually afterward. I used an analogous procedure to assign the continents.

technology_country <- technology %>%
  mutate(country_name = countrycode(iso3c, origin = 'iso3c', destination = 'country.name')) %>%
  mutate(country_name = case_when(
    iso3c == "ANT" ~ "Netherlands Antilles",
    iso3c == "CSK" ~ "Czechoslovakia",
    iso3c == "ROM" ~ "Romania",
    iso3c == "XKX" ~ "Kosovo",
    iso3c == "XCD" ~ "Caribbean", #not totally sure on this one
    TRUE ~ country_name)) %>%
  mutate(continent = countrycode(country_name, origin = 'country.name', destination = 'continent')) %>%
  mutate(continent = case_when(
    country_name == "Kosovo" ~ "Europe",
    country_name == "Antarctica" ~ "Antarctica",
    country_name == "Czechoslovakia" ~ "Europe",
    country_name == "Caribbean" ~ "Americas",
    TRUE ~ continent))
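
As a rough sketch of how the unmatched codes could be found and patched in one step, the snippet below checks a handful of codes from this post and then uses countrycode's `custom_match` argument instead of a separate `case_when()`. The short `codes` vector is only for illustration, and which codes come back unmatched may depend on the installed countrycode version.

```r
library(countrycode)

# A few codes from this post, chosen for illustration only.
codes <- c("AFG", "ROM", "XKX", "CSK")

# warn = FALSE silences the warning about values that could not be matched.
default_names <- countrycode(codes, origin = "iso3c",
                             destination = "country.name", warn = FALSE)

# Codes that came back as NA are the ones that need a manual name.
unmatched <- codes[is.na(default_names)]

# custom_match takes a named vector of overrides (origin code = destination value).
fixed_names <- countrycode(
  codes, origin = "iso3c", destination = "country.name",
  custom_match = c(ROM = "Romania", XKX = "Kosovo", CSK = "Czechoslovakia")
)
```

Keeping the overrides in one named vector makes it easy to see at a glance which names were supplied by hand.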

Then I looked at the group and category to see what options I had to work with for creating a data visualization.

technology_country %>%
  select(group,category) %>%
  group_by(group,category) %>%
  unique()

## # A tibble: 15 × 2
## # Groups:   group, category [15]
##    group       category                   
##    <chr>       <chr>                      
##  1 Consumption Vaccines                   
##  2 Production  Agriculture                
##  3 Non-Tech    Agriculture                
##  4 Consumption Transport                  
##  5 Production  Industry                   
##  6 Consumption Financial                  
##  7 Production  Transport                  
##  8 Non-Tech    Other                      
##  9 Non-Tech    Hospital (non-drug medical)
## 10 Consumption Communications             
## 11 Consumption Hospital (non-drug medical)
## 12 Production  Energy                     
## 13 Creation    Other                      
## 14 Production  Other                      
## 15 Production  Communications
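
The same kind of overview can also be produced with `distinct()` or `count()`. The sketch below uses a small made-up table (not rows from the technology data) to show the difference between the two.

```r
library(dplyr)

# Made-up rows purely for illustration.
toy <- tibble(
  group = c("Production", "Production", "Consumption", "Production"),
  category = c("Energy", "Energy", "Vaccines", "Transport")
)

# distinct() keeps one row per unique group/category combination.
combos <- toy %>%
  distinct(group, category)

# count() also reports how many rows fall into each combination.
combo_counts <- toy %>%
  count(group, category)
```

`count()` is handy here because it shows at once which combinations have enough rows to be worth plotting.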

Energy!

I ultimately decided to focus on energy production per country. I first created a data frame ranking countries by their average electric power consumption and kept the top 20. Then I used those 20 countries to filter a second data frame, called ‘tech_energy’, which holds electricity production by source.

tech_energy_avg <- technology_country %>%
  filter(category == "Energy") %>%
  filter(label == "Electric power consumption (KWH)") %>%
  group_by(country_name) %>%
  summarize(mean_val = mean(value)) %>%
  arrange(desc(mean_val)) 

top_tech_energy_avg <- tech_energy_avg %>%
  slice_head(n = 20)

tech_energy <- technology_country %>%
  filter(category == "Energy") %>%
  #grepl keeps only the labels that name an energy source (those containing "from")
  filter(grepl("from",label)) %>%
  #str_replace removes the expression in parentheses, leaving a trailing space
  mutate_at("label", str_replace, "(?=\\().*?(?<=\\))", "") %>%
  filter(country_name %in% top_tech_energy_avg$country_name)
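
The rank-then-filter pattern used above can be tried out on a small made-up table. This sketch swaps in `slice_max()` and `semi_join()` as alternatives to `arrange()` + `slice_head()` and `%in%`; the values are invented for illustration, and `na.rm = TRUE` is an added assumption so that a missing value does not turn a country's mean into NA.

```r
library(dplyr)

# Made-up values purely for illustration; not taken from the technology data.
toy <- tibble(
  country_name = c("A", "A", "B", "B", "C", "C"),
  label = rep(c("Electricity from coal", "Electricity from hydro"), times = 3),
  value = c(10, 20, 1, 3, 50, 70)
)

# Rank countries by their mean value and keep the top 2.
top_countries <- toy %>%
  group_by(country_name) %>%
  summarize(mean_val = mean(value, na.rm = TRUE)) %>%
  slice_max(mean_val, n = 2)

# Keep every row that belongs to one of the top countries.
toy_top <- toy %>%
  semi_join(top_countries, by = "country_name")
```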

# factor() sets the order of the legend entries in the subsequent plots;
# each level keeps the trailing space left after removing the parenthetical
tech_energy$label <- factor(tech_energy$label, levels = c(
  "Electricity from coal ", "Electricity from oil ", "Electricity from gas ",
  "Electricity from nuclear ", "Electricity from hydro ", "Electricity from wind ",
  "Electricity from solar ", "Electricity from other renewables "))
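
To see why the factor levels end in a space, here is a small sketch of what the regular expression does to a single label. The raw label text, including the unit in parentheses, is an assumed example rather than a value copied from the data.

```r
library(stringr)

# Assumed shape of a raw label; the unit in parentheses is illustrative.
raw_label <- "Electricity from coal (TWH)"

# Same pattern as in the pipeline: drops "(...)" but keeps the space before it.
cleaned <- str_replace(raw_label, "(?=\\().*?(?<=\\))", "")

# A level without the trailing space does not match, so the value becomes NA.
no_match <- factor(cleaned, levels = "Electricity from coal")
with_match <- factor(cleaned, levels = "Electricity from coal ")

# Trimming after the removal would give labels without the trailing space.
trimmed <- str_trim(str_remove(raw_label, "\\(.*?\\)"))
```

If the labels were trimmed this way, the levels passed to `factor()` would need to drop their trailing spaces as well.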

The Plots

tech_plot_1 <- tech_energy %>%
  ggplot(aes(x = year, y = value))+
  xlim(c(1985,2020))+
  geom_point(size = .5, aes(color = label))+
  geom_line(aes(color = label))+
  # scales = "free_y" gives every country its own y-axis range
  facet_wrap(~country_name, scales = "free_y")+
  scale_color_manual(values = park_palette("smoky_mountains2", n = 8))+
  theme_bw()+
  theme(
    legend.position = "top",
    legend.justification = "left",
    legend.title = element_text(face = "bold"),
    title = element_text(size = 15),
    legend.text = element_text(color = "black", size = 11),
    axis.title = element_text(face = "bold"),
    legend.background = element_rect(fill = "gray95"),
    axis.text = element_text(color = "black"),
    strip.text = element_text(color = "black", face = "bold"))+
  guides(color = guide_legend(title.position = "top", title = "Energy Production Type"))+
  ylab("Annual Number of Terawatt Hours (TWh)")+
  xlab("Year (1985-2020)")+
  ggtitle("Comparing Types of Energy Production by Country")+
  theme(plot.title = element_text(size = 17, face = "bold", hjust = 0.5))+
  labs(caption ="Tidy Tuesday 07-19-2022 | GitHub: @scolando")+
  theme(plot.background = element_rect(fill = "gray95"))

tech_plot_1

One important note: in the plot above, the y-axis scale differs for each country because the facets use free y-axis scales. Below is what the graphs look like when the scales are fixed, that is, when every country shares the same y-axis.

tech_plot_2 <- tech_energy %>%
  ggplot(aes(x = year, y = value))+
  xlim(c(1985,2020))+
  geom_point(size = .5, aes(color = label))+
  geom_line(aes(color = label))+
  # default (fixed) scales: every country shares the same y-axis
  facet_wrap(~country_name)+
  scale_color_manual(values = park_palette("smoky_mountains2", n = 8))+
  theme_bw()+
  theme(
    legend.position = "top",
    legend.justification = "left",
    legend.title = element_text(face = "bold"),
    title = element_text(size = 15),
    legend.text = element_text(color = "black", size = 11),
    axis.title = element_text(face = "bold"),
    legend.background = element_rect(fill = "gray95"),
    axis.text = element_text(color = "black"),
    strip.text = element_text(color = "black", face = "bold"))+
  guides(color = guide_legend(title.position = "top", title = "Energy Production Type"))+
  ylab("Annual Number of Terawatt Hours (TWh)")+
  xlab("Year (1985-2020)")+
  ggtitle("Comparing Types of Energy Production by Country")+
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5))+
  labs(caption ="Tidy Tuesday 07-19-2022 | GitHub: @scolando")+
  theme(plot.background = element_rect(fill = "gray95"))

tech_plot_2

Also, my absolute favorite part of this Tidy Tuesday was looking through the National Park palette options. Definitely check out GitHub; there is a myriad of fantastic user-created packages with beautiful color palettes.

Praise (just because)

praise()

## [1] "You are stupendous!"