Hello! This document serves as a record of the work I did as a Trainee and Junior Researcher at EMEA to be used in the future by the ever-growing research team. I hope this resource helps with the continuation of some of the projects and dashboards I worked on and provides clarity about the processes I used. If something is unclear, my door is always open to the EMEA team. Feel free to reach out on LinkedIn or by email ().


Code Management

When I started at EMEA, there was no system for code management. Everyone was working on their personal computers and using different methods for storing files and code. There also wasn't an agreed-upon coding language. Most of the senior researchers were using Stata because of access through their universities. I used Stata for one project at the beginning and then switched to R when the license from my master's expired.

You will find most of my code in two places: the EMEA Microsoft OneDrive & GitHub

EMEA OneDrive

For data cleaning, data visualization, and other experiments, my code will be in the relevant folders in the Microsoft OneDrive. When I first started, we were using Dropbox instead of OneDrive, but I migrated the most important items over to OneDrive. I mention this because if you are doing a deep dive on some old project, in particular the TRIS, it may be helpful to look in the EMEA Dropbox.

In OneDrive, there is an option to operate in your personal space or the EMEA-wide section. For the most part I worked out of My Files, the personal section. However, in case my email becomes unavailable, I will copy my work into the main OneDrive. The main thing to note is that, as a result of migrating the files, the file paths will be different, so to run a script you will have to change them. I have linked locations in the EMEA OneDrive throughout this document, but I believe that you will only be able to access these links when logged into an EMEA email account.

GitHub

One of the main benefits of using GitHub is the ability to track changes and share code within a team. I started using GitHub at EMEA when I was building the dashboards, mostly because R Shiny apps are easy to break, so it is very useful to be able to return to a previous state where the app works. I used my personal GitHub when building the dashboards (@elena-stotts-lee). However, I created an EMEA GitHub account for code management (@emea-codemanagement), which I have added as a contributor to each of my repositories. This will grant whoever uses it full access and the ability to make changes to the repositories. If someone wishes to add their personal account as a contributor, contact me. The login for the code management GitHub account is and the password is the same as for the desktop computers in the office.


The Dashboards

Of the work I have contributed to EMEA, the dashboards are what I imagine others will need to interact with the most. The dashboards I built were meant to be constantly evolving, and I would be very happy to see them transform and be improved over time. I do ask that I be kept in the credits of the About sections if you are still using my original Shiny code.

When I was asked to build the TRIS Dashboard, it was not immediately apparent that I would be using Shiny. The original Brain Capital dashboard, built by Sara Ronco, was constructed on a platform called Infogram. I explored this option but quickly realised that it would not be possible to use it with the amount of data we hoped to incorporate into the TRIS. Other options I considered were Tableau (easy to use but not free) and Streamlit (free with Python, but a steeper learning curve and fewer resources for mapping). I landed on R Shiny. Despite having only coded in Stata and Python previously, this was the best option for me and EMEA. Shiny is free, with lots of documentation and resources available, and can be easily incorporated into websites.

I mention that I had never coded in R before working at EMEA because I think it really shows in some of my early work. Each dashboard is slightly different from the last because I learned new skills and found more efficient ways of doing things as I grew as a programmer in R. For this reason, I would say that even if you don't have experience in Shiny now, it is entirely possible to learn and start building functional apps quickly. In this section I will go through each dashboard and discuss a little bit about how they are structured. Then at the end of the section, I will add any tips, tricks, or resources I think of as I go.

TRIS Dashboard

The TRIS Dashboard is based on a development model created by Rym and Carlo in this paper. I was originally asked to build the TRIS Dashboard to be presented at COP27 in Sharm El Sheikh. For this reason, the starting point for looking at indicators was the monitoring data of the UN SDGs.

Data Collection for the TRIS

The way you download SDG data is a bit cumbersome. This link is where I exported the data. Select your indicators and then all countries and years. If I had been a more advanced programmer back when I was doing this, I probably would have looked into using the SDG API.
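
If you want to explore that route, here is a rough sketch of what a call to the SDG API could look like in R. I never actually did this, so treat the endpoint and parameter names as assumptions to verify against the API documentation (https://unstats.un.org/sdgapi/swagger/):

library(httr)
library(jsonlite)

# request one series; seriesCode and pageSize are assumed parameter names
resp <- GET("https://unstats.un.org/sdgapi/v1/sdg/Series/Data",
            query = list(seriesCode = "SI_POV_DAY1",  # example series code
                         pageSize   = 5000))
sdg <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
df  <- sdg$data  # should contain columns like geoAreaName, timePeriodStart, value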

Following Sandra's lead, I did the first data cleaning of the SDGs in Stata. The do-files of this work are in this folder, but I imagine you will not use them. I later went back and did the data cleaning of ALL SDG data (that was available in 2023) in R in a much simpler way. That R script is available to you in the GitHub repository TRIS-datacleaning and is called sdg_datacleaning.R. The file path for the goals data as I was using it led to a folder in the old EMEA Dropbox. I have added the data to a new folder in the shared EMEA drive called SDG Goals Download. If you wish to update the TRIS, it should work to download and rename the Excel files as I have and run this R script.

After exhausting the indicators available from the UN SDGs, I began to look at other open-source data from places like the World Bank, IMF, IRENA…etc. To me, this was data beyond the SDGs, so I collected it in a folder called additional_data, which I have added to the shared OneDrive here. The script I used to clean the additional_data is available in the TRIS-datacleaning GitHub repository under the name additional_data.R.

In a separate R script, entitled merge_for_shiny.R in the TRIS-datacleaning GitHub repository, I merge the final output of the additional data (“additional_data.csv”) and the SDG data (“trisvar_long.csv”).

Data format of TRIS Dashboard

Now, the TRIS Dashboard was the first Shiny app I ever built. As a result, there are many things about the way I did it that I understand to be inefficient. The most notable example of this is the way in which I import the data into the app. You’ll notice that the exported file at the end of merge_for_shiny.R is a wide data set going to a second version of the TRIS Dashboard (a repository called V2-TRIS-dashboard). It was my intention to rebuild the TRIS Dashboard using the same method you will see soon in the Brain Capital Dashboard, which I will explain in the next section. However, I never had enough time between tasks to execute it. This could be a nice warm-up exercise for your start at EMEA, because I know that Carlo and Rym are interested in having another look at the TRIS and the way it is organized.

So, instead of this nicer, more manageable df_wide.csv that I created in the cleaning files, the current TRIS Dashboard calls data from df_long.csv, a massive file with 5 columns and over 400,000 rows. I added df_long.csv to the OneDrive here. The file paths in the Shiny app are commented out because, in order to publish an app, you just select the files, and you are required to remove the file paths. As a result, I am not sure how running the app from your computer will go. But if there are issues, you will always be able to run the app from the computer at my old desk in Room 4 of the office in Sant Pau.

The columns of df_long.csv are as follows:

  • GeoAreaName - country name, called GeoAreaName because this is typically how countries are labeled in UN data. I started out with this and then used it in most of my code for the rest of my time.
  • TimePeriod - year, also a holdover from UN data, but I like it because it is distinct.
  • iso3 - a standard 3-letter code, very very useful for merging data; see the trick about the countrycode package at the bottom of this section.
  • indicator - the code I gave each indicator of the TRIS; I used this to match with the Excel file I will explain shortly.
  • value - the numeric value of the indicator (VERY important that this is an integer, numeric…etc. for the mapping with Leaflet to work).

The 5 columns above are the integral part of df_long.csv; however, I needed the data to be connected with a lot of other information in order to make an interactive app that is more than just a drop-down menu of indicators. I organized that information in an Excel file called titles_pillars_dimensions.xlsx, which I have moved to the OneDrive for you to access. In subsequent versions of the dashboards, I learned how to connect the information in this type of Excel file with the data inside the app rather than in advance (which is much less computationally expensive). However, for the TRIS, I didn’t know any better than to tack it on to every row of df_long.csv. Hence, the rest of the columns of df_long.csv are:

  • pillar - which of the 4 pillars of the TRIS the indicator belongs to.
  • dimension - which dimension within the pillar the indicator belongs to.
  • bins - this is a number between 2 and 9 for how many color bins an indicator needs on the map. 9 is the maximum and will be the number of bins for all continuous values. 2 is the minimum and is typically reserved for dummy variables. Anything else less than 9 matches the number of categories of a categorical variable.
  • MAR - this column isn’t actually in use; MAR stands for “missing at random.” I was looking into the possibility of imputing the missing data in the TRIS, but looking at the maps it was clear that not enough of the data was missing at random.
  • indicatordescription - this is the full name of each indicator displayed on the selection bar rather than the code.
  • unit - the unit of the indicator, to display on the bar.
  • sdgcode - there are 17 Sustainable Development Goals, each with their own subsections. This identifies the number of the SDG indicator for when I link to the documentation as the source.
  • label - I coded it such that when you hover over each country on the map, the value with a short version of the unit is displayed. These are the shortened unit.
  • y_axis_title - similarly, in the country plots, the Y axis is the unit of the indicator and had to be specified.
  • min and max - the minimum and maximum of the y axis had to be specified in order for the plots to stay uniform as you add more countries. The values are unique to the ranges of each indicator.
  • access_source - for open-source international data, the place in which you access data is not necessarily the original source. This is especially true for the World Bank and the UN SDGs (important to check the documentation for SDGs). I therefore differentiate between where I accessed the data and its original source.
  • access_link - the link to the access source so that dashboard users can travel directly to the site.
  • data_source and data_link - the original source of the data and its link.
  • order_id - some of the maps of indicators look better than others. I created an ID column that matches the order in which I prefer the indicators to appear, rather than alphabetical order.
  • na_dummy_sum - a description of the number of missing values (I think per indicator per year in this case). I set a condition to exclude indicator/year combos that have a certain amount of missingness.
  • na_dummy_countrysum - I do a similar thing for missingness per country, per indicator to put a limit on countries mapped on the country plots.
  • na_dummy_countrysum_indicatoravg - I also limit which indicators are included in the country plots overall if there are not enough observations.

The work of creating the df_long.csv with all of this information is in an R script called shiny_dataprep.R available here in the OneDrive.

R Shiny

So! Moving on to the code of the actual app. I made another decision here that I have not repeated in any app since: building the Shiny app out of multiple files rather than a single file. As you may know, to make a Shiny app you open RStudio, go to File > New File, and then select Shiny Web App. This is where you select whether you want a single file or multiple files. I chose multiple here, which created the files global.R, ui.R, and server.R. This proved to be nothing but annoying, and I suggest you go for a single file.

Global

In global.R I read in the packages needed for the app. I will explain what some of these are as I go. Then I read in my data df_long.csv and define another dataframe called world. When building a Shiny app with maps, there are many different ways to geolocate your data. It is especially easy if you already have a shape file, but in this case I have country names and isocodes that I wanted to fit onto a map. The solution I found was to use the two packages rnaturalearth and rnaturalearthdata. Using ne_countries() from these packages, I define a shape file (using returnclass = "sf") of the countries of the world. I merge it with df_long using the iso3 codes. Note that I select only the columns for the iso3 and geometry of the country using select(world, c(iso_a3, geometry)), because there is a lot of information in the world files that we don’t need. Thanks to the geometry column, mapping the indicators is not too difficult from here. I then use st_as_sf() from the package sf to convert all of df_long to a shape file with its new geometry element. See the relevant code below so you get the picture:

 # relevant packages
library(shiny)
library(tidyverse)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(countrycode)
library(s2)
library(leaflet)
library(tigris)
library(htmlwidgets)
library(shinythemes)
library(leafpop)
library(shinyWidgets)
library(ggiraph)

#read in data
df_long <- read.csv("df_long.csv")
world <- ne_countries(scale = "Medium", returnclass = "sf")
world[120, 45] = "KOS" # manually give Kosovo an iso3 code (it is missing in the rnaturalearth data)

#merge
df_long <- left_join(df_long, select(world, c(iso_a3, geometry)), by = c("iso3" = "iso_a3"))

#drop any unmatched rows
df_long <- df_long[!(is.na(df_long$pillar)), ]

#convert to sf
df_long <- st_as_sf(df_long)

Note: if any of the code samples look terrible, try widening the window you are using. I had to add the chunks of code in this way so that R wouldn’t try to run them since they are nonsense outside of a Shiny app. I apologize if it is difficult to follow.

User Interface

The two key components of a Shiny app are the UI (User Interface) and the Server. In a Shiny app, the UI defines the layout and elements visible to the user, while the server handles the logic, calculations, and interactions based on user inputs to generate dynamic content. Again, because I selected multiple files for the TRIS app, each of these is a separate file. I explain the UI first.

At the head of the UI is the following code:

tags$head(
    tags$script(src = "js/index.js"),
    tags$link(rel = "icon", type = "image/png", href = "favicon.png")
  )

The line I wish to draw your attention to is tags$script(src="js/index.js"). This links to a Javascript file which is in a folder called js in the www folder. Almost every Shiny app will require you to make a www folder, especially for images and adding elements in different coding languages. The JS code in the folder is what enabled me to create links between the different tabs on the navbar. If you run the app without it, then the links that say “See the TRIS visualised with our interactive map!” …etc. won’t work.

In Shiny, one of the main things that determines the layout is the type of page you specify. In the TRIS app and Brain Capital, I used a navbarPage(). See different options in this resource.

 navbarPage("TRIS Dashboard", id = "tris",
           theme = shinythemes::shinytheme("flatly"),
           
           tags$style(type="text/css",
                      ".shiny-output-error { visibility: hidden; }",
                      ".shiny-output-error:before { visibility: hidden; }"
           ))

What I wish to draw your attention to here is the tags$style(). What is happening here is that I am suppressing errors that come up in the app. I do this because, when I run my Leaflet map without it, it flashes an error before displaying the map. The error has to do with there being too many values to fit in 9 bins, but it doesn’t stop the map from building. However, I draw your attention to it because if you are building a new app, I would not include this code until it is time to publish, so that you can understand what is going on in your app.

For each tab of the app, you define a tabPanel(). In the first one, “About TRIS”, I have the pillars and dimensions displayed just as an image. This was a decision taken to make sure the app was ready for COP27, but if you build another version, I suggest making this element interactive. As I mentioned before, to add an image to the app, put it as a jpg or png in the www folder and then add it with a line of code like so: img(src = 'dimensions.jpg', align = "center", width = "100%"). You can also make images clickable links if you wrap them in a tags$a()! I use tags$div() to enclose all of the content in very text-heavy tabs, because then you can just write text without using HTML tags to define every line and every paragraph break. A small sketch putting these pieces together is below.
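
Here is what an About-style tab combining these elements could look like (the heading, text, and link are placeholders, not from the actual app):

tabPanel("About TRIS",
         tags$div(
           style = "text-align: justify;",
           h3(tags$b("A placeholder heading")),
           "Placeholder paragraph text, written directly without HTML tags.",
           tags$br(),
           # a clickable image: img() wrapped in tags$a()
           tags$a(href = "https://example.org", target = "_blank",
                  img(src = 'dimensions.jpg', align = "center", width = "100%"))
         ))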

Selecting Indicators

Now, on the Interactive Map tab, a few things to note. I use a sidebar layout, which is pretty common in Shiny apps. The code for selecting the indicators, dimensions, year, etc… is as follows:

 selectInput("pillar", "Select a Pillar",
                                    choices = unique(df_long$pillar)),
                        selectInput("dimension", "Select a Dimension", 
                                    choices = "", selected = ""),
                        selectInput("indicatordescription", "Select an Indicator",
                                    choices = "", selected = ""),
                        selectInput("year", "Select a Year", 
                                    choices = "", selected = "")

For the pillars, the choices may be filled in because they will always be the same four. However, the dimensions depend on which pillar is selected, the indicator on which dimension is selected, and the year on which data is available for the indicator. Hence I leave the choices and selected options blank and use this corresponding code in the server:

 shinyServer(function(session, input, output){
    
    observeEvent(
        input$pillar,
        updateSelectInput(session, "dimension", "Select a Dimension", 
                          choices = unique(df_long$dimension[df_long$pillar==input$pillar])))
    
    observeEvent(
        input$dimension,
        updateSelectInput(session, "indicatordescription", "Select an Indicator", 
                          choices = unique(df_long$indicatordescription[df_long$dimension==input$dimension & df_long$pillar==input$pillar])))
    
    observeEvent(
      input$indicatordescription,
      updateSelectInput(session, "year", "Select a Year", 
                        choices = unique(df_long$TimePeriod[df_long$indicatordescription==input$indicatordescription])))
    
    })

In the function that opens the Shiny server, “input, output” will be automatically listed, but it is important to add “session” as well for the code above to work. observeEvent() in Shiny triggers a reactive expression when a specific event, such as a user action or input change, occurs. So, for example, in the first block of code I observe what the user has selected for the pillar using input$pillar, referring to the id in the UI (the first expression - selectInput("pillar", ...)). Then I updateSelectInput(), filtering for the dimensions under that pillar. I do the same observing the dimension and indicator selected. This process is super simple when the data is in a long format with all of the information attached. You will see with Brain Capital and Regional Integration that it is a little less intuitive. It is nice that Shiny does all of this updating in real time, without the user having to press any buttons.

Indicator Descriptives

Below all of the selection inputs, the sidebar displays information about the selected indicator. In the UI, the code is as follows:

h5(tags$b("Indicator")),
textOutput("selected_var"),
tags$br(),

h5(tags$b("Units")),
textOutput("unit"),
tags$br(),

h5(tags$b("Access Source")),
uiOutput("accesssource"),
tags$br(),

h5(tags$b("Data Source")),
uiOutput("datasource")

The key information to note here is that the selected indicator and units are textOutput() and the sources are uiOutput(). The latter are uiOutput() because their content is links. The corresponding server code is:

output$selected_var <- renderText({ 
  paste(unique(d1()$indicatordescription))
})

output$unit <- renderText({ 
  paste(unique(d1()$unit))
})

output$accesssource <- renderUI({ 
  if(unique(d1()$access_source) == "UN SDGs"){
    tagList(unique(d1()$access_source), "( Goal ", a(href = unique(d1()$access_link), unique(d1()$sdgcode), target = "_blank"), ")")
  } else {
    tagList(a(href = unique(d1()$access_link), unique(d1()$access_source), target = "_blank"))
  }
})

output$datasource <- renderUI({
  if (!(is.na(unique(d1()$data_link)))){
    tagList(a(href = unique(d1()$data_link), unique(d1()$data_source), target = "_blank"))
  } else {
    paste("Not Applicable")
  }
})

For the unit and the selected indicator, displaying the text is a simple task of using renderText() and pasting the unique value in the selected data (I will explain in a moment what d1() is). Then, for the sources, I use an if/else: the first has a specific method of linking for the SDGs, and the second checks whether there is a separate original source or not.

The Reactive Dataframe

So, d1() is a reactive dataframe that is the result of the selections made by the user. Reactives in Shiny are objects that automatically update when their dependencies change, allowing for dynamic and responsive behavior in the application based on user inputs or other reactive values. To call a reactive, you always write the name followed by parentheses (“()”). Again, using a long dataframe makes the process of defining the reactive very simple. I do so as follows:

 d1 <- reactive({
      d <- df_long %>% filter(TimePeriod == input$year & indicatordescription == input$indicatordescription & !(df_long$GeoAreaName == "World"))
      return(d)
    })

Interactive Map using Leaflet

The reactive serves as the data input for the interactive map. To build the map, I used Leaflet. This website has most of the basic documentation for Leaflet you could possibly want. Since I built this app before ChatGPT came out, I used resources like this all the time. The code for my map, which I reused in all future dashboards, is:

output$map <- renderLeaflet({
      
      labels <- sprintf(
        "<strong>%s</strong><br/>%g %s",
        d1()$GeoAreaName, d1()$value, unique(d1()$label)) %>%
        lapply(htmltools::HTML)
      
      if(input$pillar == "P1: Transparent governance"){
        colpal <- colorBin(palette = "YlGnBu", bins = unique(d1()$bins), domain = d1()$value)
      }
      else if(input$pillar == "P2: Responsible living"){
        colpal <- colorBin(palette = "BuPu", bins = unique(d1()$bins), domain = d1()$value)
      }
      else if(input$pillar == "P3: Inclusive economies"){
        colpal <- colorBin(palette = "YlOrRd", bins = unique(d1()$bins), domain = d1()$value)
      }
      else if(input$pillar == "P4: Sustainable Energy and Environment"){
        colpal <- colorBin(palette = "YlGn", bins = unique(d1()$bins), domain = d1()$value)
      }
      
      d1() %>%
        st_transform(crs = "+init=epsg:4326") %>%
        leaflet() %>%
          addProviderTiles(provider = "Esri.NatGeoWorldMap") %>%
          addPolygons(label = labels,
                      layerId = ~GeoAreaName,
                      stroke = TRUE,
                      weight = 0.5,
                      color = "black",
                      smoothFactor = 0.5,
                      opacity = 1,
                      fillOpacity = 1,
                      fillColor = ~ colpal(d1()$value),
                      #popup = popupGraph(popupPlot, type = "svg"),
                      highlightOptions = highlightOptions(weight = 5,
                                                          fillOpacity = 1,
                                                          color = "blue",
                                                          opacity = 0,
                                                          bringToFront = TRUE)) %>%
          addLegend("topright",
                    pal = colpal,
                    values = ~ d1()$value,
                    title = "Legend",
                    opacity = 1) %>%
          setView(lng = 18.1, lat = 34.5, zoom = 2.5) %>%
          addEasyButton(easyButton(
                    icon="fa-globe", title="Worldwide View",
                    onClick=JS("function(btn, map){ map.setZoom(1.5); }"))) 
          
    })

Note that all of this code I have described is encompassed by renderLeaflet({}). Inside, I first define the text for the labels using sprintf(). Figuring out the right way to write "<strong>%s</strong><br/>%g %s" to display what I want took ages, so tread carefully if you want to change it to meet your needs. Next I define a color palette using ColorBrewer. This website is helpful for selecting palettes; the options with ColorBrewer are limited but effective. I condition on the pillar selected and have a different color palette for each.

Then, to make the map, I name my data source, in this case the reactive d1(). st_transform(crs = "+init=epsg:4326") reprojects the spatial data to the WGS 84 coordinate reference system (EPSG:4326), which is commonly used for web mapping, so you won’t typically change this between different maps. addProviderTiles() sets the background of the map. Here is a very nice GitHub app which displays all of the different options. Not all of them work, but some wild ones do, like Thunderforest_SpinalMap. Next we add information to the map. In this case, we addPolygons() to create a choropleth map! See the Leaflet documentation on choropleths here. Other ways to add information to the map include addCircles(), addTiles(), and more. Within addPolygons(), I customize: I add the labels defined previously, the names of countries using layerId = ~GeoAreaName, the color palette based on the value with fillColor = ~ colpal(d1()$value), and so on. Then I proceed to addLegend(), which will reflect the number of bins I defined in the color palettes. After that I setView() to set a default zoom level centered on specific coordinates (I picked the Mediterranean, since we are EMEA) and lastly add a button with a bit of JavaScript to let the user return to world view from any level of zoom.

Country Plots using ggiraph

One feature that appears only in the TRIS Dashboard is the Country Plots. Rym requested that I plot the indicators over time to show how they have changed. I wanted to make the plots interactive, specifically so that you can hover over the points and see the values. There are several potential ways of building plots like this (including using plotly, as I did in the Regional Integration Matrix Dashboard). However, in this case (probably because I built this app before ChatGPT was released), I went for the ggiraph package.

To do the selection for the country plots (of pillar, dimension, and indicator), I repeat the same steps as the interactive map with different ID names. Knowing what I know now, this would be a great case for modules so you aren’t repeating the same code.

The code for the plot itself is as follows:

 output$countryplots <- renderGirafe({
      ylim_max <- c1()$max[1]
      ylim_min <- c1()$min[1]
      
      caption_text <- if(unique(c1()$access_source) == "UN SDGs"){
        paste0("Source: UN SDGS (Goal ", unique(c1()$sdgcode), ")")
      } else {
        paste0("Source: ", unique(c1()$access_source))
      }
      
      myplot = ggplot(c1(), aes(TimePeriod, value, color = Country,
        tooltip = round(value, digits = 2), data_id = value)) +
        geom_line(size = 0.8) + 
        geom_point_interactive(size = 1.5) +
        theme_light() +
        theme(legend.position = "bottom",
              legend.background = element_rect(fill="#ebebeb", 
                                               size=0.5, linetype="solid"),
              plot.title = element_text(face = "bold"),
              axis.title = element_text(face = "bold"),
              #text = element_text(family = "sans"),
              plot.caption = element_text(hjust = 0.5, face = "italic")) +
        xlab("Year") +
        ylab(str_wrap(unique(c1()$y_axis_title), width = 48)) + 
        ylim(ylim_min, ylim_max) +
        labs(title = str_wrap(unique(c1()$indicatordescription),
        width = 59), caption = caption_text) +
        guides(fill=guide_legend(title="Country"))
      
      girafe(ggobj = myplot,
             options = list(opts_selection(type = "none")))
    })

Within renderGirafe I set the min and max value of the Y axis. Then I created the plot I wanted in ggplot using the reactive data c1(). Then you close with girafe(ggobj = myplot, options = list(opts_selection(type = "none"))) and you have your interactive plot! Such a simple way to make a ggplot more dynamic in the app. I think it is great.

Final Notes

It is likely that during your work, Rym will ask you to update the TRIS Dashboard. Carlo has been working on connecting the concepts to the regenerative economy and wishes to update the indicators in the TRIS accordingly. I suppose you have the choice of continuing with this app, which is a bit inefficient with the long-format data but functional, or trying to build a new app (or debug the one I started here) with wide-format data. Either way, I kindly ask that if you are using an app that has my code, you keep my name somewhere in the credits!

Brain Capital Dashboard

Alright! The next dashboard I built after the TRIS was the Brain Capital Dashboard. The first version I constructed was modeled very closely on the TRIS, with the data in long format. The code for that app is available here on GitHub, but I doubt you will need to refer to it at any point.

Data collection

The data cleaning for Brain Capital proceeded in a very similar vein to the TRIS. The R scripts for data cleaning are in a GitHub repository called BrainCapital-datacleaning. Sara had done some previous data collection, so for the first dashboard I used her data with additions. However, the data was in an Excel file with each indicator on a different sheet in a different format. So for the second version, I recollected all of the data.

The code for cleaning the data is in this file, entitled bc_datawrangling.R. The data is called from a folder called bc_data, which I have moved here in the OneDrive for you. You will have to change the file path if you want to run the code. It is a bit better organized than the TRIS one, with the pillars and dimensions in order. I merge the data in long format. I save the long-format data for V1 of the BC Dashboard and then reshape the data wide for V2 of the dashboard.

Importing wide data

For V2 of the dashboard, I separately import df_wide.csv from the data cleaning file and BC_titles.xlsx, an Excel sheet just like the TRIS one that contains information about the indicators. In the UI, the map and the selection are essentially the same. In the server, the first difference appears in the updateSelectInput().

observeEvent(
    input$pillar,
    updateSelectInput(session, "dimension", "Select a Dimension", 
                      choices = unique(titles$dimension[titles$pillar==input$pillar])))
  
observeEvent(
    input$dimension,
    updateSelectInput(session, "indicatordescription", "Select an Indicator", 
                      choices = unique(titles$indicatordescription[titles$dimension==input$dimension & titles$pillar==input$pillar])))

The code is the same, but now the selection is happening from the data frame titles, which has all of the information about the indicators and dimensions. Where it gets a little bit complicated is displaying only the years available for each indicator. You could just have all of the years and show a warning that says “No Data,” but I didn’t like the idea of that. How I deal with it is to make a reactive for the selected indicator, like so:

indicator_sel <- reactive({
    #store indicator selected
    ind <- unique(titles$variable[titles$indicatordescription == input$indicatordescription])
    
    #make a list of what to keep from df_wide
    keep <- append(ind, df_wide_colnames)
    
    #filter df_wide based on column names
    df_selected <- df_wide %>% 
      select(any_of(keep)) %>%
      na.omit()
    #using na.omit() filters for years available
    
    #make a column for the indicator name (to later merge in titles)
    df_selected$variable <- names(df_selected)[1]
    
    return(df_selected)
  })

observeEvent(
    input$indicatordescription,
    updateSelectInput(session, "year", "Select a Year", 
                      choices = unique(indicator_sel()$TimePeriod)))

Note the comments for the steps; I hope they make it clear what I do! At the top of the app, I define a vector called df_wide_colnames with the key variables needed for mapping besides the value (“GeoAreaName”, “iso3”, “TimePeriod”, “label_year”, “geometry”). I filter the wide data frame down to only these 5 columns and the indicator selected. I omit rows with NA values so that I am only left with those that have data observed for the selected indicator. I make an extra column with the code for the indicator, based on the column name, so that I can later merge with the titles file I kept separate. Then I am able to updateSelectInput() for the year with indicator_sel() as the data frame.
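
For reference, based on the five columns listed above, the vector at the top of the app presumably looks like this:

# key mapping variables kept alongside whichever indicator is selected
df_wide_colnames <- c("GeoAreaName", "iso3", "TimePeriod", "label_year", "geometry")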

Since some filtering already took place in indicator_sel(), the main reactive for the interactive map looks slightly different.

d1 <- reactive({
    #filter for selected year
    df_formap <- indicator_sel() %>%
      filter(TimePeriod == input$year) %>%
      rename(value = 1)
    
    #merge in titles
    df_formap <- left_join(df_formap, titles, by = "variable")
    
    #make full label for hovering over countries
    df_formap$label[is.na(df_formap$label)] <- ""
    df_formap$full_label <- paste(df_formap$label, df_formap$label_year)
    
    return(df_formap)
  })

I filter for the year selected and rename the column holding the selected indicator’s data to “value.” This makes the input into Leaflet more standard, rather than having a new column name for every indicator. I merge in the information from the titles file using the column I created in indicator_sel(). Then I generate the label for hovering over countries! In this dashboard it includes the year in parentheses.

User Interface Improvements

Contrary to the TRIS Dashboard, the first page of the BCD is more interactive. I use bsCollapse() to add drop-down descriptions of the dimensions (see the sketch after the code below). I also use fluidRow() to create columns of text. The column widths in a row must add up to 12, because Shiny’s layout is based on Bootstrap’s 12-column grid. So you end up with something like:

fluidRow(
             column(4,
                    hr(),
                    h2(tags$b("Pillar 2: Brain Health")),
                    hr()
             ),
             
             column(8,
                    tags$div(
                      style="text-align: justify;",
                      
                      h4("Brain health means...."),
                      tags$br())))
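
As promised, a minimal sketch of the collapsible dimension descriptions. bsCollapse() and bsCollapsePanel() come from the shinyBS package (an assumption based on the function name; check the app's library() calls), and the titles and text here are placeholders:

library(shinyBS)

bsCollapse(id = "dimensions",
           bsCollapsePanel("Placeholder dimension title",
                           "A short description of the dimension goes here.",
                           style = "info"),
           bsCollapsePanel("Another placeholder title",
                           "Another description.",
                           style = "info"))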

The rest of the app is basically the same! But it is overall much more efficient because it is not trying to process the super heavy long data frame when you open the app.

Regional Integration Matrix Dashboard

The final dashboard I built to completion is the Regional Integration Matrix (RIM) Dashboard. Similar to the others, there were 2 versions of this work. The RIM comes from a 2020 study conducted by Rym and several colleagues where they define several dimensions and indicators to measure regional integration in the Mediterranean.

Version 1

For the 2020 study, the data was collected by a team at E3 Modelling. Differently from the other dashboards at EMEA, the team used their specialization in macro modelling to write equations measuring different concepts. The data from this effort is available in an excel file I have added to the OneDrive here. All variables starting with “BV_” are the base variables. These are collected mostly from open-source locations like the World Bank and the IMF. However, there is some data, specifically from EORA and GTAP, that was purchased. In the Excel there is a sheet called “Base_Variables” that has the code of each variable, its description, unit, link to the source, and whether the data is publicly available.

The base variables were then used to construct the indicators. The indicators are described in the sheet just after “Base_Variables,” one called “Indicators.” This sheet is key because column E gives the equation for how to construct each indicator.

The E3 team collected all of the data for the Base Variables (notably, not for all countries in the world, just the ones relevant to the study) and made a sheet for each. Then, in Excel, they constructed all of the indicators. This was a fine thing to do, but it made it a serious undertaking when I needed to get the data into a workable format for R, with every indicator on a different sheet surrounded by noise (i.e. words that aren’t data). I used a bunch of strategies I found on the internet and wrangled the data into a usable format for the first version of the dashboard in this file. I then built the first app, only with the data from the 2020 study. The code for that app is available here on GitHub.

Updating the 2020 Study

After building V1, Rym asked me to try to update the RIM concept with current data and a larger coverage area of Euro-Med-Africa. In my desk research on how to measure regional integration, I developed a hyper-fixation on using bilateral data to measure integration. In the 2020 study, the only region considered was the Union for the Mediterranean. I wanted the flexibility to define any number of regions and study their level of integration. Bilateral data provided that, and I fell down a bit of a rabbit hole. In the following sections, I am not going to explain my methodology in very close detail because that is available in the published RIM Dashboard app. I also made this PowerPoint for the EMANES conference explaining what I did. Instead I will focus on the code and how to continue this work.

Defining the Regions

As mentioned, to update the study, I wanted to be able to define regions in a flexible manner and find a way to measure the level of integration within them. I looked at both geographic and institutional regions. I was curious to investigate whether countries are more closely linked to their physical neighbors or to countries they have entered an institutional consortium with, but I never had time. I sourced the geographic regions from the UN geoscheme. The institutional regions in Africa I pulled from the African Regional Integration Index.

I organized the regions I wanted in this Excel file, which is now called regions_old.xlsx. I rather stupidly put the geographic regions in one column and the institutional regions each in their own column as a dummy variable. I gave each region a 4-letter code, which can be seen in this file, the RI Codebook. Then, in order to run my code through all of the regions, I made a file with just the isocodes and a comma-separated list of the regions each country belongs to. That file became regions.xlsx here, and the code for how I transformed regions_old.xlsx into that format is here in the GitHub, called regions.R. A sketch of that transformation is below.
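
For a sense of what regions.R does, here is a hedged sketch of the reshape (the column names are my guesses; the real ones are in the script):

library(readxl)
library(tidyverse)

regions_old <- read_excel("regions_old.xlsx")

regions <- regions_old %>%
  # gather the institutional dummy columns into long format
  pivot_longer(cols = -c(iso3, geo_region), names_to = "region", values_to = "member") %>%
  filter(member == 1) %>%
  # add each country's geographic region as one more membership
  bind_rows(regions_old %>% transmute(iso3, region = geo_region, member = 1)) %>%
  # collapse to one row per country with a comma-separated list of region codes
  group_by(iso3) %>%
  summarise(regions = paste(sort(unique(region)), collapse = ","))

writexl::write_xlsx(regions, "regions.xlsx")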

In the time I had, I only managed to make a measure of integration within a region. However, it was part of Rym’s and my original ambition to also look at integration between different regions.

Preparing the Data

Now, here is where things start to get a little complicated. Because of the methodology I used, I needed to apply an equation to every region, in every year, for every indicator. This was necessary not only to create the scores, but also, in the case of bilateral data, for the values to display on the map. Hence, I started building some crazy loops (with some help from ChatGPT). The code is decently commented relative to the apps I have built (sorry), so if you look at the files in the folder Composite Indicator Scripts in GitHub, I hope you will be able to follow the process. I will say that the process differs in each script depending on whether the data is bilateral or not.
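
To give a flavor of the loop structure without reproducing any one script, here is a toy sketch for a bilateral indicator. The data frames, column names, and region codes are invented for illustration; the real versions live in the scripts:

# toy stand-ins for the regions file and a bilateral flows data set
regions <- data.frame(iso3    = c("FRA", "ITA", "MAR"),
                      regions = c("SEUR,UFMD", "SEUR,UFMD", "NAFR,UFMD"))
flows   <- data.frame(origin     = c("FRA", "FRA", "ITA", "MAR"),
                      dest       = c("ITA", "MAR", "USA", "FRA"),
                      TimePeriod = 2020,
                      value      = c(10, 5, 8, 3))
region_codes <- c("SEUR", "NAFR", "UFMD")  # invented 4-letter codes

results <- list()
for (r in region_codes) {
  members <- regions$iso3[grepl(r, regions$regions, fixed = TRUE)]
  for (yr in unique(flows$TimePeriod)) {
    d <- flows[flows$TimePeriod == yr, ]
    # flows staying inside the region vs the members' total outgoing flows
    within <- sum(d$value[d$origin %in% members & d$dest %in% members], na.rm = TRUE)
    total  <- sum(d$value[d$origin %in% members], na.rm = TRUE)
    results[[paste(r, yr)]] <- data.frame(region = r, year = yr,
                                          share_within = within / total)
  }
}
scores <- do.call(rbind, results)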

Since the process was so hands-on and required different things for each format of data imported, I have a different file for each of the different imports. Some scripts take much longer to run than others. Here is a brief guide to what each of the files in Composite Indicator Scripts is (Note: I say B for bilateral and NB for non-bilateral):

  • airp.R - NB, an infrastructure indicator measuring air transport that I think I didn’t use in the end. I desperately wanted B flight data because I think it would be quite representative of integration, but you have to pay for it.
  • bits.R - B, I tried to webscrape this data from the UNCTAD website but was unsuccessful. I ended up putting it into an Excel by hand (would not recommend, not fun), but I thought it was a good indicator. As a result, the cleaning is a bit more arduous than for the others. I also downloaded the data file for posterity.
  • fdi.R - B, one of my good friends doing a PhD sent me this data! For some strange reason, there is no good open-source bilateral FDI data. I would have thought it very important to have. My friend emailed UNCTAD and they gave him this, but it isn’t on their website.
  • hofi.R - NB, this was the first NB script I wrote and it was wrong somehow so I made a new one.
  • hofi_new.R - NB, these cultural proximity indicators are very interesting I think. The data coverage is not amazing, but I thought this was an interesting dimension to have. It definitely should be developed further.
  • imfdata.R - NB, part of building a new dashboard was that I had to recollect all the old data from the 2020 study to get more recent years and to make sure all countries were included. In principle, it seems like it would be fine to only have the data from the region. However, for the bilateral indicators, I needed to measure a country’s total activity to compare it to its activity within the region.
  • intu.R - NB, internet infrastructure, but I included this in the World Bank indicators in the end.
  • migr.R - B, migration flows. This is UN data but downloaded from Our World in Data which offers it in a much cleaner format.
  • regions.R - This file was discussed earlier, it is how I got the regions data in a suitable format for my loops.
  • remittances.R - B, from the Bilateral Remittance Matrix. Nice data, but it is annoying that it isn’t available for every year.
  • rim_regintscores.R - NB, in most of the files I generate the regional integration scores as the final step. However, for the massive amount of data I collected to replicate the 2020 study, I did the scores in this separate file.
  • stmo.R - B, the NB version of this was in the original study, but I found a bilateral version at UNESCO and went with that. I think it is a nice addition to the movement of people dimension.
  • trade.R - B, there are so many places to get bilateral trade data. I picked a version that was slightly easier to work with. If I had more time, I would have explored the UN Comtrade database and looked at different types of trade. This file is actually old; see the two scripts below for the ones I included in the dashboard.
  • trade_exports.R and trade_imports.R - B, I wanted to have flows in each direction, so I work with the trade data in two separate files.
  • travelvisas.R - B, fun data I found from Oxford about the level of travel visas needed. The only issue is that the series stops after a certain year, when they finished the project.
  • undata.R - NB, same as the IMF data. I recollected data from the 2020 study and cleaned it, in this case from the UN.
  • worldbankdata - NB, same as above, but a nice thing to note is that here I used the World Bank API in R where I could (see the sketch just after this list). I once made a video explaining to colleagues how to use it; ask someone to forward the email if needed, but you should be able to figure it out from the documentation!
  • updated_basevariables.R - NB, after recollecting and cleaning all the open-source data from the previous study, I merge it in this file. The display data is organized here, but the loop is applied in the scores file above.
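
Since the World Bank API comes up above, here is a minimal sketch of one common way to call it from R, the WDI package. This is an illustration on my part; check worldbankdata itself for the exact calls I used:

library(WDI)

# GDP per capita (current US$) for all countries, as an example indicator
gdp <- WDI(country = "all",
           indicator = "NY.GDP.PCAP.CD",
           start = 2000, end = 2022)
head(gdp)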

Version 2 Shiny

Here is the GitHub repository of the dashboard and here is the published dashboard on the Open Knowledge Platform. At this point, I think I was getting tired of building the same app over and over again. So I tried to learn and apply some new skills. As such, the code of this app looks somewhat different. The main difference is the use of modules.

Modules

The benefit of modules is the ability to reuse the same code in different locations of the same app. When you are coding, ideally you don’t want to repeat anything that you have already written. You want to use loops and other tools to be as concise as possible.

The reason I wanted to use modules is that I wanted to have separate tabs within each page for the two different types of regions. For example, in the place in the app where there is a drop-down of regions, I didn’t want the user to scroll through a massive list with no distinction between geographic and institutional regions. There are a few ways I could have achieved this, like with some radio buttons or a slider between the two options, but I liked these tabs.

For each module, you define a UI and a server, sort of like it is a mini Shiny app in its own right. Then, when you want to use it, you call it into your general app. You can call a module more than once with different parameters, like you would a function. There are a lot of resources about how to use modules, so I won’t go into too much detail, but for me the most important thing to remember is to put ns() around the IDs in the module’s UI. A minimal sketch is below.
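
Here is a minimal, self-contained sketch of a module (the names are illustrative, not taken from the RIM app):

library(shiny)

regionSelectUI <- function(id) {
  ns <- NS(id)  # the namespace function: wrap every input/output ID in ns()
  tagList(
    selectInput(ns("region"), "Select a Region", choices = NULL),
    textOutput(ns("chosen"))
  )
}

regionSelectServer <- function(id, region_choices) {
  moduleServer(id, function(input, output, session) {
    updateSelectInput(session, "region", choices = region_choices)
    output$chosen <- renderText(paste("You selected:", input$region))
  })
}

# call the module once per tab, with different parameters
ui <- navbarPage("RIM",
  tabPanel("Geographic",    regionSelectUI("geo")),
  tabPanel("Institutional", regionSelectUI("inst")))

server <- function(input, output, session) {
  regionSelectServer("geo",  c("Southern Europe", "North Africa"))
  regionSelectServer("inst", c("Union for the Mediterranean"))
}

# shinyApp(ui, server)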

I wish I had more time to write out and explain how this app works, because it is probably the one I am most proud of. I propose that if it becomes relevant for your work, write me an email and we can do a walkthrough over a call!

Potential Areas for Improvement

  1. I really wanted to create an environmental performance dimension, using variables like the Environmental Protection Index, but I didn’t have time.
  2. The cultural proximity dimension could use further indicators and thought.
  3. As I mentioned before, trying to measure integration between regions in addition to within.
  4. Just a general hard look at the methodology; I really think it can be improved in many ways. Consider the results. Do they actually make sense? I wrote a bit of analysis after the results of the composite indicators, but it is concerning to me that this method yields the most integration in the largest geographic areas, when intuitively we know that the EU, for example, is very integrated.
  5. A more advanced measure of convergence for the NB indicators. Or, if not more advanced, at least a more explainable one. What I did makes sense to me, but I found it really difficult to explain to others.

Bank Business Model Dashboard

The last dashboard I was meant to build for EMEA was for the BBM, but I got caught up writing a paper on energy that took all of my time. The nice thing about this project is that all of the data is already collected and cleaned. Doriana organized it, and I have it in the GitHub folder under the name bbm_clean.csv here. I built the shell of an app and started adding a few potential visualizations. It is here in the GitHub. Really, this could be an excellent first endeavour for you to develop new skills in Shiny and design your own thing, rather than having to deal with updating the apps I have built. I suggested this type of dashboard from the Shiny gallery to Doriana for inspiration, but really it is your project!

R Shiny Tips and Tricks

Countrycode Package

When working with country data, it is very messy to try to merge based on country name alone. Some data sets will include “Republic of” in country names and others will not. Also, country names can change over time and you will have issues with matching (e.g. Eswatini, Czechia, Myanmar…etc.). So I prefer to use iso3 codes for matching. However, not all data comes with these codes. The R package countrycode is a really great workaround for this issue.

 df$iso3 <- countrycode(df$GeoAreaName, "country.name", "iso3c", warn = T)
unmatched_names <- c("UNMATCHED COUNTRY NAME 1", "UNMATCHED COUNTRY NAME 2")
right_iso <- c("ISOCODE OF UNMATCHED COUNTRY 1","ISOCODE OF UNMATCHED COUNTRY 2")
for (i in seq_along(unmatched_names)){
  df$iso3[df$GeoAreaName == unmatched_names[i]] <- right_iso[i]
}

The first line of code creates a new column of isocodes, matching the variable for country names (in this case df$GeoAreaName) with iso3c. The warn = T option gives a list of unmatched country names. Often these will be international groupings like MENA and LDCs, but there are some frequently unmatched ones like “Türkiye.” So below I define two vectors, one for the unmatched countries and the other for their isocodes. Then the loop fills in the isocodes for the unmatched countries.

Publishing your Shiny App

The way I published the Shiny apps was using shinyapps.io. I am realising, however, as I finish this document, that I connected these publications to my personal GitHub account, so I can’t just put the username and password here. I suggest that if you urgently need to update/republish any of the apps, you contact me. Otherwise, you can make an account and publish from scratch. Then you would just have to contact Nektar/the IT team and they will embed the Shiny app in the EMEA website.
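
If you do publish from scratch, the rsconnect package handles the deployment. A sketch with placeholder credentials (the token and secret come from your account page on shinyapps.io):

library(rsconnect)

rsconnect::setAccountInfo(name   = "your-account-name",
                          token  = "YOUR_TOKEN",
                          secret = "YOUR_SECRET")
rsconnect::deployApp(appDir = "path/to/app", appName = "tris-dashboard")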

General Handover Information

Brain Capital

In the Brain Capital Handover folder, there are a few things I wish to explain. Here is what the folders inside it are:

  • bc_data - the data for the Brain Capital Dashboard as I explained above.
  • Brookings Paper - some drafts from the paper on the dashboard that we wrote with Carol Graham for Brookings. Here is the published paper.
  • Dana Foundation - application for a grant for a project on impact investing and Brain Capital. We didn’t get it, but it is useful to have the text just in case.
  • Data and Dashboards extras - I was having some issues when I worked from home with running the data cleaning file. So there are some extras in there as well as some documents from when I was searching for data.
  • Lundbeck Proposal - Lundbeck is one of the sponsors of EMEA’s work in Brain Capital. We were successful with the final proposal. Discuss this with Harris, but you may have to schedule meetings with our contact there, Isabel Cerda Marcos (). It is important to make some progress on the proposal by the end of 2024 and to keep in touch with them about it.
  • Presentations - I made a series of presentations for Rym on the topic of Brain Capital. It is very likely she will ask you to adapt these at some stage for an event so now you know where they are!
  • Working Group Pillar Definitions - in 2023, Sara and I did a big push to provide definitions for the pillars and dimensions of the Brain Capital concept. We got feedback from the workgroup and eventually turned our findings into the Brookings paper. This folder contains the steps of that effort. The final definitions are on the landing page of the BC Dashboard.

Energy Paper for Bertelsmann Stiftung

Just a quick note that Sara and I wrote a paper on energy for the German think tank Bertelsmann Stiftung. All of the drafts and notes we took for it are here in the OneDrive. This work is closed, but I mention it because we really had to do a lot of research to execute it, so if you need resources regarding the energy transition in the Southern Med or EU external policy/financial mechanisms, this could be helpful. In particular, I have found that the Workshop Minutes I put together from the event we held with Bertelsmann Stiftung in July 2023 contain a lot of insights.

Conclusions and well wishes

I hope that you have found this document useful! Please really feel free to contact me if you need help. Good luck!