Wednesday talks, Dynamic plots in R

Author

Andrei Pyko

Published

November 5, 2024

In this session, we will talk about why and how we can use dynamic plots in our work. Dynamic plots help researchers look at complex data interactively, making it easier to find relationships, patterns, and unusual data points that might be missed in regular, static charts. This interactivity is especially helpful in epidemiology, where being able to filter, zoom, and add different data layers can give us better insights into how health events are spread out and how they change over time.

Interactive tools also help us communicate our findings more effectively. They make data more engaging for different audiences, like our supervisors, co-authors, and colleagues. By letting people interact with the data, these tools help us explain the details of our findings more clearly.

The outline of our discussion will include:

  1. GIS Work in R: Converting coordinates data into GIS objects using the sf library, and understanding how to visualize data in both simple and advanced formats.

  2. GIS Data Visualization: Exploring different methods of visualizing GIS data, ranging from static plots to dynamic maps using the tmap library, including adding overlays and custom map settings.

  3. Advanced Visualization Techniques: Using interactive maps with pop-ups to provide detailed contextual information for mapped data points.

  4. Interactive Plots Using Plotly: Creating interactive visualizations for non-GIS data, such as histograms, to enhance data exploration and presentation.

  5. Practical Example of Interactive Plot: Analyzing noise annoyance data using dynamic plots to visualize exposure-response relationships and better understand the impacts of environmental noise on health outcomes.

1. Load necessary libraries and data

2. We convert data to sf (GIS) object in R

Using sf library and command st_as_sf we can convert the data to GIS object. For it we need to know which columns contain the coordinates and the coordinate reference system (CRS) of the data. For Swedish data most common CRS are 3006, 3021 and 4326, however, could be others, therefore it is always good to visualize data after convertion as well as check it with technical documents.

Routing checks and plotting

Code
library(raster)
library(haven)
library(sf)
library(dplyr)
library(ggplot2)



#lets convert mhe23_analys to sf

mhe23_analys_sf<-st_as_sf(mhe23_analys, coords = c("xkoord","ykoord"), crs=3006, remove=FALSE)
    # coords = c("xkoord","ykoord") - the columns in the data frame that contain the coordinates
    # X and Y coordinates
    # crs=3006 - the coordinate reference system (CRS) of the data
    # remove=FALSE - whether to remove the columns used for coordinates from the data frame

plot(st_geometry(mhe23_analys_sf))

Practical tip: As you can see from the illustration above, coordinate columns could be often swapped in the dataset i.e. x and y are actual y and x. Therefore, after converting coordinates to GIS object, it is recommended to check the data visually.

With st_geometry command we can extract the geometry column from the sf object and plot it. Otherwise, if we use plot command directly on the sf object, it will plot all columns in the dataset i.e. many plots on one figure (is not what we want and it takes too long).

Code
#lets convert mhe23_analys to sf

mhe23_analys_sf<-st_as_sf(mhe23_analys, coords = c("ykoord","xkoord"), crs=3006, remove=FALSE)
    # coords = c("xkoord","ykoord") - the columns in the data frame that contain the coordinates
    # X and Y coordinates
    # crs=3006 - the coordinate reference system (CRS) of the data
    # remove=FALSE - whether to remove the columns used for coordinates from the data frame

plot(st_geometry(mhe23_analys_sf)) # 

GIS Data visualization (static)

Instead of using simple plotting some additional dimensions of data could be visualized. In contrast to previsou plot command where we used st_geometry to extract the geometry column, we will select necessary columns using dplyr::select first and then call plot command.

To speed up all vilusaisations below we focus on Stockholm County only i.e. filter(lan==1) `

Code
mhe23_analys_sf %>%
  filter(lan==1) %>%
  dplyr::select(kon, lan) %>% # select the columns to visualize
  plot()

GIS data visualisation (advanced)

Instead of simple plotting we cold use tmap library to create more fancy maps.

Code
# remotes::install_github('r-tmap/tmap')
library(tmap)

mhe23_analys_sf %>%
  filter(lan==1) %>%
  dplyr::select(lan) %>% # select the 'lan' column
  tm_shape() +
  tm_dots("lan", title = "Lan") + # use 'lan' as fill color for points
  tm_layout(legend.position = c("right", "center")) # set legend position

Moreover, by configurating tmap we can create interactive maps and add overlay over the map like Open Streen Map.

Code
# tmap_mode("view") # Set tmap to interactive mode

mhe23_analys_sf %>%
  # filter(lan==1) %>%
  dplyr::select(lan) %>% # select the 'lan' column
  tm_shape() +
  tm_dots("lan", title = "Lan") + # use 'lan' as fill color for points
  tm_basemap(server = "OpenStreetMap") + # Add OSM basemap
  tm_layout(legend.position = c("right", "center")) # set legend position

We can conditionally can configurate the map so that it starts at certain level of zoom and center at certain coordinates (by using tm_view options).

Code
mhe23_analys_sf %>%
  filter(lan==1) %>%
  dplyr::select(lan) %>% # select the 'lan' column
  tm_shape() +
  tm_dots("lan", title = "Lan") + # use 'lan' as fill color for points
  tm_basemap(server = "OpenStreetMap") + # Add OSM basemap
  tm_view(set.view = c(lon = 18.0686, lat = 59.3293, zoom = 12)) + # Set initial view over Stockholm with city-level zoom
  tm_layout(legend.position = c("right", "center")) # set legend position

And for exploration purposes we can configurate popups windows for the points on the map. popup.vars = c("Lan" = "lan", "Gender" = "kon", "Age" = "alder")

Code
tmap_mode("view") # Set tmap to interactive mode
mhe23_analys_sf %>%
  filter(lan==1) %>%
  dplyr::select(lan, kon, alder) %>% # select the 'lan' column
  tm_shape() +
  tm_dots("lan", title = "Lan",  # use 'lan' as fill color for points
          popup.vars = c("Lan" = "lan", "Gender" = "kon", "Age" = "alder")) +
  tm_basemap(server = "OpenStreetMap") + # Add OSM basemap
  tm_view(set.view = c(lon = 18.0686, lat = 59.3293, zoom = 12)) + # Set initial view over Stockholm with city-level zoom
  tm_layout(legend.position = c("right", "center")) # set legend position
Code
library(tmap)

tmap_mode("view") # Set tmap to interactive mode

mhe23_analys_sf %>%
  filter(lan==1) %>%
  dplyr::select(lan, kon, alder) %>% # select the 'lan' column
  tm_shape() +
  tm_dots("alder", title = "Age",  # use 'lan' as fill color for points
          popup.vars = c("Lan" = "lan", "Gender" = "kon", "Age" = "alder")) +
  tm_basemap(server = "OpenStreetMap") + # Add OSM basemap
  tm_view(set.view = c(lon = 18.0686, lat = 59.3293, zoom = 12)) + # Set initial view over Stockholm with city-level zoom
  tm_layout(legend.position = c("right", "center")) # set legend position

Interactive plots on non-GIS data

We can plot data in interactive way using plotly library. Below is an example of interactive histogram. That might be useful for data exploration and presentation especially when more than one dimension of data is presented.

Code
library(plotly)
mhe23_analys %>% # add ggplot histogram
  ggplot(aes(x = alder)) +
  geom_histogram() +
  theme_minimal() -> p
p

Code
p %>% plotly::ggplotly()

practical example of interactive plot

The code below is intended for analyzing noise annoyance data. Function generate_f43_indicator loads data (dt_all) with raster noise values and calculates the annoyance rate for different noise aspects (f43a to f43j) within specific noise levels. The user can dynamically set the bin width (binwidth) to group the noise data.

The annoyance rates are calculated for various aspects such as difficulty in listening to radio/TV, talking on the phone, and resting, among others. Finally, the processed data is used to create an exposure-response graph for these different types of noise annoyance indicators, visualizing the relationship between noise exposure levels and the percentage of people annoyed. The code also integrates the usage of the plotly library to make the graph interactive and includes saving and reading intermediate data.

The labels have been translated into English for easier understanding.

Code
# Function to dynamically adjust the breaks and recalculate annoyance rates for f43a to f43j
generate_f43_indicator <- function(binwidth=2) {
  max_value <- max(dt_all$raster_noise_value, na.rm = TRUE)
  # Recut the noise bins based on the dynamic binwidth
  dt_all$noise_bins <- cut(
    dt_all$raster_noise_value,
    breaks = seq(20, max_value, by = binwidth),
    include.lowest = TRUE,
    right = FALSE
  )
  # Recalculate annoyance percentages for each f43 variable
  percent_f43a <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43a == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43b <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43b == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43c <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43c == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43d <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43d == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43e <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43e == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43f <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43f == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43g <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43g == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43h <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43h == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43i <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43i == 1, 1, 0), na.rm = TRUE) * 100, count = n())
  percent_f43j <- dt_all %>%
    group_by(noise_bins) %>%
    summarize(percent = mean(ifelse(f43j == 1, 1, 0), na.rm = TRUE) * 100, count = n())
    # Combine all annoyance data into one dataset for plotting
  indicator_combined <- bind_rows(
    percent_f43a %>% mutate(source = "F43A - Difficulty listening to radio/TV"),
    percent_f43b %>% mutate(source = "F43B - Difficulty talking on the phone"),
    percent_f43c %>% mutate(source = "F43C - Difficulty talking to someone"),
    percent_f43d %>% mutate(source = "F43D - Difficulty resting"),
    percent_f43e %>% mutate(source = "F43E - Difficulty falling asleep"),
    percent_f43f %>% mutate(source = "F43F - Being woken up"),
    percent_f43g %>% mutate(source = "F43G - Reduced sleep quality"),
    percent_f43h %>% mutate(source = "F43H - Difficulty keeping window open during the day"),
    percent_f43i %>% mutate(source = "F43I - Difficulty sleeping with open window"),
    percent_f43j %>% mutate(source = "F43J - Difficulty being outside on balcony/patio")
  ) %>%
    st_drop_geometry()  # Drop geometry column if present
  # Create a mid_value to represent the midpoint of the noise bin
  indicator_combined <- indicator_combined %>%
    mutate(
      lower_bound = as.numeric(gsub("\\[|\\)|\\,.*", "", noise_bins)),  # Lower bound
      upper_bound = as.numeric(gsub(".*\\,|\\)", "", noise_bins)),      # Upper bound
      mid_value = (lower_bound + upper_bound) / 2 ,                     # Mid-point of bin
      # keep only one decimal with percent
      percent = round(percent, 1),
    # Calculate the difference for filling NA
     diff_value = lag(mid_value) - lag(lower_bound),
    # Fill NA mid_values by adding the difference to the current lower_bound
     mid_value = ifelse(is.na(mid_value), lower_bound + diff_value, mid_value)
  ) %>%
  ungroup()  # Ungroup after processing
  return(indicator_combined)
}
Code
# Load required libraries
library(dplyr)
library(gt)


# indicator_combined<-
# generate_f43_indicator(binwidth = 2)
# saveRDS(indicator_combined, file="MHE23_SCAPIS_low_level_noise_f43.rds")
indicator_combined<-readRDS("MHE23_SCAPIS_low_level_noise_f43.rds")
p<-
  ggplot(indicator_combined, aes(x = mid_value, y = percent, color = source, group = source)) +
  geom_line() +  # Create the lines for each source
  geom_point() +  # Add points to each line
  labs(title = "Exposure-Response aspects of traffic noise",
       x = "Noise Exposure Level (dB)",
       y = "Percentage Annoyed (%)",
       color = "Noise Source") +  # Label for the legend
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(legend.position = "bottom")
plotly::ggplotly(p,tooltip =c("y", "color"),dynamicTicks =T )
Code
# generate_f43_indicator(binwidth = 5)  %>%
#   select(`Exposure level`=mid_value,percent, count, source) %>%
#   gt(groupname_col = "source")