Assignment 2 - Q2

Author

21315316

Time Series for Weather Stations using Shiny

1 Introduction

1.1 Brief Overview

This blog presents an in-depth analysis of rainfall per month from four key weather stations across Ireland: Belfast, Dublin Airport, University College Galway, and Cork Airport. By pairing this comprehensive dataset with an interactive and intuitive map, we can perform a critical analysis of both long-term and short-term trends.

This blog’s interactive tools and visuals will allow you to:

  • Examine monthly rainfall data through dynamic time series graphs
  • Explore geographical variations using an interactive map of Ireland with a universal time selector
  • Compare rainfall patterns across all four stations simultaneously and individually
  • Focus on individual stations for detailed analysis

The centerpiece of our visualization is a dygraph with a range selector, enabling you to easily zoom in on specific time periods. The selected time window is retained when you switch stations, which makes direct comparisons between locations easier. By selecting “All” in the station dropdown, you can view all four stations at once.

1.2 Interactive Shiny Application

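### shiny app
library(shiny)
library(sf)        # st_read, st_transform, st_coordinates
library(dplyr)     # %>%, select, filter, summarise
library(dygraphs)  # dygraph, dyRangeSelector, dyHighlight
library(leaflet)   # leaflet, addTiles, addMarkers
library(DT)        # renderDT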
ui <- fluidPage(
    titlePanel(p("Irish Weather Stations and Rainfall per Month", style="color:#eb6b34")),
    sidebarLayout(
        sidebarPanel(
            selectInput(
                inputId = "variableselected",
                label = "Select Station",
                choices = c("Belfast","Dublin Airport","University College Galway","Cork Airport","All")
            ),
            p("Map of Ireland and Stations"),
            leafletOutput(outputId = "map")
        ),
        mainPanel(
            dygraphOutput(outputId = "timetrend")
        )
    )
)

server <- function(input, output, session) {
    
    rainstat <- reactive({
        df = st_read('weather_stations.geojson', quiet=TRUE)
        as.data.frame(df) %>%
            dplyr::select(Year, Month, Rainfall, Station)
    })
    
    map_data <- reactive({
        st_read('weather_stations.geojson', quiet=TRUE) %>%
            dplyr::select(Station, geometry) %>%
            dplyr::filter(Station %in% c("Belfast","Dublin Airport","University College Galway","Cork Airport")) %>%
            unique() %>%
            st_transform(4326)
    })
    
    # Call the reactive to get its value; note the UI defines no matching "table" output
    output$table <- renderDT(rainstat())
    
    output$timetrend <- renderDygraph({
        rainstat <- rainstat()
        dataxts <- NULL
        
        if(input$variableselected == "All"){
            filter_stats <- c("Belfast","Dublin Airport","University College Galway","Cork Airport")
            for(l in 1:4){
                rainstat %>% as.data.frame() %>%
                    dplyr::filter(Station == filter_stats[l]) %>% 
                    dplyr::summarise(Rainfall=sum(Rainfall), .by = c(Year,Month)) %>%
                    pull(Rainfall)  -> data_station
                dd <- ts(
                    data = as.data.frame(data_station),
                    start=c(1850,1), freq=12
                )
                
                dataxts <- cbind(dataxts,dd)
            }
            colnames(dataxts) <- filter_stats
            dygraph(dataxts) %>%
                dyRangeSelector(retainDateWindow = TRUE) %>%
                dyHighlight(highlightSeriesBackgroundAlpha = 0.2) -> d1
            d1$x$css <- "
 .dygraph-legend > span {display:none;}
 .dygraph-legend > span.highlight { display: inline; }
 "      
            d1
        }else{
            rainstat %>%
                filter(Station == input$variableselected) %>% 
                dplyr::summarise(Rainfall=sum(Rainfall), .by =c(Year,Month)) %>%
                pull(Rainfall) %>%
                ts(start=c(1850,1), freq=12) -> unique_stat
            
            dygraph(unique_stat) %>%
                dyRangeSelector(retainDateWindow = TRUE) %>%
                dyHighlight(highlightSeriesBackgroundAlpha = 0.2) -> d1
            d1   
        }
    })
    
    output$map <- renderLeaflet({
        rainstat <- rainstat()
        map <- map_data()
        
        if(input$variableselected == "All"){
            rainstat_stats <- rainstat %>% filter(Year == 1850)
            order_stats <- match(map$Station, rainstat_stats$Station)
            map$rain <- rainstat_stats[order_stats,]
            
            coords <- map %>%
                select(geometry) %>%
                st_coordinates() %>%
                as.data.frame() %>%
                rename(longitude = X, latitude = Y)
            
            map <- map %>%
                mutate(longitude = coords$longitude, latitude = coords$latitude)
            
            labels <- sprintf("%s", input$variableselected) %>%
                lapply(htmltools::HTML)
            l <- leaflet() %>%
                addTiles() %>%
                addMarkers(data = map,
                           lng = ~longitude, lat = ~latitude,
                           popup = ~Station,
                           layerId = ~Station)
            l
        }else{
            # Apply both conditions together; previously the second filter overwrote the first
            rainstat_stats <- rainstat %>%
                filter(Year == 1850, Station == input$variableselected)
            order_stats <- match(map$Station, rainstat_stats$Station)
            map$rain <- rainstat_stats[order_stats,]
            
            coords <- map %>%
                filter(Station == input$variableselected) %>%
                select(geometry) %>%
                st_coordinates() %>%
                as.data.frame() %>%
                rename(longitude = X, latitude = Y)
            
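            # Keep only the selected station; otherwise its single coordinate pair
            # would be recycled onto all four markers
            map <- map %>%
                filter(Station == input$variableselected) %>%
                mutate(longitude = coords$longitude, latitude = coords$latitude)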
            
            labels <- sprintf("%s", input$variableselected) %>%
                lapply(htmltools::HTML)
            
            l <- leaflet() %>%
                addTiles() %>%
                addMarkers(data = map,
                           lng = ~longitude, lat = ~latitude,
                           popup = ~Station,
                           layerId = ~Station)
            l
        }
    })
    
    observeEvent(input$map_marker_click, {
        click <- input$map_marker_click
        station <- click$id
        updateSelectInput(session, "variableselected", selected = station)
    })
}

shinyApp(ui = ui, server = server, options = list(height=700))


2 Exploration of the data:

The data set “weather_stations.geojson” consists of 49500 observations and 12 columns, including crucial variables such as Year, Month, Rainfall, Station and geometry, which were used to build the time series and the map of Ireland with the stations. The data spans from January 1850 to December 2014, providing a comprehensive 165-year view of rainfall patterns across four Irish weather stations.

In this section I will perform a thorough analysis of the rainfall data through the following steps:

  • Data Overview: Examine the structure and content of the dataset, focusing on the key variables used for our analysis.
  • Boxplot Analysis: Visualize and compare the distribution of rainfall across the four stations, highlighting variations in median values and data spread.
  • Long-term Trend Analysis: Investigate the overall rainfall trends from 1850 to 2014, identifying any significant changes or patterns over time.
  • Seasonal Trend Analysis: Explore monthly rainfall patterns to uncover any recurring seasonal trends across the stations.

2.1 Boxplot for stations

First, let us get a general idea of the data we are working with. A box plot provides a compact summary of the rainfall distribution across the four weather stations:

Three of the four stations show a similar average and variability in the collected rainfall data; the horizontal red line marks the mean rainfall across all four stations. Dublin Airport, however, has a noticeably lower average rainfall and variability.

University College Galway and Cork Airport exhibit the highest variability in rainfall, as evidenced by their larger box sizes and longer whiskers. Belfast shows moderate variability, while Dublin Airport has the least.

On top of that, the box plot suggests a broadly similar number of outliers at each station. The tibble underneath puts the counts between 22 (University College Galway) and 42 (Dublin Airport); perhaps they come from the same error? This becomes easier to see in the next plot.

rainstat%>%
    filter(Station %in% c("Belfast","Dublin Airport","University College Galway","Cork Airport") )%>%
    summarise(mean_rainfall = mean(Rainfall, na.rm = TRUE)) %>%
    pull(mean_rainfall) -> mean_rainfall

rainstat%>%
    filter(Station %in% c("Belfast","Dublin Airport","University College Galway","Cork Airport") )%>%
    ggplot(aes(x=Station, y=Rainfall)) +
    geom_boxplot() + geom_hline(aes(yintercept = mean_rainfall), color="red")

rainstat%>%
    filter(Station %in% c("Belfast","Dublin Airport","University College Galway","Cork Airport") )%>%
    group_by(Station) %>%
    summarise(
        Q1 = quantile(Rainfall, 0.25),
        Q3 = quantile(Rainfall, 0.75),
        IQR = IQR(Rainfall),
        lower_bound = Q1 - 1.5 * IQR,
        upper_bound = Q3 + 1.5 * IQR,
        outliers = sum(Rainfall < lower_bound | Rainfall > upper_bound)
    )
# A tibble: 4 × 7
  Station                      Q1    Q3   IQR lower_bound upper_bound outliers
  <chr>                     <dbl> <dbl> <dbl>       <dbl>       <dbl>    <int>
1 Belfast                    55.6 114.   58.1       -31.6        201.       29
2 Cork Airport               54.4 134.   79.5       -64.9        253.       32
3 Dublin Airport             37.3  79.7  42.4       -26.3        143.       42
4 University College Galway  65.2 132.   66.6       -34.7        232.       22
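
The bounds in the tibble follow the usual box plot rule: a value is flagged when it lies more than 1.5 × IQR beyond the quartiles. Below is a minimal base R sketch of that rule. The helper name count_outliers and the values are made up for illustration and are not taken from the station data.

```r
# Hypothetical helper: count values outside the 1.5 * IQR fences.
count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Made-up monthly totals (mm), not taken from the dataset.
rain <- c(40, 55, 60, 62, 70, 75, 80, 95, 300)

# Q1 = 60 and Q3 = 80, so the fences are 30 and 110; only 300 falls outside.
count_outliers(rain)
```

Since every lower bound in the tibble is negative and rainfall cannot be, all of the flagged months in the table are unusually wet ones.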

2.3 Seasonal Analysis:

This section covers the seasonal analysis of rainfall at the four weather stations. It reveals some annual patterns, which are broken down below.

Overall Seasonal Pattern: For Cork and Galway, rainfall varies the most at the beginning and end of the year, forming an arc-like pattern that dips during the summer months.
For Dublin and Belfast, the variability of rainfall seems to increase from May/June onwards until the end of January. One would expect the seasonal patterns to be the same across stations, but that is not really the case here; it would be interesting to look into this further.
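
One way to check this visual impression numerically would be to compute the spread of rainfall for each calendar month. The sketch below uses simulated values and a hypothetical data frame rain_df, so its numbers say nothing about the real stations; it only shows the shape of the calculation.

```r
set.seed(1)

# Simulated stand-in for one station: 30 years of monthly totals (not real data).
rain_df <- data.frame(
  Year = rep(1981:2010, each = 12),
  Month = factor(rep(month.abb, times = 30), levels = month.abb),
  Rainfall = rgamma(360, shape = 4, scale = 20)
)

# Standard deviation of rainfall for each calendar month, January to December.
monthly_sd <- tapply(rain_df$Rainfall, rain_df$Month, sd)
round(monthly_sd, 1)
```

Running the same tapply() call per station on the real data would show whether the month-to-month spread really differs between the east and west coast stations.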

Process:
We first filter the data set to the four stations. gg_season requires a tsibble, so we must convert the data read from the geojson file into that form.

The data only provides Year and Month columns, whereas tsibble needs a date-time index, e.g. YEAR-MONTH-DAY HOUR:MINUTE:SECOND. I create this with mutate(date = as.POSIXct(as.yearmon(paste0(Year, Month), "%Y %B"))). The as.yearmon function takes the combined year and month string; %Y reads the four-digit year and %B reads the month name (stored here as a three-letter abbreviation, which %B also accepts on input) and converts it to a month number. These are combined into a year-month value, and as.POSIXct then adds the time component, giving a variable that holds both date and time.

I believe the error I was getting arose because monthly timestamps stored as date-times are not evenly spaced. Setting regular = FALSE tells as_tsibble to treat the index as irregularly spaced instead of trying to infer a fixed interval.
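
The same year and month parsing can be sketched in base R without zoo. This is an alternative for illustration, not the code used in this post. It assumes that Month holds English abbreviations such as "Jan" and that the R session uses an English (or C) locale.

```r
# Illustrative values only; Month is assumed to hold English abbreviations.
Year  <- c(1850, 1850, 2014)
Month <- c("Jan", "Feb", "Dec")

# Pin each observation to the first day of its month.
date <- as.Date(paste(Year, Month, "01"), format = "%Y %b %d")
date

# Converting to POSIXct adds a midnight time component.
as.POSIXct(date, tz = "UTC")
```

Here %b is the format code for the abbreviated month name, and the explicit "01" supplies the day that a Year and Month pair lacks.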

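library(sf)         # st_read
library(dplyr)      # select, filter, mutate, distinct
library(zoo)        # as.yearmon
library(tsibble)    # as_tsibble
library(feasts)     # gg_season
library(ggplot2)    # labs
library(gridExtra)  # grid.arrange

data <- st_read('weather_stations.geojson', quiet=TRUE)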

filter_stats <- c("Belfast","Dublin Airport","University College Galway","Cork Airport")
p_i <- list()

RAIN_pre <- as.data.frame(data) %>%
         dplyr::select(Year, Month, Rainfall, Station) %>%
         dplyr::filter(Station %in% filter_stats) %>%
         mutate(date = as.POSIXct(as.yearmon(paste0(Year, Month), "%Y %B")))
for(i in 1:4){
    RAIN <- as_tibble(RAIN_pre) %>%
        filter(Station == filter_stats[i]) %>%
        distinct(date, .keep_all = TRUE) %>%
        as_tsibble(index = date, regular = FALSE)
    
    p <- RAIN %>%
        gg_season(Rainfall, period = "year") + 
        labs(title = paste0("Seasonal Plot of Rainfall - ", filter_stats[i]), x = "Month", y = "Rainfall")
    
    p_i[[i]] <-p
}

# Combine the plots into one
grid.arrange(grobs = p_i, ncol = 2)

3 Using Shiny for an interactive graph

This section explains how the blog implements the interactive Shiny element seen in section 1.2. The app consists of two main components: the user interface (ui) and the server.

3.1 User Interface (UI)

The ui is split into two main parts:

  1. Sidebar Panel:
     • Contains a dropdown menu (selectInput) for station selection
     • Displays a map of Ireland with station markers (leafletOutput)
  2. Main Panel:
     • Shows the time series graph of rainfall data (dygraphOutput)

One can choose between the listed weather stations or opt to see them all. This choice affects how the server handles the input.

sidebarPanel(
    selectInput(
        inputId = "variableselected",
        label = "Select Station",
        choices = c("Belfast","Dublin Airport","University College Galway","Cork Airport","All")
    ),
    p("Map of Ireland and Stations"),
    leafletOutput(outputId = "map")
),
mainPanel(
    dygraphOutput(outputId = "timetrend")
)

3.2 Server

The server side takes the input from the UI and updates the graph and the map to show the requested values. First, the server defines two reactive expressions that extract the data the outputs require.

  • rainstat: Loads and processes the main dataset for time series visualization.
  • map_data: Prepares geographical data for the interactive map.
rainstat <- reactive({
        df = st_read('weather_stations.geojson', quiet=TRUE)
        as.data.frame(df) %>%
            dplyr::select(Year, Month, Rainfall, Station)
})
    
map_data <- reactive({
    st_read('weather_stations.geojson', quiet=TRUE) %>%
        dplyr::select(Station, geometry) %>%
        dplyr::filter(Station %in% c("Belfast","Dublin Airport","University College Galway","Cork Airport")) %>%
        unique() %>%
        st_transform(4326)
})

3.3 Timetrend

This part of the server builds the time series of monthly rainfall. The first thing to consider is whether the user wants to view all stations or an individual station.

If the user chooses to see all, the code builds each station’s time series in a for loop and combines them into one dygraph, as seen in (i). The hover highlighting makes it easy to pick out a specific station.

If just one station is selected, it is taken from input$variableselected, as seen in (ii). Setting dyRangeSelector(retainDateWindow = TRUE) keeps the time range consistent when switching between stations.

# (i) All stations: build one series per station inside the loop
if(input$variableselected == "All"){
    filter_stats <- c("Belfast","Dublin Airport","University College Galway","Cork Airport")
    for(l in 1:4){
        rainstat %>% as.data.frame() %>%
            dplyr::filter(Station == filter_stats[l]) %>% 
            dplyr::summarise(Rainfall=sum(Rainfall), .by = c(Year,Month)) %>%
            pull(Rainfall)  -> data_station
    .
    .
    .
# (ii) Single station: build one series from the selected station
rainstat %>%
    filter(Station == input$variableselected) %>% 
    dplyr::summarise(Rainfall=sum(Rainfall), .by =c(Year,Month)) %>%
    pull(Rainfall) %>%
    ts(start=c(1850,1), freq=12) -> unique_stat

dygraph(unique_stat) %>%
    dyRangeSelector(retainDateWindow = TRUE) %>%
    dyHighlight(highlightSeriesBackgroundAlpha = 0.2) -> d1
d1   
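
One point worth keeping in mind: ts() only receives a vector of values plus a start date, so the series lines up with the calendar only if the rows are in chronological order with no missing months. Below is a minimal base R sketch of sorting before building the series. The values are made up, and it assumes Month holds abbreviations such as "Jan".

```r
# Made-up rows, deliberately out of order; not taken from the dataset.
df <- data.frame(
  Year = c(1850, 1850, 1850, 1850),
  Month = c("Mar", "Jan", "Apr", "Feb"),
  Rainfall = c(30, 10, 40, 20)
)

# Sort by year, then by calendar position of the month abbreviation.
df <- df[order(df$Year, match(df$Month, month.abb)), ]

# Monthly series starting in January 1850.
rain_ts <- ts(df$Rainfall, start = c(1850, 1), frequency = 12)

# The value stored for March 1850 is now the one recorded for "Mar".
window(rain_ts, start = c(1850, 3), end = c(1850, 3))
```

If the rows of the real data are already ordered by date within each station, this extra sort changes nothing; it simply makes the assumption explicit.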

3.4 Map of Ireland

The implementation of the interactive map of Ireland involves converting geometry data to longitude and latitude coordinates and creating a responsive leaflet map.

There are two variations, depending on whether the user wants to see all the stations or just one:

  • If only one station is chosen, this is again recognized using input$variableselected, which is used to filter rainstat_stats.
  • If “All” is chosen, the code instead uses all four stations, as shown below.
rainstat_stats <- rainstat %>% filter(Year == 1850)
order_stats <- match(map$Station, rainstat_stats$Station)
map$rain <- rainstat_stats[order_stats,]

# Extract longitude/latitude from the point geometry
coords <- map %>%
    select(geometry) %>%
    st_coordinates() %>%
    as.data.frame() %>%
    rename(longitude = X, latitude = Y)

map <- map %>%
    mutate(longitude = coords$longitude, latitude = coords$latitude)

labels <- sprintf("%s", input$variableselected) %>%
    lapply(htmltools::HTML)
l <- leaflet() %>%
    addTiles() %>%
    addMarkers(data = map,
               lng = ~longitude, lat = ~latitude,
               popup = ~Station,   # the column is Station, not Stations
               layerId = ~Station)
l
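
The match() call in the snippet is what lines the rainfall rows up with the order of the stations in the map data. Below is a small base R sketch of that alignment, with made-up station order and values used purely for illustration.

```r
# Made-up station order and values, for illustration only.
map_stations <- c("Cork Airport", "Belfast", "Dublin Airport")
rain <- data.frame(
  Station = c("Belfast", "Dublin Airport", "Cork Airport"),
  Rainfall = c(10, 20, 30)
)

# For each map station, find the row of `rain` holding the same station.
idx <- match(map_stations, rain$Station)

# Rows of `rain` reordered to follow the map's station order.
rain[idx, ]
```

Note that match() returns only the first matching position, so when a station has several rows for 1850, only the first of them is attached to the map.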

Finally, the observer below lets a click on a station marker show the corresponding dygraph. It uses the session object to update the dropdown selection, which in turn re-renders the outputs.

observeEvent(input$map_marker_click, {
        click <- input$map_marker_click
        station <- click$id
        updateSelectInput(session, "variableselected", selected = station)
})

This interaction allows users to click on a station marker on the map and automatically update the time series graph to display data for that specific station. It creates a seamless connection between the geographical representation and the temporal data visualization, overall improving the user experience.

4 Conclusion

This interactive visualization tool provides valuable insights into Ireland’s rainfall patterns. The analysis reveals some important findings on spatial variation, long- and short-term trends, seasonal patterns and data quality considerations. Using an interactive Shiny app helped to explore these patterns and trends in depth. The same analysis could be applied to all the weather stations in Ireland, which is something I would like to expand on.