Weather Conditions and their Influence on Human Mobility Patterns

GEO 880 - Computational Movement Analysis - Final Project

Author

Lars Weidinger & Javier Feller

Published

June 21, 2026

Abstract

Ensuring safe and efficient pedestrian mobility requires a close understanding of its influencing factors. Weather conditions, particularly precipitation and temperature, are among the important external factors affecting walking behaviour. In light of climate change, with increasing temperatures and changing precipitation variability, capturing these effects becomes increasingly important. Based on a case study of Zurich, this study examines how precipitation and temperature (external factors) influence both the spatial mobility extent (navigation capacity) and walking speed (motion capacity).

The results suggest that increasing temperature can lead to higher walking speed, except under very cold conditions. Furthermore, the spatial extent decreases with longer precipitation duration. While the data points are too few to detect significant relationships, this analysis shows initial evidence of weather-related effects on pedestrian mobility and serves as a basis for further, more in-depth studies.

Introduction

Since the dawn of human evolution, humans have walked across the surface of the earth. To this day, walking remains the most essential mode of transport, since almost all trips include some part of moving on foot (Giannoulaki and Christoforou (2024)). Pedestrian movement has therefore been studied in depth to create appealing and efficient walking infrastructure.

One of the most relevant variables in pedestrian movement is walking speed. According to Giannoulaki and Christoforou (2024), it depends on five factors: flow characteristics, pedestrian attributes, layout configuration, ambient conditions and behavioural patterns. This project focuses on ambient conditions, namely weather (temperature, precipitation), and their effect on pedestrian mobility patterns. As the conceptual movement space, we use an unconstrained movement model (Laube (2009)), since walking mobility does not always occur on fixed routes.

Following the established framework for context-aware movement analysis (CAMA) (Nathan et al. (2008)), we investigate precipitation and temperature effects (external factors) on spatial extent (navigation capacity) and on speed levels (motion capacity), noting that weather can affect motion capacity both directly and indirectly.

Adapted framework from Nathan et al. (2008)

In the northern hemisphere, low temperatures and precipitation are generally associated with poorer walking conditions, as rain, snow and ice reduce pavement quality and require more cautious, slower movement (Montigny, Ling, and Zacharias (2012)). However, the relationship is complex since pedestrians may also increase their speed under unfavourable weather to minimize exposure and reach their destination faster (Giannoulaki and Christoforou (2024)). Warmer temperatures up to a certain threshold encourage walking, while higher temperatures (23°C or more) may lead pedestrians to seek shade and alter their routes (Montigny, Ling, and Zacharias (2012)). A speed increase in heat was also found by Rotton, Shats, and Standers (1990), which is explained through the Yerkes–Dodson law, where moderate arousal (=heat) improves performance while excessive stress reduces it (an inverted U-shaped curve) (Yerkes and Dodson (1908)). Furthermore, walking may also serve as a heat-production mechanism in cold temperatures, resulting in faster speeds (Shuichi et al. (2021)).

Precipitation is one of the strongest determinants of pedestrian activity. Rainfall significantly reduces pedestrian traffic, and this effect extends beyond the immediate weather event, as travellers may anticipate rainfall and adjust their mobility decisions in advance (Zhao et al. (2019)).

This project focuses on the impact of temperature and precipitation on pedestrian mobility patterns and walking behaviour, leading to the following Research Questions (RQ):
I) How does temperature affect walking speed?
II) How does precipitation affect walking speed?
III) How do temperature and precipitation affect the spatial extent of activity spaces?
IV) Do different temporal scales of the weather data matter?

Since previous research showed the importance and sensitivity of varying scale (Laube and Purves (2011), Zhao et al. (2019)), special attention is given to it in this analysis. For the weather station data, we account for the effect of spatial scale by comparing city-level and station-level analyses, and for the effect of temporal scale by using daily, hourly and 15 min weather data. Thus, we also perform a small cross-scale movement analysis. For the mobility data, we do not include a cross-scale movement analysis, but discuss our choices and parameters extensively.

Material and Methods

Load Packages

Code

library(pacman)

# Load the packages; p_load() attempts to install any package that is not found in the library
pacman::p_load(here,
               sf,
               readr,
               dplyr,
               ggplot2,
               tidyr,
               jsonlite, # to import JSON files
               lubridate, # temporal import from JSON file
               tidyverse,
               leaflet, # interactive maps, more personal experience than on tmap
               tmap, # interactive maps
               concaveman, # construct concave hulls
               gstat, # for IDW
               stars, # to create a grid
               purrr,
               devtools, # for the wordcountaddin to work
               wordcountaddin, # counts words and characters
               install = TRUE)

Import & Preprocess Data

preprocessing

source("preprocessing.R") # this runs the whole preprocessing file

# See the preprocessing.R file where we import:
# - weather data (weather_daily_df, weather_hourly_df, meteoblue_weather_data)
# - weather station location data (weather_stations_sf, meteoblue_stations)
# - tracking data (google_timeline_sf_lv95)
# - city boundary data (stadtgrenze)

# set CRS
crs_lv95  <- 2056
crs_wgs84 <- 4326

Weather Daily (Voronoi Polygon)

Weather Stations

Code

ggplot() +
  geom_sf(data = stadtgrenze, fill = "#D5F4FF", color = "black") +
  geom_sf(data = weather_stations_sf, aes(color = ID), size = 3) +
  
  coord_sf(datum = crs_lv95) +
  
  labs(
    title = "Weather Stations within Zurich",
    color = "Station",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Construct Voronoi Polygons around Stations

Since we need to assign the weather data to our GPS fixes, and later to our segments, we need to relate the two datasets spatially. For that, we construct Voronoi polygons around the measuring stations. We can then assign each Google timeline fix the weather values of the station it is closest to (i.e. the station whose Voronoi polygon contains the fix). This method is simple but effective.
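The idea behind this assignment can be illustrated without any spatial package: a fix lies inside the Voronoi polygon of a station exactly when that station is its nearest one. The following is a minimal base R sketch on hypothetical toy coordinates (the station names `A`, `B`, `C` and all values are assumptions for illustration, not the real Zurich stations).

```r
# Toy coordinates in metres (hypothetical values, not the real station locations)
stations <- data.frame(
  ID = c("A", "B", "C"),
  x  = c(0, 1000, 0),
  y  = c(0, 0, 1000)
)

fixes <- data.frame(
  x = c(100, 900, 200),
  y = c(100, 50, 800)
)

# Squared Euclidean distance from every fix (rows) to every station (columns)
d2 <- outer(fixes$x, stations$x, "-")^2 + outer(fixes$y, stations$y, "-")^2

# The nearest station is the one whose Voronoi cell contains the fix
fixes$ID <- stations$ID[apply(d2, 1, which.min)]

print(fixes)
```

For a handful of stations, this nearest-neighbour lookup and the polygon-based spatial join should give the same assignment; the polygons have the advantage that they can be mapped.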

Code

# both datasets are in CH1903+

# create a union of points
pts_union <- st_union(weather_stations_sf)

# create Voronoi tessellation with an envelope
# st_voronoi has an envelope argument. Instead of generating a huge (or too small) tessellation we can set the scope here
envelope_stadtgrenze <- st_as_sfc(st_bbox(stadtgrenze))

voronoi <- st_voronoi(
  pts_union,
  envelope = envelope_stadtgrenze)

# extract polygons
voronoi_polygons <- st_collection_extract(voronoi, "POLYGON")

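# convert to sf object and attach the station ID via a spatial join
# (st_voronoi() does not guarantee that the polygons follow the order of the input points)
voronoi_sf <- st_sf(geometry = st_cast(voronoi_polygons)) |>
  st_join(weather_stations_sf["ID"], join = st_contains)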

# clip to city boundary
voronoi_clipped <- sf::st_intersection(voronoi_sf, stadtgrenze)

Code

ggplot() +
  geom_sf(data = voronoi_clipped, aes(fill = ID)) +
  
  coord_sf(datum = crs_lv95) +
  
  labs(
    title = "Voronoi Polygons around\n Weather Stations in Zurich",
    fill = "Station",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Assign Google Timeline to Voronoi Polygons

Code

# We can now use a spatial join to assign the corresponding weather station to each of the Google timeline fixes.

# both datasets are in CH1903+

# spatial join: assign polygon attributes (ID) to points
google_timeline_sf_lv95 <- st_join(
  google_timeline_sf_lv95,
  voronoi_clipped["ID"],
  join = st_within) |>
  
  # Since we traveled quite a lot outside of the city of Zurich, we omit the data
  # outside the city (no station ID assigned), which is approximately 60% of the data points.
  drop_na(ID) # omit fixes without a station ID (quite a lot of them)

Code

# visualise timeline points
ggplot() +
  geom_sf(data = stadtgrenze, fill = "#D5F4FF", color = "black") +
  
  geom_sf(data = google_timeline_sf_lv95, color = "orange", size = 0.6) +

  
  coord_sf(datum = crs_lv95) +
  
  labs(
    title = "Google Timeline Data Points\n within Zurich",
    color = "Stop",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Assign Weather Data to Google Timeline

Code

# Now that we have assigned each data point to its closest weather station, we need to assign them their relevant weather values. We decided on using temperature (T) and rain duration (RainDur). One problem is that the station Heubeeribüel does not collect RainDur information.

# based on the exercises of week 5, Task 3 ff we first create a join key so that we can later match them exactly

google_timeline_sf_lv95 <- google_timeline_sf_lv95 |>
  mutate(
    Date = as.Date(time)) # daily join key, since the daily weather data has one value per day

weather_daily_df <- weather_daily_df |>
  mutate(
    Date = as.Date(Datum)) # daily join key, since we only have one measurement per day

Code

# prepare weather data
weather_daily_df_wide <- weather_daily_df |>
  filter(Parameter %in% c("T", "RainDur")) |> # we only want T and RainDur
  select(Date, Standort, Parameter, Wert) |>
  pivot_wider(                               # we need the parameters as columns
    names_from = Parameter,
    values_from = Wert
  ) |>
  rename(
    Temperature = T,
    ID = Standort
  )

Code

# join the two datasets
google_timeline_weather <- google_timeline_sf_lv95 |>
  left_join(
    weather_daily_df_wide,
    by = c("Date", "ID")
  )

Code

# visualise timeline points with voronoi polygons
ggplot() +
  geom_sf(data = voronoi_clipped, aes(fill = ID)) +
  
  geom_sf(data = google_timeline_sf_lv95, color = "orange", size = 0.6) +

  
  coord_sf(datum = crs_lv95) +
  
  labs(
    title = "Google Timeline Data Points\n within the Voronoi Polygons",
    fill = "Station",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Visualise Activity Space (Concave Hulls)

To answer RQ III we need to construct a measure of activity space. We decided on constructing concave hulls around the trips for each day. We then calculate the area of these hulls, take the average weather values across the whole hull and analyse whether the area is larger when temperature is higher or rain duration is lower.

We decided to use concave hulls rather than convex hulls since they are less prone to outliers. Even though there are some convex hull implementations that allow certain parameters to be adjusted in order to exclude outliers (e.g. adehabitatHR::mcp()), the concave hull is better suited to our use case. We could also have used alpha shapes (alphahull::), which account for holes in the data, but this is not needed since we are dealing with continuous travel data (Weibel (2024)).

As an alternative, we could have used the Standard Deviational Ellipse (SDE), a closed shape that represents the distributional properties of the point set without enclosing all points. This would potentially have been more fitting: since the Google timeline data does not capture the movement between fixes, single fixes can distort the concave hulls.
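To make the SDE alternative more concrete, the following base R sketch derives an ellipse from the covariance matrix of the fix coordinates. It is only one common formulation (the semi-axes are the square roots of the eigenvalues of the population covariance matrix; some software scales the axes differently), the function name `sde_ellipse()` is hypothetical, and it was not used for the results in this report.

```r
# Standard deviational ellipse from point coordinates (one common formulation).
# sde_ellipse() is a hypothetical helper written for this illustration.
sde_ellipse <- function(x, y) {
  centred <- cbind(x - mean(x), y - mean(y))

  # population covariance matrix (divide by n, not n - 1)
  cov_mat <- crossprod(centred) / length(x)

  eig <- eigen(cov_mat, symmetric = TRUE)
  semi_axes <- sqrt(eig$values)          # major and minor semi-axis

  list(
    centre    = c(mean(x), mean(y)),
    semi_axes = semi_axes,
    # orientation of the major axis in radians (counter-clockwise from the x-axis)
    angle     = atan2(eig$vectors[2, 1], eig$vectors[1, 1]),
    area      = pi * prod(semi_axes)
  )
}

# Toy point set (hypothetical values): more spread in x than in y
x <- c(-2, 2, 0, 0)
y <- c(0, 0, -1, 1)

ellipse <- sde_ellipse(x, y)
print(ellipse$semi_axes)
print(ellipse$area)
```

Because the ellipse area depends on the spread of all fixes rather than on the outermost ones, a single distant fix changes it less abruptly than it changes a hull.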

Construct Concave Hulls

Code

# In the first approach we called concaveman() on the entire sfc column (all geometries at once), not on each row individually. It computed one hull over all points and that same result gets recycled into every row which resulted in all the hulls being the same. We asked Claude AI (Model Sonnet 4.6) for help and it delivered the following solution using lapply.

# Step 1: union points by date
google_timeline_hulls <- google_timeline_sf_lv95 |>
  group_by(Date) |>
  summarise(geometry = st_union(geometry), .groups = "drop") |>
  st_as_sf()

# Step 2: build concave hull for each date
hulls_list <- lapply(google_timeline_hulls$geometry, function(g) {
  coords <- st_coordinates(g)[, c("X", "Y")]
  
  # need at least 3 unique points for a polygon
  if (is.null(nrow(coords)) || nrow(unique(coords)) < 3) return(NA)
  
  st_polygon(list(concaveman(coords)))
})

# Step 3: check which dates failed
failed_dates <- google_timeline_hulls$Date[sapply(hulls_list, function(x) identical(x, NA))]
print(failed_dates)  # inspect which dates are problematic

Date of length 0

Code

# Step 4: remove failed dates and assign geometry
google_timeline_hulls <- google_timeline_hulls[!sapply(hulls_list, function(x) identical(x, NA)), ]

hulls_list <- hulls_list[!sapply(hulls_list, function(x) identical(x, NA))]

# Step 5: assign hulls back as sfc column
google_timeline_hulls$geometry <- st_sfc(hulls_list, crs = st_crs(google_timeline_sf_lv95))

Code

# calculate area
google_timeline_hulls <- google_timeline_hulls |>
  mutate(
    area_m2 = as.numeric(st_area(geometry)),
    area_km2 = area_m2 / 1000000)

Code

# visualise concave hulls
ggplot() +
  geom_sf(data = stadtgrenze, fill = "transparent", color = "black") +
  
  geom_sf(data = google_timeline_hulls, aes(fill = as.factor(Date)), alpha = 0.8) +
  
  scale_fill_discrete(guide = "none") +
  
  labs(
    title = "Concave Hulls of Daily Movement in Zurich",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Assign Weather Values

Code

# average weather values across whole hull
daily_weather <- google_timeline_weather |>
  st_drop_geometry() |>
  group_by(Date) |>
  summarise(
    Temperature = mean(Temperature, na.rm = TRUE),
    RainDur = mean(RainDur, na.rm = TRUE)
  )

daily_hulls_weather <- google_timeline_hulls |>
  left_join(daily_weather, by = "Date")

Code

# visualise concave hulls with weather data
ggplot() +
  geom_sf(data = stadtgrenze, fill = "transparent", color = "black") +
  
  geom_sf(data = daily_hulls_weather[c(1, 8, 24, 35), ], aes(fill = Temperature), alpha = 1) +
  
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0, name = "Temperature [°C]") +
  
  labs(
    title = "Four Random Concave Hulls of Daily Movement in Zurich",
    color = "Temperature",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Weather Daily (Field)

We only have weather measurements at a handful of stations across the city. Assigning each location fix the weather data of the station whose Voronoi polygon it falls into is simplistic: weather changes rather gradually across space, so a more “fluid” approach would be better. That is why we decided on using Inverse Distance Weighting (IDW), a simple interpolation approach that estimates the value at a location as a weighted average of the station measurements, with closer stations receiving larger weights.
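As a minimal illustration of the IDW principle, the following base R sketch interpolates a value at a single location from two hypothetical stations. The function name `idw_point()` and all coordinates and temperatures are assumptions for this example; the weights are 1 / distance^power, normalised to sum to one, and stations with missing values are skipped.

```r
# Point-wise IDW for one location (hypothetical helper for illustration)
idw_point <- function(fix_xy, station_xy, values, power = 2) {
  # Euclidean distance from the fix to every station
  d <- sqrt((station_xy[, 1] - fix_xy[1])^2 + (station_xy[, 2] - fix_xy[2])^2)
  d[d == 0] <- 1e-9                     # avoid division by zero on a station

  valid <- !is.na(values)               # ignore stations without a measurement
  if (!any(valid)) return(NA_real_)

  w <- 1 / d[valid]^power
  sum(w * values[valid]) / sum(w)       # normalised weighted average
}

# Two toy stations, 2 units apart, measuring 10 and 20 degrees
station_xy  <- cbind(x = c(0, 2), y = c(0, 0))
temperature <- c(10, 20)

idw_mid   <- idw_point(c(1, 0),   station_xy, temperature)  # halfway between both
idw_close <- idw_point(c(0.5, 0), station_xy, temperature)  # closer to the first
idw_na    <- idw_point(c(1, 0),   station_xy, c(10, NA))    # second station missing

print(c(idw_mid, idw_close, idw_na))
```

With power = 2, the location three times closer to the first station gives it nine times the weight, which is why the interpolated value stays near 10.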

Inverse Distance Weighting (IDW)

Code

# With only few stations the interpolation accuracy will be low. We are not capturing real microclimate variation like urban heat island effects or cooling green spaces and water bodies. But since we plan on repeating our method with the finer scaled Meteoblue stations, we decided on applying the IDW method.

# Since we do not want to compute a full raster for every day, we do a point-wise IDW prediction directly at each location fix's coordinates, using all weather station information. A raster per timestamp needs a lot of computation time and we only need the value at the fix's exact coordinates. Point-wise IDW is just a distance-weighted average of all station values, which takes much less time, and we avoid having to manage hundreds of rasters.

# this code chunk was constructed with the help of Claude (Sonnet 4.6)

idw_for_date <- function(fixes_date, stations_date, power = 2) {
  # fixes_date: sf object for this date
  # stations_date: sf object with Temperature, RainDur for this date
  # power = 2 is the standard value
  
  # full distance matrix: n_fixes x n_stations
  dist_mat <- st_distance(fixes_date, stations_date)
  dist_mat <- matrix(as.numeric(dist_mat), nrow = nrow(fixes_date), ncol = nrow(stations_date))
  
  # avoid division by zero
  dist_mat[dist_mat == 0] <- 1e-9
  
  weights <- 1 / (dist_mat^power)
  weights_norm <- weights / rowSums(weights)
  
  fixes_date |>
    mutate(
      Temperature_IDW = as.numeric(weights_norm %*% stations_date$Temperature),

      RainDur_IDW = apply(weights_norm, 1,
                          function(w) {
                            valid <- !is.na(stations_date$RainDur) # ignore stations with missing RainDur
                            
                            if(sum(valid) == 0) {
                              return(NA)
                              }
                            
                            w_valid <- w[valid]
                            rain_valid <- stations_date$RainDur[valid]
                            
                            # renormalize weights only along valid stations
                            w_valid <- w_valid / sum(w_valid)
                            
                            sum(w_valid * rain_valid)
                            }
                          )
      )
}


# apply per date (groups are small: one per day x n_stations)
unique_dates <- unique(google_timeline_sf_lv95$Date)

result_list <- map(unique_dates, function(d) {
  fixes_date <- google_timeline_sf_lv95 |>
    filter(Date == d)
  stations_date <- weather_daily_df_wide |>
    filter(Date == d) |>
    inner_join(st_drop_geometry(weather_stations_sf) |>
                 select(ID), by = "ID") |>
    left_join(weather_stations_sf |>
                select(ID, geometry), by = "ID") |>
    st_as_sf()
  
  if (nrow(stations_date) == 0) {
    fixes_date$Temperature_IDW <- NA
    fixes_date$RainDur_IDW <- NA
    return(fixes_date)
  }
  
  idw_for_date(fixes_date, stations_date)
})

google_timeline_weather_daily <- bind_rows(result_list)

Assign Weather Values

Code

# average weather values across whole hull
daily_weather_field <- google_timeline_weather_daily |>
  st_drop_geometry() |>
  group_by(Date) |>
  summarise(
    Temperature = mean(Temperature_IDW, na.rm = TRUE),
    RainDur = mean(RainDur_IDW, na.rm = TRUE)
  )

daily_hulls_weather_field <- google_timeline_hulls |>
  left_join(daily_weather_field, by = "Date")

Weather Hourly (Field)

We will repeat this IDW process with the hourly weather data.
The measurements are taken by the same stations as for the daily weather data. Therefore, we only really need to adapt the input data. The concave hulls will stay the same, we will still take all fixes from one day into account to construct the movement space since our RQ IV asks about the temporal scale of the weather data.

Prepare Weather Data

Code

# prepare weather data
weather_hourly_df_wide <- weather_hourly_df |>
  filter(Parameter %in% c("T", "RainDur")) |> # we only want T and RainDur
  select(Datum, Standort, Parameter, Wert) |>
  pivot_wider(                               # we need the parameters as columns
    names_from = Parameter,
    values_from = Wert
  ) |>
  rename(
    Temperature = T,
    ID = Standort
  )

Code

google_timeline_sf_lv95 <- google_timeline_sf_lv95 |>
  mutate(
    Datum = floor_date(time, unit = "hour")
  )

Inverse Distance Weighting (IDW)

Code

# this code chunk was constructed with the help of Claude (Sonnet 4.6)

idw_for_hour <- function(fixes_date, stations_date, power = 2) {
  # fixes_date: sf object with the fixes of this hour
  # stations_date: sf object with Temperature, RainDur for this hour
  # power = 2 is the standard value
  
  # full distance matrix: n_fixes x n_stations
  dist_mat <- st_distance(fixes_date, stations_date)
  dist_mat <- matrix(as.numeric(dist_mat), nrow = nrow(fixes_date), ncol = nrow(stations_date))
  
  # avoid division by zero
  dist_mat[dist_mat == 0] <- 1e-9
  
  weights <- 1 / (dist_mat^power)
  weights_norm <- weights / rowSums(weights)
  
  fixes_date |>
    mutate(
      Temperature_IDW = as.numeric(weights_norm %*% stations_date$Temperature),

      RainDur_IDW = apply(weights_norm, 1,
                          function(w) {
                            valid <- !is.na(stations_date$RainDur) # ignore stations with missing RainDur
                            
                            if(sum(valid) == 0) {
                              return(NA)
                              }
                            
                            w_valid <- w[valid]
                            rain_valid <- stations_date$RainDur[valid]
                            
                            # renormalize weights only along valid stations
                            w_valid <- w_valid / sum(w_valid)
                            
                            sum(w_valid * rain_valid)
                            }
                          )
      )
}


# apply per hour 
unique_hours <- unique(google_timeline_sf_lv95$Datum)

result_list <- map(unique_hours, function(h) {
  fixes_hour <- google_timeline_sf_lv95 |>
    filter(Datum == h)
  stations_hour <- weather_hourly_df_wide |>
    filter(Datum == h) |>
    inner_join(st_drop_geometry(weather_stations_sf) |>
                 select(ID), by = "ID") |>
    left_join(weather_stations_sf |>
                select(ID, geometry), by = "ID") |>
    st_as_sf()
  
  if (nrow(stations_hour) == 0) {
    fixes_hour$Temperature_IDW <- NA
    fixes_hour$RainDur_IDW <- NA
    return(fixes_hour)
  }
  
  idw_for_hour(fixes_hour, stations_hour)
})

google_timeline_weather_hourly <- bind_rows(result_list)

Assign Weather Values

Code

# average weather values across whole hull but still for the whole day
hourly_weather_field <- google_timeline_weather_hourly |>
  st_drop_geometry() |>
  group_by(Date) |>
  summarise(
    Temperature = mean(Temperature_IDW, na.rm = TRUE),
    RainDur = mean(RainDur_IDW, na.rm = TRUE)
  )

hourly_hulls_weather_field <- google_timeline_hulls |>
  left_join(hourly_weather_field, by = "Date")

Weather 15 min (Field)

Now we go one step further into detail using the meteoblue weather data, which provides temperature measurements every 15 minutes at 93 stations across the whole city. It is important to note that meteoblue only measures temperature and not precipitation or rain duration.
Again, we will use the concave hulls per day, since we are interested in the effect of temporal granularity of weather data.

Prepare Weather Data

Code

# the meteoblue weather data is already in a wide format, therefore we do not need to change anything there.

google_timeline_sf_lv95 <- google_timeline_sf_lv95 |>
  mutate(
    QuarterHour = floor_date(time, unit = "15 min")
  )
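The matching of fixes to weather records relies on flooring each fix timestamp to the start of its hour or quarter hour. A minimal base R sketch of this step is given below; the helper `floor_time()` and the example timestamp are hypothetical, and the sketch floors on the epoch in UTC, which agrees with local clock time only for time zones with whole-hour offsets.

```r
# Floor a POSIXct timestamp to a multiple of `seconds` (hypothetical helper)
floor_time <- function(t, seconds) {
  as.POSIXct(
    floor(as.numeric(t) / seconds) * seconds,
    origin = "1970-01-01",
    tz = "UTC"
  )
}

# Example fix timestamp (made-up value)
t <- as.POSIXct("2026-03-01 14:37:20", tz = "UTC")

quarter_hour <- floor_time(t, 15 * 60)  # start of the 15 min interval
full_hour    <- floor_time(t, 60 * 60)  # start of the hour

print(format(quarter_hour, "%Y-%m-%d %H:%M:%S"))
print(format(full_hour, "%Y-%m-%d %H:%M:%S"))
```

The floored value can then serve as a join key against weather records that carry the interval start as their timestamp.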

Inverse Distance Weighting (IDW)

Code

# If we repeated the same code as earlier, the runtime would increase significantly: instead of hourly measurements at four stations, we have 15 min measurements at 93 stations. That is why we adapted the code slightly. 

# we move the station preparation outside the loop --> so it does not get repeated all the time
meteoblue_weather_data_join <- meteoblue_weather_data |>
  left_join(
    meteoblue_stations |> 
      select(locationID, EKoord, NKoord), by = "locationID") |>
  st_as_sf(
    coords = c("EKoord", "NKoord"),
    crs = crs_lv95
  )

Code

# this code chunk was constructed with the help of Claude (Sonnet 4.6)

idw_for_quarter_hour <- function(fixes_date, stations_date, power = 2) {
  # fixes_date: sf object with the fixes of this quarter hour
  # stations_date: sf object with Temperature (column value) for this quarter hour
  # power = 2 is the standard value
  
  # remove NA temperature stations
  stations_date <- stations_date |>
    filter(!is.na(value))
  
  # full distance matrix: n_fixes x n_stations
  dist_mat <- st_distance(fixes_date, stations_date)
  dist_mat <- matrix(as.numeric(dist_mat), nrow = nrow(fixes_date), ncol = nrow(stations_date))
  
  # avoid division by zero
  dist_mat[dist_mat == 0] <- 1e-9
  
  weights <- 1 / (dist_mat^power)
  weights_norm <- weights / rowSums(weights)
  
  fixes_date |>
    mutate(
      Temperature_IDW = as.numeric(weights_norm %*% stations_date$value)
      )
}

# apply per quarter hour
unique_quarter_hours <- unique(google_timeline_sf_lv95$QuarterHour)

result_list <- map(unique_quarter_hours, function(q) {
  
  fixes_quarter_hour <- google_timeline_sf_lv95 |>
    filter(QuarterHour == q)
  
  stations_quarter_hour <- meteoblue_weather_data_join |>
    filter(timestamp == q)
  
  if (nrow(stations_quarter_hour) == 0) {
    fixes_quarter_hour$Temperature_IDW <- NA
    return(fixes_quarter_hour)
  }
  
  idw_for_quarter_hour(fixes_quarter_hour, stations_quarter_hour)
})

google_timeline_weather_quarter_hourly <- bind_rows(result_list)

Assign Weather Values

Code

# average weather values across whole hull but still for the whole day
quarter_hourly_weather_field <- google_timeline_weather_quarter_hourly |>
  st_drop_geometry() |>
  group_by(Date) |>
  summarise(
    Temperature = mean(Temperature_IDW, na.rm = TRUE)
    )

quarter_hourly_hulls_weather_field <- google_timeline_hulls |>
  left_join(quarter_hourly_weather_field, by = "Date")

Code

# join datasets since there is no rain information in meteoblue
quarter_hourly_hulls_weather_field_join <- quarter_hourly_hulls_weather_field |>
  left_join(
    hourly_hulls_weather_field |>
      st_drop_geometry() |>
      select(RainDur, Date),
    by = "Date"
  )

Influence of Weather on Walking - ArcGIS Earth trajectories analysis

Now we switch from the timeline data to the ArcGIS Earth data, since Google timeline does not allow for precise movement parameter calculations due to the coarse sampling rate.

Segmentation of trajectories

First we split our trajectories into segments of stops and moves using a moving window. Since ArcGIS Earth has a very fine sampling rate (seconds) and we sometimes let the tracking run for a whole day, a lot of the fixes are stops and will not be needed for further analysis.
Following a similar procedure as in these studies (Guo et al. (2018)) (Bonavita, Guidotti, and Nanni (2022)), we employed spatial and temporal thresholds to classify movement and non-movement. We classified a step as “movement” if the distance travelled in one timestep was more than 5 meters and the timestep was shorter than 30 seconds. This allows for some minimal movement, such as in crowded pedestrian areas, without it automatically being classified as a stop.
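The classification rule and the grouping of consecutive steps into segments can be sketched in base R as follows. The helper names `classify_steps()` and `segment_ids()` and the toy step values are hypothetical; the thresholds are the ones stated above.

```r
# A step counts as movement if it is longer than min_dist metres
# and takes less than max_time seconds (hypothetical helper)
classify_steps <- function(step_dist, step_time, min_dist = 5, max_time = 30) {
  step_dist > min_dist & step_time < max_time
}

# Start a new segment whenever the state switches between move and stop
segment_ids <- function(moving) {
  change <- c(FALSE, moving[-1] != moving[-length(moving)])
  cumsum(change)
}

# Toy steps (made-up values): two moves, two stops, one move
step_dist <- c(6, 7, 1, 0.5, 8)     # metres
step_time <- c(10, 10, 10, 10, 10)  # seconds

moving <- classify_steps(step_dist, step_time)
ids    <- segment_ids(moving)

print(data.frame(step_dist, step_time, moving, segment_id = ids))
```

Each run of identical states receives its own identifier, so the five toy steps end up in three segments (move, stop, move).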

Code

# define function 
difftime_secs <- function(later, now){
    as.numeric(difftime(later, now, units = "secs"))
}

Code

# sort by timestamp first, so that lead() below refers to the next fix in time
tracks_lv95 <- tracks_lv95 |>
  arrange(timestamp)

Code

# define two timestamps for function using lead
later <- lead(tracks_lv95$timestamp)
now <- tracks_lv95$timestamp

tracks_lv95$timelag <- difftime_secs(later, now)

Code

# compute distance and time difference
tracks_lv95 <- tracks_lv95 |>
  mutate(
    step_dist = as.numeric(st_distance(geometry, lead(geometry), by_element = TRUE)),
    step_time = as.numeric(difftime(lead(timestamp), timestamp, units = "secs"))
  )

Code

# define threshold of moving window, movement if distance of 1 step more than 5 meters in a time of less than 30 seconds
tracks_lv95 <- tracks_lv95 |>
  mutate(
    moving_step = step_dist > 5 & step_time < 30
  )

Code

# assign new segment_id each time transition from stop to move or vice versa
# to connect the still separate points into segments, with support of ChatGPT
tracks_lv95 <- tracks_lv95 |>
  mutate(
    change = moving_step != lag(moving_step, default = first(moving_step)),
    segment_id = cumsum(change)
  )

Code

# calculate for each segment starting time, end time, duration, geometry, speed
segments <- tracks_lv95 |>
  group_by(segment_id, moving_step) |>
  summarise(
    start_time = first(timestamp),
    end_time = last(timestamp),

    duration = as.numeric(
      difftime(last(timestamp), first(timestamp), units = "secs")
    ),

    distance = sum(step_dist, na.rm = TRUE),

    speed = ifelse(duration > 0, distance / duration, NA_real_),

    speed_kmh = speed * 3.6,

    n_points = n(),
    geometry = st_union(geometry),
    .groups = "drop"
  )
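When step_dist and step_time both describe the step from a fix to the next one, summing step_dist over a segment covers one more step than the time between the segment's first and last timestamp. The following base R sketch, on hypothetical toy values, illustrates the difference and one possible way to keep distance and duration consistent; it is not the computation used for the results in this report.

```r
# Toy track: one fix every 10 s, walking at 1 m/s, then standing still
# (hypothetical values for illustration only)
track <- data.frame(
  t         = c(0, 10, 20, 30),         # seconds since start
  step_dist = c(10, 10, 0, NA),         # metres to the next fix
  step_time = c(10, 10, 10, NA),        # seconds to the next fix
  moving    = c(TRUE, TRUE, FALSE, NA)
)

# fixes 1 and 2 form the moving segment
move <- track[which(track$moving), ]

# Variant A: duration from first to last timestamp of the segment (covers 1 step)
speed_first_last <- sum(move$step_dist) / (max(move$t) - min(move$t))

# Variant B: duration as the sum of the same steps that enter the distance (2 steps)
speed_steps <- sum(move$step_dist) / sum(move$step_time)

# convert both to km/h
print(c(first_last = speed_first_last, steps = speed_steps) * 3.6)
```

In this toy case the walker moves at 1 m/s, which only variant B reproduces; the gap between the two variants shrinks as segments contain more fixes.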

Code

# classify segments to Movement or not (Stop)
segments <- segments |>
  mutate(type = ifelse(moving_step, "MOVE", "STOP"))

Code

# visualize stops and moves together with city boundary
ggplot() +
  geom_sf(data = stadtgrenze, fill = "#D5F4FF", color = "black") +
  
  geom_sf(data = segments, aes(color = type), size = 1.2) +
  
  scale_color_manual(values = c("STOP" = "blue", "MOVE" = "red")) +

  coord_sf(crs = crs_lv95) +

  labs(
    title = "ArcGIS Earth Stops and Moves",
    color = "Movement",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Code

# visualize stops and moves inside Zurich together with city boundary
ggplot() +
  geom_sf(data = stadtgrenze, fill = "#D5F4FF", color = "black") +
  
  geom_sf(data = segments |>
            st_intersection(stadtgrenze), 
          aes(color = type), size = 1.2) +
  
  scale_color_manual(values = c("STOP" = "blue", "MOVE" = "red")) +

  coord_sf(crs = crs_lv95) +

  labs(
    title = "ArcGIS Earth Stops and Moves within Zurich",
    color = "Movement",
    x = "Easting [m]",
    y = "Northing [m]") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(), 
    axis.title = element_blank()
    )

Speed Exploratory Analysis

Code

# distribution of speed to cut boundaries
quantile(segments$speed_kmh[segments$moving_step],
         probs=c(0.5,0.9,0.95, 0.98),
         na.rm = TRUE)

       50%        90%        95%        98% 
  9.981582  45.577325  80.805970 203.669696

Code

# trim upper tail, with support of ChatGPT
upper <- quantile(segments$speed_kmh, 0.975, na.rm = TRUE)

segments_trim <- segments |>
  filter(speed_kmh <= upper)

As can be seen in the plots below, the majority of the segments have a speed below 20 km/h, indicating a high share of foot travel or slow transport.

Code

# boxplot for distribution
boxplot(
  segments_trim$speed_kmh,
  main = "Distribution of Speed (km/h) – trimmed top 2.5%",
  ylab = "Speed (km/h)",
  col = "lightgreen"
)

Code

# histogram for distribution
hist(
  segments_trim$speed_kmh,
  main = "Distribution of Speed (km/h) – trimmed top 2.5%",
  xlab = "Speed (km/h)",
  col = "lightgreen"
)

Travel Mode Detection - Walking

Code

# In transportation planning, free flow pedestrian walking speed is assumed to be between 0.8 m/s and 1.5 m/s (about 2.9 to 5.4 km/h). We use a wider window of 1 to 7 km/h: the upper bound leaves room for brisk walking, and the lower bound is extended because the city of Zurich can be rather densely packed, so free flow speed might not be reached.

# create only walking segments (speed between 1 and 7 km/h)
walking_segments <- segments |> 
  filter(speed_kmh > 1, speed_kmh < 7)
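
The walking window can be illustrated with a small, self-contained sketch. It is not part of the original analysis: `ms_to_kmh` and `is_walking` are hypothetical helpers, and the example speeds are made up. The sketch converts the free-flow bounds from m/s to km/h and applies the same exclusive 1 to 7 km/h window as the filter above.

```r
# Hypothetical helper: convert metres per second to kilometres per hour
ms_to_kmh <- function(v_ms) v_ms * 3.6

# Hypothetical helper: flag speeds inside the walking window (1 to 7 km/h, exclusive)
is_walking <- function(speed_kmh, lower = 1, upper = 7) {
  !is.na(speed_kmh) & speed_kmh > lower & speed_kmh < upper
}

# Free-flow bounds quoted in the comment above, expressed in km/h
free_flow_kmh <- ms_to_kmh(c(0.8, 1.5))

# Made-up example speeds in km/h: standing, strolling, brisk walking, cycling, missing
example_speeds <- c(0.4, 2.5, 6.1, 18, NA)
walking_flag <- is_walking(example_speeds)
```

Missing speeds are treated as non-walking here, which mirrors how `filter()` drops rows whose condition evaluates to `NA`.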

Construct Voronoi Polygons around meteoblue Stations

Code

# To assign each segment the nearest meteoblue measuring station, we construct Voronoi polygons, following the same approach as before.

# create geometry out of coordinates for meteoblue_stations
meteoblue_sf <- meteoblue_stations |> 
  st_as_sf(coords=c("lonDecimal", "latDecimal"), crs = crs_wgs84)

meteoblue_sf <- st_transform(meteoblue_sf, crs = crs_lv95)

Code

# create union of all station points and set city extent
pts_union <- st_union(st_geometry(meteoblue_sf))

envelope_zurich <- st_as_sfc(st_bbox(stadtgrenze))

Code

# create Voronoi tessellation and extract polygons out of it, with support of ChatGPT
voronoi_meteoblue <- st_voronoi(pts_union, envelope=envelope_zurich)

voronoi_polygons <- st_collection_extract(
  voronoi_meteoblue,
  "POLYGON"
)
# convert to sf object
voronoi_sf <- st_sf(
  geometry = st_cast(voronoi_polygons, "POLYGON")
)

Code

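# assign station attributes to polygons via the station each polygon contains
# (st_voronoi() does not guarantee the order of the input points)
voronoi_sf <- st_join(voronoi_sf, meteoblue_sf["locationID"], join = st_contains)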

# clip to Zurich city boundary
voronoi_clipped_meteoblue <- st_intersection(
  voronoi_sf,
  stadtgrenze
)

Code

# join between voronoi stations and meteoblue weather data 
voronoi_weather <- voronoi_clipped_meteoblue |>
  left_join(
    meteoblue_weather_data,
    by = "locationID"
  )

Code

# visualisation
ggplot() +
  geom_sf(data = voronoi_clipped_meteoblue, aes(fill = locationID)) +
  
  geom_sf(data = stadtgrenze, fill = NA, color = "black") +
  
  geom_sf(data = meteoblue_sf, color = "red") +
  
  coord_sf(crs = crs_lv95) +
  
  scale_fill_viridis_d(option = "turbo") +
  
  guides(fill = "none") +
  
  labs(
    title = "Voronoi Polygons around meteoblue Stations in Zurich",
    fill = "Station",
    x = "Easting [m]",
    y = "Northing [m]"
  ) +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(),
    axis.title = element_blank()
  )
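
A Voronoi cell contains exactly the locations that are closer to its generating station than to any other station, so assigning a point to a Voronoi polygon is equivalent to a nearest-station lookup. The following base R sketch illustrates this equivalence with made-up planar coordinates. `nearest_station` is a hypothetical helper and not part of the original workflow, which uses the sf polygons constructed above.

```r
# Toy station coordinates in a projected CRS (metres); values are made up
stations <- data.frame(
  locationID = c("A", "B", "C"),
  x = c(0, 1000, 0),
  y = c(0, 0, 1000),
  stringsAsFactors = FALSE
)

# Toy GPS fixes that should be assigned to a station
fixes <- data.frame(
  x = c(100, 900, 200, 510),
  y = c(50, 100, 800, 0)
)

# Hypothetical helper: row index of the nearest station for each fix
# (squared Euclidean distance is enough to find the minimum)
nearest_station <- function(fixes, stations) {
  vapply(seq_len(nrow(fixes)), function(i) {
    d2 <- (stations$x - fixes$x[i])^2 + (stations$y - fixes$y[i])^2
    which.min(d2)
  }, integer(1))
}

fixes$locationID <- stations$locationID[nearest_station(fixes, stations)]
print(fixes)
```

The polygon-based approach has the advantage that the cells can be plotted and clipped to the city boundary, while the lookup is easier to verify by hand.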

Assign Segments to Voronoi Polygons

First, we assign each segment to a Voronoi polygon using a spatial join. Second, we apply a temporal join to link each segment to the weather record closest in date and time. For that, we define a “representative time” for each segment, namely its midpoint, start_time + (end_time - start_time)/2. The time difference between this midpoint and every weather record of the matched station is computed, and the record with the smallest difference is selected. In addition, a threshold of 120 minutes is set: matches with a larger time difference are excluded from the temporal join. As a validation, the median time difference of 3.84 min indicates that segments and weather records are closely matched in time.
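
The temporal matching logic can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used in the report: all timestamps and values are made up, the time zone is set to UTC only to keep the example platform-independent, and the 120-minute threshold is the one named in the text.

```r
# Toy segments with start and end times (made-up values)
segs <- data.frame(
  segment_id = 1:3,
  start_time = as.POSIXct(c("2026-03-24 08:00:00", "2026-03-24 12:10:00",
                            "2026-03-24 20:00:00"), tz = "UTC"),
  end_time   = as.POSIXct(c("2026-03-24 08:20:00", "2026-03-24 12:30:00",
                            "2026-03-24 20:30:00"), tz = "UTC")
)

# Toy weather records of one station (made-up values)
weather <- data.frame(
  timestamp = as.POSIXct(c("2026-03-24 08:00:00", "2026-03-24 12:30:00",
                           "2026-03-24 16:00:00"), tz = "UTC"),
  value = c(4.2, 9.8, 11.1)
)

# Representative time: midpoint between start and end
segs$mid_time <- segs$start_time + (segs$end_time - segs$start_time) / 2

# For each segment, index of the weather record with the smallest time difference
nearest <- vapply(seq_len(nrow(segs)), function(i) {
  which.min(abs(as.numeric(difftime(segs$mid_time[i], weather$timestamp, units = "mins"))))
}, integer(1))

segs$value <- weather$value[nearest]
segs$time_diff <- abs(as.numeric(
  difftime(segs$mid_time, weather$timestamp[nearest], units = "mins")
))

# Exclude matches that are more than 120 minutes apart
matched <- segs[segs$time_diff <= 120, ]
print(matched)
```

In this toy data the evening segment has no weather record within 120 minutes and is therefore dropped, which is the intended effect of the threshold.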

Code

# only look at point instead of segment
segment_points <- walking_segments |>
  st_cast("POINT")

# join with intersect
segment_points_weather <- st_join(
  segment_points,
  voronoi_clipped_meteoblue["locationID"],
  join = st_intersects
)

# aggregate back
meteoblue_segments <- segment_points_weather |>
  group_by(segment_id, locationID) |>
  summarise(geometry = st_union(geometry))

# clip to extent of Zurich
meteoblue_segments_clipped <- st_intersection(
  meteoblue_segments,
  stadtgrenze
)

Code

# visualize Movement Segments with meteoblue weather station polygons
ggplot() +
  geom_sf(data = voronoi_clipped_meteoblue, aes(fill = locationID), alpha = 0.3) +
  
  geom_sf(data = stadtgrenze, fill = NA, color = "black") +
  
  geom_sf(data = meteoblue_segments_clipped, color = "blue", size = 1) +
  
  coord_sf(crs = crs_lv95) +
  
  scale_fill_viridis_d(option = "turbo") +
  
  guides(fill = "none") +
  
  labs(
    title = "Movement Segments with meteoblue weather station polygons",
    x = "Easting",
    y = "Northing"
  ) +
  
  theme_minimal() +

  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    axis.text = element_blank(),
    axis.title = element_blank()
  )

Temporal Join of Weather Data and Walking Segments

Code

# find most representative time (midtime) of each segment
segments <- segments |>
  mutate(
    mid_time = start_time + (end_time - start_time) / 2
  )

Code

# set as posix datetime object
meteoblue_weather_data <- meteoblue_weather_data |>
  mutate(timestamp = as.POSIXct(timestamp))

Code

# add spatial information to segments
segments_spatial <- st_join(
  segments,
  meteoblue_segments["locationID"],
  join = st_intersects
)

Code

# clean
segments_clean <- segments_spatial |>
  st_drop_geometry()

Code

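# temporal join: attach all weather records of the matched station,
# then keep the record closest in time for each segment
segments_weather <- segments_clean |>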
  left_join(meteoblue_weather_data, by = "locationID") |>
  mutate(
    time_diff = abs(as.numeric(difftime(mid_time, timestamp, units = "mins")))
  ) |>
  group_by(segment_id) |>
  slice_min(time_diff, n = 1, with_ties = FALSE) |>
  ungroup()

Code

# filter unrealistic values
segments_weather_clean <- segments_weather |>
  filter(time_diff <= 120) |>
  filter(
    speed_kmh >= quantile(speed_kmh, 0.02, na.rm = TRUE),
    speed_kmh <= quantile(speed_kmh, 0.98, na.rm = TRUE)
  )

Analyze effect of Precipitation

To analyze the effect of precipitation on walking speed, we used weather data at two different spatial and temporal scales (a small cross-scale analysis (Laube and Purves (2011))). First, we conducted an analysis with weather data at city level. Second, we applied the same procedure with weather data at station level.

Code

# clean weather_hourly_df
rain_hourly_df <- weather_hourly_df |>
  filter(Parameter == "RainDur")

Code

# determine representative time for segment
walking_segments <- walking_segments |>
  mutate(
    mid_time = start_time + (end_time - start_time) / 2
  )

A) City-level Analysis

Code

# We decided on using the hourly weather data since it resembles reality more closely than daily weather data. 

# timestamp set as datetime object
rain_city_hourly <- weather_hourly_df |>
  filter(Parameter == "RainDur") |>
  mutate(timestamp = as.POSIXct(Datum, tz = "Europe/Zurich")) |>
  group_by(timestamp) |>
  summarise(rain = mean(Wert, na.rm = TRUE), .groups = "drop")

Code

# temporal join, merge all segments with all weather data and keep closest
analysis_city <- walking_segments |>
  st_drop_geometry() |>
  mutate(mid_time = as.POSIXct(mid_time, tz = "Europe/Zurich")) |>
  crossing(rain_city_hourly) |>
  mutate(time_diff = abs(as.numeric(difftime(mid_time, timestamp, units = "mins")))) |>
  group_by(segment_id) |>
  # with_ties = FALSE keeps exactly one weather record per segment
  slice_min(time_diff, n = 1, with_ties = FALSE) |>
  ungroup() |>
  filter(!is.na(rain))

B) Station-level Analysis

Code

# join segment with nearest station first
segments_station <- st_join(
  walking_segments,
  voronoi_clipped["ID"],
  join = st_nearest_feature
)

Code

# temporal join per station: keep the rain record closest in time to each segment
segments_weather_rain <- segments_station |>
  st_drop_geometry() |>
  left_join(
    rain_hourly_df,
    by = c("ID" = "Standort")
  ) |>
  mutate(
    timestamp = as.POSIXct(Datum, tz = "Europe/Zurich"),
    time_diff = abs(as.numeric(
      difftime(mid_time, timestamp, units = "mins")
    ))
  ) |>
  group_by(segment_id) |>
  slice_min(time_diff, n = 1, with_ties = FALSE) |>
  ungroup()

Code

# temporal join, with segment merging with rain weather data
analysis_station <- segments_station |>
  st_drop_geometry() |>
  left_join(rain_hourly_df, by = c("ID" = "Standort")) |>
  # parse Datum in the same time zone as in the other joins
  mutate(time_diff = abs(as.numeric(difftime(mid_time, as.POSIXct(Datum, tz = "Europe/Zurich"), units = "mins")))) |>
  group_by(segment_id) |>
  slice_min(time_diff, n = 1, with_ties = FALSE) |>
  ungroup() |>
  filter(!is.na(Wert))

Code

# define rain vs. no rain
segments_weather_rain <- segments_weather_rain |>
  mutate(
    rain_flag = ifelse(Wert > 0, "rain", "no_rain")
  )

Code

# two stations have all NA-values => no rainfall data collection
segments_weather_rain |>
  filter(is.na(Wert)) |>
  count(ID)

# A tibble: 2 × 2
  ID                        n
  <chr>                 <int>
1 Zch_Heubeeribüel        186
2 Zch_Rosengartenbrücke   162

Code

analysis_data <- analysis_station |>
  mutate(rain_flag = ifelse(Wert > 0, "rain", "no_rain"))

Code

# group if rain or not and analyze speed respectively
analysis_data |>
  group_by(rain_flag) |>
  summarise(
    mean_speed = mean(speed_kmh, na.rm = TRUE),
    median_speed = median(speed_kmh, na.rm = TRUE),
    sd_speed = sd(speed_kmh, na.rm = TRUE),
    n = n()
  )

# A tibble: 2 × 5
  rain_flag mean_speed median_speed sd_speed     n
  <chr>          <dbl>        <dbl>    <dbl> <int>
1 no_rain         4.19         4.39     1.54   213
2 rain            3.68         3.47     1.66    16

Results

In this chapter we will present the performed statistical tests as well as their results and implications. Thanks to the different weather datasets and our own tracking data, coming from two different sources, we were able to achieve results that can later be used to discuss our research questions.

Statistical Analysis Movement Space vs. Weather

After preparing the data and creating all necessary auxiliary elements, we can now test our hypotheses.

A) Daily - Voronoi Polygon

Code

# visualise the variable Temperature
ggplot(daily_hulls_weather, aes(x = Temperature, y = log(area_m2))) +
  
  geom_point(size = 1.2) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  labs(
    x = "Temperature [°C]",
    y = "log(Activity Space Area [m^2])",
    title = "Relationship Between Temperature and Activity Space Size",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Code

# visualise the variable RainDur
ggplot(daily_hulls_weather, aes(x = RainDur, y = log(area_m2))) +
  
  geom_point(size = 1.2) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  labs(
    x = "Rain Duration [min]",
    y = "log(Activity Space Area [m^2])",
    title = "Relationship Between Rain Duration and Activity Space Size",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Visually, there is a positive relation between temperature and activity space and a negative relation between rain duration and activity space.

Code

# linear model with log(area)
lm1 <- lm(log(area_m2) ~ Temperature + RainDur, data = daily_hulls_weather)

Code

# full model summary (R^2, adjusted R^2, p-values, residual std error)
summary(lm1)


Call:
lm(formula = log(area_m2) ~ Temperature + RainDur, data = daily_hulls_weather)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6620 -0.5310 -0.0180  0.8015  1.9240 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.0708461  0.6248266  20.919   <2e-16 ***
Temperature  0.0888042  0.0532322   1.668    0.104    
RainDur     -0.0004278  0.0014005  -0.305    0.762    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.278 on 37 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.09951,   Adjusted R-squared:  0.05084 
F-statistic: 2.044 on 2 and 37 DF,  p-value: 0.1438

Code

# delete comments if one wants to inspect the plots closer
#par(mfrow = c(1, 1))
#plot(lm1, which = 1)  # Residuals vs Fitted
#plot(lm1, which = 2)  # Q-Q plot
#plot(lm1, which = 3)  # Scale-Location
#plot(lm1, which = 5)  # Residuals vs Leverage

This first model is not statistically significant as a whole (F-statistic p = 0.1438). Temperature and RainDur together explain only a small share of the variation in activity space: the low R^2 (0.10) and adjusted R^2 (0.05) suggest that other factors (e.g. day of week, work/leisure day, season) influence activity space more than weather.

There is a suggestive positive trend with Temperature (p = 0.104), but we cannot reject H0.

B) Daily - Field

Code

# visualise the variable Temperature
ggplot(daily_hulls_weather_field, aes(x = Temperature, y = log(area_m2))) +
  
  geom_point(size = 1.2) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  labs(
    x = "Temperature [°C]",
    y = "log(Activity Space Area [m^2])",
    title = "Relationship Between Temperature and Activity Space Size",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Code

# visualise the variable RainDur
ggplot(daily_hulls_weather_field, aes(x = RainDur, y = log(area_m2))) +
  
  geom_point(size = 1.2) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  labs(
    x = "Rain Duration [min]",
    y = "log(Activity Space Area [m^2])",
    title = "Relationship Between Rain Duration and Activity Space Size",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

We see the same pattern with our new field (IDW) approach.

Code

# linear model with log(area)
lm2 <- lm(log(area_m2) ~ Temperature + RainDur, data = daily_hulls_weather_field)

Code

# full model summary (R^2, adjusted R^2, p-values, residual std error)
summary(lm2)


Call:
lm(formula = log(area_m2) ~ Temperature + RainDur, data = daily_hulls_weather_field)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5100 -0.5640  0.0028  0.7882  1.9362 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.2111085  0.6711081  19.686   <2e-16 ***
Temperature  0.0733416  0.0548604   1.337    0.190    
RainDur     -0.0008364  0.0012219  -0.684    0.498    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.272 on 36 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.1021,    Adjusted R-squared:  0.05226 
F-statistic: 2.048 on 2 and 36 DF,  p-value: 0.1438

The p-value of RainDur is lower than in lm1 (0.498 vs. 0.762) but remains far from significant, so this is at most a weak hint that interpolating precipitation duration is a more appropriate way of identifying a relationship between weather and movement.
Furthermore, the R^2 and adjusted R^2 values are marginally higher (0.102 vs. 0.100 and 0.052 vs. 0.051), meaning that the interpolated model explains slightly more of the variance than the model with the Voronoi polygons.

C) Hourly - Field

Code

# visualise the variable Temperature
ggplot(hourly_hulls_weather_field, aes(x = Temperature, y = log(area_m2))) +
  
  geom_point(size = 1.2) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  labs(
    x = "Temperature [°C]",
    y = "log(Activity Space Area [m^2])",
    title = "Relationship Between Temperature and Activity Space Size",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Code

# visualise the variable RainDur
ggplot(hourly_hulls_weather_field, aes(x = RainDur, y = log(area_m2))) +
  
  geom_point(size = 1.2) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  labs(
    x = "Rain Duration [min]",
    y = "log(Activity Space Area [m^2])",
    title = "Relationship Between Rain Duration and Activity Space Size",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Again, we see the same visual pattern now with the hourly data.

Code

# linear model with log(area)
lm3 <- lm(log(area_m2) ~ Temperature + RainDur, data = hourly_hulls_weather_field)

Code

# full model summary (R^2, adjusted R^2, p-values, residual std error)
summary(lm3)


Call:
lm(formula = log(area_m2) ~ Temperature + RainDur, data = hourly_hulls_weather_field)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6746 -0.5891 -0.1289  0.9238  1.9224 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.69650    0.61238  22.366   <2e-16 ***
Temperature  0.02802    0.04826   0.581   0.5647    
RainDur     -0.08160    0.03394  -2.404   0.0209 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.243 on 40 degrees of freedom
Multiple R-squared:  0.1893,    Adjusted R-squared:  0.1488 
F-statistic:  4.67 on 2 and 40 DF,  p-value: 0.01504

We see a significant negative effect of RainDur on the movement area (p = 0.0209). The model as a whole is now significant (p = 0.015) and explains considerably more of the variance: the adjusted R^2 of 0.15 means that about 15% of the variance is explained. This is still a minor share, but a clear improvement over the models with daily weather data.
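
Because the response is log(area_m2), the coefficients are easier to read as relative changes. The sketch below is an illustrative back-transformation, not part of the original analysis: `pct_change` is a hypothetical helper, and the coefficients are copied from the lm3 summary above. In a log-linear model, a one-unit increase in a predictor multiplies the typical (geometric mean) response by exp(beta).

```r
# Coefficients as reported in the summary of lm3 above
beta_rain <- -0.08160   # RainDur
beta_temp <-  0.02802   # Temperature

# Hypothetical helper: relative change of the response in percent
# for an increase of `delta` units in the predictor
pct_change <- function(beta, delta = 1) (exp(beta * delta) - 1) * 100

rain_effect_pct <- pct_change(beta_rain)   # per additional unit of RainDur
temp_effect_pct <- pct_change(beta_temp)   # per additional degree Celsius

round(c(RainDur = rain_effect_pct, Temperature = temp_effect_pct), 1)
```

Under these assumptions, each additional unit of rain duration corresponds to an activity space that is roughly 8% smaller, while each additional degree corresponds to an increase of roughly 3%, the latter not being statistically distinguishable from zero.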

A look at 24.03.2026, for example, shows the difference between aggregated daily and hourly weather data. The daily data report 8.1°C with a RainDur of 187 min, which is quite a lot. The hourly data report 10.7°C with no rain. This means that although it rained a lot over the whole day, we were only moving while it was dry and slightly warmer, which makes this date a good illustration of our research questions.
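
The aggregation effect described here can be reproduced with a toy example. Only the daily total of 187 min is taken from the text. The hourly distribution of the rain and the hour of movement are invented purely for illustration.

```r
# Invented hourly rain durations in minutes for one day (index 1 = 00:00 to 01:00);
# they are chosen so that they add up to the reported daily total of 187 min
hourly_rain_min <- c(rep(0, 6), 40, 60, 60, 27, rep(0, 14))

# Daily aggregate, as used in the daily models
daily_rain_min <- sum(hourly_rain_min)

# Hypothetical hour in which the movement took place (14:00 to 15:00)
movement_hour <- 15
rain_during_movement <- hourly_rain_min[movement_hour]

c(daily = daily_rain_min, during_movement = rain_during_movement)
```

A day can therefore count as very rainy in the daily data even though the weather actually experienced during the movement was dry.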

D) 15 min - Field

Code

# visualise the variable Temperature
ggplot(quarter_hourly_hulls_weather_field, aes(x = Temperature, y = log(area_m2))) +
  
  geom_point(size = 1.2) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  labs(
    x = "Temperature [°C]",
    y = "log(Activity Space Area [m^2])",
    title = "Relationship Between Temperature and Activity Space Size",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Note: Here we only have 15 min temperature values and not 15 min rain values. This is why we earlier merged rain duration from hourly data into quarter-hourly temperature data.

Code

# linear model with log(area)
lm4 <- lm(log(area_m2) ~ Temperature + RainDur, data = quarter_hourly_hulls_weather_field_join)

Code

# full model summary (R^2, adjusted R^2, p-values, residual std error)
summary(lm4)


Call:
lm(formula = log(area_m2) ~ Temperature + RainDur, data = quarter_hourly_hulls_weather_field_join)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6513 -0.5815 -0.1751  0.9693  1.9765 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 13.78406    0.58028  23.754   <2e-16 ***
Temperature  0.02200    0.04821   0.456   0.6505    
RainDur     -0.08392    0.03360  -2.498   0.0167 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.245 on 40 degrees of freedom
Multiple R-squared:  0.1867,    Adjusted R-squared:  0.146 
F-statistic: 4.591 on 2 and 40 DF,  p-value: 0.01603

This last model performs similarly to the hourly model (lm3). Hourly RainDur has a statistically significant negative effect (p-value = 0.0167), the lowest RainDur p-value across all models. Temperature, in contrast, is not significant (p-value = 0.65), which is its highest p-value across the four models, so the 15 min temperature values do not yield a clearer temperature effect. The model explains about 15% of the variation (adjusted R^2 = 0.146), and with an F-value of 4.6 and a p-value of 0.016 it is significant as a whole. This in turn suggests that rain duration (hourly measurements) is associated with our daily activity space, while the evidence for temperature (15 min measurements) remains weak.

Statistical Analysis - Influence of Weather on Walking

Influence on Walking Speed

Code

# visualize
ggplot(segments_weather_clean, aes(x = value, y = speed_kmh)) +
  
  geom_point(alpha = 0.3) +
  
  geom_smooth(method = "lm", se = TRUE, linewidth = 1) +
  
  coord_cartesian(ylim = c(0, 10)) +

  labs(
    x = "Temperature [°C]",
    y = "Speed [km/h]",
    title = "Relationship Between Temperature and Walking Speed",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Code

# trend analysis
trend <- lm(speed_kmh ~ value, data = segments_weather_clean)

summary(trend)


Call:
lm(formula = speed_kmh ~ value, data = segments_weather_clean)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0406 -1.0729  0.2953  1.2193  2.6072 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.253933   0.161892  26.276   <2e-16 ***
value       -0.005095   0.014407  -0.354    0.724    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.513 on 272 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.0004597, Adjusted R-squared:  -0.003215 
F-statistic: 0.1251 on 1 and 272 DF,  p-value: 0.7238

There is no significant linear relationship between walking speed and temperature when taking the whole dataset (slope = -0.005 km/h per °C, p = 0.72, R^2 < 0.001). This was to be expected, as shown in the Introduction.

Code

# classification of different temperature groups
segments_weather_clean <- segments_weather_clean |>
  mutate(
    temp_group = cut(value,
                     breaks = quantile(value, probs = seq(0, 1, 0.25), na.rm = TRUE),
                     include.lowest = TRUE)
  )

lm(speed_kmh ~ temp_group, data = segments_weather_clean)


Call:
lm(formula = speed_kmh ~ temp_group, data = segments_weather_clean)

Coefficients:
          (Intercept)   temp_group(3.7,7.65]  temp_group(7.65,12.9]  
              4.03863                0.28577                0.41269  
temp_group(12.9,25.8]  
             -0.02544

Code

# visualize
ggplot(segments_weather_clean, aes(x = temp_group, y = speed_kmh)) +
  
  geom_boxplot(fill = "lightgreen") +
  
  geom_jitter(width = 0.2, alpha = 0.3) +
  
  labs(
    x = "Temperature Group",
    y = "Speed [km/h]",
    title = "Relationship Between Temperature Group and Walking Speed",
    subtitle = "Boxplots per temperature quartile with jittered observations") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Code

# test if difference between temp_groups is significant
anova(lm(speed_kmh ~ temp_group, data = segments_weather_clean))

Analysis of Variance Table

Response: speed_kmh
            Df Sum Sq Mean Sq F value Pr(>F)
temp_group   3   9.62  3.2057  1.4109 0.2399
Residuals  270 613.47  2.2721

Code

# label bins (ranges follow the quartile breaks shown in the model output above)
segments_weather_clean <- segments_weather_clean |>
  mutate(temp_group = factor(temp_group,
                             levels = levels(temp_group),
                             labels = c("Cold (up to 3.7°C)", "Cool (3.7-7.65°C)", "Mild (7.65-12.9°C)", "Warm (12.9-25.8°C)")))

# remove NA values
segments_weather_clean <- segments_weather_clean |>
  filter(!is.na(temp_group))

However, if we classify the segments into different temperature groups, a difference between the groups becomes visible. In general, higher temperatures are associated with lower walking speed, from 4.8 km/h for cool temperatures to almost 4 km/h for warm temperatures. The exception is the very cold group, where walking speeds are also low.

The median walking speed varies modestly across temperature groups (4.1–4.8 km/h), but due to the large within-group spread the difference between the groups is not statistically significant (ANOVA, p-value = 0.24).

A) City-level Analysis

Based on the city-level weather data, the fitted regression line indicates that walking speed decreases with increasing rain duration.

Code

# visualize
ggplot(analysis_city, aes(rain, speed_kmh)) +
  
  geom_point(alpha = 0.3) +
  
  geom_smooth(method = "lm") +

  labs(
    x = "Rain Duration [min]",
    y = "Speed [km/h]",
    title = "City-level Relationship Between Rain and Walking Speed",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

B) Station-level Analysis

Similarly to the city-level data, a negative trend is visible. However, with station-level weather data this trend is more pronounced: at high precipitation durations, walking speeds are below 3 km/h.

Code

# visualize
ggplot(analysis_station, aes(Wert, speed_kmh)) +

  geom_point(alpha = 0.3) +
  
  geom_smooth(method = "lm") +

  labs(
    x = "Rain Duration [min]",
    y = "Speed [km/h]",
    title = "Station-level Relationship Between Rain and Walking Speed",
    subtitle = "Linear regression with 95% confidence interval") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

The difference in speed between segments with and without rain is also tested using a Wilcoxon rank-sum test. With a p-value of 0.2229, no significant difference can be found.

Code

# test if significant relationship
wilcox.test(speed_kmh ~ rain_flag, data = analysis_data)


    Wilcoxon rank sum test with continuity correction

data:  speed_kmh by rain_flag
W = 2016, p-value = 0.2229
alternative hypothesis: true location shift is not equal to 0
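
To put the non-significant test result into perspective, a standardized effect size can be computed from the group summaries reported in the rain vs. no-rain table. Cohen's d is an addition for illustration and was not part of the original analysis. The input values are the rounded figures from that table, so the result is approximate.

```r
# Group summaries as reported in the rain vs. no-rain table
n_dry  <- 213; mean_dry  <- 4.19; sd_dry  <- 1.54
n_rain <- 16;  mean_rain <- 3.68; sd_rain <- 1.66

# Pooled standard deviation across both groups
sd_pooled <- sqrt(((n_dry - 1) * sd_dry^2 + (n_rain - 1) * sd_rain^2) /
                    (n_dry + n_rain - 2))

# Standardized mean difference (Cohen's d); positive means faster without rain
cohens_d <- (mean_dry - mean_rain) / sd_pooled

round(c(sd_pooled = sd_pooled, cohens_d = cohens_d), 2)
```

A value of about 0.33 lies between the conventional benchmarks for small (0.2) and medium (0.5) effects. With only 16 rain segments, such a difference is hard to detect reliably.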

Code

# see if relationship
lm(speed_kmh ~ Wert, data = analysis_data)


Call:
lm(formula = speed_kmh ~ Wert, data = analysis_data)

Coefficients:
(Intercept)         Wert  
     4.2049      -0.0235

If we only include actual raining values (>0) for the fitting line, the relationship becomes even more visible.

Code

# visualize, but only values where there actually is rain (>0)
ggplot(analysis_data, aes(Wert, speed_kmh)) +
  
  geom_jitter(alpha = 0.2, height = 0) +
  
  geom_smooth(
    data = filter(analysis_data, Wert > 0),
    method = "lm",
    color = "blue") +

  labs(
    x = "Rain Duration [min]",
    y = "Speed [km/h]",
    title = "Relationship Between Rain and Walking Speed only for observations with rainfall > 0",
    subtitle = "Linear regression with 95% confidence interval ") +
  
  theme_minimal() +
  
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
    )

Discussion

RQ1: How does temperature affect walking speed?

At first sight, there seems to be no relationship between temperature and walking speed. However, when dividing the segments into temperature groups, differences become visible. In general, higher temperatures are associated with lower walking speed, from 4.8 km/h for cool temperatures to almost 4 km/h for warm temperatures. The exception is the very cold group, where walking speeds are also low. Since the coldest temperatures are close to zero, a possible explanation is increased caution, and thus reduced velocity, due to an elevated danger of slipping. These patterns are in line with previous research, which highlighted a certain increase in walking with higher temperatures, while very cold conditions could lead to slower speeds (Montigny, Ling, and Zacharias (2012); Giannoulaki and Christoforou (2024)), resembling an inverted U-shaped curve (Yerkes and Dodson (1908)). However, the relationship did not prove to be significant, which is likely due to the small set of trajectories.

RQ2: How does precipitation affect walking speed?

Contrary to our expectations, increasing precipitation duration is associated with lower walking velocity. Similarly to Giannoulaki and Christoforou (2024), we hypothesized that pedestrians would increase their speed under unfavourable weather to minimize exposure. However, other studies argue that precipitation, similarly to snow, reduces pavement quality and thus leads to slower movement (Montigny, Ling, and Zacharias (2012); Giannoulaki and Christoforou (2024)). Overall, due to the very small amount of rain data, caused by dry months and by two stations not collecting any rain data, this research question cannot be adequately answered.

RQ3: How do temperature and precipitation affect the spatial extent of activity spaces?

Using our last model, lm4, with the 15 min temperature measurements and hourly precipitation measurements, the two factors differ in their influence on the spatial extent of activity spaces, and both effects are relatively small. Temperature shows a consistently positive coefficient across all four models, but it is never statistically significant, and its p-value is lowest with daily data (p = 0.104 in lm1) rather than at shorter temporal intervals. Precipitation has a stronger negative influence: longer rain durations reduce activity space extent, and this effect is statistically significant in the hourly and 15 min models. Overall, precipitation appears to be the stronger limiting factor, while the contribution of temperature remains uncertain.

RQ4: Do different temporal and spatial scales of the weather data matter?

A) Temporal

The results demonstrate that weather data resolution is crucial, as hourly and 15 min measurements better capture actual movement conditions than daily averages. The explanatory power of the models increases markedly from daily to hourly data (adjusted R^2 from about 0.05 to 0.15) and stays at that level for the 15 min data. Daily aggregated weather data show weak and non-significant relationships, suggesting that coarse temporal averages fail to capture the actual weather conditions experienced during movement.

Hourly data improved our model performance and revealed a statistically significant negative effect of precipitation. The hourly model (lm3) and the 15 min model (lm4) perform almost identically: lm3 has a marginally higher adjusted R^2 (0.149 vs. 0.146), while lm4 has the lowest p-value for RainDur (0.0167). Both models are significant as a whole, which supports our hypothesis that a temporal scale finer than daily improves the detection of weather effects.
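
To compare the four models side by side, the key statistics can be collected in one table. The sketch below copies the values from the model summaries printed in the Results rather than refitting the models. The object and column names (`model_overview`, `adj_r2`, `p_model`, and so on) are illustrative and do not appear in the original code.

```r
# Values copied from the four model summaries reported in the Results
model_overview <- data.frame(
  model     = c("lm1", "lm2", "lm3", "lm4"),
  weather   = c("daily, Voronoi", "daily, field", "hourly, field", "15 min, field"),
  adj_r2    = c(0.05084, 0.05226, 0.1488, 0.146),
  p_model   = c(0.1438, 0.1438, 0.01504, 0.01603),
  p_temp    = c(0.104, 0.190, 0.5647, 0.6505),
  p_raindur = c(0.762, 0.498, 0.0209, 0.0167),
  stringsAsFactors = FALSE
)

# Flag models that are significant as a whole at the 5% level
model_overview$significant <- model_overview$p_model < 0.05

# Model with the highest adjusted R^2
best_adj_r2 <- model_overview$model[which.max(model_overview$adj_r2)]

print(model_overview)
```

With the fitted model objects at hand, the same table could be built from `summary()` instead of typed values, which avoids transcription errors.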

B) Spatial

The comparison between the analyses with city-level and station-level weather data revealed that the spatial scale can affect the quantification of the relationship. Although a negative relationship between precipitation and walking speed was identified in both cases, the station-level analysis showed a clearer trend. This can likely be explained by the closer match between each segment and its weather station.

Limitations

This analysis has various limitations. First, classifying segments into movement vs. non-movement with static thresholds is common, but recent research has shown that static thresholds are often not adequate and make no distinction between different use cases. The proposed alternative is a more flexible and user-adapted approach, which is, however, out of scope for this project (Bonavita, Guidotti, and Nanni (2022)).

Second, the low amount of trajectory and weather data means that our findings need to be interpreted with caution. This is especially the case for the analysis with precipitation data, as very few entries were available due to unusual dry conditions. Thus, the conducted tests and regression analysis have very limited power.

Third, important behavioural and environmental factors were not included. Trip purpose is likely a major determinant of mobility patterns, as walking behaviour is strongly driven by daily routines and situational needs (Montigny, Ling, and Zacharias (2012)). Additional influences such as sunlight, storms, weather complexity and urban context may also affect walking speed and behaviour (Giannoulaki and Christoforou (2024); Rotton, Shats, and Standers (1990); Shuichi et al. (2021)). Moreover, precipitation effects may vary over time, including delayed or anticipatory responses (Zhao et al. (2019)).

Fourth, data limitations remain in the movement representation itself, as noted in Methods (SDE). Additionally, concave hulls were calculated at daily scale only, even when weather data was available at finer resolutions; this was a necessary simplification to maintain comparability across analyses. Also, rain duration does not capture rain intensity, which might arguably be even more important. Finally, filtering our movements to the city of Zurich introduced a bias: in nice weather, we may have moved beyond the city limits, or even out of the canton, so the filter may underestimate our actual movement space on those days.

AI Statement

LLMs were used to improve sentence structure and grammar as well as to create and debug code, especially for content and challenges that go beyond the scope of the course. All content, analysis, and arguments remain the authors' own, and any information derived from external sources is properly cited.

Appendix

Wordcount

Disclaimer: To improve structure and clarity, we included headings and subheadings throughout the report, which account for around 1800 characters. That is why the total number of characters is slightly over the maximum.

Code

wordcountaddin::word_count("index.qmd") # counts words

[1] 3265

Code

wordcountaddin::text_stats("index.qmd") # counts characters

Method	koRpus	stringi
Word count	3265	3235
Character count	21552	21518
Sentence count	219	Not available
Reading time	16.3 minutes	16.2 minutes

References

Bonavita, Andrea, Riccardo Guidotti, and Mirco Nanni. 2022. “Individual and Collective Stop-Based Adaptive Trajectory Segmentation.” Geoinformatica 26 (3): 451–77. https://doi.org/10.1007/s10707-021-00449-8.

Giannoulaki, Maria, and Zoi Christoforou. 2024. “Pedestrian Walking Speed Analysis: A Systematic Review.” Sustainability 16 (11): 4813.

Guo, Sini, Xiang Li, Wai-Ki Ching, Ralescu Dan, Wai-Keung Li, and Zhiwen Zhang. 2018. “GPS Trajectory Data Segmentation Based on Probabilistic Logic.” International Journal of Approximate Reasoning 103: 227–47.

Laube, Patrick. 2009. “Progress in Movement Pattern Analysis.” In Behaviour Monitoring and Interpretation - BMI - Smart Environments, edited by Björn Gottfried and Hamid Aghajan, 43–71. Amsterdam, The Netherlands: IOS Press.

Laube, Patrick, and Ross S Purves. 2011. “How Fast Is a Cow? Cross-Scale Analysis of Movement Data.” Transactions in GIS 15 (3): 401–18.

Montigny, Luc de, Richard Ling, and John Zacharias. 2012. “The Effects of Weather on Walking Rates in Nine Cities.” Environment and Behavior 44 (November): 821–40. https://doi.org/10.1177/0013916511409033.

Nathan, Ran, Wayne M Getz, Eloy Revilla, Marcel Holyoak, Ronen Kadmon, David Saltz, and Peter E Smouse. 2008. “A Movement Ecology Paradigm for Unifying Organismal Movement Research.” Proceedings of the National Academy of Sciences 105 (49): 19052–59.

Rotton, James, Mark Shats, and Robert Standers. 1990. “Temperature and Pedestrian Tempo: Walking Without Awareness.” Environment and Behavior 22 (5): 650–74.

Shuichi, Obuchi, Hisashi Kawai, Juan Garbalosa, Kazumasa Nishida, and Kenji Murakawa. 2021. “Walking Is Regulated by Environmental Temperature.” Scientific Reports 11 (June). https://doi.org/10.1038/s41598-021-91633-1.

Weibel, Robert. 2024. “Point Patterns - Polygon Delineation.” University of Zurich; Lecture slides, GEO 872 Advanced Spatial Analysis I.

Yerkes, Robert M., and John D. Dodson. 1908. “The Relation of Strength of Stimulus to Rapidity of Habit-Formation.” Journal of Comparative Neurology and Psychology 18 (5): 459–82. https://doi.org/10.1002/cne.920180503.

Zhao, Jinbao, Cong Guo, Ruhua Zhang, Dong Guo, and Mathew Palmer. 2019. “Impacts of Weather on Cycling and Walking on Twin Trails in Seattle.” Transportation Research Part D: Transport and Environment 77: 573–88.