DS4PP: Problem set 5

Author

Ninel Melkonyan

Published

November 4, 2025

Due to bcourses by the start of class Wednesday, November 12 at 8:30am.

Name your submission files ps5.qmd and ps5.html.

Apple Mobility Data

For this problem set, we’re going to explore mobility data aggregated by Apple during 2020. This exercise is based on materials from SICSS Boot Camp.

Let’s start by loading the data.

Load data

suppressMessages({
  library(tidyverse)
  library(lubridate)
})

load("Apple_Mobility_Data_Clean.Rdata")

ls()

[1] "apple_data"

str(apple_data)

tibble [25,415 × 4] (S3: tbl_df/tbl/data.frame)
 $ country            : chr [1:25415] "Argentina" "Argentina" "Argentina" "Argentina" ...
 $ transportation_type: chr [1:25415] "driving" "driving" "driving" "driving" ...
 $ date               : Date[1:25415], format: "2020-01-13" "2020-01-14" ...
 $ mobility_avg       : num [1:25415] 100 104 108 112 111 ...

Explore data

Now let’s take a quick look at the data set’s structure.

apple_data %>% str()

tibble [25,415 × 4] (S3: tbl_df/tbl/data.frame)
 $ country            : chr [1:25415] "Argentina" "Argentina" "Argentina" "Argentina" ...
 $ transportation_type: chr [1:25415] "driving" "driving" "driving" "driving" ...
 $ date               : Date[1:25415], format: "2020-01-13" "2020-01-14" ...
 $ mobility_avg       : num [1:25415] 100 104 108 112 111 ...

0. What does each row in this data set represent?

# Each row is one country, one transportation type, and one date. `mobility_avg` is the average mobility index for that combination, relative to a baseline of 100.
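One way to check this reading of the rows is to count how often each country, transportation type, and date combination occurs. The sketch below uses a small made-up tibble (`toy_mobility`, a hypothetical stand-in with the same four columns as `apple_data`) so that it runs on its own; the same pipeline could be pointed at `apple_data`.

```r
library(dplyr)

# Toy stand-in for apple_data (values are made up for illustration)
toy_mobility <- tibble(
  country = c("Italy", "Italy", "Italy", "Italy"),
  transportation_type = c("driving", "walking", "driving", "walking"),
  date = as.Date(c("2020-01-13", "2020-01-13", "2020-01-14", "2020-01-14")),
  mobility_avg = c(100, 100, 104, 98)
)

# Keep only combinations that appear more than once
duplicate_keys <- toy_mobility %>%
  count(country, transportation_type, date) %>%
  filter(n > 1)

# Zero rows means each combination identifies exactly one row
nrow(duplicate_keys)
```

If the result is zero rows, the three columns together act as a key and `mobility_avg` is the only measured value per row.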

Summary statistics function

1. Calculate the `mean()` of `mobility_avg` by `transportation_type` in July in the United States.

q1_us_july <- apple_data %>%
  filter(country == "United States", month(date) == 7) %>%
  group_by(transportation_type) %>%
  summarise(mean_mobility = mean(mobility_avg, na.rm = TRUE)) %>%
  arrange(transportation_type)

q1_us_july

# A tibble: 3 × 2
  transportation_type mean_mobility
  <chr>                       <dbl>
1 driving                     193. 
2 transit                      59.2
3 walking                     145.

2. Now, write a function that takes the `month` as input and returns the same information as above filtered for that month. Remember to use a descriptive name for your function and to set default arguments.

mean_mobility_by_month_us <- function(month_number = 7) {
  apple_data %>%
    filter(country == "United States", month(date) == month_number) %>%
    group_by(transportation_type) %>%
    summarise(mean_mobility = mean(mobility_avg, na.rm = TRUE)) %>%
    arrange(transportation_type)
}

# test
mean_mobility_by_month_us()

# A tibble: 3 × 2
  transportation_type mean_mobility
  <chr>                       <dbl>
1 driving                     193. 
2 transit                      59.2
3 walking                     145.

3. Use your function to calculate `mean()` of `mobility_avg` by `transportation_type` in the United States in August.

q3_us_aug <- mean_mobility_by_month_us(8)
q3_us_aug

# A tibble: 3 × 2
  transportation_type mean_mobility
  <chr>                       <dbl>
1 driving                     193. 
2 transit                      61.0
3 walking                     154.

4. Modify your function to also take `country_name` as an input (i.e. make it so that your function returns the `mean()` of `mobility_avg` by `transportation_type` for any month and country you supply as arguments). Make July and United States the default arguments.

mean_mobility_country_month <- function(month_number = 7,
                                        country_name = "United States") {
  apple_data %>%
    filter(country == country_name, month(date) == month_number) %>%
    group_by(transportation_type) %>%
    summarise(mean_mobility = mean(mobility_avg, na.rm = TRUE)) %>%
    arrange(transportation_type)
}

# test default
mean_mobility_country_month()

# A tibble: 3 × 2
  transportation_type mean_mobility
  <chr>                       <dbl>
1 driving                     193. 
2 transit                      59.2
3 walking                     145.
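The function above reads `apple_data` from the global environment. A possible variant, sketched here under the assumption that passing the data explicitly is acceptable for the assignment, takes the data frame as its first argument and rejects impossible month numbers. The names `mean_mobility_from` and `toy_mobility` are hypothetical, and the toy values are made up.

```r
library(dplyr)
library(lubridate)

# Hypothetical variant: the data frame is an explicit first argument
mean_mobility_from <- function(data,
                               month_number = 7,
                               country_name = "United States") {
  # Fail early on impossible months instead of returning an empty tibble
  stopifnot(month_number %in% 1:12)

  data %>%
    filter(country == country_name, month(date) == month_number) %>%
    group_by(transportation_type) %>%
    summarise(mean_mobility = mean(mobility_avg, na.rm = TRUE))
}

# Toy stand-in for apple_data (values are made up for illustration)
toy_mobility <- tibble(
  country = "United States",
  transportation_type = c("driving", "driving", "transit", "transit"),
  date = as.Date(c("2020-07-01", "2020-07-02", "2020-07-01", "2020-08-01")),
  mobility_avg = c(150, 170, 50, 60)
)

# Defaults select July in the United States
toy_result <- mean_mobility_from(toy_mobility)
toy_result
```

With the data as an argument, the same function can be tested on a small tibble like this one and then called as `mean_mobility_from(apple_data, 5, "Italy")`.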

5. Use your function to calculate the `mean()` of `mobility_avg` by `transportation_type` for Italy in May.

q5_it_may <- mean_mobility_country_month(month_number = 5, country_name = "Italy")
q5_it_may

# A tibble: 3 × 2
  transportation_type mean_mobility
  <chr>                       <dbl>
1 driving                      54.6
2 transit                      16.2
3 walking                      38.6

Plotting function

6. Make a plot of `mobility_avg` over time for the United States. Make sure to include an informative title.

p_us <- apple_data %>%
  filter(country == "United States") %>%
  ggplot(aes(date, mobility_avg, color = transportation_type)) +
  geom_line(linewidth = 1) +  # `linewidth` replaces the deprecated `size` for lines
  labs(
    title = "Apple Mobility Trends in the United States in 2020",
    x = "Date",
    y = "Mobility Average (baseline 100)",
    color = "Transportation type"
  ) +
  theme_minimal()

p_us

7. Write a function that takes `country_name` as an argument and recreates the plot above for any country. Make sure your formatting automatically updates to include the correct country’s name in the title as well.

plot_mobility_country <- function(country_name) {
  apple_data %>%
    filter(country == country_name) %>%
    ggplot(aes(date, mobility_avg, color = transportation_type)) +
    geom_line(linewidth = 1) +
    labs(
      title = paste("Apple Mobility Trends in", country_name, "in 2020"),
      x = "Date",
      y = "Mobility Average (baseline 100)",
      color = "Transportation type"
    ) +
    theme_minimal()
}

# quick test
plot_mobility_country("Italy")

8. Use your plotting function to plot `mobility_avg` over time for three other countries of your choice.

plot_mobility_country("Japan")

plot_mobility_country("Brazil")

plot_mobility_country("Canada")
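A misspelled country name makes `filter()` return zero rows, so the plotting function would draw an empty plot rather than fail. The sketch below shows one way to guard against that. `plot_mobility_checked` and `toy_mobility` are hypothetical names, the toy values are made up, and the data frame is passed as an argument so the example runs without `apple_data`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant of the plotting function with a country check
plot_mobility_checked <- function(data, country_name) {
  # Stop with a clear message if the country is not in the data
  if (!country_name %in% data$country) {
    stop("Country not found in data: ", country_name)
  }

  data %>%
    filter(country == country_name) %>%
    ggplot(aes(date, mobility_avg, color = transportation_type)) +
    geom_line(linewidth = 1) +
    labs(title = paste("Apple Mobility Trends in", country_name, "in 2020"))
}

# Toy stand-in for apple_data (values are made up for illustration)
toy_mobility <- tibble(
  country = "Italy",
  transportation_type = c("driving", "driving", "walking", "walking"),
  date = as.Date(c("2020-01-13", "2020-01-14", "2020-01-13", "2020-01-14")),
  mobility_avg = c(100, 104, 100, 98)
)

checked_plot <- plot_mobility_checked(toy_mobility, "Italy")

# A name that is not in the data raises an error instead of an empty plot
bad_call <- try(plot_mobility_checked(toy_mobility, "Itly"), silent = TRUE)
inherits(bad_call, "try-error")
```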

Create Multiple Plots with Iteration

Finally, let’s use a `for` loop to apply our plotting function multiple times and create plots for each country included in the Apple Mobility data. Please read all of the following instructions and hints before you get started:

You will need to iterate over a list of unique country names as they appear in the mobility data. You should be able to obtain this list without typing out all of the names by hand. Consider using `group_by()` with `summarize()`, `unique()`, or `distinct()` to generate a list of countries that appear in the mobility data.
Make sure to save each plot to the global environment with a distinctive name. Consider using `assign()` to ensure objects are saved to the global environment rather than the temporary environment created within your `for` loop and to allow for flexibly naming plots after the countries they display.
Once you’ve created all of the plots and saved them to the global environment, display three plots of your choosing (but not the same ones you displayed previously) in your write-up to confirm that they are properly formatted.
Please also print a list of all of the objects in your global environment (recall the `ls()` function) to confirm that plots for each country are present.

# list of unique country names present in the data
country_list <- apple_data %>% distinct(country) %>% pull(country)

# create a plot for each country and save it under a distinct name
for (country_name in country_list) {
  plot_obj <- plot_mobility_country(country_name)
  # replace spaces and punctuation so the name is a valid R object name
  safe_name <- paste0("plot_", gsub("[^A-Za-z0-9]+", "_", country_name))
  assign(safe_name, plot_obj, envir = .GlobalEnv)
}
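The naming step in the loop can be tried on its own. This base R sketch applies the same `gsub()` pattern to a hypothetical vector of names (`example_names`), including a missing value to show how it is handled.

```r
# Hypothetical input: two multi-word country names and one missing value
example_names <- c("United States", "Republic of Korea", NA)

# Replace every run of non-alphanumeric characters with one underscore
safe_names <- paste0("plot_", gsub("[^A-Za-z0-9]+", "_", example_names))

safe_names
# [1] "plot_United_States"     "plot_Republic_of_Korea" "plot_NA"
```

Because `gsub()` passes `NA` through and `paste0()` turns it into the text "NA", a missing country value ends up as an object called `plot_NA`.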

# display three plots that you did not display above
plot_Argentina

plot_Spain

plot_Germany

# confirm that the plots exist in the global environment
ls(pattern = "^plot_")

 [1] "plot_Argentina"            "plot_Australia"           
 [3] "plot_Austria"              "plot_Belgium"             
 [5] "plot_Brazil"               "plot_Canada"              
 [7] "plot_Chile"                "plot_Czech_Republic"      
 [9] "plot_Denmark"              "plot_Egypt"               
[11] "plot_Finland"              "plot_France"              
[13] "plot_Germany"              "plot_Greece"              
[15] "plot_Hungary"              "plot_India"               
[17] "plot_Indonesia"            "plot_Ireland"             
[19] "plot_Israel"               "plot_Italy"               
[21] "plot_Japan"                "plot_Luxembourg"          
[23] "plot_Malaysia"             "plot_Mexico"              
[25] "plot_mobility_country"     "plot_Morocco"             
[27] "plot_NA"                   "plot_Netherlands"         
[29] "plot_New_Zealand"          "plot_Norway"              
[31] "plot_obj"                  "plot_Philippines"         
[33] "plot_Poland"               "plot_Portugal"            
[35] "plot_Republic_of_Korea"    "plot_Romania"             
[37] "plot_Russia"               "plot_Saudi_Arabia"        
[39] "plot_Slovakia"             "plot_South_Africa"        
[41] "plot_Spain"                "plot_Sweden"              
[43] "plot_Switzerland"          "plot_Taiwan"              
[45] "plot_Thailand"             "plot_Turkey"              
[47] "plot_United_Arab_Emirates" "plot_United_Kingdom"      
[49] "plot_United_States"        "plot_Vietnam"
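The listing above also matches `plot_mobility_country` (the function) and `plot_obj` (the loop's temporary variable), and `plot_NA` suggests that the `country` column contains a missing value or a literal "NA" entry. An alternative to `assign()`, sketched here as an assumption rather than what the assignment asks for, keeps the plots in one named list and drops missing countries first. `plot_country`, `toy_mobility`, and `mobility_plots` are hypothetical names, and the toy values are made up.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for apple_data, including rows with a missing country
toy_mobility <- tibble(
  country = c("Italy", "Italy", "New Zealand", "New Zealand", NA, NA),
  transportation_type = "driving",
  date = as.Date(rep(c("2020-01-13", "2020-01-14"), times = 3)),
  mobility_avg = c(100, 104, 100, 97, 100, 101)
)

# Hypothetical plotting helper that takes the data as an argument
plot_country <- function(data, country_name) {
  data %>%
    filter(country == country_name) %>%
    ggplot(aes(date, mobility_avg, color = transportation_type)) +
    geom_line(linewidth = 1) +
    labs(title = paste("Apple Mobility Trends in", country_name, "in 2020"))
}

# Unique country names, with missing values dropped
countries <- toy_mobility %>%
  filter(!is.na(country)) %>%
  distinct(country) %>%
  pull(country)

# One plot per country, collected in a list named after the countries
mobility_plots <- lapply(countries, function(cn) plot_country(toy_mobility, cn))
names(mobility_plots) <- countries

names(mobility_plots)
```

A single plot can then be displayed with `mobility_plots[["New Zealand"]]`, and the original country names are kept as list names without any renaming.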

Results & Discussion

In the United States, mobility differed sharply by mode in summer 2020: driving averaged about 193 in both July and August, walking rose from about 145 to 154, and transit stayed well below the baseline of 100 (59.2 in July, 61.0 in August). Driving rebounded first after the initial lockdowns, followed by walking, while transit recovered more slowly. In Italy, Japan, and Brazil, mobility dropped sharply around March 2020 and gradually rebounded through the summer. Italy’s May averages (54.6 for driving, 38.6 for walking, 16.2 for transit) were still far below the baseline. These patterns are consistent with the mobility restrictions that countries introduced in response to COVID-19 in early 2020.
