Introduction

The city of Barcelona provides various resources that track air quality through numerous measurement stations scattered across the city, is possible to access data about air quality, details about the measurement stations, and specifics regarding the pollutants monitored, these resources are open to everyone and we will utilizate these for our analysis, they can be accessed through the city’s Open Data Portal

The insights gleaned from the analysis have been instrumental in understanding the triggers of atmospheric pollution episodes, the direct effects of reduced human activity during the COVID-19 pandemic on air pollution, and the complex relationships between various weather conditions and pollutant levels, this project has not only provided a detailed look at the factors affecting Barcelona’s air quality but also offered actionable recommendations to improve urban air management practices.

In our work we will examine the state of air quality in Barcelona by analyzing data across several years, focusing particularly on periods of high pollution and assessing the impact of human activities and weather. We will create an analytical Report, that will be included also description of the methodologies adopted for data collection, cleaning, and analysis, the data analysis results, with all findings from our analysis, including the identification of pollution episodes, impacts of human activity on air quality, and the effects of weather conditions on pollutant level and a final section with a summary of our insights, along with strategic recommendations for enhancing air quality policies in Barcelona and suggestions for improving the use of open data in environmental studies.

Using Shiny we will develop interactive applications that allows users to visualize the evolution of air quality over time, patterns and correlations, in that sense it will be very easy to understand complex data through interactive graphs, and users will be able to manipulate some variables and time frames to observe different scenarios of air quality and weather, enhancing the interactive experience.

Methodology: Data preparation

To carry the objectives we have created a dataset called ‘qualitat_aire’ which contains the measurements of all Barcelona stations for all pollutants from January 20202 to March 2024. Below we´ll explain how we got this dataset. The first operation was loading all the tables contained in the followed link, each table contains monthly measurements of pollutants at all stations in the city: https://opendata-ajuntament.barcelona.cat/data/en/dataset/qualitat-aire-detall-bcn

To speed up the process we created a list where each element of the list is a table related to a certain month:

directory <- "QUALITAT_AIRE"
files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
file_names <- tools::file_path_sans_ext(basename(files))
qa <- lapply(setNames(files, file_names), read.csv)
rm(directory)
rm(files)
rm(file_names)

Each table contains a column with the specific day, month and year, through this code in each of the tables we create a new column called DATE that contains the date written in format “ANY-MES-DIA”

for (i in seq_along(qa)) {
  qa[[i]]$DATE <- as.Date(paste(qa[[i]]$ANY, qa[[i]]$MES, qa[[i]]$DIA, sep = "-"))
  qa[[i]] <- qa[[i]][, !names(qa[[i]]) %in% c("ANY", "MES", "DIA")]
  qa[[i]] <- qa[[i]][, c("DATE", setdiff(names(qa[[i]]), "DATE"))]
}
rm(i)

Now we create a single dataset using the command “bind_rows” and analyze the structure by eliminating the columns that do not provide any useful information for our goal.

qualitat_aire <- bind_rows(qa, .id = "source") |>
  arrange(DATE)
qualitat_aire <- qualitat_aire %>%
  mutate(across(c(CODI_PROVINCIA, PROVINCIA, CODI_MUNICIPI, MUNICIPI, ESTACIO, 
                  CODI_CONTAMINANT), as.factor))

levels(qualitat_aire$CODI_PROVINCIA)
levels(qualitat_aire$PROVINCIA)
levels(qualitat_aire$CODI_MUNICIPI)
levels(qualitat_aire$MUNICIPI)
levels(qualitat_aire$ESTACIO)
levels(qualitat_aire$CODI_CONTAMINANT)

qualitat_aire<-qualitat_aire|>
  select(-c(1, 3, 4, 5, 6, starts_with("V")))

str(as_tibble(qualitat_aire))

## tibble [88,947 × 27] (S3: tbl_df/tbl/data.frame)
##  $ DATE            : Date[1:88947], format: "2020-01-01" "2020-01-01" ...
##  $ ESTACIO         : Factor w/ 8 levels "4","42","43",..: 1 1 1 1 2 2 2 3 3 3 ...
##  $ CODI_CONTAMINANT: Factor w/ 23 levels "1","2","6","7",..: 4 5 7 8 4 5 8 1 3 4 ...
##  $ H01             : num [1:88947] 3 27 18 31 2 18 20 1 0.2 8 ...
##  $ H02             : num [1:88947] 3 31 13 36 3 24 28 1 0.2 10 ...
##  $ H03             : num [1:88947] 2 27 10 31 2 23 27 1 0.2 2 ...
##  $ H04             : num [1:88947] 1 15 9 17 2 15 18 1 0.2 18 ...
##  $ H05             : num [1:88947] 2 22 8 25 2 15 18 1 0.2 7 ...
##  $ H06             : num [1:88947] 6 32 12 41 2 17 20 1 0.2 7 ...
##  $ H07             : num [1:88947] 12 32 17 51 1 15 18 1 0.2 5 ...
##  $ H08             : num [1:88947] 20 30 18 60 2 22 26 1 0.2 12 ...
##  $ H09             : num [1:88947] 22 29 19 63 8 31 42 1 0.2 15 ...
##  $ H10             : num [1:88947] 17 26 19 51 10 28 43 1 0.2 12 ...
##  $ H11             : num [1:88947] 15 22 16 45 14 27 47 1 0.2 12 ...
##  $ H12             : num [1:88947] 13 20 15 40 6 18 28 1 0.2 8 ...
##  $ H13             : num [1:88947] 11 20 14 36 6 15 24 1 0.2 8 ...
##  $ H14             : num [1:88947] 9 22 14 35 5 13 20 1 0.2 17 ...
##  $ H15             : num [1:88947] 8 23 14 35 5 16 24 1 0.2 22 ...
##  $ H16             : num [1:88947] 5 22 13 29 4 20 26 1 0.2 13 ...
##  $ H17             : num [1:88947] 2 22 14 26 3 24 28 1 0.2 18 ...
##  $ H18             : num [1:88947] 1 24 8 26 1 23 25 1 0.2 17 ...
##  $ H19             : num [1:88947] 1 32 9 34 1 24 26 1 0.2 15 ...
##  $ H20             : num [1:88947] 18 64 13 92 4 36 42 1 0.3 23 ...
##  $ H21             : num [1:88947] 40 66 21 127 7 44 54 1 0.3 23 ...
##  $ H22             : num [1:88947] 44 63 22 131 2 27 31 1 0.2 7 ...
##  $ H23             : num [1:88947] 31 56 24 103 14 55 76 1 0.2 14 ...
##  $ H24             : num [1:88947] 33 48 24 98 17 50 76 1 0.3 20 ...

Now let’s improve the dataset obtained by adding information about pollutants. In fact, in the dataset there is only the polluting code without the name and the unit of measure. To do this we have downloaded the table from this link: https://opendata-ajuntament.barcelona.cat/data/en/dataset/contaminants-estacions-mesura-qualitat-aire

contaminants <- read.csv("qualitat_aire_contaminants.csv")

With a left_join we combine the dataset “qualitat_aire” and the new dataset “contaminants” thanks to the column CODI_CONTAMINANTS. We also modify the POLLUTANTS column to show both the name of the pollutants and the unit of measurement.

contaminants<-contaminants|>
  rename(CODI_CONTAMINANT = Codi_Contaminant)
contaminants$CODI_CONTAMINANT<-as.factor(contaminants$CODI_CONTAMINANT)
contaminants
qualitat_aire <- qualitat_aire |>
  left_join(contaminants, by = "CODI_CONTAMINANT") |>
  relocate(c(Desc_Contaminant, Unitats), .after = CODI_CONTAMINANT)|>
  mutate(POLLUTANTS = paste(Desc_Contaminant, "(",Unitats,")"))|>
  relocate(POLLUTANTS,.after = Unitats)|>
  select(-c(Desc_Contaminant,Unitats))

Then we perform a similar procedure to add information about the location of the stations thanks to a table that can be found at this link: https://opendata-ajuntament.barcelona.cat/date/en/dataset/qualitat-aire-estacions-bcn

directory<- "ESTACIONS_QUALITAT_AIRE"
files<- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
file_names <- tools::file_path_sans_ext(basename(files))
list_estacions_qa<- lapply(setNames(files, file_names), read.csv)
rm(directory)
rm(files)
rm(file_names)
for (i in seq_along(list_estacions_qa)) {
  estacion <- list_estacions_qa[[i]]  
  estacion <- estacion[, 1:2]
  names(estacion)[1] <- "ESTACIO"
  list_estacions_qa[[i]] <- estacion
}
rm(estacion)
rm(i)

We combine two datasets, list_estacions_qa and qualitat_aire, then remove duplicate rows from the resulting dataset. Next, we convert the ESTACIO variable in the dataset to a factor. We then perform a left join with the qualitat_aire dataset based on the ESTACIO variable. After that, we rearrange the columns by moving the nom_cabina column next to the ESTACIO column.

estacions_qa <- bind_rows(list_estacions_qa, .id = "source")|>
  select(-1)|>
  distinct()
estacions_qa$ESTACIO<-as.factor(estacions_qa$ESTACIO)
estacions_qa
qualitat_aire <- qualitat_aire |>
  left_join(estacions_qa, by = "ESTACIO") |>
  relocate(nom_cabina, .after = ESTACIO)|>
  mutate(ESTACIO = paste(ESTACIO, "-", nom_cabina))|>
  select(-c(nom_cabina))

Then, we proceed to calculate the daily mean for each day, taking into account the full 24-hour period.

qualitat_aire <- qualitat_aire %>%
  mutate(across(c(ESTACIO, POLLUTANTS), as.factor))|>
  mutate(Daily_mean = rowMeans(select(qualitat_aire, starts_with("H")), na.rm = TRUE))
  
levels(qualitat_aire$ESTACIO)
levels(qualitat_aire$CODI_CONTAMINANT)
levels(qualitat_aire$POLLUTANTS)

We proceed to filter out rows where the CODI_CONTAMINANT column contains values with no name or unit and where the POLLUTANTS column is “NA”, subsequently, we calculate the mean value for each row across columns starting with “H” and assign the result to a new column named Daily_mean. Finally, we count the number of missing values in each column and print the result. Note that there is no NA in the final data.

qualitat_aire <- qualitat_aire %>%
  filter(!(CODI_CONTAMINANT %in% c(996, 997, 998)))|>
  filter(!(POLLUTANTS == "NA ( NA )"))|>
  fill(everything())

qualitat_aire <- qualitat_aire %>%
  mutate(Daily_mean = rowMeans(select(qualitat_aire, starts_with("H")), 
                               na.rm = TRUE))

na_count <- colSums(is.na(qualitat_aire))
print(na_count)

##             DATE          ESTACIO CODI_CONTAMINANT       POLLUTANTS 
##                0                0                0                0 
##              H01              H02              H03              H04 
##                0                0                0                0 
##              H05              H06              H07              H08 
##                0                0                0                0 
##              H09              H10              H11              H12 
##                0                0                0                0 
##              H13              H14              H15              H16 
##                0                0                0                0 
##              H17              H18              H19              H20 
##                0                0                0                0 
##              H21              H22              H23              H24 
##                0                0                0                0 
##       Daily_mean 
##                0

Lastly, the final outcome is as follows:

str(as_tibble(qualitat_aire))

## tibble [79,824 × 29] (S3: tbl_df/tbl/data.frame)
##  $ DATE            : Date[1:79824], format: "2020-01-01" "2020-01-01" ...
##  $ ESTACIO         : Factor w/ 8 levels "4 - Barcelona - Poblenou",..: 1 1 1 1 2 2 2 3 3 3 ...
##  $ CODI_CONTAMINANT: Factor w/ 23 levels "1","2","6","7",..: 4 5 7 8 4 5 8 1 3 4 ...
##  $ POLLUTANTS      : Factor w/ 22 levels "Biomassa Black Carbon ( % )",..: 9 11 17 13 9 11 13 21 3 9 ...
##  $ H01             : num [1:79824] 3 27 18 31 2 18 20 1 0.2 8 ...
##  $ H02             : num [1:79824] 3 31 13 36 3 24 28 1 0.2 10 ...
##  $ H03             : num [1:79824] 2 27 10 31 2 23 27 1 0.2 2 ...
##  $ H04             : num [1:79824] 1 15 9 17 2 15 18 1 0.2 18 ...
##  $ H05             : num [1:79824] 2 22 8 25 2 15 18 1 0.2 7 ...
##  $ H06             : num [1:79824] 6 32 12 41 2 17 20 1 0.2 7 ...
##  $ H07             : num [1:79824] 12 32 17 51 1 15 18 1 0.2 5 ...
##  $ H08             : num [1:79824] 20 30 18 60 2 22 26 1 0.2 12 ...
##  $ H09             : num [1:79824] 22 29 19 63 8 31 42 1 0.2 15 ...
##  $ H10             : num [1:79824] 17 26 19 51 10 28 43 1 0.2 12 ...
##  $ H11             : num [1:79824] 15 22 16 45 14 27 47 1 0.2 12 ...
##  $ H12             : num [1:79824] 13 20 15 40 6 18 28 1 0.2 8 ...
##  $ H13             : num [1:79824] 11 20 14 36 6 15 24 1 0.2 8 ...
##  $ H14             : num [1:79824] 9 22 14 35 5 13 20 1 0.2 17 ...
##  $ H15             : num [1:79824] 8 23 14 35 5 16 24 1 0.2 22 ...
##  $ H16             : num [1:79824] 5 22 13 29 4 20 26 1 0.2 13 ...
##  $ H17             : num [1:79824] 2 22 14 26 3 24 28 1 0.2 18 ...
##  $ H18             : num [1:79824] 1 24 8 26 1 23 25 1 0.2 17 ...
##  $ H19             : num [1:79824] 1 32 9 34 1 24 26 1 0.2 15 ...
##  $ H20             : num [1:79824] 18 64 13 92 4 36 42 1 0.3 23 ...
##  $ H21             : num [1:79824] 40 66 21 127 7 44 54 1 0.3 23 ...
##  $ H22             : num [1:79824] 44 63 22 131 2 27 31 1 0.2 7 ...
##  $ H23             : num [1:79824] 31 56 24 103 14 55 76 1 0.2 14 ...
##  $ H24             : num [1:79824] 33 48 24 98 17 50 76 1 0.3 20 ...
##  $ Daily_mean      : num [1:79824] 13.29 32.29 15.17 52.62 5.12 ...

This dataframe will serve as the foundamental dataset on which we will base our work for each objective of this project, manipulating it specifically according to the requirements of each one.

Objective 1: Athmospheric Pollution Episodes

Procedure

When levels of NO2 and PM10 reach a certain threshold, authorities classify it as an atmospheric pollution episode. To determine the atmospheric pollution episodes detected at each station, we must first filter the data in order to select only the values for this two contminants as its shown:

NO2

When the hourly NO2 level exceeds 200 µg/m³ at two or more monitoring stations and the forecast indicates an unfavorable trend, we consider this scenario. Simultaneously, we exclude certain variables to clean and streamline the working dataset.

qual_air <- read.csv("qualitat_aire.csv")|>
  as_tibble()
qual_air2 <- qual_air|>
  select(- CODI_CONTAMINANT, -c(starts_with("V")))|>
  filter(POLLUTANTS %in% c("NO2 ( µg/m³ )"))
qual_air2

## # A tibble: 12,336 × 27
##    DATE       ESTACIO POLLUTANTS   H01   H02   H03   H04   H05   H06   H07   H08
##    <chr>      <chr>   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2020-01-01 4 - Ba… NO2 ( µg/…    27    31    27    15    22    32    32    30
##  2 2020-01-01 42 - B… NO2 ( µg/…    18    24    23    15    15    17    15    22
##  3 2020-01-01 43 - B… NO2 ( µg/…    24    25    23    35    32    34    35    37
##  4 2020-01-01 44 - B… NO2 ( µg/…    39    53    55    49    47    46    39    34
##  5 2020-01-01 50 - B… NO2 ( µg/…    32    34    38    35    32    32    28    30
##  6 2020-01-01 54 - B… NO2 ( µg/…    20    22    24    42    44    43    42    29
##  7 2020-01-01 57 - B… NO2 ( µg/…    NA    15    29    23    13    16    14    26
##  8 2020-01-01 58 - B… NO2 ( µg/…     5     4     4     4     3     4     4     6
##  9 2020-01-02 4 - Ba… NO2 ( µg/…    37    30    26    23    21    22    24    28
## 10 2020-01-02 42 - B… NO2 ( µg/…    33    31    37    32    29    27    30    37
## # ℹ 12,326 more rows
## # ℹ 16 more variables: H09 <dbl>, H10 <dbl>, H11 <dbl>, H12 <dbl>, H13 <dbl>,
## #   H14 <dbl>, H15 <dbl>, H16 <dbl>, H17 <dbl>, H18 <dbl>, H19 <dbl>,
## #   H20 <dbl>, H21 <dbl>, H22 <dbl>, H23 <dbl>, H24 <dbl>

After that, we filter for only the hourly values that are greater than or equal to 200 µg/m³.

qual_air3 <- qual_air2|>
  filter(H01 > 199 | H02 > 199 | H03 > 199 | H04 > 199 | H05 > 199 | H06 > 199 | 
           H07 > 199 | H08 > 199 | H09 > 199 | H10 > 199 | H11 > 199 | H12 > 199 |
           H13 > 199 | H14 > 199 | H15 > 199 | H16 > 199 | H17 > 199 | H18 > 199 | 
           H19 > 199 | H20 > 199 | H21 > 199 | H22 > 199 | H23 >199 | H24 >199)
qual_air3

## # A tibble: 0 × 27
## # ℹ 27 variables: DATE <chr>, ESTACIO <chr>, POLLUTANTS <chr>, H01 <dbl>,
## #   H02 <dbl>, H03 <dbl>, H04 <dbl>, H05 <dbl>, H06 <dbl>, H07 <dbl>,
## #   H08 <dbl>, H09 <dbl>, H10 <dbl>, H11 <dbl>, H12 <dbl>, H13 <dbl>,
## #   H14 <dbl>, H15 <dbl>, H16 <dbl>, H17 <dbl>, H18 <dbl>, H19 <dbl>,
## #   H20 <dbl>, H21 <dbl>, H22 <dbl>, H23 <dbl>, H24 <dbl>

In this section, if there are no NO2 values greater than or equal to 200 µg/m³, indicating no episodes detected, we proceed to calculate the daily average of the NO2 pollutant using the qual_air2 table. We utilize the fill(everything()) function to fill missing values in the table by using the last non-missing value encountered in each column.

hours_no2 <- qual_air2|>
  fill(everything())|>
  select(DATE, starts_with("H")) #devo fare una media per ogni row (tutte le ore di una giornata)
hours_no2

## # A tibble: 12,336 × 25
##    DATE    H01   H02   H03   H04   H05   H06   H07   H08   H09   H10   H11   H12
##    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2020…    27    31    27    15    22    32    32    30    29    26    22    20
##  2 2020…    18    24    23    15    15    17    15    22    31    28    27    18
##  3 2020…    24    25    23    35    32    34    35    37    37    34    30    24
##  4 2020…    39    53    55    49    47    46    39    34    27    22    19    21
##  5 2020…    32    34    38    35    32    32    28    30    30    24    32    20
##  6 2020…    20    22    24    42    44    43    42    29    21    16     9     8
##  7 2020…    20    15    29    23    13    16    14    26    32    35    22    22
##  8 2020…     5     4     4     4     3     4     4     6     6     6     7     9
##  9 2020…    37    30    26    23    21    22    24    28    29    30    35    40
## 10 2020…    33    31    37    32    29    27    30    37    43    42    44    26
## # ℹ 12,326 more rows
## # ℹ 12 more variables: H13 <dbl>, H14 <dbl>, H15 <dbl>, H16 <dbl>, H17 <dbl>,
## #   H18 <dbl>, H19 <dbl>, H20 <dbl>, H21 <dbl>, H22 <dbl>, H23 <dbl>, H24 <dbl>

Let’s compute both the row mean and the daily mean values.

mean_rows_no2 <- hours_no2|>  
  rowwise()|>
  mutate(mean_row = mean(c_across(starts_with("H")), na.rm = TRUE))
mean_rows_no2

## # A tibble: 12,336 × 26
## # Rowwise: 
##    DATE    H01   H02   H03   H04   H05   H06   H07   H08   H09   H10   H11   H12
##    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2020…    27    31    27    15    22    32    32    30    29    26    22    20
##  2 2020…    18    24    23    15    15    17    15    22    31    28    27    18
##  3 2020…    24    25    23    35    32    34    35    37    37    34    30    24
##  4 2020…    39    53    55    49    47    46    39    34    27    22    19    21
##  5 2020…    32    34    38    35    32    32    28    30    30    24    32    20
##  6 2020…    20    22    24    42    44    43    42    29    21    16     9     8
##  7 2020…    20    15    29    23    13    16    14    26    32    35    22    22
##  8 2020…     5     4     4     4     3     4     4     6     6     6     7     9
##  9 2020…    37    30    26    23    21    22    24    28    29    30    35    40
## 10 2020…    33    31    37    32    29    27    30    37    43    42    44    26
## # ℹ 12,326 more rows
## # ℹ 13 more variables: H13 <dbl>, H14 <dbl>, H15 <dbl>, H16 <dbl>, H17 <dbl>,
## #   H18 <dbl>, H19 <dbl>, H20 <dbl>, H21 <dbl>, H22 <dbl>, H23 <dbl>,
## #   H24 <dbl>, mean_row <dbl>

mean_day_no2 <- mean_rows_no2|>
  group_by(DATE)|>
  summarise(mean = sum(mean_row)/  n() )
mean_day_no2

## # A tibble: 1,542 × 2
##    DATE        mean
##    <chr>      <dbl>
##  1 2020-01-01  30.8
##  2 2020-01-02  32.2
##  3 2020-01-03  32.6
##  4 2020-01-04  32.6
##  5 2020-01-05  32.0
##  6 2020-01-06  34.7
##  7 2020-01-07  45.1
##  8 2020-01-08  52.0
##  9 2020-01-09  44.4
## 10 2020-01-10  42.5
## # ℹ 1,532 more rows

We can now examine the plot, which illustrates that there are no instances of NO2 episodes in any case.

mean_day_no2|>
  ggplot(aes(DATE, mean,  group = 1) ) +
  geom_line(show.legend = TRUE) +
  geom_hline(yintercept = 200, linetype = "dashed", colour = "red" ) +
  labs(x = "date", y = "NO2 ( µg/m³ )", title = "NO2 ( µg/m³ ) VARIATION DURING THE ANALYZED PERIOD") +
  theme_minimal()

PM10

First scenario

For this other contaminant, a first case that we are evaluating occurs when the daily emission value of 80 µg/m³ for PM10 is exceeded the previous day at two or more stations, and the forecast indicates an unfavorable trend in the levels. Let´s take a look how we can filter the data for this conditions:

qual_air4 <- qual_air|>
  select(- CODI_CONTAMINANT, -c(starts_with("V")))|>
  filter(POLLUTANTS %in% c("PM10 ( µg/m³ )"))
qual_air4

## # A tibble: 9,252 × 27
##    DATE       ESTACIO POLLUTANTS   H01   H02   H03   H04   H05   H06   H07   H08
##    <chr>      <chr>   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2020-01-01 4 - Ba… PM10 ( µg…    18    13    10     9     8    12    17    18
##  2 2020-01-01 43 - B… PM10 ( µg…    13    26    14    14    14    15    13    23
##  3 2020-01-01 44 - B… PM10 ( µg…    16    17    17    17    17    14    16    11
##  4 2020-01-01 54 - B… PM10 ( µg…    16    14    14    22    21    13    16    12
##  5 2020-01-01 57 - B… PM10 ( µg…    NA    10    12    14    11    12    11    12
##  6 2020-01-01 58 - B… PM10 ( µg…     9     8     8    12    10    10    12    19
##  7 2020-01-02 4 - Ba… PM10 ( µg…    18    10    12    10    10    13    19    21
##  8 2020-01-02 43 - B… PM10 ( µg…    20    15    15    13    16    17    15    18
##  9 2020-01-02 44 - B… PM10 ( µg…    17    15    14    12    13    14    16    19
## 10 2020-01-02 54 - B… PM10 ( µg…    19    15    16    15    14    14    17    18
## # ℹ 9,242 more rows
## # ℹ 16 more variables: H09 <dbl>, H10 <dbl>, H11 <dbl>, H12 <dbl>, H13 <dbl>,
## #   H14 <dbl>, H15 <dbl>, H16 <dbl>, H17 <dbl>, H18 <dbl>, H19 <dbl>,
## #   H20 <dbl>, H21 <dbl>, H22 <dbl>, H23 <dbl>, H24 <dbl>

#Searching if there are any values >= 80 µg/m3
qual_air5 <- qual_air4 |>
  filter(H01 > 79 | H02 > 79 | H03 > 79 | H04 > 79 | H05 > 79 | H06 > 79 | 
           H07 > 79 | H08 > 79 | H09 > 79 | H10 > 79 | H11 > 79 | H12 > 79 | H13 > 79 | 
           H14 > 79 | H15 > 79 | H16 > 79 | H17 > 79 | H18 > 79 | H19 > 79 | H20 > 79 | H21 > 79 | H22 > 79 | 
           H23 > 79 | H24 > 79)
qual_air5

## # A tibble: 582 × 27
##    DATE       ESTACIO POLLUTANTS   H01   H02   H03   H04   H05   H06   H07   H08
##    <chr>      <chr>   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2020-01-03 57 - B… PM10 ( µg…    12    11    10    12    11    12    11    12
##  2 2020-01-08 43 - B… PM10 ( µg…    29    23    21    22    25    27    31    57
##  3 2020-01-08 57 - B… PM10 ( µg…    25    16    17    21    25    24    20    23
##  4 2020-01-09 4 - Ba… PM10 ( µg…    37    34    31    26    23    24    25    33
##  5 2020-01-15 43 - B… PM10 ( µg…    42    40    48    43    32    27    30    38
##  6 2020-01-16 43 - B… PM10 ( µg…    22    34    39    20    17    22    28    40
##  7 2020-01-22 4 - Ba… PM10 ( µg…    52    26    13    12    16     7    11    11
##  8 2020-01-22 43 - B… PM10 ( µg…    25    20    11     8     8     6     9    11
##  9 2020-01-22 44 - B… PM10 ( µg…    23    18     8     7     5     4     7    10
## 10 2020-01-23 4 - Ba… PM10 ( µg…    53    52    46    46    55    57    77    89
## # ℹ 572 more rows
## # ℹ 16 more variables: H09 <dbl>, H10 <dbl>, H11 <dbl>, H12 <dbl>, H13 <dbl>,
## #   H14 <dbl>, H15 <dbl>, H16 <dbl>, H17 <dbl>, H18 <dbl>, H19 <dbl>,
## #   H20 <dbl>, H21 <dbl>, H22 <dbl>, H23 <dbl>, H24 <dbl>

We’re doing the same thing for NO2, but this time, we’re only interested in values that are 80 or more and come from at least 2 different stations on the same day. Then, we find the average for each day by looking at these values and, like before, we end up with one number per day by averaging these daily averages. Now, we are plotingt the initial scenario.

first_pm10 <- qual_air5|>
  fill(everything())|>
  group_by(DATE)|>
  filter(n() >= 2)|>
  select(DATE, starts_with("H"))|>
  rowwise()|>
  mutate(mean_row = mean(c_across(starts_with("H")), na.rm = TRUE))|>
  filter(mean_row >= 80)|>
  group_by(DATE)|>
  summarise(mean = sum(mean_row)/  n() )

first_pm10|>
  mutate(DATE = as.Date(DATE, format = "%Y-%m-%d")) |>
  ggplot(aes(DATE, mean,  group = 1) ) +
  geom_line(show.legend = TRUE) +
  geom_hline(yintercept = 80, linetype = "dashed", colour = "red" ) +
  scale_x_date(date_breaks = "3 month", date_labels = "%B %Y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "date", y = "PM10 ( µg/m³ )", title = "EPISODES >= 80 PM10 (µg/m³) DURING THE ANALYZED PERIOD") +
  theme_minimal()

Second scenario

Similarly, for the second case, when the daily PM10 emission level reaches or exceeds 50 µg/m³ for three consecutive days and there’s an expected exceedance of this value for the following day, we conclude that there is an episode and we can reach this by the followin code:

qual_air6 <- qual_air4 |>
  filter(H01 > 49 | H02 > 49 | H03 > 49 | H04 > 49 | H05 > 49 | H06 > 49 | 
           H07 > 49 | H08 > 49 | H09 > 49 | H10 > 49 | H11 > 49 | H12 > 49 | H13 > 49 | 
           H14 > 49 | H15 > 49 | H16 > 49 | H17 > 49 | H18 > 49 | H19 > 49 | H20 > 49 | H21 > 49 | H22 > 49 | 
           H23 > 49 | H24 > 49)
qual_air6

## # A tibble: 2,165 × 27
##    DATE       ESTACIO POLLUTANTS   H01   H02   H03   H04   H05   H06   H07   H08
##    <chr>      <chr>   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2020-01-02 57 - B… PM10 ( µg…    16    15    17    14    12    12    12    17
##  2 2020-01-03 57 - B… PM10 ( µg…    12    11    10    12    11    12    11    12
##  3 2020-01-08 4 - Ba… PM10 ( µg…    15    15    17    20    25    26    30    34
##  4 2020-01-08 43 - B… PM10 ( µg…    29    23    21    22    25    27    31    57
##  5 2020-01-08 44 - B… PM10 ( µg…    40    27    25    21    22    23    23    26
##  6 2020-01-08 57 - B… PM10 ( µg…    25    16    17    21    25    24    20    23
##  7 2020-01-09 4 - Ba… PM10 ( µg…    37    34    31    26    23    24    25    33
##  8 2020-01-09 43 - B… PM10 ( µg…    37    29    27    21    18    15    19    27
##  9 2020-01-09 44 - B… PM10 ( µg…    20    15    11     9    12    15    12    23
## 10 2020-01-09 54 - B… PM10 ( µg…    10     7    11    13    13    14    13    16
## # ℹ 2,155 more rows
## # ℹ 16 more variables: H09 <dbl>, H10 <dbl>, H11 <dbl>, H12 <dbl>, H13 <dbl>,
## #   H14 <dbl>, H15 <dbl>, H16 <dbl>, H17 <dbl>, H18 <dbl>, H19 <dbl>,
## #   H20 <dbl>, H21 <dbl>, H22 <dbl>, H23 <dbl>, H24 <dbl>

qual_air7 <- qual_air6|>
  fill(everything())|>
  select(DATE, starts_with("H"))|>
  rowwise()|>
  mutate(mean_row = mean(c_across(starts_with("H")), na.rm = TRUE))|>
  filter(mean_row >= 50)|>
  group_by(DATE)|>
  summarise(mean = sum(mean_row)/  n() )

Lastly, we generate a table for episodes where PM10 levels are equal to or greater than 50 µg/m³ during the analyzed period. We use this code to find instances where the average value occurs for three days in a row. Finally, a graphical representation is provided for visualization as it is shown down.

second_pm10 <- qual_air7|>
  mutate(next_date = lead(DATE))|>
  mutate(next_next_date = lead(next_date))|>
  filter(!is.na(DATE) & !is.na(next_date) & !is.na(next_next_date) & 
           difftime(next_date, DATE, units = "days") == 1 & 
           difftime(next_next_date, next_date, units = "days") == 1)
second_pm10

## # A tibble: 11 × 4
##    DATE        mean next_date  next_next_date
##    <chr>      <dbl> <chr>      <chr>         
##  1 2020-01-22  53.5 2020-01-23 2020-01-24    
##  2 2022-01-27  50.9 2022-01-28 2022-01-29    
##  3 2022-06-22  51.6 2022-06-23 2022-06-24    
##  4 2022-10-18  62.1 2022-10-19 2022-10-20    
##  5 2022-10-19  68.4 2022-10-20 2022-10-21    
##  6 2022-10-23  60.7 2022-10-24 2022-10-25    
##  7 2022-10-24  73.7 2022-10-25 2022-10-26    
##  8 2022-10-25  59.2 2022-10-26 2022-10-27    
##  9 2022-10-26  66.8 2022-10-27 2022-10-28    
## 10 2022-10-27  60.8 2022-10-28 2022-10-29    
## 11 2022-10-28  52.6 2022-10-29 2022-10-30

second_pm10|>
  mutate(DATE = as.Date(DATE, format = "%Y-%m-%d")) |>
  ggplot(aes(DATE, mean,  group = 1) ) +
  geom_line(show.legend = TRUE) +
  geom_hline(yintercept = 50, linetype = "dashed", colour = "red" ) +
  scale_x_date(date_breaks = "3 month", date_labels = "%B %Y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "date", y = "PM10 ( µg/m³ )", title = "EPISODES >= 50 PM10 (µg/m³) DURING THE ANALYZED PERIOD") +
  theme_minimal()

Conclusion

From the analysis concluded on the air pollutant NO2 we obtained 0 atmospheric pollution episodes, the minimum threshold for an episode to occur was set at 200 µg/m³. With regard to the analyses carried out on the air pollutant PM10 we found 9 atmospheric pollution episodes that exceeded the threshold of 80 µg/m³ while 11 were the episodes identified when the alert threshold was set at 50 µg/m³. All analyses were carried out in the same period of interest from 1 January 2020 to 31 March 2024.

Objective 2: Effect of Human Activity in Pollutants

The COVID-19 pandemic presented an opportunity for a unique “natural experiment” to observe the impact of human activity. This was possible by the imposition of mobility restrictions at various points in 2020 and 2021 in Spain.

Procedure

To assses the impact of human activity on pollutants in Barcelona, we are comparing moments of confinement with similar moments with normal activity. For this, the first step is to create a Shinny structure and set up a selection input field for the pollutants on the application. Next, we’ll collect unique station names from the ESTACIO column in the dataframe, these sorted station names will then be used to create a list of stations in the app. Now you’ll see the app’s code as a comment to show how we manage to do it.

#SelectPollutant <- selectInput("pollutants",
#                                label = "Select Pollutant",
#                                choices = NULL)
# 
# estacio <- unique(qualitat_aire$ESTACIO)
# sorted_estacio <- sort(estacio)
# SelectEstacio_qa <- selectInput("Estacio_qa",
#                                 label = "Select Station",
#                                 choices = sorted_estacio)

Then, we create cards and value boxes highlighting key stats of the data resukts for specific period.

# card_lockdown<-card(cardHeader = "Air Quality Covid vs No-Covid Period",
#                    plotOutput("comparison"))
# card_total<-card(cardHeader = "Air Quality Covid vs No-Covid Period",
#                       plotOutput("total"))
# vb_max2020 <- value_box(
#   title = "Mean in 2020",
#   value = textOutput("Max2020"),
#   showcase = bsicons::bs_icon("virus2"),
#   theme = "red")
# 
# vb_max2022 <- value_box(
#   title = uiOutput("max2022_title"),
#   value = textOutput("Max2022"),
#   showcase = bsicons::bs_icon("bus-front-fill"),
#   theme = "blue"
# )

We procede to create the sidebar layout with interactive navigation panel for user to understand the application easily.

#ui <- page_navbar(
#   title = "Objective 2",
#   sidebar = sidebar(
#     SelectEstacio_qa,
#     SelectPollutant,
#     selectInput("year_selection", "Select Year", choices = c("2022", "2023"))
#   ),
#   layout_columns(width = 1/2, fill = FALSE),
#   nav_spacer(),
#   nav_panel("Maximum Lockdown Period vs No-Covid", 
#             fluidRow(
#               column(width = 6, vb_max2020),
#               column(width = 6, vb_max2022),
#               column(width = 12, card_total)
#             )
#   ),
#  nav_panel("Complete Year", card_lockdown)
#   )

For the server logic, we set the input of the dropdown menu for selecting pollutants based on the chosen station. It generates two plots comparing pollutant levels in 2020 and 2022, and another plot showing total pollutant levels in specific time frames for the selected variables. It also calculates and displays the average pollutant levels for the space and time period.

#server <- function(input, output) {
#   
#   observe({
#     stazione <- input$Estacio_qa
#     pollutants <- unique(qualitat_aire$POLLUTANTS[qualitat_aire$ESTACIO == stazione])
#     updateSelectInput(inputId = "pollutants", choices = pollutants)
#   })
#   
#   output$max2022_title <- renderUI({
#     HTML(paste("Mean in", input$year_selection))
#   })
#   
#   output$total <- renderPlot({
#     qualitat_aire %>%
#       filter((as.Date(DATE) >= as.Date("2020-03-14") & as.Date(DATE) <= as.Date("2020-06-14"))|(as.Date(DATE) >= as.Date(paste0(input$year_selection, "-03-14")) & 
#                 as.Date(DATE) <= as.Date(paste0(input$year_selection, "-06-14")))) %>%
#       filter(ESTACIO == input$Estacio_qa) %>%
#       filter(POLLUTANTS == input$pollutants) %>%
#       mutate(Month_Day = format(DATE, "%m-%d"),
#              Year = as.factor(year(DATE))) %>%
#       group_by(Year) %>%
#       mutate(lower_bound = quantile(Daily_mean, 0.25) - 1.5 * IQR(Daily_mean),
#              upper_bound = quantile(Daily_mean, 0.75) + 1.5 * IQR(Daily_mean)) %>%
#       filter(Daily_mean >= lower_bound & Daily_mean <= upper_bound) %>%
#       ggplot(aes(x = Month_Day, y = Daily_mean, color = Year, group = Year)) +
#       geom_line() +
#       theme_minimal() +
#       theme(axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 1)) +
#       labs(title = paste0(input$year_selection, " vs 2020 of ", input$pollutants, 
#                           " in station ", input$Estacio_qa, " from March 14th to June 14th"),
#            y = NULL, x = NULL)
#   })
#   
#   output$comparison <- renderPlot({
#     qualitat_aire %>%
#       mutate(DATE = as.Date(DATE, format = "%y-%m-%d")) %>%
#       filter((as.Date(DATE) >= as.Date("2020-01-01") & as.Date(DATE) <= as.Date("2020-12-31")) |(as.Date(DATE) >= as.Date(paste0(input$year_selection, "-01-01")) & 
#                 as.Date(DATE) <= as.Date(paste0(input$year_selection, "-12-31")))) %>%
#       filter(ESTACIO == input$Estacio_qa) %>%
#       filter(POLLUTANTS == input$pollutants) %>%
#       mutate(Month_Day = format(DATE, "%m-%d"),
#              Year = as.factor(year(DATE))) %>%
#       group_by(Year) %>%
#       mutate(lower_bound = quantile(Daily_mean, 0.25) - 1.5 * IQR(Daily_mean),
#              upper_bound = quantile(Daily_mean, 0.75) + 1.5 * IQR(Daily_mean)) %>%
#       filter(Daily_mean >= lower_bound & Daily_mean <= upper_bound) %>%
#       ggplot(aes(x = Month_Day, y = Daily_mean, color = Year, group = Year)) +
#       geom_line() +
#       theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
#       labs(title = paste0(input$year_selection, " vs 2020 of ", input$pollutants, 
#                           " in station ", input$Estacio_qa),
#            y = NULL, x = NULL) +
#       theme_minimal()
#   })
#   
#   output$Max2020 <- renderText({
#     m <- qualitat_aire %>%
#       filter(as.Date(DATE) >= as.Date(paste0("2020-03-01")) & 
#                as.Date(DATE) <= as.Date(paste0("2020-05-31"))) %>%
#       filter(ESTACIO == input$Estacio_qa) %>%
#       filter(POLLUTANTS == input$pollutants) %>%
#       mutate(Daily_mean = as.numeric(Daily_mean)) %>%
#       group_by(ESTACIO, POLLUTANTS) %>%
#       mutate(lower_bound = quantile(Daily_mean, 0.25) - 1.5 * IQR(Daily_mean),
#              upper_bound = quantile(Daily_mean, 0.75) + 1.5 * IQR(Daily_mean)) %>%
#       filter(Daily_mean >= lower_bound & Daily_mean <= upper_bound) %>%
#       pull(Daily_mean) %>%
#       mean()
#     
#     scales::label_number(accuracy = 1, big.mark = ",")(m)
#   })
#   
#   output$Max2022 <- renderText({
#     m <- qualitat_aire %>%
#       filter(as.Date(DATE) >= as.Date(paste0(input$year_selection, "-03-01")) & 
#                as.Date(DATE) <= as.Date(paste0(input$year_selection, "-05-31"))) %>%
#       filter(ESTACIO == input$Estacio_qa) %>%
#       filter(POLLUTANTS == input$pollutants) %>%
#       mutate(Daily_mean = as.numeric(Daily_mean)) %>%
#       group_by(ESTACIO, POLLUTANTS) %>%
#       mutate(lower_bound = quantile(Daily_mean, 0.25) - 1.5 * IQR(Daily_mean),
#              upper_bound = quantile(Daily_mean, 0.75) + 1.5 * IQR(Daily_mean)) %>%
#       filter(Daily_mean >= lower_bound & Daily_mean <= upper_bound) %>%
#       pull(Daily_mean) %>%
#       mean()
#     
#     scales::label_number(accuracy = 1, big.mark = ",")(m)
#   })
# }

Finally, to visualize this result, click the following link to get redirected into a Shinny app that allows you to interact with the pollutant and the station, comparing the lock down period against the years 2022 and 2023.

https://juanherrera8.shinyapps.io/OBJECTIVE2/

Conclusion

We managed to create a space with interactive information that can be useful for anyone looking for air data quality insights during the covid pandemic in Barcelona. We could see that during the years 2022 and 2023, the pollutants that experienced the most significant reduction during the pandemic period were: NO2, NOx, NO, and PM10. As you can contemplate in the following plot, the contaminants reduction in the Eixample área was one of the most significants.

“Captura1”

Objective 3: Effect of Weather in Pollutants

From the Open Data portal, we have access to hourly air quality data and daily weather information. Although weather and pollutant measurement stations are situated in different locations and record data at different intervals, there is still an opportunity to investigate the impact of weather conditions (such as rain, wind, pressure, humidity, etc.) on pollutant levels.

Objective 3a

To assess the impact of weather on pollutants in Barcelona, we need to obtain meteorological information for the same period covered by the air quality dataframe. Therefore, we integrate and prepare meteo data from multiple CSV files available at: https://opendata-ajuntament.barcelona.cat/data/en/dataset/mesures-estacions-meteorologiques

# directory <- "METEO"
# files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
# file_names <- tools::file_path_sans_ext(basename(files))
# m <- lapply(setNames(files, file_names), read.csv)
# rm(directory)
# rm(files)
# rm(file_names)
# 
# meteo <- bind_rows(m, .id = "source") |>
#   rename("DATE"="DATA_LECTURA")|>
#   arrange(DATE)
# 
# meteo$CODI_ESTACIO<-as.factor(meteo$CODI_ESTACIO)
# meteo$ACRÒNIM<-as.factor(meteo$ACRÒNIM)
# 
# str(as_tibble(meteo))
# 
# levels(meteo$CODI_ESTACIO)
# levels(meteo$ACRÒNIM)
# 
# meteo<-meteo|>
#   select(-c(1,3))
# 
# 
# meteo_metadata<-read.csv("MeteoCat_Metadades.csv")
# 
# meteo <- meteo|>
#   left_join(meteo_metadata, by = "ACRÒNIM")|>
#   relocate(c(NOM_VARIABLE, UNITAT), .after = ACRÒNIM)|>
#   mutate(VARIABLE = paste(ACRÒNIM,"-",NOM_VARIABLE, "(", UNITAT, ")"))|>
#   relocate(VARIABLE,.after = UNITAT)|>
#   select(-c(NOM_VARIABLE,UNITAT,CODI_VARIABLE))
# 
# 
# directory<- "ESTACIONS_METEO"
# files<- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
# file_names <- tools::file_path_sans_ext(basename(files))
# list_estacions_m<- lapply(setNames(files, file_names), read.csv)
# rm(directory)
# rm(files)
# rm(file_names)
# 
# for (i in seq_along(list_estacions_m)) {
#   estacion <- list_estacions_m[[i]]  
#   estacion <- estacion[, 1:2]
#   names(estacion)[1] <- "ESTACIO"
#   list_estacions_m[[i]] <- estacion
# }
# rm(estacion)
# rm(i)
# 
# estacions_m <- bind_rows(list_estacions_m, .id = "source")|>
#   select(-1)|>
#   distinct()
# 
# meteo <- meteo|>
#   rename("ESTACIO"="CODI_ESTACIO")|>
#   left_join(estacions_m, by = "ESTACIO") |>
#   relocate(NOM_ESTACIO, .after = ESTACIO)|>
#   mutate(ESTACIO = paste(ESTACIO, "-", NOM_ESTACIO))|>
#   select(-c(NOM_ESTACIO))
# 
# meteo$ESTACIO<-as.factor(meteo$ESTACIO)
# meteo$ACRÒNIM<-as.factor(meteo$ACRÒNIM)
# meteo$VARIABLE<-as.factor(meteo$VARIABLE)
# 
# str(as_tibble(meteo))
# levels(meteo$VARIABLE)

After this we create a shiny app that have interactive select input widgets, allowing users to pick pollutants and meteorological variables for analysis. Following this, a date input widget is configured to enable user to select a specific time range.

# SelectVariable1 <- selectInput("pollutants",
#                                label = "Select Pollutant:",
#                                choices = NULL)
# 
# SelectVariable2 <- selectInput("meteo",
#                                label = "Select Meteorologicall Variable:",
#                                choices = NULL)
# #STATIONS
# estacio <- unique(qualitat_aire$ESTACIO)
# sorted_estacio <- sort(estacio)
# SelectEstacio_qa <- selectInput("Estacio_qa", 
#                                 label = "Select Station: ", 
#                                 choices = sorted_estacio)
# rm(estacio)
# rm(sorted_estacio)
# 
# estacio <- unique(meteo$ESTACIO)
# sorted_estacio <- sort(estacio)
# SelectEstacio_m <- selectInput("Estacio_m", 
#                                label = "Select Station:", 
#                                choices = sorted_estacio)
# rm(estacio)
# rm(sorted_estacio)
# 
# #DATE
# SelectDate <- dateRangeInput("date",
#                              "Select the date:",
#                              start = as.Date(min(qualitat_aire$DATE)),
#                              end  = "2023-12-31")
# #CARD
# # card_quality<-card(cardHeader = "Air Quality hour by hour",
# #                    plotOutput("quality"))
# card_mean<-card(card_header="Mean",
#                 plotOutput("mean"))
# card_meteo<-card(card_header="Meteo",
#                        plotOutput("meteo"))
# 
# #VALUE BOXES
# vb_air <- value_box(
#   title = "Mean pollutant",
#   value = textOutput("Mean polltant"),
#   showcase = bsicons::bs_icon("virus2"),
#   theme = "red")
# 
# vb_meteo <- value_box(
#   title = "Mean meteo",
#   value = textOutput("Mean meteo"),
#   showcase = bsicons::bs_icon("people-fill"),
#   theme = "blue")

At the end, the only remaining step is to configure the user interface and the server. With this part of the code, we set up an interface and server functions to analyze the effect of weather. The UI includes the previous inputs for selecting air quality, meteorological variables and the date. The server generates plots showing the daily mean values of the selected pollutant and the trend of the selected meteorological variable over the specified period. The data is filtered and aggregated to create these visualizations, helping to explore the relation between weather and air quality:

#ui <- page_sidebar(
#   title = "Objective 3a: Effect of Weather in Pollutants",
#   sidebar = sidebar(SelectEstacio_qa,SelectVariable1,SelectEstacio_m,SelectVariable2,SelectDate),
#   card_mean,card_meteo)# card_quality
# 
#server <- function(input, output) {
#   
#   observe({
#     stazione_qa <- input$Estacio_qa
#     pollutant <- unique(qualitat_aire$POLLUTANTS[qualitat_aire$ESTACIO == stazione_qa])
#     updateSelectInput(inputId = "pollutants", choices = pollutant)
#   })
#   
#   observe({
#     stazione_m <- input$Estacio_m
#     meteov <- unique(meteo$VARIABLE[meteo$ESTACIO == stazione_m])
#     updateSelectInput(inputId = "meteo", choices = meteov)
#   })
#   
#   output$mean <- renderPlot({
#     qualitat_aire |>
#       select(DATE,ESTACIO,POLLUTANTS,starts_with("H"))|>
#       pivot_longer(cols = starts_with("H"),
#                    names_to = "Hour",
#                    values_to = "Value")|>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_qa) |>
#       filter(POLLUTANTS == input$pollutants)|>
#       mutate(Value = as.numeric(Value)) |>
#       filter(!is.na(Value)) |>
#       aggregate(Value ~ DATE + POLLUTANTS + ESTACIO, FUN = mean,na.rm = TRUE)|>
#       ggplot(aes(x = DATE, y = Value)) +
#       geom_line() +
#       labs(x = "Data", y = "Media giornaliera del valore") +
#       theme_minimal() +
#       labs(title = paste0("Evolution of mean of ", input$pollutants, " in ", input$Estacio_qa), y= NULL, x = NULL)
#   })
#   
#   output$meteo<- renderPlot({
#     meteo|>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_m)|>
#       filter(VARIABLE == input$meteo)|>
#       ggplot(aes(x = as.POSIXct(DATE), y = VALOR)) +
#       geom_line()+
#       theme_minimal() +
#       labs(title = paste0("Evolution of ", input$meteo, " in ", input$Estacio_m), y= NULL, x = NULL)
#   })
# }

In the same way as the previous objective, here is a link to visualize the results of the code. By clicking the following button you will get redirected into a Shinny app that allows you to graphically compare behaviours and trends of the meteorological values with the air quality in the same space-time frame.

https://juanherrera8.shinyapps.io/OBJECTIVE3A/

Conclusion

We have obtained a Shiny Web App that shows us the trend of a specific air pollutant measured by a specific station and compares it with the trend of a specific weather variable measured in a weather stations scattered at Barcelona. All this by selecting a specific period of interest.

Objective 3b

In the last part of this project we will examine what values of weather variables are related with the athmospheric pollution episodes detected in the Objective 1. For this Shiny we start by creating the sidebar of the Shiny app with two drop-down menus and a date range input. First, we filter the air quality data to include only PM10 measurements. Then, we extract and sort the unique station names and create a drop menu for selecting an air quality station. Next, we extract and sort unique meteorological station names from the meteorological data, creating a second drop-down menu for selecting a meteorological station,finally, we set up a date range input using the earliest date from the air quality data and an end date of December/2023.

# #APP
# #SIDEBAR
# #STATION
# 
# qa_PM10<-qualitat_aire|>
#   filter(POLLUTANTS == "PM10 ( µg/m³ )")
# estacio <- unique(qa_PM10$ESTACIO)
# sorted_estacio <- sort(estacio)
# SelectEstacio_qa <- selectInput("Estacio_qa", 
#                                 label = "Select Air Quality Station: ", 
#                                 choices = sorted_estacio)
# rm(estacio)
# rm(sorted_estacio)
# 
# estacio <- unique(meteo$ESTACIO)
# sorted_estacio <- sort(estacio)
# SelectEstacio_m <- selectInput("Estacio_m", 
#                                label = "Select Meteorological Station: ", 
#                                choices = sorted_estacio)
# rm(estacio)
# rm(sorted_estacio)
# 
# #DATE
# SelectDate <- dateRangeInput("date",
#                              "Select the date:",
#                              start = as.Date(min(qualitat_aire$DATE)),
#                              end  = "2023-12-31")

After this we procede to create several cards to display different plots in the Shiny app. We create a card for displaying mean values, another for different mean values, and a third for additional mean values. We also create cards for wind intensity, wind direction, humidity, and rainfall, each with a header and a plot output.

# #CARDS
# card_mean<-card(card_header="Mean",
#                 plotOutput("mean"))
# card_mean2<-card(card_header="Mean",
#                 plotOutput("mean2"))
# card_mean3<-card(card_header="Mean",
#                  plotOutput("mean3"))
# card_vento1<-card(card_header="Intensità vento",
#                        plotOutput("vento1"))
# card_vento2<-card(card_header="Direzione vento",
#                  plotOutput("vento2"))
# card_umidità<-card(card_header="Umidità",
#                   plotOutput("umidità"))
# card_pioggia<-card(card_header="Pioggia",
#                    plotOutput("pioggia"))

In this part of the code, we create the user interface (UI) and server logic for the Shiny app. For the UI, we set up a navbar page that includes a sidebar with the previously created menus for selecting air quality and meteorological stations and a date range. The navbar has three panels: “Wind,” “Rain,” and “Humidity,” each containing relevant cards for displaying plots. For the server logic, we update the menus based on selected stations and render plots for the selected dates and stations. The plots display daily mean values of PM10, wind speed, wind direction, humidity, and accumulated precipitation.

# ui <- page_navbar(
#   title = "Objective 3b",
#   sidebar = sidebar(SelectEstacio_qa,SelectEstacio_m,SelectDate),
#   nav_spacer(),
#   nav_panel("Wind",card_mean,card_vento1,card_vento2),
#   nav_panel("Rain",card_mean2,card_pioggia),
#   nav_panel("Humidity",card_mean3,card_umidità))
#   
# SERVER
# server <- function(input, output) {
#   
#   observe({
#     stazione_qa <- input$Estacio_qa
#     pollutant <- unique(qualitat_aire$POLLUTANTS[qualitat_aire$ESTACIO == stazione_qa])
#     updateSelectInput(inputId = "pollutants", choices = pollutant)
#   })
#   
#   observe({
#     stazione_m <- input$Estacio_m
#     meteov <- unique(meteo$VARIABLE[meteo$ESTACIO == stazione_m])
#     updateSelectInput(inputId = "meteo", choices = meteov)
#   })
# 
#   output$mean <- renderPlot({
#     qa_PM10 |>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_qa) |>
#       ggplot(aes(x = DATE, y = Daily_mean)) +
#       geom_line() +
#       labs(x = "Data", y = "Daily Mean Value") +
#       theme_minimal() +
#       labs(title = paste0("Mean of PM10 - ( g/m³) in ", input$Estacio_qa), y= NULL, x = NULL)
#   })
#   
#   output$mean2 <- renderPlot({
#     qa_PM10 |>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_qa) |>
#       ggplot(aes(x = DATE, y = Daily_mean)) +
#       geom_line() +
#       labs(x = "Data", y = "Daily Mean Value") +
#       theme_minimal() +
#       labs(title = paste0("Mean of PM10 - ( g/m³) in ", input$Estacio_qa), y= NULL, x = NULL)
#   })
#   
#   output$mean3 <- renderPlot({
#     qa_PM10 |>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_qa) |>
#       ggplot(aes(x = DATE, y = Daily_mean)) +
#       geom_line() +
#       labs(x = "Data", y = "Daily Mean Value") +
#       theme_minimal() +
#       labs(title = paste0("Mean of PM10 - ( g/m³) in ", input$Estacio_qa), y= NULL, x = NULL)
#   })
#   
#   output$vento1<- renderPlot({
#     meteo|>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_m)|>
#       filter(ACRÒNIM == "VVM10")|>
#       ggplot(aes(x = as.POSIXct(DATE), y = VALOR)) +
#       geom_line()+
#       theme_minimal() +
#       labs(title = paste0("VVM10 - Velocitat mitjana diària del vent 10 m (m/s) in ", input$Estacio_m), y= NULL, x = NULL)
#   })
#   
#   output$vento2<- renderPlot({
#     meteo|>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_m)|>
#       filter(ACRÒNIM == "DVM10")|>
#       ggplot(aes(x = as.POSIXct(DATE), y = VALOR)) +
#       geom_line()+
#       theme_minimal() +
#       labs(title = paste0("DVM10 - Direcció mitjana diària del vent 10 m (°) in ", input$Estacio_m), y= NULL, x = NULL)
#   })
#   
#   output$umidità<- renderPlot({
#     meteo|>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_m)|>
#       filter(ACRÒNIM == "HRM")|>
#       ggplot(aes(x = as.POSIXct(DATE), y = VALOR)) +
#       geom_line()+
#       theme_minimal() +
#       labs(title = paste0("HRM - Humitat relativa mitjana diària (%) in ", input$Estacio_m), y= NULL, x = NULL)
#   })
# 
#   output$pioggia<- renderPlot({
#     meteo|>
#       filter(DATE >= as.Date(input$date[1]) & DATE <= as.Date(input$date[2])) |>
#       filter(ESTACIO == input$Estacio_m)|>
#       filter(ACRÒNIM == "PPT")|>
#       ggplot(aes(x = as.POSIXct(DATE), y = VALOR)) +
#       geom_line()+
#       theme_minimal() +
#       labs(title = paste0("PPT - Precipitació acumulada diària (mm) in ", input$Estacio_m), y= NULL, x = NULL)
#   })
# }

You can take a look of the resultant shiny app by clicking the following link:

https://juanherrera8.shinyapps.io/OBJECTIVE3B/

Conclusion

Based on objective 1, the analysis of the second scenario reveals that episodes with PM10 levels equal or exceeding 50 µg/m³ occurred predominantly during October of 2022. Specifically, 9 out of 11 total episodes ocurred within this month. To have better insights into this timeframe, we set a period to include 15 days prior and subsequent to contextualize the data. Subsequently, we analyse this period with meteorological data as wind speed, direction, and precipitation. Its notably observed a relation between a meteo variable as precipitation with the pollutant level. For instance, examination of the data illustrates that Barcelona experienced no rainfall during these PM10 episodes. This absence of precipitation facilitated the retention and concentration of pollutants in the atmosphere, as you can see in the displayed plot.

From the Shiny app, it can also be seen that a week before the PM10 peak, the wind blew with a strong intensity at a direction of 60° and in the following days, the intensity decreased. It is likely that the strong wind at 60°, i.e. from the sea towards the mainland, caused the pollutant to accumulate above the city due to the barrier of the mountains and that the weak wind on the following days did not clear the air. This combined with the low rainfall and high humidity contributed to the PM10 peak.

Final Remarks and Recomendations

In general, Barcelona’s air quality is good, we could observe that there have been instances where pollutant levels exceeded the limits and led to episodes, but this did not happen with every of them. Our data indicate that NO2 levels remained within acceptable limits for all the period analyzed, and it never exceed the critical level of 200 µg/m³, however, for PM10, there were several instances where levels surpassed the daily limit of 80 µg/m³, registering episodes of excessive contamination. Then, taking a look at the plots from the lock down months and compared it with more recent year, we saw an important raising on the contaminants levels associated to industry and traffic, these contaminants that were significantly reduced during 2020 were principally the nitrogen oxides. This reduction can be attributed to various factors related to the pandemic, including decreased industrial activities, reduced vehicular traffic, and implementation of strict measures aimed at limiting people interaction. NO2, NOx, and NO are primarily associated with vehicle emissions and industrial processes, both of which experienced significant declines during confinement. PM10, while also influenced by industrial activities and transportation, may have been comparatively less impacted due to its diverse sources, including natural sources like dust and pollen. Therefore, the observed reduction in pollutants can be attributed to the collective effects of reduced human activities.

To improve improve air quality policy in Barcelona, we can suggest several measures. First, promoting the use of public transportation, while car usage is relatively low, there is still potential for increased public transport adoption. Second, continue and enhance the restrictions on older polluting vehicles (although such measure that already exists, but needs to be carried forward). Also industrial activities significantly impact on pollution levels, encourage companies to adopt green practices and reduce their emission, investing, for example, in cleaner technologies and create sustainability programs in that they explore all opportunity to be more green and reduce their ecological footprint, scientific research is making progress, new technologies are gradually being discovered that are less polluting. Atmospheric Pollution is a global problem, actually there is a focus on this issue, people are recognizing the problem and also governments and companies, openly adopting all changes that are less pollutants can be helpful to be more green.

Rearding to open data, our suggestions for improving this is to continue updating it, keeping it easily accessible and open to everyone and add to stations the control of a larger number of pollutants, we have noticed that some stations don’t measure all pollutants, that can make some research less accurate, particularly if in these studies we want to understand if certain type of pollutants accumulate in a specific area. Another improvement is to invest in updating older machinery that generates errors and experiences more shutdowns during pollution detection, and maybe that can be more accurate, we have noticed in fact that at certain times, some stations were not available the datas, and in those cases we uses neutral average values to fill gaps, making datas slightly less accurate.

We feel that the initiative promoted by the province of Barcelona is definitely positive and should be adopted by all public institutions not only for weather and air quality but also for other data of common interest. In this way, every citizen can be informed about what is happening, the number of university researches increases and it is easier to study cause-effect relationships. However, the tables on air quality and weather could be improved to make it easier to read and analyse the data. As far as air quality is concerned, one could remove the columns showing only N or V as values, which do not add any useful information, and add a column with the daily average value and a column indicating whether or not a heavy pollution episode occurred on the day. Another weakness of this dataset, probably due to the poor maintenance of the measuring stations, is the large number of NA values. In addition, there is one station that has measured pollutants for a few years that are marked with an asterisk (e.g. NO2*), but no explanation is given for this. On the other hand, the weather table seems more complete, although measurements could be added oa per hour as in the case of the pollutant level databes. Finally, one idea to better study the relationship between weather and air pollution is to place the measuring stations in the same place.

BUSINESS ANALYITICS PROJECT

Michael Cucchi, Juan Andres Herrera Orozco, Gioele Penatti, Antonio Pillittieri

22-05-2024

Introduction

Methodology: Data preparation

Objective 1: Athmospheric Pollution Episodes

Procedure

NO2

PM10

First scenario

Second scenario

Conclusion

Objective 2: Effect of Human Activity in Pollutants

Procedure

Conclusion

Objective 3: Effect of Weather in Pollutants

Objective 3a

Conclusion

Objective 3b

Conclusion

Final Remarks and Recomendations