Barcelona’s Air Pollution

Author

Mael Baudran, Javier Gonzalo, Mauricio Valdez

Introduction

It is no secret that one of the main concerns of the citizens of Barcelona is the immense amount of pollution produced every day in the city. For instance, according to recent studies carried out by the “Ajuntament de Barcelona”, Barcelona has particulate matter (PM10) and Nitrogen Dioxide (NO2) in the air, possibly having a very prejudicial effect on humans and not achieving to respect the OMS’ regulations (Calidad del Aire | Ajuntament de Barcelona, s. f.).

The aim of this project is to analyze the current situation and pollution of the city and how it’s related to the “zonas verdes” in order to make conclusions and find a way to make Barcelona a better place. 


Import data

The libs that are going to be used.

library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(sf)
Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(osmdata)
Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library(leaflet)
library(ggmap)
ℹ Google's Terms of Service: <https://mapsplatform.google.com>
  Stadia Maps' Terms of Service: <https://stadiamaps.com/terms-of-service>
  OpenStreetMap's Tile Usage Policy: <https://operations.osmfoundation.org/policies/tiles>
ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.

Import of the Air data and the csv that correlates the contaminants with the “Codi_Contaminant”

A_E <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_01_Gener_qualitat_aire_BCN.csv")

A_F <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_02_Febrer_qualitat_aire_BCN.csv")

A_M <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_03_Marc_qualitat_aire_BCN.csv")

A_A <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_04_Abril_qualitat_aire_BCN.csv")

A_J <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_06_Juny_qualitat_aire_BCN.csv")

A_J2 <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_07_Juliol_qualitat_aire_BCN.csv")

A_A2 <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_08_Agost_qualitat_aire_BCN.csv")

A_S <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2024_09_Setembre_qualitat_aire_BCN.csv")

A_O <- read.csv("~/Documents/BIDA/R_PROJECT/Datos//2024_10_Octubre_qualitat_aire_BCN.csv")

Cont <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/qualitat_aire_contaminants.csv")

vec <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2023_qualitat_aire_estacions.csv")
vec <- select(vec, Estacio,Longitud,Latitud, Codi_barri)

###Join the DF in one

A_ToT <- rbind(A_E, A_F, A_M,A_A, A_J, A_J2, A_A2, A_S, A_O)
A_ToT <-transmute(A_ToT,ESTACIO,CODI_CONTAMINANT,MES,DIA,H01, H02, H03, H04, H05, H06, H07, H08, H09, H10, H11, H12, H13, H14, H15, 
                   H16, H17, H18, H19, H20, H21, H22, H23, H24)

Then we import the data from the trees

T_T1 <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/Trees in green areas/2024_1T_OD_Arbrat_Parcs_BCN.csv")

T_T2 <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/Trees in green areas/2024_2T_OD_Arbrat_Parcs_BCN.csv")

T_T3 <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/Trees in green areas/2024_3T_OD_Arbrat_Parcs_BCN.csv")

T_S <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/Trees on the street/2024_1T_OD_Arbrat_Viari_BCN.csv")

#Select from data the colums that may be usefull

DataT_T1 <- select(T_T1, codi,latitud,longitud, espai_verd, geom, codi_barri, nom_barri, codi_districte, nom_districte)
DataT_T2 <- select(T_T2, codi, espai_verd,latitud,longitud, geom, codi_barri, nom_barri, codi_districte, nom_districte)
DataT_T3 <- select(T_T3, codi,latitud,longitud, espai_verd, geom, codi_barri, nom_barri, codi_districte, nom_districte)

Import Barris(Info needed to map the city of barcelona) and the data for the population

Pop <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/Taula estadística.csv")
#Select the most recent info on the data frame 
Pop <- select(Pop, Territori,  "X01.Jul..2024")

We create a map of Barcelona by barris with the trees in green areas

ggplot() +
  geom_sf(data = barris, aes(fill = DISTRICTE)) +
  geom_point(data = DataT_T1, 
             aes(x = longitud, y = latitud),
             alpha = 0.8, 
             color = "black",
             shape = 21,
             stroke = 0.5, 
             fill = "darkgreen")

In order to give an adequate response to the second question, we decided to compile all information possible about green levels such as trees in green areas, trees in urban areas and population levels.

When it comes to the number of trees and after making a map using the coordinates of Barcelona to locates not only trees but the population of each area, our results showed that the number of trees in a certain location is usually related with the population of that same location and the more population there is, the more trees there usually are.

On the other hand, something that really surprised us is that we could also see that the number of trees is proportional to the amount of pollution generated in that region of the map. Therefore, to answer the question whether trees help to clean the air, the answer would be that, regardless of the scientific impact of the photosynthesis of trees, trees clearly don’t have as much of an impact on contamination levels as one would believe and don’t change the air quality of the city.

This plot is more demanding but we can see all the trees in the street and the sensors in Barcelona

Barcelona <- getbb("Barcelona, España")

Ciudad <- "Barcelona"
posicion <- 1
url <- URLencode(paste0("https://nominatim.openstreetmap.org/search.php?q=",
                        Ciudad, "&polygon_geojson=1&format=geojson"))
destfile = paste0(tempdir(), "/limits.json")
download.file(url, destfile)
Barcelona_frontera <- st_read(destfile)[posicion,]
Reading layer `limits' from data source 
  `/private/var/folders/qm/dyv7_1d56_s7xkdpp89gjbyw0000gn/T/RtmpsvnK2E/limits.json' 
  using driver `GeoJSON'
Simple feature collection with 2 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 1.360384 ymin: 41.19271 xmax: 2.778003 ymax: 42.3233
Geodetic CRS:  WGS 84
leaflet(Barcelona_frontera) %>% 
  addTiles() %>% 
  addPolygons()
Barcelona_calles <- opq(Barcelona) %>% 
  add_osm_feature(key="highway")


Barcelona_calles <- Barcelona_calles %>% 
  osmdata_sf()
Warning in fix_columns_list(res[kv_df]): Feature keys clash with id or metadata columns and will be renamed by appending `.n`:
    osm_id.1
Barcelona_calles
Object of class 'osmdata' with:
                 $bbox : 41.3170353,2.0524977,41.4679135,2.2283555
        $overpass_call : The call submitted to the overpass API
                 $meta : metadata including timestamp and version numbers
           $osm_points : 'sf' Simple Features Collection with 462013 points
            $osm_lines : 'sf' Simple Features Collection with 110985 linestrings
         $osm_polygons : 'sf' Simple Features Collection with 2665 polygons
       $osm_multilines : NULL
    $osm_multipolygons : 'sf' Simple Features Collection with 234 multipolygons
Barcelona_calles <- Barcelona_calles$osm_lines


Barcelona_calles <- st_intersection(Barcelona_calles, Barcelona_frontera)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
Barcelona_calles_Sensores <- vec %>%
  mutate(Estacio = as.numeric(Estacio), )


Barcelona_calles <- st_as_sf(Barcelona_calles)

ggplot()+
  geom_sf(data = Barcelona_calles, color = "gray20")+
  geom_point(data = Barcelona_calles_Sensores,
             aes(x = Longitud, y = Latitud),
             alpha = 0.8, 
             color = "black",
             shape = 21,
             stroke = 0.5,
             fill = "red",
             size = 3
            ) +
  geom_point(data = T_S,
             aes(x = longitud, y = latitud),
             alpha = 0.5, 
             color = "black",
             shape = 21,
             stroke = 0.5,
             fill = "skyblue1",
             position = "jitter"
             )

#ggplot() +
  #geom_sf(data = barris, aes(fill = DISTRICTE)) +
  #geom_point(data = T_S, 
            # aes(x = longitud, y = latitud),
             #alpha = 0.8, 
             #color = "black",
             #shape = 21,
             #stroke = 0.5, 
             #fill = "darkgreen")
#The plot below is less demanding

In order to give an adequate response to the second question, we decided to compile all information possible about green levels such as trees in green areas, trees in urban areas and population levels.

When it comes to the number of trees and after making a map using the coordinates of Barcelona to locates not only trees but the population of each area, our results showed that the number of trees in a certain location is usually related with the population of that same location and the more population there is, the more trees there usually are.

On the other hand, something that really surprised us is that we could also see that the number of trees is proportional to the amount of pollution generated in that region of the map. As of right now, we don’t have a clear answer as to why this phenomenon happens since it’s quite counter-intuitive but, as a Barcelona citizen, this could be caused because of the coincidence that the places where there are the most trees are in fact the ones where more population there is and the most pollution-related industries are based.

barris_P <- inner_join(barris,Pop, join_by("NOM"=="Territori"))
barris_P <- rename(barris_P, Population = X01.Jul..2024)
ggplot(barris_P)+
  geom_sf(aes(fill = Population))+
  scale_fill_viridis_b()

To analyze the total amount of pollution in the neighborhoods of the city, we managed to create a map in order to clearly visualize the degree of pollution in each neighborhood depending on the color of its outline. As a result, we managed to see that the most polluted regions were Poblenou and the “Casco Antiguo” and, as the neighborhoods were further away from said regions, the less polluted they were. For instance, as seen in the graph, we could say that the less polluted neighborhoods were Sant Pere, Nova Esquerra de L’Eixample (32 121 and 262 485 habitants respectively) and the more polluted as clearly Poblenou (34.411 habitants).

Evolution of every contaminat over the course of the year.

A_ToT2 <- rowSums(A_ToT [, c(5:28)],)
MEAN_T <- A_ToT2/24
A_ToT <- cbind(A_ToT,MEAN_T)

Cont <- select(Cont, Codi_Contaminant, Desc_Contaminant)
Cont <- Cont[!duplicated(Cont), ]
Cont <- rename(Cont, CODI_CONTAMINANT = Codi_Contaminant)
A_ToT <- left_join(A_ToT, Cont,)
Joining with `by = join_by(CODI_CONTAMINANT)`
A_ToT <- A_ToT[complete.cases(A_ToT),]


ggplot(A_ToT)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T,)+
  ylim(0,3000)
Warning: Removed 856 rows containing non-finite outside the scale range
(`stat_boxplot()`).

To answer the question related to the evolution of pollution and contaminants during the year, we established that the best way to gain a better view of the current situations was producing a boxplot where both the month and the amount of pollution produced were easily distinguishable. In addition, in order to make it more visual, we changed the colour of the lines depending on which contaminant it was but since lots of data was stored in a single boxplot and the range of values were extremely diverse, we finally opted for making one boxplot for each contaminant representing the year 2024 from January to October.

Same but divided in order to achieve a more visible representation

NO<- filter(A_ToT,Desc_Contaminant== "NO" )
NOx<- filter(A_ToT,Desc_Contaminant== "NOx" )
PM10<- filter(A_ToT,Desc_Contaminant== "PM10" | Desc_Contaminant== "PM2.5" )
O3 <- filter(A_ToT,Desc_Contaminant== "O3" )
SO2 <- filter(A_ToT,Desc_Contaminant== "SO2" )
NO2 <- filter(A_ToT,Desc_Contaminant== "NO2" )
CO<- filter(A_ToT,Desc_Contaminant== "CO" )
BC<-filter(A_ToT,Desc_Contaminant== "Black Carbon" )
BBC <- filter(A_ToT,Desc_Contaminant== "Biomassa Black Carbon" )
ggplot(NO)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(NOx)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(PM10)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(O3)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(SO2)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(NO2)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(CO)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(BC)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

ggplot(BBC)+
  geom_boxplot(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant), show.legend = T)+
  geom_smooth(aes(x=as.factor(MES), y=MEAN_T, color=Desc_Contaminant),)+
  xlab("Month")+ ylab("Mean")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Even though we did find the graphs quite overwhelming at first, after detailed research we ended up realising many of the conclusions just comparing the graphs with one another. Before starting, something that caught our eye at first glance was the way that the boxplots for SO2 and O3 were composed and changed with time since they didn’t follow the patterns that the other contaminants did seem to follow. 

In terms of data analysis, our clear starting point was looking at if we could see any tendencies repeated in the majority of boxplots and, to our surprise, January was the month with more contamination (excluding the contaminants mentioned before) for most contaminants and after some thorough reflection, we have realised that temperature and the weather could be a factor that contributes to this effect since, with people staying inside more often, the use of heating, cars instead of walking and other fuel consumption activities increase contamination as a whole. Notwithstanding, the reverse effect happens during the summer months such as June or July where contamination levels in the city are lower than usual and, taking into account the previous reasoning and conclusions, these results in the summer could be expected due to the lack of polluting activities compared to other months.

In spite of our multiple attempts to understand and manipulate the csv to get conclusions, we haven’t found a way to compare the pollution levels by day and by hour taking into account the large amount of data and information available. It’s obvious we can intuitively say that pollution levels increase during peak/rush hours and during school/work months, therefore reducing in the summer and in holidays. But, once again, we haven’t been able to express these conclusions in code.

Finally we create a thermal map for de contamination

vec <- read.csv("~/Documents/BIDA/R_PROJECT/Datos/2023_qualitat_aire_estacions.csv")
vec <- select(vec, Estacio,Longitud,Latitud, Codi_barri)

vec <- rename(vec, ESTACIO=Estacio)
vec <- vec[!duplicated(vec), ]
vec <- mutate(vec, Codi_barri = as.integer(vec$Codi_barri))
barris <- mutate(barris, BARRI = as.integer(barris$BARRI))
barris <- inner_join(barris, vec, join_by("BARRI"=="Codi_barri"))  
barris <- inner_join(barris, A_ToT)
Joining with `by = join_by(ESTACIO)`
ggplot(barris)+
  geom_sf(aes(fill = BARRI))+
  geom_point(data = vec, 
             aes(x = Longitud, y = Latitud),
             alpha = 0.8, 
             color = "black",
            shape = 21,
             stroke = 0.5, 
             size = 3, 
             fill = "red")+
  geom_sf_label(mapping = aes(label=NOM), size = 2, color = "purple")+
  scale_fill_viridis_b()
Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
give correct results for longitude/latitude data

##The last plot is really demanding, comment for saving time, without the label the plot runs better.

As a group, we also thought that, to further explain our conclusions and to possibly explain our difference in results with our classmates, we should put the values and graphs we have made into context. When thinking about topics such as pollution or contaminants, it’s essential to treat them as relative results that can be interpreted in different ways. 

In our case, after considering several options, we considered that the best way to give the same importance to all contaminants was respecting the scale given by the government, meaning that we didn’t multiply the amount of contaminants by its weight. On the other hand, we do understand that some groups may have decided to be more “objective” by multiplying the scale by weight. In our eyes, we think that this approach is equally valid but wouldn’t give an accurate representation of the real life results. This is due to the fact that the contaminants that weigh more would appear more pollutant and more dangerous even though this could not be the case. 

To understand better this claim, an easy example is that, if we decided to multiply every contaminant by its mass (using the scale of the government including the density of each substance), in the case of chemical plant, CO2 would be more dangerous and pollutant than Uranium only because of its weight and not because of its composition. 

After explaining this factor, we can relate this to our project since, despite the most contaminated neighborhood being Poblenou after our studies, we can understand that, in other cases, neighborhoods like L’Eixample or Sant Antoni could appear as the most polluted in other projects. 

If we had more information in our hands and more time to prepare this project in a more professional manner, we would multiply every contaminant by its weight and by “toxicity” to be able to establish which ones contaminate the most instead of seeing which ones are the heaviest.