Project goals:

Data gathering

Loading packages

library(readr)
library(dplyr)
library(tidyr)
library(openair) # http://davidcarslaw.github.io/openair/
library(purrr)
library(lubridate)
library(ggplot2)
library(stringr)
library(knitr)
library(xts)
library(zoo)
library(gridExtra)
library(astsa)
library(rvest)
library(fpp2)

First of all, I have to check whether the basic data needed for the analysis is available. I need air pollution and weather data for the Gijón area. The town hall of Gijón has an open data web portal at https://transparencia.gijon.es/, which offers air pollution data in csv format for the years 2000 to 2017.

I downloaded 18 csv files with air pollution and weather data for Gijón, covering the years 2000 to 2017, and saved them in the “data” folder. I also downloaded two more files from the same portal: a csv file describing the variables and another csv file with information about the measurement stations.

We take a look at the information included in the stations_info.csv file. It contains the stations' addresses, longitudes, latitudes, IDs and names. As we will see, all this information is also included in the csv files with pollution and weather data, so we will not use this file again.

stations <- read_delim('data/stations_info.csv',
                       delim = ';',
                       escape_double = FALSE, 
                       trim_ws = TRUE,
                       locale = locale(encoding = "ISO-8859-1"),
                       col_types = cols(.default = "c"))
stations
## # A tibble: 6 x 6
##   `"ID;""Título""`     `""Dirección""`     `""Población""` `""Provincia""`
##   <chr>                <chr>               <chr>           <chr>          
## 1 "\"1;\"\"Constituci~ "\"\"Avda. Constit~ "\"\"Gijón\"\"" "\"\"Asturias\~
## 2 "\"2;\"\"Argentina\~ "\"\"Avda. Argenti~ "\"\"Gijón\"\"" "\"\"Asturias\~
## 3 "\"3;\"\"H. Felguer~ "\"\"H. Felgueroso~ "\"\"Gijón\"\"" "\"\"Asturias\~
## 4 "\"4;\"\"Castilla\"~ "\"\"Plaza Castill~ "\"\"Gijón\"\"" "\"\"Asturias\~
## 5 "\"10;\"\"Montevil\~ "\"\"Montevil\"\""  "\"\"Gijón\"\"" "\"\"Asturias\~
## 6 "\"11;\"\"Santa Bár~ "\"\"Santa Bárbara~ "\"\"Gijón\"\"" "\"\"Asturias\~
## # ... with 2 more variables: `""latitud""` <chr>, `""longitud""",,` <chr>

The following image shows the location of each station. http://movil.asturias.es/medioambiente/articulos/ficheros/Informe%20de%20calidad%20del%20aire%20en%20Asturias%202016.pdf

Image source: Informe de calidad del aire del Principado de Asturias (2016).


The air_data_descriptors.csv file describes the parameters monitored by the stations: their names, descriptions and units.

variables <- read_csv('data/air_data_descriptors.csv', locale = locale(encoding = "ISO-8859-1"))
variables
## # A tibble: 17 x 4
##    Parametro `Descripción Parámetro`            TAG   Unidad
##    <chr>     <chr>                              <chr> <chr> 
##  1 BEN       Benceno                            BEN   µg/m³ 
##  2 CO        Concentracion de CO                CO    mg/m³ 
##  3 DD        Direccion del viento               DD    Grados
##  4 HR        Humedad relativa                   HR    %hr   
##  5 LL        Precipitacion                      LL    l/m²  
##  6 MXIL      MXileno                            MXIL  µg/m³ 
##  7 NO        Concentracion de NO                NO    µg/m³ 
##  8 NO2       Concentracion de NO2               NO2   µg/m³ 
##  9 O3        Concentracion de Ozono             O3    µg/m³ 
## 10 PM10      Particulas en suspension <10 µg/m³ PM10  µg/m³ 
## 11 PM25      Particulas en Suspension PM 2,5    PM25  µg/m³ 
## 12 PRB       Presion Atmosferica                PRB   mb    
## 13 RS        Radiacion Solar                    RS    W/m²  
## 14 SO2       Concentracion de SO2               SO2   µg/m³ 
## 15 TMP       Temperatura Seca                   TMP   ºC    
## 16 TOL       Tolueno                            TOL   µg/m³ 
## 17 VV        Velocidad del viento               VV    m/s

To import the data from the 18 csv files, we first list them in the object data_files.

# pattern is a regular expression, not a glob: match names starting with "air_data_20"
data_files <- list.files(path = "data", pattern = "^air_data_20")
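The pattern argument of list.files is interpreted as a regular expression, not as a shell wildcard. The following minimal sketch illustrates the difference; the file names and the temporary folder are hypothetical and exist only for this example.

```r
# Hypothetical file names, created in a temporary folder for illustration only
tmp <- file.path(tempdir(), "pattern_demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("air_data_2000.csv", "air_data_2017.csv",
                                       "air_data_descriptors.csv", "old_air_data_2.txt"))))

# As a regex, "air_data_20*" means "air_data_2" followed by zero or more "0",
# anywhere in the name, so it also matches "old_air_data_2.txt"
loose <- list.files(tmp, pattern = "air_data_20*")

# Anchored regex: names that start with "air_data_20" and end with ".csv"
strict <- list.files(tmp, pattern = "^air_data_20.*\\.csv$")

# glob2rx() (utils package) translates a shell-style wildcard into a regex
from_glob <- list.files(tmp, pattern = glob2rx("air_data_20*.csv"))
```

With the files of this project both forms probably select the same 18 files, but the anchored version does not depend on what else is stored in the folder.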

Then we map the function read_csv over this list to import every file, and finally combine them into a single data frame (air_data_0) with reduce(rbind).

air_data_0 <- data_files %>% 
    map(function(x) {
        read_csv(paste0("./data/", x), locale = locale(encoding = "ISO-8859-1"), col_types = cols(.default = "c"))
    }) %>%
    reduce(rbind)
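Note that rbind only works when every file has exactly the same columns. As a hedged alternative, assuming some yearly files could carry extra columns, dplyr's bind_rows matches columns by name and fills the gaps with NA. The two small files below are invented for the example.

```r
library(readr)
library(purrr)
library(dplyr)

# Two small hypothetical yearly files written to a temporary folder;
# the second one has an extra column (PM25)
tmp <- file.path(tempdir(), "bind_demo")
dir.create(tmp, showWarnings = FALSE)
write_csv(tibble(station = "1", SO2 = "23"),
          file.path(tmp, "air_data_2000.csv"))
write_csv(tibble(station = "1", SO2 = "5", PM25 = "8"),
          file.path(tmp, "air_data_2010.csv"))

files <- list.files(tmp, pattern = "^air_data_20", full.names = TRUE)

# Read everything as character, then stack the pieces; bind_rows() matches
# columns by name and fills the missing ones with NA
toy <- files %>%
    map(read_csv, col_types = cols(.default = "c")) %>%
    bind_rows()
```

Using full.names = TRUE also avoids pasting the folder name onto each file by hand.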

We take a look at the data set.

glimpse(air_data_0)
## Observations: 722,774
## Variables: 22
## $ Estación            <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1...
## $ Título              <chr> "Estación Avenida Constitución", "Estación...
## $ latitud             <chr> "43.529806", "43.529806", "43.529806", "43...
## $ longitud            <chr> "-5.673428", "-5.673428", "-5.673428", "-5...
## $ `Fecha Solar (UTC)` <chr> "2000-01-01T00:00:00", "2000-01-01T01:00:0...
## $ SO2                 <chr> "23", "29", "40", "50", "39", "39", "40", ...
## $ NO                  <chr> "89", "73", "53", "46", "35", "26", "27", ...
## $ NO2                 <chr> "65", "60", "57", "53", "50", "49", "51", ...
## $ CO                  <chr> "1.97", "1.61", "1.13", "1.06", "0.95", "0...
## $ PM10                <chr> "53", "63", "56", "58", "50", "50", "57", ...
## $ O3                  <chr> "9", "8", "7", "5", "6", "7", "7", "4", "5...
## $ dd                  <chr> "245", "222", "228", "239", "244", "218", ...
## $ vv                  <chr> "0.34", "1.06", "0.71", "0.84", "0.89", "0...
## $ TMP                 <chr> "5.7", "5.4", "5.3", "5.1", "4.6", "4.6", ...
## $ HR                  <chr> "76", "73", "72", "71", "72", "69", "68", ...
## $ PRB                 <chr> "1026", "1025", "1025", "1025", "1024", "1...
## $ RS                  <chr> "33", "33", "33", "33", "33", "33", "33", ...
## $ LL                  <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0...
## $ BEN                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ TOL                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ MXIL                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ PM25                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...

We change the names of some variables.

# Rename the Spanish column names to English, syntactic names
air_data_1 <- air_data_0 %>% rename(station = `Estación`,
                                    station_name = `Título`,
                                    date_time_utc = `Fecha Solar (UTC)`,
                                    latitude = latitud,
                                    longitude = longitud)

We imported every column as character to avoid type-guessing problems during the import, so we now have to convert some columns to their proper types.

We convert date_time_utc from character to date-time.

air_data_1$date_time_utc <- ymd_hms(air_data_1$date_time_utc)
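Before relying on the converted column, it can be worth counting how many strings failed to parse. The sketch below uses invented timestamps; ymd_hms returns NA for anything it cannot parse and uses UTC as its default time zone.

```r
library(lubridate)

# Two timestamps in the format used by the data files plus one invalid string
stamps <- c("2000-01-01T00:00:00", "2000-01-01T01:00:00", "not a date")

# quiet = TRUE suppresses the "failed to parse" warning; failures become NA
parsed <- ymd_hms(stamps, quiet = TRUE)

# Strings that were present but could not be parsed
n_failed <- sum(is.na(parsed) & !is.na(stamps))

# Time zone attached to the result (UTC unless tz = is supplied)
parsed_tz <- tz(parsed)
```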

We convert station and station_name from character to factor.

air_data_1$station <- as.factor(air_data_1$station)
air_data_1$station_name <- as.factor(air_data_1$station_name)

We create a vector with the names of all the variables we want to be numeric.

num <- colnames(air_data_1)[c(3, 4, 6:22)]

We convert this set of variables to numeric.

air_data_1 <- air_data_1 %>% mutate_at(num, as.numeric)
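Selecting columns by position is fragile if the column order ever changes. As a sketch of an alternative, assuming dplyr 1.0.0 or later, the columns can be selected by name and converted with across. The toy data frame is invented for the example.

```r
library(dplyr)

toy <- tibble(
    station = c("1", "2"),
    latitude = c("43.529806", "43.538"),
    SO2 = c("23", NA),
    CO = c("1.97", "0.4")
)

# Select the columns to convert by name instead of by position, so that
# a change in column order does not silently convert the wrong ones
num_cols <- setdiff(names(toy), "station")

toy_num <- toy %>% mutate(across(all_of(num_cols), as.numeric))
```

Strings that do not represent a number become NA with a warning, which is a useful signal to inspect the raw values.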

We create a dictionary with an alias for each station, in order to add a new variable with more convenient station names.

alias_dict <- data.frame(
      station = c("1", "2", "3", "4", "10", "11"),
      station_alias = c("Constitución", "Argentina", "H. Felgueroso", "Castilla", "Montevil", "Santa Bárbara")
)

We join the alias dictionary to the air_data_1 data frame to add the new variable to the data set.

air_data_1 <- air_data_1 %>% left_join(alias_dict, by = 'station')
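A left join keeps every row of the left table, so a station missing from the dictionary would silently get an NA alias. The sketch below, with invented readings and a hypothetical station "12", shows one way to detect that case with anti_join.

```r
library(dplyr)

# Toy readings; station "12" is hypothetical and has no alias
readings <- tibble(station = c("1", "2", "12"), SO2 = c(23, 5, 7))
aliases <- tibble(station = c("1", "2"),
                  station_alias = c("Constitución", "Argentina"))

joined <- readings %>% left_join(aliases, by = "station")

# Rows of readings without a match in the dictionary
unmatched <- readings %>% anti_join(aliases, by = "station")
```

An empty result from anti_join confirms that the dictionary covers every station.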

We call the summary function to inspect the main statistics of the data.

summary(air_data_1)
##  station                                   station_name       latitude    
##  1 :157727   Estación Avenida Argentina          :157798   Min.   :43.52  
##  10: 74630   Estación Avenida Castilla           :157409   1st Qu.:43.53  
##  11: 17544   Estación Avenida Constitución       :157727   Median :43.54  
##  2 :157798   Estación Avenida Hermanos Felgueroso:157666   Mean   :43.53  
##  3 :157666   Estación de Montevil                : 74630   3rd Qu.:43.54  
##  4 :157409   Estación Santa Bárbara              : 17544   Max.   :43.54  
##                                                                           
##    longitude      date_time_utc                      SO2          
##  Min.   :-5.699   Min.   :2000-01-01 00:00:00   Min.   :-9999.00  
##  1st Qu.:-5.673   1st Qu.:2005-02-25 05:00:00   1st Qu.:    4.00  
##  Median :-5.672   Median :2010-02-23 11:00:00   Median :    6.00  
##  Mean   :-5.670   Mean   :2009-09-06 07:33:13   Mean   :    9.77  
##  3rd Qu.:-5.658   3rd Qu.:2014-04-09 06:00:00   3rd Qu.:   11.00  
##  Max.   :-5.646   Max.   :2018-01-01 00:00:00   Max.   : 2662.00  
##                                                 NA's   :33742     
##        NO                NO2                 CO             PM10         
##  Min.   :-9999.00   Min.   :-9999.00   Min.   : 0.00   Min.   :-9999.00  
##  1st Qu.:    4.40   1st Qu.:   16.00   1st Qu.: 0.22   1st Qu.:   19.00  
##  Median :   10.00   Median :   28.00   Median : 0.36   Median :   30.00  
##  Mean   :   21.37   Mean   :   32.04   Mean   : 0.49   Mean   :   35.88  
##  3rd Qu.:   23.00   3rd Qu.:   45.00   3rd Qu.: 0.59   3rd Qu.:   46.00  
##  Max.   : 1248.00   Max.   : 1003.20   Max.   :58.20   Max.   : 1000.00  
##  NA's   :16989      NA's   :16446      NA's   :90390   NA's   :88598     
##        O3                 dd               vv              TMP        
##  Min.   :-9999.00   Min.   :  0.0    Min.   : 0.0     Min.   :-40.0   
##  1st Qu.:   17.00   1st Qu.: 96.0    1st Qu.: 0.2     1st Qu.: 10.9   
##  Median :   37.00   Median :159.0    Median : 0.7     Median : 14.7   
##  Mean   :   38.97   Mean   :161.8    Mean   : 1.0     Mean   : 14.6   
##  3rd Qu.:   57.00   3rd Qu.:228.0    3rd Qu.: 1.5     3rd Qu.: 18.4   
##  Max.   :  998.00   Max.   :360.0    Max.   :29.8     Max.   : 47.4   
##  NA's   :31417      NA's   :494134   NA's   :493893   NA's   :494151  
##        HR              PRB               RS               LL        
##  Min.   :  0.0    Min.   : 800     Min.   :  -1.0   Min.   : 0.0    
##  1st Qu.: 69.0    1st Qu.:1007     1st Qu.:  17.0   1st Qu.: 0.0    
##  Median : 80.0    Median :1013     Median :  46.0   Median : 0.0    
##  Mean   : 78.3    Mean   :1012     Mean   : 125.2   Mean   : 0.1    
##  3rd Qu.: 89.0    3rd Qu.:1018     3rd Qu.: 149.0   3rd Qu.: 0.0    
##  Max.   :123.0    Max.   :1282     Max.   :1470.0   Max.   :24.6    
##  NA's   :494176   NA's   :494019   NA's   :494273   NA's   :494124  
##       BEN              TOL              MXIL             PM25       
##  Min.   : 0.0     Min.   : -0.2    Min.   : -0.3    Min.   :  0.0   
##  1st Qu.: 0.1     1st Qu.:  0.4    1st Qu.:  0.2    1st Qu.:  5.0   
##  Median : 0.3     Median :  1.0    Median :  0.3    Median :  9.0   
##  Mean   : 0.5     Mean   :  2.5    Mean   :  1.3    Mean   : 11.3   
##  3rd Qu.: 0.5     3rd Qu.:  2.5    3rd Qu.:  0.9    3rd Qu.: 15.0   
##  Max.   :22.5     Max.   :196.0    Max.   :220.0    Max.   :947.0   
##  NA's   :629358   NA's   :629380   NA's   :635123   NA's   :554185  
##        station_alias   
##  Argentina    :157798  
##  Castilla     :157409  
##  Constitución :157727  
##  H. Felgueroso:157666  
##  Montevil     : 74630  
##  Santa Bárbara: 17544  
## 

Several variables have a minimum value of -9999.

kable(air_data_1 %>% filter(SO2 == -9999))
station station_name latitude longitude date_time_utc SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25 station_alias
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 00:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 01:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 02:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 03:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 04:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 05:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 06:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 07:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 08:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 09:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 10:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 11:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 12:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 13:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 14:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 15:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 16:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 17:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 18:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 19:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 20:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 21:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 22:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso
3 Estación Avenida Hermanos Felgueroso 43.53506 -5.658123 2000-01-27 23:00:00 -9999 -9999 -9999 0 -9999 -9999 NA NA NA NA NA NA NA NA NA NA NA H. Felgueroso

They all come from the same day (2000-01-27) and the same station (‘H. Felgueroso’). All the pollutant readings from that day, except CO, are equal to -9999.
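This kind of observation can also be checked in code instead of by reading the table. A minimal sketch on invented data counts the affected rows per station and day:

```r
library(dplyr)
library(lubridate)

toy <- tibble(
    station_alias = c("H. Felgueroso", "H. Felgueroso", "Castilla"),
    date_time_utc = ymd_hms(c("2000-01-27 00:00:00", "2000-01-27 01:00:00",
                              "2000-01-27 00:00:00")),
    SO2 = c(-9999, -9999, 12)
)

# One row per station and day containing the sentinel, with its frequency
sentinel_days <- toy %>%
    filter(SO2 == -9999) %>%
    count(station_alias, day = as_date(date_time_utc))
```

A single row in the result means that the sentinel is confined to one station and one day.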

So we replace these values with NA.

air_data_2 <- air_data_1 %>% mutate(SO2 = replace(SO2, SO2 == -9999, NA),
                                    NO = replace(NO, NO == -9999, NA),
                                    NO2 = replace(NO2, NO2 == -9999, NA),
                                    PM10 = replace(PM10, PM10 == -9999, NA),
                                    O3 = replace(O3, O3 == -9999, NA))
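The same replacement can be written once for many columns. The sketch below assumes dplyr 1.0.0 or later and assumes that -9999 is never a valid reading in any numeric column; the toy values are invented.

```r
library(dplyr)

toy <- tibble(
    SO2 = c(-9999, 4, 6),
    NO = c(-9999, 10, NA),
    CO = c(0, 0.22, 0.36)
)

# Replace the -9999 sentinel with NA in every numeric column at once;
# na_if(x, y) returns x with the values equal to y turned into NA
toy_clean <- toy %>%
    mutate(across(where(is.numeric), ~ na_if(.x, -9999)))
```

Columns that never contain the sentinel, such as CO here, are left unchanged.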

We check the output of summary again.

summary(air_data_2)
##  station                                   station_name       latitude    
##  1 :157727   Estación Avenida Argentina          :157798   Min.   :43.52  
##  10: 74630   Estación Avenida Castilla           :157409   1st Qu.:43.53  
##  11: 17544   Estación Avenida Constitución       :157727   Median :43.54  
##  2 :157798   Estación Avenida Hermanos Felgueroso:157666   Mean   :43.53  
##  3 :157666   Estación de Montevil                : 74630   3rd Qu.:43.54  
##  4 :157409   Estación Santa Bárbara              : 17544   Max.   :43.54  
##                                                                           
##    longitude      date_time_utc                      SO2         
##  Min.   :-5.699   Min.   :2000-01-01 00:00:00   Min.   :  -2.00  
##  1st Qu.:-5.673   1st Qu.:2005-02-25 05:00:00   1st Qu.:   4.00  
##  Median :-5.672   Median :2010-02-23 11:00:00   Median :   6.00  
##  Mean   :-5.670   Mean   :2009-09-06 07:33:13   Mean   :  10.12  
##  3rd Qu.:-5.658   3rd Qu.:2014-04-09 06:00:00   3rd Qu.:  11.00  
##  Max.   :-5.646   Max.   :2018-01-01 00:00:00   Max.   :2662.00  
##                                                 NA's   :33766    
##        NO               NO2                CO             PM10        
##  Min.   :   0.00   Min.   :   0.00   Min.   : 0.00   Min.   :   0.00  
##  1st Qu.:   4.40   1st Qu.:  16.00   1st Qu.: 0.22   1st Qu.:  19.00  
##  Median :  10.00   Median :  28.00   Median : 0.36   Median :  30.00  
##  Mean   :  21.71   Mean   :  32.38   Mean   : 0.49   Mean   :  36.26  
##  3rd Qu.:  23.00   3rd Qu.:  45.00   3rd Qu.: 0.59   3rd Qu.:  46.00  
##  Max.   :1248.00   Max.   :1003.20   Max.   :58.20   Max.   :1000.00  
##  NA's   :17013     NA's   :16470     NA's   :90390   NA's   :88622    
##        O3               dd               vv              TMP        
##  Min.   :  0.00   Min.   :  0.0    Min.   : 0.0     Min.   :-40.0   
##  1st Qu.: 17.00   1st Qu.: 96.0    1st Qu.: 0.2     1st Qu.: 10.9   
##  Median : 37.00   Median :159.0    Median : 0.7     Median : 14.7   
##  Mean   : 39.32   Mean   :161.8    Mean   : 1.0     Mean   : 14.6   
##  3rd Qu.: 57.00   3rd Qu.:228.0    3rd Qu.: 1.5     3rd Qu.: 18.4   
##  Max.   :998.00   Max.   :360.0    Max.   :29.8     Max.   : 47.4   
##  NA's   :31441    NA's   :494134   NA's   :493893   NA's   :494151  
##        HR              PRB               RS               LL        
##  Min.   :  0.0    Min.   : 800     Min.   :  -1.0   Min.   : 0.0    
##  1st Qu.: 69.0    1st Qu.:1007     1st Qu.:  17.0   1st Qu.: 0.0    
##  Median : 80.0    Median :1013     Median :  46.0   Median : 0.0    
##  Mean   : 78.3    Mean   :1012     Mean   : 125.2   Mean   : 0.1    
##  3rd Qu.: 89.0    3rd Qu.:1018     3rd Qu.: 149.0   3rd Qu.: 0.0    
##  Max.   :123.0    Max.   :1282     Max.   :1470.0   Max.   :24.6    
##  NA's   :494176   NA's   :494019   NA's   :494273   NA's   :494124  
##       BEN              TOL              MXIL             PM25       
##  Min.   : 0.0     Min.   : -0.2    Min.   : -0.3    Min.   :  0.0   
##  1st Qu.: 0.1     1st Qu.:  0.4    1st Qu.:  0.2    1st Qu.:  5.0   
##  Median : 0.3     Median :  1.0    Median :  0.3    Median :  9.0   
##  Mean   : 0.5     Mean   :  2.5    Mean   :  1.3    Mean   : 11.3   
##  3rd Qu.: 0.5     3rd Qu.:  2.5    3rd Qu.:  0.9    3rd Qu.: 15.0   
##  Max.   :22.5     Max.   :196.0    Max.   :220.0    Max.   :947.0   
##  NA's   :629358   NA's   :629380   NA's   :635123   NA's   :554185  
##        station_alias   
##  Argentina    :157798  
##  Castilla     :157409  
##  Constitución :157727  
##  H. Felgueroso:157666  
##  Montevil     : 74630  
##  Santa Bárbara: 17544  
## 

Some variables (SO2, RS, TOL and MXIL) have negative minimum values, which does not make sense for concentrations or solar radiation. We take a look at the data in order to quantify the problem.
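A quick way to size the problem is to count the negative readings per variable in one pass. This is a sketch on invented values, assuming dplyr 1.0.0 or later; TMP is left out on purpose because temperatures can legitimately be negative.

```r
library(dplyr)

toy <- tibble(
    SO2 = c(-1, 4, NA),
    RS = c(33, -1, 17),
    TMP = c(-2.5, 5.7, 10.9)   # temperature can legitimately be negative
)

# Number of negative readings per variable, ignoring missing values
neg_counts <- toy %>%
    summarise(across(c(SO2, RS), ~ sum(.x < 0, na.rm = TRUE)))
```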

30 SO2 observations between 2015-12-25 and 2015-12-28 from the Montevil station:

kable(neg_SO2 <- air_data_2 %>% filter(SO2 < 0))
station station_name latitude longitude date_time_utc SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25 station_alias
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 04:00:00 -1 7 10 NA NA 1 5 0.70 11.6 73 1019 3 0 NA NA NA 8 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 05:00:00 -1 6 9 NA NA 1 184 0.70 11.4 73 1019 3 0 NA NA NA 8 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 06:00:00 -1 6 7 NA NA 1 180 0.63 11.4 73 1019 3 0 NA NA NA 6 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 07:00:00 -1 6 7 NA NA 1 177 0.30 11.0 73 1019 3 0 NA NA NA 7 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 08:00:00 -1 8 3 NA NA 1 194 0.75 11.1 73 1020 5 0 NA NA NA 10 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 09:00:00 -1 8 9 NA NA 1 237 0.43 12.8 68 1020 81 0 NA NA NA 8 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 17:00:00 -1 8 20 NA NA 1 240 0.37 19.4 58 1020 24 0 NA NA NA 4 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-25 22:00:00 -1 8 22 NA NA 1 352 1.00 13.6 61 1021 3 0 NA NA NA 17 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-26 01:00:00 -1 6 6 NA NA 1 355 1.02 12.1 64 1021 3 0 NA NA NA 11 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-26 18:00:00 -1 7 22 NA NA 1 166 1.00 18.5 35 1018 4 0 NA NA NA 9 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-26 19:00:00 -1 7 16 NA NA 1 174 0.90 17.8 36 1018 3 0 NA NA NA 5 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-26 23:00:00 -1 6 7 NA NA 1 359 1.65 14.7 42 1019 3 0 NA NA NA 21 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 00:00:00 -1 7 11 NA NA 1 358 1.40 14.2 42 1020 3 0 NA NA NA 16 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 02:00:00 -1 6 4 NA NA 1 353 1.43 13.2 46 1019 3 0 NA NA NA 10 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 06:00:00 -1 7 11 NA NA 1 359 0.50 10.4 51 1018 3 0 NA NA NA 12 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 07:00:00 -1 7 15 NA NA 1 186 0.35 9.9 54 1018 3 0 NA NA NA 11 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 08:00:00 -1 10 15 NA NA 1 182 0.55 10.1 53 1018 4 0 NA NA NA 11 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 09:00:00 -1 17 25 NA NA 1 178 0.35 9.4 56 1018 33 0 NA NA NA 8 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 10:00:00 -1 23 28 NA NA 1 185 0.30 12.8 49 1018 191 0 NA NA NA 5 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 12:00:00 -1 24 25 NA NA 1 356 0.40 16.5 49 1017 229 0 NA NA NA 10 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 20:00:00 -1 8 21 NA NA 1 181 1.75 18.4 26 1011 3 0 NA NA NA 6 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 22:00:00 -1 6 13 NA NA 1 175 2.35 18.9 25 1011 3 0 NA NA NA 28 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-27 23:00:00 -2 6 2 NA NA 1 177 3.85 20.0 25 1010 3 0 NA NA NA 13 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-28 00:00:00 -2 6 2 NA NA 1 178 4.15 19.7 26 1010 3 0 NA NA NA 8 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-28 01:00:00 -2 6 2 NA NA 1 177 2.80 19.0 29 1009 3 0 NA NA NA 16 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-28 02:00:00 -2 6 2 NA NA 1 175 2.27 18.5 32 1009 3 0 NA NA NA 7 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-28 03:00:00 -1 6 2 NA NA 1 158 1.80 18.2 33 1009 3 0 NA NA NA 9 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-28 04:00:00 -2 6 2 NA NA 1 156 2.07 17.4 36 1009 3 0 NA NA NA 11 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-28 05:00:00 -1 6 2 NA NA 1 194 2.20 19.0 35 1008 3 0 NA NA NA 8 Montevil
10 Estación de Montevil 43.51732 -5.672499 2015-12-28 06:00:00 -1 6 2 NA NA 1 197 3.13 19.2 37 1008 3 0 NA NA NA 18 Montevil

2 RS observations from the Constitución station:

kable(neg_RS <- air_data_2 %>% filter(RS < 0))
station station_name latitude longitude date_time_utc SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25 station_alias
1 Estación Avenida Constitución 43.52981 -5.673428 2005-07-09 01:00:00 2 15 21 0.10 18 49 25 1.09 17.7 88 1012 -1 0 NA NA NA NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2005-07-12 03:00:00 8 6 35 0.29 27 49 204 0.10 17.2 94 1014 -1 0 NA NA NA NA Constitución

27 TOL observations between 2008-12-11 and 2008-12-15 from the Constitución station:

kable(neg_TOL <- air_data_2 %>% filter(TOL < 0))
station station_name latitude longitude date_time_utc SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25 station_alias
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 03:00:00 3 2 3 0.10 9 39 203 0.96 5.5 80 1011.00 35 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 04:00:00 3 2 3 0.10 7 41 203 0.65 5.9 79 1010.15 35 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 05:00:00 3 2 8 0.10 5 38 225 1.79 6.0 77 1009.54 35 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 06:00:00 3 4 23 0.76 8 28 225 0.49 5.9 81 1009.52 35 0.4 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 20:00:00 2 13 45 0.82 9 24 158 0.44 4.8 77 987.08 34 0.4 0.0 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 23:00:00 3 2 11 0.91 10 44 247 1.59 3.5 82 987.34 34 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 00:00:00 2 2 8 0.79 5 45 180 0.87 3.8 77 986.57 34 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 01:00:00 3 2 10 0.84 5 44 226 1.58 3.4 78 987.19 34 2.6 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 02:00:00 3 2 6 0.81 2 50 180 1.37 1.7 87 986.68 34 0.2 0.1 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 03:00:00 3 2 5 0.81 2 51 203 1.36 2.2 85 986.61 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 04:00:00 3 2 5 0.85 3 52 203 1.47 2.3 86 987.05 34 1.4 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 05:00:00 3 2 5 0.86 3 51 180 1.17 2.2 86 986.36 34 0.8 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 06:00:00 3 2 8 0.88 2 49 203 1.14 2.8 82 986.05 34 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 07:00:00 3 2 9 0.86 4 47 203 0.84 3.3 78 985.92 34 0.2 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 08:00:00 3 2 10 0.89 8 46 225 1.78 3.6 76 986.65 34 0.2 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 09:00:00 2 2 6 0.92 8 51 159 1.48 1.9 85 987.42 35 2.4 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 12:00:00 2 7 31 0.36 4 35 225 1.44 2.7 87 987.90 60 0.6 0.0 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 16:00:00 48 11 26 0.94 2 29 5 1.20 4.1 81 991.82 44 0.6 0.0 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 21:00:00 4 7 41 0.45 15 32 226 0.50 4.1 78 996.81 34 0.6 0.1 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 22:00:00 4 7 36 0.59 17 31 180 0.38 3.2 85 997.08 34 0.6 0.1 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 02:00:00 3 2 3 0.29 8 55 6 1.41 7.6 66 998.75 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 03:00:00 3 NA NA 0.26 15 59 5 2.22 8.1 62 999.56 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 04:00:00 4 NA NA 0.26 7 55 5 1.05 8.1 64 1000.38 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 05:00:00 4 3 10 0.31 9 46 5 1.22 6.7 72 1000.93 34 0.6 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 06:00:00 3 4 21 0.30 10 33 6 0.88 7.1 75 1001.10 34 0.4 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 07:00:00 3 35 50 0.61 12 22 5 1.30 6.0 83 1001.81 34 1.8 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 10:00:00 2 12 20 0.32 17 30 5 2.67 8.6 80 1003.40 41 0.4 0.0 -0.1 -0.3 NA Constitución

59 MXIL observations between 2008-12-10 and 2008-12-15 from the Constitución station:

kable(neg_MXIL <- air_data_2 %>% filter(MXIL < 0))
station station_name latitude longitude date_time_utc SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25 station_alias
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-10 23:00:00 3 4 25 0.11 15 29 180 0.01 4.9 90 1013.39 35 0.2 0.0 0.2 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 00:00:00 3 2 13 0.10 12 34 203 0.73 5.2 87 1012.97 35 0.0 0.0 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 01:00:00 3 2 6 0.10 7 38 158 0.41 5.2 84 1012.54 35 0.0 0.0 0.0 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 02:00:00 3 2 6 0.10 6 38 225 0.96 5.2 83 1011.63 35 0.4 0.0 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 03:00:00 3 2 3 0.10 9 39 203 0.96 5.5 80 1011.00 35 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 04:00:00 3 2 3 0.10 7 41 203 0.65 5.9 79 1010.15 35 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 05:00:00 3 2 8 0.10 5 38 225 1.79 6.0 77 1009.54 35 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 06:00:00 3 4 23 0.76 8 28 225 0.49 5.9 81 1009.52 35 0.4 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-11 09:00:00 3 21 60 0.18 11 9 158 0.17 6.8 78 1008.84 48 0.0 0.0 0.4 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-12 04:00:00 4 2 16 0.48 12 16 137 0.00 6.3 81 1007.54 35 0.0 0.1 0.4 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-12 05:00:00 5 24 35 0.71 19 5 5 0.00 6.4 82 1008.01 34 0.0 0.0 0.8 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 02:00:00 10 3 20 0.11 11 27 180 0.60 9.5 55 998.36 33 0.0 0.0 0.2 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 03:00:00 8 3 17 0.10 12 27 203 0.07 9.1 57 996.03 33 0.0 0.0 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 04:00:00 15 3 27 0.10 13 19 226 0.20 8.9 59 994.11 33 0.0 0.0 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 05:00:00 10 9 38 0.10 21 14 5 0.76 8.7 68 991.20 34 0.6 0.0 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 06:00:00 11 26 56 0.13 27 4 6 0.00 8.2 76 988.69 34 0.4 0.2 0.3 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 10:00:00 5 5 27 0.26 16 36 247 1.25 8.9 91 986.75 38 1.8 0.1 0.3 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 11:00:00 3 6 27 0.10 5 31 181 1.04 8.9 85 986.22 53 0.0 0.0 0.2 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 14:00:00 5 12 39 0.84 20 29 248 1.22 6.9 85 985.84 71 1.8 0.1 0.2 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 15:00:00 3 8 31 0.50 19 33 225 0.48 7.9 80 985.32 64 1.8 0.1 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 16:00:00 4 13 35 0.64 16 30 7 1.61 6.8 82 985.91 52 1.0 0.1 0.2 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 17:00:00 4 10 37 0.70 13 29 248 1.41 6.0 78 986.60 38 0.6 0.1 0.0 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 18:00:00 2 6 30 0.64 11 36 226 1.46 5.3 76 986.86 34 0.0 0.1 0.0 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 19:00:00 2 7 32 0.70 8 33 181 1.44 5.1 77 986.86 34 0.2 0.0 0.0 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 20:00:00 2 13 45 0.82 9 24 158 0.44 4.8 77 987.08 34 0.4 0.0 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 22:00:00 3 7 32 0.87 10 29 225 0.89 4.8 75 986.85 34 1.0 0.1 0.2 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-13 23:00:00 3 2 11 0.91 10 44 247 1.59 3.5 82 987.34 34 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 00:00:00 2 2 8 0.79 5 45 180 0.87 3.8 77 986.57 34 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 01:00:00 3 2 10 0.84 5 44 226 1.58 3.4 78 987.19 34 2.6 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 02:00:00 3 2 6 0.81 2 50 180 1.37 1.7 87 986.68 34 0.2 0.1 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 03:00:00 3 2 5 0.81 2 51 203 1.36 2.2 85 986.61 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 04:00:00 3 2 5 0.85 3 52 203 1.47 2.3 86 987.05 34 1.4 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 05:00:00 3 2 5 0.86 3 51 180 1.17 2.2 86 986.36 34 0.8 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 06:00:00 3 2 8 0.88 2 49 203 1.14 2.8 82 986.05 34 0.0 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 07:00:00 3 2 9 0.86 4 47 203 0.84 3.3 78 985.92 34 0.2 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 08:00:00 3 2 10 0.89 8 46 225 1.78 3.6 76 986.65 34 0.2 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 09:00:00 2 2 6 0.92 8 51 159 1.48 1.9 85 987.42 35 2.4 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 10:00:00 2 6 23 0.98 6 39 158 1.68 1.6 88 987.88 41 1.8 0.0 0.0 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 11:00:00 2 4 18 0.95 3 41 226 2.01 1.8 87 987.84 66 0.4 0.0 0.0 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 12:00:00 2 7 31 0.36 4 35 225 1.44 2.7 87 987.90 60 0.6 0.0 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 13:00:00 2 7 30 0.30 5 35 203 1.08 2.9 87 987.93 55 0.8 0.1 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 14:00:00 3 9 30 0.38 6 39 6 2.88 3.0 88 988.84 63 2.6 0.1 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 15:00:00 3 10 34 0.33 4 34 5 0.84 5.0 79 990.38 82 0.4 0.1 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 16:00:00 48 11 26 0.94 2 29 5 1.20 4.1 81 991.82 44 0.6 0.0 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 19:00:00 3 15 59 0.62 17 23 5 1.07 4.3 82 995.05 34 0.2 0.1 0.2 -0.1 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 20:00:00 3 13 39 0.47 16 39 5 1.86 4.5 78 995.85 34 0.2 0.1 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 21:00:00 4 7 41 0.45 15 32 226 0.50 4.1 78 996.81 34 0.6 0.1 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 22:00:00 4 7 36 0.59 17 31 180 0.38 3.2 85 997.08 34 0.6 0.1 -0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-14 23:00:00 3 4 28 0.55 12 33 159 1.07 3.2 84 997.54 34 0.2 0.1 0.1 -0.2 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 00:00:00 3 3 24 0.51 14 32 5 0.87 3.1 86 998.28 34 0.4 0.1 0.0 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 01:00:00 3 2 9 0.36 18 46 6 0.77 6.0 73 998.51 34 0.4 0.1 0.0 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 02:00:00 3 2 3 0.29 8 55 6 1.41 7.6 66 998.75 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 03:00:00 3 NA NA 0.26 15 59 5 2.22 8.1 62 999.56 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 04:00:00 4 NA NA 0.26 7 55 5 1.05 8.1 64 1000.38 34 0.0 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 05:00:00 4 3 10 0.31 9 46 5 1.22 6.7 72 1000.93 34 0.6 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 06:00:00 3 4 21 0.30 10 33 6 0.88 7.1 75 1001.10 34 0.4 0.0 -0.2 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 07:00:00 3 35 50 0.61 12 22 5 1.30 6.0 83 1001.81 34 1.8 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 10:00:00 2 12 20 0.32 17 30 5 2.67 8.6 80 1003.40 41 0.4 0.0 -0.1 -0.3 NA Constitución
1 Estación Avenida Constitución 43.52981 -5.673428 2008-12-15 11:00:00 2 14 25 0.36 20 25 5 2.26 8.3 84 1004.68 47 0.6 0.0 0.0 -0.2 NA Constitución

There are not many such cases. We replace them all with NA and call the summary function again (to do: ask the data owner about this detail).

air_data_2 <- air_data_2 %>% mutate(SO2 = replace(SO2, SO2 < 0, NA),
                                    RS = replace(RS, RS < 0, NA),
                                    TOL = replace(TOL, TOL < 0, NA),
                                    MXIL = replace(MXIL, MXIL < 0, NA))

summary(air_data_2)
##  station                                   station_name       latitude    
##  1 :157727   Estación Avenida Argentina          :157798   Min.   :43.52  
##  10: 74630   Estación Avenida Castilla           :157409   1st Qu.:43.53  
##  11: 17544   Estación Avenida Constitución       :157727   Median :43.54  
##  2 :157798   Estación Avenida Hermanos Felgueroso:157666   Mean   :43.53  
##  3 :157666   Estación de Montevil                : 74630   3rd Qu.:43.54  
##  4 :157409   Estación Santa Bárbara              : 17544   Max.   :43.54  
##                                                                           
##    longitude      date_time_utc                      SO2         
##  Min.   :-5.699   Min.   :2000-01-01 00:00:00   Min.   :   0.00  
##  1st Qu.:-5.673   1st Qu.:2005-02-25 05:00:00   1st Qu.:   4.00  
##  Median :-5.672   Median :2010-02-23 11:00:00   Median :   6.00  
##  Mean   :-5.670   Mean   :2009-09-06 07:33:13   Mean   :  10.12  
##  3rd Qu.:-5.658   3rd Qu.:2014-04-09 06:00:00   3rd Qu.:  11.00  
##  Max.   :-5.646   Max.   :2018-01-01 00:00:00   Max.   :2662.00  
##                                                 NA's   :33796    
##        NO               NO2                CO             PM10        
##  Min.   :   0.00   Min.   :   0.00   Min.   : 0.00   Min.   :   0.00  
##  1st Qu.:   4.40   1st Qu.:  16.00   1st Qu.: 0.22   1st Qu.:  19.00  
##  Median :  10.00   Median :  28.00   Median : 0.36   Median :  30.00  
##  Mean   :  21.71   Mean   :  32.38   Mean   : 0.49   Mean   :  36.26  
##  3rd Qu.:  23.00   3rd Qu.:  45.00   3rd Qu.: 0.59   3rd Qu.:  46.00  
##  Max.   :1248.00   Max.   :1003.20   Max.   :58.20   Max.   :1000.00  
##  NA's   :17013     NA's   :16470     NA's   :90390   NA's   :88622    
##        O3               dd               vv              TMP        
##  Min.   :  0.00   Min.   :  0.0    Min.   : 0.0     Min.   :-40.0   
##  1st Qu.: 17.00   1st Qu.: 96.0    1st Qu.: 0.2     1st Qu.: 10.9   
##  Median : 37.00   Median :159.0    Median : 0.7     Median : 14.7   
##  Mean   : 39.32   Mean   :161.8    Mean   : 1.0     Mean   : 14.6   
##  3rd Qu.: 57.00   3rd Qu.:228.0    3rd Qu.: 1.5     3rd Qu.: 18.4   
##  Max.   :998.00   Max.   :360.0    Max.   :29.8     Max.   : 47.4   
##  NA's   :31441    NA's   :494134   NA's   :493893   NA's   :494151  
##        HR              PRB               RS               LL        
##  Min.   :  0.0    Min.   : 800     Min.   :   0.0   Min.   : 0.0    
##  1st Qu.: 69.0    1st Qu.:1007     1st Qu.:  17.0   1st Qu.: 0.0    
##  Median : 80.0    Median :1013     Median :  46.0   Median : 0.0    
##  Mean   : 78.3    Mean   :1012     Mean   : 125.2   Mean   : 0.1    
##  3rd Qu.: 89.0    3rd Qu.:1018     3rd Qu.: 149.0   3rd Qu.: 0.0    
##  Max.   :123.0    Max.   :1282     Max.   :1470.0   Max.   :24.6    
##  NA's   :494176   NA's   :494019   NA's   :494275   NA's   :494124  
##       BEN              TOL              MXIL             PM25       
##  Min.   : 0.0     Min.   :  0.0    Min.   :  0.0    Min.   :  0.0   
##  1st Qu.: 0.1     1st Qu.:  0.4    1st Qu.:  0.2    1st Qu.:  5.0   
##  Median : 0.3     Median :  1.0    Median :  0.3    Median :  9.0   
##  Mean   : 0.5     Mean   :  2.5    Mean   :  1.3    Mean   : 11.3   
##  3rd Qu.: 0.5     3rd Qu.:  2.5    3rd Qu.:  0.9    3rd Qu.: 15.0   
##  Max.   :22.5     Max.   :196.0    Max.   :220.0    Max.   :947.0   
##  NA's   :629358   NA's   :629407   NA's   :635182   NA's   :554185  
##        station_alias   
##  Argentina    :157798  
##  Castilla     :157409  
##  Constitución :157727  
##  H. Felgueroso:157666  
##  Montevil     : 74630  
##  Santa Bárbara: 17544  
## 
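
As a side note, the replacement of negative readings shown above can be written once for several columns with across(). This is only a sketch on made-up values: the toy tibble and its numbers are assumptions, not the project data, and it assumes dplyr 1.0 or later.

```r
library(dplyr)

# Toy data (hypothetical): negative values stand in for invalid measurements
toy <- tibble(
  SO2 = c(4, -1, 6),
  RS  = c(17, 46, -3),
  NO2 = c(16, 28, 45)   # no negative values here; this column is left untouched
)

# Replace negative values with NA in the selected columns only
toy_clean <- toy %>%
  mutate(across(c(SO2, RS), ~ replace(.x, .x < 0, NA)))

# Count how many values ended up as NA in each column
colSums(is.na(toy_clean))
```

Listing the columns explicitly keeps the rule from being applied to variables where a negative value is legitimate, such as temperature.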

We take a look at the data completeness. What proportion of NAs do we have by variable, station, year, etc.?

data_completeness <- air_data_2 %>% 
  group_by(station_alias, year = year(date_time_utc)) %>% 
  summarise_all(funs(round(sum(!is.na(.))/n(), 2))) %>%
  select(-c(3:7, 25:28)) # These columns do not contain any NA values, so we exclude them.

kable(head(data_completeness, 10))
station_alias year SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25
Argentina 2000 0.99 0.97 0.97 0.96 0.94 0.97 0 0 0 0 0 0 0 0 0 0 0
Argentina 2001 0.99 0.99 0.99 0.98 0.97 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2002 1.00 0.99 0.99 0.99 0.99 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2003 0.99 0.98 0.98 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2004 0.98 0.96 0.97 0.99 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2005 0.98 0.96 0.98 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2006 0.92 0.90 0.92 0.92 0.93 0.93 0 0 0 0 0 0 0 0 0 0 0
Argentina 2007 0.98 0.99 0.99 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2008 0.98 0.96 0.98 0.97 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0
Argentina 2009 1.00 1.00 1.00 0.98 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
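
To make the completeness measure concrete, here is a minimal sketch on a made-up table (the station names, years and values are assumptions). It relies on the fact that mean(!is.na(x)) equals sum(!is.na(x)) / n(), and it assumes dplyr 1.0 or later for across() and the .groups argument.

```r
library(dplyr)

# Toy data (hypothetical): two stations, a few readings with gaps
toy <- tibble(
  station_alias = c("A", "A", "A", "A", "B", "B"),
  year          = c(2000, 2000, 2001, 2001, 2000, 2000),
  SO2           = c(1, NA, 3, 4, NA, NA),
  NO2           = c(5, 6, NA, 8, 9, 10)
)

# Share of non-missing values per station, year and variable
completeness <- toy %>%
  group_by(station_alias, year) %>%
  summarise(across(c(SO2, NO2), ~ round(mean(!is.na(.x)), 2)),
            .groups = "drop")

completeness
```

A value of 0.5 means that half of the rows for that station and year hold a measurement, so the figures in the tables are proportions, not percentages.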

We are going to check the data completeness by station:

Constitución: Data have been recorded for SO2, NO, NO2, CO, PM10, O3, dd, vv, TMP, HR, PRB, RS and LL since 2000. BEN, TOL and MXIL have been measured since 2006 (only 1% of that year is covered). PM25 has been monitored since 2008 (only 2% of that year is covered). In 2008 the completeness of several variables (dd, vv, TMP, HR, PRB, RS, LL, BEN, TOL and MXIL) drops to 88% (to do: check that this was not caused by a data import problem). The table also shows MXIL at 32% in 2015.

constitucion_data <- data_completeness %>% filter(station_alias == 'Constitución')
kable(constitucion_data)
station_alias year SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25
Constitución 2000 0.97 0.95 0.95 0.97 0.92 0.93 0.96 0.98 0.96 0.95 0.97 0.95 0.96 0.00 0.00 0.00 0.00
Constitución 2001 0.99 0.99 0.99 0.98 0.99 0.99 1.00 1.00 1.00 0.99 1.00 1.00 1.00 0.00 0.00 0.00 0.00
Constitución 2002 1.00 1.00 1.00 0.99 0.99 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00
Constitución 2003 0.99 0.99 0.99 0.98 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.00 0.00 0.00 0.00
Constitución 2004 0.99 0.99 0.99 0.99 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.00 0.00
Constitución 2005 0.98 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.00 0.00 0.00 0.00
Constitución 2006 0.91 0.91 0.91 0.90 0.91 0.91 0.91 0.91 0.91 0.91 0.91 0.91 0.91 0.01 0.01 0.01 0.00
Constitución 2007 0.98 0.99 0.99 0.97 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.00
Constitución 2008 0.98 0.99 0.99 0.99 0.99 1.00 0.88 0.88 0.88 0.88 0.88 0.88 0.88 0.88 0.88 0.88 0.02
Constitución 2009 0.99 0.99 0.99 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
Constitución 2010 0.99 0.99 0.99 0.99 0.99 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99 0.99 0.99
Constitución 2011 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.98 0.98 0.98 0.99
Constitución 2012 0.97 0.97 0.97 0.96 0.97 0.96 0.97 0.97 0.97 0.97 0.97 0.97 0.97 0.96 0.96 0.96 0.97
Constitución 2013 0.99 0.99 0.99 0.99 1.00 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99 0.99 1.00
Constitución 2014 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99 0.99 1.00
Constitución 2015 0.98 0.98 0.98 0.98 0.99 0.98 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.98 0.98 0.32 0.98
Constitución 2016 0.95 0.95 0.95 0.95 0.95 0.95 0.98 0.98 0.97 0.97 0.97 0.97 0.97 0.90 0.90 0.90 0.95
Constitución 2017 0.99 0.99 0.99 0.99 1.00 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99 0.99 1.00
Constitución 2018 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

Argentina: data since 2000. Variables: SO2, NO, NO2, CO, PM10 and O3.

argentina_data <- data_completeness %>% filter(station_alias == 'Argentina')
kable(argentina_data)
station_alias year SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25
Argentina 2000 0.99 0.97 0.97 0.96 0.94 0.97 0 0 0 0 0 0 0 0 0 0 0
Argentina 2001 0.99 0.99 0.99 0.98 0.97 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2002 1.00 0.99 0.99 0.99 0.99 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2003 0.99 0.98 0.98 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2004 0.98 0.96 0.97 0.99 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2005 0.98 0.96 0.98 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2006 0.92 0.90 0.92 0.92 0.93 0.93 0 0 0 0 0 0 0 0 0 0 0
Argentina 2007 0.98 0.99 0.99 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2008 0.98 0.96 0.98 0.97 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0
Argentina 2009 1.00 1.00 1.00 0.98 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2010 0.99 0.99 1.00 0.99 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2011 0.98 0.99 0.99 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2012 0.99 0.96 0.96 0.96 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2013 0.99 0.99 0.99 0.99 1.00 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2014 1.00 0.99 0.99 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Argentina 2015 0.99 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2016 0.99 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2017 0.99 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Argentina 2018 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0

H. Felgueroso: data since 2000. Variables: SO2, NO, NO2, CO, PM10 and O3. In 2006 the completeness drops to between 87% and 90%, depending on the variable (to do: check that this was not caused by a data import problem).

felgueroso_data <- data_completeness %>% filter(station_alias == 'H. Felgueroso')
kable(felgueroso_data)
station_alias year SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25
H. Felgueroso 2000 0.97 0.96 0.96 0.97 0.96 0.96 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2001 0.99 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2002 0.93 0.93 0.93 0.93 0.93 0.93 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2003 0.98 0.98 0.98 0.97 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2004 0.98 0.97 0.97 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2005 0.97 0.96 0.96 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2006 0.88 0.87 0.87 0.90 0.90 0.90 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2007 0.98 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2008 0.98 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2009 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2010 0.99 0.99 0.99 0.99 0.98 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2011 0.99 0.99 0.99 1.00 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2012 0.96 0.97 0.97 0.97 0.97 0.97 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2013 0.99 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2014 0.98 0.98 0.98 0.99 0.99 0.98 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2015 1.00 1.00 1.00 1.00 1.00 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2016 0.99 0.99 0.99 0.99 0.98 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2017 0.99 0.99 0.99 0.99 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
H. Felgueroso 2018 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0

Castilla: data since 2000. Variables: SO2, NO, NO2, CO, PM10 and O3. In 2015 the completeness drops to about 76-77% (to do: check that this was not caused by a data import problem).

castilla_data <- data_completeness %>% filter(station_alias == 'Castilla')
kable(castilla_data)
station_alias year SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25
Castilla 2000 0.97 0.97 0.97 0.97 0.97 0.95 0 0 0 0 0 0 0 0 0 0 0
Castilla 2001 0.98 0.99 0.99 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Castilla 2002 0.99 0.99 0.99 0.97 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Castilla 2003 0.99 0.99 0.99 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Castilla 2004 0.99 0.99 0.99 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Castilla 2005 0.99 0.95 0.95 0.98 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Castilla 2006 0.91 0.91 0.91 0.91 0.92 0.93 0 0 0 0 0 0 0 0 0 0 0
Castilla 2007 0.99 1.00 1.00 0.99 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Castilla 2008 0.95 0.96 0.96 0.95 0.96 0.96 0 0 0 0 0 0 0 0 0 0 0
Castilla 2009 0.99 0.99 0.99 0.99 0.99 1.00 0 0 0 0 0 0 0 0 0 0 0
Castilla 2010 0.92 0.93 0.93 0.93 0.93 0.93 0 0 0 0 0 0 0 0 0 0 0
Castilla 2011 0.97 0.99 0.99 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0
Castilla 2012 0.97 0.98 0.98 0.98 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0
Castilla 2013 1.00 0.99 0.99 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0
Castilla 2014 0.99 0.99 0.99 0.99 1.00 0.99 0 0 0 0 0 0 0 0 0 0 0
Castilla 2015 0.77 0.76 0.76 0.77 0.76 0.77 0 0 0 0 0 0 0 0 0 0 0
Castilla 2016 0.98 0.99 0.99 0.99 0.97 0.98 0 0 0 0 0 0 0 0 0 0 0
Castilla 2017 0.97 0.99 0.99 0.99 0.98 0.97 0 0 0 0 0 0 0 0 0 0 0
Castilla 2018 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0

Montevil: data since 2009. Variables: SO2, NO, NO2, O3, dd, vv, TMP, HR, PRB, RS, LL and PM25.

montevil_data <- data_completeness %>% filter(station_alias == 'Montevil')
kable(montevil_data)
station_alias year SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25
Montevil 2009 0.91 0.93 0.93 0 0 0.93 0.93 0.93 0.93 0.93 0.93 0.93 0.93 0 0 0 0.93
Montevil 2010 0.99 1.00 1.00 0 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 0.92
Montevil 2011 0.99 0.99 0.99 0 0 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 1.00
Montevil 2012 1.00 1.00 1.00 0 0 1.00 0.98 0.98 1.00 1.00 1.00 1.00 1.00 0 0 0 1.00
Montevil 2013 1.00 1.00 1.00 0 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 1.00
Montevil 2014 1.00 1.00 1.00 0 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 1.00
Montevil 2015 0.99 1.00 1.00 0 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 1.00
Montevil 2016 0.99 0.99 0.99 0 0 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 1.00
Montevil 2017 0.99 0.99 0.99 0 0 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0.99 0 0 0 0.99
Montevil 2018 1.00 1.00 1.00 0 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 1.00

Santa Bárbara: data since 2016. Variables: NO, NO2, CO, PM10 and PM25 (the table shows no O3 or SO2 measurements for this station).

barbara_data <- data_completeness %>% filter(station_alias == 'Santa Bárbara')
kable(barbara_data)
station_alias year SO2 NO NO2 CO PM10 O3 dd vv TMP HR PRB RS LL BEN TOL MXIL PM25
Santa Bárbara 2016 0 0.97 0.97 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0 0.98
Santa Bárbara 2017 0 0.98 0.98 0.99 1.00 0 0 0 0 0 0 0 0 0 0 0 1.00
Santa Bárbara 2018 0 1.00 1.00 1.00 1.00 0 0 0 0 0 0 0 0 0 0 0 1.00

All the stations have 2018 data, but only 6 observations in total, one per station. We drop them to avoid problems when visualising the data.

observations_per_year <- air_data_2 %>% group_by(year = year(date_time_utc)) %>%
                        summarise(n = n())
kable(observations_per_year)
year n
2000 35136
2001 35040
2002 35040
2003 35040
2004 35136
2005 35040
2006 34939
2007 34921
2008 35136
2009 39541
2010 43800
2011 43800
2012 43920
2013 43800
2014 43800
2015 43416
2016 52703
2017 52560
2018 6
air_data_2$year <- year(air_data_2$date_time_utc)
air_data_2 <- air_data_2 %>% filter(year != 2018) # year is numeric, so we compare it with a number

We add several more time variables to the dataset.

air_data_2$month <- month(air_data_2$date_time_utc)
air_data_2$year_month_day <- as_date(air_data_2$date_time_utc) # as_date() keeps only the date part of the datetime
air_data_2$week_day <- wday(air_data_2$date_time_utc, week_start = getOption("lubridate.week.start", 1))
air_data_2$hour <- hour(air_data_2$date_time_utc)
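
As a quick illustration of what these helper columns contain, the sketch below applies the same lubridate functions to a single timestamp. The timestamp is an arbitrary example and the UTC time zone is an assumption.

```r
library(lubridate)

# One example timestamp (13 December 2008 was a Saturday)
ts_example <- ymd_hms("2008-12-13 14:00:00", tz = "UTC")

month(ts_example)                 # month of the year as a number
as_date(ts_example)               # calendar date without the time of day
wday(ts_example, week_start = 1)  # day of the week, counting Monday as 1
hour(ts_example)                  # hour of the day (0-23)
```

With week_start = 1 the weekend days get the values 6 and 7, which makes the week-day plots easier to read than the default Sunday-first numbering.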

We take a look at the general trend of several indicators over the last 18 years.

# We calculate the yearly mean of the pollutant levels.
year_avgs <- air_data_2 %>% select(station_alias, date_time_utc, PM10, PM25, SO2, NO2, NO, O3, BEN, CO, MXIL, TOL) %>%
  group_by(station_alias, year = year(date_time_utc)) %>%
  summarise_all(funs(mean(., na.rm = TRUE))) %>% 
  select(-date_time_utc) # We drop this variable

# We convert the table to long format

year_avgs_long <- gather(year_avgs, contaminante, value, 3:length(year_avgs)) %>% 
                    filter(!(station_alias == 'Constitución' & year == 2006 & contaminante %in% c('BEN', 'MXIL', 'TOL'))) %>% # We exclude these values because the series is only 1% complete in that year
                    filter(!(station_alias == 'Constitución' & year == 2008 & contaminante == 'PM25')) # We exclude these values because the series is only 2% complete in that year

# We present the data in a grid of graphs

ggplot(year_avgs_long, aes(x = year, y = value)) + 
  geom_line() + 
  facet_grid(contaminante~station_alias,scales="free_y") +
   theme(axis.text = element_text(size = 6))
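
The wide-to-long step used above can also be written with pivot_longer(), which selects the measurement columns by name instead of by position. This is only a sketch on a made-up table (the values are assumptions) and it assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Toy yearly averages (hypothetical values), one column per pollutant
toy_avgs <- tibble(
  station_alias = c("A", "B"),
  year          = c(2000, 2000),
  PM10          = c(40, 35),
  NO2           = c(30, 25)
)

# One row per station, year and pollutant; the identifier columns are kept as they are
toy_long <- toy_avgs %>%
  pivot_longer(cols = -c(station_alias, year),
               names_to = "contaminante",
               values_to = "value")

toy_long
```

Selecting by name means the reshaping keeps working if columns are added to or removed from the table of averages.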

We drop the Santa Bárbara and Montevil stations. They have much less data and their variables behave quite differently (they are suburban stations), so we leave them out of the analysis for now.

air_data_3 <- air_data_2 %>% filter(station_alias != 'Montevil' , 
                                         station_alias != 'Santa Bárbara' )
# We calculate the yearly mean of the pollutant levels.
year_avgs <- air_data_3 %>% select(station_alias, date_time_utc, PM10, PM25, SO2, NO2, NO, O3, BEN, CO, MXIL, TOL) %>%
  group_by(station_alias, year = year(date_time_utc)) %>%
  summarise_all(funs(mean(., na.rm = TRUE))) %>% 
  select(-date_time_utc) # We drop this variable because its mean is meaningless.

# We convert the table to long format

year_avgs_long <- gather(year_avgs, contaminante, value, 3:length(year_avgs)) %>%
                    filter(!(station_alias == 'Constitución' & year == 2006 & contaminante %in% c('BEN', 'MXIL', 'TOL'))) %>% # We exclude these values because the series is only 1% complete in that year
                    filter(!(station_alias == 'Constitución' & year == 2008 & contaminante == 'PM25')) # We exclude these values because the series is only 2% complete in that year

# We present the data in a grid of graphs

ggplot(year_avgs_long, aes(x = year, y = value)) + 
  geom_line() + 
  facet_grid(contaminante~station_alias,scales="free_y") +
   theme(axis.text = element_text(size = 6))

# We calculate the mean of the pollutant levels for each hour of the day.
hour_avgs <- air_data_3 %>% select(station_alias, hour, PM10, PM25, SO2, NO2, NO, O3, BEN, CO, MXIL, TOL) %>%
  group_by(station_alias, hour) %>%
  summarise_all(funs(mean(., na.rm = TRUE)))

# We convert the table to long format

hour_avgs_long <- gather(hour_avgs, contaminante, value, 3:length(hour_avgs))

# We present the data in a grid of graphs

ggplot(hour_avgs_long, aes(x = hour, y = value)) + 
  geom_line() + 
  facet_grid(contaminante~station_alias,scales="free_y") +
   theme(axis.text = element_text(size = 6))

# We calculate the mean of the pollutant levels for each month of the year.
month_avgs <- air_data_3 %>% select(station_alias, month, PM10, PM25, SO2, NO2, NO, O3, BEN, CO, MXIL, TOL) %>%
  group_by(station_alias, month) %>%
  summarise_all(funs(mean(., na.rm = TRUE)))

# We convert the table to long format

month_avgs_long <- gather(month_avgs, contaminante, value, 3:length(month_avgs))

# We present the data in a grid of graphs

ggplot(month_avgs_long, aes(x = month, y = value)) + 
  geom_line() + 
  facet_grid(contaminante~station_alias,scales="free_y") +
   theme(axis.text = element_text(size = 6))

# We calculate the mean of the pollutant levels for each day of the week.
week_day_avgs <- air_data_3 %>% select(station_alias, week_day, PM10, PM25, SO2, NO2, NO, O3, BEN, CO, MXIL, TOL) %>%
  group_by(station_alias, week_day) %>%
  summarise_all(funs(mean(., na.rm = TRUE)))

# We convert the table to long format

week_day_avgs_long <- gather(week_day_avgs, contaminante, value, 3:length(week_day_avgs))

# We present the data in a grid of graphs

ggplot(week_day_avgs_long, aes(x = week_day, y = value)) + 
  geom_line() + 
  facet_grid(contaminante~station_alias,scales="free_y") +
   theme(axis.text = element_text(size = 6))

Prediction models

We are going to use ARIMA as the base model for our predictions. As a first step, we will try to predict the PM10 values for the Constitución station.

We create the dataset pm10 with the PM10 values from the Constitución station and run a summary.

pm10 <- air_data_3 %>% filter(station_alias == 'Constitución') %>%
                        select(date_time_utc, PM10)

summary(pm10)
##  date_time_utc                      PM10       
##  Min.   :2000-01-01 00:00:00   Min.   :  0.00  
##  1st Qu.:2004-06-30 23:15:00   1st Qu.: 19.00  
##  Median :2009-01-02 00:30:00   Median : 29.00  
##  Mean   :2008-12-31 18:50:45   Mean   : 34.39  
##  3rd Qu.:2013-07-02 23:45:00   3rd Qu.: 44.00  
##  Max.   :2017-12-31 23:00:00   Max.   :888.00  
##                                NA's   :3106

A quarter of the values lie between 44 and 888. The maximum, 888, is very extreme. How many extreme values (outliers) are there in this series? We plot all the values to find out:

ggplot(pm10, aes(x = date_time_utc, y = PM10)) + 
         geom_point(alpha = 0.1)

There are very few values above 250, so outliers do not seem to be a problem (pending: is a PM10 level of 888 plausible, or is it likely to be a monitoring error?).
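
The visual impression can be backed with a simple count. The sketch below uses a short made-up vector instead of the real series, and the threshold of 250 is simply the value read off the plot, not an official limit.

```r
# Toy PM10 readings (hypothetical), including a missing value
pm10_values <- c(19, 29, 44, 260, NA, 888, 35)

threshold <- 250

# Number of readings above the threshold, ignoring missing values
n_extreme <- sum(pm10_values > threshold, na.rm = TRUE)

# Share of the non-missing readings that exceed the threshold
share_extreme <- n_extreme / sum(!is.na(pm10_values))

n_extreme
share_extreme
```

On the project data the same two lines could be applied to pm10$PM10 to quantify how rare the extreme values are.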

Daily averages

We create a new dataset with the PM10 daily averages and we plot them in a new graphic. We add a trend line too. There is a clear downward trend in the measurements and we have many fewer extreme values during the last decade. It seems like we have two very clear “epochs” in the data, before and after the year 2008.

pm10_day_avg <- pm10 %>% group_by(day = date(date_time_utc)) %>%
                          summarise(day_avg = mean(PM10, na.rm = TRUE))

ggplot(pm10_day_avg, aes(x = day, y = day_avg, colour = day_avg)) + 
         geom_point(alpha = 0.5) +
         geom_smooth(color = "grey", alpha = 0.2) +
         scale_colour_gradientn(colours = terrain.colors(10)) +
         theme(legend.position = c(0.3, 0.9),
                legend.background = element_rect(colour = "transparent", fill = NA), legend.direction = "horizontal") +
         labs(colour = "PM10 daily average (colour scale)", x = "Year", y = "PM10 daily average", title = "PM10 daily average - 2000-2017 evolution (Constitución Station)")

The last graph shows a very clear trend over the years. But, as we already saw in the grid plots, other patterns are present at the same time.

year_const <- year_avgs_long %>% filter(station_alias == "Constitución", contaminante == 'PM10')
plot1 <- ggplot(year_const, aes(x = year, y = value)) + 
  geom_line()

month_const <- month_avgs_long %>% filter(station_alias == "Constitución", contaminante == 'PM10')
plot2 <- ggplot(month_const, aes(x = month, y = value)) + 
  geom_line()

week_day_const <- week_day_avgs_long %>% filter(station_alias == "Constitución", contaminante == 'PM10')
plot3 <- ggplot(week_day_const, aes(x = week_day, y = value)) + 
  geom_line()

hour_const <- hour_avgs_long %>% filter(station_alias == "Constitución", contaminante == 'PM10')
plot4 <- ggplot(hour_const, aes(x = hour, y = value)) + 
  geom_line()

grid.arrange(plot1, plot2, plot3, plot4, ncol = 2)

As a first step we are going to try to predict the monthly levels of PM10. We create a time series object with the monthly averages.

year_month_pm10 <- pm10 %>% group_by(year = year(date_time_utc), month = month(date_time_utc)) %>%
                            summarise(year_month_avg = mean(PM10, na.rm = TRUE)) %>%
                            ungroup()

# We keep the numeric year and month columns (remove = FALSE) so that we can filter on them below
year_month_pm10 <- year_month_pm10 %>% unite("year_month", c("year", "month"), sep = "-", remove = FALSE)

pm10_month_ts <- ts(year_month_pm10$year_month_avg, start = 2000, frequency = 12)

# We create another time series object with the period 2000-2008
# (filtering on the numeric year avoids comparing unpadded strings such as "2009-1")

year_month_pm10_1 <- year_month_pm10 %>% filter(year <= 2008)

pm10_1st_ts <- ts(year_month_pm10_1$year_month_avg, start = 2000, frequency = 12)

# We create another time series object with the period 2009-2017

year_month_pm10_2 <- year_month_pm10 %>% filter(year >= 2009)

pm10_2nd_ts <- ts(year_month_pm10_2$year_month_avg, start = 2009, frequency = 12)

We have a time series of 216 observations: 18 years x 12 months (2000-2017).

glimpse(pm10_month_ts)
##  Time-Series [1:216] from 2000 to 2018: 61.3 62.2 62.3 41.1 50.4 ...
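
Once a monthly ts object exists, sub-periods can also be cut directly from it with window(), without going back to the data frame. The sketch below uses a simulated series of the same length; the values are random and purely illustrative.

```r
# Simulated monthly series covering 2000-2017 (216 values, made up)
set.seed(1)
monthly_ts <- ts(rnorm(216, mean = 35, sd = 6), start = c(2000, 1), frequency = 12)

# First period: January 2000 to December 2008
first_period <- window(monthly_ts, end = c(2008, 12))

# Second period: January 2009 to December 2017
second_period <- window(monthly_ts, start = c(2009, 1))

length(first_period)
length(second_period)
start(second_period)
```

window() keeps the time index of the original series, so the start year and the frequency do not have to be specified again.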

We plot the complete series. As we saw before with the daily averages, we identify two obvious things at first sight: The general downward trend during the whole series and at least two very different cycles in the data, the first between 2000 and 2007, and the second from 2008 to the end of the series. Beyond this, the first cycle seems to be much more irregular than the second one. This fact can play an important role in choosing the period to fit a prediction model.

autoplot(pm10_month_ts)

We draw a seasonal plot for the whole series, but it is hard to read.

ggseasonplot(pm10_month_ts, year.labels=TRUE, year.labels.left=TRUE) +
  ylab("PM10") +
  ggtitle("Seasonal plot: PM10 - Constitución")

If we restrict the range to 2009-2017, the seasonal pattern is easier to see. The year starts with low PM10 levels (January and February). There is a peak in March and a decrease in April-May (with some exceptions). The levels tend to rise in September and October and fall in November, and the year ends with another rise in December.

ggseasonplot(pm10_2nd_ts, year.labels=TRUE, year.labels.left=TRUE) +
  ylab("PM10") +
  ggtitle("Seasonal plot: PM10 - Constitución (2009-2017)")

pm10_month_ts %>% ggtsdisplay(main="")

As the data are non-stationary, we take first differences with the diff function.

pm10_diff <- diff(pm10_month_ts) 

We plot the differenced series and the ACF and PACF plots

pm10_diff %>% ggtsdisplay(main="")

The auto-correlation graphs show certain patterns which could be caused by seasonal effects.
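
The decision to difference can be cross-checked with ndiffs() from the forecast package, which suggests a number of first differences based on a unit-root test (KPSS by default). The sketch below uses a simulated random walk, not the PM10 series, so the numbers it returns say nothing about our data.

```r
library(forecast)

set.seed(42)
# Simulated monthly series: a random walk, which is non-stationary by construction
rw <- ts(cumsum(rnorm(120)), start = c(2009, 1), frequency = 12)

# Number of first differences suggested for the original series
d_level <- ndiffs(rw)

# The same check after differencing once
rw_diff <- diff(rw)
d_diff <- ndiffs(rw_diff)

c(level = d_level, differenced = d_diff)
```

For the project data, pm10_month_ts would take the place of rw. The companion function nsdiffs() does the same for seasonal differences.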

We generate an ARIMA model with the auto.arima function.

fit <- auto.arima(pm10_month_ts)
fit
## Series: pm10_month_ts 
## ARIMA(1,1,2)(2,0,0)[12] 
## 
## Coefficients:
##           ar1     ma1      ma2    sar1    sar2
##       -0.5343  0.0470  -0.4579  0.1121  0.1403
## s.e.   0.1737  0.1676   0.0855  0.0705  0.0717
## 
## sigma^2 estimated as 38.21:  log likelihood=-694.76
## AIC=1401.52   AICc=1401.93   BIC=1421.75
# Wider search on the full series: no stepwise shortcut, no likelihood approximation
fit <- auto.arima(pm10_month_ts, approximation = FALSE, stepwise = FALSE)
fit
## Series: pm10_month_ts 
## ARIMA(4,1,1) 
## 
## Coefficients:
##          ar1     ar2     ar3      ar4      ma1
##       0.3995  0.0555  0.2414  -0.2766  -0.8703
## s.e.  0.0933  0.0784  0.0762   0.0711   0.0783
## 
## sigma^2 estimated as 35.71:  log likelihood=-687.41
## AIC=1386.83   AICc=1387.23   BIC=1407.05
# Same wider search, restricted to the 2009-2017 series
fit <- auto.arima(pm10_2nd_ts, approximation = FALSE, stepwise = FALSE)
fit
## Series: pm10_2nd_ts 
## ARIMA(0,1,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.5468  -0.3444
## s.e.   0.0934   0.0935
## 
## sigma^2 estimated as 19.94:  log likelihood=-311.64
## AIC=629.27   AICc=629.51   BIC=637.29
autoplot(forecast(fit))

# Default (stepwise) search on the 2009-2017 series
fit <- auto.arima(pm10_2nd_ts)

fit
## Series: pm10_2nd_ts 
## ARIMA(1,1,1)(1,0,0)[12] 
## 
## Coefficients:
##          ar1      ma1    sar1
##       0.3278  -0.9384  0.1470
## s.e.  0.1046   0.0401  0.0991
## 
## sigma^2 estimated as 19.82:  log likelihood=-310.9
## AIC=629.8   AICc=630.19   BIC=640.49
autoplot(forecast(fit))

# Stepwise search without the likelihood approximation; it selects the same model
fit <- auto.arima(pm10_2nd_ts, approximation = FALSE)
fit
## Series: pm10_2nd_ts 
## ARIMA(1,1,1)(1,0,0)[12] 
## 
## Coefficients:
##          ar1      ma1    sar1
##       0.3278  -0.9384  0.1470
## s.e.  0.1046   0.0401  0.0991
## 
## sigma^2 estimated as 19.82:  log likelihood=-310.9
## AIC=629.8   AICc=630.19   BIC=640.49
autoplot(forecast(fit, h = 12))

We think that the model selected by the auto.arima function generates very flat forecasts. The forecast line for 2018 does not seem to capture the variability of the past series. We set the seasonal MA order Q to 1 to try to get a better result.

fit2 <- Arima(pm10_2nd_ts, order=c(1,1,1), seasonal=c(1,0,1))
fit2
## Series: pm10_2nd_ts 
## ARIMA(1,1,1)(1,0,1)[12] 
## 
## Coefficients:
##          ar1      ma1    sar1     sma1
##       0.3722  -0.9463  0.9988  -0.9747
## s.e.  0.1023   0.0405  0.0093   0.0981
## 
## sigma^2 estimated as 16.5:  log likelihood=-306.62
## AIC=623.25   AICc=623.84   BIC=636.61

The shape of this new forecast seems to be closer to the historical data. (Pending: generate more models and compare their metrics.)

autoplot(forecast(fit2, h = 12))
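As a starting point for that comparison, the AICc values already printed above for the three models fitted to pm10_2nd_ts can be put side by side. They are comparable because all three models use the same series and the same differencing (d = 1, D = 0); the two models fitted to the full 2000-2017 series are left out for that reason. The data frame below is only a sketch, with the values copied by hand from the printed summaries.

```r
# AICc values copied from the printed summaries of the pm10_2nd_ts models
models <- data.frame(
  model = c("ARIMA(0,1,2)", "ARIMA(1,1,1)(1,0,0)[12]", "ARIMA(1,1,1)(1,0,1)[12]"),
  aicc  = c(629.51, 630.19, 623.84)
)

# Difference from the best (lowest) AICc
models$delta <- models$aicc - min(models$aicc)

# Sort from best to worst
models[order(models$aicc), ]
```

On this criterion the model with Q = 1 ranks first, which agrees with the visual impression, although a lower AICc does not by itself guarantee better out-of-sample forecasts.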

But we do not have 2018 data yet to test these forecasts. So, in order to test our models, we will have to split the data into two groups, train and test.