For B Assignment: do the following (at least for one station)

Getting Mataró source data

This is how we read the Mataró data from the Excel file and transform it for the assignment. The data for the other cities is obtained in the same way, so we only include this first chunk of code.

library(xlsx)  # provides read.xlsx2
fileName <- "HistoricMesuresMataro_Auto.xlsx"
dataMataro <- read.xlsx2(fileName, sheetIndex = 1, colClasses = NA)
colnames(dataMataro) <- c("Dies","SO2","NO","NO2","O3","CO","PM10")
dataMataro$Dies <- as.POSIXct(as.character(dataMataro$Dies), format="%d/%m/%Y %H:%M:%S", tz="CET")
dataMataro$Day <- as.factor(weekdays(dataMataro$Dies))
# Weekday number (1 = Monday ... 7 = Sunday), taken from the date itself so that
# it does not depend on the locale-specific ordering of the weekday names
wdayMat <- as.POSIXlt(dataMataro$Dies)$wday   # 0 = Sunday ... 6 = Saturday
dataMataro$DayNum <- ifelse(wdayMat == 0, 7, wdayMat)
# Saturday (6) and Sunday (7) are flagged as weekend
dataMataro$Weekend <- dataMataro$DayNum >= 6
  • Are the PM10 concentrations from working days close to those from weekends?
# PM10 may be read as text or factor, so convert it to numeric first
pm10Mat <- as.numeric(as.character(dataMataro$PM10))

# Mean PM10 over weekday hours and weekend hours, ignoring missing values
meanPM10WeekdayMat <- mean(pm10Mat[!dataMataro$Weekend], na.rm = TRUE)
meanPM10WeekendMat <- mean(pm10Mat[dataMataro$Weekend], na.rm = TRUE)

Mataró values

Annual weekday mean value in Passeig dels Molins, Mataró (2016): 20.4972204

On weekdays, this value sits at the level at which total, cardiopulmonary and lung cancer mortality have been shown to increase with more than 95% confidence in response to long-term exposure to PM10.

Annual weekend mean value in Passeig dels Molins, Mataró (2016): 17.0028045

At weekends, the value meets the WHO guideline because it is below the established level.

Barcelona values

Annual weekday mean value in Eixample, Barcelona (2016): 28.2319525

During the week this value is higher than the level at which total, cardiopulmonary and lung cancer mortality have been shown to increase with more than 95% confidence in response to long-term exposure to PM10.

Annual weekend mean value in Eixample, Barcelona (2016): 22.0598726

At weekends, the value remains above the WHO guideline, although PM10 decreases.

Sant Vicenç values

Annual weekday mean value in Carrer de Sant Miquel, Sant Vicenç dels Horts (2016): 31.057971

During the week in Sant Vicenç, the value is much higher than the WHO recommendation, even higher than in Barcelona.

Annual weekend mean value in Carrer de Sant Miquel, Sant Vicenç dels Horts (2016): 23.9100318

At weekends, PM10 levels decrease but remain higher than in the two cities analysed previously. Sant Vicenç therefore stays above the level at which cardiopulmonary and lung cancer mortality have been shown to increase with more than 95% confidence in response to long-term exposure to PM10.
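
Comparing two means by eye does not show whether the weekday/weekend gap is larger than the ordinary hour-to-hour variability. The following is a minimal sketch of one possible check, assuming a Welch two-sample t-test is acceptable here. The values are synthetic stand-ins, not the station measurements, and the names `pm10` and `weekend` are illustrative only.

```r
set.seed(1)
# Synthetic hourly PM10 values (NOT the station data): ten weeks of hourly rows,
# five weekdays followed by two weekend days
weekend <- rep(c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE), each = 24, times = 10)
pm10 <- ifelse(weekend,
               rnorm(length(weekend), mean = 17, sd = 6),
               rnorm(length(weekend), mean = 20, sd = 6))

# Welch two-sample t-test: does the mean differ between weekdays and weekends?
res <- t.test(pm10[!weekend], pm10[weekend])
res$estimate   # the two group means
res$p.value    # a small value suggests the gap is not due to chance alone
```

Hourly readings are correlated with their neighbours, so the independence assumption behind the test is only approximate; repeating the test on daily means would be a more cautious variant.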

  • How many days are under 30? And how many days are under 50?

We created 3 data frames with this information at hourly resolution, and then we counted the days.

library(sqldf)
DaysUnder30Mat<- sqldf("SELECT Dies, PM10 
               FROM dataMataro
               WHERE PM10 <30
               GROUP BY Dies, PM10
               ORDER BY PM10 DESC;"
               )
DaysUnder50Mat <- sqldf("SELECT Dies, PM10 
               FROM dataMataro
               WHERE PM10 >=30 AND PM10 <50
               GROUP BY Dies, PM10
               ORDER BY PM10 DESC;"
               )

Mataró data

Total days under 30 : 348 days

Total days between 30 and 50 : 14 days

Total days above 50 : 3 days

There is 1 day for which data was not available

Barcelona data

Total days under 30 : 256 days

Total days between 30 and 50 : 89 days

Total days above 50 : 6 days

There are 15 days for which data was not available

Sant Vicenç dels Horts data

Total days under 30 : 211 days

Total days between 30 and 50 : 140 days

Total days above 50 : 12 days

There are 3 days for which data was not available
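
The step from hourly rows to day counts can be done in several ways. The following is a minimal sketch, assuming each day is classified by its daily mean PM10 (the rule actually used for the counts above is not stated). The data are synthetic and the names `hourly`, `dailyMean` and `Band` are illustrative.

```r
set.seed(2)
# Synthetic hourly series covering 10 days (NOT the station data)
hourly <- data.frame(
  Dies = seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"), by = "hour", length.out = 240),
  PM10 = runif(240, min = 5, max = 70)
)

# Daily mean per calendar day; rows with missing PM10 are dropped by aggregate()
hourly$Date <- as.Date(hourly$Dies, tz = "UTC")
dailyMean <- aggregate(PM10 ~ Date, data = hourly, FUN = mean)

# Classify each day into [0, 30), [30, 50) or [50, Inf)
dailyMean$Band <- cut(dailyMean$PM10, breaks = c(0, 30, 50, Inf), right = FALSE,
                      labels = c("under 30", "30 to 50", "50 or more"))
table(dailyMean$Band)
```

Days with no valid reading simply do not appear in `dailyMean`, which is one way to obtain the "data not available" count by comparing against the number of days in the year.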

  • Which is the best estimation for sigma?
bestforSigma30Mat <- sd (DaysUnder30Mat$PM10)
bestforSigma50Mat <- sd (DaysUnder50Mat$PM10)
bestTotalSigmaMat <- (bestforSigma30Mat+bestforSigma50Mat)/2

The best estimate for sigma is 5.4817576 according to the Mataró data.

The best estimate for sigma is 5.931732 according to the Barcelona data.

The best estimate for sigma is 5.8700496 according to the Sant Vicenç dels Horts data.
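
Averaging the standard deviations of two truncated subsets tends to give a smaller figure than the spread of the whole series, because each subset covers only part of the range. For comparison, here is a minimal sketch that assumes sigma is meant as the standard deviation of all valid hourly values. The data are synthetic and the variable names are illustrative.

```r
set.seed(3)
# Synthetic right-skewed values with one missing reading (NOT the station data)
pm10 <- c(rlnorm(1000, meanlog = 3, sdlog = 0.4), NA)

# Sample standard deviation of the whole series, ignoring missing values
sigmaAll <- sd(pm10, na.rm = TRUE)

# The approach used above: average the sd of the <30 subset and the [30, 50) subset
under30 <- pm10[!is.na(pm10) & pm10 < 30]
under50 <- pm10[!is.na(pm10) & pm10 >= 30 & pm10 < 50]
sigmaAvg <- (sd(under30) + sd(under50)) / 2

c(whole = sigmaAll, averaged = sigmaAvg)
```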

  • Do the data come from a Gaussian distribution?
hist(dataMataro$PM10,freq = FALSE,xlim = range(1:162))

hist(dataBcn$PM10,freq = FALSE,xlim = range(1:541))

hist(dataVicens$PM10,freq = FALSE,xlim = range(1:473))

According to this representation, the data does not seem to come from a Gaussian distribution.
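
A histogram gives a visual impression only. The following is a minimal sketch of two common complements, a normal Q-Q plot and the Shapiro-Wilk test, run on synthetic right-skewed data rather than on the station measurements.

```r
set.seed(4)
pm10 <- rlnorm(500, meanlog = 3, sdlog = 0.5)  # synthetic right-skewed data

# Q-Q plot: points bending away from the line indicate departure from normality
qqnorm(pm10)
qqline(pm10)

# Shapiro-Wilk test: a small p-value rejects the hypothesis of normality
pRaw <- shapiro.test(pm10)$p.value
# Concentrations are often closer to normal on the log scale
pLog <- shapiro.test(log(pm10))$p.value
c(raw = pRaw, log = pLog)
```

`shapiro.test` accepts between 3 and 5000 values, so a full year of hourly readings would have to be subsampled or reduced to daily means before applying it.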

For A Assignment (additional to B)

library(lubridate)
# Dies is already POSIXct, so the hour can be extracted without re-parsing it
MataroDiesDate <- dataMataro$Dies
MataroDiesHour <- hour(MataroDiesDate)
worstHoursMat <- data.frame(MataroDiesHour,dataMataro$PM10)
colnames(worstHoursMat)[2] <- "dataMataroPM10"
h0Mat <- vector()
h1Mat <- vector()
h2Mat <- vector()
h3Mat <- vector()
h4Mat <- vector()
h5Mat <- vector()
h6Mat <- vector()
h7Mat <- vector()
h8Mat <- vector()
h9Mat <- vector()
h10Mat <- vector()
h11Mat <- vector()
h12Mat <- vector()
h13Mat <- vector()
h14Mat <- vector()
h15Mat <- vector()
h16Mat <- vector()
h17Mat <- vector()
h18Mat <- vector()
h19Mat <- vector()
h20Mat <- vector()
h21Mat <- vector()
h22Mat <- vector()
h23Mat <- vector()

indexMatH <-1
for (h1 in worstHoursMat$MataroDiesHour){
  if(is.nan(h1)==FALSE){
    for(h11 in worstHoursMat$dataMataroPM10[indexMatH]){
      if(is.nan(h11)==FALSE){
        if(h1==0){
          h0Mat <- c(h0Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==1){
          h1Mat <- c(h1Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==2){
          h2Mat <- c(h2Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==3){
          h3Mat <- c(h3Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==4){
          h4Mat <- c(h4Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==5){
          h5Mat <- c(h5Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==6){
          h6Mat <- c(h6Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==7){
          h7Mat <- c(h7Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==8){
          h8Mat <- c(h8Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==9){
          h9Mat <- c(h9Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==10){
          h10Mat <- c(h10Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==11){
          h11Mat <- c(h11Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==12){
          h12Mat <- c(h12Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==13){
          h13Mat <- c(h13Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==14){
          h14Mat <- c(h14Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==15){
          h15Mat <- c(h15Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==16){
          h16Mat <- c(h16Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==17){
          h17Mat <- c(h17Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==18){
          h18Mat <- c(h18Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==19){
          h19Mat <- c(h19Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==20){
          h20Mat <- c(h20Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==21){
          h21Mat <- c(h21Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==22){
          h22Mat <- c(h22Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
        else if(h1==23){
          h23Mat <- c(h23Mat,worstHoursMat$dataMataroPM10[indexMatH])
        }
      }
    }
  }
  indexMatH=indexMatH+1
}
meanH0Mat <- sum(h0Mat,na.rm = TRUE)/length(h0Mat)
meanH1Mat <- sum(h1Mat,na.rm = TRUE)/length(h1Mat)
meanH2Mat <- sum(h2Mat,na.rm = TRUE)/length(h2Mat)
meanH3Mat <- sum(h3Mat,na.rm = TRUE)/length(h3Mat)
meanH4Mat <- sum(h4Mat,na.rm = TRUE)/length(h4Mat)
meanH5Mat <- sum(h5Mat,na.rm = TRUE)/length(h5Mat)
meanH6Mat <- sum(h6Mat,na.rm = TRUE)/length(h6Mat)
meanH7Mat <- sum(h7Mat,na.rm = TRUE)/length(h7Mat)
meanH8Mat <- sum(h8Mat,na.rm = TRUE)/length(h8Mat)
meanH9Mat <- sum(h9Mat,na.rm = TRUE)/length(h9Mat)
meanH10Mat <- sum(h10Mat,na.rm = TRUE)/length(h10Mat)
meanH11Mat <- sum(h11Mat,na.rm = TRUE)/length(h11Mat)
meanH12Mat <- sum(h12Mat,na.rm = TRUE)/length(h12Mat)
meanH13Mat <- sum(h13Mat,na.rm = TRUE)/length(h13Mat)
meanH14Mat <- sum(h14Mat,na.rm = TRUE)/length(h14Mat)
meanH15Mat <- sum(h15Mat,na.rm = TRUE)/length(h15Mat)
meanH16Mat <- sum(h16Mat,na.rm = TRUE)/length(h16Mat)
meanH17Mat <- sum(h17Mat,na.rm = TRUE)/length(h17Mat)
meanH18Mat <- sum(h18Mat,na.rm = TRUE)/length(h18Mat)
meanH19Mat <- sum(h19Mat,na.rm = TRUE)/length(h19Mat)
meanH20Mat <- sum(h20Mat,na.rm = TRUE)/length(h20Mat)
meanH21Mat <- sum(h21Mat,na.rm = TRUE)/length(h21Mat)
meanH22Mat <- sum(h22Mat,na.rm = TRUE)/length(h22Mat)
meanH23Mat <- sum(h23Mat,na.rm = TRUE)/length(h23Mat)
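
The per-hour vectors and means above can also be obtained in one step by grouping. The following is a minimal sketch using `tapply` on a synthetic stand-in for `worstHoursMat`; the data frame `worstHours` and its column names are illustrative, not the ones used in the report.

```r
set.seed(5)
# Synthetic stand-in for worstHoursMat (NOT the station data)
worstHours <- data.frame(
  hour = rep(0:23, times = 30),
  pm10 = runif(24 * 30, min = 5, max = 40)
)
worstHours$pm10[c(3, 50)] <- NaN  # a couple of missing readings

# Mean PM10 per hour of day, skipping NaN/NA values
meanByHour <- tapply(worstHours$pm10, worstHours$hour, mean, na.rm = TRUE)

# The five hours with the highest mean concentration
worst5 <- head(sort(meanByHour, decreasing = TRUE), 5)
worst5
```

The names of `worst5` are the hours of the day, so the ranking can be read directly from the printed vector.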

Mataró Results

According to the hourly means, the 5 worst hours in Passeig dels Molins, Mataró are:

  1. 21H: 23.8944444
  2. 20H: 23.7660167
  3. 9H: 22.9073034
  4. 22H: 22.9073034
  5. 19H: 22.6935933

Barcelona Results

According to the hourly means, the 5 worst hours in Eixample, Barcelona are:

  1. 9H: 32.7567568
  2. 12H: 32.6132075
  3. 17H: 32.1745562
  4. 11H: 31.4858934
  5. 18H: 30.9061584

Sant Vicenç dels Horts Results

According to the hourly means, the 5 worst hours in Carrer de Sant Miquel, Sant Vicenç dels Horts are:

  1. 9H: 41.1843575
  2. 8H: 39.6768802
  3. 10H: 37.0878187
  4. 11H: 33.3323864
  5. 7H: 33.175
# From 0 to 6
Concentrations0To6Mat <- sqldf("SELECT MataroDiesHour,dataMataroPM10
                            FROM worstHoursMat
                            WHERE MataroDiesHour BETWEEN 0 AND 6")
meanCon0to6Mat <- mean(Concentrations0To6Mat$dataMataroPM10,na.rm = TRUE)

Concentrations0To6Bcn <- sqldf("SELECT BcnDiesHour,dataBcnPM10
                            FROM worstHoursBcn
                            WHERE BcnDiesHour BETWEEN 0 AND 6")
meanCon0to6Bcn <- mean(Concentrations0To6Bcn$dataBcnPM10,na.rm = TRUE)

Concentrations0To6Vicens <- sqldf("SELECT VicensDiesHour,dataVicensPM10
                            FROM worstHoursVicens
                            WHERE VicensDiesHour BETWEEN 0 AND 6")
meanCon0to6Vicens <- mean(Concentrations0To6Vicens$dataVicensPM10,na.rm = TRUE)

# From 6 to 10

Concentrations6To10Mat <- sqldf("SELECT MataroDiesHour,dataMataroPM10
                            FROM worstHoursMat
                            WHERE MataroDiesHour BETWEEN 6 AND 10")

meanCon6To10Mat <- mean(Concentrations6To10Mat$dataMataroPM10,na.rm = TRUE)

Concentrations6To10Bcn <- sqldf("SELECT BcnDiesHour,dataBcnPM10
                            FROM worstHoursBcn
                            WHERE BcnDiesHour BETWEEN 6 AND 10")

meanCon6To10Bcn <- mean(Concentrations6To10Bcn$dataBcnPM10,na.rm = TRUE)

Concentrations6To10Vicens <- sqldf("SELECT VicensDiesHour,dataVicensPM10
                            FROM worstHoursVicens
                            WHERE VicensDiesHour BETWEEN 6 AND 10")

meanCon6To10Vicens <- mean(Concentrations6To10Vicens$dataVicensPM10,na.rm = TRUE)

Mataró Data

Annual mean from 0 h to 6 h is 15.523242

Annual mean from 6 h to 10 h is 19.9606299

Barcelona data

Annual mean from 0 h to 6 h is 18.3941545

Annual mean from 6 h to 10 h is 26.9802158

Sant Vicenç dels Horts data

Annual mean from 0 h to 6 h is 24.6302187

Annual mean from 6 h to 10 h is 35.7346369

  • Are the PM10 concentrations from 15:00 to 20:00 close to the ones from 6:00 to 10:00?
Concentrations15To20Mat <- sqldf("SELECT MataroDiesHour,dataMataroPM10
                            FROM worstHoursMat
                            WHERE MataroDiesHour BETWEEN 15 AND 20")
meanCon15to20Mat <- mean(Concentrations15To20Mat$dataMataroPM10,na.rm = TRUE)

Concentrations15To20Bcn <- sqldf("SELECT BcnDiesHour,dataBcnPM10
                            FROM worstHoursBcn
                            WHERE BcnDiesHour BETWEEN 15 AND 20")
meanCon15to20Bcn <- mean(Concentrations15To20Bcn$dataBcnPM10,na.rm = TRUE)

Concentrations15To20Vicens <- sqldf("SELECT VicensDiesHour,dataVicensPM10
                            FROM worstHoursVicens
                            WHERE VicensDiesHour BETWEEN 15 AND 20")
meanCon15to20Vicens <- mean(Concentrations15To20Vicens$dataVicensPM10,na.rm = TRUE)

Mataró Data

Annual mean from 15 h to 20 h is 20.7110495

Annual mean from 6 h to 10 h is 19.9606299

Barcelona data

Annual mean from 15 h to 20 h is 30.804914

Annual mean from 6 h to 10 h is 26.9802158

Sant Vicenç dels Horts data

Annual mean from 15 h to 20 h is 27.3997214

Annual mean from 6 h to 10 h is 35.7346369
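
The repeated window queries follow one pattern: select the rows whose hour lies in a closed interval and average them. The following is a minimal sketch of that pattern as a single function; `windowMean` is a hypothetical helper, and the data are synthetic.

```r
set.seed(6)
# Synthetic hourly data (NOT the station data)
hours <- data.frame(hour = rep(0:23, times = 20), pm10 = runif(480, min = 5, max = 40))

# Hypothetical helper: mean PM10 for hours in [from, to], both ends inclusive,
# which matches the semantics of SQL BETWEEN
windowMean <- function(df, from, to) {
  mean(df$pm10[df$hour >= from & df$hour <= to], na.rm = TRUE)
}

c(h0to6   = windowMean(hours, 0, 6),
  h6to10  = windowMean(hours, 6, 10),
  h15to20 = windowMean(hours, 15, 20))
```

Because both ends are inclusive, the 6 h readings contribute to both the 0 to 6 window and the 6 to 10 window, exactly as in the `BETWEEN` queries.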

  • Show a good representation to figure out the best and worst hours using 3 h lengths (from 0 to 2, from 3 to 5, from 6 to 8, from 9 to 11)

Mataró Graphical Representation

Mataró From 0h to 2h
library(ggplot2)
library(plotly)
# From 0 to 2
Concentrations0To2Mat <- sqldf("SELECT MataroDiesHour,dataMataroPM10
                            FROM worstHoursMat
                            WHERE MataroDiesHour BETWEEN 0 AND 2")
# One box per hour; group is needed because the hour is a continuous variable
plotConcen0To2Mat <- ggplot(Concentrations0To2Mat,aes(y=dataMataroPM10,x=MataroDiesHour,group=MataroDiesHour,color=MataroDiesHour))+geom_boxplot()+theme(legend.position="none")+xlab("")+ylab("PM10")+geom_hline(aes(yintercept=20),color="blue",linetype = "dashed")
gg2Concen0To2Mat <- ggplotly(plotConcen0To2Mat)
gg2Concen0To2Mat
Mataró From 3h to 5h
Concentrations3To5Mat <- sqldf("SELECT MataroDiesHour,dataMataroPM10
                            FROM worstHoursMat
                            WHERE MataroDiesHour BETWEEN 3 AND 5")
# One box per hour; group is needed because the hour is a continuous variable
plotConcen3To5Mat <- ggplot(Concentrations3To5Mat,aes(y=dataMataroPM10,x=MataroDiesHour,group=MataroDiesHour,color=MataroDiesHour))+geom_boxplot()+theme(legend.position="none")+xlab("")+ylab("PM10")+geom_hline(aes(yintercept=20),color="blue",linetype = "dashed")
gg2Concen3To5Mat <- ggplotly(plotConcen3To5Mat)
gg2Concen3To5Mat
Mataró From 6h to 8h
Concentrations6To8Mat <- sqldf("SELECT MataroDiesHour,dataMataroPM10
                            FROM worstHoursMat
                            WHERE MataroDiesHour BETWEEN 6 AND 8")
# One box per hour; group is needed because the hour is a continuous variable
plotConcen6To8Mat <- ggplot(Concentrations6To8Mat,aes(y=dataMataroPM10,x=MataroDiesHour,group=MataroDiesHour,color=MataroDiesHour))+geom_boxplot()+theme(legend.position="none")+xlab("")+ylab("PM10")+geom_hline(aes(yintercept=20),color="blue",linetype = "dashed")
gg2Concen6To8Mat <- ggplotly(plotConcen6To8Mat)
gg2Concen6To8Mat
Mataró From 9h to 11h
Concentrations9To11Mat <- sqldf("SELECT MataroDiesHour,dataMataroPM10
                            FROM worstHoursMat
                            WHERE MataroDiesHour BETWEEN 9 AND 11")
# One box per hour; group is needed because the hour is a continuous variable
plotConcen9To11Mat <- ggplot(Concentrations9To11Mat,aes(y=dataMataroPM10,x=MataroDiesHour,group=MataroDiesHour,color=MataroDiesHour))+geom_boxplot()+theme(legend.position="none")+xlab("")+ylab("PM10")+geom_hline(aes(yintercept=20),color="blue",linetype = "dashed")
gg2Concen9To11Mat <- ggplotly(plotConcen9To11Mat)
gg2Concen9To11Mat
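
The four window plots can also be placed side by side in a single figure, which makes the best and worst windows easier to compare. The following is a minimal sketch that uses base graphics instead of ggplot2 only to keep the example free of dependencies. The data are synthetic, and the windows are extended to the whole day as an assumption.

```r
set.seed(7)
# Synthetic hourly data (NOT the station data)
hours <- data.frame(hour = rep(0:23, times = 20), pm10 = runif(480, min = 5, max = 40))

# Label each hour with its 3-hour window: 0-2, 3-5, ..., 21-23
starts <- (hours$hour %/% 3) * 3
hours$window <- factor(paste0(starts, "-", starts + 2),
                       levels = paste0(seq(0, 21, 3), "-", seq(2, 23, 3)))

# One box per window, with the same reference line at 20 as in the plots above
boxplot(pm10 ~ window, data = hours, xlab = "Hours", ylab = "PM10")
abline(h = 20, col = "blue", lty = 2)

# Median per window, to read off the best and worst window numerically
medianByWindow <- tapply(hours$pm10, hours$window, median)
medianByWindow
```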

Barcelona Graphical Representation

Barcelona From 0h to 2h
Barcelona From 3h to 5h
Barcelona From 6h to 8h
Barcelona From 9h to 11h

Sant Vicenç dels Horts Graphical Representation

Sant Vicenç dels Horts From 0h to 2h
Sant Vicenç dels Horts From 3h to 5h
Sant Vicenç dels Horts From 6h to 8h
Sant Vicenç dels Horts From 9h to 11h

For A+ Assignment (additional to A)

  • Can you see any relation between stations looking at PM10 concentrations?

From the results shown, we can see key hours that affect each city differently, depending on the working patterns of the location. For example, Barcelona suffers an increase in PM10 values in the afternoon due to inbound and outbound traffic. These are the worst hours for the city because many people leave their workplace and go back home; as Barcelona provides many jobs and also has a large population, the levels increase.

Mataró shows the opposite pattern of air pollution levels compared to Barcelona. The main reason could be that many of its inhabitants work outside Mataró, so the worst hours match the end of the workday.

Finally, the levels in Sant Vicenç dels Horts were unexpectedly high at first sight, considering that its population is around 30,000 inhabitants. A deeper analysis offers some answers to this initially strange phenomenon. As the map provided below shows, Sant Vicenç is surrounded by important infrastructure that affects its PM10 levels: the northern part has a state highway (N-340) and a highway (B-24), and the eastern part has two highways (A-2 and B-23). After this analysis, it is not so strange that Sant Vicenç has a polluted atmosphere at the worst hours.

  • Can you check if you can acquire meteorological data close to the stations you are using, to see if temperature, humidity or wind direction have some effect?

Mataró station: http://www.meteo.cat/observacions/xema/dades?codi=UP&dia=2016-11-10T00:00Z

Barcelona Station: http://www.meteo.cat/observacions/xema/dades?codi=X4&dia=2016-11-07T00:00Z

Sant Vicenç dels Horts: http://www.meteo.cat/observacions/xema/dades?codi=X8&dia=2016-11-10T00:00Z

In the first case, we have chosen the Cabrils station because it is the station nearest to Mataró. Then, we have used the data from El Raval, which is the closest to the Eixample district. Finally, for Sant Vicenç dels Horts we have used the Zona Universitària station in Barcelona.

In all cases we found that temperature, humidity or wind direction have some effect on the PM10 concentrations. Rain is essential to wash particles out of the atmosphere and clear the accumulated noxious gases, so a lack of rain lets them build up. Regarding temperature, a key concept is thermal inversion: the cooler, denser air gets trapped in the lower layers while the warmer air rises to the upper layers, preventing the two masses from mixing. The cold, polluted and much denser air can then be seen as a compact fog over the city. Finally, if there is neither wind nor rain, the cloud remains over the city like a dirty urban beret until the weather changes. Air pollution usually disperses with some variation in temperature or with the arrival of a pressure front that brings wind or rain.
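
One way to put numbers on such effects is to join the PM10 series with the meteorological series by timestamp and look at correlations. The following is a minimal sketch under several assumptions: both tables are hourly, the column names `Temp`, `Humidity` and `WindSpeed` are hypothetical, and all values are synthetic.

```r
set.seed(8)
times <- seq(as.POSIXct("2016-11-10 00:00:00", tz = "UTC"), by = "hour", length.out = 48)

# Synthetic stand-ins (NOT real measurements); the meteo table lacks the first hour
pm    <- data.frame(Dies = times, PM10 = runif(48, min = 5, max = 40))
meteo <- data.frame(Dies = times[-1],
                    Temp = rnorm(47, mean = 15, sd = 3),
                    Humidity = runif(47, min = 40, max = 90),
                    WindSpeed = runif(47, min = 0, max = 8))

# Inner join on the timestamp: hours missing from either table are dropped
joined <- merge(pm, meteo, by = "Dies")

# Correlation of PM10 with each meteorological variable
corPM10 <- cor(joined$PM10, joined[, c("Temp", "Humidity", "WindSpeed")],
               use = "complete.obs")
corPM10
```

Wind direction is an angle, so a plain correlation coefficient is not meaningful for it; grouping PM10 by compass sector would be a more suitable summary.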

library(leaflet)
m <- leaflet() %>% setView(lng = 2.149697, lat = 41.3804629, zoom = 9)
estacions <- c("Sant Vicenç dels Horts (Ribot - Sant Miquel)","Mataró (Passeig dels molins)","Barcelona (Eixample)","Barcelona - Zona Universitària","Cabrils","Barcelona - el Raval")
long <- c(2.009802,2.443254,2.153822,2.1295895,2.375738,2.168244)
lats <- c(41.392157,41.54716,41.385343,41.3883519,41.516478,41.380229)
dfEstacions <- data.frame(estacions,long,lats)
m %>% addTiles()%>% addMarkers(dfEstacions$long,dfEstacions$lats,label =dfEstacions$estacions)
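
How representative each meteorological station is depends partly on its distance from the air-quality station. The following is a minimal sketch that computes great-circle distances with the haversine formula from the coordinates listed in the map code; `haversineKm` is a hypothetical helper and the Earth is treated as a sphere of radius 6371 km.

```r
# Great-circle distance in km (haversine formula, mean Earth radius 6371 km)
haversineKm <- function(lat1, lon1, lat2, lon2) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(a))
}

# Air-quality station to meteorological station, coordinates from the map code
dMataro <- haversineKm(41.54716, 2.443254, 41.516478, 2.375738)     # Mataró to Cabrils
dBcn    <- haversineKm(41.385343, 2.153822, 41.380229, 2.168244)    # Eixample to el Raval
dVicens <- haversineKm(41.392157, 2.009802, 41.3883519, 2.1295895)  # Sant Vicenç to Zona Universitària
round(c(Mataro = dMataro, Barcelona = dBcn, SantVicens = dVicens), 1)
```

Of the three pairs, the Sant Vicenç dels Horts one is the farthest apart, so its meteorological data should be read with the most caution.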
  • Can you get additional data from all these stations? Try to acquire additional years (say 2015, 2014, 2013, 2012, 2011, 2010, 2009 or 2008). Use some additional year (if possible).

We have used two additional stations: Vilafranca del Penedès and Bellver de Cerdanya.

Vilafranca del Penedès (2015)
fileName <- "HistoricMesures_Vilafranca.xls"
dataVila <- read.xlsx2(fileName, sheetIndex = 1, colClasses = NA)
colnames(dataVila) <- c("Dies","NO","NO2","O3","PM10")
dataVila$Dies <- as.POSIXct(as.character(dataVila$Dies), format="%d/%m/%Y %H:%M:%S", tz="CET")
dataVila$Day <- as.factor(weekdays(dataVila$Dies))
# Weekday number (1 = Monday ... 7 = Sunday), independent of the locale
wdayVila <- as.POSIXlt(dataVila$Dies)$wday   # 0 = Sunday ... 6 = Saturday
dataVila$DayNum <- ifelse(wdayVila == 0, 7, wdayVila)
# Saturday (6) and Sunday (7) are flagged as weekend
dataVila$Weekend <- dataVila$DayNum >= 6

library(knitr)
kable(head(dataVila))
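| Dies                | NO | NO2 | O3 | PM10 | Day    | DayNum | Weekend |
|---------------------|----|-----|----|------|--------|--------|---------|
| 2015-01-01 00:00:00 | 1  | 31  | 16 | 19   | jueves | 4      | FALSE   |
| 2015-01-01 01:00:00 | 2  | 33  | 13 | 19   | jueves | 4      | FALSE   |
| 2015-01-01 02:00:00 | 7  | 36  | 7  | 23   | jueves | 4      | FALSE   |
| 2015-01-01 03:00:00 | 1  | 31  | 14 | 21   | jueves | 4      | FALSE   |
| 2015-01-01 04:00:00 | 1  | 28  | 13 | 21   | jueves | 4      | FALSE   |
| 2015-01-01 05:00:00 | 1  | 25  | 22 | 20   | jueves | 4      | FALSE   |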
Bellver de Cerdanya (2010)
fileName <- "HistoricMesures_BellverCerdanya.xls"
dataCerdanya <- read.xlsx2(fileName, sheetIndex = 1, colClasses = NA)
colnames(dataCerdanya) <- c("Dies","NO","NO2","O3","PM10")
dataCerdanya$Dies <- as.POSIXct(as.character(dataCerdanya$Dies), format="%d/%m/%Y %H:%M:%S", tz="CET")
dataCerdanya$Day <- as.factor(weekdays(dataCerdanya$Dies))
# Weekday number (1 = Monday ... 7 = Sunday), independent of the locale
wdayCerdanya <- as.POSIXlt(dataCerdanya$Dies)$wday   # 0 = Sunday ... 6 = Saturday
dataCerdanya$DayNum <- ifelse(wdayCerdanya == 0, 7, wdayCerdanya)
# Saturday (6) and Sunday (7) are flagged as weekend
dataCerdanya$Weekend <- dataCerdanya$DayNum >= 6
# Preview the Bellver de Cerdanya data
kable(head(dataCerdanya))
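
Since the same weekday columns are derived for every station, the step can be wrapped in a function and reused. The following is a minimal sketch; `addDayColumns` is a hypothetical helper that assumes the data frame has a POSIXct column named `Dies`.

```r
# Hypothetical helper: add Day, DayNum (1 = Monday ... 7 = Sunday) and Weekend
# columns to a station data frame that has a POSIXct column named Dies
addDayColumns <- function(df) {
  wday <- as.POSIXlt(df$Dies)$wday        # 0 = Sunday ... 6 = Saturday
  df$Day <- as.factor(weekdays(df$Dies))  # weekday name in the current locale
  df$DayNum <- ifelse(wday == 0, 7, wday)
  df$Weekend <- df$DayNum >= 6
  df
}

# Usage on a tiny example: 2015-01-01 was a Thursday, 2015-01-03 a Saturday
tiny <- data.frame(
  Dies = as.POSIXct(c("2015-01-01 00:00:00", "2015-01-03 12:00:00"), tz = "CET")
)
tinyOut <- addDayColumns(tiny)
tinyOut[, c("DayNum", "Weekend")]
```

With such a helper, each additional station needs only its read step followed by one call, for example `dataVila <- addDayColumns(dataVila)`.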