Business Problem :
Into the data
According to the World Health Organization (WHO), COVID-19 is a global pandemic and a public health emergency. The infectious disease COVID-19 could survive for hours in the environment and was contagious. As a result, this study investigates the relationship between COVID-19 distribution in Jakarta, Indonesia, and air pollutants (PM2.5, PM10, CO, SO2, NO2, and O3). Additionally, it assesses how large-scale social restrictions (LSSR) affect the air pollution index (API). Data from COVID-19 and air pollutants were examined during four time periods.
The global COVID-19 pandemic has highlighted the close connection between human health and the environment. The governments of each country have carried out various lockdown strategies to contain the spread of COVID-19. In Indonesia, COVID-19 has significantly affected the economic mobility, industry, companies, environment, and government policy arrangements and caused severe economic losses. Restrictions on human activities during the lockdown have contributed to reducing global carbon emissions because decreasing human mobility or economic activities, that assumed to reduce air pollutants due to less traffic and fewer industrial activities due to the COVID-19 lockdowns.
Data was provided at https://www.airnow.gov/international/us-embassies-and-consulates/ and https://data.jakarta.go.id/dataset/
Data Input and Inspection
Importing the required packages and libraries
Data Input:
Read “.csv” from data folder and remove unnessecary columns
##FOR PM25
south_2019 <- read.csv("data-input/pm25/2019.csv", stringsAsFactors = T)
south_2019$date <- with(south_2019, ymd_h(paste(Year, Month, Day, Hour, sep= ' ')))
south_2019 <- south_2019 %>% select (-c("Parameter", "Date..LT.", "Day", "Hour", "Year", "Month", "Duration", "Conc..Unit"))
south_2020 <- read.csv("data-input/pm25/2020.csv", stringsAsFactors = T)
south_2020$date <- with(south_2020, ymd_h(paste(Year, Month, Day, Hour, sep= ' ')))
south_2020 <- south_2020 %>% select (-c("Parameter", "Date..LT.", "Day", "Hour", "Year", "Month", "Duration", "Conc..Unit"))
south_2021 <- read.csv("data-input/pm25/2021.csv", stringsAsFactors = T)
south_2021$date <- with(south_2021, ymd_h(paste(Year, Month, Day, Hour, sep= ' ')))
south_2021 <- south_2021 %>% select (-c("Parameter", "Date..LT.", "Day", "Hour", "Year", "Month", "Duration", "Conc..Unit"))
##FOR PM10 and Other Poluttan
tbl1920 <- list.files(path = "data-input/ispu",
pattern = "*.csv",
full.names = T) %>% map_df(~read_csv(., col_types = cols(.default = "c")))
tbl1920 <- select(tbl1920, -(11))
tbl21 <- list.files(path = "data-input/ispu/ispu_2021",
pattern = "*.csv",
full.names = T) %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
tbl21 <- tbl21 %>% select(-pm25)
Data Inspection, Transform, Manipulation :
Check the Missing Value of the data
#Check any Missing or Invalid or any N/A in the data frame
colSums(is.na(south_2019))
south_2019 <- south_2019[!(south_2019$QC.Name =="Missing" | south_2019$QC.Name =="Invalid" | south_2019$AQI < 0 | south_2019$NowCast.Conc. < 0 | south_2019$Raw.Conc. < 0 ),]
#Remove the last row because if we bind the data frame it the day after the last day in that year and would be a duplicate
south_2019 <- slice(south_2019, 1:(n() - 1))
#Check any Missing or Invalid or any N/A in the data frame
colSums(is.na(south_2020))
south_2020 <- south_2020[!(south_2020$QC.Name =="Missing" | south_2020$QC.Name =="Invalid" | south_2020$AQI < 0 | south_2020$NowCast.Conc. < 0 | south_2020$Raw.Conc. < 0 ),]
#Remove the last row because if we bind the data frame it the day after the last day in that year and would be a duplicate
south_2020 <- slice(south_2020, 1:(n() - 1))
#Check any Missing or Invalid or any N/A in the data frame
colSums(is.na(south_2021))
south_2021 <- south_2021[!(south_2021$QC.Name =="Missing" | south_2021$QC.Name =="Invalid" | south_2021$AQI < 0 | south_2021$NowCast.Conc. < 0 | south_2021$Raw.Conc. < 0 ),]
#Remove the last row because if we bind the data frame it the day after the last day in that year and would be a duplicate
south_2021 <- slice(south_2021, 1:(n() - 1))
Bind the PM2.5 data into one data frame and arrange the date
tbl_pm25 <- bind_rows(south_2019, south_2020, south_2021)
tbl_pm25 <- tbl_pm25 %>% arrange(date)
Bind the PM10 and other Pollutan data into one data frame and arrange the date
tbl_indicator <- bind_rows(tbl1920, tbl21)
Summarise the mean of the indicator from one hour to a full day (24H)
#summarise the mean of the indicator from one hour to a full day (24H)
tbl_pm25 <- tbl_pm25 %>%
mutate(tanggal = floor_date(date, "day")) %>%
group_by(tanggal) %>%
summarise(pm25 = mean(Raw.Conc.)) %>%
mutate(across(where(is.numeric), round, 0)) %>%
ungroup()
Add the AQI Calculation and Change selected columuns to factor format
#Add the AQI of PM25 as our paramater to analyze this research
tbl_pm25$aqi <- con2aqi("pm25", tbl_pm25$pm25)
#Change the unmatched columns format
tbl_pm25 <- tbl_pm25 %>%
mutate(
aqi = as.numeric(aqi),
pm25 = as.numeric(pm25),
tanggal = ymd(tanggal))
Replace any Misiing, N/A or Invalid Value in the data frame
library(naniar)
tbl_indicator <- tbl_indicator %>% replace_with_na_all(condition = ~.x == "---") %>% arrange(tanggal)
Check and Change the unmatched columns format, also remove unused columns
#columns format
glimpse(tbl_indicator)
## Rows: 5,480
## Columns: 10
## $ tanggal <chr> "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019…
## $ stasiun <chr> "DKI1 (Bunderan HI)", "DKI2 (Kelapa Gading)", "DKI3 (Jagakars…
## $ pm10 <chr> "29", "14", "21", "16", "10", "24", "20", "20", "17", "7", "1…
## $ so2 <chr> "15", "10", "13", "7", "10", "17", "12", "13", "7", "11", "16…
## $ co <chr> "5", "7", "3", "6", "7", "5", "4", "4", "5", "6", "5", "4", "…
## $ o3 <chr> NA, "71", "40", "24", "44", NA, "79", "47", "25", "44", "29",…
## $ no2 <chr> "13", NA, "1", "3", "4", "6", NA, "2", "4", "3", "4", NA, "2"…
## $ max <chr> "29", "71", "40", "24", "44", "24", "79", "47", "25", "44", "…
## $ critical <chr> "PM10", "O3", "O3", "O3", "O3", "PM10", "O3", "O3", "O3", "O3…
## $ categori <chr> "BAIK", "SEDANG", "BAIK", "BAIK", "BAIK", "BAIK", "SEDANG", "…
tbl_indicator <- tbl_indicator %>% mutate(stasiun = as.factor(stasiun),
pm10 = as.numeric(pm10),
co = as.numeric(co),
so2 = as.numeric(so2),
o3 = as.numeric(o3),
no2 = as.numeric(no2),
categori = as.factor(categori),
tanggal = ymd(tanggal))
#remove columns
tbl_clean <- tbl_indicator %>% select(-c(max, critical, categori))
Merge all using left join by date(
tanggal) into one data frame
tbl_south <- tbl_clean %>% filter(stasiun == "DKI3 (Jagakarsa)",) %>% arrange(tanggal) %>% select(-stasiun)
tbl_all <- left_join(tbl_south, tbl_pm25, by="tanggal")
colSums(is.na(tbl_all))
## tanggal pm10 so2 co o3 no2 pm25 aqi
## 0 41 29 25 60 41 34 34
Create the level of the AQI
tbl_all <- tbl_all %>% mutate(category = case_when(aqi <= 50 ~ "Good",
aqi >= 51 & aqi <= 100 ~ "Moderate",
aqi >= 101 & aqi <= 150 ~ "Unhealthy for Sensitive Groups",
aqi >= 151 & aqi <= 200 ~ "Unhealthy",
aqi >= 201 & aqi <= 300 ~ "Very Unhealthy",
aqi >= 301 ~ "Hazardous"),
category = as.factor(category))
Data Visualization :
Air Quality Index Trend
plotaqi <- ggplot(tbl_all, aes(x=tanggal, y=aqi)) +
geom_point(size=1.5, aes(color = category)) +
geom_smooth(method = 'gam', formula = y ~ s(x, bs = "cs")) +
scale_y_continuous(breaks = seq(20,200,20), limits=c(0,200)) +
scale_color_manual(values=c("Good" = "#26B52A",
"Moderate" = "#E2F224",
"Unhealthy for Sensitive Groups" = "#F2B224",
"Unhealthy" = "#F22424",
"Very Unhealthy" = "#7D18B4",
"Hazardous" = "#350e1d"))+
labs(title = "Air Quality Index - PM25 from January 2019 until December 2021",
x = "Year",
y = "AQI(PM2.5)",
fill = "Category") +
theme(legend.title = element_blank(),
axis.text.x = element_text(hjust = 1, angle = 45),
plot.title = element_text(face = "bold"),
panel.background = element_rect(fill = "#ffffff"),
axis.line.y = element_line(colour = "grey"),
axis.line.x = element_line(colour = "grey"))
ggplotly(plotaqi)
The air quality index (AQI) decreased significantly between the Q3 of 2020 until Q4 of 2020 when it was under large-scale social restrictions transition (PSBB Transition), as seen from the plot above. However, what is interesting is that the air quality index (AQI) increased, albeit barely, after large-scale social restrictions. After 2020, we can conclude that the AQI follows the trend line from the previous year but still declines from 2019.
library(zoo)
aqi_cat <- tbl_all %>% mutate(tahun = year(tbl_all$tanggal)) %>%
group_by(tahun, category) %>%
mutate(tahun = as.factor(tahun)) %>%
summarise(freq = n()) %>%
ungroup() %>%
mutate(label = glue("Total : {freq}
Level : {category}"))
plotcount <- ggplot(aqi_cat, aes(x=tahun, y = freq, fill = category, text = label)) +
geom_col(position = "dodge") +
scale_fill_manual(values=c("Good" = "#26B52A",
"Moderate" = "#E2F224",
"Unhealthy for Sensitive Groups" = "#F2B224",
"Unhealthy" = "#F22424",
"Very Unhealthy" = "#7D18B4",
"Hazardous" = "#350e1d")) +
labs(title = "Air Quality Index - PM2.5 Level on Each Year",
x = "Year",
y = NULL,
fill = "Category") +
theme(legend.title = element_blank(),
axis.text.x = element_text(hjust = 1),
plot.title = element_text(face = "bold"),
panel.background = element_rect(fill = "#ffffff"),
axis.line.y = element_line(colour = "grey"),
axis.line.x = element_line(colour = "grey"))
ggplotly(plotcount, tooltip = "text")
Over the year, more days with moderate air quality than days with unhealthy air quality. Even yet, there were more days in 2020 with Unhealthy air quality for Vulnerable Groups than in 2019. Compared to the prior year, good air quality also appears to be improving, albeit not much. As opposed to 2020, 2021 shows a drop in unhealthy air quality and an increase in unhealthy air quality for vulnerable groups. However, the number of days with Good air quality experienced the opposite, which decreased by almost half the number of days with good air quality in 2020.
Recap of Air Quality Index from Januari 2019 to December 2021
library(viridis)
recap <- tbl_all %>% mutate(tahun = year(tbl_all$tanggal),
bulan = month(tbl_all$tanggal, label = T),
hari = day(tbl_all$tanggal)) %>% ggplot(aes(x = bulan , y = hari, fill= category)) +
geom_tile() +
facet_grid(~tahun) +
scale_y_continuous(breaks = seq(1,31)) +
scale_fill_manual(values=c("Good" = "#26B52A",
"Moderate" = "#E2F224",
"Unhealthy for Sensitive Groups" = "#F2B224",
"Unhealthy" = "#F22424",
"Very Unhealthy" = "#7D18B4",
"Hazardous" = "#350e1d")) +
labs(title = "Air Quality Index - PM2.5 Level on Each Year",
x = "Year",
y = NULL,
fill = "Category") +
theme(legend.title = element_blank(),
axis.text.x = element_text(hjust = 1, angle = 90),
plot.title = element_text(face = "bold"),
panel.background = element_rect(fill = "#ffffff"),
axis.line.y = element_line(colour = "grey"),
axis.line.x = element_line(colour = "grey"))
ggplotly(recap)
Distribution of each Pollutant from January 2019 until December 2021
plotpoll <- tbl_all %>%
select(-c(category)) %>%
pivot_longer(col = -tanggal, names_to = 'predictor') %>%
ggplot(aes(x = tanggal, y = value)) +
geom_point() +
geom_smooth(method = 'gam', formula = y ~ s(x, bs = "cs"), col = "red") +
facet_wrap(~predictor, scales = 'free_x', nrow = 2, ncol = 4) +
labs(
title = 'Distribution of each Pollutant from January 2019 until December 2021',
x = 'Year',
y = 'Values'
)
ggplotly(plotpoll)
To get a deeper understanding, we may observe that the plot above comprises a variety of pollutants that could indicate the air quality. According to the distribution, O3 will considerably decrease between 2019 and 2021. We might infer that PM10 and 03 are displaying a descending line in Q3 to Q4 of 2020. On the other hand, beginning in the middle of Q2, PM2.5 levels start to fall until the end of the year. On the other hand, CO, SO2, and NO2 have some roller-coaster slope on several days in Q1 (Feb-Mar) and Q4 (Oct-Nov). We presume that it is a part of the outlier of the data, but we decided not to remove the outlier since it may limit the information on Air Quality.
Seasonality of Pollutant
PM10
plotpm10 <- ggplot(tbl_all, aes(x=tanggal, y=pm10)) +
geom_point(size=1.5, color = "black") +
geom_point(data = pre_lssr, aes(x = tanggal, y = pm10), col = 'blue') +
geom_point(data = lssr, aes(x = tanggal, y = pm10), col = 'red') +
geom_point(data = lssrt, aes(x = tanggal, y = pm10), col = 'green') +
geom_point(data = ppkm, aes(x = tanggal, y = pm10), col = 'pink') +
geom_smooth(method = 'gam', formula = y ~ s(x, bs = "cs"), col = "red") +
scale_y_continuous(breaks = seq(20,100,20), limits=c(0,100)) +
labs(title = "PM10 from January 2019 until December 2021",
x = "Year",
y = "PM10",
fill = "Category") +
theme(legend.title = element_blank(),
axis.text.x = element_text(hjust = 1, angle = 45),
plot.title = element_text(face = "bold"),
panel.background = element_rect(fill = "#ffffff"),
axis.line.y = element_line(colour = "grey"),
axis.line.x = element_line(colour = "grey"))
ggplotly(plotpm10)
PM2.5
plotpm25 <- ggplot(tbl_all, aes(x=tanggal, y=pm25)) +
geom_point(size= 1.5, col = "black") +
geom_point(data = pre_lssr, aes(x = tanggal, y = pm25), col = 'blue') +
geom_point(data = lssr, aes(x = tanggal, y = pm25), col = 'red') +
geom_point(data = lssrt, aes(x = tanggal, y = pm25), col = 'green') +
geom_point(data = ppkm, aes(x = tanggal, y = pm25), col = 'pink') +
geom_smooth(method = 'gam', formula = y ~ s(x, bs = "cs"), col = "red") +
scale_y_continuous(breaks = seq(25,125,25), limits=c(0,125)) +
labs(title = "PM2.5 from January 2019 until December 2021",
x = "Year",
y = "PM2.5") +
theme(legend.title = element_blank(),
axis.text.x = element_text(hjust = 1, angle = 45),
plot.title = element_text(face = "bold"),
panel.background = element_rect(fill = "#ffffff"),
axis.line.y = element_line(colour = "grey"),
axis.line.x = element_line(colour = "grey"))
ggplotly(plotpm25)
SO2
plotso2 <- tbl_all %>%
ggplot(aes(x = tanggal, y = so2)) +
geom_point(size = 1) +
geom_point(data = pre_lssr, aes(x = tanggal, y = so2), col = 'blue') +
geom_point(data = lssr, aes(x = tanggal, y = so2), col = 'red') +
geom_point(data = lssrt, aes(x = tanggal, y = so2), col = 'green') +
geom_point(data = ppkm, aes(x = tanggal, y = so2), col = 'pink') +
geom_smooth(method = 'gam', formula = y ~ s(x, bs = "cs"), col = "red") +
labs(title = "SO2 Level on Each Year",
x = "Year",
y = "SO2",
fill = "Category") +
theme(legend.title = element_blank(),
axis.text.x = element_text(hjust = 1),
plot.title = element_text(face = "bold"),
panel.background = element_rect(fill = "#ffffff"),
axis.line.y = element_line(colour = "grey"),
axis.line.x = element_line(colour = "grey"))
ggplotly(plotso2)
NO2
plotno2 <- tbl_all %>%
ggplot(aes(x = tanggal, y = no2)) +
geom_point(size = 1) +
geom_point(data = pre_lssr, aes(x = tanggal, y = no2), col = 'blue') +
geom_point(data = lssr, aes(x = tanggal, y = no2), col = 'red') +
geom_point(data = lssrt, aes(x = tanggal, y = no2), col = 'green') +
geom_point(data = lssrt, aes(x = tanggal, y = no2), col = 'green') +
geom_point(data = ppkm, aes(x = tanggal, y = no2), col = 'pink') +
labs(title = "NO2 Level on Each Year",
x = "Year",
y = "NO2",
fill = "Category") +
theme(legend.title = element_blank(),
axis.text.x = element_text(hjust = 1),
plot.title = element_text(face = "bold"),
panel.background = element_rect(fill = "#ffffff"),
axis.line.y = element_line(colour = "grey"),
axis.line.x = element_line(colour = "grey"))
ggplotly(plotno2)
The plot above shows the trend and daily distribution of some pollutants from January 2019 to December 2020.
- a: The blue dots represent the pre-lockdown period on January 1st, 2020 - April 10th, 2020.
- b: The red dots represent the PSBB 1st phase period on April 11th, 2020 - June 4th, 2020
- c: The pink dots represent the transitional PSBB including PSBB Ketat (2nd phase) period on June 5th, 2020 - January 11th, 2021
- d: The green dots represent the PPKM period on January 11th, 2021 - August 2nd, 2021.
The line indicates a linear trend between pre and during the lockdown period.
As we know, Indonesia has two seasons, one of which is the Rainy Season, which starts in October - April and has the highest rainfall at the beginning of the year. The pollutants were related to precipitation patterns, with the highest rainfall in January and decreasing each month until it reached the driest month in August. So we can assume that Air Quality is not only affected by people’s mobility or activities but also by climate change, such as rainfall, wind speed, temperature, and humidity.
Conclusion :
Based on the data, we conclude that air quality changes during lockdowns were not as large and significant as measured by PM10, PM2.5, SO2, and CO2 during the lockdown period. Although human mobility decreases during PSBB/PPKM, it cannot significantly decrease air pollution in highly-populated and polluted cities such as Jakarta, as fossil fuel usage and industrial activities were not significantly reduced. In addition to people’s mobility, climate change factors, including rainfall, wind speed, temperature, and humidity, all impact air quality. Air Quality is not only affected by people’s mobility or activities but also by climate change, such as rainfall, wind speed, temperature, and humidity.