Introduction

This report provides an analysis of air quality data, focusing on four pollutants: PM10 (particulate matter), NO (nitric oxide), NO2 (nitrogen dioxide) and NdiOx (nitrogen oxides as NO2). The datasets cover 2018 to 2023 and were processed to highlight pollutant levels on specific dates and to obtain monthly averages of the pollutants in 2020. I also made my own observations by identifying long-term changes, particularly from 2021 to 2023 when the Clean Air Zone (CAZ) was implemented.

Flex Dashboard containing my 5 Plots

Click here for the interactive Flex Dashboard

Loading required libraries

library(flexdashboard) #: for implementing dashboard
library(knitr)         #: for report formatting
library(tidyverse)     #: for data manipulation
library(plotly)        #: for interactive visualization
library(viridis)       #: for color scale for plots
library(rmarkdown)     #: for report formatting
library(htmlwidgets)   #: for report formatting
library(tinytex)       #: for creating PDF output

Step One: Data Cleaning

#Function for reading a file, skipping the metadata rows and renaming columns
clean <- function(file) {
  read.csv(file, skip = 4) %>% 
    rename(PM10 = "PM.sub.10..sub..particulate.matter..Hourly.measured.",
           NO = "Nitric.oxide",
           NO2 = "Nitrogen.dioxide",
           NdiOx = "Nitrogen.oxides.as.nitrogen.dioxide") %>% 
    #A reading stamped 24:00 closes the day, so it is parsed as 00:00 and
    #then moved forward one day (86400 seconds) instead of staying on the same date
    mutate(midnight = time == "24:00",
           time = ifelse(midnight, "00:00", time), 
           datetime = as.POSIXct(paste(Date, time), format = "%d-%m-%Y %H:%M", 
                                 tz = "UTC") + ifelse(midnight, 86400, 0)) %>% 
    select(datetime, PM10, NO, NO2, NdiOx)
}
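
One detail worth checking when parsing these timestamps is how a reading stamped 24:00 is treated. Assuming 24:00 marks the end of the day (that is, 00:00 on the following date), the conversion can be sketched on two hand-made values. The helper name parse_reading_time and the sample dates are hypothetical and are not part of the original script.

```r
# Hypothetical helper (not in the original script): parse a date string and a
# time string, treating "24:00" as midnight at the start of the next day.
parse_reading_time <- function(date, time) {
  is_midnight <- time == "24:00"
  time <- ifelse(is_midnight, "00:00", time)
  parsed <- as.POSIXct(paste(date, time), format = "%d-%m-%Y %H:%M", tz = "UTC")
  # Shift the 24:00 readings forward by one day (86400 seconds)
  parsed + ifelse(is_midnight, 86400, 0)
}

sample_times <- parse_reading_time(c("31-12-2018", "31-12-2018"),
                                   c("23:00", "24:00"))
format(sample_times, "%Y-%m-%d %H:%M")
# "2018-12-31 23:00" "2019-01-01 00:00"
```

Without the one-day shift, the second reading would land on 2018-12-31 00:00, almost a full day before the 23:00 reading it actually follows.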

Step Two: Importing Files, Combining and Handling Missing Data

#Listing my dataset files (one CSV per year, 2018 to 2023)
data_dir <- "C:\\Users\\WINDOWS11\\Desktop\\DataAVAssessement\\DataSets"
files <- file.path(data_dir, paste0("POAR_", 2018:2023, ".csv"))

#Combining all rows into one dataframe and cleaning
#Only rows without a valid timestamp are dropped here; dropping every row
#with an NA would leave nothing for the imputation below to fill
data_list <- files %>% 
  lapply(clean) %>% 
  bind_rows() %>% 
  drop_na(datetime)

#Using mean imputation to replace missing pollutant values
data_list <- data_list %>% 
  mutate(across(c(PM10, NO, NO2, NdiOx),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
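
As a rough illustration of the mean imputation step, the sketch below fills missing values in a small made-up data frame with each column's mean. The toy values are assumptions chosen for easy checking. Note that the replacement only has an effect if rows containing NA are still present when it runs.

```r
library(dplyr)

# Made-up readings with one missing value per column
toy <- data.frame(PM10 = c(10, NA, 20), NO = c(NA, 4, 6))

# Replace each NA with the mean of the non-missing values in that column
imputed <- toy %>%
  mutate(across(c(PM10, NO), ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

imputed
# PM10 becomes 10, 15, 20 and NO becomes 5, 4, 6
```

A single overall mean ignores daily and seasonal patterns, so it is best suited to datasets where only a small share of readings is missing.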

Step Three: Filtering

#getting required dates
dates_req <- as.Date(c("2018-12-20", "2019-01-03", "2020-03-19",
                       "2020-03-26", "2020-06-29", "2020-11-10",
                       "2020-12-20", "2021-01-03", "2021-11-29",
                       "2022-07-25", "2023-07-24"))

#Keeping only the readings taken on the required dates
data_req <- data_list %>% 
  filter(as.Date(datetime) %in% dates_req)
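
The date filter can be tried on a few made-up readings. This is only a sketch: the data frame readings and the vector wanted are hypothetical stand-ins for data_list and dates_req, and the time zone is passed explicitly so the calendar date is taken in UTC.

```r
library(dplyr)

# Made-up hourly readings on three different days
readings <- data.frame(
  datetime = as.POSIXct(c("2020-03-19 08:00", "2020-03-20 08:00",
                          "2020-03-26 23:00"), tz = "UTC"),
  PM10 = c(14, 9, 21)
)

wanted <- as.Date(c("2020-03-19", "2020-03-26"))

# Keep a row when the calendar date of its timestamp is one of the wanted dates
kept <- readings %>%
  filter(as.Date(datetime, tz = "UTC") %in% wanted)

kept$PM10
# 14 21
```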

Step Four: Extracting Monthly Average Pollutant Levels for 2020

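#Keeping the 2020 readings and labelling each one with its month
#month.name is indexed by month number so the labels do not depend on the system locale
avg2020 <- data_list %>% 
  filter(format(datetime, "%Y") == "2020") %>% 
  mutate(month = factor(month.name[as.integer(format(datetime, "%m"))],
                        levels = month.name, ordered = TRUE))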

monthlyAvg <- avg2020 %>% 
  group_by(month) %>% 
  summarise(
    PM10 = mean(PM10),
    NO = mean(NO),
    NO2 = mean(NO2),
    NdiOx = mean(NdiOx)
  ) %>% 
  pivot_longer(cols = c(PM10, 
                        NO, 
                        NO2, 
                        NdiOx), 
               names_to = "Pollutant",
               values_to = "averageValue")
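
To make the reshaping step easier to follow, the sketch below applies pivot_longer to a two-month, two-pollutant table. The numbers are invented for illustration and the object names toy_monthly and long are hypothetical.

```r
library(dplyr)
library(tidyr)

# Made-up monthly averages in wide form: one column per pollutant
toy_monthly <- data.frame(month = c("January", "February"),
                          PM10 = c(12, 15),
                          NO2 = c(30, 28))

# Long form: one row per month and pollutant, which is what a grouped
# bar chart with fill = Pollutant expects
long <- toy_monthly %>%
  pivot_longer(cols = c(PM10, NO2),
               names_to = "Pollutant",
               values_to = "averageValue")

long
# 4 rows: January/PM10/12, January/NO2/30, February/PM10/15, February/NO2/28
```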

Step Five: Extracting Yearly Average PM10 Levels for 2021-2023

#Deriving PM10 levels from 2021 to 2023 for 5th analysis
yearlyPM10 <- data_list %>% 
  filter(format(datetime, "%Y") %in%
           c("2021", "2022", "2023")) %>% 
  mutate(year = format(datetime, "%Y")) %>% 
  group_by(year) %>% 
  summarise(PM10 = mean(PM10))

Step Six: Plotting

In this step, I visualized my data using interactive plots with Plotly, selecting the most suitable chart types based on recommendations from GeeksforGeeks [1], R for Data Science [2], and Statology [3]. While designing the visualizations, I ensured accessibility by considering colorblind-friendly palettes, referencing Datanovia's guide on effective color palettes [4]. Additionally, I created a flexdashboard in R and deployed it using the free RPubs service by Posit. During this process, I noticed that some variables displayed duplicate readings when hovering over them. To address this, I customized the tooltip settings, ensuring that only the most relevant information was displayed for each data point, improving readability and user experience.

References:

[1] GeeksforGeeks. Data Visualization in R. Available at: https://www.geeksforgeeks.org/data-visualization-in-r/

[2] Wickham, H. R for Data Science - Data Visualization. Available at: https://r4ds.had.co.nz/data-visualisation.html

[3] Statology. How to Create a Bubble Chart in R. Available at: https://www.statology.org/bubble-chart-in-r/

[4] Datanovia. Top R Color Palettes to Know for Great Data Visualization. Available at: https://www.datanovia.com/en/blog/top-r-color-palettes-to-know-for-great-data-visualization/
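
The tooltip customization described in this step can be sketched on a tiny made-up dataset. The data frame toy and its values are assumptions for illustration. The idea is to build the hover label in a text aesthetic and then ask ggplotly to show only that aesthetic, so x, y and colour values are not repeated in the hover box.

```r
library(ggplot2)
library(plotly)

# Made-up readings for three hours
toy <- data.frame(hour = 1:3, PM10 = c(12, 18, 15))

# The text aesthetic is not drawn by ggplot2 (it may warn that it is unknown);
# it is only carried through so that plotly can use it as the hover label
p <- ggplot(toy, aes(x = hour, y = PM10, color = PM10,
                     text = paste("Hour:", hour, "<br>PM10:", PM10))) +
  geom_point()

# tooltip = "text" restricts the hover box to the custom label
widget <- ggplotly(p, tooltip = "text")
```

Leaving out the tooltip argument shows every mapped aesthetic, which is one way duplicate readings can appear when the same variable is mapped to both y and color.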

PM10 Levels for required dates

#1. Plot for pm10 levels
pm10levels <- data_req %>% 
  ggplot(aes(x = datetime, y = PM10, color = PM10, text = paste("Dates and time:", datetime, "<br>PM10:", PM10))) +
  geom_point()+
  scale_color_viridis(option = "D")+
  labs(title="Scatter Plot of PM10 Levels on Required Dates", x = "Dates and time", y = "PM10 (µg/m³)")+
  theme_gray(base_size = 14)
ggplotly(pm10levels, tooltip = "text")

NO Levels for required dates

#2. Plot for Nitric Oxide(NO)
noPlot <- data_req %>% 
  ggplot(aes(x = datetime, y = NO, color = NO, text = paste("Dates and time:", datetime, "<br>NO:", NO))) +
  geom_jitter() +
  scale_color_viridis(option = "C") +
  labs(title="Jitter Plot of NO Levels on Required Dates", x = "Dates and time", y = "NO (µg/m³)")+
  theme_gray(base_size = 14)
ggplotly(noPlot, tooltip = "text")

NdiOx Levels for required dates

#3. Plot for NdiOx levels
ndioxPlot <- data_req %>% 
  ggplot(aes(x = datetime, y = NdiOx, size = NdiOx, color = NdiOx, text = paste("Dates and time:", datetime, "<br>NdiOx:", NdiOx))) +
  geom_point(alpha = 0.5) + #no fixed size here, so the bubbles scale with NdiOx
  scale_color_viridis(option = "H") +
  labs(title="Bubble Plot of NdiOx Levels on Required Dates", x = "Dates and time", y = "NdiOx (µg/m³)")+
  theme_gray(base_size = 14)
ggplotly(ndioxPlot, tooltip = "text")

Monthly Average of Pollutants for 2020

#4. Plot for Monthly Average of Pollutants for 2020
month_avgPlot <- monthlyAvg %>% 
  ggplot(aes(x = month, y =averageValue, fill = Pollutant))+
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_viridis_d(option = "C") +
  labs(title = "Monthly Averages for Pollutants in 2020", x = "Months", y = "Average Value(µg/m³)",
       fill = "Pollutants")+
  theme_gray(base_size = 14)
ggplotly(month_avgPlot)

PM10 Levels from 2021-2023

#5. PM10 Levels from 2021-2023
yearlyPM10_plot <- yearlyPM10 %>% 
  ggplot(aes(x = year, y = PM10, group = 1)) +
  geom_line(color = viridis(1, option = "D"))+
  geom_point(color = viridis(1, option = "B"))+
  labs(title = "PM10 Trends from 2021-2023", x = "Year", y = "PM10 Levels(µg/m³)")+
  theme_gray(base_size = 14)
ggplotly(yearlyPM10_plot)

Conclusion

This study provides insights into pollution trends, highlighting fluctuations in PM10, NO, NO2 and NdiOx over specific days, months, and years. Since the CAZ was implemented, a notable decline in pollutant levels has been observed, suggesting that it has had a positive impact.

Additional Resources used for this project:

My Lesson Data Analysis with R programming Complete Course. Available here: https://youtu.be/x79bPHXCxlM?si=N7FuWNKI01Ndu_69

R programming in one hour - a crash course for beginners. Available here: https://youtu.be/eR-XRSKsuR4?si=UrhNbEFODFiLXg_f

freeCodeCamp R Programming. Available here: https://youtu.be/_V8eKsto3Ug?si=LYMUNcjICY6leFST

Generate a .pdf from RMarkdown file with R. Available here: https://www.geeksforgeeks.org/generate-pdf-from-rmarkdown-file-with-r/

ChatGPT (OpenAI): used to go through some basic concepts in R. Chat available here: https://chatgpt.com/share/67ed2be4-431c-800c-ba50-cd0eb332bbfd

ChatGPT (OpenAI): used to check through my report to ensure there were no errors. Chat available here: https://chatgpt.com/share/67ed2a90-554c-800c-bfef-de99c6206bdb