This report provides an analysis of air quality data, focusing on four pollutants: PM10 (particulate matter), NO (nitric oxide), NO2 (nitrogen dioxide) and NdiOx (nitrogen oxides as NO2). The datasets cover 2018 to 2023 and were processed to highlight pollutant levels on specific dates and to obtain monthly pollutant averages for 2020. I also made my own observations by identifying long-term changes, particularly from 2021 to 2023, when the Clean Air Zone (CAZ) was implemented.
Click here for the interactive flexdashboard.
library(flexdashboard) # for building the dashboard
library(knitr)         # for report formatting
library(tidyverse)     # for data manipulation
library(plotly)        # for interactive visualization
library(viridis)       # for colorblind-friendly color scales
library(rmarkdown)     # for report formatting
library(htmlwidgets)   # for report formatting
library(tinytex)       # for creating the PDF file from R Markdown
#Creating a function for reading files, skipping metadata rows and renaming columns
clean <- function(file) {
  read.csv(file, skip = 4) %>%
    rename(PM10 = "PM.sub.10..sub..particulate.matter..Hourly.measured.",
           NO = "Nitric.oxide",
           NO2 = "Nitrogen.dioxide",
           NdiOx = "Nitrogen.oxides.as.nitrogen.dioxide") %>%
    mutate(
      # "24:00" marks the end of the day, so it becomes 00:00 on the following day
      rolls_over = time == "24:00",
      time = ifelse(rolls_over, "00:00", time),
      datetime = as.POSIXct(paste(Date, time), format = "%d-%m-%Y %H:%M",
                            tz = "UTC") + ifelse(rolls_over, 86400, 0)
    ) %>%
    select(datetime, PM10, NO, NO2, NdiOx)
}
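The "24:00" timestamps can be checked on a toy input. This is a minimal sketch, assuming the source files use "24:00" to mean midnight at the end of the given day (an assumption about the data convention); `parse_reading_time` is a hypothetical helper and the dates are made up.

```r
# Hypothetical helper (not part of the report): parse "dd-mm-YYYY" dates with
# "HH:MM" times, where "24:00" is assumed to mean midnight at the END of the day.
parse_reading_time <- function(date, time) {
  rolls_over <- time == "24:00"
  base <- as.POSIXct(
    paste(date, ifelse(rolls_over, "00:00", time)),
    format = "%d-%m-%Y %H:%M", tz = "UTC"
  )
  # Shift the rolled-over readings forward by one day (86400 seconds)
  base + ifelse(rolls_over, 86400, 0)
}

# Made-up example values
parsed <- parse_reading_time(c("31-12-2019", "31-12-2019"), c("23:00", "24:00"))
format(parsed, "%Y-%m-%d %H:%M")
```

With this convention the 24:00 reading on 31 December 2019 is assigned to 1 January 2020 rather than to the start of 31 December.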
#Listing my dataset files
files <- c("C:\\Users\\WINDOWS11\\Desktop\\DataAVAssessement\\DataSets\\POAR_2018.csv",
           "C:\\Users\\WINDOWS11\\Desktop\\DataAVAssessement\\DataSets\\POAR_2019.csv",
           "C:\\Users\\WINDOWS11\\Desktop\\DataAVAssessement\\DataSets\\POAR_2020.csv",
           "C:\\Users\\WINDOWS11\\Desktop\\DataAVAssessement\\DataSets\\POAR_2021.csv",
           "C:\\Users\\WINDOWS11\\Desktop\\DataAVAssessement\\DataSets\\POAR_2022.csv",
           "C:\\Users\\WINDOWS11\\Desktop\\DataAVAssessement\\DataSets\\POAR_2023.csv")
#Combining all rows into one data frame and dropping rows with missing values
data_list <- files %>%
  lapply(clean) %>%
  bind_rows() %>%
  drop_na()
#Mean imputation for any remaining missing values
#(drop_na() above already removes incomplete rows, so this step is a safeguard and changes nothing)
data_list <- data_list %>%
  mutate(across(c(PM10, NO, NO2, NdiOx),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
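How drop_na() and mean imputation interact can be seen on a small made-up data frame: once incomplete rows are dropped there is nothing left to impute, whereas imputing instead keeps every row. This is a sketch, assuming dplyr and tidyr are installed; `toy` and its values are invented for illustration and are not taken from the datasets.

```r
library(dplyr)
library(tidyr)

# Made-up readings with one missing value in each column
toy <- tibble(PM10 = c(10, NA, 30), NO = c(1, 2, NA))

# Dropping incomplete rows keeps only the rows with no NA at all
dropped <- toy %>% drop_na()

# Imputing replaces each NA with its column mean and keeps all rows
imputed <- toy %>%
  mutate(across(c(PM10, NO),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

nrow(dropped)  # 1 row remains
nrow(imputed)  # all 3 rows remain
imputed$PM10   # the NA is replaced by the column mean, 20
```

Which approach is preferable depends on how much data is missing; dropping loses rows, while imputing pulls values towards the mean.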
#Getting required dates
dates_req <- as.Date(c("2018-12-20", "2019-01-03", "2020-03-19",
                       "2020-03-26", "2020-06-29", "2020-11-10",
                       "2020-12-20", "2021-01-03", "2021-11-29",
                       "2022-07-25", "2023-07-24"))
#Keeping only the cleaned rows that fall on the required dates
data_req <- data_list %>%
  filter(as.Date(datetime) %in% dates_req)
#Deriving monthly pollutant averages for 2020
avg2020 <- data_list %>%
  filter(format(datetime, "%Y") == "2020") %>%
  mutate(month = factor(format(datetime, "%B"), levels = month.name, ordered = TRUE))
monthlyAvg <- avg2020 %>%
  group_by(month) %>%
  summarise(
    PM10 = mean(PM10),
    NO = mean(NO),
    NO2 = mean(NO2),
    NdiOx = mean(NdiOx)
  ) %>%
  pivot_longer(cols = c(PM10, NO, NO2, NdiOx),
               names_to = "Pollutant",
               values_to = "averageValue")
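The monthly averaging and reshaping can be traced on a few made-up rows. This is a sketch, assuming dplyr and tidyr are installed; `toy` and `toy_monthly` are hypothetical names with invented values. It derives the month label from the month number, a variant that does not depend on the session locale (format(datetime, "%B") returns month names in the current locale).

```r
library(dplyr)
library(tidyr)

# Made-up readings: two in January, one in February
toy <- tibble(
  datetime = as.POSIXct(c("2020-01-05 10:00", "2020-01-20 10:00",
                          "2020-02-01 10:00"), tz = "UTC"),
  PM10 = c(10, 20, 40),
  NO2  = c(5, 15, 30)
)

toy_monthly <- toy %>%
  # Month number -> English month name, independent of the locale
  mutate(month = factor(month.name[as.integer(format(datetime, "%m"))],
                        levels = month.name, ordered = TRUE)) %>%
  group_by(month) %>%
  summarise(across(c(PM10, NO2), mean)) %>%
  # One row per month and pollutant, ready for a grouped bar chart
  pivot_longer(cols = c(PM10, NO2),
               names_to = "Pollutant",
               values_to = "averageValue")

toy_monthly
```

The long format gives one row per month and pollutant, which is the shape a dodged bar chart with fill = Pollutant expects.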
#Deriving PM10 levels from 2021 to 2023 for 5th analysis
yearlyPM10 <- data_list %>%
  filter(format(datetime, "%Y") %in% c("2021", "2022", "2023")) %>%
  mutate(year = format(datetime, "%Y")) %>%
  group_by(year) %>%
  summarise(PM10 = mean(PM10))
#Step Six: Plotting
In this step, I visualized my data using interactive plots with Plotly, selecting the most suitable chart types based on recommendations from GeeksforGeeks [1], R for Data Science [2], and Statology [3]. While designing the visualizations, I ensured accessibility by using colorblind-friendly palettes, referencing Datanovia's guide on effective color palettes [4]. Additionally, I created a flexdashboard in R and deployed it using the free RPubs service by Posit. During this process, I noticed that some variables displayed duplicate readings when hovering over them. To address this, I customized the tooltip settings so that only the most relevant information is displayed for each data point, improving readability and user experience.
References:
[1] GeeksforGeeks. Data Visualization in R. Available at: https://www.geeksforgeeks.org/data-visualization-in-r/
[2] Wickham, H. R for Data Science - Data Visualization. Available at: https://r4ds.had.co.nz/data-visualisation.html
[3] Statology. How to Create a Bubble Chart in R. Available at: https://www.statology.org/bubble-chart-in-r/
[4] Datanovia. Top R Color Palettes to Know for Great Data Visualization. Available at: https://www.datanovia.com/en/blog/top-r-color-palettes-to-know-for-great-data-visualization/
#1. Plot for PM10 levels
pm10levels <- data_req %>%
  ggplot(aes(x = datetime, y = PM10, color = PM10,
             text = paste("Dates and time:", datetime, "<br>PM10:", PM10))) +
  geom_point() +
  scale_color_viridis(option = "D") +
  labs(title = "Scatter Plot of PM10 Levels on Required Dates",
       x = "Dates and time", y = "PM10 (µg/m³)") +
  theme_gray(base_size = 14)
ggplotly(pm10levels, tooltip = "text")
#2. Plot for Nitric Oxide (NO)
noPlot <- data_req %>%
  ggplot(aes(x = datetime, y = NO, color = NO,
             text = paste("Dates and time:", datetime, "<br>NO:", NO))) +
  geom_jitter() +
  scale_color_viridis(option = "C") +
  labs(title = "Jitter Plot of NO Levels on Required Dates",
       x = "Dates and time", y = "NO (µg/m³)") +
  theme_gray(base_size = 14)
ggplotly(noPlot, tooltip = "text")
#3. Plot for NdiOx levels
ndioxPlot <- data_req %>%
  ggplot(aes(x = datetime, y = NdiOx, size = NdiOx, color = NdiOx,
             text = paste("Dates and time:", datetime, "<br>NdiOx:", NdiOx))) +
  # size is mapped to NdiOx above; a fixed size here would override it and flatten the bubbles
  geom_point(alpha = 0.5) +
  scale_color_viridis(option = "H") +
  labs(title = "Bubble Plot of NdiOx Levels on Required Dates",
       x = "Dates and time", y = "NdiOx (µg/m³)") +
  theme_gray(base_size = 14)
ggplotly(ndioxPlot, tooltip = "text")
#4. Plot for Monthly Average of Pollutants for 2020
month_avgPlot <- monthlyAvg %>%
  ggplot(aes(x = month, y = averageValue, fill = Pollutant)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_viridis_d(option = "C") +
  labs(title = "Monthly Averages for Pollutants in 2020",
       x = "Months", y = "Average Value (µg/m³)",
       fill = "Pollutants") +
  theme_gray(base_size = 14)
ggplotly(month_avgPlot)
#5. PM10 Levels from 2021-2023
yearlyPM10_plot <- yearlyPM10 %>%
  ggplot(aes(x = year, y = PM10, group = 1)) +
  geom_line(color = viridis(1, option = "D")) +
  geom_point(color = viridis(1, option = "B")) +
  labs(title = "PM10 Trends from 2021-2023", x = "Year", y = "PM10 Levels (µg/m³)") +
  theme_gray(base_size = 14)
ggplotly(yearlyPM10_plot)
This study provides insights into pollution trends, highlighting fluctuations in PM10, NO, NO2 and NdiOx over specific days, months, and years. Since the CAZ implementation, a notable decline in pollutant levels has been observed, suggesting that the zone has had a positive impact.
My Lesson Data Analysis with R programming Complete Course. Available here: https://youtu.be/x79bPHXCxlM?si=N7FuWNKI01Ndu_69
R programming in one hour - a crash course for beginners. Available here: https://youtu.be/eR-XRSKsuR4?si=UrhNbEFODFiLXg_f
freeCodeCamp R Programming. Available here: https://youtu.be/_V8eKsto3Ug?si=LYMUNcjICY6leFST
Generate a .pdf from RMarkdown file with R. Available here: https://www.geeksforgeeks.org/generate-pdf-from-rmarkdown-file-with-r/
ChatGPT: used to go through some basic concepts in R. Chat available here: https://chatgpt.com/share/67ed2be4-431c-800c-ba50-cd0eb332bbfd
ChatGPT: used to check through my report to ensure there were no errors. Chat available here: https://chatgpt.com/share/67ed2a90-554c-800c-bfef-de99c6206bdb