library(tidyverse)
library(plotly)
library(leaflet)
library(RColorBrewer)
airquality <- read_csv("~/Documents/Data Visualization - BAN 350/Assignments/Assignment 5/AQI and Lat Long of Countries.csv")

I chose to analyze Kaggle user Aditya Ramachandran’s World Air Quality Index by City and Coordinates for this assignment. The dataset merges city-level air quality data from Swiss air quality technology company IQAir’s World Air Quality Report with city latitude and longitude measurements. I chose to contrast individual AQI rankings of common air pollutants with the combined AQI rankings of cities in China and India.

CNairquality <- filter(airquality, Country == "China")
INairquality <- filter(airquality, Country == "India")
CombinedAQ <- full_join(CNairquality, INairquality)

CNplot <- plot_ly(
  data = CNairquality,
  x = ~`CO AQI Value`,
  y = ~`Ozone AQI Value`,
  color = ~`AQI Category`,
  hoverinfo = "text",
  text = ~paste("City:", City,
                "Combined AQI Value:", `AQI Value`,
                "Combined AQI Category:", `AQI Category`),
  type = 'scatter',
  mode = 'markers'
)
INplot <- plot_ly(
  data = INairquality,
  x = ~`CO AQI Value`,
  y = ~`Ozone AQI Value`,
  color = ~`AQI Category`,
  hoverinfo = "text",
  text = ~paste("City:", City,
                "Combined AQI Value:", `AQI Value`,
                "Combined AQI Category:", `AQI Category`),
  type = 'scatter',
  mode = 'markers'
)
combined_plot <- subplot(CNplot, INplot, shareX = TRUE, shareY = TRUE) |>
  layout(title = 'Air Quality Indices in China and India') |>
  layout(showlegend = FALSE)
combined_plot

With 1.41 billion and 1.43 billion inhabitants respectively as of 2025, China and India are the two most populous countries in the world. Their rapidly growing economies depend heavily on polluting fossil fuels; together they account for one third of yearly global emissions.
The dataset records city-level values for four common types of air pollutants: carbon monoxide (CO), ozone, nitrogen dioxide (NO2), and fine particulate matter (PM2.5). Could one pollutant type be responsible for higher combined AQI values and more hazardous air quality in Chinese and Indian cities? I chose to examine the effect of CO and ozone values on AQI in a scatterplot format.
In the above visualization, I plotted CO AQI values on the X axis against ozone AQI values on the Y axis. I added hover text showing city names, combined AQI values, and combined AQI categories to clarify each data marker’s place on the chart. I assigned a discrete color to each combined AQI category so that color clusters might reveal patterns in CO and ozone AQI values.
Both countries show higher ozone AQI values than CO AQI values. China has only one city in the Hazardous combined AQI category, while India has 41. Both scatterplots show a defined pattern: cities with ozone AQI values of 100 or below are more likely to fall in the Good or Moderate combined AQI categories. However, some outliers in the Very Unhealthy and Hazardous categories have relatively low ozone AQI values. It is possible that the other two pollutants, NO2 and PM2.5, influence the combined AQI value more than CO and ozone do. A more effective visualization could add these pollutants as additional traces so that all four sets of values appear on the same chart.
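The idea of putting all four pollutants on one chart could be sketched roughly as follows. This is an illustration, not the code used for this assignment: the `toy_aq` rows are made-up placeholder values, and the `NO2 AQI Value` and `PM2.5 AQI Value` column names are assumed to follow the same pattern as the CO and ozone columns. The data is reshaped so that each pollutant becomes one level of a `Pollutant` column; mapping that column to `color` makes plotly draw one trace per pollutant.

```r
library(dplyr)
library(tidyr)
library(plotly)

# Made-up placeholder rows standing in for CNairquality or INairquality.
toy_aq <- tibble(
  City = c("City A", "City B", "City C"),
  `AQI Value` = c(60, 155, 310),
  `CO AQI Value` = c(1, 3, 5),
  `Ozone AQI Value` = c(40, 80, 35),
  `NO2 AQI Value` = c(2, 6, 9),          # assumed column name
  `PM2.5 AQI Value` = c(60, 155, 310)    # assumed column name
)

pollutant_cols <- c("CO AQI Value", "Ozone AQI Value",
                    "NO2 AQI Value", "PM2.5 AQI Value")

# Reshape to one row per city and pollutant.
long_aq <- toy_aq |>
  pivot_longer(cols = all_of(pollutant_cols),
               names_to = "Pollutant",
               values_to = "Pollutant AQI")

# A discrete color mapping yields one trace per pollutant.
all_pollutants_plot <- plot_ly(
  data = long_aq,
  x = ~`Pollutant AQI`,
  y = ~`AQI Value`,
  color = ~Pollutant,
  hoverinfo = "text",
  text = ~paste("City:", City, "Pollutant:", Pollutant),
  type = 'scatter',
  mode = 'markers'
)
all_pollutants_plot
```

On a chart like this, a pollutant whose markers sit close to the diagonal would be tracking the combined AQI value closely, which is one way to check the hypothesis about NO2 and PM2.5.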
pal <- colorFactor(
  palette = 'Dark2',
  domain = CombinedAQ$`AQI Category`)

Basemap <- leaflet(CombinedAQ) |>
  addTiles()

addCircles(map = Basemap,
           lng = ~lng,
           lat = ~lat,
           radius = 150,
           weight = 10,
           popup = ~City,
           color = ~pal(`AQI Category`)) |>
  addLegend("bottomleft",
            pal = pal,
            values = ~`AQI Category`,
            opacity = 1)

I mapped the coordinates of each city in China and India to circles in Leaflet, and assigned discrete colors to each combined AQI category. I added a popup that shows each city name for clarity, as the default zoom level does not show cities. The circles revealed a cluster of Unhealthy, Very Unhealthy, and Hazardous-category cities in China’s eastern provinces, and a similar cluster in India’s northern states. These areas are heavily populated and highly industrialized; where millions of people live and work, pollution accumulates. Combined AQI values are lower in less densely populated areas, such as China’s northeastern Heilongjiang Province and India’s southern state of Karnataka.
I would rate my map a 7 out of 10 for its clarity in communicating the spatial story of the dataset. The discrete colors for each category clearly show ‘hotspot’ areas with high combined AQI values. However, mapping all coordinates reveals several errors in the dataset’s assignment of country names. For instance, a data point meant for Houma in China’s Shanxi Province was instead mapped to Houma, Louisiana. These incorrect values cause the map’s zoom level and starting position to show the entire world. If these points were removed, the map would better communicate that the data shown covers only Chinese and Indian cities. The discrete color palette could also be swapped for a continuous color scale; viewers associate green with ‘Good’ and red with ‘Bad’, so such a scale would make it clearer which map areas are more affected by pollution.
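Both improvements could be sketched along these lines. This is an illustration under stated assumptions rather than a tested fix for the real dataset: `toy_points` uses approximate coordinates with placeholder AQI categories, the bounding box values are rough guesses at the extent of China and India, and the green-to-maroon color order follows the conventional AQI color scheme rather than anything in the dataset.

```r
library(dplyr)
library(leaflet)

# Illustrative rows; the real input would be CombinedAQ.
# Coordinates are approximate and the AQI categories are placeholders.
toy_points <- tibble(
  City = c("Delhi", "Beijing", "Houma"),
  `AQI Category` = c("Hazardous", "Unhealthy", "Moderate"),
  lat = c(28.6, 39.9, 29.6),
  lng = c(77.2, 116.4, -90.7)   # the last point sits in Louisiana
)

# Rough bounding box covering China and India (assumed values, in degrees).
region <- list(lng_min = 68, lng_max = 135, lat_min = 6, lat_max = 54)

# Keep only points that fall inside the box.
in_region <- toy_points |>
  filter(between(lng, region$lng_min, region$lng_max),
         between(lat, region$lat_min, region$lat_max))

# Palette ordered from best to worst air quality.
aqi_levels <- c("Good", "Moderate", "Unhealthy for Sensitive Groups",
                "Unhealthy", "Very Unhealthy", "Hazardous")
ordered_pal <- colorFactor(
  palette = c("green", "yellow", "orange", "red", "purple", "maroon"),
  levels = aqi_levels)

region_map <- leaflet(in_region) |>
  addTiles() |>
  addCircles(lng = ~lng, lat = ~lat, radius = 150, weight = 10,
             popup = ~City, color = ~ordered_pal(`AQI Category`)) |>
  fitBounds(lng1 = region$lng_min, lat1 = region$lat_min,
            lng2 = region$lng_max, lat2 = region$lat_max)
region_map
```

A bounding box is a blunt filter: it removes points like Houma, Louisiana, but it would not catch a city that was mis-geocoded to another location inside the box, so checking flagged rows by hand would still be worthwhile.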