This project investigates various air pollutants in South Korea using real-time and historical data obtained from the Air Korea Official Website. The primary objective is to analyze pollution trends, identify major pollutants, and assess their potential impact on public health and the environment.
The dataset was sourced from the Air Korea Official Website, which provides credible, comprehensive records of air quality data across South Korea. Its long-term coverage and real-time updates make it a reliable source for environmental research.
The raw data contains several significant issues that must be addressed before analysis.
The data cleaning workflow follows these steps:
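library(readxl)  # read_xls()
library(dplyr)   # rename(), select(), arrange(), %>%

# Read the 12 monthly spreadsheets (1.xls to 12.xls) and stack them into one data frame
total <- data.frame()
for (i in 1:12) {
  # skip = 2 skips the first two rows of each sheet before reading the header
  wt <- read_xls(paste0('/Users/lionlucky7/Desktop/R Project/', i, '.xls'), skip = 2)
  wt <- wt[-1, ]  # drop the first row below the header
  wt <- wt %>%
    rename(
      Date = '날짜',        # date
      O3   = '오 존',       # ozone
      NO2  = '이산화질소',  # nitrogen dioxide
      CO   = '일산화탄소',  # carbon monoxide
      SO2  = '아황산가스'   # sulfur dioxide
    ) %>%
    select(-'최종확정\n여부') %>%  # drop the "final confirmation status" column
    arrange(Date)
  total <- rbind(total, wt)
}
total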
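The later sections reference `Year`, `Month`, and `Day` columns and treat the pollutant columns as numbers, but the loop above does not show how those are produced. The following is a minimal sketch of one way to derive them, assuming `Date` arrives as ISO-formatted text and that measurements may be read in as character strings. The toy data frame `toy` and the result `toy_parts` are hypothetical stand-ins, not the project's actual data.

```r
library(dplyr)

# Toy stand-in for `total`; the real column types and date format are assumptions
toy <- data.frame(
  Date = c("2018-01-01", "2018-07-15"),
  O3   = c("0.012", "0.041"),
  stringsAsFactors = FALSE
)

toy_parts <- toy %>%
  mutate(
    Date  = as.Date(Date),                  # assumes "YYYY-MM-DD" text
    Year  = as.integer(format(Date, "%Y")),
    Month = as.integer(format(Date, "%m")),
    Day   = as.integer(format(Date, "%d")),
    O3    = as.numeric(O3)                  # convert text measurements to numbers
  )

toy_parts
```

Keeping `Month` as an integer matters for the later plots, which match it against numeric vectors such as `c(12, 1, 2)` and place it on a continuous x axis.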
This heat map shows the pairwise correlations between the air pollutants.
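library(ggplot2)     # ggtitle(), theme()
library(ggcorrplot)  # ggcorrplot()

# Columns 5 to 10 hold the six pollutants; cor(x) equals cor(x, x)
corr <- cor(total[5:10])
ggcorrplot(corr, outline.color = "white",
           colors = c("#bf3022", "white", "#2967ff"), lab = TRUE) +
  ggtitle("Correlation Matrix of Air Pollutants in South Korea (2018)") +
  theme(plot.title = element_text(hjust = 0.5, size = 12))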
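One caveat for the correlation step: the summaries later in this write-up pass `na.rm = TRUE`, which suggests the data contains missing values. With its default settings, `cor()` returns `NA` for any pair of columns that includes a missing value. The sketch below, on a made-up data frame `toy`, shows the difference the `use` argument makes. Whether pairwise deletion is appropriate for the real dataset is an assumption to check, not something the original analysis states.

```r
# Toy data with one missing value; column names mirror the document's pollutants
toy <- data.frame(
  PM10 = c(40, 55, NA, 30, 70),
  NO2  = c(0.020, 0.028, 0.025, 0.015, 0.035),
  O3   = c(0.030, 0.022, 0.027, 0.035, 0.015)
)

# Default behaviour: pairs involving PM10 come back as NA
corr_default <- cor(toy)

# Pairwise deletion: each pair uses the rows where both columns are present
corr_pairwise <- cor(toy, use = "pairwise.complete.obs")

corr_default
corr_pairwise
```

Pairwise deletion keeps as many observations as possible per pair, at the cost of each coefficient being computed on a slightly different set of rows.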
This correlation heat map tells us various features about the dataset:
library(tidyr)     # pivot_longer()
library(ggthemes)  # theme_igray()

# Label each row with its season, reshape to long format, and draw one box plot panel per pollutant
total %>%
  mutate(season = case_when(
    Month %in% c(12, 1, 2)  ~ 'Winter',
    Month %in% c(3, 4, 5)   ~ 'Spring',
    Month %in% c(6, 7, 8)   ~ 'Summer',
    Month %in% c(9, 10, 11) ~ 'Fall'
  )) %>%
  pivot_longer(cols = c(PM10, PM2.5, O3, NO2, CO, SO2),
               names_to = 'matters',
               values_to = 'number') %>%
  ggplot(aes(y = number, color = season)) +
  geom_boxplot() +
  theme_igray() +
  facet_wrap(vars(matters), scales = 'free_y') +  # each pollutant gets its own y scale
  labs(title = 'Seasonal Distribution of Air Pollutants',
       y = 'Measurement Data') +
  theme(plot.title = element_text(hjust = 0.76, size = 17, face = 'bold'),
        axis.text.x = element_blank(),
        axis.title.y = element_text(size = 15))
# Average each pollutant by month, then plot one bar chart panel per pollutant
total %>%
  group_by(Month) %>%
  summarise(
    PM10  = mean(PM10, na.rm = TRUE),
    PM2.5 = mean(PM2.5, na.rm = TRUE),
    O3    = mean(O3, na.rm = TRUE),
    NO2   = mean(NO2, na.rm = TRUE),
    CO    = mean(CO, na.rm = TRUE),
    SO2   = mean(SO2, na.rm = TRUE)
  ) %>%
  pivot_longer(!Month, names_to = 'matter', values_to = 'number') %>%
  ggplot(aes(x = Month, y = number)) +
  facet_wrap(vars(matter), scales = 'free_y',
             nrow = 3) +
  geom_col() +
  scale_x_continuous(breaks = 1:12,
                     labels = c(
                       "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                       "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
                     )) +
  theme_igray() +
  labs(
    y = 'Measurement Data',
    title = 'Monthly Average Concentrations of Major Air Pollutants'
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, face = 'bold', size = 18),
    panel.grid.major.x = element_blank(),
    axis.text.x = element_text(size = 8),
    axis.title.x = element_text(size = 15),
    axis.title.y = element_text(size = 15)
  )
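# Daily PM10 and PM2.5 levels over time
PMs <- total %>% select(Date, Year, Month, Day, PM10, PM2.5)
PMs %>%
  ggplot(aes(x = Date)) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(aes(y = PM10, color = 'PM10'), linewidth = 0.5) +
  geom_line(aes(y = PM2.5, color = 'PM2.5'), linewidth = 0.5) +
  # Named values tie each colour to its series explicitly
  scale_color_manual(values = c('PM10' = "#CC6666", 'PM2.5' = "#9999CC")) +
  labs(y = 'PM level (㎍/㎥)')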
The monthly average concentrations of the major air pollutants show a general trend:
* Except for O3, all pollutants increase during the winter and decrease in the summer.
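The winter versus summer pattern can also be checked numerically rather than read off the chart. The sketch below compares seasonal means on a small toy data frame. The values in `toy` are invented for illustration and are not Air Korea measurements, and the names `toy` and `season_means` are hypothetical.

```r
library(dplyr)

# Toy monthly values; the numbers are made up for illustration only
toy <- data.frame(
  Month = c(1, 2, 7, 8, 12),
  PM10  = c(60, 55, 25, 22, 58),
  O3    = c(0.015, 0.018, 0.035, 0.033, 0.014)
)

season_means <- toy %>%
  mutate(season = case_when(
    Month %in% c(12, 1, 2) ~ 'Winter',
    Month %in% c(6, 7, 8)  ~ 'Summer'
  )) %>%
  group_by(season) %>%
  # Mean of each pollutant within each season
  summarise(across(c(PM10, O3), ~ mean(.x, na.rm = TRUE)))

season_means
```

Applied to the real `total` data frame, the same grouping would give one row per season, which makes it easy to confirm which pollutants peak in winter and whether O3 runs the other way.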