Correlation Network Graph - WHO Air Quality Dataset
Author
TEAM 16
Introduction
This project aims to analyze relationships between air pollution variables using a correlation network graph. The dataset contains variables such as PM2.5, PM10, NO2, and Ozone.
Load Required Libraries
This step loads the libraries needed for data processing and visualization.
library(readxl)
Warning: package 'readxl' was built under R version 4.5.3
library(ggplot2)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(igraph)
Warning: package 'igraph' was built under R version 4.5.3
Attaching package: 'igraph'
The following objects are masked from 'package:dplyr':
as_data_frame, groups, union
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
library(ggraph)
Warning: package 'ggraph' was built under R version 4.5.3
Install Packages (Run Only Once)
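This step installs the required packages and only needs to be run once, before the libraries are loaded. Because the packages were already loaded in this session, R skips the installation and prints the warnings shown below.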
install.packages("readxl")
Warning: package 'readxl' is in use and will not be installed
install.packages("ggraph")
Warning: package 'ggraph' is in use and will not be installed
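The install step can be made conditional so that it does nothing when a package is already available. The following is a minimal sketch, not part of the original report; the helper names `missing_packages` and `install_if_missing` are hypothetical.

```r
# Return the subset of `pkgs` that is not installed.
# requireNamespace() returns FALSE (instead of raising an error) when a package is absent.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the packages that are missing (hypothetical helper).
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Packages that ship with R are always present, so nothing is reported here.
missing_packages(c("stats", "utils"))
```

Calling `install_if_missing(c("readxl", "ggplot2", "dplyr", "igraph", "ggraph"))` once before the `library()` calls would avoid the "is in use and will not be installed" warnings.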
Load Dataset
Here we load the dataset from the local file system and display the first few rows.
data <- read.csv("C:/Users/Prathee Gowda/Documents/Air_Quality.csv")
head(data)
WHO.Region ISO3 WHO.Country.Name City.or.Locality
1 Eastern Mediterranean Region AFG Afghanistan Kabul
2 European Region ALB Albania Durres
3 European Region ALB Albania Durres
4 European Region ALB Albania Elbasan
5 European Region ALB Albania Elbasan
6 European Region ALB Albania Elbasan
Measurement.Year PM2.5...g.m3. PM10...g.m3. NO2...g.m3.
1 2019 119.77 NA NA
2 2015 NA 17.65 26.63
3 2016 14.32 24.56 24.78
4 2015 NA NA 23.96
5 2016 NA NA 26.26
6 2017 NA NA 24.70
PM25.temporal.coverage.... PM10.temporal.coverage....
1 18 NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
NO2.temporal.coverage....
1 NA
2 83.96119
3 87.93260
4 97.85388
5 96.04964
6 89.29224
Reference
1 U.S. Department of State, United States Environmental Protection Agency
2 European Environment Agency (downloaded in 2021)
3 European Environment Agency (downloaded in 2021)
4 European Environment Agency (downloaded in 2021)
5 European Environment Agency (downloaded in 2021)
6 European Environment Agency (downloaded in 2021)
Number.and.type.of.monitoring.stations Version.of.the.database Status
1 <NA> 2022 NA
2 <NA> 2022 NA
3 <NA> 2022 NA
4 <NA> 2022 NA
5 <NA> 2022 NA
6 <NA> 2022 NA
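The first rows already contain many NA entries, so it may help to count missing values per column before going further. The sketch below rebuilds only the three pollutant columns from the six rows printed above; the object name `head_rows` is an assumption made for illustration.

```r
# First six rows of the three pollutant columns, copied from the head(data) output above.
head_rows <- data.frame(
  PM2.5...g.m3. = c(119.77, NA, 14.32, NA, NA, NA),
  PM10...g.m3.  = c(NA, 17.65, 24.56, NA, NA, NA),
  NO2...g.m3.   = c(NA, 26.63, 24.78, 23.96, 26.26, 24.70)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(head_rows))
na_counts

# Share of missing values in each column (between 0 and 1).
na_share <- colMeans(is.na(head_rows))
na_share
```

On the full dataset the same two calls would be applied to `data` directly.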
Check Column Names
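This step checks the column names to understand the dataset structure. The table below shows the first rows of the cleaned data, which keeps seven columns under shorter names (Country, City, Year, PM25, PM10, NO2, Ozone).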
Country City Year PM25 PM10 NO2 Ozone
1 Afghanistan Kabul 2019 119.77 NA NA NA
2 Albania Durres 2015 NA 17.65 26.63 83.96119
3 Albania Durres 2016 14.32 24.56 24.78 87.93260
4 Albania Elbasan 2015 NA NA 23.96 97.85388
5 Albania Elbasan 2016 NA NA 26.26 96.04964
6 Albania Elbasan 2017 NA NA 24.70 89.29224
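The code that produced the table above is not shown in the report. One way it could have been obtained is by selecting and renaming columns, as in the sketch below. The mapping is an assumption inferred from the printed values: the Ozone values in the table are identical to the NO2.temporal.coverage.... values in the earlier head(data) output, so that column is used here. Whether this mapping is intended cannot be confirmed from the report and is worth checking against the original data.

```r
# Toy frame using the column names printed by head(data); values copied from rows 1 to 3.
raw <- data.frame(
  WHO.Country.Name          = c("Afghanistan", "Albania", "Albania"),
  City.or.Locality          = c("Kabul", "Durres", "Durres"),
  Measurement.Year          = c(2019L, 2015L, 2016L),
  PM2.5...g.m3.             = c(119.77, NA, 14.32),
  PM10...g.m3.              = c(NA, 17.65, 24.56),
  NO2...g.m3.               = c(NA, 26.63, 24.78),
  NO2.temporal.coverage.... = c(NA, 83.96119, 87.93260)
)

# Assumed mapping: new short name = original column name.
column_map <- c(
  Country = "WHO.Country.Name",
  City    = "City.or.Locality",
  Year    = "Measurement.Year",
  PM25    = "PM2.5...g.m3.",
  PM10    = "PM10...g.m3.",
  NO2     = "NO2...g.m3.",
  Ozone   = "NO2.temporal.coverage...."  # assumption inferred from matching values
)

# Select the mapped columns, then apply the short names.
clean_data <- raw[, column_map]
names(clean_data) <- names(column_map)
head(clean_data)
```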
Check Data Structure
This step inspects the type of each variable in the cleaned data.
str(clean_data)
'data.frame': 32191 obs. of 7 variables:
$ Country: chr "Afghanistan" "Albania" "Albania" "Albania" ...
$ City : chr "Kabul" "Durres" "Durres" "Elbasan" ...
$ Year : int 2019 2015 2016 2015 2016 2017 2015 2016 2014 2015 ...
$ PM25 : num 119.8 NA 14.3 NA NA ...
$ PM10 : num NA 17.6 24.6 NA NA ...
$ NO2 : num NA 26.6 24.8 24 26.3 ...
$ Ozone : num NA 84 87.9 97.9 96 ...
Select Numeric Data
We extract only the numeric pollutant variables (PM25, PM10, NO2, Ozone) for the correlation analysis. Year is also numeric but is left out.
PM25 PM10 NO2 Ozone
1 119.77 NA NA NA
2 NA 17.65 26.63 83.96119
3 14.32 24.56 24.78 87.93260
4 NA NA 23.96 97.85388
5 NA NA 26.26 96.04964
6 NA NA 24.70 89.29224
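The selection code is not shown in the report. A minimal sketch of one way to obtain the same four columns is given below, assuming `clean_data` has the structure printed by `str(clean_data)`; the toy rows are copied from the output above.

```r
# Toy version of clean_data with the same column names and types as in the str() output.
clean_data <- data.frame(
  Country = c("Afghanistan", "Albania", "Albania"),
  City    = c("Kabul", "Durres", "Durres"),
  Year    = c(2019L, 2015L, 2016L),
  PM25    = c(119.77, NA, 14.32),
  PM10    = c(NA, 17.65, 24.56),
  NO2     = c(NA, 26.63, 24.78),
  Ozone   = c(NA, 83.96119, 87.93260),
  stringsAsFactors = FALSE
)

# Flag numeric columns; integer columns such as Year also count as numeric.
is_num <- vapply(clean_data, is.numeric, logical(1))

# Keep numeric columns but drop Year, which is an identifier rather than a measurement.
numeric_data <- clean_data[, is_num & names(clean_data) != "Year", drop = FALSE]
names(numeric_data)
```

Selecting the columns by name, for example `clean_data[, c("PM25", "PM10", "NO2", "Ozone")]`, gives the same result and makes the intent explicit.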
Create Correlation Matrix
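This computes the pairwise Pearson correlation coefficients between the variables. With use = "complete.obs", only rows that have no missing value in any of the variables are used.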
cor_matrix <- cor(numeric_data, use = "complete.obs")
cor_matrix
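To show how a correlation matrix can feed a network graph, the sketch below computes one on a small toy frame and converts it to an edge list. The numbers in `toy` are made up purely for illustration and are not taken from the WHO dataset; the names `toy`, `edges`, and `strong_edges` and the 0.5 threshold are assumptions.

```r
# Made-up toy values (NOT from the dataset); the last row has a missing PM25.
toy <- data.frame(
  PM25 = c(10, 20, 30, 40, NA),
  PM10 = c(15, 25, 45, 50, 60),
  NO2  = c(30, 20, 25, 10, 5)
)

# "complete.obs" drops every row that contains at least one NA.
n_used <- sum(complete.cases(toy))
cor_matrix <- cor(toy, use = "complete.obs")

# Each cell above the diagonal is one undirected edge between two variables.
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(cor_matrix)[idx[, "row"]],
  to     = colnames(cor_matrix)[idx[, "col"]],
  weight = cor_matrix[idx]
)

# Keep only edges whose absolute correlation reaches a chosen threshold (hypothetical: 0.5).
strong_edges <- edges[abs(edges$weight) >= 0.5, ]
strong_edges
```

An edge list in this shape can be passed to `igraph::graph_from_data_frame(strong_edges, directed = FALSE)` to build the graph object. Because so many rows in the real data have missing values, it is worth checking how many rows "complete.obs" actually keeps; `use = "pairwise.complete.obs"` is an alternative that uses all rows available for each pair of variables.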