1 General introduction

In the 21st century, the largest single threat to the ecology and biodiversity of the planet is global climate disruption and mass environmental degradation caused by the buildup of human-generated greenhouse gases and air pollutants in the atmosphere. Although all kinds of air pollutants are harmful to human beings, the most dangerous are fine and coarse particulate matter (PM2.5 and PM10), because they can penetrate deep into the lungs and the bloodstream unfiltered. In turn, they age a person’s lungs faster and reduce their function, increase the risk of chronic obstructive pulmonary disease (COPD), and cause permanent DNA mutations, heart attacks and cardiovascular disease, premature delivery, birth defects, low birth weight and premature death. It has been observed that a high prevalence of these health effects is strongly connected with the weather and meteorological conditions in a given location.

2 The effects of weather conditions on local air pollutants concentration

Continuous changes in weather and climate extremes have had a massive impact on the prevalence of local air pollutants. According to climate scientists, sunshine, precipitation, wind, humidity, and atmospheric pressure can all affect the air quality in an area.

During heavy rain in the wet season, the air becomes stagnant and traps emitted PM pollutants, and the rain then washes out water-soluble PM, often resulting in a decrease in their atmospheric concentration. On the other hand, in the dry season, intense sunshine produces heat waves that dry out vegetation and provide more fuel for wildfires and natural fire outbreaks, whose smoke is a serious air pollutant.

The dry season also raises the atmospheric temperature, which affects pollutant concentrations because it speeds up the atmospheric chemical reactions that form harmful compounds and sometimes produce smog.

Atmospheric scientists are highly confident that wind speed and direction play a significant role in the concentration of air pollutants. Wind carries air contaminants away from their sources, lowering the concentration there and raising it in the area downwind. For example, a northerly wind carries pollutants from the north to the south, and the westerlies carry them from west to east. Furthermore, high wind speeds can also raise clouds of dust, especially in the dry season, which is a problem in dry, windy rural areas. Generally, the higher the wind speed, the more the contaminants are dispersed and the lower the air pollutant concentration in the zone.
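The direction convention used in the example above (a wind is named after the direction it blows from) can be illustrated with a minimal Python sketch. The helper names `wind_sector` and `blows_toward` and the 8-sector split are assumptions made for illustration; they are not part of the original analysis.

```python
# Hypothetical helpers: map a wind direction in degrees to a compass sector.
# Meteorological convention: the direction names where the wind comes FROM.
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def wind_sector(wd_degrees: float) -> str:
    """Return the 8-point compass sector the wind is blowing from."""
    # Each sector spans 45 degrees and is centred on its compass bearing.
    index = int(((wd_degrees % 360) + 22.5) // 45) % 8
    return SECTORS[index]


def blows_toward(wd_degrees: float) -> str:
    """Return the sector the wind is heading to (opposite of its origin)."""
    return wind_sector(wd_degrees + 180)


print(wind_sector(0), "->", blows_toward(0))      # a northerly wind
print(wind_sector(270), "->", blows_toward(270))  # a westerly wind
```

Under this convention, pollutants emitted at a source are expected to be carried toward the sector returned by `blows_toward`.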

This work aims to give a first intuitive understanding of the direct effects of these weather conditions, mainly wind speed and direction, on regional harmful PM concentrations through data visualization techniques.

3 Employed data set

The data set used was imported from one of the hundreds of air quality monitoring sites in the United Kingdom.

3.1 Source of data

The data set is publicly available and was obtained from data archived in the London Air Quality Archive http://www.londonair.org.uk and from the openair project website http://www.openair-project.org.

3.3 Import and view the dataset

Let’s get the dataset first, then check the available variables and their names.

3.4 Dataset description

This dataset contains hourly measurements of different air pollutant concentrations in \(\mu g\,m^{-3}\), with wind speed and direction measured in \(m\,s^{-1}\) and degrees respectively. It was collected at the Marylebone air quality monitoring station (London, UK) from 1st January 1998 to 23rd June 2005.

The dataset has \(65,533\) observations and \(10\) variables. The first column, date, uniquely identifies each observation in the year-month-day hour:minute:second format. The remaining \(9\) features are described below:

  • ws (type: dbl): the wind speed, a numeric value with decimal points.
  • wd (type: int): the wind direction, an integer measured in degrees.
  • nox (type: int): the oxides of nitrogen, air pollutants formed by the reaction of nitrogen and oxygen gases in the air during combustion, especially at high temperatures.
  • no2 (type: int): the nitrogen dioxide air pollutant produced by road traffic and other fossil fuel combustion processes.
  • o3 (type: int): the ozone air pollutant which is formed when pollutants emitted by cars, power plants, industrial boilers, refineries, chemical plants, and other sources react chemically in the presence of sunlight.
  • pm10 (type: int): the atmospheric particulate matter (PM10) that has a diameter of less than 10 micrometres.
  • so2 (type: int): the air pollutant sulfur dioxide (SO2), which is mainly emitted by the burning of fossil fuels, coal, oil, and diesel or other materials that contain sulfur.
  • co (type: int): the carbon monoxide (CO) which is a toxic air pollutant produced in the incomplete combustion of carbon-containing fuels, such as gasoline, natural gas, oil, coal, and wood.
  • pm25 (type: int): the atmospheric particulate matter (PM2.5) that has a diameter of less than 2.5 micrometres.
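As a rough illustration of the layout described above, the following Python sketch parses one record with the stated date format and treats `NA` as missing. The `Observation` class, the helper names, the subset of columns, and the example values are all hypothetical; they are not taken from the Marylebone data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Observation:
    """One hourly record (only a subset of the 10 columns, for brevity)."""
    date: datetime
    ws: Optional[float]   # wind speed
    wd: Optional[int]     # wind direction in degrees
    pm10: Optional[int]
    pm25: Optional[int]


def parse_value(text, cast):
    """Convert a text field, treating an empty string or 'NA' as missing."""
    return None if text in ("", "NA") else cast(text)


def parse_row(row: dict) -> Observation:
    """Build an Observation from a mapping of column name to raw text."""
    return Observation(
        date=datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"),
        ws=parse_value(row["ws"], float),
        wd=parse_value(row["wd"], int),
        pm10=parse_value(row["pm10"], int),
        pm25=parse_value(row["pm25"], int),
    )


# Invented example values, used only to show the expected shape of a row.
example = parse_row(
    {"date": "1998-01-01 00:00:00", "ws": "0.6", "wd": "280",
     "pm10": "29", "pm25": "NA"}
)
print(example)
```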

4 Data preparation and preprocessing

Since data preparation is one of the most sensitive steps in data analytics, let’s prepare our data set for further analysis.

4.1 Tracing of missing values

The above graph visualizes the data as rectangles, with available data shown in grey and missing data in a clearly distinguishable red.

##  date    ws    wd   nox   no2    o3  pm10   so2    co  pm25 
##     0   632   219  2423  2438  2589  2162 10450  1936  8775

The above table gives the exact number of missing values for each variable of the data set. To get meaningful insight into the missing values, we also need to check the percentage of missing values in the entire dataset and in our target variables (PM2.5 and PM10).

## [1] 0.0482566

The rate of missing values in the data is \(4.8\%\), which is relatively low compared to the available data and is unlikely to bias our analysis.
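The overall rate can be reproduced from the per-variable counts listed above. The following minimal Python sketch assumes the rate is the number of missing cells divided by the total number of cells (rows times columns), which is consistent with the reported figure.

```python
# Missing-value counts per variable, copied from the table above.
missing = {
    "date": 0, "ws": 632, "wd": 219, "nox": 2423, "no2": 2438,
    "o3": 2589, "pm10": 2162, "so2": 10450, "co": 1936, "pm25": 8775,
}
n_rows = 65533  # number of observations stated in the dataset description

# Total cells = observations x variables; the rate is the missing share.
total_cells = n_rows * len(missing)
overall_rate = sum(missing.values()) / total_cells

print(round(overall_rate, 7))  # 0.0482566
```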

The chart below represents the missing values in both the coarse and fine particulate matter variables.

In the above graph, the points with no missing values are shown as a standard scatterplot in blue. The points for which PM10 is missing are shown in red along the y-axis, and those for which PM2.5 is missing in red along the x-axis. In addition, boxplots of both variables are drawn along the axes with and without the missing values (in red, the PM10 values where PM2.5 is missing; in blue, the PM2.5 values where PM10 is observed).

So, at what rate are PM2.5 and PM10 values missing?

## [1] 0.003299101
## [1] 0.0133902

The missing values of PM10 and PM2.5 account for \(0.33\%\) and \(1.34\%\) of all values in the dataset respectively.
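The two rates printed above match the missing counts of PM10 and PM2.5 divided by the total number of cells in the dataset. The sketch below, in Python, reproduces them and also shows, as an additional assumption-free comparison, the share of rows in which each variable is missing, which is ten times larger because the dataset has 10 columns.

```python
n_rows, n_cols = 65533, 10
missing = {"pm10": 2162, "pm25": 8775}  # counts from the table above

# Share of all cells in the dataset (matches the printed rates).
share_of_cells = {k: v / (n_rows * n_cols) for k, v in missing.items()}
# Share of that variable's own observations (rows).
share_of_rows = {k: v / n_rows for k, v in missing.items()}

for name in missing:
    print(name, round(share_of_cells[name], 7), round(share_of_rows[name], 4))
```

Which denominator is more informative depends on the question: the per-row share describes how much of each PM series is actually unavailable.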

Although a large number of observations is available and all variables are sufficiently represented, missing values in a data set can bias the analysis, especially multivariate estimates (i.e. correlation or regression estimates). We therefore exclude (delete) the missing values (NA) from our dataset for a smooth analysis.

4.2 Deleting the missing observations

We remove the observations with unavailable data, view the remaining data set, and present its final descriptive statistics for clarification.

## date   ws   wd  nox  no2   o3 pm10  so2   co pm25 
##    0    0    0    0    0    0    0    0    0    0

                                      Overall (n=42524)
Fine Particulate Matter (PM2.5)
  Mean (SD)                           22.0 (12.5)
  Median [Min, Max]                   20.0 [0.00, 381]
Coarse Particulate Matter (PM10)
  Mean (SD)                           35.1 (21.4)
  Median [Min, Max]                   32.0 [1.00, 800]
Wind direction
  Mean (SD)                           197 (94.4)
  Median [Min, Max]                   210 [0.00, 360]

It can be seen that no missing values remain in the data. As shown in the above table, we have a healthy mean of \(22.0\ \mu g\,m^{-3}\) and \(35.1\ \mu g\,m^{-3}\) for PM2.5 and PM10 respectively, which are below the international standard of \(65.4\) and \(150\ \mu g\,m^{-3}\) for PM2.5 and PM10 respectively.

Finally, the data set is clean, with no more missing values or other anomalies, so we can go ahead with the first step of data analytics (visualization).
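The deletion step used in this section (dropping every observation that has at least one missing value, often called listwise deletion) can be sketched in Python as follows. The function name `drop_incomplete` and the example rows are invented for illustration and are not taken from the Marylebone data.

```python
from statistics import mean, median


def drop_incomplete(rows):
    """Keep only the rows in which every field has a value."""
    return [
        row for row in rows
        if all(value is not None for value in row.values())
    ]


# Invented example rows; None stands for a missing value (NA).
rows = [
    {"pm10": 30, "pm25": 20},
    {"pm10": None, "pm25": 18},
    {"pm10": 42, "pm25": None},
    {"pm10": 36, "pm25": 24},
]

complete = drop_incomplete(rows)
pm25_values = [row["pm25"] for row in complete]

# Number of remaining rows and simple descriptive statistics for PM2.5.
print(len(complete), mean(pm25_values), median(pm25_values))
```

Listwise deletion is simple but discards a whole row even when only one pollutant is missing, which is why the number of observations drops noticeably after this step.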

5 Visualize the impact of weather conditions on PM concentration

Because particulate matter (PM2.5 and PM10) contains microscopic solids or liquid droplets that are small enough to be easily inhaled and cause serious health problems, we are highly motivated to present graphically its concentration, which is strongly affected by the aforementioned weather conditions prevailing in the region.

5.2 Yearly PM concentration

It can be seen that the PM concentration decreases over time and that the highest concentrations occur in three consecutive years (\(1999-2001\)) at this site. They were caused by the northerly wind, which blows from north to south and concentrates them in the south-western part of the site. The blue outer concentric circle represents the maximum wind speed, which results in a low concentration of PM air pollutants.

5.3 The year 2000, daily and monthly PM concentration

Since we have identified that a high PM concentration occurs in the year 2000, let’s present both its daily and monthly PM concentrations.

Now it is possible to see that PM is likely to be concentrated in the last week of the month, especially in January, March, June and July. Besides, lower concentrations occur in the last quarter of the year 2000, especially on 25th, 26th and 27th December 2000.

5.4 Seasonal PM concentration

The above graph gives very concise information on how the PM concentration is typically affected by the wind speed and direction prevailing in a given area in different seasons. The graphs also show that the percentage of time that the wind blows from a particular direction (the 4 cardinal directions N, S, W, E are indicated) changes with the PM distribution.

In fact, the above wind roses show that most of the time the wind at this site blows from the south-west, as indicated by the long spoke in the southwest direction. The highest wind frequency occurs in the summer season, with the dry southwest monsoon wind. This kind of wind concentrates particulate matter in the region and may sometimes bring heavy rain and storms. This change in the weather can produce flooding and even raise the wildfire threat, resulting in smog and smoke harmful to life in the area.

On the other hand, low concentrations of both PM2.5 and PM10 occur in the remaining seasons, with estimated concentrations of \(0-10\) and \(0-20\ \mu g\,m^{-3}\) respectively and reasonably low wind speeds prevailing in each season.

Let’s assess the direct impact of each season, together with the prevailing wind, on the PM concentration in the region:
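The seasonal split discussed above can be sketched in Python as a grouping of hourly values by season. The sketch assumes northern-hemisphere meteorological seasons (spring: March to May, summer: June to August, autumn: September to November, winter: December to February); the text does not state which definition was used, and the example records are invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed month-to-season mapping (meteorological seasons, northern hemisphere).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(records):
    """Average a pollutant by season from (timestamp, value) pairs."""
    groups = defaultdict(list)
    for timestamp, value in records:
        if value is not None:  # skip missing measurements
            groups[SEASON_BY_MONTH[timestamp.month]].append(value)
    return {season: mean(values) for season, values in groups.items()}


# Invented example records, not taken from the Marylebone data.
records = [
    (datetime(2000, 1, 15, 9), 20),
    (datetime(2000, 2, 15, 9), 30),
    (datetime(2000, 7, 15, 9), 40),
    (datetime(2000, 10, 15, 9), None),
]
print(seasonal_means(records))
```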

5.5 The weekdays and weekend PM concentration

The above bivariate polar plots indicate that both fine and coarse particulate matter are highly concentrated in the south and a small part of the north-east during weekdays, due to the high frequency of wind blowing from the north-east. The weekend sees lower PM concentrations for various reasons, including reduced traffic and fewer human activities emitting PM in the region.

5.6 Daily PM concentration

The above plots indicate that atmospheric particulate matter is higher on weekdays than at the weekend, for reasons including heavier traffic and more human activity on working days than at the weekend.
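The weekday versus weekend comparison can be sketched in Python by labelling each timestamp and averaging within each group. The helper names and the example records are invented for illustration; Saturday and Sunday are assumed to form the weekend.

```python
from datetime import datetime
from statistics import mean


def day_type(timestamp: datetime) -> str:
    """Label a timestamp as 'weekend' (Saturday, Sunday) or 'weekday'."""
    # datetime.weekday() returns 0 for Monday up to 6 for Sunday.
    return "weekend" if timestamp.weekday() >= 5 else "weekday"


def mean_by_day_type(records):
    """Average a pollutant separately for weekdays and weekends."""
    groups = {"weekday": [], "weekend": []}
    for timestamp, value in records:
        if value is not None:  # skip missing measurements
            groups[day_type(timestamp)].append(value)
    return {label: mean(values) for label, values in groups.items() if values}


# Invented example records; 1 January 2000 was a Saturday.
records = [
    (datetime(2000, 1, 1, 12), 18),
    (datetime(2000, 1, 2, 12), 20),
    (datetime(2000, 1, 3, 12), 30),
    (datetime(2000, 1, 4, 12), 34),
]
print(mean_by_day_type(records))
```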

6 Daytime and Nighttime PM concentration

At this air quality monitoring site, the PM concentration is higher during the daytime than at night, and highest when the wind is blowing from the south.

6.1 Weighted Pearson Correlation between PM2.5 and PM10

Below, the Pearson correlation coefficient for the two pollutants is calculated and plotted in order to identify their possible sources based on the strength of their relationship.

The above graph shows that they are highly correlated, with a coefficient above \(70\%\). We can now identify their main sources based on the following statistical hypothesis: “Highly correlated pollutants originate directly from the same sources”.

The main prevailing sources of particulate matter at the Marylebone site were automobile fuel burning, industrial processes, and windblown dust.
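A weighted Pearson correlation of the kind named in this section can be sketched in Python as follows. This is a generic textbook formulation under stated assumptions: the text does not say how the weights were chosen, so they are simply passed in as a list, and the example values are invented.

```python
import math


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    # Weighted means of both series.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances around those means.
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)


# Invented example values; equal weights reduce to the ordinary coefficient.
pm25 = [1.0, 2.0, 3.0]
pm10 = [1.0, 3.0, 2.0]
print(weighted_pearson(pm25, pm10, [1.0, 1.0, 1.0]))
```

A coefficient close to 1 indicates that the two series rise and fall together; on its own it does not prove a common source, so the source attribution above remains a hypothesis.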

7 Conclusion

Here we evaluated and presented graphically the impact of weather conditions on air pollutant concentrations, specifically the most harmful particulate matter (PM), at the Marylebone air quality monitoring site in the United Kingdom (UK). We plotted the PM concentration over different periods of time, and we found that the weather conditions (wind, temperature, precipitation) have significant effects on air pollutant concentrations in each period considered.

We applied the Pearson correlation method to measure the strength of the linear relationship between coarse and fine particulate matter in order to identify and apportion their main sources at the site. Having identified the sources, we strongly recommend serious abatement of air pollutants and advise people, especially vulnerable populations (children, elderly people and those with pulmonary and cardiovascular diseases), to avoid exposure during outdoor physical exertion.

Finally, it is expected that in the near future Machine Learning algorithms will be widely used to investigate the statistical dependency between weather conditions and atmospheric particulate matter concentration at the Marylebone air quality monitoring site.