PM10 Spatiotemporal Patterns in Portugal: Functional Data Analysis in 2018
Air pollution severely affects human health, the environment, materials, and the economy, and is emerging as a key issue for microclimate and air quality regulation. Hence, the spatial and temporal characterisation of air pollutants and their relationship with meteorological constraining factors is paramount, particularly from a climate change perspective. We characterise air pollutants over Portugal in space and time, focusing on emissions of Particulate Matter (PM) during the major wildfire events of 2017-2018. The analysis is based on data from the Copernicus Atmosphere Monitoring Service (CAMS), which provides reliable, gridded information on atmospheric composition and its related processes anywhere in the world. In particular, gridded air quality (AQ) data over the Portuguese territory make it possible to assess AQ environmental emergencies in areas poorly covered by the national air quality network. Within this context, we propose an exploratory statistical tool that combines functional data analysis (FDA) with unsupervised learning algorithms and spatial statistics to extract meaningful information about the main spatiotemporal patterns underlying air pollutant exceedances in mainland Portugal. First, we describe the air pollutant concentration at each CAMS grid node as a function of time and outline the main temporal patterns of variability using functional principal component analysis. Then, CAMS grid nodes are classified according to their spatiotemporal similarities through hierarchical clustering adapted to spatially correlated functional data. The proposed methodology provides an automated and robust approach for quantifying temporal trends in PM10 time series data to support subsequent air quality classification.
environmental monitoring, air pollution, CAMS data analysis, spatial decision support systems
1 CAMS data
Data are stored in a NetCDF file and contain the mass concentration of PM10 ambient aerosol in air (\(\mu g/m^3\)) for Portugal from 2018-05-01 to 2018-10-31. Data were provided by the CAMS European air quality interim reanalysis. For illustrative purposes, PM10 concentrations for four of those days are shown below.
2 CAMS grid
The original coordinate reference system is latitude-longitude WGS84 (EPSG:4326). For analysis, the spatial grid was projected to the ETRS89/Portugal TM06 (EPSG:3763) coordinate system. The projected grid (shown below) holds PM10 concentrations in an array with 70 rows and 34 columns for each of the 184 days between 2018-05-01 and 2018-10-31.
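For the later steps it helps to view the grid as a collection of curves, one per grid node. The following is a minimal sketch, assuming the projected data sit in a NumPy array ordered as (row, column, day); the array names and the random values are placeholders, not the CAMS data.

```python
from datetime import date

import numpy as np

# Grid size quoted in the text; the day count follows from the two dates.
n_rows, n_cols = 70, 34
n_days = (date(2018, 10, 31) - date(2018, 5, 1)).days + 1

# Placeholder values standing in for PM10 concentrations (not real CAMS data).
rng = np.random.default_rng(0)
pm10_grid = rng.gamma(shape=2.0, scale=10.0, size=(n_rows, n_cols, n_days))

# One row per grid node, one column per day: each row is a daily time series.
curves = pm10_grid.reshape(n_rows * n_cols, n_days)

# Recover the (row, column) position of a node from its flat index.
node = 1000
row, col = np.unravel_index(node, (n_rows, n_cols))
```

Under this layout, every later step (smoothing, covariance, principal components, clustering) operates on the rows of `curves`, and results can be reshaped back to 70 by 34 for mapping.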
3 FDA
Time-series data (top) and smoothed curves (bottom) obtained by fitting cubic splines (with knots every 5 days) to the time series. Portugal faced a major forest fire in August 2018, which is visible in both plots. Smaller peaks appear in the remaining months of the period.
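The smoothing step can be illustrated with a small regression spline. The text does not state which spline basis or software was used, so the sketch below is an assumption: it fits an unpenalized cubic spline on a truncated-power basis by least squares, chosen only because it needs nothing beyond NumPy. The series is synthetic, and `cubic_spline_basis` is a hypothetical helper.

```python
import numpy as np

def cubic_spline_basis(t, knots):
    """Truncated-power basis for a cubic spline with the given interior knots."""
    t = np.asarray(t, dtype=float)
    polynomial = np.vander(t, 4, increasing=True)                 # 1, t, t^2, t^3
    truncated = np.clip(t[:, None] - knots[None, :], 0.0, None) ** 3
    return np.hstack([polynomial, truncated])

days = np.arange(184, dtype=float)      # day index within the study period
knots = np.arange(5.0, 184.0, 5.0)      # interior knots every 5 days

# Synthetic daily series with one sharp peak (placeholder, not CAMS data).
rng = np.random.default_rng(1)
series = (15.0
          + 40.0 * np.exp(-0.5 * ((days - 97.0) / 3.0) ** 2)
          + rng.normal(0.0, 2.0, days.size))

# Rescale time to [0, 1] to keep the least-squares problem better conditioned.
scale = days.max()
basis = cubic_spline_basis(days / scale, knots / scale)
coef, *_ = np.linalg.lstsq(basis, series, rcond=None)
smoothed = basis @ coef
```

With knots 5 days apart, the fitted curve can follow an event lasting about a week while averaging out day-to-day noise. A B-spline basis or a roughness penalty would be more usual in FDA software, at the cost of extra dependencies.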
4 Temporal covariance
As a consequence of extreme events caused by major wildfires, PM10 concentrations tend to increase dramatically over a very short period of time. The next plot shows the correlation matrix (normalized covariance) between pairs of days during the analysed period. In the plot, a pattern of negative or low correlations (blue-green-yellow) is visible for the days on which the wildfire occurred in August.
June was characterized by a mix of weather patterns. While there was a heatwave between 15 and 25 June in the North and some parts of the Central region, overall precipitation levels remained high. Indeed, according to the Portuguese Institute for Sea and Atmosphere (IPMA), June 2018 was one of the rainiest Junes since 2000. The strong pattern of negative correlations visible in the plot for the second half of June probably reflects the decrease in PM10 due to high precipitation levels.
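The day-by-day correlation matrix described above can be sketched as follows, treating each day as a variable observed across all grid nodes. This version works on raw daily values; whether the original plot was computed from raw or smoothed curves is not stated, and the data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, n_days = 200, 184

# Placeholder curves: one row per grid node, one column per day (not CAMS data).
curves = rng.normal(size=(n_nodes, n_days)).cumsum(axis=1)

# Each column (day) is a variable; each row (node) is an observation.
temporal_cov = np.cov(curves, rowvar=False)          # 184 x 184 covariance
temporal_corr = np.corrcoef(curves, rowvar=False)    # covariance scaled to [-1, 1]
```

Entry (s, t) of `temporal_corr` is close to 1 when nodes with high values on day s also tend to have high values on day t, and low or negative when the spatial pattern changes between the two days, as it does during a localized wildfire.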
5 FPCA
The functional data were decomposed into principal components based on the covariance matrix. The following plot shows the results of the functional principal component analysis for the first 5 components, namely the proportion of overall variability explained by each component (absolute and cumulative). The first component is the most relevant, as it represents the dominant mode of variation in the functional data and accounts for 72% of overall variability. The explained variability decreases from each component to the next.
From the same plot we can see that, for example, the first two components account for 82% of the variation.
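The explained and cumulative proportions come from the eigenvalues of the covariance. The sketch below is a discretized stand-in for FPCA, assuming curves sampled on a common daily grid; the original analysis presumably works on the spline representation, so this is only an approximation of the idea. The toy curves are built from two known modes and are not CAMS data.

```python
import numpy as np

rng = np.random.default_rng(3)
n_nodes, n_days = 300, 184
t = np.linspace(0.0, 1.0, n_days)

# Placeholder curves: a dominant mode, a localized peak, and small noise.
mode1 = np.sin(2 * np.pi * t)
mode2 = np.exp(-0.5 * ((t - 0.55) / 0.03) ** 2)
curves = (rng.normal(0.0, 3.0, (n_nodes, 1)) * mode1
          + rng.normal(0.0, 1.0, (n_nodes, 1)) * mode2
          + rng.normal(0.0, 0.1, (n_nodes, n_days)))

# Covariance between days after removing the mean curve.
centered = curves - curves.mean(axis=0)
cov = centered.T @ centered / (n_nodes - 1)

# eigh returns eigenvalues in ascending order, so reverse to put PC1 first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
eigenvalues = eigenvalues[::-1]
eigenfunctions = eigenvectors[:, ::-1]   # column k is the k-th discretized eigenfunction

explained = eigenvalues / eigenvalues.sum()   # proportion per component
cumulative = np.cumsum(explained)             # running total
```

The cumulative value for the first two components is `cumulative[1]`, the sum of the first two entries of `explained`.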
6 Temporal patterns
The next plots show the temporal decomposition of functional data variability as seen by the functional principal components. These functions are the functional eigenvectors (eigenfunctions) associated with the principal components. For PC1, representing 72% of overall variability, the eigenfunction reveals the overall trend in PM10 concentrations, which is dominated by a noisy pattern of successive rises and falls.
The eigenfunction associated with PC2 (which represents 11% of overall variability) highlights a different profile of variability, characterized by a peak reflecting the PM10 released during the wildfire event in Serra de Monchique on 3-10 August. Another peak, in late August, may be related to several wildfires in the Extremadura region (Spain), which may have contributed to the increase in PM10 levels observed in adjacent regions of Portugal through transboundary transport of smoke and fine particles.
7 Spatial patterns
Reading these maps together with the eigenfunctions links space and time: it shows which locations are most strongly associated with the peaks and bumps in the eigenfunction curves.
The first component reflects the influence of multiple factors, including traffic emissions, wildfires, and possibly Saharan dust intrusions. All these factors, individually and collectively, have contributed to the lower air quality found in coastal regions. The highest scores on the second component are associated with the wildfires that occurred in August.
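The maps referred to here are maps of principal component scores. The following is a minimal sketch of how such maps can be produced, assuming curves stored one row per grid node in row-major grid order. The grid is a small toy grid and the values are random placeholders.

```python
import numpy as np

n_rows, n_cols, n_days = 7, 5, 60    # small toy grid, not the 70 x 34 CAMS grid
rng = np.random.default_rng(4)
curves = rng.normal(size=(n_rows * n_cols, n_days))   # placeholder curves

centered = curves - curves.mean(axis=0)

# Rows of vt are the discretized eigenfunctions, ordered by explained variance.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Score of each node on each component: projection of its centered curve.
scores = centered @ vt.T

# Put the PC1 and PC2 scores back on the grid so they can be drawn as maps.
pc1_map = scores[:, 0].reshape(n_rows, n_cols)
pc2_map = scores[:, 1].reshape(n_rows, n_cols)
```

A node with a large positive score on a component has a curve that departs from the mean curve in the direction of that component's eigenfunction. Nodes with high PC2 scores are therefore those whose curves show the peak seen in the PC2 eigenfunction.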
8 Spatial correlation
Are curves spatially correlated, i.e., do curves of neighbouring pixels tend to be similar? The semi-variogram (known as the trace-variogram in the functional case) shows a clear spatial correlation between neighbouring pixels.
We fitted an exponential variogram model with no nugget effect, obtaining an estimated sill of 76 (units: \([\mu g/m^3]^2\)) and an estimated range parameter of 654 km. These estimates indicate that PM10 concentrations are correlated up to 654 km.
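To make the two ingredients concrete, the sketch below computes an empirical trace-variogram on toy data and evaluates an exponential model with the sill and range quoted above. Several things are assumptions: the function names are hypothetical, the time integral is approximated by a plain sum over daily values (the normalization used in the original analysis is not stated), and the model is written as sill × (1 − exp(−h / range)), which is one common parametrization.

```python
import numpy as np

def exponential_variogram(h, sill, range_par):
    """Exponential model without nugget: sill * (1 - exp(-h / range_par))."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_par))

def empirical_trace_variogram(coords, curves, bin_edges):
    """Half the mean integrated squared difference between curves, per distance bin."""
    i, j = np.triu_indices(len(coords), k=1)              # all pairs of nodes
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    # Integral over time approximated by a sum over daily values (assumption).
    sq_diff = ((curves[i] - curves[j]) ** 2).sum(axis=1)
    which = np.digitize(dist, bin_edges) - 1
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = which == b
        if in_bin.any():
            gamma[b] = 0.5 * sq_diff[in_bin].mean()
    return gamma

rng = np.random.default_rng(5)
coords = rng.uniform(0.0, 100.0, size=(60, 2))    # toy node locations in km
t = np.arange(30)
# Toy curves whose amplitude varies smoothly in space, so nearby nodes look alike.
curves = 0.1 * coords[:, [0]] * np.sin(t / 5.0) + rng.normal(0.0, 0.1, (60, 30))

bin_edges = np.array([0.0, 20.0, 40.0, 60.0, 80.0])
gamma_hat = empirical_trace_variogram(coords, curves, bin_edges)

# Model values at h = 0, at the range parameter, and at three times the range.
gamma_model = exponential_variogram(
    np.array([0.0, 654.0, 1962.0]), sill=76.0, range_par=654.0
)
```

Under this parametrization the model reaches about 63% of the sill at h equal to the range parameter and about 95% at three times that distance, which some software reports as the practical range. The text does not say which convention the 654 km figure follows.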
9 Hierarchical classification
Using the functional principal component results, we now aggregate the pixels into groups based on the dissimilarity between their curves. To this end, we used hierarchical clustering (HC). HC computes a dissimilarity matrix between pixel curves and uses it to cluster the pixels into homogeneous groups. Here we explicitly use the spatial correlation model fitted previously (the variogram) to weight the dissimilarity matrix (so more weight is given to dissimilarities between pairs of curves from locations close to each other).
The number of clusters is set by the user. Here we set 4 clusters, with the agglomeration method ‘Ward D2’.
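The clustering step can be sketched with a small NumPy implementation of Ward's criterion applied to a precomputed, spatially weighted dissimilarity matrix. The exact weighting is not given in the text, so the choice below is an assumption: each curve dissimilarity is multiplied by an exponential variogram evaluated at the pair's separation, which shrinks dissimilarities between nearby nodes. `ward_d2_clusters` is a hypothetical helper, and the data are toy values.

```python
import numpy as np

def ward_d2_clusters(dissimilarity, n_clusters):
    """Agglomerative clustering with Ward's criterion on squared dissimilarities
    (Lance-Williams update), stopped once n_clusters groups remain."""
    d2 = np.array(dissimilarity, dtype=float) ** 2
    n = d2.shape[0]
    np.fill_diagonal(d2, np.inf)
    size = np.ones(n)
    labels = np.arange(n)
    active = np.ones(n, dtype=bool)
    for _ in range(n - n_clusters):
        masked = np.where(active[:, None] & active[None, :], d2, np.inf)
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        others = active.copy()
        others[[i, j]] = False
        # Ward update: distance from the merged cluster (i + j) to every other one.
        total = size[i] + size[j] + size[others]
        new = ((size[i] + size[others]) * d2[i, others]
               + (size[j] + size[others]) * d2[j, others]
               - size[others] * d2[i, j]) / total
        d2[i, others] = new
        d2[others, i] = new
        size[i] += size[j]
        active[j] = False
        labels[labels == j] = i          # cluster i absorbs cluster j
    _, labels = np.unique(labels, return_inverse=True)   # relabel as 0..k-1
    return labels

rng = np.random.default_rng(6)
t = np.linspace(0.0, 1.0, 50)
centres = np.array([[0, 0], [0, 300], [300, 0], [300, 300]], dtype=float)  # toy, km
shapes = [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
          2 * t, np.exp(-((t - 0.5) / 0.1) ** 2)]
coords = np.vstack([c + rng.normal(0.0, 10.0, (6, 2)) for c in centres])
curves = np.vstack([5 * s + rng.normal(0.0, 0.2, (6, t.size)) for s in shapes])

# L2 distance between curves and geographic distance between nodes.
curve_dist = np.linalg.norm(curves[:, None, :] - curves[None, :, :], axis=2)
geo_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

# Assumed weighting: exponential variogram with the sill and range from the text.
weights = 76.0 * (1.0 - np.exp(-geo_dist / 654.0))
labels = ward_d2_clusters(curve_dist * weights, n_clusters=4)
```

The loop scans all pairs at every merge, which is fine for a few dozen nodes but slow for thousands; a library routine would be preferable at the scale of the full grid.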
The spatial pattern of clusters matches the results provided by the first principal component. This is expected, as the first principal component captures the overall trend and dominates the variability of PM10 concentrations (~72% of total variability).
Increased traffic during the summer months along the coastal region contributed to higher PM10 concentrations, especially during periods of atmospheric stability, when pollutants can accumulate near the surface.
The major wildfire activity in Monchique (Algarve, southern Portugal) released smoke and particulate matter over vast regions of southern Portugal and Spain. Moreover, wildfires in Extremadura (Spain) also contributed to transboundary pollution.
Finally, the dust transported from the Sahara Desert could have contributed to significant increases in PM10 concentrations, particularly in southern and coastal regions.
10 Curves by cluster
The next plots show the curves in each cluster, as defined by the hierarchical clustering algorithm. The black line, shown in all plots, is the overall median curve. The clustering method seems to capture distinct patterns in the PM10 curves and to group them into homogeneous clusters.
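The summaries behind such plots can be sketched as follows. The sketch assumes that the median curve is the pointwise (day-by-day) median over nodes; the text does not say whether a pointwise or a depth-based functional median was used. Curves and labels are placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)
n_nodes, n_days, n_clusters = 120, 184, 4

curves = rng.gamma(2.0, 10.0, size=(n_nodes, n_days))   # placeholder curves
labels = np.arange(n_nodes) % n_clusters                 # placeholder cluster labels

# Day-by-day median over all nodes: the reference line drawn in every panel.
overall_median = np.median(curves, axis=0)

# The same summary within each cluster, one row per cluster.
cluster_medians = np.vstack([
    np.median(curves[labels == k], axis=0) for k in range(n_clusters)
])
```

Plotting each row of `cluster_medians` against `overall_median` shows on which days a cluster sits above or below the typical node.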