PM10 Spatiotemporal Patterns in Portugal: Functional Data Analysis in 2017
Air pollution severely affects human health, the environment, materials, and the economy, and has emerged as a key issue for microclimate and air quality regulation. Hence, the spatial and temporal characterisation of air pollutants and of their relationship with meteorological constraining factors is paramount, particularly from a climate change perspective. Here, the spatial and temporal characterisation of air pollutants over Portugal is performed, focusing particularly on emissions of particulate matter (PM) during the major wildfire events of 2017-2018. The analysis is based on Copernicus Atmosphere Monitoring Service (CAMS) data, which provide reliable, gridded information on atmospheric composition and its related processes anywhere in the world. Specifically, gridded air quality (AQ) data over the Portuguese territory make it possible to assess AQ environmental emergencies in areas that are poorly covered by the national air quality network. Within this context, we propose an exploratory statistical tool that combines functional data analysis (FDA) with unsupervised learning algorithms and spatial statistics to extract meaningful information about the main spatiotemporal patterns underlying air pollutant exceedances in mainland Portugal. Firstly, we describe the temporal evolution of air pollutant concentrations at each CAMS grid node as a function of time and outline the main temporal patterns of variability using a functional principal component analysis. Then, CAMS grid nodes are classified according to their spatiotemporal similarities through hierarchical clustering adapted to spatially correlated functional data.
environmental monitoring, air pollution, CAMS data analysis, spatial decision support systems
1 CAMS data
Data are stored in a NetCDF file and contain the mass concentration of PM10 ambient aerosol in air (\(\mu g/m^3\)) for Portugal, for the period between 2017-05-01 and 2017-10-31. Data were provided by the CAMS European air quality interim reanalysis. For illustrative purposes, PM10 concentrations for four of those days are shown below.
2 CAMS grid
The original coordinate reference system is latitude-longitude WGS84 (EPSG:4326). For analysis, the spatial grid was projected to the ETRS89/Portugal TM06 (EPSG:3763) coordinate system. The projected grid (shown below) holds PM10 concentrations in an array with 70 rows and 34 columns for each of the 184 days between 2017-05-01 and 2017-10-31.
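As a rough illustration of the array layout described above, the following sketch builds a placeholder array with the stated dimensions and reshapes it so that each grid node carries one daily time series. The random values and all variable names are assumptions made for illustration; they are not CAMS data and not the original code.

```python
from datetime import date, timedelta

import numpy as np

# Study period stated in the text (both ends inclusive).
start, end = date(2017, 5, 1), date(2017, 10, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

n_rows, n_cols = 70, 34  # projected grid size stated in the text

# Placeholder values standing in for PM10 concentrations (NOT real data).
rng = np.random.default_rng(0)
pm10 = rng.gamma(shape=2.0, scale=10.0, size=(n_rows, n_cols, len(days)))

# One row per grid node, one column per day: a convenient layout for
# treating each node's record as a single curve.
series = pm10.reshape(n_rows * n_cols, len(days))

print(len(days), series.shape)
```

With this layout, the 70 x 34 grid yields 2380 node-level series of 184 daily values each.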
3 Time-series & functional data
The plots show the time-series data (top) and the smoothed curves (bottom) fitted to them with cubic splines (with knots every 2 days). The more severe wildfire events (June, July, August and October) can be spotted in both plots.
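The smoothing step can be sketched for a single grid node as follows. This is a minimal example assuming a least-squares cubic spline fitted with SciPy's `LSQUnivariateSpline`; the original analysis may have used a different basis construction or a roughness penalty, and the synthetic series below is a placeholder rather than CAMS data.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(1)

# Synthetic daily series for one grid node (placeholder, NOT CAMS data):
# a slow seasonal background, one sharp peak in mid-October, and noise.
day = np.arange(184, dtype=float)        # day index, 0 = 2017-05-01
y = (20.0
     + 5.0 * np.sin(2.0 * np.pi * day / 184.0)
     + 80.0 * np.exp(-0.5 * ((day - 167.0) / 2.0) ** 2))
y += rng.normal(scale=3.0, size=day.size)

# Interior knots every 2 days; SciPy adds the boundary knots itself.
knots = np.arange(2.0, 183.0, 2.0)

# Least-squares cubic spline (k=3) fitted to the daily values.
spline = LSQUnivariateSpline(day, y, knots, k=3)
smooth = spline(day)                     # smoothed curve on the daily grid
```

Because the knots are dense relative to the daily sampling, the fitted curve stays close to the data and preserves short, sharp peaks such as those produced by wildfire events.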
4 Temporal covariance
As a consequence of wildfire events, PM10 concentrations tend to increase dramatically over a very short period of time. The next plot shows the correlation matrix (normalized covariance) between pairs of days during the analysed period. In the plot, a pattern of negative or low correlations (blue-yellow) is visible for the days around the wildfire events that occurred within the studied period.
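To make the idea of a day-by-day correlation matrix concrete, the sketch below treats days as variables and grid nodes as observations, and normalizes the covariance by the daily standard deviations. The synthetic curves, the size of the event and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, n_days = 500, 184

# Placeholder curves (NOT CAMS data): a persistent background level per node...
level = rng.normal(loc=20.0, scale=4.0, size=(n_nodes, 1))
curves = level + rng.normal(scale=1.0, size=(n_nodes, n_days))

# ...plus a short event (days 165-169) that hits only a subset of nodes.
hit = rng.random(n_nodes) < 0.3
curves[hit, 165:170] += rng.gamma(shape=2.0, scale=40.0, size=(int(hit.sum()), 5))

# Days are the variables, grid nodes the observations.
cov = np.cov(curves, rowvar=False)       # 184 x 184 covariance between days
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)            # normalized covariance = correlation

ordinary = corr[10, 100]     # two days far from the event
with_event = corr[10, 167]   # an ordinary day against an event day
print(f"ordinary pair: {ordinary:.2f}, pair involving the event: {with_event:.2f}")
```

In this toy setting, pairs of ordinary days are strongly correlated because they share the same background level, whereas any pair involving an event day shows a much lower correlation, which is the kind of pattern described for the wildfire periods.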
5 Decomposing temporal variability
The functional data were decomposed into principal components based on the covariance matrix. The following plot illustrates the results of the functional principal component analysis for the first 5 components, namely the proportion of overall variability explained by each component (absolute and cumulative). The first component is the most relevant, as it represents the dominant mode of variation in the functional data and accounts for 55% of overall variability. The explained variability decreases from one component to the next.
From the same plot we can see that, for example, the first two components account for 73% of the variation.
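The absolute and cumulative proportions can be sketched as follows. This is a simplified stand-in that applies an ordinary principal component analysis to curves evaluated on the daily grid, which may differ from how the functional decomposition was actually computed; the synthetic curves and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_nodes, n_days = 400, 184
day = np.arange(n_days)

# Placeholder curves (NOT CAMS data) built from two known temporal patterns.
bump_oct = np.exp(-0.5 * ((day - 167) / 3.0) ** 2)
bump_jun = np.exp(-0.5 * ((day - 48) / 3.0) ** 2)
curves = (20.0
          + rng.normal(scale=30.0, size=(n_nodes, 1)) * bump_oct
          + rng.normal(scale=15.0, size=(n_nodes, 1)) * bump_jun
          + rng.normal(scale=1.0, size=(n_nodes, n_days)))

# Centre on the mean curve, then decompose with an SVD.
centred = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)

eigenvalues = s ** 2 / (n_nodes - 1)          # variance along each component
explained = eigenvalues / eigenvalues.sum()   # proportion per component
cumulative = np.cumsum(explained)             # running total

for k in range(5):
    print(f"PC{k + 1}: {explained[k]:.1%} (cumulative {cumulative[k]:.1%})")
```

The cumulative proportion is simply the running sum of the individual proportions, which is how the 55% of the first component and the 18% of the second add up to the 73% quoted for the first two components.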
6 Temporal patterns
The next plots show the temporal decomposition of functional data variability as seen by the functional principal components. These functions are the functional eigenvectors (eigenfunctions) associated with the principal components. For PC1, representing 55% of overall variability, the eigenfunction reveals the overall trend in PM10 concentrations, which is dominated by a major forest wildfire event that occurred in mid-October. The high concentrations of PM10 and the high variability at that time make this period the most remarkable in terms of overall variability. Locations (i.e. pixels) with the highest positive scores in PC1 are the ones with the highest above-average PM10 concentrations.
The eigenfunction associated with PC2 (which represents 18% of overall variability) highlights a different profile of variability, characterized by a higher frequency of ‘curve bumps’ (in June, August and early October) representing periods of relevant variability in PM10 concentrations (probably due to other, although smaller, wildfire events).
7 Spatial patterns
These maps show the score of each pixel and help to understand which locations (i.e. pixels) are most relevant for describing the temporal variations identified in the principal component eigenfunctions (a pixel with a high score has a curve that contributes to increased variability). Looking at both these maps and the eigenfunctions, we can establish the link between space and time and assess which locations are most strongly linked with the ‘curve bumps’ seen in the eigenfunctions.
The highest scores in the first component are associated with the wildfire that occurred in mid-October. In connection with Hurricane Ophelia, the prevailing wind directions during the wildfire were south and southwest. The highest scores in the second component are associated with other wildfire events that occurred in June, July and August, possibly in different regions of the centre-south of Portugal.
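A score map of this kind can be sketched by projecting each pixel's centred curve onto an eigenfunction and reshaping the result back to the grid. The example below again uses an ordinary PCA on daily values as a stand-in, with a synthetic event whose strength grows with the row index; the data and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_rows, n_cols, n_days = 70, 34, 184
day = np.arange(n_days)

# Placeholder data (NOT CAMS data): a mid-October event whose strength
# increases with the row index of the grid.
strength = np.linspace(0.0, 100.0, n_rows)[:, None] * np.ones((1, n_cols))
bump = np.exp(-0.5 * ((day - 167) / 3.0) ** 2)
cube = (20.0
        + strength[:, :, None] * bump
        + rng.normal(scale=1.0, size=(n_rows, n_cols, n_days)))

curves = cube.reshape(n_rows * n_cols, n_days)
centred = curves - curves.mean(axis=0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)

# The sign of an eigenfunction is arbitrary; orient PC1 so that positive
# scores mean above-average concentrations.
pc1 = Vt[0] if Vt[0].sum() >= 0 else -Vt[0]

# Score of each pixel = projection of its centred curve on the eigenfunction.
scores = centred @ pc1
score_map = scores.reshape(n_rows, n_cols)    # back to the grid for mapping
```

Pixels where the synthetic event is strongest receive the highest scores, which is the reading applied to the maps: a high score marks a location whose curve contributes strongly to the pattern captured by that component.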
8 Spatial correlation
Are the curves spatially correlated, i.e. do the curves of neighbouring pixels tend to be similar? The semi-variogram (known as the trace-variogram in the functional case) shows a clear spatial correlation between neighbouring pixels.
We fitted an exponential variogram model with no nugget effect, an estimated sill of 7882 (units: \([\mu g/m^3]^2\)) and an estimated range parameter of 153 km. These estimates indicate that PM10 concentrations are spatially correlated up to distances of about 153 km.
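An empirical trace-variogram can be sketched by averaging, within distance bins, half the integrated squared difference between pairs of curves. The sketch below assumes a regular grid with 20 km spacing, a unit time step for the integral and synthetic curves; none of these values come from the original analysis.

```python
import numpy as np

rng = np.random.default_rng(5)

# Placeholder setup (NOT CAMS data): nodes on a 12 x 12 grid, 20 km apart.
gx, gy = np.meshgrid(np.arange(12) * 20.0, np.arange(12) * 20.0)
xy = np.column_stack([gx.ravel(), gy.ravel()])      # coordinates in km
n_nodes, n_days = xy.shape[0], 184

# Spatially smooth amplitude field, so neighbouring curves look alike.
amp = 50.0 * np.sin(xy[:, 0] / 80.0) * np.cos(xy[:, 1] / 80.0)
day = np.arange(n_days)
bump = np.exp(-0.5 * ((day - 167) / 3.0) ** 2)
curves = 20.0 + amp[:, None] * bump + rng.normal(scale=0.5, size=(n_nodes, n_days))

# Pairwise separation distances and integrated squared curve differences.
i, j = np.triu_indices(n_nodes, k=1)
h = np.linalg.norm(xy[i] - xy[j], axis=1)
sq_l2 = ((curves[i] - curves[j]) ** 2).sum(axis=1)  # Riemann sum, dt = 1 day

# Distance bins centred on 20, 40, ..., 120 km.
edges = np.arange(0.0, 140.0, 20.0) + 10.0
centres = 0.5 * (edges[:-1] + edges[1:])
which = np.digitize(h, edges)                       # 1..6 inside the bins

# Trace-variogram estimate: half the mean squared L2 difference per bin.
gamma = np.array([0.5 * sq_l2[which == b].mean() for b in range(1, len(edges))])

for c, g in zip(centres, gamma):
    print(f"{c:5.0f} km: {g:8.1f}")
```

Values that grow with distance at short lags indicate that nearby curves are more alike than distant ones, which is what a clear spatial correlation looks like in such a plot.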
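For reference, the fitted model can be evaluated as below. The parametrisation sill * (1 - exp(-h / range)) is an assumption: it is a common form of the exponential model, but fitting software differs in how the range is defined, so the original fit may use another convention.

```python
import numpy as np

SILL = 7882.0      # (µg/m³)², value reported in the text
RANGE_KM = 153.0   # range parameter reported in the text


def exponential_variogram(h_km, sill=SILL, range_km=RANGE_KM):
    """Exponential model without nugget (assumed form): sill * (1 - exp(-h / range))."""
    h_km = np.asarray(h_km, dtype=float)
    return sill * (1.0 - np.exp(-h_km / range_km))


lags = np.array([0.0, 50.0, 153.0, 459.0])
for h, g in zip(lags, exponential_variogram(lags)):
    print(f"h = {h:5.0f} km  gamma = {g:7.1f}  ({g / SILL:.0%} of the sill)")
```

Under this assumed parametrisation the model reaches about 63% of the sill at a distance equal to the range parameter and about 95% at three times that distance, so the distance up to which curves are effectively correlated depends on which convention the reported 153 km follows.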
9 Hierarchical classification
Using the functional principal component results, we now aggregate the pixels into groups based on the dissimilarity of their curves. To this end, a hierarchical clustering (HC) technique was used. HC computes a dissimilarity matrix, based on the dissimilarity between pixel curves, which is then used to cluster the pixels into homogeneous groups. Here we explicitly use the spatial correlation model fitted previously (the variogram) to weight the dissimilarity matrix (so more weight is given to dissimilarities between pairs of curves from locations close to each other).
The number of clusters is set by the user. Here we set 4 clusters, with the agglomeration method ‘Ward D2’.
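The clustering step can be sketched with SciPy as follows. The weighting scheme used here, multiplying each curve dissimilarity by the variogram value at the corresponding separation distance, is one option found in work on spatially correlated functional data and is an assumption about the method, as are the synthetic grid and curves. SciPy's `ward` linkage is used as the counterpart of ‘Ward D2’; SciPy documents it as strictly valid for Euclidean distances only, so results on weighted dissimilarities should be read with care.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(6)

# Placeholder setup (NOT CAMS data): 10 x 10 grid with 20 km spacing.
gx, gy = np.meshgrid(np.arange(10) * 20.0, np.arange(10) * 20.0)
xy = np.column_stack([gx.ravel(), gy.ravel()])
day = np.arange(184)
bump = np.exp(-0.5 * ((day - 167) / 3.0) ** 2)
amp = np.where(xy[:, 0] < 100.0, 10.0, 150.0)     # two contrasting regions
curves = 20.0 + amp[:, None] * bump + rng.normal(scale=1.0, size=(100, 184))

# Curve dissimilarity (L2 distance on the daily grid) and spatial separation,
# both in SciPy's condensed pair order.
d_curves = pdist(curves, metric="euclidean")
h = pdist(xy, metric="euclidean")

# Variogram weight (assumed exponential form, sill and range from the text).
gamma = 7882.0 * (1.0 - np.exp(-h / 153.0))
d_weighted = d_curves * gamma                      # assumed weighting scheme

# Ward linkage on the weighted dissimilarities, then cut into 4 groups.
tree = linkage(d_weighted, method="ward")
labels = fcluster(tree, t=4, criterion="maxclust")
cluster_map = labels.reshape(10, 10)               # back to the grid
```

Cutting the tree with `criterion="maxclust"` mirrors the user-set number of clusters: the tree is built once, and the choice of 4 groups is applied only when labels are extracted.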
The spatial pattern of the clusters matches the results provided by the CAMS images and summarized by the first principal component. This is expected, as the first principal component provides an overall trend dominated by the variability of PM10 concentrations (~55% of total variability) caused by the wildfire in mid-October.
10 Curves by cluster
The next plots show the curves by cluster, as defined by the hierarchical clustering algorithm. The black line, shown in all plots, is the overall median curve. The clustering method seems to capture distinct patterns of PM10 curves and to group them into homogeneous clusters.
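The comparison between cluster curves and a common reference can be sketched as follows. The example assumes that the median curve is the pointwise (day-by-day) median; a depth-based functional median would be an alternative, and the source does not state which was used. Labels and curves are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)
n_nodes, n_days, n_clusters = 300, 184, 4

# Placeholder inputs (NOT CAMS data): one cluster label and one curve per node.
labels = rng.integers(1, n_clusters + 1, size=n_nodes)
day = np.arange(n_days)
bump = np.exp(-0.5 * ((day - 167) / 3.0) ** 2)
curves = (20.0
          + (labels * 25.0)[:, None] * bump
          + rng.normal(scale=2.0, size=(n_nodes, n_days)))

# Reference curve drawn in every panel: pointwise median over all nodes.
overall_median = np.median(curves, axis=0)

# One pointwise median curve per cluster, for comparison with the reference.
cluster_median = {k: np.median(curves[labels == k], axis=0)
                  for k in range(1, n_clusters + 1)}

for k, curve in cluster_median.items():
    gap = curve[167] - overall_median[167]
    print(f"cluster {k}: difference from overall median on day 167 = {gap:+.1f}")
```

Plotting each cluster's curves against the same reference line makes it easy to see which groups sit above or below the overall behaviour during a given event.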