PM2.5 is the type of particle matter (PM) whose particle size is less than or equal to 2.5 micrometres that causes a bunch of air quality problems with the rise of rapid industrialization in many places (Xing et al, 2016; He et al, 2001). Air pollution with PM2.5, mainly brings a visibility reduction and health concerns. Its small diameter that allows to pass through the nasal filtration and to be accumulated in our body through diffusion or air exchange in the lungs brings the risks of damaging respiratory systems or other parts of the body by carrying toxic materials with it (Xing et al, 2016). To capture the spatial distribution, it should be considered that the strong sources are likely to influence the PM concentrations evenly over large areas without topographic obstacles (Gehrig and Buchmann, 2003), therefore the creation of a continuous surface for visualizing PM2.5 spatial distribution will be required rather than using point pattern or spatial autocorrelation analysis.
Modelling techniques such as GIS are powerful tools to monitor and analyze geographic events associated with natural and artificial resources (Li and Heap, 2008). In terms of PM concentration, it would show continuous values across the surface as it travels through the atmosphere, therefore even if there are places with no measured values, theoretically, they should have some similar values to the one that was measured. This is derived from the “first law of geography” by Waldo Tobler saying, “Everything is relative to everything else but near things are more related than distant things” (2004). Spatial interpolation incorporates the concept of the first law of geography at work to generate a surface (either continuous or discontinuous) based on the known values at some specific locations by estimating unknown/unmeasured values from the known ones. One of the spatial interpolation method is Kriging. It has a lot of varieties such as Simple Kriging, Ordinary Kriging, Dual Kriging just to name a few (Li and Heap, 2008).Other non-geostatistical interpolation techniques include regression models, nearest neighbours, and Fourier series for instances (Li and Heap, 2008).Unfortunately, there is no perfect interpolation method to produce surface visualization. For instance, the Thiessen polygons technique is introduced as a simple but practical tool (Fiedler, 2003), but limitations can also be found in the creation of surface (Mu, 2009). From this context, it is required to examine which techniques will work better depending on the input data identify which method is better to visually represent the spatial distribution of PM 2.5 data based on the station measurements by comparing the outputs and methods themselves.
The are called “South Coast” was selected as the study site, which is shown in the pale blue colour near the bottom of figure 1. The air basin of South Coast stretches over the part of Los Angeles, Orange, Riverside and San Bernardino.
Figure 1: The map of California Air Basins
The climate at the air basin is characterized as subtropical for Los Angeles as an instance that describes somewhat mild, rainy winters and hot sunny summers (climates to travel). The temperature will range from 25℃ to 40℃ over a year, and the highest temperature was recorded as 45℃ in the city centre (climates to travel). When closely looking at the real data taken in the South Coast air basin, we can notice that the Air Quality Index (AQI) is utilized for the air quality forecast (the US air quality index goes: Good > Moderate > Unhealthy for sensitive groups > Unhealthy > Very unhealthy > Hazardous) (AirNow). As of November 10, 2020, the overall trend of air quality in the South Coast shows that 60% of the regions are good and the other 40% are described as moderate, so the general air quality in the area is quite good (South Coast AQMD). However, PM 2.5 seemed the major source of pollutants in this basin as well as PM10 and Ozone (South Coast AQMD).
Figure 2: The image of US Air Quality Index
Figure 3: The image of AQI forecast for November 10, 2020
The information about California Air Basins Datase are shown in the table 1.
The information about Air Monitoring Stations are shown in the table 2.
The PM2.5 dataset was downloaded as PM25HR_PICKDATA_2016-4-30.csv. The data was extracted from database on May 11, 2016. The data range was from April 1 to April 30, 2016.
After all the preparation of the datasets, the map of Mean and Maximum PM2.5 values for the monitoring stations look like the ones below. The map with Mean PM2.5 values and Max PM2.5 values for air monitoring stations in the South Coast Air Basin.
It is the simplest method in this study to create a surface from the known measurement points. The polygons will be generated by two major steps. Firstly, connect all points by straight line to be triangulated (forming the triangulated irregular network), and then draw the perpendicular bisector lines for each connecting lines. In other words, find such a straight line (bisector line) that has the same distance from the two points and connect all lines.
Figure 4: The image showing the Thiessen Polygon creation with besector lines
Image sourced from Bentley HAMMER CONNECT Edition Help
When this procedure is completed, the measurement value for each point will be assigned for each polygon as the representation of each area/polygon. Therefore, the flow from the input data points to output surface would look like the figure below.
Figure 5: The image showing input and output for Thiessen Polygon method
Image sourced from ArcGIS Pro.
IDW is designed to transform the point data into a raster dataset by assigning the predicted values for the cells that do not have measurements based on the measured points. The assumption of the first law of geography strongly influences when predicting the unknown values. We would incorporate that idea as the exponential power in the calculation. The formula for creating a surface with IDW is below:
Figure 6: The image showing the formula for Inverse Distance Weighting. More information about IDW can be retrived from here.
Where Zj represents the value at location j, dij is the distance between i and j, and the p is the exponential power that changes what degree we would like to weight by distance. An exponential power P is obviously important for the output through determining which degree we would put more weight on closer values than those of relatively distant. However, the extent of neighborhood is also the important parameter for IDW (ArcGIS Pro).
What we use here is an Ordinary Kriging. Kriging is categorized as geostatistical methods, meaning they use statistical models (ArcGIS Pro). An Ordinary Kriging aims to find the best weights from the semivariogram model. It shows the spatial variance over distance for the dataset. Each point on the semivariogram below shows the relationship between two points in terms of the distance between the two points and the difference in the measured value between those two points.
Figure 7: The image of semivariogram with all point data
We would simplify this plot as it is too messy to deal with. Bins that represent a certain range of distance, are assigned to do so. For instance, we can assign bins for each 100m distance range. They would be used for the calculation of semi-variance with the equation below.
Figure 8: The image showing the equation for calculating the semivariance with biins
Where N(h) is the number of point data in the designated bin, zi and zj are the values at the location i and j. The simplified semivariogram with bins should look like this.
Figure 9: The image of semivariogram with binned distance range
To create a source model from the point data, finding such a line on the semivariogram that holds the minimum error between the each point on the plot and the model is required. There are three characteristics to consider when doing this: sill, range and nugget (shown in the figure below).
Figure 11: The image of semivariogram with sill, nugget, and range
Three modeling methods are introduced in this study: Spherical, Exponential, Gaussian. How each model behaves on the semivariogram are slightly different at origin behavior and around the sill.
Figure 12: The image of semivariogram with three model types: spherical, exponential, and Gaussian
Spherical - can reach the sill at specific value and range.
Exponential - increases its value infinitely as it approaches the sill asymptotically at a certain range, which is until the 95% of the sill value.
Gaussian - similar to exponential method, but behave differently at the origin where the line shows relatively linear nature at first meaning low variability in values until the model hits the certain distance.
With one model above, Kriging forms weights from surrounding known points to predict the unknown values z at the location I, using the formula below:
Figure 13: The image of equation to calculate the unknown points based on the model
The figure here shows the output for mean PM2.5 data in South Coast Air Basin with Thiessen Polygon method. The value ranges from 6 to 14, where the white blue polygons show the lower PM2.5 values while the stronger purple shows the higher concentration of PM2.5. Three polygons ended up with missing values. The output with Maximum PM2.5 data is shown here. This one has a larger range than the mean values as it may contains some extremes.
There is no solid outcomes for IDW, so we modified the exponential power to see which power may work better over the others. The first map here is created with the exponential power of 1. We then started increasing the exponential power gradually from here. The map here is the surface created with the exponential power of 3.To contrast the effect of changing the exponential power in IDW, the map with power of 10 was also created. When comparing the maps with different power in IDW, it is noticeable that the higher the power is, the rapid the transition of PM2.5 values. The map with higher power shows the clear contrast of highs and lows in the map. Meanwhile, the one with lower power resulted in similar PM2.5 values across the study area. The most appropriate plot with Leave-one-out validation resulted in the exponential power of 0.001 shown in the figure here. The map with 95% confidence interval called jackknife plot was then created with the optimal power.
With keeping all the variables (sill, nugget, range) constant, three different models for Kriging were used to make a comparison. The variogram with exponential model is here. The variogram with spherical model is here. The variogram with gaussian model is here.
From the comparison among three models, the Gaussian model was selected as the most appropriate model to represent the semivariogram of point data. With the selected model and the parameters, where the sill was set as 15, range is 50000 and nugget is 0, we generated the interpolation surface map, Variance map, and 95% Confidence Interval map.
From this study, we can see that three methods that were used for spatial interpolation of the PM2.5 data enabled us to successfully produce the output surfaces for the whole South Coast Air Basin.
The Thiessen Polygon method is the simplest local interpolation technique; it required the shortest procedure to run. However, we can see some issues on the output where the abrupt changes in values at the polygon boundaries and value in each polygon would be strongly influenced by the measured values even if they might be outliers or extremes. In this study, some polygons ended up with missing values as the transition of point data did not work well.
IDW is the local, deterministic interpolation method. It created of a smoother surface than the output of Thiessen Polygon. Unlike the Thiessen Polygons method, this technique allows us to predict unknown values based on multiple data points in the defined neighbour circle. It is a useful tool as we can change how strong we weight the closer values against far points. However, two determinant parameters for IDW, which are the neighbour size and exponential power to the distance, needs to be carefully selected depending on the purpose of the study. There was an issue in the use of the Leave-one-out validation. It was utilized to identify the appropriate power to the distance, but it was challenging to get the validation line move towards the “expected” one, so the validation did not help us to figure out the appropriate power value in this study.
Kriging is the only global interpolation method. It takes a few more steps to complete, and it takes a longer time to produce the output surface. It imposes several selections for the process. We need to choose which model we would use by experimenting with all models. The selection of three parameters (that are sill, nugget, and range) would also be required manually. The assumption of a constant mean (sill) in the model can be questionable in some cases. On the other hand, as Kriging incorporates all data to build the model, it seems to create the most realistic output surface of all three methods. It additionally produced variance and confident internal maps as well as the map with measurement values.
With the comparison among three interpolation methods, it was concluded that the predicted mean PM2.5 map with Kriging was chosen as the most reasonable output of all. One of the reasons is that the output with the Thiessen Polygons was the most unreal map with a rapid transition in values at the boundaries. Another reason includes that the output with IDW represents the monitoring stations as the centers of high pollution zones as if they were polluting the air, but it may not true. The map with Kriging, on the other hand, assigned the high values where the high populated areas locate, therefore, it seemed this one is the most appropriate interpolation
Another study that compared the Thiessen Polygon to Kriging reported a higher degree in an absolute error from the real measurements for UV exposure (Tatalovich et al, 2006). This result corresponded with our result as both measures atmosphere-related values. Meanwhile, the Thiessen Polygon was utilized to supplement the forecast of the transit ridership (Wang et al, 2014), this technique may work better for the analysis of geographic phenomena that involves certain boundaries, such as roads or neighbourhoods. Furthermore, our study agreed with the idea that input data would influence interpolation methods, the density of samples or sampling distance for example (Li and Heap, 2008). We only dealt with 11 stations data to create the surface, so the most area in the air basin needed predicted values. IDW and Kriging both generated smooth surfaces, but there were some differences in outputs. Even though this study concluded the Kriging outperformed IDW, another study disagreed with this assumption in which IDW showed lower mean errors and higher r^2 values during validations (Zarco-Perello and Simões, 2017), and some results from other study supported that statement (Hernandez-Flores et al., 2015). The confrontation in those arguments may arise from the differences in input data described above, where we had 11 points while other studies incorporated lots more measurements. Thus, we should carefully look at the input data first to select which method to use for spatial interpolation.
Xing, Y. F., Xu, Y. H., Shi, M. H., & Lian, Y. X. (2016). The impact of PM2.5 on the human respiratory system. Journal of thoracic disease,8(1), 69–74. https://doi.org/10.3978/j.issn.2072-1439.2016.01.19
He, K., Yang, F., Ma, Y., Zhang, Q., Yao, X., Chan, C. K., Cadle, S., Chan, T., & Mulawa, P. (October, 2001). The characteristics of PM2.5 in Beijing, China. Atmospheric Environment, 35(29), 4959-4970. https://doi.org/10.1016/S1352-2310(01)00301-6.
Gehrig, R., and Buchmann, B. (2003). Characterising seasonal variations and spatial distribution of ambient PM10 and PM2.5 concentrations based on long-term Swiss monitoring data. Atmospheric Environment, 37(19), 2571-2580. https://doi.org/10.1016/S1352-2310(03)00221-8.
Tobler, W. (2004). On the First Law of Geography: A Reply. Annals of the Association of American Geographers, 94(2). https://doi.org/10.1111/j.1467-8306.2004.09402009.x.
Li, J., and Heap, A. D. (2008). A review of Spatial Interpolation Methods for Environmental Scientists. Geoscience Australia. Retrieved from http://planet.botany.uwc.ac.za/NISL/BCB_BIM_honours/Course_Documents_2016/Spatial_Interpolation_statistics_methods.pdf.
GIS resources. (n.d.). Classification of Interpolation. Retrieved from https://www.gisresources.com/classification-of-interpolation_2/.
Fiedler, F. R. (2003). Determining Station Weights Using Thiessen Polygons and Isohyetal Maps. Journal of Hydrologic Engineering, 8(4). https://doi.org/10.1061/(ASCE)1084-0699(2003)8:4(219)
Mu, L. (2009). Thiessen Polygon. Elsevier, International Encyclopedia of Human Geography. 231-236. https://doi.org/10.1016/B978-008044910-4.00545-9.
Climates to Travel. (n.d.). Climate - Los Angeles (California). Retrieved from https://www.climatestotravel.com/climate/united-states/los-angeles.
AirNow. (n.d.). AQI Basics. Retrieved from https://www.airnow.gov/aqi/aqi-basics/.
South Coast AQMD. (November 09, 2020). Air Quality Forecast. Retrieved from http://www.aqmd.gov/home/air-quality/air-quality-forecasts.
Bentley HAMMER CONNECT Edition Help. (n.d.). Generating Thiessen Polygons. Retrieved from https://docs.bentley.com/LiveContent/web/Bentley%20HAMMER%20SS6-v1/en/GUID-9905E1ADF07649D0845A960DB6D033DC.html.
ArcGIS pro. (n.d.). Create Thiessen Polygons (Analysis). Retrieved from https://pro.arcgis.com/en/pro-app/tool-reference/analysis/create-thiessen-polygons.htm.
ArcGIS Pro. (n.d.). How Inverse distance weighted interpolation works. Retrieved from https://pro.arcgis.com/en/pro-app/help/analysis/geostatistical-analyst/how-inverse-distance-weighted-interpolation-works.htm.
ArcGIS pro. (n.d.). How Kriging works. Retrieved from https://pro.arcgis.com/en/pro-app/tool-reference/spatial-analyst/how-kriging-works.htm.
Tatalovich, Z., Wilson, J., & Cockburn, M. (2006, January 1). A Comparison of Thiessen Polygon, Kriging, and Spline Models of Potential UV Exposure. Cartography and Geographic Information Science. https://doi.org/10.1559/152304006779077318
Wang, S., Sun, L., Rong, J., & Yang, Z. (2014). Transit Traffic Analysis Zone Delineating Method Based on Thiessen Polygon. Sustainability, 6(4). 1821-1832; https://doi.org/10.3390/su6041821.
Zarco-Perello, S., and Simões, N. (2014). Ordinary Kriging vs inverse distance weighting: spatial interpolation of the sessile community of Madagascar reef, Gulf of Mexico.
Hernandez-Flores A, Condal A, Poot-Salazar A, Espinoza-Mendez JC. (2015). Fisheries Research. Geostatistical analysis and spatial modeling of population density for the sea cucumbers Isostichopus badionotus and Holothuria floriana on the Yucatan Peninsula, Mexico. Fisheries Research 172.114–124 DOI 10.1016/j.fishres.2015.07.005.