The Modificable Areal Unit Problem

The MAUP concept

A common difficulty with spatial data is that they are often “aggregated”, that is, generalized from their original spatial scale (i.e. high detail) into a smaller spatial scale (i.e. low detail). The best example is a demographic census which is surveyed at household level but reported at several levels of aggregation such as blocks, sectors, municipalities, provinces and departments (O’Sullivan & Unwin, 2010).

The problem is that the aggregation units are arbitrary with respect to the phenomena under investigation, yet the units used will affect statistics determined on the basis of data reported in this way. This difficulty is referred to as the modifiable areal unit problem (MAUP). If the spatial units in a particular study were specified differently, users would observe very different patterns and relationships (O’Sullivan & Unwin, 2010).

Let’s illustrate MAUP using the R language and enviroment for statistical computing and graphics. First, let’s create two raster variables, x (i.e. the independent variable) and y (i.e. the dependent variable). You can create and display these rasters using this code:

if(!require(raster)){
    install.packages("raster")
    library(raster)
}

## Loading required package: raster

## Loading required package: sp

x <- c(87,95,72,37,44,24,40,55,55,38,88,34,41,30,26,35,38,24,14,56,37,34,8,18,49,44,51,67,17,37,55,25,33,32,59,54)
       

y  <- c(72,75,85,29,58,30,50,60,49,46,84,23,21,46,22,42,45,14,19,36,48,23,8,29,38,47,52,52,22,48,58,40,46,38,35,55) 
        

mx <- matrix(ncol=6, nrow=6, x)

my <- matrix(ncol=6, nrow=6, y)

rx <- raster(mx)

ry <- raster(my)

par(mfrow=c(1,2))
plot(rx,main="Independent raster (x)")
text(rx)
plot(ry,main="Dependent raster (y)")
text(ry)

Establishing relationship between x and y

Original scale

A linear regression model can be used for such a purpose, for example:

model_1 <- lm(ry[] ~ rx[])

summary(model_1)

## 
## Call:
## lm(formula = ry[] ~ rx[])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.303  -8.536   3.293   7.466  20.312 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.37497    4.12773   2.513   0.0169 *  
## rx[]         0.75435    0.08668   8.703 3.61e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.49 on 34 degrees of freedom
## Multiple R-squared:  0.6902, Adjusted R-squared:  0.6811 
## F-statistic: 75.74 on 1 and 34 DF,  p-value: 3.605e-10

plot(model_1)

Aggregation scheme 1

Let’s aggregate the raster by a 2-factor only in vertical direction (i.e. the resulting raster will have six colummns and three rows):

rx1 <- aggregate(rx, fact=c(1,2), fun=mean, expand=T)

ry1 <- aggregate(ry, fact=c(1,2), fun=mean, expand=T)

par(mfrow=c(1,2))
plot(rx1,main="Independent raster (x1)")
text(rx1)
plot(ry1,main="Dependent raster (y1)")
text(ry1)

Now, let’s create a new linear regression model:

model_2 <- lm(ry1[] ~ rx1[])

summary(model_2)

## 
## Call:
## lm(formula = ry1[] ~ rx1[])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.884 -3.621 -1.827  5.501  9.119 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.58992    3.74759   3.626  0.00227 ** 
## rx1[]        0.67982    0.08096   8.398 2.94e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.768 on 16 degrees of freedom
## Multiple R-squared:  0.8151, Adjusted R-squared:  0.8035 
## F-statistic: 70.52 on 1 and 16 DF,  p-value: 2.94e-07

Aggregation scheme 2

Let’s aggregate the raster by a 2-factor only in horizontal direction (i.e. the resulting raster will have three colummns and six rows):

rx2 <- aggregate(rx, fact=c(2,1), fun=mean, expand=T)

ry2 <- aggregate(ry, fact=c(2,1), fun=mean, expand=T)

par(mfrow=c(1,2))
plot(rx2,main="Independent raster (x2)")
text(rx2)
plot(ry2,main="Dependent raster (y2)")
text(ry2)

Now, let’s create another linear regression model:

model_3 <- lm(ry2[] ~ rx2[])

summary(model_3)

## 
## Call:
## lm(formula = ry2[] ~ rx2[])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.4540 -3.2961 -0.8083  4.1461  8.9260 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.25704    3.88449   0.324     0.75    
## rx2[]        0.96571    0.08491  11.373 4.46e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.487 on 16 degrees of freedom
## Multiple R-squared:  0.8899, Adjusted R-squared:  0.883 
## F-statistic: 129.4 on 1 and 16 DF,  p-value: 4.456e-09

Usually, as shown here, regression relationships are strenghtened by aggregation. In fact, using a simulation approach, Openshaw and Taylor (1979) showed that with the same underlying data, it is possible to aggregate units together in ways that produce correlations anywhere between -1.0 and +1.0 (O’Sullivan & Unwin, 2010).

The practical implications of MAUP are immense for almost all decision-making processes involving GIS technology, since with the availability of aggregated maps, policy could easily focus on issues and problems which might look different if the aggregation scheme used were changed (O’Sullivan & Unwin, 2010).

If you want to see more go to http://gispopsci.org/maup/ and have a look

REFERENCE

O’Sullivan, D. and Unwin, D.J., 2010, Geographic Information Analysis Second Edition. John Wiley & Sons. 405 pp.