`dplyr`

The aim here is to provide an approach to geographically weighted approaches that fits neatly into the â€˜tidy dataâ€™ framework provided by the `dplyr`

package and other related ones. Initially, this requires the `tidyverse`

group of packages to be loaded from the library:

The way this is achieved is to store the geographical weighting as a sparse matrix - but essentially in data frame (or tibble) format. Suppose we have a *source* data set \(S\) (with each record having a geographical location) and a *target* data set \(T\) (also with each record having a geographical location. Sometimes the source and target data sets are the same. Then there will be a distance between each element in \(S\) and each element in \(T\), giving \(n(S) \times n(D)\) distances. These can be stored in a three column data frame of the form:

Source data object ID | Target data object ID | Distance |
---|---|---|

SID001 | TID001 | 1.5km |

SID001 | TID002 | 6.4km |

\(\vdots\) | \(\vdots\) | \(\vdots\) |

If \(S\) and \(T\) are both large data sets, then this may be a problem. However, here, the weights (which depend on the distances) are of more importance. The weights are a function of distance, as well as some paramter to control the window width. If we use distance-based weights to begin with, the parameter will be called \(h\) and will represent the radius within which the weights are non-zero.

Some examples are

Name | Formula when \(d<h\) | Formula when \(d \ge h\) | Slope when \(d \nearrow h\) |
---|---|---|---|

Box Car | \(1\) | \(0\) | \(-\infty\) |

Linear | \(1 - \frac{d}{h}\) | \(0\) | \(-\frac{1}{h}\) |

BiSquare | \(\left(1 - (\frac{d}{h})^2\right) ^2\) | \(0\) | \(0\) |

BiCubic | \(\left(1 - (\frac{d}{h})^3\right) ^3\) | \(0\) | \(0\) |

CosSquare | \(\cos^2 \left( \frac{\pi d}{2h} \right)\) | \(0\) | \(0\) |

and their graphs are as below when \(h = 1\)

As seen in the table, not all of these are smooth when \(d=h\) - in particular the Box Car and Linear weighting functions are not. However, these are all weighting schemes for which the weight is zero once the the distance between geographical objects exceeds \(d\). This is where a sparse storage method is useful. These weights will be used for weighted sums, means and so on, and anything with a zero weight is excluded. Thus, to save space, the tabular form suggested above, but with weights rather than distances - and more specifically only *non-zero* weights will be used. Thus the table will be:

Source data object ID | Target data object ID | Weight |
---|---|---|

SID001 | TID001 | 0.41 |

SID001 | TID002 | 0.03 |

\(\vdots\) | \(\vdots\) | \(\vdots\) |

with only rows for which the weight is non zero included. Omitted source/target ID combinations are assumed to have a weight of zero.

Here is some R code to create functions for these weighting schemes:

```
bisquare <- function(x,h) ifelse(x<h,(1 - (x/h)^2)^2,0)
bicube <- function(x,h) ifelse(x<h,(1 - (x/h)^3)^3,0)
boxcar <- function(x,h) ifelse(x<h,1,0)
linear <- function(x,h) ifelse(x<h,1-x/h,0)
cossquare <- function(x,h) ifelse(x < h, cos(x*pi/(h*2))^2, 0)
```

Next, here is a helper function to create a sparse weight table, given two â€˜simplefeaturesâ€™ (`sf`

) objects. You will need to specify the ID variables and the name of each of the objects, and the winsow size \(h\). Donâ€™t worry too much about how it works for now. The function `wt_tbl`

will be treated as a method (here for example it is just defined for `sf`

objects, but later a matrix method will be defined).

```
wt_tbl <- function(x,...) UseMethod("wt_tbl")
wt_tbl.sf <- function(sfx,sfy,idx,idy,
wt=wt,dmax=Inf,wfun = bisquare, h, ...) {
idx <- enquo(idx)
idy <- enquo(idy)
wt =enquo(wt)
dmat <- st_distance(sfx,sfy,...) %>% units::drop_units()
colnames(dmat) <- sfy %>% select(!!idy) %>%
st_set_geometry(NULL) %>% .[[1]]
rownames(dmat) <- sfx %>% select(!!idx) %>%
st_set_geometry(NULL) %>% .[[1]]
res <- dmat %>% as_tibble(rownames=quo_name(idx)) %>%
gather(key=!!idy,value= !! wt, - !! idx) %>% filter(!! wt <= dmax)
if (missing(h)) {
kfun <- partial(wfun,h=dmax) }
else {
kfun <- partial(wfun,h=h)
}
res %>% mutate(!! wt := kfun(!! wt))
}
```

Next, it can be demonstrated. Read in some data relating to house prices and convert it to an `sf`

object.

```
library(sf)
hp <- read_csv('hpdemo.csv') %>%
rename(house_ID=ID) %>%
mutate(price = price/1000, house_ID=sprintf('H%04d',house_ID)) %>%
st_as_sf(coords=c('east','north'),crs=27700)
head(hp)
```

```
Simple feature collection with 6 features and 3 fields
geometry type: POINT
dimension: XY
bbox: xmin: 514600 ymin: 169200 xmax: 547900 ymax: 189600
epsg (SRID): 27700
proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
# A tibble: 6 x 4
house_ID price fl_area geometry
<chr> <dbl> <int> <POINT [m]>
1 H0001 107 50 (523800 179700)
2 H0002 55.5 66 (533200 170900)
3 H0003 103 90 (514600 175800)
4 H0004 187 125 (516000 171000)
5 H0005 43 50 (533700 169200)
6 H0006 70.0 95 (547900 189600)
```

A straighforward map is now possible: