Longitudinal Clustering of Communities

Methodology and Initial Results (on ACS data)

Delmelle et al.

2024-06-01

Objectives

Motivation of the grant
- aims 1 & 2
Typology of neighborhood over time
- observations or trajectories?
- kmeans, kml3d
Initial results
- hexagons? … and computational costs
Time for comments

Motivation of the grant

aim 1
- [‘…characterize change in neighborhood social & built environment variables relevant to cognition…’]
- Physical function, cognitive function, ability to age in place
aim 2
- [‘…association of neighborhood characteristics with cognition…’]

Characterize neighborhoods (1/4)

Characterize neighborhoods (2/4)

Characterize neighborhoods (3/4)

Characterize neighborhoods (4/4)

Typology of neighborhood over time

How can we characterize age-friendly communities?
Are there distinct clusters of neighborhood aging resources?
Have the resources within clusters changed over time, and how?
Data will reflect the three domains:
- physical function, cognitive function, ability to age in place
- spatial support: census tract, grid, or hexagon

`kmeans` algorithm

Observations for each geographic unit \(i\) in matrix \(\mathbf {x}\)
- changes over time, … different variables
- rescale your data (e.g. between 0 and 1)
Find \(k\) (\(k \le n\), with \(n\) the number of observations) clusters \(S = \{S_{1}, S_{2}, \ldots, S_{k}\}\) that minimize the within-cluster sum of squares: \(\displaystyle \mathop{\operatorname{arg\,min}}_{S} \sum_{j=1}^{k} \sum_{\mathbf{x} \in S_{j}} \left\| \mathbf{x} - \boldsymbol{\mu}_{j} \right\|^{2}\), with
\(\boldsymbol{\mu}_{j} = \frac{1}{|S_{j}|} \sum_{\mathbf{x} \in S_{j}} \mathbf{x}\) the centroid of cluster \(S_{j}\)
Number of clusters \(k\)? Calinski-Harabasz stopping rule
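The steps on this slide can be sketched in base R. This is a minimal illustration on made-up data, not the project code: the matrix `x` and the helpers `rescale01` and `calinski_harabasz` are hypothetical names introduced here. It rescales each variable to [0, 1], runs `kmeans` for a few candidate values of k, and keeps the k with the highest Calinski-Harabasz index.

```r
# Toy data: 60 units, 2 variables, three well-separated groups (values are made up)
set.seed(1)
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2),
  matrix(rnorm(40, mean = 10), ncol = 2)
)

# Min-max rescaling of each column to [0, 1] (hypothetical helper)
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
x01 <- apply(x, 2, rescale01)

# Calinski-Harabasz index: (between SS / (k - 1)) / (within SS / (n - k))
calinski_harabasz <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Fit k-means for several k and score each solution
ks <- 2:6
ch <- sapply(ks, function(k) {
  fit <- kmeans(x01, centers = k, nstart = 20)
  calinski_harabasz(fit, nrow(x01))
})
names(ch) <- ks

# The stopping rule keeps the k with the highest index
best_k <- ks[which.max(ch)]
```

The quantity `fit$tot.withinss` returned by `kmeans` is the objective in the formula above, evaluated at the fitted clusters. `nstart = 20` is an illustrative choice that reruns the algorithm from several random starts, since a single run can stop at a local minimum.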

Illustration - ACS data for NC (1/2)

Proportion white population, median household income

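library(tidycensus)  # get_acs()
library(dplyr)       # %>%, mutate()

# B03002_003: non-Hispanic white alone; B01001_001: total population;
# B19013_001: median household income
vars <- c("B03002_003", "B01001_001", "B19013_001")
years <- 2010:2019

# Download 5-year ACS tract estimates for NC and tag each variable with its year
get_acs_with_year <- function(year) {
  get_acs(
    geography = "tract", state = "NC",
    variables = vars, year = year,
    survey = "acs5"
  ) %>%
    mutate(variable = paste(variable, year, sep = "_"))
}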
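The download returns counts and a median, so the proportion white has to be derived. The sketch below assumes it is the white count (B03002_003) divided by the total population (B01001_001); that derivation is an assumption, not shown in the slides. The rows are invented and only mimic the `GEOID`, `variable`, and `estimate` columns that `get_acs()` returns; `acs_long`, `acs_wide`, `prop_white`, and `median_hh_income` are hypothetical names.

```r
# Made-up long-format rows mimicking the GEOID / variable / estimate columns
# returned by get_acs(); the numbers are invented for illustration
acs_long <- data.frame(
  GEOID    = rep(c("tract_A", "tract_B"), each = 3),
  variable = rep(c("B03002_003_2010", "B01001_001_2010", "B19013_001_2010"),
                 times = 2),
  estimate = c(1500, 2000, 52000, 600, 2400, 41000)
)

# One row per tract, one column per variable (columns are named estimate.<variable>)
acs_wide <- reshape(acs_long, idvar = "GEOID", timevar = "variable",
                    direction = "wide")

# Assumed derivation: white count divided by total population
acs_wide$prop_white <- acs_wide$estimate.B03002_003_2010 /
  acs_wide$estimate.B01001_001_2010
acs_wide$median_hh_income <- acs_wide$estimate.B19013_001_2010
```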

Illustration - ACS data for NC (2/2)

`kmeans` algorithm (1/2)

Use of 5 clusters

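# k-means with 5 clusters on the rescaled variables (identifier columns dropped)
kmeans_result <- kmeans(select(rescaled_vars, -GEOID, -year), centers = 5)

# Attach each tract-year's cluster label to the long-format data
rescaled_data_long <- rescaled_data_long %>%
  left_join(
    rescaled_vars %>%
      mutate(cluster = as.factor(kmeans_result$cluster)),
    by = c("GEOID", "year")
  )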

`kmeans` algorithm (2/2)

Plotting results geographically

Visualizing transition (1/2)

A neighborhood can switch from one cluster to another over time
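One way to summarize such switches is a transition table between two years. The sketch below uses invented cluster labels for six units; the data frame `assignments` and the other object names are hypothetical and only illustrate the idea.

```r
# Made-up cluster labels for 6 units in two years (illustrative only)
assignments <- data.frame(
  GEOID   = rep(sprintf("unit_%d", 1:6), times = 2),
  year    = rep(c(2010, 2019), each = 6),
  cluster = c(1, 1, 2, 2, 3, 3,
              1, 2, 2, 2, 3, 1)
)

# Pair each unit's first-year label with its last-year label
start <- assignments[assignments$year == 2010, ]
end   <- assignments[assignments$year == 2019, ]
both  <- merge(start, end, by = "GEOID", suffixes = c("_start", "_end"))

# Rows: cluster in 2010; columns: cluster in 2019
transitions <- table(from = both$cluster_start, to = both$cluster_end)

# Share of units that changed cluster between the two years
share_moved <- mean(both$cluster_start != both$cluster_end)
```

Off-diagonal cells of `transitions` count the units that changed cluster, and the diagonal counts those that stayed.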

Visualizing transition (2/2)

spot the 5 differences!!!

`kml3d` algorithm

Neighborhoods cannot ‘jump’ from one cluster to another
Clustering is conducted on trajectories, rather than observations
Can be very slow, especially with many trajectories
Number of clusters \(k\)? Calinski-Harabasz stopping rule
Reference: (Genolini et al. 2015)
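To make the contrast with observation-level clustering concrete, the sketch below clusters whole trajectories with base R `kmeans`. It is a simplified stand-in for the idea, not the `kml3d` package API: each unit's two yearly series are placed side by side in one row, so every unit receives a single label for the whole period. All data and object names are made up.

```r
set.seed(42)
n_units <- 30
years <- 2010:2019
n_years <- length(years)

# Made-up, already rescaled trajectories for two variables:
# the first half of the units trends over time, the second half stays flat
trend <- rep(c(0.05, 0), each = n_units / 2)
steps <- seq_len(n_years) - 1
noise <- function() matrix(rnorm(n_units * n_years, sd = 0.02), nrow = n_units)
var1 <- 0.2 + outer(trend, steps) + noise()
var2 <- 0.8 - outer(trend, steps) + noise()

# One row per unit: the joint trajectory is the two series side by side
traj <- cbind(var1, var2)

# k-means on whole trajectories: one label per unit, constant across years
fit <- kmeans(traj, centers = 2, nstart = 20)
cluster_by_unit <- fit$cluster
```

Because the label is attached to the unit rather than to each unit-year, no unit can change cluster between years, which is the property stated on this slide.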

`kml3d` results

Calinski score with number of clusters

`kml3d` results

Trajectories with proportion white

`kml3d` results

Trajectories with median HH income

`kml3d` results

Plotting results geographically

Hexagons (1/4)

Smoother transition: 20,000-meter and 10,000-meter hexagons

Hexagons (2/4)

Hexagons (3/4)

Hexagons (4/4)

Computation time

Wrap up

Aims
Typology of neighborhood over time
- observations or trajectories?
- kmeans, kml3d
Initial results with ACS data

Questions (1/2)

What does the group see as the advantages and disadvantages of clustering observations and tracking them over time (kmeans) versus clustering whole trajectories (kml3d)?
Which conceptualization of neighborhood/space appeals to you? Why?
Are there other techniques that you might recommend? Why?
- LGM?, DTW?

Questions (2/2)

Do we conceptualize these clusters independent of the three intended outcomes?
What approaches to the actual neighborhood data will make sense?
How to handle expected correlation between the features?
How do we want to think about age composition?

Next steps

Theory/background needed for the three domains (Yvonne/Jana)
Examination of disparities by sociodemographic/economic conditions
Linkage to individual participants
Paper proposal
Extend to 48 states
Use of a computing cluster

References

Useful references on kml and kml3d include (Genolini, Falissard, and Kiener 2023), (Genolini et al. 2015), and (Genolini et al. 2013).
Two good examples that apply these approaches are (Delmelle 2016) and (Séguin et al. 2016).

Delmelle, Elizabeth C. 2016. “Mapping the DNA of Urban Neighborhoods: Clustering Longitudinal Sequences of Neighborhood Socioeconomic Change.” Annals of the American Association of Geographers 106 (1): 36–56.

Genolini, Christophe, Xavier Alacoque, Mélanie Sentenac, and Catherine Arnaud. 2015. “KML: A Package to Cluster Longitudinal Data.” Journal of Statistical Software 65 (4): 1–34. https://doi.org/10.18637/jss.v065.i04.

Genolini, Christophe, Bruno Falissard, and Patrice Kiener. 2023. kml: K-Means for Longitudinal Data. https://CRAN.R-project.org/package=kml.

Genolini, Christophe, Jean-Baptiste Pingault, Tarak Driss, Sylvana Côté, Richard E Tremblay, Franck Vitaro, Catherine Arnaud, and Bruno Falissard. 2013. “KmL3D: A Non-Parametric Algorithm for Clustering Joint Trajectories.” Computer Methods and Programs in Biomedicine 109 (1): 104–11.

Séguin, Anne-Marie, Philippe Apparicio, Mylène Riva, and Paula Negron-Poblete. 2016. “The Changing Spatial Distribution of Montreal Seniors at the Neighbourhood Level: A Trajectory Analysis.” Housing Studies 31 (1): 61–80.
