Longitudinal Clustering of Communities

Methodology and Initial Results (on ACS data)

Delmelle et al.

2024-06-01

Objectives

  • Motivation of the grant
    • aims 1 & 2
  • Typology of neighborhood over time
    • observations or trajectories?
    • kmeans, kml3d
  • Initial results
    • hexagons? … and computational costs
  • Time for comments

Motivation of the grant

  • aim 1
    • [‘…characterize change in neighborhood social & built environment variables relevant to cognition…’]
    • Physical function, cognitive function, ability to age in place
  • aim 2
    • [‘…association of neighborhood characteristics with cognition…’]

Characterize neighborhoods (1/4)

Characterize neighborhoods (2/4)

Characterize neighborhoods (3/4)

Characterize neighborhoods (4/4)

Typology of neighborhood over time

  • How can we characterize age-friendly communities?
  • Are there distinct clusters of neighborhood aging resources?
  • Have the resources within clusters changed over time, and how?
  • Data will reflect the three domains:
    • physical function, cognitive function, ability to age in place
    • support: census tract, grid, or hexagon

kmeans algorithm

  • Observations for each geographic unit \(i\) in matrix \(\mathbf {x}\)
    • changes over time, … different variables
    • rescale your data (e.g. between 0 and 1)
  • Find \(k\) (\(k \le n\)) clusters \(S = \{S_{1}, S_{2}, \ldots, S_{k}\}\) that minimize the within-cluster sum of squares: \(\displaystyle \mathop {\operatorname {arg\,min} } _{\mathbf {S} }\sum _{i=1}^{k}\sum _{\mathbf {x} \in S_{i}}\left\|\mathbf {x} -{\boldsymbol {\mu }}_{i}\right\|^{2}\) with
    \(\boldsymbol {\mu }_{i}={\frac {1}{|S_{i}|}}\sum_{\mathbf {x} \in S_{i}}\mathbf {x}\), the centroid of cluster \(S_{i}\)
  • Number of clusters \(k\)? Calinski-Harabasz stopping rule
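
The steps above can be sketched in base R. This is a minimal illustration on synthetic data, not the ACS tables used in the slides: the matrix `x`, the helper names `rescale01` and `calinski_harabasz`, and the candidate range `2:6` are all assumptions made for the example. The Calinski-Harabasz index is computed from the `betweenss` and `tot.withinss` components returned by `kmeans()`.

```r
# Synthetic stand-in for two neighborhood variables (NOT the ACS data).
set.seed(1)
n <- 200
x <- cbind(
  prop_white = c(rnorm(n / 2, 0.8, 0.05), rnorm(n / 2, 0.3, 0.05)),
  med_income = c(rnorm(n / 2, 70000, 5000), rnorm(n / 2, 35000, 5000))
)

# Rescale each column to [0, 1] so that no variable dominates the distance.
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
x01 <- apply(x, 2, rescale01)

# Calinski-Harabasz index: between-cluster dispersion per (k - 1)
# divided by within-cluster dispersion per (n - k).
calinski_harabasz <- function(fit, n) {
  k <- length(fit$size)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Fit k-means for several candidate k and keep the k with the highest index.
ks <- 2:6
ch <- sapply(ks, function(k) {
  fit <- kmeans(x01, centers = k, nstart = 20)
  calinski_harabasz(fit, nrow(x01))
})
best_k <- ks[which.max(ch)]
```

Using `nstart = 20` reruns the algorithm from several random starts, which makes the comparison across values of `k` less sensitive to a poor initialization.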

Illustration - ACS data for NC (1/2)

  • Proportion white population, median household income
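library(tidycensus)  # get_acs(); a Census API key is expected
library(dplyr)       # %>% and mutate()

# White alone (not Hispanic or Latino), total population, median household income
vars <- c("B03002_003", "B01001_001", "B19013_001")
years <- 2010:2019

# Download 5-year ACS tract estimates for one year and tag each variable with that year
get_acs_with_year <- function(year) {
  get_acs(
    geography = "tract", state = "NC",
    variables = vars, year = year,
    survey = "acs5"
  ) %>%
    mutate(variable = paste(variable, year, sep = "_"))
}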
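
The proportion of white population is not an ACS variable by itself; it has to be derived from the count of white residents and the total population. A minimal base-R sketch of that step follows, assuming a long table with the `GEOID`, `variable` and `estimate` columns that `get_acs()` returns. The three tracts and their values are made up for illustration, and a single year is shown (so the variable codes carry no year suffix).

```r
# Hypothetical long table mimicking get_acs() output for one year
# (tract identifiers and estimates are invented for illustration).
acs_long <- data.frame(
  GEOID    = rep(c("A", "B", "C"), each = 3),
  variable = rep(c("B03002_003", "B01001_001", "B19013_001"), times = 3),
  estimate = c(800, 1000, 52000,
               300, 1200, 38000,
               450,  900, 61000)
)

# One row per tract, one column per variable.
wide <- reshape(acs_long, idvar = "GEOID", timevar = "variable",
                direction = "wide")

# Proportion white = white (not Hispanic) count / total population.
wide$prop_white <- wide$estimate.B03002_003 / wide$estimate.B01001_001
wide$med_income <- wide$estimate.B19013_001

tract_vars <- wide[, c("GEOID", "prop_white", "med_income")]
```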

Illustration - ACS data for NC (2/2)

kmeans algorithm (1/2)

  • Use of 5 clusters
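# k-means on the rescaled variables only (identifier columns dropped)
kmeans_result <- kmeans(select(rescaled_vars, -GEOID, -year), centers = 5)

# Attach each tract-year's cluster label to the long table
rescaled_data_long <- rescaled_data_long %>%
  left_join(
    rescaled_vars %>%
      mutate(cluster = as.factor(kmeans_result$cluster)),
    by = c("GEOID", "year")
  )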

kmeans algorithm (2/2)

Plotting results geographically

Visualizing transition (1/2)

  • A neighborhood can switch from one cluster to another between years
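
One way to summarize such switches is a transition table between two years. The sketch below is an illustration under stated assumptions: the `labels` data frame, its four tracts and their cluster labels are hypothetical, and only the first and last years are compared.

```r
# Hypothetical cluster labels, one per tract and year.
labels <- data.frame(
  GEOID   = rep(c("A", "B", "C", "D"), times = 2),
  year    = rep(c(2010, 2019), each = 4),
  cluster = c(1, 2, 2, 3,
              1, 2, 3, 3)
)

# Put the two years side by side for each tract.
first <- labels[labels$year == 2010, c("GEOID", "cluster")]
last  <- labels[labels$year == 2019, c("GEOID", "cluster")]
both  <- merge(first, last, by = "GEOID", suffixes = c("_2010", "_2019"))

# Rows: cluster in 2010; columns: cluster in 2019.
transitions <- table(from = both$cluster_2010, to = both$cluster_2019)

# Tracts that ended in a different cluster than they started in.
n_switched <- sum(both$cluster_2010 != both$cluster_2019)
```

Counts on the diagonal of `transitions` are tracts that stayed in the same cluster; off-diagonal counts are switches.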

Visualizing transition (2/2)

spot the 5 differences!!!

kml3d algorithm

  • Neighborhoods cannot ‘jump’ from one cluster to another
  • Clustering is conducted on trajectories, rather than observations
  • Can be very slow, especially with many trajectories
  • Number of clusters \(k\)? Calinski-Harabasz stopping rule
  • Reference: (Genolini et al. 2015)
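
The difference between clustering observations and clustering trajectories can be illustrated without the kml3d package. The sketch below is a stand-in, not the kml3d API: it runs base `kmeans()` on rows that each hold a unit's whole joint trajectory (all years of both variables), so every unit receives exactly one label. The data, the helper `make_traj` and the choice of two clusters are assumptions made for the example.

```r
set.seed(42)
years <- 2010:2019

# Hypothetical helper: n noisy trajectories moving linearly from start to end.
make_traj <- function(n, start, end, sd = 0.02) {
  base <- seq(start, end, length.out = length(years))
  t(replicate(n, base + rnorm(length(years), sd = sd)))
}

# Two synthetic groups of 30 units, two rescaled variables each.
white  <- rbind(make_traj(30, 0.8, 0.8), make_traj(30, 0.8, 0.3))
income <- rbind(make_traj(30, 0.5, 0.6), make_traj(30, 0.5, 0.2))

# Joint trajectories: one row per unit, columns = years of each variable.
joint <- cbind(white, income)

# Clustering rows of `joint` assigns one cluster to each unit for all years.
fit <- kmeans(joint, centers = 2, nstart = 20)
unit_cluster <- fit$cluster
```

Because the unit of analysis is the whole trajectory, a unit cannot change cluster from one year to the next, in contrast to k-means applied to tract-year observations.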

kml3d results

Calinski score with number of clusters

kml3d results

Trajectories with proportion white

kml3d results

Trajectories with median HH income

kml3d results

Plotting results geographically

Plotting results geographically

Hexagons (1/4)

  • Smoother transitions with hexagons of 20,000 meters and 10,000 meters

Hexagons (2/4)

Hexagons (3/4)

Hexagons (4/4)

Computation time

Wrap up

  • Aims
  • Typology of neighborhood over time
    • observations or trajectories?
    • kmeans, kml3d
  • Initial results with ACS data

Questions (1/2)

  • What does the group see as the advantages and disadvantages of clusters of observations followed over time versus clusters of trajectories?
  • Which conceptualization of neighborhood/space appeals to you? Why?
  • Are there other techniques that you might recommend? Why?
    • LGM?, DTW?

Questions (2/2)

  • Do we conceptualize these clusters independent of the three intended outcomes?
  • What approaches to the actual neighborhood data will make sense?
  • How to handle expected correlation between the features?
  • How do we want to think about age composition?

Next steps

  • Theory/background needed for the three domains (Yvonne/Jana)
  • Examination of disparities by sociodemographic/economic conditions
  • Linkage to individual participants
  • Paper proposal
  • Extend to 48 states
  • Use of a computing cluster

References

Delmelle, Elizabeth C. 2016. “Mapping the DNA of Urban Neighborhoods: Clustering Longitudinal Sequences of Neighborhood Socioeconomic Change.” Annals of the American Association of Geographers 106 (1): 36–56.
Genolini, Christophe, Xavier Alacoque, Mélanie Sentenac, and Catherine Arnaud. 2015. “kml and kml3d: R Packages to Cluster Longitudinal Data.” Journal of Statistical Software 65 (4): 1–34. https://doi.org/10.18637/jss.v065.i04.
Genolini, Christophe, Bruno Falissard, and Patrice Kiener. 2023. kml: K-Means for Longitudinal Data. https://CRAN.R-project.org/package=kml.
Genolini, Christophe, Jean-Baptiste Pingault, Tarak Driss, Sylvana Côté, Richard E Tremblay, Franck Vitaro, Catherine Arnaud, and Bruno Falissard. 2013. “KmL3D: A Non-Parametric Algorithm for Clustering Joint Trajectories.” Computer Methods and Programs in Biomedicine 109 (1): 104–11.
Séguin, Anne-Marie, Philippe Apparicio, Mylène Riva, and Paula Negron-Poblete. 2016. “The Changing Spatial Distribution of Montreal Seniors at the Neighbourhood Level: A Trajectory Analysis.” Housing Studies 31 (1): 61–80.