The goal at this step is to estimate the number of jobs and its type (by income decile) at a lowest possible geographic level (block) to aggregate them later into different spatial scales, e.g. post code, TAZ, uniform grid, etc. This information represents the ‘attractiveness’ of each zone in accessibility measures (e.g. cumulative, gravity, etc.).
For this purpose, the main sources of information are the following:
The information provided by the Census contains general figures, such as:
The information provided by the Census is collected at the individual address level. However, this information is aggregated at different levels due to confidentiality. In the metropolitan area, about 90% of the occupied personnel is referenced at the block level, the rest is aggregated at larger units namely census tract, locality or municipality.
The Denue contains disaggregated information at the point level for all economic units. The information included for each unit essentially contains: (1) location (X-Y coordinates); (2) number of occupied personnel by category (e.g. 0-5, 10-30, … and ‘more than 251’); and (3) industry, e.g. construction, manufacturing, etc., following the North American Classification System, NAICS).
This includes the geographic definition for the different aggregation areas, i.e. municipality, locality, census tract (ageb), and block. It should be noted that it distinguishes urban from rural geometries. When a locality is larger than 2.5K inhabitants it is considered as urban and it includes the geometries at the block level (this is the case for the vast majority in of areas in Mexico City Metro Area). Rural localities are represented by points and do not contain block geometries. For this reason, I buffered a square shape around each of these points and created one spatial data frame including urban and rural geometries since the Census collects information from both types of areas.
As mentioned in the past revision, there are various possible approaches to estimate the missing figures at the block level, e.g. OLS, iterative proportional fitting, geographic centroid, etc.
I opted to use an approach that was transparent and relatively straightforward. In a nutshell, it first allocates as much data as possible from the Census referenced at the block level. Then, it allocates the aggregated figures to each of the blocks by proportionally adjusting these numbers following the Denue structure. The general steps are:
The approach is detailed in the subsequent sections.
The following chunk first organizes the point data level from the Denue by block. Then, it joins the Denue and Census data to the block geometries.
# Identify the block that contains the Denue points
# Check Denue points that match block's id geometries
# otherwise, spatially join nearest block
denue_join <- denue_2015_sf %>%
filter(!cvegeo %in% blocks_ur_rur$cvegeo) %>%
st_join(blocks_ur_rur[ , "cvegeo"], join = st_nearest_feature) %>%
dplyr:: select(id, cvegeo.y)
# remove geometry
denue_join <- st_set_geometry(denue_join, NULL)
# create new column with joined blocks
denue_2015 <- denue_2015_sf %>%
st_set_geometry(NULL) %>%
left_join(denue_join, by = "id", "id") %>%
mutate(cvegeo.z = ifelse(is.na(cvegeo.y), cvegeo, cvegeo.y))
# Summarize DENUE in blocks by size of economic units
# Aggregate number of economic units by block and size
denue_size_block <- denue_2015 %>%
group_by(cvegeo.z, descripcion_estrato_personal) %>% # Group by block and division
dplyr::summarise(un_ec = n()) %>%
pivot_wider(names_from = descripcion_estrato_personal,
values_from = un_ec,
names_prefix = "un_ec_") %>%
mutate_at(vars(2: ncol(.)), replace_na, 0) %>% # replace NAs with 0s
dplyr::rename_all( funs(gsub(" ", "_", .))) %>% # replace space by _ in names
dplyr::rename(cvegeo = cvegeo.z) %>%
mutate(un_ec_tot = rowSums(.[2:ncol(.)])) # create a column with total
# Percentage of economic units by size per block
denue_size_block_pct <- denue_size_block %>%
mutate_at(vars(un_ec_0_a_5_personas:un_ec_101_a_250_personas),
funs(round( . / un_ec_tot, 6))) %>% # compute proportion
mutate(un_ec_tot = rowSums(.[2:8])) %>% # sum of pct
dplyr::rename_at(vars(un_ec_0_a_5_personas:un_ec_tot), funs(paste0(., "_pct")))
# Join Census and Denue to block geometries
# join Census data only in urban geometries
blocks_data <- blocks_ur_rur %>%
st_set_geometry(NULL) %>%
filter(geografico != "lrur") %>% # omit rural blocks
dplyr::select(-fechaact) %>%
left_join(census_by_block[ , c(1, 11:12, 18, 39, 42)], # census data
by = "cvegeo", "cvegeo")
# join rural blocks
# create census summary of rural localities at ageb level
census_by_loc_rural_loc <-
census_by_block[!census_by_block$cvegeo %in%
blocks_data$cvegeo &
!is.na(census_by_block$cvegeo),
c(3, 11:12, 18, 39, 42)] %>%
group_by(cve_loc) %>%
dplyr::summarise(unidades_economicas = sum(unidades_economicas),
personal_ocupado_total = sum(personal_ocupado_total),
personal_remunerado_total = sum(personal_remunerado_total),
total_remuneraciones = sum(total_remuneraciones),
produccion_bruta_total = sum(produccion_bruta_total))
blocks_data_loc <- blocks_ur_rur %>%
st_set_geometry(NULL) %>%
filter(geografico == "lrur") %>%
dplyr::select(-fechaact) %>%
left_join(census_by_loc_rural_loc,
by = "cve_loc", "cve_loc") # census data
# Bind the rural and urban geometries in one data set
blocks_data_all <- bind_rows(blocks_data, blocks_data_loc)
# join denue data
blocks_data_all <- blocks_data_all %>%
left_join(denue_size_block, by = "cvegeo", "cvegeo")
# remove block geometries that do not contain economic units
blocks_data_all <- blocks_data_all %>%
# keep if at least one of these contain data
filter(!is.na(unidades_economicas) | !is.na(un_ec_tot))
# Rename by adding prefix of data source
blocks_data_all <- blocks_data_all %>%
dplyr::rename_at(vars(unidades_economicas:produccion_bruta_total),
~ paste0("cens_", .)) %>%
dplyr::rename_at(vars(un_ec_0_a_5_personas:un_ec_tot),
~ paste0("den_", .))
# Clean Global Env.
gc(reset = T)
The general figures allocated at the block level so far look as follows:
# Total number of economic units in Census
sum(census_by_block$unidades_economicas)
## [1] 817973
# Versus number of economic units allocated in blocks...That is 88%
sum(blocks_data_all$cens_unidades_economicas, na.rm = TRUE)
## [1] 723601
# Total number of economic units in Census
sum(census_by_block$personal_ocupado_total)
## [1] 5083414
# Total number of occupied personal allocated in blocks...That is 92%
sum(blocks_data_all$cens_personal_ocupado_total, na.rm = TRUE)
## [1] 4667532
# Number of economic units in Denue
nrow(denue_2015)
## [1] 905192
#Number of economic units from Denue allocated in blocks
sum(blocks_data_all$den_un_ec_tot, na.rm = TRUE)
## [1] 905192
As mentioned above, the aggregated data from the Census is grouped at various levels. This is an overview:
# Overview aggregated data for occupied personnel in Census
# All oc. pe. that is aggregated
filter(census_by_block, grepl("C$", manzana)) %>%
{sum(.$personal_ocupado_total)}
## [1] 415882
# Aggregated at municipality level
filter(census_by_block, grepl("CCC$", manzana)) %>%
{sum(.$personal_ocupado_total)}
## [1] 125286
# Aggregated at locality level
filter(census_by_block,
grepl("CC$", manzana) &
!grepl("C$", loc) & !is.na(loc)) %>%
{sum(.$personal_ocupado_total)}
## [1] 5776
# aggregated at ageb level
filter(census_by_block, grepl("C$", manzana)
& !grepl("C$", ageb)) %>%
{sum(.$personal_ocupado_total)}
## [1] 284820
As shown, about 70% of the aggregated data is referenced at the census tract level, 1% at the locality, and 30% at the municipal.
Next, I created three different datasets for the aggregated data in the Census at their respective level.
## Create datasets with aggregated data at different levels
# Data aggregated at municipal level
census_mun_agg <- census_by_block %>%
filter(grepl("CCC$", manzana)) %>%
dplyr::select(cve_ent_mun, unidades_economicas, personal_ocupado_total, total_remuneraciones, produccion_bruta_total)
sum(census_mun_agg$personal_ocupado_total)
## [1] 125286
# Data aggregated at locality level
census_loc_agg <- census_by_block %>%
filter(grepl("CC$", manzana) & !grepl("C$", loc) & !is.na(loc)) %>%
mutate(cve_loc = paste0(entidad, mun, loc)) %>%
dplyr::select(cve_loc, unidades_economicas, personal_ocupado_total, total_remuneraciones, produccion_bruta_total)
sum(census_loc_agg$personal_ocupado_total)
## [1] 5776
# Data aggregated at ageb level
census_ageb_agg <- census_by_block %>%
filter(grepl("C$", manzana) & !grepl("C$", ageb)) %>%
mutate(cve_ageb = paste0(entidad, mun, loc, ageb)) %>%
dplyr::select(cve_ageb, unidades_economicas, personal_ocupado_total, total_remuneraciones, produccion_bruta_total)
sum(census_ageb_agg$personal_ocupado_total)
## [1] 284820
# Validate subsets
sum(census_by_block$personal_ocupado_total) ==
filter(census_by_block, !grepl("C$", manzana)) %>%
{sum(.$personal_ocupado_total)} + # disaggregated
sum(census_mun_agg$personal_ocupado_total) + # agg. at mun level
sum(census_loc_agg$personal_ocupado_total) + # agg. at loc level
sum(census_ageb_agg$personal_ocupado_total) # agg. at ageb level
## [1] TRUE
Identify level of aggregation of each block
To assign the aggregated data it is necessary to identify the level of aggregation of the ‘empty’ geometries. This can be at: (1) the block level, (2) census tract, (3) locality, or (4) municipal.
To do this, it is inferred that the blocks that are not known at the X level, are aggregated at the next geographic hierarchy, e.g. if it is not known at the block nor the census tract, it is aggregated at the locality, and so on.
# Identify known data by geographic level
# Localities known (this is known only at locality level)
known_loc <- census_by_block %>%
filter(!grepl("C$", loc) & grepl("C$", ageb) & !is.na(loc)) %>%
mutate(cve_loc = paste0(entidad, mun, loc)) %>%
distinct(cve_loc) %>% pull()
# Agebs known (this is known at ageb level)
known_ageb <- census_by_block %>%
filter(!grepl("C$", ageb) & grepl("C$", manzana)) %>%
mutate(cve_ageb = paste0(entidad, mun, loc, ageb)) %>%
distinct(cve_ageb) %>% pull()
# blocks known (this is known at block level)
known_block <- census_by_block %>%
filter(!grepl("C$", manzana)) %>%
distinct(cvegeo) %>% pull()
# Identify unknown data by level
# Create column indicating level of aggregation
blocks_data_all <- blocks_data_all %>%
mutate(agg_level = case_when(
# The data is not known at the locality level, ageb or block level.
# Therefore, it is agg. at municipal level
!cve_loc %in% known_loc & !cve_ageb %in% known_ageb & !cvegeo %in% known_block ~ "mun",
# The data is known at locality level only, and not at the ageb or block level.
# Therefore, it is agg. at the locality level.
cve_loc %in% known_loc & !cve_ageb %in% known_ageb & !cvegeo %in% known_block ~ "loc",
# The data is known at the ageb level only, and not at the block level.
# Therefore, it is agg. at the ageb level
cve_ageb %in% known_ageb & !cvegeo %in% known_block ~ "ageb",
# The data is known at the block level.
# Therefore, it is agg. at the block level.
cvegeo %in% known_block ~ "block"
))
# The aggregation levels of data are:
table(blocks_data_all$agg_level)
##
## ageb block loc mun
## 38397 63271 817 8066
Function to distribute the figures of the census aggregated at various levels in their respective blocks
In essence the function:
# Load proportional assignment function
prop_assignment <- function(agg_data, level, agg_unit, by_agg_unit) {
agg_unit <- enquo(agg_unit)
# Join aggregated data at the respective level
blocks_agg_data <- blocks_data_all %>%
filter(agg_level == level & is.na(cens_unidades_economicas)) %>%
left_join(agg_data,
by = setNames(by_agg_unit, by_agg_unit))
# Adjust number of economic units in each block according to the census
blocks_agg_data <- blocks_agg_data %>%
# Add sum of ec. un. by aggregation unit
left_join(group_by(blocks_agg_data, !! agg_unit) %>%
dplyr::summarise(sum_ec_un = sum(den_un_ec_tot)),
by = setNames(by_agg_unit, by_agg_unit)) %>%
# Arrange data by size group of ec. un. in each block
pivot_longer(cols = den_un_ec_0_a_5_personas: den_un_ec_101_a_250_personas,
values_to = "den_un_ec", names_to = "size") %>%
# Re-scale number of ec. un. according to denue
mutate(adj_ec_un = (den_un_ec / sum_ec_un) * unidades_economicas)
# Assign occupied personnel based on adjusted ec. un. and weight
blocks_agg_data <- blocks_agg_data %>%
mutate(size = sub("den_un_ec_", "", size)) %>%
mutate(theo_oc = case_when(
size == "0_a_5_personas" ~ adj_ec_un * 2.5,
size == "6_a_10_personas" ~ adj_ec_un * 8,
size == "11_a_30_personas" ~ adj_ec_un * 20.5,
size == "31_a_50_personas" ~ adj_ec_un * 40.5,
size == "51_a_100_personas" ~ adj_ec_un * 75.5,
size == "101_a_250_personas" ~ adj_ec_un * 175.5,
size == "251_y_más_personas" ~ adj_ec_un * 375.5,
TRUE ~ 0))
# Estimate oc. pe. based on Denue and adjusted to Census data
blocks_agg_est <- blocks_agg_data %>%
# Add sum of theoretical occupied personnel (weights) by aggregation unit
left_join(group_by(blocks_agg_data, !! agg_unit) %>%
dplyr::summarise(sum_theo_oc = sum(theo_oc)),
by = setNames(by_agg_unit, by_agg_unit)) %>%
# Remove rows that do not have any information from the economic census
filter(!is.na(personal_ocupado_total)) %>%
# Re-scale occupied personnel based on census
mutate(est_pe_oc = (theo_oc / sum_theo_oc) * personal_ocupado_total,
# Compute average size of economic units
avg_est_size = case_when(
adj_ec_un < 1 ~ est_pe_oc,
est_pe_oc > 0 ~ est_pe_oc / adj_ec_un,
TRUE ~ 0)
) %>%
# Adjust when large economic units bias the smaller units,
# causing over estimating the total occupied personnel
# define limit of economic units size
mutate(limit_unit_size = case_when(
size == "0_a_5_personas" ~ 5,
size == "6_a_10_personas" ~ 10,
size == "11_a_30_personas" ~ 30,
size == "31_a_50_personas" ~ 50,
size == "51_a_100_personas" ~ 100,
size == "101_a_250_personas" ~ 250,
size == "251_y_más_personas" ~ Inf,
)) %>%
#Add the number of economic units larger than 251 within the geographic unit
left_join(filter(., size == "251_y_más_personas") %>%
group_by(., !! agg_unit) %>%
dplyr::summarise(n_ec_un_larger_251 = sum(adj_ec_un)),
by = setNames(by_agg_unit, by_agg_unit)) %>%
# Compute estimated pe oc conditioned to size only if
# the aggregation unit contains large economic units
mutate(est_pe_oc_adjusted = ifelse(avg_est_size > limit_unit_size & n_ec_un_larger_251 > 0,
theo_oc, est_pe_oc),
# Indicate whether there were changes in the area
adjusted_by_size = est_pe_oc != est_pe_oc_adjusted) %>%
# If the sum of the estimated pe oc conditioned by size is smaller than the agg. number of oc. pe. in census
# Compute the difference, and divide it by the number of ec. units larger than 251
# Include the sum of the new estimate of oc. pe.
left_join(filter(., size != "251_y_más_personas") %>% # Deducting large units
group_by(., !! agg_unit) %>%
dplyr::summarise(sum_est_pe_oc_adjusted = sum(est_pe_oc_adjusted, na.rm = T),
sum_adjusted_by_size = sum(adjusted_by_size)),
by = setNames(by_agg_unit, by_agg_unit)) %>%
# Compute the diff. between census and sum of the estimates if
# there were changes in the area due to constrain in size
mutate(diff = personal_ocupado_total - sum_est_pe_oc_adjusted) %>%
# Divide the difference by the n. of ec un larger than 251
# And assign it to units larger than 251 if...
mutate(pe_oc_adjusted_2 = case_when(
adj_ec_un > 0 &
sum_adjusted_by_size > 0 &
diff > 0 &
size == "251_y_más_personas" ~
(diff / n_ec_un_larger_251) * adj_ec_un,
TRUE ~ est_pe_oc_adjusted
)) %>%
# Add total sum by municipality (for checks)
left_join(group_by(., !! agg_unit) %>% dplyr::summarise(sum_pe_oc_adjusted_2 = sum(pe_oc_adjusted_2))) %>%
mutate(valid_census_vs_est = personal_ocupado_total == round(sum_pe_oc_adjusted_2, 0)) %>%
# Distribute paid salaries according to n. of oc. pe. (assuming all oc. pe. in the area have same salary)
mutate(est_remuner = (total_remuneraciones / personal_ocupado_total) * pe_oc_adjusted_2)
# Re-structure data by block
blocks_agg_est <- blocks_agg_est %>%
group_by(cvegeo) %>%
dplyr::summarise(ec_un = sum(adj_ec_un, na.rm = T),
oc_pe = sum(pe_oc_adjusted_2, na.rm = T),
remun_tot = sum(est_remuner, na.rm = T)
) %>%
# Create aggregation codes
mutate(cve_ent_mun = str_sub(cvegeo, 1, 5),
cve_loc = str_sub(cvegeo, 1, 9),
cve_ageb = str_sub(cvegeo, 1, 13)) %>%
# Organize variables
dplyr::select(cvegeo, cve_ent_mun:cve_ageb, everything())
# Result
return(blocks_agg_est)
}
Later, the function is applied for each of the datasets for the different levels of aggregation.
# Apply function
est_pe_oc_mun <- prop_assignment(census_mun_agg, "mun", cve_ent_mun, "cve_ent_mun")
est_pe_oc_loc <- prop_assignment(census_loc_agg, "loc", cve_loc, "cve_loc")
est_pe_oc_ageb <- prop_assignment(census_ageb_agg, "ageb", cve_ageb, "cve_ageb")
# Three agebs not assigned because these contained data at the block level already.
# These are actually rural localities. Thus, the values can be added directly.
not_joined_agebs <- census_ageb_agg[ , c(1, 3, 4)] %>%
left_join(
group_by(est_pe_oc_ageb, cve_ageb) %>%
dplyr::summarise(oc_pe = sum(oc_pe, na.rm = T),
remun_tot = sum(remun_tot, na.rm = T))) %>%
mutate(diff = round(oc_pe - personal_ocupado_total, 2),
remun_diff = round(total_remuneraciones - remun_tot , 2)) %>%
bind_rows(summarise_all(., funs(if(is.numeric(.)) sum(., na.rm = T) else "Total"))) %>%
filter(is.na(oc_pe)) %>% pull(cve_ageb)
census_ageb_agg <- as.data.frame(census_ageb_agg)
# Add the aggregated values to the rural locality
for( x in not_joined_agebs) {
# Economic units
blocks_data_all[blocks_data_all$cve_ageb == x, "cens_unidades_economicas"] <-
blocks_data_all[blocks_data_all$cve_ageb == x, "cens_unidades_economicas"] +
census_ageb_agg[census_ageb_agg$cve_ageb == x, "unidades_economicas"]
# Occupied personnel
blocks_data_all[blocks_data_all$cve_ageb == x, "cens_personal_ocupado_total"] <-
blocks_data_all[blocks_data_all$cve_ageb == x, "cens_personal_ocupado_total"] +
census_ageb_agg[census_ageb_agg$cve_ageb == x, "personal_ocupado_total"]
# Paid salaries
blocks_data_all[blocks_data_all$cve_ageb == x, "cens_total_remuneraciones"] <-
blocks_data_all[blocks_data_all$cve_ageb == x, "cens_total_remuneraciones"] +
census_ageb_agg[census_ageb_agg$cve_ageb == x, "total_remuneraciones"]
}
# Bind all estimates in one data frame
est_pe_oc_all <- bind_rows(est_pe_oc_mun, est_pe_oc_loc, est_pe_oc_ageb) %>%
dplyr::select(-cve_ent_mun:- cve_ageb)
# Join estimates to geometries containing the Census figures
blocks_data_all <- blocks_data_all %>%
left_join(est_pe_oc_all, by = "cvegeo", "cvegeo") %>%
mutate(est_ec_un = coalesce(cens_unidades_economicas, ec_un),
est_pe_oc = coalesce(cens_personal_ocupado_total, oc_pe),
est_remun_tot = coalesce(cens_total_remuneraciones, remun_tot))
Finally, the estimates are validated by summing the values at the block level and comparing them with the aggregates reported in the Census. Also, I estimated the average income for each block by computing the ratio of total paid salary and the total occupied personnel. This estimate includes all occupied personnel (not only remunerated personnel).
## Validate estimates
# Check results
# Global sums
# Total economic units
sum(blocks_data_all$est_ec_un, na.rm = T)
## [1] 817973
# vs
sum(census_by_block$unidades_economicas)
## [1] 817973
# Total occupied personnel
sum(blocks_data_all$est_pe_oc, na.rm = T)
## [1] 5083414
# vs
sum(census_by_block$personal_ocupado_total)
## [1] 5083414
# paid salaries
sum(blocks_data_all$est_remun_tot, na.rm = T)
## [1] 449577497
# vs
sum(census_by_block$total_remuneraciones)
## [1] 449577497
# All data is now allocated at block level!!
# Compute average income by block
# This estimate includes all occupied personnel (not only remunerated personnel)
# I consider this is more appropriate for our purposes
blocks_data_all <- blocks_data_all %>%
mutate(avg_income = case_when(
est_remun_tot == 0 ~ 0,
est_pe_oc == 0 ~ NA_real_,
TRUE ~ est_remun_tot / est_pe_oc
))
# Round estimate decimals to 6
blocks_data_all <- blocks_data_all %>%
mutate_at(vars(est_ec_un:avg_income), round, 6)
Once the data from the Census is allocated in block geometries, these figures are re-aggregated at different geographic unit scales, namely hexagonal grid of 500 m, 1 km, 2 km, 4 km and post code areas. To do so, I employed a proportional overlay function. This function assigns integer values by the proportion of the overlapping areas. For instance, if one block overlaps 50% of its area in the geometries, that same proportion is allocated in the receiver area. This assumes that the data contained in the blocks is uniform.
Since the uniform grids cover 100% of the territory, all the data is captured. By contrast, the post codes do not cover all the territory. Therefore, some official values are left out.
Load proportional overlay function.
# Load proportional overlay function
apportion <- function(xx, yy,
# Smallest total overlap of each xx object from which it would transfer data in sq metres,
low_lim_sqm2 = 60,
# Smallest total overlap of xx object from which it would transfer data in percentage
low_lim_percent = 0.05,
# Smallest total overlap of xx object that would be rescaled to 1,
# to transfer all data from xx to yy
low_2rescale = 0.70) {
# Transform to metric CRS
# Set CRS
myCRSutm <- "+proj=utm +zone=14 +datum=WGS84 +units=m +no_defs"
myCRSlonglat <- "+proj=longlat +zone=14 +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
print("Transforming CRS...")
xx <- st_transform(xx, myCRSutm)
yy <- st_transform(yy, myCRSutm)
# Operations to transfer data from xx (giver area) to yy (receiver area)
# Calculate xx areas
xx$xx_area_m2 <- st_area(xx) %>% as.numeric()
print("Processing spatial intersection...")
# Intersect xx and yy (takes time)
intersection <- st_intersection(xx, yy)
# Calculate area of xx in yy
intersection$inters_area_m2 <- st_area(intersection) %>% as.numeric()
# Calculate the overlap percentage of each intersection over the original (complete) area of xx
intersection$per_overlpa <- intersection$inters_area_m2 / intersection$xx_area_m2 %>% as.numeric()
# Transform intersection to non-spatial object
intersection_nonspatial <- st_set_geometry(intersection, NULL)
print("Processing after intersection...")
# Generate summaries of the intersection by xx object
intersection_sums_by_xx <- intersection_nonspatial %>%
dplyr::group_by(cvegeo) %>%
dplyr::summarise(sum_overlap_m2 = sum(inters_area_m2), sum_per_overlap = sum(per_overlpa))
# First condition. Exclude complete xx gemetries from the analysis if the sum of the xx intersection on yy is less than <low_lim_sqm2> AND if the sum of the intersection is lower than <low_lim_percent>
# Create key to identify xx areas to be excluded
xx_exclude <- intersection_sums_by_xx %>%
mutate(cond_1 = if_else(sum_per_overlap < low_lim_percent &
sum_overlap_m2 < low_lim_sqm2, TRUE, FALSE)) %>%
filter(cond_1 == TRUE) %>% pull(cvegeo)
# Create subset excluding intersections under the first condition
intersection_nonspatial <- subset(intersection_nonspatial,
!(intersection_nonspatial$cvegeo %in% xx_exclude))
# Second condition. If the total percentage of overlap is more than low_2rescale, then, re-scale to 1. This is to transfer the total values of XX.
# If it is not, transfer the data according to the simple overlap proportion
# Calculate the rescaling factor for objects under second condition
# To obtain the total overlapping area of xx objects in each intersection
intersection_nonspatial <- left_join(intersection_nonspatial, intersection_sums_by_xx) %>%
# Calculate the rescaling proportion if xx objects are larger than xx_2rescale_key:
# each intersection / sum of the xx overlap
mutate(assign_fac = if_else(sum_per_overlap > low_2rescale, per_overlpa / sum_per_overlap, per_overlpa))
# Transfer data according to the assignment factor
data_in_yy <- intersection_nonspatial %>%
mutate_at(vars(est_ec_un, est_pe_oc, est_remun_tot),
.funs = list( ~ . * intersection_nonspatial$assign_fac))
data_in_yy <- data_in_yy %>%
dplyr:: group_by(id) %>%
dplyr::summarise_at(vars(est_ec_un, est_pe_oc, est_remun_tot), sum, na.rm = T)
# Estimate average income level weighted by number of oc pe
avg_income <- intersection_nonspatial %>%
mutate(pe_oc_est = est_pe_oc * assign_fac) %>% # To estimate the pe oc in each intersection
dplyr:: group_by(id) %>%
# Median of average income weighted by n of pe oc
dplyr:: summarise(avg_income = median(rep(avg_income, times = pe_oc_est), na.rm = T))
# Results
# Join average income
data_in_yy <- left_join(data_in_yy, avg_income)
# Round decimals
data_in_yy <- mutate_at(data_in_yy, vars(est_pe_oc:avg_income), round, 6)
# Join the data to the spatial objects in yy
data_in_yy_spat <- left_join(yy, data_in_yy)
# Calculate yy areas
data_in_yy_spat$area_m2 <- st_area(data_in_yy_spat) %>% as.numeric()
# Estimate oc pe density
data_in_yy_spat <- mutate(data_in_yy_spat,
oc_pe_den_sq_km = ifelse(!is.na(est_pe_oc),
round(est_pe_oc / (area_m2/1000000), 4),
NA_real_
)) %>%
dplyr::select(-area_m2)
print("Transforming back to WGS")
# Transform back to WGS
data_in_yy_spat <- st_transform(data_in_yy_spat, myCRSlonglat)
return(data_in_yy_spat)
}
Apply function to aggregate the data from blocks to grids.
grids_data <- lapply(names(grids),
function(x) apportion(blocks_census_data, grids[[x]]))
names(grids_data) <- names(grids)
Validate results
As shown next, the sum of the figures in the receiver areas are the same as the source for the integer values. For average estimates, the medium value is slightly higher than the source.
# Source
apply(as.data.frame(blocks_census_data)[ ,c(4:6)], 2, sum, na.rm = T)
## est_ec_un est_pe_oc est_remun_tot
## 817973.0006 5083414.0001 449577497.0002
# Estimates
sapply(1:length(grids_data), function(x)
apply(as.data.frame(grids_data[[x]])[ ,c(2:4)], 2, sum, na.rm = T))
## [,1] [,2] [,3] [,4]
## est_ec_un 817973.0006 817973.0006 817973.0006 817972.4184
## est_pe_oc 5083414.0001 5083414.0001 5083414.0001 5083413.1705
## est_remun_tot 449577497.0002 449577497.0002 449577497.0002 449577496.4582
## [,5]
## est_ec_un 815083.0566
## est_pe_oc 5072677.9594
## est_remun_tot 449113057.9936
# Median of the average income
# Source
median(rep(blocks_census_data$avg_income, times = blocks_census_data$est_pe_oc), na.rm = T)
## [1] 46.981481
# Estimates
sapply(1:length(grids_data), function(x){
grid_data <- filter(grids_data[[x]], !is.na(est_pe_oc))
median(rep(grid_data$avg_income, times = grid_data$est_pe_oc), na.rm = T)})
## [1] 49.454558 49.516191 49.292275 48.015491 51.312910
Visualization of estimates
The following interactive map is an example of one of the grids (2 km) which shows the income by decile.
The following map grid shows the occupied personnel density for each of the different aggregation areas.
The last map grid shows the estimated income decile for each of the various aggregation units.