Hospitals Data Exploration

Introduction

One of the critical inputs to the Estimated Ground Evacuation Time layer (EGET) is point locations of medical facilities. They represent the end points of the travel time analysis that forms the foundation of the dataset. Given that we are in the process of updating and improving EGET, we need to ensure that we are using the best and most relevant medical facility spatial data. One thing we do know is that we will be using this dataset:

Hospitals from the Homeland Infrastructure Foundation-Level Data (HIFLD), produced by the U.S. Department of Homeland Security’s Geospatial Management Office

There are several attributes within this dataset that will help determine whether or not individual medical facilities will be useful and should be used in the new generation of EGET. This document will provide a basic spatial and thematic overview of the data to inform a conversation about how best to use this dataset.

Overview

First, we can take a look at the spatial extent and distribution of the entire dataset:

# load the necessary libraries
library(sf)
library(rmarkdown)
library(knitr)
library(terra)
library(viridis)
library(magrittr) # provides the %>% pipe used throughout

# read in the data
hosp <- st_read("S:/ursa2/campbell/ground_evac/pilot_areas/pilot_area_1/data_in/hospitals/Hospitals.shp")

## Reading layer `Hospitals' from data source 
##   `S:\ursa2\campbell\ground_evac\pilot_areas\pilot_area_1\data_in\hospitals\Hospitals.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 8013 features and 32 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -176.6403 ymin: -14.29024 xmax: 145.7245 ymax: 71.29773
## Geodetic CRS:  WGS 84

state <- st_read("S:/ursa2/campbell/ground_evac/full_area/data_in/boundaries/census/tl_2022_us_state.shp")

## Reading layer `tl_2022_us_state' from data source 
##   `S:\ursa2\campbell\ground_evac\full_area\data_in\boundaries\census\tl_2022_us_state.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 14 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## Geodetic CRS:  NAD83

# match coordinate systems
crs <- st_crs(state)
hosp <- st_transform(hosp, crs)

# map the entire dataset out
plot(st_geometry(hosp), pch = 16, cex = 0.25)
plot(st_geometry(state), border = "red", add = T)

# subset just conus facilities and map them out
abbv <- state$STUSPS
abbv.conus <- abbv[!abbv %in% c("AK", "AS", "HI", "MP", "GU", "PR", "VI")]
state.conus <- state[state$STUSPS %in% abbv.conus,]
hosp.conus <- st_intersection(hosp, state.conus)
plot(st_geometry(hosp.conus), pch = 16, cex = 0.25)
plot(st_geometry(state.conus), border = "red", add = T)

Geographic summary

In total, there are 8013 medical facilities in this dataset
Within the contiguous US (CONUS), there are 7858

Henceforth in this document, all counts will be based only on CONUS hospitals as that is the focal extent of EGET.

Attributes of interest

I will now take a look at some of the attributes within the dataset that may be of use in determining whether or not a medical facility should be included in the new generation of EGET. To be clear, these are attributes that I am interpreting to be of use. That said, I am not an expert in medical geography. So, I will also include the first 100 rows of the full table below for those interested in taking a deeper dive:

# print paged table of hospitals data
paged_table(hosp.conus[1:100,])

Attribute #1: “STATUS”

This should be an easy one. It simply defines whether a facility is open or closed. Assuming the data are reliable and up to date, we would want to remove closed facilities.

# create and print table of STATUS (drop geometry first, then pull the column)
df.status <- table(st_drop_geometry(hosp.conus)$STATUS) %>%
  as.data.frame()
colnames(df.status) <- c("STATUS", "Count")
kable(df.status)

STATUS	Count
CLOSED	377
OPEN	7481

# remove closed facilities from further consideration
hosp.open <- hosp.conus[hosp.conus$STATUS == "OPEN",]

After this filter, there are now 7481 facilities remaining.
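One caveat with an equality filter like this: a facility whose STATUS is missing would compare as NA rather than FALSE. The two counts above sum to the CONUS total (377 + 7481 = 7858), which suggests there are no missing values here, but a quick check could look like the following sketch. The data frame `toy` and its values are made up for illustration and are not drawn from the HIFLD table.

```r
# toy data (hypothetical values, not from the hospitals dataset)
toy <- data.frame(NAME = c("A", "B", "C", "D"),
                  STATUS = c("OPEN", "CLOSED", NA, "OPEN"))

# include missing values in the tabulation so they are not silently ignored
status.counts <- table(toy$STATUS, useNA = "ifany")
print(status.counts)

# which() discards NA comparisons, so only rows that are definitely OPEN remain
toy.open <- toy[which(toy$STATUS == "OPEN"), ]
print(nrow(toy.open))
```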

Attributes #2 and #3: “TYPE” and “NAICS_DESC”

These are two attributes that provide some indication as to the types of services offered by the facilities. TYPE appears to be somewhat less descriptive than NAICS_DESC, but it’s not immediately clear which makes the most sense to use for filtering purposes. So, I will take a look at both, and how they relate to one another.

# create and print table of TYPE (drop geometry first, then pull the column)
df.type <- table(st_drop_geometry(hosp.conus)$TYPE) %>%
  as.data.frame()
colnames(df.type) <- c("TYPE", "Count")
df.type <- df.type[order(df.type$Count, decreasing = T),]
kable(df.type)

	TYPE	Count
4	GENERAL ACUTE CARE	4140
3	CRITICAL ACCESS	1241
7	PSYCHIATRIC	970
8	REHABILITATION	453
5	LONG TERM CARE	381
6	MILITARY	272
9	SPECIAL	211
1	CHILDREN	159
10	WOMEN	19
2	CHRONIC DISEASE	12

Based on TYPE alone, it would appear that every type except for the two most abundant (“GENERAL ACUTE CARE” and “CRITICAL ACCESS”) should be removed from consideration. But let’s also take a look at NAICS_DESC (which, by the way, stands for the North American Industry Classification System’s description):

# create and print table of NAICS_DESC (drop geometry first, then pull the column)
df.naics <- table(st_drop_geometry(hosp.conus)$NAICS_DESC) %>%
  as.data.frame()
colnames(df.naics) <- c("NAICS_DESC", "Count")
df.naics <- df.naics[order(df.naics$Count, decreasing = T),]
kable(df.naics)

	NAICS_DESC	Count
6	GENERAL MEDICAL AND SURGICAL HOSPITALS	5979
15	PSYCHIATRIC AND SUBSTANCE ABUSE HOSPITALS	801
17	SPECIALTY (EXCEPT PSYCHIATRIC AND SUBSTANCE ABUSE) HOSPITALS	430
16	REHABILITATION HOSPITALS (EXCEPT ALCOHOLISM, DRUG ADDICTION)	414
1	CHILDREN’S HOSPITALS, GENERAL	100
5	EXTENDED CARE HOSPITALS (EXCEPT MENTAL, SUBSTANCE ABUSE)	33
3	CHILDREN’S HOSPITALS, SPECIALTY (EXCEPT PSYCHIATRIC, SUBSTANCE ABUSE)	20
8	HOSPITALS, PSYCHIATRIC (EXCEPT CONVALESCENT)	16
4	CHRONIC DISEASE HOSPITALS	12
14	ORTHOPEDIC HOSPITALS	12
11	HOSPITALS, SUBSTANCE ABUSE	11
7	HOSPITALS, ADDICTION	10
2	CHILDREN’S HOSPITALS, PSYCHIATRIC OR SUBSTANCE ABUSE	7
10	HOSPITALS, SPECIALTY (EXCEPT PSYCHIATRIC, SUBSTANCE ABUSE)	7
12	MATERNITY HOSPITALS	3
9	HOSPITALS, PSYCHIATRIC PEDIATRIC	2
13	MENTAL HEALTH HOSPITALS	1

Based on NAICS_DESC alone, I would likely only retain the first, most abundant category (“GENERAL MEDICAL AND SURGICAL HOSPITALS”). With a total of 5979, it appears somewhat more inclusive than the top two TYPE categories (“GENERAL ACUTE CARE” and “CRITICAL ACCESS”), which sum to 5381. I’m curious how these two attributes overlap, so I’ll see how the most abundant NAICS_DESC class cross-tabulates with all of the TYPE domain values:

# cross-tabulate NAICS_DESC and TYPE
df.gmsh <- hosp.open[hosp.open$NAICS_DESC == "GENERAL MEDICAL AND SURGICAL HOSPITALS",]
df.naics.vs.type <- table(df.gmsh$TYPE) %>%
  as.data.frame()
colnames(df.naics.vs.type) <- c("TYPE", "Count")
df.naics.vs.type <- df.naics.vs.type[order(df.naics.vs.type$Count, decreasing = T),]
kable(df.naics.vs.type)

	TYPE	Count
3	GENERAL ACUTE CARE	3835
2	CRITICAL ACCESS	1215
5	MILITARY	270
6	PSYCHIATRIC	147
4	LONG TERM CARE	86
8	SPECIAL	76
7	REHABILITATION	45
1	CHILDREN	33
9	WOMEN	10

There are several hospitals within the NAICS_DESC category of “GENERAL MEDICAL AND SURGICAL HOSPITALS” that I would consider to be unsuitable for EGET (e.g., “PSYCHIATRIC”, “LONG TERM CARE”, “REHABILITATION”). So, my inclination is to use the TYPE attribute instead of the NAICS_DESC attribute for this round of filtering. Specifically, I will retain only those facilities labeled as either “GENERAL ACUTE CARE” or “CRITICAL ACCESS”.

# filter by type
hosp.type <- hosp.open[hosp.open$TYPE %in% c("GENERAL ACUTE CARE", "CRITICAL ACCESS"),]

After this filter, there are now 5136 facilities remaining.
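The cross-tabulation above looks at a single NAICS_DESC class. The same idea extends to a full two-way table showing every NAICS_DESC and TYPE combination at once. A minimal sketch on made-up rows follows; the data frame `toy` and the names `xtab` and `xtab.long` are illustrative assumptions, not part of the analysis above.

```r
# toy data (hypothetical rows, not from the hospitals dataset)
toy <- data.frame(
  NAICS_DESC = c("GENERAL MEDICAL AND SURGICAL HOSPITALS",
                 "GENERAL MEDICAL AND SURGICAL HOSPITALS",
                 "GENERAL MEDICAL AND SURGICAL HOSPITALS",
                 "PSYCHIATRIC AND SUBSTANCE ABUSE HOSPITALS"),
  TYPE = c("GENERAL ACUTE CARE", "CRITICAL ACCESS", "PSYCHIATRIC", "PSYCHIATRIC")
)

# two-way contingency table: rows are NAICS_DESC, columns are TYPE
xtab <- table(toy$NAICS_DESC, toy$TYPE)
print(xtab)

# long format, keeping only the combinations that actually occur
xtab.long <- as.data.frame(xtab, stringsAsFactors = FALSE)
colnames(xtab.long) <- c("NAICS_DESC", "TYPE", "Count")
xtab.long <- xtab.long[xtab.long$Count > 0, ]
print(xtab.long)
```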

Attribute #4: “NAME”

The previous version of EGET from WFDSS, as far as I can tell, was solely based on the following filter:

Excluded urgent care clinics and clipped this layer to the US States layer to exclude Canadian and Mexican hospitals to get at “definitive” care.

So, besides geography, the only attribute-driven filter was based on whether a facility was considered an “urgent care clinic”. There are no attributes in the hospitals dataset we are relying on that specifically make this distinction. But perhaps by looking at the names of the facilities, we can see if a similar filter could be crafted. Of course, this is not a straightforward task, as urgent care clinics can have a variety of names. In Salt Lake, for example, some facilities are called “InstaCare”, others “Urgent Care”, and there may be others yet that might be considered urgent care facilities but only have “Clinic” in the name… Perhaps the best starting point is to look at common terms within hospital names. I’ll do this by counting how often each word appears. Since there will be a huge number of words, I’ll filter to only those that show up at least 50 times (~1% of remaining facilities).

# get word count
word.count <- table(unlist(strsplit(tolower(hosp.type$NAME), " "))) %>%
  as.data.frame()
colnames(word.count) <- c("Word", "Count")
word.count <- word.count[word.count$Count >= 50,]
word.count <- word.count[order(word.count$Count, decreasing = T),]
kable(word.count)

	Word	Count
1529	hospital	2730
573	center	1538
2060	medical	1481
1427	health	741
1	-	677
2077	memorial	540
2727	regional	420
816	county	350
745	community	274
2362	of	229
3061	st	206
96	and	183
3352	valley	160
1582	inc	156
3062	st.	153
497	campus	148
3320	university	138
1252	general	134
1430	healthcare	105
2089	mercy	105
3195	the	94
2304	north	91
3157	system	86
1876	llc	77
3016	south	77
3475	west	77
1535	hospital,	74
215	baptist	73
2854	saint	71
660	city	68
149	ascension	62
517	care	60
963	district	60
2102	methodist	57
3191	texas	57

Surprisingly, “urgent” does not even show up in this list… In fact, it shows up in only 1 hospital name in the filtered set. Further, it likewise shows up in only 1 name in the original, unfiltered dataset. Perhaps this has to do with the fact that urgent care clinics may be contained within other medical facilities? Difficult to say… But clearly names alone will not be useful for filtering urgent care clinics.
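Two small refinements to the name search may be worth noting. First, the word table above counts “hospital,” and “st.” separately from “hospital” and “st”, because the names were split on spaces without removing punctuation. Second, a search for several urgent-care-style keywords at once can be done with a single pattern. The sketch below shows both on made-up names; the vector `nm` and the keyword list are assumptions for illustration, not values taken from the dataset.

```r
# toy facility names (hypothetical)
nm <- c("Valley Hospital, Inc.", "St. Mary InstaCare",
        "Riverside Urgent Care - North", "County Medical Center")

# tokenize: lower-case, replace punctuation with spaces, split on whitespace
clean <- trimws(gsub("[[:punct:]]+", " ", tolower(nm)))
tokens <- strsplit(clean, "\\s+")
word.count <- sort(table(unlist(tokens)), decreasing = TRUE)
print(head(word.count))

# flag names containing any of several urgent-care-style keywords
keywords <- c("urgent", "instacare", "clinic")
is.urgent <- grepl(paste(keywords, collapse = "|"), tolower(nm))
print(nm[is.urgent])
```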

Looking back at the TYPE attribute a little deeper, perhaps the most relevant distinction is between “GENERAL ACUTE CARE” and “CRITICAL ACCESS” hospitals. After a little Googling, I found a definition for each provided by the Madison County Memorial Hospital:

Acute Care Hospitals (ACH) are hospitals that provide short-term patient care, whereas Critical Access Hospitals (CAH) are small facilities that give limited outpatient and inpatient hospital services to people in rural areas. Acute care is being a patient in a Hospital rather than an Urgent Care center. Critical care is a unit for serious cases that need more one on one care and are normally part of emergency room care.

These two TYPE categories may be the best way to approximate the urgent care classification, with “GENERAL ACUTE CARE” representing what one might typically consider a “hospital” and “CRITICAL ACCESS” representing what one might typically consider an “urgent care” center. For later comparisons, I will create an additional filter that distinguishes between these two types of facilities.

# create an additional filter for just general acute care facilities
hosp.acut <- hosp.type[hosp.type$TYPE == "GENERAL ACUTE CARE",]

Inclusive of both types, there are 5136 facilities remaining. Inclusive only of general acute care facilities, there are only 3919.

I will retain both for further investigation.

Attribute #5: “TRAUMA”

According to the American Trauma Society, there are five levels of trauma center categorization. Rather than copy/paste all of the details here, I will refer readers to the American Trauma Society for a description of the classification system. Unfortunately, this attribute is quite complex within the dataset… Where one might want a simple I, II, III, etc. classification, there are many more categories and combinations of categories. Here is a tabulation of the counts of each unique value within the “TRAUMA” field of the inclusive “GENERAL ACUTE CARE” and “CRITICAL ACCESS” data:

# count unique TRAUMA values
df.trauma <- table(hosp.type$"TRAUMA") %>%
  as.data.frame()
colnames(df.trauma) <- c("TRAUMA", "Count")
df.trauma <- df.trauma[order(df.trauma$Count, decreasing = T),]
kable(df.trauma)

	TRAUMA	Count
20	NOT AVAILABLE	2933
18	LEVEL IV	957
16	LEVEL III	515
9	LEVEL II	324
3	LEVEL I	211
26	TRH	37
25	TRF	32
2	CTH	29
6	LEVEL I ADULT, LEVEL II PEDIATRIC	18
5	LEVEL I ADULT, LEVEL I PEDIATRIC	17
19	LEVEL V	17
1	ATH	8
12	LEVEL II ADULT, LEVEL II PEDIATRIC	8
17	LEVEL III ADULT	7
4	LEVEL I ADULT	5
23	RTC	5
8	LEVEL I, LEVEL II PEDIATRIC	2
10	LEVEL II / PEDIATRIC	2
11	LEVEL II ADULT	2
7	LEVEL I, LEVEL I PEDIATRIC	1
13	LEVEL II REHAB	1
14	LEVEL II, LEVEL II PEDIATRIC	1
15	LEVEL II, LEVEL III PEDIATRIC, LEVEL II REHAB	1
21	PARC	1
22	REGIONAL	1
24	RTH	1

After some sleuthing, I found this page with definitions specific to this dataset:

https://hifld-geoplatform.opendata.arcgis.com/datasets/geoplatform::trauma-levels-1/explore

Unfortunately, it’s incomplete, and the definitions appear somewhat inconsistent between states. But, at least it helped me define some unknown abbreviations:

ATH: Area trauma center (seemingly comparable to Level III?)
CTH: Community trauma facility (seemingly comparable to Level IV?)
PARC: Primary adult resource center (can’t find trauma center level comparison)
REGIONAL: Regional trauma center (possibly comparable to Level II?)
RTC: Also regional trauma center (also possibly comparable to Level II?)
RTH: Regional trauma facility (also possibly comparable to Level II?)
TRF: Trauma receiving facility (seemingly comparable to Level V?)
TRH: Trauma receiving hospital (also seemingly comparable to Level V?)

So, most of them can be matched up with a trauma level. But there remains a challenge where there are multiple classifications (e.g., “LEVEL I ADULT, LEVEL II PEDIATRIC”). I guess we can assume that it’s safe to ignore the pediatric and rehab levels. Taking that into account, I will create a manual lookup table to attempt to assign as many facilities to a trauma center level as possible:

# build trauma lookup table
trauma.lut <- matrix(c("ATH", 3,
                       "CTH", 4,
                       "LEVEL I", 1,
                       "LEVEL I ADULT", 1,
                       "LEVEL I ADULT, LEVEL I PEDIATRIC", 1,
                       "LEVEL I ADULT, LEVEL II PEDIATRIC", 1,
                       "LEVEL I, LEVEL I PEDIATRIC", 1,
                       "LEVEL I, LEVEL II PEDIATRIC", 1,
                       "LEVEL II", 2,
                       "LEVEL II / PEDIATRIC", 2,
                       "LEVEL II ADULT", 2,
                       "LEVEL II ADULT, LEVEL II PEDIATRIC", 2,
                       "LEVEL II REHAB", NA,
                       "LEVEL II, LEVEL II PEDIATRIC", 2,
                       "LEVEL II, LEVEL III PEDIATRIC, LEVEL II REHAB", 2,
                       "LEVEL III", 3,
                       "LEVEL III ADULT", 3,
                       "LEVEL IV", 4,
                       "LEVEL V", 5,
                       "NOT AVAILABLE", NA,
                       "PARC", NA,
                       "REGIONAL", 2,
                       "RTC", 2,
                       "RTH", 2,
                       "TRF", 5,
                       "TRH", 5),
                     ncol = 2, byrow = T)
trauma.lut <- as.data.frame(trauma.lut)
colnames(trauma.lut) <- c("TRAUMA", "TRAUMA_RC")
trauma.lut$TRAUMA_RC <- as.numeric(trauma.lut$TRAUMA_RC)
kable(trauma.lut)

TRAUMA	TRAUMA_RC
ATH	3
CTH	4
LEVEL I	1
LEVEL I ADULT	1
LEVEL I ADULT, LEVEL I PEDIATRIC	1
LEVEL I ADULT, LEVEL II PEDIATRIC	1
LEVEL I, LEVEL I PEDIATRIC	1
LEVEL I, LEVEL II PEDIATRIC	1
LEVEL II	2
LEVEL II / PEDIATRIC	2
LEVEL II ADULT	2
LEVEL II ADULT, LEVEL II PEDIATRIC	2
LEVEL II REHAB	NA
LEVEL II, LEVEL II PEDIATRIC	2
LEVEL II, LEVEL III PEDIATRIC, LEVEL II REHAB	2
LEVEL III	3
LEVEL III ADULT	3
LEVEL IV	4
LEVEL V	5
NOT AVAILABLE	NA
PARC	NA
REGIONAL	2
RTC	2
RTH	2
TRF	5
TRH	5

I will join this table to the filtered hospitals data to get counts according to this new modified trauma center level classification:

# join the tables together (explicitly on the shared TRAUMA field)
hosp.type <- merge(hosp.type, trauma.lut, by = "TRAUMA")

# summarize trauma center levels based on reclassified data
df.trauma.rc <- table(hosp.type$"TRAUMA_RC", useNA = "ifany") %>%
  as.data.frame()
colnames(df.trauma.rc) <- c("TRAUMA_RC", "Count")
df.trauma.rc <- df.trauma.rc[order(df.trauma.rc$Count, decreasing = T),]
kable(df.trauma.rc)

	TRAUMA_RC	Count
6	NA	2935
4	4	986
3	3	530
2	2	345
1	1	254
5	5	86

OK, much simpler. But, there are a few issues:

There are a lot of NAs. Does this mean they are just not considered trauma centers? Or does this mean they do not have reliable trauma center attribution? No way to know, I suppose.
It’s still not clear what a good trauma center level threshold would be with respect to EGET.
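One more thing worth checking is the join itself. By default, merge() performs an inner join, so any TRAUMA value missing from the lookup table would silently drop its facilities. The reclassified counts above sum to 5136 (2935 + 986 + 530 + 345 + 254 + 86), matching the pre-join total, so nothing appears to have been lost here. A minimal sketch of how such a check could be made on toy data follows; `toy`, `lut`, and the unmapped value “LEVEL VI” are hypothetical.

```r
# toy facilities; the last TRAUMA value is deliberately absent from the lookup
toy <- data.frame(ID = 1:4,
                  TRAUMA = c("LEVEL I", "TRH", "NOT AVAILABLE", "LEVEL VI"))
lut <- data.frame(TRAUMA = c("LEVEL I", "TRH", "NOT AVAILABLE"),
                  TRAUMA_RC = c(1, 5, NA))

# values present in the data but missing from the lookup table
unmapped <- setdiff(unique(toy$TRAUMA), lut$TRAUMA)
print(unmapped)

# the default (inner) join drops unmapped rows; all.x = TRUE keeps them
inner <- merge(toy, lut, by = "TRAUMA")
left <- merge(toy, lut, by = "TRAUMA", all.x = TRUE)
print(c(original = nrow(toy), inner = nrow(inner), left = nrow(left)))
```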

Nevertheless, it’s probably worth understanding the relationship between the two facility types described above (“GENERAL ACUTE CARE” and “CRITICAL ACCESS”) and trauma center levels. To do this, I will cross-tabulate the totals between the two fields:

# cross-tabulate TYPE and TRAUMA_RC
df.type.vs.trauma <- table(hosp.type$"TRAUMA_RC", hosp.type$"TYPE", useNA = "ifany") %>%
  as.data.frame()
colnames(df.type.vs.trauma) <- c("TRAUMA_RC", "TYPE", "Count")
df.type.vs.trauma <- df.type.vs.trauma[,c("TYPE", "TRAUMA_RC", "Count")]
df.type.vs.trauma <- df.type.vs.trauma[order(df.type.vs.trauma$TYPE, df.type.vs.trauma$TRAUMA_RC, decreasing = T),]
kable(df.type.vs.trauma)

	TYPE	TRAUMA_RC	Count
11	GENERAL ACUTE CARE	5	3
10	GENERAL ACUTE CARE	4	443
9	GENERAL ACUTE CARE	3	504
8	GENERAL ACUTE CARE	2	343
7	GENERAL ACUTE CARE	1	253
12	GENERAL ACUTE CARE	NA	2373
5	CRITICAL ACCESS	5	83
4	CRITICAL ACCESS	4	543
3	CRITICAL ACCESS	3	26
2	CRITICAL ACCESS	2	2
1	CRITICAL ACCESS	1	1
6	CRITICAL ACCESS	NA	562

OK, so “CRITICAL ACCESS” facilities tend to be level IV and level V centers, whereas “GENERAL ACUTE CARE” facilities tend to be levels I through IV. Still not a very clear distinction, at least quantitatively. I suppose we would need an incident commander, safety officer, or medical professional to provide insight here?
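To put rough numbers on that tendency, the counts from the cross-tabulation above can be converted to within-type shares. The sketch below re-enters the printed counts by hand and uses base R's prop.table(); the object names `counts`, `shares`, and `shares.classified` are illustrative only.

```r
# counts copied from the TYPE vs. TRAUMA_RC cross-tabulation above
counts <- matrix(c(253, 343, 504, 443,  3, 2373,
                     1,   2,  26, 543, 83,  562),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(TYPE = c("GENERAL ACUTE CARE", "CRITICAL ACCESS"),
                                 TRAUMA_RC = c("1", "2", "3", "4", "5", "NA")))

# share of each trauma level within each facility type (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))

# the same shares, restricted to facilities with an assigned level
classified <- counts[, colnames(counts) != "NA"]
shares.classified <- prop.table(classified, margin = 1)
print(round(shares.classified, 3))
```

Restricting to classified facilities makes the contrast easier to read, at the cost of ignoring the large NA group in both types.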

Nevertheless, to provide a spatial comparison of what the implications of applying trauma center level thresholds on EGET would be, I will still generate a series of subset datasets:

# create subsets (which() drops facilities whose TRAUMA_RC is NA)
hosp.trauma.5 <- hosp.type[which(hosp.type$TRAUMA_RC <= 5),]
hosp.trauma.4 <- hosp.type[which(hosp.type$TRAUMA_RC <= 4),]
hosp.trauma.3 <- hosp.type[which(hosp.type$TRAUMA_RC <= 3),]
hosp.trauma.2 <- hosp.type[which(hosp.type$TRAUMA_RC <= 2),]
hosp.trauma.1 <- hosp.type[which(hosp.type$TRAUMA_RC <= 1),]
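Because TRAUMA_RC is NA for facilities without an assigned level, a bare comparison such as TRAUMA_RC <= 3 evaluates to NA for those rows, and logical indexing with NA returns placeholder rows rather than dropping them. The following base-R sketch on a made-up data frame (`toy` is hypothetical) shows the difference between a bare comparison and one wrapped in which():

```r
# toy data with one facility lacking a trauma level (hypothetical values)
toy <- data.frame(NAME = c("A", "B", "C"),
                  TRAUMA_RC = c(1, NA, 4))

# bare comparison: the NA comparison produces an all-NA placeholder row
naive <- toy[toy$TRAUMA_RC <= 3, ]
print(naive)

# which() keeps only the rows where the comparison is TRUE
safe <- toy[which(toy$TRAUMA_RC <= 3), ]
print(safe)
```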

Spatial comparisons

To understand how using different hospital subsets will affect EGET, I will create a series of maps representing the CONUS-wide Euclidean distance to the nearest medical facility for each subset. What we know for sure (or at least are quite confident in…) is that we want medical facilities that are:

Open (as opposed to closed)
Belonging to either “GENERAL ACUTE CARE” or “CRITICAL ACCESS” type classifications (as opposed to psychiatric, rehab, etc.)

So, those will be the bare minimum requirements. Beyond that, I will generate the following subsets:

Subset 1: All “GENERAL ACUTE CARE” and “CRITICAL ACCESS” facilities, irrespective of trauma center level
Subset 2: Only “GENERAL ACUTE CARE” facilities, irrespective of trauma center level
Subset 3: Trauma center levels 1-5, irrespective of which of the two types
Subset 4: Trauma center levels 1-4, irrespective of which of the two types
Subset 5: Trauma center levels 1-3, irrespective of which of the two types
Subset 6: Trauma center levels 1-2, irrespective of which of the two types
Subset 7: Trauma center level 1 only, irrespective of which of the two types

Before I do that, I will lay some spatial groundwork. This includes reprojecting the data into a projection suitable for CONUS-wide distance analyses (EPSG:5070) and creating a template Euclidean distance raster. I will also define a function, applied to each subset, that tabulates the proportion of CONUS area within 10, 20, 50, 100, and 200 miles of the nearest hospital, as well as the proportion beyond 200 miles.
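The tabulation itself is simple arithmetic: for each threshold, take the share of cells whose distance is at or below it. A minimal base-R sketch on a made-up vector of distances follows; the values and the names `d`, `thresholds`, and `result` are illustrative only and are not the EGET raster.

```r
# hypothetical distances (miles) for ten equally sized cells
d <- c(3, 8, 12, 19, 25, 40, 60, 95, 150, 220)
thresholds <- c(10, 20, 50, 100, 200)

# proportion of cells at or below each threshold
prop <- sapply(thresholds, function(x) mean(d <= x))

# proportion of cells beyond the largest threshold
prop.beyond <- mean(d > max(thresholds))

result <- data.frame(dist = c(thresholds, Inf),
                     prop = c(prop, prop.beyond))
print(result)
```

Note that the first five rows are cumulative, while the Inf row is the complement of the 200-mile row.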

# reproject data
subset.1 <- st_transform(hosp.type, 5070)
subset.2 <- st_transform(hosp.acut, 5070)
subset.3 <- st_transform(hosp.trauma.5, 5070)
subset.4 <- st_transform(hosp.trauma.4, 5070)
subset.5 <- st_transform(hosp.trauma.3, 5070)
subset.6 <- st_transform(hosp.trauma.2, 5070)
subset.7 <- st_transform(hosp.trauma.1, 5070)
state.conus <- st_transform(state.conus, 5070)

# create euclidean distance template raster
state.conus$RASTERID <- 1
v <- vect(state.conus)
r <- rast(v, resolution = 1000)
ed.temp <- rasterize(v, r, "RASTERID")

# define function
ed.fun <- function(subset){
  
  # calculate and plot euclidean distance (in miles) to subset
  d <- distance(ed.temp, subset, rasterize = TRUE) %>%
    mask(v)
  d <- d / 1609.34
  plot(d, range = c(0,575), col = viridis(256))
  lines(v)
  
  # get total area (count of non-NA cells within CONUS)
  area <- ifel(!is.na(d), 1, NA) %>%
    global(fun = sum, na.rm = TRUE)
  area <- area[[1]]
  
  # proportion of CONUS cells meeting a condition; non-matching cells are
  # coded 0 rather than NA so that an empty class yields 0 instead of NaN
  prop.fun <- function(cond){
    global(ifel(cond, 1, 0), fun = sum, na.rm = TRUE)[[1]] / area
  }
  
  # get threshold proportions
  thresholds <- c(10, 20, 50, 100, 200)
  p <- sapply(thresholds, function(x) prop.fun(d <= x))
  p.inf <- prop.fun(d > 200)
  
  # compile into table
  result <- data.frame(dist = c(thresholds, Inf),
                       prop = c(p, p.inf))
  
  # print table
  kable(result)
}

Subset 1

# apply function
ed.fun(subset.1)

dist	prop
10	0.2980930
20	0.6617513
50	0.9556925
100	0.9993678
200	1.0000000
Inf	0.0000000

Subset 2

# apply function
ed.fun(subset.2)

dist	prop
10	0.1924894
20	0.4471360
50	0.7811645
100	0.9472940
200	0.9999970
Inf	0.0000030

Subset 3

# apply function
ed.fun(subset.3)

dist	prop
10	0.1629902
20	0.4429677
50	0.8626250
100	0.9725907
200	0.9998984
Inf	0.0001016

Subset 4

# apply function
ed.fun(subset.4)

dist	prop
10	0.1547315
20	0.4187674
50	0.8430527
100	0.9682738
200	0.9994935
Inf	0.0005065

Subset 5

# apply function
ed.fun(subset.5)

dist	prop
10	0.0728470
20	0.2197222
50	0.6358646
100	0.8989022
200	0.9979838
Inf	0.0020162

Subset 6

# apply function
ed.fun(subset.6)

dist	prop
10	0.0378014
20	0.1207872
50	0.4445640
100	0.8002556
200	0.9950198
Inf	0.0049802

Subset 7

# apply function
ed.fun(subset.7)

dist	prop
10	0.0169285
20	0.0577828
50	0.2573693
100	0.5562324
200	0.8260210
Inf	0.1739790
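For a side-by-side view, the per-subset proportions printed above can be collected into a single table. The sketch below re-enters the 50-mile and 100-mile values by hand; the object `comparison` and the column `loss.50mi` (coverage lost relative to Subset 1) are names introduced here for illustration.

```r
# proportions copied from the seven subset tables above
comparison <- data.frame(
  subset = 1:7,
  within.50mi = c(0.9556925, 0.7811645, 0.8626250, 0.8430527,
                  0.6358646, 0.4445640, 0.2573693),
  within.100mi = c(0.9993678, 0.9472940, 0.9725907, 0.9682738,
                   0.8989022, 0.8002556, 0.5562324)
)

# coverage lost at 50 miles relative to the most inclusive subset (Subset 1)
comparison$loss.50mi <- comparison$within.50mi[1] - comparison$within.50mi
print(comparison)
```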

Concluding remarks and questions

Hopefully this was a useful exercise for better understanding the hospitals data upon which we intend to base the next generation of EGET. As expected, the more restrictive the filtering procedure applied to the hospitals, the more pessimistic the resulting EGET will appear. It is worth noting that these Euclidean distances are best-case scenarios. EGET, which will take into account road networks and off-road terrain/vegetation conditions, will produce much higher effective travel distances to these facilities. But this provides a fast approximation that will hopefully inform our conversation about how best to filter the hospitals data.

As mentioned earlier, two filters appear clear:

Facilities must have STATUS == “OPEN”
Facilities must belong to TYPE in [“GENERAL ACUTE CARE”, “CRITICAL ACCESS”]

As for additional filtering beyond those fairly generous thresholds, the questions remain:

1. Should we filter to just “GENERAL ACUTE CARE” facilities, which tend to be larger hospitals? Or should we also include “CRITICAL ACCESS” facilities, which tend to include smaller, urgent care-type facilities?

2. Or, alternatively, should we base this thresholding on trauma center levels? If so, what is an appropriate level?