Background for the Larger Project

As is the case across pretty much all medical specialties, radiology has seen quite a significant expansion in the roles of non-physician practitioners (NPP), such as nurse practitioners and physician assistants. The growing number of these practitioners has stimulated a passionate differences of opinions among practice leaders in the community. Some are strongly opposed, worrying about “creep” into their physicians’ workloads. Others are embracing NPPs as part of physican-led teams, using them in very narrow and highly supervised clinical scenarios. The only data, however, on how they’re used and by whom is all related to surveys, which can be methodologically problematic.

Given the nuance and debate surrounding this topic, our team endeavored to characterize the present state of NPP participation in radiology practices in attempt to inform administrative discussions. Our intended workflow involves a series of publications which each dig into separate focal points of the debate. The first installment of this workflow characterizes the NPPs who work for radiology-only practices.

The following presentation covers a facet of the first installment of our series investigating this topic, and examines the geographic distribution of these individuals across the US.

Notes:

For purposes of this analysis, we only care about defining nurse practitioners and physician assistants as “NPPs”.
Similarly, exploratory data analysis revealed the data is counterproductive when considering broad definitions of radiology-practices, so we more narrowly focus on the sample of radiology-only practices (i.e., radiologists account for 100% of the physician workforce of the practice).
Finally, researchers, engineers, analysts, etc. should typically avoid using zip codes a geographic area when performing analyses as the lines are drawn by the postal service and mean little as a grouping mechanism in most contexts, however, as the goal of this heatmap is to simply present a relatively granular visualization of the geographic distribution of some population, the pitfalls of using zip codes are not necessarily a limitation to our pursuit.

Preliminary Data Construction (Tool: SAS)

As the scope of the overarching project required me to construct a full data infrastructure, it would be inappropriate and confusing for me to explain all the steps taken to arrive at the initial data extract used to ultimately construct the visualization seen at the end of this presentation. Therefore, I’ll summarize and oversimplify the relevant steps of the process as follows:

Obtained information for all U.S. physicians billing to Medicare each year of 2014-2019 from From the Physician Compare database maintained by the Centers for Medicare and Medicaid Services (CMS).
Generated list of radiology-only practices
- Classified self-reported primary specialties as either an NPP, radiologist, physician (including radiologists) or other.
- Aggregated distinct providers for each group at the practice-level.
- Computed ratio of overall physician workforce composed of radiologists.
- Filtered down to practices where radiologist-physician ratio == 1.
- Looped through all years.
Circled back to provider-level data from Physician Compare to filter out providers who were not NPPs working at a 100% radiology-only practice.
Aggregated distinct NPPs at the practice-level.
Merged in or computed other information which are not used for the visualization discussed in this presentation.
Looped through all years and then appended annual files.

Resulting Data Extract

# First 10 observations
library(readxl)
rad_pracs_npp_npis <- read_excel("~/Work Files & Folders/NPP Project/Transferred from HEAL Server/Data extracts/core data/rad_pracs_npp_npis.xlsx")
library(kableExtra)
knitr::kable(rad_pracs_npp_npis[1:10,]) %>%
    kable_styling(bootstrap_options = "condensed", font_size=12)

NPI	Grd_yr	N_Yrs_Practice	yrs_practice	pri_spec	org_pac_id	prac_size	num_org_mem	st	zip	Year	region
1003000712	1994	20	10 to 24	NURSE PRACTITIONER	5799679874	100 or more	10	CA	93720	2014	West
1003012667	1992	24	10 to 24	NURSE PRACTITIONER	4486568482	50 to 99	88	MN	55102	2016	Midwest
1003012667	1992	25	25+	NURSE PRACTITIONER	4486568482	50 to 99	93	MN	55102	2017	Midwest
1003012667	1992	28	25+	NURSE PRACTITIONER	4486568482	100 or more	176	MN	55102	2020	Midwest
1003012667	1992	27	25+	NURSE PRACTITIONER	4486568482	100 or more	185	MN	55102	2019	Midwest
1003141136	2012	6	1 to 9	PHYSICIAN ASSISTANT	1355239534	10 to 49	24	KY	40207	2018	South
1003141136	2012	6	1 to 9	PHYSICIAN ASSISTANT	1355239534	10 to 49	24	KY	40216	2018	South
1003141136	2012	7	1 to 9	PHYSICIAN ASSISTANT	1355239534	10 to 49	26	KY	40207	2019	South
1003141136	2012	8	1 to 9	PHYSICIAN ASSISTANT	1355239534	10 to 49	26	KY	40207	2020	South
1003141136	2012	7	1 to 9	PHYSICIAN ASSISTANT	1355239534	10 to 49	26	KY	40216	2019	South

The only variables we care about are:

NPI ~ unique provider ID
org_pac_id ~ unique practice/group ID
zip ~ 5 digit zip code for unique NPI+org_pac_id combination
Year ~ billing year

Calculating NPP Counts at Zip5 Level

Tool: SAS

First, I needed to work out how to designate a single zip code for each provider, as some providers are affiliated with multiple practices and most practices are listed with multiple locations. Although imperfect, my team’s determination was to assign to each provider the highest population zip code from the largest (in terms of organization members) practice with which he/she is affiliated.

I pulled 2010 U.S. Census data to achieve this.

libname dat 'P:\H64.2021.SS.NPP_Rad_pracs\dat'; 
libname zippop 'C:\DATA\Census'; 

*Pull zipcode and population census data from 2010; 
data zippop_10(drop=zcta5);
    format zip pop;
    set zippop.sf12010860us1(rename=(p0010001=pop));
    keep zip pop;
    zip = input(zcta5, 5.);
    format zip z5.;
run;

*Reduce file; 
data red_tableau;
    set dat.rad_pracs_npp_npis;
    keep npi org_pac_id num_org_mem zip Year;
    where st not in ("GU" "PR" "VI")
        and Year < 2020;
run; 
proc sort data=red_tableau; by NPI Year; run;


*Filter largest practice for each NPP; 
data red_tableau_nozip;
    set red_tableau;
    drop zip;
run; 
proc sql;
    create table npp_org as
        select *
        from red_tableau_nozip
        where num_org_mem in (
            select max(num_org_mem)
            from red_tableau_nozip
            group by npi, Year
        );
quit; 
proc sort data=npp_org nodupkey; by npi Year; run;



*Filter highest pop zipcode for each NPP; 
proc sort data=red_tableau; by zip; run; 
data red_tableau_pluspop1; 
    merge red_tableau(in=a)
          zippop_10;
    by zip;
    if a;
    drop num_org_mem npi;
run; 
%macro loop(yr); 
proc sql;
    create table red_tableau_pluspop_yr&yr as
        select *
        from red_tableau_pluspop1
        where pop in (
            select max(pop)
            from red_tableau_pluspop1
            where Year = &yr
            group by org_pac_id
        )
        and Year = &yr;
quit; 
proc sort data=red_tableau_pluspop_yr&yr nodupkey; by _all_; run; 
%mend loop; 
%loop(2014); 
%loop(2015); 
%loop(2016); 
%loop(2017); 
%loop(2018); 
%loop(2019); 
data red_tableau_pluspop2;
    set red_tableau_pluspop_yr:;
run; 
proc sort data=red_tableau_pluspop2 nodupkey; by org_pac_id Year; run; 



*Merge NPI back in; 
proc sort data=npp_org; by org_pac_id Year; run; 
proc sort data=red_tableau_pluspop2; by org_pac_id Year; run; 
data npp_single_org_zip(drop=pop); 
    format npi Year org_pac_id zip; 
    merge red_tableau_pluspop2(in=a)
          npp_org(keep=npi org_pac_id Year);
    by org_pac_id Year;
    if a;
    if npi = . then delete;
run; 
proc sort data=npp_single_org_zip nodupkey; by npi Year; run;



*Count npps per zip per year; 
proc sql;
    create table extract1 as
        select zip, year, count(npi) as cnt_npi
        from npp_single_org_zip
        group by 1,2;
quit;


*Merge in all empty zips; 
%macro loop2(yr); 
data zip_yr&yr;
    set zippop_10(keep=zip);
    Year = &yr;
run; 
%mend loop2; 
%loop2(2014); 
%loop2(2015); 
%loop2(2016); 
%loop2(2017); 
%loop2(2018); 
%loop2(2019); 
data lst_zip_yr;
    set zip_yr:;
run; 
proc sort data=lst_zip_yr; by zip Year; run; 
data npp_cnts_zip5; 
    merge extract1
          lst_zip_yr;
    by zip Year;
    *if cnt_npi = . then cnt_npi = 0;
run; 


*Export as .csv for Tableau; 
proc export data=npp_cnts_zip5
    outfile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\npp counts per zip\npp_cnts_zip5.csv"
    dbms=csv
    replace;
run;

Decision to use Zip3

Initially, my task was to create a basic heat map with bubbles representing counts of NPPs aggregated at the zip5 level; larger bubbles indicate a higher concentration of NPPs.

After review of the zip5 heat map, it was evident that given the small numbers of 100% radiology-only practices, the even smaller number of affiliated NPPs in a single year stretched far too thin over zip5. Therefore, we decided to use the less granular area of zip3, as it is easily transferable from zip5, and still much more granular than state-level.

The problem: Tableau, the best tool to which I have access that permits sub-state-level reporting capabilities in heatmap construction, doesn’t have zip3 areas geocoded by default, and extrapolating the aggregations to the county level would cost too much credibility.

E.g., Georgia: Zip3 (left) & Zip5 (right)

Calculating Centroids for Zip3 Data

Tool: R

Next, I pulled a list of lat/lon coordinates provided for each zip5 across the country and wrote a program to calculate the centroid for each zip3 area using the coordinates of their composite zip5 areas.

library(tidyverse)

# Import 'US' zip file (tab-delimited) 
library(readr) 
US_zip3_mapping <- read_delim("Work Files & Folders/NPP Project/ZIP3 Mapping/US.txt", 
                              delim = "\t", escape_double = FALSE, 
                              col_names = FALSE, trim_ws = TRUE)

# Select only necessary columns and rename cols 
dat <- US_zip3_mapping %>% select(2,10,11) 
names(dat) <- c("zip5", "latitude", "longitude")

# Create a zip3 column, sort & reorder data 
dat$zip3 <- substr(dat$zip5, 1, 3) 
dat[order(dat$zip5),] 
dat2 <- tibble(zip5 = dat$zip5, 
               zip3 = dat$zip3, 
               latitude = dat$latitude, 
               longitude = dat$longitude)


# Function to calculate the centroid between given geo coordinates 
geographic_average <- function(lon, lat, weight=NULL) { 
    if (is.null(weight)) { 
        weight <- rep(1, length(lon)) 
    }
    lon <- weighted.mean(lon, w = weight) 
    lat <- weighted.mean(lat, w = weight) 
    data.frame(lon = lon, lat = lat) 
}

# Calculate the centroid for each unique zip3 
tbl <- tibble(unique_zip3 = character(), lon = double(), lat = double()) 

for (i in 1:length(unique(dat2$zip3))) { 
    unique_zip3 <- unique(dat2$zip3)[i] 
    inner_dat <- dat2[dat2$zip3 == unique_zip3, ] 
    tbl[i,] <- data.frame(unique_zip3, geographic_average(inner_dat$longitude,inner_dat$latitude)) 
}

# Map the zip3 back to the zip5s 
names(tbl) <- c("zip_3", "long3","lat3") 
joined_tbl <- left_join(dat2, tbl, 
                        by = c("zip3" = "zip_3"))

# Export to .csv 
write.csv(joined_tbl, "C:\Users\ssantavicca3\Documents\zip3_mapping.csv", row.names=TRUE)

Calculating NPP Counts at Zip3 Level

Tool: SAS

One important question we aimed to gain insight into was whether these NPPs were more concentrated around large population centers. At first glance the answer seems obvious, however, I wanted to better capture it in the visualization nonetheless.

While aggregating NPP counts from the zip5 level to zip3, I additionally aggregated the 2010 Census data on area (m²) and population from the zip5 level. Using the area and population data, I computed zip3 population density (1000/mi²) over which I could layer count data on the heat map.

libname zip5_pop 'C:\DATA\Census';


*Import the zip3 crosswalk and the previous zip5 npp-count file I created; 
proc import datafile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\npp counts per zip\npp_cnts_zip5.xlsx"
    out=cnts_zip5 dbms=xlsx replace;
run;
proc import datafile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\ZIP3 MAPPING\zip3_mapping.csv"
    out=raw_zip3 dbms=csv replace;
run;

*Pull population and land area info using the initial 2010 Census data used for zip5_pop list; 
data zip5_a(drop=arealand zcta5 zip3a);
    set zip5_pop.sf12010860us1;
    keep zcta5 p0010001 arealand land_area zip5 zip3a zip3; 
    rename p0010001 = pop; 
    
    *convert arealand var to numeric; 
    land_area = input(arealand, BEST12.); 
    
    *convert zip5 to numeric; 
    zip5 = input(zcta5, 5.);
    format zip5 z5.; 
    
    *create zip3 column; 
    zip3 = substr(zcta5, 1,3); 
run;

*Give the zip5 column in the npp-count file leading zeros then create zip3 column; 
data cnts_zip5_b(drop=char_zip5);
    set cnts_zip5;
    format zip z5.; 
    
    *create zip3 column; 
    char_zip5 = put(zip,z5. -L);
    zip3 = substr(char_zip5,1,3);
    
    rename zip = zip5;
run;

*Aggregate the pop and area data from the census file to the zip3 level; 
proc sql;
    create table zip5_b as
        select zip3, sum(pop) as pop, sum(land_area) as land_area
        from zip5_a
        group by 1;
quit;

*Calculate square miles for land area and then population density for zip3 level; 
data zip5_c;
    set zip5_b;
    land_area_sqmile = round(land_area/2589988, .1);
    pop_density = round(pop/land_area_sqmile, .1);
run;


*Aggregate the npp count data to the zip3 level; 
proc sql;
    create table cnts_zip5_c as
        select zip3, Year, sum(cnt_npi) as cnt_npi
        from cnts_zip5_b
        group by 1,2
        order by 1;
quit;



*Merge the count data with the pop and area data; 
data final1;
    merge cnts_zip5_c(in=a)
          zip5_c;
    by zip3;
run; 
*Export the most recent dataframe!;

*Create additional data frame with zip5 merged in incase Tableau joins get too spicy; 
proc sort data=cnts_zip5_b out=cz5b; by zip3 Year; run; 
proc sort data=final1 out=f1; by zip3 Year; run; 

data final2;
    merge cz5b
          f1;
    by zip3 Year;
run;

*Export to .csv for Tableau; 
proc export data=final2 
    outfile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\npp counts per zip\raw_zip3_zip5_area_pop_nppcnt.csv"
    dbms=csv
    replace;
run;

Distribution of Non-physician Providers across the United States: Growth from 2014 to 2019

Background for the Larger Project

Preliminary Data Construction (Tool: SAS)

Resulting Data Extract

Calculating NPP Counts at Zip5 Level

Tool: SAS

Decision to use Zip3

Calculating Centroids for Zip3 Data

Tool: R

Calculating NPP Counts at Zip3 Level

Tool: SAS

Generating the Heatmap

Enjoy!

Counts of NPPs and Popoulation Density by Zip3 (2014)

Counts of NPPs and Popoulation Density by Zip3 (2019)

Sources