As is the case across pretty much all medical specialties, radiology has seen quite a significant expansion in the roles of non-physician practitioners (NPP), such as nurse practitioners and physician assistants. The growing number of these practitioners has stimulated a passionate differences of opinions among practice leaders in the community. Some are strongly opposed, worrying about “creep” into their physicians’ workloads. Others are embracing NPPs as part of physican-led teams, using them in very narrow and highly supervised clinical scenarios. The only data, however, on how they’re used and by whom is all related to surveys, which can be methodologically problematic.
Given the nuance and debate surrounding this topic, our team endeavored to characterize the present state of NPP participation in radiology practices in attempt to inform administrative discussions. Our intended workflow involves a series of publications which each dig into separate focal points of the debate. The first installment of this workflow characterizes the NPPs who work for radiology-only practices.
The following presentation covers a facet of the first installment of our series investigating this topic, and examines the geographic distribution of these individuals across the US.
Notes:
As the scope of the overarching project required me to construct a full data infrastructure, it would be inappropriate and confusing for me to explain all the steps taken to arrive at the initial data extract used to ultimately construct the visualization seen at the end of this presentation. Therefore, I’ll summarize and oversimplify the relevant steps of the process as follows:
# First 10 observations
library(readxl)
rad_pracs_npp_npis <- read_excel("~/Work Files & Folders/NPP Project/Transferred from HEAL Server/Data extracts/core data/rad_pracs_npp_npis.xlsx")
library(kableExtra)
knitr::kable(rad_pracs_npp_npis[1:10,]) %>%
kable_styling(bootstrap_options = "condensed", font_size=12)| NPI | Grd_yr | N_Yrs_Practice | yrs_practice | pri_spec | org_pac_id | prac_size | num_org_mem | st | zip | Year | region |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1003000712 | 1994 | 20 | 10 to 24 | NURSE PRACTITIONER | 5799679874 | 100 or more | 10 | CA | 93720 | 2014 | West |
| 1003012667 | 1992 | 24 | 10 to 24 | NURSE PRACTITIONER | 4486568482 | 50 to 99 | 88 | MN | 55102 | 2016 | Midwest |
| 1003012667 | 1992 | 25 | 25+ | NURSE PRACTITIONER | 4486568482 | 50 to 99 | 93 | MN | 55102 | 2017 | Midwest |
| 1003012667 | 1992 | 28 | 25+ | NURSE PRACTITIONER | 4486568482 | 100 or more | 176 | MN | 55102 | 2020 | Midwest |
| 1003012667 | 1992 | 27 | 25+ | NURSE PRACTITIONER | 4486568482 | 100 or more | 185 | MN | 55102 | 2019 | Midwest |
| 1003141136 | 2012 | 6 | 1 to 9 | PHYSICIAN ASSISTANT | 1355239534 | 10 to 49 | 24 | KY | 40207 | 2018 | South |
| 1003141136 | 2012 | 6 | 1 to 9 | PHYSICIAN ASSISTANT | 1355239534 | 10 to 49 | 24 | KY | 40216 | 2018 | South |
| 1003141136 | 2012 | 7 | 1 to 9 | PHYSICIAN ASSISTANT | 1355239534 | 10 to 49 | 26 | KY | 40207 | 2019 | South |
| 1003141136 | 2012 | 8 | 1 to 9 | PHYSICIAN ASSISTANT | 1355239534 | 10 to 49 | 26 | KY | 40207 | 2020 | South |
| 1003141136 | 2012 | 7 | 1 to 9 | PHYSICIAN ASSISTANT | 1355239534 | 10 to 49 | 26 | KY | 40216 | 2019 | South |
The only variables we care about are:
First, I needed to work out how to designate a single zip code for each provider, as some providers are affiliated with multiple practices and most practices are listed with multiple locations. Although imperfect, my team’s determination was to assign to each provider the highest population zip code from the largest (in terms of organization members) practice with which he/she is affiliated.
I pulled 2010 U.S. Census data to achieve this.
libname dat 'P:\H64.2021.SS.NPP_Rad_pracs\dat';
libname zippop 'C:\DATA\Census';
*Pull zipcode and population census data from 2010;
data zippop_10(drop=zcta5);
format zip pop;
set zippop.sf12010860us1(rename=(p0010001=pop));
keep zip pop;
zip = input(zcta5, 5.);
format zip z5.;
run;
*Reduce file;
data red_tableau;
set dat.rad_pracs_npp_npis;
keep npi org_pac_id num_org_mem zip Year;
where st not in ("GU" "PR" "VI")
and Year < 2020;
run;
proc sort data=red_tableau; by NPI Year; run;
*Filter largest practice for each NPP;
data red_tableau_nozip;
set red_tableau;
drop zip;
run;
proc sql;
create table npp_org as
select *
from red_tableau_nozip
where num_org_mem in (
select max(num_org_mem)
from red_tableau_nozip
group by npi, Year
);
quit;
proc sort data=npp_org nodupkey; by npi Year; run;
*Filter highest pop zipcode for each NPP;
proc sort data=red_tableau; by zip; run;
data red_tableau_pluspop1;
merge red_tableau(in=a)
zippop_10;
by zip;
if a;
drop num_org_mem npi;
run;
%macro loop(yr);
proc sql;
create table red_tableau_pluspop_yr&yr as
select *
from red_tableau_pluspop1
where pop in (
select max(pop)
from red_tableau_pluspop1
where Year = &yr
group by org_pac_id
)
and Year = &yr;
quit;
proc sort data=red_tableau_pluspop_yr&yr nodupkey; by _all_; run;
%mend loop;
%loop(2014);
%loop(2015);
%loop(2016);
%loop(2017);
%loop(2018);
%loop(2019);
data red_tableau_pluspop2;
set red_tableau_pluspop_yr:;
run;
proc sort data=red_tableau_pluspop2 nodupkey; by org_pac_id Year; run;
*Merge NPI back in;
proc sort data=npp_org; by org_pac_id Year; run;
proc sort data=red_tableau_pluspop2; by org_pac_id Year; run;
data npp_single_org_zip(drop=pop);
format npi Year org_pac_id zip;
merge red_tableau_pluspop2(in=a)
npp_org(keep=npi org_pac_id Year);
by org_pac_id Year;
if a;
if npi = . then delete;
run;
proc sort data=npp_single_org_zip nodupkey; by npi Year; run;
*Count npps per zip per year;
proc sql;
create table extract1 as
select zip, year, count(npi) as cnt_npi
from npp_single_org_zip
group by 1,2;
quit;
*Merge in all empty zips;
%macro loop2(yr);
data zip_yr&yr;
set zippop_10(keep=zip);
Year = &yr;
run;
%mend loop2;
%loop2(2014);
%loop2(2015);
%loop2(2016);
%loop2(2017);
%loop2(2018);
%loop2(2019);
data lst_zip_yr;
set zip_yr:;
run;
proc sort data=lst_zip_yr; by zip Year; run;
data npp_cnts_zip5;
merge extract1
lst_zip_yr;
by zip Year;
*if cnt_npi = . then cnt_npi = 0;
run;
*Export as .csv for Tableau;
proc export data=npp_cnts_zip5
outfile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\npp counts per zip\npp_cnts_zip5.csv"
dbms=csv
replace;
run;
Initially, my task was to create a basic heat map with bubbles representing counts of NPPs aggregated at the zip5 level; larger bubbles indicate a higher concentration of NPPs.
After review of the zip5 heat map, it was evident that given the small numbers of 100% radiology-only practices, the even smaller number of affiliated NPPs in a single year stretched far too thin over zip5. Therefore, we decided to use the less granular area of zip3, as it is easily transferable from zip5, and still much more granular than state-level.
The problem: Tableau, the best tool to which I have access that permits sub-state-level reporting capabilities in heatmap construction, doesn’t have zip3 areas geocoded by default, and extrapolating the aggregations to the county level would cost too much credibility.
E.g., Georgia: Zip3 (left) & Zip5 (right)
Next, I pulled a list of lat/lon coordinates provided for each zip5 across the country and wrote a program to calculate the centroid for each zip3 area using the coordinates of their composite zip5 areas.
library(tidyverse)
# Import 'US' zip file (tab-delimited)
library(readr)
US_zip3_mapping <- read_delim("Work Files & Folders/NPP Project/ZIP3 Mapping/US.txt",
delim = "\t", escape_double = FALSE,
col_names = FALSE, trim_ws = TRUE)
# Select only necessary columns and rename cols
dat <- US_zip3_mapping %>% select(2,10,11)
names(dat) <- c("zip5", "latitude", "longitude")
# Create a zip3 column, sort & reorder data
dat$zip3 <- substr(dat$zip5, 1, 3)
dat[order(dat$zip5),]
dat2 <- tibble(zip5 = dat$zip5,
zip3 = dat$zip3,
latitude = dat$latitude,
longitude = dat$longitude)
# Function to calculate the centroid between given geo coordinates
geographic_average <- function(lon, lat, weight=NULL) {
if (is.null(weight)) {
weight <- rep(1, length(lon))
}
lon <- weighted.mean(lon, w = weight)
lat <- weighted.mean(lat, w = weight)
data.frame(lon = lon, lat = lat)
}
# Calculate the centroid for each unique zip3
tbl <- tibble(unique_zip3 = character(), lon = double(), lat = double())
for (i in 1:length(unique(dat2$zip3))) {
unique_zip3 <- unique(dat2$zip3)[i]
inner_dat <- dat2[dat2$zip3 == unique_zip3, ]
tbl[i,] <- data.frame(unique_zip3, geographic_average(inner_dat$longitude,inner_dat$latitude))
}
# Map the zip3 back to the zip5s
names(tbl) <- c("zip_3", "long3","lat3")
joined_tbl <- left_join(dat2, tbl,
by = c("zip3" = "zip_3"))
# Export to .csv
write.csv(joined_tbl, "C:\Users\ssantavicca3\Documents\zip3_mapping.csv", row.names=TRUE)
One important question we aimed to gain insight into was whether these NPPs were more concentrated around large population centers. At first glance the answer seems obvious, however, I wanted to better capture it in the visualization nonetheless.
While aggregating NPP counts from the zip5 level to zip3, I additionally aggregated the 2010 Census data on area (m2) and population from the zip5 level. Using the area and population data, I computed zip3 population density (1000/mi2) over which I could layer count data on the heat map.
libname zip5_pop 'C:\DATA\Census';
*Import the zip3 crosswalk and the previous zip5 npp-count file I created;
proc import datafile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\npp counts per zip\npp_cnts_zip5.xlsx"
out=cnts_zip5 dbms=xlsx replace;
run;
proc import datafile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\ZIP3 MAPPING\zip3_mapping.csv"
out=raw_zip3 dbms=csv replace;
run;
*Pull population and land area info using the initial 2010 Census data used for zip5_pop list;
data zip5_a(drop=arealand zcta5 zip3a);
set zip5_pop.sf12010860us1;
keep zcta5 p0010001 arealand land_area zip5 zip3a zip3;
rename p0010001 = pop;
*convert arealand var to numeric;
land_area = input(arealand, BEST12.);
*convert zip5 to numeric;
zip5 = input(zcta5, 5.);
format zip5 z5.;
*create zip3 column;
zip3 = substr(zcta5, 1,3);
run;
*Give the zip5 column in the npp-count file leading zeros then create zip3 column;
data cnts_zip5_b(drop=char_zip5);
set cnts_zip5;
format zip z5.;
*create zip3 column;
char_zip5 = put(zip,z5. -L);
zip3 = substr(char_zip5,1,3);
rename zip = zip5;
run;
*Aggregate the pop and area data from the census file to the zip3 level;
proc sql;
create table zip5_b as
select zip3, sum(pop) as pop, sum(land_area) as land_area
from zip5_a
group by 1;
quit;
*Calculate square miles for land area and then population density for zip3 level;
data zip5_c;
set zip5_b;
land_area_sqmile = round(land_area/2589988, .1);
pop_density = round(pop/land_area_sqmile, .1);
run;
*Aggregate the npp count data to the zip3 level;
proc sql;
create table cnts_zip5_c as
select zip3, Year, sum(cnt_npi) as cnt_npi
from cnts_zip5_b
group by 1,2
order by 1;
quit;
*Merge the count data with the pop and area data;
data final1;
merge cnts_zip5_c(in=a)
zip5_c;
by zip3;
run;
*Export the most recent dataframe!;
*Create additional data frame with zip5 merged in incase Tableau joins get too spicy;
proc sort data=cnts_zip5_b out=cz5b; by zip3 Year; run;
proc sort data=final1 out=f1; by zip3 Year; run;
data final2;
merge cz5b
f1;
by zip3 Year;
run;
*Export to .csv for Tableau;
proc export data=final2
outfile="P:\H64.2021.SS.NPP_Rad_pracs\fig\tableau heatmap\npp counts per zip\raw_zip3_zip5_area_pop_nppcnt.csv"
dbms=csv
replace;
run;
Layering information in Tableau can get messy when you have a particular vision in mind. That said, I’ll save you from the explanation of how I put the maps together.
Physician Compare:
- https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/Care-Compare-DAC-Initiative
2010 US Census data:
- https://www.census.gov/programs-surveys/decennial-census/data/datasets.2010.List_327707051.html
US zip5 lat/lon coordinates:
- http://download.geonames.org/export/zip/