Teaming v. PUF

Preamble

Information is sought on transactions that may have been over-counted in the PUF, and data sources that may have been excluded from the PUF.

Executive Summary

This analysis examines differences between Teaming data and PUF data, supporting two hypotheses:

Transactions over-counted in the PUF.
Data sources excluded from the PUF.

Provider metadata, specifically NPI and taxonomy, is the basis for the analysis. The analysis yields three kinds of taxonomy ranking (paired, source, and destination), each ordered by the likelihood of supporting one of the two hypotheses.

For example, the taxonomy pair General Acute Care Hospital – Internal Medicine is the most likely taxonomy pair to be included in transactions over-counted in the PUF. Forty such pairs are provided, listed in order of likelihood.

Similarly, Diagnostic Radiology – Durable Medical Equipment & Medical Supplies is the most likely taxonomy pair to be included in transactions excluded from the PUF. Thirty such pairs are provided, listed in order of likelihood.

Internal Medicine is the most likely source taxonomy, as well as most likely destination taxonomy to be included in transactions over-counted in the PUF. Twenty taxonomies are listed, for source and destination, in order of likelihood.

Durable Medical Equipment & Medical Supplies is the most likely source taxonomy, as well as most likely destination taxonomy to be included in transactions excluded from the PUF. Eight taxonomies are listed, for source and destination, in order of likelihood.

Further analysis, such as spatial mapping and NPI ranking, is possible.

Hopefully this analysis provides useful insight for resolving the two hypotheses.

# load the packages used below: dplyr for pipes and joins, ggplot2 for charts
library(dplyr)
library(ggplot2)

# ingest pre-processed data
npi_sg14v2    <- readRDS("./data/npi_sg14v2.Rds")    # all NPIs from teaming data
npi14_180     <- readRDS("./data/npi14_180.Rds")     # all NPIs from PUF 180-day file
npi14_all     <- readRDS("./data/npi14_all.Rds")     # all NPIs from PUF interval files
npi_to_sg14v2 <- readRDS("./data/npi_to_sg14v2.Rds") # weighted to-NPIs, teaming data
npi_to_14all  <- readRDS("./data/npi_to_14all.Rds")  # weighted to-NPIs, all PUF data
team_pairs    <- readRDS("./data/team_pairs.Rds")    # teaming taxonomy pairs, shared patients
puf_pairs     <- readRDS("./data/puf_pairs.Rds")     # PUF taxonomy pairs, shared patients
fr_tax_team   <- readRDS("./data/fr_tax_team.Rds")   # teaming from-taxonomy
fr_tax_puf    <- readRDS("./data/fr_tax_puf.Rds")    # PUF from-taxonomy
to_tax_team   <- readRDS("./data/to_tax_team.Rds")   # teaming to-taxonomy
to_tax_puf    <- readRDS("./data/to_tax_puf.Rds")    # PUF to-taxonomy

NPI Analysis

NPI, the most basic form of provider metadata, is our starting point for analysis.

Each NPI chart lumps together from_npi values and to_npi values into a single npi variable, for teaming data and for PUF data.

NPI Delta

This chart combines the PUF interval files for comparison with teaming data. Combining interval files seems reasonable as there is minimal (less than 1%) overlap among any pair of files.
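The overlap figure above could be checked along the following lines. This is only a sketch: the report does not say how overlap was measured, so it assumes each file is reduced to a set of "from-to" NPI pair keys and that overlap means the share of the smaller file's pairs also found in the other file. The file names and key values are invented.

```r
# Hypothetical interval files, each reduced to a vector of "from-to" NPI pair keys.
files <- list(
  file_a = c("1-2", "1-3", "2-4", "3-5"),
  file_b = c("5-6", "6-7", "1-2", "7-8"),
  file_c = c("8-9", "9-1", "4-6", "2-7")
)

# Assumed overlap measure: shared pairs as a share of the smaller file.
overlap_share <- function(a, b) {
  length(intersect(a, b)) / min(length(unique(a)), length(unique(b)))
}

# Evaluate every pair of files.
pair_names <- combn(names(files), 2)
overlaps <- apply(pair_names, 2, function(p) {
  overlap_share(files[[p[1]]], files[[p[2]]])
})
names(overlaps) <- apply(pair_names, 2, paste, collapse = " vs ")

print(overlaps)
```

With real files, every value staying below 0.01 would match the "less than 1%" statement under this particular definition of overlap.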

npi_sg14v2 %>%  
  full_join(npi14_all, by = "npi") %>%
  mutate(n.x = as.double(n.x)) %>% 
  mutate(n.y = as.double(n.y)) %>%       
  mutate(delta = case_when( 
    (is.na(.$n.x)  &  is.na(.$n.y)) ~ 0.0,
    (is.na(.$n.x)  & !is.na(.$n.y)) ~ -.$n.y,
    (!is.na(.$n.x) &  is.na(.$n.y)) ~ .$n.x,
    (!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>% 
  select(npi, delta) %>% 
  ggplot(aes(npi, delta)) +
    geom_line() +
    annotate("text", 
             label = "NPI Delta: SG14V2 - PUF14_ALL", 
             x = 1.30e+09, 
             y = 1e+05, 
             size = 4)
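To make the delta calculation concrete, here is a small base-R sketch on invented counts. It assumes, as the chart code does, that an NPI missing from one source contributes zero for that source, and that delta is the teaming count minus the PUF count. The data frames and their values are illustrative only.

```r
# Toy per-NPI counts; column names mirror npi and n from the report,
# but the values are invented for illustration.
teaming <- data.frame(npi = c(101, 102, 103), n = c(5, 2, 7))
puf     <- data.frame(npi = c(102, 103, 104), n = c(6, 7, 4))

# Full outer join keeps NPIs found in either source; unmatched counts become NA.
joined <- merge(teaming, puf, by = "npi", all = TRUE, suffixes = c(".x", ".y"))

# Treat a missing count as zero, then subtract: teaming minus PUF.
joined$n.x[is.na(joined$n.x)] <- 0
joined$n.y[is.na(joined$n.y)] <- 0
joined$delta <- joined$n.x - joined$n.y

print(joined[, c("npi", "delta")])
```

NPI 101 appears only in the teaming data and gets a positive delta, NPI 104 appears only in the PUF data and gets a negative delta, and NPI 103 has equal counts and a delta of zero.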

Discussion

The NPI Delta chart is consistent with the hypothesis that the PUF algorithm has double-counted transactions (negative data points) and, to a much lesser extent, with the hypothesis that data sources are excluded from the current PUF (positive data points). It is possible that double counting has cancelled out some excluded data sources in this illustration.
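The cancellation effect can be illustrated with a single made-up NPI. All numbers below are assumptions chosen for the arithmetic, not values from the data: the NPI has 100 records in the teaming data, 40 of them are missing from the PUF, and every remaining PUF record is counted twice.

```r
# Invented numbers for one NPI, to show how the two effects can offset.
teaming_count   <- 100  # count in teaming data
puf_true_count  <- 60   # PUF count if 40 records were excluded
puf_double_rate <- 2    # assumption: each remaining PUF record is counted twice

puf_observed <- puf_true_count * puf_double_rate  # what the PUF file would show
excluded     <- teaming_count - puf_true_count    # records missing from PUF
delta        <- teaming_count - puf_observed      # what the chart would plot

cat("excluded:", excluded, " observed delta:", delta, "\n")
```

In this scenario the plotted delta is negative even though 40 records were excluded, so the exclusion is invisible in a chart of net deltas.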

NPI Weighted Delta

This chart compares the to_npi variable of the teaming data, and the npi2 variable of the puf data, weighted by their respective shared patient count value.

npi_to_sg14v2 <- npi_to_sg14v2 %>% rename(n = weight)
npi_to_14all  <- npi_to_14all  %>% rename(n = weight)

npi_to_sg14v2 %>%  
  full_join(npi_to_14all, by = "npi") %>%
  mutate(n.x = as.double(n.x)) %>% 
  mutate(n.y = as.double(n.y)) %>%       
  mutate(delta = case_when( 
    (is.na(.$n.x)  &  is.na(.$n.y)) ~ 0.0,
    (is.na(.$n.x)  & !is.na(.$n.y)) ~ -.$n.y,
    (!is.na(.$n.x) &  is.na(.$n.y)) ~ .$n.x,
    (!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>% 
  select(npi, delta) %>% 
  ggplot(aes(npi, delta)) +
    geom_line() +
    annotate("text", 
             label = "NPI Weighted Delta: SG14V2 - PUF14_ALL", 
             x = 1.25e+09, 
             y = -1.4e+08, 
             size = 3.5)
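The pre-processing that produced the weighted files is not shown in this report. As a hedged sketch of what "weighted by shared patient count" could mean, the base-R example below contrasts a plain record count per destination NPI with a sum of shared patients per destination NPI. The referrals data frame and its columns are hypothetical.

```r
# Hypothetical referral records: from_npi, to_npi and a shared patient count.
referrals <- data.frame(
  from_npi       = c(101, 101, 102, 103),
  to_npi         = c(201, 202, 201, 202),
  shared_patient = c(10, 4, 6, 1)
)

# Unweighted: how many records name each destination NPI.
unweighted <- as.data.frame(table(npi = referrals$to_npi), responseName = "n")

# Weighted: total shared patients flowing to each destination NPI.
weighted <- aggregate(shared_patient ~ to_npi, data = referrals, FUN = sum)
names(weighted) <- c("npi", "weight")

print(unweighted)
print(weighted)
```

Both destination NPIs appear in two records, so the unweighted counts are equal, while the weighted totals (16 and 5) separate them clearly.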

Discussion

The NPI Weighted Delta chart reinforces the NPI Delta chart’s concurrence with the hypothesis that the PUF algorithm has double counted transactions (negative data points), and to a much lesser extent concurs with the hypothesis that data sources are excluded from the current PUF (positive data points).

Taxonomy Analysis

NPI metadata analysis was the starting point; taxonomy metadata analysis may provide further insight into our hypotheses.

Taxonomy Pairs: Probable Transactions Overcounted in PUF

The following chart ranks taxonomy pairs (referred from - referred to) by their shared patient delta. In this chart, shared patient delta is defined as the shared patient count for the taxonomy pair in PUF data in excess of the shared patient count for the same taxonomy pair in Teaming data. The chart lists the top forty (40) of the 87,432 taxonomy pairs with a positive delta under this definition.

df <- full_join(puf_pairs, team_pairs, by = c("tax1" = "tax1", "tax2" = "tax2")) %>% 
  mutate(sum.x = as.double(sum.x)) %>% 
  mutate(sum.y = as.double(sum.y)) %>%  
  mutate(delta = case_when(
    is.na(.$sum.x)  &  is.na(.$sum.y) ~ 0.0,
    is.na(.$sum.x)  & !is.na(.$sum.y) ~ -.$sum.y,
    !is.na(.$sum.x) &  is.na(.$sum.y) ~ .$sum.x,
    !is.na(.$sum.x) & !is.na(.$sum.y) ~ .$sum.x - .$sum.y)) %>% 
  select(tax1, tax2, delta) %>% 
  arrange(desc(delta)) %>% 
  rename(from_taxonomy = tax1, to_taxonomy = tax2) %>% 
  rename(shared_patient_delta = delta) 
  
 df %>% 
 # filter(shared_patient_delta >= 150000000) %>% 
  mutate(taxonomy_pairs = row_number() ) %>% 
  filter(taxonomy_pairs <= 40) %>% 
  mutate(taxonomy_pair = paste(as.character(from_taxonomy), as.character(to_taxonomy), sep = " -- ")) %>% 
  mutate(taxonomy_pair = factor(taxonomy_pair)) %>% 
  ggplot(aes(x = reorder(taxonomy_pair, shared_patient_delta ), y = shared_patient_delta,
             fontface = "bold",
             size = 2,
             fill = 0.05,
             alpha = 0.01)) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  labs(x = "",  y = "shared_patient_delta") +
  coord_flip()
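The ranking step can be followed on a toy example. The base-R sketch below joins two invented taxonomy-pair tables on both key columns, computes delta as PUF minus teaming, counts the positive-valued pairs, and keeps the top rows. The table names, taxonomy labels and counts are made up for illustration.

```r
# Invented taxonomy-pair tables with a shared patient total per pair.
puf_pairs_toy <- data.frame(
  tax1 = c("Hospital", "Hospital", "Radiology"),
  tax2 = c("Internal Medicine", "Cardiology", "Internal Medicine"),
  sum  = c(90, 30, 10), stringsAsFactors = FALSE)
team_pairs_toy <- data.frame(
  tax1 = c("Hospital", "Radiology", "Radiology"),
  tax2 = c("Internal Medicine", "Internal Medicine", "DME"),
  sum  = c(50, 25, 8), stringsAsFactors = FALSE)

# Full outer join on both taxonomy columns; a pair missing from one side counts as zero.
pairs <- merge(puf_pairs_toy, team_pairs_toy, by = c("tax1", "tax2"),
               all = TRUE, suffixes = c(".x", ".y"))
pairs$sum.x[is.na(pairs$sum.x)] <- 0
pairs$sum.y[is.na(pairs$sum.y)] <- 0
pairs$delta <- pairs$sum.x - pairs$sum.y  # PUF minus teaming

# Rank by delta, count the positive-valued pairs, keep the top two.
ranked     <- pairs[order(-pairs$delta), ]
n_positive <- sum(ranked$delta > 0)
top_n      <- head(ranked, 2)

print(top_n[, c("tax1", "tax2", "delta")])
```

Negating delta and re-sorting the same table gives the opposite ranking, where teaming counts exceed PUF counts.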

Taxonomy Pairs: Probable Transactions Excluded from PUF

The following chart ranks taxonomy pairs (referred from - referred to) by their shared patient delta. In this chart, shared patient delta is defined as the shared patient count for the taxonomy pair in Teaming data in excess of the shared patient count for the same taxonomy pair in PUF data. The chart lists the top thirty (30) of the 20,225 taxonomy pairs with a positive delta under this definition.

df %>% 
  mutate(shared_patient_delta = -shared_patient_delta) %>% 
  arrange(desc(shared_patient_delta)) %>% 
  mutate(taxonomy_pairs = row_number() ) %>% 
  filter(taxonomy_pairs <= 30) %>% 
  mutate(taxonomy_pair = paste(as.character(from_taxonomy), as.character(to_taxonomy), sep = " -- ")) %>% 
  mutate(taxonomy_pair = factor(taxonomy_pair)) %>% 
  ggplot(aes(x = reorder(taxonomy_pair, shared_patient_delta ), y = shared_patient_delta,
             fontface = "bold",
             size = 2,
             fill = 0.05,
             alpha = 0.01)) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  labs(x = "",  y = "shared_patient_delta") +
  coord_flip()

Probable Source Taxonomies for Overcounted Transactions

This chart provides a ranking of probable source (referred from) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the source taxonomy in PUF data in excess of the shared patient count for the same source taxonomy in Teaming data.

fr_tax_puf %>% 
  full_join(fr_tax_team, by = "fr_tax")  %>% 
  mutate(n.x = as.double(n.x)) %>% 
  mutate(n.y = as.double(n.y)) %>%       
  mutate(delta = case_when( 
    (is.na(.$n.x)  &  is.na(.$n.y)) ~ 0.0,
    (is.na(.$n.x)  & !is.na(.$n.y)) ~ -.$n.y,
    (!is.na(.$n.x) &  is.na(.$n.y)) ~ .$n.x,
    (!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>% 
  select(fr_tax, delta) %>% 
  arrange(desc(delta)) %>% 
  head(20) %>% 
  rename(shared_patient_delta = delta) %>% 
  ggplot(aes(x = reorder(fr_tax, shared_patient_delta ), y = shared_patient_delta,
             fontface = "bold",
             size = 2,
             fill = 0.05,
             alpha = 0.01)) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  labs(x = "",  y = "shared_patient_delta") +
  coord_flip()
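How the per-taxonomy files were built is not shown in this report. One plausible construction, given here only as an assumption, is to sum the shared patient counts of a pair table over its source column (for source taxonomies) or its destination column (for destination taxonomies). The toy table and its column name shared are hypothetical.

```r
# Invented taxonomy-pair table with a shared patient total per pair.
pairs_toy <- data.frame(
  tax1   = c("Internal Medicine", "Internal Medicine", "Radiology"),
  tax2   = c("Cardiology", "Radiology", "Internal Medicine"),
  shared = c(12, 8, 5), stringsAsFactors = FALSE)

# Source view: total shared patients referred from each taxonomy.
fr_tax_toy <- aggregate(shared ~ tax1, data = pairs_toy, FUN = sum)
names(fr_tax_toy) <- c("fr_tax", "n")

# Destination view: total shared patients referred to each taxonomy.
to_tax_toy <- aggregate(shared ~ tax2, data = pairs_toy, FUN = sum)
names(to_tax_toy) <- c("to_tax", "n")

print(fr_tax_toy)
print(to_tax_toy)
```

Tables shaped like these, one for each data source, can then be joined and differenced in the same way as the charts in this section.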

Probable Destination Taxonomies for Overcounted Transactions

This chart provides a ranking of probable destination (referred to) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the destination taxonomy in PUF data in excess of the shared patient count for the same destination taxonomy in Teaming data.

to_tax_puf %>% 
  full_join(to_tax_team, by = "to_tax")  %>% 
  mutate(n.x = as.double(n.x)) %>% 
  mutate(n.y = as.double(n.y)) %>%       
  mutate(delta = case_when( 
    (is.na(.$n.x)  &  is.na(.$n.y)) ~ 0.0,
    (is.na(.$n.x)  & !is.na(.$n.y)) ~ -.$n.y,
    (!is.na(.$n.x) &  is.na(.$n.y)) ~ .$n.x,
    (!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>% 
  select(to_tax, delta) %>% 
  arrange(desc(delta)) %>% 
  head(20) %>% 
  rename(shared_patient_delta = delta) %>% 
  ggplot(aes(x = reorder(to_tax, shared_patient_delta ), y = shared_patient_delta,
             fontface = "bold",
             size = 2,
             fill = 0.05,
             alpha = 0.01)) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  labs(x = "",  y = "shared_patient_delta") +
  coord_flip()

Probable Source Taxonomies for Excluded Transactions

This chart provides a ranking of probable source (referred from) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the source taxonomy in Teaming data in excess of the shared patient count for the same source taxonomy in PUF data.

fr_tax_puf %>% 
  full_join(fr_tax_team, by = "fr_tax")  %>% 
  mutate(n.x = as.double(n.x)) %>% 
  mutate(n.y = as.double(n.y)) %>%       
  mutate(delta = case_when( 
    (is.na(.$n.x)  &  is.na(.$n.y)) ~ 0.0,
    (is.na(.$n.x)  & !is.na(.$n.y)) ~ -.$n.y,
    (!is.na(.$n.x) &  is.na(.$n.y)) ~ .$n.x,
    (!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>% 
  select(fr_tax, delta) %>%
  mutate(delta = -delta) %>% 
  filter(delta > 0) %>% 
  arrange(desc(delta)) %>% 
  head(8) %>% 
  rename(shared_patient_delta = delta) %>% 
  ggplot(aes(x = reorder(fr_tax, shared_patient_delta ), y = shared_patient_delta,
             fontface = "bold",
             size = 2,
             fill = 0.05,
             alpha = 0.01)) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  labs(x = "",  y = "shared_patient_delta") +
  coord_flip()

Probable Destination Taxonomies for Excluded Transactions

This chart provides a ranking of probable destination (referred to) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the destination taxonomy in Teaming data in excess of the shared patient count for the same destination taxonomy in PUF data.

to_tax_puf %>% 
  full_join(to_tax_team, by = "to_tax")  %>% 
  mutate(n.x = as.double(n.x)) %>% 
  mutate(n.y = as.double(n.y)) %>%       
  mutate(delta = case_when( 
    (is.na(.$n.x)  &  is.na(.$n.y)) ~ 0.0,
    (is.na(.$n.x)  & !is.na(.$n.y)) ~ -.$n.y,
    (!is.na(.$n.x) &  is.na(.$n.y)) ~ .$n.x,
    (!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>% 
  select(to_tax, delta) %>% 
  mutate(delta = -delta) %>% 
  filter(delta > 0) %>% 
  arrange(desc(delta)) %>% 
  head(8) %>% 
  rename(shared_patient_delta = delta) %>% 
  ggplot(aes(x = reorder(to_tax, shared_patient_delta ), y = shared_patient_delta,
             fontface = "bold",
             size = 2,
             fill = 0.05,
             alpha = 0.01)) +
  geom_bar(stat = 'identity') +
  theme(legend.position = "none") +
  labs(x = "",  y = "shared_patient_delta") +
  coord_flip()

Further Analysis

Options for further analysis include: a) spatial representation of various forms of shared patient delta; b) NPI ranking of probable sources of over-counted and excluded transactions.

Exploratory Information

SG14V2 - PUF14_180: All-NPI Delta

npi_sg14v2 %>%  
  full_join(npi14_180, by = "npi") %>% 
  mutate(n.x = as.double(n.x)) %>% 
  mutate(n.y = as.double(n.y)) %>%       
  mutate(delta = case_when(
    is.na(.$n.x)  &  is.na(.$n.y) ~ 0.0,
    is.na(.$n.x)  & !is.na(.$n.y) ~ -.$n.y,
    !is.na(.$n.x) &  is.na(.$n.y) ~ .$n.x,
    !is.na(.$n.x) & !is.na(.$n.y) ~ .$n.x - .$n.y)) %>% 
  select(npi, delta) %>% 
  ggplot(aes(npi, delta)) +
    geom_line() +
    annotate("text", 
             label = "All-NPI Delta: SG14V2 - PUF14_180", 
             x = 1.30e+09, 
             y = 5e+05, 
             size = 4)

SG14V2 All-NPI Density

ggplot(npi_sg14v2, aes(npi, n)) +
  geom_line() +
  annotate("text", label = "All-NPI in SG14v2 file", x = 1.25e+09, y = 5e+05, size = 4)

PUF14_ALL All-NPI Density

ggplot(npi14_all, aes(npi, n)) +
  geom_line() +
  annotate("text", label = "All-NPI in PUF14_ALL file",x = 1.25e+09, y = 1.1e+06, size = 4)

PUF14_180 All-NPI Density

ggplot(npi14_180, aes(npi, n)) +
  geom_line() +
  annotate("text", label = "All-NPI in PUF14_180 file", x = 1.25e+09, y = 4e+05, size = 4)

John Williams, john.b.williams@gmail.com

May 10, 2017
