Information is sought on transactions that may have been over-counted in the PUF, and data sources that may have been excluded from the PUF.
This analysis examines differences between Teaming data and PUF data to assess two hypotheses:
Transactions are over-counted in the PUF.
Data sources are excluded from the PUF.
Provider metadata (NPI and taxonomy) is the basis for the analysis. The result is a set of taxonomy rankings (paired, source, and destination), each ordered by likelihood of supporting one of the two hypotheses.
For example, the taxonomy pair General Acute Care Hospital – Internal Medicine is the most likely taxonomy pair to be included in transactions over-counted in the PUF. Forty such pairs are provided, listed in order of likelihood.
Similarly, Diagnostic Radiology – Durable Medical Equipment & Medical Supplies is the most likely taxonomy pair to be included in transactions excluded from the PUF. Thirty such pairs are provided, listed in order of likelihood.
Internal Medicine is the most likely source taxonomy, as well as most likely destination taxonomy to be included in transactions over-counted in the PUF. Twenty taxonomies are listed, for source and destination, in order of likelihood.
Durable Medical Equipment & Medical Supplies is the most likely source taxonomy, as well as most likely destination taxonomy to be included in transactions excluded from the PUF. Eight taxonomies are listed, for source and destination, in order of likelihood.
Further analysis, such as spatial mapping and NPI ranking, is possible.
Hopefully this analysis provides useful insight for resolving the two hypotheses.
# load the packages used by the code below
library(dplyr)
library(ggplot2)
# ingest pre-processed data
npi_sg14v2 <- readRDS("./data/npi_sg14v2.Rds") # all NPIs from teaming data
npi14_180 <- readRDS("./data/npi14_180.Rds") # all NPIs from PUF 180-day file
npi14_all <- readRDS("./data/npi14_all.Rds") # all NPIs from PUF interval files
npi_to_sg14v2 <- readRDS("./data/npi_to_sg14v2.Rds") # weighted to-NPIs, teaming data
npi_to_14all <- readRDS("./data/npi_to_14all.Rds") # weighted to-NPIs, all PUF data
team_pairs <- readRDS("./data/team_pairs.Rds") # teaming taxonomy pairs, shared patients
puf_pairs <- readRDS("./data/puf_pairs.Rds") # PUF taxonomy pairs, shared patients
fr_tax_team <- readRDS("./data/fr_tax_team.Rds") # teaming from-taxonomy
fr_tax_puf <- readRDS("./data/fr_tax_puf.Rds") # PUF from-taxonomy
to_tax_team <- readRDS("./data/to_tax_team.Rds") # teaming to-taxonomy
to_tax_puf <- readRDS("./data/to_tax_puf.Rds") # PUF to-taxonomy
NPI, the most basic form of provider metadata, is our starting point for analysis.
Each NPI chart lumps together from_npi values and to_npi values into a single npi variable, for both teaming data and PUF data.
This chart combines the PUF interval files for comparison with teaming data. Combining interval files seems reasonable as there is minimal (less than 1%) overlap among any pair of files.
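The overlap figure quoted above could be checked along the following lines. This is only a sketch under stated assumptions: the two toy tables stand in for a pair of PUF interval files, the names `puf_30`, `puf_60`, `npi1`, and `pair_overlap` are hypothetical, and "overlap" is assumed here to mean the share of distinct NPI pairs in the smaller file that also appear in the other file. The original analysis may have measured overlap differently.

```r
library(dplyr)

# Hypothetical toy stand-ins for two PUF interval files (not the real data).
puf_30 <- tibble(npi1 = c(1, 1, 2, 3), npi2 = c(2, 3, 3, 4))
puf_60 <- tibble(npi1 = c(1, 5, 6), npi2 = c(2, 6, 7))

# Assumed definition: share of distinct (npi1, npi2) pairs in the smaller
# file that also occur in the other file.
pair_overlap <- function(a, b) {
  a <- distinct(a, npi1, npi2)
  b <- distinct(b, npi1, npi2)
  shared <- nrow(inner_join(a, b, by = c("npi1", "npi2")))
  shared / min(nrow(a), nrow(b))
}

overlap <- pair_overlap(puf_30, puf_60)
```

Applied to every pair of interval files, a value consistently below 0.01 would match the "less than 1%" statement.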
npi_sg14v2 %>%
full_join(npi14_all, by = "npi") %>%
mutate(n.x = as.double(n.x)) %>%
mutate(n.y = as.double(n.y)) %>%
mutate(delta = case_when(
(is.na(.$n.x) & is.na(.$n.y)) ~ 0.0,
(is.na(.$n.x) & !is.na(.$n.y)) ~ -.$n.y,
(!is.na(.$n.x) & is.na(.$n.y)) ~ .$n.x,
(!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>%
select(npi, delta) %>%
ggplot(aes(npi, delta)) +
geom_line() +
annotate("text",
label = "NPI Delta: SG14V2 - PUF14_ALL",
x = 1.30e+09,
y = 1e+05,
size = 4)
The NPI Delta chart is consistent with the hypothesis that the PUF algorithm has double counted transactions (negative data points) and, to a much lesser extent, with the hypothesis that data sources are excluded from the current PUF (positive data points). It is possible that double counting has cancelled out some excluded data sources in this illustration.
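The per-NPI delta behind this chart treats an NPI that is missing from one source as contributing zero there. Below is a minimal sketch of that computation, assuming toy counts in place of the real files (`team` and `puf` are hypothetical stand-ins for `npi_sg14v2` and `npi14_all`). It uses dplyr's `coalesce()` as a compact alternative to spelling out each `case_when()` branch.

```r
library(dplyr)

# Toy counts (hypothetical values), one row per NPI in each source.
team <- tibble(npi = c(1, 2, 3), n = c(10, 5, 7))  # stands in for npi_sg14v2
puf <- tibble(npi = c(2, 3, 4), n = c(8, 7, 20))   # stands in for npi14_all

delta_tbl <- team %>%
  full_join(puf, by = "npi") %>%
  # A count missing on either side is treated as 0, so
  # delta = teaming count - PUF count for every NPI.
  mutate(delta = coalesce(as.double(n.x), 0) - coalesce(as.double(n.y), 0)) %>%
  select(npi, delta) %>%
  arrange(npi)
```

In this toy case NPI 1 appears only in the teaming data and gets a positive delta, while NPI 4 appears only in the PUF data and gets a negative delta. These are the two signs the chart interpretation relies on.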
This chart compares the to_npi variable of the teaming data, and the npi2 variable of the puf data, weighted by their respective shared patient count value.
npi_to_sg14v2 <- npi_to_sg14v2 %>% rename(n = weight)
npi_to_14all <- npi_to_14all %>% rename(n = weight)
npi_to_sg14v2 %>%
full_join(npi_to_14all, by = "npi") %>%
mutate(n.x = as.double(n.x)) %>%
mutate(n.y = as.double(n.y)) %>%
mutate(delta = case_when(
(is.na(.$n.x) & is.na(.$n.y)) ~ 0.0,
(is.na(.$n.x) & !is.na(.$n.y)) ~ -.$n.y,
(!is.na(.$n.x) & is.na(.$n.y)) ~ .$n.x,
(!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>%
select(npi, delta) %>%
ggplot(aes(npi, delta)) +
geom_line() +
annotate("text",
label = "NPI Weighted Delta: SG14V2 - PUF14_ALL",
x = 1.25e+09,
y = -1.4e+08,
size = 3.5)
The NPI Weighted Delta chart reinforces the NPI Delta chart: it is consistent with the hypothesis that the PUF algorithm has double counted transactions (negative data points) and, to a much lesser extent, with the hypothesis that data sources are excluded from the current PUF (positive data points).
NPI metadata analysis was the starting point; taxonomy metadata analysis may provide further insight into our hypotheses.
The following chart provides a ranking of taxonomy pairs (referred from - referred to) and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the taxonomy pair in PUF data in excess of the shared patient count for the same taxonomy pair in Teaming data. The chart lists the top forty (40) out of 87,432 positive-valued taxonomy pairs for this definition of shared patient delta.
df <- full_join(puf_pairs, team_pairs, by = c("tax1" = "tax1", "tax2" = "tax2")) %>%
mutate(sum.x = as.double(sum.x)) %>%
mutate(sum.y = as.double(sum.y)) %>%
mutate(delta = case_when(
is.na(.$sum.x) & is.na(.$sum.y) ~ 0.0,
is.na(.$sum.x) & !is.na(.$sum.y) ~ -.$sum.y,
!is.na(.$sum.x) & is.na(.$sum.y) ~ .$sum.x,
!is.na(.$sum.x) & !is.na(.$sum.y) ~ .$sum.x - .$sum.y)) %>%
select(tax1, tax2, delta) %>%
arrange(desc(delta)) %>%
rename(from_taxonomy = tax1, to_taxonomy = tax2) %>%
rename(shared_patient_delta = delta)
df %>%
# filter(shared_patient_delta >= 150000000) %>%
mutate(taxonomy_pairs = row_number() ) %>%
filter(taxonomy_pairs <= 40) %>%
mutate(taxonomy_pair = paste(as.character(from_taxonomy), as.character(to_taxonomy), sep = " -- ")) %>%
mutate(taxonomy_pair = factor(taxonomy_pair)) %>%
ggplot(aes(x = reorder(taxonomy_pair, shared_patient_delta ), y = shared_patient_delta,
fontface = "bold",
size = 2,
fill = 0.05,
alpha = 0.01)) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "", y = "shared_patient_delta") +
coord_flip()
The following chart provides a ranking of taxonomy pairs (referred from - referred to) and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the taxonomy pair in Teaming data in excess of the shared patient count for the same taxonomy pair in PUF data. The chart lists the top thirty (30) out of 20,225 positive-valued taxonomy pairs for this definition of shared patient delta.
df %>%
mutate(shared_patient_delta = -shared_patient_delta) %>%
arrange(desc(shared_patient_delta)) %>%
mutate(taxonomy_pairs = row_number() ) %>%
filter(taxonomy_pairs <= 30) %>%
mutate(taxonomy_pair = paste(as.character(from_taxonomy), as.character(to_taxonomy), sep = " -- ")) %>%
mutate(taxonomy_pair = factor(taxonomy_pair)) %>%
ggplot(aes(x = reorder(taxonomy_pair, shared_patient_delta ), y = shared_patient_delta,
fontface = "bold",
size = 2,
fill = 0.05,
alpha = 0.01)) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "", y = "shared_patient_delta") +
coord_flip()
This chart provides a ranking of probable source (referred from) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the source taxonomy in PUF data in excess of the shared patient count for the same source taxonomy in Teaming data.
fr_tax_puf %>%
full_join(fr_tax_team, by = "fr_tax") %>%
mutate(n.x = as.double(n.x)) %>%
mutate(n.y = as.double(n.y)) %>%
mutate(delta = case_when(
(is.na(.$n.x) & is.na(.$n.y)) ~ 0.0,
(is.na(.$n.x) & !is.na(.$n.y)) ~ -.$n.y,
(!is.na(.$n.x) & is.na(.$n.y)) ~ .$n.x,
(!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>%
select(fr_tax, delta) %>%
arrange(desc(delta)) %>%
head(20) %>%
rename(shared_patient_delta = delta) %>%
ggplot(aes(x = reorder(fr_tax, shared_patient_delta ), y = shared_patient_delta,
fontface = "bold",
size = 2,
fill = 0.05,
alpha = 0.01)) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "", y = "shared_patient_delta") +
coord_flip()
This chart provides a ranking of probable destination (referred to) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the destination taxonomy in PUF data in excess of the shared patient count for the same destination taxonomy in Teaming data.
to_tax_puf %>%
full_join(to_tax_team, by = "to_tax") %>%
mutate(n.x = as.double(n.x)) %>%
mutate(n.y = as.double(n.y)) %>%
mutate(delta = case_when(
(is.na(.$n.x) & is.na(.$n.y)) ~ 0.0,
(is.na(.$n.x) & !is.na(.$n.y)) ~ -.$n.y,
(!is.na(.$n.x) & is.na(.$n.y)) ~ .$n.x,
(!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>%
select(to_tax, delta) %>%
arrange(desc(delta)) %>%
head(20) %>%
rename(shared_patient_delta = delta) %>%
ggplot(aes(x = reorder(to_tax, shared_patient_delta ), y = shared_patient_delta,
fontface = "bold",
size = 2,
fill = 0.05,
alpha = 0.01)) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "", y = "shared_patient_delta") +
coord_flip()
This chart provides a ranking of probable source (referred from) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the source taxonomy in Teaming data in excess of the shared patient count for the same source taxonomy in PUF data.
fr_tax_puf %>%
full_join(fr_tax_team, by = "fr_tax") %>%
mutate(n.x = as.double(n.x)) %>%
mutate(n.y = as.double(n.y)) %>%
mutate(delta = case_when(
(is.na(.$n.x) & is.na(.$n.y)) ~ 0.0,
(is.na(.$n.x) & !is.na(.$n.y)) ~ -.$n.y,
(!is.na(.$n.x) & is.na(.$n.y)) ~ .$n.x,
(!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>%
select(fr_tax, delta) %>%
mutate(delta = -delta) %>%
filter(delta > 0) %>%
arrange(desc(delta)) %>%
head(8) %>%
rename(shared_patient_delta = delta) %>%
ggplot(aes(x = reorder(fr_tax, shared_patient_delta ), y = shared_patient_delta,
fontface = "bold",
size = 2,
fill = 0.05,
alpha = 0.01)) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "", y = "shared_patient_delta") +
coord_flip()
This chart provides a ranking of probable destination (referred to) taxonomies and their corresponding shared patient deltas. In this chart, shared patient delta is defined as the shared patient count for the destination taxonomy in Teaming data in excess of the shared patient count for the same destination taxonomy in PUF data.
to_tax_puf %>%
full_join(to_tax_team, by = "to_tax") %>%
mutate(n.x = as.double(n.x)) %>%
mutate(n.y = as.double(n.y)) %>%
mutate(delta = case_when(
(is.na(.$n.x) & is.na(.$n.y)) ~ 0.0,
(is.na(.$n.x) & !is.na(.$n.y)) ~ -.$n.y,
(!is.na(.$n.x) & is.na(.$n.y)) ~ .$n.x,
(!is.na(.$n.x) & !is.na(.$n.y)) ~ (.$n.x - .$n.y))) %>%
select(to_tax, delta) %>%
mutate(delta = -delta) %>%
filter(delta > 0) %>%
arrange(desc(delta)) %>%
head(8) %>%
rename(shared_patient_delta = delta) %>%
ggplot(aes(x = reorder(to_tax, shared_patient_delta ), y = shared_patient_delta,
fontface = "bold",
size = 2,
fill = 0.05,
alpha = 0.01)) +
geom_bar(stat = 'identity') +
theme(legend.position = "none") +
labs(x = "", y = "shared_patient_delta") +
coord_flip()
Options for further analysis include: a) spatial representation of various forms of shared patient delta; b) NPI ranking of probable sources of over-counted and excluded transactions.
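Option (b) could start from a per-NPI delta table of the kind charted earlier (Teaming count minus PUF count). The sketch below is illustrative only: the `npi_delta` table, its NPI values, and the cut-off of two rows per list are hypothetical, and ranking by raw delta is an assumed approach rather than one taken from the analysis above.

```r
library(dplyr)

# Hypothetical per-NPI deltas (Teaming minus PUF); not real data.
npi_delta <- tibble(
  npi = c(1111111111, 2222222222, 3333333333, 4444444444),
  delta = c(-500, 120, -30, 0)
)

# Most negative deltas: PUF count exceeds Teaming count
# (candidate sources of over-counted transactions).
over_counted <- npi_delta %>%
  filter(delta < 0) %>%
  arrange(delta) %>%
  head(2)

# Most positive deltas: Teaming count exceeds PUF count
# (candidate sources of excluded transactions).
excluded <- npi_delta %>%
  filter(delta > 0) %>%
  arrange(desc(delta)) %>%
  head(2)
```

NPIs with a delta of zero appear in neither list, since they support neither hypothesis.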
npi_sg14v2 %>%
full_join(npi14_180, by = "npi") %>%
mutate(n.x = as.double(n.x)) %>%
mutate(n.y = as.double(n.y)) %>%
mutate(delta = case_when(
is.na(.$n.x) & is.na(.$n.y) ~ 0.0,
is.na(.$n.x) & !is.na(.$n.y) ~ -.$n.y,
!is.na(.$n.x) & is.na(.$n.y) ~ .$n.x,
!is.na(.$n.x) & !is.na(.$n.y) ~ .$n.x - .$n.y)) %>%
select(npi, delta) %>%
ggplot(aes(npi, delta)) +
geom_line() +
annotate("text",
label = "All-NPI Delta: SG14V2 - PUF14_180",
x = 1.30e+09,
y = 5e+05,
size = 4)
ggplot(npi_sg14v2, aes(npi, n)) +
geom_line() +
annotate("text", label = "All-NPI in SG14v2 file", x = 1.25e+09, y = 5e+05, size = 4)
ggplot(npi14_all, aes(npi, n)) +
geom_line() +
annotate("text", label = "All-NPI in PUF14_ALL file", x = 1.25e+09, y = 1.1e+06, size = 4)
ggplot(npi14_180, aes(npi, n)) +
geom_line() +
annotate("text", label = "All-NPI in PUF14_180 file", x = 1.25e+09, y = 4e+05, size = 4)