R Notebook

library(scatterplot3d)

## Warning: package 'scatterplot3d' was built under R version 3.4.4

Model of AR over time

AR is modeled using two simple difference equations, with ARI representing the portion of AR that is out to insurance companies, and ARP representing the portion of AR that is out to patients. Patients without insurance are assumed to pay at the time of service and do not contribute to AR in this model.

ARI is the value of daily production that is attributed to patients with insurance, $\alpha production[t]$, plus the ARI from the previous time step, $ARI[t]$, minus the value of claims for which EOBs are received, $\beta ARI[t]$. The value of $\alpha$ is a point estimate while the value of $\beta$ is a distribution.

\[ ARI[t+1] = \alpha production[t] + ARI[t] - \beta ARI[t] \]

ARP is the value of claims for which EOBs are received, $\beta ARI[t]$, which causes bills to be sent to patients, minus the payments received from patients, $\delta ARP[t]$. The value of $\gamma$ is a point estimate for the proportion of insurance production that is written off for PPO adjustments, and the value of $\delta$ is a distribution.

\[ ARP[t+1] = ARP[t] + (\beta ARI[t])(1 - \gamma) - \delta ARP[t] \]

Total AR is the sum of these values.

Parameter estimation

Distributions

The number of business days to receive an EOB after a procedure date and the number of days to receive a patient payment after receiving an EOB were recorded for all claims from random samples of days.

setwd('C:/Users/Jack/Documents/MS Stats/568 Industrial')
myData <- read.csv('DistData.csv')
head(myData)

eob <- na.omit(myData$EOBdays)
pat <- na.omit(myData$PATdays)
length(eob)

## [1] 38

length(pat)

## [1] 23

Four days were sampled for EOB times from 01/01/2018-04/30/2018 to generate 38 sample units. Eight days were sample for patient payment times from 09/01/2018-12/31/2018 to generate 23 sample units.

For time until payment, the data are naturally positive and skewed, making the log-normal distribution a natural choice. The parameters for log-normal distributions of days for each process are estimated using the maximum likelihood estimators of the mean and the standard deviation of the log-transformed data.

(mean.eob <- mean(log(eob)))

## [1] 3.119234

(sd.eob <- sd(log(eob)))

## [1] 0.360731

mean(eob)

## [1] 24.15789

mean(eob) + 1.96*sqrt(var(eob)/length(eob))

## [1] 27.14837

mean(eob) - 1.96*sqrt(var(eob)/length(eob))

## [1] 21.16742

REMOVE LATER

set.seed(1)
eob <- rlnorm(38, log(35), .6)
(mean.eob <- mean(log(eob)))

## [1] 3.584051

(sd.eob <- sd(log(eob)))

## [1] 0.5322896

For EOB times, the mean of the ln(number of days) is 3.1192, and the standard deviation of the log(number of days) is 0.3607.

myBreaks <- seq(from = 0, to = 365, by = 10)
x <- seq(0, 365, by = 0.01)
hist(eob, 
  breaks = myBreaks, 
  freq = FALSE,
  ylim = c(0, 0.06),
  col = 'grey',
  xlab = 'Number of days',
  main = 'Time to receive EOB'
)
lines(x, 
  dlnorm(x, mean.eob, sd.eob), 
  col = 'deepskyblue3', 
  lwd = 2
)

The line shows the probability density function for a log-normal distribution parameteerized using the estimates of the mean and standard deviation from the data above. The fit seems adequate.

The probability of payment is just the inverse of the number of days to get paid:

beta <- 1/eob
mean(beta)

## [1] 0.03232886

This is the distribution that will be sampled at each time step for a value of $\beta$.

(mean.pat <- mean(log(pat)))

## [1] 3.231312

(sd.pat <- sd(log(pat)))

## [1] 0.8276991

REMOVE LATER

set.seed(1)
pat <- rlnorm(38, log(60), .6)
(mean.pat <- mean(log(pat)))

## [1] 4.123047

(sd.pat <- sd(log(pat)))

## [1] 0.5322896

For patient payment times, the mean of the ln(number of days) is 3.2313, and the standard deviation of the log(number of days) is 0.8277. The two means are very similar, probably times are subject to similar processes of sending correspondence and receiving mail. The standard deviation for the patient payments, however, is almost three times larger than for the EOBs.

myBreaks <- seq(from = 0, to = 365, by = 10)
x <- seq(0, 365, 0.01)
hist(pat, 
  breaks = myBreaks, 
  freq = FALSE, 
  ylim = c(0, 0.06),
  col = 'grey',
  xlab = 'Number of days',
  main = 'Time to receive patient payment'
)
lines(x, 
  dlnorm(x, mean.pat, sd.pat), 
  col = 'deepskyblue3', 
  lwd = 2
)

The fit here is not as good, but the estimates are used to parameterize the model anyway because of the logical connections between the process and the log-normal distribution.

The probability of payment is just the inverse of the number of days to get paid:

delta <- 1/pat
mean(delta)

## [1] 0.0188585

mean(pat)

## [1] 69.94152

mean(pat) + 1.96*sqrt(var(pat)/length(pat))

## [1] 80.69921

mean(pat) - 1.96*sqrt(var(pat)/length(pat))

## [1] 59.18383

This is the distribution that will be sampled at each time step for a value of $\delta$.

hist(delta)

Point estimates

The collection ratio, $\alpha$ was estimated by comparing gross production for patients with insurance to total gross production for the period 01/01/2018-04/30/2018.

\[ \hat\alpha = \frac{148904.06}{369328.69} = 0.4032 \] The proportion of gross insurance production that is written off for PPO price contracts, $\gamma$ was estimated by comparing the total PPO writeoffs to the gross production for patients with insurance for the period 01/01/2018-04/30/2018.

\[ \hat\gamma = \frac{58402.24}{148904.06} = 0.3922 \]

Production

Total daily gross production data was retrieved for the period 01/01/2017-04/30/2018.

daily.prod <- read.csv('DailyProd.csv', header = TRUE)
head(daily.prod)

prod <- as.numeric(daily.prod$Production)
hist(prod, col = 'grey', xlab = 'Dollars', freq = FALSE, main = 'Daily gross production')

There are many small production days because the office typically only sees emergencies on Fridays.

By ignoring production days lower than $1000, the distribution appears to be fit well by a log-normal distribution.

hist(prod[prod > 1000], 
  col = 'grey', 
  xlab = 'Dollars', 
  freq = FALSE, 
  main = 'Daily gross production',
  ylim = c(0, 0.0003)
)
x <- seq(1000, 14000, 1)
lines(x, 
  dlnorm(x, mean(log(prod[prod > 1000])), sd(log(prod[prod > 1000]))), 
  col = 'deepskyblue3', 
  lwd = 2
)

Model run

The model is run using the historical production time series with the parameters estimated above.

set.seed(1)
days <- length(prod) # model will run over the historical data for production
alpha <- 0.4032
beta.draws <- sample(beta, size = days, replace = TRUE)
gamma <- 0.3922
delta.draws <- sample(delta, size = days, replace = TRUE)
ARI <- rep(NA, days)
ARP <- rep(NA, days)
ARI[1] <- 50000
ARP[1] <- 50000

for(i in 1:days){ARI[i + 1] <- prod[i] * alpha + ARI[i] - beta.draws[i] * ARI[i]}
for(i in 1:days){ARP[i + 1] <- ARP[i] + ((beta.draws[i] * ARI[i])*(1 - gamma)) - delta.draws[i] * ARP[i]}
AR <- ARI + ARP

plot(1:(days + 1), AR, xlab = 'Day')

This gives a picture of the current situation. The actual AR is close to the model results, so the model seems adequate.

round(
    c(quantile(AR[50:298], probs = 0.05, na.rm = TRUE), 
      Mean = mean(AR[50:298]), 
      quantile(AR[50:298], probs = 0.95, na.rm = TRUE)
    )
)

##     5%   Mean    95% 
##  88908  99011 105545

Next the model is run with target parameters, to get a distribution for AR with improved processes.

set.seed(1)
newprod <- sample(as.numeric(daily.prod$Production), 2600, replace = TRUE)
days <- length(newprod)
alpha <- 0.4032 # this cannot change without a change in the dental practice overall strategy
beta.draws <- 1 / rlnorm(days, log(35), .3607)
gamma <- 0.3922 # this cannot change without a change in the dental practice overall strategy
delta.draws <- 1 / rlnorm(days, log(60), .8277)
ARI <- rep(NA, days)
ARP <- rep(NA, days)
ARI[1] <- 40000
ARP[1] <- 40000

for(i in 1:days){ARI[i + 1] <- newprod[i] * alpha + ARI[i] - beta.draws[i] * ARI[i]}
for(i in 1:days){ARP[i + 1] <- ARP[i] + ((beta.draws[i] * ARI[i]) * (1 - gamma)) - delta.draws[i] * ARP[i]}
AR1 <- ARI + ARP

for(i in 1:days){ARI[i + 1] <- newprod[i] * alpha + ARI[i] - beta.draws[i] * ARI[i]}
AR2 <- ARI

plot(1:(days + 1), AR1, type = 'l', ylim = c(0, 120000), xlab = 'Day', ylab = 'AR', col = 'red')
lines(1:(days + 1), AR2, col = 'darkgreen')
legend('bottomright', legend = list('Status quo', 'Autopay'), fill = c('red', 'darkgreen'))

round(
    c(quantile(AR, probs = 0.05, na.rm = TRUE), 
      Mean = mean(AR), 
      quantile(AR, probs = 0.95, na.rm = TRUE)
    )
)

##     5%   Mean    95% 
##  89515 101000 109925

ARsim <- function(n, mn.eob, sd.eob, mn.pat, sd.pat){
  newprod <- sample(as.numeric(daily.prod$Production[daily.prod$Production > 0]), n, replace = TRUE)
  days <- length(newprod)
  alpha <- 0.4032 # this cannot change without a change in the dental practice overall strategy
  beta.draws <- 1 / rlnorm(days, mn.eob, sd.eob)
  gamma <- 0.3922 # this cannot change without a change in the dental practice overall strategy
  delta.draws <- 1 / rlnorm(days, mn.pat, sd.pat)
  ARI <- rep(NA, days)
  ARP <- rep(NA, days)
  ARI[1] <- 40000
  ARP[1] <- 40000
  
  for(i in 1:days){ARI[i + 1] <- newprod[i] * alpha + ARI[i] - beta.draws[i] * ARI[i]}
  for(i in 1:days){ARP[i + 1] <- ARP[i] + ((beta.draws[i] * ARI[i]) * (1 - gamma)) - delta.draws[i] * ARP[i]}
  AR <- ARI + ARP
  
  round(
      c(quantile(AR, probs = 0.05, na.rm = TRUE), 
        Mean = mean(AR), 
        quantile(AR, probs = 0.95, na.rm = TRUE)
      )
  )
}

set.seed(1)
ARsim(n = 2600, mn.eob = log(35), sd.eob = .3607, mn.pat = log(60), sd.pat = .8277)

##     5%   Mean    95% 
##  88313 101731 113794

ARsim(n = 2600, mn.eob = log(35), sd.eob = .3607, mn.pat = 0, sd.pat = .8277)

##    5%  Mean   95% 
## 50249 58782 68170

ARsim(n = 2600, mn.eob = log(35), sd.eob = .3607, mn.pat = log(60), sd.pat = .8277)[2] - 
  ARsim(n = 2600, mn.eob = log(35), sd.eob = .3607, mn.pat = 0, sd.pat = .8277)[2]

##  Mean 
## 44874

mn.eob <- ((7*2):(7*8))
mn.pat <- ((7*2):(7*8))
sim.pars <- expand.grid(list(mn.eob, mn.pat))
nrow(sim.pars)

## [1] 1849

for(i in 1:nrow(sim.pars)){
  sim.pars[i, 3] <- ARsim(n = 2600, mn.eob = log(sim.pars[i, 1]), sd.eob = .3607, mn.pat = log(sim.pars[i, 2]), sd.pat = .8277)[2]
}

s3d <-scatterplot3d(sim.pars, pch = 16, highlight.3d = TRUE,
  type = "p", main = "",
  xlab = 'Mean days EOB',
  ylab = 'Mean days patient',
  zlab = 'AR x 1000',
  angle = 70,
  z.ticklabs = seq(0, 120, by = 20)
)
fit <- lm(sim.pars[, 3] ~ sim.pars[, 1] + sim.pars[, 2])
s3d$plane3d(fit)

summary(lm(sim.pars[, 3] ~ sim.pars[, 1] + sim.pars[, 2] + sim.pars[, 1]:sim.pars[, 2]))

## 
## Call:
## lm(formula = sim.pars[, 3] ~ sim.pars[, 1] + sim.pars[, 2] + 
##     sim.pars[, 1]:sim.pars[, 2])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4324.2  -730.0     0.0   721.7  5297.7 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 1117.87183  240.05892   4.657 3.44e-06 ***
## sim.pars[, 1]               1615.04464    6.46451 249.832  < 2e-16 ***
## sim.pars[, 2]                755.93738    6.46451 116.937  < 2e-16 ***
## sim.pars[, 1]:sim.pars[, 2]   -0.05868    0.17408  -0.337    0.736    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1153 on 1845 degrees of freedom
## Multiple R-squared:  0.9973, Adjusted R-squared:  0.9973 
## F-statistic: 2.264e+05 on 3 and 1845 DF,  p-value: < 2.2e-16

Every increase in mean time to receive an insurance payment by one day increases the AR by $2,361. Every increase in mean time to receive a patient payment by one day increases the AR by $747.

ARsim(n = 2600, mn.eob = 3.1192, sd.eob = .3607, mn.pat = 0, sd.pat = 0)

##    5%  Mean   95% 
## 32162 38364 44931