FYI: If a caller was told to go to the ED, we set business_days_until_appointment to 0.
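A minimal sketch of how that rule could be applied, using a toy data frame. The column names follow the data listing in this report; the four rows are invented for illustration, and this is not necessarily how the original chunk was written.

```r
# Toy data (hypothetical rows); column names match the glimpse() output.
calls <- data.frame(
  told_to_go_to_the_emergency_department = factor(c("No", "Yes", "No", "Yes")),
  business_days_until_appointment = c(12, NA, NA, 5)
)

# %in% returns FALSE (not NA) for missing values, so the assignment is NA-safe.
sent_to_ed <- calls$told_to_go_to_the_emergency_department %in% "Yes"

# Rule from the note above: an ED referral counts as a zero-day wait.
calls$business_days_until_appointment[sent_to_ed] <- 0

print(calls)
```

Rows where the caller was not sent to the ED keep their original value, including NA.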

tyler install

Read in data

## Rows: 1,140
## Columns: 45
## $ NPI                                     <dbl> 1265759062, 1265759062, 108300…
## $ age                                     <dbl> 53, 53, 36, 36, 51, 51, 52, 52…
## $ age_category                            <ord> 50 to 59 years old, 50 to 59 y…
## $ gender                                  <fct> Female, Female, Female, Female…
## $ Med_sch                                 <fct> US Senior Medical Student, US …
## $ Grd_yr                                  <dbl> 2010, 2010, 2015, 2015, 1998, …
## $ academic                                <fct> Private Practice, Private Prac…
## $ ACOG_District                           <fct> District V, District V, Distri…
## $ cbsatype10                              <fct> Metro, Metro, Metro, Metro, NA…
## $ scenario                                <fct> Prior trip to ED and was found…
## $ scenario_type                           <fct> Emergent, Emergent, Emergent, …
## $ insurance                               <fct> Blue Cross/Blue Shield, Medica…
## $ including_this_physician_in_the_study   <fct> No, No, No, Yes, No, No, Yes, …
## $ told_to_go_to_the_emergency_department  <fct> No, No, No, No, No, No, No, No…
## $ offered_a_clinic_appointment_to_be_seen <fct> No, No, No, Yes, No, No, Yes, …
## $ reason_for_exclusions                   <fct> Went to voicemail, Number cont…
## $ central_number                          <fct> No, No, Yes, No, No, No, No, N…
## $ number_of_transfers                     <fct> No transfers, No transfers, No…
## $ call_time_minutes                       <dbl> NA, NA, 1.4, 2.3, NA, 0.8, 1.0…
## $ hold_time_minutes                       <dbl> NA, NA, 0.0, 0.1, NA, NA, NA, …
## $ Provider.Enumeration.Date               <dbl> 2010, 2010, 2015, 2015, 2005, …
## $ day_of_the_week                         <ord> Thursday, Tuesday, Thursday, T…
## $ business_days_until_appointment         <dbl> NA, NA, NA, 1, NA, NA, 28, NA,…
## $ state                                   <chr> "Texas", "Texas", "Washington"…
## $ zip                                     <chr> "48001", "48001", "83535", "83…
## $ lat                                     <dbl> 42.63923, 42.63923, 46.53419, …
## $ lng                                     <dbl> -82.58170, -82.58170, -116.724…
## $ record_id                               <dbl> 1072, 201, 861, 296, 391, 1097…
## $ id_number                               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,…
## $ Is.Sole.Proprietor                      <chr> "N", "N", "Y", "Y", "N", "N", …
## $ Grd_yr_category                         <fct> 2010 or greater, 2010 or great…
## $ Provider.Credential.Text                <chr> "MD", "MD", "DO", "DO", "MD", …
## $ median_household_income                 <dbl> 63272, 63272, 59414, 59414, NA…
## $ Medicaid_to_Medicare_Fee_Index          <dbl> 58, 58, 63, 63, 74, 74, 49, 49…
## $ basic_first_name                        <chr> NA, NA, "ABIGAIL", "ABIGAIL", …
## $ basic_last_name                         <chr> NA, NA, "PREST", "PREST", "RAS…
## $ basic_middle_name                       <chr> NA, NA, "ROSE", "ROSE", NA, NA…
## $ basic_credential                        <chr> NA, NA, "D.O.", "D.O.", "MD", …
## $ basic_sole_proprietor                   <chr> NA, NA, "NO", "NO", "NO", "NO"…
## $ basic_gender                            <chr> NA, NA, "F", "F", "F", "F", "F…
## $ basic_enumeration_date                  <date> NA, NA, 2015-04-09, 2015-04-0…
## $ basic_last_updated                      <date> NA, NA, 2024-08-22, 2024-08-2…
## $ taxonomies_code                         <chr> NA, NA, "207V00000X", "207V000…
## $ taxonomies_desc                         <chr> NA, NA, "Obstetrics & Gynecolo…
## $ taxonomies_state                        <chr> NA, NA, "Washington", "Washing…

Quality Check the Data

Are there any physicians included more than twice?

Included More than Twice
NPI	record_id	N
NA	NA	NA
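One way to run this check is to count rows per NPI and keep the NPIs that appear more than twice. The sketch below assumes each physician should appear at most twice (once per insurance type); the NPI values are made up.

```r
# Hypothetical call log: NPI 1002 appears three times.
calls <- data.frame(
  NPI = c(1001, 1001, 1002, 1002, 1002, 1003),
  record_id = 1:6
)

# Count rows per NPI; responseName sets the name of the count column.
calls_per_npi <- as.data.frame(table(NPI = calls$NPI), responseName = "N")

# Keep only physicians included more than twice.
more_than_twice <- calls_per_npi[calls_per_npi$N > 2, ]
print(more_than_twice)
```

An empty result means no physician was included more than twice.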

Variables of those physicians included more than twice?

Variables of Physicians Included More Than Twice
NPI	reason_for_exclusions	insurance	business_days_until_appointment
NA	NA	NA	NA

Find physicians called more than three times

NPI numbers called more than thrice
NPI	calls_count
NA	NA

Do they have an exclusion reason and also an appointment?
NPI	id_number	reason_for_exclusions	business_days_until_appointment
NA	NA	NA	NA

Do they have business_days_until_appointment greater than zero but fall in an excluded category?

Records with Appointments but in Excluded Category
NPI	id_number	reason_for_exclusions	business_days_until_appointment
NA	NA	NA	NA

Do they have NA for business_days_until_appointment but are “Included” in the Reasons for exclusion category?

Included Records with NA for Appointments
NPI	id_number	reason_for_exclusions	business_days_until_appointment
NA	NA	NA	NA

Data Munging

Create Median Household Income Quantiles
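A rough sketch of one way to bin median_household_income into quantile groups. The report does not say how many groups were used, so quartiles are an assumption here, as are the toy income values and the Q1 to Q4 labels.

```r
# Toy incomes (hypothetical), including one missing value.
median_household_income <- c(63272, 59414, NA, 48200, 91000, 75500, 52000, 120000)

# ASSUMPTION: four equal-probability groups (quartiles).
breaks <- quantile(median_household_income, probs = seq(0, 1, 0.25), na.rm = TRUE)

# include.lowest = TRUE keeps the minimum value inside the first bin.
income_quartile <- cut(
  median_household_income,
  breaks = breaks,
  include.lowest = TRUE,
  labels = c("Q1", "Q2", "Q3", "Q4")
)

print(table(income_quartile, useNA = "ifany"))
```

Missing incomes stay NA rather than being forced into a group.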

Zip Analysis

National percentage of physicians in most affluent ZIP Codes

Results

Check data normality

The data are not normally distributed, and they are count data. A t-test assumes normally distributed data, and comparing the means of count data is not appropriate. Instead, we can compare business_days_until_appointment across the insurance categories with the incidence rate ratio, so Poisson regression is the better choice.
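To illustrate the incidence rate ratio idea, here is a small sketch on simulated counts. The data, the sample size, and the true ratio of 1.5 are all assumptions for the example and say nothing about the study results.

```r
set.seed(42)
n <- 200
insurance <- factor(rep(c("Blue Cross/Blue Shield", "Medicaid"), each = n / 2))

# Simulated waits: Medicaid is generated with a true rate ratio of 15 / 10 = 1.5.
days <- rpois(n, lambda = ifelse(insurance == "Medicaid", 15, 10))

# Poisson regression with a log link; the first factor level is the reference.
fit <- glm(days ~ insurance, family = poisson(link = "log"))

# Exponentiating the coefficient gives the incidence rate ratio (IRR).
irr <- exp(coef(fit))["insuranceMedicaid"]

# Wald confidence interval, also exponentiated back to the ratio scale.
ci <- exp(confint.default(fit))["insuranceMedicaid", ]

print(round(c(IRR = unname(irr), ci), 2))
```

An IRR above 1 means the expected count of business days is higher for Medicaid than for the reference category.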

This Q-Q plot displays the distribution of the business_days_until_appointment variable against a theoretical normal distribution. Here’s an interpretation based on the plot’s characteristics:

Heavy Right Tail (Positive Skew): The data points deviate upward from the reference line on the right side, indicating that the business_days_until_appointment distribution has a heavy right tail or positive skew. This suggests that while most appointments are scheduled within a typical range, there are a few cases where the wait time is significantly longer.
Departure from Normality: The points deviate from the reference line at both ends, especially at the upper end (right tail). This indicates that the data does not follow a normal distribution closely. Instead, it appears to have a skewed, possibly exponential or log-normal distribution, given the pattern of points rising sharply at higher values.
Outliers: The data point at the top right, well above the line, is likely an outlier with a much longer wait time than the majority. This extreme value contributes to the non-normality and might need consideration, depending on the analysis goals.

In summary, the business_days_until_appointment variable is not normally distributed and shows positive skewness with some outliers, especially toward longer wait times.

## Starting normality check and summary calculation for variable: business_days_until_appointment

## Data extracted for variable: business_days_until_appointment

## Shapiro-Wilk normality test completed with p-value: 0.0000000000000000000000000000023401989208789

## The p-value is less than or equal to 0.05, indicating that the data is not normally distributed.

## Histogram with Density Plot created.

## Q-Q Plot created.

## Data is NOT normally distributed. Use non-parametric measures like median: 8, IQR: 26

## $median
## [1] 8
## 
## $iqr
## [1] 26

## Summary calculation completed for variable: business_days_until_appointment
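The console output above could come from a helper along these lines. This is a sketch only: the function name `check_normality`, its return structure, and the simulated data are assumptions, not the package's actual implementation.

```r
# Hypothetical helper: test normality, then return matching summary statistics.
check_normality <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  sw <- shapiro.test(x)  # requires 3 to 5000 non-missing values
  if (sw$p.value <= alpha) {
    # Not normal: report median and interquartile range.
    list(normal = FALSE, p_value = sw$p.value, median = median(x), iqr = IQR(x))
  } else {
    # Consistent with normality: report mean and standard deviation.
    list(normal = TRUE, p_value = sw$p.value, mean = mean(x), sd = sd(x))
  }
}

# Simulated right-skewed waits (exponential), for illustration only.
set.seed(1)
days <- rexp(300, rate = 1 / 15)
result <- check_normality(days)
print(result)
```

Calling `qqnorm(days)` followed by `qqline(days)` would draw the matching Q-Q plot.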

Appointment Accessibility

## [1] "Physicians were successfully contacted in 49 states including the District of Columbia. The excluded states include North Dakota and Rhode Island."

Overall distribution of calls between BCBS and Medicaid

## [1] 1140

Insurance Acceptance Rates

## [1] "Out of the 573 unique physicians assigned Medicaid, 238 (41.5%; n = 238 / N = 573) were successfully contacted. Of the 238 physicians accepting Medicaid who were successfully contacted, 179 (75.2%; n = 179 / N = 238) provided an appointment date."

These acceptance rates reflect the proportion of physicians who were successfully contacted, accepted the respective insurance, and provided an appointment to the patient.

Medicaid Acceptance Rate: Of the 573 physicians assigned Medicaid insurance, 238 were successfully contacted, and 179 of those provided an appointment, resulting in an acceptance rate of 75.2% (n = 179 / N = 238).

Blue Cross/Blue Shield Acceptance Rate: Of the 567 physicians assigned Blue Cross/Blue Shield insurance, 238 accepted this insurance and provided an appointment. The reported acceptance rate of 73% uses 326 as the denominator (n = 238 / N = 326), not the 567 assigned.
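The percentages in this section depend on which denominator is used. The short check below recomputes them from the counts reported in this document; the helper name `pct` is just for illustration.

```r
# Percentage rounded to one decimal place.
pct <- function(n, N) round(100 * n / N, 1)

contacted_medicaid   <- pct(238, 573)  # contacted / assigned Medicaid
appointment_medicaid <- pct(179, 238)  # appointment / contacted Medicaid
appointment_bcbs     <- pct(238, 326)  # appointment / reported BCBS denominator
medicaid_of_assigned <- pct(179, 573)  # appointment / all assigned Medicaid

print(c(
  contacted_medicaid = contacted_medicaid,
  appointment_medicaid = appointment_medicaid,
  appointment_bcbs = appointment_bcbs,
  medicaid_of_assigned = medicaid_of_assigned
))
```

The 75.2% figure only holds with the 238 contacted physicians as the denominator; against all 573 assigned physicians the share is much lower.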

Individual Insurance Rates of Successfully Making an Appointment

## [1] "Out of the 583 unique physicians assigned to be called for either insurance type, 567 (49%; n = 567 / N = 1166) were assigned to Blue Cross/Blue Shield, 573 (49%; n = 573 / N = 1166) were assigned to Medicaid, and 583 (50%; n = 583 / N = 1166) were assigned to at least one of these insurance types."

Told to seek Emergency Care

scenario_type	n	percent
Emergent	70	43.47826
Urgent	91	56.52174

## For the 161 patients who were told to go to the Emergency Department, 43.5% were in the Emergent scenario type (n = 70 / N = 161) and 56.5% were in the Urgent scenario type (n = 91 / N = 161).

May not need this

## Our sample included 583 calls to physician offices from 49 states excluding North Dakota and Rhode Island . We made calls to 567 unique physicians that accepted Blue Cross/Blue Shield. One Hundred Seventy-Nine physician offices accepted Medicaid, giving a 75.2 % Medicaid acceptance rate for OBGYN practices (n = 179 /N = 238 ).  Physicians offices accepted Blue Cross/Blue Shield at a rate of 73 % (n = 238 /N = 326 ).

Age Physician Description

## The median age of the participants is 53 years, with an interquartile range (IQR) spanning from 44 years (25th percentile) to 61 years (75th percentile).

Gender Physician Description

## In our dataset, the most common physician gender was Female (n = 721/N = 1,140, 63.2%).

Exclusions

## [1] "Of the total 1140 phone calls made, 1009 (88%) successfully reached a front desk representative, while 131 calls (12%) did not yield a connection even after two attempts. Among unsuccessful connections, 73 (56%) were redirected to voicemail, and 58 (44%) reached a busy signal. For successful connections, the reasons for exclusion were 39 (4%) requiring a prior referral, 63 (6%) not currently accepting new patients, and 179 (18%) offices putting the caller on hold for more than five minutes."

Visualizing Each Individual Predictor

Graph each variable

Business days by insurance

Highly Skewed Distribution:

Both distributions are right-skewed, meaning that most patients receive appointments relatively quickly, while a smaller number experience longer wait times.
The majority of appointments for both insurance types seem to occur within a shorter range of business days, with a notable decline in frequency as the waiting time increases.
Differences Between Insurance Types:
The distribution for Blue Cross/Blue Shield shows more variation in the number of days, with many individuals receiving appointments within shorter time frames, but also some patients experiencing longer delays (greater than 100 days).
Medicaid appears to have more concentrated appointments in shorter time frames, suggesting fewer extreme wait times compared to Blue Cross/Blue Shield. However, the initial spike is much higher, implying a higher number of shorter wait times compared to Blue Cross/Blue Shield.
Appointment Delays:
While both distributions drop significantly after the initial spike (indicating a substantial number of shorter wait times), Medicaid exhibits a sharper drop compared to Blue Cross/Blue Shield, suggesting that fewer Medicaid patients have extended wait times.
Overall, the distribution suggests that while there is a large volume of quick appointments for both insurance types, Medicaid tends to offer fewer appointments with long delays compared to Blue Cross/Blue Shield. However, Blue Cross/Blue Shield patients exhibit a broader range of wait times, possibly indicating more variability in scheduling.

Log Business Days

## Plots saved to: output/density_plot_20241113_182519.tiff and output/density_plot_20241113_182519.png

The log transformation applied to the business_days_until_appointment variable has several significant effects:

Reducing Skewness:
The original business_days_until_appointment variable is highly skewed to the right, with a large number of values clustered at low numbers and a few extreme values extending into high numbers. By taking the logarithm, we compress these larger values and stretch out the smaller ones, reducing the extreme skewness. This makes it easier to visualize and interpret the underlying distributions.
Enhanced Comparability:
The log transformation reduces the range of values, which helps in comparing the distributions between Blue Cross/Blue Shield and Medicaid more effectively. Without this transformation, the plot would be dominated by very long tails, making it challenging to identify differences between the distributions.
Better Insight into Relative Differences:
By taking the logarithm, we can see relative differences more clearly. The log-transformed density plot reveals distinct peaks for the two insurance categories that wouldn’t be as apparent otherwise. For example, Blue Cross/Blue Shield shows a more pronounced peak around log 3 (approximately 20 days), whereas Medicaid has a more bimodal shape. This insight into the central tendencies and spread is made possible by transforming the values to a log scale.

In summary, the log transformation improves interpretability by reducing skewness, allowing for better comparison between groups, and highlighting important patterns that would otherwise be obscured in the original scale.
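The effect of the transformation can be checked numerically. The sketch below uses simulated right-skewed waits and `log1p()` (the log of 1 + x), which is an assumption made here so that zero-day waits stay finite; the report does not say which variant was used. The "log 3 is about 20 days" remark above matches the natural log.

```r
set.seed(7)
# Simulated right-skewed waits in whole days (hypothetical data).
days <- round(rlnorm(500, meanlog = 2.5, sdlog = 1))

# Simple moment-based skewness; 0 means symmetric.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# ASSUMPTION: log1p keeps zero-day waits finite, where log(0) would be -Inf.
log_days <- log1p(days)

print(c(raw = skewness(days), logged = skewness(log_days)))

# Back-transforming a natural-log value of 3 gives roughly 20 days.
print(exp(3))
```

The logged values show far less skew than the raw values, which is why the density curves become easier to compare.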

told_to_go_to_the_emergency_depa for emergency scenario types

Emergency vs Urgent scenario types

Day of the week by insurance

Central Appointment Line by Insurance

Physician Gender by Insurance

Physician MD vs. DO by Insurance

Scenario

Descriptive Tables

Table 1 - Split across Insurances

	Overall (N=583)
Age (years)
- Less than 50 years old	219 (38.9%)
- 50 to 55 years old	92 (16.3%)
- 56 to 60 years old	88 (15.6%)
- 61 to 65 years old	74 (13.1%)
- Greater than 65 years old	90 (16.0%)
Gender
- Female	368 (63.1%)
- Male	215 (36.9%)
Medical School Training
- Allopathic training	535 (93.5%)
- Osteopathic training	37 (6.5%)
Medical School Location
- US Senior Medical Student	425 (81.6%)
- International Medical Graduate	96 (18.4%)
Academic Affiliation
- Private Practice	526 (90.2%)
- University	57 (9.8%)
Rurality
- Metropolitan area	528 (90.6%)
- Rural area	55 (9.4%)
American College of Obstetricians and Gynecologists (ACOG) Districts
- District I	37 (6.3%)
- District II	33 (5.7%)
- District III	39 (6.7%)
- District IV	68 (11.7%)
- District V	57 (9.8%)
- District VI	65 (11.1%)
- District VII	97 (16.6%)
- District VIII	99 (17.0%)
- District IX	31 (5.3%)
- District XI	22 (3.8%)
- District XII	35 (6.0%)
Central scheduling
- Yes, central scheduling number	207 (35.5%)
- No	376 (64.5%)
Day of the week of call
- Monday	62 (10.6%)
- Tuesday	114 (19.6%)
- Wednesday	181 (31.0%)
- Thursday	138 (23.7%)
- Friday	88 (15.1%)

The largest age group is physicians under 50 (38.9%). Most physicians are female, trained in allopathic programs, and work in private practice and metropolitan areas. Most offices do not use central scheduling, and calls were most often made midweek.

Wait Time by Insurance Figures

Waiting time in days (log scale) for Blue Cross/Blue Shield versus Medicaid. The plot shows points for the insurance variable (x-axis) against the days variable (y-axis), with a line connecting points that share the same NPI value.

Line Plot

## Plots saved to: Melanie/Figures/urgent_GYN_vs_insurance_20241113_182522.tiff and Melanie/Figures/urgent_GYN_vs_insurance_20241113_182522.png

Plot interpretation

The line plot shows the relationship between Blue Cross/Blue Shield and Medicaid regarding the log-transformed values of business days until appointment.

Data Representation:
- The vertical axis (y-axis) represents the log-transformed “business days until appointment.”
- The horizontal axis (x-axis) represents two different insurance types: “Blue Cross/Blue Shield” and “Medicaid”.
- Each individual line connecting points on both sides represents changes in “business days until appointment” for each instance or individual between the two insurance types.
Log Transformation:
- The y-axis uses a logarithmic scale to visualize “business days until appointment” because the original values may be skewed with extreme values (outliers). The log transformation helps compress high values, making it easier to view the data distribution.
Connections Between Insurance Types:
- The lines connecting the dots show the changes in “business days until appointment” for each physician office between the two insurance types.
- There is a considerable number of crossed lines, indicating inconsistent differences between “business days until appointment” for “Blue Cross/Blue Shield” versus “Medicaid.”
General Observations:
- The red line represents the median “business days until appointment” for both insurance types.
- The median appears relatively flat between the two insurance types, indicating similar typical wait times.
- However, there is high variability in the individual waiting times; for some offices, “Medicaid” results in longer waiting times, while for others, “Blue Cross/Blue Shield” has longer waits.
Insights:
- There appears to be no systematic difference between the waiting times for “Blue Cross/Blue Shield” versus “Medicaid” based on the median line, which is relatively similar for both groups.
- The variability in the lines indicates unequal access depending on the provider. While the average waiting times are similar, some patients experience significant differences depending on their insurance.

Overall, this plot suggests that while on average waiting times are similar between the two insurance groups, individual-level variability is significant. This implies potential disparities in access to timely appointments based on the specific provider or office handling the appointment.
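The "crossed lines" idea can be made concrete by pairing the two calls for each physician and looking at the sign of the difference. This is a sketch on invented values; the short column names `bcbs_days` and `medicaid_days` are hypothetical.

```r
# Toy paired data: two calls per NPI, one per insurance type.
calls <- data.frame(
  NPI = rep(c(1001, 1002, 1003, 1004), each = 2),
  insurance = rep(c("Blue Cross/Blue Shield", "Medicaid"), times = 4),
  business_days_until_appointment = c(5, 12, 30, 8, 10, 10, 3, 21)
)

# One row per physician, one wait-time column per insurance type.
wide <- reshape(calls, idvar = "NPI", timevar = "insurance", direction = "wide")
names(wide) <- c("NPI", "bcbs_days", "medicaid_days")

# Positive difference: the Medicaid wait was longer at that office.
wide$difference <- wide$medicaid_days - wide$bcbs_days

print(wide)
print(table(sign(wide$difference)))
```

A mix of positive and negative differences corresponds to lines that slope in opposite directions on the plot.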

Scatter Plot

## Plots saved to: Melanie/Figures/urgent_gyn_vs_insurance_none_20241113_182523.tiff and Melanie/Figures/urgent_gyn_vs_insurance_none_20241113_182523.png

Interpretation of Scatter Plot of Business Days Until Appointment by Insurance Type

The scatter plot depicts the distribution of business days until appointment for patients covered by Blue Cross/Blue Shield and Medicaid.

Insurance Type:
- The x-axis shows two different insurance types: “Blue Cross/Blue Shield” and “Medicaid”.
- The y-axis represents the business days until appointment, indicating the waiting period in days for appointments for patients under each insurance type.
Data Representation:
- Each point in the plot represents an appointment for an individual provider, with the purple dots for Blue Cross/Blue Shield and yellow dots for Medicaid.
- The height of the points along the y-axis represents the number of days until the next available appointment for each provider.
Distribution and Variability:
- For both Blue Cross/Blue Shield and Medicaid, the majority of the points are clustered below 50 days, indicating that most appointments are scheduled within approximately 50 business days.
- Medicaid data shows a greater spread, with a number of points rising significantly higher than 100 business days. This suggests that for some providers, Medicaid patients are facing longer delays compared to Blue Cross/Blue Shield.
- There is also noticeable variability within each insurance type, with some points reaching above 150 to 250 days, indicating outliers where patients had to wait considerably longer for appointments.
Overall Insights:
- The visual spread indicates that Medicaid tends to have more variability in waiting times, with some appointments requiring extended waiting periods.
- In contrast, Blue Cross/Blue Shield generally shows a tighter clustering of waiting times below 50 days, suggesting a more consistent access pattern.
- The difference in height and density of points suggests that while Blue Cross/Blue Shield has more consistent and possibly shorter waiting times, Medicaid patients may be more prone to experiencing significantly longer waits.
Potential Implications:
- This variability in Medicaid waiting times could imply potential barriers or delays in accessing timely care for patients depending on the provider or region.
- The differences suggest that Medicaid patients may not be receiving as consistent access to appointments as patients with Blue Cross/Blue Shield, potentially reflecting disparities in healthcare access.

The overall interpretation of this plot suggests notable differences in waiting times between Blue Cross/Blue Shield and Medicaid, with Medicaid patients generally facing a wider range and potentially longer waiting periods, indicating potential disparities in appointment availability.

Density Plot

## Plots saved to: Melanie/Figures/urgent_GYN_vs_insurance_density_20241113_182524.tiff and Melanie/Figures/urgent_GYN_vs_insurance_density_20241113_182524.png

Comparison of the Density Plot and Scatter Plot of Waiting Times by Insurance Type

The density plot and scatter plot provide different perspectives on the waiting times for appointments by insurance type (Blue Cross/Blue Shield vs. Medicaid). Each plot reveals unique insights, and here’s how they compare:

Density Plot:
- The density plot provides a smooth representation of the distribution of waiting times for each insurance type, showing the overall trends and concentration of data points.
- It allows for an immediate understanding of the shape of the distribution, highlighting where most of the data is concentrated (i.e., peak densities).
- The log-transformed x-axis helps compress the range of waiting times, which is particularly useful for visualizing data that varies widely. The transformation helps reduce the influence of outliers, making it easier to compare the central trends of the two insurance groups.
- The density plot captures differences in variability between the two groups:
  - Blue Cross/Blue Shield has a sharper peak, indicating a more consistent and shorter waiting time.
  - Medicaid has a broader, flatter peak, indicating greater variability and suggesting longer waiting times for some patients.
- Overall, the density plot is useful for understanding the underlying probability distribution of the waiting times and highlighting trends across the two insurance groups.
Scatter Plot:
- The scatter plot provides individual-level data points, showing each appointment’s waiting time by insurance type.
- It allows us to see the spread and range of waiting times without aggregating the data. The scatter plot shows each instance, giving an immediate sense of the variability and presence of outliers.
- Unlike the density plot, the scatter plot is more focused on the actual distribution of individual data points, which helps identify clusters, patterns, or any outlying values.
- One key insight from the scatter plot is that both insurance types have a few extreme outliers (very long waiting times), but it does not provide the same visual insight into how most of the data is distributed.
- The scatter plot shows that Blue Cross/Blue Shield patients generally have shorter waiting times, with many points clustered at the lower end. Medicaid patients also tend to have shorter waiting times, but the scatter plot reveals more variation in their waiting times, with some values spread across a larger range.
Unique Insights from the Density Plot:
- The density plot reveals the central tendencies and overall distribution shape of waiting times, which may be less apparent from the scatter plot. Specifically, the density plot shows where the data peaks, which can indicate the most common waiting times.
- It also provides a better visualization of the relative concentration of waiting times, helping to identify whether there is a skew in the data or bimodal tendencies.
- The density plot helps to compare the overall spread and concentration between the insurance types, highlighting systematic differences in access to care, which might be harder to discern from individual points in the scatter plot.
Unique Insights from the Scatter Plot:
- The scatter plot shows the specific distribution of waiting times, allowing for an easy identification of outliers and the spread of individual values.
- It provides a granular view of the data, illustrating exactly how waiting times vary for each appointment and showing every observation instead of summarizing it through density.
Conclusion:
- The density plot is excellent for understanding summary distributions and identifying general trends between insurance types, such as typical waiting times and where data clusters occur.
- The scatter plot, on the other hand, is better suited for observing individual data points, detecting outliers, and understanding raw spread and variability in waiting times.

Therefore, the density plot provides a more generalized view of the distribution of waiting times, which can make trends and patterns more immediately apparent, whereas the scatter plot offers a more detailed view of individual data points, providing insights into outliers and specific instances. Using both plots together gives a comprehensive understanding of the data, balancing overall trends with individual observations.

Wait Time by Scenario Figures

Waiting time in days (log scale) by scenario. The plot shows points for the scenario variable (x-axis) against the days variable (y-axis), with a line connecting points that share the same NPI value.

Line Plot

## Plots saved to: Melanie/Figures/urgent_GYN_vs_scenario_20241113_182525.tiff and Melanie/Figures/urgent_GYN_vs_scenario_20241113_182525.png

Here we show a plot that compares waiting times across the scenarios. Notice that the graph is in logarithmic scale.

Scatter Plot

## Plots saved to: Melanie/Figures/urgent_GYN_vs_scenario_none_20241113_182526.tiff and Melanie/Figures/urgent_GYN_vs_scenario_none_20241113_182526.png

Density Plot

Understanding a Density Plot:

A density plot is a smoothed version of a histogram that shows the distribution of a continuous variable. It represents the relative frequency of data points in different ranges of values, with areas under the curve corresponding to proportions of the data.

X-axis (Log Waiting Times in Days):
- The x-axis shows the logarithm of waiting times in days, meaning the waiting times have been transformed to a logarithmic scale to make the distribution more manageable or easier to interpret. A log transformation is often used when the raw data is skewed.
- Values closer to the left (lower on the x-axis) represent shorter waiting times, while values to the right (higher on the x-axis) represent longer waiting times.
Y-axis (Density):
- The y-axis represents density, which is the relative concentration of data points for a given range of values on the x-axis. The area under the entire curve sums to 1, meaning it reflects the proportion of observations.
- Higher peaks represent regions where there is a higher concentration of data points, while lower regions represent ranges with fewer data points.
Colors (Insurance):
- The two colors (purple for Blue Cross/Blue Shield and yellow for Medicaid) represent the distribution of waiting times for the two different insurance groups.
- The overlap between the two distributions is shaded, showing regions where both groups have similar waiting times.

How to Read the Density Plot:
1. Shape of the Distribution:
- The shape of each curve tells you about the distribution of waiting times within each insurance group.
- A peak indicates the most common waiting times for that group.
- A wider curve indicates a more spread-out distribution, meaning the waiting times vary more within that group.
- A narrower curve indicates that waiting times are more concentrated around the peak.
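The two properties described above (area of 1, and the peak marking the most common value) can be verified with base R's `density()`. The data here are simulated log waiting times, not the study data.

```r
set.seed(3)
# Simulated log waiting times (hypothetical): centered near 2.5 on the log scale.
log_days <- log(rlnorm(400, meanlog = 2.5, sdlog = 0.8))

# Kernel density estimate with the default Gaussian kernel and bandwidth.
d <- density(log_days)

# Trapezoid-rule area under the estimated curve; should be close to 1.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Location of the highest point of the curve (the most common log wait).
peak <- d$x[which.max(d$y)]

print(c(area = area, peak = peak, peak_in_days = exp(peak)))
```

Back-transforming the peak with `exp()` reads it in days rather than log days.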

## Plots saved to: Melanie/Figures/urgent_GYN_vs_scenario_density_20241113_182526.tiff and Melanie/Figures/urgent_GYN_vs_scenario_density_20241113_182526.png

Statistical testing

Consider the following scenario:

You want to examine how insurance type affects waiting time for appointments. However, patients may differ in other ways, like their age and medical condition, which could also influence waiting times.

When fitting a regression model with waiting time as the dependent variable and insurance type as one of the predictors (along with other factors like age and medical condition), the estimated marginal means (EMMs) represent the average predicted waiting time for each insurance type, adjusted for the effects of age and medical condition. This adjustment helps isolate the effect of insurance type on waiting time, ensuring the comparison between insurance types is fair.
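A hand-rolled sketch of what an estimated marginal mean is: predict over every combination of factor levels, then average the predictions for each insurance type with equal weight per level. The data, the `age_group` covariate, and the effect sizes are simulated assumptions, and a plain linear model stands in for whatever model the analysis actually used.

```r
set.seed(11)
n <- 300
insurance <- factor(sample(c("Blue Cross/Blue Shield", "Medicaid"), n, replace = TRUE))
# Deliberately unbalanced covariate (hypothetical age groups).
age_group <- factor(sample(c("<50", "50-64", "65+"), n, replace = TRUE,
                           prob = c(0.6, 0.3, 0.1)))

# Simulated waits: Medicaid adds 4 days, the oldest group adds 3 days.
wait <- 10 + 4 * (insurance == "Medicaid") + 3 * (age_group == "65+") + rnorm(n, sd = 2)
fit <- lm(wait ~ insurance + age_group)

# Reference grid: every insurance level crossed with every age group.
grid <- expand.grid(insurance = levels(insurance), age_group = levels(age_group))
grid$predicted <- predict(fit, newdata = grid)

# EMM per insurance type: equal-weight average over the age groups.
emm <- sapply(split(grid$predicted, grid$insurance), mean)
emm_diff <- unname(emm["Medicaid"] - emm["Blue Cross/Blue Shield"])

print(round(emm, 2))
```

Because each age group gets equal weight, the comparison does not depend on how unevenly the age groups are sampled. The emmeans package provides the same kind of estimate, with standard errors, through `emmeans(fit, ~ insurance)`.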

Interpretation: In the plot, the Estimated Marginal Means for each scenario represent the average predicted waiting time for an appointment, adjusted for other factors in the model. This gives a clearer, model-based comparison of the expected waiting times across different medical scenarios, taking into account variability in other factors.

The figure is a plot of Estimated Marginal Means (also known as least-squares means) for different scenarios. Each point represents the estimated marginal mean waiting time (in days) for a different medical scenario, and the error bars represent the 95% confidence intervals (CI) around these estimates.

Here’s a breakdown of the different components of the plot:

Y-axis:
- “Estimated Marginal Means for Waiting Time in Days”: The y-axis shows the estimated average waiting time in business days, which is the dependent variable in this model. The values range from about 15 to 30 days.
- The scale indicates that patients in different scenarios are estimated to wait between 15 and 30 days for an appointment, on average.
X-axis:
- “scenario”: The x-axis lists four different medical scenarios, which are the levels of the predictor variable. These are:
  1. Prior trip to ED and was found to have a 6 cm TOA: This scenario seems to be related to a prior emergency department (ED) visit and the discovery of a 6 cm tubo-ovarian abscess (TOA).
  2. Positive pregnancy test after a tubal ligation: This scenario involves a positive pregnancy test after a sterilization procedure (tubal ligation).
  3. Acute cystitis: This scenario involves a urinary tract infection, commonly known as acute cystitis.
  4. Recurrent/Treatment resistant vaginitis: This scenario refers to persistent or treatment-resistant vaginal infections.
The x-axis labels are rotated for readability, showing the different medical conditions (scenarios) being compared.
Estimated Marginal Means (Points on the Plot):
- Colored Points: Each colored point represents the estimated marginal mean waiting time for that specific scenario.
  - The different colors correspond to different scenarios, as explained by the legend at the bottom of the plot.
  - The vertical position of each point represents the estimated waiting time in business days.
Confidence Intervals (Error Bars):
- Error Bars: The vertical bars around each point represent the 95% confidence intervals. Intervals built this way would contain the true mean waiting time for a scenario in about 95% of repeated samples, under the model's assumptions.
  - Narrower intervals (e.g., the scenario “Positive pregnancy test after a tubal ligation”) indicate more precision in the estimate.
  - Wider intervals (e.g., “Recurrent/Treatment resistant vaginitis”) indicate more uncertainty or variability in the estimate.
Interpretation of the Estimated Marginal Means:

A simple rule of thumb is that if error bars for 95% confidence intervals overlap by less than about half the length of one error bar, the difference between the two groups might still be statistically significant. If the error bars overlap considerably, it’s more likely (but not guaranteed) that the difference between the groups is not statistically significant.

“Prior trip to ED and was found to have a 6 cm TOA”: The estimated mean waiting time for this scenario is around 20 days, with a relatively narrow confidence interval.
“Positive pregnancy test after a tubal ligation”: This scenario has the shortest estimated waiting time, just under 17 days, and the narrowest confidence interval, indicating a highly precise estimate.
“Acute cystitis”: This scenario has an estimated waiting time of around 25 days, and the confidence interval is slightly wider, indicating some variability.
“Recurrent/Treatment resistant vaginitis”: This scenario has the longest estimated waiting time, around 27 days, and the confidence interval is quite wide, indicating a high degree of uncertainty or variability in the estimate.
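
The overlap rule of thumb can also be checked numerically. The sketch below uses a hypothetical helper, `ci_overlap_ratio` (not part of any package), that expresses the overlap between two intervals as a fraction of the average error-bar arm. The limits are taken from the estimated marginal means table reported later in this document for TOA and Pregnancy after tubal, which comes from a different model than the plot described above.

```r
# Hypothetical helper: overlap of two confidence intervals, expressed as a
# fraction of the average arm length (half-width). Negative values mean a gap.
ci_overlap_ratio <- function(lcl1, ucl1, lcl2, ucl2) {
  overlap  <- min(ucl1, ucl2) - max(lcl1, lcl2)
  mean_arm <- mean(c(ucl1 - lcl1, ucl2 - lcl2)) / 2
  overlap / mean_arm
}

# Limits from the emmeans table: TOA (18.5, 19.9), Pregnancy after tubal (16.4, 17.7)
ratio <- ci_overlap_ratio(18.5, 19.9, 16.4, 17.7)
print(ratio)
```

A ratio below roughly 0.5 is the signal the rule of thumb looks for, and a negative ratio means the intervals do not touch. This is a visual heuristic only and does not replace a formal contrast between groups.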

Combined plot of Subspecialty and Insurance

## Extracted interaction data:

##  scenario              insurance                  rate        SE  df asymp.LCL
##  TOA                   Blue Cross/Blue Shield 4.718326 0.9436969 Inf  3.188173
##  Pregnancy after tubal Blue Cross/Blue Shield 6.408744 1.2848820 Inf  4.326298
##  UTI                   Blue Cross/Blue Shield 3.459394 0.7314328 Inf  2.285743
##  Vaginitis             Blue Cross/Blue Shield 8.426966 1.6220978 Inf  5.778623
##  TOA                   Medicaid               6.438937 1.2938995 Inf  4.342760
##  Pregnancy after tubal Medicaid               5.607467 1.1338873 Inf  3.772637
##  UTI                   Medicaid               3.114619 0.6627911 Inf  2.052433
##  Vaginitis             Medicaid               6.909619 1.3393328 Inf  4.725639
##  asymp.UCL
##   6.982870
##   9.493566
##   5.235677
##  12.289045
##   9.546902
##   8.334670
##   4.726511
##  10.102936
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale
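
The note that intervals are back-transformed from the log scale explains why they are not symmetric around the rate. As a minimal sketch, assuming the reported SE on the response scale was obtained by the delta method (so the log-scale SE is SE / rate), the TOA row for Blue Cross/Blue Shield can be approximately reproduced by hand:

```r
# Values for TOA / Blue Cross/Blue Shield, copied from the table above
rate <- 4.718326   # estimate on the response (days) scale
se   <- 0.9436969  # standard error on the response scale

# Assumption: delta method, so the SE on the log scale is se / rate
se_log <- se / rate
z      <- qnorm(0.975)

# Build the interval on the log scale, then exponentiate
ci <- exp(log(rate) + c(-1, 1) * z * se_log)
print(round(ci, 3))
```

The result should land close to the asymp.LCL and asymp.UCL values reported for that row.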

## 
## Scenario: TOA 
## Filtered data for scenario:
##  scenario insurance                  rate        SE  df asymp.LCL asymp.UCL
##  TOA      Blue Cross/Blue Shield 4.718326 0.9436969 Inf  3.188173  6.982870
##  TOA      Medicaid               6.438937 1.2938995 Inf  4.342760  9.546902
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Blue Cross/Blue Shield data:
##  scenario insurance                  rate        SE  df asymp.LCL asymp.UCL
##  TOA      Blue Cross/Blue Shield 4.718326 0.9436969 Inf  3.188173   6.98287
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Medicaid data:
##  scenario insurance     rate       SE  df asymp.LCL asymp.UCL
##  TOA      Medicaid  6.438937 1.293899 Inf   4.34276  9.546902
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Interaction p-value for scenario TOA : NA 
## Wait times for Medicaid are longer compared to Blue Cross/Blue Shield.
## 
## Scenario: Pregnancy after tubal 
## Filtered data for scenario:
##  scenario              insurance                  rate       SE  df asymp.LCL
##  Pregnancy after tubal Blue Cross/Blue Shield 6.408744 1.284882 Inf  4.326298
##  Pregnancy after tubal Medicaid               5.607467 1.133887 Inf  3.772637
##  asymp.UCL
##   9.493566
##   8.334670
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Blue Cross/Blue Shield data:
##  scenario              insurance                  rate       SE  df asymp.LCL
##  Pregnancy after tubal Blue Cross/Blue Shield 6.408744 1.284882 Inf  4.326298
##  asymp.UCL
##   9.493566
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Medicaid data:
##  scenario              insurance     rate       SE  df asymp.LCL asymp.UCL
##  Pregnancy after tubal Medicaid  5.607467 1.133887 Inf  3.772637   8.33467
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Interaction p-value for scenario Pregnancy after tubal : <0.01 
## Wait times for Medicaid are shorter compared to Blue Cross/Blue Shield.
## 
## Scenario: UTI 
## Filtered data for scenario:
##  scenario insurance                  rate        SE  df asymp.LCL asymp.UCL
##  UTI      Blue Cross/Blue Shield 3.459394 0.7314328 Inf  2.285743  5.235677
##  UTI      Medicaid               3.114619 0.6627911 Inf  2.052433  4.726511
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Blue Cross/Blue Shield data:
##  scenario insurance                  rate        SE  df asymp.LCL asymp.UCL
##  UTI      Blue Cross/Blue Shield 3.459394 0.7314328 Inf  2.285743  5.235677
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Medicaid data:
##  scenario insurance     rate        SE  df asymp.LCL asymp.UCL
##  UTI      Medicaid  3.114619 0.6627911 Inf  2.052433  4.726511
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Interaction p-value for scenario UTI : <0.01 
## Wait times for Medicaid are shorter compared to Blue Cross/Blue Shield.
## 
## Scenario: Vaginitis 
## Filtered data for scenario:
##  scenario  insurance                  rate       SE  df asymp.LCL asymp.UCL
##  Vaginitis Blue Cross/Blue Shield 8.426966 1.622098 Inf  5.778623  12.28904
##  Vaginitis Medicaid               6.909619 1.339333 Inf  4.725639  10.10294
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Blue Cross/Blue Shield data:
##  scenario  insurance                  rate       SE  df asymp.LCL asymp.UCL
##  Vaginitis Blue Cross/Blue Shield 8.426966 1.622098 Inf  5.778623  12.28904
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Medicaid data:
##  scenario  insurance     rate       SE  df asymp.LCL asymp.UCL
##  Vaginitis Medicaid  6.909619 1.339333 Inf  4.725639  10.10294
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale 
## 
## Interaction p-value for scenario Vaginitis : <0.01 
## Wait times for Medicaid are shorter compared to Blue Cross/Blue Shield.

## 
## Generated sentences:

## TOA: Patients with Blue Cross/Blue Shield insurance wait 4.7 days, with a 95% confidence interval (CI) ranging from 3.2 to 7.0 days. Medicaid recipients in this scenario experience longer waits, at 6.4 days with a CI of 4.3 to 9.5 days (p-value = NA).
## 
## Pregnancy after tubal: Patients with Blue Cross/Blue Shield insurance wait 6.4 days, with a 95% confidence interval (CI) ranging from 4.3 to 9.5 days. Medicaid recipients in this scenario experience shorter waits, at 5.6 days with a CI of 3.8 to 8.3 days (p-value = <0.01).
## 
## UTI: Patients with Blue Cross/Blue Shield insurance wait 3.5 days, with a 95% confidence interval (CI) ranging from 2.3 to 5.2 days. Medicaid recipients in this scenario experience shorter waits, at 3.1 days with a CI of 2.1 to 4.7 days (p-value = <0.01).
## 
## Vaginitis: Patients with Blue Cross/Blue Shield insurance wait 8.4 days, with a 95% confidence interval (CI) ranging from 5.8 to 12.3 days. Medicaid recipients in this scenario experience shorter waits, at 6.9 days with a CI of 4.7 to 10.1 days (p-value = <0.01).

Poisson Model

The models need to handle the 558 missing (NA) values in the business_days_until_appointment outcome variable, as well as data that are not normally distributed.

business_days_until_appointment can be transformed with a square root function, which keeps zero-day waits at 0, whereas log(business_days_until_appointment) returns -Inf for them.
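
A small illustration of the difference, using hypothetical wait times (not taken from the study data) that include a same-day appointment:

```r
# Hypothetical wait times, including a same-day (0) appointment
wait_days <- c(0, 1, 9, 26)

log_wait  <- log(wait_days)   # the zero becomes -Inf
sqrt_wait <- sqrt(wait_days)  # the zero stays 0

print(log_wait)
print(sqrt_wait)
print(all(is.finite(sqrt_wait)))
```

Only the square-root version stays finite for every observation, which matters because rows with non-finite outcomes cannot be used by most model-fitting functions.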

Wait Time

Wait Time with single predictor

Poisson Predicted Wait Times

In interpreting this output:

Poisson Model Appropriateness:
- Since we are dealing with count data for the outcome (business_days_until_appointment), Poisson regression is indeed more suitable than a Kruskal-Wallis test. The Kruskal-Wallis test would only indicate if there is a statistically significant difference across groups in insurance but would not provide specific information on the effect size or direction of differences, which the Poisson model offers.
Interpretation of the Medicaid Coefficient:
- The coefficient for Medicaid, -0.008725, is on the log scale and suggests a slight reduction in the expected number of days until the appointment for Medicaid patients compared to the baseline insurance group (Blue Cross/Blue Shield), equivalent to an incidence rate ratio of about 0.99. The p-value of 0.659 shows this effect is not statistically significant, meaning we don’t have enough evidence to conclude that Medicaid influences wait time compared to the baseline insurance category.
Null and Residual Deviance:
- The null and residual deviance are nearly identical, indicating that adding insurance as a predictor does not improve the model’s fit substantially. This suggests that insurance may not be a strong predictor of business_days_until_appointment.
Overdispersion:
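- Poisson regression assumes the variance equals the mean. If the variance is substantially greater than the mean (overdispersion), a negative binomial regression might be more appropriate, as it allows for extra variation in the data. In this model the residual deviance (16741) is about 29 times its residual degrees of freedom (580), which suggests overdispersion is likely and the standard errors may be too small.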
Conclusion:
- This Poisson regression model indicates that insurance type does not significantly influence the wait time for an appointment (business_days_until_appointment) based on the p-value and the similarity in deviance values.

In summary, while Poisson regression provides more detailed insights than a Kruskal-Wallis test, this model suggests that insurance type does not significantly affect the wait time for an appointment.
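
The Medicaid coefficient can be converted to an incidence rate ratio by hand. The sketch below assumes a Wald (normal-approximation) interval, which may differ slightly from a profile-likelihood interval. The estimate and standard error are copied from the model summary in this section.

```r
# Medicaid coefficient and standard error from the Poisson model summary
beta <- -0.008725
se   <- 0.019781

irr        <- exp(beta)                                   # incidence rate ratio
ci         <- exp(beta + c(-1, 1) * qnorm(0.975) * se)    # Wald 95% CI (assumption)
pct_change <- (irr - 1) * 100                             # percent change in expected wait

print(round(irr, 3))
print(round(ci, 2))
print(round(pct_change, 2))
```

Because the interval spans 1, the data are compatible with either a slightly shorter or a slightly longer wait for Medicaid patients.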

Is there a difference in wait times by `insurance`?

## 
## Call:
## glm(formula = business_days_until_appointment ~ as.factor(insurance), 
##     family = "poisson", data = df)
## 
## Coefficients:
##                               Estimate Std. Error z value            Pr(>|z|)
## (Intercept)                   2.893030   0.012880 224.615 <0.0000000000000002
## as.factor(insurance)Medicaid -0.008725   0.019781  -0.441               0.659
##                                 
## (Intercept)                  ***
## as.factor(insurance)Medicaid    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 16741  on 581  degrees of freedom
## Residual deviance: 16741  on 580  degrees of freedom
##   (558 observations deleted due to missingness)
## AIC: 18609
## 
## Number of Fisher Scoring iterations: 6
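
A rough overdispersion check can be made directly from this summary. The sketch uses the residual deviance divided by its degrees of freedom as a heuristic. The Pearson chi-square statistic divided by the degrees of freedom is the more usual estimate, but it requires the fitted model object, which is not reproduced here.

```r
# Values copied from the glm summary above
residual_deviance <- 16741
residual_df       <- 580

# Heuristic: a ratio near 1 is consistent with the Poisson assumption;
# values far above 1 point to overdispersion
dispersion_ratio <- residual_deviance / residual_df
print(round(dispersion_ratio, 1))
```

A ratio this far above 1 suggests that the Poisson standard errors are likely too small, so a quasi-Poisson or negative binomial fit (for example with `MASS::glm.nb`) may be worth comparing.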

## Using Poisson regression, the baseline wait (intercept) for the reference category (Blue Cross/Blue Shield) is estimated to be 18.05 business days, with a 95% confidence interval ranging from 17.6 to 18.51. For Medicaid compared to the reference category, the incidence rate ratio (IRR) of business_days_until_appointment is estimated to be 0.99, indicating that the estimated waiting time for Medicaid patients is slightly lower than for those with Blue Cross/Blue Shield. The 95% confidence interval for the IRR ranges from 0.95 to 1.03, with a p-value of 0.65916. Given that the confidence interval includes 1 and the p-value is greater than 0.05, the effect is not statistically significant, suggesting no conclusive evidence that the type of insurance impacts the wait time for an appointment.

Use tyler::generate_latex_equation functions.

Scenarios for Variable Selection

\[ \begin{align*} P(\text{Business Days until New Patient Appointment} = x) &= \frac{e^{-\lambda} \cdot \lambda^x}{x!} \\ \sqrt{\lambda} &= \beta_0 + \beta_1 \cdot \underline{\textbf{Patient Scenario}} + (1 \mid \text{Physician NPI}) \end{align*} \]

## Logging inputs...
## Model Object:  glm lm 
## Specs:  ~scenario | scenario 
## Variable of Interest:  scenario 
## Color By:  scenario 
## Output Directory:  Melanie/Figures 
## Y-Axis Min:  12 
## Y-Axis Max:  24 
## Using existing output directory:  Melanie/Figures 
## Computing estimated marginal means...
## Logging estimated marginal means data...
## # A tibble: 4 × 6
##   scenario               rate    SE    df asymp.LCL asymp.UCL
##   <fct>                 <dbl> <dbl> <dbl>     <dbl>     <dbl>
## 1 TOA                    19.2 0.366   Inf      18.5      19.9
## 2 Pregnancy after tubal  17.0 0.341   Inf      16.4      17.7
## 3 UTI                    13.4 0.308   Inf      12.8      14.0
## 4 Vaginitis              22.0 0.382   Inf      21.3      22.8
## Range of estimated marginal means with CIs:  12.7995 22.80813 
## Creating the plot...
## Plot created successfully.

## Saving plot to:  Melanie/Figures/interaction_scenario_comparison_plot_20241113_182535.png

## Plot saved successfully to:  Melanie/Figures/interaction_scenario_comparison_plot_20241113_182535.png 
## Returning the estimated data and plot object.

Business Days Until Next Appointment Joint Scenario
scenario	Median_business_days_until_appointment	Q1	Q3
TOA	9	0	26
Pregnancy after tubal	9	1	22
UTI	2	0	20
Vaginitis	12	1	34

Number of offices with each of the four scenarios successfully contacted: `business_days_until_appointment ~ scenario`

Number of successful calls for each scenario
scenario	count
Prior trip to ED and was found to have a 6 cm TOA	137
Positive pregnancy test after a tubal ligation	144
Acute cystitis	138
Recurrent/Treatment resistant vaginitis	145

## 
## Call:
## glm(formula = business_days_until_appointment ~ as.factor(scenario), 
##     family = "poisson", data = df)
## 
## Coefficients:
##                                                                   Estimate
## (Intercept)                                                        2.95360
## as.factor(scenario)Positive pregnancy test after a tubal ligation -0.11759
## as.factor(scenario)Acute cystitis                                 -0.35908
## as.factor(scenario)Recurrent/Treatment resistant vaginitis         0.13955
##                                                                   Std. Error
## (Intercept)                                                          0.01910
## as.factor(scenario)Positive pregnancy test after a tubal ligation    0.02764
## as.factor(scenario)Acute cystitis                                    0.02991
## as.factor(scenario)Recurrent/Treatment resistant vaginitis           0.02579
##                                                                   z value
## (Intercept)                                                       154.663
## as.factor(scenario)Positive pregnancy test after a tubal ligation  -4.255
## as.factor(scenario)Acute cystitis                                 -12.007
## as.factor(scenario)Recurrent/Treatment resistant vaginitis          5.411
##                                                                               Pr(>|z|)
## (Intercept)                                                       < 0.0000000000000002
## as.factor(scenario)Positive pregnancy test after a tubal ligation         0.0000209144
## as.factor(scenario)Acute cystitis                                 < 0.0000000000000002
## as.factor(scenario)Recurrent/Treatment resistant vaginitis                0.0000000626
##                                                                      
## (Intercept)                                                       ***
## as.factor(scenario)Positive pregnancy test after a tubal ligation ***
## as.factor(scenario)Acute cystitis                                 ***
## as.factor(scenario)Recurrent/Treatment resistant vaginitis        ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 16741  on 581  degrees of freedom
## Residual deviance: 16412  on 578  degrees of freedom
##   (558 observations deleted due to missingness)
## AIC: 18284
## 
## Number of Fisher Scoring iterations: 6
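
The coefficients above are on the log scale. As a minimal sketch, exponentiating them gives the expected wait for each scenario and the rate ratio relative to the reference scenario (TOA). The short scenario labels are used here for readability, and the values are copied from the summary above.

```r
# Intercept and scenario coefficients from the glm summary (log scale)
intercept <- 2.95360
coefs <- c(
  "TOA (reference)"       = 0,
  "Pregnancy after tubal" = -0.11759,
  "UTI"                   = -0.35908,
  "Vaginitis"             = 0.13955
)

predicted_days <- exp(intercept + coefs)  # expected business days per scenario
irr            <- exp(coefs)              # rate ratio versus TOA

print(round(predicted_days, 1))
print(round(irr, 2))
```

The predicted values should agree closely with the estimated marginal means logged earlier in this section.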

## The median wait time across all scenarios was 8 business days, with an interquartile range (IQR) of 0 to 26 days. Specifically, the median wait time was 9 days (IQR: 0 to 26) for 'Prior trip to ED and was found to have a 6 cm TOA', 9 days (IQR: 1 to 22) for 'Positive pregnancy test after a tubal ligation', 2 days (IQR: 0 to 20) for 'Acute cystitis', and 12 days (IQR: 1 to 34) for 'Recurrent/Treatment resistant vaginitis'. The p-value for the difference between 'Positive pregnancy test after a tubal ligation' and 'Prior trip to ED and was found to have a 6 cm TOA' scenarios was <0.01, for 'Acute cystitis' and 'Prior trip to ED and was found to have a 6 cm TOA', it was <0.01, and for 'Recurrent/Treatment resistant vaginitis' and 'Prior trip to ED and was found to have a 6 cm TOA', it was <0.01.

Insurance

Business Days Until Next Appointment By Each Insurance
insurance	Median_business_days_until_appointment	Q1	Q3
Blue Cross/Blue Shield	9.0	0	26
Medicaid	7.5	0	26

## Medicaid patients experienced a 0.87% shorter wait for a new patient appointment compared to patients with BCBS (Incidence Rate Ratio: 0.991; 95% CI: 0.95 - 1.03; p = 0.66), with median wait times of 7.5 business days (IQR: 25th percentile 0 - 75th percentile 26) and 9 business days (IQR: 25th percentile 0 - 75th percentile 26), respectively.

Full Poisson Model `poisson_full_model`

Single predictor models for `poisson_full_model`

This analysis explores the significance of various predictors on the outcome variable business_days_until_appointment, accounting for the random effects associated with physicians. The goal is to identify which variables significantly influence the time to appointment while controlling for variability across individual physicians.

The step-by-step approach demonstrates how individual predictors are assessed for their significance in influencing the response variable while accounting for the random effects associated with repeated measures on physicians. Significant variables will be used in the final multivariate model to better understand their impact on appointment wait times.


##                        Predictor      P_Value
## 1                   basic_gender 0.0003471583
## 2                         gender 0.0004052041
## 3                       academic 0.0022440459
## 4               taxonomies_state 0.0206315888
## 5              hold_time_minutes 0.0403542942
## 6                            age 0.0631738778
## 7 Medicaid_to_Medicare_Fee_Index 0.1296461022
## 8                        Med_sch 0.1789727708
## 9                Grd_yr_category 0.1855833254
##                                      IRR
## 1      0.0001522993903460946401272335660
## 2      0.0001794285499064100981338187868
## 3 578923.1830971743911504745483398437500
## 4      0.0000000000000000000000007678469
## 5      5.1010672992072292331044991442468
## 6      0.8078885085395337561919859581394
## 7      0.8919231411244666896109833942319
## 8      0.0121337673512118947999338658406
## 9    122.8246619928129348409129306674004
##                                                 CI_Lower              CI_Upper
## 1   0.00000129461105618509357601160231904247055467749306          0.0179166586
## 2   0.00000157718846220547111759431157390620725777807820          0.0204126554
## 3 123.40247675191234577596333110705018043518066406250000 2715926460.7075295448
## 4   0.00000000000000000000000000000000000000000000390356          0.0001510387
## 5   1.07878844150120678513360417127842083573341369628906         24.1204731067
## 6   0.64557090100981973090910059909219853579998016357422          1.0110180636
## 7   0.76955882487646298795169741424615494906902313232422          1.0337440933
## 8   0.00001977359780227643952553798212257163413596572354          7.4457016677
## 9   0.10053523227188586319780227995579480193555355072021     150055.8287153571
##    Wait_Time_Effect
## 1 shorter wait time
## 2 shorter wait time
## 3  longer wait time
## 4 shorter wait time
## 5  longer wait time
## 6 shorter wait time
## 7 shorter wait time
## 8 shorter wait time
## 9  longer wait time

##                        Predictor P_Value       IRR CI_Lower      CI_Upper
## 1                   basic_gender   <0.01      0.00     0.00          0.02
## 2                         gender   <0.01      0.00     0.00          0.02
## 3                       academic   <0.01 578923.18   123.40 2715926460.71
## 4               taxonomies_state   0.021      0.00     0.00          0.00
## 5              hold_time_minutes   0.040      5.10     1.08         24.12
## 6                            age   0.063      0.81     0.65          1.01
## 7 Medicaid_to_Medicare_Fee_Index   0.130      0.89     0.77          1.03
## 8                        Med_sch   0.179      0.01     0.00          7.45
## 9                Grd_yr_category   0.186    122.82     0.10     150055.83
##    Wait_Time_Effect
## 1 shorter wait time
## 2 shorter wait time
## 3  longer wait time
## 4 shorter wait time
## 5  longer wait time
## 6 shorter wait time
## 7 shorter wait time
## 8 shorter wait time
## 9  longer wait time

Significant Variables Predicting Number of Business Days until Appointment
Predictor	P_Value	IRR	CI_Lower	CI_Upper	Wait_Time_Effect
basic_gender	<0.01	0.00	0.00	0.02	shorter wait time
gender	<0.01	0.00	0.00	0.02	shorter wait time
academic	<0.01	578923.18	123.40	2715926460.71	longer wait time
taxonomies_state	0.021	0.00	0.00	0.00	shorter wait time
hold_time_minutes	0.040	5.10	1.08	24.12	longer wait time
age	0.063	0.81	0.65	1.01	shorter wait time
Medicaid_to_Medicare_Fee_Index	0.130	0.89	0.77	1.03	shorter wait time
Med_sch	0.179	0.01	0.00	7.45	shorter wait time
Grd_yr_category	0.186	122.82	0.10	150055.83	longer wait time

Troubleshooting large IRR for `academic`

The high IRR for `academic` becomes clearer once the model output and the boxplot of wait times are examined together.

Key Insights:

Sample Imbalance:
- There is a major imbalance in the number of observations between Private Practice (556 cases) and University (47 cases). This discrepancy could lead to inflated coefficients, especially if the smaller group (University) has greater variability in wait times. This could explain why the estimate for academicUniversity is so large and significant.

Fixed Effects:
- The model indicates that being at a University is associated with a longer wait time, with an Estimate of 13.905 (p = 0.00124). This suggests that patients at University settings wait, on average, about 13.9 more days than those at private practices.
- However, due to the imbalance in the dataset and some high variance in wait times for university cases, this estimate might be exaggerated. The few outliers seen in the boxplot for University settings could be contributing to this as well.
Random Effects:
- The random effects (NPI) show variability among individual providers (standard deviation of 17.53). This means that individual providers still account for a fair amount of variation in wait times, which is typical in mixed-effects models.

Recommendations to Address the IRR Issue:

Consider Balancing the Dataset:
- The imbalance between University and Private Practice may lead to inflated estimates. You could try down-sampling the larger group (Private Practice) or performing bootstrapping to create a more balanced dataset. This might provide a more realistic estimate for the effect of academicUniversity.
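
One further check, offered as a possibility rather than a diagnosis: the IRR of 578923.18 is almost exactly what results from exponentiating the identity-scale estimate that the linear mixed model output for `academic` reports later in this document (Estimate 13.269, Std. Error 4.313). If the single-predictor table was built by applying exp() to coefficients from a linear model rather than a log-link Poisson model (an assumption, since the table-building code is not shown here), the value would be a transformed difference in days, not a rate ratio.

```r
# Estimate and SE for academicUniversity from the lmer summary (identity scale, in days)
estimate <- 13.269
se       <- 4.313

# Exponentiating a difference in days is not a rate ratio, but it is
# what would be needed to reproduce the table values
pseudo_irr <- exp(estimate)
pseudo_ci  <- exp(estimate + c(-1, 1) * qnorm(0.975) * se)

print(format(pseudo_irr, big.mark = ","))
print(format(pseudo_ci, big.mark = ","))
```

If these values line up with the IRR, CI_Lower and CI_Upper in the table, the scale of the coefficient is a more likely source of the extreme IRR than the sample imbalance.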

Rerun `poisson_full_model` by removing `academic`

##                        Predictor      P_Value
## 1                   basic_gender 0.0003471583
## 2                         gender 0.0004052041
## 3               taxonomies_state 0.0206315888
## 4              hold_time_minutes 0.0403542942
## 5                            age 0.0631738778
## 6 Medicaid_to_Medicare_Fee_Index 0.1296461022
## 7                        Med_sch 0.1789727708
## 8                Grd_yr_category 0.1855833254
##                                   IRR
## 1   0.0001522993903460946401272335660
## 2   0.0001794285499064100981338187868
## 3   0.0000000000000000000000007678469
## 4   5.1010672992072292331044991442468
## 5   0.8078885085395337561919859581394
## 6   0.8919231411244666896109833942319
## 7   0.0121337673512118947999338658406
## 8 122.8246619928129348409129306674004
##                                               CI_Lower          CI_Upper
## 1 0.00000129461105618509357601160231904247055467749306      0.0179166586
## 2 0.00000157718846220547111759431157390620725777807820      0.0204126554
## 3 0.00000000000000000000000000000000000000000000390356      0.0001510387
## 4 1.07878844150120678513360417127842083573341369628906     24.1204731067
## 5 0.64557090100981973090910059909219853579998016357422      1.0110180636
## 6 0.76955882487646298795169741424615494906902313232422      1.0337440933
## 7 0.00001977359780227643952553798212257163413596572354      7.4457016677
## 8 0.10053523227188586319780227995579480193555355072021 150055.8287153571
##    Wait_Time_Effect
## 1 shorter wait time
## 2 shorter wait time
## 3 shorter wait time
## 4  longer wait time
## 5 shorter wait time
## 6 shorter wait time
## 7 shorter wait time
## 8  longer wait time

##                        Predictor P_Value    IRR CI_Lower  CI_Upper
## 1                   basic_gender   <0.01   0.00     0.00      0.02
## 2                         gender   <0.01   0.00     0.00      0.02
## 3               taxonomies_state   0.021   0.00     0.00      0.00
## 4              hold_time_minutes   0.040   5.10     1.08     24.12
## 5                            age   0.063   0.81     0.65      1.01
## 6 Medicaid_to_Medicare_Fee_Index   0.130   0.89     0.77      1.03
## 7                        Med_sch   0.179   0.01     0.00      7.45
## 8                Grd_yr_category   0.186 122.82     0.10 150055.83
##    Wait_Time_Effect
## 1 shorter wait time
## 2 shorter wait time
## 3 shorter wait time
## 4  longer wait time
## 5 shorter wait time
## 6 shorter wait time
## 7 shorter wait time
## 8  longer wait time

Significant Variables Predicting Number of Business Days until Appointment WITHOUT ACADEMIC
Predictor	P_Value	IRR	CI_Lower	CI_Upper	Wait_Time_Effect
basic_gender	<0.01	0.00	0.00	0.02	shorter wait time
gender	<0.01	0.00	0.00	0.02	shorter wait time
taxonomies_state	0.021	0.00	0.00	0.00	shorter wait time
hold_time_minutes	0.040	5.10	1.08	24.12	longer wait time
age	0.063	0.81	0.65	1.01	shorter wait time
Medicaid_to_Medicare_Fee_Index	0.130	0.89	0.77	1.03	shorter wait time
Med_sch	0.179	0.01	0.00	7.45	shorter wait time
Grd_yr_category	0.186	122.82	0.10	150055.83	longer wait time

Robust LMM with `log_business_days_until_appointments` with `academic`

## 
## Private Practice       University 
##              537               45

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: formula_simple
##    Data: df3_filtered
## 
##      AIC      BIC   logLik deviance df.resid 
##   5388.4   5405.9  -2690.2   5380.4      578 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3249 -0.4584 -0.2239  0.2451  6.3292 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  NPI      (Intercept) 296.6    17.22   
##  Residual             354.8    18.84   
## Number of obs: 582, groups:  NPI, 401
## 
## Fixed effects:
##                    Estimate Std. Error      df t value             Pr(>|t|)    
## (Intercept)          16.851      1.230 348.861  13.701 < 0.0000000000000002 ***
## academicUniversity   13.269      4.313 385.358   3.076              0.00224 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## acdmcUnvrst -0.285

## Robust linear mixed model fit by DAStau 
## Formula: formula_simple 
##    Data: df3_filtered 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.0400 -0.7605 -0.2985  0.7410 13.3397 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  NPI      (Intercept)   0.0     0.00   
##  Residual             299.8    17.32   
## Number of obs: 582, groups: NPI, 401
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)         13.1685     0.7664  17.182
## academicUniversity   4.8409     2.7562   1.756
## 
## Correlation of Fixed Effects:
##             (Intr)
## acdmcUnvrst -0.278
## 
## Robustness weights for the residuals: 
##  483 weights are ~= 1. The remaining 99 ones are summarized as
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.101   0.389   0.570   0.583   0.754   0.995 
## 
## Robustness weights for the random effects: 
##  All 401 weights are ~= 1.
## 
## Rho functions used for fitting:
##   Residuals:
##     eff: smoothed Huber (k = 1.345, s = 10) 
##     sig: smoothed Huber, Proposal 2 (k = 1.345, s = 10) 
##   Random Effects, variance component 1 (NPI):
##     eff: smoothed Huber (k = 1.345, s = 10) 
##     vcp: smoothed Huber, Proposal 2 (k = 1.345, s = 10)
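
The robust fit reports a t value but no p-value. As a rough sketch only, a normal approximation can be applied to the reported estimate and standard error for academicUniversity. This is an assumption for illustration, not something the robust fitting output provides, and it ignores the uncertainty in the degrees of freedom.

```r
# Estimate and SE for academicUniversity from the robust model summary above
estimate <- 4.8409
se       <- 2.7562

t_value  <- estimate / se
p_approx <- 2 * pnorm(-abs(t_value))  # two-sided, normal approximation (assumption)

print(round(t_value, 3))
print(round(p_approx, 3))
```

Compared with the non-robust fit, the estimate shrinks from about 13.3 to about 4.8 days once extreme observations are down-weighted.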

Robust LMM with log_business_days_until_appointments

## The following predictors were found to be significant predicting business days until new patient appointment:
## -  basic_gender : p = <0.01 
## -  gender : p = <0.01 
## -  taxonomies_state : p = 0.02 
## -  hold_time_minutes : p = 0.04 
## -  age : p = 0.06 
## -  Medicaid_to_Medicare_Fee_Index : p = 0.13 
## -  Med_sch : p = 0.18 
## -  Grd_yr_category : p = 0.19

Model `poisson_significant` Formula with only significant variables

where:

Fixed effects include…
Random effects account for variability between physicians, modeled as a random intercept.

The random effect for physician suggests that there is substantial variability in appointment wait times between physicians. Physicians with a higher random intercept will tend to have longer wait times than physicians with a lower random intercept.
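
To give a sense of scale, the random-intercept standard deviation reported in the model summary that follows (1.65 on the log scale) can be converted to a multiplicative effect on the expected wait. This is a minimal sketch that treats a physician one standard deviation above or below the average as the comparison.

```r
# Random-intercept SD for NPI from the glmer summary (log scale)
sd_npi <- 1.65

# Multiplier on expected wait for a physician 1 SD below / above average
multiplier <- exp(c(-1, 1) * sd_npi)
print(round(multiplier, 2))
```

A physician one standard deviation above average would be expected to have a wait roughly five times that of an average physician with the same covariates, which is consistent with describing the between-physician variability as substantial.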

`poisson` Model with only significant variables

## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 0) [glmerMod]
##  Family: poisson  ( log )
## Formula: 
## business_days_until_appointment ~ basic_gender + gender + taxonomies_state +  
##     hold_time_minutes + age + Medicaid_to_Medicare_Fee_Index +  
##     Med_sch + Grd_yr_category + (1 | NPI)
##    Data: df3
## 
##      AIC      BIC   logLik deviance df.resid 
##   4548.9   4777.1  -2218.4   4436.9      379 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.5739 -0.7876 -0.0087  0.1344  7.0255 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  NPI    (Intercept) 2.721    1.65    
## Number of obs: 435, groups:  NPI, 321
## 
## Fixed effects:
##                                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              7.34657    1.99366   3.685 0.000229
## basic_genderM                           -0.32033    0.23983  -1.336 0.181671
## taxonomies_stateAlaska                 -17.67431 1275.75539  -0.014 0.988946
## taxonomies_stateArizona                 -2.61897    1.16833  -2.242 0.024985
## taxonomies_stateArkansas                -3.18457    1.51478  -2.102 0.035525
## taxonomies_stateCalifornia              -1.82211    1.03404  -1.762 0.078047
## taxonomies_stateColorado                -1.60997    1.19914  -1.343 0.179403
## taxonomies_stateConnecticut             -1.74401    1.10047  -1.585 0.113016
## taxonomies_stateDistrict of Columbia    -1.42062    1.39906  -1.015 0.309909
## taxonomies_stateFlorida                 -2.19963    1.12030  -1.963 0.049595
## taxonomies_stateGeorgia                 -1.60231    1.10534  -1.450 0.147168
## taxonomies_stateHawaii                  -3.29420    1.35423  -2.433 0.014994
## taxonomies_stateIdaho                    0.89441    1.94472   0.460 0.645576
## taxonomies_stateIllinois                -2.54593    1.14497  -2.224 0.026176
## taxonomies_stateIndiana                 -2.50923    1.31437  -1.909 0.056252
## taxonomies_stateIowa                    -1.99539    1.30925  -1.524 0.127491
## taxonomies_stateKansas                  -1.88692    1.32508  -1.424 0.154443
## taxonomies_stateKentucky                -3.25247    1.26808  -2.565 0.010321
## taxonomies_stateLouisiana               -2.22717    1.26321  -1.763 0.077883
## taxonomies_stateMaine                  -19.60447 1275.75532  -0.015 0.987739
## taxonomies_stateMaryland                -1.74624    1.12042  -1.559 0.119100
## taxonomies_stateMassachusetts           -0.48968    1.08425  -0.452 0.651537
## taxonomies_stateMichigan                -2.02176    1.14877  -1.760 0.078418
## taxonomies_stateMinnesota               -1.66009    1.16073  -1.430 0.152654
## taxonomies_stateMississippi             -0.48869    1.15635  -0.423 0.672573
## taxonomies_stateMissouri                -3.39791    1.11845  -3.038 0.002381
## taxonomies_stateNebraska                -1.34662    1.19415  -1.128 0.259456
## taxonomies_stateNevada                  -1.71221    1.18578  -1.444 0.148752
## taxonomies_stateNew Jersey              -3.04517    1.25553  -2.425 0.015291
## taxonomies_stateNew Mexico              -1.18326    1.54777  -0.764 0.444572
## taxonomies_stateNew York                -2.54325    1.27953  -1.988 0.046851
## taxonomies_stateNorth Carolina          -1.17816    1.08903  -1.082 0.279321
## taxonomies_stateOhio                    -2.16304    1.12466  -1.923 0.054444
## taxonomies_stateOklahoma                -0.41792    1.37643  -0.304 0.761413
## taxonomies_stateOregon                  -1.31968    1.23366  -1.070 0.284744
## taxonomies_statePennsylvania            -2.14541    1.11471  -1.925 0.054274
## taxonomies_statePuerto Rico             -1.08179    1.55682  -0.695 0.487135
## taxonomies_stateRhode Island            -4.64362    2.19857  -2.112 0.034677
## taxonomies_stateSouth Carolina           0.19046    1.35726   0.140 0.888403
## taxonomies_stateTennessee               -1.84765    1.11138  -1.662 0.096415
## taxonomies_stateTexas                   -2.86271    1.15992  -2.468 0.013586
## taxonomies_stateUtah                   -19.50372  802.97582  -0.024 0.980622
## taxonomies_stateVermont                 -3.68100    2.05189  -1.794 0.072820
## taxonomies_stateVirginia                -2.52295    1.18318  -2.132 0.032978
## taxonomies_stateWashington              -4.67503    1.32441  -3.530 0.000416
## taxonomies_stateWest Virginia           -4.31539    1.73770  -2.483 0.013014
## taxonomies_stateWisconsin               -3.03590    1.18931  -2.553 0.010691
## taxonomies_stateWyoming                 -4.18343    2.18181  -1.917 0.055186
## hold_time_minutes                       -0.02417    0.01524  -1.586 0.112782
## age                                     -0.01881    0.02200  -0.855 0.392457
## Medicaid_to_Medicare_Fee_Index          -0.02914    0.01288  -2.263 0.023665
## Med_schInternational Medical Graduate    0.02084    0.28316   0.074 0.941339
## Grd_yr_category1990 to 1999             -0.11825    0.38312  -0.309 0.757580
## Grd_yr_category2000 to 2009             -0.28470    0.51096  -0.557 0.577395
## Grd_yr_category2010 or greater          -0.49797    0.69393  -0.718 0.472996
##                                          
## (Intercept)                           ***
## basic_genderM                            
## taxonomies_stateAlaska                   
## taxonomies_stateArizona               *  
## taxonomies_stateArkansas              *  
## taxonomies_stateCalifornia            .  
## taxonomies_stateColorado                 
## taxonomies_stateConnecticut              
## taxonomies_stateDistrict of Columbia     
## taxonomies_stateFlorida               *  
## taxonomies_stateGeorgia                  
## taxonomies_stateHawaii                *  
## taxonomies_stateIdaho                    
## taxonomies_stateIllinois              *  
## taxonomies_stateIndiana               .  
## taxonomies_stateIowa                     
## taxonomies_stateKansas                   
## taxonomies_stateKentucky              *  
## taxonomies_stateLouisiana             .  
## taxonomies_stateMaine                    
## taxonomies_stateMaryland                 
## taxonomies_stateMassachusetts            
## taxonomies_stateMichigan              .  
## taxonomies_stateMinnesota                
## taxonomies_stateMississippi              
## taxonomies_stateMissouri              ** 
## taxonomies_stateNebraska                 
## taxonomies_stateNevada                   
## taxonomies_stateNew Jersey            *  
## taxonomies_stateNew Mexico               
## taxonomies_stateNew York              *  
## taxonomies_stateNorth Carolina           
## taxonomies_stateOhio                  .  
## taxonomies_stateOklahoma                 
## taxonomies_stateOregon                   
## taxonomies_statePennsylvania          .  
## taxonomies_statePuerto Rico              
## taxonomies_stateRhode Island          *  
## taxonomies_stateSouth Carolina           
## taxonomies_stateTennessee             .  
## taxonomies_stateTexas                 *  
## taxonomies_stateUtah                     
## taxonomies_stateVermont               .  
## taxonomies_stateVirginia              *  
## taxonomies_stateWashington            ***
## taxonomies_stateWest Virginia         *  
## taxonomies_stateWisconsin             *  
## taxonomies_stateWyoming               .  
## hold_time_minutes                        
## age                                      
## Medicaid_to_Medicare_Fee_Index        *  
## Med_schInternational Medical Graduate    
## Grd_yr_category1990 to 1999              
## Grd_yr_category2000 to 2009              
## Grd_yr_category2010 or greater           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## fit warnings:
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

Table of `poisson_significant` Model Coefficients

Generic Interpretation of Significant Predictors: In a Poisson regression, significant predictors are those with p-values less than a chosen threshold (usually p < 0.05). These predictors have a statistically significant effect on the outcome variable—in this case, business days until an appointment. The Incidence Rate Ratios (IRRs) help interpret the direction and magnitude of these effects:

IRRs > 1: The predictor increases the expected mean number of business days. For example, an IRR of 2 means the expected waiting time is twice as long for that category compared to the reference group.
IRRs < 1: The predictor decreases the expected mean number of business days. For instance, an IRR of 0.5 means the waiting time is halved compared to the reference group.
p-values < 0.05: Indicate that the effect of the predictor is statistically significant, meaning an effect at least this large would be unlikely to be observed if the predictor had no true association with waiting time.
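
As a minimal sketch of how an IRR and its 95% Wald interval follow from a log-scale coefficient, the snippet below exponentiates the estimate and standard error printed for `Medicaid_to_Medicare_Fee_Index` in the model summary. The variable names and the 15-day baseline are illustrative assumptions, not part of the original analysis.

```r
# Coefficient and standard error copied from the model summary
# (Medicaid_to_Medicare_Fee_Index row); variable names are illustrative.
estimate  <- -0.02914
std_error <-  0.01288

z <- qnorm(0.975)                       # about 1.96 for a 95% Wald interval
irr      <- exp(estimate)               # incidence rate ratio
ci_lower <- exp(estimate - z * std_error)
ci_upper <- exp(estimate + z * std_error)

# Percent change in expected business days per one-unit increase
pct_change <- (irr - 1) * 100

# Effect of a 5-unit increase on a hypothetical 15-day baseline wait
baseline_days      <- 15
days_after_5_units <- baseline_days * exp(5 * estimate)

print(round(c(IRR = irr, lower = ci_lower, upper = ci_upper), 2))
print(round(pct_change, 1))
print(round(days_after_5_units, 2))
```

Because effects multiply on the IRR scale, a change of several units is obtained by exponentiating the multiplied coefficient rather than by subtracting percentages.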

Analysis Based on Current Results

Examples of Significant Predictors:

Medicaid-to-Medicare Fee Index (IRR = 0.97, p = 0.024):
- Interpretation: For each one-unit increase in the fee index, the expected waiting time for an appointment decreases by about 3% (IRR = 0.97, 95% CI 0.95 – 1.00). This effect is small but statistically significant (p = 0.024).
- Example: If the fee index is 5 units higher, the expected waiting time would decrease from 15 days to approximately 13 days.

State (reference: Alabama):
- Interpretation: Several states have significantly shorter expected waits than Alabama, for example Washington (IRR = 0.01, p < 0.001) and Missouri (IRR = 0.03, p = 0.002). The estimates for Alaska, Maine, and Utah (IRR = 0.00, CI 0.00 – Inf) are not interpretable; intervals this wide typically indicate too few observations in those states.

Non-Significant Predictors:

Hold Time (IRR = 0.98, p = 0.113):
- Interpretation: Each additional minute spent on hold is associated with a 2% reduction in waiting time, but this effect is not statistically significant.
Gender (Male) (IRR = 0.73, p = 0.182):
- Interpretation: Being male is associated with a 27% reduction in waiting time compared to females (IRR = 0.73), but this effect is not statistically significant (p = 0.182).
Age (IRR = 0.98, p = 0.392):
- Interpretation: Increasing age slightly reduces waiting time, but the effect is minimal and not statistically significant.
Graduation Year Category (2010 or greater) (IRR = 0.61, p = 0.473):
- Interpretation: Physicians who graduated in 2010 or later were associated with a 39% reduction in waiting time compared to those who graduated before 1990, though this result is not statistically significant.
International Medical Graduates (IRR = 1.02, p = 0.941):
- Interpretation: Being an international medical graduate is associated with a 2% increase in waiting time, though this effect is not statistically significant.

Random Effects and Marginal/Conditional R²:

Random Effects:
- Variance (NPI): 2.72 (indicating substantial variability between NPIs).
- Intraclass Correlation Coefficient (ICC): 0.99 (suggesting that 99% of the variation in waiting times not explained by the fixed effects is due to differences between NPIs).
Marginal R² (0.554): This means that 55.4% of the variance in waiting time is explained by the fixed effects in the model, such as state, the Medicaid-to-Medicare fee index, and hold time.
Conditional R² (0.996): When accounting for both fixed effects and random effects (NPI variability), the model explains 99.6% of the total variance in waiting times.

Summary: The mixed-effects model shows that a few predictors, such as the Medicaid-to-Medicare fee index and several states, have a statistically significant effect on waiting time, while hold time, gender, age, medical school type, and graduation year show non-significant effects in this model. The high ICC (0.99) indicates that the majority of the variability in waiting times is due to differences between providers (NPIs), and the conditional R² (0.996) suggests that the model explains nearly all of the variance once both fixed and random effects are included.

	business days until appointment
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	1550.86	31.16 – 77193.45	<0.001
basic gender [M]	0.73	0.45 – 1.16	0.182
taxonomies state [Alaska]	0.00	0.00 – Inf	0.989
taxonomies state [Arizona]	0.07	0.01 – 0.72	0.025
taxonomies state [Arkansas]	0.04	0.00 – 0.81	0.036
taxonomies state [California]	0.16	0.02 – 1.23	0.078
taxonomies state [Colorado]	0.20	0.02 – 2.10	0.179
taxonomies state [Connecticut]	0.17	0.02 – 1.51	0.113
taxonomies state [District of Columbia]	0.24	0.02 – 3.75	0.310
taxonomies state [Florida]	0.11	0.01 – 1.00	0.050
taxonomies state [Georgia]	0.20	0.02 – 1.76	0.147
taxonomies state [Hawaii]	0.04	0.00 – 0.53	0.015
taxonomies state [Idaho]	2.45	0.05 – 110.61	0.646
taxonomies state [Illinois]	0.08	0.01 – 0.74	0.026
taxonomies state [Indiana]	0.08	0.01 – 1.07	0.056
taxonomies state [Iowa]	0.14	0.01 – 1.77	0.127
taxonomies state [Kansas]	0.15	0.01 – 2.03	0.154
taxonomies state [Kentucky]	0.04	0.00 – 0.46	0.010
taxonomies state [Louisiana]	0.11	0.01 – 1.28	0.078
taxonomies state [Maine]	0.00	0.00 – Inf	0.988
taxonomies state [Maryland]	0.17	0.02 – 1.57	0.119
taxonomies state [Massachusetts]	0.61	0.07 – 5.13	0.652
taxonomies state [Michigan]	0.13	0.01 – 1.26	0.078
taxonomies state [Minnesota]	0.19	0.02 – 1.85	0.153
taxonomies state [Mississippi]	0.61	0.06 – 5.92	0.673
taxonomies state [Missouri]	0.03	0.00 – 0.30	0.002
taxonomies state [Nebraska]	0.26	0.03 – 2.70	0.259
taxonomies state [Nevada]	0.18	0.02 – 1.84	0.149
taxonomies state [New Jersey]	0.05	0.00 – 0.56	0.015
taxonomies state [New Mexico]	0.31	0.01 – 6.36	0.445
taxonomies state [New York]	0.08	0.01 – 0.97	0.047
taxonomies state [North Carolina]	0.31	0.04 – 2.60	0.279
taxonomies state [Ohio]	0.11	0.01 – 1.04	0.054
taxonomies state [Oklahoma]	0.66	0.04 – 9.78	0.761
taxonomies state [Oregon]	0.27	0.02 – 3.00	0.285
taxonomies state [Pennsylvania]	0.12	0.01 – 1.04	0.054
taxonomies state [Puerto Rico]	0.34	0.02 – 7.17	0.487
taxonomies state [Rhode Island]	0.01	0.00 – 0.72	0.035
taxonomies state [South Carolina]	1.21	0.08 – 17.30	0.888
taxonomies state [Tennessee]	0.16	0.02 – 1.39	0.096
taxonomies state [Texas]	0.06	0.01 – 0.55	0.014
taxonomies state [Utah]	0.00	0.00 – Inf	0.981
taxonomies state [Vermont]	0.03	0.00 – 1.41	0.073
taxonomies state [Virginia]	0.08	0.01 – 0.82	0.033
taxonomies state [Washington]	0.01	0.00 – 0.13	<0.001
taxonomies state [West Virginia]	0.01	0.00 – 0.40	0.013
taxonomies state [Wisconsin]	0.05	0.00 – 0.49	0.011
taxonomies state [Wyoming]	0.02	0.00 – 1.10	0.055
hold time minutes	0.98	0.95 – 1.01	0.113
age	0.98	0.94 – 1.02	0.392
Medicaid to Medicare Fee Index	0.97	0.95 – 1.00	0.024
Med sch [International Medical Graduate]	1.02	0.59 – 1.78	0.941
Grd yr category [1990 to 1999]	0.89	0.42 – 1.88	0.758
Grd yr category [2000 to 2009]	0.75	0.28 – 2.05	0.577
Grd yr category [2010 or greater]	0.61	0.16 – 2.37	0.473
Random Effects
σ²	0.02
τ₀₀ _NPI	2.72
ICC	0.99
N _NPI	321
Observations	435
Marginal R² / Conditional R²	0.554 / 0.996
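
The ICC in the table can be reproduced approximately from the two variance components listed with it. This is a minimal sketch using the rounded values as printed (σ² = 0.02, τ₀₀ = 2.72), so the result matches the reported 0.99 only to rounding. The variable names are illustrative.

```r
# Variance components as printed (rounded) in the table above
tau00_npi <- 2.72   # between-physician (random intercept) variance
sigma2    <- 0.02   # residual variance

# Share of the random-part variance that lies between physicians
icc <- tau00_npi / (tau00_npi + sigma2)

print(round(icc, 2))
```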

Visualize the `poisson_significant` Model Fixed Effects

`poisson_significant` Model Performance

## We fitted a poisson mixed model (estimated using ML and BOBYQA optimizer) to
## predict business_days_until_appointment with basic_gender, gender,
## taxonomies_state, hold_time_minutes, age, Medicaid_to_Medicare_Fee_Index,
## Med_sch and Grd_yr_category (formula: business_days_until_appointment ~
## basic_gender + gender + taxonomies_state + hold_time_minutes + age +
## Medicaid_to_Medicare_Fee_Index + Med_sch + Grd_yr_category). The model included
## NPI as random effect (formula: ~1 | NPI). The model's total explanatory power
## is substantial (conditional R2 = 1.00) and the part related to the fixed
## effects alone (marginal R2) is of 0.55. The model's intercept, corresponding to
## basic_gender = F, gender = Female, taxonomies_state = Alabama,
## hold_time_minutes = 0, age = 0, Medicaid_to_Medicare_Fee_Index = 0, Med_sch =
## US Senior Medical Student and Grd_yr_category = Less than 1990, is at 7.35 (95%
## CI [3.44, 11.25], p < .001). Within this model:
## 
##   - The effect of basic gender [M] is statistically non-significant and negative
## (beta = -0.32, 95% CI [-0.79, 0.15], p = 0.182; Std. beta = -0.32, 95% CI
## [-0.79, 0.15])
##   - The effect of taxonomies state [Alaska] is statistically non-significant and
## negative (beta = -17.67, 95% CI [-2518.11, 2482.76], p = 0.989; Std. beta =
## -17.67, 95% CI [-2518.11, 2482.76])
##   - The effect of taxonomies state [Arizona] is statistically significant and
## negative (beta = -2.62, 95% CI [-4.91, -0.33], p = 0.025; Std. beta = -2.62,
## 95% CI [-4.91, -0.33])
##   - The effect of taxonomies state [Arkansas] is statistically significant and
## negative (beta = -3.18, 95% CI [-6.15, -0.22], p = 0.036; Std. beta = -3.18,
## 95% CI [-6.15, -0.22])
##   - The effect of taxonomies state [California] is statistically non-significant
## and negative (beta = -1.82, 95% CI [-3.85, 0.20], p = 0.078; Std. beta = -1.82,
## 95% CI [-3.85, 0.20])
##   - The effect of taxonomies state [Colorado] is statistically non-significant
## and negative (beta = -1.61, 95% CI [-3.96, 0.74], p = 0.179; Std. beta = -1.61,
## 95% CI [-3.96, 0.74])
##   - The effect of taxonomies state [Connecticut] is statistically non-significant
## and negative (beta = -1.74, 95% CI [-3.90, 0.41], p = 0.113; Std. beta = -1.74,
## 95% CI [-3.90, 0.41])
##   - The effect of taxonomies state [District of Columbia] is statistically
## non-significant and negative (beta = -1.42, 95% CI [-4.16, 1.32], p = 0.310;
## Std. beta = -1.42, 95% CI [-4.16, 1.32])
##   - The effect of taxonomies state [Florida] is statistically significant and
## negative (beta = -2.20, 95% CI [-4.40, -3.89e-03], p = 0.050; Std. beta =
## -2.20, 95% CI [-4.40, -3.89e-03])
##   - The effect of taxonomies state [Georgia] is statistically non-significant and
## negative (beta = -1.60, 95% CI [-3.77, 0.56], p = 0.147; Std. beta = -1.60, 95%
## CI [-3.77, 0.56])
##   - The effect of taxonomies state [Hawaii] is statistically significant and
## negative (beta = -3.29, 95% CI [-5.95, -0.64], p = 0.015; Std. beta = -3.29,
## 95% CI [-5.95, -0.64])
##   - The effect of taxonomies state [Idaho] is statistically non-significant and
## positive (beta = 0.89, 95% CI [-2.92, 4.71], p = 0.646; Std. beta = 0.89, 95%
## CI [-2.92, 4.71])
##   - The effect of taxonomies state [Illinois] is statistically significant and
## negative (beta = -2.55, 95% CI [-4.79, -0.30], p = 0.026; Std. beta = -2.55,
## 95% CI [-4.79, -0.30])
##   - The effect of taxonomies state [Indiana] is statistically non-significant and
## negative (beta = -2.51, 95% CI [-5.09, 0.07], p = 0.056; Std. beta = -2.51, 95%
## CI [-5.09, 0.07])
##   - The effect of taxonomies state [Iowa] is statistically non-significant and
## negative (beta = -2.00, 95% CI [-4.56, 0.57], p = 0.127; Std. beta = -2.00, 95%
## CI [-4.56, 0.57])
##   - The effect of taxonomies state [Kansas] is statistically non-significant and
## negative (beta = -1.89, 95% CI [-4.48, 0.71], p = 0.154; Std. beta = -1.89, 95%
## CI [-4.48, 0.71])
##   - The effect of taxonomies state [Kentucky] is statistically significant and
## negative (beta = -3.25, 95% CI [-5.74, -0.77], p = 0.010; Std. beta = -3.25,
## 95% CI [-5.74, -0.77])
##   - The effect of taxonomies state [Louisiana] is statistically non-significant
## and negative (beta = -2.23, 95% CI [-4.70, 0.25], p = 0.078; Std. beta = -2.23,
## 95% CI [-4.70, 0.25])
##   - The effect of taxonomies state [Maine] is statistically non-significant and
## negative (beta = -19.60, 95% CI [-2520.04, 2480.83], p = 0.988; Std. beta =
## -19.60, 95% CI [-2520.04, 2480.83])
##   - The effect of taxonomies state [Maryland] is statistically non-significant
## and negative (beta = -1.75, 95% CI [-3.94, 0.45], p = 0.119; Std. beta = -1.75,
## 95% CI [-3.94, 0.45])
##   - The effect of taxonomies state [Massachusetts] is statistically
## non-significant and negative (beta = -0.49, 95% CI [-2.61, 1.64], p = 0.652;
## Std. beta = -0.49, 95% CI [-2.61, 1.64])
##   - The effect of taxonomies state [Michigan] is statistically non-significant
## and negative (beta = -2.02, 95% CI [-4.27, 0.23], p = 0.078; Std. beta = -2.02,
## 95% CI [-4.27, 0.23])
##   - The effect of taxonomies state [Minnesota] is statistically non-significant
## and negative (beta = -1.66, 95% CI [-3.94, 0.61], p = 0.153; Std. beta = -1.66,
## 95% CI [-3.94, 0.61])
##   - The effect of taxonomies state [Mississippi] is statistically non-significant
## and negative (beta = -0.49, 95% CI [-2.76, 1.78], p = 0.673; Std. beta = -0.49,
## 95% CI [-2.76, 1.78])
##   - The effect of taxonomies state [Missouri] is statistically significant and
## negative (beta = -3.40, 95% CI [-5.59, -1.21], p = 0.002; Std. beta = -3.40,
## 95% CI [-5.59, -1.21])
##   - The effect of taxonomies state [Nebraska] is statistically non-significant
## and negative (beta = -1.35, 95% CI [-3.69, 0.99], p = 0.259; Std. beta = -1.35,
## 95% CI [-3.69, 0.99])
##   - The effect of taxonomies state [Nevada] is statistically non-significant and
## negative (beta = -1.71, 95% CI [-4.04, 0.61], p = 0.149; Std. beta = -1.71, 95%
## CI [-4.04, 0.61])
##   - The effect of taxonomies state [New Jersey] is statistically significant and
## negative (beta = -3.05, 95% CI [-5.51, -0.58], p = 0.015; Std. beta = -3.05,
## 95% CI [-5.51, -0.58])
##   - The effect of taxonomies state [New Mexico] is statistically non-significant
## and negative (beta = -1.18, 95% CI [-4.22, 1.85], p = 0.445; Std. beta = -1.18,
## 95% CI [-4.22, 1.85])
##   - The effect of taxonomies state [New York] is statistically significant and
## negative (beta = -2.54, 95% CI [-5.05, -0.04], p = 0.047; Std. beta = -2.54,
## 95% CI [-5.05, -0.04])
##   - The effect of taxonomies state [North Carolina] is statistically
## non-significant and negative (beta = -1.18, 95% CI [-3.31, 0.96], p = 0.279;
## Std. beta = -1.18, 95% CI [-3.31, 0.96])
##   - The effect of taxonomies state [Ohio] is statistically non-significant and
## negative (beta = -2.16, 95% CI [-4.37, 0.04], p = 0.054; Std. beta = -2.16, 95%
## CI [-4.37, 0.04])
##   - The effect of taxonomies state [Oklahoma] is statistically non-significant
## and negative (beta = -0.42, 95% CI [-3.12, 2.28], p = 0.761; Std. beta = -0.42,
## 95% CI [-3.12, 2.28])
##   - The effect of taxonomies state [Oregon] is statistically non-significant and
## negative (beta = -1.32, 95% CI [-3.74, 1.10], p = 0.285; Std. beta = -1.32, 95%
## CI [-3.74, 1.10])
##   - The effect of taxonomies state [Pennsylvania] is statistically
## non-significant and negative (beta = -2.15, 95% CI [-4.33, 0.04], p = 0.054;
## Std. beta = -2.15, 95% CI [-4.33, 0.04])
##   - The effect of taxonomies state [Puerto Rico] is statistically non-significant
## and negative (beta = -1.08, 95% CI [-4.13, 1.97], p = 0.487; Std. beta = -1.08,
## 95% CI [-4.13, 1.97])
##   - The effect of taxonomies state [Rhode Island] is statistically significant
## and negative (beta = -4.64, 95% CI [-8.95, -0.33], p = 0.035; Std. beta =
## -4.64, 95% CI [-8.95, -0.33])
##   - The effect of taxonomies state [South Carolina] is statistically
## non-significant and positive (beta = 0.19, 95% CI [-2.47, 2.85], p = 0.888;
## Std. beta = 0.19, 95% CI [-2.47, 2.85])
##   - The effect of taxonomies state [Tennessee] is statistically non-significant
## and negative (beta = -1.85, 95% CI [-4.03, 0.33], p = 0.096; Std. beta = -1.85,
## 95% CI [-4.03, 0.33])
##   - The effect of taxonomies state [Texas] is statistically significant and
## negative (beta = -2.86, 95% CI [-5.14, -0.59], p = 0.014; Std. beta = -2.86,
## 95% CI [-5.14, -0.59])
##   - The effect of taxonomies state [Utah] is statistically non-significant and
## negative (beta = -19.50, 95% CI [-1593.31, 1554.30], p = 0.981; Std. beta =
## -19.50, 95% CI [-1593.31, 1554.30])
##   - The effect of taxonomies state [Vermont] is statistically non-significant and
## negative (beta = -3.68, 95% CI [-7.70, 0.34], p = 0.073; Std. beta = -3.68, 95%
## CI [-7.70, 0.34])
##   - The effect of taxonomies state [Virginia] is statistically significant and
## negative (beta = -2.52, 95% CI [-4.84, -0.20], p = 0.033; Std. beta = -2.52,
## 95% CI [-4.84, -0.20])
##   - The effect of taxonomies state [Washington] is statistically significant and
## negative (beta = -4.68, 95% CI [-7.27, -2.08], p < .001; Std. beta = -4.68, 95%
## CI [-7.27, -2.08])
##   - The effect of taxonomies state [West Virginia] is statistically significant
## and negative (beta = -4.32, 95% CI [-7.72, -0.91], p = 0.013; Std. beta =
## -4.32, 95% CI [-7.72, -0.91])
##   - The effect of taxonomies state [Wisconsin] is statistically significant and
## negative (beta = -3.04, 95% CI [-5.37, -0.70], p = 0.011; Std. beta = -3.04,
## 95% CI [-5.37, -0.70])
##   - The effect of taxonomies state [Wyoming] is statistically non-significant and
## negative (beta = -4.18, 95% CI [-8.46, 0.09], p = 0.055; Std. beta = -4.18, 95%
## CI [-8.46, 0.09])
##   - The effect of hold time minutes is statistically non-significant and negative
## (beta = -0.02, 95% CI [-0.05, 5.70e-03], p = 0.113; Std. beta = -0.03, 95% CI
## [-0.08, 7.93e-03])
##   - The effect of age is statistically non-significant and negative (beta =
## -0.02, 95% CI [-0.06, 0.02], p = 0.392; Std. beta = -0.20, 95% CI [-0.66,
## 0.26])
##   - The effect of Medicaid to Medicare Fee Index is statistically significant and
## negative (beta = -0.03, 95% CI [-0.05, -3.90e-03], p = 0.024; Std. beta =
## -0.46, 95% CI [-0.87, -0.06])
##   - The effect of Med sch [International Medical Graduate] is statistically
## non-significant and positive (beta = 0.02, 95% CI [-0.53, 0.58], p = 0.941;
## Std. beta = 0.02, 95% CI [-0.53, 0.58])
##   - The effect of Grd yr category [1990 to 1999] is statistically non-significant
## and negative (beta = -0.12, 95% CI [-0.87, 0.63], p = 0.758; Std. beta = -0.12,
## 95% CI [-0.87, 0.63])
##   - The effect of Grd yr category [2000 to 2009] is statistically non-significant
## and negative (beta = -0.28, 95% CI [-1.29, 0.72], p = 0.577; Std. beta = -0.28,
## 95% CI [-1.29, 0.72])
##   - The effect of Grd yr category [2010 or greater] is statistically
## non-significant and negative (beta = -0.50, 95% CI [-1.86, 0.86], p = 0.473;
## Std. beta = -0.50, 95% CI [-1.86, 0.86])
## 
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald z-distribution approximation.

## The marginal R² value of the model is 0.554 and the conditional R² value is 0.996

## The marginal R² represents the proportion of variance explained by the fixed effects ( (Intercept), basic_genderM, taxonomies_stateAlaska, taxonomies_stateArizona, taxonomies_stateArkansas, taxonomies_stateCalifornia, taxonomies_stateColorado, taxonomies_stateConnecticut, taxonomies_stateDistrict of Columbia, taxonomies_stateFlorida, taxonomies_stateGeorgia, taxonomies_stateHawaii, taxonomies_stateIdaho, taxonomies_stateIllinois, taxonomies_stateIndiana, taxonomies_stateIowa, taxonomies_stateKansas, taxonomies_stateKentucky, taxonomies_stateLouisiana, taxonomies_stateMaine, taxonomies_stateMaryland, taxonomies_stateMassachusetts, taxonomies_stateMichigan, taxonomies_stateMinnesota, taxonomies_stateMississippi, taxonomies_stateMissouri, taxonomies_stateNebraska, taxonomies_stateNevada, taxonomies_stateNew Jersey, taxonomies_stateNew Mexico, taxonomies_stateNew York, taxonomies_stateNorth Carolina, taxonomies_stateOhio, taxonomies_stateOklahoma, taxonomies_stateOregon, taxonomies_statePennsylvania, taxonomies_statePuerto Rico, taxonomies_stateRhode Island, taxonomies_stateSouth Carolina, taxonomies_stateTennessee, taxonomies_stateTexas, taxonomies_stateUtah, taxonomies_stateVermont, taxonomies_stateVirginia, taxonomies_stateWashington, taxonomies_stateWest Virginia, taxonomies_stateWisconsin, taxonomies_stateWyoming, hold_time_minutes, age, Medicaid_to_Medicare_Fee_Index, Med_schInternational Medical Graduate, Grd_yr_category1990 to 1999, Grd_yr_category2000 to 2009, Grd_yr_category2010 or greater ) alone ( 55.42 %). The conditional R² represents the proportion of variance explained by both the fixed effects and the random effects ( NPI ) combined ( 99.6 %). This indicates how much of the variability in the outcome can be attributed to the fixed effects versus the entire model, including random effects.

For the `poisson_significant` model: To determine which random effects matter in the model, look at the variance components for the random effects and their corresponding standard deviations. In mixed models, random effects do not have p-values like fixed effects do. Instead, their contribution is judged by the size of their variance. If the variance is near zero, the random effect may not be contributing much to the model.

Here’s how you can extract and interpret the variance of the random effects to assess their significance for poisson_significant:

## [1] "The random effects in the model are:\n NPI"             
## [2] "The random effects in the model are:\n (Intercept)"     
## [3] "The random effects in the model are:\n NA"              
## [4] "The random effects in the model are:\n 2.72086652327837"
## [5] "The random effects in the model are:\n 1.64950493278388"
## [6] "The random effects in the model are:\n Yes"

## The significant random effects are: NPI
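
The six values printed above look like the fields of a single variance-component row (group, term, variance, standard deviation, and a significance flag). Assuming that reading is correct, the sketch below rebuilds that row by hand and prints it as one labelled line. The data frame and its column names are illustrative assumptions, not the original code.

```r
# One variance-component row, with values copied from the output above.
# Column names are illustrative, not taken from the original code.
variance_components <- data.frame(
  group    = "NPI",
  term     = "(Intercept)",
  variance = 2.72086652327837,
  std_dev  = 1.64950493278388,
  stringsAsFactors = FALSE
)

# sprintf() builds one labelled line per row instead of one line per field
summary_line <- sprintf(
  "Random effect %s %s: variance = %.2f, SD = %.2f",
  variance_components$group,
  variance_components$term,
  variance_components$variance,
  variance_components$std_dev
)

print(summary_line)
```

The standard deviation is simply the square root of the variance, so either value is enough to judge how much physicians differ on the log scale.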

`simr_poisson_full_model` Model Power analysis

The power analysis, run with the `powerSim` function, estimates the statistical power of the model to detect the effect of a specific predictor, in this case `insurance`, in a Poisson mixed-effects model.

Test the `poisson_significant` model assumptions

Binned residuals were checked because the outcome is a count, so the residuals are not expected to be normally distributed. Collinearity and heteroscedasticity were also checked.

The residuals spread out more as the fitted values increase. This funnel shape (with wider dispersion of residuals at higher fitted values) is an indication of heteroscedasticity. In a model with homoscedasticity, the residuals would have a consistent spread across all levels of fitted values, without a clear pattern. Some increase in spread is expected here, because a Poisson model assumes the variance equals the mean, and because the outcome is a count not all binned residuals fall within the error bounds.

`poisson_significant` Collinearity

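Variance Inflation Factors (VIF) were calculated to assess multicollinearity among predictors. Raw GVIF values exceed the commonly used threshold of 5 for `age` (5.44), `Grd_yr_category` (8.24), and `taxonomies_state` (11.27), but raw GVIF grows with the number of degrees of freedom of a term. On the adjusted scale, GVIF^(1/(2*Df)), which is compared against the square root of the threshold (about 2.24), only `age` (2.33) is marginally above the cutoff. `gender` has Df = 0 and an infinite adjusted GVIF because it is perfectly collinear with `basic_gender`, which matches the rank-deficiency warning in the model summary, so one of the two terms is redundant.

Variance Inflation Factors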
	GVIF	Df	GVIF^(1/(2*Df))
basic_gender	1.418134	1	1.190854
gender	192731468365052084224.000000	0	Inf
taxonomies_state	11.268446	46	1.026676
hold_time_minutes	1.008470	1	1.004226
age	5.436610	1	2.331654
Medicaid_to_Medicare_Fee_Index	4.172260	1	2.042611
Med_sch	1.208378	1	1.099263
Grd_yr_category	8.241516	3	1.421241
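
The adjusted column can be recomputed from the GVIF and Df columns. This is a minimal sketch using the values from the table above. `gender` is left out because its Df of 0 makes the adjustment undefined, and the threshold of 5 is the rule of thumb mentioned in the text. The object and column names are illustrative.

```r
# GVIF and Df copied from the table above (gender omitted: Df = 0)
gvif_table <- data.frame(
  term = c("basic_gender", "taxonomies_state", "hold_time_minutes", "age",
           "Medicaid_to_Medicare_Fee_Index", "Med_sch", "Grd_yr_category"),
  GVIF = c(1.418134, 11.268446, 1.008470, 5.436610,
           4.172260, 1.208378, 8.241516),
  Df   = c(1, 46, 1, 1, 1, 1, 3),
  stringsAsFactors = FALSE
)

# Adjust for the number of coefficients each term contributes
gvif_table$adjusted <- gvif_table$GVIF^(1 / (2 * gvif_table$Df))

# The adjusted value is on the square-root scale, so compare it to sqrt(5)
vif_threshold <- 5
gvif_table$above_cutoff <- gvif_table$adjusted > sqrt(vif_threshold)

print(gvif_table)
```

For terms with a single degree of freedom the adjusted value is just the square root of the ordinary VIF, which is why `age` is flagged on both scales.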

## 34 outliers detected: cases 30, 58, 60, 68, 69, 94, 95, 100, 137, 145,
##   180, 190, 197, 198, 203, 209, 227, 247, 251, 252, 262, 263, 284, 295,
##   296, 298, 301, 344, 346, 347, 362, 408, 425, 428.
## - Based on the following method and threshold: cook (0.9).
## - For variable: (Whole model).

`poisson` Intraclass Correlation Coefficient

The Intraclass Correlation Coefficient (ICC) is a statistical measure used to evaluate the proportion of variance in a dependent variable that can be attributed to differences between groups or clusters. It is commonly used in the context of hierarchical or mixed models to quantify the degree of similarity within clusters.

## The intraclass correlation (ICC) of the model for the random effect group ' NPI ' is 0.731 .
## This indicates that 73.1 % of the variance in the outcome variable is attributable to differences between the NPI groups.

## 
##  This is a moderate ICC for the NPI group, suggesting that a considerable portion of the variance is due to differences between these groups.

An Intraclass Correlation Coefficient (ICC) of 0.731 for the group “physician NPI name” means that roughly three quarters of the variation in the outcome variable (e.g., business days until appointment) is attributable to differences between individual physicians, while the remaining quarter or so occurs within physicians, that is, between repeated calls to the same physician.

In practical terms, this indicates that:

Variation Between Physicians: Most of the variance lies here. Some physicians systematically have longer or shorter wait times, and this consistency accounts for the bulk of the variance in the data.
Variation Within Physicians: About 27% of the variance remains within the same physician. This could be due to a variety of factors, such as the type of insurance, the scenario, or other factors that are not captured by the physician’s identity alone.
Implications: The identity of the physician (as indicated by the NPI name) is the largest single source of differences in appointment times, but it is not the only one. Other factors, potentially those captured by fixed effects or residual variance, also play a role.

In summary, who the physician is matters a great deal, and other variables account for the remaining within-physician variability. This insight can guide a closer look at those other factors, or at whether there are ways to reduce variability within physicians, such as through standardized scheduling practices.

`poisson_significant` Dispersion

Overdispersion in your model implies that the variability in the observed data is greater than what the model predicts under the Poisson assumption. Specifically, in a Poisson model, the mean and variance of the count data are assumed to be equal.

## [1] "Significant overdispersion detected. Consider using a Negative Binomial model or adding random effects to account for overdispersion."

## Warning: Autocorrelated residuals detected (p < .001).

## [1] FALSE
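The equal mean and variance assumption can be illustrated with simulated counts. This sketch uses hypothetical data, not the study data: it compares the variance-to-mean ratio of Poisson counts with that of negative binomial counts that share the same mean. The variable names are illustrative.

```r
# Hypothetical simulated counts (not the study data).
set.seed(123)
n <- 10000
poisson_counts <- rpois(n, lambda = 5)
# Negative binomial with the same mean; its variance is mu + mu^2 / size.
overdispersed_counts <- rnbinom(n, mu = 5, size = 1)

# Variance-to-mean ratio: about 1 under Poisson, above 1 when overdispersed.
ratio_poisson <- var(poisson_counts) / mean(poisson_counts)
ratio_overdispersed <- var(overdispersed_counts) / mean(overdispersed_counts)

print(c(poisson = ratio_poisson, overdispersed = ratio_overdispersed))
```

A ratio well above 1 in real count data is the pattern that motivates a Negative Binomial model or additional random effects.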

To test this assumption, you can use the logLik function to get the log-likelihood of the model and calculate the residual deviance as -2 * logLik(model). The residual degrees of freedom can be computed as the number of observations minus the number of parameters estimated (which includes both fixed effects and random effects).

The number of parameters estimated can be calculated as the number of fixed effects plus the number of random effects parameters. The number of fixed effects can be obtained from the length of fixef(model), and the number of random effects parameters can be obtained from the length of VarCorr(model). Note that length(VarCorr(model)) counts grouping factors, so it equals the number of variance parameters only when each grouping factor has a single random intercept.

The dispersion parameter is the residual deviance divided by the residual degrees of freedom. If it is considerably greater than 1, it indicates overdispersion. If it is less than 1, it indicates underdispersion. A value around 1 is considered ideal for Poisson regression. Because -2 * logLik(model) is not measured against a saturated model, the sum of squared Pearson residuals is often used in the numerator instead.

## 'log Lik.' 8.435094 (df=56)
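The dispersion calculation described above can be sketched as follows. The example assumes the `lme4` package and fits a Poisson mixed model to hypothetical simulated data; the data frame `d`, the grouping factor `g`, and the model itself are illustrative stand-ins, not the study's `poisson_significant` model. It computes the ratio from -2 * logLik and also from Pearson residuals, a commonly used alternative.

```r
library(lme4)

# Hypothetical simulated data (not the study data): 100 groups, 4 counts each.
set.seed(42)
d <- data.frame(
  g = factor(rep(1:100, each = 4)),
  x = rnorm(400)
)
group_effect <- rnorm(100, sd = 0.5)
d$y <- rpois(400, lambda = exp(1 + 0.3 * d$x + group_effect[as.integer(d$g)]))

model <- glmer(y ~ x + (1 | g), family = poisson, data = d)

# Parameter count: fixed effects plus one variance per grouping factor
# (valid here because the only random effect is a single random intercept).
n_params <- length(fixef(model)) + length(VarCorr(model))
resid_df <- nobs(model) - n_params

# Ratio based on -2 * logLik.
dispersion_loglik <- as.numeric(-2 * logLik(model)) / resid_df

# Alternative: sum of squared Pearson residuals over the residual df.
dispersion_pearson <- sum(residuals(model, type = "pearson")^2) / resid_df

print(c(loglik_based = dispersion_loglik, pearson_based = dispersion_pearson))
```

Because the simulated counts really are Poisson given the random intercept, the Pearson-based ratio should land near 1. The two ratios need not agree, since -2 * logLik is on a different scale from a deviance measured against a saturated model.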

Linearity of the log link

The Poisson regression assumes that the log of the expected count is a linear function of the predictors. One way to check this is to plot the observed counts versus the predicted counts and see if the relationship looks linear.
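A minimal sketch of that check, using base R and hypothetical simulated data rather than the study data, is shown here. It fits a plain Poisson `glm` for simplicity; with a mixed model, `fitted()` on the fitted object would play the same role. The variable names are illustrative.

```r
# Hypothetical simulated data (not the study data).
set.seed(2024)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))

fit <- glm(y ~ x, family = poisson)
predicted <- fitted(fit)  # predicted counts on the response scale

# Observed versus predicted counts; points should scatter around the 1:1 line.
plot(predicted, y,
     xlab = "Predicted count", ylab = "Observed count",
     main = "Observed vs. predicted counts")
abline(a = 0, b = 1, lty = 2)
```

Systematic curvature away from the dashed line would suggest that the log of the expected count is not linear in the predictors.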

DRAFT: Urgent GYN Issue Mystery Caller Study

Tyler M. Muffly, MD and Melanie Mandell

13 November, 2024

tyler install

Read in data

Quality Check the Data

Are there any physicians included more than twice?

Variables of those physicians included more than twice?

Find physicians called more than three times

Do they have business_days_until_appointment greater than zero but are an excluded category?

Do they have NA for business_days_until_appointment but are “Included” in the Reasons for exclusion category?

Data Munging

Create Median Household Income Quantiles

Zip Analysis

National percentage of physicians in most affluent ZIP Codes

Results

Check data normality

Appointment Accessibility

Overall distribution of calls between BCBS and Medicaid

Insurance Acceptance Rates

Individual Insurance Rates of Successfully Making an Appointment

Told to seek Emergency Care

May not need this

Age Physician Description

Gender Physician Description

Exclusions

Visualizing Each Individual Predictor

Business days by insurance

Log Business Days

told_to_go_to_the_emergency_depa for emergency scenario types

Emergency vs Urgent scenario types

Day of the week by insurance

Central Appointment Line by Insurance

Physician Gender by Insurance

Physician MD vs. DO by Insurance

Scenario

Descriptive Tables

Table 1 - Split across Insurances

Wait Time by Insurance Figures

Line Plot

Plot interpretation in R Markdown

Scatter Plot

Density Plot

Wait Time by Scenario Figures

Line Plot

Scatter Plot

Density Plot

Understanding a Density Plot:

Statistical testing

Combined plot of Subspecialty and Insurance

Wait Time

Poisson Predicted Wait Times

Is there a difference in wait times by insurance?

Scenarios for Variable Selection

Number of offices with each of the four scenarios successfully contacted: business_days_until_appointment ~ scenario

Insurance

Full Poisson Model poisson_full_model

Single predictor models for poisson_full_model

Troubleshooting large IRR for academic

Rerun poisson_full_model by removing academic

Robust LMM with log_business_days_until_appointments with academic

Model poisson_significant Formula with only significant variables

poisson Model with only significant variables

Table of poisson_significant Model Coefficients

Visualize the poisson_significant model Fixed Effects

poisson_significant Model Performance

simr_poisson_full_model Model Power analysis

Test the poisson_significant model assumptions

poisson_significant Collinearity

poisson Intraclass Correlation Coefficient

poisson_significant Dispersion

Linearity of the log link
