Introduction: Understanding Medicare Physician and Practitioner Services

Part 0: Loading Libraries and the Dataset

As part of this study, R libraries such as tidyverse will be used for data manipulation and visualization. More specifically, the functions in ggplot2 and dplyr will be used frequently for plotting and reorganizing data. For consistency and clarity, the tidyverse library and the dataset itself will be loaded before descriptive analysis.
# Loading tidyverse for data manipulation and visualization
suppressPackageStartupMessages(library(tidyverse))

# Loading Medicare dataset for study (information is provided below)
medicare <- read.csv("~/Downloads/medicare_providers_2022.csv")

Part 2: Dataset Description

The Medicare Physician & Other Practitioners - by Geography and Service dataset selected for this study includes information on Medicare provider locations and services, overall geographic factors, and quantitative measures of the charges submitted by providers and the aid paid by Medicare.

Variable Renaming

Before analyzing the 15 variables in the dataset, the variables will be renamed for the purpose of clarity moving forward. This step is important to ensure that each variable name reflects its content and purpose in an accessible manner.

# Renaming medicare variables to accessible identifiers
medicare <- medicare %>%
  rename(
    provider_geo_level = Rndrng_Prvdr_Geo_Lvl,
    provider_geo_code = Rndrng_Prvdr_Geo_Cd,
    provider_geo_description = Rndrng_Prvdr_Geo_Desc,
    hcpcs_code = HCPCS_Cd,
    hcpcs_description = HCPCS_Desc,
    hcpcs_drug_indicator = HCPCS_Drug_Ind,
    place_of_service = Place_Of_Srvc,
    total_providers = Tot_Rndrng_Prvdrs,
    total_services = Tot_Srvcs,
    total_beneficiaries = Tot_Benes,
    total_beneficiary_day_services = Tot_Bene_Day_Srvcs,
    average_submitted_charge = Avg_Sbmtd_Chrg,
    average_medicare_allowed_amount = Avg_Mdcr_Alowd_Amt,
    average_medicare_payment_amount = Avg_Mdcr_Pymt_Amt,
    average_medicare_standardized_amount = Avg_Mdcr_Stdzd_Amt
  )
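
As a hedged aside, the same renaming could be driven by a named lookup vector, which keeps the mapping in one place and tolerates columns that happen to be absent. The sketch below is not part of the study pipeline: the data frame `toy` and the partial vector `lookup` are made-up illustrations.

```r
library(dplyr)

# Hypothetical two-row stand-in for the raw CMS file (values are made up)
toy <- data.frame(
  HCPCS_Cd = c("G0439", "J1885"),
  Tot_Srvcs = c(37, 29)
)

# Named vector in the form new_name = "old_name"; only part of the mapping is shown
lookup <- c(
  hcpcs_code = "HCPCS_Cd",
  total_services = "Tot_Srvcs",
  place_of_service = "Place_Of_Srvc"  # deliberately absent from toy
)

# any_of() renames the columns that exist and skips the ones that do not
toy_renamed <- toy %>% rename(any_of(lookup))
names(toy_renamed)
```

With `all_of()` in place of `any_of()`, a missing column would raise an error instead, which may be preferable when the file layout is expected to be fixed.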

Variable Descriptions

The following table includes the type (character or numeric), classification (binary, nominal, ordinal, discrete, or continuous), and a basic overview description of each variable in the dataset. Additional information regarding the implications of each variable is covered below this table. However, this table can be used as a reference to understand what each variable means throughout the study.

| Variable Name | Type | Classification | Description |
|---|---|---|---|
| Geographic Information | | | |
| provider_geo_level | character | Binary | Geographic levels (e.g., “National”, “State”) |
| provider_geo_code | character | Nominal | Geographic identifiers (e.g., FIPS codes) |
| provider_geo_description | character | Nominal | Descriptive names of geographic regions |
| Service Information | | | |
| hcpcs_code | character | Nominal | Healthcare Common Procedure Coding System code |
| hcpcs_description | character | Nominal | Descriptions of HCPCS codes |
| hcpcs_drug_indicator | character | Binary | Indicates if code represents a drug (‘Y’ or ‘N’) |
| place_of_service | character | Binary | Facility (‘F’) or non-facility (‘O’) service |
| Utilization Metrics | | | |
| total_providers | numeric | Discrete | Count of rendering providers |
| total_services | numeric | Discrete | Count of services provided |
| total_beneficiaries | numeric | Discrete | Count of unique beneficiaries |
| total_beneficiary_day_services | numeric | Discrete | Count of beneficiary day services |
| Financial Metrics | | | |
| average_submitted_charge | numeric | Continuous | Average charge submitted by providers |
| average_medicare_allowed_amount | numeric | Continuous | Average amount allowed by Medicare |
| average_medicare_payment_amount | numeric | Continuous | Average amount paid by Medicare |
| average_medicare_standardized_amount | numeric | Continuous | Average standardized Medicare payment |

The table above briefly details the 15 variables in the dataset. The sections below explain the key variables in more depth, focusing on HCPCS codes, place of service data, service and beneficiary metrics, and financial indicators.

HCPCS Code and Description

The Healthcare Common Procedure Coding System (HCPCS) codes stored in the hcpcs_code variable identify the service furnished by the provider. Level I codes (CPT codes) are maintained by the American Medical Association, and Level II codes are created by the Centers for Medicare & Medicaid Services for services that are not covered by CPT codes. The accompanying descriptions (hcpcs_description) are patient-friendly text describing the service provided.

Place of Service

This binary variable (place_of_service) takes one of two values: facility (‘F’) or non-facility (‘O’). Non-facility generally refers to an office setting but can include other entities as described by CMS.
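
For readability in tables and plots, the two codes could be mapped to labels. The following is a minimal sketch on a made-up vector (`pos` is hypothetical), not a step the study itself performs.

```r
# Hypothetical sample of place_of_service codes
pos <- c("O", "O", "F", "O")

# Map the two documented codes to readable labels
pos_label <- factor(pos, levels = c("F", "O"),
                    labels = c("Facility", "Non-facility"))

# Tabulate how often each setting appears
table(pos_label)
```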

Number of Services and Beneficiaries

The number of services (total_services) is what the variable name implies; however, the method for counting services can vary from service to service. The number of beneficiaries (total_beneficiaries) represents distinct individuals receiving the service. The total beneficiary days of service (total_beneficiary_day_services) counts multiple services provided to a beneficiary on the same day only once, which avoids double-counting.

Financial Metrics

The average charge submitted (average_submitted_charge) represents what providers bill for the service. The average amount allowed (average_medicare_allowed_amount) is the amount Medicare allows, including beneficiary responsibility and third-party payments. The average amount paid by Medicare (average_medicare_payment_amount) is what Medicare actually pays after deductibles and coinsurance. The average standardized Medicare payment (average_medicare_standardized_amount) is a standardized payment amount that removes geographic differences, which allows for more accurate comparisons across regions.
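
To make the relationship between these amounts concrete, the sketch below computes two derived quantities for three rows whose (rounded) values are copied from the str() output shown in this report. The column names `share_paid` and `non_medicare_portion` are illustrative assumptions and do not exist in the dataset.

```r
# Three example rows with rounded values taken from the str() output
fin <- data.frame(
  average_submitted_charge        = c(39.8, 226.4, 821.9),
  average_medicare_allowed_amount = c(0.564, 119.932, 178.407),
  average_medicare_payment_amount = c(0.375, 119.932, 142.263)
)

# Share of the billed charge that Medicare actually paid
fin$share_paid <- fin$average_medicare_payment_amount /
  fin$average_submitted_charge

# Part of the allowed amount not paid by Medicare
# (beneficiary responsibility and third-party payments)
fin$non_medicare_portion <- fin$average_medicare_allowed_amount -
  fin$average_medicare_payment_amount

fin
```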

Observations Analysis

Each observation (row) in the dataset represents aggregated service and payment information for one combination of geographic region (National or State), HCPCS code, and place of service, with the Medicare payment amounts for that procedure. A row does not describe a single beneficiary; it summarizes a service and the costs associated with it across all beneficiaries in that region. However, this data can still be used to infer possible cost-related impacts on Medicare beneficiaries.
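
One way to sanity-check this row definition is to confirm that no combination of geography, HCPCS code, and place of service appears more than once. The sketch below runs on a made-up data frame (`toy`) and assumes that provider_geo_description is enough to identify the geography.

```r
library(dplyr)

# Hypothetical rows using the renamed column names
toy <- data.frame(
  provider_geo_description = c("National", "National", "Texas", "Texas"),
  hcpcs_code = c("G0439", "G0439", "G0439", "J1885"),
  place_of_service = c("F", "O", "O", "O")
)

# Any combination that occurs more than once would violate the row definition
dupes <- toy %>%
  count(provider_geo_description, hcpcs_code, place_of_service) %>%
  filter(n > 1)

nrow(dupes)
```

A result of zero rows in `dupes` would be consistent with one row per combination.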

Implications of Observations

By analyzing many of these observations from the Medicare Physician & Other Practitioners - by Geography and Service dataset, trends in healthcare delivery, cost variations, and service utilization patterns within the Medicare system can be identified.

Part 3: Motivation for Study

This study is dedicated to my father, who frequently requires medical attention and relies on Medicare as his primary insurance. Through this analysis, I aim to discover insights that will not only benefit my family but also assist others in understanding the intricacies, advantages, and procedures associated with Medicare. I hope to empower individuals and families to navigate the complexities of the Medicare system more effectively.

Analyzing this dataset allows for insights in the following areas:

  1. Analyzing healthcare cost and aid disparities across various regions
  2. Identifying 2022 trends in medical services provided and top procedures
  3. Examining the relationship between provider payments and Medicare aid
  4. Exploring potential areas for healthcare policy and cost improvements

Part 4: Basic Summary Statistics

The following functions and table summarize key information regarding the dataset. Starting with built-in methods, the summary() function provides the minimum, median, mean, maximum, and quartiles for the quantitative variables in the dataset. The str() function then summarizes the dimensions of the dataset and the type of each variable. After this, additional R methods are used to determine the number of unique provider geographies, the number of unique HCPCS codes, and the region with the highest service utilization.

Summary Descriptions

The summary() function is used to derive information on the minimum, median, maximum, and quartiles for quantitative variables.

# Summary of dataset (minimum, maximum, median, mean, etc.)
summary(medicare)
##  provider_geo_level provider_geo_code  provider_geo_description
##  Length:270673      Length:270673      Length:270673           
##  Class :character   Class :character   Class :character        
##  Mode  :character   Mode  :character   Mode  :character        
##                                                                
##                                                                
##                                                                
##   hcpcs_code        hcpcs_description  hcpcs_drug_indicator place_of_service  
##  Length:270673      Length:270673      Length:270673        Length:270673     
##  Class :character   Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character   Mode  :character     Mode  :character  
##                                                                               
##                                                                               
##                                                                               
##  total_providers    total_beneficiaries total_services     
##  Min.   :     1.0   Min.   :      11    Min.   :       11  
##  1st Qu.:    11.0   1st Qu.:      30    1st Qu.:       40  
##  Median :    29.0   Median :     106    Median :      162  
##  Mean   :   266.8   Mean   :    5343    Mean   :    23595  
##  3rd Qu.:    95.0   3rd Qu.:     586    3rd Qu.:     1102  
##  Max.   :601911.0   Max.   :21459588    Max.   :103325664  
##  total_beneficiary_day_services average_submitted_charge
##  Min.   :      11               Min.   :    0.0         
##  1st Qu.:      38               1st Qu.:  127.0         
##  Median :     143               Median :  440.6         
##  Mean   :   10396               Mean   : 1309.7         
##  3rd Qu.:     831               3rd Qu.: 1606.7         
##  Max.   :90436622               Max.   :99509.8         
##  average_medicare_allowed_amount average_medicare_payment_amount
##  Min.   :    0.00                Min.   :    0.00               
##  1st Qu.:   35.13                1st Qu.:   27.78               
##  Median :  110.84                Median :   85.52               
##  Mean   :  291.80                Mean   :  232.25               
##  3rd Qu.:  315.65                3rd Qu.:  250.72               
##  Max.   :58494.73                Max.   :46612.60               
##  average_medicare_standardized_amount
##  Min.   :    0.00                    
##  1st Qu.:   27.60                    
##  Median :   85.03                    
##  Mean   :  230.06                    
##  3rd Qu.:  249.34                    
##  Max.   :46577.95

The str() function is used to derive the dimensions of the dataset and the types associated with each variable.

# Summary of dimensions and variables (data types and example data)
str(medicare)
## 'data.frame':    270673 obs. of  15 variables:
##  $ provider_geo_level                  : chr  "State" "State" "State" "State" ...
##  $ provider_geo_code                   : chr  "9E" "9E" "9E" "9E" ...
##  $ provider_geo_description            : chr  "Foreign Country" "Foreign Country" "Foreign Country" "Foreign Country" ...
##  $ hcpcs_code                          : chr  "J1885" "G0439" "G0416" "G0283" ...
##  $ hcpcs_description                   : chr  "Injection, ketorolac tromethamine, per 15 mg" "Annual wellness visit, includes a personalized prevention plan of service (pps), subsequent visit" "Surgical pathology, gross and microscopic examinations, for prostate needle biopsy, any method" "Electrical stimulation (unattended), to one or more areas for indication(s) other than wound care, as part of a"| __truncated__ ...
##  $ hcpcs_drug_indicator                : chr  "Y" "N" "N" "N" ...
##  $ place_of_service                    : chr  "O" "O" "F" "O" ...
##  $ total_providers                     : int  6 3 3 2 3 3 3 3 2 1 ...
##  $ total_beneficiaries                 : int  21 37 89 19 13 14 38 51 50 19 ...
##  $ total_services                      : num  29 37 89 93 13 14 38 54 347 41 ...
##  $ total_beneficiary_day_services      : int  23 37 89 93 13 14 38 54 347 36 ...
##  $ average_submitted_charge            : num  39.8 226.4 821.9 41.7 947.5 ...
##  $ average_medicare_allowed_amount     : num  0.564 119.932 178.407 9.294 178.261 ...
##  $ average_medicare_payment_amount     : num  0.375 119.932 142.263 7.025 178.261 ...
##  $ average_medicare_standardized_amount: num  0.373 128.837 139.189 7.065 183.48 ...

Summary Manual Information

Additional R Data Summary Techniques

The num_unique_geos variable is calculated to be the number of unique provider geographies from the provider_geo_description variable.

# Number of Unique Provider Geographies
num_unique_geos <- length(unique(medicare$provider_geo_description))
print(paste("Number of Unique Provider Geographies:", num_unique_geos))
## [1] "Number of Unique Provider Geographies: 63"

The num_unique_hcpcs variable is calculated to be the number of unique HCPCS codes from the hcpcs_code variable.

# Number of Unique HCPCS Codes
num_unique_hcpcs <- length(unique(medicare$hcpcs_code))
print(paste("Number of Unique HCPCS Codes:", num_unique_hcpcs))
## [1] "Number of Unique HCPCS Codes: 9231"
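
As an alternative sketch, both distinct counts could be produced in a single dplyr call with n_distinct(). The data frame `toy` below is made up for illustration.

```r
library(dplyr)

# Hypothetical rows using the renamed column names
toy <- data.frame(
  provider_geo_description = c("National", "Texas", "Texas", "Ohio"),
  hcpcs_code = c("G0439", "G0439", "J1885", "J1885")
)

# n_distinct(x) is equivalent to length(unique(x))
counts <- toy %>%
  summarize(
    num_unique_geos = n_distinct(provider_geo_description),
    num_unique_hcpcs = n_distinct(hcpcs_code)
  )

counts
```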

The service_by_state variable holds the sum of total services for each value of the provider_geo_description variable.

# Identify Geography with Highest Service Utilization
# (National-level rows are included, so the national aggregate ranks first)
service_by_state <- aggregate(medicare$total_services, FUN=sum,
                              by=list(Category=medicare$provider_geo_description))
highest_state <- service_by_state[which.max(service_by_state$x), ]
print(paste("State with Highest Service Utilization:", highest_state$Category))
## [1] "State with Highest Service Utilization: National"
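
Because the national aggregate will always rank first, a state-only ranking needs the National rows removed before summing. The sketch below shows one possible way to do that on a made-up data frame (`toy`); it is an assumption about how such a filter could look, not the method used in this report. Note that the str() output lists "Foreign Country" under the "State" level, so filtering on the level alone may still keep some non-state entries.

```r
library(dplyr)

# Hypothetical rows using the renamed column names
toy <- data.frame(
  provider_geo_level = c("National", "State", "State", "State"),
  provider_geo_description = c("National", "Texas", "Texas", "Ohio"),
  total_services = c(300, 120, 80, 100)
)

# Drop national rows, sum services per geography, keep the largest
highest_state_only <- toy %>%
  filter(provider_geo_level == "State") %>%
  group_by(provider_geo_description) %>%
  summarize(total_services_sum = sum(total_services), .groups = "drop") %>%
  slice_max(total_services_sum, n = 1)

highest_state_only
```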

Summary Table of Information

The following table includes summarized information regarding the dataset. From this, ideas regarding Medicare services and costs can be formulated such as how aid differs from the submitted payment amount from providers. Further analysis will be conducted on such potentially observed relationships.

| Metric | Value | Description |
|---|---|---|
| Total Number of Variables | 15 | Number of columns in the dataset |
| Total Number of Observations | 270,673 | Number of rows, each an aggregate for one geography, HCPCS code, and place of service |
| Avg. Submitted Charge per Service | $1,309.7 | Average charge initially billed by providers |
| Avg. Medicare Allowed Aid per Service | $291.80 | Average amount Medicare approves for payment |
| Avg. Medicare Standardized Aid | $230.06 | Average payment adjusted for geographic variations |
| Avg. Medicare Actual Aid | $232.25 | Average payment provided by Medicare |
| Number of Unique Provider Geographies | 63 | Count of distinct geographical areas included |
| Number of Unique HCPCS Codes | 9,231 | Count of different medical service codes in the dataset |
| State with Highest Service Utilization | National | Services aggregated at a National Level |

Summary Key Insights

  • Extensive Range of Services: The dataset has a wide spectrum of medical services represented by over 9,000 HCPCS codes
  • Geographic Scope Beyond States: Geographic data includes national-level aggregates, extending the analysis beyond individual states
  • Disparity Between Charges and Payments: There is a large gap between the average submitted charge ($1,309.7) and the average standardized payment amount ($230.06), which shows that what providers bill differs substantially from the aid Medicare provides

Visualizations: Mapping Medicare Service Distributions and Aid Across Geographies

Part 0: Overview of Visualizations for Medicare 2022 Data

This section covers three plots: one scatter plot and two box plots. Together, these charts reveal significant insights regarding Medicare aid and payments. The three charts are titled:

  • Average Submitted Charge vs. Average Medicare Aid Amount
  • Distribution of Average Medicare Submitted Amount for Top 10 Areas
  • Distribution of Average Medicare Aid Amount for Top 10 Areas

Part 1: Average Submitted Charge vs. Average Medicare Aid Amount

The “Average Submitted Charge vs. Average Medicare Aid Amount” plot shows the relationship between two quantitative variables, average_submitted_charge (x-axis) and average_medicare_payment_amount (y-axis). Two adjustments were made when plotting these variables. First, both axes are limited to $50,000, which hides extreme values from view. From calculations, only 45 cases have values above $50,000, which is too few to matter for evaluating overall trends. The axis limit only zooms the plot, so the line of best fit is still estimated from every row. Second, three lines are drawn on the plot. The solid black line, “y = 0.20x - 29.3”, is the line of best fit for the data. The two dashed lines are reference lines: green for “y = x” and orange for “y = 0.8x”.
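
The report does not show how the number of cases above $50,000 was obtained. One plausible way to count them is sketched below on a made-up data frame (`toy`); whether the original count used "either variable" or "both variables" is an assumption here.

```r
# Hypothetical rows using the renamed column names
toy <- data.frame(
  average_submitted_charge = c(120, 60000, 900, 99000),
  average_medicare_payment_amount = c(30, 20000, 250, 46000)
)

# Count rows where either amount exceeds the $50,000 plotting limit
n_above <- sum(toy$average_submitted_charge > 50000 |
                 toy$average_medicare_payment_amount > 50000)

n_above
```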

The slope of each line represents the share of the submitted charge that Medicare pays. For example, y = x corresponds to 100% coverage of the submitted charge. As the chart below shows, very few services are fully covered by Medicare. As a more realistic benchmark, the y = 0.8x line represents 80% coverage by Medicare, and few points lie on or above that line either. The line of best fit, with a slope of about 0.20, suggests that Medicare paid on average roughly 20 cents for each additional dollar charged across services in 2022.
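
As a worked illustration of how to read the fitted line, the sketch below plugs a few example charges into the rounded equation quoted above (y = 0.20x - 29.3). The charge values are arbitrary examples, and the coefficients are the rounded ones from the text, so the results are approximate.

```r
# Rounded coefficients as quoted in the text
slope <- 0.20
intercept <- -29.3

# Arbitrary example charges in dollars
charge <- c(500, 1000, 5000)

# Payment predicted by the fitted line, and the share of the charge it implies
predicted_payment <- intercept + slope * charge
implied_share <- predicted_payment / charge

data.frame(charge, predicted_payment, implied_share)
```

Because the intercept is negative, the implied share stays below the slope and approaches 20% only as the charge grows.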

# Loading Scales Library 
suppressPackageStartupMessages(library(scales))

# Fitting Linear Model
lm_fit <- lm(average_medicare_payment_amount ~ average_submitted_charge, data = medicare)
lm_coef <- coef(lm_fit)

# Average Submitted Charge vs. Average Medicare Aid Amount
ggplot(medicare, aes(average_submitted_charge, average_medicare_payment_amount)) +
  geom_point(alpha = 0.75, color = "navyblue", fill = "turquoise1", shape = 21, size = 2, stroke = 0.5) +
  geom_smooth(method = "lm", color = "black", se = FALSE) +
  geom_abline(intercept = 0, slope = 1, linetype = "longdash", color = "seagreen") +
  geom_abline(intercept = 0, slope = 0.80, linetype = "longdash", color = "darkorange") +
  # Labels, Annotations, and Coordinate Values
  labs(title = "Average Submitted Charge vs. Average Medicare Aid Amount",
       subtitle = "Comparison of submitted charges to Medicare aid",
       x = "Average Submitted Charge ($)",
       y = "Average Medicare Aid Amount ($)") +
  annotate("text", x = 45000, y = 45000, label = "y = x", color = "seagreen", hjust = -0.5, size = 4) +
  annotate("text", x = 45000, y = 36000, label = "y = 0.8x", color = "darkorange", hjust = -0.5, size = 4) +
  annotate("text", x = 40000, y = lm_coef[1] + lm_coef[2] * 40000, 
           label = sprintf("\ny = %.2fx %+.2f", lm_coef[2], lm_coef[1]), 
           color = "black", hjust = -0.1, size = 4) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray50"),
    panel.grid.minor = element_blank(),
    legend.position = "none"
  ) +
  scale_x_continuous(labels = dollar_format(scale = 1e-3, suffix = "K")) +
  scale_y_continuous(labels = dollar_format(scale = 1e-3, suffix = "K")) +
  coord_cartesian(xlim = c(0, 50000), ylim = c(0, 50000))
## `geom_smooth()` using formula = 'y ~ x'

Most cases have roughly 20% coverage whereas few services are fully covered by Medicare.

Part 2: Distribution of Average Medicare Submitted Amount for Top 10 Areas

The “Distribution of Average Medicare Submitted Amount for Top 10 Areas” box plot visualizes the distribution of average submitted charges across the top 10 geographic areas as determined by the total number of services provided (total_services). A few adjustments were made to the dataset in order to create this plot. First, the data was grouped by area, the total services were summed, and the top 10 areas were stored. The areas were then reordered by median charge for plotting. In addition, the average submitted charges were capped at $50,000; outliers within that range are still drawn.

From this box plot, it can be seen that the median (50th percentile) of the average submitted amount from providers lies between $550 and $1,000 for the top 10 regions in the dataset. The five regions with the highest average submitted charge are the National level, New York, California, Texas, and Florida. This data is particularly useful for comparison with the average aid paid out by Medicare, which is investigated in the next box plot.

# Find Top 10 Geographic Areas by Total Services
top_10_geo <- medicare %>%
  group_by(provider_geo_description) %>%
  summarize(total_services_sum = sum(total_services, na.rm = TRUE)) %>%
  arrange(desc(total_services_sum)) %>%
  head(10)

# Keep the Top 10 Geographic Areas with Charges Between $1 and $50,000
# (applied in one filter so the upper cap is not overwritten by a second assignment)
medicare_top_10 <- medicare %>%
  filter(provider_geo_description %in% top_10_geo$provider_geo_description,
         average_submitted_charge >= 1,
         average_submitted_charge <= 50000)

# Calculate median payment for each area and arrange
area_medians <- medicare_top_10 %>%
  group_by(provider_geo_description) %>%
  summarize(median_payment = median(average_submitted_charge, na.rm = TRUE)) %>%
  arrange(desc(median_payment))

# Reorder the Geographic Areas based on arranged medians
medicare_top_10 <- medicare_top_10 %>%
  mutate(provider_geo_description = factor(provider_geo_description, 
                                           levels = area_medians$provider_geo_description))

# Distribution of Average Medicare Submitted Amount for Top 10 Areas
ggplot(medicare_top_10, aes(provider_geo_description, average_submitted_charge)) +
  geom_boxplot(aes(fill = provider_geo_description), outlier.shape = 1, outlier.size = 0.5, outlier.alpha = 0.3) +
  scale_y_log10(labels = scales::dollar_format(accuracy = 1)) +
  coord_cartesian(ylim = c(1, 100000)) +
  labs(
    title = "Distribution of Average Medicare Submitted Amount for Top 10 Areas",
    subtitle = "Top areas based on total services provided, payments <= $50,000 (log scale)",
    x = "Geographic Area",
    y = "Average Submitted Amount ($) - Log Scale"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray50"),
    legend.position = "none"
  )

The National level, New York, California, Texas, and Florida have the highest average submitted amounts.

Part 3: Distribution of Average Medicare Aid Amount for Top 10 Areas

The “Distribution of Average Medicare Aid Amount for Top 10 Areas” box plot visualizes the distribution of average Medicare aid amounts across the top 10 geographic areas as determined by the total number of services provided (total_services). A few adjustments were made to the dataset in order to create this plot. First, the data was grouped by area, the total services were summed, and the top 10 areas were stored. The areas were then reordered by median aid amount for plotting. In addition, the average aid amounts were capped at $50,000; outliers within that range are still drawn.

From this box plot, it can be seen that the median (50th percentile) of the average aid amount provided by Medicare lies between $90 and $300 for the top 10 regions in the dataset. The five regions with the highest average aid amount are the National level, California, Florida, New York, and Texas. This data supports the observation made from the scatter plot: for most services, Medicare aid covers only between 1% and 20% of the submitted charge.

# Find Top 10 Geographic Areas by Total Services
top_10_geo_aid <- medicare %>%
  group_by(provider_geo_description) %>%
  summarize(total_services_sum = sum(total_services, na.rm = TRUE)) %>%
  arrange(desc(total_services_sum)) %>%
  head(10)

# Keep the Top 10 Geographic Areas with Aid Amounts Above $0 and At Most $50,000
# (zero values are dropped because they cannot be drawn on a log scale)
medicare_top_10_aid <- medicare %>%
  filter(provider_geo_description %in% top_10_geo_aid$provider_geo_description,
         average_medicare_payment_amount > 0,
         average_medicare_payment_amount <= 50000)

# Calculate median payment for each area and arrange
area_medians <- medicare_top_10_aid %>%
  group_by(provider_geo_description) %>%
  summarize(median_payment = median(average_medicare_payment_amount, na.rm = TRUE)) %>%
  arrange(desc(median_payment))

# Reorder the Geographic Areas based on arranged medians
medicare_top_10_aid <- medicare_top_10_aid %>%
  mutate(provider_geo_description = factor(provider_geo_description, 
                                           levels = area_medians$provider_geo_description))

# Distribution of Average Medicare Aid Amount for Top 10 Areas
ggplot(medicare_top_10_aid, aes(provider_geo_description, average_medicare_payment_amount)) +
  geom_boxplot(aes(fill = provider_geo_description), outlier.shape = 1, outlier.size = 0.5, outlier.alpha = 0.3) +
  scale_y_log10(labels = scales::dollar_format(accuracy = 1)) +
  coord_cartesian(ylim = c(1, 100000)) +
  labs(
    title = "Distribution of Average Medicare Aid Amount for Top 10 Areas",
    subtitle = "Top areas based on total services provided, payments <= $50,000 (log scale)",
    x = "Geographic Area",
    y = "Average Medicare Aid Amount ($) - Log Scale"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray50"),
    legend.position = "none"
  )

The National level, California, Florida, New York, and Texas have the highest average Medicare aid amounts.

Dataset Manipulations: Refining Medicare Provider Data for 2022

This section contains four parts covering the data manipulations performed on the original medicare dataset, the results of which are plotted as a single figure made up of four separate maps. The primary manipulations are filtering the unique geographic locations down to the 50 states of the United States and grouping the data by state. This section includes the following components:

  • Part 0: Loading Additional Libraries for Mapping
  • Part 1: Generating Subset Mappings for Each Map
  • Part 2: Creating Maps for Average Aid, Payment Amounts, and Beneficiary Data
  • Part 3: Multi-Dimensional Analysis of Medicare Services Across States

Part 0: Loading Additional Libraries for Mapping

This code block loads the necessary libraries for creating US maps and visualizations. The usmap package provides functions for plotting US maps, viridis offers color palettes, and patchwork allows for combining multiple plots. The last library, knitr, supports report rendering, where warning messages can be suppressed.

# Libraries for Mappings
suppressPackageStartupMessages(library(usmap))
suppressPackageStartupMessages(library(viridis))
suppressPackageStartupMessages(library(patchwork))
suppressPackageStartupMessages(library(knitr))

Part 1: Generating Subset Mappings for Each Map

This section prepares the data for mapping. It defines valid US states, filters the Medicare data to include only those states, and creates several summary datasets: the average Medicare payment amount by state, the average submitted charge amount by state, the beneficiary count by state (computed as a mean across service rows), and the average services per beneficiary by state.

# Creating Valid Regions Vector
geo_descriptions <- c("Wyoming", "Wisconsin", "West Virginia", "Washington", "Virginia", 
  "Vermont", "Utah", "Texas", "Tennessee", "South Dakota", "South Carolina", "Rhode Island", 
  "Oregon", "Oklahoma", "Ohio", "North Dakota", "North Carolina", "New York", 
  "New Mexico", "New Jersey", "New Hampshire", "Nevada", "Nebraska", "Montana", 
  "Missouri", "Mississippi", "Minnesota", "Michigan", "Massachusetts", "Maryland", 
  "Maine", "Louisiana", "Kentucky", "Kansas", "Iowa", "Indiana", "Illinois", 
  "Idaho", "Hawaii", "Georgia", "Florida", "Delaware", "Connecticut", "Colorado",
  "California", "Arkansas", "Arizona", "Alaska", "Alabama", "Pennsylvania")

# Constructing Medicare States Template Subset
medicare_states <- medicare %>%
  filter(provider_geo_description %in% geo_descriptions) %>%
  mutate(state = provider_geo_description)
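
As a hedged alternative to typing the state names by hand, base R ships a built-in vector, state.name, that holds the 50 state names. The sketch below only inspects that vector; the name `geo_descriptions_alt` is hypothetical, and whether the names match the dataset's provider_geo_description values exactly would need to be checked against the data.

```r
# state.name comes with base R (datasets package)
geo_descriptions_alt <- state.name

# Number of names available
length(geo_descriptions_alt)

# The District of Columbia and territories are not part of state.name
"District of Columbia" %in% geo_descriptions_alt
```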

# Subset 1: Average Medicare Aid Amount
state_summary <- medicare_states %>%
  group_by(state) %>%
  summarize(avg_payment = mean(average_medicare_payment_amount, na.rm = TRUE))

# Subset 2: Average Medicare Submitted Amount 
state_summary_payment <- medicare_states %>%
  group_by(state) %>%
  summarize(avg_submitted = mean(average_submitted_charge, na.rm = TRUE))

# Subset 3: Beneficiaries (mean of total_beneficiaries across service rows)
state_summary_beneficiaries <- medicare_states %>%
  group_by(state) %>%
  summarize(benefits = mean(total_beneficiaries, na.rm = TRUE))

# Subset 4: Services per Beneficiary (mean of row-level ratios)
state_summary_services <- medicare_states %>%
  group_by(state) %>%
  summarize(avg_services = mean(total_services / total_beneficiaries, na.rm = TRUE))
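
The services-per-beneficiary figure above is a mean of row-level ratios, which weights every service row equally regardless of volume. A volume-weighted alternative is the ratio of sums. The sketch below contrasts the two on a made-up data frame (`toy`); it illustrates the design choice rather than replacing the study's calculation. Summing total_beneficiaries across HCPCS codes may also count the same person more than once, so the ratio of sums is only an approximation.

```r
library(dplyr)

# Hypothetical state with one low-volume and one high-volume service row
toy <- data.frame(
  state = c("Ohio", "Ohio"),
  total_services = c(20, 1000),
  total_beneficiaries = c(10, 100)
)

compare <- toy %>%
  group_by(state) %>%
  summarize(
    # Every row counts equally
    mean_of_ratios = mean(total_services / total_beneficiaries),
    # High-volume rows carry more weight
    ratio_of_sums = sum(total_services) / sum(total_beneficiaries),
    .groups = "drop"
  )

compare
```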

Part 2: Creating Maps for Average Aid, Payment Amounts, and Beneficiary Data

This code block creates four separate US maps using the plot_usmap function including a map showing average Medicare aid by state, a map displaying average submitted amounts by state, a map illustrating total beneficiaries by state, and a map presenting average services per beneficiary by state.

# Map 1: Average Medicare Aid by State
map_payment <- plot_usmap(data = state_summary, values = "avg_payment", 
                          color = "white") +
  scale_fill_viridis_c(name = "Avg Medicare\nPayment ($)", 
                       labels = scales::dollar_format(), 
                       option = "plasma") +
  labs(title = "Average Medicare Aid by State") +
  theme(legend.position = "right")

# Map 2: Average Submitted Amount by State
map_submitted <- plot_usmap(data = state_summary_payment, values = "avg_submitted", 
                            color = "white") +
  scale_fill_viridis_c(name = "Avg Submitted\nAmount ($)", 
                       labels = scales::dollar_format(), 
                       option = "viridis") +
  labs(title = "Average Submitted Amount by State") +
  theme(legend.position = "right")

# Map 3: Total Beneficiaries by State
map_beneficiaries <- plot_usmap(data = state_summary_beneficiaries, values = "benefits", 
                                color = "white") +
  scale_fill_viridis_c(name = "Total\nBeneficiaries", option = "mako") +
  labs(title = "Total Beneficiaries by State") +
  theme(legend.position = "right")

# Map 4: Total Services per Beneficiary by State
map_services <- plot_usmap(data = state_summary_services, values = "avg_services", 
                           color = "white") +
  scale_fill_viridis_c(name = "Avg Services\nper Beneficiary", option = "turbo") +
  labs(title = "Average Services per Beneficiary by State") +
  theme(legend.position = "right")

Part 3: Multi-Dimensional Analysis of Medicare Services Across States

This final section combines the four individual maps into a single multi-panel visualization using the patchwork package. It arranges the maps in a 2x2 grid, adds an overall title, and adjusts the layout for optimal viewing.

Analyzing the relationship between average Medicare aid and average submitted amounts provides a nuanced understanding of cost coverage across the country. For instance, while states like California and New York may have higher Medicare aid than some smaller states, they also show a significant gap between that aid and their average submitted amounts. This suggests that many services are not fully covered within these states. Moreover, states with substantial Medicare populations, such as Florida and Texas, do not show the high utilization of Medicare services that their large senior demographics would suggest. This may reflect a reliance on private insurance or alternative healthcare systems, possibly influenced by the political leanings or demographic composition of these states. In contrast, Northeastern states like Connecticut demonstrate a propensity for using Medicare services, which contributes to a higher average services per beneficiary metric. The trends depicted here could be consistent with better healthcare infrastructure, a higher acceptance of Medicare benefits, or simply a larger proportion of elderly individuals dependent on Medicare.

# Multi-Dimensional Analysis of Medicare Services Across States
combined_map <- (map_payment + map_submitted) / (map_beneficiaries + map_services) +
  plot_layout(heights = c(4, 4)) +
  plot_annotation(
    title = "Multi-Dimensional Analysis of Medicare Services Across States",
    theme = theme(plot.title = element_text(size = 16, face = "bold"),
                  plot.subtitle = element_text(size = 12))
  )

# Displaying Combined Map
print(combined_map)

California has the highest average Medicare aid, but also one of the highest average submitted amounts.

Lower average submitted amounts and lower average services per beneficiary in a state may indicate challenges related to healthcare access or awareness among Medicare recipients. The higher utilization of services in states such as West Virginia, when contrasted with their comparatively lower average Medicare aid, points to a need for healthcare policies and resource allocation that support beneficiaries’ awareness. Viewed together on the maps, the four factors (aid, submitted amounts, beneficiaries, and utilization) provide a picture of healthcare efficiency, equity, and accessibility across the United States.
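The comparison between aid and submitted amounts described above can also be expressed numerically. The following is a minimal sketch, not the analysis used for the maps: the data frame, its column names, and every value in it are hypothetical placeholders chosen for illustration, and it assumes that dividing average aid by average submitted amount is a reasonable proxy for cost coverage.

```r
library(dplyr)

# Illustrative values only; these are NOT figures from the Medicare dataset.
# The state labels and column names are hypothetical.
state_toy <- data.frame(
  state         = c("A", "B", "C"),
  avg_payment   = c(80, 60, 45),     # hypothetical average Medicare aid ($)
  avg_submitted = c(400, 200, 150),  # hypothetical average submitted amount ($)
  stringsAsFactors = FALSE
)

# Share of the submitted amount covered by Medicare, and the dollar gap
state_coverage <- state_toy %>%
  mutate(
    coverage_ratio = avg_payment / avg_submitted,
    uncovered_gap  = avg_submitted - avg_payment
  ) %>%
  arrange(coverage_ratio)  # lowest coverage first

print(state_coverage)
```

Sorting by the ratio rather than by raw aid would separate states where aid is high in dollar terms from states where aid is high relative to what providers submit.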


Conclusion: Implications of 2022 Medicare Provider Data for Healthcare Policy

Introduction: Understanding Medicare Physician and Practitioner Services

The objective of this study was to examine patterns in Medicare services, the costs incurred, and differences in payments to care providers across geographic areas. The data collected by the Centers for Medicare & Medicaid Services provided a firm foundation for examining healthcare utilization and costs in different regions. The preliminary analyses showed differences between the reimbursements made by Medicare and the charges submitted by care providers, which indicated a possible financial strain on beneficiaries. By renaming the dataset's variables for greater clarity and consistency in presentation, a firm analytical perspective was developed for assessing the impact of Medicare on care access and policy concerns.

Visualizations: Mapping Medicare Service Distributions and Aid Across Geographies

The visual analysis of Medicare service distributions showed patterns in the provision of aid and differences in payment. The scatter plots and box plots showed that Medicare typically reimbursed about 20% of total costs, although there were occasional instances of full reimbursement. Geographic differences were also found: higher-cost states, such as New York and California, showed larger gaps between the amounts submitted by practitioners and the amounts paid by Medicare. The state-by-state breakdown, in contrast, showed that some regions, especially in the Northeast, had a greater number of services per beneficiary. In addition, other states with large numbers of Medicare beneficiaries did not reach the level of services that would be expected based on their population. The results point toward a need to consider regional differences in medical infrastructure and policy when assessing the efficiency of Medicare.

Dataset Manipulations: Refining Medicare Provider Data for 2022

To increase this study’s validity, a variety of data manipulation strategies were used, such as filtering provider locations to focus solely on the 50 U.S. states and aggregating the data into state-level service metrics. Mapping Medicare aid and payment data clarified differences in coverage between regions. Interestingly, despite receiving the highest Medicare aid, California also reported some of the largest submitted charges, which supports the trend of partial cost coverage. Areas with lower average submitted amounts and fewer services per beneficiary, especially parts of the Midwest, may face barriers to accessing care or a lack of awareness among Medicare beneficiaries. The subsetted and filtered datasets made possible a more detailed and nuanced understanding of the patterns of Medicare aid.
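As a recap of the filtering and aggregation steps summarized above, here is a minimal sketch on toy data. It uses the renamed variables from Part 2, but the rows and values are invented, and it assumes that state-level records are marked with "State" in provider_geo_level and that provider_geo_description holds full state names; both assumptions should be checked against the actual file. The object names toy and state_toy_summary are hypothetical.

```r
library(dplyr)

# Toy rows for illustration only; values are NOT from the Medicare dataset
toy <- data.frame(
  provider_geo_level       = c("National", "State", "State", "State", "State"),
  provider_geo_description = c("National", "Ohio", "Ohio", "Texas", "Puerto Rico"),
  total_services           = c(1000, 30, 50, 150, 40),
  total_beneficiaries      = c(500, 20, 20, 60, 25),
  stringsAsFactors = FALSE
)

state_toy_summary <- toy %>%
  # Keep state-level rows only, restricted to the 50 states
  # (state.name is a built-in base R vector of the 50 state names)
  filter(provider_geo_level == "State",
         provider_geo_description %in% state.name) %>%
  group_by(provider_geo_description) %>%
  summarise(
    services      = sum(total_services),
    beneficiaries = sum(total_beneficiaries),
    avg_services  = services / beneficiaries,  # services per beneficiary
    .groups = "drop"
  )

print(state_toy_summary)
```

The national row and the territory row are dropped by the filter, leaving one summary row per state.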

Next Steps: Future Research and Policy Considerations

This study sought to discover substantial patterns in Medicare provision, cost variability, and reimbursement differences for medical practitioners across geographic regions. Future studies may consider examining the role of medical specialties in producing these differences and analyzing the lasting influence of Medicare policy on access to medical care. For example, one limitation of the dataset was the lack of co-insurance data for beneficiaries. This made it difficult to determine whether Medicare coverage was low because beneficiaries held outside coverage. In addition, a study of how supplemental coverage complements Medicare aid may provide insights into how patients finance their care. Policy-making may be enhanced by examining ways of simplifying reimbursement systems, ultimately reducing disparities between physician charges and Medicare payments, which, in turn, can increase care access throughout the country. Continued study is aimed at further understanding Medicare’s critical position in healthcare provision in the United States.

This study of Medicare services, providers, and fees in 2022 highlights trends and anomalies that impact the healthcare landscape for senior Americans. Understanding these patterns is important for informing policy decisions, improving service delivery, and bridging gaps between Medicare and the public. Thank you for your time and dedication to exploring this data with me!