Impacts of Medicare Services on Americans: A 2022 Analysis of Trends and Anomalies

Introduction: Understanding Medicare Physician and Practitioner Services

Part 0: Dataset Library Loadings

As part of this study, R libraries such as tidyverse will be used for data manipulation and visualization. More specifically, the functions in ggplot2 and dplyr will be used frequently for plotting and reorganizing data. For consistency and clarity, the tidyverse library and the dataset itself will be loaded before descriptive analysis.

# Loading tidyverse for data manipulation and visualization
suppressPackageStartupMessages(library(tidyverse))

# Loading Medicare dataset for study (information is provided below)
medicare <- read.csv("~/Downloads/medicare_providers_2022.csv")

Part 1: Dataset Link and Overview

Dataset Links and Descriptions

The dataset selected for R data and statistical analysis in this study is sourced from the Centers for Medicare & Medicaid Services (CMS). Updated annually, this dataset represents the 2022 fiscal year and provides recent Medicare information on services and procedures for Original Medicare (fee-for-service) Part B (Medical Insurance) beneficiaries. The dataset is unique in its representation of information as it includes multiple geographic and service-related variables.

The Medicare Physician & Other Practitioners - by Geography and Service Overview link provided below includes a list of all the data available for Medicare physicians and other practitioners, sorted by year. The dataset used for this study is under the 2022 selection area. To access the data, an acceptance of usage policy may be required.

Medicare Physician & Other Practitioners - by Geography and Service Overview

The Medicare Physician & Other Practitioners - by Geography and Service Dataset link provided below is the most recent available dataset from 2022. The dataset can be exported in CSV or CSV for Excel formats; the former was used for data analysis in this document. This dataset is 42 MB in size and has 270,673 rows of data.

Medicare Physician & Other Practitioners - by Geography and Service Dataset

The Medicare Physician & Other Practitioners - by Geography and Service Data Dictionary link provided below includes detailed information on each variable. However, these variables and their implications will be covered in this study, which makes this source optional for reviewing as most information will be reiterated in this document.

Medicare Physician & Other Practitioners - by Geography and Service Data Dictionary

Part 2: Dataset Description

The Medicare Physician & Other Practitioners - by Geography and Service dataset selected for this study includes information regarding Medicare provider locations and services, overall geographic factors, and quantitative representations of payments submitted by providers and the aid paid to beneficiaries by Medicare.

Variable Renaming

Before analyzing the 15 variables in the dataset, the variables will be renamed for the purpose of clarity moving forward. This step is important to ensure that each variable name reflects its content and purpose in an accessible manner.

# Renaming medicare variables to accessible identifiers
medicare <- medicare %>%
  rename(
    provider_geo_level = Rndrng_Prvdr_Geo_Lvl,
    provider_geo_code = Rndrng_Prvdr_Geo_Cd,
    provider_geo_description = Rndrng_Prvdr_Geo_Desc,
    hcpcs_code = HCPCS_Cd,
    hcpcs_description = HCPCS_Desc,
    hcpcs_drug_indicator = HCPCS_Drug_Ind,
    place_of_service = Place_Of_Srvc,
    total_providers = Tot_Rndrng_Prvdrs,
    total_services = Tot_Srvcs,
    total_beneficiaries = Tot_Benes,
    total_beneficiary_day_services = Tot_Bene_Day_Srvcs,
    average_submitted_charge = Avg_Sbmtd_Chrg,
    average_medicare_allowed_amount = Avg_Mdcr_Alowd_Amt,
    average_medicare_payment_amount = Avg_Mdcr_Pymt_Amt,
    average_medicare_standardized_amount = Avg_Mdcr_Stdzd_Amt
  )

Variable Descriptions

The following table includes the type (character or numeric), classification (binary, nominal, ordinal, discrete, or continuous), and a basic overview description of each variable in the dataset. Additional information regarding the implications of each variable is covered below this table. However, this table can be used as a reference to understand what each variable means throughout the study.

Variable Name	Type	Classification	Description
Geographic Information
`provider_geo_level`	character	Binary	Geographic levels (e.g., “National”, “State”)
`provider_geo_code`	character	Nominal	Geographic identifiers (e.g., FIPS codes)
`provider_geo_description`	character	Nominal	Descriptive names of geographic regions
Service Information
`hcpcs_code`	character	Nominal	Healthcare Common Procedure Coding System
`hcpcs_description`	character	Nominal	Descriptions of HCPCS codes
`hcpcs_drug_indicator`	character	Binary	Indicates if code represents a drug (‘Y’ or ‘N’)
`place_of_service`	character	Binary	Facility (‘F’) or non-facility (‘O’) service
Utilization Metrics
`total_providers`	numeric	Discrete	Count of rendering providers
`total_services`	numeric	Discrete	Count of services provided
`total_beneficiaries`	numeric	Discrete	Count of unique beneficiaries
`total_beneficiary_day_services`	numeric	Discrete	Count of beneficiary day services
Financial Metrics
`average_submitted_charge`	numeric	Continuous	Average charge submitted by providers
`average_medicare_allowed_amount`	numeric	Continuous	Average amount allowed by Medicare
`average_medicare_payment_amount`	numeric	Continuous	Average amount paid by Medicare
`average_medicare_standardized_amount`	numeric	Continuous	Average standardized Medicare payment

The table above briefly details the 15 variables in the dataset. The subsections below explain key variables in more detail, focusing on HCPCS codes, place of service data, service and beneficiary metrics, and financial indicators.

HCPCS Code and Description

The Healthcare Common Procedure Coding System (HCPCS) codes stored in the hcpcs_code variable identify the service furnished by the provider. Level I codes (CPT codes) are maintained by the American Medical Association, and Level II codes are created by the Centers for Medicare & Medicaid Services for services that are not covered by CPT codes. The accompanying descriptions (hcpcs_description) are patient-friendly text describing the service provided.

Place of Service

This binary variable (place_of_service) takes one of two values: facility (‘F’) or non-facility (‘O’). Non-facility generally refers to an office setting but can include other entities as described by CMS.

Number of Services and Beneficiaries

The number of services (total_services) is what the variable name implies; however, the method for counting services can vary from service to service. The number of beneficiaries (total_beneficiaries) represents distinct individuals receiving the service, and the total beneficiary days of services (total_beneficiary_day_services) counts each beneficiary at most once per day, so multiple services of the same type provided to a beneficiary on the same day are not double-counted.
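The difference between the three utilization counts can be illustrated with a small base R sketch. The `claims` data frame below is entirely hypothetical (the CMS file contains only the aggregated counts, not claim lines), and summing `units` is an assumed counting rule, since the way services are counted can vary by service.

```r
# Hypothetical claim lines for one HCPCS code (made-up values, not CMS data)
claims <- data.frame(
  beneficiary = c("A", "A", "A", "B", "C"),
  service_date = as.Date(c("2022-01-05", "2022-01-05", "2022-02-10",
                           "2022-01-05", "2022-03-01")),
  units = c(1, 1, 1, 2, 1)
)

# Services: sum of billed units across all claim lines (assumed rule)
total_services <- sum(claims$units)

# Beneficiaries: distinct individuals
total_beneficiaries <- length(unique(claims$beneficiary))

# Beneficiary-day services: distinct (beneficiary, date) pairs
total_beneficiary_day_services <-
  nrow(unique(claims[, c("beneficiary", "service_date")]))

print(c(services = total_services,
        beneficiaries = total_beneficiaries,
        beneficiary_days = total_beneficiary_day_services))
```

Beneficiary A appears twice on the same day, so the pair is counted once in the beneficiary-day figure even though both lines contribute to the services count.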

Financial Metrics

The average charge submitted (average_submitted_charge) represents what providers bill for the service. The average amount allowed (average_medicare_allowed_amount) is the amount Medicare allows including beneficiary responsibility and third-party payments. The average amount paid by Medicare (average_medicare_payment_amount) is what Medicare actually pays after deductibles and coinsurance. The average standardized Medicare payment (average_medicare_standardized_amount) is a standardized payment amount that removes geographic differences which allows for more accurate comparisons across regions.
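As a rough worked example of how these amounts relate, the sketch below walks through one hypothetical service. All dollar figures are made up, and the sketch assumes the beneficiary's deductible has already been met, the standard 20% Part B coinsurance applies, and no third-party payer is involved.

```r
# Hypothetical single service; every figure here is illustrative only
submitted_charge <- 500   # what the provider bills
allowed_amount   <- 120   # what Medicare allows for the service
coinsurance_rate <- 0.20  # assumed Part B coinsurance, deductible already met

# The beneficiary's share comes out of the allowed amount
beneficiary_share <- allowed_amount * coinsurance_rate

# Medicare pays the remainder of the allowed amount
medicare_payment <- allowed_amount - beneficiary_share

# Share of the original bill that Medicare actually pays
payment_to_charge <- medicare_payment / submitted_charge

print(c(beneficiary_share = beneficiary_share,
        medicare_payment = medicare_payment,
        payment_to_charge = payment_to_charge))
```

Under these assumptions the payment is a fraction of the allowed amount, and the allowed amount is itself well below the submitted charge, which is why the payment-to-charge ratio can be small.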

Observations Analysis

Each observation or row in the dataset represents the aggregated service and payment information for a specific geographic region (National or State), the HCPCS code, and the place of service with Medicare payment amounts detailed for the procedure. While it may seem that the data represents a single beneficiary, each observation is based on the service provided and the costs associated with the procedure. However, this data can be used to determine implicit or possibly connected impacts on Medicare beneficiaries from costs.
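One way to check that rows are keyed by geography, HCPCS code, and place of service is to count duplicated key combinations. The sketch below uses a made-up `toy` data frame with the renamed column names; running the same check on the full dataset is left as an assumption about how the file is structured, not a verified result.

```r
# Toy rows shaped like the renamed dataset (values are made up)
toy <- data.frame(
  provider_geo_description = c("National", "National", "Texas", "Texas"),
  hcpcs_code = c("99213", "99213", "99213", "G0439"),
  place_of_service = c("F", "O", "O", "O")
)

# Columns assumed to identify one observation
key_cols <- c("provider_geo_description", "hcpcs_code", "place_of_service")

# Number of rows whose key repeats an earlier row
n_duplicate_keys <- sum(duplicated(toy[, key_cols]))
print(n_duplicate_keys)
```

A result of zero means each combination appears once, which is consistent with every row being an aggregate rather than an individual beneficiary record.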

Implications of Observations

By analyzing many of these observations from the Medicare Physician & Other Practitioners - by Geography and Service dataset, trends in healthcare delivery, cost variations, and service utilization patterns within the Medicare system can be identified.

Part 3: Motivation for Study

This study is dedicated to my father, who frequently requires medical attention and relies on Medicare as his primary insurance. Through this analysis, I aim to discover insights that will not only benefit my family but also assist others in understanding the intricacies, advantages, and procedures associated with Medicare. I hope to empower individuals and families to navigate the complexities of the Medicare system more effectively.

Analyzing this dataset allows insights to be drawn in the following areas:

Analyzing healthcare cost and aid disparities across various regions
Identifying 2022 trends in medical services provided and top procedures
Examining the relationship between provider payments and Medicare aid
Exploring potential areas for healthcare policy and cost improvements

Part 4: Basic Summary Statistics

The following functions and table summarize key information regarding the dataset. Starting with built-in methods, the summary() function provides the minimum, median, mean, maximum, and quartiles for quantitative variables in the dataset. In addition, the str() function summarizes the dimensions of the dataset and the type associated with each variable. Additional R methods are then used to determine the number of unique provider geographies, the number of unique HCPCS codes, and the region with the highest service utilization.

Summary Descriptions

The summary() function is used to derive information on the minimum, median, maximum, and quartiles for quantitative variables.

# Summary of dataset (minimum, maximum, median, mean, etc.)
summary(medicare)

##  provider_geo_level provider_geo_code  provider_geo_description
##  Length:270673      Length:270673      Length:270673           
##  Class :character   Class :character   Class :character        
##  Mode  :character   Mode  :character   Mode  :character        
##                                                                
##                                                                
##                                                                
##   hcpcs_code        hcpcs_description  hcpcs_drug_indicator place_of_service  
##  Length:270673      Length:270673      Length:270673        Length:270673     
##  Class :character   Class :character   Class :character     Class :character  
##  Mode  :character   Mode  :character   Mode  :character     Mode  :character  
##                                                                               
##                                                                               
##                                                                               
##  total_providers    total_beneficiaries total_services     
##  Min.   :     1.0   Min.   :      11    Min.   :       11  
##  1st Qu.:    11.0   1st Qu.:      30    1st Qu.:       40  
##  Median :    29.0   Median :     106    Median :      162  
##  Mean   :   266.8   Mean   :    5343    Mean   :    23595  
##  3rd Qu.:    95.0   3rd Qu.:     586    3rd Qu.:     1102  
##  Max.   :601911.0   Max.   :21459588    Max.   :103325664  
##  total_beneficiary_day_services average_submitted_charge
##  Min.   :      11               Min.   :    0.0         
##  1st Qu.:      38               1st Qu.:  127.0         
##  Median :     143               Median :  440.6         
##  Mean   :   10396               Mean   : 1309.7         
##  3rd Qu.:     831               3rd Qu.: 1606.7         
##  Max.   :90436622               Max.   :99509.8         
##  average_medicare_allowed_amount average_medicare_payment_amount
##  Min.   :    0.00                Min.   :    0.00               
##  1st Qu.:   35.13                1st Qu.:   27.78               
##  Median :  110.84                Median :   85.52               
##  Mean   :  291.80                Mean   :  232.25               
##  3rd Qu.:  315.65                3rd Qu.:  250.72               
##  Max.   :58494.73                Max.   :46612.60               
##  average_medicare_standardized_amount
##  Min.   :    0.00                    
##  1st Qu.:   27.60                    
##  Median :   85.03                    
##  Mean   :  230.06                    
##  3rd Qu.:  249.34                    
##  Max.   :46577.95

The str() function is used to derive the dimensions of the dataset and the types associated with each variable.

# Summary of dimensions and variables (data types and example data)
str(medicare)

## 'data.frame':    270673 obs. of  15 variables:
##  $ provider_geo_level                  : chr  "State" "State" "State" "State" ...
##  $ provider_geo_code                   : chr  "9E" "9E" "9E" "9E" ...
##  $ provider_geo_description            : chr  "Foreign Country" "Foreign Country" "Foreign Country" "Foreign Country" ...
##  $ hcpcs_code                          : chr  "J1885" "G0439" "G0416" "G0283" ...
##  $ hcpcs_description                   : chr  "Injection, ketorolac tromethamine, per 15 mg" "Annual wellness visit, includes a personalized prevention plan of service (pps), subsequent visit" "Surgical pathology, gross and microscopic examinations, for prostate needle biopsy, any method" "Electrical stimulation (unattended), to one or more areas for indication(s) other than wound care, as part of a"| __truncated__ ...
##  $ hcpcs_drug_indicator                : chr  "Y" "N" "N" "N" ...
##  $ place_of_service                    : chr  "O" "O" "F" "O" ...
##  $ total_providers                     : int  6 3 3 2 3 3 3 3 2 1 ...
##  $ total_beneficiaries                 : int  21 37 89 19 13 14 38 51 50 19 ...
##  $ total_services                      : num  29 37 89 93 13 14 38 54 347 41 ...
##  $ total_beneficiary_day_services      : int  23 37 89 93 13 14 38 54 347 36 ...
##  $ average_submitted_charge            : num  39.8 226.4 821.9 41.7 947.5 ...
##  $ average_medicare_allowed_amount     : num  0.564 119.932 178.407 9.294 178.261 ...
##  $ average_medicare_payment_amount     : num  0.375 119.932 142.263 7.025 178.261 ...
##  $ average_medicare_standardized_amount: num  0.373 128.837 139.189 7.065 183.48 ...

Summary Manual Information

Additional R Data Summary Techniques

The num_unique_geos variable is calculated to be the number of unique provider geographies from the provider_geo_description variable.

# Number of Unique Provider Geographies
num_unique_geos <- length(unique(medicare$provider_geo_description))
print(paste("Number of Unique Provider Geographies:", num_unique_geos))

## [1] "Number of Unique Provider Geographies: 63"

The num_unique_hcpcs variable is calculated to be the number of unique HCPCS codes from the hcpcs_code variable.

# Number of Unique HCPCS Codes
num_unique_hcpcs <- length(unique(medicare$hcpcs_code))
print(paste("Number of Unique HCPCS Codes:", num_unique_hcpcs))

## [1] "Number of Unique HCPCS Codes: 9231"

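The service_by_state variable holds the total services summed within each value of the provider_geo_description variable. Because the dataset contains a National aggregate alongside the state-level rows, the National entry is included in this comparison.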

# Identify State with Highest Service Utilization
service_by_state <- aggregate(medicare$total_services, FUN=sum,
                              by=list(Category=medicare$provider_geo_description))
highest_state <- service_by_state[which.max(service_by_state$x), ]
print(paste("State with Highest Service Utilization:", highest_state$Category))

## [1] "State with Highest Service Utilization: National"
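Since the National aggregate will always dominate this comparison, a variant that keeps only state-level rows may be more informative. The following is a minimal sketch on made-up data, assuming provider_geo_level distinguishes "National" from "State" as described in the variable table; it does not reproduce any result from the real file. Rows tagged "State" can still include non-state areas such as "Foreign Country", so further filtering may be needed.

```r
# Toy data with the renamed columns (values are made up)
toy <- data.frame(
  provider_geo_level = c("National", "State", "State", "State"),
  provider_geo_description = c("National", "Texas", "Texas", "Ohio"),
  total_services = c(1000, 300, 200, 400)
)

# Keep only state-level rows before aggregating
state_rows <- toy[toy$provider_geo_level == "State", ]

# Sum services within each geography
service_by_state <- aggregate(total_services ~ provider_geo_description,
                              data = state_rows, FUN = sum)

# Pick the geography with the largest total
highest_state <- service_by_state[which.max(service_by_state$total_services), ]
print(as.character(highest_state$provider_geo_description))
```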

Summary Table of Information

The following table includes summarized information regarding the dataset. From this, ideas regarding Medicare services and costs can be formulated such as how aid differs from the submitted payment amount from providers. Further analysis will be conducted on such potentially observed relationships.

Metric	Value	Description
Total Number of Variables	15	Number of columns in the dataset
Total Number of Observations	270,673	Total number of rows, each an aggregated geography and service record
Avg. Medicare Submitted per Service	$1309.7	Average charge initially billed by providers
Avg. Medicare Allowed Aid per Service	$291.80	Average amount Medicare approves for payment
Avg. Medicare Standardized Aid	$230.06	Average payment adjusted for geographic variations
Avg. Medicare Actual Aid	$232.25	Average payment provided by Medicare
Number of Unique Provider Geographies	63	Count of distinct geographical areas included
Number of Unique HCPCS Codes	9231	Count of different medical service codes in the dataset
State with Highest Service Utilization	National	Services aggregated at a National Level

Summary Key Insights

Extensive Range of Services: The dataset has a wide spectrum of medical services represented by over 9,000 HCPCS codes
Geographic Scope Beyond States: Geographic data includes a national level, extending the analysis beyond individual states
Disparity Between Charges and Payments: There is a significant difference between the average submitted charge ($1309.7) and the average standardized payment amount ($230.06), which demonstrates a gap between what providers bill and the aid provided
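The size of the disparity can be expressed as a ratio of the reported means. The sketch below uses only the four mean values printed by summary() above; note that a ratio of means is not the same as the mean of row-level ratios, so these figures are a rough summary rather than a per-service coverage rate.

```r
# Means taken from the summary() output above
avg_submitted    <- 1309.7
avg_allowed      <- 291.80
avg_payment      <- 232.25
avg_standardized <- 230.06

# Each Medicare amount as a share of the average submitted charge
ratios <- round(
  c(allowed = avg_allowed,
    payment = avg_payment,
    standardized = avg_standardized) / avg_submitted,
  3
)
print(ratios)
# allowed 0.223, payment 0.177, standardized 0.176
```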

Visualizations: Mapping Medicare Service Distributions and Aid Across Geographies

Part 0: Overview of Visualizations for Medicare 2022 Data

There are three plots covered in this section: one scatter plot and two box plots. These charts reveal significant insights regarding Medicare aid and payments. The three charts have the following titles:

Average Submitted Charge vs. Average Medicare Aid Amount
Distribution of Average Medicare Submitted Amount for Top 10 Areas
Distribution of Average Medicare Aid Amount for Top 10 Areas

Part 1: Average Submitted Charge vs. Average Medicare Aid Amount

The “Average Submitted Charge vs. Average Medicare Aid Amount” plot shows the relationship between two quantitative variables, namely average_medicare_payment_amount and average_submitted_charge. A few adjustments were made when plotting these variables. The first was limiting both axes to $50,000 so that outliers above that value are not displayed; because the code does this with coord_cartesian(), which zooms the view without dropping rows, the line of best fit is still computed from all observations. From calculations, there are only 45 cases where the variable values are above $50,000, which makes them insignificant for evaluating overall trends. In addition, three lines are drawn. The solid black line with the equation “y = 0.20x - 29.3” is the line of best fit for the data. Two dashed reference lines are also present: the green line is “y = x” and the orange line is “y = 0.8x”.
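The count of cases above $50,000 is not shown in the code, so the following base R sketch illustrates one way such a count could be obtained. The `toy` values are made up and the sketch does not reproduce the figure of 45; the choice to flag a row when either variable exceeds the threshold is also an assumption.

```r
# Toy data with the two plotted columns (values are made up)
toy <- data.frame(
  average_submitted_charge = c(120, 60000, 800, 99509.8),
  average_medicare_payment_amount = c(30, 9000, 150, 12000)
)

threshold <- 50000

# Flag rows where either variable exceeds the threshold
above <- toy$average_submitted_charge > threshold |
  toy$average_medicare_payment_amount > threshold

n_above <- sum(above)
share_above <- n_above / nrow(toy)
print(c(n_above = n_above, share_above = share_above))
```

Reporting the share alongside the count makes it easier to judge whether the excluded cases are negligible relative to the size of the dataset.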

The slope of each line represents the percentage of the submitted charge that Medicare pays for a service. For example, y = x equates to 100% coverage of the submitted charge. As can be seen from the chart below, very few services have been fully covered by Medicare. As a more realistic reference, the y = 0.8x line represents 80% coverage by Medicare, and few points lie at or above that line either. In fact, the slope of the line of best fit indicates that Medicare payments rose by roughly $0.20 for every additional dollar of submitted charge across services provided in 2022, which corresponds to coverage of about 20%.
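The link between the fitted slope and coverage can be seen on a small constructed example. The vectors below are made up so that payment is exactly 20% of the charge minus a fixed $5; they are not drawn from the Medicare data.

```r
# Constructed data: payment = 0.2 * charge - 5 (made-up values)
charge  <- c(100, 200, 300, 400, 500)
payment <- 0.2 * charge - 5

# Fit the same kind of model used for the scatter plot
fit <- lm(payment ~ charge)
slope     <- unname(coef(fit)["charge"])
intercept <- unname(coef(fit)["(Intercept)"])

# Share of points at or above the 80% coverage reference line
share_at_or_above_80 <- mean(payment >= 0.8 * charge)

# Actual payment-to-charge ratio for the cheapest service
ratio_lowest <- payment[1] / charge[1]

print(c(slope = slope, intercept = intercept,
        share_at_or_above_80 = share_at_or_above_80,
        ratio_lowest = ratio_lowest))
```

With a negative intercept, the payment-to-charge ratio for an individual service sits below the slope (0.15 here for a $100 charge) and approaches it only as charges grow, so the slope is best read as an approximate coverage rate.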

# Loading Scales Library 
suppressPackageStartupMessages(library(scales))

# Fitting Linear Model
lm_fit <- lm(average_medicare_payment_amount ~ average_submitted_charge, data = medicare)
lm_coef <- coef(lm_fit)

# Average Submitted Charge vs. Average Medicare Aid Amount
ggplot(medicare, aes(average_submitted_charge, average_medicare_payment_amount)) +
  geom_point(alpha = 0.75, color = "navyblue", fill = "turquoise1", shape = 21, size = 2, stroke = 0.5) +
  geom_smooth(method = "lm", color = "black", se = FALSE) +
  geom_abline(intercept = 0, slope = 1, linetype = "longdash", color = "seagreen") +
  geom_abline(intercept = 0, slope = 0.80, linetype = "longdash", color = "darkorange") +
  # Labels, Annotations, and Coordinate Values
  labs(title = "Average Submitted Charge vs. Average Medicare Aid Amount",
       subtitle = "Comparison of submitted charges to Medicare aid",
       x = "Average Submitted Charge ($)",
       y = "Average Medicare Aid Amount ($)") +
  annotate("text", x = 45000, y = 45000, label = "y = x", color = "seagreen", hjust = -0.5, size = 4) +
  annotate("text", x = 45000, y = 36000, label = "y = 0.8x", color = "darkorange", hjust = -0.5, size = 4) +
  annotate("text", x = 40000, y = lm_coef[1] + lm_coef[2] * 40000, 
           label = sprintf("\ny = %.2fx + %.2f", lm_coef[2], lm_coef[1]), 
           color = "black", hjust = -0.1, size = 4) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray50"),
    panel.grid.minor = element_blank(),
    legend.position = "none"
  ) +
  scale_x_continuous(labels = dollar_format(scale = 1e-3, suffix = "K")) +
  scale_y_continuous(labels = dollar_format(scale = 1e-3, suffix = "K")) +
  coord_cartesian(xlim = c(0, 50000), ylim = c(0, 50000))

## `geom_smooth()` using formula = 'y ~ x'

Most cases have roughly 20% coverage whereas few services are fully covered by Medicare.

Part 2: Distribution of Average Medicare Submitted Amount for Top 10 Areas

The “Distribution of Average Medicare Submitted Amount for Top 10 Areas” box plot visualizes the distribution of average submitted charges across the top 10 geographic areas as determined by the total number of services provided (total_services). A few adjustments were made to the dataset in order to create this plot. The first was summing total services by area and keeping the top 10 areas. From here, the areas were reordered by median for plotting. In addition, average submitted charges were capped at $50,000, so outliers up to that cap are still shown.

From this box plot, it can be seen that the median (50th percentile) of the average submitted amount from providers is between $550 and $1,000 for the top 10 regions in the dataset. The top 5 regions with the highest average submitted charge are the National level, New York, California, Texas, and Florida. This data is particularly useful for comparison with the average aid paid out by Medicare, which is investigated in the next box plot.

# Find Top 10 Geographic Areas by Total Services
top_10_geo <- medicare %>%
  group_by(provider_geo_description) %>%
  summarize(total_services_sum = sum(total_services, na.rm = TRUE)) %>%
  arrange(desc(total_services_sum)) %>%
  head(10)

# Keep the Top 10 Geographic Areas and Submitted Charges from $1 up to $50,000
# (the lower bound drops values below the plot's $1 floor, including zeros
# that a log scale cannot show; both conditions are applied in one filter so
# the $50,000 cap is not overwritten by a second assignment)
medicare_top_10 <- medicare %>%
  filter(provider_geo_description %in% top_10_geo$provider_geo_description,
         average_submitted_charge >= 1,
         average_submitted_charge <= 50000)

# Calculate median payment for each area and arrange
area_medians <- medicare_top_10 %>%
  group_by(provider_geo_description) %>%
  summarize(median_payment = median(average_submitted_charge, na.rm = TRUE)) %>%
  arrange(desc(median_payment))

# Reorder the Geographic Areas based on arranged medians
medicare_top_10 <- medicare_top_10 %>%
  mutate(provider_geo_description = factor(provider_geo_description, 
                                           levels = area_medians$provider_geo_description))

# Distribution of Average Medicare Submitted Amount for Top 10 Areas
ggplot(medicare_top_10, aes(provider_geo_description, average_submitted_charge)) +
  geom_boxplot(aes(fill = provider_geo_description), outlier.shape = 1, outlier.size = 0.5, outlier.alpha = 0.3) +
  scale_y_log10(labels = scales::dollar_format(accuracy = 1)) +
  coord_cartesian(ylim = c(1, 100000)) +
  labs(
    title = "Distribution of Average Medicare Submitted Amount for Top 10 Areas",
    subtitle = "Top areas based on total services provided, payments <= $50,000 (log scale)",
    x = "Geographic Area",
    y = "Average Submitted Amount ($) - Log Scale"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray50"),
    legend.position = "none"
  )

The National level, New York, California, Texas, and Florida have the highest average submitted amounts.
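Because the box plot orders areas by their median, the ranking can differ from one based on the mean when a distribution is skewed. The base R sketch below shows this on two made-up areas; the area names and values are hypothetical.

```r
# Two made-up areas; Area B has one very large charge
toy <- data.frame(
  provider_geo_description = rep(c("Area A", "Area B"), each = 5),
  average_submitted_charge = c(100, 200, 300, 400, 500,
                               10, 20, 30, 40, 5000)
)

# Median and mean of the charge within each area
area_medians <- tapply(toy$average_submitted_charge,
                       toy$provider_geo_description, median)
area_means <- tapply(toy$average_submitted_charge,
                     toy$provider_geo_description, mean)

# Top-ranked area under each summary
top_by_median <- names(sort(area_medians, decreasing = TRUE))[1]
top_by_mean   <- names(sort(area_means, decreasing = TRUE))[1]
print(c(top_by_median = top_by_median, top_by_mean = top_by_mean))
```

Area A ranks first by median (300 against 30) while Area B ranks first by mean (1020 against 300), so it helps to state which summary a ranking is based on.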

Part 3: Distribution of Average Medicare Aid Amount for Top 10 Areas

The “Distribution of Average Medicare Aid Amount for Top 10 Areas” box plot visualizes the distribution of average Medicare aid amounts across the top 10 geographic areas as determined by the total number of services provided (total_services). A few adjustments were made to the dataset in order to create this plot. The first was summing total services by area and keeping the top 10 areas. From here, the areas were reordered by median for plotting. In addition, average aid amounts were capped at $50,000, so outliers up to that cap are still shown.

From this box plot, it can be seen that the median (50th percentile) of the average aid amount provided by Medicare is between $90 and $300 for the top 10 regions in the dataset. The top 5 regions with the highest average aid amount are the National level, California, Florida, New York, and Texas. This data supports a similar observation to the one seen in the scatter plot: most submitted charges are met with Medicare aid of between 1% and 20%.

# Find Top 10 Geographic Areas by Total Services
top_10_geo_aid <- medicare %>%
  group_by(provider_geo_description) %>%
  summarize(total_services_sum = sum(total_services, na.rm = TRUE)) %>%
  arrange(desc(total_services_sum)) %>%
  head(10)

# Keep the Top 10 Geographic Areas and Positive Payments up to $50,000
# (zero payments are dropped because a log scale cannot show them; both
# conditions are applied in one filter so the $50,000 cap is not overwritten
# by a second assignment)
medicare_top_10_aid <- medicare %>%
  filter(provider_geo_description %in% top_10_geo_aid$provider_geo_description,
         average_medicare_payment_amount > 0,
         average_medicare_payment_amount <= 50000)

# Calculate median payment for each area and arrange
area_medians <- medicare_top_10_aid %>%
  group_by(provider_geo_description) %>%
  summarize(median_payment = median(average_medicare_payment_amount, na.rm = TRUE)) %>%
  arrange(desc(median_payment))

# Reorder the Geographic Areas based on arranged medians
medicare_top_10_aid <- medicare_top_10_aid %>%
  mutate(provider_geo_description = factor(provider_geo_description, 
                                           levels = area_medians$provider_geo_description))

# Distribution of Average Medicare Aid Amount for Top 10 Areas
ggplot(medicare_top_10_aid, aes(provider_geo_description, average_medicare_payment_amount)) +
  geom_boxplot(aes(fill = provider_geo_description), outlier.shape = 1, outlier.size = 0.5, outlier.alpha = 0.3) +
  scale_y_log10(labels = scales::dollar_format(accuracy = 1)) +
  coord_cartesian(ylim = c(1, 100000)) +
  labs(
    title = "Distribution of Average Medicare Aid Amount for Top 10 Areas",
    subtitle = "Top areas based on total services provided, payments <= $50,000 (log scale)",
    x = "Geographic Area",
    y = "Average Medicare Aid Amount ($) - Log Scale"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray50"),
    legend.position = "none"
  )

The National level, California, Florida, New York, and Texas have the highest average Medicare aid amounts.
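To compare the two box plots numerically rather than by eye, the median aid can be divided by the median submitted charge within each area. The following is a minimal sketch on made-up data with hypothetical area names; it does not reproduce the percentages quoted in the text.

```r
# Made-up rows for two hypothetical areas
toy <- data.frame(
  provider_geo_description = c("Area A", "Area A", "Area A",
                               "Area B", "Area B", "Area B"),
  average_submitted_charge = c(500, 1000, 1500, 400, 600, 800),
  average_medicare_payment_amount = c(100, 200, 300, 60, 90, 120)
)

# Median of both amounts within each area
medians <- aggregate(
  cbind(average_submitted_charge, average_medicare_payment_amount) ~
    provider_geo_description,
  data = toy, FUN = median
)

# Median aid as a share of the median submitted charge
medians$aid_share <- medians$average_medicare_payment_amount /
  medians$average_submitted_charge
print(medians)
```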

Dataset Manipulations: Refining Medicare Provider Data for 2022

This section contains 4 parts that cover the data manipulations performed on the original medicare dataset, the results of which are plotted as one figure of 4 separate charts. The primary data manipulations include filtering the unique geographic locations down to the 50 states in the United States and grouping the data by state. This section will include the following components:

Part 0: Loading Additional Libraries for Mapping
Part 1: Generating Subset Mappings for Each Map
Part 2: Creating Maps for Average Aid, Payment Amounts, and Beneficiary Data
Part 3: Multi-Dimensional Analysis of Medicare Services Across States

Part 0: Loading Additional Libraries for Mapping

This code block loads the necessary libraries for creating US maps and visualizations. The usmap package provides functions for plotting US maps, viridis offers color palettes, and patchwork allows for combining multiple plots. The last library, knitr, is used for report generation, where its chunk options can suppress warning messages in the rendered output.

# Libraries for Mappings
suppressPackageStartupMessages(library(usmap))
suppressPackageStartupMessages(library(viridis))
suppressPackageStartupMessages(library(patchwork))
suppressPackageStartupMessages(library(knitr))

Part 1: Generating Subset Mappings for Each Map

This section prepares the data for mapping. It defines valid US states, filters the Medicare data to include only those states, and creates several summary datasets including the average Medicare payment amount by state, the average submitted charge amount by state, the total beneficiaries by state, and the average services per beneficiary by state.

# Creating Valid Regions Vector
geo_descriptions <- c("Wyoming", "Wisconsin", "West Virginia", "Washington", "Virginia", 
  "Vermont", "Utah", "Texas", "Tennessee", "South Dakota", "South Carolina", "Rhode Island", 
  "Oregon", "Oklahoma", "Ohio", "North Dakota", "North Carolina", "New York", 
  "New Mexico", "New Jersey", "New Hampshire", "Nevada", "Nebraska", "Montana", 
  "Missouri", "Mississippi", "Minnesota", "Michigan", "Massachusetts", "Maryland", 
  "Maine", "Louisiana", "Kentucky", "Kansas", "Iowa", "Indiana", "Illinois", 
  "Idaho", "Hawaii", "Georgia", "Florida", "Delaware", "Connecticut", "Colorado",
  "California", "Arkansas", "Arizona", "Alaska", "Alabama", "Pennsylvania")

# Constructing Medicare States Template Subset
medicare_states <- medicare %>%
  filter(provider_geo_description %in% geo_descriptions) %>%
  mutate(state = provider_geo_description)

# Subset 1: Average Medicare Aid Amount
state_summary <- medicare_states %>%
  group_by(state) %>%
  summarize(avg_payment = mean(average_medicare_payment_amount, na.rm = TRUE))

# Subset 2: Average Medicare Submitted Amount 
state_summary_payment <- medicare_states %>%
  group_by(state) %>%
  summarize(avg_submitted = mean(average_submitted_charge, na.rm = TRUE))

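# Subset 3: Beneficiaries (mean beneficiary count per row within each state;
# a sum would count the same beneficiary once for every HCPCS code)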
state_summary_beneficiaries <- medicare_states %>%
  group_by(state) %>%
  summarize(benefits = mean(total_beneficiaries, na.rm = TRUE))

# Subset 4: Average Services per Beneficiary (mean of row-level ratios)
state_summary_services <- medicare_states %>%
  group_by(state) %>%
  summarize(avg_services = mean(total_services / total_beneficiaries, na.rm = TRUE))
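The hand-typed vector of state names can be cross-checked against, or replaced by, base R's built-in `state.name` constant, which lists the 50 states and excludes the District of Columbia. The sketch below applies it to a made-up `toy` data frame; the non-state labels are only examples of what a filter should drop.

```r
# Toy rows mixing states with non-state geographies (values are made up)
toy <- data.frame(
  provider_geo_description = c("National", "Texas", "Puerto Rico",
                               "Foreign Country", "Ohio"),
  total_services = c(1000, 300, 50, 5, 400)
)

# state.name ships with base R (datasets package) and holds the 50 state names
toy_states <- toy[toy$provider_geo_description %in% state.name, ]
print(toy_states$provider_geo_description)
```

Comparing a typed vector with `setdiff(state.name, geo_descriptions)` is a quick way to confirm that no state was left out.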

Part 2: Creating Maps for Average Aid, Payment Amounts, and Beneficiary Data

This code block creates four separate US maps using the plot_usmap function including a map showing average Medicare aid by state, a map displaying average submitted amounts by state, a map illustrating total beneficiaries by state, and a map presenting average services per beneficiary by state.

# Map 1: Average Medicare Aid by State
map_payment <- plot_usmap(data = state_summary, values = "avg_payment", 
                          color = "white") +
  scale_fill_viridis_c(name = "Avg Medicare\nPayment ($)", 
                       labels = scales::dollar_format(), 
                       option = "plasma") +
  labs(title = "Average Medicare Aid by State") +
  theme(legend.position = "right")

# Map 2: Average Submitted Amount by State
map_submitted <- plot_usmap(data = state_summary_payment, values = "avg_submitted", 
                            color = "white") +
  scale_fill_viridis_c(name = "Avg Submitted\nAmount ($)", 
                       labels = scales::dollar_format(), 
                       option = "viridis") +
  labs(title = "Average Submitted Amount by State") +
  theme(legend.position = "right")

# Map 3: Total Beneficiaries by State
map_beneficiaries <- plot_usmap(data = state_summary_beneficiaries, values = "benefits", 
                                color = "white") +
  scale_fill_viridis_c(name = "Total\nBeneficiaries", option = "mako") +
  labs(title = "Total Beneficiaries by State") +
  theme(legend.position = "right")

# Map 4: Total Services per Beneficiary by State
map_services <- plot_usmap(data = state_summary_services, values = "avg_services", 
                           color = "white") +
  scale_fill_viridis_c(name = "Avg Services\nper Beneficiary", option = "turbo") +
  labs(title = "Average Services per Beneficiary by State") +
  theme(legend.position = "right")

Part 3: Multi-Dimensional Analysis of Medicare Services Across States

This final section combines the four individual maps into a single multi-panel visualization using the patchwork package. It arranges the maps in a 2x2 grid, adds an overall title, and adjusts the layout for optimal viewing.

Analyzing the relationship between average Medicare aid and average submitted amounts provides a nuanced understanding of cost coverage across the country. For instance, while states like California and New York may have higher Medicare aid than some smaller states, they also show a significant disparity between aid and average submitted amounts. This suggests that many services are not fully covered within these states. Moreover, states with substantial Medicare populations, such as Florida and Texas, do not show the high utilization of Medicare services that their large senior demographics would suggest. This may reflect a reliance on private insurance or alternative healthcare systems, potentially influenced by the political leanings or demographic composition of these states. In contrast, Northeastern states like Connecticut demonstrate a propensity for using Medicare services, which contributes to a higher average services per beneficiary metric. The trends depicted here could be consistent with better healthcare infrastructure, a higher acceptance of Medicare benefits, or simply a larger proportion of elderly individuals dependent on Medicare.

# Multi-Dimensional Analysis of Medicare Services Across States
combined_map <- (map_payment + map_submitted) / (map_beneficiaries + map_services) +
  plot_layout(heights = c(4, 4)) +  # two rows of maps, equal height
  plot_annotation(
    title = "Multi-Dimensional Analysis of Medicare Services Across States",
    theme = theme(plot.title = element_text(size = 16, face = "bold"),
                  plot.subtitle = element_text(size = 12))
  )

# Displaying Combined Map
print(combined_map)

California has the highest aid, but also one of the highest average submitted payment amounts.

States with lower average submitted amounts and fewer average services per beneficiary may face challenges related to healthcare access or awareness among Medicare recipients. The higher utilization of services in states such as West Virginia, when contrasted with the comparatively lower average Medicare aid, points to a need for healthcare policies and resource allocation that support beneficiary awareness. Taken together, the four factors (aid, submitted amounts, beneficiaries, and utilization) provide a view of healthcare efficiency, equity, and accessibility across the United States.

Statistical Analysis: Trends in Medicare Physician Services and Reimbursements

Part 0: Statistical Analysis Overview

This section covers several components of statistical analysis: correlation, regression, and t-tests. For the t-tests, the alpha value used to determine statistical significance is 0.05. In addition, the data is manipulated below to remove outliers and abbreviate variable names for ease of demonstration.

The dataset is first refined by renaming/adjusting key variables related to charges, payments, providers, and beneficiaries. To improve the accuracy of correlation and regression analyses, extreme outliers are removed based on predefined thresholds. The outlier-filtered dataset ensures that extreme values do not distort statistical relationships.

# Creating Abbreviated Dataset
medicare_abbrev <- medicare %>%
  select(
    avg_sub_chg = average_submitted_charge,
    avg_med_pay = average_medicare_payment_amount,
    avg_med_all = average_medicare_allowed_amount,
    avg_med_std = average_medicare_standardized_amount,
    tot_prov = total_providers,
    tot_benef = total_beneficiaries,
    tot_ben_day = total_beneficiary_day_services,
    tot_serv = total_services
  )

# Removing Outliers: Filtered Dataset for Correlation Testing w/ Abbreviations
medicare_adjust_abbrev <- medicare_abbrev %>%
  filter(avg_sub_chg <= 50000, avg_med_all <= 50000, avg_med_pay <= 50000, 
         avg_med_std <= 5000, tot_prov <= 500000, tot_benef <= 14000000, 
         tot_serv <= 10000000, tot_ben_day <= 10000000)

# Removing Outliers: Filtered Dataset
medicare_adjust <- medicare %>%
  filter(average_submitted_charge <= 50000, average_medicare_allowed_amount <= 50000,
         average_medicare_payment_amount <= 50000, total_services <= 10000000,
         average_medicare_standardized_amount <= 5000, total_providers <= 500000,
         total_beneficiaries <= 14000000, total_beneficiary_day_services <= 10000000)
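The same cutoffs are typed out twice above, once per dataset. One way to keep them in sync is to store them once and apply them programmatically. The sketch below illustrates that idea in base R on made-up rows; `caps` and `within_caps` are hypothetical names introduced for illustration, not part of the original analysis.

```r
# Hypothetical helper: TRUE for rows where every capped column is at or below its cap
within_caps <- function(df, caps) {
  checks <- Map(function(col, cap) df[[col]] <= cap, names(caps), caps)
  Reduce(`&`, checks)
}

# Cutoffs copied from the filter() calls above, stored once
caps <- c(
  average_submitted_charge = 50000,
  average_medicare_allowed_amount = 50000,
  average_medicare_payment_amount = 50000,
  average_medicare_standardized_amount = 5000,
  total_providers = 500000,
  total_beneficiaries = 14000000,
  total_services = 10000000,
  total_beneficiary_day_services = 10000000
)

# Made-up rows: row 2 exceeds the submitted-charge cap, row 3 the standardized-amount cap
toy <- data.frame(
  average_submitted_charge = c(120, 60000, 900),
  average_medicare_allowed_amount = c(80, 400, 700),
  average_medicare_payment_amount = c(60, 300, 500),
  average_medicare_standardized_amount = c(65, 310, 5200),
  total_providers = c(10, 20, 30),
  total_beneficiaries = c(100, 200, 300),
  total_services = c(150, 250, 350),
  total_beneficiary_day_services = c(140, 240, 340)
)

# which() drops rows where a comparison is NA, mirroring dplyr::filter()
toy_adjust <- toy[which(within_caps(toy, caps)), ]
print(nrow(toy_adjust))
```

For the abbreviated dataset, the same helper would work with the names of `caps` switched to the short column names.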

Part 1: Correlation Analysis

This section examines the relationships between key financial and service-related Medicare metrics. Pearson and Spearman correlation tests are used to identify the strength and direction of relationships between submitted charges, allowed payments, and provider counts.

Part 1.1: Pearson and Spearman Correlation Testing

Pearson’s correlation measures linear relationships, while Spearman’s correlation accounts for non-linear but monotonic relationships. Here, it is assessed whether higher submitted charges correlate with Medicare reimbursements and whether provider counts relate to beneficiary counts.

The Pearson and Spearman analyses on the original dataset may show different results. If the relationships are purely linear, Pearson and Spearman should yield similar results. However, if there are non-linear but monotonic relationships, Spearman might show stronger correlations. Any significant differences between the two methods could indicate the presence of non-linear relationships or outliers affecting the Pearson correlation.
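The distinction can be seen on a small made-up example. In the sketch below, `y` always increases with `x` but along an exponential curve: the ranks of the two variables agree perfectly, so Spearman's rho is 1, while Pearson's r is noticeably lower because the relationship is not a straight line. The data are illustrative only and are not drawn from the Medicare dataset.

```r
# Made-up monotonic but non-linear data: y grows exponentially with x
x <- 1:20
y <- exp(x / 2)

pearson_r  <- cor(x, y, method = "pearson")
spearman_r <- cor(x, y, method = "spearman")

print(c(pearson = pearson_r, spearman = spearman_r))
```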

Correlation Testing with Pearson Analysis

# Pearson Correlation Testing: Average Submitted Charge vs. Average Aid Amount
cor.test(medicare$average_submitted_charge, medicare$average_medicare_payment_amount)

## 
##  Pearson's product-moment correlation
## 
## data:  medicare$average_submitted_charge and medicare$average_medicare_payment_amount
## t = 654.06, df = 270671, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7811460 0.7840658
## sample estimates:
##       cor 
## 0.7826102

# Pearson Correlation Testing: Average Submitted Charge vs. Average Allowed Amount
cor.test(medicare$average_submitted_charge, medicare$average_medicare_allowed_amount)

## 
##  Pearson's product-moment correlation
## 
## data:  medicare$average_submitted_charge and medicare$average_medicare_allowed_amount
## t = 656.65, df = 270671, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7823469 0.7852526
## sample estimates:
##       cor 
## 0.7838041

# Pearson Correlation Testing: Total Providers vs. Total Beneficiaries
cor.test(medicare$total_providers, medicare$total_beneficiaries)

## 
##  Pearson's product-moment correlation
## 
## data:  medicare$total_providers and medicare$total_beneficiaries
## t = 517.47, df = 270671, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7033023 0.7070899
## sample estimates:
##       cor 
## 0.7052011

The first three tests show strong positive Pearson correlations for each pair: about 0.78 between the average submitted charge and the average Medicare payment, 0.78 between the average submitted charge and the average Medicare allowed amount, and 0.71 between total providers and total beneficiaries. The Spearman correlations below show whether a rank-based measure gives a different result.

Correlation Testing with Spearman Analysis

# Spearman Correlation Testing: Average Submitted Charge vs. Average Aid Amount
suppressWarnings({
  cor.test(medicare$average_submitted_charge, medicare$average_medicare_payment_amount, method = "spearman")
})

## 
##  Spearman's rank correlation rho
## 
## data:  medicare$average_submitted_charge and medicare$average_medicare_payment_amount
## S = 2.2847e+14, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.930874

# Spearman Correlation Testing: Average Submitted Charge vs. Average Allowed Amount
suppressWarnings({
  cor.test(medicare$average_submitted_charge, medicare$average_medicare_allowed_amount, method = "spearman")
})

## 
##  Spearman's rank correlation rho
## 
## data:  medicare$average_submitted_charge and medicare$average_medicare_allowed_amount
## S = 2.2459e+14, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9320485

# Spearman Correlation Testing: Total Providers vs. Total Beneficiaries
suppressWarnings({
  cor.test(medicare$total_providers, medicare$total_beneficiaries, method = "spearman")
})

## 
##  Spearman's rank correlation rho
## 
## data:  medicare$total_providers and medicare$total_beneficiaries
## S = 8.4523e+14, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.7442654

For the same pairs, the Spearman coefficient rises to about 0.93 for the two charge-related comparisons, while the coefficient for total providers and total beneficiaries rises only modestly, from 0.71 to 0.74. The gap between the two methods suggests that outliers or non-linear but monotonic structure may be influencing the Pearson results. In the next sections, the Pearson and Spearman correlations are recomputed on the filtered data to see how much of this gap is due to outliers.

Correlation Testing with Pearson Analysis (Removing Outliers)

# Pearson Correlation: Average Submitted Charge vs. Average Aid Amount (Removed Outliers)
cor.test(medicare_adjust$average_submitted_charge, medicare_adjust$average_medicare_payment_amount)

## 
##  Pearson's product-moment correlation
## 
## data:  medicare_adjust$average_submitted_charge and medicare_adjust$average_medicare_payment_amount
## t = 726.08, df = 270165, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8118438 0.8143992
## sample estimates:
##       cor 
## 0.8131254

# Pearson Correlation: Average Submitted Charge vs. Average Allowed Amount (Removed Outliers)
cor.test(medicare_adjust$average_submitted_charge, medicare_adjust$average_medicare_allowed_amount)

## 
##  Pearson's product-moment correlation
## 
## data:  medicare_adjust$average_submitted_charge and medicare_adjust$average_medicare_allowed_amount
## t = 736.96, df = 270165, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8159379 0.8184432
## sample estimates:
##       cor 
## 0.8171944

# Pearson Correlation: Total Providers vs. Total Beneficiaries (Removed Outliers)
cor.test(medicare_adjust$total_providers, medicare_adjust$total_beneficiaries)

## 
##  Pearson's product-moment correlation
## 
## data:  medicare_adjust$total_providers and medicare_adjust$total_beneficiaries
## t = 448.95, df = 270165, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6515021 0.6558213
## sample estimates:
##      cor 
## 0.653667

After removing outliers, the Pearson correlation coefficient between the average submitted charge and the average Medicare aid amount increases from 0.78 to 0.81, and the coefficient between the average submitted charge and the Medicare allowed amount follows a similar trend (0.78 to 0.82). However, the positive relationship between total providers and total beneficiaries weakens, from 0.71 to 0.65.

Correlation Testing with Spearman Analysis (Removing Outliers)

# Spearman Correlation: Average Submitted Charge vs. Average Aid Amount (Removed Outliers)
suppressWarnings({
  cor.test(medicare_adjust$average_submitted_charge, medicare_adjust$average_medicare_payment_amount, method = "spearman")
})

## 
##  Spearman's rank correlation rho
## 
## data:  medicare_adjust$average_submitted_charge and medicare_adjust$average_medicare_payment_amount
## S = 2.2841e+14, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.930503

# Spearman Correlation: Average Submitted Charge vs. Average Allowed Amount (Removed Outliers)
suppressWarnings({
  cor.test(medicare_adjust$average_submitted_charge, medicare_adjust$average_medicare_allowed_amount, method = "spearman")
})

## 
##  Spearman's rank correlation rho
## 
## data:  medicare_adjust$average_submitted_charge and medicare_adjust$average_medicare_allowed_amount
## S = 2.2453e+14, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.9316841

# Spearman Correlation: Total Providers vs. Total Beneficiaries (Removed Outliers)
suppressWarnings({
  cor.test(medicare_adjust$total_providers, medicare_adjust$total_beneficiaries, method = "spearman")
})

## 
##  Spearman's rank correlation rho
## 
## data:  medicare_adjust$total_providers and medicare_adjust$total_beneficiaries
## S = 8.4048e+14, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.7442714

After removing outliers, the Spearman coefficients for the first two relationships remain essentially unchanged at about 0.93 (0.9309 to 0.9305 and 0.9320 to 0.9317). Because Spearman’s method works on ranks, extreme values have little influence on it, whereas the Pearson coefficients shifted once the outliers were removed. This indicates that outliers do affect the Pearson results and that filtering mitigated their effect to an extent. The Spearman correlation for total providers and total beneficiaries is likewise stable at about 0.744, which shows that the rank-based coefficient had already discounted the outliers in the original dataset.
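The robustness of a rank-based coefficient to extreme values can be illustrated on made-up data. The sketch below builds a nearly linear series, then replaces a single value with an extreme one and recomputes both coefficients. The numbers are illustrative only and are not drawn from the Medicare dataset.

```r
# Made-up, nearly linear data
x <- 1:100
y <- 2 * x + rep(c(-1, 1), 50)

# Same data with a single extreme value injected into y
y_out <- y
y_out[50] <- 1e6

results <- data.frame(
  data     = c("clean", "one outlier"),
  pearson  = c(cor(x, y), cor(x, y_out)),
  spearman = c(cor(x, y, method = "spearman"),
               cor(x, y_out, method = "spearman"))
)
print(results)
```

A single extreme point is enough to pull Pearson's r toward zero here, while Spearman's rho moves only slightly because the point can shift by at most the number of ranks available.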

Pearson and Spearman Correlations Table

The table summarizes the results of correlation tests conducted using both Pearson and Spearman methods on various Medicare-related variables. The analysis includes comparisons between the average submitted charge and both the average Medicare payment and allowed amounts, as well as between the total number of providers and beneficiaries. The results are presented for both the original dataset and an adjusted version with outliers removed. Across all tests, strong positive correlations were observed, with Spearman correlations generally indicating a stronger relationship than Pearson correlations. This suggests that while there are linear components to these relationships, there may also be non-linear aspects. Removing outliers tended to increase the strength of the Pearson correlations, except in the case of providers versus beneficiaries, where the correlation decreased.

Variables	Test Type	Original Data	Adjusted Data (Outliers Removed)
Submitted Charge vs Medicare Payment	Pearson	0.7826	0.8131
	Spearman	0.9309	0.9305
Submitted Charge vs Medicare Allowed	Pearson	0.7838	0.8172
	Spearman	0.9320	0.9317
Total Providers vs Total Beneficiaries	Pearson	0.7052	0.6537
	Spearman	0.7443	0.7443

All correlations are statistically significant with p-values < 2.2e-16.
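A summary table like this can also be assembled directly from the computed coefficients, which reduces the chance of transcription slips. The sketch below shows one possible approach on made-up data; `cor_pair`, `toy`, and `toy_adjust` are hypothetical names, and the 2000 cutoff is arbitrary.

```r
# Hypothetical helper: Pearson and Spearman coefficients for one pair of columns
cor_pair <- function(df, x, y) {
  c(
    Pearson  = cor(df[[x]], df[[y]], method = "pearson"),
    Spearman = cor(df[[x]], df[[y]], method = "spearman")
  )
}

# Made-up stand-ins for the original and outlier-filtered datasets
set.seed(1)
toy <- data.frame(charge = rexp(200, rate = 1 / 500))
toy$payment <- 0.3 * toy$charge + rnorm(200, sd = 20)
toy_adjust <- toy[toy$charge <= 2000, ]

summary_table <- data.frame(
  test     = c("Pearson", "Spearman"),
  original = round(cor_pair(toy, "charge", "payment"), 4),
  adjusted = round(cor_pair(toy_adjust, "charge", "payment"), 4),
  row.names = NULL
)
print(summary_table)
```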

Part 1.2: Pearson and Spearman Correlation Matrix

A correlation matrix is generated to visualize relationships across all numerical variables, both with and without outliers. This helps identify which factors are strongly associated with each other.

Pearson Correlation Matrix

# Pearson Correlation Matrix
pearson_cor <- cor(medicare_abbrev) %>% round(3)
print(pearson_cor)

##             avg_sub_chg avg_med_pay avg_med_all avg_med_std tot_prov tot_benef
## avg_sub_chg       1.000       0.783       0.784       0.775   -0.019    -0.020
## avg_med_pay       0.783       1.000       1.000       0.997   -0.014    -0.013
## avg_med_all       0.784       1.000       1.000       0.997   -0.014    -0.013
## avg_med_std       0.775       0.997       0.997       1.000   -0.014    -0.013
## tot_prov         -0.019      -0.014      -0.014      -0.014    1.000     0.705
## tot_benef        -0.020      -0.013      -0.013      -0.013    0.705     1.000
## tot_ben_day      -0.014      -0.009      -0.009      -0.009    0.700     0.853
## tot_serv         -0.018      -0.012      -0.012      -0.012    0.389     0.508
##             tot_ben_day tot_serv
## avg_sub_chg      -0.014   -0.018
## avg_med_pay      -0.009   -0.012
## avg_med_all      -0.009   -0.012
## avg_med_std      -0.009   -0.012
## tot_prov          0.700    0.389
## tot_benef         0.853    0.508
## tot_ben_day       1.000    0.578
## tot_serv          0.578    1.000

This matrix shows strong positive correlations (>0.77) between average submitted charges, Medicare payments, allowed amounts, and standardized amounts. There is also a strong correlation (0.705) between total providers and total beneficiaries. However, the financial variables show very weak negative correlations with provider and beneficiary counts. Total beneficiary day services are strongly correlated with total beneficiaries (0.853), while total services are moderately correlated with total beneficiaries (0.508).

Spearman Correlation Matrix

# Spearman Correlation Matrix
spearman_cor <- cor(medicare_abbrev, method = "spearman") %>% round(3)
print(spearman_cor)

##             avg_sub_chg avg_med_pay avg_med_all avg_med_std tot_prov tot_benef
## avg_sub_chg       1.000       0.931       0.932       0.927   -0.007    -0.297
## avg_med_pay       0.931       1.000       0.999       0.999   -0.040    -0.294
## avg_med_all       0.932       0.999       1.000       0.997   -0.032    -0.296
## avg_med_std       0.927       0.999       0.997       1.000   -0.042    -0.296
## tot_prov         -0.007      -0.040      -0.032      -0.042    1.000     0.744
## tot_benef        -0.297      -0.294      -0.296      -0.296    0.744     1.000
## tot_ben_day      -0.322      -0.312      -0.313      -0.313    0.729     0.974
## tot_serv         -0.367      -0.349      -0.351      -0.350    0.686     0.923
##             tot_ben_day tot_serv
## avg_sub_chg      -0.322   -0.367
## avg_med_pay      -0.312   -0.349
## avg_med_all      -0.313   -0.351
## avg_med_std      -0.313   -0.350
## tot_prov          0.729    0.686
## tot_benef         0.974    0.923
## tot_ben_day       1.000    0.970
## tot_serv          0.970    1.000

The Spearman correlations reveal even stronger relationships between the financial variables (>0.92). Interestingly, the matrix shows moderate negative correlations (-0.29 to -0.37) between the financial variables and the beneficiary and service counts, which were not apparent in the Pearson correlations; the correlations with provider counts remain close to zero. The relationships among the provider, beneficiary, and service counts are strong to very strong (0.69 to 0.97) and exceed their Pearson counterparts, suggesting monotonic associations that are not strictly linear.

Pearson Correlation Matrix (Removing Outliers)

# Removing Outliers: Pearson Correlation Matrix
pearson_cor_outlier_rm <- cor(medicare_adjust_abbrev) %>% round(3)
print(pearson_cor_outlier_rm)

##             avg_sub_chg avg_med_pay avg_med_all avg_med_std tot_prov tot_benef
## avg_sub_chg       1.000       0.813       0.817       0.800   -0.025    -0.032
## avg_med_pay       0.813       1.000       0.999       0.995   -0.027    -0.030
## avg_med_all       0.817       0.999       1.000       0.994   -0.027    -0.030
## avg_med_std       0.800       0.995       0.994       1.000   -0.027    -0.030
## tot_prov         -0.025      -0.027      -0.027      -0.027    1.000     0.654
## tot_benef        -0.032      -0.030      -0.030      -0.030    0.654     1.000
## tot_ben_day      -0.033      -0.030      -0.031      -0.031    0.623     0.874
## tot_serv         -0.045      -0.042      -0.043      -0.042    0.400     0.558
##             tot_ben_day tot_serv
## avg_sub_chg      -0.033   -0.045
## avg_med_pay      -0.030   -0.042
## avg_med_all      -0.031   -0.043
## avg_med_std      -0.031   -0.042
## tot_prov          0.623    0.400
## tot_benef         0.874    0.558
## tot_ben_day       1.000    0.648
## tot_serv          0.648    1.000

This matrix shows strong positive correlations (>0.79) between average submitted charges, Medicare payments, allowed amounts, and standardized amounts. The correlation between total providers and total beneficiaries is moderately strong (0.654). Financial variables show very weak negative correlations with provider and beneficiary counts. Total beneficiary day services are strongly correlated with total beneficiaries (0.874), while total services are moderately correlated with total beneficiaries (0.558).

Spearman Correlation Matrix (Removing Outliers)

# Removing Outliers: Spearman Correlation Matrix
spearman_cor_outlier_rm <- cor(medicare_adjust_abbrev, method = "spearman") %>% round(3)
print(spearman_cor_outlier_rm)

##             avg_sub_chg avg_med_pay avg_med_all avg_med_std tot_prov tot_benef
## avg_sub_chg       1.000       0.931       0.932       0.927   -0.005    -0.296
## avg_med_pay       0.931       1.000       0.999       0.999   -0.038    -0.294
## avg_med_all       0.932       0.999       1.000       0.997   -0.030    -0.296
## avg_med_std       0.927       0.999       0.997       1.000   -0.040    -0.296
## tot_prov         -0.005      -0.038      -0.030      -0.040    1.000     0.744
## tot_benef        -0.296      -0.294      -0.296      -0.296    0.744     1.000
## tot_ben_day      -0.322      -0.312      -0.313      -0.313    0.729     0.974
## tot_serv         -0.367      -0.349      -0.351      -0.350    0.686     0.923
##             tot_ben_day tot_serv
## avg_sub_chg      -0.322   -0.367
## avg_med_pay      -0.312   -0.349
## avg_med_all      -0.313   -0.351
## avg_med_std      -0.313   -0.350
## tot_prov          0.729    0.686
## tot_benef         0.974    0.923
## tot_ben_day       1.000    0.970
## tot_serv          0.970    1.000

With outliers removed, the Spearman matrix is nearly identical to the unfiltered version, as expected for a rank-based measure: no entry shifts by more than 0.002. The financial variables remain very strongly related (>0.92), the moderate negative correlations (-0.29 to -0.37) between the financial variables and the beneficiary and service counts persist, and the relationships among the provider, beneficiary, and service counts stay strong (0.69 to 0.97), again pointing to monotonic associations that are not strictly linear.

Pearson and Spearman Correlations Matrix Summary

Financial variables (submitted charges, payments, allowed amounts) show strong positive correlations with each other in both the Pearson and Spearman matrices, which demonstrates that they tend to increase together. However, the two methods differ in the relationship between the aid/payment variables and the beneficiary/service totals. While the Pearson correlations show very weak negative relationships, the Spearman correlations reveal moderate negative associations, which suggests non-linear relationships that are better captured by Spearman’s rank-based approach. The strong correlations between beneficiary counts, beneficiary day services, and services indicate that these variables measure closely related quantities. The relationship between total providers and total beneficiaries is moderately strong in the filtered Pearson correlation (0.654) but stronger in the Spearman correlation (0.744), which shows that this relationship may not be fully linear.

The correlation matrices reveal strong relationships between financial variables and moderate to strong associations between provider and beneficiary metrics, highlighting both linear and non-linear interactions within the Medicare dataset.
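One way to locate where the two methods disagree is to subtract one matrix from the other. The sketch below does this on made-up data in which `cost` falls steeply and non-linearly as `volume` grows, a pattern loosely similar to the negative association between the financial averages and the service counts. The variable names and values are hypothetical.

```r
# Made-up data: 'cost' falls steeply and non-linearly as 'volume' grows
set.seed(42)
volume <- round(exp(runif(300, 0, 10)))
cost   <- 1000 / sqrt(volume + 1) + runif(300, 0, 1)
toy    <- data.frame(volume = volume, cost = cost)

pearson_toy  <- cor(toy)
spearman_toy <- cor(toy, method = "spearman")

# Negative off-diagonal entries mean Spearman is more negative than Pearson
gap <- round(spearman_toy - pearson_toy, 3)
print(gap)
```

Applied to the study's matrices, the same subtraction (for example `spearman_cor - pearson_cor`) would highlight the pairs where rank-based and linear correlation differ most.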

Part 1.3: Displaying Correlation Graphs for Averages and Totals

Scatterplot matrices are used to visually assess trends and potential relationships among Medicare charge, payment, provider, and beneficiary data. These graphs help in detecting patterns or clustering within the dataset.

Pearson Correlation Graphs

# Pearson Correlation Graphs
pairs(select(medicare_abbrev, avg_sub_chg, avg_med_pay, avg_med_all, avg_med_std))

This graph set visualizes the pairwise relationships between average submitted charges, Medicare payments, allowed amounts, and standardized amounts as a scatterplot matrix. The scatter plots show strong positive linear relationships between these financial variables, in line with the high Pearson coefficients (>0.77) in the matrix above. This shows that as one financial metric increases, the others tend to increase proportionally. However, the fit is not equally tight for every pair, as some panels contain outliers that skew the data.

Spearman Correlation Graphs

# Spearman Correlation Graphs
pairs(select(medicare_abbrev, tot_prov, tot_benef, tot_ben_day, tot_serv))

These graphs depict the pairwise relationships between total providers, total beneficiaries, total beneficiary day services, and total services, the count variables for which the Spearman rank correlations were strongest. The plots are consistent with strong positive monotonic relationships, especially between total beneficiaries, beneficiary day services, and services. The relationship with total providers may appear less linear but remains positive (Spearman 0.69 to 0.74).

Pearson Correlation Graphs (Removing Outliers)

# Removing Outliers: Pearson Correlation Graphs
pairs(select(medicare_adjust_abbrev, avg_sub_chg, avg_med_pay, avg_med_all, avg_med_std))

After removing outliers, some panels, such as the one comparing the average Medicare aid with the average Medicare allowed amount, show a strong linear correlation. In addition, the outliers in the average submitted charge and the average Medicare aid had made their correlation appear weaker than it is: the Pearson coefficient rises from 0.783 to 0.813 once they are removed.

Spearman Correlation Graphs (Removing Outliers)

# Removing Outliers: Spearman Correlation Graphs
pairs(select(medicare_adjust_abbrev, tot_prov, tot_benef, tot_ben_day, tot_serv))

These graphs focus on the provider and service metrics with outliers removed. With the extreme points gone, the positive monotonic relationships may appear as a more distinct pattern than in the graphs with outliers, even though the Spearman coefficients themselves are essentially unchanged by the filtering. This is particularly relevant for the relationships involving total beneficiaries, which have lower Pearson correlations than Spearman correlations.

Pearson and Spearman Correlation Graphs Summary

The correlation graphs visually confirm the findings from the correlation matrices. They illustrate the strong positive relationships among financial variables and among service/beneficiary metrics. The differences between Pearson and Spearman correlations are likely visible in the shapes of the relationships, with some showing more linear patterns (financial variables) and others displaying non-linear but monotonic relationships (service metrics).

The correlation graphs visually reinforce the strong relationships between financial variables and the moderate to strong associations between provider and beneficiary metrics, highlighting both linear and non-linear patterns in the Medicare dataset.
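The scatterplot matrices above show the raw points only. As a possible extension, a custom panel function can print each pair's coefficient in the upper triangle, so the plots and the numbers can be read together. The sketch below uses made-up data and a hypothetical helper named `panel_spearman`; it is not part of the original analysis.

```r
# Hypothetical panel function: prints the Spearman coefficient inside a panel
panel_spearman <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))  # temporary 0-1 coordinates for text placement
  on.exit(par(old))                # restore the panel's coordinates afterwards
  rho <- cor(x, y, method = "spearman", use = "complete.obs")
  text(0.5, 0.5, sprintf("rho = %.2f", rho), cex = 1.2)
}

# Made-up count data (column names borrowed from the abbreviated dataset)
set.seed(7)
toy <- data.frame(tot_benef = round(rexp(150, 1 / 1000)))
toy$tot_serv    <- round(toy$tot_benef * runif(150, 1, 3))
toy$tot_ben_day <- round(toy$tot_benef * runif(150, 1, 2))

pdf(NULL)  # null device so the sketch runs anywhere; omit in R Markdown
pairs(toy, upper.panel = panel_spearman)
invisible(dev.off())
```

The lower triangle keeps the default scatterplots, while the upper triangle shows the rank correlation for the mirrored pair.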

Part 2: Regression Analysis

Regression analysis helps quantify the relationship between Medicare provider services and reimbursements. The goal is to determine how well one variable (e.g., total beneficiaries) predicts another (e.g., total providers).

Part 2.1: Relationship Between Total Providers and Total Beneficiaries

This linear regression model examines how the number of total beneficiaries influences the total number of providers. The coefficient and R-squared value indicate the extent to which variations in provider numbers are explained by beneficiary counts.

# Creating Linear Regression w/ Outliers
medicare_rgs_outliers <- lm(total_providers ~ total_beneficiaries, data = medicare)
summary_stats_outliers <- summary(medicare_rgs_outliers)
print(summary_stats_outliers)

## 
## Call:
## lm(formula = total_providers ~ total_beneficiaries, data = medicare)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -233213    -146    -132     -83  258446 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.545e+02  4.475e+00   34.53   <2e-16 ***
## total_beneficiaries 2.101e-02  4.061e-05  517.47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2325 on 270671 degrees of freedom
## Multiple R-squared:  0.4973, Adjusted R-squared:  0.4973 
## F-statistic: 2.678e+05 on 1 and 270671 DF,  p-value: < 2.2e-16

This output presents the results of a linear regression of total providers on total beneficiaries in the Medicare dataset, including outliers. The model is statistically significant (F-statistic: 2.678e+05, p-value: < 2.2e-16), indicating a clear linear association between the variables. The coefficient for total_beneficiaries (2.101e-02) is positive and highly significant (p < 2e-16), meaning that each additional beneficiary is associated with an average increase of about 0.02101 providers. The intercept (154.5) represents the estimated number of providers when there are zero beneficiaries. The model explains approximately 49.73% of the variance in total providers (R-squared: 0.4973), a moderate fit. However, the large range in residuals (Min: -233213, Max: 258446) suggests the presence of influential outliers that may be reducing the accuracy of the linear model.
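The regression output ties back to the earlier correlation test. In a regression with a single predictor, R-squared equals the squared Pearson correlation, and the slope's t value can be recovered from the correlation and the degrees of freedom. The sketch below checks this using values copied from the outputs above; the 10,000-beneficiary row is a hypothetical input chosen for illustration.

```r
# Values copied from the regression and Pearson outputs above
intercept <- 154.5
slope     <- 0.02101
r         <- 0.7052011  # Pearson correlation, total_providers vs total_beneficiaries
df        <- 270671

# In simple linear regression, R-squared is the squared Pearson correlation
r_squared <- r^2
print(round(r_squared, 4))  # 0.4973, matching "Multiple R-squared" above

# The slope's t value follows from r and the degrees of freedom
t_value <- r * sqrt(df) / sqrt(1 - r^2)
print(round(t_value, 2))

# Predicted providers for a hypothetical row with 10,000 beneficiaries
predicted <- intercept + slope * 10000
print(predicted)  # 364.6
```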

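# Relationship Between Total Providers and Total Beneficiaries
# Note: providers are on the x-axis here, whereas lm() above uses them as the response;
# R-squared is the same in either direction for a one-predictor model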
ggplot(medicare, aes(total_providers, total_beneficiaries)) +
  geom_point(alpha = 0.75, color = "navyblue", fill = "turquoise1", shape = 21, size = 2, stroke = 0.5) +
  geom_smooth(method = "lm", color = "black", se = FALSE) +
  labs(
    title = "Relationship Between Total Providers and Total Beneficiaries",
    x = "Total Providers",
    y = "Total Beneficiaries",
    caption = paste("R-squared:", round(summary_stats_outliers$r.squared, 3))
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(size = 10)
  ) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::comma)

## `geom_smooth()` using formula = 'y ~ x'

This graph illustrates the relationship between the total number of providers and total beneficiaries in the Medicare dataset. The scatter plot shows a positive linear relationship, with a moderate R-squared value of approximately 0.497. This indicates that as the number of providers increases, the number of beneficiaries tends to increase as well, although there is considerable variation around the trend line. The presence of outliers is notable, with some points far from the main cluster. The next part removes these outliers and repeats the analysis.

Part 2.2: Relationship Between Total Providers and Total Beneficiaries (Outliers Removed)

A second regression model is generated after removing extreme outliers, attempting to provide a better analysis of the underlying trends between beneficiaries and providers. The improved fit of the model is assessed by comparing R-squared values before and after filtering.

# Creating Linear Regression w/o Outliers
medicare_rgs_filtered <- lm(total_providers ~ total_beneficiaries, data = medicare_adjust)
summary_stats_filtered <- summary(medicare_rgs_filtered)
print(summary_stats_filtered)

## 
## Call:
## lm(formula = total_providers ~ total_beneficiaries, data = medicare_adjust)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -147172    -125    -111     -65  218391 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         1.328e+02  3.656e+00   36.34   <2e-16 ***
## total_beneficiaries 2.589e-02  5.767e-05  448.95   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1895 on 270165 degrees of freedom
## Multiple R-squared:  0.4273, Adjusted R-squared:  0.4273 
## F-statistic: 2.016e+05 on 1 and 270165 DF,  p-value: < 2.2e-16

This output presents the results of a linear regression of total providers on total beneficiaries in the Medicare study dataset after removing outliers. The model remains statistically significant (F-statistic: 2.016e+05, p-value: < 2.2e-16). The coefficient for total_beneficiaries (2.589e-02) is positive and highly significant (p < 2e-16): each additional beneficiary is associated with an average increase of about 0.02589 providers. The intercept (132.8) is the estimated number of providers when there are zero beneficiaries. The model explains approximately 42.73% of the variance in total providers (R-squared: 0.4273), a moderate fit, though lower than the 49.73% of the model with outliers. The range of residuals (Min: -147172, Max: 218391) is narrower than in the previous model (Min: -233213, Max: 258446) but still wide, which suggests that filtering reduced, without eliminating, the influence of extreme values. This filtered model likely provides a better representation of the relationship between providers and beneficiaries for the majority of the data points.

# Relationship Between Total Providers and Total Beneficiaries (Outliers Removed)
ggplot(medicare_adjust, aes(total_providers, total_beneficiaries)) +
  geom_point(alpha = 0.75, color = "navyblue", fill = "turquoise1", shape = 21, size = 2, stroke = 0.5) +
  geom_smooth(method = "lm", color = "black", se = FALSE) +
  labs(
    title = "Relationship Between Total Providers and Total Beneficiaries (Outliers Removed)",
    x = "Total Providers",
    y = "Total Beneficiaries",
    caption = paste("R-squared:", round(summary_stats_filtered$r.squared, 3))
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(size = 10)
  ) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::comma)

## `geom_smooth()` using formula = 'y ~ x'

After removing outliers, this graph displays a more evenly spread linear relationship between total beneficiaries and total providers. The R-squared value decreases to about 0.427, indicating that while the relationship remains positive, the outliers were inflating the apparent strength of the correlation in the original dataset. This graph provides a more representative view of the typical relationship between providers and beneficiaries, showing how the majority of data points relate to each other without the skewing effect of extreme values.
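One plausible reason the R-squared falls after filtering is that a few extreme points lying along the trend can dominate the explained variance. The sketch below reproduces that effect on simulated data only. The percentile cut-off is an assumed filter chosen for illustration and is not the rule used to build `medicare_adjust`.

```r
set.seed(42)

# Bulk of the data: a noisy linear trend (all values are simulated)
x <- runif(1000, 0, 5000)
y <- 130 + 0.025 * x + rnorm(1000, sd = 60)

# A few extreme points far out along the same trend
x_out <- c(4e5, 8e5, 1.2e6)
y_out <- 130 + 0.025 * x_out + rnorm(3, sd = 60)

full <- data.frame(x = c(x, x_out), y = c(y, y_out))

# Assumed filter for illustration only: drop rows above the 99.5th percentile of x
trimmed <- full[full$x <= quantile(full$x, 0.995), ]

r2_full <- summary(lm(y ~ x, data = full))$r.squared
r2_trimmed <- summary(lm(y ~ x, data = trimmed))$r.squared
print(c(full = r2_full, trimmed = r2_trimmed))
```

A lower R-squared after trimming therefore does not by itself mean the filtered model is worse; it can simply reflect the loss of a few high-leverage rows.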

Part 2.3: Distribution of Beneficiary to Provider Ratio (Log Scale)

The beneficiary-to-provider ratio is analyzed and visualized on a logarithmic scale to better interpret variations across different locations. This highlights disparities in healthcare access and identifies outlier regions where provider shortages or surpluses exist.

# Computing the Beneficiary-to-Provider Ratio for Each Row
medicare_adjust$beneficiary_provider_ratio <- 
  medicare_adjust$total_beneficiaries / medicare_adjust$total_providers

# Calculate Mean Ratio and Print
mean_ratio <- mean(medicare_adjust$beneficiary_provider_ratio, na.rm = TRUE)
cat("Mean Beneficiary to Provider Ratio:", round(mean_ratio, 2), "\n")

## Mean Beneficiary to Provider Ratio: 23.49

This segment calculates the ratio of beneficiaries to providers for each entry in the adjusted Medicare dataset (with outliers removed) and then computes the average ratio across all entries. The resulting mean ratio of 23.49 indicates that, on average, there are approximately 23.49 beneficiaries for every provider in an entry. This metric provides a quick snapshot of the overall distribution of beneficiaries among providers, suggesting that each provider typically serves about 23 to 24 Medicare beneficiaries. The ratio can be useful for understanding the general patient load per provider in the Medicare system. Because it is an average of row-level ratios, individual cases may vary significantly.
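The mean of row-level ratios weights every row equally, so it can differ from the pooled ratio of total beneficiaries to total providers, and it is sensitive to extreme rows. The sketch below contrasts the three summaries on a toy data frame. The numbers are made up for illustration and do not come from the CMS file.

```r
# Toy rows (made-up numbers, not from the CMS file)
toy <- data.frame(
  total_beneficiaries = c(100, 2000, 50, 900),
  total_providers     = c(10, 40, 25, 30)
)
toy$ratio <- toy$total_beneficiaries / toy$total_providers

# Mean of row-level ratios: each row counts equally
mean_of_ratios <- mean(toy$ratio)

# Pooled ratio: all beneficiaries divided by all providers
pooled_ratio <- sum(toy$total_beneficiaries) / sum(toy$total_providers)

# Median of row-level ratios: less sensitive to extreme rows
median_ratio <- median(toy$ratio)

print(c(mean = mean_of_ratios, pooled = pooled_ratio, median = median_ratio))
```

Reporting the median alongside the mean is one way to show how much a few extreme rows influence the headline figure.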

# Distribution of Beneficiary to Provider Ratio (Log Scale)
ggplot(medicare_adjust, aes(x = beneficiary_provider_ratio)) +
  geom_histogram(binwidth = 0.1, fill = "aquamarine", color = "aquamarine4") +
  scale_x_log10(labels = scales::comma) +  # Apply log10 transformation
  labs(title = "Distribution of Beneficiary to Provider Ratio (Log Scale)",
       x = "Beneficiaries per Provider (Log Scale)",
       y = "Frequency") +
  theme_bw() +
  facet_wrap(~place_of_service) +
  coord_cartesian(xlim = c(1, 500), ylim = c(1, 25000))

This visualization presents the distribution of the beneficiary-to-provider ratio on a log scale, faceted by place of service (labeled ‘F’ for facility and ‘O’ for non-facility settings such as offices). The histogram for place of service ‘F’ shows a high frequency of ratios clustered toward the lower end, indicating that many providers serve a relatively small number of beneficiaries. As the ratio increases (moving right along the x-axis), the frequency decreases sharply. The histogram for place of service ‘O’ is also concentrated toward the lower end of the axis, but with a less pronounced peak. The logarithmic scale makes the distribution readable across a wide range of ratios.
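Because the histogram is drawn on a log10 x-axis, `binwidth = 0.1` is measured in log10 units, so each bin spans a constant multiplicative factor rather than a constant number of beneficiaries per provider. The short sketch below works out that factor and the resulting spacing of bin edges. It illustrates the spacing only and does not reproduce ggplot2's exact bin boundaries.

```r
# Same binwidth value passed to geom_histogram() in the plot
binwidth_log10 <- 0.1

# Multiplicative width of one bin in original units (about 1.26)
bin_factor <- 10^binwidth_log10

# Evenly spaced edges in log10 units between 1 and 500, converted back to ratios
edges <- 10^seq(0, log10(500), by = binwidth_log10)

print(bin_factor)
print(round(head(edges, 4), 2))
```

Each bin therefore covers a range about 26% wider than the one before it, which is why bars at the right of the plot represent much broader intervals of the ratio.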

Part 3: T-Test Analysis

The following t-tests compare Medicare service metrics across different provider geographic levels to determine if there are statistically significant differences. This helps assess whether provider location impacts payment amounts and service utilization.

T-Test: Average Submitted Charge vs. Provider Geographic Level

# Performing t-test for Average Submitted Charge vs. Provider Geographic Level
t.test(average_submitted_charge ~ provider_geo_level, data = medicare)

## 
##  Welch Two Sample t-test
## 
## data:  average_submitted_charge by provider_geo_level
## t = 21.249, df = 14617, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group National and group State is not equal to 0
## 95 percent confidence interval:
##  449.1008 540.3763
## sample estimates:
## mean in group National    mean in group State 
##               1780.090               1285.352

This t-test compares the average submitted charge between national and state-level providers. The results show a statistically significant difference (t = 21.249, p-value < 2.2e-16) between the two groups. National providers have a higher mean submitted charge ($1780.09) compared to state providers ($1285.35), with a difference of approximately $494.74 (95% CI: $449.10 to $540.38).
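For readers who want to see where the t statistic and the fractional degrees of freedom come from, the sketch below computes the Welch test by hand and compares it with `t.test()`. The data are simulated with arbitrary means and standard deviations, so the numbers will not match the Medicare output.

```r
set.seed(7)

# Simulated charges for two groups with unequal sizes and variances
national <- rnorm(200, mean = 1800, sd = 900)
state <- rnorm(1500, mean = 1300, sd = 500)

# Squared standard error of each group mean
v1 <- var(national) / length(national)
v2 <- var(state) / length(state)

# Welch t statistic: difference in means over the combined standard error
t_manual <- (mean(national) - mean(state)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(national) - 1) + v2^2 / (length(state) - 1))

# Built-in test; var.equal = FALSE (Welch) is the default
fit <- t.test(national, state)
print(c(t_manual = t_manual, t_builtin = unname(fit$statistic)))
print(c(df_manual = df_manual, df_builtin = unname(fit$parameter)))
```

The non-integer degrees of freedom in the outputs of this section arise from this approximation, which avoids assuming equal variances in the two groups.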

T-Test: Average Aid Amount Charge vs. Provider Geographic Level

# Performing t-test for Average Aid Amount Charge vs. Provider Geographic Level
t.test(average_medicare_payment_amount ~ provider_geo_level, data = medicare)

## 
##  Welch Two Sample t-test
## 
## data:  average_medicare_payment_amount by provider_geo_level
## t = 16.985, df = 14571, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group National and group State is not equal to 0
## 95 percent confidence interval:
##   90.75193 114.43107
## sample estimates:
## mean in group National    mean in group State 
##               329.7900               227.1985

This analysis examines the difference in average Medicare payment amounts between national and state-level providers. The test reveals a statistically significant difference (t = 16.985, p-value < 2.2e-16). National providers receive higher average Medicare payments ($329.79) compared to state providers ($227.20), with a mean difference of about $102.59 (95% CI: $90.75 to $114.43).

T-Test: Total Services vs. Provider Geographic Level

# Performing t-test for Total Services vs. Provider Geographic Level
t.test(total_services ~ provider_geo_level, data = medicare)

## 
##  Welch Two Sample t-test
## 
## data:  total_services by provider_geo_level
## t = 9.9498, df = 13326, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group National and group State is not equal to 0
## 95 percent confidence interval:
##  186513.2 278029.3
## sample estimates:
## mean in group National    mean in group State 
##              244433.90               12162.67

To display the relationship between these two variables as a bar graph, the t-test results must first be saved to a variable.

# Storing t-test Results: Total Services vs. Provider Geographic Level for Plotting
t_test_results <- t.test(total_services ~ provider_geo_level, data = medicare)

The data are then summarized by geographic level: the mean of total services is computed for each group, along with its standard error and the lower and upper bounds of a 95% confidence interval, which are displayed in the bar graph.

# Summarizing t-test: Total Services vs. Provider Geographic Level
summary_data <- medicare %>%
  group_by(provider_geo_level) %>%
  summarise(
    mean = mean(total_services),
    se = sd(total_services) / sqrt(n()),
    ci_lower = mean - qt(0.975, df = n() - 1) * se,
    ci_upper = mean + qt(0.975, df = n() - 1) * se
  )
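The interval built in the summary above is the standard one-sample t interval for a group mean. As a sanity check, the sketch below applies the same formula to simulated values and compares it with the interval returned by `t.test()`. The helper `ci_for` and the simulated vector are hypothetical and used only for illustration.

```r
set.seed(3)

# Simulated service counts for one group (illustrative values only)
services <- rexp(30, rate = 1 / 2000)

# Hypothetical helper: 95% t interval for a mean, same formula as the summary above
ci_for <- function(x) {
  se <- sd(x) / sqrt(length(x))
  mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * se
}

manual_ci <- ci_for(services)
builtin_ci <- as.numeric(t.test(services)$conf.int)
print(rbind(manual = manual_ci, builtin = builtin_ci))
```

These per-group intervals describe each mean separately. They are not the same quantity as the 95% confidence interval for the difference in means reported by the Welch test.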

After summarizing and storing the data, the following snippet plots a bar graph showing the difference between the National and State levels of Medicare.

# Creating Gradient: Total Services vs. Provider Geographic Level
gradient_fill <- scale_fill_gradient(low = "royalblue1", high = "brown2")

suppressWarnings({
# Plotting Data: Total Services vs. Provider Geographic Level
ggplot(summary_data, aes(x = provider_geo_level, y = mean, fill = mean)) +
  geom_bar(stat = "identity", position = position_dodge(),
           color = "black", linewidth = 0.5) + # Black outline; linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2, position = position_dodge(0.9)) +
  labs(
    title = "Comparison of Total Services by Provider Geographic Level",
    subtitle = paste("t =", round(t_test_results$statistic, 2), 
                     ", df =", round(t_test_results$parameter, 2),
                     ", p-value =", format.pval(t_test_results$p.value, digits = 3)),
    x = "Provider Geographic Level",
    y = "Mean Total Services"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +
  gradient_fill +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "none",
    panel.grid.major = element_line(color = "gray90"),
    panel.grid.minor = element_line(color = "gray95")
  )
})

This t-test compares the total services provided by national and state-level providers. The results indicate a statistically significant difference (t = 9.9498, p-value < 2.2e-16). National providers offer substantially more services on average (244,433.90) compared to state providers (12,162.67), with a mean difference of approximately 232,271 services (95% CI: 186,513 to 278,029).

T-Test: Summary of Results

These t-tests consistently show statistically significant differences between national and state-level providers across all three metrics: average submitted charges, average Medicare payment amounts, and total services provided. National providers show higher values for all three measures. The most pronounced difference is in total services provided, where national providers average about 20 times as many services as state providers. These results demonstrate that the geographic level of the provider (national vs. state) is strongly associated with differences in pricing, payment, and service volume in the Medicare system.
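The size of each gap can be read directly from the sample estimates printed by the three t-tests. The sketch below copies those group means into a small data frame and derives the absolute difference and the national-to-state ratio. The data frame and its column names are introduced here for illustration only.

```r
# Group means copied from the "sample estimates" of the three t-tests above
estimates <- data.frame(
  metric   = c("average_submitted_charge",
               "average_medicare_payment_amount",
               "total_services"),
  national = c(1780.090, 329.7900, 244433.90),
  state    = c(1285.352, 227.1985, 12162.67)
)

# Absolute gap and relative size of the national mean versus the state mean
estimates$difference <- estimates$national - estimates$state
estimates$ratio <- estimates$national / estimates$state

print(estimates)
```

Viewed as ratios, the two dollar metrics differ by well under a factor of two, while total services differ by a factor of about 20, which is why service volume stands out in the summary.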

Conclusion: Implications of 2022 Medicare Provider Data for Healthcare Policy

Introduction: Understanding Medicare Physician and Practitioner Services

The objective of this study was to examine patterns in Medicare care, the costs incurred, and differences in the compensation received by care providers in different geographic areas. The data collected by the Centers for Medicare & Medicaid Services provided a firm foundation for examining healthcare utilization and costs across regions. The preliminary analyses showed gaps between the reimbursements made by Medicare and the charges submitted by care providers, which indicated a possible financial strain on beneficiaries. Reformatting the dataset for greater clarity and consistency in presentation established a firm analytical basis for assessing the impact of Medicare on care access and policy concerns.

Visualizations: Mapping Medicare Service Distributions and Aid Across Geographies

The visual analysis of Medicare service distributions revealed patterns in aid provision and differences in payment. The scatter plots and box plots showed that Medicare typically reimbursed about 20% of total charges, although there were occasional instances of full reimbursement. Geographic differences were also found: higher-cost states, such as New York and California, showed larger gaps between the amounts charged by practitioners and the amounts reimbursed by Medicare. The state-by-state breakdown, by contrast, showed that some regions, especially in the Northeast, had a greater number of services per patient. In addition, some states with large numbers of Medicare beneficiaries had fewer services than would be expected from their population. The results point to the need to consider regional differences in medical infrastructure and policy when assessing the efficiency of Medicare.

Dataset Manipulations: Refining Medicare Provider Data for 2022

To increase this study’s validity, several data manipulation strategies were used, such as filtering provider locations to focus solely on the 50 U.S. states and aggregating the data into state-level service metrics. Mapping Medicare aid and payment data clarified differences in coverage between regions. Interestingly, despite its extreme levels of Medicare aid, California also reported some of the largest physician-submitted charges, which supports the trend of partial cost coverage. Areas with lower average submitted amounts and fewer services rendered per beneficiary, especially in parts of the Midwest, may face barriers to accessing care or a lack of familiarity with Medicare among beneficiaries. The subsetted and filtered datasets allowed a more detailed and nuanced understanding of the pattern of Medicare aid.

Statistical Analysis: Trends in Medicare Physician Services and Reimbursements

Statistical evaluations showed substantial correlations between physician-submitted charges and Medicare reimbursement, although non-linear trends influenced by outliers were evident. Regression analysis showed a positive association between the number of beneficiaries and the number of providers, though with wide variability between states. T-tests comparing providers at the national level and the state level showed statistically significant differences in submitted charges, Medicare payment amounts, and total services. Providers at the national level receive greater compensation on average. The outcomes point toward systemic inequalities in Medicare resource provision, which may necessitate adjustments in reimbursement systems for greater equality between different levels of physicians.

Next Steps: Future Research and Policy Considerations

This study sought to identify substantial patterns in Medicare provision, cost variability, and reimbursement differences between geographic regions for medical practitioners. Future studies may examine how medical specialties contribute to these differences and analyze the lasting influence of Medicare policy on access to medical care. One limitation of the dataset was the lack of co-insurance data for beneficiaries, which made it difficult to determine whether Medicare coverage was low because outside coverage was expected to make up the difference. In addition, a study of how supplemental coverage contributes to financing care alongside Medicare may provide insights into patient financing practices. Policy-making may be enhanced by examining ways of simplifying reimbursement systems, ultimately reducing disparities in physician compensation and Medicare financing, which, in turn, can increase care access throughout the country. Continued study is aimed at further understanding Medicare’s critical position in healthcare provision in the United States.

This study of Medicare services, providers, and fees in 2022 highlights trends and anomalies that impact the healthcare landscape for senior Americans. Understanding these patterns is important for informing policy decisions, improving service delivery, and bridging gaps between Medicare and the public. Thank you for your time and dedication to exploring this data with me!

Impacts of Medicare Services on Americans: A 2022 Analysis of Trends and Anomalies

Crafted by Sarah Akhtar at University of the Pacific

Date: February 20th, 2025
