Home-based services encompass home health visits, remote patient monitoring, and delivery of clinical care within patients’ homes. Home healthcare utilization has become more common among complex and chronically ill Medicare populations. As healthcare systems shift to value-based models of care delivery, providers and policymakers want to understand whether greater utilization of home-based services is associated with lower use of higher-cost acute care services, including hospitalizations and emergency department visits.
Many Medicare beneficiaries experience multiple chronic conditions, functional limitations, and complex medication regimens that necessitate regular clinical monitoring. Historically, hospital-based care and emergency department visits have been used to treat acute-on-chronic illness. These care settings are expensive and resource intensive, and they are associated with negative outcomes, including hospital-acquired complications such as infections, functional decline, and preventable readmissions.
Home health is a means of delivering skilled clinical services to patients within the comfort of their own homes. These services range from skilled nursing visits to therapy services, medication reconciliation, chronic disease monitoring, and care coordination. Home health services may allow for clinical intervention at earlier stages of illness and closer follow-up to prevent acute clinical decline and avoidable hospitalization.
Professional Motivation and Topic Selection
I chose this topic because of increasing policy interest in home-based care models and because of my direct professional experience analyzing healthcare utilization at the population level. In my work with home-based clinical service programs, including House Calls–style programs serving high-risk Medicare beneficiaries, there is ongoing interest in whether higher-intensity home-based services are associated with improved downstream utilization outcomes.
Organizations often analyze cost and utilization trends internally after implementing a program like this, but it is also informative to examine these relationships in publicly available national data. CMS public use data make it possible to test whether the relationships hold at the county and state levels across the country.
If we, as healthcare organizations, payers, and policymakers, are looking to design care models that improve outcomes while reducing costs, it is critical to understand how the different components of utilization relate to one another. If higher utilization of home health care is associated with lower acute care utilization, that would support continued investment in home-based care models for Medicare beneficiaries with complex medical needs.
Research Question
Is greater home health utilization associated with lower acute care utilization (i.e., inpatient admissions and emergency department use) after controlling for demographic and geographic differences?
Identifying these connections can support better policy development, resource management, and design of value-based care models for high-risk Medicare populations.
Primary Hypothesis
Higher home health utilization is associated with lower inpatient hospitalization rates across geographic regions.
Secondary Hypotheses
• Higher home health utilization is associated with lower emergency department utilization rates.
• Higher home health service intensity is associated with variation in total Medicare spending patterns.
• Demographic characteristics of beneficiary populations may influence observed utilization relationships.
This project uses publicly available datasets from the Centers for Medicare & Medicaid Services (CMS) to evaluate geographic variation in healthcare utilization patterns across the United States.
Dataset 1
CMS Medicare Geographic Variation Dataset
The dataset contains county-level data about geographic variation in Medicare utilization, expenditures, and beneficiary populations. Variables include measures of inpatient utilization, ED visits, home health utilization, total Medicare expenditures, and Medicare beneficiary population.
Key variables include:
• Home Health Visits per 1,000 Beneficiaries
• Inpatient Covered Stays per 1,000 Beneficiaries
• Emergency Department Events per 1,000 Beneficiaries
• Medicare Beneficiary Counts
This dataset allows population-adjusted comparisons across geographic regions.
Dataset 2
CMS Provider-Level Home Health Dataset
This dataset provides provider-level information on service intensity and cost structure associated with home-based care delivery.
Key variables include:
• Total Medicare Payments
• Total Provider Charges
• Total Service Days
• Average Beneficiary Age
• Beneficiary Gender Distribution
This dataset provides insight into utilization intensity and cost concentration patterns within home-based care delivery.
Data Integration Strategy
These two datasets will be used to analyze utilization patterns at a macro level (i.e., across geographies) and at a micro level (e.g., intensity of service use). State and county identifiers will be included where available to support cross-sectional comparisons.
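As a minimal sketch of how the two levels might be linked, the example below rolls toy provider-level records up to the state level and joins them to toy geographic rates. The `STATE` column and every value shown are hypothetical placeholders for illustration; the actual identifier columns should be taken from the CMS data dictionaries.

```r
# Toy stand-ins for the two CMS files; STATE and all values are hypothetical.
provider_df <- data.frame(
  STATE = c("TX", "TX", "FL", "FL", "FL"),
  TOT_SRVC_DAYS = c(1200, 800, 1500, 300, 700),
  TOT_MDCR_PYMT_AMT = c(250000, 150000, 310000, 60000, 140000)
)
state_df <- data.frame(
  STATE = c("TX", "FL"),
  IP_CVRD_STAYS_PER_1000_BENES = c(240, 255)
)

# Roll provider-level records up to one row per state.
provider_by_state <- aggregate(
  cbind(TOT_SRVC_DAYS, TOT_MDCR_PYMT_AMT) ~ STATE,
  data = provider_df,
  FUN = sum
)

# Join the provider summary onto the geographic utilization rates.
linked_df <- merge(state_df, provider_by_state, by = "STATE")
linked_df
```

Aggregating before joining keeps one row per geography, which avoids duplicating the geographic rates across providers.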
Data Cleaning and Preparation
Data cleaning and preparation will be conducted using R statistical software. Numeric variables stored as character strings will be converted to numeric format. Missing values will be removed to ensure analytic consistency. Variables will be standardized to per-1,000 beneficiary rates to enable cross-region comparison.
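The cleaning steps above can be illustrated on a small toy table. This is a sketch only: `HH_VISITS` is a hypothetical raw-count column, the values are invented, and the `"*"` cell stands in for a suppressed value.

```r
# Toy raw data; HH_VISITS is a hypothetical column and "*" mimics a suppressed cell.
raw_df <- data.frame(
  county = c("A", "B", "C"),
  HH_VISITS = c("12,500", "*", "4,200"),
  BENES_FFS_CNT = c("5,000", "900", "2,000"),
  stringsAsFactors = FALSE
)

# Strip formatting characters; cells left empty become NA.
to_number <- function(x) suppressWarnings(as.numeric(gsub("[^0-9.-]", "", x)))

raw_df$HH_VISITS <- to_number(raw_df$HH_VISITS)
raw_df$BENES_FFS_CNT <- to_number(raw_df$BENES_FFS_CNT)

# Drop rows with missing inputs, then express visits per 1,000 beneficiaries.
rate_df <- raw_df[complete.cases(raw_df), ]
rate_df$HH_VISITS_PER_1000 <- rate_df$HH_VISITS / rate_df$BENES_FFS_CNT * 1000
rate_df
```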
Multivariable regression models will be used to adjust for potential confounders such as beneficiary age distribution, gender composition, and regional healthcare spending variation.
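The adjustment described above might look like the following sketch, which fits an adjusted linear model on simulated county-level data. The variable names (`hh_visits`, `avg_age`, `pct_female`, `ip_stays`) and the simulated effect sizes are assumptions for illustration, not estimates from the CMS files.

```r
set.seed(42)
n <- 200

# Simulated county-level data; all columns and effect sizes are hypothetical.
sim_df <- data.frame(
  hh_visits = rnorm(n, mean = 3000, sd = 600),
  avg_age = rnorm(n, mean = 75, sd = 2),
  pct_female = rnorm(n, mean = 55, sd = 3)
)
sim_df$ip_stays <- 300 - 0.02 * sim_df$hh_visits +
  2 * (sim_df$avg_age - 75) + rnorm(n, sd = 15)

# Inpatient stays regressed on home health visits, adjusting for age and sex mix.
adjusted_fit <- lm(ip_stays ~ hh_visits + avg_age + pct_female, data = sim_df)
coef(summary(adjusted_fit))
```

The coefficient on `hh_visits` is then read as the change in inpatient stays per 1,000 beneficiaries for each additional home health visit per 1,000, holding the other covariates fixed.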
Exploratory Data Analysis
Exploratory data analysis will be conducted to examine distribution patterns and identify potential outliers. Distributional visualizations will be used to evaluate cost concentration and utilization variability across providers and regions.
Initial exploratory analyses include:
• Distribution of Medicare Payments
• Distribution of Total Service Days
• Distribution of Home Health Visits per 1,000 Beneficiaries
Medicare payment distributions were examined to understand cost concentration across providers. Healthcare cost data typically demonstrates strong right skew.
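One way to put a number on right skew is to compare the mean with the median and compute a moment-based skewness before and after a log transform. The sketch below uses simulated log-normal payments, not the CMS data, and the `skewness` helper is defined here for illustration.

```r
set.seed(1)

# Simulated payments from a log-normal distribution (illustrative only).
payments <- rlnorm(1000, meanlog = 12, sdlog = 1)

# Moment-based sample skewness: the mean of cubed standardized values.
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

raw_skew <- skewness(payments)
log_skew <- skewness(log(payments))

c(mean = mean(payments), median = median(payments),
  raw_skew = raw_skew, log_skew = log_skew)
```

A mean well above the median and a large positive skewness on the raw scale, with skewness near zero after logging, is the pattern that motivates log-scale plots and models.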
Statistical Methods
The following statistical methods will be used to evaluate relationships between home health utilization and acute care utilization outcomes:
• Correlation analysis to evaluate initial relationships between utilization metrics
• Linear regression modeling to evaluate strength and direction of associations
• Log transformation of cost variables to address right-skewed healthcare spending distributions
• Distribution and outlier analysis to evaluate cost and utilization concentration patterns
If significant relationships are observed, additional modeling approaches such as generalized linear models or hierarchical geographic models may be explored to further evaluate regional variation in utilization patterns.
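As one possible form of the generalized linear model mentioned above, the sketch below fits a Gamma regression with a log link, a common choice for positive, right-skewed cost outcomes. The data are simulated and the variable names (`service_days`, `payment`) are assumptions, so this illustrates the technique rather than the project's final model.

```r
set.seed(7)
n <- 300

# Simulated provider data; payment means grow with service days (hypothetical).
glm_df <- data.frame(service_days = runif(n, 100, 5000))
mu <- exp(9 + 0.0004 * glm_df$service_days)
glm_df$payment <- rgamma(n, shape = 2, rate = 2 / mu)

# Gamma GLM with log link: coefficients act multiplicatively on mean payment.
gamma_fit <- glm(payment ~ service_days,
                 family = Gamma(link = "log"),
                 data = glm_df)

# Multiplicative change in expected payment per 1,000 additional service days.
exp(coef(gamma_fit)["service_days"] * 1000)
```

Unlike a linear model on logged payments, this approach models the mean on the original dollar scale, which avoids retransformation when predicting spending.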
# tidyverse attaches readr, dplyr, tidyr, and ggplot2, so they need not be loaded separately
library(tidyverse)
library(janitor)
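# Provider-level home health file
hh <- read.csv("/Users/ulianaplotnikova/Downloads/HH/2023/HH.csv")
# Strip currency symbols, commas, and suppression markers, keeping digits,
# decimal points, and minus signs, then convert to numeric
clean_numeric <- function(x) {
  x <- gsub("[^0-9.-]", "", x)
  as.numeric(x)
}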
hh_clean <- hh %>%
  mutate(
    TOT_MDCR_PYMT_AMT = clean_numeric(TOT_MDCR_PYMT_AMT),
    TOT_CHRG_AMT = clean_numeric(TOT_CHRG_AMT),
    TOT_SRVC_DAYS = clean_numeric(TOT_SRVC_DAYS),
    BENE_AVG_AGE = clean_numeric(BENE_AVG_AGE),
    BENE_FEML_PCT = clean_numeric(BENE_FEML_PCT)
  ) %>%
  drop_na(
    TOT_MDCR_PYMT_AMT,
    TOT_CHRG_AMT,
    TOT_SRVC_DAYS,
    BENE_AVG_AGE,
    BENE_FEML_PCT
  )
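# Geographic Variation file, restricted to county-level rows
geo <- read.csv("/Users/ulianaplotnikova/Downloads/Geo/2023/Geo.csv")
geo_county <- geo %>%
  filter(BENE_GEO_LVL == "County")
# Keep identifiers and the utilization rates used in the analysis.
# Check the CMS data dictionary to confirm that EM_EVNTS_PER_1000_BENES
# is the intended emergency department measure.
analysis_df <- geo_county %>%
  select(
    YEAR,
    BENE_GEO_DESC,
    BENES_FFS_CNT,
    HH_VISITS_PER_1000_BENES,
    IP_CVRD_STAYS_PER_1000_BENES,
    EM_EVNTS_PER_1000_BENES
  )
# Convert rates to numeric and drop counties with missing values
analysis_df <- analysis_df %>%
  mutate(
    HH_VISITS_PER_1000_BENES = as.numeric(HH_VISITS_PER_1000_BENES),
    IP_CVRD_STAYS_PER_1000_BENES = as.numeric(IP_CVRD_STAYS_PER_1000_BENES),
    EM_EVNTS_PER_1000_BENES = as.numeric(EM_EVNTS_PER_1000_BENES)
  ) %>%
  drop_na(
    HH_VISITS_PER_1000_BENES,
    IP_CVRD_STAYS_PER_1000_BENES,
    EM_EVNTS_PER_1000_BENES
  )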
ggplot(hh_clean, aes(x = TOT_MDCR_PYMT_AMT)) +
  geom_histogram(bins = 60, fill = "steelblue", color = "white") +
  scale_x_log10() +
  theme_minimal() +
  labs(
    title = "Distribution of Medicare Payments (Log Scale)",
    x = "Medicare Payment Amount (Log Scale)",
    y = "Count"
  )
The distribution demonstrates strong right skew, consistent with known healthcare expenditure concentration patterns.
To evaluate variation in service intensity across providers, the distribution of total service days was examined. Service days represent the total number of days home health services were delivered and serve as a proxy for utilization intensity.
ggplot(hh_clean, aes(x = TOT_SRVC_DAYS)) +
  geom_histogram(bins = 60, fill = "darkgreen", color = "white") +
  scale_x_log10() +
  labs(
    title = "Distribution of Total Service Days (Log Scale)",
    x = "Service Days (Log Scale)",
    y = "Count"
  ) +
  theme_minimal()
The distribution demonstrates substantial variability in service utilization across providers. The right-skewed distribution suggests that a small number of providers deliver very high service volumes, consistent with known healthcare utilization concentration patterns.
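The concentration described above can be summarized as the share of total service days delivered by the highest-volume providers. The sketch below uses simulated service days and an arbitrary 10% cutoff, both chosen for illustration only.

```r
set.seed(3)

# Simulated provider service days (illustrative, log-normal).
service_days <- rlnorm(500, meanlog = 7, sdlog = 1.2)

# Rank providers from highest to lowest volume.
sorted_days <- sort(service_days, decreasing = TRUE)

# Share of all service days delivered by the top 10% of providers.
top_n <- ceiling(0.10 * length(sorted_days))
top_share <- sum(sorted_days[seq_len(top_n)]) / sum(sorted_days)
top_share
```

With the real data, the same calculation could be run on `hh_clean$TOT_SRVC_DAYS` to state the concentration as a single percentage.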
To understand variation in home health utilization across geographic regions, a distribution analysis was conducted using home health visits per 1,000 Medicare beneficiaries. This visualization helps identify variability across counties and potential outliers in utilization patterns.
ggplot(analysis_df, aes(x = HH_VISITS_PER_1000_BENES)) +
  geom_histogram(
    bins = 40,
    fill = "steelblue",
    color = "white"
  ) +
  labs(
    title = "Distribution of Home Health Visits per 1,000 Beneficiaries",
    x = "Home Health Visits per 1,000 Beneficiaries",
    y = "Number of Counties"
  ) +
  theme_minimal()
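Potential outlier counties can also be flagged numerically. The sketch below applies the common 1.5 × IQR rule to a handful of invented county rates; the rule and the values are illustrative choices, not the project's committed method.

```r
# Hypothetical county home health rates per 1,000 beneficiaries.
hh_rate <- c(1800, 2100, 2300, 2500, 2600, 2800, 3000, 3200, 3400, 9500)

# Quartiles and interquartile range.
q <- unname(quantile(hh_rate, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 IQRs beyond the quartiles.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Counties falling outside the fences are flagged for review.
outliers <- hh_rate[hh_rate < lower | hh_rate > upper]
outliers
```

Flagged counties are candidates for inspection (for example, very small beneficiary counts) rather than automatic exclusion.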
To evaluate the primary hypothesis, the relationship between home health utilization and hospitalization utilization was examined using scatter plot visualization. This allows evaluation of whether higher home health utilization is associated with lower inpatient utilization rates.
Relationship Analysis
Scatterplot-based analyses will be conducted to evaluate relationships between:
• Home Health Utilization and Hospitalization Rates
• Home Health Utilization and Emergency Department Utilization
• Service Intensity and Medicare Payment Patterns
• Demographic Characteristics and Spending Variation
Linear regression models will be used to evaluate the strength and direction of observed relationships. The relationship between provider charges and Medicare reimbursement amounts was examined to understand reimbursement structure patterns.
ggplot(hh_clean, aes(x = TOT_CHRG_AMT, y = TOT_MDCR_PYMT_AMT)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Charges vs Medicare Payments",
    x = "Total Charges",
    y = "Medicare Payments"
  ) +
  theme_minimal()
A strong positive association is observed between provider charges and Medicare reimbursement amounts.
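The strength of this association can be stated numerically, for example with a correlation on the log scale and a typical payment-to-charge ratio. The sketch below uses simulated charges and assumes, purely for illustration, that payments are a noisy fraction of charges.

```r
set.seed(11)

# Simulated charges; payments are assumed to be 50-90% of charges (hypothetical).
charges <- rlnorm(400, meanlog = 13, sdlog = 1)
payments <- charges * runif(400, min = 0.5, max = 0.9)

# Correlation on the log scale, which is less driven by the largest providers.
log_cor <- cor(log(charges), log(payments))

# Typical share of charges that is reimbursed.
pay_ratio <- median(payments / charges)

c(log_cor = log_cor, pay_ratio = pay_ratio)
```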
ggplot(analysis_df,
       aes(
         x = HH_VISITS_PER_1000_BENES,
         y = IP_CVRD_STAYS_PER_1000_BENES
       )) +
  geom_point(alpha = 0.05, size = 0.5) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Home Health vs Hospital Stays",
    x = "Home Health Visits per 1,000 Beneficiaries",
    y = "Inpatient Covered Stays per 1,000 Beneficiaries"
  ) +
  theme_minimal()
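A scatter plot of this kind is usually paired with a formal correlation test. The sketch below runs Pearson and Spearman tests on simulated county rates; the column names `hh` and `ip` and the simulated negative relationship are assumptions for illustration, not findings.

```r
set.seed(5)
n <- 150

# Simulated county rates with a built-in negative relationship (hypothetical).
county_df <- data.frame(hh = runif(n, min = 1000, max = 5000))
county_df$ip <- 280 - 0.01 * county_df$hh + rnorm(n, sd = 20)

# Pearson measures linear association; Spearman is rank-based and
# less sensitive to outlying counties.
pearson_test <- cor.test(county_df$hh, county_df$ip, method = "pearson")
spearman_test <- cor.test(county_df$hh, county_df$ip, method = "spearman")

c(pearson = unname(pearson_test$estimate),
  spearman = unname(spearman_test$estimate))
```

On the real data the same calls would take `analysis_df$HH_VISITS_PER_1000_BENES` and `analysis_df$IP_CVRD_STAYS_PER_1000_BENES` as inputs.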
ggplot(hh_clean, aes(x = BENE_AVG_AGE, y = TOT_MDCR_PYMT_AMT)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", color = "purple") +
  scale_y_log10() +
  theme_minimal() +
  labs(
    title = "Average Age vs Medicare Spending",
    x = "Average Beneficiary Age",
    y = "Medicare Payments (Log)"
  )
The visualization suggests a potential association between beneficiary age and Medicare spending, although additional statistical modeling is required to quantify the strength of this relationship.
ggplot(hh_clean, aes(x = BENE_FEML_PCT, y = TOT_MDCR_PYMT_AMT)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", color = "orange") +
  scale_y_log10() +
  theme_minimal() +
  labs(
    title = "Female Percentage vs Medicare Spending",
    x = "% Female Beneficiaries",
    y = "Medicare Payments (Log)"
  )
Differences in spending patterns across beneficiary gender composition may reflect underlying differences in healthcare utilization patterns and disease prevalence.
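To move from visual impressions to estimates, the demographic relationships could be modeled jointly on the log scale. The sketch below regresses log payments on average age and percent female using simulated provider data; the column names mirror the provider file, but the values and effect sizes are invented.

```r
set.seed(21)
n <- 250

# Simulated provider-level demographics and payments (illustrative only).
prov_df <- data.frame(
  BENE_AVG_AGE = rnorm(n, mean = 77, sd = 3),
  BENE_FEML_PCT = rnorm(n, mean = 60, sd = 5)
)
prov_df$TOT_MDCR_PYMT_AMT <- exp(
  11 + 0.03 * (prov_df$BENE_AVG_AGE - 77) + rnorm(n, sd = 0.8)
)

# Log-linear model: coefficients approximate proportional changes in payments.
log_fit <- lm(log(TOT_MDCR_PYMT_AMT) ~ BENE_AVG_AGE + BENE_FEML_PCT,
              data = prov_df)

# Percent change in expected payment per one-unit increase in each covariate.
pct_change <- (exp(coef(log_fit)[-1]) - 1) * 100
pct_change
```

Fitting both covariates in one model separates the age association from the gender-composition association, which the individual scatter plots cannot do.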
Initial visualization results demonstrate several expected healthcare utilization patterns. Medicare payment distributions demonstrate right-skewed behavior, indicating cost concentration among a subset of providers. This pattern is consistent with known healthcare cost distribution trends.
Service day distributions demonstrate variability across providers, suggesting heterogeneity in service intensity. This may reflect differences in patient complexity, provider practice patterns, or regional care delivery models.
Preliminary scatterplot analysis suggests potential associations between home health utilization and acute care utilization outcomes. However, further statistical modeling is required to quantify these associations and assess their statistical significance; the observational design does not support causal inference.
Expected Findings
It is expected that higher home health utilization will be associated with lower hospitalization utilization rates. Increased home-based clinical monitoring may allow earlier intervention and prevent acute exacerbations requiring hospitalization.
It is also expected that higher home health utilization may be associated with reduced emergency department utilization. Improved care coordination and medication management may reduce avoidable emergency care utilization.
It is also possible that higher home health utilization is associated with higher total spending in certain high-risk populations. This may reflect appropriate resource allocation rather than inefficiency.
Policy and Healthcare System Implications
Should the results support the hypothesis, they would provide evidence for expanding home health as a care delivery model for Medicare beneficiaries. Results can be used to inform value-based care model design, care management program development, and allocation of resources for high-risk patients.
Limitations
Due to the observational nature of this analysis, causality cannot be determined. Regional variation may be due to unobserved differences across geographic regions such as provider supply, socioeconomic differences, and overall population health. Publicly available CMS data is reported at an aggregated level and does not account for patient-level clinical severity.
Home healthcare services are an integral part of our healthcare system and provide necessary services to high-needs Medicare beneficiaries. Understanding how home health utilization relates to acute care utilization is important for designing effective care models that reduce costs and improve the quality of care.
In this analysis, publicly available CMS data was used to examine the association between home health utilization and acute care utilization by geographic region. Results of this study can be used to inform policy and care delivery strategies.