Assignment 8
Modeling Healthcare Costs:
This dataset is from a public repository on GitHub. It contains information on 1,338 individuals, including demographic features (age, sex, region), health-related variables (BMI, number of children, smoking status), and the individual medical costs billed by health insurance. It is commonly used to model and predict healthcare expenses and to teach regression modeling with right-skewed outcomes, and it was included in Brett Lantz’s Machine Learning with R (2013). In a previous analysis, I used a Gamma regression model to predict medical costs (charges) from age, BMI, and smoking status. The Gamma distribution is suitable for positive, continuous, right-skewed data such as medical costs. This time I fit the same model but summarize it with the modelsummary() function, which presents the results in a more organized and concise table.
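For reference, the Gamma model with a log link described above can be written out as an equation. This is a sketch in my own notation: the symbols $\beta_0, \dots, \beta_3$ and the subscript $i$ (one individual) are introduced here for illustration, and $\text{smoker}_i$ is assumed to be coded 1 for smokers and 0 otherwise.

$$
\log \mathbb{E}[\text{charges}_i] = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{bmi}_i + \beta_3\,\text{smoker}_i
$$

Because the link is a logarithm, a one-unit increase in a predictor multiplies the expected charges by $e^{\beta_j}$, holding the other predictors constant.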
# Load the data
insurance <- read.csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")
head(insurance)
## age sex bmi children smoker region charges
## 1 19 female 27.900 0 yes southwest 16884.924
## 2 18 male 33.770 1 no southeast 1725.552
## 3 28 male 33.000 3 no southeast 4449.462
## 4 33 male 22.705 0 no northwest 21984.471
## 5 32 male 28.880 0 no northwest 3866.855
## 6 31 female 25.740 0 no southeast 3756.622
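Before fitting, it may be worth checking the two properties that motivate a Gamma model: charges should be strictly positive and right-skewed. The chunk below is a minimal sketch of such a check, not part of the original analysis; it reloads the same CSV so that it runs on its own.

```r
# Reload the data so this chunk is self-contained
insurance <- read.csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")

# A Gamma model requires a strictly positive outcome
all_positive <- all(insurance$charges > 0)

# In a right-skewed distribution the mean sits above the median
mean_charges   <- mean(insurance$charges)
median_charges <- median(insurance$charges)

print(all_positive)
print(c(mean = mean_charges, median = median_charges))

# A histogram makes the long right tail visible
hist(insurance$charges, breaks = 40,
     main = "Distribution of charges", xlab = "charges")
```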
library(modelsummary)
# Convert "smoker" into a binary factor
insurance$smoker <- as.factor(insurance$smoker)
# Fit the Gamma Regression Model
m_gama <- glm(charges ~ age + bmi + smoker,
              family = Gamma(link = "log"),
              data = insurance)
# Display the model summary
modelsummary(m_gama, output = "markdown")
|             | (1)        |
|-------------|------------|
| (Intercept) | 7.445      |
|             | (0.105)    |
| age         | 0.028      |
|             | (0.001)    |
| bmi         | 0.012      |
|             | (0.003)    |
| smokeryes   | 1.471      |
|             | (0.046)    |
| Num.Obs.    | 1338       |
| AIC         | 26448.3    |
| BIC         | 26474.3    |
| Log.Lik.    | -13219.142 |
| F           | 488.741    |
| RMSE        | 7443.59    |
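The coefficients in this table are on the log scale. As a possible alternative, modelsummary() can report them on the multiplicative scale directly. The sketch below assumes the `exponentiate` and `statistic` arguments behave as described in the modelsummary documentation; the exact table layout may differ between package versions. It refits the same model so that it runs on its own.

```r
library(modelsummary)

# Reload the data and refit the same Gamma model so this chunk is self-contained
insurance <- read.csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")
insurance$smoker <- as.factor(insurance$smoker)

m_gama <- glm(charges ~ age + bmi + smoker,
              family = Gamma(link = "log"),
              data = insurance)

# exponentiate = TRUE reports exp(coefficient), i.e. multiplicative effects;
# statistic = "conf.int" shows confidence intervals instead of standard errors
modelsummary(m_gama,
             exponentiate = TRUE,
             statistic = "conf.int",
             output = "markdown")
```

With this option, each reported value is the factor by which expected charges are multiplied for a one-unit increase in the predictor.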
This table shows that for each additional year of age, the log of expected charges increases by 0.028, holding BMI and smoking status constant. Since exp(0.028) ≈ 1.028, expected charges increase by about 2.8% per year of age.
For each one-unit increase in BMI, expected charges increase by about 1.2% (exp(0.012) ≈ 1.012), holding age and smoking status constant.
Smokers have much higher expected charges. exp(1.471) ≈ 4.35, meaning that the expected charges for a smoker are about 4.35 times those of a non-smoker of the same age and BMI.
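The arithmetic behind these interpretations can be reproduced from the coefficients reported in the table. This is a small sketch using the rounded values as printed, so results may differ slightly from those computed from the fitted model object; the variable names are my own.

```r
# Rounded coefficients copied from the model table (log scale)
coefs <- c(age = 0.028, bmi = 0.012, smokeryes = 1.471)

# Multiplicative effect on expected charges per one-unit increase
ratio <- exp(coefs)

# The same effect expressed as a percentage change
pct_change <- 100 * (ratio - 1)

print(round(data.frame(coefficient = coefs,
                       ratio = ratio,
                       pct_change = pct_change), 3))

# Effects compound multiplicatively: ratio for a 10-year age difference
ten_year_ratio <- exp(10 * coefs[["age"]])
print(round(ten_year_ratio, 3))
```

Note that the effect of a 10-year age difference is exp(10 × 0.028), not 10 × 2.8%, because effects under a log link multiply rather than add.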
Standard errors (values in parentheses): These show the uncertainty in each estimate. Every coefficient is several times larger than its standard error, so all three predictors are statistically significant.
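One way to make the significance statement concrete is to compute approximate Wald statistics and 95% intervals from the reported estimates and standard errors. This sketch assumes a normal approximation (critical value 1.96) and uses the rounded table values, so it is only a rough check and not a substitute for summary() or confint() on the fitted model.

```r
# Rounded estimates and standard errors copied from the model table
est <- c("(Intercept)" = 7.445, age = 0.028, bmi = 0.012, smokeryes = 1.471)
se  <- c("(Intercept)" = 0.105, age = 0.001, bmi = 0.003, smokeryes = 0.046)

# Wald statistic: estimate divided by its standard error
z <- est / se

# Approximate 95% interval on the log scale (normal approximation)
ci <- cbind(lower = est - 1.96 * se,
            upper = est + 1.96 * se)

print(round(cbind(estimate = est, std_error = se, z = z, ci), 3))

# Intervals for the predictors on the multiplicative scale
print(round(exp(ci[-1, ]), 3))
```

An interval on the log scale that excludes 0 corresponds to an interval on the multiplicative scale that excludes 1.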
Conclusion:
Older age, higher BMI, and smoking are each associated with higher insurance charges. Smoking has by far the largest effect: it is associated with a more than 4-fold increase in expected medical costs. Age, BMI, and smoking status are all statistically significant predictors of healthcare costs, consistent with findings in the health economics literature (Lantz, 2013).