Introduction:
This is Assignment # 7. In this analysis, we fit a Gamma regression model to the insurance dataset to determine whether certain variables have an effect on others. Specifically, we examine how age, BMI, and smoking habits affect insurance charges.
Data Preparation:
# Load the data
insurance <- read.csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")
head(insurance)
##   age    sex    bmi children smoker    region   charges
## 1  19 female 27.900        0    yes southwest 16884.924
## 2  18   male 33.770        1     no southeast  1725.552
## 3  28   male 33.000        3     no southeast  4449.462
## 4  33   male 22.705        0     no northwest 21984.471
## 5  32   male 28.880        0     no northwest  3866.855
## 6  31 female 25.740        0     no southeast  3756.622
# Convert "smoker" into a binary factor
insurance$smoker <- as.factor(insurance$smoker)
# Fit the Gamma Regression Model
m_gama <- glm(charges ~ age + bmi + smoker,
              family = Gamma(link = "log"),
              data = insurance)
summary(m_gama)
##
## Call:
## glm(formula = charges ~ age + bmi + smoker, family = Gamma(link = "log"),
## data = insurance)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.444595   0.105187  70.775  < 2e-16 ***
## age         0.028426   0.001339  21.230  < 2e-16 ***
## bmi         0.012131   0.003084   3.934 8.79e-05 ***
## smokeryes   1.470576   0.046320  31.748  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.4671822)
##
## Null deviance: 1056.04 on 1337 degrees of freedom
## Residual deviance: 357.41 on 1334 degrees of freedom
## AIC: 26448
##
## Number of Fisher Scoring iterations: 7
Key Findings:
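The p-values for age, BMI, and smoking are all far below 0.05 (the largest, for BMI, is 8.79e-05), so each variable has a statistically significant association with insurance charges after adjusting for the other two.
Smoking has by far the largest effect. Because the model uses a log link, the coefficients are multiplicative: holding age and BMI fixed, expected charges for smokers are about exp(1.47) ≈ 4.35 times those for non-smokers. Each additional year of age is associated with roughly 2.9% higher expected charges, and each additional BMI unit with roughly 1.2% higher expected charges.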
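The multiplicative reading of the coefficients can be checked with a few lines of R. This is a minimal sketch in which the coefficient values are copied by hand from the summary output above; `coefs`, `ratios`, and `pct_change` are illustrative names, not objects from the original analysis.

```r
# Coefficients copied from the summary(m_gama) output (log scale)
coefs <- c(age = 0.028426, bmi = 0.012131, smokeryes = 1.470576)

# Under a log link, exp(coefficient) is the ratio of expected charges
# for a one-unit increase in the predictor, other predictors held fixed
ratios <- exp(coefs)

# The same quantity expressed as a percentage change
pct_change <- (ratios - 1) * 100

print(round(ratios, 3))
print(round(pct_change, 1))
```

With the fitted model in memory, `exp(coef(m_gama))` gives the same ratios directly.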
Calculating Average Marginal Effects:
library(clarify)
# Simulate model estimates
sim_gama <- sim(m_gama)
# Compute Average Marginal Effect for smoking
sim_est_gama <- sim_ame(sim_gama, var = "smoker", contrast = "rd")
# Summary of simulated AMEs
summary(sim_est_gama)
##           Estimate 2.5 % 97.5 %
## E[Y(no)]      8234  7875   8605
## E[Y(yes)]    35834 33034  38963
## RD           27599 24684  30713
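Here, I calculated the Average Marginal Effect of the smoking variable using Risk Difference (RD) as the contrast. For a continuous outcome such as charges, the RD is simply the difference in average expected charges. These results imply that for:
Non-smokers:
The expected insurance charge for non-smokers is $8,234.
The true mean cost is likely between $7,875 and $8,605 (95% interval).
Smokers:
The expected insurance charge for smokers is $35,834.
The true mean cost is likely between $33,034 and $38,963 (95% interval).
Based on the risk difference, smokers are charged an average of $27,599 more than non-smokers (95% interval: $24,684 to $30,713), with age and BMI held at their observed values.
Since this interval does not include 0, smoking has a statistically significant impact on insurance costs. Because the intervals are simulation-based and no seed was set, the bounds will vary slightly from run to run.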
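To make the calculation behind these numbers concrete, here is a minimal base-R sketch of the same idea: predict charges for everyone as a non-smoker and as a smoker, average each set of predictions, and take the difference. The interval comes from redrawing the coefficients many times. Everything here is an assumption for illustration: the data are simulated (`toy`), the helper names (`set_smoker`, `avg_pred`) are hypothetical, and the coefficient draws use a multivariate normal. My understanding is that clarify follows a broadly similar simulation recipe, but its defaults may differ in detail.

```r
# Toy data standing in for the insurance data (all values simulated; hypothetical)
set.seed(42)  # fixes both the toy data and the coefficient draws below
n <- 500
toy <- data.frame(
  age    = sample(18:64, n, replace = TRUE),
  bmi    = rnorm(n, mean = 30, sd = 6),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2)))
)
mu <- exp(7.4 + 0.028 * toy$age + 0.012 * toy$bmi + 1.47 * (toy$smoker == "yes"))
toy$charges <- rgamma(n, shape = 2, rate = 2 / mu)  # Gamma draws with mean mu

fit <- glm(charges ~ age + bmi + smoker, family = Gamma(link = "log"), data = toy)

# Design matrix with every observation set to the given smoking level
set_smoker <- function(level) {
  d <- toy
  d$smoker <- factor(rep(level, n), levels = levels(toy$smoker))
  model.matrix(~ age + bmi + smoker, data = d)
}
X_no  <- set_smoker("no")
X_yes <- set_smoker("yes")

# Average expected charge for a design matrix X and coefficient vector b
avg_pred <- function(X, b) mean(exp(X %*% b))

# Point estimates at the fitted coefficients
ey_no  <- avg_pred(X_no,  coef(fit))
ey_yes <- avg_pred(X_yes, coef(fit))
rd     <- ey_yes - ey_no

# Simulation-based interval: draw coefficients from N(coef, vcov) and
# recompute the difference for each draw
n_draws <- 1000
k <- length(coef(fit))
z <- matrix(rnorm(n_draws * k), nrow = n_draws)
draws <- sweep(z %*% chol(vcov(fit)), 2, coef(fit), "+")
rd_draws <- apply(draws, 1, function(b) avg_pred(X_yes, b) - avg_pred(X_no, b))
rd_ci <- quantile(rd_draws, c(0.025, 0.975))

print(c(no = ey_no, yes = ey_yes, RD = rd))
print(rd_ci)
```

Calling `set.seed()` before the draws is what makes the interval reproducible. The same applies to `sim()` in the analysis above, which is why interval bounds can shift between knits when no seed is set.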
# Simulate model estimates for Age and BMI
sim_gama <- sim(m_gama)
# Compute Average Marginal Effects for Age and BMI (the BMI call and both summary() calls are reconstructed from the output; `sim_est_bmi` is an assumed name)
sim_est_age <- sim_ame(sim_gama, var = "age", contrast = "rd")
## Warning: `contrast` is ignored when the focal variable is continuous.
sim_est_bmi <- sim_ame(sim_gama, var = "bmi", contrast = "rd")
## Warning: `contrast` is ignored when the focal variable is continuous.
summary(sim_est_age)
##              Estimate 2.5 % 97.5 %
## E[dY/d(age)]      391   346    442
summary(sim_est_bmi)
##              Estimate 2.5 % 97.5 %
## E[dY/d(bmi)]    167.0  82.1  247.9
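These results represent the average marginal effects of age and BMI on insurance charges, derived from my Gamma regression model using the clarify package. For a continuous variable, the effect is the average change in expected charges per one-unit increase, which is why the `contrast` argument is ignored. The results are as follows: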
Effect of Age on Insurance Charges E[dY/d(age)] = 391
Interpretation: On average, for every 1-year increase in age, insurance charges increase by approximately $391.
Confidence Interval (95% CI): [346, 442]
The true effect is likely between $346 and $442.
Since the CI does not include 0, the effect is statistically significant.
Effect of BMI on Insurance Charges E[dY/d(bmi)] = 167
Interpretation: On average, for every 1-unit increase in BMI, insurance charges increase by approximately $167.
Confidence Interval (95% CI): [82.1, 247.9]
The true effect is likely between $82.1 and $247.9.
Since the CI does not include 0, the effect is statistically significant.
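As a consistency check on these two estimates: with a log link and no interaction terms, the derivative of expected charges with respect to a predictor is its coefficient times the expected charge, so each average marginal effect should be roughly the coefficient times the average fitted charge. The sketch below uses only numbers copied from the outputs above; the object names are illustrative, and small discrepancies are expected because the reported values are rounded.

```r
# Values copied from the outputs above
beta <- c(age = 0.028426, bmi = 0.012131)  # log-scale coefficients
ame  <- c(age = 391,      bmi = 167.0)     # reported average marginal effects

# With a log link, dE[Y]/dx = beta * E[Y], so AME = beta * (average fitted charge).
# Dividing each AME by its coefficient should recover about the same average charge.
implied_mean_charge <- ame / beta
print(round(implied_mean_charge))

# Equivalently, the ratio of the AMEs should match the ratio of the coefficients
print(round(c(ame_ratio  = ame[["age"]]  / ame[["bmi"]],
              beta_ratio = beta[["age"]] / beta[["bmi"]]), 3))
```

Both implied averages come out near $13,760, which suggests the age and BMI estimates are internally consistent with the fitted coefficients.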
Conclusion:
Understanding how age, BMI, and smoking habits affect insurance charges is a critical question with significant implications for health economics, public health, and policy-making.
From an economic and insurance perspective, identifying how demographic factors like age and lifestyle choices influence insurance costs helps insurers set fair and data-driven premium rates. By analyzing these variables, predictive models can also be developed to estimate future healthcare expenses, ensuring better financial planning for both insurers and policyholders.
The question also has important public health implications. If smoking is shown to substantially increase insurance costs, it highlights the financial burden associated with smoking-related health risks, reinforcing the need for tobacco control measures. Similarly, if BMI strongly impacts charges, it underscores the economic consequences of obesity and the importance of preventive healthcare initiatives.
Lastly, these findings can inform policy and intervention strategies. If age and lifestyle factors are key drivers of insurance costs, governments and employers may introduce incentives such as lower premiums for non-smokers or wellness programs aimed at obesity prevention. These measures could not only reduce healthcare expenses but also encourage healthier behaviors, benefiting both individuals and society as a whole.