Medical Insurance Charges (Questions)
1 Overview
This notebook uses a real healthcare-related dataset on medical insurance charges.
You will analyze these charges using linear regression in R.
1.1 Your Task
- Import and load a dataset in R.
- Identify outcome and predictor variables.
- Fit simple and multiple linear regression models, and carry out forward selection based on p-values.
- Interpret coefficients, p-values, and model fit statistics.
- Make predictions from a fitted regression model.
2 Load the data
2.1 Question 1:
Go to the following URL: https://www.kaggle.com/datasets/mirichoi0218/insurance/data
Download the data as a CSV file, load it into R, and print the first few rows.
Click on this link for answers to question 1: https://rpubs.com/Samuelllim/Q1
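As a rough sketch of the loading step, the code below reads a CSV file with `read.csv()` and prints its first rows with `head()`. So that it runs without the download, it first writes a tiny stand-in file; those three rows are made up for illustration and are not taken from the real dataset. The file name `insurance.csv` mentioned in the comment is an assumption about what the download is called.

```r
# Stand-in for the downloaded file: a tiny CSV with made-up rows, written to a
# temporary location so the sketch runs without the real download.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "age,sex,bmi,children,smoker,region,charges",
  "25,female,22.5,0,no,southwest,3200.50",
  "41,male,31.2,2,yes,southeast,24500.75",
  "36,female,27.8,1,no,northeast,6100.00"
), csv_path)

# With the real data, replace csv_path with the path to the downloaded file,
# for example "insurance.csv" (assumed file name).
insurance <- read.csv(csv_path)

head(insurance)  # first rows (up to six by default)
str(insurance)   # column names and the type R assigned to each column
```

`str()` is included because the column types it reports are useful for the data preparation questions that follow.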
3 Data preparation
3.1 Question 2a:
Which variables are categorical?
3.2 Question 2b:
What should you do with the data type of these variables before fitting a regression model?
3.3 Question 2c:
Which is the response variable?
Click on this link for answers to question 2: https://rpubs.com/Samuelllim/Q2
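One common way to prepare text columns for regression in R is to convert them to factors. The sketch below does this on a small made-up data frame (not the real data); it finds the character columns programmatically rather than hard-coding them, which is one possible approach among several.

```r
# Made-up rows, for illustration only.
insurance <- data.frame(
  age    = c(25, 41, 36),
  sex    = c("female", "male", "female"),
  smoker = c("no", "yes", "no"),
  region = c("southwest", "southeast", "northeast"),
  stringsAsFactors = FALSE
)

# Names of the columns currently stored as character strings.
char_cols <- names(insurance)[sapply(insurance, is.character)]

# Convert each of those columns to a factor.
insurance[char_cols] <- lapply(insurance[char_cols], factor)

str(insurance)            # the converted columns now show as Factor
levels(insurance$smoker)  # the first level acts as the reference category in lm()
```

By default the levels are sorted alphabetically, so the reference category is the alphabetically first value unless it is changed, for example with `relevel()`.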
4 Simple linear regression
4.1 Question 3a
Fit a simple linear regression model with charges as the response and bmi as the predictor.
4.2 Question 3b
Interpret the coefficient of bmi.
Click on this link for answers to question 3: https://rpubs.com/Samuelllim/Q3
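The general pattern for a simple linear regression in R is `lm(response ~ predictor, data = ...)`. The sketch below applies it to simulated data; the data frame `toy` and the numbers used to generate it are invented for illustration, so the estimates say nothing about the real dataset.

```r
set.seed(1)

# Simulated data, for illustration only.
n <- 50
toy <- data.frame(bmi = runif(n, min = 18, max = 40))
toy$charges <- 1000 + 400 * toy$bmi + rnorm(n, sd = 2000)

# Simple linear regression: charges as the response, bmi as the predictor.
fit_bmi <- lm(charges ~ bmi, data = toy)

summary(fit_bmi)      # coefficients, standard errors, p-values, R-squared
coef(fit_bmi)["bmi"]  # estimated change in mean charges per one-unit increase in bmi
```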
5 Multiple linear regression
5.1 Question 4a
Fit a multiple linear regression model with charges as the response and age, sex, bmi, children, smoker, and region as predictors.
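A multiple regression uses the same `lm()` call with the predictors joined by `+`. The sketch below fits such a model to simulated data (the data frame `toy` and all numbers in it are invented) and lists the coefficient names, which shows how R labels factor coefficients by pasting the variable name and the level together, as in `smokeryes`.

```r
set.seed(2)

# Simulated data, for illustration only.
n <- 200
toy <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  sex      = factor(sample(c("female", "male"), n, replace = TRUE),
                    levels = c("female", "male")),
  bmi      = runif(n, min = 18, max = 40),
  children = sample(0:3, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE),
                    levels = c("no", "yes")),
  region   = factor(sample(c("northeast", "northwest", "southeast", "southwest"),
                           n, replace = TRUE),
                    levels = c("northeast", "northwest", "southeast", "southwest"))
)
toy$charges <- 2000 + 250 * toy$age + 300 * toy$bmi +
  20000 * (toy$smoker == "yes") + rnorm(n, sd = 3000)

# Multiple linear regression with all six predictors.
fit_all <- lm(charges ~ age + sex + bmi + children + smoker + region, data = toy)

summary(fit_all)
names(coef(fit_all))  # factor terms appear as variable name + level
```

Each factor coefficient is a difference from the reference level, with the other predictors held fixed.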
5.2 Question 4b
Using the fitted model, interpret the coefficient of smokeryes.
5.3 Question 4c
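Using the fitted model, interpret the coefficient of age.
Click on this link for answers to question 4: https://rpubs.com/Samuelllim/Q4
6 Model Selection - Forward selection based on p-values
In this section, build a model by forward selection based on p-values: start with no predictors and, at each step, add the candidate variable with the smallest p-value, until no remaining candidate is statistically significant.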
6.1 Question 5a
Fit one-predictor models for each candidate variable and compare their p-values.
Which variable should enter first?
6.2 Question 5b
Write down your final selected model.
Click on this link for answers to question 5: https://rpubs.com/Samuelllim/Q5
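One way to run a forward-selection step in R is `add1()` with `test = "F"`, which fits each candidate added to the current model and reports a p-value for each. The sketch below uses simulated data (the data frame `toy`, its numbers, and the reduced candidate list are all invented for illustration), so which variable enters first here carries no information about the real dataset.

```r
set.seed(3)

# Simulated data, for illustration only.
n <- 200
toy <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  bmi      = runif(n, min = 18, max = 40),
  children = sample(0:3, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE),
                    levels = c("no", "yes"))
)
toy$charges <- 2000 + 250 * toy$age + 300 * toy$bmi +
  20000 * (toy$smoker == "yes") + rnorm(n, sd = 3000)

# Start from the intercept-only model.
null_fit   <- lm(charges ~ 1, data = toy)
candidates <- ~ age + bmi + children + smoker

# Step 1: F-test p-value for adding each candidate on its own.
step1 <- add1(null_fit, scope = candidates, test = "F")
print(step1)

# Pick the candidate with the smallest p-value (the first row is "<none>").
p_values <- setNames(step1[["Pr(>F)"]][-1], rownames(step1)[-1])
first_in <- names(which.min(p_values))

# Step 2: add that variable, then test the remaining candidates again.
fit1  <- update(null_fit, as.formula(paste(". ~ . +", first_in)))
step2 <- add1(fit1, scope = candidates, test = "F")
print(step2)
```

The same two lines (`update()` then `add1()`) can be repeated until no remaining candidate has a p-value below the chosen threshold.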
7 Prediction
7.1 Question 6
Predict the insurance charges for the following person:
- age = 40
- sex = female
- bmi = 30
- children = 2
- smoker = no
- region = southeast
Click on this link for answers to question 6: https://rpubs.com/Samuelllim/Q6
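Predictions from a fitted `lm` object come from `predict()` with a `newdata` data frame whose column names match the predictors in the model. The sketch below describes the person above in `new_person` and predicts from a model fitted to simulated data; the data frame `toy` and its numbers are invented, so the predicted value is not the answer for the real dataset.

```r
set.seed(4)

# Simulated data, for illustration only.
n <- 200
toy <- data.frame(
  age      = sample(18:64, n, replace = TRUE),
  sex      = factor(sample(c("female", "male"), n, replace = TRUE),
                    levels = c("female", "male")),
  bmi      = runif(n, min = 18, max = 40),
  children = sample(0:3, n, replace = TRUE),
  smoker   = factor(sample(c("no", "yes"), n, replace = TRUE),
                    levels = c("no", "yes")),
  region   = factor(sample(c("northeast", "northwest", "southeast", "southwest"),
                           n, replace = TRUE),
                    levels = c("northeast", "northwest", "southeast", "southwest"))
)
toy$charges <- 2000 + 250 * toy$age + 300 * toy$bmi +
  20000 * (toy$smoker == "yes") + rnorm(n, sd = 3000)

fit <- lm(charges ~ age + sex + bmi + children + smoker + region, data = toy)

# One row describing the person; the categorical values must be spelled
# exactly like the levels used when the model was fitted.
new_person <- data.frame(
  age = 40, sex = "female", bmi = 30,
  children = 2, smoker = "no", region = "southeast"
)

predicted_charges <- predict(fit, newdata = new_person)
predicted_charges
```

If the selected model uses fewer predictors, the extra columns in `new_person` are simply ignored by `predict()`.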