Objective:
In this project, I perform exploratory data analysis (EDA) on a healthcare cost dataset. The dataset records factors that may affect healthcare costs, including age, BMI, region, and smoking status. I use Excel for data cleaning, SQL for optional data manipulation, and R for data analysis and visualization.
Questions for Analysis:
The following questions will be the focus of my analysis.
- How do healthcare costs vary by demographic factors such as age, sex,
BMI, and region?
- What is the impact of smoking status on healthcare costs?
- What is the relationship between BMI (Body Mass Index) and healthcare
costs?
- Are there any regional differences in healthcare costs?
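The questions above mostly reduce to two operations: comparing average costs across groups (smoking status, region, sex) and measuring how strongly a numeric factor such as BMI moves with costs. The analysis itself is done in R; the following is only a minimal, language-neutral sketch in Python of those two operations. The records are made-up illustrative values, not rows from the dataset, and the field names (`smoker`, `region`, `bmi`, `charges`) and helper functions are assumptions introduced for this example.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Made-up illustrative records (NOT rows from the Kaggle file).
# Field names are assumed for this sketch.
records = [
    {"smoker": "yes", "region": "southeast", "bmi": 32.0, "charges": 30000.0},
    {"smoker": "yes", "region": "northwest", "bmi": 28.0, "charges": 20000.0},
    {"smoker": "no", "region": "southeast", "bmi": 30.0, "charges": 6000.0},
    {"smoker": "no", "region": "northwest", "bmi": 22.0, "charges": 4000.0},
]


def mean_charges_by(rows, column):
    """Return the mean of 'charges' for each distinct value of `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row["charges"])
    return {level: mean(values) for level, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Group comparisons: smoking status and region.
by_smoker = mean_charges_by(records, "smoker")
by_region = mean_charges_by(records, "region")

# Linear association between BMI and charges.
bmi_charges_r = pearson(
    [row["bmi"] for row in records],
    [row["charges"] for row in records],
)

print(by_smoker)
print(by_region)
print(round(bmi_charges_r, 3))
```

A group mean answers "how do costs differ between categories", while the correlation coefficient only captures linear association, so a scatter plot is still worth inspecting before drawing conclusions about BMI.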
Data Source:
* Dataset: “Medical Cost Personal Datasets” via Kaggle
* Description: This dataset contains information about patients
including their age, sex, BMI, children, smoking status, region, and
their healthcare costs.
Limitations:
The dataset contains a limited number of variables, focusing primarily
on demographic information (age, sex, BMI, children) and lifestyle
factors (smoking status), with healthcare costs being the primary
outcome variable. Other important factors such as pre-existing medical
conditions, type of healthcare coverage, and specific medical procedures
are not included, which could limit the comprehensiveness of the
analysis.
Clean Data in Excel
I chose to initially clean the dataset in Excel and took the following
steps to clean and organize the data:
- Removed duplicate rows.
- Checked for missing values.
- Checked for inconsistent data types and fixed them where
necessary.
- Changed the formatting of the “charges” column to currency.
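The cleaning itself was done by hand in Excel. For reproducibility, the same steps could also be scripted; the sketch below is one possible Python version using only the standard library, not the procedure actually used. It assumes the column names `age`, `sex`, `bmi`, `children`, `smoker`, `region`, and `charges`, and the sample rows are made-up values for illustration. Rows with missing values are set aside for review rather than deleted, since the step above only checks for them.

```python
import csv
import io

# Assumed column names and numeric types for this sketch.
COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]
NUMERIC = {"age": int, "bmi": float, "children": int, "charges": float}

# Made-up sample data (NOT rows from the Kaggle file): one exact
# duplicate and one row with a missing BMI value.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,no,northwest,4000.00
30,female,25.0,0,no,northwest,4000.00
45,male,31.5,2,yes,southeast,21000.50
52,male,,1,no,northeast,9000.00
"""


def clean_rows(rows):
    """Drop exact duplicates, flag rows with missing values, coerce types."""
    seen = set()
    cleaned, flagged = [], []
    for row in rows:
        key = tuple(str(row.get(col)) for col in COLUMNS)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        values = [row.get(col) for col in COLUMNS]
        if any(v is None or v.strip() == "" for v in values):
            flagged.append(row)  # keep for manual review
            continue
        typed = {col: row[col].strip() for col in COLUMNS}
        for col, cast in NUMERIC.items():
            typed[col] = cast(typed[col])  # raises ValueError on bad data
        cleaned.append(typed)
    return cleaned, flagged


def format_currency(value):
    """Format a number as US-style currency text, e.g. for display."""
    return f"${value:,.2f}"


cleaned, flagged = clean_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(cleaned), "clean rows;", len(flagged), "flagged for review")
for row in cleaned:
    print(row["age"], row["region"], format_currency(row["charges"]))
```

To run it on the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an opened file handle. Keeping `charges` numeric and formatting it only for display avoids turning the column into text, which would break later calculations.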
Promotion of Healthy Lifestyle Choices: Encourage initiatives aimed at promoting healthy lifestyle choices, such as smoking cessation programs, nutrition education, and physical activity promotion. By reducing risk factors like smoking and obesity, individuals may experience improved health outcomes and lower healthcare costs in the long term.
Preventive Healthcare Interventions: Advocate for increased access to preventive healthcare services, including routine screenings, vaccinations, and preventive care visits. Early detection and management of chronic conditions can help prevent costly complications and reduce overall healthcare expenditures.
Targeted Interventions for High-Risk Groups: Identify high-risk demographic groups, such as smokers or individuals with high BMI, and develop targeted interventions tailored to their specific needs. This may include subsidized smoking cessation programs, weight management interventions, or targeted health education campaigns.