HealthCare Report

Author

Happiness Ndanu

Problem Statement

This analysis aims to investigate the relationship between demographic factors (age, gender, marital status, employment status, and monthly household income) and the likelihood of having health insurance.

Introduction

Target Audience

Libraries

To kick off, let's load the libraries needed for this analysis.

library(tidyverse)
library(readxl)
library(janitor)
library(scales)  
library(ggpie)
library(ggpubr)
library(ggforce)
library(htmltools)
library(plotly)
library(knitr)
library(gganimate)
library(kableExtra)
library(DT)
library(viridis)
library(lubridate)
library(stringr)

Next, we will import the data set needed for this analysis.

health_insurance <- read_excel("D:/projects/health_insurance/health_insurance.xlsx") %>% 
  clean_names()

Data Cleaning

With the problem statement defined, we now prepare the data for analysis by cleaning it.

  1. Checking the data types in the columns
glimpse(health_insurance)
Rows: 6,158
Columns: 32
$ location                                                                                                               <chr> …
$ location_latitude                                                                                                      <dbl> …
$ location_longitude                                                                                                     <dbl> …
$ location_altitude                                                                                                      <dbl> …
$ location_precision                                                                                                     <dbl> …
$ date_and_time                                                                                                          <dttm> …
$ age                                                                                                                    <chr> …
$ gender                                                                                                                 <chr> …
$ marital_status                                                                                                         <chr> …
$ how_many_children_do_you_have_if_any                                                                                   <dbl> …
$ employment_status                                                                                                      <chr> …
$ monthly_household_income                                                                                               <chr> …
$ have_you_ever_had_health_insurance                                                                                     <chr> …
$ if_yes_which_insurance_cover                                                                                           <chr> …
$ when_was_the_last_time_you_visited_a_hospital_for_medical_treatment_in_months                                          <dbl> …
$ did_you_have_health_insurance_during_your_last_hospital_visit                                                          <chr> …
$ have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider                                              <chr> …
$ if_you_answered_yes_to_the_previous_question_what_time_period_in_years_do_you_stay_before_having_your_routine_check_up <chr> …
$ have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc                                                     <chr> …
$ if_you_answered_yes_to_the_previous_question_what_time_period_in_years_do_you_stay_before_having_your_cancer_screening <chr> …
$ your_picture                                                                                                           <lgl> …
$ your_picture_url                                                                                                       <lgl> …
$ id                                                                                                                     <dbl> …
$ uuid                                                                                                                   <chr> …
$ submission_time                                                                                                        <dttm> …
$ validation_status                                                                                                      <lgl> …
$ notes                                                                                                                  <lgl> …
$ status                                                                                                                 <chr> …
$ submitted_by                                                                                                           <chr> …
$ version                                                                                                                <chr> …
$ tags                                                                                                                   <lgl> …
$ index                                                                                                                  <dbl> …

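According to this output, several columns are not stored in suitable data types. Age, gender, and marital status are character columns but represent categories, so they should be factors, and date_and_time carries a time component that is not needed. The number-of-children column is already numeric, so converting it is only a safeguard. The data types are corrected as follows.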

# date_and_time is already a date-time (<dttm>), so as.Date() only drops the time part;
# a format string is ignored for date-time input and is therefore omitted
health_insurance$date_and_time <- as.Date(health_insurance$date_and_time)
# Already numeric (<dbl>); the conversion is kept as an explicit safeguard
health_insurance$how_many_children_do_you_have_if_any <- as.numeric(health_insurance$how_many_children_do_you_have_if_any)
# Categorical columns become factors
health_insurance$age <- as.factor(health_insurance$age)
health_insurance$marital_status <- as.factor(health_insurance$marital_status)
health_insurance$gender <- as.factor(health_insurance$gender)
# monthly_household_income is converted to a factor later, after its missing values are filled
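
The conversions above repeat the same pattern for each column. As a minimal sketch, assuming a small toy table with made-up values standing in for the survey data, the same result can be reached in one step with `mutate()` and `across()` from dplyr:

```r
library(dplyr)

# Toy data standing in for the survey columns (hypothetical values)
toy <- tibble(
  age = c("18-30", "31-45", "18-30"),
  gender = c("Male", "Female", "Female"),
  marital_status = c("Single", "Married", "Single")
)

# Convert all three categorical columns to factors in a single call
toy <- toy %>%
  mutate(across(c(age, gender, marital_status), as.factor))

str(toy)
```

This keeps the list of categorical columns in one place, which may be easier to maintain than one assignment per column.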

Next, let’s check for missing values.

missing_values <- colSums(is.na(health_insurance)) %>% print()
                                                                                                              location 
                                                                                                                   353 
                                                                                                     location_latitude 
                                                                                                                   353 
                                                                                                    location_longitude 
                                                                                                                   353 
                                                                                                     location_altitude 
                                                                                                                   353 
                                                                                                    location_precision 
                                                                                                                   353 
                                                                                                         date_and_time 
                                                                                                                   148 
                                                                                                                   age 
                                                                                                                    18 
                                                                                                                gender 
                                                                                                                    17 
                                                                                                        marital_status 
                                                                                                                    18 
                                                                                  how_many_children_do_you_have_if_any 
                                                                                                                   625 
                                                                                                     employment_status 
                                                                                                                    24 
                                                                                              monthly_household_income 
                                                                                                                   259 
                                                                                    have_you_ever_had_health_insurance 
                                                                                                                    19 
                                                                                          if_yes_which_insurance_cover 
                                                                                                                  2519 
                                         when_was_the_last_time_you_visited_a_hospital_for_medical_treatment_in_months 
                                                                                                                   158 
                                                         did_you_have_health_insurance_during_your_last_hospital_visit 
                                                                                                                    56 
                                             have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider 
                                                                                                                    23 
if_you_answered_yes_to_the_previous_question_what_time_period_in_years_do_you_stay_before_having_your_routine_check_up 
                                                                                                                  4382 
                                                    have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc 
                                                                                                                    31 
if_you_answered_yes_to_the_previous_question_what_time_period_in_years_do_you_stay_before_having_your_cancer_screening 
                                                                                                                  4593 
                                                                                                          your_picture 
                                                                                                                  6158 
                                                                                                      your_picture_url 
                                                                                                                  6158 
                                                                                                                    id 
                                                                                                                     0 
                                                                                                                  uuid 
                                                                                                                     0 
                                                                                                       submission_time 
                                                                                                                     0 
                                                                                                     validation_status 
                                                                                                                  6158 
                                                                                                                 notes 
                                                                                                                  6158 
                                                                                                                status 
                                                                                                                     0 
                                                                                                          submitted_by 
                                                                                                                     1 
                                                                                                               version 
                                                                                                                     0 
                                                                                                                  tags 
                                                                                                                  6158 
                                                                                                                 index 
                                                                                                                     0 
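
The wide named vector printed above is hard to scan. A possible alternative, sketched here on a toy table with made-up values (the column names are only illustrative), is to reshape the counts into a sorted table that also reports the percentage of rows missing:

```r
library(dplyr)
library(tidyr)

# Toy data with a few missing entries (hypothetical values)
toy <- tibble(
  age = c("18-30", NA, "31-45", "46-60"),
  gender = c("Male", "Female", NA, NA),
  id = 1:4
)

# Count NAs per column, reshape to long form, and sort by the number missing
missing_summary <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "column", values_to = "n_missing") %>%
  mutate(pct_missing = round(100 * n_missing / nrow(toy), 1)) %>%
  arrange(desc(n_missing))

print(missing_summary)
```

Sorting by `n_missing` puts the columns that need the most attention at the top.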

The next step is to handle the missing values.

Let's start with the “How many children do you have, if any?” column. We replace the missing values with 0, to stand for no children.

health_insurance$how_many_children_do_you_have_if_any[is.na(health_insurance$how_many_children_do_you_have_if_any)] <- 0

Next, fix the “Employment Status” column.

health_insurance$employment_status[is.na(health_insurance$employment_status)] <- "Unknown"
health_insurance$employment_status <-  as.factor(health_insurance$employment_status)

The missing values in this case are replaced with “Unknown”.

Next, the “Monthly Household Income” column has 259 missing values, about 4% of the 6,158 rows. This is a sizeable share of the data, so rather than dropping these rows we replace the missing values with “Unknown”.

health_insurance$monthly_household_income[is.na(health_insurance$monthly_household_income)] <- "Unknown"
health_insurance$monthly_household_income <- factor(health_insurance$monthly_household_income)

Next is the “Have you ever had health insurance” column, which has 19 missing values. We replace them with “No”, that is, a missing answer is treated as never having had insurance.

health_insurance$have_you_ever_had_health_insurance[is.na(health_insurance$have_you_ever_had_health_insurance)] <- "No"

Next up is the “If yes, which insurance cover” column, which has 2,519 missing values. If the patient has had insurance cover, we replace the missing value with “Unknown”; if the patient has never had insurance, we set the value to “Not_applicable”.

health_insurance$if_yes_which_insurance_cover <- ifelse(
  health_insurance$have_you_ever_had_health_insurance == "Yes" & is.na(health_insurance$if_yes_which_insurance_cover),
  "Unknown",
  ifelse(
    health_insurance$have_you_ever_had_health_insurance == "No",
    "Not_applicable",
    health_insurance$if_yes_which_insurance_cover 
  )
)
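
The nested `ifelse()` above can be harder to read as the number of conditions grows. As a sketch only, using a toy table with hypothetical column names (`ever_insured`, `cover`) and made-up values, the same rules can be written with `case_when()`:

```r
library(dplyr)

# Toy data: two insured and two uninsured respondents (hypothetical values)
toy <- tibble(
  ever_insured = c("Yes", "Yes", "No", "No"),
  cover = c("NHIF", NA, NA, "NHIF")
)

# Conditions are checked top to bottom; the first match wins
toy <- toy %>%
  mutate(cover = case_when(
    ever_insured == "Yes" & is.na(cover) ~ "Unknown",
    ever_insured == "No" ~ "Not_applicable",
    TRUE ~ cover
  ))

print(toy)
```

Note that, exactly as in the nested `ifelse()` version, a respondent who answered “No” but still named a cover is overwritten with “Not_applicable”.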

For the “When was the last time you visited a hospital for medical treatment in months” column, the missing values will be replaced with 0.

# Use numeric 0, not "0": assigning a string would coerce this numeric column to character
health_insurance$when_was_the_last_time_you_visited_a_hospital_for_medical_treatment_in_months[is.na(health_insurance$when_was_the_last_time_you_visited_a_hospital_for_medical_treatment_in_months)] <- 0

In the “Did you have health insurance during your last hospital visit” column, the missing values will be replaced with “Unknown”.

health_insurance$did_you_have_health_insurance_during_your_last_hospital_visit[is.na(health_insurance$did_you_have_health_insurance_during_your_last_hospital_visit)] <- "Unknown"

In the “Have you ever had a routine check up with a doctor or healthcare provider” column, missing values are replaced with “Unknown”.

health_insurance$have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider[is.na(health_insurance$have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider)] <- "Unknown"
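
Several of the steps above fill character columns with the same placeholder. A compact alternative, sketched on a toy table with shortened, hypothetical column names and made-up values, is `replace_na()` from tidyr, which takes a named list of per-column replacements:

```r
library(dplyr)
library(tidyr)

# Toy data with one missing value per column (hypothetical values)
toy <- tibble(
  employment_status = c("Employed", NA, "Unemployed"),
  last_visit_insured = c("Yes", "No", NA),
  routine_check_up = c(NA, "Yes", "No")
)

# Fill every listed column in one call
toy <- toy %>%
  replace_na(list(
    employment_status = "Unknown",
    last_visit_insured = "Unknown",
    routine_check_up = "Unknown"
  ))

print(toy)
```

The replacement must match the column type, so this form suits character columns; factor columns would need converting to character first.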

In the age column, we will drop the rows with missing values. For marital status and gender, we will replace the missing values with “Unknown”.

health_insurance <- health_insurance %>%
  filter(!is.na(age))
health_insurance <- health_insurance %>%
  mutate(gender = ifelse(is.na(gender), "Unknown", as.character(gender)))
health_insurance$gender <- factor(health_insurance$gender)
health_insurance <- health_insurance %>%
  mutate(marital_status = ifelse(is.na(marital_status), "Unknown", as.character(marital_status)))
health_insurance$marital_status <- factor(health_insurance$marital_status)

Lastly, in the “Have you ever had a cancer screening e.g. mammogram, colonoscopy etc.” column, the missing values will be replaced with “Unspecified”.

health_insurance$have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc[is.na(health_insurance$have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc)] <- "Unspecified"

Next, select the columns needed for the analysis. After this step, it is worth confirming that no missing values remain.

health_insurance <- health_insurance %>% 
  select(
    age, gender, marital_status,
    how_many_children_do_you_have_if_any,
    employment_status,
    monthly_household_income,
    have_you_ever_had_health_insurance,
    if_yes_which_insurance_cover,
    when_was_the_last_time_you_visited_a_hospital_for_medical_treatment_in_months,
    did_you_have_health_insurance_during_your_last_hospital_visit,
    have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider,
    have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc,
    index
  )
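
One way to confirm that no missing values remain after the selection is to recount them and stop if any are left. The sketch below uses a toy data frame with made-up values; on the real data, `health_insurance` would take the place of `toy`:

```r
# Toy data frame standing in for the cleaned data (hypothetical values)
toy <- data.frame(
  age = c("18-30", "31-45"),
  gender = c("Male", "Unknown")
)

# Count the remaining NAs per column and show only the columns that still have some
remaining <- colSums(is.na(toy))
print(remaining[remaining > 0])

# Halt with an error if any missing value is left
stopifnot(sum(remaining) == 0)
```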

The last cleaning step is to check the data for duplicate rows.

duplicate <- sum(duplicated(health_insurance)) %>% print()
[1] 0
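
One caveat about the check above: the selected columns still include `index`. If that column is unique for every row (an assumption, though it is typical for an export index), no two rows can ever be identical, so the count will be 0 regardless. A minimal sketch on toy data with made-up values shows the difference when the identifier is excluded:

```r
library(dplyr)

# Toy data: rows 1 and 2 have the same answers but different index values
toy <- tibble(
  age = c("18-30", "18-30", "31-45"),
  gender = c("Male", "Male", "Female"),
  index = 1:3
)

# With the identifier included, nothing is flagged
dupes_with_index <- sum(duplicated(toy))

# With the identifier dropped, the repeated answers are flagged
dupes_without_index <- toy %>%
  select(-index) %>%
  duplicated() %>%
  sum()

print(c(with_index = dupes_with_index, without_index = dupes_without_index))
```

In survey data, two respondents can legitimately give identical answers, so rows flagged this way are candidates for review rather than certain duplicates.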

Now that the data is clean, let the EDA begin!

Exploratory Data Analysis

1.0 What is the most common age group in the dataset?

age_group_counts <- health_insurance%>%
  group_by(age) %>%
  summarise(number_of_patients = n())
ggplot(age_group_counts, aes(x = age, y = number_of_patients, fill = age)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Number of Patients by Age Group",
    x = "Age Group",
    y = "Number of Patients"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  theme(legend.position = "none")

Findings: The 18-30 age group had the highest number of patients, whereas the 60+ age group had the lowest.
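
The count-then-plot pattern used for this chart recurs throughout the analysis. As a sketch on toy data (made-up values), `count()` is shorthand for `group_by()` followed by `summarise(n())`, and `geom_col()` is equivalent to `geom_bar(stat = "identity")`:

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the age column (hypothetical values)
toy <- tibble(age = factor(c("18-30", "18-30", "31-45", "60+")))

# count() groups and tallies in one step
age_counts <- toy %>%
  count(age, name = "number_of_patients")

# geom_col() plots the precomputed counts directly
p <- ggplot(age_counts, aes(x = age, y = number_of_patients, fill = age)) +
  geom_col() +
  labs(x = "Age Group", y = "Number of Patients") +
  theme_minimal() +
  theme(legend.position = "none")

print(p)
```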

2.0 What is the gender count of the patients?

gender_group_counts <- health_insurance%>%
  group_by(gender) %>%
  summarise(number_of_patients = n())
ggplot(gender_group_counts, aes(x = reorder(gender, -number_of_patients), y = number_of_patients)) +
  geom_bar(stat = "identity", fill=c("#B45D58","#B8647D","#A875A0")) +
  geom_text(aes(label = number_of_patients), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Gender Distribution by Number of Patients",
    x = "Gender",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note

Findings:

Males formed the largest group, followed closely by females; the margin between the two was small. Note also that 10 patients did not register their gender.

3.0 What is the marital status count of the patients?

marital_count<- health_insurance%>%
  group_by(marital_status) %>%
  summarise(number_of_patients = n())
ggplot(marital_count, aes(x = reorder(marital_status, -number_of_patients), y = number_of_patients)) +
  geom_bar(stat = "identity", fill=c("#B45D58","#B8647D","#A875A0","#848BB8")) +
  geom_text(aes(label = number_of_patients), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Distribution of Marital Status Among Patients",
    x = "Marital Status",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note

Findings: The Married category had the highest number of patients, whereas the Divorced category had the lowest. It is also important to note that 13 patients did not register their marital status.

4.0 How many children did the patients have?

children_counts <- health_insurance %>%
  group_by(how_many_children_do_you_have_if_any) %>%
  summarise(count_of_children = n())%>% 
  arrange(desc(count_of_children))

datatable(children_counts,
          options = list(pageLength = 10,
                         lengthMenu = c(10, 25, 50)),
          colnames = c("Number of Children", "Number of Patients"))
Note

Findings: From the table, we can conclude that patients with no children formed the largest group. Note that this count includes the missing responses that were recoded to 0 during cleaning.

5.0 What is the employment status of the patients?

employement_status<-health_insurance %>% 
  group_by(employment_status) %>% 
  summarise(employment_count=n()) %>% 
  arrange(desc(employment_count))
ggplot(employement_status, aes(x = reorder(employment_status, -employment_count), y = employment_count)) +
  geom_bar(stat = "identity", fill=c("#EF5350", "#D32F2F", "#BA68C8", "#9C27B0")) +
  geom_text(aes(label = employment_count), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Employment Status Distribution Among Patients",
    x = "Employment Status",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note

Findings: From the plot, employed patients formed the largest group, whereas the self-employed formed the smallest. However, the margin between the employed and the unemployed is small. Also, 20 patients did not record their employment status.

5.1 Employment status vs. ever having had insurance

employment_insurance2 <- health_insurance %>% 
  select(employment_status, have_you_ever_had_health_insurance) %>% 
  group_by(employment_status,have_you_ever_had_health_insurance) %>% 
  summarise(total=n()) %>% 
  ungroup() %>% 
  arrange(desc(total))
ggplot(employment_insurance2, aes(x = reorder(employment_status, total), y = total, fill = have_you_ever_had_health_insurance)) +
  geom_bar(stat = "identity", position = "dodge") +  
  geom_text(aes(label = total), position = position_dodge(width = 0.9), vjust = -0.5, color = "black", size = 3) + 
  labs(
    title = "Health Insurance Status by Employment Type",
    x = "Employment Status",
    y = "Number of Patients",
    fill = "Health Insurance"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

According to the plot, the employed and self-employed categories have the most patients with health insurance, whereas the unemployed category has many patients without it.
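
Raw counts depend on how large each employment group is, so they do not directly show how likely a patient in each group is to have insurance. A possible complement, sketched on toy data with a hypothetical `ever_insured` column and made-up values, is to compute the share of each answer within every employment status:

```r
library(dplyr)

# Toy data: three employed and two unemployed respondents (hypothetical values)
toy <- tibble(
  employment_status = c("Employed", "Employed", "Employed", "Unemployed", "Unemployed"),
  ever_insured = c("Yes", "Yes", "No", "No", "No")
)

# Count each combination, then divide by the total of its employment group
insured_share <- toy %>%
  count(employment_status, ever_insured, name = "total") %>%
  group_by(employment_status) %>%
  mutate(share = total / sum(total)) %>%
  ungroup()

print(insured_share)
```

Plotting `share` instead of `total` would make groups of different sizes directly comparable.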

5.2 Employment status, ever having had insurance, and type of insurance

employment_insurance <- health_insurance %>%
  select(employment_status, have_you_ever_had_health_insurance, if_yes_which_insurance_cover) %>%
  filter(have_you_ever_had_health_insurance == "Yes", employment_status %in% c("Employed","Self-employed","Unemployed")) %>%
  group_by(employment_status, if_yes_which_insurance_cover) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(employment_status, desc(count)) %>%
  group_by(employment_status) %>%
  slice_max(order_by = count, n = 5) %>% 
  ungroup() %>% 
  mutate(employment_status = reorder(employment_status, count))
ggplot(employment_insurance, aes(x = employment_status, y = count, fill = if_yes_which_insurance_cover)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_text(aes(label = count), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, 
            size = 3) +
  labs(
    title = "Top 5 Insurance Types by Employment Status",
    x = "Employment Status",
    y = "Count",
    fill = "Insurance Type"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

6.0 What is the monthly income of the patients?

income_count<-health_insurance %>% 
  group_by(monthly_household_income) %>% 
  summarise(income_count=n()) %>% 
  arrange(desc(income_count))
ggplot(income_count, aes(x = reorder(monthly_household_income, -income_count), y = income_count)) +
  geom_bar(stat = "identity", fill=c("#2E7D32", "#388E3C", "#4CAF50", "#81C784", "#C8E6C9", 
                                       "#FFB74D", "#FF9800")) +
  geom_text(aes(label = income_count), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Distribution of Monthly Household Income",
    x = "Income Categories",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note

Findings: The largest group was patients earning less than 10,000, whereas the smallest was patients earning 30,000-40,000. Also, 255 patients did not register their monthly household income.

6.1 Monthly income vs. ever having had insurance

income_insurance2 <- health_insurance %>% 
  select(monthly_household_income, have_you_ever_had_health_insurance) %>% 
  group_by(monthly_household_income,have_you_ever_had_health_insurance) %>% 
  summarise(total=n()) %>% 
  ungroup() %>% 
  arrange(desc(total))
ggplot(income_insurance2, aes(x = reorder(monthly_household_income, total), y = total, fill = have_you_ever_had_health_insurance)) +
  geom_bar(stat = "identity", position = "dodge") +  
  geom_text(aes(label = total), position = position_dodge(width = 0.9), vjust = -0.5, color = "black", size = 3) + 
  labs(
    title = "Health Insurance Status by Monthly Household Income",
    x = "Monthly Household Income",
    y = "Number of Patients",
    fill = "Health Insurance"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

6.2 Monthly income vs. ever having had insurance and insurance type

income_insurance <- health_insurance %>%
  select(monthly_household_income, have_you_ever_had_health_insurance, if_yes_which_insurance_cover) %>%
  filter(have_you_ever_had_health_insurance == "Yes") %>%
  group_by(monthly_household_income, if_yes_which_insurance_cover) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(monthly_household_income, desc(count)) %>%
  group_by(monthly_household_income) %>%
  slice_max(order_by = count, n = 3) %>% 
  ungroup() 
ggplot(income_insurance, aes(x = monthly_household_income, y = count, fill = if_yes_which_insurance_cover)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_text(aes(label = count), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, 
            size = 3) +
  labs(
    title = "Top 3 Insurance Types by Monthly Household Income",
    x = "Monthly Household Income",
    y = "Count",
    fill = "Insurance Type"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
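
The top-3 selection above relies on `slice_max()`, which keeps tied rows by default, so an income group can return more than three insurance types when counts are equal. The sketch below, on toy data with made-up labels and counts, contrasts the default with `with_ties = FALSE`:

```r
library(dplyr)

# Toy data: one income group where the third and fourth covers tie (hypothetical values)
toy <- tibble(
  income = rep("Less than 10,000", 4),
  cover = c("a", "b", "c", "d"),
  count = c(10, 5, 3, 3)
)

# Default behaviour: ties at the cut-off are all kept
top_with_ties <- toy %>%
  group_by(income) %>%
  slice_max(order_by = count, n = 3) %>%
  ungroup()

# with_ties = FALSE returns exactly n rows per group
top_no_ties <- toy %>%
  group_by(income) %>%
  slice_max(order_by = count, n = 3, with_ties = FALSE) %>%
  ungroup()

print(c(with_ties = nrow(top_with_ties), no_ties = nrow(top_no_ties)))
```

Which behaviour is preferable depends on the chart: keeping ties is more faithful to the data, while dropping them keeps the number of bars per group fixed.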

7.0 How many patients have ever had health insurance?

ever_had_insurance <- health_insurance %>% 
  group_by(have_you_ever_had_health_insurance) %>% 
  summarise(insurance_count=n()) %>% 
  arrange(desc(insurance_count))
ggplot(ever_had_insurance, aes(x = reorder(have_you_ever_had_health_insurance, -insurance_count), y = insurance_count)) +
  geom_bar(stat = "identity", fill=c("#EF5350", "#BA68C8")) +
  geom_text(aes(label = insurance_count), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Have Patients Ever Had Health Insurance?",
    x = "Response (Yes/No)",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

8.0 What type of insurance do the patients have?

# The same insurer appears under different letter cases, so convert all names to lower case to merge them
health_insurance <- health_insurance %>%
  mutate(if_yes_which_insurance_cover = str_to_lower(if_yes_which_insurance_cover))
# The same insurer also appears under different names, so normalize those to one label each
health_insurance <- health_insurance %>%
  mutate(if_yes_which_insurance_cover = case_when(
    str_detect(if_yes_which_insurance_cover, regex("apa", ignore_case = TRUE)) ~ "APA Insurance",
    str_detect(if_yes_which_insurance_cover, regex("jubilee", ignore_case = TRUE)) ~ "Jubilee Insurance",
    str_detect(if_yes_which_insurance_cover, regex("cic", ignore_case = TRUE)) ~ "CIC Insurance",
    TRUE ~ if_yes_which_insurance_cover
  ))
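
To see what the two normalization steps do, here is a minimal sketch on a handful of made-up insurer strings (the values are hypothetical, not taken from the survey):

```r
library(dplyr)
library(stringr)

# Toy insurer names with inconsistent case and naming (hypothetical values)
toy <- tibble(
  cover = c("NHIF", "nhif", "APA", "apa insurance", "Jubilee", "not_applicable")
)

toy <- toy %>%
  mutate(
    # Step 1: lower-case everything so "NHIF" and "nhif" merge
    cover = str_to_lower(cover),
    # Step 2: map name variants of the same insurer to one label
    cover = case_when(
      str_detect(cover, "apa") ~ "APA Insurance",
      str_detect(cover, "jubilee") ~ "Jubilee Insurance",
      TRUE ~ cover
    )
  )

print(count(toy, cover, sort = TRUE))
```

Because `str_detect()` matches substrings, a short pattern such as "apa" or "cic" would also match any unrelated name that happens to contain those letters, so the resulting labels are worth a quick manual check.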
insurance_type <- health_insurance %>% 
  group_by(if_yes_which_insurance_cover) %>% 
  summarise(insurance_type_count = n()) %>% 
  arrange(desc(insurance_type_count)) %>% 
  head(10)
ggplot(insurance_type, aes(x = reorder(if_yes_which_insurance_cover, insurance_type_count), y = insurance_type_count, fill = insurance_type_count)) +
 geom_bar(stat = "identity", fill=c("#2E7D32", "#388E3C", "#4CAF50", "#81C784", "#C8E6C9", 
                                       "#FFB74D", "#FF9800", "#F57C00", "#FF5722", "#E57373")) +
  geom_text(aes(label = insurance_type_count), hjust = 0.5, color = "black", size = 3)+
  labs(
    title = "Top 10 Most Common Insurance Types",
    x = "Insurance Type",
    y = "Number of Patients"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note

Findings: According to the plot, NHIF is the most common insurance cover among the patients. However, a large number of patients have never had any insurance cover (shown as “not_applicable”).

9.0 Did patients have insurance cover during their last hospital visit?

hospital_cover <- health_insurance %>% 
  group_by(did_you_have_health_insurance_during_your_last_hospital_visit) %>% 
  summarise(hospital_cover_count=n()) %>% 
  arrange(desc(hospital_cover_count))
ggplot(hospital_cover, aes(x = reorder(did_you_have_health_insurance_during_your_last_hospital_visit, -hospital_cover_count), y = hospital_cover_count)) +
  geom_bar(stat = "identity", fill=c("#2E7D32", "#81C784", 
                                       "#FFB74D")) +
  geom_text(aes(label = hospital_cover_count), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Health Insurance Coverage During Last Hospital Visit", 
       x = "Response (Yes or No)",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

10.0 Any record of a previous routine check-up with a doctor or healthcare provider?

record_count <- health_insurance %>% 
  group_by(have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider) %>% 
  summarise(routine_check=n()) %>% 
  arrange(desc(routine_check))
ggplot(record_count, aes(x = reorder(have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider, -routine_check), y = routine_check)) +
  geom_bar(stat = "identity", fill=c("#FFB74D", "#FF9800", "#F57C00")) +
  geom_text(aes(label = routine_check), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Records of Routine Check-Ups with Healthcare Providers",
    x = "Response (Yes or No)",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note

Findings: Most patients have no history of routine check-ups with a doctor or healthcare provider.

10.0 Ever had a cancer screening?

cancer_screening <- health_insurance %>% 
  group_by(have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc) %>% 
  summarise(screening_count=n()) %>% 
  arrange(desc(screening_count))
ggplot(cancer_screening, aes(x = reorder(have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc, -screening_count), y = screening_count)) +
  geom_bar(stat = "identity", fill=c( "#BA68C8", "#9C27B0", "#7B1FA2")) +
  geom_text(aes(label = screening_count), hjust = 0.5, color = "black", size = 3)+
  labs(title = "Cancer Screening History Among Patients",
    x = "Cancer Screening (Yes/No)",
    y = "Number of Patients") +
  scale_y_continuous(labels = scales::comma_format()) +  
  coord_flip() + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Note

Findings: Most patients report never having had a cancer screening (4,651 “No” responses versus 1,462 “Yes” responses, based on the gender breakdown in the next section).

11.0 Number of women who have had a cancer screening, compared with other genders

cancer_screen <- health_insurance %>% 
  select(gender, have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc) %>% 
  group_by(gender, have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc) %>% 
  summarise(total=n()) %>% 
  ungroup() %>% 
  print()
# A tibble: 8 × 3
  gender  have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy…¹ total
  <fct>   <chr>                                                            <int>
1 Female  No                                                                2172
2 Female  Unspecified                                                         13
3 Female  Yes                                                                842
4 Male    No                                                                2471
5 Male    Unspecified                                                         14
6 Male    Yes                                                                618
7 Unknown No                                                                   8
8 Unknown Yes                                                                  2
# ℹ abbreviated name:
#   ¹​have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc
ggplot(cancer_screen, aes(x = gender, y = total, fill = have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc )) +
  geom_bar(stat = "identity", position = "dodge") + 
  geom_text(aes(label = total), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5,  
            color = "black", 
            size = 3) +   
  labs(
    title = "Cancer Screening Status by Gender",
    x = "Gender", 
    y = "Number of Patients",
    fill = "Ever Undergone Cancer Screening?"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()
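
Raw counts are hard to compare across genders because the group sizes differ, so it may help to express them as rates. The sketch below re-enters the counts from the tibble printed above and computes the share of each gender answering "Yes"; the names `screen_counts`, `screened`, `respondents`, `screened_yes`, and `rate_pct` are illustrative and not part of the original analysis. The denominator includes the "Unspecified" responses, which is an assumption about how those should be treated.

```r
library(dplyr)
library(tibble)

# Counts copied from the cancer_screen tibble printed above
screen_counts <- tribble(
  ~gender,   ~screened,     ~total,
  "Female",  "No",          2172,
  "Female",  "Unspecified",   13,
  "Female",  "Yes",          842,
  "Male",    "No",          2471,
  "Male",    "Unspecified",   14,
  "Male",    "Yes",          618,
  "Unknown", "No",             8,
  "Unknown", "Yes",            2
)

screening_rate <- screen_counts %>%
  group_by(gender) %>%
  summarise(
    respondents  = sum(total),                      # all responses for this gender
    screened_yes = sum(total[screened == "Yes"]),   # "Yes" responses only
    .groups = "drop"
  ) %>%
  mutate(rate_pct = round(100 * screened_yes / respondents, 1))

print(screening_rate)
```

On these counts, roughly 27.8% of female respondents report a cancer screening, compared with roughly 19.9% of male respondents. The "Unknown" group has only 10 respondents, so its rate is not informative.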

10.1 Cancer screening by insurance type (patients who have ever had health insurance)

cancer_insurance <- health_insurance %>% 
  select(have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc,have_you_ever_had_health_insurance, if_yes_which_insurance_cover) %>%
  filter(have_you_ever_had_health_insurance == "Yes",have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc %in% c("Yes","No")) %>%
  group_by(have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc, if_yes_which_insurance_cover) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc, desc(count)) %>%
  group_by(have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc) %>%
  slice_max(order_by = count, n = 5) %>% 
  ungroup() 
ggplot(cancer_insurance, aes(x = have_you_ever_had_a_cancer_screening_e_g_mammogram_colonoscopy_etc, y = count, fill = if_yes_which_insurance_cover)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_text(aes(label = count), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, 
            size = 3) +
  labs(
    title = "Health Insurance Types by Cancer Screening Status",
    x = "Cancer Screening Status (Yes/No)",
    y = "Number of Patients",
    fill = "Insurance Type"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

12.0 Routine Check by gender

routine_gender <- health_insurance %>% 
  select(have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider, gender) %>% 
  group_by(gender, have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider) %>%
  summarise(routine_count=n()) %>% 
  ungroup() %>% 
  arrange(desc(routine_count))
ggplot(routine_gender, aes(x = gender, y = routine_count, fill = have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider )) +
  geom_bar(stat = "identity", position = "dodge") + 
  geom_text(aes(label = routine_count), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5,  
            color = "black", 
            size = 3) +   
  labs(
    title = "Routine Check-up Status by Gender",
    x = "Gender", 
    y = "Number of Patients",
    fill = "Ever Undergone Routine Checkup?"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()

Note

In this instance, I was looking into how routine check-up history differs between genders.

12.1 Routine check-ups by insurance type (patients who have ever had health insurance)

routine_insurance <- health_insurance %>%
  select(have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider, have_you_ever_had_health_insurance, if_yes_which_insurance_cover) %>%
  filter(have_you_ever_had_health_insurance == "Yes", have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider %in% c("Yes","No")) %>%
  group_by(have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider, if_yes_which_insurance_cover) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider, desc(count)) %>%
  group_by(have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider) %>%
  slice_max(order_by = count, n = 5) %>% 
  ungroup() 
ggplot(routine_insurance, aes(x =have_you_ever_had_a_routine_check_up_with_a_doctor_or_healthcare_provider, y = count, fill = if_yes_which_insurance_cover)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_text(aes(label = count), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, 
            size = 3) +
  labs(
    title = "Health Insurance Types Among Patients by Routine Check-Up Status",
    x = "Routine Check-Up Status (Yes/No)",
    y = "Number of Patients",
    fill = "Insurance Type"
  ) +
  scale_y_continuous(labels = scales::comma_format()) +  
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))