Milestone4_group12

Milestone #3 Objective

You have the freedom to define the age category and mortality grouping of interest. The age group must be comparable between chosen Medicare/Medicaid and mortality metrics (i.e. adult, child, senior, or a combination). The mortality grouping must include at least two ICD sub-chapters, but may not include all. These datasets will need to be combined to create a single state-level dataset. To prepare each dataset for analysis (milestone #3), you will be expected to:

Medicare/Medicaid Choose one table to focus on (Medicare or Medicaid), or you may choose to focus on both (assume Medicare dataset ages 65+ only) Recode common variable(s) to be combined with other datasets Select columns related to age group of focus Convert payment columns to numeric Create a new variable for spending per enrollee for age group of interest

Mortality Recode common variable(s) to be combined with other datasets Limit dataset to rows with age group and mortality counts of interest (recommend intermediate variable to indicate which rows you would like to keep) Create a new variable that includes a state level sum and summarize to only include one row per state

Population Recode common variable(s) to be combined with other datasets Create a new variable with population per state for the age group of interest and summarize to only include one row per state

Milestone #3 Objective 1, cleaning population data set You have the freedom to define the age category and mortality grouping of interest. The age group must be comparable between chosen Medicare/Medicaid and mortality metrics (i.e. adult, child, senior, or a combination).

Population Recode common variable(s) to be combined with other datasets Create a new variable with population per state for the age group of interest and summarize to only include one row per state :

1- we selected > 65 years old age group as age group of interest. This is compatible between seniors in medicaid and medicare population. We will look into total mortality as well as mortality for malignancies and unknown neoplasm. 2- Variable state is decoded to residence_state_code and residence state to separate sate from state code, then new variables trimmed and formatted to title format we are going to use residence_state as a key across all data-sets 3- some state codes skipped. We are re-assigning residence_state_code 01-51 including DC with state name in alphabetical order 4- we define data type for all variables to align with rest of tables

5- we create a new variable called population_65 for population >65 and summarize to just show 1 row per state for >65 population by sum 65-74 years, 75-84 years and>85 years for each state 6- All variables names renamed based on snake format

We define data types for all varaibles to align with other tables

Milestone #3 Objective 2, cleaning mortality data set You have the freedom to define the age category and mortality grouping of interest. The age group must be comparable between chosen Medicare/Medicaid and mortality metrics (i.e. adult, child, senior, or a combination). The mortality grouping must include at least two ICD sub-chapters, but may not include all. Mortality :Mortality Recode common variable(s) to be combined with other datasets Limit dataset to rows with age group and mortality counts of interest (recommend creating an intermediate variable to indicate which rows to keep) Create a new variable that includes a state level sum and summarize to only include one row per state

1- we renamed all variable with snake format with name and type of variables in alignment across all tables for example_residence_state_code changed to character 2- we selected > 65 years old age group as age group of interest. This is compatible between seniors in medicaid and medicare population. We will look into total mortality as well as mortality for neoplasms unknown and malignant 3- residence_state_code and residence state trimmed we are going to use residence_state as a key variable across all data-sets 4- some state codes skipped. We are re-assigning residence_state_code 01-51 including DC with state name in alphabetical order

5- we created an intermediate variable called keep_raw as logic variable which becomes true when age>65 and we filtered for keep_raw =TRUE 6- we create new variables called total_death_count_65, malignant_death_count_65, unknown_neoplasm_death_count_65 for total mortality and 2 main sub_chapter of mortality in >65 years old population and summarize to just show 1 row per state for >65 population by sum 65-74 years, 75-84 years and>85 years for each state we summarized total mortality>65 years old as well mortality for malignant and unknown neoplasm sub chapters

Milestone #3 Objective 3, cleaning medicaid data set You have the freedom to define the age category and mortality grouping of interest. The age group must be comparable between chosen Medicare/Medicaid and mortality metrics (i.e. adult, child, senior, or a combination). The mortality grouping must include at least two ICD sub-chapters, but may not include all. Medicaid

1- We will proceed with both Medicaid for seniors and Medicare 2- we renamed all variable with snake format with name and type of variables in alignment across all tables for example spending changes to numeric 2- we selected > 65 years old age group as age group of interest. This is compatible between seniors in medicaid and medicare population.

3- residence state trimmed and formatted as title format, United State filtered we are going to use residence_state as a key variable across all data-sets

4- All payments changed to numeric in US dollars 5- we changed different forms of n/a to NA 6- we create new variables total_per_enrollee_payments by dividing seniors_medicaid_spending/seniors_medicaid_enrollment

Milestone #3 Objective 4, cleaning medicaid data set You have the freedom to define the age category and mortality grouping of interest. The age group must be comparable between chosen Medicare/Medicaid and mortality metrics (i.e. adult, child, senior, or a combination). The mortality grouping must include at least two ICD sub-chapters, but may not include all. Medicaid

Choose one table to focus on (Medicare or Medicaid), or you may choose to focus on both
(assume Medicare dataset represents ages 65+ only) Recode common variable(s) to be combined with other datasets Select columns related to age group of focus Convert payment columns to numeric Create a new variable for spending per enrollee for age group of interest 1- We will proceed with both Medicaid for seniors and Medicare 2- we renamed all variable with snake format with name and type of variables in alignment across all tables for example spending changes to numeric 2- we selected > 65 years old age group as age group of interest. This is compatible between seniors in medicaid and medicare population.

3- residence state trimmed and formatted as title format, United State filtered out we are going to use residence_state as a key variable across all data-sets

4- All spending and eligible adult enrollment changed to numeric 5- we change different forms of n/a, 0 to NA, as missing data 6- we create new variables total_per_senior_enrollee_spending by dividing seniors_medicaid_spending/seniors_medicaid_enrollement

********************Milestone #4,*********************

Ojective 1:

Join all datasets together Calculate any remaining data elements needed for analysis

1- we joined cdc_population_cleaned, cdc_mortality_cleaned, cdc_medicare_cleaned, cdc_medicaid_cleaned

2- we calculated medicare utilization per enrollee and medicare per utilization payment as well as senior death rate per 100000, senior malignancy death rate per 100000, total medicare enrollment rate as well as senior medicaid enrollment rate

3- we reordered data in data set

4- we integrated and added US summaries for comparison purposes in our visualization

combine_state_data <- cdc_population_cleaned  %>%
  left_join(cdc_mortality_cleaned, by = "residence_state") %>%
  
  left_join(cdc_medicare_cleaned, by = "residence_state") %>%
  
 
  left_join(cdc_medicaid_cleaned, by = "residence_state") %>%
 
  arrange(residence_state) %>%
  mutate( 
    total_medicare_utilization_per_enrollee= 
       round((total_medicare_utilization/total_medicare_enrollees), digits =2),
    total_medicare_per_utilization_payment =
      round((total_medicare_payments/total_medicare_utilization), digits =2),
    total_senior_death_rate_per_100k = 
      round((total_death_count_65/ population_65) *100000, digits= 2),
    total_senior_malignant_death_rate_per_100k = 
      round((malignant_death_count_65/ population_65) *100000, digits= 2),
    total_senior_unknow_neoplasm_death_rate_per_100k = 
      round((unknown_neoplasm_death_count_65/ population_65) *100000, digits= 2),
    total_medicare_enrollement_rate = 
      round((total_medicare_enrollees/ population_65) *100000, digits= 2),
    seniors_medicaid_enrollement_rate = 
      round((seniors_medicaid_enrollement/ population_65) *100000, digits= 2),
  )  %>%

  relocate(
    total_medicare_utilization_per_enrollee, .after= total_medicare_utilization  
   
    ) %>%
  relocate(
    total_medicare_per_utilization_payment, .after = total_medicare_per_enrollee_payments
    ) %>%
  relocate(
    total_senior_death_rate_per_100k, .after = total_death_count_65
    )  %>%
  relocate(
    total_senior_malignant_death_rate_per_100k, .after = malignant_death_count_65
    ) %>%
  relocate(
    total_senior_unknow_neoplasm_death_rate_per_100k, .after = 
      unknown_neoplasm_death_count_65
    )%>%
  relocate(
    total_medicare_enrollement_rate, .after =  total_medicare_enrollees
    )%>%
  relocate(
    seniors_medicaid_enrollement_rate, .after =  seniors_medicaid_enrollement
    )

combine_state_data

## # A tibble: 51 × 20
##    residence_state_code residence_state      population_65 total_death_count_65
##    <chr>                <chr>                        <int>                <int>
##  1 01                   Alabama                     888817                46667
##  2 02                   Alaska                       97663                 3365
##  3 03                   Arizona                    1333046                57512
##  4 04                   Arkansas                    528101                27283
##  5 05                   California                 5957092               238984
##  6 06                   Colorado                    879653                33294
##  7 07                   Connecticut                 649235                25397
##  8 08                   Delaware                    201646                 7863
##  9 09                   District Of Columbia         85838                 3353
## 10 10                   Florida                    4598386               192675
## # ℹ 41 more rows
## # ℹ 16 more variables: total_senior_death_rate_per_100k <dbl>,
## #   malignant_death_count_65 <int>,
## #   total_senior_malignant_death_rate_per_100k <dbl>,
## #   unknown_neoplasm_death_count_65 <int>,
## #   total_senior_unknow_neoplasm_death_rate_per_100k <dbl>,
## #   total_medicare_enrollees <int>, total_medicare_enrollement_rate <dbl>, …

str(combine_state_data)

## tibble [51 × 20] (S3: tbl_df/tbl/data.frame)
##  $ residence_state_code                            : chr [1:51] "01" "02" "03" "04" ...
##  $ residence_state                                 : chr [1:51] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ population_65                                   : int [1:51] 888817 97663 1333046 528101 5957092 879653 649235 201646 85838 4598386 ...
##  $ total_death_count_65                            : int [1:51] 46667 3365 57512 27283 238984 33294 25397 7863 3353 192675 ...
##  $ total_senior_death_rate_per_100k                : num [1:51] 5250 3446 4314 5166 4012 ...
##  $ malignant_death_count_65                        : int [1:51] 7440 807 9723 4697 44095 6035 4986 1641 710 35521 ...
##  $ total_senior_malignant_death_rate_per_100k      : num [1:51] 837 826 729 889 740 ...
##  $ unknown_neoplasm_death_count_65                 : int [1:51] 159 0 341 111 1022 159 179 36 0 1003 ...
##  $ total_senior_unknow_neoplasm_death_rate_per_100k: num [1:51] 17.9 0 25.6 21 17.2 ...
##  $ total_medicare_enrollees                        : int [1:51] 528983 105831 767513 430724 3436742 518796 363399 171134 70224 2359900 ...
##  $ total_medicare_enrollement_rate                 : num [1:51] 59515 108363 57576 81561 57692 ...
##  $ total_medicare_utilization                      : int [1:51] 512627 90717 720762 409441 3009051 467294 326323 162925 57112 2272148 ...
##  $ total_medicare_utilization_per_enrollee         : num [1:51] 0.97 0.86 0.94 0.95 0.88 0.9 0.9 0.95 0.81 0.96 ...
##  $ total_medicare_payments                         : num [1:51] 5.65e+09 1.05e+09 7.75e+09 4.27e+09 4.33e+10 ...
##  $ total_medicare_per_enrollee_payments            : num [1:51] 10688 9939 10092 9919 12586 ...
##  $ total_medicare_per_utilization_payment          : num [1:51] 11029 11594 10747 10434 14375 ...
##  $ seniors_medicaid_enrollement                    : int [1:51] 134000 14200 184900 80800 1492400 91500 154900 20600 28800 716900 ...
##  $ seniors_medicaid_enrollement_rate               : num [1:51] 15076 14540 13870 15300 25052 ...
##  $ seniors_medicaid_spending                       : num [1:51] 1.47e+09 2.59e+08 1.35e+09 1.38e+09 1.90e+10 ...
##  $ total_medicade_per_senior_enrollee_spending     : num [1:51] 10945 18271 7275 17110 12698 ...

us_data <- combine_state_data %>%
  summarise(
    residence_state_code = "52", 
    residence_state = "United State Sum/Average",
    population_65 = sum(population_65, na.rm = TRUE),
    total_death_count_65 = sum(total_death_count_65, na.rm = TRUE),
    total_senior_death_rate_per_100k = 
      round((total_death_count_65 / population_65) * 100000, digits = 2),
    malignant_death_count_65 = sum(malignant_death_count_65, na.rm = TRUE),
    total_senior_malignant_death_rate_per_100k = 
      round((malignant_death_count_65 / population_65) * 100000, digits = 2),
    unknown_neoplasm_death_count_65 = sum(unknown_neoplasm_death_count_65, na.rm = TRUE),
    total_senior_unknow_neoplasm_death_rate_per_100k = 
      round((unknown_neoplasm_death_count_65 / population_65) * 100000, digits = 2),
    total_medicare_enrollees = sum(total_medicare_enrollees, na.rm = TRUE),
      total_medicare_enrollement_rate = 
      round((total_medicare_enrollees / population_65) * 100000, digits = 2),
    total_medicare_utilization = sum(total_medicare_utilization, na.rm = TRUE),
    total_medicare_utilization_per_enrollee = 
      round((total_medicare_enrollees / total_medicare_utilization), digits = 2),
    total_medicare_payments = sum(total_medicare_payments, na.rm = TRUE),
    total_medicare_per_enrollee_payments = 
      round((total_medicare_payments / total_medicare_enrollees), digits = 2),
    total_medicare_per_utilization_payment = 
      round((total_medicare_payments / total_medicare_utilization), digits = 2),
    seniors_medicaid_enrollement = sum(seniors_medicaid_enrollement, na.rm = TRUE),
    seniors_medicaid_enrollement_rate = 
      round((seniors_medicaid_enrollement / population_65) * 100000, digits = 2),
    seniors_medicaid_spending = sum(seniors_medicaid_spending, na.rm = TRUE),
    total_medicade_per_senior_enrollee_spending = 
      round((seniors_medicaid_spending / seniors_medicaid_enrollement), digits = 2),
      
  )
us_data

## # A tibble: 1 × 20
##   residence_state_code residence_state        population_65 total_death_count_65
##   <chr>                <chr>                          <int>                <int>
## 1 52                   United State Sum/Aver…      55847953              2468669
## # ℹ 16 more variables: total_senior_death_rate_per_100k <dbl>,
## #   malignant_death_count_65 <int>,
## #   total_senior_malignant_death_rate_per_100k <dbl>,
## #   unknown_neoplasm_death_count_65 <int>,
## #   total_senior_unknow_neoplasm_death_rate_per_100k <dbl>,
## #   total_medicare_enrollees <int>, total_medicare_enrollement_rate <dbl>,
## #   total_medicare_utilization <int>, …

combine_state_data_us <- bind_rows(combine_state_data, us_data)
head(combine_state_data_us)

## # A tibble: 6 × 20
##   residence_state_code residence_state population_65 total_death_count_65
##   <chr>                <chr>                   <int>                <int>
## 1 01                   Alabama                888817                46667
## 2 02                   Alaska                  97663                 3365
## 3 03                   Arizona               1333046                57512
## 4 04                   Arkansas               528101                27283
## 5 05                   California            5957092               238984
## 6 06                   Colorado               879653                33294
## # ℹ 16 more variables: total_senior_death_rate_per_100k <dbl>,
## #   malignant_death_count_65 <int>,
## #   total_senior_malignant_death_rate_per_100k <dbl>,
## #   unknown_neoplasm_death_count_65 <int>,
## #   total_senior_unknow_neoplasm_death_rate_per_100k <dbl>,
## #   total_medicare_enrollees <int>, total_medicare_enrollement_rate <dbl>,
## #   total_medicare_utilization <int>, …

str(combine_state_data_us)

## tibble [52 × 20] (S3: tbl_df/tbl/data.frame)
##  $ residence_state_code                            : chr [1:52] "01" "02" "03" "04" ...
##  $ residence_state                                 : chr [1:52] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ population_65                                   : int [1:52] 888817 97663 1333046 528101 5957092 879653 649235 201646 85838 4598386 ...
##  $ total_death_count_65                            : int [1:52] 46667 3365 57512 27283 238984 33294 25397 7863 3353 192675 ...
##  $ total_senior_death_rate_per_100k                : num [1:52] 5250 3446 4314 5166 4012 ...
##  $ malignant_death_count_65                        : int [1:52] 7440 807 9723 4697 44095 6035 4986 1641 710 35521 ...
##  $ total_senior_malignant_death_rate_per_100k      : num [1:52] 837 826 729 889 740 ...
##  $ unknown_neoplasm_death_count_65                 : int [1:52] 159 0 341 111 1022 159 179 36 0 1003 ...
##  $ total_senior_unknow_neoplasm_death_rate_per_100k: num [1:52] 17.9 0 25.6 21 17.2 ...
##  $ total_medicare_enrollees                        : int [1:52] 528983 105831 767513 430724 3436742 518796 363399 171134 70224 2359900 ...
##  $ total_medicare_enrollement_rate                 : num [1:52] 59515 108363 57576 81561 57692 ...
##  $ total_medicare_utilization                      : int [1:52] 512627 90717 720762 409441 3009051 467294 326323 162925 57112 2272148 ...
##  $ total_medicare_utilization_per_enrollee         : num [1:52] 0.97 0.86 0.94 0.95 0.88 0.9 0.9 0.95 0.81 0.96 ...
##  $ total_medicare_payments                         : num [1:52] 5.65e+09 1.05e+09 7.75e+09 4.27e+09 4.33e+10 ...
##  $ total_medicare_per_enrollee_payments            : num [1:52] 10688 9939 10092 9919 12586 ...
##  $ total_medicare_per_utilization_payment          : num [1:52] 11029 11594 10747 10434 14375 ...
##  $ seniors_medicaid_enrollement                    : int [1:52] 134000 14200 184900 80800 1492400 91500 154900 20600 28800 716900 ...
##  $ seniors_medicaid_enrollement_rate               : num [1:52] 15076 14540 13870 15300 25052 ...
##  $ seniors_medicaid_spending                       : num [1:52] 1.47e+09 2.59e+08 1.35e+09 1.38e+09 1.90e+10 ...
##  $ total_medicade_per_senior_enrollee_spending     : num [1:52] 10945 18271 7275 17110 12698 ...

Objective 2:

Visualizations (at least one per group member) One print quality table as requested in scenario One print quality plot or chart as requested in scenario For each visual, include Code used to generate visual Legend (if necessary)

1-2 sentence interpretation

1- We generated a table to show mortality rate per 100,000 among seniors in all 50 states+ DC and compare it with average U.S. mortality rates. Our table includes both total mortality rate and malignancy-related mortality rate. We highlighted total mortality state rates between 10% and 20% above U.S. average in yellow and more than 20 % above U.S. average in red.

2 - We created a bar graph with overlay to show both total and cancer mortality rates per 100,000 among seniors in all 50 states+ DC and compare it with average U.S. mortality rates.

coord_flip() was used to swap x and y axis for better visualization.

We added interpretation both to the table and graph.

3 - We created a table to show Medicare and Medicaid coverage rate among seniors in all 50 states + DC and compare it with average U.S. rate. We differentially highlighted Medicare coverage rate below 85% of U.S. coverage in red, between 85%-90% of U.S. coverage in yellow. We added interpretation both to the table and the graph.

library(pacman)
p_load(tidyverse, DT, formattable)
library (scales)
mortality_rate_standard_us <- combine_state_data_us%>%
  filter(residence_state == "United State Sum/Average") %>% 
  pull(total_senior_death_rate_per_100k)
mortality_rate_standard_us

## [1] 4420.34

yellow_limit <- mortality_rate_standard_us * 1.1 
red_limit <- mortality_rate_standard_us * 1.2

table_mortality <- combine_state_data_us %>%
  select(
    "State Name" = residence_state,
    "Total Deaths Count (65+)" = total_death_count_65,
    "Death Rate per 100k (65+)" = total_senior_death_rate_per_100k,  
    "Malignancy Deaths Count (65+)" = malignant_death_count_65,
    "Malignancy Death Rate per 100k (65+)" = total_senior_malignant_death_rate_per_100k, 
    "Benign neoplasm Deaths Count (65+)" = unknown_neoplasm_death_count_65,
    "Benign Neoplasm Death Rate per 100k (65+)" = total_senior_unknow_neoplasm_death_rate_per_100k
) 
   

mortality_output <- datatable(table_mortality,
          options = list(pageLength = 52, autoWidth = TRUE),
          rownames = FALSE,
          caption = "Mortality Counts and Rates for Seniors (States Compared to US)") %>%
  
  formatStyle(
    columns = 'Death Rate per 100k (65+)',
    valueColumns = 'Death Rate per 100k (65+)',
    backgroundColor = styleInterval(
      cuts = c(mortality_rate_standard_us, yellow_limit, red_limit),
      values = c("#CCFFCC", "white", "#FFFFCC", "#FFCCCC")
    )
  )
print(mortality_output)
paste0(cat("Colors indicate rates 10% (yellow) and 20% (red) above the average."))

## Colors indicate rates 10% (yellow) and 20% (red) above the average.

## character(0)

#Green is equal or below U.S. average; white is up to 10% above U.S. average; #yellow is 10% to 20% above U.S. average; red is more than 20% above U.S. # average

#Three states (Mississipi, Ohlahoma, West Virginia) have the mortality rate #20% above U.S. average mortality rate. #Seven states (Alabama, Arkansas,Indiana, Kentucky, Louisiana, Ohio, Tennessee) #have the mortality rate 10% to 20% above than U.S. average mortality rate.

library(pacman)
p_load(tidyverse, plotly)
library (scales)
library(ggplot2)
library(dplyr)
library(forcats)

graph_mortality <- combine_state_data_us %>%
  select(
    state_name= residence_state,
   total_death_rate = total_senior_death_rate_per_100k,  
   malignancy_death_rate = total_senior_malignant_death_rate_per_100k 
    
)%>%
   mutate(
      state_name = factor(state_name),
     state_name = fct_rev(state_name),
    state_name = fct_relevel(state_name, "United State Sum/Average", after = Inf)
  )
   
plot_mortality <- ggplot(graph_mortality, aes(x =  state_name)) +
                           geom_bar(aes(y= malignancy_death_rate ), 
                                    stat = "identity", 
                                    fill = "#FFCCCC", 
                                    alpha = 0.8) +
  geom_bar(aes(y= total_death_rate),
           stat =  "identity", fill = "#CCFFCC", alpha = 0.4) + 
  coord_flip() +
  labs(
    title = "Total vs. Malignancy Mortality Rates per 100k Seniors",
    x = "State",
    y = "Mortality Rate per 100k Seniors",
   caption= "light bars show total death rate and darker bars show malignancy death rate. 
   Variability in malignancy mortality rates in seniors across states is less than 
   variabiliy in total mortality rates in seniors across states. "
  ) +
  theme_minimal() +
    theme(axis.text.y = element_text(size = 8),
          plot.margin = margin(t = 10, r = 10, b = 20, l = 10, unit = "pt"),
          plot.caption = element_text(hjust = 0.5))

print(plot_mortality)

library(pacman)
p_load(tidyverse, DT, formattable)
library (scales)
coverage_rate_standard_us <- combine_state_data_us%>%
  filter(residence_state == "United State Sum/Average") %>% 
  pull(total_medicare_enrollement_rate)

yellow_limit_medicare <- coverage_rate_standard_us * 0.90
red_limit_medicare <- coverage_rate_standard_us * 0.85

table_coverage <- combine_state_data_us %>%
  select(
    "State Name" = residence_state,
    "TotalMedicare Enrollees" = total_medicare_enrollees,
    "Total Medicare Enrollment Rate" = total_medicare_enrollement_rate,  
    "Total Senior Medicaid Enrollees" = seniors_medicaid_enrollement,
    "Senior Medicaid Enrollment Rate" = seniors_medicaid_enrollement_rate, 
  )
   

coverage_output <- datatable(table_coverage,
          options = list(pageLength = 52, autoWidth = TRUE),
          rownames = FALSE,
          caption = "Medicare and Medicaid Coverage for Seniors (States Compared to US)") %>%
  
  formatStyle(
    columns =  "Total Medicare Enrollment Rate",
    valueColumns =  "Total Medicare Enrollment Rate",
    backgroundColor = styleInterval(
      cuts = c(red_limit_medicare, yellow_limit_medicare,coverage_rate_standard_us),
      values = c("#FFCCCC", "#FFFFCC", "white", "#CCFFCC")
    )
  )
print(coverage_output)
paste0(cat("Colors indicate rates 15% (red) and 10% (yellow) above the average US."))

## Colors indicate rates 15% (red) and 10% (yellow) above the average US.

## character(0)

# Green color highlights enrollment rate equals or more than U.S. average White color highlights enrollment rate up to 10% less than U.S. average Yellow color highlights enrollment rate 10% to 15 % less than U.S. average Red color highlights enrollment rate lower than 15% below U.S. average

Notably, Three states (Florida, Hawaii, Minnesota) have Medicare enrollment rate lower than 15% below U.S. average.

library(pacman)
p_load(tidyverse, plotly)
library (scales)
library(ggplot2)
library(dplyr)
library(forcats)

graph_coverage <- combine_state_data_us %>%
  select(
    state_name= residence_state,
   medicare_coverage_rate = total_medicare_enrollement_rate,  
   medicaid_seniors_coverage_rate= seniors_medicaid_enrollement_rate
    
)%>%
   mutate(
     state_name = factor(state_name),
     state_name = fct_rev(state_name),
    state_name = fct_relevel(state_name, "United State Sum/Average", after = Inf)
  )
   
plot_coverage <- ggplot(graph_coverage, aes(x =  state_name)) +
                           geom_bar(aes(y= medicare_coverage_rate), 
                                    stat = "identity", 
                                    fill = "#FFCCCC", 
                                    alpha = 0.8) +
  geom_bar(aes(y= medicaid_seniors_coverage_rate),
           stat =  "identity", fill = "#CCFFCC", alpha = 0.4) + 
  coord_flip() +
  labs(
    title = "Medicare vs Medicaid Coverage Rates per 100k Seniors",
    x = "State",
    y = "Rate per 100k Seniors",
   caption= "Pink bars show Medicare coverage rate and green bars show 
   medicaid coverage rate for seniors
   Notable variability exists in emrollment both for Medicare and Medicaid
   across different states."
  ) +
  theme_minimal() +
    theme(axis.text.y = element_text(size = 8),
          plot.margin = margin(t = 10, r = 10, b = 20, l = 10, unit = "pt"),
          plot.caption = element_text(hjust = 0.5))

print(plot_coverage)

Rest of this section is from Milestone 2

Identify data types for 5+ data elements/columns/variables: we decided to identify these types Identify 5+ data elements required for your specified scenario. If <5 elements are required to complete the analysis, please choose additional variables of interest in the data set to explore in this milestone. Utilize functions or resources in RStudio to determine the types of each data element (i.e. character, numeric, factor) Identify the desired type/format for each variable—will you need to convert any columns to numeric or another type? 1- ‘residence_state_code’ was integer we changed to character 2- ‘residence_state’ is: character and we will keep it 3- ‘Ten.Year.Age.Groups’ was: character, we need to order data based on age group as a resullt we changed this variable This variable was converted to ‘ten_year_age_groups’ which is: integer and its class is ordered or factor 4- ‘population’ is: integer and we will keep it 5- ‘death_count’ is: integer and we will keep it 6- ‘total_medicare_enrollees’ is: integer and we will keep it 7- ‘total_medicare_utilization’ is: integer and we will keep it 8- ‘Total.Medicare.Part.A.and.or.Part.B.Program.Payments’ was: character. Considering we are going to perform numerical calculation with payments, this data type should be converted. This variable was converted to ‘total_medicare_payments’ which is: double 9- Data type of ‘seniors_medicaid_enrollement’ is: integer and we will keep it 10- Data type of ‘Seniors.Spending’ was: character Considering we are going to perform numerical calculation with payments, this data type should be converted. This variable was converted to ‘seniors_medicaid_spending’ which is: double

1- residence_state_code we already renamed columns with snake format, removed NA, separated residence_state and residence_state and residence_state_code and ordered by residence_state_code and ten_year_age_groups some state codes skipped, we reasigned codes as characters after ordering state names alphabetically

2- residence_state we already renamed columns with snake format, trimmed, removed NA Moreover, in medicare and medicaid dataset we converted them to title format (first letter capital, rest lower case) to align with population and mortality datasets

3- ten_year_age_groups we already renamed columns with snake format, removed NA, converted them to factor (ordered descending) Please note te_year_age _group_code requires cleaning, some data shows up as date, however considering we already converted age groups to factor we do not need this column and will not clean it

4- population we already renamed columns with snake format, removed NA, ordered by residence_state_code and ten_year_age_groups

5- ttotal_death_65”, “malignant_mortality_65”, and “unknown_neoplasm_mortality_65” already renamed columns with snake format, removed NA, we summed 65-74, 75-84 and 85+ groups to create > 65 ordered by residence_state_code total death count and death count for malignant an unknown neoplasm for each state already calculated

6- # Description for medicare enrolls we already renamed columns with snake format, replaced all different forms of 0 or NA/NSD with NA ,

7- # Description for medicare utilization we already renamed columns with snake format, replaced all different forms of 0 or NA/NSD with NA ,

8- # Description for medicare payments we already renamed columns with snake format, replaced all different forms of 0 or NA/NSD with NA , payments had commas and $ signs, we removed them and converted to numeric for future analysis

8- # Description for medicaid senior enrollment we already renamed columns with snake format, replaced all different forms of 0 or NA/NSD with NA

8- # Description for medicaid senior spending we already renamed columns with snake format, replaced all different forms of 0 or NA/NSD with NA , spendings had commas and $ signs, we removed them and converted to numeric for future analysis

Milestone4_group12

Sanaz Mohajeri & Dzmitry Zhmurouski

2025-11-24