Preventative Approaches to Health in Maryland

Author

N Bellot Norman

Published

July 1, 2024

An ounce of PREVENTION saves LIVES www.cardio.com

Load the libraries and set the working directory

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.4.1

Warning: package 'dplyr' was built under R version 4.4.1

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lubridate)
library(leaflet)

Warning: package 'leaflet' was built under R version 4.4.1

library(sf)

Warning: package 'sf' was built under R version 4.4.1

Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE

library(knitr)

setwd("C:/Users/naomi/OneDrive/Desktop/Desktop of 11-08-2022/Community College Classes/DATA 110/Submitted Assignments/GIS Assignment")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")

Rows: 810103 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): StateAbbr, StateDesc, CityName, GeographicLevel, DataSource, Categ...
dbl  (6): Year, Data_Value, Low_Confidence_Limit, High_Confidence_Limit, Cit...
num  (1): PopulationCount

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

summary(cities500)

      Year       StateAbbr          StateDesc           CityName        
 Min.   :2016   Length:810103      Length:810103      Length:810103     
 1st Qu.:2016   Class :character   Class :character   Class :character  
 Median :2017   Mode  :character   Mode  :character   Mode  :character  
 Mean   :2017                                                           
 3rd Qu.:2017                                                           
 Max.   :2017                                                           
                                                                        
 GeographicLevel     DataSource          Category           UniqueID        
 Length:810103      Length:810103      Length:810103      Length:810103     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
   Measure          Data_Value_Unit    DataValueTypeID    Data_Value_Type   
 Length:810103      Length:810103      Length:810103      Length:810103     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
   Data_Value    Low_Confidence_Limit High_Confidence_Limit
 Min.   : 0.3    Min.   : 0.2         Min.   : 0.30        
 1st Qu.:10.0    1st Qu.: 8.9         1st Qu.:11.20        
 Median :23.0    Median :20.8         Median :25.20        
 Mean   :31.4    Mean   :29.7         Mean   :33.11        
 3rd Qu.:46.0    3rd Qu.:43.2         3rd Qu.:49.20        
 Max.   :95.7    Max.   :94.6         Max.   :96.50        
 NA's   :22792   NA's   :22792        NA's   :22792        
 Data_Value_Footnote_Symbol Data_Value_Footnote PopulationCount    
 Length:810103              Length:810103       Min.   :        1  
 Class :character           Class :character    1st Qu.:     2405  
 Mode  :character           Mode  :character    Median :     3632  
                                                Mean   :    32024  
                                                3rd Qu.:     5040  
                                                Max.   :308745538  
                                                                   
 GeoLocation         CategoryID         MeasureId            CityFIPS      
 Length:810103      Length:810103      Length:810103      Min.   :  15003  
 Class :character   Class :character   Class :character   1st Qu.: 681344  
 Mode  :character   Mode  :character   Mode  :character   Median :2622000  
                                                          Mean   :2606307  
                                                          3rd Qu.:4055000  
                                                          Max.   :5613900  
                                                          NA's   :56       
   TractFIPS         Short_Question_Text
 Min.   :1.073e+09   Length:810103      
 1st Qu.:8.001e+09   Class :character   
 Median :2.608e+10   Mode  :character   
 Mean   :2.593e+10                      
 3rd Qu.:4.011e+10                      
 Max.   :5.602e+10                      
 NA's   :28056

names(cities500) <- tolower(names(cities500))
names(cities500) <- gsub(" ","",names(cities500))

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500 %>%
  mutate(geolocation = str_replace_all(geolocation, "[()]", "")) %>%
  separate(geolocation, into = c("lat", "long"), sep = ",", convert = TRUE)
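To see what the split does before running it on all 810,103 rows, here is a minimal sketch on a made-up two-row sample. The coordinates and the names `toy` and `toy_split` are hypothetical and only mimic the "(lat, long)" text format.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical two-row sample in the same "(lat, long)" text format
toy <- tibble(geolocation = c("(39.2904, -76.6122)", "(38.9784, -76.4922)"))

toy_split <- toy %>%
  # Strip the opening and closing parentheses
  mutate(geolocation = str_replace_all(geolocation, "[()]", "")) %>%
  # Split on the comma; convert = TRUE turns the text pieces into numbers
  separate(geolocation, into = c("lat", "long"), sep = ",", convert = TRUE)

# Both columns should now be numeric rather than character
str(toy_split)
```

Checking the column types this way confirms that the later map code can use `lat` and `long` directly as coordinates.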



## Filter the dataset

Remove the rows whose StateDesc is United States, select **Prevention** as the category of interest, keep only the **crude prevalence** values, and keep only **2017**.

atlong_clean <- latlong %>%
  select(-datasource, -data_value_unit, -datavaluetypeid, -low_confidence_limit,
         -high_confidence_limit, -data_value_footnote_symbol, -data_value_footnote)

summary(atlong_clean)

      year       stateabbr          statedesc           cityname        
 Min.   :2016   Length:810103      Length:810103      Length:810103     
 1st Qu.:2016   Class :character   Class :character   Class :character  
 Median :2017   Mode  :character   Mode  :character   Mode  :character  
 Mean   :2017                                                           
 3rd Qu.:2017                                                           
 Max.   :2017                                                           
                                                                        
 geographiclevel      category           uniqueid           measure         
 Length:810103      Length:810103      Length:810103      Length:810103     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
 data_value_type      data_value    populationcount          lat       
 Length:810103      Min.   : 0.3    Min.   :        1   Min.   :21.26  
 Class :character   1st Qu.:10.0    1st Qu.:     2405   1st Qu.:33.59  
 Mode  :character   Median :23.0    Median :     3632   Median :37.33  
                    Mean   :31.4    Mean   :    32024   Mean   :36.92  
                    3rd Qu.:46.0    3rd Qu.:     5040   3rd Qu.:40.75  
                    Max.   :95.7    Max.   :308745538   Max.   :61.34  
                    NA's   :22792                       NA's   :56     
      long          categoryid         measureid            cityfips      
 Min.   :-158.21   Length:810103      Length:810103      Min.   :  15003  
 1st Qu.:-115.07   Class :character   Class :character   1st Qu.: 681344  
 Median : -93.31   Mode  :character   Mode  :character   Median :2622000  
 Mean   : -96.09                                         Mean   :2606307  
 3rd Qu.: -81.43                                         3rd Qu.:4055000  
 Max.   : -70.17                                         Max.   :5613900  
 NA's   :56                                              NA's   :56       
   tractfips         short_question_text
 Min.   :1.073e+09   Length:810103      
 1st Qu.:8.001e+09   Class :character   
 Median :2.608e+10   Mode  :character   
 Mean   :2.593e+10                      
 3rd Qu.:4.011e+10                      
 Max.   :5.602e+10                      
 NA's   :28056

latlong_clean <- latlong %>%
  filter(!str_detect(statedesc, "United States"),  # Remove StateDesc including "United States"
         category == "Prevention",               # Select Prevention category (only the column names were lowercased, not the values)
         data_value_type == "Crude prevalence",  # Filter for crude prevalence
         year == 2017,                           # Select only 2017
         stateabbr == "MD")                      # Filter for only MD in stateabbr column
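Comparisons inside filter() are case-sensitive, and the earlier tolower() call changed only the column names, not the values stored in them. A minimal sketch on a made-up tibble (the rows and the names `toy` and `md_prev` are hypothetical) shows one way to check the stored spellings before filtering:

```r
library(dplyr)

# Hypothetical rows; the capitalized category values are an assumption
# about how the CDC file spells them
toy <- tibble(
  category  = c("Prevention", "Health Outcomes", "Prevention"),
  stateabbr = c("MD", "MD", "VA")
)

# List the spellings that are actually stored
distinct(toy, category)

# A lowercase comparison silently matches nothing
n_lower <- nrow(filter(toy, category == "prevention"))

# Comparing on a lowercased copy of the values avoids the mismatch
md_prev <- filter(toy, tolower(category) == "prevention", stateabbr == "MD")
```

A filter that returns zero rows raises no error, so checking nrow() right after filtering is a cheap safeguard.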

What variables are included? (can any of them be removed?)

summary(atlong_clean)

      year       stateabbr          statedesc           cityname        
 Min.   :2016   Length:810103      Length:810103      Length:810103     
 1st Qu.:2016   Class :character   Class :character   Class :character  
 Median :2017   Mode  :character   Mode  :character   Mode  :character  
 Mean   :2017                                                           
 3rd Qu.:2017                                                           
 Max.   :2017                                                           
                                                                        
 geographiclevel      category           uniqueid           measure         
 Length:810103      Length:810103      Length:810103      Length:810103     
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
 data_value_type      data_value    populationcount          lat       
 Length:810103      Min.   : 0.3    Min.   :        1   Min.   :21.26  
 Class :character   1st Qu.:10.0    1st Qu.:     2405   1st Qu.:33.59  
 Mode  :character   Median :23.0    Median :     3632   Median :37.33  
                    Mean   :31.4    Mean   :    32024   Mean   :36.92  
                    3rd Qu.:46.0    3rd Qu.:     5040   3rd Qu.:40.75  
                    Max.   :95.7    Max.   :308745538   Max.   :61.34  
                    NA's   :22792                       NA's   :56     
      long          categoryid         measureid            cityfips      
 Min.   :-158.21   Length:810103      Length:810103      Min.   :  15003  
 1st Qu.:-115.07   Class :character   Class :character   1st Qu.: 681344  
 Median : -93.31   Mode  :character   Mode  :character   Median :2622000  
 Mean   : -96.09                                         Mean   :2606307  
 3rd Qu.: -81.43                                         3rd Qu.:4055000  
 Max.   : -70.17                                         Max.   :5613900  
 NA's   :56                                              NA's   :56       
   tractfips         short_question_text
 Min.   :1.073e+09   Length:810103      
 1st Qu.:8.001e+09   Class :character   
 Median :2.608e+10   Mode  :character   
 Mean   :2.593e+10                      
 3rd Qu.:4.011e+10                      
 Max.   :5.602e+10                      
 NA's   :28056

Remove the variables that will not be used in the assignment

prevention <- atlong_clean %>%
  filter(short_question_text %in% c("Cholesterol Screening", "Taking BP Medication")) %>% # Filter for specific measures
  select(-cityname, -categoryid, -uniqueid, -measure, -measureid,
         -cityfips, -tractfips) %>%
  mutate(data_value_category = case_when(
    data_value < 30 ~ "Low",              # bins are contiguous, so values such as 29.8 are not left unlabeled
    data_value < 40 ~ "Medium",
    data_value < 50 ~ "High",
    data_value >= 50 ~ "Extremely High")) %>%
  mutate(short_question_text = recode(short_question_text, 
                                      "Cholesterol Screening" = "Cholesterol_Screening",
                                      "Taking BP Medication" = "Taking_BP_Medication"),
         geographiclevel = if_else(geographiclevel == "Census Tract", geographiclevel, NA_character_)) %>%
  filter(!is.na(geographiclevel))
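One way to sanity-check a set of bins is to run them on boundary values and confirm that none is left without a label. The sketch below uses base R's cut() as an alternative to case_when(); the break points 30, 40, and 50 and the test values are assumptions chosen for illustration.

```r
# Hypothetical boundary values, one decimal place like data_value
test_values <- c(9.6, 29.8, 30.0, 39.9, 40.0, 49.9, 50.0, 95.7)

# right = FALSE makes each interval closed on the left: [30, 40), [40, 50), ...
bins <- cut(
  test_values,
  breaks = c(-Inf, 30, 40, 50, Inf),
  labels = c("Low", "Medium", "High", "Extremely High"),
  right  = FALSE
)

# Pair each value with its label and confirm nothing is NA
data.frame(test_values, bins)
anyNA(bins)
```

Because the intervals share their end points, every finite value lands in exactly one bin.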

summary(prevention)

      year       stateabbr          statedesc         geographiclevel   
 Min.   :2017   Length:56008       Length:56008       Length:56008      
 1st Qu.:2017   Class :character   Class :character   Class :character  
 Median :2017   Mode  :character   Mode  :character   Mode  :character  
 Mean   :2017                                                           
 3rd Qu.:2017                                                           
 Max.   :2017                                                           
                                                                        
   category         data_value_type      data_value    populationcount
 Length:56008       Length:56008       Min.   : 9.60   Min.   :    1  
 Class :character   Class :character   1st Qu.:70.40   1st Qu.: 2349  
 Mode  :character   Mode  :character   Median :75.70   Median : 3548  
                                       Mean   :74.74   Mean   : 3679  
                                       3rd Qu.:80.30   3rd Qu.: 4849  
                                       Max.   :95.70   Max.   :28960  
                                       NA's   :1588                   
      lat             long         short_question_text data_value_category
 Min.   :21.26   Min.   :-158.21   Length:56008        Length:56008       
 1st Qu.:33.58   1st Qu.:-114.63   Class :character    Class :character   
 Median :37.33   Median : -93.29   Mode  :character    Mode  :character   
 Mean   :36.91   Mean   : -96.03                                          
 3rd Qu.:40.75   3rd Qu.: -81.40                                          
 Max.   :61.34   Max.   : -70.17

The new “prevention” dataset is now a manageable size (56,008 rows).

For your assignment, work with the cleaned “Prevention” dataset

1. Once you run the above code, filter this dataset one more time for any particular subset.

summary(prevention)

      year       stateabbr          statedesc         geographiclevel   
 Min.   :2017   Length:56008       Length:56008       Length:56008      
 1st Qu.:2017   Class :character   Class :character   Class :character  
 Median :2017   Mode  :character   Mode  :character   Mode  :character  
 Mean   :2017                                                           
 3rd Qu.:2017                                                           
 Max.   :2017                                                           
                                                                        
   category         data_value_type      data_value    populationcount
 Length:56008       Length:56008       Min.   : 9.60   Min.   :    1  
 Class :character   Class :character   1st Qu.:70.40   1st Qu.: 2349  
 Mode  :character   Mode  :character   Median :75.70   Median : 3548  
                                       Mean   :74.74   Mean   : 3679  
                                       3rd Qu.:80.30   3rd Qu.: 4849  
                                       Max.   :95.70   Max.   :28960  
                                       NA's   :1588                   
      lat             long         short_question_text data_value_category
 Min.   :21.26   Min.   :-158.21   Length:56008        Length:56008       
 1st Qu.:33.58   1st Qu.:-114.63   Class :character    Class :character   
 Median :37.33   Median : -93.29   Mode  :character    Mode  :character   
 Mean   :36.91   Mean   : -96.03                                          
 3rd Qu.:40.75   3rd Qu.: -81.40                                          
 Max.   :61.34   Max.   : -70.17
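The summary above still spans latitudes from 21.26 to 61.34, which suggests the data cover more than one state at this point. A minimal sketch of the extra subsetting step, shown on a made-up tibble (the rows and the names `toy_prevention` and `md_subset` are hypothetical):

```r
library(dplyr)

# Hypothetical rows with the same column names as the prevention dataset
toy_prevention <- tibble(
  stateabbr           = c("MD", "MD", "CA", "MD"),
  short_question_text = c("Cholesterol_Screening", "Taking_BP_Medication",
                          "Cholesterol_Screening", "Taking_BP_Medication"),
  data_value          = c(78.1, 71.4, 80.2, NA)
)

# Keep Maryland only and drop rows with no reported value
md_subset <- toy_prevention %>%
  filter(stateabbr == "MD", !is.na(data_value))

# Confirm which states remain
count(md_subset, stateabbr)
```

Applied to the real data, the same filter would use `prevention` in place of `toy_prevention`.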

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

ggplot(prevention, aes(x = data_value, fill = short_question_text)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Cholesterol Screening and Taking BP Medication in Maryland",
       x = "Data Value (%)",
       y = "Density",
       fill = "Measure") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "bottom",
    legend.title = element_text(face = "bold")
  )

Warning: Removed 1588 rows containing non-finite outside the scale range
(`stat_density()`).
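The warning means that rows with a missing data_value were dropped by the density calculation. One option is to remove them explicitly before plotting so the number of rows used is visible. A minimal sketch on made-up values (the data and the names `toy_plot`, `plot_data`, and `p` are hypothetical):

```r
library(dplyr)
library(ggplot2)

# Hypothetical values, including one missing observation
toy_plot <- tibble(
  short_question_text = c("Cholesterol_Screening", "Cholesterol_Screening",
                          "Taking_BP_Medication", "Taking_BP_Medication",
                          "Taking_BP_Medication"),
  data_value = c(76.2, 79.5, 64.0, 71.3, NA)
)

# Drop missing values up front instead of relying on the plot warning
plot_data <- filter(toy_plot, !is.na(data_value))

# Same density layer as in the plot above, now built without missing rows
p <- ggplot(plot_data, aes(x = data_value, fill = short_question_text)) +
  geom_density(alpha = 0.5) +
  labs(x = "Data Value (%)", y = "Density", fill = "Measure")
```

Comparing nrow(toy_plot) with nrow(plot_data) shows how many rows were excluded.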

## When I correct the shapefile, I can run this code

outreach_map <- leaflet(data = low_medium_sf) %>%
  addProviderTiles("Esri.WorldPhysical") %>%
  addCircleMarkers(
    ~longitude, ~latitude,
    color = ~case_when(
      data_value_category == "Low" ~ "red",
      data_value_category == "Medium" ~ "orange"
    ),
    popup = ~paste(short_question_text, "<br>", "Census Tract:", tractfips, "<br>", "Value:", data_value),
    label = ~paste("Value:", data_value),
    radius = 5,
    stroke = FALSE,
    fillOpacity = 0.7,
    labelOptions = labelOptions(noHide = TRUE, direction = 'auto')
  )

5. Write a paragraph

The Centers for Disease Control and Prevention (CDC) oversaw the administration of this study. Beyond filtering the data, my aim is to investigate the relationship between cholesterol and blood pressure, as both are major contributors to heart disease, a leading cause of death in the United States. This inquiry focuses on Maryland in 2017.

The density plot shows a very high, narrow peak for “Cholesterol Screening,” indicating higher and more consistent percentages across census tracts than for “Taking BP Medication.”

The density curve for “Taking BP Medication” shows greater variability in data_value percentages. In contrast, “Cholesterol Screening” presents a roughly normal, bell-shaped curve, approximately in line with the 68-95-99.7 rule, though its tail extends slightly to the left. Respondents are more likely to participate in cholesterol screening than in taking their BP medication.

The density curve for “Taking BP Medication” peaks much lower, with a tail extending to the left toward lower values. The width of this curve shows large variation in respondents’ reported behavior regarding “Taking BP Medication.”
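The comparison of spread between the two measures can also be backed with numbers. The sketch below computes the standard deviation, the interquartile range, and the share of values within one standard deviation of the mean for each measure. The values and the names `toy_values` and `spread` are hypothetical, not results from the CDC data.

```r
library(dplyr)

# Hypothetical percentages for the two measures
toy_values <- tibble(
  short_question_text = rep(c("Cholesterol_Screening", "Taking_BP_Medication"), each = 5),
  data_value = c(70, 72, 74, 76, 78,
                 45, 55, 65, 75, 85)
)

spread <- toy_values %>%
  group_by(short_question_text) %>%
  summarise(
    avg        = mean(data_value),
    sdev       = sd(data_value),
    iqr        = IQR(data_value),
    # Share of values no more than one standard deviation from the mean
    within_1sd = mean(abs(data_value - avg) <= sdev),
    .groups    = "drop"
  )

spread
```

A wider density curve corresponds to a larger `sdev` and `iqr`, and for a roughly normal distribution `within_1sd` would be close to 0.68.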

There is some overlap between the two distributions, suggesting that some census tracts report similar percentages for both measures. Overall, Maryland’s residents show promising attitudes, behavior, and awareness. Public health campaigns in Maryland are showing positive signs, but more work is needed.

In my map, I intend to identify areas of opportunity, particularly focusing on areas that reported low and medium results.
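A minimal sketch of how such a map could be built directly from the lat and long columns, without a shapefile. The three points, the palette, and the names `toy_tracts`, `low_medium`, and `pal` are hypothetical; this is one possible approach, not the finished assignment map.

```r
library(dplyr)
library(leaflet)

# Hypothetical tract-level rows near Baltimore
toy_tracts <- tibble(
  lat  = c(39.29, 39.31, 39.33),
  long = c(-76.61, -76.63, -76.59),
  short_question_text = c("Taking_BP_Medication", "Taking_BP_Medication",
                          "Cholesterol_Screening"),
  data_value = c(25.1, 34.8, 72.4),
  data_value_category = c("Low", "Medium", "Extremely High")
)

# Keep only the tracts flagged as areas of opportunity
low_medium <- filter(toy_tracts, data_value_category %in% c("Low", "Medium"))

# Map each category to a color: Low = red, Medium = orange
pal <- colorFactor(c("red", "orange"), domain = c("Low", "Medium"))

outreach_map <- leaflet(low_medium) %>%
  addProviderTiles("Esri.WorldPhysical") %>%
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    color = ~pal(data_value_category),
    popup = ~paste0(short_question_text, "<br>Value: ", data_value),
    radius = 5, stroke = FALSE, fillOpacity = 0.7
  )
```

Printing `outreach_map` in an interactive session or a rendered document displays the map.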