Healthy Cities GIS Assignment

Author

Thejitha Rajapakshe

Load the libraries and set the working directory

# tidyverse already attaches tidyr (along with dplyr, ggplot2, and stringr)
library(tidyverse)
setwd("/Users/thejitharajapakshe/Desktop/DATA 110")
cities500 <- read.csv("500CitiesLocalHealthIndicators.cdc.csv")

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500|>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", ""))|>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
Warning: Expected 2 pieces. Missing pieces filled with `NA` in 56 rows [14, 2379, 2565,
3196, 3256, 3447, 3484, 3552, 3590, 4166, 4245, 4522, 5447, 5684, 6345, 6352,
6499, 6560, 7464, 7528, ...].
head(latlong)
  Year StateAbbr  StateDesc  CityName GeographicLevel DataSource
1 2017        CA California Hawthorne    Census Tract      BRFSS
2 2017        CA California Hawthorne            City      BRFSS
3 2017        CA California   Hayward            City      BRFSS
4 2017        CA California   Hayward            City      BRFSS
5 2017        CA California     Hemet            City      BRFSS
6 2017        CA California     Indio    Census Tract      BRFSS
             Category            UniqueID
1     Health Outcomes 0632548-06037602504
2 Unhealthy Behaviors              632548
3     Health Outcomes              633000
4 Unhealthy Behaviors              633000
5          Prevention              633182
6     Health Outcomes 0636448-06065045213
                                              Measure Data_Value_Unit
1              Arthritis among adults aged >=18 Years               %
2        Current smoking among adults aged >=18 Years               %
3 Coronary heart disease among adults aged >=18 Years               %
4                Obesity among adults aged >=18 Years               %
5  Cholesterol screening among adults aged >=18 Years               %
6              Arthritis among adults aged >=18 Years               %
  DataValueTypeID         Data_Value_Type Data_Value Low_Confidence_Limit
1          CrdPrv        Crude prevalence       14.6                 13.9
2          CrdPrv        Crude prevalence       15.4                 15.0
3       AgeAdjPrv Age-adjusted prevalence        4.8                  4.7
4          CrdPrv        Crude prevalence       24.2                 24.1
5       AgeAdjPrv Age-adjusted prevalence       78.0                 77.6
6          CrdPrv        Crude prevalence       22.0                 21.1
  High_Confidence_Limit Data_Value_Footnote_Symbol Data_Value_Footnote
1                  15.2                                               
2                  15.9                                               
3                   4.8                                               
4                  24.4                                               
5                  78.3                                               
6                  22.8                                               
  PopulationCount      lat      long CategoryID  MeasureId CityFIPS  TractFIPS
1           4,407 33.90555 -118.3373    HLTHOUT  ARTHRITIS   632548 6037602504
2          84,293 33.91467 -118.3477     UNHBEH   CSMOKING   632548         NA
3         144,186 37.63296 -122.0771    HLTHOUT        CHD   633000         NA
4         144,186 37.63296 -122.0771     UNHBEH    OBESITY   633000         NA
5          78,657 33.73523 -116.9946    PREVENT CHOLSCREEN   633182         NA
6           5,006 33.71446 -116.2582    HLTHOUT  ARTHRITIS   636448 6065045213
     Short_Question_Text
1              Arthritis
2        Current Smoking
3 Coronary Heart Disease
4                Obesity
5  Cholesterol Screening
6              Arthritis
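
The warning above reports 56 rows where the split produced fewer than two pieces. A likely cause (an assumption, since those rows are not shown) is an empty GeoLocation string. The sketch below reproduces the behavior on a small toy data frame with made-up city names, and uses fill = "right" so the missing piece becomes NA without a warning.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy data (hypothetical city names), including one empty GeoLocation
toy <- data.frame(
  CityName = c("A", "B", "C"),
  GeoLocation = c("(33.9, -118.3)", "", "(39.3, -76.6)"),
  stringsAsFactors = FALSE
)

toy_split <- toy |>
  # Drop the parentheses so only "lat, long" remains
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # fill = "right" pads the missing piece with NA instead of warning
  separate(GeoLocation, into = c("lat", "long"), sep = ",",
           convert = TRUE, fill = "right")

# Rows that could not be split end up with NA coordinates
missing_coords <- toy_split |>
  filter(is.na(lat) | is.na(long))
```

Inspecting missing_coords before mapping shows which rows have no coordinates and would be skipped on a map.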

Filter the dataset

Remove the rows where StateDesc is "United States", select Unhealthy Behaviors as the category of interest, keep only crude prevalence measures, and keep only 2017.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Category == "Unhealthy Behaviors") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
  Year StateAbbr  StateDesc   CityName GeographicLevel DataSource
1 2017        CA California  Hawthorne            City      BRFSS
2 2017        CA California    Hayward            City      BRFSS
3 2017        CA California   Lakewood            City      BRFSS
4 2017        AL    Alabama Huntsville    Census Tract      BRFSS
5 2017        AZ    Arizona   Avondale    Census Tract      BRFSS
6 2017        AZ    Arizona   Chandler            City      BRFSS
             Category            UniqueID
1 Unhealthy Behaviors              632548
2 Unhealthy Behaviors              633000
3 Unhealthy Behaviors              639892
4 Unhealthy Behaviors 0137000-01089010612
5 Unhealthy Behaviors 0404720-04013082027
6 Unhealthy Behaviors              412000
                                                         Measure
1                   Current smoking among adults aged >=18 Years
2                           Obesity among adults aged >=18 Years
3                           Obesity among adults aged >=18 Years
4                           Obesity among adults aged >=18 Years
5                           Obesity among adults aged >=18 Years
6 No leisure-time physical activity among adults aged >=18 Years
  Data_Value_Unit DataValueTypeID  Data_Value_Type Data_Value
1               %          CrdPrv Crude prevalence       15.4
2               %          CrdPrv Crude prevalence       24.2
3               %          CrdPrv Crude prevalence       22.1
4               %          CrdPrv Crude prevalence       30.3
5               %          CrdPrv Crude prevalence       30.6
6               %          CrdPrv Crude prevalence       20.9
  Low_Confidence_Limit High_Confidence_Limit Data_Value_Footnote_Symbol
1                 15.0                  15.9                           
2                 24.1                  24.4                           
3                 21.9                  22.2                           
4                 29.2                  31.5                           
5                 29.6                  31.5                           
6                 20.6                  21.2                           
  Data_Value_Footnote PopulationCount      lat       long CategoryID MeasureId
1                              84,293 33.91467 -118.34767     UNHBEH  CSMOKING
2                             144,186 37.63296 -122.07705     UNHBEH   OBESITY
3                              80,048 33.84705 -118.12220     UNHBEH   OBESITY
4                               2,654 34.76364  -86.75002     UNHBEH   OBESITY
5                               3,978 33.45053 -112.29254     UNHBEH   OBESITY
6                             236,123 33.28319 -111.85221     UNHBEH       LPA
  CityFIPS  TractFIPS Short_Question_Text
1   632548         NA     Current Smoking
2   633000         NA             Obesity
3   639892         NA             Obesity
4   137000 1089010612             Obesity
5   404720 4013082027             Obesity
6   412000         NA Physical Inactivity
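
The four filter() calls above can also be written as one call, because filter() combines comma-separated conditions with AND. This is a minimal sketch on a toy data frame; the rows are invented for illustration and only the column names come from the dataset.

```r
library(dplyr)

# Toy rows (hypothetical) covering each condition that should be dropped
toy <- data.frame(
  StateDesc = c("United States", "Maryland", "Maryland", "Alabama"),
  Category = c("Unhealthy Behaviors", "Unhealthy Behaviors",
               "Prevention", "Unhealthy Behaviors"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year = c(2017, 2017, 2017, 2017),
  stringsAsFactors = FALSE
)

# All conditions must hold for a row to be kept
toy_clean <- toy |>
  filter(
    StateDesc != "United States",
    Category == "Unhealthy Behaviors",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )
```

Only the second toy row passes all four conditions, which is the same result the chained version would give.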

What variables are included? (can any of them be removed?)

names(latlong_clean)
 [1] "Year"                       "StateAbbr"                 
 [3] "StateDesc"                  "CityName"                  
 [5] "GeographicLevel"            "DataSource"                
 [7] "Category"                   "UniqueID"                  
 [9] "Measure"                    "Data_Value_Unit"           
[11] "DataValueTypeID"            "Data_Value_Type"           
[13] "Data_Value"                 "Low_Confidence_Limit"      
[15] "High_Confidence_Limit"      "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote"        "PopulationCount"           
[19] "lat"                        "long"                      
[21] "CategoryID"                 "MeasureId"                 
[23] "CityFIPS"                   "TractFIPS"                 
[25] "Short_Question_Text"       

Remove the variables that will not be used in the assignment

prevention <- latlong_clean |>
  select(-DataSource,-Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
  Year StateAbbr  StateDesc   CityName GeographicLevel            Category
1 2017        CA California  Hawthorne            City Unhealthy Behaviors
2 2017        CA California    Hayward            City Unhealthy Behaviors
3 2017        CA California   Lakewood            City Unhealthy Behaviors
4 2017        AL    Alabama Huntsville    Census Tract Unhealthy Behaviors
5 2017        AZ    Arizona   Avondale    Census Tract Unhealthy Behaviors
6 2017        AZ    Arizona   Chandler            City Unhealthy Behaviors
             UniqueID
1              632548
2              633000
3              639892
4 0137000-01089010612
5 0404720-04013082027
6              412000
                                                         Measure
1                   Current smoking among adults aged >=18 Years
2                           Obesity among adults aged >=18 Years
3                           Obesity among adults aged >=18 Years
4                           Obesity among adults aged >=18 Years
5                           Obesity among adults aged >=18 Years
6 No leisure-time physical activity among adults aged >=18 Years
   Data_Value_Type Data_Value PopulationCount      lat       long CategoryID
1 Crude prevalence       15.4          84,293 33.91467 -118.34767     UNHBEH
2 Crude prevalence       24.2         144,186 37.63296 -122.07705     UNHBEH
3 Crude prevalence       22.1          80,048 33.84705 -118.12220     UNHBEH
4 Crude prevalence       30.3           2,654 34.76364  -86.75002     UNHBEH
5 Crude prevalence       30.6           3,978 33.45053 -112.29254     UNHBEH
6 Crude prevalence       20.9         236,123 33.28319 -111.85221     UNHBEH
  MeasureId CityFIPS  TractFIPS Short_Question_Text
1  CSMOKING   632548         NA     Current Smoking
2   OBESITY   633000         NA             Obesity
3   OBESITY   639892         NA             Obesity
4   OBESITY   137000 1089010612             Obesity
5   OBESITY   404720 4013082027             Obesity
6       LPA   412000         NA Physical Inactivity
md <- prevention |>
  filter(StateAbbr %in% c("MD"))
head(md)
  Year StateAbbr StateDesc  CityName GeographicLevel            Category
1 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
2 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
3 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
4 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
5 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
6 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
             UniqueID
1 2404000-24510120300
2 2404000-24510151300
3 2404000-24510280301
4 2404000-24510080800
5 2404000-24510260604
6 2404000-24510230100
                                                         Measure
1                   Current smoking among adults aged >=18 Years
2 No leisure-time physical activity among adults aged >=18 Years
3                           Obesity among adults aged >=18 Years
4 No leisure-time physical activity among adults aged >=18 Years
5                    Binge drinking among adults aged >=18 Years
6                   Current smoking among adults aged >=18 Years
   Data_Value_Type Data_Value PopulationCount      lat      long CategoryID
1 Crude prevalence       20.0           3,552 39.31995 -76.61249     UNHBEH
2 Crude prevalence       42.5           4,546 39.33785 -76.66619     UNHBEH
3 Crude prevalence       42.3           4,101 39.31032 -76.70164     UNHBEH
4 Crude prevalence       38.9           1,281 39.30370 -76.59305     UNHBEH
5 Crude prevalence       15.0           1,465 39.27868 -76.53884     UNHBEH
6 Crude prevalence       18.3           1,953 39.27581 -76.61707     UNHBEH
  MeasureId CityFIPS   TractFIPS Short_Question_Text
1  CSMOKING  2404000 24510120300     Current Smoking
2       LPA  2404000 24510151300 Physical Inactivity
3   OBESITY  2404000 24510280301             Obesity
4       LPA  2404000 24510080800 Physical Inactivity
5     BINGE  2404000 24510260604      Binge Drinking
6  CSMOKING  2404000 24510230100     Current Smoking

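The new dataset “Prevention” (the prevention object above) is now a manageable size. Despite its name, it holds the Unhealthy Behaviors rows selected in the filter step.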

For your assignment, work with the cleaned “Prevention” dataset

1. Once you run the above code, filter this dataset one more time for any particular subset.

Filter chunk here

unique(latlong_clean$StateAbbr)
 [1] "CA" "AL" "AZ" "FL" "CO" "CT" "IL" "IN" "KS" "GA" "ID" "LA" "ME" "MA" "MI"
[16] "MN" "MO" "NV" "NJ" "NY" "PA" "NC" "ND" "OH" "OK" "OR" "TX" "RI" "SC" "SD"
[31] "TN" "UT" "VT" "VA" "WA" "WI" "WY" "AK" "AR" "DE" "DC" "HI" "IA" "KY" "MD"
[46] "NM" "NH" "MS" "NE" "MT" "WV"
unique(prevention$Measure)
[1] "Current smoking among adults aged >=18 Years"                  
[2] "Obesity among adults aged >=18 Years"                          
[3] "No leisure-time physical activity among adults aged >=18 Years"
[4] "Binge drinking among adults aged >=18 Years"                   
mdclean <- md |>
  select(-CityFIPS, -CategoryID, -MeasureId, -Short_Question_Text)
smoking_adults <- mdclean |>
  filter(Measure == "Current smoking among adults aged >=18 Years")
head(smoking_adults)
  Year StateAbbr StateDesc  CityName GeographicLevel            Category
1 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
2 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
3 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
4 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
5 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
6 2017        MD  Maryland Baltimore    Census Tract Unhealthy Behaviors
             UniqueID                                      Measure
1 2404000-24510120300 Current smoking among adults aged >=18 Years
2 2404000-24510230100 Current smoking among adults aged >=18 Years
3 2404000-24510250500 Current smoking among adults aged >=18 Years
4 2404000-24510180200 Current smoking among adults aged >=18 Years
5 2404000-24510250103 Current smoking among adults aged >=18 Years
6 2404000-24510160300 Current smoking among adults aged >=18 Years
   Data_Value_Type Data_Value PopulationCount      lat      long   TractFIPS
1 Crude prevalence       20.0           3,552 39.31995 -76.61249 24510120300
2 Crude prevalence       18.3           1,953 39.27581 -76.61707 24510230100
3 Crude prevalence       30.3           5,468 39.21536 -76.56698 24510250500
4 Crude prevalence       28.9             977 39.29147 -76.63614 24510180200
5 Crude prevalence       23.8           4,050 39.26870 -76.67766 24510250103
6 Crude prevalence       30.1           1,558 39.29858 -76.64441 24510160300

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

First plot chunk here

smoking_adults$PopulationCount <- as.numeric(gsub(",", "", smoking_adults$PopulationCount))
total_population <- sum(smoking_adults$PopulationCount, na.rm = TRUE)
smoking_adults_wpercent <- smoking_adults |>
  mutate(PopulationPercentage = (PopulationCount / total_population) * 100)
ggplot(smoking_adults_wpercent, aes(x = Data_Value, y = lat)) +
  geom_point() +
  labs(title = "Smoking Prevalence by Latitude in Maryland",
       x = "Smoking Prevalence (%)",
       y = "Latitude") +
  theme_minimal()
Warning: Removed 1 rows containing missing values (`geom_point()`).
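
The chunk above does two things worth checking separately: it turns the comma-formatted PopulationCount strings into numbers, and it plots Data_Value against lat. The warning refers to the plotted columns, since geom_point() drops rows with a missing x or y value. The sketch below uses three PopulationCount strings from the output above plus a hypothetical empty entry, and a toy data frame with one hypothetical missing Data_Value.

```r
# First three strings appear in the output above; "" is a hypothetical gap
pop_raw <- c("3,552", "1,953", "5,468", "")

# Strip the thousands separators, then convert; "" becomes NA
pop_num <- as.numeric(gsub(",", "", pop_raw))

# Share of the total, ignoring missing counts
total_pop <- sum(pop_num, na.rm = TRUE)
pop_share <- pop_num / total_pop * 100

# Count missing values in each column the scatter plot uses
toy <- data.frame(
  Data_Value = c(20.0, 18.3, NA, 30.1),  # the NA is hypothetical
  lat = c(39.31995, 39.27581, 39.21536, 39.29858)
)
na_per_column <- colSums(is.na(toy[, c("Data_Value", "lat")]))
```

Running the same colSums(is.na(...)) check on smoking_adults_wpercent would show which of the two plotted columns holds the missing value.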

3. Now create a map of your subsetted dataset.

First map chunk here

library(leaflet)

leaflet() |>
  setView(lng = -76.6122, lat = 39.2904, zoom = 10.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = smoking_adults_wpercent,
    radius = smoking_adults_wpercent$PopulationPercentage
)
Assuming "long" and "lat" are longitude and latitude, respectively
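
In leaflet, the radius passed to addCircles() is measured in meters. Each tract's PopulationPercentage is a small share of the total, so used directly it gives circles only a few meters wide or less, which are hard to see at this zoom level. One option is to rescale the values to a visible range first. The helper below is a hypothetical sketch, not part of leaflet, and the 100 to 600 meter range is an arbitrary choice.

```r
# Hypothetical helper: linearly rescale values to a radius range in meters
rescale_radius <- function(x, min_m = 100, max_m = 600) {
  rng <- range(x, na.rm = TRUE)
  # If every value is identical, use the midpoint of the range
  if (rng[1] == rng[2]) {
    return(rep((min_m + max_m) / 2, length(x)))
  }
  min_m + (x - rng[1]) / (rng[2] - rng[1]) * (max_m - min_m)
}

# Toy population shares in percent (made-up values)
share <- c(0.2, 0.5, 1.1)
radii <- rescale_radius(share)
```

The result could then be passed as radius = rescale_radius(smoking_adults_wpercent$PopulationPercentage). Missing values stay NA after rescaling.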

4. Refine your map to include a mouseover tooltip

Refined map chunk here

popupsmoke <- paste0(
      "<b>City: </b>", smoking_adults_wpercent$CityName, "<br>",
      "<b>State: </b>", smoking_adults_wpercent$StateDesc, "<br>",
      "<b>Percentage of Population: </b>", smoking_adults_wpercent$PopulationPercentage, "<br>")
leaflet() |>
  setView(lng = -76.6122, lat = 39.2904, zoom = 10.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = smoking_adults_wpercent,
    radius = smoking_adults_wpercent$PopulationCount / 100,
    color = "#f2079c",
    fillColor = "#93609f",
    popup = popupsmoke
  )
Assuming "long" and "lat" are longitude and latitude, respectively
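
The popup argument used above opens when a circle is clicked. For text that appears on mouseover, leaflet's label argument can be used instead of, or alongside, popup. The sketch below uses two rows taken from the smoking_adults output above; the wording of the label text and the radius divisor are illustrative choices. Passing lng and lat explicitly also avoids the "Assuming ..." message.

```r
library(leaflet)

# Two rows copied from the smoking_adults output above
toy <- data.frame(
  CityName = c("Baltimore", "Baltimore"),
  Data_Value = c(20.0, 18.3),
  PopulationCount = c(3552, 1953),
  lat = c(39.31995, 39.27581),
  long = c(-76.61249, -76.61707)
)

# Plain-text label shown on hover (wording is illustrative)
hover_text <- paste0(toy$CityName, ": ", toy$Data_Value, "% current smokers")

m <- leaflet(toy) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~long, lat = ~lat,          # explicit columns, so no guessing
    radius = ~PopulationCount / 10,   # meters; divisor chosen for visibility
    label = hover_text                # appears on mouseover
  )
```

Printing m in an interactive session renders the map, and hovering over a circle shows the label.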

5. Write a paragraph

In a paragraph, describe the plots you created and what they show.

In this analysis, I looked at the prevalence of current smoking among adults in Maryland, using the 2017 crude prevalence estimates from the 500 Cities data. The Maryland rows previewed above are all Baltimore census tracts. Using ggplot2, I made a scatter plot of smoking prevalence against tract latitude to check whether prevalence changes from south to north; one row with a missing value was dropped from the plot. I then built an interactive map with the leaflet package. In both maps, circle size reflects tract population rather than the smoking rate: the first map sets the radius to each tract's share of the total population, and the refined map uses the population count divided by 100. The refined map also adds a popup, opened by clicking a circle, that lists the city, the state, and the tract's percentage of the total population. The scatter plot shows how smoking prevalence varies with latitude, and the maps show where the more populous tracts are. Adding smoking prevalence to the popup would make the map more useful for targeting public health efforts at specific areas.