Healthy Cities GIS Assignment

Author

Ayan Elmi

Load the libraries and set the working directory

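# tidyverse already attaches tidyr, so it does not need a separate call
library(tidyverse)
library(leaflet)
library(knitr)
library(webshot2)
setwd("~/Desktop/Data 110")
# read_csv() already creates cities500, so no data() call is needed
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")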

The GeoLocation variable has (lat, long) format

Split GeoLocation (lat, long) into two columns: lat and long

latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)

# A tibble: 6 × 25
   Year StateAbbr StateDesc  CityName  GeographicLevel DataSource Category      
  <dbl> <chr>     <chr>      <chr>     <chr>           <chr>      <chr>         
1  2017 CA        California Hawthorne Census Tract    BRFSS      Health Outcom…
2  2017 CA        California Hawthorne City            BRFSS      Unhealthy Beh…
3  2017 CA        California Hayward   City            BRFSS      Health Outcom…
4  2017 CA        California Hayward   City            BRFSS      Unhealthy Beh…
5  2017 CA        California Hemet     City            BRFSS      Prevention    
6  2017 CA        California Indio     Census Tract    BRFSS      Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
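The split above can also be followed step by step in base R. This is only a sketch on made-up coordinate strings (the `geo` vector is hypothetical, not taken from the dataset), meant to show what removing the parentheses and splitting on the comma does to each value.

```r
# Toy GeoLocation strings in the "(lat, long)" format; the values are made up.
geo <- c("(33.5, -86.8)", "(30.2, -92.0)")

# Drop the parentheses, then split each string on the comma.
stripped <- gsub("[()]", "", geo)
parts <- strsplit(stripped, ",")

# as.numeric() tolerates the leading space left after the comma.
lat  <- as.numeric(sapply(parts, `[`, 1))
long <- as.numeric(sapply(parts, `[`, 2))

data.frame(lat, long)
```

In the chunk above, `separate(..., convert = TRUE)` does the splitting and the numeric conversion in one call.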

Filter the dataset

Remove the rows where StateDesc is "United States", keep only crude prevalence measures from 2017, restrict the data to Connecticut, and select Unhealthy Behaviors as the category of interest.

latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)

# A tibble: 6 × 25
   Year StateAbbr StateDesc   CityName   GeographicLevel DataSource Category    
  <dbl> <chr>     <chr>       <chr>      <chr>           <chr>      <chr>       
1  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
2  2017 CT        Connecticut Danbury    City            BRFSS      Unhealthy B…
3  2017 CT        Connecticut Norwalk    Census Tract    BRFSS      Unhealthy B…
4  2017 CT        Connecticut Bridgeport Census Tract    BRFSS      Unhealthy B…
5  2017 CT        Connecticut Hartford   Census Tract    BRFSS      Unhealthy B…
6  2017 CT        Connecticut Waterbury  Census Tract    BRFSS      Unhealthy B…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

The new dataset “latlong_clean” is now a manageable size.

For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.

1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.

Filter chunk here (you may need multiple chunks)

Checking what states are there

unique(latlong$StateDesc)

 [1] "California"    "United States" "Alabama"       "Alaska"       
 [5] "Arizona"       "Arkansas"      "Connecticut"   "Delaware"     
 [9] "District of C" "Florida"       "Colorado"      "Illinois"     
[13] "Indiana"       "Iowa"          "Kansas"        "Georgia"      
[17] "Idaho"         "Kentucky"      "Louisiana"     "Maine"        
[21] "Massachusetts" "Michigan"      "Minnesota"     "Mississippi"  
[25] "Missouri"      "Montana"       "Nebraska"      "New York"     
[29] "Nevada"        "New Hampshire" "New Jersey"    "New Mexico"   
[33] "Pennsylvania"  "North Carolin" "North Dakota"  "Ohio"         
[37] "Oklahoma"      "Oregon"        "Texas"         "Rhode Island" 
[41] "South Carolin" "South Dakota"  "Tennessee"     "Utah"         
[45] "Vermont"       "Virginia"      "Washington"    "West Virginia"
[49] "Wisconsin"     "Wyoming"       "Hawaii"        "Maryland"

Filtering for diabetes in Southern States

latlong_2 <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateDesc %in% c("Alabama", "Louisiana", "Mississippi")) |>
  filter(MeasureId == "DIABETES")

head(latlong_2)

# A tibble: 6 × 25
   Year StateAbbr StateDesc   CityName   GeographicLevel DataSource Category    
  <dbl> <chr>     <chr>       <chr>      <chr>           <chr>      <chr>       
1  2017 AL        Alabama     Huntsville Census Tract    BRFSS      Health Outc…
2  2017 LA        Louisiana   Lafayette  City            BRFSS      Health Outc…
3  2017 MS        Mississippi Gulfport   City            BRFSS      Health Outc…
4  2017 AL        Alabama     Huntsville Census Tract    BRFSS      Health Outc…
5  2017 AL        Alabama     Birmingham Census Tract    BRFSS      Health Outc…
6  2017 AL        Alabama     Birmingham Census Tract    BRFSS      Health Outc…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
#   DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
#   Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
#   Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
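Since the assignment asks for no more than 900 observations, it may help to check the size of the subset after filtering. The sketch below uses a small hypothetical data frame (`subset_df`) as a stand-in; on the real data the same calls would be applied to `latlong_2`.

```r
# Hypothetical stand-in for a filtered subset; the real one is latlong_2.
subset_df <- data.frame(
  StateDesc = c("Alabama", "Alabama", "Louisiana", "Mississippi"),
  MeasureId = "DIABETES"
)

# Total rows, to compare against the 900-observation limit.
n_obs <- nrow(subset_df)
n_obs <= 900

# Rows per state, to see how the subset is distributed.
table(subset_df$StateDesc)
```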

Removing unused columns

latlong_filtered <- latlong_2 |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID,
         -Low_Confidence_Limit, -High_Confidence_Limit,
         -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_filtered)

# A tibble: 6 × 18
   Year StateAbbr StateDesc   CityName GeographicLevel Category UniqueID Measure
  <dbl> <chr>     <chr>       <chr>    <chr>           <chr>    <chr>    <chr>  
1  2017 AL        Alabama     Huntsvi… Census Tract    Health … 0137000… Diagno…
2  2017 LA        Louisiana   Lafayet… City            Health … 2240735  Diagno…
3  2017 MS        Mississippi Gulfport City            Health … 2829700  Diagno…
4  2017 AL        Alabama     Huntsvi… Census Tract    Health … 0137000… Diagno…
5  2017 AL        Alabama     Birming… Census Tract    Health … 0107000… Diagno…
6  2017 AL        Alabama     Birming… Census Tract    Health … 0107000… Diagno…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
#   PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
#   MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>

2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.

Plot 1: Geographic Location vs Diabetes Prevalence

ggplot(latlong_filtered, aes(x = lat, y = Data_Value, color = StateDesc)) +
  geom_jitter(alpha = 0.5) +
  labs(x = "Latitude",
       y = "Diabetes Prevalence (%)",
       title = "Geographic Location vs Diabetes Prevalence in 2017") +
  theme_minimal(base_size = 14, base_family = "serif") +
  scale_color_brewer(name = "State", palette = "Set1")

Warning: Removed 32 rows containing missing values or values outside the scale range
(`geom_point()`).

The graph does not show a clear correlation or trend. I expected this, since all three states are in the South and share similar latitudes. However, the graph still gives some insight into diabetes prevalence and latitude in southern states. I will expand on this in question 5.
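The warning above reports rows dropped from the plot. One plausible cause, which is an assumption here rather than something the output confirms, is missing values in `Data_Value`. A minimal sketch on made-up data shows how such rows could be counted and removed before plotting:

```r
# Toy data with made-up values; NA marks a missing prevalence estimate.
toy <- data.frame(
  lat = c(30.4, 32.3, 33.5, 34.7),
  Data_Value = c(12.1, NA, 15.8, NA)
)

# Count the rows that a plot would drop.
n_missing <- sum(is.na(toy$Data_Value))

# Keep only the rows with a recorded value.
toy_complete <- toy[!is.na(toy$Data_Value), ]
nrow(toy_complete)
```

On the real data, `sum(is.na(latlong_filtered$Data_Value))` would show whether the count matches the number in the warning.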

3. Now create a map of your subsetted dataset.

Mean of latitude and longitude of southern states

# na.rm = TRUE keeps the center defined even if a coordinate is missing
south_long <- mean(latlong_filtered$long, na.rm = TRUE)
south_lat <- mean(latlong_filtered$lat, na.rm = TRUE)

Creating the map

leaflet() |>
  setView(lng = south_long, lat = south_lat, zoom = 4.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = latlong_filtered,
    radius = latlong_filtered$Data_Value * 400, # played around with different values
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.95
  )

Assuming "long" and "lat" are longitude and latitude, respectively

Adding a pink fill and a black outline to the circles

leaflet() |>
  setView(lng = south_long, lat = south_lat, zoom = 5.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = latlong_filtered,
    radius = latlong_filtered$Data_Value * 400,
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.9
  )

Assuming "long" and "lat" are longitude and latitude, respectively
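The `radius` passed to `addCircles()` is measured in meters, so multiplying by 400 turns each percentage point of prevalence into 400 m of radius. The sketch below works through that arithmetic on made-up prevalence values; the numbers are illustrative only.

```r
# Made-up prevalence values in percent.
prevalence <- c(8, 15, 25)

# Same scaling factor as the map code: 400 meters per percentage point.
radius_m <- prevalence * 400
radius_m / 1000  # radii in kilometers: 3.2, 6, 10

# Area grows with the square of the radius, so the largest circle covers
# roughly 9.8 times the area of the smallest one.
area_ratio <- (max(radius_m) / min(radius_m))^2
```

Because area grows faster than radius, differences in prevalence look larger on the map than they are numerically.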

4. Refine your map to include a mouse-click tooltip

Refined map chunk here

Making the tooltip

popupdiabetes <- paste0(
  "<b>City: </b>", latlong_filtered$CityName, "<br>",
  "<b>State: </b>", latlong_filtered$StateDesc, "<br>",
  "<b>Diabetes percentage: </b>", round(latlong_filtered$Data_Value, 1), "%<br>", # rounding learnt in Data 101
  "<b>Population: </b>", latlong_filtered$PopulationCount, "<br>"
)
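The popup works because `paste0()` is vectorized: it builds one HTML string per row, and leaflet matches each string to the circle in the same position. A small sketch with a made-up two-row data frame (`toy` and its values are hypothetical) illustrates this:

```r
# Made-up rows standing in for latlong_filtered.
toy <- data.frame(
  CityName = c("Birmingham", "Gulfport"),
  Data_Value = c(15.84, 13.26)
)

# paste0() works element-wise, so this yields one HTML string per row.
popup <- paste0(
  "<b>City: </b>", toy$CityName, "<br>",
  "<b>Diabetes percentage: </b>", round(toy$Data_Value, 1), "%"
)

length(popup)  # 2, one string per row
popup[1]       # "<b>City: </b>Birmingham<br><b>Diabetes percentage: </b>15.8%"
```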

Adding the tooltip to the map

leaflet() |>
  setView(lng = south_long, lat = south_lat, zoom = 5.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = latlong_filtered,
    lng = ~long,
    lat = ~lat,
    radius = latlong_filtered$Data_Value * 400,
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.9,
    popup = popupdiabetes
  )

5. Write a paragraph

In a paragraph, describe the plots you created and the insights they show.

The first plot, a scatter plot, shows diabetes prevalence against latitude for census tracts and cities in three southern states (Alabama, Mississippi and Louisiana). I chose to investigate diabetes in southern states because they tend to have higher rates of diabetes than other states in the U.S. Although I expected to see a pattern between latitude and diabetes prevalence, the points did not show a clear trend. This is likely because there is little variation in latitude across the three states (roughly 30-35° N), which suggests that other factors, such as access to healthcare or socioeconomic status, have more influence on diabetes prevalence.
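The lack of a visible trend could also be checked numerically with a correlation coefficient. The sketch below runs on made-up values, so it says nothing about the real data; it only shows the calls that could be applied to `latlong_filtered$lat` and `latlong_filtered$Data_Value`.

```r
# Made-up latitudes and prevalence values; NA marks a missing estimate.
lat <- c(30.2, 31.0, 32.4, 33.5, 34.7)
prevalence <- c(14, NA, 12, 16, 13)

# Spread of latitude in degrees; a narrow span leaves little room for a trend.
lat_span <- diff(range(lat))

# Pearson correlation, skipping rows where either value is missing.
r <- cor(lat, prevalence, use = "complete.obs")
r
```

A value of `r` close to 0 would be consistent with the scatter plot showing no clear trend.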

Additionally, the map shows that most locations fall between 8% and 25% diabetes prevalence. Larger circles represent higher prevalence. Larger cities, especially in Louisiana and Mississippi, appear to have a higher prevalence of diabetes. The map shows diabetes to be prevalent across the region, meaning it is an issue throughout these southern states.
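Observations read off the map, such as the overall range and differences between states, could be backed up with a short summary table. This is a sketch on a made-up data frame (`toy` is hypothetical); with the real data, `latlong_filtered` would take its place.

```r
# Made-up prevalence values for two states.
toy <- data.frame(
  StateDesc = c("Alabama", "Alabama", "Louisiana", "Louisiana"),
  Data_Value = c(12, 16, 10, 20)
)

# Lowest and highest prevalence in the subset.
value_range <- range(toy$Data_Value, na.rm = TRUE)

# Mean prevalence per state; the formula interface drops rows with NA.
by_state <- aggregate(Data_Value ~ StateDesc, data = toy, FUN = mean)
by_state
```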