Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse)
library(leaflet)
library(knitr)
library(webshot2)
setwd("~/Desktop/Data 110")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
head(latlong)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
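As a minimal sketch of what the split does, the same two steps can be run on a small made-up tibble. The object `toy` and its coordinate values are invented for illustration and do not come from the CDC file.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical stand-in for the GeoLocation column (values are made up)
toy <- tibble(GeoLocation = c("(33.9, -118.3)", "(41.2, -73.2)"))

toy_split <- toy |>
  # Drop the surrounding parentheses
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # Split on the comma; convert = TRUE turns the two pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)

toy_split
```

Because `convert = TRUE` is set, `lat` and `long` come out as numeric columns rather than character strings, which is what the map functions need later.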
Filter the dataset
Remove the rows where StateDesc is United States, keep only crude prevalence measures from 2017, and restrict the data to Connecticut and the Unhealthy Behaviors category.
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateAbbr == "CT") |>
  filter(Category == "Unhealthy Behaviors")
head(latlong_clean)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CT Connecticut Bridgeport Census Tract BRFSS Unhealthy B…
2 2017 CT Connecticut Danbury City BRFSS Unhealthy B…
3 2017 CT Connecticut Norwalk Census Tract BRFSS Unhealthy B…
4 2017 CT Connecticut Bridgeport Census Tract BRFSS Unhealthy B…
5 2017 CT Connecticut Hartford Census Tract BRFSS Unhealthy B…
6 2017 CT Connecticut Waterbury Census Tract BRFSS Unhealthy B…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new dataset "latlong_clean" is now a manageable size.
For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.
1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.
Filter chunk here (you may need multiple chunks)
Checking which states are present
unique(latlong$StateDesc)
 [1] "California" "United States" "Alabama" "Alaska"
[5] "Arizona" "Arkansas" "Connecticut" "Delaware"
[9] "District of C" "Florida" "Colorado" "Illinois"
[13] "Indiana" "Iowa" "Kansas" "Georgia"
[17] "Idaho" "Kentucky" "Louisiana" "Maine"
[21] "Massachusetts" "Michigan" "Minnesota" "Mississippi"
[25] "Missouri" "Montana" "Nebraska" "New York"
[29] "Nevada" "New Hampshire" "New Jersey" "New Mexico"
[33] "Pennsylvania" "North Carolin" "North Dakota" "Ohio"
[37] "Oklahoma" "Oregon" "Texas" "Rhode Island"
[41] "South Carolin" "South Dakota" "Tennessee" "Utah"
[45] "Vermont" "Virginia" "Washington" "West Virginia"
[49] "Wisconsin" "Wyoming" "Hawaii" "Maryland"
Filtering for diabetes in Southern States
latlong_2 <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017) |>
  filter(StateDesc %in% c("Alabama", "Louisiana", "Mississippi")) |>
  filter(MeasureId == "DIABETES")
head(latlong_2)
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Huntsville Census Tract BRFSS Health Outc…
2 2017 LA Louisiana Lafayette City BRFSS Health Outc…
3 2017 MS Mississippi Gulfport City BRFSS Health Outc…
4 2017 AL Alabama Huntsville Census Tract BRFSS Health Outc…
5 2017 AL Alabama Birmingham Census Tract BRFSS Health Outc…
6 2017 AL Alabama Birmingham Census Tract BRFSS Health Outc…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
Removing unused columns
latlong_filtered <- latlong_2 |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit, -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(latlong_filtered)
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 AL Alabama Huntsvi… Census Tract Health … 0137000… Diagno…
2 2017 LA Louisiana Lafayet… City Health … 2240735 Diagno…
3 2017 MS Mississippi Gulfport City Health … 2829700 Diagno…
4 2017 AL Alabama Huntsvi… Census Tract Health … 0137000… Diagno…
5 2017 AL Alabama Birming… Census Tract Health … 0107000… Diagno…
6 2017 AL Alabama Birming… Census Tract Health … 0107000… Diagno…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
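The assignment caps the subset at 900 observations, so it can help to confirm the row count after filtering. The sketch below shows one way to do that on a made-up stand-in tibble. The object `toy` and its rows are hypothetical; on the real data the same check would be run on `latlong_filtered`.

```r
library(dplyr)

# Hypothetical stand-in for latlong (rows and values are made up)
toy <- tibble(
  StateDesc = c("Alabama", "Louisiana", "Texas", "Mississippi"),
  MeasureId = c("DIABETES", "DIABETES", "DIABETES", "OBESITY"),
  Year      = c(2017, 2017, 2017, 2017)
)

toy_subset <- toy |>
  filter(StateDesc %in% c("Alabama", "Louisiana", "Mississippi"),
         MeasureId == "DIABETES",
         Year == 2017)

# Confirm the subset stays within the assignment's 900-row limit
n_rows <- nrow(toy_subset)
n_rows
stopifnot(n_rows <= 900)
```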
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
Plot 1: Geographic Location vs Diabetes Prevalence
ggplot(latlong_filtered, aes(x = lat, y = Data_Value, color = StateDesc)) +
  labs(x = "Latitude",
       y = "Diabetes Prevalence (%)",
       title = "Geographic Location vs Diabetes Prevalence in 2017") +
  theme_minimal(base_size = 14, base_family = "serif") +
  geom_jitter(alpha = 0.5) +
  scale_color_brewer(name = "State", palette = "Set1")
Warning: Removed 32 rows containing missing values or values outside the scale range
(`geom_point()`).
The graph does not show a clear correlation or trend. I expected this, since all three states are in the South and share similar latitudes. However, the graph still gives some insight into diabetes prevalence and latitude in southern states. I will expand on this in question 5.
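Two quick checks can back up this reading of the plot: counting the missing values behind the warning, and computing a correlation coefficient. The sketch below uses a made-up data frame, so `toy` and its numbers are hypothetical. On the real data the same calls would be applied to `latlong_filtered`.

```r
# Hypothetical stand-in for latlong_filtered (values are made up)
toy <- data.frame(
  lat        = c(30.4, 32.3, 33.5, 34.7, 30.2),
  Data_Value = c(14.2, NA, 16.8, 11.5, 13.0)
)

# Count missing values per column; rows with an NA are the ones ggplot drops
na_counts <- colSums(is.na(toy))
na_counts

# Pearson correlation using only rows where both values are present
r <- cor(toy$lat, toy$Data_Value, use = "complete.obs")
r
```

A coefficient near zero would be consistent with the lack of a visible trend, although with such a narrow latitude range it says little about other regions.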
3. Now create a map of your subsetted dataset.
Mean of latitude and longitude of southern states
south_long <- mean(latlong_filtered$long)
south_lat <- mean(latlong_filtered$lat)
Creating the map
leaflet() |>
  setView(lng = south_long, lat = south_lat, zoom = 4.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = latlong_filtered,
    radius = latlong_filtered$Data_Value * 400, # played around with different values
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.95
  )
Assuming "long" and "lat" are longitude and latitude, respectively
Adding a pink fill and a black outline to the circles
leaflet() |>
  setView(lng = south_long, lat = south_lat, zoom = 5.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = latlong_filtered,
    radius = latlong_filtered$Data_Value * 400,
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.9
  )
Assuming "long" and "lat" are longitude and latitude, respectively
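The "Assuming ..." message appears because leaflet is guessing which columns hold the coordinates. A minimal sketch of naming them explicitly is shown here on a made-up data frame; `toy` and its values are hypothetical. The radius in `addCircles()` is given in meters, so a prevalence of 15 multiplied by 400 draws a circle with a 6,000 m radius.

```r
library(leaflet)

# Hypothetical stand-in for latlong_filtered (values are made up)
toy <- data.frame(
  lat        = c(33.5, 30.2),
  long       = c(-86.8, -92.0),
  Data_Value = c(15.0, 12.5)
)

m <- leaflet(toy) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~long,                 # name the coordinate columns explicitly
    lat = ~lat,
    radius = ~Data_Value * 400,  # radius is in meters
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.9
  )
m
```

Passing the data frame to `leaflet()` lets the formulas (`~long`, `~lat`, `~Data_Value`) refer to its columns, which avoids repeating `toy$` inside the layer call.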
4. Refine your map to include a mouse-click tooltip
Refined map chunk here
Making the tooltip
popupdiabetes <- paste0(
  "<b>City: </b>", latlong_filtered$CityName, "<br>",
  "<b>State: </b>", latlong_filtered$StateDesc, "<br>",
  "<b>Diabetes percentage: </b>", round(latlong_filtered$Data_Value, 1), "%<br>", # rounding learnt in Data 101
  "<b>Population: </b>", latlong_filtered$PopulationCount, "<br>")
Adding the tooltip to the map
leaflet() |>
  setView(lng = south_long, lat = south_lat, zoom = 5.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~long,
    lat = ~lat,
    data = latlong_filtered,
    radius = latlong_filtered$Data_Value * 400,
    color = "#14010d",
    fillColor = "#f2079c",
    fillOpacity = 0.9,
    popup = popupdiabetes
  )
5. Write a paragraph
In a paragraph, describe the plots you created and the insights they show.
The first plot (a scatter plot) shows diabetes prevalence against latitude for three southern states (Alabama, Mississippi and Louisiana). I chose to investigate diabetes in southern states because they tend to have higher rates of diabetes than other U.S. states. Although I expected to see a pattern between latitude and diabetes, that was not the case: the points did not show a clear trend. This is likely because there is little variation in latitude among the three states (30-35 N), which suggests that other factors, for example access to healthcare or socioeconomic status, affect diabetes prevalence.
Additionally, the map shows that most locations cluster around 8-25% diabetes prevalence. The larger circles in each cluster represent higher prevalence. Larger cities, especially in Louisiana and Mississippi, have higher diabetes prevalence. The map shows diabetes to be prevalent throughout the region, meaning it is an issue across the southern states.
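One way to check the state-level comparison numerically is to summarise prevalence by state. The sketch below does this on a made-up tibble, so `toy` and its values are hypothetical and say nothing about the real results. On the real data the same pipeline would start from `latlong_filtered`.

```r
library(dplyr)

# Hypothetical stand-in for latlong_filtered (values are made up)
toy <- tibble(
  StateDesc  = c("Alabama", "Alabama", "Louisiana", "Louisiana", "Mississippi"),
  Data_Value = c(12.0, 16.0, 15.0, NA, 14.0)
)

state_summary <- toy |>
  group_by(StateDesc) |>
  summarise(
    n_obs     = sum(!is.na(Data_Value)),       # non-missing observations
    mean_prev = mean(Data_Value, na.rm = TRUE) # mean prevalence, ignoring NA
  )

state_summary
```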