Healthy Cities GIS Assignment
Load the libraries and set the working directory
library(tidyverse)
library(tidyr)
setwd("/Users/thejitharajapakshe/Desktop/DATA 110")
cities500 <- read.csv("500CitiesLocalHealthIndicators.cdc.csv")
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
latlong <- cities500 |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",", convert = TRUE)
Warning: Expected 2 pieces. Missing pieces filled with `NA` in 56 rows [14, 2379, 2565,
3196, 3256, 3447, 3484, 3552, 3590, 4166, 4245, 4522, 5447, 5684, 6345, 6352,
6499, 6560, 7464, 7528, ...].
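The warning above is what `separate()` reports when a value splits into fewer pieces than the number of `into` columns, for example when GeoLocation is an empty string. The following is a minimal sketch on made-up rows (the `toy` data frame and the `missing_coords` name are illustrative, not part of the CDC file) showing the behavior and one way to list the affected rows:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for cities500: the second row has no coordinates
toy <- data.frame(
  CityName = c("A", "B", "C"),
  GeoLocation = c("(39.29, -76.61)", "", "(33.91, -118.35)")
)

toy_latlong <- toy |>
  mutate(GeoLocation = str_replace_all(GeoLocation, "[()]", "")) |>
  # fill = "right" pads the missing piece with NA without raising the warning
  separate(GeoLocation, into = c("lat", "long"), sep = ",",
           convert = TRUE, fill = "right")

# Rows where the split produced no usable coordinates
missing_coords <- toy_latlong |>
  filter(is.na(lat) | is.na(long))
```

Inspecting `missing_coords` before mapping makes it easier to decide whether those rows should be dropped or kept for non-spatial summaries.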
head(latlong)
Year StateAbbr StateDesc CityName GeographicLevel DataSource
1 2017 CA California Hawthorne Census Tract BRFSS
2 2017 CA California Hawthorne City BRFSS
3 2017 CA California Hayward City BRFSS
4 2017 CA California Hayward City BRFSS
5 2017 CA California Hemet City BRFSS
6 2017 CA California Indio Census Tract BRFSS
Category UniqueID
1 Health Outcomes 0632548-06037602504
2 Unhealthy Behaviors 632548
3 Health Outcomes 633000
4 Unhealthy Behaviors 633000
5 Prevention 633182
6 Health Outcomes 0636448-06065045213
Measure Data_Value_Unit
1 Arthritis among adults aged >=18 Years %
2 Current smoking among adults aged >=18 Years %
3 Coronary heart disease among adults aged >=18 Years %
4 Obesity among adults aged >=18 Years %
5 Cholesterol screening among adults aged >=18 Years %
6 Arthritis among adults aged >=18 Years %
DataValueTypeID Data_Value_Type Data_Value Low_Confidence_Limit
1 CrdPrv Crude prevalence 14.6 13.9
2 CrdPrv Crude prevalence 15.4 15.0
3 AgeAdjPrv Age-adjusted prevalence 4.8 4.7
4 CrdPrv Crude prevalence 24.2 24.1
5 AgeAdjPrv Age-adjusted prevalence 78.0 77.6
6 CrdPrv Crude prevalence 22.0 21.1
High_Confidence_Limit Data_Value_Footnote_Symbol Data_Value_Footnote
1 15.2
2 15.9
3 4.8
4 24.4
5 78.3
6 22.8
PopulationCount lat long CategoryID MeasureId CityFIPS TractFIPS
1 4,407 33.90555 -118.3373 HLTHOUT ARTHRITIS 632548 6037602504
2 84,293 33.91467 -118.3477 UNHBEH CSMOKING 632548 NA
3 144,186 37.63296 -122.0771 HLTHOUT CHD 633000 NA
4 144,186 37.63296 -122.0771 UNHBEH OBESITY 633000 NA
5 78,657 33.73523 -116.9946 PREVENT CHOLSCREEN 633182 NA
6 5,006 33.71446 -116.2582 HLTHOUT ARTHRITIS 636448 6065045213
Short_Question_Text
1 Arthritis
2 Current Smoking
3 Coronary Heart Disease
4 Obesity
5 Cholesterol Screening
6 Arthritis
Filter the dataset
Remove the rows whose StateDesc is "United States", select Unhealthy Behaviors as the category of interest, keep only crude prevalence measures, and keep only the year 2017.
latlong_clean <- latlong |>
  filter(StateDesc != "United States") |>
  filter(Category == "Unhealthy Behaviors") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)
head(latlong_clean)
Year StateAbbr StateDesc CityName GeographicLevel DataSource
1 2017 CA California Hawthorne City BRFSS
2 2017 CA California Hayward City BRFSS
3 2017 CA California Lakewood City BRFSS
4 2017 AL Alabama Huntsville Census Tract BRFSS
5 2017 AZ Arizona Avondale Census Tract BRFSS
6 2017 AZ Arizona Chandler City BRFSS
Category UniqueID
1 Unhealthy Behaviors 632548
2 Unhealthy Behaviors 633000
3 Unhealthy Behaviors 639892
4 Unhealthy Behaviors 0137000-01089010612
5 Unhealthy Behaviors 0404720-04013082027
6 Unhealthy Behaviors 412000
Measure
1 Current smoking among adults aged >=18 Years
2 Obesity among adults aged >=18 Years
3 Obesity among adults aged >=18 Years
4 Obesity among adults aged >=18 Years
5 Obesity among adults aged >=18 Years
6 No leisure-time physical activity among adults aged >=18 Years
Data_Value_Unit DataValueTypeID Data_Value_Type Data_Value
1 % CrdPrv Crude prevalence 15.4
2 % CrdPrv Crude prevalence 24.2
3 % CrdPrv Crude prevalence 22.1
4 % CrdPrv Crude prevalence 30.3
5 % CrdPrv Crude prevalence 30.6
6 % CrdPrv Crude prevalence 20.9
Low_Confidence_Limit High_Confidence_Limit Data_Value_Footnote_Symbol
1 15.0 15.9
2 24.1 24.4
3 21.9 22.2
4 29.2 31.5
5 29.6 31.5
6 20.6 21.2
Data_Value_Footnote PopulationCount lat long CategoryID MeasureId
1 84,293 33.91467 -118.34767 UNHBEH CSMOKING
2 144,186 37.63296 -122.07705 UNHBEH OBESITY
3 80,048 33.84705 -118.12220 UNHBEH OBESITY
4 2,654 34.76364 -86.75002 UNHBEH OBESITY
5 3,978 33.45053 -112.29254 UNHBEH OBESITY
6 236,123 33.28319 -111.85221 UNHBEH LPA
CityFIPS TractFIPS Short_Question_Text
1 632548 NA Current Smoking
2 633000 NA Obesity
3 639892 NA Obesity
4 137000 1089010612 Obesity
5 404720 4013082027 Obesity
6 412000 NA Physical Inactivity
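The four chained `filter()` calls used to build `latlong_clean` can also be written as a single call, because `filter()` combines comma-separated conditions with a logical AND. A small sketch on made-up rows (the `toy` data frame is hypothetical) suggests the two forms keep the same rows:

```r
library(dplyr)

# Hypothetical rows that mimic the columns used by the filters
toy <- data.frame(
  StateDesc = c("United States", "Maryland", "Maryland", "Maryland"),
  Category = c("Unhealthy Behaviors", "Unhealthy Behaviors",
               "Prevention", "Unhealthy Behaviors"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence"),
  Year = c(2017, 2017, 2017, 2017)
)

# Chained form, as in the notebook
chained <- toy |>
  filter(StateDesc != "United States") |>
  filter(Category == "Unhealthy Behaviors") |>
  filter(Data_Value_Type == "Crude prevalence") |>
  filter(Year == 2017)

# Single call: every condition must hold for a row to be kept
combined <- toy |>
  filter(
    StateDesc != "United States",
    Category == "Unhealthy Behaviors",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  )

all.equal(chained, combined)
```

Either form works; the single call mainly keeps all the selection criteria visible in one place.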
What variables are included? (can any of them be removed?)
names(latlong_clean)
[1] "Year" "StateAbbr"
[3] "StateDesc" "CityName"
[5] "GeographicLevel" "DataSource"
[7] "Category" "UniqueID"
[9] "Measure" "Data_Value_Unit"
[11] "DataValueTypeID" "Data_Value_Type"
[13] "Data_Value" "Low_Confidence_Limit"
[15] "High_Confidence_Limit" "Data_Value_Footnote_Symbol"
[17] "Data_Value_Footnote" "PopulationCount"
[19] "lat" "long"
[21] "CategoryID" "MeasureId"
[23] "CityFIPS" "TractFIPS"
[25] "Short_Question_Text"
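One way to approach the question above is to look for columns that hold a single value in every row, since they carry no information within the filtered subset. The sketch below uses a made-up data frame (`toy` and its values are illustrative, with column names borrowed from the listing above); whether a constant column is actually safe to drop remains a judgment call.

```r
# Hypothetical subset: three of the four columns never vary
toy <- data.frame(
  Year = c(2017, 2017, 2017),
  Data_Value_Unit = c("%", "%", "%"),
  Data_Value = c(15.4, 24.2, 22.1),
  Data_Value_Footnote = c("", "", "")
)

# Count distinct values per column
n_unique <- sapply(toy, function(col) length(unique(col)))

# Columns with exactly one distinct value are candidates for removal
constant_cols <- names(n_unique)[n_unique == 1]
constant_cols
```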
Remove the variables that will not be used in the assignment
prevention <- latlong_clean |>
  select(-DataSource, -Data_Value_Unit, -DataValueTypeID, -Low_Confidence_Limit,
         -High_Confidence_Limit, -Data_Value_Footnote_Symbol, -Data_Value_Footnote)
head(prevention)
Year StateAbbr StateDesc CityName GeographicLevel Category
1 2017 CA California Hawthorne City Unhealthy Behaviors
2 2017 CA California Hayward City Unhealthy Behaviors
3 2017 CA California Lakewood City Unhealthy Behaviors
4 2017 AL Alabama Huntsville Census Tract Unhealthy Behaviors
5 2017 AZ Arizona Avondale Census Tract Unhealthy Behaviors
6 2017 AZ Arizona Chandler City Unhealthy Behaviors
UniqueID
1 632548
2 633000
3 639892
4 0137000-01089010612
5 0404720-04013082027
6 412000
Measure
1 Current smoking among adults aged >=18 Years
2 Obesity among adults aged >=18 Years
3 Obesity among adults aged >=18 Years
4 Obesity among adults aged >=18 Years
5 Obesity among adults aged >=18 Years
6 No leisure-time physical activity among adults aged >=18 Years
Data_Value_Type Data_Value PopulationCount lat long CategoryID
1 Crude prevalence 15.4 84,293 33.91467 -118.34767 UNHBEH
2 Crude prevalence 24.2 144,186 37.63296 -122.07705 UNHBEH
3 Crude prevalence 22.1 80,048 33.84705 -118.12220 UNHBEH
4 Crude prevalence 30.3 2,654 34.76364 -86.75002 UNHBEH
5 Crude prevalence 30.6 3,978 33.45053 -112.29254 UNHBEH
6 Crude prevalence 20.9 236,123 33.28319 -111.85221 UNHBEH
MeasureId CityFIPS TractFIPS Short_Question_Text
1 CSMOKING 632548 NA Current Smoking
2 OBESITY 633000 NA Obesity
3 OBESITY 639892 NA Obesity
4 OBESITY 137000 1089010612 Obesity
5 OBESITY 404720 4013082027 Obesity
6 LPA 412000 NA Physical Inactivity
md <- prevention |>
  filter(StateAbbr %in% c("MD"))
head(md)
Year StateAbbr StateDesc CityName GeographicLevel Category
1 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
2 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
3 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
4 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
5 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
6 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
UniqueID
1 2404000-24510120300
2 2404000-24510151300
3 2404000-24510280301
4 2404000-24510080800
5 2404000-24510260604
6 2404000-24510230100
Measure
1 Current smoking among adults aged >=18 Years
2 No leisure-time physical activity among adults aged >=18 Years
3 Obesity among adults aged >=18 Years
4 No leisure-time physical activity among adults aged >=18 Years
5 Binge drinking among adults aged >=18 Years
6 Current smoking among adults aged >=18 Years
Data_Value_Type Data_Value PopulationCount lat long CategoryID
1 Crude prevalence 20.0 3,552 39.31995 -76.61249 UNHBEH
2 Crude prevalence 42.5 4,546 39.33785 -76.66619 UNHBEH
3 Crude prevalence 42.3 4,101 39.31032 -76.70164 UNHBEH
4 Crude prevalence 38.9 1,281 39.30370 -76.59305 UNHBEH
5 Crude prevalence 15.0 1,465 39.27868 -76.53884 UNHBEH
6 Crude prevalence 18.3 1,953 39.27581 -76.61707 UNHBEH
MeasureId CityFIPS TractFIPS Short_Question_Text
1 CSMOKING 2404000 24510120300 Current Smoking
2 LPA 2404000 24510151300 Physical Inactivity
3 OBESITY 2404000 24510280301 Obesity
4 LPA 2404000 24510080800 Physical Inactivity
5 BINGE 2404000 24510260604 Binge Drinking
6 CSMOKING 2404000 24510230100 Current Smoking
The new dataset "prevention" is now a manageable size. Despite its name, it holds the Unhealthy Behaviors category selected in the filter step.
For your assignment, work with the cleaned “Prevention” dataset
1. Once you run the above code, filter this dataset one more time for any particular subset.
Filter chunk here
unique(latlong_clean$StateAbbr)
[1] "CA" "AL" "AZ" "FL" "CO" "CT" "IL" "IN" "KS" "GA" "ID" "LA" "ME" "MA" "MI"
[16] "MN" "MO" "NV" "NJ" "NY" "PA" "NC" "ND" "OH" "OK" "OR" "TX" "RI" "SC" "SD"
[31] "TN" "UT" "VT" "VA" "WA" "WI" "WY" "AK" "AR" "DE" "DC" "HI" "IA" "KY" "MD"
[46] "NM" "NH" "MS" "NE" "MT" "WV"
unique(prevention$Measure)
[1] "Current smoking among adults aged >=18 Years"
[2] "Obesity among adults aged >=18 Years"
[3] "No leisure-time physical activity among adults aged >=18 Years"
[4] "Binge drinking among adults aged >=18 Years"
mdclean <- md %>% select(-CityFIPS,-CategoryID,-MeasureId,-Short_Question_Text)
smoking_adults <- mdclean |>
  filter(Measure == "Current smoking among adults aged >=18 Years")
head(smoking_adults)
Year StateAbbr StateDesc CityName GeographicLevel Category
1 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
2 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
3 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
4 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
5 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
6 2017 MD Maryland Baltimore Census Tract Unhealthy Behaviors
UniqueID Measure
1 2404000-24510120300 Current smoking among adults aged >=18 Years
2 2404000-24510230100 Current smoking among adults aged >=18 Years
3 2404000-24510250500 Current smoking among adults aged >=18 Years
4 2404000-24510180200 Current smoking among adults aged >=18 Years
5 2404000-24510250103 Current smoking among adults aged >=18 Years
6 2404000-24510160300 Current smoking among adults aged >=18 Years
Data_Value_Type Data_Value PopulationCount lat long TractFIPS
1 Crude prevalence 20.0 3,552 39.31995 -76.61249 24510120300
2 Crude prevalence 18.3 1,953 39.27581 -76.61707 24510230100
3 Crude prevalence 30.3 5,468 39.21536 -76.56698 24510250500
4 Crude prevalence 28.9 977 39.29147 -76.63614 24510180200
5 Crude prevalence 23.8 4,050 39.26870 -76.67766 24510250103
6 Crude prevalence 30.1 1,558 39.29858 -76.64441 24510160300
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
smoking_adults$PopulationCount <- as.numeric(gsub(",", "", smoking_adults$PopulationCount))
total_population <- sum(smoking_adults$PopulationCount, na.rm = TRUE)
smoking_adults_wpercent <- smoking_adults %>%
  mutate(PopulationPercentage = (PopulationCount / total_population) * 100)
ggplot(smoking_adults_wpercent, aes(x = Data_Value, y = lat)) +
  geom_point() +
  labs(title = "Smoking Population According to Latitudes in Maryland",
       x = "Smoking Prevalence (%)",
       y = "Latitude") +
  theme_minimal()
Warning: Removed 1 rows containing missing values (`geom_point()`).
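The population share computed before the plot depends on first turning comma-formatted strings such as "3,552" into numbers. A minimal base R sketch of that step, using a few made-up counts in the same style (`counts_chr` is a hypothetical vector, and the `NA` stands in for a missing count):

```r
# Hypothetical counts formatted like the PopulationCount column
counts_chr <- c("3,552", "1,953", "977", NA)

# Strip the thousands separators, then convert to numeric
counts <- as.numeric(gsub(",", "", counts_chr))

# Each location's share of the total, ignoring missing counts in the sum
share <- counts / sum(counts, na.rm = TRUE) * 100

round(share, 1)
```

A missing count stays `NA` in `share`, so any plot or map that uses the share will skip that row rather than treat it as zero.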
3. Now create a map of your subsetted dataset.
First map chunk here
library(leaflet)
leaflet() |>
  setView(lng = -76.6122, lat = 39.2904, zoom = 10.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = smoking_adults_wpercent,
    radius = smoking_adults_wpercent$PopulationPercentage
  )
Assuming "long" and "lat" are longitude and latitude, respectively
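In leaflet, `addCircles()` interprets `radius` in meters, so percentage values that are small (as they would be when many locations share the total) can produce circles that are hard to see at city scale. The sketch below is one possible adjustment, not the notebook's method: the `pts` data frame holds illustrative values, and `meters_per_percent` is an arbitrary, assumed scaling factor. Naming the `lng` and `lat` columns explicitly also avoids the "Assuming ..." message.

```r
library(leaflet)

# Hypothetical points; the share values are illustrative only
pts <- data.frame(
  lat = c(39.31995, 39.27581),
  long = c(-76.61249, -76.61707),
  PopulationPercentage = c(0.6, 0.3)
)

# Assumed scaling: 500 m of radius per percentage point (arbitrary choice)
meters_per_percent <- 500

m <- leaflet(pts) |>
  setView(lng = -76.6122, lat = 39.2904, zoom = 10.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~long, lat = ~lat,
    radius = ~PopulationPercentage * meters_per_percent
  )
```

If circles should keep the same on-screen size at every zoom level, `addCircleMarkers()` takes its radius in pixels instead.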
4. Refine your map to include a mouseover tooltip
Refined map chunk here
popupsmoke <- paste0(
  "<b>City: </b>", smoking_adults_wpercent$CityName, "<br>",
  "<b>State: </b>", smoking_adults_wpercent$StateDesc, "<br>",
  "<b>Percentage of Population: </b>", smoking_adults_wpercent$PopulationPercentage, "<br>")
leaflet() |>
  setView(lng = -76.6122, lat = 39.2904, zoom = 10.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = smoking_adults_wpercent,
    radius = smoking_adults_wpercent$PopulationCount / 100,
    color = "#f2079c",
    fillColor = "#93609f",
    popup = popupsmoke
  )
Assuming "long" and "lat" are longitude and latitude, respectively
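In leaflet, content passed to `popup` opens when a circle is clicked, while content passed to `label` appears on hover. A hedged sketch of a hover tooltip follows; it is one possible variant rather than the notebook's own code. The two rows in `pts` reuse values printed by `head(smoking_adults)`, and the `hover_text` name and the choice of fields shown are assumptions for illustration.

```r
library(leaflet)
library(htmltools)

# Two rows copied from the head(smoking_adults) output
pts <- data.frame(
  CityName = c("Baltimore", "Baltimore"),
  lat = c(39.31995, 39.27581),
  long = c(-76.61249, -76.61707),
  Data_Value = c(20.0, 18.3),
  PopulationCount = c(3552, 1953)
)

# One HTML string per location; %% prints a literal percent sign
hover_text <- sprintf(
  "<b>City: </b>%s<br><b>Smoking prevalence: </b>%.1f%%<br><b>Population: </b>%s",
  pts$CityName, pts$Data_Value, format(pts$PopulationCount, big.mark = ",")
)

m <- leaflet(pts) |>
  setView(lng = -76.6122, lat = 39.2904, zoom = 10.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    lng = ~long, lat = ~lat,
    radius = ~PopulationCount / 100,
    color = "#f2079c",
    fillColor = "#93609f",
    # Wrapping each string in HTML() makes the tags render instead of showing as text
    label = lapply(hover_text, HTML)
  )
```

Formatting the numbers with `sprintf()` and `format()` keeps the tooltip short, instead of printing many decimal places.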
5. Write a paragraph
In a paragraph, describe the plots you created and what they show.
In this analysis, I examined the crude prevalence of current smoking among adults in Maryland, using the 2017 records from the 500 Cities dataset. Using ggplot2, I drew a scatter plot of smoking prevalence (x-axis) against latitude (y-axis) to see whether prevalence changes from south to north across the recorded locations. I then built an interactive map with the leaflet package, centered on Baltimore, in which each circle marks one location. The circle radius reflects that location's population (its share of the total population in the first map, and its population count divided by 100 in the refined map), not its smoking rate. The refined map adds a popup to each circle that shows the city, the state, and the location's percentage of the total population. Together, the plot and the maps show where the surveyed locations are and how their smoking prevalence and population sizes vary, which is a starting point for identifying areas where public health interventions to reduce smoking could be targeted.