── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidyr)
library(leaflet)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Rows: 810103 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): StateAbbr, StateDesc, CityName, GeographicLevel, DataSource, Categ...
dbl (6): Year, Data_Value, Low_Confidence_Limit, High_Confidence_Limit, Cit...
num (1): PopulationCount
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
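The split described above can be sketched as follows. This is only an illustration under stated assumptions: the toy rows and their coordinates are made up, and the original notebook may have done the split differently. Only the "(lat, long)" string format and the column names GeoLocation, lat, and long come from the text.

```r
library(dplyr)
library(tidyr)

# Toy rows standing in for the real data (coordinates are made-up examples)
toy <- tibble(
  CityName = c("Hawthorne", "Hayward"),
  GeoLocation = c("(33.9147, -118.3476)", "(37.6329, -122.0772)")
)

toy_split <- toy |>
  # Strip the surrounding parentheses so only "lat, long" remains
  mutate(GeoLocation = gsub("[()]", "", GeoLocation)) |>
  # Split on the comma (plus optional whitespace); convert = TRUE turns the pieces into numbers
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE)
```

After this step, lat and long are numeric columns that can be passed directly to ggplot2 or leaflet.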
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
Filter the dataset
Remove the rows where StateDesc is United States, keep only the Prevention category (the category of interest), keep only the crude prevalence measurements, and keep only the year 2017.
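A minimal sketch of these four filters is shown here on toy data. The column names (StateDesc, Category, Data_Value_Type, Year) appear in the dataset printout, but the exact value strings, in particular "Crude prevalence", and the object name prevention_sketch are assumptions for illustration and should be checked against the real data.

```r
library(dplyr)

# Toy rows standing in for the real data (values are illustrative assumptions)
toy <- tibble(
  Year = c(2017, 2017, 2016, 2017),
  StateDesc = c("California", "United States", "California", "California"),
  Category = c("Prevention", "Prevention", "Prevention", "Health Outcomes"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence")
)

prevention_sketch <- toy |>
  filter(StateDesc != "United States") |>        # drop the national summary rows
  filter(Category == "Prevention") |>            # keep the category of interest
  filter(Data_Value_Type == "Crude prevalence") |> # keep crude prevalence only
  filter(Year == 2017)                           # keep 2017 only
```

Only the first toy row satisfies all four conditions, so prevention_sketch has a single row.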
# Filtering cholesterol screening among adults aged >= 18 in measure category
cholesterol_screening_subset <- prevention |>
  filter(Measure == "Cholesterol screening among adults aged >=18 Years")
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
# Create a scatter plot for cholesterol screening rates by city
ggplot(cholesterol_screening_subset, aes(x = long, y = lat, color = Data_Value)) +
  geom_point() +
  labs(title = "Cholesterol Screening Prevalence by City",
       x = "Longitude",
       y = "Latitude",
       color = "Prevalence Rate (%)") +
  theme_minimal()
3. Now create a map of your subsetted dataset.
First map chunk here
# Clean new data
cholesterol_screening_subset$long <- as.numeric(cholesterol_screening_subset$long)
cholesterol_screening_subset$lat <- as.numeric(cholesterol_screening_subset$lat)
cholesterol_screening_subset <- na.omit(cholesterol_screening_subset)
world_map <- map_data("world")
ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
               fill = "lightgrey", color = "white") +
  geom_point(data = cholesterol_screening_subset,
             aes(x = long, y = lat, color = Data_Value), size = 2) +
  scale_color_gradient(low = "purple", high = "orange") +
  labs(title = "Map of Cholesterol Screening Prevalence",
       x = "Longitude", y = "Latitude",
       color = "Prevalence Rate (%)") +
  xlim(min(cholesterol_screening_subset$long, na.rm = TRUE) - 5,
       max(cholesterol_screening_subset$long, na.rm = TRUE) + 5) +
  ylim(min(cholesterol_screening_subset$lat, na.rm = TRUE) - 5,
       max(cholesterol_screening_subset$lat, na.rm = TRUE) + 5) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  coord_fixed(1.3)
In a paragraph, describe the plots you created and what they show.
The first map, built from the “Prevention” subset, shows cholesterol screening among adults aged >= 18 across various locations. It uses a purple-to-orange color gradient to indicate screening prevalence, with purple marking lower percentages and orange marking higher ones. The second map is interactive and was created with Leaflet, which makes it the more useful of the two: when you click a city, a pop-up shows the city name and its screening rate as a percentage. Both maps are still useful in their own ways, since they help identify areas with different levels of cholesterol health interventions and make it easier to see where improvement is needed.
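The interactive map described above is not shown in the output, so the following is only a sketch of how such a Leaflet map could be built, not the code that was actually run. The toy rows, their coordinates and rates, and the pop-up wording are made-up assumptions; the column names CityName, lat, long, and Data_Value come from the dataset.

```r
library(leaflet)

# Toy rows standing in for cholesterol_screening_subset (all values are made up)
toy <- data.frame(
  CityName = c("Hawthorne", "Hayward"),
  lat = c(33.91, 37.63),
  long = c(-118.35, -122.08),
  Data_Value = c(70.1, 75.3)
)

m <- leaflet(toy) |>
  addTiles() |>                       # default OpenStreetMap base layer
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    radius = 5,
    # Pop-up text shown on click: city name and screening rate
    popup = ~paste0(CityName, ": ", Data_Value, "%")
  )
```

Printing m in an interactive R session or a rendered notebook displays the map, and clicking a marker opens its pop-up.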