Warning: package 'tidyverse' was built under R version 4.4.1
Warning: package 'dplyr' was built under R version 4.4.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(leaflet)
Warning: package 'leaflet' was built under R version 4.4.1
library(sf)
Warning: package 'sf' was built under R version 4.4.1
Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(knitr)
setwd("C:/Users/naomi/OneDrive/Desktop/Desktop of 11-08-2022/Community College Classes/DATA 110/Submitted Assignments/GIS Assignment")
cities500 <- read_csv("500CitiesLocalHealthIndicators.cdc.csv")
Rows: 810103 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): StateAbbr, StateDesc, CityName, GeographicLevel, DataSource, Categ...
dbl (6): Year, Data_Value, Low_Confidence_Limit, High_Confidence_Limit, Cit...
num (1): PopulationCount
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(cities500)
Year StateAbbr StateDesc CityName
Min. :2016 Length:810103 Length:810103 Length:810103
1st Qu.:2016 Class :character Class :character Class :character
Median :2017 Mode :character Mode :character Mode :character
Mean :2017
3rd Qu.:2017
Max. :2017
GeographicLevel DataSource Category UniqueID
Length:810103 Length:810103 Length:810103 Length:810103
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Measure Data_Value_Unit DataValueTypeID Data_Value_Type
Length:810103 Length:810103 Length:810103 Length:810103
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Data_Value Low_Confidence_Limit High_Confidence_Limit
Min. : 0.3 Min. : 0.2 Min. : 0.30
1st Qu.:10.0 1st Qu.: 8.9 1st Qu.:11.20
Median :23.0 Median :20.8 Median :25.20
Mean :31.4 Mean :29.7 Mean :33.11
3rd Qu.:46.0 3rd Qu.:43.2 3rd Qu.:49.20
Max. :95.7 Max. :94.6 Max. :96.50
NA's :22792 NA's :22792 NA's :22792
Data_Value_Footnote_Symbol Data_Value_Footnote PopulationCount
Length:810103 Length:810103 Min. : 1
Class :character Class :character 1st Qu.: 2405
Mode :character Mode :character Median : 3632
Mean : 32024
3rd Qu.: 5040
Max. :308745538
GeoLocation CategoryID MeasureId CityFIPS
Length:810103 Length:810103 Length:810103 Min. : 15003
Class :character Class :character Class :character 1st Qu.: 681344
Mode :character Mode :character Mode :character Median :2622000
Mean :2606307
3rd Qu.:4055000
Max. :5613900
NA's :56
TractFIPS Short_Question_Text
Min. :1.073e+09 Length:810103
1st Qu.:8.001e+09 Class :character
Median :2.608e+10 Mode :character
Mean :2.593e+10
3rd Qu.:4.011e+10
Max. :5.602e+10
NA's :28056
## Filter the dataset
Remove the rows where StateDesc is “United States”, select **Prevention** as the category of interest, keep only measures reported as **crude prevalence**, and keep only **2017**.
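The code for this step is not shown in the rendered output, so the following is only a minimal sketch of how the four filters described above could be written with dplyr. It runs on a small made-up table (`toy`) rather than the real `cities500` data; the object name `prevention_sketch`, the toy values, and the step that splits `GeoLocation` into `lat` and `long` are assumptions based on the column names that appear in the summary below.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for cities500 (illustrative values only, not real CDC data)
toy <- tibble::tibble(
  Year            = c(2017, 2017, 2016, 2017),
  StateAbbr       = c("MD", "US", "MD", "MD"),
  StateDesc       = c("Maryland", "United States", "Maryland", "Maryland"),
  Category        = c("Prevention", "Prevention", "Prevention", "Unhealthy Behaviors"),
  Data_Value_Type = rep("Crude prevalence", 4),
  Data_Value      = c(75.2, 70.1, 74.0, 20.3),
  GeoLocation     = c("(39.29, -76.61)", NA, "(39.29, -76.61)", "(39.29, -76.61)")
)

prevention_sketch <- toy |>
  # The four filters described in the text
  filter(
    StateDesc != "United States",
    Category == "Prevention",
    Data_Value_Type == "Crude prevalence",
    Year == 2017
  ) |>
  # Assumed step: strip the parentheses and split "(lat, long)" into two numeric columns
  mutate(GeoLocation = str_remove_all(GeoLocation, "[()]")) |>
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*", convert = TRUE) |>
  # Lowercase the column names, as in the summary output that follows
  rename_with(tolower)

prevention_sketch
```

On the real data, the same pipeline would start from `cities500` instead of `toy`.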
year stateabbr statedesc geographiclevel
Min. :2017 Length:56008 Length:56008 Length:56008
1st Qu.:2017 Class :character Class :character Class :character
Median :2017 Mode :character Mode :character Mode :character
Mean :2017
3rd Qu.:2017
Max. :2017
category data_value_type data_value populationcount
Length:56008 Length:56008 Min. : 9.60 Min. : 1
Class :character Class :character 1st Qu.:70.40 1st Qu.: 2349
Mode :character Mode :character Median :75.70 Median : 3548
Mean :74.74 Mean : 3679
3rd Qu.:80.30 3rd Qu.: 4849
Max. :95.70 Max. :28960
NA's :1588
lat long short_question_text data_value_category
Min. :21.26 Min. :-158.21 Length:56008 Length:56008
1st Qu.:33.58 1st Qu.:-114.63 Class :character Class :character
Median :37.33 Median : -93.29 Mode :character Mode :character
Mean :36.91 Mean : -96.03
3rd Qu.:40.75 3rd Qu.: -81.40
Max. :61.34 Max. : -70.17
The new dataset, “Prevention”, is now a manageable size: 56,008 rows, down from 810,103.
For your assignment, work with the cleaned “Prevention” dataset.
1. Once you run the above code, filter this dataset one more time for any particular subset.
summary(prevention)
year stateabbr statedesc geographiclevel
Min. :2017 Length:56008 Length:56008 Length:56008
1st Qu.:2017 Class :character Class :character Class :character
Median :2017 Mode :character Mode :character Mode :character
Mean :2017
3rd Qu.:2017
Max. :2017
category data_value_type data_value populationcount
Length:56008 Length:56008 Min. : 9.60 Min. : 1
Class :character Class :character 1st Qu.:70.40 1st Qu.: 2349
Mode :character Mode :character Median :75.70 Median : 3548
Mean :74.74 Mean : 3679
3rd Qu.:80.30 3rd Qu.: 4849
Max. :95.70 Max. :28960
NA's :1588
lat long short_question_text data_value_category
Min. :21.26 Min. :-158.21 Length:56008 Length:56008
1st Qu.:33.58 1st Qu.:-114.63 Class :character Class :character
Median :37.33 Median : -93.29 Mode :character Mode :character
Mean :36.91 Mean : -96.03
3rd Qu.:40.75 3rd Qu.: -81.40
Max. :61.34 Max. : -70.17
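The summary above still reports 56,008 rows, the same as before, which suggests that no further subset has been applied at this point. Because the plot title and the discussion refer to Maryland and to two measures, one possible subset is sketched below. The toy table, the object name `md_subset`, and the extra "Annual Checkup" label are assumptions for illustration; the column names come from the summary.

```r
library(dplyr)

# Toy stand-in for the cleaned prevention data (illustrative values only)
prevention_toy <- tibble::tibble(
  stateabbr = c("MD", "MD", "MD", "VA"),
  short_question_text = c(
    "Cholesterol Screening", "Taking BP Medication",
    "Annual Checkup", "Cholesterol Screening"
  ),
  data_value = c(78.1, 71.4, 74.9, 76.0)
)

# Keep only Maryland rows for the two measures discussed in the text
md_subset <- prevention_toy |>
  filter(
    stateabbr == "MD",
    short_question_text %in% c("Cholesterol Screening", "Taking BP Medication")
  )

md_subset
```

Running `summary()` or `nrow()` on the result is a quick way to confirm that the row count dropped as expected.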
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
ggplot(prevention, aes(x = data_value, fill = short_question_text)) +
  geom_density(alpha = 0.5) +
  labs(
    title = "Density Plot of Cholesterol Screening and Taking BP Medication in Maryland",
    x = "Data Value (%)",
    y = "Density",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "bottom",
    legend.title = element_text(face = "bold")
  )
Warning: Removed 1588 rows containing non-finite outside the scale range
(`stat_density()`).
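The 1588 rows removed in this warning match the number of NA values reported for data_value in the summary, so the warning most likely comes from missing values rather than from a plotting problem. A minimal sketch of counting and dropping them before plotting is shown here on a toy table; the names `toy`, `n_missing`, and `plot_data` are hypothetical.

```r
library(dplyr)

# Toy stand-in with one missing data_value (illustrative values only)
toy <- tibble::tibble(
  short_question_text = c("Cholesterol Screening", "Cholesterol Screening", "Taking BP Medication"),
  data_value = c(78.1, NA, 71.4)
)

# Count the missing values, then drop them explicitly
n_missing <- sum(is.na(toy$data_value))
plot_data <- toy |>
  filter(!is.na(data_value))

n_missing
nrow(plot_data)
```

Passing the filtered table to `ggplot()` makes the removal an explicit, documented step instead of a warning.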
##When I correct the shapefile, I can run this code
The Centers for Disease Control and Prevention (CDC) oversaw the administration of this study. Beyond filtering the data, my aim is to investigate the relationship between cholesterol screening and blood pressure medication use, since high cholesterol and high blood pressure are both major contributors to heart disease, a leading cause of death in the United States. This inquiry focuses on Maryland in 2017.
The density plot shows a very high peak for “Cholesterol Screening,” indicating that its reported percentages are higher and more consistent across census tracts than those for “Taking BP Medication.”
The density curve for “Taking BP Medication” is much lower and wider, with a tail extending to the left toward lower values. Its width indicates large variation in data_value percentages, that is, in respondents’ reported behavior regarding “Taking BP Medication.” In contrast, “Cholesterol Screening” presents a roughly normal, bell-shaped curve, broadly in line with the 68-95-99.7 empirical rule, though its tail veers slightly to the left. Taken together, the curves suggest that respondents are more likely to participate in cholesterol screening than to take their BP medication.
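The statements about peak height, spread, and left tails can be checked numerically. The sketch below computes per-measure summary statistics on a made-up table; the values are illustrative only, and `spread_by_measure` is a hypothetical name. A larger standard deviation or IQR corresponds to a wider density curve, and a mean below the median is one rough sign of a left tail.

```r
library(dplyr)

# Toy stand-in for the two measures (illustrative values only)
toy <- tibble::tibble(
  short_question_text = rep(c("Cholesterol Screening", "Taking BP Medication"), each = 5),
  data_value = c(76, 77, 78, 79, 80, 55, 65, 70, 75, 80)
)

# Center and spread for each measure
spread_by_measure <- toy |>
  group_by(short_question_text) |>
  summarise(
    n      = n(),
    mean   = mean(data_value, na.rm = TRUE),
    median = median(data_value, na.rm = TRUE),
    sd     = sd(data_value, na.rm = TRUE),
    iqr    = IQR(data_value, na.rm = TRUE),
    .groups = "drop"
  )

spread_by_measure
```

Applied to the real subset, a table like this would give concrete numbers to cite alongside the visual reading of the plot.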
There is some overlap between the two distributions, suggesting that some census tracts report similar percentages for both measures. Overall, Maryland’s residents show promising attitudes, behavior, and awareness. Public health campaigns in Maryland are showing positive signs, but more work is needed.
In my map, I intend to identify areas of opportunity, particularly focusing on areas that reported low and medium results.
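The map itself is not included here, so the following is only a rough sketch of how low and medium areas could be flagged and plotted with leaflet. The cutoffs of 65 and 75 are hypothetical thresholds chosen for illustration, not values taken from the data, and the toy coordinates, `tier` column, `tier_pal`, and `low_medium_map` are likewise assumptions.

```r
library(dplyr)
library(leaflet)

# Toy stand-in for the Maryland subset (illustrative values only)
toy <- tibble::tibble(
  lat = c(39.29, 39.30, 39.31),
  long = c(-76.61, -76.62, -76.60),
  short_question_text = "Taking BP Medication",
  data_value = c(58.0, 71.5, 82.3)
)

# Assign each row to a tier; the 65 and 75 cutoffs are hypothetical
map_data <- toy |>
  filter(!is.na(data_value)) |>
  mutate(tier = cut(
    data_value,
    breaks = c(-Inf, 65, 75, Inf),
    labels = c("low", "medium", "high")
  ))

# One color per tier
tier_pal <- colorFactor(
  palette = c("red", "orange", "darkgreen"),
  domain = NULL,
  levels = c("low", "medium", "high")
)

# Map only the low and medium areas
low_medium_map <- map_data |>
  filter(tier != "high") |>
  leaflet() |>
  addTiles() |>
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    color = ~tier_pal(tier),
    radius = 5,
    popup = ~paste0(short_question_text, ": ", data_value, "%")
  ) |>
  addLegend(pal = tier_pal, values = ~tier, title = "Tier")

low_medium_map
```

Quantile-based cutoffs (for example, terciles of data_value) would be an alternative to fixed thresholds if there is no agreed definition of low and medium.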