Rows: 810103 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (17): StateAbbr, StateDesc, CityName, GeographicLevel, DataSource, Categ...
dbl (6): Year, Data_Value, Low_Confidence_Limit, High_Confidence_Limit, Cit...
num (1): PopulationCount
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data(cities500)
Warning in data(cities500): data set 'cities500' not found
The GeoLocation variable has (lat, long) format
Split GeoLocation (lat, long) into two columns: lat and long
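The chunk that performs this split is not shown in the rendered output, so the following is only a minimal sketch of one way to do it. It assumes GeoLocation is a character column of the form "(lat, long)"; the toy rows, their coordinates, and the names `toy` and `toy_latlong` are hypothetical and used only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows standing in for the real data (coordinates are illustrative)
toy <- tibble(
  CityName    = c("Hawthorne", "Hayward"),
  GeoLocation = c("(33.9147, -118.3476)", "(37.6329, -122.0772)")
)

toy_latlong <- toy |>
  # Drop the surrounding parentheses
  mutate(GeoLocation = gsub("[()]", "", GeoLocation)) |>
  # Split on the comma (and any following spaces) into two columns
  separate(GeoLocation, into = c("lat", "long"), sep = ",\\s*") |>
  # The pieces are still text, so convert them to numbers
  mutate(across(c(lat, long), as.numeric))

head(toy_latlong)
```

Converting to numeric explicitly, rather than relying on automatic type guessing, makes it obvious if any row fails to parse, because it would show up as NA.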
# A tibble: 6 × 25
Year StateAbbr StateDesc CityName GeographicLevel DataSource Category
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne Census Tract BRFSS Health Outcom…
2 2017 CA California Hawthorne City BRFSS Unhealthy Beh…
3 2017 CA California Hayward City BRFSS Health Outcom…
4 2017 CA California Hayward City BRFSS Unhealthy Beh…
5 2017 CA California Hemet City BRFSS Prevention
6 2017 CA California Indio Census Tract BRFSS Health Outcom…
# ℹ 18 more variables: UniqueID <chr>, Measure <chr>, Data_Value_Unit <chr>,
# DataValueTypeID <chr>, Data_Value_Type <chr>, Data_Value <dbl>,
# Low_Confidence_Limit <dbl>, High_Confidence_Limit <dbl>,
# Data_Value_Footnote_Symbol <chr>, Data_Value_Footnote <chr>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
Filter the dataset
Remove the rows where StateDesc is United States, select Prevention as the category of interest, keep only crude prevalence values, and keep only 2017.
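The filtering chunk itself is hidden in the rendered output, so here is a rough sketch of how these criteria could be expressed with dplyr. The column names come from the tibble printed above; the exact strings "United States", "Prevention", and "Crude prevalence" are assumptions about how the values are spelled in the data, and `toy` and `toy_clean` are hypothetical names with made-up rows.

```r
library(dplyr)

# Hypothetical toy rows; the real dataset has many more rows and columns
toy <- tibble(
  Year            = c(2017, 2017, 2016, 2017),
  StateDesc       = c("California", "United States", "California", "Alabama"),
  Category        = c("Prevention", "Prevention", "Prevention", "Health Outcomes"),
  Data_Value_Type = c("Crude prevalence", "Crude prevalence",
                      "Crude prevalence", "Age-adjusted prevalence")
)

toy_clean <- toy |>
  filter(StateDesc != "United States") |>          # drop the national summary rows
  filter(Category == "Prevention") |>              # category of interest (assumed spelling)
  filter(Data_Value_Type == "Crude prevalence") |> # crude prevalence only (assumed spelling)
  filter(Year == 2017)                             # single year

nrow(toy_clean)
```

Keeping each criterion in its own `filter()` call makes it easy to comment one out and see how much each condition shrinks the dataset.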
# A tibble: 6 × 18
Year StateAbbr StateDesc CityName GeographicLevel Category UniqueID Measure
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2017 CA California Hawthorne City Unhealt… 632548 Curren…
2 2017 CA California Hayward City Unhealt… 633000 Obesit…
3 2017 CA California Lakewood City Unhealt… 639892 Obesit…
4 2017 AL Alabama Huntsvil… Census Tract Unhealt… 0137000… Obesit…
5 2017 AZ Arizona Avondale Census Tract Unhealt… 0404720… Obesit…
6 2017 AZ Arizona Chandler City Unhealt… 412000 No lei…
# ℹ 10 more variables: Data_Value_Type <chr>, Data_Value <dbl>,
# PopulationCount <dbl>, lat <dbl>, long <dbl>, CategoryID <chr>,
# MeasureId <chr>, CityFIPS <dbl>, TractFIPS <dbl>, Short_Question_Text <chr>
The new dataset “latlong_clean2” is a manageable dataset now.
For your assignment, work with a cleaned dataset where you perform your own cleaning and filtering.
1. Once you run the above code and filter this complicated dataset, perform your own investigation by filtering this dataset however you choose so that you have a subset with no more than 900 observations through some inclusion/exclusion criteria.
Filter chunk here (you may need multiple chunks)
# I researched the most and least walkable cities in the USA in 2017. From
# multiple articles, Fayetteville was clearly the least walkable. The most
# walkable was a tie between New York and San Francisco, but New York had too
# much data, so I went with San Francisco to keep my observations under 900.
# I wanted to see the difference in obesity rates between walkable and
# unwalkable cities in the US, so I filtered for obesity in the measure ID.
# However, partway through the project, the population difference was making
# the comparison awkward, so I changed my unwalkable city to Charlotte, NC,
# since it is closer in population to San Francisco but is still not
# considered very walkable.
myfilter_data <- latlong_clean2 |>
  filter(CityName %in% c("San Francisco", "Charlotte")) |>
  filter(StateAbbr %in% c("CA", "NC")) |>
  filter(MeasureId == "OBESITY")
2. Based on the GIS tutorial (Japan earthquakes), create one plot about something in your subsetted dataset.
First plot chunk here
# non-map plot
ggplot(myfilter_data, aes(x = CityName, y = Data_Value)) +
  geom_point() +
  labs(
    title = "Obesity in Selected Cities",
    x = "City Names",
    y = "Percent of Adults with Obesity",
    caption = "Source: CDC 500 Healthy Cities"
  ) +
  theme_bw()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
I wanted to make a graph that showed obesity rates in walkable and unwalkable cities. I originally wanted to compare a few of the most and least walkable cities, but found that I could only use one city of each kind because there were so many observations. At first I picked Fayetteville, North Carolina as my least walkable city, since multiple articles from 2017 had named it the least walkable. The most walkable was tied between San Francisco and New York, but I went with SF because it had fewer observations, which made the data easier to manage. However, the further I went in the project with these two cities, the more I found that the population difference between them was so great that the maps looked awkward. So I changed my unwalkable city to Charlotte, NC, which was still unwalkable according to a few articles and had a population similar to SF's. Finally, as I was mapping these out, I found that because the two cities are on opposite sides of the US, it was easier to make two maps at the same zoom so that the obesity rates in the two cities can be compared side by side.
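The two-maps-at-the-same-zoom idea described above could be sketched roughly as follows. This assumes the leaflet package, as used in the GIS tutorial; the helper `make_city_map()`, the data frame `tracts`, its made-up values, and the zoom level of 11 are all hypothetical choices for illustration, not the code actually used for this assignment. The city center coordinates are approximate.

```r
library(leaflet)

# Hypothetical toy points; in the assignment these would come from myfilter_data
tracts <- data.frame(
  CityName   = c("San Francisco", "San Francisco", "Charlotte", "Charlotte"),
  lat        = c(37.77, 37.76, 35.23, 35.21),
  long       = c(-122.42, -122.44, -80.84, -80.86),
  Data_Value = c(15, 18, 28, 31)  # made-up obesity percentages
)

# One palette over both cities so colors mean the same thing on each map
pal <- colorNumeric("YlOrRd", domain = tracts$Data_Value)

# Hypothetical helper: one map per city, same zoom for every call by default
make_city_map <- function(data, city, center_lng, center_lat, zoom = 11) {
  city_rows <- data[data$CityName == city, ]
  leaflet(city_rows) |>
    addTiles() |>
    setView(lng = center_lng, lat = center_lat, zoom = zoom) |>
    addCircleMarkers(
      lng = ~long, lat = ~lat, radius = 5,
      color = ~pal(Data_Value), stroke = FALSE, fillOpacity = 0.8,
      popup = ~paste0(CityName, ": ", Data_Value, "%")
    )
}

sf_map  <- make_city_map(tracts, "San Francisco", -122.4194, 37.7749)
clt_map <- make_city_map(tracts, "Charlotte", -80.8431, 35.2271)
```

Sharing both the zoom level and the color palette is what makes the two maps comparable: a given color and a given distance on screen then mean the same thing in each city.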