In recent years, obesity in America has become more prevalent, accompanied by major health issues such as diabetes and coronary heart disease. Nearly two-thirds of adult Americans are overweight or obese, and according to a study published in The Journal of the American Medical Association (JAMA), there are more obese persons than overweight persons. Despite the attention of the health profession, the media, and the public, and despite mass educational campaigns about the benefits of healthier diets and increased physical activity, the prevalence of obesity in the United States has more than doubled over the past four decades.
The data were collected from Social Explorer's health data. They are used here to build awareness of the factors that affect certain health issues. Because the data are originally reported by county, for this analysis I created STATEFP, which groups the counties by state using a numeric code. The variables used in the analysis are STATEFP, which identifies each state in the United States; FEI, the food environment index, ranging from 0 (worst) to 10 (best); and Obese_Adults, the percent of obese adults by location.
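As a minimal sketch of how a two-digit state code relates to a county FIPS code: the state code is the first two digits of the zero-padded five-digit county code. The report itself takes STATEFP from the Geo_STATE column; the vector `county_fips` and its values below are illustrative assumptions.

```r
# Example county FIPS codes stored as numbers, so leading zeros are lost.
county_fips <- c(1001, 36061, 28049)

# Pad back to five digits, then keep the first two digits as the state code.
fips5 <- sprintf("%05d", as.integer(county_fips))
statefp <- substr(fips5, 1, 2)

data.frame(fips5, statefp)
```

Keeping the state code as a character string preserves the leading zero, which matters when it is later compared with values such as "02".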
The goal of this analysis is to visualize the percent of obese adults through two different methods. The first is non-spatial: a regression and a ggplot that examine the relationship between states and the percent of obese adults. The second is spatial: maps of the percent of obese adults across all states and counties, across the 11 nations of America, and at the county level in New York State.
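library(readr)      # read_csv, parse_integer
library(dplyr)      # rename, select, mutate, filter, joins, %>%
library(sf)         # st_read
library(ggplot2)    # ggplot
library(tmap)       # tm_shape, tm_polygons, tm_borders
library(tmaptools)  # aggregate_map
library(tigris)     # counties
library(leaflet)    # leaflet, colorNumeric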
Amer.nation <- read_csv("C:\\Users\\Cespi\\Documents\\712\\American Nations.csv")
#head(Amer.nation)
ct_map <- st_read("tl_2016_us_county.shp", stringsAsFactors = FALSE)
## Reading layer `tl_2016_us_county' from data source `C:\Users\Cespi\Documents\712\tl_2016_us_county\tl_2016_us_county.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3233 features and 17 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44106
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
library(readr)
Obesity <- read_csv("C:\\Users\\Cespi\\Documents\\712\\obesity.csv")
Obesity <- rename (Obesity,
"County" = Geo_QNAME,
"STATEFP" = Geo_STATE,
"Ad_Diabet" = SE_T009_001,
"Ad_limitFds" = SE_T012_001,
"Access_Exer_Opport" = SE_T012_002,
"Obese_Adults" = SE_T012_003,
"PhysInactive" = SE_T012_004,
"FEI" = SE_T013_001)
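Obesity1 <- Obesity
# Preview the analysis columns only. The result is printed, not assigned,
# so Obesity1 keeps every column (Geo_FIPS and Geo_NAME are needed later).
select(Obesity1, STATEFP, County, Obese_Adults, PhysInactive, FEI)
head(Obesity1)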
Non-spatial analysis describes correlations between variables independently of any geometric considerations. Using an ecological regression and a ggplot, we can relate the percent of obese adults to other factors without reference to location.
Ecological Regression
The first analysis is an ecological regression of the relationship between FEI (food environment index) and the percent of obese adults. The regression, shown below, uses the mean percent of obese adults and the mean FEI in each state. The relationship is significant (p = 0.0072), although the model explains only about 14% of the variation (R-squared = 0.1383): the mean percent of obese adults in a state decreases by 1.896 points for every one-unit increase in the state's mean food environment index. The problem with evaluating the food environment at an ecological level is that we cannot draw conclusions about individuals from state averages. This raises the concern of ecological fallacy.
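To make the slope concrete, here is a small worked sketch that plugs a few state-level FEI values into the fitted line. The coefficients are copied from the regression summary in this report; the FEI values in `fei` are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the regression summary (mean_p ~ mean_s).
intercept <- 43.012
slope <- -1.896

# Hypothetical state-level mean FEI values.
fei <- c(6, 7, 8)

# Predicted mean percent of obese adults at each FEI value.
predicted_obese <- intercept + slope * fei
round(predicted_obese, 3)
```

Each one-unit step in FEI lowers the predicted state mean by 1.896 percentage points. These are predictions for state averages, not for individual residents.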
GGPLOT
The plot below shows the percent of obese adults in each county, grouped by state. It may be useful for identifying the spread of obesity among a state's counties. However, it can be dense and confusing for readers who only want to know the percent of obese adults by state.
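For readers who want a single figure per state, one option is to collapse the county values to state means and sort them. The sketch below uses base R and a made-up data frame `counties`; its values are not taken from the Social Explorer data.

```r
# Made-up county rows: state code and percent of obese adults.
counties <- data.frame(
  STATEFP = c("01", "01", "28", "28", "36", "36"),
  Obese_Adults = c(32, 36, 38, NA, 24, 28),
  stringsAsFactors = FALSE
)

# Mean per state; the formula interface drops rows with missing values.
state_means <- aggregate(Obese_Adults ~ STATEFP, data = counties, FUN = mean)

# Sort from highest to lowest mean so the ranking is easy to read.
state_means_sorted <- state_means[order(-state_means$Obese_Adults), ]
state_means_sorted
```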
Obese<- Obesity1 %>%
group_by(STATEFP) %>%
summarise(mean_p = mean(Obese_Adults, na.rm = TRUE), mean_s = mean(FEI, na.rm = TRUE))
ecoobs <- lm(mean_p ~ mean_s, data = Obese)
summary(ecoobs)
## Call:
## lm(formula = mean_p ~ mean_s, data = Obese)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.8146 -2.5721  0.6634  2.8373  5.8935
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   43.012      4.824   8.916 7.86e-12 ***
## mean_s        -1.896      0.676  -2.805   0.0072 **
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 3.642 on 49 degrees of freedom
## Multiple R-squared:  0.1383, Adjusted R-squared:  0.1207
## F-statistic: 7.866 on 1 and 49 DF,  p-value: 0.007202
library(lemon)
ggplot(Obesity1, aes(x=STATEFP, y=Obese_Adults)) +
geom_point() +
coord_capped_cart(bottom='both', left='none') +
theme_light() + theme(panel.border=element_blank(), axis.line = element_line())
Spatial analysis uses data to define location, shape, and size. Analyzing our data spatially gives a general picture of the geographic distribution and spatial autocorrelation of the percent of obese adults. The maps below were created to show how each added visual layer introduces new information about our research question.
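The maps that follow depend on matching each county shape to its obesity record through a shared FIPS key. A minimal base R sketch of that matching step is given here, with a check for counties that found no match. The data frames `shapes` and `health`, the county names, and the values are hypothetical; the report itself joins sf objects with left_join().

```r
# Hypothetical shape attributes (GEOID stored as text with leading zeros).
shapes <- data.frame(
  GEOID = c("01001", "36061", "28049"),
  NAME = c("County A", "County B", "County C"),
  stringsAsFactors = FALSE
)

# Hypothetical health table (FIPS stored as numbers, one county missing).
health <- data.frame(
  Geo_FIPS = c(1001, 36061),
  Obese_Adults = c(30.1, 15.2)
)

# Put both keys on the same integer type before joining.
shapes$fips <- as.integer(shapes$GEOID)
health$fips <- as.integer(health$Geo_FIPS)

# Left join: keep every shape, even when no health record matches.
joined <- merge(shapes, health, by = "fips", all.x = TRUE)

# Counties with no obesity value would be drawn as missing on the map.
unmatched <- joined$NAME[is.na(joined$Obese_Adults)]
unmatched
```

Checking `unmatched` before mapping helps separate counties with no data from counties that failed to join because of a key mismatch.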
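# Build a common integer FIPS key on both tables before joining.
Obesity1$Geo_FIPS <- as.integer(Obesity1$Geo_FIPS)
ct_map$GEOID <- as.integer(ct_map$GEOID)
Obesity2 <- Obesity1 %>%
  mutate(fips = Geo_FIPS)
# GEOID is already an integer here, so as.integer() is used rather than re-parsing it.
ct_map <- ct_map %>%
  mutate(fips = as.integer(GEOID))
Obesitymap <- ct_map %>%
  left_join(Obesity2, by = "fips")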
# First look at obesity percentages in the United States
tm_shape(Obesitymap) + tm_polygons("Obese_Adults")
# Drop Alaska (02), Hawaii (15), and the territories: American Samoa (60),
# Guam (66), Northern Mariana Islands (69), Puerto Rico (72), U.S. Virgin Islands (78).
obesitymap1 <- Obesitymap %>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
# Excluding Alaska, Hawaii, and the territories
tm_shape(obesitymap1, projection = 2163) + tm_polygons("Obese_Adults")
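To see the differences between states more clearly, each county's Obese_Adults value is centered on the overall mean, so the map shows how far a county falls above or below the average. Using the mean as the reference keeps every value relative to the observed data rather than to an arbitrary cutoff.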
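As a small sketch of the centering step, using a made-up vector `obese` rather than the report's data: subtracting the mean turns each value into a deviation from the average, and `na.rm = TRUE` is needed if any county value is missing.

```r
# Made-up county percentages, one of them missing.
obese <- c(25, 30, NA, 35)

# Without na.rm = TRUE the mean of a vector containing NA is NA.
mean_with_na <- mean(obese)

# Ignore missing values when computing the reference mean.
national_mean <- mean(obese, na.rm = TRUE)

# Deviation of each county from the mean; missing stays missing.
centered <- obese - national_mean
centered
```

Negative values mark counties below the average and positive values mark counties above it, which suits a diverging palette such as "-RdBu".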
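# Center each county on the overall mean; na.rm = TRUE guards against missing county values.
obesitymap1 <- obesitymap1 %>%
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))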
us_states <- obesitymap1 %>%
aggregate_map(by = "STATEFP.x")
Highlighting the state lines gives this map a clearer picture of obesity within the nation. Using the legend, we can see the differences among states and their counties. In the map below, the most obese areas lie in the southeastern portion of the United States. We can also identify large percentages of obese adults along the borders of Mississippi, Arkansas, and Louisiana. The legend, together with the outlines of all counties and states, identifies Mississippi as having the highest percent of obese adults. This is because its counties fall in the 0 to 20 range of the legend, that is, at or above the overall mean, while other states' counties show different distributions of obesity.
tm_shape(obesitymap1, projection = 2163) + tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4) +
tm_shape(us_states) + tm_borders(lwd = .36, col = "black", alpha = 1)
The American Nation: A History of the United States was co-authored by Mark Carnes and John Garraty. The text depicts the political history of the United States as intimately tied to its social, economic, and cultural development. Using the American Nations mapping file, my spatial analysis shows county-level percents of obese adults among the 11 nations within America.
With this new map division, we can see that the percent of obese adults is highest in Greater Appalachia and the Deep South. This may reflect the food habits and culture of residents in these areas. To assess this notion further, a non-spatial analysis could factor in the agriculture of the area, its ethnic composition, the food environment index, and possibly other variables.
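# Rebuild the joined county map (same steps as before).
Obesity2 <- Obesity1 %>%
  mutate(fips = Geo_FIPS)
# as.integer() works whether GEOID is still text or already an integer.
ct_map <- ct_map %>%
  mutate(fips = as.integer(GEOID))
Obesitymap <- ct_map %>%
  left_join(Obesity2, by = "fips")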
Amer.nations<- Amer.nation%>%
mutate(fips= fips_code)
Amer.nations.obs<- Obesitymap %>%
left_join(Amer.nations, by = "fips")
# Drop Alaska (02), Hawaii (15), and the territories (60, 66, 69, 72, 78).
Amer.nations.obs1 <- Amer.nations.obs %>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
# Center on the overall mean; na.rm = TRUE guards against missing county values.
Amer.nations.obs1 <- Amer.nations.obs1 %>%
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))
Amer.nation <- Amer.nations.obs1 %>%
aggregate_map(by = "AN_KEY")
tm_shape(Amer.nations.obs1, projection = 2163) + tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4) +
tm_shape(Amer.nation) + tm_borders(lwd = .50, col = "black", alpha = 1)
The tigris package is an R package that downloads Census Bureau TIGER/Line shapefiles directly into R. This makes spatial maps much easier to code and to reuse across many analyses. On slide 32, we coded counties(cb = TRUE), and the code below keeps that setting. When cb is set to FALSE, the most detailed TIGER/Line file is downloaded. When it is set to TRUE, the downloaded file is a generalized (1:500k) cartographic boundary file.
options(tigris_class = "sf")
t_county <- counties(cb = TRUE)
t_obs_data <- t_county %>%
mutate(fips = parse_integer(GEOID)) %>%
left_join(Obesity2, by = "fips")
# Drop Alaska (02), Hawaii (15), and the territories (60, 66, 69, 72, 78).
t_obs_data_sub <- t_obs_data %>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
ob_states <- t_obs_data_sub %>%
aggregate_map(by = "STATEFP.x")
obese_nation <- t_obs_data_sub %>%
aggregate_map(by = "STATEFP.x")
# Center on the overall mean; na.rm = TRUE guards against missing county values.
t_obs_data_sub <- t_obs_data_sub %>%
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))
tm_shape(t_obs_data_sub, projection = 2163) + tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4) +
tm_shape(obese_nation) + tm_borders(lwd = .50, col = "black", alpha = 1)
The spatial analysis below examines obesity within New York State only. Within the state, the highest percentages of obese adults appear mostly in more rural areas. The map shows one county in particular with the highest percent of obese adults. According to its history, the county is rural, with a population of only 51,401 and a per capita income of $16,427. Non-spatial analysis may help identify the underlying factors that make this county's percent of obese adults stand out from its surrounding counties.
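The standout county can also be identified directly from the table rather than read off the map. This is a minimal sketch on a made-up data frame `ny`; the county names and values are placeholders, not the New York figures.

```r
# Placeholder county rows for illustration only.
ny <- data.frame(
  NAME = c("County A", "County B", "County C"),
  Obese_Adults = c(24.5, 33.1, 28.0),
  stringsAsFactors = FALSE
)

# which.max() returns the row position of the largest value (NAs are skipped).
top_county <- ny[which.max(ny$Obese_Adults), ]
top_county
```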
ct_map <- filter(ct_map, STATEFP=="36")
ct_map <- ct_map[order(ct_map$NAME),]
Obesity2 <- Obesity2[order(Obesity2$Geo_NAME),]
identical(ct_map$NAME,Obesity2$Geo_NAME)
## [1] FALSE
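# Join on the shared FIPS key: the name vectors are not identical (see the check above),
# and left_join() has no key.shp/key.data arguments.
ct_Obese_map <- left_join(ct_map, Obesity2, by = "fips")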
Obesepalette <- colorNumeric(palette = "Purples", domain=ct_Obese_map$Obese_Adults)
ilpopup_Obese <- paste0("County: ", ct_Obese_map$NAME, " | Obesity: ", (ct_Obese_map$Obese_Adults))
leaflet(ct_Obese_map) %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(stroke=FALSE,
smoothFactor = 0.2,
fillOpacity = .8,
popup=ilpopup_Obese,
color= ~Obesepalette(ct_Obese_map$Obese_Adults)
)
After evaluating both spatial and non-spatial analysis, I have found that each has great advantages but also disadvantages. Spatial analysis is simple and appealing but does not offer in-depth analysis on its own. By combining both types of analysis, locational observations can be evaluated easily alongside their associations with other factors.