I have taken the data from Social Explorer. Here I am interested in the health of persons in the USA and in how various factors contribute to it. From the data set I have chosen the variables SE_T009_001 = Diabetic, SE_T012_001 = Persons with Limited Access to Healthy Foods, SE_T012_002 = Persons with Access to Exercise Opportunities, SE_T012_003 = Obese Persons (20 Years and Over), SE_T012_004 = Physically Inactive Persons (20 Years and Over), and SE_T013_001 = Food Environment Index, together with the county and state identifiers. I renamed the state code Geo_STATE to STATEFP so that counties can be grouped by state, and I renamed SE_T012_003 to Obese_Adults to evaluate the percent of obese adults by location.
The purpose of my analysis is to visualize the percent of obese adults through two different methods. The first method is non-spatial. It uses an ecological regression of state-level obesity on the Food Environment Index and a ggplot that shows the percent of obese adults in each county by state. The second method is spatial. It maps the percent of obese adults across all states and counties and across the 11 American Nations.
library(tidyverse)
library(sf)
library(tmap)
library(tigris)
library(spdep)
options(tigris_use_cache = TRUE)
options(tigris_progress_bar = FALSE)
options(tidycensus_progress_bar = FALSE)
ct_map <- st_read('/Users/kanwallatif/Documents/tl_2016_us_county/tl_2016_us_county.shp', stringsAsFactors = FALSE)
## Reading layer `tl_2016_us_county' from data source `/Users/kanwallatif/Documents/tl_2016_us_county/tl_2016_us_county.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3233 features and 17 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44106
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
library(readr)
Amer.nation <- read_csv('/Users/kanwallatif/Documents/county_results.csv')
Obesity <- read_csv("/Users/kanwallatif/Documents/obesity.csv")
library(dplyr)
library(kableExtra)
Obesity <- rename(Obesity,
                  "County" = Geo_QNAME,
                  "STATEFP" = Geo_STATE,
                  "Ad_Diabet" = SE_T009_001,
                  "Ad_limitFds" = SE_T012_001,
                  "Access_Exer_Opport" = SE_T012_002,
                  "Obese_Adults" = SE_T012_003,
                  "PhysInactive" = SE_T012_004,
                  "FEI" = SE_T013_001)
Obesity1 <- Obesity
select(Obesity1, STATEFP, County, Obese_Adults, PhysInactive, FEI)
## # A tibble: 3,141 x 5
## STATEFP County Obese_Adults PhysInactive FEI
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1 Autauga County, Alabama 30.9 28.7 7.1
## 2 4 Apache County, Arizona 32.8 21.8 0.5
## 3 5 Arkansas County, Arkansas 34 37.1 6
## 4 6 Alameda County, California 20 14.7 7.6
## 5 8 Adams County, Colorado 25.8 20 8.1
## 6 9 Fairfield County, Connecticut 20.8 20.3 8.4
## 7 10 Kent County, Delaware 33.2 28.1 7.7
## 8 11 District of Columbia County, Di… 22.1 17 8
## 9 12 Alachua County, Florida 25.3 19.2 6.1
## 10 13 Appling County, Georgia 35.2 29.6 6.6
## # … with 3,131 more rows
Ecological Regression
# State-level means: mean_p = mean % obese adults, mean_s = mean Food Environment Index
Obese <- Obesity1 %>%
  group_by(STATEFP) %>%
  summarise(mean_p = mean(Obese_Adults, na.rm = TRUE),
            mean_s = mean(FEI, na.rm = TRUE))
ecoobs <- lm(mean_p ~ mean_s, data = Obese)
summary(ecoobs)
##
## Call:
## lm(formula = mean_p ~ mean_s, data = Obese)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8146 -2.5721 0.6634 2.8373 5.8935
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43.012 4.824 8.916 7.86e-12 ***
## mean_s -1.896 0.676 -2.805 0.0072 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.642 on 49 degrees of freedom
## Multiple R-squared: 0.1383, Adjusted R-squared: 0.1207
## F-statistic: 7.866 on 1 and 49 DF, p-value: 0.007202
INTERPRETATION:
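Here, in the first analysis, I examine the relationship between the Food Environment Index (FEI) and the percent of obese adults. The ecological regression above uses state averages. It regresses the mean percent of obese adults in each state (mean_p) on the state's mean FEI (mean_s), where a higher FEI indicates a better food environment. The slope is statistically significant (p = 0.0072). For every one-point increase in a state's mean FEI, the state's mean percent of obese adults is expected to decrease by about 1.9 percentage points. The fit is modest, however (R² = 0.138), so FEI explains only about 14% of the variation in obesity between states. Because the model uses state means, it describes states rather than individual counties or people.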
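To make the slope concrete, the sketch below plugs a few FEI values into the fitted line, using the intercept and slope printed by summary(ecoobs). The FEI values 6, 7, and 8 are chosen only for illustration and are not taken from the data.

```r
# Coefficients copied from summary(ecoobs) above
b0 <- 43.012   # intercept
b1 <- -1.896   # slope on mean_s (state mean FEI)

# Illustrative state mean FEI values (not from the data)
fei <- c(6, 7, 8)

# Predicted state mean percent of obese adults
pred <- b0 + b1 * fei
round(pred, 2)   # 31.64 29.74 27.84

# Each one-point increase in FEI lowers the prediction by 1.896 percentage points
diff(pred)
```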
library(lemon)
library(ggplot2)
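# One point per county; STATEFP is numeric, so states are spaced by their FIPS codes
ggplot(Obesity1, aes(x = STATEFP, y = Obese_Adults)) +
  geom_point() +
  labs(x = "State FIPS code", y = "Obese adults (%)") +
  coord_capped_cart(bottom = 'both', left = 'none') +
  theme_light() + theme(panel.border = element_blank(), axis.line = element_line())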
INTERPRETATION:
The plot above shows the percent of obese adults in each county, placed along the x-axis by state FIPS code. The vertical spread at each code therefore reflects the variation among counties within that state.
Spatial analysis uses data that define the location, shape, and size of features. Analyzing my data spatially shows how the percent of obese adults varies across the country. It also shows whether neighboring counties have similar values, which is called spatial autocorrelation.
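The plot shows the county-level spread that the ecological regression averages away. The sketch below uses a made-up data set of three hypothetical states (A, B, C) with two counties each. It shows that a relationship between state means can even have the opposite sign of the relationship within states. This is why a state-level slope should not be read as a county-level or individual-level effect.

```r
# Made-up counties: two per hypothetical state
toy_counties <- data.frame(
  state = rep(c("A", "B", "C"), each = 2),
  fei   = c(5, 7, 6, 8, 7, 9),
  obese = c(31, 33, 29, 31, 27, 29)
)

# Ecological regression: collapse to state means first (like group_by + summarise)
state_means <- aggregate(cbind(obese, fei) ~ state, data = toy_counties, FUN = mean)
eco_fit <- lm(obese ~ fei, data = state_means)
coef(eco_fit)               # intercept 44, slope -2: higher-FEI states have less obesity

# Within-state regression: compare counties inside the same state
within_fit <- lm(obese ~ fei + factor(state), data = toy_counties)
coef(within_fit)["fei"]     # +1: inside each state, higher FEI goes with more obesity
```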
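# Join key: integer county FIPS code
Obesity2 <- Obesity1 %>%
  mutate(fips = GEOID)
# Drop Alaska, Hawaii, and the territories; GEOID "01001" becomes 1001
ct_map <- ct_map %>%
  filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78", "79")) %>%
  mutate(fips = parse_integer(GEOID))
# Attach the county health data to the county polygons
Obesitymap <- ct_map %>%
  left_join(Obesity2, by = "fips")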
library(tmap)
tm_shape(Obesitymap, projection = 2163) + tm_polygons("Obese_Adults")
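The join above only works if both tables carry the county FIPS code in the same form. The shapefile stores GEOID as a zero-padded string such as "01001", and parse_integer() turns it into 1001 to match the numeric codes in the CSV. The base-R sketch below mimics the filter and left join on a tiny made-up table whose county values are hypothetical. Both tables contain STATEFP, so the joined result gets the suffixed columns STATEFP.x and STATEFP.y. This is why the later code refers to STATEFP.x.

```r
# Made-up miniature versions of the shapefile attributes and the health CSV
shapes <- data.frame(
  GEOID   = c("01001", "02013", "28049", "72001"),
  STATEFP = c("01", "02", "28", "72"),
  stringsAsFactors = FALSE
)
health <- data.frame(
  fips         = c(1001L, 2013L, 28049L),
  STATEFP      = c(1, 2, 28),          # numeric state code, as read from the CSV
  Obese_Adults = c(30.0, 28.0, 38.0)   # hypothetical values
)

# Drop Alaska, Hawaii, and the territories by state FIPS code
excluded <- c("02", "15", "60", "66", "69", "72", "78")
lower48 <- shapes[!shapes$STATEFP %in% excluded, ]

# "01001" -> 1001 so the keys match the CSV's integer codes
lower48$fips <- as.integer(lower48$GEOID)

# Left join (all.x = TRUE keeps every polygon, like dplyr::left_join)
joined <- merge(lower48, health, by = "fips", all.x = TRUE)
joined[, c("fips", "STATEFP.x", "STATEFP.y", "Obese_Adults")]
```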
To make the differences between states easier to see, I center Obese_Adults by subtracting the mean across all counties. Positive values then mark counties above the national county average, and negative values mark counties below it.
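library(tmaptools)
# Center each county on the mean across all counties (na.rm so missing counties do not make the mean NA)
Obesitymap <- Obesitymap %>%
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))
# Dissolve county polygons into state outlines
us_states <- Obesitymap %>%
  aggregate_map(by = "STATEFP.x")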
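Centering only helps if the mean itself is defined. In R, mean() returns NA whenever any county value is missing, and subtracting NA would blank out every county on the map. The short sketch below uses made-up values to show the difference that na.rm = TRUE makes.

```r
# Made-up county percentages, one of them missing
obese <- c(31.0, 25.0, NA, 35.5)

# Without na.rm the mean is NA, so every centered value becomes NA
centered_bad <- obese - mean(obese)

# With na.rm the mean uses the three observed counties (30.5 here)
centered <- obese - mean(obese, na.rm = TRUE)
centered   # 0.5 -5.5 NA 5.0
```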
Highlighting the state lines makes this map easier to read as a picture of obesity across the nation, and the legend shows the differences among states and their counties. The most obese areas are in the southeastern United States. High percentages of obese adults also appear along the borders shared by Mississippi, Arkansas, and Louisiana. Together, the legend and the county and state outlines identify Mississippi as the state with the highest percent of obese adults.
tm_shape(Obesitymap, projection = 2163) + tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4) +
tm_shape(us_states) + tm_borders(lwd = .36, col = "black", alpha = 1)
A History of the United States, co-authored by Mark Carnes and John Garraty, presents the political history of the United States as closely tied to its social, economic, and cultural development. Using the mapping file of the American Nations, my spatial analysis shows county-level percents of obese adults across the 11 nations within America. In the map below, the percent of obese adults is highest in Greater Appalachia and the Deep South.
Obesity2 <- Obesity1 %>%
mutate(fips= GEOID)
ct_map <- ct_map %>%
mutate(fips = parse_integer(GEOID))
Obesitymap <- ct_map %>%
left_join(Obesity2, by = "fips")
# Nation lookup keyed on the same integer county FIPS code
Amer.nations <- Amer.nation %>%
  mutate(fips = fips_code)
Amer.nations.obs <- Obesitymap %>%
  left_join(Amer.nations, by = "fips")
# Keep the contiguous states only
Amer.nations.obs1 <- Amer.nations.obs %>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
Amer.nations.obs1 <- Amer.nations.obs1 %>%
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))
Amer.nation <- Amer.nations.obs1 %>%
aggregate_map(by = "AN_KEY")
tm_shape(Amer.nations.obs1, projection = 2163) + tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4, midpoint = NA) +
tm_shape(Amer.nation) + tm_borders(lwd = .50, col = "black", alpha = 1)
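The claim that obesity is highest in Greater Appalachia and the Deep South can also be checked numerically by averaging the county values within each nation. The base-R sketch below does this for a made-up lookup table. The fips_code and AN_KEY column names mirror those used with county_results.csv above, but the county codes, nation assignments, and percentages are all hypothetical. The averages are unweighted county means, not population-weighted rates.

```r
# Made-up county values and a county-to-nation lookup (all hypothetical)
toy_nation_counties <- data.frame(
  fips         = 1:6,
  Obese_Adults = c(34, 36, 37, 35, 25, 27)
)
lookup <- data.frame(
  fips_code = 1:6,
  AN_KEY    = c("Deep South", "Deep South", "Greater Appalachia",
                "Greater Appalachia", "Yankeedom", "Yankeedom"),
  stringsAsFactors = FALSE
)

# Attach each county's nation, then average the county percentages by nation
with_nation <- merge(toy_nation_counties, lookup,
                     by.x = "fips", by.y = "fips_code", all.x = TRUE)
by_nation <- aggregate(Obese_Adults ~ AN_KEY, data = with_nation, FUN = mean)

# Rank nations from highest to lowest mean percent of obese adults
ranked <- by_nation[order(by_nation$Obese_Adults, decreasing = TRUE), ]
ranked
```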
cb=FALSE
The tigris package (installed from CRAN and loaded above) downloads Census Bureau TIGER/Line and cartographic boundary shapefiles directly into R. This makes spatial mapping much easier to code and to reuse across analyses. For example, t_county <- counties(cb = TRUE) grabs the county boundary files from the Census Bureau website. Because tigris_use_cache = TRUE is set above, the files are also cached on my machine. On slide 33, the code used counties(cb = TRUE); for this analysis cb has been changed to FALSE. When cb = FALSE, tigris downloads the most detailed TIGER/Line file. When cb = TRUE, it downloads the generalized (1:500k) cartographic boundary file.
options(tigris_class = "sf")
t_county <- counties(cb = FALSE)
t_obs_data <- t_county %>%
  mutate(fips = parse_integer(GEOID)) %>%
  left_join(Obesity2, by = "fips")
# Keep the contiguous states only
t_obs_data_sub <- t_obs_data %>%
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
# State outlines for the border layer
obese_nation <- t_obs_data_sub %>%
  aggregate_map(by = "STATEFP.x")
# Center each county on the mean across all counties
t_obs_data_sub <- t_obs_data_sub %>%
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))
tm_shape(t_obs_data_sub, projection = 2163) + tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4) +
tm_shape(obese_nation) + tm_borders(lwd = .50, col = "black", alpha = 1)
Spatial data are data that are directly or indirectly referenced to a location on the surface of the earth. Data that cannot be related to a location on the earth's surface are referred to as non-spatial data. Non-spatial values are numbers, characters, or logicals.
Spatial data describe the shape, size, and location of a feature. Non-spatial data describe other attributes associated with the feature, such as its name, length, area, volume, population, or soil type.
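Non-spatial data are usually treated as one-dimensional and independent. Spatial data, in contrast, are multidimensional and spatially autocorrelated: nearby locations tend to have more similar values than distant ones.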
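Spatial autocorrelation is commonly measured with Moran's I, which compares each value's deviation from the mean with the deviations of its neighbors. The sketch below computes it directly from a binary neighbor matrix in base R for four hypothetical regions in a row. Values near +1 indicate that neighbors are alike, and values near -1 indicate that neighbors differ. It illustrates the formula only. For the county map, spdep (loaded above) provides poly2nb(), nb2listw(), and moran.test() for building neighbor weights and testing significance.

```r
# Moran's I:  I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,  with z = x - mean(x)
morans_i <- function(x, w) {
  stopifnot(is.matrix(w), nrow(w) == length(x), ncol(w) == length(x))
  z  <- x - mean(x)          # deviations from the mean
  n  <- length(x)
  s0 <- sum(w)               # total weight S0
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Four hypothetical regions in a row: 1-2, 2-3 and 3-4 are neighbors
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)                # make the neighbor relation symmetric

morans_i(c(1, 2, 3, 4), w)   # smooth gradient: 1/3
morans_i(c(1, 4, 1, 4), w)   # alternating values: -1
```

With no spatial autocorrelation the expected value of Moran's I is -1/(n - 1), which is -1/3 for four regions. The smooth gradient's 1/3 is therefore well above what independence would suggest.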
Patterns tied to location are easier to assess with spatial analysis than with non-spatial methods. Non-spatial methods are more likely to give misleading results because the influence of other, independent variables is mixed in.
In the above analysis, the spatial maps show obesity as a whole and how it varies between states, but they are not as precise or specific as the non-spatial regression.
After evaluating both the spatial and the non-spatial analyses, I have found that each has its own advantages and disadvantages. Spatial maps are simple and visually appealing but do not offer in-depth statistical analysis, so it is a good idea to combine both types of analysis.