INTRODUCTION:

I have taken the data from Social Explorer. I am interested in the health of people in the USA and in how various factors contribute to it. From the data set I have chosen the variables SE_T009_001 = Diabetic, SE_T012_001 = Persons with Limited Access to Healthy Foods, SE_T012_002 = Persons with Access to Exercise Opportunities, SE_T012_003 = Obese Persons (20 Years and Over), SE_T012_004 = Physically Inactive Persons (20 Years and Over), and SE_T013_001 = Food Environment Index, together with the county and state identifiers. I renamed the state code Geo_STATE to STATEFP, which groups the counties by state, and I renamed SE_T012_003 to Obese_Adults to evaluate the percent of obese adults by location.

The purpose of my analysis is to visualize the percent of obese adults in two different ways. The first method is non-spatial. It consists of a regression and a ggplot chart that show the relationship between states and the percent of obese adults. The second method is spatial. It maps the percent of obese adults across all states and counties, and across the 11 nations of America.

library(tidyverse)
library(sf)
library(tmap)
library(tigris)
library(spdep)
options(tigris_use_cache = TRUE)
options(tigris_progress_bar = FALSE)
options(tidycensus_progress_bar = FALSE)
ct_map <- st_read('/Users/kanwallatif/Documents/tl_2016_us_county/tl_2016_us_county.shp', stringsAsFactors = FALSE)
## Reading layer `tl_2016_us_county' from data source `/Users/kanwallatif/Documents/tl_2016_us_county/tl_2016_us_county.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3233 features and 17 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44106
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
library(readr)
# County-to-American-Nation lookup table
Amer.nation <- read_csv('/Users/kanwallatif/Documents/county_results.csv')
# Social Explorer county health data
Obesity <- read_csv("/Users/kanwallatif/Documents/obesity.csv")

Renaming my Variables:

library(dplyr)
library(kableExtra)
# Replace the Social Explorer codes with descriptive column names
Obesity <- rename(Obesity,
                  County             = Geo_QNAME,
                  STATEFP            = Geo_STATE,
                  Ad_Diabet          = SE_T009_001,
                  Ad_limitFds        = SE_T012_001,
                  Access_Exer_Opport = SE_T012_002,
                  Obese_Adults       = SE_T012_003,
                  PhysInactive       = SE_T012_004,
                  FEI                = SE_T013_001)
Obesity1 <- Obesity
# Preview the columns used in the rest of the analysis
select(Obesity1, STATEFP, County, Obese_Adults, PhysInactive, FEI) 
## # A tibble: 3,141 x 5
##    STATEFP County                           Obese_Adults PhysInactive   FEI
##      <dbl> <chr>                                   <dbl>        <dbl> <dbl>
##  1       1 Autauga County, Alabama                  30.9         28.7   7.1
##  2       4 Apache County, Arizona                   32.8         21.8   0.5
##  3       5 Arkansas County, Arkansas                34           37.1   6  
##  4       6 Alameda County, California               20           14.7   7.6
##  5       8 Adams County, Colorado                   25.8         20     8.1
##  6       9 Fairfield County, Connecticut            20.8         20.3   8.4
##  7      10 Kent County, Delaware                    33.2         28.1   7.7
##  8      11 District of Columbia County, Di…         22.1         17     8  
##  9      12 Alachua County, Florida                  25.3         19.2   6.1
## 10      13 Appling County, Georgia                  35.2         29.6   6.6
## # … with 3,131 more rows

NON-SPATIAL REGRESSION:

Ecological Regression

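# Ecological regression: collapse the counties to state means first
Obese <- Obesity1 %>% 
  group_by(STATEFP) %>% 
  summarise(mean_p = mean(Obese_Adults, na.rm = TRUE),  # mean % of obese adults
            mean_s = mean(FEI, na.rm = TRUE))           # mean Food Environment Index

# Regress state mean obesity on state mean FEI
ecoobs <- lm(mean_p ~ mean_s, data = Obese)
summary(ecoobs)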
## 
## Call:
## lm(formula = mean_p ~ mean_s, data = Obese)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8146 -2.5721  0.6634  2.8373  5.8935 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   43.012      4.824   8.916 7.86e-12 ***
## mean_s        -1.896      0.676  -2.805   0.0072 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.642 on 49 degrees of freedom
## Multiple R-squared:  0.1383, Adjusted R-squared:  0.1207 
## F-statistic: 7.866 on 1 and 49 DF,  p-value: 0.007202

INTERPRETATION:

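In the first analysis I examine the relationship between the Food Environment Index (FEI) and the percent of obese adults. A higher FEI indicates a better food environment. The ecological regression above uses state averages. For each state, the mean percent of obese adults across its counties (mean_p) is regressed on the mean FEI of its counties (mean_s). The slope is statistically significant (p = 0.0072). Each one-point increase in a state's mean FEI is associated with a 1.896 percentage-point decrease in the state's mean percent of obese adults. FEI alone explains about 14% of the variation between states (R-squared = 0.138). The model is fitted to 51 state means rather than to counties or individuals, so this association should not be read as a county-level or person-level effect.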
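To make the slope concrete, the fitted line can be evaluated at two FEI values. Only the intercept and slope come from the summary() output above. The inputs 6 and 8 are arbitrary illustrative values and do not correspond to particular states.

```r
# Coefficients copied from summary(ecoobs)
intercept <- 43.012
slope_fei <- -1.896

# Predicted state mean obesity (%) at two illustrative mean FEI values
# (6 and 8 are arbitrary example inputs, not observed states)
fei <- c(6, 8)
predicted_obesity <- intercept + slope_fei * fei
predicted_obesity          # [1] 31.636 27.844

# A 2-point higher mean FEI lowers predicted mean obesity by 2 * 1.896 points
difference <- predicted_obesity[1] - predicted_obesity[2]
difference                 # [1] 3.792
```

The model is fitted to state means, so this 3.792-point gap describes differences between states rather than between individual counties.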

PLOTTING:

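library(lemon)     # provides coord_capped_cart()
library(ggplot2)
# One point per county: x = state FIPS code, y = percent of obese adults
ggplot(Obesity1, aes(x = STATEFP, y = Obese_Adults)) + 
  geom_point() + 
  coord_capped_cart(bottom = 'both', left = 'none') +   # capped axis lines
  labs(x = "State FIPS code", y = "Obese adults (%)") +
  theme_light() + theme(panel.border = element_blank(), axis.line = element_line())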

INTERPRETATION:
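The plot above shows the percent of obese adults in every county. Each county is drawn above its state's FIPS code on the x-axis, so the vertical spread of each column shows how much obesity varies among the counties within that state.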
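The spread within each state can also be summarized numerically. The sketch below uses a small hypothetical table with invented values, not rows from obesity.csv. Its column names match Obesity1, and base R's tapply() computes the median and the range of county obesity within each state. On the real data, the same calls would take Obesity1$Obese_Adults and Obesity1$STATEFP.

```r
# Hypothetical toy data with the same column names as Obesity1 (values are invented)
toy <- data.frame(
  STATEFP      = c(1, 1, 1, 5, 5, 6, 6, 6),
  Obese_Adults = c(30, 34, 28, 33, 31, 20, 25, 22)
)

# Median county obesity within each state
state_median <- tapply(toy$Obese_Adults, toy$STATEFP, median, na.rm = TRUE)

# Range (max - min) of county obesity within each state
state_range <- tapply(toy$Obese_Adults, toy$STATEFP,
                      function(x) diff(range(x, na.rm = TRUE)))

state_median   # medians 30, 32, 22 for states 1, 5, 6
state_range    # ranges 6, 2, 5 for states 1, 5, 6
```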

Spatial Mapping of the Percent of Obese Adults among States:

Spatial analysis uses the location, shape, and size of geographic features. Analyzing my data spatially shows the percent of obese adults in two dimensions across the map. It also reveals spatial autocorrelation, meaning whether neighboring counties tend to have similar percentages of obese adults.

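# Use the county identifier GEOID as the join key fips
Obesity2 <- Obesity1 %>% 
  mutate(fips = GEOID)

# Keep the contiguous states: drop Alaska (02), Hawaii (15) and the territories,
# then turn the character GEOID (e.g. "01001") into an integer key
ct_map <- ct_map %>% 
  filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78", "79")) %>%
  mutate(fips = parse_integer(GEOID)) 

# Attach the obesity data to the county polygons
Obesitymap <- ct_map %>% 
  left_join(Obesity2, by = "fips")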
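The join works only if both tables hold the county key in the same form. TIGER/Line stores a county GEOID as a five-character string: two digits of state FIPS followed by three digits of county FIPS, with leading zeros (for example "01001"). Converting that string to an integer, as parse_integer() does, produces 1001. The base-R sketch below uses a few example GEOIDs to show the conversion and the reverse step.

```r
# Example TIGER/Line-style county GEOIDs: 2-digit state FIPS + 3-digit county FIPS
geoid <- c("01001", "04001", "06001")

# Converting to integer drops the leading zeros, giving a numeric fips key
fips <- as.integer(geoid)
fips                             # [1] 1001 4001 6001

# The string "01001" and the number 1001 are different keys, so both sides
# of a join must be converted to the same type first
"01001" == as.character(1001)    # FALSE

# Zero-padding to five digits recovers the original strings
sprintf("%05d", fips)            # "01001" "04001" "06001"

# The first two characters of a county GEOID are the state FIPS code
substr(geoid, 1, 2)              # "01" "04" "06"
```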

Looking at Obesity percentages in the United States:

# County-level map of the percent of obese adults
# (EPSG:2163 is the US National Atlas Equal Area projection)
tm_shape(Obesitymap, projection = 2163) + tm_polygons("Obese_Adults")

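Centering Obese_Adults on the Mean:

To make the differences between states easier to see, I subtract the national mean of Obese_Adults from each county's value. Positive values then mark counties above the average and negative values mark counties below it. I also use aggregate_map() to dissolve the county polygons into state polygons so the state borders can be drawn on top.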

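library(tmaptools)
# Center each county on the national county mean (> 0 means above average);
# na.rm = TRUE keeps one missing county from turning every value into NA
Obesitymap <- Obesitymap %>% 
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))
# Dissolve the county polygons into one polygon per state (STATEFP.x comes from ct_map)
us_states <- Obesitymap %>% 
  aggregate_map(by = "STATEFP.x")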
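Centering subtracts a single national mean from every county, so the color scale is split at the average. The toy vector below uses invented values to show why na.rm = TRUE matters when centering. Without it, one missing county makes the mean NA, and every centered value then becomes NA as well.

```r
# Invented county obesity percentages with one missing value
obese <- c(30, 25, NA, 35)

mean(obese)                                  # NA: the missing value propagates
national_mean <- mean(obese, na.rm = TRUE)   # 30, the mean of the three known values
centered <- obese - national_mean
centered                                     # [1]  0 -5 NA  5
```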

Highlighting State Lines:

Highlighting the state lines makes the map easier to read at the national scale. The legend shows how states and their counties differ from the national mean. In the map, the counties with the highest percentages of obese adults are concentrated in the southeastern United States. There is also a large cluster where Mississippi, Arkansas, and Louisiana meet. Based on the legend and the county and state outlines, Mississippi has the highest percent of obese adults.

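# Counties colored by deviation from the mean ("-RdBu": red = above, blue = below),
# with black state borders drawn on top
tm_shape(Obesitymap, projection = 2163) + 
  tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4) + 
  tm_shape(us_states) + tm_borders(lwd = .36, col = "black", alpha = 1) 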

The American Nations:

The American Nation: A History of the United States was written by Mark Carnes and John Garraty. The text ties the political history of the United States closely to its social, economic, and cultural development. The division of North America into 11 regional "nations", including Greater Appalachia and the Deep South, comes from Colin Woodard's American Nations (2011). My spatial analysis uses a mapping file that assigns each county to one of these nations and shows the county-level percent of obese adults across the 11 nations. The map below shows that the percent of obese adults is highest in Greater Appalachia and the Deep South.

Obesity2 <- Obesity1 %>% 
  mutate(fips= GEOID)

ct_map <- ct_map %>% 
  mutate(fips = parse_integer(GEOID)) 

Obesitymap <- ct_map %>% 
  left_join(Obesity2, by = "fips")

# Copy the lookup's county code into fips so it matches the map's key
Amer.nations <- Amer.nation %>%
  mutate(fips = fips_code)

Amer.nations.obs<- Obesitymap %>% 
  left_join(Amer.nations, by = "fips")

# Drop Alaska, Hawaii and the territories (STATEFP.x comes from ct_map),
# then center each county on the national county mean
Amer.nations.obs1 <- Amer.nations.obs %>% 
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78")) %>% 
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))

# Dissolve the counties into one polygon per American Nation (AN_KEY);
# a new name keeps the raw Amer.nation table from being overwritten
nation_borders <- Amer.nations.obs1 %>% 
  aggregate_map(by = "AN_KEY")

tm_shape(Amer.nations.obs1, projection = 2163) + 
  tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4, midpoint = NA) + 
  tm_shape(nation_borders) + tm_borders(lwd = .50, col = "black", alpha = 1)
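The claim that obesity is highest in Greater Appalachia and the Deep South can also be checked numerically by averaging the county values within each nation. The sketch below uses two small hypothetical tables with invented keys, nation labels, and percentages. In place of left_join() and group_by()/summarise(), it uses base R's merge() and aggregate(). On the real data, the grouping column would be AN_KEY.

```r
# Hypothetical county obesity values (invented keys and numbers)
counties <- data.frame(
  fips         = 1:6,
  Obese_Adults = c(34, 32, 36, 32, 24, 26)
)

# Hypothetical county-to-nation lookup, shaped like the American Nations file
nations <- data.frame(
  fips   = 1:6,
  nation = c("Deep South", "Deep South", "Greater Appalachia",
             "Greater Appalachia", "Yankeedom", "Yankeedom")
)

# Left join: keep every county and attach its nation
joined <- merge(counties, nations, by = "fips", all.x = TRUE)

# Mean county obesity within each nation (groups come back in alphabetical order)
by_nation <- aggregate(Obese_Adults ~ nation, data = joined, FUN = mean)

# Rank the nations from highest to lowest mean
ranked <- by_nation[order(-by_nation$Obese_Adults), ]
ranked
```

With these invented numbers, Greater Appalachia ranks first (34), followed by the Deep South (33) and Yankeedom (25).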

TIGRIS:

cb=FALSE

The tigris package (installed from CRAN) downloads Census Bureau TIGER/Line shapefiles and loads them into R. This makes spatial maps much easier to code and to reuse across analyses. For example, t_county <- counties(cb = TRUE) downloads county-level boundary files from the Census Bureau website and stores them on my machine. On slide 33 the code used counties(cb = TRUE), but for this analysis cb has been changed to FALSE. When cb = FALSE, the most detailed TIGER/Line file is downloaded. When cb = TRUE, a generalized cartographic boundary file is downloaded instead, at 1:500k by default.

options(tigris_class = "sf")       # return sf objects
t_county <- counties(cb = FALSE)   # detailed TIGER/Line county boundaries

# Join the obesity data to the tigris county polygons by integer FIPS code
t_obs_data <- t_county %>% 
  mutate(fips = parse_integer(GEOID)) %>% 
  left_join(Obesity2, by = "fips")

# Drop Alaska, Hawaii and the territories
t_obs_data_sub <- t_obs_data %>% 
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))

# Dissolve the counties into state outlines
obese_nation <- t_obs_data_sub %>% 
  aggregate_map(by = "STATEFP.x")

# Center each county on the national county mean, then map it with state borders
t_obs_data_sub <- t_obs_data_sub %>% 
  mutate(obeseAdults = Obese_Adults - mean(Obesity2$Obese_Adults, na.rm = TRUE))
tm_shape(t_obs_data_sub, projection = 2163) + 
  tm_polygons("obeseAdults", palette = "-RdBu", border.col = "grey", border.alpha = .4) + 
  tm_shape(obese_nation) + tm_borders(lwd = .50, col = "black", alpha = 1)

Differences between Spatial and Non-Spatial Data:

Spatial data are data that are directly or indirectly referenced to a location on the surface of the earth. Data that cannot be related to such a location are called non-spatial data, and they consist of numbers, characters, or logical values.
Spatial data describe the shape, size, and location of a feature. Non-spatial data describe other attributes of the feature, such as its name, length, area, volume, population, or soil type.
Non-spatial data are usually treated as one-dimensional and independent. Spatial data are multidimensional and autocorrelated, meaning nearby locations tend to have similar values.
Location-based patterns are easier to assess through spatial analysis than through non-spatial functions. Non-spatial functions are more likely to give misleading results because other independent variables influence the outcome.
In the analysis above, the spatial maps show obesity across the whole country and how it differs between states, but they are not as precise or specific as the non-spatial regression.
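Spatial autocorrelation is commonly measured with Moran's I. The spdep package, loaded at the start of this analysis, provides moran.test() for this. The base-R sketch below instead computes the statistic directly from its definition. It uses four hypothetical counties in a row, where each county neighbors the next, with binary weights. Positive values mean neighboring counties have similar values, and negative values mean neighbors tend to differ. The county values are invented for illustration.

```r
# Moran's I from its definition:
#   I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
morans_i <- function(x, w) {
  z  <- x - mean(x)   # deviations from the mean
  n  <- length(x)
  s0 <- sum(w)        # sum of all neighbor weights
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Four hypothetical counties in a row; each neighbors the next (binary weights)
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)         # make the neighbor relation symmetric

clustered   <- c(10, 12, 30, 32)   # low next to low, high next to high
alternating <- c(10, 30, 12, 32)   # neighbors differ

morans_i(clustered, w)     # 39/101, about 0.386 (positive autocorrelation)
morans_i(alternating, w)   # -93/101, about -0.921 (negative autocorrelation)
```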

Conclusion:

After evaluating both the spatial and the non-spatial analysis, I have found that each has its own advantages and disadvantages. The spatial maps are simple and visually appealing, but they do not offer an in-depth analysis. It is a good idea to combine both types of analysis.