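library(dplyr)  # provides as_tibble(); tbl_df() is deprecated in recent dplyr versions

# Read the tract-level table (fast food counts joined to USDA food access flags)
fastfooddesert <- read.csv(
  file = "https://raw.githubusercontent.com/RobertSellers/R/master/data/fooddesert_v1.txt",
  header = TRUE, sep = ",")
fastfooddesert <- as_tibble(fastfooddesert)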

Research question

Are fast food chains in the urban Northeast United States contributing to, or symptomatic of, a lack of available healthy alternatives?

Cases

The United States census tract is often the most practical unit for relating datasets in a geographic analysis, and this study is no exception. Our cases are the urban census tracts of Pennsylvania, New Jersey, New York, Connecticut, Massachusetts, Rhode Island, Vermont, New Hampshire, and Maine.

Data collection

The data will be refined and the initial fields calculated in GIS software (ArcGIS), while the inferential statistics and analysis will be conducted in R. Food desert data, which covers urban census tracts, is joined with US Census tract data. To keep processing manageable, the urban census tracts of the northeastern states were selected as the sample population area. As data sources improve and the methods are refined, the same statistical analysis could be extended to the rest of the nation.

A demonstration of these spatial relationships can be viewed in the following map of Rochester, New York.

Type of study

This is an observational study.

Data Source

Fast food chain data was taken from fastfoodmaps.com. Food Desert data was retrieved from the USDA Food Access Research Atlas. 2014 Cartographic Boundary Shapefiles (US Census tract data) were retrieved from the United States Census Bureau.

With regard to retail geographic data, the premier sources would require extensive scraping and/or purchasing licensed data. With this in mind, the conclusions of this analysis should be regarded chiefly for their methods.

Response

The response variable is the severity of food restriction, or lack of food access, in low-income census tracts; specifically, the USDA-designated low-access urban tracts. Within the USDA dataset, low income is defined by the standards of the Department of the Treasury’s New Markets Tax Credit (NMTC) program. The USDA defines low food access as “limited access to supermarkets, supercenters, grocery stores, or other sources of healthy and affordable food retailers”. An urban tract is designated low-access when a substantial proportion of its population lives more than 1 mile or 0.5 miles from such a source. The specifics are outlined in the documentation for the Food Access Research Atlas. Following the USDA low-access data, the variable is categorical.

Explanatory

The explanatory variable is the number of fast food restaurants per person in each census tract. Numerically, this is the ratio of the number of fast food restaurants spatially intersecting the census tract to the tract population. The count comes from a geoprocessing function that performs a one-to-many spatial intersection in GIS. In other words, it returns a count field giving the number of point features that fall within each source census tract polygon. The resulting table can be rejoined to the census tract and food desert data by once again using the census tract unique identifier. The equivalent function in a Python (arcpy) script is arcpy.TabulateIntersection_analysis.
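
The rejoin step described above can be sketched in base R. This is only a minimal illustration, not the ArcGIS workflow itself: the data frames `tracts` and `counts`, the key column `GEOID`, and all values are hypothetical, while `PNT_COUNT` and `POP2010` follow the column names used elsewhere in this document.

```r
# Hypothetical tract table keyed by a census tract identifier (GEOID is an assumed name).
tracts <- data.frame(
  GEOID   = c("36055000100", "36055000200", "36055000300"),
  POP2010 = c(2500, 4100, 0),
  stringsAsFactors = FALSE
)

# Hypothetical output of the point-in-polygon count: only tracts that contain
# at least one fast food point appear in this table.
counts <- data.frame(
  GEOID     = c("36055000100", "36055000200"),
  PNT_COUNT = c(2, 1),
  stringsAsFactors = FALSE
)

# Left join on the tract identifier so that tracts without any points are kept.
joined <- merge(tracts, counts, by = "GEOID", all.x = TRUE)

# Tracts missing from the count table have no intersecting points, so set them to 0.
joined$PNT_COUNT[is.na(joined$PNT_COUNT)] <- 0

joined
```

Keeping all tracts in the join matters here, since most tracts contain no fast food restaurant at all and would otherwise drop out of the analysis.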

Relevant summary statistics

fastfooddesert$ratio<-fastfooddesert$PNT_COUNT/fastfooddesert$POP2010
fastfooddesert$ratio[is.infinite(fastfooddesert$ratio)] <- 0
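
The replacement step above matters because a tract with zero population produces non-finite ratios. The following sketch uses toy vectors (`pnt_count` and `pop` are hypothetical, not taken from the dataset) to show which cases the `is.infinite()` replacement covers and which are left for `na.rm=TRUE` to handle.

```r
# Toy values (not from the dataset): the second tract has points but no residents,
# the third has neither points nor residents.
pnt_count <- c(3, 1, 0)
pop       <- c(1500, 0, 0)

ratio <- pnt_count / pop
ratio  # in R, 1/0 is Inf and 0/0 is NaN

# Same replacement as above: only infinite values are set to 0.
ratio[is.infinite(ratio)] <- 0

# NaN is not infinite, so it stays in place and has to be dropped with na.rm = TRUE.
is.nan(ratio)
mean(ratio, na.rm = TRUE)
```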

Sample Size

n<-nrow(fastfooddesert)
n
## [1] 11079

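Mean, median, and standard deviation of the explanatory variable. The data is strongly right-skewed: the median is 0, the mean lies above it, and the maximum is far above the third quartile.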

mean(fastfooddesert$ratio, na.rm=TRUE)
## [1] 0.0005180905
median(fastfooddesert$ratio, na.rm=TRUE)
## [1] 0
sd(fastfooddesert$ratio, na.rm=TRUE)
## [1] 0.02152387
quantile(fastfooddesert$ratio, na.rm=TRUE)
##           0%          25%          50%          75%         100% 
## 0.0000000000 0.0000000000 0.0000000000 0.0001693337 2.0000000000
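
The direction of the skew can be checked by comparing the mean with the median, or by computing a moment-based sample skewness. The sketch below uses a toy vector `x` (hypothetical values chosen to mimic the shape suggested by the quantiles, not the real data) and a small helper `skewness()` that is defined here for illustration rather than taken from a package.

```r
# Toy ratios (not from the dataset): mostly zeros with a few larger values.
x <- c(rep(0, 7), 0.0002, 0.001, 2)

# Moment-based sample skewness: third central moment divided by the
# standard deviation cubed (population form, dividing by n).
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^(3 / 2)
}

mean(x) > median(x)  # TRUE when the mean is pulled above the median
skewness(x) > 0      # TRUE for a long right tail
```

The same helper could be applied to `fastfooddesert$ratio`, since it drops missing values before computing the moments.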

Histogram and frequency table for the number of fast food chains per census tract, the count underlying the explanatory variable.

fastfood.freq = table(fastfooddesert$PNT_COUNT)
hist(fastfooddesert$PNT_COUNT, main="",
col="lightblue", breaks=7, xlab="Fast Food Chains")
fastfood.freq
## 
##    0    1    2    3    4    5    6    7    8    9 
## 7915 1885  718  321  145   54   31    8    1    1

Boxplot of census tract population grouped by the number of fast food chains in the tract.

boxplot(fastfooddesert$POP2010~fastfooddesert$PNT_COUNT,xlab="Fast Food Chains",ylab="Population")

Finally, we must create a categorical variable called “fooddesert” as the existing data does not suit our analysis perfectly. The following descriptions are the USDA definitions for these data.

halfmile: Low-income census tracts where a significant number or share of residents is more than 0.5 miles from the nearest supermarket.

onemile: Low-income census tracts where a significant number or share of residents is more than 1 mile from the nearest supermarket.

supplied: Residents are within 0.5 miles of a supermarket.

fastfooddesert$fooddesert <- fastfooddesert$LA1and10+fastfooddesert$LAhalfand1
fastfooddesert$fooddesert[fastfooddesert$fooddesert==2]<-"halfmile"
fastfooddesert$fooddesert[fastfooddesert$fooddesert==1]<-"onemile"
fastfooddesert$fooddesert[fastfooddesert$fooddesert==0]<-"supplied"
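
The step-by-step recoding above works because R coerces the numeric comparison value to a string once the first assignment has turned the column into character. A possible alternative, sketched here on a toy data frame (`toy` and its values are hypothetical), is to map the summed flags to labels in one step with `factor()`, using the same mapping as the code above.

```r
# Toy flags (not from the dataset); column names follow the ones used above.
toy <- data.frame(
  LA1and10   = c(1, 0, 0, 1),
  LAhalfand1 = c(1, 1, 0, 1)
)

# Sum of the two 0/1 flags, as in the code above.
score <- toy$LA1and10 + toy$LAhalfand1

# Map 0, 1, 2 to labels in a single step, with the same mapping as above.
toy$fooddesert <- factor(
  score,
  levels = c(0, 1, 2),
  labels = c("supplied", "onemile", "halfmile")
)

table(toy$fooddesert)
```

With a factor, `table()` lists the categories in level order rather than alphabetically, so the column order of the frequency table would differ from the character version.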

Frequency table for the response variable.

fooddesert.freq = table(fastfooddesert$fooddesert)
fooddesert.freq
## 
## halfmile  onemile supplied 
##     4174     3616     3289
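
To read these counts as shares of all tracts, they can be converted to proportions. The sketch below copies the three counts from the table above into a named vector (`fooddesert.counts` is a name introduced here for illustration) so that it runs without the dataset.

```r
# Counts copied from the frequency table above.
fooddesert.counts <- c(halfmile = 4174, onemile = 3616, supplied = 3289)

# The counts add up to the sample size reported earlier.
sum(fooddesert.counts)

# Share of tracts in each category.
round(prop.table(fooddesert.counts), 3)
```

With the full data loaded, `prop.table(fooddesert.freq)` would give the same shares directly from the table object.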