Final Project: The Price of Bread over Time in Different Cities
**Goal:** To compare the change in the price of a loaf of white bread from 2015 to 2020 in the US.
**Data Sources:** Price of white bread from Numbeo (2015 to 2020); ACS data for MHI (2015-2019 5-year estimates).
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(sf)
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(RColorBrewer)
library(viridis)
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
##
## viridis_pal
library(readxl)
library(tidycensus)
options(tigris_use_cache = TRUE)
#pull in the data
read_excel("Final R Data.xlsx")
## # A tibble: 25 x 3
## `1` `Honolulu, HI, United States` `4`
## <dbl> <chr> <dbl>
## 1 2 New York, NY, United States 3.5
## 2 3 Washington, DC, United States 3.5
## 3 4 Portland, OR, United States 3
## 4 5 San Jose, CA, United States 2.67
## 5 6 Los Angeles, CA, United States 2.62
## 6 7 Saint Louis, MO, United States 2.51
## 7 8 Boston, MA, United States 2.5
## 8 9 San Francisco, CA, United States 2.5
## 9 10 Hartford, CT, United States 2.5
## 10 11 Detroit, MI, United States 2.45
## # ... with 15 more rows
read.csv("state_postal_key.csv")
## ï..Postal State
## 1 AL Alabama
## 2 AK Alaska
## 3 AZ Arizona
## 4 AR Arkansas
## 5 CA California
## 6 CO Colorado
## 7 CT Connecticut
## 8 DE Delaware
## 9 DC District of Columbia
## 10 FL Florida
## 11 GA Georgia
## 12 HI Hawaii
## 13 ID Idaho
## 14 IL Illinois
## 15 IN Indiana
## 16 IA Iowa
## 17 KS Kansas
## 18 KY Kentucky
## 19 LA Louisiana
## 20 ME Maine
## 21 MD Maryland
## 22 MA Massachusetts
## 23 MI Michigan
## 24 MN Minnesota
## 25 MS Mississippi
## 26 MO Missouri
## 27 MT Montana
## 28 NE Nebraska
## 29 NV Nevada
## 30 NH New Hampshire
## 31 NJ New Jersey
## 32 NM New Mexico
## 33 NY New York
## 34 NC North Carolina
## 35 ND North Dakota
## 36 OH Ohio
## 37 OK Oklahoma
## 38 OR Oregon
## 39 PA Pennsylvania
## 40 RI Rhode Island
## 41 SC South Carolina
## 42 SD South Dakota
## 43 TN Tennessee
## 44 TX Texas
## 45 UT Utah
## 46 VT Vermont
## 47 VA Virginia
## 48 WA Washington
## 49 WV West Virginia
## 50 WI Wisconsin
## 51 WY Wyoming
state_keys<-read.csv("state_postal_key.csv")
#rename the dataframe of the price of bread
bread_2010<-read_excel("Final R Data.xlsx",
sheet = "2010", na = "-",col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2010 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2011<-read_excel("Final R Data.xlsx",
sheet = "2011", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2011 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2012<-read_excel("Final R Data.xlsx",
sheet = "2012", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2012 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
bread_2013<-read_excel("Final R Data.xlsx",
sheet = "2013", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2013 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2014<-read_excel("Final R Data.xlsx",
sheet = "2014", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2014 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2015<-read_excel("Final R Data.xlsx",
sheet = "2015",na = "-", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2015 =...3)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2016<-read_excel("Final R Data.xlsx",
sheet = "2016",na="-", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2016 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2017<-read_excel("Final R Data.xlsx",
sheet = "2017",na="-", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2017 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2018<-read_excel("Final R Data.xlsx",
sheet = "2018",na="-", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2018 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2019<-read_excel("Final R Data.xlsx",
sheet = "2019",na="-", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2019 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
bread_2020<-read_excel("Final R Data.xlsx",
sheet = "2020",na="-", col_names = FALSE)%>%
rename(rank=...1,
City=...2,
Loaf_of_White_Bread_2020 =...3,)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
#join the dataframe together
bread_2015_to_2020<-bread_2015%>%
left_join(bread_2016,by=c("City"))%>%
left_join(bread_2017,by=c("City"))%>%
left_join(bread_2018,by=c("City"))%>%
left_join(bread_2019,by=c("City"))%>%
left_join(bread_2020,by=c("City"))%>%
select(City,Loaf_of_White_Bread_2015,Loaf_of_White_Bread_2016, Loaf_of_White_Bread_2017,
Loaf_of_White_Bread_2018,Loaf_of_White_Bread_2019,Loaf_of_White_Bread_2020)%>%
separate(col=City, into =c('name','State','country'), sep = ", ")
# load all ACS variables
acs201519 <- load_variables(2019, "acs5", cache = T)
raw_income = get_acs(geography = "place",
variables = "B07011_001",
geometry = FALSE,
year = 2019)
## Getting data from the 2015-2019 5-year ACS
# process the data into a dataframe
national_income <- raw_income %>%
rename(median_income = estimate,
income_moe = moe)%>%
separate(col=NAME, into =c('name','State'), sep = ", ")%>%
# strip only a trailing place-type suffix, so names such as "Oklahoma City" or "Georgetown" stay intact
mutate(name =str_remove(name,"\\s+(city|town|village)$"))%>%
mutate(name=str_trim(name,side="both"))
#join the median income and bread price data frame
bread_keys<-bread_2015_to_2020 %>%
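# join on the first column of state_keys (the postal code); its header may carry a byte-order-mark prefix
left_join(state_keys, by=c("State"= names(state_keys)[1]))%>%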
rename(Postal=State,State=State.y)%>%
select(-Postal,-country)%>%
mutate(name=str_trim(name,side="both"))
national_price<-bread_keys %>%
left_join(national_income, by=c("State","name"))%>%
filter(!is.na(median_income))%>%
filter(!is.na(Loaf_of_White_Bread_2019))%>%
mutate(Loaf_of_White_Bread_2019=round(Loaf_of_White_Bread_2019,2))
#scatter plot
max_plot <- national_price %>%
ggplot(aes(x = Loaf_of_White_Bread_2019, y = median_income)) +
geom_point()+
scale_y_continuous(labels = dollar_format(accuracy = 1))+
labs(x = "Cost of Bread(USD)", y = "Median Income",
title = "The Relationship between Cost of Bread and Median Income in 124 American Cities",
caption = "Sources: Numbeo 2019 and ACS, 2019")
max_plot
#bar chart
p<-ggplot(data=national_price, aes(x=name, y=Loaf_of_White_Bread_2019)) +
geom_col() +
# scale_y_continuous(labels = dollar_format(accuracy = 1))+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
labs(x = "City", y = "Price of a Loaf of Bread in 2019 ($)",
caption = "Sources: Numbeo 2019 and ACS, 2019")
p
Interesting Data Points: Some of the Highest Prices were:
Price of a loaf of bread in New York City:
Summary
This project tracked the price of bread in 124 American cities from 2015 to 2020. The 124 cities were chosen to give a good representation of a diverse set of cities in the United States. The questions this project aims to answer are: how has the price of bread in the United States changed over time, and how has it changed in different cities? Some research suggests that bread prices may be tied to political stability. Because the available data has many dimensions, it could have been pulled together in a number of ways. One alternative would have been to use a smaller set of cities and examine in more depth how the price of bread changed over time in each of them.
This project is important because the price of food may have been affected by the COVID-19 pandemic, the climate crisis, and general inflation. Geography may also affect the results: cities such as Honolulu and Anchorage, which are more remote, might have more expensive food. The cost of living may also raise the price of basic goods, since cities such as New York are considered expensive because of the high value of land. Natural disasters may also make food such as bread more expensive, because a tornado or hurricane can damage vital infrastructure.
The data sources were the ACS census data from 2015-2019 and data taken from Numbeo.com. The data could be important for researchers interested in economics, particularly the relationship between the price of inelastic goods and median income, and for researchers studying the cost of living and food security. The expected results are that the price of bread will be higher in already expensive cities such as New York and Los Angeles, and cheaper in smaller cities.
Coastal cities may also tend to be more expensive. The method was to import the data sets and then transform them until they were ready to be joined together. The analysis makes several things clear. One fact the data reinforces is that expensive cities tend to be on the coast, and that the cost of living in places with high bread prices was already high before the effects of COVID.
The results of this analysis could be useful to a wide array of researchers. They could primarily interest researchers who study economics and large-scale indicators such as GDP. They may also interest agricultural researchers who want to better understand the price and production of wheat and grain. Bread prices may be useful to researchers, activists, planners, and designers who work to fight food insecurity. The data could be analyzed further to examine geography as one of the factors that affects the price of bread, or to research how the climate crisis is affecting the price of bread. This data does not go into detail on that, but it could provide a crucial piece of information. Lastly, this data could be valuable to political scientists and sociologists who are interested in political stability, because the price of bread and grain has been identified as a major factor in political upheavals, with the Arab Spring in 2011 being a major modern example. Much of the Middle East’s grain is imported from Russia, which was suffering from a drought the year prior.
The next steps for this data would be to:
- Dive deeper into the reasons for price disparities: This would investigate whether higher bread prices are simply the result of a higher cost of living, or of another reason or several reasons.
- Find new ways to visualize the data set: Using different types of charts may provide insights that would be difficult to identify with a bar chart or scatter plot.
- Analyze other types of bread, food, and foodstuffs: Bread is just one food to be analyzed. Fruit or meat prices may also provide data to build a picture of the state of food economics in the United States.
- Create a scatter plot and a bar chart for other cities: Using more cities than were used in this project may provide a larger accumulation of data, giving a better sense of the scale of bread prices over time.
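One possible way to look at change over time, as a minimal sketch rather than part of the original analysis: reshape a wide table like `bread_2015_to_2020` into long form and compute each city's price change between the first and last year. The data frame below and its prices are made up for illustration, and the names `bread_wide_demo`, `bread_long_demo`, and `price_change_demo` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Made-up prices in the same wide layout as bread_2015_to_2020 (illustration only).
bread_wide_demo <- data.frame(
  name = c("City A", "City B"),
  Loaf_of_White_Bread_2015 = c(2.00, 3.00),
  Loaf_of_White_Bread_2020 = c(2.50, 2.70),
  stringsAsFactors = FALSE
)

# One row per city and year instead of one column per year.
bread_long_demo <- bread_wide_demo %>%
  pivot_longer(cols = starts_with("Loaf_of_White_Bread_"),
               names_to = "year",
               names_prefix = "Loaf_of_White_Bread_",
               values_to = "price") %>%
  mutate(year = as.integer(year))

# Price in the latest year minus price in the earliest year, per city.
price_change_demo <- bread_long_demo %>%
  group_by(name) %>%
  summarise(change = price[year == max(year)] - price[year == min(year)],
            .groups = "drop")
```

A long table of this shape could also be passed to `ggplot()` with `geom_line()` to draw one line per city, which is one of the alternative visualizations suggested in the list.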
Discussion of results:
Bar chart analysis:
Through an analysis of the bar chart, it can be seen that some of the most expensive cities in which to buy a loaf of bread are New York, San Diego, Sacramento, and Syracuse, where prices range from around $3.50 to $4.00 and above. Some of the cities where bread was cheapest were Buffalo, Las Vegas, Ann Arbor, and Grand Rapids, where prices ranged from $1.75 to $2.00. The main trend is that cities on the coast are more expensive, with cities like New York already being notoriously expensive, while cities in the Rust Belt and the Southwest are cheaper. Most of the cities fell between $2.00 and $3.50. This may indicate that bread prices are broadly similar across the country, with small differences in the cost of living responsible for the slight differences in price.
Scatter plot analysis:
The scatter plot shows the general trend in bread prices. For the most part, the price of bread is spread evenly across the cities, with many cities falling between $2.50 and $3.00 or between $3.00 and $3.50. There are some outliers in median income: Naples, Florida has a median income of $50,548, and Irvine, California and Fairfax, Virginia also had median incomes of over $50,000. New York’s median income is $32,320, which is surprisingly low compared to other cities. The conceptualization of this project and its expected results raise some questions, one of which concerns median income. Initially, I was interested in tracing any connection between the COVID-19 pandemic and the price of bread. In the results of this project, bread prices generally increased but occasionally decreased, as in New York. This may be attributed to general inflation, but more data is needed to reach a decisive conclusion.
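The relationship shown in the scatter plot could also be summarized numerically. The sketch below is an assumption about how that might be done, not something the original analysis ran: it computes a Pearson correlation and a simple linear fit on a small made-up table that uses the same column names as `national_price`. The values and the name `national_price_demo` are hypothetical.

```r
# Illustrative values only; they are not taken from the project data.
national_price_demo <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  Loaf_of_White_Bread_2019 = c(2.00, 2.50, 3.00, 3.50, 4.00),
  median_income = c(30000, 34000, 31000, 41000, 45000),
  stringsAsFactors = FALSE
)

# Pearson correlation between bread price and median income.
price_income_cor <- cor(national_price_demo$Loaf_of_White_Bread_2019,
                        national_price_demo$median_income)

# Simple linear fit: the slope estimates the difference in median income
# associated with a one-dollar difference in the price of bread.
fit <- lm(median_income ~ Loaf_of_White_Bread_2019, data = national_price_demo)
coef(fit)
```

With the real `national_price` table substituted in, a correlation near zero would support the reading that prices are spread evenly regardless of income, while a clearly positive value would support a link to the cost of living.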
Methods Appendix:
The data set I used for this research was taken from Numbeo.com. Numbeo is an open source of data that features information about the cost of living. The data set I used covers the cost of living in hundreds of cities across the globe. The data starts in 2010, when it covered only around 20-30 cities. By 2020, Numbeo had bread price data from hundreds of cities around the world. I chose 124 cities because those were the cities available for the years 2015-2020. The second data set was the 2019 American Community Survey (ACS) from the census, from which I used median income. The Numbeo data came in the form of an Excel spreadsheet, while the ACS data came in the form of a CSV file. Some assumptions underlie my research: that the price of bread is properly represented by Numbeo, and that other factors such as taxes and price gouging do not apply. Another assumption is that each price is an average across all of the supermarkets and grocery stores in a city, and not the price from a single store, since a store such as Whole Foods will have more expensive food than Walmart, for example.
The methodology of this project was to join several data sets together to produce charts. The data set chosen was the Numbeo cost of living data from 2015 to 2020. It was chosen because the 2015 data set had over 100 cities and those years had enough data to make several graphs. The data set was imported into Excel and transformed to be joinable. The median income data was also imported. The Excel data was joined together, and its column titles were changed to match the median income data. The data was joined on the shared columns, which were the name of the state and the name of the city. Each city now has a median price of bread for 2015 to 2020 and the median income for that city. Once joined, the data was used to make graphs, which were then modified so that any cities without data were excluded. The most time-consuming part of this process was completing the several joins.
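The join steps described above can be sketched compactly. This is a minimal illustration under stated assumptions, not the exact code used in the project: the per-year tables and the income table are small made-up stand-ins, the income table is keyed by postal code for brevity (the project joined on full state names), and the object names `bread_by_year`, `bread_wide`, `income`, and `bread_income` are hypothetical.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-ins for the per-year sheets; the prices are made up.
bread_by_year <- list(
  data.frame(City = c("Boston, MA, United States", "Austin, TX, United States"),
             Loaf_of_White_Bread_2015 = c(2.50, 2.10),
             stringsAsFactors = FALSE),
  data.frame(City = c("Boston, MA, United States", "Austin, TX, United States"),
             Loaf_of_White_Bread_2016 = c(2.60, 2.15),
             stringsAsFactors = FALSE),
  data.frame(City = "Boston, MA, United States",
             Loaf_of_White_Bread_2017 = 2.75,
             stringsAsFactors = FALSE)
)

# Fold the list into one wide table, joining on City at every step,
# then split the City string into its three parts.
bread_wide <- bread_by_year %>%
  reduce(function(x, y) left_join(x, y, by = "City")) %>%
  separate(col = City, into = c("name", "State", "country"), sep = ", ")

# Hypothetical income table keyed by the same city and state columns.
income <- data.frame(name = c("Boston", "Austin"),
                     State = c("MA", "TX"),
                     median_income = c(40000, 38000),
                     stringsAsFactors = FALSE)

# Cities missing from a given year keep their row and get NA for that year.
bread_income <- left_join(bread_wide, income, by = c("name", "State"))
```

Folding a list of tables with `reduce()` avoids writing one `left_join()` per year, and because every join is a left join, the set of cities is fixed by the first table in the list.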
Data used:
ACS: https://www.census.gov/programs-surveys/acs/news/updates/2019.html
Conclusion
Overall, this data was taken over a four-year period from over 120 American cities. Initially, this research was conducted with the intention of connecting the effects of the COVID-19 pandemic to the price of bread. Bread was chosen because it is considered a relatively inelastic good and is part of the idea of a "market basket" good. The data showed that bread in coastal cities such as New York and San Diego was overall more expensive than in cities closer to the hinterland, such as Ann Arbor and Las Vegas. The data also showed a trend between median income and the price of bread. Most bread prices fell within a particular range, between $2.50 and $3.00. Three major cities were outliers in the price of bread: San Diego, New York, and San Francisco. Those cities are already notoriously expensive. The analysis suggests that the price of bread shown here may not reflect any effect of COVID, because the data used is from 2019, but that it is affected by the cost of living in expensive areas.