I found the dataset on this Kaggle page.
In 2006, global concern was raised over the rapid decline in the honeybee population, an integral component of American honey agriculture. Large numbers of hives were lost to Colony Collapse Disorder, a phenomenon in which worker bees disappear, causing the remaining hive colony to collapse. Speculation about the cause of this disorder points to hive diseases and pesticides harming the pollinators, though no overall consensus has been reached. Twelve years later, some industries are observing recovery, but the American honey industry is still largely struggling. The U.S. used to produce over half the honey it consumes per year locally. Now, honey mostly comes from overseas, with 350 of the 400 million pounds of honey consumed every year originating from imports. This dataset provides insight into honey production supply and demand in America by state from 1998 to 2012.
NASS
NASS (National Agricultural Statistics Service) is part of the United States Department of Agriculture (USDA). NASS conducts many surveys every year and prepares reports covering virtually every aspect of U.S. agriculture, such as production and supplies of food and fiber, prices paid and received by farmers, farm labor and wages, farm finances, chemical use, and changes in U.S. demographics. Those are only a few of its activities. NASS is largely responsible for collecting data on the agricultural industry. Since the organization is backed by the U.S. government, its datasets can be considered reliable and accurate.
Access to these datasets helps a large number of people working in the agriculture industry. However, analyzing the data is beneficial not only for those people but also for intermediaries and consumers. Analyzing historical data can show what has been happening and suggest what may happen next. This analysis is therefore for people who produce, sell, and consume honey.
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.3.0
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::vars() masks ggplot2::vars()
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(stringr)
honey <- read.csv("honeyproduction.csv")
code <- read.csv("state_code.csv")
honey$state <- as.factor(honey$state)
honey %>% str()
## 'data.frame': 626 obs. of 8 variables:
## $ state : Factor w/ 44 levels "AL","AR","AZ",..: 1 3 2 4 5 6 7 8 10 11 ...
## $ numcol : num 16000 55000 53000 450000 27000 230000 75000 8000 120000 9000 ...
## $ yieldpercol: int 71 60 65 83 72 98 56 118 50 71 ...
## $ totalprod : num 1136000 3300000 3445000 37350000 1944000 ...
## $ stocks : num 159000 1485000 1688000 12326000 1594000 ...
## $ priceperlb : num 0.72 0.64 0.59 0.62 0.7 0.64 0.69 0.77 0.65 1.19 ...
## $ prodvalue : num 818000 2112000 2033000 23157000 1361000 ...
## $ year : int 1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
# One row per year: the mean of each measure across states.
# summarise() replaces mutate() + select() so rows are not duplicated per state.
honey.year <- honey %>%
  group_by(year) %>%
  summarise(
    numcol.year = mean(numcol),
    yieldpercol.year = mean(yieldpercol),
    totalprod.year = mean(totalprod),
    stocks.year = mean(stocks),
    priceperlb.year = mean(priceperlb),
    prodvalue.year = mean(prodvalue))
# Reshape to long format: one row per (year, measure) pair for faceting
honey.year <- honey.year %>% gather(key = "type", value = "value", -year)
label <- c(
  "numcol.year" = "Honey producing colonies",
  "priceperlb.year" = "Average price per pound",
  "prodvalue.year" = "Value of production",
  "stocks.year" = "Stocks held by producers",
  "totalprod.year" = "Total production (pounds)",
  "yieldpercol.year" = "Honey yield per colony"
)
honey.year %>%
  ggplot(aes(x = year, y = value, group = type, color = type)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~type, scales = "free", labeller = as_labeller(label)) +
  # Dotted red line marks 2006, the year concern over Colony Collapse Disorder was raised
  geom_vline(xintercept = 2006, color = "red",
             linetype = "dotted", size = 1.3) +
  labs(y = "")
Each panel plots the yearly mean across the reporting states. Average price per pound and value of production show a clear upward trend over time. After 2006, the price rose rapidly, and so did the value of production.
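To put a number on a trend like this, one option is to express each year's mean price as a percent change relative to 2006. The following is only a minimal sketch in base R on made-up toy data; the `toy` data frame and the `pct_vs_2006` column are hypothetical names, not part of the original analysis.

```r
# Toy data standing in for the honey data frame (values are made up).
toy <- data.frame(
  year       = c(2005, 2005, 2006, 2006, 2007, 2007),
  priceperlb = c(0.90, 1.10, 1.00, 1.20, 1.30, 1.50)
)

# Mean price per year across states.
yearly <- aggregate(priceperlb ~ year, data = toy, FUN = mean)

# Percent change of each year's mean relative to the 2006 mean.
base <- yearly$priceperlb[yearly$year == 2006]
yearly$pct_vs_2006 <- 100 * (yearly$priceperlb - base) / base

print(yearly)
```

Replacing `toy` with the real `honey` data frame should give the same kind of table for the actual prices.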
honey <- merge(honey, code, by.x = "state", by.y = "State_Code")
state.production <- honey %>%
  ggplot(aes(x = year, y = totalprod, color = State)) +
  geom_line(show.legend = FALSE) +
  labs(title = "Honey Production from 1998 to 2012 by State")
state.production %>% ggplotly()
As the plot shows, North Dakota and South Dakota have led honey production over time. They are the top two honey-producing states.
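A ranking like this can also be checked numerically rather than read off the plot. Below is a rough sketch in base R that sums production per state over all years and sorts the result; the toy values are invented, and `totals` and `top2` are hypothetical names used only for illustration.

```r
# Toy data with the same column names as the merged honey data (made-up values).
toy <- data.frame(
  State     = c("North Dakota", "South Dakota", "Florida",
                "North Dakota", "South Dakota", "Florida"),
  year      = c(1998, 1998, 1998, 2012, 2012, 2012),
  totalprod = c(30, 20, 25, 35, 18, 12)
)

# Sum production per state over all years, then sort from largest to smallest.
totals <- aggregate(totalprod ~ State, data = toy, FUN = sum)
totals <- totals[order(-totals$totalprod), ]

# Names of the two largest producers in the toy data.
top2 <- as.character(head(totals$State, 2))
print(top2)
```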
honey %>%
  filter(year %in% c(1998, 2012)) %>%
  arrange(year) %>%   # order rows so each arrow points from 1998 to 2012
  ggplot(aes(x = totalprod, y = reorder(State, totalprod))) +
  geom_path(color = "red",
            arrow = arrow(length = unit(1.5, "mm"),
                          type = "closed")) +
  labs(title = "Change in Total Production from 1998 to 2012",
       y = "State", x = "Production (pounds)")
Production decreased in almost all states between 1998 and 2012; only North Dakota recorded a significant increase.
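The arrows can be complemented by a small table of the change per state. This is a sketch under assumptions: it uses base R and invented toy values, and the `first`, `last`, and `change` objects are hypothetical helpers, not part of the original code.

```r
# Toy data with the same column names as the merged honey data (made-up values).
toy <- data.frame(
  State     = rep(c("North Dakota", "Florida", "California"), times = 2),
  year      = rep(c(1998, 2012), each = 3),
  totalprod = c(30, 25, 40, 35, 12, 15)
)

# Split into the two end years and join them side by side by state.
# merge() is an inner join, so a state missing either year is dropped.
first  <- toy[toy$year == 1998, c("State", "totalprod")]
last   <- toy[toy$year == 2012, c("State", "totalprod")]
change <- merge(first, last, by = "State", suffixes = c(".1998", ".2012"))

# Absolute and percent change from 1998 to 2012.
change$diff <- change$totalprod.2012 - change$totalprod.1998
change$pct  <- 100 * change$diff / change$totalprod.1998

# Sort from largest decrease to largest increase.
change <- change[order(change$diff), ]
print(change)
```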
us <- map_data('state')
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
us$region <- str_to_title(us$region)
us <- fortify(us)
ggplot() +
  geom_map(data = us, map = us,
           aes(x = long, y = lat, group = group, map_id = region),
           fill = "white", colour = "black") +
  geom_map(data = honey %>% filter(year == 2012),
           map = us,
           aes(fill = totalprod, map_id = State),
           colour = "black") +
  scale_fill_continuous(high = "red", low = "yellow", guide = "colorbar") +
  labs(title = "Honey Production in 2012")
## Warning: Ignoring unknown aesthetics: x, y
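The warning appears because `geom_map()` does not use `x` and `y` aesthetics; it takes its coordinates from the `map` argument. One common way to avoid it, shown in the ggplot2 documentation, is to set the plot limits with `expand_limits()` instead. The sketch below uses a tiny hand-made map of two squares so that it runs without the maps package; `toy_map` and `values` are hypothetical stand-ins for `us` and the 2012 honey data.

```r
library(ggplot2)

# Hand-made "map": two unit squares named A and B (stand-in for map_data('state')).
toy_map <- data.frame(
  long   = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  group  = rep(c(1, 2), each = 4),
  order  = 1:8,
  region = rep(c("A", "B"), each = 4)
)

# Made-up values to fill the regions with (stand-in for the 2012 honey data).
values <- data.frame(State = c("A", "B"), totalprod = c(10, 20))

# Only map_id and fill are mapped in geom_map();
# expand_limits() supplies the x/y range taken from the map itself.
p <- ggplot(values, aes(map_id = State)) +
  geom_map(aes(fill = totalprod), map = toy_map, colour = "black") +
  expand_limits(x = toy_map$long, y = toy_map$lat) +
  scale_fill_continuous(high = "red", low = "yellow")

print(p)
```

With the real data, the same pattern would presumably use `us` as the map and `honey %>% filter(year == 2012)` as the data, though states without honey data would then need a separate background layer.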