1 Introduction

I found the dataset on Kaggle.

In 2006, global concern was raised over the rapid decline in the honeybee population, an integral component of American honey agriculture. Large numbers of hives were lost to Colony Collapse Disorder, a phenomenon in which worker bees disappear and the remaining colony collapses. Speculation about the cause of this disorder points to hive diseases and pesticides harming the pollinators, though no overall consensus has been reached. Twelve years later, some industries are observing recovery, but the American honey industry is still largely struggling. The U.S. used to produce locally over half the honey it consumes per year. Now honey mostly comes from overseas, with 350 of the 400 million pounds of honey consumed every year originating from imports. This dataset provides insight into honey production, supply, and demand in America by state from 1998 to 2012.

2 Data Source

NASS

NASS (National Agricultural Statistics Service) is part of the United States Department of Agriculture (USDA). NASS conducts many surveys every year and prepares reports covering virtually every aspect of U.S. agriculture, such as production and supplies of food and fiber, prices paid and received by farmers, farm labor and wages, farm finances, chemical use, and changes in the demographics of U.S. producers. These are only a few of its activities. NASS is largely responsible for collecting data on the agricultural industry. Because it is a U.S. government agency, its datasets are generally considered reliable and accurate.

3 Research Questions

4 Stakeholders of this analysis

Access to these datasets benefits a large number of people working in the agriculture industry. However, analyzing the dataset is beneficial not only for those people but also for intermediaries and consumers. Analyzing historical data shows what has been going on and can suggest what may happen next. This analysis is therefore for people who produce, sell, and consume honey.

5 Analysis

5.1 Preparation

library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4     
## ✔ tibble  1.4.2          ✔ dplyr   0.7.4     
## ✔ tidyr   0.8.0          ✔ stringr 1.3.0     
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ dplyr::vars()   masks ggplot2::vars()
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(stringr)

honey <- read.csv("honeyproduction.csv")
code <- read.csv("state_code.csv")
honey$state <- as.factor(honey$state)

5.2 Data Component

honey %>% str()
## 'data.frame':    626 obs. of  8 variables:
##  $ state      : Factor w/ 44 levels "AL","AR","AZ",..: 1 3 2 4 5 6 7 8 10 11 ...
##  $ numcol     : num  16000 55000 53000 450000 27000 230000 75000 8000 120000 9000 ...
##  $ yieldpercol: int  71 60 65 83 72 98 56 118 50 71 ...
##  $ totalprod  : num  1136000 3300000 3445000 37350000 1944000 ...
##  $ stocks     : num  159000 1485000 1688000 12326000 1594000 ...
##  $ priceperlb : num  0.72 0.64 0.59 0.62 0.7 0.64 0.69 0.77 0.65 1.19 ...
##  $ prodvalue  : num  818000 2112000 2033000 23157000 1361000 ...
##  $ year       : int  1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
  • state: Two-letter code standing for each state across America
  • numcol: Number of honey-producing colonies. Honey-producing colonies are the maximum number of colonies from which honey was taken during the year. It is possible to take honey from colonies which did not survive the entire year.
  • yieldpercol: Honey yield per colony. Unit is pounds
  • totalprod: Total production (numcol x yieldpercol). Unit is pounds
  • stocks: Refers to stocks held by producers. Unit is pounds
  • priceperlb: Refers to average price per pound based on expanded sales. Unit is dollars.
  • prodvalue: Value of production (totalprod x priceperlb). Unit is dollars.
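
The two derived columns can be checked against their definitions. The sketch below is a minimal check, not part of the original analysis: it rebuilds the first three 1998 rows printed by str() above (AL, AZ, AR) and recomputes totalprod and prodvalue. The names sample_rows, totalprod.check, and prodvalue.check are made up for illustration, and the tolerance assumes that prodvalue is reported rounded to the nearest thousand dollars.

```r
# First three rows from the str() output above (AL, AZ, AR in 1998).
sample_rows <- data.frame(
  state       = c("AL", "AZ", "AR"),
  numcol      = c(16000, 55000, 53000),
  yieldpercol = c(71, 60, 65),
  totalprod   = c(1136000, 3300000, 3445000),
  priceperlb  = c(0.72, 0.64, 0.59),
  prodvalue   = c(818000, 2112000, 2033000)
)

# Recompute the derived columns from their definitions.
sample_rows$totalprod.check <- sample_rows$numcol * sample_rows$yieldpercol
sample_rows$prodvalue.check <- sample_rows$totalprod * sample_rows$priceperlb

# Assumption: prodvalue is rounded to the nearest thousand dollars,
# so the comparison allows a difference of up to 1000.
totalprod.ok <- all(sample_rows$totalprod.check == sample_rows$totalprod)
prodvalue.ok <- all(abs(sample_rows$prodvalue.check - sample_rows$prodvalue) < 1000)

print(sample_rows[, c("state", "totalprod.check", "prodvalue.check")])
print(c(totalprod.ok = totalprod.ok, prodvalue.ok = prodvalue.ok))
```

The same two comparisons could be run on the full honey data frame to look for rows where the reported and recomputed values disagree.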

5.3 Time series Analysis

5.3.1 How has the honey industry changed?

# Average each measure across states within a year (one row per year).
# summarise() is used instead of mutate() so that the yearly means are not
# repeated once per state.
honey.year <- honey %>% 
  group_by(year) %>% 
  summarise(
    numcol.year = mean(numcol),
    yieldpercol.year = mean(yieldpercol),
    totalprod.year = mean(totalprod),
    stocks.year = mean(stocks),
    priceperlb.year = mean(priceperlb),
    prodvalue.year = mean(prodvalue))

# Reshape to long format: one row per (year, measure) pair
honey.year <- honey.year %>% gather(key = "type", value = "value", -year)

label <- c(
  "numcol.year" = "Honey producing colonies",
  "priceperlb.year" = "Average price per pound",
  "prodvalue.year" = "Value of production",
  "stocks.year" = "Stocks held by producers",
  "totalprod.year" = "Total production (pounds)",
  "yieldpercol.year" = "Honey yield per colony"
)

honey.year %>% 
  ggplot(aes(x = year, y = value, group = type, color = type)) + 
  geom_line(show.legend = F) + 
  facet_wrap(~type, scales = "free", labeller = as_labeller(label)) + 
  geom_vline(xintercept = 2006, color = "red", 
             linetype = "dotted", size = 1.3) + 
  labs(y = "")

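Average price per pound and value of production show a clear upward trend over time. After 2006 (the dotted red line), price increased rapidly, and so did the value of production. Note that each panel shows the mean across states for a given year, not a national total.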
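
The panels above use simple means across states, so a small producer counts as much as a large one. For price in particular, a production-weighted average may be closer to what a typical pound of honey sold for. The sketch below illustrates the difference on the three 1998 rows shown in the str() output (AL, AZ, AR). It is an alternative to consider, not the method used in this analysis, and the names sample_rows, price.simple, and price.weighted are made up for illustration.

```r
# Three 1998 rows copied from the str() output (AL, AZ, AR).
sample_rows <- data.frame(
  state      = c("AL", "AZ", "AR"),
  totalprod  = c(1136000, 3300000, 3445000),
  priceperlb = c(0.72, 0.64, 0.59)
)

# Simple mean: every state gets the same weight.
price.simple <- mean(sample_rows$priceperlb)

# Production-weighted mean: each state's price is weighted by pounds produced.
price.weighted <- weighted.mean(sample_rows$priceperlb, w = sample_rows$totalprod)

print(c(simple = price.simple, weighted = price.weighted))
```

In this small sample the weighted price comes out lower than the simple mean, because the two larger producers (AZ and AR) also have the lower prices.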

5.4 Which states produce the most and the least

honey <- merge(honey, code, by.x = "state", by.y = "State_Code")

# One line per state; the legend is hidden because plotly shows the state on hover
state.production <- honey %>% 
  ggplot(aes(x = year, y = totalprod, color = State)) + 
  geom_line(show.legend = FALSE) + 
  labs(title = "Honey Production from 1998 to 2012 by state")
state.production %>% ggplotly()

As the plot shows, North Dakota and South Dakota have been leading honey production over time. They are the top two honey-producing states.
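
One thing to be aware of in this section: merge() performs an inner join by default, so any row of honey whose state code has no match in the lookup table is dropped without a warning and would be missing from the plot. A minimal sketch of a check follows. The small honey.demo and code.demo data frames are made up for illustration (the lookup reuses the State_Code and State column names from above and deliberately omits one code); they are not the contents of state_code.csv.

```r
# Toy stand-ins for `honey` and `code` (illustrative, not the real files).
honey.demo <- data.frame(
  state     = c("AL", "AZ", "AR"),
  totalprod = c(1136000, 3300000, 3445000),
  stringsAsFactors = FALSE
)
code.demo <- data.frame(
  State_Code = c("AL", "AZ"),            # "AR" deliberately left out
  State      = c("Alabama", "Arizona"),
  stringsAsFactors = FALSE
)

# State codes that an inner join would lose.
unmatched <- setdiff(honey.demo$state, code.demo$State_Code)
print(unmatched)

# Default merge(): inner join, the unmatched row disappears.
inner <- merge(honey.demo, code.demo, by.x = "state", by.y = "State_Code")

# all.x = TRUE keeps every honey row; State is NA where no match exists.
left <- merge(honey.demo, code.demo, by.x = "state", by.y = "State_Code",
              all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

Running the setdiff() line on the real honey and code data frames before merging would show whether any state is being dropped.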

5.4.1 Production change from 1998 to 2012

honey %>% filter(year %in% c(1998,2012))%>% 
  arrange(year) %>% 
  ggplot(aes(x = totalprod, y = reorder(State, totalprod))) + 
  geom_path(color = "red", 
            arrow = arrow(length = unit(1.5, "mm"), 
                          type = "closed")) +
  labs(title = "Change of Total Production from 1998 to 2012", 
       y = "State", x = "Production (pounds)")

Production decreased in almost all states. North Dakota is the only state that recorded a significant increase.

5.4.2 U.S. map

us <- map_data('state')
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
us$region <- str_to_title(us$region)
us <- fortify(us)

ggplot() + 
  # Base layer: outline of every state
  geom_map(data=us, map=us,
           aes(map_id=region),
           fill="white", colour="black") +
  # Honey layer: states with 2012 data, filled by total production
  geom_map(data = honey %>% filter(year == 2012), 
           map=us,
           aes(fill=totalprod, map_id=State),
           colour="black") +
  # geom_map() ignores x/y aesthetics, so set the plot limits explicitly
  expand_limits(x = us$long, y = us$lat) +
  scale_fill_continuous(high="red", low="yellow", guide="colorbar") +
  labs(title = "Honey Production in 2012")