Project_2_Dataset_3_Fox

Author

Amanda Fox

Published

March 3, 2024

2. Shift between “consumed where produced” vs. available for sale from 1909 to present

This analysis compares liquid milk consumed at the point of production with liquid milk available for sale.

df_avail_con <- filter(df_tidy, category %in% c("tot_consumed_on_site","tot_bev_sales"))

# Display data table: create a wide tidy format, one obs = one year, to calc % trends

df_avail_sum <- filter(df_avail_con, yr %in% c(1920,1940,1960,1980,2000,2020)) %>% 
  pivot_wider(names_from = category,values_from = gallons) %>% 
  mutate(percent_consumed_on_site = tot_consumed_on_site/(tot_consumed_on_site + tot_bev_sales))

df_avail_sum
# A tibble: 6 × 4
     yr tot_bev_sales tot_consumed_on_site percent_consumed_on_site
  <dbl>         <dbl>                <dbl>                    <dbl>
1  1920          18.1               15.9                    0.468  
2  1940          21                 13                      0.382  
3  1960          30.7                3.2                    0.0944 
4  1980          27.1                0.5                    0.0181 
5  2000          22.7                0.100                  0.00439
6  2020          16.3                0                      0      
# Create plot: long tidy dataframe, one obs = one year/category pair

ggplot(df_avail_con, aes(x = yr, y = gallons, fill = category)) +
  geom_col() +
  scale_fill_brewer(palette = "Paired", name = "Category", labels = c("Total Available for Sale","Consumed Where Produced")) +
  scale_x_continuous(breaks = breaks_width(10))+
  scale_y_continuous(n.breaks=20, labels = scales::label_comma()) +
  xlab("Year") +
  ylab("Total Gallons Per Capita") +
  ggtitle("Total Gallons Per Capita: Liquid Milk Availability by Location Consumed")

Note that in 1920, almost half (46.8%) of liquid milk was consumed at the point of production. That share fell below 10% by 1960, and by 2020 it was no longer reported.

Interestingly, the total gallons per capita available for sale in 2020 (16.3) was lower than in 1920 (18.1), even though in 1920 46.8% of milk was consumed at the point of production and therefore excluded from this metric.
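To put the comparison above in numbers, here is a minimal base-R sketch that contrasts the change in milk available for sale with the change in total availability between 1920 and 2020. The values are copied from the rounded summary table, so the results are approximate; `milk` and `pct_change` are hypothetical names used only for this illustration and are not part of the original analysis.

```r
# Values copied from the 1920 and 2020 rows of the summary table (gallons per capita)
milk <- data.frame(
  yr = c(1920, 2020),
  tot_bev_sales = c(18.1, 16.3),
  tot_consumed_on_site = c(15.9, 0)
)

# Total availability = milk sold + milk consumed where produced
milk$tot_avail <- milk$tot_bev_sales + milk$tot_consumed_on_site

# Hypothetical helper: percent change from the first to the last element
pct_change <- function(x) 100 * (x[length(x)] - x[1]) / x[1]

change_for_sale <- pct_change(milk$tot_bev_sales)
change_total <- pct_change(milk$tot_avail)

round(c(for_sale = change_for_sale, total = change_total), 1)
```

With these rounded inputs, availability for sale falls by roughly 10%, while total availability (sales plus on-site consumption) falls by roughly half.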

3. Estimated consumption per capita trend

As mentioned in the introduction, the USDA provides broad estimates of post-production (retail and consumer) waste, which it uses to estimate loss-adjusted availability, a proxy for what we might call actual consumption: https://www.ers.usda.gov/webdocs/DataFiles/50472/Dairy.xlsx?v=2937.9

There are multiple caveats to this estimate, as this data is complex and rarely tracked (when was the last time you weighed your food waste?). Extensive documentation is available here: https://www.ers.usda.gov/data-products/food-availability-per-capita-data-system/loss-adjusted-food-availability-documentation/#usefulness

This data is hard to obtain but quite valuable in understanding not only patterns of waste but also patterns in actual American nutrition compared to recommendations or to historical patterns (example study summary: https://www.ers.usda.gov/webdocs/publications/82220/eib166%20summary.pdf?v=4270.2).

By applying the USDA retail + consumer waste estimates, we find the following trend:

df_loss_pct <- read_csv("https://raw.githubusercontent.com/AmandaSFox/DATA607/main/project_2/Dataset_3_Milk/loss_pct.csv")
Rows: 6 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): yr, pct_loss

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_avail_adj <- filter(df_avail_con, yr %in% c(1970,1980,1990,2000,2010,2020)) %>% 
  pivot_wider(names_from = category,values_from = gallons) %>%  
  mutate(tot_avail = tot_consumed_on_site + tot_bev_sales) %>% 
  left_join(df_loss_pct, by = "yr") %>% 
  mutate(pct_loss = pct_loss/100, loss_adjusted_avail = (1-pct_loss) * tot_avail) %>% 
  select(yr,tot_avail,pct_loss,loss_adjusted_avail)
df_avail_adj 
# A tibble: 6 × 4
     yr tot_avail pct_loss loss_adjusted_avail
  <dbl>     <dbl>    <dbl>               <dbl>
1  1970      31.3    0.303                21.8
2  1980      27.6    0.305                19.2
3  1990      25.7    0.306                17.8
4  2000      22.8    0.311                15.7
5  2010      20.6    0.315                14.1
6  2020      16.3    0.314                11.2
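The loss adjustment in the table above reduces to one multiplication per year: availability times the share that is not lost. As a minimal sketch, the 1970 row can be reproduced by hand in base R. The inputs are the rounded values printed in the table, so the result only matches to the displayed precision; the variable names are illustrative.

```r
# 1970 values copied from the table above (rounded as printed)
tot_avail_1970 <- 31.3   # gallons per capita, sales plus on-site consumption
pct_loss_1970 <- 0.303   # retail + consumer loss, as a proportion

# Loss-adjusted availability = (1 - loss share) * total availability
loss_adjusted_1970 <- (1 - pct_loss_1970) * tot_avail_1970

round(loss_adjusted_1970, 1)
```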
# Create plot: loss-adjusted availability, one obs = one year

ggplot(df_avail_adj, aes(x = yr, y = loss_adjusted_avail)) +
  geom_col(fill = "lightblue") +
  scale_x_continuous(breaks = breaks_width(10))+
  scale_y_continuous(n.breaks=10, labels = scales::label_comma()) +
  xlab("Year") +
  ylab("Total Gallons Per Capita") +
  ggtitle("Total Gallons Per Capita: Liquid Milk Estimated Consumption (Loss-Adjusted Availability)")

While the waste estimate rose slightly between 1970 and 2020 (from 30.3% to 31.4%), this chart closely tracks the aggregate trends seen earlier. It does add relatable context, however: Americans now drink less than one gallon per person per month (11.2 gallons per year), whereas in 1970 we drank nearly two gallons per person per month (21.8 gallons per year).
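The per-month figures follow from dividing the annual loss-adjusted values by 12. A minimal base-R sketch, using the rounded 1970 and 2020 values from the table (the `loss_adjusted` data frame and its `gallons_per_month` column are illustrative names, not part of the original analysis):

```r
# Loss-adjusted availability (gallons per capita per year), copied from the table
loss_adjusted <- data.frame(
  yr = c(1970, 2020),
  loss_adjusted_avail = c(21.8, 11.2)
)

# Convert annual gallons per capita to gallons per capita per month
loss_adjusted$gallons_per_month <- round(loss_adjusted$loss_adjusted_avail / 12, 2)

loss_adjusted
```

This gives about 1.82 gallons per person per month in 1970 and about 0.93 in 2020.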