Does Crop Yield Correlate with Meat Consumption in the United States?

by Will Outcault

This document attempts to gain insight into how crop yield in the U.S. affects overall meat consumption.

The data were obtained from “https://data.oecd.org/agroutput/crop-production.htm”.

Load Packages

The following packages are required for the data visualizations in this document.

require(ggplot2)
## Loading required package: ggplot2
require(gridExtra)
## Loading required package: gridExtra
require(corrplot)
## Loading required package: corrplot
## corrplot 0.84 loaded

Obtaining the CSV Files.

The two CSV files are read, their columns are renamed, and a summary of each is printed. Several of the summarized columns (such as Indicator, Frequency, and the empty flag codes) carry no useful information, but the summaries still give a good first look at the raw data.

crops <- read.csv('https://raw.githubusercontent.com/wco1216/crop-yield/master/CROPS.csv', header = TRUE, sep = ",")
meats <- read.csv('https://raw.githubusercontent.com/wco1216/meat-consumption/master/MEATS.csv', header = TRUE, sep = ",")

colnames(crops) <- c("Location", "Indicator", "Subject", "Measure", "Frequency", "Time", "Crop Yield", "Falg Codes")
colnames(meats) <- c("Location", "Indicator", "Subject", "Measure", "Frequency", "Time", "Meat Consumption", "Falg Codes")
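The later code refers to `Crop.Yield` and `Meat.Consumption`, even though the names set above contain spaces. This works because `data.frame()`, with its default `check.names = TRUE`, passes column names through `make.names()`, which replaces invalid characters such as spaces with dots. A minimal sketch on a hypothetical two-row data frame (not the OECD data):

```r
# Hypothetical toy data frame with the same kind of non-syntactic names as above
toy <- data.frame(x = c(1, 2), y = c(NA, NA))
colnames(toy) <- c("Crop Yield", "Falg Codes")

# make.names() shows the syntactic form each name will take
make.names(colnames(toy))  # "Crop.Yield" "Falg.Codes"

# Wrapping the data frame in data.frame() applies the same conversion
toy2 <- data.frame(toy)
names(toy2)                # "Crop.Yield" "Falg.Codes"
toy2$Crop.Yield            # 1 2
```

This is why the column is typed as `Crop.Yield` after the `data.frame(subset(...))` step rather than with backticks around `Crop Yield`.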

summary(crops)
##     Location         Indicator        Subject           Measure    
##  ARG    :  456   CROPYIELD:16240   MAIZE  :4101   THND_HA   :5343  
##  AUS    :  456                     RICE   :3907   THND_TONNE:5624  
##  BRA    :  456                     SOYBEAN:4021   TONNE_HA  :5273  
##  BRICS  :  456                     WHEAT  :4211                    
##  CHL    :  456                                                     
##  CHN    :  456                                                     
##  (Other):13504                                                     
##  Frequency      Time        Crop Yield        Falg Codes    
##  A:16240   Min.   :1990   Min.   :      0.0   Mode:logical  
##            1st Qu.:1999   1st Qu.:      2.8   NA's:16240    
##            Median :2009   Median :     70.0                 
##            Mean   :2009   Mean   :  17452.2                 
##            3rd Qu.:2018   3rd Qu.:   3834.8                 
##            Max.   :2027   Max.   :1201712.7                 
## 
summary(meats)
##     Location          Indicator        Subject           Measure    
##  ARG    : 304   MEATCONSUMP:11220   BEEF   :2805   KG_CAP    :5596  
##  AUS    : 304                       PIG    :2805   THND_TONNE:5624  
##  BRA    : 304                       POULTRY:2805                    
##  BRICS  : 304                       SHEEP  :2805                    
##  CAN    : 304                                                       
##  CHE    : 304                                                       
##  (Other):9396                                                       
##  Frequency      Time      Meat Consumption    Falg Codes    
##  A:11220   Min.   :1990   Min.   :     0.00   Mode:logical  
##            1st Qu.:1999   1st Qu.:     5.67   NA's:11220    
##            Median :2009   Median :    25.10                 
##            Mean   :2009   Mean   :  2454.60                 
##            3rd Qu.:2018   3rd Qu.:   462.12                 
##            Max.   :2027   Max.   :138920.59                 
## 

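The summaries show that the years span 1990 through 2027 (the later years are projections). Each crop subject appears a similar number of times (between 3,907 and 4,211 rows), and each meat subject appears exactly 2,805 times. Both crop yield and meat consumption cover a very broad range, most likely because each file mixes several units of measure: thousand hectares (THND_HA), thousand tonnes (THND_TONNE), and tonnes per hectare (TONNE_HA) for crops, and kilograms per capita (KG_CAP) and thousand tonnes for meat.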
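One way to check that the wide range comes from mixing units is to look at the minimum and maximum within each `Measure`. The sketch below uses a small hypothetical data frame with made-up values (not taken from the OECD files); on the real data the same calls would be applied to `crops$Measure` and the `Crop Yield` column.

```r
# Hypothetical values, for illustration only
toy <- data.frame(
  Measure = c("THND_HA", "THND_HA", "THND_TONNE", "THND_TONNE", "TONNE_HA", "TONNE_HA"),
  Value   = c(1200, 35000, 9000, 360000, 2.5, 11.0)
)

# Number of rows recorded in each unit of measure
table(toy$Measure)

# Minimum and maximum within each unit: one column per measure
ranges <- sapply(split(toy$Value, toy$Measure), range)
rownames(ranges) <- c("min", "max")
ranges
```

If each unit occupies its own scale, as in this toy example, the overall range in `summary()` mostly reflects the spread between units rather than variation within any one of them.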

Create Data Frames Containing the Features of Interest.

We are only interested in crop yield and meat consumption in the U.S., and only one unit of measurement is needed for each. We therefore subset the data so that only rows with the “USA” location and either the tonnes per hectare (TONNE_HA) crop measure or the kilograms per capita (KG_CAP) meat measure remain in UScrops and USmeats. Finally, we keep only the Subject, Time, and value columns.

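# Keep only U.S. rows; data.frame() also turns "Crop Yield" into the syntactic name Crop.Yield
UScrops <- data.frame(subset(crops, Location == "USA"))
USmeats <- data.frame(subset(meats, Location == "USA"))
# Keep one unit per data set: tonnes per hectare for crops, kilograms per capita for meat
UScrops <- subset(UScrops, Measure == "TONNE_HA")
USmeats <- subset(USmeats, Measure == "KG_CAP")

# Keep only the columns used below (drops Location, Indicator, Measure, Frequency and the empty flag column)
UScrops <- UScrops[, c("Subject", "Time", "Crop.Yield")]
USmeats <- USmeats[, c("Subject", "Time", "Meat.Consumption")]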

summary(UScrops)
##     Subject        Time        Crop.Yield    
##  MAIZE  :38   Min.   :1990   Min.   : 2.195  
##  RICE   :38   1st Qu.:1999   1st Qu.: 2.918  
##  SOYBEAN:38   Median :2008   Median : 3.919  
##  WHEAT  :38   Mean   :2008   Mean   : 5.208  
##               3rd Qu.:2018   3rd Qu.: 6.319  
##               Max.   :2027   Max.   :12.092
summary(USmeats)
##     Subject        Time      Meat.Consumption 
##  BEEF   :38   Min.   :1990   Min.   : 0.3784  
##  PIG    :38   1st Qu.:1999   1st Qu.:15.8291  
##  POULTRY:38   Median :2008   Median :24.5194  
##  SHEEP  :38   Mean   :2008   Mean   :24.3098  
##               3rd Qu.:2018   3rd Qu.:32.0311  
##               Max.   :2027   Max.   :49.8942

Create a Scatter Plot of Crop Yield and Meat Consumption over Time.

The ggplot2 package was used to create the two plots. The years range from 1990 to 2027 (the later years are projected values). Crop yield is measured in tonnes per hectare and meat consumption in kilograms per capita. We are looking for whether meat consumption rises or falls along with crop yield.

p1 <- ggplot(UScrops, aes(x=Time, y=Crop.Yield)) + 
  theme_bw() +
  geom_point(aes(col=Subject)) +
  labs(x = "Year",
       y = "Crop Yield",
       title = "Crop Yield Over Time")
p2 <- ggplot(USmeats, aes(x=Time, y=Meat.Consumption)) + 
  theme_bw() +
  geom_point(aes(col=Subject)) +
  labs(x = "Year",
       y = "Meat Consumption",
       title = "Meat Consumption Over Time")
grid.arrange(p1, p2, ncol=2)

We notice a steady increase in crop yield for maize, rice, soybean, and wheat. Conversely, beef and sheep consumption show a subtle decline. Poultry consumption continues to grow, much like maize yield. The few years following 2010 show drastic drops in both overall crop yield and overall meat consumption. Did the decrease in crop yield lead to a decrease in meat consumption? If so, would an increase in crop yield lead to an increase in meat consumption? Other factors also drive meat consumption, and they need to be taken into consideration before a decisive answer can be made. In addition, correlation does not imply causation; however, these changes are noted and taken into account in the further analysis.
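To check whether the dips after 2010 line up, one option is to collapse each data set to a single value per year and look at the year-over-year change. The sketch below uses a simple mean across subjects; that choice is an assumption (the analysis above reads "overall" yield off the plots without defining it), and a plain mean will be dominated by the higher-yielding crops. The data frame is a hypothetical stand-in for `UScrops` with made-up values.

```r
# Hypothetical stand-in for UScrops: two crops over four years (made-up values)
crops_toy <- data.frame(
  Subject    = rep(c("MAIZE", "WHEAT"), each = 4),
  Time       = rep(2010:2013, times = 2),
  Crop.Yield = c(10, 9, 7, 10, 3, 3, 3, 4)
)

# One value per year: the mean yield across subjects
yearly <- aggregate(Crop.Yield ~ Time, data = crops_toy, FUN = mean)

# Year-over-year change; negative values mark a drop from the previous year
yearly$Change <- c(NA, diff(yearly$Crop.Yield))
yearly

# Year with the largest drop (which.min skips the leading NA)
yearly$Time[which.min(yearly$Change)]  # 2012
```

Running the same two steps on `USmeats` with `Meat.Consumption ~ Time` gives the yearly meat series, and `merge()` on `Time` lines the two up so that the years with drops can be compared directly.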

Crop Yield and Meat Consumption Box Plots

Two box plots were created to better understand the distributions of crop yield and meat consumption.

p1 <- ggplot(UScrops, aes(Subject, Crop.Yield))+ 
  geom_boxplot() + 
    labs(x = "Subject",
       y = "Crop Yield",
       title = "Crop Yield Distribution")
p2 <- ggplot(USmeats, aes(Subject, Meat.Consumption))+ 
  geom_boxplot() +
    labs(x = "Subject",
       y = "Meat Consumption",
       title = "Meat Consumption Distribution")
grid.arrange(p1, p2, ncol=2)

Maize and poultry have the widest spreads. This makes sense, as they are far more prominent in the United States. Sheep has an extremely narrow spread, as does pig, although pig consumption is still substantial; pig consumption appears very consistent in the US. Unfortunately, this particular box plot does not tell us much about correlation.
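In a ggplot2 box plot, the box spans the first to the third quartile, so its height is the interquartile range (IQR). The spread each box shows can therefore also be read off numerically. A minimal sketch on a hypothetical stand-in for `USmeats` (made-up values), using base R's `split()` and `IQR()`:

```r
# Hypothetical stand-in for USmeats (made-up values, five years per subject)
meats_toy <- data.frame(
  Subject = rep(c("POULTRY", "SHEEP"), each = 5),
  Meat.Consumption = c(30, 35, 40, 45, 50, 0.5, 0.5, 0.6, 0.6, 0.7)
)

# Split the values by subject and compute the interquartile range of each group
spread <- sapply(split(meats_toy$Meat.Consumption, meats_toy$Subject), IQR)
spread  # POULTRY 10, SHEEP about 0.1
```

On the real data, the same call with `USmeats` (or `UScrops` and `Crop.Yield`) gives one IQR per subject, which makes "wide" and "narrow" boxes comparable as numbers.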

Histograms of crop yield and meat consumption do not give insight into correlation either; however, they do show which crop or meat is more prominent in the US. This should agree with the earlier observation that maize and poultry are more prominent in the United States.

p1 <- ggplot(UScrops, aes(Crop.Yield)) +
  geom_histogram(aes(fill=Subject), binwidth = .4) 
p2 <- ggplot(USmeats, aes(Meat.Consumption)) +
  geom_histogram(aes(fill=Subject), binwidth = 1.5)
grid.arrange(p1, p2, ncol=2)

From the data we can infer that maize and poultry are the two most prominent subjects in the US: maize has the highest yields and poultry the highest per-capita consumption. Conversely, sheep and soybean sit at the low end of their respective plots.
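Reading prominence off histogram bins is approximate; a more direct summary is each subject's mean level, sorted. Note that for the crop data this ranks yield per hectare, which is not the same thing as how much of each crop is grown. A sketch on a hypothetical stand-in for `USmeats` (made-up values):

```r
# Hypothetical stand-in for USmeats (made-up values, three years per subject)
meats_toy <- data.frame(
  Subject = rep(c("BEEF", "PIG", "POULTRY", "SHEEP"), each = 3),
  Meat.Consumption = c(26, 25, 24, 22, 22, 23, 40, 42, 44, 0.6, 0.5, 0.4)
)

# Mean consumption per subject, ranked from highest to lowest
avg <- sapply(split(meats_toy$Meat.Consumption, meats_toy$Subject), mean)
sort(avg, decreasing = TRUE)
```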

Another ggplot2 plot that can be clearer than a histogram is the density plot. Setting alpha equal to 0.5 makes the fills semi-transparent, so the overlap between subjects remains visible. Sheep was originally included in the meat consumption plot, but it distorted the plot so badly that it was difficult to interpret (most likely because its values are packed so tightly near zero that its density curve forms a tall spike that squashes the other curves). To get a clearer visualization, the sheep data were removed.

USmeats2 <- data.frame(subset(USmeats, USmeats$Subject!="SHEEP"))
p1 <- ggplot(UScrops, aes(x = Crop.Yield, fill = Subject)) +
  theme_bw() +
  geom_density(alpha = 0.5) 
p2 <- ggplot(USmeats2, aes(x = Meat.Consumption, fill = Subject)) +
  theme_bw() +
  geom_density(alpha = 0.5)
grid.arrange(p1, p2, ncol=2)

This density plot shows much the same information as the histogram; however, it is easier to interpret because the semi-transparent fills let overlapping subjects show through. I also believe the smooth curves make for a better-looking plot and give a bit more insight.

Lastly, we made a correlation table to show how the variables move together. To do so, a subset was created for each subject. The subsets were then combined column-wise into a matrix, with the subject name as the column header and one row of crop yield or meat consumption values per year. The correlation matrix was then computed with cor() and plotted with the corrplot package loaded earlier.
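The `cbind()` step that follows pairs values by row position, so it relies on every subset having exactly one row per year, in the same year order. A slightly more defensive alternative, sketched here on a hypothetical long-format stand-in for `USmeats` (made-up values), builds the wide table with base R's `tapply()` so that cells are matched on `Time` instead of on row order.

```r
# Hypothetical stand-in for USmeats (made-up values), with BEEF rows deliberately out of year order
meats_toy <- data.frame(
  Subject = c("BEEF", "BEEF", "BEEF", "POULTRY", "POULTRY", "POULTRY"),
  Time = c(2001, 2000, 2002, 2000, 2001, 2002),
  Meat.Consumption = c(27, 28, 26, 30, 32, 34)
)

# One row per year, one column per subject; mean() just returns the single value
# in each year/subject cell, and a missing combination would become NA
wide <- tapply(meats_toy$Meat.Consumption,
               list(meats_toy$Time, meats_toy$Subject), mean)
wide

# Correlation matrix of the subjects; complete.obs drops any year with a missing cell
cor(wide, use = "complete.obs")
```

With these rows, a position-based `cbind()` would pair the 2001 beef value with the 2000 poultry value; the `tapply()` table keeps each year on its own row.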

crops_maize <- subset(UScrops, UScrops$Subject == "MAIZE")
crops_rice <- subset(UScrops, UScrops$Subject == "RICE")
crops_soybean <- subset(UScrops, UScrops$Subject == "SOYBEAN")
crops_wheat <- subset(UScrops, UScrops$Subject == "WHEAT")
meats_beef <- subset(USmeats, USmeats$Subject == "BEEF")
meats_pig <- subset(USmeats, USmeats$Subject == "PIG")
meats_poultry <- subset(USmeats, USmeats$Subject == "POULTRY")
meats_sheep <- subset(USmeats, USmeats$Subject == "SHEEP")

df <- cbind(maize   = crops_maize$Crop.Yield,
            rice    = crops_rice$Crop.Yield,
            soybean = crops_soybean$Crop.Yield,
            wheat   = crops_wheat$Crop.Yield,
            beef    = meats_beef$Meat.Consumption,
            pig     = meats_pig$Meat.Consumption,
            poultry = meats_poultry$Meat.Consumption,
            sheep   = meats_sheep$Meat.Consumption)
dfcorr <- cor(df)
corrplot.mixed(dfcorr, upper="number", lower="color", order="hclust")

With order="hclust", corrplot reorders the variables by hierarchical clustering, so variables with similar correlation patterns sit next to each other. We see strong positive correlations between all four crops and poultry, strongest between poultry and rice. Conversely, each crop correlates negatively with sheep and beef. Pig is the one exception: it shows no strong correlation and seems to be independent of the other subjects.
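Most of these series trend steadily over the period, and two trending series will tend to correlate strongly even if their year-to-year movements are unrelated. One common check, which is not part of the analysis above, is to correlate the year-over-year changes (first differences) instead of the levels. A minimal sketch with two made-up series:

```r
# Two made-up series that both rise over time
yield   <- c(7, 8, 8, 10, 10, 11, 13, 13)
poultry <- c(30, 30, 32, 32, 35, 36, 36, 39)

# Correlation of the levels is high, driven largely by the shared upward trend
cor(yield, poultry)

# Correlation of the first differences asks whether good years coincide;
# for these series the sign even flips
cor(diff(yield), diff(poultry))
```

On the real data, `diff()` applied to the `df` matrix differences each column separately, so `cor(diff(df))` gives the same kind of table for year-over-year changes. With only 38 years (37 differences), these estimates are noisy.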

Conclusion

Based on this data alone and the scatter plots, we assume that a decrease in overall crop yield is accompanied by a decrease in overall meat consumption. According to the correlation table, higher crop yields go along with higher poultry consumption and, conversely, with lower beef and sheep consumption. The one exception is pork, which seems to be consumed independently of crop yield. If the data were available, yearly meat prices should also be considered. Monthly rather than yearly meat consumption and crop yield data would also allow a more precise analysis.