This document attempts to extract insight into how crop yield in the U.S. affects overall meat consumption.
The data were obtained from “https://data.oecd.org/agroutput/crop-production.htm”.
The following packages are required for the data visualizations used in this document.
require(ggplot2)
## Loading required package: ggplot2
require(gridExtra)
## Loading required package: gridExtra
require(corrplot)
## Loading required package: corrplot
## corrplot 0.84 loaded
The two CSV files are read, the column names are changed, and a summary of each file is created. The summaries include some features that do not provide insight, but they still give a good look at the raw data.
crops <- read.csv('https://raw.githubusercontent.com/wco1216/crop-yield/master/CROPS.csv', TRUE, ",")
meats <- read.csv('https://raw.githubusercontent.com/wco1216/meat-consumption/master/MEATS.csv', TRUE, ",")
colnames(crops) <- c("Location", "Indicator", "Subject", "Measure", "Frequency", "Time", "Crop Yield", "Falg Codes")
colnames(meats) <- c("Location", "Indicator", "Subject", "Measure", "Frequency", "Time", "Meat Consumption", "Falg Codes")
summary(crops)
## Location Indicator Subject Measure
## ARG : 456 CROPYIELD:16240 MAIZE :4101 THND_HA :5343
## AUS : 456 RICE :3907 THND_TONNE:5624
## BRA : 456 SOYBEAN:4021 TONNE_HA :5273
## BRICS : 456 WHEAT :4211
## CHL : 456
## CHN : 456
## (Other):13504
## Frequency Time Crop Yield Falg Codes
## A:16240 Min. :1990 Min. : 0.0 Mode:logical
## 1st Qu.:1999 1st Qu.: 2.8 NA's:16240
## Median :2009 Median : 70.0
## Mean :2009 Mean : 17452.2
## 3rd Qu.:2018 3rd Qu.: 3834.8
## Max. :2027 Max. :1201712.7
##
summary(meats)
## Location Indicator Subject Measure
## ARG : 304 MEATCONSUMP:11220 BEEF :2805 KG_CAP :5596
## AUS : 304 PIG :2805 THND_TONNE:5624
## BRA : 304 POULTRY:2805
## BRICS : 304 SHEEP :2805
## CAN : 304
## CHE : 304
## (Other):9396
## Frequency Time Meat Consumption Falg Codes
## A:11220 Min. :1990 Min. : 0.00 Mode:logical
## 1st Qu.:1999 1st Qu.: 5.67 NA's:11220
## Median :2009 Median : 25.10
## Mean :2009 Mean : 2454.60
## 3rd Qu.:2018 3rd Qu.: 462.12
## Max. :2027 Max. :138920.59
##
The summaries show that the years span 1990 through 2027. The four crop subjects have similar, though not identical, row counts, while the four meat subjects have exactly 2805 rows each. The ranges of both crop yield and meat consumption are very broad, most likely because the values are reported in several different units of measure.
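One way to check whether the broad ranges come from mixed units is to compute the range separately for each measure. The following is a minimal sketch on a made-up table; the `toy` data frame and its `Value` column are assumptions for illustration, not the OECD data.

```r
# Toy stand-in for the crops table (values are invented for illustration).
toy <- data.frame(
  Measure = c("TONNE_HA", "TONNE_HA", "THND_TONNE",
              "THND_TONNE", "THND_HA", "THND_HA"),
  Value   = c(2.8, 10.5, 3500, 350000, 900, 33000)
)

# Split the values by unit of measure and take the range of each group.
# The result is a 2 x 3 matrix with one column per measure.
ranges <- sapply(split(toy$Value, toy$Measure), range)
rownames(ranges) <- c("min", "max")
print(ranges)
```

If the per-measure ranges are each much narrower than the overall range, the units are the likely cause of the spread.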
We are only interested in crop yield and meat consumption in the U.S., and only one unit of measurement is needed for each. We create the UScrops and USmeats subsets by keeping only rows where the location is “USA” and the measure is TONNE_HA (tonnes per hectare) for crops or KG_CAP (kilograms per capita) for meats.
UScrops <- data.frame(subset(crops, Location=="USA"))
USmeats <- data.frame(subset(meats, Location=="USA"))
UScrops <- data.frame(subset(UScrops, UScrops$Measure=="TONNE_HA"))
USmeats <- data.frame(subset(USmeats, USmeats$Measure=="KG_CAP"))
UScrops[1:2] <- NULL
UScrops[2:3] <- NULL
UScrops$Falg.Codes <- NULL
USmeats[1:2] <- NULL
USmeats[2:3] <- NULL
USmeats$Falg.Codes <- NULL
summary(UScrops)
## Subject Time Crop.Yield
## MAIZE :38 Min. :1990 Min. : 2.195
## RICE :38 1st Qu.:1999 1st Qu.: 2.918
## SOYBEAN:38 Median :2008 Median : 3.919
## WHEAT :38 Mean :2008 Mean : 5.208
## 3rd Qu.:2018 3rd Qu.: 6.319
## Max. :2027 Max. :12.092
summary(USmeats)
## Subject Time Meat.Consumption
## BEEF :38 Min. :1990 Min. : 0.3784
## PIG :38 1st Qu.:1999 1st Qu.:15.8291
## POULTRY:38 Median :2008 Median :24.5194
## SHEEP :38 Mean :2008 Mean :24.3098
## 3rd Qu.:2018 3rd Qu.:32.0311
## Max. :2027 Max. :49.8942
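The same kind of subset can also be written by naming the columns to keep instead of deleting them by position, which is less fragile if the column order ever changes. This is a minimal sketch on invented rows; the `toy` and `us_toy` names are hypothetical. It also shows why the summaries above list `Crop.Yield` with a dot: R converts a space in a column name to a dot when it makes names syntactically valid.

```r
# Toy rows with the same column layout as the crops table
# (all values are invented for illustration).
toy <- data.frame(
  Location   = c("USA", "USA", "ARG", "USA"),
  Indicator  = "CROPYIELD",
  Subject    = c("MAIZE", "MAIZE", "MAIZE", "WHEAT"),
  Measure    = c("TONNE_HA", "THND_TONNE", "TONNE_HA", "TONNE_HA"),
  Frequency  = "A",
  Time       = c(1990, 1990, 1990, 1990),
  Crop.Yield = c(7.4, 201534, 3.5, 2.7),
  Falg.Codes = NA
)

# Filter the rows and keep three columns by name in a single call.
us_toy <- subset(toy,
                 Location == "USA" & Measure == "TONNE_HA",
                 select = c(Subject, Time, Crop.Yield))
print(us_toy)

# How a name containing a space is made syntactically valid.
valid_name <- make.names("Crop Yield")
print(valid_name)
```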
The package ggplot2 was used to create the two plots. The years range from 1990 to 2027 (the later years are projected values); crop yield is in tonnes per hectare (TONNE_HA) and meat consumption is in kilograms per capita (KG_CAP). We are looking for meat consumption to either decrease or increase in accordance with crop yield.
p1 <- ggplot(UScrops, aes(x = Time, y = Crop.Yield)) +
  theme_bw() +
  geom_point(aes(col = Subject)) +
  labs(x = "Year",
       y = "Crop Yield",
       title = "Crop Yield Over Time")
p2 <- ggplot(USmeats, aes(x = Time, y = Meat.Consumption)) +
  theme_bw() +
  geom_point(aes(col = Subject)) +
  labs(x = "Year",
       y = "Meat Consumption",
       title = "Meat Consumption Over Time")
grid.arrange(p1, p2, ncol = 2)
We notice a steady increase in crop yield for maize, rice, soybean and wheat. Conversely, beef and sheep consumption appear to decline slightly. Poultry consumption continues to grow, similar to maize yield. The few years following 2010 show drastic drops in both overall crop yield and overall meat consumption. Did a decrease in crop yield lead to a decrease in meat consumption? If so, would an increase in crop yield lead to an increase in meat consumption? Other factors also influence meat consumption and need to be taken into account before a decisive answer can be given. In addition, correlation does not imply causation; nevertheless, these changes are noted and considered in the further analysis.
Two box plots were created to better understand the distributions of crop yield and meat consumption.
p1 <- ggplot(UScrops, aes(Subject, Crop.Yield)) +
  geom_boxplot() +
  labs(x = "Subject",
       y = "Crop Yield",
       title = "Crop Yield Distribution")
p2 <- ggplot(USmeats, aes(Subject, Meat.Consumption)) +
  geom_boxplot() +
  labs(x = "Subject",
       y = "Meat Consumption",
       title = "Meat Consumption Distribution")
grid.arrange(p1, p2, ncol = 2)
The spread is widest for maize and poultry. This makes sense, as they are vastly more popular in the United States. Sheep has an extremely narrow spread, as does pig, although pig consumption is still substantial. Pig consumption seems to be very consistent in the US. Unfortunately, these box plots do not tell us much about correlation.
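The spread that a box plot shows as the height of each box can also be read off numerically as the interquartile range per subject. The following is a minimal sketch on invented values; the `toy` data frame is an assumption and only mimics the `Subject` and `Crop.Yield` columns used above.

```r
# Toy yields for two crops (values are invented for illustration).
toy <- data.frame(
  Subject    = rep(c("MAIZE", "WHEAT"), each = 4),
  Crop.Yield = c(7, 8, 10, 11, 2.5, 2.7, 2.9, 3.1)
)

# Interquartile range of yield within each subject; this is the
# quantity drawn as the box height in geom_boxplot().
spread <- aggregate(Crop.Yield ~ Subject, data = toy, FUN = IQR)
print(spread)
```

A larger value means a taller box, so the subjects can be ranked by spread without reading the plot by eye.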
Histograms of crop yield and meat consumption do not give insight into correlation either; however, they do provide insight into which crop or meat is more popular in the US. This should agree with the earlier assumption that maize and poultry are more popular in the United States.
p1 <- ggplot(UScrops, aes(Crop.Yield)) +
  geom_histogram(aes(fill = Subject), binwidth = .4)
p2 <- ggplot(USmeats, aes(Meat.Consumption)) +
  geom_histogram(aes(fill = Subject), binwidth = 1.5)
grid.arrange(p1, p2, ncol = 2)
From the data we can infer that maize and poultry are the two most popular subjects in the US. Conversely, sheep and soybean seem to be much less popular in the US.
Another plot from ggplot2 that allows visualization, perhaps more clearly than a histogram, is a density plot. By setting alpha equal to 0.5 we can see the overlap between subjects. Originally sheep was included in the meat consumption plot, but this drastically affected the plot and made it difficult to interpret. To obtain a better visualization, the sheep data were removed.
USmeats2 <- data.frame(subset(USmeats, USmeats$Subject!="SHEEP"))
p1 <- ggplot(UScrops, aes(x = Crop.Yield, fill = Subject)) +
  theme_bw() +
  geom_density(alpha = 0.5)
p2 <- ggplot(USmeats2, aes(x = Meat.Consumption, fill = Subject)) +
  theme_bw() +
  geom_density(alpha = 0.5)
grid.arrange(p1, p2, ncol = 2)
This density plot shows similar information to the histogram; however, it is easier to interpret because of the transparency. I also believe the smooth lines make for a better looking plot and offer some more insight.
Lastly, we made a correlation table to show how the variables change with one another. To do so, a subset was created for each subject. These subsets were then combined into a new data frame, with the subject name as each column header and the crop yield or meat consumption values in the rows. A correlation table was then created from this new data frame using the corrplot package loaded earlier.
crops_maize <- subset(UScrops, UScrops$Subject == "MAIZE")
crops_rice <- subset(UScrops, UScrops$Subject == "RICE")
crops_soybean <- subset(UScrops, UScrops$Subject == "SOYBEAN")
crops_wheat <- subset(UScrops, UScrops$Subject == "WHEAT")
meats_beef <- subset(USmeats, USmeats$Subject == "BEEF")
meats_pig <- subset(USmeats, USmeats$Subject == "PIG")
meats_poultry <- subset(USmeats, USmeats$Subject == "POULTRY")
meats_sheep <- subset(USmeats, USmeats$Subject == "SHEEP")
df <- cbind(maize = crops_maize$Crop.Yield,
            rice = crops_rice$Crop.Yield,
            soybean = crops_soybean$Crop.Yield,
            wheat = crops_wheat$Crop.Yield,
            beef = meats_beef$Meat.Consumption,
            pig = meats_pig$Meat.Consumption,
            poultry = meats_poultry$Meat.Consumption,
            sheep = meats_sheep$Meat.Consumption)
dfcorr <- cor(df)
corrplot.mixed(dfcorr, upper="number", lower="color", order="hclust")
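Binding the columns side by side as above relies on every subset listing its years in the same order. A possible alternative, sketched here on invented values, is to build the wide table keyed by year so that the rows are aligned explicitly. The helper `to_wide` and the toy data frames are hypothetical names introduced only for this illustration.

```r
# Toy long-format tables (values are invented for illustration).
toy_crops <- data.frame(
  Subject    = rep(c("MAIZE", "RICE"), each = 4),
  Time       = rep(1990:1993, times = 2),
  Crop.Yield = c(7.4, 6.8, 8.2, 6.3, 6.2, 6.4, 6.4, 6.2)
)
toy_meats <- data.frame(
  Subject          = rep(c("BEEF", "POULTRY"), each = 4),
  Time             = rep(c(1993, 1992, 1991, 1990), times = 2),  # unsorted on purpose
  Meat.Consumption = c(27.1, 27.9, 28.4, 29.0, 41.0, 40.2, 39.5, 38.9)
)

# Hypothetical helper: one row per year, one column per subject.
# tapply() orders the rows by year regardless of the input order.
to_wide <- function(values, years, subjects) {
  tapply(values, list(years, subjects), mean)
}

wide <- cbind(
  to_wide(toy_crops$Crop.Yield, toy_crops$Time, toy_crops$Subject),
  to_wide(toy_meats$Meat.Consumption, toy_meats$Time, toy_meats$Subject)
)

# Pairwise Pearson correlations between all subject columns.
toy_corr <- cor(wide)
print(round(toy_corr, 2))
```

The resulting matrix can be passed to corrplot.mixed() in the same way as dfcorr.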
The correlations are clustered based on how strongly positive or negative they are. We see a strong positive correlation between all crops and poultry, most notably between poultry and rice. Conversely, we see negative correlations between each crop and both sheep and beef. The one exception to these strong correlations is pig, which seems to be independent of the other subjects.
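Because every series here moves steadily over time, part of these correlations may simply reflect shared trends. One possible check, which is not part of the analysis above, is to correlate year-to-year changes instead of levels. The sketch below uses two made-up trending series (`crop_like` and `meat_like` are hypothetical) that are unrelated apart from their trends.

```r
# Two artificial series that both rise over 20 "years" but whose
# short-term wiggles are unrelated (values are invented).
years     <- 1:20
crop_like <- years + rep(c(0.5, -0.5), times = 10)
meat_like <- 2 * years + rep(c(-0.5, -0.5, 0.5, 0.5), times = 5)

# Correlation of the levels is dominated by the shared upward trend.
level_corr <- cor(crop_like, meat_like)

# Correlation of the year-to-year changes removes the linear trend.
change_corr <- cor(diff(crop_like), diff(meat_like))

print(c(levels = level_corr, changes = change_corr))
```

If the correlation stays strong after differencing, the relationship is less likely to be an artifact of two variables that merely trend in the same direction.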
Based on this data alone and the scatter plot, we assume that a decrease in overall crop yield will cause a decrease in overall meat consumption. According to the correlation table, an increase in crop yield would increase poultry consumption. Conversely, an increase in crop yield would decrease beef and sheep consumption. The one exception is pig, which seems to be consumed independently of crop yield. If the data were available, the price of meat in each year should also be considered. In addition, monthly rather than yearly meat consumption and crop yield data would allow a more precise analysis.