The only difference between this publication and the first publication is the inclusion of full datasets within the published page…
I’ve used the online database Data.gov to locate a data file for grain-based food production. I intend to look into poultry and hog food production to see if there is any kind of correlation between production between one and the other.
I’ve decided on hog vs. poultry because I understand more about cattle-based food production, so while learning about something different, I wanted to see if there was any relation between the often seen as healthier poultry vs the less health-conscious pork.
The null hypothesis is that there is no relationship between the two variables.
The alternative hypothesis is that there is a correlation (I suspect a mild negative correlation) between pork consumption and poultry consumption, using grain-based feed production as a metric.
Not considered are changes in hog and poultry diets away from grain-based diets over the years. All observations not marked specifically for hog (Hogs, Hog-corn) and poultry (Poultry, Turkey-feed), including market egg-feed (as eggs are not included in this examination) have been excluded.
Loading libraries:
library(DataEditR)
## Warning: package 'DataEditR' was built under R version 4.3.2
library(AgroR)
## Warning: package 'AgroR' was built under R version 4.3.2
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::desc() masks AgroR::desc()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
## Warning: package 'janitor' was built under R version 4.3.2
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
##
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##
## The following object is masked from 'package:purrr':
##
## transpose
library(ggplot2)
library(cowplot)
## Warning: package 'cowplot' was built under R version 4.3.2
##
## Attaching package: 'cowplot'
##
## The following object is masked from 'package:lubridate':
##
## stamp
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.3.2
## corrplot 0.92 loaded
getwd()
## [1] "C:/Users/19255/Desktop/R/ENVS567"
setwd("C:/Users/19255/Desktop/R/ENVS567")
knitr::opts_knit$set(root.dir = "C:/Users/19255/Desktop/R/ENVS567")
Creating data subset for applicable observations and variables:
feedgrains <- read.csv("FeedGrains.csv")
feedgrainsfil <- feedgrains %>% select(SC_GroupCommod_Desc,
SC_Commodity_Desc,
Year_ID,
Amount)
feedgrainsfin <- subset(feedgrainsfil, SC_Commodity_Desc == "Hogs" | SC_Commodity_Desc == "Hog-corn" | SC_Commodity_Desc == "Poultry" | SC_Commodity_Desc == "Turkey-feed")
After inspection, I see no missing data (save for 1960 data for hogs, which has been excluded as Poultry and Hogs data can simply start in 1961), I see categorical data (Hogs, Hog-corn, Turkey-feed, Poultry), as well as applicable production amount and year.
I’ve heard many great things about the DataEditR, so as I intend to use that later in the month for my own research, I want to edit the observations (SC_Commodity_Desc) within that package’s functionalities to group hog and poultry observations into two distinct groupings.
I wanted to ensure all modifications were done within the R program and with R package features.
Changes made and the new modified datafile has been generated.
feedgrainsbyanimal <- read.csv("20231211-data.csv")
I accidentally included one too many variables, and will remove one additional column.
feedanimal <- feedgrainsbyanimal %>% select(SC_Commodity_Desc,
Year_ID,
Amount)
Now to sum amount by species and year…
aggregate(feedanimal$Amount, list(feedanimal$SC_Commodity_Desc), FUN = sum)
## Group.1 x
## 1 Hogs 17416.74
## 2 Poultry 10052.64
agrelist <- feedanimal %>% group_by(SC_Commodity_Desc, Year_ID) %>% summarise_at(vars(Amount), list(name = sum))
agrelistdata <- read.csv("agrelist-data.csv")
And finally have a dataset more appropriate for a correlation test…
cor.test(agrelistdata$HogsAmount, agrelistdata$PoultryAmount, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: agrelistdata$HogsAmount and agrelistdata$PoultryAmount
## t = 3.6414, df = 61, p-value = 0.0005607
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1952379 0.6067979
## sample estimates:
## cor
## 0.4225622
cor.test(agrelistdata$HogsAmount, agrelistdata$PoultryAmount, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: agrelistdata$HogsAmount and agrelistdata$PoultryAmount
## S = 24874, p-value = 0.001156
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.4029858
I don’t need the spearman correlation because we’re already working with nominal data and don’t need the spearman’s rank correlation, but provided the script for purposes of showcasing the difference.
I’d like to visualize the data now.
plot(agrelistdata)
Different visualizations would be more helpful.
I’ll start with a scatter plot of the data.
plotdata <- read.csv("agrelistplotdata.csv")
plotdata
## X Category Year Amount
## 1 1 Hogs 1961 215.92500
## 2 2 Hogs 1962 211.71667
## 3 3 Hogs 1963 176.60000
## 4 4 Hogs 1964 174.10000
## 5 5 Hogs 1965 238.68333
## 6 6 Hogs 1966 240.93333
## 7 7 Hogs 1967 213.29167
## 8 8 Hogs 1968 234.54167
## 9 9 Hogs 1969 265.50833
## 10 10 Hogs 1970 229.37500
## 11 11 Hogs 1971 195.45000
## 12 12 Hogs 1972 290.00833
## 13 13 Hogs 1973 270.32500
## 14 14 Hogs 1974 156.26667
## 15 15 Hogs 1975 282.55147
## 16 16 Hogs 1976 279.95191
## 17 17 Hogs 1977 321.03101
## 18 18 Hogs 1978 352.68759
## 19 19 Hogs 1979 294.28514
## 20 20 Hogs 1980 249.74661
## 21 21 Hogs 1981 259.85847
## 22 22 Hogs 1982 354.69159
## 23 23 Hogs 1983 261.64667
## 24 24 Hogs 1984 263.06759
## 25 25 Hogs 1985 289.09917
## 26 26 Hogs 1986 418.28729
## 27 27 Hogs 1987 476.49374
## 28 28 Hogs 1988 308.72459
## 29 29 Hogs 1989 295.41312
## 30 30 Hogs 1990 352.99797
## 31 31 Hogs 1991 332.98843
## 32 32 Hogs 1992 307.74300
## 33 33 Hogs 1993 328.87605
## 34 34 Hogs 1994 278.41083
## 35 35 Hogs 1995 274.29289
## 36 36 Hogs 1996 267.30206
## 37 37 Hogs 1997 324.64280
## 38 38 Hogs 1998 259.70841
## 39 39 Hogs 1999 296.85540
## 40 40 Hogs 2000 369.38980
## 41 41 Hogs 2001 365.53338
## 42 42 Hogs 2002 271.92514
## 43 43 Hogs 2003 284.12208
## 44 44 Hogs 2004 346.57343
## 45 45 Hogs 2005 394.80667
## 46 46 Hogs 2006 333.71062
## 47 47 Hogs 2007 251.36304
## 48 48 Hogs 2008 205.56822
## 49 49 Hogs 2009 223.15063
## 50 50 Hogs 2010 261.44102
## 51 51 Hogs 2011 219.13418
## 52 52 Hogs 2012 201.39304
## 53 53 Hogs 2013 226.33802
## 54 54 Hogs 2014 321.41598
## 55 55 Hogs 2015 268.25701
## 56 56 Hogs 2016 269.52433
## 57 57 Hogs 2017 291.56563
## 58 58 Hogs 2018 275.50652
## 59 59 Hogs 2019 270.18361
## 60 60 Hogs 2020 265.20095
## 61 61 Hogs 2021 249.93691
## 62 62 Hogs 2022 227.40416
## 63 63 Hogs 2023 179.21605
## 64 64 Poultry 1961 79.33333
## 65 65 Poultry 1962 78.45833
## 66 66 Poultry 1963 63.35833
## 67 67 Poultry 1964 62.41667
## 68 68 Poultry 1965 66.09167
## 69 69 Poultry 1966 66.05000
## 70 70 Poultry 1967 57.13333
## 71 71 Poultry 1968 59.10833
## 72 72 Poultry 1969 62.68333
## 73 73 Poultry 1970 64.47500
## 74 74 Poultry 1971 59.27500
## 75 75 Poultry 1972 58.49167
## 76 76 Poultry 1973 56.72500
## 77 77 Poultry 1974 43.65000
## 78 78 Poultry 1975 113.95368
## 79 79 Poultry 1976 110.63811
## 80 80 Poultry 1977 114.64928
## 81 81 Poultry 1978 128.95591
## 82 82 Poultry 1979 124.52081
## 83 83 Poultry 1980 118.81153
## 84 84 Poultry 1981 112.62461
## 85 85 Poultry 1982 114.46816
## 86 86 Poultry 1983 113.57206
## 87 87 Poultry 1984 160.40868
## 88 88 Poultry 1985 183.88415
## 89 89 Poultry 1986 200.41111
## 90 90 Poultry 1987 179.60588
## 91 91 Poultry 1988 161.57165
## 92 92 Poultry 1989 171.61335
## 93 93 Poultry 1990 181.56138
## 94 94 Poultry 1991 185.62150
## 95 95 Poultry 1992 186.25412
## 96 96 Poultry 1993 190.57553
## 97 97 Poultry 1994 194.66609
## 98 98 Poultry 1995 192.76547
## 99 99 Poultry 1996 183.16470
## 100 100 Poultry 1997 188.45108
## 101 101 Poultry 1998 204.97065
## 102 102 Poultry 1999 231.11765
## 103 103 Poultry 2000 233.15674
## 104 104 Poultry 2001 227.64514
## 105 105 Poultry 2002 208.72297
## 106 106 Poultry 2003 198.26515
## 107 107 Poultry 2004 205.14233
## 108 108 Poultry 2005 226.02833
## 109 109 Poultry 2006 225.03725
## 110 110 Poultry 2007 202.25385
## 111 111 Poultry 2008 181.54773
## 112 112 Poultry 2009 187.58406
## 113 113 Poultry 2010 200.10725
## 114 114 Poultry 2011 184.52545
## 115 115 Poultry 2012 182.34885
## 116 116 Poultry 2013 182.75847
## 117 117 Poultry 2014 206.97606
## 118 118 Poultry 2015 236.59438
## 119 119 Poultry 2016 245.76493
## 120 120 Poultry 2017 221.53761
## 121 121 Poultry 2018 203.73921
## 122 122 Poultry 2019 215.30127
## 123 123 Poultry 2020 230.79744
## 124 124 Poultry 2021 210.21710
## 125 125 Poultry 2022 216.95126
## 126 126 Poultry 2023 194.29605
ggplot(data = plotdata, mapping = aes(x = Year, y = Amount)) +
geom_point(aes(color = Category)) +
labs(title = "Hog and Poultry Grain-Based Feed Produced",
x = "Year",
y = "Amount of Grain-Based Feed Produced (Million Animal Units)")
And move on to the correlation plots.
corrplot(corr = cor(agrelistdata[3:5]), method = "number")
corrplot(corr = cor(agrelistdata[3:5]), method = "square")
So right off the bat, it seems that PoultryAmount (the amount of grain-based production for feeding Poultry) has a high positive correlation (0.89) with year, whereas HogsAmount (amount of grain produced for feeding hogs) has almost no correlation with year (0.12). This, however, is irrelevant to the exercise.
For the sake of relevancy, there seems to be a mild positive correlation between PoultryAmount and HogsAmount (0.42).
Based on the p-value being as low as it is, it appears to be significant (1e-03).
Better visuals are needed, though.
plot_cor(agrelistdata$HogsAmount, agrelistdata$PoultryAmount,
method = "pearson",
xlab = "Amount of Feed Produced for Hogs (million animal units)",
ylab = "Amount of Feed Produced for Poultry (million animal units)",
theme = theme_classic(),
pointsize = 2,
shape = 20,
fill = "black",
color = "darkred",
ic = TRUE,
title = "Relationship Between Hog and Poultry Grain-Based Feed Produced")
## `geom_smooth()` using formula = 'y ~ x'