This R Markdown document explores the TidyTuesday egg-production data (2023-04-11 release): monthly egg and hen counts, and the percentage of hens and eggs that are cage-free.
Numeric summary of both datasets (10 columns in total)
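if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2", repos = "https://cloud.r-project.org")
library(ggplot2)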
eggproduction <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-11/egg-production.csv')
## Rows: 220 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): prod_type, prod_process, source
## dbl (2): n_hens, n_eggs
## date (1): observed_month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cagefreepercentages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-11/cage-free-percentages.csv')
## Rows: 96 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): source
## dbl (2): percent_hens, percent_eggs
## date (1): observed_month
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
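# Optional filter (disabled); it keeps every row, because the largest n_hens is 341,166,000.
# eggproduction |> dplyr::filter(n_hens < 350000000)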
summary(eggproduction)
##  observed_month          prod_type         prod_process          n_hens
##  Min.   :2016-07-31   Length:220         Length:220         Min.   : 13500000
##  1st Qu.:2017-09-30   Class :character   Class :character   1st Qu.: 17284500
##  Median :2018-11-15   Mode  :character   Mode  :character   Median : 59939500
##  Mean   :2018-11-14                                         Mean   :110839873
##  3rd Qu.:2019-12-31                                         3rd Qu.:125539250
##  Max.   :2021-02-28                                         Max.   :341166000
##       n_eggs              source
##  Min.   :2.981e+08   Length:220
##  1st Qu.:4.240e+08   Class :character
##  Median :1.155e+09   Mode  :character
##  Mean   :2.607e+09
##  3rd Qu.:2.963e+09
##  Max.   :8.601e+09
summary(cagefreepercentages)
##  observed_month       percent_hens     percent_eggs         source
##  Min.   :2007-12-31   Min.   : 3.20   Min.   : 9.557   Length:96
##  1st Qu.:2017-05-23   1st Qu.:13.46   1st Qu.:14.521   Class :character
##  Median :2018-11-15   Median :17.30   Median :16.235   Mode  :character
##  Mean   :2018-05-12   Mean   :17.95   Mean   :17.095
##  3rd Qu.:2020-02-28   3rd Qu.:23.46   3rd Qu.:19.460
##  Max.   :2021-02-28   Max.   :29.20   Max.   :24.546
##                                       NA's   :42
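Since the cage-free summary reports 42 missing values (the `NA's :42` line), any average over that table has to skip them. A minimal base-R sketch; `count_missing()` and `numeric_means()` are hypothetical helper names, not functions from any package:

```r
# Hypothetical helper: number of NA values in each column of a data frame.
count_missing <- function(df) {
  colSums(is.na(df))
}

# Hypothetical helper: mean of every numeric column, skipping NA values.
# Date and character columns are left out because is.numeric() is FALSE for them.
numeric_means <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  colMeans(df[is_num], na.rm = TRUE)
}
```

For example, `count_missing(cagefreepercentages)` shows which column holds the NAs, and `numeric_means(cagefreepercentages)` averages only the numeric columns. With `na.rm = TRUE`, `colMeans()` drops NAs column by column, so each column's mean may rest on a different number of rows.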
Questions
Are fewer organic table eggs produced than non-organic ones?
Do farmers need more hens to produce the same number of organic eggs as non-organic eggs?
Is the hen-to-egg ratio similar across kinds of eggs?
When during the year does egg production peak, and when does it dip?
The production data run from July 2016 to February 2021. Do any unusual patterns appear around the time COVID-19 broke out globally in early 2020?
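The hen and ratio questions depend on a hen-to-egg ratio, which the data do not store directly. A minimal sketch, assuming the ratio of interest is monthly eggs per hen (`n_eggs / n_hens`); `eggs_per_hen()` is a hypothetical helper:

```r
# Hypothetical helper: mean monthly eggs per hen for every
# prod_type / prod_process combination.
eggs_per_hen <- function(df) {
  df$eggs_per_hen <- df$n_eggs / df$n_hens   # eggs laid per hen in that month
  aggregate(eggs_per_hen ~ prod_type + prod_process, data = df, FUN = mean)
}
```

`eggs_per_hen(eggproduction)` returns one row per `prod_type`/`prod_process` combination. It averages the monthly ratios; dividing total eggs by total hens instead would weight months with larger flocks more heavily, so the two numbers can differ slightly.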
Aggregate function
aggregate(n_eggs ~ prod_type, data = eggproduction, FUN = mean)
##       prod_type     n_eggs
## 1 hatching eggs 1168747273
## 2    table eggs 3085974349
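The one-factor `aggregate()` call above averages over every production process at once. Assuming, per the TidyTuesday data dictionary, that `prod_process` takes the values `"all"`, `"cage-free (organic)"` and `"cage-free (non-organic)"`, and that `"all"` covers every housing type, the organic question is better answered as a monthly share of the `"all"` total. Check the labels with `unique(eggproduction$prod_process)` first; `process_share()` is a hypothetical helper:

```r
# Hypothetical helper: for each month, the share of the "all" total for one
# production type that comes from a single production process.
# The "all" label is assumed from the TidyTuesday data dictionary.
process_share <- function(df, process, type = "table eggs") {
  keep <- c("observed_month", "n_eggs")
  total <- df[df$prod_type == type & df$prod_process == "all", keep]
  part  <- df[df$prod_type == type & df$prod_process == process, keep]
  # Inner join on month: only months present in both subsets are kept.
  joined <- merge(part, total, by = "observed_month",
                  suffixes = c("_part", "_all"))
  joined$share <- joined$n_eggs_part / joined$n_eggs_all
  joined[, c("observed_month", "share")]
}
```

Comparing `process_share(eggproduction, "cage-free (organic)")` with `process_share(eggproduction, "cage-free (non-organic)")` puts both kinds of cage-free eggs on the same scale. Because `merge()` performs an inner join here, months missing from either subset are dropped.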
Visual summary
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
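# Histogram of flock sizes (disabled). n_hens spans roughly 1.35e7 to 3.41e8, so binwidth = 200 would create over a million bins.
# ggplot(eggproduction, aes(x = n_hens)) + geom_histogram(binwidth = 2e7)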
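# One box per production process; ggplot2 groups by the discrete x aesthetic, so group_by() is not needed.
eggproduction |>
  ggplot(aes(x = prod_process, y = n_eggs)) +
  geom_boxplot() +
  labs(x = "Production process", y = "Eggs produced per month")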
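The boxplot pools all months together, so it cannot address the seasonal or COVID questions. A minimal sketch for both, assuming `observed_month` is always the last day of its month (the minimum, maximum and quartile dates in the summary all are), so its day of the month equals the number of days in that month. Dividing by that count keeps February from looking like a dip only because it is shorter. The helper names and the March 2020 cutoff are assumptions, not part of the original analysis:

```r
# Hypothetical helpers for the seasonality and COVID questions.
# Assumption: observed_month is always the last day of its month, so its
# day of the month equals the number of days in that month.
month_length <- function(dates) as.integer(format(dates, "%d"))

# Mean eggs per day for each calendar month (1 = January, ..., 12 = December).
seasonal_profile <- function(df) {
  df$month <- as.integer(format(df$observed_month, "%m"))
  df$eggs_per_day <- df$n_eggs / month_length(df$observed_month)
  aggregate(eggs_per_day ~ month, data = df, FUN = mean)
}

# Mean eggs per day before vs. on or after a cutoff date.
# The March 2020 default is an assumed start of the COVID period.
before_after <- function(df, cutoff = as.Date("2020-03-01")) {
  df$period <- ifelse(df$observed_month < cutoff, "before", "after")
  df$eggs_per_day <- df$n_eggs / month_length(df$observed_month)
  aggregate(eggs_per_day ~ period, data = df, FUN = mean)
}
```

Run each on one slice at a time, e.g. `seasonal_profile(subset(eggproduction, prod_type == "table eggs" & prod_process == "all"))`, since mixing types and processes would blend very different scales. The before/after split ignores any long-run trend, so comparing the same months year over year would be a stronger check.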