R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Numeric summary for 10 columns of data

library(ggplot2)
eggproduction  <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-11/egg-production.csv')
## Rows: 220 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): prod_type, prod_process, source
## dbl  (2): n_hens, n_eggs
## date (1): observed_month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cagefreepercentages <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-04-11/cage-free-percentages.csv')
## Rows: 96 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): source
## dbl  (2): percent_hens, percent_eggs
## date (1): observed_month
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# ggplot2 is already loaded above, so install it only when it is missing
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2", repos = "https://cran.r-project.org")
#eggproduction |> filter(n_hens < 350000000)
summary(eggproduction)
##  observed_month        prod_type         prod_process           n_hens         
##  Min.   :2016-07-31   Length:220         Length:220         Min.   : 13500000  
##  1st Qu.:2017-09-30   Class :character   Class :character   1st Qu.: 17284500  
##  Median :2018-11-15   Mode  :character   Mode  :character   Median : 59939500  
##  Mean   :2018-11-14                                         Mean   :110839873  
##  3rd Qu.:2019-12-31                                         3rd Qu.:125539250  
##  Max.   :2021-02-28                                         Max.   :341166000  
##      n_eggs             source         
##  Min.   :2.981e+08   Length:220        
##  1st Qu.:4.240e+08   Class :character  
##  Median :1.155e+09   Mode  :character  
##  Mean   :2.607e+09                     
##  3rd Qu.:2.963e+09                     
##  Max.   :8.601e+09
summary(cagefreepercentages)
##  observed_month        percent_hens    percent_eggs       source         
##  Min.   :2007-12-31   Min.   : 3.20   Min.   : 9.557   Length:96         
##  1st Qu.:2017-05-23   1st Qu.:13.46   1st Qu.:14.521   Class :character  
##  Median :2018-11-15   Median :17.30   Median :16.235   Mode  :character  
##  Mean   :2018-05-12   Mean   :17.95   Mean   :17.095                     
##  3rd Qu.:2020-02-28   3rd Qu.:23.46   3rd Qu.:19.460                     
##  Max.   :2021-02-28   Max.   :29.20   Max.   :24.546                     
##                                       NA's   :42
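
The last summary reports 42 missing values in percent_eggs. As a minimal sketch of how missing values could be counted per column, the block below uses a small made-up data frame (`toy_percentages` is a hypothetical stand-in, and its values are invented, not taken from the real file).

```r
# Toy stand-in for cagefreepercentages; the values are invented for illustration.
toy_percentages <- data.frame(
  percent_hens = c(3.2, 13.5, 17.3, 29.2),
  percent_eggs = c(NA, 14.5, NA, 24.5)
)

# is.na() returns a logical matrix; colSums() counts the TRUE entries per column.
na_counts <- colSums(is.na(toy_percentages))
print(na_counts)
```

Applied to the real data, the same call on cagefreepercentages should agree with the NA's row of summary().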

Questions

  1. Are fewer organic table eggs produced than nonorganic eggs?

  2. Do farmers need more hens to produce the same number of organic eggs as nonorganic eggs?

  3. Is the ratio of hens to eggs similar across the different kinds of egg?

  4. When does egg production spike during the year? When does it go down?

  5. The data run from July 2016 to February 2021. Are there any unusual patterns around the time COVID-19 broke out globally?
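
One possible way to approach questions 2 and 3 is to compute eggs per hen for each row and then average that ratio within each production process. The sketch below assumes the column names n_hens, n_eggs, and prod_process from the data. It runs on a made-up data frame (`toy_eggs` and `eggs_per_hen` are hypothetical names, and the counts are invented).

```r
# Toy stand-in for eggproduction; the counts are invented for illustration.
toy_eggs <- data.frame(
  prod_process = c("cage-free (organic)", "cage-free (organic)",
                   "cage-free (non-organic)", "cage-free (non-organic)"),
  n_hens = c(10, 20, 100, 200),
  n_eggs = c(200, 400, 2500, 5000)
)

# Row-level ratio: how many eggs each hen accounts for in that observation.
toy_eggs$eggs_per_hen <- toy_eggs$n_eggs / toy_eggs$n_hens

# Average the ratio within each production process.
ratio_by_process <- aggregate(eggs_per_hen ~ prod_process, data = toy_eggs, FUN = mean)
print(ratio_by_process)
```

A lower eggs_per_hen value for one process would suggest that more hens are needed for the same number of eggs, though the real data would have to confirm that.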

Aggregate function

aggregate(n_eggs ~ prod_type, data = eggproduction, FUN = mean)
##       prod_type     n_eggs
## 1 hatching eggs 1168747273
## 2    table eggs 3085974349
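
The means above pool every production process within each prod_type. As a hedged sketch, the formula interface of aggregate() also accepts more than one grouping variable, which would keep the processes apart. The example uses a made-up data frame (`toy_eggs` is hypothetical and its values are invented), and it assumes prod_process takes values such as "all" and "cage-free (organic)".

```r
# Toy stand-in for eggproduction; the values are invented for illustration.
toy_eggs <- data.frame(
  prod_type    = c("table eggs", "table eggs", "table eggs", "hatching eggs"),
  prod_process = c("all", "all", "cage-free (organic)", "all"),
  n_eggs       = c(100, 300, 10, 50)
)

# Two grouping variables on the right-hand side give one mean per combination.
mean_by_group <- aggregate(n_eggs ~ prod_type + prod_process, data = toy_eggs, FUN = mean)
print(mean_by_group)
```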

Visual summary

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#ggplot(eggproduction, aes(x = n_hens)) + geom_histogram(binwidth = 200)

# group_by() has no effect on the plot: the x aesthetic already splits the data by process
ggplot(eggproduction, aes(x = prod_process, y = n_eggs)) +
  geom_boxplot()
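
The summary of n_eggs spans roughly 3e+08 to 8.6e+09, so on a linear axis the boxes for the smaller groups may be squeezed toward the bottom. One option, sketched here on a made-up data frame (`toy_eggs` is hypothetical and its counts are invented), is a log-scaled y axis via scale_y_log10().

```r
library(ggplot2)

# Toy stand-in for eggproduction; the counts are invented for illustration.
toy_eggs <- data.frame(
  prod_process = rep(c("all", "cage-free (organic)"), each = 4),
  n_eggs = c(7.0e9, 7.5e9, 8.0e9, 8.6e9, 3.0e8, 3.5e8, 4.0e8, 4.5e8)
)

# scale_y_log10() spaces the axis by order of magnitude, so groups that
# differ by a factor of ten or more stay readable in the same panel.
p <- ggplot(toy_eggs, aes(x = prod_process, y = n_eggs)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = "Production process", y = "Number of eggs (log scale)")

# Call print(p) to draw the plot on the active graphics device.
```

Whether the log scale is the better choice depends on the question: it helps compare spread within each process, while the linear scale makes the absolute gap between processes more obvious.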