Challenge 3: Organic Egg Poultry Excel

library(here)
library(dplyr)
library(readxl)
library(tidyr)
library(tidyverse)
library(tibble)

First open up excel to preview the data. (I cannot seem to pull in an image into R otherwise I would to show what it looks like in excel)

Cleaning up the data

I see column A and G are blank columns. Also the four rows are not necessary to pull in. I will use the range function to pull in just from ‘B5:L125’ and name the columns with col_names. For column G that still gets pulled in, I will label that column ‘delete’ and delete it by selecting it and using !contains to delete that column.

eggPoultry <- here("challenge_datasets/organiceggpoultry.xls") %>%
  read_excel(range = "B6:L125", col_names= c("Date", "Doz Extra Large", ".5 Doz Extra Large", "Doz Large", ".5 Doz Large", "delete", "Whole Chicken", "B/S Breast", "Bone-in Breast", "Legs","Thighs"))%>%
  select(!contains("delete"))

head(eggPoultry)
## # A tibble: 6 × 10
##   Date     `Doz Extra Large` `.5 Doz Extra Large` `Doz Large` `.5 Doz Large`
##   <chr>                <dbl>                <dbl>       <dbl>          <dbl>
## 1 Jan 2004              230                  132         230            126 
## 2 February              230                  134.        226.           128.
## 3 March                 230                  137         225            131 
## 4 April                 234.                 137         225            131 
## 5 May                   236                  137         225            131 
## 6 June                  241                  137         231.           134.
## # ℹ 5 more variables: `Whole Chicken` <dbl>, `B/S Breast` <dbl>,
## #   `Bone-in Breast` <chr>, Legs <dbl>, Thighs <chr>

Clean up columns from character to numeric

There are cells with the text ‘too few’. I assume this means there is too few data to sold within poultry to be logged into this data. I assume this text in the cell is causing two columns (Bone-in Breast & Thighs) to be categorized as character rather than numeric.

I to use mutate to replace the character columns with as.numeric to change to numeric.

eggPoultry <- eggPoultry %>%
  mutate(`Bone-in Breast` = as.numeric(`Bone-in Breast`),
         Thighs = as.numeric(Thighs))
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `Bone-in Breast = as.numeric(`Bone-in Breast`)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
head(eggPoultry)
## # A tibble: 6 × 10
##   Date     `Doz Extra Large` `.5 Doz Extra Large` `Doz Large` `.5 Doz Large`
##   <chr>                <dbl>                <dbl>       <dbl>          <dbl>
## 1 Jan 2004              230                  132         230            126 
## 2 February              230                  134.        226.           128.
## 3 March                 230                  137         225            131 
## 4 April                 234.                 137         225            131 
## 5 May                   236                  137         225            131 
## 6 June                  241                  137         231.           134.
## # ℹ 5 more variables: `Whole Chicken` <dbl>, `B/S Breast` <dbl>,
## #   `Bone-in Breast` <dbl>, Legs <dbl>, Thighs <dbl>

Adding a Year column

First The data column has the year only for January, but I want a column with all the years. I use mutate to have a new column made of year to be if the str_detect finds “Jan” and then gsub can subs ‘Jan’ for blank to just get the year. To use fill to fill the year down the column.

In class, str_detect was mentioned so I dug into string functions and found a str_replace. I am sure there is a more effective way to remove the years in the date column (which contains month), but the str_replace is able to remove the occurance in the date column with nothing and I indicated all the years to be removed in reach row.

eggPoultry <- eggPoultry %>%
  mutate(year = ifelse(str_detect(Date, "Jan"), gsub("Jan", "", Date), NA)) %>% 
  fill(year, .direction = "down")
eggPoultryClean <- eggPoultry %>%
 mutate(
    Date = str_replace(Date, "2004", ""),
    Date = str_replace(Date, "2005", ""), 
    Date = str_replace(Date, "2006", ""), 
    Date = str_replace(Date, "2007", ""), 
    Date = str_replace(Date, "2008", ""), 
    Date = str_replace(Date, "2009", ""), 
    Date = str_replace(Date, "2010", ""), 
    Date = str_replace(Date, "2011", ""), 
    Date = str_replace(Date, "2012", ""), 
    Date = str_replace(Date, "2013", ""), 
    Date = str_replace(Date, "/1", ""))

head(eggPoultryClean)
## # A tibble: 6 × 11
##   Date       `Doz Extra Large` `.5 Doz Extra Large` `Doz Large` `.5 Doz Large`
##   <chr>                  <dbl>                <dbl>       <dbl>          <dbl>
## 1 "Jan "                  230                  132         230            126 
## 2 "February"              230                  134.        226.           128.
## 3 "March"                 230                  137         225            131 
## 4 "April"                 234.                 137         225            131 
## 5 "May"                   236                  137         225            131 
## 6 "June"                  241                  137         231.           134.
## # ℹ 6 more variables: `Whole Chicken` <dbl>, `B/S Breast` <dbl>,
## #   `Bone-in Breast` <dbl>, Legs <dbl>, Thighs <dbl>, year <chr>

Pivot Longer

“According to Wickham & Grolemund, there are three principles for tidy data.

Each variable must have its own column.

Each observation must have its own row.

Each value must have its own cell.”

In the eggPoultry data set, each observation does not have its own row. The goal is to make the table longer so eggs and chicken products are not both occupying a row together. To do this, I will use pivot_longer and list out all the products to be put into a column named ‘Product’ with the prices of each product going in a column named ‘Price’

eggPoultryLong <- eggPoultryClean %>% 
  pivot_longer(c("Doz Extra Large", ".5 Doz Extra Large", "Doz Large", ".5 Doz Large", "Whole Chicken", "B/S Breast", "Bone-in Breast", "Legs","Thighs"), names_to = "Product", values_to = "Price")

head(eggPoultryLong)
## # A tibble: 6 × 4
##   Date   year    Product            Price
##   <chr>  <chr>   <chr>              <dbl>
## 1 "Jan " " 2004" Doz Extra Large     230 
## 2 "Jan " " 2004" .5 Doz Extra Large  132 
## 3 "Jan " " 2004" Doz Large           230 
## 4 "Jan " " 2004" .5 Doz Large        126 
## 5 "Jan " " 2004" Whole Chicken       198.
## 6 "Jan " " 2004" B/S Breast          646.

It is missing the unit (price per carton for eggs and price cents per pound for chicken), but I am not sure how to add a column with the data when it is was not column in the original data set but noted at the top of the excel.