Read in Data

For this challenge, I will be using poultry_tidy.xlsx

poultry_tidy <- read_xlsx("../challenge_datasets/poultry_tidy.xlsx")
head(poultry_tidy)
## # A tibble: 6 × 4
##   Product  Year Month    Price_Dollar
##   <chr>   <dbl> <chr>           <dbl>
## 1 Whole    2013 January          2.38
## 2 Whole    2013 February         2.38
## 3 Whole    2013 March            2.38
## 4 Whole    2013 April            2.38
## 5 Whole    2013 May              2.38
## 6 Whole    2013 June             2.38

Tidy

As described by the title of the dataset and what is observed from the head command, the data is already tidied.

Identify variables that need to be mutated

Observed by the header I would mutate so the dataset has a standard date format. I could mutate the column so that the dates are in a standard YYYY-MM format. The price needs to be a decimal format variable. And each column needs to be an appropriate data type. I had to look up what B/S meant in poultry terms so will be standardizing it this way: “B/S” = “Boneless Skinless”, “Breast”=“Breast Meat” to make it clearer on the data.

Further more, I would make a new column with broader categories of meat for maybe a broader analysis.

I took some time to read up on stringR library to utilize for this challenge to make things a bit easier.

months <- c("January", "February", "March", "April", "May", "June", 
            "July", "August", "September", "October", "November", "December")

poultry_tidy <- poultry_tidy %>%
  mutate(Month = match(Month, months)) %>% 
  mutate(Date = paste0(Year, "-", Month)) %>% 
  mutate(Year = as.integer(substr(Date, 1, 4)),
         Month = as.integer(substr(Date, 6, 7))) 

  poultry_tidy <- poultry_tidy %>%
    mutate(Date = paste(Year, Month, "01", sep = "-")) %>% 
    mutate(Date = ymd(Date))
poultry_tidy
## # A tibble: 600 × 5
##    Product  Year Month Price_Dollar Date      
##    <chr>   <int> <int>        <dbl> <date>    
##  1 Whole    2013     1         2.38 2013-01-01
##  2 Whole    2013     2         2.38 2013-02-01
##  3 Whole    2013     3         2.38 2013-03-01
##  4 Whole    2013     4         2.38 2013-04-01
##  5 Whole    2013     5         2.38 2013-05-01
##  6 Whole    2013     6         2.38 2013-06-01
##  7 Whole    2013     7         2.38 2013-07-01
##  8 Whole    2013     8         2.38 2013-08-01
##  9 Whole    2013     9         2.38 2013-09-01
## 10 Whole    2013    10         2.38 2013-10-01
## # ℹ 590 more rows
# Further processing
poultry_tidy$Product <- str_replace_all(poultry_tidy$Product, c("B/S" = "Boneless Skinless", "Breast"="Breast Meat"))
poultry_tidy$Price_Dollar <- as.numeric(poultry_tidy$Price_Dollar)

poultry_tidy
## # A tibble: 600 × 5
##    Product  Year Month Price_Dollar Date      
##    <chr>   <int> <int>        <dbl> <date>    
##  1 Whole    2013     1         2.38 2013-01-01
##  2 Whole    2013     2         2.38 2013-02-01
##  3 Whole    2013     3         2.38 2013-03-01
##  4 Whole    2013     4         2.38 2013-04-01
##  5 Whole    2013     5         2.38 2013-05-01
##  6 Whole    2013     6         2.38 2013-06-01
##  7 Whole    2013     7         2.38 2013-07-01
##  8 Whole    2013     8         2.38 2013-08-01
##  9 Whole    2013     9         2.38 2013-09-01
## 10 Whole    2013    10         2.38 2013-10-01
## # ℹ 590 more rows
table(poultry_tidy$Product) 
## 
##           Bone-in Breast Meat Boneless Skinless Breast Meat 
##                           120                           120 
##                        Thighs                         Whole 
##                           120                           120 
##                    Whole Legs 
##                           120

As we can observe the names have been repplaced and now there are columns with a new date table.

poultry_tidy <- poultry_tidy %>%

  mutate(Product_Group = case_when(
    str_detect(Product, "Breast") ~ "Breast",
    str_detect(Product, "Leg") | Product == "Thighs" ~ "Leg",
    TRUE ~ "Whole")) %>%

  arrange(Product, Date) %>% 
  
  group_by(Product) %>%
  
  arrange(Date) %>%

  mutate(Prev_Price = lag(Price_Dollar)) %>%
  
  ungroup() %>%

  mutate(Price_MoM_Change = Price_Dollar - Prev_Price) %>%

  arrange(Product, Date) 

table(poultry_tidy$Product_Group) 
## 
## Breast    Leg  Whole 
##    240    240    120
poultry_tidy %>%
  select(Product, Year, Month, Price_Dollar, Date, Product_Group, Prev_Price, Price_MoM_Change)
## # A tibble: 600 × 8
##    Product           Year Month Price_Dollar Date       Product_Group Prev_Price
##    <chr>            <int> <int>        <dbl> <date>     <chr>              <dbl>
##  1 Bone-in Breast …  2004     1        NA    2004-01-01 Breast             NA   
##  2 Bone-in Breast …  2004     2        NA    2004-02-01 Breast             NA   
##  3 Bone-in Breast …  2004     3        NA    2004-03-01 Breast             NA   
##  4 Bone-in Breast …  2004     4        NA    2004-04-01 Breast             NA   
##  5 Bone-in Breast …  2004     5        NA    2004-05-01 Breast             NA   
##  6 Bone-in Breast …  2004     6        NA    2004-06-01 Breast             NA   
##  7 Bone-in Breast …  2004     7         3.90 2004-07-01 Breast             NA   
##  8 Bone-in Breast …  2004     8         3.90 2004-08-01 Breast              3.90
##  9 Bone-in Breast …  2004     9         3.90 2004-09-01 Breast              3.90
## 10 Bone-in Breast …  2004    10         3.90 2004-10-01 Breast              3.90
## # ℹ 590 more rows
## # ℹ 1 more variable: Price_MoM_Change <dbl>

We can further see that I added a prev_price column to display and it was orgnaized so its in order of the product and its year-month so we can observe sequentially the changes.