For this challenge, I will be using poultry_tidy.xlsx
poultry_tidy <- read_xlsx("../challenge_datasets/poultry_tidy.xlsx")
head(poultry_tidy)
## # A tibble: 6 × 4
## Product Year Month Price_Dollar
## <chr> <dbl> <chr> <dbl>
## 1 Whole 2013 January 2.38
## 2 Whole 2013 February 2.38
## 3 Whole 2013 March 2.38
## 4 Whole 2013 April 2.38
## 5 Whole 2013 May 2.38
## 6 Whole 2013 June 2.38
As described by the title of the dataset and what is observed from the head command, the data is already tidied.
Observed by the header I would mutate so the dataset has a standard date format. I could mutate the column so that the dates are in a standard YYYY-MM format. The price needs to be a decimal format variable. And each column needs to be an appropriate data type. I had to look up what B/S meant in poultry terms so will be standardizing it this way: “B/S” = “Boneless Skinless”, “Breast”=“Breast Meat” to make it clearer on the data.
Further more, I would make a new column with broader categories of meat for maybe a broader analysis.
I took some time to read up on stringR library to utilize for this challenge to make things a bit easier.
months <- c("January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December")
poultry_tidy <- poultry_tidy %>%
mutate(Month = match(Month, months)) %>%
mutate(Date = paste0(Year, "-", Month)) %>%
mutate(Year = as.integer(substr(Date, 1, 4)),
Month = as.integer(substr(Date, 6, 7)))
poultry_tidy <- poultry_tidy %>%
mutate(Date = paste(Year, Month, "01", sep = "-")) %>%
mutate(Date = ymd(Date))
poultry_tidy
## # A tibble: 600 × 5
## Product Year Month Price_Dollar Date
## <chr> <int> <int> <dbl> <date>
## 1 Whole 2013 1 2.38 2013-01-01
## 2 Whole 2013 2 2.38 2013-02-01
## 3 Whole 2013 3 2.38 2013-03-01
## 4 Whole 2013 4 2.38 2013-04-01
## 5 Whole 2013 5 2.38 2013-05-01
## 6 Whole 2013 6 2.38 2013-06-01
## 7 Whole 2013 7 2.38 2013-07-01
## 8 Whole 2013 8 2.38 2013-08-01
## 9 Whole 2013 9 2.38 2013-09-01
## 10 Whole 2013 10 2.38 2013-10-01
## # ℹ 590 more rows
# Further processing
poultry_tidy$Product <- str_replace_all(poultry_tidy$Product, c("B/S" = "Boneless Skinless", "Breast"="Breast Meat"))
poultry_tidy$Price_Dollar <- as.numeric(poultry_tidy$Price_Dollar)
poultry_tidy
## # A tibble: 600 × 5
## Product Year Month Price_Dollar Date
## <chr> <int> <int> <dbl> <date>
## 1 Whole 2013 1 2.38 2013-01-01
## 2 Whole 2013 2 2.38 2013-02-01
## 3 Whole 2013 3 2.38 2013-03-01
## 4 Whole 2013 4 2.38 2013-04-01
## 5 Whole 2013 5 2.38 2013-05-01
## 6 Whole 2013 6 2.38 2013-06-01
## 7 Whole 2013 7 2.38 2013-07-01
## 8 Whole 2013 8 2.38 2013-08-01
## 9 Whole 2013 9 2.38 2013-09-01
## 10 Whole 2013 10 2.38 2013-10-01
## # ℹ 590 more rows
table(poultry_tidy$Product)
##
## Bone-in Breast Meat Boneless Skinless Breast Meat
## 120 120
## Thighs Whole
## 120 120
## Whole Legs
## 120
As we can observe the names have been repplaced and now there are columns with a new date table.
poultry_tidy <- poultry_tidy %>%
mutate(Product_Group = case_when(
str_detect(Product, "Breast") ~ "Breast",
str_detect(Product, "Leg") | Product == "Thighs" ~ "Leg",
TRUE ~ "Whole")) %>%
arrange(Product, Date) %>%
group_by(Product) %>%
arrange(Date) %>%
mutate(Prev_Price = lag(Price_Dollar)) %>%
ungroup() %>%
mutate(Price_MoM_Change = Price_Dollar - Prev_Price) %>%
arrange(Product, Date)
table(poultry_tidy$Product_Group)
##
## Breast Leg Whole
## 240 240 120
poultry_tidy %>%
select(Product, Year, Month, Price_Dollar, Date, Product_Group, Prev_Price, Price_MoM_Change)
## # A tibble: 600 × 8
## Product Year Month Price_Dollar Date Product_Group Prev_Price
## <chr> <int> <int> <dbl> <date> <chr> <dbl>
## 1 Bone-in Breast … 2004 1 NA 2004-01-01 Breast NA
## 2 Bone-in Breast … 2004 2 NA 2004-02-01 Breast NA
## 3 Bone-in Breast … 2004 3 NA 2004-03-01 Breast NA
## 4 Bone-in Breast … 2004 4 NA 2004-04-01 Breast NA
## 5 Bone-in Breast … 2004 5 NA 2004-05-01 Breast NA
## 6 Bone-in Breast … 2004 6 NA 2004-06-01 Breast NA
## 7 Bone-in Breast … 2004 7 3.90 2004-07-01 Breast NA
## 8 Bone-in Breast … 2004 8 3.90 2004-08-01 Breast 3.90
## 9 Bone-in Breast … 2004 9 3.90 2004-09-01 Breast 3.90
## 10 Bone-in Breast … 2004 10 3.90 2004-10-01 Breast 3.90
## # ℹ 590 more rows
## # ℹ 1 more variable: Price_MoM_Change <dbl>
We can further see that I added a prev_price column to display and it was orgnaized so its in order of the product and its year-month so we can observe sequentially the changes.