# A tibble: 600 × 4
Product Year Month Price_Dollar
<chr> <dbl> <chr> <dbl>
1 Whole 2013 January 2.38
2 Whole 2013 February 2.38
3 Whole 2013 March 2.38
4 Whole 2013 April 2.38
5 Whole 2013 May 2.38
6 Whole 2013 June 2.38
7 Whole 2013 July 2.38
8 Whole 2013 August 2.38
9 Whole 2013 September 2.38
10 Whole 2013 October 2.38
# ℹ 590 more rows
Briefly describe the data
The data set records monthly prices, in dollars, for 5 poultry products from 2004 to 2013. With one row per product per month, that gives 5 × 10 × 12 = 600 rows.
Tidy Data (as needed)
Some variables need to be mutated, but the data set itself does not need to be tidied. Each row already represents a single observation: one poultry product with its corresponding time and price, which is the layout we want.
Identify variables that need to be mutated
The first column, “Product”, is stored as a character string, but its values are categorical, so we convert it to a factor. There are exactly 5 products:
# A tibble: 600 × 4
Product Year Month Price_Dollar
<fct> <dbl> <chr> <dbl>
1 Whole 2013 January 2.38
2 Whole 2013 February 2.38
3 Whole 2013 March 2.38
4 Whole 2013 April 2.38
5 Whole 2013 May 2.38
6 Whole 2013 June 2.38
7 Whole 2013 July 2.38
8 Whole 2013 August 2.38
9 Whole 2013 September 2.38
10 Whole 2013 October 2.38
# ℹ 590 more rows
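The conversion behind the <fct> column shown above can be sketched roughly as follows. This is a minimal illustration on a toy tibble, not the original code: the object names poultry_toy and poultry_fct are assumptions, and the product "Thighs" and its price are invented placeholders, since only "Whole" appears in the printed rows.

```r
library(dplyr)

# Toy stand-in for the poultry data.
# "Thighs" and its price are hypothetical; only "Whole" is shown in the output above.
poultry_toy <- tibble(
  Product      = c("Whole", "Whole", "Thighs"),
  Year         = c(2013, 2013, 2013),
  Month        = c("January", "February", "January"),
  Price_Dollar = c(2.38, 2.38, 2.00)
)

# Convert the character column to a factor so it is treated as categorical.
poultry_fct <- poultry_toy %>%
  mutate(Product = as.factor(Product))

# Inspect the distinct categories.
levels(poultry_fct$Product)
```

Calling levels() on the factor column is one way to confirm how many distinct products there are.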
Next, we see that the Year and Month columns together describe the date on which the product’s price was recorded. For analysis, a single Date column is more convenient. To build it, we can use the make_date function from lubridate. Because Month holds month names rather than numbers, we first use match to look up each name’s index in the built-in month.name vector. We then pass Year and the month number to make_date, which returns the corresponding date with the day defaulting to the 1st of the month. Lastly, we drop the Year and Month columns, as they are now redundant.
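The steps just described might look like the sketch below. It runs on a three-row toy tibble copied from the printed output; the names poultry_toy and poultry_dated are assumptions for illustration, not the original code.

```r
library(dplyr)
library(lubridate)

# Toy rows copied from the first lines of the printed tibble.
poultry_toy <- tibble(
  Product      = factor(c("Whole", "Whole", "Whole")),
  Year         = c(2013, 2013, 2013),
  Month        = c("January", "February", "March"),
  Price_Dollar = c(2.38, 2.38, 2.38)
)

poultry_dated <- poultry_toy %>%
  mutate(
    # match() returns the position of each month name in month.name (1 to 12);
    # make_date() defaults day to 1, so each date is the 1st of the month.
    Date = make_date(year = Year, month = match(Month, month.name))
  ) %>%
  # Year and Month are now redundant.
  select(-Year, -Month)

poultry_dated
```

Note that match returns NA for any month name that is misspelled or abbreviated, so checking the new Date column for missing values is a cheap sanity test.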
Now the data is tidy and each variable is in a form suited to analysis. No change is needed for Price_Dollar: its values are continuous, so storing them as doubles is appropriate.