R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Objective

The objective of this applied project is to describe the price of avocados using basic descriptive analysis. Producers, retailers, and grocers can use this kind of analysis to inform decisions about pricing, advertising, and supply chain strategies, among others. Additional analysis will follow in a later episode. Your feedback is highly appreciated.

Dataset - Weekly avocado sales and price data from the Hass Avocado Board website

This data was downloaded from the Hass Avocado Board website and compiled into a single CSV file. As the year column in the summary below shows, it covers 2017 through 2020. Here’s how the Hass Avocado Board describes the data on their website:

“The table below represents weekly retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.”
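Because the Board reports Average Price per avocado and volume in units, multiplying the two gives a rough estimate of weekly retail dollar sales for each row. The sketch below assumes the column names average_price and total_volume used in the CSV loaded below. The helper estimate_revenue() and the est_revenue column are hypothetical names introduced here, and the result is only approximate because average_price is itself an average. The two example rows are the first two rows printed by str(data) further down.

```r
# Rough weekly dollar sales per row. Assumes average_price is a per-avocado
# price and total_volume counts avocados, as the Hass Avocado Board describes.
estimate_revenue <- function(df) {
  df$est_revenue <- df$average_price * df$total_volume
  df
}

# First two rows printed by str(data) later in this document
toy <- data.frame(
  average_price = c(1.39, 1.44),
  total_volume  = c(139970L, 3577L),
  type          = c("conventional", "organic")
)
toy <- estimate_revenue(toy)
print(toy)
```

With the full data set loaded, data <- estimate_revenue(data) would add the estimate to every row.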

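# Read the compiled Hass Avocado Board data directly from GitHub
urlfile <- 'https://raw.github.com/utjimmyx/resources/master/avocado_HAA.csv'
# fileEncoding = "UTF-8-BOM" drops the byte-order mark so the first column is named "date"
data <- read.csv(urlfile, fileEncoding = "UTF-8-BOM")
summary(data)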
##      date           average_price    total_volume         type          
##  Length:12628       Min.   :0.500   Min.   :    253   Length:12628      
##  Class :character   1st Qu.:1.100   1st Qu.:  15733   Class :character  
##  Mode  :character   Median :1.320   Median :  94806   Mode  :character  
##                     Mean   :1.359   Mean   : 325259                     
##                     3rd Qu.:1.570   3rd Qu.: 430222                     
##                     Max.   :2.780   Max.   :5660216                     
##       year       geography        
##  Min.   :2017   Length:12628      
##  1st Qu.:2018   Class :character  
##  Median :2019   Mode  :character  
##  Mean   :2019                     
##  3rd Qu.:2020                     
##  Max.   :2020
# Inspect the type of each column
str(data)
## 'data.frame':    12628 obs. of  6 variables:
##  $ date         : chr  "2017/12/3" "2017/12/3" "2017/12/3" "2017/12/3" ...
##  $ average_price: num  1.39 1.44 1.07 1.62 1.43 1.58 1.14 1.77 1.4 1.88 ...
##  $ total_volume : int  139970 3577 504933 10609 658939 38754 86646 1829 488588 21338 ...
##  $ type         : chr  "conventional" "organic" "conventional" "organic" ...
##  $ year         : int  2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
##  $ geography    : chr  "Albany" "Albany" "Atlanta" "Atlanta" ...
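str() shows that date is stored as character. A minimal sketch for converting it to R's Date class, assuming every value follows the year/month/day pattern seen above (e.g. "2017/12/3"). The helper parse_dates() is hypothetical, and the second example date is made up only to show that consecutive weekly dates then differ by 7 days:

```r
# Convert "YYYY/M/D" character dates to R's Date class
parse_dates <- function(df) {
  df$date <- as.Date(df$date, format = "%Y/%m/%d")
  df
}

# "2017/12/3" comes from str(data); "2017/12/10" is a made-up next week
ex <- data.frame(date = c("2017/12/3", "2017/12/10"), stringsAsFactors = FALSE)
ex <- parse_dates(ex)
str(ex)
# Gap between the two dates, in days
print(as.numeric(diff(ex$date)))
```

Once converted, dates sort and plot chronologically; on the full data this would be data <- parse_dates(data).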

Let’s build a simple histogram of the average price.

# Base R histogram of the per-avocado average price
hist(data$average_price, main = "Histogram of Average Price", xlab = "Price in USD (US Dollar)")
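hist() chooses bin edges and counts how many prices fall in each bin. Setting plot = FALSE returns those counts instead of drawing them, which is a quick way to check what the bars represent. This sketch uses the first ten prices printed by str(data) above and breaks every $0.25, an arbitrary choice for illustration:

```r
# First ten average prices printed by str(data)
prices <- c(1.39, 1.44, 1.07, 1.62, 1.43, 1.58, 1.14, 1.77, 1.4, 1.88)

# Bin edges every $0.25 (illustrative); plot = FALSE skips drawing
h <- hist(prices, breaks = seq(1.0, 2.0, by = 0.25), plot = FALSE)

# One row per bin; bins are right-closed by default (right = TRUE)
bins <- data.frame(lower = head(h$breaks, -1),
                   upper = tail(h$breaks, -1),
                   count = h$counts)
print(bins)
```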

Simple EDA with ggplot

library(ggplot2)
# Stacked histogram of average price, with bars colored by avocado type
ggplot(data, aes(x = average_price, fill = type)) +
  geom_histogram(bins = 30, col = "black") +
  scale_fill_manual(values = c("blue", "green")) +
  ggtitle("Frequency of Average Price - Organic vs. Conventional")
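The stacked histogram compares the two types visually. A numeric check is the median price per type. A minimal sketch with base R's aggregate(); price_by_type() is a hypothetical helper, and the toy rows are the first four rows printed by str(data):

```r
# Median average price for each avocado type
price_by_type <- function(df) {
  aggregate(average_price ~ type, data = df, FUN = median)
}

# First four rows printed by str(data)
toy <- data.frame(
  average_price = c(1.39, 1.44, 1.07, 1.62),
  type          = c("conventional", "organic", "conventional", "organic")
)
res <- price_by_type(toy)
print(res)
```

On the full data, price_by_type(data) returns one median per type.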

Sample response for year 2017: the plot below shows that Los Angeles had the highest total sales volume in 2017.

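# Keep only 2017 so the bars answer the 2017 question
data_2017 <- subset(data, year == 2017)
ggplot(data_2017, aes(x = reorder(geography, total_volume, FUN = sum),
                      y = total_volume)) +
  geom_col() +  # weekly rows stack, so bar height = total 2017 volume
  xlab("geography") +
  ylab("total_volume") +
  theme(axis.text.x = element_text(angle = 90, size = 7))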
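To check the ranking behind the bar chart numerically, sum total_volume per geography for one year and sort. A minimal sketch; top_geographies() is a hypothetical helper. If the geography column also contains regional or national aggregates rather than only metro areas, those rows would top the list, so it is worth inspecting unique(data$geography) first. The toy rows are the first four rows printed by str(data), all from 2017:

```r
# Total volume per geography for one year, largest first
top_geographies <- function(df, yr, n = 5) {
  df_year <- df[df$year == yr, ]
  # Convert to double so sums over many weeks cannot overflow R's integer type
  df_year$total_volume <- as.numeric(df_year$total_volume)
  totals <- aggregate(total_volume ~ geography, data = df_year, FUN = sum)
  head(totals[order(totals$total_volume, decreasing = TRUE), ], n)
}

# First four rows printed by str(data)
toy <- data.frame(
  total_volume = c(139970L, 3577L, 504933L, 10609L),
  year         = 2017L,
  geography    = c("Albany", "Albany", "Atlanta", "Atlanta")
)
res <- top_geographies(toy, 2017)
print(res)
```

On the full data, top_geographies(data, 2017) lists the five largest geographies for that year.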

The plot generated below is a shadow histogram. Like the ggplot histogram above, it shows the distribution of average price, but ShadowHist() draws a separate panel for each avocado type and plots that type's distribution over a shadow of the overall distribution for all avocados. Conventional avocados are concentrated at lower average prices than organic avocados.

library(WVPlots) 
## Loading required package: wrapr
# One panel per type; binwidth = 0.01 gives one-cent price bins
ShadowHist(data, "average_price", "type", "Average Price Distribution for Conventional vs. Organic", binwidth = 0.01)
## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the WVPlots package.
##   Please report the issue at <https://github.com/WinVector/WVPlots/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
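If WVPlots is not available, a similar shadow histogram can be sketched in ggplot2 alone. This is an approximation of the idea, not WVPlots' implementation: a layer whose data lacks the faceting variable type is repeated in every facet, so it serves as the grey shadow of all prices behind each type's histogram. shadow_hist_gg() is a hypothetical helper, and the toy rows are the first four rows printed by str(data):

```r
library(ggplot2)

# Faceted histogram with a grey "shadow" of all prices behind each panel
shadow_hist_gg <- function(df, binwidth = 0.01) {
  # This layer's data has no 'type' column, so facet_wrap() repeats it
  # in every panel as the shadow of the overall distribution
  all_prices <- df[, "average_price", drop = FALSE]
  ggplot(df, aes(x = average_price)) +
    geom_histogram(data = all_prices, binwidth = binwidth, fill = "grey80") +
    geom_histogram(aes(fill = type), binwidth = binwidth) +
    facet_wrap(~ type, ncol = 1) +
    labs(x = "Average price (USD)", y = "Count") +
    theme(legend.position = "none")
}

# First four rows printed by str(data)
toy <- data.frame(
  average_price = c(1.39, 1.44, 1.07, 1.62),
  type          = c("conventional", "organic", "conventional", "organic")
)
p <- shadow_hist_gg(toy)
```

On the full data, shadow_hist_gg(data) draws the conventional and organic panels stacked vertically.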

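The figure below is a scatter plot of total sales volume against average price. Higher average prices tend to go with lower total volume. Because each point is one week for one geography and avocado type, this pattern also reflects differences between large and small markets and between conventional and organic avocados, so it shows an association rather than the effect of price alone.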

# Each point is one week for one geography and avocado type
ggplot(data, aes(x = average_price, y = total_volume)) +
  geom_point() +
  ggtitle("Total Sales as a Function of Average Price")
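Because total_volume is heavily right-skewed (a median of 94,806 versus a mean of 325,259 in the summary above), a rank-based Spearman correlation summarizes the price-volume relationship more robustly than Pearson's. Computing it within each type shows whether the negative association holds for conventional and organic avocados separately, not only across types. A minimal sketch; price_volume_cor() is a hypothetical helper, and the four toy rows (the first four from str(data)) only check that the code runs, since two points per type always give a correlation of ±1:

```r
# Spearman (rank) correlation of price and volume: pooled and within each type
price_volume_cor <- function(df) {
  overall <- cor(df$average_price, df$total_volume, method = "spearman")
  by_type <- sapply(split(df, df$type), function(d) {
    cor(d$average_price, d$total_volume, method = "spearman")
  })
  c(overall = overall, by_type)
}

# First four rows printed by str(data); too few points for a real conclusion
toy <- data.frame(
  average_price = c(1.39, 1.44, 1.07, 1.62),
  total_volume  = c(139970L, 3577L, 504933L, 10609L),
  type          = c("conventional", "organic", "conventional", "organic")
)
r <- price_volume_cor(toy)
print(r)
```

On the full data, price_volume_cor(data) returns the pooled value followed by one value per type.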