R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Note: this analysis was performed using the open-source software R and RStudio.

Objective

The objective of this applied project is to explain the price of avocados using basic descriptive analysis. Producers, retailers, and grocers can use this analysis to make decisions about their pricing, advertising, and supply chain strategies, among others. Additional analysis will follow after this episode. Your feedback is highly appreciated.

Dataset - weekly avocado sales and price data from the Hass Avocado Board website

This data was downloaded from the Hass Avocado Board website in May of 2018 and compiled into a single CSV. Here’s how the Hass Avocado Board describes the data on their website:

The table below represents weekly retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.

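# URL of a remote copy of the data. It is defined but not used below;
# read.csv(urlfile) would load the remote file instead of the local one.
urlfile <- 'https://raw.github.com/utjimmyx/resources/master/avocado_HAA.csv'
data <- read.csv("avocado.csv")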
summary(data)
##      date           average_price    total_volume         type          
##  Length:12628       Min.   :0.500   Min.   :    253   Length:12628      
##  Class :character   1st Qu.:1.100   1st Qu.:  15733   Class :character  
##  Mode  :character   Median :1.320   Median :  94806   Mode  :character  
##                     Mean   :1.359   Mean   : 325259                     
##                     3rd Qu.:1.570   3rd Qu.: 430222                     
##                     Max.   :2.780   Max.   :5660216                     
##       year       geography            Mileage    
##  Min.   :2017   Length:12628       Min.   : 111  
##  1st Qu.:2018   Class :character   1st Qu.:1097  
##  Median :2019   Mode  :character   Median :2193  
##  Mean   :2019                      Mean   :1911  
##  3rd Qu.:2020                      3rd Qu.:2632  
##  Max.   :2020                      Max.   :2998
library(plyr)
str(data)
## 'data.frame':    12628 obs. of  7 variables:
##  $ date         : chr  "2017/12/3" "2017/12/3" "2017/12/3" "2017/12/3" ...
##  $ average_price: num  1.39 1.44 1.07 1.62 1.43 1.58 1.14 1.77 1.4 1.88 ...
##  $ total_volume : int  139970 3577 504933 10609 658939 38754 86646 1829 488588 21338 ...
##  $ type         : chr  "conventional" "organic" "conventional" "organic" ...
##  $ year         : int  2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
##  $ geography    : chr  "Albany" "Albany" "Atlanta" "Atlanta" ...
##  $ Mileage      : int  2832 2832 2199 2199 2679 2679 827 827 2998 2998 ...
# Let's build a simple histogram
hist(data$average_price,
     main = "Histogram of average_price",
     xlab = "Price in USD (US Dollar)")

library(ggplot2)
ggplot(data, aes(x = average_price, fill = type)) +
  geom_histogram(bins = 30, col = "purple") +
  scale_fill_manual(values = c("blue", "pink")) +
  ggtitle("Frequency of Average Price - Organic vs. Conventional")

# Simple EDA (exploratory data analysis) with ggplot
ggplot(data, aes(x = reorder(geography, total_volume),
                 y = total_volume, fill = year)) +
  geom_col() +
  xlab("geography") +
  ylab("total_volume") +
  theme(axis.text.x = element_text(angle = 90, size = 7))

# Sample response for year 2017 - The plot shows that Los Angeles has the highest sales volume in 2017.
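
Because the bar chart stacks all years into one bar per city, a per-year table can make a claim like the one above easier to check. The following is a minimal sketch in base R. It uses only the first four rows visible in the str() output (one week, two cities) as an illustrative sample, and the names sample_rows, rows_2017, and volume_by_city are hypothetical. Applying the same calls to data should give the per-city totals for the full data set.

```r
# First four rows shown in the str() output (illustrative sample only,
# not the full data set)
sample_rows <- data.frame(
  total_volume = c(139970, 3577, 504933, 10609),
  type         = c("conventional", "organic", "conventional", "organic"),
  year         = c(2017, 2017, 2017, 2017),
  geography    = c("Albany", "Albany", "Atlanta", "Atlanta")
)

# Keep a single year, then sum the units sold per city
rows_2017 <- subset(sample_rows, year == 2017)
volume_by_city <- aggregate(total_volume ~ geography, data = rows_2017, FUN = sum)

# Sort so that the city with the most units sold comes first
volume_by_city <- volume_by_city[order(-volume_by_city$total_volume), ]
print(volume_by_city)
```

Changing the year in subset() to 2018 would produce the matching table for that year.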

Questions

1. Refer to your report and answer this question: How do the prices of organic and conventional avocados compare? Any other findings?

Conventional avocados are lower in price but higher in sales than organic avocados. Another finding is that at higher price points the count of conventional avocados drops below the count of organic avocados, even though both are at the same price.
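
The price comparison can also be put into numbers rather than read off the histogram. The sketch below is a base R example under stated assumptions: it uses only the first four rows visible in the str() output, so the values are illustrative and not results for the full data set, and the names sample_rows, price_by_type, and organic_premium are hypothetical.

```r
# First four rows shown in the str() output (illustrative sample only)
sample_rows <- data.frame(
  average_price = c(1.39, 1.44, 1.07, 1.62),
  type          = c("conventional", "organic", "conventional", "organic")
)

# Mean price per avocado for each type
price_by_type <- aggregate(average_price ~ type, data = sample_rows, FUN = mean)
print(price_by_type)

# Difference between the organic and conventional mean prices
organic_premium <-
  price_by_type$average_price[price_by_type$type == "organic"] -
  price_by_type$average_price[price_by_type$type == "conventional"]
print(organic_premium)
```

Running the same aggregate() call on data, or swapping FUN = mean for FUN = median, would give the comparison for all 12628 observations.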

2. Which city has the largest sales of organic and conventional Hass avocados, in dollar amount, for 2017 and 2018 respectively? What could be the possible reasons for the Hass avocado’s popularity in these cities?

Los Angeles has the largest sales. One possible reason for the avocado’s popularity in Los Angeles is its demographics: much of the population is Hispanic/Latino, and we use avocado in a lot of our dishes. Los Angeles is also a big city where a lot of healthy food is consumed.
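
The question asks about sales in dollars, while the bar chart shows total_volume in units. Since average_price is a per-avocado price, one way to approximate dollar sales is average_price times total_volume. The sketch below assumes that approximation and uses only the first four rows visible in the str() output, so it does not reproduce the Los Angeles result. The column dollar_sales and the objects sample_rows, sales, and top_city are hypothetical names.

```r
# First four rows shown in the str() output (illustrative sample only)
sample_rows <- data.frame(
  average_price = c(1.39, 1.44, 1.07, 1.62),
  total_volume  = c(139970, 3577, 504933, 10609),
  type          = c("conventional", "organic", "conventional", "organic"),
  year          = c(2017, 2017, 2017, 2017),
  geography     = c("Albany", "Albany", "Atlanta", "Atlanta")
)

# Approximate dollar sales (assumption: per-unit price times units sold)
sample_rows$dollar_sales <- sample_rows$average_price * sample_rows$total_volume

# Total dollar sales per city, type, and year
sales <- aggregate(dollar_sales ~ geography + type + year,
                   data = sample_rows, FUN = sum)

# For each type and year, keep the city with the largest dollar sales
groups <- split(sales, list(sales$type, sales$year), drop = TRUE)
top_city <- do.call(rbind, lapply(groups, function(g) g[which.max(g$dollar_sales), ]))
print(top_city)
```

On the full data set, the rows of top_city for 2017 and 2018 would answer the question directly for each avocado type.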

3. Based on your analysis, list at least three reasons different stakeholders could benefit from your own marketing research.

Different stakeholders could benefit from my marketing research in three ways: deciding where to sell avocados, how to price them depending on the city and its demographics, and how many avocados to supply to grocery stores in each city.