This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Note: this analysis was performed using the open source software R and RStudio.
The objective of this applied project is to explain the price of avocados using some basic descriptive analysis. This analysis can be used by producers, retailers, and grocery stores to make decisions about their pricing, advertising, and supply chain strategies, among others. Some additional analysis will follow after this episode. Your feedback is highly appreciated.
This data was downloaded from the Hass Avocado Board website in May of 2018 and compiled into a single CSV. Here’s how the Hass Avocado Board describes the data on their website: “The table below represents weekly retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.”
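# Load the avocado data directly from GitHub
urlfile <- 'https://raw.github.com/utjimmyx/resources/master/avocado_HAA.csv'
# "UTF-8-BOM" drops a byte-order mark that would otherwise garble the first column name
data <- read.csv(urlfile, fileEncoding = "UTF-8-BOM")
summary(data)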
## date average_price total_volume type
## Length:12628 Min. :0.500 Min. : 253 Length:12628
## Class :character 1st Qu.:1.100 1st Qu.: 15733 Class :character
## Mode :character Median :1.320 Median : 94806 Mode :character
## Mean :1.359 Mean : 325259
## 3rd Qu.:1.570 3rd Qu.: 430222
## Max. :2.780 Max. :5660216
## year geography
## Min. :2017 Length:12628
## 1st Qu.:2018 Class :character
## Median :2019 Mode :character
## Mean :2019
## 3rd Qu.:2020
## Max. :2020
library(plyr)
str(data)
## 'data.frame': 12628 obs. of 6 variables:
## $ date : chr "2017/12/3" "2017/12/3" "2017/12/3" "2017/12/3" ...
## $ average_price: num 1.39 1.44 1.07 1.62 1.43 1.58 1.14 1.77 1.4 1.88 ...
## $ total_volume : int 139970 3577 504933 10609 658939 38754 86646 1829 488588 21338 ...
## $ type : chr "conventional" "organic" "conventional" "organic" ...
## $ year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ geography : chr "Albany" "Albany" "Atlanta" "Atlanta" ...
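`str()` shows that `date` is stored as character (e.g. "2017/12/3"), so any week-by-week or date-range summary needs it parsed first. The following is a minimal sketch, assuming every value follows the year/month/day pattern seen above; `parse_avocado_dates` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: convert the character `date` column to class Date.
# Assumes the "%Y/%m/%d" pattern shown by str(); values that do not match become NA.
parse_avocado_dates <- function(df) {
  df$date <- as.Date(df$date, format = "%Y/%m/%d")
  if (anyNA(df$date)) {
    warning("some dates did not match the year/month/day format")
  }
  df
}
```

Usage: `data <- parse_avocado_dates(data)` followed by `range(data$date)` reports the first and last week covered by the file, which is a quick way to confirm the time span behind the `year` column in `summary(data)`.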
# Let's build a simple histogram
hist(data$average_price,
     main = "Histogram of average_price",
     xlab = "Price in USD (US Dollar)")
library(ggplot2)
# Stacked histogram of average_price, coloured by avocado type
ggplot(data, aes(x = average_price, fill = type)) +
  geom_histogram(bins = 30, col = "red") +
  scale_fill_manual(values = c(conventional = "blue", organic = "green")) +
  ggtitle("Frequency of Average Price - Organic vs. Conventional")
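# Simple EDA with ggplot: total volume by geography, stacked by year
# Bars are ordered by summed volume so the order matches the stacked bar heights
ggplot(data, aes(x = reorder(geography, total_volume, FUN = sum),
                 y = total_volume, fill = factor(year))) +
  geom_col() +
  xlab("geography") +
  ylab("total_volume") +
  labs(fill = "year") +
  theme(axis.text.x = element_text(angle = 90, size = 7))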
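# Simple EDA with ggplot: mean average_price by geography
# (stacking geom_col on average_price would add prices together, so plot the mean instead)
ggplot(data, aes(x = reorder(geography, average_price), y = average_price)) +
  stat_summary(fun = mean, geom = "col") +
  xlab("geography") +
  ylab("mean average_price (USD)") +
  theme(axis.text.x = element_text(angle = 90, size = 7))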
# Sample response for year 2017 - The plot shows that Los Angeles has the highest amount of sales in 2017.
Questions
From my findings, using a summary, histogram, and a box plot, I can say there is a difference in average price between conventional and organic avocados. Across both types, the mean price is about $1.36 per avocado, with a minimum of $0.50 and a maximum of $2.78. Conventional avocados are generally priced below $1.50, while organic avocados are generally priced above it. The box plot shows that organic avocados are more expensive than conventional ones, by a difference of about 10 cents.
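The price comparison above can also be checked numerically rather than read off the plots. The following is a minimal sketch using base R's `aggregate()` and `tapply()`; `price_summary_by_type` and `price_gap` are hypothetical helper names, and the sketch assumes the `type` labels are exactly "conventional" and "organic", as in the `str()` output.

```r
# Hypothetical helper: summary statistics of average_price for each avocado type.
price_summary_by_type <- function(df) {
  stats <- aggregate(average_price ~ type, data = df,
                     FUN = function(p) c(mean = mean(p), median = median(p),
                                         min = min(p), max = max(p)))
  # aggregate() stores the four statistics as a matrix column; flatten it
  data.frame(type = stats$type, stats$average_price)
}

# Hypothetical helper: organic minus conventional difference in mean price (USD).
# Assumes the type labels "organic" and "conventional".
price_gap <- function(df) {
  means <- tapply(df$average_price, df$type, mean)
  unname(means["organic"] - means["conventional"])
}
```

Usage: `price_summary_by_type(data)` gives the per-type mean, median, minimum, and maximum, and `price_gap(data)` gives the difference in mean price that the paragraph describes. The box plot itself can be drawn with base R as `boxplot(average_price ~ type, data = data)`.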
Looking at the bar chart, I found that from 2017 to 2018 the city with the most sales of Hass avocados was Los Angeles, followed by New York. Los Angeles’s lead correlates with California being one of the largest producers of Hass avocados: with local supply consistently high and the fruit’s health benefits well known, demand for this product in Los Angeles has grown.
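The ranking claim can be checked by summing `total_volume` per `geography` over the years of interest. The following is a minimal sketch; `top_geographies` is a hypothetical helper. If the file also contains regional or national aggregate rows (an assumption worth checking with `unique(data$geography)`), those rows will rank above any single city and should be filtered out before comparing markets.

```r
# Hypothetical helper: rank geographies by total volume over the selected years.
# Note: aggregate rows (regions or a national total), if present, are NOT removed here.
top_geographies <- function(df, years, n = 5) {
  sub <- df[df$year %in% years, ]
  # convert to numeric so large integer sums cannot overflow
  totals <- tapply(as.numeric(sub$total_volume), sub$geography, sum)
  head(sort(totals, decreasing = TRUE), n)
}
```

Usage: `top_geographies(data, years = 2017:2018)` lists the five geographies with the largest combined volume over 2017 and 2018, and `top_geographies(data, years = 2017)` checks the 2017 sample response.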
Hass avocados are known and marketed as one of the healthiest and most nutrient-dense fruits, which makes them more impactful to their stakeholders. Given these health benefits, most customers would be glad to hear that their decision to purchase this tasty fruit can have a positive impact on their health. Employees will also be glad to know that their efforts in selling avocados help improve the health of their community and society. As for the community and public opinion, buying avocados would become more socially accepted, and easier to justify, once people know how beneficial they are to one’s health.