Understanding how avocado prices move with sales volume can offer insight into market behavior. This tutorial explores the relationship between the total volume of avocados sold and their average price using a subset of an avocado sales dataset: its first 25 observations, which cover weekly sales of conventional avocados in Atlanta from July 14 to December 29, 2019. The objective is to run a preliminary simple linear regression and investigate whether there is a linear association between the quantity of avocados sold and their average price during this period.
A simple linear regression model was fitted to the first 25 observations of the dataset. The lm function in R models AveragePrice as the dependent variable and Total_Volume as the independent variable, and the summary function reports the estimated coefficients so that the strength and direction of the linear relationship can be assessed. A scatter plot, generated with the plot function, shows the raw data points; the fitted regression line is then overlaid on the plot using abline.
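In equation form, the model described above can be written as follows. The symbols for the intercept, slope, and error term are conventional notation introduced here for illustration; they do not come from the dataset.

```latex
\text{AveragePrice}_i = \beta_0 + \beta_1 \,\text{Total\_Volume}_i + \varepsilon_i,
\qquad i = 1, \dots, 25
```

The lm function estimates the intercept and slope by ordinary least squares, and the sign of the estimated slope gives the direction of the linear relationship.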
# Define the data vectors
Date <- c("12/29/2019", "12/22/2019", "12/15/2019", "12/08/2019", "12/01/2019",
"11/24/2019", "11/17/2019", "11/10/2019", "11/03/2019", "10/27/2019",
"10/20/2019", "10/13/2019", "10/06/2019", "9/29/2019", "9/22/2019",
"9/15/2019", "9/08/2019", "9/01/2019", "8/25/2019", "8/18/2019",
"8/11/2019", "8/04/2019", "7/28/2019", "7/21/2019", "7/14/2019")
AveragePrice <- c(1.14, 1.15, 1.13, 1.11, 1.11, 1.11, 1.12, 1.13, 1.14, 1.13,
1.14, 1.13, 1.14, 1.14, 1.13, 1.14, 1.12, 1.11, 1.11, 1.12,
1.12, 1.12, 1.12, 1.12, 1.12)
Total_Volume <- c(437987.39, 439934.30, 412729.07, 435011.68, 410224.28,
430704.97, 428753.86, 453110.15, 435345.22, 427236.72,
427505.41, 444360.50, 442905.04, 428989.70, 430545.92,
435300.95, 427142.16, 424298.11, 410444.69, 417250.91,
414002.58, 413393.36, 417642.49, 416209.77, 411030.76)
# The remaining columns are constant across all 25 rows
type <- rep("conventional", 25)
year <- rep(2019, 25)
geography <- rep("Atlanta", 25)
Mileage <- rep(2190, 25)
# Create the data frame
avocado_data_subset <- data.frame(Date, AveragePrice, Total_Volume, type, year, geography, Mileage)
# Print the data frame
print(avocado_data_subset)
## Date AveragePrice Total_Volume type year geography Mileage
## 1 12/29/2019 1.14 437987.4 conventional 2019 Atlanta 2190
## 2 12/22/2019 1.15 439934.3 conventional 2019 Atlanta 2190
## 3 12/15/2019 1.13 412729.1 conventional 2019 Atlanta 2190
## 4 12/08/2019 1.11 435011.7 conventional 2019 Atlanta 2190
## 5 12/01/2019 1.11 410224.3 conventional 2019 Atlanta 2190
## 6 11/24/2019 1.11 430705.0 conventional 2019 Atlanta 2190
## 7 11/17/2019 1.12 428753.9 conventional 2019 Atlanta 2190
## 8 11/10/2019 1.13 453110.2 conventional 2019 Atlanta 2190
## 9 11/03/2019 1.14 435345.2 conventional 2019 Atlanta 2190
## 10 10/27/2019 1.13 427236.7 conventional 2019 Atlanta 2190
## 11 10/20/2019 1.14 427505.4 conventional 2019 Atlanta 2190
## 12 10/13/2019 1.13 444360.5 conventional 2019 Atlanta 2190
## 13 10/06/2019 1.14 442905.0 conventional 2019 Atlanta 2190
## 14 9/29/2019 1.14 428989.7 conventional 2019 Atlanta 2190
## 15 9/22/2019 1.13 430545.9 conventional 2019 Atlanta 2190
## 16 9/15/2019 1.14 435301.0 conventional 2019 Atlanta 2190
## 17 9/08/2019 1.12 427142.2 conventional 2019 Atlanta 2190
## 18 9/01/2019 1.11 424298.1 conventional 2019 Atlanta 2190
## 19 8/25/2019 1.11 410444.7 conventional 2019 Atlanta 2190
## 20 8/18/2019 1.12 417250.9 conventional 2019 Atlanta 2190
## 21 8/11/2019 1.12 414002.6 conventional 2019 Atlanta 2190
## 22 8/04/2019 1.12 413393.4 conventional 2019 Atlanta 2190
## 23 7/28/2019 1.12 417642.5 conventional 2019 Atlanta 2190
## 24 7/21/2019 1.12 416209.8 conventional 2019 Atlanta 2190
## 25 7/14/2019 1.12 411030.8 conventional 2019 Atlanta 2190
# Regression and plot commands
plot(avocado_data_subset$Total_Volume, avocado_data_subset$AveragePrice,
xlab = "Total Volume of Avocados Sold",
ylab = "Average Price",
main = "Relationship between Volume and Price (First 25 Obs)")
model <- lm(AveragePrice ~ Total_Volume, data = avocado_data_subset)
summary(model)
##
## Call:
## lm(formula = AveragePrice ~ Total_Volume, data = avocado_data_subset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.020495 -0.006789 0.001122 0.007859 0.016783
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.899e-01 7.365e-02 12.084 1.92e-11 ***
## Total_Volume 5.530e-07 1.725e-07 3.206 0.00392 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01011 on 23 degrees of freedom
## Multiple R-squared: 0.3089, Adjusted R-squared: 0.2789
## F-statistic: 10.28 on 1 and 23 DF, p-value: 0.003919
abline(model, col = "red")
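Rather than reading numbers off the printed summary, the key quantities can be pulled out of the fitted model programmatically. The following is a minimal sketch using base R only; the two vectors repeat the values from the data frame above so that the block runs on its own, and the variable names `slope`, `slope_ci`, `r_squared`, and `slope_per_10k` are chosen here for illustration.

```r
# Same two columns as in avocado_data_subset, repeated so this block is self-contained
AveragePrice <- c(1.14, 1.15, 1.13, 1.11, 1.11, 1.11, 1.12, 1.13, 1.14, 1.13,
                  1.14, 1.13, 1.14, 1.14, 1.13, 1.14, 1.12, 1.11, 1.11, 1.12,
                  1.12, 1.12, 1.12, 1.12, 1.12)
Total_Volume <- c(437987.39, 439934.30, 412729.07, 435011.68, 410224.28,
                  430704.97, 428753.86, 453110.15, 435345.22, 427236.72,
                  427505.41, 444360.50, 442905.04, 428989.70, 430545.92,
                  435300.95, 427142.16, 424298.11, 410444.69, 417250.91,
                  414002.58, 413393.36, 417642.49, 416209.77, 411030.76)

model <- lm(AveragePrice ~ Total_Volume)

# Estimated slope: change in average price per one-unit change in volume
slope <- unname(coef(model)["Total_Volume"])

# 95% confidence interval for the slope (1 x 2 matrix: lower, upper)
slope_ci <- confint(model, "Total_Volume", level = 0.95)

# Share of the variance in AveragePrice explained by Total_Volume
r_squared <- summary(model)$r.squared

# Rescaled slope: change in average price per additional 10,000 units of volume
slope_per_10k <- slope * 1e4

print(slope)
print(slope_ci)
print(r_squared)
print(slope_per_10k)
```

Rescaling the slope is only a presentation choice: because Total_Volume is in the hundreds of thousands, the per-unit coefficient is tiny and harder to interpret than a per-10,000-unit figure.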
# Conclusion and Interpretation
The simple linear regression on the first 25 observations indicates a positive, moderate-strength relationship between the total volume of avocados sold and their average price. The scatter plot and fitted line show an upward trend: weeks with higher volume tended to have slightly higher average prices. The estimated coefficient for Total_Volume is 5.530e-07 (standard error 1.725e-07, p = 0.0039), which corresponds to an increase of roughly $0.0055 in average price for each additional 10,000 units of volume, and it is statistically significant at the 1% level. The multiple R-squared of 0.3089 (adjusted 0.2789) means that total volume explains about 31% of the variation in average price within this subset, so most of the variation is not explained by volume alone.
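As a cross-check on the direction and strength of the relationship, the Pearson correlation can be computed directly. In a simple linear regression with one predictor, the squared correlation equals the model's R-squared, and the correlation test gives the same p-value as the slope's t-test. The sketch below uses base R only and repeats the two data vectors so that it runs on its own; the names `r`, `ct`, and `fit` are illustrative.

```r
# Same two columns as in avocado_data_subset, repeated so this block is self-contained
AveragePrice <- c(1.14, 1.15, 1.13, 1.11, 1.11, 1.11, 1.12, 1.13, 1.14, 1.13,
                  1.14, 1.13, 1.14, 1.14, 1.13, 1.14, 1.12, 1.11, 1.11, 1.12,
                  1.12, 1.12, 1.12, 1.12, 1.12)
Total_Volume <- c(437987.39, 439934.30, 412729.07, 435011.68, 410224.28,
                  430704.97, 428753.86, 453110.15, 435345.22, 427236.72,
                  427505.41, 444360.50, 442905.04, 428989.70, 430545.92,
                  435300.95, 427142.16, 424298.11, 410444.69, 417250.91,
                  414002.58, 413393.36, 417642.49, 416209.77, 411030.76)

# Pearson correlation: its sign gives the direction of the linear association
r <- cor(Total_Volume, AveragePrice)

# Two-sided test of zero correlation (Pearson is the default method)
ct <- cor.test(Total_Volume, AveragePrice)

# Refit the regression to compare r^2 with the model's R-squared
fit <- lm(AveragePrice ~ Total_Volume)

print(r)
print(r^2)
print(summary(fit)$r.squared)
print(ct$p.value)
```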
These preliminary findings, based on a very small portion of the potential dataset, do not show the inverse relationship between quantity and price that basic supply reasoning might lead one to expect; instead, higher volume was associated with slightly higher prices. One possible explanation is that shifts in demand move price and volume in the same direction, but this analysis cannot test that. It is also crucial to acknowledge the limitations of the analysis: it covers only 25 weekly observations from a single market and product type, and average prices range only from $1.11 to $1.15. A more comprehensive analysis using a larger portion of the data, and considering other potentially influential factors, would be necessary to draw more robust conclusions about the dynamics of avocado prices and sales volume. For now, this initial exploration provides a basic illustration of how linear regression can be used to examine relationships between variables in a marketing dataset.
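One limitation that can be checked even with this small sample is whether the residuals look well behaved, since the observations are consecutive weeks and may not be independent. The following is a minimal sketch of such a check using base R only. It is an illustrative addition rather than part of the original analysis; the lag-1 residual correlation is a rough indicator and not a formal test, and the names `res`, `rse`, and `lag1` are chosen here.

```r
# Same two columns as in avocado_data_subset, repeated so this block is self-contained
AveragePrice <- c(1.14, 1.15, 1.13, 1.11, 1.11, 1.11, 1.12, 1.13, 1.14, 1.13,
                  1.14, 1.13, 1.14, 1.14, 1.13, 1.14, 1.12, 1.11, 1.11, 1.12,
                  1.12, 1.12, 1.12, 1.12, 1.12)
Total_Volume <- c(437987.39, 439934.30, 412729.07, 435011.68, 410224.28,
                  430704.97, 428753.86, 453110.15, 435345.22, 427236.72,
                  427505.41, 444360.50, 442905.04, 428989.70, 430545.92,
                  435300.95, 427142.16, 424298.11, 410444.69, 417250.91,
                  414002.58, 413393.36, 417642.49, 416209.77, 411030.76)

fit <- lm(AveragePrice ~ Total_Volume)
res <- residuals(fit)
n <- length(res)

# Residual standard error, computed by hand from the residuals
rse <- sqrt(sum(res^2) / df.residual(fit))

# Correlation between each residual and the one from the adjacent week;
# values far from zero would suggest the observations are not independent
lag1 <- cor(res[-1], res[-n])

# Residuals versus fitted values: look for curvature or changing spread
plot(fitted(fit), res,
     xlab = "Fitted Average Price",
     ylab = "Residual",
     main = "Residuals vs Fitted (First 25 Obs)")
abline(h = 0, lty = 2)

print(rse)
print(lag1)
```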
## References
Zhenning Xu, Jimmy (2025). Subset of avocado sales data for regression analysis [avocado.csv]. Received via Canvas.