R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

library(readr)
data <- read_csv('avocado.csv')
## Rows: 12628 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): date, type, geography
## dbl (4): average_price, total_volume, year, Mileage
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
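The message above suggests declaring the column types. Here is a minimal sketch of that, assuming the file layout matches the specification readr printed (three character columns and four numeric ones). The helper name `read_avocado()` is hypothetical and not part of readr.

```r
library(readr)

# Column types copied from the specification readr printed for avocado.csv.
# Declaring them explicitly skips type guessing and silences that message.
avocado_cols <- cols(
  date          = col_character(),
  average_price = col_double(),
  total_volume  = col_double(),
  type          = col_character(),
  year          = col_double(),
  geography     = col_character(),
  Mileage       = col_double()
)

# Hypothetical helper: read the avocado file with the fixed column types
read_avocado <- function(path = "avocado.csv") {
  read_csv(path, col_types = avocado_cols)
}
```

Usage would be `data <- read_avocado()`. `year` is kept as a double to match the printed specification; `col_integer()` would also work if the file only holds whole years.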
summary(data)
##      date           average_price    total_volume         type          
##  Length:12628       Min.   :0.500   Min.   :    253   Length:12628      
##  Class :character   1st Qu.:1.100   1st Qu.:  15733   Class :character  
##  Mode  :character   Median :1.320   Median :  94806   Mode  :character  
##                     Mean   :1.359   Mean   : 325259                     
##                     3rd Qu.:1.570   3rd Qu.: 430222                     
##                     Max.   :2.780   Max.   :5660216                     
##       year       geography            Mileage    
##  Min.   :2017   Length:12628       Min.   : 111  
##  1st Qu.:2018   Class :character   1st Qu.:1097  
##  Median :2019   Mode  :character   Median :2193  
##  Mean   :2019                      Mean   :1911  
##  3rd Qu.:2020                      3rd Qu.:2632  
##  Max.   :2020                      Max.   :2998
library(ggplot2)
# Histogram of average prices in $0.10 bins
ggplot(data, aes(x = average_price)) +
  geom_histogram(binwidth = 0.1, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Avocado Prices", x = "Average Price", y = "Frequency") +
  theme_minimal()

library(ggplot2)
# Box plot comparing average price for each avocado type
ggplot(data, aes(x = type, y = average_price, fill = type)) +
  geom_boxplot() +
  labs(title = "Price Comparison: Organic vs. Conventional Avocados", x = "Avocado Type", y = "Average Price") +
  theme_minimal()

Questions

1. Perform an exploratory analysis of the variable AveragePrice. How do the prices of organic and conventional avocados compare? Any other findings?

Using the summary statistics, a histogram, and a box plot, I found a difference in average price between conventional and organic avocados. Across both types combined, the mean price is about $1.36 per avocado, with a minimum of $0.50 and a maximum of $2.78. In the box plot, the box for conventional avocados sits below $1.50 and the box for organic avocados sits above it. This price difference shows that organic avocados are more expensive than conventional ones, by about 10 cents.
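To check the per-type figures numerically rather than reading them off the box plot, one option is to summarise `average_price` by `type` in base R. This is a sketch that assumes the `type` column uses the exact labels `"organic"` and `"conventional"`, which the output above does not confirm. The function names are hypothetical. The gap is computed between medians because the central line of a box plot is the median.

```r
# Mean, median, min and max of average_price for each avocado type.
# Assumes a data frame with columns `type` and `average_price`.
summarise_price_by_type <- function(df) {
  stats <- aggregate(
    average_price ~ type,
    data = df,
    FUN = function(p) c(mean = mean(p), median = median(p),
                        min = min(p), max = max(p))
  )
  # aggregate() stores the four statistics as a matrix column; flatten it
  data.frame(type = stats$type, stats$average_price)
}

# Median organic price minus median conventional price
# (assumes the type labels "organic" and "conventional")
price_gap <- function(df) {
  s <- summarise_price_by_type(df)
  s$median[s$type == "organic"] - s$median[s$type == "conventional"]
}
```

Running `summarise_price_by_type(data)` and `price_gap(data)` on the loaded data gives a direct figure to compare against the roughly 10-cent difference read from the box plot.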

2. Draw a regression plot using the variable AveragePrice and Total Volume.

library(ggplot2)
ggplot(data, aes(x = total_volume, y = average_price)) +
  geom_point(alpha = 0.5, color = "blue") +  # Scatter plot
  geom_smooth(method = "lm", color = "red", se = TRUE) +  # Regression line with confidence interval
  labs(title = "Regression Plot: Average Price vs. Total Volume",
       x = "Total Volume",
       y = "Average Price") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
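`geom_smooth(method = "lm")` fits the straight line `y ~ x` noted in the message above, but it does not print the fitted coefficients. Fitting the same model directly with `lm()` reports the slope and intercept as numbers. This is a sketch, and the helper names are hypothetical.

```r
# Same model geom_smooth(method = "lm") draws: average_price ~ total_volume
fit_price_volume <- function(df) {
  lm(average_price ~ total_volume, data = df)
}

# Slope expressed as the change in average price per million units sold,
# which is easier to read than the per-unit coefficient
price_change_per_million <- function(fit) {
  unname(coef(fit)["total_volume"]) * 1e6
}
```

Usage would be `fit <- fit_price_volume(data)`, followed by `price_change_per_million(fit)` and `summary(fit)$r.squared`. The summary above shows `total_volume` is strongly right-skewed (median 94,806 against a maximum of 5,660,216), so a few very large volumes can pull the fitted line. A log-scale x axis is one option worth trying.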