Introduction

  1. Introduction to Fruit and Vegetable Price Analysis

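The rising cost of fruits and vegetables is a pressing concern, particularly for households on limited budgets. Understanding what drives price differences is a first step toward making healthy diets more affordable. This analysis uses the 62 products in Fruit-Prices-2022.csv to examine how retail price varies by product form (such as fresh, canned, or juice) and by fruit.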

  2. Load and Inspect the Dataset
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
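library(dplyr)  # redundant: tidyverse already attaches dplyr

# Read the 2022 price data into a data frame
fruit_and_veg_prices <- read.csv("Fruit-Prices-2022.csv")

# List the column names before inspecting structure and summary statistics
colnames(fruit_and_veg_prices)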
## [1] "Fruit"              "Form"               "RetailPrice"       
## [4] "RetailPriceUnit"    "Yield"              "CupEquivalentSize" 
## [7] "CupEquivalentUnit"  "CupEquivalentPrice"
str(fruit_and_veg_prices)
## 'data.frame':    62 obs. of  8 variables:
##  $ Fruit             : chr  "Apples" "Apples, applesauce" "Apples, ready-to-drink" "Apples, frozen concentrate" ...
##  $ Form              : chr  "Fresh" "Canned" "Juice" "Juice" ...
##  $ RetailPrice       : num  1.854 1.171 0.87 0.609 3.616 ...
##  $ RetailPriceUnit   : chr  "per pound" "per pound" "per pint" "per pint" ...
##  $ Yield             : num  0.9 1 1 1 0.93 1 0.65 1 0.64 1 ...
##  $ CupEquivalentSize : num  0.242 0.54 8 8 0.364 ...
##  $ CupEquivalentUnit : chr  "pounds" "pounds" "fluid ounces" "fluid ounces" ...
##  $ CupEquivalentPrice: num  0.5 0.632 0.435 0.304 1.415 ...
head(fruit_and_veg_prices)
##                        Fruit   Form RetailPrice RetailPriceUnit Yield
## 1                     Apples  Fresh      1.8541       per pound  0.90
## 2         Apples, applesauce Canned      1.1705       per pound  1.00
## 3     Apples, ready-to-drink  Juice      0.8699        per pint  1.00
## 4 Apples, frozen concentrate  Juice      0.6086        per pint  1.00
## 5                   Apricots  Fresh      3.6162       per pound  0.93
## 6  Apricots, packed in juice Canned      1.8645       per pound  1.00
##   CupEquivalentSize CupEquivalentUnit CupEquivalentPrice
## 1            0.2425            pounds             0.4996
## 2            0.5401            pounds             0.6323
## 3            8.0000      fluid ounces             0.4349
## 4            8.0000      fluid ounces             0.3043
## 5            0.3638            pounds             1.4145
## 6            0.5401            pounds             1.0071
summary(fruit_and_veg_prices)
##     Fruit               Form            RetailPrice     RetailPriceUnit   
##  Length:62          Length:62          Min.   : 0.382   Length:62         
##  Class :character   Class :character   1st Qu.: 1.364   Class :character  
##  Mode  :character   Mode  :character   Median : 2.159   Mode  :character  
##                                        Mean   : 2.995                     
##                                        3rd Qu.: 4.117                     
##                                        Max.   :10.303                     
##      Yield        CupEquivalentSize CupEquivalentUnit  CupEquivalentPrice
##  Min.   :0.4600   Min.   :0.1232    Length:62          Min.   :0.2429    
##  1st Qu.:0.7225   1st Qu.:0.3225    Class :character   1st Qu.:0.6393    
##  Median :0.9800   Median :0.3638    Mode  :character   Median :1.0083    
##  Mean   :0.8761   Mean   :1.7050                       Mean   :1.0651    
##  3rd Qu.:1.0000   3rd Qu.:0.5401                       3rd Qu.:1.3535    
##  Max.   :1.0000   Max.   :8.0000                       Max.   :3.5558
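
The columns above appear to be related: the cup-equivalent price looks like the retail price scaled by the cup-equivalent size and divided by the yield. The sketch below checks that reading against the six rows printed by head(). The formula and the 16 fluid ounces per pint conversion are assumptions inferred from the printed values, not something stated in the dataset itself.

```r
# First six rows, copied from the head() output above
prices <- data.frame(
  Fruit = c("Apples", "Apples, applesauce", "Apples, ready-to-drink",
            "Apples, frozen concentrate", "Apricots",
            "Apricots, packed in juice"),
  RetailPrice = c(1.8541, 1.1705, 0.8699, 0.6086, 3.6162, 1.8645),
  RetailPriceUnit = c("per pound", "per pound", "per pint",
                      "per pint", "per pound", "per pound"),
  Yield = c(0.90, 1.00, 1.00, 1.00, 0.93, 1.00),
  CupEquivalentSize = c(0.2425, 0.5401, 8, 8, 0.3638, 0.5401),
  CupEquivalentPrice = c(0.4996, 0.6323, 0.4349, 0.3043, 1.4145, 1.0071)
)

# Assumption: juice sizes are in fluid ounces and prices are per pint,
# so divide the size by 16 to express it in pints
size_in_price_units <- ifelse(prices$RetailPriceUnit == "per pint",
                              prices$CupEquivalentSize / 16,
                              prices$CupEquivalentSize)

# Assumed relationship: price per cup = price per unit * units per cup / yield
prices$Recomputed <- prices$RetailPrice * size_in_price_units / prices$Yield

# Largest gap between the recomputed and the published cup-equivalent price
max_gap <- max(abs(prices$Recomputed - prices$CupEquivalentPrice))
print(prices[, c("Fruit", "CupEquivalentPrice", "Recomputed")])
```

If the assumption holds, the two price columns should agree up to rounding in the published figures.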

Exploratory Data Analysis

  1. Summary
summary(fruit_and_veg_prices$RetailPrice)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.382   1.364   2.159   2.995   4.117  10.303
summary(fruit_and_veg_prices$Form)
##    Length     Class      Mode 
##        62 character character
summary(fruit_and_veg_prices$Fruit)
##    Length     Class      Mode 
##        62 character character
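
For character columns, summary() reports only length, class, and mode, which is why the two outputs above say nothing about the categories. A minimal sketch of a more informative alternative, using the Form values from the first six rows shown by head():

```r
# Form values of the first six rows, copied from the head() output
form <- c("Fresh", "Canned", "Juice", "Juice", "Fresh", "Canned")

# On a character vector, summary() gives only Length / Class / Mode
char_summary <- summary(form)

# table() counts the observations in each category
form_counts <- table(form)

# After conversion to a factor, summary() also returns per-level counts
factor_summary <- summary(factor(form))

print(form_counts)
```

On the full data, the same idea would be table(fruit_and_veg_prices$Form), or summary() after the factor conversion in the next step.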
  2. Data Manipulation
# Drop rows missing any variable used below, then convert the two
# grouping variables to factors for plotting and ANOVA
fruit_and_veg_prices <- fruit_and_veg_prices %>%
  drop_na(Fruit, Form, RetailPrice) %>%
  mutate(across(c(Fruit, Form), as.factor))
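
Before dropping rows, it can help to see how many values are actually missing. The sketch below uses base R on a toy data frame; the NA entries are hypothetical and invented purely for illustration, since the summary output above does not report any missing prices.

```r
# Toy data: names and prices borrowed from the head() output,
# NA values are hypothetical
toy <- data.frame(
  Fruit = c("Apples", "Apricots", NA),
  Form = c("Fresh", NA, "Canned"),
  RetailPrice = c(1.8541, 3.6162, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))

# Rows with no missing value in the three analysis columns
# (a base R counterpart to dropping rows with NA in those columns)
complete <- toy[complete.cases(toy[, c("Fruit", "Form", "RetailPrice")]), ]

print(na_counts)
print(nrow(complete))
```

Comparing nrow() before and after the filter shows how many rows the cleaning step removes.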
  3. Plots
ggplot(fruit_and_veg_prices, aes(x = Form, y = RetailPrice)) +
  geom_boxplot() +
  labs(title = "Retail Price by Form")

fruit_avg_prices <- fruit_and_veg_prices %>%
  group_by(Fruit) %>%
  summarize(avg_price = mean(RetailPrice))

# geom_col() draws bar heights directly from avg_price
# (equivalent to geom_bar(stat = "identity"))
ggplot(fruit_avg_prices, aes(x = Fruit, y = avg_price)) +
  geom_col() +
  labs(title = "Average Retail Price by Fruit")

ggplot(fruit_and_veg_prices, aes(x = RetailPrice)) +
  geom_histogram() +
  labs(title = "Distribution of Retail Prices")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
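
The message above asks for an explicit bin width. One common heuristic is the Freedman-Diaconis rule, binwidth = 2 * IQR / n^(1/3). This is a suggestion, not the choice made in the original analysis. A sketch using the quartiles, range, and sample size reported by summary(fruit_and_veg_prices$RetailPrice):

```r
# Values copied from the summary() output for RetailPrice
q1 <- 1.364
q3 <- 4.117
min_price <- 0.382
max_price <- 10.303
n <- 62

# Freedman-Diaconis rule: 2 * IQR / cube root of n
fd_binwidth <- 2 * (q3 - q1) / n^(1 / 3)

# Number of bins needed to cover the observed price range
n_bins <- ceiling((max_price - min_price) / fd_binwidth)

print(round(fd_binwidth, 2))
print(n_bins)
```

The resulting width could be passed to geom_histogram(binwidth = ...), which would also silence the stat_bin() message.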

Hypothesis Testing

  1. Null Hypothesis: Mean retail price is the same across all product forms.

Alternative Hypothesis: At least one product form has a different mean retail price.

anova_test1 <- aov(RetailPrice ~ Form, data = fruit_and_veg_prices)

print(summary(anova_test1))
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Form         4  188.8   47.19   21.45 7.77e-11 ***
## Residuals   57  125.4    2.20                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
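
The F value and p-value in this table can be reproduced from the sums of squares and degrees of freedom. The sketch below does so from the rounded figures printed above, so the results match only approximately. The effect size eta squared is an addition that the original output does not report.

```r
# Values copied from the ANOVA table for Form
ss_form <- 188.8
df_form <- 4
ss_resid <- 125.4
df_resid <- 57

# Mean squares are sums of squares divided by degrees of freedom
ms_form <- ss_form / df_form
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail probability under the F distribution
f_value <- ms_form / ms_resid
p_value <- pf(f_value, df_form, df_resid, lower.tail = FALSE)

# Eta squared: share of total variation in price explained by Form
eta_sq <- ss_form / (ss_form + ss_resid)

print(round(f_value, 2))
print(p_value)
print(round(eta_sq, 2))
```

A significant F test only says that some forms differ. A post hoc comparison such as TukeyHSD(anova_test1) could indicate which pairs of forms drive the difference.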
  2. Null Hypothesis: Mean retail price is the same across all fruit categories.

Alternative Hypothesis: At least one fruit category has a different mean retail price.

anova_test2 <- aov(RetailPrice ~ Fruit, data = fruit_and_veg_prices)

summary(anova_test2)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Fruit       52 223.38   4.296   0.426  0.974
## Residuals    9  90.78  10.087
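
The degrees of freedom in this table show how thinly the data are spread across fruit categories. The sketch below derives the number of categories and observations from the table. It then shows one possible way to form coarser groups by keeping only the text before the first comma in each name. That grouping rule is an assumption for illustration and is not part of the original analysis.

```r
# Degrees of freedom copied from the ANOVA table for Fruit
df_fruit <- 52
df_resid <- 9

# Number of factor levels and total observations implied by the table
n_levels <- df_fruit + 1
n_obs <- df_fruit + df_resid + 1
obs_per_level <- n_obs / n_levels

# Hypothetical coarser grouping: strip everything after the first comma
fruit <- c("Apples", "Apples, applesauce", "Apples, ready-to-drink",
           "Apples, frozen concentrate", "Apricots",
           "Apricots, packed in juice")
base_fruit <- sub(",.*$", "", fruit)
base_counts <- table(base_fruit)

print(c(levels = n_levels, observations = n_obs))
print(round(obs_per_level, 2))
print(base_counts)
```

With barely more than one observation per category, the within-category variance rests on only 9 residual degrees of freedom, which limits what this test can detect.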

Conclusion

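  1. Retail prices differ significantly across product forms (one-way ANOVA, F(4, 57) = 21.45, p = 7.77e-11). We reject the null hypothesis in favor of the alternative: at least one form has a different mean retail price.

  2. The data provide no evidence that retail prices differ across fruit categories (one-way ANOVA, F(52, 9) = 0.426, p = 0.974). We fail to reject the null hypothesis, which is not the same as accepting it. With 53 fruit categories spread over 62 observations, most categories contain a single product, so this test has very little power.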