The increasing cost of fruits and vegetables has become a pressing concern, particularly for households with limited budgets. Understanding the factors influencing these price fluctuations is crucial to addressing the affordability of healthy diets.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
fruit_and_veg_prices <- read.csv("Fruit-Prices-2022.csv")
colnames(fruit_and_veg_prices)
## [1] "Fruit" "Form" "RetailPrice"
## [4] "RetailPriceUnit" "Yield" "CupEquivalentSize"
## [7] "CupEquivalentUnit" "CupEquivalentPrice"
str(fruit_and_veg_prices)
## 'data.frame': 62 obs. of 8 variables:
## $ Fruit : chr "Apples" "Apples, applesauce" "Apples, ready-to-drink" "Apples, frozen concentrate" ...
## $ Form : chr "Fresh" "Canned" "Juice" "Juice" ...
## $ RetailPrice : num 1.854 1.171 0.87 0.609 3.616 ...
## $ RetailPriceUnit : chr "per pound" "per pound" "per pint" "per pint" ...
## $ Yield : num 0.9 1 1 1 0.93 1 0.65 1 0.64 1 ...
## $ CupEquivalentSize : num 0.242 0.54 8 8 0.364 ...
## $ CupEquivalentUnit : chr "pounds" "pounds" "fluid ounces" "fluid ounces" ...
## $ CupEquivalentPrice: num 0.5 0.632 0.435 0.304 1.415 ...
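The str() output shows that RetailPriceUnit is not constant. Fresh and canned items are priced per pound, while juices are priced per pint. Before comparing RetailPrice across forms, it may help to cross-tabulate form against pricing unit.

The sketch below uses only the first six rows of the file, copied by hand from the head() output further down. Its counts illustrate the idea and do not describe the full 62-row file. On the real data you would pass the columns of fruit_and_veg_prices directly.

```r
# First six rows of Fruit-Prices-2022.csv, copied from the head() output;
# the full file has 62 rows.
sample_rows <- data.frame(
  Fruit = c("Apples", "Apples, applesauce", "Apples, ready-to-drink",
            "Apples, frozen concentrate", "Apricots", "Apricots, packed in juice"),
  Form = c("Fresh", "Canned", "Juice", "Juice", "Fresh", "Canned"),
  RetailPriceUnit = c("per pound", "per pound", "per pint", "per pint",
                      "per pound", "per pound")
)

# Count rows for each combination of form and pricing unit
unit_by_form <- table(sample_rows$Form, sample_rows$RetailPriceUnit)
print(unit_by_form)
```

If any form is priced in a different unit from the others, raw RetailPrice values are not directly comparable across forms.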
head(fruit_and_veg_prices)
## Fruit Form RetailPrice RetailPriceUnit Yield
## 1 Apples Fresh 1.8541 per pound 0.90
## 2 Apples, applesauce Canned 1.1705 per pound 1.00
## 3 Apples, ready-to-drink Juice 0.8699 per pint 1.00
## 4 Apples, frozen concentrate Juice 0.6086 per pint 1.00
## 5 Apricots Fresh 3.6162 per pound 0.93
## 6 Apricots, packed in juice Canned 1.8645 per pound 1.00
## CupEquivalentSize CupEquivalentUnit CupEquivalentPrice
## 1 0.2425 pounds 0.4996
## 2 0.5401 pounds 0.6323
## 3 8.0000 fluid ounces 0.4349
## 4 8.0000 fluid ounces 0.3043
## 5 0.3638 pounds 1.4145
## 6 0.5401 pounds 1.0071
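The head() output suggests how the price columns relate. CupEquivalentPrice appears to equal RetailPrice × CupEquivalentSize / Yield, once CupEquivalentSize is expressed in the retail pricing unit (8 fluid ounces is half a pint). For apples, 1.8541 × 0.2425 / 0.90 ≈ 0.4996.

This relationship is inferred from the six rows shown and is not documented in the source, so treat it as an assumption. The sketch checks it against those rows. The helper cup_size_in_retail_units() is hypothetical and handles only the two unit pairs that appear here.

```r
# First six rows from the head() output (values as printed, rounded to 4 decimals)
sample_rows <- data.frame(
  RetailPrice = c(1.8541, 1.1705, 0.8699, 0.6086, 3.6162, 1.8645),
  RetailPriceUnit = c("per pound", "per pound", "per pint", "per pint",
                      "per pound", "per pound"),
  Yield = c(0.90, 1.00, 1.00, 1.00, 0.93, 1.00),
  CupEquivalentSize = c(0.2425, 0.5401, 8, 8, 0.3638, 0.5401),
  CupEquivalentUnit = c("pounds", "pounds", "fluid ounces", "fluid ounces",
                        "pounds", "pounds"),
  CupEquivalentPrice = c(0.4996, 0.6323, 0.4349, 0.3043, 1.4145, 1.0071)
)

# Hypothetical helper: express one cup-equivalent in the retail pricing unit.
# Only the two unit pairs seen in these rows are handled; others become NA.
cup_size_in_retail_units <- function(size, cup_unit, retail_unit) {
  ifelse(cup_unit == "pounds" & retail_unit == "per pound", size,
         ifelse(cup_unit == "fluid ounces" & retail_unit == "per pint",
                size / 16,  # 16 fluid ounces per pint
                NA_real_))
}

# Price per edible cup = price per unit * units per cup / edible share (Yield)
sample_rows$ComputedCupPrice <- with(
  sample_rows,
  RetailPrice *
    cup_size_in_retail_units(CupEquivalentSize, CupEquivalentUnit, RetailPriceUnit) /
    Yield
)

# Largest gap between the reconstruction and the published column
max_gap <- max(abs(sample_rows$ComputedCupPrice - sample_rows$CupEquivalentPrice))
print(max_gap)
```

If the relationship holds across the file, CupEquivalentPrice puts every form on a common per-cup basis. That may make it a fairer outcome than RetailPrice when comparing forms.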
summary(fruit_and_veg_prices)
## Fruit Form RetailPrice RetailPriceUnit
## Length:62 Length:62 Min. : 0.382 Length:62
## Class :character Class :character 1st Qu.: 1.364 Class :character
## Mode :character Mode :character Median : 2.159 Mode :character
## Mean : 2.995
## 3rd Qu.: 4.117
## Max. :10.303
## Yield CupEquivalentSize CupEquivalentUnit CupEquivalentPrice
## Min. :0.4600 Min. :0.1232 Length:62 Min. :0.2429
## 1st Qu.:0.7225 1st Qu.:0.3225 Class :character 1st Qu.:0.6393
## Median :0.9800 Median :0.3638 Mode :character Median :1.0083
## Mean :0.8761 Mean :1.7050 Mean :1.0651
## 3rd Qu.:1.0000 3rd Qu.:0.5401 3rd Qu.:1.3535
## Max. :1.0000 Max. :8.0000 Max. :3.5558
summary(fruit_and_veg_prices$RetailPrice)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.382 1.364 2.159 2.995 4.117 10.303
summary(fruit_and_veg_prices$Form)
## Length Class Mode
## 62 character character
summary(fruit_and_veg_prices$Fruit)
## Length Class Mode
## 62 character character
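summary() on a character vector reports only its length and class, so it says nothing about the categories themselves. table() counts each distinct value instead. Applying table() a second time, to those counts, shows how many observations each category has, which matters for the per-fruit ANOVA later.

The sketch runs on the first six rows from head(). On the full data, replace sample_rows with fruit_and_veg_prices.

```r
# First six rows from the head() output
sample_rows <- data.frame(
  Fruit = c("Apples", "Apples, applesauce", "Apples, ready-to-drink",
            "Apples, frozen concentrate", "Apricots", "Apricots, packed in juice"),
  Form = c("Fresh", "Canned", "Juice", "Juice", "Fresh", "Canned")
)

# Number of rows for each form
form_counts <- table(sample_rows$Form)
print(form_counts)

# How many fruit categories have 1, 2, ... observations
fruit_replication <- table(table(sample_rows$Fruit))
print(fruit_replication)
```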
# Drop rows missing any field used below, then store the grouping columns as factors
fruit_and_veg_prices <- fruit_and_veg_prices %>%
  drop_na(Fruit, Form, RetailPrice) %>%
  mutate(Fruit = as.factor(Fruit),
         Form = as.factor(Form))
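The summary() output above shows no NA counts for the numeric columns, which suggests drop_na() removes nothing here. Note also that read.csv() keeps blank cells in character columns as empty strings rather than NA, so drop_na() would not catch those.

A quick check is to count missing and blank values per column and compare row counts before and after filtering. This base-R sketch uses complete.cases() on the first six rows from head(). It is meant as a check, not a replacement for the tidyverse step.

```r
# First six rows from the head() output
sample_rows <- data.frame(
  Fruit = c("Apples", "Apples, applesauce", "Apples, ready-to-drink",
            "Apples, frozen concentrate", "Apricots", "Apricots, packed in juice"),
  Form = c("Fresh", "Canned", "Juice", "Juice", "Fresh", "Canned"),
  RetailPrice = c(1.8541, 1.1705, 0.8699, 0.6086, 3.6162, 1.8645)
)

# Missing values per column, and blank strings (which are not NA) per column
na_counts <- colSums(is.na(sample_rows))
blank_counts <- colSums(sample_rows == "", na.rm = TRUE)

# Keep rows that have all three fields, then convert the grouping columns to factors
cleaned <- sample_rows[complete.cases(sample_rows[, c("Fruit", "Form", "RetailPrice")]), ]
cleaned$Form <- factor(cleaned$Form)
cleaned$Fruit <- factor(cleaned$Fruit)

cat("Rows before:", nrow(sample_rows), "after:", nrow(cleaned), "\n")
print(levels(cleaned$Form))
```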
ggplot(fruit_and_veg_prices, aes(x = Form, y = RetailPrice)) +
  geom_boxplot() +
  labs(title = "Retail Price by Form", x = "Form", y = "Retail price (per retail unit)")
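A boxplot is easier to interpret alongside the numbers behind it. tapply() gives the median, which is the centre line of each box, and the number of observations for each form.

The sketch uses the first six rows from head(). Juices there are priced per pint and the other rows per pound, so the groups are not on the same scale.

```r
# First six rows from the head() output
sample_rows <- data.frame(
  Form = c("Fresh", "Canned", "Juice", "Juice", "Fresh", "Canned"),
  RetailPrice = c(1.8541, 1.1705, 0.8699, 0.6086, 3.6162, 1.8645)
)

# Median retail price and group size for each form
form_medians <- tapply(sample_rows$RetailPrice, sample_rows$Form, median)
form_sizes <- tapply(sample_rows$RetailPrice, sample_rows$Form, length)
print(round(form_medians, 4))
print(form_sizes)
```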
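fruit_avg_prices <- fruit_and_veg_prices %>%
  group_by(Fruit) %>%
  summarize(avg_price = mean(RetailPrice))

# Sort bars by price and flip the axes so the 53 fruit labels stay readable
ggplot(fruit_avg_prices, aes(x = reorder(Fruit, avg_price), y = avg_price)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Retail Price by Fruit", x = "Fruit", y = "Average retail price")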
# Freedman-Diaconis rule: binwidth = 2 * IQR / n^(1/3), computed from the data
ggplot(fruit_and_veg_prices, aes(x = RetailPrice)) +
  geom_histogram(binwidth = function(x) 2 * IQR(x) / length(x)^(1/3)) +
  labs(title = "Distribution of Retail Prices")
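The Freedman-Diaconis rule, binwidth = 2 × IQR / n^(1/3), is one common way to choose a histogram binwidth. We are suggesting it here; it is not part of the original analysis.

Plugging in the quartiles from the summary() output above (1st Qu. 1.364, 3rd Qu. 4.117, n = 62) gives an IQR of 2.753 and a binwidth of roughly 1.39. Other widths may show the skew better.

```r
# Quartiles and row count as printed by summary(fruit_and_veg_prices$RetailPrice)
q1 <- 1.364
q3 <- 4.117
n <- 62

# Freedman-Diaconis binwidth: 2 * IQR / cube root of n
fd_binwidth <- 2 * (q3 - q1) / n^(1/3)

# Approximate number of bins needed to cover the range (Min. 0.382, Max. 10.303)
n_bins <- ceiling((10.303 - 0.382) / fd_binwidth)

print(round(fd_binwidth, 3))
print(n_bins)
```

grDevices::nclass.FD() applies the same rule to a raw vector and returns a suggested number of bins.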
Null hypothesis (H0): Mean retail price is the same for all five forms in the data.
Alternative hypothesis (H1): Mean retail price differs for at least one form.
anova_test1 <- aov(RetailPrice ~ Form, data = fruit_and_veg_prices)
summary(anova_test1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Form 4 188.8 47.19 21.45 7.77e-11 ***
## Residuals 57 125.4 2.20
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
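The columns of the ANOVA table are linked:

- Each mean square is a sum of squares divided by its degrees of freedom.
- F is the ratio of the Form mean square to the residual mean square.
- The p-value is the upper tail of the F distribution with those degrees of freedom.

Recomputing from the printed (rounded) values reproduces the table to within rounding.

```r
# Values as printed by summary(anova_test1)
ss_form <- 188.8
df_form <- 4
ss_resid <- 125.4
df_resid <- 57

ms_form <- ss_form / df_form      # 47.2
ms_resid <- ss_resid / df_resid   # 2.2
f_value <- ms_form / ms_resid
p_value <- pf(f_value, df_form, df_resid, lower.tail = FALSE)

print(round(f_value, 2))
print(signif(p_value, 3))
```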
Null hypothesis (H0): Mean retail price is the same for every fruit category.
Alternative hypothesis (H1): Mean retail price differs for at least one fruit category.
anova_test2 <- aov(RetailPrice ~ Fruit, data = fruit_and_veg_prices)
summary(anova_test2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Fruit 52 223.38 4.296 0.426 0.974
## Residuals 9 90.78 10.087
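The degrees of freedom explain why this test is weak. There are k = 53 fruit categories (the Fruit Df of 52, plus 1) and N = 62 observations. The between-group df is therefore k - 1 = 52, and the residual df is N - k = 9.

Recomputing F and p from the printed table:

```r
n_obs <- 62          # rows in the data
k_groups <- 53       # fruit categories: Df for Fruit (52) plus 1
df_fruit <- k_groups - 1
df_resid <- n_obs - k_groups

# Sums of squares as printed by summary(anova_test2)
ms_fruit <- 223.38 / df_fruit
ms_resid <- 90.78 / df_resid
f_value <- ms_fruit / ms_resid
p_value <- pf(f_value, df_fruit, df_resid, lower.tail = FALSE)

print(c(df_fruit = df_fruit, df_resid = df_resid))
print(round(c(F = f_value, p = p_value), 3))
```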
For retail price by form, the ANOVA gives F(4, 57) = 21.45 with p = 7.77e-11, far below 0.05, so we reject the null hypothesis. Mean retail price differs for at least one form, although the F test alone does not identify which forms differ.
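The F test shows that the form means are not all equal, but it does not say which pairs differ. Tukey's honest significant difference test, available as TukeyHSD() in base R's stats package, compares every pair of forms while controlling the family-wise error rate. We are suggesting this follow-up; it is not part of the original analysis.

On the full data you would call TukeyHSD(anova_test1). The sketch below runs the same steps on the first six rows from head(), so its numbers are illustrative only.

```r
# First six rows from the head() output
sample_rows <- data.frame(
  Form = factor(c("Fresh", "Canned", "Juice", "Juice", "Fresh", "Canned")),
  RetailPrice = c(1.8541, 1.1705, 0.8699, 0.6086, 3.6162, 1.8645)
)

# One-way ANOVA, then all pairwise comparisons of form means
fit <- aov(RetailPrice ~ Form, data = sample_rows)
tukey <- TukeyHSD(fit, "Form", conf.level = 0.95)

# Columns: mean difference, lower and upper confidence limits, adjusted p-value
print(tukey$Form)
```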
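For retail price by fruit category, the ANOVA gives F(52, 9) = 0.426 with p = 0.974, so we fail to reject the null hypothesis: the data show no significant difference in mean retail price among fruit categories. This is not evidence that the means are equal. With 53 categories and only 62 observations, most categories contain a single price, leaving just 9 residual degrees of freedom, so the test has very little power.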
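RetailPrice is right-skewed: the mean (2.995) sits above the median (2.159), and the maximum is 10.303. ANOVA assumes roughly normal residuals with similar variance across groups. A rank-based Kruskal-Wallis test, kruskal.test() in the stats package, is a common robustness check for the form comparison. We are suggesting it here; it is not part of the original analysis.

On the full data the call would be kruskal.test(RetailPrice ~ Form, data = fruit_and_veg_prices). The sketch uses the first six rows from head().

```r
# First six rows from the head() output
sample_rows <- data.frame(
  Form = factor(c("Fresh", "Canned", "Juice", "Juice", "Fresh", "Canned")),
  RetailPrice = c(1.8541, 1.1705, 0.8699, 0.6086, 3.6162, 1.8645)
)

# Kruskal-Wallis compares groups using ranks, so a few very high prices have less pull
kw <- kruskal.test(RetailPrice ~ Form, data = sample_rows)
print(kw)
```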