knitr::opts_chunk$set(echo = TRUE)

Introduction:

The goal of this assignment is to practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use tidyverse functions.

What is tidyverse?

“The tidyverse is a powerful collection of R packages that you can use for data science. They are designed to help you to transform and visualize data. All packages within this collection share an underlying philosophy and common APIs.” (datacamp)

The core tidyverse packages are listed in the message printed when the library is loaded.

Loading the libraries:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
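
The conflict notice above means that a bare filter() now refers to the dplyr version. As a minimal sketch (the small data frame `prices` is a hypothetical sample built from a few rows shown later in this document), the package can be named explicitly when it matters which function is called:

```r
library(dplyr)

# A few rows copied from the output shown later in this document.
prices <- data.frame(
  Item = c("Acorn squash", "Apples", "Apples, applesauce"),
  Form = c("Fresh", "Fresh", "Canned"),
  RetailPrice = c(1.1804, 1.5193, 1.0660)
)

# dplyr::filter() keeps the rows that satisfy a condition.
fresh <- dplyr::filter(prices, Form == "Fresh")

# stats::filter() is an unrelated function that applies a linear filter
# (here a two-point moving average) to a numeric series.
smoothed <- stats::filter(prices$RetailPrice, rep(1 / 2, 2))

print(fresh)
```

Writing the `package::` prefix is optional once tidyverse is loaded, but it removes any doubt about which filter() is meant.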

Loading the data:

The dataset I chose for this assignment is about the ‘Fruits and Vegetables Prices In USA In The Year 2020’. The dataset contains 8 columns and 155 rows.

The column description of the dataset is as follows:

-Item: Name of the fruit or the vegetable.

-Form: The form of the item, i.e., canned, fresh, juice, dried or frozen.

-Retail Price: Average retail price of the item in the year.

-Retail Price Unit: Average retail price’s measurement unit.

-Yield: Average yield of the item in the year.

-Cup Equivalent Size: Comparison done with one edible cup of food.

-Cup Equivalent Unit: Comparison’s measurement unit.

-Cup Equivalent Price: Price per edible cup equivalent (The Unit of Measurement for Federal Recommendations for Fruit and Vegetable Consumption)

This dataset is taken from the Kaggle website.

url <- "https://raw.githubusercontent.com/SalouaDaouki/Data607/main/Prices.csv"
dataset <- read.csv(url)
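
The tidyverse also has its own CSV reader, read_csv() from readr, which returns a tibble instead of a base data frame. The sketch below is only an illustration: it writes a two-row sample (values copied from this dataset, file name generated by tempfile()) so that it runs without a network connection; the same call should also accept the URL used above.

```r
library(readr)

# Write a two-row sample to a temporary file so the sketch runs offline.
sample_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Item,Form,RetailPrice",
  "Acorn squash,Fresh,1.1804",
  "Apples,Fresh,1.5193"
), sample_csv)

# read_csv() returns a tibble and guesses the column types from the data.
sample_prices <- read_csv(sample_csv, show_col_types = FALSE)

print(sample_prices)
```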

The dplyr package:

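“dplyr is a grammar of data manipulation. You can use it to solve the most common data manipulation challenges.” (Datacamp). Some of its functions, such as glimpse(), filter(), arrange(), mutate(), summarise() and group_by(), are applied in the subsections below.

Applying dplyr to the data: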

Glimpse of the data:

Now, let’s first take a look at the data:

glimpse(dataset)
## Rows: 155
## Columns: 8
## $ Item               <chr> "Acorn squash", "Apples", "Apples, applesauce", "Ap…
## $ Form               <chr> "Fresh", "Fresh", "Canned", "Juice", "Juice", "Fres…
## $ RetailPrice        <dbl> 1.1804, 1.5193, 1.0660, 0.5853, 0.7804, 2.9665, 6.6…
## $ RetailPriceUnit    <chr> "per pound", "per pound", "per pound", "per pint", …
## $ Yield              <dbl> 0.4586, 0.9000, 1.0000, 1.0000, 1.0000, 0.9300, 1.0…
## $ CupEquivalentSize  <dbl> 0.4519, 0.2425, 0.5401, 8.0000, 8.0000, 0.3638, 0.1…
## $ CupEquivalentUnit  <chr> "pounds", "pounds", "pounds", "fluid ounces", "flui…
## $ CupEquivalentPrice <dbl> 1.1633, 0.4094, 0.5758, 0.2926, 0.3902, 1.1603, 0.9…

Looking at the summary of the data gives us the minimum, quartiles, median, mean and maximum of each quantitative column, and the length, class and mode of each character column:

summary(dataset)
##      Item               Form            RetailPrice      RetailPriceUnit   
##  Length:155         Length:155         Min.   : 0.3604   Length:155        
##  Class :character   Class :character   1st Qu.: 1.1565   Class :character  
##  Mode  :character   Mode  :character   Median : 1.7218   Mode  :character  
##                                        Mean   : 2.1796                     
##                                        3rd Qu.: 2.5783                     
##                                        Max.   :10.5527                     
##      Yield        CupEquivalentSize CupEquivalentUnit  CupEquivalentPrice
##  Min.   :0.3750   Min.   :0.1232    Length:155         Min.   :0.2021    
##  1st Qu.:0.6650   1st Qu.:0.3197    Class :character   1st Qu.:0.5199    
##  Median :0.9100   Median :0.3527    Mode  :character   Median :0.6769    
##  Mean   :0.9264   Mean   :0.8852                       Mean   :0.8141    
##  3rd Qu.:1.0000   3rd Qu.:0.3858                       3rd Qu.:1.0223    
##  Max.   :2.5397   Max.   :8.0000                       Max.   :3.0700

Filtering the data:

Let’s filter the data by form, keeping only the fresh fruits and vegetables:

head(filter(dataset, Form == "Fresh"))
##           Item  Form RetailPrice RetailPriceUnit  Yield CupEquivalentSize
## 1 Acorn squash Fresh      1.1804       per pound 0.4586            0.4519
## 2       Apples Fresh      1.5193       per pound 0.9000            0.2425
## 3     Apricots Fresh      2.9665       per pound 0.9300            0.3638
## 4    Artichoke Fresh      2.1913       per pound 0.3750            0.3858
## 5    Asparagus Fresh      2.7576       per pound 0.4938            0.3968
## 6     Avocados Fresh      2.2368       per pound 0.7408            0.3197
##   CupEquivalentUnit CupEquivalentPrice
## 1            pounds             1.1633
## 2            pounds             0.4094
## 3            pounds             1.1603
## 4            pounds             2.2545
## 5            pounds             2.2159
## 6            pounds             0.9653
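
filter() can also take several conditions at once. The following is a minimal sketch on a hypothetical four-row sample (`prices`, with values copied from this dataset): conditions separated by commas must all hold, and %in% matches any of several values.

```r
library(dplyr)

# Four rows copied from the dataset; `prices` is only an illustrative sample.
prices <- data.frame(
  Item = c("Acorn squash", "Apples", "Apples, applesauce",
           "Apples, frozen concentrate"),
  Form = c("Fresh", "Fresh", "Canned", "Juice"),
  RetailPrice = c(1.1804, 1.5193, 1.0660, 0.5853)
)

# Comma-separated conditions are combined with AND.
cheap_fresh <- prices %>%
  filter(Form == "Fresh", RetailPrice < 1.5)

# %in% keeps rows whose Form is any of the listed values.
fresh_or_canned <- prices %>%
  filter(Form %in% c("Fresh", "Canned"))

print(cheap_fresh)
print(fresh_or_canned)
```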

Arranging the data:

Let’s arrange the data in ascending order based on the retail price:

head(dataset %>% arrange(RetailPrice))
##                         Item  Form RetailPrice RetailPriceUnit  Yield
## 1                 Watermelon Fresh      0.3604       per pound 0.5200
## 2                    Bananas Fresh      0.5249       per pound 0.6400
## 3                  Pineapple Fresh      0.5685       per pound 0.5100
## 4                 Cantaloupe Fresh      0.5767       per pound 0.5100
## 5 Apples, frozen concentrate Juice      0.5853        per pint 1.0000
## 6                   Potatoes Fresh      0.6682       per pound 0.8113
##   CupEquivalentSize CupEquivalentUnit CupEquivalentPrice
## 1            0.3307            pounds             0.2292
## 2            0.3307            pounds             0.2712
## 3            0.3638            pounds             0.4055
## 4            0.3748            pounds             0.4238
## 5            8.0000      fluid ounces             0.2926
## 6            0.2646            pounds             0.2179
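
arrange() sorts in ascending order by default. As a small sketch on a hypothetical sample (`prices`, rows copied from this dataset), desc() reverses the order, and passing more than one column sorts by the first and breaks ties with the next:

```r
library(dplyr)

# Sample rows copied from the dataset, deliberately left unsorted.
prices <- data.frame(
  Item = c("Potatoes", "Apples, frozen concentrate", "Watermelon", "Bananas"),
  Form = c("Fresh", "Juice", "Fresh", "Fresh"),
  RetailPrice = c(0.6682, 0.5853, 0.3604, 0.5249)
)

# Most expensive item first.
by_price_desc <- prices %>% arrange(desc(RetailPrice))

# Sort by form, then by price within each form.
by_form_then_price <- prices %>% arrange(Form, RetailPrice)

print(by_price_desc)
print(by_form_then_price)
```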

We can also combine the filter and arrange functions. For example, below I filter the data to focus only on the fresh fruits and vegetables, then arrange the result in ascending order based on the retail price.

dataset %>%
  filter(Form == "Fresh") %>%
  arrange(RetailPrice)
##                        Item  Form RetailPrice RetailPriceUnit  Yield
## 1                Watermelon Fresh      0.3604       per pound 0.5200
## 2                   Bananas Fresh      0.5249       per pound 0.6400
## 3                 Pineapple Fresh      0.5685       per pound 0.5100
## 4                Cantaloupe Fresh      0.5767       per pound 0.5100
## 5                  Potatoes Fresh      0.6682       per pound 0.8113
## 6            Cabbage, green Fresh      0.7025       per pound 0.7788
## 7     Carrots, cooked whole Fresh      0.8703       per pound 0.8158
## 8        Carrots, raw whole Fresh      0.8703       per pound 0.8900
## 9                  Honeydew Fresh      0.9056       per pound 0.4600
## 10                   Onions Fresh      0.9751       per pound 0.9000
## 11  Celery, trimmed bunches Fresh      0.9842       per pound 0.7300
## 12         Lettuce, iceberg Fresh      0.9952       per pound 0.9500
## 13             Cabbage, red Fresh      1.0985       per pound 0.7791
## 14           Sweet potatoes Fresh      1.1198       per pound 0.8818
## 15                  Mangoes Fresh      1.1513       per pound 0.7100
## 16    Tomatoes, roma & plum Fresh      1.1618       per pound 0.9100
## 17               Grapefruit Fresh      1.1695       per pound 0.4900
## 18             Acorn squash Fresh      1.1804       per pound 0.4586
## 19      Cucumbers with peel Fresh      1.1933       per pound 0.9700
## 20   Cucumbers without peel Fresh      1.1933       per pound 0.7300
## 21                  Oranges Fresh      1.2131       per pound 0.6800
## 22         Butternut squash Fresh      1.2325       per pound 0.7140
## 23            Carrots, baby Fresh      1.2716       per pound 1.0000
## 24            Green peppers Fresh      1.2772       per pound 0.8200
## 25                   Papaya Fresh      1.2904       per pound 0.6200
## 26              Clementines Fresh      1.3847       per pound 0.7700
## 27                   Apples Fresh      1.5193       per pound 0.9000
## 28                 Zucchini Fresh      1.5489       per pound 0.7695
## 29                   Radish Fresh      1.5770       per pound 0.9000
## 30                    Pears Fresh      1.5865       per pound 0.9000
## 31                  Peaches Fresh      1.7167       per pound 0.9600
## 32  Lettuce, romaine, heads Fresh      1.8299       per pound 0.9400
## 33                   Grapes Fresh      1.8398       per pound 0.9600
## 34                     Corn Fresh      1.8908       per pound 0.5400
## 35               Nectarines Fresh      1.9062       per pound 0.9100
## 36        Cauliflower heads Fresh      1.9769       per pound 0.8926
## 37                     Plum Fresh      2.0292       per pound 0.9400
## 38              Green beans Fresh      2.1463       per pound 0.8466
## 39              Red peppers Fresh      2.1614       per pound 0.8200
## 40                     Kiwi Fresh      2.1849       per pound 0.7600
## 41                Artichoke Fresh      2.1913       per pound 0.3750
## 42              Pomegranate Fresh      2.2350       per pound 0.5600
## 43                 Avocados Fresh      2.2368       per pound 0.7408
## 44           Broccoli heads Fresh      2.3065       per pound 0.7800
## 45    Tomatoes, large round Fresh      2.3347       per pound 0.9100
## 46            Celery sticks Fresh      2.4041       per pound 1.0000
## 47            Turnip greens Fresh      2.4176       per pound 0.7500
## 48                     Kale Fresh      2.5018       per pound 1.0500
## 49 Lettuce, romaine, hearts Fresh      2.5766       per pound 0.8500
## 50             Strawberries Fresh      2.5800       per pound 0.9400
## 51           Collard greens Fresh      2.6820       per pound 1.1600
## 52         Brussels sprouts Fresh      2.6895       per pound 1.0600
## 53         Broccoli florets Fresh      2.7486       per pound 1.0000
## 54                Asparagus Fresh      2.7576       per pound 0.4938
## 55                 Apricots Fresh      2.9665       per pound 0.9300
## 56          Spinach, boiled Fresh      2.9940       per pound 0.7700
## 57       Spinach, eaten raw Fresh      2.9940       per pound 1.0000
## 58                 Cherries Fresh      3.4269       per pound 0.9200
## 59         Mushrooms, whole Fresh      3.4464       per pound 0.9700
## 60      Cauliflower florets Fresh      3.5859       per pound 0.9702
## 61        Mushrooms, sliced Fresh      3.6417       per pound 1.0000
## 62                     Okra Fresh      3.9803       per pound 0.7695
## 63 Tomatoes, grape & cherry Fresh      4.1458       per pound 0.9100
## 64              Blueberries Fresh      4.1739       per pound 0.9500
## 65             Blackberries Fresh      6.0172       per pound 0.9600
## 66              Raspberries Fresh      6.6391       per pound 0.9600
##    CupEquivalentSize CupEquivalentUnit CupEquivalentPrice
## 1             0.3307            pounds             0.2292
## 2             0.3307            pounds             0.2712
## 3             0.3638            pounds             0.4055
## 4             0.3748            pounds             0.4238
## 5             0.2646            pounds             0.2179
## 6             0.3307            pounds             0.2983
## 7             0.3197            pounds             0.3410
## 8             0.2756            pounds             0.2695
## 9             0.3748            pounds             0.7378
## 10            0.3527            pounds             0.3822
## 11            0.2646            pounds             0.3567
## 12            0.2425            pounds             0.2540
## 13            0.3307            pounds             0.4663
## 14            0.4409            pounds             0.5599
## 15            0.3638            pounds             0.5898
## 16            0.3748            pounds             0.4785
## 17            0.4630            pounds             1.1050
## 18            0.4519            pounds             1.1633
## 19            0.2646            pounds             0.3255
## 20            0.2646            pounds             0.4325
## 21            0.4079            pounds             0.7276
## 22            0.4519            pounds             0.7802
## 23            0.2756            pounds             0.3504
## 24            0.2646            pounds             0.4121
## 25            0.3086            pounds             0.6424
## 26            0.4630            pounds             0.8326
## 27            0.2425            pounds             0.4094
## 28            0.3968            pounds             0.7987
## 29            0.2756            pounds             0.4829
## 30            0.3638            pounds             0.6412
## 31            0.3417            pounds             0.6111
## 32            0.2094            pounds             0.4077
## 33            0.3307            pounds             0.6338
## 34            0.3638            pounds             1.2737
## 35            0.3197            pounds             0.6696
## 36            0.2756            pounds             0.6103
## 37            0.3638            pounds             0.7852
## 38            0.2756            pounds             0.6987
## 39            0.2646            pounds             0.6973
## 40            0.3858            pounds             1.1091
## 41            0.3858            pounds             2.2545
## 42            0.3417            pounds             1.3638
## 43            0.3197            pounds             0.9653
## 44            0.3417            pounds             1.0105
## 45            0.3748            pounds             0.9616
## 46            0.2646            pounds             0.6360
## 47            0.3197            pounds             1.0304
## 48            0.2866            pounds             0.6829
## 49            0.2094            pounds             0.6349
## 50            0.3197            pounds             0.8774
## 51            0.2866            pounds             0.6626
## 52            0.3417            pounds             0.8670
## 53            0.3417            pounds             0.9393
## 54            0.3968            pounds             2.2159
## 55            0.3638            pounds             1.1603
## 56            0.3307            pounds             1.2859
## 57            0.1543            pounds             0.4621
## 58            0.3417            pounds             1.2729
## 59            0.1543            pounds             0.5483
## 60            0.2756            pounds             1.0185
## 61            0.1543            pounds             0.5620
## 62            0.3527            pounds             1.8246
## 63            0.3748            pounds             1.7075
## 64            0.3197            pounds             1.4045
## 65            0.3197            pounds             2.0037
## 66            0.3197            pounds             2.2107
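
Printing all 66 rows is rarely needed. A minimal sketch, again on a hypothetical sample (`prices`, rows copied from this dataset), adds head() as a last step of the same pipeline to keep only the cheapest items:

```r
library(dplyr)

# Sample rows copied from the dataset.
prices <- data.frame(
  Item = c("Pineapple", "Cantaloupe", "Watermelon",
           "Apples, frozen concentrate", "Bananas"),
  Form = c("Fresh", "Fresh", "Fresh", "Juice", "Fresh"),
  RetailPrice = c(0.5685, 0.5767, 0.3604, 0.5853, 0.5249)
)

# Keep the fresh items, sort by price, then keep the first three rows.
cheapest_fresh <- prices %>%
  filter(Form == "Fresh") %>%
  arrange(RetailPrice) %>%
  head(3)

print(cheapest_fresh)
```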

Mutating a new column:

Using mutate() we can update a column or add a new one. In this data, I am going to add a column “EquivalentPounds”, computed as the cup equivalent price divided by the retail price, which gives the number of pounds of the item that make up one cup equivalent. For this ratio to be in pounds, I first filter to the rows whose cup equivalent unit is pounds. Then I arrange the dataset by the new column in descending order. The new dataset is called “newDataset” for further use.

newDataset <- dataset %>%
  filter(CupEquivalentUnit == "pounds") %>%
  mutate(EquivalentPounds = CupEquivalentPrice / RetailPrice) %>%
  arrange(desc(EquivalentPounds))
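
As a quick sanity check on the new column, the sketch below recomputes it for two rows copied from the glimpse() output. On these two rows the ratio agrees closely with CupEquivalentSize / Yield, which suggests (but does not prove) how the cup equivalent price was derived. That relationship is an assumption on my part and is not stated in the dataset description; the column name SizeOverYield is only illustrative.

```r
library(dplyr)

# Two rows copied from the glimpse() output earlier in this document.
check <- data.frame(
  Item = c("Acorn squash", "Apples"),
  RetailPrice = c(1.1804, 1.5193),
  Yield = c(0.4586, 0.9000),
  CupEquivalentSize = c(0.4519, 0.2425),
  CupEquivalentPrice = c(1.1633, 0.4094)
) %>%
  mutate(
    # Same formula as in the pipeline above.
    EquivalentPounds = CupEquivalentPrice / RetailPrice,
    # Assumed relationship, not documented in the source.
    SizeOverYield = CupEquivalentSize / Yield
  )

print(check)
```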

Summarizing the Equivalent pounds:

We can use the summarise() function (also spelled summarize()) to collapse the observations in the dataset into summary values. To illustrate, I am going to find the median, the minimum and the maximum values of the new column “EquivalentPounds” I created in the previous subsection.

newDataset %>%
  summarise(MedianEP = median(EquivalentPounds),
            MinEP = min(EquivalentPounds),
            MaxEP = max(EquivalentPounds))
##   MedianEP     MinEP    MaxEP
## 1 0.399709 0.1231699 1.028841

Grouping the data by a condition:

To summarize within each group of the dataset, we can use group_by(). To illustrate, I am going to summarize the data the same way as before, but separately for each form of the fruits and vegetables:

newDataset %>%
  group_by(Form) %>%
  summarise(MedianEP = median(EquivalentPounds),
            MinEP = min(EquivalentPounds),
            MaxEP = max(EquivalentPounds))
## # A tibble: 4 × 4
##   Form   MedianEP MinEP MaxEP
##   <chr>     <dbl> <dbl> <dbl>
## 1 Canned    0.560 0.298 0.678
## 2 Dried     0.156 0.123 0.187
## 3 Fresh     0.389 0.154 1.03 
## 4 Frozen    0.344 0.296 0.483
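
When reading grouped summaries it helps to know how many rows each group contains. A minimal sketch on a hypothetical sample (`prices`, rows copied from this dataset) adds n() to the summary; the column names Items and MedianPrice are only illustrative.

```r
library(dplyr)

# Four rows copied from the dataset.
prices <- data.frame(
  Item = c("Acorn squash", "Apples", "Apples, applesauce",
           "Apples, frozen concentrate"),
  Form = c("Fresh", "Fresh", "Canned", "Juice"),
  RetailPrice = c(1.1804, 1.5193, 1.0660, 0.5853)
)

# n() counts the rows in each group; median() is computed per group.
by_form <- prices %>%
  group_by(Form) %>%
  summarise(Items = n(),
            MedianPrice = median(RetailPrice))

print(by_form)
```

A group with very few rows, like the single-row groups in this sample, produces a median, minimum and maximum that are all the same value, so the counts give useful context.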

The ggplot2 package:

scatterplot:

ggplot2 is a tidyverse package for visualizing data with different kinds of plots, such as scatter plots, which are used to compare two variables. To illustrate, I am going to plot the relationship between the retail price and the cup equivalent price of the fruits and vegetables, with a separate panel for each form:

ggplot(dataset, aes(x = RetailPrice, y = CupEquivalentPrice)) +
  geom_point() +
  facet_wrap("Form")

Based on the graphs, there appears to be a positive correlation between the retail price and the cup equivalent price of the fruits and vegetables in each of the five forms.
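
A visual impression of correlation can be backed up with a number. The sketch below computes the Pearson correlation per form with cor() on a hypothetical five-row sample of fresh items (`fresh_sample`, values copied from the output above); on the full data, `dataset` would take the place of the sample.

```r
library(dplyr)

# Five fresh items copied from the output above: Watermelon, Bananas,
# Apples, Artichoke and Raspberries.
fresh_sample <- data.frame(
  Form = "Fresh",
  RetailPrice = c(0.3604, 0.5249, 1.5193, 2.1913, 6.6391),
  CupEquivalentPrice = c(0.2292, 0.2712, 0.4094, 2.2545, 2.2107)
)

# One correlation coefficient per form.
correlations <- fresh_sample %>%
  group_by(Form) %>%
  summarise(Correlation = cor(RetailPrice, CupEquivalentPrice))

print(correlations)
```

A coefficient close to 1 would support calling the relationship strong; a value from five rows is only indicative.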

Conclusion:

There are many packages within the tidyverse that we can use to manipulate, tidy and visualize data, but for starters, dplyr and ggplot2 are enough to acquire and practice basic skills in R.