DATA-607 TIDYVERSE ASSIGNMENT

The tidyverse is a powerful collection of R packages for working with data. All tidyverse packages share a common underlying design and API conventions. The core packages are listed below.

Lists

ggplot2 implements the grammar of graphics. You can use it to visualize your data.
dplyr is a grammar of data manipulation. You can use it to solve the most common data manipulation challenges.
tidyr helps you create tidy data, that is, data where each variable is a column, each observation is a row, and each value is a cell.
readr is a fast and friendly way to read rectangular data.
purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
tibble is a modern re-imagining of the data frame.
stringr provides a cohesive set of functions designed to make working with strings as easy as possible.
forcats provides a suite of useful tools that solve common problems with factors.
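
The tidy-data definition in the tidyr entry can be made concrete with a small sketch. The table below is invented for illustration (the region names and prices are assumptions, not values from the assignment data). It stores one column per year, so the variable "year" is hidden in the column names. `pivot_longer()` from tidyr reshapes it so that each variable is a column and each observation is a row.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one column per year (made-up numbers)
wide <- tibble(
  region = c("Albany", "Boston"),
  `2015` = c(1.22, 1.10),
  `2016` = c(1.35, 1.18)
)

# Move the year columns into rows: one row per region-year observation
long <- pivot_longer(
  wide,
  cols      = c(`2015`, `2016`),
  names_to  = "year",
  values_to = "avg_price"
)

long
```

After reshaping, `long` has one row for every region and year combination, with `year` and `avg_price` as ordinary columns that dplyr and ggplot2 can work with directly.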

Now for the package

# Loading just the "tidyverse" package attaches all of its core packages

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.0.5

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

# For this exercise we use a dataset from kaggle.com that contains
# information about avocado sales in various cities in the USA

# ---------- Using the "readr" function "read_csv" -------

avacado <- data.frame(read_csv(file = "https://raw.githubusercontent.com/tagensingh/SPS-DATA607-TIDYVERSE/main/avocado.csv"))

## Warning: Missing column names filled in: 'X1' [1]

## 
## -- Column specification --------------------------------------------------------
## cols(
##   X1 = col_double(),
##   Date = col_date(format = ""),
##   AveragePrice = col_double(),
##   `Total Volume` = col_double(),
##   `4046` = col_double(),
##   `4225` = col_double(),
##   `4770` = col_double(),
##   `Total Bags` = col_double(),
##   `Small Bags` = col_double(),
##   `Large Bags` = col_double(),
##   `XLarge Bags` = col_double(),
##   type = col_character(),
##   year = col_double(),
##   region = col_character()
## )

# ---------- Using the "tibble" function to look at a snapshot of the dataframe -------

tibble(avacado)

## # A tibble: 18,249 x 14
##       X1 Date       AveragePrice Total.Volume X4046   X4225 X4770 Total.Bags
##    <dbl> <date>            <dbl>        <dbl> <dbl>   <dbl> <dbl>      <dbl>
##  1     0 2015-12-27         1.33       64237. 1037.  54455.  48.2      8697.
##  2     1 2015-12-20         1.35       54877.  674.  44639.  58.3      9506.
##  3     2 2015-12-13         0.93      118220.  795. 109150. 130.       8145.
##  4     3 2015-12-06         1.08       78992. 1132   71976.  72.6      5811.
##  5     4 2015-11-29         1.28       51040.  941.  43838.  75.8      6184.
##  6     5 2015-11-22         1.26       55980. 1184.  48068.  43.6      6684.
##  7     6 2015-11-15         0.99       83454. 1369.  73673.  93.3      8319.
##  8     7 2015-11-08         0.98      109428.  704. 101815.  80        6829.
##  9     8 2015-11-01         1.02       99811. 1022.  87316.  85.3     11388.
## 10     9 2015-10-25         1.07       74339.  842.  64757. 113        8626.
## # ... with 18,239 more rows, and 6 more variables: Small.Bags <dbl>,
## #   Large.Bags <dbl>, XLarge.Bags <dbl>, type <chr>, year <dbl>, region <chr>
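
The column names in the tibble output (Total.Volume, X4046) differ from those in the column specification (`Total Volume`, `4046`). This is because wrapping the result of `read_csv()` in `data.frame()` applies base R's default name checking. The sketch below reproduces the effect on a tiny CSV written to a temporary file. The file contents are a hypothetical two-row excerpt, assumed only for illustration. `as.data.frame()` is shown as one way to keep the original names, not as the author's method.

```r
library(readr)

# Hypothetical two-row CSV with the same kinds of headers as the avocado file:
# an unnamed first column, a name containing a space, and a name starting with a digit
csv_path <- tempfile(fileext = ".csv")
writeLines(c(",Date,Total Volume,4046",
             "0,2015-12-27,64236.62,1036.74",
             "1,2015-12-20,54876.98,674.28"), csv_path)

# readr keeps the header text as written (messages and warnings silenced here)
raw <- suppressMessages(suppressWarnings(read_csv(csv_path)))

# data.frame() uses check.names = TRUE by default and rewrites non-syntactic names
mangled <- data.frame(raw)

# as.data.frame() converts the class without touching the names
kept <- as.data.frame(raw)

names(raw)
names(mangled)
names(kept)
```

Keeping the original names means columns such as `Total Volume` must be quoted with backticks in later code, which is the trade-off against the dotted names that `data.frame()` produces.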

# ---------- Using the "arrange" function from "dplyr" to sort the data frame by date -------

avacado_date <- arrange(avacado,Date)

tibble(avacado_date)

## # A tibble: 18,249 x 14
##       X1 Date       AveragePrice Total.Volume   X4046   X4225   X4770 Total.Bags
##    <dbl> <date>            <dbl>        <dbl>   <dbl>   <dbl>   <dbl>      <dbl>
##  1    51 2015-01-04         1.22       40873.  2.82e3  2.83e4  4.99e1      9716.
##  2    51 2015-01-04         1         435021.  3.64e5  2.38e4  8.22e1     46816.
##  3    51 2015-01-04         1.08      788025.  5.40e4  5.53e5  4.00e4    141137.
##  4    51 2015-01-04         1.01       80034.  4.46e4  2.50e4  2.75e3      7756.
##  5    51 2015-01-04         1.02      491738   7.19e3  3.97e5  1.29e2     87663.
##  6    51 2015-01-04         1.4       116253.  3.27e3  5.57e4  1.10e2     57183.
##  7    51 2015-01-04         0.93     5777335.  2.84e6  2.27e6  1.37e5    528452.
##  8    51 2015-01-04         1.19      166006.  2.94e4  4.72e4  3.86e4     50798.
##  9    51 2015-01-04         1.11      783068.  3.03e4  5.51e5  1.25e5     77539.
## 10    51 2015-01-04         0.88      228570.  3.27e3  1.69e5  1.45e3     55083.
## # ... with 18,239 more rows, and 6 more variables: Small.Bags <dbl>,
## #   Large.Bags <dbl>, XLarge.Bags <dbl>, type <chr>, year <dbl>, region <chr>
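
In the sorted output, all ten visible rows share the date 2015-01-04, so sorting by Date alone does not order them by any other column. A minimal sketch on made-up rows (the values are assumptions for illustration) shows how `arrange()` accepts additional sort keys, and how `desc()` reverses the direction of one of them.

```r
library(dplyr)
library(tibble)

# Hypothetical rows: two share the same date
sales <- tibble(
  Date         = as.Date(c("2015-01-11", "2015-01-04", "2015-01-04")),
  region       = c("Albany", "Boston", "Albany"),
  AveragePrice = c(1.24, 1.02, 1.22)
)

# Sort by Date ascending, then by AveragePrice descending within each date
sorted <- arrange(sales, Date, desc(AveragePrice))

sorted
```

The same pattern would apply to the avocado data, for example `arrange(avacado, Date, desc(AveragePrice))`, if a fixed order within each week is wanted.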

# ---------- Using the "ggplot" function from "ggplot2" to chart the pricing density of avocados -------

# Histogram overlaid with kernel density curve
ggplot(avacado, aes(x = AveragePrice)) +
    # Histogram with density instead of count on the y-axis;
    # after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
    geom_histogram(aes(y = after_stat(density)),
                   binwidth = 0.1,
                   colour = "black", fill = "white") +
    geom_density(alpha = 0.1, fill = "#FF6666") +  # Overlay with transparent density plot
    ggtitle("Avocado Pricing Density")
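
Plotting density rather than count on the y-axis is what lets the histogram and the kernel density curve share one scale: the bar areas sum to 1, just as the area under the density curve does. The sketch below checks this on simulated prices. The data, the seed, and the object names are assumptions for illustration. It uses `after_stat(density)`, available from ggplot2 3.3.0, and `layer_data()` to read back the computed bars.

```r
library(ggplot2)

# Simulated prices standing in for the AveragePrice column (made-up data)
set.seed(607)
toy <- data.frame(AveragePrice = round(rnorm(500, mean = 1.4, sd = 0.4), 2))

# Density-scaled histogram with the same bin width as the plot above
p <- ggplot(toy, aes(x = AveragePrice)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1)

# Extract the computed bars and add up their areas (height times width)
bars <- layer_data(p, 1)
area <- sum(bars$density * (bars$xmax - bars$xmin))

area
```

With counts on the y-axis the bars would be hundreds of units tall and the density curve would be squashed flat against the x-axis, which is why the density scaling is needed for the overlay.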

Tage N Singh

2021-04-13

This vignette provides a brief exploration of the “tidyverse” package using its component packages.

As shown above, the tidyverse is one of the most versatile collections of packages in the R ecosystem.