Data Exploration

Coding Confident

6/26/2021

In this lesson you will learn how to create visualizations that summarize your dataset in a matter of seconds, using the package DataExplorer.

Step 1 - Install and Load Packages

Skip the installation if these packages are already installed.

install.packages(c("DataExplorer", "gt"))
#Load Packages
library(DataExplorer)
library(gt)
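To avoid reinstalling packages that are already present, the check can be automated. The sketch below is one way to do it in base R; install_if_missing is a hypothetical helper name, not a function from DataExplorer or gt.

```r
# Hypothetical helper (not part of any package): install only what is missing.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE when a package cannot be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # Return the names that had to be installed, without printing them
  invisible(missing_pkgs)
}
```

Calling install_if_missing(c("DataExplorer", "gt")) before the library() lines makes the script safe to rerun, because nothing is downloaded when both packages are already installed.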

Step 2 - Import Data

The gt package includes pizzaplace, a dataset of pizza sales transactions. Once gt is loaded, the datasets bundled with it are available by name, so typing pizzaplace prints the data.

pizzaplace
## # A tibble: 49,574 x 7
##    id          date       time     name        size  type    price
##    <chr>       <chr>      <chr>    <chr>       <chr> <chr>   <dbl>
##  1 2015-000001 2015-01-01 11:38:36 hawaiian    M     classic  13.2
##  2 2015-000002 2015-01-01 11:57:40 classic_dlx M     classic  16  
##  3 2015-000002 2015-01-01 11:57:40 mexicana    M     veggie   16  
##  4 2015-000002 2015-01-01 11:57:40 thai_ckn    L     chicken  20.8
##  5 2015-000002 2015-01-01 11:57:40 five_cheese L     veggie   18.5
##  6 2015-000002 2015-01-01 11:57:40 ital_supr   L     supreme  20.8
##  7 2015-000003 2015-01-01 12:12:28 prsc_argla  L     supreme  20.8
##  8 2015-000003 2015-01-01 12:12:28 ital_supr   M     supreme  16.5
##  9 2015-000004 2015-01-01 12:16:31 ital_supr   M     supreme  16.5
## 10 2015-000005 2015-01-01 12:21:30 ital_supr   M     supreme  16.5
## # ... with 49,564 more rows

Convert the dataset to a data frame with as.data.frame() and assign it to a name. Here we name the data frame “df”.

df <- as.data.frame(pizzaplace)
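Before building a report, it can help to confirm what the conversion produced. The sketch below uses only base R and, so that it runs without gt, a three-row stand-in named df_small that copies the first rows printed above (prices as displayed there, which the tibble printout may have rounded). The same calls should apply to the full df.

```r
# Stand-in for df: the first three rows printed above, typed as shown there.
df_small <- data.frame(
  id    = c("2015-000001", "2015-000002", "2015-000002"),
  date  = c("2015-01-01", "2015-01-01", "2015-01-01"),
  time  = c("11:38:36", "11:57:40", "11:57:40"),
  name  = c("hawaiian", "classic_dlx", "mexicana"),
  size  = c("M", "M", "M"),
  type  = c("classic", "classic", "veggie"),
  price = c(13.2, 16, 16),  # as displayed in the printout
  stringsAsFactors = FALSE
)

# Confirm the object is a plain data frame and check its dimensions
print(class(df_small))
print(dim(df_small))

# Record the first class of every column
col_types <- vapply(df_small, function(col) class(col)[1], character(1))
print(col_types)

# Character and factor columns are the discrete ones
discrete_cols <- names(col_types)[col_types %in% c("character", "factor")]
print(discrete_cols)
```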

Step 3 - Data Summary Report (Default)

DataExplorer produces charts and tables that summarize a dataset in seconds, with as little as one line of code. Below we create the report from our data frame “df”. By default, create_report() writes the result to an HTML file named report.html in the working directory.

create_report(df)
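The report's file name, folder, and title can also be set explicitly. The sketch below assumes DataExplorer and gt are installed and uses the output_file, output_dir, and report_title arguments of create_report(); the file name, the title, and the choice of a temporary folder are only examples.

```r
library(DataExplorer)

df <- as.data.frame(gt::pizzaplace)

# Write the report to a temporary folder instead of the working directory
out_dir <- tempdir()

create_report(
  df,
  output_file  = "pizza_report.html",  # file name chosen for this example
  output_dir   = out_dir,
  report_title = "Pizza Sales Report"  # title chosen for this example
)

# Confirm that the report was written
report_path <- file.path(out_dir, "pizza_report.html")
print(file.exists(report_path))
```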

Multivariate Distribution Charts (bivariate)

Multivariate distribution charts are among the most common views in business analysis. They are not included in the default report, but they are easy to produce with the code below. Here the grouping variable is type. Any discrete variable, for example "size", can be passed to by =. A bivariate distribution chart is then created for that variable against every other discrete variable.

plot_bar(df, by = "type")
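The numbers behind one of these charts can be checked with a cross-tabulation. The sketch below uses base R's table() on the size and type columns of the ten rows printed in Step 2 as stand-in data. It is not how DataExplorer computes its charts internally, only a way to sanity-check what a size-by-type chart should show.

```r
# Stand-in data: size and type from the ten rows printed in Step 2
size <- c("M", "M", "M", "L", "L", "L", "L", "M", "M", "M")
type <- c("classic", "classic", "veggie", "chicken", "veggie",
          "supreme", "supreme", "supreme", "supreme", "supreme")

# Number of rows for each combination of size and type
counts <- table(size = size, type = type)
print(counts)

# Share of each type within a size (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

On the full data, table(df$size, df$type) gives the corresponding counts for all transactions.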

To save these multivariate distribution charts, click the Export button in the plot pane.

Many analysts prefer the Copy to Clipboard option for pasting charts into PowerPoint.

Customized Report

The default report includes many charts that are rarely used in business analysis. We can remove them by configuring the report. For details on the configuration options, see https://cran.r-project.org/web/packages/DataExplorer/DataExplorer.pdf

config <- configure_report(
  add_plot_str = FALSE,
  add_plot_qq = FALSE,
  add_plot_prcomp = FALSE,
  add_plot_boxplot = FALSE,
  add_plot_scatterplot = FALSE
)
create_report(df, config = config)
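The configuration can also carry arguments for individual charts. The sketch below assumes that the plot_bar_args argument of configure_report() is forwarded to plot_bar() when the report is built, so that the bar charts in the report are split by type. The name config_by_type and the temporary output folder are choices made for this example.

```r
library(DataExplorer)

df <- as.data.frame(gt::pizzaplace)

# Same switches as the customized report, plus arguments for plot_bar()
config_by_type <- configure_report(
  add_plot_str = FALSE,
  add_plot_qq = FALSE,
  add_plot_prcomp = FALSE,
  add_plot_boxplot = FALSE,
  add_plot_scatterplot = FALSE,
  plot_bar_args = list(by = "type")  # assumed to be passed on to plot_bar()
)

# Build the report in a temporary folder
create_report(df, config = config_by_type, output_dir = tempdir())
```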

Exporting Report to PDF

DataExplorer does not have a dedicated function for exporting the report to PDF. However, the workaround below exports the full report to a PDF file.

Step 1

Run the code below. The console will display the location of a file.

system.file("rmd_template/report.Rmd", package = "DataExplorer")
## [1] "C:/Users/tyler/Documents/R/win-library/4.1/DataExplorer/rmd_template/report.Rmd"
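The path can also be used directly from R, which avoids copying it by hand. The sketch below stores it in a variable named template_path (a name chosen here) and prints the first lines of the template, assuming DataExplorer is installed. system.file() returns an empty string when the file cannot be found, so that case is checked first.

```r
template_path <- system.file("rmd_template/report.Rmd", package = "DataExplorer")

if (nzchar(template_path)) {
  # Print the first lines of the template, where its header is located
  writeLines(readLines(template_path, n = 20))
  # In an interactive session, file.edit(template_path) opens it for editing
} else {
  message("Template not found; is DataExplorer installed?")
}
```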

Step 2

Copy the location displayed in your console, paste it into the address bar of your file explorer, and press Enter.

Do not copy the quotation marks.

Step 3

The file will open and show its code. Add the line below to the header at the top of the file, then save the file.

Add: always_allow_html: true
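The same edit can be sketched in code. The helper below, add_allow_html (a hypothetical name), takes the lines of an R Markdown file and inserts the option just before the closing --- of the header, unless it is already there. The header in the example is made up for illustration and is not the actual contents of the DataExplorer template.

```r
# Hypothetical helper: add the option to an R Markdown header if it is missing.
add_allow_html <- function(lines) {
  # Nothing to do when the option is already present
  if (any(grepl("^always_allow_html:", lines))) {
    return(lines)
  }
  # The header is delimited by the first two lines that contain only ---
  delims <- which(trimws(lines) == "---")
  if (length(delims) < 2 || delims[1] != 1) {
    stop("No header found at the top of the file")
  }
  # Insert the option on the line before the closing delimiter
  append(lines, "always_allow_html: true", after = delims[2] - 1)
}

# Illustrative header, not the real template
example <- c("---", "title: \"Example\"", "output: html_document", "---", "", "Body text")
edited <- add_allow_html(example)
writeLines(edited)
```

To apply it to a real file, the lines would be read with readLines() and written back with writeLines(). Keeping a backup copy first is advisable, since the template belongs to the installed package.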

Step 4

Install TinyTeX, the LaTeX distribution used to build the PDF, with the command below (this requires the tinytex R package). This only needs to be done once.

tinytex::install_tinytex()

Step 5

Export the customized report to a PDF.

create_report(df, config = config, output_file = "report.pdf", output_format = "pdf_document")
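Because the PDF export depends on a LaTeX installation, it can be useful to check for one before building the report. The sketch below looks for pdflatex, the default engine for pdf_document, on the system PATH and only then runs the export. It assumes DataExplorer and gt are installed; the has_latex variable and the temporary output folder are choices made for this example.

```r
library(DataExplorer)

df <- as.data.frame(gt::pizzaplace)

config <- configure_report(
  add_plot_str = FALSE,
  add_plot_qq = FALSE,
  add_plot_prcomp = FALSE,
  add_plot_boxplot = FALSE,
  add_plot_scatterplot = FALSE
)

# Sys.which() returns an empty string when the program is not on the PATH
has_latex <- unname(nzchar(Sys.which("pdflatex")))

if (has_latex) {
  create_report(
    df,
    config = config,
    output_file = "report.pdf",
    output_dir = tempdir(),
    output_format = "pdf_document"
  )
} else {
  message("No pdflatex found on the PATH; run tinytex::install_tinytex() first.")
}
```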