Using the {summarytools} package…

The {summarytools} package allows you to quickly create an EDA report in R.
For more details about this package, visit: https://cran.r-project.org/web/packages/summarytools/vignettes/introduction.html

Here are the steps to create a quick EDA report using this amazing R package.
Note. To save the R code from this tutorial, copy the lines of code shown in the gray boxes and paste them into a script in RStudio.

  1. Set the working directory (i.e., working folder). Place your data file (i.e., CollegeDistance.csv) in this folder.
# Set working directory
setwd("C:/MyRData/RPubs-summarytools")
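As a quick sanity check before moving on, the sketch below confirms that the data file is actually in the working directory. The helper `check_data_file()` is a hypothetical name introduced here for illustration; it is not part of base R or {summarytools}.

```r
# Hypothetical helper: returns TRUE if the file exists in the given folder,
# otherwise prints the path that was checked and returns FALSE.
check_data_file <- function(file_name, dir = getwd()) {
  path <- file.path(dir, file_name)
  if (!file.exists(path)) {
    message("Not found: ", path)
    return(FALSE)
  }
  TRUE
}

# Show the current working directory, then look for the data file in it
getwd()
check_data_file("CollegeDistance.csv")
```

If this returns FALSE, either move the CSV file into the folder shown by getwd() or adjust the path passed to setwd().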
  1. Load libraries (i.e., packages). Always make sure to include the {tidyverse} package!
    Note. You may see startup messages or warnings the first time you load the packages in an R session.
# Libraries 
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.2     v purrr   1.0.1
## v tibble  3.2.1     v dplyr   1.1.2
## v tidyr   1.3.0     v stringr 1.5.0
## v readr   2.1.3     v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(summarytools)
## 
## Attaching package: 'summarytools'
## 
## The following object is masked from 'package:tibble':
## 
##     view
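If library() fails with a "there is no package called ..." error, the package has probably not been installed yet. Here is a minimal sketch, assuming the two packages used in this tutorial, that lists which ones are missing. The variable names `needed` and `missing_pkgs` are illustrative only.

```r
# Packages this tutorial relies on
needed <- c("tidyverse", "summarytools")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  # Nothing is installed automatically; this only reports what is missing
  message("Install these first with install.packages(): ",
          paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

The sketch deliberately does not call install.packages() itself, so you stay in control of what gets installed.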
  1. Load the data file from your working directory/folder.
# Load data from the working directory (working folder)
CollegeDistance <- read.csv("CollegeDistance.csv", fileEncoding = "UTF-8-BOM")
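The `fileEncoding = "UTF-8-BOM"` argument tells read.csv() to skip the byte order mark (BOM) that some programs write at the start of a CSV file. Without it, the BOM can end up attached to the first column name. The sketch below builds a small CSV with a BOM in a temporary file and reads it back; the file contents are invented for illustration and are not the real dataset.

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark (EF BB BF)
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("rownames,gender\n1,male\n2,female\n"), con)
close(con)

# Reading with fileEncoding = "UTF-8-BOM" drops the BOM,
# so the first column name comes back clean
demo_bom <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo_bom)

# Remove the temporary file
unlink(tmp)
```

If your own CSV file has no BOM, the same argument should still read it normally.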
  1. Display the class of each variable in the dataset. Variables of class character need to be converted to factors.
# Display classes of each column
sapply(CollegeDistance, class)
##    rownames      gender   ethnicity       score    fcollege    mcollege 
##   "integer" "character" "character"   "numeric" "character" "character" 
##        home       urban       unemp        wage    distance     tuition 
## "character" "character"   "numeric"   "numeric"   "numeric"   "numeric" 
##   education      income      region 
##   "integer" "character" "character"
  1. The following line of code will convert all character type variables in the dataset to factor type variables.
# Convert all character columns to factor
CollegeDistance <- as.data.frame(unclass(CollegeDistance), stringsAsFactors = TRUE)
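To see what this conversion does, here is a minimal sketch on a small made-up data frame (`toy` and its values are invented for illustration). It also shows an alternative base R approach using lapply(), which should give the same column classes and which some readers may find easier to follow.

```r
# Small made-up data frame with two character columns and one numeric column
toy <- data.frame(
  gender = c("male", "female", "female"),
  score  = c(39.2, 48.9, 48.7),
  region = c("other", "other", "west"),
  stringsAsFactors = FALSE
)

# Approach used in this tutorial: rebuild the data frame with strings as factors
toy_a <- as.data.frame(unclass(toy), stringsAsFactors = TRUE)

# Alternative: convert only the character columns, leave the others untouched
toy_b <- toy
toy_b[] <- lapply(toy_b, function(col) if (is.character(col)) factor(col) else col)

# Compare the column classes produced by each approach
sapply(toy_a, class)
sapply(toy_b, class)

# Inspect the categories (levels) of one converted column
levels(toy_a$gender)
```

In both cases the numeric column keeps its class; only the character columns become factors.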
  1. Next, verify that the variables were converted to factors by running the code from step 4 again.
# Display classes of each column
sapply(CollegeDistance, class)
##  rownames    gender ethnicity     score  fcollege  mcollege      home     urban 
## "integer"  "factor"  "factor" "numeric"  "factor"  "factor"  "factor"  "factor" 
##     unemp      wage  distance   tuition education    income    region 
## "numeric" "numeric" "numeric" "numeric" "integer"  "factor"  "factor"
  1. Now, create the EDA report using the view() and dfSummary() functions from the {summarytools} package.
    The EDA report will appear in your ‘Viewer’ tab. In the ‘Viewer’ tab, click the ‘Show in new window’ icon to view the EDA report in your browser.
library(summarytools)
# Create EDA
view(dfSummary(CollegeDistance, max.distinct.values = 25, max.string.width = 25))
## Switching method to 'browser'
## Output file written: C:\Users\RODRIG~1\AppData\Local\Temp\Rtmp6xPEAS\file4ce06c8276e2.html
  1. Lastly, save the EDA report in HTML format using the following line of code.
    The College Distance EDA report will be saved in your working directory/folder.
    It can then be viewed in any web browser.
# Save EDA in HTML format to your working directory/folder
print(dfSummary(CollegeDistance, max.distinct.values = 25,
                max.string.width = 25),
      file = "CollegeDistance_EDA.html")
## Output file written: C:\MyRData\RPubs-summarytools\CollegeDistance_EDA.html
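To confirm that the report was written, a short check like the one below can help. It is a sketch that assumes the file name used in the print() call above and that the working directory has not changed since; `report_file` is an illustrative variable name.

```r
# File name used when saving the report
report_file <- "CollegeDistance_EDA.html"

if (file.exists(report_file)) {
  # Show the full path and, in an interactive session, open it in the browser
  report_path <- normalizePath(report_file)
  message("Report found at: ", report_path)
  if (interactive()) browseURL(report_path)
} else {
  message("Report not found in: ", getwd())
}
```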

Congratulations on completing all eight steps for creating an EDA report with the {summarytools} package!

If you are new to the R programming language, don’t give up. Your R skills will get better with time.

Note. The ‘CollegeDistance.csv’ file can be downloaded from: https://vincentarelbundock.github.io/Rdatasets/articles/data.html