The {summarytools} package lets you quickly
create an exploratory data analysis (EDA) report in R.
For more details about this package, see the introduction vignette: https://cran.r-project.org/web/packages/summarytools/vignettes/introduction.html
Here are the steps to create a quick EDA report using this amazing R
package.
Note. To reuse the R code from this tutorial,
copy the lines of code in the gray boxes and paste them into
RStudio.
# Set working directory
setwd("C:/MyRData/RPubs-summarytools")
# Libraries
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.4.2 v purrr 1.0.1
## v tibble 3.2.1 v dplyr 1.1.2
## v tidyr 1.3.0 v stringr 1.5.0
## v readr 2.1.3 v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(summarytools)
##
## Attaching package: 'summarytools'
##
## The following object is masked from 'package:tibble':
##
## view
# Load data from the working directory (working folder)
CollegeDistance <- read.csv("CollegeDistance.csv", fileEncoding = "UTF-8-BOM")
# Display classes of each column
sapply(CollegeDistance, class)
## rownames gender ethnicity score fcollege mcollege
## "integer" "character" "character" "numeric" "character" "character"
## home urban unemp wage distance tuition
## "character" "character" "numeric" "numeric" "numeric" "numeric"
## education income region
## "integer" "character" "character"
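Before converting anything, it can help to list exactly which columns are stored as character. The following is a minimal sketch in base R; the `toy` data frame and its values are made up for illustration and only stand in for CollegeDistance.

```r
# Toy data frame standing in for CollegeDistance (values are made up)
toy <- data.frame(
  gender = c("male", "female", "female"),
  score  = c(39.2, 48.9, 48.7),
  urban  = c("yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Names of the columns stored as character
chr_cols <- names(toy)[vapply(toy, is.character, logical(1))]
chr_cols

# Number of columns per class (first class only, if a column has several)
table(vapply(toy, function(col) class(col)[1], character(1)))
```

Running the same two expressions on CollegeDistance instead of `toy` should show which columns the next step will turn into factors.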
# Convert all character columns to factor
CollegeDistance <- as.data.frame(unclass(CollegeDistance), stringsAsFactors = TRUE)
# Display classes of each column
sapply(CollegeDistance, class)
## rownames gender ethnicity score fcollege mcollege home urban
## "integer" "factor" "factor" "numeric" "factor" "factor" "factor" "factor"
## unemp wage distance tuition education income region
## "numeric" "numeric" "numeric" "numeric" "integer" "factor" "factor"
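As an alternative to converting after loading, `read.csv()` can create the factors while reading the file through its `stringsAsFactors` argument. The sketch below uses a small made-up CSV written to a temporary file so that it runs on its own; with the real data you would presumably keep the file name and `fileEncoding` from the tutorial and only add the extra argument.

```r
# Write a small made-up CSV to a temporary file so the sketch is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("gender,score,region",
             "male,39.2,other",
             "female,48.9,west"), tmp)

# stringsAsFactors = TRUE converts character columns to factor while reading
toy <- read.csv(tmp, stringsAsFactors = TRUE)

# gender and region are now factors; score stays numeric
sapply(toy, class)

# Clean up the temporary file
unlink(tmp)
```

Either approach should give the same column classes; converting in a separate step, as the tutorial does, makes the before-and-after comparison easier to see.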
library(summarytools)
# Create EDA
view(dfSummary(CollegeDistance, max.distinct.values = 25, max.string.width = 25))
## Switching method to 'browser'
## Output file written: C:\Users\RODRIG~1\AppData\Local\Temp\Rtmp6xPEAS\file4ce06c8276e2.html
# Save EDA in HTML format to your working directory/folder
print(dfSummary(CollegeDistance, max.distinct.values = 25,
                max.string.width = 25),
      file = 'CollegeDistance_EDA.html')
## Output file written: C:\MyRData\RPubs-summarytools\CollegeDistance_EDA.html
Congratulations on completing all eight steps for creating an EDA report with the {summarytools} package!
If you are new to the R programming language, don’t give up. Your R skills will get better with time.
Note. The ‘CollegeDistance.csv’ file can be downloaded from: https://vincentarelbundock.github.io/Rdatasets/articles/data.html