DATA621 Blog 3: DataExplorer

Before running an analysis or building a predictive model on a data set, it helps to look at the data first: its structure, variable types, and gaps determine which techniques will get you the information you want. Cleaning and organizing data is often time-consuming. The R package DataExplorer automates much of the data handling and visualization involved, so that users can focus on studying the data and extracting insights.

This example uses two data sets from the mlbench package: Soybean and Glass. Soybean consists only of categorical/discrete variables, while Glass has nine continuous variables and one discrete variable (Type). Some functions, such as plot_qq and plot_histogram, won’t work on Soybean because they need continuous values as input.
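One quick way to confirm this split before choosing plotting functions is DataExplorer's `split_columns()`, which separates a data frame into its discrete and continuous parts. The following is a minimal sketch, assuming the mlbench and DataExplorer packages are installed; the variable names `soy_parts` and `glass_parts` are only illustrative.

```r
library(mlbench)
library(DataExplorer)
data("Soybean")
data("Glass")

# split_columns() returns a list holding the discrete columns, the
# continuous columns, and a count of each
soy_parts   <- split_columns(Soybean)
glass_parts <- split_columns(Glass)

# No continuous columns means plot_histogram() and plot_qq()
# have nothing to draw for Soybean
soy_parts$num_continuous

# Glass has continuous columns, so those functions apply
glass_parts$num_continuous

# The discrete column(s) in Glass
names(glass_parts$discrete)
```

The counts should agree with the discrete_columns and continuous_columns fields that introduce() reports for each data set.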

library(ggplot2)       # plotting backend used by DataExplorer
library(mlbench)       # provides the Soybean and Glass data sets
library(DataExplorer)
data("Soybean")
data("Glass")

introduce(Soybean)

##   rows columns discrete_columns continuous_columns all_missing_columns
## 1  683      36               36                  0                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1                 2337           562              24588       128600

introduce(Glass)

##   rows columns discrete_columns continuous_columns all_missing_columns
## 1  214      10                1                  9                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1                    0           214               2140        19984
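Most of the fields that introduce() reports are simple to derive by hand, which makes the output easier to interpret. As a rough sketch in base R (the variable names are illustrative, not part of DataExplorer), the Soybean figures above can be recomputed like this:

```r
library(mlbench)
data("Soybean")

n_rows    <- nrow(Soybean)                  # rows
n_cols    <- ncol(Soybean)                  # columns
total_obs <- n_rows * n_cols                # total_observations
total_na  <- sum(is.na(Soybean))            # total_missing_values
complete  <- sum(complete.cases(Soybean))   # complete_rows

c(total_obs = total_obs, total_na = total_na, complete = complete)

# Share of all cells that are missing, in percent
round(100 * total_na / total_obs, 1)
```

These values should reproduce the introduce(Soybean) output shown above: 683 x 36 = 24588 total observations, of which 2337 are missing, with 562 rows free of any missing value.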

# Plot the introduce() metrics as percentages
plot_intro(Soybean)

plot_intro(Glass)

# Plot the share of missing values for each feature
plot_missing(Soybean)

plot_missing(Glass)
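When the numbers behind the missing-value plot are needed, for example to decide which features to drop or impute, DataExplorer's `profile_missing()` returns them as a table. A minimal sketch, assuming the same packages as above; `missing_profile` is just an illustrative name:

```r
library(mlbench)
library(DataExplorer)
data("Soybean")

# One row per feature, with columns feature, num_missing, and pct_missing
missing_profile <- profile_missing(Soybean)

# Sort so the features with the most missing values come first
missing_profile <- missing_profile[order(-missing_profile$pct_missing), ]
head(missing_profile, 5)
```

Summing the num_missing column should give the same total_missing_values that introduce(Soybean) reports.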

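# Bar charts of category frequencies for each discrete feature
plot_bar(Soybean)

plot_bar(Glass)

# Histograms of each continuous feature
plot_histogram(Glass)

# Boxplots of each continuous feature, grouped by glass type
plot_boxplot(Glass, by = "Type")

# QQ plots of each continuous feature, grouped by glass type
plot_qq(Glass, by = "Type")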

# Correlation heatmaps (discrete features are one-hot encoded first)
plot_correlation(Soybean)

plot_correlation(Glass)
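Because plot_correlation() expands every category of a discrete feature into its own column, the Soybean heatmap can get crowded. The `type`, `maxcat`, and `cor_args` arguments help narrow it down. The sketch below is one possible configuration, not a recommendation: the cutoff of 5 categories is an arbitrary choice for illustration, and `glass_cor` is an illustrative name.

```r
library(mlbench)
library(DataExplorer)
data("Glass")
data("Soybean")

# Restrict the Glass heatmap to its continuous features
plot_correlation(Glass, type = "continuous")

# The underlying correlation matrix can be computed with base R
glass_cor <- cor(Glass[sapply(Glass, is.numeric)])
round(glass_cor, 2)

# For Soybean, keep only discrete features with at most 5 categories
# (arbitrary cutoff) and handle missing values pairwise
plot_correlation(Soybean, type = "discrete", maxcat = 5L,
                 cor_args = list(use = "pairwise.complete.obs"))
```

Lowering `maxcat` trades completeness for readability, since features with more categories than the cutoff are left out of the plot.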

Javern Wilson

2/29/2020