Before analyzing a data set or building a predictive model, it helps to look at the data first, so you can decide which techniques are likely to extract the information you need. Cleaning and organizing data is often time-consuming. The R package DataExplorer automates most of the data handling and visualization, so that users can focus on studying the data and extracting insights.
This example uses two datasets: Soybean and Glass. Soybean consists only of categorical (discrete) variables, while Glass consists mostly of continuous variables. Some functions, such as plot_qq and plot_histogram, won’t work on Soybean because they need continuous values as input.
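The summaries that follow look like the output of DataExplorer's introduce() function. A minimal sketch of how they could be produced is shown here, assuming Soybean and Glass come from the mlbench package (the text does not name their source); the variable names soybean_info and glass_info are only illustrative.

```r
# Assumption: Soybean and Glass are loaded from the mlbench package;
# the text itself does not say where the datasets come from.
library(mlbench)
library(DataExplorer)

data(Soybean)
data(Glass)

# introduce() returns a one-row data frame of basic metadata:
# row/column counts, discrete vs. continuous columns, missing values, memory use.
soybean_info <- introduce(Soybean)
glass_info   <- introduce(Glass)

print(soybean_info)
print(glass_info)
```

The memory_usage value may differ between R and package versions, so only the counts should be expected to match across machines.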
##   rows columns discrete_columns continuous_columns all_missing_columns
## 1  683      36               36                  0                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1                 2337           562              24588       128600

##   rows columns discrete_columns continuous_columns all_missing_columns
## 1  214      10                1                  9                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1                    0           214               2140        19984
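The summaries above show 0 continuous columns for Soybean and 9 for Glass, which is why the continuous-only plots apply to Glass alone. The sketch below checks this with base R before plotting. It again assumes the datasets come from mlbench, and the helper variables n_cont_soybean and n_cont_glass are hypothetical names introduced for illustration.

```r
# Assumption: Soybean and Glass are loaded from the mlbench package.
library(mlbench)
library(DataExplorer)

data(Soybean)
data(Glass)

# Count numeric (continuous) columns with base R.
n_cont_soybean <- sum(sapply(Soybean, is.numeric))
n_cont_glass   <- sum(sapply(Glass, is.numeric))

# Draw the continuous-only plots only when there is something to plot.
if (n_cont_glass > 0) {
  plot_histogram(Glass)  # one histogram per continuous column
  plot_qq(Glass)         # one QQ plot per continuous column
}
```

Guarding the calls with a column count keeps a script from stopping partway through when a dataset such as Soybean has no continuous columns.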