Data mining is not only about getting the data, filtering it, analysing it and making sense of it. It also means presenting it in such a way that users don't have to be experts to understand it. A reader should be able to glance at the data or graph and get a reasonable sense of what it means.
There are different graphing tools that make this easier for data scientists. With a graph you can show rankings, scatterplots, flowcharts, histograms, line charts, pie charts, distributions, correlations, etc.
Presenting data in its simplest form is a skill that a data scientist should learn. After all, that's what you are getting paid for: to make life easy for supervisors, managers, boards of directors and owners.
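As a rough illustration of two of these chart types, here is a minimal sketch using ggplot2 and the gapminder data. The choice of the year 2007, the number of bins and the variable names (gap_2007, hist_plot, scatter_plot) are my own assumptions, picked only for this example.

```r
# load the plotting library and the example data set
library(ggplot2)
library(gapminder)

# keep a single year so each country appears once (2007 is an arbitrary choice)
gap_2007 <- subset(gapminder, year == 2007)

# histogram: distribution of life expectancy across countries
hist_plot <- ggplot(gap_2007, aes(x = lifeExp)) +
  geom_histogram(bins = 20) +
  labs(x = 'Life Expectancy', y = 'Number of countries') +
  theme_bw()

# scatterplot: relationship between GDP per capita and life expectancy
scatter_plot <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  labs(x = 'GDP per capita', y = 'Life Expectancy') +
  theme_bw()

# printing a ggplot object draws it
print(hist_plot)
print(scatter_plot)
```

The histogram answers "how is one variable distributed?", while the scatterplot answers "how do two variables move together?", so the choice of chart depends on the question you want the reader to answer at a glance.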
Some of the very popular graphing tools that I have used in this course are:
There are many other graphical forms, but the one that catches my eye is one that is getting popular and is attractive too. It kind of gives you live information about the data that you are presenting. I wanted to use this so much in this course, but I am not quite a pro at it yet.
So, I have included a simple example below that shows what a graph can do for your data.
It uses gapminder, gganimate and ggplot2.
# load required libraries
library(gapminder)
library(ggplot2)
library(gganimate)
# create a ggplot: GDP per capita vs. life expectancy,
# with point size showing population and color showing continent
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() +
  theme_bw() +
  # gganimate specific bits:
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'Life Expectancy') +
  transition_time(year) +
  ease_aes('linear')
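If you want to keep the animation as a file instead of only viewing it, a minimal sketch could look like the following. The object name anim, the frame settings and the file name gapminder.gif are assumptions made for this example, and the GIF renderer assumes the gifski package is installed.

```r
# load required libraries
library(gapminder)
library(ggplot2)
library(gganimate)

# store the animated plot in an object instead of printing it directly
anim <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() +
  theme_bw() +
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'Life Expectancy') +
  transition_time(year) +
  ease_aes('linear')

# render the frames; nframes and fps are arbitrary values for this sketch
rendered <- animate(anim, nframes = 60, fps = 10, renderer = gifski_renderer())

# write the rendered animation to a GIF in the working directory
anim_save('gapminder.gif', animation = rendered)
```

Fewer frames render faster but make the movement between years look choppier, so these two values are worth adjusting to taste.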