Data visualization is a key skill for any good data analyst. A clear representation of the results obtained with data analysis techniques helps stakeholders understand our work. Moreover, I think graphs are the best way to explain the results of a dataset analysis, because they give an immediate, broad understanding of the conclusions.
In this tutorial, I will illustrate how to combine several graphs into a single image. This is useful for exploratory data analysis (EDA) because it lets you compare plots side by side.
To do that, we will use the function grid.arrange() from the gridExtra package.
In addition, we will see how to create an interactive graph using the function ggplotly() from the plotly package. This function is helpful because it lets you see the coordinates of each point in a graph.
For this analysis, we will use the cars dataset, which ships with R by default.
It is composed of 50 observations and 2 variables, i.e. speed and dist.
- The variable dist is the stopping distance of the car in feet (ft).
- The variable speed is the speed of the car in miles per hour (mph).
Moreover, this dataset has no missing values, so we don't need to clean it and can proceed directly with the data analysis.
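The claims above about the size of the dataset and the absence of missing values can be checked directly. The following is a minimal sketch using only base R functions:

```r
# Load the built-in dataset and inspect its shape and completeness
data("cars")

dim(cars)              # number of rows (observations) and columns (variables)
str(cars)              # variable names and types
anyNA(cars)            # TRUE if at least one value is missing
colSums(is.na(cars))   # count of missing values per variable
```

If anyNA() returns FALSE, no cleaning step is needed before plotting.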
First of all, we have to load the necessary libraries:
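The snippet for this step does not appear in the text. Assuming the packages needed are the three mentioned in this tutorial (ggplot2 for the plots, gridExtra for grid.arrange() and plotly for ggplotly()), loading them would look roughly like this:

```r
# Assumed package list, based on the functions used in this tutorial.
# Install once if needed:
# install.packages(c("ggplot2", "gridExtra", "plotly"))
library(ggplot2)    # ggplot(), geom_point(), geom_smooth()
library(gridExtra)  # grid.arrange()
library(plotly)     # ggplotly()
```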
After that, we load the dataset:
data("cars")
summary(cars)
#>      speed           dist
#>  Min.   : 4.0   Min.   :  2.00
#>  1st Qu.:12.0   1st Qu.: 26.00
#>  Median :15.0   Median : 36.00
#>  Mean   :15.4   Mean   : 42.98
#>  3rd Qu.:19.0   3rd Qu.: 56.00
#>  Max.   :25.0   Max.   :120.00
We can use a base R scatter plot or a ggplot2 point plot to check for a potential correlation between the variables:
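The plotting code for this step is not shown in the text, so the following is only a sketch of how the two plots might be built. The object names a and c match the ones passed to grid.arrange() later in this tutorial, but their definitions here are assumptions, as are the axis labels.

```r
library(ggplot2)

# Base R scatter plot of stopping distance against speed
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# a: ggplot2 scatter plot (assumed definition)
a <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)")

# c: the same points with a fitted linear regression line (assumed definition)
c <- a + geom_smooth(method = "lm")

a
c
```

Naming a plot c does not break the base function c(), because R looks only for function objects when a name is used in a call, but a more descriptive name would be safer in larger scripts.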
At this point we can definitely say that there is a positive correlation between speed and the distance a car needs to stop. We can find the corresponding value using the function cor().
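As a small sketch, the correlation coefficient can be computed with base R. By default cor() returns the Pearson coefficient; the call to cor.test() is an optional addition, not part of the original analysis:

```r
data("cars")

# Pearson correlation coefficient between speed and stopping distance
r <- cor(cars$speed, cars$dist)
r

# Optional: test whether the correlation differs from zero
cor.test(cars$speed, cars$dist)
```

For this dataset the coefficient is roughly 0.81, which confirms the strong positive relationship seen in the plots.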
To make the graphs easier to read, we can use the function ggplotly() from the plotly package. It shows the coordinates of each point in the ggplot2 scatter plot.
N.B. To display the coordinates, hover over (or click on) a point in the graph with the mouse.
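A minimal sketch of this step, assuming the scatter plot is built with geom_point() (the names p and interactive_plot are only illustrative):

```r
library(ggplot2)
library(plotly)

# Static ggplot2 scatter plot
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point()

# Convert it into an interactive plotly widget with tooltips
interactive_plot <- ggplotly(p)
interactive_plot
```

In RStudio the widget opens in the Viewer pane; in an R Markdown document rendered to HTML it is embedded in the page.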
To place all the graphs in a single image, we can use the function grid.arrange():
# Place plots a and c side by side, with titles around the grid
grid.arrange(a, c, ncol = 2,
             top = "Cars Analysis", left = "Geom. Plot", right = "Linear Regression",
             bottom = "Speed ~ Distance Relationship")
#> `geom_smooth()` using formula 'y ~ x'
In conclusion, grid.arrange() is a very useful function for EDA because it lets you combine different graphs into a single image. I think it is a good way to get a clear and fast comparison of different graphs. However, I have to point out one limitation: it cannot arrange graphs created with ggplotly(), because ggplotly() returns an HTML widget rather than a grid graphical object, and gridExtra only works with the latter.
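One possible workaround for the limitation mentioned above, which is not part of the original analysis, is plotly's own subplot() function, which arranges plotly objects in a grid. A hedged sketch with illustrative plot names p1 and p2:

```r
library(ggplot2)
library(plotly)

# Two ggplot2 plots of the same data (illustrative definitions)
p1 <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()
p2 <- p1 + geom_smooth(method = "lm")

# Convert each plot to plotly and place them side by side in one widget
combined <- subplot(ggplotly(p1), ggplotly(p2), nrows = 1)
combined
```

Unlike grid.arrange(), subplot() has no top, left, right or bottom title arguments, so overall titles would have to be added through plotly's layout() instead.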
Dataset —> https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/cars.html
gridExtra Package —> https://cran.r-project.org/web/packages/gridExtra/index.html
plotly Package —> https://cran.r-project.org/web/packages/plotly/index.html