Data Visualization to Explore Datasets

Data visualization is essential for communicating data efficiently. It takes raw or cleaned data, models it, and presents it so that conclusions can be reached. Visualization also helps you learn how to work efficiently with a dataset, because it reveals the variables and the relationships between them. In this blog post, I explore some of the plotting tools that I usually use to understand a new dataset.

library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(ggplot2)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## v purrr   0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(reshape)
## 
## Attaching package: 'reshape'
## The following object is masked from 'package:dplyr':
## 
##     rename
## The following objects are masked from 'package:tidyr':
## 
##     expand, smiths
## The following object is masked from 'package:openintro':
## 
##     tips
gifted
## # A tibble: 36 x 8
##    score fatheriq motheriq speak count  read edutv cartoons
##    <int>    <int>    <int> <int> <int> <dbl> <dbl>    <dbl>
##  1   159      115      117    18    26   1.9  3        2   
##  2   164      117      113    20    37   2.5  1.75     3.25
##  3   154      115      118    20    32   2.2  2.75     2.5 
##  4   157      113      131    12    24   1.7  2.75     2.25
##  5   156      110      109    17    34   2.2  2.25     2.5 
##  6   150      113      109    13    28   1.9  1.25     3.75
##  7   155      118      119    19    24   1.8  2        3   
##  8   161      117      120    18    32   2.3  2.25     2.5 
##  9   163      111      128    22    28   2.1  1        4   
## 10   162      122      120    18    27   2.1  2.25     2.75
## # ... with 26 more rows
data<-gifted
summary(data)
##      score          fatheriq        motheriq         speak        count      
##  Min.   :150.0   Min.   :110.0   Min.   :101.0   Min.   :10   Min.   :21.00  
##  1st Qu.:155.0   1st Qu.:112.0   1st Qu.:113.8   1st Qu.:17   1st Qu.:28.00  
##  Median :159.0   Median :115.0   Median :118.0   Median :18   Median :31.00  
##  Mean   :159.1   Mean   :114.8   Mean   :118.2   Mean   :18   Mean   :30.69  
##  3rd Qu.:162.0   3rd Qu.:116.2   3rd Qu.:122.2   3rd Qu.:20   3rd Qu.:34.25  
##  Max.   :169.0   Max.   :126.0   Max.   :131.0   Max.   :23   Max.   :39.00  
##       read           edutv          cartoons    
##  Min.   :1.700   Min.   :0.750   Min.   :1.750  
##  1st Qu.:2.000   1st Qu.:1.750   1st Qu.:2.688  
##  Median :2.200   Median :2.000   Median :3.000  
##  Mean   :2.136   Mean   :1.958   Mean   :3.062  
##  3rd Qu.:2.300   3rd Qu.:2.250   3rd Qu.:3.500  
##  Max.   :2.500   Max.   :3.000   Max.   :4.500
# Scatterplot matrix of score, count, read, edutv, and cartoons
plot(data[,c(1,5,6,7,8)])

library(corrplot)
## corrplot 0.90 loaded
# Correlation matrix of score, motheriq, count, read, and edutv
corrplot(cor(data[,c(1,3,5,6,7)], use = "na.or.complete"),
type = "lower", 
order = "original", tl.col = "black", tl.srt = 45, tl.cex = 0.55, cl.pos='n', addgrid.col = FALSE)
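
The correlation plot encodes strength as color, so it can help to print the underlying numbers as well. The following is a minimal sketch, assuming the openintro package is installed; the names `vars`, `cors`, and `score_cors` are my own and are not part of the original analysis.

```r
library(openintro)  # provides the gifted dataset

# Same columns as the corrplot call: score, motheriq, count, read, edutv
vars <- gifted[, c(1, 3, 5, 6, 7)]

# Pairwise Pearson correlations, rounded for readability
cors <- round(cor(vars, use = "na.or.complete"), 2)
print(cors)

# Correlation of each other variable with score, strongest (by absolute value) first
score_cors <- cors["score", colnames(cors) != "score"]
print(score_cors[order(abs(score_cors), decreasing = TRUE)])
```

Sorting by absolute value keeps the sign of each correlation visible while still ranking the variables by how strongly they move with score.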

### A histogram shows the distribution of a single variable. Here I explore the distribution of the score variable.

hist(data$score,col="blue")
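
Since ggplot2 is already loaded, the same histogram could also be drawn with it, which makes the bin width and axis labels explicit. This is only a sketch: the bin width of 2 and the title and axis labels are my assumptions, not values taken from the analysis above.

```r
library(openintro)  # provides the gifted dataset
library(ggplot2)

# Histogram of score with an explicit bin width (2 points per bin is an assumption)
p <- ggplot(gifted, aes(x = score)) +
  geom_histogram(binwidth = 2, fill = "blue", color = "white") +
  labs(title = "Distribution of score", x = "score", y = "Count")
print(p)
```

Changing `binwidth` is a quick way to check whether the apparent shape of the distribution depends on how the scores are binned.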

Understanding correlations and distributions is an essential first step in any data analysis.