R Packages for Data Visualization

Aulia Asyiva

January 24, 2022

This material covers several packages that support data exploration and visualization.

Data

The data come from a Portuguese bank and describe its telephone marketing campaigns. The dataset has 16 predictor variables used to predict whether a client will subscribe to a term deposit. The data are sourced from UCI.

Import Data

Import the data from a CSV file, which can be downloaded from this link: https://ipb.link/praktikum-visualisasidata

Bank <- read.csv("bank.csv")
head(Bank)

##   age         job marital education default balance housing loan  contact day
## 1  30  unemployed married   primary      no    1787      no   no cellular  19
## 2  33    services married secondary      no    4789     yes  yes cellular  11
## 3  35  management  single  tertiary      no    1350     yes   no cellular  16
## 4  30  management married  tertiary      no    1476     yes  yes  unknown   3
## 5  59 blue-collar married secondary      no       0     yes   no  unknown   5
## 6  35  management  single  tertiary      no     747      no   no cellular  23
##   month duration campaign pdays previous  y weight hight
## 1   oct       79        1    -1        0 no   63.2 160.9
## 2   may      220        1   339        4 no   68.5 165.3
## 3   apr      185        1   330        1 no   78.5 173.5
## 4   jun      199        4    -1        0 no   95.3 187.2
## 5   may      226        1    -1        0 no   60.1 158.3
## 6   feb      141        2   176        3 no   94.8 186.8
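Before plotting, it can help to check which columns are numeric and which are text, since the plotting functions used later treat continuous and discrete columns differently. The following is a minimal sketch in base R; `column_summary` is a hypothetical helper introduced here for illustration, and the guarded call assumes `Bank` was created by the `read.csv()` call above.

```r
# Hypothetical helper (not part of the original material): one row per column
# with its class and its number of missing values.
column_summary <- function(df) {
  data.frame(
    column    = names(df),
    class     = vapply(df, function(col) class(col)[1], character(1)),
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    row.names = NULL
  )
}

# Assumption: `Bank` is the data frame loaded above.
if (exists("Bank")) {
  print(column_summary(Bank))
}
```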

1. DataExplorer

Automated data exploration process for analytic tasks and predictive modeling, so that users could focus on understanding data and extracting insights. The package scans and analyzes each variable, and visualizes them with typical graphical techniques. Common data processing methods are also available to treat and format data.

#install.packages("DataExplorer")
library("DataExplorer")
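As a first step, the package can also give a quick overview of the whole data frame. The sketch below uses `introduce()`, `plot_intro()` and `plot_missing()` from DataExplorer; it assumes `Bank` is the data frame loaded in the Import Data step and does nothing if it is not present.

```r
library(DataExplorer)

# Assumption: `Bank` is the data frame loaded in the Import Data step.
if (exists("Bank")) {
  print(introduce(Bank))  # row/column counts, discrete vs continuous columns, missing values
  plot_intro(Bank)        # the same overview drawn as a bar chart
  plot_missing(Bank)      # share of missing values per column
}
```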

Histogram

plot_histogram(Bank)

plot_density(Bank, geom_density_args = list(fill = "blue"))

Bar Plot

plot_bar(Bank)

Scatterplot

plot_scatterplot(Bank, by = "age")

Plot Correlation

type for . . .

plot_correlation(Bank, type = "continuous") # type = "c" also works through partial matching

2. corrplot

Package Description: R package corrplot provides a visual exploratory tool on correlation matrix that supports automatic variable reordering to help detect hidden patterns among variables.

corrplot is very easy to use and provides a rich array of plotting options in visualization method, graphic layout, color, legend, text labels, etc. It also provides p-values and confidence intervals to help users determine the statistical significance of the correlations.

#install.packages("corrplot")
library("corrplot")
library("tidyverse") # provides the %>% operator

corrplot(Bank %>% select(where(is.numeric)) %>% cor(),
         method = "pie", type = "lower",
         diag = FALSE)
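The package description mentions variable reordering and p-values; the sketch below shows one way to use both. `numeric_only` is a hypothetical helper written for this example, the 0.05 significance level is an arbitrary choice, and the guarded block assumes `Bank` has been loaded as above.

```r
library(corrplot)

# Hypothetical helper: keep only the numeric columns of a data frame.
numeric_only <- function(df) {
  df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
}

# Assumption: `Bank` is the data frame loaded in the Import Data step.
if (exists("Bank")) {
  num <- numeric_only(Bank)
  M <- cor(num)                                # correlation matrix
  p <- cor.mtest(num, conf.level = 0.95)$p     # matrix of p-values
  corrplot(M, method = "circle", type = "lower", diag = FALSE,
           order = "hclust",                   # reorder variables by hierarchical clustering
           p.mat = p, sig.level = 0.05,
           insig = "blank")                    # leave non-significant cells empty
}
```

Reordering with `order = "hclust"` places similar variables next to each other, which can make groups of correlated variables easier to spot.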

3. ggpubr

Package Description: The 'ggplot2' package is excellent and flexible for elegant data visualization in R. However the default generated plots requires some formatting before we can send them for publication. Furthermore, to customize a 'ggplot', the syntax is opaque and this raises the level of difficulty for researchers with no advanced R programming skills. 'ggpubr' provides some easy-to-use functions for creating and customizing 'ggplot2'-based publication ready plots.

#install.packages("ggpubr")
library("ggpubr")

Histogram

gghistogram(Bank, x = "age", fill = "pink") + scale_y_continuous(expand = c(0, 0))

## Warning: Using `bins = 30` by default. Pick better value with the argument
## `bins`.
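The warning above asks for an explicit `bins` value. One possible way to pick it, shown here only as a sketch, is Sturges' rule via `nclass.Sturges()` from base R (grDevices); other rules or a hand-picked number are equally valid. The guarded call assumes `Bank` is loaded as above.

```r
library(ggpubr)

# Sturges' rule: ceiling(log2(n) + 1) bins for n observations.
nclass.Sturges(1:100)

# Assumption: `Bank` is the data frame loaded in the Import Data step.
if (exists("Bank")) {
  n_bins <- nclass.Sturges(Bank$age)
  print(
    gghistogram(Bank, x = "age", bins = n_bins, fill = "pink") +
      scale_y_continuous(expand = c(0, 0))
  )
}
```

Passing `bins` explicitly also silences the warning.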

Density Plot

ggdensity(Bank,x ="age",fill ="pink")+
 scale_y_continuous(expand = c(0,0))+
 scale_x_continuous(expand = c(0,0))

names_cont <- colnames(Bank %>% select(where(is.numeric)))
p1 <- map(names_cont, ~
  ggdensity(Bank, x = .x, fill = "pink") +
    scale_y_continuous(expand = c(0, 0)) +
    scale_x_continuous(expand = c(0, 0))
)
ggarrange(plotlist = p1)

Scatterplot

ggscatter(Bank, x = "age", y = "duration", color = "pink")

ggscatter(Bank, x = "age", y = "duration", color = "pink",
          add = "reg.line",  # Add regression line
          conf.int = TRUE,   # Add confidence interval
          add.params = list(color = "blue",
                            fill = "lightgray")
)

## `geom_smooth()` using formula 'y ~ x'
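The message above shows that the regression line is a fit of `y ~ x`. To see the numbers behind such a line, the same kind of model can be fitted with `lm()` in base R. This is a minimal sketch; `fit_line` is a hypothetical helper, and the guarded call assumes `Bank` is loaded as above.

```r
# Hypothetical helper: intercept and slope of the least-squares line y ~ x.
fit_line <- function(df, x, y) {
  model <- lm(reformulate(x, response = y), data = df)
  coef(model)
}

# Assumption: `Bank` is the data frame loaded in the Import Data step.
if (exists("Bank")) {
  print(fit_line(Bank, "age", "duration"))
}
```

The second coefficient is the slope, that is, the change in `duration` associated with one additional year of `age`.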

4. ggplot2

#install.packages("ggplot2")
library(ggplot2)

One Variable

Continuous Data

a <- ggplot(Bank, aes(x = weight))
a + geom_area(stat = "bin", color = "blue", fill = "skyblue", size = 1) +
    labs(
      x = "x",
      y = "y",
      title = "Title",
      subtitle = "Subtitle",
      caption = "caption"
    )

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

a + geom_histogram(binwidth = 10)

Discrete Data

b <- ggplot(Bank, aes(day))
b + geom_bar(color="blue", fill = "pink") +
    labs(
      x = "x",
      y = "y",
      title = "Title",
      subtitle = "Subtitle",
      caption = "caption"
    )

Two Variables

Continuous X, Continuous Y

f <- ggplot(Bank, aes(weight, hight))
f + geom_jitter(color="pink", size=2, shape="o")

Discrete X, Continuous Y

g <- ggplot(Bank, aes(day, weight))
g + geom_bar(stat = "identity", color = "blue") +
  labs(
      x = "x",
      y = "y",
      title = "Title",
      subtitle = "Subtitle",
      caption = "caption"
)
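With `stat = "identity"`, `geom_bar()` stacks every row, so each bar's height is the sum of `weight` for that `day`. If the mean per day is what is wanted (an assumption about the intent, not something the material states), the data can be aggregated first and drawn with `geom_col()`. `mean_by` is a hypothetical helper written for this sketch.

```r
library(ggplot2)

# Hypothetical helper: mean of `value` within each level of `group`.
mean_by <- function(df, group, value) {
  out <- aggregate(df[[value]], by = list(df[[group]]), FUN = mean)
  names(out) <- c(group, value)
  out
}

# Assumption: `Bank` is the data frame loaded in the Import Data step.
if (exists("Bank")) {
  day_means <- mean_by(Bank, "day", "weight")
  print(
    ggplot(day_means, aes(day, weight)) +
      geom_col(color = "blue")   # geom_col() draws the values as they are
  )
}
```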

g <- ggplot(Bank, aes(day, weight))
g + geom_boxplot(aes(group=day))+
  labs(
      x = "x",
      y = "y",
      title = "Title",
      subtitle = "Subtitle",
      caption = "caption"
)

g1 <- ggplot(Bank, aes(job, age))
g1 + geom_violin(scale = "area", color="#993399", fill = "#993399") +
  labs(
      x = "x",
      y = "y",
      title = "Title",
      subtitle = "Subtitle",
      caption = "caption"
)

Discrete X, Discrete Y

h <- ggplot(Bank, aes(education, job))
h + geom_jitter(colour="red", size=1)+
  labs(
      x = "x",
      y = "y",
      title = "Title",
      subtitle = "Subtitle",
      caption = "caption"
)
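With two discrete variables, many points share the same position, which is why jitter is used above. An alternative, sketched here, is to count each combination and map the count to point size with `geom_count()`. `cross_counts` is a hypothetical helper that returns the same counts as a table, and the guarded block assumes `Bank` is loaded as above.

```r
library(ggplot2)

# Hypothetical helper: number of rows for every combination of two columns.
cross_counts <- function(df, a, b) {
  out <- as.data.frame(table(df[[a]], df[[b]]), responseName = "n")
  names(out)[1:2] <- c(a, b)
  out
}

# Assumption: `Bank` is the data frame loaded in the Import Data step.
if (exists("Bank")) {
  print(cross_counts(Bank, "education", "job"))
  print(
    ggplot(Bank, aes(education, job)) +
      geom_count(colour = "red")   # point size shows the number of rows
  )
}
```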