Advanced Data Visualization with ggplot2 using R

Introduction

R offers many ways, and many packages, for building charts, depending on what you need. This time we will build charts with the ggplot2 package plus a few supporting packages. ggplot2 is an implementation of the Grammar of Graphics for R. The Grammar of Graphics asks us to construct a chart from grammatical rules. That way we are not tied to named chart types (scatterplot, line chart, bar chart, and so on), which is how charts are usually made.

Before starting, make sure the ggplot2 package is installed and loaded with library() in your R session. If it is not installed yet, install it first with install.packages("ggplot2"), then load it with library(ggplot2).

Keep in mind

ggplot2 also has a function for making charts quickly: qplot(), short for quick plot. It is handy for quick, compact plots, and it is easy to pick up if you are already used to the base R plot() function.

Here we use the diamonds dataset to make a scatter plot. Carat goes on the x-axis, price on the y-axis, and the points are coloured by clarity:

qplot(x = carat, y = price, colour = clarity, data = diamonds)

However, qplot() is only suitable for simple charts. For more complex charting problems, use ggplot(): the code is longer, but it is far more flexible. The same chart written with ggplot():

ggplot(data = diamonds,
       mapping = aes(x = carat, y = price, colour = clarity)) +
  geom_point()
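A practical advantage of ggplot() is that a plot is an ordinary R object. It can be stored and then extended layer by layer with +. Note also that qplot() was formally deprecated in ggplot2 3.4.0, so ggplot() is the safer choice for new code. Below is a minimal sketch using the same diamonds data; the alpha value and axis titles are only illustrative choices.

```r
library(ggplot2)

# Store the base plot (data + aesthetic mapping) in an object
p <- ggplot(data = diamonds,
            mapping = aes(x = carat, y = price, colour = clarity))

# Add a geometry layer; semi-transparent points reduce overplotting
p_points <- p + geom_point(alpha = 0.3)

# More components can be added to the same object later
p_final <- p_points + labs(x = "Carat", y = "Price (USD)")
p_final
```

Building the plot step by step like this makes it easy to try variations, such as a different geom or theme, without retyping the data and mapping.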

Three Ways

ggplot2 is very flexible, and the same chart can be written in several ways. We will try three different ways that produce the same chart, and use summary() to see how the resulting plot objects differ:

library(ggplot2)
# Way 1
diamonds_c1 <- 
  ggplot(data = diamonds,
         mapping = aes(x = carat, y = price, colour = clarity)) +
  geom_point()

summary(diamonds_c1)

## data: carat, cut, color, clarity, depth, table, price, x, y, z
##   [53940x10]
## mapping:  x = ~carat, y = ~price, colour = ~clarity
## faceting: <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg>
## -----------------------------------
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

# Way 2
diamonds_c2 <-
  ggplot(data = diamonds) +
  geom_point(mapping = aes(x = carat, y = price, colour = clarity))

summary(diamonds_c2)

## data: carat, cut, color, clarity, depth, table, price, x, y, z
##   [53940x10]
## faceting: <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg>
## -----------------------------------
## mapping: x = ~carat, y = ~price, colour = ~clarity 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

# Way 3
diamonds_c3 <-
  ggplot() +
  geom_point(
    data = diamonds, mapping = aes(x = carat, y = price, colour = clarity)
  )

summary(diamonds_c3)

## data: [x]
## faceting: <ggproto object: Class FacetNull, Facet, gg>
##     compute_layout: function
##     draw_back: function
##     draw_front: function
##     draw_labels: function
##     draw_panels: function
##     finish_data: function
##     init_scales: function
##     map_data: function
##     params: list
##     setup_data: function
##     setup_params: function
##     shrink: TRUE
##     train_scales: function
##     vars: function
##     super:  <ggproto object: Class FacetNull, Facet, gg>
## -----------------------------------
## mapping: x = ~carat, y = ~price, colour = ~clarity 
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
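summary() shows where the data and the mapping are stored, but not whether the three plots really draw the same points. layer_data() builds a plot and returns the data ggplot2 computed for a layer, so we can compare the x, y and colour columns of the three approaches. This sketch assumes those three columns are a good enough proxy for "the same chart". It uses the built-in mtcars data only to keep the example small.

```r
library(ggplot2)

# The same scatter plot written in the three ways shown above
p1 <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()
p2 <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg, colour = factor(cyl)))
p3 <- ggplot() +
  geom_point(data = mtcars, mapping = aes(x = wt, y = mpg, colour = factor(cyl)))

# layer_data() returns the computed data of the first layer
cols <- c("x", "y", "colour")
same_12 <- isTRUE(all.equal(layer_data(p1)[cols], layer_data(p2)[cols]))
same_13 <- isTRUE(all.equal(layer_data(p1)[cols], layer_data(p3)[cols]))
c(same_12, same_13)
```

The difference between the three ways matters once a chart has several layers. Data and mapping given in ggplot() are inherited by every layer (unless a layer sets inherit.aes = FALSE). Data and mapping given inside a single geom_*() apply only to that layer.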

8 Chart Components

Beyond the three basic components used so far (data, mapping, and geometry), the Grammar of Graphics has five more main components, giving eight in total. Each plays an important role in building a chart: Data, Mapping, Statistics, Scales, Geometries, Facets, Coordinates, and Theme.
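To make the eight components concrete, here is a sketch of a single chart in which each component is written out explicitly and labelled in a comment. It uses the built-in mtcars data. The specific choices (the Dark2 palette, faceting by am, theme_minimal()) are only illustrative, and when a component is omitted ggplot2 fills in a default.

```r
library(ggplot2)

p <- ggplot(
    data = mtcars,                                  # 1. Data
    mapping = aes(x = wt, y = mpg,                  # 2. Mapping (aesthetics)
                  colour = factor(cyl))
  ) +
  geom_point(stat = "identity") +                   # 5. Geometries: draw points
                                                    # 3. Statistics: "identity" plots values unchanged
  scale_colour_brewer(palette = "Dark2",            # 4. Scales: how cyl maps to colours
                      name = "Cylinders") +
  facet_wrap(~ am) +                                # 6. Facets: one panel per transmission type
  coord_cartesian() +                               # 7. Coordinates: the default Cartesian system
  theme_minimal()                                   # 8. Theme: non-data appearance
p
```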

But before diving deeper into visualization with ggplot2, there is a basic concept we need to understand: data transformation, for example with the dplyr package. A solid grasp of data transformation makes building visualizations much easier for a data analyst.

Data Transformation

Data transformation is usually a chain of more than one step. For that reason, dplyr code often uses the pipe operator (%>%) to pass the result of one function on to the next. Many packages can be used for data transformation, depending on the task; dplyr is just one of them. This transformation step is essential for preparing data for more complex visualizations.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

glimpse(storms)

## Rows: 10,010
## Columns: 13
## $ name        <chr> "Amy", "Amy", "Amy", "Amy", "Amy", "Amy", "Amy", "Amy",...
## $ year        <dbl> 1975, 1975, 1975, 1975, 1975, 1975, 1975, 1975, 1975, 1...
## $ month       <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7...
## $ day         <int> 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29, 29, 30, 30,...
## $ hour        <dbl> 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 18,...
## $ lat         <dbl> 27.5, 28.5, 29.5, 30.5, 31.5, 32.4, 33.3, 34.0, 34.4, 3...
## $ long        <dbl> -79.0, -79.0, -79.0, -79.0, -78.8, -78.7, -78.0, -77.0,...
## $ status      <chr> "tropical depression", "tropical depression", "tropical...
## $ category    <ord> -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ wind        <int> 25, 25, 25, 25, 25, 25, 25, 30, 35, 40, 45, 50, 50, 55,...
## $ pressure    <int> 1013, 1013, 1013, 1013, 1012, 1012, 1011, 1006, 1004, 1...
## $ ts_diameter <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ hu_diameter <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...

# Without %>%

storms1 <- select(storms, year, month, wind, pressure)
storms2 <- filter(storms1, between(year, 2000, 2015))
storms3 <- mutate(storms2, month = factor(month.name[storms2$month], levels = month.name))
storms4 <- group_by(storms3, month)
storms_nopipe <- summarise(storms4, avg_wind = mean(wind), avg_pressure = mean(pressure))
glimpse(storms_nopipe)

## Rows: 10
## Columns: 3
## $ month        <fct> January, April, May, June, July, August, September, Oc...
## $ avg_wind     <dbl> 45.65217, 44.61538, 36.76471, 39.03030, 48.21981, 51.9...
## $ avg_pressure <dbl> 999.4348, 996.9231, 1003.4510, 999.5333, 999.1300, 994...

# With %>%

storms_pipe <-
    storms %>%
    select(year, month, wind, pressure) %>%
    filter(between(year, 2000, 2015)) %>%
    mutate(month = factor(month.name[month], levels = month.name)) %>%
    group_by(month) %>%
    summarise(
        avg_wind = mean(wind),
        avg_pressure = mean(pressure)
    )
glimpse(storms_pipe)

## Rows: 10
## Columns: 3
## $ month        <fct> January, April, May, June, July, August, September, Oc...
## $ avg_wind     <dbl> 45.65217, 44.61538, 36.76471, 39.03030, 48.21981, 51.9...
## $ avg_pressure <dbl> 999.4348, 996.9231, 1003.4510, 999.5333, 999.1300, 994...

# Compare the results without and with the pipe
identical(storms_nopipe, storms_pipe)

## [1] TRUE
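Besides intermediate objects and %>%, there is a third style: nesting the calls, which has to be read from the inside out. R 4.1.0 and later also ship a native pipe, |>, which works with these dplyr calls in the same way. The sketch below assumes R >= 4.1.0 and checks that all the styles return the same table. The numbers themselves may differ from the output above in newer dplyr versions, because the bundled storms dataset has been updated over time.

```r
library(dplyr)

# Style 1: magrittr pipe (re-exported by dplyr)
with_pipe <- storms %>%
  select(year, month, wind, pressure) %>%
  filter(between(year, 2000, 2015)) %>%
  mutate(month = factor(month.name[month], levels = month.name)) %>%
  group_by(month) %>%
  summarise(avg_wind = mean(wind), avg_pressure = mean(pressure))

# Style 2: nested calls, read from the innermost call outwards
nested <- summarise(
  group_by(
    mutate(
      filter(select(storms, year, month, wind, pressure), between(year, 2000, 2015)),
      month = factor(month.name[month], levels = month.name)
    ),
    month
  ),
  avg_wind = mean(wind), avg_pressure = mean(pressure)
)

# Style 3: native pipe (R >= 4.1.0)
native <- storms |>
  select(year, month, wind, pressure) |>
  filter(between(year, 2000, 2015)) |>
  mutate(month = factor(month.name[month], levels = month.name)) |>
  group_by(month) |>
  summarise(avg_wind = mean(wind), avg_pressure = mean(pressure))

identical(with_pipe, nested)
identical(with_pipe, native)
```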

Below is a simplified version of the code we ran earlier. Instead of saving the result to an object, it prints the resulting table directly:

storms %>%
  select(year, month, wind, pressure) %>%
  filter(between(year, 2000, 2015)) %>%
  mutate(month = factor(month.name[month], levels = month.name)) %>%
  group_by(month) %>%
  summarise(
    avg_wind = mean(wind),
    avg_pressure = mean(pressure)
  )

## # A tibble: 10 x 3
##    month     avg_wind avg_pressure
##    <fct>        <dbl>        <dbl>
##  1 January       45.7         999.
##  2 April         44.6         997.
##  3 May           36.8        1003.
##  4 June          39.0         999.
##  5 July          48.2         999.
##  6 August        52.0         994.
##  7 September     58.3         988.
##  8 October       55.7         990.
##  9 November      56.5         990.
## 10 December      46.8         997.
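This is where transformation and visualization meet: the summarised tibble can be piped straight into ggplot(). A minimal sketch follows; the choice of geom_col() and the axis labels is only illustrative, and storms records wind speed in knots.

```r
library(dplyr)
library(ggplot2)

storms_plot <- storms %>%
  filter(between(year, 2000, 2015)) %>%
  mutate(month = factor(month.name[month], levels = month.name)) %>%
  group_by(month) %>%
  summarise(avg_wind = mean(wind)) %>%
  # The summarised table becomes the plot's data
  ggplot(aes(x = month, y = avg_wind)) +
  geom_col() +
  labs(x = NULL, y = "Average wind speed (knots)")
storms_plot
```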

Trying Out Visualization

This time we will use the INDO-DAPOER dataset provided by DQLab. Reading it requires the readr package, because we need its read_tsv() function to read the compressed tsv.gz file.

library(readr)

indodapoer <- read_tsv("https://dqlab-dataset.s3-ap-southeast-1.amazonaws.com/indodapoer.tsv.gz")

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   area_name = col_character(),
##   `Import: Commodities and transaction not elsewhere classified (province Level, in USD)` = col_logical(),
##   `Length of National Road: Dirt (in km) (BPS Data, Province only)` = col_logical(),
##   `Length of National Road: Other (in km) (BPS Data, Province only)` = col_logical(),
##   `Total Natural Resources Revenue Sharing from Geothermal  Energy (in IDR, realization value)` = col_logical(),
##   `Total Revenue Sharing` = col_logical(),
##   `Total Specific Allocation Grant for Village (in IDR Billion)` = col_logical()
## )

## See spec(...) for full column specifications.

## Warning: 232 parsing failures.
##  row                                                                                   col           expected  actual                                                                      file
## 1008 Import: Commodities and transaction not elsewhere classified (province Level, in USD) 1/0/T/F/TRUE/FALSE 554693  'https://dqlab-dataset.s3-ap-southeast-1.amazonaws.com/indodapoer.tsv.gz'
## 1009 Import: Commodities and transaction not elsewhere classified (province Level, in USD) 1/0/T/F/TRUE/FALSE 1291450 'https://dqlab-dataset.s3-ap-southeast-1.amazonaws.com/indodapoer.tsv.gz'
## 1010 Import: Commodities and transaction not elsewhere classified (province Level, in USD) 1/0/T/F/TRUE/FALSE 365356  'https://dqlab-dataset.s3-ap-southeast-1.amazonaws.com/indodapoer.tsv.gz'
## 1011 Import: Commodities and transaction not elsewhere classified (province Level, in USD) 1/0/T/F/TRUE/FALSE 216478  'https://dqlab-dataset.s3-ap-southeast-1.amazonaws.com/indodapoer.tsv.gz'
## 1012 Import: Commodities and transaction not elsewhere classified (province Level, in USD) 1/0/T/F/TRUE/FALSE 646310  'https://dqlab-dataset.s3-ap-southeast-1.amazonaws.com/indodapoer.tsv.gz'
## .... ..................................................................................... .................. ....... .........................................................................
## See problems(...) for more details.
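The 232 parsing failures come from readr's type guessing. readr looks at a limited number of rows (by default at most 1,000), and several columns that are empty in those rows were guessed as logical. The numbers further down, for example 554693 in row 1008, could not be parsed as logical and became NA. One fix is to declare the column types explicitly with col_types. This assumes that every column other than area_name is numeric, as the column specification above suggests. The sketch demonstrates the idea on a small temporary file rather than the real dataset; the column name value is hypothetical.

```r
library(readr)

# A small hypothetical file: the first 1,000 values are missing,
# the last 500 are numbers (mimicking the sparse INDO-DAPOER columns)
path <- tempfile(fileext = ".tsv")
values <- c(rep("NA", 1000), 1:500)
writeLines(c("area_name\tvalue",
             paste0("area_", seq_along(values), "\t", values)),
           path)

# Declare the types instead of letting readr guess them
df <- read_tsv(
  path,
  col_types = cols(.default = col_double(), area_name = col_character())
)
sum(!is.na(df$value))   # all 500 numbers are kept

# The same idea applied to the article's data (not run here):
# indodapoer <- read_tsv(
#   "https://dqlab-dataset.s3-ap-southeast-1.amazonaws.com/indodapoer.tsv.gz",
#   col_types = cols(.default = col_double(), area_name = col_character())
# )
```

Alternatively, a larger guess_max argument lets readr inspect more rows before deciding on a type. Explicit col_types is more predictable, though.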

nrow(indodapoer)

## [1] 22468

ncol(indodapoer)

## [1] 222

The data has 22,468 rows and 222 columns. Many of the column names do not follow the rules for "syntactically valid names": they contain spaces, parentheses, and other special characters. The janitor package makes cleaning them up easy.

#install.packages("janitor", repos = "http://cran.us.r-project.org")
library(janitor)

## Warning: package 'janitor' was built under R version 3.6.3

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

head(colnames(indodapoer), 15)

##  [1] "area_name"                                                                                                  
##  [2] "year"                                                                                                       
##  [3] "Agriculture function expenditure (in IDR)"                                                                  
##  [4] "Average National Exam Score: Junior Secondary Level (out of 100, available only in district level for 2009)"
##  [5] "Average National Exam Score: Primary Level (out of 100, available only in district level for 2009)"         
##  [6] "Average National Exam Score: Senior Secondary Level (out of 100, available only in district level for 2009)"
##  [7] "Birth attended by Skilled Health worker (in % of total birth)"                                              
##  [8] "BPK Audit Report on Sub-National Budget"                                                                    
##  [9] "Capital expenditure (in IDR)"                                                                               
## [10] "Consumer Price Index in 42 cities base 1996"                                                                
## [11] "Consumer Price Index in 45 cities base 2002"                                                                
## [12] "Consumer Price Index in 66 cities base 2007"                                                                
## [13] "Economy function expenditure (in IDR)"                                                                      
## [14] "Education function expenditure (in IDR)"                                                                    
## [15] "Environment function expenditure (in IDR)"

indodapoer <- clean_names(indodapoer)
head(colnames(indodapoer), 15)

##  [1] "area_name"                                                                                              
##  [2] "year"                                                                                                   
##  [3] "agriculture_function_expenditure_in_idr"                                                                
##  [4] "average_national_exam_score_junior_secondary_level_out_of_100_available_only_in_district_level_for_2009"
##  [5] "average_national_exam_score_primary_level_out_of_100_available_only_in_district_level_for_2009"         
##  [6] "average_national_exam_score_senior_secondary_level_out_of_100_available_only_in_district_level_for_2009"
##  [7] "birth_attended_by_skilled_health_worker_in_percent_of_total_birth"                                      
##  [8] "bpk_audit_report_on_sub_national_budget"                                                                
##  [9] "capital_expenditure_in_idr"                                                                             
## [10] "consumer_price_index_in_42_cities_base_1996"                                                            
## [11] "consumer_price_index_in_45_cities_base_2002"                                                            
## [12] "consumer_price_index_in_66_cities_base_2007"                                                            
## [13] "economy_function_expenditure_in_idr"                                                                    
## [14] "education_function_expenditure_in_idr"                                                                  
## [15] "environment_function_expenditure_in_idr"

With that, the "syntactically valid names" problem is solved: all column names are now lowercase snake_case.
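To see exactly what clean_names() changes, here is a small sketch on a two-column data frame that reuses two of the original column names shown above. For comparison, base R's make.names() also produces syntactically valid names, but it keeps the capitals and replaces every invalid character with a dot.

```r
library(janitor)

df <- data.frame(
  `Capital expenditure (in IDR)` = 1,
  `Birth attended by Skilled Health worker (in % of total birth)` = 2,
  check.names = FALSE   # keep the original, non-syntactic names
)

make.names(names(df))      # base R: valid, but full of dots
names(clean_names(df))     # janitor: lowercase snake_case, "%" becomes "percent"
```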

Gross Regional Domestic Product (PDRB)

Next, we look at how non-oil-and-gas Gross Regional Domestic Product (PDRB Non-Migas) has developed in the provinces on the island of Java. This information is stored in the column total_gdp_excluding_oil_and_gas_in_idr_million_constant_price. Before building the visualization, extract the data into pdrb_pjawa:

library(stringr)

## Warning: package 'stringr' was built under R version 3.6.2

library(dplyr)
pdrb_pjawa <-
  indodapoer %>%
  filter(
    area_name %in% c(
      "Banten, Prop.",
      "DKI Jakarta, Prop.",
      "Jawa Barat, Prop.",
      "Jawa Tengah, Prop.",
      "DI Yogyakarta, Prop.",
      "Jawa Timur, Prop."
    )
  ) %>%
  transmute(
    provinsi = str_remove(area_name, ", Prop."),
    tahun = year,
    pdrb_nonmigas = total_gdp_excluding_oil_and_gas_in_idr_million_constant_price
  ) %>%
  filter(!is.na(pdrb_nonmigas))

glimpse(pdrb_pjawa)

## Rows: 164
## Columns: 3
## $ provinsi      <chr> "Banten", "Banten", "Banten", "Banten", "Banten", "Ba...
## $ tahun         <dbl> 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,...
## $ pdrb_nonmigas <dbl> 45690559, 47495383, 49449321, 51957458, 54880407, 581...
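One detail in the code above: str_remove() (and str_detect(), used later) treats the pattern as a regular expression, in which "." matches any character. It happens to work here, but wrapping the pattern in fixed() makes the match literal and states the intent more precisely. The area names below are only illustrative:

```r
library(stringr)

areas <- c("Banten, Prop.", "Jawa Barat, Prop.", "Bandung, Kota")

# Literal match: only the exact text ", Prop." is removed
str_remove(areas, fixed(", Prop."))

# Literal match when filtering out province rows
areas[str_detect(areas, fixed(", Prop."), negate = TRUE)]
```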

Non-Oil-and-Gas PDRB Chart

Using pdrb_pjawa, we create a trend chart of non-oil-and-gas PDRB with the following code:

library(dplyr)
library(ggplot2)
library(forcats)

pdrb_pjawa %>%
  mutate(
    provinsi = fct_reorder2(provinsi, tahun, pdrb_nonmigas)
  ) %>%
  ggplot(aes(tahun, pdrb_nonmigas, colour = provinsi)) +
  geom_line()

fct_reorder2() orders the legend by each province's final value. Even so, the reader still has to match every colour in the legend with its line on the chart. Imagine a chart with many more provinces: matching the names in the legend to the lines would quickly become difficult.
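A tiny example with made-up values (not INDO-DAPOER data) shows how fct_reorder2() decides the order. By default it takes each group's y value at the largest x (via last2()) and sorts the levels in descending order:

```r
library(forcats)

# Hypothetical values for three groups over two years
prov  <- c("A", "A", "B", "B", "C", "C")
year  <- c(2000, 2001, 2000, 2001, 2000, 2001)
value <- c(1, 5, 3, 2, 2, 9)

# In 2001 (the largest x): C = 9, A = 5, B = 2, so C comes first
levels(fct_reorder2(prov, year, value))
```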

Direct Labeling

One solution to the problem in the previous chart is direct labeling. This is the recommended approach, because one principle of chart design is "wherever possible, design charts that do not need a legend". The geom_dl() function from the directlabels package adds direct labels to a ggplot2 chart. The aesthetic mapping geom_dl() requires is label.

library(ggplot2)
library(dplyr)
library(directlabels)

## Warning: package 'directlabels' was built under R version 3.6.3

pdrb_pjawa %>% 
  ggplot(aes(tahun, pdrb_nonmigas)) +
  geom_line(aes(colour = provinsi), show.legend = FALSE) +
  geom_dl(
    aes(label = provinsi), 
    method = "last.points",
    position = position_nudge(x = 0.3) # shift the text right so it does not overlap the lines
  )
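If you prefer not to depend on directlabels, a similar effect can be sketched with plain ggplot2. Compute the last point of each line with dplyr, then draw the names there with geom_text(). This is an alternative technique, not what geom_dl() does internally. The example uses ggplot2's built-in economics_long data so that it runs on its own; the nudge, hjust, and margin values are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# The last observation of each series, used as the label position
last_points <- economics_long %>%
  group_by(variable) %>%
  filter(date == max(date)) %>%
  ungroup()

labelled_plot <- ggplot(economics_long, aes(date, value01, colour = variable)) +
  geom_line(show.legend = FALSE) +
  geom_text(
    data = last_points,
    aes(label = variable),
    hjust = 0,            # left-align the text at the end of each line
    nudge_x = 200,        # shift right; x is a Date, so the unit is days
    show.legend = FALSE
  ) +
  coord_cartesian(clip = "off") +                    # let labels extend past the panel
  theme(plot.margin = margin(5.5, 60, 5.5, 5.5))     # extra room on the right
labelled_plot
```

Unlike this simple version, directlabels provides methods that try to keep labels from colliding when lines end close together. The manual approach leaves that to you.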

Finalizing the Chart

Next, we finalize the chart with the following code:

library(ggplot2)
library(dplyr)
library(directlabels)
library(hrbrthemes)

## Warning: package 'hrbrthemes' was built under R version 3.6.3

## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.

##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and

##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow

pdrb_pjawa %>% 
  ggplot(aes(tahun, pdrb_nonmigas/1e6)) +
  geom_line(aes(colour = provinsi), show.legend = FALSE) +
  geom_dl(
    aes(label = provinsi), 
    method = "last.points",
    position = position_nudge(x = 0.3) # shift the text right so it does not overlap the lines
  ) +
  labs(
    x = NULL,
    y = NULL,
    title = "PDRB Non-Migas di Pulau Jawa Hingga Tahun 2011",
    subtitle = "PDRB atas dasar harga konstan, dalam satuan triliun",
    caption = "Data: INDO-DAPOER, The World Bank"
  ) +
  coord_cartesian(clip =  "off") +
  theme_ipsum(grid = "Y", ticks = TRUE)

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

If you see the message "font family not found in Windows font database", don't worry. It simply means the font requested by the theme (Arial Narrow or Roboto Condensed, as the hrbrthemes note above says) is not installed on your machine, so R falls back to its default font.
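A side note on the y-axis of that chart: pdrb_nonmigas is in IDR million, so dividing it by 1e6 inside aes() turns it into trillions. An alternative is to keep the original values and rescale only the axis labels with scales::label_number(). A sketch follows; the accuracy of 0.1 is an arbitrary choice.

```r
library(scales)

# A labelling function: multiply by 1e-6 (million -> trillion), round to 0.1
to_trillion <- label_number(scale = 1e-6, accuracy = 0.1)

to_trillion(45690559)   # the first Banten value in pdrb_pjawa

# In the chart, instead of pdrb_nonmigas/1e6 inside aes():
# scale_y_continuous(labels = label_number(scale = 1e-6, accuracy = 0.1))
```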

Regencies and Cities in Indonesia

This part involves quite a lot of data transformation before we arrive at an attractive visualization. The task is to examine the condition of road infrastructure in all regencies (kabupaten) and cities (kota) in Indonesia, using the 2008 data.

library(dplyr)
library(stringr)
jalan_kabkota <- 
  indodapoer %>% 
  filter(str_detect(area_name, ", Prop.", negate = TRUE)) %>% 
  filter(year == 2008) %>%
  transmute(
    kabkota = area_name,
    jalan_rusak_parah = length_of_district_road_bad_damage_in_km_bina_marga_data,
    jalan_rusak_ringan = length_of_district_road_light_damage_in_km_bina_marga_data,
    jalan_cukup_baik = length_of_district_road_fair_in_km_bina_marga_data,
    jalan_sangat_baik = length_of_district_road_good_in_km_bina_marga_data
  )
glimpse(jalan_kabkota)

## Rows: 514
## Columns: 5
## $ kabkota            <chr> "Aceh Barat, Kab.", "Aceh Barat Daya, Kab.", "Ac...
## $ jalan_rusak_parah  <dbl> 64, 1, 97, 112, 21, NA, 130, 8, 168, 76, 25, NA,...
## $ jalan_rusak_ringan <dbl> 191, 15, 101, 321, 36, 553, 183, 207, 174, 35, 7...
## $ jalan_cukup_baik   <dbl> 218, 81, 270, 416, 59, 170, 146, 284, 201, 74, 3...
## $ jalan_sangat_baik  <dbl> 153, 87, 105, 284, 89, 25, 177, 432, 177, 221, 3...

Pivot

Next, pivot jalan_kabkota into a longer data frame with three columns: kabkota, kondisi (road condition), and panjang_jalan (road length).

library(tidyr)
library(dplyr)

glimpse(jalan_kabkota)

## Rows: 514
## Columns: 5
## $ kabkota            <chr> "Aceh Barat, Kab.", "Aceh Barat Daya, Kab.", "Ac...
## $ jalan_rusak_parah  <dbl> 64, 1, 97, 112, 21, NA, 130, 8, 168, 76, 25, NA,...
## $ jalan_rusak_ringan <dbl> 191, 15, 101, 321, 36, 553, 183, 207, 174, 35, 7...
## $ jalan_cukup_baik   <dbl> 218, 81, 270, 416, 59, 170, 146, 284, 201, 74, 3...
## $ jalan_sangat_baik  <dbl> 153, 87, 105, 284, 89, 25, 177, 432, 177, 221, 3...

jalan_kabkota <- 
  jalan_kabkota %>% 
  pivot_longer(
    cols = starts_with("jalan_"),
    names_to = "kondisi",
    names_prefix = "jalan_",
    values_to = "panjang_jalan"
  )
glimpse(jalan_kabkota)

## Rows: 2,056
## Columns: 3
## $ kabkota       <chr> "Aceh Barat, Kab.", "Aceh Barat, Kab.", "Aceh Barat, ...
## $ kondisi       <chr> "rusak_parah", "rusak_ringan", "cukup_baik", "sangat_...
## $ panjang_jalan <dbl> 64, 191, 218, 153, 1, 15, 81, 87, 97, 101, 270, 105, ...
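Two notes on pivot_longer(). First, the 2,056 rows are exactly 514 × 4. Rows with an NA road length are kept; values_drop_na = TRUE would drop them. Second, pivot_wider() reverses the operation. The sketch below checks the round trip on a tiny hypothetical table with made-up values.

```r
library(tidyr)
library(tibble)

# Hypothetical two-regency table in the same wide shape as jalan_kabkota
wide <- tibble(
  kabkota = c("Kab A", "Kab B"),
  jalan_rusak_parah = c(10, NA),
  jalan_rusak_ringan = c(20, 5),
  jalan_cukup_baik = c(30, 6),
  jalan_sangat_baik = c(40, 7)
)

long <- pivot_longer(
  wide,
  cols = starts_with("jalan_"),
  names_to = "kondisi",
  names_prefix = "jalan_",   # strip the prefix from the new kondisi values
  values_to = "panjang_jalan"
)
nrow(long)   # 2 regencies x 4 conditions = 8 rows, NA included

# The same pivot, dropping the row whose value is NA
long_no_na <- pivot_longer(wide, cols = starts_with("jalan_"), names_to = "kondisi",
                           names_prefix = "jalan_", values_to = "panjang_jalan",
                           values_drop_na = TRUE)
nrow(long_no_na)   # 7 rows

# Back to the wide shape
back <- pivot_wider(long, names_from = kondisi, values_from = panjang_jalan,
                    names_prefix = "jalan_")
```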

The next step is to label each area as either a regency (Kabupaten) or a city (Kota), and to turn kondisi into an ordered factor with readable labels:

library(dplyr)
library(stringr)
jalan_kabkota <-
  jalan_kabkota %>%
  mutate(
    status = case_when(
      str_detect(kabkota, ", Kab") ~ "Kabupaten",
      str_detect(kabkota, ", Kota") ~ "Kota",
      str_detect(kabkota, "City") ~ "Kota",
      TRUE ~ NA_character_
    ),
    kondisi = factor(
      kondisi,
      levels = c("rusak_parah", "rusak_ringan", "cukup_baik", "sangat_baik"),
      labels = c("Rusak parah", "Rusak ringan", "Cukup baik", "Sangat baik")
    )
  )
glimpse(jalan_kabkota)

## Rows: 2,056
## Columns: 4
## $ kabkota       <chr> "Aceh Barat, Kab.", "Aceh Barat, Kab.", "Aceh Barat, ...
## $ kondisi       <fct> Rusak parah, Rusak ringan, Cukup baik, Sangat baik, R...
## $ panjang_jalan <dbl> 64, 191, 218, 153, 1, 15, 81, 87, 97, 101, 270, 105, ...
## $ status        <chr> "Kabupaten", "Kabupaten", "Kabupaten", "Kabupaten", "...
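case_when() checks its conditions from top to bottom and uses the first one that is TRUE. The final TRUE ~ NA_character_ catches every name that matches none of the patterns. A sketch with illustrative area names; the last one is hypothetical and deliberately matches nothing:

```r
library(dplyr)
library(stringr)

kabkota <- c("Aceh Barat, Kab.", "Bandung, Kota", "Batam City", "Unknown Area")

status <- case_when(
  str_detect(kabkota, ", Kab") ~ "Kabupaten",
  str_detect(kabkota, ", Kota") ~ "Kota",
  str_detect(kabkota, "City") ~ "Kota",
  TRUE ~ NA_character_          # fallback for names that match no rule
)
status
```

On the real data, counting the status column (for example with count()) is a quick way to see how many areas fell through to NA.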

Road Condition Chart

Now it is time to create a chart showing the distribution of road lengths in regencies and cities, by road condition.

#install.packages("ggridges",repos = "http://cran.us.r-project.org")
library(ggplot2)
library(dplyr)
library(ggridges)

## Warning: package 'ggridges' was built under R version 3.6.3

jalan_kabkota_plot <- 
  jalan_kabkota %>% 
  ggplot(aes(panjang_jalan, kondisi)) +
  facet_wrap(~status) +
  geom_density_ridges_gradient(
    aes(fill = after_stat(x)), 
    show.legend = FALSE
  )
jalan_kabkota_plot

## Picking joint bandwidth of 41

## Picking joint bandwidth of 28.2

### Logarithmic Transformation

We can already compare the distribution of regency/city road lengths by condition quite easily. However, a few things in the chart still need improving. Below, the x-axis is log-transformed and a dashed reference line is drawn at 100 km:

#install.packages("ggridges",repos = "http://cran.us.r-project.org")
library(ggplot2)
library(dplyr)
library(ggridges)

jalan_kabkota_plot <-
  jalan_kabkota %>%
  ggplot(aes(panjang_jalan, kondisi)) +
  facet_wrap(~status) +
  geom_density_ridges_gradient(
    aes(fill = after_stat(x)),
    show.legend = FALSE
  )
jalan_kabkota_plot +
  geom_vline(xintercept = 100, linetype = "dashed", colour = "darkslategray4") +
  scale_x_continuous(trans = "log10")

## Picking joint bandwidth of 0.128

## Picking joint bandwidth of 0.17
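Two remarks on the log scale. First, scale_x_log10() is a shorthand for scale_x_continuous(trans = "log10"). In ggplot2 3.5.0 and later the trans argument is renamed transform; the old name still works but is deprecated. Second, log10(0) is -Inf, so a road length of exactly 0 km has no position on this axis. ggplot2 will typically drop such values with a warning. A small sketch with hypothetical lengths:

```r
library(ggplot2)

lengths_km <- c(0, 1, 10, 100, 1000)   # hypothetical road lengths

log10(lengths_km)   # -Inf 0 1 2 3: the 0 km value has no place on a log axis

# Equivalent ways to request a log10 x-axis
s1 <- scale_x_log10()
s2 <- scale_x_continuous(trans = "log10")   # ggplot2 >= 3.5.0 prefers transform = "log10"
```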

### Finalizing the Chart

#install.packages("ggridges",repos = "http://cran.us.r-project.org")
library(ggplot2)
library(dplyr)
library(ggridges)
library(hrbrthemes)

jalan_kabkota_plot <-
  jalan_kabkota %>%
  ggplot(aes(panjang_jalan, kondisi)) +
  facet_wrap(~status) +
  geom_density_ridges_gradient(
    aes(fill = after_stat(x)),
    show.legend = FALSE
  )
jalan_kabkota_plot +
  geom_vline(xintercept = 100, linetype = "dashed", colour = "darkslategray4") +
  scale_x_continuous(trans = "log10") +
  scale_fill_viridis_c(option = "magma") +
  labs(
    x = "Panjang jalan (Km)",
    y = NULL,
    title = "Jalan Kabupaten/Kota Berdasarkan Kondisi",
    subtitle = "Berdasarkan data tahun 2008, garis vertikal menunjukan panjang jalan 100 Km",
    caption = "Data: INDO-DAPOER, The World Bank"
  ) +
  theme_ipsum(grid = FALSE, ticks = TRUE)

## Picking joint bandwidth of 0.128

## Picking joint bandwidth of 0.17

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

Again, there is no need to worry about the message "font family not found in Windows font database". It only means the font used by the theme is not installed on your computer.

Health Facilities in Kalimantan

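Now we will create a distinctive kind of chart called a waffle chart. First, prepare the data: the number of hospitals, village polyclinics (polindes), and community health centres (puskesmas) in each province of Kalimantan in 2011.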

library(dplyr)
library(ggplot2)
library(tidyr)
library(stringr)
library(forcats)
faskes_kalimantan <-
  indodapoer %>%
  filter(str_detect(area_name, "Kalimantan")) %>%
  filter(year == 2011) %>%
  transmute(
    provinsi = str_remove(area_name, ", Prop."),
    rumahsakit = number_of_hospitals,
    polindes = number_of_polindes_poliklinik_desa_village_polyclinic,
    puskesmas = number_of_puskesmas_and_its_line_services
  ) %>%
  pivot_longer(
    cols = -provinsi,
    names_to = "faskes",
    values_to = "jumlah"
  ) %>%
  filter(!is.na(jumlah)) %>%
  mutate(
    provinsi = fct_reorder(provinsi, jumlah, sum),
    jumlah = ceiling(jumlah / 10)
  )
glimpse(faskes_kalimantan)

## Rows: 12
## Columns: 3
## $ provinsi <fct> Kalimantan Barat, Kalimantan Barat, Kalimantan Barat, Kali...
## $ faskes   <chr> "rumahsakit", "polindes", "puskesmas", "rumahsakit", "poli...
## $ jumlah   <dbl> 4, 53, 98, 5, 11, 96, 3, 41, 75, 2, 22, 109
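The mutate() step above divides each count by 10 and rounds up with ceiling(). One waffle square therefore stands for up to 10 facilities, and any non-zero count gets at least one square. A quick check of that arithmetic on hypothetical counts:

```r
counts <- c(1, 10, 11, 975)   # hypothetical facility counts
squares <- ceiling(counts / 10)
squares                       # 1 1 2 98
```

Rounding up is a deliberate choice here. With round(), small counts such as 1 to 4 would become zero squares and disappear from the chart.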

Waffle Charts

Waffle charts can be created in ggplot2 with the help of the waffle package. Its main function, geom_waffle(), has two required aesthetic mappings: fill and values.

#install.packages("waffle", repos = "https://cinc.rud.is")
library(waffle)
library(ggplot2)
library(dplyr)

faskes_kalimantan_plot <-
  faskes_kalimantan %>%
  ggplot(aes(fill = faskes, values = jumlah)) +
  facet_wrap(~provinsi) +
  geom_waffle(colour = "white")
faskes_kalimantan_plot

### Setting Colours and Labels

We can set the fill colours with scale_fill_manual() and, in the same function, change the legend label text through its labels argument.

#install.packages("waffle", repos = "https://cinc.rud.is")
library(waffle)
library(ggplot2)
library(dplyr)
faskes_kalimantan_plot <-
  faskes_kalimantan %>%
  ggplot(aes(fill = faskes, values = jumlah)) +
  facet_wrap(~provinsi) +
  geom_waffle(colour = "white")

faskes_kalimantan_plot <-
  faskes_kalimantan_plot +
  scale_fill_manual(
    values = c(
      "polindes" = "seagreen3",
      "puskesmas" = "steelblue",
      "rumahsakit" = "cyan4"
    ),
    labels = c(
      "polindes" = "Poliklinik Desa",
      "puskesmas" = "Puskesmas",
      "rumahsakit" = "Rumah Sakit"
    )
  ) +
  labs(
    fill = NULL,
    title = "Fasilitas Kesehatan di Kalimantan",
    subtitle = "Berdasarkan data tahun 2011, satu petak menyatakan ±10 faskes",
    caption = "Data: INDO-DAPOER, The World Bank"
  )
faskes_kalimantan_plot

### Finalizing the Waffle Chart

#install.packages("waffle", repos = "https://cinc.rud.is")
library(waffle)
library(ggplot2)
library(dplyr)
faskes_kalimantan_plot <-
  faskes_kalimantan %>%
  ggplot(aes(fill = faskes, values = jumlah)) +
  facet_wrap(~provinsi) +
  geom_waffle(colour = "white")

faskes_kalimantan_plot <-
  faskes_kalimantan_plot +
  scale_fill_manual(
    values = c(
      "polindes" = "seagreen3",
      "puskesmas" = "steelblue",
      "rumahsakit" = "cyan4"
    ),
    labels = c(
      "polindes" = "Poliklinik Desa",
      "puskesmas" = "Puskesmas",
      "rumahsakit" = "Rumah Sakit"
    )
  ) +
  labs(
    fill = NULL,
    title = "Fasilitas Kesehatan di Kalimantan",
    subtitle = "Berdasarkan data tahun 2011, satu petak menyatakan ±10 faskes",
    caption = "Data: INDO-DAPOER, The World Bank"
  )
  
faskes_kalimantan_plot +
  coord_equal() +
  theme(
    text = element_text(family = "Arial Narrow"),
    plot.title.position = "plot",
    plot.title = element_text(face = "bold", size = 18, hjust = 0.5),
    plot.subtitle = element_text(face = "plain", size = 12, hjust = 0.5),
    plot.caption = element_text(face = "italic", size = 9),
    legend.position = "bottom",
    panel.background = element_blank(),
    panel.grid = element_blank(),
    strip.background = element_blank(),
    strip.text = element_text(face = "italic", size = 9, hjust = 0),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank()
  )

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

The final result is the chart above.

Closing

That wraps up this walkthrough using data from DQLab. I hope it is useful.