Descriptive statistics does not stop at tables of numbers and simple charts; it extends to richer and more interactive data visualization. In environmental statistics, visualization plays an important role in revealing patterns, trends, and distributions in data on environmental phenomena such as air quality, water pollution, and climate change. This material covers four visualization types that are more advanced than basic charts: the choropleth map, heatmap, violin plot, and ridgeline plot.
What is a Choropleth Map?
A choropleth map is a thematic map that displays quantitative data by shading each region, within its administrative or geographic boundary, along a color gradient. It is especially useful in environmental statistics for showing spatial differences, such as the distribution of air pollution, rainfall, or water quality across regions.
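Before any map is drawn, each region's value has to be turned into a color. The sketch below shows one common way to do that, a classed scheme based on quartile breaks, using base R only. The PM2.5 values and the four-color palette are illustrative assumptions, not data from a real survey.

```r
# Illustrative PM2.5 values for eight regions (assumed, not measured)
pm25 <- c(93, 97, 42, 89, 23, 74, 89, 26)

# Quartile breaks split the regions into four classes of similar size
breaks <- quantile(pm25, probs = seq(0, 1, by = 0.25))
pm25_class <- cut(pm25, breaks = breaks, include.lowest = TRUE)

# Light-to-dark palette: a darker shade means a higher PM2.5 class
shades <- c("#fee5d9", "#fcae91", "#fb6a4a", "#cb181d")
region_color <- shades[as.integer(pm25_class)]

data.frame(pm25, pm25_class, region_color)
```

A continuous gradient, as used with `scale_fill_gradient()`, skips the binning step and interpolates the color directly from the value. Classed schemes are often easier to read when only a few levels matter.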
library(sf)
## Warning: package 'sf' was built under R version 4.4.3
## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.3
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Read the boundary file (here, a GeoJSON of the districts (kecamatan) of Semarang)
peta <- st_read("Kecamatan_Semarang.geojson")
## Reading layer `Kecamatan_Semarang' from data source
## `C:\Users\USER\Downloads\Kecamatan_Semarang.geojson' using driver `GeoJSON'
## Simple feature collection with 32 features and 10 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 110.2673 ymin: -7.114464 xmax: 110.5089 ymax: -6.931992
## Geodetic CRS: WGS 84
peta
## Simple feature collection with 32 features and 10 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 110.2673 ymin: -7.114464 xmax: 110.5089 ymax: -6.931992
## Geodetic CRS: WGS 84
## First 10 features:
## id X.id X.relations admin_level boundary
## 1 relation/8110792 relation/8110792 <NA> 6 administrative
## 2 relation/8132993 relation/8132993 <NA> 6 administrative
## 3 relation/8147917 relation/8147917 <NA> 6 administrative
## 4 relation/8178535 relation/8178535 <NA> 6 administrative
## 5 relation/8201927 relation/8201927 <NA> 6 administrative
## 6 relation/8201936 relation/8201936 <NA> 6 administrative
## 7 relation/8222496 relation/8222496 <NA> 6 administrative
## 8 relation/8244965 relation/8244965 <NA> 6 administrative
## 9 relation/8350713 relation/8350713 <NA> 6 administrative
## 10 relation/8350849 relation/8350849 <NA> 6 administrative
## is_in.city is_in.province name source type
## 1 Semarang Jawa Tengah Gayamsari HOT_InAWARESurvey_2018 boundary
## 2 Semarang Jawa Tengah Semarang Selatan HOT_InAWARESurvey_2018 boundary
## 3 Semarang Jawa Tengah Gajah Mungkur HOT_InAWARESurvey_2018 boundary
## 4 Semarang Jawa Tengah Candisari HOT_InAWARESurvey_2018 boundary
## 5 Semarang Jawa Tengah Semarang Timur HOT_InAWARESurvey_2018 boundary
## 6 Semarang Jawa Tengah Semarang Utara HOT_InAWARESurvey_2018 boundary
## 7 Semarang Jawa Tengah Semarang Tengah HOT_InAWARESurvey_2018 boundary
## 8 Semarang Jawa Tengah Semarang Barat HOT_InAWARESurey_2018 boundary
## 9 Semarang Jawa Tengah Genuk HOT_InAWARESurvey_2018 boundary
## 10 Semarang Jawa Tengah Pedurungan HOT_InAWARESurvey_2018 boundary
## geometry
## 1 POLYGON ((110.4404 -6.95149...
## 2 POLYGON ((110.432 -6.994293...
## 3 POLYGON ((110.3874 -7.01746...
## 4 POLYGON ((110.4161 -7.00192...
## 5 POLYGON ((110.4316 -6.98417...
## 6 POLYGON ((110.4308 -6.96088...
## 7 POLYGON ((110.4026 -6.97739...
## 8 POLYGON ((110.3797 -7.00813...
## 9 POLYGON ((110.5048 -6.96815...
## 10 POLYGON ((110.4553 -6.96618...
# Simulate PM2.5 data for each district
set.seed(100)
data <- data.frame(
  Kecamatan = peta$name, # district name column
  PM25 = sample(20:100, length(peta$name), replace = TRUE)
)
data
## Kecamatan PM25
## 1 Gayamsari 93
## 2 Semarang Selatan 97
## 3 Gajah Mungkur 42
## 4 Candisari 89
## 5 Semarang Timur 23
## 6 Semarang Utara 74
## 7 Semarang Tengah 89
## 8 Semarang Barat 26
## 9 Genuk 26
## 10 Pedurungan 74
## 11 Tugu 62
## 12 Gunung Pati 80
## 13 Tembalang 31
## 14 Mijen 70
## 15 Banyumanik 91
## 16 Ngaliyan 37
## 17 <NA> 44
## 18 <NA> 21
## 19 <NA> 70
## 20 <NA> 87
## 21 <NA> 87
## 22 <NA> 71
## 23 <NA> 67
## 24 <NA> 51
## 25 <NA> 58
## 26 <NA> 35
## 27 <NA> 94
## 28 <NA> 85
## 29 <NA> 89
## 30 <NA> 64
## 31 <NA> 49
## 32 <NA> 49
# Join the data to the map. Half of the 32 features have no name (NA), and NA keys
# match each other in a join, so drop the unnamed rows from the lookup table first.
peta_data <- left_join(peta, filter(data, !is.na(Kecamatan)), by = c("name" = "Kecamatan"))
peta_data
## Simple feature collection with 32 features and 11 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 110.2673 ymin: -7.114464 xmax: 110.5089 ymax: -6.931992
## Geodetic CRS: WGS 84
## First 10 features:
## id X.id X.relations admin_level boundary
## 1 relation/8110792 relation/8110792 <NA> 6 administrative
## 2 relation/8132993 relation/8132993 <NA> 6 administrative
## 3 relation/8147917 relation/8147917 <NA> 6 administrative
## 4 relation/8178535 relation/8178535 <NA> 6 administrative
## 5 relation/8201927 relation/8201927 <NA> 6 administrative
## 6 relation/8201936 relation/8201936 <NA> 6 administrative
## 7 relation/8222496 relation/8222496 <NA> 6 administrative
## 8 relation/8244965 relation/8244965 <NA> 6 administrative
## 9 relation/8350713 relation/8350713 <NA> 6 administrative
## 10 relation/8350849 relation/8350849 <NA> 6 administrative
## is_in.city is_in.province name source type
## 1 Semarang Jawa Tengah Gayamsari HOT_InAWARESurvey_2018 boundary
## 2 Semarang Jawa Tengah Semarang Selatan HOT_InAWARESurvey_2018 boundary
## 3 Semarang Jawa Tengah Gajah Mungkur HOT_InAWARESurvey_2018 boundary
## 4 Semarang Jawa Tengah Candisari HOT_InAWARESurvey_2018 boundary
## 5 Semarang Jawa Tengah Semarang Timur HOT_InAWARESurvey_2018 boundary
## 6 Semarang Jawa Tengah Semarang Utara HOT_InAWARESurvey_2018 boundary
## 7 Semarang Jawa Tengah Semarang Tengah HOT_InAWARESurvey_2018 boundary
## 8 Semarang Jawa Tengah Semarang Barat HOT_InAWARESurey_2018 boundary
## 9 Semarang Jawa Tengah Genuk HOT_InAWARESurvey_2018 boundary
## 10 Semarang Jawa Tengah Pedurungan HOT_InAWARESurvey_2018 boundary
## PM25 geometry
## 1 93 POLYGON ((110.4404 -6.95149...
## 2 97 POLYGON ((110.432 -6.994293...
## 3 42 POLYGON ((110.3874 -7.01746...
## 4 89 POLYGON ((110.4161 -7.00192...
## 5 23 POLYGON ((110.4316 -6.98417...
## 6 74 POLYGON ((110.4308 -6.96088...
## 7 89 POLYGON ((110.4026 -6.97739...
## 8 26 POLYGON ((110.3797 -7.00813...
## 9 26 POLYGON ((110.5048 -6.96815...
## 10 74 POLYGON ((110.4553 -6.96618...
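Because 16 of the 32 features carry no name, a join keyed on `name` needs care: missing keys match each other by default, both in base R's `merge()` and in dplyr joins. The minimal sketch below reproduces the effect with base R on a three-row toy table. The table names and values are hypothetical and chosen only to make the row counts easy to check.

```r
# Toy "map" table: one named region and two unnamed ones (hypothetical)
map_mini <- data.frame(id = 1:3, name = c("Genuk", NA, NA))

# Lookup table keyed by name: each NA in map_mini matches every NA here
values_by_name <- data.frame(name = c("Genuk", NA, NA), PM25 = c(26, 44, 21))
joined_by_name <- merge(map_mini, values_by_name, by = "name", all.x = TRUE)
nrow(joined_by_name)  # 5 rows: 1 named match + 2 x 2 NA matches

# Lookup table keyed by a unique id: one row in, one row out
values_by_id <- data.frame(id = 1:3, PM25 = c(26, 44, 21))
joined_by_id <- merge(map_mini, values_by_id, by = "id", all.x = TRUE)
nrow(joined_by_id)    # 3 rows
```

Joining on a column that is unique and complete, such as the `id` field of the GeoJSON, avoids the problem altogether. Filtering the unnamed rows out of the lookup table is the lighter fix when the key has to stay the name.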
# Plot the choropleth
ggplot(peta_data) +
  geom_sf(aes(fill = PM25), color = "white") +
  # Label only the features that have a name
  geom_sf_text(data = filter(peta_data, !is.na(name)),
               aes(label = name), size = 2, color = "white") +
  scale_fill_gradient(low = "tomato", high = "navy", name = "PM2.5") +
  theme_minimal() +
  labs(title = "Choropleth Map of PM2.5 - Semarang City")
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
Interpretation:
Example insight:
What is a Heatmap?
A heatmap represents data as a grid (rows and columns) in which color stands in for the numbers. In environmental statistics, a heatmap can show temporal or spatial patterns compactly, for example how air pollution varies hour by hour over a week.
Example Data and Visualization (Hourly Pollution over One Week)
library(ggplot2)
library(dplyr)
set.seed(42)
# Days and hours
hari <- c("Senin","Selasa","Rabu","Kamis","Jumat","Sabtu","Minggu")
jam <- 0:23
# Build the data grid (one row per day-hour combination)
data_heat <- expand.grid(Hari = hari, Jam = jam)
# Simulate a plausible weekly pattern, one row at a time
data_heat <- data_heat %>%
  rowwise() %>%
  mutate(PM25 = {
    base <- 30
    # Add pollution during rush hours
    if (Jam %in% 7:9 || Jam %in% 17:19) base <- base + 40
    # Add extra pollution on weekdays
    if (Hari %in% c("Senin","Selasa","Rabu","Kamis","Jumat")) base <- base + 10
    # Slightly lower late at night and in the early morning
    if (Jam %in% 0:5) base <- base - 15
    # Weekends are cleaner
    if (Hari %in% c("Sabtu","Minggu")) base <- base - 10
    # Random variation
    round(rnorm(1, mean = base, sd = 10))
  }) %>%
  ungroup()
data_heat
## # A tibble: 168 × 3
## Hari Jam PM25
## <fct> <int> <dbl>
## 1 Senin 0 39
## 2 Selasa 0 19
## 3 Rabu 0 29
## 4 Kamis 0 31
## 5 Jumat 0 29
## 6 Sabtu 0 4
## 7 Minggu 0 20
## 8 Senin 1 24
## 9 Selasa 1 45
## 10 Rabu 1 24
## # ℹ 158 more rows
# Plot the heatmap
ggplot(data_heat, aes(x = Jam, y = Hari, fill = PM25)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightyellow", high = "darkred") +
  theme_minimal() +
  labs(
    title = "Heatmap of PM2.5 by Hour and Day",
    x = "Hour", y = "Day"
  )
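A heatmap is, underneath, a matrix of values: one row per day and one column per hour. The sketch below rebuilds the noise-free part of the simulated weekly pattern in vectorized base R and reshapes it into that 7 x 24 matrix. It is an illustrative companion to the simulation, not a replacement for it. The object names `pattern` and `pm25_matrix` are introduced here, and the random term is deliberately left out so the cell values can be checked by hand.

```r
hari <- c("Senin", "Selasa", "Rabu", "Kamis", "Jumat", "Sabtu", "Minggu")
pattern <- expand.grid(Hari = hari, Jam = 0:23)

# Same rules as the row-by-row simulation, written as vector arithmetic
is_weekday <- pattern$Hari %in% hari[1:5]
is_rush    <- pattern$Jam %in% c(7:9, 17:19)
is_night   <- pattern$Jam %in% 0:5
pattern$PM25 <- 30 + 40 * is_rush + 10 * is_weekday -
  15 * is_night - 10 * (!is_weekday)

# Long format (one row per cell) to a day x hour matrix
pm25_matrix <- tapply(pattern$PM25, list(pattern$Hari, pattern$Jam), mean)

dim(pm25_matrix)          # 7 days by 24 hours
pm25_matrix["Senin", "8"] # weekday rush hour: 30 + 40 + 10 = 80
```

Each cell of `pm25_matrix` corresponds to one tile drawn by `geom_tile()`. ggplot2 works on the long format directly, so the matrix is useful mainly for inspection or for base graphics such as `image()`.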
Interpretation:
What is a Violin Plot?
A violin plot extends the boxplot by adding the estimated density of the data on both sides, giving a shape that resembles a violin. It is useful for comparing the distribution of a pollution measure across categories (for example PM2.5 across districts, or heavy-metal concentrations across river sampling sites).
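The "violin" outline is a kernel density estimate mirrored around a central axis, with the boxplot quartiles drawn on top. The sketch below computes both ingredients with base R for a single hypothetical pH sample. Scaling the widest point to 1 is an assumption made here for readability and is not necessarily how `geom_violin()` scales its output.

```r
set.seed(42)
ph <- rnorm(50, mean = 6.8, sd = 0.3)  # hypothetical pH readings

# Kernel density estimate (Gaussian kernel, default bandwidth)
kde <- density(ph)

# Mirror the density to get the left and right edges of the violin
half_width <- kde$y / max(kde$y)
outline <- data.frame(pH = kde$x, left = -half_width, right = half_width)

# The inner boxplot summarizes the same sample with its quartiles
quartiles <- quantile(ph, probs = c(0.25, 0.5, 0.75))

# Sanity check: the estimated density should integrate to about 1
area <- sum(kde$y) * diff(kde$x[1:2])
```

Where the outline is wide, observations are concentrated. Where it narrows, they are sparse. A boxplot alone would hide features such as two separate peaks.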
set.seed(42)
data_violin <- data.frame(
  Sungai = rep(c("Banjir Kanal Barat", "Banjir Kanal Timur", "Kali Garang"), each = 50),
  pH = c(
    rnorm(50, 6.8, 0.3), # Banjir Kanal Barat
    rnorm(50, 7.2, 0.4), # Banjir Kanal Timur
    rnorm(50, 6.5, 0.2)  # Kali Garang
  ),
  Waktu = rep(seq.POSIXt(from = as.POSIXct("2025-09-01 00:00"),
                         by = "hour", length.out = 50), 3)
)
head(data_violin, 10)
## Sungai pH Waktu
## 1 Banjir Kanal Barat 7.211288 2025-09-01 00:00:00
## 2 Banjir Kanal Barat 6.630591 2025-09-01 01:00:00
## 3 Banjir Kanal Barat 6.908939 2025-09-01 02:00:00
## 4 Banjir Kanal Barat 6.989859 2025-09-01 03:00:00
## 5 Banjir Kanal Barat 6.921280 2025-09-01 04:00:00
## 6 Banjir Kanal Barat 6.768163 2025-09-01 05:00:00
## 7 Banjir Kanal Barat 7.253457 2025-09-01 06:00:00
## 8 Banjir Kanal Barat 6.771602 2025-09-01 07:00:00
## 9 Banjir Kanal Barat 7.405527 2025-09-01 08:00:00
## 10 Banjir Kanal Barat 6.781186 2025-09-01 09:00:00
ggplot(data_violin, aes(x = Sungai, y = pH, fill = Sungai)) +
  geom_violin(trim = FALSE, alpha = 0.6) +
  geom_boxplot(width = 0.1, fill = "white") +
  theme_minimal() +
  labs(title = "Distribution of River Water pH in Semarang",
       subtitle = "The violin plot shows the full distribution",
       x = "", y = "pH")
Example interpretation:
What is a Ridgeline Plot?
A ridgeline plot displays the distributions (densities) of several categories as a stack of partially overlapping curves. It is very effective for showing how a distribution changes across time or location.
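Conceptually, a ridgeline plot estimates one density per category on a common x grid and then lifts each curve to its own baseline. The base R sketch below does this for three hypothetical years of monthly rainfall. The shared bandwidth (the mean of the per-group `bw.nrd0()` values) and the `overlap` factor are assumptions of this sketch, not a description of ggridges internals.

```r
set.seed(42)
rain <- data.frame(
  year = rep(2015:2017, each = 12),
  mm   = rnorm(36, mean = 200, sd = 50)  # hypothetical monthly rainfall
)

groups <- split(rain$mm, rain$year)

# One bandwidth and one x range for all groups keeps the curves comparable
bw  <- mean(sapply(groups, bw.nrd0))
rng <- range(rain$mm) + c(-3, 3) * bw
dens <- lapply(groups, density, bw = bw, from = rng[1], to = rng[2], n = 256)

# Lift each curve to its own baseline; overlap > 1 lets ridges overlap
peak    <- max(sapply(dens, function(d) max(d$y)))
overlap <- 1.5
ridges <- do.call(rbind, lapply(seq_along(dens), function(i) {
  data.frame(year = names(dens)[i], mm = dens[[i]]$x,
             baseline = i, top = i + overlap * dens[[i]]$y / peak)
}))

head(ridges)
```

Drawing a filled polygon between `baseline` and `top` for each year, from the last year to the first, would give the layered look of a ridgeline plot.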
library(ggridges)
## Warning: package 'ggridges' was built under R version 4.4.3
set.seed(42)
data_ridge <- data.frame(
  Tahun = rep(2015:2024, each = 12),
  Bulan = rep(month.abb, times = 10),
  CurahHujan = rnorm(120, mean = 200, sd = 50)
)
data_ridge
## Tahun Bulan CurahHujan
## 1 2015 Jan 268.54792
## 2 2015 Feb 171.76509
## 3 2015 Mar 218.15642
## 4 2015 Apr 231.64313
## 5 2015 May 220.21342
## 6 2015 Jun 194.69377
## 7 2015 Jul 275.57610
## 8 2015 Aug 195.26705
## 9 2015 Sep 300.92119
## 10 2015 Oct 196.86430
## 11 2015 Nov 265.24348
## 12 2015 Dec 314.33227
## 13 2016 Jan 130.55696
## 14 2016 Feb 186.06056
## 15 2016 Mar 193.33393
## 16 2016 Apr 231.79752
## 17 2016 May 185.78735
## 18 2016 Jun 67.17723
## 19 2016 Jul 77.97665
## 20 2016 Aug 266.00567
## 21 2016 Sep 184.66807
## 22 2016 Oct 110.93458
## 23 2016 Nov 191.40413
## 24 2016 Dec 260.73373
## 25 2017 Jan 294.75967
## 26 2017 Feb 178.47654
## 27 2017 Mar 187.13653
## 28 2017 Apr 111.84185
## 29 2017 May 223.00487
## 30 2017 Jun 168.00026
## 31 2017 Jul 222.77251
## 32 2017 Aug 235.24187
## 33 2017 Sep 251.75518
## 34 2017 Oct 169.55368
## 35 2017 Nov 225.24776
## 36 2017 Dec 114.14957
## 37 2018 Jan 160.77705
## 38 2018 Feb 157.45462
## 39 2018 Mar 79.28962
## 40 2018 Apr 201.80613
## 41 2018 May 210.29993
## 42 2018 Jun 181.94714
## 43 2018 Jul 237.90816
## 44 2018 Aug 163.66476
## 45 2018 Sep 131.58595
## 46 2018 Oct 221.64090
## 47 2018 Nov 159.43034
## 48 2018 Dec 272.20506
## 49 2019 Jan 178.42769
## 50 2019 Feb 232.78239
## 51 2019 Mar 216.09626
## 52 2019 Apr 160.80805
## 53 2019 May 278.78638
## 54 2019 Jun 232.14497
## 55 2019 Jul 204.48803
## 56 2019 Aug 213.82754
## 57 2019 Sep 233.96444
## 58 2019 Oct 204.49164
## 59 2019 Nov 50.34550
## 60 2019 Dec 214.24415
## 61 2020 Jan 181.63827
## 62 2020 Feb 209.26153
## 63 2020 Mar 229.09119
## 64 2020 Apr 269.98684
## 65 2020 May 163.63540
## 66 2020 Jun 265.12713
## 67 2020 Jul 216.79241
## 68 2020 Aug 251.92530
## 69 2020 Sep 246.03643
## 70 2020 Oct 236.04391
## 71 2020 Nov 147.84405
## 72 2020 Dec 195.49068
## 73 2021 Jan 231.17591
## 74 2021 Feb 152.32383
## 75 2021 Mar 172.85856
## 76 2021 Apr 229.04982
## 77 2021 May 238.40894
## 78 2021 Jun 223.18838
## 79 2021 Jul 155.71119
## 80 2021 Aug 145.01096
## 81 2021 Sep 275.63535
## 82 2021 Oct 212.89607
## 83 2021 Nov 204.42201
## 84 2021 Dec 193.95517
## 85 2022 Jan 140.28356
## 86 2022 Feb 230.59984
## 87 2022 Mar 189.14301
## 88 2022 Apr 190.86216
## 89 2022 May 246.66732
## 90 2022 Jun 241.08866
## 91 2022 Jul 269.60582
## 92 2022 Aug 176.19130
## 93 2022 Sep 232.51743
## 94 2022 Oct 269.55552
## 95 2022 Nov 144.46056
## 96 2022 Dec 156.96037
## 97 2023 Jan 143.41307
## 98 2023 Feb 127.03930
## 99 2023 Mar 203.99913
## 100 2023 Apr 232.66022
## 101 2023 May 260.04827
## 102 2023 Jun 252.23755
## 103 2023 Jul 149.83957
## 104 2023 Aug 292.42410
## 105 2023 Sep 166.66133
## 106 2023 Oct 205.27569
## 107 2023 Nov 178.88721
## 108 2023 Dec 193.88249
## 109 2024 Jan 209.40965
## 110 2024 Feb 205.95805
## 111 2024 Mar 198.74537
## 112 2024 Apr 205.40364
## 113 2024 May 175.72824
## 114 2024 Jun 174.78914
## 115 2024 Jul 116.94505
## 116 2024 Aug 180.88331
## 117 2024 Sep 174.36749
## 118 2024 Oct 335.09455
## 119 2024 Nov 131.89419
## 120 2024 Dec 206.86281
# after_stat(x) replaces the dot-dot notation (..x..), deprecated since ggplot2 3.4.0
ggplot(data_ridge, aes(x = CurahHujan, y = factor(Tahun), fill = after_stat(x))) +
  geom_density_ridges_gradient(scale = 3, rel_min_height = 0.01) +
  scale_fill_gradient(low = "skyblue", high = "navy") +
  theme_minimal() +
  labs(title = "Ridgeline Plot of Annual Rainfall",
       x = "Rainfall (mm)", y = "Year")
## Picking joint bandwidth of 22.7