Statistika Deskriptif dan Visualisasi Data Lingkungan

Pendahuluan

Statistika deskriptif tidak hanya berhenti pada tabel angka dan grafik sederhana, tetapi dapat berkembang ke arah visualisasi data yang lebih kaya dan interaktif. Dalam konteks statistika lingkungan, visualisasi memiliki peran penting untuk memperlihatkan pola, tren, serta distribusi data yang berhubungan dengan fenomena lingkungan, seperti kualitas udara, pencemaran air, hingga perubahan iklim. Pada materi ini, kita akan membahas empat jenis visualisasi data yang lebih “advance” dibanding grafik dasar: Choropleth Map, Heatmap, Violin Plot, dan Ridgeline Plot.

1. Choropleth Map

Apa itu Choropleth Map?

Choropleth map adalah peta tematik yang menampilkan data kuantitatif menggunakan gradasi warna sesuai dengan batas wilayah tertentu. Visualisasi ini sangat berguna dalam statistika lingkungan untuk melihat perbedaan spasial, misalnya distribusi polusi udara, curah hujan, atau kualitas air pada wilayah yang berbeda.

library(sf)
## Warning: package 'sf' was built under R version 4.4.3
## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.3
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Baca shapefile (misalnya kecamatan di Semarang)
peta <- st_read("Kecamatan_Semarang.geojson")
## Reading layer `Kecamatan_Semarang' from data source 
##   `C:\Users\USER\Downloads\Kecamatan_Semarang.geojson' using driver `GeoJSON'
## Simple feature collection with 32 features and 10 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 110.2673 ymin: -7.114464 xmax: 110.5089 ymax: -6.931992
## Geodetic CRS:  WGS 84
peta
## Simple feature collection with 32 features and 10 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 110.2673 ymin: -7.114464 xmax: 110.5089 ymax: -6.931992
## Geodetic CRS:  WGS 84
## First 10 features:
##                  id             X.id X.relations admin_level       boundary
## 1  relation/8110792 relation/8110792        <NA>           6 administrative
## 2  relation/8132993 relation/8132993        <NA>           6 administrative
## 3  relation/8147917 relation/8147917        <NA>           6 administrative
## 4  relation/8178535 relation/8178535        <NA>           6 administrative
## 5  relation/8201927 relation/8201927        <NA>           6 administrative
## 6  relation/8201936 relation/8201936        <NA>           6 administrative
## 7  relation/8222496 relation/8222496        <NA>           6 administrative
## 8  relation/8244965 relation/8244965        <NA>           6 administrative
## 9  relation/8350713 relation/8350713        <NA>           6 administrative
## 10 relation/8350849 relation/8350849        <NA>           6 administrative
##    is_in.city is_in.province             name                 source     type
## 1    Semarang    Jawa Tengah        Gayamsari HOT_InAWARESurvey_2018 boundary
## 2    Semarang    Jawa Tengah Semarang Selatan HOT_InAWARESurvey_2018 boundary
## 3    Semarang    Jawa Tengah    Gajah Mungkur HOT_InAWARESurvey_2018 boundary
## 4    Semarang    Jawa Tengah        Candisari HOT_InAWARESurvey_2018 boundary
## 5    Semarang    Jawa Tengah   Semarang Timur HOT_InAWARESurvey_2018 boundary
## 6    Semarang    Jawa Tengah   Semarang Utara HOT_InAWARESurvey_2018 boundary
## 7    Semarang    Jawa Tengah  Semarang Tengah HOT_InAWARESurvey_2018 boundary
## 8    Semarang    Jawa Tengah   Semarang Barat  HOT_InAWARESurey_2018 boundary
## 9    Semarang    Jawa Tengah            Genuk HOT_InAWARESurvey_2018 boundary
## 10   Semarang    Jawa Tengah       Pedurungan HOT_InAWARESurvey_2018 boundary
##                          geometry
## 1  POLYGON ((110.4404 -6.95149...
## 2  POLYGON ((110.432 -6.994293...
## 3  POLYGON ((110.3874 -7.01746...
## 4  POLYGON ((110.4161 -7.00192...
## 5  POLYGON ((110.4316 -6.98417...
## 6  POLYGON ((110.4308 -6.96088...
## 7  POLYGON ((110.4026 -6.97739...
## 8  POLYGON ((110.3797 -7.00813...
## 9  POLYGON ((110.5048 -6.96815...
## 10 POLYGON ((110.4553 -6.96618...
# Simulasi data PM2.5 untuk tiap kecamatan
set.seed(100)
data <- data.frame(
  Kecamatan = peta$name, # kolom nama kecamatan
  PM25 = sample(20:100, length(peta$name), replace = TRUE)
)
data
##           Kecamatan PM25
## 1         Gayamsari   93
## 2  Semarang Selatan   97
## 3     Gajah Mungkur   42
## 4         Candisari   89
## 5    Semarang Timur   23
## 6    Semarang Utara   74
## 7   Semarang Tengah   89
## 8    Semarang Barat   26
## 9             Genuk   26
## 10       Pedurungan   74
## 11             Tugu   62
## 12      Gunung Pati   80
## 13        Tembalang   31
## 14            Mijen   70
## 15       Banyumanik   91
## 16         Ngaliyan   37
## 17             <NA>   44
## 18             <NA>   21
## 19             <NA>   70
## 20             <NA>   87
## 21             <NA>   87
## 22             <NA>   71
## 23             <NA>   67
## 24             <NA>   51
## 25             <NA>   58
## 26             <NA>   35
## 27             <NA>   94
## 28             <NA>   85
## 29             <NA>   89
## 30             <NA>   64
## 31             <NA>   49
## 32             <NA>   49
# Gabungkan data dengan peta
peta_data <- left_join(peta, data, by=c("name"="Kecamatan"))
## Warning in sf_column %in% names(g): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 17 of `x` matches multiple rows in `y`.
## ℹ Row 17 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
peta_data
## Simple feature collection with 272 features and 11 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 110.2673 ymin: -7.114464 xmax: 110.5089 ymax: -6.931992
## Geodetic CRS:  WGS 84
## First 10 features:
##                  id             X.id X.relations admin_level       boundary
## 1  relation/8110792 relation/8110792        <NA>           6 administrative
## 2  relation/8132993 relation/8132993        <NA>           6 administrative
## 3  relation/8147917 relation/8147917        <NA>           6 administrative
## 4  relation/8178535 relation/8178535        <NA>           6 administrative
## 5  relation/8201927 relation/8201927        <NA>           6 administrative
## 6  relation/8201936 relation/8201936        <NA>           6 administrative
## 7  relation/8222496 relation/8222496        <NA>           6 administrative
## 8  relation/8244965 relation/8244965        <NA>           6 administrative
## 9  relation/8350713 relation/8350713        <NA>           6 administrative
## 10 relation/8350849 relation/8350849        <NA>           6 administrative
##    is_in.city is_in.province             name                 source     type
## 1    Semarang    Jawa Tengah        Gayamsari HOT_InAWARESurvey_2018 boundary
## 2    Semarang    Jawa Tengah Semarang Selatan HOT_InAWARESurvey_2018 boundary
## 3    Semarang    Jawa Tengah    Gajah Mungkur HOT_InAWARESurvey_2018 boundary
## 4    Semarang    Jawa Tengah        Candisari HOT_InAWARESurvey_2018 boundary
## 5    Semarang    Jawa Tengah   Semarang Timur HOT_InAWARESurvey_2018 boundary
## 6    Semarang    Jawa Tengah   Semarang Utara HOT_InAWARESurvey_2018 boundary
## 7    Semarang    Jawa Tengah  Semarang Tengah HOT_InAWARESurvey_2018 boundary
## 8    Semarang    Jawa Tengah   Semarang Barat  HOT_InAWARESurey_2018 boundary
## 9    Semarang    Jawa Tengah            Genuk HOT_InAWARESurvey_2018 boundary
## 10   Semarang    Jawa Tengah       Pedurungan HOT_InAWARESurvey_2018 boundary
##    PM25                       geometry
## 1    93 POLYGON ((110.4404 -6.95149...
## 2    97 POLYGON ((110.432 -6.994293...
## 3    42 POLYGON ((110.3874 -7.01746...
## 4    89 POLYGON ((110.4161 -7.00192...
## 5    23 POLYGON ((110.4316 -6.98417...
## 6    74 POLYGON ((110.4308 -6.96088...
## 7    89 POLYGON ((110.4026 -6.97739...
## 8    26 POLYGON ((110.3797 -7.00813...
## 9    26 POLYGON ((110.5048 -6.96815...
## 10   74 POLYGON ((110.4553 -6.96618...
# Plot choropleth
ggplot(peta_data) +
  geom_sf(aes(fill=PM25), color="white") +
  geom_sf_text(aes(label = name), size = 2, color = "white") +
  scale_fill_gradient(low="tomato", high="navy", name="PM2.5") +
  theme_minimal() +
  labs(title="Choropleth Map PM2.5 - Kota Semarang")
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
## Warning: Removed 256 rows containing missing values or values outside the scale range
## (`geom_text()`).

Interpretasi:

Contoh Insight:

2. Heatmap

Apa itu Heatmap?

Heatmap adalah representasi data dalam bentuk grid (baris-kolom) dengan warna sebagai pengganti angka. Dalam statistika lingkungan, heatmap bisa menampilkan pola temporal (waktu) atau spasial (lokasi) secara kompak, misalnya variasi polusi udara setiap jam dalam satu minggu.

Contoh Data dan Visualisasi (Polusi per Jam dalam Seminggu)

library(ggplot2)
library(dplyr)

set.seed(42)

# Hari dan jam
hari <- c("Senin","Selasa","Rabu","Kamis","Jumat","Sabtu","Minggu")
jam <- 0:23

# Buat grid data
data_heat <- expand.grid(Hari = hari, Jam = jam)

# Fungsi untuk simulasi realistis
data_heat <- data_heat %>%
  rowwise() %>%
  mutate(PM25 = {
    base <- 30
    
    # Tambah polusi saat jam sibuk
    if (Jam %in% 7:9 | Jam %in% 17:19) base <- base + 40
    
    # Tambah polusi khusus hari kerja
    if (Hari %in% c("Senin","Selasa","Rabu","Kamis","Jumat")) base <- base + 10
    
    # Sedikit lebih rendah saat malam/dini hari
    if (Jam %in% 0:5) base <- base - 15
    
    # Akhir pekan lebih bersih
    if (Hari %in% c("Sabtu","Minggu")) base <- base - 10
    
    # Variasi acak
    round(rnorm(1, mean = base, sd = 10))
  }) %>%
  ungroup()
data_heat
## # A tibble: 168 × 3
##    Hari     Jam  PM25
##    <fct>  <int> <dbl>
##  1 Senin      0    39
##  2 Selasa     0    19
##  3 Rabu       0    29
##  4 Kamis      0    31
##  5 Jumat      0    29
##  6 Sabtu      0     4
##  7 Minggu     0    20
##  8 Senin      1    24
##  9 Selasa     1    45
## 10 Rabu       1    24
## # ℹ 158 more rows
# Plot heatmap
ggplot(data_heat, aes(x=Jam, y=Hari, fill=PM25)) +
  geom_tile(color="white") +
  scale_fill_gradient(low="lightyellow", high="darkred") +
  theme_minimal() +
  labs(
    title="Heatmap PM2.5 per Jam dan Hari",
    x="Jam", y="Hari"
  )

Interpretasi:

3. Violin Plot

Apa itu Violin Plot?

Violin plot adalah pengembangan dari boxplot yang menambahkan kepadatan distribusi data di kedua sisi, menyerupai bentuk biola. Visualisasi ini berguna untuk melihat distribusi pencemaran antar kategori (misalnya PM2.5 antar kecamatan, atau kadar logam berat di sungai antar lokasi).

set.seed(42)
data_violin <- data.frame(
  Sungai = rep(c("Banjir Kanal Barat", "Banjir Kanal Timur", "Kali Garang"), each=50),
  pH = c(
    rnorm(50, 6.8, 0.3),  # Banjir Kanal Barat
    rnorm(50, 7.2, 0.4),  # Banjir Kanal Timur
    rnorm(50, 6.5, 0.2)   # Kali Garang
  ),
  Waktu = rep(seq.POSIXt(from = as.POSIXct("2025-09-01 00:00"),
                         by = "hour", length.out = 50), 3)
)

head(data_violin, 10)
##                Sungai       pH               Waktu
## 1  Banjir Kanal Barat 7.211288 2025-09-01 00:00:00
## 2  Banjir Kanal Barat 6.630591 2025-09-01 01:00:00
## 3  Banjir Kanal Barat 6.908939 2025-09-01 02:00:00
## 4  Banjir Kanal Barat 6.989859 2025-09-01 03:00:00
## 5  Banjir Kanal Barat 6.921280 2025-09-01 04:00:00
## 6  Banjir Kanal Barat 6.768163 2025-09-01 05:00:00
## 7  Banjir Kanal Barat 7.253457 2025-09-01 06:00:00
## 8  Banjir Kanal Barat 6.771602 2025-09-01 07:00:00
## 9  Banjir Kanal Barat 7.405527 2025-09-01 08:00:00
## 10 Banjir Kanal Barat 6.781186 2025-09-01 09:00:00
ggplot(data_violin, aes(x=Sungai, y=pH, fill=Sungai)) +
  geom_violin(trim=FALSE, alpha=0.6) +
  geom_boxplot(width=0.1, fill="white") +
  theme_minimal() +
  labs(title="Sebaran pH Air Sungai di Semarang",
       subtitle="Violin plot menunjukkan distribusi lengkap",
       x="", y="pH")

Contoh interpretasi:

4. Ridgeline Plot

Apa itu Ridgeline Plot?

Ridgeline plot menampilkan distribusi data (density) dari beberapa kategori dalam bentuk tumpukan kurva yang saling berlapis. Ini sangat efektif untuk menunjukkan perubahan distribusi data antar waktu atau lokasi.

library(ggridges)
## Warning: package 'ggridges' was built under R version 4.4.3
set.seed(42)
data_ridge <- data.frame(
  Tahun = rep(2015:2024, each=12),
  Bulan = rep(month.abb, times=10),
  CurahHujan = rnorm(120, mean=200, sd=50)
)
data_ridge
##     Tahun Bulan CurahHujan
## 1    2015   Jan  268.54792
## 2    2015   Feb  171.76509
## 3    2015   Mar  218.15642
## 4    2015   Apr  231.64313
## 5    2015   May  220.21342
## 6    2015   Jun  194.69377
## 7    2015   Jul  275.57610
## 8    2015   Aug  195.26705
## 9    2015   Sep  300.92119
## 10   2015   Oct  196.86430
## 11   2015   Nov  265.24348
## 12   2015   Dec  314.33227
## 13   2016   Jan  130.55696
## 14   2016   Feb  186.06056
## 15   2016   Mar  193.33393
## 16   2016   Apr  231.79752
## 17   2016   May  185.78735
## 18   2016   Jun   67.17723
## 19   2016   Jul   77.97665
## 20   2016   Aug  266.00567
## 21   2016   Sep  184.66807
## 22   2016   Oct  110.93458
## 23   2016   Nov  191.40413
## 24   2016   Dec  260.73373
## 25   2017   Jan  294.75967
## 26   2017   Feb  178.47654
## 27   2017   Mar  187.13653
## 28   2017   Apr  111.84185
## 29   2017   May  223.00487
## 30   2017   Jun  168.00026
## 31   2017   Jul  222.77251
## 32   2017   Aug  235.24187
## 33   2017   Sep  251.75518
## 34   2017   Oct  169.55368
## 35   2017   Nov  225.24776
## 36   2017   Dec  114.14957
## 37   2018   Jan  160.77705
## 38   2018   Feb  157.45462
## 39   2018   Mar   79.28962
## 40   2018   Apr  201.80613
## 41   2018   May  210.29993
## 42   2018   Jun  181.94714
## 43   2018   Jul  237.90816
## 44   2018   Aug  163.66476
## 45   2018   Sep  131.58595
## 46   2018   Oct  221.64090
## 47   2018   Nov  159.43034
## 48   2018   Dec  272.20506
## 49   2019   Jan  178.42769
## 50   2019   Feb  232.78239
## 51   2019   Mar  216.09626
## 52   2019   Apr  160.80805
## 53   2019   May  278.78638
## 54   2019   Jun  232.14497
## 55   2019   Jul  204.48803
## 56   2019   Aug  213.82754
## 57   2019   Sep  233.96444
## 58   2019   Oct  204.49164
## 59   2019   Nov   50.34550
## 60   2019   Dec  214.24415
## 61   2020   Jan  181.63827
## 62   2020   Feb  209.26153
## 63   2020   Mar  229.09119
## 64   2020   Apr  269.98684
## 65   2020   May  163.63540
## 66   2020   Jun  265.12713
## 67   2020   Jul  216.79241
## 68   2020   Aug  251.92530
## 69   2020   Sep  246.03643
## 70   2020   Oct  236.04391
## 71   2020   Nov  147.84405
## 72   2020   Dec  195.49068
## 73   2021   Jan  231.17591
## 74   2021   Feb  152.32383
## 75   2021   Mar  172.85856
## 76   2021   Apr  229.04982
## 77   2021   May  238.40894
## 78   2021   Jun  223.18838
## 79   2021   Jul  155.71119
## 80   2021   Aug  145.01096
## 81   2021   Sep  275.63535
## 82   2021   Oct  212.89607
## 83   2021   Nov  204.42201
## 84   2021   Dec  193.95517
## 85   2022   Jan  140.28356
## 86   2022   Feb  230.59984
## 87   2022   Mar  189.14301
## 88   2022   Apr  190.86216
## 89   2022   May  246.66732
## 90   2022   Jun  241.08866
## 91   2022   Jul  269.60582
## 92   2022   Aug  176.19130
## 93   2022   Sep  232.51743
## 94   2022   Oct  269.55552
## 95   2022   Nov  144.46056
## 96   2022   Dec  156.96037
## 97   2023   Jan  143.41307
## 98   2023   Feb  127.03930
## 99   2023   Mar  203.99913
## 100  2023   Apr  232.66022
## 101  2023   May  260.04827
## 102  2023   Jun  252.23755
## 103  2023   Jul  149.83957
## 104  2023   Aug  292.42410
## 105  2023   Sep  166.66133
## 106  2023   Oct  205.27569
## 107  2023   Nov  178.88721
## 108  2023   Dec  193.88249
## 109  2024   Jan  209.40965
## 110  2024   Feb  205.95805
## 111  2024   Mar  198.74537
## 112  2024   Apr  205.40364
## 113  2024   May  175.72824
## 114  2024   Jun  174.78914
## 115  2024   Jul  116.94505
## 116  2024   Aug  180.88331
## 117  2024   Sep  174.36749
## 118  2024   Oct  335.09455
## 119  2024   Nov  131.89419
## 120  2024   Dec  206.86281
ggplot(data_ridge, aes(x=CurahHujan, y=factor(Tahun), fill=..x..)) +
  geom_density_ridges_gradient(scale=3, rel_min_height=0.01) +
  scale_fill_gradient(low="skyblue", high="navy") +
  theme_minimal() +
  labs(title="Ridgeline Plot Curah Hujan Tahunan",
       x="Curah Hujan (mm)", y="Tahun")
## Warning: The dot-dot notation (`..x..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(x)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Picking joint bandwidth of 22.7