Pre-talk quiz

Part one: Introduction

Slide license

Creative Commons

Creative Commons

  • need attribution (BY)
  • for non-commercial purposes only (NC)
  • can be copied, modified (SA, share-alike)

A bit about me

Personal data

Japan, 2009

Japan, 2009

  • Name: Dasapta Erwin Irawan

  • Job: Lecturer/researcher at Groundwater Engineering Program, ITB

  • Education (Geology, ITB): 1994-1998 (undergrad), 1999-2001 (Master), 2005-2009 (PhD)

A bit about me

Visiting program

  • Nov-Dec 2009

    • Center for Environmental Remote Sensing (CeRES), Chiba University (2009)
    • Supervisor: Prof. Josaphat T.S. Sumantyo
    • Research: remote sensing for hydrological purposes
  • Feb 2014-Feb 2015

    • Faculty of Agriculture and Environment, University of Sydney
    • Supervisor: Dr. Willem Vervoort
    • Research: hydrological modeling in R

A bit about me

Research

  • hydrogeology
  • hydrochemistry
  • multivariate [statistical] analysis

A bit about me

Media social

Website: - R and Linux - Writing - SlideShare

Twitter: @dasaptaerwin

Email: d_erwin_irawan[at]yahoo[dot]com

Skills in Geology

Essential skills for geologist

Essential skills for geologist

Essential skills for geologist

Software skills (not free version)

  • office: Microsoft Office (Word, Excel, ppt) -> annual subscription (from USD 10 per month)
  • citation and referencing: EndNote -> from USD 250
  • statistical: Minitab, SPSS, Statistica, Stata -> basic version from USD 700 (2012)
  • spatial / GIS: ArcGIS, Mapinfo, etc -> annual subscription (basic version from USD 100 per year)

Sources:

Software skills (free equivalent)

Why open source?

  • free as breathing
  • mostly cross-platform (Linux, Mac, Win)
  • strong community, hence rapid development
  • supporting reproducibility

What is reproducibility in science?

Every step can be:

  • re-do
  • re-analysed and re-evaluate
  • re-developed

What is reproducibility in science?

Those principles are applied to:

  • data (items and locations)
  • software used in the analyses:
    • each software has distinct feature and algorithm
    • what would happen if not everyone could purchase the software?

End of part one

Part two: Cikapundung case

Slide license

Creative Commons

Creative Commons

  • need attribution (BY)
  • for non-commercial purposes only (NC)
  • can be copied, modified (SA, share-alike)

Background

Kepadatan penduduk di bantaran Cikapundung (panoramio.com)

Kepadatan penduduk di bantaran Cikapundung (panoramio.com)

Background

  • Cikapundung has important roles:

  • is one of the major water source for Bandung Basin: WTP Dago Pakar = 40 L/sec
  • electrical generator (since 1923):
    • PLTA Bengkok = 3 MW
    • PLTA Dago = 0.7 MW
  • Drainase kota

Background

Some facts

Some facts

Background

Vast growth of settlements + landuse change -> declining water quality (both river and groundwater).

High density spots (Distarcip, 2014)

High density spots (Distarcip, 2014)

Background

BOD and COD (BPLH, 2006)

BOD and COD (BPLH, 2006)

What do we know so far?

  • There types of groundwater and river water interactions (Lubis and Puradimaja 2006)
    • isolated stream at Maribaya area (upstream)
    • effluent stream (or gaining stream) at Maribaya to Viaduct segment (Bandung central)
    • influent stream (or losing stream) from Viaduct to Dayeuhkolot
  • some facts of springs and seepages at isolated segment (Tanuwijaya 2014).

What do we know so far?

Model Lubis and Puradimaja, 2006

Model Lubis and Puradimaja, 2006

What do we know so far?

  • Conceptual model to mimic the interactions (Darul et.al 2014ab)
  • It confirms the Lubis Model

What do we know so far?

Model 1

Model 1

What do we know so far?

Model 2

Model 2

Our question

  • Does water quality reflect the interactions?

Our tools

  • R
  • Let’s do some [simple] analyses

Showing pairs analysis (bivariate analysis)

Data format

  • variables or measurements in columns
  • cases or samples in rows
  • no merged columns or rows
  • read also Data is the new soil

Why pairs analysis

  • equivalent to correlation matrix
  • the fastest way to see correlations between variables
  • pls bear in mind correlation does not always mean causality

Our data

Data points

Data points

Our data

  • 295 samples
  • From five years periode (1997, 1998, 2007, 2011, 2012)

Load

# load data
data <- as.data.frame(read.csv("BandungData.csv",
                               header = TRUE))
attach(data)
## The following object is masked from package:datasets:
## 
##     CO2

Data structure

# data structure
str(data)
## 'data.frame':    295 obs. of  33 variables:
##  $ no     : int  16 22 263 17 12 18 13 19 14 20 ...
##  $ code   : int  116 122 8 117 112 118 113 119 114 120 ...
##  $ year   : int  1997 1997 1997 1997 1997 1997 1997 1997 1997 1997 ...
##  $ type   : Factor w/ 2 levels "groundwater",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ x      : num  785175 785168 799275 785175 785181 ...
##  $ y      : num  10752836 10752843 10753680 10752840 10752843 ...
##  $ distx  : num  6897 6904 0 6897 6891 ...
##  $ elv    : int  1338 1336 1336 1320 1300 1247 1240 1230 1228 1225 ...
##  $ aq     : Factor w/ 3 levels "breccias","clay",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ zone   : Factor w/ 2 levels "eff","inf": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec     : num  71.9 71.9 77 71.9 71.9 71.9 71.9 71.9 71.9 71.9 ...
##  $ ph     : num  6.89 6.89 6.39 6.89 6.89 ...
##  $ hard   : num  11 11 26.4 11 11 11 11 11 11 11 ...
##  $ tds    : num  58.7 58.7 50 58.7 58.7 ...
##  $ temp   : num  21 21 16.1 21 21 ...
##  $ eh     : num  30 24 -0.45 34 35 32 30 23 24 21 ...
##  $ Q      : num  1 1 NA 1 1 1 1 1 1 1 ...
##  $ Ca     : num  1.8 1.8 9.44 1.8 1.8 1.8 1.8 1.8 1.8 1.8 ...
##  $ Mg     : num  1.7 1.7 0.72 1.7 1.7 1.7 1.7 1.7 1.7 1.7 ...
##  $ Fe     : num  0.08 0.08 0.216 0.08 0.08 0.08 0.08 0.08 0.08 0.08 ...
##  $ Mn     : num  0.22 0.22 0 0.22 0.22 0.22 0.22 0.22 0.22 0.22 ...
##  $ K      : num  1.7 1.7 0.8 1.7 1.7 1.7 1.7 1.7 1.7 1.7 ...
##  $ Na     : num  5 5 3.2 5 5 5 5 5 5 5 ...
##  $ CO3    : num  7 6 0 6 5.8 6.8 6.7 8 8 8.2 ...
##  $ HCO3   : num  8 7.8 31.4 6.7 7 ...
##  $ CO2    : num  36.3 36.3 7.28 36.3 36.3 36.3 36.3 36.3 36.3 36.3 ...
##  $ Cl     : num  4.8 4.8 5.52 4.8 4.8 4.8 4.8 4.8 4.8 4.8 ...
##  $ SO4    : num  0.6 0.6 0 0.6 0.6 0.6 0.6 0.6 0.6 0.6 ...
##  $ NO2    : num  0 0 0.04 0 0 0 0 0 0 0 ...
##  $ NO3    : num  4.7 4.7 2.24 4.7 4.7 4.7 4.7 4.7 4.7 4.7 ...
##  $ SiO2   : num  23 23 37.8 23 23 ...
##  $ cumrain: num  38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 ...
##  $ lag1   : num  32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 ...

pairs plot 1

Simple command

pairs(data)

pairs plot 1

You would get a plot but it’s:

  • ugly, too small
  • no legend and axis
  • we need to tweak it: group the variables and change plot code

pairs plot 1

pairs(group1,labels=colnames(group1),
      main="Physical parameter", 
      pch=21, bg=c("red", "blue")
      [unclass(data$type)],
      upper.panel=NULL)
legend(x=0.6, y=0.8, levels(data$type), 
        pt.bg=c("red", "blue"), 
        pch=21, bty="n", ncol=2, 
        horiz=F)

Grouping variables

# group data
# Data group 1: Physical parameters
group1 <- data[,c("x", "y", "elv", "aq", "ec", "ph",
                  "hard", "tds", "temp", "eh", "Q")]

# Data group 2: Cation
group2 <- data[,c("x", "y", "elv", "Ca",
                  "Mg", "Fe", "Mn", "K", "Na")]

## Data group 3: Anions (unit = ppm)
group3 = data[,c("x", "y", "CO3", "HCO3",
                 "CO2", "Cl", "SO4", "NO2",
                 "NO3", "SiO2")]

Pairs plot group 1 (physical parameters)

plot of chunk unnamed-chunk-4

plot of chunk unnamed-chunk-4

Pairs plot group 2 (cations )

plot of chunk unnamed-chunk-5

plot of chunk unnamed-chunk-5

Pairs plot group 3 (anions )

plot of chunk unnamed-chunk-6

plot of chunk unnamed-chunk-6

Showing PCA analysis (multivariate analysis)

Why PCA (Principle Component Analysis)?

  • nature embeds multivariable process
  • has been widely used and developed since the 60’s
  • simple, straighforward, nearest neighbour (cluster) principles
  • offers nice visualisation

[Simple] codes

# install library
install.packages("pcaMethods") # for PCA
install.packages("gridExtra")  # for plot lay out

# load library
library(pcaMethods) # for PCA
library(gridExtra)  # for plot lay out

# run PCA
pca1 <- pca(group1,  
               method = "svdImpute", 
               scale = "uv",
               center = T,
               nPcs = 3,
               evalPcs = 1:3)

[Simple] codes

# evaluate results
summary(pca1)     # result summary
sDev(pca1)        # extracting eigenvalues
plot(sDev(pca1))  # plotting eigenvalues
loadings(pca1)    # plot loadings
scores(pca1)      # plot scores

Results: summary PCA1

## svdImpute calculated PCA
## Importance of component(s):
##                 PC1    PC2    PC3
## R2            0.258 0.1672 0.1257
## Cumulative R2 0.258 0.4251 0.5509

Results: summary PCA2

## svdImpute calculated PCA
## Importance of component(s):
##                  PC1    PC2    PC3
## R2            0.2891 0.1392 0.1064
## Cumulative R2 0.2891 0.4283 0.5347

Results: summary PCA3

## svdImpute calculated PCA
## Importance of component(s):
##                  PC1    PC2     PC3
## R2            0.1991 0.1337 0.09922
## Cumulative R2 0.1991 0.3327 0.43194

Results: Extract Eigenvalues PCA1

plot of chunk unnamed-chunk-13

plot of chunk unnamed-chunk-13

Results: plot PCA Group 1

plot of chunk unnamed-chunk-14

plot of chunk unnamed-chunk-14

Results: plot PCA Group 2

plot of chunk unnamed-chunk-15

plot of chunk unnamed-chunk-15

Results: plot PCA Group 3

plot of chunk unnamed-chunk-16

plot of chunk unnamed-chunk-16

Results: loadings and scores Group1

plot of chunk unnamed-chunk-17

plot of chunk unnamed-chunk-17

Results: loadings and scores Group2

plot of chunk unnamed-chunk-18

plot of chunk unnamed-chunk-18

Results: loadings and scores Group3

plot of chunk unnamed-chunk-19

plot of chunk unnamed-chunk-19

spatial analysis (bubble plot)

why bubble plot?

  • shows spatial variation as well as values distribution
  • simple and straigtforward visualisation

[simple] codes

# load library (assuming all libraries are installed)
library(gstat)
library(sp)
library(rgdal)
library(latticeExtra)

# open and load data
df <- read.csv("BandungData.csv", header=TRUE)

# convert xy values as coordinates
coordinates(df) <- ~ x + y

[simple] codes

# make bubble plot
bubbleCa <- bubble(data, zcol="Ca", 
                   xlab="X coord", ylab="Y coord",
                   main="Bubble plot Ca",
                   col = data$zone, 
                   scales=list(tck=0.5))
print(bubbleCa)

Process

## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.10.1, released 2013/08/26
## Path to GDAL shared files: /usr/share/gdal/1.10
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: (autodetected)

Results: groundwater end member (Ca)

plot of chunk unnamed-chunk-21

plot of chunk unnamed-chunk-21

Results: groundwater end member (Mg)

plot of chunk unnamed-chunk-22

plot of chunk unnamed-chunk-22

Results: river end member (Na)

plot of chunk unnamed-chunk-23

plot of chunk unnamed-chunk-23

Results: river end member (K)

plot of chunk unnamed-chunk-24

plot of chunk unnamed-chunk-24

Results: contamination signature (NO3)

plot of chunk unnamed-chunk-25

plot of chunk unnamed-chunk-25

Results: contamination signature (NO2)

plot of chunk unnamed-chunk-26

plot of chunk unnamed-chunk-26

Results: contamination signature (SO4)

plot of chunk unnamed-chunk-27

plot of chunk unnamed-chunk-27

Results: contamination signature (Cl)

plot of chunk unnamed-chunk-28

plot of chunk unnamed-chunk-28

Remarks

  • higher mineral concentration in river water than groundwater should have occured in effluent flow.
  • higher mineral concentration in groundwater than river water should have occured in influent flow.
  • both natural indications are not detected, except for NO2.

Remarks

  • the anomaly is due to dilution effect.
  • dilution overides enrichment effect.
  • the opposite would happen if sampling is conducted in dry season.
  • possibility of different catchment between groundwater and river water.

Closing

Future research opportunities:

  • to add more data in different locations along river bank, taken in both rain and dry season.
  • more exploratory statistical analysis, eg: multiple regression tree to extract data pattern.

Main references

  • Lubis, RF and Puradimaja, DJ, 2006, Hydrodynamic relationsships between groundwater and river water: CIkapundung river stream, West Java, Indonesia
  • Darul, A, Irawan, DE, and Trilaksono, NJ, 2014a, Groundwater and river water interaction on Cikapundung River: Revisited, International Conference on Math and Natural Sciences, ITB.
  • Darul, A, 2014b, Model konseptual interaksi air tanah dan air sungai di bantaran S. Cikapundung, Bandung, Jawa Barat, Tesis S2, Supervisor: Dr. Dasapta Erwin Irawan dan Dr. Nurjanna Joko Trilaksono.
  • Tanuwijaya, ZAJ, 2014, Identifikasi interaksi air sungai dan air tanah di DAS Cikapundung, Disertasi, Geologi Universitas Padjadjaran.

These slides were made using open-source tools

  • Ubuntu Linux (14.04)
  • R
  • Dia flowcharter
  • Gimp image editor