Pre-talk quiz

Pls visit www.govote.at enter code 88 74 35, and answer three yes/no questions

Part one: Introduction

Slide license

Creative Commons

need attribution (BY)
for non-commercial purposes only (NC)
can be copied, modified (SA, share-alike)

A bit about me

Personal data

Japan, 2009

Name: Dasapta Erwin Irawan
Job: Lecturer/researcher at Groundwater Engineering Program, ITB
Education (Geology, ITB): 1994-1998 (undergrad), 1999-2001 (Master), 2005-2009 (PhD)

A bit about me

Visiting program

Nov-Dec 2009
- Center for Environmental Remote Sensing (CeRES), Chiba University (2009)
- Supervisor: Prof. Josaphat T.S. Sumantyo
- Research: remote sensing for hydrological purposes
Feb 2014-Feb 2015
- Faculty of Agriculture and Environment, University of Sydney
- Supervisor: Dr. Willem Vervoort
- Research: hydrological modeling in R

A bit about me

Research

hydrogeology
hydrochemistry
multivariate [statistical] analysis

A bit about me

Media social

Website: - R and Linux - Writing - SlideShare

Twitter: @dasaptaerwin

Email: d_erwin_irawan[at]yahoo[dot]com

Skills in Geology

Essential skills for geologist

Software skills (not free version)

office: Microsoft Office (Word, Excel, ppt) -> annual subscription (from USD 10 per month)
citation and referencing: EndNote -> from USD 250
statistical: Minitab, SPSS, Statistica, Stata -> basic version from USD 700 (2012)
spatial / GIS: ArcGIS, Mapinfo, etc -> annual subscription (basic version from USD 100 per year)

Sources:

Software skills (free equivalent)

office: OpenOffice or LibreOffice
citation and referencing: Zotero, Mendeley, etc
statistical: R and R Studio, Orange Data Mining, PSPP, etc
GIS: QGIS, GRASSGIS, R

Why open source?

free as breathing
mostly cross-platform (Linux, Mac, Win)
strong community, hence rapid development
supporting reproducibility

What is reproducibility in science?

Every step can be:

re-do
re-analysed and re-evaluate
re-developed

What is reproducibility in science?

Those principles are applied to:

data (items and locations)
software used in the analyses:
- each software has distinct feature and algorithm
- what would happen if not everyone could purchase the software?

End of part one

Part two: Cikapundung case

Slide license

Creative Commons

need attribution (BY)
for non-commercial purposes only (NC)
can be copied, modified (SA, share-alike)

Background

Kepadatan penduduk di bantaran Cikapundung (panoramio.com)

Background

Cikapundung has important roles:
is one of the major water source for Bandung Basin: WTP Dago Pakar = 40 L/sec
electrical generator (since 1923):
- PLTA Bengkok = 3 MW
- PLTA Dago = 0.7 MW
Drainase kota

Background

Some facts

Background

Vast growth of settlements + landuse change -> declining water quality (both river and groundwater).

$High density spots (Distarcip, 2014)$

High density spots (Distarcip, 2014)

Background

BOD and COD (BPLH, 2006)

What do we know so far?

There types of groundwater and river water interactions (Lubis and Puradimaja 2006)
- isolated stream at Maribaya area (upstream)
- effluent stream (or gaining stream) at Maribaya to Viaduct segment (Bandung central)
- influent stream (or losing stream) from Viaduct to Dayeuhkolot
some facts of springs and seepages at isolated segment (Tanuwijaya 2014).

What do we know so far?

Model Lubis and Puradimaja, 2006

What do we know so far?

Conceptual model to mimic the interactions (Darul et.al 2014^ab)
It confirms the Lubis Model

What do we know so far?

Model 1

What do we know so far?

Model 2

Our question

Does water quality reflect the interactions?

Our tools

R
Let’s do some [simple] analyses

Showing `pairs` analysis (`bi`variate analysis)

Data format

variables or measurements in columns
cases or samples in rows
no merged columns or rows
read also Data is the new soil

Why `pairs` analysis

equivalent to correlation matrix
the fastest way to see correlations between variables
pls bear in mind correlation does not always mean causality

Our data

Data points

Our data

295 samples
From five years periode (1997, 1998, 2007, 2011, 2012)

Load

# load data
data <- as.data.frame(read.csv("BandungData.csv",
                               header = TRUE))
attach(data)

## The following object is masked from package:datasets:
## 
##     CO2

Data structure

# data structure
str(data)

## 'data.frame':    295 obs. of  33 variables:
##  $ no     : int  16 22 263 17 12 18 13 19 14 20 ...
##  $ code   : int  116 122 8 117 112 118 113 119 114 120 ...
##  $ year   : int  1997 1997 1997 1997 1997 1997 1997 1997 1997 1997 ...
##  $ type   : Factor w/ 2 levels "groundwater",..: 1 1 2 1 1 1 1 1 1 1 ...
##  $ x      : num  785175 785168 799275 785175 785181 ...
##  $ y      : num  10752836 10752843 10753680 10752840 10752843 ...
##  $ distx  : num  6897 6904 0 6897 6891 ...
##  $ elv    : int  1338 1336 1336 1320 1300 1247 1240 1230 1228 1225 ...
##  $ aq     : Factor w/ 3 levels "breccias","clay",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ zone   : Factor w/ 2 levels "eff","inf": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec     : num  71.9 71.9 77 71.9 71.9 71.9 71.9 71.9 71.9 71.9 ...
##  $ ph     : num  6.89 6.89 6.39 6.89 6.89 ...
##  $ hard   : num  11 11 26.4 11 11 11 11 11 11 11 ...
##  $ tds    : num  58.7 58.7 50 58.7 58.7 ...
##  $ temp   : num  21 21 16.1 21 21 ...
##  $ eh     : num  30 24 -0.45 34 35 32 30 23 24 21 ...
##  $ Q      : num  1 1 NA 1 1 1 1 1 1 1 ...
##  $ Ca     : num  1.8 1.8 9.44 1.8 1.8 1.8 1.8 1.8 1.8 1.8 ...
##  $ Mg     : num  1.7 1.7 0.72 1.7 1.7 1.7 1.7 1.7 1.7 1.7 ...
##  $ Fe     : num  0.08 0.08 0.216 0.08 0.08 0.08 0.08 0.08 0.08 0.08 ...
##  $ Mn     : num  0.22 0.22 0 0.22 0.22 0.22 0.22 0.22 0.22 0.22 ...
##  $ K      : num  1.7 1.7 0.8 1.7 1.7 1.7 1.7 1.7 1.7 1.7 ...
##  $ Na     : num  5 5 3.2 5 5 5 5 5 5 5 ...
##  $ CO3    : num  7 6 0 6 5.8 6.8 6.7 8 8 8.2 ...
##  $ HCO3   : num  8 7.8 31.4 6.7 7 ...
##  $ CO2    : num  36.3 36.3 7.28 36.3 36.3 36.3 36.3 36.3 36.3 36.3 ...
##  $ Cl     : num  4.8 4.8 5.52 4.8 4.8 4.8 4.8 4.8 4.8 4.8 ...
##  $ SO4    : num  0.6 0.6 0 0.6 0.6 0.6 0.6 0.6 0.6 0.6 ...
##  $ NO2    : num  0 0 0.04 0 0 0 0 0 0 0 ...
##  $ NO3    : num  4.7 4.7 2.24 4.7 4.7 4.7 4.7 4.7 4.7 4.7 ...
##  $ SiO2   : num  23 23 37.8 23 23 ...
##  $ cumrain: num  38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 ...
##  $ lag1   : num  32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 ...

`pairs` plot 1

Simple command

pairs(data)

`pairs` plot 1

You would get a plot but it’s:

ugly, too small
no legend and axis
we need to tweak it: group the variables and change plot code

`pairs` plot 1

pairs(group1,labels=colnames(group1),
      main="Physical parameter", 
      pch=21, bg=c("red", "blue")
      [unclass(data$type)],
      upper.panel=NULL)
legend(x=0.6, y=0.8, levels(data$type), 
        pt.bg=c("red", "blue"), 
        pch=21, bty="n", ncol=2, 
        horiz=F)

Grouping variables

# group data
# Data group 1: Physical parameters
group1 <- data[,c("x", "y", "elv", "aq", "ec", "ph",
                  "hard", "tds", "temp", "eh", "Q")]

# Data group 2: Cation
group2 <- data[,c("x", "y", "elv", "Ca",
                  "Mg", "Fe", "Mn", "K", "Na")]

## Data group 3: Anions (unit = ppm)
group3 = data[,c("x", "y", "CO3", "HCO3",
                 "CO2", "Cl", "SO4", "NO2",
                 "NO3", "SiO2")]

`Pairs` plot group 1 (physical parameters)

plot of chunk unnamed-chunk-4

`Pairs` plot group 2 (cations )

plot of chunk unnamed-chunk-5

`Pairs` plot group 3 (anions )

plot of chunk unnamed-chunk-6

Showing `PCA` analysis (`multi`variate analysis)

Why PCA (Principle Component Analysis)?

nature embeds multivariable process
has been widely used and developed since the 60’s
simple, straighforward, nearest neighbour (cluster) principles
offers nice visualisation

[Simple] codes

# install library
install.packages("pcaMethods") # for PCA
install.packages("gridExtra")  # for plot lay out

# load library
library(pcaMethods) # for PCA
library(gridExtra)  # for plot lay out

# run PCA
pca1 <- pca(group1,  
               method = "svdImpute", 
               scale = "uv",
               center = T,
               nPcs = 3,
               evalPcs = 1:3)

[Simple] codes

# evaluate results
summary(pca1)     # result summary
sDev(pca1)        # extracting eigenvalues
plot(sDev(pca1))  # plotting eigenvalues
loadings(pca1)    # plot loadings
scores(pca1)      # plot scores

Results: summary PCA1

## svdImpute calculated PCA
## Importance of component(s):
##                 PC1    PC2    PC3
## R2            0.258 0.1672 0.1257
## Cumulative R2 0.258 0.4251 0.5509

Results: summary PCA2

## svdImpute calculated PCA
## Importance of component(s):
##                  PC1    PC2    PC3
## R2            0.2891 0.1392 0.1064
## Cumulative R2 0.2891 0.4283 0.5347

Results: summary PCA3

## svdImpute calculated PCA
## Importance of component(s):
##                  PC1    PC2     PC3
## R2            0.1991 0.1337 0.09922
## Cumulative R2 0.1991 0.3327 0.43194

Results: Extract Eigenvalues PCA1

plot of chunk unnamed-chunk-13

Results: plot PCA Group 1

plot of chunk unnamed-chunk-14

Results: plot PCA Group 2

plot of chunk unnamed-chunk-15

Results: plot PCA Group 3

plot of chunk unnamed-chunk-16

Results: loadings and scores Group1

plot of chunk unnamed-chunk-17

Results: loadings and scores Group2

plot of chunk unnamed-chunk-18

Results: loadings and scores Group3

plot of chunk unnamed-chunk-19

spatial analysis (bubble plot)

why bubble plot?

shows spatial variation as well as values distribution
simple and straigtforward visualisation

[simple] codes

# load library (assuming all libraries are installed)
library(gstat)
library(sp)
library(rgdal)
library(latticeExtra)

# open and load data
df <- read.csv("BandungData.csv", header=TRUE)

# convert xy values as coordinates
coordinates(df) <- ~ x + y

[simple] codes

# make bubble plot
bubbleCa <- bubble(data, zcol="Ca", 
                   xlab="X coord", ylab="Y coord",
                   main="Bubble plot Ca",
                   col = data$zone, 
                   scales=list(tck=0.5))
print(bubbleCa)

Process

## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.10.1, released 2013/08/26
## Path to GDAL shared files: /usr/share/gdal/1.10
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: (autodetected)

Results: groundwater end member (Ca)

plot of chunk unnamed-chunk-21

Results: groundwater end member (Mg)

plot of chunk unnamed-chunk-22

Results: river end member (Na)

plot of chunk unnamed-chunk-23

Results: river end member (K)

plot of chunk unnamed-chunk-24

Results: contamination signature (NO₃)

plot of chunk unnamed-chunk-25

Results: contamination signature (NO₂)

plot of chunk unnamed-chunk-26

Results: contamination signature (SO₄)

plot of chunk unnamed-chunk-27

Results: contamination signature (Cl)

plot of chunk unnamed-chunk-28

Remarks

higher mineral concentration in river water than groundwater should have occured in effluent flow.
higher mineral concentration in groundwater than river water should have occured in influent flow.
both natural indications are not detected, except for NO₂.

Remarks

the anomaly is due to dilution effect.
dilution overides enrichment effect.
the opposite would happen if sampling is conducted in dry season.
possibility of different catchment between groundwater and river water.

Closing

Future research opportunities:

to add more data in different locations along river bank, taken in both rain and dry season.
more exploratory statistical analysis, eg: multiple regression tree to extract data pattern.

Main references

Lubis, RF and Puradimaja, DJ, 2006, Hydrodynamic relationsships between groundwater and river water: CIkapundung river stream, West Java, Indonesia
Darul, A, Irawan, DE, and Trilaksono, NJ, 2014^a, Groundwater and river water interaction on Cikapundung River: Revisited, International Conference on Math and Natural Sciences, ITB.
Darul, A, 2014^b, Model konseptual interaksi air tanah dan air sungai di bantaran S. Cikapundung, Bandung, Jawa Barat, Tesis S2, Supervisor: Dr. Dasapta Erwin Irawan dan Dr. Nurjanna Joko Trilaksono.
Tanuwijaya, ZAJ, 2014, Identifikasi interaksi air sungai dan air tanah di DAS Cikapundung, Disertasi, Geologi Universitas Padjadjaran.

These slides were made using open-source tools

Ubuntu Linux (14.04)
R
Dia flowcharter
Gimp image editor

Geological analysis with open-source software: case Cikapundung River

Event: Sarasehan Geologi Populer, Badan Geologi Indonesia

Dasapta Erwin Irawan

16th March 2015

Pre-talk quiz

Part one: Introduction

Slide license

A bit about me

A bit about me

A bit about me

A bit about me

Skills in Geology

Essential skills for geologist

Software skills (not free version)

Software skills (free equivalent)

Why open source?

What is reproducibility in science?

What is reproducibility in science?

End of part one

Part two: Cikapundung case

Slide license

Background

Background

Background

Background

Background

What do we know so far?

What do we know so far?

What do we know so far?

What do we know so far?

What do we know so far?

Our question

Our tools

Showing pairs analysis (bivariate analysis)

Data format

Why pairs analysis

Our data

Our data

Load

Data structure

pairs plot 1

pairs plot 1

pairs plot 1

Grouping variables

Pairs plot group 1 (physical parameters)

Pairs plot group 2 (cations )

Pairs plot group 3 (anions )

Showing PCA analysis (multivariate analysis)

Why PCA (Principle Component Analysis)?

[Simple] codes

[Simple] codes

Results: summary PCA1

Results: summary PCA2

Results: summary PCA3

Results: Extract Eigenvalues PCA1

Results: plot PCA Group 1

Results: plot PCA Group 2

Results: plot PCA Group 3

Results: loadings and scores Group1

Results: loadings and scores Group2

Results: loadings and scores Group3

spatial analysis (bubble plot)

why bubble plot?

[simple] codes

[simple] codes

Process

Results: groundwater end member (Ca)

Results: groundwater end member (Mg)

Results: river end member (Na)

Results: river end member (K)

Results: contamination signature (NO3)

Results: contamination signature (NO2)

Results: contamination signature (SO4)

Results: contamination signature (Cl)

Remarks

Remarks

Closing

Main references

These slides were made using open-source tools

More resources

Showing `pairs` analysis (`bi`variate analysis)

Why `pairs` analysis

`pairs` plot 1

`pairs` plot 1

`pairs` plot 1

`Pairs` plot group 1 (physical parameters)

`Pairs` plot group 2 (cations )

`Pairs` plot group 3 (anions )

Showing `PCA` analysis (`multi`variate analysis)

Results: contamination signature (NO₃)

Results: contamination signature (NO₂)

Results: contamination signature (SO₄)