Showing pairs analysis (bivariate analysis)
Why pairs analysis
- equivalent to
correlation matrix
- the fastest way to see correlations between variables
- pls bear in mind
correlation does not always mean causality
Our data
- 295 samples
- From five years periode (1997, 1998, 2007, 2011, 2012)
Load
# load data
data <- as.data.frame(read.csv("BandungData.csv",
header = TRUE))
attach(data)
## The following object is masked from package:datasets:
##
## CO2
Data structure
# data structure
str(data)
## 'data.frame': 295 obs. of 33 variables:
## $ no : int 16 22 263 17 12 18 13 19 14 20 ...
## $ code : int 116 122 8 117 112 118 113 119 114 120 ...
## $ year : int 1997 1997 1997 1997 1997 1997 1997 1997 1997 1997 ...
## $ type : Factor w/ 2 levels "groundwater",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ x : num 785175 785168 799275 785175 785181 ...
## $ y : num 10752836 10752843 10753680 10752840 10752843 ...
## $ distx : num 6897 6904 0 6897 6891 ...
## $ elv : int 1338 1336 1336 1320 1300 1247 1240 1230 1228 1225 ...
## $ aq : Factor w/ 3 levels "breccias","clay",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ zone : Factor w/ 2 levels "eff","inf": 1 1 1 1 1 1 1 1 1 1 ...
## $ ec : num 71.9 71.9 77 71.9 71.9 71.9 71.9 71.9 71.9 71.9 ...
## $ ph : num 6.89 6.89 6.39 6.89 6.89 ...
## $ hard : num 11 11 26.4 11 11 11 11 11 11 11 ...
## $ tds : num 58.7 58.7 50 58.7 58.7 ...
## $ temp : num 21 21 16.1 21 21 ...
## $ eh : num 30 24 -0.45 34 35 32 30 23 24 21 ...
## $ Q : num 1 1 NA 1 1 1 1 1 1 1 ...
## $ Ca : num 1.8 1.8 9.44 1.8 1.8 1.8 1.8 1.8 1.8 1.8 ...
## $ Mg : num 1.7 1.7 0.72 1.7 1.7 1.7 1.7 1.7 1.7 1.7 ...
## $ Fe : num 0.08 0.08 0.216 0.08 0.08 0.08 0.08 0.08 0.08 0.08 ...
## $ Mn : num 0.22 0.22 0 0.22 0.22 0.22 0.22 0.22 0.22 0.22 ...
## $ K : num 1.7 1.7 0.8 1.7 1.7 1.7 1.7 1.7 1.7 1.7 ...
## $ Na : num 5 5 3.2 5 5 5 5 5 5 5 ...
## $ CO3 : num 7 6 0 6 5.8 6.8 6.7 8 8 8.2 ...
## $ HCO3 : num 8 7.8 31.4 6.7 7 ...
## $ CO2 : num 36.3 36.3 7.28 36.3 36.3 36.3 36.3 36.3 36.3 36.3 ...
## $ Cl : num 4.8 4.8 5.52 4.8 4.8 4.8 4.8 4.8 4.8 4.8 ...
## $ SO4 : num 0.6 0.6 0 0.6 0.6 0.6 0.6 0.6 0.6 0.6 ...
## $ NO2 : num 0 0 0.04 0 0 0 0 0 0 0 ...
## $ NO3 : num 4.7 4.7 2.24 4.7 4.7 4.7 4.7 4.7 4.7 4.7 ...
## $ SiO2 : num 23 23 37.8 23 23 ...
## $ cumrain: num 38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 38.5 ...
## $ lag1 : num 32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 32.4 ...
pairs plot 1
Simple command
pairs(data)
pairs plot 1
You would get a plot but it’s:
- ugly, too small
- no legend and axis
- we need to tweak it: group the variables and change plot code
pairs plot 1
pairs(group1,labels=colnames(group1),
main="Physical parameter",
pch=21, bg=c("red", "blue")
[unclass(data$type)],
upper.panel=NULL)
legend(x=0.6, y=0.8, levels(data$type),
pt.bg=c("red", "blue"),
pch=21, bty="n", ncol=2,
horiz=F)
Grouping variables
# group data
# Data group 1: Physical parameters
group1 <- data[,c("x", "y", "elv", "aq", "ec", "ph",
"hard", "tds", "temp", "eh", "Q")]
# Data group 2: Cation
group2 <- data[,c("x", "y", "elv", "Ca",
"Mg", "Fe", "Mn", "K", "Na")]
## Data group 3: Anions (unit = ppm)
group3 = data[,c("x", "y", "CO3", "HCO3",
"CO2", "Cl", "SO4", "NO2",
"NO3", "SiO2")]
Pairs plot group 1 (physical parameters)
Pairs plot group 2 (cations )
Pairs plot group 3 (anions )
Showing PCA analysis (multivariate analysis)
Why PCA (Principle Component Analysis)?
- nature embeds multivariable process
- has been widely used and developed since the 60’s
- simple, straighforward, nearest neighbour (cluster) principles
- offers nice visualisation
[Simple] codes
# install library
install.packages("pcaMethods") # for PCA
install.packages("gridExtra") # for plot lay out
# load library
library(pcaMethods) # for PCA
library(gridExtra) # for plot lay out
# run PCA
pca1 <- pca(group1,
method = "svdImpute",
scale = "uv",
center = T,
nPcs = 3,
evalPcs = 1:3)
[Simple] codes
# evaluate results
summary(pca1) # result summary
sDev(pca1) # extracting eigenvalues
plot(sDev(pca1)) # plotting eigenvalues
loadings(pca1) # plot loadings
scores(pca1) # plot scores
Results: summary PCA1
## svdImpute calculated PCA
## Importance of component(s):
## PC1 PC2 PC3
## R2 0.258 0.1672 0.1257
## Cumulative R2 0.258 0.4251 0.5509
Results: summary PCA2
## svdImpute calculated PCA
## Importance of component(s):
## PC1 PC2 PC3
## R2 0.2891 0.1392 0.1064
## Cumulative R2 0.2891 0.4283 0.5347
Results: summary PCA3
## svdImpute calculated PCA
## Importance of component(s):
## PC1 PC2 PC3
## R2 0.1991 0.1337 0.09922
## Cumulative R2 0.1991 0.3327 0.43194
Results: plot PCA Group 1
Results: plot PCA Group 2
Results: plot PCA Group 3
Results: loadings and scores Group1
Results: loadings and scores Group2
Results: loadings and scores Group3
spatial analysis (bubble plot)
why bubble plot?
- shows spatial variation as well as values distribution
- simple and straigtforward visualisation
[simple] codes
# load library (assuming all libraries are installed)
library(gstat)
library(sp)
library(rgdal)
library(latticeExtra)
# open and load data
df <- read.csv("BandungData.csv", header=TRUE)
# convert xy values as coordinates
coordinates(df) <- ~ x + y
[simple] codes
# make bubble plot
bubbleCa <- bubble(data, zcol="Ca",
xlab="X coord", ylab="Y coord",
main="Bubble plot Ca",
col = data$zone,
scales=list(tck=0.5))
print(bubbleCa)
Process
## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.10.1, released 2013/08/26
## Path to GDAL shared files: /usr/share/gdal/1.10
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: (autodetected)
Results: groundwater end member (Ca)
Results: groundwater end member (Mg)
Results: river end member (Na)
Results: river end member (K)
Results: contamination signature (NO3)
Results: contamination signature (NO2)
Results: contamination signature (SO4)
Results: contamination signature (Cl)
Closing
Future research opportunities:
- to add more data in different locations along river bank, taken in both rain and dry season.
- more exploratory statistical analysis, eg: multiple regression tree to extract data pattern.
Main references
- Lubis, RF and Puradimaja, DJ, 2006, Hydrodynamic relationsships between groundwater and river water: CIkapundung river stream, West Java, Indonesia
- Darul, A, Irawan, DE, and Trilaksono, NJ, 2014a, Groundwater and river water interaction on Cikapundung River: Revisited, International Conference on Math and Natural Sciences, ITB.
- Darul, A, 2014b, Model konseptual interaksi air tanah dan air sungai di bantaran S. Cikapundung, Bandung, Jawa Barat, Tesis S2, Supervisor: Dr. Dasapta Erwin Irawan dan Dr. Nurjanna Joko Trilaksono.
- Tanuwijaya, ZAJ, 2014, Identifikasi interaksi air sungai dan air tanah di DAS Cikapundung, Disertasi, Geologi Universitas Padjadjaran.