Anything after a hash sign “#” is considered “not R code” and it is not processed by R. Thus, the hash sign “#” is used to include comments and sometimes to prevent the execution (we often say “comment out”) of particular command lines that we want to keep in the script rather than to delete (i.e., to indicate an alternative way of doing a task).
In the R script of this page, you can observe than we often use the hash sign followed by an apostrophe in this way “#’”. This is a particular way of commenting that I will explain in future sessions (it has to do with the way of producing html pages such as the one you are reading out of R scripts).
I assume here that you have installed both R and RStudio Desktop on your computer.
It is very important having a well organized directory in R. Please create an specific directory for the course, (i.e. Rcourse or RtalleR or whatever) and a subdirectory named RLab1. Furthermore, within RLab1 create a subdirectory named RLab1Data and copy this file in it:
https://dl.dropboxusercontent.com/u/3180464/taller/RLab1/mikola.csv
Avoid using the Desktop for the directory of the course. Better look for a rational location in your directory tree.
In general, avoid as much as possible having empty spaces embeded within file or directory names (better use “_" if you absolutely want to separate words in a name).
We will be always using R through RStudio in this talleR. RStudio automatically starts R for you, please do not start both RStudio and then R on your own.
The RStudio GUI was briefly explained during the session and we will progressively learn more features along the sessions. RStudio is a pretty straigthforward and minimalist GUI, but if you need a guide Oscar Torres-Reyna from Princeton University has authored an excellent one: Introduction to RStudio
At the begining of the seesion, we started up RStudio, created a new project in your RLab1 directory (which we will call “R working directory” hereafter), and in few minutes learned:
It is much shorter and more clear explaining the above issues “in vivo” than writing how to do it here. If you missed the session, please ask any of those who actually attended.
From now on in this page, I assume you have started RStudio and are working on your script named RLab1.R in your RLab1.Rproj project.
Numeric
a <- 12
The symbol “<-” stands for “assign”. Nowadays, “<-” has the same effect as “=”
a = 12
a
## [1] 12
Note that executing an object name as a command, has printed itself to the console. To check the type of object:
class(a)
## [1] "numeric"
You can also use simple mathematical operators:
b = a^2+3
x <- sqrt(a) + b
Use ? or help() to get help. In the case of the arithmetic operators, you need “”:
?"+"
?sqrt
help(class)
Character
a <- "DIOGENES"
a
## [1] "DIOGENES"
class(a)
## [1] "character"
In future sessions, we will always include the path of the working directory in which the script was designed to work at the beginning. You can get the path of your current directory with
getwd()
## [1] "/media/alobo/KINGSTON/Rictja/RLab1"
and then save it as a character variable
rwd <- "/Volumes/KINGSTON/Rictja/RLab1/.RData"
so that you keep documented the context in which you wrote your script.
a <- 1:10
a
## [1] 1 2 3 4 5 6 7 8 9 10
a <- seq(from=1,to=10,by=1)
#a <- seq(from=1,to=10,by=0.1)
a <- c(1,2,3,4,5,6,7,8,9,10)
length(a)
## [1] 10
summary(a)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 3.25 5.50 5.50 7.75 10.00
sum(a)/length(a)
## [1] 5.5
mean(a)
## [1] 5.5
sd(a)
## [1] 3.028
Note that functions use () and have named arguments
Names of the arguments can be omitted but then arguments must be in order We always use “=” (not “<-”) for named arguments in the function call
seq(from=1,to=3,by=.5)
## [1] 1.0 1.5 2.0 2.5 3.0
seq(1,3,.5)
## [1] 1.0 1.5 2.0 2.5 3.0
seq(by=0.5,from=1,to=3)
## [1] 1.0 1.5 2.0 2.5 3.0
seq(0.5,1,3) #wrong order
## [1] 0.5
?seq
In particular for beginners, it is better to name the arguments in the call, except for very simple functions
Character vectors
x <- c("A","B")
x
## [1] "A" "B"
class(x)
## [1] "character"
x <- as.character(a)
x
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
vectors must have all elements of the same type
x <- c(a,"A") #numeric "upgraded" to character
x
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "A"
class(x)
## [1] "character"
Forcing a character to numeric renders a NA (“Not Available”, code for missing data)
x <- as.numeric(x)
## Warning: NAs introduced by coercion
x
## [1] 1 2 3 4 5 6 7 8 9 10 NA
We use “[]”, not “()”: this indicates to us that the object is data and not a function
x[3]
## [1] 3
x[c(2,5,1)]
## [1] 2 5 1
ind <- c(3,1,4)
ind
## [1] 3 1 4
x[ind]
## [1] 3 1 4
x[3] == 6
## [1] FALSE
x[x>5]
## [1] 6 7 8 9 10 NA
x[which(x>5)]
## [1] 6 7 8 9 10
NA particularities
x[11]
## [1] NA
length(x)
## [1] 11
x[length(x)]
## [1] NA
x[11] == NA
## [1] NA
is.na(x[11])
## [1] TRUE
any(is.na(x))
## [1] TRUE
which(is.na(x))
## [1] 11
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 3.25 5.50 5.50 7.75 10.00 1
x +1
## [1] 2 3 4 5 6 7 8 9 10 11 NA
mean(x)
## [1] NA
mean(x,na.rm=TRUE)
## [1] 5.5
x <- x[!is.na(x)]
x
## [1] 1 2 3 4 5 6 7 8 9 10
shorter vector
x2 <- x[3:7]
x2 <- x[-c(1,2)]
a <- 1:10
b <- sqrt(a)
plot(a,b)
plot(a,b,xlab="a", ylab="b = sqrt(a)", main="My first plot in R")
plot(a,b,xlab="a", ylab="b = sqrt(a)",
xlim=c(0,15), ylim= c(0,10),
main="My first plot in R")
b2 <- b +3
points(a,b2,pch=19)
lines(a,b2)
legend("topright",pch=c(1,19), legend=c("b","b2"))
dev.off()
## null device
## 1
To check symbols and colors
plot(1:20,1:20,pch=1:20,col=1:20)
Let’s use the quadratic equation as an example:
a <- 3
b <- -5
c <- 1
The roots are:
x1 <- (-b + sqrt(b^2 - 4*a*c))/(2*a)
x2 <- (-b - sqrt(b^2 - 4*a*c))/(2*a)
Quick plot:
x <- seq(from=-10, to=10, by=0.1)
plot(x,a*x^2 + b*x + c,type="l",ylim=c(-10,10))
abline(a=0,b=0)
abline(h=0)
abline(v=x1,col="red")
abline(v=x2,col="green")
Now let’s create your first script (actually your second one, as your first one is this one that you are writing now).
Copy all lines since the last heading to another script file and save it as “quad_log.R” Then let’s clear the graphics
dev.off()
## null device
## 1
…and the variables
rm(x,x1,x2,a,b,c)
Check that these variable do not exist any more in your workspace, i.e.
a
## Error: object 'a' not found
and “source” the script file:
source("quad_log.R")
Note that you get the plot and that the variables are back to your workspace:
x1
## [1] 1.434
x2
## [1] 0.2324
And now let’s create your first function: copy the follwing lines to another script file and save it as “miquad.R” (and remove the first “#” of each line)
# miquad <- function(v=c(3,-5,1))
# # quadratic function for a vector of 3 real parameters
# {
# v[1]*x^2 + v[2]*x + v[3]
# }
As before:
dev.off()
## null device
## 1
rm(x,x1,x2,a,b,c)
source("miquad.R")
But now the source() has got different consequences:
miquad
## function(v=x, p=c(3,-5,1))
## # quadratic function for a vector of 3 real parameters
## {
## p[1]*v^2 + p[2]*v + p[3]
## }
miquad()
## Error: object 'x' not found
Omitting the “()” implies that R just prints out the contents of the object, which in this case is a function. Adding the “()” has exectuted the function, but we get en error because the function requires an argument (v) that defaults to x, and optionally accepts a second argument (p). Let’s create x again and run miquad() with it:
x <- seq(from=-10, to=10, by=0.1)
xq <- miquad(v=x)
plot(x,miquad(x),type="l")
plot(x,miquad(x,p=c(2,9,11)),type="l")
p
## Error: object 'p' not found
v
## Error: object 'v' not found
The function just returns to the user workspace whatever it is in its last line (you can use return() to make it clear if you want), but internal (to the function) variables such as p or v in this case just vanish after the function completes.
#set.seed(10)
a <- rnorm(n=1000,m=2,sd=2.5)
plot(1:1000,a)
hist(a)
mean(a)
## [1] 2.073
sd(a)
## [1] 2.496
b <- runif(n=10000,min=min(a), max=max(a))
hist(b,nclass=50)
Robust statistics
b <- a
b[200] <- 1000*a[200] #make an outlier
summary(a)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.85 0.44 2.10 2.07 3.76 9.14
summary(b)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.8 0.4 2.1 3.2 3.8 1180.0
hist(a)
hist(b)
In the presence of outliers, median and mad are better estimators of the “true” mean and sd
mean(a)
## [1] 2.073
median(a)
## [1] 2.096
mean(b)
## [1] 3.252
median(b)
## [1] 2.098
(mean(b)-mean(a))*100/mean(a)
## [1] 56.82
(median(b)-mean(a))*100/mean(a)
## [1] 1.164
(sd(b)-sd(a))*100/sd(a)
## [1] 1395
(mad(b)-sd(a))*100/sd(a)
## [1] -1.121
Assume we have two populations:
a <- rnorm(n=100000,m=2,sd=2.5)
b <- rnorm(n=100000,m=2.5,sd=2.5)
and make 3 samples:
a1 <- sample(a,size=1000)
a2 <- sample(a,size=1000)
b1 <- sample(b,size=1000)
summary(a)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -9.450 0.313 2.000 2.000 3.700 13.200
summary(a1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -7.16 0.49 2.18 2.15 3.92 10.00
summary(a2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.800 0.339 2.000 2.030 3.870 8.920
summary(b1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.130 0.783 2.630 2.560 4.430 10.100
Tests: H0 The difference between the 2 means is 0
t.test(a1,a2) #we cannot reject H0
##
## Welch Two Sample t-test
##
## data: a1 and a2
## t = 1.113, df = 1998, p-value = 0.2657
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.09547 0.34620
## sample estimates:
## mean of x mean of y
## 2.153 2.028
t.test(a1,b1) #we can reject H0
##
## Welch Two Sample t-test
##
## data: a1 and b1
## t = -3.561, df = 1995, p-value = 0.000378
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.6384 -0.1849
## sample estimates:
## mean of x mean of y
## 2.153 2.565
hist(a1)
hist(a2,col="red",add=TRUE)
Not very useful as bars overlap Try this
hist(a1,col=alpha("blue", 0.5))
## Error: could not find function "alpha"
You get an error because alpha() is not in the “base” or “core” of R Load the package “scales”
require("scales")
## Loading required package: scales
You get again an error if you have not installed package “scales”. Install with Tools/Install Packages…
hist(a1,col=alpha("blue", 0.5))
hist(b1,col=alpha("green", 0.5),add=TRUE)
Set of vectors of the same length. As vectors, all elements must be of the same type
A <- cbind(a1,a2, b1)
A[10,3]
## b1
## 2.522
A[10,]
## a1 a2 b1
## 6.195 2.838 2.522
#A[,2]
class(A)
## [1] "matrix"
dim(A)
## [1] 1000 3
nrow(A)
## [1] 1000
ncol(A)
## [1] 3
length(A)
## [1] 3000
rownames(A)
## NULL
colnames(A)
## [1] "a1" "a2" "b1"
colnames(A) <- c("samp1","samp2","samp3")
head(A,3)
## samp1 samp2 samp3
## [1,] 3.135 4.735 3.436
## [2,] 5.499 5.401 4.147
## [3,] 1.989 -2.738 0.270
tail(A)
## samp1 samp2 samp3
## [995,] -0.6071 4.9520 1.3484
## [996,] 2.2766 -0.2093 5.3785
## [997,] 0.6017 -1.5982 0.6143
## [998,] -0.2809 2.1062 4.5710
## [999,] 1.4591 -0.2311 4.5802
## [1000,] 1.9465 -0.7089 1.4442
B <- A^2 #Note you must press ^ space, not just ^; also: B <- A**2
Note the effect of the ^2 in the distribution
hist(A[,1])
hist(B[,1])
Let’s keep the normality:
B <- A*2
We cannot refer to rows or cols that do not exist:
B[,4] <- 0.5*B[,3] + 10 + rnorm(n=nrow(B), m=0,sd=1)
## Error: subscript out of bounds
Correct way:
B <- cbind(B, 0.5*B[,3] + 10 + rnorm(n=nrow(B), m=0,sd=1))
head(B,3)
## samp1 samp2 samp3
## [1,] 6.270 9.470 6.872 13.65
## [2,] 10.998 10.801 8.293 15.61
## [3,] 3.978 -5.476 0.540 10.19
For rows:
B <- rbind(c(1:4),B)
Removing rows or cols:
B <- B[-1,] #remove first row
Removing last column:
B <- cbind(B,1) #add column of 1: R will repeat for us
dim(B)
## [1] 1000 5
B <- B[,-ncol(B)] #remove last column
dim(B)
## [1] 1000 4
summary(B)
## samp1 samp2 samp3 V4
## Min. :-14.322 Min. :-13.591 Min. :-12.26 Min. : 2.92
## 1st Qu.: 0.981 1st Qu.: 0.678 1st Qu.: 1.57 1st Qu.:10.56
## Median : 4.354 Median : 4.008 Median : 5.26 Median :12.70
## Mean : 4.307 Mean : 4.056 Mean : 5.13 Mean :12.58
## 3rd Qu.: 7.843 3rd Qu.: 7.730 3rd Qu.: 8.86 3rd Qu.:14.50
## Max. : 20.009 Max. : 17.840 Max. : 20.13 Max. :21.56
Quick Plots
plot(x=B[,3],y=B[,4],pch=19)
plot(x=B[,3],y=B[,4],pch=19,col=alpha("black",0.25),cex=0.7)
Effect of outliers: Before including outliers:
summary(B)
## samp1 samp2 samp3 V4
## Min. :-14.322 Min. :-13.591 Min. :-12.26 Min. : 2.92
## 1st Qu.: 0.981 1st Qu.: 0.678 1st Qu.: 1.57 1st Qu.:10.56
## Median : 4.354 Median : 4.008 Median : 5.26 Median :12.70
## Mean : 4.307 Mean : 4.056 Mean : 5.13 Mean :12.58
## 3rd Qu.: 7.843 3rd Qu.: 7.730 3rd Qu.: 8.86 3rd Qu.:14.50
## Max. : 20.009 Max. : 17.840 Max. : 20.13 Max. :21.56
t.test(B[,1],B[,3])
##
## Welch Two Sample t-test
##
## data: B[, 1] and B[, 3]
## t = -3.561, df = 1995, p-value = 0.000378
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.2767 -0.3699
## sample estimates:
## mean of x mean of y
## 4.307 5.130
Include outliers:
Boutl <- B
Boutl[c(10,100,1000),1] <- runif(n=3,min=50,max=500)
Boutl[c(10,100,1000),1]
## [1] 168.6 284.1 406.4
summary(Boutl)
## samp1 samp2 samp3 V4
## Min. :-14.3 Min. :-13.591 Min. :-12.26 Min. : 2.92
## 1st Qu.: 1.0 1st Qu.: 0.678 1st Qu.: 1.57 1st Qu.:10.56
## Median : 4.4 Median : 4.008 Median : 5.26 Median :12.70
## Mean : 5.1 Mean : 4.056 Mean : 5.13 Mean :12.58
## 3rd Qu.: 7.9 3rd Qu.: 7.730 3rd Qu.: 8.86 3rd Qu.:14.50
## Max. :406.4 Max. : 17.840 Max. : 20.13 Max. :21.56
t.test(Boutl[,1],Boutl[,3])
##
## Welch Two Sample t-test
##
## data: Boutl[, 1] and Boutl[, 3]
## t = 0.012, df = 1187, p-value = 0.9904
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.103 1.116
## sample estimates:
## mean of x mean of y
## 5.137 5.130
boxplot(Boutl[,1:2],notch=TRUE)
boxplot(Boutl[,1:2],notch=TRUE,ylim=c(0,30))
Statistical detection of outliers
plot(quantile(Boutl[,1],probs=seq(0,1,0.1)),seq(0,1,0.1),
xlab="percentile",
ylab="probability")
Assume median and mad are good estimates of undisturbed mean and sd
mean(B[,1])
## [1] 4.307
mean(Boutl[,1])
## [1] 5.137
median(Boutl[,1])
## [1] 4.368
sd(B[,1])
## [1] 5.071
sd(Boutl[,1])
## [1] 17.09
mad(Boutl[,1])
## [1] 5.112
Then exclude very extreme values only, i.e. 4sd
pnorm(4,m=0,sd=1) #prob of being within m-4sd and m+4sd in the Gaussian
## [1] 1
See http://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule
minval1 <- median(Boutl[,1])-4*mad(Boutl[,1])
maxval1 <- median(Boutl[,1])+4*mad(Boutl[,1])
Review candidate values to be NA: accept only if theoretically justified
Boutl[Boutl[,1]<minval1,1]
## numeric(0)
Boutl[Boutl[,1]>maxval1,1]
## [1] 168.6 284.1 406.4
set to NA
Boutl[Boutl[,1]<minval1,1] <- NA
Boutl[Boutl[,1]>maxval1,1] <- NA
summary(Boutl[,1])
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -14.300 0.979 4.340 4.290 7.820 20.000 3
Test now?
t.test(Boutl[,1],Boutl[,3])
##
## Welch Two Sample t-test
##
## data: Boutl[, 1] and Boutl[, 3]
## t = -3.63, df = 1992, p-value = 0.0002905
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.2929 -0.3859
## sample estimates:
## mean of x mean of y
## 4.29 5.13
would this be correct? No! We must apply the same rule to the other distribution
minval3 <- median(Boutl[,3])-4*mad(Boutl[,3])
maxval3 <- median(Boutl[,3])+4*mad(Boutl[,3])
Boutl[Boutl[,3]<minval3,3]
## numeric(0)
Boutl[Boutl[,3]>maxval3,3]
## numeric(0)
Boutl[Boutl[,3]<minval3,3] <- NA
Boutl[Boutl[,3]>maxval3,3] <- NA
summary(Boutl[,3])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -12.30 1.57 5.26 5.13 8.86 20.10
Test now
t.test(Boutl[,1],Boutl[,3])
##
## Welch Two Sample t-test
##
## data: Boutl[, 1] and Boutl[, 3]
## t = -3.63, df = 1992, p-value = 0.0002905
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.2929 -0.3859
## sample estimates:
## mean of x mean of y
## 4.29 5.13
#There are specific tools for outlier detection in R:
# packages "extremevalues", "outliers"
# http://www.dbs.ifi.lmu.de/~zimek/publications/KDD2010/kdd10-outlier-tutorial.pdf
multi-dimensional matrices We skip this by now.
Basic class for experimental data in R: as a matrix, but each column vector (“variable”) can be of different type. Let’s read an actual example from a data file:
dirdata <- "/Volumes/KINGSTON/Rictja/RLab1/RLab1Data" #NOTE: the actual path depends on your system
#For example, this could work:
dirdata <- "./RLab1Data"
kola <- read.csv(file.path(dirdata,"mikola.csv"),stringsAsFactors=FALSE)
#mg/kg
class(kola)
## [1] "data.frame"
dim(kola)
## [1] 1816 11
nrow(kola)
## [1] 1816
head(kola)
## ID XCOO YCOO ELEV H LITO Al K Pb S Cu
## 1 1 547960 7693790 135 M 20 71.2 4360 1.50 783 5.63
## 2 2 770025 7679167 140 M 4 245.0 4040 4.61 880 20.40
## 3 3 498651 7668151 255 M 31 103.0 4060 2.43 809 6.65
## 4 4 795152 7569386 240 M 20 307.0 4770 3.83 894 41.70
## 5 5 437050 7855900 80 M 10 253.0 4220 2.65 714 3.85
## 6 6 752106 7626023 140 M 20 238.0 5560 3.18 953 28.90
tail(kola)
## ID XCOO YCOO ELEV H LITO Al K Pb S Cu
## 1811 786 672583 7430000 160 C 1 9510 1000 1.5 24 17.8
## 1812 901 625409 7700000 210 C 52 6090 400 1.4 27 15.2
## 1813 902 657495 7740000 90 C 21 21300 3000 7.5 66 38.3
## 1814 903 685682 7740000 170 C 9 22600 1800 20.2 48 36.7
## 1815 904 683207 7730000 180 C 4 6310 1400 2.8 68 19.8
## 1816 905 680047 7720000 200 C 21 8430 2000 3.9 46 19.5
summary(kola)
## ID XCOO YCOO ELEV
## Min. : 1 Min. :372602 Min. :7370000 Min. : 15
## 1st Qu.:185 1st Qu.:498651 1st Qu.:7510000 1st Qu.:140
## Median :383 Median :600600 Median :7610000 Median :200
## Mean :386 Mean :602356 Mean :7611125 Mean :204
## 3rd Qu.:579 3rd Qu.:700690 3rd Qu.:7700000 3rd Qu.:260
## Max. :906 Max. :861309 Max. :7890000 Max. :540
##
## H LITO Al K
## Length:1816 Min. : 1.0 Min. : 34 Min. : 100
## Class :character 1st Qu.: 4.0 1st Qu.: 290 1st Qu.: 900
## Mode :character Median : 20.0 Median : 1975 Median : 1300
## Mean : 21.2 Mean : 5193 Mean : 2257
## 3rd Qu.: 31.0 3rd Qu.: 7065 3rd Qu.: 3800
## Max. :107.0 Max. :85900 Max. :11000
## NA's :74
## Pb S Cu
## Min. : 0.3 Min. : 2 Min. : 2
## 1st Qu.: 1.9 1st Qu.: 46 1st Qu.: 6
## Median : 3.7 Median : 860 Median : 11
## Mean : 10.2 Mean : 830 Mean : 27
## 3rd Qu.: 14.9 3rd Qu.:1370 3rd Qu.: 21
## Max. :1110.0 Max. :3830 Max. :4080
## NA's :1
table(kola$LITO)
##
## 1 4 6 7 9 10 20 21 22 31 32 51 52 81 82 83 105 107
## 369 70 25 118 68 145 291 67 45 125 101 190 86 3 17 11 8 3
kola$LITO <- as.character(kola$LITO)
head(kola,3)
## ID XCOO YCOO ELEV H LITO Al K Pb S Cu
## 1 1 547960 7693790 135 M 20 71.2 4360 1.50 783 5.63
## 2 2 770025 7679167 140 M 4 245.0 4040 4.61 880 20.40
## 3 3 498651 7668151 255 M 31 103.0 4060 2.43 809 6.65
boxplot(Al~H,data=kola)
Observe 2 problems: order of H is not correct and ylim must be adjusted
kola$H <- factor(kola$H, levels= c("M","0","C"))
boxplot(Al~H,data=kola,ylim=c(0,20000))
#Actually, distributions are log-normal:
hist(kola$Al)
boxplot(Al~H,data=kola, log="y", main="Al")
boxplot(Pb~H,data=kola, log="y",main="Pb")
boxplot(S~H, data=kola, log="y",main="S")
boxplot(Cu~H,data=kola, log="y",main="Cu")
To have all 4 plots in the same page:
par(mfrow=c(2,2))
boxplot(Al~H,data=kola, log="y", main="Al")
boxplot(Pb~H,data=kola, log="y",main="Pb")
boxplot(S~H, data=kola, log="y",main="S")
boxplot(Cu~H,data=kola, log="y",main="Cu")
To save the plot you can use Export or do it by command:
bmp("kolaplots.bmp",height=512*4/5, width=512)
par(mfrow=c(2,2))
boxplot(Al~H,data=kola, log="y", main="Al")
boxplot(Pb~H,data=kola, log="y",main="Pb")
boxplot(S~H, data=kola, log="y",main="S")
boxplot(Cu~H,data=kola, log="y",main="Cu")
dev.off()
## pdf
## 2