One of the most relevant consequences of the eruption of volcan Eyjafjoll (in Iceland), in 2010, is the contamination by fluoride. The latter is due to the deposit on the ground of the ash released in the atmosphere during the eruption.
The file “fluoruro.txt” reports the coordinates of 50 measurement sites s_i, i=1,…,50, the corresponding concentrations of fluoride (ppm) F(s_i) and the distance D.s_i of each site s_i from the crater of the volcano. Denoting by delta a zero-mean, second-order stationary and isotropic random field:
Estimate two empirical variograms, assuming the following models: F(s_i)=beta_0+delta(s_i) and F(s_i)=beta_0+beta_1*D.s_i+delta(s_i). Choose the most appropriate model for the observations.
Fit to the empirical variogram at point (a), a Gaussian model without nugget, via weighted least squares. Use as initial parameters: sill=100, range=0.02. Report the estimates of sill and range.
Fit to the empirical variogram chosen at point (a), a spherical model without nugget, via weighted least squares. Report the estimates of sill and range.
Compare the variograms estimated at points (b) and (c), with the empirical variogram at point (a). Given that the ash deposition is known to be a very regular phenomenon, which variogram model is the most appropriate?
Based on model (d), estimate the concentration of fluoride due to the eruption in the city of Raufarhofn (s0 = (0.3; 0.24), D.s0 = 0.1970)
Based on model (d), estimate the concentration of fluoride at the same location, due to a possible new eruption of equivalent intensity, independent of that of 2010. (USE THE MEAN)
df = read.table('C:\\Users\\lucap\\OneDrive\\Desktop\\ANNO 2\\Applied Statistics\\LAB APPLIED STATISTICS\\LAB 6\\fluoruro.txt')
print(df)
## X Y Conc.ppm D
## 1 0.22 0.24 58.4717460 0.18000000
## 2 0.00 0.21 -0.9714840 0.26627054
## 3 0.03 0.00 45.9547483 0.19924859
## 4 0.17 0.12 169.4762921 0.07810250
## 5 0.18 0.04 202.9179258 0.04472136
## 6 0.07 0.24 -0.2130984 0.23430749
## 7 0.27 0.27 38.7726104 0.21587033
## 8 0.32 0.17 90.4124393 0.14866069
## 9 0.08 0.04 112.8388939 0.14142136
## 10 0.14 0.05 177.6114388 0.08062258
## 11 0.25 0.22 75.3715236 0.16278821
## 12 0.24 0.02 215.7660293 0.04472136
## 13 0.02 0.08 32.6546277 0.20099751
## 14 0.11 0.06 138.0455583 0.11000000
## 15 0.38 0.07 92.3495422 0.16031220
## 16 0.09 0.28 -16.2851140 0.25553865
## 17 0.02 0.13 24.8324330 0.21189620
## 18 0.14 0.11 161.8645124 0.09433981
## 19 0.12 0.10 146.1278596 0.10770330
## 20 0.28 0.19 108.0733852 0.14317821
## 21 0.09 0.16 87.8413869 0.16401219
## 22 0.33 0.10 140.1593775 0.11704700
## 23 0.06 0.03 75.2826362 0.16278821
## 24 0.38 0.18 61.8261232 0.20000000
## 25 0.16 0.19 122.6832862 0.14317821
## 26 0.37 0.04 93.1288124 0.15132746
## 27 0.22 0.28 40.7512009 0.22000000
## 28 0.11 0.14 118.8163267 0.13601471
## 29 0.07 0.08 93.4539098 0.15132746
## 30 0.37 0.02 93.0423547 0.15524175
## 31 0.03 0.22 8.5508811 0.24839485
## 32 0.29 0.08 178.0955023 0.07280110
## 33 0.35 0.25 3.7967752 0.23021729
## 34 0.09 0.17 78.4513599 0.17029386
## 35 0.09 0.09 108.6936154 0.13341664
## 36 0.24 0.29 25.7550660 0.23086793
## 37 0.03 0.26 -31.7909147 0.27586228
## 38 0.25 0.15 173.5808031 0.09486833
## 39 0.39 0.18 54.4497595 0.20808652
## 40 0.34 0.18 80.0194483 0.16970563
## 41 0.04 0.28 -35.9040783 0.28425341
## 42 0.27 0.09 196.2377925 0.05830952
## 43 0.40 0.02 74.3956040 0.18439089
## 44 0.14 0.08 153.4082883 0.08246211
## 45 0.17 0.23 68.8767272 0.17720045
## 46 0.23 0.28 49.7325667 0.22022716
## 47 0.33 0.27 8.6094198 0.23706539
## 48 0.37 0.10 87.4608405 0.15524175
## 49 0.19 0.27 20.8716351 0.21213203
## 50 0.34 0.21 55.8641247 0.19209373
I have to specificy to the software what are the coordinates.
library(sp) ## Data management
library(lattice) ## Data management
library(gstat) ## Geostatistics (essential package)
library(geoR) ## Geostatistics (just for some plots)
## --------------------------------------------------------------
## Analysis of Geostatistical Data
## For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
## geoR version 1.9-4 (built on 2024-02-14) is now loaded
## --------------------------------------------------------------
coordinates(df) = c('X','Y')
I have to verify that the type of the variables is correct
str(df)
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 50 obs. of 2 variables:
## .. ..$ Conc.ppm: num [1:50] 58.472 -0.971 45.955 169.476 202.918 ...
## .. ..$ D : num [1:50] 0.18 0.2663 0.1992 0.0781 0.0447 ...
## ..@ coords.nrs : int [1:2] 1 2
## ..@ coords : num [1:50, 1:2] 0.22 0 0.03 0.17 0.18 0.07 0.27 0.32 0.08 0.14 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:50] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:2] "X" "Y"
## ..@ bbox : num [1:2, 1:2] 0 0 0.4 0.29
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "X" "Y"
## .. .. ..$ : chr [1:2] "min" "max"
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## .. .. ..@ projargs: chr NA
It is correct ! We can proceed.
v1 = variogram(Conc.ppm~1, df)
plot(v1)
v2 = variogram(Conc.ppm ~1+ D, df)
plot(v2)
I choose the second one as It has the most reasonable shape in this case, in fact the ash is coming from an explosion. Moreover the delta is isotropic (finite sill).
v.fit1 = fit.variogram(v2,vgm(100,'Gau', 0.02))
plot(v2,v.fit1)
v.fit1$psill
## [1] 96.03334
v.fit1$range
## [1] 0.0217514
v.fit2 = fit.variogram(v2,vgm(100,'Sph',0.02))
plot(v2,v.fit2)
paste('The partial sill is:',v.fit2$psill)
## [1] "The partial sill is: 94.4289524946686"
paste('The range is:',v.fit2$range)
## [1] "The range is: 0.0462548291455416"
Given that the deposition is a regular phenomena I expect that the one that fits the best is the Gaussian one, because it has a smoother behaviour close to zero.
We can compare the fits by computing the Sum of Squared error of the fitting process.
paste('The error for the Gaussian fit is',attr(v.fit1,'SSErr'))
## [1] "The error for the Gaussian fit is 31927827.2986808"
paste('The error for the Spherical fit is',attr(v.fit2,'SSErr'))
## [1] "The error for the Spherical fit is 38303097.1590093"
I take the one with the lowest error and, as expected, is the Gaussian one.
In this case I have to solve the Kriging problem. In fact I have to forecast the value of the Fluoride in a new unobserved position.
The first thing to do is to record this new position.
s0.new = data.frame(X=0.3, Y=0.24, D=0.1970) # UTM coordinates OF THE NEW LOCATION
coordinates(s0.new) = c('X','Y')
g.tr <- gstat(formula = Conc.ppm~1+D, data = df, model = v.fit1)
p1 = predict(g.tr,s0.new)
## [using universal kriging]
paste('The predicted level of Fluoride is', p1$var1.pred)
## [1] "The predicted level of Fluoride is 50.8060424796375"
paste('The variability associated with this prediction is ', p1$var1.var)
## [1] "The variability associated with this prediction is 98.6611862046467"
p.mean = predict(g.tr,s0.new, BLUE = TRUE)
## [generalized least squares trend estimation]
paste('The mean predicted level of Fluoride is', p.mean$var1.pred)
## [1] "The mean predicted level of Fluoride is 50.669031251992"
paste('The variability associated with this prediction is ', p.mean$var1.var)
## [1] "The variability associated with this prediction is 3.07724388197526"