Differences between Worldclim and local data

The following SQL query was run before moving to R.

create view worldclim_comp as
select st.altitud,anprec, maxyr,minyr,nyrs,st.id,st.geom, (bioclim(st.geom)).* from
cna.estclim st,

(select station, avg(sum) anprec, count(yr) nyrs, max(yr) maxyr,min(yr) minyr from
(select * from
(select station, extract(year from fecha)  yr,sum(prec), count(fecha) nrecs from 
cna.prec
group by 
station,extract(year from fecha)) d

where nrecs >360) f
group by station) sq
where sq.station=st.id_estacio

The results are imported using RODBC.

library(RODBC)
con <- odbcConnect("gisdb")
d <- sqlQuery(con, "select * from worldclim_comp")

Now to look at at the percentage difference between Worldclim values and the yearly averages for precipitation in the CNA data we just need to divide one column by the other. Anprec is the CNA version and PrecAnnual comes from Worldclim.

boxplot(100 * d$anprec/d$precanual, col = "grey")

plot of chunk unnamed-chunk-3

This does not look too bad for Worldclim. Most of the values are within 20% of Worldclim. Hijmans coverage seems to be reliable enough to suggest that the outliers are more likely to be the result of anomalies (which may be genuine and interestin or simply mistakes) in the CNA data.

We can look at a more robust subset by only using data from stations with many records.

d1 <- subset(d, nyrs > 30)
str(d1)

## 'data.frame':    60 obs. of  12 variables:
##  $ altitud     : num  1775 1775 1240 675 620 ...
##  $ anprec      : num  896 896 1684 1015 4230 ...
##  $ maxyr       : int  2005 2005 2005 2004 2005 2005 2003 1970 1999 1990 ...
##  $ minyr       : int  1945 1945 1943 1958 1956 1946 1962 1923 1962 1946 ...
##  $ nyrs        : int  32 32 57 42 49 51 36 31 35 36 ...
##  $ id          : int  4972 4971 4852 5099 5160 5175 5184 4891 5118 4786 ...
##  $ geom        : Factor w/ 230 levels "0101000020E610000000D312F0574557C0979B154884953140",..: 42 42 228 80 151 172 106 190 75 139 ...
##  $ elev        : int  1889 1889 1262 622 901 485 69 568 707 2111 ...
##  $ tempannual  : num  16.8 16.8 20.4 24.3 23.8 26.1 27.3 24.7 24.2 15.2 ...
##  $ mesmasfrio  : num  7.7 7.7 11.3 14.7 15.6 18.8 18.7 15.7 14.5 6.4 ...
##  $ mesmascalido: num  25.7 25.7 29.6 33.8 31.7 33.6 35.3 33.3 33.8 23.3 ...
##  $ precanual   : int  995 995 1677 1264 4034 4513 1908 925 1635 1494 ...

boxplot(100 * d1$anprec/d1$precanual, col = "grey")

plot of chunk unnamed-chunk-4

In this case the match is quite convincing (using 60 climate stations in Chiapas).

Differences in elevation

These wil usually be due to human error unless the points fall in an area with very abrupt topogrpahy where 500m laterally can reslut in a grest change in elevation.


elevdif <- d$elev - d$altitud

boxplot(elevdif, col = "grey")

plot of chunk unnamed-chunk-5

This does need some further work to clarify problems with the data.

What is interesting is that in the raw data the two errors are only slightly correlated. Although a few mistakes lead to a statistically significant corelation, for most cases it appears that some other process has led to the discrpepancy.

precdif <- d$anprec/d$precanual
plot(precdif ~ elevdif, pch = 21, bg = 2)

plot of chunk unnamed-chunk-6

cor.test(precdif, elevdif, method = "spearman")

## 
##  Spearman's rank correlation rho
## 
## data:  precdif and elevdif 
## S = 2148848, p-value = 0.01474
## alternative hypothesis: true rho is not equal to 0 
## sample estimates:
##    rho 
## 0.1547