Author: Andrzej Jarynowski, IVEB FU Berlin

1. Load the data and introduction

Loading and preparing the unofficial data - suspected 87 case submitted by animal owners. Participatory epidemiology has been largely employed for the control and early warning of infectious diseases spread within resource-limited settings, and also being popular method to gather data about COVID-19 in Poland. This group https://www.facebook.com/groups/1382403905988672 formed a passive surveillance system.

2. Data exploration

2.1 Spatial component

colourScale = colorRampPalette(brewer.pal(9,"OrRd"))(16)[1:16]
cols = colourScale[zlicz$cases_per_million+1]
plot(contour, lwd=0.4, border="gray30", col=NA); plot(nutsM, col=cols , lwd=0.1, add=T)
rast = raster(as.matrix(c(min(zlicz$cases_per_million,na.rm=T),max(zlicz$cases_per_million,na.rm=T))))
title("Suspected cases per milion citizens", side=3, line=-1.5, at=0, cex=0.5, col="gray30")

## Warning in title("Suspected cases per milion citizens", side = 3, line = -1.5,
## : 'side' nie jest parametrem graficznym

## Warning in title("Suspected cases per milion citizens", side = 3, line = -1.5,
## : 'at' nie jest parametrem graficznym

plot(rast, legend.only=T, add=T, col=colourScale, legend.width=0.5, legend.shrink=0.3, smallplot=c(0.870,0.885,0.3,0.8), adj=3,
     axis.args=list(cex.axis=0.65, lwd=0, col="gray30", lwd.tick=0.2, col.tick="gray30", tck=-1.0, col.axis="gray30", line=0, mgp=c(0,0.45,0)), alpha=1, side=3)

We see that most affected region from participatory ample is Pomeranian V. and secondary Kuj-Pom V. and Lower Silesia V.

2.2 Temporal component

Sys.setlocale(category="LC_ALL", locale="US")

## Warning in Sys.setlocale(category = "LC_ALL", locale = "US"): using locale code
## page other than 65001 ("UTF-8") may cause problems

## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

cats$week=week(cats$date)
cats$name=1
p<-ggplot(cats, aes(name,date)) +
  geom_violin(scale="count")+ ggtitle("Smoothed suspected Cases intensity in time")+
  theme_bw()
p

p=ggplot(cats, aes(x=week)) +
  geom_histogram(fill="black", alpha=1, position="identity")+ 
  ggtitle("Weekly histogram of suspected Cases")+
  theme_bw()
p

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p=ggplot(cats, aes(x=date)) +
  geom_histogram(fill="black", alpha=1, position="identity")+
    ggtitle("Daily histogram of suspected Cases")+
  theme_bw()
p

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The shape of the co called epidemic curve is not symmetric. One of the possible non-biological explanation is that people lost their attention in the beginning of July. Here, we take the date of symptoms developing into account. Since July 8, no new cases were reported.

cats$cum=NA
cats = cats%>% mutate(cum = cumsum(name))
ggplot(cats,aes(date,cum))+geom_line(color="blue")+
xlab("Time")+ylab("Cumulative suspected cases")+
  theme_bw()

By visual inspection of smoothed epidemic curve and cumulative distribution, the peak of outbreak (by symptoms developments) was around 14-17.06.

2.3 Spatio-Temporal component

newmap <- getMap(resolution = "low")


plot(newmap, xlim = c(15, 25), ylim = c(45, 55), asp = 1)
points(cats$lon+rnorm(87)/10, cats$lat+rnorm(87)/10, col = cats$week, pch=20, cex = 1)
legend("topleft", legend = c(20:27), col = c(20:27), lty = 1, bty = "n")
title("Weeks of suspected symptoms dev.")

cor(as.numeric(cats$date),cats$lon,use="pairwise.complete.obs" )

## [1] -0.2141932

cor(as.numeric(cats$date),cats$lat,use="pairwise.complete.obs")

## [1] 0.1616069

korWE=rcorr(as.numeric(cats$date),cats$lon)
korWE$P

##            x          y
## x         NA 0.05968957
## y 0.05968957         NA

korWE$r

##            x          y
## x  1.0000000 -0.2141932
## y -0.2141932  1.0000000

korNS=rcorr(as.numeric(cats$date),cats$lat)
korNS$P

##           x         y
## x        NA 0.1574969
## y 0.1574969        NA

korNS$r

##           x         y
## x 1.0000000 0.1616069
## y 0.1616069 1.0000000

We see that in first weeks of the observed period a significant amounts of suspected cases are close to the border with Ukraine (second part of May). We see the peak of intensity of cases in Gdańsk area around mid of June.

There is borderline significant correlation between time and geography (East to West gradient).

3. Kernel density estimate (KDE) and Relative risk

3.1 Kernel density using the default bandwidth

K1 <- density(pp_) 
plot(K1, main="cat KDE default bandwidth Gaussisan kernel", las=1, alpha=1)

#contour(K1, add=TRUE,levels=c(0.05), lty=1:2)

We see Pomeranian - central Poland clear cluster.

4. Distance based methods for clustering

4.1 Global clustering with Average Nearest Neighbour (ANN)

Average Nearest Neighbour (ANN) is the average distance between all points within a dataset and their individual nearest point.

## ANN
## ANN
mean(nndist(pp_, k = 1))

## [1] 0.2038613

## ANN
mean(nndist(pp_, k = 2))

## [1] 0.3584615

4.2 Local clustering with K-mean test

#process_ <- preProcess(cats[-no_geo,c(30,29,31)], method=c("range"))

#norm_scale_ <- predict(process_, cats[-no_geo,c(30,29,31)])
km.res <- kmeans(norm_scale_[, c(1,2)], 3, nstart = 5)
#km.res$cluster
#integrity of points
km.res$betweenss

## [1] 6.488437

km.res$totss

## [1] 9.687858

newmap <- getMap(resolution = "low")

plot(newmap, xlim = c(15, 25), ylim = c(45, 55), asp = 1)
points(cats_geo$lon +rnorm(78)/10, cats_geo$lat +rnorm(78)/10, col = km.res$cluster, pch=20, cex = 1)
title("XY no time")

newmap <- getMap(resolution = "low")
km.res <- kmeans(norm_scale_[, c(1,2,3)], 3, nstart = 5)
#km.res$cluster
km.res$betweenss

## [1] 7.433772

km.res$totss

## [1] 13.74246

plot(newmap, xlim = c(15, 25), ylim = c(45, 55), asp = 1)
points(cats_geo$lon +rnorm(78)/10, cats_geo$lat +rnorm(78)/10, col = km.res$cluster, pch=20, cex = 1)
title("XYT (3 clusters)")

newmap <- getMap(resolution = "low")

km.res <- kmeans(norm_scale_[, c(1,2,3,3)], 3, nstart = 5)
#km.res$cluster
plot(newmap, xlim = c(15, 25), ylim = c(45, 55), asp = 1)
points(cats_geo$lon +rnorm(78)/10, cats_geo$lat +rnorm(78)/10, col = km.res$cluster, pch=20, cex = 1)
title("XYTT (3 clusters) Time with doubled weight")

cats_geo$clus=km.res$cluster

summ_clus=ddply(cats_geo,.(clus), summarise, day_med=median(date), day_mean=mean(date),count=length(date), lon=mean(lon), lat=mean(lat))
summ_clus

##   clus    day_med   day_mean count      lon      lat
## 1    1 2023-06-17 2023-06-18    44 18.77591 51.60193
## 2    2 2023-05-20 2023-05-21     8 21.77408 50.99285
## 3    3 2023-06-17 2023-06-16    26 18.64486 53.99787

There is clear spatio-temporal cluster of suspected cases in second part of Mai on Lublin/Podkarpacie area.

5. Preliminary conclusions

From participatory data 2 hypotheses come to mind both with local outbreak in May in Lublin area:

Wild water-birds at/along water reservoirs on specific migration routes (from South-Eastern Poland of Bug transboudary region). Only if the virus would have had to have gotten there already in spring and survived in the environment and caused mild-symptomatic infections;
contamination of poultry meat in somewhere in production chain somewhere in Lublin region (maybe imported from Ukraine?) (but with reach to the whole country).

Spatio-temporal analysis of H5N1 in cats in Poland (citizens data)

2023-07-22