n <- 100
x <- rnorm(n)
plot(density(x))
par(mfrow=c(1,2))
hist(x,freq=FALSE)
lines(density(x,bw=0.1),col=227) #Example of bandwidth too narrow
hist(x,freq=FALSE)
lines(density(x,bw=1.5),col=458) #Example of bandwidth too wide
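The two bandwidths above were picked by hand to show under- and over-smoothing. As a rough sketch of a data-driven alternative, base R ships bandwidth selectors such as `bw.nrd0` (the default used by `density`) and `bw.SJ` (Sheather-Jones). The seed and the names `x_demo`, `bw_default`, and `bw_sj` are illustrative assumptions, not part of the original analysis.

```r
# Illustrative sketch: let the data suggest a bandwidth instead of hand-picking one.
set.seed(1)                      # assumed seed, only for reproducibility
x_demo <- rnorm(100)

bw_default <- bw.nrd0(x_demo)    # Silverman's rule of thumb (default in density())
bw_sj      <- bw.SJ(x_demo)      # Sheather-Jones plug-in selector

hist(x_demo, freq = FALSE, main = "Data-driven bandwidths")
lines(density(x_demo, bw = bw_default), col = "blue")
lines(density(x_demo, bw = bw_sj), col = "red", lty = 2)
legend("topright", legend = c("bw.nrd0", "bw.SJ"),
       col = c("blue", "red"), lty = c(1, 2))
```

For standard normal samples of this size, both selectors typically land between the two hand-picked extremes of 0.1 and 1.5.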
The data chosen for this assignment are the Chorley cancer data set, which records the spatial locations of cases of cancer of the larynx and cancer of the lung, together with the location of a disused industrial waste incinerator. The format is a marked point pattern, with an irregular window and a simple covariate. Chorley is a town in Lancashire, UK. The data set covers the area around a single incinerator, which operated between 1972 and 1980. Incidences of larynx and lung cancer spiked a mere 5-10 years later. The primary causative agent was thought to be dioxins from the burning of organic solvents.
Below, looking solely at the density of the data, we see that there are far more incidences of lung cancer than of larynx cancer. This is also evident from the basic plots above.
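library(spatstat)        #chorley and the ppp methods come from spatstat
data(chorley)
both<-split(chorley)     #Split the marked pattern by cancer type
larynx<-both$larynx
lung<-both$lung
hist(density(lung))      #Histogram of the pixel values of the kernel intensity estimate
hist(density(larynx))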
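The imbalance between the two types can also be read directly from the marks rather than from the density surfaces. A minimal sketch, assuming the `spatstat` package is installed; `type_counts` is an illustrative name.

```r
library(spatstat)
data(chorley)

# Count cases by cancer type from the marks of the point pattern
type_counts <- table(marks(chorley))
print(type_counts)

# Share of each type among all recorded cases
print(round(prop.table(type_counts), 3))
```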
The perspective density plot (left) and the contour plot (right) indicate that there are two centers of high density. Without a good deal more information, we can only infer that either the population density is greater in these regions or the exposure rate was higher. I suspect the former rather than the latter.
library(spatstat)
data(chorley)
summary(chorley)
## Marked planar point pattern: 1036 points
## Average intensity 3.287268 points per square km
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 km
##
## Multitype:
##        frequency proportion intensity
## larynx        58 0.05598456 0.1840363
## lung         978 0.94401540 3.1032320
##
## Window: polygonal boundary
## single connected closed polygon with 131 vertices
## enclosing rectangle: [343.45, 366.45] x [410.41, 431.79] km
## Window area = 315.155 square km
## Unit of length: 1 km
## Fraction of frame area: 0.641
chorley.den<-density(chorley)
persp(chorley.den,theta = 30, phi = 30)
plot(chorley.den, main='Larynx and Lung Cancer Density',legend=TRUE)
contour(chorley.den,add=TRUE)
points(chorley,pch=c(6,8)[as.integer(marks(chorley))],cex=1.3) #Symbol by cancer type: 6 = larynx, 8 = lung
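To probe the population-versus-exposure question raised above, one common approach with data like these is to treat the lung cases as a rough proxy for where people live and to estimate the local probability that a case is a larynx case. The sketch below uses `relrisk` from `spatstat` with the lung type as the control. The bandwidth `sigma = 1.7` and the name `p_larynx` are assumptions chosen for illustration, and the result is exploratory rather than a formal test.

```r
library(spatstat)
data(chorley)

# Spatially varying probability that a case is larynx rather than lung.
# control = "lung" treats lung cases as the reference (control) type.
p_larynx <- relrisk(chorley, sigma = 1.7, control = "lung")

plot(p_larynx, main = "Estimated probability of a larynx case")
# Overlay the larynx cases for reference
plot(split(chorley)$larynx, add = TRUE, pch = 16, cex = 0.6)
```

If larynx cases simply tracked the population, the surface would be roughly flat near the overall share of about 0.056 (58 of 1036 cases); areas well above that would merit a closer look.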
You can also split the pattern up by its marks… NEAT! Here I applied the split
function and compared a kernel bandwidth (sigma) of 1 km (top) and 3 km (bottom). The split
function is nice and simple for ppp data sets whose marks are categorical, whereas the subset
function is nice for ppp data sets whose marks are continuous.
plot(density(split(chorley),sigma=1)) #Lower level of smoothing
plot(density(split(chorley),sigma=3)) #Higher level of smoothing
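The values `sigma=1` and `sigma=3` are hand-picked. As a hedged sketch, `spatstat` also provides automatic selectors such as `bw.diggle` and `bw.scott`; they optimise different criteria and can disagree considerably, so they are starting points rather than answers. The names `X`, `sigma_diggle`, and `sigma_scott` are illustrative.

```r
library(spatstat)
data(chorley)

X <- unmark(chorley)            # the selectors below use the locations only

sigma_diggle <- bw.diggle(X)    # cross-validated selector (Berman-Diggle)
sigma_scott  <- bw.scott(X)     # Scott's rule of thumb, one value per axis

print(sigma_diggle)
print(sigma_scott)

# Re-draw the split densities with the averaged rule-of-thumb bandwidth
plot(density(split(chorley), sigma = mean(sigma_scott)))
```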
n <- 100
both<-split(chorley)
larynx<-both$larynx
lung<-both$lung
larynxK<- envelope(larynx, fun = Kest, nsim = n)
## Generating 100 simulations of CSR ...
##
## Done.
lungK<- envelope(lung, fun = Kest, nsim = n)
## Generating 100 simulations of CSR ...
##
## Done.
plot(larynxK)
plot(lungK)
par(mfrow=c(1,2))
plot(larynx,pch=1,cols = "green" )
plot(lung,pch=2,cex=0.4,cols= "deeppink" )
Below are four plots: incidences of larynx cancer plotted over the density of larynx cancer incidence (top left); incidences of lung cancer plotted over the density of larynx cancer incidence (top right); incidences of larynx cancer plotted over the density of lung cancer incidence (bottom left); and incidences of lung cancer plotted over the density of lung cancer incidence (bottom right). No major differences are obvious: both types seem to cluster in basically the same regions. However, lung cancer cases are far more numerous, which produces a slightly different density surface.
lung.den <- density(lung,sigma=1.7)
larynx.den <- density(larynx,sigma=1.7)
par(mfrow=c(2,2))
plot(larynx.den,main="Larynx Upon Larynx")
points(larynx, pch='*',cex=1.5)
plot(larynx.den,main="Lung Upon Larynx")
points(lung, pch='.',cex=1.5)
plot(lung.den,main="Larynx Upon Lung")
points(larynx, pch="*",cex=1.5)
plot(lung.den,main="Lung Upon Lung")
points(lung, pch='.',cex=1.5)
Below we can see that both lung and larynx cancers are strongly clustered at all distances in the plot window, relative to the simulated CSR envelopes. The clustering of lung cancers is much stronger at all distances. The off-diagonal panels of the array, which pair larynx cases with lung cases, show that the two types together are more tightly clustered than larynx cancers alone. Further, lung cancers relative to other lung cancers (a diagonal panel) are the most tightly clustered at all radii. Given more time, I would love to look at the distribution of the population in this area relative to these centers of cancer incidence, and perhaps at the prevailing weather patterns, to see what else could be driving this distribution. However, I must move along… I digress.
both2K <- alltypes(chorley, "K", envelope = TRUE, verbose=FALSE)
plot(both2K)
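When reading these panels it helps to recall the benchmark: under complete spatial randomness (CSR) the K function is K(r) = pi * r^2, and an empirical curve above that line indicates clustering. The sketch below simulates a CSR pattern and compares its estimate with the theoretical curve. The intensity of 100 points per unit area, the seed, and the names `csr` and `K_csr` are assumptions for illustration.

```r
library(spatstat)

set.seed(42)                       # assumed seed, for reproducibility only
csr <- rpoispp(lambda = 100)       # homogeneous Poisson pattern on the unit square

K_csr <- Kest(csr)                 # empirical K with edge corrections

# The 'theo' column holds the CSR benchmark pi * r^2;
# this prints the largest deviation between the two
print(max(abs(K_csr$theo - pi * K_csr$r^2)))

plot(K_csr, main = "K function for a simulated CSR pattern")
```

For a CSR pattern the empirical curves should stay close to the theoretical one, in contrast to the Chorley panels above.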