A Brief Overview
Indian Statistical Institute, Delhi
Directional data may appear in different fields.
For instance, a biologist may be measuring the direction of flight of a bird or the direction of migration of an animal.
Another example may be taken as data analysis in cricket.
Wind directions provide a natural source of circular data.
Two-dimensional directions can be represented as angles measured with respect to some suitably chosen “zero direction”, i.e., the starting point.Usually statisticians take true North as zero direction.
There is also a “sense of rotation”, i.e., whether clockwise or anti-clockwise, is taken as the positive direction. Usually, anti-clockwise direction is taken as a positive direction.
Because of this circular representation, observations on such two-dimensional directions are also called circular data.
Circular data are usually measured in degrees. However, it is sometimes useful to measure in radians.
R Packages.We load the following packages for data analysis.
The simplest representation of circular data is a circular raw data plot, in which each observation is plotted as a point on the unit circle.
Grouped circular data can be represented by circular histograms, which are analogous to histograms on the real line.
Each bar in a circular histogram is centered at the midpoint of the corresponding group of angles, and the area of the bar is proportional to the frequency in that group.
A useful variant of the circular histogram is the Rose Diagram, in which the bars of the circular histogram are replaced by sectors.
The area of each sector is proportional to the frequency in the corresponding group. To achieve this when the groups are of equal width, the radius of each sector should be proportional to the square root of the relevant frequency.
Wind Dataset.The M2 Tower is an 80-meter meteorological tower at the National Wind Technology Center (NWTC) that measures wind speed and direction . The tower is located at the west end of the site, about five miles south of Boulder, Colorado. The data collected from the tower is used for research into how the atmosphere and wind turbines interact, and is also used to display the NWTC’s weather.
# first create a true POSIXCT timestamp from the date and hour columns
wind$timestamp <- as.POSIXct(paste0(wind$date, " ", wind$hr),
tz = "GMT",
format = "%m-%d-%Y %H:%M")
wind=na.omit(wind)
# Convert the time stamp to years and months
wind$Year <- as.numeric(format(wind$timestamp, "%Y"))
wind$month <- factor(format(wind$timestamp, "%B"),
levels = month.name)
# now generate the faceting
p.wr3 <- plot.windrose(data=wind,spd="ws.80",dir="wd.80") + facet_wrap(~month,ncol = 3)+ theme(axis.text.x = element_blank(),axis.title.x = element_blank())After data have been plotted, it is useful to summarise them by appropriate descriptive statistics.
As we said before, in directional analysis we are interested in the direction and not in the magnitude of the vector and therefore we take these vectors to be of unit length (i.e., \(r=1\)) for convenience.
We will convert polar system to rectangular system by using the transformation,
\[ \left(1,\alpha\right)\iff\left(x=\cos\alpha,y=\sin\alpha\right) \]
Lets consider a sample of size \(2\) on the circle consisting of the angles \(1°\) and \(359°\). Cutting the circle at \(0°\) would give the sample mean as \(180°\) and the sample standard deviation with divisor \(n\)) as \(179°\) , whereas cutting the circle at \(180° = (−180°)\) would give the sample mean as \(0°\) and the sample standard deviation as \(1°\).
An appropriate and meaningful measure of the mean direction for a set of directions which are unimodal i.e., point or concentration towards a single direction, is obtained by treating the data as unit vectors and using the direction of their resultant vector.
Let \(α_1, α_2,\ldots,\alpha_n\) be a set of circular observations given in terms of angles. Consider the polar to rectangular transformation for each observation i.e., \(\left(\cos\alpha_i,\sin\alpha_i\right);i=1(1)n\) . We obtain the resultant vector of these n unit vectors by summing them component-wise, to get,
\[ \boldsymbol{R}=\left(\sum_{i=1}^n \cos\alpha_i,\sum_{i=1}^n \sin\alpha_i\right)=\left(C,S\right)\,\,\,\mathrm{(say)} \]
Let,
\[ R=\left\|\boldsymbol{R}\right\|=\sqrt{C^2+S^2} \]
represent the length of the resultant vector \(\boldsymbol{R}\). The direction of this resultant vector \(\boldsymbol{R}\), which is proposed as the circular mean direction, is denoted by \(\bar{\alpha}_0\), and is defined as,
\[ \bar{\alpha}_0=\arg\left(\sum_{i=1}^n \cos\alpha_i+i\sum_{i=1}^n \sin\alpha_i\right) \]
or by the equations,
\[ \cos\bar{\alpha}_0 = \frac{C}{R},\sin\bar{\alpha}_0 = \frac{S}{R} \]
More explicitly, it is given by the “quadrant-specific” inverse of the tangent that we shall refer to from now on as,
\[ \bar{\alpha}_0 = \arctan^* \left(\frac{S}{C}\right) \]
where,
\[ \bar{\alpha}_0= \arctan^* \left(\frac{S}{C}\right)=\begin{cases} \arctan\frac{S}{C} & \mathrm{if}\:C>0,S\geq0\\\frac{\pi}{2} & \mathrm{if}\:C=0,S>0\\\arctan\left(\frac{S}{C}\right)+\pi & \mathrm{if}\:C<0\\ \arctan\left(\frac{S}{C}\right)+2\pi & \mathrm{if}\:C\geq0,S<0\\\mathrm{undefined}& \mathrm{if}\:C=0,S=0 \end{cases} \]
\(R = 0\) if both \(C = 0, S = 0\implies\) The mean direction \(\bar{\alpha}_0\) doesn’t exist, and this happens when \(\alpha_i\)’s are evenly scattered over the circumference of the unit circle.
\[ 0\leq R\leq n \]
\(R = 0\) indicates that the directions \(α_1, α_2, \ldots , α_n\) are evenly spreaded over and the circumference of the unit circle. \(\iff\) The concentration towards a particular direction of \(\alpha_i\)’s is zero.
\(R = n\) indicates that all the angles \(\alpha_i\)’s are equal or equivalently they are exactly towards a particular direction and hence their concentration around the mean direction is maximum. Here we see that larger the value of \(R\) greater will ve the concentration (i.e. smaller will be the dispersion) so \(R\) gives a suitable measure of concentration (or dispersion) within the given set of directions.
\(\sum_{i=1}^n\sin\left(\alpha_i-\bar{\alpha}_0\right)=0\) . This result is analougas to\(\sum_{i=1}^n\left(x_i-\bar{x}\right)=0\) in linear case.
\(\sum_{i=1}^n\cos\left(\alpha_i-\bar{\alpha}_0\right)=R\)
For the purpose of robust estimation, it is useful to have a version for circular data of the sample median. A sample median direction \(\tilde{\alpha}\) of angles \(α_1,\ldots, α_n\) is any angle \(θ\) such that (i) half of the data points lie in the arc \(\left[\theta, \theta + \pi\right)\), arid (ii) the majority of the data points lie nearer to \(θ\) than to \(θ + π\) . When the sample size \(n\) is odd, the sample median is one of the data points. When \(n\) is even, it is convenient to take the sample median as the midpoint of two appropriate adjacent data points.
Length of \(R=\left\|\boldsymbol{R}\right\|\) is a useful measure for unimodal data, of how concentrated the data is towards its center.
\[ D_{\beta}\left(\boldsymbol{\alpha}\right)=D_{\bar{\alpha}_0}\left(\boldsymbol{\alpha}\right)+2R\left\{\sin\left(\frac{\beta-\bar{\alpha}_0}{2}\right)^2\right\} \]
This is analougas to the identity,
\[ \sum_{i=1}^n\left(x_i-\mu\right)^2 = \sum_{i=1}^n\left(x_i-\bar{x}\right)^2+n\left(\bar{x}-\mu\right)^2 \]
These results appear more intuitive and seem less mysterious if one notes that,
\[ \sin\theta\approx\theta\:\,\,\cos\theta\approx\left(1-\frac{\theta^2}{2}\right) \]
Hence using Taylor’s expansion we get,
\[ 2\left(1-\bar{R}\right)=\frac{2}{n}\sum_{i=1}^n\left(1-\cos\left(\alpha_i-\bar{\alpha}_0\right)\right)\approx\frac{1}{n}\sum_{i=1}^n\left(\alpha_i-\bar{\alpha}_0\right)^2=s^2 \]
This led some to define \(2\left(1-\bar{R}\right)\) as a “circular variance.” However it is preferable to stay with the original concentration measure \(R\) instead of trying to find its approximate linear equivalents.
Since \(e^{iα} = (\cos α + i \sinα)\) we note that,
\[ \bar{C}_n+i\bar{S}_n=\frac{1}{n}\sum_{j=1}^ne^{i\alpha_j} \]
Similarly we can define the higher order sample moments by taking higher powers of \(\left\{e^{i\alpha_j}\right\}\) and averaging these. We have,
\[ \frac{1}{n}\sum_{j=1}^n\left(e^{i\alpha_j}\right)^p=\frac{1}{n}\sum_{j=1}^n\cos p\alpha_j + i \frac{1}{n}\sum_{j=1}^n\sin p\alpha_j = \bar{C}_n(p) + i\bar{S}_n(p) \]
for \(p\in I^+\), where \(\left(\bar{C}_n(p),\bar{S}_n(p)\right)\) are called the \(p\)th order trigonometric moments based on the sample.
A circular distribution is a probability distribution whose total probability is concentrated on the circumference of a unit circle. Since each point on the circumference represents a direction, such a distribution is a way of assigning probabilities to different directions or defining a directional distribution. \(\theta \in \left(0,2\pi\right]\) or \(\left(-\pi,\pi\right]\) ,
\(f\left(\theta\right)\geq0\)
\(\int_{0}^{2\pi}f\left(\theta\right)d\theta=1\)
\(f\left(\theta\right)=f\left(\theta+2k\pi\right)\) for any integer \(k\) (i.e. \(f\) is periodic).
Let \(f\left(x\right)\) be a periodic function of period \(2\pi\), then the complex fourier series representation \(f\left(x\right),x\in\left[-\pi,\pi\right]\) is,
\[ f\left(x\right)=\sum_{k=-\infty}^{+\infty}c_ke^{ikx} \]
where,
\[ c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi}f\left(x\right)e^{-inx}dx \, \, \forall n\in\mathbb{Z} \]
As on the real line, such a distribution can also be described via its “characteristic function”. However, since \(\theta\) is a periodic r.v. having the same distribution as \(\left(\theta+2\pi\right)\), the characteristic function of such a r.v. has the property :
\[ \varphi_\theta\left(t\right)\overset{def}{=}\mathbb{E}\left(e^{it\theta}\right)=\mathbb{E}\left(e^{it\left(\theta+2\pi\right)}\right)=e^{it2\pi}\varphi_\theta\left(t\right) \]
Hence, either \(\varphi_{\theta}\left(t\right)=0\) or \(e^{it2\pi}=1\) i.e., \(t\) must be an integer. Thus, for circular r.v.s, the characteristic function needs to be defined only at integer values.
The value of the characteristic function at an integer \(p\) is also called the \(p\)th trigonometric moment of \(\theta\).
\[ \varphi_\theta\left(t\right)=\mathbb{E}\left(e^{ip\theta}\right)=\mathbb{E}\left(\cos p\theta + i\sin p\theta\right) = \rho_pe^{i\mu_p}\,\,\,\, p\in\mathbb{Z} \]
Say,
\[ \alpha_p=\mathbb{E}\left(\cos p\theta\right),\beta_p=\mathbb{E}\left(\sin p\theta\right)\,\,\,p\in\mathbb{Z} \]
and these are related by the equation,
\[ \rho_p= \sqrt{\alpha^2_p+\beta^2_p}\,\,\,\mathrm{and}\,\,\mu_p=\arctan^*\left(\frac{\beta_p}{\alpha_p}\right) \]
Here the pdf \(f\left(\theta\right)\) is a periodic function of \(\theta\), so from the complex fourier series, we can write,
\[ c_p=\frac{1}{2\pi}\int_{-\pi}^{\pi}f\left(\theta\right)e^{-ip\theta}d\theta \]
\[ \iff2\pi c_p=\mathbb{E}\left(e^{-ip\theta}\right)=\varphi_{-p}\left(\theta\right) \]
That is,
\[ 2\pi f\left(\theta \right)=\sum_{k=-\infty}^{+\infty}\varphi_{-p}\left(\theta\right)e^{ip\theta}=1+\sum_{p=1}^{\infty}\left\{\varphi_{p}\left(\theta\right)e^{-ip\theta}+\varphi_{-p}\left(\theta\right)e^{ip\theta}\right\} \]
which gives,
\[ 2\pi f\left(\theta\right)=\left[1+2\sum_{p=1}^{\infty}\left(\alpha_p\cos p\theta + \beta_p\sin p\theta\right)\right] \]
Indeed, one can recover the pdf from either of these sequences. These trigonometric moments of \(\theta\) are the same as the Fourier coefficients in the Fourier series expansion of the pdf \(f\left(\theta\right)\). This is analogous to the “inversion formula” for characteristic functions of real-valued r.v.s.
We already know, \(1\)st order Trigonometric moment,
\[ \varphi_1=\alpha_1+i\beta_1 = \rho_1 e^{i\mu_1} \]
Say, \(\rho_{1}=\rho\) and \(\mu_{1}=\mu\). Then,
\(\mu=\) Population Mean Direction = Polar Direction.
\(\rho=\) Measure of Concentration. \(0\leq\rho\leq1\)
Analougas to the sample counterpart, here we can also define the mean distance \(\theta\) from angle \(\xi\) as,
\[ \underset{\xi\in\left[0,2\pi\right)}{\min}D_\theta\left(\xi\right)=1-\rho=\gamma \]
Measure of dispersion in the population is \(\gamma=1-\rho\in\left[0,1\right]\). Here \(\gamma\) is the population analouge of it’s sample version \(\left(1-\bar{R}\right)\).
But \(\gamma\in\left[0,1\right]\). Some authors define,
\[ \sigma^2_{Circ}=-2\log\rho=-2\log\left(1-\gamma\right) \]
as the measure of concentration as it ranges in \(\left[0,\infty\right)\).
Circular Uniform, Normal. Their Properties and Variations….
If the total probability is spread out uniformly on the circumference of a circle, we get the Circular Uniform (CU) distribution with the constant density,
\[ f\left(\theta\right)=\frac{1}{2\pi}\boldsymbol{1}\left[0\leq\theta\leq2\pi\right] \]
If the total probability is spread out uniformly on the circumference of a circle, we get the Circular Uniform (CU) distribution with the constant density,
\[ f\left(\theta\right)=\frac{1}{2\pi}\boldsymbol{1}\left[0\leq\theta\leq2\pi\right] \]
\(\theta\sim CU\left(0,2\pi\right)\implies\varphi_{p}\left(\theta\right)= \begin{cases}1 & \mathrm{for}\:p=0\\0 & \mathrm{for}\:p\in\mathbb{Z}\setminus\left\{0\right\} \end{cases}\)
\(\mu=\arctan^*\left(\frac{0}{0}\right)\) which is undefined. For Circular uniform distribution, there is no preferred direction.
\(\rho=\sqrt{\alpha^{2}+\beta^{2}}=0\)
\(\theta_{1},\theta_{2},\ldots,\theta_{n}\overset{iid}{\sim}CU\left(0,2\pi\right)\implies\sum_{i=1}^{n}\theta_i\sim CU\left(0,2\pi\right)\)
For \(\kappa>0,p\in\mathbb{Z}\),
A circular r.v. \(\theta\) is said to have a von Mises or a Circular Normal (CN) distribution if it has the density function :
\[ f\left(\theta;\mu,\kappa\right)=\frac{1}{2\pi I_0\left(\kappa\right)}e^{\kappa\cos\left(\theta-\mu\right)}\boldsymbol{1}\left[0\leq\theta<2\pi\right] \]
where \(0\leq\mu<2\pi\) and \(\kappa>0\) are parameters. Here \(I_{0}\left(\kappa\right)\) in the normalizing constant is the modified Bessel function of the first kind and order zero , and is given by,
\[ I_0\left(\kappa\right)=\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\kappa\cos\theta\right)d\theta=\sum_{r=0}^{\infty}\left(\frac{\kappa}{2}\right)^{2r}\left(\frac{1}{r!}\right)^2 \]
Symmetry : By the symmetry of the cosine function, this density is seen to be symmetric about the direction \(\mu\) (as well as \(\mu+\pi\)).
Mode at \(\boldsymbol{\mu}\) : Since the cosine function has a maximum value at zero, the Circular Normal density is maximum at \(\theta=\mu\), i.e., \(\mu\) is the modal direction with the maximum value, \(f\left(\mu\right)=\frac{e^{\kappa}}{2\pi I_{0}\left(\kappa\right)}\)
Antimode at \(\boldsymbol{\mu\pm\pi}\) : Again since \(\cos\pi=-1\) is the minimum value, at the opposite end, when \(\theta=\mu\pm\pi\), we have the minimum density, namely, \(f\left(\mu\pm\pi\right)=\frac{e^{-\kappa}}{2\pi I_{0}\left(\kappa\right)}\)
Role of \(\boldsymbol{\kappa}\) : From the above expressions, \(\frac{f\left(\mu\right)}{f\left(\mu\pm\pi\right)}=e^{2\kappa}\) . Hence, the larger the value of \(\kappa\), the larger will be the ratio of \(f\left(\mu\right)\,\, \mathrm{to}\,\, f\left(\mu\pm\pi\right)\) indicating higher concentration towards the population mean direction \(\mu\). Thus, \(\kappa\) is a parameter which measures the concentration towards the mean direction \(\mu\).
#---Circular Representation of CN densities----
#--mu=0, kappa= 0.5,1,2,4---
cdensity(function(x){dvonmises(x,mu=0,kappa=4)},nlabels=0,
main="Circular Representation of CN Densities",col="navy",xlim=c(-0.8,1.5),ylim = c(-0.8,0.8))
cdensity(function(x){dvonmises(x,mu=0,kappa=2)},nlabels=0,col="brown",add=T)
cdensity(function(x){dvonmises(x,mu=0,kappa=1)},nlabels=0,col="darkgreen",add=T)
cdensity(function(x){dvonmises(x,mu=0,kappa=0.5)},nlabels=0,col="orange",add=T)
legend("bottomright",legend=c("0.5","1","2","4"),col=c("orange","darkgreen","brown","navy")
,title=expression(kappa),lwd=2)
segments(x0=0,y0=0,x1=1/sqrt(pi),y1=0)
box(lwd=3)#---Linear Representation of CN densities----
#--mu=0, kappa= 0.5,1,2,4---
cols=c("0.5"="orange","1"="darkgreen","2"="brown","4"="navy")
ggplot()+xlim(rad(c(-180,180)))+ylim(c(0,0.8))+geom_vline(xintercept=0,lwd=0.8)+
labs(y="Density",x=expression(theta),title = "Linear Representation of CN Densities",
caption=expression(paste(mu,"= 0")))+
geom_function(fun=function(x){dvonmises(x,mu=0,kappa=0.5)},aes(col="0.5"),lwd=1)+
geom_function(fun=function(x){dvonmises(x,mu=0,kappa=1)},aes(col="1"),lwd=1)+
geom_function(fun=function(x){dvonmises(x,mu=0,kappa=2)},aes(col="2"),lwd=1)+
geom_function(fun=function(x){dvonmises(x,mu=0,kappa=4)},aes(col="4"),lwd=1)+
scale_color_manual(expression(kappa),values = cols) One can define a CN distribution restricted to any arc of arbitrary length say \(\frac{2\pi}{\ell}\) for an integer \(\ell\), rather than the full circle, with the resulting density,
\[ g\left(\theta\right)=\frac{\ell}{2\pi I_0\left(\kappa\right)}\exp\left[\kappa\cos\ell\left(\theta-\mu\right)\right]\boldsymbol{1}\left[0\leq\theta<\frac{2\pi}{\ell}\right] \]
where \(0\leq\mu<\frac{2\pi}{\ell}\) .
A multi-modal CN density, say with \(\ell\) equidistant modes, is obtained through the density,
\[ g\left(\theta\right)=\frac{1}{2\pi I_0\left(\kappa\right)}\exp\left[\kappa\cos\ell\left(\theta-\mu\right)\right]\boldsymbol{1}\left[0\leq\theta<{2\pi}\right];\,\,\,0\leq\mu<2\pi \]
Consider the mixture of two von Mises distributions with probability density,
\[ f\left(\theta;\mu,\kappa\right)=pCN\left(\theta;\mu_1,\kappa_1\right)+\left(1-p\right)CN\left(\theta;\mu_2,\kappa_2\right) \]
where, \(0\leq\mu_1,\mu_2<2\pi;\kappa_1,\kappa_2>0;p\in\left(0,1\right)\)
For better illustration we generate \(1000\) random samples from Mixed \(\nu M\) distribution with \(\left(\mu_{1},\mu_{2},\kappa_{1},\kappa_{2}\right)'=\left(\frac{3\pi}{2},\frac{\pi}{2},2,4\right)\) with proportion \(\frac{1}{2}\) and draw a circular histogram with the density curve.
Suppose \(\alpha_{1},\alpha_{2},\ldots,\alpha_{n}\overset{iid}{\sim}CN\left(\mu,\kappa\right)\). Say, \(\boldsymbol{\alpha}=\left(\alpha_{1},\alpha_{2},\ldots,\alpha_{n}\right)'\). Then the likelihood function is given by,
Differentiating this with respect to \(\mu\), and \(\kappa\), respectively, and equating to zero, we obtain,
\[ \hat{\mu}_{MLE}=\arctan^*\left(\frac{S}{C}\right)=\bar{\alpha}_0 \]
\(\hat\kappa_{MLE}\) as a solution of the equation,
\[ \frac{I_1\left(\kappa\right)}{I_0\left(\kappa\right)}=\bar{R} \]
A unit random vector \(\boldsymbol{I}\) is said to have a \(p\)-variate von Mises-Fisher distribution if its p.e. is,
\[ C_p\left(\kappa\right)e^{\kappa\boldsymbol{\mu}'\boldsymbol{I}}dS_p\,\,\,I\in S_p \]
where \(\kappa\geq0\) and \(\boldsymbol{\mu}'\boldsymbol{\mu}=1\) . Here \(\kappa\) is called the concentration parameter and \(C_{p}\left(\kappa\right)\) is the normalizing constant. If a random vector \(\boldsymbol{I}\) has its p.e. of the form, we’ll say it’s distributed as \(\nu M_p\left(\boldsymbol{\mu},\kappa\right)\) . For the cases of \(p=2\) and \(p=3\), the distributions are called von Mises and Fisher distributions after the names of the originators 1, and these particular cases explain the common nomenclature for \(p>3\).
It can be shown that,
\[ C_p\left(\kappa\right)=\frac{\kappa^{\left(\frac{p-1}{2}\right)}}{\left\{\left(2\pi\right)^{\frac{p}{2}}\boldsymbol{I}_{\left(\frac{p-1}{2}\right)}\left(\kappa\right)\right\}} \]
where \(I_r\left(\kappa\right)\) denotes the modified Bessel Function of the first kind and order \(r\), which can be defined by the formula,
\[ I_r\left(\kappa\right)=\frac{\left(\frac{\kappa}{2}\right)^r}{\Gamma\left(r+\frac{1}{2}\right)\Gamma\left(\frac{1}{2}\right)}\int_0^\pi e^{\pm\kappa\cos\theta}\sin^{2r}\theta d\theta \]
Simulation and density evaluation for \(p=3\). Taking \(\boldsymbol{\mu}=(0,0,1)',\kappa=0.5\)
Taking \(\boldsymbol{\mu}=(0,0,1)',\kappa=2\)
Taking \(\boldsymbol{\mu}=(0,0,1)',\kappa=5\)
Let \(I_{1},I_{2},\dots,I_{n}\) be a random sample of \(\nu M\left(\boldsymbol{\mu},\kappa\right)\). Consider,
\[ H_0:\kappa=0\,\,\mathrm{against}\,\,H_1:\kappa\neq0 \]
where \(\boldsymbol{\mu}\) is unknown , let \(\lambda\) be the LR statistics. Then it can be shown that,
\[ \log\lambda=-n\left\{\log\left(C_p\left(\hat{\kappa}\right)\right)+\hat{\kappa}A\left(\hat{\kappa}\right)\right\} \]
Note that \(\lambda\) is a function of \(\bar{R}\) alone. It can be shown that \(\lambda\) is a monotonically decreasing function of \(\hat{\kappa}\). Finally , on differentiating with respect to \(\hat{\kappa}\) and using results we get , it is found that \(\hat{\kappa}\) is a monotonically increasing function of \(\bar{R}\). Therefore the critical region of the Rayleigh test reduces to \(\bar{R}>k\).
For large \(n\) it can be shown that,
\[ pn\bar{R}^2\overset{H_0}{\sim}\chi^2_p \]
Pigeons” Dataset.The circular package in R contains a dataset named pigeons. This has \(108\) rows and \(2\) columns. The observations are the vanishing bearings of homing pigeons displaced and released at two unfamiliar locations. The data are pooled with respect to the home direction (home direction set in \(360\) grades). This contains the following column,
treatment : A factor with levels:
c : Control pigeon (unmanipulated).
v1 : Pigeons subjected to bilateral section of the ophthalmic branch of the trigeminal nerve.
on : Pigeons subjected to bilateral section of the olfactory nerve.
bearing : Vanishing bearing of each bird in degrees.The bearings are in degrees, we need to change them into radians.
With respect to the factors, we first make a multiple stacked histogram of bearing, to get an idea about the data.
Let’s obtain the summary statistics.
[1] 0.1742687
n r rbar var
1 108 61.81342 0.5723464 0.4276536
Here the variability is much high. A natural question may be,
Before going into that, we need to look for appropriate distribution of bearing. Will \(CN\left(\mu,\kappa\right)\) give a better fit for this data for some suitable \(\mu,\kappa\) ?
Assuming the underlying distribution is normal, we find the MLEs of \(\mu,\kappa\).
Call:
mle.vonmises(x = pigeons$bearing)
mu: 0.1743 ( 0.1076 )
kappa: 1.401 ( 0.1862 )
With these estimates we again plot the circular histogram alongwith the fitted density.
The proposed model gives a nice fit. It’s plausible to work with CN distribution in this case. Now, we check whether the pigeons have any preferred direction by rayleigh test.
Rayleigh Test of Uniformity
General Unimodal Alternative
Test Statistic: 0.5723
P-value: 0
This test strictly rejects \(H_{0}\) at level \(\alpha=0.05\). The pigeons have a preferred direction and an estimate of the direction is \(0.1743\) radians towards north-east.
Directional Statistics - Kanti V. Mardia, Peter E. Jupp - John Wiley & Sons.
Multivariate Analysis - KV Mardia, JT Kent, JM Bibby - Academic Press
Topics in Circular Statistics - Sreenivasa Rao Jammalamadaka, A. Sengupta - World Scientific.
Circular Statistics in R - Arthur Pewsey, Markus Neuhauser, Graeme D. Ruxton - Oxford University Press.
FISHER, N. I. (1993), Statistical Analysis of Circular Data, Cambridge University Press, Cambridge.
CSORGO, M. AND HORVATH, L. (1996), A note on the change-point problem for angular data, Statist. Probab. Lett, 27, 61—65.
Wikipedia
Any Questions (-_- ;) ?
Indian Statistical Institute, Delhi.