Estimation of Probability from the Kernel Density

Installation of Packages

if (!require(pacman)) install.packages("pacman")
## Loading required package: pacman
pacman::p_load(stats,openxlsx,gganimate,gapminder,rmdformats,ggpubr,ggplot2,readxl, RColorBrewer,data.table,astsa, foreign, haven)

Estimation of Probability

Generally in R there are standard functions for a particular distribution to estimate probabilities for some given interval. For example, for a normal distribution \(X\sim N(\mu,\sigma^2)\), p_k=pnorm(k, mu, sigma, lower.tail=TRUE) estimates \(P(X \leq k)\). To estimate \(P(a \leq X \leq b)\), one need to estimate p_b - p_a. But in real life data, the observed values may not follow any standard distribution. To estimate the probability, this codes derives probability from the kernel density for any arbitrary univariate data. In this code, we have three inputs

  • v: the univariate numeric vector for which density is estimated.

  • a: The lower bound

  • b: The upper bound

probability=function(v,a,b){
d=density(as.vector(na.omit(v)))
xx=d$x
yy=d$y
if(a<min(xx) | b>max(xx)){
print("Limits are outside the range")
}
else{
  f <- approxfun(xx, yy)
total=integrate(f, min(xx), max(xx))$value
prob=integrate(f, a, b)$value/total
print(prob)
plot(d,col="blue", main = "Kernel Density")
polygon(c(xx[xx>=a & xx<=b], b, a), c(yy[xx>=a & xx<=b], 0, 0), col="grey")
}
text((a+b)/2,max(yy)/2,paste("Prob(",a,"<X<",b,")=", round(prob,2)))
}

# An Example
data=read.xlsx("https://mudulisilu.files.wordpress.com/2021/11/monthly_data.xlsx")
v=data$CPIInflation
a=2
b=6
probability(v,a,b) #Probability that inflation will lie in between 2 to 6 per cent
## [1] 0.4096522