Estimation of Probability from the Kernel Density
Installation of Packages
if (!require(pacman)) install.packages("pacman")## Loading required package: pacman
pacman::p_load(stats,openxlsx,gganimate,gapminder,rmdformats,ggpubr,ggplot2,readxl, RColorBrewer,data.table,astsa, foreign, haven)Estimation of Probability
Generally in R there are standard functions for a particular distribution to estimate probabilities for some given interval. For example, for a normal distribution \(X\sim N(\mu,\sigma^2)\), p_k=pnorm(k, mu, sigma, lower.tail=TRUE) estimates \(P(X \leq k)\). To estimate \(P(a \leq X \leq b)\), one need to estimate p_b - p_a. But in real life data, the observed values may not follow any standard distribution. To estimate the probability, this codes derives probability from the kernel density for any arbitrary univariate data. In this code, we have three inputs
v: the univariate numeric vector for which density is estimated.
a: The lower bound
b: The upper bound
probability=function(v,a,b){
d=density(as.vector(na.omit(v)))
xx=d$x
yy=d$y
if(a<min(xx) | b>max(xx)){
print("Limits are outside the range")
}
else{
f <- approxfun(xx, yy)
total=integrate(f, min(xx), max(xx))$value
prob=integrate(f, a, b)$value/total
print(prob)
plot(d,col="blue", main = "Kernel Density")
polygon(c(xx[xx>=a & xx<=b], b, a), c(yy[xx>=a & xx<=b], 0, 0), col="grey")
}
text((a+b)/2,max(yy)/2,paste("Prob(",a,"<X<",b,")=", round(prob,2)))
}
# An Example
data=read.xlsx("https://mudulisilu.files.wordpress.com/2021/11/monthly_data.xlsx")
v=data$CPIInflation
a=2
b=6
probability(v,a,b) #Probability that inflation will lie in between 2 to 6 per cent## [1] 0.4096522