The graph below shows an example distribution of bike trip distances:

dist_bike <- c(1.5, 9, 4, 2, 2, 3, 5, 3.4, 1) # distance, e.g. in km
hist(dist_bike) # plot a crude histogram of counts

d3 <- density(x = dist_bike, bw = 3) # kernel density with a wide bandwidth (smoother)
d1 <- density(x = dist_bike, bw = 1) # kernel density with a narrow bandwidth (more detail)

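Note that ‘bw’ is the bandwidth, representing the standard deviation of the smoothing kernel. It is not an assumption about the standard deviation of the underlying data: it controls how much each observation is smoothed. If ‘bw’ is not supplied, density chooses a default from the data using bw.nrd0, so the appropriate amount of smoothing may still differ between modes.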
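To make the role of the bandwidth concrete, the sketch below rebuilds the bw = 1 estimate by hand, assuming the default Gaussian kernel: each observation contributes a normal curve with standard deviation bw, and the estimate is their average. The names kde_manual and max_diff are illustrative and not part of the original analysis.

```r
dist_bike <- c(1.5, 9, 4, 2, 2, 3, 5, 3.4, 1) # distance, e.g. in km
bw <- 1
d1 <- density(x = dist_bike, bw = bw) # Gaussian kernel by default

# Manual KDE: at each grid point, average the normal densities
# centred on the observations, each with standard deviation bw
kde_manual <- sapply(d1$x, function(x0) mean(dnorm(x0, mean = dist_bike, sd = bw)))

# density() uses a binned approximation, so expect a small
# difference rather than exact agreement
max_diff <- max(abs(kde_manual - d1$y))
print(max_diff)

# Data-driven default bandwidth used when bw is not supplied
print(bw.nrd0(dist_bike))
```

Comparing the printed default with the values 1 and 3 used above gives a feel for how much extra or reduced smoothing those choices impose.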

Let’s overlay the two kernel density estimates on a histogram scaled to show densities rather than counts:

hist(dist_bike, freq = FALSE) # density scale, so the curves are comparable
lines(d3, col = "red")
lines(d1, col = "blue")

Fitting a functional form to KDE estimates

The fitdistr function from the MASS package fits a named parametric distribution by maximum likelihood. Note that it fits to the raw observations rather than to the KDE curve itself:

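library(MASS)
m1 <- fitdistr(x = dist_bike, densfun = "gamma") # maximum-likelihood gamma fit
hist(dist_bike, freq = FALSE)
x <- seq(0, 10, by = 0.1) # distances at which to evaluate the fitted pdf
lines(x, dgamma(x, shape = m1$estimate["shape"], rate = m1$estimate["rate"]))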
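
Since fitdistr accepts several named distributions, one way to choose between candidate forms is to compare their AIC values. The following is a minimal sketch; the choice of candidates and the names fits and aics are assumptions made for illustration, and with only nine observations the ranking should not be over-interpreted.

```r
library(MASS)

dist_bike <- c(1.5, 9, 4, 2, 2, 3, 5, 3.4, 1) # distance, e.g. in km

# Fit each candidate distribution by maximum likelihood
candidates <- c("gamma", "lognormal", "exponential")
fits <- lapply(candidates, function(f) fitdistr(x = dist_bike, densfun = f))
names(fits) <- candidates

# Lower AIC indicates a better trade-off between fit and complexity
aics <- sapply(fits, AIC)
print(sort(aics))
```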

It should also be possible to fit functional forms using the ‘np’ package, although I have not yet worked out how. This may be a step in the right direction:

# install.packages("np") # if not installed
library(np)
## Nonparametric Kernel Methods for Mixed Datatypes (version 0.60-2)
## [vignette("np_faq",package="np") provides answers to frequently asked questions]
mnp <- npreg(dist_bike ~ log(dist_bike))
summary(mnp)
## 
## Regression Data: 9 training points, in 1 variable(s)
##               log(dist_bike)
## Bandwidth(s):      0.1096861
## 
## Kernel Regression Estimator: Local-Constant
## Bandwidth Type: Fixed
## Residual standard error: 0.06891933
## R-squared: 0.9991394
## 
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 1
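o <- order(dist_bike) # sort so the fitted line is drawn left to right
plot(log(dist_bike), dist_bike) # the regression is of distance on log(distance)
lines(log(dist_bike)[o], fitted(mnp)[o])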
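
Until the np route is worked out, one base-R alternative is to fit a functional form to the KDE curve directly by least squares. The sketch below does this for a gamma pdf using optim; the gamma form, the log-scale parametrisation, and the names sse, start, fit, shape_hat and rate_hat are all assumptions for illustration rather than an established method for this data.

```r
dist_bike <- c(1.5, 9, 4, 2, 2, 3, 5, 3.4, 1) # distance, e.g. in km

# Evaluate the KDE on positive distances only
d1 <- density(x = dist_bike, bw = 1, from = 0.01)

# Sum of squared differences between a gamma pdf and the KDE curve;
# parameters are optimised on the log scale so they stay positive
sse <- function(par) {
  sum((dgamma(d1$x, shape = exp(par[1]), rate = exp(par[2])) - d1$y)^2)
}

start <- c(0, 0) # corresponds to shape = 1, rate = 1
fit <- optim(par = start, fn = sse) # Nelder-Mead by default
shape_hat <- exp(fit$par[1])
rate_hat <- exp(fit$par[2])

# Overlay the fitted curve on the KDE
plot(d1, main = "KDE (black) and least-squares gamma fit (red)")
lines(d1$x, dgamma(d1$x, shape = shape_hat, rate = rate_hat), col = "red")
```

Evaluating the KDE from just above zero ignores the mass that the Gaussian kernel places on negative distances, so the fitted curve is only an approximation to a proper density fit.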