What is the average distance between two randomly chosen points in a square with a side length 1? The video presents how to estimate the answer and also derives the exact answer using integrals. Per suggestions in the comments, I have modified the title to indicate the difficulty level is VERY HARD (a college level calculus course and probability theory course are necessary background).
Blog post
http://wp.me/p6aMk-4FA
Let us try a simulation for this puzzle. Produce 1e6 pairs of points with in a unit square and determine their distance. Next make a histogram of the distribution. Test whether the distribution is normal or uniform?
set.seed(123)
x_1 = runif(1e6, min = 0, max = 1)
x_2 = runif(1e6, min = 0, max = 1)
y_1 = runif(1e6, min = 0, max = 1)
y_2 = runif(1e6, min = 0, max = 1)
# Calcualte distance from pt (x_1, y_1) to (x_2, y_2).
distance = rep(0, 1e6)
distance = sqrt((y_2-y_1)^2 + (x_2-x_1)^2)
average_distance = mean(distance)
sd_distance = sd(distance)
output1 = paste("mean = ", average_distance, ", std dev = ", sd_distance)
print(output1)
## [1] "mean = 0.521496403495269 , std dev = 0.247852242264267"
# Plot histgram of distances with Normal Curve
h <- hist(distance,
breaks = 50,
ylim = c(0, 40000),
xlim = c(0, 1.42),
col = "red",
xlab = "Distance",
main = "Distance of 1e6 Random Points in Unit Square")
xfit <- seq(min(distance), max(distance), length = 40)
yfit <- dnorm(xfit, mean = mean(distance), sd = sd(distance))
yfit <- yfit * diff(h$mids[1:2]) * length(distance)
lines(xfit, yfit, col = "blue", lwd = 2)
# Model fitting
# create sample data
set.seed(123)
sample_1000 <- sample(distance, 1000, replace = FALSE)
# normal fit
qqnorm(sample_1000)
qqline(sample_1000)
The fitdistr() function in the MASS package provides maximum-likelihood fitting of univariate distributions. The format is
fitdistr(x, densityfunction)
where x is the sample data and densityfunction is one of the following:
“beta”, “cauchy”, “chi-squared”, “exponential”, “f”, “gamma”, “geometric”, “log-normal”, “lognormal”, “logistic”, “negative binomial”, “normal”, “Poisson”, “t” or “weibull”
# take a random sample of size 100 from a dataset distance, sample w/o replacement
set.seed(123)
sample_1000 <- sample(distance, 1000, replace = FALSE)
# Estimate parameters assuming different distributions
# start = list(shape = 1, rate = 0.1),lower=0.001)
# estimate paramters
library(MASS)
fitdistr(sample_1000, "lognormal") # estimate paramters
## meanlog sdlog
## -0.79720081 0.62206889
## ( 0.01967155) ( 0.01390988)
#fitdistr(sample_1000, "cauchy") # estimate paramters
#fitdistr(sample_1000, "chi-squared") # estimate paramters
fitdistr(sample_1000, "exponential") # estimate paramters
## rate
## 1.90780471
## (0.06033008)
#fitdistr(sample_1000, "f") # estimate paramters
suppressWarnings(suppressMessages(fitdistr(sample_1000, "geometric"))) # estimate paramters
## prob
## 0.65609795
## (0.01216707)
#fitdistr(sample_1000, "logistic") # estimate paramters
#fitdistr(sample_1000, "negative binomial") # estimate paramters
fitdistr(sample_1000, "normal") # estimate paramters
## mean sd
## 0.524162664 0.244242458
## (0.007723625) (0.005461427)
suppressWarnings(suppressMessages(fitdistr(sample_1000, "Poisson")))# estimate paramters
## lambda
## 0.5241627
## (0.0228946)
#fitdistr(sample_1000, "weibull") # estimate paramters