What Is The Distance Between Two Random Points In A Square?

What is the average distance between two randomly chosen points in a square with a side length 1? The video presents how to estimate the answer and also derives the exact answer using integrals. Per suggestions in the comments, I have modified the title to indicate the difficulty level is VERY HARD (a college level calculus course and probability theory course are necessary background).

Blog post
http://wp.me/p6aMk-4FA

Let us try a simulation for this puzzle. Produce 1e6 pairs of points with in a unit square and determine their distance. Next make a histogram of the distribution. Test whether the distribution is normal or uniform?

set.seed(123)
x_1 = runif(1e6, min = 0, max = 1)
x_2 = runif(1e6, min = 0, max = 1)
y_1 = runif(1e6, min = 0, max = 1)
y_2 = runif(1e6, min = 0, max = 1)

# Calcualte distance from pt (x_1, y_1) to (x_2, y_2).
distance = rep(0, 1e6)
distance = sqrt((y_2-y_1)^2 + (x_2-x_1)^2)

average_distance = mean(distance)
sd_distance = sd(distance)
output1 = paste("mean = ", average_distance, ",  std dev = ", sd_distance)
print(output1)
## [1] "mean =  0.521496403495269 ,  std dev =  0.247852242264267"
# Plot histgram of distances with Normal Curve
h <- hist(distance, 
          breaks = 50,
          ylim = c(0, 40000),
          xlim = c(0, 1.42),
          col = "red", 
          xlab = "Distance", 
          main = "Distance of 1e6 Random Points in Unit Square") 

xfit <- seq(min(distance), max(distance), length = 40) 
yfit <- dnorm(xfit, mean = mean(distance), sd = sd(distance)) 
yfit <- yfit * diff(h$mids[1:2]) * length(distance) 
lines(xfit, yfit, col = "blue", lwd = 2)

# Model fitting

# create sample data 
set.seed(123)
sample_1000 <- sample(distance, 1000, replace = FALSE)

# normal fit
qqnorm(sample_1000)
qqline(sample_1000)

The fitdistr() function in the MASS package provides maximum-likelihood fitting of univariate distributions. The format is

fitdistr(x, densityfunction)

where x is the sample data and densityfunction is one of the following:

“beta”, “cauchy”, “chi-squared”, “exponential”, “f”, “gamma”, “geometric”, “log-normal”, “lognormal”, “logistic”, “negative binomial”, “normal”, “Poisson”, “t” or “weibull”

# take a random sample of size 100 from a dataset distance, sample w/o replacement
set.seed(123)
sample_1000 <- sample(distance, 1000, replace = FALSE)

# Estimate parameters assuming different distributions
# start = list(shape = 1, rate = 0.1),lower=0.001)

# estimate paramters
library(MASS)
fitdistr(sample_1000, "lognormal") # estimate paramters
##      meanlog        sdlog   
##   -0.79720081    0.62206889 
##  ( 0.01967155) ( 0.01390988)
#fitdistr(sample_1000, "cauchy") # estimate paramters

#fitdistr(sample_1000, "chi-squared") # estimate paramters

fitdistr(sample_1000, "exponential") # estimate paramters
##       rate   
##   1.90780471 
##  (0.06033008)
#fitdistr(sample_1000, "f") # estimate paramters

suppressWarnings(suppressMessages(fitdistr(sample_1000, "geometric"))) # estimate paramters
##       prob   
##   0.65609795 
##  (0.01216707)
#fitdistr(sample_1000, "logistic") # estimate paramters

#fitdistr(sample_1000, "negative binomial") # estimate paramters

fitdistr(sample_1000, "normal") # estimate paramters
##       mean           sd     
##   0.524162664   0.244242458 
##  (0.007723625) (0.005461427)
suppressWarnings(suppressMessages(fitdistr(sample_1000, "Poisson")))# estimate paramters
##     lambda  
##   0.5241627 
##  (0.0228946)
#fitdistr(sample_1000, "weibull") # estimate paramters