ECDF Lab

Don't forget to load packages: the table code below uses %>%, kbl() and kable_styling(), which are available from kableExtra.

Step1: Find the empirical distribution function of the following sample:

\((−15.4,−8.8,8.2,3.4,−7.1,4.5,−12.7,5.2,−10.6,−11.2)\)

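# Observed sample
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)
# ecdf() returns a function; calling it on X gives the ECDF value at each observation
a <- ecdf(X)
prob <- a(X)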

# Stack the sample and its ECDF values as two rows: X and prob
dt <- as.data.frame(rbind(X, prob))

# Reorder the columns so that the sample values are increasing
dt1 <- dt[, order(X)]

dt1 %>%
kbl() %>% kable_styling()

	V1	V7	V10	V9	V2	V5	V4	V6	V8	V3
X	-15.4	-12.7	-11.2	-10.6	-8.8	-7.1	3.4	4.5	5.2	8.2
prob	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9	1.0
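
The values in the table follow directly from the definition of the empirical distribution function, \(F_n(t) = \frac{1}{n}\,\#\{i : X_i \le t\}\): the proportion of observations that are less than or equal to \(t\). As a minimal sketch, the same probabilities can be recomputed by hand and compared with ecdf(). The helper name ecdf_by_hand is hypothetical and is not part of the lab.

```r
# Sample from Step 1, repeated so the block runs on its own
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)

# Hypothetical helper: proportion of observations less than or equal to t
ecdf_by_hand <- function(t, sample) mean(sample <= t)

# Evaluate the hand-made ECDF at each sorted observation
x_sorted <- sort(X)
prob_by_hand <- sapply(x_sorted, ecdf_by_hand, sample = X)

# Evaluate the built-in ECDF at the same points and compare
prob_builtin <- ecdf(X)(x_sorted)
all.equal(prob_by_hand, prob_builtin)
```

Because all ten observations are distinct, the ECDF jumps by 1/10 at each sorted value.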

Step2: Graph the empirical distribution function of the following sample:

\((−15.4,−8.8,8.2,3.4,−7.1,4.5,−12.7,5.2,−10.6,−11.2)\)

plot(ecdf(X))

Step3: Suggest a distribution that you might want to use to draw inference from the observed sample. Use quantile plots to compare the empirical quantiles with theoretical.

Hint: You can use qqnorm if you choose to compare the quantiles of observed sample with normal and qqplot if you choose some other distribution for comparison.

Hint: Look at the values and think of a distribution whose support would allow for the observed sample values. Most students suggest the normal distribution as a starting point; if that doesn’t work, a natural second check is the t-distribution.

You don’t have to find the best fit. The objective of the task is to teach you to compare observed sample quantiles with the quantiles of a known distribution, which is an important technique for exploring which distribution might fit the observed data well.
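
To make the comparison concrete, here is a minimal sketch of what a quantile plot does under the hood, assuming the standard normal as the reference distribution. The sorted sample values serve as empirical quantiles, and the reference quantiles are evaluated at the plotting positions returned by ppoints(). The variable names are illustrative only.

```r
# Sample from Step 1, repeated so the block runs on its own
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)

n <- length(X)
p <- ppoints(n)           # plotting positions strictly between 0 and 1
empirical <- sort(X)      # empirical quantiles are the order statistics
theoretical <- qnorm(p)   # standard normal quantiles at the same positions

# Hand-made Q-Q plot; points near the line y = x suggest a good fit
plot(theoretical, empirical,
     xlab = "Theoretical quantiles", ylab = "Empirical quantiles",
     main = "Hand-made normal Q-Q plot")
abline(0, 1)

# qqnorm() computes the same coordinates (returned in the original data order)
qq <- qqnorm(X, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
```

Swapping qnorm() for another quantile function, such as qt() or qunif(), gives the corresponding plot for a different reference distribution.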

Start with the standard normal distribution

set.seed(1234)

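# Sample quantiles against standard normal quantiles
qqnorm(X)
# Reference line y = x: points fall on it only if X looks standard normal
abline(0, 1)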

Now try a normal distribution with the same mean and variance as the data

# Quantiles of N(mean(X), sd(X)) at the 10 plotting positions
qnormal <- qnorm(ppoints(10), mean = mean(X), sd = sd(X))

qqplot(x = qnormal, y = X, xlab = "Theoretical quantiles", ylab = "Empirical quantiles",
main = "QQ plot: Empirical vs. theoretical quantiles")
abline(0,1)

Next, try a Student’s t-distribution

tdist <- qt(ppoints(10), df = 9) 
qqplot(x = tdist,y = X,xlab = "T-distribution (df=9) Quantiles",ylab = "Empirical quantiles",
main = "QQ plot: Empirical vs theoretical t quantiles")
abline(0,1)

Try a uniform distribution

uniform <- qunif(ppoints(10), min = -16, max = 10) 
qqplot(x = uniform,y = X, xlab = "Uniform (-16, 10) Quantiles",
ylab = "Empirical quantiles",
main = "QQ plot: Empirical vs. Uniform quantiles")
abline(0,1)

The mean of the data is -4.45 and the standard deviation is 8.769676 (variance about 76.91). Note that no value in the sample lies very close to the mean (the nearest, -7.1, is 2.65 away), and the spread is large relative to the magnitude of the mean. Above, we made QQ plots of the empirical quantiles vs. theoretical quantiles for the following distributions: (1) standard normal, (2) normal with mean -4.45 and sd 8.769676, (3) t-distribution (df = 9), and (4) uniform distribution ranging from -16 to 10.
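
The summary figures quoted above can be reproduced with a few base R calls. This is a small sketch for checking the numbers; the variable names m, s, and v are illustrative.

```r
# Sample from Step 1, repeated so the block runs on its own
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)

m <- mean(X)   # sample mean
s <- sd(X)     # sample standard deviation (n - 1 denominator)
v <- var(X)    # sample variance, equal to s^2

round(c(mean = m, sd = s, variance = v), 4)

# Distance from the mean to the closest observation
min(abs(X - m))
```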

Based on the plots, the sample is very unlikely to come from (1) a standard normal or (3) a t-distribution (df = 9). This makes sense: the sample takes on values such as -15.4 and 8.2, which have an extremely low probability of being drawn from either of those distributions. Of the remaining two, (2) the normal with mean -4.45 and sd 8.769676 and (4) the uniform distribution ranging from -16 to 10, neither appears to be a good fit to the sample data. Still, (2) the normal with mean -4.45 and sd 8.769676 appears to fit better than the others.
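
One way to back up the claim about extreme values is to compute tail probabilities at the sample minimum and maximum under each candidate distribution. The sketch below does this for the standard normal, the t-distribution with 9 degrees of freedom, and the normal fitted to the sample mean and sd. It is a rough plausibility check, not a formal goodness-of-fit test, and the variable names are illustrative.

```r
# Sample from Step 1, repeated so the block runs on its own
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)

lo <- min(X)   # -15.4
hi <- max(X)   #   8.2

# P(value <= lo) and P(value >= hi) under each candidate
p_norm <- c(pnorm(lo), pnorm(hi, lower.tail = FALSE))
p_t    <- c(pt(lo, df = 9), pt(hi, df = 9, lower.tail = FALSE))
p_fit  <- c(pnorm(lo, mean = mean(X), sd = sd(X)),
            pnorm(hi, mean = mean(X), sd = sd(X), lower.tail = FALSE))

# Rows: candidate distributions; columns: lower and upper tail probability
rbind(standard_normal = p_norm, t_df9 = p_t, fitted_normal = p_fit)
```

The tail probabilities are tiny under the first two candidates and moderate under the fitted normal, in line with what the QQ plots suggest.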

Step4: Finally, superimpose the empirical and theoretical densities (empirical pdf, not empirical cdf) on the histogram of the data; see the example code below from the ecdf lab example.

hist(X, probability=TRUE,
main="Histogram with superimposed N(mean = -4.45, sd = 8.769676)")

curve(dnorm(x, mean(X), sd(X)),
add=TRUE, col='red', lwd=2) # theoretical density
lines(density(X), col='green', lwd=2) # empirical density