\((−15.4,−8.8,8.2,3.4,−7.1,4.5,−12.7,5.2,−10.6,−11.2)\)
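library(magrittr)    # provides the %>% pipe
library(kableExtra)  # provides kbl() and kable_styling()

X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)
a <- ecdf(X)   # empirical CDF as a step function
prob <- a(X)   # ECDF evaluated at each observation
dt <- as.data.frame(rbind(X, prob))
# reorder the columns so the observations appear in ascending order
dt1 <- dt[, order(X)]
dt1 %>%
  kbl() %>% kable_styling()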
|  | V1 | V7 | V10 | V9 | V2 | V5 | V4 | V6 | V8 | V3 |
|---|---|---|---|---|---|---|---|---|---|---|
| X | -15.4 | -12.7 | -11.2 | -10.6 | -8.8 | -7.1 | 3.4 | 4.5 | 5.2 | 8.2 |
| prob | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
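The table can be cross-checked by hand, since the empirical CDF at a point x is the fraction of observations less than or equal to x. The following is a minimal sketch in base R, assuming the sample has no ties (true here); `ecdf_manual` is a hypothetical helper name introduced for illustration and is not part of the original solution.

```r
# Sample from the exercise
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)

# Hypothetical helper: proportion of observations <= each value in x
ecdf_manual <- function(x, obs) {
  sapply(x, function(v) mean(obs <= v))
}

prob_manual <- ecdf_manual(X, X)

# With distinct values, the ECDF at each observation equals rank / n
prob_rank <- rank(X) / length(X)

# Both should agree with the built-in ecdf()
all.equal(prob_manual, ecdf(X)(X))
all.equal(prob_rank, ecdf(X)(X))
```

Because the ten values are distinct, each observation adds exactly 1/10 to the ECDF, which is why the `prob` row steps from 0.1 to 1.0.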
\((−15.4,−8.8,8.2,3.4,−7.1,4.5,−12.7,5.2,−10.6,−11.2)\)
plot(ecdf(X))
Hint: Use qqnorm to compare the quantiles of the observed sample with those of a normal distribution, and qqplot to compare them with the quantiles of some other distribution.
Hint: Look at the values and think of a distribution whose support would allow for the observed sample values. Most students would suggest the normal distribution as a starting point; if that doesn’t work, a natural second check is the t-distribution.
You don’t have to find the best fit. The objective of the problem is to teach you to compare observed sample quantiles with the quantiles of a known distribution, an important technique for exploring which distribution might fit the observed data well.
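To make the quantile comparison concrete, the pairs behind a QQ plot can be built by hand. This is a minimal sketch in base R, not the original solution: the empirical quantiles are the sorted observations, and the theoretical quantiles are taken at the plotting positions returned by `ppoints()`. The variable names `emp_q` and `theo_q` are illustrative.

```r
# Sample from the exercise
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)
n <- length(X)

# Plotting positions: probabilities at which theoretical quantiles are taken
p <- ppoints(n)

# Empirical quantiles are the ordered observations
emp_q <- sort(X)

# Theoretical quantiles under a standard normal
theo_q <- qnorm(p)

# qqnorm() computes the same pairs; plot.it = FALSE returns them without drawing
qq <- qqnorm(X, plot.it = FALSE)
all.equal(sort(qq$x), theo_q)
all.equal(sort(qq$y), emp_q)
```

Swapping `qnorm` for another quantile function such as `qt` or `qunif` gives the theoretical axis for a different candidate distribution.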
set.seed(1234)
qqnorm(X)
abline(0,1)
# Plotting positions for the n = 10 observations
p <- ppoints(length(X))

# (2) Normal with the sample mean and standard deviation
qnormal <- qnorm(p, mean = mean(X), sd = sd(X))
qqplot(x = qnormal, y = X,
       xlab = "Theoretical quantiles", ylab = "Empirical quantiles",
       main = "QQ plot: Empirical vs. theoretical quantiles")
abline(0, 1)  # identity line: points lie on it when the distribution fits

# (3) t-distribution with 9 degrees of freedom
tdist <- qt(p, df = 9)
qqplot(x = tdist, y = X,
       xlab = "T-distribution (df=9) Quantiles", ylab = "Empirical quantiles",
       main = "QQ plot: Empirical vs theoretical t quantiles")
abline(0, 1)

# (4) Uniform on (-16, 10), chosen to cover the observed range
uniform <- qunif(p, min = -16, max = 10)
qqplot(x = uniform, y = X,
       xlab = "Uniform (-16, 10) Quantiles", ylab = "Empirical quantiles",
       main = "QQ plot: Empirical vs. Uniform quantiles")
abline(0, 1)
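The standard normal plot and the fitted normal plot are closely related, because normal quantiles obey a location-scale identity. The sketch below checks that identity numerically; `q_std` and `q_fit` are illustrative names, and the plotting calls are left as comments since they only restate what the code above already draws.

```r
# Sample from the exercise
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)
p <- ppoints(length(X))
m <- mean(X)
s <- sd(X)

q_std <- qnorm(p)                    # standard normal quantiles
q_fit <- qnorm(p, mean = m, sd = s)  # quantiles of the normal with mean m, sd s

# Location-scale identity: fitted quantiles are a linear map of the standard ones
all.equal(q_fit, m + s * q_std)

# Consequently these two displays carry the same information:
# qqnorm(X); abline(a = m, b = s)         # standard normal axis, fitted line
# qqplot(q_fit, X); abline(a = 0, b = 1)  # fitted axis, identity line
```

In other words, the shape of the point pattern is the same in both normal plots; only the reference line decides whether the comparison is against the standard normal or the fitted normal.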
The sample mean is -4.45 and the sample standard deviation is 8.769676. Note that none of the values lies very close to the mean (the nearest, -7.1, is 2.65 away): the sample splits into a lower group of six values between -15.4 and -7.1 and an upper group of four values between 3.4 and 8.2. The spread is therefore large relative to the magnitude of the mean. Above, we made QQ plots of the empirical quantiles vs. theoretical quantiles for the following distributions: (1) standard normal, (2) normal with mean -4.45 and sd 8.769676, (3) t-distribution (df = 9), and (4) uniform distribution ranging from -16 to 10.
Based on the plots, the sample is clearly not from (1) a standard normal or (3) a t-distribution (df = 9). This makes sense, given that the sample contains values such as -15.4 and 8.2 that have an extremely low probability under either distribution. Of the remaining two candidates, (2) the normal with mean -4.45 and sd 8.769676 and (4) the uniform distribution ranging from -16 to 10, neither appears to be a good fit to the sample data. Even so, (2) the normal with mean -4.45 and sd 8.769676 appears to fit better than the others.
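The visual comparison can be complemented with a rough numerical summary. The sketch below measures how far the QQ points sit from the identity line for each of the four candidates. The root mean squared distance used here is an ad hoc choice made for illustration, not a formal goodness-of-fit test and not part of the original solution; the names `candidates` and `rms_dev` are likewise illustrative.

```r
# Sample from the exercise
X <- c(-15.4, -8.8, 8.2, 3.4, -7.1, 4.5, -12.7, 5.2, -10.6, -11.2)
p <- ppoints(length(X))

# Theoretical quantiles for the four candidates considered in the text
candidates <- list(
  normal_std    = qnorm(p),
  normal_fitted = qnorm(p, mean = mean(X), sd = sd(X)),
  t_df9         = qt(p, df = 9),
  uniform       = qunif(p, min = -16, max = 10)
)

# Root mean squared vertical distance of the QQ points from the line y = x
rms_dev <- sapply(candidates, function(q) sqrt(mean((sort(X) - q)^2)))

# Smaller values mean the points hug the identity line more closely
round(sort(rms_dev), 2)
```

With only ten observations, small differences between candidates in such a summary should not be over-interpreted; it is best read alongside the plots rather than in place of them.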
hist(X, probability=TRUE,
main="Histogram with superimposed N(mean = -4.45, sd = 8.769676)")
curve(dnorm(x, mean(X), sd(X)),
add=TRUE, col='red', lwd=2) # theoretical density
lines(density(X), col='green', lwd=2) # empirical density