Given two sets:
Identify the mean and standard deviation of the samples following these experiments:
Case 1 - Draw one and estimate mean and standard deviation
A single draw from a set of equiprobable alternatives follows a discrete uniform distribution (to be precise, 2 times a discrete uniform variable) for both samples.
\[ \text{Distribution} = 2 \cdot \mathrm{Unif}[1,5] \]
The mean and variance of this distribution can be calculated as follows:
Given \(X \sim Unif[1,5]\)
For a random variable sampled from a discrete uniform distribution \(X \sim U[a,b]\), the mean and variance are:
\[ E[X] = \frac{a+b}{2}, \qquad V[X] = \frac{(b-a+1)^2-1}{12} \]
Applying the properties of expectation and variance:
The mean: \(E[2X] = 2E[X]\)
2*(1+5)/2
## [1] 6
The variance: \(V[2X] = 4V[X]\), so the standard deviation is \(\sqrt{4V[X]}\):
sqrt(4*((5-1+1)^2-1)/12)
## [1] 2.828427
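As a cross-check on the closed-form values, the same mean and standard deviation can be obtained by enumerating the five equally likely outcomes directly. This is only a sketch: the vector `values` and the `exact_*` names are introduced here for illustration and assume that a single draw yields 2, 4, 6, 8 or 10 with equal probability.

```r
# Assumption: one draw takes each of the values 2, 4, 6, 8, 10 with probability 1/5
values <- 2 * (1:5)

exact_mean <- mean(values)
# Population variance: average squared deviation (divide by 5, not by 5 - 1)
exact_var <- mean((values - exact_mean)^2)
exact_sd <- sqrt(exact_var)

c(mean = exact_mean, variance = exact_var, sd = exact_sd)
```

Note that `sd()` and `var()` in R divide by n - 1, which is why the population variance is computed by hand here.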
Check with a simulation of single draws (100,000 repetitions):
D1 <- replicate(100000, sample(x = s, size = 1, replace = FALSE))
D1_2 <- replicate(100000, sample(x = s2, size = 1, replace = FALSE))
ggplot(mapping=aes(D1)) + geom_bar(fill = "blue", alpha=0.8)
Not surprisingly, a uniform distribution.
Mean and standard deviation of simulation:
data.frame(Mean = c(mean(D1),mean(D1_2)), Standard_Deviation = c(sd(D1),sd(D1_2)), row.names = c("Draw 1 (Small Sample)", "Draw 1 (Large Sample)"))
##                          Mean Standard_Deviation
## Draw 1 (Small Sample) 5.99098           2.828420
## Draw 1 (Large Sample) 6.00204           2.829232
Case 2 - Draw 4 and sum together
This is an application of the Central Limit Theorem: under common conditions, the suitably normalized sum of many i.i.d. random variables converges in distribution to a normal distribution.
For completeness, I want to add that this depends on how the problem is interpreted: to satisfy the CLT conditions as stated above, the 4 cards have to be sampled with replacement to ensure the i.i.d. property.
In this application, the sampling was done without replacement.
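Because the draws are made without replacement, every 4-card subset is equally likely, so the exact mean and standard deviation of the sum can be found by enumerating all subsets rather than by simulating. The sketch below assumes the two sets are two and four copies of 2, 4, 6, 8, 10 respectively. The definitions are not shown in this section, so this is an inference from the sums that the outputs list, and `exact_sum_stats` is a hypothetical helper name.

```r
# Assumption: set definitions inferred from the outputs, not taken from the source
s  <- rep(2 * (1:5), times = 2)   # small set, 10 cards
s2 <- rep(2 * (1:5), times = 4)   # large set, 20 cards

# Exact mean and standard deviation of the sum of n cards drawn without replacement
exact_sum_stats <- function(pop, n) {
  sums <- colSums(combn(pop, n))   # one sum per equally likely subset of positions
  m <- mean(sums)
  c(mean = m, sd = sqrt(mean((sums - m)^2)))   # population sd over all subsets
}

small4 <- exact_sum_stats(s, 4)
large4 <- exact_sum_stats(s2, 4)
rbind(small4, large4)
```

`combn` selects by position, so duplicated card values are counted as distinct cards, which is what equal-probability sampling requires.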
The unique sums over all 4-card combinations for both cases (small and large sample) are shown below:
(D4_comb = unique(colSums(combn(s,4))))
## [1] 12 14 16 18 20 22 24 26 28 30 32 34 36
(D4_comb_2 = unique(colSums(combn(s2,4))))
## [1] 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
My observations:
Simulation:
D4 <- replicate(100000, sum(sample(x = s, size = 4, replace = FALSE)))
D4_2 <- replicate(100000, sum(sample(x = s2, size = 4, replace = FALSE)))
ggplot() + geom_bar(mapping= aes(D4), fill = "blue", alpha = 0.5) +
geom_bar(mapping=aes(D4_2), fill = "orange", alpha = 0.5)
data.frame(Mean = c(mean(D4), mean(D4_2)), Standard_Deviation = c( sd(D4), sd(D4_2)),
row.names = c("Draw 4 (Small Sample)","Draw 4 (Large Sample)"))
##                           Mean Standard_Deviation
## Draw 4 (Small Sample) 24.01156           4.614473
## Draw 4 (Large Sample) 23.99860           5.189908
Case 3 - Draw 10 and sum together
For the small sample, which has 10 elements, the problem is trivial: drawing all 10 cards without replacement always gives the same sum, so the mean is 60 and the standard deviation is 0. For the larger sample, I expect to see an approximately normal distribution again, spanning a larger range.
(D10_comb_2 <- unique(colSums(combn(s2,10))))
## [1] 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80
## [24] 82 84
Observations:
Simulation
D10_2 <- replicate(100000, sum(sample(x = s2, size = 10, replace = FALSE)))
ggplot() + geom_bar(mapping= aes(D10_2), fill = "orange", alpha = 0.5)
data.frame(Mean = c(60, mean(D10_2)), Standard_Deviation = c(0, sd(D10_2)),
row.names = c("Draw 10 (Small Sample)","Draw 10 (Large Sample)"))
##                            Mean Standard_Deviation
## Draw 10 (Small Sample) 60.00000           0.000000
## Draw 10 (Large Sample) 60.05234           6.486901
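The simulated standard deviations can be compared with the standard finite-population result for a sum of n draws without replacement from N items: the variance is \(n\sigma^2\frac{N-n}{N-1}\), where \(\sigma^2\) is the population variance. A minimal sketch follows, again assuming the sets are two and four copies of 2, 4, 6, 8, 10 (inferred from the outputs, not stated in this section). The function name `sd_sum_no_replacement` is hypothetical.

```r
# Assumption: set definitions inferred from the outputs, not taken from the source
s  <- rep(2 * (1:5), times = 2)   # small set, N = 10
s2 <- rep(2 * (1:5), times = 4)   # large set, N = 20

# Standard deviation of the sum of n draws without replacement,
# using the finite-population correction (N - n) / (N - 1)
sd_sum_no_replacement <- function(pop, n) {
  N <- length(pop)
  sigma2 <- mean((pop - mean(pop))^2)   # population variance (divide by N)
  sqrt(n * sigma2 * (N - n) / (N - 1))
}

sd_small10 <- sd_sum_no_replacement(s, 10)    # n equals N, so no variability remains
sd_large10 <- sd_sum_no_replacement(s2, 10)
mean_large10 <- 10 * mean(s2)                 # expected sum is n times the population mean

c(sd_small10 = sd_small10, sd_large10 = sd_large10, mean_large10 = mean_large10)
```

When n equals N the correction factor is zero, which reproduces the trivial small-sample case. For n = 1 it reduces to the single-draw standard deviation.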