A probability question

Given two sets:

Small Sample : {2,2,4,4,6,6,8,8,10,10}
Large Sample : {2,2,2,2,4,4,4,4,6,6,6,6,8,8,8,8,10,10,10,10}

Find the mean and standard deviation of the outcome of each of the following experiments, for both samples:

Case 1 - Draw one and estimate mean and standard deviation

A single draw from either set takes the values 2, 4, 6, 8, 10 with probability 1/5 each, because every value appears equally often. For both samples the draw therefore follows a discrete uniform distribution, or more precisely 2 times a discrete uniform variable on {1, ..., 5}:

\[ \text{Draw} = 2X, \qquad X \sim Unif[1,5] \]

For a random variable drawn from a discrete uniform distribution \(X \sim Unif[a,b]\), the mean and variance are:

\[ E[X] = \frac{a+b}{2}, \qquad V[X] = \frac{(b-a+1)^2-1}{12} \]

Applying the scaling properties of expectation and variance:

The mean: \(E[2X] = 2E[X]\)

2*(1+5)/2

## [1] 6

The standard deviation: since \(V[2X] = 4V[X]\), it is the square root of the scaled variance:

sqrt(4*((5-1+1)^2-1)/12)

## [1] 2.828427
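As a cross-check that needs no formula, the same two numbers can be computed directly from the sets. This is a minimal sketch, assuming the sets are stored in vectors `s` and `s2` (the names used in the simulation code; building them with `rep` is an assumption) and using a hypothetical helper `pop_sd` that divides by N rather than by N - 1 as `sd()` does:

```r
# Assumed definitions of the two sets, named as in the simulation code.
s  <- rep(seq(2, 10, by = 2), each = 2)   # small sample, 10 values
s2 <- rep(seq(2, 10, by = 2), each = 4)   # large sample, 20 values

# Population standard deviation (hypothetical helper): divides by N,
# whereas the built-in sd() divides by N - 1.
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

case1 <- data.frame(
  Mean = c(mean(s), mean(s2)),
  Standard_Deviation = c(pop_sd(s), pop_sd(s2)),
  row.names = c("Draw 1 (Small Sample)", "Draw 1 (Large Sample)")
)
print(case1)
```

Both rows should agree with the values derived above, since a single draw is just a uniformly chosen element of the set.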

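Check with a simulation of 100,000 single draws:

library(ggplot2)
s  <- rep(seq(2, 10, by = 2), each = 2)  # small sample (assumed setup, not shown in the original)
s2 <- rep(seq(2, 10, by = 2), each = 4)  # large sample (assumed setup, not shown in the original)
D1 <- replicate(100000, sample(x = s, size = 1, replace = FALSE))
D1_2 <- replicate(100000, sample(x = s2, size = 1, replace = FALSE))
ggplot(mapping = aes(D1)) + geom_bar(fill = "blue", alpha = 0.8)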

Not surprisingly, a uniform distribution.

Mean and standard deviation of simulation:

data.frame(Mean = c(mean(D1),mean(D1_2)), Standard_Deviation = c(sd(D1),sd(D1_2)), row.names = c("Draw 1 (Small Sample)", "Draw 1 (Large Sample)"))

##                          Mean Standard_Deviation
## Draw 1 (Small Sample) 5.99098           2.828420
## Draw 1 (Large Sample) 6.00204           2.829232

Case 2 - Draw 4 and sum together

This looks like an application of the Central Limit Theorem: under common conditions, the suitably normalized sum of many i.i.d. random variables converges to a normal distribution.
For completeness, I want to add that this depends on how the problem is interpreted: to satisfy the CLT conditions as stated above, the 4 cards have to be sampled with replacement to ensure the i.i.d. property.

In this application, the sampling was done without replacement, so the draws are dependent and, with only 4 of them, the sum is at best approximately bell-shaped.

The unique sums that 4 draws can produce in both cases (small and large sample) are shown below:

(D4_comb <- unique(colSums(combn(s,4))))

##  [1] 12 14 16 18 20 22 24 26 28 30 32 34 36

(D4_comb_2 <- unique(colSums(combn(s2,4))))

##  [1]  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

My observations:

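Both distributions are symmetric around the middle of their range, so the mean equals the median of the series, namely 24. This agrees with linearity of expectation: 4 draws with mean 6 each give 4 × 6 = 24.
The larger sample produces a longer string of unique sums, so I expect a larger standard deviation for it.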
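These guesses can be checked exactly with the standard finite-population argument. The notation here is introduced for this sketch and is not the author's: \(N\) is the size of the set, \(n\) the number of draws, \(S_n\) their sum, and \(\mu = 6\), \(\sigma^2 = 8\) the mean and variance of a single draw from Case 1. When sampling without replacement, each draw still has mean \(\mu\) and variance \(\sigma^2\), but any two distinct draws have covariance \(-\sigma^2/(N-1)\), so:

\[ E[S_n] = n\mu, \qquad V[S_n] = n\sigma^2 - n(n-1)\frac{\sigma^2}{N-1} = n\sigma^2\,\frac{N-n}{N-1} \]

For \(n = 4\) the mean is \(4 \cdot 6 = 24\) for both sets, and the standard deviations are:

\[ \sqrt{4\cdot 8\cdot\frac{10-4}{10-1}} = \sqrt{\frac{64}{3}} \approx 4.62, \qquad \sqrt{4\cdot 8\cdot\frac{20-4}{20-1}} = \sqrt{\frac{512}{19}} \approx 5.19 \]

Under this argument the larger set does give the larger standard deviation, and the reason is that its correction factor \((N-n)/(N-1)\) is closer to 1, not the length of the range of sums as such.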

Simulation:

D4 <- replicate(100000,sum(sample(x = s, size = 4, replace = F)))
D4_2 <- replicate(100000,sum(sample(x = s2, size = 4, replace = F)))
ggplot() + geom_bar(mapping= aes(D4), fill = "blue", alpha = 0.5) + 
  geom_bar(mapping=aes(D4_2), fill = "orange", alpha = 0.5)

data.frame(Mean =  c(mean(D4), mean(D4_2)), Standard_Deviation = c( sd(D4), sd(D4_2)), 
           row.names = c("Draw 4 (Small Sample)","Draw 4 (Large Sample)"))

##                           Mean Standard_Deviation
## Draw 4 (Small Sample) 24.01156           4.614473
## Draw 4 (Large Sample) 23.99860           5.189908
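The simulated values can also be compared with an exact enumeration, since every subset of 4 positions is equally likely when drawing without replacement. This is a minimal sketch, assuming the sets are built with `rep` as below; `exact_sum_stats` is a hypothetical helper name, and it uses the population (divide by N) standard deviation over all subsets:

```r
# Assumed definitions of the two sets, named as in the simulation code.
s  <- rep(seq(2, 10, by = 2), each = 2)   # small sample, 10 values
s2 <- rep(seq(2, 10, by = 2), each = 4)   # large sample, 20 values

# Exact mean and standard deviation of the sum of n values drawn without
# replacement, obtained by enumerating every equally likely subset.
exact_sum_stats <- function(pop, n) {
  sums <- colSums(combn(pop, n))          # one sum per subset of positions
  m <- mean(sums)
  c(Mean = m, Standard_Deviation = sqrt(mean((sums - m)^2)))
}

exact_d4 <- rbind(
  "Draw 4 (Small Sample)" = exact_sum_stats(s, 4),
  "Draw 4 (Large Sample)" = exact_sum_stats(s2, 4)
)
print(round(exact_d4, 4))
```

The enumeration covers 210 subsets for the small set and 4845 for the large one, so it runs instantly and removes the sampling noise from the comparison.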

Case 3 - Draw 10 and sum together

For the small sample, which has 10 elements, the problem is trivial: all 10 elements are drawn every time, so the sum is always 60, giving a mean of 60 and a standard deviation of 0. For the larger sample, I expect to see again a roughly bell-shaped distribution, spanning a larger range.

(D10_comb_2 <- unique(colSums(combn(s2,10))))

##  [1] 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80
## [24] 82 84

Observations:

The mean is 60: the distribution is symmetric, so the mean equals the median of the range, and 10 draws with mean 6 each give 10 × 6 = 60.
The standard deviation should be higher than in both cases of drawing 4 cards, judging by the length of the range alone.

Simulation:

D10_2 <- replicate(100000,sum(sample(x = s2, size = 10, replace = F)))
ggplot() + geom_bar(mapping= aes(D10_2), fill = "orange", alpha = 0.5)

data.frame(Mean =  c(60, mean(D10_2)), Standard_Deviation = c(0, sd(D10_2)), 
           row.names = c("Draw 10 (Small Sample)","Draw 10 (Large Sample)"))

##                            Mean Standard_Deviation
## Draw 10 (Small Sample) 60.00000           0.000000
## Draw 10 (Large Sample) 60.05234           6.486901
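These figures can be compared with a closed-form value. The sketch below assumes the standard finite-population result that the sum of n draws without replacement from N values has variance n·σ²·(N − n)/(N − 1), where σ² is the population variance; `sum_stats` is a hypothetical helper name, and the construction of `s` and `s2` with `rep` is an assumption:

```r
# Assumed definitions of the two sets, named as in the simulation code.
s  <- rep(seq(2, 10, by = 2), each = 2)   # small sample, 10 values
s2 <- rep(seq(2, 10, by = 2), each = 4)   # large sample, 20 values

# Closed-form mean and standard deviation of the sum of n draws
# without replacement from the values in pop.
sum_stats <- function(pop, n) {
  N <- length(pop)
  sigma2 <- mean((pop - mean(pop))^2)     # population variance (divide by N)
  fpc <- (N - n) / (N - 1)                # finite-population correction
  c(Mean = n * mean(pop), Standard_Deviation = sqrt(n * sigma2 * fpc))
}

theory <- rbind(
  "Draw 10 (Small Sample)" = sum_stats(s, 10),
  "Draw 10 (Large Sample)" = sum_stats(s2, 10)
)
print(round(theory, 4))
```

For the small set the correction factor is 0, which reproduces the trivial result; for the large set the formula gives the square root of 800/19, about 6.49, in line with the simulated value.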

Orest Alickolli

11/16/2017