Mahbubul Hasan and Tristan Barton
5/3/2021
The number of customers visiting a department store per day is known to have a Poisson distribution with parameter \(\theta\). Suppose the number of visitors in the first week was known to be 500; however, the number of visitors in the following week was not recorded. We wish to estimate the second week's total visitors, treating this total as missing but incorporating it into the estimation procedure for \(\theta\).
Based on the information given, we know that \({X \sim P_o(\theta)}\), where \(X\) is the number of customers visiting a department store per day. To examine the number of visitors per week, we can let \(Y= \sum_{i=1}^7 X_i\), where \(Y \sim P_o (7 \theta)\).
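As a quick check that the weekly total is again Poisson, the moment generating function of a sum of seven i.i.d. \(P_o(\theta)\) counts factorizes as
\[
M_Y(t)=\prod_{i=1}^{7} M_{X_i}(t)=\prod_{i=1}^{7} e^{\theta(e^t-1)}=e^{7\theta(e^t-1)},
\]
which is the moment generating function of a \(P_o(7\theta)\) random variable.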
For this analysis, we place the priors \(\theta \mid \lambda \sim \text{Exp}(\lambda)\) and \(\lambda \sim \Gamma (2,4)\).
Knowing this, we can derive the full conditionals, which are the distributions from which we will sample our \(\theta\)’s and \(\lambda\)’s:
\(\tau (\theta, \lambda|y) \propto f(y|\theta) \tau (\theta|\lambda) \tau(\lambda)\)
\(\propto \frac {(7\theta)^y e^{-7\theta}} {y!} \lambda e^{-\lambda \theta} \lambda^{2-1} e^{-4\lambda}\)
\(\tau (-|y) \propto \frac {(7\theta)^y e^{-7\theta}} {y!} \lambda e^{-\lambda \theta} \lambda e^{-4\lambda}\) (***)
Here, we work with this expression to find the full conditional of \(\theta\):
\(\tau (\theta|-) \propto (7\theta)^y e^{-7\theta} e^ {-\lambda\theta}\)
\(=7^y \theta^y e^{-7\theta} e^{-\lambda \theta}\)
\(\propto \theta^y e ^{-(\lambda +7)\theta}\) \(=\theta^{(y+1)-1} e^{-(\lambda +7)\theta}\)
Given the above, we now see that \(\pmb{\theta \mid - \sim \Gamma (y+1, \lambda +7)}\).
Now, we work with the same expression (***) above to find the full conditional for \(\lambda\):
\(\tau (\lambda \mid - )\propto \lambda e^{-\lambda \theta} \lambda e^{-4\lambda}\)
\(=\lambda^2 e^{-(\theta+4)\lambda}\)
\(=\lambda^{3-1} e^{-(\theta+4)\lambda}\)
We see that \(\pmb{\lambda \mid - \sim \Gamma (3, \theta +4)}\).
With these full conditionals, we can now initialize \(\lambda^{(0)}\):
\(\theta^{(1)} \mid - \sim \Gamma (y+1, \lambda^{(0)}+7)\)
\(\lambda^{(1)} \mid - \sim \Gamma (3, \theta^{(1)} + 4)\)
From here, we can use \(\lambda^{(0)}\) to generate \(\theta^{(1)}\), then use \(\theta^{(1)}\) to generate \(\lambda^{(1)}\), and so on.
We can now use R to run our Gibbs sampling procedure. We set \(Y\) as the number of visitors in the first week (known to be 500), \(k\) as the total number of iterations to be completed, and \(T\) as the number of initial iterations to be discarded as burn-in, and we initialize \(\lambda^{(0)} = 0.5\) (stored as l). b_mat is our matrix of two columns and 20000 rows that will hold the generated \(\theta\)’s and \(\lambda\)’s. It starts out filled with NA values; as we run the Gibbs sampler, it will be populated with draws from the full conditionals.
library(pander)
set.seed(8670)
Y<-500
k<-20000
T<-10000 #burn-in
l<-0.5
b_mat<-matrix(data = NA_real_, nrow = k, ncol = 2) #k-by-2 matrix of NAs to hold the sampled values
colnames(b_mat)<-c("theta", "lambda"); pander(head(b_mat)) #first few rows

| theta | lambda |
|---|---|
| NA | NA |
| NA | NA |
| NA | NA |
| NA | NA |
| NA | NA |
| NA | NA |
With our b_mat ready to take in data, we can now set up a for-loop that will allow for iterative sampling of our \(\theta\)’s and \(\lambda\)’s. As each is sampled, it will populate b_mat:
for (i in 1:k) {
  t<-rgamma(1, shape = Y+1, rate = l+7) #draw theta | lambda, y ~ Gamma(y+1, lambda+7)
  l<-rgamma(1, shape = 3, rate = t+4) #draw lambda | theta ~ Gamma(3, theta+4)
  b_mat[i, ] <- c(t,l) #store the current (theta, lambda) pair
}
pander(head(b_mat))

| theta | lambda |
|---|---|
| 65.27 | 0.05358 |
| 69.85 | 0.04669 |
| 72.56 | 0.0413 |
| 72.38 | 0.01803 |
| 71.58 | 0.02149 |
| 68.47 | 0.03736 |
This completes the Gibbs sampling procedure: we now have 20000 draws of each of \(\theta\) and \(\lambda\).
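As an optional check before the burn-in and thinning applied in the next step (not shown in the original analysis), one could inspect the \(\theta\) chain with a trace plot and an autocorrelation plot; a minimal sketch:

plot(b_mat[, 1], type = "l", xlab = "iteration", ylab = "theta") #trace plot of the theta chain
acf(b_mat[-(1:T), 1], main = "ACF of theta after burn-in") #autocorrelation of post-burn-in theta draws

High autocorrelation at small lags would support the lag-10 thinning used below.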
We can now use loss functions to approximate the number of visitors in week 2. Under squared error loss, \(L(\theta,d)=(\theta-d)^2\), the Bayes estimator is the posterior mean \(E[\theta \mid Y=y]\), which we approximate by the sample mean of our retained draws. We first remove the first 10000 \(\theta\)’s and \(\lambda\)’s as burn-in, leaving the latter 10000 samples of each (iter_t and iter_l). We then write a for-loop that thins these vectors, keeping every 10th draw, to fill two initially empty vectors samp_t and samp_l. From the resulting 1000 samples of \(\theta\) in samp_t, we compute a Bayes estimate of \(\theta\) and use it to simulate week two’s visitor count.
iter_t<-b_mat[, 1] [-(1:T)] #removing first 10000 iterations as burn-in
iter_l<-b_mat[ ,2] [-(1:T)]
samp_t<-c()
samp_l<-c()
for (i in 1:1000) {
samp_t[i]<-iter_t[10*i] #1000 samples, lagged by 10 samples
samp_l[i]<-iter_l[10*i]
}
y_week2<-rpois(1, 7*mean(samp_t)); y_week2

## [1] 451
However, suppose we instead want to use the loss function \(L(\theta,d)=\theta^3(\theta-d)^2\). Here, we need a weighted Bayes estimate, \(\frac{E[\theta w(\theta)\mid Y=y]}{E[w(\theta)\mid Y=y]}\), where in this case \(w(\theta)=\theta^3\):
\(\frac{E[\theta w(\theta)\mid Y=y]}{E[w(\theta)\mid Y=y]} = \frac{E[\theta^4\mid Y=y]}{E[\theta^3\mid Y=y]}\), where the numerator and denominator are approximated by the sample averages \(\frac{1}{n}\sum_i \theta_i^4\) and \(\frac{1}{n}\sum_i \theta_i^3\) over the \(n=1000\) retained \(\theta\)’s, respectively.
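This form of the weighted Bayes estimator follows from minimizing the posterior expected loss (taking \(w(\theta)\equiv 1\) recovers the posterior mean under squared error loss):
\[
\frac{\partial}{\partial d}\,E\!\left[w(\theta)(\theta-d)^2 \mid Y=y\right]
= -2\,E\!\left[w(\theta)(\theta-d)\mid Y=y\right] = 0
\quad\Longrightarrow\quad
d=\frac{E[\theta\, w(\theta)\mid Y=y]}{E[w(\theta)\mid Y=y]}.
\]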
w_bays<-function(vec){ #weighted Bayes estimate with w(theta) = theta^3
  (sum(vec^4)/length(vec))/(sum(vec^3)/length(vec)) #sample average of theta^4 over sample average of theta^3
}
w_y_week2<-rpois(1, 7*w_bays(samp_t)); w_y_week2

## [1] 490
We see a higher weighted Bayes estimate because more weight is given to the larger values of \(\theta\). In other words, with \(\theta^3\) in the loss function, larger values of \(\theta\) have more bearing on the estimate, while smaller values have relatively less impact than they do under squared error loss.
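We also estimate the posterior variances of \(\theta\) and \(\lambda\) from our thinned samples. The original code chunk is omitted here; a minimal sketch using a plug-in (divide-by-\(n\)) variance estimate, consistent with the var() comparison below, is:

mean(samp_t^2) - mean(samp_t)^2 #plug-in posterior variance estimate for theta (divides by n)
mean(samp_l^2) - mean(samp_l)^2 #plug-in posterior variance estimate for lambda (divides by n)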
## [1] 9.826178
## [1] 0.0005338739
We can compare these to the values reported by R’s var() function:
## [1] 9.836014
## [1] 0.0005344084
We see that our estimated variances are very similar to what R reports; the small discrepancy reflects dividing by \(n\) versus the \(n-1\) divisor used by var().
Unweighted
## [1] 0.4525651
Weighted
## [1] 0.5042535
By running a Bayesian Gibbs sampling procedure using what we already know about the number of department store visitors in the first week, we were essentially able to forecast a count of visitors for the following week. With more prior information (e.g., counts of visitors over the past 12 weeks), we could make a more reliable prediction about the visitor count in the next week. However, this project shows that even with minimal information (the visitor count for only one week), we can still make a prediction about the visitor count in the second week, giving us a glimpse into forecasting and illustrating the power of Bayesian inference.