Mahbubul Hasan and Tristan Barton
5/3/2021
The number of customers visiting a department store per day is known to have a Poisson distribution with parameter \(\theta\). Suppose the number of visitors in the first week was known to be 500; however, the number of visitors in the following week was not recorded. We wish to estimate the second week's total visitors, treating this total as missing but incorporating it into the estimation procedure for \(\theta\).
Based on the information given, we know that \({X \sim P_o(\theta)}\), where \(X\) is the number of customers visiting a department store per day. To examine the number of visitors per week, we can let \(Y= \sum_{i=1}^7 X_i\), where \(Y \sim P_o (7 \theta)\).
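As a quick check that the weekly total is again Poisson, the moment generating function of a sum of seven i.i.d. \(P_o(\theta)\) counts factorizes as
\[
M_Y(t)=\prod_{i=1}^{7} M_{X_i}(t)=\prod_{i=1}^{7} e^{\theta(e^t-1)}=e^{7\theta(e^t-1)},
\]
which is the moment generating function of a \(P_o(7\theta)\) random variable.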
For this analysis, we place the priors \(\theta \mid \lambda \sim \text{Exp}(\lambda)\) and \(\lambda \sim \Gamma (2,4)\).
Knowing this, we can derive the full conditionals, which are the distributions from which we will sample our \(\theta\)’s and \(\lambda\)’s:
\(\tau (\theta, \lambda|y) \propto f(y|\theta) \tau (\theta|\lambda) \tau(\lambda)\)
\(\propto \frac {(7\theta)^y e^{-7\theta}} {y!} \lambda e^{-\lambda \theta} \lambda^{2-1} e^{-4\lambda}\)
\(\tau (-|y) \propto \frac {(7\theta)^y e^{-7\theta}} {y!} \lambda e^{-\lambda \theta} \lambda e^{-4\lambda}\) (***)
Here, we work with this expression to find the full conditional of \(\theta\):
\(\tau (\theta|-) \propto (7\theta)^y e^{-7\theta} e^ {-\lambda\theta}\)
\(=7^y \theta^y e^{-7\theta} e^{-\lambda \theta}\)
\(\propto \theta^y e ^{-(\lambda +7)\theta}\) \(=\theta^{(y+1)-1} e^{-(\lambda +7)\theta}\)
Given the above, we now see that \(\pmb{\theta \mid - \sim \Gamma (y+1, \lambda +7)}\).
Now, we work with the same expression (***) above to find the full conditional for \(\lambda\):
\(\tau (\lambda \mid - )\propto \lambda e^{-\lambda \theta} \lambda e^{-4\lambda}\)
\(=\lambda^2 e^{-(\theta+4)\lambda}\)
\(=\lambda^{3-1} e^{-(\theta+4)\lambda}\)
We see that \(\pmb{\lambda \mid - \sim \Gamma (3, \theta +4)}\).
With these full conditionals, we can now initialize \(\lambda^{(0)}\):
\(\theta^{(1)} \mid - \sim \Gamma (y+1, \lambda^{(0)}+7)\)
\(\lambda^{(1)} \mid - \sim \Gamma (3, \theta^{(1)} + 4)\)
From here, we can use \(\lambda^{(0)}\) to generate \(\theta^{(1)}\), then use \(\theta^{(1)}\) to generate \(\lambda^{(1)}\), and so on.
We can now use R to run our Gibbs sampling procedure. We set \(Y\) as the number of visitors in the first week (known to be 500), \(k\) as the total number of iterations to be completed, and \(T\) as the number of initial iterations to be discarded as burn-in, and we initialize \(\lambda^{(0)} = 0.5\) (stored as l). b_mat is our matrix of two columns and 20000 rows that will hold the generated \(\theta\)’s and \(\lambda\)’s. It starts out filled with NA values; as we run the Gibbs sampler, it will be populated with draws from the full conditionals.
library(pander)
set.seed(8670)
Y<-500
k<-20000
T<-10000 #burn-in
l<-0.5
b_mat<-matrix(data = NA_real_, nrow = k, ncol = 2) #k-by-2 matrix of NAs to hold the sampled values
colnames(b_mat)<-c("theta", "lambda"); pander(head(b_mat)) #first few rows

| theta | lambda |
|---|---|
| NA | NA |
| NA | NA |
| NA | NA |
| NA | NA |
| NA | NA |
| NA | NA |
With our b_mat ready to take in data, we can now set up a for-loop that will allow for iterative sampling of our \(\theta\)’s and \(\lambda\)’s. As each is sampled, it will populate b_mat:
for (i in 1:k) {
  t<-rgamma(1, shape = Y+1, rate = l+7) #draw theta | lambda, y ~ Gamma(y+1, lambda+7)
  l<-rgamma(1, shape = 3, rate = t+4) #draw lambda | theta ~ Gamma(3, theta+4)
  b_mat[i, ] <- c(t,l) #store the current (theta, lambda) pair
}
pander(head(b_mat))

| theta | lambda |
|---|---|
| 65.27 | 0.05358 |
| 69.85 | 0.04669 |
| 72.56 | 0.0413 |
| 72.38 | 0.01803 |
| 71.58 | 0.02149 |
| 68.47 | 0.03736 |
This completes the Gibbs sampling procedure: we now have 20000 draws of each of \(\theta\) and \(\lambda\).
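As an optional check before the burn-in and thinning applied in the next step (not shown in the original analysis), one could inspect the \(\theta\) chain with a trace plot and an autocorrelation plot; a minimal sketch:

plot(b_mat[, 1], type = "l", xlab = "iteration", ylab = "theta") #trace plot of the theta chain
acf(b_mat[-(1:T), 1], main = "ACF of theta after burn-in") #autocorrelation of post-burn-in theta draws

High autocorrelation at small lags would support the lag-10 thinning used below.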
We can now use loss functions to approximate the number of visitors in week 2. Under squared error loss, \(L(\theta,d)=(\theta-d)^2\), the Bayes estimator is the posterior mean \(E[\theta \mid Y=y]\), which we approximate by the sample mean of our retained draws. We first remove the first 10000 \(\theta\)’s and \(\lambda\)’s as burn-in, leaving the latter 10000 samples of each (iter_t and iter_l). We then write a for-loop that thins these vectors, keeping every 10th draw, to fill two initially empty vectors samp_t and samp_l. From the resulting 1000 samples of \(\theta\) in samp_t, we compute a Bayes estimate of \(\theta\) and use it to simulate week two’s visitor count.
iter_t<-b_mat[, 1] [-(1:T)] #removing first 10000 iterations as burn-in
iter_l<-b_mat[ ,2] [-(1:T)]
samp_t<-c()
samp_l<-c()
for (i in 1:1000) {
samp_t[i]<-iter_t[10*i] #1000 samples, lagged by 10 samples
samp_l[i]<-iter_l[10*i]
}
y_week2<-rpois(1, 7*mean(samp_t)); y_week2

## [1] 451
However, suppose we instead want to use the loss function \(L(\theta,d)=\theta^3(\theta-d)^2\). Here, we need a weighted Bayes estimate, \(\frac{E[\theta w(\theta)\mid Y=y]}{E[w(\theta)\mid Y=y]}\), where in this case \(w(\theta)=\theta^3\):
\(\frac{E[\theta w(\theta)\mid Y=y]}{E[w(\theta)\mid Y=y]} = \frac{E[\theta^4\mid Y=y]}{E[\theta^3\mid Y=y]}\), where the numerator and denominator are approximated by the sample averages \(\frac{1}{n}\sum_i \theta_i^4\) and \(\frac{1}{n}\sum_i \theta_i^3\) over the \(n=1000\) retained \(\theta\)’s, respectively.
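This form of the weighted Bayes estimator follows from minimizing the posterior expected loss (taking \(w(\theta)\equiv 1\) recovers the posterior mean under squared error loss):
\[
\frac{\partial}{\partial d}\,E\!\left[w(\theta)(\theta-d)^2 \mid Y=y\right]
= -2\,E\!\left[w(\theta)(\theta-d)\mid Y=y\right] = 0
\quad\Longrightarrow\quad
d=\frac{E[\theta\, w(\theta)\mid Y=y]}{E[w(\theta)\mid Y=y]}.
\]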
w_bays<-function(vec){ #weighted Bayes estimate with w(theta) = theta^3
  (sum(vec^4)/length(vec))/(sum(vec^3)/length(vec)) #sample average of theta^4 over sample average of theta^3
}
w_y_week2<-rpois(1, 7*w_bays(samp_t)); w_y_week2

## [1] 490
We see a higher weighted Bayes estimate because more weight is given to the larger values of \(\theta\). In other words, with \(\theta^3\) in the loss function, larger values of \(\theta\) have more bearing on the estimate, while smaller values have relatively less impact than they do under squared error loss.
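We also estimate the posterior variances of \(\theta\) and \(\lambda\) from our thinned samples. The original code chunk is omitted here; a minimal sketch using a plug-in (divide-by-\(n\)) variance estimate, consistent with the var() comparison below, is:

mean(samp_t^2) - mean(samp_t)^2 #plug-in posterior variance estimate for theta (divides by n)
mean(samp_l^2) - mean(samp_l)^2 #plug-in posterior variance estimate for lambda (divides by n)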
## [1] 9.826178
## [1] 0.0005338739
We can compare these to the values reported by R’s var() function:
## [1] 9.836014
## [1] 0.0005344084
We see that our estimated variances are very similar to what R reports; the small discrepancy reflects dividing by \(n\) versus the \(n-1\) divisor used by var().
Unweighted
## [1] 0.4525651
Weighted
## [1] 0.5042535
By running a Bayesian Gibbs sampling procedure using what we already know about the number of department store visitors in the first week, we were essentially able to forecast a count of visitors for the following week. With more prior information (e.g., counts of visitors over the past 12 weeks), we could make a more reliable prediction about the visitor count in the next week. However, this project shows that even with minimal information (the visitor count for only one week), we can still make a prediction about the visitor count in the second week, giving us a glimpse into forecasting and illustrating the power of Bayesian inference.