Load required packages.
options(repos=c(getOption('repos'), rethinking='http://xcelab.net/R'))
library( tidyverse )
library( tidybayes )
library( tidyr )
library( rethinking )
library( gridExtra )
Set up workspace, i.e., remove all existing data from working memory, initialize the random number generator, turn of scientific notation of large numbers, set a standard theme for plotting.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
rm(list=ls())
set.seed(42)
options(scipen=10000)
## Warning in options(scipen = 10000): invalid 'scipen' 10000, used 9999
You are tasked with estimating the average height of adult males in a country. You collect a small sample of 3 adult males and measure their heights:
The goal is to estimate the population’s average height based on this small sample.
Using the frequentist approach, you estimate the population mean by calculating the sample mean.
## [1] 192
The frequentist estimate of the population mean is 192 cm.
Small sample size: The sample consists of only 3 individuals, which may not be representative of the whole population.
No prior knowledge: The frequentist approach does not incorporate any prior knowledge about human height (e.g., that the global average is around 175 cm). It treats this small sample as fully representative of the population.
Because the sample size is very small, this estimate could easily be biased. It may not provide a realistic estimate of the true average height of the population.
In the Bayesian approach, we can incorporate prior knowledge (such as global height averages) along with the observed data to get a more reasonable estimate.
Let’s start by defining a prior distribution for the average height of adult males. Based on global data, we assume that the average height is approximately 175 cm with a standard deviation of 5 cm.
This prior reflects our belief that the average height is likely around 175 cm, but with some uncertainty.
The data we have are the heights of 3 individuals (195 cm, 192 cm, and 189 cm). This gives us the likelihood of observing the data, which we will use to update our prior belief.
Using Bayes’ Theorem, we can now update our prior belief based on the observed data. The posterior distribution combines our prior and the sample data to give us an updated estimate of the population mean.
## [1] 187.75
## [1] 2.5
The posterior mean is approximately 187.75 cm, and the posterior standard deviation is around 2.5 cm.
The posterior mean of 187.75 cm gives us a more reasonable estimate than the frequentist mean of 180 cm. It accounts for both the prior knowledge about typical human heights and the new data we collected. Additionally, the posterior standard deviation of 2.5 cm quantifies our uncertainty around this estimate.
Prior Probability
Bayesian methods start with a prior – an initial belief
or distribution about the parameter before seeing the data.
Likelihood
The likelihood function measures how well the data fits different
parameter values, given a specific model.
Posterior Probability
Using Bayes’ Theorem, we combine the prior and the likelihood to compute
the posterior, the updated belief about the parameter
after observing the data:
\[P(\theta|D) = \frac{P(D|\theta) P(\theta)}{P(D)}\]
This allows us to continuously update our understanding of the parameter as new data arrives.
Frequentist Thinking:
The test has a 99% sensitivity, so a positive result strongly suggests
the presence of the disease.
Frequentist Limitation:
Frequentist methods cannot formally account for how rare the disease is
(i.e., the base rate of 0.1%). It simply focuses on the
test’s properties (sensitivity and specificity).
\[P(\text{Disease} | \text{Positive}) = \frac{P(\text{Positive} | \text{Disease}) \cdot P(\text{Disease})}{P(\text{Positive})}\]
Where:
The prior probability is
\[P(\text{Disease}) = 0.001\]
The likelihood is:
\[P(\text{Positive} | \text{Disease}) = 0.99\] The standardizing factor accounts for true positives and false positives:
\[P(\text{Positive}) = P(\text{Positive} | \text{Disease}) \cdot P(\text{Disease})+P(\text{Positive} | \text{No Disease}) \cdot P(\text{No Disease})\] Plug in values
\[P(\text{Positive}) = (0.99 \cdot 0.001) + (0.05 \cdot 0.999) = 0.00099 + 0.04995 = 0.05094\]
\[P(\text{Disease} | \text{Positive}) = \frac{0.99 \cdot 0.001}{0.05094} = 0.0194\]
After applying Bayes’ theorem, the probability of actually having the disease is only 1.94%, despite testing positive!
Lack of Prior Knowledge
Frequentist methods do not incorporate prior information or expert
knowledge. Every analysis starts “from scratch,” which can be
inefficient, especially in cases where historical data or prior beliefs
could improve estimates.
P-Value Misinterpretation
Frequentist inference relies heavily on p-values for
hypothesis testing. However, p-values are often misinterpreted. They
only tell us how extreme the data is under a null hypothesis but do not
provide the probability that a hypothesis is true or false.
Confidence Intervals
Frequentist confidence intervals are often misunderstood. A 95%
confidence interval does not mean there is a 95% chance the true
parameter lies within the interval; rather, it means that in 95% of
repeated samples, the interval will contain the true parameter, assuming
the model is correct.
Limited in Small Sample Sizes
Frequentist methods tend to require large sample sizes to provide
reliable estimates. With small datasets, the conclusions can be
unstable, whereas Bayesian methods often provide more robust estimates
by incorporating prior information.
In 1990, Andreas Brehme scored the game-deciding penalty kick in the 89th minute of the World Cup final.
Let’s assume, we want to study the proportion of goals scored in penalty kicks (i.e., the probability of a goal).
From previous trials, we have observed the following sequence: G M G G G M G M G, i.e., 6 out of 9 (G = goal; M = miss).
How plausible is it to observe such a sequence?
Simple example (4 penalty trials): We want to know a players true performance (goal probability), after observing a sequence of trials. How plausible is it this player scores 0, 1, 2, 3, or 4 goals out of 4 trials?
The Forking Garden Here, a player has 4 penalty trials.
After 3 trials, we observe 2 goals and 1 miss.
(Simplified assumption: penalty trials are independent, “drawing with replacement”)
What is the most likely conjecture (1-5) given the observed data? (blue = goal; white = miss)
Let’s start with conjecture 2.
Each circle marks a trail.
For conjecture 2, we count the paths/ways being consistent with the data. Here, there are 3 ways.
(Given the observed sequence, it is impossible that a player missed 3 penalties.)
Now, we do it for all conjectures, and add the ways that might have produced it (i.e., we multiply). That’s it.
Finally, we can calculate the plausibility of each event. Remember, plausibility is probability: set of non-negative real numbers that sum to one.
Belief updating: When
a player scores another goal in a penalty trial (3 out of 4), we can
simple update our belief.
Bayesian Data Analysis For each possible explanation of the sample, count all the ways the sample could happen. Explanations with more ways to produce the sample are more plausible.
We create a dataset (tibble) with penalty trails, now with 9 trials, where “G” = goal and “M” is a miss.
We use the sequence of penalty trials to decide among different possible proportions of goals/misses. Each proportion may be more or less plausible, given the evidence (G M G G G M G M G).
## # A tibble: 9 × 1
## trial
## <chr>
## 1 G
## 2 M
## 3 G
## 4 G
## 5 G
## 6 M
## 7 G
## 8 M
## 9 G
Let’s first have a look at belief updating in the context of a penalty shootout.
Therefore, we create a figure. (Figure 2.5 from the book; don’t worry, you don’t need to understand the code yet.).
(The dashed line represents t-1; the solid line t´.)
Recall the sequence: G M G G G M G M G
In each new penalty trial, the plausibility of the previous trial becomes the prior of the current trial.
We initially (n=0; no penalty trial) assign the same plausibility to each possible value of p, i.e., goal probability.
For trial: n = 1: G: p = 0 is impossible n = 2: M: p = 0.5 is most plausible n = 3: G: p > 0.5 more plausible
Each new sample updates our prior belief. - Data order is NOT relevant - Every posterior is a prior for next observation - Every prior is posterior of some other inference - Sample size is embodied in posterior
Let’s build a model. How? 1. Design the model (data story: how is the data distributed?): e.g., height = normal; penalty (yes/no) = binomial; number of goals = Poisson 2. Condition on data (update) 3. Evaluation the model (critique)
We need: 1. Variables: observed (such as the counts of G and M) and unobserved such as p (which we usually call parameters that need to be inferred from other variables). 2. Likelihood: a function assigned to an observed variable.
What’s the relative number of ways to see goals (G) for given N=3 and p?
Sample sequence: G M G
It is the probability of scoring (p) x not scoring (1-p) x scoring (p)
\[ p \cdot (1-p) \cdot p \]
More general:
\[ p^G \cdot (1-p)^M \]
Now, we normalize this to probability.
We define how plausible any combination of G and M would be, for a specific value of p.
Here, we use the binomial distribution: The counts of “goal” G and “miss” M are distributed binomially, with probability p of “goal” on each trial (N).
\[ \text{Posterior probability of}\ p = \frac{(G+M+1)!}{G!M!} p^G (1 - p)^{M} \] This term makes sure that probabilities add up to 1.
\[ \frac{(G+M+1)!}{G!M!} \]
The binomial distribution is implemented in R. Given a probability of .5 (i.e., goals and misses are equally likely), the binomial likelihood of scoring 6 goals out of 9 trials is:
G, N, p
in R: x, size, prob
## [1] 0.1640625
Interpretation: Given a true goal probability of 50%, the likelihood of scoring in 6 out of 9 trails is… (see a above)
Recall, the sequence of events doesn’t matter!
Exercise > How does the likelihood change, when we assume that the true probablity of goals is higher (say 60%)? > How does the likelihood change, when we assume that Andreas Brehme scored in all of his last 9 trials? > What is the likelihood if Brehme scored in all of his trials and the probability of scoring a goal is 100%? > What is the likelihood if Brehme scored in 8 his 9 trials and the probability of scoring a goal is 100%?
Now let’s plot the binomial likelihood. Which probability values are most plausible?
## # A tibble: 101 × 1
## probabilities
## <dbl>
## 1 0
## 2 0.01
## 3 0.02
## 4 0.03
## 5 0.04
## 6 0.05
## 7 0.06
## 8 0.07
## 9 0.08
## 10 0.09
## # ℹ 91 more rows
We see, a goal probability of around .65 is most plausible.
Exercise > Change the number of goals (x) and see what happens to the distribution. > Change the number of trials (size) and see what happens to the distribution.
The observed variables G and M are given relative counts through the binomial distribution.
\[ G \sim Binomial(N,p) \\ p \sim Uniform(0,1) \] How to read it: - First line: Scoring a goal follows a binomial distribution with a given N (number of trails), and an unknown probability p. - Second line: Probability p has a uniform prior distribution (each value is equally plausible).
N = G + M p is the unobserved parameter
Excursus: The distribution zoo: https://ben18785.shinyapps.io/distribution-zoo/
\[ \text{Posterior distribution}=\frac{(\text{Probability of the observed data}) \cdot (\text{Prior})}{\text{Average probability of the data}} \]
(Average probability of the data is often called the “evidence” or the “average likelihood” and we’ll get a sense of what that means as we go along)
Using Bayes’ theorem in the context of scoring goals:
\[ \operatorname{Pr}(p | G, M) = \frac{\operatorname{Pr}(G, M | p) \cdot \operatorname{Pr}(p )}{\operatorname{Pr}(G, M)}. \]
Let’s use the formula to explain the figure from above (solid lines = posterior): (Sequence: G M G G G M G M G) Posterior: Finding most plausible values
A note on different distributions Let’s check different combinations of priors and likelihood. (Creating Figure 2.6: prior, likelihood, and posterior)
The posterior distribution is a product of the prior distribution and the likelihood: - Top: a flat prior is proportional to likelihood. - Middle: a step prior, zero probability to all values below 0.5, results in truncated posterior. - Bottom: a peaked prior shifts and skews the posterior.
We condition the prior on the data. However, there are different ways to do it (i.e., different conditioning techniques). - Grid approximation - Quadratic approximation - Markov chain Monte Carlo (MCMC)
Here, we consider only a finite grid of parameter values.
How to do it (see code example below): 1. Define the grid: how many data points to use in estimating the posterior 2. Compute the value of the prior at each parameter value on the grid 3. Compute the likelihood at each parameter value. 4. Compute the unstandardized posterior at each parameter value, by multiplying the prior by the likelihood 5. Standardize the posterior, by dividing each value by the sum of all values
Here, we use grid approximation for our penalty example. p.grid = goal probability prior = prior probability (here, ground truth) of scoring a penalty likelihoods = plausibility of seeing such a sequence
\[ \operatorname{Posterior} = \frac{\operatorname{Likelihood} \cdot \operatorname{Prior}}{\operatorname{Standardizing factor}} = \frac{\operatorname{Pr}(G, M | p) \cdot \operatorname{Pr}(p )}{\operatorname{Pr}(G, M)}. \]
## # A tibble: 10 × 5
## p_grid prior likelihood unstd_posterior posterior
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0.5 0 0 0
## 2 0.111 0.5 0.000111 0.0000555 0.000123
## 3 0.222 0.5 0.00476 0.00238 0.00528
## 4 0.333 0.5 0.0341 0.0171 0.0379
## 5 0.444 0.5 0.111 0.0555 0.123
## 6 0.556 0.5 0.217 0.108 0.241
## 7 0.667 0.5 0.273 0.137 0.303
## 8 0.778 0.5 0.204 0.102 0.227
## 9 0.889 0.5 0.0568 0.0284 0.0631
## 10 1 0.5 0 0 0
(The standardizing factor makes sure that the posterior adds up to 1. It shows the most plausible value for a given p in the grid.)
Plot this data.
Exercise: > What happens if you create a sparser grid (i.e., say, 5 points)) > What happens if you create a denser grid (i.e., say, 1000 points))
It assumes that the shape of the distribution near the peak is Gaussian (normal). This is convenient as a Gaussian distribution can be described by its location (i.e, the mean) and its spread (i.e., variance).
## mean sd 5.5% 94.5%
## p 0.6666664 0.1571338 0.4155362 0.9177966
Interpretation: assuming the posterior is Gaussian, it is maximized at 0.67, and its standard deviation is 0.16.
Now, let’s plot the posterior using different N’s. (Comparing grid posterior – black – and quadratic approximation posterior – grey)
Interpretation: with more observations we are more certain in our posterior.
Suppose the penalty kick data had turned out to be 8 goals in 15 trials. Construct the posterior distribution, using grid approximation. Use the same flat prior as before.
## # A tibble: 20 × 5
## p_grid prior likelihood unstd_posterior posterior
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0.5 0 0 0
## 2 0.0526 0.5 0.000000260 0.000000130 0.000000219
## 3 0.105 0.5 0.0000445 0.0000223 0.0000375
## 4 0.158 0.5 0.000747 0.000373 0.000629
## 5 0.211 0.5 0.00475 0.00237 0.00400
## 6 0.263 0.5 0.0175 0.00873 0.0147
## 7 0.316 0.5 0.0447 0.0223 0.0376
## 8 0.368 0.5 0.0876 0.0438 0.0737
## 9 0.421 0.5 0.139 0.0693 0.117
## 10 0.474 0.5 0.182 0.0912 0.154
## 11 0.526 0.5 0.203 0.101 0.171
## 12 0.579 0.5 0.191 0.0953 0.160
## 13 0.632 0.5 0.150 0.0751 0.126
## 14 0.684 0.5 0.0968 0.0484 0.0815
## 15 0.737 0.5 0.0489 0.0244 0.0412
## 16 0.789 0.5 0.0178 0.00890 0.0150
## 17 0.842 0.5 0.00398 0.00199 0.00335
## 18 0.895 0.5 0.000378 0.000189 0.000319
## 19 0.947 0.5 0.00000467 0.00000234 0.00000393
## 20 1 0.5 0 0 0
Sampling: draw samples (i.e., observations) from a distribution/bucket (with replacement).
Why? In many situation, working with the full posterior is infeasable.
Here, we work with samples from the posterior, which helps us to (1) visualize uncertainty, (2) compute credible intervals, and (3) simulate observations. From a mathematical perspective, it is hence easier to work with samples.
Note: some procedures to compute the posterior, like MCMC, produce nothing but samples.
Procedure: 1. We compute or approximate the posterior 2. We sample with replacement from posterior 3. Compute stuff from samples
Here we use grid approximation to generate samples for our penalty example.
## # A tibble: 1,000 × 4
## p_grid prior likelihood posterior
## <dbl> <dbl> <dbl> <dbl>
## 1 0 0.5 0 0
## 2 0.00100 0.5 8.43e-17 8.43e-19
## 3 0.00200 0.5 5.38e-15 5.38e-17
## 4 0.00300 0.5 6.11e-14 6.11e-16
## 5 0.00400 0.5 3.42e-13 3.42e-15
## 6 0.00501 0.5 1.30e-12 1.30e-14
## 7 0.00601 0.5 3.87e-12 3.88e-14
## 8 0.00701 0.5 9.73e-12 9.74e-14
## 9 0.00801 0.5 2.16e-11 2.16e-13
## 10 0.00901 0.5 4.37e-11 4.38e-13
## # ℹ 990 more rows
Let’ create some samples from the posterior. It’s like drawing from a bucket, where some values are more likely (i.e., see code: weight = posterior)
## Rows: 10,000
## Columns: 4
## $ p_grid <dbl> 0.5805806, 0.7537538, 0.4704705, 0.6726727, 0.6016016, 0.66…
## $ prior <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,…
## $ likelihood <dbl> 0.23735584, 0.23001972, 0.13525211, 0.27292844, 0.25181752,…
## $ posterior <dbl> 0.0023759343, 0.0023024997, 0.0013538750, 0.0027320164, 0.0…
Now, we plot the samples. - The first plot shows the goal probability for each sample. - The second plot flips the first plot by 90 degree and shows the density.
The estimated density is very similar to ideal posterior that we computed via grid approximation. This is a crudely replicated version the posterior density.
“Once your model produces a posterior distribution, the model’s work is done. But your work has just begun. It is necessary to summarize and interpret the posterior distribution. Exactly now it is summarized depends upon your purpose” (p. 53).
Some sample questions (summary tasks): - How much posterior probability is below/above/between specified parameters? - Which parameter values contain 50%/80%/95% of the posterior probability (“confidence” intervals) - Which parameter values maximizes posterior probability? (point estimates)
Let’s start.
What is the likelihood for a goal probability less than 50%?
(Here you find more information on “summarise”: https://r4ds.had.co.nz/transform.html#grouped-summaries-with-summarise)
## # A tibble: 1 × 1
## sum
## <dbl>
## 1 0.172
What is the likelihood for a goal probability of more than 50% but less than 75%?
## # A tibble: 1 × 1
## sum
## <dbl>
## 1 0.604
Next, we plot the previous “questions”?
Here, we plot the lower and middle 80% quantile.
What is the value for the middle 80% quantile of goal probability?
## # A tibble: 1 × 6
## p_grid .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.646 0.450 0.810 0.8 median qi
Let’s introduce two common types of intervals. - Percentile intervals (PI): equal area in each tail - Highest posterior density interval (HPDI): narrowest interval containing mass (always includes the highest point)
Let’s compare PI and HPDI.
Percentage intervals: What are the value for the middle 50%, 80%, and 99% quantile of goal probability?
## # A tibble: 3 × 6
## p_grid .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.646 0.544 0.739 0.5 median qi
## 2 0.646 0.450 0.810 0.8 median qi
## 3 0.646 0.273 0.918 0.99 median qi
HPDI: What is the 50% highest posterior density interval for goal probability (HPDIs)?
## # A tibble: 1 × 6
## p_grid .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.672 0.571 0.764 0.5 mode hdi
Here, we are interested in values for central tendency (mean, mode, median).
In tidybayes, this is done straight forward, using the following syntax: tendency_interval()
For example, this is how we can get the mean.
## # A tibble: 1 × 6
## p_grid .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.636 0.363 0.883 0.95 mean hdi
## # A tibble: 1 × 6
## p_grid .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.636 0.349 0.874 0.95 mean qi
## [1] 0.6724069
Let’s check mean and median.
## # A tibble: 1 × 2
## mean median
## <dbl> <dbl>
## 1 0.636 0.646
Why?
First, we create some dummy data from the binomial likelihood, with 2 trials (we use dbinom).
Given a true goal probability of 0.7 and 2 trials, what’s the likelihood of seeing 0, 1, or 2 goals?
\[\operatorname{Pr} (G|n, p) = \frac{n!}{G!(n - G)!} p^G (1 - p)^{n - G}.\] Letting n = 2, p(G) = .7, and G_observed = 0 through 2, the densities are:
## # A tibble: 3 × 4
## n probability G_observed likelihood
## <dbl> <dbl> <int> <dbl>
## 1 2 0.7 0 0.09
## 2 2 0.7 1 0.42
## 3 2 0.7 2 0.49
There is a 9% chance of seeing G = 0,…
Now, let’s sample from this distribution. We could also use n_samples from above. But R has a build-in sampling function (rbinom)
We generate a number of random dummy observations that we expect to appear in proportion to its likelihood.
## # A tibble: 3 × 3
## # Groups: draws [3]
## draws n goal_propability
## <int> <int> <dbl>
## 1 0 85 0.085
## 2 1 420 0.42
## 3 2 495 0.495
Let’s plot n = 10 penalty trials (we assume that the true goal probability is 0.7)
Exercise: > How does the distribution change if you change n, size, or prob?
Suppose the penalty kick data had turned out to be 4 goals in 15 trials. Construct the posterior distribution, using grid approximation. Use the same flat prior as before. (you can copy the code from above)
Draw 10,000 samples from the grid approximation from above. Then use the samples to calculate the 90% HPDI for p.
Construct a posterior predictive check for this model and data.
## # A tibble: 20 × 5
## p_grid prior likelihood unstd_posterior posterior
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 1 0 0 0
## 2 0.0526 1 5.78e- 3 5.78e- 3 4.87e- 3
## 3 0.105 1 4.93e- 2 4.93e- 2 4.15e- 2
## 4 0.158 1 1.28e- 1 1.28e- 1 1.08e- 1
## 5 0.211 1 1.99e- 1 1.99e- 1 1.68e- 1
## 6 0.263 1 2.28e- 1 2.28e- 1 1.92e- 1
## 7 0.316 1 2.09e- 1 2.09e- 1 1.76e- 1
## 8 0.368 1 1.60e- 1 1.60e- 1 1.35e- 1
## 9 0.421 1 1.05e- 1 1.05e- 1 8.85e- 2
## 10 0.474 1 5.90e- 2 5.90e- 2 4.97e- 2
## 11 0.526 1 2.82e- 2 2.82e- 2 2.38e- 2
## 12 0.579 1 1.13e- 2 1.13e- 2 9.52e- 3
## 13 0.632 1 3.69e- 3 3.69e- 3 3.10e- 3
## 14 0.684 1 9.32e- 4 9.32e- 4 7.85e- 4
## 15 0.737 1 1.69e- 4 1.69e- 4 1.42e- 4
## 16 0.789 1 1.91e- 5 1.91e- 5 1.61e- 5
## 17 0.842 1 1.04e- 6 1.04e- 6 8.79e- 7
## 18 0.895 1 1.54e- 8 1.54e- 8 1.30e- 8
## 19 0.947 1 9.44e-12 9.44e-12 7.95e-12
## 20 1 1 0 0 0
## # A tibble: 10,000 × 4
## p_grid prior likelihood posterior
## <dbl> <dbl> <dbl> <dbl>
## 1 0 1 0 0
## 2 0.000100 1 1.36e-13 2.18e-16
## 3 0.000200 1 2.18e-12 3.49e-15
## 4 0.000300 1 1.10e-11 1.76e-14
## 5 0.000400 1 3.48e-11 5.57e-14
## 6 0.000500 1 8.49e-11 1.36e-13
## 7 0.000600 1 1.76e-10 2.81e-13
## 8 0.000700 1 3.25e-10 5.21e-13
## 9 0.000800 1 5.54e-10 8.87e-13
## 10 0.000900 1 8.87e-10 1.42e-12
## # ℹ 9,990 more rows
## # A tibble: 1 × 6
## p_grid .lower .upper .width .point .interval
## <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.275 0.124 0.475 0.9 mode hdi