class: center, middle, inverse, title-slide

# MATH 4140/5140
## Lecture 1: Introduction and Review
### Ziwei Ma
### UTC
### updated: 2022-01-08

---
class: inverse, center, middle

# Introduction

---
## What is statistics?

- inference from data
- building models of the world
- optimal prediction

--

## What is "mathematical" statistics? (as opposed to "regular" statistics)

- The goal is
  - to develop mathematical theory (theorems, approximations, etc.)
  - for why statistical procedures work (or not)... this, in turn,
  - leads to better procedures.

*The material we’ll go over this semester mostly dates back to the early part of the last century.*

---
class: inverse, center, middle

# Outline

---
## Probability basics

--

## Decision theory

--

## Parameter estimation

--

## Hypothesis testing

---
# Probability basics

- Fundamental idea: probability distributions, parameterized by some finite number of parameters, serve as good models for observed data.

- We’ll introduce a small zoo of useful probability distributions, and talk about some limit theorems, the jewels of probability theory, which permit extremely useful asymptotic simplifications of the theory in the limit of lots of data.

---
# Decision theory

- How to behave optimally under uncertainty; this will provide us with a framework for deciding which statistical procedures are “best,” or at least better than others.

---
# Parameter estimation (main body)

- Statistical inference corresponds to an inverse problem: given data, we want to answer questions about the “true” underlying state of the world, e.g., the true parameter indexing the distribution that gave rise to our observed data. Estimation is about choosing between a continuum of possible parameters.

- How to decide between different “estimators” (that is, rules for estimating parameters from data).

- The concept of “sufficiency,” or “data reduction”: that is, how to decide which aspects of the data matter for inference, and which aspects can be safely ignored.

- Asymptotic properties
  - **consistency**
  - **efficiency**

---
# Hypothesis Testing

- Testing and estimation are two sides of the same coin. Whereas estimation is about choosing one parameter from many, testing is about dividing the parameters into two sets and then deciding between these groups. So testing could be considered a special case of estimation, but it is special enough (and comes up often enough in practice) to warrant its own discussion and techniques.

---
class: inverse, center, middle

# Probability theory review

---
# Probability theory review

> "It is remarkable that a science which began with the consideration of games of chance should have become the most important object of human knowledge. <br> ...The most important questions of life are indeed, for the most part, really only problems of probability. <br> ...The theory of probabilities is at bottom nothing but common sense reduced to calculus."
> <footer>Laplace, Theorie analytique des probabilites, 1820</footer>

---
# Basics

- Sample space `\(\Omega\)`
- Events
- Probability
  - the long-run frequency of the event occurring
  - one’s degree of belief that the event will happen
- A probability function is a scalar function on events that satisfies three conditions:
  - **positivity:** `\(P(A)\geq 0\)`
  - **normalization:** `\(P(\Omega) = 1\)`
  - **additivity:** `\(A_{i} \cap A_{j}=\varnothing, \, \forall i \neq j \Longrightarrow P\left(\cup_{i} A_{i}\right)=\sum_{i} P\left(A_{i}\right)\)`
- Some important implications
  - `\(P\left(A^{c}\right)=1-P(A)\)`
  - `\(P(\varnothing)=0\)`
  - `\(P(A \cup B)=P(A)+P(B)-P(A \cap B)\)`

---
# Conditional probability

- Assume `\(P(A)>0\)`; we define the "conditional" probability
$$ P(B \mid A)= \frac{P(A \cap B)}{ P(A)} $$
- This is basically just our original probability function, rescaled and with our attention restricted to the set `\(A\)`. (Note that if `\(A\)` and `\(B\)` are disjoint, then `\(P(B|A) = 0\)` no matter how big `\(P(B)\)` is.)
- `\(P(B)\)` is the probability of `\(B\)` before seeing `\(A\)`, so it’s often called the “prior” probability of `\(B\)`. For similar reasons, `\(P(B|A)\)` is called the “posterior” probability of `\(B\)` given `\(A\)`.
- Another form of the same equation:
$$ P(B \mid A) P(A)=P(A \cap B)=P(A \mid B) P(B) $$

---
# Conditional probability (cont.)

- Calculations with conditional probabilities can be counterintuitive at first, so it’s important to get some practice on your own.

**Example** Imagine a disease occurs with 0.1% frequency in the population. Now let’s say there’s a blood test that comes back positive with 99% probability if the disease is present (a 99% detection rate), but also comes back positive with 2% probability if the disease is absent (a 2% false-alarm rate). What is the conditional probability that someone has the disease, given a positive test result?
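---
# Conditional probability (cont.)

Below is a minimal R sketch of the calculation (added here as an illustration, not part of the original notes); the numbers are the ones given on the previous slide.

```r
# Bayes' rule for the disease-testing example:
# prevalence 0.1%, P(+ | disease) = 0.99, P(+ | no disease) = 0.02
p_disease           <- 0.001
p_pos_given_disease <- 0.99
p_pos_given_healthy <- 0.02

# Total probability of a positive test (law of total probability)
p_pos <- p_pos_given_disease * p_disease +
         p_pos_given_healthy * (1 - p_disease)

# Posterior probability of disease given a positive test
p_pos_given_disease * p_disease / p_pos   # roughly 0.047, i.e. under 5%
```

Even with this accurate-sounding test, a positive result leaves the probability of disease below 5%, because the disease is so rare; this is the counterintuitive part.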
---
# Independence

> "Independence may be considered the single most important concept in probability theory, demarcating the latter from measure theory and fostering an independent development. In the course of this evolution, probability theory has been fortified by its links with the real world, and indeed the definition of independence is the abstract counterpart of a highly intuitive and empirical notion."
> <footer>Chow and Teicher, <br> Probability Theory: Independence, Interchangeability, Martingales, 1978</footer>

---
# Independence

- We say two events `\(A\)` and `\(B\)` are "independent" if seeing `\(A\)` tells us nothing about `\(B\)`, and vice versa. To be precise, assume `\(P(A)>0\)` and `\(P(B)>0\)`; then `\(A\)` and `\(B\)` are independent events if
`$$P(A|B)=P(A)$$`

---
# Random variables

In many cases, we are interested in a function of the sample space.

A **random variable** is a function on a sample space. We'll be dealing with a lot of random variables this semester, so we'll use the abbreviation "r.v."

- In general, there are two types of r.v.:
  - **discrete**
  - **continuous**

---
# Discrete r.v.

- Examples of discrete r.v.
  - the value of a coin toss
  - rolling a die, lotto numbers
  - the flip on which the first head appears in a sequence of coin flips
- Two important functions associated with a discrete r.v.
  - the probability mass function (pmf):
$$ p(u)=P(X(\omega)=u) $$
  - the cumulative distribution function (cdf):
$$ F(u)=\sum_{i \leq u} p(i)=P(X(\omega) \leq u) $$
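---
# Discrete r.v. (cont.)

As an added illustration (not in the original notes), here is a short R sketch of the pmf and cdf for the "first head in a sequence of coin flips" example, assuming a fair coin; base R's `dgeom()`/`pgeom()` count the failures before the first success, so the flip on which the first head appears is that count plus one.

```r
# pmf and cdf of the flip number k on which the first head appears (fair coin)
k   <- 1:6
pmf <- dgeom(k - 1, prob = 0.5)   # p(k) = P(X = k) = (1/2)^k
cdf <- pgeom(k - 1, prob = 0.5)   # F(k) = P(X <= k) = 1 - (1/2)^k
round(rbind(k = k, pmf = pmf, cdf = cdf), 4)

# The cdf is just the running sum of the pmf
all.equal(cdf, cumsum(pmf))
```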
---
# Continuous r.v.

- Examples of continuous r.v.
  - the angle at which a merry-go-round comes to rest
  - the interval of time between the arrivals of two buses
  - the temperature at 3 pm tomorrow
- Two important functions associated with a continuous r.v.
  - The cumulative distribution function can be defined as above:
$$ F(u)=P(X(\omega) \leq u) = \int_{-\infty}^u f(x)dx $$
  - Note that this is a non-decreasing, continuous function.
  - A basic formula:
$$ P(a<X(\omega)<b)=F(b)-F(a) $$
  - The probability density function is a non-negative function, i.e. `\(f(u)\geq 0\)`, such that
      - `\(\int_{-\infty}^{\infty} f(u) d u=1\)`
      - `\(P(X \in A)=\int_{A} f(u) d u\)`
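---
# Continuous r.v. (cont.)

As an added illustration (not in the original notes), the sketch below uses the standard normal density (`dnorm`) and cdf (`pnorm`), an arbitrary choice for demonstration, to check the defining properties of a pdf and the identity `\(P(a<X<b)=F(b)-F(a)\)`.

```r
# The density integrates to 1 over the whole real line
integrate(dnorm, lower = -Inf, upper = Inf)$value   # ~ 1

# P(a < X < b) = F(b) - F(a): direct integration vs. the cdf
a <- -1
b <-  2
integrate(dnorm, lower = a, upper = b)$value        # integral of f over (a, b)
pnorm(b) - pnorm(a)                                 # F(b) - F(a), same number
```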