Econometrics is statistical inference for economics. Probability theory (Ch 2) is the foundation of statistics (Ch 3), and statistics is the foundation of statistical inference (Ch 4 and further).

Probability Theory

Variables vs Random variables

In economics, we regard the world as random and everything that happens in the world as a random process or random variable.

A variable is an object with an unknown but fixed value. For example:

  1. I have a 6-year contract with the university which states what my income is going to be during this period. Then my income is a fixed value for 6 years.
  • To you, my income is a variable because, although this value is fixed, you do not know what it is.

  • To me, my income is not a variable because I know its value.

  2. \(2x=5\)
  • \(x\) is a variable because it is fixed but unknown. Once we solve this equation, we see that the value of \(x\) is \(5/2\). At this point, \(x\) is not a variable anymore, but just a known number.
  3. “I am thinking of a natural number between \(1\) and \(3\) inclusive.”
  • To you, this number is a variable because its value is fixed but unknown. This value can be \(1\), \(2\), or \(3\).

  • To me, this number is not a variable because I know which number I am thinking of.

A random variable is a variable with a value that is unknown and not fixed. For example:

  1. My income at the end of my 6-year contract. Or my income 10 years from now.
  2. The number that comes out after rolling a die.

Outcomes

An outcome is the actual value taken by a random variable. For example, if we roll a die and we get a \(6\), then \(6\) is called the outcome of the random variable called “number of a die roll”.

Notation: We use capital letters for random variables and lower case letters for their outcomes. For example, \(X\) is “number of a die roll” and, if we see a \(6\) after we roll the die, then \(x=6\) is the outcome.

Discrete vs continuous random variables

Discrete random variables are those whose outcomes take on a discrete set of values. If \(X\) is a discrete random variable then we can list its outcomes as \(x_1, x_2,...\). This list does not have to be finite; what matters is that the outcomes can be listed, i.e. indexed by the natural numbers.

Continuous random variables take on a continuum of possible values. That is, their outcomes can be any number on the real line, and they cannot be listed. For example, you cannot possibly list every real number between 0 and 1. Other examples are distance, age, height, temperature, time, income. The values of these random variables can be recorded on a discrete scale (as natural numbers) but conceptually they can be any real number.

Note: In set theory, cardinality refers to the total number of elements in a set. Cardinality can be finite or infinite, and an infinite set can be countable (its elements can be listed, like the natural numbers) or uncountable. For example, the set {1,2,3} has cardinality \(3\), which is finite, while the set \([0,1]\) has uncountably infinite cardinality. Continuous random variables take values in an uncountable set, while discrete random variables take values in a finite or countably infinite set.

Sample space

The sample space is the set of all possible outcomes of a random variable. It is usually denoted as \(\Omega\). For example:

  1. \(X\)=“number on the face of a die”. Then \(\Omega_1\) = {1,2,3,4,5,6}.
  2. \(X\)=“your letter grade for this course”. Then \(\Omega_2\) = {A,B,C,D,F}.
  3. \(X\)=“salary for your first job”. Then \(\Omega_3=\mathbb{R}_+\) (the nonnegative real line, i.e. the positive reals together with zero).

An event is a subset of the sample space.

  1. \(A\) = “I roll an odd number” so \(A\)={1,3,5} \(\subset\Omega_1\)
  2. \(A\) = “you pass this course” so \(A\)=? \(\subset\Omega_2\). Another example could be “your grade is at least a B.”
  3. Give an example of \(A \subset \Omega_3\)

Probability

Every possible outcome of a random variable has a probability. The probability of an outcome tells us how likely it is for that outcome to happen. \(P(X=x)\) tells us how likely it is for the random variable X to equal x.

Probability distribution

Discrete Random Variable

For a discrete random variable, the probability distribution is a list of all the different values that X can take (so it is the set of all possible outcomes, \(\Omega\)), along with the probability that X assumes each of those values. This list is summarized by a function called probability mass function or pmf. The notation for the pmf is \(P(X=x)\).

  • For example, take a coin and toss it three times. We are interested in X = the total number of heads that we get in three tosses of the coin. Our sample space is \(\Omega\) = {0,1,2,3}, and the probability of each value can be computed by counting toss sequences; for instance, \[ P(X=0) = P(TTT) = P(T)^3 = \left(\frac{1}{2}\right)^3 = \frac{1}{8}\] What is the pmf for X? (The sketch below enumerates it.)
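A minimal Python sketch (not part of the original notes) that enumerates all eight equally likely toss sequences and tallies the pmf:

```python
from itertools import product

# Enumerate all 2^3 equally likely sequences of three fair-coin tosses
# and count the heads in each to build the pmf of X = "number of heads".
pmf = {x: 0.0 for x in range(4)}
for seq in product("HT", repeat=3):
    pmf[seq.count("H")] += 1 / 8  # each sequence has probability (1/2)^3

print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
assert abs(sum(pmf.values()) - 1) < 1e-12  # the probabilities sum to one
```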

Pmfs must satisfy the following properties: \[ 0 \leq P(X=x_i) \leq 1\] \[ \sum\limits_{i=1}^k P(X=x_i) =1\]

The cumulative distribution function or CDF is \(P(X \leq x_i)\).

  • For the coin toss example, the probability of getting two or fewer heads in three tosses is \[ P(X \leq 2) = P(X=0 \lor X=1 \lor X=2) = P(X=0) + P(X=1) + P(X=2) = 1/8 + 3/8 + 3/8 = 7/8 \]

Properties of CDF: \[0 \leq P(X \leq x_i) \leq 1\] \[P(X \leq x_k) = 1, \text{ where } x_k \text{ is the largest possible outcome}\] \[P(X=x_i) = P(X \leq x_i) - P(X \leq x_{i-1})\]

  • For the coin toss example we then have \[ P(X = 2) = P(X \leq 2) - P(X \leq 1) = 7/8 - 4/8 = 3/8 \]
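As an illustration, a short Python sketch that builds the CDF by cumulating the pmf from the coin example and recovers \(P(X=2)\) as a difference of adjacent CDF values:

```python
# pmf of X = "number of heads in three tosses" (computed above).
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# Cumulate the pmf to get the CDF: cdf[x] = P(X <= x).
cdf, running = {}, 0.0
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

print(cdf[2])           # P(X <= 2) = 7/8
print(cdf[2] - cdf[1])  # P(X = 2) = 7/8 - 4/8 = 3/8
```
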
Properties of probability
  1. All probabilities are numbers between 0 and 1, i.e. \[P(X=x) \in [0,1]\]
  2. \(P(X=x)=0\) means that the event “X=x” never occurs.
  • If the sample space of a random variable X is \(\Omega\) = {1,2,3}, what is \(P(X=5)\)?
  3. \(P(\Omega)=1\)
  4. If the events X=x and Y=y are mutually exclusive (i.e. they cannot happen at the same time), then \[P(X=x \land Y=y)=0\] However, \[P(X=x \lor Y=y) = P(X=x) + P(Y=y)\]
  5. If the events X=x and Y=y are mutually exclusive and collectively exhaustive (no other outcomes are possible), then \[P(X=x \lor Y=y) = P(X=x) + P(Y=y) = 1\] where the first equality follows because the events are mutually exclusive, and the second because they are collectively exhaustive.
  • Take a coin and toss it. X = you observe heads, and Y = you observe tails. \(P(X=x \lor Y = y) =\) ?
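A quick numerical check of the last two properties on a single toss of a fair coin:

```python
# One toss of a fair coin: "heads" and "tails" are mutually exclusive
# (they never occur together) and collectively exhaustive (no other
# outcome is possible), so their probabilities add up to 1.
p_heads, p_tails = 1/2, 1/2
p_heads_and_tails = 0.0               # mutually exclusive events
p_heads_or_tails = p_heads + p_tails  # addition rule for exclusive events
print(p_heads_or_tails)               # 1.0
```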

Examples of discrete distributions: Bernoulli, Binomial.

Continuous Random Variable

The probability density function, or pdf, tells us the likelihood that the outcome of X falls within an interval, call it [a,b], i.e. \(P(a \leq X \leq b)\). The biggest difference from the discrete case is that \(P(X=x) = 0\) for any single value x when X is a continuous RV.
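To see what this looks like in practice, here is a small sketch using the standard normal distribution (an arbitrary choice for illustration); it assumes the scipy package is installed:

```python
from scipy.stats import norm  # standard normal N(0, 1)

# For a continuous RV, probability is assigned to intervals, not points:
# P(a <= X <= b) = F(b) - F(a), where F is the CDF.
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))  # about 0.6827 for the standard normal

# A single point carries zero probability:
print(norm.cdf(0.5) - norm.cdf(0.5))  # 0.0, i.e. P(X = 0.5) = 0
```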

Examples of continuous distributions: Normal, Exponential, Uniform.

We will skip this part since integration is not required for this course, and working with continuous random variables involves integration. Know, though, that if you go on to take any other metrics course, you will have to be familiar with integration, since continuous random variables are fundamental to economics (money and time, for instance, are continuous random variables!).

Mathematics of Expectations

Describing Random Variables

Pmfs and CDFs give us all the information we need about a random variable. However, we are usually interested in just a few summary measures of the distribution rather than the entire CDF. Common summary measures are:

  1. Measures of central tendency: mean, median, and mode
  2. Measures of dispersion: range, variance, and standard deviation.

Measures of central tendency: The mean or expected value

The expected value of a discrete random variable X is \[E(X) = \sum\limits_{i=1}^k x_i P(X=x_i)\] where remember that \(\sum\limits_{i=1}^k P(X=x_i) = 1\). The mean is a weighted average of the different values x with the associated probabilities being the weights.

  • Roll a die. Then \[E(X)= 1P(X=1) + 2P(X=2)+...+6P(X=6)= (1+2+...+6)(1/6) = 3.5\]
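The same computation in Python, as a sanity check:

```python
# Expected value of a fair die roll: a weighted average of the outcomes,
# with each probability 1/6 acting as the weight.
outcomes = [1, 2, 3, 4, 5, 6]
mean = sum(x * (1/6) for x in outcomes)
print(mean)  # 3.5
```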

The mean of a random variable is a number.

Properties of expectation

  1. Expectation of a constant is the constant itself, that is \[ E(a) = a\]

Let’s see why. A constant is a random variable that always takes the same value – there is just one element in the sample space, so \(\Omega\) = {a}. This means that \(P(X=a)=1\). If we look at the formula for \(E(X)\) we then know that \(k=1\) and that the only outcome is \(x_1=a\). Using this information we can write: \[E(X) = \sum\limits_{i=1}^1 x_i P(X=x_i) = aP(X=a)= a \cdot 1 = a\]

  2. Expectation is a linear operator. Let a and b be constants, and X and Y be random variables. Then \[E(a+bX) = E(a) + E(bX) = a + bE(X)\] \[E(X+Y) = E(X) + E(Y) \]
  • Suppose that \(Y=2+3X\) where X is a random variable. Then \(E(Y)=2+3E(X)\).
  3. For any function \(g(X)\) we have that \[ E(g(X)) = \sum\limits_{i=1}^k g(x_i) P(X=x_i)\]
  • For example, if \(g(X)=X^2\) then \[ E(g(X)) = \sum\limits_{i=1}^k x_i^2 P(X=x_i)\]
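A small numerical check of the last two properties, again using a fair die:

```python
# Fair die: each of the six outcomes has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1/6

ex = sum(x * p for x in outcomes)          # E(X) = 3.5
ey = sum((2 + 3*x) * p for x in outcomes)  # E(2 + 3X) computed directly
print(ey, 2 + 3*ex)                        # both 12.5: E(a+bX) = a + bE(X)

eg = sum(x**2 * p for x in outcomes)       # E(X^2) via the E(g(X)) formula
print(eg)                                  # 91/6 ~ 15.17, not (E(X))^2 = 12.25
```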

Measures of central tendency: The median

The median is the middle of the distribution. Let \(m\) denote the median; then at most half of the probability lies strictly below \(m\) and at most half lies strictly above it, i.e. \[P(X < m) \leq \frac{1}{2}\] and \[P(X > m) \leq \frac{1}{2}.\]

  • If X takes the values 1, 2, 3, 4 with equal probability, the median is \(\frac{2+3}{2} = 2.5\).
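Python’s standard library computes this directly:

```python
import statistics

# Median of a RV taking the values 1, 2, 3, 4 with equal probability:
# with an even number of values, average the two middle ones.
print(statistics.median([1, 2, 3, 4]))  # 2.5 = (2 + 3) / 2
```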

Measures of central tendency: The mode

The mode is the value of X associated with the largest probability, i.e. it is the most likely outcome.

Measures of central tendency: which one?

It depends. For historical reasons, economists have mostly worked with expectations. However, in the last 10 years or so, people have become more interested in medians and quantiles, and work has shifted in that direction (mathematically, working with the median is more complicated because the median, unlike the expectation, is not a linear operator).

One thing to remember is that the mean is sensitive to outliers, while the median is not.
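A small illustration with made-up numbers (hypothetical incomes, not data from the notes): adding one outlier moves the mean a lot but barely moves the median.

```python
import statistics

incomes = [30, 35, 40, 45, 50]   # hypothetical incomes, in thousands
with_outlier = incomes + [1000]  # add one extreme observation

print(statistics.mean(incomes), statistics.median(incomes))            # 40 40
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 200 42.5
```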

Measures of dispersion: The range

The range is the difference between the largest and the smallest possible values of a random variable. This measure is very sensitive to extreme values of X, since it uses only those 2 values, and can therefore be misleading.

Measures of dispersion: The variance

A larger variance means that the RV is likely to take a wide range of values. A smaller variance means that the values are closer together.

\[\sigma_x^2 = var(X)=E\left[(X-E(X))^2\right] = E(X^2)-(E(X))^2\]
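A quick check, on a fair die, that the two formulas above give the same number:

```python
# Two equivalent ways to compute the variance of a fair die roll:
# E[(X - E(X))^2] and E(X^2) - (E(X))^2.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1/6

ex = sum(x * p for x in outcomes)               # E(X) = 3.5
var1 = sum((x - ex)**2 * p for x in outcomes)   # E[(X - E(X))^2]
var2 = sum(x**2 * p for x in outcomes) - ex**2  # E(X^2) - (E(X))^2
print(var1, var2)                               # both 35/12 ~ 2.9167
```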

  • Is it true that \[E\left[(X - E(X))^2\right] = \left[E(X-E(X))\right]^2?\] Why or why not?

The variance of a constant is 0.

  • Show this. That is, show that if \(P(X=a)=1\) then \(var(X)=0\).

\(var(a+bX)=b^2 var(X)\)

  • Show this.
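This does not replace the algebraic proof, but here is a numerical check on a fair die with arbitrarily chosen \(a=2\) and \(b=3\):

```python
# Numerical check of var(a + bX) = b^2 var(X) for a fair die, a=2, b=3.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1/6
a, b = 2, 3

ex = sum(x * p for x in outcomes)
var_x = sum((x - ex)**2 * p for x in outcomes)

ey = sum((a + b*x) * p for x in outcomes)
var_y = sum((a + b*x - ey)**2 * p for x in outcomes)

print(var_y, b**2 * var_x)  # both 26.25: the constant a drops out
```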

Measures of dispersion: The standard deviation

One problem with the variance is that it does not have the same units of measurement as X. If X is measured in dollars, the variance of X will be measured in dollars squared. To revert to the original units of measurement, we take the square root of var(X) and get the standard deviation: \[ \sqrt{var(X)} = \sigma_x\]
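For the die example, where \(var(X)=35/12\) as computed above:

```python
import math

# Standard deviation of a fair die roll: the square root of its variance,
# expressed in the original units of X.
var_x = 35 / 12
print(math.sqrt(var_x))  # about 1.7078
```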