Econometrics is statistical inference for economics. Probability theory (Ch 2) is the foundation of statistics (Ch 3), and statistics is the foundation of statistical inference (Ch 4 onward).
In economics, we regard the world as random and everything that happens in the world as a random process or random variable.
A variable is an object with an unknown but fixed value. For example:
To you, my income is a variable because, although this value is fixed, you do not know what it is.
To me, my income is not a variable because I know its value.
Suppose now that I am thinking of a number between \(1\) and \(3\). To you, this number is a variable because its value is fixed but unknown: it can be \(1\), \(2\), or \(3\).
To me, this number is not a variable but a known value, because I know which number I am thinking of.
A random variable is a variable with a value that is unknown and not fixed. For example, the number shown by a die roll is a random variable: before the roll, its value is neither known nor fixed.
An outcome is the actual value taken by a random variable. For example, if we roll a die and we get a \(6\), then \(6\) is called the outcome of the random variable called “number of a die roll”.
Notation: We use capital letters for random variables and lower case letters for their outcomes. For example, \(X\) is "number of a die roll" and, if we see a \(6\) after we roll the die, then \(x=6\) is the outcome.
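To make the notation concrete, here is a minimal Python sketch of the die-roll example (the function name `roll_die` is our own, for illustration, not part of any standard library):

```python
import random

# A sketch of the random variable X = "number of a die roll".
# Before we call the function, X has no fixed value; calling it
# produces one outcome x.
def roll_die():
    return random.randint(1, 6)  # each face 1..6 is equally likely

x = roll_die()  # lower case x: the outcome we actually observe
print(f"outcome x = {x}")
```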
Discrete random variables are those whose outcomes take on a discrete set of values. If \(X\) is a discrete random variable then we can list its outcomes as \(x_1, x_2, ...\). This list does not have to be finite; what matters is that the outcomes can be listed at all, i.e. indexed by the natural numbers.
Continuous random variables take on a continuum of possible values. That is, their outcomes can be any number on the real line, and they cannot be listed. For example, you cannot possibly list every real number between 0 and 1. Other examples are distance, age, height, temperature, time, income. The values of these random variables can be recorded on a discrete scale (as natural numbers) but conceptually they can be any real number.
Note: In set theory, cardinality refers to the total number of elements in a set. Cardinality can be finite or infinite, and an infinite set can be countable (its elements can be listed, like the natural numbers) or uncountable (they cannot be listed, like the interval \([0,1]\)). For example, the set \(\{1,2,3\}\) has cardinality \(3\), a finite number, while the set \([0,1]\) has uncountably infinite cardinality. Continuous random variables take values in an uncountable set, while discrete random variables take values in a finite or countably infinite set.
The sample space is the set of all possible outcomes of a random variable. It is usually denoted \(\Omega\). For example, for a die roll, \(\Omega = \{1,2,3,4,5,6\}\).
An event is a subset of the sample space. For example, "the die roll is even" is the event \(\{2,4,6\}\).
Every possible outcome of a random variable has a probability. The probability of an outcome tells us how likely that outcome is to happen: \(P(X=x)\) tells us how likely it is for the random variable \(X\) to take the value \(x\).
For a discrete random variable, the probability distribution is a list of all the different values that \(X\) can take (that is, the set of all possible outcomes, \(\Omega\)), along with the probability that \(X\) assumes each of those values. This list is summarized by a function called the probability mass function, or pmf. The notation for the pmf is \(P(X=x)\).
Pmfs must satisfy the following properties: \[ 0 \leq P(X=x_i) \leq 1\] \[ \sum\limits_{i=1}^k P(X=x_i) =1\]
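To illustrate, here is a small Python sketch of the pmf of a fair die, stored as a dictionary, checking both properties:

```python
# pmf of a fair die: maps each outcome x_i to P(X = x_i)
pmf = {x: 1/6 for x in range(1, 7)}

# check the two pmf properties from the text
assert all(0 <= p <= 1 for p in pmf.values())  # 0 <= P(X = x_i) <= 1
assert abs(sum(pmf.values()) - 1) < 1e-12      # probabilities sum to 1
```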
The cumulative distribution function or CDF is \(P(X \leq x_i)\).
Properties of the CDF: \[0 \leq P(X \leq x_i) \leq 1\] \[P(X \leq x_k) = 1, \text{ where } x_k \text{ is the largest outcome}\] \[P(X=x_i) = P(X \leq x_i) - P(X \leq x_{i-1})\]
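Continuing the die example, a sketch of how the CDF is built from the pmf by cumulative summation, and how differencing the CDF recovers the pmf:

```python
from itertools import accumulate

pmf = {x: 1/6 for x in range(1, 7)}           # fair die again
outcomes = sorted(pmf)                        # x_1 < x_2 < ... < x_k
cdf = list(accumulate(pmf[x] for x in outcomes))

assert abs(cdf[-1] - 1) < 1e-12               # P(X <= x_k) = 1 at the largest outcome
prev = 0.0
for x, F in zip(outcomes, cdf):
    assert abs((F - prev) - pmf[x]) < 1e-12   # P(X = x_i) = P(X <= x_i) - P(X <= x_{i-1})
    prev = F
```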
Examples of discrete distributions: Bernoulli, Binomial.
The probability density function, or pdf, tells us the likelihood that the outcome of \(X\) falls within an interval, say \([a,b]\), i.e. \(P(a \leq X \leq b)\). The biggest difference from discrete RVs is that \(P(X=x) = 0\) for any single value \(x\) when \(X\) is a continuous RV.
Examples of continuous distributions: Normal, Exponential, Uniform.
We will skip this part, since working with continuous random variables involves integration and you are not required to know integration for this course. Know, though, that if you go on to take other metrics courses, you will need to be familiar with integration, since continuous random variables are fundamental to economics (e.g. money and time are continuous random variables!).
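Even without integration, statistical software can evaluate these interval probabilities for us. A minimal sketch, assuming scipy is installed (norm.cdf is scipy's CDF of the standard normal):

```python
from scipy.stats import norm

# P(a <= X <= b) for X ~ Normal(0, 1), computed as CDF(b) - CDF(a);
# the integration of the pdf happens inside the library.
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))  # about 0.6827

# for a continuous RV, any single point has probability zero
print(norm.cdf(1.0) - norm.cdf(1.0))  # P(X = 1) = 0
```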
Pmfs and CDFs give us all the information we need about a random variable. However, we are usually interested in just a few summary measures of the distribution rather than the entire CDF. Common summary measures are:
The expected value of a discrete random variable \(X\) is \[E(X) = \sum\limits_{i=1}^k x_i P(X=x_i)\] where, recall, \(\sum\limits_{i=1}^k P(X=x_i) = 1\). The mean is a weighted average of the different values \(x_i\), with the associated probabilities as weights.
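For example, here is the mean of a fair die computed directly from the formula:

```python
# E(X) = sum_i x_i * P(X = x_i) for a fair die
pmf = {x: 1/6 for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
print(mean)  # 3.5 -- note that E(X) need not be a possible outcome
```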
The mean of a random variable is a number, not a random variable. In particular, the mean of a constant \(a\) is \(a\) itself.
Let's see why. A constant is a random variable that always takes the same value; there is just one element in the sample space, so \(\Omega = \{a\}\). This means that \(P(X=a)=1\). Looking at the formula for \(E(X)\), we then know that \(k=1\) and that the only outcome is \(x_1=a\). Using this information we can write: \[E(X) = \sum\limits_{i=1}^1 x_i P(X=x_i) = aP(X=a) = a \cdot 1 = a\]
The median is the middle of the distribution. Let \(m\) denote the median; then no more than half of the probability lies strictly below \(m\) and no more than half lies strictly above it, i.e. \[P(X < m) \leq \frac{1}{2} \quad \text{and} \quad P(X > m) \leq \frac{1}{2}.\]
The mode is the value of \(X\) associated with the largest probability, i.e. it is the most likely outcome.
It depends. For historical reasons, economists have mostly worked with expectations. However, in the last 10 years or so, people have become interested in medians and quantiles, and work has shifted in that direction (mathematically, working with the median is more complicated because the median, unlike the expectation, is not linear).
One thing to remember is that the mean is sensitive to outliers, while the median is not.
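A quick sketch of this point, using a small made-up sample of incomes:

```python
from statistics import mean, median

incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
print(mean(incomes), median(incomes))  # 40000 40000

incomes.append(10_000_000)             # add one very rich outlier
print(mean(incomes), median(incomes))  # mean jumps to 1.7 million; median only moves to 42500
```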
The range is the difference between the largest and the smallest possible values of a random variable. This measure is very sensitive to extreme values of \(X\), since it uses only two values, and it can be misleading for this reason.
The variance measures how spread out the values of a random variable are around its mean. A larger variance means that the RV is likely to take a wide range of values; a smaller variance means that the values are concentrated closer together.
\[\sigma_x^2 = var(X) = E\left[(X-E(X))^2\right] = E(X^2) - (E(X))^2\]
The variance of a constant is \(0\): a constant never deviates from its mean.
\(var(a+bX)=b^2 var(X)\)
One problem with the variance is that it does not have the same units of measurement as \(X\). If \(X\) is measured in dollars, the variance of \(X\) will be measured in dollars squared. To revert to the original units of measurement, we take the square root of \(var(X)\) and get the standard deviation: \[\sigma_x = \sqrt{var(X)}\]
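To tie the last few results together, a sketch computing the variance of a fair die both ways, the standard deviation, and a numerical check of \(var(a+bX) = b^2 var(X)\):

```python
from math import sqrt, isclose

pmf = {x: 1/6 for x in range(1, 7)}                       # fair die
EX = sum(x * p for x, p in pmf.items())                   # E(X) = 3.5
var1 = sum((x - EX) ** 2 * p for x, p in pmf.items())     # E[(X - E(X))^2]
var2 = sum(x ** 2 * p for x, p in pmf.items()) - EX ** 2  # E(X^2) - (E(X))^2
assert isclose(var1, var2)                                # the two formulas agree

sd = sqrt(var1)                                           # back in X's original units
print(var1, sd)                                           # about 2.9167 and 1.7078

# scaling rule: Y = a + bX puts the same probabilities on a + b*x_i
a, b = 2.0, 3.0
pmf_y = {a + b * x: p for x, p in pmf.items()}
EY = sum(y * p for y, p in pmf_y.items())
var_y = sum((y - EY) ** 2 * p for y, p in pmf_y.items())
assert isclose(var_y, b ** 2 * var1)                      # var(a + bX) = b^2 var(X)
```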