The crucial notion in economics is that of randomness: the idea that an event may occur, but need not.
Events are of two types: deterministic and random. Deterministic events are those that occur for certain. For example, if you drop an object, it will hit the floor. No matter how many times you repeat the experiment of dropping an object, the outcome of the experiment is the same: the object hits the floor. But some phenomena produce uncertain outcomes even when they are repeated under the same conditions: tossing a coin (it may land heads or tails), rolling a die, or drawing a ball from an urn.
While a deterministic phenomenon can be described by its single outcome, a phenomenon with several possible outcomes can be described as a random experiment, i.e. an experiment with several possible outcomes and a probability associated with each of these outcomes.
Probabilities can be interpreted in a couple of ways: frequentist and subjective (Bayesian).
The frequentist interpretation is that the probability of an outcome is the relative frequency of occurrence of the outcome when the experiment is repeated a large number of times. For example, the relative frequency of heads in a large number of tosses of a fair coin is \(\frac{1}{2}\). The problem with this interpretation is that many experiments to which we want to ascribe probabilities for their outcomes cannot be repeated. For example, a treatment or policy implemented at the country, city, or village level cannot be run multiple times on the same individuals, which means that we will not be able to observe the relative frequencies of all possible outcomes of the experiment.
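To see the frequentist interpretation at work, here is a minimal simulation sketch (plain Python; the sample sizes and the random seed are arbitrary choices, not from the lecture) that tracks the relative frequency of heads as the number of tosses grows.

```python
import random

random.seed(0)  # fix the seed so the run is reproducible

def relative_frequency_of_heads(n):
    """Simulate n tosses of a fair coin and return the fraction that land heads."""
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

# Illustrative sample sizes (an arbitrary choice).
for n in [10, 100, 10_000, 1_000_000]:
    print(n, relative_frequency_of_heads(n))
# As n grows, the printed frequencies settle near 1/2, which is exactly what
# the frequentist interpretation takes to be the probability of heads.
```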
The Bayesian interpretation is that the probability of an outcome is ascribed by one’s knowledge or by a general agreement among experts. The problem here is that there may not be one single probability (each expert has his/her own) or there may not be agreement among the experts.
The solution was to build a theory of probability based on axioms, without linking it to either of the two interpretations. These axioms are consistent with both interpretations. The branch of mathematics that formalizes this is called measure theory (MT), developed by Kolmogorov around \(1930\).
MT is useful for many reasons (see below), but to see how it connects directly to (undergraduate) probability theory, consider the following example. Suppose we toss a fair coin an infinite number of times and consider the following two events:
E1: The proportion of heads in \(j\) tosses is at least \(\frac{9}{10}\) where \(j=1,2,...,m\) for some finite \(m\).
E2: The proportion of heads in \(j\) tosses is at least \(\frac{9}{10}\) where \(j=1,2,...\).
Notice the difference: E1 is determined by a finite number of tosses, while E2 is determined by an infinite number of tosses. To compute \(P(E1)\) we can simply count the number of length-\(m\) sequences in which the proportion of heads is at least \(\frac{9}{10}\) and divide by the total number of possible sequences of \(m\) tosses, i.e. \(P(E1) =\) (number of sequences in E1)\(/2^m\). But to compute \(P(E2)\) we need machinery that is a bit more sophisticated than this.
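As a concrete, deliberately small illustration, the following sketch enumerates all \(2^m\) equally likely sequences for \(m = 10\) (an arbitrary choice) and counts those whose proportion of heads is at least \(\frac{9}{10}\); no analogous finite enumeration is available for E2.

```python
from itertools import product
from fractions import Fraction

# Brute-force computation of P(E1) for a small m (m = 10 is an illustrative choice).
# A sequence is counted when the proportion of heads among the m tosses is at
# least 9/10; each of the 2^m equally likely sequences has probability 1/2^m.
m = 10
favourable = sum(
    1
    for seq in product("HT", repeat=m)
    if Fraction(seq.count("H"), m) >= Fraction(9, 10)
)
print(favourable, 2**m, favourable / 2**m)
# For m = 10 the favourable sequences are those with 9 or 10 heads: 10 + 1 = 11,
# so P(E1) = 11/1024. Nothing comparable is possible for E2.
```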
Before motivating MT in more detail, consider what these axioms of probability are.
Let \(\Omega\) denote the sample space, i.e. the set containing all possible outcomes of a random experiment. The sample space can be finite (a finite number of outcomes) or infinite. E.g. pick a number from the set \(\{1,2,3\}\) vs pick a number from the interval \([1,3]\). An event is a subset of the sample space, e.g. pick a number from the set \(\{1,2,3\}\) that is larger than \(1\) and smaller than \(3\). Given the sample space, we can assign probabilities to events, and a complete specification of this assignment yields the probability distribution.
The probability distribution is a set function, call it \(P(.)\), defined on subsets of the sample space, i.e. \(P:\) {subsets of \(\Omega\)} \(\rightarrow S\), where \(S\) is a subset of \(\mathcal{R}\). \(P\) must satisfy the following axioms:
(Axiom 1) \(P(A) \geq 0\) for all \(A \subset \Omega\)
(Axiom 2) \(P(\Omega) = 1\)
(Axiom 3) If \(A_1, A_2,...\) are countably many disjoint events in \(\Omega\) (i.e. \(A_i \cap A_j = \emptyset, \forall i \ne j\)), then \[P(A_1 \cup A_2 \cup...) = \sum_{j=1}^\infty P(A_j).\]
So the set \(S=[0,1]\).
Given these axioms, the following results must be satisfied for internal consistency.
\(P(\emptyset) = 0\)
\(P(A^c) = 1 - P(A)\) for all \(A \subset \Omega\)
\(P(A) \leq 1\) for all \(A \subset \Omega\)
If \(A \subset B \subset \Omega\), then \(P(A) \leq P(B)\) (monotonicity)
\(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) for all \(A, B \subset \Omega\)
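To make the axioms and these derived properties concrete, here is a small sketch (a fair die is an illustrative choice, not part of the lecture) that defines a probability distribution on a finite sample space and checks the axioms and two of the properties numerically.

```python
from fractions import Fraction
from itertools import combinations

# A finite sample space: one roll of a fair die (an illustrative choice).
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(event):
    """Probability of an event (a subset of omega) under the uniform distribution."""
    return Fraction(len(event), len(omega))

# Axiom 1: non-negativity for every event; Axiom 2: P(omega) = 1.
assert all(P(frozenset(e)) >= 0 for r in range(7) for e in combinations(omega, r))
assert P(omega) == 1

# Axiom 3 (finite case): additivity over disjoint events.
A, B = frozenset({1, 2}), frozenset({5})
assert P(A | B) == P(A) + P(B)

# Derived properties: complement rule and monotonicity.
assert P(omega - A) == 1 - P(A)
assert P(A) <= P(A | B)
print("all checks passed")
```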
The measure-theoretic foundations for probability theory are assumed in courses in econometrics, statistics, microeconomic theory, and finance. These foundations are not developed in the classes that use them, so the purpose of this lecture is to introduce some concepts in measure theory that will appear in those other courses.
Basically, MT will provide us the tools to:
integrate functions that are not continuous (or piece-wise continuous) or that are defined over sets that are not “very nice” (not possible with Riemann integration);
For example, the following integral is not defined in the Riemann sense: \(\int_0^1 f(u)\, du\), where \(f: [0,1] \rightarrow \mathcal{R}\) is defined by \(f(t) = t\) if \(t\) is rational and \(f(t) = 0\) otherwise (a numerical sketch of this failure appears below, after these examples).
know how to compute the volume of a “weird” object;
For example, “picking a point \(x\) uniformly at random from a set \(A\)” means giving \(x\) the constant density \[f(x) = \frac{1}{area(A)}, \quad x \in A,\] so we have to compute area(A), where A may be any type of “weird” set.
know under what conditions we can exchange integration and differentiation;
know for what functions \(f\) it is true that \[\frac{d}{dx}\int_0^x f(u)\, du = f(x)\] or that \[\frac{d}{dx}\int_a^b f(x,u)\, du = \int_a^b \frac{\partial}{\partial x} f(x,u)\, du\]
know for what functions we can apply the Fundamental Theorem of Calculus (FTC): \[\int_a^b F'(x)\, dx = F(b)-F(a)\]
It turns out that there exist functions \(F\) such that \(F'(x)\) exists for all \(x\) but \(F'\) is not (Riemann) integrable. The FTC is then not applicable to such functions, so we need conditions that identify a class of functions for which the FTC does hold.
know for what functions it is always possible to have \[\int_0^1 f(x)\, dx =\lim_{n \rightarrow \infty}\int_0^1 f_n (x)\, dx\] whenever \(\{f_n \}\) is a sequence of continuous functions on \([0,1]\) such that \[\lim_{n \rightarrow \infty}f_n (x) = f(x)\] for every \(x \in [0,1]\).
Note that the pointwise limit \(f\) need not be continuous, or even Riemann integrable, so the equality \(\int_0^1 f(x)\, dx =\lim_{n \rightarrow \infty}\int_0^1 f_n (x)\, dx\) is not always guaranteed (a counterexample is sketched below, after these examples).
extend the concept of probability to probability of a random function (not just of a random variable)
For example, we are comfortable with the concept of CDF of a random variable. But what if we want to find the probability distribution of, say, a stock price over time? To be able to answer such a question, MT will change the focus from pdf’s to listing directly the probabilities (we will see how).
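Returning to the first integration example above (the function equal to \(t\) on the rationals and \(0\) elsewhere), the sketch below forms Riemann sums with rational sample points; with irrational sample points the sums are identically \(0\), so the two choices never agree and no Riemann integral exists. The grid sizes are arbitrary.

```python
from fractions import Fraction

# f(t) = t if t is rational, 0 otherwise, on [0, 1].
# We form Riemann sums on a uniform partition with n subintervals, evaluating f
# at the rational left endpoints k/n.  If instead an irrational point were chosen
# in every subinterval, f would vanish there and the sum would be exactly 0.
def riemann_sum_rational_tags(n):
    return sum(Fraction(k, n) * Fraction(1, n) for k in range(n))

for n in [10, 100, 1000]:  # arbitrary grid sizes for illustration
    print(n, float(riemann_sum_rational_tags(n)))
# The sums with rational tags approach 1/2, while sums with irrational tags are
# identically 0; since the two never agree, f has no Riemann integral on [0, 1].
```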
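For the limit-exchange example, a standard counterexample (chosen for this note, not taken from the lecture) is \(f_n(x) = n x e^{-n x^2}\): each \(f_n\) is continuous on \([0,1]\) and \(f_n(x) \rightarrow 0\) pointwise, yet \(\int_0^1 f_n(x)\, dx = \frac{1 - e^{-n}}{2} \rightarrow \frac{1}{2}\), while the integral of the pointwise limit is \(0\). The sketch below checks this numerically with a crude trapezoidal rule.

```python
import math

# f_n(x) = n * x * exp(-n * x^2) is continuous on [0, 1] and tends to 0 pointwise,
# yet its integral over [0, 1] is (1 - exp(-n)) / 2, which tends to 1/2, not 0.
def f_n(n, x):
    return n * x * math.exp(-n * x * x)

def integral_trapezoid(n, grid=100_000):
    # crude trapezoidal rule on [0, 1]; good enough to see the limit
    h = 1.0 / grid
    ys = [f_n(n, k * h) for k in range(grid + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

for n in [1, 10, 100, 1000]:
    print(n, round(integral_trapezoid(n), 4), "exact:", round((1 - math.exp(-n)) / 2, 4))
# The integrals converge to 1/2, while the integral of the pointwise limit (the zero
# function) is 0, so the limit and the integral cannot be exchanged here.
```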
Final comments
The Riemann integral, dealt with in calculus, is well suited for computations (when the integrand is smooth) but is less suited for dealing with limit processes. Lebesgue integration eliminates these drawbacks, while keeping the computational advantages of the Riemann integral. At the same time, MT serves as the basis of contemporary analysis and probability.
If you will end up “developing tools”, you may need a proper course in MT. If you will end up “using tools”, a brief intro may be enough. This lecture is a bit more than a basic intro, but it is definitely not meant to replace a semester-long course in MT.
In addition to all the above, without MT weird stuff can happen, such as the Banach-Tarski (BT) paradox (mathematically possible but not yet encountered physically; as a side remark, there is much debate about mathematical paradoxes, since some have actually been proven correct at a subatomic level).
The BT Paradox: Take one unit ball and divide it into at least 5 disjoint pieces. Mathematically it is possible to reassemble these pieces to form two disjoint unit balls! An alternative version of the BT paradox is that it is possible to take a small pea, cut it into at least 5 pieces, and then reassemble those pieces to form a ball the size of the sun. Vsauce has a very nice video on this.
Being faced with the BT paradox, we have two choices:
We decide that we don’t believe it. If we deny it, then we are implicitly denying some axioms of set theory (notably the Axiom of Choice), without which it is possible for the Cartesian product of a collection of non-empty sets to be empty. So if we deny the BT paradox, then we should agree with the statement that the Cartesian product of non-empty sets can be empty. This does not make a lot of sense from an intuitive point of view.
We decide that we believe it. Then we obviously should try to find some sort of theory that will prevent the BT paradox from happening. This is where MT comes in: if we restrict each of the disjoint pieces to be measurable, then it would not be possible to reassemble them into a ball that is bigger than the original ball.1
Let \(f: E \rightarrow \mathcal{R}\), where \(E\) is some general space (or a general geometric body). In order to be able to answer all the questions above, we introduce the concept of measure of a set (or size or volume of a set).
You are probably familiar with at least two different concepts of the size of a set: cardinality and length. For example, the cardinalities \(|[0,1]| = |[1,3]| = \infty\), but the length of \([0,1]\) is \(1\), while the length of \([1,3]\) is \(2\). Measure is another concept of size (length is a special case of it).
Measure is usually denoted as \(m(E)\), where \(m: \{\text{subsets of } E\} \rightarrow [0,\infty]\) and \([0,\infty]\) is the extended positive real line, i.e. \([0,\infty]= [0,\infty) \cup \{\infty\}\). A measure measures the volume (size) of \(E\) (and of subsets of \(E\)). For example:
| set | measure |
|---|---|
| \(E \subset \mathcal{R}\) | length |
| \(E \subset \mathcal{R}^2\) | area |
| \(E \subset \mathcal{R}^3\) | volume |
Formally: A measure is an extended real-valued, non-negative, and countably additive set function \(m\), defined on a ring \(\mathcal{S}\)2, such that \(m(\emptyset)=0\).
In general terms, a measure is a non-negative, extended-real-valued set function \(m\) defined on a family of subsets of \(\mathcal{R}\). Writing \(\Omega\) for such a subset, we would like \(m\) to satisfy:
\(m(\Omega) = b - a\) when \(\Omega = [a,b], a \leq b\)
countable additivity: \[m(\Omega) = \sum_{j=1}^\infty m(\Omega_j)\] whenever \(\Omega = \cup_{j=1}^\infty \Omega_j\) and \(\Omega_i \cap \Omega_j = \emptyset\) for all \(i \ne j\)
translation invariance: \[m(\Omega + h) = m(\Omega)\] for all \(h \in \mathcal{R}\), where \(\Omega + h = \{x + h : x \in \Omega\}\).
It is possible to show that such a set function exists and is unique whenever we limit ourselves to a “reasonable” class of sets - those that are measurable (defined later).
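As a toy check of these requirements, restricted to finite unions of disjoint intervals (where the measure is simply total length), the following sketch verifies \(m([a,b]) = b-a\), additivity over disjoint pieces, and translation invariance; the particular intervals are arbitrary choices.

```python
from fractions import Fraction

# Represent a set as a list of pairwise disjoint closed intervals [a, b] with a <= b.
# For such sets the measure is just the total length.
def measure(intervals):
    return sum(b - a for a, b in intervals)

def translate(intervals, h):
    return [(a + h, b + h) for a, b in intervals]

A = [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(5, 2))]  # [0,1] and [2, 5/2]

assert measure([(Fraction(0), Fraction(1))]) == 1          # m([a,b]) = b - a
assert measure(A) == measure([A[0]]) + measure([A[1]])     # additivity over disjoint pieces
assert measure(translate(A, Fraction(7, 3))) == measure(A) # translation invariance
print("length behaves like a measure on disjoint unions of intervals")
```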
For technical reasons a measure will not be defined on all subsets of \(\Omega\), but only on a certain collection of subsets called a \(\sigma\)-algebra. That is, letting \(\mathcal{P}(\Omega)\) be the power set of \(\Omega\), we are not going to define \(m\) on \(\mathcal{P}(\Omega)\); we will define it on \(\mathcal{F} \subset \mathcal{P}(\Omega)\). We can think of \(\mathcal{F}\) as the place where all activity (integration, differentiation, etc.) will take place. Elements of \(\mathcal{F}\) are called measurable, while elements of \(\mathcal{P}(\Omega) - \mathcal{F}\) are non-measurable. Measurable sets are candidates for having a size (either finite or infinite).
Let \(\Omega\) be a set. Let \(\mathcal{F}\) be a non-empty collection of subsets of \(\Omega\). \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\) if it satisfies the following conditions:
\(\Omega \in \mathcal{F}\)
\(\mathcal{F}\) is closed under complementation: If \(A \in \mathcal{F}\) then \(A^c = \Omega - A \in \mathcal{F}\)
\(\mathcal{F}\) is closed under countable unions: If \(A_1, A_2, ... \in \mathcal{F}\) then \(\cup_{i=1}^\infty A_i \in \mathcal{F}\)
A \(\sigma\)-algebra on \(\Omega\) is a collection of subsets of \(\Omega\) which is closed under complementation and countable union. If \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\), then \((\Omega,\mathcal{F})\) is called a measurable space and elements of \(\mathcal{F}\) are called measurable sets.
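When \(\Omega\) is finite, countable unions reduce to finite unions, so the \(\sigma\)-algebra conditions can be checked by brute force. The sketch below does this for the smallest non-trivial example \(\{\emptyset, A, A^c, \Omega\}\); the particular \(\Omega\) and \(A\) are arbitrary choices.

```python
from itertools import combinations

def is_sigma_algebra(omega, F):
    """Check the sigma-algebra conditions on a finite omega (countable = finite unions)."""
    omega = frozenset(omega)
    F = {frozenset(s) for s in F}
    if omega not in F:
        return False
    if any(omega - A not in F for A in F):        # closed under complementation
        return False
    for r in range(1, len(F) + 1):                # closed under (finite) unions
        for sets in combinations(F, r):
            if frozenset().union(*sets) not in F:
                return False
    return True

omega = {1, 2, 3, 4}
A = frozenset({1, 2})
F = {frozenset(), A, frozenset(omega) - A, frozenset(omega)}
print(is_sigma_algebra(omega, F))                                    # True
print(is_sigma_algebra(omega, {frozenset(), frozenset(omega), A}))   # False: misses A^c
```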
[switch to notes, page 5, under definition 4]
Another reason for which the BT paradox is possible is that it involves only rigid motions (cutting and reassembling), not, e.g., stretching. For example, if we were to stretch the interval \([0,1]\) to twice its size, it would not be intuitively weird to be able to cut the resulting interval into two intervals of the same size as the original interval \([0,1]\).↩
A ring is a non-empty class of sets that is closed under unions and differences. That is, if \(A \in \mathcal{S}\) and \(B \in \mathcal{S}\) then \(A \cup B \in \mathcal{S}\) and \(A - B \in \mathcal{S}\).↩