In economics, we are interested in measuring quantities for a (statistical) population.
A population in statistics is a complete set of individuals, firms, objects, provinces, states, and so on, that we are interested in knowing something about.
The quantity of interest is usually some characteristic of the distribution of a population random variable. This quantity is called a population parameter.
For example, if we are interested in knowing what the average income is in Canada, our population of interest consists of all individuals living in Canada. The population random variable is income, which we denote as \(Y\). The population parameter that we are interested in is then \(E(Y)\).
To answer this question, we would have to interview all people living in Canada and then compute \(E(Y)\) from that data. However, for practical reasons it is impossible to interview everyone: it would be too expensive, some people may not respond, some people may be living in the country illegally and so be impossible to find, and so on.
The solution is to interview individuals drawn at random from the population. “At random” means that all members of the population have an equal and independent chance of being selected (so we do not ask only people living on the SFU campus or only people whose last names begin with A, for example). The data we obtain via this process is called a sample, and it is denoted \(Y_1, Y_2, \ldots, Y_n\), where \(n\) is the sample size.
We now use the sample data to infer, or approximate, the population parameter. For our example, this means that we will use the sample average, \(\frac{1}{n} \sum_{i=1}^{n} Y_i\), to approximate the population mean, \(E(Y)\).
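To make this concrete, here is a minimal simulation sketch in Python. The log-normal population, its parameters, and all variable names are invented for illustration; in reality \(E(Y)\) is unknown and we only ever observe the sample.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Invented population: one million "incomes" drawn from a
# log-normal distribution (purely illustrative, not real data).
population = rng.lognormal(mean=10.8, sigma=0.6, size=1_000_000)
population_mean = population.mean()  # plays the role of E(Y)

# A random sample of size n: every member of the population has
# the same chance of being selected, and no one is asked twice.
n = 1_000
sample = rng.choice(population, size=n, replace=False)
sample_mean = sample.mean()  # the statistic (1/n) * sum of Y_i

print(f"population mean E(Y): {population_mean:,.2f}")
print(f"sample average      : {sample_mean:,.2f}")
```

The two printed numbers will be close but not identical; that gap is exactly the sense in which the sample average “approximates” \(E(Y)\).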
The sample average is an example of a statistic or estimator.
Notice that I said “approximate.” This means that the sample average is not exactly equal to the population mean. However, we will learn that the sample average is an unbiased estimator (an unbiased guess) of the population mean whenever the sample we have is a random sample. Basically, we will learn that
\[E\!\left(\frac{1}{n} \sum_{i=1}^{n} Y_i\right) = \mu\]
when the data \(Y_i\) are i.i.d. (independently and identically distributed), which means that each individual \(i\) was sampled at random and that each observation \(Y_i\) is a realization of the (population) random variable \(Y\), whose mean we write as \(\mu = E(Y)\).
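Where does this equality come from? A one-line derivation, using only linearity of expectations and the fact that each \(Y_i\) has the same mean \(\mu\) (because the observations are identically distributed):

\[E\!\left(\frac{1}{n} \sum_{i=1}^{n} Y_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i) = \frac{1}{n} \cdot n\mu = \mu.\]

Note that only the “identically distributed” part of i.i.d. is used here; independence plays its role elsewhere, for example in the variance of the sample average.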
Because we randomly select individuals from the population into the sample, the observations \(\{Y_i\}_{i=1}^n\) are random variables. The sample mean is therefore a random variable as well, because it is a function of our sample.
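A quick way to see this randomness concretely: draw two independent samples from the same (invented) population as before and compare their sample means. Again, this is an illustrative sketch, not real data.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
population = rng.lognormal(mean=10.8, sigma=0.6, size=1_000_000)

# Two random samples of the same size n from the same population...
n = 1_000
sample_a = rng.choice(population, size=n, replace=False)
sample_b = rng.choice(population, size=n, replace=False)

# ...yield two different realizations of the sample mean.
print(f"sample mean (draw A): {sample_a.mean():,.2f}")
print(f"sample mean (draw B): {sample_b.mean():,.2f}")
```

Before the sample is drawn, we cannot say which value the sample mean will take, which is precisely what it means for it to be a random variable.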
The big point so far is that there is a population variable \(Y\) that has some distribution. The distribution is characterized by, say, \(E(Y)\) and \(var(Y)\). These characteristics are called population parameters. Population parameters are not random variables; they are fixed numbers, so \(E(Y)=\mu\) and \(var(Y)=\sigma^2\), where \(\mu\) and \(\sigma^2\) are constants which we do not know. We are interested in learning the values of these parameters.

To do this we sample at random from the population and record our data as \(Y_1, Y_2, \ldots, Y_n\), where \(n\) is some finite number. We call this collection of observations the sample. We then use the sample to approximate the values of the unknown parameters. For example, we use the sample mean to learn about \(\mu\). Since the sample was selected at random, the observations in the sample are random variables, and so the sample mean is a random variable. The question now is: how can a random variable (which has many possible values) be a good guess for a number (the population parameter)?
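That question is answered formally in what follows. As a preview, here is an illustrative simulation (same invented population as above): each replication produces a different sample mean, yet the sample means average out to something very close to \(\mu\).

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Invented population standing in for the unknown truth.
population = rng.lognormal(mean=10.8, sigma=0.6, size=1_000_000)
mu = population.mean()  # the population parameter E(Y)

# Repeat the whole sampling exercise many times, recording the
# sample mean from each replication. Sampling with replacement
# makes the n draws exactly i.i.d.
n, reps = 1_000, 2_000
sample_means = np.array([
    rng.choice(population, size=n, replace=True).mean()
    for _ in range(reps)
])

print(f"mu (population mean)        : {mu:,.2f}")
print(f"average of the sample means : {sample_means.mean():,.2f}")
print(f"spread of the sample means  : {sample_means.std():,.2f}")
```

Each individual sample mean misses \(\mu\), but the misses are centered on \(\mu\) rather than falling systematically above or below it. This is exactly what unbiasedness buys us.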