1 What is econometrics?

Econometrics is based upon the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy. A common application of econometrics is the forecasting of such important macroeconomic variables as interest rates, inflation rates, and gross domestic product (GDP). Whereas forecasts of economic indicators are highly visible and often widely published, econometric methods can be used in economic areas that have nothing to do with macroeconomic forecasting.

For example, we might want to study the effects of political campaign expenditures on voting outcomes. Similarly, in the field of education, we might consider the effect of school spending on student performance.

Econometrics has evolved as a separate discipline from mathematical statistics because the former focuses on the problems inherent in collecting and analyzing nonexperimental economic data. Nonexperimental data are not accumulated through controlled experiments on individuals, firms, or segments of the economy. (Nonexperimental data are sometimes called observational data, or retrospective data, to emphasize the fact that the researcher is a passive collector of the data.)

Experimental data are often collected in laboratory environments in the natural sciences, but they are more difficult to obtain in the social sciences. Although some social experiments can be devised, it is often impossible, prohibitively expensive, or morally repugnant to conduct the kinds of controlled experiments that would be needed to address economic issues.

Naturally, econometricians have borrowed from mathematical statisticians whenever possible. The method of multiple regression analysis is the mainstay in both fields, but its focus and interpretation can differ markedly. In addition, economists have devised new techniques to deal with the complexities of economic data and to test the predictions of economic theories.

Econometric methods are relevant in virtually every branch of applied economics. They come into play either when we have an economic theory to test or when we have a relationship in mind that has some importance for business decisions or policy analysis. An empirical analysis uses data to test a theory or to estimate a relationship.

2 Classical Methodology in Econometrics

Formulation of the theory or hypothesis
Specification of the economic (mathematical) model
Specification of the econometric model
Collection of data
Estimation of parameters
Hypothesis testing
Forecasting or prediction
Evaluation of results for policy analysis or decision making

2.1 Economic model

The first step in an econometric analysis is to pose the question we wish to answer as precisely as possible. In most cases the question has important policy implications. For example,

How does class size affect student performance in primary schools?
If the minimum wage increases by 10%, how much does unemployment (or employment) change?
If the severity of punishment for certain crimes increases, do crime rates fall on average?

The second step is to specify an economic model, or a conceptual framework, built on careful economic reasoning (e.g., utility maximization, profit maximization, or cost minimization).

2.1.1 Example: An Economic Model of Crime

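Gary Becker was a famous economist and Nobel Prize winner who used the utility maximization framework to analyze several topics once considered outside economics, such as crime, racial discrimination, and marriage and family organization. In his economic model of crime, the hours an individual spends in criminal activity depend on the payoffs to crime and to legal work, other income, the likelihood and severity of punishment, and age: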

\[ y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7) \tag{1.1} \]

where:

\(y\): hours spent in criminal activities
\(x_1\): “earnings” for an hour spent in criminal activity
\(x_2\): hourly wage in legal employment
\(x_3\): other income
\(x_4\): probability of getting caught
\(x_5\): probability of being convicted, if caught
\(x_6\): expected sentence, if convicted
\(x_7\): age

NOTE: The functional form \(f()\) is not yet specified.

2.1.2 Example: An Economic Model of Job Training and Worker Productivity

\[ wage = f(educ, exper, training) \tag{1.2} \]

where:

\(wage\): hourly wage (in pesos)
\(educ\): level of education (in years)
\(exper\): years of workforce experience
\(training\): weeks spent in job training

Similarly, the functional form \(f()\) is not yet specified.

2.2 An Econometric Model of Job Training and Worker Productivity

Suppose we assume that \(f()\) is a linear function. Then,

\[ wage = \beta_0 + \beta_1 \times educ+\beta_2 \times exper + \beta_3\times training + \boldsymbol{u} \tag{1.3} \]

is an example of an econometric model.
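Because (1.3) is linear, each parameter has a direct interpretation. Taking changes on both sides (the intercept \(\beta_0\) is a constant, so it drops out) gives

\[ \Delta wage = \beta_1 \times \Delta educ + \beta_2 \times \Delta exper + \beta_3 \times \Delta training + \Delta \boldsymbol{u} \]

If education, experience, and the error term \(\boldsymbol{u}\) are held fixed (\(\Delta educ = \Delta exper = \Delta \boldsymbol{u} = 0\)), this reduces to

\[ \Delta wage = \beta_3 \times \Delta training \]

so \(\beta_3\) is the change in the hourly wage from one additional week of job training, other factors being equal. \(\beta_1\) and \(\beta_2\) are interpreted the same way for one more year of education and one more year of experience.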

NOTES:

\(\boldsymbol{u}\): random error term or disturbance term
- The random error term \(\boldsymbol{u}\) contains the influence of factors that are not included in the model. It also contains unobserved factors such as innate ability or family background.
- No matter how comprehensive the specified model is, there will always be factors that cannot be included in the econometric model. We can never eliminate \(\boldsymbol{u}\) entirely (otherwise we would have a deterministic model, which is not realistic).
\(\beta_0, \beta_1, \beta_2, \beta_3\): parameters of the econometric model
- These are unknown constants
- They describe the direction and strength of the relationship between wage and each of the factors included in the model.
- For example, we may be interested in testing \(H_0:\beta_3 = 0\), which says that job training has no effect on wage.
- Because we do not know these beta coefficients, we need to collect data and estimate them using the methods we will learn in this class.
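To make the estimation step concrete, the Python sketch below simulates hypothetical data from model (1.3) and then recovers the parameters by ordinary least squares (OLS), computed by solving the normal equations \((X'X)\hat{\beta} = X'y\). Everything here is an assumption for illustration: the "true" values in `TRUE_BETA`, the ranges of educ, exper, and training, and the normal error with standard deviation 10 are made up, and `simulate` and `ols` are hypothetical helper names. With real data the true betas are unknown, which is exactly why we have to estimate them.

```python
import random

# Hypothetical "true" parameters (illustration only):
# [beta0 (intercept), beta1 (educ), beta2 (exper), beta3 (training)]
TRUE_BETA = [50.0, 8.0, 2.0, 3.0]


def simulate(n, seed=0):
    """Draw n hypothetical workers from model (1.3)."""
    rng = random.Random(seed)
    rows, wages = [], []
    for _ in range(n):
        educ = rng.randint(6, 16)         # years of education (assumed range)
        exper = rng.randint(0, 30)        # years of experience (assumed range)
        training = rng.randint(0, 10)     # weeks of job training (assumed range)
        u = rng.gauss(0.0, 10.0)          # unobserved factors: ability, family background, ...
        x = [1.0, educ, exper, training]  # the leading 1.0 multiplies beta0
        rows.append(x)
        wages.append(sum(b * xi for b, xi in zip(TRUE_BETA, x)) + u)
    return rows, wages


def ols(rows, y):
    """OLS estimates: solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column to the diagonal
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        # Eliminate this column from every other row
        for i in range(k):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [v - f * p for v, p in zip(a[i], a[col])]
    # The left block is now diagonal, so each estimate is a simple ratio
    return [a[i][k] / a[i][i] for i in range(k)]


rows, wages = simulate(5000)
beta_hat = ols(rows, wages)
for name, true, est in zip(["beta0", "beta1", "beta2", "beta3"], TRUE_BETA, beta_hat):
    print(f"{name}: true = {true:5.2f}, estimate = {est:6.2f}")
```

Because the simulated \(\boldsymbol{u}\) is random, the estimates will not equal the true values exactly, but with many observations they should land close to them. In practice a statistics package does this computation; the hand-written solver only shows what is being computed.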

2.3 Types of Data in Econometric Analysis

The types of data that we use in econometric applications can be classified as follows:

Cross-sectional data
Time series data
Pooled cross-section
Panel data (longitudinal data)

2.3.1 Cross-sectional data

Cross-sectional data consist of a sample of individuals, households, firms, cities, states, countries, or a variety of other units, taken at a given point in time. These units are usually a random sample from a target population. Cross-sectional data are generally obtained from official records on individual units and from surveys that use questionnaires; examples include the household income, consumption, and employment surveys conducted by the Philippine Statistics Authority, such as the Family Income and Expenditure Survey (FIES).

2.3.2 Time series data

Consists of observations on one or more variables over time
Chronological ordering
Frequency of time series data: hour, day, week, month, year
Time length between observations is generally equal
Examples of time series data include stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, and automobile sales figures.

2.3.3 Pooled cross-section

Consists of cross-sectional data sets that are observed in different time periods and combined together
At each time period (e.g., year) a different random sample is chosen from the population
Individual units are not the same
For example, if we choose a random sample of 400 firms in 2002, choose another random sample in 2010, and combine these cross-sectional data sets, we obtain a pooled cross-section data set.
Cross-sectional observations are pooled together over time.

2.3.4 Panel data (Longitudinal data)

Consists of a time series for each cross-sectional member in the data set.
The same cross-sectional units (firms, households, etc.) are followed over time.
For example: wage, education, and employment history for a set of individuals followed over a ten-year period.
Another example: a cross-country data set covering a 20-year period, containing life expectancy, income inequality, real GDP per capita, and other characteristics for each country.
In practice, we encounter two types of panel data: micro panels and macro panels
- Micro panels (large N, small T): we have a large number (N) of cross-sectional units (consumers, firms, etc.) but a small number of time periods (T)
- Macro panels (small N, large T): we have a small number of cross-sectional units (e.g., countries) but a large number of time periods (e.g., 50 years of observations)
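The four data structures differ in which units appear in which time periods. The Python sketch below makes that concrete with a hypothetical helper, `classify`, that looks only at the (unit, period) pairs of a data set. The rules are a simplification of the definitions above (for example, panels in which some units are missing in some periods are not classified further), and the sample data sets are made up.

```python
from collections import defaultdict


def classify(observations):
    """Classify a data set from its (unit_id, period) pairs (hypothetical helper).

    Simplified rules based on the definitions in the text:
      one period, many units            -> cross-sectional
      one unit, many periods            -> time series
      several periods, no unit repeated -> pooled cross-section
      the same units in every period    -> panel (balanced)
    """
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    if not periods_by_unit:
        raise ValueError("no observations")
    all_periods = set().union(*periods_by_unit.values())
    n, t = len(periods_by_unit), len(all_periods)

    if t == 1:
        return "cross-sectional"
    if n == 1:
        return "time series"
    if all(len(ps) == 1 for ps in periods_by_unit.values()):
        return "pooled cross-section"
    if all(ps == all_periods for ps in periods_by_unit.values()):
        return f"panel (N={n}, T={t})"
    return "other (e.g. unbalanced panel)"


# Hypothetical example data sets: only the (unit, period) structure matters here
survey_2015 = [("household1", 2015), ("household2", 2015), ("household3", 2015)]
gdp_series = [("Philippines", year) for year in range(2000, 2010)]
pooled = [(f"firm{i}", 2002) for i in range(3)] + [(f"firm{i}", 2010) for i in range(3, 6)]
workers = [(person, year) for person in ["A", "B", "C", "D"] for year in range(2013, 2023)]

for name, data in [("survey_2015", survey_2015), ("gdp_series", gdp_series),
                   ("pooled", pooled), ("workers", workers)]:
    print(name, "->", classify(data))
# survey_2015 -> cross-sectional
# gdp_series -> time series
# pooled -> pooled cross-section
# workers -> panel (N=4, T=10)
```

For a panel, comparing N and T indicates whether the data look more like a micro panel (large N, small T) or a macro panel (small N, large T).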

3 Causality and the Notion of Ceteris Paribus

In testing economic theories, our goal is usually to infer that one variable has a causal effect on another variable.
Correlation may be suggestive but cannot be used to infer causality.
You’ve probably heard the mantra “correlation does not imply causation”. For example, the rooster’s crow in the morning does not cause the sun to rise (but they are highly correlated).
The fundamental notion of “ceteris paribus” means “other (relevant) factors being equal”, or “holding all other factors fixed”.
Most economic questions are ceteris paribus by nature.
For example, in analyzing consumer demand, we are interested in knowing the effect of changing the price of a good on its quantity demanded, while holding all other factors (such as income, prices of other goods, and individual tastes) fixed.
If other factors are not held fixed, then we cannot know the causal effect of a price change on quantity demanded.
Therefore, the relevant question in econometric analysis is: “Have we controlled for a sufficient number of factors?”
Are there other factors that are not included in the model?
Can we say that other components are held fixed?
In most serious applications the number of factors that can affect the outcome is immense, so isolating the effect of any particular variable may seem hopeless. But, if properly used, econometric methods can help us determine ceteris paribus effects.
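The gap between correlation and causation, and the role of holding other factors fixed, can be reproduced in a small simulation. In the Python sketch below (all numbers are made up), a common factor `z` drives both `x` and `y`, much as the hour of the morning drives both the rooster's crow and the sunrise, while `x` itself has no causal effect on `y`. Regressing `y` on `x` alone still yields a large slope. Holding `z` fixed, done here with the Frisch-Waugh-Lovell device of regressing the part of `y` not explained by `z` on the part of `x` not explained by `z`, brings the estimated effect back to roughly zero. The helper names `slope` and `residuals` are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """Least-squares slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, z):
    """What is left of y after removing its linear dependence on z."""
    b = slope(z, y)
    a = mean(y) - b * mean(z)
    return [yi - a - b * zi for yi, zi in zip(y, z)]


rng = random.Random(1)
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]     # common factor, variance 1
x = [zi + rng.gauss(0, 0.5) for zi in z]    # x moves with z
y = [2 * zi + rng.gauss(0, 1) for zi in z]  # y depends on z only: x has NO causal effect on y

naive = slope(x, y)                                   # z not held fixed
controlled = slope(residuals(x, z), residuals(y, z))  # z held fixed (Frisch-Waugh-Lovell)
print(f"naive slope: {naive:.2f}")                # in large samples about 2 * var(z) / var(x) = 2 / 1.25 = 1.6
print(f"slope holding z fixed: {controlled:.2f}")  # in large samples about 0, the true causal effect
```

The catch, as the questions above suggest, is that this only works for factors we observe and include: if `z` were missing from the data, no computation on `x` and `y` alone would reveal that the causal effect of `x` is zero.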

Econ 115s (Introduction to Econometrics)

Lesson 1.1 (Introduction to Econometrics)

NE Milla, Jr.

2023-02-28