Introduction to Econometrics

Author: Muhammad Minhaj Akhtar
Designation: Lecturer Economics
College: Government Graduate College Jauharabad

Welcome to the fascinating world of econometrics! Studying econometrics bridges the gap between being a student of economics and becoming a practicing economist. It equips you with essential tools for analyzing economic data, empirically testing economic theories, providing numerical estimates of economic relationships, and forecasting future events.

In this introductory post, we will delve into the foundational aspects of econometrics, including its definition, objectives, and the distinctions between econometrics and other related fields. We’ll also explore how econometric models are structured, the role of stochastic error terms, and the various types of datasets and variables used in econometric research.

By the end of this journey, you’ll have a solid grasp of the basic principles and methods that underpin the study of econometrics, paving the way for more advanced exploration and application.

Origin of Econometrics
Definition of Econometrics
Uses/Functions/Objectives of Econometrics
Econometrics vs Other Fields
Economic vs Econometric Model
How Does an Econometric Model Look
What is a Stochastic Error Term (ε)
Types of Econometrics
Methodology of Econometrics
Different Types of Datasets Used in Econometric Research
Other Types of Data
Types of Variables
Measurement Scales

Origin of Econometrics

The term econometrics is derived from two Greek words: oikonomia (economics) and metron (measurement). Literally, the word econometrics means economic measurement or the measurement of economic relationships. Ragnar Frisch, who coined the term econometrics, is one of the founders of the Econometric Society (Christ, 1983). ‘Econometrics aims at giving empirical content to economic relationships’. The three key ingredients are: Economic theory, Economic data, and Statistical methods.
Neither theory without measurement, nor measurement without theory, is sufficient for explaining economic phenomena. An econometrician is an individual who is both an economist by training and a competent mathematician and statistician. Therefore, a fundamental knowledge of mathematics, statistics, and economic theory is a necessary prerequisite for this field. Ragnar Frisch (1933) explains in the first issue of Econometrica that it is the unification of statistics, economic theory, and mathematics that constitutes econometrics (Baltagi, 2022).
Ragnar Frisch, a Norwegian economist, shared the first Nobel Prize in Economics with Jan Tinbergen, a Dutch economist, in 1969 for their work in econometrics. Frisch and Tinbergen are considered the founding fathers of econometrics. Lawrence R. Klein, the 1980 recipient of the Nobel Prize in Economics “for the creation of econometric models and their application to the analysis of economic fluctuations and economic policies,” has always emphasized the integration of economic theory, statistical methods, and practical economics. Klein provides an interesting account of the pioneering works of Moore (1914) on economic cycles, Working (1927) on demand curves, Cobb and Douglas (1928) on the theory of production, Schultz (1938) on the theory and measurement of demand, and Tinbergen (1939) on business cycles.
Today, it is rare to read a professional article in a leading economics or econometrics journal without encountering mathematical equations. Students of economics and econometrics need to be proficient in mathematics to comprehend this research. Professor T.W. Anderson of Stanford remarked in an Econometric Theory interview: “These days econometricians are very highly trained in mathematics and statistics; much more so than statisticians are trained in economics.”

Definition of Econometrics

There is no single definition of econometrics: ask half a dozen econometricians what econometrics is, and you could get half a dozen different answers. One might tell you that econometrics is the science of testing economic theories, while others offer the following definitions:

Econometrics is concerned with the systematic study of economic phenomena using observed data. (Spanos, 1986)
Broadly, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. (Stock & Watson, 2003)
Econometrics is based upon the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy. (Wooldridge, 2019)
Econometrics is about how we can use theory and data from economics, business, and the social sciences, along with tools from statistics, to answer ‘how many’ questions. (Griffiths, Hill, & Lim, 2008)
Econometrics is defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena. (Goldberger, 1964)
Econometrics is concerned with the empirical determination of economic laws. (Theil, 1971)
Econometrics is economic measurement for the purpose of testing and developing economic theory. (Sloan, 1949)
Econometrics is the integration of economics, mathematics, and statistics for the purpose of providing numerical values for the parameters of economic relationships (elasticities, propensities, and marginal values) and verifying economic theories. (Koutsoyiannis, 1977)
Econometrics is the application of statistical and mathematical methods to the analysis of economic data, with a purpose of giving empirical content to economic theory and verifying or refuting them. (Maddala, 2001)
Econometrics is a field of science in which mathematical-economic and mathematical-statistical research are applied in combination. (Tinbergen, 1951)
Econometrics is the integration of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses about economic phenomena, estimating coefficients of economic relationships, and forecasting or predicting future values of economic variables or phenomena. (Salvatore & Reagle, 2002)
Econometrics may be defined as the quantitative analysis of actual economic phenomena. (Samuelson, Koopmans, & Stone, 1954)
Broadly speaking, econometrics aims to give empirical content to economic relations for testing economic theories, forecasting, decision making, and for ex post decision/policy evaluation. (Geweke, Horowitz, & Pesaran, 2007)
Econometrics is the branch of economic science in which economic relationships are expressed mathematically in equation form. (Horton, Ripley, & Schnapper, 1948)
Econometrics is the application of mathematics and statistics to economics. (Marschak, 1948)
Econometrics is the application of mathematical economic theory and statistical procedures to economic data to establish numerical results in the field of economics and to verify economic theorems. (Tintner, 1952)

Uses/Functions/Objectives of Econometrics

Verifying economic theories and models
Providing numerical values of economic coefficients
Forecasting Future Events

Econometrics vs Other fields

Economic theory vs Econometrics

Economic theory makes statements or hypotheses that are mostly qualitative in nature. For example, microeconomic theory states that, other things remaining the same, a reduction in the price of a commodity is expected to increase the quantity demanded of that commodity. Thus, economic theory postulates a negative or inverse relationship between the price and quantity demanded of a commodity. But the theory itself does not provide any numerical measure of the relationship between the two; that is, it does not tell by how much the quantity will go up or down as a result of a certain change in the price of the commodity. It is the job of the econometrician to provide such numerical estimates.

Mathematical Economics vs Econometrics

Mathematical economics is concerned with expressing economic theory in mathematical form (equations) without regard to measurability or empirical verification of the theory. It expresses economic relationships in exact or deterministic form. Econometrics, by contrast, empirically tests the economic theory or hypothesis and provides estimated values for economic relationships.

Economic Statistics vs Econometrics

Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the form of charts and tables. An economic statistician does not test economic theory; one who does so becomes an econometrician. Statistical methods of measurement were developed on the basis of controlled experiments, so they are not well suited to explaining economic phenomena. Moreover, most of statistics is concerned with statistical inference, i.e., inference about a population based on a sample, whereas econometricians are interested in causal inference, i.e., finding cause-and-effect relationships.

Mathematical Statistics vs Econometrics

Mathematical statistics provides many estimation tools, but these tools cannot be directly applied in econometrics because of the unique nature of economic data. Economic data is often observational rather than experimental. Thus, econometrics requires separate tools of analysis. As Jeffrey M. Wooldridge puts it, “naturally, econometricians have borrowed from mathematical statisticians whenever possible. In addition, economists have devised new techniques to deal with the complexities of economic data and to test the predictions of economic theories”.

Conclusion

From the above discussion we conclude that econometrics is different from quantitative economics, which frequently uses no mathematics. It is distinguished from mathematical economics, which is quantitative but not empirical and uses no statistics. Finally, it is different from theoretical work in statistics, which uses mathematics but is in general unrelated to economics. Thus, econometrics is the unification of mathematics, statistics, and economics (Tintner, 1953, p. 37).

Economic vs Econometric Model

Economic Model

A model is a simplified representation of a real-world process. The term simplified means easy to understand, communicate, test, and validate (G.S. Maddala). An economic model is a set of assumptions that describes the behavior of an economy. It consists of mathematical equations that describe various relationships. Examples of economic models include the circular flow model, the business cycle model, and the demand-supply model.

Econometric model

An econometric model is:

A set of behavioral equations derived from the economic model. These equations involve some observed variables and some unobserved variables.
A statement of whether there are errors of observation in the observed variables.
A specification of probability distribution of the disturbances. (G.S. Maddala)

A behavioral relationship describes how a particular variable behaves in response to changes in other variables.

How does an econometric model look

Linear Regression Model

Components of an econometric model?

In an econometric model we must first recognize that economic relationships are not exact. Economic theory does not claim to predict the specific behavior of any individual or firm; rather, it describes the average or systematic behavior of many individuals or firms. Thus, every econometric model has two parts: first, an observed, deterministic, systematic, and predictable component and, second, an unobserved, stochastic, or unpredictable component, often called the random error term, disturbance term, or noise. The latter is denoted by epsilon (ε). It is a catchall for all other variables that are omitted from the model, whether because of our limited knowledge of the actual economic relationship or because we consider them irrelevant for the current model, as well as for measurement errors in the observed variables. The systematic portion comes from economic theory. After specifying the systematic and unsystematic portions, we must specify the algebraic form of the relationship among our economic variables, for example whether it is linear, logarithmic, or exponential.
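As an illustrative sketch of this decomposition, a model with a single explanatory variable can be written as below. The symbols β0 and β1 and the use of one regressor X are assumptions made for illustration, and the logarithmic and exponential versions are standard textbook forms rather than forms taken from a particular study.

\[Y = \underbrace{\beta_0 + \beta_1 X}_{\text{systematic component}} + \underbrace{\varepsilon}_{\text{stochastic component}}\]
\[\ln Y = \beta_0 + \beta_1 \ln X + \varepsilon \quad \text{(logarithmic form)}\]
\[\ln Y = \beta_0 + \beta_1 X + \varepsilon \quad \text{(exponential form, since } Y = e^{\beta_0 + \beta_1 X + \varepsilon}\text{)}\]

In each case the part involving X is the systematic component suggested by theory, while ε collects everything the model leaves out.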

What is a stochastic error term (ε)

A stochastic error term is a term that is added to a regression equation to account for all of the variation in Y that cannot be explained by the included Xs. It is, in effect, a symbol of the econometrician’s ignorance or inability to model all the movements of the dependent variable. It is denoted by epsilon (ε). This variation probably comes from sources such as omitted influences, measurement error, incorrect functional form, or purely random and totally unpredictable occurrences. By random variation we mean something that has its value determined entirely by chance.

Types of Econometrics

Econometrics is divided into two broad categories, (i) theoretical econometrics and (ii) applied econometrics, and each can follow either the classical or the Bayesian approach. Theoretical econometrics is concerned with the development of appropriate methods for measuring economic relationships specified by econometric models. It relies heavily on mathematical statistics. Examples include the least squares method, the maximum likelihood method, and the indirect least squares method. Applied econometrics uses the tools of theoretical econometrics to study some special field(s) of economics and business, such as the production function, investment function, demand and supply functions, portfolio theory, etc.

Methodology of Econometrics

The methodology of econometric research involves various steps to empirically verify economic theory by using different methods. There are several schools of thought on econometric methodology, but the traditional or classical methodology still dominates. Broadly, there are eight steps.

Statement of theory or hypothesis.
Specification of the mathematical model of the theory.
Specification of the statistical, or econometric, model.
Obtaining the data.
Estimation of the parameters of the econometric model.
Hypothesis testing.
Forecasting or prediction.
Using the model for control or policy purposes.

Steps of Econometric Methodology

1. Statement of theory or hypothesis

We begin with a hypothesis (a testable statement). For example, according to Keynes, people on average increase their consumption expenditure as their income increases, but the increase in consumption expenditure is less than the increase in income, i.e., the marginal propensity to consume (MPC) is greater than 0 and less than 1.

2. Specification of the mathematical model

After specifying the hypothesis, we must specify the mathematical model, which involves specifying the functional relationship between consumption and disposable income. In fact, this also comes from economic theory: economists assume a linear relationship between income and consumption. \[Y=\beta_0+\beta_1 X\] \[0<\beta_1<1\]

Y is the dependent variable, i.e., consumption expenditure, and X is the independent variable, i.e., disposable income. β0 and β1 are known as parameters; they are the intercept and slope respectively. In economic language these are the autonomous consumption and the MPC of the consumption function respectively.

Example of Mathematical Model

3. Specification of an econometric model

A mathematical model assumes a deterministic or exact relationship between the variables, and in such models all data points lie exactly on the line, as shown in Figure 1. However, the relationship between economic variables is generally inexact or random. This randomness can be incorporated in an econometric model by adding a stochastic error term. Thus, our econometric model of the Keynesian consumption function is written as: \[Y=\beta_0+\beta_1 X+u\]

The term \(u\) is called the disturbance term because it disturbs our linear relationship between consumption and income. It is also called unobserved variation because it includes all unobservable variables that are not considered explicitly in the consumption function.
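The difference between the exact mathematical model and the econometric model can be sketched with a small simulation. The parameter values, the normally distributed disturbance, and its standard deviation are all hypothetical choices made for illustration; they are not estimates from real data.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

BETA_0 = -300.0  # hypothetical intercept (autonomous consumption)
BETA_1 = 0.72    # hypothetical slope (MPC); satisfies 0 < beta_1 < 1
SIGMA = 500.0    # hypothetical standard deviation of the disturbance u


def deterministic_consumption(income):
    """Mathematical model: Y = beta_0 + beta_1 * X (no disturbance)."""
    return BETA_0 + BETA_1 * income


def stochastic_consumption(income):
    """Econometric model: Y = beta_0 + beta_1 * X + u, with u drawn from N(0, SIGMA^2)."""
    u = random.gauss(0.0, SIGMA)
    return deterministic_consumption(income) + u


for x in [10_000, 20_000, 30_000, 40_000, 50_000]:
    exact = deterministic_consumption(x)
    observed = stochastic_consumption(x)
    print(f"X={x:>6}  exact Y={exact:>9.1f}  observed Y={observed:>9.1f}  u={observed - exact:>8.1f}")
```

The exact values all lie on the line, while the simulated observations scatter around it; the gap between the two is the disturbance.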

Example of Econometric Model

4. Obtaining the data

The purpose of any econometric model is to estimate the relationship between economic variables by finding the values of β0 and β1. This can be done by obtaining data on consumption and disposable income. Collecting data on economic variables is the work of an economic statistician. Various sources for obtaining economic data are:

World Development Indicators
International Financial Statistics
OECD Economic Outlook
Luxembourg Income Study (LIS)
Penn World Table (PWT)

In econometric research there are various types of datasets, such as:

Time series data
Cross-sectional data
Pooled data
Panel data

5. Estimation of the parameters

Now that we have both an econometric model and data, our next step is to estimate the numerical values of the economic parameters. This can be done by applying appropriate estimation techniques such as least squares or maximum likelihood. The estimated model looks like: \[\hat{Y}=-300+0.72X\]

The hat sign indicates the estimated values of Y conditional on various values of X.
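As a minimal sketch of the least squares step, the intercept and slope of a one-regressor model can be computed from the usual closed-form formulas. The data below are hypothetical and are not the data behind the estimated model quoted above; the function name `ols_simple` is also just an illustrative choice.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both in deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope: estimated MPC
    b0 = y_bar - b1 * x_bar   # intercept: estimated autonomous consumption
    return b0, b1


# Hypothetical monthly data (Rs.): disposable income and consumption.
income = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000]
consumption = [7_100, 13_900, 21_500, 28_300, 35_900, 42_700]

b0, b1 = ols_simple(income, consumption)
print(f"Estimated model: Y_hat = {b0:.2f} + {b1:.4f} X")
```

With real data one would normally use a statistical package, which also reports standard errors and diagnostics.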

6. Hypothesis Testing

We cannot make predictions directly from these estimated coefficients, because we are not yet sure whether the sample estimates represent the true population parameters. We check this by testing the statistical significance of the regression coefficients. The branch of statistics that permits such tests is called inferential statistics. Tests used in hypothesis testing include the t-test, F-test, chi-square test, and ANOVA. For example, to test the significance of the slope coefficient, the null and alternative hypotheses are:

Ho: Slope Coefficient is equal to zero
Ha: Slope Coefficient is not equal to zero.

We reject the null hypothesis if the p-value is less than the chosen significance level and conclude that the true MPC is not equal to zero.
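The test of the slope can be sketched as follows. Because the Python standard library has no t-distribution function, this sketch compares the t statistic with a tabulated critical value instead of computing a p-value; the two decision rules are equivalent. The data are hypothetical, and the critical value 2.776 is the two-sided 5% value for 4 degrees of freedom from a standard t table.

```python
import math

# Hypothetical monthly data (Rs.); not taken from any real survey.
income = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000]
consumption = [7_100, 13_900, 21_500, 28_300, 35_900, 42_700]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(consumption) / n
s_xx = sum((x - x_bar) ** 2 for x in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, consumption))

b1 = s_xy / s_xx         # OLS slope (estimated MPC)
b0 = y_bar - b1 * x_bar  # OLS intercept

# Residual variance with n - 2 degrees of freedom (two estimated parameters).
residuals = [y - (b0 + b1 * x) for x, y in zip(income, consumption)]
sigma2_hat = sum(e ** 2 for e in residuals) / (n - 2)

se_b1 = math.sqrt(sigma2_hat / s_xx)  # standard error of the slope
t_stat = b1 / se_b1                   # t statistic for H0: slope = 0

T_CRIT = 2.776  # two-sided 5% critical value, t distribution with 4 df

reject_h0 = abs(t_stat) > T_CRIT
print(f"b1 = {b1:.4f}, se = {se_b1:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```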

7. Forecasting and prediction

When we find that our chosen model does not refute the hypothesis or theory, we can use it to predict or forecast future values of the forecast variable based on known or expected values of the predictors. For example, suppose that income is Rs. 50,000 per month in January 2024. The predicted consumption is then:

\[\hat{Y}=-300+0.72(50{,}000)=35{,}700\]

Also suppose that the actual value of consumption expenditure in January 2024 was Rs. 45,000. The forecast error (actual minus predicted) is then Rs. 9,300, so our estimated model underestimates actual consumption. This does not mean that our estimated model is wrong, as forecast errors are inevitable. We can also evaluate the forecasting performance of models using various measures such as Mean Square Error, Mean Absolute Error, and Mean Absolute Percentage Error.
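The forecast and the accuracy measures named above can be sketched in a few lines. The January figures come from the example in the text; the other three months are hypothetical values added only so that the averages are computed over more than one observation.

```python
def predict(income):
    """Estimated consumption function from the text: Y_hat = -300 + 0.72 X."""
    return -300 + 0.72 * income


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    # Expressed in percent; assumes no actual value is zero.
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# January 2024 figures from the text.
january_forecast = predict(50_000)
january_error = 45_000 - january_forecast  # actual minus predicted

# First entry is January; the remaining months are hypothetical.
incomes = [50_000, 52_000, 48_000, 51_000]
actuals = [45_000, 38_000, 33_500, 36_000]
forecasts = [predict(x) for x in incomes]

print(f"January forecast: {january_forecast:.0f}, forecast error: {january_error:.0f}")
print(f"MSE:  {mean_squared_error(actuals, forecasts):.1f}")
print(f"MAE:  {mean_absolute_error(actuals, forecasts):.1f}")
print(f"MAPE: {mean_absolute_percentage_error(actuals, forecasts):.2f}%")
```

A positive forecast error here means the model predicted less than what was actually observed.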

8. Using the model for control or policy purposes

The estimated model can also be used for control or policy purposes. This can be done by manipulating the control variable (X) to produce the desired value of the target variable (Y). For example, in a recession the government wants to increase aggregate demand. This can be done by decreasing taxes, which increases disposable income and hence consumption, a component of aggregate demand.
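As a rough sketch of this idea, the estimated consumption function can be inverted to find the level of disposable income associated with a target level of consumption. The target value is hypothetical, and the calculation treats the estimated relationship as exact, which a real policy analysis would not do.

```python
B0 = -300.0  # estimated intercept from the text
B1 = 0.72    # estimated MPC from the text


def predict_consumption(income):
    """Y_hat = B0 + B1 * X."""
    return B0 + B1 * income


def required_income(target_consumption):
    """Solve target = B0 + B1 * X for X."""
    return (target_consumption - B0) / B1


target = 40_000  # hypothetical policy target for monthly consumption (Rs.)
needed = required_income(target)
print(f"Disposable income needed for consumption of {target}: {needed:.2f}")
```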

Different types of datasets used in econometric research

Cross-Section data
Time series data
Pooled data
Panel or longitudinal data

Cross-Section data

A cross-sectional dataset is collected across sample units, such as individuals, households, firms, cities, states, or countries, at a given point in time. For example, the Pakistan Social and Living Standards Measurement (PSLM) survey is a cross-sectional dataset which collects information from households in Pakistan on socio-economic indicators such as income, education, health, occupation, housing, and sanitation. Other examples of cross-section datasets in Pakistan are the Demographic and Health Survey, the Multiple Indicator Cluster Survey, and the population census. An important problem that economists must face when dealing with cross-section data is heterogeneity.

Cross Section Data Example 1

Cross Section Data Example 2

Time-Series Data

A time series dataset is a collection of observations recorded over a period of time in chronological order. For example, data on the GDP, CPI, and labor force of Pakistan collected from 1970 to 2023 form a time series. The order of the observations in time series data is very important. Time series data is collected at various frequencies such as daily, weekly, monthly, and annually. An important feature of time series data is that past observations affect current observations. The primary use of time series data is forecasting based on past information. Forecasting requires that the data is stationary, but most time series data is non-stationary. Thus, time series data requires separate tools for modeling. Time series data has four components: trend, cyclical, seasonal, and irregular.
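One common way of handling a trending series is first differencing, that is, working with period-to-period changes instead of levels. This technique is not discussed in the text above; it is shown here only as an illustration, with a hypothetical index series, and it is not a test of stationarity.

```python
def first_difference(series):
    """Return the period-to-period changes of a chronologically ordered series."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical annual index with a clear upward trend.
gdp_index = [100, 104, 109, 113, 118, 124, 129]

changes = first_difference(gdp_index)
print("Levels: ", gdp_index)
print("Changes:", changes)
```

The levels drift upward over time, while the changes fluctuate around a roughly constant value. Note that the result depends on the chronological order of the observations, which is why ordering matters in time series data.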

Time Series Data Example

Pooled Data

Pooled data or combined data have features of both cross section and time series data. For example, suppose that two cross-sectional household surveys are taken in Pakistan, one in 1985 and one in 1990. In 1985, a random sample of households is surveyed for variables such as income, savings, family size, and so on. In 1990, a new random sample of households is taken using the same survey questions. To increase our sample size, we can form a pooled cross section by combining the two years.

Pooled Data Example

Panel or longitudinal data

In a panel or longitudinal dataset, data is collected for several cross-section units over time. For example, we collect data on GDP, inflation, the unemployment rate, money supply, and investment for all developing countries from 1970 to 2023. The key feature of panel data that distinguishes it from a pooled cross section is that the same cross-sectional units are followed over a given time period. The World Development Indicators (WDI) of the World Bank is an example of an international panel dataset. More examples are Pew Research Center Surveys, the World Economic Outlook (WEO) Database, the World Values Survey, etc. If all cross-section units have the same number of observations, the panel is called balanced. If the cross-section units do not all have the same number of observations, it is called unbalanced.
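The balanced versus unbalanced distinction can be checked mechanically. The sketch below follows the definition given above (every unit has the same number of observations); the record layout, the country labels, and the values are hypothetical.

```python
from collections import Counter


def is_balanced(panel):
    """Return True if every cross-section unit has the same number of observations.

    `panel` is a list of (unit, period, value) records.
    """
    observations_per_unit = Counter(unit for unit, _period, _value in panel)
    return len(set(observations_per_unit.values())) <= 1


# Hypothetical panel: two countries observed in the same three years.
balanced_panel = [
    ("Country A", 2021, 3.1), ("Country A", 2022, 2.8), ("Country A", 2023, 3.4),
    ("Country B", 2021, 5.0), ("Country B", 2022, 4.6), ("Country B", 2023, 4.9),
]

# Same panel with one observation for Country B missing.
unbalanced_panel = balanced_panel[:-1]

print(is_balanced(balanced_panel))
print(is_balanced(unbalanced_panel))
```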

Panel Data Example

Other types of data

Experimental data refers to data collected through controlled experiments, as in the natural sciences, where variables are manipulated to observe their effects on other variables. This type of data allows researchers to establish cause-and-effect relationships. Non-experimental data, also known as observational or retrospective data, is collected by observing and recording events, behaviors, or phenomena as they naturally occur, without interference or manipulation.
Secondary data is existing information obtained from sources such as books, reports, and surveys. It is data that has already been collected through primary sources and made readily available for researchers to use in their own research. For example, the PSLM/HIES dataset is a secondary data source for researchers and policymakers. Primary data is information collected for the first time by a researcher for his or her own research. It is collected through surveys, interviews, experiments, observations, and questionnaires.

Types of Variables

Quantitative variables are variables whose values can be measured and expressed on a numerical scale, such as GDP, GNP, exports, and investment. Quantitative variables can be classified into two broad categories: (i) discrete variables and (ii) continuous variables. Qualitative variables are variables which cannot be measured and expressed numerically but can be classified into several groups or categories, such as gender, education level, and social status. A categorical variable is one that has two or more categories. There are two types of categorical variables: nominal and ordinal. A categorical variable is said to be nominal if its categories have no intrinsic ordering, whereas an ordinal variable has a clear ordering of its categories.

Measurement Scales

Data can be classified according to levels of measurement, or measurement scales. The level of measurement determines how data should be summarized or presented. It also indicates the type of statistical analysis that can be performed. There are four levels of measurement: nominal (the lowest level), ordinal, interval, and ratio (the highest level).

Nominal level data

On a nominal scale, data is classified into mutually exclusive qualitative categories. Nominal data is often represented as labels or names. We can only count the observations in each category or convert the counts into percentages. Examples are gender, colors, and house numbers. We can neither rank the data nor perform any other mathematical operation on it. Nominal data is often presented in bar charts. We can also label the groups, such as 1 for male and 2 for female. Table 1 shows a classification of balloon colors with labels.

Labels	Balloon Colors	Percentage in bag
1	Blue	24 %
2	Green	20 %
3	Orange	16 %
4	Yellow	14 %
5	Red	13 %
6	Brown	13 %

Table 1 Example of Nominal Scale
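Counting and converting to percentages, the only operations that make sense for nominal data, can be sketched as follows. The bag of 100 balloons is an assumption chosen so that the composition matches the percentages in Table 1.

```python
from collections import Counter

# Assumption: a bag of 100 balloons whose composition matches Table 1.
bag = (["Blue"] * 24 + ["Green"] * 20 + ["Orange"] * 16
       + ["Yellow"] * 14 + ["Red"] * 13 + ["Brown"] * 13)

counts = Counter(bag)
total = sum(counts.values())

# Percentage share of each color in the bag.
percentages = {color: 100 * n / total for color, n in counts.items()}

# The mode (most frequent category) is the only meaningful "average" for nominal data.
mode_color = counts.most_common(1)[0][0]

for color, share in percentages.items():
    print(f"{color:<7}{share:5.1f} %")
print("Most common color:", mode_color)
```

The numeric labels 1 to 6 in Table 1 are only names; averaging them would have no meaning.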

Ordinal-Level Data

The next higher level of measurement is the ordinal level, which has the characteristics of the nominal scale and in addition has the property of ordering or ranking. Examples include the performance of students in a class test (excellent, good, fair, poor, very poor) and the classification of households by income (poor class, lower middle class, upper middle class, and rich class). Table 2 shows the ratings given by 60 students to economics professors based on their teaching quality.

Rating	Frequency	Percentage
Superior	6	10%
Good	26	43.3%
Average	16	26.7%
Poor	9	15%
Inferior	3	5%

Table 2 Example of Ordinal Scale

An important characteristic of the ordinal scale is that we cannot determine the magnitude of the difference between the groups. For example, we do not know if the difference between “Superior” and “Good” is the same as the difference between “Poor” and “Inferior”.
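Because ordinal categories can be ranked, we can compute not only percentages but also a median category. The sketch below uses the frequencies from Table 2; the helper name `median_category` is just an illustrative choice.

```python
# Categories listed from best to worst, with frequencies from Table 2.
ratings = ["Superior", "Good", "Average", "Poor", "Inferior"]
frequency = {"Superior": 6, "Good": 26, "Average": 16, "Poor": 9, "Inferior": 3}

total = sum(frequency.values())
percentages = {r: round(100 * frequency[r] / total, 1) for r in ratings}


def median_category(ordered_categories, freq):
    """Return the first category at which the cumulative frequency reaches half the total."""
    half = sum(freq.values()) / 2
    cumulative = 0
    for category in ordered_categories:
        cumulative += freq[category]
        if cumulative >= half:
            return category


print(percentages)
print("Median rating:", median_category(ratings, frequency))
```

A mean rating would not be meaningful here, because the distances between adjacent categories are unknown.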

Interval-level Data

Interval-level data has all the characteristics of ordinal-level data, and in addition the difference between values is meaningful. Temperature is an example of an interval scale. Suppose that the high temperatures on three consecutive winter days in Jauharabad are 18, 20, and 22 degrees Celsius. These temperatures can easily be ranked, and we can also measure the distance between them. The distance between 18 and 20 degrees Celsius is 2 degrees, which is the same as the distance between 20 and 22 degrees Celsius. But we cannot say that 40 degrees Celsius is twice as hot as 20 degrees Celsius. Similarly, a temperature of 0 degrees Celsius does not mean the absence of temperature (it is 32 degrees on the Fahrenheit scale). Thus, the interval scale has no true or natural zero point. Remember three things about the interval scale.

It has all the characteristics of ordinal scale.
The difference between numbers makes sense but the ratio does not.
It does not have a true zero point.

Month	Average Temperature (°C)
January	-2
February	0
March	5
April	10
May	16
June	20
July	24
August	23
September	18
October	12
November	5
December	0

Table 3 Example of Interval level data
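A quick way to see why ratios are not meaningful on an interval scale is to convert the same temperatures to Fahrenheit: equal differences stay equal, but the ratio changes with the unit. The sketch below uses the temperatures from the example in the text and the standard Celsius-to-Fahrenheit conversion.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


days_c = [18, 20, 22]
days_f = [celsius_to_fahrenheit(c) for c in days_c]

# Differences between consecutive days are equal on both scales.
diff_c = [b - a for a, b in zip(days_c, days_c[1:])]
diff_f = [round(b - a, 1) for a, b in zip(days_f, days_f[1:])]

# The ratio of 40 to 20 degrees depends on the scale used.
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

print("Differences in Celsius:   ", diff_c)
print("Differences in Fahrenheit:", diff_f)
print(f"Ratio in Celsius: {ratio_c:.2f}, ratio in Fahrenheit: {ratio_f:.2f}")
```

Since the ratio is not the same in the two units, a statement such as "twice as hot" has no scale-free meaning.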

Ratio-Level Data

Ratio-level data is the highest level of measurement. It has all the characteristics of the interval scale, and in addition both the ratio between values and the zero point are meaningful. Most quantitative variables are measured on a ratio scale. It is used to measure height, weight, volume, length, money, units of production, prices, etc. For example, if you have zero rupees, you have no money. Table 4 shows the monthly income for four different designations. A Professor earns twice as much as an Associate Professor and four times as much as a Lecturer.

Designation	Income
Lecturers	80,000
Assistant Professors	120,000
Associate Professors	160,000
Professor	320,000

Table 4 Example of Ratio Scale
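On a ratio scale, dividing one value by another gives a meaningful answer. A short check using the incomes in Table 4 (the dictionary keys are shortened labels chosen for this sketch):

```python
# Monthly incomes from Table 4.
income = {
    "Lecturer": 80_000,
    "Assistant Professor": 120_000,
    "Associate Professor": 160_000,
    "Professor": 320_000,
}

# Ratios are meaningful because zero income means no income at all.
professor_vs_associate = income["Professor"] / income["Associate Professor"]
professor_vs_lecturer = income["Professor"] / income["Lecturer"]

print(f"Professor / Associate Professor: {professor_vs_associate:.1f}")
print(f"Professor / Lecturer: {professor_vs_lecturer:.1f}")
```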

Summary of Measurement Scales