Author: Muhammad Minhaj Akhtar
Designation: Lecturer Economics
College: Government Graduate College
Jauharabad
Welcome to the fascinating world of econometrics! Studying econometrics bridges the gap between being a student of economics and becoming a practicing economist. It equips you with essential tools for analyzing economic data, empirically testing economic theories, providing numerical estimates of economic relationships, and forecasting future events.
In this introductory post, we will delve into the foundational aspects of econometrics, including its definition, objectives, and the distinctions between econometrics and other related fields. We’ll also explore how econometric models are structured, the role of stochastic error terms, and the various types of datasets and variables used in econometric research.
By the end of this journey, you’ll have a solid grasp of the basic principles and methods that underpin the study of econometrics, paving the way for more advanced exploration and application.
The term econometrics is derived from two Greek words:
oikonomia (economics) and metron
(measurement). Literally, the word econometrics means economic
measurement or the measurement of economic relationships. Ragnar Frisch,
who coined the term econometrics, is one of the founders of the
Econometric Society (Christ, 1983). ‘Econometrics aims at giving
empirical content to economic relationships’. The three key ingredients
are: Economic theory, Economic data, and Statistical methods.
Neither theory without measurement, nor measurement without theory, is
sufficient for explaining economic phenomena. An econometrician is an
individual who is both an economist by training and a competent
mathematician and statistician. Therefore, a fundamental knowledge of
mathematics, statistics, and economic theory is a necessary prerequisite
for this field. Ragnar Frisch (1933) explains in the first issue of
Econometrica that it is the unification of statistics, economic
theory, and mathematics that constitutes econometrics (Baltagi,
2022).
Ragnar Frisch, a Norwegian economist, shared the first Nobel Prize in
Economics with Jan Tinbergen, a Dutch economist, for their work in
econometrics in 1969. Frisch and Tinbergen are considered the founding
fathers of econometrics. Lawrence R. Klein, the 1980 recipient of the
Nobel Prize in Economics “for the creation of econometric models and
their application to the analysis of economic fluctuations and economic
policies,” has always emphasized the integration of economic theory,
statistical methods, and practical economics. Klein provides an
interesting account of the pioneering works of: Moore (1914) on economic
cycles, Working (1927) on demand curves, Cobb and Douglas (1928) on the
theory of production, Schultz (1938) on the theory and measurement of
demand, Tinbergen (1939) on business cycles.
Today, it is rare to read any professional article in leading economics
and econometrics journals without encountering mathematical equations.
Students of economics and econometrics need to be proficient in
mathematics to comprehend this research. Professor T.W. Anderson of
Stanford remarked in an Econometric Theory interview: “These
days econometricians are very highly trained in mathematics and
statistics; much more so than statisticians are trained in
economics.”
There is no single definition of econometrics: ask half a dozen econometricians what econometrics is, and you could get half a dozen different answers. One might tell you that econometrics is the science of testing economic theories, while others say:
Econometrics is concerned with the systematic study of economic phenomena using observed data. (Spanos, 1986)
Broadly, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. (Stock & Watson, 2003)
Econometrics is based upon the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy. (Wooldridge, 2019)
Econometrics is about how we can use theory and data from economics, business, and the social sciences, along with tools from statistics, to answer ‘how many’ questions. (Griffiths, Hill, & Lim, 2008)
Econometrics is defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena. (Goldberger, 1964)
Econometrics is concerned with the empirical determination of economic laws. (Theil, 1971)
Econometrics is economic measurement for the purpose of testing and developing economic theory. (Sloan, 1949)
Econometrics is the integration of economics, mathematics, and statistics for the purpose of providing numerical values for the parameters of economic relationships (elasticities, propensities, and marginal values) and verifying economic theories. (Koutsoyiannis, 1977)
Econometrics is the application of statistical and mathematical methods to the analysis of economic data, with a purpose of giving empirical content to economic theory and verifying or refuting them. (Maddala, 2001)
Econometrics is a field of science in which mathematical-economic and mathematical-statistical research are applied in combination. (Tinbergen, 1951)
Econometrics is the integration of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses about economic phenomena, estimating coefficients of economic relationships, and forecasting or predicting future values of economic variables or phenomena. (Salvatore & Reagle, 2002)
Econometrics may be defined as the quantitative analysis of actual economic phenomena. (Samuelson, Koopmans, & Stone, 1954)
Broadly speaking, econometrics aims to give empirical content to economic relations for testing economic theories, forecasting, decision making, and for ex post decision/policy evaluation. (Geweke, Horowitz, & Pesaran, 2007)
Econometrics is the branch of economic science in which economic relationships are expressed mathematically in equation form. (Horton, Ripley, & Schnapper, 1948)
Econometrics is the application of mathematics and statistics to economics. (Marschak, 1948)
Econometrics is the application of mathematical economic theory and statistical procedures to economic data to establish numerical results in the field of economics and to verify economic theorems. (Tintner, 1952)
Economic theory makes statements or hypotheses that are mostly qualitative in nature. For example, microeconomic theory states that, other things remaining the same, a reduction in the price of a commodity is expected to increase the quantity demanded of that commodity. Thus, economic theory postulates a negative or inverse relationship between the price and quantity demanded of a commodity. But the theory itself does not provide any numerical measure of the relationship between the two; that is, it does not tell by how much the quantity will go up or down as a result of a certain change in the price of the commodity. It is the job of the econometrician to provide such numerical estimates.
Mathematical economics is concerned with expressing economic theory in mathematical form (equations) without regard to the measurability or empirical verification of the theory. It expresses economic relationships in exact or deterministic form. Econometrics, by contrast, empirically tests the economic theory or hypothesis and provides estimated values for economic relationships.
Economic statistics is mainly concerned with collecting, processing, and presenting economic data in the form of charts and tables. An economic statistician does not test economic theory; one who does so becomes an econometrician. Statistical methods of measurement were developed on the basis of controlled experiments, and these methods are not well suited to explaining economic phenomena. Moreover, most of statistics is concerned with statistical inference, i.e., inference about a population based on a sample, whereas econometricians are also interested in causal inference, i.e., finding cause-and-effect relationships.
Mathematical statistics provides many estimation tools, but these tools cannot be applied directly in econometrics because of the unique nature of economic data. Economic data is often observational rather than experimental. Thus, econometrics requires separate tools of analysis. As Jeffrey M. Wooldridge puts it, “naturally, econometricians have borrowed from mathematical statisticians whenever possible. In addition, economists have devised new techniques to deal with the complexities of economic data and to test the predictions of economic theories”.
From the above discussion we conclude that econometrics differs from quantitative economics, which frequently uses no mathematics. It is distinguished from mathematical economics, which is quantitative but not empirical and uses no statistics. Finally, it differs from theoretical work in statistics, which uses mathematics but is in general unrelated to economics. Thus, econometrics is the unification of mathematics, statistics, and economics (Tintner, 1953, p. 37).
A model is a simplified representation of a real-world process. The term simplified means easy to understand, communicate, test, and validate (G.S. Maddala). An economic model is a set of assumptions that describes the behavior of an economy. It consists of mathematical equations that describe various relationships. Examples of economic models include the circular flow model, the business cycle model, and the demand-supply model.
An econometric model is:
A behavioral relationship describes how a particular variable behaves in response to changes in other variables.
First, in an econometric model we must recognize that economic relationships are not exact. This is because economic theory does not claim to predict the specific behavior of any individual or firm; rather, it describes the average or systematic behavior of many individuals or firms. Thus, every econometric model has two parts: an observed, deterministic, systematic, and predictable component, and an unobserved, stochastic, or unpredictable component, often called the random error term, disturbance term, or noise. The latter is denoted by epsilon (ε). It is a catchall for all other variables that are omitted from the model, either because of our limited knowledge of the actual economic relationship or because we consider them irrelevant for the current model, as well as for measurement errors in the observed variables. The systematic portion comes from economic theory. After specifying the systematic and unsystematic portions, we must determine the algebraic form of the relationship among our economic variables: whether it is linear, logarithmic, exponential, and so on.
A stochastic error term is a term that is added to a regression equation to introduce all of the variation in Y that cannot be explained by the included Xs. It is, in effect, a symbol of the econometrician’s ignorance or inability to model all the movements of the dependent variable. It is denoted by epsilon (ε). This variation probably comes from sources such as omitted influences, measurement error, incorrect functional form, or purely random and totally unpredictable occurrences. By random variation we mean something that has its value determined entirely by chance.
Econometrics is divided into two broad categories, (i) theoretical econometrics and (ii) applied econometrics, and each can follow either the classical or the Bayesian approach. Theoretical econometrics is concerned with the development of appropriate methods for measuring economic relationships specified by econometric models. It relies heavily on mathematical statistics. Examples include the least squares method, the maximum likelihood method, and the indirect least squares method. Applied econometrics uses the tools of theoretical econometrics to study some special field(s) of economics and business, such as the production function, investment function, demand and supply functions, portfolio theory, etc.
The methodology of econometric research involves several steps for empirically verifying economic theory. There are several schools of thought on econometric methodology, but the traditional or classical methodology still dominates. Broadly, there are eight steps.
First, we start with a hypothesis (a testable statement). For example, according to Keynes, people on average increase their consumption expenditure as their income increases, but the increase in consumption expenditure is less than the increase in income; i.e., the marginal propensity to consume (MPC) is greater than 0 and less than 1.
After specifying the hypothesis, we must specify the mathematical model, which involves specifying the functional relationship between consumption and disposable income. In fact, this also comes from economic theory: economists assume a linear relationship between income and consumption. \[Y = \beta_0 + \beta_1 X\] \[0 < \beta_1 < 1\]
Y is the dependent variable, i.e., consumption expenditure, and X is the independent variable, i.e., disposable income. β0 and β1 are known as parameters; they are the intercept and the slope, respectively. In economic language, these are the autonomous consumption and the MPC of the consumption function, respectively.
A mathematical model assumes a deterministic or exact relationship between the variables, and in such models all data points lie exactly on the line, as shown in Figure 1. However, the relationship between economic variables is generally inexact or random. This randomness can be incorporated in an econometric model by adding a stochastic error term. Thus, our econometric model of the Keynesian consumption function is written as: \[Y = \beta_0 + \beta_1 X + u\]
The term \(u\) (the stochastic error term, denoted ε earlier) is called the disturbance term because it disturbs the linear relationship between consumption and income. It is also called unobserved variation because it includes all unobservable variables that are not considered explicitly in the consumption function.
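To make the contrast between the exact and the stochastic form concrete, here is a minimal Python sketch. The parameter values (`BETA_0`, `BETA_1`, `SIGMA_U`), the income grid, and the normally distributed disturbance are illustrative assumptions, not estimates from any dataset.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) parameter values for illustration only
BETA_0 = 500.0    # autonomous consumption
BETA_1 = 0.7      # MPC, chosen to satisfy 0 < beta_1 < 1
SIGMA_U = 200.0   # standard deviation of the disturbance term u

# Hypothetical disposable income levels
incomes = [10_000 + 5_000 * i for i in range(10)]


def deterministic_consumption(x):
    """Mathematical model: Y = beta_0 + beta_1 * X (exact relationship)."""
    return BETA_0 + BETA_1 * x


def stochastic_consumption(x):
    """Econometric model: Y = beta_0 + beta_1 * X + u, with u drawn at random."""
    u = random.gauss(0.0, SIGMA_U)
    return deterministic_consumption(x) + u


for x in incomes:
    y_exact = deterministic_consumption(x)
    y_observed = stochastic_consumption(x)
    print(f"X={x:>6}  exact Y={y_exact:>9.1f}  "
          f"observed Y={y_observed:>9.1f}  u={y_observed - y_exact:>7.1f}")
```

The exact values all lie on the line, while the simulated observations scatter around it by the amount of the disturbance.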
The purpose of any econometric model is to estimate the relationship between economic variables by finding the values of β0 and β1. This requires data on consumption and disposable income. Collecting data on economic variables is the work of an economic statistician. Various sources for obtaining economic data are:
In econometric research there are various types of datasets, such as cross-sectional, time series, pooled, and panel data.
Now that we have both the econometric model and the data, our next step is to estimate the numerical values of the economic parameters. This can be done by applying an appropriate estimation technique, such as least squares or maximum likelihood. The estimated model looks like: \[\hat{Y} = -300 + 0.72X\]
The hat sign indicates the estimated values of Y conditional on various values of X.
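As a rough illustration of the estimation step, the sketch below applies the textbook least squares formulas for a single regressor. The function name `ols_simple` and the income and consumption figures are hypothetical, made up only to show the mechanics; they are not the data behind the estimated model above.

```python
def ols_simple(x, y):
    """Return (b0, b1): least squares intercept and slope for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope estimate (MPC in the consumption function)
    b0 = y_bar - b1 * x_bar    # intercept estimate (autonomous consumption)
    return b0, b1


# Hypothetical monthly data in rupees (illustrative only)
income = [20_000, 25_000, 30_000, 35_000, 40_000,
          45_000, 50_000, 55_000, 60_000, 65_000]
consumption = [14_300, 17_500, 21_500, 24_600, 28_700,
               32_000, 35_500, 39_600, 42_800, 46_500]

b0, b1 = ols_simple(income, consumption)
print(f"Estimated model: Y_hat = {b0:.1f} + {b1:.3f} X")
```

In practice a statistical package would report the same coefficients along with standard errors; the hand-written formulas are used here only to keep the example self-contained.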
We cannot yet make predictions based on these estimated coefficients, because we are not sure whether the sample estimates represent the true population parameters. To check this, we test the statistical significance of the regression coefficients. The branch of statistics that permits such tests is called inferential statistics. Tests used in hypothesis testing include the t-test, F-test, chi-square test, and ANOVA. For example, to test the significance of the slope coefficient, the null and alternative hypotheses are:
Ho: Slope Coefficient is equal to zero
Ha: Slope Coefficient is not equal to zero.
We reject the null hypothesis if the p-value is less than the chosen significance level, and conclude that the true MPC is not equal to zero.
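The test of the slope coefficient can be sketched as follows. The data are hypothetical, and instead of computing a p-value the sketch uses the equivalent critical-value rule: with 10 observations there are 8 degrees of freedom, for which the two-tailed 5% critical value of Student's t is 2.306.

```python
import math

# Hypothetical monthly data in rupees (illustrative only)
income = [20_000, 25_000, 30_000, 35_000, 40_000,
          45_000, 50_000, 55_000, 60_000, 65_000]
consumption = [14_300, 17_500, 21_500, 24_600, 28_700,
               32_000, 35_500, 39_600, 42_800, 46_500]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(consumption) / n
s_xx = sum((x - x_bar) ** 2 for x in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, consumption))

# Least squares estimates of slope and intercept
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual variance with n - 2 degrees of freedom (two estimated parameters)
residuals = [y - (b0 + b1 * x) for x, y in zip(income, consumption)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)

# Standard error of the slope and t statistic for H0: slope = 0
se_b1 = math.sqrt(s2 / s_xx)
t_stat = b1 / se_b1

T_CRIT = 2.306  # two-tailed 5% critical value, t distribution with 8 df
reject_h0 = abs(t_stat) > T_CRIT

print(f"b1 = {b1:.4f}, se(b1) = {se_b1:.4f}, t = {t_stat:.1f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Rejecting when |t| exceeds the critical value leads to the same decision as rejecting when the p-value is below 0.05.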
When we find that our chosen model does not refute the hypothesis or theory, we can use it to predict or forecast future values of the forecast variable based on known or expected values of the predictors. For example, suppose that income is Rs. 50,000 per month in January 2024; the predicted consumption will then be:
\[\hat{Y} = -300 + 0.72(50{,}000) = 35{,}700\]
Also suppose that the actual value of consumption expenditure in January 2024 was Rs. 45,000. The forecast error (actual minus predicted) is then Rs. 9,300 (45,000 minus 35,700), so our estimated model underestimates actual consumption. This does not mean that our estimated model is wrong, as forecast errors are inevitable. We can also evaluate the forecasting performance of models using measures such as Mean Square Error, Mean Absolute Error, and Mean Absolute Percentage Error.
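The accuracy measures mentioned above can be computed in a few lines. This sketch uses the estimated coefficients from the text (-300 and 0.72); the incomes and "actual" consumption values for the four periods are hypothetical and serve only to show the arithmetic.

```python
def predict(x):
    """Predicted consumption from the estimated model Y_hat = -300 + 0.72 X."""
    return -300 + 0.72 * x


# Hypothetical incomes and actual consumption for four periods (rupees)
incomes = [40_000, 45_000, 50_000, 55_000]
actual = [29_000, 31_500, 36_500, 38_800]

forecasts = [predict(x) for x in incomes]
errors = [a - f for a, f in zip(actual, forecasts)]  # actual minus predicted

n = len(errors)
mse = sum(e ** 2 for e in errors) / n                             # Mean Square Error
mae = sum(abs(e) for e in errors) / n                             # Mean Absolute Error
mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n  # Mean Absolute Percentage Error

print(f"MSE  = {mse:.1f}")
print(f"MAE  = {mae:.1f}")
print(f"MAPE = {mape:.2f}%")
```

A positive error means the model underestimates consumption in that period; a negative error means it overestimates.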
The estimated model can also be used for control or policy purposes. This is done by manipulating the control variable (X) to produce the desired value of the target variable (Y). For example, in a recession the government wants to increase aggregate demand. It can do so by cutting taxes, which raises disposable income and therefore consumption, a component of aggregate demand.
A cross-sectional dataset is collected across sample units, such as individuals, households, firms, cities, states, or countries, at a given point in time. For example, the Pakistan Social and Living Standards Measurement (PSLM) survey is a cross-sectional dataset that collects information from households in Pakistan on socio-economic indicators such as income, education, health, occupation, housing, and sanitation. Other examples of cross-sectional datasets in Pakistan are the Demographic and Health Survey, the Multiple Indicator Cluster Survey, and the population census. An important problem that economists face when dealing with cross-sectional data is heterogeneity.
A time series dataset is a collection of data recorded over a period of time in chronological order. For example, data on the GDP, CPI, and labor force of Pakistan collected from 1970 to 2023. The ordering of time series data is very important. Time series data is collected at various frequencies, such as daily, weekly, monthly, and annually. An important feature of time series data is that past observations affect current observations. A primary use of time series data is forecasting based on past information. Many forecasting methods require the data to be stationary, but most economic time series are non-stationary. Thus, time series data requires separate tools for modeling. A time series has four components: trend, cyclical, seasonal, and irregular.
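The idea that past observations affect current ones can be illustrated with a simulated series. The first-order autoregressive process used here, its coefficient `PHI`, and the series length are assumptions chosen for illustration; the sketch then measures the dependence with the lag-1 sample autocorrelation.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

PHI = 0.8   # assumed persistence: share of last period's value carried forward
N = 500     # assumed number of periods

# Simulate y_t = PHI * y_{t-1} + e_t with standard normal shocks e_t
series = [0.0]
for _ in range(N - 1):
    series.append(PHI * series[-1] + random.gauss(0.0, 1.0))


def lag1_autocorrelation(y):
    """Sample correlation between y_t and y_{t-1}."""
    n = len(y)
    mean = sum(y) / n
    numerator = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    denominator = sum((v - mean) ** 2 for v in y)
    return numerator / denominator


r1 = lag1_autocorrelation(series)
print(f"Lag-1 autocorrelation: {r1:.3f}")
```

A value well above zero indicates that each observation carries information about the next one, which is why the chronological order of the data cannot be shuffled.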
Pooled data or combined data have features of both cross section and time series data. For example, suppose that two cross-sectional household surveys are taken in Pakistan, one in 1985 and one in 1990. In 1985, a random sample of households is surveyed for variables such as income, savings, family size, and so on. In 1990, a new random sample of households is taken using the same survey questions. To increase our sample size, we can form a pooled cross section by combining the two years.
In a panel or longitudinal dataset, data is collected for several cross-section units over time. For example, we may collect data on GDP, inflation, the unemployment rate, money supply, and investment for all developing countries from 1970 to 2023. The key feature of panel data that distinguishes it from a pooled cross section is that the same cross-sectional units are followed over a given time period. The World Development Indicators (WDI) of the World Bank is an example of an international panel dataset. More examples are the Pew Research Center surveys, the World Economic Outlook (WEO) Database, and the World Values Survey. If all cross-section units have the same number of observations, the panel is called balanced. If the cross-section units do not all have the same number of observations, it is called unbalanced.
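The balanced versus unbalanced distinction can be checked mechanically by counting observations per unit. The record layout `(unit, period, value)`, the helper name `is_balanced`, and the sample values are hypothetical.

```python
from collections import Counter


def is_balanced(panel):
    """Return True if every cross-section unit has the same number of observations.

    `panel` is an iterable of (unit, period, value) records.
    """
    counts = Counter(unit for unit, _period, _value in panel)
    return len(set(counts.values())) <= 1


# Hypothetical panel: two units observed in the same two years
balanced_panel = [
    ("A", 2021, 1.0), ("A", 2022, 1.2),
    ("B", 2021, 0.8), ("B", 2022, 0.9),
]

# Adding a unit that is observed only once makes the panel unbalanced
unbalanced_panel = balanced_panel + [("C", 2022, 2.0)]

print(is_balanced(balanced_panel))
print(is_balanced(unbalanced_panel))
```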
Experimental data refers to the data collected
through controlled experiments in natural sciences, where variables are
manipulated to observe their effects on other variables. This type of
data allows researchers to establish cause-and-effect relationships.
Non-experimental data, also known as observational
data or retrospective data, is collected by observing and
recording events, behaviors, or phenomena as they naturally occur,
without interference or manipulation.
Secondary data is existing knowledge obtained from
sources such as books, reports, and surveys. It is the data that has
already been collected through primary sources and made readily
available for researchers to use for their own research. For example,
PSLM/HIES data set is a secondary data source for researchers and
policymakers. Primary data is the information collected
for the first time by a researcher for his/her own research. It is collected
through surveys, interviews, experiments, observations, and
questionnaires.
Quantitative variables are those variables whose values can be measured and expressed on a numerical scale, such as GDP, GNP, exports, and investment. Quantitative variables can be classified into two broad categories: (i) discrete variables and (ii) continuous variables. Qualitative variables are those variables which cannot be measured and expressed numerically but can be classified into several groups or categories, such as gender, education level, and social status. A categorical variable is one that has two or more categories. There are two types of categorical variables: nominal and ordinal. A categorical variable is said to be nominal if its categories have no intrinsic ordering, whereas an ordinal variable has a clear ordering of its categories.
Data can be classified according to levels of measurement, or measurement scales. The level of measurement determines how data should be summarized or presented. It also indicates the type of statistical analysis that can be performed. There are four levels of measurement: nominal (the lowest level), ordinal, interval, and ratio (the highest level).
Nominal level data
On a nominal scale, data is classified into mutually exclusive qualitative categories. Nominal data is often represented as labels or names. We can only count the observations in each category or convert the counts into percentages. Examples are gender, colors, and house numbers. We can neither rank the data nor perform any other mathematical operation on it. Nominal data is often represented by bar charts. We can also label the groups, such as 1 for male and 2 for female. Table 1 shows a classification of balloon colors with labels.
Labels | Balloon Colors | Percentage in bag |
---|---|---|
1 | Blue | 24 % |
2 | Green | 20 % |
3 | Orange | 16 % |
4 | Yellow | 14 % |
5 | Red | 13 % |
6 | Brown | 13 % |
Table 1 Example of Nominal Scale
Ordinal-Level Data
The next higher level of measurement is the ordinal level, which has the characteristics of the nominal scale and, in addition, the property of ordering or ranking. Examples include the performance of students in a class test (excellent, good, fair, poor, very poor) and the classification of households by income (poor class, lower middle class, upper middle class, and rich class). Table 2 shows the ratings that 60 students gave to economics professors based on their teaching quality.
Rating | Frequency | Percentage |
---|---|---|
Superior | 6 | 10% |
Good | 26 | 43.3% |
Average | 16 | 26.7% |
Poor | 9 | 15% |
Inferior | 3 | 5% |
Table 2 Example of Ordinal Scale
An important characteristic of the ordinal measurement scale is that we cannot determine the magnitude of the difference between groups. For example, we do not know whether the difference between “Superior” and “Good” is the same as the difference between “Poor” and “Inferior”.
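Because ordinal categories can be ranked but not subtracted, sensible summaries are counts, percentages, and the median category. A minimal sketch using the frequencies from Table 2 (the helper name `median_category` is hypothetical):

```python
# Rating levels ordered from lowest to highest
levels = ["Inferior", "Poor", "Average", "Good", "Superior"]

# Frequencies taken from Table 2
frequency = {"Superior": 6, "Good": 26, "Average": 16, "Poor": 9, "Inferior": 3}

total = sum(frequency.values())
percentages = {lvl: round(100 * frequency[lvl] / total, 1) for lvl in levels}


def median_category(ordered_levels, freq):
    """Return the category that contains the middle observation.

    For an even number of observations this returns the category of the
    upper of the two middle positions.
    """
    midpoint = (sum(freq.values()) + 1) / 2
    cumulative = 0
    for level in ordered_levels:
        cumulative += freq[level]
        if cumulative >= midpoint:
            return level
    return None


for lvl in reversed(levels):
    print(f"{lvl:<9} {frequency[lvl]:>3} {percentages[lvl]:>5}%")
print("Median rating:", median_category(levels, frequency))
```

A mean rating is deliberately not computed, since it would require treating the gaps between categories as equal.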
Interval-level Data
Interval-level data has all the characteristics of ordinal-level data, and in addition the difference between values is meaningful. Temperature is an example of an interval scale. Suppose that the high temperatures on three consecutive winter days in Jauharabad are 18, 20, and 22 degrees Celsius. These temperatures can easily be ranked, and we can also measure the distance between them. The distance between 18 and 20 degrees Celsius is 2 degrees, which is the same as the distance between 20 and 22 degrees Celsius. But we cannot say that 40 degrees Celsius is twice as hot as 20 degrees Celsius. Similarly, a temperature of 0 degrees Celsius does not mean the absence of temperature (it is 32 degrees on the Fahrenheit scale). Thus, the interval scale has no true or natural zero point. Remember three things about the interval scale.
Month | Average Temperature (°C) |
---|---|
January | -2 |
February | 0 |
March | 5 |
April | 10 |
May | 16 |
June | 20 |
July | 24 |
August | 23 |
September | 18 |
October | 12 |
November | 5 |
December | 0 |
Table 3 Example of Interval level data
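A quick way to see why ratios are not meaningful on an interval scale is to change the unit of measurement. The sketch below converts Celsius to Fahrenheit with the standard formula and compares ratios and differences; the function name is chosen for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Ratios are not preserved: "twice as hot" depends on the arbitrary zero point
ratio_celsius = 40 / 20
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)
print(f"40/20 in Celsius:    {ratio_celsius:.3f}")
print(f"Same temperatures F: {ratio_fahrenheit:.3f}")

# Equal differences stay equal after conversion
diff_1 = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(18)
diff_2 = celsius_to_fahrenheit(22) - celsius_to_fahrenheit(20)
print(f"18 to 20 C in F degrees: {diff_1:.1f}")
print(f"20 to 22 C in F degrees: {diff_2:.1f}")
```

The ratio changes with the scale, while equal intervals remain equal, which is the defining property of interval-level data.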
Ratio-Level Data
Ratio-level data is the highest level of measurement. It has all the characteristics of the interval scale, and in addition both the ratio between values and the zero point are meaningful. Most quantitative economic variables are measured on a ratio scale. It is used to measure height, weight, volume, length, money, units of production, prices, etc. For example, if you have zero rupees, you have no money. Table 4 shows the monthly income for four different designations. The Professor earns twice as much as the Associate Professor and four times as much as the Lecturer.
Designation | Monthly Income (Rs.) |
---|---|
Lecturer | 80,000 |
Assistant Professor | 120,000 |
Associate Professor | 160,000 |
Professor | 320,000 |
Table 4 Example of Ratio Scale
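On a ratio scale, statements such as "four times as much" survive a change of units. A short sketch using the incomes from Table 4, with a conversion to thousands of rupees added purely for illustration:

```python
# Monthly incomes from Table 4, in rupees
income_rs = {
    "Lecturer": 80_000,
    "Assistant Professor": 120_000,
    "Associate Professor": 160_000,
    "Professor": 320_000,
}

# Same incomes expressed in thousands of rupees (a change of unit only)
income_thousand = {name: value / 1000 for name, value in income_rs.items()}

ratio_rs = income_rs["Professor"] / income_rs["Lecturer"]
ratio_thousand = income_thousand["Professor"] / income_thousand["Lecturer"]
ratio_assoc = income_rs["Professor"] / income_rs["Associate Professor"]

print(f"Professor / Lecturer (rupees):            {ratio_rs}")
print(f"Professor / Lecturer (thousand rupees):   {ratio_thousand}")
print(f"Professor / Associate Professor (rupees): {ratio_assoc}")
```

Because zero income means no income in any unit, the ratios are identical in both units, unlike the temperature case on an interval scale.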