Topic 3: Econometric Modelling

In this topic you will learn;

3.1 Introduction to R Programming
3.2 Econometric Model Definition
3.3 Model Construction Issues
3.4 The Assumptions
3.5 Model Validation and Testing, and
3.6 Model Estimation Procedure.

If you are viewing this on the web, please choose the tabs accordingly;

3.1 Introduction to R Programming

R is a language and environment for statistical computing and graphics. It’s an open source solution to data analysis that is supported by a large and active worldwide research community. Why R?

  1. R is free.
  2. R is a comprehensive statistical platform.
  3. R is a powerful platform for interactive data analysis and exploration.
  4. R functionality can be integrated into applications written in other languages.
  5. R runs on a wide array of platforms (Windows, Unix, and Mac OS X).

i. Working with R

An object is basically anything that can be assigned a value, for example;

Example 1

x <- 3
x
## [1] 3

We assigned the value 3 to an object named x.

Example 2

y <- c(2, 1, 3)
y
## [1] 2 1 3

We created a vector named y containing three values, 2, 1 and 3.

R uses the symbol ‘<-’ for assignment rather than the typical ‘=’. Nevertheless, R allows the ‘=’ symbol to be used for object assignment, but it is not standard syntax, and there are some situations in which it won’t work.
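
One well-known example: inside a function call, ‘=’ names an argument, while ‘<-’ performs assignment;

system.time(z <- rnorm(10))  # works: assigns 10 random numbers to z while timing the expression
system.time(z = rnorm(10))   # error: system.time() has no argument named z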

ii. Workspace

The workspace is our current R working environment and includes any user-defined objects. At the end of an R session, we can save an image of the current workspace that is automatically reloaded the next time R starts.

The current working directory is the directory from which R reads files and to which it saves results by default. We can find out the current working directory by using the getwd() function.

getwd()
## [1] "C:/Users/Asmui/Documents/time series notes"

We can set the current working directory by using the setwd() function or by navigating Session > Set Working Directory > Choose Directory in RStudio tabs. Note that if we need to input (or read) a file that is not in the current working directory, we have to use the full pathname in the call.
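
A short sketch (the paths and file names here are hypothetical; substitute your own);

setwd("C:/Users/YourName/Documents/time series notes")  # set the working directory
mydata <- read.csv("mydata.csv")                        # a file inside the working directory
mydata <- read.csv("D:/archive/mydata.csv")             # full pathname for a file stored elsewhere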

iii. Packages

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored on our computer is called the library.

R comes with a standard set of packages (including base, utils, stats, graphics, and many more). These standard packages are already included in our R installation and do not need to be loaded with the library() function.

In addition, R packages are also available as extensions: there are thousands of user-contributed packages that extend R’s capabilities. These packages are available for download and installation. Once installed, they must be loaded into the session (every new session) in order to be used.

To install a package for the first time;

install.packages ("tseries")

or you can use RStudio tabs; Tools > Install Packages

Load the installed package;

library (tseries)
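
As a sketch, the two steps can be combined so that the package is installed only when it is missing;

if (!requireNamespace("tseries", quietly = TRUE)) {
  install.packages("tseries")  # download and install only if not already installed
}
library(tseries)               # load the package into the current session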

iv. R Data Structures

A scalar: holds only a single atomic value at a time.

h <- 3.5

Vector: one-dimensional arrays that can hold numeric data, character data, or logical data. The combine function c() is used to form vectors.

a <- c(1, 2, 4, -1)
b <- c("blue", "yellow", "green")
c <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

The outputs are given as below;

a
## [1]  1  2  4 -1
b
## [1] "blue"   "yellow" "green"
c
## [1]  TRUE  TRUE FALSE FALSE  TRUE
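
Individual elements of a vector can be selected with square brackets; for example, using the vectors above;

a[3]        # third element of a
## [1] 4
a[c(1, 4)]  # first and fourth elements of a
## [1]  1 -1
b[-1]       # all elements of b except the first
## [1] "yellow" "green"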

Matrix: a two-dimensional array in which each element has the same mode (numeric, character, or logical).

y1 <- matrix (1:20, nrow=5, ncol=4)
y1
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
y2 <- matrix (1:20, nrow=5, ncol=4, byrow=TRUE)
y2
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16
## [5,]   17   18   19   20
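
Elements, rows, and columns of a matrix can be selected with [row, column] indexing; for example;

y1[2, 3]  # element in row 2, column 3
## [1] 12
y1[2, ]   # all of row 2
## [1]  2  7 12 17
y1[, 3]   # all of column 3
## [1] 11 12 13 14 15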

Data frame: more general than a matrix (two-dimensional) in that different columns can contain different modes of data (numeric, character, and so on). It is the most common data structure that we deal with in R.

age <- c(25, 34, 28, 52)
bloodtype <- c("O", "A", "B", "AB")
blood.data <- data.frame (age, bloodtype)
blood.data
##   age bloodtype
## 1  25         O
## 2  34         A
## 3  28         B
## 4  52        AB
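
Columns of a data frame can be selected by name, and rows can be filtered by a logical condition; for example;

blood.data$age  # select the age column
## [1] 25 34 28 52
blood.data[blood.data$age > 30, ]  # rows where age exceeds 30
##   age bloodtype
## 2  34         A
## 4  52        AB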

Time series: represents data that has been sampled chronologically at equal, discrete time intervals. The function ts() is used to create time-series objects.

sales <- c(18, 33, 41, 7, 34, 35, 24, 25, 24, 21, 25, 20, 22, 31, 
           40, 29, 25, 21, 22, 54, 31, 25, 26, 35)
sales.ts <- ts(sales, start=c(2018, 1), frequency=12)
sales.ts
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2018  18  33  41   7  34  35  24  25  24  21  25  20
## 2019  22  31  40  29  25  21  22  54  31  25  26  35

We can plot the time series object, sales.ts;

plot(sales.ts, type="o", pch=19, col="blue")
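
Other basic utilities can be applied directly to the sales.ts object; for example;

start(sales.ts)      # first period of the series
## [1] 2018    1
end(sales.ts)        # last period of the series
## [1] 2019   12
frequency(sales.ts)  # number of observations per year
## [1] 12
window(sales.ts, start=c(2019, 1), end=c(2019, 6))  # subset the series
##      Jan Feb Mar Apr May Jun
## 2019  22  31  40  29  25  21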

Some important functions related to time series in R Programming;

Functions for time-series analysis

Function       Package   Use
ts()           stats     Creates a time-series object.
plot()         graphics  Plots a time series.
start()        stats     Returns the starting time of a time series.
end()          stats     Returns the ending time of a time series.
frequency()    stats     Returns the period of a time series.
window()       stats     Subsets a time-series object.
ma()           forecast  Fits a simple moving-average model.
stl()          stats     Decomposes a time series into seasonal, trend, and irregular components using loess.
monthplot()    stats     Plots the seasonal components of a time series.
seasonplot()   forecast  Generates a season plot.
HoltWinters()  stats     Fits an exponential smoothing model.
forecast()     forecast  Forecasts future values of a time series.
accuracy()     forecast  Reports fit measures for a time-series model.
ets()          forecast  Fits an exponential smoothing model, with the ability to automate model selection.
lag()          stats     Returns a lagged version of a time series.
Acf()          forecast  Estimates the autocorrelation function.
Pacf()         forecast  Estimates the partial autocorrelation function.
diff()         base      Returns lagged and iterated differences.
ndiffs()       forecast  Determines the level of differencing needed to remove trends in a time series.
adf.test()     tseries   Computes an Augmented Dickey-Fuller test that a time series is stationary.
arima()        stats     Fits autoregressive integrated moving-average (ARIMA) models.
Box.test()     stats     Computes a Ljung-Box test that the residuals of a time series are independent.
bds.test()     tseries   Computes the BDS test that a series consists of independent, identically distributed random variables.
auto.arima()   forecast  Automates the selection of an ARIMA model.
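
As a brief sketch (assuming the forecast and tseries packages are installed), a few of these functions applied to the sales.ts object;

library(forecast)
library(tseries)
ndiffs(sales.ts)    # level of differencing needed to remove the trend
Acf(sales.ts)       # plot the sample autocorrelation function
adf.test(sales.ts)  # Augmented Dickey-Fuller test for stationarity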
Reference
  1. Kabacoff, R. (2015). R in Action: Data Analysis and Graphics with R. Manning Publications Co., USA.

3.2 Econometric Model Definition

i. Basic structure of Econometric Model

The variables used may be categorized as;

  1. dependent/endogenous/response variable and

  2. independent/exogenous/explanatory variables.

For example, suppose we try to investigate the factors affecting sales, represented as;

\[y_t=f(x_{1t}, x_{2t}, ..., x_{mt})\tag{1}\] where \(y_t\) is the total sales at time \(t\), and \(x_{1t},x_{2t},x_{3t}, ..., x_{mt}\) are the \(m\) possible factors affecting the total sales at time \(t\), which may include the price of the product, consumers’ income level, interest rate, and so forth.

Equation (1) states that the dependent variable, \(y_t\), is influenced by the factors \(x_{1t},x_{2t},x_{3t}, ..., x_{mt}\), which are defined as independent variables, and the relationship between these variables is established based on historical data.

Dependent Variable;
\(y_t\)

Independent Variable;
\(x_{1t}, x_{2t},x_{3t},...,x_{mt}\)
\(x_{j(t-p)}\) (lag \(p\) of the \(j^{th}\) variable) for \(j=1,2,3,...,m\)
\(y_{t-1},y_{t-2},...,y_{t-p}\), for \(p=1,2,3,...\) (lag of dependent variable used as independent variables)


and it can be put in a compact form; \[ y_t=\beta_0+\sum_{j=1}^m\beta_jx_{jt}+\varepsilon_t \tag{2}\]
A general model with lag variables used as independent variables can be written as;

\[ y_t=\beta_0+\sum_{k=1}^K\beta_kx_{kt}+\sum_{j=1}^{P}\phi_jy_{t-j}+\sum_{k=1}^K\sum_{j=1}^q\omega_{kj}x_{k(t-j)}+\varepsilon_t \tag{3}\] Equation (2) assumes that the relationship between \(y_t\) and the \(x_{jt}\)’s is linear, that the matrix of all \(x_{jt}\) variables is non-stochastic (non-random, with specified fixed values), and that \(y_t\) is a random variable.
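
As a concrete special case, taking \(K=1\), \(P=1\) and \(q=1\) in Equation (3) gives;

\[ y_t=\beta_0+\beta_1x_{1t}+\phi_1y_{t-1}+\omega_{11}x_{1(t-1)}+\varepsilon_t \]

that is, current sales depend on the current value of the factor, last period’s sales, and last period’s factor value.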

ii. The fundamentals of the OLS Technique

To explain the concept, let us consider a regression model with one independent variable, \(x_{1t}\),

\[y_t=\beta_{0}+\beta_1x_{1t}+\varepsilon_t \tag{4}\]

where \(\beta_0\) and \(\beta_1\) are unknown parameters to be estimated, and \(\varepsilon_t\) is identically, independently and normally distributed with mean zero and variance, \(\sigma_{\varepsilon}^2\).

Next, it follows that the estimated regression equation can be written as; \[\hat{y}_t=\hat{\beta}_0+\hat{\beta}_1x_{1t}\tag{5}\] and that \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are unbiased estimators of \(\beta_0\) and \(\beta_1\), respectively. From Equation (4), therefore, we have;

\[ \begin{aligned} e_t&=y_t-\hat{y}_t\\ &=y_t-(\hat{\beta}_0+\hat{\beta}_1x_{1t}) \end{aligned} \] and that, \[ \sum_{t=1}^ne_t^2=\sum_{t=1}^n[y_t-(\hat{\beta}_0+\hat{\beta}_1x_{1t})]^2\tag{6}\] thus, minimising \(\sum_{t=1}^ne_t^2\) is also minimising \(\sum_{t=1}^n[y_t-(\hat{\beta}_0+\hat{\beta}_1x_{1t})]^2\). Further explanation can be found in the textbook, pages 175-178.
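
For reference, minimising Equation (6) by setting its partial derivatives with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\) equal to zero yields the standard closed-form estimators;

\[ \hat{\beta}_1=\frac{\sum_{t=1}^n(x_{1t}-\bar{x}_1)(y_t-\bar{y})}{\sum_{t=1}^n(x_{1t}-\bar{x}_1)^2}, \qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}_1 \]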



The goal of OLS is to select the model parameters (intercept and slopes) that minimise the sum of squared differences between the actual response values and those predicted by the model.

Generally, to properly interpret the coefficients of the OLS model, we must satisfy a number of statistical assumptions, which will be discussed later.

iii. Econometric Model in R Programming

Since we will be formulating models that assume the dependent variable, \(y_t\), is a linear function of a set of independent variables, we will discuss how to fit a multiple linear regression model by using R.

The basic function for fitting a linear model is lm(). The format is

model1 <- lm(formula, data)

where formula describes the model to be fit and data is the data frame containing the data to be used in fitting the model. The resulting object (model1) is a list that contains extensive information about the model. The formula is typically written as; \[Y \sim X_1+X_2+...+X_k \] where the ~ separates the response variable on the left from the predictor variables on the right, and the predictor variables are separated by + signs.

Example using the built-in dataset state.x77. It is always a good idea to examine the relationships among the variables two at a time. Bivariate correlations are provided by the cor() function, and scatter plots are generated by the scatterplotMatrix() function in the car package.

states <- as.data.frame(state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]) 
cor(states)
##                Murder Population Illiteracy     Income      Frost
## Murder      1.0000000  0.3436428  0.7029752 -0.2300776 -0.5388834
## Population  0.3436428  1.0000000  0.1076224  0.2082276 -0.3321525
## Illiteracy  0.7029752  0.1076224  1.0000000 -0.4370752 -0.6719470
## Income     -0.2300776  0.2082276 -0.4370752  1.0000000  0.2262822
## Frost      -0.5388834 -0.3321525 -0.6719470  0.2262822  1.0000000
library(car)
## Loading required package: carData
scatterplotMatrix (states, spread=FALSE, smoother.args=list(lty=2), main="Scatter Plot Matrix")

Next, we will fit the multiple regression model by using lm() function;

fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
summary(fit)
## 
## Call:
## lm(formula = Murder ~ Population + Illiteracy + Income + Frost, 
##     data = states)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7960 -1.6495 -0.0811  1.4815  7.6210 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.235e+00  3.866e+00   0.319   0.7510    
## Population  2.237e-04  9.052e-05   2.471   0.0173 *  
## Illiteracy  4.143e+00  8.744e-01   4.738 2.19e-05 ***
## Income      6.442e-05  6.837e-04   0.094   0.9253    
## Frost       5.813e-04  1.005e-02   0.058   0.9541    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.535 on 45 degrees of freedom
## Multiple R-squared:  0.567,  Adjusted R-squared:  0.5285 
## F-statistic: 14.73 on 4 and 45 DF,  p-value: 9.133e-08
Functions that are useful when fitting linear models

Function        Action
summary()       Displays detailed results for the fitted model
coefficients()  Lists the model parameters (intercept and slopes) for the fitted model
confint()       Provides confidence intervals for the model parameters (95% by default)
fitted()        Lists the predicted values in a fitted model
residuals()     Lists the residual values in a fitted model
anova()         Generates an ANOVA table for a fitted model, or an ANOVA table comparing two or more fitted models
vcov()          Lists the covariance matrix for the model parameters
AIC()           Prints Akaike’s Information Criterion
plot()          Generates diagnostic plots for evaluating the fit of a model
predict()       Uses a fitted model to predict response values for a new dataset
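
As a brief sketch, some of these functions can be applied to the fit object from the example above;

confint(fit)         # 95% confidence intervals for the model parameters
head(fitted(fit))    # first few predicted values
head(residuals(fit)) # first few residuals
AIC(fit)             # Akaike's Information Criterion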
Reference
  1. Kabacoff, R. (2015). R in Action: Data Analysis and Graphics with R. Manning Publications Co., USA.
  2. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

3.3 Model Construction Issues

A good model builder will initially address several important issues prior to actually starting to develop the model. The process of building an econometric model for forecasting purposes is not simply the act of finding the dependent variable to be forecasted and then determining the independent variable(s) to explain it. These are some issues that need to be settled before we construct the model.

Reference
  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

3.4 The Assumptions

Reference
  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

3.5 Model Validation and Testing

i. Introduction

The following examples describe some of the commonly used statistical testing procedures using R Programming.

We will use several data sets as examples for the statistical testing procedures.

library (dynlm)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library (foreign)
library (car)
library (lmtest)

If there is no package called ‘dynlm’, ‘foreign’, ‘car’, and/or ‘lmtest’, install the needed packages (an internet connection is required);

install.packages ('dynlm')
install.packages ('foreign')
install.packages ('car')
install.packages ('lmtest')

The ‘dynlm’ package is needed to incorporate the lag terms in the model.
The ‘foreign’ package is needed to read the data from the URL.
The ‘car’ package provides the vif() function to test for multicollinearity.
The ‘lmtest’ package is needed to perform the Durbin-Watson test.

ii. Model Validation and Testing

data1 <- longley
colnames (longley)
## [1] "GNP.deflator" "GNP"          "Unemployed"   "Armed.Forces" "Population"  
## [6] "Year"         "Employed"
reg1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + Population, data=data1)
summary(reg1)
## 
## Call:
## lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
##     Population, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.55324 -0.36478  0.06106  0.20550  0.93359 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  92.461308  35.169248   2.629   0.0252 *
## GNP.deflator -0.048463   0.132248  -0.366   0.7217  
## GNP           0.072004   0.031734   2.269   0.0467 *
## Unemployed   -0.004039   0.004385  -0.921   0.3788  
## Armed.Forces -0.005605   0.002838  -1.975   0.0765 .
## Population   -0.403509   0.330264  -1.222   0.2498  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4832 on 10 degrees of freedom
## Multiple R-squared:  0.9874, Adjusted R-squared:  0.9811 
## F-statistic: 156.4 on 5 and 10 DF,  p-value: 3.699e-09

1. General Fitness of the Model (F-Statistic)

Based on the output above, the F-statistic is equal to 156.4 and is highly significant (p-value = 3.699e-09 < \(\alpha = 0.05\)). Overall, the estimated model fits the data well.

2. Regression Coefficients (t-statistics)

Next, we carry out individual significance tests on the estimated model by using the p-value of the t-test. The null hypothesis of the t-test is that the individual variable should not be included in the model, i.e., \(\beta=0\).

Based on the output above, only GNP is significant, with p-value = 0.0467, which is less than \(\alpha = 0.05\); the other variables are insignificant at the 5% significance level (we fail to reject the null hypothesis).

3. Goodness of Fit (R-squared and Adjusted R-Squared)

We can measure the goodness of fit of the model by using the R-squared and/or the adjusted R-squared value, in which the value is bounded between 0 and 1.

The R-squared value is interpreted as the proportion of the total variation in \(y\) that is explained by the independent variable(s). In this example, 98.74% of the total variation in Employed is explained by all the independent variables, while the remaining 1.26% is explained by other factors.

It is advisable to evaluate the goodness of fit of the model based on the adjusted R-squared value: the closer it is to 1, the better the fit. For this example, the adjusted \(R^2\) = 0.9811, suggesting the estimated model fits the data well.

Note that, based on the previous t-statistics, there might be only one variable contributing to the high R-squared and adjusted R-squared values, so further investigation is needed when dealing with this validation and testing procedure.

4. Heteroscedasticity

The usual practice in econometric modelling is to assume that the error variance is constant over all times and locations (homoscedasticity).

If we do not have constant variance, then we have heteroscedasticity. If there is such an issue, the parameters obtained by the OLS method are no longer minimum-variance unbiased estimators, and over time the estimates of the dependent variable become less and less predictable.

A common procedure to eliminate the heteroscedasticity problem is to apply a suitable transformation, such as a log transformation (a sketch follows the test output below).

In addition to residual plots, the White test and the Breusch-Pagan test are commonly used to determine the existence of heteroscedasticity.

bptest (reg1)
## 
##  studentized Breusch-Pagan test
## 
## data:  reg1
## BP = 3.9203, df = 5, p-value = 0.5609

Based on the regression model fitted above, the Breusch-Pagan test produces a p-value greater than 0.05; thus we fail to reject the null hypothesis that the variance of the residuals is constant. The model is homoscedastic.
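
Had heteroscedasticity been detected, a minimal sketch of the log-transformation remedy mentioned earlier would be (shown for illustration only, since reg1 is homoscedastic);

reg1.log <- lm(log(Employed) ~ GNP.deflator + GNP + Unemployed + Armed.Forces + Population, data = data1)  # refit with a log-transformed response
bptest(reg1.log)  # re-test the transformed model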

5. Multicollinearity

When the independent variables are related to each other, multicollinearity among the variables is said to exist.

vif (reg1)
## GNP.deflator          GNP   Unemployed Armed.Forces   Population 
##   130.829201   639.049777    10.786858     2.505775   339.011693
tol <- 1/vif(reg1)
Collinearity <- data.frame (VIF = vif (reg1), Tolerance = tol)
Collinearity
##                     VIF   Tolerance
## GNP.deflator 130.829201 0.007643554
## GNP          639.049777 0.001564823
## Unemployed    10.786858 0.092705399
## Armed.Forces   2.505775 0.399078059
## Population   339.011693 0.002949751

Multicollinearity can be detected if;

  1. the largest VIF is greater than 10, or

  2. the tolerance statistic is below 0.1.

Based on the output, there is a serious multicollinearity problem: most of the VIF values are greater than 10 (tolerance below 0.1), suggesting that remedial action is needed to improve the model fitting.

Remedial actions:

  1. If the presence of multicollinearity does not affect forecasting performance, retain the variables.

  2. The usual procedure when multicollinearity exists is to drop the offending variable (see the sketch after this list), or

  3. alternatively, drop the variable that provides the lesser contribution towards model improvement.

  4. Increase the sample size, since a larger data set is presumed to provide more accurate estimates.
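
A minimal sketch of remedial action 2, dropping the variable with the largest VIF (here, GNP) and re-checking;

reg1.b <- lm(Employed ~ GNP.deflator + Unemployed + Armed.Forces + Population, data = data1)  # refit without GNP
vif(reg1.b)  # re-examine; repeat until all VIF values are acceptable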

6. Serial Correlation

Serial correlation is also known as autocorrelation. This is the case when the error terms, \(\varepsilon_t\), corresponding to different periods of time are related to each other.

For this example, we will use the phillips data to show an example of serial correlation;

phillips <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/phillips.dta")
tsdata <- ts(phillips, start=1948)                  # define yearly time series data starting in 1948
reg.s <- dynlm(inf~unem, data=tsdata, end=1996)     # estimate the static Phillips curve
reg.ea <- dynlm(d(inf)~unem, data=tsdata, end=1996) # the same, for the expectations-augmented Phillips curve

Durbin-Watson test for Serial Correlation

dwtest(reg.s)
## 
##  Durbin-Watson test
## 
## data:  reg.s
## DW = 0.8027, p-value = 7.552e-07
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(reg.ea)
## 
##  Durbin-Watson test
## 
## data:  reg.ea
## DW = 1.7696, p-value = 0.1783
## alternative hypothesis: true autocorrelation is greater than 0

The null hypothesis of the Durbin-Watson test is that there is no serial correlation in the model. Since the p-value for reg.s is lower than \(\alpha = 0.05\), serial correlation is present in the model estimated from the static Phillips curve.

However, when the second regression, reg.ea, is estimated, the p-value is greater than \(\alpha = 0.05\), which means the regression has no autocorrelation. If only the Durbin-Watson statistic value is given, we can use the rule of thumb discussed during the class session: values near 2 suggest no serial correlation, while values toward 0 or 4 suggest positive or negative serial correlation, respectively.


Reference
  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).
  2. https://www.dummies.com/education/economics/econometrics/a-graphical-inspection-of-residuals/

3.6 Model Estimation Procedure

i. Loading Required Packages

library (dynlm)

ii. Data Preparation

  1. Download the Example6_3.csv data set.

  2. Read the Example6_3.csv data set into the R console;

example6.3 <- read.csv (file.choose (), header = TRUE)

and choose the Example6_3 data set in your download folder.

Or, copy and paste the data set into your file directory and run this code in the R console;

example6.3 <- read.csv ("Example6_3.csv", header = TRUE)
  1. Since that we will works with time series data, it is best for us to define the data set as time series by using ts () function
example6.3 <- ts (example6.3, start = 1962, frequency = 1)
  1. Check the stucture of the dataset to ensure that R read our data set correctly.
str (example6.3) # check the structure
##  Time-Series [1:36, 1:9] from 1962 to 1997: 1962 1963 1964 1965 1966 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:9] "Year" "Cars" "UnemplRa" "GDP" ...
head (example6.3) # check the first 6 observations
##      Year Cars UnemplRa   GDP Export PopSize AvCarLan PerCapIn  CPI
## [1,] 1962 11.9      7.9 10426   2626     7.4      7.5     1409 31.3
## [2,] 1963 14.1      7.8 13077   2705     8.9      7.2     1469 32.2
## [3,] 1964 16.5      7.8 13932   2781     9.2      7.2     1514 32.1
## [4,] 1965 18.1      7.9 15400   3103     9.4      7.5     1638 32.9
## [5,] 1966 17.6      7.8 16376   3120     9.7      7.5     1688 33.4
## [6,] 1967 16.3      7.8 16612   3723    10.0      7.4     1661 34.8


List all the variable names;

colnames (example6.3)
## [1] "Year"     "Cars"     "UnemplRa" "GDP"      "Export"   "PopSize"  "AvCarLan"
## [8] "PerCapIn" "CPI"
  5. Simple plotting to check the variables’ characteristics over time.

Plot all the variables except the ‘Year’ variable;

plot (example6.3[,-1], main = "Plotting for All Variables over Times (1962-1997)")

Comment: (Answer 1)

iii. General-to-Specific Modelling

We will demonstrate how to perform the model estimation procedure (general-to-specific approach) by using R Programming.

Regression Model 1

reg1 <- dynlm(Cars~UnemplRa+GDP+Export+PopSize+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg1)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ UnemplRa + GDP + Export + PopSize + AvCarLan + 
##     PerCapIn + CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.487  -7.154  -0.404   8.736  42.431 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -68.490721 120.329724  -0.569 0.574111    
## UnemplRa      6.041032   7.192849   0.840 0.408640    
## GDP          -0.005416   0.004463  -1.213 0.235877    
## Export        0.001423   0.001180   1.206 0.238688    
## PopSize      -2.310819  16.809494  -0.137 0.891718    
## AvCarLan     -5.795693   5.671624  -1.022 0.316255    
## PerCapIn      0.106478   0.057699   1.845 0.076396 .  
## CPI          -0.064020   1.438768  -0.044 0.964849    
## L(Cars)       0.932328   0.215747   4.321 0.000201 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.83 on 26 degrees of freedom
## Multiple R-squared:  0.963,  Adjusted R-squared:  0.9517 
## F-statistic: 84.67 on 8 and 26 DF,  p-value: < 2.2e-16

Comment: (Answer 2)

Regression Model 2

reg2 <- dynlm(Cars~UnemplRa+Export+PopSize+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg2)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ UnemplRa + Export + PopSize + AvCarLan + 
##     PerCapIn + CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.352  -8.066   0.741   8.500  44.422 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.345e+01  5.198e+01   1.221 0.232760    
## UnemplRa     6.086e+00  7.255e+00   0.839 0.408962    
## Export       6.707e-05  3.822e-04   0.175 0.862023    
## PopSize     -1.615e+01  1.245e+01  -1.297 0.205589    
## AvCarLan    -6.531e+00  5.688e+00  -1.148 0.261001    
## PerCapIn     7.724e-02  5.288e-02   1.461 0.155663    
## CPI         -7.812e-01  1.323e+00  -0.590 0.559821    
## L(Cars)      8.779e-01  2.129e-01   4.124 0.000318 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.99 on 27 degrees of freedom
## Multiple R-squared:  0.9609, Adjusted R-squared:  0.9508 
## F-statistic: 94.89 on 7 and 27 DF,  p-value: < 2.2e-16

Comment: (Answer 3)

Regression Model 3

reg3 <- dynlm(Cars~UnemplRa+Export+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg3)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ UnemplRa + Export + AvCarLan + PerCapIn + 
##     CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.216  -4.940  -0.819   8.499  53.431 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 47.5385857 51.1243344   0.930 0.360390    
## UnemplRa     0.1373883  5.6903397   0.024 0.980909    
## Export       0.0003430  0.0003214   1.067 0.294909    
## AvCarLan    -6.6973355  5.7557271  -1.164 0.254408    
## PerCapIn     0.0230321  0.0327941   0.702 0.488275    
## CPI         -0.9711533  1.3309798  -0.730 0.471663    
## L(Cars)      0.8708293  0.2153761   4.043 0.000374 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.22 on 28 degrees of freedom
## Multiple R-squared:  0.9585, Adjusted R-squared:  0.9496 
## F-statistic: 107.8 on 6 and 28 DF,  p-value: < 2.2e-16

Comment: (Answer 4)

Regression Model 4

reg4 <- dynlm(Cars~Export+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg4)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ Export + AvCarLan + PerCapIn + CPI + L(Cars), 
##     data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.278  -4.893  -0.847   8.518  53.452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 48.3930460 36.2537524   1.335    0.192    
## Export       0.0003476  0.0002544   1.366    0.182    
## AvCarLan    -6.6159001  4.5828373  -1.444    0.160    
## PerCapIn     0.0227653  0.0303402   0.750    0.459    
## CPI         -0.9679251  1.3012286  -0.744    0.463    
## L(Cars)      0.8670720  0.1463046   5.926 1.95e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.89 on 29 degrees of freedom
## Multiple R-squared:  0.9585, Adjusted R-squared:  0.9514 
## F-statistic:   134 on 5 and 29 DF,  p-value: < 2.2e-16

Comment: (Answer 5)

Regression Model 5

reg5 <- dynlm(Cars~Export+AvCarLan+CPI+L(Cars), data = example6.3)
summary (reg5)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ Export + AvCarLan + CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.970  -6.370   0.206   7.429  56.067 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 66.0337831 27.3952319   2.410  0.02227 *  
## Export       0.0004972  0.0001570   3.167  0.00353 ** 
## AvCarLan    -8.0146917  4.1559314  -1.928  0.06330 .  
## CPI         -0.0268940  0.3443225  -0.078  0.93826    
## L(Cars)      0.9015169  0.1379005   6.537 3.14e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.75 on 30 degrees of freedom
## Multiple R-squared:  0.9577, Adjusted R-squared:  0.9521 
## F-statistic: 169.8 on 4 and 30 DF,  p-value: < 2.2e-16

Comment: (Answer 6)

Regression Model 6

reg6 <- dynlm(Cars~Export+AvCarLan+L(Cars), data = example6.3)
summary (reg6)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ Export + AvCarLan + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.084  -6.157   0.384   7.248  55.736 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 66.1549333 26.9092562   2.458 0.019741 *  
## Export       0.0004911  0.0001340   3.663 0.000922 ***
## AvCarLan    -8.1770225  3.5407782  -2.309 0.027751 *  
## L(Cars)      0.8998982  0.1341310   6.709 1.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.45 on 31 degrees of freedom
## Multiple R-squared:  0.9577, Adjusted R-squared:  0.9536 
## F-statistic: 233.9 on 3 and 31 DF,  p-value: < 2.2e-16

Comment: (Answer 7)
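
As a closing sketch (not part of the original procedure), since reg6 is nested within reg1 over the same sample (1963-1997), an F-test can check that the variables dropped along the way are jointly insignificant, and AIC can compare the fits;

anova(reg6, reg1)  # H0: the coefficients of the dropped variables are jointly zero
AIC(reg1, reg6)    # the lower AIC indicates the preferred model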

Exercise

Comment on each part;

Answer 1:

Answer 2:

Answer 3:

Answer 4:

Answer 5:

Answer 6:

Answer 7:


Reference

  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

Notes compiled by;

Muhammad Asmu’i Abdul Rahim

Email:

Updated on: 11 Nov 2020