Topic 3: Econometric Modelling

In this topic you will learn;

3.1 Introduction to R Programming
3.2 Econometric Model Definition
3.3 Model Construction Issues
3.4 The Assumptions
3.5 Model Validation and Testing, and
3.6 Model Estimation Procedure.

If you are viewing this on the web, please choose the tabs accordingly;

3.1 Introduction to R Programming

R is a language and environment for statistical computing and graphics. It’s an open source solution to data analysis that is supported by a large and active worldwide research community. Why R?

  1. R is free.
  2. R is a comprehensive statistical platform.
  3. R is a powerful platform for interactive data analysis and exploration.
  4. R functionality can be integrated into applications written in other languages.
  5. R runs on a wide array of platforms (Windows, Unix, and Mac OS X).

i. Working with R

An object is basically anything that can be assigned a value, for example;

Example 1

x <- 3
x
## [1] 3

We assigned the value 3 to an object named x.

Example 2

y <- c(2, 1, 3)
y
## [1] 2 1 3

We created a vector named y containing three values, 2, 1 and 3.

R uses the symbol ‘<-’ for assignment rather than the typical ‘=’. Nevertheless, R allows the ‘=’ symbol to be used for object assignment, but it is not standard syntax, and there are some situations in which it won’t work.
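
One well-known example: inside a function call, ‘=’ names an argument, while ‘<-’ performs assignment;

system.time(z <- rnorm(10))  # works: assigns 10 random numbers to z while timing the expression
system.time(z = rnorm(10))   # error: system.time() has no argument named z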

ii. Workspace

The workspace is our current R working environment and includes any user-defined objects. At the end of an R session, we can save an image of the current workspace that is automatically reloaded the next time R starts.

The current working directory is the directory from which R reads files and to which it saves results by default. We can find out the current working directory by using the getwd() function.

getwd()
## [1] "C:/Users/Asmui/Documents/time series notes"

We can set the current working directory by using the setwd() function or by navigating Session > Set Working Directory > Choose Directory in RStudio tabs. Note that if we need to input (or read) a file that is not in the current working directory, we have to use the full pathname in the call.
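
A short sketch (the paths and file names here are hypothetical; substitute your own);

setwd("C:/Users/YourName/Documents/time series notes")  # set the working directory
mydata <- read.csv("mydata.csv")                        # a file inside the working directory
mydata <- read.csv("D:/archive/mydata.csv")             # full pathname for a file stored elsewhere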

iii. Packages

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored on our computer is called the library.

R comes with a standard set of packages (including base, utils, stats, graphics, and many more). These standard packages are already included in our R installation and do not need to be loaded with the library() function.

In addition, R packages are also available as extensions: there are thousands of user-contributed packages that extend R’s capabilities. These packages are available for download and installation. Once installed, they must be loaded into the session (every new session) in order to be used.

To install a package for the first time;

install.packages ("tseries")

or you can use RStudio tabs; Tools > Install Packages

Load the installed package;

library (tseries)
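
As a sketch, the two steps can be combined so that the package is installed only when it is missing;

if (!requireNamespace("tseries", quietly = TRUE)) {
  install.packages("tseries")  # download and install only if not already installed
}
library(tseries)               # load the package into the current session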

iv. R Data Structures

A scalar: holds only a single atomic value at a time.

h <- 3.5

Vector: one-dimensional arrays that can hold numeric data, character data, or logical data. The combine function c() is used to form vectors.

a <- c(1, 2, 4, -1)
b <- c("blue", "yellow", "green")
c <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

The outputs are given as below;

a
## [1]  1  2  4 -1
b
## [1] "blue"   "yellow" "green"
c
## [1]  TRUE  TRUE FALSE FALSE  TRUE
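
Individual elements of a vector can be selected with square brackets; for example, using the vectors above;

a[3]        # third element of a
## [1] 4
a[c(1, 4)]  # first and fourth elements of a
## [1]  1 -1
b[-1]       # all elements of b except the first
## [1] "yellow" "green"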

Matrix: a two-dimensional array in which each element has the same mode (numeric, character, or logical).

y1 <- matrix (1:20, nrow=5, ncol=4)
y1
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
y2 <- matrix (1:20, nrow=5, ncol=4, byrow=TRUE)
y2
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16
## [5,]   17   18   19   20
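
Elements, rows, and columns of a matrix can be selected with [row, column] indexing; for example;

y1[2, 3]  # element in row 2, column 3
## [1] 12
y1[2, ]   # all of row 2
## [1]  2  7 12 17
y1[, 3]   # all of column 3
## [1] 11 12 13 14 15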

Data frame: more general than a matrix (two-dimensional) in that different columns can contain different modes of data (numeric, character, and so on). It is the most common data structure that we deal with in R.

age <- c(25, 34, 28, 52)
bloodtype <- c("O", "A", "B", "AB")
blood.data <- data.frame (age, bloodtype)
blood.data
##   age bloodtype
## 1  25         O
## 2  34         A
## 3  28         B
## 4  52        AB
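
Columns of a data frame can be selected by name, and rows can be filtered by a logical condition; for example;

blood.data$age  # select the age column
## [1] 25 34 28 52
blood.data[blood.data$age > 30, ]  # rows where age exceeds 30
##   age bloodtype
## 2  34         A
## 4  52        AB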

Time series: represents data that has been sampled chronologically at equal, discrete time intervals. The function ts() is used to create time-series objects.

sales <- c(18, 33, 41, 7, 34, 35, 24, 25, 24, 21, 25, 20, 22, 31, 
           40, 29, 25, 21, 22, 54, 31, 25, 26, 35)
sales.ts <- ts(sales, start=c(2018, 1), frequency=12)
sales.ts
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2018  18  33  41   7  34  35  24  25  24  21  25  20
## 2019  22  31  40  29  25  21  22  54  31  25  26  35

We can plot the time series object, sales.ts;

plot(sales.ts, type="o", pch=19, col="blue")
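
Other basic utilities can be applied directly to the sales.ts object; for example;

start(sales.ts)      # first period of the series
## [1] 2018    1
end(sales.ts)        # last period of the series
## [1] 2019   12
frequency(sales.ts)  # number of observations per year
## [1] 12
window(sales.ts, start=c(2019, 1), end=c(2019, 6))  # subset the series
##      Jan Feb Mar Apr May Jun
## 2019  22  31  40  29  25  21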

Some important functions related to time series in R Programming;

Functions for time-series analysis

Function       Package   Use
ts()           stats     Creates a time-series object.
plot()         graphics  Plots a time series.
start()        stats     Returns the starting time of a time series.
end()          stats     Returns the ending time of a time series.
frequency()    stats     Returns the period of a time series.
window()       stats     Subsets a time-series object.
ma()           forecast  Fits a simple moving-average model.
stl()          stats     Decomposes a time series into seasonal, trend, and irregular components using loess.
monthplot()    stats     Plots the seasonal components of a time series.
seasonplot()   forecast  Generates a season plot.
HoltWinters()  stats     Fits an exponential smoothing model.
forecast()     forecast  Forecasts future values of a time series.
accuracy()     forecast  Reports fit measures for a time-series model.
ets()          forecast  Fits an exponential smoothing model, with the ability to automate model selection.
lag()          stats     Returns a lagged version of a time series.
Acf()          forecast  Estimates the autocorrelation function.
Pacf()         forecast  Estimates the partial autocorrelation function.
diff()         base      Returns lagged and iterated differences.
ndiffs()       forecast  Determines the level of differencing needed to remove trends in a time series.
adf.test()     tseries   Computes an Augmented Dickey-Fuller test that a time series is stationary.
arima()        stats     Fits autoregressive integrated moving-average (ARIMA) models.
Box.test()     stats     Computes a Ljung-Box test that the residuals of a time series are independent.
bds.test()     tseries   Computes the BDS test that a series consists of independent, identically distributed random variables.
auto.arima()   forecast  Automates the selection of an ARIMA model.
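
As a brief sketch (assuming the forecast and tseries packages are installed), a few of these functions applied to the sales.ts object;

library(forecast)
library(tseries)
ndiffs(sales.ts)    # level of differencing needed to remove the trend
Acf(sales.ts)       # plot the sample autocorrelation function
adf.test(sales.ts)  # Augmented Dickey-Fuller test for stationarity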
Reference
  1. Kabacoff, R. (2015). R in Action: Data Analysis and Graphics with R. Manning Publications Co., USA.

3.2 Econometric Model Definition

i. Basic structure of Econometric Model

The variables used may be categorized as;

  1. dependent/endogenous/response variable and

  2. independent/exogenous/explanatory variables.

For example, suppose we try to investigate the factors affecting sales, represented as;

\[y_t=f(x_{1t}, x_{2t}, ..., x_{mt})\tag{1}\] where \(y_t\) is the total sales at time \(t\), and \(x_{1t},x_{2t},x_{3t}, ..., x_{mt}\) are the \(m\) possible factors affecting the total sales at time \(t\), which may include the price of the product, consumers’ income level, interest rate, and so forth.

Equation (1) states that the dependent variable, \(y_t\), is influenced by the factors \(x_{1t},x_{2t},x_{3t}, ..., x_{mt}\), which are defined as independent variables, and the relationship between these variables is established based on historical data.

Dependent Variable;
\(y_t\)

Independent Variable;
\(x_{1t}, x_{2t},x_{3t},...,x_{mt}\)
\(x_{j(t-p)}\) (lag \(p\) of the \(j^{th}\) variable) for \(j=1,2,3,...,m\)
\(y_{t-1},y_{t-2},...,y_{t-p}\), for \(p=1,2,3,...\) (lag of dependent variable used as independent variables)


and it can be put in a compact form; \[ y_t=\beta_0+\sum_{j=1}^m\beta_jx_{jt}+\varepsilon_t \tag{2}\]
A general model with lag variables used as independent variables can be written as;

\[ y_t=\beta_0+\sum_{k=1}^K\beta_kx_{kt}+\sum_{j=1}^{P}\phi_jy_{t-j}+\sum_{k=1}^K\sum_{j=1}^q\omega_{kj}x_{k(t-j)}+\varepsilon_t \tag{3}\] Equation (2) assumes that the relationship between \(y_t\) and the \(x_{jt}\)’s is linear, that the matrix of all \(x_{jt}\) variables is non-stochastic (non-random, with specified fixed values), and that \(y_t\) is a random variable.
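
As a concrete special case, taking \(K=1\), \(P=1\) and \(q=1\) in Equation (3) gives;

\[ y_t=\beta_0+\beta_1x_{1t}+\phi_1y_{t-1}+\omega_{11}x_{1(t-1)}+\varepsilon_t \]

that is, current sales depend on the current value of the factor, last period’s sales, and last period’s factor value.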

ii. The fundamentals of the OLS Technique

To explain the concept, let us consider a regression model with one independent variable, \(x_{1t}\),

\[y_t=\beta_{0}+\beta_1x_{1t}+\varepsilon_t \tag{4}\]

where \(\beta_0\) and \(\beta_1\) are unknown parameters to be estimated, and \(\varepsilon_t\) is identically, independently and normally distributed with mean zero and variance, \(\sigma_{\varepsilon}^2\).

Next, it follows that the estimated regression equation can be written as; \[\hat{y}_t=\hat{\beta}_0+\hat{\beta}_1x_{1t}\tag{5}\] and that \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are unbiased estimators of \(\beta_0\) and \(\beta_1\), respectively. From Equation (4), therefore, we have;

\[ \begin{aligned} e_t&=y_t-\hat{y}_t\\ &=y_t-(\hat{\beta}_0+\hat{\beta}_1x_{1t}) \end{aligned} \] and that, \[ \sum_{t=1}^ne_t^2=\sum_{t=1}^n[y_t-(\hat{\beta}_0+\hat{\beta}_1x_{1t})]^2\tag{6}\] thus, minimising \(\sum_{t=1}^ne_t^2\) is also minimising \(\sum_{t=1}^n[y_t-(\hat{\beta}_0+\hat{\beta}_1x_{1t})]^2\). Further explanation can be found in the textbook, pages 175-178.
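
For reference, minimising Equation (6) by setting its partial derivatives with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\) equal to zero yields the standard closed-form estimators;

\[ \hat{\beta}_1=\frac{\sum_{t=1}^n(x_{1t}-\bar{x}_1)(y_t-\bar{y})}{\sum_{t=1}^n(x_{1t}-\bar{x}_1)^2}, \qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}_1 \]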



The goal of OLS is to select the model parameters (intercept and slopes) that minimise the sum of squared differences between the actual response values and those predicted by the model.

Generally, to properly interpret the coefficients of the OLS model, we must satisfy a number of statistical assumptions, which will be discussed later.

iii. Econometric Model in R Programming

Since we will be formulating models that assume the dependent variable, \(y_t\), is a linear function of a set of independent variables, we will discuss how to fit a multiple linear regression model by using R.

The basic function for fitting a linear model is lm(). The format is

model1 <- lm(formula, data)

where formula describes the model to be fit and data is the data frame containing the data to be used in fitting the model. The resulting object (model1) is a list that contains extensive information about the model. The formula is typically written as; \[Y \sim X_1+X_2+...+X_k \] where the ~ separates the response variable on the left from the predictor variables on the right, and the predictor variables are separated by + signs.

Example using the built-in dataset state.x77. It is always a good idea to examine the relationships among the variables two at a time. Bivariate correlations are provided by the cor() function, and scatter plots are generated by the scatterplotMatrix() function in the car package.

states <- as.data.frame(state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]) 
cor(states)
##                Murder Population Illiteracy     Income      Frost
## Murder      1.0000000  0.3436428  0.7029752 -0.2300776 -0.5388834
## Population  0.3436428  1.0000000  0.1076224  0.2082276 -0.3321525
## Illiteracy  0.7029752  0.1076224  1.0000000 -0.4370752 -0.6719470
## Income     -0.2300776  0.2082276 -0.4370752  1.0000000  0.2262822
## Frost      -0.5388834 -0.3321525 -0.6719470  0.2262822  1.0000000
library(car)
## Loading required package: carData
scatterplotMatrix (states, spread=FALSE, smoother.args=list(lty=2), main="Scatter Plot Matrix")

Next, we will fit the multiple regression model by using lm() function;

fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
summary(fit)
## 
## Call:
## lm(formula = Murder ~ Population + Illiteracy + Income + Frost, 
##     data = states)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7960 -1.6495 -0.0811  1.4815  7.6210 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.235e+00  3.866e+00   0.319   0.7510    
## Population  2.237e-04  9.052e-05   2.471   0.0173 *  
## Illiteracy  4.143e+00  8.744e-01   4.738 2.19e-05 ***
## Income      6.442e-05  6.837e-04   0.094   0.9253    
## Frost       5.813e-04  1.005e-02   0.058   0.9541    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.535 on 45 degrees of freedom
## Multiple R-squared:  0.567,  Adjusted R-squared:  0.5285 
## F-statistic: 14.73 on 4 and 45 DF,  p-value: 9.133e-08
Functions that are useful when fitting linear models

Function        Action
summary()       Displays detailed results for the fitted model
coefficients()  Lists the model parameters (intercept and slopes) for the fitted model
confint()       Provides confidence intervals for the model parameters (95% by default)
fitted()        Lists the predicted values in a fitted model
residuals()     Lists the residual values in a fitted model
anova()         Generates an ANOVA table for a fitted model, or an ANOVA table comparing two or more fitted models
vcov()          Lists the covariance matrix for the model parameters
AIC()           Prints Akaike’s Information Criterion
plot()          Generates diagnostic plots for evaluating the fit of a model
predict()       Uses a fitted model to predict response values for a new dataset
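
As a brief sketch, some of these functions can be applied to the fit object from the example above;

confint(fit)         # 95% confidence intervals for the model parameters
head(fitted(fit))    # first few predicted values
head(residuals(fit)) # first few residuals
AIC(fit)             # Akaike's Information Criterion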
Reference
  1. Kabacoff, R. (2015). R in Action: Data Analysis and Graphics with R. Manning Publications Co., USA.
  2. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

3.3 Model Construction Issues

A good model builder will initially address several important issues prior to actually starting to develop the model. The process of building an econometric model for forecasting purposes is not simply the act of finding the dependent variable to be forecasted and then determining the independent variable(s) to explain it. These are some issues that need to be settled before we construct the model.

Reference
  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

3.4 The Assumptions

Reference
  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

3.5 Model Validation and Testing

i. Introduction

The following examples describe some of the commonly used statistical testing procedures using R Programming.

We will use several data sets as examples for the statistical testing procedures.

library (dynlm)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library (foreign)
library (car)
library (lmtest)

If there is no package called ‘dynlm’, ‘foreign’, ‘car’, and/or ‘lmtest’, install the needed packages (an internet connection is required);

install.packages ('dynlm')
install.packages ('foreign')
install.packages ('car')
install.packages ('lmtest')

The ‘dynlm’ package is needed to incorporate the lag terms in the model.
The ‘foreign’ package is needed to read the data from the URL.
The ‘car’ package provides the vif() function to test for multicollinearity.
The ‘lmtest’ package is needed to perform the Durbin-Watson test.

ii. Model Validation and Testing

data1 <- longley
colnames (longley)
## [1] "GNP.deflator" "GNP"          "Unemployed"   "Armed.Forces" "Population"  
## [6] "Year"         "Employed"
reg1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + Population, data=data1)
summary(reg1)
## 
## Call:
## lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
##     Population, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.55324 -0.36478  0.06106  0.20550  0.93359 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  92.461308  35.169248   2.629   0.0252 *
## GNP.deflator -0.048463   0.132248  -0.366   0.7217  
## GNP           0.072004   0.031734   2.269   0.0467 *
## Unemployed   -0.004039   0.004385  -0.921   0.3788  
## Armed.Forces -0.005605   0.002838  -1.975   0.0765 .
## Population   -0.403509   0.330264  -1.222   0.2498  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4832 on 10 degrees of freedom
## Multiple R-squared:  0.9874, Adjusted R-squared:  0.9811 
## F-statistic: 156.4 on 5 and 10 DF,  p-value: 3.699e-09

1. General Fitness of the Model (F-Statistic)

Based on the output above, the F-statistic is equal to 156.4 and is highly significant (p-value = 3.699e-09 < \(\alpha = 0.05\)). Overall, the estimated model fits the data well.

2. Regression Coefficients (t-statistics)

Next, we carry out individual significance tests on the estimated model by using the p-value of the t-test. The null hypothesis of the t-test is that the individual variable should not be included in the model, i.e., \(\beta=0\).

Based on the output above, only GNP is significant, with p-value = 0.0467, which is less than \(\alpha = 0.05\); the other variables are insignificant at the 5% significance level (we fail to reject the null hypothesis).

3. Goodness of Fit (R-squared and Adjusted R-Squared)

We can measure the goodness of fit of the model by using the R-squared and/or the adjusted R-squared value, in which the value is bounded between 0 and 1.

The R-squared value is interpreted as the proportion of the total variation in \(y\) that is explained by the independent variable(s). In this example, 98.74% of the total variation in Employed is explained by all the independent variables, while the remaining 1.26% is explained by other factors.

It is advisable to evaluate the goodness of fit of the model based on the adjusted R-squared value: the closer it is to 1, the better the fit. For this example, the adjusted \(R^2\) = 0.9811, suggesting the estimated model fits the data well.

Note that, based on the previous t-statistics, there might be only one variable contributing to the high R-squared and adjusted R-squared values, so further investigation is needed when dealing with this validation and testing procedure.

4. Heteroscedasticity

The usual practice in econometric modelling is to assume that the error variance is constant over all times and locations (homoscedasticity).

If we do not have constant variance, then we have heteroscedasticity. If there is such an issue, the parameters obtained by the OLS method are no longer minimum-variance unbiased estimators, and over time the estimates of the dependent variable become less and less predictable.

A common procedure to eliminate the heteroscedasticity problem is to apply a suitable transformation, such as a log transformation (a sketch follows the test output below).

In addition to residual plots, the White test and the Breusch-Pagan test are commonly used to determine the existence of heteroscedasticity.

bptest (reg1)
## 
##  studentized Breusch-Pagan test
## 
## data:  reg1
## BP = 3.9203, df = 5, p-value = 0.5609

Based on the regression model fitted above, the Breusch-Pagan test produces a p-value greater than 0.05; thus we fail to reject the null hypothesis that the variance of the residuals is constant. The model is homoscedastic.
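
Had heteroscedasticity been detected, a minimal sketch of the log-transformation remedy mentioned earlier would be (shown for illustration only, since reg1 is homoscedastic);

reg1.log <- lm(log(Employed) ~ GNP.deflator + GNP + Unemployed + Armed.Forces + Population, data = data1)  # refit with a log-transformed response
bptest(reg1.log)  # re-test the transformed model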

5. Multicollinearity

When the independent variables are related to each other, multicollinearity among the variables is said to exist.

vif (reg1)
## GNP.deflator          GNP   Unemployed Armed.Forces   Population 
##   130.829201   639.049777    10.786858     2.505775   339.011693
tol <- 1/vif(reg1)
Collinearity <- data.frame (VIF = vif (reg1), Tolerance = tol)
Collinearity
##                     VIF   Tolerance
## GNP.deflator 130.829201 0.007643554
## GNP          639.049777 0.001564823
## Unemployed    10.786858 0.092705399
## Armed.Forces   2.505775 0.399078059
## Population   339.011693 0.002949751

Multicollinearity can be detected if;

  1. the largest VIF is greater than 10, or

  2. the tolerance statistic is below 0.1.

Based on the output, there is a serious multicollinearity problem: most of the VIF values are greater than 10 (tolerance below 0.1), suggesting that remedial action is needed to improve the model fitting.

Remedial actions:

  1. If the presence of multicollinearity does not affect forecasting performance, retain the variables.

  2. The usual procedure when multicollinearity exists is to drop the offending variable (see the sketch after this list), or

  3. alternatively, drop the variable that provides the lesser contribution towards model improvement.

  4. Increase the sample size, since a larger data set is presumed to provide more accurate estimates.
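
A minimal sketch of remedial action 2, dropping the variable with the largest VIF (here, GNP) and re-checking;

reg1.b <- lm(Employed ~ GNP.deflator + Unemployed + Armed.Forces + Population, data = data1)  # refit without GNP
vif(reg1.b)  # re-examine; repeat until all VIF values are acceptable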

6. Serial Correlation

Serial correlation is also known as autocorrelation. This is the case when the error terms, \(\varepsilon_t\), corresponding to different periods of time are related to each other.

For this example, we will use the phillips data to show an example of serial correlation;

phillips <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/phillips.dta")
tsdata <- ts(phillips, start=1948)                  # define yearly time series data starting in 1948
reg.s <- dynlm(inf~unem, data=tsdata, end=1996)     # estimate the static Phillips curve
reg.ea <- dynlm(d(inf)~unem, data=tsdata, end=1996) # the same, for the expectations-augmented Phillips curve

Durbin-Watson test for Serial Correlation

dwtest(reg.s)
## 
##  Durbin-Watson test
## 
## data:  reg.s
## DW = 0.8027, p-value = 7.552e-07
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(reg.ea)
## 
##  Durbin-Watson test
## 
## data:  reg.ea
## DW = 1.7696, p-value = 0.1783
## alternative hypothesis: true autocorrelation is greater than 0

The null hypothesis of the Durbin-Watson test is that there is no serial correlation in the model. Since the p-value for reg.s is lower than \(\alpha = 0.05\), serial correlation is present in the model estimated from the static Phillips curve.

However, when the second regression, reg.ea, is estimated, the p-value is greater than \(\alpha = 0.05\), which means the regression has no autocorrelation. If only the Durbin-Watson statistic value is given, we can use the rule of thumb discussed during the class session: values near 2 suggest no serial correlation, while values toward 0 or 4 suggest positive or negative serial correlation, respectively.


Reference
  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).
  2. https://www.dummies.com/education/economics/econometrics/a-graphical-inspection-of-residuals/

3.6 Model Estimation Procedure

i. Loading Required Packages

library (dynlm)

ii. Data Preparation

  1. Download the Example6_3.csv data set.

  2. Read the Example6_3.csv data set into the R console;

example6.3 <- read.csv (file.choose (), header = TRUE)

and choose the Example6_3 data set in your download folder.

Or, copy and paste the data set into your file directory and run this code in the R console;

example6.3 <- read.csv ("Example6_3.csv", header = TRUE)
  1. Since that we will works with time series data, it is best for us to define the data set as time series by using ts () function
example6.3 <- ts (example6.3, start = 1962, frequency = 1)
  1. Check the stucture of the dataset to ensure that R read our data set correctly.
str (example6.3) # check the structure
##  Time-Series [1:36, 1:9] from 1962 to 1997: 1962 1963 1964 1965 1966 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:9] "Year" "Cars" "UnemplRa" "GDP" ...
head (example6.3) # check the first 6 observations
##      Year Cars UnemplRa   GDP Export PopSize AvCarLan PerCapIn  CPI
## [1,] 1962 11.9      7.9 10426   2626     7.4      7.5     1409 31.3
## [2,] 1963 14.1      7.8 13077   2705     8.9      7.2     1469 32.2
## [3,] 1964 16.5      7.8 13932   2781     9.2      7.2     1514 32.1
## [4,] 1965 18.1      7.9 15400   3103     9.4      7.5     1638 32.9
## [5,] 1966 17.6      7.8 16376   3120     9.7      7.5     1688 33.4
## [6,] 1967 16.3      7.8 16612   3723    10.0      7.4     1661 34.8


List all the variable names;

colnames (example6.3)
## [1] "Year"     "Cars"     "UnemplRa" "GDP"      "Export"   "PopSize"  "AvCarLan"
## [8] "PerCapIn" "CPI"
  5. Simple plotting to check the variables’ characteristics over time.

Plot all the variables except the ‘Year’ variable;

plot (example6.3[,-1], main = "Plotting for All Variables over Times (1962-1997)")

Comment: (Answer 1)

iii. General-to-Specific Modelling

We will demonstrate how to perform the model estimation procedure (general-to-specific approach) by using R Programming.

Regression Model 1

reg1 <- dynlm(Cars~UnemplRa+GDP+Export+PopSize+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg1)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ UnemplRa + GDP + Export + PopSize + AvCarLan + 
##     PerCapIn + CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -44.487  -7.154  -0.404   8.736  42.431 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -68.490721 120.329724  -0.569 0.574111    
## UnemplRa      6.041032   7.192849   0.840 0.408640    
## GDP          -0.005416   0.004463  -1.213 0.235877    
## Export        0.001423   0.001180   1.206 0.238688    
## PopSize      -2.310819  16.809494  -0.137 0.891718    
## AvCarLan     -5.795693   5.671624  -1.022 0.316255    
## PerCapIn      0.106478   0.057699   1.845 0.076396 .  
## CPI          -0.064020   1.438768  -0.044 0.964849    
## L(Cars)       0.932328   0.215747   4.321 0.000201 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.83 on 26 degrees of freedom
## Multiple R-squared:  0.963,  Adjusted R-squared:  0.9517 
## F-statistic: 84.67 on 8 and 26 DF,  p-value: < 2.2e-16

Comment: (Answer 2)

Regression Model 2

reg2 <- dynlm(Cars~UnemplRa+Export+PopSize+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg2)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ UnemplRa + Export + PopSize + AvCarLan + 
##     PerCapIn + CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.352  -8.066   0.741   8.500  44.422 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.345e+01  5.198e+01   1.221 0.232760    
## UnemplRa     6.086e+00  7.255e+00   0.839 0.408962    
## Export       6.707e-05  3.822e-04   0.175 0.862023    
## PopSize     -1.615e+01  1.245e+01  -1.297 0.205589    
## AvCarLan    -6.531e+00  5.688e+00  -1.148 0.261001    
## PerCapIn     7.724e-02  5.288e-02   1.461 0.155663    
## CPI         -7.812e-01  1.323e+00  -0.590 0.559821    
## L(Cars)      8.779e-01  2.129e-01   4.124 0.000318 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.99 on 27 degrees of freedom
## Multiple R-squared:  0.9609, Adjusted R-squared:  0.9508 
## F-statistic: 94.89 on 7 and 27 DF,  p-value: < 2.2e-16

Comment: (Answer 3)

Regression Model 3

reg3 <- dynlm(Cars~UnemplRa+Export+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg3)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ UnemplRa + Export + AvCarLan + PerCapIn + 
##     CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.216  -4.940  -0.819   8.499  53.431 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 47.5385857 51.1243344   0.930 0.360390    
## UnemplRa     0.1373883  5.6903397   0.024 0.980909    
## Export       0.0003430  0.0003214   1.067 0.294909    
## AvCarLan    -6.6973355  5.7557271  -1.164 0.254408    
## PerCapIn     0.0230321  0.0327941   0.702 0.488275    
## CPI         -0.9711533  1.3309798  -0.730 0.471663    
## L(Cars)      0.8708293  0.2153761   4.043 0.000374 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.22 on 28 degrees of freedom
## Multiple R-squared:  0.9585, Adjusted R-squared:  0.9496 
## F-statistic: 107.8 on 6 and 28 DF,  p-value: < 2.2e-16

Comment: (Answer 4)

Regression Model 4

reg4 <- dynlm(Cars~Export+AvCarLan+PerCapIn+CPI+L(Cars), data = example6.3)
summary (reg4)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ Export + AvCarLan + PerCapIn + CPI + L(Cars), 
##     data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.278  -4.893  -0.847   8.518  53.452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 48.3930460 36.2537524   1.335    0.192    
## Export       0.0003476  0.0002544   1.366    0.182    
## AvCarLan    -6.6159001  4.5828373  -1.444    0.160    
## PerCapIn     0.0227653  0.0303402   0.750    0.459    
## CPI         -0.9679251  1.3012286  -0.744    0.463    
## L(Cars)      0.8670720  0.1463046   5.926 1.95e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.89 on 29 degrees of freedom
## Multiple R-squared:  0.9585, Adjusted R-squared:  0.9514 
## F-statistic:   134 on 5 and 29 DF,  p-value: < 2.2e-16

Comment: (Answer 5)

Regression Model 5

reg5 <- dynlm(Cars~Export+AvCarLan+CPI+L(Cars), data = example6.3)
summary (reg5)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ Export + AvCarLan + CPI + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.970  -6.370   0.206   7.429  56.067 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 66.0337831 27.3952319   2.410  0.02227 *  
## Export       0.0004972  0.0001570   3.167  0.00353 ** 
## AvCarLan    -8.0146917  4.1559314  -1.928  0.06330 .  
## CPI         -0.0268940  0.3443225  -0.078  0.93826    
## L(Cars)      0.9015169  0.1379005   6.537 3.14e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.75 on 30 degrees of freedom
## Multiple R-squared:  0.9577, Adjusted R-squared:  0.9521 
## F-statistic: 169.8 on 4 and 30 DF,  p-value: < 2.2e-16

Comment: (Answer 6)

Regression Model 6

reg6 <- dynlm(Cars~Export+AvCarLan+L(Cars), data = example6.3)
summary (reg6)
## 
## Time series regression with "ts" data:
## Start = 1963, End = 1997
## 
## Call:
## dynlm(formula = Cars ~ Export + AvCarLan + L(Cars), data = example6.3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.084  -6.157   0.384   7.248  55.736 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 66.1549333 26.9092562   2.458 0.019741 *  
## Export       0.0004911  0.0001340   3.663 0.000922 ***
## AvCarLan    -8.1770225  3.5407782  -2.309 0.027751 *  
## L(Cars)      0.8998982  0.1341310   6.709 1.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.45 on 31 degrees of freedom
## Multiple R-squared:  0.9577, Adjusted R-squared:  0.9536 
## F-statistic: 233.9 on 3 and 31 DF,  p-value: < 2.2e-16

Comment: (Answer 7)
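
As a closing sketch (not part of the original procedure), since reg6 is nested within reg1 over the same sample (1963-1997), an F-test can check that the variables dropped along the way are jointly insignificant, and AIC can compare the fits;

anova(reg6, reg1)  # H0: the coefficients of the dropped variables are jointly zero
AIC(reg1, reg6)    # the lower AIC indicates the preferred model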

Exercise

Comment on each part;

Answer 1:

Answer 2:

Answer 3:

Answer 4:

Answer 5:

Answer 6:

Answer 7:


Reference

  1. Mohd. Alias Lazim. (2007). Introductory Business Forecasting: A Practical Approach. University Publication Centre (UPENA).

Notes compiled by;

Muhammad Asmu’i Abdul Rahim

Email:

Updated on: 11 Nov 2020