Ge Chen

May.14th, 2015

RPI

Final

1. Data

(1) Data Selection

The US international goods and services trade dataset combines two sources, one collected from the United States Census Bureau and one from the Federal Reserve. The data can be accessed at the following URLs:

(Capital Export) http://www.census.gov/foreign-trade/statistics/historical/SAEXP.xls

(Capital Import) http://www.census.gov/foreign-trade/statistics/historical/SAIMP.xls

(Capacity Utilization) http://www.federalreserve.gov/datadownload/Build.aspx?rel=G17

(Euro Dollar Rate) http://www.federalreserve.gov/datadownload/Build.aspx?rel=H15

Read in the dataset

#data read in 
rm(list=ls())
Trade.data<-read.csv("~/Desktop/Applied_Regression/Logistic/International_Export_Import.csv")
head(Trade.data,n=14L);

##       Time Export Import TradeBalance CapacityUtilization EuroDollarRate
## 1  1992-03  16054  11365            1             80.2999           4.43
## 2  1992-04  14347  10863            1             80.6967           4.19
## 3  1992-05  13956  10407            1             80.7975           3.99
## 4  1992-06  15698  11533            1             80.5974           4.00
## 5  1992-07  13979  11485            1             81.1304           3.54
## 6  1992-08  13547  11306            1             80.5548           3.43
## 7  1992-09  14606  11680            1             80.5531           3.22
## 8  1992-10  15625  12177            1             80.9928           3.32
## 9  1992-11  14165  11581            1             81.1599           3.70
## 10 1992-12  15794  12419            1             81.0541           3.60
## 11 1993-01  13903  10521            1             81.2884           3.37
## 12 1993-02  13667  10870            1             81.4514           3.24
## 13 1993-03  16619  13334            1             81.2911           3.21
## 14 1993-04  15222  12367            1             81.4254           3.21

(2) Dataset Description

The US international goods and services trade dataset contains 277 monthly observations of 6 variables, covering March 1992 to March 2015.

str(Trade.data)

## 'data.frame':    277 obs. of  6 variables:
##  $ Time               : Factor w/ 277 levels "1992-03","1992-04",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Export             : num  16054 14347 13956 15698 13979 ...
##  $ Import             : num  11365 10863 10407 11533 11485 ...
##  $ TradeBalance       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ CapacityUtilization: num  80.3 80.7 80.8 80.6 81.1 ...
##  $ EuroDollarRate     : num  4.43 4.19 3.99 4 3.54 3.43 3.22 3.32 3.7 3.6 ...

Time: month of the observation, in the format yyyy-mm

Export: total value of capital goods (the materials used for final goods production) exported at Time(i), i = 1, 2, 3, …, 277, in millions

Import: total value of capital goods imported at Time(i), i = 1, 2, 3, …, 277, in millions

TradeBalance: if the international trade account is in deficit (Import > Export), TradeBalance = 0. If the account is in surplus (Export > Import), TradeBalance = 1

CapacityUtilization: the ratio of capacity actually used to installed productive capacity (Wikipedia), in percent

EuroDollarRate: the interest rate on Eurodollar deposits, that is, U.S.-dollar-denominated deposits in foreign banks or foreign branches of American banks (Investopedia), in percent

Variables of Interest

For this analysis, there must be two continuous independent variables and one categorical dependent variable. I am interested in how Capacity Utilization and the Euro Dollar Rate affect the US international trade balance.

Hypothesis

In this case, I want to test whether the US international trade balance can be explained by US Capacity Utilization and the Euro Dollar Rate. This is the alternative hypothesis.

The null hypothesis is that the US international trade balance cannot be explained by US Capacity Utilization and the Euro Dollar Rate. In other words, the trade balance is explained by other variables or is simply random.
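The two hypotheses can be written in symbols. This is a sketch that assumes the standard logistic regression form, with \(p_i\) denoting the probability that TradeBalance equals 1 in month \(i\); the symbols \(\beta_0, \beta_1, \beta_2\) are introduced here for illustration only.

$$
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1\,\mathrm{EuroDollarRate}_i + \beta_2\,\mathrm{CapacityUtilization}_i
$$

$$
H_0:\ \beta_1 = \beta_2 = 0
\qquad \text{versus} \qquad
H_1:\ \beta_1 \neq 0 \ \text{or} \ \beta_2 \neq 0
$$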

2. Model

Model Goals

In this case, I try to find out whether movements in the Euro Dollar Rate and US Capacity Utilization are associated with the US trade balance being in deficit or in surplus.

Model Construction

#construct new dataset containing the data I used
attach(Trade.data)
Trade.subdata <- subset(Trade.data,select = c(TradeBalance,CapacityUtilization,EuroDollarRate))

Step-wise

cor(Trade.subdata)

##                     TradeBalance CapacityUtilization EuroDollarRate
## TradeBalance           1.0000000           0.1636181      0.3659614
## CapacityUtilization    0.1636181           1.0000000      0.7056917
## EuroDollarRate         0.3659614           0.7056917      1.0000000

Following the correlations in the dataset, I will first fit a single independent variable regression of Trade Balance on EuroDollarRate, and then add Capacity Utilization to check whether it contributes a significant explanation of the dependent variable.

Single Independent Variable Model

attach(Trade.subdata)

## The following objects are masked from Trade.data:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance

TradeBalance<-factor(TradeBalance)
model.1IV<-glm(TradeBalance~EuroDollarRate,data = Trade.subdata,family = "binomial")
summary(model.1IV)

## 
## Call:
## glm(formula = TradeBalance ~ EuroDollarRate, family = "binomial", 
##     data = Trade.subdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9888  -0.9737   0.6593   0.9553   1.4344  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.74037    0.23088  -3.207  0.00134 ** 
## EuroDollarRate  0.37559    0.06413   5.857 4.72e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 371.34  on 276  degrees of freedom
## Residual deviance: 332.92  on 275  degrees of freedom
## AIC: 336.92
## 
## Number of Fisher Scoring iterations: 4

Chi Square test for Single IV model

library(aod)
wald.test(b = coef(model.1IV),Sigma = vcov(model.1IV),Terms =2)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 34.3, df = 1, P(> X2) = 4.7e-09
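With a single term, the Wald chi-square statistic is the square of the coefficient's z value, which gives a quick consistency check on the output above. The following is a minimal sketch that uses only the estimate and standard error printed in the summary; the variable names are illustrative and not part of the original analysis.

```r
# Estimate and standard error of EuroDollarRate, copied from summary(model.1IV)
estimate <- 0.37559
std_error <- 0.06413

# Wald z value and the corresponding 1-df chi-square statistic
z_value <- estimate / std_error
wald_chisq <- z_value^2

# Upper-tail p-value from the chi-square distribution with 1 degree of freedom
p_value <- pchisq(wald_chisq, df = 1, lower.tail = FALSE)

print(round(c(z = z_value, X2 = wald_chisq), 2))
print(signif(p_value, 2))
```

The values should agree, up to rounding, with the z value in the coefficient table and the X2 reported by wald.test.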

Two Independent Variables Model

attach(Trade.subdata)

## The following object is masked _by_ .GlobalEnv:
## 
##     TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 4):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.data:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance

model.2IV<-glm(TradeBalance~EuroDollarRate+CapacityUtilization,data = Trade.subdata,family = "binomial")
summary(model.2IV)

## 
## Call:
## glm(formula = TradeBalance ~ EuroDollarRate + CapacityUtilization, 
##     family = "binomial", data = Trade.subdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0897  -0.9395   0.6805   0.8985   1.6211  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          7.57677    3.74835   2.021   0.0432 *  
## EuroDollarRate       0.51834    0.09228   5.617 1.94e-08 ***
## CapacityUtilization -0.11114    0.05000  -2.223   0.0262 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 371.34  on 276  degrees of freedom
## Residual deviance: 327.67  on 274  degrees of freedom
## AIC: 333.67
## 
## Number of Fisher Scoring iterations: 4

Chi Square test for Two Independent Variables Model

wald.test(b = coef(model.2IV),Sigma = vcov(model.2IV),Terms =2:3)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 37.9, df = 2, P(> X2) = 5.8e-09

Describe the Model

The model with two continuous independent variables has the higher Wald chi-square statistic (37.9 against 34.3) and the lower AIC (333.67 against 336.92). The coefficients of both IVs are also significantly different from zero at the 5% level. Hence, I will use the model with two independent variables as my final model.

FinalModel<-glm(TradeBalance~EuroDollarRate+CapacityUtilization,data = Trade.subdata, family = "binomial")
summary(FinalModel)

## 
## Call:
## glm(formula = TradeBalance ~ EuroDollarRate + CapacityUtilization, 
##     family = "binomial", data = Trade.subdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0897  -0.9395   0.6805   0.8985   1.6211  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          7.57677    3.74835   2.021   0.0432 *  
## EuroDollarRate       0.51834    0.09228   5.617 1.94e-08 ***
## CapacityUtilization -0.11114    0.05000  -2.223   0.0262 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 371.34  on 276  degrees of freedom
## Residual deviance: 327.67  on 274  degrees of freedom
## AIC: 333.67
## 
## Number of Fisher Scoring iterations: 4
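Because the one-variable model is nested in the two-variable model, the two can also be compared with a likelihood-ratio test on the residual deviances. The sketch below works only from the deviances and degrees of freedom printed in the two summaries; the variable names are illustrative. With the fitted objects available, anova(model.1IV, model.2IV, test = "Chisq") performs the same comparison.

```r
# Residual deviances and residual degrees of freedom copied from the summaries
dev_1iv <- 332.92; df_1iv <- 275   # TradeBalance ~ EuroDollarRate
dev_2iv <- 327.67; df_2iv <- 274   # TradeBalance ~ EuroDollarRate + CapacityUtilization

# Likelihood-ratio statistic: the drop in deviance from adding CapacityUtilization
lr_stat <- dev_1iv - dev_2iv
lr_df <- df_1iv - df_2iv

# Upper-tail chi-square p-value for the drop in deviance
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(LR = lr_stat, df = lr_df))
print(signif(lr_p, 3))
```

A p-value below 0.05 here would support keeping CapacityUtilization in the model, in line with its Wald z test in the coefficient table.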

3. Plot

Residuals Vs Fitted Value Plot

par(mfrow = c(1,1))
FinalModel.res<-residuals(FinalModel,type = "deviance")
plot(fitted(FinalModel),FinalModel.res,pch=21, cex=1, bg='blue',main="Plot of Fitted Values vs. Residuals ", xlab = "Fitted Values of Model", ylab = "Residuals")
abline(0,0)

Diagnostic Plots

Residuals Plot (Residuals Vs. IVs)

In the plots below, I check whether the residuals are correlated with the independent variables.

Residual Vs. Capacity Utilization

attach(Trade.subdata)

## The following object is masked _by_ .GlobalEnv:
## 
##     TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 3):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 5):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.data:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance

plot(CapacityUtilization,FinalModel.res,pch=21, cex=1, bg='blue',main="Plot of Capacity Utilization vs. Residuals ", xlab = "Capacity Utilization", ylab = "Residuals")
abline(1,0)
abline(-1,0)

Residual Vs. Euro Dollar Rate

plot(EuroDollarRate,FinalModel.res,pch=21, cex=1, bg='blue',main="Plot of EuroDollarRate vs. Residuals ", xlab = "EuroDollarRate", ylab = "Residuals")
abline(1,0)
abline(-1,0)

We can see in the plots that the residuals are apparently correlated with Capacity Utilization when utilization is over 75%. The residuals of the model are also positively correlated with the Euro Dollar Rate, since they keep increasing as the Euro Dollar Rate rises.

Histogram

The residuals shown in the histogram are not normally distributed. The center of the histogram is lower than both the left and right sides. I am curious whether I can separate the dataset into two subsets: one containing the data when Trade Balance equals 1, and the other containing the data when Trade Balance equals 0. The residuals can then be seen as two subsets, one for the prediction of a trade surplus and one for the prediction of a deficit. We then want to check whether the residuals of each prediction are normally distributed.

hist(FinalModel.res,xlab = "Residuals",main = "The Histogram of Standard Residuals of model")

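#predicted probabilities for each observed class
#type = "response" is needed because predict() on a glm returns log-odds by default
fit1<-predict(FinalModel,subset(Trade.subdata,Trade.subdata$TradeBalance==1),type = "response")
fit0<-predict(FinalModel,subset(Trade.subdata,Trade.subdata$TradeBalance==0),type = "response")
#find the residual around 1 and 0
resid1<-fit1-1
resid0<-fit0-0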
hist(resid1, main = "Histogram of Residual When Dependent Variable = 1",xlab = "Residual")

hist(resid0, main = "Histogram of Residual When Dependent Variable = 0",xlab = "Residual")

Apparently, neither of the residual subsets is normally distributed.
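Residuals around the observed values 1 and 0 are easiest to read on the probability scale, and predict() on a glm returns log-odds unless type = "response" is given. As a minimal sketch, the fitted probability for a single month can be reproduced by hand from the coefficients reported in summary(FinalModel) and the first data row (1992-03); the variable names are illustrative.

```r
# Coefficients copied from summary(FinalModel)
b0 <- 7.57677
b_rate <- 0.51834
b_cap <- -0.11114

# First data row (1992-03): EuroDollarRate = 4.43, CapacityUtilization = 80.2999
log_odds <- b0 + b_rate * 4.43 + b_cap * 80.2999

# plogis() is the inverse logit: it maps log-odds to a probability in (0, 1)
prob <- plogis(log_odds)

# Residual around the observed value 1, using the fitted-minus-observed convention
resid_first <- prob - 1

print(round(c(log_odds = log_odds, prob = prob, resid = resid_first), 3))
```

On this scale the residuals for observations with Trade Balance equal to 1 always lie between -1 and 0, and those for observations with Trade Balance equal to 0 lie between 0 and 1.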

Boxplot

boxplot(FinalModel.res,main="Box Plot of the Residuals")

QQPlot

qqnorm(FinalModel.res,main = "QQPlot of the Residual")
qqline(FinalModel.res)

qqnorm(resid1,main="QQplot of the Residual, When Trade Balance =1")
qqline(resid1)

qqnorm(resid0,main = "QQplot of the Residual, When Trade Balance =0")
qqline(resid0)

Standardized Residual Plot

#rstandard() returns standardized deviance residuals for a glm
plot(rstandard(FinalModel), ylab = "Standardized Residual", main = ("Standardized residual plot"))

4. Interpretation

Statistical Analysis

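We can see in the summary that the intercept of the model is 7.58, the slope of Euro Dollar Rate is 0.52 and the slope of Capacity Utilization is -0.11. These coefficients are on the log-odds scale. The intercept shows that when Euro Dollar Rate and Capacity Utilization both equal zero, the log odds of a surplus are 7.58 (odds of roughly 1,950 to 1), which means the US trade balance is predicted to be in surplus; zero capacity utilization lies far outside the observed data, so this is an extrapolation. Increasing the Euro Dollar Rate by 1 unit (1 percentage point) increases the log odds by 0.52, which multiplies the odds of a surplus by exp(0.518) ≈ 1.68, while keeping Capacity Utilization unchanged. In a similar way, increasing Capacity Utilization by 1 unit (1 percentage point) decreases the log odds by 0.11, which multiplies the odds by exp(-0.111) ≈ 0.89.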
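The conversion from log-odds coefficients to odds ratios is a single exponentiation. The sketch below uses the estimates copied from summary(FinalModel); the vector name coefs is illustrative, and with the fitted object available exp(coef(FinalModel)) gives the same result.

```r
# Coefficient estimates (log-odds scale) copied from summary(FinalModel)
coefs <- c("(Intercept)" = 7.57677,
           EuroDollarRate = 0.51834,
           CapacityUtilization = -0.11114)

# Exponentiating turns additive changes in log odds into multiplicative changes in odds
odds_ratios <- exp(coefs)

# Percentage change in the odds of a surplus per one-unit increase in each predictor
pct_change <- 100 * (odds_ratios[-1] - 1)

print(round(odds_ratios, 3))
print(round(pct_change, 1))
```

An odds ratio above 1 means the odds of a surplus rise with the predictor, and one below 1 means they fall.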

In this case, the p-values for the estimates are 0.0432 (intercept), 1.94e-08 (Euro Dollar Rate) and 0.0262 (Capacity Utilization). At the 5% significance level (95% confidence), the null hypothesis that the two independent variables cannot explain the dependent variable is rejected. Hence, we accept the alternative hypothesis that the Euro Dollar Rate and US Capacity Utilization can explain whether the US trade balance is in deficit or surplus.

summary(FinalModel)

## 
## Call:
## glm(formula = TradeBalance ~ EuroDollarRate + CapacityUtilization, 
##     family = "binomial", data = Trade.subdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0897  -0.9395   0.6805   0.8985   1.6211  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          7.57677    3.74835   2.021   0.0432 *  
## EuroDollarRate       0.51834    0.09228   5.617 1.94e-08 ***
## CapacityUtilization -0.11114    0.05000  -2.223   0.0262 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 371.34  on 276  degrees of freedom
## Residual deviance: 327.67  on 274  degrees of freedom
## AIC: 333.67
## 
## Number of Fisher Scoring iterations: 4

LINE analysis

Linearity

From the plot, we can hardly conclude that the residuals satisfy the linearity assumption. In other words, the expected value of the residuals is not zero in this case. The scatter plot of the residuals shows two groups of residuals, each falling around one of the two values of the categorical dependent variable.

par(mfrow = c(1,1))
FinalModel.res<-residuals(FinalModel,type = "deviance")
plot(fitted(FinalModel),FinalModel.res,pch=21, cex=1, bg='blue',main="Plot of Fitted Values vs. Residuals ", xlab = "Fitted Values of Model", ylab = "Residuals")
abline(0,0)

The plot of fitted values vs. residuals does not show a linear pattern, so the expected value of the residuals is not zero.

Independent

par(mfrow = c(1,1))
plot(FinalModel.res,pch=21,cex=1,bg="blue",xlab = "index",ylab = "Residual", main="Residual Value")

The plot does not rule out serial correlation. Autocorrelation appears to exist in the years 2005 to 2009 (roughly index 150 to 200).

Normally Distributed

The residuals are not normally distributed, as shown in the graph. I look into the histograms separated by the value of the dependent variable, when it equals 1 or 0. The two histograms also do not appear normally distributed and show some skewness. The residuals for the dependent variable equal to 1 are closer to a normal distribution than the residuals for the dependent variable equal to 0. (The QQ plots show large deviations at both ends.)

hist(FinalModel.res, main = "Residual Histogram")

#find the residual around 1 and 0
hist(resid1, main = "Histogram of Residual When Dependent Variable = 1",xlab = "Residual")

hist(resid0, main = "Histogram of Residual When Dependent Variable = 0",xlab ="Residual")

qqnorm(resid1,main="QQplot of the Residual, When Trade Balance =1")
qqline(resid1)

qqnorm(resid0,main = "QQplot of the Residual, When Trade Balance =0")
qqline(resid0)

Equal Variance

The residuals of the model appear to be heteroskedastic. Next, I use the Breusch-Pagan test to check whether the residuals are really heteroskedastic.

plot(FinalModel.res,main = "Residual plot")

Breusch - Pagan Test

attach(Trade.subdata)

## The following object is masked _by_ .GlobalEnv:
## 
##     TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 3):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 4):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 6):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.data:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance

library(lmtest)

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

#test TradeBalance ~ EuroDollarRate
bptest(model.1IV)

## 
##  studentized Breusch-Pagan test
## 
## data:  model.1IV
## BP = 3.1605, df = 1, p-value = 0.07544

# test TradeBalance ~ CapacityUtilization
model.1IV2<-glm(TradeBalance~CapacityUtilization,data = Trade.subdata,family = "binomial")
bptest(model.1IV2)

## 
##  studentized Breusch-Pagan test
## 
## data:  model.1IV2
## BP = 65.3676, df = 1, p-value = 6.215e-16

# test TradeBalance ~ EuroDollarRate + CapacityUtilization
bptest(FinalModel)

## 
##  studentized Breusch-Pagan test
## 
## data:  FinalModel
## BP = 4.9134, df = 2, p-value = 0.08572

The null hypothesis of the Breusch-Pagan test is that the data is homoskedastic and the alternative hypothesis is that the data is heteroskedastic. The first test is for Trade Balance explained by the Euro Dollar Rate, the second for Trade Balance explained by Capacity Utilization, and the last for Trade Balance explained by both independent variables. The p-values of the three tests are 0.07544, 6.215e-16 and 0.08572. If we use alpha = 10%, the null hypothesis is rejected in all three tests, which means the data is heteroskedastic.

Four Issues

1.Causality

Neither the Euro Dollar Rate nor Capacity Utilization is an ultimate or proximal cause of a trade deficit or surplus. Too many factors affect fluctuations in the Trade Balance. The Euro Dollar Rate may be a probabilistic cause of the Trade Balance, because there is no clear relationship between the Trade Balance and the Euro Dollar Rate. The relationship between Trade Balance and Capacity Utilization is a little more complicated. Capacity Utilization may be a probabilistic cause, since the Trade Balance can also be affected by other factors. But when domestic capacity utilization increases, an obvious result is that the demand for capital goods rises. Hence, the relationship between Capacity Utilization and Trade Balance may also be deterministic.

2.Sample Size

I used G*Power with the odds ratio (calculated using EuroDollarRate and CapacityUtilization both equal to 1), 0.39 for H0 (from the model), alpha = 0.05 and power = 0.95 to find the appropriate sample size for the model. The recommended sample size is 255, which is close to the 277 observations in the dataset. Hence, I decided to use the original dataset for the model. For comparison, the model is refit below on a random sample of 255 rows.

#form an index array
samplesize<-255
set.seed(88)
samplerow<- nrow(Trade.subdata)
#randomly pick indices from the original set
model.index <- sample(samplerow, samplesize, replace = FALSE)
#construct a new set containing the samples
Trade.sample<-Trade.subdata[model.index,]

attach(Trade.sample)

## The following object is masked _by_ .GlobalEnv:
## 
##     TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 5):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 6):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 7):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 9):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.data:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance

model.sample<-glm(Trade.sample$TradeBalance~Trade.sample$EuroDollarRate+Trade.sample$CapacityUtilization,data =  Trade.sample, family = "binomial")
summary(model.sample)

## 
## Call:
## glm(formula = Trade.sample$TradeBalance ~ Trade.sample$EuroDollarRate + 
##     Trade.sample$CapacityUtilization, family = "binomial", data = Trade.sample)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1166  -0.9681   0.6781   0.8845   1.5637  
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                       7.03602    3.81745   1.843   0.0653 .  
## Trade.sample$EuroDollarRate       0.50610    0.09704   5.215 1.84e-07 ***
## Trade.sample$CapacityUtilization -0.10263    0.05100  -2.012   0.0442 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 335.69  on 254  degrees of freedom
## Residual deviance: 298.24  on 252  degrees of freedom
## AIC: 304.24
## 
## Number of Fisher Scoring iterations: 4

wald.test(b = coef(model.sample),Sigma = vcov(model.sample),Terms =2:3)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 32.8, df = 2, P(> X2) = 7.6e-08

3.Collinearity

I regressed Euro Dollar Rate on Capacity Utilization to check whether collinearity exists between the two independent variables. The R-squared of the linear regression is 0.498, which is not small. Hence, there is some multicollinearity in the data.

attach(Trade.subdata)

## The following object is masked _by_ .GlobalEnv:
## 
##     TradeBalance
## 
## The following objects are masked from Trade.sample:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 6):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 7):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 8):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 10):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.data:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance

Colinear<-lm(EuroDollarRate~CapacityUtilization)
summary(Colinear)

## 
## Call:
## lm(formula = EuroDollarRate ~ CapacityUtilization)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2343 -1.0346  0.1970  0.9844  3.6099 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -28.26228    1.91317  -14.77   <2e-16 ***
## CapacityUtilization   0.39916    0.02417   16.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.526 on 275 degrees of freedom
## Multiple R-squared:  0.498,  Adjusted R-squared:  0.4962 
## F-statistic: 272.8 on 1 and 275 DF,  p-value: < 2.2e-16
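One common way to express this R-squared is as a variance inflation factor (VIF), defined as 1 / (1 - R²) for the regression of one predictor on the others. The sketch below only converts the reported R-squared; the VIF was not part of the original analysis and the variable names are illustrative.

```r
# R-squared from regressing EuroDollarRate on CapacityUtilization (summary(Colinear))
r_squared <- 0.498

# Variance inflation factor: how much the coefficient variance is inflated by collinearity
vif <- 1 / (1 - r_squared)

# With exactly two predictors, R-squared equals the squared pairwise correlation
r_squared_from_cor <- 0.7056917^2

print(round(c(VIF = vif, R2_from_cor = r_squared_from_cor), 3))
```

A VIF of about 2 means the coefficient variances are roughly doubled compared with uncorrelated predictors. Thresholds of 5 or 10 are often quoted as rules of thumb for serious collinearity, so this level is usually read as moderate.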

4.Measurement error

The data should be accurate, since it is collected from official government sites. The only measurement error we may need to think about concerns the Euro Dollar Rate, which is not absolutely determined by the market. Manipulation may distort the reported rate if something like the LIBOR rate-fixing case were to happen to the Euro Dollar Rate.

Interaction Effect

We add an interaction between the two independent variables to the model and find that the effect of the interaction is significant (p = 0.000294). The Wald chi-squared statistic improves from 37.9 to 44.9. What's more, the slope of Capacity Utilization becomes more significantly different from zero (p = 5.34e-05), even at a stricter significance level.

attach(Trade.subdata)

## The following object is masked _by_ .GlobalEnv:
## 
##     TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 3):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.sample:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 7):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 8):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 9):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.subdata (pos = 11):
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance
## 
## The following objects are masked from Trade.data:
## 
##     CapacityUtilization, EuroDollarRate, TradeBalance

model.interact<-glm(TradeBalance~EuroDollarRate+CapacityUtilization+EuroDollarRate*CapacityUtilization,data = Trade.subdata, family = "binomial")
summary(model.interact)

## 
## Call:
## glm(formula = TradeBalance ~ EuroDollarRate + CapacityUtilization + 
##     EuroDollarRate * CapacityUtilization, family = "binomial", 
##     data = Trade.subdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2112  -0.9837   0.5057   0.9955   1.7593  
## 
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)                        28.01647    7.11462   3.938 8.22e-05
## EuroDollarRate                     -7.67700    2.25556  -3.404 0.000665
## CapacityUtilization                -0.37305    0.09233  -4.040 5.34e-05
## EuroDollarRate:CapacityUtilization  0.10266    0.02835   3.621 0.000294
##                                       
## (Intercept)                        ***
## EuroDollarRate                     ***
## CapacityUtilization                ***
## EuroDollarRate:CapacityUtilization ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 371.34  on 276  degrees of freedom
## Residual deviance: 312.97  on 273  degrees of freedom
## AIC: 320.97
## 
## Number of Fisher Scoring iterations: 4

wald.test(b = coef(model.interact),Sigma = vcov(model.interact),Terms =2:4)

## Wald test:
## ----------
## 
## Chi-squared test:
## X2 = 44.9, df = 3, P(> X2) = 9.9e-10
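With an interaction term, the effect of the Euro Dollar Rate on the log odds is no longer a single number: it depends on the level of Capacity Utilization. The sketch below computes that conditional slope from the coefficients printed in summary(model.interact); the function and variable names are illustrative and not part of the original analysis.

```r
# Coefficients copied from summary(model.interact)
b_rate <- -7.67700    # EuroDollarRate main effect
b_inter <- 0.10266    # EuroDollarRate:CapacityUtilization interaction

# Change in log odds per one-unit rise in EuroDollarRate at a given capacity utilization
rate_slope <- function(cap_util) b_rate + b_inter * cap_util

# Capacity utilization at which the EuroDollarRate effect changes sign
cap_zero <- -b_rate / b_inter

print(round(rate_slope(c(70, 75, 80, 85)), 3))
print(round(cap_zero, 2))
```

The negative main effect of EuroDollarRate in the interaction model therefore does not contradict the positive slope in the model without interaction: at a capacity utilization of 80% the conditional slope is positive and close to the earlier estimate.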

5. Conclusion

The Euro Dollar Rate and Capacity Utilization are likely to explain the log odds of the Trade Balance being in surplus rather than deficit. The two independent variables are collinear and their interaction effect is significant. Still, using the two variables together explains the Trade Balance better than a single variable.

Logistic Regression