Data 605 Discussion 11

—– LINEAR REGRESSION SAMPLE ANALYSIS —–

—– Comparing United States Exports to Government Debt from 1989 to 2020 —–

NOTE: This is a sample analysis intended to illustrate the underlying methodology. The dataset comes from real, publicly available data, and the pairing of the two fields is arbitrary. The goal is to evaluate whether there is a possible relationship between them.

Both fields are presented as a percent of GDP.

—– Importing dataset —–

data <- read.csv("C:\\temp_data\\discussion_11.csv", header=TRUE)

The following is a summary of the dataset gathered from publicly available data

https://www.macrotrends.net/

summary(data)

##       year      exports_billions exp_percent_of_gdp govt_debt_percent_of_gdp
##  Min.   :1989   Min.   : 504.3   Min.   : 8.939     Min.   : 33.27          
##  1st Qu.:1997   1st Qu.: 931.6   1st Qu.: 9.708     1st Qu.: 46.57          
##  Median :2004   Median :1239.0   Median :10.667     Median : 55.96          
##  Mean   :2004   Mean   :1449.7   Mean   :10.992     Mean   : 66.28          
##  3rd Qu.:2012   3rd Qu.:2165.9   3rd Qu.:12.282     3rd Qu.: 94.12          
##  Max.   :2020   Max.   :2538.4   Max.   :13.644     Max.   :126.39

The following is the header information of the dataset.

head(data)

##   year exports_billions exp_percent_of_gdp govt_debt_percent_of_gdp
## 1 1989          504.289             8.9388                  39.1284
## 2 1990          551.873             9.2547                  40.9339
## 3 1991          594.931             9.6609                  44.0616
## 4 1992          633.053             9.7089                  46.0501
## 5 1993          654.799             9.5472                  48.2461
## 6 1994          720.937             9.8931                  47.3535

—– Building a linear model for Government Debt as a function of Exports —–

In the following plot, Govt Debt (y-axis) is plotted as a function of Exports (x-axis).

# x = exports, y = govt debt, so the data match the axis labels
plot(data[,"exp_percent_of_gdp"], data[,"govt_debt_percent_of_gdp"], main="US Govt Debt Compared to Exports 1989 - 2020",
xlab="Exports Percent Of GDP", ylab="Govt Debt Percent Of GDP")

Applying the “lm” function to the dataset:

data.lm <- lm(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data=data)

data.lm

## 
## Call:
## lm(formula = govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = data)
## 
## Coefficients:
##        (Intercept)  exp_percent_of_gdp  
##             -64.35               11.88

From the “lm” function above we get the following linear regression equation:

\(\widehat{\text{Debt}} = -64.35 + 11.88 \times \text{Exports}\)

Expressed as a % of GDP
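
To make the equation concrete, the following minimal sketch evaluates it at an export level of 10% of GDP. That value is chosen only for illustration; it lies inside the observed range of 8.939 to 13.644. The helper name `predict_debt` is hypothetical, and the coefficients are the rounded values printed by “lm” above.

```r
# Coefficients copied from the lm() output above (rounded as printed).
intercept <- -64.35
slope <- 11.88

# Hypothetical helper: predicted government debt (% of GDP)
# for a given level of exports (% of GDP).
predict_debt <- function(exports_pct) {
  intercept + slope * exports_pct
}

# Illustrative input only: exports at 10% of GDP.
predicted <- predict_debt(10)
print(predicted)  # about 54.45 (% of GDP)
```

With the fitted object available, `predict(data.lm, newdata = data.frame(exp_percent_of_gdp = 10))` would give the same kind of prediction from the unrounded coefficients. The intercept of -64.35 corresponds to exports of 0% of GDP, far outside the observed range, so it should not be read as a realistic debt level.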

The following plot includes a “fitted” line; “abline” draws it from the slope and intercept of the linear model passed as its argument.

plot(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data=data,xlab="Exports Percent Of GDP", ylab="Govt Debt Percent Of GDP")

abline(data.lm)

—– Evaluating the Quality of the Model —–

Using the “summary” function:

summary(data.lm)

## 
## Call:
## lm(formula = govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.462  -9.196  -2.121   9.834  69.492 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -64.350     25.698  -2.504   0.0179 *  
## exp_percent_of_gdp   11.885      2.318   5.127 1.63e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.8 on 30 degrees of freedom
## Multiple R-squared:  0.467,  Adjusted R-squared:  0.4492 
## F-statistic: 26.28 on 1 and 30 DF,  p-value: 1.63e-05
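
As a rough consistency check on the last lines of this output, the sketch below recomputes two of the reported figures from the others. It assumes 32 observations (30 residual degrees of freedom plus 2 estimated coefficients) and uses the rounded values as printed, so small rounding differences are expected.

```r
# Values copied from the summary(data.lm) output above (rounded as printed).
t_slope <- 5.127
r_squared <- 0.467
n <- 32  # assumed: 30 residual degrees of freedom + 2 coefficients
p <- 1   # number of predictors

# With a single predictor, the F statistic equals the squared slope t value.
f_from_t <- t_slope^2

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(f_from_t, 2))       # close to the reported 26.28
print(round(adj_r_squared, 4))  # close to the reported 0.4492
```

An R-squared of 0.467 means the model accounts for a little under half of the variation in government debt over this period.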

Residuals - The differences between the actual measured values and the corresponding values on the fitted regression line.

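Note that for this model, the 1Q and 3Q values (-9.196 and 9.834) are almost mirror images of each other, with a median of about -2. This symmetry is consistent with a Gaussian distribution. The Max (69.492) is about 2.4 times the magnitude of the Min (-29.462), which hints at a longer right tail, but taken together these values are NOT outside the realm of a Gaussian distribution.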
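
The definition of a residual can be sketched directly. The example below refits the model on only the six rows shown in the “head” output, since the full CSV is not reproduced here, so its coefficients will differ from the full model. The names `sample_rows`, `fit`, and `manual_resid` are illustrative.

```r
# First six rows, copied from the head(data) output above.
sample_rows <- data.frame(
  exp_percent_of_gdp = c(8.9388, 9.2547, 9.6609, 9.7089, 9.5472, 9.8931),
  govt_debt_percent_of_gdp = c(39.1284, 40.9339, 44.0616,
                               46.0501, 48.2461, 47.3535)
)

# Fit on the six-row subset only (not the full 1989 - 2020 dataset).
fit <- lm(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = sample_rows)

# Residual = actual measured value minus the value on the fitted line.
manual_resid <- sample_rows$govt_debt_percent_of_gdp - fitted(fit)

# The manual calculation should agree with resid().
print(all.equal(unname(manual_resid), unname(resid(fit))))

# For a least-squares fit with an intercept, residuals average to zero
# (up to floating-point error), even when the median is not zero.
print(mean(resid(fit)))
```

This is why the residual summary reports a median rather than a mean: the mean is zero by construction, so the median's distance from zero is what carries information about skew.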

Examining the estimated coefficients, which define the fitted regression model:

\(\widehat{\text{Debt}} = -64.35 + 11.88 \times \text{Exports}\)

Expressed as a % of GDP

The Std. Error column shows the statistical standard error for each of the coefficients.

—– Test Statistic: t value —–

In this model the “Exports” coefficient (11.885) is about 5 times larger than its std. error (2.318). A common rule of thumb is that a coefficient should be five to ten times larger than its standard error, so this is interpreted to mean that there is some, but relatively small, variability in the slope estimate.

Using the same idea for the intercept coefficient (|-64.350 / 25.698| ≈ 2.5), we can deduce that there is considerable variability/uncertainty in the intercept estimate.
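
The ratios discussed here are exactly the t values in the “summary” output. As a minimal sketch, the block below recomputes them, and the matching two-sided p-values, from the printed estimates and standard errors, assuming the 30 residual degrees of freedom reported above.

```r
# Estimates and standard errors copied from the summary(data.lm) output.
estimate  <- c(intercept = -64.350, slope = 11.885)
std_error <- c(intercept = 25.698, slope = 2.318)
df_resid  <- 30  # residual degrees of freedom from the summary output

# t value = estimate divided by its standard error.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(signif(p_value, 3))
```

The results should line up with the t value and Pr(>|t|) columns, apart from small differences caused by using rounded inputs.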

In the following plot, we examine the residuals against the fitted values.

Here we can see that the residuals stay within a roughly constant range as we move from left to right.

The residuals are scattered fairly evenly about a level slightly below zero (the residual median is about -2).

The spread is wide, however (the residual standard error is 18.8), so this plot suggests that using exports as the sole predictor in the regression model does not fully explain the data.

plot(fitted(data.lm),resid(data.lm))

Quantile-versus-Quantile Plots

The Q-Q plot gives a visual indication of whether the residuals from the model are normally distributed.

Examining the Normal Q-Q plot, we see that the points at the two ends converge about the reference line. This behavior indicates that the residuals are approximately normally distributed.

The Q-Q plot suggests that both tails of the distribution are what we would expect from a normal distribution, a pattern indicative of a nearly symmetric distribution. This supports the normality assumption behind the t and F statistics of the model; it does not by itself show that Exports alone is sufficient to explain the data.

qqnorm(resid(data.lm))
qqline(resid(data.lm))
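
To show what “qqnorm” actually plots, the sketch below extracts the plotted coordinates without drawing anything. It uses only the six rows from the “head” output, so it illustrates the mechanics rather than the full model; `sample_rows`, `fit`, and the other names are illustrative.

```r
# First six rows, copied from the head(data) output above.
sample_rows <- data.frame(
  exp_percent_of_gdp = c(8.9388, 9.2547, 9.6609, 9.7089, 9.5472, 9.8931),
  govt_debt_percent_of_gdp = c(39.1284, 40.9339, 44.0616,
                               46.0501, 48.2461, 47.3535)
)
fit <- lm(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = sample_rows)
res <- resid(fit)

# plot.it = FALSE returns the coordinates instead of drawing the plot:
# x = theoretical normal quantiles, y = the residuals themselves.
qq <- qqnorm(res, plot.it = FALSE)

# The theoretical quantiles are normal quantiles at evenly spaced
# probability points, one per residual.
theoretical <- qnorm(ppoints(length(res)))

print(all.equal(sort(qq$x), theoretical))
print(data.frame(theoretical = sort(qq$x), residual = sort(qq$y)))
```

If the residuals were exactly normal, the sorted residuals would fall on a straight line against these theoretical quantiles, which is the line that “qqline” adds.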

Four diagnostic plots for the SLR (Simple Linear Regression) model of the Debt - Exports dataset.

They include the Normal Q-Q plot and the Residuals vs Fitted plot created and discussed above.

The “Scale-Location” plot is another way of visualizing the residuals versus the fitted values from the linear regression model: it shows the square root of the absolute standardized residuals, which makes changes in spread easier to see.

par(mfrow=c(2,2))
plot(data.lm)
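
The quantity on the y-axis of the “Scale-Location” panel can be computed by hand. The following sketch does so on the six rows from the “head” output only, so the numbers are illustrative and will not match the full model; all object names are hypothetical.

```r
# First six rows, copied from the head(data) output above.
sample_rows <- data.frame(
  exp_percent_of_gdp = c(8.9388, 9.2547, 9.6609, 9.7089, 9.5472, 9.8931),
  govt_debt_percent_of_gdp = c(39.1284, 40.9339, 44.0616,
                               46.0501, 48.2461, 47.3535)
)
fit <- lm(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = sample_rows)

# Standardized residual = residual / (sigma * sqrt(1 - leverage)).
manual_std <- resid(fit) / (sigma(fit) * sqrt(1 - hatvalues(fit)))
std_resid <- rstandard(fit)
print(all.equal(unname(manual_std), unname(std_resid)))

# Scale-Location plots this value against the fitted values.
scale_location_y <- sqrt(abs(std_resid))
print(data.frame(fitted = fitted(fit), sqrt_abs_std_resid = scale_location_y))
```

A roughly flat trend in these values across the fitted range is what one would hope to see, since it indicates a constant residual spread.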

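The linear model does show a statistically significant positive relationship between US Exports and Govt Debt (slope p-value 1.63e-05, R-squared 0.467), so I would deem this to be an appropriate model for this sample analysis.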

References

https://www.macrotrends.net/

https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/cars

http://www.cs.uni.edu/~jacobson/4772/week11/R_in_Action.pdf

http://www.ievbras.ru/ecostat/Kiril/R/Biblio_N/R_Eng/Maindonald2010.pdf

Tage N Singh

April 06, 2023