—– LINEAR REGRESSION SAMPLE ANALYSIS —–
—– Comparing United States Exports to Government Debt from 1989 to 2020 —–
NOTE: This is a sample analysis intended to help with understanding the underlying methodology. The dataset is built from real, publicly available data, but the pairing of the two fields is arbitrary; the goal is to evaluate whether there is a possible relationship between them.
- Exports and government debt are presented as a percent of GDP -
—– Importing dataset —–
data <- read.csv("C:\\temp_data\\discussion_11.csv", header=TRUE)
The following is a summary of the dataset gathered from publicly available data:
summary(data)
## year exports_billions exp_percent_of_gdp govt_debt_percent_of_gdp
## Min. :1989 Min. : 504.3 Min. : 8.939 Min. : 33.27
## 1st Qu.:1997 1st Qu.: 931.6 1st Qu.: 9.708 1st Qu.: 46.57
## Median :2004 Median :1239.0 Median :10.667 Median : 55.96
## Mean :2004 Mean :1449.7 Mean :10.992 Mean : 66.28
## 3rd Qu.:2012 3rd Qu.:2165.9 3rd Qu.:12.282 3rd Qu.: 94.12
## Max. :2020 Max. :2538.4 Max. :13.644 Max. :126.39
The following are the first six rows of the dataset.
head(data)
## year exports_billions exp_percent_of_gdp govt_debt_percent_of_gdp
## 1 1989 504.289 8.9388 39.1284
## 2 1990 551.873 9.2547 40.9339
## 3 1991 594.931 9.6609 44.0616
## 4 1992 633.053 9.7089 46.0501
## 5 1993 654.799 9.5472 48.2461
## 6 1994 720.937 9.8931 47.3535
—— Building a linear model for Government Debt as a function of Exports —-
In the following plot, Govt Debt (y-axis) is plotted as a function of Exports (x-axis).
# x-axis: Exports, y-axis: Govt Debt (matches the axis labels below)
plot(data[,"exp_percent_of_gdp"], data[,"govt_debt_percent_of_gdp"], main="US Govt Debt Compared to Exports 1989 - 2020",
     xlab="Exports Percent Of GDP", ylab="Govt Debt Percent Of GDP")
Applying the “lm” function to the dataset:
data.lm <- lm(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data=data)
data.lm
##
## Call:
## lm(formula = govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = data)
##
## Coefficients:
## (Intercept) exp_percent_of_gdp
## -64.35 11.88
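With a single predictor, the coefficients reported by “lm” are the ordinary least squares solution, and that solution has a closed form. The sketch below checks this on the six rows printed by head(data), which are typed in by hand so the example runs without the CSV. Because it uses only 6 of the 32 years, its intercept and slope will not match the full-model values of -64.35 and 11.88. The data frame name `six` is an assumption made for illustration.

```r
# Hypothetical illustration: the six rows printed by head(data) above,
# typed in by hand so this runs without the CSV file
six <- data.frame(
  exp_percent_of_gdp       = c(8.9388, 9.2547, 9.6609, 9.7089, 9.5472, 9.8931),
  govt_debt_percent_of_gdp = c(39.1284, 40.9339, 44.0616, 46.0501, 48.2461, 47.3535)
)

x <- six$exp_percent_of_gdp
y <- six$govt_debt_percent_of_gdp

# Closed-form ordinary least squares for one predictor:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() should return the same two numbers, up to floating-point error
fit <- lm(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = six)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```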
From the “lm” function above we get the following linear regression equation, with both variables expressed as a % of GDP:
\(\widehat{\text{Debt}\%} = -64.35 + 11.88 \times \text{Exports}\%\)
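The sketch below shows the equation in use: it plugs a few Exports values into it with the rounded coefficients. The helper `predict_debt` is hypothetical. A least squares line with an intercept always passes through the point of means, so plugging in the mean Exports value from summary(data) (10.992) should give roughly the mean Debt value (66.28). The small gap comes from rounding.

```r
# Coefficients as printed by lm() above, rounded to two decimals
b0 <- -64.35  # intercept
b1 <- 11.88   # slope: change in Debt (% of GDP) per 1-point change in Exports (% of GDP)

# Hypothetical helper: predicted Govt Debt (% of GDP) for Exports (% of GDP)
predict_debt <- function(exports_pct) b0 + b1 * exports_pct

predict_debt(c(9, 11, 13))   # 42.57 66.33 90.09
predict_debt(10.992)         # about 66.2, close to the mean debt of 66.28
```

With the fitted object itself, `predict(data.lm, newdata = data.frame(exp_percent_of_gdp = c(9, 11, 13)))` gives the corresponding predictions from the unrounded coefficients.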
The following plot adds a fitted line drawn from the intercept and slope of the linear model.
plot(govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data=data,xlab="Exports Percent Of GDP", ylab="Govt Debt Percent Of GDP")
abline(data.lm)
—– Evaluating the Quality of the Model —–
Using the “summary” function:
summary(data.lm)
##
## Call:
## lm(formula = govt_debt_percent_of_gdp ~ exp_percent_of_gdp, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.462 -9.196 -2.121 9.834 69.492
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -64.350 25.698 -2.504 0.0179 *
## exp_percent_of_gdp 11.885 2.318 5.127 1.63e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.8 on 30 degrees of freedom
## Multiple R-squared: 0.467, Adjusted R-squared: 0.4492
## F-statistic: 26.28 on 1 and 30 DF, p-value: 1.63e-05
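The last three lines of the output are linked to each other. With one predictor, the F-statistic equals the square of the slope's t value (5.127^2 ≈ 26.28), and R-squared can be recovered from F and the residual degrees of freedom. The sketch below does that arithmetic with the printed values, using n = 32 years (1989 to 2020). It also takes the square root of R-squared, which for a single predictor with a positive slope is the correlation between Exports and Debt.

```r
# Values copied from the summary(data.lm) output above
f_stat <- 26.28   # F-statistic
df_res <- 30      # residual degrees of freedom = n - 2
n      <- 32      # years 1989 to 2020
p      <- 1       # number of predictors

# For simple linear regression: R^2 = F / (F + residual df)
r2 <- f_stat / (f_stat + df_res)

# Adjusted R^2 penalises for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# One predictor, positive slope: correlation r = +sqrt(R^2)
r <- sqrt(r2)

round(c(r2 = r2, adj_r2 = adj_r2, r = r), 3)
```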
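Residuals: the differences between the observed values and the corresponding values on the fitted regression line.
For this model, the 1Q and 3Q values (-9.2 and 9.8) are roughly symmetric about zero, and the median is slightly below zero (about -2). The middle of the distribution is therefore not far from Gaussian. The tails are less balanced: the Max (69.5) is about 2.4 times the magnitude of the Min (-29.5) and about 3.7 residual standard errors above zero, which points to a long right tail or at least one outlying observation.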
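The residual definition can be checked by hand. The sketch below uses the six rows printed by head(data) and the coefficients from summary(data.lm) to compute observed minus fitted Debt for 1989 to 1994. It then computes two ratios from the printed residual summary: the Max relative to |Min|, and the Max in units of the residual standard error. The variable names are illustrative only.

```r
# First six observations from head(data) above
year    <- 1989:1994
exports <- c(8.9388, 9.2547, 9.6609, 9.7089, 9.5472, 9.8931)
debt    <- c(39.1284, 40.9339, 44.0616, 46.0501, 48.2461, 47.3535)

# Fitted values from the reported coefficients
fitted_debt <- -64.350 + 11.885 * exports

# Residual = observed value - value on the fitted line
resid_debt <- debt - fitted_debt
print(data.frame(year, observed = debt,
                 fitted = round(fitted_debt, 2),
                 residual = round(resid_debt, 2)))

# Residual Min/Max and residual standard error from summary(data.lm)
res_min <- -29.462
res_max <- 69.492
rse     <- 18.8
res_max / abs(res_min)  # about 2.36
res_max / rse           # about 3.7
```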
The estimated coefficients give the fitted regression model, with both variables expressed as a % of GDP:
\(\widehat{\text{Debt}\%} = -64.35 + 11.88 \times \text{Exports}\%\)
The Std. Error column shows the standard error of each coefficient estimate. This is a measure of how much that estimate would be expected to vary from sample to sample.
—– Test Statistic (t Value) —–
The t value is the coefficient estimate divided by its standard error. In this model, the Exports coefficient is about 5 times its standard error (t = 5.127, p = 1.63e-05). This indicates that there is relatively little variability in the slope estimate compared with its size, and the slope is clearly different from zero.
Applying the same idea to the intercept gives |-64.35 / 25.698|, or about 2.5, so the intercept estimate is only about 2.5 standard errors from zero. There is considerably more relative uncertainty in the intercept, although it is still significant at the 0.05 level (p = 0.0179).
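The t values and p-values in the coefficient table can be reproduced from the estimates and standard errors alone. The sketch below does this with the printed numbers and the 30 residual degrees of freedom. It also forms 95% confidence intervals as estimate ± (t critical value × standard error). On the fitted object, `confint(data.lm)` computes the same intervals directly.

```r
# Estimates and standard errors copied from summary(data.lm)
est    <- c(intercept = -64.350, exports = 11.885)
se     <- c(intercept = 25.698,  exports = 2.318)
df_res <- 30

# t value = estimate / standard error
t_val <- est / se

# Two-sided p-value from the t distribution with 30 residual df
p_val <- 2 * pt(-abs(t_val), df = df_res)

# 95% confidence intervals: estimate +/- t_crit * SE
t_crit <- qt(0.975, df = df_res)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(round(t_val, 3))
print(signif(p_val, 3))
print(round(ci, 2))
```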
In the following plot, we examine the residuals versus the fitted values.
Here we can see that the residuals stay within a roughly constant band as we move from left to right.
The residuals are scattered fairly evenly about zero (the median residual is about -2).
This plot, together with the R-squared of 0.467, suggests that using exports as the sole predictor in the regression model does not sufficiently or fully explain the data.
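plot(fitted(data.lm), resid(data.lm), xlab="Fitted values", ylab="Residuals")
abline(h=0, lty=2)  # reference line at zero residual
—– Quantile-versus-Quantile Plots —–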
The Q-Q plot gives visual indication of whether the residuals from the model are normally distributed.
Examining the Normal Q-Q plot, we see that the points, including the two ends, stay close to the reference line. This behavior indicates that the residuals are approximately normally distributed.
The Q-Q plot suggests that both tails of the distribution are roughly what we would expect from a normal distribution, which is indicative of a roughly symmetric distribution. This supports the normality assumption of the model. Note that the Q-Q plot checks the shape of the residuals; it does not show whether Exports alone is a sufficient predictor.
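The sketch below shows what qqnorm and qqline compute. It also adds the Shapiro-Wilk test (`shapiro.test` in R's stats package) as a numeric complement to the visual check; this test is not used elsewhere in this analysis. The CSV is not bundled here, so the model is fitted to simulated data with normal errors as a hypothetical stand-in. On the real model you would pass `resid(data.lm)` instead.

```r
set.seed(42)

# Hypothetical stand-in model: simulated data with normal errors
x   <- runif(32, 8.9, 13.6)
y   <- -64 + 12 * x + rnorm(32, sd = 18)
fit <- lm(y ~ x)
r   <- resid(fit)

# qqnorm() pairs each residual with a theoretical normal quantile;
# the set of x-coordinates is exactly qnorm(ppoints(n))
qq <- qqnorm(r, plot.it = FALSE)
stopifnot(isTRUE(all.equal(sort(qq$x), qnorm(ppoints(length(r))))))

# qqline() draws the line through the 1st and 3rd quartile pairs
probs     <- c(0.25, 0.75)
slope     <- diff(quantile(r, probs, names = FALSE)) / diff(qnorm(probs))
intercept <- quantile(r, 0.25, names = FALSE) - slope * qnorm(0.25)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(r)
sw$p.value
```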
qqnorm(resid(data.lm))
qqline(resid(data.lm))
—– Four Diagnostic Plots for the SLR (Simple Linear Regression) Model of the Debt-Exports Dataset —–
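These include the Normal Q-Q plot and the residuals vs fitted values plot, both created and discussed above.
The “Scale-Location” plot is another way of visualizing the residuals versus the fitted values from the linear regression model. It plots the square root of the absolute standardized residuals against the fitted values, so a flat trend suggests constant residual variance.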
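The Scale-Location panel can be rebuilt by hand to show what it displays. Each standardized residual is e_i / (s sqrt(1 - h_i)), where s is the residual standard error and h_i is the leverage. The panel plots the square root of its absolute value against the fitted values. The sketch again uses a simulated stand-in model (hypothetical data) so it runs without the CSV. `plot(data.lm, which = 3)` draws the same panel for the real model.

```r
set.seed(1)

# Hypothetical stand-in model (simulated), since the CSV is not included
x   <- runif(32, 8.9, 13.6)
y   <- -64 + 12 * x + rnorm(32, sd = 18)
fit <- lm(y ~ x)

# Standardized residuals: e_i / (s * sqrt(1 - h_i)), h_i = leverage
s       <- summary(fit)$sigma
h       <- hatvalues(fit)
std_res <- resid(fit) / (s * sqrt(1 - h))
stopifnot(isTRUE(all.equal(std_res, rstandard(fit))))

# Scale-Location by hand: sqrt(|standardized residual|) vs fitted values
plot(fitted(fit), sqrt(abs(std_res)),
     xlab = "Fitted values", ylab = "sqrt(|Standardized residuals|)",
     main = "Scale-Location (by hand)")
# A roughly flat smoothed trend suggests roughly constant residual variance
lines(lowess(fitted(fit), sqrt(abs(std_res))), col = "red")

# The built-in version: which = 3 selects the Scale-Location panel
plot(fit, which = 3)
```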
par(mfrow=c(2,2))
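plot(data.lm)
The linear model does show a statistically significant relationship between US Exports and Govt Debt (slope p-value 1.63e-05). However, Exports alone explain only about 47% of the variation in debt (R-squared = 0.467). I would deem this to be an appropriate model.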