Part I: Creating Dataset

Objective: Create a dataset with 1000 data points, 3 independent continuous variables (predictors) and one continuous dependent variable (outcome). There should be a linear relationship (including some noise) between the predictors and the outcome. At least 2 predictors should have a limited range (e.g. between -10 and 10).

Preparing Workspace

rm(list = ls())
cat("\f") 

Preparing Packages

library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(MASS)

Defining Linear Relationship

#making data set reproduceable
set.seed(7)

#defining sample size
n <- 1000

#defining three independent continuous variables w/ limited range 
x.1 <- runif(n, min = 10, max = 20) #predictor for the natural resource revenues of a country
x.2 <- runif(n, min = 15, max = 30) #predictor for natural resource dependence of a country
x.3 <- runif(n, min = 8, max = 16) #predictor for the natural resource market share of a country

#defining linear relationship
beta.0 <- 4 #intercept, when all independent variables are equal to 0, y = 4
beta.1 <- 3 #for every one unit increase in x.1, y increases 3 units
beta.2 <- .8 #for every one unit increase in x.2, y increases .8 units
beta.3 <- 1.8 #for every one unit increase in x.3, y increases 1.8 units

#incorporating noise 
epsilon <- rnorm(n, mean = 0, sd = 5) 

#defining multiple linear regression equation
y <- beta.0 + beta.1 * x.1 + beta.2 * x.2 + beta.3 * x.3 + epsilon #the outcome variable (y) is the military expenditures of a given country

Background: For this assignment, I chose to create a hypothetical data set containing fictitious country-level data related to a state’s natural resource endowment and military expenditures. The data set contains three independent, continuous variables (predictors): 1) Natural resource revenues (x.1), 2) Natural resource dependence (x.2), 3) Natural resource market share (x.3). The outcome variable (y) is the military expenditures of a given country.

Building Dataset

my.dataset <- data.frame(x.1, x.2, x.3, y)
colnames(my.dataset) <- c("resource.wealth", "resource.dependence", "resource.marketshare", "military.expenditures")
summary(my.dataset)
##  resource.wealth resource.dependence resource.marketshare military.expenditures
##  Min.   :10.01   Min.   :15.04       Min.   : 8.008       Min.   : 55.88       
##  1st Qu.:12.61   1st Qu.:18.68       1st Qu.: 9.792       1st Qu.: 80.16       
##  Median :14.96   Median :22.21       Median :11.858       Median : 88.74       
##  Mean   :15.06   Mean   :22.38       Mean   :11.922       Mean   : 88.63       
##  3rd Qu.:17.63   3rd Qu.:26.16       3rd Qu.:13.935       3rd Qu.: 96.69       
##  Max.   :19.99   Max.   :29.98       Max.   :15.993       Max.   :120.19

Part II: Simple Linear Regression

Objective: Run 3 distinct simple linear regressions, regressing the outcome against each one of the independent variables

Model 1: Natural Resource Wealth

plot(x = my.dataset$resource.wealth, y = my.dataset$military.expenditures,
     main = "Resource Wealth vs. Military Expenditures",
     xlab = "Resource Wealth (GDP per capita)",
     ylab = "Military Expenditures (MM USD)"
     )
abline(lm(my.dataset$military.expenditures ~ my.dataset$resource.wealth), col="red")

mylm.1 <- lm(my.dataset$military.expenditures ~ my.dataset$resource.wealth)
summary(mylm.1)
## 
## Call:
## lm(formula = my.dataset$military.expenditures ~ my.dataset$resource.wealth)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.1540  -4.9548  -0.3493   4.9448  22.6141 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                43.35611    1.26019   34.40   <2e-16 ***
## my.dataset$resource.wealth  3.00583    0.08219   36.57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.452 on 998 degrees of freedom
## Multiple R-squared:  0.5727, Adjusted R-squared:  0.5723 
## F-statistic:  1338 on 1 and 998 DF,  p-value: < 2.2e-16

Interpretation: The first linear regression model estimates the relationship between a country’s natural resource wealth (predictor) and military expenditures (outcome).

The intercept parameter of indicates that the predicted military expenditures of a country are 43.36MM USD when natural resource revenues are 0 (% of GDP per capita) - the result is statistically significant at the .001 level.

The coefficient suggests a 1% GDP per capita increase in natural resource revenues will lead to a 3MM USD increase in military expenditures. The result is statistically significant at the .001 level.

The p-value is statistically significant at the .001 level. In a simplified example, let’s say we are policymakers attempting to determine whether there is a link between a country’s natural resource wealth and military expenditures. Our null hypothesis claims a country’s natural resource wealth has no effect on its military expenditures, while our alternative hypothesis takes the opposite position - natural resource wealth does have an effect on military expenditures. If we are testing this claim at a significance level of .05 (alpha), our p-value of <2.2e-16 indicates we have strong evidence supporting the alternative hypothesis - natural resource wealth does not have “no effect” on a country’s military expenditures.

The high F-statistic indicates the model fits the data well - there is a highly significant relationship between resource wealth and military expenditures.

The adjusted R-squared value of .57 indicates there is a moderate difference between the observed and fitted values - natural resource wealth accounts for a large proportion of variability in military expenditures, however, there is still a large amount of unexplained variability.

par(mfrow=c(2,3))
plot(mylm.1, c(1:6))

Interpretation:

Model 2: Natural Resource Dependence

plot(x = my.dataset$resource.dependence, y = my.dataset$military.expenditures,
     main = "Resource Dependence vs. Military Expenditures",
     xlab = "Resource Dependence (% of GDP)",
     ylab = "Military Expenditures (MM USD)"
     )
abline(lm(my.dataset$military.expenditures ~ my.dataset$resource.dependence), col="red")

mylm.2 <- lm(my.dataset$military.expenditures ~ my.dataset$resource.dependence)
summary(mylm.2)
## 
## Call:
## lm(formula = my.dataset$military.expenditures ~ my.dataset$resource.dependence)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.6105  -8.0119   0.1986   7.8615  26.2196 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     71.0730     1.7967  39.558   <2e-16 ***
## my.dataset$resource.dependence   0.7846     0.0788   9.957   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.87 on 998 degrees of freedom
## Multiple R-squared:  0.09036,    Adjusted R-squared:  0.08945 
## F-statistic: 99.14 on 1 and 998 DF,  p-value: < 2.2e-16

Interpretation: The second linear regression model estimates the relationship between a country’s natural resource dependence and military expenditures.

The intercept parameter of indicates that the predicted military expenditures of a country is 71.07 MM USD when natural resource dependence is 0 units (% of GDP) - the result is statistically significant at the .001 level.

The coefficient suggests a 1% GDP increase in natural resource dependence will lead to a .78 M USD increase in military expenditures. The result is statistically significant at the .001 level.

The p-value is statistically significant at the .001 level. Using a similar hypothesis test to the one used for the first regression (the null states natural resource dependence has no effect on military expenditures), at a significance level of .05 (alpha), our p-value of <2.2e-16 indicates we have strong evidence supporting the alternative hypothesis - natural resource dependence does not have “no effect” on a country’s military expenditures.

The high F-statistic indicates the model fits the data well - there is a highly significant relationship between resource dependence and military expenditures.

The adjusted R-squared value of .09 indicates the model has very little explanatory power (lowest result of the three regressions). The result does raise questions around fit considering a significant portion of variability remains unexplained.

par(mfrow=c(2,3))
plot(mylm.2, c(1:6))

Interpretation:

Model 3: Natural Resource Market Share

plot(x = my.dataset$resource.marketshare, y = my.dataset$military.expenditures,
     main = "Resource Market Share vs. Military Expenditures",
     xlab = "Resource Market Share (production volume as a percentage of global production)",
     ylab = "Military Expenditures (MM USD)"
     )
abline(lm(my.dataset$military.expenditures ~ my.dataset$resource.marketshare), col="red")

mylm.3 <- lm(my.dataset$military.expenditures ~ my.dataset$resource.marketshare)
summary(mylm.3)
## 
## Call:
## lm(formula = my.dataset$military.expenditures ~ my.dataset$resource.marketshare)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.734  -6.967  -0.191   7.238  27.426 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      62.3756     1.6459   37.90   <2e-16 ***
## my.dataset$resource.marketshare   2.2024     0.1354   16.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 998 degrees of freedom
## Multiple R-squared:  0.2095, Adjusted R-squared:  0.2087 
## F-statistic: 264.5 on 1 and 998 DF,  p-value: < 2.2e-16

Interpretation: The third linear regression model estimates the relationship between a country’s natural resource market share and military expenditures.

The intercept parameter indicates that the predicted military expenditures of a country is 62.38MM USD when a country’s natural resource market share is 0 (total volume as a % of global production) - the result is statistically significant at the .001 level.

The coefficient suggests a one-unit increase (total volume as a % of global production) in natural resource market share will lead to a 2.2 unit increase in military expenditures. The result is statistically significant at the .001 level.

The p-value is statistically significant at the .001 level, which would lead us to reject the null hypothesis - at a significance level of .05 (alpha), our p-value of <2.2e-16 indicates we have strong evidence supporting the alternative hypothesis - natural resource market share does not have “no effect” on a country’s military expenditures.

The high F-statistic indicates the model fits the data well - there is a highly significant relationship between resource market share and military expenditures.

The adjusted R-squared value of .21 indicates the model’s explanatory power falls in between the first two regressions - natural resource market share explains a smaller proportion of variability in the model than natural resource wealth, and a greater amount of variability than natural resource dependence. A large amount of variability remains unexplained which raises questions around fit.

par(mfrow=c(2,3))
plot(mylm.3, c(1:6))

Interpretation: