A Basic Overview of Simple Linear Regression

Preface

The following project is one I did to learn more about both R and fundamental ideas within statistics. If any of my information is incorrect, please forgive me: I am still working towards a basic understanding of these concepts and their implementation.

The data were exported from the United Nations Environment Programme International Resource Panel Global Material Flows Database, using the 13-category option.

Introduction

Linear regression is a model that can describe the relationship between a dependent variable and at least one explanatory variable.

This makes it possible to investigate statistically whether a relationship exists between these variables, how strong it is, and whether it might be causal, although a regression on its own shows association rather than causation.

This method of statistical analysis has found use across many disciplines of the natural sciences, medicine, and the social sciences, including but not limited to:

Describing material properties in physics and engineering
Medical research studies
Econometric and financial investigation
Agricultural efficiency monitoring and development

Simple Linear Regression

Simple linear regression is the subset of general linear regression dealing with a single dependent variable and a single explanatory variable. As its name suggests, it is in many ways the simplest and most straightforward implementation of linear regression.

Mathematically, for explanatory variable x and dependent variable y:

\(y = \beta_0 + \beta_1 x + \epsilon\)

where \(\beta_0\) is the intercept (a constant), \(\beta_1\) is the regression coefficient (the slope), and \(\epsilon\) is the error term.

With simple linear regression, ordinary least squares (OLS) is generally used to fit the regression line. For each point in the data set, the vertical distance between the observed value and the value predicted by the line (the residual) is squared. OLS then chooses the \(\beta_0\) and \(\beta_1\) that make the sum of these squared distances, \(\sum_i (y_i - \beta_0 - \beta_1 x_i)^2\), as small as possible.
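As a rough illustration of the least-squares idea, the sketch below fits a line to simulated data in two ways: with the standard closed-form OLS formulas and with R's built-in `lm()`. The data, the "true" values 2 and 0.5, and the helper `sse()` are made up for this example and are not part of the project's data.

```r
# Simulated data: y depends linearly on x, plus random noise (invented values).
set.seed(42)
x <- seq(1, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)

# Closed-form OLS estimates for a single explanatory variable:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Hypothetical helper: sum of squared vertical distances for a candidate line.
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

# lm() fits the same model by least squares.
fit <- lm(y ~ x)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```

The two sets of estimates should agree, and nudging either coefficient away from the OLS values (for example `sse(b0 + 0.1, b1)`) gives a larger sum of squares than `sse(b0, b1)`.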

Simple Linear Regression Graph 1

The graph on the next slide is an example plot of a simple linear regression describing the relationship between US coal exports and physical trade balance. Physical trade balance is an economic measure describing the net flow of materials across a nation’s borders.

In material flow accounting it is defined as physical imports minus physical exports, the reverse of the monetary trade balance:

\(\text{Physical Trade Balance} = \text{Imports} - \text{Exports}\)

This code prepares the data for the first plot:

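library(dplyr)  # filter(), select(), %>%
library(tidyr)  # pivot_longer(), pivot_wider()

# Keep the coal rows for the two flows being compared
simpleLR <- usa %>%
  filter(Category == "Coal",
         Flow.name %in% c("Exports", "Physical Trade Balance")) %>%
  select(-Category) %>%
  # Reshape the year columns into one row per flow and year
  pivot_longer(
    cols = "1970":"2024",
    names_to = "year",
    values_to = "value") %>%
  # Give each flow its own column, leaving one row per year
  pivot_wider(names_from = Flow.name, values_from = value)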
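The plotting step itself is not shown, so here is a minimal sketch of how a scatter plot with a fitted regression line could be drawn with ggplot2. It assumes a data frame with `Exports` and `Physical Trade Balance` columns, like the one built above, and that exports go on the x-axis. The `toy_coal` values are invented placeholders, not figures from the database, and the sign and size of the trend in them are arbitrary.

```r
library(ggplot2)

# Invented stand-in data: NOT values from the Global Material Flows Database.
set.seed(1)
toy_coal <- data.frame(Exports = seq(40, 100, length.out = 20))
toy_coal[["Physical Trade Balance"]] <-
  -5 - 0.9 * toy_coal$Exports + rnorm(20, sd = 4)

# Scatter plot with a straight line fitted by least squares (method = "lm").
p <- ggplot(toy_coal, aes(x = Exports, y = `Physical Trade Balance`)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Toy example: exports vs. physical trade balance",
       x = "Exports", y = "Physical trade balance")

print(p)
```

Swapping `toy_coal` for the real data frame should give the corresponding plot, provided the column names match.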

Simple Linear Regression Graph 2

This code prepares the data for the plot on the following page. The graph shows the relationship between Domestic Material Input (the total amount of material entering a country's economy, that is, domestic extraction plus imports) and Domestic Extraction (how much of a given material a country extracts within its own borders). For petroleum, the two appear to have a negative relationship over the years 1970 to 2012, which suggests that as domestic extraction increases, domestic material input may decrease.

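# Keep the petroleum rows for the two flows being compared
simpleLR2 <- usa %>%
  filter(Category == "Petroleum",
         Flow.name %in% c("Domestic Extraction", "Domestic Material Input")) %>%
  # Drop the category label and the years after 2012
  select(-Category, -("2013":"2024")) %>%
  # Reshape the year columns into one row per flow and year
  pivot_longer(
    cols = "1970":"2012",
    names_to = "year",
    values_to = "value") %>%
  # Give each flow its own column, leaving one row per year
  pivot_wider(names_from = Flow.name, values_from = value)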
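To make the reshaping steps easier to follow, the sketch below runs the same kind of pipeline on a tiny made-up table and then fits a regression to check the sign of the slope. The `toy` table only imitates the assumed layout of `usa` (a `Category` column, a `Flow.name` column, and one column per year). Its numbers are invented, and `matches()` is used here simply as one way to pick out the year columns.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented stand-in for the `usa` table: values are for illustration only.
toy <- tibble(
  Category  = "Petroleum",
  Flow.name = c("Domestic Extraction", "Domestic Material Input"),
  `1970` = c(10, 30),
  `1971` = c(12, 29),
  `1972` = c(15, 27),
  `1973` = c(16, 25)
)

toy_long <- toy %>%
  select(-Category) %>%
  # One row per flow and year
  pivot_longer(cols = matches("^[0-9]{4}$"),
               names_to = "year", values_to = "value") %>%
  # One row per year, one column per flow
  pivot_wider(names_from = Flow.name, values_from = value) %>%
  # pivot_longer() stores the old column names as text, so convert the year
  mutate(year = as.integer(year))

# Regress domestic material input on domestic extraction.
fit2 <- lm(`Domestic Material Input` ~ `Domestic Extraction`, data = toy_long)
slope2 <- unname(coef(fit2)[2])

print(toy_long)
print(slope2)  # a negative slope corresponds to a downward-sloping line
```

With the real data, `summary(fit2)` would also report the standard error and p-value of the slope, which helps judge whether an apparent negative relationship is more than noise.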