Preface

Introduction

Linear regression is a model that can describe the relationship between a dependent variable and at least one explanatory variable.

This allows statistical investigation of a possible relationship between these variables (if any exists) and of how strong it is. Regression on its own measures association; establishing that a relationship is causal requires further evidence.

This method of statistical analysis has found use across many disciplines of the natural sciences, medicine, and the social sciences, including but not limited to:

  • Describing material properties in physics and engineering
  • Medical research studies
  • Econometric and financial investigation
  • Agricultural efficiency monitoring and development

Simple Linear Regression

Simple linear regression is the subset of general linear regression dealing with a single dependent variable and a single explanatory variable. As its name suggests, it is in many ways the simplest and most straightforward implementation of linear regression.

Mathematically, for explanatory variable \(x\) and dependent variable \(y\):

\(y = \beta_0 + \beta_1x + \epsilon\)

where \(\beta_0\) is a constant (the intercept), \(\beta_1\) is the regression coefficient (the slope), and \(\epsilon\) is the error term.

With simple linear regression, ordinary least squares (OLS) is generally used to estimate the regression line. For each point in the data set, the vertical distance between the observed value and the value predicted by the line (the residual) is squared. OLS chooses the line for which the sum of these squared distances is smallest.
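As a minimal sketch of the least-squares idea, the R code below simulates data from the model above, computes the slope and intercept from the closed-form OLS formulas \(\hat\beta_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2\) and \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\), and compares them with the result of R's built-in lm(). The simulated data, the parameter values, and the helper name ssr are illustrative assumptions and are not part of the analysis in these slides.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon (parameter values are made up)
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

# Closed-form OLS estimates for a single explanatory variable
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Sum of squared vertical distances (residuals) for a candidate line
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# lm() fits the same model by ordinary least squares
fit <- lm(y ~ x)

print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))

# Any other line, such as one with a shifted intercept, has a larger sum of squares
print(ssr(beta0_hat, beta1_hat) < ssr(beta0_hat + 0.1, beta1_hat))
```

The two printed coefficient vectors should agree up to floating-point error, since lm() minimizes the same sum of squares.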

Simple Linear Regression Graph 1

The graph on the next slide is an example plot of a simple linear regression describing the relationship between US coal exports and physical trade balance. Physical trade balance is an economic measure describing the net flow of materials across a nation’s borders.

In material flow accounting it is defined as

\(\text{Physical Trade Balance} = \text{Imports} - \text{Exports}\)

which is the reverse of the monetary trade balance (exports minus imports).

This code prepares the data for the first plot:

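library(dplyr)
library(tidyr)

# `usa` is assumed to be loaded already, with one column per year.
# Keep the two coal flows of interest, then reshape so that each row is one
# year with Exports and Physical Trade Balance as separate columns.
simpleLR = usa %>%
  filter(Category == "Coal",
         Flow.name %in% c("Exports", "Physical Trade Balance")) %>%
  select(-"Category") %>%
  pivot_longer(
    cols = "1970":"2024",
    names_to = "year",
    values_to = "value") %>%
  pivot_wider(names_from = Flow.name, values_from = value)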
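The pipeline above only reshapes the data; fitting and plotting would follow it. A minimal sketch of that step in base R is shown below, assuming the reshaped table has one row per year with columns named Exports and Physical Trade Balance. The toy_lr data frame, its values, and the direction of the relationship in it are invented placeholders so the example runs on its own; they are not the coal data.

```r
# Toy stand-in for the reshaped table (all values are invented, not real data)
toy_lr <- data.frame(
  year = as.character(1970:1977),
  Exports = c(10, 12, 15, 14, 18, 21, 20, 25),
  `Physical Trade Balance` = c(-8, -11, -13, -12, -17, -19, -18, -24),
  check.names = FALSE  # keep the spaces in the column name
)

# Column names that contain spaces need backticks inside a formula
fit1 <- lm(`Physical Trade Balance` ~ Exports, data = toy_lr)
print(coef(fit1))

# Scatter plot of the data with the fitted regression line drawn over it
plot(toy_lr$Exports, toy_lr$`Physical Trade Balance`,
     xlab = "Exports", ylab = "Physical Trade Balance")
abline(fit1)
```

With the real data, toy_lr would be replaced by simpleLR. The same fit could also be drawn with a plotting package, but base graphics keep the sketch free of extra dependencies.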

Simple Linear Regression Graph 2

This code prepares the data for the plot on the following page. The graph shows the relationship between Domestic Material Input (the total amount of material entering a country's economy, that is, domestic extraction plus imports) and Domestic Extraction (how much of a given material a country produces). Between 1970 and 2012 the two appear to have a negative relationship, indicating the possibility that as domestic extraction increases, domestic material input decreases.

simpleLR2 = usa %>% 
  filter(Category == "Petroleum", 
         Flow.name %in% c("Domestic Extraction", "Domestic Material Input")) %>%
  # Drop Category and the 2013-2024 columns, which leaves the years 1970-2012
  select(-"Category", -("2013":"2024")) %>%
  pivot_longer(
    cols = "1970":"2012",
    names_to = "year",
    values_to = "value") %>%
  pivot_wider(names_from = Flow.name, values_from = value)
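Whether the relationship is negative can be checked from the sign of the fitted slope and its confidence interval, rather than from the plot alone. The sketch below does this in base R on a toy stand-in for simpleLR2. The toy_lr2 data frame and all of its values are invented for illustration and are not the petroleum data.

```r
# Toy stand-in for the reshaped table (all values are invented, not real data)
toy_lr2 <- data.frame(
  year = as.character(1970:1979),
  `Domestic Extraction` = c(50, 48, 47, 45, 44, 41, 40, 38, 37, 35),
  `Domestic Material Input` = c(60, 63, 62, 66, 68, 70, 73, 72, 76, 78),
  check.names = FALSE  # keep the spaces in the column names
)

# Regress Domestic Material Input on Domestic Extraction
fit2 <- lm(`Domestic Material Input` ~ `Domestic Extraction`, data = toy_lr2)

# The second coefficient is the slope; a negative value means a negative relationship
slope <- coef(fit2)[2]
# 95% confidence interval for the slope (second row of the confint matrix)
ci <- confint(fit2)[2, ]
# Pearson correlation between the two variables
r <- cor(toy_lr2$`Domestic Extraction`, toy_lr2$`Domestic Material Input`)

print(slope)
print(ci)
print(r)
```

If the whole confidence interval lies below zero, the data are consistent with a negative slope at that confidence level. An interval that spans zero would mean the apparent negative relationship could be noise.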