2026-06-02

Linear Regression

  • Linear regression is a statistical method that helps analysts predict an output for a given input, based on data from a set of observations.

  • Linear regression is essential for data science. It allows scientists to derive functions from a set of observations.

  • It also allows scientists to predict future outcomes based on previous observations.

General Applications

  • In economics, it is common to use the demand function. This function describes how the quantity demanded changes with respect to changes in market prices.

  • It would be hard to ask every single person in the world what their maximum willingness to pay is for a specific product, not to mention the costs of doing so.

  • What market analysts do instead is offer a demo of the product at different price levels, record the quantity sold, and feed that information into a linear regression model. This limits both the cost of obtaining the data and the time needed to collect it.

How to Set Up a Linear Regression Model

  • For the purpose of this example, a basic data frame was loaded. It records coffee sales over a period of time.

head(df)
##         date money         coffee_name
## 1 2024-03-01  38.7               Latte
## 2 2024-03-01  38.7       Hot Chocolate
## 3 2024-03-01  38.7       Hot Chocolate
## 4 2024-03-01  28.9           Americano
## 5 2024-03-01  38.7               Latte
## 6 2024-03-01  33.8 Americano with Milk

date is the date of the transaction.
money is the price paid in Ukrainian hryvnias.
coffee_name is the name of the product.

Initial Plot

First, we want to plot the number of drinks sold against price (quantity vs. price).
We also focus on a single drink, because different drinks have different ingredients that can affect the price.
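
The quantity-vs-price table behind such a plot could be built roughly as follows. This is only a sketch: the `toy` data frame and its values are made up (only the column names match `df`), and it assumes that "quantity" means the average number of units sold per day at each price, which may differ from how the original plot was prepared.

```r
# Hypothetical toy data with the same columns as `df` (values are made up).
toy <- data.frame(
  date = as.Date(c("2024-03-01", "2024-03-01", "2024-03-02", "2024-03-02",
                   "2024-03-03", "2024-03-03", "2024-03-03")),
  money = c(38.7, 28.9, 37.7, 37.7, 37.7, 37.7, 37.7),
  coffee_name = c("Latte", "Americano", "Latte", "Latte",
                  "Latte", "Latte", "Latte")
)

# Keep a single drink so that ingredients do not confound the price.
latte <- subset(toy, coffee_name == "Latte")

# Count the sales for each day and price level.
latte$n <- 1
daily <- aggregate(n ~ date + money, data = latte, FUN = sum)
names(daily)[names(daily) == "n"] <- "quantity"

# Average daily quantity at each price level: one point per price.
price_qty <- aggregate(quantity ~ money, data = daily, FUN = mean)
print(price_qty)
```

Each row of `price_qty` is then one point of the quantity-vs-price plot.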

Applying Linear Regression

Now we create the regression line. Using “ggplot”, only one line of code has to be added to our previous graph.

The line and the shaded band represent the predicted quantity sold for a given price.
The shaded band is a 95% confidence interval: for a given price, it is built so that it covers the true average quantity sold per day about 95% of the time.
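
A plot of this kind might be produced along the following lines. The data values are invented for illustration and the object names `price_qty`, `p`, and `p_fit` are assumptions; the original slide's code is not shown, so this is only one plausible way to add the line with `geom_smooth()`.

```r
library(ggplot2)

# Hypothetical price/quantity points (made-up values for illustration).
price_qty <- data.frame(
  money = c(30, 32, 34, 36, 38, 40),
  quantity = c(12, 11, 9, 8, 6, 5)
)

# The initial scatter plot: quantity vs. price.
p <- ggplot(price_qty, aes(x = money, y = quantity)) +
  geom_point() +
  labs(x = "Price", y = "Quantity sold")

# The one added line: a least-squares line with a 95% confidence band.
p_fit <- p + geom_smooth(method = "lm", formula = y ~ x, level = 0.95)
print(p_fit)
```

Here `level = 0.95` is the default and is written out only to make the confidence level explicit.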

Linear Regression Formula

It’s easy to see that a linear function is created. The (simplified) formula for this line is:
\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]

\(Y_i\) is the outcome being modeled (here, the quantity sold).
\(\beta_0\) is the intercept: the expected value of \(Y_i\) when the independent variable \(X_i\) is zero.
\(\beta_1\) is the slope: the change in \(Y_i\) associated with a one-unit change in \(X_i\).
\(\epsilon_i\) is the error term: the part of \(Y_i\) that the model does not explain, for example because relevant variables are not included in the model.
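
As a rough sketch, the coefficients of this formula can be estimated with base R's `lm()`. The data below is made up for illustration, and the names `fit`, `beta_0`, `beta_1`, and `eps_hat` are chosen here rather than taken from the original analysis.

```r
# Hypothetical price/quantity points (made-up values for illustration).
price_qty <- data.frame(
  money = c(30, 32, 34, 36, 38, 40),
  quantity = c(12, 11, 9, 8, 6, 5)
)

# Fit Y_i = beta_0 + beta_1 * X_i + epsilon_i by ordinary least squares.
fit <- lm(quantity ~ money, data = price_qty)

beta_0 <- unname(coef(fit)["(Intercept)"])  # estimated intercept
beta_1 <- unname(coef(fit)["money"])        # estimated slope

# Residuals are the estimates of the error terms epsilon_i.
eps_hat <- residuals(fit)

# Predicted quantity at a price of 35, read off the fitted line.
pred_35 <- unname(predict(fit, newdata = data.frame(money = 35)))
print(c(beta_0 = beta_0, beta_1 = beta_1, pred_35 = pred_35))
```

With these invented numbers the slope is negative, so the fitted line predicts fewer units sold as the price rises.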

Conclusion

  • Linear regression is a useful tool for predicting outcomes in any job related to data analysis and manipulation.

  • The formula \(Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\) can be extended to \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i\), or longer, depending on the number of variables that the researcher wants to account for.

  • Our example showed how increasing prices could lead to a drop in quantities sold. In the same way, important insights can be drawn from linear regression analysis in other settings.
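
The extended formula with two variables can be sketched in R by adding a term to the model formula. The `temperature` column is an invented second predictor that does not exist in the coffee dataset, and all values are made up.

```r
# Hypothetical data: `temperature` is an invented second predictor.
sales <- data.frame(
  money = c(30, 32, 34, 36, 38, 40, 31, 37),
  temperature = c(10, 14, 9, 16, 12, 15, 18, 8),
  quantity = c(13, 11, 10, 7, 7, 4, 10, 8)
)

# Fit Y_i = beta_0 + beta_1 * X_1i + beta_2 * X_2i + epsilon_i.
fit2 <- lm(quantity ~ money + temperature, data = sales)
print(coef(fit2))
```

Each extra variable adds one coefficient to the output of `coef()`.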