Simple Linear Regression Model:

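\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

This equation models execution time \(Y\) as a linear function of CPU speed \(X\), where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is a random error term.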

Create Example Data

We generate synthetic CPU speed and execution time data for demonstration.

##   cpu_speed execution_time
## 1  2.575155       77.43146
## 2  3.576610       50.95705
## 3  2.817954       53.80784
## 4  3.766035       48.18608
## 5  3.880935       40.01735
## 6  2.091113       72.83862
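
The code that generated this data frame is not shown on the slide. A minimal sketch of one way to produce data of this shape follows. The seed, the uniform range for cpu_speed, the coefficients, and the noise level are all assumptions chosen for illustration, so the values will not necessarily match the table above.

```r
# Hypothetical data-generating process; every constant below is an assumption.
set.seed(123)                                   # assumed seed, for reproducibility
n <- 30                                         # 30 rows, consistent with the 28 residual
                                                # degrees of freedom reported for the model
cpu_speed <- runif(n, min = 2, max = 4)         # assumed range of CPU speeds
noise <- rnorm(n, mean = 0, sd = 5)             # assumed noise level
execution_time <- 125 - 22 * cpu_speed + noise  # assumed linear relationship

# Column names are taken from the variable names.
df <- data.frame(cpu_speed, execution_time)
head(df)
```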

Estimation of Coefficients:

These formulas show how the slope and intercept are estimated using least squares, where \(\bar{x}\) and \(\bar{y}\) are the sample means of CPU speed and execution time.

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
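
The two formulas above can be checked numerically. The sketch below computes both estimates by hand on synthetic data and compares them with the coefficients returned by lm(). The data, seed, and variable names (x, y, beta0_hat, beta1_hat) are assumptions made for this example only.

```r
# Synthetic data; seed and constants are assumed for illustration.
set.seed(1)
x <- runif(30, min = 2, max = 4)
y <- 120 - 20 * x + rnorm(30, sd = 5)

# Slope: sum of cross-deviations divided by sum of squared deviations of x.
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the fitted line through the point (mean(x), mean(y)).
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare with lm(), which returns (Intercept) first and the slope second.
fit <- lm(y ~ x)
matches <- isTRUE(all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat)))
print(c(intercept = beta0_hat, slope = beta1_hat))
print(matches)
```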

R Code Used to Fit the Model

The following R code fits a simple linear regression model using the lm() function.

model <- lm(execution_time ~ cpu_speed, data = df)
model
## 
## Call:
## lm(formula = execution_time ~ cpu_speed, data = df)
## 
## Coefficients:
## (Intercept)    cpu_speed  
##      127.43       -22.32

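Model Output

Calling summary(model) reports the coefficient estimates with their standard errors and significance tests, along with the overall fit statistics.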

## 
## Call:
## lm(formula = execution_time ~ cpu_speed, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7147  -3.6014   0.0977   3.5281   9.4061 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  127.433      5.030   25.34  < 2e-16 ***
## cpu_speed    -22.325      1.574  -14.19  2.6e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.938 on 28 degrees of freedom
## Multiple R-squared:  0.8779, Adjusted R-squared:  0.8735 
## F-statistic: 201.3 on 1 and 28 DF,  p-value: 2.603e-14
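
The quantities in this printout can also be pulled out programmatically. In the output above, the slope's t value is its estimate divided by its standard error (-22.325 / 1.574, which agrees with the reported -14.19 up to rounding). The sketch below shows the same relationship on a refitted model; the data are synthetic with assumed constants, so the numbers will differ from those on the slide.

```r
# Synthetic stand-in for df; seed and constants are assumptions.
set.seed(123)
df <- data.frame(cpu_speed = runif(30, min = 2, max = 4))
df$execution_time <- 125 - 22 * df$cpu_speed + rnorm(30, sd = 5)
model <- lm(execution_time ~ cpu_speed, data = df)

s <- summary(model)
coefs <- coef(s)  # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)

# The t value is the estimate divided by its standard error.
t_slope <- coefs["cpu_speed", "Estimate"] / coefs["cpu_speed", "Std. Error"]

r_squared <- s$r.squared            # "Multiple R-squared" in the printout
resid_se  <- s$sigma                # "Residual standard error" in the printout
ci <- confint(model, level = 0.95)  # 95% confidence intervals for both coefficients

# Predicted execution time at a CPU speed of 3 (same units as the data).
pred <- predict(model, newdata = data.frame(cpu_speed = 3))

print(coefs)
print(ci)
```

With a single predictor, the F-statistic equals the square of the slope's t value, which is why both carry the same p-value in the printout.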

ggplot 1: Scatter and Regression Line
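
The plotting code for this slide is not shown. A minimal sketch of a scatter plot with a fitted regression line, assuming the ggplot2 package is installed, might look like the following. The data are synthetic with assumed constants, and the axis labels and title are placeholders.

```r
library(ggplot2)

# Synthetic stand-in for df; seed and constants are assumptions.
set.seed(123)
df <- data.frame(cpu_speed = runif(30, min = 2, max = 4))
df$execution_time <- 125 - 22 * df$cpu_speed + rnorm(30, sd = 5)

p <- ggplot(df, aes(x = cpu_speed, y = execution_time)) +
  geom_point() +
  # method = "lm" overlays the least-squares line; se = FALSE hides the confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "CPU speed", y = "Execution time",
       title = "Execution time vs. CPU speed")

p
```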

ggplot 2: Residual Plot
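
A residual plot shows the residuals against the fitted values; points scattered evenly around zero with no visible pattern support the linear model. The sketch below is one way to draw it with ggplot2. The data are synthetic with assumed constants, and the helper data frame name diag is hypothetical.

```r
library(ggplot2)

# Synthetic stand-in for df; seed and constants are assumptions.
set.seed(123)
df <- data.frame(cpu_speed = runif(30, min = 2, max = 4))
df$execution_time <- 125 - 22 * df$cpu_speed + rnorm(30, sd = 5)
model <- lm(execution_time ~ cpu_speed, data = df)

# Collect fitted values and residuals from the model.
diag <- data.frame(fitted = fitted(model), residual = resid(model))

p_resid <- ggplot(diag, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  labs(x = "Fitted values", y = "Residuals", title = "Residuals vs. fitted values")

p_resid
```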

Add Third Variable

We add temperature as a third variable to illustrate how execution time depends on both CPU speed and temperature, enabling a 3D visualization in the next slide.
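
How the temperature column was created is not shown. One possible sketch is below: it draws temperature at random and lets it contribute to execution time. The range of temperatures, its effect size, and all other constants are assumptions made for illustration.

```r
# Hypothetical three-variable data set; every constant is an assumption.
set.seed(123)
n <- 30
df <- data.frame(
  cpu_speed   = runif(n, min = 2, max = 4),
  temperature = runif(n, min = 40, max = 80)  # assumed temperature range
)

# Assumed relationship: faster CPUs lower execution time, higher temperatures raise it.
df$execution_time <- 100 - 22 * df$cpu_speed + 0.4 * df$temperature + rnorm(n, sd = 5)

head(df)
```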

Plotly 3D Plot
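
The interactive figure itself does not survive in this text. A minimal sketch of a 3D scatter plot, assuming the plotly package is installed, could look like this. The data are synthetic with assumed constants, and the axis titles are placeholders.

```r
library(plotly)

# Synthetic three-variable data; seed and constants are assumptions.
set.seed(123)
n <- 30
df <- data.frame(
  cpu_speed   = runif(n, min = 2, max = 4),
  temperature = runif(n, min = 40, max = 80)
)
df$execution_time <- 100 - 22 * df$cpu_speed + 0.4 * df$temperature + rnorm(n, sd = 5)

# One marker per observation, positioned by the three variables.
fig <- plot_ly(df,
               x = ~cpu_speed, y = ~temperature, z = ~execution_time,
               type = "scatter3d", mode = "markers",
               marker = list(size = 4))

# Axis titles for a 3D plot are set inside the scene.
fig <- layout(fig, scene = list(
  xaxis = list(title = "CPU speed"),
  yaxis = list(title = "Temperature"),
  zaxis = list(title = "Execution time")
))

fig
```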