Study Data

##    hours score
## 1      1    52
## 2      2    55
## 3      3    61
## 4      4    63
## 5      5    68
## 6      6    72
## 7      7    75
## 8      8    81
## 9      9    85
## 10    10    88

Scatterplot of the Data (ggplot)
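
A minimal sketch of how a scatterplot like this could be drawn with ggplot2. The data frame name `study` and the axis labels are assumptions for illustration; the values are the ones printed on the Study Data slide.

```r
library(ggplot2)

# Study data as printed above; the name `study` is an assumption
study <- data.frame(
  hours = 1:10,
  score = c(52, 55, 61, 63, 68, 72, 75, 81, 85, 88)
)

# One point per student: hours studied on the x-axis, exam score on the y-axis
p <- ggplot(study, aes(x = hours, y = score)) +
  geom_point() +
  labs(x = "Hours studied", y = "Exam score")

print(p)
```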

Regression Equation

Here is the equation for a simple linear regression model: \[ \hat{y} = b_0 + b_1 x \]

where:

  • \(\hat{y}\) is the predicted value

  • \(b_0\) is the intercept

  • \(b_1\) is the slope

  • \(x\) is the predictor variable

The slope \(b_1\) tells us how much the predicted response \(\hat{y}\) changes for each one-unit increase in \(x\); the intercept \(b_0\) is the predicted value when \(x = 0\).

Fitting the Model

We can fit the regression line in R with the lm() function. For the study data, the fitted equation is:

\[\hat{y} = 47.53 + 4.08x\]
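
As a sketch of how these coefficients could be obtained, the block below fits the model with `lm()` on the study data. The object names `study` and `fit` are assumptions, and the prediction at 5.5 hours is only an illustrative input.

```r
# Study data as printed above; the name `study` is an assumption
study <- data.frame(
  hours = 1:10,
  score = c(52, 55, 61, 63, 68, 72, 75, 81, 85, 88)
)

# Fit score as a linear function of hours
fit <- lm(score ~ hours, data = study)

# Intercept (b0) and slope (b1), rounded to two decimals
coefs <- round(coef(fit), 2)
print(coefs)

# Predicted score for a hypothetical student who studies 5.5 hours
pred <- predict(fit, newdata = data.frame(hours = 5.5))
print(pred)
```

Calling `summary(fit)` would additionally report standard errors and \(R^2\) for the same model.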

Regression Line Plot

This plot shows the fitted regression line along with the data.
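
One way such a plot might be produced is to overlay a least-squares line on the scatterplot with `geom_smooth()`. This is a sketch, not necessarily the code behind the slide; the name `study` and the axis labels are assumptions.

```r
library(ggplot2)

# Study data as printed above; the name `study` is an assumption
study <- data.frame(
  hours = 1:10,
  score = c(52, 55, 61, 63, 68, 72, 75, 81, 85, 88)
)

# Points plus the least-squares line; se = FALSE hides the confidence band
p <- ggplot(study, aes(x = hours, y = score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Hours studied", y = "Exam score")

print(p)
```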

Residuals

Residuals are the differences between the observed values and the predicted values.

\[ e_i = y_i - \hat{y}_i \]

A good model has residuals that are relatively small and randomly scattered around 0, with no systematic pattern.
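
To make the definition concrete, the sketch below computes the residuals for the study data and plots them against hours. The names `study`, `fit`, and `res` are assumptions, and base R graphics are used here only to keep the example short.

```r
# Study data as printed above; the name `study` is an assumption
study <- data.frame(
  hours = 1:10,
  score = c(52, 55, 61, 63, 68, 72, 75, 81, 85, 88)
)

fit <- lm(score ~ hours, data = study)

# e_i = y_i - y_hat_i, computed directly from the definition
res <- study$score - fitted(fit)

# resid() returns the same values for an lm fit
stopifnot(isTRUE(all.equal(unname(res), unname(resid(fit)))))

# Residuals against the predictor, with a dashed reference line at 0
plot(study$hours, res, xlab = "Hours studied", ylab = "Residual")
abline(h = 0, lty = 2)
```

For a least-squares fit with an intercept, the residuals sum to zero up to rounding error, so the points necessarily fall on both sides of the reference line.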

Plotly Interactive Plot

This interactive plot shows the data points and fitted regression line.
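
A plot of this kind could be built with the plotly package roughly as follows. This is a sketch under assumptions: the names `study` and `fit`, the added `fitted` column, and the trace and axis labels are illustrative, not taken from the slide.

```r
library(plotly)

# Study data as printed above; the name `study` is an assumption
study <- data.frame(
  hours = 1:10,
  score = c(52, 55, 61, 63, 68, 72, 75, 81, 85, 88)
)

# Fit the model and store the fitted values alongside the data
fit <- lm(score ~ hours, data = study)
study$fitted <- fitted(fit)

# Observed points as markers, fitted values as a line trace
p <- plot_ly(study, x = ~hours, y = ~score,
             type = "scatter", mode = "markers", name = "Observed") %>%
  add_lines(y = ~fitted, name = "Fitted line") %>%
  layout(xaxis = list(title = "Hours studied"),
         yaxis = list(title = "Exam score"))

p
```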

Conclusion

Simple linear regression is a useful statistical method for understanding the relationship between two variables. In this example, the fitted slope indicates that each additional hour studied is associated with an increase of about 4.08 points in the predicted exam score.

This model allows us to:

  • describe relationships between variables

  • make predictions

  • identify trends in data

Overall, simple linear regression is a widely applicable technique that can be used in a variety of fields and industries.