2024-03-17

Slide 1: What is Simple Linear Regression?

Definition: A statistical technique for modeling the relationship between two numerical variables. It provides a simple way to predict a quantitative outcome Y from a single predictor variable X, assuming that the relationship between X and Y is approximately linear.

Example: Let’s use simple linear regression to predict daily maximum temperature from humidity, using a sample weather data set.

Slide 2: Steps in Simple Linear Regression

  1. Data Collection: Gather a data set containing paired observations of two variables: the predictor variable \(X\) (a weather parameter, here humidity) and the response variable \(Y\) (maximum temperature readings).

  2. Exploratory Data Analysis (EDA): Explore the relationship between the predictor and response variables through visualizations and summary statistics. This step helps identify patterns and correlations between humidity and maximum temperature readings.

  3. Model Fitting: Use the least squares method to fit a linear regression model to the data. This model aims to capture the linear relationship between humidity and maximum temperature.

  4. Model Assessment: Evaluate the goodness of fit of the model using diagnostic plots and statistical metrics. These assessments provide insights into how well the model explains the variation in maximum temperature based on humidity.

  5. Interpretation: Interpret the coefficients of the regression model to understand the relationship between the predictor and response variables. The slope estimates the expected change in maximum temperature for a one-unit increase in humidity, and the intercept is the expected maximum temperature when humidity is zero.

  6. Prediction: Use the fitted model to make predictions on new data. By inputting values of humidity, the model can forecast corresponding maximum temperature values, aiding in weather forecasting and planning.
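
The steps above can be sketched in a few lines of base R. This is a minimal illustration rather than the code behind these slides: the data are simulated, and the data frame name `weather`, the coefficients 26 and 0.57, and the noise level are all assumptions chosen only for the example.

```r
# Step 1 (data collection): hypothetical simulated data, not real measurements.
set.seed(1)
humidity <- runif(100, min = 30, max = 100)
max_temp <- 26 + 0.57 * humidity + rnorm(100, sd = 8)
weather  <- data.frame(Humidity = humidity, Max_Temperature = max_temp)

# Step 2 (EDA): summary statistics and the sample correlation.
print(summary(weather))
r <- cor(weather$Humidity, weather$Max_Temperature)

# Step 3 (model fitting): ordinary least squares.
fit <- lm(Max_Temperature ~ Humidity, data = weather)

# Step 4 (assessment): R-squared and the residuals.
# plot(fit, which = 1:2) would additionally draw the residual diagnostics.
r_squared <- summary(fit)$r.squared
res <- residuals(fit)

# Step 5 (interpretation): named vector holding the intercept and the slope.
coefs <- coef(fit)
print(coefs)

# Step 6 (prediction): point forecasts with 95% prediction intervals.
new_days <- data.frame(Humidity = c(40, 60, 80))
pred <- predict(fit, newdata = new_days, interval = "prediction", level = 0.95)
print(pred)
```

With a single predictor, `r_squared` equals the square of the sample correlation `r`, which is a quick consistency check between the EDA and assessment steps.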

Slide 3: Mathematical Representation of Simple Linear Regression

The simple linear regression model can be mathematically represented as follows:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

where:

  - \(Y\) is the response variable.
  - \(X\) is the predictor variable.
  - \(\beta_0\) is the intercept term.
  - \(\beta_1\) is the slope coefficient.
  - \(\varepsilon\) is the error term representing random variation.

The goal of simple linear regression is to estimate the values of \(\beta_0\) and \(\beta_1\) that minimize the sum of squared residuals, thereby providing the best-fitting line through the data points.
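
To make the minimization concrete, it can be written out for a sample of \(n\) paired observations. The notation \((x_i, y_i)\), \(\bar{x}\), \(\bar{y}\), and the hats on the estimates are introduced here for illustration and do not appear elsewhere in the slides. The residual sum of squares is

\[ \mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \]

Setting its partial derivatives with respect to \(\beta_0\) and \(\beta_1\) to zero gives the standard closed-form least squares estimates:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

where \(\bar{x}\) and \(\bar{y}\) are the sample means of the predictor and the response.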

Slide 4: R code

Let’s generate a sample weather data set and explore its structure using R.

## 'data.frame':    100 obs. of  2 variables:
##  $ Humidity       : num  56.6 61.5 88.4 66.1 66.9 ...
##  $ Max_Temperature: num  53.3 64 76.1 61.9 57.6 ...
##   Humidity Max_Temperature
## 1 56.59287        53.27247
## 2 61.54734        63.98347
## 3 88.38062        76.05484
## 4 66.05763        61.85423
## 5 66.93932        57.55064
## 6 90.72597        79.07536
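
The code that produced the output above is not shown on the slide. A sketch that builds a data frame of the same shape might look like the following. The seed, the normal distributions, and the parameters (intercept 25, slope 0.6, noise standard deviation 8) are assumptions chosen to resemble the printed values, so the numbers it produces may differ from those shown.

```r
# Assumed seed and simulation parameters; none of these are stated in the slides.
set.seed(123)
n <- 100

# Simulated humidity readings (a normal draw is not bounded to 0-100).
Humidity <- rnorm(n, mean = 65, sd = 15)

# Maximum temperature as a linear function of humidity plus random noise.
Max_Temperature <- 25 + 0.6 * Humidity + rnorm(n, mean = 0, sd = 8)

weather_data <- data.frame(Humidity, Max_Temperature)

# Inspect the structure and the first six rows.
str(weather_data)
head(weather_data)
```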

Slide 5: Plotly Plot

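
The interactive plot itself is not reproduced in this text version. A minimal sketch of a Plotly scatter plot of the two variables could look like this. The choice of a scatter plot, the title, the axis labels, and the simulated `weather_data` are assumptions made for illustration.

```r
library(plotly)

# Assumed simulated data so that the example runs on its own.
set.seed(123)
weather_data <- data.frame(Humidity = rnorm(100, mean = 65, sd = 15))
weather_data$Max_Temperature <-
  25 + 0.6 * weather_data$Humidity + rnorm(100, sd = 8)

# Interactive scatter plot; the ~ prefix maps data frame columns to axes.
p <- plot_ly(
  data = weather_data,
  x = ~Humidity,
  y = ~Max_Temperature,
  type = "scatter",
  mode = "markers"
)

# plotly::layout() sets the title and axis labels.
p <- layout(
  p,
  title = "Maximum temperature vs. humidity",
  xaxis = list(title = "Humidity"),
  yaxis = list(title = "Maximum temperature")
)

# Printing the object renders the widget in an interactive session.
if (interactive()) print(p)
```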

Slide 6: Scatter Plot (ggplot)
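
The figure for this slide is not reproduced in this text version. A static scatter plot with the fitted least squares line overlaid could be sketched with ggplot2 as follows. The labels, the point transparency, and the simulated `weather_data` are assumptions made for illustration.

```r
library(ggplot2)

# Assumed simulated data so that the example runs on its own.
set.seed(123)
weather_data <- data.frame(Humidity = rnorm(100, mean = 65, sd = 15))
weather_data$Max_Temperature <-
  25 + 0.6 * weather_data$Humidity + rnorm(100, sd = 8)

scatter_plot <- ggplot(weather_data, aes(x = Humidity, y = Max_Temperature)) +
  geom_point(alpha = 0.7) +
  # method = "lm" overlays the least squares line with a confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    title = "Maximum temperature vs. humidity",
    x = "Humidity",
    y = "Maximum temperature"
  )

# Printing the object draws the plot in an interactive session.
if (interactive()) print(scatter_plot)
```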

Slide 7: Histogram (ggplot)
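
The figure for this slide is not reproduced in this text version, and the slides do not say which variable was plotted. Assuming the histogram shows the response variable, a ggplot2 sketch might be the following. The number of bins, the labels, and the simulated `weather_data` are assumptions.

```r
library(ggplot2)

# Assumed simulated data so that the example runs on its own.
set.seed(123)
weather_data <- data.frame(Humidity = rnorm(100, mean = 65, sd = 15))
weather_data$Max_Temperature <-
  25 + 0.6 * weather_data$Humidity + rnorm(100, sd = 8)

# Histogram of the response variable; 20 bins is an arbitrary choice.
hist_plot <- ggplot(weather_data, aes(x = Max_Temperature)) +
  geom_histogram(bins = 20, colour = "white") +
  labs(
    title = "Distribution of maximum temperature",
    x = "Maximum temperature",
    y = "Count"
  )

# Printing the object draws the plot in an interactive session.
if (interactive()) print(hist_plot)
```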

Slide 8: Performing the Test

## 
## Call:
## lm(formula = Max_Temperature ~ Humidity, data = weather_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2587  -5.4680  -0.6999   4.6451  26.3232 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   25.997      3.861   6.733 1.14e-09 ***
## Humidity       0.572      0.057  10.035  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.766 on 98 degrees of freedom
## Multiple R-squared:  0.5068, Adjusted R-squared:  0.5018 
## F-statistic: 100.7 on 1 and 98 DF,  p-value: < 2.2e-16
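
Output of this form comes from calling `summary()` on a fitted `lm` object. The sketch below fits the model named in the `Call:` line and pulls individual quantities out of the summary. The simulated `weather_data` is an assumption, so its numbers may not match the output above.

```r
# Assumed simulated data so that the example runs on its own.
set.seed(123)
weather_data <- data.frame(Humidity = rnorm(100, mean = 65, sd = 15))
weather_data$Max_Temperature <-
  25 + 0.6 * weather_data$Humidity + rnorm(100, sd = 8)

# Fit the model and print the full summary table.
fit <- lm(Max_Temperature ~ Humidity, data = weather_data)
fit_summary <- summary(fit)
print(fit_summary)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(fit_summary)

# The t value is the estimate divided by its standard error.
slope_t <- coef_table["Humidity", "Estimate"] /
  coef_table["Humidity", "Std. Error"]

# With one predictor, the F-statistic is the square of the slope's t value.
f_stat <- fit_summary$fstatistic[["value"]]

# Proportion of variance explained and 95% confidence intervals.
r_squared <- fit_summary$r.squared
ci <- confint(fit, level = 0.95)
print(ci)
```

The same relationships can be checked against the printed output: 0.572 / 0.057 is about 10.04, which matches the reported t value of 10.035 up to rounding, and 10.035 squared is about 100.7, the reported F-statistic. The estimated slope means that each one-unit increase in humidity is associated with an increase of about 0.572 in expected maximum temperature, and the multiple R-squared of 0.5068 means humidity accounts for roughly 51% of the variation in maximum temperature.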