2024-09-20

Introduction

  • Definition
    • Simple linear regression is a statistical method that models the relationship between two continuous variables.
  • Variables
    • Independent variable (x): Predictor
    • Dependent variable (y): Response
  • Purpose
    • Fits a straight line to data
    • Estimates the relationship between x and y
    • Predicts y values based on x values
  • Key Features
    • Assumes a linear relationship.
    • Can predict future outcomes (y) from given inputs (x).

The Linear Regression Formula

The equation: \[ y = \beta_0 + \beta_1x + \epsilon \]

Where:

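  • \(\beta_0\): intercept, the expected value of y when x = 0

  • \(\beta_1\): slope, the expected change in y for a one-unit increase in x

  • \(\epsilon\): error term, the part of y that the line does not explain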
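
With a single predictor, the least-squares estimates of the two coefficients have a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) minus the slope times mean(x). The following is a minimal sketch in R that uses only the first five rows of the height and weight table in the Dataset Creation section, so its numbers differ from a fit on all 100 rows. The variable names are illustrative.

```r
# First five rows of the height/weight table (illustration only)
x <- c(150.0, 150.5, 151.0, 151.5, 152.0)   # height in cm
y <- c(109.7, 111.7, 121.1, 114.0, 114.7)   # weight in kg

# Closed-form least-squares estimates for one predictor
beta1_hat <- cov(x, y) / var(x)              # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)   # intercept

# lm() solves the same least-squares problem, so the coefficients should agree
fit <- lm(y ~ x)
print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))
```

Comparing the hand-computed values with `coef(fit)` is a quick way to confirm that `lm()` is doing nothing more than minimizing the sum of squared errors.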

Dataset Creation

The dataset contains height (in cm) and weight (in kg) for 100 individuals:

height weight
1 150.0 109.7
2 150.5 111.7
3 151.0 121.1
4 151.5 114.0
5 152.0 114.7
6 152.5 123.0
7 153.0 117.1
8 153.5 108.8
9 154.0 112.1
10 154.5 113.7
11 155.1 122.4
12 155.6 118.5
13 156.1 119.0
14 156.6 118.0
15 157.1 115.0
16 157.6 127.1
17 158.1 121.0
18 158.6 109.1
19 159.1 122.8
20 159.6 117.3
21 160.1 114.7
22 160.6 119.4
23 161.1 115.7
24 161.6 117.6
25 162.1 118.5
26 162.6 113.5
27 163.1 126.5
28 163.6 123.5
29 164.1 117.4
30 164.6 129.8
31 165.2 126.0
32 165.7 122.8
33 166.2 129.1
34 166.7 129.4
35 167.2 129.5
36 167.7 129.2
37 168.2 128.9
38 168.7 126.2
39 169.2 125.4
40 169.7 125.4
41 170.2 124.2
42 170.7 127.0
43 171.2 122.1
44 171.7 139.6
45 172.2 135.2
46 172.7 123.9
47 173.2 127.9
48 173.7 128.0
49 174.2 134.6
50 174.7 130.6
51 175.3 132.7
52 175.8 131.7
53 176.3 132.0
54 176.8 139.4
55 177.3 131.8
56 177.8 140.9
57 178.3 126.0
58 178.8 137.0
59 179.3 135.1
60 179.8 135.9
61 180.3 137.1
62 180.8 133.1
63 181.3 134.3
64 181.8 131.3
65 182.3 131.4
66 182.8 138.6
67 183.3 139.7
68 183.8 138.1
69 184.3 142.9
70 184.8 148.9
71 185.4 136.6
72 185.9 127.8
73 186.4 144.8
74 186.9 136.6
75 187.4 137.1
76 187.9 146.0
77 188.4 139.9
78 188.9 135.6
79 189.4 143.0
80 189.9 141.7
81 190.4 142.8
82 190.9 145.1
83 191.4 141.7
84 191.9 147.2
85 192.4 143.2
86 192.9 146.4
87 193.4 150.6
88 193.9 147.6
89 194.4 144.2
90 194.9 152.0
91 195.5 151.6
92 196.0 149.7
93 196.5 148.5
94 197.0 144.6
95 197.5 154.9
96 198.0 145.5
97 198.5 159.8
98 199.0 156.9
99 199.5 148.4
100 200.0 144.9
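
The code that produced this table is not shown. As a rough sketch, a comparable dataset could be simulated by taking evenly spaced heights from 150 to 200 cm and drawing weights around a straight line with normal noise. The seed, the generating intercept and slope, and the noise standard deviation below are all assumptions chosen for illustration, so the output will not reproduce the table above.

```r
set.seed(42)  # hypothetical seed, only so this sketch is repeatable

n <- 100
height <- seq(150, 200, length.out = n)   # evenly spaced heights in cm

# Hypothetical generating line and noise level (assumptions, not the original values)
true_intercept <- 0
true_slope     <- 0.75
noise_sd       <- 5

# Weight = straight line in height plus normally distributed noise
weight <- true_intercept + true_slope * height + rnorm(n, mean = 0, sd = noise_sd)

data <- data.frame(height = round(height, 1), weight = round(weight, 1))
head(data)
```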

Scatter Plot of Dataset

The scatter plot shows how weight varies with height; the points follow a roughly linear upward trend.
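
A minimal sketch of how such a scatter plot could be drawn with base R graphics. To keep the example short it uses only six rows copied from the table above (rows 1, 21, 41, 61, 81, and 100); the title, colour, and point style are assumptions rather than the original settings.

```r
# Six rows copied from the dataset table (illustration only)
data <- data.frame(
  height = c(150.0, 160.1, 170.2, 180.3, 190.4, 200.0),
  weight = c(109.7, 114.7, 124.2, 137.1, 142.8, 144.9)
)

# Height on the x-axis (predictor), weight on the y-axis (response)
plot(data$height, data$weight,
     main = "Height vs. Weight",
     xlab = "Height (cm)", ylab = "Weight (kg)",
     pch = 19, col = "steelblue")
```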

Fitting the Model

Model equation: \[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1x \]

Where:

  • \(\hat{y}\) is the predicted weight,

  • \(x\) is the height,

  • \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the estimated intercept and slope.
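
A minimal sketch of fitting this model with `lm()`, using the same formula that appears in the summary output (`weight ~ height`). It uses only six rows copied from the dataset table, so the coefficients differ from those of the full 100-row fit; the new height of 175 cm is an arbitrary example value.

```r
# Six rows copied from the dataset table (illustration only)
data <- data.frame(
  height = c(150.0, 160.1, 170.2, 180.3, 190.4, 200.0),
  weight = c(109.7, 114.7, 124.2, 137.1, 142.8, 144.9)
)

# Fit weight as a linear function of height
model <- lm(weight ~ height, data = data)
b <- coef(model)   # b[1] is the intercept, b[2] is the slope
print(b)

# Predict weight for a new height, first with predict() and then by hand
new_obs   <- data.frame(height = 175)
pred_lm   <- predict(model, newdata = new_obs)
pred_hand <- b[["(Intercept)"]] + b[["height"]] * 175
print(c(predict = unname(pred_lm), by_hand = pred_hand))
```

The two predictions should match, since `predict()` simply evaluates the fitted line at the new height.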

Summary and Confidence Intervals

The model summary reports the estimated intercept and slope, their standard errors, and the R-squared value:

## 
## Call:
## lm(formula = weight ~ height, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.3483  -2.7408  -0.1779   3.2188  10.4538 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.88638    5.50755  -0.706    0.482    
## height       0.77480    0.03136  24.704   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.573 on 98 degrees of freedom
## Multiple R-squared:  0.8616, Adjusted R-squared:  0.8602 
## F-statistic: 610.3 on 1 and 98 DF,  p-value: < 2.2e-16
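
Several numbers in this output are linked by simple formulas, which can be checked by hand. The sketch below recomputes the slope's t value, the F-statistic, and the adjusted R-squared from the printed values; because the inputs are rounded, the results agree with the summary only up to small rounding differences. The variable names are illustrative.

```r
# Values copied from the summary output above
est_height <- 0.77480   # slope estimate
se_height  <- 0.03136   # standard error of the slope
r2         <- 0.8616    # multiple R-squared
n          <- 100       # observations
p          <- 1         # number of predictors

t_height <- est_height / se_height                 # t value = estimate / standard error
f_stat   <- t_height^2                             # with one predictor, F equals t squared
adj_r2   <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R-squared

print(round(c(t = t_height, F = f_stat, adj_R2 = adj_r2), 4))
```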

95% confidence intervals for the intercept and slope:

##                   2.5 %    97.5 %
## (Intercept) -14.8159242 7.0431722
## height        0.7125575 0.8370354
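
The table has the layout that `confint()` prints for a fitted `lm` object, although the call itself is not shown. Each interval is the estimate plus or minus a t critical value times the standard error. A minimal sketch that rebuilds the intervals from the printed summary values follows; small differences from the table come from rounding of the inputs.

```r
# Estimates and standard errors copied from the summary output
est <- c(intercept = -3.88638, height = 0.77480)
se  <- c(intercept = 5.50755,  height = 0.03136)
df  <- 98   # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci)
```

The interval for the intercept contains zero, which matches its large p-value (0.482) in the summary, while the interval for the slope lies well above zero.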

3D Visualization

A 3D plot of height, weight and the interaction between them:
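
The plotting code for this figure is not shown, and neither is the definition of the interaction. As a rough sketch only, the example below assumes the interaction is the product height × weight and draws it as a surface with base R's `persp()`. The grid ranges, viewing angles, and the name `hw_product` are assumptions, and the original figure may have used a different package or quantity.

```r
# Grids spanning roughly the observed ranges (assumed for illustration)
height <- seq(150, 200, length.out = 25)   # cm
weight <- seq(105, 160, length.out = 25)   # kg

# Assumed interaction term: product of height and weight on the grid
hw_product <- outer(height, weight)        # 25 x 25 matrix

# Perspective plot of the product surface
persp(height, weight, hw_product,
      theta = 35, phi = 25, ticktype = "detailed",
      xlab = "Height (cm)", ylab = "Weight (kg)", zlab = "Height x Weight",
      main = "Height, weight, and their product")
```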

Residuals Plot

Residuals are the differences between the actual and predicted values. Plotted against the fitted values, they should scatter evenly around zero if the linear model is adequate:

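# Residuals on the y-axis against fitted values on the x-axis, matching the axis labels
plot(fitted(model), resid(model), main="Residuals Plot",
  ylab="Residuals", xlab="Fitted values")
abline(h=0, col="blue")  # reference line at zero residual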