2024-10-28

Introduction

  • For this assignment, I used R's built-in “iris” dataset.
  • It contains 150 flowers, 50 from each of three iris species (setosa, versicolor, and virginica), with sepal length, sepal width, petal length, and petal width measured in centimeters.

The Iris Dataset

data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
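head() only shows the first six rows, all of them setosa. A quick check like the following (a small sketch using only base R) confirms the size of the dataset and the balanced design described in the introduction.

```r
# Confirm the dimensions and the balanced species design of iris
data(iris)
dim(iris)                   # 150 rows, 5 columns
table(iris$Species)         # 50 flowers per species
sapply(iris[, 1:4], range)  # min and max of each measurement (cm)
```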

Simple Linear Regression Model

The simple linear regression model predicts Petal Length (\(Y\)) from Petal Width (\(X\)):

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

where \(\epsilon\) is the random error term. The fitted line is:

\[ \hat{Y} = \hat{\beta_0} + \hat{\beta_1} X \]
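To make the fitted line equation concrete, the sketch below fits the model with lm() and evaluates \(\hat{Y}\) for one petal width, once by plugging into \(\hat{\beta_0} + \hat{\beta_1} X\) and once with predict(). The value x_new = 1.5 cm is a hypothetical input chosen only for illustration.

```r
# Fit Y = Petal.Length on X = Petal.Width by ordinary least squares
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
b <- coef(fit)  # b[1] = intercept (beta_0 hat), b[2] = slope (beta_1 hat)
b

# Fitted value for a hypothetical flower with petal width 1.5 cm,
# computed from the fitted line equation and with predict()
x_new <- 1.5
unname(b[1] + b[2] * x_new)
unname(predict(fit, newdata = data.frame(Petal.Width = x_new)))
```

Both lines should print the same number, since predict() evaluates the same fitted equation.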

Regression Coefficient Estimates

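The least-squares estimates of the slope and intercept are:

\[ \hat{\beta_1} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \]

\[ \hat{\beta_0} = \bar{Y} - \hat{\beta_1} \bar{X} \]

where \(n\) is the number of observations and \(\bar{X}\), \(\bar{Y}\) are the sample means of \(X\) and \(Y\).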
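The slope and intercept formulas can be checked numerically. This sketch computes them directly from the sums, which are the ordinary least-squares estimates, and compares the result with lm(); the two should agree up to floating-point error.

```r
# Least-squares slope and intercept computed directly from the formulas
x <- iris$Petal.Width
y <- iris$Petal.Length

beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
c(intercept = beta0_hat, slope = beta1_hat)

# Cross-check against the coefficients reported by lm()
coef(lm(y ~ x))
```

The slope can equivalently be written as cov(x, y) / var(x), because the \(n - 1\) denominators cancel.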

Simple Linear Regression Plot

Simple Linear Regression Plot Code

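library(plotly)  # provides plot_ly(), add_lines(), layout() and re-exports %>%

# Response (Y) and predictor (X), then the least-squares fit
y <- iris$Petal.Length
x <- iris$Petal.Width
mod <- lm(y ~ x)

# Axis titles; title = list(text, font) replaces the deprecated titlefont
xax <- list(
  title = list(text = "Petal Width (cm)",
               font = list(family = "Computer Modern Roman"))
)
yax <- list(
  title = list(text = "Petal Length (cm)",
               font = list(family = "Computer Modern Roman"))
)

# Scatter plot of the observations with the fitted regression line overlaid
plot_ly(x = x, y = y, type = "scatter", mode = "markers", name = "Observed") %>%
  add_lines(x = x, y = fitted(mod), line = list(color = "pink"), name = "Fitted") %>%
  layout(xaxis = xax, yaxis = yax)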
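If plotly is not available, or a static figure is preferred, roughly the same picture can be drawn in base R. This is a sketch of an alternative, not the approach used above; abline() reads the intercept and slope from the fitted lm object.

```r
# Static base-R version of the same plot (no plotly needed)
x <- iris$Petal.Width
y <- iris$Petal.Length
mod <- lm(y ~ x)

plot(x, y, pch = 19, xlab = "Petal Width (cm)", ylab = "Petal Length (cm)")
abline(mod, col = "pink", lwd = 2)  # draws the line from coef(mod)
```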

Scatter Plot

Box Plot