2025-03-18

Plotly of Simple Linear Regression

The orange line in this plot is the fitted linear regression line. Temperature, on the x-axis, is the independent variable, while ozone concentration is the dependent variable. Given a value of temperature, the orange line can be used to predict the corresponding ozone concentration, which is the purpose of a linear regression model.
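As a minimal sketch of that prediction step, assuming the built-in `airquality` data with incomplete rows dropped, a fitted model can be passed to `predict()`. The temperature of 80 degrees and the names `aq`, `fit`, and `pred_80` are illustrative choices, not part of the original analysis.

```r
# Fit ozone on temperature using complete cases of the built-in airquality data
aq <- na.omit(airquality)
fit <- lm(Ozone ~ Temp, data = aq)

# Intercept and slope; the slope is the estimated change in ozone (ppb)
# for a one-degree rise in temperature
coef(fit)

# Predicted ozone at 80 degrees Fahrenheit (an arbitrary illustrative value)
pred_80 <- predict(fit, newdata = data.frame(Temp = 80))
pred_80
```

The prediction is simply the fitted intercept plus the fitted slope times 80, which is the height of the regression line at that temperature.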

ggplot - Example 1

## `geom_smooth()` using formula = 'y ~ x'

This plot shows the simple linear regression model for women’s height versus weight. Height is the independent variable, and weight is the dependent variable. Given a value of height, the blue line can be used to predict the corresponding weight, which is the purpose of a linear regression model. The correlation between height and weight is 0.9955, which indicates an almost perfectly linear relationship.

ggplot - Example 2

## `geom_smooth()` using formula = 'y ~ x'

This plot shows the simple linear regression model for a car’s speed versus the distance the car takes to stop. Speed is the independent variable, and distance is the dependent variable. Given a value of speed, the blue line can be used to predict the corresponding distance, which is the purpose of a linear regression model. The correlation between speed and distance is 0.8068, so the relationship is still fairly linear, but the points are scattered more widely around the line than in the previous graph.
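One way to compare how tightly the two data sets follow their lines, sketched here with the built-in `women` and `cars` data, is to square each correlation. In simple linear regression the squared correlation equals the R-squared reported by `lm()`, that is, the share of variation in the response explained by the line. The object names below are illustrative.

```r
# Fit one simple linear regression per data set
fit_women <- lm(weight ~ height, data = women)
fit_cars  <- lm(dist ~ speed, data = cars)

# Pearson correlations between predictor and response
r_women <- cor(women$height, women$weight)
r_cars  <- cor(cars$speed, cars$dist)

# Squared correlations ...
c(women = r_women^2, cars = r_cars^2)

# ... match the R-squared values from the fitted models
c(women = summary(fit_women)$r.squared,
  cars  = summary(fit_cars)$r.squared)
```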

LaTeX Equations of Simple Linear Regression

The following equation is the basis of simple linear regression. Let X and Y be quantitative (continuous) variables, and assume that they have a linear relationship. Simple linear regression describes this relationship through the equation \[Y = b_0 + b_1X + \epsilon\] where \(b_0\) and \(b_1\) are real numbers and \(\epsilon\) is a continuous random variable. \(\epsilon\) follows a normal distribution with mean 0 and standard deviation \(\sigma\), where \(\sigma\) is an unknown positive number.
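To make the model concrete, the sketch below simulates data from \(Y = b_0 + b_1X + \epsilon\) and then fits a line to it. The parameter values, the sample size, the uniform range for X, and the seed are all hypothetical choices made only for illustration.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# Hypothetical parameter values, chosen only for illustration
b0 <- 2
b1 <- 0.5
sigma <- 1
n <- 200

x <- runif(n, min = 0, max = 10)        # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)   # error term: normal, mean 0, sd sigma
y <- b0 + b1 * x + eps                  # response generated by the model

# Fit a simple linear regression to the simulated data
sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land near b0 and b1
```

Because the errors are random, the estimates will not equal the true values exactly, but they should fall close to them for a sample of this size.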

LaTeX Equations of Least Squares Estimates

Now that we have a basic knowledge of simple linear regression, let’s address how the values of \(b_0\) and \(b_1\) are found. Again, let X and Y be quantitative (continuous) variables from the same population, and let \((x_i, y_i)\), \(i = 1, \dots, n\), be the data points of a simple random sample of size n from that population, with \(x_i\) and \(y_i\) taken from X and Y, respectively. We now have the background to state the least squares estimates of \(b_0\) and \(b_1\). The coefficients of the regression equation \(E(Y) = b_0 + b_1x\) are estimated by \[\hat{b_1} = \frac{n(\sum_{i = 1}^{n}x_iy_i) -(\sum_{i = 1}^{n}x_i) (\sum_{i = 1}^{n}y_i)}{n(\sum_{i = 1}^{n}x_i^2) - (\sum_{i = 1}^{n}x_i)^2}\] \[\hat{b_0} = \frac{1}{n}\sum_{i = 1}^{n}y_i -\hat{b_1} \frac{1}{n}\sum_{i = 1}^{n}x_i\] After finding these, the estimates \(\hat{b_0}\) and \(\hat{b_1}\) can be used in \[\hat{y} = \hat{b_0} + \hat{b_1}x\] This is the estimated regression equation, or line.
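As a check, the least squares estimates can be computed by hand and compared with `lm()`. This is a minimal sketch on the built-in `women` data: the slope uses the sum formula, and the intercept is the mean of y minus the slope times the mean of x. The names `b0_hat` and `b1_hat` are illustrative.

```r
x <- women$height
y <- women$weight
n <- length(x)

# Slope: (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
b1_hat <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)

# Intercept: mean of y minus slope times mean of x
b0_hat <- mean(y) - b1_hat * mean(x)

# Hand-computed estimates next to the ones from lm()
c(intercept = b0_hat, slope = b1_hat)
coef(lm(weight ~ height, data = women))
```

The two sets of coefficients should agree up to floating-point rounding, since `lm()` solves the same least squares problem.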

R code of Simple Linear Regression for ggplot

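library(ggplot2)

# Scatter plot of the built-in women data with a fitted least squares line
ggplot(women, aes(x = height, y = weight)) +
  geom_point(color = "magenta", size = 2) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  # line only, no confidence band
  labs(title = "Women's Height (in) vs. Weight (lb)",
       x = "Height (in)", y = "Weight (lb)") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title

# Pearson correlation between height and weight
cor(women$height, women$weight)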

R code of Simple Linear Regression for Plotly

library(plotly)

# Drop rows with missing values, then fit ozone on temperature
no_na_airquality <- na.omit(airquality)
mod <- lm(Ozone ~ Temp, data = no_na_airquality)
y <- no_na_airquality$Ozone
x <- no_na_airquality$Temp

# Axis settings: title, title font, and plotting range
xax <- list(
  title = "Temperature (Degrees Fahrenheit)",
  titlefont = list(family = "Modern Computer Roman"),
  range = c(55, max(x))
)
yax <- list(
  title = "Ozone Concentration (ppb)",
  titlefont = list(family = "Modern Computer Roman"),
  range = c(-10, max(y))
)

R code of Simple Linear Regression for Plotly Cont.

# Ozone versus temperature scatter plot with the fitted regression line overlaid
graph_ozonev.solar <- plot_ly(x = x, y = y, type = "scatter", mode = "markers",
                              name = "Data from Airquality",
                              width = 690, height = 300) %>%
  add_lines(x = x, y = fitted(mod), name = "Linear Fit") %>%
  layout(xaxis = xax, yaxis = yax,
         margin = list(l = 50, r = 50, b = 0, t = 0))

# Render the plot
config(graph_ozonev.solar)