2024-09-20

Simple Linear Regression

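  • A statistical model of the linear relationship between one independent (predictor) variable and one dependent (response) variable.
  • The fitted regression line is used to predict values of the dependent variable; the shaded band drawn around it in the plots is a confidence interval for the fitted line.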
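The sketch below fits a line and uses it for prediction. It uses simulated data rather than the quakes data. The names `x`, `y`, `fit` and `new_x` are illustrative assumptions, as are the true intercept 2 and slope 3.

```r
# Simulate data from a known line: y = 2 + 3x + noise (illustrative values)
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(50, mean = 0, sd = 1)

# Fit the simple linear regression y = b0 + b1 * x by least squares
fit <- lm(y ~ x)
coef(fit)  # intercept and slope estimates, close to 2 and 3

# Use the fitted line to predict y at new x values
new_x <- data.frame(x = c(2.5, 7.5))
predict(fit, newdata = new_x)
```

Each prediction is simply \(b_0 + b_1 x\) evaluated at the new \(x\).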

Point Estimation

  • A method in Statistics, used often in Linear Regression, to estimate a single value for an unknown population parameter from sample data.
  • The sample mean is a common example: \(\bar{x}\) is a point estimate of the population mean \(\mu\).
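The sketch below uses the sample mean as a point estimate. It assumes a simulated population with a true mean of 50. The population size, sample size and names are illustrative only.

```r
# Illustrative population: 100,000 values with true mean 50 (assumed values)
set.seed(42)
population <- rnorm(100000, mean = 50, sd = 10)

# Draw one sample of size 200 and use its mean as the point estimate
sample_data <- sample(population, size = 200)
x_bar <- mean(sample_data)

x_bar             # point estimate of the population mean
mean(population)  # the parameter being estimated
```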

Dataset quakes

data(quakes)
head(quakes)
     lat   long depth mag stations
1 -20.42 181.62   562 4.8       41
2 -20.62 181.03   650 4.2       15
3 -26.00 184.10    42 5.4       43
4 -17.97 181.66   626 4.1       19
5 -20.42 181.96   649 4.0       11
6 -19.68 184.31   195 4.0       12

Dataset quakes; Magnitude VS. Reportings

model: \(\text{Reportings}= \beta_0 + \beta_1\cdot \text{Magnitude}+ \varepsilon; \hspace{1cm} \varepsilon \sim \mathcal{N} (0, \sigma^2)\)
  fitted: \(\widehat{\text{Reportings}}= \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Magnitude}\), where \(\hat{\beta}_0 = b_0\) is the estimate of \(\beta_0\) and \(\hat{\beta}_1 = b_1\) is the estimate of \(\beta_1\)
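The estimates \(b_0\) and \(b_1\) are least-squares point estimates. The sketch below computes them from the closed-form formulas \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). It then checks them against `lm()` for the regression of stations on magnitude used in the plotting code.

```r
data(quakes)
x <- quakes$mag       # Magnitude (predictor)
y <- quakes$stations  # Reportings (response)

# Least-squares point estimates from the closed-form formulas
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same estimates from lm(); sigma() estimates the error SD sigma
mod <- lm(stations ~ mag, data = quakes)
c(b0 = b0, b1 = b1)
coef(mod)
sigma(mod)
```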

R code for previous slide’s 2D Plotly Plot

  • Graph of Magnitude VS. Reportings
library(plotly)  # plot_ly(), add_lines(), layout(), config()
data(quakes)
# Regress the number of reporting stations on magnitude
mod <- lm(stations ~ mag, data = quakes)
x <- quakes$mag; y <- quakes$stations
# Axis titles
xax <- list(
  title = "Magnitude (Richter Scale)",
  titlefont = list(family = "Modern Computer Roman")
)
yax <- list(
  title = "Number of Stations Reporting",
  titlefont = list(family = "Modern Computer Roman")
)
# Scatter of the data plus the fitted regression line
fig <- plot_ly(x = x, y = y, type = "scatter", mode = "markers",
               name = "data", width = 800, height = 430) %>%
  add_lines(x = x, y = fitted(mod), name = "fitted") %>%
  layout(xaxis = xax, yaxis = yax,
         margin = list(l = 150, r = 50, b = 20, t = 40))
# Hide the Plotly logo in the mode bar
config(fig, displaylogo = FALSE)

3D Plotly Plot

R code for previous slide’s 3D Plotly Plot

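  • 3D graph of quake latitude, longitude, and magnitude.
  • Magnitude is converted to a factor and mapped to color. This intentionally produces more than 8 colors in the legend, so that variation in magnitude is easier to see.
  • Plotly's default discrete palette has only 8 colors, so this triggers a warning; for this reason, I turned off warnings for this particular code chunk.
  • The pipe operator %>% is useful for layering: it passes the plot on to the next function, here layout().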
# Axis titles for the 3D scene
xax2 <- list(
  title = "Latitude (degrees)",
  titlefont = list(family = "Modern Computer Roman")
)
yax2 <- list(
  title = "Longitude (degrees)",
  titlefont = list(family = "Modern Computer Roman")
)
zax2 <- list(
  title = "Magnitude (Richter Scale)",
  titlefont = list(family = "Modern Computer Roman")
)
# One color per distinct magnitude (factor levels), so more than 8 colors
plot_ly(data = quakes, x = quakes$lat, y = quakes$long, z = quakes$mag,
        type = "scatter3d", mode = "markers", color = as.factor(quakes$mag)) %>%
  layout(
    title = "3D Graph on Quake Longitude, Latitude, & Magnitude",
    scene = list(xaxis = xax2, yaxis = yax2, zaxis = zax2))
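This base-R sketch counts how many colors the `as.factor(quakes$mag)` mapping requests. It counts the distinct magnitudes, since each one becomes a factor level. The name `mag_levels` is illustrative.

```r
data(quakes)
# Each distinct magnitude becomes one factor level, i.e. one legend color
mag_levels <- levels(factor(quakes$mag))
length(mag_levels)  # number of colors requested from the palette
```

An alternative design is to pass the magnitude as a number instead of a factor. This should give a continuous color bar rather than one legend entry per magnitude.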

Mean of Depth and Magnitude

mean: \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\)

mean_mag <- round(mean(quakes$mag), 2)
mean_mag
[1] 4.62
mean_depth <- round(mean(quakes$depth), 2)
mean_depth
[1] 311.37
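This sketch applies the formula \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\) directly. It confirms that it agrees with `mean()`. The `_manual` names are illustrative.

```r
data(quakes)
n <- nrow(quakes)
# Apply x-bar = (1/n) * sum(x_i) directly
mean_mag_manual   <- sum(quakes$mag) / n
mean_depth_manual <- sum(quakes$depth) / n
# Should agree with mean_mag and mean_depth above
round(c(mag = mean_mag_manual, depth = mean_depth_manual), 2)
```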

Magnitude VS. Depth Regression ggplot

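library(ggplot2)
# Scatter of magnitude against depth, colored by magnitude
g <- ggplot(quakes, aes(x = depth, y = mag)) + geom_point(aes(color = mag))
# Least-squares line with a 99% confidence band (shaded)
g + geom_smooth(method = "lm", formula = y ~ x, level = 0.99) +
  labs(title = "Magnitude VS. Depth",
       x = "Depth (km)", y = "Magnitude (Richter Scale)") +
  theme_classic() +
  coord_cartesian(ylim = c(3.9, 6.5))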
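The shaded band from `geom_smooth(method = "lm", level = 0.99)` is a 99% confidence interval for the mean of the fitted line. This sketch reproduces a few points of that band with `predict()`. The depths 100, 300 and 500 km and the names `fit_md`, `depths` and `band` are illustrative.

```r
data(quakes)
# Same model geom_smooth(method = "lm") fits: mag regressed on depth
fit_md <- lm(mag ~ depth, data = quakes)

# 99% confidence interval for the mean magnitude at a few depths
depths <- data.frame(depth = c(100, 300, 500))
band <- predict(fit_md, newdata = depths, interval = "confidence", level = 0.99)
band  # columns: fit (line), lwr and upr (edges of the shaded band)
```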

Stations VS. Depth Regression ggplot

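# Scatter of reporting stations against depth, colored by magnitude
g1 <- ggplot(quakes, aes(x = depth, y = stations)) + geom_point(aes(color = mag))
# Least-squares line with a 99% confidence band (shaded)
g1 + geom_smooth(method = "lm", formula = y ~ x, level = 0.99) +
  labs(title = "Reportings VS. Depth",
       x = "Depth (km)", y = "Number of Stations Reporting") +
  theme_classic()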
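This sketch puts numbers on the trend in the plot. It fits the same `stations ~ depth` model and computes Pearson's correlation. The name `fit_sd` is illustrative. In simple linear regression with an intercept, \(R^2\) equals the squared correlation, so the last two lines can be compared.

```r
data(quakes)
fit_sd <- lm(stations ~ depth, data = quakes)
coef(fit_sd)                        # intercept and slope of the plotted line
cor(quakes$depth, quakes$stations)  # Pearson correlation r
summary(fit_sd)$r.squared           # equals r^2 in simple regression
```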