2026-06-07

What is Linear Regression, and why is it useful?

Linear regression is a method for modeling the relationship between a dependent (response) variable and an independent (predictor) variable.

We estimate the parameters of a linear function from a data set, and the fitted line can then be used to make predictions for similar data.

In this presentation, we take the distances and velocities of 100 known galaxies and use them, together with Hubble’s Law, to demonstrate the expansion of our universe.

What is Hubble’s Law?

Hubble’s Law is the fundamental principle that describes the relationship between a galaxy’s distance from Earth and its velocity.

Edwin Hubble discovered that galaxies are receding from us at velocities proportional to their distances, which implies that the universe itself is expanding. Later observations showed that this expansion is accelerating, an effect attributed to an unknown form of energy we call dark energy.

The equation: \[ v = H_0 d \]
where:
\(d\) = distance (Mpc)
\(v\) = recession velocity (km/s)
\(H_0\) = Hubble constant (typically about 73-74 km/s/Mpc in recent measurements of the nearby universe)
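As a quick worked example of the equation above, the sketch below evaluates Hubble’s Law for a few distances. The function name `hubble_velocity`, the chosen distances, and the value \(H_0 = 73\) km/s/Mpc are assumptions made for illustration; they are not taken from the data set.

```r
# Hypothetical helper: recession velocity (km/s) from distance (Mpc),
# assuming H0 = 73 km/s/Mpc (an illustrative value, not a fitted one).
hubble_velocity <- function(d_mpc, H0 = 73) {
  H0 * d_mpc
}

distances <- c(10, 100, 271)               # distances in Mpc
velocities <- hubble_velocity(distances)   # velocities in km/s
print(velocities)                          # 730, 7300, 19783
```

Under this assumed constant, a galaxy ten times farther away recedes ten times faster, which is the linear relationship the regression later tests.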

We use galaxies’ redshift to measure their velocity of recession, or the velocity at which they move away from the observer.

The Data

The following data set was created for this presentation from values collected from the public NASA/IPAC Extragalactic Database (NED). Variables include: galaxy, distance (megaparsecs, Mpc), and velocity (km/s).

##            Galaxy Distance_Mpc Velocity_kms
## 1       UGC 12914        51.80         4545
## 2        NGC 7814        12.20         1204
## 3  ESO 349- G 031         3.21          195
## 4 2dFGRS S839Z607       271.00        17665
## 5        NGC 0048        60.80         1972
## 6        NGC 0045        10.60          493

GGPlot Scatterplot

library(ggplot2)
# Scatterplot of recession velocity against distance, one point per galaxy
ggplot(Galaxies, aes(x = Distance_Mpc, y = Velocity_kms)) + geom_point()

Regression Model

\[ y = \beta_0 + \beta_1 x \]
where:
\(\beta_0\) = intercept
\(\beta_1\) = slope
\(x\) = predictor variable
\(y\) = response variable

Fitted model: \[ \hat{v} = 274.95 + 72.98d \]

## 
## Call:
## lm(formula = Velocity_kms ~ Distance_Mpc, data = Galaxies)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2740.5  -350.5  -207.7   196.4  4179.3 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    274.95     142.18   1.934    0.056 .  
## Distance_Mpc    72.98       2.07  35.261   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1100 on 97 degrees of freedom
## Multiple R-squared:  0.9276, Adjusted R-squared:  0.9269 
## F-statistic:  1243 on 1 and 97 DF,  p-value: < 2.2e-16
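The call shown in the summary can be reproduced along the following lines. Because the full data set is not included here, this sketch rebuilds only the six rows displayed in the data preview (the data frame name `galaxies_head` and the 100 Mpc example galaxy are made up for illustration), so its coefficients will differ from the full-data fit above.

```r
# Only the six rows printed in the data preview; not the full data set.
galaxies_head <- data.frame(
  Galaxy = c("UGC 12914", "NGC 7814", "ESO 349- G 031",
             "2dFGRS S839Z607", "NGC 0048", "NGC 0045"),
  Distance_Mpc = c(51.80, 12.20, 3.21, 271.00, 60.80, 10.60),
  Velocity_kms = c(4545, 1204, 195, 17665, 1972, 493)
)

# Ordinary least squares fit of velocity on distance
fit <- lm(Velocity_kms ~ Distance_Mpc, data = galaxies_head)
print(coef(fit))   # intercept and slope; the slope plays the role of H0

# Predicted velocity for a hypothetical galaxy at 100 Mpc
new_galaxy <- data.frame(Distance_Mpc = 100)
predicted <- predict(fit, newdata = new_galaxy)
print(predicted)
```

Calling `summary(fit)` on this object prints a table in the same layout as the output above.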

GGPlot Regression Line

ggplot(Galaxies, aes(x=Distance_Mpc,y=Velocity_kms))+geom_point()+
  geom_smooth(method="lm",se=FALSE)+
  labs(title="Hubble's Law",x="Distance (Mpc)",y="Velocity (km/s)")

Plotly 3D Visualization

Redshift was approximated using \[ z \approx \frac{v}{c} \] where \(c\) is the speed of light. This approximation holds for velocities much smaller than \(c\).
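The approximation can be applied directly to the velocity column. The sketch below does this for the six velocities shown in the data preview; the variable names are illustrative, and the speed of light is entered in km/s so that the units cancel.

```r
c_kms <- 299792.458   # speed of light in km/s

# Velocities (km/s) of the six galaxies shown in the data preview
velocity_kms <- c(4545, 1204, 195, 17665, 1972, 493)

# Low-velocity approximation: z is roughly v / c
redshift <- velocity_kms / c_kms
print(round(redshift, 4))
```

All six values are well below 0.1, the regime in which the approximation is commonly treated as reasonable.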

Statistical Interpretation

The fitted regression line has an estimated slope of 72.98 km/s/Mpc, close to the commonly quoted Hubble constant of about 73-74 km/s/Mpc, the constant that describes the expansion rate of our universe! The positive slope also shows that more distant galaxies tend to have higher recessional velocities.

Our R-squared value, 0.928, indicates that about 92.8% of the variation in galaxy velocity is explained by distance. This illustrates the value of linear regression for estimating relationships.
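To make the meaning of R-squared concrete, the sketch below computes it by hand as one minus the ratio of residual to total sum of squares, and compares it with the value R reports. It uses only the six previewed rows (under the illustrative name `galaxies_head`), so the number it prints will not equal the 0.928 from the full fit.

```r
# Six previewed rows only; an illustrative subset, not the full data set.
galaxies_head <- data.frame(
  Distance_Mpc = c(51.80, 12.20, 3.21, 271.00, 60.80, 10.60),
  Velocity_kms = c(4545, 1204, 195, 17665, 1972, 493)
)
fit <- lm(Velocity_kms ~ Distance_Mpc, data = galaxies_head)

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((galaxies_head$Velocity_kms - mean(galaxies_head$Velocity_kms))^2)
r2_manual <- 1 - ss_res / ss_tot

# The manual value should agree with the one reported by summary()
print(c(manual = r2_manual, summary = summary(fit)$r.squared))
```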

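Note: Our summary also reports a p-value of <2e-16 (less than 0.0000000000000002) for the slope, which is strong evidence that the relationship between our two variables is not due to chance. The intercept (274.95 km/s, p = 0.056) is not significantly different from zero at the 5% level, which is consistent with Hubble’s Law having no intercept term.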
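The t values and p-values in the coefficient table can be recomputed from the estimates and standard errors. The sketch below uses the rounded numbers printed in the summary, so small differences from the reported t values (1.934 and 35.261) are expected; the variable names are illustrative.

```r
# Values copied from the coefficient table in the summary output
estimate  <- c(intercept = 274.95, slope = 72.98)
std_error <- c(intercept = 142.18, slope = 2.07)
df_resid  <- 97   # residual degrees of freedom from the summary

# t statistic = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(p_value)
```

The slope's p-value falls far below the 2e-16 threshold that R prints, while the intercept's p-value lands near 0.056.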

Conclusion

Using linear regression, we were able to explore one of the most groundbreaking discoveries in astrophysics and modern astronomy: the expansion of the universe. We were also able to describe, mathematically and graphically, the relationship between our predictor and response variables, distance and recessional velocity.