Edwin Hubble’s Dataset of Extra-Galactic Nebulae
[Reference: Hubble, E. (1929) “A Relation Between Distance and Radial Velocity among Extra-Galactic Nebulae,” Proceedings of the National Academy of Sciences, 15, 168. http://lib.stat.cmu.edu/DASL/Datafiles/Hubble.html]
In this analysis, the independent variable is the distance of each extra-galactic nebula from Earth (in megaparsecs), and the dependent variable is its recession velocity (in km/sec).
##Load in the Hubble Dataset
#Get dataset from Project Documents File
hubble <- read.csv("~/Academics (RPI)/10. Spring 2015/Applied Regression Analysis/Assignments/Assignment #1/Hubble.csv", header=TRUE)
head(hubble)
## distance recession_velocity
## 1 0.032 170
## 2 0.034 290
## 3 0.214 -130
## 4 0.263 -70
## 5 0.275 -185
## 6 0.275 -220
tail(hubble)
## distance recession_velocity
## 19 1.4 500
## 20 1.7 960
## 21 2.0 500
## 22 2.0 850
## 23 2.0 800
## 24 2.0 1090
In this analysis, we are trying to determine whether the variation observed in the response variable (‘recession_velocity’) can be explained by the variation in the single explanatory variable (‘distance’). The null hypothesis is therefore that the distance of an extra-galactic nebula from Earth has no linear effect on its recession velocity, i.e. that the slope of the regression line is zero.
To test this, we fit a simple linear regression of ‘recession_velocity’ on ‘distance’ using the “lm()” function. The t-test on the slope coefficient in the model summary indicates whether the distance of the nebulae from Earth explains a significant share of the variation in their recession velocity.
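As a minimal sketch of the fitting step, the formula can also be written with a `data =` argument so that the coefficient is labelled by the column name alone. The data frame `hubble_subset` and the model name `subset_model` are illustrative names that do not appear in the original analysis. The subset contains only the 12 rows printed by head() and tail() in this report, so its estimates will differ from those of the full 24-row model.

```r
# Illustrative subset: only the 12 rows shown by head(hubble) and tail(hubble)
# in this report, NOT the full 24-row dataset.
hubble_subset <- data.frame(
  distance = c(0.032, 0.034, 0.214, 0.263, 0.275, 0.275,
               1.4, 1.7, 2.0, 2.0, 2.0, 2.0),
  recession_velocity = c(170, 290, -130, -70, -185, -220,
                         500, 960, 500, 850, 800, 1090)
)

# Formula interface with data =: column names are looked up in the data frame
subset_model <- lm(recession_velocity ~ distance, data = hubble_subset)

# Coefficient table: estimate, standard error, t value and p-value
coef(summary(subset_model))
```

With this form the slope row is named `distance` rather than `hubble$distance`, which tends to make later calls such as confint() or predict() easier to write.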
summary(hubble)
## distance recession_velocity
## Min. :0.0320 Min. :-220.0
## 1st Qu.:0.4062 1st Qu.: 165.0
## Median :0.9000 Median : 295.0
## Mean :0.9114 Mean : 373.1
## 3rd Qu.:1.1750 3rd Qu.: 537.5
## Max. :2.0000 Max. :1090.0
str(hubble)
## 'data.frame': 24 obs. of 2 variables:
## $ distance : num 0.032 0.034 0.214 0.263 0.275 0.275 0.45 0.5 0.5 0.63 ...
## $ recession_velocity: int 170 290 -130 -70 -185 -220 200 290 270 200 ...
hubble_model <- lm(hubble$recession_velocity~hubble$distance)
summary(hubble_model)
##
## Call:
## lm(formula = hubble$recession_velocity ~ hubble$distance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -397.96 -158.10 -13.16 148.09 506.63
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -40.78 83.44 -0.489 0.63
## hubble$distance 454.16 75.24 6.036 4.48e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 232.9 on 22 degrees of freedom
## Multiple R-squared: 0.6235, Adjusted R-squared: 0.6064
## F-statistic: 36.44 on 1 and 22 DF, p-value: 4.477e-06
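The slope test in the summary above can be reproduced by hand as a rough check. The sketch below copies the rounded estimate and standard error from the printed output, so its results should agree with the reported t value, p-value and F-statistic only up to rounding. The variable names are illustrative.

```r
# Values copied (rounded) from the summary(hubble_model) output above
slope_estimate <- 454.16
slope_se <- 75.24
residual_df <- 22  # 24 observations minus 2 estimated coefficients

# t statistic for H0: slope = 0
t_stat <- slope_estimate / slope_se

# Two-sided p-value from the t distribution with 22 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = residual_df)

# With a single predictor, the overall F statistic is the squared slope t statistic
f_stat <- t_stat^2

round(t_stat, 3)
signif(p_value, 3)
round(f_stat, 2)
```

Because the p-value is far below 0.05, the null hypothesis of a zero slope is rejected at the 5% level.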
In 1929, Edwin Hubble investigated the relationship between the distance from Earth and the radial velocity of extra-galactic nebulae (celestial objects) to see whether the two were significantly related. To do so, Hubble used data on 24 nebulae, comprising the distance from Earth (in megaparsecs) and the recession velocity (in km/sec) of each nebula.
plot(y = hubble$recession_velocity,x = hubble$distance, col="red", main="Nebulae Distance from Earth vs. Nebulae Recession Velocity", ylab = "Recession Velocity (in km/s)", xlab = "Distance from Earth (in Megaparsecs)")
plot(y = hubble$recession_velocity,x = hubble$distance, col="red", main="Nebulae Distance from Earth vs. Nebulae Recession Velocity", ylab = "Recession Velocity (in km/s)", xlab = "Distance from Earth (in Megaparsecs)")
abline(hubble_model)
confint(hubble_model, 'hubble$distance', level=0.95)
## 2.5 % 97.5 %
## hubble$distance 298.1262 610.1906
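The interval returned by confint() follows the usual form, estimate ± t critical value × standard error. The sketch below recomputes it from the rounded values in the model summary, so it should match the interval above only up to rounding. The variable names are illustrative.

```r
# Values copied (rounded) from the summary(hubble_model) output
slope_estimate <- 454.16
slope_se <- 75.24
residual_df <- 22

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df = residual_df)

# Lower and upper bounds of the 95% confidence interval for the slope
slope_ci <- slope_estimate + c(-1, 1) * t_crit * slope_se
slope_ci
```

The interval does not contain zero, which is consistent with rejecting the null hypothesis of no linear relationship at the 5% level.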
(To be completed on final version of Assignment #1.)