Let’s look at some additional details of regression analysis.
library(lehmansociology)
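load("~/data/RMS_Titanic.Rda")  # loads the RMS_Titanic data
library(pander)  # pander() formats model results as tables
results2 <- lm(Survival ~ Gender, data = RMS_Titanic)  # one independent variable
results3 <- lm(Survival ~ Gender + Crew, data = RMS_Titanic)  # adds Crew as a second independent variable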
pander(results2)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Gender | 0.5258 | 0.02125 | 24.74 | 1.815e-119 |
| (Intercept) | 0.2067 | 0.00997 | 20.73 | 2.127e-87 |
summary(results2)$r.squared
## [1] 0.2171981
pander(results3)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Gender | 0.5416 | 0.02302 | 23.53 | 2.24e-109 |
| Crew | 0.03472 | 0.01944 | 1.786 | 0.07431 |
| (Intercept) | 0.1892 | 0.01398 | 13.54 | 3.541e-40 |
summary(results3)$r.squared
## [1] 0.2183283
This presents the regression results in tabular form. Notice two additional items: Pr(>|t|) and the R squared.
The “P values” in the Pr(>|t|) column represent the probability that a random sample from a population would give you an estimate at least this far from 0, in either direction, if the true population value of the coefficient is 0 (that is, if the null hypothesis is true). You will see P values in many articles. The smaller the P value, the less likely it is that you would get a sample like this from a population where the coefficient is 0. Researchers often use .05, .01, and .001 as cutoff values for saying a result is “statistically significant.” That means that they can “reject the null hypothesis,” even though they should never say they “accept” a different hypothesis. For example, the P value for Crew above is 0.07431, which is larger than .05, so that coefficient would not be called statistically significant at the .05 level. In this kind of research “reject the null” is the strongest statement you can make. You never say your hypothesis is proved.
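To see where the t value and Pr(>|t|) columns come from, it can help to rebuild them by hand. The t value is the estimate divided by its standard error; in the first table above, 0.5258 / 0.02125 is about 24.74. The sketch below uses simulated data rather than the Titanic data, so the variable names (`x`, `y`, `fit`) and the numbers it produces are purely illustrative.

```r
# Simulated data for illustration only (not the Titanic data used above)
set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.5)          # hypothetical 0/1 independent variable
y <- 0.2 + 0.5 * x + rnorm(n)   # hypothetical outcome with a true slope of 0.5

fit <- lm(y ~ x)
coefs <- coef(summary(fit))     # matrix: Estimate, Std. Error, t value, Pr(>|t|)

# t value = estimate divided by its standard error
t_manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided P value: chance of a t statistic at least this far from 0,
# in either direction, if the true coefficient is 0
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit), lower.tail = FALSE)

# Put the hand calculation next to what lm() reports
cbind(t_reported = coefs[, "t value"], t_manual,
      p_reported = coefs[, "Pr(>|t|)"], p_manual)
```

The reported and hand-calculated columns should agree, which shows that the P value depends only on the t value and the residual degrees of freedom.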
\(R^2\) represents the proportion of variance in the dependent variable that is explained by the model. It goes from 0 to 1, with 1 meaning that the independent variables give you predicted values \(\hat{y}\) that are all exactly equal to the \(y\) values in the data. An \(R^2\) of 0 would mean that there is no linear relationship or, in other words, that the coefficient on each independent variable is 0. In the results above, adding Crew raises \(R^2\) only from about 0.217 to about 0.218.
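"Proportion of variance explained" can be checked directly: compare the variation left over in the residuals with the total variation of \(y\) around its mean. The following is a minimal sketch on simulated data (again not the Titanic data; `x`, `y`, and `fit` are made-up names) that recomputes \(R^2\) two ways and compares both with the value from `summary()`.

```r
# Simulated data for illustration only (not the Titanic data used above)
set.seed(2)
n <- 200
x <- rbinom(n, 1, 0.5)          # hypothetical 0/1 independent variable
y <- 0.2 + 0.5 * x + rnorm(n)   # hypothetical outcome

fit <- lm(y ~ x)

ss_residual <- sum(residuals(fit)^2)   # variation the model leaves unexplained
ss_total <- sum((y - mean(y))^2)       # total variation of y around its mean
r2_manual <- 1 - ss_residual / ss_total

# For a model with an intercept, R squared also equals the squared
# correlation between the predicted and the observed y values
r2_cor <- cor(fitted(fit), y)^2

c(reported = summary(fit)$r.squared, manual = r2_manual, correlation = r2_cor)
```

If every predicted value matched the data exactly, `ss_residual` would be 0 and \(R^2\) would be 1.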