Let’s look at some additional details of regression analysis.

library(lehmansociology)
load("~/data/RMS_Titanic.Rda")
library(pander)

# Regress Survival on Gender, then on Gender and Crew
results2 <- lm(Survival ~ Gender, data = RMS_Titanic)
results3 <- lm(Survival ~ Gender + Crew, data = RMS_Titanic)

pander(results2)
Fitting linear model: Survival ~ Gender

              Estimate  Std. Error  t value  Pr(>|t|)
Gender          0.5258     0.02125    24.74  1.815e-119
(Intercept)     0.2067     0.00997    20.73  2.127e-87

summary(results2)$r.squared
## [1] 0.2171981
pander(results3)
Fitting linear model: Survival ~ Gender + Crew

              Estimate  Std. Error  t value  Pr(>|t|)
Gender          0.5416     0.02302    23.53  2.24e-109
Crew           0.03472     0.01944    1.786  0.07431
(Intercept)     0.1892     0.01398    13.54  3.541e-40

summary(results3)$r.squared
## [1] 0.2183283

This presents the regression results in tabular form. Notice two additional items: Pr(>|t|) and the R squared.

The “P values” in the Pr(>|t|) column represent the probability that a random sample would give you an estimate at least this far from 0, in either direction, if the true population value of the coefficient is 0 (that is, if the null hypothesis is true). You will see p values in many articles. The smaller the p value, the less likely it is that you would get a sample like this from a population where the coefficient is 0. Researchers often use .05, .01, and .001 as cutoff values for saying a result is “statistically significant.” A p value below the cutoff means that they can “reject the null hypothesis,” but they should never say they “accept” a different hypothesis. In the tables above, for example, the p value for Crew is 0.07431, which is above .05, so the null hypothesis for Crew is not rejected at that cutoff, while the p values for Gender are far below .001. In this kind of research “reject the null” is the strongest statement you can make. You never say your hypothesis is proved.
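
To make the Pr(>|t|) column less of a black box, here is a minimal sketch of how a p value can be recomputed from the t value. It uses simulated data, not the RMS_Titanic data set, so the variable names (`x`, `y`, `fit`) and the numbers are hypothetical. The idea is that the p value is the two-sided tail area of a t distribution whose degrees of freedom are the residual degrees of freedom of the model.

```r
# Hypothetical simulated data; this is NOT the RMS_Titanic data set.
set.seed(1)
n <- 200
x <- rbinom(n, 1, 0.5)            # a 0/1 independent variable
y <- rbinom(n, 1, 0.2 + 0.5 * x)  # a 0/1 outcome that depends on x

fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients

# t value reported for the slope on x
t_value <- coefs["x", "t value"]

# Two-sided p value: area in both tails beyond |t| under the null hypothesis
p_manual <- 2 * pt(abs(t_value), df = df.residual(fit), lower.tail = FALSE)

# p value reported by summary()
p_reported <- coefs["x", "Pr(>|t|)"]

# The two values should agree
all.equal(p_manual, p_reported)
```

The factor of 2 is what makes the test two-sided: a large negative estimate counts as evidence against the null hypothesis just as much as a large positive one.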

\(R^2\) represents the proportion of the variance in the dependent variable that is explained by the model. It goes from 0 to 1, with 1 meaning that the independent variables give you predicted \(y\) values that are all exactly equal to the \(y\) values in the data. An \(R^2\) of 0 would mean that there is no linear relationship in the sample or, in other words, that every estimated slope coefficient is 0. Here, adding Crew raises \(R^2\) only from 0.2171981 to 0.2183283.
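
As a rough illustration of “proportion of variance explained,” the sketch below recomputes \(R^2\) by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. It again uses simulated data rather than RMS_Titanic, so all names and values are hypothetical, and it assumes a model that includes an intercept.

```r
# Hypothetical simulated data; this is NOT the RMS_Titanic data set.
set.seed(2)
n <- 200
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.2 + 0.5 * x)

fit <- lm(y ~ x)

# Variation left over after using the model's predictions
ss_res <- sum(residuals(fit)^2)

# Total variation of y around its mean
ss_tot <- sum((y - mean(y))^2)

# Proportion of the total variation that the model accounts for
r2_manual <- 1 - ss_res / ss_tot

# Should agree with the value reported by summary()
all.equal(r2_manual, summary(fit)$r.squared)
```

If the predictions matched the data exactly, `ss_res` would be 0 and \(R^2\) would be 1. If the model predicted no better than the mean of \(y\), `ss_res` would equal `ss_tot` and \(R^2\) would be 0.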

How to make nice math notation.

\(\hat{y}\) $\hat{y}$

\(R^2\) $R^2$