Let’s look at some additional details of regression analysis.

library(lehmansociology)
load("~/data/RMS_Titanic.Rda")

library(pander)

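# Model 1: survival predicted by gender alone
results2 <- lm(Survival ~ Gender, data = RMS_Titanic)
# Model 2: add crew membership as a second predictor
results3 <- lm(Survival ~ Gender + Crew, data = RMS_Titanic)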

pander(results2)

Fitting linear model: Survival ~ Gender
|   | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Gender | 0.5258 | 0.02125 | 24.74 | 1.815e-119 |
| (Intercept) | 0.2067 | 0.00997 | 20.73 | 2.127e-87 |

summary(results2)$r.squared

## [1] 0.2171981

pander(results3)

Fitting linear model: Survival ~ Gender + Crew
|   | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Gender | 0.5416 | 0.02302 | 23.53 | 2.24e-109 |
| Crew | 0.03472 | 0.01944 | 1.786 | 0.07431 |
| (Intercept) | 0.1892 | 0.01398 | 13.54 | 3.541e-40 |

summary(results3)$r.squared

## [1] 0.2183283

The output above presents the regression results in tabular form. Notice two additional items: Pr(>|t|) and the R squared.

The “P values” in the Pr(>|t|) column represent the probability that a random sample would give you an estimate at least this far from 0 (in either direction) if the true population value for the coefficient is 0, that is, if the null hypothesis is true. You will see P values in many articles. The smaller the P value, the less likely it is that you would get a sample like this from a population where the coefficient is 0. Researchers often use .05, .01, and .001 as cutoff values for saying a result is “statistically significant.” Here the P value for Gender is far below .001, while the P value for Crew (0.07431) is above .05, so by the usual cutoff Crew is not statistically significant. When a P value falls below the cutoff, researchers “reject the null hypothesis,” but they should never say they “accept” a different hypothesis. In this kind of research “reject the null” is the strongest statement you can make. You never say your hypothesis is proved.
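To make the definition concrete, here is a minimal sketch that recomputes the P values by hand from the t values. It assumes R's built-in `Titanic` table as a stand-in for the course's RMS_Titanic file, so the numbers will not match the output above exactly. The names `people`, `fit`, `p_by_hand`, and `p_reported` are made up for this illustration.

```r
# Stand-in data (assumption): R's built-in Titanic table,
# expanded so that there is one row per person.
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq), ]

# Recode to 0/1 variables like the ones used in the models above
people$Survival <- as.numeric(people$Survived == "Yes")
people$Gender   <- as.numeric(people$Sex == "Female")
people$Crew     <- as.numeric(people$Class == "Crew")

fit   <- lm(Survival ~ Gender + Crew, data = people)
coefs <- summary(fit)$coefficients

# Two-sided P value by hand: the chance of a t value at least this
# far from 0, in either direction, if the true coefficient is 0
t_values  <- coefs[, "t value"]
p_by_hand <- 2 * pt(abs(t_values), df = df.residual(fit), lower.tail = FALSE)

# The P values R reports, checked against the usual cutoffs
p_reported <- coefs[, "Pr(>|t|)"]
data.frame(
  p_reported,
  p_by_hand,
  below_05  = p_reported < 0.05,
  below_01  = p_reported < 0.01,
  below_001 = p_reported < 0.001
)
```

The two P value columns should agree, which shows that Pr(>|t|) is just the tail probability of the t value under the null hypothesis.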

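$R^2$ represents the proportion of variance in the dependent variable explained by the model. It goes from 0 to 1, with 1 meaning that the independent variables give you predicted $y$ values that are all exactly equal to the $y$ values in the data. An $R^2$ of 0 would mean that there is no linear relationship or, in other words, that every estimated slope coefficient is 0. Here, Gender alone explains about 22% of the variance in Survival, and adding Crew barely changes that (0.2172 versus 0.2183).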
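As a rough illustration of “proportion of variance explained,” the sketch below computes $R^2$ by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, then compares it with the value from `summary()`. It again assumes R's built-in `Titanic` table as a stand-in for the course data, so the result will differ somewhat from the 0.2172 shown above. The variable names are made up for this example.

```r
# Stand-in data (assumption): built-in Titanic table, one row per person
counts <- as.data.frame(Titanic)
people <- counts[rep(seq_len(nrow(counts)), counts$Freq), ]
people$Survival <- as.numeric(people$Survived == "Yes")
people$Gender   <- as.numeric(people$Sex == "Female")

fit <- lm(Survival ~ Gender, data = people)

# Variation left over after the model makes its predictions
ss_residual <- sum(residuals(fit)^2)

# Total variation of Survival around its mean
ss_total <- sum((people$Survival - mean(people$Survival))^2)

# Proportion of the total variation that the model explains
r2_by_hand <- 1 - ss_residual / ss_total

c(by_hand = r2_by_hand, from_summary = summary(fit)$r.squared)
```

If the predictions matched the data exactly, `ss_residual` would be 0 and $R^2$ would be 1. If the model predicted no better than the mean, `ss_residual` would equal `ss_total` and $R^2$ would be 0.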

How to make nice math notation: wrap the LaTeX code in dollar signs.

`$\hat{y}$` renders as $\hat{y}$

`$R^2$` renders as $R^2$