Here are some example R commands.
I created a data file for R with all of the labels and groups predefined for participants' demographics. Wrapping the "load" command in parentheses tells R to print the name of the object it loaded, here a data table (called a data frame by R).
(load("~/Desktop/w00Demographics.rdata"))
[1] "w00Dem"
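If you want to try the parentheses trick without the HANDLS file, here is a minimal sketch. The data frame toyDem and the temporary file toyFile are made up for illustration; they are not part of the real data.

```r
# Hypothetical toy data frame standing in for the real demographics file
toyDem <- data.frame(
  Age0 = c(34, 41, 52, 60),
  Sex  = factor(c("Women", "Men", "Women", "Men"), levels = c("Women", "Men"))
)

# Save it to a temporary .rdata file, then remove it from the workspace
toyFile <- tempfile(fileext = ".rdata")
save(toyDem, file = toyFile)
rm(toyDem)

# load() restores the object and invisibly returns the names it restored;
# the outer parentheses make R print that return value
(loadedNames <- load(toyFile))

# The data frame is available again under its original name
str(toyDem)
```

Without the outer parentheses the file loads just the same, but R prints nothing, so you would have to run ls() to find out what the object is called.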
Let's see what's in the data. A summary command prints the minimum, quartiles, median, mean, and maximum for each numeric variable, or the number of participants in each group for categorical variables.
summary(w00Dem)
HNDid               DOB                 Age0          Age0grp    Age0med     Sex         Race        PovStat
Min.   :8031042901  Min.   :1939-07-28  Min.   :30.0  30-34:392  Below:1860  Women:2035  White:1522  Above:2185
1st Qu.:8133082301  1st Qu.:1950-11-27  1st Qu.:40.0  35-39:461  Above:1860  Men  :1685  AfrAm:2198  Below:1535
Median :8162502101  Median :1958-02-09  Median :48.0  40-44:509
Mean   :8160956892  Mean   :1958-07-25  Mean   :47.7  45-39:692
3rd Qu.:8192566076  3rd Qu.:1966-01-17  3rd Qu.:55.0  50-54:631
Max.   :8224521902  Max.   :1978-07-21  Max.   :64.0  55-59:577
                                                      60-64:458
The summary command shows the distribution of values for each variable separately. What if we want to know how many women in our sample are African American?
with(w00Dem, table(Race, Sex))
       Sex
Race    Women  Men
  White   835  687
  AfrAm  1200  998
Can you show how many men are in the below poverty status group?
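One way to approach this is to cross-tabulate poverty status by sex and read off a single cell. The sketch below uses a tiny made-up data frame (toyDem is hypothetical, and its counts have nothing to do with HANDLS); swap in w00Dem to get the real answer.

```r
# Hypothetical toy data: six participants with sex and poverty status
toyDem <- data.frame(
  Sex     = factor(c("Women", "Men", "Women", "Men", "Men", "Women"),
                   levels = c("Women", "Men")),
  PovStat = factor(c("Above", "Below", "Below", "Below", "Above", "Above"),
                   levels = c("Above", "Below"))
)

# Cross-tabulate poverty status (rows) by sex (columns)
povBySex <- with(toyDem, table(PovStat, Sex))
povBySex

# Pull out one cell by its row and column labels
menBelow <- povBySex["Below", "Men"]
menBelow  # 2 in this toy data

# Add row and column totals to the table
addmargins(povBySex)
```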
You already know something about t-tests. In a t-test we ask whether a measure (e.g., age) is different in two groups. For example, do men and women have different mean ages in HANDLS?
In R there are two equivalent forms for the t-test command, but the simplest is:
with(w00Dem, t.test(Age0 ~ Sex))
        Welch Two Sample t-test

data:  Age0 by Sex
t = 0.7444, df = 3615, p-value = 0.4567
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3736  0.8310
sample estimates:
mean in group Women   mean in group Men
              47.82               47.59
So what does this tell us? Are there statistically significant differences in the mean ages for men and women in HANDLS?
The notation with the tilde character ("~") says that we are examining age (the outcome or dependent variable) as a function of sex (the predictor, the grouping factor, the independent variable).
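To see that the two forms of the t-test command really are equivalent, here is a small sketch that runs both on the same data. The data frame toyDem is simulated and purely hypothetical; only the comparison between the two calls matters.

```r
# Hypothetical simulated data: 20 women and 20 men with ages between 30 and 64
set.seed(1)
toyDem <- data.frame(
  Age0 = round(runif(40, min = 30, max = 64)),
  Sex  = factor(rep(c("Women", "Men"), each = 20), levels = c("Women", "Men"))
)

# Form 1: formula notation, outcome ~ grouping factor
tFormula <- t.test(Age0 ~ Sex, data = toyDem)

# Form 2: hand t.test() the two groups as separate vectors
tVectors <- with(toyDem, t.test(Age0[Sex == "Women"], Age0[Sex == "Men"]))

# Both calls run the same Welch test, so the t statistics and p-values agree
c(formula = unname(tFormula$statistic), vectors = unname(tVectors$statistic))
c(formula = tFormula$p.value, vectors = tVectors$p.value)
```

The formula form is shorter and labels the output with the group names, which is why it is usually the more convenient of the two.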
What about age differences by poverty status?
with(w00Dem, t.test(Age0 ~ PovStat))
        Welch Two Sample t-test

data:  Age0 by PovStat
t = 2.491, df = 3363, p-value = 0.01277
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1642 1.3773
sample estimates:
mean in group Above mean in group Below
              48.03               47.26
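In the output above the p-value (0.01277) is below 0.05 and the 95 percent confidence interval does not include 0; those two facts always go together for a two-sided test. If you want to check this yourself rather than read it off the printout, the pieces can be pulled out of the result object. A minimal sketch, using simulated (hypothetical) data in toyDem and an assumed cutoff of 0.05:

```r
# Hypothetical simulated data: 60 participants, alternating poverty status
set.seed(2)
toyDem <- data.frame(
  Age0    = round(runif(60, min = 30, max = 64)),
  PovStat = factor(rep(c("Above", "Below"), times = 30),
                   levels = c("Above", "Below"))
)

# Store the test result instead of just printing it
tPov <- t.test(Age0 ~ PovStat, data = toyDem)

tPov$estimate  # the two group means
tPov$conf.int  # 95% confidence interval for the difference (Above minus Below)
tPov$p.value   # two-sided p-value

# Compare the p-value with the conventional cutoff (an assumption here)
alpha <- 0.05
pSignificant   <- tPov$p.value < alpha
ciExcludesZero <- tPov$conf.int[1] > 0 || tPov$conf.int[2] < 0

# These two answers agree with each other
c(pSignificant = pSignificant, ciExcludesZero = ciExcludesZero)
```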
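Linear regression with a single two-group predictor is an equivalent form of a t-test. (Strictly, it matches the equal-variance t-test; the Welch test used above does not assume equal variances, so the numbers differ very slightly.) Do you see some similarities between this analysis and the t-test of age by sex above?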
reg1 <- lm(Age0 ~ Sex, data = w00Dem)
summary(reg1)
Call:
lm(formula = Age0 ~ Sex, data = w00Dem)

Residuals:
Initial age
    Min      1Q  Median      3Q     Max
-17.816  -7.588   0.412   7.412  16.412

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   47.816      0.207  230.81   <2e-16
SexMen        -0.229      0.308   -0.74     0.46

Residual standard error: 9.35 on 3718 degrees of freedom
Multiple R-squared:  0.000148,  Adjusted R-squared:  -0.00012
F-statistic: 0.552 on 1 and 3718 DF,  p-value: 0.458
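Notice that the intercept (47.816) is the mean age for women, and the SexMen coefficient (-0.229) is the men's mean minus the women's mean, which is why its t value has the opposite sign from the t-test. The sketch below checks the link between the two analyses on simulated data. The data frame toyDem is hypothetical, and var.equal = TRUE asks t.test() for the equal-variance version, which is the one that matches the regression exactly.

```r
# Hypothetical simulated data: 25 women and 25 men
set.seed(3)
toyDem <- data.frame(
  Age0 = round(runif(50, min = 30, max = 64)),
  Sex  = factor(rep(c("Women", "Men"), each = 25), levels = c("Women", "Men"))
)

# Equal-variance (pooled) t-test and the default Welch t-test
tPooled <- t.test(Age0 ~ Sex, data = toyDem, var.equal = TRUE)
tWelch  <- t.test(Age0 ~ Sex, data = toyDem)

# The same comparison as a linear regression
regToy  <- lm(Age0 ~ Sex, data = toyDem)
regCoef <- summary(regToy)$coefficients

# The intercept is the mean of the first group (Women)
coef(regToy)["(Intercept)"]
mean(toyDem$Age0[toyDem$Sex == "Women"])

# The slope's t value equals the pooled t statistic with the sign flipped,
# because t.test() reports Women minus Men and lm() reports Men minus Women
c(pooled = unname(tPooled$statistic),
  welch  = unname(tWelch$statistic),
  lm     = regCoef["SexMen", "t value"])

# The p-values from the pooled t-test and the regression are identical
c(pooled = tPooled$p.value, lm = regCoef["SexMen", "Pr(>|t|)"])
```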