suppressPackageStartupMessages(library(readxl))
url <- "https://plos.figshare.com/ndownloader/files/22322502"
destfile <- "gaodata.xls"
curl::curl_download(url, destfile)
gaodata <- read_excel(destfile)
suppressPackageStartupMessages(library(tidyverse))
gaodata_recode <- mutate(
  gaodata,
  # Translate the province names to English
  nplace = recode_factor(
    nplace,
    "河北" = "Hebei",
    "黑龙" = "Heilongjiang",
    "广西" = "Guangxi",
    "上海" = "Shanghai",
    "河南" = "Henan"
  ),
  # Translate the depression status labels to English
  depression = recode_factor(
    depression,
    "无抑郁" = "No depression",
    "有抑郁" = "Have depression"
  )
)
Here is some additional recoding for this dataset. It is taken out of context, so you at least have to do some work to incorporate it into your program :)
education = recode_factor(edu, "1" = "Middle school", "2" = "High School", "3" = "College", "4" = "Master")
occupation = recode_factor(ocupn, "0" = "Health care workers", "1" = "Students/retired", "2" = "Others")
area = recode_factor(pltype, "2" = "Urban", "3" = "Rural")
age = recode_factor(cage, "0" = "<=20", "1" = "21-30", "2" = "31-40", "3" = "41-50", "4" = ">=51")
marriage = recode_factor(marr, "0" = "Not married", "2" = "Married")
cities = recode_factor(nnplace, "0" = "Hubei", "1" = "Others")
anxiety = recode_factor(anxiety, "0" = "No", "1" = "Yes")
depression = recode_factor(depression, "0" = "No", "1" = "Yes")
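If you are unsure how fragments like these slot into a program, here is a minimal sketch on a toy dataset. The data frame `toy` and its columns `grp` and `smoker` are made up for illustration and are not variables from the spreadsheet. The only point is that each recoding becomes one comma-separated argument to `mutate()`.

```r
library(dplyr)

# Toy data with made-up numeric codes; `grp` and `smoker` are hypothetical
# columns, not variables from the gaodata spreadsheet.
toy <- tibble(
  grp    = c(1, 2, 2, 3),
  smoker = c(0, 1, 0, 1)
)

toy_recode <- mutate(
  toy,
  # Each recoding is one argument to mutate(), separated by commas
  group  = recode_factor(grp, "1" = "Low", "2" = "Medium", "3" = "High"),
  smoker = recode_factor(smoker, "0" = "No", "1" = "Yes")
)

# Tabulate the new factors to check that no code was left unmapped (NA)
count(toy_recode, group, smoker)
```

Tabulating the recoded columns right away is a cheap check: a code that is missing from the recoding shows up as `NA` in the counts.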
I recommend using the excellent table1 package for making your table one! Note this is different from the tableone package, which is also very good, but doesn't make nice-looking tables as easily.
Here is an example from ?table1. Note how it allows you to define labels for each variable and units for continuous variables, and to stratify the table. Also note that other table1 documentation recodes and creates factors using base R, which is different from how I've shown you. The two methods of recoding are equivalent, and you don't need to use the base R approach in order to use the table1 package; it's just part of their examples. The example shown here doesn't do any recoding.
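As a quick aside on that equivalence, here is a small sketch comparing the two styles on a made-up vector of codes. The vector `edu_code` is hypothetical and only there for illustration.

```r
library(dplyr)

# Hypothetical coded variable, for illustration only
edu_code <- c(1, 2, 3, 2, 1)

# dplyr approach, as used earlier in this document
edu_dplyr <- recode_factor(
  edu_code,
  "1" = "Middle school", "2" = "High School", "3" = "College"
)

# Base R approach, in the style the table1 documentation tends to use
edu_base <- factor(
  edu_code,
  levels = c(1, 2, 3),
  labels = c("Middle school", "High School", "College")
)

# Compare the labels and the order of the factor levels
all(as.character(edu_dplyr) == as.character(edu_base))
all(levels(edu_dplyr) == levels(edu_base))
```

Either style gives table1 a factor with readable labels, so use whichever you find easier to read.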
This first bit of code just creates a simulated dataset:
dat <- expand.grid(id=1:10, sex=c("Male", "Female"), treat=c("Treated", "Placebo"))
dat$age <- runif(nrow(dat), 10, 50)
dat$age[3] <- NA # Add a missing value
dat$wt <- exp(rnorm(nrow(dat), log(70), 0.2))
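Because the ages and weights are drawn with runif() and rnorm(), your numbers will differ a little from the tables shown here every time you run the code. If you want repeatable results, one option (not part of the ?table1 example) is to call set.seed() before generating the data. The seed value below is arbitrary.

```r
# Setting the same seed before drawing random numbers gives the same draws
set.seed(2024)  # arbitrary seed, chosen only for illustration
first <- runif(3, 10, 50)

set.seed(2024)
second <- runif(3, 10, 50)

# TRUE when both draws are exactly the same
identical(first, second)
```

In the simulation above, a single set.seed() call placed before the runif() line would be enough to make both age and wt repeatable.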
Now use the table1 package to define labels and units.
suppressPackageStartupMessages(library(table1))
label(dat$sex) <- "Sex"
label(dat$age) <- "Age"
label(dat$treat) <- "Treatment Group"
label(dat$wt) <- "Weight"
units(dat$age) <- "years"
units(dat$wt) <- "kg"
Make a table with one level of stratification:
table1(~ sex + age + wt | treat, data=dat)
| | Treated (N=20) | Placebo (N=20) | Overall (N=40) |
|---|---|---|---|
| Sex | | | |
| Male | 10 (50.0%) | 10 (50.0%) | 20 (50.0%) |
| Female | 10 (50.0%) | 10 (50.0%) | 20 (50.0%) |
| Age (years) | | | |
| Mean (SD) | 33.6 (11.8) | 27.7 (11.0) | 30.6 (11.6) |
| Median [Min, Max] | 37.4 [11.9, 49.6] | 29.2 [10.1, 45.9] | 32.3 [10.1, 49.6] |
| Missing | 1 (5.0%) | 0 (0%) | 1 (2.5%) |
| Weight (kg) | | | |
| Mean (SD) | 66.7 (15.7) | 69.6 (11.4) | 68.1 (13.6) |
| Median [Min, Max] | 65.3 [42.5, 110] | 67.2 [48.7, 87.5] | 65.7 [42.5, 110] |
Make a table with two levels of stratification (nesting):
table1(~ age + wt | treat*sex, data=dat)
| | Treated, Male (N=10) | Treated, Female (N=10) | Placebo, Male (N=10) | Placebo, Female (N=10) | Overall, Male (N=20) | Overall, Female (N=20) |
|---|---|---|---|---|---|---|
| Age (years) | | | | | | |
| Mean (SD) | 33.5 (12.0) | 33.7 (12.3) | 26.7 (9.74) | 28.6 (12.5) | 29.9 (11.1) | 31.2 (12.4) |
| Median [Min, Max] | 37.4 [16.2, 49.6] | 37.5 [11.9, 48.5] | 26.5 [10.1, 40.2] | 31.5 [11.0, 45.9] | 28.7 [10.1, 49.6] | 34.4 [11.0, 48.5] |
| Missing | 1 (10.0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (5.0%) | 0 (0%) |
| Weight (kg) | | | | | | |
| Mean (SD) | 71.7 (17.1) | 61.7 (13.2) | 66.3 (9.86) | 72.9 (12.4) | 69.0 (13.8) | 67.3 (13.7) |
| Median [Min, Max] | 67.0 [50.1, 110] | 60.1 [42.5, 82.1] | 64.1 [50.5, 81.6] | 75.0 [48.7, 87.5] | 65.0 [50.1, 110] | 68.2 [42.5, 87.5] |