1 Load data

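suppressPackageStartupMessages(library(readxl))
# Download the Gao et al. data file from figshare (needs the curl package)
url <- "https://plos.figshare.com/ndownloader/files/22322502"
destfile <- "gaodata.xls"
curl::curl_download(url, destfile)
# read_excel() handles both .xls and .xlsx files
gaodata <- read_excel(destfile)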

2 Recode Gao et al. data

2.1 Initial recoding provided last week

suppressPackageStartupMessages(library(tidyverse))
gaodata_recode <- mutate(
  gaodata,
  nplace = recode_factor(
    nplace,
    "河北" = "Hebei",
    "黑龙" = "Heilongjiang",  # 黑龙 is a truncated form of 黑龙江 (Heilongjiang province)
    "广西" = "Guangxi",
    "上海" = "Shanghai",
    "河南" = "Henan"
  ),
  depression = recode_factor(depression,
                             "无抑郁" = "No depression",
                             "有抑郁" = "Have depression")
)
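
One way to check that the recoding worked is to confirm that every original label was matched: recode_factor() does not stop with an error when it meets a value you did not list, so a typo in a label can slip through unnoticed. Here is a minimal sketch on a small hypothetical vector (the values are invented, not taken from the Gao et al. data). With the real data you could run the same count() and stopifnot() checks on gaodata and gaodata_recode.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data that uses the same Chinese labels as the Gao depression column
toy <- tibble(depression = c("无抑郁", "有抑郁", "无抑郁", "无抑郁"))

toy_recode <- mutate(
  toy,
  depression = recode_factor(depression,
                             "无抑郁" = "No depression",
                             "有抑郁" = "Have depression")
)

# Cross-tabulate original and recoded values to see how each label was mapped
count(tibble(original = toy$depression, recoded = toy_recode$depression),
      original, recoded)

# Stop with an error if any value was not mapped to one of the English labels
stopifnot(all(toy_recode$depression %in% c("No depression", "Have depression")))

# Factor levels follow the order in which the replacements were written
levels(toy_recode$depression)
```

The level order matters later, because table1 lists factor levels in that order.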

2.2 Additional recoding

Here is some additional recoding for this dataset. It is deliberately taken out of context, so you will have to do at least some work to incorporate it into your program :)

education = recode_factor(edu, "1"="Middle school", "2"="High School","3"="College","4"="Master")
occupation = recode_factor(ocupn, "0"="Health care workers", "1"="Students/retired","2"="Others")
area = recode_factor(pltype, "2"="Urban", "3"="Rural")
age = recode_factor(cage, "0"="<=20", "1"="21-30","2"="31-40","3"="41-50","4"=">=51")
marriage = recode_factor(marr, "0"="Not married", "2"="Married")
cities = recode_factor(nnplace, "0"="Hubei", "1"="Others")
anxiety = recode_factor(anxiety, "0"="No", "1"="Yes")
depression = recode_factor(depression, "0"="No", "1"="Yes")

3 Using the table1 package to make your table 1

I recommend using the excellent table1 package for making your table one! Note that this is different from the tableone package, which is also very good but makes it harder to produce nice-looking tables.

Here is an example from ?table1. Note how it lets you define a label for each variable, define units for continuous variables, and stratify the table. Also note that other table1 documentation recodes variables and creates factors using base R, which differs from the approach I've shown you. The two ways of recoding are equivalent, and you don't need to use base R in order to use the table1 package; it's just part of their examples. The example shown here doesn't do any recoding.
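
As an aside, here is a minimal sketch of that equivalence, using a short hypothetical vector of education codes (the 1 to 4 coding follows the edu recoding in section 2.2; the values themselves are made up). recode_factor() maps each old code to a new label, and base R's factor() does the same through its levels and labels arguments.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical education codes (same 1-4 coding as the edu recoding in section 2.2)
edu <- c(1, 3, 2, 4, 1, 2)
edu_labels <- c("Middle school", "High School", "College", "Master")

# tidyverse style, as used in this document: old value = new label
via_dplyr <- recode_factor(edu, "1" = "Middle school", "2" = "High School",
                           "3" = "College", "4" = "Master")

# base R style: levels and labels are matched by position
via_base <- factor(edu, levels = 1:4, labels = edu_labels)

# Same values and same level order either way
identical(as.character(via_dplyr), as.character(via_base))
identical(levels(via_dplyr), levels(via_base))
```

Both comparisons should print TRUE. The practical difference is where the mapping lives: recode_factor() pairs each old value with its new label, whereas factor() relies on levels and labels being listed in the same order.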

The ?table1 example starts by creating a simulated dataset:

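# expand.grid() builds every combination: 10 ids x 2 sexes x 2 groups = 40 rows
dat <- expand.grid(id=1:10, sex=c("Male", "Female"), treat=c("Treated", "Placebo"))
dat$age <- runif(nrow(dat), 10, 50)            # ages drawn uniformly between 10 and 50
dat$age[3] <- NA  # Add a missing value
dat$wt <- exp(rnorm(nrow(dat), log(70), 0.2))  # log-normal weights with a median of 70 kg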
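
Because runif() and rnorm() draw random numbers, rerunning this code gives slightly different summary statistics from the tables printed below. If you want reproducible numbers, one option is to call set.seed() first. The seed 2024 here is an arbitrary choice, so it will not reproduce the printed tables either. The sketch also checks the structure that the tables depend on.

```r
# Fix the random number generator so the simulated values are reproducible
# (2024 is an arbitrary seed)
set.seed(2024)

dat <- expand.grid(id = 1:10, sex = c("Male", "Female"), treat = c("Treated", "Placebo"))
dat$age <- runif(nrow(dat), 10, 50)
dat$age[3] <- NA  # Add a missing value
dat$wt <- exp(rnorm(nrow(dat), log(70), 0.2))

# 10 x 2 x 2 combinations = 40 rows
nrow(dat)

# Each sex-by-treatment cell has 10 participants
table(dat$sex, dat$treat)

# expand.grid() varies its first argument fastest, so row 3 is id 3, Male, Treated:
# that is the cell where the single missing age appears
dat[3, c("id", "sex", "treat")]
sum(is.na(dat$age))
```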

Now use the table1 package to define labels and units.

suppressPackageStartupMessages(library(table1))
label(dat$sex) <- "Sex"
label(dat$age) <- "Age"
label(dat$treat) <- "Treatment Group"
label(dat$wt) <- "Weight"

units(dat$age) <- "years"
units(dat$wt) <- "kg"

Make a table with one level of stratification:

table1(~ sex + age + wt | treat, data=dat)
                     Treated            Placebo            Overall
                     (N=20)             (N=20)             (N=40)
Sex
  Male               10 (50.0%)         10 (50.0%)         20 (50.0%)
  Female             10 (50.0%)         10 (50.0%)         20 (50.0%)
Age (years)
  Mean (SD)          33.6 (11.8)        27.7 (11.0)        30.6 (11.6)
  Median [Min, Max]  37.4 [11.9, 49.6]  29.2 [10.1, 45.9]  32.3 [10.1, 49.6]
  Missing            1 (5.0%)           0 (0%)             1 (2.5%)
Weight (kg)
  Mean (SD)          66.7 (15.7)        69.6 (11.4)        68.1 (13.6)
  Median [Min, Max]  65.3 [42.5, 110]   67.2 [48.7, 87.5]  65.7 [42.5, 110]
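
To see roughly what table1 computes for a continuous variable, the sketch below reproduces the age rows by treatment arm with dplyr. This is only a cross-check, not how table1 works internally. It re-simulates the data with an arbitrary seed (2024), so the numbers will not match the table above, but the structure should. In particular, the missing percentage is the number of missing values divided by the arm's N, which is why one missing age appears as 5.0% of 20 in the Treated arm and 2.5% of 40 overall.

```r
suppressPackageStartupMessages(library(dplyr))

# Re-create the simulated data (arbitrary seed, so values differ from the printed table)
set.seed(2024)
dat <- expand.grid(id = 1:10, sex = c("Male", "Female"), treat = c("Treated", "Placebo"))
dat$age <- runif(nrow(dat), 10, 50)
dat$age[3] <- NA
dat$wt <- exp(rnorm(nrow(dat), log(70), 0.2))

# Per-arm summaries of age, mirroring the rows table1 prints for a continuous variable
age_summary <- dat %>%
  group_by(treat) %>%
  summarise(
    n_total     = n(),
    age_mean    = mean(age, na.rm = TRUE),
    age_sd      = sd(age, na.rm = TRUE),
    age_median  = median(age, na.rm = TRUE),
    age_min     = min(age, na.rm = TRUE),
    age_max     = max(age, na.rm = TRUE),
    n_missing   = sum(is.na(age)),
    pct_missing = 100 * mean(is.na(age))  # missing as a percentage of the arm's N
  )
age_summary
```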

Make a table with two levels of stratification (nesting):

table1(~ age + wt | treat*sex, data=dat)
                     Treated                               Placebo                               Overall
                     Male               Female             Male               Female             Male               Female
                     (N=10)             (N=10)             (N=10)             (N=10)             (N=20)             (N=20)
Age (years)
  Mean (SD)          33.5 (12.0)        33.7 (12.3)        26.7 (9.74)        28.6 (12.5)        29.9 (11.1)        31.2 (12.4)
  Median [Min, Max]  37.4 [16.2, 49.6]  37.5 [11.9, 48.5]  26.5 [10.1, 40.2]  31.5 [11.0, 45.9]  28.7 [10.1, 49.6]  34.4 [11.0, 48.5]
  Missing            1 (10.0%)          0 (0%)             0 (0%)             0 (0%)             1 (5.0%)           0 (0%)
Weight (kg)
  Mean (SD)          71.7 (17.1)        61.7 (13.2)        66.3 (9.86)        72.9 (12.4)        69.0 (13.8)        67.3 (13.7)
  Median [Min, Max]  67.0 [50.1, 110]   60.1 [42.5, 82.1]  64.1 [50.5, 81.6]  75.0 [48.7, 87.5]  65.0 [50.1, 110]   68.2 [42.5, 87.5]
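
The same formula interface carries over to the recoded Gao et al. data: put the variables to summarise on the left of the | and the stratifying factor on the right. Here is a minimal sketch on a small hypothetical data frame that mimics two recoded columns. The rows are invented for illustration, and the labels "Place" and "Depression" are only illustrative. With the real data you would pass gaodata_recode instead.

```r
suppressPackageStartupMessages(library(table1))

# Hypothetical toy rows mimicking two recoded Gao et al. columns (values are invented)
toy <- data.frame(
  nplace = factor(c("Hebei", "Shanghai", "Henan", "Hebei", "Guangxi", "Shanghai")),
  depression = factor(
    c("No depression", "Have depression", "No depression",
      "Have depression", "No depression", "No depression"),
    levels = c("No depression", "Have depression")
  )
)

# Illustrative labels; table1 prints these instead of the column names
label(toy$nplace) <- "Place"
label(toy$depression) <- "Depression"

# Summarise nplace by depression status (an Overall column is added by default)
tab <- table1(~ nplace | depression, data = toy)
tab
```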

4 Question: can I do different analyses, or analyze different variables than the authors of the paper did?

Yes, you are free to choose different analyses / exposures than the authors did, and you don’t have to use all the variables. I am only looking for you to demonstrate proficiency in the methods, and how you do that is up to you.