事前資料の内容も一部記載していますが、コードは最低限にしています。
DB化やファイル一覧化の詳細は、事前資料を参照してください。

package

pacman::p_load(
  tidyverse,
  lubridate,
  tableone,
  skimr,
  duckdb,
  DBI,
  dbplyr,
  arrow,
  RSQLite
)

データベース

Fドライブの Database フォルダ内にある mimic_iv.db にアクセスします。
各自の環境に合わせて、必要であれば以下のパスを修正してください。

db_path <- "F:/Database/mimic_iv.db"

mimic_iv.db が存在しない場合は、空のDBファイルが作成されます。
事前課題でMIMIC-IVのCSVファイルをDB化していない場合、この後のテーブル確認は実行できません。

con_mimic <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path)

DB内に入っているテーブル一覧を確認します。

DBI::dbListTables(con_mimic)

##  [1] "admissions"               "bja_step01_patients_2026"
##  [3] "caregiver"                "chartevents"             
##  [5] "d_hcpcs"                  "d_icd_diagnoses"         
##  [7] "d_icd_procedures"         "d_items"                 
##  [9] "d_labitems"               "datetimeevents"          
## [11] "diagnoses_icd"            "drgcodes"                
## [13] "emar"                     "emar_detail"             
## [15] "hcpcsevents"              "icustays"                
## [17] "ingredientevents"         "inputevents"             
## [19] "labevents"                "microbiologyevents"      
## [21] "omr"                      "outputevents"            
## [23] "patients"                 "pharmacy"                
## [25] "poe"                      "poe_detail"              
## [27] "prescriptions"            "procedureevents"         
## [29] "procedures_icd"           "provider"                
## [31] "sepsis3"                  "services"                
## [33] "transfers"

その他の処理

テーブルを削除する場合の例です。通常は実行しません。

# DBI::dbRemoveTable(con_mimic, "caregivers")

最後に接続解除する場合は、以下を使います。

# DBI::dbDisconnect(con_mimic, shutdown = TRUE)

dataの読み込み

事前資料で実施済みのため、省略します。

事前課題：解答例

admissionsで全行数とユニーク数が同じ列は？

admissions は入院単位のテーブルです。
まず列名、行数、データの概要を確認します。

db_ad <- tbl(con_mimic, "admissions")

db_ad |>
  colnames()

##  [1] "subject_id"           "hadm_id"              "admittime"           
##  [4] "dischtime"            "deathtime"            "admission_type"      
##  [7] "admit_provider_id"    "admission_location"   "discharge_location"  
## [10] "insurance"            "language"             "marital_status"      
## [13] "race"                 "edregtime"            "edouttime"           
## [16] "hospital_expire_flag"

db_ad |>
  count()

skimr::skim() を使うと、各列の欠測数やユニーク数をまとめて確認できます。
大きすぎるテーブルでは注意が必要ですが、admissions は比較的扱いやすいサイズです。

db_ad |>
  mutate_if(is.numeric, as.character) |>
  skimr::skim()

Data summary
Name	mutate_if(db_ad, is.numer…
Number of rows	546028
Number of columns	16
_______________________
Column type frequency:
character	11
POSIXct	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
subject_id	0	1.00	8	8	223452
hadm_id	0	1.00	8	8	546028
admission_type	0	1.00	6	27	9
admit_provider_id	4	1.00	6	6	2045
admission_location	1	1.00	4	38	11
discharge_location	149818	0.73	4	28	13
insurance	9355	0.98	5	9	5
language	775	1.00	4	22	25
marital_status	13619	0.98	6	8	4
race	0	1.00	5	41	33
hospital_expire_flag	0	1.00	1	1	2

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
admittime	0	1.00	2105-10-04 17:26:00	2214-12-15 19:11:00	2155-01-11 17:56:00	534919
dischtime	0	1.00	2105-10-12 11:11:00	2214-12-24 13:44:00	2155-01-16 13:38:30	528871
deathtime	534238	0.02	2110-01-25 09:40:00	2214-10-12 12:51:00	2153-11-01 23:12:30	11788
edregtime	166788	0.69	2106-02-06 15:47:00	2214-12-15 00:45:00	2155-06-11 20:23:00	372692
edouttime	166788	0.69	2106-02-07 09:31:00	2214-12-15 22:50:00	2155-06-12 06:27:30	372755

hadm_id は入院IDです。
admissions は入院単位なので、基本的に hadm_id は一意になります。

db_ad |>
  distinct(hadm_id) |>
  count()

1列ずつユニーク数を数える場合は、以下のようにforループで確認できます。

cols_ad <-
  db_ad |>
  head(1) |>
  collect() |>
  select(!ends_with("time")) |>
  colnames()

cols_count <-
  data.frame(
    name = cols_ad,
    count = NA
  )

for (i in seq_along(cols_ad)) {
  cols_count[i, 2] <-
    db_ad |>
    select(all_of(cols_ad[i])) |>
    filter(!is.na(.data[[cols_ad[i]]])) |>
    distinct() |>
    count() |>
    pull()
}

cols_count

diagnoses_icdとd_icd_diagnosesで共通する列は？

MIMIC-IVではICD-9とICD-10が混在するため、icd_code と icd_version の組み合わせが重要です。

db_diag_icd <- tbl(con_mimic, "diagnoses_icd")
db_d_icd_diag <- tbl(con_mimic, "d_icd_diagnoses")

db_diag_icd |> colnames()

## [1] "subject_id"  "hadm_id"     "seq_num"     "icd_code"    "icd_version"

db_d_icd_diag |> colnames()

## [1] "icd_code"    "icd_version" "long_title"

intersect(
  colnames(db_diag_icd),
  colnames(db_d_icd_diag)
)

## [1] "icd_code"    "icd_version"

# db_diag_icd %>% head(100) %>% collect() %>% View()

第1回講義

男女別・年齢別で死亡数のTableを作る

MIMIC-IVではpatients の anchor_age に anchor_yearにおける年齢が格納されています。

db_pt <- tbl(con_mimic, "patients")

db_pt |> colnames()

## [1] "subject_id"        "gender"            "anchor_age"       
## [4] "anchor_year"       "anchor_year_group" "dod"

db_pt |> count()

db_pt |> head(5)

年齢の算出方法

MIMIC-IVでは、匿名化のため実際の暦年はシフトされています。
anchor_age は anchor_year 時点の年齢を表します。
「anchor_yearと入院年との差」を使って、以下のように入院時年齢を推定することができます。　　

注）　　 lubridate全部がdbplyrで動くわけではないが、year(), month(), day() など一部の日時抽出関数はDuckDB向けSQLに翻訳されるようです　　

db_pt_age <-
  db_ad |>
  left_join(db_pt, by = "subject_id") |>
  mutate(
    admit_year = lubridate::year(admittime),
    age = anchor_age + admit_year - anchor_year,
    # age = if_else(age > 91, 91, age)
  )

db_pt_age |>
  select(admittime, anchor_age, anchor_year, admit_year, age) |>
  head(10)

MIMIC-IIIでは、89歳以上の患者が見かけ上300歳以上になる仕様でした。
MIMIC-IVでは anchor_age が用意されており、91歳以上は高齢者として丸められています。

年齢の分布を確認する

患者ごとの初回入院を使って、年齢分布を確認します。

df_pt_age_first <-
  db_pt_age |>
  group_by(subject_id) |>
  window_order(admittime) |> 
  mutate(admission_seq = row_number()) |>
  ungroup() |>
  filter(admission_seq == 1) |>
  select(subject_id, age, gender, hospital_expire_flag)

df_pt_age_first |>
  ggplot() +
  geom_histogram(aes(x = age), color = "black") +
  theme_bw() +
  xlab("Age") +
  ylab("Number of patients")

91歳以上は丸められているはずだが、若干名存在する。
age計算時に91歳以上の丸め処置を考慮しなかったから。
（コメントアウトしていた部分をオンにすると解決する）

表の作成

70歳以上・未満、性別ごとに院内死亡数を確認します。

db_pt_age |>
  mutate(age_70 = if_else(age >= 70, "over70", "under70")) |>
  group_by(gender, age_70) |>
  summarise(
    n = n(),
    hospital_death = sum(hospital_expire_flag, na.rm = TRUE),
    .groups = "drop"
  )

死亡割合

CreateTableOne() の練習として、性別と年齢カテゴリごとに院内死亡割合を確認します。

table <-
  db_pt_age |>
  mutate(age_70 = if_else(age >= 70, "over70", "under70")) |>
  select(gender, age_70, hospital_expire_flag) |>
  collect() |>
  CreateTableOne(
    vars = "hospital_expire_flag",
    strata = c("gender", "age_70"),
    factorVars = "hospital_expire_flag"
  )

print(table, test = FALSE) |> as.data.frame()

##                               Stratified by gender:age_70
##                                F:over70     M:over70     F:under70    
##   n                            95514        83990        188583       
##   hospital_expire_flag = 1 (%)  3418 (3.6)   3496 (4.2)    1984 (1.1) 
##                               Stratified by gender:age_70
##                                M:under70    
##   n                            177941       
##   hospital_expire_flag = 1 (%)   2903 (1.6)

論文再現

Gu W-J, Duan X-J, Liu X-Z, Cen Y, Tao L-Y, Lyu J, Yin H-Y.
Association of magnesium sulfate use with mortality in critically ill patients with sepsis: a retrospective propensity score-matched cohort study.
British Journal of Anaesthesia. 2023;131(5):861-870. doi:10.1016/j.bja.2023.08.005

今回の演習では、MIMIC-IVを用いてこのBJA論文の再現を目指します。ただし、使用しているMIMIC IVのバージョンが違う事もあり、数値が一致することはありません。解析の流れを通して前処理スキルを身に着けることを目的とした演習になります。

研究の概要

対象患者: 成人の重症敗血症ICU患者
曝露: ICU滞在中の硫酸マグネシウム使用
主要アウトカム: 28日全死亡
解析: propensity score matching後の死亡解析

第1回では、曝露やアウトカムを作る前に、まず対象患者を作成します。

1. 対象患者：adult critically ill patients with sepsis

使用するデータベース

今回の第1回で主に使用するテーブルは以下です。

patients
admissions
icustays
sepsis3

sepsis3 はMIMIC-IV derivedのテーブルです。
このテーブルがDB内にない場合は、BigQueryなどから取得した sepsis3 をDBに追加してから実行してください。

BigQuery経由で sepsis3 ファイルを入手する

詳細は講義資料参照。　　 1. BigQueryからderived -> sepsis3 のcsvファイルをダウンロードする　　 2. “F:/BigDatas/mimic_iv”の下に”derived”フォルダを作成してcsvファイルを格納する　　 3. csvファイルをcon_mimic（すなわちmimic_iv.db）に登録する　　

パスの指定

derived_path <- "F:/BigDatas/mimic_iv/derived"

まず確認

derived_csv_list <-
  list.files(
    path = derived_path,
    pattern = "\\.csv$",
    recursive = TRUE,
    full.names = TRUE
  )

derived_short_list <-
  list.files(
    path = derived_path,
    pattern = "\\.csv$",
    recursive = TRUE,
    full.names = FALSE
  )

derived_name_list <- derived_short_list |>
  str_replace_all("\\\\", "/") |>
  basename() |>
  str_remove("\\.csv$") |>
  str_to_lower()

tibble(
  csv_file = derived_csv_list,
  table_name = derived_name_list,
  already_registered = derived_name_list %in% DBI::dbListTables(con_mimic)
)

登録

for (i in seq_along(derived_name_list)) {
  
  message("Importing: ", derived_csv_list[i], " -> ", derived_name_list[i])
  
  if (DBI::dbExistsTable(con_mimic, derived_name_list[i])) {
    message("Skip existing table: ", derived_name_list[i])
    next
  }
  
  table_name <- DBI::dbQuoteIdentifier(con_mimic, derived_name_list[i])
  file_path <- DBI::dbQuoteString(
    con_mimic,
    normalizePath(derived_csv_list[i], winslash = "/", mustWork = TRUE)
  )
  
  sql <- paste0(
    "CREATE TABLE ", table_name,
    " AS SELECT * FROM read_csv_auto(",
    file_path,
    ", header = true, quote = '\"', escape = '\"', strict_mode = false, sample_size = -1);"
  )
  
  DBI::dbExecute(con_mimic, sql)
}

derivedフォルダに新しいCSVファイルを追加した場合は、このコードをもう一度実行します。すでにDBに登録済みのテーブルはスキップされ、未登録のCSVだけが新しく登録されます。上書きや追記は行わないため、同じコードを再実行してもデータが重複して追加されることはありません。

適格基準の確認

第1回では、以下の条件で対象患者を作成します。

ICU入室患者に限定する
成人、18歳以上に限定する
患者ごとの初回ICU stayに限定する
Sepsis-3患者に限定する

この後の講義で、そのほかの適格基準（マグネシウム投与、ベースラインMg値）、除外条件、アウトカム、共変量を追加していきます。

patients, admissions, icustaysの確認

db_ad <- tbl(con_mimic, "admissions")
db_ad |> colnames()

##  [1] "subject_id"           "hadm_id"              "admittime"           
##  [4] "dischtime"            "deathtime"            "admission_type"      
##  [7] "admit_provider_id"    "admission_location"   "discharge_location"  
## [10] "insurance"            "language"             "marital_status"      
## [13] "race"                 "edregtime"            "edouttime"           
## [16] "hospital_expire_flag"

db_ad |> count()

## # Source:   SQL [?? x 1]
## # Database: DuckDB 1.4.4 [mihara@Windows 10 x64:R 4.5.0/F:\Database\mimic_iv.db]
##        n
##    <dbl>
## 1 546028

db_pt <- tbl(con_mimic, "patients")
db_pt |> colnames()

## [1] "subject_id"        "gender"            "anchor_age"       
## [4] "anchor_year"       "anchor_year_group" "dod"

db_pt |> count()

## # Source:   SQL [?? x 1]
## # Database: DuckDB 1.4.4 [mihara@Windows 10 x64:R 4.5.0/F:\Database\mimic_iv.db]
##        n
##    <dbl>
## 1 364627

db_icustay <- tbl(con_mimic, "icustays")
db_icustay |> colnames()

## [1] "subject_id"     "hadm_id"        "stay_id"        "first_careunit"
## [5] "last_careunit"  "intime"         "outtime"        "los"

db_icustay |> count()

## # Source:   SQL [?? x 1]
## # Database: DuckDB 1.4.4 [mihara@Windows 10 x64:R 4.5.0/F:\Database\mimic_iv.db]
##       n
##   <dbl>
## 1 94458

db_icustay |> head(5)

## # Source:   SQL [?? x 8]
## # Database: DuckDB 1.4.4 [mihara@Windows 10 x64:R 4.5.0/F:\Database\mimic_iv.db]
##   subject_id  hadm_id  stay_id first_careunit  last_careunit intime             
##        <dbl>    <dbl>    <dbl> <chr>           <chr>         <dttm>             
## 1   10000032 29079034 39553978 Medical Intens… Medical Inte… 2180-07-23 14:00:00
## 2   10000690 25860671 37081114 Medical Intens… Medical Inte… 2150-11-02 19:37:00
## 3   10000980 26913865 39765666 Medical Intens… Medical Inte… 2189-06-27 08:42:00
## 4   10001217 24597018 37067082 Surgical Inten… Surgical Int… 2157-11-20 19:18:02
## 5   10001217 27703517 34592300 Surgical Inten… Surgical Int… 2157-12-19 15:42:24
## # ℹ 2 more variables: outtime <dttm>, los <dbl>

1. ICU入室患者に限定する

icustays はICU stay単位のテーブルです。
今回のBJA論文再現では、解析単位は基本的にICU stayです。

ICU stay数、入院数、患者数の違いを確認します。

db_icustay |>
  summarise(
    n_icustays = n(),
    n_admissions = n_distinct(hadm_id),
    n_subjects = n_distinct(subject_id)
  )

2. 成人に限定する

ICU stayを中心に、入院情報と患者情報を結合します。
年齢はICU入室年を用いて推定します。先ほど確認したように、MIMIC IVデータは基本18歳以上なので、この段階での除外は発生しないはずです。

db_icu_detail <-
  db_icustay |>
  left_join(db_ad, by = c("subject_id", "hadm_id")) |>
  left_join(db_pt, by = "subject_id") |>
  mutate(
    icu_year = lubridate::year(intime),
    age = anchor_age + icu_year - anchor_year,
    age = if_else(age > 91, 91, age)
  )


db_icu_detail |>
  select(
    subject_id,
    hadm_id,
    stay_id,
    age,
    gender,
    race,
    first_careunit,
    intime,
    outtime,
    los
  ) |>
  head(10)

db_icu_adult <-
  db_icu_detail |>
  filter(age >= 18)

db_icu_adult |>
  summarise(
    n_icustays = n(),
    n_admissions = n_distinct(hadm_id),
    n_subjects = n_distinct(subject_id),
    age_min = min(age, na.rm = TRUE),
    age_median = median(age, na.rm = TRUE),
    age_max = max(age, na.rm = TRUE)
  )

3. 初回ICU stayに限定する

同じ患者が複数回ICUに入室している場合、どのICU stayを使うかを明確にする必要があります。
ここでは、患者ごとの初回ICU stayに限定します。

db_icu_adult_first <-
  db_icu_adult |>
  group_by(subject_id) |>
  window_order(intime) |>
  mutate(icu_seq = row_number()) |>
  ungroup() |>
  filter(icu_seq == 1)

db_icu_adult_first |>
  summarise(
    n_icustays = n(),
    n_admissions = n_distinct(hadm_id),
    n_subjects = n_distinct(subject_id)
  )

4. Sepsis-3患者に限定する

MIMIC-IV derivedの sepsis3 は、ICU内でのSepsis-3発症時刻をまとめたテーブルです。
MIT-LCPのderived conceptでは、SOFA 2点以上と感染疑いを用いてICU内Sepsis-3 onsetを定義しています。

db_sepsis3 <- tbl(con_mimic, "sepsis3")

db_sepsis3 |> colnames()

##  [1] "subject_id"               "stay_id"                 
##  [3] "antibiotic_time"          "culture_time"            
##  [5] "suspected_infection_time" "sofa_time"               
##  [7] "sofa_score"               "respiration"             
##  [9] "coagulation"              "liver"                   
## [11] "cardiovascular"           "cns"                     
## [13] "renal"                    "sepsis3"

db_sepsis3 |> count()

db_sepsis3 |> head(10)

sepsis3 == TRUE の患者に限定します。

db_sepsis3_true <-
  db_sepsis3 |>
  filter(sepsis3 == TRUE)

db_sepsis3_true |>
  summarise(
    n_rows = n(),
    n_stays = n_distinct(stay_id),
    n_subjects = n_distinct(subject_id)
  )

成人・初回ICU stayとSepsis-3を結合します。
sepsis3 とのKeyには”subject_id”、“stay_id”が使えます。

sepsis_join_keys <- c("subject_id", "stay_id")

db_bja_patients <-
  db_icu_adult_first |>
  inner_join(
    db_sepsis3_true,
    by = sepsis_join_keys
  ) |>
  rename(
    icu_intime = intime,
    icu_outtime = outtime,
    icu_los = los
  )

db_bja_patients |>
  summarise(
    n_rows = n(),
    n_stays = n_distinct(stay_id),
    n_admissions = n_distinct(hadm_id),
    n_subjects = n_distinct(subject_id)
  )

第1回で残す列を整理する

今後、マグネシウム投与、アウトカム、重症度、検査値、併存疾患などを追加していきます。
第1回では、対象患者の基本情報とSepsis-3関連情報を残します。

cols_bja_step01 <- c(
  "subject_id",
  "hadm_id",
  "stay_id",
  "age",
  "gender",
  "race",
  "admission_type",
  "admission_location",
  "discharge_location",
  "insurance",
  "language",
  "marital_status",
  "hospital_expire_flag",
  "admittime",
  "dischtime",
  "deathtime",
  "dod",
  "first_careunit",
  "last_careunit",
  "icu_intime",
  "icu_outtime",
  "icu_los",
  "suspected_infection_time",
  "sofa_time",
  "sofa_score",
  "respiration",
  "coagulation",
  "liver",
  "cardiovascular",
  "cns",
  "renal"
)

存在する列だけを選択します。
MIMIC-IV derivedのバージョンや作成方法により、sepsis3 テーブルの列名が一部異なる場合があるためです。今回は31列全て存在します。

available_cols <- intersect(cols_bja_step01, colnames(db_bja_patients))
available_cols

##  [1] "subject_id"               "hadm_id"                 
##  [3] "stay_id"                  "age"                     
##  [5] "gender"                   "race"                    
##  [7] "admission_type"           "admission_location"      
##  [9] "discharge_location"       "insurance"               
## [11] "language"                 "marital_status"          
## [13] "hospital_expire_flag"     "admittime"               
## [15] "dischtime"                "deathtime"               
## [17] "dod"                      "first_careunit"          
## [19] "last_careunit"            "icu_intime"              
## [21] "icu_outtime"              "icu_los"                 
## [23] "suspected_infection_time" "sofa_time"               
## [25] "sofa_score"               "respiration"             
## [27] "coagulation"              "liver"                   
## [29] "cardiovascular"           "cns"                     
## [31] "renal"

db_bja_step01_patients <-
  db_bja_patients |>
  select(all_of(available_cols))

db_bja_step01_patients |> head(10)

db_bja_step01_patients |> count()

データベース上に保存する

ここまでの集大成を bja_step01_patients_2026 とします。
次回以降、このテーブルに曝露、アウトカム、共変量を追加していきます。実際にDBへ保存する場合は、以下のコメントアウトを外して実行してください。

注意: 既存テーブルは削除も上書きもしません。
同じ名前のテーブルが既に存在する場合は、処理を停止します。再作成したい場合は、下の output_table を別名に変更してください。

output_table <- "bja_step01_patients_2026"

# if (DBI::dbExistsTable(con_mimic, output_table)) {
#   stop(paste0(
#     output_table,
#     " は既に存在します。上書きしないため処理を停止しました。別名を指定してください。"
#   ))
# }
# 
# DBI::dbWriteTable(
#   conn = con_mimic,
#   name = output_table,
#   value = db_bja_step01_patients |> collect()
# )

保存できたか確認します。

DBI::dbListTables(con_mimic)

##  [1] "admissions"               "bja_step01_patients_2026"
##  [3] "caregiver"                "chartevents"             
##  [5] "d_hcpcs"                  "d_icd_diagnoses"         
##  [7] "d_icd_procedures"         "d_items"                 
##  [9] "d_labitems"               "datetimeevents"          
## [11] "diagnoses_icd"            "drgcodes"                
## [13] "emar"                     "emar_detail"             
## [15] "hcpcsevents"              "icustays"                
## [17] "ingredientevents"         "inputevents"             
## [19] "labevents"                "microbiologyevents"      
## [21] "omr"                      "outputevents"            
## [23] "patients"                 "pharmacy"                
## [25] "poe"                      "poe_detail"              
## [27] "prescriptions"            "procedureevents"         
## [29] "procedures_icd"           "provider"                
## [31] "sepsis3"                  "services"                
## [33] "transfers"

<第1回講義・課題>

第2回では、BJA論文のPatientを完成させ、主要評価項目である28日全死亡を作成します。
その準備として、第1回の課題では、まだ反映できていない適格基準・除外基準を整理してください。

ICU滞在時間について
BJA論文では、ICU滞在が24時間以下の患者を除外しています。
bja_step01_patients_2026 に含まれる icu_los、または icu_intime と icu_outtime を使って、ICU滞在時間が24時間以下の患者が何名いるか確認してください。
また、24時間超に限定した場合に患者数がどのように変化するか確認してください。
Sepsis onset前の血清Mg値について
BJA論文では、sepsis onset前に高Mg血症がある患者を除外しています。
高Mg血症は serum magnesium > 1.07 mmol/L と定義されています。
次回使用する labevents またはderivedの検査値テーブルを想定して、sepsis onset前のMg値をどのように抽出すればよいか考えてください。
低Mg血症は除外ではなく、サブグループとして扱う点に注意してください。
ICU入室前の硫酸マグネシウム投与について
BJA論文では、ICU入室前に硫酸マグネシウムを投与された患者を除外しています。
prescriptions, pharmacy, emar, emar_detail のどのテーブルを使えば確認できそうか、列名を確認してください。
特に、薬剤名、投与開始時刻、ICU入室時刻の3つをどのように組み合わせるかを考えてください。

# DBI::dbDisconnect(con_mimic, shutdown = TRUE)

第1回 MIMIC-IV演習：BJAマグネシウム敗血症研究

Takahiro Mihara

2026-05-31