CJUG SDTM team, R subteam
2017/03/03
This presentation can be found at the following URLs:
Presentation HTML file: http://rpubs.com/mokjpn/cjugws2017
Original R Markdown document: https://github.com/mokjpn/cjugr/blob/master/Workshop2017.Rpres
XPT files can be read with the read.xport() function of the foreign package.
setwd("~/CJUG/SDTM/20_Work_in_progress/23_HCT-1337")
library(foreign)
# Read the xpt file into the variable QS as a data frame with read.xport
QS <- read.xport("30_Summary/dataset/QS.xpt")
DA <- read.xport("30_Summary/dataset/DA.xpt")
str(QS)
'data.frame': 118 obs. of 18 variables:
$ STUDYID : Factor w/ 1 level "HCT-1337": 1 1 1 1 1 1 1 1 1 1 ...
$ DOMAIN : Factor w/ 1 level "QS": 1 1 1 1 1 1 1 1 1 1 ...
$ USUBJID : Factor w/ 59 levels "HCT-13370101",..: 1 1 2 2 3 3 4 4 5 5 ...
$ QSSEQ : num 1 2 1 2 1 2 1 2 1 2 ...
$ QSTESTCD: Factor w/ 2 levels "POSTQ","PREQS": 2 1 2 1 2 1 2 1 2 1 ...
$ QSTEST : Factor w/ 2 levels "Post-dose Calculation test",..: 2 1 2 1 2 1 2 1 2 1 ...
$ QSCAT : Factor w/ 1 level "Calculation Test": 1 1 1 1 1 1 1 1 1 1 ...
$ QSORRES : Factor w/ 83 levels "","102","104",..: 5 21 20 52 65 76 60 56 49 61 ...
$ QSSTRESC: Factor w/ 83 levels "","102","104",..: 5 21 20 52 65 76 60 56 49 61 ...
$ QSBLFL : Factor w/ 2 levels "","Y": 2 1 2 1 2 1 2 1 2 1 ...
$ VISITNUM: num 2 2 2 2 2 2 2 2 2 2 ...
$ VISIT : Factor w/ 1 level "Dosing": 1 1 1 1 1 1 1 1 1 1 ...
$ VISITDY : num 1 1 1 1 1 1 1 1 1 1 ...
$ EPOCH : Factor w/ 2 levels "SCREENING","TREATMENT": 1 2 1 2 1 2 1 2 1 2 ...
$ QSDTC : Factor w/ 1 level "..": 1 1 1 1 1 1 1 1 1 1 ...
$ QSDY : num 1 1 1 1 1 1 1 1 1 1 ...
$ QSTPT : Factor w/ 2 levels "Post-dose","Pre-dose": 2 1 2 1 2 1 2 1 2 1 ...
$ QSTPTNUM: num 1 2 1 2 1 2 1 2 1 2 ...
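As the str() output shows, read.xport() returns character variables such as QSSTRESC as factors. The following minimal sketch shows why the analysis pipeline converts QSSTRESC with as.character() before as.numeric(). It uses a small hypothetical factor, not the study data.

```r
# Hypothetical toy factor standing in for QSSTRESC (not study data)
x <- factor(c("120", "95", "", "102"))

# Wrong: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(x)

# Right: convert to character first, then to numeric ("" becomes NA)
values <- as.numeric(as.character(x))

print(levels(x))  # sorted levels: "" "102" "120" "95"
print(codes)      # level positions, not the scores
print(values)     # 120 95 NA 102
```

An equivalent idiom from the R FAQ is as.numeric(levels(x))[x], which converts each level only once.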
Dataset-XML files can be read with the read.dataset.xml() function of the R4DSXML package.
setwd("../Define2Validate")
library(R4DSXML)
# Read the Dataset-XML file into the variable CM as a data frame with read.dataset.xml
CM <- read.dataset.xml("Odm_CM.xml", "Odm_Define.xml")
str(CM)
'data.frame': 2 obs. of 6 variables:
$ DOMAIN : chr "CM" "CM"
$ CMSEQ : int 1 2
$ CMTRT : chr "マイスタン錠5mg" "クレストール錠"
$ CMDOSE : int 3 5
$ CMDOSU : chr "CAPSULE" "mgg"
$ CMSTDTC: chr "2016-03-16T10:56:40" "2016-03-16T10:56:401"
library(tidyr)
library(dplyr)
QS %>%
# mutate() transforms columns and adds new ones. Here it converts QSSTRESC to numbers.
mutate(qsstrescn = as.numeric(as.character(QSSTRESC))) %>%
# select() keeps only the listed columns: USUBJID, QSTESTCD and qsstrescn.
select(USUBJID, QSTESTCD, qsstrescn) %>%
# Group by USUBJID
group_by(USUBJID) %>%
# spread() converts the normalized (long) format into one row per subject.
# Each QSTESTCD value becomes a column holding the corresponding qsstrescn.
spread(QSTESTCD,qsstrescn) %>%
# Take the difference POSTQ - PREQS and add it as a new column, diff.
mutate(diff = POSTQ - PREQS) %>%
# Join the DA domain table
inner_join(DA,by="USUBJID") %>%
# Draw a box plot of diff for each DASCAT
plot(diff ~ DASCAT, data=.)
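The study files are not bundled with this presentation, so here is a self-contained sketch of the same pipeline on small hypothetical QS and DA data frames (subject IDs, scores and DASCAT values are invented). group_by() is left out because spread() does not need it.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy QS domain; factors mimic what read.xport() returns
QS <- data.frame(
  USUBJID  = rep(c("S01", "S02", "S03"), each = 2),
  QSTESTCD = rep(c("PREQS", "POSTQ"), times = 3),
  QSSTRESC = c("100", "110", "90", "120", "105", "100"),
  stringsAsFactors = TRUE
)
# Hypothetical toy DA domain with one category per subject
DA <- data.frame(
  USUBJID = c("S01", "S02", "S03"),
  DASCAT  = c("Low", "High", "Low"),
  stringsAsFactors = TRUE
)

wide <- QS %>%
  mutate(qsstrescn = as.numeric(as.character(QSSTRESC))) %>%
  select(USUBJID, QSTESTCD, qsstrescn) %>%
  spread(QSTESTCD, qsstrescn) %>%       # one row per subject: POSTQ, PREQS
  mutate(diff = POSTQ - PREQS) %>%
  inner_join(DA, by = "USUBJID")

print(wide)
plot(diff ~ DASCAT, data = wide)        # factor on the x-axis gives a box plot
```

In tidyr 1.0 and later, spread() is superseded by pivot_wider(), although spread() still works.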
Using the knitr R package or RStudio, we can convert R Markdown documents into HTML, Word, PDF, or presentations like this one. For example, the Dataset-XML slide is written like this:
Dataset-XML files can be read with the `read.dataset.xml()` function of the `R4DSXML` package.
```{r}
setwd("../Define2Validate")
library(R4DSXML)
# Read the Dataset-XML file into the variable CM as a data frame with read.dataset.xml
CM <- read.dataset.xml("Odm_CM.xml", "Odm_Define.xml")
str(CM)
```
The code inside the backquoted chunk is evaluated when the document is converted into HTML/Word/PDF, and its result (here, the string representation of the CM variable) is displayed in the final document even though no actual data is written in the R Markdown document.
From the following Rmd source:
Currently, there are `r length(unique(DM$USUBJID))` subjects. Number of subjects in the non-treated arm is `r sum(DM$ACTARMCD == "NOTTRT", na.rm = TRUE)`.
you will get this in the Word file:
Currently, there are 32 subjects. Number of subjects in the non-treated arm is 8.
This technique improves the reproducibility of the document, because the actual values are not typed into it: every time they are needed, they are recalculated from the original dataset. This concept is known as 'Reproducible Research'.
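The inline-code mechanism can be tried without the study data. This is a minimal sketch, assuming the knitr package is installed: it writes a two-sentence R Markdown file with inline `r` expressions and knits it to plain Markdown with knitr::knit(). The DM data frame is a hypothetical four-subject stand-in. Producing Word or HTML output instead would typically go through rmarkdown::render(), which additionally needs pandoc.

```r
library(knitr)

# Hypothetical toy DM domain (not the 32-subject study data)
DM <- data.frame(
  USUBJID  = c("S01", "S02", "S03", "S04"),
  ACTARMCD = c("TRT", "NOTTRT", "TRT", NA),
  stringsAsFactors = FALSE
)

# R Markdown source with inline R expressions, written to a temporary file
rmd <- c(
  "Currently, there are `r length(unique(DM$USUBJID))` subjects.",
  "",
  "Non-treated subjects: `r sum(DM$ACTARMCD == \"NOTTRT\", na.rm = TRUE)`."
)
in_file  <- tempfile(fileext = ".Rmd")
out_file <- tempfile(fileext = ".md")
writeLines(rmd, in_file)

# knit() evaluates each inline expression against DM and writes plain Markdown
knit(in_file, output = out_file, quiet = TRUE, envir = environment())
result <- readLines(out_file)
print(result)
```

Changing DM and knitting again updates both numbers, with no manual editing of the text.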
Q: To begin using R, which packages do you recommend?
A: If you would like to code using modern expressions like '%>%', I recommend installing the 'tidyverse' package first. It bundles many modern packages, such as ggplot2 for graphics, dplyr for data manipulation, and readr for data import. But if your textbook does not use these modern features, you do not need to install any package: plain R itself contains all the basic, easy-to-learn functions to import, manipulate, and analyse your data.
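To illustrate that plain R covers the basics, here is a sketch of the QS/DA box-plot analysis written with base functions only (reshape(), merge(), boxplot()). The data frames are small hypothetical stand-ins, not the HCT-1337 datasets.

```r
# Hypothetical toy data in the same long layout as the QS domain
QS <- data.frame(
  USUBJID  = rep(c("S01", "S02", "S03"), each = 2),
  QSTESTCD = rep(c("PREQS", "POSTQ"), times = 3),
  QSSTRESC = c("100", "110", "90", "120", "105", "100"),
  stringsAsFactors = FALSE
)
DA <- data.frame(
  USUBJID = c("S01", "S02", "S03"),
  DASCAT  = factor(c("Low", "High", "Low"))
)

# Character (not factor) columns can go straight to as.numeric()
QS$qsstrescn <- as.numeric(QS$QSSTRESC)

# reshape() turns the long format into one row per subject;
# the new columns are named qsstrescn.PREQS and qsstrescn.POSTQ
wide <- reshape(QS[, c("USUBJID", "QSTESTCD", "qsstrescn")],
                idvar = "USUBJID", timevar = "QSTESTCD",
                direction = "wide")
wide$diff <- wide$qsstrescn.POSTQ - wide$qsstrescn.PREQS

# merge() performs an inner join on USUBJID by default
merged <- merge(wide, DA, by = "USUBJID")
boxplot(diff ~ DASCAT, data = merged)
```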
Q: What are the advantages of R compared to SAS? (excluding price)
A: Unfortunately I have no experience using SAS, so I cannot answer this question. But one thing I can say is that I have had no problems handling SDTM datasets with R.
Q: If I use R in my business, will validation of R be required?
A: In general, computer system validation will be required for systems used to produce datasets for submission. If you include R or RStudio in your business process, these documents will be helpful: