Solution

Basic data analysis

use getwd() to find where you are.

getwd()

## [1] "/Users/nanhung/SuperfundWorkshop/exercises"

Use ? before function to get help and look into the Theoph dataset.

?Theoph

Use head() function to check the top row of theophylline PK data.

head(x = Theoph)

##   Subject   Wt Dose Time  conc
## 1       1 79.6 4.02 0.00  0.74
## 2       1 79.6 4.02 0.25  2.84
## 3       1 79.6 4.02 0.57  6.57
## 4       1 79.6 4.02 1.12 10.50
## 5       1 79.6 4.02 2.02  9.66
## 6       1 79.6 4.02 3.82  8.58

Use tail() function to check the top row of theophylline PK data.

tail(x = Theoph)

##     Subject   Wt Dose  Time conc
## 127      12 60.5  5.3  3.52 9.75
## 128      12 60.5  5.3  5.07 8.57
## 129      12 60.5  5.3  7.07 6.59
## 130      12 60.5  5.3  9.03 6.11
## 131      12 60.5  5.3 12.05 4.57
## 132      12 60.5  5.3 24.15 1.17

Use class() to see the data type of Theophylline

class(x = Theoph)

## [1] "nfnGroupedData" "nfGroupedData"  "groupedData"    "data.frame"

Use View() function to overview the data {r. eval=F} View(x = Theoph)

Use write.csv() to save your data to a csv file and named it “Theoph.csv”

write.csv(x = Theoph, file = "Theoph.csv")

Now, there is a csv file called “Theoph.csv”. Use read.csv() to load it to your wrok space. In addtion, assign the data and named it dat

dat <- read.csv(file = "Theoph.csv")

Sometimes you might need to focus on specific subgroup (individual). Use subset() to select Subject 1.

subset(Theoph, Subject == 1)

##    Subject   Wt Dose  Time  conc
## 1        1 79.6 4.02  0.00  0.74
## 2        1 79.6 4.02  0.25  2.84
## 3        1 79.6 4.02  0.57  6.57
## 4        1 79.6 4.02  1.12 10.50
## 5        1 79.6 4.02  2.02  9.66
## 6        1 79.6 4.02  3.82  8.58
## 7        1 79.6 4.02  5.10  8.36
## 8        1 79.6 4.02  7.03  7.47
## 9        1 79.6 4.02  9.05  6.89
## 10       1 79.6 4.02 12.12  5.94
## 11       1 79.6 4.02 24.37  3.28

Assign the variable S1 to store the information of Subject 1. Then, try using $ and [] to select the blood concentraion

S1 <- subset(x = Theoph, Subject==1)
S1$conc

##  [1]  0.74  2.84  6.57 10.50  9.66  8.58  8.36  7.47  6.89  5.94  3.28

How’s the average concentration of Subject 1?

mean(S1$conc)

## [1] 6.439091

How’s the maximum concentration ($Cmax$) of Subject 1?

max(S1$conc)

## [1] 10.5

Plot the PK profile for subject 1.

plot(x = S1$Time, y = S1$conc)

Plot the PK profile for all subjects.

plot(x = Theoph$Time, y = Theoph$conc, col = Theoph$Subject)

Try to seperate each individaul to different sub figures.

par(mfrow=c(m=3,n=4))
for(i in 1:12){
  Si <- subset(x = Theoph, Subject == i)  # Select subject
  plot(x = Si$Time, y = Si$conc, main = i)   # plot PK profile
}

Tidyverse package

Use library() to load tidyverse

library(tidyverse)

## ── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0

## ── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Use ggplot to generate the PK profile for all subjects.

Theoph %>% ggplot(aes(x = Time, y =conc, color = Subject)) + 
  geom_point()

Look into the data tye of Theoph$Subject

class(Theoph$Subject)

## [1] "ordered" "factor"

The currnt order of the subject is based on the peak concentration. Try to reorder it by number.

Theoph$Subject

##   [1] 1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  3 
##  [24] 3  3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4  5  5 
##  [47] 5  5  5  5  5  5  5  5  5  6  6  6  6  6  6  6  6  6  6  6  7  7  7 
##  [70] 7  7  7  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8  9  9  9  9 
##  [93] 9  9  9  9  9  9  9  10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
## [116] 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12
## Levels: 6 < 7 < 8 < 11 < 3 < 2 < 4 < 9 < 12 < 10 < 1 < 5

Re-create the ggplot and check the different with previous plot.

Theoph$Subject <- factor(Theoph$Subject, 
                            level = c("1", "2", "3", "4", "5", "6", 
                                      "7", "8", "9", "10", "11","12"))

Try to seperate each individaul to different sub figures.

Theoph$Subject

##   [1] 1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  3 
##  [24] 3  3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4  5  5 
##  [47] 5  5  5  5  5  5  5  5  5  6  6  6  6  6  6  6  6  6  6  6  7  7  7 
##  [70] 7  7  7  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8  9  9  9  9 
##  [93] 9  9  9  9  9  9  9  10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
## [116] 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12
## Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < 11 < 12

Find the Tmax - First, assign the peak concentration and named it S1.Cmax

Theoph %>% ggplot(aes(x = Time, y =conc, color = Subject)) + 
  geom_point() + geom_line() + 
  labs(x = "Time (hr)", y = "Concentration (mg/L)")

Second, use which() to find the corresponding row number and assigned the variable Cmax.pt

Theoph %>% ggplot(aes(x = Time, y =conc)) + 
  geom_point() + geom_line() + 
  facet_wrap(~Subject) + 
  labs(x = "Time (hr)", y = "Concentration (mg/L)")

Third, use Cmax.pt to find the corresponding $T_{max}$

S1.Cmax <- max(S1$conc)
S1.Cmax

## [1] 10.5

Use pipline %>%, group_by(), and summarise() to manipulate the data and find peak concentration for all subjects. Assigned the new variable Cmax_pop

which(S1$conc == S1.Cmax)

## [1] 4

Plot histogram and baplot

Cmax.pt <- which(S1$conc == S1.Cmax)
T_max <- S1$Time[Cmax.pt]
T_max

## [1] 1.12

Try to figure ou the relationship between body weight and peak concentration. - Summarize the average body weight for each individual

Cmax_pop <- Theoph %>% group_by(Subject) %>% summarise(C_max = max(conc))
Cmax_pop

## # A tibble: 12 x 2
##    Subject C_max
##    <ord>   <dbl>
##  1 1       10.5 
##  2 2        8.33
##  3 3        8.2 
##  4 4        8.6 
##  5 5       11.4 
##  6 6        6.44
##  7 7        7.09
##  8 8        7.56
##  9 9        9.03
## 10 10      10.2 
## 11 11       8   
## 12 12       9.75

summary(Cmax_pop$C_max)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.440   7.890   8.465   8.759   9.865  11.400

hist(Cmax_pop$C_max) # histogram

boxplot(Cmax_pop$C_max) # boxplot

Visualize by ggplot

Theoph %>% group_by(Subject) %>% 
  summarise(C_max = max(conc), Wt=mean(Wt)) %>% 
  ggplot(aes(x=Wt, y=C_max)) + geom_point()

Add linear regression model

Theoph %>% group_by(Subject) %>% 
  summarise(C_max = max(conc), Wt=mean(Wt)) %>% 
  ggplot(aes(x=Wt, y=C_max)) + geom_point() + geom_smooth(method = "lm")

Create the table that summarrize the $C_{max}$ and $T_{max}$ for each individual. - First, generate data frame df that include subject number, $C_{max}$ and $T_{max}$

df <- Theoph %>% group_by(Subject) %>% summarize(C_max = max(conc))
df$T_max <- NA
df

## # A tibble: 12 x 3
##    Subject C_max T_max
##    <ord>   <dbl> <lgl>
##  1 1       10.5  NA   
##  2 2        8.33 NA   
##  3 3        8.2  NA   
##  4 4        8.6  NA   
##  5 5       11.4  NA   
##  6 6        6.44 NA   
##  7 7        7.09 NA   
##  8 8        7.56 NA   
##  9 9        9.03 NA   
## 10 10      10.2  NA   
## 11 11       8    NA   
## 12 12       9.75 NA

Second, add find the corresponding $T_{max}$ for subject 1

dat <- subset(Theoph, Subject==1)
C_max.pt <- which(dat$conc == df$C_max[1])
df$T_max[1] <- dat$Time[C_max.pt]
df

## # A tibble: 12 x 3
##    Subject C_max T_max
##    <ord>   <dbl> <dbl>
##  1 1       10.5   1.12
##  2 2        8.33 NA   
##  3 3        8.2  NA   
##  4 4        8.6  NA   
##  5 5       11.4  NA   
##  6 6        6.44 NA   
##  7 7        7.09 NA   
##  8 8        7.56 NA   
##  9 9        9.03 NA   
## 10 10      10.2  NA   
## 11 11       8    NA   
## 12 12       9.75 NA

Third, use for loop add find the corresponding $T_{max}$ for other subjects

for(i in 2:12){
  dat <- subset(Theoph, Subject==i)
  C_max.pt <- which(dat$conc == df$C_max[i])
  df$T_max[i] <- dat$Time[C_max.pt]
}
df

## # A tibble: 12 x 3
##    Subject C_max T_max
##    <ord>   <dbl> <dbl>
##  1 1       10.5   1.12
##  2 2        8.33  1.92
##  3 3        8.2   1.02
##  4 4        8.6   1.07
##  5 5       11.4   1   
##  6 6        6.44  1.15
##  7 7        7.09  3.48
##  8 8        7.56  2.02
##  9 9        9.03  0.63
## 10 10      10.2   3.55
## 11 11       8     0.98
## 12 12       9.75  3.52

Exercise 1.1: Exploratory data analysis

Nan-Hung Hsieh

2019/12/09 (update: 2019-12-10)

Task

Solution

Basic data analysis

Tidyverse package