Task

Learn how to use R to conduct basic analysis for PK data. Now, we have a theophylline PK dataset. The purpose of this exercise is to develop the simple PK model and use it to describe the PK of theophylline. First, look into the theophylline dataset. The Theoph data frame has 132 rows and 5 columns of data from an experiment on the pharmacokinetics of theophylline.

Questions
- Where can I get help?
- How do I read data into R?
- What is a data frame?
- How do I assign variables?
- How do I access subsets of a data frame?
- How do I calculate simple statistics like mean and median?
- How can I plot my data?

Objectives
- Read tabular data from a file into a program.
- Assign values to variables.
- Select individual values and subsections from data.
- Perform operations on a data frame of data.
- Display simple graphs (base and ggplot).
- Find the Cmax and Tmax for each individual.

Solution

Basic data analysis

use getwd() to find where you are.

getwd()
## [1] "/Users/nanhung/SuperfundWorkshop/exercises"

Use ? before function to get help and look into the Theoph dataset.

?Theoph

Use head() function to check the top row of theophylline PK data.

head(x = Theoph)
##   Subject   Wt Dose Time  conc
## 1       1 79.6 4.02 0.00  0.74
## 2       1 79.6 4.02 0.25  2.84
## 3       1 79.6 4.02 0.57  6.57
## 4       1 79.6 4.02 1.12 10.50
## 5       1 79.6 4.02 2.02  9.66
## 6       1 79.6 4.02 3.82  8.58

Use tail() function to check the top row of theophylline PK data.

tail(x = Theoph)
##     Subject   Wt Dose  Time conc
## 127      12 60.5  5.3  3.52 9.75
## 128      12 60.5  5.3  5.07 8.57
## 129      12 60.5  5.3  7.07 6.59
## 130      12 60.5  5.3  9.03 6.11
## 131      12 60.5  5.3 12.05 4.57
## 132      12 60.5  5.3 24.15 1.17

Use class() to see the data type of Theophylline

class(x = Theoph)
## [1] "nfnGroupedData" "nfGroupedData"  "groupedData"    "data.frame"

Use View() function to overview the data {r. eval=F} View(x = Theoph)

Use write.csv() to save your data to a csv file and named it “Theoph.csv”

write.csv(x = Theoph, file = "Theoph.csv")

Now, there is a csv file called “Theoph.csv”. Use read.csv() to load it to your wrok space. In addtion, assign the data and named it dat

dat <- read.csv(file = "Theoph.csv")

Sometimes you might need to focus on specific subgroup (individual). Use subset() to select Subject 1.

subset(Theoph, Subject == 1)
##    Subject   Wt Dose  Time  conc
## 1        1 79.6 4.02  0.00  0.74
## 2        1 79.6 4.02  0.25  2.84
## 3        1 79.6 4.02  0.57  6.57
## 4        1 79.6 4.02  1.12 10.50
## 5        1 79.6 4.02  2.02  9.66
## 6        1 79.6 4.02  3.82  8.58
## 7        1 79.6 4.02  5.10  8.36
## 8        1 79.6 4.02  7.03  7.47
## 9        1 79.6 4.02  9.05  6.89
## 10       1 79.6 4.02 12.12  5.94
## 11       1 79.6 4.02 24.37  3.28

Assign the variable S1 to store the information of Subject 1. Then, try using $ and [] to select the blood concentraion

S1 <- subset(x = Theoph, Subject==1)
S1$conc
##  [1]  0.74  2.84  6.57 10.50  9.66  8.58  8.36  7.47  6.89  5.94  3.28

How’s the average concentration of Subject 1?

mean(S1$conc)
## [1] 6.439091

How’s the maximum concentration (\(Cmax\)) of Subject 1?

max(S1$conc)
## [1] 10.5

Plot the PK profile for subject 1.

plot(x = S1$Time, y = S1$conc)

Plot the PK profile for all subjects.

plot(x = Theoph$Time, y = Theoph$conc, col = Theoph$Subject)

Try to seperate each individaul to different sub figures.

par(mfrow=c(m=3,n=4))
for(i in 1:12){
  Si <- subset(x = Theoph, Subject == i)  # Select subject
  plot(x = Si$Time, y = Si$conc, main = i)   # plot PK profile
}

Tidyverse package

Use library() to load tidyverse

library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Use ggplot to generate the PK profile for all subjects.

Theoph %>% ggplot(aes(x = Time, y =conc, color = Subject)) + 
  geom_point()

Look into the data tye of Theoph$Subject

class(Theoph$Subject)
## [1] "ordered" "factor"

The currnt order of the subject is based on the peak concentration. Try to reorder it by number.

Theoph$Subject
##   [1] 1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  3 
##  [24] 3  3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4  5  5 
##  [47] 5  5  5  5  5  5  5  5  5  6  6  6  6  6  6  6  6  6  6  6  7  7  7 
##  [70] 7  7  7  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8  9  9  9  9 
##  [93] 9  9  9  9  9  9  9  10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
## [116] 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12
## Levels: 6 < 7 < 8 < 11 < 3 < 2 < 4 < 9 < 12 < 10 < 1 < 5

Re-create the ggplot and check the different with previous plot.

Theoph$Subject <- factor(Theoph$Subject, 
                            level = c("1", "2", "3", "4", "5", "6", 
                                      "7", "8", "9", "10", "11","12"))

Try to seperate each individaul to different sub figures.

Theoph$Subject
##   [1] 1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  3 
##  [24] 3  3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4  5  5 
##  [47] 5  5  5  5  5  5  5  5  5  6  6  6  6  6  6  6  6  6  6  6  7  7  7 
##  [70] 7  7  7  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8  9  9  9  9 
##  [93] 9  9  9  9  9  9  9  10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
## [116] 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12
## Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < 11 < 12

Find the Tmax - First, assign the peak concentration and named it S1.Cmax

Theoph %>% ggplot(aes(x = Time, y =conc, color = Subject)) + 
  geom_point() + geom_line() + 
  labs(x = "Time (hr)", y = "Concentration (mg/L)")

  • Second, use which() to find the corresponding row number and assigned the variable Cmax.pt
Theoph %>% ggplot(aes(x = Time, y =conc)) + 
  geom_point() + geom_line() + 
  facet_wrap(~Subject) + 
  labs(x = "Time (hr)", y = "Concentration (mg/L)")

  • Third, use Cmax.pt to find the corresponding \(T_{max}\)
S1.Cmax <- max(S1$conc)
S1.Cmax
## [1] 10.5

Use pipline %>%, group_by(), and summarise() to manipulate the data and find peak concentration for all subjects. Assigned the new variable Cmax_pop

which(S1$conc == S1.Cmax)
## [1] 4

Plot histogram and baplot

Cmax.pt <- which(S1$conc == S1.Cmax)
T_max <- S1$Time[Cmax.pt]
T_max
## [1] 1.12

Try to figure ou the relationship between body weight and peak concentration. - Summarize the average body weight for each individual

Cmax_pop <- Theoph %>% group_by(Subject) %>% summarise(C_max = max(conc))
Cmax_pop
## # A tibble: 12 x 2
##    Subject C_max
##    <ord>   <dbl>
##  1 1       10.5 
##  2 2        8.33
##  3 3        8.2 
##  4 4        8.6 
##  5 5       11.4 
##  6 6        6.44
##  7 7        7.09
##  8 8        7.56
##  9 9        9.03
## 10 10      10.2 
## 11 11       8   
## 12 12       9.75
summary(Cmax_pop$C_max)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.440   7.890   8.465   8.759   9.865  11.400
hist(Cmax_pop$C_max) # histogram

boxplot(Cmax_pop$C_max) # boxplot

  • Visualize by ggplot
Theoph %>% group_by(Subject) %>% 
  summarise(C_max = max(conc), Wt=mean(Wt)) %>% 
  ggplot(aes(x=Wt, y=C_max)) + geom_point()

  • Add linear regression model
Theoph %>% group_by(Subject) %>% 
  summarise(C_max = max(conc), Wt=mean(Wt)) %>% 
  ggplot(aes(x=Wt, y=C_max)) + geom_point() + geom_smooth(method = "lm")

Create the table that summarrize the \(C_{max}\) and \(T_{max}\) for each individual. - First, generate data frame df that include subject number, \(C_{max}\) and \(T_{max}\)

df <- Theoph %>% group_by(Subject) %>% summarize(C_max = max(conc))
df$T_max <- NA
df
## # A tibble: 12 x 3
##    Subject C_max T_max
##    <ord>   <dbl> <lgl>
##  1 1       10.5  NA   
##  2 2        8.33 NA   
##  3 3        8.2  NA   
##  4 4        8.6  NA   
##  5 5       11.4  NA   
##  6 6        6.44 NA   
##  7 7        7.09 NA   
##  8 8        7.56 NA   
##  9 9        9.03 NA   
## 10 10      10.2  NA   
## 11 11       8    NA   
## 12 12       9.75 NA
  • Second, add find the corresponding \(T_{max}\) for subject 1
dat <- subset(Theoph, Subject==1)
C_max.pt <- which(dat$conc == df$C_max[1])
df$T_max[1] <- dat$Time[C_max.pt]
df
## # A tibble: 12 x 3
##    Subject C_max T_max
##    <ord>   <dbl> <dbl>
##  1 1       10.5   1.12
##  2 2        8.33 NA   
##  3 3        8.2  NA   
##  4 4        8.6  NA   
##  5 5       11.4  NA   
##  6 6        6.44 NA   
##  7 7        7.09 NA   
##  8 8        7.56 NA   
##  9 9        9.03 NA   
## 10 10      10.2  NA   
## 11 11       8    NA   
## 12 12       9.75 NA
  • Third, use for loop add find the corresponding \(T_{max}\) for other subjects
for(i in 2:12){
  dat <- subset(Theoph, Subject==i)
  C_max.pt <- which(dat$conc == df$C_max[i])
  df$T_max[i] <- dat$Time[C_max.pt]
}
df
## # A tibble: 12 x 3
##    Subject C_max T_max
##    <ord>   <dbl> <dbl>
##  1 1       10.5   1.12
##  2 2        8.33  1.92
##  3 3        8.2   1.02
##  4 4        8.6   1.07
##  5 5       11.4   1   
##  6 6        6.44  1.15
##  7 7        7.09  3.48
##  8 8        7.56  2.02
##  9 9        9.03  0.63
## 10 10      10.2   3.55
## 11 11       8     0.98
## 12 12       9.75  3.52