Learn how to use R to conduct basic analysis for PK data. Now, we have a theophylline PK dataset. The purpose of this exercise is to develop the simple PK model and use it to describe the PK of theophylline. First, look into the theophylline dataset. The Theoph data frame has 132 rows and 5 columns of data from an experiment on the pharmacokinetics of theophylline.
Questions
- Where can I get help?
- How do I read data into R?
- What is a data frame?
- How do I assign variables?
- How do I access subsets of a data frame?
- How do I calculate simple statistics like mean and median?
- How can I plot my data?
Objectives
- Read tabular data from a file into a program.
- Assign values to variables.
- Select individual values and subsections from data.
- Perform operations on a data frame of data.
- Display simple graphs (base and ggplot).
- Find the Cmax and Tmax for each individual.
use getwd() to find where you are.
getwd()
## [1] "/Users/nanhung/SuperfundWorkshop/exercises"
Use ? before function to get help and look into the Theoph dataset.
?Theoph
Use head() function to check the top row of theophylline PK data.
head(x = Theoph)
## Subject Wt Dose Time conc
## 1 1 79.6 4.02 0.00 0.74
## 2 1 79.6 4.02 0.25 2.84
## 3 1 79.6 4.02 0.57 6.57
## 4 1 79.6 4.02 1.12 10.50
## 5 1 79.6 4.02 2.02 9.66
## 6 1 79.6 4.02 3.82 8.58
Use tail() function to check the top row of theophylline PK data.
tail(x = Theoph)
## Subject Wt Dose Time conc
## 127 12 60.5 5.3 3.52 9.75
## 128 12 60.5 5.3 5.07 8.57
## 129 12 60.5 5.3 7.07 6.59
## 130 12 60.5 5.3 9.03 6.11
## 131 12 60.5 5.3 12.05 4.57
## 132 12 60.5 5.3 24.15 1.17
Use class() to see the data type of Theophylline
class(x = Theoph)
## [1] "nfnGroupedData" "nfGroupedData" "groupedData" "data.frame"
Use View() function to overview the data {r. eval=F} View(x = Theoph)
Use write.csv() to save your data to a csv file and named it “Theoph.csv”
write.csv(x = Theoph, file = "Theoph.csv")
Now, there is a csv file called “Theoph.csv”. Use read.csv() to load it to your wrok space. In addtion, assign the data and named it dat
dat <- read.csv(file = "Theoph.csv")
Sometimes you might need to focus on specific subgroup (individual). Use subset() to select Subject 1.
subset(Theoph, Subject == 1)
## Subject Wt Dose Time conc
## 1 1 79.6 4.02 0.00 0.74
## 2 1 79.6 4.02 0.25 2.84
## 3 1 79.6 4.02 0.57 6.57
## 4 1 79.6 4.02 1.12 10.50
## 5 1 79.6 4.02 2.02 9.66
## 6 1 79.6 4.02 3.82 8.58
## 7 1 79.6 4.02 5.10 8.36
## 8 1 79.6 4.02 7.03 7.47
## 9 1 79.6 4.02 9.05 6.89
## 10 1 79.6 4.02 12.12 5.94
## 11 1 79.6 4.02 24.37 3.28
Assign the variable S1 to store the information of Subject 1. Then, try using $ and [] to select the blood concentraion
S1 <- subset(x = Theoph, Subject==1)
S1$conc
## [1] 0.74 2.84 6.57 10.50 9.66 8.58 8.36 7.47 6.89 5.94 3.28
How’s the average concentration of Subject 1?
mean(S1$conc)
## [1] 6.439091
How’s the maximum concentration (\(Cmax\)) of Subject 1?
max(S1$conc)
## [1] 10.5
Plot the PK profile for subject 1.
plot(x = S1$Time, y = S1$conc)
Plot the PK profile for all subjects.
plot(x = Theoph$Time, y = Theoph$conc, col = Theoph$Subject)
Try to seperate each individaul to different sub figures.
par(mfrow=c(m=3,n=4))
for(i in 1:12){
Si <- subset(x = Theoph, Subject == i) # Select subject
plot(x = Si$Time, y = Si$conc, main = i) # plot PK profile
}
Use library() to load tidyverse
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Use ggplot to generate the PK profile for all subjects.
Theoph %>% ggplot(aes(x = Time, y =conc, color = Subject)) +
geom_point()
Look into the data tye of Theoph$Subject
class(Theoph$Subject)
## [1] "ordered" "factor"
The currnt order of the subject is based on the peak concentration. Try to reorder it by number.
Theoph$Subject
## [1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3
## [24] 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 5 5
## [47] 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 7 7 7
## [70] 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9
## [93] 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
## [116] 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12
## Levels: 6 < 7 < 8 < 11 < 3 < 2 < 4 < 9 < 12 < 10 < 1 < 5
Re-create the ggplot and check the different with previous plot.
Theoph$Subject <- factor(Theoph$Subject,
level = c("1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11","12"))
Try to seperate each individaul to different sub figures.
Theoph$Subject
## [1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 3
## [24] 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 5 5
## [47] 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 7 7 7
## [70] 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9
## [93] 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11
## [116] 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12
## Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10 < 11 < 12
Find the Tmax - First, assign the peak concentration and named it S1.Cmax
Theoph %>% ggplot(aes(x = Time, y =conc, color = Subject)) +
geom_point() + geom_line() +
labs(x = "Time (hr)", y = "Concentration (mg/L)")
which() to find the corresponding row number and assigned the variable Cmax.ptTheoph %>% ggplot(aes(x = Time, y =conc)) +
geom_point() + geom_line() +
facet_wrap(~Subject) +
labs(x = "Time (hr)", y = "Concentration (mg/L)")
Cmax.pt to find the corresponding \(T_{max}\)S1.Cmax <- max(S1$conc)
S1.Cmax
## [1] 10.5
Use pipline %>%, group_by(), and summarise() to manipulate the data and find peak concentration for all subjects. Assigned the new variable Cmax_pop
which(S1$conc == S1.Cmax)
## [1] 4
Plot histogram and baplot
Cmax.pt <- which(S1$conc == S1.Cmax)
T_max <- S1$Time[Cmax.pt]
T_max
## [1] 1.12
Try to figure ou the relationship between body weight and peak concentration. - Summarize the average body weight for each individual
Cmax_pop <- Theoph %>% group_by(Subject) %>% summarise(C_max = max(conc))
Cmax_pop
## # A tibble: 12 x 2
## Subject C_max
## <ord> <dbl>
## 1 1 10.5
## 2 2 8.33
## 3 3 8.2
## 4 4 8.6
## 5 5 11.4
## 6 6 6.44
## 7 7 7.09
## 8 8 7.56
## 9 9 9.03
## 10 10 10.2
## 11 11 8
## 12 12 9.75
summary(Cmax_pop$C_max)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.440 7.890 8.465 8.759 9.865 11.400
hist(Cmax_pop$C_max) # histogram
boxplot(Cmax_pop$C_max) # boxplot
Theoph %>% group_by(Subject) %>%
summarise(C_max = max(conc), Wt=mean(Wt)) %>%
ggplot(aes(x=Wt, y=C_max)) + geom_point()
Theoph %>% group_by(Subject) %>%
summarise(C_max = max(conc), Wt=mean(Wt)) %>%
ggplot(aes(x=Wt, y=C_max)) + geom_point() + geom_smooth(method = "lm")
Create the table that summarrize the \(C_{max}\) and \(T_{max}\) for each individual. - First, generate data frame df that include subject number, \(C_{max}\) and \(T_{max}\)
df <- Theoph %>% group_by(Subject) %>% summarize(C_max = max(conc))
df$T_max <- NA
df
## # A tibble: 12 x 3
## Subject C_max T_max
## <ord> <dbl> <lgl>
## 1 1 10.5 NA
## 2 2 8.33 NA
## 3 3 8.2 NA
## 4 4 8.6 NA
## 5 5 11.4 NA
## 6 6 6.44 NA
## 7 7 7.09 NA
## 8 8 7.56 NA
## 9 9 9.03 NA
## 10 10 10.2 NA
## 11 11 8 NA
## 12 12 9.75 NA
dat <- subset(Theoph, Subject==1)
C_max.pt <- which(dat$conc == df$C_max[1])
df$T_max[1] <- dat$Time[C_max.pt]
df
## # A tibble: 12 x 3
## Subject C_max T_max
## <ord> <dbl> <dbl>
## 1 1 10.5 1.12
## 2 2 8.33 NA
## 3 3 8.2 NA
## 4 4 8.6 NA
## 5 5 11.4 NA
## 6 6 6.44 NA
## 7 7 7.09 NA
## 8 8 7.56 NA
## 9 9 9.03 NA
## 10 10 10.2 NA
## 11 11 8 NA
## 12 12 9.75 NA
for(i in 2:12){
dat <- subset(Theoph, Subject==i)
C_max.pt <- which(dat$conc == df$C_max[i])
df$T_max[i] <- dat$Time[C_max.pt]
}
df
## # A tibble: 12 x 3
## Subject C_max T_max
## <ord> <dbl> <dbl>
## 1 1 10.5 1.12
## 2 2 8.33 1.92
## 3 3 8.2 1.02
## 4 4 8.6 1.07
## 5 5 11.4 1
## 6 6 6.44 1.15
## 7 7 7.09 3.48
## 8 8 7.56 2.02
## 9 9 9.03 0.63
## 10 10 10.2 3.55
## 11 11 8 0.98
## 12 12 9.75 3.52