Overview

Getting Started
General R Use
Data Handling for 'Pharmacometricians'
Plotting
What else?

— .class #id

Getting Started

While we start the overview of R we wanted to make sure everyone has the necessary components set-up for R use. Please open the setup.R file and we will show everyone how to run the code. If you have any problems please put up a RED sticky and a helper will come assist

R Vocabulary

packages are add on features to R that include data, new functions and methods, and extended capabilities. Think of them as “apps'' on your phone. We've already installed several!
terminal this is the main window of R where you enter commands
scripts these are where you store commands to be run in the terminal later, like syntax files in SPSS or .do files in Stata
functions commands that do something to an object in R
data frame the main element for statistical purposes, an object with rows and columns that includes numbers, factors, and other data types
workspace the working memory of R where all objects are stored
vector the basic unit of data in R
symbols used to name and store objects or to designate operations/functions
attributes determine how functions act on objects

Components of an R Setup

R - R works in the command line of any OS, but also comes with a basic GUI to operate on its own in Windows and Mac download
RStudio - a much better way to work in R that allows editing of scripts, operation of R, viewing of the workspace, and R help all on one screen download

Rstudio

The R workspace in RStudio

Let's Look at RStudio

RStudio has made R more accessible and easy to use than ever!
4 panels, various tabs
Help, plots, file structure
Workspace, history, version control
Working files
The R Console

Self-help

Let's see, type: ?summary
Now type: ??regression
For tricky questions, funky error messages (there are many), and other issues, use Google (include "in R” to the end of your query)
We can also use RSeek - the search engine just for R!
StackOverflow has become a great resource with many questions for many specific packages in R, and a rating system for answers
A number of R Core members contribute there

Self-help (2)

Sometimes R Help can be a bit prickly and unfriendly…
The most important part of getting help is being able to ask your question with a reproducible example (i.e. some short simulated data and code that doesn't do what you want)
Like this:

Dose <- c(0.1, 0.1, 0.1)
Weight <- c(52, "12", 38)
# Trying to calculate amount to give for weight-based dosing
Dose * Weight

## Error: non-numeric argument to binary operator

In this case, someone could replicate it, allowing them to notice that the '12' is a character rather than numeric and point it out.

For R Help etiquette (for the tough problems) see the great advice here

R As A Calculator

2 + 2  ## add numbers
2 * pi  #multiply by a constant
7 + runif(1, min = 0, max = 1)  #add a random variable
4^4  # powers
sqrt(4^4)  # functions

Arithmetic Operators

In addition to the obvious + - = / * and exponential ^, there is also integer division %/% and remainder in integer division (known as modulo arithmetic) %%

2 + 2
2/2
2 * 2
2^2
2 == 2
23%/%2
23%%2

Other Key Symbols

<- is the assignment operator, it declares something is something else

ID <- 3
ID

## [1] 3

TRT <- "placebo"
TRT

## [1] "placebo"

Other Key Symbols (II)

: is the sequence operator

1:10

##  [1]  1  2  3  4  5  6  7  8  9 10

# it increments by one
a <- 100:120
a

##  [1] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116
## [18] 117 118 119 120

This is handy

Sequence and Repeat

Can have more fine control

# seq(from #, to #, by increment)
seq(0, 24, 0.5)

##  [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5
## [15]  7.0  7.5  8.0  8.5  9.0  9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5
## [29] 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5 18.0 18.5 19.0 19.5 20.0 20.5
## [43] 21.0 21.5 22.0 22.5 23.0 23.5 24.0


# rep (values to repeat, # of times to repeat, ...)
rep(c("a", "b"), 3)

## [1] "a" "b" "a" "b" "a" "b"

Comments in R

# denotes a comment in R
Anything after the # is not evaluated and ignored in R
This is handy for making things reproducible


# weight of female adults patients in kg
weight <- c(46.3, 51, 42)

R Advanced Math

R also supports advanced mathematical features and expressions
R can take integrals and derivatives and express complex functions
Easiest of all, R can generate distributions of data very easily
- e.g. rnorm(100) or rbinom(100)
This comes in handy when writing examples and building analyses because it is trivial to generate a synthetic piece of data to use as an example
Try typing hist(rnorm(10000))

Using the Workspace

To do more we need to learn how to manipulate the 'workspace'.
This includes all the vectors, datasets, and functions stored in memory.
All R objects are stored in the memory of the computer, limiting the available space for calculation to the size of the RAM on your machine.
R makes organizing the workspace easy.

Using the Workspace (2)

x<-5 #store a variable with <-
x    #print the variable
z<-3 
ls() #list all variables
ls.str() #list and describe variables
rm(x)    # delete a variable
ls()

You can also also visually see what is in your workspace in Rstudio

R as a Language

R is more than statistical software, it is a computer language
Like any language it has rules (some poorly enforced), and conventions
Case sensitivity matters

a <- 3
A <- 4
print(c(a, A))

## [1] 3 4

a ≠ A 2. What happens if I type print(a,A)?

Concatenating with `c`

So what does c do?

A <- c(3, 4)
print(A)

## [1] 3 4

c stands for concatenate and allows vectors to have multiple elements
If you ever need two elements in a vector, you need to wrap it up in c, which is one of the most used functions you will ever use
c is important to put any vector together, but remember that objects within a vector must all be of the same type

Unintended Consequences

R is smart - it will often infer what you are trying to do, but this can sometimes get you into trouble

f <- c(3, "a")
class(f)

## [1] "character"

f[2] <- 5
f

## [1] "3" "5"

class(f)

## [1] "character"

Some Other Language Bugs Features

R is maddeningly inconsistent in it's naming conventions
- Some functions are camelCase; others are.dot.separated; others use_underscores
- Function results are stored in a variety of ways across function implementations
- R has multiple graphics packages that different functions use for default plot construction (base, grid, lattice, and ggplot2)
- R has multiple packages and functions to do the same analysis as well, though some standardization has started to occur
- Be flexible and be aware of R's flexibility

Special Operators

The comparison operators <, >, <=, >=, ==, and != are used to compare values across vectors

big <- c(9, 12, 15, 25)
small <- c(9, 3, 4, 2)
# Give us a nice vector of logical values
big > small

## [1] FALSE  TRUE  TRUE  TRUE

big = small
# Oops--don't do this, reassigns big to small
print(big)

## [1] 9 3 4 2

Comparison operators can be tricky, so to keep it straight never use = or == to assign anything, always use <-

Special Operators II

The best way to evaluate these objects is to use brackets [] to avoid confusion

big <- c(9, 12, 15, 25)
big[big == small]

## [1] 9

# Returns values where the logical vector is true
big[big > small]

## [1] 12 15 25

big[big < small]  # Returns an empty set

## numeric(0)

Special operators (III)

The logical operators | (or) and & (and) can be used to combine two logical values and produce another logical value as the result. The operator ! (not) negates a logical value. These operators allow complex conditions to be constructed.

foo <- c(NA, 4, 9, 8.7)
!is.na(foo)  # Returns TRUE for non-NA

## [1] FALSE  TRUE  TRUE  TRUE

a <- foo[!is.na(foo)]
a

## [1] 4.0 9.0 8.7

Data Structures in R

R has a number of basic data classes as well as arbitrary specialized object types for various purposes
vectors are the basic data class in R and can be thought of as a single column of data (even a column of length 1)
matrices and arrays are rows and columns of all the same mode data
data frames are rows and columns where the columns can represent different data types
lists are arbitrary combinations of disparate object types in R

R for Pharmacometricians

Data Handling
Data Manipulation
- Some Frequent Tasks
Plotting
Reporting

The rest of this presenation will be example-driven, and will often draw on the quinidine dataset provided or the built-in R datasets

Reading in Data

Data from a variety of formats an be read into R - two of the most common are .tab and .csv

Lets read in the dataset:

data <- read.csv("./data/PopPKIVdata.csv", stringsAsFactors = F, skip = 3, header = T)
head(data)

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR
## 1   1 0.00  0.00 100  100   1 34.82 38.21 1.113   0 42.63
## 2   1 0.25 13.03   0  100   0 34.82 38.21 1.113   0 42.63
## 3   1 0.50 14.98   0  100   0 34.82 38.21 1.113   0 42.63
## 4   1 0.75 14.16   0  100   0 34.82 38.21 1.113   0 42.63
## 5   1 1.00 19.32   0  100   0 34.82 38.21 1.113   0 42.63
## 6   1 1.50 13.15   0  100   0 34.82 38.21 1.113   0 42.63

We can quickly find additional information about the dataset

# What are the column names?
names(data)

##  [1] "CID"  "TIME" "CONC" "AMT"  "DOSE" "MDV"  "AGE"  "WT"   "SCR"  "ISM" 
## [11] "CLCR"

# how long is the dataset - did everything read in correctly?
nrow(data)

## [1] 1600

# numer of columns
ncol(data)

## [1] 11

# What is the structure of each column
str(data)

## 'data.frame':    1600 obs. of  11 variables:
##  $ CID : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ TIME: num  0 0.25 0.5 0.75 1 1.5 2 2.5 3 4 ...
##  $ CONC: num  0 13 15 14.2 19.3 ...
##  $ AMT : int  100 0 0 0 0 0 0 0 0 0 ...
##  $ DOSE: int  100 100 100 100 100 100 100 100 100 100 ...
##  $ MDV : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ AGE : num  34.8 34.8 34.8 34.8 34.8 ...
##  $ WT  : num  38.2 38.2 38.2 38.2 38.2 ...
##  $ SCR : num  1.11 1.11 1.11 1.11 1.11 ...
##  $ ISM : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CLCR: num  42.6 42.6 42.6 42.6 42.6 ...

We can get some summary statistics

summary(data)

##       CID             TIME             CONC             AMT       
##  Min.   :  1.0   Min.   : 0.000   Min.   :  0.00   Min.   :  0.0  
##  1st Qu.: 25.8   1st Qu.: 0.938   1st Qu.:  6.93   1st Qu.:  0.0  
##  Median : 50.5   Median : 2.750   Median : 13.49   Median :  0.0  
##  Mean   : 50.5   Mean   : 6.344   Mean   : 18.11   Mean   : 10.9  
##  3rd Qu.: 75.2   3rd Qu.: 9.000   3rd Qu.: 24.95   3rd Qu.:  0.0  
##  Max.   :100.0   Max.   :24.000   Max.   :117.37   Max.   :250.0  
##       DOSE          MDV              AGE             WT       
##  Min.   :100   Min.   :0.0000   Min.   :19.2   Min.   : 20.4  
##  1st Qu.:100   1st Qu.:0.0000   1st Qu.:33.8   1st Qu.: 40.1  
##  Median :175   Median :0.0000   Median :41.3   Median : 49.8  
##  Mean   :175   Mean   :0.0625   Mean   :43.0   Mean   : 51.9  
##  3rd Qu.:250   3rd Qu.:0.0000   3rd Qu.:48.7   3rd Qu.: 59.2  
##  Max.   :250   Max.   :1.0000   Max.   :79.3   Max.   :113.9  
##       SCR             ISM            CLCR      
##  Min.   :0.760   Min.   :0.00   Min.   : 19.0  
##  1st Qu.:0.976   1st Qu.:0.00   1st Qu.: 39.9  
##  Median :1.114   Median :0.00   Median : 53.8  
##  Mean   :1.143   Mean   :0.47   Mean   : 58.1  
##  3rd Qu.:1.267   3rd Qu.:1.00   3rd Qu.: 72.9  
##  Max.   :1.720   Max.   :1.00   Max.   :129.0

Summaries with table()

table(data$ISM)

## 
##   0   1 
## 848 752

table(data$ISM, data$DOSE)

##    
##     100 250
##   0 448 400
##   1 352 400

Working with data frames

class(data)

Data frames are uniform in shape and information inside a column - this makes it easy to work with!
[ ]'s are an easy way to get to certain parts of a data frame
A data frame is written as: data[row, column]

data[2, ]

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR
## 2   1 0.25 13.03   0  100   0 34.82 38.21 1.113   0 42.63

head(data[, 2], n = 16)

##  [1]  0.00  0.25  0.50  0.75  1.00  1.50  2.00  2.50  3.00  4.00  6.00
## [12]  8.00 12.00 16.00 20.00 24.00

Columns using $

Can easily call on or manipulate a single column via dataframe$colname

head(data$AMT)

## [1] 100   0   0   0   0   0

unique(data$TIME)

##  [1]  0.00  0.25  0.50  0.75  1.00  1.50  2.00  2.50  3.00  4.00  6.00
## [12]  8.00 12.00 16.00 20.00 24.00

POP QUIZ

How can easily check the number of unique individuals in the dataset using what we've learned so far?

ANSWER

length(unique(data$CID))

## [1] 100

Subsetting

Can easily subset rows of the data frame using the [ ] in combination with $

ID1 <- data[data$CID == 1, ]
head(ID1, n = 3)

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR
## 1   1 0.00  0.00 100  100   1 34.82 38.21 1.113   0 42.63
## 2   1 0.25 13.03   0  100   0 34.82 38.21 1.113   0 42.63
## 3   1 0.50 14.98   0  100   0 34.82 38.21 1.113   0 42.63

tail(ID1, n = 3)

##    CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR
## 14   1   16 2.418   0  100   0 34.82 38.21 1.113   0 42.63
## 15   1   20 4.759   0  100   0 34.82 38.21 1.113   0 42.63
## 16   1   24 2.365   0  100   0 34.82 38.21 1.113   0 42.63

Subsetting (II)

Or can take only a subset of columns

datasmall <- data[, c(1:4)]

or using the column names

keepcols <- c("CID", "TIME", "CONC", "AMT")
datasmall <- data[, keepcols]

Using both together


keepcols <- c("CID", "TIME", "CONC", "AMT")
ID1 <- data[data$CID == 1, keepcols]
head(ID1)

##   CID TIME  CONC AMT
## 1   1 0.00  0.00 100
## 2   1 0.25 13.03   0
## 3   1 0.50 14.98   0
## 4   1 0.75 14.16   0
## 5   1 1.00 19.32   0
## 6   1 1.50 13.15   0

Multiple subsets

You can also use the & and | operators to do multiple subsets at once

Let's make a dataset with only males less than 40 years old

youngmales <- data[data$ISM != 0 & data$AGE <= 40, ]
head(youngmales[!duplicated(youngmales$CID), ])

##     CID TIME CONC AMT DOSE MDV   AGE    WT    SCR ISM   CLCR
## 17    2    0    0 100  100   1 32.77 74.84 0.8846   1 126.00
## 33    3    0    0 100  100   1 35.97 37.30 1.1004   1  48.98
## 49    4    0    0 100  100   1 38.21 32.97 1.1972   1  38.93
## 257  17    0    0 100  100   1 33.80 49.64 1.0998   1  66.57
## 337  22    0    0 100  100   1 33.14 52.33 1.4894   1  52.15
## 401  26    0    0 100  100   1 31.03 27.41 0.9585   1  43.28

Subsetting to assign values

What if we needed to change all the 0's in the AMT column to NA values

# so we can keep the original dataset
changedata <- data
head(changedata, n = 3)

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR
## 1   1 0.00  0.00 100  100   1 34.82 38.21 1.113   0 42.63
## 2   1 0.25 13.03   0  100   0 34.82 38.21 1.113   0 42.63
## 3   1 0.50 14.98   0  100   0 34.82 38.21 1.113   0 42.63

changedata[changedata$AMT == 0, "AMT"] <- "NA"
head(changedata, n = 3)

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR
## 1   1 0.00  0.00 100  100   1 34.82 38.21 1.113   0 42.63
## 2   1 0.25 13.03  NA  100   0 34.82 38.21 1.113   0 42.63
## 3   1 0.50 14.98  NA  100   0 34.82 38.21 1.113   0 42.63

Adding Columns to a DF with $

# find the mean age and add it to new column
data$MEANAGE <- mean(data$AGE)
head(data, n = 3)

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR MEANAGE
## 1   1 0.00  0.00 100  100   1 34.82 38.21 1.113   0 42.63   42.99
## 2   1 0.25 13.03   0  100   0 34.82 38.21 1.113   0 42.63   42.99
## 3   1 0.50 14.98   0  100   0 34.82 38.21 1.113   0 42.63   42.99

Transforming data

What if we wanted time to be in minutes rather than hours

data$TIMEMIN <- data$TIME * 60
head(data)

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR TIMEMIN
## 1   1 0.00  0.00 100  100   1 34.82 38.21 1.113   0 42.63       0
## 2   1 0.25 13.03   0  100   0 34.82 38.21 1.113   0 42.63      15
## 3   1 0.50 14.98   0  100   0 34.82 38.21 1.113   0 42.63      30
## 4   1 0.75 14.16   0  100   0 34.82 38.21 1.113   0 42.63      45
## 5   1 1.00 19.32   0  100   0 34.82 38.21 1.113   0 42.63      60
## 6   1 1.50 13.15   0  100   0 34.82 38.21 1.113   0 42.63      90

data$TIMEMIN <- NULL

Create some Data to Merge

tableOne <- data.frame(ID = c(1:3), PERIOD = c("a", "b", "c"))
tableTwo <- data.frame(ID = c(seq(1, 5, 2)), PERIOD = c("a", "c", "f"), ISM = c(0, 
    1, 0))

tableOne

##   ID PERIOD
## 1  1      a
## 2  2      b
## 3  3      c

tableTwo

##   ID PERIOD ISM
## 1  1      a   0
## 2  3      c   1
## 3  5      f   0

Merging Options

Can merge by similarities in specific column, specific index, or can combine all

merge(tableOne, tableTwo, by = "ID")

##   ID PERIOD.x PERIOD.y ISM
## 1  1        a        a   0
## 2  3        c        c   1

merge(tableOne, tableTwo, by = "ID", all = T)

##   ID PERIOD.x PERIOD.y ISM
## 1  1        a        a   0
## 2  2        b     <NA>  NA
## 3  3        c        c   1
## 4  5     <NA>        f   0

Merging Options (II)

merge(tableOne, tableTwo, by = "ID", all.x = T)

##   ID PERIOD.x PERIOD.y ISM
## 1  1        a        a   0
## 2  2        b     <NA>  NA
## 3  3        c        c   1

merge(tableOne, tableTwo, by = "ID", all.y = T)

##   ID PERIOD.x PERIOD.y ISM
## 1  1        a        a   0
## 2  3        c        c   1
## 3  5     <NA>        f   0

Merging Options (III)

Can also bind together next to one another

cbind(tableOne, tableTwo)

##   ID PERIOD ID PERIOD ISM
## 1  1      a  1      a   0
## 2  2      b  3      c   1
## 3  3      c  5      f   0

Merging Options (IV)

Can stack two data frames using rbind but data frames must have equal number of columns

tableOne <- data.frame(ID = c(1:3), PERIOD = c("a", "b", "c"))
tableTwo <- data.frame(ID = c(seq(1, 5, 2)), PERIOD = c("a", "c", "f"))
rbind(tableOne, tableTwo)

##   ID PERIOD
## 1  1      a
## 2  2      b
## 3  3      c
## 4  1      a
## 5  3      c
## 6  5      f

Reshaping Data

There is an excellent package (library) called reshape2

Let's load it and get some data to play with

library(reshape2)
keepcols <- c("CID", "TIME", "CONC")
concTime <- data[data$CID < 10, keepcols]
tail(concTime)

##     CID TIME   CONC
## 139   9    6 20.458
## 140   9    8 17.128
## 141   9   12 12.192
## 142   9   16  5.068
## 143   9   20 10.035
## 144   9   24  8.961

How can we make a table of concentrations by time for all ID's? (long to wide conversion)

molten <- melt(concTime, id.vars = c("TIME", "CID"), measure.vars = c("CONC"))
head(molten)

##   TIME CID variable value
## 1 0.00   1     CONC  0.00
## 2 0.25   1     CONC 13.03
## 3 0.50   1     CONC 14.98
## 4 0.75   1     CONC 14.16
## 5 1.00   1     CONC 19.32
## 6 1.50   1     CONC 13.15

concTimeWide <- dcast(molten, TIME + variable ~ CID)
head(concTimeWide, n = 4)

##   TIME variable     1      2     3     4     5      6    7     8     9
## 1 0.00     CONC  0.00  0.000  0.00  0.00  0.00  0.000  0.0  0.00  0.00
## 2 0.25     CONC 13.03  7.420 12.80 24.69 11.33  7.453 12.1 30.88 37.13
## 3 0.50     CONC 14.98  4.987 15.15 34.27 12.68 12.971 11.1 30.39 35.43
## 4 0.75     CONC 14.16 13.630 12.30 32.88 12.16  9.782 11.3 25.95 30.43

PLYR

Lets introduce another powerful library plyr

It is based around the paradigm SPLIT-APPLY-COMBINE

library(plyr)

cmax <- ddply(data, .(CID), summarize, CMAXobs = max(CONC))
head(cmax)

##   CID CMAXobs
## 1   1   19.32
## 2   2   14.52
## 3   3   15.15
## 4   4   34.27
## 5   5   13.41
## 6   6   17.07

Lets merge the CMAX column back into the dataset

data <- merge(data, cmax)
head(data, n = 3)

##   CID TIME  CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR CMAXobs
## 1   1 0.00  0.00 100  100   1 34.82 38.21 1.113   0 42.63   19.32
## 2   1 0.25 13.03   0  100   0 34.82 38.21 1.113   0 42.63   19.32
## 3   1 0.50 14.98   0  100   0 34.82 38.21 1.113   0 42.63   19.32

tail(data, n = 3)

##      CID TIME   CONC AMT DOSE MDV   AGE    WT   SCR ISM  CLCR CMAXobs
## 1598 100   16  3.969   0  250   0 40.15 58.73 1.248   0 55.49   24.05
## 1599 100   20 11.177   0  250   0 40.15 58.73 1.248   0 55.49   24.05
## 1600 100   24  6.903   0  250   0 40.15 58.73 1.248   0 55.49   24.05

Sorting Data

Lets sort the data by time

keep <- c("CID", "TIME", "CONC", "DOSE")
sorted <- data[, keep]
sorted <- sorted[order(sorted$TIME), ]
head(sorted)

##    CID TIME CONC DOSE
## 1    1    0    0  100
## 17   2    0    0  100
## 33   3    0    0  100
## 49   4    0    0  100
## 65   5    0    0  100
## 81   6    0    0  100

Multiple Sorts at Once

First let's 'mess up' the dataset - sort by descending concentrations

concSort <- sort[order(-sort$CONC), ]

## Error: object of type 'closure' is not subsettable

head(concSort)

## Error: object 'concSort' not found

##Mutliple Sorts at Once (II)

let's see what happens if we sort by ID

concSort <- concSort[order(concSort$CID), ]

## Error: object 'concSort' not found

head(concSort)

## Error: object 'concSort' not found

Oh no the times aren't in order!

Multiple Sorts at Once (III)

Thankfully we can fix this by sorting ID and time within ID

concSort <- concSort[order(concSort$CID, concSort$TIME), ]

## Error: object 'concSort' not found

head(concSort)

## Error: object 'concSort' not found

Sample Dosing Sheet Creation

conc <- c(0)
nSubj <- 40
times <- c(0, 0.5, 1, 2, 3, 4, 6, 8, 12, 16, 24)
t <- expand.grid(TIME = times, CID = 1:nSubj, CONC = conc)
t <- as.data.frame(t)
democols <- c("CID", "AGE", "WT", "CLCR")
demo <- unique(data[, democols])
res <- merge(t, demo, by = "CID")
head(res)

##   CID TIME CONC   AGE    WT  CLCR
## 1   1  0.0    0 34.82 38.21 42.63
## 2   1  0.5    0 34.82 38.21 42.63
## 3   1  1.0    0 34.82 38.21 42.63
## 4   1  2.0    0 34.82 38.21 42.63
## 5   1  3.0    0 34.82 38.21 42.63
## 6   1  4.0    0 34.82 38.21 42.63

Plotting

There are three main graphics systems that have evolved with R to allow for plotting
- Base
- Lattice
- ggplot2

ggplot is the newest to the 'game' but has risen to be the most popular package due to its ease of use, consistent style, and quality graphics it produces.

Intro to ggplot

Lets dive right in and create a subset of the original dataset provided

# lets take a subset of some individuals - make sure get some from 250 mg
# and 100 mg doses
PKdata100 <- data[data$CID <= 6 & data$DOSE == 100, ]
PKdata250 <- data[data$CID >= 95 & data$DOSE == 250, ]
PKdata <- rbind(PKdata100, PKdata250)
PKdataID <- PKdata[!duplicated(PKdata$CID), ]
# lets check to make sure we have males and females at each dose group (for
# later examples)
table(PKdataID$ISM, PKdataID$DOSE)

##    
##     100 250
##   0   3   2
##   1   3   4

qplot for quick visualizations

library(ggplot2)
qplot(data = PKdata, x = TIME, y = CONC)

plot of chunk unnamed-chunk-42

qplot for quick visualizations

qplot(data = PKdata, x = TIME, y = CONC)

plot of chunk unnamed-chunk-43

Additional customization

Display additional variables through the use of:
- geoms: how the data is displayed
- aesthetics: control of shape, color, and size of geom
- facetting: how geoms are displayed

Let's Add Some Color

qplot(data = PKdata, x = TIME, y = CONC, color = as.factor(CID))

# Add lines as well as a horizontal line at a y-intercept of 40
qplot(data = PKdata, x = TIME, y = CONC, color = as.factor(CID), geom = c("point", 
    "line")) + geom_hline(aes(yintercept = 40))

plot of chunk unnamed-chunk-45

Statistical Plots Are Easy Too!

IDinfo <- data[!duplicated(data$CID), ]
qplot(data = IDinfo, x = AGE)

plot of chunk unnamed-chunk-46

# change the binwidth and add a red vertical line dotted line @ median age
qplot(data = IDinfo, x = AGE, binwidth = 5) + geom_vline(aes(xintercept = median(AGE), 
    color = "red"), linetype = 2)

plot of chunk unnamed-chunk-47

# lets look at the distribution of males and females
qplot(data = IDinfo, x = AGE, fill = as.factor(ISM))

plot of chunk unnamed-chunk-48

What About Facetting

qplot(data = IDinfo, x = AGE, binwidth = 5) + facet_wrap(~ISM)

plot of chunk unnamed-chunk-49

Boxplots

qplot(data = IDinfo, x = as.factor(ISM), y = AGE, geom = "boxplot")

plot of chunk unnamed-chunk-50

More Boxplot Options

qplot(data = IDinfo, x = as.factor(ISM), y = AGE, geom = c("boxplot", "point"))

plot of chunk unnamed-chunk-51

'ggplot'

qplot is a basic function in the ggplot package to allow for quick graphics. To really start fine tuning your control you can use the ggplot function that will allow easy control of everything about the plot.
to learn more here are a view great resources:

What's Missing?

What we've shown has only scratched the surface of R's capabilities - so what else can be done?

Functions
Statistics
Reporting
Version Control/Reproducible Research

Overview

Getting Started

R Vocabulary

Components of an R Setup

Rstudio

Let's Look at RStudio

Self-help

Self-help (2)

R As A Calculator

Arithmetic Operators

Other Key Symbols

Other Key Symbols (II)

Sequence and Repeat

Comments in R

R Advanced Math

Using the Workspace

Using the Workspace (2)

R as a Language

Concatenating with c

Unintended Consequences

Some Other Language Bugs Features

Special Operators

Special Operators II

Special operators (III)

Data Structures in R

R for Pharmacometricians

Reading in Data

Summaries with table()

Working with data frames

Columns using $

POP QUIZ

ANSWER

Subsetting

Subsetting (II)

Using both together

Multiple subsets

Subsetting to assign values

Adding Columns to a DF with $

Transforming data

Create some Data to Merge

Merging Options

Merging Options (II)

Merging Options (III)

Merging Options (IV)

Reshaping Data

PLYR

Sorting Data

Multiple Sorts at Once

Multiple Sorts at Once (III)

Sample Dosing Sheet Creation

Plotting

Intro to ggplot

qplot for quick visualizations

qplot for quick visualizations

Additional customization

Let's Add Some Color

Statistical Plots Are Easy Too!

What About Facetting

Boxplots

More Boxplot Options

'ggplot'

What's Missing?

Concatenating with `c`