R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. To download R, please choose your preferred CRAN mirror.
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.
One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.
R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
Many users think of R as a statistics system. We prefer to think of it as an environment within which statistical techniques are implemented. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.
R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.
Once you have downloaded and installed R with the default settings, you can launch it from your Start Menu. R opens to the R Console, where you write your R commands and functions. Alternatively, you may use R Commander.
R provides a powerful and comprehensive system for analysing data and when used in conjunction with the R-commander (a graphical user interface, commonly known as Rcmdr) it also provides one that is easy and intuitive to use. Basically, R provides the engine that carries out the analyses and Rcmdr provides a convenient way for users to input commands. The Rcmdr program enables analysts to access a selection of commonly-used R commands using a simple interface that should be familiar to most computer users. It also serves the important role of helping users to implement R commands and develop their knowledge and expertise in using the command line — an important skill for those wishing to exploit the full power of the program.
To download RStudio Desktop (Free Version), visit https://www.rstudio.com/products/rstudio/download3/. To learn about using the RStudio IDE, visit http://dss.princeton.edu/training/RStudio101.pdf.
The help() function and ? help operator in R provide access to the documentation pages for R functions, data sets, and other objects, both for packages in the standard R distribution and for contributed packages. To access documentation for the standard lm (linear model) function, for example, enter the command help(lm) or help("lm"), or ?lm or ?"lm" (i.e., the quotes are optional). To access help for a function in a package that’s not currently loaded, specify in addition the name of the package: for example, to obtain documentation for the rlm() (robust linear model) function in the MASS package, enter help(rlm, package="MASS").
Standard names in R consist of upper- and lower-case letters, numerals (0-9), underscores (_), and periods (.), and must begin with a letter or a period (a leading period must not be followed by a numeral). To obtain help for an object with a non-standard name (such as the help operator ?), the name must be quoted: for example, help("?") or ?"?".
You may also use the help() function to access information about a package in your library, for example help(package="MASS"), which displays an index of available help pages for the package along with some other information. Help pages for functions usually include a section with executable examples illustrating how the functions work. You can execute these examples in the current R session via the example() command: e.g., example(lm).
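The lookups described above can also be done programmatically. The following is a minimal sketch, not a definitive recipe: it assigns the result of help() to a variable so nothing is displayed, and it uses the splines package (part of the standard R distribution, not attached by default) as a stand-in for MASS, which may not be installed everywhere. The variable names are illustrative only.

```r
# help() returns an object describing where the help page lives;
# assigning it (instead of printing it) avoids opening a viewer.
topic_lm <- help("lm")
print(class(topic_lm))

# Help for a function in a package that is not attached:
# bs() lives in 'splines' (assumption: used here in place of MASS::rlm).
topic_bs <- help("bs", package = "splines")
print(length(topic_bs) >= 1)

# example() can return the example code as text instead of running it.
example_lines <- example("lm", give.lines = TRUE)
cat(head(example_lines, 5), sep = "\n")
```

Printing topic_lm at the console (or simply typing ?lm) would display the page itself.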
Many packages include vignettes, which are discursive documents meant to illustrate and explain facilities in the package. You can discover vignettes by accessing the help page for a package, or via the browseVignettes() function: the command browseVignettes() opens a list of vignettes from all of your installed packages in your browser, while browseVignettes(package=package-name) (e.g., browseVignettes(package="survival")) shows the vignettes, if any, for a particular package. vignette() is employed similarly, but displays a list of vignettes in text form. You can also use the vignette("vignette-name") command to view a vignette (possibly specifying the name of the package in which the vignette resides, if the vignette name is not unique): for example, vignette("timedep") or vignette("timedep", package="survival") (which are, in this case, equivalent). Vignettes may also be accessed from the CRAN page for the package (e.g. survival), if you wish to review the vignette for a package prior to installing and/or using it.
Packages may also include extended code demonstrations (“demos”). The command demo() lists all demos for all packages in your library, while demo(package="package-name") (e.g., demo(package="stats")) lists demos in a particular package. To run a demo, call the demo() function with the quoted name of the demo (e.g., demo("nlm")), specifying the name of the package if the name of the demo isn’t unique (e.g., demo("nlm", package="stats"), where, in this case, the package name need not be given explicitly).
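As a small illustration of the listing commands above, the sketch below captures the demo and vignette listings as objects rather than opening a browser or pager. It assumes the stats and grid packages (both shipped with R) are installed with their demos and vignettes; the variable names are illustrative.

```r
# demo(package = ...) returns a listing object; its $results matrix
# has one row per demo, with "Package" and "Item" columns.
stats_demos <- demo(package = "stats")
print(stats_demos$results[, c("Package", "Item")])

# vignette(package = ...) returns the same kind of listing in text form.
grid_vignettes <- vignette(package = "grid")
print(nrow(grid_vignettes$results))

# To actually run a demo or open a vignette interactively:
#   demo("nlm", package = "stats")
#   browseVignettes(package = "grid")
```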
The help() function and ? operator are useful only if you already know the name of the function that you wish to use. There are also facilities in the standard R distribution for discovering functions and other objects. The following functions cast a progressively wider net. Use the help system to obtain complete documentation for these functions: for example, ?apropos.
apropos()
The apropos() function searches for objects, including functions, directly accessible in the current R session that have names that include a specified character string. This may be a literal string or a regular expression to be used for pattern-matching (see ?"regular expression"). By default, string matching by apropos() is case-insensitive. For example, apropos("^glm") returns the names of all accessible objects that start with the (case-insensitive) characters “glm”.
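A short sketch of the behaviour just described, assuming a default session in which the stats package is attached (so that glm() is accessible):

```r
# Regular expression: names that start with "glm"
glm_hits <- apropos("^glm")
print(glm_hits)

# Matching is case-insensitive by default, so this finds the same names
glm_hits_upper <- apropos("^GLM")
print(identical(glm_hits, glm_hits_upper))

# Literal string: any accessible name containing "mean"
mean_hits <- apropos("mean")
print(head(mean_hits))
```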
help.search() and ??
The help.search() function scans the documentation for packages installed in your library. The (first) argument to help.search() is a character string or regular expression. For example, help.search("^glm") searches for help pages, vignettes, and code demos that have help “aliases,” “concepts,” or titles that begin (case-insensitively) with the characters “glm”. The ?? operator is a synonym for help.search(): for example, ??"^glm".
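The search can also be captured as an object and inspected. The sketch below restricts the search to the stats package through the package argument, purely to keep it fast; that restriction is a choice made for this example, not something the text above requires.

```r
# Search installed documentation for entries beginning with "glm",
# limited here to the 'stats' package to keep the search quick.
glm_search <- help.search("^glm", package = "stats")

# The result is a search object; $matches holds one row per hit.
print(class(glm_search))
print(head(glm_search$matches))
```

Typing ??"^glm" at the console runs the same kind of search across all installed packages and displays the results directly.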
RSiteSearch()
RSiteSearch() uses an internet search engine (also see below) to search for information in function help pages and vignettes for all CRAN packages, and in CRAN task views (described below). Unlike the apropos() and help.search() functions, RSiteSearch() requires an active internet connection and doesn’t employ regular expressions. Braces may be used to specify multi-word terms; otherwise matches for individual words are included. For example, RSiteSearch("{generalized linear model}") returns information about R functions, vignettes, and CRAN task views related to the term “generalized linear model” without matching the individual words “generalized”, “linear”, or “model”.
findfn() and ??? in the sos package, which is not part of the standard R distribution but is available on CRAN, provide an alternative interface to RSiteSearch().
help.start()
help.start() starts and displays a hypertext based version of R’s online documentation in your default browser that provides links to locally installed versions of the R manuals, a listing of your currently installed packages and other documentation resources.
R Help on the Internet
There are internet search sites that are specialized for R searches, including search.r-project.org (which is the site used by RSiteSearch) and Rseek.org.
It is also possible to use a general search site like Google, by qualifying the search with “R” or the name of an R package (or both). It can be particularly helpful to paste an error message into a search engine to find out whether others have solved a problem that you encountered.
CRAN Task Views are documents that summarize R resources on CRAN in particular areas of application, helping you to navigate the maze of thousands of CRAN packages. A list of available Task Views may be found on CRAN.
There are three primary FAQ listings which are periodically updated to reflect very commonly asked questions by R users. There is a Main R FAQ, a Windows specific R FAQ and a Mac OS (OS X) specific R FAQ.
If you find that you can’t answer a question or solve a problem yourself, you can ask others for help, either locally (if you know someone who is knowledgeable about R) or on the internet. In order to ask a question effectively, it helps to phrase the question clearly, and, if you’re trying to solve a problem, to include a small, self-contained, reproducible example of the problem that others can execute. For information on how to ask questions, see, e.g., the R mailing list posting guide, and the document about how to create reproducible examples for R on Stack Overflow.
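One possible shape for such a reproducible example is sketched below. The data frame and the question it illustrates (why mean() returns NA) are hypothetical, chosen only to show the pattern: build a tiny data set inline, share it with dput(), show the surprising result, and report the R version.

```r
# 1. A small, self-contained data set (hypothetical values)
repro_df <- data.frame(id = 1:3, score = c(2.5, NA, 4.0))

# 2. dput() prints R code that recreates the object exactly,
#    so others can paste it straight into their own session.
repro_text <- capture.output(dput(repro_df))
cat(repro_text, sep = "\n")

# 3. The behaviour being asked about
repro_result <- mean(repro_df$score)
print(repro_result)

# 4. The R version in use, which helpers often need to know
print(R.version.string)
```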
Stack Overflow is a well organized and formatted site for help and discussions about programming. It has excellent searchability. Topics are tagged, and “r” is a very popular tag on the site with almost 150,000 questions (as of summer 2016). To go directly to R-related topics, visit http://stackoverflow.com/questions/tagged/r. For an example both of the value of the site’s organization and information that is very useful to R users, see “How to make a great R reproducible example?”, which is also mentioned above.
The R Project maintains a number of subscription-based email lists for posing and answering questions about R, including the general R-help email list, the R-devel list for R code development, and R-package-devel list for developers of CRAN packages; lists for announcements about R and R packages; and a variety of more specialized lists. Before posing a question on one of these lists, please read the R mailing list instructions and the posting guide.
# Creating vectors
a <- c(1, 2, 5, 3, 6, -2, 4)
b <- c("one", "two", "three")
c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
# selecting specific datapoints
a <- c(1, 2, 5, 3, 6, -2, 4)
a[3]
## [1] 5
a[c(1, 3, 5)]
## [1] 1 5 6
a[2:6]
## [1] 2 5 3 6 -2
# Creating Matrices
y <- matrix(1:20, nrow = 5, ncol = 4)
y
## [,1] [,2] [,3] [,4]
## [1,] 1 6 11 16
## [2,] 2 7 12 17
## [3,] 3 8 13 18
## [4,] 4 9 14 19
## [5,] 5 10 15 20
cells <- c(1, 26, 24, 68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
dimnames = list(rnames, cnames))
mymatrix
## C1 C2
## R1 1 26
## R2 24 68
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = FALSE,
dimnames = list(rnames, cnames))
mymatrix
## C1 C2
## R1 1 24
## R2 26 68
# selecting elements
x <- matrix(1:10, nrow = 2)
x
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
x[2, ]
## [1] 2 4 6 8 10
x[, 2]
## [1] 3 4
x[1, 4]
## [1] 7
x[1, c(4, 5)]
## [1] 7 9
# Creating an array
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1,
dim2, dim3))
z
## , , C1
##
## B1 B2 B3
## A1 1 3 5
## A2 2 4 6
##
## , , C2
##
## B1 B2 B3
## A1 7 9 11
## A2 8 10 12
##
## , , C3
##
## B1 B2 B3
## A1 13 15 17
## A2 14 16 18
##
## , , C4
##
## B1 B2 B3
## A1 19 21 23
## A2 20 22 24
# Creating a dataframe
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes,
status)
patientdata
## patientID age diabetes status
## 1 1 25 Type1 Poor
## 2 2 34 Type2 Improved
## 3 3 28 Type1 Excellent
## 4 4 52 Type1 Poor
# Specifying elements of a dataframe
patientdata[1:2]
## patientID age
## 1 1 25
## 2 2 34
## 3 3 28
## 4 4 52
patientdata[c("diabetes", "status")]
## diabetes status
## 1 Type1 Poor
## 2 Type2 Improved
## 3 Type1 Excellent
## 4 Type1 Poor
patientdata$age
## [1] 25 34 28 52
# Using factors
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
diabetes <- factor(diabetes)
status <- factor(status, ordered = TRUE)
patientdata <- data.frame(patientID, age, diabetes,
status)
str(patientdata)
## 'data.frame': 4 obs. of 4 variables:
## $ patientID: num 1 2 3 4
## $ age : num 25 34 28 52
## $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
## $ status : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
summary(patientdata)
## patientID age diabetes status
## Min. :1.00 Min. :25.00 Type1:3 Excellent:1
## 1st Qu.:1.75 1st Qu.:27.25 Type2:1 Improved :1
## Median :2.50 Median :31.00 Poor :2
## Mean :2.50 Mean :34.75
## 3rd Qu.:3.25 3rd Qu.:38.50
## Max. :4.00 Max. :52.00
# Creating a list
g <- "My First List"
h <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)
mylist
## $title
## [1] "My First List"
##
## $ages
## [1] 25 26 18 39
##
## [[3]]
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
##
## [[4]]
## [1] "one" "two" "three"
# Creating the leadership data frame
manager <- c(1, 2, 3, 4, 5)
date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08",
"5/1/09")
gender <- c("M", "F", "F", "M", "F")
age <- c(32, 45, 25, 39, 99)
q1 <- c(5, 3, 3, 3, 2)
q2 <- c(4, 5, 5, 3, 2)
q3 <- c(5, 2, 5, 4, 1)
q4 <- c(5, 5, 5, NA, 2)
q5 <- c(5, 5, 2, NA, 1)
leadership <- data.frame(manager, date, gender, age,
q1, q2, q3, q4, q5, stringsAsFactors = FALSE)
# the individual vectors are no longer needed
rm(manager, date, gender, age, q1, q2, q3, q4, q5)
# Creating new variables
mydata <- data.frame(x1 = c(2, 2, 6, 4), x2 = c(3,
4, 2, 8))
mydata$sumx <- mydata$x1 + mydata$x2
mydata$meanx <- (mydata$x1 + mydata$x2)/2
attach(mydata)
mydata$sumx <- x1 + x2
mydata$meanx <- (x1 + x2)/2
detach(mydata)
mydata <- transform(mydata, sumx = x1 + x2, meanx = (x1 +
x2)/2)
# Recoding variables
leadership$agecat[leadership$age > 75] <- "Elder"
leadership$agecat[leadership$age >= 55 &
    leadership$age <= 75] <- "Middle Aged"
leadership$agecat[leadership$age < 55] <- "Young"
# or more compactly
leadership <- within(leadership, {
agecat <- NA
agecat[age > 75] <- "Elder"
agecat[age >= 55 & age <= 75] <- "Middle Aged"
agecat[age < 55] <- "Young"
})
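As an aside, the same three-way recode can be written with nested ifelse() calls. This is only a sketch of an alternative, using the cut-offs from the within() version above (over 75, 55 to 75, under 55); the vector ages_demo is hypothetical and includes one extra value (60) so that every category appears.

```r
# Hypothetical ages; 60 is added so "Middle Aged" is represented
ages_demo <- c(32, 45, 25, 39, 99, 60)

# Nested ifelse(): the first condition that holds decides the label
agecat_demo <- ifelse(ages_demo > 75, "Elder",
                      ifelse(ages_demo >= 55, "Middle Aged", "Young"))

# Store as an ordered factor so the categories sort sensibly
agecat_demo <- factor(agecat_demo,
                      levels = c("Young", "Middle Aged", "Elder"),
                      ordered = TRUE)
print(table(agecat_demo))
```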
# Renaming variables with the reshape package
library(reshape)
rename(leadership, c(manager = "managerID", date = "testDate"))
## managerID testDate gender age q1 q2 q3 q4 q5 agecat
## 1 1 10/24/08 M 32 5 4 5 5 5 Young
## 2 2 10/28/08 F 45 3 5 2 5 5 Young
## 3 3 10/1/08 F 25 3 5 5 5 2 Young
## 4 4 10/12/08 M 39 3 3 4 NA NA Young
## 5 5 5/1/09 F 99 2 2 1 2 1 Elder
# Applying the is.na() function
is.na(leadership[, 6:10])
## q2 q3 q4 q5 agecat
## [1,] FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE TRUE TRUE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE
# recode 99 to missing for the variable age
leadership[leadership$age == 99, "age"] <- NA
leadership
## manager date gender age q1 q2 q3 q4 q5 agecat
## 1 1 10/24/08 M 32 5 4 5 5 5 Young
## 2 2 10/28/08 F 45 3 5 2 5 5 Young
## 3 3 10/1/08 F 25 3 5 5 5 2 Young
## 4 4 10/12/08 M 39 3 3 4 NA NA Young
## 5 5 5/1/09 F NA 2 2 1 2 1 Elder
# Using na.omit() to delete incomplete observations
newdata <- na.omit(leadership)
newdata
## manager date gender age q1 q2 q3 q4 q5 agecat
## 1 1 10/24/08 M 32 5 4 5 5 5 Young
## 2 2 10/28/08 F 45 3 5 2 5 5 Young
## 3 3 10/1/08 F 25 3 5 5 5 2 Young
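Dropping whole rows is not the only way to deal with missing values. The sketch below, on a small hypothetical data frame (scores_demo, reusing the q4 and q5 values from above), shows the na.rm argument that many summary functions accept, and complete.cases(), which flags the rows that na.omit() would keep.

```r
# Hypothetical two-column data frame with one incomplete row
scores_demo <- data.frame(q4 = c(5, 5, 5, NA, 2),
                          q5 = c(5, 5, 2, NA, 1))

# Arithmetic on a vector containing NA returns NA by default ...
total_default <- sum(scores_demo$q4)
print(total_default)

# ... unless missing values are removed first with na.rm = TRUE
total_narm <- sum(scores_demo$q4, na.rm = TRUE)
print(total_narm)

# complete.cases() is TRUE for rows with no missing values
keep_rows <- complete.cases(scores_demo)
cleaned_demo <- scores_demo[keep_rows, ]
print(cleaned_demo)
```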
# Working with Dates
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
# Converting character values to dates
strDates <- c("01/05/1965", "08/16/1975")
dates <- as.Date(strDates, "%m/%d/%Y")
myformat <- "%m/%d/%y"
leadership$date <- as.Date(leadership$date, myformat)
# Useful date functions
Sys.Date()
## [1] "2016-11-14"
date()
## [1] "Mon Nov 14 21:02:42 2016"
today <- Sys.Date()
format(today, format = "%B %d %Y")
## [1] "November 14 2016"
format(today, format = "%A")
## [1] "Monday"
# Calculations with dates
startdate <- as.Date("2004-02-13")
enddate <- as.Date("2009-06-22")
days <- enddate - startdate
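Subtracting two dates gives a difference in days. If another unit is wanted, difftime() accepts a units argument. The sketch below repeats the two dates from above under different variable names (start_demo and end_demo are illustrative) so that it can be run on its own.

```r
start_demo <- as.Date("2004-02-13")
end_demo <- as.Date("2009-06-22")

# Plain subtraction returns a time difference measured in days
days_demo <- end_demo - start_demo
print(days_demo)

# Convert to a bare number when it is needed for arithmetic
n_days_demo <- as.numeric(days_demo, units = "days")

# difftime() lets the unit be chosen explicitly
weeks_demo <- difftime(end_demo, start_demo, units = "weeks")
print(weeks_demo)
```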
# Date functions and formatted printing
today <- Sys.Date()
format(today, format = "%B %d %Y")
## [1] "November 14 2016"
dob <- as.Date("1956-10-10")
format(dob, format = "%A")
## [1] "Wednesday"
# Converting from one data type to another
a <- c(1, 2, 3)
a
## [1] 1 2 3
is.numeric(a)
## [1] TRUE
is.vector(a)
## [1] TRUE
a <- as.character(a)
a
## [1] "1" "2" "3"
is.numeric(a)
## [1] FALSE
is.vector(a)
## [1] TRUE
is.character(a)
## [1] TRUE
# Sorting a dataset
attach(leadership)
newdata <- leadership[order(age), ]
newdata
## manager date gender age q1 q2 q3 q4 q5 agecat
## 3 3 2008-10-01 F 25 3 5 5 5 2 Young
## 1 1 2008-10-24 M 32 5 4 5 5 5 Young
## 4 4 2008-10-12 M 39 3 3 4 NA NA Young
## 2 2 2008-10-28 F 45 3 5 2 5 5 Young
## 5 5 2009-05-01 F NA 2 2 1 2 1 Elder
detach(leadership)
attach(leadership)
newdata <- leadership[order(gender, -age), ]
newdata
## manager date gender age q1 q2 q3 q4 q5 agecat
## 2 2 2008-10-28 F 45 3 5 2 5 5 Young
## 3 3 2008-10-01 F 25 3 5 5 5 2 Young
## 5 5 2009-05-01 F NA 2 2 1 2 1 Elder
## 4 4 2008-10-12 M 39 3 3 4 NA NA Young
## 1 1 2008-10-24 M 32 5 4 5 5 5 Young
detach(leadership)
# Selecting variables
newdata <- leadership[, c(5:9)]
myvars <- c("q1", "q2", "q3", "q4", "q5")
newdata <- leadership[myvars]
myvars <- paste("q", 1:5, sep = "")
newdata <- leadership[myvars]
# Dropping variables
myvars <- names(leadership) %in% c("q3", "q4")
newdata <- leadership[!myvars]
newdata <- leadership[c(-7, -8)]
# You could use the following to delete q3 and q4
# from the leadership dataset (commented out so
# the rest of the code in this file will work)
#
# leadership$q3 <- leadership$q4 <- NULL
# Selecting Observations
newdata <- leadership[1:3, ]
newdata <- leadership[which(leadership$gender == "M" &
leadership$age > 30), ]
attach(leadership)
newdata <- leadership[which(gender == "M" &
    age > 30), ]
detach(leadership)
# Selecting observations based on dates
leadership$date <- as.Date(leadership$date, "%m/%d/%y")
startdate <- as.Date("2009-01-01")
enddate <- as.Date("2009-10-31")
newdata <- leadership[leadership$date >= startdate &
leadership$date <= enddate, ]
# Using the subset() function
newdata <- subset(leadership, age >= 35 | age < 24,
select = c(q1, q2, q3, q4))
newdata <- subset(leadership, gender == "M" & age >
25, select = gender:q4)
# working with graphs
attach(mtcars)
plot(wt, mpg)
abline(lm(mpg ~ wt))
title("Regression of MPG on Weight")
detach(mtcars)
# an example
dose <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)
plot(dose, drugA, type = "b")
# Setting graphical parameters with par()
opar <- par(no.readonly = TRUE)
par(lty = 2, pch = 17)
plot(dose, drugA, type = "b")
par(opar)
plot(dose, drugA, type = "b", lty = 2, pch = 17)
plot(dose, drugA, type = "b", lty = 3, lwd = 3, pch = 15,
cex = 2)
# Using graphical parameters to control
# graph appearance
dose <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)
opar <- par(no.readonly = TRUE)
par(pin = c(2, 3))
par(lwd = 2, cex = 1.5)
par(cex.axis = 0.75, font.axis = 3)
plot(dose, drugA, type = "b", pch = 19, lty = 2, col = "red")
plot(dose, drugB, type = "b", pch = 23, lty = 6, col = "blue",
bg = "green")
par(opar)
# Adding text, customized axes, and legends
plot(dose, drugA, type = "b", col = "red", lty = 2,
pch = 2, lwd = 2, main = "Clinical Trials for Drug A",
sub = "This is hypothetical data",
xlab = "Dosage", ylab = "Drug Response", xlim = c(0, 60),
ylim = c(0, 70))
# An Example of Custom Axes
x <- c(1:10)
y <- x
z <- 10/x
opar <- par(no.readonly = TRUE)
par(mar = c(5, 4, 4, 8) + 0.1)
plot(x, y, type = "b", pch = 21, col = "red", yaxt = "n",
lty = 3, ann = FALSE)
lines(x, z, type = "b", pch = 22, col = "blue", lty = 2)
axis(2, at = x, labels = x, col.axis = "red", las = 2)
axis(4, at = z, labels = round(z, digits = 2), col.axis = "blue",
las = 2, cex.axis = 0.7, tck = -0.01)
mtext("y=10/x", side = 4, line = 3, cex.lab = 1, las = 2,
    col = "blue")
title("An Example of Creative Axes", xlab = "X values",
ylab = "Y=X")
par(opar)
# Comparing Drug A and Drug B response by dose
dose <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)
opar <- par(no.readonly = TRUE)
par(lwd = 2, cex = 1.5, font.lab = 2)
plot(dose, drugA, type = "b", pch = 15, lty = 1, col = "red",
ylim = c(0, 60), main = "Drug A vs. Drug B", xlab = "Drug Dosage",
ylab = "Drug Response")
lines(dose, drugB, type = "b", pch = 17, lty = 2,
col = "blue")
abline(h = c(30), lwd = 1.5, lty = 2, col = "grey")
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
minor.tick(nx = 3, ny = 3, tick.ratio = 0.5)
legend("topleft", inset = 0.05, title = "Drug Type",
c("A", "B"), lty = c(1, 2), pch = c(15, 17), col = c("red",
"blue"))
par(opar)
# Example of labeling points
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
plot(wt, mpg, main = "Mileage vs. Car Weight", xlab = "Weight",
    ylab = "Mileage", pch = 18, col = "blue")
text(wt, mpg, row.names(mtcars), cex = 0.6, pos = 4,
col = "red")
detach(mtcars)
# View font families
opar <- par(no.readonly = TRUE)
par(cex = 1.5)
plot(1:7, 1:7, type = "n")
text(3, 3, "Example of default text")
text(4, 4, family = "mono", "Example of mono-spaced text")
text(5, 5, family = "serif", "Example of serif text")
par(opar)
# combining graphs
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
opar <- par(no.readonly = TRUE)
par(mfrow = c(2, 2))
plot(wt, mpg, main = "Scatterplot of wt vs. mpg")
plot(wt, disp, main = "Scatterplot of wt vs disp")
hist(wt, main = "Histogram of wt")
boxplot(wt, main = "Boxplot of wt")
par(opar)
detach(mtcars)
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
opar <- par(no.readonly = TRUE)
par(mfrow = c(3, 1))
hist(wt)
hist(mpg)
hist(disp)
par(opar)
detach(mtcars)
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))
hist(wt)
hist(mpg)
hist(disp)
detach(mtcars)
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE),
widths = c(1, 1), heights = c(1, 1))
hist(wt)
hist(mpg)
hist(disp)
detach(mtcars)
# Fine placement of figures in a graph
opar <- par(no.readonly = TRUE)
par(fig = c(0, 0.8, 0, 0.8))
plot(mtcars$wt, mtcars$mpg, xlab = "Car Weight",
    ylab = "Miles Per Gallon")
par(fig = c(0, 0.8, 0.55, 1), new = TRUE)
boxplot(mtcars$wt, horizontal = TRUE, axes = FALSE)
par(fig = c(0.65, 1, 0, 0.8), new = TRUE)
boxplot(mtcars$mpg, axes = FALSE)
mtext("Enhanced Scatterplot", side = 3, outer = TRUE,
line = -3)
par(opar)
# pause after each graph
par(ask = TRUE)
# save original graphic settings
opar <- par(no.readonly = TRUE)
# Load vcd package
library(vcd)
## Loading required package: grid
# Get cell counts for improved variable
counts <- table(Arthritis$Improved)
counts
##
## None Some Marked
## 42 14 28
# simple bar plot
barplot(counts, main = "Simple Bar Plot", xlab = "Improvement",
ylab = "Frequency")
# horizontal bar plot
barplot(counts, main = "Horizontal Bar Plot", xlab = "Frequency",
ylab = "Improvement", horiz = TRUE)
# get counts for Improved by Treatment table
counts <- table(Arthritis$Improved, Arthritis$Treatment)
counts
##
## Placebo Treated
## None 29 13
## Some 7 7
## Marked 7 21
# stacked barplot
barplot(counts, main = "Stacked Bar Plot", xlab = "Treatment",
ylab = "Frequency", col = c("red", "yellow", "green"),
legend = rownames(counts))
# grouped barplot
barplot(counts, main = "Grouped Bar Plot", xlab = "Treatment",
ylab = "Frequency", col = c("red", "yellow", "green"),
legend = rownames(counts),
beside = TRUE)
# Mean bar plots
states <- data.frame(state.region, state.x77)
means <- aggregate(states$Illiteracy,
by = list(state.region),
FUN = mean)
means
## Group.1 x
## 1 Northeast 1.000000
## 2 South 1.737500
## 3 North Central 0.700000
## 4 West 1.023077
means <- means[order(means$x), ]
means
## Group.1 x
## 3 North Central 0.700000
## 1 Northeast 1.000000
## 4 West 1.023077
## 2 South 1.737500
barplot(means$x, names.arg = means$Group.1)
title("Mean Illiteracy Rate")
# Fitting labels in bar plots
par(mar = c(5, 8, 4, 2))
par(las = 2)
counts <- table(Arthritis$Improved)
barplot(counts, main = "Treatment Outcome", horiz = TRUE,
cex.names = 0.8, names.arg = c("No Improvement",
"Some Improvement", "Marked Improvement"))
# Spinograms
library(vcd)
attach(Arthritis)
counts <- table(Treatment, Improved)
spine(counts, main = "Spinogram Example")
detach(Arthritis)
# Pie charts
par(mfrow = c(2, 2))
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main = "Simple Pie Chart")
pct <- round(slices/sum(slices) * 100)
lbls2 <- paste(lbls, " ", pct, "%", sep = "")
pie(slices, labels = lbls2, col = rainbow(length(lbls)),
main = "Pie Chart with Percentages")
library(plotrix)
pie3D(slices, labels = lbls, explode = 0.1, main = "3D Pie Chart ")
mytable <- table(state.region)
lbls <- paste(names(mytable), "\n", mytable, sep = "")
pie(mytable, labels = lbls,
main = "Pie Chart from a Table\n (with sample sizes)")
# restore original graphic parameters
par(opar)
# fan plots
library(plotrix)
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
fan.plot(slices, labels = lbls, main = "Fan Plot")
# Histograms
par(mfrow = c(2, 2))
hist(mtcars$mpg)
hist(mtcars$mpg, breaks = 12, col = "red",
xlab = "Miles Per Gallon",
main = "Colored histogram with 12 bins")
hist(mtcars$mpg, freq = FALSE, breaks = 12, col = "red",
xlab = "Miles Per Gallon",
main = "Histogram, rug plot, density curve")
rug(jitter(mtcars$mpg))
lines(density(mtcars$mpg), col = "blue", lwd = 2)
# Histogram with Superimposed Normal Curve
# (Thanks to Peter Dalgaard)
x <- mtcars$mpg
h <- hist(x, breaks = 12, col = "red",
xlab = "Miles Per Gallon",
main = "Histogram with normal curve and box")
xfit <- seq(min(x), max(x), length = 40)
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))
yfit <- yfit * diff(h$mids[1:2]) * length(x)
lines(xfit, yfit, col = "blue", lwd = 2)
box()
# restore original graphic parameters
par(opar)
# Kernel density plot
par(mfrow = c(2, 1))
d <- density(mtcars$mpg)
plot(d)
d <- density(mtcars$mpg)
plot(d, main = "Kernel Density of Miles Per Gallon")
polygon(d, col = "red", border = "blue")
rug(mtcars$mpg, col = "brown")
# restore original graphic parameters
par(opar)
# Comparing kernel density plots
par(lwd = 2)
library(sm)
## Package 'sm', version 2.2-5.4: type help(sm) for summary information
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
cyl.f <- factor(cyl, levels = c(4, 6, 8),
labels = c("4 cylinder", "6 cylinder", "8 cylinder"))
sm.density.compare(mpg, cyl, xlab = "Miles Per Gallon")
title(main = "MPG Distribution by Car Cylinders")
colfill <- c(2:(1 + length(levels(cyl.f))))
cat("Use mouse to place legend...", "\n\n")
## Use mouse to place legend...
#legend(locator(1), levels(cyl.f), fill = colfill)
detach(mtcars)
par(lwd = 1)
# Box Plot
boxplot(mpg ~ cyl, data = mtcars,
    main = "Car Mileage Data",
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon")
boxplot(mpg ~ cyl, data = mtcars, notch = TRUE,
varwidth = TRUE, col = "red",
main = "Car Mileage Data",
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon")
## Warning in bxp(structure(list(stats = structure(c(21.4, 22.8, 26, 30.4, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
# Box plots for two crossed factors
mtcars$cyl.f <- factor(mtcars$cyl, levels = c(4, 6,
8), labels = c("4", "6", "8"))
mtcars$am.f <- factor(mtcars$am, levels = c(0, 1),
labels = c("auto", "standard"))
boxplot(mpg ~ am.f * cyl.f, data = mtcars,
varwidth = TRUE, col = c("gold", "darkgreen"),
main = "MPG Distribution by Auto Type",
xlab = "Auto Type")
# Violin plots
library(vioplot)
x1 <- mtcars$mpg[mtcars$cyl == 4]
x2 <- mtcars$mpg[mtcars$cyl == 6]
x3 <- mtcars$mpg[mtcars$cyl == 8]
vioplot(x1, x2, x3,
names = c("4 cyl", "6 cyl", "8 cyl"),
col = "gold")
title("Violin Plots of Miles Per Gallon")
# dotchart
dotchart(mtcars$mpg, labels = row.names(mtcars),
cex = 0.7,
    main = "Gas Mileage for Car Models",
xlab = "Miles Per Gallon")
# sorted colored grouped dot chart
x <- mtcars[order(mtcars$mpg), ]
x$cyl <- factor(x$cyl)
x$color[x$cyl == 4] <- "red"
x$color[x$cyl == 6] <- "blue"
x$color[x$cyl == 8] <- "darkgreen"
dotchart(x$mpg, labels = row.names(x), cex = 0.7,
pch = 19, groups = x$cyl,
gcolor = "black", color = x$color,
    main = "Gas Mileage for Car Models\ngrouped by cylinder",
xlab = "Miles Per Gallon")