At the end of the session, the participants are expected to
R is a programming language and environment that is used widely for statistical data manipulation and analysis.
R has become popular because it’s free and people can contribute to the development of R.
R is the de facto standard among professional statisticians.
It is comparable, and often superior, in power to commercial products.
It is available for Windows, Mac and Linux.
In addition to enabling statistical operations, it is a general programming language, so you can automate your analyses and create new functions.
It is easy to get help from the user community, and many new functions are contributed by users, many of whom are prominent statisticians.
It can work together with other programming languages, such as Python and SQL.
It has features that support reproducible research.
To install R
Open an internet browser and go to www.r-project.org
Click the “download R” link in the middle of the page under “Getting Started.”
Select a Comprehensive R Archive Network (CRAN) location (a mirror site) and click the corresponding link. Look for a mirror in the Philippines.
Click on the "Download R for Windows" link if you are using Windows. Otherwise, click on "Download R for (Mac) OS X" or "Download R for Linux".
Click “base”.
Click “Download R 4.0.2 for Windows”.
Note that R is constantly being updated, so new versions become available from time to time. When you navigate to the website, you will see the latest version by default. Download the latest version of R.
Double-click the downloaded file to open, and follow the installation instructions.
Now that R is installed, you need to download and install RStudio.
RStudio is an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management.
For ease of use, navigation and computation, we will be using RStudio throughout our lessons.
To install RStudio
Go to the RStudio website.
Click on “Products”.
Under Open Source, click “RStudio”.
Scroll down the page, and click RStudio Desktop.
Choose the open source edition which is free.
Click on the version recommended for your system.
To get started, open RStudio. You will see the interface shown in Figure 1, where the window is divided into four main panes.
Figure 1
RStudio is a four-pane workspace for:
creating files containing R scripts,
typing R commands,
viewing command histories,
viewing plots and more.
Code editor: allows you to create and open files containing R scripts. The R script is where you keep a record of your work. A new R script can be created via File > New File > R Script.
R console: for typing and running R commands.
Environment Pane: shows the list of R objects you created during your R session.
History Pane: shows the history of all previous commands.
Connections Pane: shows all the connections you have made to supported data sources, and lets you know which connections are currently active.
Tutorial Pane: runs interactive tutorials, powered by the learnr package, that show how to use different packages.
Files tab: shows the files in your working directory.
Plots tab: shows the history of the plots you created. From this tab, you can export a plot to a PDF or an image file.
Packages tab: shows the R packages installed on your system. A checked package is loaded in the current R session.
Viewer tab: displays local web content, such as HTML output.
To change the appearance of RStudio, go to Tools > Global Options > Appearance.
You can use Global Options to customize the code editor, the pane layout and other parts of RStudio. Take time to explore.
For more information about RStudio, read the online documentation.
The working directory is the folder where R reads and saves files by default.
Check your existing working directory
To check your current working directory, type getwd() in the R console and press Enter:
## [1] "C:/Users/Roel Ceballos/Dropbox/ROEL/R Training"
Change your working directory
From RStudio, use the menu to change your working directory under Session > Set Working Directory > Choose Directory.
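The working directory can also be changed from code with setwd(). The sketch below uses tempdir() only so that it runs on any computer; in practice you would pass your own folder (the path in the comment is hypothetical).

```r
# Show the current working directory
getwd()

# setwd() changes the directory and invisibly returns the previous one.
# Replace tempdir() with your own folder, e.g. setwd("C:/Users/yourname/R Training")
old_wd <- setwd(tempdir())
getwd()

# Go back to the directory you started from
setwd(old_wd)
```

On Windows, write paths with forward slashes (/) or doubled backslashes (\\), because a single backslash starts an escape sequence in R strings.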
R can be used to perform calculations. For a start, R has the basic arithmetic operators + (addition), - (subtraction), * (multiplication), / (division) and ^ (power):
## [1] 10
## [1] 4
## [1] 21
## [1] 4
## [1] 49
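The five results above are consistent with the commands below; the operands are inferred from the printed values, so treat this as a reconstruction rather than the original code. The last two lines add two more operators for remainder and integer division.

```r
7 + 3     # Addition: 10
7 - 3     # Subtraction: 4
7 * 3     # Multiplication: 21
8 / 2     # Division: 4
7^2       # Power: 49
7 %% 3    # Remainder of the division: 1
7 %/% 3   # Integer division: 2
```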
Basic arithmetic functions
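Logarithms and exponentials:
log(x) # Natural logarithm of x
log2(x) # Logarithm base 2 of x
log10(x) # Logarithm base 10 of x
exp(x) # Exponential of x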
Trigonometric functions:
cos(x) # Cosine of x
sin(x) # Sine of x
tan(x) # Tangent of x
acos(x) # Arc-cosine of x
asin(x) # Arc-sine of x
atan(x) # Arc-tangent of x
Other mathematical functions:
abs(x) # Absolute value of x
sqrt(x) # Square root of x
Almost all mathematical operations and functions are available in R, so you don't have to create them from scratch. If a function is in the base package, just call it; if not, install and load the package that contains it. Make sure to supply the correct arguments to the function.
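The functions listed above can be tried on inputs whose results are easy to check by hand:

```r
log(exp(2))   # the natural log undoes exp(): 2
log2(8)       # 3, because 2^3 = 8
log10(1000)   # 3
exp(1)        # Euler's number, printed as 2.718282
cos(0)        # 1
sin(pi / 2)   # 1 (pi is a built-in constant)
abs(-4)       # 4
sqrt(49)      # 7
```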
The base package, which is loaded automatically when you start R, contains the most commonly used math functions.
On the other hand, advanced mathematical and statistical functions, such as those used in time series analysis, differential equations, optimization and mathematical modeling, can be found in other R packages that you can easily install and use.
CRAN maintains Task Views for these advanced mathematical and statistical topics; each Task View serves as a guide to the relevant packages. For instance, there are Task Views for differential equations, optimization and time series on the CRAN website.
A variable can be used to store a value.
For example, the R code below will store the price of a lemon in a variable, say “lemon_price”:
Note that it’s possible to use <- or = for variable assignments.
Note that R is case-sensitive. This means that lemon_price is different from Lemon_Price.
Note that # is used to write a comment: R ignores everything after it on the line.
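The assignment described above can be sketched as follows (the value 2 matches the printed output below). The check on Lemon_Price only illustrates case sensitivity; that name is hypothetical and is never created.

```r
# Price of a lemon, stored with the <- operator
lemon_price <- 2
# The = operator performs the same assignment
lemon_price = 2
# R is case-sensitive: Lemon_Price is a different name, which does not exist
exists("Lemon_Price")   # FALSE
```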
To print the value of the created object, just type its name:
## [1] 2
or use the function print():
## [1] 2
R saves the object lemon_price (also known as a variable) in memory. You can perform operations with it:
## [1] 10
You can change the value of the object:
## [1] 5
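The outputs above (2, 2, 10 and 5) are consistent with the sequence below; the multiplier 5 is inferred from the result 10.

```r
lemon_price <- 2
lemon_price          # typing the name prints the value: 2
print(lemon_price)   # explicit printing: 2
lemon_price * 5      # use the variable in a calculation: 10
lemon_price <- 5     # overwrite the stored value
lemon_price          # 5
```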
The following R code creates two variables holding the width and the height of a rectangle. These two variables will be used to compute the area of the rectangle.
# Rectangle height
height <- 10
# rectangle width
width <- 5
# compute rectangle area
area <- height*width
print(area)
## [1] 50
The function ls() can be used to see the list of objects we have created:
## [1] "area" "height" "lemon_price" "width"
Note that each variable takes up space in computer memory. If you work on a big project, it's good practice to clean up your workspace.
To remove a variable, use the function rm():
## [1] "area" "lemon_price"
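The output above is consistent with removing height and width. In this self-contained sketch the objects are re-created first so the block runs on its own:

```r
height <- 10
width <- 5
area <- height * width
lemon_price <- 5
ls()                 # all four objects (plus any others in your session)
rm(height, width)    # remove two objects at once
ls()                 # height and width are gone
# rm(list = ls()) would remove every object in the workspace; use it with care
```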
Basic data types are numeric, character and logical.
# Numeric object: How old are you?
my_age <- 18
# Character object: What's your name?
my_name <- "Nicolas"
# logical object: Are you a data scientist?
# (yes/no) <=> (TRUE/FALSE)
is_datascientist <- TRUE
Note that character vectors can be created using double (") or single (') quotes.
## [1] "My Friend's name is Jerome"
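The sentence printed above contains an apostrophe, which is why double quotes are convenient. A sketch comparing both quoting styles (the variable names are hypothetical):

```r
# Double quotes let you put an apostrophe inside the string
fr_name <- "My Friend's name is Jerome"
fr_name
# With single quotes, the inner apostrophe must be escaped with a backslash
fr_name2 <- 'My Friend\'s name is Jerome'
identical(fr_name, fr_name2)   # TRUE: both quoting styles give the same string
```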
It’s possible to use the function class() to see what type a variable is:
## [1] "numeric"
## [1] "character"
You can also use the functions is.numeric(), is.character(), is.logical() to check whether a variable is numeric, character or logical, respectively. For instance:
## [1] TRUE
## [1] FALSE
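Using the three objects created earlier, the outputs of class() and is.*() above are consistent with calls like these (which object was tested in the original is inferred from the printed values):

```r
my_age <- 18
my_name <- "Nicolas"
is_datascientist <- TRUE

class(my_age)                  # "numeric"
class(my_name)                 # "character"
class(is_datascientist)        # "logical"

is.numeric(my_age)             # TRUE
is.numeric(my_name)            # FALSE
is.logical(is_datascientist)   # TRUE
```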
If you want to change the type of a variable to another one, use the as.* functions, including: as.numeric(), as.character(), as.logical(), etc.
## [1] 18
## [1] "18"
Note that converting a character string that does not represent a number (for example "abc") to numeric returns NA (for not available), with a warning, because R doesn't know how to turn that text into a number.
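A short sketch of the as.*() conversions, including the NA case just described ("eighteen" is an arbitrary non-numeric string):

```r
my_age <- 18
as.character(my_age)   # "18"
as.numeric("18")       # 18
as.logical("TRUE")     # TRUE
# Text that does not represent a number becomes NA,
# with the warning "NAs introduced by coercion"
as.numeric("eighteen")
```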
A vector is a combination of multiple values (numeric, character or logical) in the same object. In this case, you can have numeric vectors, character vectors or logical vectors.
Create a vector
A vector is created using the function c() (for concatenate), as follows:
# Store your friends' ages in a numeric vector
friend_ages <- c(27, 25, 29, 26) # Create
friend_ages # Print
## [1] 27 25 29 26
# Store your friends' names in a character vector
my_friends <- c("Nicolas", "Thierry", "Bernard", "Jerome")
my_friends
## [1] "Nicolas" "Thierry" "Bernard" "Jerome"
# Store your friends' marital status in a logical vector
# Are they married? (yes/no <=> TRUE/FALSE)
are_married <- c(TRUE, FALSE, TRUE, TRUE)
are_married
## [1] TRUE FALSE TRUE TRUE
It’s possible to give a name to the elements of a vector using the function names().
## [1] 27 25 29 26
# Vector with element names
names(friend_ages) <- c("Nicolas", "Thierry", "Bernard", "Jerome")
friend_ages
## Nicolas Thierry Bernard Jerome
## 27 25 29 26
You can also create a named vector as follows:
## Nicolas Thierry Bernard Jerome
## 27 25 29 26
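A named vector like the one printed above can be created in a single call by writing name = value pairs inside c(). The second half checks that this gives the same result as assigning names() afterwards (ages2 is a hypothetical name):

```r
friend_ages <- c(Nicolas = 27, Thierry = 25, Bernard = 29, Jerome = 26)
friend_ages

ages2 <- c(27, 25, 29, 26)
names(ages2) <- c("Nicolas", "Thierry", "Bernard", "Jerome")
identical(friend_ages, ages2)   # TRUE
```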
Note that a vector can only hold elements of the same type. If you combine, for example, characters and numeric values with c(), R silently converts (coerces) all of them to a single type, here character.
Find the length of a vector (i.e., the number of elements in a vector)
## [1] 4
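The coercion rule and the length() result above can be checked as follows; the vector mixed is a hypothetical example:

```r
# Mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
mixed               # "1" "a" "TRUE"
class(mixed)        # "character"
class(c(1, TRUE))   # "numeric": TRUE is stored as 1

my_friends <- c("Nicolas", "Thierry", "Bernard", "Jerome")
length(my_friends)  # 4
```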
Case of missing values
Suppose you know that some of your friends (Nicolas and Thierry) have children, but this information is not available (NA) for the remaining friends (Bernard and Jerome).
In R, missing values (or missing information) are represented by NA:
## Nicolas Thierry Bernard Jerome
## "yes" "yes" NA NA
It's possible to use the function is.na() to check whether data contain missing values. The result of is.na() is a logical vector in which TRUE indicates that the corresponding element of x is NA.
## Nicolas Thierry Bernard Jerome
## FALSE FALSE TRUE TRUE
Note that there is a second type of missing value, NaN ("Not a Number"). It is produced when a mathematical operation has no defined result; for example, 0/0 returns NaN.
Note also that the function is.na() is TRUE for both NA and NaN values. To differentiate these, the function is.nan() is only TRUE for NaNs.
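The have_child vector printed above (its definition is reconstructed from that output) and the difference between NA and NaN can be sketched like this:

```r
have_child <- c(Nicolas = "yes", Thierry = "yes", Bernard = NA, Jerome = NA)
have_child
is.na(have_child)     # FALSE FALSE TRUE TRUE

x <- c(1, NA, 0 / 0)  # 0/0 gives NaN
x                     # 1 NA NaN
is.na(x)              # FALSE TRUE TRUE: NaN also counts as missing
is.nan(x)             # FALSE FALSE TRUE: only the NaN
```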
Get a subset of a vector
Subsetting a vector consists of selecting a part of your vector.
## [1] "Thierry"
## [1] "Thierry" "Jerome"
## [1] "Nicolas" "Thierry" "Bernard"
Note that R indexes from 1, NOT 0. So the first element is at [1], not [0].
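The three outputs above are consistent with selection by position, as sketched below (the indices are reconstructed from the printed names). The last line shows what index 0 does:

```r
my_friends <- c("Nicolas", "Thierry", "Bernard", "Jerome")
my_friends[2]          # "Thierry"
my_friends[c(2, 4)]    # "Thierry" "Jerome"
my_friends[1:3]        # "Nicolas" "Thierry" "Bernard"
my_friends[0]          # character(0): index 0 selects nothing
```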
If you have a named vector, it’s also possible to use the name for selecting an element:
## Bernard
## 29
## [1] "Nicolas" "Bernard" "Jerome"
## [1] "Nicolas" "Bernard"
## [1] "Jerome"
## [1] "Nicolas" "Bernard" "Jerome"
## [1] "Nicolas" "Bernard"
## [1] "Thierry" "Bernard" "Jerome"
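The outputs above are consistent with name-based, negative and logical indexing as sketched below. The exact expressions are reconstructed from the printed results: a negative index drops elements, and a logical index keeps the elements where it is TRUE.

```r
my_friends <- c("Nicolas", "Thierry", "Bernard", "Jerome")
friend_ages <- c(Nicolas = 27, Thierry = 25, Bernard = 29, Jerome = 26)
are_married <- c(TRUE, FALSE, TRUE, TRUE)

# Select by name
friend_ages["Bernard"]          # Bernard 29

# Negative indexing drops elements
my_friends[-2]                  # "Nicolas" "Bernard" "Jerome"
my_friends[-c(2, 4)]            # "Nicolas" "Bernard"
my_friends[-c(1:3)]             # "Jerome"

# Logical indexing keeps elements where the condition is TRUE
my_friends[are_married]         # "Nicolas" "Bernard" "Jerome"
my_friends[friend_ages >= 27]   # "Nicolas" "Bernard"
my_friends[friend_ages != 27]   # "Thierry" "Bernard" "Jerome"
```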
If you want to remove missing data, use this:
## Nicolas Thierry Bernard Jerome
## "yes" "yes" NA NA
## Nicolas Thierry
## "yes" "yes"
## Nicolas Thierry Bernard Jerome
## "NO" "NO" NA NA
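The three outputs above are consistent with the following sketch. The last step shows that the same logical index can also be used on the left of <- to replace values instead of dropping them; this replacement is inferred from the printed "NO" values.

```r
have_child <- c(Nicolas = "yes", Thierry = "yes", Bernard = NA, Jerome = NA)

# Keep only the non-missing elements
have_child[!is.na(have_child)]   # Nicolas "yes", Thierry "yes"

# Overwrite the non-missing values with "NO"
have_child[!is.na(have_child)] <- "NO"
have_child                       # "NO" "NO" NA NA
```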
Note that the comparison operators available in R are:
<: for less than
>: for greater than
<=: for less than or equal to
>=: for greater than or equal to
==: for equal to each other
!=: for not equal to each other
Calculations with vectors
Note that all the basic arithmetic operators (+, -, *, / and ^) as well as the common arithmetic functions (log, exp, sin, cos, tan, sqrt, abs, …) described in the previous sections can be applied to a numeric vector.
If you perform an operation with vectors, the operation will be applied to each element of the vector. An example is provided below:
# My friends' salary in dollars
salaries <- c(2000, 1800, 2500, 3000)
names(salaries) <- c("Nicolas", "Thierry", "Bernard", "Jerome")
salaries
## Nicolas Thierry Bernard Jerome
## 2000 1800 2500 3000
## Nicolas Thierry Bernard Jerome
## 4000 3600 5000 6000
As you can see, R multiplies each element of the salaries vector by 2.
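The doubled salaries above correspond to a single multiplication applied to the whole vector; the names are carried over to the result:

```r
salaries <- c(Nicolas = 2000, Thierry = 1800, Bernard = 2500, Jerome = 3000)
salaries * 2       # 4000 3600 5000 6000
salaries + 100     # other operators also work element by element
```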
Now, suppose that you want to multiply the salaries by different coefficients. The following R code can be used:
# create coefs vector with the same length as salaries
coefs <- c(2, 1.5, 1, 3)
# Multiply salaries by coeff
salaries*coefs
## Nicolas Thierry Bernard Jerome
## 4000 2700 2500 9000
Note that the calculation is done element-wise. The first element of salaries vector is multiplied by the first element of coefs vector, and so on.
Compute the square root of a numeric vector:
## [1] 2 4 3
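The result above is consistent with taking the square root of a vector such as the one below (my_vector is a hypothetical name, and its values are inferred as the squares of the printed results):

```r
my_vector <- c(4, 16, 9)
sqrt(my_vector)   # 2 4 3
```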
Other useful functions are:
max(x) # Get the maximum value of x
min(x) # Get the minimum value of x
range(x) # Get the range of x: a vector containing the minimum and the maximum of x
length(x) # Get the number of elements in x
sum(x) # Get the total of the elements in x
prod(x) # Get the product of the elements in x
mean(x) # Mean of x, i.e. sum(x)/length(x)
median(x) # Median of x
sd(x) # Standard deviation of x
var(x) # Variance of x
sort(x) # Sort the elements of x in ascending order
For example, if you want to compute the total sum of salaries, type this:
## [1] 9300
Compute the mean of salaries:
## [1] 2325
The range (minimum, maximum) of salaries is:
## [1] 1800 3000
Be cautious when generating descriptive statistics, because some measures are more appropriate than others in specific situations. For example, when there are outliers in the data, the median is more appropriate than the mean.
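The three results above come from sum(), mean() and range() applied to salaries. The second half adds a hypothetical fifth salary that is an outlier, to show why the median can be the better summary in that situation:

```r
salaries <- c(Nicolas = 2000, Thierry = 1800, Bernard = 2500, Jerome = 3000)
sum(salaries)     # 9300
mean(salaries)    # 2325
range(salaries)   # 1800 3000

# Hypothetical outlier added for illustration
with_outlier <- c(salaries, Extra = 50000)
mean(with_outlier)     # 11860: pulled up strongly by the outlier
median(with_outlier)   # 2500: barely affected
```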
A matrix is like an Excel sheet containing multiple rows and columns. It’s used to combine vectors with the same type, which can be either numeric, character or logical. Matrices are used to store a data table in R. The rows of a matrix are generally individuals/observations and the columns are variables.
To easily create a matrix, use the function cbind() or rbind() as follows:
# Numeric vectors
col1 <- c(5, 6, 7, 8, 9)
col2 <- c(2, 4, 5, 9, 8)
col3 <- c(7, 3, 4, 8, 7)
# Combine the vectors by column
my_data <- cbind(col1, col2, col3)
my_data
## col1 col2 col3
## [1,] 5 2 7
## [2,] 6 4 3
## [3,] 7 5 4
## [4,] 8 9 8
## [5,] 9 8 7
## col1 col2 col3
## row1 5 2 7
## row2 6 4 3
## row3 7 5 4
## row4 8 9 8
## row5 9 8 7
If you want to transpose your data, use the function t():
## row1 row2 row3 row4 row5
## col1 5 6 7 8 9
## col2 2 4 5 9 8
## col3 7 3 4 8 7
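The row labels row1 to row5 in the second printout above were presumably added with rownames(); this is reconstructed from the output, since that code is not shown. The sketch also covers t():

```r
col1 <- c(5, 6, 7, 8, 9)
col2 <- c(2, 4, 5, 9, 8)
col3 <- c(7, 3, 4, 8, 7)
my_data <- cbind(col1, col2, col3)

# Add row names
rownames(my_data) <- c("row1", "row2", "row3", "row4", "row5")
my_data

# Transpose: rows become columns and vice versa
t(my_data)
dim(t(my_data))   # 3 5
```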
Note that it’s also possible to construct a matrix using the function matrix().
The simplified format of matrix() is as follows:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
dimnames = NULL)
In the R code below, the input data has length 6. We want to create a matrix with two rows. You don't need to specify the number of columns (here ncol = 3); R will infer it automatically. The matrix is filled column by column when byrow = FALSE (the default). If you want to fill the matrix by rows, as here, use byrow = TRUE.
mdat <- matrix(
data = c(1,2,3, 11,12,13),
nrow = 2, byrow = TRUE,
dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3"))
)
mdat
## C.1 C.2 C.3
## row1 1 2 3
## row2 11 12 13
The R functions nrow() and ncol() return the number of rows and columns present in the data, respectively, and dim() returns both at once.
## [1] 3
## [1] 5
## [1] 5 3
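The three results above (3, 5, and 5 3) are consistent with these calls on my_data:

```r
my_data <- cbind(col1 = c(5, 6, 7, 8, 9),
                 col2 = c(2, 4, 5, 9, 8),
                 col3 = c(7, 3, 4, 8, 7))
ncol(my_data)   # 3
nrow(my_data)   # 5
dim(my_data)    # 5 3 (rows, then columns)
```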
Rows and/or columns can be selected as follows: my_data[row, col]
## col1 col2 col3
## 6 4 3
## col1 col2 col3
## row2 6 4 3
## row3 7 5 4
## row4 8 9 8
## col1 col2 col3
## row2 6 4 3
## row4 8 9 8
## row1 row2 row3 row4 row5
## 7 3 4 8 7
## [1] 3
## row1 row2 row3 row4 row5
## 2 4 5 9 8
## [1] 5
## col2 col3
## row1 2 7
## row2 4 3
## row3 5 4
## row4 9 8
## row5 8 7
## col1 col2 col3
## row1 5 2 7
## row3 7 5 4
## row4 8 9 8
## row5 9 8 7
## col1 col2 col3
## row1 10 4 14
## row2 12 8 6
## row3 14 10 8
## row4 16 18 16
## row5 18 16 14
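The series of outputs above is consistent with the selections below, each written in the my_data[row, col] form: an empty position means "all", and a negative index drops that row or column. The exact expressions are reconstructed from the printed results.

```r
my_data <- cbind(col1 = c(5, 6, 7, 8, 9),
                 col2 = c(2, 4, 5, 9, 8),
                 col3 = c(7, 3, 4, 8, 7))
rownames(my_data) <- paste0("row", 1:5)

my_data[2, ]                   # row 2, returned as a named vector
my_data[2, , drop = FALSE]     # row 2, kept as a 1 x 3 matrix
my_data[2:4, ]                 # rows 2 to 4
my_data[c("row2", "row4"), ]   # rows selected by name
my_data[, "col3"]              # one column
my_data[2, 3]                  # a single value: 3
my_data[, -1]                  # drop column 1
my_data[-2, ]                  # drop row 2
my_data * 2                    # element-wise arithmetic on the whole matrix
```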
Or, compute the log2 values:
## col1 col2 col3
## row1 2.321928 1.000000 2.807355
## row2 2.584963 2.000000 1.584963
## row3 2.807355 2.321928 2.000000
## row4 3.000000 3.169925 3.000000
## row5 3.169925 3.000000 2.807355
## row1 row2 row3 row4 row5
## 14 13 16 25 24
## col1 col2 col3
## 35 28 29
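The log2 table and the two sets of totals above are consistent with the calls below; rowSums() returns the total of each row and colSums() the total of each column.

```r
my_data <- cbind(col1 = c(5, 6, 7, 8, 9),
                 col2 = c(2, 4, 5, 9, 8),
                 col3 = c(7, 3, 4, 8, 7))
rownames(my_data) <- paste0("row", 1:5)

log2(my_data)      # applied to every element
rowSums(my_data)   # row1 14, row2 13, row3 16, row4 25, row5 24
colSums(my_data)   # col1 35, col2 28, col3 29
```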
If you are interested in row or column means, you can use the functions rowMeans() and colMeans() to compute row and column means, respectively.
Note that it's also possible to use the function apply() to apply any function, statistical or otherwise, to the rows or columns of a matrix.
The simplified format of apply() is as follows:
apply(X, MARGIN, FUN)
where X is the matrix, MARGIN is 1 to apply FUN to each row or 2 to apply it to each column, and FUN is the function to apply. Use apply() as follows:
## row1 row2 row3 row4 row5
## 4.666667 4.333333 5.333333 8.333333 8.000000
## row1 row2 row3 row4 row5
## 5 4 5 8 8
## col1 col2 col3
## 7.0 5.6 5.8
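The three results above match row means, row medians and column means of my_data. A sketch with apply(), plus the shortcut functions for means:

```r
my_data <- cbind(col1 = c(5, 6, 7, 8, 9),
                 col2 = c(2, 4, 5, 9, 8),
                 col3 = c(7, 3, 4, 8, 7))
rownames(my_data) <- paste0("row", 1:5)

apply(my_data, 1, mean)     # MARGIN = 1: mean of each row
apply(my_data, 1, median)   # any function works, e.g. the median of each row
apply(my_data, 2, mean)     # MARGIN = 2: mean of each column, 7.0 5.6 5.8

rowMeans(my_data)           # same values as apply(my_data, 1, mean)
colMeans(my_data)           # same values as apply(my_data, 2, mean)
```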
Factor variables represent categories or groups in your data. The function factor() can be used to create a factor variable.
## [1] 1 2 1 2
## Levels: 1 2
The variable friend_groups contains two categories of friends: 1 and 2. In R terminology, categories are called factor levels.
It's possible to access the factor levels using the function levels():
## [1] "1" "2"
## [1] best_friend not_best_friend best_friend not_best_friend
## Levels: best_friend not_best_friend
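The outputs above suggest that friend_groups was created from the numbers 1 and 2 and that its levels were then renamed. A reconstruction (the original code is not shown):

```r
friend_groups <- factor(c(1, 2, 1, 2))
friend_groups            # 1 2 1 2, Levels: 1 2
levels(friend_groups)    # "1" "2"

# Rename the levels, in the same order as levels() lists them
levels(friend_groups) <- c("best_friend", "not_best_friend")
friend_groups
```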
Note that, by default, R sorts factor levels (alphabetically for text). If you want a different order, specify the levels argument of factor() as follows:
# Change the order of levels
friend_groups <- factor(friend_groups,
levels = c("not_best_friend", "best_friend"))
# Print
friend_groups
## [1] best_friend not_best_friend best_friend not_best_friend
## Levels: not_best_friend best_friend
Note that:
## [1] TRUE
## [1] TRUE FALSE TRUE TRUE
## Levels: FALSE TRUE
If you want to know the number of individuals in each level, use the function summary():
## not_best_friend best_friend
## 2 2
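The two results after "Note that:" and the counts above are consistent with the checks below (a reconstruction, since the original code is not shown):

```r
friend_groups <- factor(c("best_friend", "not_best_friend", "best_friend", "not_best_friend"),
                        levels = c("not_best_friend", "best_friend"))
are_married <- c(TRUE, FALSE, TRUE, TRUE)

is.factor(friend_groups)   # TRUE
as.factor(are_married)     # a factor with levels FALSE and TRUE
summary(friend_groups)     # number of friends in each level
```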
In the following example, I want to compute the mean salary of my friends by group. The function tapply() can be used to apply a function, here mean(), to each group.
## Nicolas Thierry Bernard Jerome
## 2000 1800 2500 3000
## [1] best_friend not_best_friend best_friend not_best_friend
## Levels: not_best_friend best_friend
# Compute the mean salaries by groups
mean_salaries <- tapply(salaries, friend_groups, mean)
mean_salaries
## not_best_friend best_friend
## 2400 2250
## not_best_friend best_friend
## 2 2
It’s also possible to use the function table() to create a frequency table, also known as a contingency table of the counts at each combination of factor levels.
## friend_groups
## not_best_friend best_friend
## 2 2
# Cross-tabulation between
# friend_groups and are_married variables
table(friend_groups, are_married)
## are_married
## friend_groups FALSE TRUE
## not_best_friend 1 1
## best_friend 0 2
A data frame is like a matrix but can have columns with different types (numeric, character, logical). Rows are observations (individuals) and columns are variables.
A data frame can be created using the function data.frame(), as follows:
# Create a data frame
friends_data <- data.frame(
name = my_friends,
age = friend_ages,
height = c(180, 170, 185, 169),
married = are_married
)
# Print
friends_data
To check whether an object is a data frame, use the function is.data.frame(), which returns TRUE if it is:
## [1] TRUE
## [1] FALSE
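The TRUE/FALSE results above are consistent with the checks below. The sketch rebuilds both objects so it runs on its own, and adds str() to show each column's type:

```r
friends_data <- data.frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)
my_data <- cbind(col1 = c(5, 6, 7, 8, 9),
                 col2 = c(2, 4, 5, 9, 8),
                 col3 = c(7, 3, 4, 8, 7))

is.data.frame(friends_data)   # TRUE
is.data.frame(my_data)        # FALSE: it is a matrix
str(friends_data)             # one line per column, showing its type
```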
The object "friends_data" is a data frame, but the object "my_data" is not (it is a matrix). We can convert it to a data frame using the as.data.frame() function:
## [1] "matrix" "array"
# Convert it as a data frame
my_data2 <- as.data.frame(my_data)
# Now, the class is data.frame
class(my_data2)
## [1] "data.frame"
As described in the matrix section, you can use the function t() to transpose a data frame. Note that the result is a character matrix, because a matrix can hold only one type:
## Nicolas Thierry Bernard Jerome
## name "Nicolas" "Thierry" "Bernard" "Jerome"
## age "27" "25" "29" "26"
## height "180" "170" "185" "169"
## married "TRUE" "FALSE" "TRUE" "TRUE"
To select just certain columns from a data frame, you can either refer to the columns by name or by their location (i.e., column 1, 2, 3, etc.).
1) Positive indexing by name and by location
## [1] "Nicolas" "Thierry" "Bernard" "Jerome"
## [1] "Nicolas" "Thierry" "Bernard" "Jerome"
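The two identical outputs above are consistent with selecting the name column by name and by position. A sketch of common forms:

```r
friends_data <- data.frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)
friends_data$name                  # by name with $
friends_data[, "name"]             # by name with [row, col]
friends_data[, 1]                  # by position: column 1
friends_data[, c("name", "age")]   # several columns: still a data frame
```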
2) Index by characteristics
We want to select all friends with age >= 27.
## [1] TRUE FALSE TRUE FALSE
TRUE specifies that the row contains a value of age >= 27.
Indexing the rows with this logical vector, as in friends_data[friends_data$age >= 27, ], tells R to get all rows from friends_data where age >= 27, and then to return all the columns.
If you don’t want to see all the column data for the selected rows but are just interested in displaying, for example, friend names and age for friends with age >= 27, you could use the following R code:
If you’re finding that your selection statement is starting to be inconvenient, you can put your row and column selections into variables first, such as:
Then you can select the rows and columns with those variables:
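The row and column selections described above can be sketched as follows; rows_to_keep and cols_to_keep are illustrative names for the selection variables:

```r
friends_data <- data.frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)
friends_data$age >= 27                                   # TRUE FALSE TRUE FALSE
friends_data[friends_data$age >= 27, ]                   # matching rows, all columns
friends_data[friends_data$age >= 27, c("name", "age")]   # matching rows, two columns

# Store the selections in variables first
rows_to_keep <- friends_data$age >= 27
cols_to_keep <- c("name", "age")
friends_data[rows_to_keep, cols_to_keep]
```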
It's also possible to use the function subset() as follows.
Another option is to use the functions attach() and detach(). The function attach() takes a data frame and makes its columns accessible by simply giving their names.
The functions attach() and detach() can be used as follows:
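A sketch of both approaches on friends_data. subset() takes the condition and the columns to keep; with attach(), the columns can be used by name until detach() is called:

```r
friends_data <- data.frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)
subset(friends_data, age >= 27, select = c(name, age))

attach(friends_data)
mean(age)              # age is found without writing friends_data$age: 26.75
detach(friends_data)
```

A design note: with(friends_data, mean(age)) gives the same convenience for a single expression without changing the search path, which avoids the name clashes that attach() can cause.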
Add a new column to a data frame
It’s also possible to use the functions cbind() and rbind() to extend a data frame.
With a numeric data frame, you can use the functions rowSums(), colSums(), colMeans(), rowMeans() and apply() as described in the matrix section.
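A sketch of extending friends_data and summarising its numeric columns. The weight and has_car columns and the extra friend "Sophie" are hypothetical values used only for illustration:

```r
friends_data <- data.frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)
# Add a new column with $
friends_data$weight <- c(75, 70, 80, 68)
# cbind() adds columns ...
friends_data <- cbind(friends_data, has_car = c(TRUE, FALSE, FALSE, TRUE))
# ... and rbind() adds rows that have the same columns
new_friend <- data.frame(name = "Sophie", age = 24, height = 165,
                         married = FALSE, weight = 55, has_car = TRUE)
friends_data <- rbind(friends_data, new_friend)
dim(friends_data)   # 5 6

# Numeric columns can be summarised as with a matrix
colMeans(friends_data[, c("age", "height")])
```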
A list is an ordered collection of objects, which can be vectors, matrices, data frames, etc. In other words, a list can contain all kinds of R objects.
# Create a list
my_family <- list(
mother = "Veronique",
father = "Michel",
sisters = c("Alicia", "Monica"),
sister_age = c(12, 22)
)
# Print
my_family
## $mother
## [1] "Veronique"
##
## $father
## [1] "Michel"
##
## $sisters
## [1] "Alicia" "Monica"
##
## $sister_age
## [1] 12 22
## [1] "mother" "father" "sisters" "sister_age"
## [1] 4
The list object "my_family" contains four components, which may be individually referred to as my_family[[1]], my_family[[2]] and so on.
It’s possible to select an element, from a list, by its name or its index:
## [1] "Michel"
## [1] "Michel"
## [1] "Veronique"
## [1] "Alicia" "Monica"
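The component names, the length and the selections above are consistent with the calls below. The last line shows the difference between single and double brackets:

```r
my_family <- list(mother = "Veronique", father = "Michel",
                  sisters = c("Alicia", "Monica"), sister_age = c(12, 22))
names(my_family)        # "mother" "father" "sisters" "sister_age"
length(my_family)       # 4
my_family$father        # "Michel"
my_family[["father"]]   # "Michel"
my_family[[1]]          # "Veronique"
my_family[[3]]          # "Alicia" "Monica"
my_family[1]            # single brackets return a sub-list, not the element
```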
# Select a specific element of a component
# select the first ([1]) element of my_family[[3]]
my_family[[3]][1]
## [1] "Alicia"
Note that it’s possible to extend an original list.
In the R code below, we want to add the components “grand_father” and “grand_mother” to my_family list object:
# Extend the list
my_family$grand_father <- "John"
my_family$grand_mother <- "Mary"
# Print
my_family
## $mother
## [1] "Veronique"
##
## $father
## [1] "Michel"
##
## $sisters
## [1] "Alicia" "Monica"
##
## $sister_age
## [1] 12 22
##
## $grand_father
## [1] "John"
##
## $grand_mother
## [1] "Mary"
You can also concatenate three lists as follows:
The result is a list also, whose components are those of the argument lists joined together in sequence.
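A sketch of concatenating three lists with c(); list_a, list_b and list_c are hypothetical examples:

```r
list_a <- list(1, "a")
list_b <- list(TRUE)
list_c <- list(x = c(2, 3))
combined <- c(list_a, list_b, list_c)
length(combined)   # 4: the components are joined, not nested
combined$x         # 2 3
```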
To read more about a given function, for example mean, the R function help() can be used as follows:
Or use this:
The output looks like this:
Figure 2
If you want to see some examples of how to use the function, type this: example(function_name).
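The help commands can be sketched as below. help() and ? are wrapped in interactive() because help pages are meant to be read in an interactive session such as RStudio; example(sum) runs the examples shown next.

```r
# Open the help page for mean() (interactive sessions only)
if (interactive()) {
  help(mean)
  ?mean
}
# Run the examples from the help page of sum()
example(sum)
```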
##
## sum> ## Pass a vector to sum, and it will add the elements together.
## sum> sum(1:5)
## [1] 15
##
## sum> ## Pass several numbers to sum, and it also adds the elements.
## sum> sum(1, 2, 3, 4, 5)
## [1] 15
##
## sum> ## In fact, you can pass vectors into several arguments, and everything gets added.
## sum> sum(1:2, 3:5)
## [1] 15
##
## sum> ## If there are missing values, the sum is unknown, i.e., also missing, ....
## sum> sum(1:5, NA)
## [1] NA
##
## sum> ## ... unless we exclude missing values explicitly:
## sum> sum(1:5, NA, na.rm = TRUE)
## [1] 15
Note that typical R help files contain the following sections: Description, Usage, Arguments, Details, Value, See Also and Examples.
If you want to read the general documentation about R, use the function help.start():
The output looks like this:
Figure 3
## [1] "elNamed" "elNamed<-" "median" "median.default"
## [5] "medpolish" "runmed"
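The listing above looks like the output of apropos(), which returns the names of objects on the search path that contain a given text pattern. A call such as the one below produces a list of that kind; the pattern "med" is inferred from the names shown, and the exact result depends on which packages are loaded.

```r
apropos("med")   # e.g. "median", "median.default", "medpolish", "runmed", ...
```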
An R package is an extension of R containing data sets and specific functions to solve specific questions.
R comes with standard (or base) packages, which contain the basic functions and data sets as well as standard statistical and graphical functions that allow R to work.
There are also thousands of other R packages available for download and installation from CRAN, Bioconductor and GitHub repositories.
After installation, you must load a package before you can use its functions.
The function install.packages() is used to install a package from CRAN. The syntax is as follows: install.packages("package_name")
For example, to install the package named readr, type install.packages("readr").
Note that every time you install an R package, R may ask you to specify a CRAN mirror (or server). Choose one that’s close to your location, and R will connect to that server to download and install the package files.
It's also possible to install multiple packages at the same time by passing a vector of names, for example install.packages(c("readr", "ggplot2")).
Alternatively, you can use the RStudio point-and-click option: Packages > Install, then type the name of the package you want to install.
Figure 4
Bioconductor contains packages for analyzing biological data. In the following example, we want to install the R/Bioconductor package limma, which is dedicated to analyzing genomic data.
To install a package from Bioconductor, first install the BiocManager package from CRAN with install.packages("BiocManager"), then run BiocManager::install("limma").
GitHub is a platform for hosting code repositories, useful for software development and data analysis, including R packages. It makes sharing your package easy. You can read more about GitHub in "Git and GitHub", by Hadley Wickham.
To install a package from GitHub, you can use the R package devtools (by Hadley Wickham). First install devtools with install.packages("devtools") if you don't have it on your computer.
For example, devtools::install_github("kassambara/survminer") installs the latest version of the survminer R package developed by A. Kassambara (https://github.com/kassambara/survminer).
To view the list of packages already installed on your computer, type installed.packages().
Alternatively, go to the Packages tab to check the installed packages.
Figure 5
R packages are installed in a directory called library. The R function .libPaths() can be used to get the path to the library.
## [1] "C:/Users/Roel Ceballos/Documents/R/win-library/4.0"
## [2] "C:/Program Files/R/R-4.0.2/library"
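Both checks can be done from code. installed.packages() returns a matrix with one row per installed package, and .libPaths() lists the library folders R searches:

```r
# Names of all installed packages
pkgs <- rownames(installed.packages())
head(pkgs)
"stats" %in% pkgs   # TRUE: stats ships with R

# Library folders where packages are installed
.libPaths()
```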
To use a specific function available in an R package, you have to load the R package using the function library().
In the following R code, we want to import a file into R using the R package readr, which has been installed in the previous section.
The function read_tsv() [in readr] can be used to import a tab separated .txt file:
## Parsed with column specification:
## cols(
## name = col_character(),
## `100m` = col_double(),
## Long.jump = col_double(),
## Shot.put = col_double(),
## High.jump = col_double(),
## `400m` = col_double(),
## `110m.hurdle` = col_double(),
## Discus = col_double(),
## Pole.vault = col_double(),
## Javeline = col_double(),
## `1500m` = col_double(),
## Rank = col_double(),
## Points = col_double(),
## Competition = col_double()
## )
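The file behind the output above is not named in the text, so the sketch below writes a small hypothetical tab-separated file to a temporary location and reads it back. It uses read_tsv() when readr is installed and falls back to base R's read.delim() otherwise; newer readr versions may print the column specification in a different format than the one shown above.

```r
# Hypothetical toy data written to a temporary .txt file
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("name\tRank\tPoints",
             "athlete_1\t1\t100",
             "athlete_2\t2\t90"), tsv_file)

if (requireNamespace("readr", quietly = TRUE)) {
  library(readr)
  athletes <- read_tsv(tsv_file)     # readr reader for tab-separated files
} else {
  athletes <- read.delim(tsv_file)   # base R reader for tab-separated files
}
athletes
```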
To view the list of loaded (or attached) packages during an R session, use the function search():
If you’re done with the package readr and you want to unload it, use the function detach():
To remove an installed R package, use the function remove.packages(), for example remove.packages("readr").
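search() and detach() can be tried safely with splines, a package that ships with R, standing in for readr. The uninstall call is left as a comment because it actually deletes the package from disk.

```r
library(splines)                          # attach a package that ships with R
"package:splines" %in% search()           # TRUE
detach("package:splines", unload = TRUE)
"package:splines" %in% search()           # FALSE

# Uninstalling (not run here):
# remove.packages("readr")
```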
To see the list of built-in data sets, call the function data():
The output is as follows:
Figure 7
Load and print the mtcars data as follows:
If you want to learn more about the mtcars data set, type this:
mtcars: Motor Trend Car Road Tests
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)
View the content of mtcars data set:
## [1] 32
## [1] 11
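The results 32 and 11 above are the numbers of rows and columns of mtcars. A sketch of loading and inspecting the data set; View() opens a spreadsheet-like viewer and is only useful interactively:

```r
data("mtcars")     # make the built-in data set available
head(mtcars)       # first six rows
nrow(mtcars)       # 32 cars
ncol(mtcars)       # 11 variables
if (interactive()) View(mtcars)
```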
If you want to learn more about mtcars, type this:
iris
iris data set gives the measurements in centimeters of the variables sepal length, sepal width, petal length and petal width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
ToothGrowth
The ToothGrowth data set contains the results of an experiment studying the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1 and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
PlantGrowth
The PlantGrowth data set contains results from an experiment comparing yields (measured as the dried weight of plants) obtained under a control and two different treatment conditions.
USArrests
This data set contains statistics about violent crime rates by US state.
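The four data sets described above ship with R, so their structure can be checked directly:

```r
dim(iris)                   # 150 5
table(iris$Species)         # 50 flowers per species
dim(ToothGrowth)            # 60 3
levels(ToothGrowth$supp)    # "OJ" "VC"
dim(PlantGrowth)            # 30 2
levels(PlantGrowth$group)   # "ctrl" "trt1" "trt2"
dim(USArrests)              # 50 4
```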