When we analyze data, it is really helpful to represent the data visually to support our theoretical arguments. R has some really helpful functions for this. Let's look at those plots.
If we simply use the command plot() then R will plot the values of an input vector in sequence, automatically scaling the axes.
x <- c(1, 3, 2)
plot(x)
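As a minimal sketch of what "in sequence" means: when only one vector is given, its positions (1, 2, 3, ...) are used as the x values. The name `idx` is only for illustration.

```r
x <- c(1, 3, 2)

# Positions of the elements of x: 1, 2, 3
idx <- seq_along(x)

# Plotting x against its positions gives the same points as plot(x)
plot(idx, x)
```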
# Commonly used arguments of plot(); the first argument is the data to draw
plot(x, type = "p", col = "red", xlim = c(0, 10), ylim = c(0, 10), lwd = 6, main = "Graph name", xlab = "X-axis", ylab = "Y-axis")
x <- c(1, 3, 4)
plot(x, type = "p", col="red", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)
plot(x, type = "b", col="green", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)
plot(x, type = "o", col="blue", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)
plot(x, type = "s", col = "red", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)
plot(x, type = "h", col = "green", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)
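To compare the `type` values side by side, the following sketch draws them in one window using `par(mfrow = ...)`. The descriptions in `types` are a short paraphrase of the base R documentation, and `"l"` (lines only) is added for completeness.

```r
x <- c(1, 3, 4)

# Short descriptions of some values accepted by the 'type' argument
types <- c(p = "points",
           l = "lines",
           b = "both points and lines",
           o = "points and lines overplotted",
           s = "stair steps",
           h = "vertical lines")

# Arrange the plots in a 2 x 3 grid and remember the old settings
old_par <- par(mfrow = c(2, 3))
for (t in names(types)) {
  plot(x, type = t, xlim = c(0, 4), ylim = c(0, 4), main = types[[t]])
}
# Restore the previous layout
par(old_par)
```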
If a plot already exists, we can draw further data onto the same graph with the ‘points()’ function.
x <- c(1:7)
y <- c(3, 4, 1, 5, 2, 7, 6)
plot(x, y, type = "l", col = "gray45",
lwd = 3,
main = "It's a straight line ~_~",
xlab = "X-Values",
ylab = "Y-Values")
z1 <- c(1:5)
z2 <- c(2.3, 1.5, 5, 3.4, 2)
points(z1, z2, type = "l", col = "red4", lwd = 3)
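Note that points() does not rescale the axes of the existing plot, so values outside the current range would not be visible. A hedged sketch of one way to handle this: compute a common y range first (the name `y_range` is illustrative) and use lines(), which draws the same kind of line as points(..., type = "l").

```r
x <- 1:7
y <- c(3, 4, 1, 5, 2, 7, 6)
z1 <- 1:5
z2 <- c(2.3, 1.5, 5, 3.4, 2)

# Range that covers both series, so nothing falls outside the axes
y_range <- range(y, z2)

plot(x, y, type = "l", col = "gray45", lwd = 3, ylim = y_range)
# Add the second series to the existing plot as a line
lines(z1, z2, col = "red4", lwd = 3)
```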
We can use the same plot() function to plot two vectors against each other.
vec <- c(1:23)
plot(x = vec, y = vec^2, col = "green4", lwd = 4)
Here we can see the curve of the equation \[y = x^2\]
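To make the link between the points and the equation explicit, one option is to overlay the smooth curve with base R's curve() function. This is only a sketch; the name `squares` is illustrative.

```r
vec <- 1:23
squares <- vec^2

# Points computed from the vector
plot(x = vec, y = squares, col = "green4", lwd = 4)

# Smooth curve of y = x^2 over the same range, added to the existing plot
curve(x^2, from = 1, to = 23, add = TRUE, col = "gray45")
```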
The boxplot() function works well with the list data structure. When we pass a list, each element of the list is represented by its own box, so the elements don’t need to be the same length.
myList <- list(c(5:10), 3.2, c(1:13))
boxplot(myList)
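A small sketch of the same idea with a named list: the names (here `short`, `single` and `long`, chosen only for illustration) become the box labels, and the value returned by boxplot() records how many observations went into each box.

```r
# Elements of different lengths, each with a name
myList <- list(short = 5:10, single = 3.2, long = 1:13)

# Number of values in each element
lengths(myList)

# boxplot() invisibly returns the statistics it used for drawing
bp <- boxplot(myList)
bp$names  # labels of the boxes
bp$n      # number of observations per box
```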
There is a set of functions in R that can create a file in which to draw graphical outputs: png(), jpeg(), bitmap().
png(file = "R_basics_three_boxplot.png")
boxplot(myList)
dev.off()
Remember to close the opened file using “dev.off()”. Until the device is closed, the file is not completely written, and any later plots keep being drawn into that file instead of appearing on the screen.
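The open, draw, close pattern can be sketched as follows. The file name and the use of tempdir() are assumptions made for this example, so that it does not write into your working directory.

```r
# Hypothetical output location inside R's temporary directory
out_file <- file.path(tempdir(), "boxplot_demo.png")

png(filename = out_file)        # open the file device
boxplot(list(5:10, 3.2, 1:13))  # draw into the file
invisible(dev.off())            # close the device so the file is written

# Check that the file now exists
file.exists(out_file)
```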
We have already seen a function with cute graphical capabilities, but there is an even cuter graphics package for R called ‘ggplot2’.
ggplot stands for ‘grammar of graphics plot’.
If you don’t have ‘ggplot2’ in your environment, then install it in the following manner.
install.packages("ggplot2")
Now remember that ggplot2 takes data frames as input, so we first need some data. Let’s say our data are the viability of a cell line, treated with a drug in two different cell culture conditions. Because we will draw the viability values randomly, we set the seed of the random number generator to a specific value.
set.seed(10)
Here we use the set.seed() function, which doesn’t create any output directly. Instead, it sets the internal state of R’s random number generator. This ensures that we get the same results every time we run the randomization. Now we can generate some (pseudo-)random numbers to draw.
viability <- rnorm(40)
viability
## [1] 0.01874617 -0.18425254 -1.37133055 -0.59916772 0.29454513 0.38979430
## [7] -1.20807618 -0.36367602 -1.62667268 -0.25647839 1.10177950 0.75578151
## [13] -0.23823356 0.98744470 0.74139013 0.08934727 -0.95494386 -0.19515038
## [19] 0.92552126 0.48297852 -0.59631064 -2.18528684 -0.67486594 -2.11906119
## [25] -1.26519802 -0.37366156 -0.68755543 -0.87215883 -0.10176101 -0.25378053
## [31] -1.85374045 -0.07794607 0.96856634 0.18492596 -1.37994358 -1.43551436
## [37] 0.36208723 -1.75908675 -0.32454401 -0.65156299
We use the ‘rnorm()’ function to create a vector of 40 random values drawn from a standard normal distribution.
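To check that set.seed() really makes the draws reproducible, here is a minimal sketch. The variable names are illustrative only.

```r
set.seed(10)
first_draw <- rnorm(5)

# Resetting the seed restarts the generator from the same state
set.seed(10)
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the generator simply continues,
# so the next values differ from the first ones
third_draw <- rnorm(5)
identical(first_draw, third_draw)
```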
treatment <- rep(c("control", "treated"), 20)
The function ‘rep()’ stands for repeat; here it repeats the pair of strings 20 times, giving a vector of length 40.
# each = 20 repeats every string 20 times in a row, so both media contain both treatments
culture <- rep(c("Media1", "Media2"), each = 20)
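The `times` and `each` arguments of rep() lay the values out differently, which matters when two grouping vectors are combined. A small sketch (the names `culture_alternating` and `culture_blocks` are illustrative): cross-tabulating with table() shows which combinations of treatment and culture actually occur.

```r
treatment <- rep(c("control", "treated"), times = 20)

# Same alternating pattern as treatment
culture_alternating <- rep(c("Media1", "Media2"), times = 20)
# 20 x "Media1" followed by 20 x "Media2"
culture_blocks <- rep(c("Media1", "Media2"), each = 20)

# Alternating: every control is in Media1 and every treated is in Media2
tab_alternating <- table(treatment, culture_alternating)
tab_alternating

# Blocks: each treatment occurs 10 times in each medium
tab_blocks <- table(treatment, culture_blocks)
tab_blocks
```

With the alternating layout, the effect of the treatment cannot be separated from the effect of the medium, because the two always change together.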
Now we will combine these three objects together into a data frame.
plotDF <- data.frame(viability = viability, treatment = treatment, culture = culture)
head(plotDF)
We have a data frame with three named columns. We already installed the ‘ggplot2’ package. Let’s load the library into our workspace.
library("ggplot2")
ggplot(plotDF, aes(x = culture,
y = viability,
fill = treatment)) +
geom_boxplot() +
geom_point(position = position_jitterdodge())
A ggplot2 plot is made of three basic elements: Plot = Data + Aesthetics + Geometry
Data ~> The data frame that holds the values to plot.
Aesthetics ~> They map columns of the data to x and y in the graph, and to visual properties such as color, size, point shape, or the height of the bars.
Geometry ~> It defines the graphics type, i.e., scatter plot, bar plot, jitter plot etc.
It’s important to note that you have to use the addition (+) operator to add the geom layer.
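The three elements can be sketched step by step by storing the plot in a variable and adding one layer at a time. The data frame `demo_df` and the names `base_plot`, `box_plot` and `full_plot` are made up for this example; the sketch assumes ggplot2 is installed.

```r
library(ggplot2)

# Data: a small example data frame
set.seed(10)
demo_df <- data.frame(
  viability = rnorm(40),
  treatment = rep(c("control", "treated"), times = 20),
  culture   = rep(c("Media1", "Media2"), each = 20)
)

# Data + Aesthetics: nothing is drawn yet apart from the axes
base_plot <- ggplot(demo_df, aes(x = culture, y = viability, fill = treatment))

# + Geometry: each geom adds one layer on top
box_plot  <- base_plot + geom_boxplot()
full_plot <- box_plot + geom_point(position = position_jitterdodge())

# Printing the object draws the plot
print(full_plot)
```

Storing intermediate plots like this makes it easy to try different geoms on the same data and aesthetics.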
ggplot(plotDF, aes(x = culture,
y = viability,
col = treatment)) +
geom_point()
ggplot(plotDF, aes(x = culture,
y = viability,
col = treatment,
shape = treatment)) +
geom_point()