1 Plots

When analyzing data, it is really helpful to represent the data visually to support our theoretical arguments. R has some really helpful functions for this. Let’s look at those plots.

1.1 Simple scatterplot

If we simply call plot() on a single vector, R plots the values in sequence (each value against its position in the vector), automatically scaling the axes.

x <- c(1, 3, 2)

plot(x)

plot(x, type = "p", col = "red", xlim = c(0, 10), ylim = c(0, 10), lwd = 6, main = "Graph name", xlab = "X-axis", ylab = "Y-axis")
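To make the “in sequence” behaviour concrete, here is a minimal sketch using xy.coords(), the base R helper that works out x and y coordinates from plotting arguments. The variable name coords is only for illustration.

```r
x <- c(1, 3, 2)

# With a single vector, the values become the y coordinates
# and their positions in the vector become the x coordinates.
coords <- xy.coords(x)
coords$x  # positions in the vector
coords$y  # the values themselves

# So this call should draw the same points as plot(x).
plot(seq_along(x), x, xlab = "Index")
```

In other words, plot(x) behaves like plotting x against seq_along(x).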

1.1.1 Arguments of plot()

type ~> What type of plot should be drawn.
- "p" -> points
- "l" -> lines
- "b" -> both points and lines
- "c" -> only the lines part of "b" (gaps are left where the points would be)
- "o" -> both points and lines, overplotted
- "s" -> stair steps
- "h" -> histogram-like vertical lines
- "n" -> no plotting
col ~> Color of the points and lines
xlim ~> Limits for the x-axis
ylim ~> Limits for the y-axis
lwd ~> Line width
main ~> Title of the graph
xlab ~> Label of the x-axis
ylab ~> Label of the y-axis

x <- c(1, 3, 4)

plot(x, type = "p", col="red", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)

plot(x, type = "b", col="green", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)

plot(x, type = "o", col="blue", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)

plot(x, type = "s", col = "red", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)

plot(x, type = "h", col = "green", xlim = c(0, 4), ylim = c(0, 4), lwd = 4)
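To compare the plot types side by side, one option is to draw them in a grid with par(mfrow = ...). This is only a sketch: the names types, plot_type and old_par are illustrative, and the 2 x 3 layout is an arbitrary choice.

```r
x <- c(1, 3, 4)
types <- c("p", "l", "b", "o", "s", "h")

# Arrange the plots in a 2 x 3 grid and remember the old settings.
old_par <- par(mfrow = c(2, 3))

for (plot_type in types) {
  plot(x, type = plot_type, col = "red", lwd = 2,
       xlim = c(0, 4), ylim = c(0, 4),
       main = paste0('type = "', plot_type, '"'))
}

# Restore the previous layout so later plots fill the whole device again.
par(old_par)
```

Saving the value returned by par() and passing it back afterwards keeps the layout change from leaking into later plots.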

1.1.2 Multiple plotting

If a plot already exists, we can draw additional data on the same graph with the ‘points()’ function. The axes are not rescaled, so anything outside the range of the first plot is not shown.

x <- c(1:7)
y <- c(3, 4, 1, 5, 2, 7, 6)

plot(x, y, type = "l", col = "gray45",
    lwd = 3,
    main = "It's a straight line ~_~",
    xlab = "X-Values",
    ylab = "Y-Values")

z1 <- c(1:5)

z2 <- c(2.3, 1.5, 5, 3.4, 2)

points(z1, z2, type = "l", col = "red4", lwd = 3)
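Because the axes are fixed by the first plot() call, it can help to compute limits that cover every series before drawing. The sketch below does that with range() and uses lines(), the base R counterpart of points() for connected lines. The names x_range and y_range and the legend labels are assumptions made for this example.

```r
x <- 1:7
y <- c(3, 4, 1, 5, 2, 7, 6)
z1 <- 1:5
z2 <- c(2.3, 1.5, 5, 3.4, 2)

# Limits that cover both data sets.
x_range <- range(x, z1)
y_range <- range(y, z2)

plot(x, y, type = "l", col = "gray45", lwd = 3,
     xlim = x_range, ylim = y_range,
     xlab = "X-Values", ylab = "Y-Values")

# lines() adds a connected line to the existing plot.
lines(z1, z2, col = "red4", lwd = 3)

# A legend tells the two series apart.
legend("topleft", legend = c("y", "z2"),
       col = c("gray45", "red4"), lwd = 3)
```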

1.1.3 Multiple variable plotting

We can use the same plot() function to plot two vectors against each other.

vec <- c(1:23)

plot(x = vec, y = vec^2, col = "green4", lwd = 4)

Here we can see the shape of the curve given by the equation \[y = x^2\]
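One way to check that the points really follow this equation is to overlay the continuous curve on the scatterplot with curve(). This is a small sketch, with the axis labels and the curve color chosen arbitrarily.

```r
vec <- 1:23

# Points computed from the vector.
plot(x = vec, y = vec^2, col = "green4", lwd = 4,
     xlab = "x", ylab = "y")

# Overlay the continuous curve y = x^2 on the same axes.
curve(x^2, from = min(vec), to = max(vec), add = TRUE, col = "gray45")
```

With add = TRUE, curve() draws into the existing plot instead of starting a new one.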

1.2 Box plots

The boxplot() function works well with the list data structure. Each element of the list is drawn as its own box, so the elements don’t need to be the same length.

myList <- list(c(5:10), 3.2, c(1:13))

boxplot(myList)
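boxplot() also returns the numbers behind the drawing, which can be handy for checking what each box represents. The sketch below uses a named copy of the list so that each box gets a label. The names myNamedList, box_info and the element names are assumptions made for this example.

```r
# Naming the list elements gives each box a label on the x-axis.
myNamedList <- list(short = 5:10, single = 3.2, long = 1:13)

# boxplot() draws the plot and invisibly returns the numbers behind it.
box_info <- boxplot(myNamedList)

box_info$names       # labels of the boxes
box_info$n           # number of values in each box
box_info$stats[3, ]  # medians (third row of the summary matrix)
```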

1.2.1 Saving boxplot

There is a set of functions in R that can create a file in which to draw graphical outputs: png(), jpeg(), bitmap().

png(filename = "R_basics_three_boxplot.png")

boxplot(myList)

dev.off()

Remember to close the graphics device with “dev.off()” when you are done. Until the device is closed, the file may not be written completely, and any later plots will be drawn into that file instead of appearing on screen.
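A common way to make sure the device is always closed is to wrap the steps in a function and register dev.off() with on.exit(). This is a sketch: the helper save_boxplot(), the file name and the use of tempdir() are assumptions made for this example, not part of the tutorial.

```r
myList <- list(c(5:10), 3.2, c(1:13))

# Hypothetical helper: opens a PNG device, draws, and always closes it.
save_boxplot <- function(data, path) {
  png(filename = path)
  # on.exit() runs dev.off() even if boxplot() fails part-way.
  on.exit(dev.off())
  boxplot(data)
  invisible(path)
}

out_file <- save_boxplot(myList, file.path(tempdir(), "boxplot_example.png"))

# Check that the file was actually written.
file.exists(out_file)
```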

2 ggplot2

We have already seen functions with cute graphical capabilities, but R has an even cuter graphical tool: the ‘ggplot2’ package.

ggplot stands for ‘grammar of graphics plot’.

If you don’t have ‘ggplot2’ installed yet, install it from CRAN with install.packages("ggplot2").

2.1 Installing ggplot2
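If the same script is run many times, it can be convenient to install a package only when it is missing. The sketch below shows one way to do that. install_if_missing() is a hypothetical helper written for this example, built on the base R functions requireNamespace() and install.packages().

```r
# Hypothetical helper: install a package from CRAN only when it is missing.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is available now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (commented out so nothing is downloaded by accident):
# install_if_missing("ggplot2")
```

A package only needs to be installed once per R installation, but it has to be loaded with library() in every new session.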

Remember that ggplot2 takes data frames as input, so we first need some data. Let’s say we measured the viability of a cell line, treated with a drug, in two different cell culture conditions. Because we will draw the viability values randomly, we set the seed of the random number generator to a specific value so that the results are reproducible.

2.2 Generating random points

set.seed(10)

Here we use the set.seed() function, which doesn’t create any output directly. Instead, it sets the internal state of R’s random number generator. This ensures that we get the same “random” values every time we run the code. Now we can generate some (pseudo-)random numbers to draw.

viability <- rnorm(40)

viability

##  [1]  0.01874617 -0.18425254 -1.37133055 -0.59916772  0.29454513  0.38979430
##  [7] -1.20807618 -0.36367602 -1.62667268 -0.25647839  1.10177950  0.75578151
## [13] -0.23823356  0.98744470  0.74139013  0.08934727 -0.95494386 -0.19515038
## [19]  0.92552126  0.48297852 -0.59631064 -2.18528684 -0.67486594 -2.11906119
## [25] -1.26519802 -0.37366156 -0.68755543 -0.87215883 -0.10176101 -0.25378053
## [31] -1.85374045 -0.07794607  0.96856634  0.18492596 -1.37994358 -1.43551436
## [37]  0.36208723 -1.75908675 -0.32454401 -0.65156299

We use the ‘rnorm()’ function to create a vector with 40 random values drawn from a standard normal distribution (mean 0, standard deviation 1).
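The effect of the seed is easy to check on a small scale. The following sketch uses new variable names (first_draw, second_draw, third_draw, shifted) so that it does not overwrite viability. The mean and sd values in the last line are arbitrary illustrations.

```r
set.seed(10)
first_draw <- rnorm(5)

set.seed(10)
second_draw <- rnorm(5)

# Same seed, same "random" numbers.
identical(first_draw, second_draw)

# Without resetting the seed, the generator moves on to new values.
third_draw <- rnorm(5)
identical(first_draw, third_draw)

# mean and sd default to 0 and 1 (standard normal) but can be changed.
shifted <- rnorm(5, mean = 100, sd = 15)
```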

treatment <- rep(c("control", "treated"), 20)

The function ‘rep()’ stands for repeat. Here it repeats the pair of strings 20 times, giving a vector of length 40.

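# each = 20 gives 20 "Media1" followed by 20 "Media2", so both treatments occur in both media
culture <- rep(c("Media1", "Media2"), each = 20)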
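How rep() repeats its input matters when two grouping vectors are combined later. The sketch below contrasts the times and each arguments on a small scale and cross-tabulates the result. The names treatment_demo and culture_demo are used only for this illustration.

```r
# times repeats the whole vector: control, treated, control, treated
treatment_demo <- rep(c("control", "treated"), times = 2)

# each repeats every element in turn: Media1, Media1, Media2, Media2
culture_demo <- rep(c("Media1", "Media2"), each = 2)

treatment_demo
culture_demo

# Cross-tabulate to see which treatment/culture combinations occur.
table(treatment_demo, culture_demo)
```

If both vectors alternated in the same way, every "control" would fall in "Media1" and every "treated" in "Media2", and the two factors could not be told apart.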

Now we will combine these three objects together into a data frame.

plotDF <- data.frame(viability = viability, treatment = treatment, culture = culture)

head(plotDF)

We have a data frame with three named columns. We already installed the ‘ggplot2’ package. Let’s load the library into our workspace.

2.3 Loading library

library("ggplot2")

ggplot(plotDF, aes(x = culture,
                   y = viability,
                   fill = treatment)) +
  geom_boxplot() +
  geom_point(position = position_jitterdodge())

2.4 Arguments of ggplot

A ggplot2 plot is made of three basic elements: Plot = Data + Aesthetics + Geometry

Data ~> The data frame that we created, “plotDF”.
Aesthetics ~> Set with aes(). This function maps columns of the input data frame to characteristics of the plot, such as position, color, size or shape.
  - x ~> x-axis: culture
  - y ~> y-axis: viability
  - fill ~> Fill color of the boxes: treatment
Geometry ~> It defines the type of graphic, e.g., box plot, scatter plot, bar plot, jitter plot etc.

It’s important to note that you have to use the addition (+) operator to add the geom layer, and that the + has to come at the end of a line, not at the start of the next one.
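The “Data + Aesthetics + Geometry” idea can be seen by building a plot one piece at a time and storing each step in a variable. This is a sketch under the assumption that ggplot2 is installed. The data frame demoDF is made up for this example, and the names base_plot, box_plot and full_plot are illustrative.

```r
library(ggplot2)

# Small made-up data frame, only for illustration.
demoDF <- data.frame(
  group = rep(c("a", "b"), each = 5),
  value = c(1, 2, 3, 4, 5, 3, 4, 5, 6, 7)
)

# Data + aesthetics: this sets up the axes but has no layers yet.
base_plot <- ggplot(demoDF, aes(x = group, y = value))

# Each geom added with + becomes one more layer.
box_plot <- base_plot + geom_boxplot()
full_plot <- box_plot + geom_point()

length(base_plot$layers)
length(full_plot$layers)

# Printing the object draws the plot.
print(full_plot)
```

Storing intermediate plots like this makes it easy to reuse the same data and aesthetics with different geoms.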

2.5 Alternative

ggplot(plotDF, aes(x = culture,
                   y = viability,
                   col = treatment)) +
  geom_point()

ggplot(plotDF, aes(x = culture,
                   y = viability,
                   col = treatment,
                   shape = treatment)) +
  geom_point()

R Basics Three

Neko_Chan666

2023-07-10