1. Introduction

What is R?

R is a programming language popular for statistical computing and graphics (r-project.org). You can obtain a free copy of the software from r-project.org and combine it with the many available packages to cover almost any need.

As we are beginning our journey in learning R, we will keep it short and simple. Learning any kind of programming takes time. But it can be very rewarding if we keep trying and learn in small steps.

As a programming language, R is a dialect of the S language, which is still in use and is proprietary software. R was created by Ross Ihaka and Robert Gentleman, and it is written in C, Fortran and R itself (Wikipedia.org: https://en.wikipedia.org/wiki/R_(programming_language)).

Installing and running R

You can visit the Comprehensive R Archive Network (CRAN) website (https://cran.r-project.org) and obtain the necessary files for your computer environment (e.g. Windows, Linux or macOS). As of June 2022, the latest release of R was R-4.2.0. Please follow the instructions on the website to install R on your system. If you are not sure how, search the web for something like “how to install R on Windows/Linux/MacOS”, or get help from your friends or seniors.

The key idea of this journey is to always try things first hand, and you will almost certainly fail the first few times. Do not get frustrated, think it is too difficult, or blame your skills. It is the same for everyone who is starting out like you, except for the few who already have some familiarity. So please try first; if you fail, search the web; then ask friends, seniors or teachers for help.

If you were able to install R and can run it on your computer, congratulations, you have cleared the first hurdle. Before we begin programming, we need to be familiar with the software interface. We can run R from the command line on Windows/macOS, but first we will use RGui, the graphical user interface of R. Please keep in mind that those running Linux may need to install RStudio or another IDE of their choice. We will not discuss these here at the beginning.

Practice R online

You can practice R in an online environment from your favourite web browser. Search the web and you will find many. Here I am listing a few:

Getting help

For any updates or manuals, please consult the R home page first. There are many free resources that will be sufficient for beginners.

For those who like to read books, here are a few free online e-books and materials:

Running first R command

Inside RGui, you will find a standard menu bar and, below it, an open pane named “console”. Here we can type commands to interact with the computer. We will first type a command and see its output:

print("Hello world!")

If everything went well, you will see R has output your message “Hello world!” in the console itself. The output will look like this:

## [1] "Hello world!"

So, what is happening here? We used a simple R function called “print()” and gave it the text “Hello world!” inside the parentheses, which instructs R to print out that text. Please notice the quotation marks (“) before and after the text: to write a character or text value we need to enclose it in single (’) or double (”) quotation marks, whereas numbers can be typed directly.
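
As a small sketch of the quoting rule (the values here are chosen only for illustration), the lines below print the same text with single and double quotes, and show the difference between a number and a number typed inside quotes:

```r
# Single and double quotes create the same text value
print('Hello world!')
print("Hello world!")

# Numbers are typed directly, without quotes
print(2022)

# Inside quotes, 2022 is treated as text, not as a number
print("2022")
```

Notice that R prints text with quotation marks around it and numbers without them.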

Comments

Type the following one line at a time and press enter:

# print the number 4
print(4)

Here, R prints out the number 4. You may have noticed the line “# print the number 4”. This is called a comment line, and in R it begins with a “#” (hash) sign. It is not a command for R: when R sees a #, it ignores the rest of that line and moves on to the next line.

We can use comments to describe our code and keep track of what we are doing and why. So, when you come back to your code after a few days or months, you can still understand what you were doing.

Consider the following block of code. Notice how comments are used to describe the purpose of this block.

At this point you do not need to understand everything about this block of code. This is just to show you how we can use comments in our code. We will come back to this code block later.

#-------------------------------------------------------------
# Author       : Mohammad Shamim Hasan Mandal
# Description  : Getting R logo from the png package and plot
# Date         : June 19, 2022
#-------------------------------------------------------------
# Code starts here
library(png)              # load png package library
# Read the file and save as "logo"
logo = readPNG(
  system.file("img","Rlogo.png", 
    package="png")
  )
# Now make an empty plot
plot(1:2,type="n",main = "The R logo",axes=FALSE,xlab = "",ylab="")
rasterImage(logo,1.,1.3,1.7,2) # add the logo image to the plot

2. Variables

We can evaluate simple mathematical expressions inside R. In other words, we can use it simply as a calculator. Consider the following lines of code. Notice that before each line there is a comment describing what to expect from the next line. We can also write a comment just after the code, on the same line.

# print the addition of 4 and 2
print(4+2)

# print the multiplication of 4 and 2, then division by 2
print((4*2)/2)

# Some calculation 
print(100/100)    # division
print(100*100)    # multiplication
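
A few more calculator-style operations, as a short sketch. The numbers are arbitrary examples; the operators themselves (`^`, `%%`, `%/%`) are part of base R:

```r
print(2^3)          # power: 2 * 2 * 2 = 8
print(7 %% 2)       # remainder of 7 divided by 2 = 1
print(7 %/% 2)      # whole-number division of 7 by 2 = 3

# Order of operations
print(2 + 3 * 4)    # multiplication comes first = 14
print((2 + 3) * 4)  # parentheses come first = 20
```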

Now, suppose we are doing a calculation and need to remember its value to use it in the next one. What do we do then? We can ask R to save the value in memory under a name (which must begin with a letter). Then we can call that name and use it in a calculation.

# Save birth year and present year
birthYear = 1999
present   = 2022
# make the calculation
Age = present - birthYear

Age   # print the value of "Age", which is 23
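
The sketch below continues the same idea: a saved value can be reused in a further calculation, and assigning to a name again replaces its old value. The name `AgeIn2030` is made up for this example.

```r
birthYear = 1999
present   = 2022
Age = present - birthYear

# Reuse the saved value in another calculation
AgeIn2030 = Age + (2030 - present)
print(AgeIn2030)   # 31

# Assigning again replaces the old value
present = 2023
Age = present - birthYear
print(Age)         # 24
```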

Equation of a straight line

Now let’s try something practical. The equation of a straight line is y = mx + b

where,

m = slope

x = how far along x

y = how far along y

b = intercept

Now calculate the slope (m) of a line that goes through the point (x, y) = (2, 2) and whose intercept (b) on the y axis is 1:

# first make variables
y = 2           # how far along y
x = 2           # how far along x
b = 1           # intercept

# equation of a straight line: y = mx + b, so m = (y - b)/x
m = (y - b)/x   # calculate the slope
print(m)        # print out the value of m
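
One way to check the result, sketched below, is to put the slope back into y = mx + b and confirm that we recover the original y. The names `yCheck` and `yAt4` are made up for this example.

```r
y = 2
x = 2
b = 1
m = (y - b) / x      # slope = 0.5

# Put m back into the equation; we should get y = 2 again
yCheck = m * x + b
print(yCheck)        # 2

# The same line gives y at any other x, for example x = 4
yAt4 = m * 4 + b
print(yAt4)          # 3
```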

3. What is a Function?

What is a function? In plain words, a function is a piece of code that can be written by you, the programmer, or made available by package libraries. Many functions are already available through your R installation. A function takes zero or more inputs (arguments) and performs a specific task.

For example, the print() function has only one purpose: it prints whatever text or variable we provide to the console.

Let’s look at the c() function, which combines its arguments into a vector or list.

# the c() function taking many arguments
numbers = c(1,2,3,4)            # numbers
lists   = c("one","two",3,4.4)  # characters and numbers (all stored as character)

# All of these functions take one argument
print(max(numbers))             # get maximum
print(min(numbers))             # get minimum
print(median(numbers))          # get median value
print(sqrt(numbers[4]))         # get square root of fourth item

Note: We can actually omit the print() function in the previous examples and instead type “max(numbers)”; we will get the same result. R assumes that you meant to print out the value of that calculation.

A useful function for plotting is called plot(), which can take many arguments. Here is an example of a plot using the cars dataset in R.
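
As mentioned at the start of this section, a function can also be written by the programmer. Here is a minimal sketch; the function name `lineY` and its argument names are hypothetical and chosen only for this example. It wraps the straight-line equation y = mx + b:

```r
# lineY is a made-up name for this example
lineY = function(x, m, b) {
  y = m * x + b   # equation of a straight line
  return(y)       # give the result back to the caller
}

# Call the function with named arguments
print(lineY(x = 2, m = 0.5, b = 1))   # 2

# Or pass the arguments in order
print(lineY(4, 0.5, 1))               # 3
```

Once defined, the function can be reused with different inputs instead of retyping the calculation.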

# Make a scatter plot using cars data
plot(x= cars$speed,y= cars$dist,      # define x and y axis
  xlab="Speed",ylab="Distance",       # give x,y axis name
  type="p",pch=25,col="red",bg="red", # colour the points 
  main="Speed vs Distance"            # add a title
  )
# Make a box plot from Iris data
boxplot(Sepal.Width ~ Species,main="Sepal width of three Iris species", 
        data = iris,col="orange")

Use of plot functions

Exercise

  1. Make a variable named after yourself and save your birth year in it. Then print it out using the print function.
  2. Now, using c(), make another variable combining three of your favourite hobbies.
  3. Load the cars data using the “data(cars)” function. Then, using the plot() function, try to make a scatter plot. Hint: use the code from the previous section and change the type argument to “p”, “l”, “b” and “o”, one at a time. Check the results.

4. Datatypes in R

You may know that computers, though good at calculation, cannot do anything by themselves. The computer needs instructions from the user, meaning you. But computers do not understand human language, so how can you communicate with one? A programming language works as a means of communication between us humans and the machine. R is a very high-level language. Here “high-level” means that we do not need to spell out very small-scale instructions, for example exactly how the computer will store a number or a text, or how it will manage memory when it has finished. Rest assured, these details are taken care of by the language itself.

Storage mode

Let’s check how R handles some data in its memory. Numbers typed directly, whether whole (e.g. 1, 2022, 3) or fractional (e.g. 1.22, 9.33), are stored as the double data type. Characters (e.g. “Dhaka”, “Bangladesh”) are stored as the character type. We can use the typeof() function to print variable types in the R console:

typeof(2022)     # double
typeof(20.22)    # double
typeof("JU")     # character
typeof("2022")   # character
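
A few more cases, as a short sketch with arbitrary example values. A whole number becomes a true integer only when we ask for it, for example with the `L` suffix or the colon operator:

```r
typeof(2022L)     # integer (the L suffix asks for an integer)
typeof(1:3)       # integer (the colon operator produces integers)
typeof(2022 / 2)  # double
typeof(TRUE)      # logical (TRUE or FALSE)
```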

Converting between types

We can convert between data types in R. For example, look at the functions below and see how they work:

year = 2022                # variable saving present year
typeof(year)               # type of the variable
year = as.character(year)  # convert to character
typeof(year)               # check type
year = as.numeric(year)    # convert back to double
typeof(year)
# For integer values we can also do
year = as.integer(year)
typeof(year)
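
Conversions do not always keep the value unchanged. The sketch below, using made-up values, shows two things to watch for: as.integer() drops the fractional part, and text that does not look like a number becomes NA (R's marker for a missing value). The variable name `result` is chosen only for this example.

```r
as.integer(20.22)      # 20, the fractional part is dropped
as.numeric("20.22")    # 20.22, text that looks like a number converts fine
as.character(20.22)    # "20.22"

# Text that is not a number cannot be converted.
# R returns NA and shows a warning; suppressWarnings() hides the warning here.
result = suppressWarnings(as.numeric("twenty"))
is.na(result)          # TRUE
```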

Note: You can ask for help inside R to check what a function such as typeof() does. To ask for help, simply place a single question mark in front of the function name and press enter (for example, ?typeof); a double question mark (??typeof) searches the help pages for that term. You can also use the help() function. For example, type help(typeof).

Common data structures in R

There are a few built-in data structures in R:

  1. vector : vector()
  2. list : list()
  3. matrix : matrix()
  4. dataframe : data.frame()

a. Vector

A vector can be created in many ways. In R you can simply pass the values you want to store to the c() function.

# make a vector
x = c(1,2,3)   # create a vector named x
is.vector(x)   # check whether it is a vector

The elements of a vector are indexed, meaning they have an order, so we can use that order to access values inside a vector.

x[3]   # the third element of vector
x[4]   # the fourth element does not exist, so R returns NA
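
Indexing can also pick several elements at once. A short sketch with example values (note that in R the first element has index 1):

```r
x = c(10, 20, 30)

length(x)    # 3, the number of elements
x[2:3]       # 20 30, the second to third elements
x[c(1, 3)]   # 10 30, the first and third elements
x[-1]        # 20 30, everything except the first element
```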

We can use this method for our calculation also. For example:

# We access first and third element
# make multiplication
# and save the results in the fourth element
x[4] = x[1] * x[3]

x   # 4th element was added 

Generating numbers

Here are a few functions that may be useful for us. Let’s see how to generate numbers.

# from 1 to 10, by 1
s = 1:10
# the repeat function
x = rep(9,3)                 # repeat 9, three times
y = rep("a",3)               # repeat a, three times
z = rep(c("a","b"),3)        # repeat a,b three times
# the sequence function
a = seq(from=1,to=10,by=2)   
# random number
b = rnorm(n=5,mean = 4,sd=2) # generate random number
# letters
letters[1:5]   # first five lowercase English letters
LETTERS[6:10]  # uppercase letters, sixth to tenth

Let’s use our new skills to generate a random vector and plot it.
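
These functions accept other arguments too. The sketch below uses `each` in rep() and `length.out` in seq(), both documented base R arguments, and shows that set.seed() makes random numbers repeatable. The names `first` and `second` are made up for this example.

```r
rep(c("a", "b"), each = 3)   # "a" "a" "a" "b" "b" "b"
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# Random numbers change on every run unless we fix the seed first
set.seed(1)
first = rnorm(3)
set.seed(1)
second = rnorm(3)
identical(first, second)     # TRUE, same seed gives same numbers
```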

set.seed(101)          # to generate same result
# Make new plot
plot(
  x = 1:10,            # numbers/integers from 1 to 10
  y = rnorm(10,4,1),   # 10 random number with mean 4 and sd of 1
  xlab= "Numbers",
  ylab="Random numbers",
  main="This is random scatter plot?",
  sub="This is subtitle",
  type = "b",           # both line and point type
  pch=24,
  col="blue",
  bg="red"
  )

# check help(plot) to see more options

b. Lists

A list is a useful data structure that can hold any data type. It is more flexible than a vector. Consider the following example.

# create three vectors
x = 1:5
y = letters[1:3]
z = rep("a",4)   # repeat "a" four times

# make list with three vectors
myList = list(x,y,z)
myList

List elements can also be accessed using indexing.

myList[[1]]    # the first element
myList[[1]][5] # the 5th element of the first item

# use this to modify
myList[[1]][5] = 100
myList[[1]]    # check the fifth element
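
List elements can also be given names, which is often easier to read than numeric positions. A minimal sketch; the list `person` and its contents are invented for this example:

```r
# A list mixing text, a number and a vector, each with a name
person = list(
  name      = "Rahim",
  birthYear = 1999,
  hobbies   = c("reading", "cricket")
)

person$name              # access by name with $
person[["hobbies"]][2]   # second hobby, using [[ ]] with the name
names(person)            # all element names
length(person)           # 3 elements
```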

Some practical use

Suppose we have many image files or data files that we want to read from our computer. If there is a directory or folder named “data” on your computer, you can list its files and read multiple files at once.

# get current working directory
getwd()
# change current working directory
# the argument is a path given as text, for example "C:/data" or "/home/data"
# setwd("C:/data")   # remove the first # and use a folder that exists on your computer
# List directories of current working directory
list.dirs() 

# List the files in your current working directory
list.files()
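
To illustrate reading several files at once, here is a self-contained sketch. It is only one possible approach: it creates a temporary “data” folder with two small CSV files (the file names, column names and values are all invented for this example), lists them, and reads each one into a list with lapply().

```r
# Make a temporary folder named "data" so the example runs anywhere
dataDir = file.path(tempdir(), "data")
dir.create(dataDir, showWarnings = FALSE)

# Write two small example files
write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
          file.path(dataDir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 4:5, value = c(40, 50)),
          file.path(dataDir, "b.csv"), row.names = FALSE)

# List only the CSV files, with their full paths
files = list.files(dataDir, pattern = "\\.csv$", full.names = TRUE)

# Read every file; the result is a list with one table per file
tables = lapply(files, read.csv)

length(tables)   # 2
tables[[1]]      # contents of a.csv
```

Because the tables are stored in a list, each one can be accessed by its index, as shown in the list section above.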

c. Matrix and Array

Matrices are very useful data structures in R. We can create a matrix using the matrix() function. It takes a few arguments, for example the number of rows and columns of the matrix. Please see the example below:

# create a matrix
m <- matrix(
  data = 1:16,            # from 1 to 16
  ncol = 4,               # number of columns
  nrow = 4,               # number of rows
  byrow = TRUE            # arrange data by row 
  )
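
Matrix elements are accessed with two indices, row first and then column. The sketch below rebuilds the same matrix and also shows what changes without byrow = TRUE; the name `m2` is made up for this example.

```r
m = matrix(data = 1:16, ncol = 4, nrow = 4, byrow = TRUE)

dim(m)     # 4 4, rows then columns
m[2, 3]    # 7, row 2 and column 3
m[1, ]     # 1 2 3 4, the whole first row
m[, 1]     # 1 5 9 13, the whole first column

# Without byrow = TRUE, R fills the matrix column by column
m2 = matrix(data = 1:16, ncol = 4, nrow = 4)
m2[1, ]    # 1 5 9 13
```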

You can think of an array as a matrix-like data structure that can have more than two dimensions:

ar <- array(data = 1:16,  # from 1 to 16
      dim = c(4,4),       # 4 rows, 4 columns
      dimnames = NULL     # no names
      )
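
Unlike a matrix, an array is not limited to rows and columns. As a small sketch (the name `ar3` and the sizes are chosen only for illustration), here is an array with three dimensions, which needs three indices:

```r
# 2 rows, 3 columns and 4 layers, filled with the numbers 1 to 24
ar3 = array(data = 1:24, dim = c(2, 3, 4))

dim(ar3)        # 2 3 4
ar3[1, 2, 3]    # 15, row 1, column 2, layer 3
ar3[, , 1]      # the first layer, a 2 x 3 matrix holding 1 to 6
```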

Detour:

Images or rasters are basically matrix- or array-like data structures: numbers are stored inside the raster, and the computer assigns a colour to each value. Then we can see the result as an image. So, let’s try to make a 4 x 4 raster/image.

Step by step:

# generate 16 numbers
mydata = seq(0,1,length.out=16)

# make a matrix named mymatrix
mymatrix = matrix(
  data = mydata,
  ncol = 4,
  nrow = 4
  )

# convert them to a raster
myraster = as.raster(mymatrix,max = 1)

myraster
##      [,1]      [,2]      [,3]      [,4]     
## [1,] "#000000" "#444444" "#888888" "#CCCCCC"
## [2,] "#111111" "#555555" "#999999" "#DDDDDD"
## [3,] "#222222" "#666666" "#AAAAAA" "#EEEEEE"
## [4,] "#333333" "#777777" "#BBBBBB" "#FFFFFF"
# plot
plot(myraster)