Chapter 1: Introduction to R

This chapter introduces R to beginners so that trainees with no prior exposure to the software can grasp its basic functionality. It begins with an overview of R, followed by instructions for installing the software and a description of the RStudio interface. Finally, it provides guidance on writing basic code.

What is R?

  • R is a programming language and environment specifically designed for statistical computing and graphics.
  • It offers a wide range of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and more. These tools are invaluable for systematizing the quantitative analysis of social assistance programs, enabling evidence-based decision-making.
  • By default, R does not have a user-friendly interface for visualizing inputs and outputs. To enhance usability, it is recommended to download RStudio, an integrated development environment (IDE) that simplifies interaction with R.

Given the importance of RStudio, the following section outlines the steps to install both R and RStudio. For a more detailed installation guide, refer to the following resources:


Installing R and RStudio

Step 1: Download R

  1. Visit the official R website: https://www.r-project.org/.

  2. Click on the CRAN link located on the left side of the webpage.

    Image 1: CRAN link on the R website

  3. Select a mirror from the list (e.g., 0-Cloud).

    Image 2: List of CRAN mirrors

  4. Choose the appropriate version for your operating system (Windows or Mac) and click Download R.

    Image 3: Download R for Windows or Mac

  5. Click on Install R for the first time.

    Image 4: Install R for the first time

  6. Download the installation file for your operating system (e.g., Download R for Windows).

  7. Run the installation file and follow the on-screen instructions to complete the installation.

Step 2: Download RStudio

  1. Visit the RStudio website: https://posit.co/products/open-source/rstudio/.

  2. Click on the Download button.

    Image 6: RStudio download page

  3. Select the Free version of RStudio.

    Image 7: Free version of RStudio

  4. Click Download RStudio and follow the installation instructions.


RStudio Interface

When you launch RStudio, you will see an interface divided into four panels:

  1. Source Panel:

    • Located in the top-left quadrant, this panel is where you write and edit code scripts.
    • You can save scripts, view datasets, and execute specific lines of code here.
  2. Environment and History Panel:

    • Located in the top-right quadrant, this panel displays all variables, datasets, functions, and objects loaded into the system.
    • It also provides access to history, connections, and tutorials, though these features are not covered in this manual.
  3. Multipurpose Panel:

    • Located in the bottom-right quadrant, this panel is highly versatile.
    • It allows you to view files, preview plots, manage installed packages, and access help manuals.
  4. Console Panel:

    • Located in the bottom-left quadrant, this panel displays the results of executed code.
    • While you can write code directly in the console, it is recommended to use the Source panel for better traceability.

    Image 11: Console Panel


Packages and Libraries

Packages are collections of R functions, data, and code that extend R’s capabilities. Some packages are included with the base R installation and are automatically accessible. These provide essential functions for statistical analysis and data visualization.

To use additional packages, you must load them into your R session using the library() function. For example:

library(readxl)

After writing this statement in the Source panel, you need to execute it, that is, tell R to carry out the instruction. There are two ways to do this:

  1. First Option: Place the cursor on the command line you want to run (or select it) and press “Ctrl+Enter” on the keyboard (“Cmd+Return” on macOS). This is the recommended option.

  2. Second Option: Select the command line and click the “Run” button at the top of the Source panel.

Both methods will execute the selected command in the R console.


Installing Packages

R is open-source software, and users continually develop new packages that simplify analytical tasks. To use these packages, you must first download and install them. There are two primary ways to install packages in R.

Method 1: Using the RStudio Interface

  1. Navigate to the Packages tab in the Multipurpose panel.
  2. Click the Install button.
  3. Type the name of the package you want to install (e.g., ggplot2). As you type, RStudio will suggest available packages to help avoid spelling errors. Then click Install to confirm.

Method 2: Using the Command Line

You can also install packages directly from the command line using the install.packages() function. This method requires you to know the exact name of the package but ensures your code is self-contained and reproducible. For example:

install.packages("ggplot2")

Note: Installing a package is only the first step. To use it in your session, you must load it using the library() function:

library(ggplot2)
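
Because install.packages() downloads and reinstalls the package each time it runs, a script can first check whether the package is already present. The following is a minimal sketch of that idea. The helper name is_installed is hypothetical, and requireNamespace() is a base R function that is not covered elsewhere in this chapter.

```r
# Hypothetical helper: returns TRUE if the package is already installed.
is_installed = function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with base R, so this check succeeds on any installation.
is_installed("stats")

# Load ggplot2 if it is available; otherwise remind the user to install it.
if (is_installed("ggplot2")) {
  library(ggplot2)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first.")
}
```

Keeping the install step out of the main script avoids repeated downloads when the script is run many times.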

Objects in R

R uses various types of objects to store and manipulate data. Below, we explore the most common ones.

Variables

Variables in R can store different types of data, such as numeric, character, or logical values. To create a variable, assign a value to it using the = or <- operator.

# Examples of variables
A = 4.5          # Numeric
B = "House"      # Character
C = FALSE        # Logical

# Display the value of C
C
## [1] FALSE
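
To confirm which type of data a variable holds, the base R function class() can be used. The short sketch below reuses the variables from the example above; the variable D is an addition for illustration and shows that the <- operator assigns a value in the same way as =.

```r
# Recreate the example variables
A = 4.5
B = "House"
C = FALSE

# Ask R for the type of each variable
class(A)   # numeric
class(B)   # character
class(C)   # logical

# The <- operator works like = for assignment
D <- 10
class(D)   # numeric
```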

Vectors

A vector is a collection of elements of the same type (e.g., all numeric, all character, or all logical). Use the c() function to create a vector.

# Create vectors
A = c(1, 2, 3)                # Numeric vector
B = c("blue", "red", "yellow") # Character vector
C = c(TRUE, FALSE, FALSE)     # Logical vector

# Display the vector B
B
## [1] "blue"   "red"    "yellow"
# Access specific elements of B
B[1:2]
## [1] "blue" "red"
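
Because all elements of a vector share one type, operations can be applied to every element at once. The sketch below illustrates this with the numeric vector A from the example above; the vector named mixed is hypothetical and shows what happens when types are combined.

```r
A = c(1, 2, 3)

# Number of elements in the vector
length(A)

# Arithmetic is applied element by element
doubled = A * 2        # 2 4 6

# A logical condition can be used to select elements
largerThanOne = A[A > 1]   # 2 3

# Mixing types forces R to convert everything to a single type
mixed = c(1, "blue", TRUE)
class(mixed)           # character
```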

Matrices

Matrices are two-dimensional objects that store elements of the same type. They can be created by combining vectors using rbind() (row-wise) or cbind() (column-wise).

# Create vectors
vectorA = c(1, 2, 3)
vectorB = c(4, 5, 6)

# Combine vectors into matrices
Matrix1 = rbind(vectorA, vectorB)  # Combine by row
Matrix2 = cbind(vectorA, vectorB)  # Combine by column

# Display Matrix1
Matrix1
##         [,1] [,2] [,3]
## vectorA    1    2    3
## vectorB    4    5    6
# Display Matrix2
Matrix2
##      vectorA vectorB
## [1,]       1       4
## [2,]       2       5
## [3,]       3       6
# Check dimensions of Matrix2
dim(Matrix2)
## [1] 3 2
# Access specific elements of Matrix2
Matrix2[2, 2]
## vectorB 
##       5
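
Besides combining vectors, a matrix can also be built directly from a sequence of values. The following sketch uses the base R functions matrix() and t(), which are not introduced in the text above and are shown here only as an illustration; the object name M is hypothetical.

```r
# Build a 2 x 3 matrix from the values 1 to 6, filling row by row
M = matrix(1:6, nrow = 2, byrow = TRUE)

# Check the dimensions (rows, columns)
dim(M)      # 2 3

# Leave one index empty to select a whole row or column
M[1, ]      # first row: 1 2 3
M[, 2]      # second column: 2 5

# t() transposes the matrix, swapping rows and columns
dim(t(M))   # 3 2
```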

Factors

Factors are used to represent categorical data. They assign labels to numeric values, making it easier to interpret data.

# Create a numeric vector
vectorA = c(1, 3, 2, 2, 4, 1)

# Convert to a factor with labels
factorA = factor(vectorA, 
                 levels = c(1, 2, 3), 
                 labels = c("Primary", "Secondary", "Tertiary"))

# Display factorA
factorA
## [1] Primary   Tertiary  Secondary Secondary <NA>      Primary  
## Levels: Primary Secondary Tertiary
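
Note that the value 4 in vectorA does not appear in the levels argument, so R records it as a missing value (<NA>). The sketch below repeats the example and uses the base R functions levels(), is.na(), and table(), which are not covered above, to inspect the result.

```r
vectorA = c(1, 3, 2, 2, 4, 1)
factorA = factor(vectorA,
                 levels = c(1, 2, 3),
                 labels = c("Primary", "Secondary", "Tertiary"))

# List the categories defined for the factor
levels(factorA)

# Count how many values could not be matched to a level
sum(is.na(factorA))   # 1 (the value 4)

# Count observations per category; add useNA to include missing values
table(factorA)
table(factorA, useNA = "ifany")
```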

Data Frames

Data frames are similar to matrices but can store columns of different data types. They are ideal for working with structured data.

# Create vectors
edu = c(1, 2, 3)
edu = factor(edu, 
             levels = c(1, 2, 3), 
             labels = c("Primary", "Secondary", "Tertiary"))
Name = c("Amira", "Jumana", "Carole")
Age = c(35, 32, 43)
Beneficiaries = c(TRUE, TRUE, FALSE)

# Combine vectors into a data frame
data = data.frame(Name, Age, Education = edu, Beneficiaries)

# Display the data frame
data
# Access specific columns
data$Education
## [1] Primary   Secondary Tertiary 
## Levels: Primary Secondary Tertiary
data$Age
## [1] 35 32 43
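
Rows of a data frame can be selected with a logical condition, in the same way as elements of a vector. The following is a minimal sketch using a reduced version of the data frame above; the object name older is hypothetical, and nrow(), ncol(), and str() are base R functions not introduced in the text.

```r
data = data.frame(
  Name = c("Amira", "Jumana", "Carole"),
  Age = c(35, 32, 43),
  Beneficiaries = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep names as character values
)

# Number of rows and columns
nrow(data)
ncol(data)

# Keep only the rows where Age is greater than 33
older = data[data$Age > 33, ]

# Names of the people flagged as beneficiaries
data$Name[data$Beneficiaries]

# Compact summary of column names and types
str(data)
```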

Functions

Functions are reusable blocks of code that perform specific tasks. R comes with many built-in functions, such as c(), rbind(), and factor(), which you have already encountered, and it also allows you to define your own. This section looks more closely at how functions work in R. For example, the cbind() function combines objects by columns, while rbind() combines objects by rows.

Both functions take arguments that specify the objects to be combined. To see the arguments of an existing R function, use the ? operator or the help() function. For example, running the command ?cbind displays the documentation for the function, including its usage and arguments, as illustrated in image xxx:

cbind(..., deparse.level = 1)

The ... argument refers to one or more vectors, matrices, or other R objects to be combined by columns (cbind()) or by rows (rbind()). These can be provided as named or unnamed arguments.

The deparse.level argument is an integer that controls how labels are constructed for non-matrix-like arguments. By default, it is set to 1.
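
The effect of these arguments can be seen by calling cbind() in different ways. The sketch below is only an illustration; the object names labeled, unlabeled, and renamed are hypothetical, and args() is a base R function, not mentioned above, that prints a function's arguments in the console.

```r
vectorA = c(1, 2, 3)
vectorB = c(4, 5, 6)

# Show the arguments of cbind() without opening the help page
args(cbind)

# Default (deparse.level = 1): column labels come from the object names
labeled = cbind(vectorA, vectorB)
colnames(labeled)

# deparse.level = 0: no labels are constructed
unlabeled = cbind(vectorA, vectorB, deparse.level = 0)
colnames(unlabeled)

# Named arguments set the column labels explicitly
renamed = cbind(first = vectorA, second = vectorB)
colnames(renamed)
```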

Creating a Function

A function in R has the following structure:

function_name = function(arg1, arg2, ...) {
  # Code to execute
  return(result)
}

Example:

# Define a function to sum two numbers
sumNumbers = function(a, b) {
  y = a + b
  return(y)
}

# Call the function with 1 and 3 as arguments; it computes y = 1 + 3 and returns 4
sumNumbers(1, 3)
## [1] 4
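
Arguments can also be given default values, which are used when the caller does not supply them. The following is a minimal sketch; the function percentTrue and its digits argument are hypothetical and serve only to illustrate the idea.

```r
# Hypothetical function: percentage of TRUE values in a logical vector
percentTrue = function(x, digits = 1) {
  share = mean(x) * 100        # TRUE counts as 1, FALSE as 0
  return(round(share, digits))
}

Beneficiaries = c(TRUE, TRUE, FALSE)

# Uses the default of one decimal place
percentTrue(Beneficiaries)

# Overrides the default by naming the argument
percentTrue(Beneficiaries, digits = 0)
```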

Conditional Statements

Conditional statements allow you to execute code based on specific conditions. The if/else statement is commonly used for this purpose.

if (condition) {
  # code to be executed if the condition is TRUE
} else {
  # code to be executed if the condition is FALSE
}

# Example of if/else
x = 3
if (x < 0) {
  print("Negative")
} else if (x > 0) {
  print("Positive")
} else {
  print("Zero")
}
## [1] "Positive"
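
A conditional statement can be placed inside a function so that the same check can be reused for different values. The sketch below wraps the example above in a hypothetical function named signLabel.

```r
# Hypothetical function: describe the sign of a single number
signLabel = function(x) {
  if (x < 0) {
    return("Negative")
  } else if (x > 0) {
    return("Positive")
  } else {
    return("Zero")
  }
}

signLabel(-2)
signLabel(0)
signLabel(5)
```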

Loops

Loops are used to repeat a block of code multiple times. R supports two main types of loops: while and for.

While Loop

A while loop repeats code as long as a condition is true.

while (test_expression) {
  # code to be executed
}

# Display numbers from 1 to 6
i = 1
while (i <= 6) {
  print(i)
  i = i + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
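
A while loop is most useful when the number of repetitions is not known in advance. The sketch below, with hypothetical variable names value and steps, doubles a number until it reaches at least 100 and counts how many repetitions were needed.

```r
value = 1
steps = 0

# Keep doubling while the value is still below 100
while (value < 100) {
  value = value * 2
  steps = steps + 1
}

value   # 128
steps   # 7
```

The condition must eventually become FALSE; otherwise the loop never stops.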

For Loop

A for loop iterates over a sequence of values.

for (element in sequence) {
  # code to be executed
}

# Sum numbers from 1 to 6
x = c(1, 2, 3, 4, 5, 6)
y = 0 # Initialize y to 0
for (i in x) {
  y = y + i
}
y
## [1] 21
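
A for loop can also iterate over the positions of a vector rather than its values. The following sketch assumes the Age vector from the data frame example; the variable total is hypothetical, and seq_along() and sum() are base R functions not introduced above.

```r
Age = c(35, 32, 43)
total = 0

# seq_along(Age) produces the positions 1, 2, 3
for (i in seq_along(Age)) {
  total = total + Age[i]
}

total      # 110

# The built-in sum() function gives the same result without a loop
sum(Age)   # 110
```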