1 INTRODUCTION TO R

1.1 R’S INTERFACE, RUNNING CODE AND COMMENTING

1.1.1 R Interface Overview

R, typically used through RStudio, provides a user-friendly interface composed of four key sections, each serving a distinct function in your data analysis workflow.

  1. Script Pane: The script pane functions similarly to a word processor where you can write, edit, and save your R code.
  2. Console: Directly below the script pane, the console serves as the interactive environment where R executes the code.
  3. Global Environment: On the right side of the script pane, you’ll find the Global Environment tab, which is used to manage and track all the objects (variables, data frames, functions, etc.) created during your session.
  4. Files/Plots/Packages/Help: The Files tab lets you manage datasets, open saved files, and set file paths, while the Plots tab displays your visualizations. The Packages tab lists installed packages; R’s versatility comes from the vast array of available packages that extend its functionality. The Help tab provides access to R’s documentation, offering detailed information about functions, packages, and syntax. This is invaluable for learning and troubleshooting.
x <- 10 # creating a variable in R
print(x) # Output: 10
## [1] 10
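
As a small illustration of how the Global Environment keeps track of objects, the sketch below creates two objects, lists them, and removes one. The object name y is made up for this example; ls(), rm(), and exists() are base R functions.

```r
x <- 10          # create a numeric object
y <- c(1, 2, 3)  # hypothetical second object, used only for this illustration

ls()             # lists the names of the objects in the current environment

rm(y)            # removes y; it also disappears from the Global Environment tab
exists("y")      # FALSE, because y no longer exists
```

Running these lines one at a time while watching the Global Environment tab shows each object appear and disappear.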

1.1.2 Running Code in R:

  • Using the Run Button: Place the cursor on the line you want to run (for example, at its end) and press the “Run” button in the R interface. This executes the command and displays the result in the console.
  • Using Keyboard Shortcuts: For an even quicker method, use the keyboard shortcut Ctrl + Enter (on Windows) or Cmd + Enter (on Mac).
  • Running Multiple Lines of Code: To run several lines at once, highlight the lines you want to execute, then either press the “Run” button or use the Ctrl + Enter (or Cmd + Enter) shortcut.

    x <- 5 # assigns the value 5 to x; calling x afterwards returns 5

1.1.3 Commenting

  • Using the # Symbol: To write a comment, place the # symbol before the text, as in the example above. R ignores everything after the # on that line, and the editor displays comments in a different color (the exact color depends on your theme) to distinguish them from active code. This helps you visually separate explanations from your actual code.
x <- 5 # assigns the value 5 to x; calling x afterwards returns 5
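
Comments can sit on their own line, follow code on the same line, or temporarily disable a line of code. A minimal sketch, where the names width, height, and area are made up for illustration:

```r
# A full-line comment: compute the area of a rectangle
width <- 4     # inline comment: width in meters
height <- 2.5  # inline comment: height in meters

# area <- width + height  # a commented-out line: R skips it entirely
area <- width * height    # the line that actually runs

area
## [1] 10
```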

2 DATA STRUCTURE

2.1 Vector:

  • A vector is a fundamental data structure that stores a group of objects of the same type. Vectors are one-dimensional, meaning they exist in either rows or columns, not both.
  • Vectors are created using the c() function, which stands for combine or concatenate. All elements to be included in the vector are placed inside the parentheses.

Example

subject_id <- c("subject1", "subject2", "subject3") # creating a vector of subject IDs (character type)

Passed_Stat <- c(TRUE, TRUE, FALSE, FALSE, FALSE) # Logical vector on those who passed statistics
 
Age <- c(34, 23, 43, 56, 33, 49, 62) # creating numerical vector for age

Favorite_color <- c("red", "blue", "green") # creating character vector for favorite colors
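
Because a vector holds a single type, R converts mixed inputs to one common type. A minimal sketch of this behavior, where the vector name mixed is made up for illustration:

```r
mixed <- c(1, "two", TRUE)  # hypothetical vector mixing a number, text, and a logical

mixed                       # every element has been converted to text
## [1] "1"    "two"  "TRUE"
class(mixed)
## [1] "character"
length(mixed)               # number of elements in the vector
## [1] 3
```

Checking class() after creating a vector is a quick way to confirm it holds the type you intended.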

Calling a Vector: Once a vector is created and stored, you can retrieve its values by simply typing its name. For example, typing Age displays all the values stored in it.

Example

Age <- c(34, 23, 43, 56, 33, 49, 62) # creating numerical vector for age

Age # calling the vector displays its values
## [1] 34 23 43 56 33 49 62

2.1.1 Performing Operations on Vectors

Since vectors group similar data types, you can apply operations to all elements at once. Example: Adding 2 years to each age

Age + 2 # adding two years to each element of the `Age` vector
## [1] 36 25 45 58 35 51 64
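
The same element-by-element behavior applies to other arithmetic and to comparisons. A short sketch using the Age vector from above; the particular operations are chosen only for illustration:

```r
Age <- c(34, 23, 43, 56, 33, 49, 62) # same ages as above

Age * 12       # age in months for every element
## [1] 408 276 516 672 396 588 744
Age > 40       # logical vector: TRUE where the age is over 40
## [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
sum(Age > 40)  # TRUE counts as 1, so this counts the ages over 40
## [1] 4
mean(Age)      # average age
## [1] 42.85714
```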

3 BASIC DATA CLASSES IN R:

In R, objects belong to different data classes based on the type of information they store. The main data classes include:

  • Numeric – Decimal numbers (e.g., 3.4, 7.89). Example: height <- c(6.2, 5.3, 6.7, 5.6)
  • Integer – Whole numbers written with the L suffix (e.g., 0L, 1L, 10L). Without the suffix, R stores a whole number such as 10 as numeric.
  • Character – Text or strings (e.g., “apple”, “data”). Example: Fruitype <- c("orange", "banana", "mangoes")
  • Factor – Categorical variables with predefined levels.
  • Logical – Boolean values (TRUE, FALSE).
  • Missing Values (NA) – A special value representing missing or undefined data.

x <- 5.6
class(x) # will display the data class of x which is numeric
## [1] "numeric"
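
The remaining classes can be checked the same way. The sketch below is illustrative only; the object names count, size, passed, and scores are made up for this example.

```r
count <- 10L                 # the L suffix makes this an integer
class(count)
## [1] "integer"
class(10)                    # without L, a whole number is stored as numeric
## [1] "numeric"

size <- factor(c("small", "large", "small"),
               levels = c("small", "large"))  # hypothetical categorical variable
class(size)
## [1] "factor"
levels(size)
## [1] "small" "large"

passed <- TRUE
class(passed)
## [1] "logical"

scores <- c(90, NA, 75)      # hypothetical scores with one missing value
is.na(scores)                # TRUE marks the missing element
## [1] FALSE  TRUE FALSE
mean(scores, na.rm = TRUE)   # na.rm = TRUE drops the NA before averaging
## [1] 82.5
```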

4 INDEXING IN R:

Indexing is the process of accessing specific elements within a vector, list, or data frame. Since objects store values for later use, we need a way to retrieve or manipulate specific elements efficiently. In R, indexing is done using square brackets [], allowing us to pull individual values, sequences, or non-sequential elements.

Example vector

band <- c("Boys to Men", "UB40", "Queen") # vector of bands

4.1 Retrieving Single Values

To retrieve the second band, UB40, use square brackets and specify the index:

band[2] # retrieves the 2nd band; the output is UB40
## [1] "UB40"

4.2 Retrieving Multiple Sequential Values

To retrieve a sequence of values, use : inside the brackets.

band[1:3] # retrieves bands 1 through 3, from Boys to Men to Queen
## [1] "Boys to Men" "UB40"        "Queen"
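
Non-sequential elements can be pulled by passing a vector of positions inside the brackets. The sketch below reuses the Age vector from the vector section; the negative and logical indexing lines are additional illustrations of the same bracket syntax.

```r
Age <- c(34, 23, 43, 56, 33, 49, 62) # Age vector from the vector section

Age[c(1, 3)]   # non-sequential: the 1st and 3rd elements
## [1] 34 43
Age[-2]        # negative index: everything except the 2nd element
## [1] 34 43 56 33 49 62
Age[Age > 40]  # logical index: only the ages over 40
## [1] 43 56 49 62
```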

DATA VISUALIZATION

Data visualization is the graphical representation of data, which makes trends and other key elements of the data easier to identify. In R, the tidyverse collection of packages provides a tool for data visualization: ggplot2.

library(ggplot2) # load the ggplot2 package. Note: this only works after the tidyverse package (which includes ggplot2) has been installed

data(mpg) # load the mpg sample data set

ggplot(data=mpg, aes(x = displ, y = hwy, color=class)) +
  geom_point() +
  labs(title = "Engine Displacement vs. Highway MPG",
       x = "Engine Displacement (L)",
       y = "Highway Miles per Gallon") +
  theme_minimal()                          #Creating a ggplot scatter plot
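
On color coding: a color placed inside aes() is mapped to a variable, as with color = class in the plot above, while a color placed outside aes() is a fixed value applied to every point. A minimal sketch of the second case, assuming ggplot2 is installed; the object name p and the color "steelblue" are choices made only for this illustration.

```r
library(ggplot2)

# color outside aes(): every point gets the same fixed color
p <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "steelblue") +
  labs(title = "Engine Displacement vs. Highway MPG",
       x = "Engine Displacement (L)",
       y = "Highway Miles per Gallon") +
  theme_minimal()

p  # printing the stored plot object draws the plot
```

Storing the plot in an object first makes it easy to add further layers later or to draw it again without retyping the code.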

4.3 Reference

For more information on R, visit R Document.

The output presented is based on R skill lab course material. The author wishes to acknowledge Kaser Taylor for the notes.

4.4 Challenging Areas

  1. Creating ggplot plots and color coding