Prepared by Rokabi_Reveal_Tech

General Guidelines

Review Time: 20 minutes

Before you start implementing this lab, it is a good idea to dedicate 20 minutes to reviewing the concepts that you will address in this lab. This will help you deepen your understanding and better organize your time while working.



Lab 1

What will you learn from Lab 1?

  • What is R and why do we use it
  • Feather – Advanced Challenge
  • Work Environments
  • Identifying the package

Step 1: What is R and why do we use it?

Step 2: Feather – Advanced Challenge

The Feather format is a fast, efficient file format that facilitates data sharing between different programming languages, such as R and Python. Once a Feather file is loaded into R as a data frame, you can dig deeper into the data with the usual exploration functions, for example head().
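As a rough sketch of how a Feather file might be written and read back in R: the example below assumes the arrow package, which this lab does not mention, and uses its write_feather() and read_feather() functions. The sample data frame and the temporary file path are made up for illustration, and the round trip is skipped when arrow is not installed.

```r
# Hypothetical sample data, invented for this illustration
tools <- data.frame(
  tool = c("R", "Python"),
  score = c(5, 4)
)

# Assumption: the arrow package provides Feather support; skip if it is missing
feather_available <- requireNamespace("arrow", quietly = TRUE)

if (feather_available) {
  feather_path <- tempfile(fileext = ".feather")   # temporary file, removed below
  arrow::write_feather(tools, feather_path)        # save the data frame
  tools_back <- arrow::read_feather(feather_path)  # load it again
  print(head(tools_back))                          # inspect the first rows
  file.remove(feather_path)
} else {
  message("The arrow package is not installed; skipping the Feather example.")
}
```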

Common Features

  • Open Source: R is free and open-source, allowing anyone to use, modify, and share it.
  • Data Stored in Data Frames: R uses data frames for data storage, making it easy to manipulate, subset, and analyze data.
  • Community Support: R has a large, active community that contributes to code development and provides extensive support.
  • Formulas and Functions Readily Available: R comes with a wide range of built-in functions and statistical formulas that are easily accessible.
  • Data Manipulation, Visualization, and Statistics: R offers extensive packages for data manipulation (e.g., dplyr), visualization (e.g., ggplot2), and statistical analysis (e.g., stats).
  • Find Packages: There is a vast ecosystem of R packages that allow you to do practically anything with data, from data cleaning to advanced machine learning.

Unique Advantages

  • Flexibility and Customization: R allows users to create custom workflows and tailor their code to their specific needs.
  • Comprehensive Libraries: R has a vast number of libraries that cater to a wide range of tasks, including specialized tools for machine learning, bioinformatics, and geospatial analysis.
  • Reproducible Research: With tools like R Markdown, Shiny, and RStudio, R supports reproducible research and interactive reports.
  • Data Exploration: R provides easy-to-use functions and visualizations that allow you to dig deeper into datasets and explore them efficiently.

Unique Challenges

  • Inconsistent Naming Conventions: R’s inconsistent naming conventions across different packages can confuse beginners, making it hard to select the correct function or method.
  • Complex Methods for Beginners: Some of the methods and functions, especially for handling complex variables or transformations, might be challenging for beginners.
  • Memory Management: When working with large datasets, R’s memory management can become a challenge, as the system might run out of memory during large computations or when handling massive data frames.

Step 3: Work Environments

The R language offers various environments that enhance the ease of executing programs and analyzing data. Below is a quick overview of common R work environments:

1. RStudio

  • RStudio is the most popular Integrated Development Environment (IDE) for R.
  • It provides a graphical interface for easy writing, executing, and navigating through code and results.
  • Features include interactive analysis, integration with graphical tools like ggplot2, and app development via Shiny.
  • It supports R Markdown (.Rmd), allowing users to combine code and text for creating interactive reports.

2. Jupyter Notebook

  • Jupyter Notebook is an interactive open-source environment that supports multiple programming languages, including R (via IRkernel).
  • It is widely used in scientific research and education, as it allows users to write code in cells and include both graphs and explanatory text.
  • It supports multiple environments (Python, R, etc.).

3. R Console

  • The R Console is the primary environment for executing R code, typically accessed via a command prompt or terminal emulator.
  • It is a text-based environment where code is written and executed directly without a graphical interface.
  • Ideal for users who prefer a command-line environment due to its simplicity and speed.

4. R on Cloud Platforms (e.g., RStudio Cloud)

  • Cloud platforms like RStudio Cloud (now named Posit Cloud) offer online development environments for R.
  • These platforms enable users to perform analysis and manage R projects directly from the browser, making it easier to access and share projects within teams.
  • There is no need to install R and RStudio locally, so projects can be accessed from anywhere.

Getting started with RStudio Desktop or RStudio Cloud

    1. Install R:

To download R, go to: https://cran.r-project.org/

    2. Install RStudio:

After installing R, download RStudio from the following link: https://posit.co/download/rstudio-desktop/

NOTE

Choose the version that matches your operating system (Windows, macOS, or Linux) and follow the instructions to install R and RStudio.

After you install R and RStudio, you can open RStudio and use it to work with R.
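One simple way to confirm that the installation works is to type a couple of built-in commands in the RStudio console. The sketch below uses only base R; the exact output depends on your machine and R version, so none is shown here.

```r
# Print the installed R version as a single string
R.version.string

# List the R version, operating system, and currently attached packages
sessionInfo()
```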

Step 4: Identifying the package

  • Packages are units of reproducible R code.
  • Packages are a key part of working with R.
  • We will be using a package called tidyverse.

The tidyverse is actually a collection of individual R packages that share a common design philosophy for data manipulation, exploration, and visualization. Together they help you perform a wide variety of analysis tasks.

The tidyverse contains several integrated packages, such as:

  • ggplot2: For creating visualizations and graphs.
  • dplyr: For data manipulation and analysis.
  • tidyr: For cleaning and reshaping data.
  • readr: For reading data from files.
  • stringr: For string/text manipulation.
  • lubridate: For working with dates and times.
  • tibble: For creating and managing data in a table format.
  • forcats: For handling categorical variables (factors).
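A package has to be installed once and then loaded in each new R session. The sketch below shows one possible way to do this for tidyverse; the install line is left commented out because it needs an internet connection, and the variable name has_tidyverse is only illustrative.

```r
# Install once per machine (requires internet access); uncomment to run
# install.packages("tidyverse")

# Check whether tidyverse is available before trying to load it
has_tidyverse <- requireNamespace("tidyverse", quietly = TRUE)

if (has_tidyverse) {
  library(tidyverse)  # attaches ggplot2, dplyr, tidyr, readr, and the others
} else {
  message("tidyverse is not installed; run install.packages(\"tidyverse\") first.")
}
```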

General Guidelines

Review Time: 45 minutes

Before you start implementing this lab, it is a good idea to dedicate 45 minutes to reviewing the concepts that you will address in this lab. This will help you deepen your understanding and better organize your time while working.



Lab 2

What will you learn from Lab 2?

  • Understand basic programming concepts
    • Vector
    • List
    • Data Frame
    • File
    • Matrix
    • Operators and Conditionals

In programming, a data structure is a format for organizing and storing data.

Step 1: Vector:

A vector is a group of data elements stored in a sequence in R. In an atomic vector, all elements have the same type: you cannot have an atomic vector that contains both logicals and numerics.

  • There are two types of vectors:
    • atomic vectors
      • Logical: TRUE/FALSE values. EX: TRUE
      • Integer: positive and negative whole values. EX: 7
      • Double: decimal values. EX: 37.405
      • Character: string/character values. EX: "Coding"
    • lists

One way to create a vector is by using the c() function.

Vector creation syntax: c(x, y, z, …).

# Example of Logical Vector
logical_vector <- c(TRUE, FALSE, TRUE, TRUE)
logical_vector
## [1]  TRUE FALSE  TRUE  TRUE
# Example of Integer Vector
integer_vector <- c(1L, 2L, 3L, 4L)  # Note the "L" for integer type
integer_vector
## [1] 1 2 3 4
# Example of a Double vector
double_vector <- c(3.14159, 2.71828, 1.61803)
double_vector
## [1] 3.14159 2.71828 1.61803
# Example of Character Vector
character_vector <- c("apple", "banana", "cherry", "date")
character_vector
## [1] "apple"  "banana" "cherry" "date"
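The rule that an atomic vector holds only one type can be checked directly. In the short sketch below (variable names are illustrative), c() does not raise an error when given mixed values; it converts every element to one common type instead.

```r
# Mixing logical and numeric values: everything becomes double
mixed_numbers <- c(TRUE, 2, 3.5)
typeof(mixed_numbers)   # "double"
mixed_numbers           # TRUE was converted to 1

# Mixing numbers, text, and logicals: everything becomes character
mixed_text <- c(1, "a", TRUE)
typeof(mixed_text)      # "character"
mixed_text
```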

Practice some functions with the vector

# Checking the type of 'character_vector'
typeof(character_vector)
## [1] "character"
# Getting the length of 'character_vector'
length(character_vector)
## [1] 4
# Checking if 'character_vector' is an integer vector
is.integer(character_vector)
## [1] FALSE

All types of vectors can be named.

z <- c('ali', 'hussan', 'mohamed')
# Assigning names to the elements of the vector 'z'
names(z) <- c('ali', 'hussan', 'mohamed')

# Displaying the vector 'z' with the assigned names
z
##       ali    hussan   mohamed 
##     "ali"  "hussan" "mohamed"
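Names are useful because they let you pick elements by label instead of by position. The sketch below uses a made-up numeric vector of scores to show both styles of access.

```r
# Hypothetical scores, named at creation time
scores <- c(ali = 90, hussan = 85, mohamed = 78)

# Access by name and by position
scores["hussan"]              # element named 'hussan'
scores[1]                     # first element
scores[c("ali", "mohamed")]   # several elements at once

# Inspect or drop the names
names(scores)
unname(scores["ali"])         # the bare value without its name
```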

Step 2: List:

Lists allow us to store values of different types, as opposed to atomic vectors, which store a single data type. Syntax: list(x, y, z, …).

# Creating a list containing numbers, characters, and logical values
list(1,2,"clean",23.45,TRUE,FALSE)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] "clean"
## 
## [[4]]
## [1] 23.45
## 
## [[5]]
## [1] TRUE
## 
## [[6]]
## [1] FALSE
# Creating a list inside another list
list(list(1,23,34.4,5))
## [[1]]
## [[1]][[1]]
## [1] 1
## 
## [[1]][[2]]
## [1] 23
## 
## [[1]][[3]]
## [1] 34.4
## 
## [[1]][[4]]
## [1] 5
# Creating a list with mixed elements and inspecting its structure
y<-list('data analysis',TRUE,123.68,'data science')
str(y)
## List of 4
##  $ : chr "data analysis"
##  $ : logi TRUE
##  $ : num 124
##  $ : chr "data science"
# Creating a named list
list_1<-list("R" = 1, "python" = 2,"css"=3 )
# Accessing the value associated with 'css'
list_1$"css"
## [1] 3
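Besides $, list elements can be reached with [[ ]] and [ ], and the two behave differently. The sketch below uses a made-up list named course: [[ ]] returns the element itself, while [ ] returns a smaller list.

```r
# Hypothetical named list with mixed types
course <- list(name = "R basics", hours = 20, online = TRUE)

# [[ ]] extracts the element itself
hours_value <- course[["hours"]]
class(hours_value)      # "numeric"

# [ ] keeps the list wrapper
hours_sublist <- course["hours"]
class(hours_sublist)    # "list"

# Add a new element by assigning to a new name
course$level <- "beginner"
length(course)          # now 4 elements
```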

For another resource on vectors and lists, see: https://r4ds.had.co.nz/vectors.html#vectors

Step 3: Data Frame:

A data frame is a data structure used to store data in the form of a table consisting of rows and columns. It is a special type of list where each column can contain a different data type (numbers, text, dates, etc.).

To create one, use the data.frame() function.

# Creating a Data Frame containing information about analysis tools
tools_df <- data.frame(
  Tool = c("R", "Python", "Tableau", "Excel", "SAS", "Power BI"), 
  Type = c("Programming", "Programming", "Visualization", "Spreadsheet", "Statistical", "Visualization"),
  Main_Use = c("Statistical Computing", "Data Science & Machine Learning", "Business Intelligence", "Data Analysis", "Advanced Analytics", "Business Intelligence"),
  Supported_Data_Types = c("Data Frames, Vectors, Matrices", "Data Frames, Arrays, Lists", "Data Sets, Charts", "Spreadsheets, Charts", "Data Sets, Tables", "Data Sets, Charts"),
  Popularity = c("High", "Very High", "High", "High", "Medium", "High")
)

head(tools_df)
##       Tool          Type                        Main_Use
## 1        R   Programming           Statistical Computing
## 2   Python   Programming Data Science & Machine Learning
## 3  Tableau Visualization           Business Intelligence
## 4    Excel   Spreadsheet                   Data Analysis
## 5      SAS   Statistical              Advanced Analytics
## 6 Power BI Visualization           Business Intelligence
##             Supported_Data_Types Popularity
## 1 Data Frames, Vectors, Matrices       High
## 2     Data Frames, Arrays, Lists  Very High
## 3              Data Sets, Charts       High
## 4           Spreadsheets, Charts       High
## 5              Data Sets, Tables     Medium
## 6              Data Sets, Charts       High
str(tools_df)
## 'data.frame':    6 obs. of  5 variables:
##  $ Tool                : chr  "R" "Python" "Tableau" "Excel" ...
##  $ Type                : chr  "Programming" "Programming" "Visualization" "Spreadsheet" ...
##  $ Main_Use            : chr  "Statistical Computing" "Data Science & Machine Learning" "Business Intelligence" "Data Analysis" ...
##  $ Supported_Data_Types: chr  "Data Frames, Vectors, Matrices" "Data Frames, Arrays, Lists" "Data Sets, Charts" "Spreadsheets, Charts" ...
##  $ Popularity          : chr  "High" "Very High" "High" "High" ...
colnames(tools_df)
## [1] "Tool"                 "Type"                 "Main_Use"            
## [4] "Supported_Data_Types" "Popularity"
dim(tools_df)
## [1] 6 5
tail(tools_df)
##       Tool          Type                        Main_Use
## 1        R   Programming           Statistical Computing
## 2   Python   Programming Data Science & Machine Learning
## 3  Tableau Visualization           Business Intelligence
## 4    Excel   Spreadsheet                   Data Analysis
## 5      SAS   Statistical              Advanced Analytics
## 6 Power BI Visualization           Business Intelligence
##             Supported_Data_Types Popularity
## 1 Data Frames, Vectors, Matrices       High
## 2     Data Frames, Arrays, Lists  Very High
## 3              Data Sets, Charts       High
## 4           Spreadsheets, Charts       High
## 5              Data Sets, Tables     Medium
## 6              Data Sets, Charts       High
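After inspecting a data frame, the next common tasks are selecting columns, filtering rows, and adding a column. The sketch below does this with base R on a small made-up data frame (scores_df and its values are hypothetical).

```r
# Hypothetical data frame for illustration
scores_df <- data.frame(
  student = c("Ali", "Sara", "Omar"),
  score = c(82, 95, 67),
  stringsAsFactors = FALSE
)

# Select one column as a vector
scores_df$score

# Select the second row (all columns)
scores_df[2, ]

# Keep only the rows where score is above 80
high_scores <- scores_df[scores_df$score > 80, ]
high_scores

# Add a new logical column computed from an existing one
scores_df$passed <- scores_df$score >= 70
scores_df
```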

Step 4: File:

  • You can create a new file by using the file.create() function.

    • EX: file.create("new_file.txt")
  • You can use the file.copy() function to copy a file from one place to another.

    • EX: file.copy("new_file.txt", "copied_file.txt")
  • Copy a file to a specific folder:

    • EX: file.copy("new_file.txt", "C:/Users/Username/Documents/copied_file.txt")
  • You can use the writeLines() or write() function to write content to a text file.

    • EX: writeLines(c("Hello, world!", "This is a new file."), "new_file.txt")
  • To read from a text file, you can use the readLines() function to read content line by line.

    • EX: content <- readLines("new_file.txt")
  • Read a CSV file:

    • EX: data <- read.csv("data.csv")
  • Write a CSV file:

    • EX: write.csv(data, "output_data.csv", row.names = FALSE)
  • You can delete a file by using the file.remove() function.

    • EX: file.remove("new_file.txt")
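The file functions above can be combined into one short round trip. The sketch below works inside R's temporary directory (tempdir()) so that it does not touch your own folders; the file names and the small data frame are made up for illustration.

```r
# Build paths inside the session's temporary directory
txt_path  <- file.path(tempdir(), "new_file.txt")
copy_path <- file.path(tempdir(), "copied_file.txt")
csv_path  <- file.path(tempdir(), "output_data.csv")

# Create a text file, write two lines to it, and read them back
file.create(txt_path)
writeLines(c("Hello, world!", "This is a new file."), txt_path)
content <- readLines(txt_path)
content

# Copy the text file
file.copy(txt_path, copy_path, overwrite = TRUE)

# Write a small hypothetical data frame to CSV and read it back
tools <- data.frame(tool = c("R", "Python"), rating = c(5L, 4L))
write.csv(tools, csv_path, row.names = FALSE)
data <- read.csv(csv_path)
data

# Clean up and confirm the files are gone
file.remove(txt_path, copy_path, csv_path)
files_left <- file.exists(c(txt_path, copy_path, csv_path))
```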

Step 5: Matrix:

A matrix in R is a two-dimensional array whose elements are arranged in rows and columns. All elements of a matrix have the same data type. A matrix is created using the matrix() function.

  • Creating a Matrix

Syntax: matrix(data, nrow, ncol, byrow = FALSE, dimnames = NULL)

  • data: The elements to populate the matrix. It is typically a vector.
  • nrow: Number of rows.
  • ncol: Number of columns.
  • byrow: Logical value indicating whether the matrix should be filled by row (TRUE) or by column (FALSE).
  • dimnames: Optional names for the rows and columns.

EX:

# Create a matrix of analysis tools, with values representing how well each tool supports different functions
analysis_tools_matrix <- matrix(c(5, 4, 3, 5, 5, 4), nrow = 3, ncol = 2)

# Name the rows and columns
rownames(analysis_tools_matrix) <- c("R", "Python", "Tableau")
colnames(analysis_tools_matrix) <- c("Data Analysis", "Data Visualization")

# Print the matrix
print(analysis_tools_matrix)
##         Data Analysis Data Visualization
## R                   5                  5
## Python              4                  5
## Tableau             3                  4
# Accessing the element in the first row and second column
element <- analysis_tools_matrix[1, 2]
print(element)
## [1] 5
# Accessing the element in the second row and first column
element <- analysis_tools_matrix[2, 1]
print(element)
## [1] 4
# Accessing the entire first column (Data Analysis)
column_data <- analysis_tools_matrix[, 1]
print(column_data)
##       R  Python Tableau 
##       5       4       3
# Accessing the entire third row (Tableau)
row_data <- analysis_tools_matrix[3, ]
print(row_data)
##      Data Analysis Data Visualization 
##                  3                  4
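The byrow and dimnames arguments are easier to understand side by side. The sketch below fills the same values by column and by row, then builds a small matrix with names supplied at creation time (all object names are illustrative).

```r
# Same data, filled column by column (the default)
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_col
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# Same data, filled row by row
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

# Row and column names given through dimnames
named <- matrix(1:4, nrow = 2,
                dimnames = list(c("r1", "r2"), c("c1", "c2")))
named["r2", "c1"]
## [1] 2
```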

Step 6: Operators and Conditionals:

An operator is a symbol that identifies the type of operation or calculation to be performed in a formula. Logical operators return a logical data type such as TRUE or FALSE.

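  • There are three primary types of logical operators:
    • AND (represented as & or && in R; & works element by element on vectors, while && compares single values)
    • OR (represented as | or || in R; | works element by element on vectors, while || compares single values)
    • NOT (!)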
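The three logical operators can be tried on a single value and on whole vectors. In the sketch below, x and the result names are illustrative.

```r
x <- 10

# AND: TRUE only when both sides are TRUE
both <- x > 5 & x < 20
both
## [1] TRUE

# OR: TRUE when at least one side is TRUE
either <- x < 5 | x > 8
either
## [1] TRUE

# NOT: flips TRUE to FALSE and FALSE to TRUE
negated <- !(x > 5)
negated
## [1] FALSE

# & compares vectors element by element
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
elementwise
## [1]  TRUE FALSE FALSE

# && is meant for single TRUE/FALSE values, as used inside if ()
single <- (x > 5) && (x %% 2 == 0)
single
## [1] TRUE
```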

A conditional statement is a declaration that if a certain condition holds, then a certain event must take place. For example: if the temperature is above freezing, then I will go outside for a walk.

  • In R, conditional statements are written using three related statements:
    • if ()
    • else if ()
    • else
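The temperature example can be written as a small function. This is only a sketch: the function name walk_decision, the 25-degree threshold, and the messages are made up for illustration.

```r
# Decide what to do based on the temperature in degrees Celsius
walk_decision <- function(temperature_c) {
  if (temperature_c > 25) {
    "Hot: take a short walk."
  } else if (temperature_c > 0) {
    "Above freezing: go outside for a walk."
  } else {
    "Freezing or below: stay inside."
  }
}

walk_decision(30)
## [1] "Hot: take a short walk."
walk_decision(10)
## [1] "Above freezing: go outside for a walk."
walk_decision(-5)
## [1] "Freezing or below: stay inside."
```

Note that else must appear on the same line as the closing brace of the preceding block; otherwise R treats the if statement as already finished when the code is run line by line.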

You have to practice a lot

You have made good progress, keep developing your creative skills!

We will dive deeper into the analysis process in the upcoming labs. Stay tuned!