Review Time: 20 minutes
Before you start implementing this lab, it is a good idea to dedicate 20 minutes to reviewing the concepts that you will address in this lab. This will help you deepen your understanding and better organize your time while working.
Lab 1
The Feather format is an efficient, fast file format
that facilitates data sharing between different programming languages.
You can use it to dig deeper into data, for example by using functions like face().
R provides packages for data manipulation (e.g., dplyr), visualization (e.g., ggplot2), and statistical analysis (e.g., stats).
The R language offers various environments that make it easier to run programs and analyze data. Below is a quick overview of common R work environments:
To download R click here: [https://cran.r-project.org/]
After installing R, download RStudio from the following link: [https://posit.co/download/rstudio-desktop/]
Choose the version that matches your operating system (Windows, macOS, or Linux) and follow the instructions to install R or RStudio.
After you install R and RStudio, you can open RStudio and use it to work with R.
The tidyverse is a collection of individual R packages that share a common design philosophy for data manipulation, exploration, and visualization. Together they can help you perform a wide variety of analysis tasks.
The tidyverse contains several integrated packages, such as dplyr and ggplot2.
Review Time: 45 minutes
Before you start implementing this lab, it is a good idea to dedicate 45 minutes to reviewing the concepts that you will address in this lab. This will help you deepen your understanding and better organize your time while working.
Lab 2
Vectors are created with the syntax c(x, y, z, …).
# Example of Logical Vector
logical_vector <- c(TRUE, FALSE, TRUE, TRUE)
logical_vector
## [1] TRUE FALSE TRUE TRUE
# Example of Integer Vector
integer_vector <- c(1L, 2L, 3L, 4L) # Note the "L" for integer type
integer_vector
## [1] 1 2 3 4
# Example of a Double vector
double_vector <- c(3.14159, 2.71828, 1.61803)
double_vector
## [1] 3.14159 2.71828 1.61803
# Example of Character Vector
character_vector <- c("apple", "banana", "cherry", "date")
character_vector
## [1] "apple" "banana" "cherry" "date"
# Checking the type of 'character_vector'
typeof(character_vector)
## [1] "character"
# Getting the length of 'character_vector'
length(character_vector)
## [1] 4
# Checking if 'character_vector' is an integer vector
is.integer(character_vector)
## [1] FALSE
All types of vectors can be named.
z <- c('ali', 'hussan', 'mohamed')
# Assigning names to the elements of the vector 'z'
names(z) <- c('ali', 'hussan', 'mohamed')
# Displaying the vector 'z' with the assigned names
z
## ali hussan mohamed
## "ali" "hussan" "mohamed"
Lists allow us to store values of different types, as opposed to vectors, which store a single data type. Syntax: list(x, y, z, …).
# Creating a list containing numbers, characters, and logical values
list(1,2,"clean",23.45,TRUE,FALSE)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] "clean"
##
## [[4]]
## [1] 23.45
##
## [[5]]
## [1] TRUE
##
## [[6]]
## [1] FALSE
# Creating a list inside another list
list(list(1,23,34.4,5))
## [[1]]
## [[1]][[1]]
## [1] 1
##
## [[1]][[2]]
## [1] 23
##
## [[1]][[3]]
## [1] 34.4
##
## [[1]][[4]]
## [1] 5
# Creating a list with mixed elements and inspecting its structure
y<-list('data analysis',TRUE,123.68,'data science')
str(y)
## List of 4
## $ : chr "data analysis"
## $ : logi TRUE
## $ : num 124
## $ : chr "data science"
# Creating a named list
list_1<-list("R" = 1, "python" = 2,"css"=3 )
# Accessing the value associated with 'css'
list_1$"css"
## [1] 3
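To make the contrast between vectors and lists concrete, here is a minimal sketch that passes the same mixed values to c() and to list(). The names mixed_vector and mixed_list are illustrative and do not come from the lab.

```r
# A vector holds a single type, so mixed inputs are coerced to a common type.
mixed_vector <- c(1, "clean", TRUE)
typeof(mixed_vector)          # "character": every element became text

# A list keeps each element's original type.
mixed_list <- list(1, "clean", TRUE)
element_types <- sapply(mixed_list, typeof)
element_types                 # "double" "character" "logical"
```

This coercion is why a list is the safer choice when the values you want to store do not share a type.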
For another resource on vectors and lists, click here: [https://r4ds.had.co.nz/vectors.html#vectors]
A Data Frame is a data structure used to store data in the form of a table consisting of rows and columns. It is a special type of list where each column can contain a different data type (numbers, text, dates, etc.).
To create one, use the data.frame() function.
# Creating a Data Frame containing information about analysis tools
tools_df <- data.frame(
  Tool = c("R", "Python", "Tableau", "Excel", "SAS", "Power BI"),
  Type = c("Programming", "Programming", "Visualization", "Spreadsheet", "Statistical", "Visualization"),
  Main_Use = c("Statistical Computing", "Data Science & Machine Learning", "Business Intelligence", "Data Analysis", "Advanced Analytics", "Business Intelligence"),
  Supported_Data_Types = c("Data Frames, Vectors, Matrices", "Data Frames, Arrays, Lists", "Data Sets, Charts", "Spreadsheets, Charts", "Data Sets, Tables", "Data Sets, Charts"),
  Popularity = c("High", "Very High", "High", "High", "Medium", "High")
)
head(tools_df)
## Tool Type Main_Use
## 1 R Programming Statistical Computing
## 2 Python Programming Data Science & Machine Learning
## 3 Tableau Visualization Business Intelligence
## 4 Excel Spreadsheet Data Analysis
## 5 SAS Statistical Advanced Analytics
## 6 Power BI Visualization Business Intelligence
## Supported_Data_Types Popularity
## 1 Data Frames, Vectors, Matrices High
## 2 Data Frames, Arrays, Lists Very High
## 3 Data Sets, Charts High
## 4 Spreadsheets, Charts High
## 5 Data Sets, Tables Medium
## 6 Data Sets, Charts High
str(tools_df)
## 'data.frame': 6 obs. of 5 variables:
## $ Tool : chr "R" "Python" "Tableau" "Excel" ...
## $ Type : chr "Programming" "Programming" "Visualization" "Spreadsheet" ...
## $ Main_Use : chr "Statistical Computing" "Data Science & Machine Learning" "Business Intelligence" "Data Analysis" ...
## $ Supported_Data_Types: chr "Data Frames, Vectors, Matrices" "Data Frames, Arrays, Lists" "Data Sets, Charts" "Spreadsheets, Charts" ...
## $ Popularity : chr "High" "Very High" "High" "High" ...
colnames(tools_df)
## [1] "Tool" "Type" "Main_Use"
## [4] "Supported_Data_Types" "Popularity"
dim(tools_df)
## [1] 6 5
tail(tools_df)
## Tool Type Main_Use
## 1 R Programming Statistical Computing
## 2 Python Programming Data Science & Machine Learning
## 3 Tableau Visualization Business Intelligence
## 4 Excel Spreadsheet Data Analysis
## 5 SAS Statistical Advanced Analytics
## 6 Power BI Visualization Business Intelligence
## Supported_Data_Types Popularity
## 1 Data Frames, Vectors, Matrices High
## 2 Data Frames, Arrays, Lists Very High
## 3 Data Sets, Charts High
## 4 Spreadsheets, Charts High
## 5 Data Sets, Tables Medium
## 6 Data Sets, Charts High
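The data frame above contains only text columns, so the following sketch illustrates the point that a data frame is a list whose columns can each hold a different type. The name scores_df and the column Is_Programming are assumptions made for this illustration; the values reuse the tool types and ratings that appear elsewhere in this lab.

```r
# Three columns, three different types: character, numeric, logical.
# stringsAsFactors = FALSE keeps text as character on older R versions too.
scores_df <- data.frame(
  Tool = c("R", "Python", "Tableau"),
  Data_Analysis = c(5, 4, 3),
  Is_Programming = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# A data frame is stored as a list of equal-length columns.
is.list(scores_df)                        # TRUE
column_classes <- sapply(scores_df, class)
column_classes                            # one class per column

# Columns can be extracted like list elements.
scores_df$Data_Analysis
scores_df[["Tool"]]
```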
You can create a new file by using the file.create() function. Example: file.create("new_file.txt")
You can use the file.copy() function to copy a file from one place to another, for example into a specific folder.
You can use the writeLines() or write() function to write content to a text file.
To read from a text file, you can use the readLines() function to read content line by line.
To read a CSV file, use the read.csv() function. Example: data <- read.csv("data.csv")
To write a CSV file, use the write.csv() function.
You can delete a file by using the file.remove() function.
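The file operations above can be sketched end to end as follows. This is a minimal example under stated assumptions: it works inside R's temporary directory so no real files are touched, and the folder names lab_files and backup, the sample text, and the small tools data frame are all hypothetical.

```r
# Work inside a temporary folder (hypothetical folder name).
work_dir <- file.path(tempdir(), "lab_files")
dir.create(work_dir, showWarnings = FALSE)
txt_path <- file.path(work_dir, "new_file.txt")

# Create an empty file, then write two lines of text to it.
file.create(txt_path)
writeLines(c("first line", "second line"), txt_path)

# Read the content back; each line becomes one element of the vector.
lines_read <- readLines(txt_path)

# Copy the file into a specific folder (hypothetical folder name).
backup_dir <- file.path(work_dir, "backup")
dir.create(backup_dir, showWarnings = FALSE)
file.copy(txt_path, backup_dir, overwrite = TRUE)

# Write a small data frame to a CSV file and read it back.
csv_path <- file.path(work_dir, "data.csv")
tools <- data.frame(Tool = c("R", "Python"),
                    Popularity = c("High", "Very High"))
write.csv(tools, csv_path, row.names = FALSE)
data <- read.csv(csv_path)

# Delete the original text file; the copy in backup_dir remains.
file.remove(txt_path)
```

Setting row.names = FALSE in write.csv() avoids writing an extra column of row numbers, which would otherwise come back as a column named X when the file is read again.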
A matrix in R is a two-dimensional array where elements are arranged in rows and columns. It is created using the matrix() function.
Syntax: matrix(data, nrow, ncol, byrow = FALSE, dimnames = NULL)
data: The elements to populate the matrix. It is typically a vector.
nrow: Number of rows.
ncol: Number of columns.
byrow: Logical value indicating whether the matrix should be filled by row (TRUE) or by column (FALSE).
dimnames: Optional list of two vectors giving the names for the rows and columns.
Example:
# Create a matrix of analysis tools, with values representing how well each tool supports different functions
analysis_tools_matrix <- matrix(c(5, 4, 3, 5, 5, 4), nrow = 3, ncol = 2)
# Name rows and columns
rownames(analysis_tools_matrix) <- c("R", "Python", "Tableau")
colnames(analysis_tools_matrix) <- c("Data Analysis", "Data Visualization")
# Print the matrix
print(analysis_tools_matrix)
## Data Analysis Data Visualization
## R 5 5
## Python 4 5
## Tableau 3 4
# Accessing the element in the first row and second column
element <- analysis_tools_matrix[1, 2]
print(element)
## [1] 5
# Accessing the element in the second row and first column
element <- analysis_tools_matrix[2, 1]
print(element)
## [1] 4
# Accessing the entire first column (Data Analysis)
column_data <- analysis_tools_matrix[, 1]
print(column_data)
## R Python Tableau
## 5 4 3
# Accessing the entire third row (Tableau)
row_data <- analysis_tools_matrix[3, ]
print(row_data)
## Data Analysis Data Visualization
## 3 4
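The example above relies on the default byrow = FALSE and sets names after creation. As a complementary sketch, the code below uses the same six values to show how byrow changes the fill order and how dimnames can set the names in the same call. The variable names values, by_column, by_row, and named are illustrative.

```r
values <- c(5, 4, 3, 5, 5, 4)

# Default (byrow = FALSE): values fill the first column, then the second.
by_column <- matrix(values, nrow = 3, ncol = 2)

# byrow = TRUE: values fill the first row, then the second, and so on.
by_row <- matrix(values, nrow = 3, ncol = 2, byrow = TRUE)

# dimnames takes a list: row names first, then column names.
named <- matrix(
  values, nrow = 3, ncol = 2,
  dimnames = list(c("R", "Python", "Tableau"),
                  c("Data Analysis", "Data Visualization"))
)

by_column[1, ]                            # 5 5
by_row[1, ]                               # 5 4
named["Python", "Data Visualization"]     # 5
```

Once a matrix has names, elements can be selected by name as well as by position, which is often easier to read than numeric indices.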
An operator is a symbol that identifies the type of operation or calculation to be performed in a formula. Logical operators return a logical data type such as TRUE or FALSE.
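As a brief illustration of operators in R, the sketch below applies arithmetic, comparison, and logical operators to one number. The variable temperature and its value are hypothetical, as are the names of the result variables.

```r
temperature <- 5   # hypothetical value in degrees Celsius

# Arithmetic operators return numbers.
doubled <- temperature * 2                         # 10

# Comparison operators return TRUE or FALSE.
above_freezing <- temperature > 0                  # TRUE
equals_ten <- temperature == 10                    # FALSE

# Logical operators combine or negate logical values.
is_mild <- temperature > 0 & temperature < 30      # TRUE  (AND)
is_extreme <- temperature < 0 | temperature > 30   # FALSE (OR)
not_above <- !above_freezing                       # FALSE (NOT)
```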
A conditional statement is a declaration that if a certain condition holds, then a certain event must take place. For example, if the temperature is above freezing, then I will go outside for a walk.
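The walking example can be written in R with if and else, roughly as follows. This is a minimal sketch: the variable names, the sample temperature, and the "stay inside" alternative are assumptions added for illustration, with 0 degrees Celsius taken as freezing.

```r
temperature <- 5   # hypothetical value in degrees Celsius; 0 is freezing

# The condition in parentheses must evaluate to a single TRUE or FALSE.
if (temperature > 0) {
  plan <- "go outside for a walk"
} else {
  plan <- "stay inside"   # assumed alternative, not stated in the text
}

plan   # "go outside for a walk"
```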
You have made good progress, keep developing your creative skills!
We will dive deeper into the analysis process in the upcoming labs. Stay tuned!