Introduction to R


Course title

Overview

About the course

R is one of the most widely used programming languages for statistics and data analysis. It is built around vectors and is extended by a large collection of add-on packages. It is in high demand among data scientists, analysts, and statisticians alike. This Introduction to R course covers the basics of this open-source language, including vectors, factors, lists, and data frames. You will gain useful coding skills and be ready to start your own data analysis in R. R was created in 1991 by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland (https://bookdown.org/rdpeng/rprogdatascience/history-and-overview-of-r.html); its name comes from the first letter of the first names of these two developers.

Course structure

The course has six lessons. It starts with basic operations such as installing R and RStudio and using the console as a calculator (Lesson-1), followed by the basic data types in R (Lesson-2). Then we move on to basic data structures (Lesson-3) and to indexing, which extracts values of a variable in a dataset (Lesson-4). Next, we learn vector algebra and matrix operations in R (Lesson-5). Finally, we learn data exploration, data cleaning, and plotting with simple R code (Lesson-6).

Course learning outcomes

Upon completion of this Introduction to R course, learners will be able to use the R basics for their own data analysis. These sought-after skills can help you progress in your career and set you up for further self-learning.

Facilitator

Dr Proloy Barua, Assistant Scientist, BRAC James P Grant School of Public Health, BRAC University

Getting help in R

R has built-in facilities for searching help and documentation. The # (hash) sign starts a comment: R ignores everything after # on a line, so you can use it to add notes to your code, as in the examples below.

help.search("mean") #search the help pages for a topic
?mean #open the help page of the mean() function
find("mean") #list the attached packages that contain an object named mean

Some basic keyboard shortcuts and commands in R

Ctrl+Enter #run the current line or selection from the script editor
Ctrl+L #clear the console window
Ctrl+A #move the cursor to the start of the current console line
Ctrl+E #move the cursor to the end of the current console line
Ctrl+U #erase the current console line
Ctrl+C #copy
Ctrl+V #paste
rm(list=ls()) # remove all objects from the workspace
getwd() # get the working directory
setwd("C:/projects/research") # set the working directory (example path)

Free Online Resources

Installing R: https://github.com/genomicsclass/windows#installing-r
RStudio: https://www.rstudio.com/products/rstudio/download/
RStudio Cheat Sheets: https://rstudio.cloud/learn/cheat-sheets
Introduction to R by Robert J. Hijmans: https://rspatial.org/intr/IntroductiontoR.pdf
An Introduction to R by W. N. Venables, D. M. Smith and the R Core Team: https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
R for Beginners by Emmanuel Paradis: http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
R Tutorial by Kelly Black: http://www.cyclismo.org/tutorial/R/
A brief overview by Ross Ihaka (one of the originators of R): https://www.stat.auckland.ac.nz/~ihaka/120/Notes/ch02.pdf
Information Visualization course by Ross Ihaka: https://www.stat.auckland.ac.nz/~ihaka/120/notes.html
A Beginner's Guide to R by Zuur, Ieno and Meesters: http://www.springer.com/us/book/9780387938363
R in a Nutshell by Joseph Adler: http://shop.oreilly.com/product/0636920022008.do
The Art of R Programming by Norman Matloff: http://www.nostarch.com/artofr.htm
Introduction to R by DataCamp: https://www.datacamp.com/courses/free-introduction-to-r
Good material on rstatistics.net: http://rstatistics.net/
Google Developers video playlist: http://www.youtube.com/playlist?list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP
Advanced R by Hadley Wickham: http://adv-r.had.co.nz/
Stack Overflow: https://stackoverflow.com
R-bloggers: https://www.r-bloggers.com
Rseek, an R-focused search engine powered by Google: https://rseek.org/
Stack Exchange: https://stackexchange.com

Course Outline

R Course Outline

Course Participants

R Course Participants

PowerPoint Slides

R Markdown File

The R Markdown file is available at https://rpubs.com/proloy/983358

R Scripts

================================== Lesson-2: Basic data types in R ==================================

The following basic data types are all stored as vectors. A vector is a one-dimensional structure whose elements all share the same type.

- Numeric values
- Integer values
- Character values
- Logical values
- Factors
- Missing values
- Time

Numeric values

a <- 7 # one element

show(a)

print(a)

a

class(a)

length(a) # to see how many elements or observations in the vector

rm(a) # remove the object a from the workspace; running show(a) afterwards gives an error because a no longer exists

Integer values

b <- 7L

b

class(b)
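To make the difference between numeric and integer values concrete, here is a minimal sketch (the object names are illustrative) comparing 7 and 7L with class(), typeof() and is.integer(). The comments show what base R returns.

```r
a <- 7    # a plain number is stored as a double-precision numeric value
b <- 7L   # the L suffix asks R for an integer

class(a)        # "numeric"
class(b)        # "integer"
typeof(a)       # "double": the underlying storage type
typeof(b)       # "integer"
is.integer(a)   # FALSE, even though 7 is a whole number

a == b          # TRUE: the values compare equal
identical(a, b) # FALSE: the storage types differ

class(5L / 2L)  # "numeric": dividing two integers returns a double (2.5)
```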

Character values

x <- "Proloy"

x

class(x)

Logical values

x <- FALSE

y<- TRUE

x

y

class(x)

class(y)

Factors

countries <- c('Bangladesh', 'Bangladesh', 'India', 'Afghanistan', 'India')

countries

class(countries)

f1 <- as.factor(countries) # converting character values into factor values

f1

class(f1)
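A factor stores each value as an integer code plus a set of labels called levels. This sketch reuses the countries vector to inspect those levels and codes; the explicit level order in f2 is an illustrative choice, not something R requires.

```r
countries <- c('Bangladesh', 'Bangladesh', 'India', 'Afghanistan', 'India')
f1 <- as.factor(countries)

levels(f1)      # "Afghanistan" "Bangladesh" "India" (sorted alphabetically)
nlevels(f1)     # 3
as.integer(f1)  # 2 2 3 1 3: the integer code of each value's level
table(f1)       # how many times each level occurs

# Levels can be set explicitly, e.g. to control their order in tables and plots
f2 <- factor(countries, levels = c('India', 'Bangladesh', 'Afghanistan'))
levels(f2)
```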

Missing values

m <- c(2, NA, 5, 2, NA, 2) # NA ("Not Available") marks a missing value (shown as . in Stata or SAS)

is.na(m) # To check NA or missing values

class(m)

which(is.na(m)) # Get positions of NA

n <- c(5, 9, NaN, 3, 8, NA, NaN) # NaN ("Not a Number") is the result of an undefined calculation, e.g. 0 / 0

is.nan(n) # To check NaN values

class(n)

which(is.nan(n)) # Get positions of NaN
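The difference between NA and NaN, and the common na.rm argument, can be seen in this minimal sketch reusing the m and n vectors from above:

```r
m <- c(2, NA, 5, 2, NA, 2)
n <- c(5, 9, NaN, 3, 8, NA, NaN)

is.na(n)                 # TRUE for both NA and NaN
is.nan(n)                # TRUE only for NaN
sum(is.na(m))            # 2: counting the missing values
mean(m)                  # NA: any NA makes the result missing
mean(m, na.rm = TRUE)    # 2.75: NA values are dropped before averaging
0 / 0                    # NaN
```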

Time

d<- Sys.Date()

class(d)
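Dates behave like numbers counted in days, so they support arithmetic. A minimal sketch with two arbitrary example dates (as.Date() expects the year-month-day format by default):

```r
d1 <- as.Date("2022-12-01")   # convert text in year-month-day format to a Date
d2 <- as.Date("2022-12-25")

class(d1)                 # "Date"
d2 - d1                   # Time difference of 24 days
as.numeric(d2 - d1)       # 24
d1 + 7                    # "2022-12-08": adding a number adds days
format(d2, "%d/%m/%Y")    # "25/12/2022": print the date in another format
seq(d1, by = "week", length.out = 3)   # three dates, one week apart
```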

================================ Lesson-3: Basic data structures ================================

In the previous lesson we learned about the one-dimensional data structure, the vector. In this lesson, we will learn about data structures that combine basic values or vectors: matrices (two-dimensional), lists, and data frames.

Matrix

A matrix is a two-dimensional rectangular arrangement of values. The following code creates a matrix with two rows and three columns:

m <- matrix(ncol=3, nrow=2)

Because no data were supplied, all values in the matrix above are missing (NA). Let's make a matrix with the values 1 to 6:

m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = TRUE) # byrow = TRUE fills the matrix row by row; arguments such as ncol, nrow and byrow are the information passed to a function

m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = FALSE) # By default elements are arranged sequentially by column.

t(m) # transpose with t(): rows become columns and columns become rows

A matrix can store only a single data type. If you mix character and numeric values, all values are converted (coerced) to character values, because any number can be written as text but not every piece of text can be turned into a number.

vchar <- c("a", "b")

class(vchar)

vnumb <- c(1,2)

class(vnumb)

matrix(c(vchar,vnumb), ncol=2, nrow=2, byrow = FALSE)
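The coercion rule can be checked directly. This sketch reuses vchar and vnumb, stores the result under the illustrative name mixed, and also shows R's general order of coercion (logical, then integer, then double, then character) with c():

```r
vchar <- c("a", "b")
vnumb <- c(1, 2)

mixed <- matrix(c(vchar, vnumb), ncol = 2, nrow = 2, byrow = FALSE)
typeof(mixed)            # "character": the numbers became the text "1" and "2"
mixed[1, 2]              # "1"

# The same coercion happens in c(): logical < integer < double < character
typeof(c(TRUE, 2L))      # "integer"
typeof(c(TRUE, 2L, 3.5)) # "double"
typeof(c(1, "a"))        # "character"

as.numeric(mixed[, 2])   # 1 2: text that looks like numbers can be converted back
```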

m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = FALSE) # create m again, then name its rows and columns

rownames(m) <- c("row1", "row2") # row names are optional

colnames(m) <- c("ID", "X", "Y")

class(m)

List

A list in R is similar to a to-do list at work or school: it is a flexible container whose elements can be of different types and sizes, for example a vector, a matrix, and a character string.

v <- c(1:10)

m <- matrix(data=c(1:6), ncol = 3, nrow=2)

c <- "abc"

l<- list(v, m, c)

names(l) <- c("first", "second", "third") # Naming of list elements

print(l)

class(l)

Data frame

A data frame is rectangular like a matrix, but unlike a matrix its columns (variables) can have different data types, such as numeric, character, and factor. Let's create a data frame from the following four variables (vectors):

ID <- as.integer(c(1,2,3,4))

name <- c("name1", "name2", "name3", "name4")

sex <- as.factor(c("Female", "Male", "Male", "Female"))

age <- as.numeric(c(36, 27, 37, 32))

df <- data.frame(ID, name, sex, age, stringsAsFactors=FALSE)

print(df)

class(df)

str(df) # to see the data structure
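Once a data frame exists, a few base R functions describe its shape and contents. A minimal sketch rebuilding the same df:

```r
ID <- as.integer(c(1, 2, 3, 4))
name <- c("name1", "name2", "name3", "name4")
sex <- as.factor(c("Female", "Male", "Male", "Female"))
age <- as.numeric(c(36, 27, 37, 32))
df <- data.frame(ID, name, sex, age, stringsAsFactors = FALSE)

dim(df)            # 4 4: number of rows and columns
nrow(df)           # 4 observations
names(df)          # "ID" "name" "sex" "age"
sapply(df, class)  # the class of each column
mean(df$age)       # 33
levels(df$sex)     # "Female" "Male"
```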

================================ Lesson-4: Indexing ================================

Vector

Access element(s) of a vector

b <- c(10:15)

b[1] # Get the first element of a vector

b[-2] # Get all elements except the second

b[1] <- 11 # use an index to change values

b[3:6] <- -99 # use an index to change values
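Indices can also be vectors of positions or logical conditions. A small sketch using the same b vector; the final values follow from the two replacements shown above:

```r
b <- c(10:15)

b[c(1, 3)]      # 10 12: several positions at once
b[length(b)]    # 15: the last element
b[-(1:2)]       # 12 13 14 15: drop the first two elements
b[b > 12]       # 13 14 15: a logical index keeps elements where the condition is TRUE

b[1] <- 11      # replace one value
b[3:6] <- -99   # replace several values with one value
b               # 11 11 -99 -99 -99 -99
```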

Matrix

Values in a matrix can also be accessed by indexing.

m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)

colnames(m) <- c('a', 'b', 'c')

Use two indices separated by a comma: the first for the row number(s) and the second for the column number(s). Leaving one of them empty selects all rows or all columns.

m[2,2]

m[, 2] # the entire second column

m[, c('a', 'c')] # columns a and c

m[1,1] <- 5 # setting values
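The same rules extend to whole rows, several rows and columns at once, and logical conditions. A sketch using the same 3 x 3 matrix; note that m[m > 5] reads the matrix column by column:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
colnames(m) <- c('a', 'b', 'c')

m[2, ]                     # entire second row: 4 5 6 (named a, b, c)
m[1:2, c('a', 'c')]        # rows 1 and 2 of columns a and c
m[m > 5]                   # 7 8 6 9: values above 5, in column order
m[, 2, drop = FALSE]       # column 2 kept as a 3 x 1 matrix
dim(m[, 2, drop = FALSE])  # 3 1
```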

List

v <- c(1:10)

m <- matrix(data=c(1:6), ncol = 3, nrow=2)

c <- "abc"

l<- list(v, m, c)

names(l) <- c("first", "second", "third") # Naming of list elements

print(l)

class(l)

l$first # extract the element named first with the $ (dollar) operator

l$second

l$third

l[["first"]] # the same element, extracted with double square brackets
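Single and double square brackets behave differently on lists. This sketch rebuilds the same list, naming the elements directly inside list() (equivalent to assigning names() afterwards) and storing "abc" without reusing the name c:

```r
v <- c(1:10)
m <- matrix(data = c(1:6), ncol = 3, nrow = 2)
l <- list(first = v, second = m, third = "abc")

class(l["first"])    # "list": single brackets return a smaller list
class(l[["first"]])  # "integer": double brackets return the element itself
l[[2]]               # elements can also be selected by position
l$second[2, 3]       # 6: index into the matrix stored in the list
l$first[1:3]         # 1 2 3
length(l)            # 3 elements
```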

Data frame

m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE) # create a data.frame from matrix m

colnames(m) <- c('a', 'b', 'c')

d <- data.frame(m)

class(d)

d[,2] # extract a column by column number

d[, 'b'] # use the column name to get the values

d[ , 'b', drop=FALSE] # drop=FALSE keeps the result as a one-column data frame instead of a vector
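Rows of a data frame can be selected with a logical condition in the first index. A minimal sketch with the same d:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
colnames(m) <- c('a', 'b', 'c')
d <- data.frame(m)

d[2, ]                         # the second row (still a data frame)
d$b                            # 2 5 8: same values as d[, 'b']
d[d$a > 1, ]                   # rows where column a is greater than 1
d[d$a > 1, c('a', 'c')]        # the same rows, only columns a and c
class(d[, 'b'])                # "integer": one column is simplified to a vector
class(d[, 'b', drop = FALSE])  # "data.frame"
```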

Which

How can we find the indices of the elements in a vector that have values above 15? The function which() returns the positions of the TRUE entries in a logical vector.

x <- c(10:20)

i <- which(x > 15)

print(i)

x[i]

%in%

%in% is a very useful operator for asking whether each value in one vector is present in another vector.

x <- c(10:20)

j <- c(7,9,11,13)

j %in% x

which(j %in% x)

Match

The function match() returns, for each value in its first argument, the position of its first match in the second argument (NA if there is no match).

match(j, x) # positions of the values of j within x
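%in% and match() are closely related: R's help page for match() describes %in% as match(x, table, nomatch = 0) > 0. A minimal sketch with the same x and j:

```r
x <- c(10:20)
j <- c(7, 9, 11, 13)

j %in% x                       # FALSE FALSE TRUE TRUE
!(j %in% x)                    # TRUE TRUE FALSE FALSE: values of j NOT in x
j[j %in% x]                    # 11 13: keep only the values found in x
match(j, x)                    # NA NA 2 4: position of each value of j in x
match(j, x, nomatch = 0) > 0   # the same result as j %in% x
```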

================================ Lesson-5: Algebra ================================

Vector algebra

Creating example vectors

a <- c(1:5) # to create vector a

b <- c(6:10) # to create vector b

d <- a*b # Multiplication works element by element
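Arithmetic on vectors of the same length works element by element, and a single number is recycled to match a longer vector. A sketch with the same a and b, including the dot product computed two ways:

```r
a <- c(1:5)
b <- c(6:10)

a + b        # 7 9 11 13 15: element by element
a * b        # 6 14 24 36 50
a + 10       # 11 12 13 14 15: the single value 10 is recycled
sum(a * b)   # 130: the dot product of a and b
a %*% b      # the same dot product, returned as a 1 x 1 matrix
```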

Logical comparisons

a == 2

b > 6 & b < 8

b > 9 | a < 2

b >= 9

a <= 2

b >= 9 | a <= 2

b >= 9 & a <= 2
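Because TRUE counts as 1 and FALSE as 0, comparisons combine naturally with sum(), mean(), any() and all(), and can be used to select elements. A sketch with the same a and b:

```r
a <- c(1:5)
b <- c(6:10)

b > 6 & b < 8   # FALSE TRUE FALSE FALSE FALSE
sum(b >= 9)     # 2: the number of TRUE values
mean(a <= 2)    # 0.4: the proportion of TRUE values
any(a == 2)     # TRUE: at least one element equals 2
all(b > 5)      # TRUE: every element is above 5
a[b >= 9]       # 4 5: a comparison on b selects elements of a
```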

Functions

sqrt(a)

exp(a)

min(a)

max(a)

range(a)

sum(a)

mean(a)

median(a)

prod(a)

sd(a)
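sd() returns the sample standard deviation, which divides by n - 1 rather than n. This sketch reproduces sd(a) step by step (the intermediate names are illustrative) and compares it with the built-in var() and sd():

```r
a <- c(1:5)

n <- length(a)
deviations <- a - mean(a)                 # -2 -1 0 1 2
sample_var <- sum(deviations^2) / (n - 1) # 10 / 4 = 2.5
sqrt(sample_var)                          # about 1.581

var(a)                                # 2.5, the built-in sample variance
all.equal(sd(a), sqrt(sample_var))    # TRUE
cumsum(a)                             # 1 3 6 10 15: running total
```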

Random numbers

r <- runif(10) # 10 uniformly distributed random numbers between 0 and 1

r <- rnorm(10, mean=10, sd =2) # 10 normally distributed random numbers with mean 10 and standard deviation 2

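To reproduce examples or a data analysis exactly, we often need to draw the same "random" sample each time we run our code. Calling set.seed() with a fixed number before generating random numbers does this.

set.seed(123) # any integer can be used as the seed; 123 is an arbitrary choice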
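To see what set.seed() does, this sketch draws five normal random numbers twice, resetting the seed in between. The seed value 42 and the names r1, r2 and r3 are arbitrary:

```r
set.seed(42)
r1 <- rnorm(5, mean = 10, sd = 2)

set.seed(42)                        # reset the generator to the same starting point
r2 <- rnorm(5, mean = 10, sd = 2)

identical(r1, r2)                   # TRUE: the same "random" numbers
r3 <- rnorm(5, mean = 10, sd = 2)   # no reset: the generator continues
identical(r1, r3)                   # FALSE: the next draws differ
```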

Matrices

m <- matrix(1:6, ncol=3, nrow=2, byrow=TRUE) # Create an example matrix

print(m)

m*5 #to multiply all values of m with 5

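m*m # element-by-element multiplication (not matrix multiplication, which uses %*%)

m * 1:2 # a matrix and a vector: the vector is recycled down the columns, so row 1 is multiplied by 1 and row 2 by 2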
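Element-wise multiplication (*) and matrix multiplication (%*%) are different operations. In this sketch, m %*% t(m) multiplies the 2 x 3 matrix by its 3 x 2 transpose, giving a 2 x 2 matrix; m %*% m would fail because the number of columns of m does not equal its number of rows.

```r
m <- matrix(1:6, ncol = 3, nrow = 2, byrow = TRUE)

m * m        # element-wise: rows 1 4 9 and 16 25 36
m %*% t(m)   # matrix product: rows 14 32 and 32 77
m * 1:2      # recycling: rows 1 2 3 and 8 10 12
```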

================================ Lesson-6: Data Exploration ================================

Summary and Table

d <- data.frame(id=1:10, name=c('Bob', 'Bobby', '???', 'Bob', 'Bab', 'Jim', 'Jim', 'jim', '', 'Jim'), score1=c(8, 10, 7, 9, 2, 5, 1, 6, 3, 4), score2=c(3, 4, 5, -999, 5, 5, -999, 2, 3, 4), stringsAsFactors=FALSE)

print(d)

str(d)

summary(d) # to see summary of data

i <- d$score2 == -999 # R uses dollar symbol to extract variable from dataset

d$score2[i] <- NA

summary(d)

unique(d$name) # the distinct values of the name variable

table(d$name)


d$name[d$name %in% c('Bab', 'Bobby')] <- 'Bob' # replace 'Bab' and 'Bobby' with 'Bob'

table(d$name)

d$name[d$name %in% 'jim'] <- 'Jim' # replace 'jim' with 'Jim'

table(d$name)

d$name[d$name == '???'] <- NA # replace '???' with NA

table(d$name)

table(d$name, useNA='ifany') # also count the NA values

d$name[9] # the ninth name is an empty string ''

Note that there is one empty value ('') in the dataset. Replace it with NA (missing value):

d$name[d$name == ''] <- NA # replace the empty value '' with NA

table(d[c('name', 'score2')]) # two-way frequency table of name and score2
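After cleaning, it helps to check how many values are missing in each column and how many rows are complete. This sketch rebuilds d, repeats the cleaning steps above in a compact order, and then uses colSums(is.na()) and complete.cases(); d_complete is a hypothetical name for the filtered copy.

```r
d <- data.frame(id = 1:10,
                name = c('Bob', 'Bobby', '???', 'Bob', 'Bab', 'Jim', 'Jim', 'jim', '', 'Jim'),
                score1 = c(8, 10, 7, 9, 2, 5, 1, 6, 3, 4),
                score2 = c(3, 4, 5, -999, 5, 5, -999, 2, 3, 4),
                stringsAsFactors = FALSE)

# Recode placeholder values to NA, then standardise spelling variants
d$score2[d$score2 == -999] <- NA
d$name[d$name %in% c('???', '')] <- NA
d$name[d$name %in% c('Bab', 'Bobby')] <- 'Bob'
d$name[d$name %in% 'jim'] <- 'Jim'

colSums(is.na(d))                    # missing values per column: 0 2 0 2
table(d$name, useNA = 'ifany')       # Bob 4, Jim 4, <NA> 2
sum(complete.cases(d))               # 6 rows have no missing values
d_complete <- d[complete.cases(d), ] # keep only the complete rows
```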

Quantile, range, and mean

quantile(d$score1)

range(d$score1)

mean(d$score1)

Note that many summary functions need na.rm=TRUE when the data contain NA values. For example, quantile(d$score2) gives an error because score2 now contains NA values, and range(d$score2) returns NA NA.

quantile(d$score2, na.rm=TRUE)

range(d$score2)

range(d$score2, na.rm=TRUE)
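The quartiles returned by quantile() (default method, type 7) can be checked by hand for score1, whose values are 1 to 10 in shuffled order. This sketch copies the score1 and score2 values from d, with -999 already recoded to NA, into standalone vectors:

```r
score1 <- c(8, 10, 7, 9, 2, 5, 1, 6, 3, 4)
score2 <- c(3, 4, 5, NA, 5, 5, NA, 2, 3, 4)   # -999 already recoded to NA

quantile(score1)               # 0%: 1, 25%: 3.25, 50%: 5.5, 75%: 7.75, 100%: 10
quantile(score1, probs = 0.9)  # any other probability, here the 90th percentile
mean(score2)                   # NA
mean(score2, na.rm = TRUE)     # 3.875
range(score2, na.rm = TRUE)    # 2 5
summary(score2)                # also reports the number of NA values
```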

Plots

par(mfrow=c(2,2)) # split the plotting area into a 2 x 2 grid (two rows and two columns)

plot(d$score1, d$score2) # Scatter plot with two variables

boxplot(d[, c('score1', 'score2')]) # Boxplot of two variables

plot(sort(d$score1))

hist(d$score2)

======================== Read and write files ========================

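To read a file, we first need to know its full path (directory) and file name. Use the forward slash "/" as the path delimiter, even on Windows, for example "C:/projects/research/data/obs.csv". (Backslashes would have to be written twice, as in "C:\\projects", because a single backslash starts an escape sequence in R strings.)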

setwd("F:/BSMMU/Spatial Data Analysis in R/") # set the working directory

getwd() # check the current working directory

df <- read.csv("F:/BSMMU/Spatial Data Analysis in R/participants.csv")

print(df)

class(df)

str(df)

write.csv(df, "F:/BSMMU/Spatial Data Analysis in R/participants_data.csv")
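The write-and-read cycle can be tried without a real data file by using a temporary file. This sketch uses a small made-up data frame and tempdir(); it also shows why row.names = FALSE is often useful, since by default write.csv() stores the row names as an extra first column.

```r
# Small made-up data frame, used only for this example
df <- data.frame(ID = 1:3, name = c("name1", "name2", "name3"))
f <- file.path(tempdir(), "example.csv")   # temporary file path

write.csv(df, f)                  # default: row names are written too
names(read.csv(f))                # "X" "ID" "name": the row names came back as column X

write.csv(df, f, row.names = FALSE)
df2 <- read.csv(f)
names(df2)                        # "ID" "name"
dim(df2)                          # 3 2

unlink(f)                         # delete the temporary file
```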

Learning some data cleaning

colnames(df) <- c("reg", "name", "excel", "spss", "stata", "r", "sas", "age", "sex", "major", "laptop") # name the variables in order

str(df) # see data structure


df$excel <- as.factor(df$excel) # converting character to factor

df$spss <- as.factor(df$spss) # converting character to factor

df$stata <- as.factor(df$stata) # converting character to factor

df$r <- as.factor(df$r) # converting character to factor

df$sas <- as.factor(df$sas) # converting character to factor

df$age <- as.factor(df$age) # converting character to factor

df$sex <- as.factor(df$sex) # converting character to factor

df$major <- as.factor(df$major) # converting character to factor

df$laptop <- as.factor(df$laptop) # converting character to factor

df <- df[-1] # remove the first variable (reg)

df$ID <- 1:nrow(df) # create a new variable called ID

data.table::setcolorder(df, neworder = "ID") # reorder the variables so that ID comes first

str(df)
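The nine as.factor() lines above can be written as one step with lapply(). This sketch uses a small hypothetical data frame (the values are made up, not taken from participants.csv) and shows a base R way to move ID to the front as an alternative to data.table::setcolorder():

```r
# Hypothetical survey data with the same kind of yes/no columns
df <- data.frame(reg = c("r01", "r02", "r03"),
                 excel = c("Yes", "No", "Yes"),
                 spss = c("No", "No", "Yes"),
                 stringsAsFactors = FALSE)

cols <- c("excel", "spss")                   # columns to convert
df[cols] <- lapply(df[cols], as.factor)      # apply as.factor() to each listed column

df <- df[-1]                                 # drop the first column (reg)
df$ID <- seq_len(nrow(df))                   # 1, 2, ..., nrow(df); also safe for 0 rows
df <- df[c("ID", setdiff(names(df), "ID"))]  # put ID first, keep the other columns in order
str(df)
```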
