About the course
R is one of the most widely used programming languages in the data industry. It is built around vectors and comes with a large collection of ready-made packages. It is in high demand among data scientists, analysts, and statisticians alike. This introduction to R course covers the basics of this open-source language, including vectors, factors, lists, and data frames. You'll gain useful coding skills and be ready to start your own data analysis in R. "The 'R' name is derived from the first letter of the names of its two developers, Ross Ihaka and Robert Gentleman, who were associated with the University of Auckland at the time." In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland.
Course structure
There are six lessons. The course starts with basic operations, such as installing R and RStudio and using the console as a calculator (lesson-1), and with the basic data types in R (lesson-2). Then we move on to basic data structures (lesson-3) and indexing to extract values of a variable in a dataset (lesson-4). Next, we learn how to do algebra with vectors and matrices in R (lesson-5). Finally, we learn data exploration, data cleaning and plotting using simple R code (lesson-6).
Course learning outcomes
Upon completion of this Introduction to R course, learners will be able to use the R basics for their own data analysis. These sought-after skills can help you progress in your career and set you up for further self-learning.
Facilitator
Dr Proloy Barua, Assistant Scientist, BRAC James P Grant School of Public Health, BRAC University
Getting help in R
R has built-in facilities for searching help and documentation. The # (hash) sign turns the rest of a line into a comment, which R does not execute. You can write any text after a # sign, as follows
help.search("mean") # search the help pages for a topic
find("mean") # show which attached package contains an object with this name
Some Basics of R using keyboard
Ctrl+Enter # Run the current line or the selected commands
Ctrl+L # Clear the console window
Ctrl+A # Move the cursor to the beginning of the line (R console)
Ctrl+E # Move the cursor to the end of the line (R console)
Ctrl+U # Delete the text on the current line (R console)
Ctrl+C # Copy
Ctrl+V # Paste
rm(list=ls()) # Remove all objects from the workspace
getwd() # Get working directory
setwd(d) # Set the working directory to the path stored in d
Free Online Resources
Installing R (https://github.com/genomicsclass/windows#installing-r)
R Studio (https://www.rstudio.com/products/rstudio/download/)
R Studio Cheat Sheets (https://rstudio.cloud/learn/cheat-sheets)
Introduction to R by Robert J. Hijmans available at https://rspatial.org/intr/IntroductiontoR.pdf
An Introduction to R by W. N. Venables, D. M. Smith and the R Core Team https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
R for Beginners by Emmanuel Paradis http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
R tutorial by Kelly Black http://www.cyclismo.org/tutorial/R/
A brief overview by Ross Ihaka (one of the originators of R) https://www.stat.auckland.ac.nz/~ihaka/120/Notes/ch02.pdf
Information Visualization course by Ross Ihaka https://www.stat.auckland.ac.nz/~ihaka/120/notes.html
A Beginner’s Guide to R by Zuur, Ieno and Meesters http://www.springer.com/us/book/9780387938363
R in a nutshell by Joseph Adler http://shop.oreilly.com/product/0636920022008.do
The Art of R Programming by Norman Matloff http://www.nostarch.com/artofr.htm
Introduction to R by Datacamp https://www.datacamp.com/courses/free-introduction-to-r
Good material on rstatistics.net http://rstatistics.net/
Watch some Google Developers videos http://www.youtube.com/playlist?list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP
Advanced R by Hadley Wickham http://adv-r.had.co.nz/
StackOverflow https://stackoverflow.com
R-Bloggers https://www.r-bloggers.com
Rseek https://rseek.org/ (an R-focused search engine powered by Google)
Stackexchange https://stackexchange.com
R Course Outline
R Course Participants
Please click here for R Markdown File
R Scripts
================================== Lesson-2: Basic data types in R ==================================
The following are the basic types of vector data. A vector is a one-dimensional array or structure.
- Numeric values
- Integer values
- Character values
- Logical values
- Factors
- Missing values
- Time
Numeric values
a <- 7 # one element
show(a)
print(a)
a
class(a)
length(a) # to see how many elements or observations in the vector
rm(a) # Remove the object a from the workspace. Now try show(a): it gives an error
Integer values
b <- 7L
b
class(b)
Character values
x <- "Proloy"
x
class(x)
Logical values
x <- FALSE
y<- TRUE
x
y
class(x)
class(y)
Factors
countries <- c('Bangladesh', 'Bangladesh', 'India', 'Afghanistan', 'India')
countries
class(countries)
f1 <- as.factor(countries) # converting character values into factor values
f1
class(f1)
Missing values
m <- c(2, NA, 5, 2, NA, 2) # NA (“Not Available”) (e.g. missing value = .)
is.na(m) # To check NA or missing values
class(m)
which(is.na(m)) # Get positions of NA
n <- c(5, 9, NaN, 3, 8, NA, NaN) # NaN (“Not a Number”) (e.g. 0 / 0)
is.nan(n) # To check NaN values
class(n)
which(is.nan(n)) # Get positions of NaN
Time
d<- Sys.Date()
d
class(d)
================================ Lesson-3: Basic data structures ================================
In the previous lesson we learned one dimensional data structure (vector). In this lesson, we will learn multi-dimensional data structures that can store basic data or vector data
Matrix
A two-dimensional rectangular layout is called a matrix. We can create a matrix with two rows and three columns using the following code
m <- matrix(ncol=3, nrow=2)
m
Note that all values were missing (NA) in above matrix. Let’s make a matrix with values 1 to 6
m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = TRUE) # byrow = TRUE fills the matrix row by row. Arguments such as ncol and byrow are the information passed to a function.
m
m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = FALSE) # By default elements are arranged sequentially by column.
m
t(m) # t() transposes the matrix: rows become columns and columns become rows
A matrix can only store a single data type. If you try to mix character and numeric values, all values will become character values (as the other way around may not be possible)
vchar <- c("a", "b")
class(vchar)
vnumb <- c(1,2)
class(vnumb)
matrix(c(vchar,vnumb), ncol=2, nrow=2, byrow = FALSE)
m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = FALSE) # Define the column and row names in matrix m
m
rownames(m) = c("row1", "row2") # Row names are less important.
colnames(m) = c("ID", "X", "Y")
m
class(m)
List
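A list in R is similar to a to-do list at work or school: it can hold items of different kinds. A list is a kind of super data type, because its elements can be vectors, matrices, or even other lists.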
v <- c(1:10)
m <- matrix(data=c(1:6), ncol = 3, nrow=2)
c <- "abc"
l<- list(v, m, c)
names(l) <- c("first", "second", "third") # Naming of list elements
print(l)
class(l)
Data frame
It is rectangular like a matrix, but unlike matrices a data.frame can have columns (variables) of different data types such as numeric, character, factor. Let’s create a data frame with the following four variables or vectors
ID <- as.integer(c(1,2,3,4))
name <- c("name1", "name2", "name3", "name4")
sex <- as.factor(c("Female","Male","Male","Female"))
age <- as.numeric(c(36, 27, 37, 32))
df <- data.frame(ID, name, sex, age, stringsAsFactors=FALSE)
print(df)
class(df)
str(df) # to see the data structure
================================ Lesson-4: Indexing ================================
Vector
Access element(s) of a vector
b <- c(10:15)
b
b[1] # Get the first element of a vector
b[-2] # Get all elements except the second
b[1] <- 11 # use an index to change values
b[3:6] <- -99 # use an index to change values
b
Matrix
Values of a matrix can be accessed through indexing
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
m
Use two numbers in a double index: the first for the row number(s) and the second for the column number(s).
m[2,2]
m[ ,2] # entire column
m[, c('a', 'c')] # two columns
m[1,1] <- 5 # setting values
List
v <- c(1:10)
m <- matrix(data=c(1:6), ncol = 3, nrow=2)
c <- "abc"
l<- list(v, m, c)
names(l) <- c("first", "second", "third") # Naming of list elements
print(l)
class(l)
l$first # a list element can be extracted by name with the $ (dollar) operator
l$second
l$third
l[["first"]] # double brackets extract the same element by name
Data frame
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE) # create a data.frame from matrix m
colnames(m) <- c('a', 'b', 'c')
d <- data.frame(m)
class(d)
d[,2] # extract a column by column number
d[, 'b'] # use the column name to get values
d[ , 'b', drop=FALSE] # drop=FALSE keeps the result as a one-column data frame
Which
Suppose we need the indices of the elements in a vector that have values above 15. The function which() returns the positions at which a logical vector is TRUE.
x <- c(10:20)
i <- which(x > 15)
print(i)
x[i]
%in%
A very useful operator that allows you to ask whether a set of values is present in a vector is %in%.
x <- c(10:20)
j <- c(7,9,11,13)
j %in% x
which(j %in% x)
Match
The function match() looks for entries in a vector and returns the index needed to access them
match(j, x) # Another handy similar function is match
================================ Lesson-5: Algebra ================================
Vector algebra
Creating example vectors
a <- c(1:5) # to create vector a
b <- c(6:10) # to create vector b
d <- a*b # Multiplication works element by element
Logical comparisons
a == 2
b > 6 & b < 8
b > 9 | a < 2
b >= 9
a <= 2
b >= 9 | a <= 2
b >= 9 & a <= 2
Functions
sqrt(a)
exp(a)
min(a)
max(a)
range(a)
sum(a)
mean(a)
median(a)
prod(a)
sd(a)
Random numbers
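r <- runif(10) # uniformly distributed random numbers between 0 and 1
r <- rnorm(10, mean=10, sd =2) # normally distributed random numbers
To reproduce examples or a data analysis exactly, we often want to make sure that we draw exactly the same "random" sample each time we run our code. Calling set.seed() with a fixed integer before drawing does this.
set.seed(n) # n stands for any integer of your choice, for example set.seed(123)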
Matrices
m <- matrix(1:6, ncol=3, nrow=2, byrow=TRUE) # Create an example matrix
print(m)
m*5 #to multiply all values of m with 5
m*m # multiply two matrices element by element (the matrix product uses %*%)
m * 1:2 # We can also do math with a matrix and a vector
================================ Lesson-6: Data Exploration ================================
Summary and Table
d <- data.frame(id=1:10, name=c('Bob', 'Bobby', '???', 'Bob', 'Bab', 'Jim', 'Jim', 'jim', '', 'Jim'), score1=c(8, 10, 7, 9, 2, 5, 1, 6, 3, 4), score2=c(3,4,5,-999,5,5,-999,2,3,4), stringsAsFactors=FALSE)
print(d)
str(d)
summary(d) # to see summary of data
i <- d$score2 == -999 # R uses dollar symbol to extract variable from dataset
d$score2[i] <- NA
summary(d)
unique(d$name) # to see the distinct values of a variable
table(d$name)
Note: a space is written before the $ sign in the lines below only so that the symbol displays on this page; R ignores that space.
d $name[d $name %in% c('Bab', 'Bobby')] <- 'Bob' # to replace 'Bab' and 'Bobby' with 'Bob'
table(d$name)
d $name[d $name %in% 'jim'] <- 'Jim' # to replace 'jim' with 'Jim'
table(d$name)
d $name[d $name == '???'] <- NA # to replace '???' with NA
table(d$name)
table(d$name, useNA='ifany') # To force table to also count the NA values.
d$name[9]
Note that there is one empty value in the dataset. The next line replaces it with NA (missing value).
d $name[d $name == ''] <- NA # to replace empty value '' with NA
table(d[ c('name', 'score2')]) # to see frequency table of two variables
Quantile, range, and mean
quantile(d$score1)
range(d$score1)
mean(d$score1)
Note that we need na.rm=TRUE when there are NA values. For example, quantile(d$score2) stops with an error, and range() returns NA for the same variable.
quantile(d$score2, na.rm=TRUE)
range(d$score2)
range(d$score2, na.rm=TRUE)
Plots
par(mfrow=c(2,2)) # set up the canvas as a 2 x 2 grid of plots
plot(d$score1, d$score2) # Scatter plot with two variables
boxplot(d[, c('score1', 'score2')]) # Boxplot of two variables
plot(sort(d$score1))
hist(d$score2)
======================== Read and write files ========================
To read a file, we first need to know its full path, that is, the directory name and the file name. Use the forward slash "/" as the path delimiter, for example "C:/projects/research/data/obs.csv".
setwd("F:/BSMMU/Spatial Data Analysis in R/") # Setting working directory
getwd() # show the current working directory
df <- read.csv("F:/BSMMU/Spatial Data Analysis in R/participants.csv")
print(df)
class(df)
str(df)
write.csv(df, "F:/BSMMU/Spatial Data Analysis in R/participants_data.csv")
Learning some data cleaning
colnames(df) <- c("reg", "name", "excel", "spss", "stata", "r", "sas", "age", "sex","major", "laptop") # assigning the variable names in column order
str(df) # see data structure
Note: a space is written before the $ sign in the lines below only so that the symbol displays on this page; R ignores that space.
df $excel <- as.factor(df $excel) # converting character to factor
df $spss <- as.factor(df $spss) # converting character to factor
df $stata <- as.factor(df $stata) # converting character to factor
df $r <- as.factor(df $r) # converting character to factor
df $sas <- as.factor(df $sas) # converting character to factor
df $age <- as.factor(df $age) # converting character to factor
df$sex <- as.factor(df $sex) # converting character to factor
df $major <- as.factor(df $major) # converting character to factor
df $laptop <- as.factor(df $laptop) # converting character to factor
df <- df[-1] # removing first variable
df
df$ID <- 1:nrow(df) # creating a new variable called ID
df
data.table::setcolorder(df, neworder = “ID”) # Ordering variables starting with ID
df
str(df)
---
title: "Introduction to R"
date: ''
output:
  flexdashboard::flex_dashboard:
    source_code: embed
    social: menu
---
```{css}
body > div.navbar.navbar-inverse.navbar-fixed-top > div > div.navbar-header > span.navbar-brand {
font-size: 25px;
color: white;
}
```
```{r setup, include=FALSE}
library(flexdashboard)
library(knitr)
```
## Column 1 {data-width=650, .tabset}
### Course title
```{r, echo = FALSE, out.width="75%"}
include_graphics("F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide1.png")
```
### Overview
About the course
R is one of the most widely used programming languages in the data industry. It is built around vectors and comes with a large collection of ready-made packages. It is in high demand among data scientists, analysts, and statisticians alike. This introduction to R course covers the basics of this open-source language, including vectors, factors, lists, and data frames. You'll gain useful coding skills and be ready to start your own data analysis in R. "The 'R' name is derived from the first letter of the names of its two developers, Ross Ihaka and Robert Gentleman, who were associated with the University of Auckland at the time." [In 1991, R was created by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland](https://bookdown.org/rdpeng/rprogdatascience/history-and-overview-of-r.html).
Course structure
There are six lessons. The course starts with basic operations, such as installing R and RStudio and using the console as a calculator (lesson-1), and with the basic data types in R (lesson-2). Then we move on to basic data structures (lesson-3) and indexing to extract values of a variable in a dataset (lesson-4). Next, we learn how to do algebra with vectors and matrices in R (lesson-5). Finally, we learn data exploration, data cleaning and plotting using simple R code (lesson-6).
Course learning outcomes
Upon completion of this Introduction to R course, learners will be able to use the R basics for their own data analysis. These sought-after skills can help you progress in your career and set you up for further self-learning.
Facilitator
[Dr Proloy Barua](https://rpubs.com/proloy/949659), Assistant Scientist, BRAC James P Grant School of Public Health, BRAC University
Getting help in R
R has built-in facilities for searching help and documentation. The # (hash) sign turns the rest of a line into a comment, which R does not execute. You can write any text after a # sign, as follows
- help.search("mean") # search the help pages for a topic
- find("mean") # show which attached package contains an object with this name
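As a small illustration of the help tools above, the sketch below uses only functions that ship with R. The search pattern `"^mean"` is an arbitrary example chosen for this note.

```r
# Text after a # sign is a comment; R ignores it when the line is run.
where <- find("mean")       # attached package(s) that contain an object called "mean"
hits  <- apropos("^mean")   # names of visible objects that start with "mean"
print(where)
print(hits)

# In an interactive session, ?mean or help("mean") opens the help page,
# and help.search("mean") searches the installed documentation.
```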
Some Basics of R using keyboard
- Ctrl+Enter # Run the current line or the selected commands
- Ctrl+L # Clear the console window
- Ctrl+A # Move the cursor to the beginning of the line (R console)
- Ctrl+E # Move the cursor to the end of the line (R console)
- Ctrl+U # Delete the text on the current line (R console)
- Ctrl+C # Copy
- Ctrl+V # Paste
- rm(list=ls()) # Remove all objects from the workspace
- getwd() # Get working directory
- setwd(d) # Set the working directory to the path stored in d
Free Online Resources
- Installing R (https://github.com/genomicsclass/windows#installing-r)
- R Studio (https://www.rstudio.com/products/rstudio/download/)
- R Studio Cheat Sheets (https://rstudio.cloud/learn/cheat-sheets)
- Introduction to R by Robert J. Hijmans available at https://rspatial.org/intr/IntroductiontoR.pdf
- An Introduction to R by W. N. Venables, D. M. Smith and the R Core Team https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
- R for Beginners by Emmanuel Paradis http://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
- R tutorial by Kelly Black http://www.cyclismo.org/tutorial/R/
- A brief overview by Ross Ihaka (one of the originators of R) https://www.stat.auckland.ac.nz/~ihaka/120/Notes/ch02.pdf
- Information Visualization course by Ross Ihaka https://www.stat.auckland.ac.nz/~ihaka/120/notes.html
- A Beginner’s Guide to R by Zuur, Ieno and Meesters http://www.springer.com/us/book/9780387938363
- R in a nutshell by Joseph Adler http://shop.oreilly.com/product/0636920022008.do
- The Art of R Programming by Norman Matloff http://www.nostarch.com/artofr.htm
- Introduction to R by Datacamp https://www.datacamp.com/courses/free-introduction-to-r
- Good material on rstatistics.net http://rstatistics.net/
- Watch some Google Developers videos http://www.youtube.com/playlist?list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP
- Advanced R by Hadley Wickham http://adv-r.had.co.nz/
- StackOverflow https://stackoverflow.com
- R-Bloggers https://www.r-bloggers.com
- Rseek https://rseek.org/ (an R-focused search engine powered by Google)
- Stackexchange https://stackexchange.com
### Course Outline
R Course Outline
```{r echo=FALSE, eval=TRUE, message=FALSE, warning=FALSE}
library("reactable")
library("htmlwidgets")
library("htmltools")
library("data.table")
R_trainees <- data.table::fread("F:/BSMMU/Spatial Data Analysis in R/Course_outline.csv")
reactable::reactable(R_trainees,highlight = TRUE,
outlined = TRUE,
bordered = TRUE,
borderless = FALSE,
striped = TRUE,
compact = TRUE,
searchable = TRUE,
wrap = TRUE,
columns=list(Sessions=colDef(minWidth=20), Contents=colDef(minWidth=30), "Sub-Contents" =colDef(minWidth=60)),
showPageSizeOptions = TRUE,
defaultPageSize = 120)
```
### Course Participants
R Course Participants
```{r echo=FALSE, eval=TRUE, message=FALSE, warning=FALSE}
library("reactable")
library("htmlwidgets")
library("htmltools")
library("data.table")
R_trainees <- data.table::fread("F:/BSMMU/Spatial Data Analysis in R/participants.csv")
reactable::reactable(R_trainees,highlight = TRUE,
outlined = TRUE,
bordered = TRUE,
borderless = FALSE,
striped = TRUE,
compact = TRUE,
searchable = TRUE,
wrap = TRUE,
showPageSizeOptions = TRUE,
defaultPageSize = 50)
```
### Powerpoint Slides
```{r, echo = FALSE, out.width="75%", fig.align= "center"}
library(knitr)
include_graphics(c("F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide2.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide3.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide4.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide5.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide6.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide7.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide8.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide9.png", "F:/BSMMU/Spatial Data Analysis in R/PPT_Introduction to R/Slide10.png"))
```
### R Markdown File
Please click [here](https://rpubs.com/proloy/983358) for R Markdown File
R Scripts
==================================
**Lesson-2: Basic data types in R**
==================================
**The following are the basic types of vector data. A vector is a one-dimensional array or structure.**
- Numeric values
- Integer values
- Character values
- Logical values
- Factors
- Missing values
- Time
Numeric values
a <- 7 # one element
show(a)
print(a)
a
class(a)
length(a) # to see how many elements or observations in the vector
rm(a) # Remove the object a from the workspace. Now try show(a): it gives an error
Integer values
b <- 7L
b
class(b)
Character values
x <- "Proloy"
x
class(x)
Logical values
x <- FALSE
y<- TRUE
x
y
class(x)
class (y)
Factors
countries <- c('Bangladesh', 'Bangladesh', 'India', 'Afghanistan', 'India')
countries
class(countries)
f1 <- as.factor(countries) # converting character values into factor values
f1
class(f1)
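To see what the conversion to a factor actually stores, a minimal sketch with the same country names follows. It uses base R only.

```r
countries <- c("Bangladesh", "Bangladesh", "India", "Afghanistan", "India")
f1 <- as.factor(countries)

levels(f1)       # the distinct categories, sorted alphabetically by default
nlevels(f1)      # number of categories
as.integer(f1)   # the integer code R stores for each element
table(f1)        # frequency of each category
```

The integer codes point into `levels(f1)`, which is why a factor takes the class `factor` rather than `character`.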
Missing values
m <- c(2, NA, 5, 2, NA, 2) # NA (“Not Available”) (e.g. missing value = .)
is.na(m) # To check NA or missing values
class(m)
which(is.na(m)) # Get positions of NA
n <- c(5, 9, NaN, 3, 8, NA, NaN) # NaN (“Not a Number”) (e.g. 0 / 0)
is.nan(n) # To check NaN values
class(n)
which(is.nan(n)) # Get positions of NaN
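The difference between NA and NaN, and the effect of missing values on a calculation, can be sketched as follows with the same two vectors.

```r
m <- c(2, NA, 5, 2, NA, 2)
n <- c(5, 9, NaN, 3, 8, NA, NaN)

sum(is.na(m))           # number of missing values in m
mean(m)                 # NA: a single missing value makes the result missing
mean(m, na.rm = TRUE)   # mean of the non-missing values only

sum(is.na(n))           # is.na() is TRUE for both NA and NaN
sum(is.nan(n))          # is.nan() is TRUE only for NaN
```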
Time
d<- Sys.Date()
d
class(d)
================================
**Lesson-3: Basic data structures**
================================
**In the previous lesson we learned one dimensional data structure (vector). In this lesson, we will learn multi-dimensional data structures that can store basic data or vector data**
Matrix
**A two-dimensional rectangular layout is called a matrix. We can create a matrix with two rows and three columns using the following code**
m <- matrix(ncol=3, nrow=2)
m
**Note that all values were missing (NA) in above matrix. Let’s make a matrix with values 1 to 6**
m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = TRUE) # byrow = TRUE fills the matrix row by row. Arguments such as ncol and byrow are the information passed to a function.
m
m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = FALSE) # By default elements are arranged sequentially by column.
m
t(m) # t() transposes the matrix: rows become columns and columns become rows
**A matrix can only store a single data type. If you try to mix character and numeric values, all values will become character values (as the other way around may not be possible)**
vchar <- c("a", "b")
class(vchar)
vnumb <- c(1,2)
class(vnumb)
matrix(c(vchar,vnumb), ncol=2, nrow=2, byrow = FALSE)
m <- matrix(data=c(1:6), ncol=3, nrow=2, byrow = FALSE) # Define the column and row names in matrix m
m
rownames(m) = c("row1", "row2") # Row names are less important.
colnames(m) = c("ID", "X", "Y")
m
class(m)
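The effect of `byrow`, of transposing, and of mixing data types can be compared side by side. This is a small sketch; the object names `by_row`, `by_col` and `mixed` are made up for the illustration.

```r
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, ncol = 3)   # byrow = FALSE is the default

by_row[1, ]      # first row when the matrix is filled row by row
by_col[1, ]      # first row when the matrix is filled column by column
dim(by_row)      # number of rows, number of columns
dim(t(by_row))   # transposing swaps the two dimensions

mixed <- matrix(c("a", "b", 1, 2), nrow = 2)
class(mixed[1, 2])   # the numbers were converted to character
```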
List
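**A list in R is similar to a to-do list at work or school: it can hold items of different kinds. A list is a kind of super data type, because its elements can be vectors, matrices, or even other lists.**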
v <- c(1:10)
m <- matrix(data=c(1:6), ncol = 3, nrow=2)
c <- "abc"
l<- list(v, m, c)
names(l) <- c("first", "second", "third") # Naming of list elements
print(l)
class(l)
Data frame
**It is rectangular like a matrix, but unlike matrices a data.frame can have columns (variables) of different data types such as numeric, character, factor. Let's create a data frame with the following four variables or vectors**
ID <- as.integer(c(1,2,3,4))
name <- c("name1", "name2", "name3", "name4")
sex <- as.factor(c("Female","Male","Male","Female"))
age <- as.numeric(c(36, 27, 37, 32))
df <- data.frame(ID, name, sex, age, stringsAsFactors=FALSE)
print(df)
class(df)
str(df) # to see the data structure
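Once a data frame exists, a few base R calls give a quick overview of it. The sketch below rebuilds the same four-variable data frame; the object name `females` is made up for the illustration.

```r
df <- data.frame(
  ID   = 1:4,
  name = c("name1", "name2", "name3", "name4"),
  sex  = factor(c("Female", "Male", "Male", "Female")),
  age  = c(36, 27, 37, 32),
  stringsAsFactors = FALSE
)

dim(df)             # number of rows and columns
sapply(df, class)   # data type of each column
mean(df$age)        # a single column behaves like a vector

females <- df[df$sex == "Female", ]   # keep only the rows where sex is Female
females$name
```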
================================
**Lesson-4: Indexing**
================================
Vector
**Access element(s) of a vector**
b <- c(10:15)
b
b[1] # Get the first element of a vector
b[-2] # Get all elements except the second
b[1] <- 11 # use an index to change values
b[3:6] <- -99 # use an index to change values
b
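Besides single positions, a vector can be indexed with several positions at once or with a logical condition. A minimal sketch:

```r
b <- 10:15

b[c(1, 3)]      # elements at positions 1 and 3
b[-c(1, 2)]     # everything except the first two elements
b[b > 12]       # logical index: only the elements greater than 12
b[length(b)]    # the last element

b[b > 12] <- 0  # an index on the left-hand side replaces the selected values
b
```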
Matrix
**Values of a matrix can be accessed through indexing**
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
colnames(m) <- c('a', 'b', 'c')
m
**Use two numbers in a double index: the first for the row number(s) and the second for the column number(s).**
m[2,2]
m[ ,2] # entire column
m[, c('a', 'c')] # two columns
m[1,1] <- 5 # setting values
List
v <- c(1:10)
m <- matrix(data=c(1:6), ncol = 3, nrow=2)
c <- "abc"
l<- list(v, m, c)
names(l) <- c("first", "second", "third") # Naming of list elements
print(l)
class(l)
l$first # the first elements can be extracted by using the $ (dollar) operator
l$second
l$third
l[["first"]] # to extract elements of first vector
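Single and double brackets behave differently on a list, which is a common source of confusion. The sketch below builds the same three-element list with names given directly in `list()`.

```r
l <- list(first = 1:10, second = matrix(1:6, nrow = 2, ncol = 3), third = "abc")

class(l["first"])     # single brackets return a smaller list
class(l[["first"]])   # double brackets return the element itself
l$second[1, 2]        # index into the matrix stored inside the list
l[[1]][3]             # third value of the first element
```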
Data frame
m <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE) # create a data.frame from matrix m
colnames(m) <- c('a', 'b', 'c')
d <- data.frame(m)
class(d)
d[,2] # extract a column by column number
d[, 'b'] # use the column name to get values
d[ , 'b', drop=FALSE] # drop=FALSE keeps the result as a one-column data frame
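The following sketch contrasts the ways of extracting a column shown above and adds row selection by a condition. It rebuilds the same data frame so that it can be run on its own.

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
colnames(m) <- c("a", "b", "c")
d <- data.frame(m)

d$b                             # same values as d[, "b"]
class(d[, "b"])                 # a plain vector
class(d[, "b", drop = FALSE])   # still a data frame, with one column
d[d$a > 1, c("a", "c")]         # rows where a is above 1, columns a and c only
```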
Which
**Suppose we need the indices of the elements in a vector that have values above 15. The function which() returns the positions at which a logical vector is TRUE.**
x <- c(10:20)
i <- which(x > 15)
print(i)
x[i]
%in%
**A very useful operator that allows you to ask whether a set of values is present in a vector is %in%.**
x <- c(10:20)
j <- c(7,9,11,13)
j %in% x
which(j %in% x)
Match
**The function match() looks for entries in a vector and returns the index needed to access them**
match(j, x) # Another handy similar function is match
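The three tools above answer slightly different questions. A short sketch with the same `x` and `j` makes the difference visible:

```r
x <- 10:20
j <- c(7, 9, 11, 13)

which(x > 15)      # positions in x, not the values
x[which(x > 15)]   # the values at those positions
j %in% x           # one TRUE or FALSE for each element of j
match(j, x)        # position of each element of j in x, NA when it is absent
x[match(j, x)]     # using those positions to look the values up again
```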
================================
**Lesson-5: Algebra**
================================
Vector algebra
Creating example vectors
a <- c(1:5) # to create vector a
b <- c(6:10) # to create vector b
d <- a*b # Multiplication works element by element
Logical comparisons
a == 2
b > 6 & b < 8
b > 9 | a < 2
b >= 9
a <= 2
b >= 9 | a <= 2
b >= 9 & a <= 2
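Logical comparisons return one TRUE or FALSE per element, so they can be counted, summarised, or used as an index. A minimal sketch with the same vectors:

```r
a <- 1:5
b <- 6:10

a * b                 # element by element
a + 10                # the single value 10 is reused for every element of a
sum(b >= 9)           # TRUE counts as 1, so this counts the matches
any(a == 2)           # is at least one comparison TRUE?
all(b > a)            # are all comparisons TRUE?
a[b >= 9 | a <= 2]    # a logical vector used as an index
```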
Functions
sqrt(a)
exp(a)
min(a)
max(a)
range(a)
sum(a)
mean(a)
median(a)
prod(a)
sd(a)
Random numbers
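r <- runif(10) # uniformly distributed random numbers between 0 and 1
r <- rnorm(10, mean=10, sd =2) # normally distributed random numbers
**To reproduce examples or a data analysis exactly, we often want to make sure that we draw exactly the same "random" sample each time we run our code. Calling set.seed() with a fixed integer before drawing does this.**
set.seed(n) # n stands for any integer of your choice, for example set.seed(123)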
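A small sketch of what the seed does is given below. The seed value 123 is an arbitrary choice, and the object names `first`, `second` and `third` are made up for the illustration.

```r
set.seed(123)                           # fix the seed
first <- rnorm(5, mean = 10, sd = 2)

set.seed(123)                           # same seed again
second <- rnorm(5, mean = 10, sd = 2)

identical(first, second)                # the two draws are the same

third <- rnorm(5, mean = 10, sd = 2)    # seed not reset: a different draw
identical(first, third)
```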
Matrices
m <- matrix(1:6, ncol=3, nrow=2, byrow=TRUE) # Create an example matrix
print(m)
m*5 #to multiply all values of m with 5
m*m # multiply two matrices element by element (the matrix product uses %*%)
m * 1:2 # We can also do math with a matrix and a vector
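The `*` operator always works element by element; the matrix product from linear algebra is written `%*%`. A short sketch with the same example matrix:

```r
m <- matrix(1:6, ncol = 3, nrow = 2, byrow = TRUE)

m * m        # element by element: every value is squared
m %*% t(m)   # matrix product of a 2 x 3 and a 3 x 2 matrix, giving a 2 x 2 matrix
m * 1:2      # 1:2 is reused down the columns: row 1 times 1, row 2 times 2
```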
================================
**Lesson-6: Data Exploration**
================================
Summary and Table
d <- data.frame(id=1:10, name=c('Bob', 'Bobby', '???', 'Bob', 'Bab', 'Jim', 'Jim', 'jim', '', 'Jim'), score1=c(8, 10, 7, 9, 2, 5, 1, 6, 3, 4),
score2=c(3,4,5,-999,5,5,-999,2,3,4), stringsAsFactors=FALSE)
print(d)
str(d)
summary(d) # to see summary of data
i <- d$score2 == -999 # R uses dollar symbol to extract variable from dataset
d$score2[i] <- NA
summary(d)
unique(d$name) # to see the distinct values of a variable
table(d$name)
**Note: a space is written before the $ sign in the lines below only so that the symbol displays on this page; R ignores that space.**
d $name[d $name %in% c('Bab', 'Bobby')] <- 'Bob' # to replace ‘Bab’ and ‘Bobby’ with ‘Bob’
table(d$name)
d $name[d $name %in% 'jim'] <- 'Jim' # to replace 'jim' with 'Jim'
table(d$name)
d $name[d $name == '???'] <- NA # to replace '???' with NA
table(d$name)
table(d$name, useNA='ifany') # To force table to also count the NA values.
d$name[9]
**Note that there is one empty value in the dataset. The next line replaces it with NA (missing value).**
d $name[d $name == ''] <- NA # to replace empty value '' with NA
table(d[ c('name', 'score2')]) # to see frequency table of two variables
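When there are many misspellings, the replacements can be collected in one lookup vector instead of one line per value. This is only a sketch of one possible approach, not the method used in the lesson; the object names `fixes` and `hit` are made up for the illustration.

```r
name <- c("Bob", "Bobby", "???", "Bob", "Bab", "Jim", "Jim", "jim", "", "Jim")

# Hypothetical lookup: each misspelling (left) points to its correction (right).
fixes <- c(Bab = "Bob", Bobby = "Bob", jim = "Jim")

hit <- name %in% names(fixes)   # TRUE where a misspelling was found
name[hit] <- fixes[name[hit]]   # replace each one with its correction

name[name %in% c("???", "")] <- NA   # treat both placeholders as missing
table(name, useNA = "ifany")
```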
Quantile, range, and mean
quantile(d$score1)
range(d$score1)
mean(d$score1)
**Note that we need na.rm=TRUE when there are NA values. For example, quantile(d$score2) stops with an error, and range() returns NA for the same variable.**
quantile(d$score2, na.rm=TRUE)
range(d$score2)
range(d$score2, na.rm=TRUE)
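How the summary functions react to NA can be checked on a stand-alone copy of the cleaned scores. In this sketch `try()` is used only so that the script keeps running after the error.

```r
score2 <- c(3, 4, 5, NA, 5, 5, NA, 2, 3, 4)

range(score2)                    # NA NA: the missing values propagate
range(score2, na.rm = TRUE)      # smallest and largest observed value
mean(score2, na.rm = TRUE)       # mean of the eight observed values
try(quantile(score2))            # an error, because NA values are present
quantile(score2, na.rm = TRUE)
```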
Plots
par(mfrow=c(2,2)) # set up the canvas as a 2 x 2 grid of plots
plot(d$score1, d$score2) # Scatter plot with two variables
boxplot(d[, c('score1', 'score2')]) # Boxplot of two variables
plot(sort(d$score1))
hist(d$score2)
========================
**Read and write files**
========================
**To read a file, we first need to know its full path, that is, the directory name and the file name. Use the forward slash "/" as the path delimiter, for example "C:/projects/research/data/obs.csv".**
setwd("F:/BSMMU/Spatial Data Analysis in R/") # Setting working directory
getwd() # show the current working directory
df <- read.csv("F:/BSMMU/Spatial Data Analysis in R/participants.csv")
print(df)
class(df)
str(df)
write.csv(df, "F:/BSMMU/Spatial Data Analysis in R/participants_data.csv")
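Writing and reading can be tried without the course files by using a temporary folder. The sketch below assumes nothing about participants.csv; the data frame `obs` and the file name obs.csv are made up for the illustration.

```r
# Hypothetical example data, not the course data set.
obs <- data.frame(id = 1:3, score = c(8.5, 10, 7.25))

path <- file.path(tempdir(), "obs.csv")    # file.path() joins the parts with "/"
write.csv(obs, path, row.names = FALSE)    # row.names = FALSE leaves out the row-number column
back <- read.csv(path)

file.exists(path)
all.equal(obs, back)                       # TRUE when the file round trip kept the data
```

Without `row.names = FALSE`, `write.csv()` adds a first column of row numbers, which then comes back as an extra variable when the file is read again.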
Learning some data cleaning
colnames(df) <- c("reg", "name", "excel", "spss", "stata", "r", "sas", "age", "sex","major", "laptop") # assigning the variable names in column order
str(df) # see data structure
**Note: a space is written before the $ sign in the lines below only so that the symbol displays on this page; R ignores that space.**
df $excel <- as.factor(df $excel) # converting character to factor
df $spss <- as.factor(df $spss) # converting character to factor
df $stata <- as.factor(df $stata) # converting character to factor
df $r <- as.factor(df $r) # converting character to factor
df $sas <- as.factor(df $sas) # converting character to factor
df $age <- as.factor(df $age) # converting character to factor
df$sex <- as.factor(df $sex) # converting character to factor
df $major <- as.factor(df $major) # converting character to factor
df $laptop <- as.factor(df $laptop) # converting character to factor
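The same conversion can be written once for several columns with `lapply()`. This is a sketch on a small made-up data frame that borrows three of the column names; the real participants data is not assumed here.

```r
# Hypothetical stand-in for the participants data.
df <- data.frame(
  excel = c("Yes", "No", "Yes"),
  spss  = c("No", "No", "Yes"),
  age   = c(25, 31, 28),
  stringsAsFactors = FALSE
)

cols <- c("excel", "spss")                # the columns to convert
df[cols] <- lapply(df[cols], as.factor)   # apply as.factor() to each of them
str(df)
```

Listing the columns in `cols` keeps the numeric variable `age` unchanged in this sketch.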
df <- df[-1] # removing first variable
df
df$ID <- 1:nrow(df) # creating a new variable called ID
df
data.table::setcolorder(df, neworder = "ID") # Ordering variables starting with ID
df
str(df)