Basic R

It is best to start a new project.

Introduction to RStudio

Mathematical Operations

Below is a list of common mathematical, assignment, and comparison operations that you can perform on numerical types.

x + y performs addition

x - y performs subtraction

x * y performs multiplication

x / y performs division

x ^ y raises x to the yth power

x = y assigns the value y to the variable named “x” (idiomatic R usually writes this as x <- y)

x == y evaluates to a Boolean, true if x equals y, false otherwise

x != y evaluates to a Boolean, true if x does not equal y, false otherwise

x > y evaluates to a Boolean, true if x is greater than y, false otherwise

x < y evaluates to a Boolean, true if x is less than y, false otherwise

x <= y evaluates to a Boolean, true if x is less than or equal to y, false otherwise

x >= y evaluates to a Boolean, true if x is greater than or equal to y, false otherwise

9-4
## [1] 5
9 / 4
## [1] 2.25
log(exp(10))
## [1] 10
sin(pi/2)
## [1] 1
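
The comparison and assignment operators from the list above can be tried in the same way. The following is a small sketch; the variable names a, b and is_larger are chosen here purely for illustration.

```r
# Illustrative values (a and b are not part of the original examples)
a <- 7
b <- 3

a ^ b     # power: 343
a == b    # FALSE
a != b    # TRUE
a >= b    # TRUE

# Comparisons return logical values that can be stored like any other value
is_larger <- a > b
is_larger # TRUE
```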

Data type

R has five basic or “atomic” classes of objects:

  • character

  • numeric (real numbers)

  • integer

  • complex

  • logical (TRUE/FALSE)

x <- c(0.5, 0.6);  typeof(x)       ## numeric
## [1] "double"
x <- c(TRUE, FALSE);  typeof(x)    ## logical
## [1] "logical"
x <- c(T, F);   typeof(x)         ## logical
## [1] "logical"
x <- c("a", "b", "c");  typeof(x)   ## character
## [1] "character"
x <- c(1+0i, 2+4i);  typeof(x)    ## complex
## [1] "complex"
x <- 9:29;   typeof(x)            ## integer
## [1] "integer"

The ":" operator creates a sequence of consecutive integers, which is very useful.

seq() does something similar and is more flexible, for example by allowing a step size.

seq(9,29, by=2)
##  [1]  9 11 13 15 17 19 21 23 25 27 29
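
As a rough illustration of that flexibility, the sketch below contrasts the by and length.out arguments of seq() with the colon operator. The variable names are made up for this example.

```r
# by: fixed step size
s_by <- seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00

# length.out: fixed number of equally spaced points
s_len <- seq(0, 1, length.out = 5)  # the same five values

# the colon operator also counts downwards
s_down <- 5:1                       # 5 4 3 2 1
```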

Data Structure

Vector

The most basic type of R object is a vector. Empty vectors can be created with the vector() function or the c() function. There is really only one rule about vectors in R: a vector can only contain objects of the same class.

You can also use the vector() function to initialize a vector of a given class and length.

x <- vector("numeric", length = 10) 
x
##  [1] 0 0 0 0 0 0 0 0 0 0
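
The same-class rule has a practical consequence: when classes are mixed inside c(), R does not raise an error but coerces everything to a common class. A minimal sketch, with variable names invented for illustration:

```r
# Mixing classes in c() does not fail; R silently coerces to a common class
v1 <- c(1.5, "a")    # numeric + character -> character
v2 <- c(TRUE, 2)     # logical + numeric   -> numeric (TRUE becomes 1)
v3 <- c("a", TRUE)   # character + logical -> character

class(v1)  # "character"
class(v2)  # "numeric"
class(v3)  # "character"
```
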
  1. Question 1: Generate a vector

\[a = (3, 6, 9 ,12) \]

# type your code for Question 1 here, and Knit

A vector can only contain objects of the same class.

A matrix can only contain objects of the same class.

Matrix

Initialize an empty matrix

m <- matrix(nrow = 2, ncol = 3) 
m
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
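
A matrix can also be filled with values when it is created. The sketch below assumes nothing beyond base R and shows that matrix() fills column by column unless byrow = TRUE is given. The names m_col and m_row are illustrative.

```r
# matrix() fills column by column unless byrow = TRUE is given
m_col <- matrix(1:6, nrow = 2, ncol = 3)
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

m_col[1, ]   # 1 3 5
m_row[1, ]   # 1 2 3
dim(m_col)   # 2 3
```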

vector -> matrix

Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.

x <- 1:3
y <- 10:12
cbind(x, y)
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
rbind(x, y) 
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12
  1. Question 2: Generate a matrix
\[\begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{matrix}\]
# type your code for Question 2 here, and Knit

Data Frames

Data frames can contain objects of different classes.

Data frames are used to store tabular data in R. They are an important type of object in R and are used in a variety of statistical modeling applications.

x <- data.frame(foo = 1:4, bar = c(T, T, F, F)) 
x
##   foo   bar
## 1   1  TRUE
## 2   2  TRUE
## 3   3 FALSE
## 4   4 FALSE

Check what is inside an object

length(y)
## [1] 3
dim(y)
## NULL
str(y)
##  int [1:3] 10 11 12
attributes(y)
## NULL
typeof(y)
## [1] "integer"
head(y, 1)
## [1] 10
tail(y,1)
## [1] 12

More functions for data frames.

nrow(x)
## [1] 4
ncol(x)
## [1] 2
names(x)
## [1] "foo" "bar"

Use the str() function to inspect x.

str(x)
## 'data.frame':    4 obs. of  2 variables:
##  $ foo: int  1 2 3 4
##  $ bar: logi  TRUE TRUE FALSE FALSE
  • What class is the foo column?

  • What class is the bar column?

**Different classes of data can be in a data frame.**
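
One way to check the class of every column at once is sapply() with class(). The sketch below rebuilds the same small data frame under the hypothetical name df so that it runs on its own.

```r
# Same data as the data frame shown above, rebuilt under a new name
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# Class of every column at once (returns a named character vector)
col_classes <- sapply(df, class)
col_classes

# A single column is an ordinary vector
class(df$foo)   # "integer"
```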

Load Data from CSV File

R works with many data formats.

CSV files are the most convenient.

Load data locally

Download the file if it is not already present, then read it.

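# Download the file only if it is not already in the working directory
if (!file.exists("All_pokemon.csv")) {
  download.file("https://github.com/Eighty20/eighty20.github.io/raw/master/_rmd/Post_data/All_pokemon.csv",
                destfile = "All_pokemon.csv")
}
# stringsAsFactors = TRUE gives the factor columns shown below
# (it stopped being the default in R 4.0.0)
pokemon <- read.csv("All_pokemon.csv", stringsAsFactors = TRUE)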

Check data

head(pokemon)
##   Nat    Pokemon HP Atk Def  SA  SD Spd Total Type.I Type.II    Gender
## 1   1  Bulbasaur 45  49  49  65  65  45   318  Grass  Poison M (87.5%)
## 2   2    Ivysaur 60  62  63  80  80  60   405  Grass  Poison M (87.5%)
## 3   3   Venusaur 80  82  83 100 100  80   525  Grass  Poison M (87.5%)
## 4   4 Charmander 39  52  43  60  50  65   309   Fire         M (87.5%)
## 5   5 Charmeleon 58  64  58  80  65  80   405   Fire         M (87.5%)
## 6   6  Charizard 78  84  78 109  85 100   534   Fire  Flying M (87.5%)
##   Evolves.From lvl_up Evolves.Into
## 1           -- Lv. 16      Ivysaur
## 2    Bulbasaur Lv. 32     Venusaur
## 3      Ivysaur     --           --
## 4           -- Lv. 16   Charmeleon
## 5   Charmander Lv. 36    Charizard
## 6   Charmeleon     --           --
str(pokemon)
## 'data.frame':    251 obs. of  15 variables:
##  $ Nat         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Pokemon     : Factor w/ 250 levels "Abra","Aerodactyl",..: 17 93 235 23 24 22 208 240 15 19 ...
##  $ HP          : int  45 60 80 39 58 78 44 59 79 45 ...
##  $ Atk         : int  49 62 82 52 64 84 48 63 83 30 ...
##  $ Def         : int  49 63 83 43 58 78 65 80 100 35 ...
##  $ SA          : int  65 80 100 60 80 109 60 65 85 20 ...
##  $ SD          : int  65 80 100 50 65 85 54 80 105 20 ...
##  $ Spd         : int  45 60 80 65 80 100 43 58 78 45 ...
##  $ Total       : int  318 405 525 309 405 534 314 405 530 195 ...
##  $ Type.I      : Factor w/ 16 levels "Bug","Dark","Dragon",..: 8 8 8 6 6 6 16 16 16 1 ...
##  $ Type.II     : Factor w/ 15 levels "","Dark","Dragon",..: 11 11 11 1 1 7 1 1 1 1 ...
##  $ Gender      : Factor w/ 7 levels "50/50","F (100%)",..: 6 6 6 6 6 6 6 6 6 1 ...
##  $ Evolves.From: Factor w/ 120 levels "--","Alakazam",..: 1 7 43 1 10 11 1 103 115 1 ...
##  $ lvl_up      : Factor w/ 57 levels "","--","Dusk Stone",..: 17 30 2 17 34 2 17 34 2 40 ...
##  $ Evolves.Into: Factor w/ 130 levels "--","Alakazam",..: 51 121 1 15 14 1 124 11 1 72 ...

Load data directly from an online repository.

library(readr) # preferred
pokemon <- read_csv("https://github.com/Eighty20/eighty20.github.io/raw/master/_rmd/Post_data/All_pokemon.csv")
## Parsed with column specification:
## cols(
##   Nat = col_integer(),
##   Pokemon = col_character(),
##   HP = col_integer(),
##   Atk = col_integer(),
##   Def = col_integer(),
##   SA = col_integer(),
##   SD = col_integer(),
##   Spd = col_integer(),
##   Total = col_integer(),
##   `Type I` = col_character(),
##   `Type II` = col_character(),
##   Gender = col_character(),
##   `Evolves From` = col_character(),
##   lvl_up = col_character(),
##   `Evolves Into` = col_character()
## )
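
Note that the two loaders name the columns differently: read.csv() turned "Type I" into Type.I, while read_csv() kept the space. The sketch below reproduces the difference with base R only, using a tiny inline CSV (two rows copied from the output above) in place of the real file. The names csv_text, base_df and raw_df are illustrative.

```r
# A tiny inline CSV standing in for the real file
csv_text <- "Pokemon,Type I,Type II
Bulbasaur,Grass,Poison
Charmander,Fire,"

# read.csv() makes names syntactically valid by default: "Type I" -> "Type.I"
base_df <- read.csv(text = csv_text)
names(base_df)

# check.names = FALSE keeps the header as written, as read_csv() does
raw_df <- read.csv(text = csv_text, check.names = FALSE)
names(raw_df)

# Names containing spaces need backticks when used with $
raw_df$`Type I`
```

Which form applies later on depends on which loader produced the pokemon object that is currently in memory.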

More information about pokemon

Bulbasaur

Subsetting

Indexing

Indexing is used to specify the elements of an array. Indexing also allows you to get out certain bits of information from an array. To index into an array, type the name of the array, followed by the index of the element you want in brackets. Note that in R, indices start at 1.

For a multidimensional array, index by [row,column]

To index an entire row or column, leave the corresponding index empty (for example y[1, ] or y[, 2]).

Below we index into the matrix named y to get the element in the second row, third column, which is 6.

y <- c( 1,2,3, 4,5,6)
y <- matrix(y, nrow = 2, ncol = 3, byrow = T)
y[2,3]
## [1] 6

If you want to change the value in y[2,3], you can assign a new value to it.

y[2,3] <- 7
y
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    7

Below, we index the entire first row of the array named y.

y[1,]
## [1] 1 2 3

Below, we index the entire second column of the array named y.

y[,2]
## [1] 2 5
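
A few further indexing forms follow the same [row, column] pattern. This is a sketch rather than a complete list; the matrix is rebuilt under the hypothetical name mat so that y is left untouched.

```r
# Rebuild a 2 x 3 matrix (named mat here so it does not overwrite y)
mat <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

mat[1, c(1, 3)]   # first row, columns 1 and 3: 1 3
mat[, -2]         # negative index drops the second column (2 x 2 result)
mat[mat > 4]      # logical index: elements greater than 4 -> 5 6

# A single row or column is simplified to a vector; drop = FALSE keeps a matrix
dim(mat[1, , drop = FALSE])   # 1 3
```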

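Subset by name

For a data frame, use $ to subset a column by name. Because pokemon was last loaded with read_csv(), the column names keep their spaces, so a name such as Type II must be wrapped in backticks.

type2 <- pokemon$`Type II`
head(type2)

Look at the values and how often each occurs

table(type2)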

Subset by logical indexing or comparison operations

# The column is named `Type I` (with a space) in the read_csv() result
Dragon <- pokemon[pokemon$`Type I` == "Dragon", ]
head(Dragon)
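
To see what logical indexing does step by step, the sketch below uses a small stand-in data frame built from six rows of the head(pokemon) output. The names pk, is_fire, fire and strong_fire are made up for illustration.

```r
# Six rows copied from the head(pokemon) output above
pk <- data.frame(
  Pokemon = c("Bulbasaur", "Ivysaur", "Venusaur",
              "Charmander", "Charmeleon", "Charizard"),
  HP      = c(45, 60, 80, 39, 58, 78),
  Type.I  = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  stringsAsFactors = FALSE
)

# The comparison yields one TRUE/FALSE per row ...
is_fire <- pk$Type.I == "Fire"
is_fire

# ... and the logical vector selects the rows where it is TRUE
fire <- pk[is_fire, ]

# Conditions can be combined with & (and) and | (or)
strong_fire <- pk[pk$Type.I == "Fire" & pk$HP > 50, ]
strong_fire$Pokemon   # "Charmeleon" "Charizard"
```
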
  1. Question 3: Subset pokemon with HP greater than 150 and write the result to HP.csv
# type your code for Question 3 here, and Knit

Write data

#if (!file.exists("Dragon.csv")){
write.csv(Dragon, file="Dragon.csv")
#}

By default R will wrap character vectors with quotation marks when writing out to file. It will also write out the row and column names.

Let’s fix this:

#if (!file.exists("Dragon.csv")){
write.csv(Dragon,file="Dragon.csv",
          quote=FALSE, row.names=FALSE)
#}
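
To check what actually ends up in the file, a round trip through a temporary file can help. This is a minimal sketch with a two-row data frame (values copied from the output above); out, path and back are hypothetical names. One caveat: quote = FALSE is only safe when no field contains a comma.

```r
# Small data frame for illustration
out <- data.frame(Pokemon = c("Bulbasaur", "Charmander"),
                  HP = c(45L, 39L),
                  stringsAsFactors = FALSE)

# tempfile() avoids cluttering the working directory
path <- tempfile(fileext = ".csv")
write.csv(out, file = path, quote = FALSE, row.names = FALSE)

# Inspect the raw text that was written
readLines(path)

# Reading the file back should reproduce the data
back <- read.csv(path, stringsAsFactors = FALSE)
all.equal(out, back)   # TRUE
```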

Control Flow

Condition if/else if/else

Conditional evaluation allows portions of code to be evaluated or not evaluated depending on the value of a Boolean expression. In R the middle branch is written else if (two words). You do not need all three parts: you can have conditional evaluations with just an if, or just an if/else.

The general structure of conditional evaluation is as follows.
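
A sketch of that structure with all three branches is given here. The variable value and the strings assigned to result are placeholders chosen for illustration.

```r
# Only the first branch whose condition is TRUE is executed
value <- 0

if (value > 0) {
  result <- "positive"
} else if (value < 0) {
  result <- "negative"
} else {
  result <- "zero"
}
print(result)
```

The else if and else parts are optional, which gives the shorter if/else form used in the example that follows.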

After assigning values to x and y and running the code we obtain the following output.

x <- 1
y <- 2
if (x < y) {
  print(x)
} else {
  print(y)
}
## [1] 1
  1. Question 4: Write an if/else statement to print the larger of x and y.
# type your code for Question 4 here, and Knit

For Loops

A for loop allows you to specify the number of iterations for the repeated execution of a code block. They are great when you know how many iterations you want to run.

The general form of a for loop is shown below. The example shows a for loop that calculates the sum of the integers 1 through 10 and prints the final result.

Note that to obtain a range of integers, we use the colon : symbol.

sum = 0
for(num in 1:10){
  sum = sum + num
}
print(sum)
## [1] 55
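
A for loop is not limited to integer ranges. The sketch below, using made-up names (values, total, squares), loops over the elements of a vector directly and over its positions with seq_along(). It also shows that the built-in sum() gives the same result as the loop above.

```r
values <- c(2, 4, 6)

# Loop over the elements directly
total <- 0
for (v in values) {
  total <- total + v
}
total     # 12

# Loop over positions; seq_along() is also safe for empty vectors
squares <- numeric(length(values))
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}
squares   # 4 16 36

# The built-in sum() matches the loop over 1:10
sum(1:10) # 55
```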

While Loops

A while loop repeats an operation until a certain condition is met.

For example, the loop below generates random numbers from a uniform distribution between 0 and 1 (the runif() function) until it draws one that is 0.1 or less.

z <- 1
while(z > 0.1){
  z <- runif(1)
  print(z)
}
## [1] 0.2839505
## [1] 0.1720636
## [1] 0.5310621
## [1] 0.6951613
## [1] 0.3367849
## [1] 0.628949
## [1] 0.9106426
## [1] 0.577616
## [1] 0.4796738
## [1] 0.5839251
## [1] 0.8860319
## [1] 0.6953232
## [1] 0.04305115
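
Because the draws are random, the output above changes on every run. The following variant is a sketch that fixes the seed and counts the draws. The seed value, the counter draws and the safety limit max_draws are all assumptions added for illustration.

```r
set.seed(42)        # arbitrary seed, chosen only so reruns give the same draws

z <- 1
draws <- 0
max_draws <- 1000   # guard against an unexpectedly long loop

while (z > 0.1 && draws < max_draws) {
  z <- runif(1)
  draws <- draws + 1
}

draws   # number of draws that were needed
z       # last value drawn (at most 0.1 unless the guard stopped the loop)
```
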
  1. Question 5: Write a for loop to print the numbers 1 to 10
# type your code for Question 5 here, and Knit

Plot

x = seq(0, 2*pi, 2*pi/180)
y = sin(x)
plot(x,y)

Color, line type, line width and title

plot(x, y, type = "l", col = "blue", lwd = 4,
     main = "sin", xlab = "x", ylab = "y")

Add two lines and a point.

plot(x, y, type = "l", col = "blue", lwd = 4,
     main = "sin", xlab = "x", ylab = "y")
## a horizontal line at y = 0 and a vertical line at x = pi
abline(h = 0, v = pi, col = "gray60")
## mark the point where the two lines cross and label it
points(x = pi, y = 0, cex = 2, col = "darkred")
text(x = pi + 0.3, y = 0.1, labels = "Here", col = "red")
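
Other low-level functions add to an existing plot in the same way as abline(), points() and text(). As one possible illustration, the sketch below overlays a second curve with lines() and labels both with legend(). The cosine curve, the colors and the legend position are choices made for this example only.

```r
x <- seq(0, 2 * pi, length.out = 181)

plot(x, sin(x), type = "l", col = "blue", lwd = 2,
     main = "sin and cos", xlab = "x", ylab = "y")
# lines() draws on the existing plot instead of starting a new one
lines(x, cos(x), col = "darkred", lwd = 2, lty = 2)
legend("bottomleft", legend = c("sin(x)", "cos(x)"),
       col = c("blue", "darkred"), lwd = 2, lty = c(1, 2))
```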

  1. Question 6: Plot pokemon’s HP against Atk in green. Change the axis labels and the title.
# type your code for Question 6 here, and Knit

How to get help

#?plot

Learn by doing

Visit DataCamp or Try R to learn how to write basic R code. Both sites provide interactive lessons that will get you writing real code in minutes. They are a great place to make mistakes and test out new skills. You are told immediately when you go wrong and given a chance to fix your code.

A series of short R video tutorials.

Each tutorial is usually only 2 minutes long.

http://www.twotorials.com/

See what packages we used.

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10586)
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 
## [2] LC_CTYPE=Chinese (Simplified)_China.936   
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C                              
## [5] LC_TIME=Chinese (Simplified)_China.936    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] readr_1.0.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5        knitr_1.14         magrittr_1.5      
##  [4] munsell_0.4.3      colorspace_1.2-6   xtable_1.8-2      
##  [7] R6_2.1.2           stringr_1.0.0      plyr_1.8.4        
## [10] tools_3.3.1        grid_3.3.1         gtable_0.2.0      
## [13] statsr_0.1-1       htmltools_0.3.5    assertthat_0.1    
## [16] yaml_2.1.13        digest_0.6.9       tibble_1.1        
## [19] shiny_0.14         ggplot2_2.1.0      formatR_1.4       
## [22] curl_1.2           evaluate_0.9       mime_0.5          
## [25] rmarkdown_1.0.9002 stringi_1.1.1      scales_0.4.0      
## [28] httpuv_1.3.3