1 Goal of this course

This course introduces the data types and data structures of R. For example, we use c() to combine a set of numbers or strings into a vector.

A<-c("Taipei City","New Taipei City", "Taoyuan City", "Taichung City","Tainan City","Kaohsiung City")
print(A)
## [1] "Taipei City"     "New Taipei City" "Taoyuan City"    "Taichung City"  
## [5] "Tainan City"     "Kaohsiung City"
B<-c(0,1,2,3,4,5,6,7,8,9)
print(B)
##  [1] 0 1 2 3 4 5 6 7 8 9

2 Overview

  • One dimension
    • vector
    • factor
  • Two dimensions
    • matrix
    • data frame
  • Multiple dimensions
    • array
    • list
    • table

3 What is R?

R is free software for data analysis and statistical computing. It can be downloaded from here. It comes with a base package and lets users install additional packages from a repository such as the “Comprehensive R Archive Network” (CRAN).
Please make sure that you download the version compatible with your system.
RStudio is an IDE that extends the functions of R. For example, you can embed R code in an HTML or \(\LaTeX\) document. You can also import different kinds of datasets easily. Notice that it cannot run without R.

4 Data Type

4.1 numeric

Numeric data can be stored as numeric (double precision) or integer. For example:

x<-c(2, 4, 6, 8); x
## [1] 2 4 6 8
class(x)
## [1] "numeric"

The function class() returns the data type.

We can also use scientific notation to write numeric data:

y=c(1.1e+06); y
## [1] 1100000
class(y)
## [1] "numeric"

Integers are a subset of numbers. R stores integers in 32 bits regardless of the operating system, so the maximum integer is 2147483647, far smaller than the maximum double. A double, in particular a number with a decimal part, is stored as a sign (\(\pm\)), a mantissa, and an exponent. For example:

u<-as.integer(c(4)); class(u)
## [1] "integer"
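
The integer limit described above can be checked directly. The following is a minimal sketch that assumes only base R; the variable names big and over are illustrative.

```r
# Largest value that R's integer type can hold
big <- .Machine$integer.max
big

# Adding 1L overflows the integer type: R returns NA (with a warning)
over <- suppressWarnings(big + 1L)
over

# Adding a double instead promotes the result to double, so nothing overflows
big + 1
```

Because of this limit, very large counts are usually kept as numeric rather than integer.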

Another example:

a=1.356e+3
is.numeric(a)
## [1] TRUE
is.integer(a)
## [1] FALSE
options(digits=20)
pi
## [1] 3.141592653589793116
is.numeric(pi); is.integer(pi)
## [1] TRUE
## [1] FALSE

Integers can be added, subtracted, multiplied, and divided like other numbers. Note that division returns a numeric (double) result.

J<-c(3, 6, 33)
J<-as.integer(J)
J-1; J*2; J/3
## [1]  2  5 32
## [1]  6 12 66
## [1]  1  2 11

4.2 character

Characters are easy to understand, especially in data visualization. For example:

library(lattice)
s77<-data.frame(state.x77)
s77<-s77[order(s77$Population,decreasing=T),]
dotchart(s77$Population,
labels=row.names(s77), pch=16,cex=.7, xlab='Population')

Here we transform the matrix state.x77 into a data frame and then draw a dot chart with dotchart(), a function from the base graphics package (the lattice counterpart is dotplot()).

Numbers can be treated as characters:

char1<-c("1","2", "Do", "Re", "Mi"); char1
## [1] "1"  "2"  "Do" "Re" "Mi"

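Can we convert characters to numbers? Only strings that look like numbers, such as "1", convert directly with as.numeric(); other strings become NA. We can, however, convert a character vector to a factor and then convert the factor to its integer codes.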

LETTERS
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
LETTERS.f <- as.factor(LETTERS)
as.numeric(LETTERS.f)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26
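
To make the two conversion routes concrete, here is a small sketch in base R. It reuses the char1 vector from above; num1 and codes are illustrative names.

```r
char1 <- c("1", "2", "Do", "Re", "Mi")

# Direct conversion: number-like strings convert, the rest become NA
# (R also emits a warning, silenced here)
num1 <- suppressWarnings(as.numeric(char1))
num1

# Conversion through a factor: every string gets an integer level code,
# which is a label position, not the value written in the string
codes <- as.numeric(as.factor(char1))
codes
```

The level codes depend on how the levels are sorted, so they should be read as category identifiers rather than measurements.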

Please read the councilor data. Notice that we can specify the file encoding if the file is not encoded in UTF-8.

library(foreign)
df <- read.csv('councilor.csv', header=T, fileEncoding = 'BIG5', 
               colClasses = "character")
df
##    Year budget                         unit contracter open
## 1  2015    676 Hydraulic Engineering Office       台球  Yes
## 2  2016    673      New Construction Office       茂盛  Yes
## 3  2016    270      New Construction Office       冠君  Yes
## 4  2016    255      New Construction Office       金煌  Yes
## 5  2016    235      New Construction Office       聖鋒  Yes
## 6  2016    190      New Construction Office       福呈   No
## 7  2015    155  Park and Stree Light Office       盛吉  Yes
## 8  2016    154      New Construction Office       茂盛  Yes
## 9  2016    142      New Construction Office       冠君  Yes
## 10 2016    123      New Construction Office     未發包  Yes
as.numeric(df$unit)
##  [1] NA NA NA NA NA NA NA NA NA NA

To make the conversion easier, we change the argument to colClasses = "factor" in the read.csv() function.

library(foreign)
df <- read.csv('councilor.csv', header=T, fileEncoding = 'BIG5', 
               colClasses = "factor")
df
##    Year budget                         unit contracter open
## 1  2015    676 Hydraulic Engineering Office       台球  Yes
## 2  2016    673      New Construction Office       茂盛  Yes
## 3  2016    270      New Construction Office       冠君  Yes
## 4  2016    255      New Construction Office       金煌  Yes
## 5  2016    235      New Construction Office       聖鋒  Yes
## 6  2016    190      New Construction Office       福呈   No
## 7  2015    155  Park and Stree Light Office       盛吉  Yes
## 8  2016    154      New Construction Office       茂盛  Yes
## 9  2016    142      New Construction Office       冠君  Yes
## 10 2016    123      New Construction Office     未發包  Yes
as.numeric(df$unit)
##  [1] 1 2 2 2 2 2 3 2 2 2

4.3 logic

  • Data can be logical, that is, TRUE or FALSE. We use logical values to filter data. For example, let’s create a logical variable ok:
  • s77$ok<-s77$Income>5000; s77$ok
    ##  [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
    ## [12] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
    ## [23] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    ## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    ## [45]  TRUE  TRUE FALSE FALSE FALSE  TRUE
  • We can select cases of a data frame according to this variable:
  • options(digits=3)
    s77[s77$ok, ]
    ##              Population Income Illiteracy Life.Exp Murder HS.Grad Frost
    ## California        21198   5114        1.1     71.7   10.3    62.6    20
    ## Illinois          11197   5107        0.9     70.1   10.3    52.6   127
    ## New Jersey         7333   5237        1.1     70.9    5.2    52.5   115
    ## Maryland           4122   5299        0.9     70.2    8.5    52.3   101
    ## Connecticut        3100   5348        1.1     72.5    3.1    56.0   139
    ## North Dakota        637   5087        0.8     72.8    1.4    50.3   186
    ## Nevada              590   5149        0.5     69.0   11.5    65.2   188
    ## Alaska              365   6315        1.5     69.3   11.3    66.7   152
    ##                Area   ok
    ## California   156361 TRUE
    ## Illinois      55748 TRUE
    ## New Jersey     7521 TRUE
    ## Maryland       9891 TRUE
    ## Connecticut    4862 TRUE
    ## North Dakota  69273 TRUE
    ## Nevada       109889 TRUE
    ## Alaska       566432 TRUE
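
A logical vector can also be counted and located, not only used as a filter. The sketch below assumes only the built-in state.x77 data and rebuilds the data frame without sorting it, so row positions differ from the sorted s77 above.

```r
s77 <- data.frame(state.x77)
ok <- s77$Income > 5000

sum(ok)              # TRUE counts as 1, so this is the number of matching states
which(ok)            # row positions of the matching states
rownames(s77)[ok]    # names of the matching states
```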

    Please try to run the following code:

    library(car)
    head(Duncan)
    income.hi<-Duncan$income>70  
    Duncan[income.hi, ]

    How many “bc” cases are left in the filtered dataset?

    4.4 date

  • Date is a special type of data. First of all, we can show today’s date.
  • Sys.Date()
  • If we have a vector of characters, we can transform it into dates by specifying its original format. We can then transform the dates into other date formats.
  • v<-c("2/27/2018", "6/26/2018", "12/31/2018"); class(v)
    ## [1] "character"
    v.date<-as.Date(v, format='%m/%d/%Y'); class(v.date)
    ## [1] "Date"
    v.date
    ## [1] "2018-02-27" "2018-06-26" "2018-12-31"
    v.date <-format(v.date, "%b. %d, %Y"); v.date
    ## [1] " 2. 27, 2018" " 6. 26, 2018" "12. 31, 2018"

    Or

    v<-c("", "6/26/2018", "12/31/2018")
    as.Date(v, format='%m/%d/%Y')
    ## [1] NA           "2018-06-26" "2018-12-31"
  • We can also find the dates that fall a given number of days after an origin date.
  • S <- c(50, 100)
    as.Date(S, origin="2018-01-01")
    ## [1] "2018-02-20" "2018-04-11"
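
The same result can be reached with plain date arithmetic, because adding a number to a Date adds that many days. This is a minimal base R sketch; start and later are illustrative names.

```r
start <- as.Date("2018-01-01")

# Adding numbers of days to a Date returns Dates
later <- start + c(50, 100)
later

# seq() can also step by calendar units instead of days
seq(start, by = "month", length.out = 3)
```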
  • So what is the advantage of the date type? Can we stick with character or numeric? It depends.
  • For example, we create the following dataset with tibble():
  • library(dplyr)
    df <- tibble(date=c("2016", "2017", "2018"),
                        students=c(20, 22, 18),
                        teachers=c(12, 13, 20))
    df
    ## # A tibble: 3 x 3
    ##   date  students teachers
    ##   <chr>    <dbl>    <dbl>
    ## 1 2016       20.      12.
    ## 2 2017       22.      13.
    ## 3 2018       18.      20.
  • Then let’s draw a multiple-line chart to visualize the trends of students and teachers. In order to do so, we use melt() to create a long table according to the index variable, date.
  • library(reshape2); library(ggplot2)
    df2 <- melt(df, id.vars="date", variable.name="Group")
    df2$Group <-as.factor(df2$Group)
    df2
    ##   date    Group value
    ## 1 2016 students    20
    ## 2 2017 students    22
    ## 3 2018 students    18
    ## 4 2016 teachers    12
    ## 5 2017 teachers    13
    ## 6 2018 teachers    20
    ggplot(df2, aes(x=date, y=value, col=Group)) + 
         geom_line( size=1) +
         geom_point(shape=6, size=3)           

  • The figure looks okay, but there are no lines connecting the dots: because date is a character vector, ggplot2 treats it as discrete and puts each point in its own group. Let’s convert “date” to the Date type.
  • library(reshape2); library(ggplot2)
    df2 <- melt(df, id.vars="date", variable.name="Group")
    df2$Group <-as.factor(df2$Group)
    df2$date <- as.Date(df2$date, format="%Y")
    df2
    ##         date    Group value
    ## 1 2016-07-05 students    20
    ## 2 2017-07-05 students    22
    ## 3 2018-07-05 students    18
    ## 4 2016-07-05 teachers    12
    ## 5 2017-07-05 teachers    13
    ## 6 2018-07-05 teachers    20
    ggplot(df2, aes(x=date, y=value, col=Group)) + 
         geom_line( size=1) +
         geom_point(shape=16, size=3)           
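
Note that as.Date() with only "%Y" fills in the current month and day, which is why the output above shows 07-05. If a fixed day is preferred, one option is to pin each year to January 1. This is a sketch under that assumption; years and d are illustrative names.

```r
years <- c("2016", "2017", "2018")

# Build complete date strings so the result does not depend on today's date
d <- as.Date(paste0(years, "-01-01"))
d

# The year can still be recovered for axis labels
format(d, "%Y")
```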

  • We can try another dataset as follows.
  • library(dplyr)
    df <- tibble(date=c("2018/07/11", "2018/07/12", "2018/07/13"),
                        students=c(20, 22, 18),
                        teachers=c(2, 3, 4))
    df             
    ## # A tibble: 3 x 3
    ##   date       students teachers
    ##   <chr>         <dbl>    <dbl>
    ## 1 2018/07/11      20.       2.
    ## 2 2018/07/12      22.       3.
    ## 3 2018/07/13      18.       4.
    library(reshape2); library(ggplot2)
    df2 <- melt(df, id.vars="date", variable.name="Group")
    df2$Group <-as.factor(df2$Group)
    df2$date <- as.Date(df2$date, format="%Y/%m/%d")
    df2
    ##         date    Group value
    ## 1 2018-07-11 students    20
    ## 2 2018-07-12 students    22
    ## 3 2018-07-13 students    18
    ## 4 2018-07-11 teachers     2
    ## 5 2018-07-12 teachers     3
    ## 6 2018-07-13 teachers     4
    ggplot(df2, aes(x=date, y=value, col=Group)) + 
         geom_line( size=1) +
         geom_point(shape=16, size=3) +
         scale_x_date(date_labels = "%Y/%m/%d")

  • Here we use scale_x_date(date_labels = "%Y/%m/%d") to specify the date format of the x-axis.

    4.4.1 Difference in date

  • We can calculate the difference between two dates. For example, we want to know the difference between Xi’s and Tsai’s ages:
  • xi<-"1953-06-15" #Xi's birthday
    tsai<-"1956-08-31" #Tsai's birthday
  • We use difftime() to calculate the difference:
  • as.Date(c(xi,tsai))
    ## [1] "1953-06-15" "1956-08-31"
    difftime(tsai, xi)
    ## Time difference of 1173 days
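
Subtracting two Date objects gives the same difference, and difftime() accepts a units argument. The sketch below uses base R only; gap is an illustrative name, and dividing by 365.25 is only a rough conversion to years.

```r
xi <- as.Date("1953-06-15")
tsai <- as.Date("1956-08-31")

gap <- tsai - xi                       # a difftime object measured in days
gap
as.numeric(gap)                        # the plain number of days

difftime(tsai, xi, units = "weeks")    # the same difference in weeks
as.numeric(gap) / 365.25               # approximate difference in years
```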

    4.4.2 Formats of Date

  • Here are some formats of date:
  • symbol  meaning                 example
    %d      day of month as number  01-31
    %a      abbreviated weekday     Mon
    %A      unabbreviated weekday   Monday
    %m      month as number         01-12
    %b      abbreviated month       Jan
    %B      unabbreviated month     January
    %y      2-digit year            18
    %Y      4-digit year            2018
  • format() can transform a date into other formats.
  • Today<-Sys.Date(); Today
    ## [1] "2018-07-05"
    to_day<-format(Today, format='%Y-%b-%d'); to_day
    ## [1] "2018- 7-05"
    to_day<-format(Today, format='%m %d (%a), %y'); to_day
    ## [1] "07 05 (四), 18"

    5 Data structure

    5.1 one-dimension

    5.1.1 vector

  • The vector is the most commonly used data structure.
  • example<-c(0,1,2,3,4)
    print(example)
    ## [1] 0 1 2 3 4
  • The assignment arrow can also point to the right.
  • c(2,4,6,8)->A
  • R provides built-in vectors of the lowercase and uppercase English letters.
  • c(letters)
    ##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
    ## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
    c(LETTERS)
    ##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
    ## [18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
  • We can directly do arithmetic within a vector.
  • j<-c(2*2,2*9, 10-2, sqrt(25)); j
    ## [1]  4 18  8  5
  • We can multiply, divide, add, or subtract a vector by a certain number.
  • R<-c(30, 40, 50); A <- R/5; A; A^2 
    ## [1]  6  8 10
    ## [1]  36  64 100

    5.1.2 Attribute

  • We can use attr() function to store metadata about the object. Attributes can be thought of as a named list (with unique names). Attributes can be accessed with attr() or all at once (as a list) with attributes().
  • attr(mtcars$wt, "myattribute") <-"Weights"
    attr(mtcars$wt, "mylabel") <-"Weights of Cars"
    attr(mtcars$wt, "myattribute")
    ## [1] "Weights"
    attr(mtcars$wt, "mylabel")
    ## [1] "Weights of Cars"
    attributes(mtcars$wt)
    ## $myattribute
    ## [1] "Weights"
    ## 
    ## $mylabel
    ## [1] "Weights of Cars"
  • We can give names to elements in a vector.
  • U <- c(60, 70, 75, 82)
    attr(U, "myattribute")<-"humanity"
    names(U) <-c("Taichung", "Chiayi", "Tainan", "Kaohsiung")
    U
    ##  Taichung    Chiayi    Tainan Kaohsiung 
    ##        60        70        75        82 
    ## attr(,"myattribute")
    ## [1] "humanity"
    barchart(U)

  • We can combine vectors with vectors.
  • RA <- c(R, A)
    RA
    ## [1] 30 40 50  6  8 10
  • If two vectors are of unequal length, the shorter one will be recycled to match the longer vector. R issues a warning when the longer length is not a multiple of the shorter one, as in this example.
  • M1 <- c(1:3); M1
    ## [1] 1 2 3
    M2 <- rep(10, 4); M2
    ## [1] 10 10 10 10
    M1 + M2
    ## [1] 11 12 13 11
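
The recycling rule is easier to see when one length is an exact multiple of the other. This is a small base R sketch; short, long, and res are illustrative names.

```r
short <- c(1, 2)
long <- c(10, 20, 30, 40)

# Lengths 2 and 4: short is reused as 1, 2, 1, 2 and no warning is raised
short + long

# Lengths 3 and 4: the result is still computed, but R warns about it;
# here the warning message is captured as a string instead
res <- tryCatch(c(1, 2, 3) + long, warning = function(w) conditionMessage(w))
res
```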

    5.1.3 factor

    Data can be categorical. Categories look like characters but are stored as factors.

    library(ggplot2)
    class(diamonds$cut)
    ## [1] "ordered" "factor"
    table(diamonds$cut)
    ## 
    ##      Fair      Good Very Good   Premium     Ideal 
    ##      1610      4906     12082     13791     21551
  • In this example, cut is an ordered factor. We can try to do it on our own.
  • H<-c("Hi", "Lo", "Lo", "Middle", "Middle", "Middle")
    table(H)
    ## H
    ##     Hi     Lo Middle 
    ##      1      2      3
    H.o <- ordered(H, levels=c("Lo", "Middle", "Hi"))
    table(H.o)
    ## H.o
    ##     Lo Middle     Hi 
    ##      2      3      1
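
Level names must match the data exactly, including case; a value that matches no level becomes NA. The sketch below illustrates this with base R, and also shows that an ordered factor supports comparisons. The names bad and good are illustrative.

```r
H <- c("Hi", "Lo", "Lo", "Middle", "Middle", "Middle")

# "middle" does not match "Middle", so those three values turn into NA
bad <- factor(H, levels = c("Lo", "middle", "Hi"), ordered = TRUE)
sum(is.na(bad))

# With matching levels, the order Lo < Middle < Hi can be used in comparisons
good <- factor(H, levels = c("Lo", "Middle", "Hi"), ordered = TRUE)
good < "Hi"
```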
  • A factor is useful because its labels are readable in tabulations.
  • table(diamonds$cut, diamonds$color)
    ##            
    ##                D    E    F    G    H    I    J
    ##   Fair       163  224  312  314  303  175  119
    ##   Good       662  933  909  871  702  522  307
    ##   Very Good 1513 2400 2164 2299 1824 1204  678
    ##   Premium   1603 2337 2331 2924 2360 1428  808
    ##   Ideal     2834 3903 3826 4884 3115 2093  896
  • It is also useful when we visualize our data.
  • library(car)
    library(lattice)
    plot( Chile$vote ~ Chile$sex, xlab="Sex", ylab="Vote")

    5.1.3.1 Conversion of factor to numeric

  • We can convert a factor to numeric data. The result is the integer code of each level.
  • library(car)
    data(Chile)
    class(Chile$region)
    ## [1] "factor"
    table(Chile$region)
    ## 
    ##   C   M   N   S  SA 
    ## 600 100 322 718 960
    y<-as.numeric(Chile$region)
    table(y)
    ## y
    ##   1   2   3   4   5 
    ## 600 100 322 718 960
  • We can also create a new numeric variable by recoding the levels of a factor.
  • Chile$gender[Chile$sex=="F"]<-1
    Chile$gender[Chile$sex=="M"]<-2
    table(Chile$gender)
    ## 
    ##    1    2 
    ## 1379 1321
    table(Chile$sex)
    ## 
    ##    F    M 
    ## 1379 1321
  • Please try to convert the cut variable from factor to numeric.
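
One caution applies when the labels of a factor are themselves numbers: as.numeric() returns the level codes, not the labels. A minimal base R sketch, with f as an illustrative name:

```r
f <- factor(c("10", "20", "10", "30"))

as.numeric(f)                  # level codes: 1 2 1 3
as.numeric(as.character(f))    # original values: 10 20 10 30
```

Converting to character first is a common way to recover the written values.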

    5.2 Two-dimension

    5.2.1 matrix

  • We can create a matrix to represent two dimensions of data. For example, we can create a \(3\times 3\) matrix as follows:
  • m<-matrix(c(1:9), nrow=3, ncol=3); m
    ##      [,1] [,2] [,3]
    ## [1,]    1    4    7
    ## [2,]    2    5    8
    ## [3,]    3    6    9
    n<-matrix(c(1:6), nrow=3, ncol=2); n
    ##      [,1] [,2]
    ## [1,]    1    4
    ## [2,]    2    5
    ## [3,]    3    6
  • We can multiply two matrices if the number of columns of the first matrix equals the number of rows of the second matrix:
  • m%*%n
    ##      [,1] [,2]
    ## [1,]   30   66
    ## [2,]   36   81
    ## [3,]   42   96
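
The dimension rule can be checked in code. The sketch below rebuilds m and n from above in base R; the name ok and the returned string "non-conformable" are illustrative choices.

```r
m <- matrix(1:9, nrow = 3, ncol = 3)
n <- matrix(1:6, nrow = 3, ncol = 2)

dim(m %*% n)    # a 3 x 3 matrix times a 3 x 2 matrix gives a 3 x 2 matrix

m * m           # element-wise product, which is not the matrix product

# Reversing the order breaks the rule (2 columns versus 3 rows), so R raises
# an error; it is caught here and replaced by a short label
ok <- tryCatch(n %*% m, error = function(e) "non-conformable")
ok
```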
  • We can obtain the diagonal of a matrix with the diag() function:
  • diag(m)
    ## [1] 1 5 9
    diag(n)
    ## [1] 1 5
  • Sometimes we need to transpose the matrix:
  • t(m)
    ##      [,1] [,2] [,3]
    ## [1,]    1    2    3
    ## [2,]    4    5    6
    ## [3,]    7    8    9
  • We can replace elements of a matrix with other values by specifying the row and column indices:

    m
    ##      [,1] [,2] [,3]
    ## [1,]    1    4    7
    ## [2,]    2    5    8
    ## [3,]    3    6    9
    m1<-m
    m1[2,3]<-0
    m1[3,]<-99
    m1
    ##      [,1] [,2] [,3]
    ## [1,]    1    4    7
    ## [2,]    2    5    0
    ## [3,]   99   99   99
  • We can arrange the same data into matrices of different dimensions:
  • a <- matrix(c(1:20), nrow=2, ncol=10)
    b <- matrix(c(1:20), nrow=5, ncol=4)
    a; b
    ##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    ## [1,]    1    3    5    7    9   11   13   15   17    19
    ## [2,]    2    4    6    8   10   12   14   16   18    20
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    1    6   11   16
    ## [2,]    2    7   12   17
    ## [3,]    3    8   13   18
    ## [4,]    4    9   14   19
    ## [5,]    5   10   15   20

    5.2.2 data frame

  • A data frame is like a matrix, but its columns can hold different data types. It combines vectors of equal length and assigns a name to each of them.
  • options(digits=4)
    R1<-c(170, 175, 166, 172, 165, 157, 167, 167, 
            156, 160)
    R2<-c("F","M","M","M","F","F","F","F","M","F")
    R3<-R1/10 + 42
    
    R123<-data.frame(height=R1,gender=R2,weight=R3); R123
    ##    height gender weight
    ## 1     170      F   59.0
    ## 2     175      M   59.5
    ## 3     166      M   58.6
    ## 4     172      M   59.2
    ## 5     165      F   58.5
    ## 6     157      F   57.7
    ## 7     167      F   58.7
    ## 8     167      F   58.7
    ## 9     156      M   57.6
    ## 10    160      F   58.0
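  • The output below shows that gender is a factor, which was the default in R versions before 4.0.0; since R 4.0.0, strings stay as characters by default. We can ask R explicitly not to convert it.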
  • class(R123$gender)
    ## [1] "factor"
    R123<-data.frame(height=R1,gender=R2,weight=R3, stringsAsFactors = F)
    str(R123)
    ## 'data.frame':    10 obs. of  3 variables:
    ##  $ height: num  170 175 166 172 165 157 167 167 156 160
    ##  $ gender: chr  "F" "M" "M" "M" ...
    ##  $ weight: num  59 59.5 58.6 59.2 58.5 57.7 58.7 58.7 57.6 58
  • Each column must have the same number of elements, so each row has the same number of columns. The following functions describe a data frame:
    • nrow(): number of rows
    • ncol(): number of columns
    • dim(): number of rows and columns
    • head(): the first six rows
    • str(): information on all variables
  • If we want to know the number of observations in AMSsurvey, we can use nrow():
  • library(car)
    nrow(AMSsurvey)
    ## [1] 24
  • If we want to change the names of variables of a data frame, we can use colnames() function:
  • newsurvey <- AMSsurvey
    colnames(newsurvey)<-c("v1","v2","v3", "v4", "v5"); head(newsurvey)
    ##      v1     v2 v3  v4  v5
    ## 1 I(Pu)   Male US 132 148
    ## 2 I(Pu) Female US  35  40
    ## 3 I(Pr)   Male US  87  63
    ## 4 I(Pr) Female US  20  22
    ## 5    II   Male US  96 161
    ## 6    II Female US  47  53

    5.3 Multi-dimension

    5.3.1 array

  • An array can contain more than one matrix. An array with only one matrix holds the same data as that matrix.
  • Array1 <- array(1:12, dim = c(2, 6, 1)); Array1
    ## , , 1
    ## 
    ##      [,1] [,2] [,3] [,4] [,5] [,6]
    ## [1,]    1    3    5    7    9   11
    ## [2,]    2    4    6    8   10   12
  • We can create an array that contains two \(2\times 4\) matrices:
  • Array2 <- array(16:1, dim = c(2, 4, 2)); Array2
    ## , , 1
    ## 
    ##      [,1] [,2] [,3] [,4]
    ## [1,]   16   14   12   10
    ## [2,]   15   13   11    9
    ## 
    ## , , 2
    ## 
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    8    6    4    2
    ## [2,]    7    5    3    1
  • We can subset the array by specifying one or more of the three dimensions:
  • A12<-Array2[,,2]; A12
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    8    6    4    2
    ## [2,]    7    5    3    1
    A111 <- Array2[1, 1, 1]; A111
    ## [1] 16
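
Other index combinations work the same way, and apply() can summarize along a dimension. This base R sketch rebuilds Array2 from above; slice_sums is an illustrative name.

```r
Array2 <- array(16:1, dim = c(2, 4, 2))

dim(Array2)          # rows, columns, number of matrices
Array2[2, 3, 1]      # row 2, column 3 of the first matrix
Array2[, 1, ]        # first column of every matrix

# Sum of all elements within each matrix (the third dimension)
slice_sums <- apply(Array2, 3, sum)
slice_sums
```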

    5.3.2 list

  • If we have vectors of unequal length, we can use list to store the data.
  • options(digits=4)
    birthday <- c("1981/03/15", "1983/04/20", "1984/01/18")
    listA<-list(R123, m, birthday); listA
    ## [[1]]
    ##    height gender weight
    ## 1     170      F   59.0
    ## 2     175      M   59.5
    ## 3     166      M   58.6
    ## 4     172      M   59.2
    ## 5     165      F   58.5
    ## 6     157      F   57.7
    ## 7     167      F   58.7
    ## 8     167      F   58.7
    ## 9     156      M   57.6
    ## 10    160      F   58.0
    ## 
    ## [[2]]
    ##      [,1] [,2] [,3]
    ## [1,]    1    4    7
    ## [2,]    2    5    8
    ## [3,]    3    6    9
    ## 
    ## [[3]]
    ## [1] "1981/03/15" "1983/04/20" "1984/01/18"
    listA[[3]]
    ## [1] "1981/03/15" "1983/04/20" "1984/01/18"
  • To organize the list, we can assign a name to each element, whether it is a vector, an array, or a matrix.
  • options(digits=4)
    listB<-list(data=R123, vec=m, char=birthday); 
    listB[["data"]]
    ##    height gender weight
    ## 1     170      F   59.0
    ## 2     175      M   59.5
    ## 3     166      M   58.6
    ## 4     172      M   59.2
    ## 5     165      F   58.5
    ## 6     157      F   57.7
    ## 7     167      F   58.7
    ## 8     167      F   58.7
    ## 9     156      M   57.6
    ## 10    160      F   58.0
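
Named elements can be reached in several ways that return different things. The sketch below uses a small hypothetical list, lst, rather than listB, and assumes only base R.

```r
lst <- list(num = c(1, 2, 3), chr = c("a", "b"))

lst$chr              # the element itself, same as lst[["chr"]]
class(lst["chr"])    # single brackets return a list of length one
lengths(lst)         # elements may have different lengths
```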
    Please combine `c('a','b','c')`, `c(1,2,3,4)`, and `c('2018-01-01', '2018-04-04', '2018-04-05', '2018-06-18', '2018-10-10')` as a list.

    5.3.3 table

    Titanic is a table with four dimensions (Class, Sex, Age, and Survived). It is printed as four two-way tables.

    class(Titanic); Titanic
    ## [1] "table"
    ## , , Age = Child, Survived = No
    ## 
    ##       Sex
    ## Class  Male Female
    ##   1st     0      0
    ##   2nd     0      0
    ##   3rd    35     17
    ##   Crew    0      0
    ## 
    ## , , Age = Adult, Survived = No
    ## 
    ##       Sex
    ## Class  Male Female
    ##   1st   118      4
    ##   2nd   154     13
    ##   3rd   387     89
    ##   Crew  670      3
    ## 
    ## , , Age = Child, Survived = Yes
    ## 
    ##       Sex
    ## Class  Male Female
    ##   1st     5      1
    ##   2nd    11     13
    ##   3rd    13     14
    ##   Crew    0      0
    ## 
    ## , , Age = Adult, Survived = Yes
    ## 
    ##       Sex
    ## Class  Male Female
    ##   1st    57    140
    ##   2nd    14     80
    ##   3rd    75     76
    ##   Crew  192     20
    Titanic[, , 1, 1]
    ##       Sex
    ## Class  Male Female
    ##   1st     0      0
    ##   2nd     0      0
    ##   3rd    35     17
    ##   Crew    0      0
  • We can use prop.table() to obtain the conditional proportions of a table by row (margin 1) or by column (margin 2), and margin.table() to obtain the marginal totals.
  • options(digits=4)
    g<-Titanic[ , , 2, 2]; g
    ##       Sex
    ## Class  Male Female
    ##   1st    57    140
    ##   2nd    14     80
    ##   3rd    75     76
    ##   Crew  192     20
    prop.table(g, 1)
    ##       Sex
    ## Class     Male  Female
    ##   1st  0.28934 0.71066
    ##   2nd  0.14894 0.85106
    ##   3rd  0.49669 0.50331
    ##   Crew 0.90566 0.09434
    prop.table(g, 2)
    ##       Sex
    ## Class     Male  Female
    ##   1st  0.16864 0.44304
    ##   2nd  0.04142 0.25316
    ##   3rd  0.22189 0.24051
    ##   Crew 0.56805 0.06329
    margin.table(g,1)
    ## Class
    ##  1st  2nd  3rd Crew 
    ##  197   94  151  212
    margin.table(g,2)
    ## Sex
    ##   Male Female 
    ##    338    316
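
The two functions can be combined to get marginal proportions. The following is a sketch using the built-in Titanic table; counts and share are illustrative names.

```r
g <- Titanic[, , 2, 2]            # adult survivors by Class and Sex

counts <- margin.table(g, 1)      # marginal totals for Class
share <- prop.table(counts)       # marginal proportions for Class
round(share, 3)

# Each row of the row-conditional proportions sums to 1
rowSums(prop.table(g, 1))
```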


    6 Basic Operation

    6.1 Operation of Vector

  • Because a vector can represent coordinates, the order of its elements is important, especially when we do arithmetic with two vectors.
  • First of all, we do arithmetic between a vector and a scalar.
  • X<-c(10,20,30,40,50,60); Sca<-10
    X+Sca
    ## [1] 20 30 40 50 60 70
    X/Sca
    ## [1] 1 2 3 4 5 6
  • Then we try the operation of two vectors:
  • Y<-c(5,10,6,8,25,6)
    X/Y; X*Y
    ## [1]  2  2  5  5  2 10
    ## [1]   50  200  180  320 1250  360

    6.2 Mathematical functions

    Please see more details here

    • sqrt(x): square root
    • exp(x): exponential of x
    • log(x, y): logarithm of x to base y; the natural logarithm of x if y is not specified
    a=6
    exp(a); log(a)
    ## [1] 403.4
    ## [1] 1.792
    log(exp(a)); exp(log(a))
    ## [1] 6
    ## [1] 6
    • factorial(x): factorial of x. For example
    1*2*3
    ## [1] 6
    factorial(3)
    ## [1] 6
    • abs(x): absolute value of x
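
The second argument of log() listed above can be illustrated with a few base R calls. Results are best compared with all.equal() because of floating-point rounding.

```r
log(8, base = 2)     # logarithm of 8 to base 2
log10(1000)          # shortcut for base 10
log(exp(2))          # natural logarithm undoes exp()
abs(-3:3)            # absolute values of a whole vector
```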

    6.3 Rounding of numbers

    Please see more details here

    • round: rounds the values in its first argument to the specified number of decimal places
    • floor: takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x
    • ceiling: takes a single numeric argument x and returns a numeric vector containing the smallest integers not less than the corresponding elements of x
    a1<-c(2.54, 3.111, 10.999)
    round(a1, digits=2)
    ## [1]  2.54  3.11 11.00
    floor(a1)
    ## [1]  2  3 10
    ceiling(a1)
    ## [1]  3  4 11
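
Negative numbers and exact halves show how these functions differ. The sketch below uses base R; v is an illustrative name, and trunc() and signif() are additional base functions not discussed above.

```r
v <- c(-2.5, -0.5, 0.5, 1.5, 2.5)

floor(v)            # always rounds down, toward minus infinity
ceiling(v)          # always rounds up, toward plus infinity
trunc(v)            # drops the decimal part, rounding toward zero
round(v)            # exact halves go to the nearest even integer
signif(123456, 2)   # keeps two significant digits
```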

    7 Exercises

    1. \(\text{log}(\frac{14}{5})=\)?

    2. \(1\times 2\times 3\times \dots \times 8=\)?

    3. In the Titanic data, please show which class the surviving children were traveling in.

    4. Please use weekdays() function to show what day it is today.

    5. How many observations in mtcars have mpg greater than or equal to 21?

    6. Please analyze whether admission is related to gender in department A in UCBAdmissions.

    7. Please count the number of letters in English.

    8. Please transform today’s temperature to Fahrenheit.

    9. Please create a matrix with the diagonal (1,1,1).

    10. Please count the number of days between Jan. 1st and the day you do this assignment.