1 Goal

  • This course introduces control flow and loops. R can repeat commands and stop them according to conditions that we specify, which is very useful for data generation and data cleaning. For example, suppose we randomly draw ten people and want to decide whether each of them is eligible for our survey based on age. We can write the code as follows:
  • age<-c(34, 12, 19, 21, 22, 30, 16, 18, 17, 39)
     f<-function(x){
      interview<-ifelse(x>=18, "Yes", "No")
      return(data.frame(id=seq_along(x), Age=x, Interview=interview, row.names=NULL))
     }
    f(age)
    ##    id Age Interview
    ## 1   1  34       Yes
    ## 2   2  12        No
    ## 3   3  19       Yes
    ## 4   4  21       Yes
    ## 5   5  22       Yes
    ## 6   6  30       Yes
    ## 7   7  16        No
    ## 8   8  18       Yes
    ## 9   9  17        No
    ## 10 10  39       Yes
    #knitr::kable(f(age))
  • Subsetting data is also very useful for computing, because it lets us select part of the data for analysis. Control flow and loops rely on subsetting as well.

  • 2 Subsetting

  • We will discuss subsetting of vectors, lists, arrays, and data frames.
  • 2.1 vector

  • We use an index or the which() function to subset vectors.
  • 2.1.1 index

  • We can write a variable as \(x_{i}\), where the subscript \(i\), called the index, refers to its \(i\)th element. We can subset a vector by indexing it:
  • state.name[1]
    ## [1] "Alabama"
    state.abb[1:4]
    ## [1] "AL" "AK" "AZ" "AR"
    head(sleep)
    ##   extra group ID
    ## 1   0.7     1  1
    ## 2  -1.6     1  2
    ## 3  -0.2     1  3
    ## 4  -1.2     1  4
    ## 5  -0.1     1  5
    ## 6   3.4     1  6
    sleep$extra[nrow(sleep)]
    ## [1] 3.4

    2.1.2 which()

  • The which() function returns the positions of the elements of a vector that satisfy a condition. For example, we can find out which state abbreviations begin with “B” or “C”:
  • state.abb[1:10]
    ##  [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA"
    state.abb.abb<-substr(state.abb, 1,1)
    state.abb[which(state.abb.abb=="B")]
    ## character(0)
    state.abb[which(state.abb.abb=="C")]
    ## [1] "CA" "CO" "CT"
  • Here we use the substr(A, i, j) function to grab the first letter of each state abbreviation. A is the vector, i is the position of the first character to extract, and j is the position of the last.
  • We save the first letter of each abbreviation in a vector, state.abb.abb. Then we use the which() function to find the matching positions and use them to index the vector of abbreviations, state.abb.
  • Please try to find out how many states are larger than 10,000 square miles
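The way which() works can be sketched with a short made-up vector. The vector `heights` and its values are hypothetical and serve only as an illustration; which() and logical indexing are the tools described above.

```r
# Hypothetical data: heights in cm (not from the course material)
heights <- c(150, 182, 167, 175, 190)

tall <- heights > 170        # logical vector: FALSE TRUE FALSE TRUE TRUE
pos  <- which(tall)          # positions of the TRUE values: 2 4 5

heights[pos]                 # subset by position
heights[tall]                # subset by the logical vector, same result
length(pos)                  # number of elements that meet the condition
```

Counting the positions with length() is one way to answer a "how many" question about a vector.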
  • 2.2 list

  • Let’s first construct a list that contains single numbers, a character vector, and a numeric vector.
  • ListA<-list(height=90, width=120, string=state.abb[1:2], data=state.area)
    ListA
    ## $height
    ## [1] 90
    ## 
    ## $width
    ## [1] 120
    ## 
    ## $string
    ## [1] "AL" "AK"
    ## 
    ## $data
    ##  [1]  51609 589757 113909  53104 158693 104247   5009   2057  58560  58876
    ## [11]   6450  83557  56400  36291  56290  82264  40395  48523  33215  10577
    ## [21]   8257  58216  84068  47716  69686 147138  77227 110540   9304   7836
    ## [31] 121666  49576  52586  70665  41222  69919  96981  45333   1214  31055
    ## [41]  77047  42244 267339  84916   9609  40815  68192  24181  56154  97914

    2.2.1 index

  • Indexing a list with single brackets returns a sub-list that contains the selected element.
  • ListA[c(1)]
    ## $height
    ## [1] 90
  • We can extract either a sub-list, with single brackets, or the element itself, with double brackets.
  • ListA[c(3)]
    ## $string
    ## [1] "AL" "AK"
    ListA[[3]]
    ## [1] "AL" "AK"
  • An element can also be indexed by its name.
  • ListA["data"]
    ## $data
    ##  [1]  51609 589757 113909  53104 158693 104247   5009   2057  58560  58876
    ## [11]   6450  83557  56400  36291  56290  82264  40395  48523  33215  10577
    ## [21]   8257  58216  84068  47716  69686 147138  77227 110540   9304   7836
    ## [31] 121666  49576  52586  70665  41222  69919  96981  45333   1214  31055
    ## [41]  77047  42244 267339  84916   9609  40815  68192  24181  56154  97914
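The difference between single and double brackets can be checked with class(). The following is a minimal sketch on a small list modeled on ListA; the name `lst` is only illustrative.

```r
# A small list modeled on ListA above
lst <- list(height = 90, string = c("AL", "AK"))

sub  <- lst["string"]     # single brackets: a list of length 1
elem <- lst[["string"]]   # double brackets: the character vector itself
same <- lst$string        # $ is shorthand for [[ ]] with a name

class(sub)                # "list"
class(elem)               # "character"
elem[2]                   # "AK": index into the extracted vector
```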

    2.3 Matrix and Array

    2.3.1 Index

  • A typical matrix is set up as follows:
  • \[\begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1c} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2c} \\ \ldots \\ x_{r1} & x_{r2} & x_{r3} & \dots & x_{rc} \end{bmatrix}\]
  • \(x_{11},\ldots, x_{r1}\) are in the same column, and \(x_{11},\ldots, x_{1c}\) are in the same row. So \(x_{,1}\) represents all elements in the 1st column, and \(x_{1,}\) means all elements in the 1st row.
  • Suppose we have a \(3\times 3\) matrix. We can index one or more elements, or replace them.
  • m1<-matrix(c(1:9), 3, 3)
    m1
    ##      [,1] [,2] [,3]
    ## [1,]    1    4    7
    ## [2,]    2    5    8
    ## [3,]    3    6    9
    print(m1[2,2]) #1
    ## [1] 5
    print(m1[c(1:2)]) #2
    ## [1] 1 2
    print(m1[c(1,2),c(1,2)]) #3
    ##      [,1] [,2]
    ## [1,]    1    4
    ## [2,]    2    5
    print(m1[c(1,3),c(1,3)]) #4
    ##      [,1] [,2]
    ## [1,]    1    7
    ## [2,]    3    9
    print(m1[,1]) #5
    ## [1] 1 2 3
    m1[3,3]<-"Hello" #6 (this coerces the whole matrix to character)
    m1
    ##      [,1] [,2] [,3]   
    ## [1,] "1"  "4"  "7"    
    ## [2,] "2"  "5"  "8"    
    ## [3,] "3"  "6"  "Hello"
  • Notice that the index vectors \(c(1,2), c(1,2)\) select rows 1 and 2 and columns 1 and 2, that is, the elements at (1,1),(2,1),(1,2),(2,2). That returns:
  • \[\begin{bmatrix} 1 & 4 \\ 2 & 5 \end{bmatrix}\]
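A single index without a comma treats the matrix as one long vector filled column by column, which is why m1[c(1:2)] returns the first two entries of column 1. The sketch below illustrates this on a fresh matrix `m` (a hypothetical name), and also shows the drop = FALSE argument, which keeps the result as a matrix.

```r
m <- matrix(1:9, 3, 3)

m[5]                      # linear index 5 is row 2, column 2
m[2, 2]                   # same element
m[c(1:2)]                 # first two elements of column 1

col1_vec <- m[, 1]                 # dimensions dropped: a plain vector
col1_mat <- m[, 1, drop = FALSE]   # kept as a 3 x 1 matrix
dim(col1_mat)
```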

    2.3.2 which()

  • We can use which() to select the data in a matrix or array that meet our condition. With arr.ind = TRUE, which() returns the row and column positions of the matching elements as a matrix. We can then use that matrix to index the original array.
  • # Do not name the array T: that masks the shorthand for TRUE used in arr.ind
    arr <- array(1:20, dim=c(4,5)); arr
    ##      [,1] [,2] [,3] [,4] [,5]
    ## [1,]    1    5    9   13   17
    ## [2,]    2    6   10   14   18
    ## [3,]    3    7   11   15   19
    ## [4,]    4    8   12   16   20
    ok <- which(arr >= 17, arr.ind = TRUE)
    ok
    ##      row col
    ## [1,]   1   5
    ## [2,]   2   5
    ## [3,]   3   5
    ## [4,]   4   5
    arr[ok]
    ## [1] 17 18 19 20


    2.4 Data Frame

    2.4.1 Index

  • We use the sleep data to illustrate how to subset a data frame, which works much like subsetting an array.
  • data(sleep)
    names(sleep)
    ## [1] "extra" "group" "ID"
    sleep[1:3, ]
    ##   extra group ID
    ## 1   0.7     1  1
    ## 2  -1.6     1  2
    ## 3  -0.2     1  3
    sleep[, "extra"]
    ##  [1]  0.7 -1.6 -0.2 -1.2 -0.1  3.4  3.7  0.8  0.0  2.0  1.9  0.8  1.1  0.1
    ## [15] -0.1  4.4  5.5  1.6  4.6  3.4

    2.4.2 %in%

  • If the data frame has an ID column, we can use \(\%\textrm{in}\%\) to subset the data.
  • head(sleep)
    ##   extra group ID
    ## 1   0.7     1  1
    ## 2  -1.6     1  2
    ## 3  -0.2     1  3
    ## 4  -1.2     1  4
    ## 5  -0.1     1  5
    ## 6   3.4     1  6
    sleep[sleep$ID %in% c(1,2,3), ] #select by ID
    ##    extra group ID
    ## 1    0.7     1  1
    ## 2   -1.6     1  2
    ## 3   -0.2     1  3
    ## 11   1.9     2  1
    ## 12   0.8     2  2
    ## 13   1.1     2  3
    sleep[sleep$ID %in% c(1,2,3) & sleep$group %in% c(1), ] # two conditions
    ##   extra group ID
    ## 1   0.7     1  1
    ## 2  -1.6     1  2
    ## 3  -0.2     1  3
  • \(\%\textrm{in}\%\) means “belongs to.” For example, humans can hear pitches that range from 20 Hz to 20,000 Hz. A pitch above 20 kHz is called “ultrasound,” and one below 20 Hz is called “infrasound.” We can check how many elements of one vector belong to another:
  • A<-c(4000: 6000); B<-c(3000:5000); C <- c(3000:10000)
    table(A%in%B)
    ## 
    ## FALSE  TRUE 
    ##  1000  1001
    table(A%in%B%in%C) # tests the logical vector A%in%B against C, so nothing matches
    ## 
    ## FALSE 
    ##  2001
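To count the elements that belong to two vectors at once, one option is to run two separate membership tests and join them with &. This is a minimal sketch that redefines A, B, and C with the same values as above so that it runs on its own; the name `in_both` is illustrative.

```r
A <- c(4000:6000); B <- c(3000:5000); C <- c(3000:10000)

in_both <- A %in% B & A %in% C   # TRUE where an element of A is in B and in C
table(in_both)
sum(in_both)                     # number of elements of A found in both
```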

    2.4.3 which()

  • which() can take more than one condition, connected by “&” (and) or “|” (or). For a data frame, it finds every row that matches our conditions, so we can subset the data frame based on multiple conditions.
  • cond <- which(sleep$extra>0.5 & sleep$group==1)
    sleep[cond, ]
    ##    extra group ID
    ## 1    0.7     1  1
    ## 6    3.4     1  6
    ## 7    3.7     1  7
    ## 8    0.8     1  8
    ## 10   2.0     1 10
  • \(\%\textrm{in}\%\) and which() can return the same results:
  • mtcars[mtcars$cyl %in% c(4) & mtcars$hp %in% c(90: max(mtcars$hp)), c(1:6)]
    ##                mpg cyl  disp  hp drat   wt
    ## Datsun 710    22.8   4 108.0  93 3.85 2.32
    ## Merc 230      22.8   4 140.8  95 3.92 3.15
    ## Toyota Corona 21.5   4 120.1  97 3.70 2.46
    ## Porsche 914-2 26.0   4 120.3  91 4.43 2.14
    ## Lotus Europa  30.4   4  95.1 113 3.77 1.51
    ## Volvo 142E    21.4   4 121.0 109 4.11 2.78
    mtcars[which(mtcars$cyl==4 & mtcars$hp >= 90) , c(1:6)]
    ##                mpg cyl  disp  hp drat   wt
    ## Datsun 710    22.8   4 108.0  93 3.85 2.32
    ## Merc 230      22.8   4 140.8  95 3.92 3.15
    ## Toyota Corona 21.5   4 120.1  97 3.70 2.46
    ## Porsche 914-2 26.0   4 120.3  91 4.43 2.14
    ## Lotus Europa  30.4   4  95.1 113 3.77 1.51
    ## Volvo 142E    21.4   4 121.0 109 4.11 2.78
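One practical difference between indexing with a logical condition and indexing with which() shows up when the data contain missing values. The vector `hp` below is hypothetical and is used only to sketch that difference.

```r
# Hypothetical horsepower values with one missing entry
hp <- c(93, NA, 110, 62)

by_logical <- hp[hp > 90]          # the NA in the condition yields an NA element
by_which   <- hp[which(hp > 90)]   # which() keeps only positions that are TRUE

by_logical
by_which
```

When a column may contain NA, which() therefore avoids rows of missing values in the subset.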


    3 Conditional Element Selection

    3.1 ifelse()

  • When we want to recode elements according to a test or condition, we can use ifelse().
  • Suppose four people live in a household and the interviewer has to screen out those under 18 years old. Our code can be written like this:
  • x=c(20, 50, 16, 78)
    interview<-ifelse(x>=18, "Yes", "No")
    print(interview)
    ## [1] "Yes" "Yes" "No"  "Yes"
  • In essence, ifelse() recodes the numeric variable, just as the following index-based assignment does.
  • survey <- c()
    survey[x>=18]<-"Yes"
    survey[x<18]<-"No"
    survey
    ## [1] "Yes" "Yes" "No"  "Yes"
    interview
    ## [1] "Yes" "Yes" "No"  "Yes"

    \(\blacksquare\) Suppose we want to decide whether these dates, Jan. 1, Feb. 1, Nov. 1, and Dec. 20, fall before or after July 12, 2018. Apply difftime() and ifelse() to make the conversion.

    3.2 if-else

  • if-else runs one block of code when a single condition is TRUE and another block when it is FALSE. The following example tests whether the temperature is over 28.
  • temperature<-30 
    if (temperature>28){
       cat ("Turn on air condition")
    }else {
      cat ("Turn off air condition")
    }
    ## Turn on air condition
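  • We might also try to run a calculation when a whole vector meets a condition. The output below comes from an older version of R, which used only the first element of the condition and issued a warning; since R 4.2.0 the same code stops with an error.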
  • scores<-c(30, 50, 90, 20) 
    if (scores< 36){
         sqrt(scores)*10
    }else {
        scores
    }
    ## [1] 54.8 70.7 94.9 44.7
  • if-else acts on a single logical value, so it cannot handle a vector with more than one element. Above, the first element, 30, was below 36, so the square-root rule was applied to every score. It takes a loop to test more than one element.
  • score<-c(30) 
    if (score< 36){
         sqrt(score)*10
    }else {
        score
    }
    ## [1] 54.8
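As a sketch of the loop mentioned above (loops are covered in section 4), the following code tests each score separately. The name `adjusted` is illustrative; the last line compares the result with the vectorized ifelse() from section 3.1.

```r
scores <- c(30, 50, 90, 20)
adjusted <- numeric(length(scores))   # vector to hold the results

for (i in seq_along(scores)) {
  if (scores[i] < 36) {
    adjusted[i] <- sqrt(scores[i]) * 10   # rescale low scores only
  } else {
    adjusted[i] <- scores[i]              # leave the others unchanged
  }
}
round(adjusted, 1)

# The vectorized ifelse() gives the same result without a loop
vec <- ifelse(scores < 36, sqrt(scores) * 10, scores)
all.equal(adjusted, vec)
```

Only 30 and 20 are rescaled here, while 50 and 90 stay as they are.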

    3.3 if-else if-else

  • if-else if-else allows more than one condition for testing a value. For example, a movie of 180 minutes or more is “very long,” one under 165 minutes is “short,” and anything in between is “long.”
  • movie<-170 
    if(movie>=180){
         cat('Very long')
        } else if(movie>=165) {
        cat('Long')
      } else {
              cat('Short')
          }
    ## Long
  • Example: Assume that a hotel gives a 15% discount if we book a room more than 90 days before check-in and a 10% discount if we book at least 60 days in advance, but charges 20% above the list price if we book less than one week ahead. Suppose you plan to check in on Jul. 30 and the list price is 3,000 NT dollars. How much do you have to pay if you book your room today, and how much if you book two weeks from now?
  • price=3000
    booking<-as.Date(Sys.Date(), format='%Y-%m-%d')
    booking
    ## [1] "2018-07-05"
    checkin<-as.Date(c("2018-07-30"), format='%Y-%m-%d')
    
    if (difftime(checkin, booking)>90){
        print (price*0.85)
    }else if (difftime(checkin, booking)>=60){
       print (price*0.9)
    }else if (difftime(checkin, booking)>=7){
       print (price)
    }else{
      print (price*1.2)
    }
    ## [1] 3000
     booking2 <-booking + 14
    if (difftime(checkin, booking2)>90){
        print (price*0.85)
    }else if (difftime(checkin, booking2)>=60){
       print (price*0.9)
    }else if (difftime(checkin, booking2)>=7){
       print (price)
    }else{
      print (price*1.2)
    }
    ## [1] 3000
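Because the same if-else if-else chain is written twice above, one option is to wrap it in a function. The sketch below is only an illustration: the function name `room_price` and its arguments are hypothetical, and fixed dates replace Sys.Date() so that the result does not depend on the day the code is run.

```r
# Hypothetical helper; the name room_price and its arguments are illustrative
room_price <- function(checkin, booking, price = 3000) {
  # number of days between booking and check-in, as a plain number
  days <- as.numeric(difftime(checkin, booking, units = "days"))
  if (days > 90) {
    price * 0.85
  } else if (days >= 60) {
    price * 0.9
  } else if (days >= 7) {
    price
  } else {
    price * 1.2
  }
}

checkin <- as.Date("2018-07-30")
room_price(checkin, as.Date("2018-07-05"))        # booked 25 days ahead
room_price(checkin, as.Date("2018-07-05") + 14)   # booked 11 days ahead
room_price(checkin, as.Date("2018-07-25"))        # booked 5 days ahead
```

Setting units = "days" explicitly keeps the comparison independent of the unit that difftime() would otherwise choose automatically.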

    4 Loop

  • R provides for and while loops, together with the break and next statements that control them.
  • 4.1 for

  • We can repeat an action \(n\) times with a for loop.

    for (U in 1:5){
      cat("All work and no play","\n")
    }
    ## All work and no play 
    ## All work and no play 
    ## All work and no play 
    ## All work and no play 
    ## All work and no play
    a <-c(1:5)
    for (i in a) { cat ("Busy", " ")
      }
    ## Busy  Busy  Busy  Busy  Busy
  • In the first loop, \(U\) is the loop variable: R runs the code inside the braces once for each value from 1 to 5.
  • In the second loop, we first create a vector and then write a loop that runs the code once for each element of that vector.
  • The loop variable can be shown in the output by pasting it into the message.
  • for (i in 1:4){
      cat("Hello World", paste(i), "times \n")
    }
    ## Hello World 1 times 
    ## Hello World 2 times 
    ## Hello World 3 times 
    ## Hello World 4 times
  • We can add the numbers 1 to 10 in a loop. Notice that we first have to set up a variable equal to zero. We call it total so that it does not mask the built-in sum() function.
  • total<-0
    for (i in 1:10){
      total <- total + i
      }
    print(total)
    ## [1] 55
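If we also want to keep the intermediate results, we can store them by index inside the loop. This is a minimal sketch; the names `running` and `total` are illustrative.

```r
running <- numeric(10)   # preallocate a vector for the partial sums
total <- 0
for (i in 1:10) {
  total <- total + i
  running[i] <- total    # store the running total at position i
}
running
sum(1:10)                # built-in check of the final value
```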

    4.1.1 Generating random numbers

  • To show the distribution of a random variable through an experiment, we can roll three fair six-sided dice many times and record the sum of each roll. The distribution of the sums should be approximately bell-shaped, close to a normal distribution.
  • set.seed(11605)
    dice <- 1:6
    x <- c()
    for (i in 1:1000){
      x[i]<-sum(sample(dice, 1), sample(dice, 1), sample(dice, 1)) 
    }
    # graphic
    df<-data.frame(Dice=x)
    library(ggplot2)
    g <- ggplot(aes(Dice), data=df) + 
      geom_histogram(binwidth = 0.9, fill='lightgreen', aes(y=after_stat(density)), position="identity") +
      labs(x="Sum of Three Dice", y="Density")
    g

  • We use index to record the result of each experiment and save it in a vector for further analysis.
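The same experiment can also be written without an explicit index by using replicate(), which repeats an expression and collects the results in a vector. This is an alternative sketch rather than the method used above, and because it draws the random numbers in a different order, its sums will differ from those of the loop even with the same seed.

```r
set.seed(11605)
# Each repetition rolls three dice at once and sums them
sums <- replicate(1000, sum(sample(1:6, 3, replace = TRUE)))

length(sums)
range(sums)       # every sum lies between 3 and 18
mean(sums)        # should be near the theoretical mean of 10.5
```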
  • 4.1.2 for and function

  • We can combine a for loop with a function. Given certain conditions, we can execute or skip certain code.
  • For example, we draw three cards from a stack of cards numbered 1 to 13. We sum the first two cards to decide whether the third card counts. If the sum of the first two cards is smaller than 16, we add the third one. Otherwise we just show the sum of the first two cards. We can change the random seed to test our function.
  • card<-function(seed) {
      set.seed(seed)             # the argument is the random seed
      x <- c()
      for (i in 1:3){
        x[i]<-sample(1:13, 1)    # draw three cards, one at a time
      }
      if (x[1]+x[2]<16 ){
        print(x[1:3])
        cat(sum(x[1:3]),"is sum of three cards \n")
      } else {
        print(x[1:2])
        cat(sum(x[1:2]), "is sum of the first 2 cards \n")
      }
    }
    card(100); card(1001); card(11605)
    ## [1] 5 4 8
    ## 17 is sum of three cards
    ## [1] 13  6
    ## 19 is sum of the first 2 cards
    ## [1] 11  2  2
    ## 15 is sum of three cards
  • We name the function card, and its parameter is the random seed.
  • 4.1.3 for and if-else if-else

  • We can build a function by applying for and if-else if-else to the booking exercise above when we have more than one date. In this version the 20% surcharge applies to bookings made less than 30 days ahead.
  • today<-as.Date(Sys.Date(), format='%Y-%m-%d')
    
    hotel <- function(checkin){
      n <- length(checkin)
      price <- 3000
      diff <- difftime(checkin, today)
    
      for (i in 1:n){
        if (diff[i]>90){
          print(checkin[i])
          cat (round(diff[i]/30,1), "months:", price*0.85, "\n")
        }else if (diff[i]>=60){
          print(checkin[i])
          cat (round(diff[i]/30,1), "months:",price*0.9,"\n")
        }else if (diff[i]>=30){
          print(checkin[i])
          cat (round(diff[i]/30,1), "months:",price,"\n")
        }else{
          print(checkin[i])
          cat (diff[i],  "days:",price*1.2, "\n")
        }
      }
    }
    checkin<-as.Date(c("2018-12-31", "2018-11-20","2018-09-20"), format='%Y-%m-%d')
    checkin<-c(checkin, today+7)
    hotel(checkin)
    ## [1] "2018-12-31"
    ## 6 months: 2550 
    ## [1] "2018-11-20"
    ## 4.6 months: 2550 
    ## [1] "2018-09-20"
    ## 2.6 months: 2700 
    ## [1] "2018-07-12"
    ## 7 days: 3600

    4.1.4 Double Loops

  • Double (nested) loops let us iterate over two variables at the same time. For example, we can generate a multiplication table for the numbers 11 to 20 with double loops.
  • multiplication <- matrix(nrow=10, ncol=10)
    for (i in 1:dim(multiplication)[1]){
      for (j in 1:dim(multiplication)[2]){
        multiplication[i,j] <- (i+10)*(j+10)
      }
    }
    rownames(multiplication)<-c(11:20)
    colnames(multiplication)<-c(11:20)
    multiplication
    ##     11  12  13  14  15  16  17  18  19  20
    ## 11 121 132 143 154 165 176 187 198 209 220
    ## 12 132 144 156 168 180 192 204 216 228 240
    ## 13 143 156 169 182 195 208 221 234 247 260
    ## 14 154 168 182 196 210 224 238 252 266 280
    ## 15 165 180 195 210 225 240 255 270 285 300
    ## 16 176 192 208 224 240 256 272 288 304 320
    ## 17 187 204 221 238 255 272 289 306 323 340
    ## 18 198 216 234 252 270 288 306 324 342 360
    ## 19 209 228 247 266 285 304 323 342 361 380
    ## 20 220 240 260 280 300 320 340 360 380 400
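For a table of pairwise products like this one, base R also offers outer(), which computes every combination in a single call. The sketch below rebuilds the table both ways and checks that they agree; the names `n`, `loop_table`, and `outer_table` are illustrative.

```r
n <- 11:20

# Double loop, as in the text
loop_table <- matrix(nrow = 10, ncol = 10)
for (i in seq_along(n)) {
  for (j in seq_along(n)) {
    loop_table[i, j] <- n[i] * n[j]
  }
}

# All pairwise products in one call
outer_table <- outer(n, n)
all(loop_table == outer_table)
```

The double loop remains the more general tool, since its body can hold any code, not just a product.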

    4.1.5 Data cleaning

  • A for loop can help us read and clean data. For example, we want to read data on 23 cities and counties. The file has several variables, such as the percentage of elderly people.
  • library(foreign)
    stat.dat<-read.csv("CS3171D1A.csv",header=TRUE,sep=";",dec=".",fileEncoding="BIG5")
    stat.dat[1:11,]
    ##                         X 臺北縣 宜蘭縣 桃園縣 新竹縣 苗栗縣 臺中縣 彰化縣
    ## 1  老年人口比率(65歲以上)     NA     NA     NA     NA     NA     NA     NA
    ## 2                    2000   6.37   10.2   7.46   9.69   11.0   7.16   9.42
    ## 3                    2001   6.44   10.5   7.49   9.91   11.2   7.32   9.73
    ## 4                    2002   6.55   10.8   7.51  10.17   11.6   7.50  10.03
    ## 5                    2003   6.67   11.2   7.56  10.39   11.9   7.68  10.31
    ## 6                    2004   6.86   11.5   7.62  10.58   12.2   7.90  10.65
    ## 7                    2005   7.08   11.9   7.72  10.85   12.5   8.12  10.97
    ## 8                    2006   7.32   12.3   7.84  11.02   12.8   8.31  11.28
    ## 9                    2007   7.52   12.6   7.92  11.12   13.0   8.50  11.55
    ## 10                   2008   7.76   12.8   8.05  11.20   13.2   8.68  11.79
    ## 11                   2009   8.04   13.0   8.18  11.28   13.3   8.86  11.98
    ##    南投縣 雲林縣 嘉義縣 臺南縣 高雄縣 屏東縣 臺東縣 花蓮縣 澎湖縣 基隆市
    ## 1      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
    ## 2    10.6   11.6   12.4   10.8   8.35   10.0   11.3   10.7   14.4   8.81
    ## 3    10.9   12.0   12.8   11.0   8.52   10.2   11.4   10.8   14.3   9.06
    ## 4    11.2   12.4   13.1   11.3   8.75   10.5   11.6   11.0   14.4   9.28
    ## 5    11.6   12.8   13.6   11.6   8.95   10.8   11.8   11.2   14.6   9.47
    ## 6    12.0   13.3   14.0   11.8   9.16   11.1   12.0   11.4   14.8   9.71
    ## 7    12.3   13.7   14.3   12.1   9.39   11.4   12.2   11.6   14.8  10.03
    ## 8    12.7   14.1   14.8   12.4   9.65   11.8   12.5   11.9   15.0  10.31
    ## 9    13.0   14.4   15.1   12.6   9.88   12.0   12.7   12.1   15.0  10.54
    ## 10   13.2   14.7   15.3   12.8  10.11   12.3   12.9   12.3   14.9  10.77
    ## 11   13.4   14.9   15.6   12.9  10.33   12.5   13.0   12.4   14.6  10.96
    ##    新竹市 臺中市 嘉義市 臺南市 臺北市 高雄市
    ## 1      NA     NA     NA     NA     NA     NA
    ## 2    8.46   6.49   8.67   7.69   9.67   7.16
    ## 3    8.50   6.60   8.85   7.85   9.94   7.41
    ## 4    8.59   6.79   9.15   8.06  10.25   7.63
    ## 5    8.69   6.94   9.46   8.24  10.58   7.93
    ## 6    8.81   7.15   9.70   8.46  10.92   8.24
    ## 7    8.95   7.35  10.00   8.69  11.29   8.59
    ## 8    9.12   7.59  10.25   8.90  11.64   8.94
    ## 9    9.21   7.75  10.45   9.09  11.96   9.23
    ## 10   9.29   7.92  10.64   9.33  12.31   9.57
    ## 11   9.37   8.13  10.86   9.53  12.60   9.93
  • The first row holds the variable name rather than data, so the values start at the second row and second column. How do we correctly read data that is not in the right format?
    • We create a data frame that has only one column, the names of the 23 cities.
    • We create a data frame that will hold 23 rows and 11 columns, one column for each year from 2000 to 2010.
    • Then we run a loop 23 times, once per city.
    • Notice that the first element comes from the second column and second row.
    dt <- data.frame(city=colnames(stat.dat)[-1])
    old<-data.frame()  
     for (u in 1:23){
       for (i in 1:11)
       old[u, i]<-stat.dat[i+1, u+1]
     }
    dt <-data.frame(dt, old)
    colnames(dt)<-c("city", c(2000:2010))
    head(dt)
    ##     city  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010
    ## 1 臺北縣  6.37  6.44  6.55  6.67  6.86  7.08  7.32  7.52  7.76  8.04  8.27
    ## 2 宜蘭縣 10.20 10.49 10.82 11.17 11.54 11.95 12.30 12.61 12.83 13.01 13.10
    ## 3 桃園縣  7.46  7.49  7.51  7.56  7.62  7.72  7.84  7.92  8.05  8.18  8.24
    ## 4 新竹縣  9.69  9.91 10.17 10.39 10.58 10.85 11.02 11.12 11.20 11.28 11.15
    ## 5 苗栗縣 10.98 11.21 11.57 11.87 12.19 12.50 12.79 13.01 13.21 13.33 13.40
    ## 6 臺中縣  7.16  7.32  7.50  7.68  7.90  8.12  8.31  8.50  8.68  8.86  8.99
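The reshaping done by the double loop amounts to dropping the first row and first column and then transposing. The sketch below shows this on a toy data frame, because the original CSV file is not available here; the object `raw`, the city names, and the values are stand-ins chosen for illustration only.

```r
# Toy stand-in for stat.dat; names and values are illustrative only
raw <- data.frame(X = c("label", "2000", "2001"),
                  CityA = c(NA, 6.37, 6.44),
                  CityB = c(NA, 10.20, 10.49),
                  stringsAsFactors = FALSE)

# Loop version: element [u, i] comes from row i + 1, column u + 1
old <- matrix(NA_real_, nrow = 2, ncol = 2)
for (u in 1:2) {
  for (i in 1:2) {
    old[u, i] <- raw[i + 1, u + 1]
  }
}

# Same reshaping without a loop: drop the first row and column, then transpose
wide <- t(as.matrix(raw[-1, -1]))
dimnames(wide) <- list(colnames(raw)[-1], raw$X[-1])
wide
all(old == wide)
```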
  • We can melt the data and make a multiple line chart.
  • library(reshape2)
    DT <-melt(dt, id.vars='city', variable.name='years')
    DT$years <- as.Date(DT$years, format="%Y")
    library(ggplot2)
    ggplot(DT, aes(x=years, y=value, col=city)) +
          geom_line(size=1) +
         geom_point(shape=16, size=3) +
       labs(x="Years", y="Percent") +
       scale_x_date(date_labels = "%Y") +
        theme(text=element_text(family='STFangsong')) 


    4.2 while

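  • A while loop repeats its body as long as a condition holds, so we leave the loop once the condition fails. For example, we want to stop the calculation when \(2^{x}\) is greater than 1000. In the code below the loop condition only bounds power, so after \(2^{x}\) passes 1000 the loop keeps running and prints “Stop” on every remaining iteration.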
  • power<--1
    while (power <= 12) {
          power <- power +1
        if (2^power<1000){
        cat(2^power, "\n")
        }else{
            cat("Stop")
        }
    }
    ## 1 
    ## 2 
    ## 4 
    ## 8 
    ## 16 
    ## 32 
    ## 64 
    ## 128 
    ## 256 
    ## 512 
    ## StopStopStopStop
  • A for loop, in contrast, runs through every value in its range, so it returns all of \(2^{0}, \ldots, 2^{12}\).
  •  for (a in -1:11){
        a <- a +1
       print(2^a)
     }
    ## [1] 1
    ## [1] 2
    ## [1] 4
    ## [1] 8
    ## [1] 16
    ## [1] 32
    ## [1] 64
    ## [1] 128
    ## [1] 256
    ## [1] 512
    ## [1] 1024
    ## [1] 2048
    ## [1] 4096

    4.3 break

  • The break statement exits the loop immediately.
  • power<-0
    while (power <= 12) {
      if (2^power<1000){
        cat(2^power, "\n")
        }else{
            cat("Stop")
            break
        }
      power <- power +1
    }
    ## 1 
    ## 2 
    ## 4 
    ## 8 
    ## 16 
    ## 32 
    ## 64 
    ## 128 
    ## 256 
    ## 512 
    ## Stop
  • We apply this technique to the booking system. For example, if a check-in date is less than 30 days away, the price is over our budget and the loop stops.
  • today<-as.Date(Sys.Date(), format='%Y-%m-%d')
    
    hotel <- function(checkin){
      n <- length(checkin)
      price <- 3000
      diff <- difftime(checkin, today)
      for (i in 1:n){
        if (diff[i]>90){
          print(checkin[i])
          cat (round(diff[i]/30,1), "months:", price*0.85, "\n")
        }else if (diff[i]>=60){
          print(checkin[i])
          cat (round(diff[i]/30,1), "months:",price*0.9,"\n")
        }else if (diff[i]>=30){
          print(checkin[i])
          cat (round(diff[i]/30,1), "months:",price,"\n")
        }else{
          print(checkin[i])
          cat("Over the budget")
          break
        }
      }
    }
    checkin<-as.Date(c("2018-10-31", "2018-08-10","2018-07-20"), format='%Y-%m-%d')
    checkin<-c(checkin, today+3)
    hotel(checkin)
    ## [1] "2018-10-31"
    ## 3.9 months: 2550 
    ## [1] "2018-08-10"
    ## 1.2 months: 3000 
    ## [1] "2018-07-20"
    ## Over the budget
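The next statement, listed at the start of this section, skips the rest of the current iteration and moves on to the following one, whereas break leaves the loop altogether. A minimal sketch, with `kept` as an illustrative name:

```r
kept <- c()
for (i in 1:10) {
  if (i %% 2 == 0) {
    next                 # skip even numbers and go on to the next i
  }
  kept <- c(kept, i)     # reached only for odd numbers
}
kept
```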

    5 Assignments

      1. Please find the state names that are longer than 13 letters, count them, and point out their positions in the array. nchar() returns the length of a word.
      2. Please read studentsfull.txt and get the data of students from Economics and Chemistry.
      3. After an exam, the instructor decides to take the square root of scores smaller than 60 and multiply it by 10. For example, 36 will be changed to 60. Please write a function to convert the following scores: 34, 81, 55, 69, 77, 40, 49, 26.
      4. Please write a function to convert the difference between two dates to months. You can use today and July 31, 2020 as an example.
      5. Please try to get the “unemployment rate” and combine it with the percentage of elderly people in 2000.