Advanced Data Manipulation in R

Import data:

library(readr)
input = read_csv("person_info.csv")
input
## # A tibble: 10 x 14
##    first_name last_name birthday    age state  address  City   phone email
##    <chr>      <chr>     <chr>     <int> <chr>  <chr>    <chr>  <chr> <chr>
##  1 Carol      Davis     9/29/1996    23 Illin… 1674 Ca… Burr … 312-… curt…
##  2 Bruno      Horan     6/11/1995    24 Calif… 1561 St… San D… 858-… guad…
##  3 William    Moody     2/27/1997    22 Illin… 541 Jad… Arlin… 979-… roos…
##  4 Robin      Steel     8/3/1989     57 Texas  1674 Ca… Josep… 214-… lloy…
##  5 Michelle   Roberts   7/17/1995    24 Oregon 1372 Ga… "Port… 503-… ben1…
##  6 June       Sneed     3/27/2000    19 Arizo… 2411 Cl… Phoen… 256-… kath…
##  7 Curtis     Campbell  3/15/1991    28 Idahol 2760 Sc… Pocat… 979-… just…
##  8 Dorothy    Schott    1/2/1997     21 Calif… 2742 Su… Santa… 501-… mega…
##  9 Mae        Skinner   3/16/1995    24 Penns… <NA>     Newar… 501-… enri…
## 10 David      Victoria  8/2/1996     23 Maine  3327 Ch… Harps… 207-… caro…
## # ... with 5 more variables: car_1 <chr>, gpa <dbl>, year <chr>,
## #   class_of <int>, online_signiture <chr>
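When the file layout is known, the column types can be declared up front instead of letting read_csv() guess them from the data. The sketch below is self-contained: it writes a small temporary file (`tmp`, with hypothetical rows copied from the first two records above) in place of person_info.csv, so the column list is an assumption covering only four of the fourteen columns.

```r
library(readr)

# Hypothetical sample in the same layout as the first columns of person_info.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "first_name,last_name,birthday,age",
  "Carol,Davis,9/29/1996,23",
  "Bruno,Horan,6/11/1995,24"
), tmp)

# Explicit column types: readr does not guess, and prints no column-spec message
people <- read_csv(tmp, col_types = cols(
  first_name = col_character(),
  last_name  = col_character(),
  birthday   = col_character(),  # kept as text, converted to Date afterwards
  age        = col_integer()
))
people
```

Declaring types this way also makes an unexpected value (for example, text in the `age` column) show up as a parsing problem instead of silently changing the column's type.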

Date-time manipulation:

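In R, we recommend the lubridate package for converting strings written in day-month-year order into Date objects. As the example shows, dmy() copes with different separators and month spellings as long as the components appear in that order: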

library(lubridate)
## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date
dates <- c("15/02/2013", "15 Feb 13", "It happened on 15 02 '13")
dmy(dates)
## [1] "2013-02-15" "2013-02-15" "2013-02-15"
mdy(input$birthday)  # birthdays in the file are written month/day/year
##  [1] "1996-09-29" "1995-06-11" "1997-02-27" "1989-08-03" "1995-07-17"
##  [6] "2000-03-27" "1991-03-15" "1997-01-02" "1995-03-16" "1996-08-02"
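Once the strings are parsed, lubridate can pull out date components and measure spans between dates. A minimal sketch using the first three birthdays; the reference date `as_of` is a hypothetical choice, since the source does not say when the ages were recorded:

```r
library(lubridate)

birthdays <- mdy(c("9/29/1996", "6/11/1995", "2/27/1997"))
class(birthdays)   # "Date"
year(birthdays)    # 1996 1995 1997
month(birthdays)   # 9 6 2

# Hypothetical reference date for computing age in whole years
as_of <- ymd("2020-01-01")
ages <- floor(time_length(interval(birthdays, as_of), "years"))
ages               # 23 24 22
```

With this assumed date the computed ages line up with the `age` column for the first three rows; a different reference date would shift them.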

Note:

There are similar functions for every permutation of d, m and y. Explicitly, all of the following functions exist: dmy(), myd(), ydm(), mdy(), dym() and ymd().
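To see the six orders side by side, the sketch below writes the same date, 15 February 2013, in each order and parses it with the matching function. It also shows ymd_hms(), one of lubridate's variants for strings that include a time, which returns a date-time (POSIXct) rather than a Date.

```r
library(lubridate)

# 15 February 2013 written in each of the six component orders
parsed <- c(
  ymd("2013-02-15"),
  ydm("2013-15-02"),
  mdy("02/15/2013"),
  myd("02-2013-15"),
  dmy("15/02/2013"),
  dym("15-2013-02")
)
all(parsed == ymd("2013-02-15"))   # TRUE

# A time suffix gives a date-time (POSIXct, UTC by default)
stamp <- ymd_hms("2013-02-15 08:30:00")
class(stamp)                       # "POSIXct" "POSIXt"
```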

String manipulation:

library(stringr)
str_trim(" hello world ")
## [1] "hello world"
str_trim(" hello world ", side = "left")
## [1] "hello world "
str_trim(" hello world ", side = "right")
## [1] " hello world"
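str_trim() only removes whitespace at the ends of a string. When values also contain runs of spaces in the middle, stringr's str_squish() trims the ends and collapses each internal run to a single space. A short comparison on a made-up messy string:

```r
library(stringr)

messy <- "  hello    world \n"
str_trim(messy)     # "hello    world"  (inner spaces kept)
str_squish(messy)   # "hello world"
```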
toupper("Hello world")
## [1] "HELLO WORLD"
tolower("Hello World")
## [1] "hello world"
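toupper() and tolower() come from base R. If you prefer to stay within stringr, it has equivalents, plus str_to_title() for capitalizing each word:

```r
library(stringr)

str_to_upper("Hello world")   # "HELLO WORLD"
str_to_lower("Hello World")   # "hello world"
str_to_title("hello world")   # "Hello World"
```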
input$City[5]
## [1] "Portland, Oregon(OR), 97217\n\n"
input$City[5] = str_trim(input$City[5], side = "right")  # drop the trailing "\n\n"
input$City[5]
## [1] "Portland, Oregon(OR), 97217"
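After trimming, the pieces of this value can be pulled out with regular expressions. A sketch on the cleaned string, assuming other City entries follow the same "City, State(XX), ZIP" pattern (the truncated tibble print above does not confirm this):

```r
library(stringr)

city <- "Portland, Oregon(OR), 97217"
str_extract(city, "\\d{5}")                # "97217" (five-digit ZIP code)
str_match(city, "\\(([A-Z]{2})\\)")[, 2]   # "OR" (capture group inside the parentheses)
str_split(city, ",\\s*")[[1]]              # "Portland" "Oregon(OR)" "97217"
```

str_match() returns a matrix whose first column is the whole match and whose later columns are the capture groups, which is why the state code is taken from column 2.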
gender <- c("M", "male ", "Female", "fem.")
grepl("m", gender)  # logical vector: does each element contain "m"?
## [1] FALSE  TRUE  TRUE  TRUE
grep("m", gender)   # indices of the elements that contain "m"
## [1] 2 3 4
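The matches above show two pitfalls: the pattern is case-sensitive, so "M" is missed, and an unanchored "m" also matches "Female" and "fem.". One way to standardize these values, sketched with stringr's str_detect() (its counterpart to grepl()), is to normalize case and whitespace first and then anchor the pattern at the start. The labels "male" and "female" are an illustrative choice:

```r
library(stringr)

gender <- c("M", "male ", "Female", "fem.")

# Case-insensitive matching now finds "M", but still cannot separate the groups
grepl("m", gender, ignore.case = TRUE)   # TRUE TRUE TRUE TRUE

# Normalize, then test the first letter
g <- str_to_lower(str_trim(gender))
clean <- ifelse(str_detect(g, "^f"), "female",
         ifelse(str_detect(g, "^m"), "male", NA))
clean   # "male" "male" "female" "female"
```

Anything that starts with neither letter becomes NA, which makes unexpected codes easy to find afterwards.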