11 Data import

11.2.2 Exercises

1. What function would you use to read a file where fields were separated with “|”?

read_delim(filename, delim = "|")
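Here is a minimal runnable sketch. It uses a small inline string in place of a real file, and the data is made up for illustration:

```r
library(readr)

# Inline data standing in for a pipe-separated file (hypothetical example data)
pipe_data <- "name|score\nalice|90\nbob|85"

# delim tells read_delim() which character separates the fields
scores <- read_delim(pipe_data, delim = "|")
scores
```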

2. Apart from file, skip, and comment, what other arguments do read_csv() and read_tsv() have in common?

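setdiff(
  intersect(names(formals(read_csv)), names(formals(read_tsv))),
  c("file", "skip", "comment")
)

##  [1] "col_names"       "col_types"       "locale"         
##  [4] "na"              "quoted_na"       "quote"          
##  [7] "trim_ws"         "n_max"           "guess_max"      
## [10] "progress"        "skip_empty_rows"

Using names(formals()) rather than names(as.list(args())) avoids a spurious empty name "" in the result. That "" is the empty function body, which as.list() appends after the arguments. The exact list depends on the readr version. This output comes from readr 1.x; readr 2.x adds more shared arguments, such as num_threads and show_col_types.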

3. What are the most important arguments to read_fwf()?

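col_positions, which tells read_fwf() where each column starts and ends. It is normally built with one of these helpers:

- fwf_empty(), which guesses the positions from runs of blank columns
- fwf_widths()
- fwf_positions()
- fwf_cols()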
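The sketch below writes a tiny fixed-width file and reads it twice: once with fwf_widths() and once with fwf_positions(). The file contents and column names are made up. Positions in fwf_positions() are 1-based and inclusive.

```r
library(readr)

# Hypothetical fixed-width file: name in columns 1-12, age in columns 13-14
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("John Smith  42", "Jane Doe    37"), fwf_file)

# Describe the columns by their widths...
by_width <- read_fwf(fwf_file, fwf_widths(c(12, 2), c("name", "age")))

# ...or by explicit start and end positions
by_pos <- read_fwf(fwf_file, fwf_positions(c(1, 13), c(12, 14), c("name", "age")))

by_width
```

Both calls describe the same layout. Because trim_ws defaults to TRUE, the padding after the names is removed.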

4. Sometimes strings in a CSV file contain commas. To prevent them from causing problems they need to be surrounded by a quoting character, like " or '. By convention, read_csv() assumes that the quoting character will be ", and if you want to change it you'll need to use read_delim() instead. What arguments do you need to specify to read the following text into a data frame: "x,y\n1,'a,b'"?

read_csv() also has a quote argument, so read_delim() is not actually required:

read_csv("x,y\n1,'a,b'", quote = "'")

## # A tibble: 1 x 2
##       x y    
##   <dbl> <chr>
## 1     1 a,b

read_delim("x,y\n1,'a,b'", delim = ",", quote = "'")

## # A tibble: 1 x 2
##       x y    
##   <dbl> <chr>
## 1     1 a,b

5. Identify what is wrong with each of the following inline CSV files. What happens when you run the code?

## Too few column names: the header has 2 fields but each data row has 3 (the extra values are dropped)
read_csv("a,b\n1,2,3\n4,5,6")

## Warning: 2 parsing failures.
## row col  expected    actual         file
##   1  -- 2 columns 3 columns literal data
##   2  -- 2 columns 3 columns literal data

## # A tibble: 2 x 2
##       a     b
##   <dbl> <dbl>
## 1     1     2
## 2     4     5

## Rows have the wrong number of columns: one is too short (filled with NA), one too long (truncated)
read_csv("a,b,c\n1,2\n1,2,3,4")

## Warning: 2 parsing failures.
## row col  expected    actual         file
##   1  -- 3 columns 2 columns literal data
##   2  -- 3 columns 4 columns literal data

## # A tibble: 2 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     2    NA
## 2     1     2     3

## Unmatched quote: the " before 1 is never closed
read_csv("a,b\n\"1")

## Warning: 2 parsing failures.
## row col                     expected    actual         file
##   1  a  closing quote at end of file           literal data
##   1  -- 2 columns                    1 columns literal data

## # A tibble: 1 x 2
##       a b    
##   <dbl> <chr>
## 1     1 <NA>

## Numbers and strings mixed in the same columns, so both columns are read as character
read_csv("a,b\n1,2\na,b")

## # A tibble: 2 x 2
##   a     b    
##   <chr> <chr>
## 1 1     2    
## 2 a     b

## Probably not a CSV at all: fields are separated by `;`, so everything ends up in one column
read_csv("a;b\n1;3")

## # A tibble: 1 x 1
##   `a;b`
##   <chr>
## 1 1;3
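For the last case, a sketch of two ways to read the semicolon-separated text correctly. One states the delimiter explicitly. The other uses read_csv2(), which expects ";" between fields and "," as the decimal mark.

```r
library(readr)

# State the delimiter explicitly...
semi <- read_delim("a;b\n1;3", delim = ";")

# ...or use read_csv2(), which assumes ";" as the delimiter
semi2 <- read_csv2("a;b\n1;3")

semi
```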

11.3.5 Exercises

1. What are the most important arguments to locale()?

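Probably tz (the time zone). Other arguments also matter often:

- decimal_mark and grouping_mark, for numbers
- encoding
- date_names, date_format and time_format, for dates and times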
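A small sketch of why tz matters. It reads the same (made-up) clock time under the default UTC locale and under a Tokyo locale:

```r
library(readr)

stamp <- "2019-06-19 12:00"
fmt <- "%Y-%m-%d %H:%M"

# Default locale: the clock time is interpreted as UTC
utc <- parse_datetime(stamp, fmt)

# Same text, interpreted as Tokyo local time (UTC+9)
tokyo <- parse_datetime(stamp, fmt, locale = locale(tz = "Asia/Tokyo"))

# Noon in Tokyo is 03:00 UTC, so the two instants are 9 hours apart
difftime(utc, tokyo, units = "hours")  # Time difference of 9 hours
```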

2. What happens if you try and set decimal_mark and grouping_mark to the same character? What happens to the default value of grouping_mark when you set decimal_mark to “,”? What happens to the default value of decimal_mark when you set the grouping_mark to “.”?

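Setting both to the same character raises an error. If you set only one of them, locale() picks the other automatically:

- With grouping_mark = ".", decimal_mark becomes ",".
- With decimal_mark = ",", grouping_mark becomes ".".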

parse_number("100.10.2,345", locale = locale(grouping_mark = ",", decimal_mark = ","))

## Error: `decimal_mark` and `grouping_mark` must be different

parse_number("100.10.2,345", locale = locale(grouping_mark = ".")) # "," becomes the decimal_mark

## [1] 100102.3

parse_number("100.10.2,345", locale = locale(decimal_mark = ","))  # "." becomes the grouping_mark

## [1] 100102.3
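The defaults can also be inspected directly on the locale object. This sketch reads the marks that locale() fills in and catches the error raised when both marks are the same:

```r
library(readr)

# Setting only decimal_mark = "," makes locale() choose "." for grouping_mark
locale(decimal_mark = ",")$grouping_mark   # "."

# Setting only grouping_mark = "." makes locale() choose "," for decimal_mark
locale(grouping_mark = ".")$decimal_mark   # ","

# Using the same character for both is rejected with an error
same <- tryCatch(
  locale(decimal_mark = ",", grouping_mark = ","),
  error = function(e) "error"
)
same                                       # "error"
```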

3. I didn’t discuss the date_format and time_format options to locale(). What do they do? Construct an example that shows when they might be useful.

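They set the default formats used when dates and times are parsed without an explicit format:

- parse_date() and col_date() fall back to date_format (default "%AD").
- parse_time() and col_time() fall back to time_format (default "%AT").

They are useful when a file consistently uses a non-ISO layout, such as day/month/year dates.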
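A sketch of that use case. It builds a hypothetical locale for day/month/year dates and HHMM times, then lets parse_date(), parse_time() and col_date() pick the formats up from it. The data is made up.

```r
library(readr)

# Hypothetical locale: dates written as day/month/year, times as HHMM
dmy_locale <- locale(date_format = "%d/%m/%Y", time_format = "%H%M")

# With no explicit format, the parsers fall back to the locale's formats
parse_date("30/12/2014", locale = dmy_locale)  # "2014-12-30"
parse_time("1705", locale = dmy_locale)

# col_date() with no format also uses the locale when reading a whole file
events <- read_csv(
  "day,value\n30/12/2014,1\n01/02/2015,2",
  col_types = cols(day = col_date(), value = col_double()),
  locale = dmy_locale
)
events
```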

4. If you live outside the US, create a new locale object that encapsulates the settings for the types of file you read most commonly.

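In Japan, the setting that most often needs changing is the encoding, because many files are Shift_JIS rather than UTF-8: locale(encoding = "Shift_JIS").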
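A sketch of such a locale in use. It assumes Shift_JIS files and Tokyo time are the settings you need. It builds a small Shift_JIS-encoded CSV with made-up contents, using iconv(), and reads it back with the locale:

```r
library(readr)

# Locale for Japanese files (assumption: Shift_JIS encoding, Tokyo time zone)
jp <- locale(encoding = "Shift_JIS", tz = "Asia/Tokyo")

# Made-up CSV: "\u540d\u524d" = name, "\u5e74\u9f62" = age, "\u5c71\u7530" = Yamada
utf8_text <- "\u540d\u524d,\u5e74\u9f62\n\u5c71\u7530,30\n"

# Convert to Shift_JIS bytes and write them to a temporary file
sjis_file <- tempfile(fileext = ".csv")
writeBin(charToRaw(iconv(utf8_text, from = "UTF-8", to = "Shift_JIS")), sjis_file)

# readr converts the text to UTF-8 while reading
people <- read_csv(sjis_file, locale = jp)
people
```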

5. What’s the difference between read_csv() and read_csv2()?

read_csv2() uses ";" as the field separator and "," as the decimal mark, with "." as the grouping mark. This is common in some European countries.
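A sketch with made-up European-style data. It shows that read_csv2() behaves like read_delim() with ";" as the delimiter and a comma decimal mark:

```r
library(readr)

# ";" between fields, "," as the decimal mark (hypothetical data)
euro <- "price;qty\n1,5;2\n10,25;3"

a <- read_csv2(euro)
b <- read_delim(euro, delim = ";", locale = locale(decimal_mark = ","))

a  # price: 1.5, 10.25; qty: 2, 3
```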

6. What are the most common encodings used in Europe? What are the most common encodings used in Asia? Do some googling to find out.

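UTF-8 is increasingly common everywhere, but these legacy encodings still turn up:

- In Japan: Shift_JIS (and its Windows variant CP932), plus EUC-JP.
- Elsewhere in Asia: GB2312/GBK/GB18030 for simplified Chinese, Big5 for traditional Chinese, and EUC-KR for Korean.
- In Europe: ISO-8859-1 (Latin-1) and Windows-1252 for Western European languages, ISO-8859-2 and Windows-1250 for Central European languages, and Windows-1251 or KOI8-R for Cyrillic.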
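Why the encoding has to be known: the same text is stored as different bytes in different encodings. This sketch compares "café" in UTF-8 and Latin-1, then decodes the Latin-1 bytes with parse_character(), in the style of the chapter's own encoding example:

```r
library(readr)

cafe <- "caf\u00e9"  # "café", held as UTF-8

charToRaw(cafe)                                         # 63 61 66 c3 a9
charToRaw(iconv(cafe, from = "UTF-8", to = "latin1"))   # 63 61 66 e9

# Raw Latin-1 bytes (no declared encoding), decoded by telling readr the encoding
latin1_bytes <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
decoded <- parse_character(latin1_bytes, locale = locale(encoding = "latin1"))
decoded
```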

7. Generate the correct format string to parse each of the following dates and times:

d1 <- "January 1, 2010"
parse_date(d1, "%B %d, %Y")

## [1] "2010-01-01"

d2 <- "2015-Mar-07"
parse_date(d2, "%Y-%b-%d")

## [1] "2015-03-07"

d3 <- "06-Jun-2017"
parse_date(d3, "%d-%b-%Y")

## [1] "2017-06-06"

d4 <- c("August 19 (2015)", "July 1 (2015)")
parse_date(d4, "%B %d (%Y)")

## [1] "2015-08-19" "2015-07-01"

d5 <- "12/30/14" # Dec 30, 2014
parse_date(d5, "%m/%d/%y")

## [1] "2014-12-30"

t1 <- "1705"
parse_time(t1, "%H%M")

## 17:05:00

t2 <- "11:15:10.12 PM"
parse_time(t2, "%I:%M:%OS %p")

## 23:15:10.12
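When a format string does not match the input, the parse does not fail loudly. It returns NA with a warning and records the failure, which problems() can show. A sketch using d1 with a deliberately wrong format:

```r
library(readr)

# Wrong format for "January 1, 2010": the result is NA plus a recorded problem
bad <- parse_date("January 1, 2010", "%Y-%m-%d")
bad
problems(bad)
```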

2019-06-19
