Week7_Assignment

11.1-11.2

1. What function would you use to read a file where fields were separated with “|”?

# read_delim(file, delim = "|")
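A minimal sketch of the call above on inline data, since `file` is only a placeholder in the answer. The sample string and the name `pipe_data` are made up for illustration, and `I()` assumes readr 2.0 or later.

```r
library(readr)

# I() marks the string as literal data rather than a file path (readr >= 2.0).
# The sample rows below are invented for illustration.
pipe_data <- read_delim(I("a|b\n1|2\n3|4"), delim = "|", show_col_types = FALSE)
pipe_data
```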

2. Apart from file, skip, and comment, what other arguments do read_csv() and read_tsv() have in common?

#They share the following arguments (the output also lists file, skip, and comment):
intersect(names(formals(read_csv)), names(formals(read_tsv)))

##  [1] "file"            "col_names"       "col_types"       "col_select"     
##  [5] "id"              "locale"          "na"              "quoted_na"      
##  [9] "quote"           "comment"         "trim_ws"         "skip"           
## [13] "n_max"           "guess_max"       "name_repair"     "num_threads"    
## [17] "progress"        "show_col_types"  "skip_empty_rows" "lazy"
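Since the question asks for the arguments apart from file, skip, and comment, one way to drop those three from the list is `setdiff()`. This is only a sketch; the names `shared_args` and `other_shared` are mine, and the exact result depends on the installed readr version.

```r
library(readr)

# Arguments that read_csv() and read_tsv() have in common.
shared_args <- intersect(names(formals(read_csv)), names(formals(read_tsv)))

# Drop the three arguments the question already names.
other_shared <- setdiff(shared_args, c("file", "skip", "comment"))
other_shared
```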

5. Identify what is wrong with each of the following inline CSV files. What happens when you run the code?

read_csv("a,b\n1,2,3\n4,5,6")     #The header names only two columns, “a” and “b”, but each row has three values. readr warns and merges the extra value into the last column, so b becomes 23 and 56.

## Warning: One or more parsing issues, see `problems()` for details

## Rows: 2 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): a
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 2 × 2
##       a     b
##   <dbl> <dbl>
## 1     1    23
## 2     4    56

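read_csv("a,b,c\n1,2\n1,2,3,4")   #The header has three columns, but the first data row has only two values and the second has four. The missing value in c is filled with NA, and the extra value in the second row is merged into the last column (34).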

## Warning: One or more parsing issues, see `problems()` for details

## Rows: 2 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): a, b
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 2 × 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     2    NA
## 2     1     2    34
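The warnings in the two examples above point to `problems()`. A short sketch of how it could be used to see which rows were flagged; the name `ragged` is mine, `I()` assumes readr 2.0 or later, and the warning is suppressed only to keep the output short.

```r
library(readr)

# I() marks the string as literal data (readr >= 2.0).
ragged <- suppressWarnings(
  read_csv(I("a,b,c\n1,2\n1,2,3,4"), show_col_types = FALSE)
)

# problems() returns a tibble describing the parsing issues readr recorded.
issues <- problems(ragged)
issues
```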

read_csv("a,b\n\"1")    #The quote opened before 1 is never closed. With this version of readr the row is not read at all: the result has 0 rows, and both a and b are typed as character.

## Rows: 0 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): a, b
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 0 × 2
## # … with 2 variables: a <chr>, b <chr>

read_csv("a,b\n1,2\na,b") #Both “a” and “b” are treated as character vectors since they contain non-numeric strings. This may have been intentional, or the author may have intended the values of the columns to be “1,2” and “a,b”.

## Rows: 2 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): a, b
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 2 × 2
##   a     b    
##   <chr> <chr>
## 1 1     2    
## 2 a     b

read_csv("a;b\n1;3")   #The values are separated by “;” rather than “,”.

## Rows: 1 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): a;b
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 1 × 1
##   `a;b`
##   <chr>
## 1 1;3
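For the last case, a possible fix is to state the delimiter explicitly with `read_delim()`. A minimal sketch on the same inline data; the name `semi` is mine and `I()` assumes readr 2.0 or later.

```r
library(readr)

# Tell readr that fields are separated by ";" instead of ",".
semi <- read_delim(I("a;b\n1;3"), delim = ";", show_col_types = FALSE)
semi
```

`read_csv2()` is another option for semicolon-separated files, but it also switches the decimal mark to a comma, so `read_delim()` keeps the change limited to the delimiter.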

11.3

2. What happens if you try and set decimal_mark and grouping_mark to the same character? What happens to the default value of grouping_mark when you set decimal_mark to “,”? What happens to the default value of decimal_mark when you set the grouping_mark to “.”?

#If the decimal and grouping marks are set to the same character, locale throws an error:
locale(decimal_mark = ".", grouping_mark = ".")

## Error: `decimal_mark` and `grouping_mark` must be different

#If the decimal_mark is set to the comma ",", then the grouping mark is set to the period ".":
locale(decimal_mark = ",")

## <locale>
## Numbers:  123.456,78
## Formats:  %AD / %AT
## Timezone: UTC
## Encoding: UTF-8
## <date_names>
## Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed), Thursday
##         (Thu), Friday (Fri), Saturday (Sat)
## Months: January (Jan), February (Feb), March (Mar), April (Apr), May (May),
##         June (Jun), July (Jul), August (Aug), September (Sep), October
##         (Oct), November (Nov), December (Dec)
## AM/PM:  AM/PM

#If the grouping mark is set to a period ".", then the decimal mark is set to the comma ",":
locale(grouping_mark = ".")

## <locale>
## Numbers:  123.456,78
## Formats:  %AD / %AT
## Timezone: UTC
## Encoding: UTF-8
## <date_names>
## Days:   Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed), Thursday
##         (Thu), Friday (Fri), Saturday (Sat)
## Months: January (Jan), February (Feb), March (Mar), April (Apr), May (May),
##         June (Jun), July (Jul), August (Aug), September (Sep), October
##         (Oct), November (Nov), December (Dec)
## AM/PM:  AM/PM
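To check that the swapped defaults actually affect parsing, here is a small sketch that reads the grouping mark back from the locale object and parses a number written in the 123.456,78 style shown above. The names `comma_decimal` and `euro_style` are mine.

```r
library(readr)

# Only the decimal mark is set; the grouping mark is left to its default.
comma_decimal <- locale(decimal_mark = ",")
comma_decimal$grouping_mark

# Parse a number that uses "." for grouping and "," for decimals.
euro_style <- parse_number("123.456,78", locale = comma_decimal)
euro_style
```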

7. Generate the correct format string to parse each of the following dates and times:

d1 <- "January 1, 2010"
d2 <- "2015-Mar-07"
d3 <- "06-Jun-2017"
d4 <- c("August 19 (2015)", "July 1 (2015)")
d5 <- "12/30/14" 
t1 <- "1705"
t2 <- "11:15:10.12 PM"

parse_date(d1, "%B %d, %Y")

## [1] "2010-01-01"

parse_date(d2, "%Y-%b-%d")

## [1] "2015-03-07"

parse_date(d3, "%d-%b-%Y")

## [1] "2017-06-06"

parse_date(d4, "%B %d (%Y)")

## [1] "2015-08-19" "2015-07-01"

parse_date(d5, "%m/%d/%y")

## [1] "2014-12-30"

parse_time(t1, "%H%M")

## 17:05:00

parse_time(t2, "%I:%M:%OS %p")   #%I is the 12-hour clock hour, which is the one to pair with %p (AM/PM)

## 23:15:10.12
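The same format strings can also be passed to `col_date()` when reading a file, so the column is parsed on import. A sketch using the d5 format, with made-up inline data and column names (`when`, `value`); `I()` assumes readr 2.0 or later.

```r
library(readr)

# Invented sample data; the date column uses the same layout as d5.
dated <- read_csv(
  I("when,value\n12/30/14,1\n01/02/15,2"),
  col_types = cols(
    when = col_date(format = "%m/%d/%y"),
    value = col_double()
  )
)
dated
```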

2022-07-03
