readr training

The purpose of this notebook is to illustrate how the readr package can be used to read .csv files into R.
library(readr)
df <- readr::read_csv("test_data_1.csv")
## Parsed with column specification:
## cols(
## fruit = col_character(),
## count = col_integer()
## )
df
## # A tibble: 3 x 2
## fruit count
## <chr> <int>
## 1 Apple 1
## 2 Banana 2
## 3 Carrot NA
When reading a file, make sure the header row is in the right place, that no footer rows are read in as observations, and that non-blank missing-value markers are identified as NA. Any other data cleaning can be done with stringr and dplyr.
readr::read_csv("test_data_2.csv")
## Warning: Missing column names filled in: 'X2' [2]
## Parsed with column specification:
## cols(
## `A dataset proudly brought to you by StatsNZ` = col_character(),
## X2 = col_character()
## )
## # A tibble: 7 x 2
## `A dataset proudly brought to you by StatsNZ` X2
## <chr> <chr>
## 1 Jun-18 <NA>
## 2 fruit count
## 3 Apple 1
## 4 Banana 2
## 5 Carrot C
## 6 <NA> <NA>
## 7 C means confidential <NA>
readr::read_csv("test_data_2.csv", skip = 2)
## Parsed with column specification:
## cols(
## fruit = col_character(),
## count = col_character()
## )
## # A tibble: 5 x 2
## fruit count
## <chr> <chr>
## 1 Apple 1
## 2 Banana 2
## 3 Carrot C
## 4 <NA> <NA>
## 5 C means confidential <NA>
readr::read_csv("test_data_2.csv", skip = 2, n_max = 3)
## Parsed with column specification:
## cols(
## fruit = col_character(),
## count = col_character()
## )
## # A tibble: 3 x 2
## fruit count
## <chr> <chr>
## 1 Apple 1
## 2 Banana 2
## 3 Carrot C
readr::read_csv("test_data_2.csv", skip = 2, n_max = 3, na = c("C"))
## Parsed with column specification:
## cols(
## fruit = col_character(),
## count = col_integer()
## )
## # A tibble: 3 x 2
## fruit count
## <chr> <int>
## 1 Apple 1
## 2 Banana 2
## 3 Carrot NA
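The options used above can be combined into one self-contained sketch. The temporary file below only imitates the layout of test_data_2.csv, with its contents inferred from the printed output, so treat them as an assumption. The explicit `col_types` and the final dplyr/stringr step are illustrative additions, not part of the original notebook. Note that `na = c("C")` replaces readr's default NA strings, so the sketch lists the empty string explicitly as well.

```r
library(readr)
library(dplyr)
library(stringr)

# Assumed stand-in for test_data_2.csv: a title row, a date row, the real
# header, three observations, an empty row, and a footnote.
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "A dataset proudly brought to you by StatsNZ,",
    "Jun-18,",
    "fruit,count",
    "Apple,1",
    "Banana,2",
    "Carrot,C",
    ",",
    "C means confidential,"
  ),
  path
)

fruit <- read_csv(
  path,
  skip = 2,         # skip the title and date rows above the header
  n_max = 3,        # stop before the empty row and the footnote
  na = c("", "C"),  # treat blanks and the confidentiality marker as NA
  col_types = cols( # state the expected types instead of relying on guessing
    fruit = col_character(),
    count = col_integer()
  )
)

# Example follow-up cleaning (hypothetical): drop suppressed counts and
# lower-case the fruit names.
fruit_clean <- fruit %>%
  filter(!is.na(count)) %>%
  mutate(fruit = str_to_lower(fruit))
```

Supplying `col_types` makes the read fail loudly with parsing problems if an unexpected value appears in `count`, rather than silently falling back to a character column.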
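Excel workbooks can be read in a similar way with the readxl package: excel_sheets() lists the sheets in a workbook, and read_excel() reads a single sheet, selected by position or by name.
readxl::excel_sheets("test_data_1.xlsx")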
## [1] "fruit" "vegetables"
readxl::read_excel("test_data_1.xlsx", sheet = 1)
## # A tibble: 3 x 2
## fruit count
## <chr> <dbl>
## 1 Apple 1
## 2 Banana 2
## 3 Carrot NA
readxl::read_excel("test_data_1.xlsx", sheet = "vegetables")
## # A tibble: 3 x 2
## vegetable count
## <chr> <dbl>
## 1 Tomato 5
## 2 Beans 2
## 3 Spinach 1
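When a workbook has several sheets, the two calls above can be combined to read them all at once. The following is a minimal sketch rather than part of the original notebook. It uses the example workbook that ships with readxl (located with `readxl_example()`) so that it runs without test_data_1.xlsx; replace `path` with your own file.

```r
library(readxl)

# Path to a workbook bundled with readxl; substitute your own .xlsx file.
path <- readxl_example("datasets.xlsx")

# List the sheet names, then read each sheet into its own tibble.
sheet_names <- excel_sheets(path)
sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))

# Name the list elements after the sheets so they can be accessed by name.
names(sheets) <- sheet_names
names(sheets)
```

`read_excel()` also accepts `skip`, `n_max` and `na`, so the header, footer and NA handling shown for `read_csv()` carries over to Excel files.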