file(): Creates a connection to the file at the given path
open(), close(): Open and close a connection
scan(): Reads data into a vector of a single data type from the console or a file
readLines(): Reads text lines from a connection

Using scan() and readLines(), R can read any character data from any connection.

# Scan character data from a text file; what = "character" tells scan() to expect character data
scan(file = "Z:/R Training/character_data.txt", what = "character")
[1] "This" "is" "a" "test" "file" "containg"
[7] "character" "data."
# Establish a file connection
char_data_file <- file("Z:/R Training/character_data.txt")
# Collapse the character vector into one character string (str_c() comes from the stringr package)
library(stringr)
str_c(scan(char_data_file, what = "character"), collapse = " ")
[1] "This is a test file containg character data."
# readLines() reads text from the input source and returns each line as one character string;
# n is the maximum number of lines to read
readLines(char_data_file, n = 1)
[1] "This is a test file containg character data. "
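In the calls above the connection is never opened explicitly, so each reading function opens it, reads from the top, and closes it again. The following is a minimal sketch of the explicit open()/close() pattern instead. It assumes a hypothetical three-line sample written to a temporary file, not the training file used above.

```r
# Write a small, hypothetical sample file to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), tmp)

# Create the connection, then open it explicitly in read-text mode.
con <- file(tmp)
open(con, open = "rt")

# While the connection stays open, each read continues where the last one stopped.
first <- readLines(con, n = 1)
rest <- readLines(con)

# Close the connection when finished, then remove the temporary file.
close(con)
unlink(tmp)

print(first)
print(rest)
```

Opening the connection once is mainly useful when a file should be consumed in pieces rather than re-read from the beginning on every call.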
The function read.table() is the most common base R function for reading in rectangular data.

Arguments for read.table():
file: Name of the file or connection
header: A boolean value indicating whether the file has a header line
sep: A string denoting how the columns are separated
colClasses: A character vector giving the class of each column
nrows: The number of rows to be read. The default is to read the entire file.
skip: The number of lines to skip from the beginning
stringsAsFactors: A boolean value indicating whether character vectors should be coded as factors. The default is TRUE in R versions before 4.0.0 and FALSE from R 4.0.0 onward. FALSE is usually the better choice. This is because you can have character vectors in the data that are not categorical variables. Another reason is that even if they are encoded as factors, they might not contain all the levels, and you will have to add levels anyway.

read.csv(): Comma-separated file. sep has a default value of ",".
read.csv2(): Semicolon-separated file. sep has a default value of ";".
read.delim(): Useful for reading in tab-separated data. sep has a default value of "\t".

file.size() and object.size() will give the size of files and objects.
Specifying the colClasses argument will make read.table() much faster.

# Enrollment data prepared for Factbook
ir_dir <- "R:/Institutional Research"
enrollment_data <- read.csv(str_c(ir_dir, "Tableau","University Factbook", "Source Data",
"Enrollment.csv", sep = "/" ), stringsAsFactors = FALSE)
class(enrollment_data)
[1] "data.frame"
dim(enrollment_data)
[1] 107840     56
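The read.table() arguments listed earlier can be tried out on a small file. This is a minimal sketch, assuming a hypothetical semicolon-separated sample written to a temporary file. The column names and values are invented for illustration and are not part of the enrollment data.

```r
# Hypothetical sample: one note line to skip, a header line, and three data rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c("exported for illustration",
             "id;name;score",
             "1;Ann;90.5",
             "2;Bob;85",
             "3;Cy;77.25"), tmp)

students <- read.table(
  file = tmp,                # path to the file (a connection also works)
  header = TRUE,             # first line after the skipped one holds column names
  sep = ";",                 # columns are separated by semicolons
  colClasses = c("integer", "character", "numeric"),  # one class per column
  nrows = 2,                 # read only the first two data rows
  skip = 1,                  # skip the note line at the top
  stringsAsFactors = FALSE   # keep character columns as character
)

str(students)
unlink(tmp)
```

Because colClasses is supplied, read.table() does not have to guess each column's type, which is where the speed-up on large files comes from.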
# WWR 2040 file
# IR directory
ir_dir <- "R:/Institutional Research"
wwr2040_ftfy_2017 <- read.delim(str_c(ir_dir, "Data Management", "Staging", "WWR2040", "FTFY",
"WWR2040_2017_09_15.txt", sep = "/" ), stringsAsFactors = FALSE)
class(wwr2040_ftfy_2017)
[1] "data.frame"
head(wwr2040_ftfy_2017)
     pidm        id   term majr_code   majr_desc levl_code coll_code year
1 2861794 873286069 201770      MUPR Performance        UG        AH 2017
2 2859708 873283983 201770      MUPR Performance        UG        AH 2017
3 2865126 873289395 201770      MUPR Performance        UG        AH 2017
4 2849305 873273584 201770      MUPR Performance        UG        AH 2017
5 2858542 873282817 201770      MUPR Performance        UG        AH 2017
6 2867043 873291309 201770      MUPR Performance        UG        AH 2017
  appl_cnt comp_cnt acce_cnt conf_cnt
1        1        1        1        0
2        1        1        1        0
3        1        1        1        0
4        1        1        1        0
5        1        1        1        0
6        1        1        1        0
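Since read.delim() is a wrapper around read.table() with a tab separator and a header line, the two can be compared directly, and file.size() and object.size() can be used to check sizes on disk and in memory. This is a minimal sketch, assuming a tiny hypothetical tab-separated sample that borrows three column names from the output above. The values are made up.

```r
# Hypothetical tab-separated sample written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("term\tmajr_code\tappl_cnt",
             "201770\tMUPR\t1",
             "201770\tMUPR\t1"), tmp)

# Read the same file through the wrapper and through read.table() directly.
via_delim <- read.delim(tmp, stringsAsFactors = FALSE)
via_table <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# Check whether both calls produced equivalent data frames.
same_result <- isTRUE(all.equal(via_delim, via_table))
print(same_result)

# Size of the file on disk (in bytes) and of the data frame in memory.
bytes_on_disk <- file.size(tmp)
print(bytes_on_disk)
print(format(object.size(via_delim), units = "auto"))

unlink(tmp)
```

Comparing the two sizes before reading a large extract gives a rough idea of whether the resulting data frame will fit comfortably in memory.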