1 Introduction

In R, we can read data from files stored outside the R environment. We can also write data into files that will be stored and accessed by the operating system. R can read and write into various file formats like csv, excel, txt,rds, xml, json, etc.

2 Working Directory

Before we start working with data (interface data), first make sure your working directory in the right connection. You can check it by using the getwd() function. You can also set a new working directory using setwd() function

print(getwd())   # get and print current working directory
## [1] "C:/Users/USER/Documents/SEMESTER 3/ALGORITMA/R Vanessa"
getwd()          # get and print current working directory
## [1] "C:/Users/USER/Documents/SEMESTER 3/ALGORITMA/R Vanessa"
setwd("C:/Users/USER/Documents/SEMESTER 3/ALGORITMA/R Vanessa")                  # set your working directory (this is                                                                                      example)
setwd("C:/Users/USER/Documents/SEMESTER 3/ALGORITMA/R Vanessa\\")              # or this way

3 Read/Write CSV

Here is a simple example of read.csv() function to read a CSV file available in your current working directory

# csv <- read.csv(file.choose())                  # interface csv file without `setwd`
csv1 <- read.csv("csv1.csv",sep = ",")          # this is read data for comma separator
csv2 <- read.csv("csv2.csv",sep = ";")       # this is read data for semicolon separator 
head(csv2,3)                                      # print the result of `data1`
##   id    name salary start_date dept
## 1  1  Julian  623,3 2022-01-01   DS
## 2  2 Vanessa  515,2 2022-09-23   DS
## 3  3  Jeffry    611 2022-11-15   BA

R can create CSV file from existing data frame. The write.csv() function is used to create the CSV file. This file gets created in the working directory.

write.csv(csv1,"csv3.csv")                # check the output in your working directory

4 Read/Write Excel

Microsoft Excel is the most widely used spreadsheet program which stores data in the .xls or .xlsx format. R can read directly from these files using some excel specific packages. We will be using readxl package.

#xlsx <- read_excel(file.choose())                # interface xlsx file without `setwd`
#install.packages("readxl")                       # install `readxl` packages 
library("readxl")                                  # load `readxl` packages
xlsx1<-read_excel("xlsx1.xlsx",sheet=1)       # read/import data xlsx from PC
head(xlsx1,3)
## # A tibble: 3 x 5
##      id name    salary start_date dept 
##   <dbl> <chr>   <chr>  <chr>      <chr>
## 1     1 Julian  623    44562      DS   
## 2     2 Vanessa 515    44827      DS   
## 3     3 Jeffry  611    44880      BA

In order to write existing data frame into excel file you have to install writexl package

#install.packages("writexl")                       # install `readxl` packages 
library("writexl")                                 # load `readxl` packages 
writexl::write_xlsx(xlsx1,"xlsx2.xlsx")       # check the output in your working directory

5 Read/Write TXT and RDS

One of the most common tasks we perform is reading in data from CSV and XLSX files. However, for large CSV or XLSX files, this can be slow. One neat trick is to read in the data and save as a TXT or an R binary file (RDS). To import in the TXT file, we useread.table() and to import the RDS file we can use readRDS().

# txt1 <- read.table(file.choose())                # interface TXT file without `setwd`
txt1 <- read.table("txt1.txt")                # read/load TXT format (notepad)
txt1 <- source("txt1.Rdmpd")                  # read/load TXT format (Rdmpd)
rds1 <- readRDS("rds1.rds")                   # read/load binary RDS format
ascii1 <- readRDS("ascii1.rds")               # read/load binary ASCII format

To save as a TXT file we can use “write.table()” function, and for R binary file (RDS) we can use saveRDS() function. They are widely used by R itself, for example, to store metadata for a package and to store the help.search databases: the “.rds” file extension is most often used. This format can be binary or ASCII. Binary is more compact, while ASCII will be more efficient with version control systems like Git

data <- read.csv("csv1.csv",sep = ",")        # read/import data from your PC
write.table(data,"txt2.txt")                  # save in a TXT format (notepad)
dump("data", "txt2.Rdmpd")                    # save in a TXT format (Rdmpd)
saveRDS(data, "rds2.rds")                     # save a single object in binary RDS format
saveRDS(data, "ascii2.rds", ascii=TRUE)            # save a single object in ASCII RDS format

6 Read/Write XML

XML is a file format that shares both the file format and the data on the World Wide Web, intranets, and elsewhere using standard ASCII text. It stands for Extensible Markup Language (XML). Similar to HTML it contains markup tags. But unlike HTML where the markup tag describes the structure of the page, in XML the markup tags describe the meaning of the data contained in the file. You can read an XML file in R using the “XML” package. The XML file is read by R using the function xmlParse(). It is stored as a list in R.

library("XML")                                     # load the package required to read XML files
library("methods")                                 # also load the other required package.
result <- xmlParse(file = "xml1.xml")         # give the input file name to the function
print(result)                                      # Print the result
## <?xml version="1.0"?>
## <RECORDS>
##   <EMPLOYEE>
##     <id>1</id>
##     <name>Julian</name>
##     <salary>623.3</salary>
##     <start_date>1/1/2022</start_date>
##     <dept>DS</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>2</id>
##     <name>Vanessa</name>
##     <salary>515.2</salary>
##     <start_date>9/23/2022</start_date>
##     <dept>DS</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>3</id>
##     <name>Jeffry</name>
##     <salary>611</salary>
##     <start_date>11/15/2022</start_date>
##     <dept>BA</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>4</id>
##     <name>Angel</name>
##     <salary>729</salary>
##     <start_date>5/11/2022</start_date>
##     <dept>BA</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>5</id>
##     <name>Nikki</name>
##     <salary>843.25</salary>
##     <start_date>3/27/2022</start_date>
##     <dept>DS</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>6</id>
##     <name>Ardifo</name>
##     <salary>578</salary>
##     <start_date>5/21/2022</start_date>
##     <dept>Actuaries</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>7</id>
##     <name>Irene</name>
##     <salary>722.5</salary>
##     <start_date>7/30/2022</start_date>
##     <dept>Actuaries</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>8</id>
##     <name>Kefas</name>
##     <salary>632.8</salary>
##     <start_date>6/17/2022</start_date>
##     <dept>CA</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>9</id>
##     <name>Sherly</name>
##     <salary>632.8</salary>
##     <start_date>7/30/2022</start_date>
##     <dept>DE</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>10</id>
##     <name>Bakti</name>
##     <salary>NA</salary>
##     <start_date>9/03/2018</start_date>
##     <dept>Lecturer</dept>
##   </EMPLOYEE>
## </RECORDS>
## 

To handle the data effectively in large files we read the data in the XML file as a data frame. Then process the data frame for data analysis.

xmldataframe <- xmlToDataFrame(result)             # convert the input xml file to a data frame
head(xmldataframe,3)                               # Print the result
##   id    name salary start_date dept
## 1  1  Julian  623.3   1/1/2022   DS
## 2  2 Vanessa  515.2  9/23/2022   DS
## 3  3  Jeffry    611 11/15/2022   BA

Create a XML file by copying this data into a text editor like notepad. Save the file with a .xml extension and choosing the file type as all files(.).

7 Read/Write JSON

JSON file stores data as text in human-readable format. Json stands for JavaScript Object Notation. R can read JSON files using the rjson package. more about JSON.

library("rjson")                                   # load the package to read JSON files
json1 <- fromJSON(file= "json1.json")         # give the input file name to the function
print(json1)                                       # print the result
## $id
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
## 
## $name
##  [1] "Julian"  "Vanessa" "Jeffry"  "Angel"   "Nikki"   "Ardifo"  "Irene"  
##  [8] "Kefas"   "Sherly"  "Bakti"  
## 
## $salary
##  [1] "623.3"  "515.2"  "611"    "729"    "843.25" "578"    "722.5"  "632.8" 
##  [9] "632.8"  "NA"    
## 
## $start_date
##  [1] "1/1/2022"   "9/23/2022"  "11/15/2022" "5/11/2022"  "3/27/2022" 
##  [6] "5/21/2022"  "7/30/2022"  "6/17/2022"  "7/30/2022"  "9/3/2018"  
## 
## $dept
##  [1] "DS"        "DS"        "BA"        "DA"        "DS"        "Actuaries"
##  [7] "Actuaries" "CA"        "DE"        "Lecturer"

We can convert the extracted data above to a R data frame for further analysis using the as.data.frame()function.

json_data_frame <- as.data.frame(json1)            # convert JSON file to a data frame
head(json_data_frame,3)                            # print the result
##   id    name salary start_date dept
## 1  1  Julian  623.3   1/1/2022   DS
## 2  2 Vanessa  515.2  9/23/2022   DS
## 3  3  Jeffry    611 11/15/2022   BA

8 Read Data from Web

Many websites provide data for consumption by its users. Using R programs, we can programmatically extract specific data from such websites. In this section, I provide examples of how to import data from the GitHub repository, but you can do the same thing to other websites or repository.

8.1 CSV:

web_csv <- read.csv("https://github.com/Bakti-Siregar/dataset/raw/master/Bookdown-Data-Science-for-Beginners/csv1.csv")
head(web_csv,3)
##   id    name salary start_date dept
## 1  1  Julian  623,3   1/1/2022   DS
## 2  2 Vanessa  515,2  9/23/2022   DS
## 3  3  Jeffry    611 11/15/2022   BA

8.2 XLSX:

library(rio)                                       # import data from github
install_formats()                                  # cross-check suggested packages
## [1] TRUE
web_xlsx <-rio::import("https://github.com/Bakti-Siregar/dataset/blob/master/Bookdown-Data-Science-for-Beginners/xlsx1.xlsx?raw=true")
head(web_xlsx,3)
##   id    name salary start_date dept
## 1  1  Julian    623 2022-01-01   DS
## 2  2 Vanessa    515 2022-09-23   DS
## 3  3  Jeffry    611 2022-11-15   BA

8.3 OTHERS

web_txt <- read.table("type URL/Web.txt here")     # read/load TXT format (notepad) from web
web_rds <- readRDS("type URL/Web.rds here")        # read/load RDS format from web
web_ascii <- readRDS("type URL/Web.ascii here")    # read/load ASCII format from web
web_xml<- xmlParse("type URL/Web.xml here")        # read/load XML format from web
xmlToDataFrame(web_xml)                            # convert the input xml file to a data frame
web_json <- fromJSON("type URL/Web.json here")     # read/load JSON format from web
as.data.frame(web_json)                            # convert JSON file to a data frame

9 R-Databases

The data is Relational database systems are stored in a normalized format. So, to carry out statistical computing we will need very advanced and complex SQL queries. But R can connect easily to many relational databases like MySql, Oracle, SQL Server, etc. and fetch records from them as a data frame. Once the data is available in the R environment, it becomes a normal R data set and can be manipulated or analyzed using all the powerful packages and functions.

Note: We will learn this section in the specific section called “Database System with R”

10 Webscraping Data

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.

Note: We will learn this section in the specific section called “Applied Data Science”