1 Introduction

In R, we can read data from files stored outside the R environment. We can also write data into files that will be stored and accessed by the operating system. R can read and write into various file formats like csv, excel, txt,rds, xml, json, etc.

2 Working Directory

Before we start working with data (interface data), first make sure your working directory in the right connection. You can check it by using the getwd() function. You can also set a new working directory using setwd() function.

## [1] "D:/Documents/Campus Documents/Semester 3/Algoritma dan Struktur Data/Week 9"
## [1] "D:/Documents/Campus Documents/Semester 3/Algoritma dan Struktur Data/Week 9"

3 Read/Write CSV

3.1 Read CSV

Here is a simple example of read.csv() function to read a CSV file available in your current working directory.

##   id    name salary start_date dept
## 1  1  Julian  623,3 2022-01-01   DS
## 2  2 Vanessa  515,2 2022-09-23   DS
## 3  3  Jeffry    611 2022-11-15   BA

3.2 Write CSV

R can create CSV file from existing data frame. The write.csv() function is used to create the CSV file. This file gets created in the working directory.

4 Read/Write Excel

4.1 Read Excel

Microsoft Excel is the most widely used spreadsheet program which stores data in the .xls or .xlsx format. R can read directly from these files using some excel specific packages. We will be using readxl package.

## # A tibble: 3 x 5
##      id name    salary start_date dept 
##   <dbl> <chr>   <chr>  <chr>      <chr>
## 1     1 Julian  623    44562      DS   
## 2     2 Vanessa 515    44827      DS   
## 3     3 Jeffry  611    44880      BA

5 Read/Write TXT and RDS

5.1 Read TXT and RDS

One of the most common tasks we perform is reading in data from CSV and XLSX files. However, for large CSV or XLSX files, this can be slow. One neat trick is to read in the data and save as a TXT or an R binary file (RDS). To import in the TXT file, we use read.table() and to import the RDS file we can use readRDS().

5.2 Write TXT and RDS

To save as a TXT file we can use write.table() function, and for R binary file (RDS) we can use saveRDS() function. They are widely used by R itself, for example, to store metadata for a package and to store the help.search databases: the .rds file extension is most often used. This format can be binary or ASCII. Binary is more compact, while ASCII will be more efficient with version control systems like Git.

6 Read/Write XML

XML is a file format that shares both the file format and the data on the World Wide Web, intranets, and elsewhere using standard ASCII text. It stands for Extensible Markup Language (XML). Similar to HTML it contains markup tags. But unlike HTML where the markup tag describes the structure of the page, in XML the markup tags describe the meaning of the data contained in the file. You can read an XML file in R using the XML package. The XML file is read by R using the function xmlParse(). It is stored as a list in R.

## <?xml version="1.0"?>
## <RECORDS>
##   <EMPLOYEE>
##     <id>1</id>
##     <name>Julian</name>
##     <salary>623.3</salary>
##     <start_date>1/1/2022</start_date>
##     <dept>DS</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>2</id>
##     <name>Vanessa</name>
##     <salary>515.2</salary>
##     <start_date>9/23/2022</start_date>
##     <dept>DS</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>3</id>
##     <name>Jeffry</name>
##     <salary>611</salary>
##     <start_date>11/15/2022</start_date>
##     <dept>BA</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>4</id>
##     <name>Angel</name>
##     <salary>729</salary>
##     <start_date>5/11/2022</start_date>
##     <dept>BA</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>5</id>
##     <name>Nikki</name>
##     <salary>843.25</salary>
##     <start_date>3/27/2022</start_date>
##     <dept>DS</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>6</id>
##     <name>Ardifo</name>
##     <salary>578</salary>
##     <start_date>5/21/2022</start_date>
##     <dept>Actuaries</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>7</id>
##     <name>Irene</name>
##     <salary>722.5</salary>
##     <start_date>7/30/2022</start_date>
##     <dept>Actuaries</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>8</id>
##     <name>Kefas</name>
##     <salary>632.8</salary>
##     <start_date>6/17/2022</start_date>
##     <dept>CA</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>9</id>
##     <name>Sherly</name>
##     <salary>632.8</salary>
##     <start_date>7/30/2022</start_date>
##     <dept>DE</dept>
##   </EMPLOYEE>
##   <EMPLOYEE>
##     <id>10</id>
##     <name>Bakti</name>
##     <salary>NA</salary>
##     <start_date>9/03/2018</start_date>
##     <dept>Lecturer</dept>
##   </EMPLOYEE>
## </RECORDS>
## 

To handle the data effectively in large files we read the data in the XML file as a data frame. Then process the data frame for data analysis.

##   id    name salary start_date dept
## 1  1  Julian  623.3   1/1/2022   DS
## 2  2 Vanessa  515.2  9/23/2022   DS
## 3  3  Jeffry    611 11/15/2022   BA

Create a XML file by copying this data into a text editor like notepad. Save the file with a .xml extension and choosing the file type as all files(.).

7 Read/Write JSON

JSON file stores data as text in human-readable format. Json stands for JavaScript Object Notation. R can read JSON files using the rjson package. more about JSON.

## $id
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
## 
## $name
##  [1] "Julian"  "Vanessa" "Jeffry"  "Angel"   "Nikki"   "Ardifo"  "Irene"  
##  [8] "Kefas"   "Sherly"  "Bakti"  
## 
## $salary
##  [1] "623.3"  "515.2"  "611"    "729"    "843.25" "578"    "722.5"  "632.8" 
##  [9] "632.8"  "NA"    
## 
## $start_date
##  [1] "1/1/2022"   "9/23/2022"  "11/15/2022" "5/11/2022"  "3/27/2022" 
##  [6] "5/21/2022"  "7/30/2022"  "6/17/2022"  "7/30/2022"  "9/3/2018"  
## 
## $dept
##  [1] "DS"        "DS"        "BA"        "DA"        "DS"        "Actuaries"
##  [7] "Actuaries" "CA"        "DE"        "Lecturer"

We can convert the extracted data above to a R data frame for further analysis using the as.data.frame() function.

##   id    name salary start_date dept
## 1  1  Julian  623.3   1/1/2022   DS
## 2  2 Vanessa  515.2  9/23/2022   DS
## 3  3  Jeffry    611 11/15/2022   BA

8 Read Data From Web

Many websites provide data for consumption by its users. Using R programs, we can programmatically extract specific data from such websites. In this section, I provide examples of how to import data from the GitHub repository, but you can do the same thing to other websites or repository.

8.1 CSV:

##   id    name salary start_date dept
## 1  1  Julian  623,3   1/1/2022   DS
## 2  2 Vanessa  515,2  9/23/2022   DS
## 3  3  Jeffry    611 11/15/2022   BA

9 R-Databases

The data is Relational database systems are stored in a normalized format. So, to carry out statistical computing we will need very advanced and complex SQL queries. But R can connect easily to many relational databases like MySql, Oracle, SQL Server, etc. and fetch records from them as a data frame. Once the data is available in the R environment, it becomes a normal R data set and can be manipulated or analyzed using all the powerful packages and functions.

Note: We will learn this section in the specific section called “Database System with R”.

10 Webscraping Data

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database. The scraper can then replicate entire website content elsewhere.

Note: We will learn this section in the specific section called “Applied Data Science”.