Reference

R Programming for Data Science by Roger D. Peng, May 31, 2022

Textbook

1 Using dput() and dump()

One way to pass data around is by deparsing the R object with dput() and reading it back in (parsing it) using dget().

## Create a data frame
y <- data.frame(a = 1, b = "a")  
## Print 'dput' output to console
dput(y)     
## structure(list(a = 1, b = "a"), class = "data.frame", row.names = c(NA, 
## -1L))

Notice that the dput() output is in the form of R code and that it preserves metadata like the class of the object, the row names, and the column names.

The output of dput() can also be saved directly to a file.

## Send 'dput' output to a file
dput(y, file = "y.R")        
## Read in 'dput' output from a file
new.y <- dget("y.R")
str(new.y)
## 'data.frame':    1 obs. of  2 variables:
##  $ a: num 1
##  $ b: chr "a"
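The example above uses a data frame with only a numeric column and a character column. The sketch below round-trips a data frame that has a factor column, to check that less obvious metadata (the factor levels and their order) also survives dput() and dget(). It writes to a temporary file from tempfile() so the working directory stays clean. The names dat, dat2, and tmp are illustrative only.

```r
## A data frame with an integer column and a factor column
dat <- data.frame(id = 1:3,
                  grp = factor(c("lo", "hi", "lo"), levels = c("lo", "hi")))

## Deparse to a temporary .R file, then parse it back in
tmp <- tempfile(fileext = ".R")
dput(dat, file = tmp)
dat2 <- dget(tmp)

## Factor levels (in their original order), class, and row names survive
levels(dat2$grp)
## [1] "lo" "hi"
identical(dat, dat2)
## [1] TRUE
```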
## Read a larger data set from a local CSV file (it is dumped along with other objects below)
pop2022 <- read.csv("pop2022.csv")
str(pop2022)
## 'data.frame':    107785 obs. of  9 variables:
##  $ REGION             : chr  "BARMM" "BARMM" "BARMM" "BARMM" ...
##  $ PROVINCE           : chr  "BASILAN" "BASILAN" "BASILAN" "BASILAN" ...
##  $ MUNICIPALITY       : chr  "AKBAR" "AKBAR" "AKBAR" "AKBAR" ...
##  $ BARANGAY           : chr  "LINONGAN" "LINONGAN" "MANGUSO" "UPPER SINANGKAPAN" ...
##  $ PRECINCT_ID        : int  7080001 7080002 7080003 7080004 7080005 7080006 7080007 7080008 7080009 7080010 ...
##  $ CLUSTER            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ CLUSTERTOTAL       : int  423 411 313 655 479 678 633 715 698 725 ...
##  $ CLUSTERED_PRECINCTS: chr  "0012A, 0012P1, 0011A" "0011B, 0011C, 0012B" "0035A, 0035C, 0035P1, 0035B" "0050A, 0050B, 0050C, 0050D, 0050P1" ...
##  $ POLLINGCENTER      : chr  "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "AKBAR ELEMENTARY SCHOOL, BARANGAY UPPER BATO-BATO, AKBAR MUNICIPALITY" "BARANGAY UPPER SINANGKAPAN COVERED COURT (BUSCC), BARANGAY UPPER SINANGKAPAN" ...

Multiple objects can be deparsed at once using the dump() function and read back in using source().

x <- "foo"
y <- data.frame(a = 1L, b = "a")

We can dump() R objects to a file by passing a character vector of their names.

dump(c("x", "y", "pop2022"), file = "data.R")
## Remove x, y, and pop2022 from the workspace
rm(x, y, pop2022)

The inverse of dump() is source().

source("data.R")
x
## [1] "foo"
str(y)
## 'data.frame':    1 obs. of  2 variables:
##  $ a: int 1
##  $ b: chr "a"
str(pop2022)
## 'data.frame':    107785 obs. of  9 variables:
##  $ REGION             : chr  "BARMM" "BARMM" "BARMM" "BARMM" ...
##  $ PROVINCE           : chr  "BASILAN" "BASILAN" "BASILAN" "BASILAN" ...
##  $ MUNICIPALITY       : chr  "AKBAR" "AKBAR" "AKBAR" "AKBAR" ...
##  $ BARANGAY           : chr  "LINONGAN" "LINONGAN" "MANGUSO" "UPPER SINANGKAPAN" ...
##  $ PRECINCT_ID        : int  7080001 7080002 7080003 7080004 7080005 7080006 7080007 7080008 7080009 7080010 ...
##  $ CLUSTER            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ CLUSTERTOTAL       : int  423 411 313 655 479 678 633 715 698 725 ...
##  $ CLUSTERED_PRECINCTS: chr  "0012A, 0012P1, 0011A" "0011B, 0011C, 0012B" "0035A, 0035C, 0035P1, 0035B" "0050A, 0050B, 0050C, 0050D, 0050P1" ...
##  $ POLLINGCENTER      : chr  "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "AKBAR ELEMENTARY SCHOOL, BARANGAY UPPER BATO-BATO, AKBAR MUNICIPALITY" "BARANGAY UPPER SINANGKAPAN COVERED COURT (BUSCC), BARANGAY UPPER SINANGKAPAN" ...
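dump() writes ordinary R code, so the file does not have to be sourced into the global workspace. This is a minimal sketch that assumes you want to inspect dumped objects without overwriting existing objects of the same name. source() accepts an environment through its `local` argument. The environment `e` and the temporary file are illustrative.

```r
x <- "foo"
y <- data.frame(a = 1L, b = "a")

## Dump both objects (by name) to a temporary file
tmp <- tempfile(fileext = ".R")
dump(c("x", "y"), file = tmp)

## The file is plain R code that assigns each object
cat(readLines(tmp), sep = "\n")

## Evaluate that code inside a fresh environment instead of the workspace
e <- new.env()
source(tmp, local = e)
ls(e)
## [1] "x" "y"
identical(e$y, y)
## [1] TRUE
```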

2 Binary Formats

The complement to the textual format is the binary format. A binary format is sometimes necessary for efficiency, or because there is no useful way to represent the data as text. Numeric data can also lose precision when converted to text and back, so for such data it is safer to stick with a binary format.
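The precision point can be checked directly. This is a small sketch that assumes R's default deparse settings, under which dput() writes doubles with 15 significant digits. It compares that default against the documented `control = "digits17"` option and against a binary round trip. The variable names are illustrative.

```r
v <- 1 / 3
tmp <- tempfile(fileext = ".R")

## Default dput() writes 15 significant digits, so the value read back differs
dput(v, file = tmp)
v_text <- dget(tmp)
identical(v_text, v)
## [1] FALSE

## control = "digits17" writes enough digits for an exact text round trip
dput(v, file = tmp, control = "digits17")
v_text17 <- dget(tmp)
identical(v_text17, v)
## [1] TRUE

## serialize()/unserialize() keep the exact binary representation
v_bin <- unserialize(serialize(v, NULL))
identical(v_bin, v)
## [1] TRUE
```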

The key functions for converting R objects into a binary format are save(), save.image(), and serialize(). Individual R objects can be saved to a file using the save() function.

## rnorm() and runif() draw random numbers, so your values will differ
a <- data.frame(x = rnorm(100), y = runif(100))
b <- c(3, 4.4, 1 / 3)
str(a)
## 'data.frame':    100 obs. of  2 variables:
##  $ x: num  -0.707 -2.584 1.004 -0.615 -2.385 ...
##  $ y: num  0.997 0.829 0.509 0.58 0.218 ...
range(a$y)
## [1] 0.0007055944 0.9966275345
mean(a$x)
## [1] -0.1235815
sd(a$x)
## [1] 1.036628
summary(a)
##        x                 y            
##  Min.   :-2.5839   Min.   :0.0007056  
##  1st Qu.:-0.7329   1st Qu.:0.2002197  
##  Median :-0.1107   Median :0.5051831  
##  Mean   :-0.1236   Mean   :0.4874169  
##  3rd Qu.: 0.5120   3rd Qu.:0.7403850  
##  Max.   : 2.9730   Max.   :0.9966275
str(b)
##  num [1:3] 3 4.4 0.333
## Save 'a' and 'b' to a file
save(a, b, file = "mydata.rda")
rm(a, b)
## Load 'a' and 'b' back into your workspace
load("mydata.rda")
str(a)
## 'data.frame':    100 obs. of  2 variables:
##  $ x: num  -0.707 -2.584 1.004 -0.615 -2.385 ...
##  $ y: num  0.997 0.829 0.509 0.58 0.218 ...
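load() also invisibly returns the names of the objects it restored. This is useful when you do not remember what a `.rda` file contains. The sketch below uses a temporary file and the illustrative names `d1` and `v1` so it does not overwrite `a` and `b`.

```r
d1 <- data.frame(x = rnorm(5), y = runif(5))
v1 <- c(3, 4.4, 1 / 3)
tmp <- tempfile(fileext = ".rda")
save(d1, v1, file = tmp)

## Remove both objects, then restore them from the file
rm(d1, v1)
restored <- load(tmp)   # load() invisibly returns the restored names
restored
## [1] "d1" "v1"
exists("d1") && exists("v1")
## [1] TRUE
```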

The serialize() function converts an individual R object into a binary format that can be communicated across an arbitrary connection. The result may be sent to a file, but it could also be sent over a network or another connection.

When you call serialize() on an R object with connection = NULL, the output is a raw vector, which R prints in hexadecimal.

x <- list(1, 2:4, c(3,6))
x
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2 3 4
## 
## [[3]]
## [1] 3 6
x[[2]]
## [1] 2 3 4
m <- serialize(x, NULL)
m
##   [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
##  [26] 00 13 00 00 00 03 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00 00 00 00 00
##  [51] ee 00 00 00 02 00 00 00 01 00 04 00 09 00 00 00 0e 63 6f 6d 70 61 63 74 5f
##  [76] 69 6e 74 73 65 71 00 00 00 02 00 00 00 01 00 04 00 09 00 00 00 04 62 61 73
## [101] 65 00 00 00 02 00 00 00 0d 00 00 00 01 00 00 00 0d 00 00 00 fe 00 00 00 0e
## [126] 00 00 00 03 40 08 00 00 00 00 00 00 40 00 00 00 00 00 00 00 3f f0 00 00 00
## [151] 00 00 00 00 00 00 fe 00 00 00 0e 00 00 00 02 40 08 00 00 00 00 00 00 40 18
## [176] 00 00 00 00 00 00
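The hexadecimal digits are only how R prints a raw vector. Each element is a single byte. The short check below rebuilds the same list and inspects the first two bytes, which spell "X\n", the header R writes for its default XDR binary format.

```r
x <- list(1, 2:4, c(3, 6))
m <- serialize(x, NULL)

class(m)
## [1] "raw"

## Each element is one byte; as.integer() shows the decimal values
as.integer(m[1:2])
## [1] 88 10

## The first two bytes mark the default XDR binary format
rawToChar(m[1:2])
## [1] "X\n"
```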

You can send this to a file, but in that case you are better off using something like save() (or saveRDS() for a single object).

The benefit of serialize() is that it represents an R object exactly in an exportable format, without losing precision or metadata. If that is what you need, serialize() is the function for you. Its inverse, unserialize(), rebuilds the original object from the raw vector.

unserialize(m)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2 3 4
## 
## [[3]]
## [1] 3 6
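The following sketch shows the "arbitrary connection" point and the exactness claim together. It serializes straight to a binary file connection and checks that the object read back is identical, including 1/3, a value that a 15-digit text form would not preserve. For a single object saved to a file, saveRDS() and readRDS() provide the same round trip more conveniently. The object name `obj` and the temporary files are illustrative.

```r
obj <- list(1, 2:4, c(3, 6), third = 1 / 3)
tmp <- tempfile(fileext = ".bin")

## serialize() to an open binary connection instead of to a raw vector
con <- file(tmp, open = "wb")
serialize(obj, con)
close(con)

## unserialize() can read from an open binary connection as well
con <- file(tmp, open = "rb")
obj2 <- unserialize(con)
close(con)
identical(obj, obj2)
## [1] TRUE

## saveRDS()/readRDS(): the file-based shortcut for one object
rds <- tempfile(fileext = ".rds")
saveRDS(obj, rds)
identical(readRDS(rds), obj)
## [1] TRUE
```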
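## Download a PDF (mode = "wb" keeps the binary file intact on Windows)
pdf_url <- "https://ops.gov.ph/wp-content/uploads/2022/09/20220920-STATEMENT-BY-PRESIDENT-FERDINAND-ROMUALDEZ-MARCOS-JR.-AT-THE-HIGH-LEVEL-GENERAL-DEBATE-OF-THE-77TH-SESSION-OF-THE-UNITED-NATIONS-GENERAL-ASSEMBLY.pdf"
download.file(pdf_url, destfile = "marcos_speech.pdf", mode = "wb")
## Extract the text page by page from the downloaded file (requires the textreadr package)
library(textreadr)
marcos <- read_pdf("marcos_speech.pdf")
marcos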
## Table: [6 x 3]
## 
##   page_id element_id text                                     
## 1 1       1          OFFICE OF THE PRESS SECRETARY\nPRESIDENTI
## 2 2       1          Climate change is the greatest threat af 
## 3 3       1          This injustice was evident during the pa 
## 4 4       1          Our work must also focus on ensuring tha 
## 5 5       1          This requires investment in food securit 
## 6 6       1          But we also need to update the global st 
## . ...     ...        ...
## Read a pipe-delimited text file through a file connection
con <- file("iris.txt")
open(con, "r")
data <- read.table(con, sep = "|", header = TRUE)
data
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica
close(con)
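The example above assumes that a pipe-delimited iris.txt already exists in the working directory. The sketch below is self-contained. It writes a few rows of the built-in iris data to a temporary pipe-delimited file, then reads the file back through a file connection. The connection stays open between calls, so readLines() can consume only the header line, and read.table() then continues from the current position. The names `header` and `rows` are illustrative.

```r
## Write a small pipe-delimited copy of iris to a temporary file
tmp <- tempfile(fileext = ".txt")
write.table(head(iris, 3), file = tmp, sep = "|",
            quote = FALSE, row.names = FALSE)

con <- file(tmp, "r")             # create and open the connection in one step
header <- readLines(con, n = 1)   # consume only the first line
header
## [1] "Sepal.Length|Sepal.Width|Petal.Length|Petal.Width|Species"

## read.table() resumes after the header line on the open connection
rows <- read.table(con, sep = "|", header = FALSE,
                   col.names = strsplit(header, "|", fixed = TRUE)[[1]])
close(con)
rows
```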
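## Open a connection to a web page, read it, and close the connection
con <- url("https://ops.gov.ph/presidential-speech/")
open(con, "r")
speech_page <- readLines(con)
close(con)
## url() can also open the connection when it is created
con <- url("https://www.jhu.edu", "r")
x <- readLines(con)
close(con)
cat(head(x))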
## <!doctype html>  <html class="no-js" lang="en">   <head>     <script>     dataLayer = [];
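Connections are a limited resource, so it is good practice to close them even when reading fails. This is a minimal sketch using a hypothetical helper, `read_head()`, which is not part of base R. It opens whatever unopened connection it is given, registers close() with on.exit(), and returns the first n lines. The same helper works for file() and url() connections.

```r
## Hypothetical helper: read the first n lines of a connection, always closing it
read_head <- function(con, n = 6L) {
  open(con, "r")
  on.exit(close(con))            # runs even if readLines() signals an error
  readLines(con, n = n, warn = FALSE)
}

## Usage with a local file (no network needed)
tmp <- tempfile(fileext = ".txt")
writeLines(c("line 1", "line 2", "line 3"), tmp)
read_head(file(tmp), n = 2)
## [1] "line 1" "line 2"

## Usage with a web page (requires network access)
## html <- read_head(url("https://www.jhu.edu"))
```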