dput() and dump()
One way to pass data around is by deparsing the R object with
dput() and reading it back in (parsing it) using
dget().
## Create a data frame
y <- data.frame(a = 1, b = "a")
## Print 'dput' output to console
dput(y)
## structure(list(a = 1, b = "a"), class = "data.frame", row.names = c(NA,
## -1L))
Notice that the dput() output is in the form of R code
and that it preserves metadata like the class of the object, the row
names, and the column names.
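You can check that a dput() round trip preserves metadata without writing a file: capture the printed code as text, then parse and evaluate it again. Parsing and evaluating is essentially what dget() does with a file. The sketch below uses a small illustrative factor that is not part of the example above, so its levels attribute also has to survive. capture.output(), parse(), and eval() are all base R.

```r
## An illustrative factor whose 'levels' attribute must survive the round trip
z <- factor(c("lo", "hi", "lo"), levels = c("lo", "hi"))

## Capture what dput() prints, as a character vector of R code
txt <- capture.output(dput(z))

## Parse and evaluate that code (what dget() does with a file's contents)
z2 <- eval(parse(text = txt))

identical(z, z2)  # TRUE
levels(z2)        # "lo" "hi"
```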
The output of dput() can also be saved directly to a
file.
## Send 'dput' output to a file
dput(y, file = "y.R")
## Read in 'dput' output from a file
new.y <- dget("y.R")
str(new.y)
## 'data.frame': 1 obs. of 2 variables:
## $ a: num 1
## $ b: chr "a"
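The same tools also work on larger objects. As an example, read a larger data frame of precinct-level data from a local CSV file, pop2022.csv:
pop2022 <- read.csv("pop2022.csv")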
str(pop2022)
## 'data.frame': 107785 obs. of 9 variables:
## $ REGION : chr "BARMM" "BARMM" "BARMM" "BARMM" ...
## $ PROVINCE : chr "BASILAN" "BASILAN" "BASILAN" "BASILAN" ...
## $ MUNICIPALITY : chr "AKBAR" "AKBAR" "AKBAR" "AKBAR" ...
## $ BARANGAY : chr "LINONGAN" "LINONGAN" "MANGUSO" "UPPER SINANGKAPAN" ...
## $ PRECINCT_ID : int 7080001 7080002 7080003 7080004 7080005 7080006 7080007 7080008 7080009 7080010 ...
## $ CLUSTER : int 1 2 3 4 5 6 7 8 9 10 ...
## $ CLUSTERTOTAL : int 423 411 313 655 479 678 633 715 698 725 ...
## $ CLUSTERED_PRECINCTS: chr "0012A, 0012P1, 0011A" "0011B, 0011C, 0012B" "0035A, 0035C, 0035P1, 0035B" "0050A, 0050B, 0050C, 0050D, 0050P1" ...
## $ POLLINGCENTER : chr "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "AKBAR ELEMENTARY SCHOOL, BARANGAY UPPER BATO-BATO, AKBAR MUNICIPALITY" "BARANGAY UPPER SINANGKAPAN COVERED COURT (BUSCC), BARANGAY UPPER SINANGKAPAN" ...
Multiple objects can be deparsed at once using the dump()
function and read back in using source().
x <- "foo"
y <- data.frame(a = 1L, b = "a")
We can dump() R objects to a file by passing a character
vector of their names.
dump(c("x", "y", "pop2022"), file = "data.R")
## Remove 'x', 'y', and 'pop2022' from the workspace
rm(x, y, pop2022)
The inverse of dump() is source().
source("data.R")
x
## [1] "foo"
str(y)
## 'data.frame': 1 obs. of 2 variables:
## $ a: int 1
## $ b: chr "a"
str(pop2022)
## 'data.frame': 107785 obs. of 9 variables:
## $ REGION : chr "BARMM" "BARMM" "BARMM" "BARMM" ...
## $ PROVINCE : chr "BASILAN" "BASILAN" "BASILAN" "BASILAN" ...
## $ MUNICIPALITY : chr "AKBAR" "AKBAR" "AKBAR" "AKBAR" ...
## $ BARANGAY : chr "LINONGAN" "LINONGAN" "MANGUSO" "UPPER SINANGKAPAN" ...
## $ PRECINCT_ID : int 7080001 7080002 7080003 7080004 7080005 7080006 7080007 7080008 7080009 7080010 ...
## $ CLUSTER : int 1 2 3 4 5 6 7 8 9 10 ...
## $ CLUSTERTOTAL : int 423 411 313 655 479 678 633 715 698 725 ...
## $ CLUSTERED_PRECINCTS: chr "0012A, 0012P1, 0011A" "0011B, 0011C, 0012B" "0035A, 0035C, 0035P1, 0035B" "0050A, 0050B, 0050C, 0050D, 0050P1" ...
## $ POLLINGCENTER : chr "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "LINONGAN ELEMENTARY SCHOOL, BARANGAY LINONGAN, AKBAR MUNICIPALITY" "AKBAR ELEMENTARY SCHOOL, BARANGAY UPPER BATO-BATO, AKBAR MUNICIPALITY" "BARANGAY UPPER SINANGKAPAN COVERED COURT (BUSCC), BARANGAY UPPER SINANGKAPAN" ...
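The following sketch shows what dump() writes to disk, and how to restore the objects into a separate environment so that existing objects in the workspace are not overwritten. It writes to a file from tempfile() instead of "data.R" so that it can run anywhere. The `local` argument of source() accepts an environment.

```r
## Illustrative objects (same values as above)
x <- "foo"
y <- data.frame(a = 1L, b = "a")

## Write them to a temporary file instead of "data.R"
f <- tempfile(fileext = ".R")
dump(c("x", "y"), file = f)

## The file is plain R code: one assignment per object
code <- readLines(f)
print(code)

## Restore the objects into a fresh environment rather than the workspace
e <- new.env()
source(f, local = e)
identical(e$x, x)  # TRUE
identical(e$y, y)  # TRUE
```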
The complement to the textual format is the binary format. A binary format is sometimes needed for efficiency, or because there is no useful way to represent the data as text. Numeric data can also lose precision when converted to text and back, so binary storage is the safer choice for numbers.
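The sketch below shows this precision loss with a single double. deparse() produces the same text that dput() prints, and by default it writes 15 significant digits. That is not enough to reproduce 1/3 exactly. The "digits17" deparse option and binary serialization both recover the exact value.

```r
b <- 1 / 3

## Default deparse()/dput() settings: 15 significant digits
via_text <- eval(parse(text = deparse(b)))

## 17 significant digits are enough to round-trip a double
via_digits17 <- eval(parse(text = deparse(b, control = "digits17")))

## Binary serialization stores the exact bits
via_binary <- unserialize(serialize(b, NULL))

identical(via_text, b)      # FALSE
identical(via_digits17, b)  # TRUE
identical(via_binary, b)    # TRUE
```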
The key functions for converting R objects into a binary format are
save(), save.image(), and
serialize(). Individual R objects can be saved to a file
using the save() function.
a <- data.frame(x = rnorm(100), y = runif(100))
b <- c(3, 4.4, 1 / 3)
str(a)
## 'data.frame': 100 obs. of 2 variables:
## $ x: num -0.707 -2.584 1.004 -0.615 -2.385 ...
## $ y: num 0.997 0.829 0.509 0.58 0.218 ...
range(a$y)
## [1] 0.0007055944 0.9966275345
mean(a$x)
## [1] -0.1235815
sd(a$x)
## [1] 1.036628
summary(a)
## x y
## Min. :-2.5839 Min. :0.0007056
## 1st Qu.:-0.7329 1st Qu.:0.2002197
## Median :-0.1107 Median :0.5051831
## Mean :-0.1236 Mean :0.4874169
## 3rd Qu.: 0.5120 3rd Qu.:0.7403850
## Max. : 2.9730 Max. :0.9966275
str(b)
## num [1:3] 3 4.4 0.333
## Save 'a' and 'b' to a file
save(a, b, file = "mydata.rda")
rm(a)
## Load 'a' and 'b' into your workspace
load("mydata.rda")
str(a)
## 'data.frame': 100 obs. of 2 variables:
## $ x: num -0.707 -2.584 1.004 -0.615 -2.385 ...
## $ y: num 0.997 0.829 0.509 0.58 0.218 ...
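load() can also restore objects into a separate environment, and it invisibly returns the names of the objects it restored. save.image() is the whole-workspace variant of save(). It saves every object in the global environment, to ".RData" in the working directory by default. The sketch below uses temporary files instead of "mydata.rda" and ".RData".

```r
a <- data.frame(x = rnorm(100), y = runif(100))
b <- c(3, 4.4, 1 / 3)

## Use a temporary file instead of "mydata.rda"
f <- tempfile(fileext = ".rda")
save(a, b, file = f)

## Load into a new environment; load() returns the restored names
e <- new.env()
loaded <- load(f, envir = e)
loaded
identical(e$a, a)  # TRUE
identical(e$b, b)  # TRUE

## save.image() saves everything in the global environment
img <- tempfile(fileext = ".RData")
save.image(file = img)
file.exists(img)   # TRUE
```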
The serialize() function converts an individual R object
into a binary format that can be communicated across an
arbitrary connection. The result may be written to a file, but it can
also be sent over a network or any other connection.
When you call serialize() on an R object, the output
is a raw vector, which R prints as hexadecimal bytes.
x <- list(1, 2:4, c(3,6))
x
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2 3 4
##
## [[3]]
## [1] 3 6
x[[2]]
## [1] 2 3 4
m <- serialize(x, NULL)
m
## [1] 58 0a 00 00 00 03 00 04 02 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
## [26] 00 13 00 00 00 03 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00 00 00 00 00
## [51] ee 00 00 00 02 00 00 00 01 00 04 00 09 00 00 00 0e 63 6f 6d 70 61 63 74 5f
## [76] 69 6e 74 73 65 71 00 00 00 02 00 00 00 01 00 04 00 09 00 00 00 04 62 61 73
## [101] 65 00 00 00 02 00 00 00 0d 00 00 00 01 00 00 00 0d 00 00 00 fe 00 00 00 0e
## [126] 00 00 00 03 40 08 00 00 00 00 00 00 40 00 00 00 00 00 00 00 3f f0 00 00 00
## [151] 00 00 00 00 00 00 fe 00 00 00 0e 00 00 00 02 40 08 00 00 00 00 00 00 40 18
## [176] 00 00 00 00 00 00
If you want, this can be sent to a file, but in that case you are
better off using something like save().
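The bytes printed above follow a fixed layout. The first two bytes are a header that names the encoding: "X\n" (58 0a) for the default XDR binary format, and "A\n" when `ascii = TRUE` requests a text-based serialization. The sketch below checks both.

```r
x <- list(1, 2:4, c(3, 6))

m <- serialize(x, NULL)
class(m)                 # "raw"
rawToChar(m[1:2])        # "X\n": header of the default XDR binary format

## ascii = TRUE produces a text-based serialization instead
m_ascii <- serialize(x, NULL, ascii = TRUE)
rawToChar(m_ascii[1:2])  # "A\n"
```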
The benefit of serialize() is that it represents an R object exactly
in an exportable format, without losing precision or any metadata
(save() writes its files using the same serialization mechanism). If
that is what you need, then serialize() is the function for you. The
inverse, unserialize(), rebuilds the object from the raw vector:
unserialize(m)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2 3 4
##
## [[3]]
## [1] 3 6
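serialize() can also write directly to a connection rather than returning a raw vector, and unserialize() can read from one. The sketch below uses a binary file connection to a temporary file. Any other binary connection follows the same pattern. The connection must be opened in binary mode ("wb" for writing, "rb" for reading).

```r
x <- list(1, 2:4, c(3, 6))

## Serialize directly to a binary file connection
f <- tempfile(fileext = ".bin")
con <- file(f, open = "wb")
serialize(x, con)
close(con)

## Read it back through another connection
con <- file(f, open = "rb")
x2 <- unserialize(con)
close(con)

identical(x, x2)  # TRUE
```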
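Files on the web can be downloaded with download.file() and then read locally. Here a PDF is downloaded and read with read_pdf() from the textreadr package.
pdf_url <- "https://ops.gov.ph/wp-content/uploads/2022/09/20220920-STATEMENT-BY-PRESIDENT-FERDINAND-ROMUALDEZ-MARCOS-JR.-AT-THE-HIGH-LEVEL-GENERAL-DEBATE-OF-THE-77TH-SESSION-OF-THE-UNITED-NATIONS-GENERAL-ASSEMBLY.pdf"
## mode = "wb" forces a binary transfer; without it, Windows can corrupt a PDF
download.file(pdf_url, destfile = "marcos_speech.pdf", mode = "wb")
library(textreadr)
## Read the local copy rather than fetching the file a second time
marcos <- read_pdf("marcos_speech.pdf")
marcos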
## Table: [6 x 3]
##
## page_id element_id text
## 1 1 1 OFFICE OF THE PRESS SECRETARY\nPRESIDENTI
## 2 2 1 Climate change is the greatest threat af
## 3 3 1 This injustice was evident during the pa
## 4 4 1 Our work must also focus on ensuring tha
## 5 5 1 This requires investment in food securit
## 6 6 1 But we also need to update the global st
## . ... ... ...
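If the code above is run repeatedly, the PDF is downloaded every time. The sketch below shows one way to avoid that: a hypothetical helper, `fetch_once()`, that downloads only when the local file is missing and returns the local path. It is an illustration, not part of textreadr or base R. The test below it only exercises the branch where the file already exists, so it needs no network access.

```r
## Hypothetical helper: download 'src_url' to 'destfile' only if it is missing
fetch_once <- function(src_url, destfile) {
  if (!file.exists(destfile)) {
    ## mode = "wb" forces a binary transfer, which matters for PDFs on Windows
    download.file(src_url, destfile = destfile, mode = "wb")
  }
  destfile
}

## Usage: pass the PDF URL from above and "marcos_speech.pdf",
## then give the returned path to read_pdf()
```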
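Data can also be read through a connection. Here file() creates a connection to a local, pipe-delimited file, iris.txt, which is opened for reading with open(), passed to read.table(), and closed once it is no longer needed.
con <- file("iris.txt")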
open(con, "r")
data <- read.table(con, sep = "|", header = TRUE)
data
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
## 51 7.0 3.2 4.7 1.4 versicolor
## 52 6.4 3.2 4.5 1.5 versicolor
## 53 6.9 3.1 4.9 1.5 versicolor
## 54 5.5 2.3 4.0 1.3 versicolor
## 55 6.5 2.8 4.6 1.5 versicolor
## 56 5.7 2.8 4.5 1.3 versicolor
## 57 6.3 3.3 4.7 1.6 versicolor
## 58 4.9 2.4 3.3 1.0 versicolor
## 59 6.6 2.9 4.6 1.3 versicolor
## 60 5.2 2.7 3.9 1.4 versicolor
## 61 5.0 2.0 3.5 1.0 versicolor
## 62 5.9 3.0 4.2 1.5 versicolor
## 63 6.0 2.2 4.0 1.0 versicolor
## 64 6.1 2.9 4.7 1.4 versicolor
## 65 5.6 2.9 3.6 1.3 versicolor
## 66 6.7 3.1 4.4 1.4 versicolor
## 67 5.6 3.0 4.5 1.5 versicolor
## 68 5.8 2.7 4.1 1.0 versicolor
## 69 6.2 2.2 4.5 1.5 versicolor
## 70 5.6 2.5 3.9 1.1 versicolor
## 71 5.9 3.2 4.8 1.8 versicolor
## 72 6.1 2.8 4.0 1.3 versicolor
## 73 6.3 2.5 4.9 1.5 versicolor
## 74 6.1 2.8 4.7 1.2 versicolor
## 75 6.4 2.9 4.3 1.3 versicolor
## 76 6.6 3.0 4.4 1.4 versicolor
## 77 6.8 2.8 4.8 1.4 versicolor
## 78 6.7 3.0 5.0 1.7 versicolor
## 79 6.0 2.9 4.5 1.5 versicolor
## 80 5.7 2.6 3.5 1.0 versicolor
## 81 5.5 2.4 3.8 1.1 versicolor
## 82 5.5 2.4 3.7 1.0 versicolor
## 83 5.8 2.7 3.9 1.2 versicolor
## 84 6.0 2.7 5.1 1.6 versicolor
## 85 5.4 3.0 4.5 1.5 versicolor
## 86 6.0 3.4 4.5 1.6 versicolor
## 87 6.7 3.1 4.7 1.5 versicolor
## 88 6.3 2.3 4.4 1.3 versicolor
## 89 5.6 3.0 4.1 1.3 versicolor
## 90 5.5 2.5 4.0 1.3 versicolor
## 91 5.5 2.6 4.4 1.2 versicolor
## 92 6.1 3.0 4.6 1.4 versicolor
## 93 5.8 2.6 4.0 1.2 versicolor
## 94 5.0 2.3 3.3 1.0 versicolor
## 95 5.6 2.7 4.2 1.3 versicolor
## 96 5.7 3.0 4.2 1.2 versicolor
## 97 5.7 2.9 4.2 1.3 versicolor
## 98 6.2 2.9 4.3 1.3 versicolor
## 99 5.1 2.5 3.0 1.1 versicolor
## 100 5.7 2.8 4.1 1.3 versicolor
## 101 6.3 3.3 6.0 2.5 virginica
## 102 5.8 2.7 5.1 1.9 virginica
## 103 7.1 3.0 5.9 2.1 virginica
## 104 6.3 2.9 5.6 1.8 virginica
## 105 6.5 3.0 5.8 2.2 virginica
## 106 7.6 3.0 6.6 2.1 virginica
## 107 4.9 2.5 4.5 1.7 virginica
## 108 7.3 2.9 6.3 1.8 virginica
## 109 6.7 2.5 5.8 1.8 virginica
## 110 7.2 3.6 6.1 2.5 virginica
## 111 6.5 3.2 5.1 2.0 virginica
## 112 6.4 2.7 5.3 1.9 virginica
## 113 6.8 3.0 5.5 2.1 virginica
## 114 5.7 2.5 5.0 2.0 virginica
## 115 5.8 2.8 5.1 2.4 virginica
## 116 6.4 3.2 5.3 2.3 virginica
## 117 6.5 3.0 5.5 1.8 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 120 6.0 2.2 5.0 1.5 virginica
## 121 6.9 3.2 5.7 2.3 virginica
## 122 5.6 2.8 4.9 2.0 virginica
## 123 7.7 2.8 6.7 2.0 virginica
## 124 6.3 2.7 4.9 1.8 virginica
## 125 6.7 3.3 5.7 2.1 virginica
## 126 7.2 3.2 6.0 1.8 virginica
## 127 6.2 2.8 4.8 1.8 virginica
## 128 6.1 3.0 4.9 1.8 virginica
## 129 6.4 2.8 5.6 2.1 virginica
## 130 7.2 3.0 5.8 1.6 virginica
## 131 7.4 2.8 6.1 1.9 virginica
## 132 7.9 3.8 6.4 2.0 virginica
## 133 6.4 2.8 5.6 2.2 virginica
## 134 6.3 2.8 5.1 1.5 virginica
## 135 6.1 2.6 5.6 1.4 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 137 6.3 3.4 5.6 2.4 virginica
## 138 6.4 3.1 5.5 1.8 virginica
## 139 6.0 3.0 4.8 1.8 virginica
## 140 6.9 3.1 5.4 2.1 virginica
## 141 6.7 3.1 5.6 2.4 virginica
## 142 6.9 3.1 5.1 2.3 virginica
## 143 5.8 2.7 5.1 1.9 virginica
## 144 6.8 3.2 5.9 2.3 virginica
## 145 6.7 3.3 5.7 2.5 virginica
## 146 6.7 3.0 5.2 2.3 virginica
## 147 6.3 2.5 5.0 1.9 virginica
## 148 6.5 3.0 5.2 2.0 virginica
## 149 6.2 3.4 5.4 2.3 virginica
## 150 5.9 3.0 5.1 1.8 virginica
close(con)
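Wrapping the open/read/close steps in a function with on.exit() guarantees that the connection is closed even if read.table() fails partway through. The sketch below does this with a hypothetical helper, `read_pipe_file()`. It assumes iris.txt is the built-in iris data written with "|" separators, which matches the output above. To keep the example self-contained, it writes such a file to a temporary path.

```r
## Hypothetical helper: read a pipe-delimited file through a connection,
## closing the connection even if read.table() raises an error
read_pipe_file <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con))
  read.table(con, sep = "|", header = TRUE)
}

## Build a pipe-delimited stand-in for iris.txt from the built-in iris data
f <- tempfile(fileext = ".txt")
write.table(iris, f, sep = "|", quote = FALSE, row.names = FALSE)

d <- read_pipe_file(f)
dim(d)  # 150 5
```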
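The url() function creates a connection to a web page. Like a file() connection, it can be opened with open(). Close it before reusing the variable for another connection.
con <- url("https://ops.gov.ph/presidential-speech/")
open(con, "r")
close(con)
Alternatively, pass the mode as the second argument to url(), which creates and opens the connection in one step: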
con <- url("https://www.jhu.edu", "r")
x <- readLines(con)
## Close the connection once the lines have been read
close(con)
cat(head(x))
## <!doctype html> <html class="no-js" lang="en"> <head> <script> dataLayer = [];
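Often only the first few lines of a page are needed. readLines() accepts `n` to limit how many lines it reads. If the connection is not open, readLines() opens it for the duration of the call and closes it afterwards. The sketch below uses a hypothetical helper, `read_first_lines()`, that makes the open/close lifetime explicit with on.exit(). A url() connection needs network access, so the runnable part uses a file() connection, which supports the same calls.

```r
## Hypothetical helper: read only the first 'n' lines from an unopened
## connection (for example url() or file()) and always close it afterwards
read_first_lines <- function(con, n = 5L) {
  open(con, "r")
  on.exit(close(con))
  readLines(con, n = n)
}

## Usage (needs network access):
## read_first_lines(url("https://www.jhu.edu"), n = 5)

## Offline check with a file() connection
f <- tempfile(fileext = ".txt")
writeLines(c("line 1", "line 2", "line 3"), f)
read_first_lines(file(f), n = 2)  # "line 1" "line 2"
```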