Reading and Writing Files
This chapter covers loading data into, and saving data from, an active workspace by reading and writing files.
8.1 R-Ready Data Sets
Enter data() at the R prompt to bring up a window listing the ready-to-use data sets in the currently loaded packages, each with a one-line description.
data()
Use data(package = .packages(all.available = TRUE)) to list the data sets in all available packages.
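The listing can also be inspected programmatically. The following is a minimal sketch that assumes only the base datasets package; the variable name ds is illustrative.

```r
# data() returns an object whose $results component is a character matrix
# with the columns Package, LibPath, Item and Title.
ds <- data(package = "datasets")$results

# Show the name and one-line description of the first few data sets
head(ds[, c("Item", "Title")])

# Check whether a particular data set is listed
"ChickWeight" %in% ds[, "Item"]
```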
8.1.1 Built-in Data Sets
To see a summary of the data sets contained in the datasets package, use library(help="datasets"). You can then open the help page of an individual data set and view its first rows.
library(help="datasets")
?ChickWeight
ChickWeight[1:15,]
Grouped Data: weight ~ Time | Chick
weight Time Chick Diet
1 42 0 1 1
2 51 2 1 1
3 59 4 1 1
4 64 6 1 1
5 76 8 1 1
6 93 10 1 1
7 106 12 1 1
8 125 14 1 1
9 149 16 1 1
10 171 18 1 1
11 199 20 1 1
12 205 21 1 1
13 40 0 2 1
14 49 2 2 1
15 58 4 2 1
8.1.2 Contributed Data Sets
Additional data sets are also available in contributed packages. To access them, first install and load the relevant package. Consider the data set ice.river, which is in the contributed package tseries by Trapletti and Hornik (2013).
First, install the package by running install.packages("tseries") at the prompt. Then, to access the components of the package, load it using library.
install.packages("tseries")
library("tseries")
library(help="tseries")
You can enter ?ice.river to find more details about the data set you want to work with. The help file describes it as a “time series object” comprised of river flow, precipitation, and temperature measurements. The data were initially reported in Tong (1990).
?ice.river
Warning message:
In native_encode(res, to = encoding) :
some characters may not work under the current locale
Icelandic River Data
Description
Contains the Icelandic river data as presented in Tong (1990), pages 432-440.
Usage
data(ice.river)
Format
4 univariate time series flow.vat, flow.jok, prec, and temp, each with 1095 observations and the joint series ice.river.
Details
The series are daily observations from Jan. 1, 1972 to Dec. 31, 1974 on 4 variables: flow.vat, mean daily flow of Vatnsdalsa river (cms), flow.jok, mean daily flow of Jokulsa Eystri river (cms), prec, daily precipitation in Hveravellir (mm), and mean daily temperature in Hveravellir (deg C).
These datasets were introduced into the literature in a paper by Tong, Thanoon, and Gudmundsson (1985).
data(ice.river)
ice.river[1:5,]
flow.vat flow.jok prec temp
[1,] 16.1 30.2 8.1 0.9
[2,] 19.2 29.0 4.4 1.6
[3,] 14.5 28.4 7.0 0.1
[4,] 11.0 27.8 0.0 0.6
[5,] 13.6 27.8 0.0 2.0
8.2 Reading in External Data Files
You often have to work with data from external sources. This section shows how to read such data into R.
8.2.1 The Table Format
Table-format files have three key features: a header (provides names for each column of data), a delimiter (the character used to separate the entries in each line), and a missing-value string (a character string used exclusively to denote a missing value).
Typically these files have a .txt or .csv extension. Download the sample files from: https://www.nostarch.com/bookofr
mydatafile<- read.table(file="d:/bookofr/8.2.1_mydatafile.txt",
header = TRUE,
sep=" ",
na.strings="*",
stringsAsFactors = FALSE)
mydatafile
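To see how the header, sep, and na.strings arguments interact without needing the downloaded file, the sketch below writes a small table to a temporary file and reads it back. The file contents are an assumption that mimics a space-delimited layout with "*" marking missing values; they are not the actual contents of the sample file.

```r
# Hypothetical file contents: header row, space delimiter, "*" for missing
lines <- c("person age sex funny age.mon",
           "Peter * M High 504",
           "Lois 40 F * 480",
           "Meg 17 F Low 204")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

demo <- read.table(file = tmp,
                   header = TRUE,            # first line holds column names
                   sep = " ",                # entries are separated by a space
                   na.strings = "*",         # "*" is read as NA
                   stringsAsFactors = FALSE) # keep text columns as character
demo
str(demo)        # shows the type chosen for each column
is.na(demo$age)  # flags the entry that was "*" in the file

unlink(tmp)      # remove the temporary file
```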
Tip: if you are working with multiple files and don't want to type the full path each time, set the working directory with the setwd() command. Relative file names are then resolved against that directory.
# get working directory
getwd()
[1] "D:/Dropbox/BigData/R Programming/Work"
# change it back to original setting
setwd("D:/Dropbox/BigData/R Programming/Work")
getwd()
[1] "D:/Dropbox/BigData/R Programming/Work"
list.files("D:/Dropbox/BigData/R Programming/Work/bookofr")
[1] "8.2.1_mydatafile.txt" "8.2.2_spreadsheetfile.csv" "Davies_Part1_Solutions.R"
[4] "Davies_Part1_Source_Code.R" "Davies_Part2_Solutions.R" "Davies_Part2_Source_Code.R"
[7] "Davies_Part3_Solutions.R" "Davies_Part3_Source_Code.R" "Davies_Part4_Solutions.R"
[10] "Davies_Part4_Source_Code.R" "Davies_Part5_Solutions.R" "Davies_Part5_Source_Code.R"
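As a small illustration of how the working directory affects file names, the sketch below switches to R's temporary directory, reads a file by its bare name, and then restores the previous setting. The file name demo_wd.txt is hypothetical.

```r
# setwd() invisibly returns the previous working directory,
# which makes it easy to restore later
old_wd <- setwd(tempdir())

# Create a small file and refer to it by name only
writeLines(c("x y", "1 2", "3 4"), "demo_wd.txt")
list.files(pattern = "^demo_wd")
xy <- read.table("demo_wd.txt", header = TRUE)

# file.path() builds a full path without relying on the working directory
full_path <- file.path(tempdir(), "demo_wd.txt")
file.exists(full_path)

# Clean up and restore the original working directory
unlink("demo_wd.txt")
setwd(old_wd)
```

Using file.path() rather than pasting strings together avoids having to choose between forward slashes and escaped backslashes by hand.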
You can also find files interactively using the file.choose command
file.choose()
[1] "D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\8.2.1_mydatafile.txt"
You can therefore pass file.choose() directly as the file argument to pick the file interactively and load it in one step.
mydatafile<- read.table(file=file.choose(),
header = TRUE,
sep=" ",
na.strings="*",
stringsAsFactors = FALSE)
mydatafile
In versions of R before 4.0.0, read.table converts non-numeric columns into factors by default; setting stringsAsFactors=FALSE prevents this (from R 4.0.0 onward, FALSE is the default). To convert individual columns to factors afterwards:
# look at the current mydatafile before the data type override
mydatafile
# now we override them
mydatafile$sex <- as.factor(mydatafile$sex)
mydatafile$funny <- factor(x=mydatafile$funny,levels=c("Low","Med","High"))
mydatafile
8.2.2 Spreadsheet Workbooks
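Generally, you will need to open the workbook in your spreadsheet software and use File > Save As to save the .xls or .xlsx file in .csv format, which R can then read as a comma-delimited table. You can also read workbooks directly using contributed packages such as gdata by Warnes et al. or XLConnect by Mirai Solutions.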
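Once a sheet has been exported to .csv, it can be read with read.csv. The sketch below stands in for an exported sheet by writing a few comma-separated lines to a temporary file; the column names and values are made up for illustration.

```r
# Hypothetical contents of a sheet exported as .csv; the score for
# the second row was left empty in the spreadsheet
csv_lines <- c("id,name,score",
               "1,alpha,3.5",
               "2,beta,",
               "3,gamma,4.1")
tmp_csv <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp_csv)

# read.csv assumes a header row and a comma delimiter
sheet <- read.csv(tmp_csv, stringsAsFactors = FALSE)
sheet
is.na(sheet$score)  # an empty cell in a numeric column is read as NA

unlink(tmp_csv)
```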
8.2.3 Web Based Files
Use the same read.table command, but pass the URL as the file argument.
dia.url<-"http://www.amstat.org/publications/jse/v9n2/4cdata.txt"
diamonds <-read.table(dia.url)
diamonds
# now add the header using the names function
names(diamonds)<-c("Carat","Color","Clarity","Cert","Price")
# now list the table
diamonds[1:5,]
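As an alternative to assigning names afterwards, read.table accepts a col.names argument that sets the column names while reading. The sketch below uses the text argument with two made-up rows in place of the URL so that it runs offline; the values are illustrative and are not taken from the real file.

```r
# Two illustrative rows in the same five-column layout (values are made up)
raw_text <- "0.30 D VS2 GIA 1302
0.31 E VS1 GIA 1510"

# col.names supplies the header when the file itself has none
diamonds_demo <- read.table(text = raw_text,
                            col.names = c("Carat", "Color", "Clarity",
                                          "Cert", "Price"),
                            stringsAsFactors = FALSE)
diamonds_demo
```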
8.2.4 Other File Formats
Other formats, such as .dat, can also be imported using read.table. Use the skip argument to skip unwanted lines at the top of the file.
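A minimal sketch of the skip argument follows. It assumes a hypothetical .dat file whose first two lines are free-text comments that precede the header row.

```r
# Hypothetical .dat file: two comment lines, then a header and the data
dat_lines <- c("Exported by some hypothetical instrument software",
               "Units: cm",
               "id length",
               "1 10.5",
               "2 11.2")
tmp_dat <- tempfile(fileext = ".dat")
writeLines(dat_lines, tmp_dat)

# skip = 2 discards the first two lines before the header is read
measurements <- read.table(tmp_dat, header = TRUE, skip = 2)
measurements

unlink(tmp_dat)
```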
8.3 Writing Out Data Files and Plots
8.3.1 Data Sets
Use the write.table function
write.table(x=mydatafile, file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\somenewfile.txt",
sep="@",
na="??",
quote=FALSE,
row.names=FALSE)
The output file will have contents like this:
person@age@sex@funny@age.mon
Peter@??@M@High@504
Lois@40@F@??@480
Meg@17@F@Low@204
Chris@14@M@Med@168
Stewie@1@M@High@??
Brian@??@M@Med@??
See also read.csv and write.csv which are shortcut versions of read.table and write.table.
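A quick way to check that a file was written as intended is to read it straight back. The sketch below does a round trip with write.csv and read.csv through a temporary file; the data frame people is a small hand-typed example.

```r
people <- data.frame(person = c("Meg", "Chris", "Stewie"),
                     age = c(17L, 14L, 1L),
                     funny = c("Low", "Med", NA),
                     stringsAsFactors = FALSE)

tmp_out <- tempfile(fileext = ".csv")

# row.names = FALSE stops the row labels being written as an extra column
write.csv(people, file = tmp_out, row.names = FALSE)
readLines(tmp_out)  # inspect the raw text that was written

people2 <- read.csv(tmp_out, stringsAsFactors = FALSE)
all.equal(people, people2)  # compares the original with the re-read copy

unlink(tmp_out)
```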
8.3.2 Plot and Graphics Files
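You can save plots using any of the following: jpeg, pdf, postscript (for EPS files), or ggsave (from ggplot2). For the first three, open the graphics device, draw the plot, and then close the device with dev.off() so the file is written.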
jpeg(filename="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\someplot.jpeg",
width=600,height=600
)
plot(1:5,10:6,ylab="Y axis", xlab=" x axis",
main="A saved jpeg plot")
points(1:5,10:6,cex=2,pch=4, col=2)
dev.off()
null device
1
pdf(file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\someplot.pdf",
width=5,height=5
)
plot(1:5,10:6,ylab="Y axis", xlab=" x axis",
main="A saved pdf plot")
points(1:5,10:6,cex=2,pch=4, col=2)
dev.off()
null device
1
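The open, draw, close pattern can be tried without touching a fixed path by writing to a temporary file. This is a minimal sketch; the check with file.exists is only there to confirm that the file appears once the device has been closed.

```r
out_file <- tempfile(fileext = ".pdf")

pdf(file = out_file, width = 5, height = 5)  # 1. open the file device
plot(1:5, 10:6, xlab = "x axis", ylab = "y axis",
     main = "Saved to a temporary file")     # 2. draw as usual
invisible(dev.off())                         # 3. close the device to finish the file

file.exists(out_file)
file.size(out_file) > 0
```

Forgetting dev.off() is a common reason for an empty or unreadable output file, because the device keeps the file open until it is closed.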
library("ggplot2")
foo
[1] 1.1 2.0 3.5 3.9 4.2
bar
[1] 2.0 2.2 -1.3 0.0 0.2
qplot(foo,bar,geom="blank")+
geom_point(size=3,shape=8,color="darkgreen")+
geom_line(color="orange",linetype=4)
# now save the plot using ggsave
ggsave(filename="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myqplot.png")
Saving 7.29 x 4.5 in image
# you can save into different formats by simply changing the extension
ggsave(filename="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myqplot.pdf")
Saving 7.29 x 4.5 in image

To find out more, see ?pdf, ?postscript, ?jpeg, and ?ggsave.
8.4 Ad Hoc Object Read/Write Operations
Use dput to write an R object to a plain-text file and dget to read it back.
somelist<-list(foo=c(5,2,45),
bar=matrix(data=c(T,T,F,F,F,F,T,F,T), nrow=3,ncol=3),
baz=factor(c(1,2,2,3,1,1,3),levels=1:3,ordered=T))
somelist
$foo
[1] 5 2 45
$bar
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE FALSE FALSE
[3,] FALSE FALSE TRUE
$baz
[1] 1 2 2 3 1 1 3
Levels: 1 < 2 < 3
# put object somelist into file
dput(x=somelist, file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myRobject.txt")
Now read the object back with dget:
newobject<-dget(file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myRobject.txt")
newobject
$foo
[1] 5 2 45
$bar
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE FALSE FALSE
[3,] FALSE FALSE TRUE
$baz
[1] 1 2 2 3 1 1 3
Levels: 1 < 2 < 3
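The round trip can be checked explicitly by comparing the restored object with the original. The sketch below uses a smaller, hand-made list and a temporary file instead of a fixed path; the names obj and restored are illustrative.

```r
obj <- list(foo = c(5, 2, 45),
            baz = factor(c(1, 2, 2, 3), levels = 1:3, ordered = TRUE))

tmp_obj <- tempfile(fileext = ".txt")
dput(x = obj, file = tmp_obj)

# The file holds plain R code that rebuilds the object
readLines(tmp_obj)

restored <- dget(file = tmp_obj)
all.equal(obj, restored)  # compares structure and values

unlink(tmp_obj)
```

Because the file is ordinary R source text, it can be opened in any editor, which makes dput convenient for sharing small objects.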
