Chapter 8: Reading and Writing Files

This chapter covers loading data into, and saving data from, an active R workspace by reading and writing files.

8.1 R-Ready Data Sets

R comes with a number of ready-to-use data sets. Enter data() at the R prompt to bring up a window listing the data sets in the currently loaded packages, each with a one-line description:

data()

Use data(package = .packages(all.available = TRUE)) to list the data sets in all installed packages, including those that are not currently loaded.
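The listing can also be captured as an R object rather than displayed in a window. This sketch calls data() with package = "datasets" and pulls the data set names out of the Item column of the returned results matrix; the variable names are illustrative:

```r
# data() with a package argument returns an object of class "packageIQR";
# its $results component is a character matrix with one row per data set
ds.info <- data(package = "datasets")
ds.names <- ds.info$results[, "Item"]  # the data set names
head(ds.names)                         # first few names
"ChickWeight" %in% ds.names            # is ChickWeight among them?
length(ds.names)                       # how many data sets are listed
```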

8.1.1 Built-in Data Sets

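The built-in data sets belong to the datasets package, which R loads automatically at startup. To see a summary of the data sets it contains, call library(help="datasets"). To open the help page for a particular data set, enter ? followed by its name; to view part of it, index its rows: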

library(help="datasets")
?ChickWeight
ChickWeight[1:15,]
Grouped Data: weight ~ Time | Chick
   weight Time Chick Diet
1      42    0     1    1
2      51    2     1    1
3      59    4     1    1
4      64    6     1    1
5      76    8     1    1
6      93   10     1    1
7     106   12     1    1
8     125   14     1    1
9     149   16     1    1
10    171   18     1    1
11    199   20     1    1
12    205   21     1    1
13     40    0     2    1
14     49    2     2    1
15     58    4     2    1
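Indexing a range of rows is one way to look inside a built-in data set. A few other base R functions, sketched here, summarize its size and column types without printing every row:

```r
# Size, column types, and first rows of a built-in data set
dim(ChickWeight)            # number of rows and columns
str(ChickWeight)            # compact summary of each column's type
head(ChickWeight, n = 15)   # the same rows as ChickWeight[1:15, ]
summary(ChickWeight$weight) # five-number summary plus the mean
```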

8.1.2 Contributed Data Sets

Additional data sets are also available in contributed packages. To access them, first install and then load the relevant package. For example, consider the data set ice.river, which is in the contributed package tseries by Trapletti and Hornik (2013).

First, install the package by running install.packages("tseries") at the prompt. Then, to access the components of the package, load it using library:

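install.packages("tseries")  # needed only once, to download and install
library("tseries")           # needed in every new R session
library(help="tseries")      # summary of the package's contents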
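install.packages only needs to run once per R installation, whereas library has to run in each new session. A common pattern checks whether the package is already installed before installing it. The helper below, use_package, is a hypothetical name for illustration, not a function from base R or tseries:

```r
# Hypothetical helper (not part of base R): install a package only if it
# cannot already be found, then attach it to the search path
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # pkg is a string, not a bare name
  invisible(pkg)
}

# Example usage:
# use_package("tseries")
```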

You can enter ?ice.river to find out more about the data set. The help file describes it as a “time series object” comprising river flow, precipitation, and temperature measurements, with the data initially reported in Tong (1990).

?ice.river

Icelandic River Data

Description

Contains the Icelandic river data as presented in Tong (1990), pages 432-440.

Usage

data(ice.river)

Format

4 univariate time series flow.vat, flow.jok, prec, and temp, each with 1095 observations and the joint series ice.river.

Details

The series are daily observations from Jan. 1, 1972 to Dec. 31, 1974 on 4 variables: flow.vat, mean daily flow of Vatnsdalsa river (cms), flow.jok, mean daily flow of Jokulsa Eystri river (cms), prec, daily precipitation in Hveravellir (mm), and temp, mean daily temperature in Hveravellir (deg C).

These datasets were introduced into the literature in a paper by Tong, Thanoon, and Gudmundsson (1985).

data(ice.river)
ice.river[1:5,]
     flow.vat flow.jok prec temp
[1,]     16.1     30.2  8.1  0.9
[2,]     19.2     29.0  4.4  1.6
[3,]     14.5     28.4  7.0  0.1
[4,]     11.0     27.8  0.0  0.6
[5,]     13.6     27.8  0.0  2.0

8.2 Reading in External Data Files

You will often need to work with data from external sources. This section shows how to read such files into R.

8.2.1 The Table format

Table-format files have three key features:
Header: provides a name for each column of data.
Delimiter: the character used to separate the entries in each line.
Missing value: the character string used exclusively to denote a missing value.

These files typically have a .txt or .csv extension. The sample files used in this chapter can be downloaded from https://www.nostarch.com/bookofr

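mydatafile<- read.table(file="d:/bookofr/8.2.1_mydatafile.txt",
                        header = TRUE,            # first line holds the column names
                        sep=" ",                  # entries are separated by a space
                        na.strings="*",           # "*" marks a missing value
                        stringsAsFactors = FALSE) # keep text columns as character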
mydatafile
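To try the same call without the downloaded sample file, you can write a small stand-in file first. In this sketch the file contents are an assumption, reconstructed from the person, age, sex, and funny columns of the table written out in section 8.3.1, with "*" for the missing entries; the actual 8.2.1_mydatafile.txt may differ. tempfile() supplies a throwaway path in place of the d:/bookofr folder:

```r
# Hypothetical stand-in for 8.2.1_mydatafile.txt ("*" marks missing values)
sample.lines <- c("person age sex funny",
                  "Peter * M High",
                  "Lois 40 F *",
                  "Meg 17 F Low",
                  "Chris 14 M Med",
                  "Stewie 1 M High",
                  "Brian * M Med")
tmp <- tempfile(fileext = ".txt")
writeLines(sample.lines, tmp)

sample.data <- read.table(file = tmp,
                          header = TRUE,            # first line holds column names
                          sep = " ",                # single-space delimiter
                          na.strings = "*",         # "*" becomes NA
                          stringsAsFactors = FALSE) # keep text as character
str(sample.data)
is.na(sample.data$age)  # Peter's and Brian's ages are missing
```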

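Tip: if you are working with multiple files and don’t want to type the full path each time, set the working directory with setwd(); relative file names are then looked up in that directory. getwd() shows the current working directory, and list.files() lists the files in a folder.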

# get working directory
getwd()
[1] "D:/Dropbox/BigData/R Programming/Work"
# set the working directory explicitly
setwd("D:/Dropbox/BigData/R Programming/Work")
getwd()
[1] "D:/Dropbox/BigData/R Programming/Work"
list.files("D:/Dropbox/BigData/R Programming/Work/bookofr")
 [1] "8.2.1_mydatafile.txt"       "8.2.2_spreadsheetfile.csv"  "Davies_Part1_Solutions.R"  
 [4] "Davies_Part1_Source_Code.R" "Davies_Part2_Solutions.R"   "Davies_Part2_Source_Code.R"
 [7] "Davies_Part3_Solutions.R"   "Davies_Part3_Source_Code.R" "Davies_Part4_Solutions.R"  
[10] "Davies_Part4_Source_Code.R" "Davies_Part5_Solutions.R"   "Davies_Part5_Source_Code.R"
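Instead of typing full paths, you can join folder and file names with file.path and filter list.files with a regular expression. Because setwd invisibly returns the previous directory, saving its result makes it easy to switch back. In this sketch a throwaway folder under tempdir() and a hypothetical file, example.txt, stand in for the author's D: drive folder:

```r
# Build a path portably (file.path joins with "/" on every platform)
data.dir <- file.path(tempdir(), "bookofr")
dir.create(data.dir, showWarnings = FALSE)
writeLines("a b", file.path(data.dir, "example.txt"))  # hypothetical file

old.dir <- setwd(data.dir)       # setwd() returns the previous directory
getwd()
list.files(pattern = "\\.txt$")  # only file names ending in .txt
setwd(old.dir)                   # change back to where we started
```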

You can also choose a file interactively with file.choose(), which prompts you to select a file (using a dialog box on Windows) and returns its full path:

file.choose()
[1] "D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\8.2.1_mydatafile.txt"

Because file.choose() returns the path as a character string, you can pass it straight to the file argument of read.table:

mydatafile<- read.table(file=file.choose(),
                        header = TRUE, 
                        sep=" ",
                        na.strings="*",
                        stringsAsFactors = FALSE)
mydatafile

Before R 4.0.0, read.table converted non-numeric columns to factors by default, which is why stringsAsFactors=FALSE is set above; from R 4.0.0 onward, FALSE is the default. Once the data are read in as character, you can convert individual columns to factors explicitly:

# look at mydatafile before changing any column types
mydatafile
# now convert selected columns to factors
mydatafile$sex <- as.factor(mydatafile$sex)  # levels sorted alphabetically
mydatafile$funny <- factor(x=mydatafile$funny,levels=c("Low","Med","High"))  # levels in a chosen order
mydatafile
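The two conversions differ in the order of the levels: as.factor sorts them alphabetically, while factor with a levels argument keeps the order you give. This short sketch uses the funny values shown in the output in section 8.3.1 (with NA for the missing entry) as a stand-in for mydatafile$funny; adding ordered = TRUE also allows comparisons such as > between categories:

```r
# Stand-in for mydatafile$funny
funny <- c("High", NA, "Low", "Med", "High", "Med")

f.default <- as.factor(funny)                                 # alphabetical levels
f.custom  <- factor(funny, levels = c("Low", "Med", "High"))  # chosen order
levels(f.default)
levels(f.custom)
table(f.custom)   # counts are listed in the chosen level order

# An ordered factor supports comparisons between categories
f.ord <- factor(funny, levels = c("Low", "Med", "High"), ordered = TRUE)
f.ord > "Low"
```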

8.2.2 Spreadsheet Workbooks

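The simplest approach is usually to open the workbook in your spreadsheet program, use File > Save As to save the .xls or .xlsx sheet in .csv format, and then read the result with read.table or read.csv. Alternatively, contributed packages such as gdata by Warnes et al. or XLConnect by Mirai Solutions can read workbooks directly.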

8.2.3 Web Based Files

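Files hosted on the web can be read with the same read.table command: supply the URL as the file argument instead of a local path. This file has no header line, so column names are added afterwards with names():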

dia.url<-"http://www.amstat.org/publications/jse/v9n2/4cdata.txt"
diamonds <-read.table(dia.url)
diamonds
# now add the header using the names function
names(diamonds)<-c("Carat","Color","Clarity","Cert","Price")
# now list the table
diamonds[1:5,]
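read.table can also assign the names while reading, through its col.names argument, which folds the names() step into the read. So that this sketch runs offline, a temporary header-less file made from the first three rows of ChickWeight stands in for the web file; with a working connection, read.table(dia.url, col.names = c("Carat", "Color", "Clarity", "Cert", "Price")) should behave the same way, assuming the URL is still live:

```r
# Stand-in for a header-less web file: three rows of ChickWeight written
# without column names or row names
tmp <- tempfile(fileext = ".txt")
write.table(ChickWeight[1:3, ], tmp, col.names = FALSE, row.names = FALSE)

# col.names supplies the names at read time, replacing a separate names() call
chick3 <- read.table(tmp, col.names = c("weight", "Time", "Chick", "Diet"))
chick3
```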

8.2.4 Other File Formats

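Other plain-text formats, such as .dat files, can often be imported with read.table as well. If a file begins with lines that are not part of the table (a description or notes, for example), use the skip argument to ignore that many lines at the start of the file.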
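A minimal sketch of skip, using a made-up .dat file whose first two lines are notes rather than data (the file contents and the names measurements, id, and length are hypothetical):

```r
# Hypothetical .dat file: two note lines sit above the actual table
dat.lines <- c("Measurements recorded by hand",
               "Units: cm",
               "id length",
               "1 12.5",
               "2 13.1")
tmp <- tempfile(fileext = ".dat")
writeLines(dat.lines, tmp)

# skip = 2 ignores the two note lines; the next line is used as the header
measurements <- read.table(tmp, skip = 2, header = TRUE)
measurements
```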

8.3 Writing out data files and plots

8.3.1 Data Sets

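To write a data frame to a text file, use the write.table function. Its arguments mirror those of read.table: sep sets the delimiter, na sets the string written for missing values, quote=FALSE stops character entries from being wrapped in double quotes, and row.names=FALSE leaves out the row names: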

write.table(x=mydatafile, file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\somenewfile.txt",
            sep="@",
            na="??",
            quote=FALSE,
            row.names=FALSE)

The output file will have contents like this:

person@age@sex@funny@age.mon
Peter@??@M@High@504
Lois@40@F@??@480
Meg@17@F@Low@204
Chris@14@M@Med@168
Stewie@1@M@High@??
Brian@??@M@Med@??
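Reading such a file back requires the same delimiter and missing-value string, passed to read.table as sep and na.strings. The round trip below uses a small hypothetical data frame, people, in place of mydatafile and a temporary file in place of the fixed path:

```r
# Hypothetical data frame standing in for mydatafile
people <- data.frame(person = c("Peter", "Lois", "Meg"),
                     age    = c(NA, 40, 17),
                     funny  = c("High", NA, "Low"))
tmp <- tempfile(fileext = ".txt")
write.table(people, file = tmp, sep = "@", na = "??",
            quote = FALSE, row.names = FALSE)
readLines(tmp)  # the raw lines as written

# Read it back with the matching delimiter and missing-value string
back <- read.table(tmp, header = TRUE, sep = "@", na.strings = "??")
back
```

Because quote=FALSE, a text value that itself contained "@" would be split into extra fields on reading, so the delimiter should be a character that never appears in the data.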

See also read.csv and write.csv, which are shortcut versions of read.table and write.table with defaults set up for comma-separated files.
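A brief sketch of the comma-separated shortcuts, writing five rows of ChickWeight to a temporary file: write.csv always uses a comma as the separator, and read.csv assumes a header line and comma separators, so fewer arguments are needed than with the general functions:

```r
tmp <- tempfile(fileext = ".csv")
# write.csv fixes sep = ","; row.names = FALSE drops the row numbers
write.csv(ChickWeight[1:5, c("weight", "Time")], tmp, row.names = FALSE)
readLines(tmp)   # the header names are quoted by default

# read.csv defaults to header = TRUE and sep = ","
chick5 <- read.csv(tmp)
chick5
```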

8.3.2 Plot and Graphics Files

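You can save plots to files using graphics devices such as jpeg and pdf (or postscript, for EPS files), and plots made with ggplot2 can be saved with ggsave. For a base R device, open the device, draw the plot, and then call dev.off() to close the device and write the file: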

# jpeg: width and height are in pixels
jpeg(filename="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\someplot.jpeg",
     width=600, height=600)
plot(1:5,10:6,ylab="Y axis", xlab="x axis",
     main="A saved jpeg plot")
points(1:5,10:6,cex=2,pch=4, col=2)
dev.off()   # close the device so the file is written
null device 
          1 
# pdf uses file instead of filename
# pdf uses inches instead of pixels
pdf(file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\someplot.pdf",
    width=5, height=5)
plot(1:5,10:6,ylab="Y axis", xlab="x axis",
     main="A saved pdf plot")
points(1:5,10:6,cex=2,pch=4, col=2)
dev.off()
null device 
          1 
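EPS output comes from the postscript device rather than from a function called eps. Here is a sketch that saves the same plot as an EPS file, using the argument settings the ?postscript help page gives for EPS output; the temporary file path is a stand-in for your own folder:

```r
eps.file <- file.path(tempdir(), "someplot.eps")
postscript(file = eps.file,
           horizontal = FALSE,     # portrait orientation
           onefile = FALSE,        # one plot per file, as EPS requires
           paper = "special",      # use width/height instead of a paper size
           width = 5, height = 5)  # inches, as for pdf()
plot(1:5, 10:6, ylab = "Y axis", xlab = "x axis", main = "A saved EPS plot")
points(1:5, 10:6, cex = 2, pch = 4, col = 2)
dev.off()                          # close the device to finish the file
file.exists(eps.file)
```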
library("ggplot2")
foo <- c(1.1, 2.0, 3.5, 3.9, 4.2)
bar <- c(2.0, 2.2, -1.3, 0.0, 0.2)
# qplot() is deprecated in recent ggplot2 releases but still runs
qplot(foo,bar,geom="blank")+
  geom_point(size=3,shape=8,color="darkgreen")+
  geom_line(color="orange",linetype=4)
# now save the plot using ggsave
ggsave(filename="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myqplot.png")
Saving 7.29 x 4.5 in image
# you can save into different formats by simply changing the extension
ggsave(filename="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myqplot.pdf")
Saving 7.29 x 4.5 in image
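Without width and height, ggsave takes the image size from the current graphics device, which is where the "Saving 7.29 x 4.5 in image" message comes from. This sketch fixes the size explicitly, builds the same plot with ggplot() instead of the deprecated qplot(), and writes to a temporary PDF file; the sizes chosen are arbitrary:

```r
library("ggplot2")
foo <- c(1.1, 2.0, 3.5, 3.9, 4.2)
bar <- c(2.0, 2.2, -1.3, 0.0, 0.2)

# ggplot() with a data frame replaces qplot()
p <- ggplot(data.frame(foo, bar), aes(x = foo, y = bar)) +
  geom_point(size = 3, shape = 8, color = "darkgreen") +
  geom_line(color = "orange", linetype = 4)

out.file <- file.path(tempdir(), "myqplot.pdf")
# plot, width, height, and units make the output size reproducible
ggsave(filename = out.file, plot = p, width = 5, height = 4, units = "in")
file.exists(out.file)
```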

To find out more, see the help pages ?pdf, ?postscript, ?jpeg, and ?ggsave.

8.4 Ad Hoc Object Read/Write Operations

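For quick, ad hoc storage of individual R objects, use dput and dget. dput writes a plain-text representation of an object to a file, and dget reads that file back to re-create the object. This is handy for objects, such as lists, that do not fit a table format: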

somelist<-list(foo=c(5,2,45),
               bar=matrix(data=c(TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE), nrow=3,ncol=3),
               baz=factor(c(1,2,2,3,1,1,3),levels=1:3,ordered=TRUE))
somelist
$foo
[1]  5  2 45

$bar
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE  TRUE
[2,]  TRUE FALSE FALSE
[3,] FALSE FALSE  TRUE

$baz
[1] 1 2 2 3 1 1 3
Levels: 1 < 2 < 3
# put object somelist into file
dput(x=somelist, file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myRobject.txt")

Now read the object back with dget:

newobject<-dget(file="D:\\Dropbox\\BigData\\R Programming\\Work\\bookofr\\myRobject.txt")
newobject
$foo
[1]  5  2 45

$bar
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE  TRUE
[2,]  TRUE FALSE FALSE
[3,] FALSE FALSE  TRUE

$baz
[1] 1 2 2 3 1 1 3
Levels: 1 < 2 < 3
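The whole round trip can be checked in one self-contained sketch, with a temporary file standing in for the fixed path; identical() returns TRUE when the re-created object matches the original exactly. For other kinds of objects the deparsed text may not reproduce them exactly (see ?dput), so a check like this is worth running:

```r
somelist <- list(foo = c(5, 2, 45),
                 bar = matrix(data = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
                              nrow = 3, ncol = 3),
                 baz = factor(c(1, 2, 2, 3, 1, 1, 3), levels = 1:3, ordered = TRUE))

obj.file <- tempfile(fileext = ".txt")
dput(somelist, file = obj.file)       # writes R code that re-creates the object
cat(readLines(obj.file), sep = "\n")  # inspect the text dput produced
restored <- dget(obj.file)            # evaluates that code to rebuild the object
identical(restored, somelist)
```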