1. String basics.
  1. Make a string called veggies which consists of your 2 favorite vegetables, and (programmatically) count the number of characters in that string.
veggies <- "tomato, cucumber"
nchar(veggies)
## [1] 16
  1. Now make a vector of length 2 called VeggiesandFruits, consisting of two strings, veggies (as above) in the first component, and fruits, which consists of your 2 favorite fruits, in the second component. Then (programmatically) count the number of characters in each. What R command will display the second entry of VeggiesandFruits?
fruits <- "corn, wheat"
VeggiesandFruits <- c(veggies, fruits)
nchar(VeggiesandFruits[1])
## [1] 16
nchar(VeggiesandFruits[2])
## [1] 11
VeggiesandFruits[2]
## [1] "corn, wheat"
  1. Use “paste” to combine the two entries of VeggiesandFruits, separated by ++, and call the resulting string VeggiesandFruitsPlus. Count the number of characters in VeggiesandFruitsPlus. Then try it again, but this time paste with +++ as a separator.
VeggiesandFruitsPlus <- paste(VeggiesandFruits[1], VeggiesandFruits[2], sep = "++")
nchar(VeggiesandFruitsPlus)
## [1] 29
VeggiesandFruitsPlus <- paste(VeggiesandFruits[1], VeggiesandFruits[2], sep = "+++")
nchar(VeggiesandFruitsPlus)
## [1] 30
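
The one-character difference between the two counts comes entirely from the separator. A minimal sketch of that arithmetic, using made-up strings `a` and `b` (hypothetical names, not part of the homework data):

```r
# Hypothetical toy strings, chosen only for illustration.
a <- "kale"
b <- "plum"

two   <- paste(a, b, sep = "++")
three <- paste(a, b, sep = "+++")

# The joined length is the two pieces plus the separator length.
nchar(two) == nchar(a) + nchar(b) + 2
nchar(three) - nchar(two)   # the extra separator character
```

The same reasoning explains 29 versus 30 above: 16 + 11 characters, plus a separator of length 2 or 3.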
  1. Display the first 6 letters from the veggies string.
substr(veggies, 1, 6)
## [1] "tomato"
  1. Display the first 4 letters from each of the strings in the vector VeggiesandFruits
substr(VeggiesandFruits, 1, 4)
## [1] "toma" "corn"
  1. Display the last 3 letters from each of the strings in VeggiesandFruits.
# The last 3 characters start at position nchar - 2 (not nchar - 3, which returns 4 characters).
substr(VeggiesandFruits, nchar(VeggiesandFruits) - 2, nchar(VeggiesandFruits))
## [1] "ber" "eat"
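
Taking the last n characters of a string means starting at position `nchar(x) - n + 1`, which is easy to get off by one. A small sketch that wraps this in a helper; `last_n` is a hypothetical name introduced here, not a base R function:

```r
# last_n is a hypothetical helper, not part of base R.
# It returns the final n characters of each string in x.
last_n <- function(x, n) {
  len <- nchar(x)
  substr(x, len - n + 1, len)
}

last_n(c("tomato, cucumber", "corn, wheat"), 3)
```

Because `substr()` and `nchar()` are both vectorized, the helper works on a whole character vector at once.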
2. Splitting and combining strings.

  1. Split the veggies into two strings, the first and second favorites. Save the resulting list as veggies.list, and display it (the result should be a list). How do you access your second favorite veggie in this list?

# Note: strsplit() returns a list with one element per input string. Because
# veggies is a single string, veggies.list has length 1, and that one element
# is a character vector of length 2. A classmate's output looked different
# (a list of 2), which is what strsplit() would give for a vector of two strings.
veggies
## [1] "tomato, cucumber"
veggies.list <- strsplit(veggies, ", ")
veggies.list
## [[1]]
## [1] "tomato"   "cucumber"
## 
veggies.list[[1]][2]
## [1] "cucumber"
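
The shape of the result from `strsplit()` depends on how many strings go in, not on how many pieces come out. A short sketch with a two-string input (the variable names `both` and `parts` are made up for this illustration):

```r
# Hypothetical input: a character vector holding two strings.
both <- c("tomato, cucumber", "corn, wheat")
parts <- strsplit(both, ", ")

length(parts)    # one list element per input string
parts[[2]][1]    # first piece of the second string
unlist(parts)    # flatten everything into one character vector
```

With a single input string the list has length 1, so the pieces are reached with `[[1]]` first and then an ordinary index.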
  1. You are going to an English class and need to learn a few things quickly so you can show off. You go to the site http://www.gutenberg.org/ and download Jane Austen’s complete works. (OK, I’ve uploaded the file to Moodle already, but I wanted you to know where I got it!) Look at the .txt file so you can see more or less what is in it. Then load it into R with the following code:
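setwd("C:/Users/ethan/Desktop")  # machine-specific path; change it to wherever Austen.txt is saved
# The assignment calls this vector Austen.lines; it is stored as AustenLines here.
AustenLines <- readLines("Austen.txt")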

Austen.lines should be a vector of strings, each element representing a “line” of text. Do a few basic vector operations with Austen.lines so you can convince yourself of this. What is the 10,000th line? Display the first 50 lines.

AustenLines[10000]
## [1] "of finding him still with them--a hope which, when it proved to be"
AustenLines[1:50]
##  [1] ""                                                                      
##  [2] "Project Gutenberg's The Complete Works of Jane Austen, by Jane Austen" 
##  [3] ""                                                                      
##  [4] "This eBook is for the use of anyone anywhere at no cost and with"      
##  [5] "almost no restrictions whatsoever.  You may copy it, give it away or"  
##  [6] "re-use it under the terms of the Project Gutenberg License included"   
##  [7] "with this eBook or online at www.gutenberg.org"                        
##  [8] ""                                                                      
##  [9] ""                                                                      
## [10] "Title: The Complete Project Gutenberg Works of Jane Austen"            
## [11] ""                                                                      
## [12] "Author: Jane Austen"                                                   
## [13] ""                                                                      
## [14] "Editor: David Widger"                                                  
## [15] ""                                                                      
## [16] "Release Date: January 25, 2010 [EBook #31100]"                         
## [17] ""                                                                      
## [18] "Language: English"                                                     
## [19] ""                                                                      
## [20] "Character set encoding: ASCII"                                         
## [21] ""                                                                      
## [22] "*** START OF THIS PROJECT GUTENBERG EBOOK THE WORKS OF JANE AUSTEN ***"
## [23] ""                                                                      
## [24] ""                                                                      
## [25] ""                                                                      
## [26] ""                                                                      
## [27] "Produced by many Project Gutenberg volunteers."                        
## [28] ""                                                                      
## [29] ""                                                                      
## [30] ""                                                                      
## [31] ""                                                                      
## [32] ""                                                                      
## [33] ""                                                                      
## [34] ""                                                                      
## [35] "THE WORKS OF JANE AUSTEN"                                              
## [36] ""                                                                      
## [37] ""                                                                      
## [38] ""                                                                      
## [39] "Edited by David Widger"                                                
## [40] ""                                                                      
## [41] "Project Gutenberg Editions"                                            
## [42] ""                                                                      
## [43] ""                                                                      
## [44] ""                                                                      
## [45] "             DEDICATION"                                               
## [46] ""                                                                      
## [47] "     This Jane Austen collection"                                      
## [48] "         is dedicated to"                                              
## [49] "     Alice Goodson [Hart] Woodby"                                      
## [50] ""
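
To see what `readLines()` produces without needing Austen.txt, the same checks can be tried on a small temporary file. This is only a sketch; the file contents and the name `toy_lines` are invented for the example:

```r
# Write a tiny three-line file to a temporary location (made-up contents).
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "", "third line"), tmp)

toy_lines <- readLines(tmp)
is.character(toy_lines)   # a character vector
length(toy_lines)         # one element per line of the file
toy_lines[2]              # a blank line comes back as an empty string

unlink(tmp)               # remove the temporary file
```

The empty strings seen in the first 50 lines above arise the same way: each blank line in the file becomes `""`.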
  1. How many lines are there?
length(AustenLines)
## [1] 80478

  1. How many characters are in the longest line? Where is the longest line (or lines) located?

linLens <- nchar(AustenLines)
max(linLens)
## [1] 74
which(linLens == max(linLens))
## [1] 66986 66987 66997 67000 67005 67012
  1. Display the longest line(s).
AustenLines[which(linLens == max(linLens))]
## [1] "problems which delight the cummin-splitters of criticism. In the _Cecilia_"
## [2] "of Madame D'Arblay--the forerunner, if not the model, of Miss Austen--is a"
## [3] "before _Sense and Sensibility_--its original title for several years being"
## [4] "she re-christened _Sense and Sensibility._ This, as we know, was her first"
## [5] "Marianne_ before she changed the title of _First Impressions_, as she well"
## [6] "simply substituted the leading characteristics of her principal personages"
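
Several lines tie for the maximum length here, which is why `which(linLens == max(linLens))` is used rather than `which.max()`. A toy comparison, with made-up strings in a hypothetical vector `toy`:

```r
# Hypothetical toy vector with two strings tied for longest.
toy <- c("aa", "bbbb", "c", "dddd")
lens <- nchar(toy)

which(lens == max(lens))   # every position that reaches the maximum
which.max(lens)            # only the first such position
toy[lens == max(lens)]     # logical indexing also works without which()
```

When ties matter, the comparison against `max()` is the safer choice.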
  1. What is the average number of characters per line?
mean(linLens)
## [1] 53.34525
  1. Are there any lines with no characters? If so, remove them. Check that the new length of Austen.lines makes sense to you.
length(which(linLens == 0))
## [1] 12601
NewAustenLines <- AustenLines[which(linLens != 0)]
length(AustenLines)
## [1] 80478
length(NewAustenLines)+length(which(linLens == 0))
## [1] 80478
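
An alternative way to drop empty lines is `nzchar()`, which is TRUE for every string with at least one character. A sketch on a made-up vector (`toy` and `kept` are hypothetical names), including the same bookkeeping check as above:

```r
# Hypothetical toy vector with three empty strings.
toy <- c("one", "", "two", "", "")

kept <- toy[nzchar(toy)]   # keep only the non-empty strings
kept

# Kept lines plus removed lines should equal the original count.
length(kept) + sum(!nzchar(toy)) == length(toy)
```

Note that this only removes strings of length zero; a line made up of spaces still counts as non-empty.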
  1. Collapse the lines in Austen.lines into one big string, separating each line by a space in doing so, using paste(), together with the “collapse” command. Call the resulting string Austen.all. How many characters does this have? Display the first 2000 characters of Austen.all.
Austen.all <- paste(NewAustenLines, collapse = " ")
nchar(Austen.all)
## [1] 4360995
substr(Austen.all, 1, 2000)
## [1] "Project Gutenberg's The Complete Works of Jane Austen, by Jane Austen This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever.  You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org Title: The Complete Project Gutenberg Works of Jane Austen Author: Jane Austen Editor: David Widger Release Date: January 25, 2010 [EBook #31100] Language: English Character set encoding: ASCII *** START OF THIS PROJECT GUTENBERG EBOOK THE WORKS OF JANE AUSTEN *** Produced by many Project Gutenberg volunteers. THE WORKS OF JANE AUSTEN Edited by David Widger Project Gutenberg Editions              DEDICATION      This Jane Austen collection          is dedicated to      Alice Goodson [Hart] Woodby [Note: The accompanying HTML file has active links to all the volumes and chapters in this set.] CONTENTS:    PERSUASION    NORTHANGER ABBEY    MANSFIELD PARK    EMMA    LADY SUSAN    LOVE AND FREINDSHIP AND OTHER EARLY WORKS    PRIDE AND PREJUDICE    SENSE AND SENSIBILITY PERSUASION by Jane Austen (1818) Chapter 1 Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who, for his own amusement, never took up any book but the Baronetage; there he found occupation for an idle hour, and consolation in a distressed one; there his faculties were roused into admiration and respect, by contemplating the limited remnant of the earliest patents; there any unwelcome sensations, arising from domestic affairs changed naturally into pity and contempt as he turned over the almost endless creations of the last century; and there, if every other leaf were powerless, he could read his own history with an interest which never failed.  This was the page at which the favourite volume always opened:            \"ELLIOT OF KELLYNCH HALL. \"Walter Elliot, born March 1, 1760, married, July 15, 1784, Elizabeth, daughter of James Stevenson, Esq. 
of South Park, in the county of"
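
The `collapse` argument is what turns a vector of many strings into a single string. A small sketch of the length bookkeeping, using two invented lines in a hypothetical vector `toy`:

```r
# Hypothetical two-line input.
toy <- c("It is a truth", "universally acknowledged")

one <- paste(toy, collapse = " ")
length(one)   # a single string

# Total characters: all the pieces plus one space between each adjacent pair.
nchar(one) == sum(nchar(toy)) + (length(toy) - 1)
```

The same count applies to Austen.all: the characters of all non-empty lines plus one space per join.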
  1. Split up Austen.all into words, using strsplit() with split=" ". Call the resulting string vector (note: here we are asking you for a vector, not a list) Austen.words. How long is this vector, i.e., how many words are there? Using the unique() function, compute and store the unique words as Austen.words.unique. How many unique words are there?
# strsplit() returns a list; [[1]] extracts the character vector the question asks for.
Austen.words <- strsplit(Austen.all, " ")[[1]]
length(Austen.words)
## [1] 784869
Austen.words.unique <- unique(Austen.words)
length(Austen.words.unique)
## [1] 44361
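
One thing worth knowing about splitting on a single space: wherever two spaces occur in a row, `strsplit()` produces an empty string, and that empty string is counted as a "word". A sketch on a made-up sentence (the names `toy`, `raw_words` and `words` are hypothetical), which may or may not matter for the counts above:

```r
# Hypothetical sentence with a double space between "cat" and "saw".
toy <- "the cat  saw the dog"

raw_words <- strsplit(toy, " ")[[1]]
raw_words                            # contains "" from the double space

words <- raw_words[nzchar(raw_words)]  # drop the empty strings
length(words)
length(unique(words))
sort(table(words), decreasing = TRUE)  # word frequencies, most common first
```

Whether to filter out the empty strings depends on what the assignment counts as a word, so this is offered only as a check to consider.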

To be continued! Save your work!