Task 1

library(readr)
# Read the Alice's Adventures in Wonderland file into R and call it aiw.
aiw <- read_file(file = "https://www.gutenberg.org/files/11/11-0.txt")

# How many characters are in aiw?
nchar(aiw)
## [1] 167807
# Is aiw a vector?
is.vector(aiw)
## [1] TRUE
# What is the length of the vector?
length(aiw)
## [1] 1
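
The two answers above fit together because read_file() returns the whole file as one string. A minimal sketch of the difference between length() and nchar(), using a made-up string (toy_text is a hypothetical stand-in, not the book):

```r
# toy_text is an illustrative stand-in for aiw (not taken from the file above)
toy_text <- "Down the\r\nRabbit-Hole"

# length() counts vector elements: there is one string, so this is 1
n_elements <- length(toy_text)

# nchar() counts the characters inside that string, line endings included
n_chars <- nchar(toy_text)

print(c(elements = n_elements, characters = n_chars))
```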

Task 2

library(stringr)
# Examine the first 3000 characters with str_sub().
first3k_aiw <- str_sub(aiw, 1, 3000)
print(first3k_aiw)
## [1] "The Project Gutenberg eBook of Alice’s Adventures in Wonderland, by Lewis Carroll\r\n\r\nThis eBook is for the use of anyone anywhere in the United States and\r\nmost other parts of the world at no cost and with almost no restrictions\r\nwhatsoever. You may copy it, give it away or re-use it under the terms\r\nof the Project Gutenberg License included with this eBook or online at\r\nwww.gutenberg.org. If you are not located in the United States, you\r\nwill have to check the laws of the country where you are located before\r\nusing this eBook.\r\n\r\nTitle: Alice’s Adventures in Wonderland\r\n\r\nAuthor: Lewis Carroll\r\n\r\nRelease Date: January, 1991 [eBook #11]\r\n[Most recently updated: October 12, 2020]\r\n\r\nLanguage: English\r\n\r\nCharacter set encoding: UTF-8\r\n\r\nProduced by: Arthur DiBianca and David Widger\r\n\r\n*** START OF THE PROJECT GUTENBERG EBOOK ALICE’S ADVENTURES IN WONDERLAND ***\r\n\r\n[Illustration]\r\n\r\n\r\n\r\n\r\nAlice’s Adventures in Wonderland\r\n\r\nby Lewis Carroll\r\n\r\nTHE MILLENNIUM FULCRUM EDITION 3.0\r\n\r\nContents\r\n\r\n CHAPTER I.     Down the Rabbit-Hole\r\n CHAPTER II.    The Pool of Tears\r\n CHAPTER III.   A Caucus-Race and a Long Tale\r\n CHAPTER IV.    The Rabbit Sends in a Little Bill\r\n CHAPTER V.     Advice from a Caterpillar\r\n CHAPTER VI.    Pig and Pepper\r\n CHAPTER VII.   A Mad Tea-Party\r\n CHAPTER VIII.  The Queen’s Croquet-Ground\r\n CHAPTER IX.    The Mock Turtle’s Story\r\n CHAPTER X.     The Lobster Quadrille\r\n CHAPTER XI.    Who Stole the Tarts?\r\n CHAPTER XII.   
Alice’s Evidence\r\n\r\n\r\n\r\n\r\nCHAPTER I.\r\nDown the Rabbit-Hole\r\n\r\n\r\nAlice was beginning to get very tired of sitting by her sister on the\r\nbank, and of having nothing to do: once or twice she had peeped into\r\nthe book her sister was reading, but it had no pictures or\r\nconversations in it, “and what is the use of a book,” thought Alice\r\n“without pictures or conversations?”\r\n\r\nSo she was considering in her own mind (as well as she could, for the\r\nhot day made her feel very sleepy and stupid), whether the pleasure of\r\nmaking a daisy-chain would be worth the trouble of getting up and\r\npicking the daisies, when suddenly a White Rabbit with pink eyes ran\r\nclose by her.\r\n\r\nThere was nothing so _very_ remarkable in that; nor did Alice think it\r\nso _very_ much out of the way to hear the Rabbit say to itself, “Oh\r\ndear! Oh dear! I shall be late!” (when she thought it over afterwards,\r\nit occurred to her that she ought to have wondered at this, but at the\r\ntime it all seemed quite natural); but when the Rabbit actually _took a\r\nwatch out of its waistcoat-pocket_, and looked at it, and then hurried\r\non, Alice started to her feet, for it flashed across her mind that she\r\nhad never before seen a rabbit with either a waistcoat-pocket, or a\r\nwatch to take out of it, and burning with curiosity, she ran across the\r\nfield after it, and fortunately was just in time to see it pop down a\r\nlarge rabbit-hole under the hedge.\r\n\r\nIn another moment down went Alice after it, never once considering how\r\nin the world she was to get out again.\r\n\r\n"
# Confirm that the substring is in fact 3000 characters, because it looked too long when it printed.
nchar(first3k_aiw)
## [1] 3000

Task 3

# Replace every "\r" and "\n" character with a space and save the result as aiw2.
aiw2 <- str_replace_all(string = aiw, 
                        pattern = "\r|\n",
                        replacement = " ")
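
One consequence of the pattern above is that each Windows-style line ending ("\r\n") becomes two spaces. The sketch below, on a made-up string, compares it with an alternative pattern that treats "\r\n" as a single break. The alternative is only an illustration: it shortens the text, so character positions found later with str_locate() would differ from the ones reported in this document.

```r
library(stringr)

# toy is a hypothetical three-line string with "\r\n" line endings
toy <- "one\r\ntwo\r\nthree"

# Pattern used above: each "\r" and each "\n" becomes its own space,
# so the character count is unchanged
same_length <- str_replace_all(toy, "\r|\n", " ")

# Alternative: match "\r\n" as a unit first, giving one space per line break
single_space <- str_replace_all(toy, "\r\n|\r|\n", " ")

print(same_length)
print(single_space)
print(c(original = nchar(toy), same_length = nchar(same_length),
        single_space = nchar(single_space)))
```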

Task 4

library(stringr)
# Find the true beginning of the text.
str_locate(string = aiw2,
           pattern = '\\[Illustration\\]') 
##      start end
## [1,]   876 889
# Find the true end of the text.
str_locate(string = aiw2,
           pattern = '\\*\\*\\* END OF THE PROJECT GUTENBERG EBOOK ALICE’S ADVENTURES IN WONDERLAND \\*\\*\\*')
##       start    end
## [1,] 148866 148940
# Save just the true text as aiw2.
aiw2 <- substr(aiw2, 890, 148865)

For my own confirmation, I want to double-check that the character count of Task 4’s aiw2 is less than that of aiw. It should be, because we are selecting a substring of Task 3’s aiw2, which has the same nchar as aiw (each "\r" or "\n" was replaced by exactly one space).

nchar(aiw)
## [1] 167807
nchar(aiw2)
## [1] 147976
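
The cut points 890 and 148865 were typed in by hand from the str_locate() output. As a sketch of one way to avoid hard-coding them, the positions can be read straight from the matrices that str_locate() returns. The string toy and its shortened end marker are made up for illustration, and fixed() is used so that the brackets and asterisks need no escaping.

```r
library(stringr)

# toy is a hypothetical miniature of the file: front matter, body, end matter
toy <- "license text [Illustration] Alice was beginning *** END *** more license"

# str_locate() returns a one-row matrix with columns "start" and "end"
start_loc <- str_locate(toy, fixed("[Illustration]"))
end_loc   <- str_locate(toy, fixed("*** END ***"))

# Keep everything after the start marker and before the end marker
body <- str_sub(toy, start_loc[1, "end"] + 1, end_loc[1, "start"] - 1)

print(str_trim(body))
```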

Task 5

library(stringr)

# Split aiw2 so the words are each their own element and call it aiw3.
aiw3 <- str_split_1(aiw2, pattern = " ")

# str_split_1() already returns a character vector (not a list), so unlist() leaves aiw3 unchanged; it is kept only as a safeguard.
aiw3 <- unlist(aiw3)
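
Because Task 3 turns each "\r\n" into two spaces, splitting on a single space is likely to leave empty strings in aiw3, and those would be counted as words in the later tasks. The sketch below shows the effect on a made-up string and one possible alternative (str_squish() followed by a split on runs of white space). It is an illustration only and does not reproduce the counts reported in this document.

```r
library(stringr)

# toy is a hypothetical string with doubled and tripled spaces
toy <- "Alice  was beginning   to get"

# Splitting on a single space yields an empty string between adjacent spaces
split_single <- str_split_1(toy, " ")
n_empty <- sum(split_single == "")

# Trim and collapse white space first, then split on runs of white space
split_runs <- str_split_1(str_squish(toy), "\\s+")

print(split_single)
print(n_empty)
print(split_runs)
```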

Task 6

To find the proportion of words in the book that have at least one uppercase letter, I will first count the words that have at least one uppercase letter and then divide by the total number of words in the text.

# In aiw3, how many words have at least one uppercase letter?
uppercase_words <- grepl("[A-Z]", aiw3)
total_words_upper <- sum(uppercase_words)

print(total_words_upper)
## [1] 3399
# Find the word count of aiw3 because you will need it for the next two tasks.
total_words <- length(aiw3)
total_words
## [1] 31387
# Now divide the totals to find the proportion of words in the book that have at least one uppercase letter.
total_words_upper/total_words
## [1] 0.1082932

The proportion of words in the book that have at least one uppercase letter is approximately 11%.
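
As a side note, the count-then-divide steps can be collapsed into one call: the mean of a logical vector is the proportion of TRUE values. A minimal sketch on a made-up word vector (toy_words is hypothetical, not taken from aiw3):

```r
# toy_words is an illustrative stand-in for aiw3
toy_words <- c("Alice", "was", "beginning", "to", "get", "very", "tired", "Oh", "dear!")

# TRUE where a word contains at least one uppercase ASCII letter
has_upper <- grepl("[A-Z]", toy_words)

# Two equivalent ways to get the proportion
prop_by_division <- sum(has_upper) / length(toy_words)
prop_by_mean <- mean(has_upper)

print(c(by_division = prop_by_division, by_mean = prop_by_mean))
```

The same shortcut applies to any other logical test on the words, such as the punctuation check in Task 7.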

Task 7

To find the proportion of words in the book that have some form of punctuation, I will repeat the procedure from Task 6 for words with punctuation.

library(stringr)

# In aiw3, how many words have some form of punctuation?
punct_words <- grepl("[[:punct:]]", aiw3)
total_words_punct <- sum(punct_words)

print(total_words_punct)
## [1] 6470
# Divide the total number of words with some form of punctuation by the total number of words in the book (from Task 6) to find the proportion.
total_words_punct/total_words
## [1] 0.2061363

The proportion of words in the book that have some form of punctuation is approximately 21%.

Task 8

# How many times does the word mushroom appear?
word <- "mushroom"
occurrences <- gregexpr(pattern = word, text = aiw2, ignore.case = TRUE)
count <- sum(sapply(occurrences, length))

print(count)
## [1] 8
# Locate each instance of the word "mushroom" and print out enough of the surrounding text to display the context for each instance.
context_length <- 40

for (occurrence in unlist(occurrences)) {
  if (occurrence > 0) {
    start_index <- max(occurrence - context_length, 1)
    end_index <- min(occurrence + context_length, nchar(aiw2))
    context <- substring(aiw2, first = start_index, last = end_index)
    print(context)
  }
}
## [1] "r the circumstances.  There was a large mushroom growing near her, about the same"
## [1] "iptoe, and peeped over the edge of the  mushroom, and her eyes immediately met th"
## [1] "shook itself. Then it got down off the  mushroom, and crawled away in the grass, "
## [1] "” thought Alice to  herself.    “Of the mushroom,” said the Caterpillar, just as "
## [1] "ce remained looking thoughtfully at the mushroom for a minute,  trying to make ou"
## [1] "bered that she still held the pieces of mushroom in her hands,  and she set to wo"
## [1] "ibbled some more of the lefthand bit of mushroom, and raised herself  to about tw"
## [1] ". Then she went to work nibbling at the mushroom  (she had kept a piece of it in "
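
A caution about the counting step above: gregexpr() returns a single -1 when there is no match, so summing the lengths reports 1 rather than 0 for a word that never appears. The count of 8 for "mushroom" is not affected. The sketch below compares that approach with stringr's str_count() on a made-up string. The helper names count_gregexpr and count_str are hypothetical and exist only for this illustration.

```r
library(stringr)

# toy is a hypothetical text with two case variants of the target word
toy <- "A large mushroom. The Mushroom again. No fungus here."

# Counting approach used above, wrapped in a helper for comparison
count_gregexpr <- function(word, text) {
  hits <- gregexpr(pattern = word, text = text, ignore.case = TRUE)
  sum(sapply(hits, length))
}

# Alternative: str_count() returns 0 when the word is absent
count_str <- function(word, text) {
  str_count(text, regex(word, ignore_case = TRUE))
}

print(c(gregexpr_present = count_gregexpr("mushroom", toy),
        gregexpr_absent  = count_gregexpr("zebra", toy),
        str_count_present = count_str("mushroom", toy),
        str_count_absent  = count_str("zebra", toy)))
```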

I don’t necessarily think that using a certain number of characters to show context is the best way to go about it. So I tried again to see if I could figure out how to print each instance with the sentence that includes “mushroom” as well as one preceding sentence, and two following sentences. The attempts not shown here include versions where the outputs were numbered [1]:[4] repeatedly, lacked punctuation, had no headings, etc. I finally got this one to work and I think it’s how I would do it in real life. I used ChatGPT to help me figure out the instance counter part so I could put headings on each instance output.

Note: I like the efficiency of “\\s+” for matching white space. I also set perl = TRUE, because the lookbehind “(?<=[.!?])” in the pattern requires Perl-compatible regular expressions.

# I split the text into sentences at any run of white space (at least one white space) that follows a ".", "!", or "?". I use [[1]] to pull the result out as a single character vector instead of a list.
sentences <- strsplit(aiw2, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

# I'm creating an instance counter starting at one so that I can have numbered headings on each section of text.
instance_counter <- 1

# Now I'm going to create a loop that checks each sentence in the text for the word mushroom. When such a sentence is located (its index), we then define a range that includes the sentence's index minus one position (so we get the sentence in the position before the mushroom sentence) and plus two (so we get the sentences in the two positions following the mushroom sentence). Then we print the heading of the instance, the instance, and remove extra spaces so the sentences print together in a line. Since this is a loop, it will do this for all eight instances of "mushroom."
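for (sentence_index in seq_along(sentences)) {
  if (grepl(word, sentences[sentence_index], ignore.case = TRUE)) {
    
    # sentence_index is the position of the matching sentence. Looping over positions
    # (rather than calling match(), which returns only the first identical sentence)
    # keeps the index correct even if the same sentence appears more than once.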

    # Define the range of sentences to print (one preceding and two following)
    start_index <- max(sentence_index - 1, 1)
    end_index <- min(sentence_index + 2, length(sentences))

    # Pull out the context sentences
    context <- sentences[start_index:end_index]

    # Print the instance title
    cat("Instance", instance_counter, ": ")

    # Print the context as a single line
    cat(paste(context, collapse = " "), "\n\n")

    # Tell the instance counter to go in increments of 1
    instance_counter <- instance_counter + 1
  }
}
## Instance 1 : Alice looked all round her at  the flowers and the blades of grass, but she did not see anything that  looked like the right thing to eat or drink under the circumstances. There was a large mushroom growing near her, about the same height as  herself; and when she had looked under it, and on both sides of it, and  behind it, it occurred to her that she might as well look and see what  was on the top of it. She stretched herself up on tiptoe, and peeped over the edge of the  mushroom, and her eyes immediately met those of a large blue  caterpillar, that was sitting on the top with its arms folded, quietly  smoking a long hookah, and taking not the smallest notice of her or of  anything else. CHAPTER V. 
## 
## Instance 2 : There was a large mushroom growing near her, about the same height as  herself; and when she had looked under it, and on both sides of it, and  behind it, it occurred to her that she might as well look and see what  was on the top of it. She stretched herself up on tiptoe, and peeped over the edge of the  mushroom, and her eyes immediately met those of a large blue  caterpillar, that was sitting on the top with its arms folded, quietly  smoking a long hookah, and taking not the smallest notice of her or of  anything else. CHAPTER V. Advice from a Caterpillar      The Caterpillar and Alice looked at each other for some time in  silence: at last the Caterpillar took the hookah out of its mouth, and  addressed her in a languid, sleepy voice. 
## 
## Instance 3 : In a  minute or two the Caterpillar took the hookah out of its mouth and  yawned once or twice, and shook itself. Then it got down off the  mushroom, and crawled away in the grass, merely remarking as it went,  “One side will make you grow taller, and the other side will make you  grow shorter.”    “One side of _what?_ The other side of _what?_” thought Alice to  herself. “Of the mushroom,” said the Caterpillar, just as if she had asked it  aloud; and in another moment it was out of sight. Alice remained looking thoughtfully at the mushroom for a minute,  trying to make out which were the two sides of it; and as it was  perfectly round, she found this a very difficult question. 
## 
## Instance 4 : Then it got down off the  mushroom, and crawled away in the grass, merely remarking as it went,  “One side will make you grow taller, and the other side will make you  grow shorter.”    “One side of _what?_ The other side of _what?_” thought Alice to  herself. “Of the mushroom,” said the Caterpillar, just as if she had asked it  aloud; and in another moment it was out of sight. Alice remained looking thoughtfully at the mushroom for a minute,  trying to make out which were the two sides of it; and as it was  perfectly round, she found this a very difficult question. However, at  last she stretched her arms round it as far as they would go, and broke  off a bit of the edge with each hand. 
## 
## Instance 5 : “Of the mushroom,” said the Caterpillar, just as if she had asked it  aloud; and in another moment it was out of sight. Alice remained looking thoughtfully at the mushroom for a minute,  trying to make out which were the two sides of it; and as it was  perfectly round, she found this a very difficult question. However, at  last she stretched her arms round it as far as they would go, and broke  off a bit of the edge with each hand. “And now which is which?” she said to herself, and nibbled a little of  the right-hand bit to try the effect: the next moment she felt a  violent blow underneath her chin: it had struck her foot! 
## 
## Instance 6 : Alice crouched down among the trees as well  as she could, for her neck kept getting entangled among the branches,  and every now and then she had to stop and untwist it. After a while  she remembered that she still held the pieces of mushroom in her hands,  and she set to work very carefully, nibbling first at one and then at  the other, and growing sometimes taller and sometimes shorter, until  she had succeeded in bringing herself down to her usual height. It was so long since she had been anything near the right size, that it  felt quite strange at first; but she got used to it in a few minutes,  and began talking to herself, as usual. “Come, there’s half my plan  done now! 
## 
## Instance 7 : It’s the most curious thing I ever saw in my life!”    She had not gone much farther before she came in sight of the house of  the March Hare: she thought it must be the right house, because the  chimneys were shaped like ears and the roof was thatched with fur. It  was so large a house, that she did not like to go nearer till she had  nibbled some more of the lefthand bit of mushroom, and raised herself  to about two feet high: even then she walked up towards it rather  timidly, saying to herself “Suppose it should be raving mad after all! I almost wish I’d gone to see the Hatter instead!”          CHAPTER VII. A Mad Tea-Party      There was a table set out under a tree in front of the house, and the  March Hare and the Hatter were having tea at it: a Dormouse was sitting  between them, fast asleep, and the other two were using it as a  cushion, resting their elbows on it, and talking over its head. 
## 
## Instance 8 : “Now, I’ll manage better this time,” she said to herself,  and began by taking the little golden key, and unlocking the door that  led into the garden. Then she went to work nibbling at the mushroom  (she had kept a piece of it in her pocket) till she was about a foot  high: then she walked down the little passage: and _then_—she found  herself at last in the beautiful garden, among the bright flower-beds  and the cool fountains. CHAPTER VIII. The Queen’s Croquet-Ground      A large rose-tree stood near the entrance of the garden: the roses  growing on it were white, but there were three gardeners at it, busily  painting them red.

Task 9

I’m replacing all instances of the word mushroom with apple. Below, I first counted how many occurrences of apple there were. Knowing that there are eight occurrences of mushroom, I wanted to check the difference in the number of instances before and after I replace mushroom with apple to confirm the code I wrote did what I wanted it to do.

# Count the number of occurrences of apple in the original text.
word <- "apple"
occurrences <- gregexpr(pattern = word, text = aiw2, ignore.case = TRUE)
count <- sum(sapply(occurrences, length))

print(count)
## [1] 3
# Replace "mushroom" with "apple" in the text.
eatapples <- str_replace_all(string = aiw2,
                             pattern = "mushroom",
                             replacement = "apple")

# Recount the number of occurrences of apple in the text.
word <- "apple"
occurrences <- gregexpr(pattern = word, text = eatapples, ignore.case = TRUE)
count <- sum(sapply(occurrences, length))

print(count)
## [1] 11

Since there were originally 3 occurrences of “apple” and 8 occurrences of “mushroom”, it makes sense that there are now 11 occurrences of “apple”. I am satisfied that the code worked as expected.
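
One detail worth noting: the counts ignore case, but the replacement pattern "mushroom" is case-sensitive. The totals agreeing (3 + 8 = 11) suggests that every occurrence in this text is lowercase. For a text where that might not hold, a case-insensitive replacement could look like the sketch below. The string toy and the helper count_apple are made up for illustration.

```r
library(stringr)

# toy is a hypothetical text with one capitalised and one lowercase occurrence
toy <- "Mushroom soup, a mushroom, and an apple."

# Case-sensitive replacement (as above) leaves the capitalised word in place
case_sensitive <- str_replace_all(toy, "mushroom", "apple")

# Case-insensitive replacement catches both spellings
case_insensitive <- str_replace_all(toy, regex("mushroom", ignore_case = TRUE), "apple")

# Hypothetical helper: count "apple" regardless of case
count_apple <- function(text) str_count(text, regex("apple", ignore_case = TRUE))

print(case_sensitive)
print(case_insensitive)
print(c(before = count_apple(toy),
        after_sensitive = count_apple(case_sensitive),
        after_insensitive = count_apple(case_insensitive)))
```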