library(tidyverse)
library(stringr)

Introduction

Many of you will need some practice with string manipulation (finding, inserting, extracting, replacing, etc.). Also, many of you will find the Boston dialect of our language challenging. So, solving two problems at once, I am asking for your help in writing a function to convert English text to a “pronunciation guide” for our local, cherished dialect. Specifically, I am asking you to write a vectorized function to convert each string in a string vector to a Bostonized pronunciation guide. By speaking this output, you will be understood by Bostonians.

Here are some pronunciation rules to assist you. In writing these rules, I am amazed at how contradictory some are. That is, by running these rules twice through a text string, we lose some changes from the first round. To remedy this, surround each replaced string with “<>”. This way, we can see all changed substrings and prevent further changes. (Assume characters “<” and “>” will not be in the English text.) It will be easy to read through these characters anyway!

Please note that I have arranged the above rules according to the order of changes. You should avoid making changes to patterns preceded by “<”, as this indicates a change already made.
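
To see why the “<” guard matters, here is a minimal sketch with two rules that undo each other ("or" becomes "<ah>", "aw" becomes "<or>"). The function names two_rules() and two_rules_unguarded() are illustrative and not part of the assignment; the guard itself is the negative lookbehind (?<!<).

```r
library(stringr)

# Illustrative only: two contradictory rules, each guarded by (?<!<) so that
# text already wrapped in <> is left alone.
two_rules <- function(x) {
  x <- str_replace_all(x, "(?<!<)or", "<ah>")
  x <- str_replace_all(x, "(?<!<)aw", "<or>")
  x
}

# The same rules without the guard, for comparison.
two_rules_unguarded <- function(x) {
  x <- str_replace_all(x, "or", "<ah>")
  x <- str_replace_all(x, "aw", "<or>")
  x
}

once <- two_rules("law and order")
once
identical(once, two_rules(once))            # guarded second pass
identical(once, two_rules_unguarded(once))  # unguarded second pass
```

With the guard, the "or" inside "<or>" is skipped on the second pass because it directly follows "<"; without it, the marked text is rewritten again.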

I hope that the lecture notes will be helpful to you. In particular, please see the stringr and RegEx slides.

In the following chunk, please create an R function called Bostonize(), which takes a string vector and returns, for each string element, its Boston pronunciation guide.

Bostonize <- function(chvector) {
  
  chvector <- str_replace_all(chvector,
                              "(?<!<)turn left",
                              "<bang <ah> louee>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)turn right",
                              "<bang <ah> ralph>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)very",
                              "<wicked>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)great",
                              "<wicked>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)sandwich",
                              "<spuckee>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)water fountain",
                              "<bubblah>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)soda",
                              "<soder>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)Dorchester Avenue",
                              "<Dotav>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)Dorchester",
                              "<D<aw>t>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)Massachusetts Avenue",
                              "<Massav>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)Commonwealth Avenue",
                              "<C<aw>mav>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)Jamaica Plain",
                              "<JP>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)package store",
                              "<packee>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)([^\\s])ear",
                              "\\1<ah>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)ar",
                              "<ah>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)er",
                              "<ah>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)ir",
                              "<ah>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)or",
                              "<ah>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)ur",
                              "<ah>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)aw",
                              "<or>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)(?<!>)(?<!o)(?<!s)(?<!ng)(?<!f)(?<!l)(?<!th)o",
                              "<aw>")
  chvector <- str_replace_all(chvector,
                              "(?<!<)a$",
                              "<ar>")
  chvector <- str_replace_all(chvector, "(?<!<)i$", "<ir>")
  chvector <- str_replace_all(chvector, "(?<!<)o$", "<or>")
  chvector <- str_replace_all(chvector, "(?<!<)u$", "<ur>")
  chvector <- str_replace_all(chvector, "(?<!<)aw$", "<awr>")

  return(chvector)
}
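
The repeated str_replace_all() calls above could also be driven by a rule table. The following is a rough sketch of that idea, not the function used for the results below: phrase_rules and apply_rules() are hypothetical names, and only four of the phrase rules are included. Because the rules are applied one after another, their order in the table is their order of application.

```r
library(stringr)

# Hypothetical rule table: names are patterns, values are replacements.
# Order matters: "Dorchester Avenue" must come before "Dorchester".
phrase_rules <- c(
  "(?<!<)Dorchester Avenue" = "<Dotav>",
  "(?<!<)Dorchester"        = "<D<aw>t>",
  "(?<!<)very"              = "<wicked>",
  "(?<!<)great"             = "<wicked>"
)

# Apply each rule in turn; str_replace_all() is vectorized over chvector,
# so every element of the input is handled in each step.
apply_rules <- function(chvector, rules) {
  for (i in seq_along(rules)) {
    chvector <- str_replace_all(chvector, names(rules)[i], rules[[i]])
  }
  chvector
}

demo <- apply_rules(
  c("Dorchester Avenue is very long", "Dorchester is great"),
  phrase_rules
)
demo
```

Keeping the rules in one vector makes the ordering explicit and easy to rearrange, at the cost of hiding each rule behind a loop.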

Consider the following input string vector, and run your function once. Make sure that you display the output in your solution.

myvector <- c("I parked my car by the harbor yard near Harvard .",
"I have my heart set on eating a ham sandwich from Mike's Deli in Dorchester. That place is great and  I am very hungry!",
"I hope that this Uber gets to the package store soon! The traffic on Massachusetts Avenue was very bad this afternoon.  Three cars were in an accident. The Uber driver had to turn left onto Commonwealth Avenue, then turn left again onto Dorchester Avenue, and finally turn right on the Jamaica Plain parkway. We had to drive to Newton through Jamaica Plain!",
"In the long history of the world, only a few generations have been granted the role of defending freedom in its hour of maximum danger. I do not shrink from this responsibility--I welcome it. I do not believe that any of us would exchange places with any other people or any other generation. The energy, the faith, the devotion which we bring to this endeavor will light our country and all who serve it--and the glow from that fire can truly light the world. 

     And so, my fellow Americans: ask not what your country can do for you--ask what you can do for your country. ")

Here’s one run of the Bostonize function, with myvector as the input vector:

Bostonize(myvector)
## [1] "I p<ah>ked my c<ah> by the h<ah>b<ah> y<ah>d n<ah> H<ah>v<ah>d ."                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
## [2] "I have my h<ah>t set <aw>n eating a ham <spuckee> fr<aw>m Mike's Deli in <D<aw>t>. That place is <wicked> and  I am <wicked> hungry!"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## [3] "I h<aw>pe that this Ub<ah> gets t<aw> the <packee> soon! The traffic <aw>n <Massav> was <wicked> bad this aft<ah>n<aw>on.  Three c<ah>s w<ah>e in an accident. The Ub<ah> driv<ah> had t<aw> <bang <ah> louee> <aw>nt<aw> <C<aw>mav>, then <bang <ah> louee> again <aw>nt<aw> <D<aw>tav>, and finally <bang <ah> ralph> <aw>n the <JP> p<ah>kway. We had t<aw> drive t<aw> Newt<aw>n thr<aw>ugh <JP>!"                                                                                                                                                                                                                                                                                                                                                     
## [4] "In the long hist<ah>y <aw>f the w<ah>ld, <aw>nly a few gen<ah>ati<aw>ns have been granted the r<aw>le <aw>f defending freed<aw>m in its h<aw><ah> <aw>f maximum dang<ah>. I d<aw> n<aw>t shrink fr<aw>m this resp<aw>nsibility--I welc<aw>me it. I d<aw> n<aw>t believe that any <aw>f us w<aw>uld exchange places with any <aw>th<ah> pe<aw>ple <ah> any <aw>th<ah> gen<ah>ati<aw>n. The en<ah>gy, the faith, the dev<aw>ti<aw>n which we bring t<aw> this endeav<ah> will light <aw><ah> c<aw>untry and all wh<aw> s<ah>ve it--and the glow fr<aw>m that f<ah>e can truly light the w<ah>ld. \n\n     And so, my fellow Am<ah>icans: ask n<aw>t what y<aw><ah> c<aw>untry can d<aw> f<ah> y<aw>u--ask what y<aw>u can d<aw> f<ah> y<aw><ah> c<aw>untry. "

Great! Now make sure that your pronunciation guide is not changed by running your function again. To do this, apply the function twice to the above string vector and check that no further changes are made. A TRUE result from identical() confirms this.

Below, I check to make sure that the string vector isn’t changed by running the function twice:

test1 <- Bostonize(myvector)
test2 <- Bostonize(test1)

identical(test1, test2)
## [1] TRUE
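
When identical() returns FALSE, it helps to know which elements changed on the second pass. Here is a small sketch of one way to find them; changed_on_rerun() is a hypothetical helper, and guarded() and unguarded() are toy one-rule functions standing in for Bostonize().

```r
library(stringr)

# Hypothetical helper: apply f twice and return the indices of the
# elements that differ between the first and second pass.
changed_on_rerun <- function(f, x) {
  once  <- f(x)
  twice <- f(once)
  which(once != twice)
}

# Toy stand-ins for Bostonize(): one rule with the "<" guard, one without.
guarded   <- function(x) str_replace_all(x, "(?<!<)ar", "<ah>")
unguarded <- function(x) str_replace_all(x, "a", "<a>")

x <- c("car", "bike")
changed_on_rerun(guarded, x)    # no element changes on the second pass
changed_on_rerun(unguarded, x)  # element 1 changes again
```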

The first time I ran the above identical() test, it returned FALSE. After some investigating, I found that the word ‘afternoon’ was causing the trouble. The instructions state that in an ‘oo’, the second ‘o’ shouldn’t be replaced. But the first run converted ‘afternoon’ to ‘aft<ah>n<aw>on’, where the remaining ‘o’ now follows ‘>’ rather than another ‘o’, so a second run would convert it to ‘aft<ah>n<aw><aw>n’. I added the lookbehind (?<!>) to the ‘o’ rule so that the second run leaves it alone.
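
The ‘afternoon’ problem can be reproduced in isolation on the word "noon". The sketch below uses a simplified version of the "o" rule (only the "<", ">" and "o" lookbehinds, not the full pattern in Bostonize()); o_rule_v1() and o_rule_v2() are illustrative names.

```r
library(stringr)

# Simplified "o" rule: skip an "o" that follows "<" or another "o".
o_rule_v1 <- function(x) str_replace_all(x, "(?<!<)(?<!o)o", "<aw>")

# Same rule with the extra guard: also skip an "o" that follows ">".
o_rule_v2 <- function(x) str_replace_all(x, "(?<!<)(?<!>)(?<!o)o", "<aw>")

v1_once  <- o_rule_v1("noon")
v1_twice <- o_rule_v1(v1_once)
v2_once  <- o_rule_v2("noon")
v2_twice <- o_rule_v2(v2_once)

c(v1_once, v1_twice)  # the second pass replaces the leftover "o"
c(v2_once, v2_twice)  # the second pass changes nothing
```

On the first pass the second "o" is protected by the "o" before it. Once that first "o" has become "<aw>", the protection is gone, which is why the ">" lookbehind is needed.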

As a final exercise, please compare your pronunciation guide of the last string to this recording of President Kennedy’s inaugural address: https://www.youtube.com/watch?v=mxa4HDgfWFs .

Overall, the translation was pretty good! There were a few instances where the translation was not as good. For example, JFK tends to pronounce “do” correctly, but as per the assignment instructions, we translate “do” to “daw” (not “dor”, because the instructions want us to maintain the order, and replacing “o” with “aw” comes before replacing the final “o” with “or”). He also pronounced “fire” similarly to “faiah”, but as per the instructions, we change “fire” to “fahe”. Words such as “energy” (enahgy) and “endeavor” (endeavah) and “for” (fah) were translated very well, and they reflected JFK’s accent.
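
The effect of rule order on “do” can be checked with a two-rule sketch. The patterns are simplified from the ones in Bostonize(), and o_then_final() and final_then_o() are illustrative names, not part of the solution.

```r
library(stringr)

# Order used in Bostonize(): general "o" rule first, final-"o" rule second.
o_then_final <- function(x) {
  x <- str_replace_all(x, "(?<!<)(?<!>)(?<!o)o", "<aw>")
  x <- str_replace_all(x, "(?<!<)o$", "<or>")
  x
}

# Reversed order, for comparison only.
final_then_o <- function(x) {
  x <- str_replace_all(x, "(?<!<)o$", "<or>")
  x <- str_replace_all(x, "(?<!<)(?<!>)(?<!o)o", "<aw>")
  x
}

o_then_final("do")  # the "o" is already marked before the final-"o" rule runs
final_then_o("do")  # the final-"o" rule wins; the marked "o" is then skipped
```

Whichever rule runs first claims the "o", and the “<” guard keeps the later rule from touching it.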