library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
data = state.name
length(data)
## [1] 50
The length() function returns the number of elements in a vector; here, state.name contains the names of all 50 states.
str_length(data)
## [1] 7 6 7 8 10 8 11 8 7 7 6 5 8 7 4 6 8 9 5 8 13 8 9 11 8
## [26] 7 8 6 13 10 10 8 14 12 4 8 6 12 12 14 12 9 5 4 7 8 10 13 9 7
str_length() returns the number of characters in each string, counting spaces as well as letters. For example, Alabama, the first element of the vector, has 7 characters, while New Hampshire has 13 (12 letters plus a space).
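Because str_length() counts every character, two-word names come out one longer than their letter count. A minimal sketch of that point, using base R's nchar() only as a cross-check (nchar() is not part of the original assignment):

```r
library(stringr)

states <- state.name

# One length per string; spaces count as characters
lens <- str_length(states)

# "New Hampshire" is 12 letters plus 1 space
str_length("New Hampshire")        # 13

# Base R's nchar() agrees for these plain ASCII names
all(lens == nchar(states))         # TRUE

# Longest name(s) in the vector
states[lens == max(lens)]          # "North Carolina" "South Carolina"
```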
four_states = c("Arizona",
"California",
"Illinois",
"Oregon")
str_view_all(data, paste0(four_states, collapse = "|")) %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ <Arizona>
## [4] │ Arkansas
## [5] │ <California>
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ Delaware
## [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming
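The call paste0(four_states, collapse = "|") joins the four names into a single alternation pattern, “Arizona|California|Illinois|Oregon”, so a string is highlighted if it contains any one of them. Because the pattern is not anchored, a name can also match inside a longer name. The sketch below uses str_subset() (not used in the original) and an illustrative “Virginia” pattern, which is not part of the assignment, to show why wrapping the alternation in “^(...)$” can matter:

```r
library(stringr)

states <- state.name
four_states <- c("Arizona", "California", "Illinois", "Oregon")

# Build "Arizona|California|Illinois|Oregon"
pattern <- paste0(four_states, collapse = "|")

# Keep only the elements that contain any of the four names
str_subset(states, pattern)

# An unanchored pattern also matches inside longer names
str_subset(states, "Virginia")     # "Virginia" "West Virginia"

# Anchoring the whole alternation forces an exact, whole-string match
anchored <- paste0("^(", pattern, ")$")
str_subset(states, anchored)
```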
str_view_all(data, "^D") %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ Arizona
## [4] │ Arkansas
## [5] │ California
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ <D>elaware
## [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming
Here, the “^” anchor restricts matches to the start of the string, so the pattern finds names that begin with “D”. Delaware is the only state that does. Note that the “D” must be upper-case because regular-expression matching is case-sensitive by default. To avoid worrying about case, you can first convert the names to a single case with str_to_lower() or str_to_upper() and then write the pattern in that case.
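A short sketch of the case-conversion approach, plus stringr's regex() wrapper with ignore_case = TRUE as a second option (regex() is not used elsewhere in this assignment):

```r
library(stringr)

states <- state.name

# Case-sensitive: a lower-case "d" finds nothing at the start of a name
str_subset(states, "^d")                                # character(0)

# Option 1: lower-case the names first, then use a lower-case pattern
states[str_detect(str_to_lower(states), "^d")]          # "Delaware"

# Option 2: keep the names as-is and let the regex ignore case
str_subset(states, regex("^d", ignore_case = TRUE))     # "Delaware"
```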
str_view_all(data, "[ae]$") %>%
print(n=50)
## [1] │ Alabam<a>
## [2] │ Alask<a>
## [3] │ Arizon<a>
## [4] │ Arkansas
## [5] │ Californi<a>
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ Delawar<e>
## [9] │ Florid<a>
## [10] │ Georgi<a>
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indian<a>
## [15] │ Iow<a>
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisian<a>
## [19] │ Main<e>
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesot<a>
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montan<a>
## [27] │ Nebrask<a>
## [28] │ Nevad<a>
## [29] │ New Hampshir<e>
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolin<a>
## [34] │ North Dakot<a>
## [35] │ Ohio
## [36] │ Oklahom<a>
## [37] │ Oregon
## [38] │ Pennsylvani<a>
## [39] │ Rhode Island
## [40] │ South Carolin<a>
## [41] │ South Dakot<a>
## [42] │ Tennesse<e>
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virgini<a>
## [47] │ Washington
## [48] │ West Virgini<a>
## [49] │ Wisconsin
## [50] │ Wyoming
For this question, a character class (square brackets) was combined with the “$” anchor. The class “[ae]” matches a single “a” or “e”, and “$” requires that character to be the last one in the string. As in the previous question, matching is case-sensitive, so “a” and “e” must be written in lower case to match the lower-case endings of the names.
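To turn the highlighted view into a list or a count, the same pattern can be passed to str_subset() or str_detect(). A minimal sketch; the alternation form “(a|e)$” is included only to show that it is equivalent to the character class here:

```r
library(stringr)

states <- state.name

# Names whose final character is "a" or "e"
ends_ae <- str_subset(states, "[ae]$")
length(ends_ae)                                    # 25

# A class of single letters behaves like an alternation of those letters
identical(ends_ae, str_subset(states, "(a|e)$"))   # TRUE
```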
str_view_all(data, ".*\\s.*") %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ Arizona
## [4] │ Arkansas
## [5] │ California
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ Delaware
## [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ <New Hampshire>
## [30] │ <New Jersey>
## [31] │ <New Mexico>
## [32] │ <New York>
## [33] │ <North Carolina>
## [34] │ <North Dakota>
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvania
## [39] │ <Rhode Island>
## [40] │ <South Carolina>
## [41] │ <South Dakota>
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ <West Virginia>
## [49] │ Wisconsin
## [50] │ Wyoming
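The pattern “\\s” matches a whitespace character; the backslash is doubled because it must itself be escaped inside an R string. Since every two-word state name contains a space, this identifies all of them, and the surrounding “.*” simply extends the highlight across the whole name. The result was 10 states: New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Rhode Island, South Carolina, South Dakota, and West Virginia.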
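Detecting the single whitespace character is enough to find the two-word names. A sketch that also cross-checks the result by counting runs of word characters with “\\w+” (an alternative check, not the method used above):

```r
library(stringr)

states <- state.name

# "\\s" in an R string is the regex \s: any whitespace character
two_words <- str_subset(states, "\\s")
length(two_words)                                             # 10

# Cross-check: count runs of word characters in each name
identical(two_words, states[str_count(states, "\\w+") == 2])  # TRUE
```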
str_view_all(data, "\\b\\w*[ntwcNCTW]+\\w*\\b") %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ <Arizona>
## [4] │ <Arkansas>
## [5] │ <California>
## [6] │ <Colorado>
## [7] │ <Connecticut>
## [8] │ <Delaware>
## [9] │ Florida
## [10] │ Georgia
## [11] │ <Hawaii>
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ <Indiana>
## [15] │ <Iowa>
## [16] │ <Kansas>
## [17] │ <Kentucky>
## [18] │ <Louisiana>
## [19] │ <Maine>
## [20] │ <Maryland>
## [21] │ <Massachusetts>
## [22] │ <Michigan>
## [23] │ <Minnesota>
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ <Montana>
## [27] │ <Nebraska>
## [28] │ <Nevada>
## [29] │ <New> Hampshire
## [30] │ <New> Jersey
## [31] │ <New> <Mexico>
## [32] │ <New> York
## [33] │ <North> <Carolina>
## [34] │ <North> <Dakota>
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ <Pennsylvania>
## [39] │ Rhode <Island>
## [40] │ <South> <Carolina>
## [41] │ <South> <Dakota>
## [42] │ <Tennessee>
## [43] │ <Texas>
## [44] │ <Utah>
## [45] │ <Vermont>
## [46] │ <Virginia>
## [47] │ <Washington>
## [48] │ <West> <Virginia>
## [49] │ <Wisconsin>
## [50] │ <Wyoming>
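I asked an AI tool for help with this question, and it first suggested a function that we have not yet covered in class. When I asked it to use only basic meta-characters instead, I learned how to use word boundaries (“\\b”) and word characters (“\\w”) to complete the question. In the pattern, “\\b” marks the start or end of a word, “\\w*” matches any run of letters, and “[ntwcNCTW]+” requires at least one n, t, w, or c (in either case), so each highlighted match is a whole word containing one of those letters.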
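str_view_all() only displays matches; str_extract_all() returns the matched words themselves, as a list with one character vector per input string. A short sketch on three of the names, plus a simpler str_detect() check for which states contain none of the letters (both are added illustrations, not part of the assigned question):

```r
library(stringr)

states <- state.name
pattern <- "\\b\\w*[ntwcNCTW]+\\w*\\b"

# Whole words that contain at least one of n, t, w, c (any case)
words <- str_extract_all(c("New York", "Rhode Island", "North Dakota"), pattern)
words   # list("New", "Island", c("North", "Dakota"))

# States with no matching word at all
no_match <- states[!str_detect(states, "[ntwcNCTW]")]
no_match
```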
str_view_all(data, "[ciCI]") %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ Ar<i>zona
## [4] │ Arkansas
## [5] │ <C>al<i>forn<i>a
## [6] │ <C>olorado
## [7] │ <C>onne<c>t<i><c>ut
## [8] │ Delaware
## [9] │ Flor<i>da
## [10] │ Georg<i>a
## [11] │ Hawa<i><i>
## [12] │ <I>daho
## [13] │ <I>ll<i>no<i>s
## [14] │ <I>nd<i>ana
## [15] │ <I>owa
## [16] │ Kansas
## [17] │ Kentu<c>ky
## [18] │ Lou<i>s<i>ana
## [19] │ Ma<i>ne
## [20] │ Maryland
## [21] │ Massa<c>husetts
## [22] │ M<i><c>h<i>gan
## [23] │ M<i>nnesota
## [24] │ M<i>ss<i>ss<i>pp<i>
## [25] │ M<i>ssour<i>
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampsh<i>re
## [30] │ New Jersey
## [31] │ New Mex<i><c>o
## [32] │ New York
## [33] │ North <C>arol<i>na
## [34] │ North Dakota
## [35] │ Oh<i>o
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvan<i>a
## [39] │ Rhode <I>sland
## [40] │ South <C>arol<i>na
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ V<i>rg<i>n<i>a
## [47] │ Wash<i>ngton
## [48] │ West V<i>rg<i>n<i>a
## [49] │ W<i>s<c>ons<i>n
## [50] │ Wyom<i>ng
The class “[ciCI]” matches any single c, C, i, or I, so every state containing at least one of these letters is highlighted; 31 states qualify.
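The count of 31 can be checked directly with str_detect(), and str_count() gives the number of matching letters within a single name. A minimal sketch; the case-insensitive variant using regex() is an added alternative:

```r
library(stringr)

states <- state.name

# TRUE for each name containing c, C, i, or I at least once
has_ci <- str_detect(states, "[ciCI]")
sum(has_ci)                                                  # 31

# Same result with a lower-case class and case-insensitive matching
sum(str_detect(states, regex("[ci]", ignore_case = TRUE)))   # 31

# Number of matching letters in one name
str_count("Mississippi", "[ciCI]")                           # 4
```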
str_view_all(data, "^......$") %>%
print(n=50)
## [1] │ Alabama
## [2] │ <Alaska>
## [3] │ Arizona
## [4] │ Arkansas
## [5] │ California
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ Delaware
## [9] │ Florida
## [10] │ Georgia
## [11] │ <Hawaii>
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ <Kansas>
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ <Nevada>
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming
Each “.” matches exactly one character, and the “^” and “$” anchors prevent the match from starting late or ending early, so six dots match only names that are exactly six characters long: Alaska, Hawaii, Kansas, Nevada, and Oregon.
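The six dots can also be written with a brace quantifier, “^.{6}$”, or replaced by a plain length comparison. A sketch checking that all three give the same result:

```r
library(stringr)

states <- state.name

# Six dots between ^ and $: exactly six characters, nothing more
six_dots <- str_subset(states, "^......$")

# The same constraint written with a brace quantifier
six_brace <- str_subset(states, "^.{6}$")

# Or without a regex at all, using the character counts
six_len <- states[str_length(states) == 6]

six_dots    # "Alaska" "Hawaii" "Kansas" "Nevada" "Oregon"
identical(six_dots, six_brace) && identical(six_dots, six_len)   # TRUE
```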
str_view_all(data, ".{6,}") %>%
print(n=50)
## [1] │ <Alabama>
## [2] │ <Alaska>
## [3] │ <Arizona>
## [4] │ <Arkansas>
## [5] │ <California>
## [6] │ <Colorado>
## [7] │ <Connecticut>
## [8] │ <Delaware>
## [9] │ <Florida>
## [10] │ <Georgia>
## [11] │ <Hawaii>
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ <Indiana>
## [15] │ Iowa
## [16] │ <Kansas>
## [17] │ <Kentucky>
## [18] │ <Louisiana>
## [19] │ Maine
## [20] │ <Maryland>
## [21] │ <Massachusetts>
## [22] │ <Michigan>
## [23] │ <Minnesota>
## [24] │ <Mississippi>
## [25] │ <Missouri>
## [26] │ <Montana>
## [27] │ <Nebraska>
## [28] │ <Nevada>
## [29] │ <New Hampshire>
## [30] │ <New Jersey>
## [31] │ <New Mexico>
## [32] │ <New York>
## [33] │ <North Carolina>
## [34] │ <North Dakota>
## [35] │ Ohio
## [36] │ <Oklahoma>
## [37] │ <Oregon>
## [38] │ <Pennsylvania>
## [39] │ <Rhode Island>
## [40] │ <South Carolina>
## [41] │ <South Dakota>
## [42] │ <Tennessee>
## [43] │ Texas
## [44] │ Utah
## [45] │ <Vermont>
## [46] │ <Virginia>
## [47] │ <Washington>
## [48] │ <West Virginia>
## [49] │ <Wisconsin>
## [50] │ <Wyoming>
The brace quantifier “{6,}” means “6 or more” of the preceding element, so combined with the wildcard “.” it matches any run of at least six characters. As a result, only names with six or more characters are highlighted; the six shorter names (Idaho, Iowa, Maine, Ohio, Texas, and Utah) are not.
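Because “.{6,}” is greedy and not anchored, a qualifying name is highlighted in full, and a shorter name simply has no match. Detecting the pattern is therefore equivalent to comparing str_length() with 6, which this sketch checks:

```r
library(stringr)

states <- state.name

# TRUE where some run of 6+ characters exists, i.e. the name has length >= 6
long_regex <- str_detect(states, ".{6,}")
long_len   <- str_length(states) >= 6

identical(long_regex, long_len)   # TRUE
sum(long_regex)                   # 44
states[!long_regex]               # "Idaho" "Iowa" "Maine" "Ohio" "Texas" "Utah"
```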
str_view_all(data, "^[BCDFGHJKLMNPQRSTVWXYZ][bcdfghjklmnpqrstvwxyz]") %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ Arizona
## [4] │ Arkansas
## [5] │ California
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ Delaware
## [9] │ <Fl>orida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvania
## [39] │ <Rh>ode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ <Wy>oming
This time, the “^” anchor was used to restrict the match to the start of the string, followed by two character classes that list the consonants (with “y” counted as a consonant). The first class contains upper-case letters because every state name begins with a capital, and the second contains the same letters in lower case. Florida, Rhode Island, and Wyoming were the only states identified.
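Writing the consonants twice, once per case, can be avoided by repeating a single lower-case class with “{2}” and matching case-insensitively via regex(). A sketch of that alternative (not the method used above), checked against the original pattern:

```r
library(stringr)

states <- state.name

# Original approach: upper-case class, then lower-case class ("y" counts as a consonant)
two_cons <- str_subset(states, "^[BCDFGHJKLMNPQRSTVWXYZ][bcdfghjklmnpqrstvwxyz]")

# Alternative: one lower-case class repeated twice, matched case-insensitively
two_cons_ci <- str_subset(states,
                          regex("^[bcdfghjklmnpqrstvwxyz]{2}", ignore_case = TRUE))

two_cons                          # "Florida" "Rhode Island" "Wyoming"
identical(two_cons, two_cons_ci)  # TRUE
```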
str_view_all(data, "[aeiouAEIOU][aeiouAEIOU]+") %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ Arizona
## [4] │ Arkansas
## [5] │ Californ<ia>
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ Delaware
## [9] │ Florida
## [10] │ G<eo>rg<ia>
## [11] │ Haw<aii>
## [12] │ Idaho
## [13] │ Illin<oi>s
## [14] │ Ind<ia>na
## [15] │ <Io>wa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ L<oui>s<ia>na
## [19] │ M<ai>ne
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Miss<ou>ri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Oh<io>
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvan<ia>
## [39] │ Rhode Island
## [40] │ S<ou>th Carolina
## [41] │ S<ou>th Dakota
## [42] │ Tenness<ee>
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virgin<ia>
## [47] │ Washington
## [48] │ West Virgin<ia>
## [49] │ Wisconsin
## [50] │ Wyoming
As in the previous question, character classes list the letters of interest, here the vowels in both cases. The first class matches one vowel, and the second class followed by the repetition meta-character “+” matches one or more further vowels, so the pattern finds runs of two or more consecutive vowels, as the assignment requested.
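“A vowel followed by one or more vowels” is the same as “two or more vowels in a row”, which a brace quantifier expresses directly. A sketch comparing the two forms and extracting the runs from one name:

```r
library(stringr)

states <- state.name

# A vowel followed by one or more vowels = a run of at least two vowels
runs_plus  <- str_detect(states, "[aeiouAEIOU][aeiouAEIOU]+")
# The same requirement written with a brace quantifier
runs_brace <- str_detect(states, "[aeiouAEIOU]{2,}")

identical(runs_plus, runs_brace)  # TRUE
sum(runs_plus)                    # 16

# The vowel runs themselves, e.g. for Louisiana
str_extract_all("Louisiana", "[aeiouAEIOU]{2,}")[[1]]   # "oui" "ia"
```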
str_view_all(data, "^[AEIOU].*[bcdfghjklmnpqrstvwxyz]$") %>%
print(n=50)
## [1] │ Alabama
## [2] │ Alaska
## [3] │ Arizona
## [4] │ <Arkansas>
## [5] │ California
## [6] │ Colorado
## [7] │ Connecticut
## [8] │ Delaware
## [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ <Utah>
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming
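The pattern anchors an upper-case vowel at the start (“^[AEIOU]”), allows any characters in between (“.*”), and anchors a lower-case consonant at the end. Arkansas, Illinois, Oregon, and Utah were the only 4 states whose names begin with a vowel and end with a consonant.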
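The same question can be split into two simpler patterns, one per condition, combined with &. A minimal sketch (an alternative formulation, not the one used above) confirming both approaches agree:

```r
library(stringr)

states <- state.name

# One regex: vowel first, anything in between, consonant last
one_regex <- str_subset(states, "^[AEIOU].*[bcdfghjklmnpqrstvwxyz]$")

# Two simpler conditions combined with &
starts_vowel   <- str_detect(states, "^[AEIOU]")
ends_consonant <- str_detect(states, "[bcdfghjklmnpqrstvwxyz]$")
two_conditions <- states[starts_vowel & ends_consonant]

one_regex                             # "Arkansas" "Illinois" "Oregon" "Utah"
identical(one_regex, two_conditions)  # TRUE
```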