Data and Packages

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

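# stringr is already attached by library(tidyverse); loading it again is harmless
library(stringr)
data <- state.name  # built-in character vector of the 50 U.S. state names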

Question 1

Length of the vector

length(data)

## [1] 50

The length() function returns the number of elements in a vector. Here the vector holds 50 strings, one per state name.

Length of each string

str_length(data)

##  [1]  7  6  7  8 10  8 11  8  7  7  6  5  8  7  4  6  8  9  5  8 13  8  9 11  8
## [26]  7  8  6 13 10 10  8 14 12  4  8  6 12 12 14 12  9  5  4  7  8 10 13  9  7

str_length() returns the number of characters in each string. For example, “Alabama”, the first string in this vector, contains 7 characters.
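The difference between the two functions is easiest to see on a small vector. The following is a minimal sketch, assuming stringr is installed; `first_three`, `n_elements`, and `n_chars` are names introduced here for illustration only.

```r
library(stringr)

# Illustrative subset: the first three state names
first_three <- state.name[1:3]   # "Alabama" "Alaska" "Arizona"

# length() counts the elements of the vector
n_elements <- length(first_three)

# str_length() counts the characters in each element
n_chars <- str_length(first_three)

print(n_elements)  # 3
print(n_chars)     # 7 6 7
```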

Question 2

four_states = c("Arizona",
                "California",
                "Illinois",
                "Oregon")

str_view_all(data, paste0(four_states, collapse = "|")) %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ <Arizona>
##  [4] │ Arkansas
##  [5] │ <California>
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ Delaware
##  [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming

Question 3

str_view_all(data, "^D") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ Arizona
##  [4] │ Arkansas
##  [5] │ California
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ <D>elaware
##  [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming

In this case, the “^” meta-character anchors the match to the start of the string, so the pattern identifies any strings that begin with the letter “D”. Delaware is the only state that begins with that letter. The “D” must be capitalized in the pattern because regular expressions in R are case-sensitive by default. One way to avoid worrying about case is to convert every string to a single case first, using str_to_upper() or str_to_lower(), and then write the pattern in that same case.
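Two ways of sidestepping case sensitivity can be sketched as follows. This is an illustration rather than part of the assignment answer; `states`, `by_lowering`, and `by_ignoring` are hypothetical names, and the second option uses stringr's regex() helper with ignore_case = TRUE instead of converting the strings.

```r
library(stringr)

states <- state.name  # built-in vector of the 50 state names

# Option 1: lower-case the strings first, then match a lower-case pattern
by_lowering <- str_subset(str_to_lower(states), "^d")

# Option 2: keep the strings as they are and make the pattern ignore case
by_ignoring <- str_subset(states, regex("^d", ignore_case = TRUE))

print(by_lowering)  # "delaware"
print(by_ignoring)  # "Delaware"
```

The second option leaves the original capitalization intact, which is usually preferable when the matched names are going to be reported.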

Question 4

str_view_all(data, "[ae]$") %>% 
  print(n=50)

##  [1] │ Alabam<a>
##  [2] │ Alask<a>
##  [3] │ Arizon<a>
##  [4] │ Arkansas
##  [5] │ Californi<a>
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ Delawar<e>
##  [9] │ Florid<a>
## [10] │ Georgi<a>
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indian<a>
## [15] │ Iow<a>
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisian<a>
## [19] │ Main<e>
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesot<a>
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montan<a>
## [27] │ Nebrask<a>
## [28] │ Nevad<a>
## [29] │ New Hampshir<e>
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolin<a>
## [34] │ North Dakot<a>
## [35] │ Ohio
## [36] │ Oklahom<a>
## [37] │ Oregon
## [38] │ Pennsylvani<a>
## [39] │ Rhode Island
## [40] │ South Carolin<a>
## [41] │ South Dakot<a>
## [42] │ Tennesse<e>
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virgini<a>
## [47] │ Washington
## [48] │ West Virgini<a>
## [49] │ Wisconsin
## [50] │ Wyoming

For this question, square brackets were used alongside the dollar-sign meta-character. The square brackets define a character class that matches either “a” or “e”, while the dollar sign anchors the match to the end of the string. As in the previous question, the pattern is case-sensitive, so “a” and “e” must both be entered as lower-case characters.
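To pull out just the matching names instead of scanning the highlighted list, str_subset() can be used with the same pattern. A minimal sketch, assuming stringr is loaded; `states`, `ends_ae`, and `ends_ae_alt` are illustrative names, and the alternation form is shown only for comparison.

```r
library(stringr)

states <- state.name

# Character class: one character, either "a" or "e", at the end of the string
ends_ae <- str_subset(states, "[ae]$")

# Equivalent pattern written with alternation instead of a character class
ends_ae_alt <- str_subset(states, "(a|e)$")

print(ends_ae)
print(identical(ends_ae, ends_ae_alt))  # TRUE
```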

Question 5

str_view_all(data, ".*\\s.*") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ Arizona
##  [4] │ Arkansas
##  [5] │ California
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ Delaware
##  [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ <New Hampshire>
## [30] │ <New Jersey>
## [31] │ <New Mexico>
## [32] │ <New York>
## [33] │ <North Carolina>
## [34] │ <North Dakota>
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvania
## [39] │ <Rhode Island>
## [40] │ <South Carolina>
## [41] │ <South Dakota>
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ <West Virginia>
## [49] │ Wisconsin
## [50] │ Wyoming

By using the whitespace character class “\\s” (written with a double backslash because the backslash must itself be escaped inside an R string), we were able to identify all states whose names consist of two words. The result was 10 different states: New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Rhode Island, South Carolina, South Dakota and West Virginia.
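The number of two-word names can be checked directly rather than counted by eye. The sketch below assumes stringr is loaded; `states` and `has_space` are illustrative names. Note that "\\s" on its own is enough for detection; the surrounding ".*" in the pattern above only widens the highlighted match to the whole name.

```r
library(stringr)

states <- state.name

# TRUE for every name that contains at least one whitespace character
has_space <- str_detect(states, "\\s")

print(sum(has_space))     # 10
print(states[has_space])
```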

Question 6

str_view_all(data, "\\b\\w*[ntwcNCTW]+\\w*\\b") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ <Arizona>
##  [4] │ <Arkansas>
##  [5] │ <California>
##  [6] │ <Colorado>
##  [7] │ <Connecticut>
##  [8] │ <Delaware>
##  [9] │ Florida
## [10] │ Georgia
## [11] │ <Hawaii>
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ <Indiana>
## [15] │ <Iowa>
## [16] │ <Kansas>
## [17] │ <Kentucky>
## [18] │ <Louisiana>
## [19] │ <Maine>
## [20] │ <Maryland>
## [21] │ <Massachusetts>
## [22] │ <Michigan>
## [23] │ <Minnesota>
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ <Montana>
## [27] │ <Nebraska>
## [28] │ <Nevada>
## [29] │ <New> Hampshire
## [30] │ <New> Jersey
## [31] │ <New> <Mexico>
## [32] │ <New> York
## [33] │ <North> <Carolina>
## [34] │ <North> <Dakota>
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ <Pennsylvania>
## [39] │ Rhode <Island>
## [40] │ <South> <Carolina>
## [41] │ <South> <Dakota>
## [42] │ <Tennessee>
## [43] │ <Texas>
## [44] │ <Utah>
## [45] │ <Vermont>
## [46] │ <Virginia>
## [47] │ <Washington>
## [48] │ <West> <Virginia>
## [49] │ <Wisconsin>
## [50] │ <Wyoming>

I asked an AI tool for help on this question, and it initially directed me to a function that we have not yet covered in class. I then instructed it to use only simple meta-characters, and from that I learned how to use word boundaries (“\\b”) and word characters (“\\w”) to complete this question.
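To see what the word boundaries contribute, the same pattern can be run on a few names with str_extract_all(), which returns the matched words instead of highlighting them. This is a sketch for illustration; `pattern`, `examples`, and `words_found` are names introduced here.

```r
library(stringr)

# Same pattern as in the answer: a whole word (bounded by \\b on both sides)
# that contains at least one of n, t, w, or c in either case
pattern <- "\\b\\w*[ntwcNCTW]+\\w*\\b"

examples <- c("Rhode Island", "New Mexico", "Ohio")

# One list element per input string, holding every matching word
words_found <- str_extract_all(examples, pattern)

print(words_found[[1]])  # "Island"
print(words_found[[2]])  # "New" "Mexico"
print(words_found[[3]])  # character(0)
```

Because each match must start and end at a word boundary, “Rhode” is skipped while “Island” is matched in full.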

Question 7

str_view_all(data, "[ciCI]") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ Ar<i>zona
##  [4] │ Arkansas
##  [5] │ <C>al<i>forn<i>a
##  [6] │ <C>olorado
##  [7] │ <C>onne<c>t<i><c>ut
##  [8] │ Delaware
##  [9] │ Flor<i>da
## [10] │ Georg<i>a
## [11] │ Hawa<i><i>
## [12] │ <I>daho
## [13] │ <I>ll<i>no<i>s
## [14] │ <I>nd<i>ana
## [15] │ <I>owa
## [16] │ Kansas
## [17] │ Kentu<c>ky
## [18] │ Lou<i>s<i>ana
## [19] │ Ma<i>ne
## [20] │ Maryland
## [21] │ Massa<c>husetts
## [22] │ M<i><c>h<i>gan
## [23] │ M<i>nnesota
## [24] │ M<i>ss<i>ss<i>pp<i>
## [25] │ M<i>ssour<i>
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampsh<i>re
## [30] │ New Jersey
## [31] │ New Mex<i><c>o
## [32] │ New York
## [33] │ North <C>arol<i>na
## [34] │ North Dakota
## [35] │ Oh<i>o
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvan<i>a
## [39] │ Rhode <I>sland
## [40] │ South <C>arol<i>na
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ V<i>rg<i>n<i>a
## [47] │ Wash<i>ngton
## [48] │ West V<i>rg<i>n<i>a
## [49] │ W<i>s<c>ons<i>n
## [50] │ Wyom<i>ng

With the above code, 31 states were identified as having “c/C” or “i/I” in their names.
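The count of 31 can be reproduced with str_detect(), which returns one TRUE or FALSE per name. A minimal sketch, assuming stringr is loaded; the variable names are illustrative, and the second pattern uses regex(ignore_case = TRUE) as an alternative to listing both cases in the brackets.

```r
library(stringr)

states <- state.name

# Both cases listed explicitly, as in the answer above
n_with_c_or_i <- sum(str_detect(states, "[ciCI]"))

# Same test with a case-insensitive pattern
n_ignore_case <- sum(str_detect(states, regex("[ci]", ignore_case = TRUE)))

print(n_with_c_or_i)  # 31
print(n_ignore_case)  # 31
```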

Question 8

str_view_all(data, "^......$") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ <Alaska>
##  [3] │ Arizona
##  [4] │ Arkansas
##  [5] │ California
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ Delaware
##  [9] │ Florida
## [10] │ Georgia
## [11] │ <Hawaii>
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ <Kansas>
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ <Nevada>
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming

Each “.” matches exactly one character, so six of them placed between the “^” and “$” anchors match only the state names that are exactly six characters long: Alaska, Hawaii, Kansas, Nevada, and Oregon.
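Writing out six dots works, but the same condition can be expressed more compactly. The sketch below compares three equivalent approaches; it assumes stringr is loaded, and `six_dots`, `six_quant`, and `six_len` are illustrative names.

```r
library(stringr)

states <- state.name

# Six explicit wild-cards between the anchors, as in the answer above
six_dots <- str_subset(states, "^......$")

# The same thing with a {6} quantifier on a single wild-card
six_quant <- str_subset(states, "^.{6}$")

# No regular expression at all: filter on the string length
six_len <- states[str_length(states) == 6]

print(six_dots)  # "Alaska" "Hawaii" "Kansas" "Nevada" "Oregon"
```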

Question 9

str_view_all(data, ".{6,}") %>% 
  print(n=50)

##  [1] │ <Alabama>
##  [2] │ <Alaska>
##  [3] │ <Arizona>
##  [4] │ <Arkansas>
##  [5] │ <California>
##  [6] │ <Colorado>
##  [7] │ <Connecticut>
##  [8] │ <Delaware>
##  [9] │ <Florida>
## [10] │ <Georgia>
## [11] │ <Hawaii>
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ <Indiana>
## [15] │ Iowa
## [16] │ <Kansas>
## [17] │ <Kentucky>
## [18] │ <Louisiana>
## [19] │ Maine
## [20] │ <Maryland>
## [21] │ <Massachusetts>
## [22] │ <Michigan>
## [23] │ <Minnesota>
## [24] │ <Mississippi>
## [25] │ <Missouri>
## [26] │ <Montana>
## [27] │ <Nebraska>
## [28] │ <Nevada>
## [29] │ <New Hampshire>
## [30] │ <New Jersey>
## [31] │ <New Mexico>
## [32] │ <New York>
## [33] │ <North Carolina>
## [34] │ <North Dakota>
## [35] │ Ohio
## [36] │ <Oklahoma>
## [37] │ <Oregon>
## [38] │ <Pennsylvania>
## [39] │ <Rhode Island>
## [40] │ <South Carolina>
## [41] │ <South Dakota>
## [42] │ <Tennessee>
## [43] │ Texas
## [44] │ Utah
## [45] │ <Vermont>
## [46] │ <Virginia>
## [47] │ <Washington>
## [48] │ <West Virginia>
## [49] │ <Wisconsin>
## [50] │ <Wyoming>

By placing the brace quantifier “{6,}” after the wild-card “.”, we were able to match state names that contain six or more characters. Note that “.” also matches a space, so the space in a two-word name counts toward the total.
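As a cross-check, the regular expression can be compared with a plain length filter, and the excluded names listed. A sketch under the assumption that stringr is loaded; `at_least_six`, `by_length`, and `too_short` are names made up for this example.

```r
library(stringr)

states <- state.name

# Names with six or more characters, via the quantifier
at_least_six <- str_subset(states, ".{6,}")

# Same filter using str_length() instead of a pattern
by_length <- states[str_length(states) >= 6]

# The names that were left out
too_short <- setdiff(states, at_least_six)

print(length(at_least_six))  # 44
print(too_short)             # "Idaho" "Iowa" "Maine" "Ohio" "Texas" "Utah"
```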

Question 10

str_view_all(data, "^[BCDFGHJKLMNPQRSTVWXYZ][bcdfghjklmnpqrstvwxyz]") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ Arizona
##  [4] │ Arkansas
##  [5] │ California
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ Delaware
##  [9] │ <Fl>orida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ Illinois
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvania
## [39] │ <Rh>ode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ <Wy>oming

This time, the “^” anchor was used to require the match to begin at the start of the string. All the consonants were then written out in two bracketed sets: the first in upper-case, because every state name begins with a capital letter, and the second in lower-case for the second character. Note that “y” is treated as a consonant here. Florida, Rhode Island, and Wyoming were the only states identified.
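Typing out every consonant by hand is error-prone, so the two bracketed sets can also be built from R's built-in `letters` vector. This is only a sketch of one possible approach; `vowels`, `lower_cons`, `upper_cons`, and `pattern` are illustrative names, and “y” is kept as a consonant to match the pattern above.

```r
library(stringr)

states <- state.name

vowels <- c("a", "e", "i", "o", "u")

# All lower-case letters except the vowels, pasted into a single string
lower_cons <- paste(setdiff(letters, vowels), collapse = "")
upper_cons <- str_to_upper(lower_cons)

# "^[BCD...Z][bcd...z]": an upper-case consonant, then a lower-case consonant
pattern <- paste0("^[", upper_cons, "][", lower_cons, "]")

print(str_subset(states, pattern))  # "Florida" "Rhode Island" "Wyoming"
```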

Question 11

str_view_all(data, "[aeiouAEIOU][aeiouAEIOU]+") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ Arizona
##  [4] │ Arkansas
##  [5] │ Californ<ia>
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ Delaware
##  [9] │ Florida
## [10] │ G<eo>rg<ia>
## [11] │ Haw<aii>
## [12] │ Idaho
## [13] │ Illin<oi>s
## [14] │ Ind<ia>na
## [15] │ <Io>wa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ L<oui>s<ia>na
## [19] │ M<ai>ne
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Miss<ou>ri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Oh<io>
## [36] │ Oklahoma
## [37] │ Oregon
## [38] │ Pennsylvan<ia>
## [39] │ Rhode Island
## [40] │ S<ou>th Carolina
## [41] │ S<ou>th Dakota
## [42] │ Tenness<ee>
## [43] │ Texas
## [44] │ Utah
## [45] │ Vermont
## [46] │ Virgin<ia>
## [47] │ Washington
## [48] │ West Virgin<ia>
## [49] │ Wisconsin
## [50] │ Wyoming

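Similar to the previous question, two bracketed sets containing the characters I wanted were specified, this time the vowels in both cases. The repetition meta-character “+” was then applied to the second set so that it matches one or more times. Together with the first set, the pattern therefore matches any run of two or more consecutive vowels, as the assignment requested.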
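The same “two or more vowels in a row” condition can be written with a single set and a {2,} quantifier. The sketch below checks that both forms select the same names; it assumes stringr is loaded, and `two_classes` and `quantified` are illustrative names.

```r
library(stringr)

states <- state.name

# One vowel followed by one or more vowels, as in the answer above
two_classes <- str_subset(states, "[aeiouAEIOU][aeiouAEIOU]+")

# A single set repeated at least twice, ignoring case
quantified <- str_subset(states, regex("[aeiou]{2,}", ignore_case = TRUE))

print(identical(two_classes, quantified))  # TRUE
print(two_classes)
```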

Question 12

str_view_all(data, "^[AEIOU].*[bcdfghjklmnpqrstvwxyz]$") %>% 
  print(n=50)

##  [1] │ Alabama
##  [2] │ Alaska
##  [3] │ Arizona
##  [4] │ <Arkansas>
##  [5] │ California
##  [6] │ Colorado
##  [7] │ Connecticut
##  [8] │ Delaware
##  [9] │ Florida
## [10] │ Georgia
## [11] │ Hawaii
## [12] │ Idaho
## [13] │ <Illinois>
## [14] │ Indiana
## [15] │ Iowa
## [16] │ Kansas
## [17] │ Kentucky
## [18] │ Louisiana
## [19] │ Maine
## [20] │ Maryland
## [21] │ Massachusetts
## [22] │ Michigan
## [23] │ Minnesota
## [24] │ Mississippi
## [25] │ Missouri
## [26] │ Montana
## [27] │ Nebraska
## [28] │ Nevada
## [29] │ New Hampshire
## [30] │ New Jersey
## [31] │ New Mexico
## [32] │ New York
## [33] │ North Carolina
## [34] │ North Dakota
## [35] │ Ohio
## [36] │ Oklahoma
## [37] │ <Oregon>
## [38] │ Pennsylvania
## [39] │ Rhode Island
## [40] │ South Carolina
## [41] │ South Dakota
## [42] │ Tennessee
## [43] │ Texas
## [44] │ <Utah>
## [45] │ Vermont
## [46] │ Virginia
## [47] │ Washington
## [48] │ West Virginia
## [49] │ Wisconsin
## [50] │ Wyoming

Arkansas, Illinois, Oregon and Utah were identified as the only 4 states whose names began with a vowel and ended with a consonant.
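The combined pattern can also be split into two simpler tests joined with `&`, which makes each condition easier to read. A minimal sketch, assuming stringr is loaded; `starts_vowel`, `ends_consonant`, and `both` are names introduced for this example.

```r
library(stringr)

states <- state.name

# Condition 1: the name starts with an upper-case vowel
starts_vowel <- str_detect(states, "^[AEIOU]")

# Condition 2: the name ends with a lower-case consonant ("y" included)
ends_consonant <- str_detect(states, "[bcdfghjklmnpqrstvwxyz]$")

# Keep only the names that satisfy both conditions
both <- states[starts_vowel & ends_consonant]

print(both)  # "Arkansas" "Illinois" "Oregon" "Utah"
```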

Assignment10 Rmarkdown

ChanKim

2023-11-15