For each of the following regular expressions, explain in words what it matches. Then add test strings that demonstrate it does in fact match the pattern you claim. Make sure your set of test strings includes several examples that match as well as several that do not.
## string result
## 1 cat TRUE
## 2 dog FALSE
## 3 flagstaff TRUE
## 4 mood FALSE
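The pattern for this first table is not shown in this copy. One pattern that reproduces every row is `'a'`, which matches any string containing a lowercase `a` anywhere. The check below is a minimal base-R sketch under that assumption. It uses `grepl()` in place of `str_detect()`, and the two agree on non-missing strings for simple patterns like this one.

```r
# Assumed pattern: 'a' (a guess that reproduces the table, not the original).
strings <- c('cat', 'dog', 'flagstaff', 'mood')
result <- grepl('a', strings, perl = TRUE)
data.frame(string = strings, result = result)
```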
## string result
## 1 cab TRUE
## 2 comfy FALSE
## 3 abstract TRUE
## 4 bad FALSE
## 5 amber FALSE
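The second table is consistent with the pattern `'ab'`. This is an assumption, because the pattern itself is missing. It matches only where an `a` is immediately followed by a `b`. That is why `bad` and `amber` fail even though they contain both letters.

```r
# Assumed pattern: 'ab' (reconstructed from the results, not the original).
strings <- c('cab', 'comfy', 'abstract', 'bad', 'amber')
result <- grepl('ab', strings, perl = TRUE)
data.frame(string = strings, result = result)
```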
## string result
## 1 cab TRUE
## 2 comfy FALSE
## 3 bad TRUE
## 4 camper TRUE
## 5 amber TRUE
## 6 bud TRUE
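The third table fits the character class `'[ab]'`, assumed here because the original pattern is not shown. A character class matches any single character listed inside the brackets. So a string matches if it contains at least one `a` or at least one `b`, and only `comfy` has neither.

```r
# Assumed pattern: '[ab]' (one character that is either 'a' or 'b').
strings <- c('cab', 'comfy', 'bad', 'camper', 'amber', 'bud')
result <- grepl('[ab]', strings, perl = TRUE)
data.frame(string = strings, result = result)
```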
## string result
## 1 cab FALSE
## 2 comfy FALSE
## 3 bad TRUE
## 4 camper FALSE
## 5 bud TRUE
## 6 milk FALSE
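The fourth table fits `'^[ab]'`, which is again an assumption. The anchor `^` ties the character class to the first character. As a result, only strings that *begin* with `a` or `b` match, and `cab` and `camper` now fail.

```r
# Assumed pattern: '^[ab]' (first character must be 'a' or 'b').
strings <- c('cab', 'comfy', 'bad', 'camper', 'bud', 'milk')
result <- grepl('^[ab]', strings, perl = TRUE)
data.frame(string = strings, result = result)
```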
## string result
## 1 Blue Flora15 FALSE
## 2 66 art TRUE
## 3 99 Art TRUE
## 4 22 apen15 FALSE
## 5 2 abc TRUE
## 6 Albuquerque FALSE
## 7 camper FALSE
## 8 food FALSE
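The fifth table is consistent with `'\\d+\\s[aA]'`, assumed because the pattern itself is missing. It matches one or more digits, then exactly one whitespace character, then an `a` or `A`. The check below uses the table's strings except `22 apen15`, whose spacing cannot be read reliably from the collapsed printout.

```r
# Assumed pattern: '\\d+\\s[aA]' (digits, one whitespace char, then a/A).
strings <- c('Blue Flora15', '66 art', '99 Art', '2 abc',
             'Albuquerque', 'camper', 'food')
result <- grepl('\\d+\\s[aA]', strings, perl = TRUE)
data.frame(string = strings, result = result)
```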
## string result
## 1 11 a TRUE
## 2 11abcd TRUE
## 3 4Abc TRUE
## 4 423 Abc TRUE
## 5 pen15 FALSE
## 6 Albuquerque FALSE
## 7 11 afjkdla TRUE
## 8 food FALSE
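The sixth table fits `'\\d+\\s*[aA]'`, which is an assumption. Changing `\\s` to `\\s*` makes the whitespace optional and repeatable: zero or more whitespace characters may sit between the digits and the `a`/`A`. That change is what lets `11abcd` and `4Abc` match.

```r
# Assumed pattern: '\\d+\\s*[aA]' (digits, optional whitespace, then a/A).
strings <- c('11 a', '11abcd', '4Abc', '423 Abc',
             'pen15', 'Albuquerque', '11 afjkdla', 'food')
result <- grepl('\\d+\\s*[aA]', strings, perl = TRUE)
data.frame(string = strings, result = result)
```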
## string result
## 1 aaaa TRUE
## 2 bad TRUE
## 3 TRUE
## 4 TRUE
## 5 <NA> NA
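The seventh table fits `'.*'`, assumed because the pattern is missing. It means "zero or more of any character," so it matches every non-missing string, including the empty string, while a missing value gives `NA`. The two blank rows are assumed to be `''` and `' '`. Base R's `grepl()` returns `FALSE` for an `NA` input, unlike `str_detect()`, so this sketch puts the `NA` back explicitly.

```r
# Assumed pattern: '.*'; the blank strings '' and ' ' are also assumptions.
strings <- c('aaaa', 'bad', '', ' ', NA)
# grepl() would report FALSE for NA, so restore NA to mirror str_detect().
result <- ifelse(is.na(strings), NA, grepl('.*', strings, perl = TRUE))
data.frame(string = strings, result = result)
```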
## string result
## 1 aabar TRUE
## 2 11barkerfluff TRUE
## 3 abar FALSE
## 4 1baar FALSE
## 5 2abar TRUE
## 6 $$bar FALSE
## 7 $aabar FALSE
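The eighth table fits `'^\\w{2}bar'`, which is an assumption. It requires exactly two word characters (letters, digits, or underscore) at the very start of the string, immediately followed by `bar`. For example, `abar` fails because its first two characters are `ab` and they are followed by `ar`, not `bar`.

```r
# Assumed pattern: '^\\w{2}bar' (two word chars at the start, then 'bar').
strings <- c('aabar', '11barkerfluff', 'abar', '1baar',
             '2abar', '$$bar', '$aabar')
result <- grepl('^\\w{2}bar', strings, perl = TRUE)
data.frame(string = strings, result = result)
```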
## string result
## 1 foo.bar TRUE
## 2 foo1bar FALSE
## 3 foodbar FALSE
## 4 44barry TRUE
## 5 twenty FALSE
## 6 abbar TRUE
## 7 1abar TRUE
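The ninth table fits `'(foo\\.bar)|(^\\w{2}bar)'`, assumed because the pattern is missing. The `|` lets either alternative match:

- the literal text `foo.bar` anywhere in the string, where the escaped `\\.` matches only a period, which is why `foo1bar` and `foodbar` fail; or
- two word characters at the start of the string followed by `bar`.

```r
# Assumed pattern: literal 'foo.bar' anywhere, OR two word chars then 'bar'.
strings <- c('foo.bar', 'foo1bar', 'foodbar', '44barry',
             'twenty', 'abbar', '1abar')
result <- grepl('(foo\\.bar)|(^\\w{2}bar)', strings, perl = TRUE)
data.frame(string = strings, result = result)
```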
The following file names were used in a camera trap study. In each name, the S number identifies the site, P the plot within the site, and C the camera within the plot. The first string of digits is the date (YearMonthDay) and the second is the time (HourMinuteSecond).
Produce a data frame with columns corresponding to the site, plot, camera, year, month, day, hour, minute, and second for these three file names.
## site plot camera year month day hour minute second
## 1 S123 P2 C10 2012 06 21 21 34 22
## 2 S10 P1 C1 2012 06 22 05 01 48
## 3 S187 P2 C2 2012 07 02 02 35 01
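The file names themselves are not listed in this copy. The sketch below rebuilds them from the table under an assumed layout, `S123.P2.C10_20120621_213422.jpg`: periods between site, plot, and camera, underscores before the date and the time, and a `.jpg` extension. It then extracts every field with capture groups, using base R's `regexec()` and `regmatches()`. If the real separators differ, only `pattern` needs to change.

```r
# Hypothetical file names, rebuilt from the table above; the '.', '_' and
# '.jpg' parts of the layout are assumptions.
file.names <- c('S123.P2.C10_20120621_213422.jpg',
                'S10.P1.C1_20120622_050148.jpg',
                'S187.P2.C2_20120702_023501.jpg')

# One capture group per output column: site, plot, camera, then the date
# split into 4+2+2 digits and the time split into 2+2+2 digits.
pattern <- paste0('^(S\\d+)\\.(P\\d+)\\.(C\\d+)_',
                  '(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})\\.jpg$')
parts <- regmatches(file.names, regexec(pattern, file.names, perl = TRUE))

# Each element holds the full match followed by the 9 groups; drop the
# full match and stack the groups into a 3 x 9 character matrix.
fields <- do.call(rbind, lapply(parts, `[`, -1))
colnames(fields) <- c('site', 'plot', 'camera', 'year', 'month', 'day',
                      'hour', 'minute', 'second')
camera <- as.data.frame(fields, stringsAsFactors = FALSE)
camera
```

With `stringr` loaded, `str_match(file.names, pattern)[, -1]` should yield the same matrix.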
The full text from Lincoln’s Gettysburg Address is given below. Calculate the mean word length. Note: consider ‘battle-field’ as one word with 11 letters.
| mean.wordlength |
|---|
| 4.239852 |
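The address text is not reproduced here, so this sketch runs on a single sentence from it. It is a minimal approach, not necessarily how the value above was computed. It assumes that words are separated by whitespace and that only letters count toward a word's length. Under those assumptions `battle-field` counts as 11 and trailing punctuation is ignored. `mean_word_length()` is a hypothetical helper name.

```r
# Hypothetical helper: mean number of letters per word in a block of text.
mean_word_length <- function(text) {
  # Split on runs of whitespace to get candidate words.
  words <- strsplit(trimws(text), '[[:space:]]+')[[1]]
  # Count letters only, so 'battle-field' -> 11 and 'war.' -> 3.
  letter.counts <- nchar(gsub('[^A-Za-z]', '', words))
  # Drop tokens with no letters at all (e.g. a free-standing '--').
  mean(letter.counts[letter.counts > 0])
}

sentence <- 'We are met on a great battle-field of that war.'
mean_word_length(sentence)  # returns 3.6 (36 letters / 10 words)
```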
Variable names in R may be any combination of letters, digits, periods, and underscores. However, they may not start with a digit, and if they start with a period, that period must not be followed by a digit.
Under these rules, `foo15`, `Bar`, `.resid`, and `_14s` are valid variable names, while `99_Bottles`, `.9Arggh`, `Foo!`, and `HIV Rate` are not.
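library(dplyr)
library(stringr)
strings <- c('foo15', 'Bar', '.resid', '_14s',
             '99_Bottles', '.9Arggh', 'Foo!', 'HIV Rate', 'abc_def')
data.frame( string=strings ) %>%
  mutate( result = str_detect(string, '^[a-zA-Z_](\\w|\\.|_)*$' ))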
## string result
## 1 foo15 TRUE
## 2 Bar TRUE
## 3 .resid FALSE
## 4 _14s TRUE
## 5 99_Bottles FALSE
## 6 .9Arggh FALSE
## 7 Foo! FALSE
## 8 HIV Rate FALSE
## 9 abc_def TRUE
# Also allow a leading period; after the first character, any mix of
# letters, digits, periods, and underscores is accepted
data.frame( string=strings ) %>%
  mutate( result = str_detect(string, '^[a-zA-Z_\\.](\\w|\\.|_)*$' ))
## string result
## 1 foo15 TRUE
## 2 Bar TRUE
## 3 .resid TRUE
## 4 _14s TRUE
## 5 99_Bottles FALSE
## 6 .9Arggh TRUE
## 7 Foo! FALSE
## 8 HIV Rate FALSE
## 9 abc_def TRUE
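Neither attempt satisfies both rules. The first rejects `.resid`, and the second accepts `.9Arggh`. Handling the leading period as its own branch fixes both problems. The sketch below assumes base R's PCRE engine (`perl = TRUE`), which supports the negative lookahead `(?!\\d)`. The same pattern should also work in `str_detect()`, because ICU supports lookahead as well.

```r
strings <- c('foo15', 'Bar', '.resid', '_14s',
             '99_Bottles', '.9Arggh', 'Foo!', 'HIV Rate', 'abc_def')

# First character: a letter or underscore, OR a period that is NOT
# followed by a digit (negative lookahead). After that, any mix of word
# characters (letters, digits, underscore) and periods to the end.
pattern <- '^([a-zA-Z_]|\\.(?!\\d))[\\w.]*$'
result <- grepl(pattern, strings, perl = TRUE)
data.frame(string = strings, result = result)
```

Because `\\w` already includes the underscore, `[\\w.]` covers letters, digits, underscores, and periods in one character class.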