This assignment works through examples of data normalization and character (string) manipulation in R.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(babynames)
Let’s create normalized dataframes for gym data (this data is synthetic):
# gym trainers and which class they teach
trainer_classes <- data.frame(
TrainerID = c("T01", "T02", "T03", "T03", "T03", "T04", "T04", "T05", "T06", "T07", "T08", "T08"),
ClassID = c("C03", "C01", "C01", "C02", "C03", "C03", "C04", "C05", "C04", "C02", "C03", "C05")
)
trainer_classes
## TrainerID ClassID
## 1 T01 C03
## 2 T02 C01
## 3 T03 C01
## 4 T03 C02
## 5 T03 C03
## 6 T04 C03
## 7 T04 C04
## 8 T05 C05
## 9 T06 C04
## 10 T07 C02
## 11 T08 C03
## 12 T08 C05
trainers_locations <- data.frame(
TrainerID = c("T01", "T02", "T03", "T04", "T05", "T06", "T07", "T08"),
Name = c("Steve", "Sara", "Bill", "Bill", "Bill", "Rob", "Rob", "Tina"),
LocationID = c("10", "10", "10", "11", "12", "13", "11", "12")
)
trainers_locations
## TrainerID Name LocationID
## 1 T01 Steve 10
## 2 T02 Sara 10
## 3 T03 Bill 10
## 4 T04 Bill 11
## 5 T05 Bill 12
## 6 T06 Rob 13
## 7 T07 Rob 11
## 8 T08 Tina 12
classes <- data.frame(
ClassID = c("C01", "C02", "C03", "C04", "C05"),
Class = c("Pilates", "Yoga", "Weights", "Cycling", "Treadmill")
)
classes
## ClassID Class
## 1 C01 Pilates
## 2 C02 Yoga
## 3 C03 Weights
## 4 C04 Cycling
## 5 C05 Treadmill
locations <- data.frame(
LocationID = c("10", "11", "12", "13"),
Location = c("Tribeca", "Williamsburg", "UES", "FiDi")
)
locations
## LocationID Location
## 1 10 Tribeca
## 2 11 Williamsburg
## 3 12 UES
## 4 13 FiDi
These dataframes are:
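Splitting the data into separate dataframes is important here because trainers can teach multiple classes at multiple locations. In a single table, the same trainer, class, and location details would have to be repeated across many rows.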
Giving each table an ID column plus only the attributes that describe that ID is beneficial because every attribute then depends only on the table's primary key (the ID; in trainer_classes, the key is the TrainerID and ClassID pair). Real-world datasets can be very complicated, with many tables, so it helps to be able to go back to a single table when one part of the data needs to be examined. Overall, this makes the data easier to understand.
Additionally, normalized tables protect the data from insertion, update, and deletion anomalies. In a single combined table, changing a value means updating every row where it is repeated; missing any of those rows leaves the data inaccurate or contradictory (update anomaly). Because normalized tables store each fact only once, they greatly reduce repeated data and, with it, the chance of the data contradicting itself. Deleting a row from a combined table can also permanently remove critical information that was stored only in that row (deletion anomaly). For example, T06 is the only trainer at FiDi, so in a combined table, deleting T06’s Cycling row would also erase the only record of the FiDi location. Finally, critical data may be impossible to add because it doesn’t fit every column of the combined table (insertion anomaly). For example, if a new trainer is hired at a gym location but hasn’t been assigned a class yet, the new trainer’s information still needs to be added to the dataset. A separate table dedicated to trainers makes this possible; in a single table that contained everything, the new trainer would have no class to fill in.
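To make the insertion-anomaly example concrete, here is a minimal sketch in R using the trainer tables defined above. The new trainer (TrainerID "T09", name "Alex", LocationID "13") is hypothetical and not part of the original data. dplyr's bind_rows() adds the row to the trainer table only, and anti_join() lists trainers that have no class assignment.

```r
library(dplyr)

# Trainer table and trainer-to-class assignments, copied from above
trainers_locations <- data.frame(
  TrainerID  = c("T01", "T02", "T03", "T04", "T05", "T06", "T07", "T08"),
  Name       = c("Steve", "Sara", "Bill", "Bill", "Bill", "Rob", "Rob", "Tina"),
  LocationID = c("10", "10", "10", "11", "12", "13", "11", "12")
)
trainer_classes <- data.frame(
  TrainerID = c("T01", "T02", "T03", "T03", "T03", "T04", "T04", "T05", "T06", "T07", "T08", "T08"),
  ClassID   = c("C03", "C01", "C01", "C02", "C03", "C03", "C04", "C05", "C04", "C02", "C03", "C05")
)

# Hypothetical new hire with no class assigned yet.
# Only the trainer table changes; no placeholder ClassID is needed.
new_trainer <- data.frame(TrainerID = "T09", Name = "Alex", LocationID = "13")
trainers_locations <- bind_rows(trainers_locations, new_trainer)

# anti_join() keeps trainers that have no matching row in trainer_classes
unassigned <- anti_join(trainers_locations, trainer_classes, by = "TrainerID")
unassigned
```

Once Alex is given a class, a single row is added to trainer_classes and nothing else needs to change.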
With normalized tables, you can easily navigate the data and build upon it.
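As an illustration of building on the normalized tables, the sketch below (assuming dplyr, which the tidyverse loads above) joins the four dataframes back into one combined view on their shared ID columns. The name schedule is a hypothetical choice; the combined view is derived only for reading, while the separate tables remain the place where data is edited.

```r
library(dplyr)

# The four normalized tables from above
trainer_classes <- data.frame(
  TrainerID = c("T01", "T02", "T03", "T03", "T03", "T04", "T04", "T05", "T06", "T07", "T08", "T08"),
  ClassID   = c("C03", "C01", "C01", "C02", "C03", "C03", "C04", "C05", "C04", "C02", "C03", "C05")
)
trainers_locations <- data.frame(
  TrainerID  = c("T01", "T02", "T03", "T04", "T05", "T06", "T07", "T08"),
  Name       = c("Steve", "Sara", "Bill", "Bill", "Bill", "Rob", "Rob", "Tina"),
  LocationID = c("10", "10", "10", "11", "12", "13", "11", "12")
)
classes <- data.frame(
  ClassID = c("C01", "C02", "C03", "C04", "C05"),
  Class   = c("Pilates", "Yoga", "Weights", "Cycling", "Treadmill")
)
locations <- data.frame(
  LocationID = c("10", "11", "12", "13"),
  Location   = c("Tribeca", "Williamsburg", "UES", "FiDi")
)

# Start from the trainer-class pairs and look up each ID in its own table
schedule <- trainer_classes %>%
  left_join(trainers_locations, by = "TrainerID") %>%
  left_join(classes, by = "ClassID") %>%
  left_join(locations, by = "LocationID")

schedule
```

Each fact (a class name, a location name) is still stored once; the joins simply repeat it in the combined view wherever it is needed.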
Load the data:
majors_list_df <- read.csv(url("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/college-majors/majors-list.csv"))
majors_list_df
## FOD1P Major
## 1 1100 GENERAL AGRICULTURE
## 2 1101 AGRICULTURE PRODUCTION AND MANAGEMENT
## 3 1102 AGRICULTURAL ECONOMICS
## 4 1103 ANIMAL SCIENCES
## 5 1104 FOOD SCIENCE
## 6 1105 PLANT SCIENCE AND AGRONOMY
## 7 1106 SOIL SCIENCE
## 8 1199 MISCELLANEOUS AGRICULTURE
## 9 1302 FORESTRY
## 10 1303 NATURAL RESOURCES MANAGEMENT
## 11 6000 FINE ARTS
## 12 6001 DRAMA AND THEATER ARTS
## 13 6002 MUSIC
## 14 6003 VISUAL AND PERFORMING ARTS
## 15 6004 COMMERCIAL ART AND GRAPHIC DESIGN
## 16 6005 FILM VIDEO AND PHOTOGRAPHIC ARTS
## 17 6007 STUDIO ARTS
## 18 6099 MISCELLANEOUS FINE ARTS
## 19 1301 ENVIRONMENTAL SCIENCE
## 20 3600 BIOLOGY
## 21 3601 BIOCHEMICAL SCIENCES
## 22 3602 BOTANY
## 23 3603 MOLECULAR BIOLOGY
## 24 3604 ECOLOGY
## 25 3605 GENETICS
## 26 3606 MICROBIOLOGY
## 27 3607 PHARMACOLOGY
## 28 3608 PHYSIOLOGY
## 29 3609 ZOOLOGY
## 30 3611 NEUROSCIENCE
## 31 3699 MISCELLANEOUS BIOLOGY
## 32 4006 COGNITIVE SCIENCE AND BIOPSYCHOLOGY
## 33 6200 GENERAL BUSINESS
## 34 6201 ACCOUNTING
## 35 6202 ACTUARIAL SCIENCE
## 36 6203 BUSINESS MANAGEMENT AND ADMINISTRATION
## 37 6204 OPERATIONS LOGISTICS AND E-COMMERCE
## 38 6205 BUSINESS ECONOMICS
## 39 6206 MARKETING AND MARKETING RESEARCH
## 40 6207 FINANCE
## 41 6209 HUMAN RESOURCES AND PERSONNEL MANAGEMENT
## 42 6210 INTERNATIONAL BUSINESS
## 43 6211 HOSPITALITY MANAGEMENT
## 44 6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS
## 45 6299 MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION
## 46 1901 COMMUNICATIONS
## 47 1902 JOURNALISM
## 48 1903 MASS MEDIA
## 49 1904 ADVERTISING AND PUBLIC RELATIONS
## 50 2001 COMMUNICATION TECHNOLOGIES
## 51 2100 COMPUTER AND INFORMATION SYSTEMS
## 52 2101 COMPUTER PROGRAMMING AND DATA PROCESSING
## 53 2102 COMPUTER SCIENCE
## 54 2105 INFORMATION SCIENCES
## 55 2106 COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY
## 56 2107 COMPUTER NETWORKING AND TELECOMMUNICATIONS
## 57 3700 MATHEMATICS
## 58 3701 APPLIED MATHEMATICS
## 59 3702 STATISTICS AND DECISION SCIENCE
## 60 4005 MATHEMATICS AND COMPUTER SCIENCE
## 61 2300 GENERAL EDUCATION
## 62 2301 EDUCATIONAL ADMINISTRATION AND SUPERVISION
## 63 2303 SCHOOL STUDENT COUNSELING
## 64 2304 ELEMENTARY EDUCATION
## 65 2305 MATHEMATICS TEACHER EDUCATION
## 66 2306 PHYSICAL AND HEALTH EDUCATION TEACHING
## 67 2307 EARLY CHILDHOOD EDUCATION
## 68 2308 SCIENCE AND COMPUTER TEACHER EDUCATION
## 69 2309 SECONDARY TEACHER EDUCATION
## 70 2310 SPECIAL NEEDS EDUCATION
## 71 2311 SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION
## 72 2312 TEACHER EDUCATION: MULTIPLE LEVELS
## 73 2313 LANGUAGE AND DRAMA EDUCATION
## 74 2314 ART AND MUSIC EDUCATION
## 75 2399 MISCELLANEOUS EDUCATION
## 76 3501 LIBRARY SCIENCE
## 77 1401 ARCHITECTURE
## 78 2400 GENERAL ENGINEERING
## 79 2401 AEROSPACE ENGINEERING
## 80 2402 BIOLOGICAL ENGINEERING
## 81 2403 ARCHITECTURAL ENGINEERING
## 82 2404 BIOMEDICAL ENGINEERING
## 83 2405 CHEMICAL ENGINEERING
## 84 2406 CIVIL ENGINEERING
## 85 2407 COMPUTER ENGINEERING
## 86 2408 ELECTRICAL ENGINEERING
## 87 2409 ENGINEERING MECHANICS PHYSICS AND SCIENCE
## 88 2410 ENVIRONMENTAL ENGINEERING
## 89 2411 GEOLOGICAL AND GEOPHYSICAL ENGINEERING
## 90 2412 INDUSTRIAL AND MANUFACTURING ENGINEERING
## 91 2413 MATERIALS ENGINEERING AND MATERIALS SCIENCE
## 92 2414 MECHANICAL ENGINEERING
## 93 2415 METALLURGICAL ENGINEERING
## 94 2416 MINING AND MINERAL ENGINEERING
## 95 2417 NAVAL ARCHITECTURE AND MARINE ENGINEERING
## 96 2418 NUCLEAR ENGINEERING
## 97 2419 PETROLEUM ENGINEERING
## 98 2499 MISCELLANEOUS ENGINEERING
## 99 2500 ENGINEERING TECHNOLOGIES
## 100 2501 ENGINEERING AND INDUSTRIAL MANAGEMENT
## 101 2502 ELECTRICAL ENGINEERING TECHNOLOGY
## 102 2503 INDUSTRIAL PRODUCTION TECHNOLOGIES
## 103 2504 MECHANICAL ENGINEERING RELATED TECHNOLOGIES
## 104 2599 MISCELLANEOUS ENGINEERING TECHNOLOGIES
## 105 5008 MATERIALS SCIENCE
## 106 4002 NUTRITION SCIENCES
## 107 6100 GENERAL MEDICAL AND HEALTH SERVICES
## 108 6102 COMMUNICATION DISORDERS SCIENCES AND SERVICES
## 109 6103 HEALTH AND MEDICAL ADMINISTRATIVE SERVICES
## 110 6104 MEDICAL ASSISTING SERVICES
## 111 6105 MEDICAL TECHNOLOGIES TECHNICIANS
## 112 6106 HEALTH AND MEDICAL PREPARATORY PROGRAMS
## 113 6107 NURSING
## 114 6108 PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION
## 115 6109 TREATMENT THERAPY PROFESSIONS
## 116 6110 COMMUNITY AND PUBLIC HEALTH
## 117 6199 MISCELLANEOUS HEALTH MEDICAL PROFESSIONS
## 118 1501 AREA ETHNIC AND CIVILIZATION STUDIES
## 119 2601 LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE
## 120 2602 FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES
## 121 2603 OTHER FOREIGN LANGUAGES
## 122 3301 ENGLISH LANGUAGE AND LITERATURE
## 123 3302 COMPOSITION AND RHETORIC
## 124 3401 LIBERAL ARTS
## 125 3402 HUMANITIES
## 126 4001 INTERCULTURAL AND INTERNATIONAL STUDIES
## 127 4801 PHILOSOPHY AND RELIGIOUS STUDIES
## 128 4901 THEOLOGY AND RELIGIOUS VOCATIONS
## 129 5502 ANTHROPOLOGY AND ARCHEOLOGY
## 130 6006 ART HISTORY AND CRITICISM
## 131 6402 HISTORY
## 132 6403 UNITED STATES HISTORY
## 133 2201 COSMETOLOGY SERVICES AND CULINARY ARTS
## 134 2901 FAMILY AND CONSUMER SCIENCES
## 135 3801 MILITARY TECHNOLOGIES
## 136 4101 PHYSICAL FITNESS PARKS RECREATION AND LEISURE
## 137 5601 CONSTRUCTION SERVICES
## 138 5701 ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION
## 139 5901 TRANSPORTATION SCIENCES AND TECHNOLOGIES
## 140 4000 MULTI/INTERDISCIPLINARY STUDIES
## 141 3201 COURT REPORTING
## 142 3202 PRE-LAW AND LEGAL STUDIES
## 143 5301 CRIMINAL JUSTICE AND FIRE PROTECTION
## 144 5401 PUBLIC ADMINISTRATION
## 145 5402 PUBLIC POLICY
## 146 bbbb N/A (less than bachelor's degree)
## 147 5000 PHYSICAL SCIENCES
## 148 5001 ASTRONOMY AND ASTROPHYSICS
## 149 5002 ATMOSPHERIC SCIENCES AND METEOROLOGY
## 150 5003 CHEMISTRY
## 151 5004 GEOLOGY AND EARTH SCIENCE
## 152 5005 GEOSCIENCES
## 153 5006 OCEANOGRAPHY
## 154 5007 PHYSICS
## 155 5098 MULTI-DISCIPLINARY OR GENERAL SCIENCE
## 156 5102 NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES
## 157 5200 PSYCHOLOGY
## 158 5201 EDUCATIONAL PSYCHOLOGY
## 159 5202 CLINICAL PSYCHOLOGY
## 160 5203 COUNSELING PSYCHOLOGY
## 161 5205 INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY
## 162 5206 SOCIAL PSYCHOLOGY
## 163 5299 MISCELLANEOUS PSYCHOLOGY
## 164 5403 HUMAN SERVICES AND COMMUNITY ORGANIZATION
## 165 5404 SOCIAL WORK
## 166 4007 INTERDISCIPLINARY SOCIAL SCIENCES
## 167 5500 GENERAL SOCIAL SCIENCES
## 168 5501 ECONOMICS
## 169 5503 CRIMINOLOGY
## 170 5504 GEOGRAPHY
## 171 5505 INTERNATIONAL RELATIONS
## 172 5506 POLITICAL SCIENCE AND GOVERNMENT
## 173 5507 SOCIOLOGY
## 174 5599 MISCELLANEOUS SOCIAL SCIENCES
## Major_Category
## 1 Agriculture & Natural Resources
## 2 Agriculture & Natural Resources
## 3 Agriculture & Natural Resources
## 4 Agriculture & Natural Resources
## 5 Agriculture & Natural Resources
## 6 Agriculture & Natural Resources
## 7 Agriculture & Natural Resources
## 8 Agriculture & Natural Resources
## 9 Agriculture & Natural Resources
## 10 Agriculture & Natural Resources
## 11 Arts
## 12 Arts
## 13 Arts
## 14 Arts
## 15 Arts
## 16 Arts
## 17 Arts
## 18 Arts
## 19 Biology & Life Science
## 20 Biology & Life Science
## 21 Biology & Life Science
## 22 Biology & Life Science
## 23 Biology & Life Science
## 24 Biology & Life Science
## 25 Biology & Life Science
## 26 Biology & Life Science
## 27 Biology & Life Science
## 28 Biology & Life Science
## 29 Biology & Life Science
## 30 Biology & Life Science
## 31 Biology & Life Science
## 32 Biology & Life Science
## 33 Business
## 34 Business
## 35 Business
## 36 Business
## 37 Business
## 38 Business
## 39 Business
## 40 Business
## 41 Business
## 42 Business
## 43 Business
## 44 Business
## 45 Business
## 46 Communications & Journalism
## 47 Communications & Journalism
## 48 Communications & Journalism
## 49 Communications & Journalism
## 50 Computers & Mathematics
## 51 Computers & Mathematics
## 52 Computers & Mathematics
## 53 Computers & Mathematics
## 54 Computers & Mathematics
## 55 Computers & Mathematics
## 56 Computers & Mathematics
## 57 Computers & Mathematics
## 58 Computers & Mathematics
## 59 Computers & Mathematics
## 60 Computers & Mathematics
## 61 Education
## 62 Education
## 63 Education
## 64 Education
## 65 Education
## 66 Education
## 67 Education
## 68 Education
## 69 Education
## 70 Education
## 71 Education
## 72 Education
## 73 Education
## 74 Education
## 75 Education
## 76 Education
## 77 Engineering
## 78 Engineering
## 79 Engineering
## 80 Engineering
## 81 Engineering
## 82 Engineering
## 83 Engineering
## 84 Engineering
## 85 Engineering
## 86 Engineering
## 87 Engineering
## 88 Engineering
## 89 Engineering
## 90 Engineering
## 91 Engineering
## 92 Engineering
## 93 Engineering
## 94 Engineering
## 95 Engineering
## 96 Engineering
## 97 Engineering
## 98 Engineering
## 99 Engineering
## 100 Engineering
## 101 Engineering
## 102 Engineering
## 103 Engineering
## 104 Engineering
## 105 Engineering
## 106 Health
## 107 Health
## 108 Health
## 109 Health
## 110 Health
## 111 Health
## 112 Health
## 113 Health
## 114 Health
## 115 Health
## 116 Health
## 117 Health
## 118 Humanities & Liberal Arts
## 119 Humanities & Liberal Arts
## 120 Humanities & Liberal Arts
## 121 Humanities & Liberal Arts
## 122 Humanities & Liberal Arts
## 123 Humanities & Liberal Arts
## 124 Humanities & Liberal Arts
## 125 Humanities & Liberal Arts
## 126 Humanities & Liberal Arts
## 127 Humanities & Liberal Arts
## 128 Humanities & Liberal Arts
## 129 Humanities & Liberal Arts
## 130 Humanities & Liberal Arts
## 131 Humanities & Liberal Arts
## 132 Humanities & Liberal Arts
## 133 Industrial Arts & Consumer Services
## 134 Industrial Arts & Consumer Services
## 135 Industrial Arts & Consumer Services
## 136 Industrial Arts & Consumer Services
## 137 Industrial Arts & Consumer Services
## 138 Industrial Arts & Consumer Services
## 139 Industrial Arts & Consumer Services
## 140 Interdisciplinary
## 141 Law & Public Policy
## 142 Law & Public Policy
## 143 Law & Public Policy
## 144 Law & Public Policy
## 145 Law & Public Policy
## 146 <NA>
## 147 Physical Sciences
## 148 Physical Sciences
## 149 Physical Sciences
## 150 Physical Sciences
## 151 Physical Sciences
## 152 Physical Sciences
## 153 Physical Sciences
## 154 Physical Sciences
## 155 Physical Sciences
## 156 Physical Sciences
## 157 Psychology & Social Work
## 158 Psychology & Social Work
## 159 Psychology & Social Work
## 160 Psychology & Social Work
## 161 Psychology & Social Work
## 162 Psychology & Social Work
## 163 Psychology & Social Work
## 164 Psychology & Social Work
## 165 Psychology & Social Work
## 166 Social Science
## 167 Social Science
## 168 Social Science
## 169 Social Science
## 170 Social Science
## 171 Social Science
## 172 Social Science
## 173 Social Science
## 174 Social Science
List out the majors containing “DATA” or “STATISTICS”:
str_view(majors_list_df$Major, "DATA|STATISTICS")
## [44] │ MANAGEMENT INFORMATION SYSTEMS AND <STATISTICS>
## [52] │ COMPUTER PROGRAMMING AND <DATA> PROCESSING
## [59] │ <STATISTICS> AND DECISION SCIENCE
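As seen above, three majors match: MANAGEMENT INFORMATION SYSTEMS AND STATISTICS, COMPUTER PROGRAMMING AND DATA PROCESSING, and STATISTICS AND DECISION SCIENCE.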
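str_view() only displays the matches. To work with the matching majors as data, str_subset() returns the matching strings, and filter() with str_detect() keeps whole rows, including the FOD1P code. The sketch below uses a hypothetical five-row sample copied from the listing above, so it runs without downloading the CSV; with the full data, the same calls would be applied to majors_list_df.

```r
library(stringr)
library(dplyr)

# A few rows copied from majors_list_df, for illustration only
majors_sample <- data.frame(
  FOD1P = c("2102", "2101", "3702", "6212", "3700"),
  Major = c("COMPUTER SCIENCE",
            "COMPUTER PROGRAMMING AND DATA PROCESSING",
            "STATISTICS AND DECISION SCIENCE",
            "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
            "MATHEMATICS")
)

# The matching strings themselves, in their original order
matching_majors <- str_subset(majors_sample$Major, "DATA|STATISTICS")

# Whole rows whose Major matches the same pattern
data_stats_majors <- majors_sample %>%
  filter(str_detect(Major, "DATA|STATISTICS"))
data_stats_majors
```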
The two exercises below are taken from R for Data Science, section 14.3.5.1 of the online version:
(.)\1\1
This is a regular expression, so to turn it into a string defining the regex, we must escape each backslash by adding another \ in front of it:
str_view("aaaabbc", "(.)\\1\\1")
## [1] │ <aaa>abbc
str_view("1111", "(.)\\1\\1")
## [1] │ <111>1
This will match any character repeated three times in a row, such as “aaa” or “111”.
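As a quick check of this description, the sketch below tests the pattern on a few made-up strings with str_detect() and pulls out the first run of three with str_extract(). Matching is case-sensitive, so “AAa” does not count as a triple.

```r
library(stringr)

# Made-up test strings
candidates <- c("aaa", "aab", "zz9zzz", "AAa", "1111")

# TRUE where some character appears three times in a row
has_triple <- str_detect(candidates, "(.)\\1\\1")

# The first such run in each string (NA when there is none)
first_triple <- str_extract(candidates, "(.)\\1\\1")
```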
"(.)(.)\\2\\1"
This is already a string defining a regular expression, so we can pass it directly to the str_view function:
str_view(fruit, "(.)(.)\\2\\1")
## [5] │ bell p<eppe>r
## [17] │ chili p<eppe>r
str_view("aaaabbc", "(.)(.)\\2\\1")
## [1] │ <aaaa>bbc
str_view("11111", "(.)(.)\\2\\1")
## [1] │ <1111>1
This will match a pair of characters immediately followed by the same pair of characters but reversed such as “ep” followed by “pe”, or “11” followed by “11”.
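One detail worth checking is that the two captured characters do not need to be different. The sketch below, using made-up strings, shows that “aaaa” matches just like “abba”, while “abab” (a pair repeated in the same order rather than reversed) does not.

```r
library(stringr)

# Made-up test strings
x <- c("abba", "xyyx", "abab", "aaaa")

# Two characters followed by the same two in reverse order
reversed_pair <- str_extract(x, "(.)(.)\\2\\1")
```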
(..)\1
This is a regular expression, so to turn it into a string defining the regex, we must escape each backslash by adding another \ in front of it:
str_view(fruit, "(..)\\1")
## [4] │ b<anan>a
## [20] │ <coco>nut
## [22] │ <cucu>mber
## [41] │ <juju>be
## [56] │ <papa>ya
## [73] │ s<alal> berry
str_view("aaaabbc", "(..)\\1")
## [1] │ <aaaa>bbc
str_view("1111111", "(..)\\1")
## [1] │ <1111>111
This will match a pair of characters that is immediately repeated in the same order, such as “anan” or “1111”.
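To contrast this with the previous pattern, the sketch below (made-up inputs) shows that the pair must repeat immediately and in the same order: “mississippi” has repeated pairs, but only with a gap or reversed. Widening the group to (...) finds an immediately repeated triple instead, such as “ississ”.

```r
library(stringr)

# Made-up test words
pair_words <- c("banana", "coconut", "apple", "mississippi")

# Any two characters immediately followed by the same two characters
has_pair <- str_detect(pair_words, "(..)\\1")

# Same idea with a group of three characters
triple_repeat <- str_extract("mississippi", "(...)\\1")
```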
"(.).\\1.\\1"
This is already a string defining a regular expression, so we can pass it directly to the str_view function:
str_view(fruit, "(.).\\1.\\1")
## [4] │ b<anana>
## [56] │ p<apaya>
str_view("11111", "(.).\\1.\\1")
## [1] │ <11111>
str_view("121314", "(.).\\1.\\1")
## [1] │ <12131>4
This will match a character that appears in three places with exactly one (arbitrary) character between each occurrence, such as “12131” in “121314”.
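The single-character gaps can hold anything, including the repeated character itself, but they must be exactly one character wide. A small sketch with made-up strings:

```r
library(stringr)

# Made-up test strings
x <- c("abaca", "x-x-x", "abcabc", "11111")

# A character, any one character, the same character, any one character, the same character
spaced_triple <- str_extract(x, "(.).\\1.\\1")
```

In “abcabc” each letter repeats only after a gap of two characters, so there is no match.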
"(.)(.)(.).*\\3\\2\\1"
This is already a string defining a regular expression, so we can pass it directly to the str_view function:
str_view(sentences, "(.)(.)(.).*\\3\\2\\1")
## [4] │ These days< a chicken leg is a >rare dish.
## [10] │ A large< size in stockings is >hard to sell.
## [14] │ Kick< the ball straight >and follow through.
## [16] │ A p<ot of tea helps to> pass the evening.
## [22] │ The fis<h twisted and turned on the bent h>ook.
## [28] │ The colt rea<red and threw the tall rider>.
## [57] │ Marc<h the soldiers past the next h>ill.
## [67] │ The set of chin<a hit the floor with a> crash.
## [68] │ This is a grand< season for hikes >on the road.
## [71] │ A yac<ht slid around the point into th>e bay.
## [83] │ Th<ere are more than two factors here>.
## [97] │ The term< ended in late june >that year.
## [101] │ Oak i<s strong and also gives s>hade.
## [105] │ Add the sum t<o the product o>f these three.
## [117] │ Weave< the carpet on the right >hand side.
## [118] │ Hemp is a weed< found in parts of >the tropics.
## [122] │ The harder he trie<d the less he got d>one.
## [131] │ A cramp is< no small danger on >a swim.
## [133] │ Pluck< the bright >rose without leaves.
## [135] │ The glow< deepened >in the eyes of the sweet girl.
## ... and 112 more
str_view("12345678.321", "(.)(.)(.).*\\3\\2\\1")
## [1] │ <12345678.321>
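This will match any three characters, followed by any number of characters (including none), followed by the same three characters in reverse order. The match can occur anywhere in a string, not only at its start and end: in sentence 117, for example, it matches “ the carpet on the right ”, which begins with a space followed by “th” and ends with “ht” followed by a space.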
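Because .* is greedy, the match stretches to the last place the reversed triple appears. The sketch below (made-up strings) compares it with the lazy .*? version and also checks that the three characters and their reverse cannot overlap, so “abcba” does not match while “abccba” does.

```r
library(stringr)

s <- "abc-cba-cba"

# Greedy .* extends the match to the last "cba"
greedy <- str_extract(s, "(.)(.)(.).*\\3\\2\\1")

# Lazy .*? stops at the first "cba"
lazy <- str_extract(s, "(.)(.)(.).*?\\3\\2\\1")

# The triple and its reverse need at least six characters in total
no_overlap <- str_detect(c("abccba", "abcba"), "(.)(.)(.).*\\3\\2\\1")
```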
4.1 Start and end with the same character.
^(.).*\1$ (as an R string: "^(.).*\\1$")
str_view(words, "^(.).*\\1$")
## [36] │ <america>
## [49] │ <area>
## [209] │ <dad>
## [213] │ <dead>
## [223] │ <depend>
## [258] │ <educate>
## [266] │ <else>
## [268] │ <encourage>
## [270] │ <engine>
## [278] │ <europe>
## [283] │ <evidence>
## [285] │ <example>
## [287] │ <excuse>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [296] │ <eye>
## [386] │ <health>
## [394] │ <high>
## [450] │ <knock>
## ... and 16 more
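Since we want to match the start and end of the string, we use ^ and $. In between, (.) captures the first character, .* allows any number of characters in the middle, and \1 requires the last character to be the same as the first. Because the first and last characters must be separate positions, single-letter words such as “a” are not matched.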
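If single-letter words should also count, one possible variant (not part of the original exercise) allows the string to end right after the first character. A sketch on made-up words:

```r
library(stringr)

# Made-up test words
w <- c("dad", "a", "level", "apple", "eye")

# Pattern from the exercise: needs at least two characters
two_plus <- str_subset(w, "^(.).*\\1$")

# Variant: either the first character reappears at the end,
# or the string ends right after the first character
any_length <- str_detect(w, "^(.)((.*\\1$)|$)")
```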
4.2 Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
(..).*\1 (as an R string: "(..).*\\1")
str_view(words, "(..).*\\1")
## [48] │ ap<propr>iate
## [152] │ <church>
## [181] │ c<ondition>
## [217] │ <decide>
## [275] │ <environmen>t
## [487] │ l<ondon>
## [598] │ pa<ragra>ph
## [603] │ p<articular>
## [617] │ <photograph>
## [638] │ p<repare>
## [641] │ p<ressure>
## [696] │ r<emem>ber
## [698] │ <repre>sent
## [699] │ <require>
## [739] │ <sense>
## [858] │ the<refore>
## [903] │ u<nderstand>
## [946] │ w<hethe>r
str_view(fruit, "(..).*\\1")
## [4] │ b<anan>a
## [5] │ bell <peppe>r
## [17] │ chili <peppe>r
## [20] │ <coco>nut
## [22] │ <cucu>mber
## [29] │ eld<erber>ry
## [41] │ <juju>be
## [51] │ <nectarine>
## [56] │ <papa>ya
## [73] │ s<alal> berry
This will match a pair of characters that appears again later in the string, with any number of characters (including none) between the two occurrences.
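The exercise asks for a repeated pair of letters, but . matches any character, including spaces and punctuation. The sketch below compares the answer above with a variant restricted to lowercase letters; treating [a-z] as “letters” is an assumption that suits the lowercase words and fruit vectors.

```r
library(stringr)

# Made-up test strings
x <- c("church", "a-b a-b", "banana")

# Answer from the exercise: any two characters, repeated later
any_pair <- str_detect(x, "(..).*\\1")

# Variant: only pairs of lowercase letters count
letter_pair <- str_detect(x, "([a-z]{2}).*\\1")
```

“a-b a-b” matches the first pattern through the repeated “a-”, but contains no two adjacent letters, so the letters-only variant rejects it.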
4.3 Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
(.).*\1.*\1 (as an R string: "(.).*\\1.*\\1")
str_view(words, "(.).*\\1.*\\1")
## [48] │ a<pprop>riate
## [62] │ <availa>ble
## [86] │ b<elieve>
## [90] │ b<etwee>n
## [119] │ bu<siness>
## [221] │ d<egree>
## [229] │ diff<erence>
## [233] │ di<scuss>
## [265] │ <eleve>n
## [275] │ e<nvironmen>t
## [283] │ <evidence>
## [288] │ <exercise>
## [291] │ <expense>
## [292] │ <experience>
## [423] │ <indivi>dual
## [598] │ p<aragra>ph
## [684] │ r<eceive>
## [696] │ r<emembe>r
## [698] │ r<eprese>nt
## [845] │ t<elephone>
## ... and 2 more
str_view("aaba", "(.).*\\1.*\\1")
## [1] │ <aaba>
This will match a character that appears at least three times, with any number of characters (including none) between each occurrence.
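As with the previous exercise, (.) also captures punctuation, so a string containing three periods matches. A sketch with made-up strings comparing the answer above with a variant limited to lowercase letters (again assuming [a-z] is an adequate definition of “letter” here):

```r
library(stringr)

# Made-up test strings
x <- c("eleven", "banana", "apple", "a.b.c.")

# Answer from the exercise: any character appearing three times
any_char <- str_detect(x, "(.).*\\1.*\\1")

# Variant: only a lowercase letter appearing three times counts
letters_only <- str_detect(x, "([a-z]).*\\1.*\\1")
```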