#1. Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, identify the majors that contain either "DATA" or "STATISTICS"

Load the dataset:

majors <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv")

# Filter majors that contain "DATA" or "STATISTICS" (case insensitive)

filtered_majors <- majors[grep("DATA|STATISTICS", trimws(majors$Major), ignore.case = TRUE), ]
print(filtered_majors)

##    FOD1P                                         Major          Major_Category
## 44  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 52  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 59  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics

#2. Write code that transforms the data below:

# Original vector of fruits
fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange",
            "blueberry", "cantaloupe", "chili pepper", "cloudberry", 
            "elderberry", "lime", "lychee", "mulberry", 
            "olive", "salal berry")

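# Transform the vector into the desired format.
# type = "cmd" makes shQuote() use double quotes on every platform
# (the default on Unix-alikes is single quotes).
formatted_fruits <- paste0('c(', paste(shQuote(fruits, type = "cmd"), collapse = ", "), ')')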

# Output the formatted list
cat(formatted_fruits)

## c("bell pepper", "bilberry", "blackberry", "blood orange", "blueberry", "cantaloupe", "chili pepper", "cloudberry", "elderberry", "lime", "lychee", "mulberry", "olive", "salal berry")
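As a sanity check, the formatted string can be parsed back into a vector and compared with the original. The sketch below is only an illustration: it uses the first four fruits to stay short, builds the string with sprintf() rather than shQuote(), and the names `formatted` and `rebuilt` are made up for this check. Base R's dput() is shown as an alternative way to print a vector as code.

```r
# First four fruits only, to keep the check short.
fruits <- c("bell pepper", "bilberry", "blackberry", "blood orange")

# Wrap each element in double quotes, join with commas, and wrap in c( ... ).
formatted <- paste0("c(", paste(sprintf('"%s"', fruits), collapse = ", "), ")")
cat(formatted, "\n")

# Parse and evaluate the string; the result should be the same vector.
rebuilt <- eval(parse(text = formatted))
print(identical(rebuilt, fruits))

# dput() prints an R object as code that recreates it.
dput(fruits)
```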

#3. Describe, in words, what these expressions will match:

Here’s a description of what each regular expression will match:

(.)\1\1: Matches any single character, captured by (.), followed by that same character twice more (each \1 refers back to the captured character). In other words, it matches three identical characters in a row. Examples: “aaa”, “bbb”, “111”.

(.)(.)\2\1: Matches two characters, captured separately as groups 1 and 2, followed by the second character (\2) and then the first (\1). In other words, it matches a pair of characters followed by the same pair in reverse order. Examples: “abba”, “cddc”.

(..)\1: Matches any two characters, captured together as one group, followed immediately by the same two characters. In other words, it matches a two-character sequence repeated back to back. Examples: “abab”, “1212”, “xyxy”.

(.).\1.\1: Matches a captured character, then any character, then the captured character again, then any character, and finally the captured character a third time. In other words, the same character appears in the first, third, and fifth positions of a five-character stretch. Examples: “ababa”, “abaca”, “cdcdc”.

(.)(.)(.).*\3\2\1: Matches three characters, captured as groups 1 to 3, followed by any number of characters (.*, including none), and then the same three characters in reverse order: the third (\3), the second (\2), and the first (\1). Examples: “abccba”, “abcxyzcba”.
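The descriptions above can be checked with base R's grepl(). This is a minimal sketch: the patterns are written as R strings, so each backslash is doubled, and the fifth pattern uses .* for "any number of characters". The sample strings and the pattern labels are assumptions chosen for illustration.

```r
# Patterns from the exercise, written as R strings (backslashes doubled).
patterns <- c(
  triple      = "(.)\\1\\1",
  mirror_pair = "(.)(.)\\2\\1",
  double_pair = "(..)\\1",
  alternating = "(.).\\1.\\1",
  mirror_trio = "(.)(.)(.).*\\3\\2\\1"
)

# Illustrative test strings (not from the exercise).
samples <- c("aaa", "abba", "abab", "abaca", "abcxyzcba", "abcd")

# Rows are sample strings, columns are patterns;
# TRUE means the pattern matches somewhere in the string.
hits <- sapply(patterns, function(p) grepl(p, samples, perl = TRUE))
rownames(hits) <- samples
print(hits)
```

Each sample string is matched by exactly one pattern, except “abcd”, which is matched by none.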

#4. Construct regular expressions to match words that:

Start and end with the same character.

Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)

Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)

Words that start and end with the same character:

Regular expression: ^(\w).*\1$

Explanation:

^ asserts the start of the string.

(\w) captures the first character (using \w to match any alphanumeric character or underscore).

.* matches any number of characters (including none) in between.

\1 ensures that the last character matches the first captured character.

$ asserts the end of the string.

Example matches: “madam”, “level”, “radar”.
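A short check of this answer with grepl(), assuming the first character is captured with \w. The word list is illustrative, and the second pattern is a variant that goes beyond the exercise: it makes the tail optional so that one-character words also count as starting and ending with the same character.

```r
# Illustrative words (assumed for this check).
words <- c("madam", "level", "radar", "apple", "a")

# Strict form: needs at least two characters, so "a" does not match.
strict <- grepl("^(\\w).*\\1$", words, perl = TRUE)

# Variant: the optional group also accepts one-character words.
lenient <- grepl("^(\\w)(.*\\1)?$", words, perl = TRUE)

print(data.frame(word = words, strict = strict, lenient = lenient))
```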

Words that contain a repeated pair of letters (e.g., “church” contains “ch” repeated twice):

Regular expression: (..).*\1

Explanation:

(..) captures any two consecutive characters as a pair.

.* matches any number of characters (including none) between the two occurrences.

\1 requires the captured pair to appear again later in the string.

Example matches: “church”, “deeded”, “banana”.
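One pattern that fits the "repeated pair" description is (..).*\1: capture two characters, then require the same two characters again later. The sketch below tests it on an illustrative word list, together with a stricter variant (an assumption, not part of the exercise) that only accepts pairs of letters.

```r
# Illustrative inputs; "12-12" repeats a pair of digits, not letters.
words <- c("church", "deeded", "banana", "abccba", "apple", "12-12")

# Any two characters that occur again later in the string.
any_pair <- grepl("(..).*\\1", words, perl = TRUE)

# Stricter variant: the repeated pair must consist of two ASCII letters.
letter_pair <- grepl("([A-Za-z]{2}).*\\1", words, perl = TRUE)

print(data.frame(word = words, any_pair = any_pair, letter_pair = letter_pair))
```

Note that “abccba” does not match: it reads the same in both directions, but no two-character sequence occurs twice.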

Words that contain one letter repeated in at least three places:

Regular expression: .*(\w).*?\1.*?\1.*

Explanation:

.* allows any number of characters before the repeated letter.

(\w) captures any alphanumeric character or underscore.

.*? matches any number of characters (non-greedy).

\1 ensures that the captured character repeats later in the string.

The third \1 ensures that the same character appears at least three times in the word.

Example matches: “eleven”, “banana”, “mississippi”.
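The following sketch checks the three-occurrence pattern, assuming the repeated character is captured with \w. It also compares a shorter equivalent pattern without the leading and trailing .* and cross-checks both against a direct character count. The word list and helper names are illustrative.

```r
# Illustrative words: the last two have letters that occur only twice.
words <- c("eleven", "banana", "mississippi", "church", "apple")

# Full pattern, with .* on both sides and non-greedy gaps.
full <- grepl(".*(\\w).*?\\1.*?\\1.*", words, perl = TRUE)

# Shorter pattern; for detection the surrounding .* is not needed.
short <- grepl("(\\w).*\\1.*\\1", words, perl = TRUE)

# Cross-check: does any single character occur at least three times?
by_count <- unname(vapply(
  words,
  function(w) max(table(strsplit(w, "")[[1]])) >= 3,
  logical(1)
))

print(data.frame(word = words, full = full, short = short, by_count = by_count))
```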

Assignment 3

Mohammad Zahid Chowdhury

02-13-2025
