Normalization

Dataframe 1- Patient list with ID # and health insurance

patient_list <- data.frame(
  patient_ID = c( 1, 2, 3, 4, 5),
  last_name = c("Jones", "Green", "Murphy", "Brown", "Davis"),
  health_ins = c("Aetna", "Aetna", "Cigna", "BCBS", "EmblemHealth"),
  stringsAsFactors = FALSE
)
print(patient_list)
##   patient_ID last_name   health_ins
## 1          1     Jones        Aetna
## 2          2     Green        Aetna
## 3          3    Murphy        Cigna
## 4          4     Brown         BCBS
## 5          5     Davis EmblemHealth

Dataframe 2- Provider information with provider ID, and specialty

provider_info <- data.frame(
  provider_ID = c(12, 13, 14, 15),
  provider_lastname = c("Keller", "Johnson", "Stuart", "Williams"),
  provider_specialty = c("PCP", "OBGYN", "PED", "Internalist"),
  provider_credential = c("MD", "MD", "MD", "NP"),
  stringsAsFactors = FALSE
)
print(provider_info)
##   provider_ID provider_lastname provider_specialty provider_credential
## 1          12            Keller                PCP                  MD
## 2          13           Johnson              OBGYN                  MD
## 3          14            Stuart                PED                  MD
## 4          15          Williams        Internalist                  NP

Dataframe 3- Primary provider of patients listed in dataframe 1

primary_provider <- data.frame(
  patient_ID = c( 1, 2, 3, 4, 5),
  provider_ID = c(1, 2, 2, 3, 4),
  stringsAsFactors = FALSE
)
print(primary_provider)
##   patient_ID provider_ID
## 1          1           1
## 2          2           2
## 3          3           2
## 4          4           3
## 5          5           4

Character Manipulation

major_data <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/college-majors/majors-list.csv")
major_datastat <- major_data[grep("DATA|STATISTICS", major_data$Major, ignore.case = TRUE), ]

major_datastat
##    FOD1P                                         Major          Major_Category
## 44  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 52  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 59  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics

Describe in words, what these expressions will match

(.)\1\1 :

“(.)” signifies a single expression. “\1” signifies matching the expression. Together, “(.)\1\1” signifies a single expression, matching that single expression two times.

For example: “xxx”, or “yyy”

(.)(.)\2\1 :

As mentioned previously, “(.)” signifies a single or primary expression. The succeeding “(.)” signifies an additional secondary expression. “\2” signifies a match for the secondary expression, while the “\1” signifies matching the primary expression.

For example: “xyyx”, with x being the primary expression and y being the secondary expression.

(..)\1 :

The “(..)” character signifies any two expressions and puts them in a singular group. As mentioned previously, the “\1” character matches the single expression. This would ultimately give us the following results.

For example: “xyxy”

(.).\1.\1

From these characters, we will receive a five expression string. However, we must note that within the five expression string, the first, third, and fifth characters will all be the same.

For example: “xyxyx” (Note that the first, third, and fifth characters are all “x”.)

**(.)(.)(.).*\3\2\1**

From these characters, we can assume from earlier that the three “(.)” listed are three expressions. The asterisk “.*” indicates that any number of characters can appear within the three “(.)” expressions and the remaining chunk. As previously mentioned, we know that the “\1” and “\2” expressions signify the primary and secondary expression respectively. The “\3” signifies the third expression captured. Ultimately, we should receive the following outputs:

For example: xyzHellozyx

Another example: abcgoodmorningcba

Construct regular expressions to match words that:

Start and end with the same character. (.)\1

Contain a repeated pair of letters (..)\1

Contain one letter repeated in at least three places (.)(.)(.).*\3\2\1