library(tidyverse)
## Warning: package 'dplyr' was built under R version 4.4.1
library(openintro)
## Warning: package 'openintro' was built under R version 4.4.1
library(tibble)
N1<- tibble(
EmployeeID = c(1, 1, 2, 3, 3),
EmployeeName = c("John", "John", "Jane", "Alice", "Alice"),
Department = c("Sales", "Sales", "Marketing", "HR", "HR"),
Project = c("Project A", "Project B", "Project C", "Project D", "Project E"),
HoursWorked = c(10, 15, 20, 30, 25)
)
N1
This table is in N1 form because all the data are atomic and the rows and columns are not repeating.
Since the table is in N1 form, we can transform it into N2, eliminating partial dependencies on non-key columns.
N2EmpID<- N1 %>%
subset(select=c(EmployeeID, EmployeeName, Department)) %>%
distinct()
N2EmpID
To further eliminate partial dependency, we will introduce a new column called ProjectID. This makes the Project column fully dependent on the primary key, which is ProjectID rather than the EmployeeID in the original table.
N2D_Pro<- N1 %>%
mutate(ProjectID= c('AA01', 'AA02', 'AA03', 'AA04', 'AA05')) %>%
subset(select=c(ProjectID, Project)) %>%
distinct()
print(N2D_Pro)
## # A tibble: 5 × 2
## ProjectID Project
## <chr> <chr>
## 1 AA01 Project A
## 2 AA02 Project B
## 3 AA03 Project C
## 4 AA04 Project D
## 5 AA05 Project E
N2Hour<- N1 %>%
mutate(ProjectID= c('AA01', 'AA02', 'AA03', 'AA04', 'AA05')) %>%
subset(select=c(EmployeeID, ProjectID, HoursWorked)) %>%
distinct()
print(N2Hour)
## # A tibble: 5 × 3
## EmployeeID ProjectID HoursWorked
## <dbl> <chr> <dbl>
## 1 1 AA01 10
## 2 1 AA02 15
## 3 2 AA03 20
## 4 3 AA04 30
## 5 3 AA05 25
These three tables happen to also satisfy N3F since each non-key attributes are fully dependent on the primary keys in each table and no transitive dependency is observed. The non-key attribute “HoursWorked” is fully dependent on the composite keys “EmployeeID” and “ProjectID”.
major<- read.csv('https://raw.githubusercontent.com/stormwhale/607-assignment/refs/heads/main/majors-list.csv')
major<-data.frame(major)
name<- major %>%
filter(str_detect(Major, regex('DATA|STATISTIC', ignore_case = TRUE)))
name
(.)\1\1 The correct expression should be (.)\\1\\1 and it will match three of the same characters captured in the first group. (e.g. aaa, ccc, bbb)
“(.)(.)\2\1” <- This will capture 4 characters that are reverse of one another. (e.g saas)
(..)\1 <- The correct expression should be “(..)\\1” and it will match anything that is repeating the first two characters. e.g. abab
“(.).\1.\1” <- this captures any first letter with any letter that follows and returns the first captured letter with another letter finally followed by the first letter. E.g “cbcnc”
“(.)(.)(.).*\3\2\1” <- captures three characters followed by a string that could be anything and any length as long as the reverse of the first three characters are followed. E.g. xyz123abc123zyx
Start and end with the same character.
df<- 'aaafffddf'
print(str_view(df, "(.)\\1"))
## [1] │ <aa>a<ff>f<dd>f
Contain a repeated pair of letters (e.g. “church” contains “ch” repeated twice.)
df2<- c("church", "success", "letter", "banana", "bookkeeper")
str_view(df2, "(.)(.).*\\1\\2")
## [1] │ <church>
## [4] │ b<anan>a
Contain one letter repeated in at least three places (e.g. “eleven” contains three “e”s.)
df3<- 'Tennessee'
str_view(df3, "(.).*\\1.*\\1")
## [1] │ T<ennessee>