This is an R Markdown document containing Sean Amato’s work pertaining to HW #3.
Exercise 1:
# First I need to delete row 146 because it represents less than a Bachelor's Degree.
df2 <- df[-146,]
# Now I'm going to create a column and assign a row TRUE or FALSE based on whether it contains any of the following text in the 'Major' column: SCIENCE, OLOGY, COMPUTER, MATH, and ENGINEERING. While this probably isn't a perfect approach, in my opinion it's a very good first pass. I know "Physics" is one major my classifier misses as you can collect data during experiments and run statistical analysis to ensure your data is sound.
df2$Data_or_Stats <- str_detect(df2$Major, "SCIENCE")|str_detect(df2$Major, "OLOGY")|str_detect(df2$Major, "COMPUTER")|str_detect(df2$Major, "MATH")|str_detect(df2$Major, "ENGINEERING")|str_detect(df2$Major, "TECHNO")
# Printing the top 10 rows to show the results
head(df2, 10) |>
select(Major, Data_or_Stats) |>
gt() |>
tab_header(title = md("**Data or Stats**"), subtitle = md("Are Data or Stats used in a particular major?")) |>
cols_width(everything() ~ px(200))
| Data or Stats | |
| Are Data or Stats used in a particular major? | |
| Major | Data_or_Stats |
|---|---|
| GENERAL AGRICULTURE | FALSE |
| AGRICULTURE PRODUCTION AND MANAGEMENT | FALSE |
| AGRICULTURAL ECONOMICS | FALSE |
| ANIMAL SCIENCES | TRUE |
| FOOD SCIENCE | TRUE |
| PLANT SCIENCE AND AGRONOMY | TRUE |
| SOIL SCIENCE | TRUE |
| MISCELLANEOUS AGRICULTURE | FALSE |
| FORESTRY | FALSE |
| NATURAL RESOURCES MANAGEMENT | FALSE |
Exercise 2:
# For this problem I copy and pasted the data directly from the homework and put it into "string1", via single quotes, without any formatting.
string1 <- '[1] "bell pepper" "bilberry" "blackberry" "blood orange"
[5] "blueberry" "cantaloupe" "chili pepper" "cloudberry"
[9] "elderberry" "lime" "lychee" "mulberry"
[13] "olive" "salal berry"'
# In the following lines I've removed numbers and square brackets from my string.
string2 <- str_remove_all(string1, "[0-9]")
string3 <- str_remove_all(string2, "\\[\\]")
# Here I've extracted items in quotations and stored them in matches.
matches <- str_extract_all(string3, '"([^"]+)"') # The regex here does all the heavy lifting.
x <- unlist(matches)
x2 <- str_remove_all(x, "\"")
print(x2)
## [1] "bell pepper" "bilberry" "blackberry" "blood orange" "blueberry"
## [6] "cantaloupe" "chili pepper" "cloudberry" "elderberry" "lime"
## [11] "lychee" "mulberry" "olive" "salal berry"
Exercise 3: Describe in words what the following
expressions will match: (.)\1\1 - This expression will find all triplets
in a string for any character, i.e. “
“(.)(.)\2\1” - This expression will find any 4 character palindromes,
i.e. “b a
(..)\1 - This expression will find any two consecutive character pairs,
i.e. “b
“(.).\1.\1” - This expression will find a 5 character string where
positions 1, 3, & 5 are any of the same character and position 2
& 4 are any characters, i.e. “b ab<a.aba>b race..carara< k
i >kl <…g.>”.
“(.)(.)(.).*\3\2\1” - This expression will find any 3 characters with
any characters in between followed by the same 3 initial characters in
reverse, i.e. “<abc cba . I love taco tuesday! kjn abc . cba>
bleh!”.
Exercise 4: Construct regular expressions to match
words that:
1. Start and end with the same character.
2. Contain a repeated pair of letters (e.g. “church” contains “ch”
repeated twice.)
3 .Contain one letter repeated in at least three places (e.g. “eleven”
contains three “e”s.)
word_list <- c("racecar", "church", "eleven", "taco", "spinning", "retroactive", "ubuntu", "eel", "lyrically")
# 1
answer1 <- str_view(word_list, "^(.).*\\1$")
print(answer1)
## [1] │ <racecar>
## [7] │ <ubuntu>
answer2 <- str_view(word_list, "(..).*\\1")
print(answer2)
## [2] │ <church>
## [5] │ sp<innin>g
## [9] │ <lyrically>
answer3 <- str_view(word_list, "(.).*\\1.*\\1.*")
print(answer3)
## [3] │ <eleven>
## [5] │ spi<nning>
## [7] │ <ubuntu>
## [9] │ <lyrically>