Download the abbreviations of all U.S. states and possessions and store them in a character vector named Abbr, which the code below assumes.
Given a character string such as “KS”, I would like a function that searches a set of abbreviations and returns all of those that begin with “S”, the last letter of “KS”.
It would also be handy to take two character strings whose end and start share a letter, say “UTN” and “NE”, and join them into one, overlapping the shared letter: “UTNE”. The two functions below do this.
## A function that takes a character string and a set of abbreviations
## and returns the abbreviations whose first letter matches the last
## letter of the string.
LstLetSet <- function(ch, set){
  n <- nchar(ch)
  ll <- substr(ch, n, n)      # last letter of ch
  fls <- substr(set, 1, 1)    # first letter of every abbreviation (vectorized)
  return(set[fls == ll])
}
## Test
LstLetSet("MN", Abbr)
## [1] "NE" "NV" "NH" "NJ" "NM" "NY" "NC" "ND"
LstLetSet("FM", Abbr)
## [1] "ME" "MH" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "MP"
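Base R also provides startsWith() (R 3.3.0 and later), which states the first-letter test directly. A minimal sketch of that alternative; `LstLetSetVec` and the short vector `abbrSample` are hypothetical names used only for illustration, with `abbrSample` standing in for the full Abbr.

```r
# Hypothetical alternative to LstLetSet() using base R's startsWith().
LstLetSetVec <- function(ch, set) {
  last <- substr(ch, nchar(ch), nchar(ch))  # last letter of the input string
  set[startsWith(set, last)]                # keep abbreviations beginning with it
}

# Small hypothetical sample in place of the full Abbr vector.
abbrSample <- c("KS", "SC", "SD", "TN", "NE", "NV")
LstLetSetVec("KS", abbrSample)
## [1] "SC" "SD"
```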
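## A function that joins two strings when the last letter of the first
## matches the first letter of the second, overlapping the shared letter.
connect <- function(st1, st2){
  n <- nchar(st1)
  if(substr(st1, n, n) != substr(st2, 1, 1)){
    return("cannot connect")
  } else {
    return(paste0(st1, substr(st2, 2, nchar(st2))))
  }
}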
## Test
connect("MN","NJ")
## [1] "MNJ"
connect("TN", "NY")
## [1] "TNY"
connect("NJ", "TN")
## [1] "cannot connect"
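Because connect() takes two strings and returns one, an entire sequence of abbreviations can be folded into a single chain with base R's Reduce(). A short sketch under that idea; `chainOf` is a hypothetical helper name, and connect() is repeated so the block runs on its own. Once a pair fails, the result stays "cannot connect", because no uppercase abbreviation starts with the final lowercase "t".

```r
# connect() as defined above: overlap the shared letter, else flag failure.
connect <- function(st1, st2) {
  n <- nchar(st1)
  if (substr(st1, n, n) != substr(st2, 1, 1)) return("cannot connect")
  paste0(st1, substr(st2, 2, nchar(st2)))
}

# Hypothetical helper: fold abbreviations left to right with connect().
chainOf <- function(abbrs) Reduce(connect, abbrs)

chainOf(c("FM", "MN", "NV", "VI", "ID"))
## [1] "FMNVID"
```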
Next, we use recursion to search through all possible chains of abbreviations for the longest one. I do this by cycling through every state whose abbreviation can potentially have another abbreviation attached to it.
Let’s set aside the abbreviations that end in ‘E’, ‘J’, ‘X’, ‘Y’, or ‘Z’: no state or possession abbreviation begins with those letters, so such an abbreviation can only ever be the last link in a chain.
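Rather than listing the dead-end letters by hand, they can be found by comparing first and last letters. A minimal sketch on a small hypothetical vector `abbrDemo`; applied to the full Abbr it should give the same five letters, assuming that vector holds the standard two-letter postal codes.

```r
# Hypothetical sample; substitute the full Abbr vector in practice.
abbrDemo <- c("AZ", "CA", "KS", "KY", "NE", "NV", "SC", "VT", "TX", "TN", "NJ")

firsts <- substr(abbrDemo, 1, 1)  # first letter of each abbreviation
lasts  <- substr(abbrDemo, 2, 2)  # last letter of each two-letter abbreviation

# Letters that end some abbreviation but begin none: a chain must stop there.
deadEnds <- sort(setdiff(lasts, firsts))
deadEnds
## [1] "E" "J" "X" "Y" "Z"

# Keep only abbreviations that can still be followed by another one.
abbrDemo[!lasts %in% deadEnds]
## [1] "CA" "KS" "NV" "SC" "VT" "TN"
```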
The function that calls itself takes as inputs a character string such as “AK” (or “AKSCAL”) and a set of strings that excludes “AK” (or “AK”, “KS”, “SC”, “CA”, and “AL”), and returns the longest string it can build by extending the input with the remaining strings in the set.
Many of the resulting longest strings conveniently end in ‘D’, which means we can tack DE onto the end for one more character (only the ‘E’ is new, since the ‘D’ overlaps). We’ll do this at the end.
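## Drop the dead-end abbreviations (ending in E, J, X, Y or Z).
AbbrWOEnd <- Abbr[!Abbr %in% c("DE","ME","NE","NJ","TX","KY","NY","WY","AZ")]

## Recursive search: extend ch with every abbreviation that can attach
## to it, recurse on the remaining set, and keep the longest chain found.
LgSt <- function(ch, set){
  LG <- ch                                # best chain so far: ch itself
  innerset <- set[set != ch]              # never reuse ch
  cycleset <- LstLetSet(ch, innerset)     # abbreviations that can attach to ch
  if(length(cycleset) == 0){return(LG)}
  for(st in cycleset){
    St <- connect(ch, st)                 # extend the chain by st
    metaset <- innerset[innerset != st]   # st is now used up
    lg <- LgSt(St, metaset)               # longest extension of the new chain
    if(nchar(lg) > nchar(LG)){
      LG <- lg
    }
  }
  return(LG)
}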
## Example with a short run.
LgSt("LA", AbbrWOEnd[AbbrWOEnd!="LA"])
## [1] "LAKSCASDCOHINVTNMPWVIARID"
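The recursion is an exhaustive depth-first search, so its cost grows quickly with the size of the set, and it helps to sanity-check the logic on a set small enough to solve by hand. The sketch below is a hypothetical variant, `LgPath()`, that returns the chain as a vector of abbreviations instead of a pasted string, which makes the result easier to inspect; the toy set `toy` is made up for illustration, and two-letter abbreviations are assumed.

```r
# Hypothetical variant of LgSt(): `path` is the chain so far as a vector of
# abbreviations; `set` holds the abbreviations not yet used.
LgPath <- function(path, set) {
  lastLetter <- substr(path[length(path)], 2, 2)  # assumes 2-letter codes
  best <- path
  for (st in set[substr(set, 1, 1) == lastLetter]) {
    cand <- LgPath(c(path, st), set[set != st])   # extend, then recurse
    if (length(cand) > length(best)) best <- cand # keep strictly longer only
  }
  best
}

toy <- c("AB", "BC", "CD", "BE", "EF", "FG")
best <- LgPath("AB", toy[toy != "AB"])
best
## [1] "AB" "BE" "EF" "FG"

# Collapse to the same string form LgSt() returns.
paste0(best[1], paste(substr(best[-1], 2, 2), collapse = ""))
## [1] "ABEFG"
```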
Now loop through all of the remaining abbreviations, find the longest chain starting from each one, and save the results to a CSV file so they can be formatted in Excel.
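LongAbbr <- c()
for(i in AbbrWOEnd){
  L <- LgSt(i, AbbrWOEnd[AbbrWOEnd != i])   # longest chain starting at i
  print(L)
  print(nchar(L))
  LongAbbr <- c(LongAbbr, L)
}
## Chains ending in D can take DE: append only the E, since the D overlaps.
LongAbbr2 <- ifelse(substr(LongAbbr, nchar(LongAbbr), nchar(LongAbbr)) == "D",
                    paste0(LongAbbr, "E"), LongAbbr)
## Each chain begins with its own starting abbreviation, so sorting both
## vectors keeps them aligned row by row.
AbbrWOEnd <- sort(AbbrWOEnd)
LongAbbr2 <- sort(LongAbbr2)
LngAbbr.csv <- data.frame(States = AbbrWOEnd, Longest_String = LongAbbr2,
                          Num_characters = nchar(LongAbbr2))
write.csv(LngAbbr.csv, "LngAbbr.csv", row.names = FALSE)  # file name is arbitrary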
The longest string found begins with “FM”. It is not unique: at least two strings tie for longest, and there are probably many more. One of them is “FMNVIDCALAKSCOHINCTNMPWVARIASDE”, which is 31 characters long.
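A chain like this is easy to double-check mechanically: split it into overlapping two-letter pieces and confirm that every piece is a real abbreviation and that none repeats. A minimal sketch; `isValidChain` is a hypothetical name, and `postal` is a hand-typed stand-in for Abbr (the 50 states, DC, and several possessions), which may not match the downloaded list exactly.

```r
# Hand-typed stand-in for Abbr (assumption: may differ from the downloaded list).
postal <- c("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL",
            "IN","IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT",
            "NE","NV","NH","NJ","NM","NY","NC","ND","OH","OK","OR","PA","RI",
            "SC","SD","TN","TX","UT","VT","VA","WA","WV","WI","WY","DC",
            "AS","GU","MP","PR","VI","FM","MH","PW")

# Hypothetical checker: every overlapping pair must be in `set`, with no repeats.
isValidChain <- function(s, set) {
  n <- nchar(s)
  if (n < 2) return(FALSE)
  pieces <- substring(s, 1:(n - 1), 2:n)  # e.g. "FMN" -> "FM", "MN"
  all(pieces %in% set) && !anyDuplicated(pieces)
}

chain <- "FMNVIDCALAKSCOHINCTNMPWVARIASDE"
nchar(chain)
## [1] 31
isValidChain(chain, postal)
## [1] TRUE
```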