After sending a conference announcement to a list of friends and colleagues, I usually receive several responses saying that the e-mail cannot be delivered. I would like to clean my list and delete all addresses that are no longer reachable.

Save e-mails to a text file

In Outlook, select the undeliverable-mail responses (or any other group of e-mails) and use the File|Save as... menu option to save them as a text file. The headers (and sometimes the bodies) of all selected messages are conveniently saved in one file.

lfn <- "emails2.txt"
txt <- readLines(lfn)

Inspect the file structure

txt[1:12]
##  [1] "From:\tMail Delivery System [MAILER-DAEMON@relay.nib.si]"            
##  [2] "To:\thassank@namru3.med.navy.mil"                                    
##  [3] "Sent:\t21. april 2015 12:51"                                         
##  [4] "Subject:\tUndeliverable: [AS2015] Applied Statistics 2015 Conference"
##  [5] ""                                                                   
##  [6] "From:\tMail Delivery System [MAILER-DAEMON@relay.nib.si]"            
##  [7] "To:\tr.heuberger@iccr-international.org"                             
##  [8] "Sent:\t21. april 2015 12:51"                                         
##  [9] "Subject:\tUndeliverable: [AS2015] Applied Statistics 2015 Conference"
## [10] ""                                                                   
## [11] "From:\tMail Delivery System [MAILER-DAEMON@relay.nib.si]"            
## [12] "To:\tnino.rode@fsd.si; mt@iccr-international.org"

Each message has the recipient's e-mail address in the line that starts with the text “To:” followed by a tab character, corresponding to the To: field. If there are multiple recipients, the addresses are delimited by ; and may be spread over several lines. The To: field is followed by the Sent: field (a line starting with this keyword and a tab character).

Parsing the file

The addresses we need are scattered throughout the file. First, we locate the To: and Sent: lines; each To: field ends on the line just before the corresponding Sent: line.

start <- grep("To:\t",txt)
start
##  [1]  2  7 12 17 22 27 32 37 42 47 52 57
end <- grep("Sent:\t",txt)-1
end
##  [1]  2  7 12 17 22 27 32 37 42 47 52 57
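For these messages start and end coincide, but pairing the two vectors relies on every To: line having exactly one Sent: line after it. Since the export sometimes includes message bodies, a stray line containing "To:" and a tab could in principle break that pairing. A minimal sketch of a sanity check, using a hypothetical helper `check_ranges()` (not part of the original post), that could be run before subsetting:

```r
# Hypothetical helper: checks that the To:/Sent: positions found by grep()
# pair up into non-overlapping ranges before they are used for subsetting.
check_ranges <- function(start, end) {
  stopifnot(
    length(start) == length(end),         # one Sent: line per To: line
    all(end >= start),                    # each Sent: follows its To:
    all(head(end, -1) < tail(start, -1))  # ranges do not overlap
  )
  invisible(TRUE)
}

# The first positions printed above pass the check
check_ranges(start = c(2, 7, 12, 17), end = c(2, 7, 12, 17))
```

If the check fails, inspecting `txt[start]` and `txt[end + 1]` usually shows which line was matched unexpectedly.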

Extract only the relevant lines, i.e. from each To: line up to the line before the corresponding Sent: line:

ind <- data.frame(start,end)
mytxt <- apply(ind,1,function(x) txt[x[1]:x[2]])
txt <- unlist(mytxt)
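`apply()` coerces the data frame to a matrix and simplifies its result, so depending on whether all To: fields have the same number of lines it can return a vector, a matrix or a list. A sketch of an equivalent step with `Map()`, which always returns a list with one element per To: field, shown on a small hypothetical input where the second field wraps onto two lines:

```r
# Toy input (hypothetical): the second To: field wraps onto a second line
txt <- c("To:\ta@example.org", "Sent:\tx",
         "To:\tb@example.org;", " c@example.org", "Sent:\ty")
start <- grep("To:\t", txt)      # 1 3
end <- grep("Sent:\t", txt) - 1  # 1 4
# Map() returns one list element per To: field regardless of its length,
# so unlist() gives the field lines in file order
fields <- unlist(Map(function(s, e) txt[s:e], start, end))
fields
```

Here `fields` holds the three lines `"To:\ta@example.org"`, `"To:\tb@example.org;"` and `" c@example.org"`, ready for the cleaning steps that follow.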

The first six extracted lines:

head(txt)
## [1] "To:\thassank@namru3.med.navy.mil"                                   
## [2] "To:\tr.heuberger@iccr-international.org"                            
## [3] "To:\tnino.rode@fsd.si; mt@iccr-international.org"                   
## [4] "To:\tGerd.Beidernikl@zbw.at; Gisela.Andersson@integrationsverket.se"
## [5] "To:\tstadlober@stat.tu-graz.ac.at"                                  
## [6] "To:\tBojan.Leskosek@sp.uni-lj.si"

Now we can delete all spaces and the keyword To: together with its tab character.

txt <- gsub(" ","",txt)
txt <- gsub("To:\\t","",txt)
head(txt)
## [1] "hassank@namru3.med.navy.mil"                                  
## [2] "r.heuberger@iccr-international.org"                           
## [3] "nino.rode@fsd.si;mt@iccr-international.org"                   
## [4] "Gerd.Beidernikl@zbw.at;Gisela.Andersson@integrationsverket.se"
## [5] "stadlober@stat.tu-graz.ac.at"                                 
## [6] "Bojan.Leskosek@sp.uni-lj.si"

Finally, we split multiple addresses at the semicolons and build the final list:

emails <- unlist(strsplit(txt,";"))
head(emails)
## [1] "hassank@namru3.med.navy.mil"           
## [2] "r.heuberger@iccr-international.org"    
## [3] "nino.rode@fsd.si"                      
## [4] "mt@iccr-international.org"             
## [5] "Gerd.Beidernikl@zbw.at"                
## [6] "Gisela.Andersson@integrationsverket.se"
length(emails)
## [1] 14
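A different technique, not the one used in this post, is to pull the addresses out of the To: lines directly with a regular expression, which replaces both the `gsub()` and the `strsplit()` steps. A sketch, assuming a simplified address pattern that does not cover every address the e-mail standards allow:

```r
# Two To: lines taken from the export shown above
lines <- c("To:\thassank@namru3.med.navy.mil",
           "To:\tnino.rode@fsd.si; mt@iccr-international.org")

# Simplified address pattern (an assumption; it does not cover every
# address allowed by the e-mail standards)
pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"

# gregexpr() finds all matches in each line; regmatches() extracts them
emails <- unlist(regmatches(lines, gregexpr(pattern, lines)))
emails
```

The result is the same three addresses as the split-based approach gives for these lines; the regex version also ignores separators and whitespace without any explicit clean-up.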

Save to a text file

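Save them to a text file for further use, and delete them from the source list. Because append=TRUE is used, the addresses are added to the end of the file if it already exists.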

cat(emails,file="wrong-e-mails.txt",sep="\n",append=TRUE)
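The deletion from the source list is left to the reader. A minimal sketch, assuming the source list is a character vector with one address per element (the names `source_list`, `bounced` and `cleaned` are hypothetical). Lower-casing assumes that addresses differing only in capitalisation belong to the same mailbox, which holds for the domain part and usually in practice for the rest:

```r
# Hypothetical source list; in practice it might be read with
# readLines() from your own address file (not shown in this post)
source_list <- c("hassank@namru3.med.navy.mil", "someone@example.org",
                 "Nino.Rode@fsd.si")
# Bounced addresses, e.g. the `emails` vector built above
bounced <- c("hassank@namru3.med.navy.mil", "nino.rode@fsd.si")

# Compare trimmed, lower-cased addresses so that differences in
# capitalisation or stray spaces do not hide a match
keep <- !(tolower(trimws(source_list)) %in% tolower(trimws(bounced)))
cleaned <- source_list[keep]
cleaned
```

Only `"someone@example.org"` remains; `cleaned` can then be written back with `writeLines()`.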

With obvious modifications, you can use this method to extract e-mail addresses from any text file of printed e-mails (or from similarly structured files).