A quick try-out. A .docx file is just a zipped directory, so rename the document to data.zip (where "data" stands for whatever your file is called), unzip it and navigate to data/word/document.xml.
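The rename-and-unzip step can also be done from R itself. The following is only a minimal sketch: `extract_document_xml()` is a hypothetical helper name (not from any package), and it assumes the .docx follows the usual layout with the body stored at word/document.xml.

```r
# Hypothetical helper: pull word/document.xml out of a .docx.
# unzip() looks at the archive contents, not the extension,
# so no renaming to .zip is needed.
extract_document_xml <- function(docx_path, exdir = tempdir()) {
  stopifnot(file.exists(docx_path))
  # extract only the one member we care about
  unzip(docx_path, files = "word/document.xml", exdir = exdir)
  # path of the extracted file
  file.path(exdir, "word", "document.xml")
}
```

With a real file this would be called as `xml_path <- extract_document_xml("data.docx")`, where "data.docx" is a placeholder name, and `xml_path` could then be handed to `read_xml()`.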
Looking at document.xml, you will see an ordinary XML document.
We can easily make out a table structure with rows and columns. In the simplest cases (which is all I'll cover in this post), where the rows and columns are uniform, it's pretty easy to grab the data:
setwd("~/Documents/Workspace/R-project/Word")
library(xml2) #read xml text
library(textreadr) # read docx text
##
## Attaching package: 'textreadr'
## The following object is masked from 'package:xml2':
##
## read_html
library(knitr)
## FIRST, look at the xml2 package
# read in the XML file
doc <- read_xml("document.xml")
# there is an egregious use of namespaces in these files
ns <- xml_ns(doc)
# extract all the table cells (this is assuming one table in the document)
cells <- xml_find_all(doc, ".//w:tbl/w:tr/w:tc", ns=ns)
# convert the cells to a matrix, then to a data.frame
dat <- data.frame(matrix(xml_text(cells), ncol=4, byrow=TRUE),
                  stringsAsFactors=FALSE)
# if there are column headers, make them the column name and remove that line
colnames(dat) <- dat[1,]
dat <- dat[-1,]
rownames(dat) <- NULL
| Item | Definition/clarification | ||
|---|---|---|---|
| 1. | Date of transplantation | dd/mm/yyyy | |
| 2. | Country where transplantation took place | ||
| 3. | City and hospital where transplantation took place | ||
| 4. | Recipient age at transplantation | Years | |
| 5. | Recipient gender | FORMCHECKBOX Male FORMCHECKBOX Female | |
| 6. | Recipient blood group | FORMCHECKBOX A FORMCHECKBOX B FORMCHECKBOX AB FORMCHECKBOX 0 | |
| 7. | Status of the recipient on your centres waiting list when he/she travelled for transplantation abroad | FORMCHECKBOX Active on the waiting list FORMCHECKBOX Not active on the waiting list, but treated at your Centre FORMCHECKBOX Not active on the waiting list, and not treated at your Centre FORMCHECKBOX Other (please specify)_______________ | |
| 8. | Referral of the recipient by your Centre or a Centre in your country for transplantation abroad | Specify if the patient was referred by your Centre or a Centre in your country for transplantation in another country. Referral should be understood as the establishment of a direct contact between the Centre of origin and the Centre where the transplantation procedure would take place in order to ensure transfer of medical records and continuity of care. Referral should NOT be understood as a simple recommendation to travel for transplantation abroad without any further engagement or contact between the Centre of origin and the Centre where the transplantation procedure would take place. | FORMCHECKBOX Yes FORMCHECKBOX No |
| 9. | Reason for referring the recipient for transplantation abroad | Complete only if the answer to question #8 is Yes. | FORMCHECKBOX Lack of transplant programme in home country and established official bilateral agreement with the country where the transplantation procedure would take place FORMCHECKBOX Lack of transplant programme in home country in the absence of an established official bilateral agreement with the country where the transplantation procedure would take place FORMCHECKBOX Double citizenship of recipient FORMCHECKBOX Other reason (please specify) _______________ |
| 10. | Country(ies) of legal citizenship/residency of the recipient | Citizenship: _______________Residency (if different): _______________ | |
| 11. | Donor type | According to the World Health Organization classification, a living donor has one of the following relationships with the recipient: RelatedA1. Genetically related: 1st degree genetically relative: parent, sibling, offspring2nd degree genetically related: e.g. grandparent, grandchild, aunt, uncle, niece, nephew.Other than 1st or 2nd degree genetically related: e.g cousinA2. Emotionally related: spouse (if not genetically related; in-law; adopted; friend.Unrelated: not genetically or emotionally related. | FORMCHECKBOX Deceased FORMCHECKBOX Living - genetically related 1st degree FORMCHECKBOX Living - genetically related 2nd degree FORMCHECKBOX Living - genetically related other than 1st or 2nd degree FORMCHECKBOX Living - emotionally related FORMCHECKBOX Living - unrelated FORMCHECKBOX Not available |
| 12. | Donor age | Years. | Please specify if not available. |
| 13. | Donor gender | FORMCHECKBOX Male FORMCHECKBOX Female FORMCHECKBOX Not available | |
| 14. | Country(ies) of legal citizenship/residency of the donor | Citizenship: _______________ Residency (if different): _______________ FORMCHECKBOX Not available | |
| 15. | Donor blood group | FORMCHECKBOX A FORMCHECKBOX B FORMCHECKBOX AB FORMCHECKBOX 0 FORMCHECKBOX Not available | |
| 16. | Information on the Transplant Team | Specify if your Centre has information available on the Transplant Team (contact details) who performed the transplant procedure abroad | FORMCHECKBOX Yes FORMCHECKBOX No |
| 17. | Quality of medical report on transplant procedure provided to the patient at hospital discharge after transplantation | A complete report should contain at a minimum information on: hospital where transplantation took place (with contact details of the transplant team), date of transplantationdonor characteristicsrecipients post-operative complications recipients treatmentIf any of this information is missing, please describe the report as incomplete. | FORMCHECKBOX Complete report FORMCHECKBOX Incomplete report FORMCHECKBOX No report available |
| 18. | Date of last available follow-up | The evolution of the transplant recipient is assessed at 1 year 1 month after transplantation. dd/mm/yyyy | |
| 19. | Functioning graft (censored for death) | Please specify if the patient had a functioning graft at 1 year 1 month after transplantation. If the patient died with a functioning graft, please respond Yes.Skip to question # 22 if the answer is Yes. | FORMCHECKBOX Yes FORMCHECKBOX No |
| 20. | Date of graft loss | Complete only if answer to question #19 is No. dd/mm/yyyy |
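The `ncol=4` in the extraction code is hard-coded for this particular questionnaire. As a rough sketch of how the column count could be read from the document instead, the snippet below counts the `w:tc` cells in each `w:tr` row and stops if the rows are not uniform. The XML string is a made-up two-column stand-in for word/document.xml, and `cells_per_row` is a name introduced here purely for illustration.

```r
library(xml2)

# Made-up miniature of word/document.xml (same w: namespace, 2 columns)
xml_txt <- paste0(
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  '<w:body><w:tbl>',
  '<w:tr><w:tc><w:p><w:r><w:t>Item</w:t></w:r></w:p></w:tc>',
  '<w:tc><w:p><w:r><w:t>Definition</w:t></w:r></w:p></w:tc></w:tr>',
  '<w:tr><w:tc><w:p><w:r><w:t>1.</w:t></w:r></w:p></w:tc>',
  '<w:tc><w:p><w:r><w:t>Date of transplantation</w:t></w:r></w:p></w:tc></w:tr>',
  '</w:tbl></w:body></w:document>'
)
doc <- read_xml(xml_txt)
ns <- xml_ns(doc)

# count the cells in every row of the table
rows <- xml_find_all(doc, ".//w:tbl/w:tr", ns = ns)
cells_per_row <- vapply(
  seq_along(rows),
  function(i) length(xml_find_all(rows[[i]], "./w:tc", ns = ns)),
  integer(1)
)

# the matrix trick only works when every row has the same number of cells
stopifnot(length(unique(cells_per_row)) == 1)

# same extraction as before, with the column count taken from the data
cells <- xml_find_all(doc, ".//w:tbl/w:tr/w:tc", ns = ns)
dat <- data.frame(matrix(xml_text(cells), ncol = cells_per_row[1], byrow = TRUE),
                  stringsAsFactors = FALSE)
```

If the check fails, the table most likely contains merged cells, and filling a matrix row by row would shift values into the wrong columns.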
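# NB: system.file() only searches inside installed packages, so for a file that
# lives in the working directory it returns "", which is why the call below comes back empty
doc2 <- system.file("Questionnaire data collection for hospitals.docx")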
read_document(doc2)
## character(0)
doc2