Text mining is the process of extracting useful information, patterns, or knowledge from unstructured text.
It consists of three stages:
1. Data collection: text is often obtained with optical character recognition (OCR), a technology that converts images of text into editable text. OCR is also known as text extraction from images.
2. Data exploration: graphical or visual representation of the data to support its interpretation. The most common methods are sentiment analysis, word clouds, and topic modeling.
3. Predictive analysis: statistical techniques and models used to predict future outcomes. The most widely used models are random forests, neural networks, and regression models.
#install.packages("tidyverse") # Data wrangling
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#install.packages("tesseract") # OCR
library(tesseract)
#install.packages("magick") # Read images (PNG)
library(magick)
## Linking to ImageMagick 6.9.10.23
## Enabled features: fontconfig, freetype, fftw, lcms, pango, webp, x11
## Disabled features: cairo, ghostscript, heic, raw, rsvg
## Using 16 threads
#install.packages("officer") # Word (Office) documents
library(officer)
#install.packages("pdftools") # PDF
library(pdftools)
## Using poppler version 0.86.1
#install.packages("purrr") # map(): apply a function to each element of a vector or list
library(purrr)
#install.packages("tm") # Text mining
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
##
## The following object is masked from 'package:ggplot2':
##
## annotate
#install.packages("RColorBrewer") # Color palettes
library(RColorBrewer)
#install.packages("wordcloud") # Word clouds
library(wordcloud)
#install.packages("topicmodels") # Topic models
library(topicmodels)
#install.packages("ggplot2") # Plots (already attached by tidyverse)
library(ggplot2)
imagen1 <- image_read("imagen1.PNG")
texto1 <- ocr(imagen1)
texto1
## [1] "Linear regression with one variable x is also known as univariate linear regression\nor simple linear regression. Simple linear regression is used to predict a single\noutput from a single input. This is an example of supervised learning, which means\nthat the data is labeled, i.e., the output values are known in the training data. Let us\nfit a line through the data using simple linear regression as shown in Fig. 4.1.\n"
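`texto1` is a single string in which tesseract keeps the printed line breaks (`\n`), even in the middle of sentences. If you want running prose before pasting the text into Word, a minimal base R sketch like the one below can unwrap those breaks. `unwrap_ocr()` and `split_sentences()` are hypothetical helpers and are not part of tesseract. The sentence splitter is deliberately naive: it will also break after abbreviations such as "Fig.". Rejoining `-\n` may also merge genuine hyphenated compounds.

```r
# Hypothetical helper: turn OCR output with hard line breaks into running text
unwrap_ocr <- function(x) {
  x <- gsub("-\n", "", x)          # rejoin words hyphenated across lines (may merge real compounds)
  x <- gsub("\\s*\n\\s*", " ", x)  # replace the remaining line breaks with a single space
  trimws(x)                        # drop leading/trailing whitespace
}

# Hypothetical naive sentence splitter: breaks after ".", "!" or "?" followed by whitespace
split_sentences <- function(x) {
  strsplit(x, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
}

ocr_text <- "Simple linear regression is used to predict a single\noutput from a single input.\n"
unwrap_ocr(ocr_text)
split_sentences("One sentence. Another one!")
```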
doc1 <- read_docx() # Create a blank Word document
doc1 <- doc1 %>% body_add_par(texto1, style = "Normal") # Add the OCR text as a paragraph
print(doc1, target = "texto1.docx") # Save the Word file to disk
#library(tesseract)
#imagen2 <- image_read("imagen2.PNG")
#eng <- tesseract("eng") # English engine (the default)
# A custom model directory can be passed via datapath, e.g. tesseract("spa", datapath = "/path/to/custom/directory")
# Download the Spanish language model (only needed once)
#tesseract_download("spa")
#texto2 <- ocr(imagen2, engine = tesseract("spa"))
#texto2
#doc2 <- read_docx() # Create a blank Word document
#doc2 <- doc2 %>% body_add_par(texto2, style = "Normal") # Add the OCR text to the document
#print(doc2, target = "texto2.docx") # Save the Word file to disk
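# pdf_convert() renders each page to a PNG file (72 dpi by default; a higher dpi such as 300 often gives better OCR), then map() runs ocr() on each image
pdf1 <- pdf_convert("pdf1.pdf") %>% map(ocr)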
## Converting page 1 to pdf1_1.png... done!
## Converting page 2 to pdf1_2.png... done!
## Converting page 3 to pdf1_3.png... done!
## Converting page 4 to pdf1_4.png... done!
## Converting page 5 to pdf1_5.png... done!
## Converting page 6 to pdf1_6.png... done!
## Converting page 7 to pdf1_7.png... done!
## Converting page 8 to pdf1_8.png... done!
pdf1
## [[1]]
## [1] "RELAY assum eae so Sneha Tours oaee VOR Sek Se ii\n‘aa bee ‘cone\nNovenber 16,2015\nDear Pili Water System Overs Operatr\nThe Misou Ste Pie ator (SPI inthe proce o implementing new\nIkra The Opn (5 LN wl pants he uboeee semen\nCarly, prove tangy eke portal ener ae ne mE\nine vil pidctnwcrel oectsg paling ReaeeraaecaciecTon ea ar\nMissour Dstt Naural Recer (HDA) Sot Bae ee oe ee\n(SDWIS), SDWIS's We compe sjac MDNR wes ene SE on\nCaetand rsa rte Teter ans lntabcn ane a th\nNEW SAMPLE BOTTLES:\nDeiming i Aug 203+ e NSPHL Regan sing le sample ate fr ater ater wtng\nTHs be tn eink wep seston ers ens he pare oa Poe ae\nSample veleme MUST e win te nr eta G00, nL eee\nNEW SAMPLE INFORMATION FORMS:\nTe onal sarge rman “ada nd ott ety yes is being\n\"plced by le Environmental Sample Calsctoe Fos Aree a aad he\nCasi Sm tate ion\nTS Fam sz sexpanad oe sg 8\" 1\" shel of paper. Te for so logerina tle\nCat coy oem Yuna occ optomeay eer nea oe eee a\nGessimt ofa pic wats an onan sofercaeor ee es Ne MONE\nmallee beaten ee\n2, Thefbm pred ye OE wl eprops wit sor Pie Water Supe\nee\nsae,\nwont hes tt pt npn\nPea nee Sn ic san ledcitakctaechaad\n"
##
## [[2]]
## [1] "Contract operators will be provided with forms for all the supplies they operate, Blank forms will\nbe available for MDNR Regional Office staff use,\n\n3. The form requires all requested information to be printed by the collector. There are no longer\ncheek boxes for Sample Type or Repeat Location,\n\n4, Facility 1D, Sample Collection Point ID and Location for the sampling site MUST be\nprovided by the collecior. This information is available from your MDNR approved PWS\n‘sampling plan. MDNR will be providing all public water systems with a current copy of thei\napproved sampling plan. This information is required by SDWIS and is used by MDNR to\ncensure regulatory compliance requirements have been met, Failure to complete this information\n‘on the sample collection form may result in a non-compliance report from MDNR.\n\n5. A Collector Signature line has been added. The sample collector must sign the form to attest the\ninformation provided is accurate to the best oftheir knowledge.\n\nThe MSPHL will begin shipping the new forms to public water systems in late November or early\nDecember. Please begin using the new forms December 16, 2015. Discard all the old forms (“cards”)\nat that time.\n\nNEW SAMPLE INSTRUCTIONS:\n\n‘Sample instructions have been revised to include changes tothe bottle and sampling form. The\ninstructions include detailed information on how to collect the sample using the new bottle, how to\ncomplete the new sample collection form, how to best ship samples to the MSPHIL using the free\nMSPIIL courier system, and how to register for the new MSPHL web portal. A capy of these\ninstruetions is attached.\n\nNEW WEB PORTAL FOR RESULTS REPORTS,\n\nThe OE LIMS provides a web portal that may be used by systems to view and print their test result\nreports, check status of samples, download sample information into Excel, and receive automated emails\nwhen samples are received atthe laboratory, and when sample results are ready to be viewed. 
For\ninformation on how to gain access to this portal, please contact Shondra Johnson, LIMS Administrator\nat Shondra Johnson @health,mo,gov ot at 573-751-3334,\n\nIMPLEMENTATION DATES:\n\n‘The MSPHL intends to implement the OpenELIS LIMS on December 1, 2015. There will be a two\n‘week testing period in which laboratory staff will run the new LIMS in conjunction with our current\n‘manual, paper-based system to ensure the OE LIMS is operating properly. You may continue to submit\n‘samples as you currently do, using the old sample information card, throughout this time,\n\n‘On December 16, 2015, the MSPHL plans to “go-live” with the new OF LIMS, Samples submitted\nafter that date should be submitted on the new Environmental Sample Collection Form. At that time, the\nMSPIIL Test Results Web Portal will also be available to those systems that have been granted access,\nThe MSPHIL and MDNR understand that there will be alot of changes to a system that has been in place\nfor many years. The MSPHL is excited about the added beneiits from this new system, and we ask for\n‘your patience as we implement the OpenELIS LIMS at the Missouri State Publie Health Laboratory.\n"
##
## [[3]]
## [1] "Iyou have any questions, please contact the MSPII. Environmental Bacteriology Unita $73-751-\n3334. You may also contact your MDNR Regional Ofie for alitionl information on sample\ncollection\n\nOnce again thank you for your patience and undertaning as we implement these chang.\nPoteet, R Yanennn\n\nPatsck R- Shannon\n\nManager, Environmental Bacteriology Unit\n\nMissouri Department of Heath and Senior Series\n\nSiate Publi Heath Laboratory\n\n101 North Chestnat St\n\nP.O. Box $70\n\nJefferson City, MO 65102\n\nPhone: 573-751-3334\n\nEmail: Pa Shanson@healthmo.gov\n\nWeb: ys heath gox/Lab\n"
##
## [[4]]
## [1] "overt 98 ER | RePoRr To: UL To:\nPages in Order: 1 of 1 Jas we 2 wom\ntl ct\ng_| terete ate ct reset\n£5\nEs cotesetout Cet Tr\n88\n5° pus Noro Foc 08\nae :\n2 saree: sino coe\n5 vom cate\ncetacean Speco aon\nCole Sooe ty: Tes\nQ\n3\n%\n= 4,\nSe OEY\nge 88\ngE ogy\nESB P8| recmeany et\nee ee Evidene of Cooling: Yes No\ngulg82}| amen —\"\nied\nSaeesral\nlim |\" suo 0 EL WITHIN THIS BOX\n"
##
## [[5]]
## [1] "AD .\n(GED SAMPLE COLLECTION INSTRUCTIONS | (OES) mssour:\nA228 | puauconnnins waren or couronm sacreni annus | [BIE)| gerarrme o-\nS BN ertentetscunces\nThis sample kt and colletion method is for pubic drinking water regulatory complianee and special samples.\nOnly samples eoleted in botes supplied bythe Missouri State Public Health Laboratory (MSPHL) an collected\nin agcordance with these instructions willbe accepted for sling PLEASE READ THIESE INSTRUCTIONS\nCOMPLETELY BEFORE COLLECTING SAMPLES.\nSample Containers\nSample bots from the MSPHL contain chlorine eutalizer hati present in powder or Hiquid form, The bates\nare stele and ready for use when shipped. Do not rinse the contents from the Container amd Keep the botle\nclosed unlit sto be filed.\nShrink Wrap Seal\nRemave the sa by pulling down onthe redstip\nand pealing shrink rap from both the eap and j\nbole: star all shrink wrap. Do not attempt to ;\nreseallid with shrink wap stl attached Fa ; =\nCe a. fine\nTwo Fi Mines ; >=\n: in. fitne\nFill the bottle uni the water sample evel :\nBETWEEN THE TWO LINES, Place the boule\nona Tevel surface to check the sample level }\nSamples below the 100 mL (lower) line WILL. pro =\nNOT BE TESTED duc to insufficient sample v ~\n‘lume. Samples above the 120 mL Capper) ine Perera |\nWILL NOT BE TESTED due to overtiled :\nboatle. Technical protocol and EPA requirements 5\ndictate that bottles must have suicent ai space\nto ad testing reagents and to mix the sample\nrropedly\nikl aia yr dh Ha For More Information, please contact\nour off water unt the sample volume is between rede bercimk Hel cpces teen\nthoes ns bef sipingte MSPHL. MPL Mist Dear of Heath and Sein Sie\nWILL NOT adjust sample volume once the ee\nsimple i renetved a the at 101 North Chestnut St, P.O. 
Box 370\n; : Jefferson City, MO 6si02\ninformation on the bottle, DO NOT WRITE ON ms Reet oiicaey\nTHE BOTTLE, Please complete a sample ana | ae aloe\ninformation frm for each sample submited for ices | aaltetataeta\ntesting. DATE AND TIME OF SAMPLE. Websit sswwuhealthmogov/Lab\nCOLLECTION and the BOTTLE NUMBER\n(Grom sicker on bottle) ARE REQUIRED. A form\nforeach bots included inthis sample kit\nrage tot 1 28 Pb water m-2015)\n"
##
## [[6]]
## [1] "Bacteriological Sample Collection Procedures\nAssemble ll ofthe sampling supplies. Before you begin, wash your hands\nthoroughly before handling ups. Goto thc sap caess) seed\n‘n your Miso! Department a Matra Reseuseen HEN) srooten\n( sampling site plan, spl shade ake fom clean seeegh nse old\ni ‘water faucet if possible. Avoid drinking fountains, leaky faucets, hoV/cold\n| tnising faucet an rostproot yard hydrants sige wo praca to =\ni Stevie thse nares I oil, remune any aerators stant hoses\n* ® _hacare present becase ley may harbor ester, Flow the procedures below\nshen collecting the sample nsracton fr completing te envfonent c\n“Shing fomare on ne flowing oes fy\na |, Open the cold water tap for about 3 minutes before collecting the sample.\nThs sold aunty Mash he water ine of any dtr Seeeeeoe\n{ 2. Flamesterize the tap andor chemical dif the tap-Do not Mame- a\n¥ Serie tt ap plat or iraerators are attache Ds py\nGY thong ming toh can oie hip nih meg of 0%\nhouse hol blac (NaOCt and Sie tap wat. Tae extreme ce With song\n@® —___ Hewh oxidizing) Soluion\n5, Flesh the ap foram aitiona 3} minutes wth cold water, and then reduce\ntoa genileftow to about the with ofa pene Do ot case he wate os\nonce sou ave saredsampingas this cold dgdgecontaminansntietap,\n4, Remove the plastic shrink wrap seal by pulling down on the red strip and oO\npean the strink rap fom both the cap and bale isan the aa\ntrap. Do nota fo esa ti wih ss wrap slated\ntl 5. Grasp cap along top edge and remove carefully, Do not touch the inside\n‘ith your ges Hid het noc han thea nto, Dot\nsee ote cop with yone fers permite ane to touch ene\necETESEN.,, 6. Hoi the totes ht wae entering the bot wl nt come in contact with\nJour hands ore ous ofthe bot\nTl 7. flee ott unt the water sap ves BETWEEN THE TWO\nFile LINES on the bottle (100 — 120 ml). 
Preferably, the sample level should be at Hf\nor js sly above the (00 ne Sample eel tow the 100 mower) ==\nline WILL NOT BE: TESTED dct maitent sample volume, Sample\n: levelsabave the 120 appr) ie WILL NOTBE-TESTED dc ta\n© ‘overfilled bottle. If the bottle is overfilled, you may pour off any excess water '=xeweNa reer\niD tthe sale level bemeen he won, Plath capone bt ad\nscrew it dow gy\n& Filout the Misnur Deparment of Heath and Senor Services (DIISS)\nGE_ \"Sicha res a) ronment Sangean\nsing waterproof ink See attached document for fstrctionson propery >=\nComping the sample clei form and fr shipping atactons\n4. Fordngetaopley, nay fl the sunplcallecsonters inns aadad\ntetera, sound th ote nd pace the siping on Forme\n© _Hongside he sample Venede, we buble poco ed paps oe\nDo'ot se sveded paper San ton wth sige spo sip age ad\nMiche aes bal ote oy oe Do\nge 2of4 a9 2¢ ae wae (902015)\n"
##
## [[7]]
## [1] "INSTRUCTIONS FOR COMPLETING ENVIRONMENTAL SAMPLE COLLECTION FORM\nPublie Drinking Water Bacterial Analysis\nPRINT LEGIBLY using water proofink. A sandaditk pen i saicient, Complete ALL sample information tines onthe for,\nSome scetions ofthe form may already be completed by the laboratory computer system ave te forms ae primed. Te make\ncorrections, please drav single line through the inacewrate information and print the corrected information beh i The\nSections ofthe form and dretions for completing cach line ee as follows\n‘Order For Misout Slate Public Heath Lab (MSPHL} purposes only, Pages in Order and Containers in Order indicate number of|\nfms and sample bates shipped in the sample Kit onder.\nREPORT TO: Public water systems name ad shipping address an fle with Misiourt Department of Natural Resources (MDNR)\nPlease review and crest if necessary. Real reports wil be alle this adress,\nBILL TO: Section defaulted tothe MDNR. There ae no charg for pubic water fesing atthe MSPHL.\nRequested Analssis Test\n“This setion wil sate FUBLIC DRINKING WATER BACTERIAL ANALYSIS, Iit doesnot, you may have the wrong eollsction\nfom. Please conat the MSPHL or MDNR forthe proper fom. Donat wse forms Irom a local gouty health agene as these Forms ae\nFor private well water samples, Your MDNR Regional Office can povide Bink Tors fer yout use.\n‘Complete or corres the following information:\nAll Fines are considered required information. Failure to complete line may result in an invalid sample.\n{Cotlectd Date: Enter the date of sample coletion in the format V¥YY-MM-DD. Use ds fr year and 2 digs for month and de,\nNovember |, 2015 would be writen as 2015-11-01\nCollected Time: Enter the time of sample collection vsing 24-hour miter format hmm.\nPWS ID: If bank enter your T-gt Publi Water System ID number as asignad by MDNR (MO#HHEH,\nFacility 1D: Defaulted to DS (Distribution System) for routine samples. 
IFsubmitng a sample type other than Routine, enter the Facility\n1D number fom your system's MDNR approved sample sit plan ar example DS#, WL, WT).\n‘Sample Type: Etter one o the following options\nRoutine ~ Regular monthly monitoring simples.\npeat — A series of 3 or 4 repeat simples (i you only take 1 routine per month) mst be taken fr each routine simple hat tests\n‘osive (Present) for elitr bacteria All repeats must be taken onthe te dy, within 2 hou of boing nied ofthe\n«olf positive sample, it locations are based on te approved site sampling plan. Typrelly these samples wl consi of oe\nfom the site of th orignal unsafe sample locaton, one within S service comestonsupsteam. oe within service connections\n<ovesteam, and one fom location specified or approved by MDNR. Ir your system fa prod water stem serving es than\n1.000 people without 4 Tog vis inactivation, one repeat sample the fourth repeat) may Be ellected fom the sure el prior to\ntteaiment See Repeat Location blow\nReplacement ~All sanples which are not tested because they war invalid incomplete information, outdated, broken in as,\nfore, ete, mst be replaced with sins sample fom the sme location within 24 bours of being noted.\nSoureeWell-Ifyour system isa ground vate system without chlorine contac time (1 log views inetvation o mova, one\nSample must be collected rom each well source, prior to any treatment sctive at the time a the postive sample).\nSpeciat~ Any sample that does not count for compliance. These may include samples to check disinfect practices om eps\nnew constriction ofr seasonal public water sytens prior to serine Watt vo the pb\n‘Sample Cotetion Point 1D: Ener the sampling point ID number ftom yor ystems MBNR approved sample sit plan, This numbers\n‘equted. DO NOT LEAVE BLANK, Ifyou have questions nbout your Sample Clletion Point ID, plese contact the MDNR.\n‘Location: Enter the aes one of the collection foeaton associated with he Sample Callecton Pont ID above. 
Impurtat Not\n‘The Location sed to the Sample Collection Point 1D trom the appreved site sampling plan and will be the location printed on\nthe final analysis report. If the focation entered onthe collection form doesnot match te al Fepor, contact MDNR)\nCollector: Enter your last ame, Sst name\nCollector Phone? Ester your 10g dy tine phone number\nSample Category: This wll always be Bacterial an alr filled out or you\nRepeat Location: ithe sample type above i Repeat, enter to repeat location for his sample: upstream, dowesican, orginal, source ot\n‘other. other, please describe the location,\nBottle Numer: Enter the number from the label onthe bt. This is sed to match coleton forms to samples.\nree Chlorine: Ener the fee chlorine test evel in mg. (i your stem fchlornated).\n‘Total Chlorine: Ener the total eorne tex level in mi. i your system is eorinate),\nCollector Signature: By signing you atest that the information provided is acura fo thebestof your knowles.\n‘County: Ever the county name for the colecton point i is nt are filled out far Jou\nAlloticr seston of the Environmental Sample Colleton Form ate fr MSPHL use onl. if you have any questions, please cont the\nMSPHL Environmental Bacteriology Unit at (573) 731-3334 or your lcal MDNR Regional Oe (se West page for phone number),\nPage 3 of wn 34 Public Water (810-2025)\n"
##
## [[8]]
## [1] "Shipping Instructions\nPer U.S. Environmental Protection Agency requirements, public water samples must be recived by the laboratory and tse\nwithin 0 hours ofthe date and time of collection. The MSPHL and MDNK recommend you ue the tree Department of Health\nand Senor Services (DHSS) contract courier for overnight delivery to the MSPHL, Ths coer picks up st tow local pbs heh\nagency offices and hospital (Note: Noll hosp wll accept water samples for courier pick up). For sale dropoff locations and\ntines, please got p/w ath mo govlabeourierseriecs php and elie onthe inerctive mop ote listing of drop of leatons\nby aunty © you may calle MSPHL courier liaison at (573) 731-1830, othe MDNR Public Drinking Water Branch PDWB) at (373)\n520-1124\nPlease wote the couriers allowed to plekup samples within one hour ofthe scheduled time (before or after). The evs pickup\n{ine a 10:30 aan. To ensure your samp meet the transit ime requirement of 30 hows trypan! that yo collect your samples in\n‘he morning and have them dropped of at the courier pickup point one hor roe the sled ine,\nUse ofthe US, Postal Service or other commercial cations such ax Fe Ex oF UPS wil require aditons charges and may not mee the\n30 hou ans time reqirement\n‘Samples should not be en route tothe laboratory ver a weekend oF state holiday (New Yes Day. Matin Lather King Day,\nLincoln’ Bithay, Washingtons Birthday, Truman's Binhéay, Memorial Day, Inependence Day, Labor Day, Columbus Day. Veteran's\nDay, Thanksaiving Da. 
ard Christa)\nPublic water supplies may use the new MSPHL Test Results Web Portal orriove preliminary test esulscnfne Fr infrmation on\nhow to register asa user fr the web portal ado recive ena notifications, please contac the MSPHL LIMS Administ st\nshondeajnson@halhno gov or eal 373-751-3334, ‘These preliminary ts rel are fr informational purposes onl Oficial test\nFesulis are available on-line within 2 or 3 business days atthe MDNR Drinking Water Watch website hip:/darna gos DWWL\nInaddiin, de oficial bterilogieal sample reports wil be mailed ty MDNR within or 5 busines dys\nAdditional sample hotles canbe ordered ome t p/w el mo. g0v abl speinensarms php or by calling the MSPHL\n(Centra Services Unit at (575) 751-4830,\nSomatines i spite of taking all of the precautions you may get cll orn MDNR or eels by mai notifying you hat eoiform oe\ncoli bacterin ae present in your water. You will be sien specifi instutions tht may include collection of repent sample to confirm\nthat he fst routie sample was not sampling eror Please cll he MDNR Regional fice sland they will cas the procedure\nwith you, See cone infomation below.\nor more information abe public water systems, contiet the MDNR Public Drinking Water Branch a (573) 751-5331 or your MDNR.\nRegional fice (counties within each region ae sed at tp/dne mo, gov/gpin/indes hn) o visit warw dna gon pps\nindex hn\nMissouri Department of Naterl Resouress\n‘Division Enviromental Cay\n‘Wa Protect Progra\nPublic Deaking Water Branch\nFO. hox 8\nKansas City Regional Offie Northeast Regional Ofc Southest Reionat Oc\nDpaien of Nata Resour Departs of Nera Resouces Deparment of Natal Reoucss\n\"aH NE Calter Roa 70) Pape Dre 7138 Noch Westwood\nsss Simo MO ete 470 sco 10 #3689-190 Poplar Ba, MO 6901-120\nthey 381-070 ‘iy 38-600 (373) 03790\nSouthwest Regina Office St. 
Lous Regma Offce\nDepartment of Natl Reuss Deparnent of Ratu Resouces\n2010 Wes Woodland 7548 Soa Linder ute 210\nSpring, MO esa 912 ‘tau MO 133\n‘si7) 991-0300 (G18) 46.2960 |\nPage a ofa 1s 34 Public water (830-2035)\n"
text <- readLines("http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt")
corpus <- Corpus(VectorSource(text)) # Each line of the text becomes one document
corpus <- tm_map(corpus, content_transformer(tolower)) # Convert to lowercase
## Warning in tm_map.SimpleCorpus(corpus, content_transformer(tolower)):
## transformation drops documents
corpus <- tm_map(corpus, removePunctuation) # Remove punctuation
## Warning in tm_map.SimpleCorpus(corpus, removePunctuation): transformation drops
## documents
corpus <- tm_map(corpus, removeNumbers) # Remove numbers
## Warning in tm_map.SimpleCorpus(corpus, removeNumbers): transformation drops
## documents
corpus <- tm_map(corpus, removeWords, stopwords("en")) # Remove English stopwords (frequent words that carry little meaning)
## Warning in tm_map.SimpleCorpus(corpus, removeWords, stopwords("en")):
## transformation drops documents
#corpus <- tm_map(corpus, removeWords, c("dream", "will")) # Optionally remove specific words
inspect(corpus)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 46
##
## [1]
## [2] even though face difficulties today tomorrow still dream dream deeply rooted american dream
## [3]
## [4] dream one day nation will rise live true meaning creed
## [5]
## [6] hold truths selfevident men created equal
## [7]
## [8] dream one day red hills georgia sons former slaves sons former slave owners will able sit together table brotherhood
## [9]
## [10] dream one day even state mississippi state sweltering heat injustice sweltering heat oppression will transformed oasis freedom justice
## [11]
## [12] dream four little children will one day live nation will judged color skin content character
## [13]
## [14] dream today
## [15]
## [16] dream one day alabama vicious racists governor lips dripping words interposition nullification one day right alabama little black boys black girls will able join hands little white boys white girls sisters brothers
## [17]
## [18] dream today
## [19]
## [20] dream one day every valley shall exalted every hill mountain shall made low rough places will made plain crooked places will made straight glory lord shall revealed flesh shall see together
## [21]
## [22] hope faith go back south
## [23]
## [24] faith will able hew mountain despair stone hope faith will able transform jangling discords nation beautiful symphony brotherhood faith will able work together pray together struggle together go jail together stand freedom together knowing will free one day
## [25]
## [26] will day will day god s children will able sing new meaning
## [27]
## [28] country tis thee sweet land liberty thee sing
## [29] land fathers died land pilgrim s pride
## [30] every mountainside let freedom ring
## [31] america great nation must become true
## [32] let freedom ring prodigious hilltops new hampshire
## [33] let freedom ring mighty mountains new york
## [34] let freedom ring heightening alleghenies pennsylvania
## [35] let freedom ring snowcapped rockies colorado
## [36] let freedom ring curvaceous slopes california
## [37]
## [38]
## [39] let freedom ring stone mountain georgia
## [40] let freedom ring lookout mountain tennessee
## [41] let freedom ring every hill molehill mississippi
## [42] every mountainside let freedom ring
## [43] happens allow freedom ring let ring every village every hamlet every state every city will able speed day god s children black men white men jews gentiles protestants catholics will able join hands sing words old negro spiritual
## [44] free last free last
## [45]
## [46] thank god almighty free last
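Each `tm_map()` call above applies one transformation to every document in the corpus. To show roughly what those steps do, here is a base R sketch on a plain character vector. `my_stopwords` is a small hypothetical stand-in for `stopwords("en")`. The regular expressions approximate tm's `removePunctuation()` and `removeNumbers()` rather than reproduce them exactly.

```r
# Hypothetical mini stopword list standing in for tm's stopwords("en")
my_stopwords <- c("i", "a", "the", "that", "of", "and", "have", "is", "in")

clean_text <- function(x, stopwords) {
  x <- tolower(x)                            # like content_transformer(tolower)
  x <- gsub("[[:punct:]]+", "", x)           # like removePunctuation (ASCII punctuation)
  x <- gsub("[[:digit:]]+", "", x)           # like removeNumbers
  words <- strsplit(x, "\\s+")               # split each line into words
  vapply(words, function(w) {
    w <- w[nzchar(w) & !(w %in% stopwords)]  # like removeWords, also dropping empty tokens
    paste(w, collapse = " ")
  }, character(1))
}

clean_text(c("I have a dream that one day, in 1963...", ""), my_stopwords)
```

As in the `inspect()` output, an empty line stays as an empty document rather than being dropped.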
tdm <- TermDocumentMatrix(corpus) # Term-document matrix: one row per word, one column per line
m <- as.matrix(tdm) # Count of each word in each line
frecuencia <- sort(rowSums(m), decreasing = TRUE) # Total frequency of each word in the whole text, sorted
frecuencia_df <- data.frame(word = names(frecuencia), freq = frecuencia) # Convert the frequencies to a data frame
frecuencia_df
## word freq
## will will 17
## freedom freedom 13
## ring ring 12
## dream dream 11
## day day 11
## let let 11
## every every 9
## one one 8
## able able 8
## together together 7
## nation nation 4
## mountain mountain 4
## shall shall 4
## faith faith 4
## free free 4
## today today 3
## men men 3
## state state 3
## children children 3
## little little 3
## black black 3
## white white 3
## made made 3
## god god 3
## new new 3
## sing sing 3
## land land 3
## last last 3
## even even 2
## live live 2
## meaning meaning 2
## true true 2
## brotherhood brotherhood 2
## former former 2
## georgia georgia 2
## sons sons 2
## heat heat 2
## mississippi mississippi 2
## sweltering sweltering 2
## alabama alabama 2
## boys boys 2
## girls girls 2
## hands hands 2
## join join 2
## words words 2
## hill hill 2
## places places 2
## hope hope 2
## stone stone 2
## thee thee 2
## mountainside mountainside 2
## american american 1
## deeply deeply 1
## difficulties difficulties 1
## face face 1
## rooted rooted 1
## still still 1
## though though 1
## tomorrow tomorrow 1
## creed creed 1
## rise rise 1
## created created 1
## equal equal 1
## hold hold 1
## selfevident selfevident 1
## truths truths 1
## hills hills 1
## owners owners 1
## red red 1
## sit sit 1
## slave slave 1
## slaves slaves 1
## table table 1
## injustice injustice 1
## justice justice 1
## oasis oasis 1
## oppression oppression 1
## transformed transformed 1
## character character 1
## color color 1
## content content 1
## four four 1
## judged judged 1
## skin skin 1
## brothers brothers 1
## dripping dripping 1
## governor governor 1
## interposition interposition 1
## lips lips 1
## nullification nullification 1
## racists racists 1
## right right 1
## sisters sisters 1
## vicious vicious 1
## crooked crooked 1
## exalted exalted 1
## flesh flesh 1
## glory glory 1
## lord lord 1
## low low 1
## plain plain 1
## revealed revealed 1
## rough rough 1
## see see 1
## straight straight 1
## valley valley 1
## back back 1
## south south 1
## beautiful beautiful 1
## despair despair 1
## discords discords 1
## hew hew 1
## jail jail 1
## jangling jangling 1
## knowing knowing 1
## pray pray 1
## stand stand 1
## struggle struggle 1
## symphony symphony 1
## transform transform 1
## work work 1
## country country 1
## liberty liberty 1
## sweet sweet 1
## tis tis 1
## died died 1
## fathers fathers 1
## pilgrim pilgrim 1
## pride pride 1
## america america 1
## become become 1
## great great 1
## must must 1
## hampshire hampshire 1
## hilltops hilltops 1
## prodigious prodigious 1
## mighty mighty 1
## mountains mountains 1
## york york 1
## alleghenies alleghenies 1
## heightening heightening 1
## pennsylvania pennsylvania 1
## colorado colorado 1
## rockies rockies 1
## snowcapped snowcapped 1
## california california 1
## curvaceous curvaceous 1
## slopes slopes 1
## lookout lookout 1
## tennessee tennessee 1
## molehill molehill 1
## allow allow 1
## catholics catholics 1
## city city 1
## gentiles gentiles 1
## hamlet hamlet 1
## happens happens 1
## jews jews 1
## negro negro 1
## old old 1
## protestants protestants 1
## speed speed 1
## spiritual spiritual 1
## village village 1
## almighty almighty 1
## thank thank 1
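`TermDocumentMatrix()` builds a word × document count matrix, and `rowSums()` adds each word's counts across all lines. By default tm also ignores words shorter than three characters (`wordLengths = c(3, Inf)`), which is why "go" and the stray "s" do not appear in the table above. For a small text, the same ranking can be sketched in base R with `table()`. This is an illustration on made-up input, not a replacement for tm's tokenizer. `term_frequencies()` is a hypothetical helper.

```r
# Hypothetical helper: word frequencies across all lines, similar in spirit to
# rowSums(as.matrix(TermDocumentMatrix(corpus)))
term_frequencies <- function(lines, min_length = 3) {
  words <- unlist(strsplit(lines, "\\s+"))    # tokenize on whitespace
  words <- words[nchar(words) >= min_length]  # mimic tm's default minimum word length
  sort(table(words), decreasing = TRUE)       # count and rank
}

example_lines <- c("dream one day", "let freedom ring", "let freedom ring go")
freq <- term_frequencies(example_lines)
freq_df <- data.frame(word = names(freq), freq = as.integer(freq))
head(freq_df, 3)
```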
ggplot(head(frecuencia_df, 10), aes(x = reorder(word, -freq), y = freq)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  geom_text(aes(label = freq), vjust = -0.5) +
  labs(title = "Top 10 most frequent words", subtitle = "\"I Have a Dream\" speech by M. L. King", x = "Word", y = "Frequency") +
  ylim(0, 20)
# Data preparation for the word cloud is the same as for the frequency analysis, from importing the text up to frecuencia_df
set.seed(123) # Fix the random layout so the word cloud is reproducible
wordcloud(words = frecuencia_df$word, freq= frecuencia_df$freq, min.freq=1, random.order=FALSE, colors=brewer.pal(8,"RdPu"))
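# OCR each page of the PDF with the default English model; for Spanish text, map(ocr, engine = tesseract("spa")) may recognize accents better
ocr_results <- pdf_convert("eso3.pdf") %>%
  map(ocr)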
## Converting page 1 to eso3_1.png... done!
## Converting page 2 to eso3_2.png... done!
## Converting page 3 to eso3_3.png... done!
# Combine the OCR text of all pages into a single string
combined_text <- unlist(ocr_results) %>% paste(collapse = "\n")
# Create an empty Word document
doc2 <- read_docx()
# Add the combined text to the document
doc2 <- doc2 %>% body_add_par(combined_text, style = "Normal")
print(doc2, target = "textoIT.docx")
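The Spanish text contains punctuation outside the ASCII range, such as ¡, ¿, curly quotes and long dashes. By default tm's `removePunctuation()` strips only ASCII `[:punct:]` characters, so marks such as `‘` can survive in the cleaned corpus. The function also accepts `ucp = TRUE` to use Unicode punctuation classes instead. Below is a base R sketch of the same idea. It assumes a UTF-8 string and uses PCRE (`perl = TRUE`) to remove the Unicode punctuation class `\p{P}` and then collapse the leftover line breaks. `remove_unicode_punct()` is a hypothetical helper.

```r
# Hypothetical helper: remove Unicode punctuation (general category P) and normalize whitespace
remove_unicode_punct <- function(x) {
  x <- gsub("\\p{P}+", "", x, perl = TRUE)  # ¡ ¿ ‘ ’ « » and dashes all belong to category P
  x <- gsub("\\s+", " ", x, perl = TRUE)    # collapse spaces and line breaks into one space
  trimws(x)
}

# "\u2014" is a long dash, "\u00a1" is an inverted exclamation mark, "\u2018" is an opening quote
sample_text <- "\u2014\u00a1mierda!\n\n\u2018oscura reluciente"
remove_unicode_punct(sample_text)
```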
it <- Corpus(VectorSource(combined_text)) # combined_text is a single string, so the corpus has one document
it <- tm_map(it, content_transformer(tolower)) # Convert to lowercase
## Warning in tm_map.SimpleCorpus(it, content_transformer(tolower)):
## transformation drops documents
it <- tm_map(it, removePunctuation) # Remove punctuation
## Warning in tm_map.SimpleCorpus(it, removePunctuation): transformation drops
## documents
it <- tm_map(it, removeNumbers) # Remove numbers
## Warning in tm_map.SimpleCorpus(it, removeNumbers): transformation drops
## documents
it <- tm_map(it, removeWords, stopwords("spa")) # Remove Spanish stopwords (frequent words that carry little meaning)
## Warning in tm_map.SimpleCorpus(it, removeWords, stopwords("spa")):
## transformation drops documents
inspect(it)
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 1
##
## [1] alli persiguiendo barco papel lado izquierdo witcham street corria\ndeprisa agua ganaba barquito sacando ventaja oy rugido profundo \nvio cincuenta metros més adelante colina abajo agua cuneta precipitaba\ndentro boca tormenta atin continuaba abierta largo semicirculo oscuro\nabierto bordillo acera mientras george miraba rama desgarrada corteza\n‘oscura reluciente hundié aquellas fauces alli pendié momento luego desliz\nhacia interior hacia alli encaminaba bote\n\n—imierda —chillé horrorizado\n\nforz paso momento parecié iba alcanzar barquito pies\nresbal george cayé despatarrado despellejandose rodilla grito dolor \nnueva perspectiva altura pavimento vio barco giraba redondo dos veces\nmomentaneamente atrapado remolino desaparecer\n\n—imierda mas mierda —volvié chilar estrellando pho pavimento\n\n dolio eché sollozar iqué manera tan estlipida perder barco\n\n levanto caminar hacia boca tormenta alli dejé caer rodillas mirar hacia\nelinterior agua hacia ruido hueco hlimedo caer oscuridad sonido daba\nescalofrios hacia pensar \n\n—ieh\n\n exclamacién arrancada cordel retrocedis\n\nalli adentro habia ojos amarillos tipo ojos siempre imaginaba verlos\nnunca oscuridad sétano £s animal —pens incoherente— animal alo\n‘mejor gato quedé atrapado\n\n modes echar correr habria corrido dos segundos \ntablero mental hecho cargo espanto produjeron dos ojos amarillos \nbrilantes sintié aspera superficie pavimento bajo dedos fina lamina agua fria\n corria alrededor vio si mismo levantandose retrocediendo entonces \ntna voz voz perfectamente razonable bastante simpatica hablé dentro \nboca tormenta\n\nhola george dio\n\ngeorge parpaded volvié mirar apenas podia dar crédito vela sacado\n cuento pelicula sabe animales hablan bailan si \n diez afios mas habria creido viendo tenia dieciséis afios sino\nseis\n\n boca tormenta habia payaso uz distaba ser buena basté \ngeorge denbrough seguro vela payaso elcirco tele\nparecia mezcla bozo clarabell hablaba haciendo sonar becina 
howdy\ndoody sabados mafana bufalo bob unico entendia clarabell \nsiempre hacia reir george cara payaso metido boca tormenta blanca tenia\n‘cémicos mechones pelo rojo cada lado calva gran sonrisa payaso pintada\n\nalrededor boca si george vivide afios después habria pensado ronald\nmcdonald bozo clarabell\n\n payase tenia mano mangjo globos colores tentadora fruta\nmadura\n\n barquito papel george\n\n—quieres barquito georgie payaso sonrela\n\ngeorge sonri podia evitarlo aquella sonrisa tipe une devuetve \nquerer\n\n— supuesto\n\n payaso eché ret\n\n—spor supuesto» iasi gusta iasi gusta zy globo parece quieres globo\n—bueno si supuesto —alargé mano inmediato retiré voluntad— \ndebo coger ofrezca desconacido dice papa\n\n— papa mucha razén —replicé payaso boca tormenta sonriendo george \npregunto cémo podia haber creido ojos amarillos si color azul brillante\nbailarin ojos mama bill— muchisima razon creo voy \npresentarme george sefior bob gray conacido pennywise payaso\nballarin pennywise presento george denbrough george presento pennywise ahora\n conocemos desconacido tampoco correcto\n\ngeorge solté risita\n\n—correcto —volvié estirar mano retirarla— gcémo metiste alli adentro\n\n— tormenta trajo volaaaando —dijo pennywise payaso bailarin— llev citco\n sientes olor circo george\n\ngeorge incliné hacia adelante ide pronto olla cacahuetes icacahuetes tostados iv vinagre\nblanco pone patatas fritas agujero tapa olia algoden \nazuicar bunuelos leve poderosamente estiércol animales salvajes olia \n‘aroma regocijante aserrin ¥ embargo\n\n‘ embargo bajo olla inundacién hojas deshechas oscuras sombras bocas\n tormenta olor himedo putrido olor sétano\n\n olores mas fuertes\n\nclaro huelo —dij\n\n—quieres barquito george —pregunt pennywise— pregunto vez \npareces desearlo \n\n mostré alto sonriendo llevaba traje seda abolsado grandes botones color\nnaranja corbata brillante color azul eléctrico derramaba pechera \n‘manos llevaba grandes guantes blancos mickey donald\n\nsi 
claro —dijo george mirando dentro boca tormenta\n\n—¥ globo rojos verdes amarillos azules\n\n—élotan\n\n si flotan — sonrisa payaso acentué— oh si claro si iflotan \nalgodén aziicar\n\ngeorge estiré mano\n\n payaso sujetd brazo\n\nyentonces george vio cara payaso cambiaba\n\n vio entonces tan terrible peor habia imaginado cosa sétano\nparecia dulce suefio vio destruy cordura zarpazo\n\n—flotan —croé cosa alcantarila voz rela coagulos\n\nsujelaba brazo george puiio grueso agusanado tiré hacia horrible\n‘oscuridad agua corria rugia aullaba llevando hacia mar desechos \ntormenta george esti cuello apartarse negrura definitiva empez gritar hacia\n lluvia gritar loco hacia gris cielo otofial curvaba derry aquel dia \notofio gritos agudos penetrantes largo toda calle gente \nasomé ventanas lanz porches\n\n—flotan —grufé cosa— flotan georgie aqui abajo conmigo \nflotaras\n\nelhombro gearge clavé cemento bordillo dave gardener dia \nhabia ido trabajar shoeboat debido inundacion vio sélo nino impermeable\namarillo nifio gritaba retorcia arroyo mientras agua lodosa corria \n‘cara haciendo alaridos sonaran burbujeantes\n\naqui abajo flota —susurré voz podrida riendo pronto soné desgarto \n destello agonia george denbrough supo mas\n\ndave gardener primero llegar aunque lego slo cuarenta cinco segundos después\n primer grito george denbrough habia muerto gardener agarré impermeable tie\n sacarlo ala calle girar manos cuerpo george tambien empezé \ngritar lado izquierdo impermeable nifo rojo intenso sangre fluia hacia\n alcantarilla agujero habia brazo izquierdo trazo hueso\nhorriblemente brillante asomaba tela rota\n\n ojos nitio miraban fijamente cielo gris mientras dave retrocedia tropezones hacia\n corrian calle empezaron llenarse lluvia\n
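The inspected text still contains em dashes and curly quotes, and they reappear as tokens in the frequency table that follows. According to the tm documentation, `removePunctuation()` by default only removes characters in the ASCII `[:punct:]` class; its `ucp = TRUE` argument switches to the Unicode punctuation category, which also covers typographic dashes, curly quotes and guillemets. A small check on a made-up string:

```r
library(tm)

# Made-up sample with an em dash, curly quotes and a closing guillemet
x <- "\u2014hola \u2018mundo\u2019 adi\u00f3s\u00bb"

# ucp = TRUE removes every character in the Unicode punctuation category (\p{P})
unicode_clean <- removePunctuation(x, ucp = TRUE)
unicode_clean
```

Applied to the corpus above, the corresponding step would be `it <- tm_map(it, removePunctuation, ucp = TRUE)`. Currency symbols such as `£` and `¥` are not punctuation, so they would still need separate handling.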
tdm2 <- TermDocumentMatrix(it) # Term-document matrix: one row per term, one column per document
m2 <- as.matrix(tdm2) # Count of each word in each document
frecuencia2 <- sort(rowSums(m2), decreasing = TRUE) # Frequency of each word across the whole text, most frequent first
frecuencia_df2 <- data.frame(word = names(frecuencia2), freq = frecuencia2) # Convert the frequencies into a data frame
frecuencia_df2
## word freq
## george george 25
## hacia hacia 14
## payaso payaso 12
## tormenta tormenta 10
## boca boca 8
## vio vio 7
## agua agua 6
## alli alli 6
## habia habia 6
## ojos ojos 6
## barquito barquito 5
## amarillos amarillos 4
## corria corria 4
## denbrough denbrough 4
## mano mano 4
## mas mas 4
## pennywise pennywise 4
## voz voz 4
## — — 4
## abajo abajo 3
## afios afios 3
## barco barco 3
## brazo brazo 3
## brillante brillante 3
## calle calle 3
## clarabell clarabell 3
## claro claro 3
## color color 3
## dave dave 3
## dentro dentro 3
## dos dos 3
## gardener gardener 3
## globo globo 3
## gritar gritar 3
## habria habria 3
## impermeable impermeable 3
## izquierdo izquierdo 3
## lado lado 3
## mientras mientras 3
## olor olor 3
## pavimento pavimento 3
## podia podia 3
## sonrisa sonrisa 3
## sétano sétano 3
## tenia tenia 3
## adelante adelante 2
## adentro adentro 2
## agujero agujero 2
## alrededor alrededor 2
## animal animal 2
## animales animales 2
## aqui aqui 2
## atrapado atrapado 2
## azul azul 2
## bajo bajo 2
## bob bob 2
## bordillo bordillo 2
## bozo bozo 2
## caer caer 2
## cara cara 2
## cielo cielo 2
## cosa cosa 2
## creido creido 2
## desconacido desconacido 2
## después después 2
## dia dia 2
## eché eché 2
## embargo embargo 2
## entonces entonces 2
## flotan flotan 2
## georgie georgie 2
## grandes grandes 2
## gris gris 2
## grito grito 2
## gusta gusta 2
## haciendo haciendo 2
## iasi iasi 2
## largo largo 2
## llevaba llevaba 2
## lluvia lluvia 2
## mirar mirar 2
## momento momento 2
## olia olia 2
## olla olla 2
## oscuridad oscuridad 2
## papa papa 2
## papel papel 2
## parecia parecia 2
## pregunto pregunto 2
## presento presento 2
## pronto pronto 2
## rojo rojo 2
## segundos segundos 2
## siempre siempre 2
## sonriendo sonriendo 2
## supuesto supuesto 2
## tan tan 2
## vela vela 2
## —dijo —dijo 2
## —flotan —flotan 2
## —imierda —imierda 2
## —quieres —quieres 2
## —volvié —volvié 2
## abierta abierta 1
## abierto abierto 1
## abolsado abolsado 1
## acentué— acentué— 1
## acera acera 1
## agarré agarré 1
## agonia agonia 1
## agudos agudos 1
## agusanado agusanado 1
## ahora ahora 1
## ala ala 1
## alaridos alaridos 1
## alcantarila alcantarila 1
## alcantarilla alcantarilla 1
## alcanzar alcanzar 1
## algoden algoden 1
## algodén algodén 1
## alo alo 1
## alto alto 1
## altura altura 1
## amarillo amarillo 1
## apartarse apartarse 1
## apenas apenas 1
## aquel aquel 1
## aquella aquella 1
## aquellas aquellas 1
## arrancada arrancada 1
## arroyo arroyo 1
## aserrin aserrin 1
## asomaba asomaba 1
## asomé asomé 1
## aspera aspera 1
## atin atin 1
## aullaba aullaba 1
## aunque aunque 1
## aziicar aziicar 1
## azuicar azuicar 1
## azules azules 1
## bailan bailan 1
## bailarin bailarin 1
## bailarin— bailarin— 1
## ballarin ballarin 1
## bastante bastante 1
## basté basté 1
## becina becina 1
## bill— bill— 1
## blanca blanca 1
## blanco blanco 1
## blancos blancos 1
## bocas bocas 1
## bote bote 1
## botones botones 1
## brilantes brilantes 1
## buena buena 1
## bufalo bufalo 1
## bunuelos bunuelos 1
## burbujeantes burbujeantes 1
## cacahuetes cacahuetes 1
## cada cada 1
## calva calva 1
## cambiaba cambiaba 1
## caminar caminar 1
## cargo cargo 1
## cayé cayé 1
## cemento cemento 1
## chilar chilar 1
## cinco cinco 1
## cincuenta cincuenta 1
## circo circo 1
## citco citco 1
## clavé clavé 1
## coagulos coagulos 1
## coger coger 1
## colina colina 1
## colores colores 1
## conacido conacido 1
## conmigo conmigo 1
## conocemos conocemos 1
## continuaba continuaba 1
## corbata corbata 1
## cordel cordel 1
## cordura cordura 1
## correcto correcto 1
## correr correr 1
## corrian corrian 1
## corrido corrido 1
## corteza corteza 1
## cosa— cosa— 1
## creo creo 1
## crédito crédito 1
## cuarenta cuarenta 1
## cuello cuello 1
## cuento cuento 1
## cuerpo cuerpo 1
## cuneta cuneta 1
## curvaba curvaba 1
## cémo cémo 1
## daba daba 1
## dar dar 1
## debido debido 1
## debo debo 1
## dedos dedos 1
## definitiva definitiva 1
## dejé dejé 1
## deprisa deprisa 1
## derramaba derramaba 1
## derry derry 1
## desaparecer desaparecer 1
## desearlo desearlo 1
## desechos desechos 1
## desgarrada desgarrada 1
## desgarto desgarto 1
## deshechas deshechas 1
## desliz desliz 1
## despatarrado despatarrado 1
## despellejandose despellejandose 1
## destello destello 1
## destruy destruy 1
## devuetve devuetve 1
## dice dice 1
## dieciséis dieciséis 1
## diez diez 1
## dio dio 1
## distaba distaba 1
## dolio dolio 1
## dolor dolor 1
## donald donald 1
## doody doody 1
## dulce dulce 1
## echar echar 1
## elcirco elcirco 1
## elhombro elhombro 1
## elinterior elinterior 1
## eléctrico eléctrico 1
## empez empez 1
## empezaron empezaron 1
## empezé empezé 1
## encaminaba encaminaba 1
## entendia entendia 1
## escalofrios escalofrios 1
## espanto espanto 1
## esti esti 1
## estirar estirar 1
## estiré estiré 1
## estiércol estiércol 1
## estlipida estlipida 1
## estrellando estrellando 1
## evitarlo evitarlo 1
## exclamacién exclamacién 1
## fauces fauces 1
## fijamente fijamente 1
## fina fina 1
## flota flota 1
## flotaras flotaras 1
## fluia fluia 1
## forz forz 1
## fria fria 1
## fritas fritas 1
## fruta fruta 1
## fuertes fuertes 1
## ganaba ganaba 1
## gato gato 1
## gcémo gcémo 1
## gearge gearge 1
## gente gente 1
## giraba giraba 1
## girar girar 1
## globos globos 1
## gran gran 1
## gray gray 1
## gritaba gritaba 1
## gritos gritos 1
## grueso grueso 1
## guantes guantes 1
## haber haber 1
## hablaba hablaba 1
## hablan hablan 1
## hablé hablé 1
## hecho hecho 1
## himedo himedo 1
## hlimedo hlimedo 1
## hojas hojas 1
## hola hola 1
## horrible horrible 1
## horriblemente horriblemente 1
## horrorizado horrorizado 1
## howdy howdy 1
## hueco hueco 1
## huelo huelo 1
## hueso hueso 1
## hundié hundié 1
## iba iba 1
## icacahuetes icacahuetes 1
## ide ide 1
## ido ido 1
## iflotan iflotan 1
## imaginaba imaginaba 1
## imaginado imaginado 1
## incliné incliné 1
## incoherente— incoherente— 1
## inmediato inmediato 1
## intenso intenso 1
## interior interior 1
## inundacion inundacion 1
## inundacién inundacién 1
## iqué iqué 1
## lamina lamina 1
## lanz lanz 1
## lego lego 1
## levantandose levantandose 1
## levanto levanto 1
## leve leve 1
## llegar llegar 1
## llenarse llenarse 1
## llev llev 1
## llevando llevando 1
## loco loco 1
## lodosa lodosa 1
## luego luego 1
## madura madura 1
## mafana mafana 1
## mama mama 1
## manera manera 1
## mangjo mangjo 1
## manos manos 1
## mar mar 1
## mcdonald mcdonald 1
## mechones mechones 1
## mental mental 1
## metido metido 1
## metiste metiste 1
## metros metros 1
## mezcla mezcla 1
## mickey mickey 1
## mierda mierda 1
## miraba miraba 1
## miraban miraban 1
## mirando mirando 1
## mismo mismo 1
## modes modes 1
## momentaneamente momentaneamente 1
## mostré mostré 1
## mucha mucha 1
## muchisima muchisima 1
## muerto muerto 1
## més més 1
## naranja naranja 1
## negrura negrura 1
## nifio nifio 1
## nifo nifo 1
## nino nino 1
## nitio nitio 1
## nueva nueva 1
## nunca nunca 1
## ofrezca ofrezca 1
## olores olores 1
## oscuras oscuras 1
## oscuro oscuro 1
## otofial otofial 1
## otofio otofio 1
## parece parece 1
## pareces pareces 1
## parecié parecié 1
## parpaded parpaded 1
## paso paso 1
## patatas patatas 1
## payase payase 1
## pechera pechera 1
## pelicula pelicula 1
## pelo pelo 1
## pendié pendié 1
## penetrantes penetrantes 1
## pennywise— pennywise— 1
## pensado pensado 1
## pensar pensar 1
## peor peor 1
## perder perder 1
## perfectamente perfectamente 1
## persiguiendo persiguiendo 1
## perspectiva perspectiva 1
## pho pho 1
## pies pies 1
## pintada pintada 1
## poderosamente poderosamente 1
## podrida podrida 1
## pone pone 1
## porches porches 1
## precipitaba precipitaba 1
## presentarme presentarme 1
## primer primer 1
## primero primero 1
## produjeron produjeron 1
## profundo profundo 1
## puiio puiio 1
## putrido putrido 1
## quedé quedé 1
## querer querer 1
## quieres quieres 1
## rama rama 1
## razon razon 1
## razonable razonable 1
## razén razén 1
## redondo redondo 1
## regocijante regocijante 1
## reir reir 1
## rela rela 1
## reluciente reluciente 1
## remolino remolino 1
## resbal resbal 1
## ret ret 1
## retirarla— retirarla— 1
## retiré retiré 1
## retorcia retorcia 1
## retrocedia retrocedia 1
## retrocediendo retrocediendo 1
## retrocedis retrocedis 1
## riendo riendo 1
## risita risita 1
## rodilla rodilla 1
## rodillas rodillas 1
## rojos rojos 1
## ronald ronald 1
## rota rota 1
## rugia rugia 1
## rugido rugido 1
## ruido ruido 1
## sabados sabados 1
## sabe sabe 1
## sacado sacado 1
## sacando sacando 1
## sacarlo sacarlo 1
## salvajes salvajes 1
## sangre sangre 1
## seda seda 1
## sefior sefior 1
## seguro seguro 1
## seis seis 1
## semicirculo semicirculo 1
## ser ser 1
## shoeboat shoeboat 1
## sientes sientes 1
## simpatica simpatica 1
## sino sino 1
## sintié sintié 1
## slo slo 1
## sollozar sollozar 1
## solté solté 1
## sombras sombras 1
## sonar sonar 1
## sonaran sonaran 1
## sonido sonido 1
## sonrela sonrela 1
## sonri sonri 1
## soné soné 1
## street street 1
## suefio suefio 1
## sujelaba sujelaba 1
## sujetd sujetd 1
## superficie superficie 1
## supo supo 1
## supuesto» supuesto» 1
## sélo sélo 1
## tablero tablero 1
## tambien tambien 1
## tampoco tampoco 1
## tapa tapa 1
## tela tela 1
## tele tele 1
## tentadora tentadora 1
## terrible terrible 1
## tie tie 1
## tipe tipe 1
## tipo tipo 1
## tiré tiré 1
## tna tna 1
## toda toda 1
## tostados tostados 1
## trabajar trabajar 1
## traje traje 1
## trajo trajo 1
## trazo trazo 1
## tropezones tropezones 1
## une une 1
## unico unico 1
## veces veces 1
## ventaja ventaja 1
## ventanas ventanas 1
## verdes verdes 1
## verlos verlos 1
## vez vez 1
## viendo viendo 1
## vinagre vinagre 1
## vivide vivide 1
## volaaaando volaaaando 1
## voluntad— voluntad— 1
## volvié volvié 1
## voy voy 1
## witcham witcham 1
## yentonces yentonces 1
## zarpazo zarpazo 1
## £s £s 1
## —alargé —alargé 1
## —bueno —bueno 1
## —chillé —chillé 1
## —correcto —correcto 1
## —croé —croé 1
## —dij —dij 1
## —grufé —grufé 1
## —ieh —ieh 1
## —pens —pens 1
## —pregunt —pregunt 1
## —replicé —replicé 1
## —spor —spor 1
## —susurré —susurré 1
## —¥ —¥ 1
## —élotan —élotan 1
## ‘ ‘ 1
## ‘aroma ‘aroma 1
## ‘cara ‘cara 1
## ‘cémicos ‘cémicos 1
## ‘manos ‘manos 1
## ‘mejor ‘mejor 1
## ‘oscura ‘oscura 1
## ‘oscuridad ‘oscuridad 1
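The printed table shows every word twice because `data.frame()` keeps the names of `frecuencia2` as row names; passing `row.names = NULL` drops them. The sketch below repeats the counting steps on two made-up documents. Note that `TermDocumentMatrix()` ignores terms shorter than three characters by default, so "el" and "a" are not counted.

```r
library(tm)

# Two made-up documents (not taken from the chapter)
docs <- c("george vio el payaso", "el payaso saluda a george")
corpus <- Corpus(VectorSource(docs))

tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)

# Total count of each term across all documents, most frequent first
freq <- sort(rowSums(m), decreasing = TRUE)

# row.names = NULL keeps the default 1, 2, 3, ... instead of repeating each word
freq_df <- data.frame(word = names(freq), freq = freq, row.names = NULL)
freq_df
```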
ggplot(head(frecuencia_df2, 10), aes(x = reorder(word, -freq), y = freq)) +
geom_bar(stat = "identity", fill = "red") +
geom_text(aes(label = freq), vjust = -0.5) +
labs(title = "Top 10 most frequent words", subtitle = "Chapter from It", x = "Word", y = "Frequency") +
ylim(0, 30)
set.seed(123)
wordcloud(words = frecuencia_df2$word, freq = frecuencia_df2$freq, min.freq = 1, random.order = FALSE, colors = brewer.pal(9, "Oranges")) # "Oranges" offers at most 9 colors
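With `min.freq = 1`, every OCR artifact in the frequency table also ends up in the cloud. One option is to keep only tokens made entirely of letters that appear at least twice before plotting. The sketch uses a few rows copied from the frequency table as stand-in data; the threshold of 2 and the letters-only rule are illustrative choices, not part of the original analysis.

```r
library(wordcloud)  # also attaches RColorBrewer for brewer.pal()

# A few rows copied from the frequency table above (stand-in for frecuencia_df2)
freq_df <- data.frame(
  word = c("george", "hacia", "payaso", "\u2014", "afios", "\u2014dijo", "ojos", "\u2018oscura"),
  freq = c(25, 14, 12, 4, 3, 2, 6, 1)
)

# Keep tokens made only of letters (any script) that appear at least twice
keep <- grepl("^\\p{L}+$", freq_df$word, perl = TRUE) & freq_df$freq >= 2
clean_df <- freq_df[keep, ]

set.seed(123)
wordcloud(words = clean_df$word, freq = clean_df$freq, min.freq = 2,
          random.order = FALSE, colors = brewer.pal(9, "Oranges"))
```

The letters-only rule removes stray dashes and quotes but not misrecognized words such as "afios" (for "años"); those need better OCR input or manual correction.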