Load libraries.
library(tidyverse)
library(quanteda)
library(quanteda.textstats)  # textstat_frequency(), textstat_collocations()
library(readtext)
Build a corpus of the texts.
x <- list.files(pattern = "(pdf|docx)$") %>%
  readtext(ignore_missing_files = TRUE) %>%
  corpus() %>%
  as.character()  # one text per file; replaces the deprecated texts()
What regular past tense verbs did you use?
x <- tolower(x)
# "*ed" is a glob pattern: any token that ends in "ed"
kwic(tokens(x), pattern = "*ed", window = 10) %>%
  as.data.frame() %>%
  distinct(keyword) %>%
  arrange(keyword)
keyword
1 abdicated
2 acclaimed
3 advanced
4 advised
5 affected
6 allowed
7 announced
8 aspired
9 assainated
10 assassinated
11 attached
12 attacked
13 based
14 beated
15 bed
16 behaved
17 believed
18 betrayed
19 called
20 campaigned
21 carried
22 challenged
23 changed
24 charged
25 cleaned
26 commanded
27 commissioned
28 completed
29 conferred
30 confronted
31 conquered
32 considered
33 contributed
34 controlled
35 cooperated
36 created
37 decided
38 decorated
39 defeated
40 demanded
41 denied
42 destroyed
43 detected
44 developed
45 died
46 dispatched
47 divided
48 dropped
49 earned
50 emerged
51 emphasized
52 employed
53 enacted
54 ended
55 enforced
56 enshrined
57 entered
58 entertained
59 escaped
60 established
61 evacuated
62 excited
63 exiled
64 expected
65 experienced
66 faced
67 failed
68 filled
69 finished
70 formed
71 founded
72 gained
73 gathered
74 graduated
75 handed
76 happened
77 hated
78 hedistroyed
79 hemarried
80 identified
81 impressed
82 improved
83 infiltrated
84 influenced
85 inherited
86 injured
87 intercepted
88 interested
89 invented
90 issued
91 joined
92 killed
93 lasted
94 learned
95 led
96 lied
97 lived
98 looked
99 loved
100 low-ranked
101 married
102 mildred
103 moved
104 named
105 need
106 noticed
107 objected
108 occupied
109 occurred
110 opened
111 opposed
112 ordered
113 overturned
114 participated
115 passed
116 played
117 promoted
118 promulgated
119 published
120 pulled
121 qualified
122 realized
123 received
124 recognized
125 recuperated
126 refused
127 rejected
128 rejoined
129 released
130 relied
131 renamed
132 researched
133 resisted
134 respected
135 retired
136 returned
137 robbed
138 ruled
139 saved
140 secured
141 served
142 shed
143 smoked
144 so-called
145 solved
146 started
147 strengthened
148 studied
149 succeeded
150 sucked
151 supported
152 synchronized
153 touched
154 treated
155 turned
156 unfinished
157 united
158 used
159 utilized
160 visited
161 walked
162 wandered
163 wanted
164 watched
165 worked
166 worried
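The pattern only looks at spelling, so nouns such as "bed" and names such as "mildred" end up in the list alongside real past-tense verbs. A minimal sketch of one way to screen them out, assuming a hand-made exclusion list (the vectors `matches` and `not_verbs` are hypothetical and are not part of the original analysis):

```r
# A few keywords copied from the list above (illustrative subset).
matches <- c("abdicated", "bed", "married", "mildred", "need",
             "so-called", "unfinished", "walked")

# Hypothetical hand-made list of matches that are not past-tense verbs.
not_verbs <- c("bed", "mildred", "need", "so-called", "unfinished")

# Keep only the matches that are not on the exclusion list.
verbs <- setdiff(matches, not_verbs)
print(verbs)
```

The exclusion list has to be maintained by hand, so this suits a small class corpus better than a large one.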
Who was Mildred?
kwic(tokens(x), "mildred", window = 20) %>%
  as.data.frame() %>%
  select(pre, post)
pre
1 adams keller . the family member is father arthur , mother kate , two old half-brothers , a little sister
2 1909 , helen has joined a number of political protests . in 1922 , her mother kate and little sister
post
1 and herself . in 1882 , helen got meningitis at the age of one and a half . the doctor
2 died when she was 42 years old .
Who was she / he?
set.seed(1066)
# "*he was" matches both "he was" and "she was"
kwic(tokens(x), phrase("*he was"), window = 8) %>%
  as.data.frame() %>%
  sample_n(50) %>%
  transmute(answer = post)
answer
1 named takechiyo when he was young . capture
2 born on the french island of corsica in
3 the fourth of 12 siblings . later he
4 born on april 15 , 1452 in what
5 taught telegraph technology.this was his first step to
6 called " sanjiro " in childhood . his
7 commissioned to make paintings in january and may
8 worried , but he worked hard . and
9 66 years old .
10 renamed kinoshita tokichiro when he was 14 years
11 exiled to saint helena . after his life
12 leather of the indian dependence movement.his real name
13 chosen member of " american statistical association "
14 a shogun ! ieyasu went to the countryside
15 one year old . then he went to
16 japanese author who is considered one of the
17 six , his mother died . after a
18 born , and from 1457 he lived in
19 advised by his brother to study dutch studies
20 six . it is said that he met
21 born in osaka on january 10,1835 . his
22 42 years old .
23 born in osaka on january 10,1835 . his
24 swept away by st.helena and ended his life
25 a genius . he graduated from paris military
26 very smart . she heard the voice of
27 fourteen years old , he became a disciple
28 twenty four years old , he married one
29 one year old . then he went to
30 decorated in america.and in 1931 , he died
31 called the lady of the lamp . she
32 full of curiosity from an early age .
33 19 years old . a year later he
34 imprisoned.thereby peoples join the anti-british movement more and
35 brought up in komyoji temple , and he
36 young . capture as a hostage when he
37 55 years old , survey began . the
38 advised by his brother to study dutch studies
39 taught latin , math and geometry by students
40 15 , he saved a child who was
41 14 years old . in 1554 , this
42 taught enough education for example , writing ,
43 doing a lot of experiments at home.when he
44 in the noble family , he was taught
45 20 years old who graduated from perkins school
46 little . so , she used that experience
47 71 , he started to create a map.but
48 mainly active in the duchy of milan from
49 born in italy in 1820 . her family
50 18 , he moved to london , england
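The wildcard in `*he was` is why the answers mix "he was" and "she was": `*` stands for any run of characters, including none. A rough illustration with base R's `glob2rx()`, which is assumed here to behave like quanteda's glob matching for this simple pattern (the word vector is made up):

```r
# Made-up tokens to test against the glob pattern "*he".
words <- c("he", "she", "the", "her")

# glob2rx() turns a glob pattern into a regular expression.
rx <- utils::glob2rx("*he")
print(rx)

# TRUE for every token that ends in "he", so "the" would match as well.
hits <- grepl(rx, words)
print(setNames(hits, words))
```

Because "the" also ends in "he", a phrase such as "the was" would be picked up too if it appeared in the texts.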
Common words
x %>% tokens(remove_punct = TRUE) %>% dfm() %>%
  dfm_remove(stopwords("en")) %>%
  textstat_frequency() %>%
  filter(frequency >= 2) %>%
  select(word = feature, frequency) %>%
  arrange(desc(frequency), word)
word frequency
1 years 35
2 battle 22
3 old 22
4 died 21
5 born 20
6 hideyoshi 20
7 japan 20
8 later 20
9 called 18
10 ieyasu 17
11 people 17
12 famous 16
13 school 16
14 went 16
15 family 15
16 got 15
17 made 15
18 name 15
19 nightingale 15
20 became 14
21 death 14
22 napoleon 14
23 one 14
24 first 13
25 life 13
26 war 13
27 age 12
28 father 12
29 helen 12
30 said 12
31 book 10
32 lot 10
33 year 10
34 yukichi 10
35 also 9
36 company 9
37 established 9
38 france 9
39 french 9
40 known 9
41 left 9
42 leonardo 9
43 make 9
44 tadataka 9
45 time 9
46 vinci 9
47 began 8
48 disney 8
49 dutch 8
50 fukuzawa 8
51 great 8
52 nobunaga 8
53 started 8
54 toyotomi 8
55 various 8
56 world 8
57 bonaparte 7
58 england 7
59 however 7
60 mitsuhide 7
61 nurses 7
62 osamu 7
63 published 7
64 system 7
65 won 7
66 active 6
67 brother 6
68 can 6
69 da 6
70 dazai 6
71 decided 6
72 ino 6
73 keller 6
74 learned 6
75 learning 6
76 little 6
77 manuscripts 6
78 married 6
79 osaka 6
80 paris 6
81 revolution 6
82 study 6
83 two 6
84 village 6
85 walt 6
86 word 6
87 1948 5
88 2 5
89 adams 5
90 akechi 5
91 completed 5
92 created 5
93 dahlia 5
94 emperor 5
95 entered 5
96 fields 5
97 fight 5
98 five 5
99 following 5
100 go 5
101 half 5
102 heard 5
103 hospital 5
104 india 5
105 indian 5
106 invented 5
107 japanese 5
108 lost 5
109 map 5
110 may 5
111 means 5
112 mother 5
113 named 5
114 napoleon's 5
115 never 5
116 new 5
117 now 5
118 nurse 5
119 person 5
120 result 5
121 six 5
122 tatakai 5
123 way 5
124 words 5
125 10,000 4
126 1582 4
127 20 4
128 38 4
129 academy 4
130 addition 4
131 among 4
132 arsenic 4
133 best 4
134 built 4
135 cancer 4
136 castle 4
137 cause 4
138 cram 4
139 disneyland 4
140 early 4
141 edo 4
142 elba 4
143 europe 4
144 finished 4
145 future 4
146 get 4
147 gradually 4
148 happened 4
149 high 4
150 home 4
151 human 4
152 important 4
153 incident 4
154 influence 4
155 island 4
156 january 4
157 job 4
158 josephine 4
159 keio 4
160 knowledge 4
161 learn 4
162 lived 4
163 loved 4
164 many 4
165 military 4
166 novel 4
167 number 4
168 oda 4
169 outline 4
170 part 4
171 partner 4
172 perspective 4
173 picture 4
174 position 4
175 power 4
176 records 4
177 researched 4
178 returned 4
179 rich 4
180 scutari 4
181 soldiers 4
182 still 4
183 studies 4
184 taught 4
185 technique 4
186 tekijuku 4
187 therefore 4
188 things 4
189 three 4
190 tokugawa 4
191 used 4
192 worked 4
193 wrote 4
194 yamazaki 4
195 yen 4
196 12 3
197 13 3
198 15 3
199 19 3
200 africa 3
201 animation 3
202 become 3
203 british 3
204 broke 3
205 brought 3
206 challenged 3
207 cherry 3
208 child 3
209 childhood 3
210 countries 3
211 crimean 3
212 currently 3
213 days 3
214 de 3
215 developed 3
216 draw 3
217 drawn 3
218 drew 3
219 education 3
220 ended 3
221 escaped 3
222 etc 3
223 example 3
224 experience 3
225 full 3
226 furthermore 3
227 gandhi 3
228 gastric 3
229 gave 3
230 genius 3
231 girl 3
232 good 3
233 graduated 3
234 hard 3
235 hated 3
236 helicopter 3
237 improved 3
238 inos 3
239 italian 3
240 joined 3
241 june 3
242 just 3
243 killed 3
244 king 3
245 last 3
246 led 3
247 lifetime 3
248 like 3
249 looked 3
250 marriage 3
251 master 3
252 met 3
253 money 3
254 moved 3
255 overturned 3
256 painting 3
257 paintings 3
258 reading 3
259 real 3
260 referendum 3
261 released 3
262 rose 3
263 salt 3
264 samurai 3
265 sea 3
266 second 3
267 serve 3
268 shogun 3
269 since 3
270 sister 3
271 son 3
272 south 3
273 student 3
274 studying 3
275 supper 3
276 teacher 3
277 temple 3
278 think 3
279 top 3
280 treated 3
281 universal 3
282 university 3
283 walked 3
284 work 3
285 world's 3
286 ~ 2
287 1 2
288 10,1835 2
289 13,000 2
290 14 2
291 1478 2
292 1499 2
293 1590 2
294 1600 2
295 17 2
296 1769 2
297 1789 2
298 18 2
299 1804 2
300 1806 2
301 1820 2
302 1870s 2
303 1882 2
304 1888 2
305 1901 2
306 1947 2
307 1984 2
308 2024 2
309 23 2
310 27 2
311 35 2
312 40 2
313 55 2
314 66 2
315 71 2
316 9 2
317 able 2
318 absolute 2
319 academic 2
320 acclaimed 2
321 achievements 2
322 advised 2
323 aerial 2
324 affairs 2
325 afterwards 2
326 air 2
327 all-around 2
328 along 2
329 american 2
330 anatomist 2
331 anatomy 2
332 angry 2
333 anime 2
334 ann 2
335 announced 2
336 aristocrat 2
337 arthur 2
338 artists 2
339 association 2
340 asthma 2
341 attach 2
342 attacked 2
343 attending 2
344 august 2
345 austerlitz 2
346 austria 2
347 away 2
348 banknote 2
349 based 2
350 believed 2
351 bill 2
352 bird 2
353 birth 2
354 birthday 2
355 blind 2
356 blueprints 2
357 bluish 2
358 blurring 2
359 borderline 2
360 boy 2
361 bream 2
362 care 2
363 chaebol 2
364 changed 2
365 characters 2
366 chicago 2
367 chosen 2
368 church 2
369 citizens 2
370 city 2
371 civil 2
372 cleaning 2
373 close 2
374 code 2
375 colors 2
376 commemorate 2
377 considerations 2
378 considered 2
379 contain 2
380 contents 2
381 contributed 2
382 control 2
383 controlled 2
384 corsica 2
385 course 2
386 deaf 2
387 december 2
388 defeated 2
389 denied 2
390 depending 2
391 despite 2
392 detected 2
393 development 2
394 di 2
395 difference 2
396 difficult 2
397 difficulties 2
398 disciples 2
399 discriminate 2
400 disobedience 2
401 distant 2
402 diverse 2
403 drawing 2
404 dream 2
405 duchy 2
406 edison 2
407 elementary 2
408 exaggeration 2
409 exiled 2
410 expedition 2
411 experienced 2
412 expresses 2
413 ezo 2
414 favorite 2
415 finally 2
416 find 2
417 flying 2
418 foreign 2
419 forget 2
420 founded 2
421 four 2
422 francisco 2
423 freedom 2
424 general 2
425 gifu 2
426 globally 2
427 god 2
428 government 2
429 gradation 2
430 grade 2
431 grew 2
432 group 2
433 heaven 2
434 helena 2
435 hen 2
436 hidetada 2
437 hinduism 2
438 historically 2
439 history 2
440 hitler 2
441 honnohji 2
442 honnoji 2
443 house 2
444 humans 2
445 identity 2
446 immediately 2
447 in1901 2
448 including 2
449 injured 2
450 integrate 2
451 interested 2
452 introduce 2
453 invents 2
454 ishida 2
455 italy 2
456 kanji 2
457 kate 2
458 koan 2
459 lamp 2
460 land 2
461 large 2
462 law 2
463 lawyer 2
464 light 2
465 lisa 2
466 live 2
467 london 2
468 look 2
469 love 2
470 lower 2
471 luke 2
472 mahatma 2
473 main 2
474 makes 2
475 man 2
476 manaka 2
477 march 2
478 marriages 2
479 masterpieces 2
480 member 2
481 merosu 2
482 mickey 2
483 mickey's 2
484 milan 2
485 mildred 2
486 mitsunari 2
487 mona 2
488 monarchy 2
489 months 2
490 mouse 2
491 movement 2
492 ms 2
493 muscles 2
494 nagasaki 2
495 nagasawas 2
496 natural 2
497 nature 2
498 next 2
499 night 2
500 ningen 2
501 non 2
502 objects 2
503 october 2
504 ogata 2
505 oita 2
506 opened 2
507 opposed 2
508 opposite 2
509 original 2
510 park 2
511 perkins 2
512 philosophy 2
513 political 2
514 politics 2
515 popular 2
516 posterity 2
517 problems 2
518 producing 2
519 prototype 2
520 pursue 2
521 racist 2
522 realism 2
523 realized 2
524 received 2
525 recommendation 2
526 recommendations 2
527 rejoined 2
528 renaissance 2
529 researching 2
530 rest 2
531 return 2
532 revolutionary 2
533 run 2
534 russian 2
535 sanjiro 2
536 say 2
537 scholar 2
538 school.then 2
539 scientist 2
540 see 2
541 sentence 2
542 ser 2
543 serially 2
544 serinuntiusu 2
545 sfumato 2
546 shikkaku 2
547 ship 2
548 shogunate 2
549 short 2
550 show 2
551 sketch 2
552 sketches 2
553 smart 2
554 so-called 2
555 society 2
556 speak 2
557 spent 2
558 st 2
559 statistical 2
560 story 2
561 strengthened 2
562 succeeded 2
563 sullivan 2
564 survey 2
565 systems 2
566 talent 2
567 tanaka 2
568 tani 2
569 tekijuku.at 2
570 tempura 2
571 tenkawakeme 2
572 thought 2
573 titles 2
574 toilet 2
575 tomie 2
576 took 2
577 turn 2
578 turned 2
579 unfortunately 2
580 union 2
581 university.and 2
582 uomo 2
583 usa.he 2
584 violence 2
585 visited 2
586 vitruviano 2
587 vocalism 2
588 voice 2
589 walking 2
590 want 2
591 warriors 2
592 watching 2
593 waterloo 2
594 wealthy 2
595 western 2
596 whether 2
597 white 2
598 winter 2
599 without 2
600 women 2
601 writer 2
602 writing 2
603 written 2
604 yoshida 2
605 young 2
606 yuma 2
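The frequency table mixes words with bare numbers and years (for example `1582` or `10,000`). If only words are wanted, one option is to filter the numbers out afterwards. A minimal sketch on a made-up data frame with the same two columns (the data frame `freq` and the regular expression are assumptions for illustration):

```r
# Made-up rows in the same shape as the table above.
freq <- data.frame(
  word = c("years", "1582", "10,000", "battle", "1870s", "2"),
  frequency = c(35, 4, 4, 22, 2, 5),
  stringsAsFactors = FALSE
)

# Treat a feature as a number if it is only digits and commas,
# optionally followed by "s" (as in "1870s").
is_number <- grepl("^[0-9,]+s?$", freq$word)

words_only <- freq[!is_number, ]
print(words_only)
```

Within quanteda, `tokens()` also has a `remove_numbers` argument that drops number tokens before the `dfm` is built.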
Three-word Collocations
x %>% textstat_collocations(size = 3) %>%
  select(collocation, count) %>%
  filter(count > 3) %>%
  arrange(desc(count), collocation)
collocation count
1 when he was 13
2 he went to 12
3 was born in 12
4 a lot of 9
5 the battle of 9
6 the age of 8
7 is said that 7
8 it is said 7
9 one of the 7
10 at the age 6
11 he was born 6
12 battle is called 5
13 leonardo da vinci 5
14 the following year 5
15 born in the 4
16 cause of death 4
17 da vinci is 4
18 does not make 4
19 he made a 4
20 helen adams keller 4
21 his brother to 4
22 in the village 4
23 is a famous 4
24 is called the 4
25 not make people 4
26 that he is 4
27 the battle is 4
28 was born on 4
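A three-word collocation is a run of three adjacent words that keeps recurring. `textstat_collocations()` also scores each candidate statistically. The sketch below leaves the scoring out and only counts adjacent triples in a made-up sentence, using base R:

```r
# Made-up text, already lower-cased.
text <- "when he was young he went to school and when he was old he went to osaka"

words <- strsplit(text, " ", fixed = TRUE)[[1]]
n <- length(words)

# Paste each word together with the two words that follow it.
trigrams <- paste(words[1:(n - 2)], words[2:(n - 1)], words[3:n])

# Count each distinct triple and list the most frequent first.
counts <- sort(table(trigrams), decreasing = TRUE)
print(head(counts, 4))
```

Counting alone favours triples built from very common words, which is one reason the table above is dominated by phrases such as "when he was" and "a lot of".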
Grammar Point - You need to put a space after a period.
# Regex: a period followed directly by a non-space character
kwic(tokens(x), pattern = "\\.\\S", window = 10,
     valuetype = "regex") %>%
  as.data.frame() %>%
  sample_n(15) %>%
  select(keyword)
keyword
1 tekijuku.at
2 march.he
3 ohio.he
4 act.gandhi
5 independence.his
6 them.we
7 himself.in
8 technology.this
9 university.and
10 people.but
11 usa.he
12 tomorrow.learn
13 months.he
14 britain.also
15 days.from
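Once the problem spots are found, the missing space can also be added automatically. A minimal sketch with base R's `gsub()`, assuming that a period directly followed by a letter always marks a sentence boundary (an assumption that would wrongly split abbreviations and web addresses). The example strings are taken from the keywords above:

```r
# A few of the keywords flagged above.
bad <- c("march.he", "usa.he", "university.and")

# Insert a space after any period that is immediately followed by a letter.
# The lookahead (?=...) checks for the letter without consuming it.
fixed <- gsub("\\.(?=[A-Za-z])", ". ", bad, perl = TRUE)
print(fixed)
```

Numbers such as `3.5` are left alone because the character after the period is a digit, not a letter.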
Word Count Graph
# One row per text, with its token count
df <- data.frame(text = x, words = ntoken(tokens(x)))
df %>% ggplot(aes(x=words)) +
geom_density() +
labs(y="") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())

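Comment: Not every keyword in the “-ed” list is a regular past-tense verb. Some are adjectives (for example “so-called” and “unfinished”), and others are not verbs at all, such as “bed” and the name “Mildred”.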