Background

It is often useful to distinguish between personal and organisational tweets in document or discourse analysis of Twitter.

This note sets out some ideas for doing this.

We use

Extract tweets

We are primarily interested in the user descriptions (in this case retweet_description) and usernames.

Idea 1 - disclaimers

This filters tweets where the retweet description contains the word view, View, views or Views. Of 3,190 tweets, 254 contain one or more of these terms.
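
The filter can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used for this note: the `desc` vector is hypothetical (shortened from descriptions in the sample), and both patterns are assumptions. It contrasts a plain substring match with a whole-word match, because a substring match also fires on words such as "viewpoint".

```r
# Hypothetical descriptions, shortened from the sample shown in this note.
desc <- c(
  "Researcher helping data and algorithms work at scale. Views my own.",
  "Organized by @TechReviewAR in partnership with @dubaifuture",
  "Contributors who have been presenting their viewpoint",
  "All views my own",
  "News & press releases."
)

# Substring match: any occurrence of "view", ignoring case.
views_substring <- as.integer(grepl("view", desc, ignore.case = TRUE))

# Whole-word match: only "view" or "views" as a standalone word.
views_word <- as.integer(
  grepl("\\bviews?\\b", desc, ignore.case = TRUE, perl = TRUE)
)

# Side-by-side comparison of the two flags.
data.frame(views_substring, views_word)
```

The whole-word version is stricter, so it would be expected to flag fewer rows than the substring version on real data.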

We can see how successful this strategy is by reviewing a sample of the descriptions.

retweet_name name retweet_description
Prof. Iain Buchan Public Health Data Science Public Health and Clinical Informatics researcher helping data and algorithms work at scale for societal gain. Views my own.
David Baker Public Health Data Science Director of Partnerships and Innovation at Sandwell and West Birmingham NHS Trust. Views are my own. Thank you
Eustis Corrigan Public Health Data Science Senior Managing Director, #CBIZ #Memphis (NYSE: @CBZ) | Collaborative Leader | Valued Advisor | Views are mine - RTs are not endorsements
Vidhi Thakkar Public Health Data Science Health Policy PhD HSPR, views are my own, Digital Health, Educator, @MacHealthSci, patient oriented research, health systems design, lifelong learner @ihpmeuoft
EmTech MENA Public Health Data Science Discover the emerging technologies that will change the world through the #EmTechMENA platform. Organized by @TechReviewAR in partnership with @dubaifuture
Michelle Fenner Public Health Data Science Michelle Fenner - #digital #healthcare Digital Marketing and Storytelling Perpetual Mediator. Views are 100% my own!
Jeff Acaba Public Health Data Science #zerodiscrimination, fulfilment of #SRHR, recognition of #LGBTIQ rights, & respect to #humanrights as we #endAIDS and #endTB. Views & RTs my own. 🏳️‍🌈☕️
Pharma Tech Outlook Public Health Data Science Pharma Tech Outlook has contributors from the most established organizations who have been presenting their viewpoint using this unique print platform.
Carol Sinclair Public Health Data Science Assoc Director, Public Health&Intelligence, NHSNSS/Non-Exec Director, Scottish Ambulance Service. Passionate about innovation in public sector. All views my own
Brett Keller Public Health Data Science Roving data guy, occasional musician. Job: @PSIimpact. Alum of @WilsonSchool econ/policy, @JohnsHopkinsSPH global epi, @TrumanScholars. Opinions/views = mine.

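In the sample we can see that retweeters are often, though not always, recognisable individuals. Two of the ten rows (EmTech MENA and Pharma Tech Outlook) are organisations whose descriptions contain "view" only as part of another word (@TechReviewAR, viewpoint).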

We will record the result as a new column (feature), views, in the tweet table.

This approach is unlikely to catch all personal users, so we will need additional strategies.

Idea 2 - detect use of personal pronouns

The idea here is that individuals are more likely than organisations to use first-person singular pronouns such as "I" and "me".

Detecting this can be done with powerful NLP tools, but a much quicker method is to use the textfeatures package, which can count first-person pronouns.

## ↪ Counting features in text...
## ↪ Sentiment analysis...
## ↪ Parts of speech...
## ↪ Word dimensions started
## ✔ Job's done!
## # A tibble: 3,190 x 134
##    n_urls n_uq_urls n_hashtags n_uq_hashtags n_mentions n_uq_mentions n_chars
##     <int>     <int>      <int>         <int>      <int>         <int>   <int>
##  1      0         0          4             4          0             0     129
##  2      0         0          0             0          1             1      21
##  3      0         0          2             2          1             1     127
##  4      0         0          0             0          0             0     124
##  5      1         1          1             1          2             2      99
##  6      0         0          0             0          0             0     136
##  7      0         0          0             0          0             0     121
##  8      0         0          1             1          0             0     144
##  9      0         0          0             0          0             0      36
## 10      0         0          0             0          0             0     133
## # … with 3,180 more rows, and 127 more variables: n_uq_chars <int>,
## #   n_commas <int>, n_digits <int>, n_exclaims <int>, n_extraspaces <int>,
## #   n_lowers <int>, n_lowersp <dbl>, n_periods <int>, n_words <int>,
## #   n_uq_words <int>, n_caps <int>, n_nonasciis <int>, n_puncts <int>,
## #   n_capsp <dbl>, n_charsperword <dbl>, sent_afinn <dbl>, sent_bing <dbl>,
## #   sent_syuzhet <dbl>, sent_vader <dbl>, n_polite <dbl>, n_first_person <int>,
## #   n_first_personp <int>, n_second_person <int>, n_second_personp <int>,
## #   n_third_person <int>, n_tobe <int>, n_prepositions <int>, w1 <dbl>,
## #   w2 <dbl>, w3 <dbl>, w4 <dbl>, w5 <dbl>, w6 <dbl>, w7 <dbl>, w8 <dbl>,
## #   w9 <dbl>, w10 <dbl>, w11 <dbl>, w12 <dbl>, w13 <dbl>, w14 <dbl>, w15 <dbl>,
## #   w16 <dbl>, w17 <dbl>, w18 <dbl>, w19 <dbl>, w20 <dbl>, w21 <dbl>,
## #   w22 <dbl>, w23 <dbl>, w24 <dbl>, w25 <dbl>, w26 <dbl>, w27 <dbl>,
## #   w28 <dbl>, w29 <dbl>, w30 <dbl>, w31 <dbl>, w32 <dbl>, w33 <dbl>,
## #   w34 <dbl>, w35 <dbl>, w36 <dbl>, w37 <dbl>, w38 <dbl>, w39 <dbl>,
## #   w40 <dbl>, w41 <dbl>, w42 <dbl>, w43 <dbl>, w44 <dbl>, w45 <dbl>,
## #   w46 <dbl>, w47 <dbl>, w48 <dbl>, w49 <dbl>, w50 <dbl>, w51 <dbl>,
## #   w52 <dbl>, w53 <dbl>, w54 <dbl>, w55 <dbl>, w56 <dbl>, w57 <dbl>,
## #   w58 <dbl>, w59 <dbl>, w60 <dbl>, w61 <dbl>, w62 <dbl>, w63 <dbl>,
## #   w64 <dbl>, w65 <dbl>, w66 <dbl>, w67 <dbl>, w68 <dbl>, w69 <dbl>,
## #   w70 <dbl>, w71 <dbl>, w72 <dbl>, w73 <dbl>, …

textfeatures creates a range of other text features for each tweet which might be useful. It also attaches sentiment scores using a variety of sentiment algorithms.

We’ll combine the features generated this way with the original tweet data and a new column, sing_pron, for tweets which have at least one singular pronoun.

We can summarise the overlap between tweets with disclaimers and those which have first-person singular references.
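
As a rough illustration of the flag, the sketch below counts first-person singular pronouns with a regular expression in base R and converts the count into a 0/1 indicator. The example texts and the pronoun list are assumptions made for illustration; the list used internally by textfeatures may differ.

```r
# Hypothetical tweet texts.
text <- c(
  "I am looking forward to this talk",
  "Join us at our annual conference",
  "Great paper from my colleagues",
  "IoT and AI in healthcare"
)

# Assumed list of first-person singular pronouns, matched as whole words.
pattern <- "\\b(i|me|my|mine|myself)\\b"

# Count matches per text; a text with no match gives a count of zero.
matches <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)
n_first_person <- lengths(regmatches(text, matches))

# 1 if the text has at least one singular pronoun, otherwise 0.
sing_pron <- as.integer(n_first_person > 0)
```

Matching on word boundaries keeps tokens such as "IoT" and "AI" from being counted as the pronoun "I".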

views  sing_pron     n
    0          0  2612
    0          1   324
    1          0    73
    1          1   181
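
To read the overlap as proportions rather than raw counts, the table can be re-entered by hand and converted to row shares. This is a small worked sketch in base R; only the four counts come from the table, and the object names are illustrative.

```r
# Counts copied from the table: rows are views (0/1), columns are sing_pron (0/1).
counts <- matrix(
  c(2612, 324,
      73, 181),
  nrow = 2, byrow = TRUE,
  dimnames = list(views = c("0", "1"), sing_pron = c("0", "1"))
)

# Totals: all tweets, and tweets per value of views.
total <- sum(counts)
by_views <- rowSums(counts)

# Share of each sing_pron value within each value of views.
row_share <- prop.table(counts, margin = 1)
round(100 * row_share, 1)
```

On these counts, about 71% of the tweets flagged with a disclaimer (181 of 254) also carry a first-person singular pronoun, against about 11% of the unflagged tweets (324 of 2,936).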

Idea 3 - named entity recognition

Using Natural Language Processing techniques…

This idea makes use of the spacyr package, an R interface to spaCy. spaCy's language models are trained on large text corpora to recognise entity categories, including people's names and organisations. We can then annotate the combined name and description text (desc1) with entity labels, record whether it appears to come from a person, an organisation (or both), and combine this with the annotations from the first two ideas. This gives a number of tweet categories, summarised in the table.

## Observations: 3,190
## Variables: 229
## $ user_id                 <chr> "983658091874054144", "983658091874054144", "…
## $ status_id               <chr> "1212881931458752517", "1212876982377230337",…
## $ created_at              <dttm> 2020-01-02 23:42:41, 2020-01-02 23:23:01, 20…
## $ screen_name             <chr> "PublicHealthBot", "PublicHealthBot", "Public…
## $ text                    <chr> "Manager, Epidemiology Analytics\nBuckinghams…
## $ source                  <chr> "PHDS bot mark 2", "PHDS bot mark 2", "PHDS b…
## $ display_text_width      <dbl> 120, 140, 144, 113, 140, 120, 140, 140, 140, …
## $ reply_to_status_id      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ reply_to_user_id        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ reply_to_screen_name    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ is_quote                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ is_retweet              <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU…
## $ favorite_count          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ retweet_count           <int> 1, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 1, 4, 1, …
## $ quote_count             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ reply_count             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hashtags                <list> ["Epijobs", NA, "AI", NA, NA, NA, NA, NA, <"…
## $ symbols                 <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ urls_url                <list> ["jobs.jnj.com/jobs/101019121…", NA, NA, NA,…
## $ urls_t.co               <list> ["https://t.co/X4nIl0LapT", NA, NA, NA, NA, …
## $ urls_expanded_url       <list> ["https://jobs.jnj.com/jobs/1010191211?lang=…
## $ media_url               <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ media_t.co              <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ media_expanded_url      <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ media_type              <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ext_media_url           <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ext_media_t.co          <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ext_media_expanded_url  <list> [NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ext_media_type          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ mentions_user_id        <list> ["632184479", <"943222483326513152", "119496…
## $ mentions_screen_name    <list> ["Epijobs", <"sami_barrit", "VPrasadMDMPH">,…
## $ lang                    <chr> "en", "en", "en", "en", "en", "en", "en", "en…
## $ quoted_status_id        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_text             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_created_at       <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ quoted_source           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_favorite_count   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_retweet_count    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_user_id          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_screen_name      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_name             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_followers_count  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_friends_count    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_statuses_count   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_location         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_description      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ quoted_verified         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ retweet_status_id       <chr> "1212881274391670785", "1212873956149075968",…
## $ retweet_text            <chr> "Manager, Epidemiology Analytics\nBuckinghams…
## $ retweet_created_at      <dttm> 2020-01-02 23:40:04, 2020-01-02 23:10:59, 20…
## $ retweet_source          <chr> "Twitter Web App", "Twitter for iPhone", "Lin…
## $ retweet_favorite_count  <int> 3, 4, 3, 1, 1, 6, 2, 1, 1, 2, 2, 1, 4, 3, 2, …
## $ retweet_retweet_count   <int> 1, 1, 1, 1, 2, 3, 2, 1, 1, 1, 1, 1, 1, 4, 1, …
## $ retweet_user_id         <chr> "632184479", "943222483326513152", "306053959…
## $ retweet_screen_name     <chr> "Epijobs", "sami_barrit", "StephTweetChat", "…
## $ retweet_name            <chr> "Epi Job Openings", "Barrit Sami", "Steph S. …
## $ retweet_followers_count <int> 1058, 88, 7544, 341, 567, 2809, 12099, 55, 12…
## $ retweet_friends_count   <int> 91, 797, 6984, 685, 2059, 702, 344, 68, 481, …
## $ retweet_statuses_count  <int> 41375, 789, 8952, 314, 664, 10499, 16069, 666…
## $ retweet_location        <chr> "", "Les Marolles", "St Louis, MO", "Philadel…
## $ retweet_description     <chr> "Your \"ONE STOP\" for the latest Job Opportu…
## $ retweet_verified        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ place_url               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ place_name              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ place_full_name         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ place_type              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ country                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ country_code            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ geo_coords              <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ coords_coords           <list> [<NA, NA>, <NA, NA>, <NA, NA>, <NA, NA>, <NA…
## $ bbox_coords             <list> [<NA, NA, NA, NA, NA, NA, NA, NA>, <NA, NA, …
## $ status_url              <chr> "https://twitter.com/PublicHealthBot/status/1…
## $ name                    <chr> "Public Health Data Science", "Public Health …
## $ location                <chr> "England, United Kingdom", "England, United K…
## $ description             <chr> "Interested in #PublicHealth and #DataScience…
## $ url                     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ protected               <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ followers_count         <int> 755, 755, 755, 755, 755, 755, 755, 755, 755, …
## $ friends_count           <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ listed_count            <int> 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 1…
## $ statuses_count          <int> 10620, 10620, 10620, 10620, 10620, 10620, 106…
## $ favourites_count        <int> 9945, 9945, 9945, 9945, 9945, 9945, 9945, 994…
## $ account_created_at      <dttm> 2018-04-10 10:48:59, 2018-04-10 10:48:59, 20…
## $ verified                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ profile_url             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ profile_expanded_url    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ account_lang            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ profile_banner_url      <chr> "https://pbs.twimg.com/profile_banners/983658…
## $ profile_background_url  <chr> "http://abs.twimg.com/images/themes/theme1/bg…
## $ profile_image_url       <chr> "http://pbs.twimg.com/profile_images/98366123…
## $ views                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, …
## $ n_urls                  <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ n_uq_urls               <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ n_hashtags              <int> 4, 0, 2, 0, 1, 0, 0, 1, 0, 0, 1, 0, 6, 0, 11,…
## $ n_uq_hashtags           <int> 4, 0, 2, 0, 1, 0, 0, 1, 0, 0, 1, 0, 6, 0, 11,…
## $ n_mentions              <int> 0, 1, 1, 0, 2, 0, 0, 0, 0, 0, 4, 0, 1, 0, 0, …
## $ n_uq_mentions           <int> 0, 1, 1, 0, 2, 0, 0, 0, 0, 0, 4, 0, 1, 0, 0, …
## $ n_chars                 <int> 129, 21, 127, 124, 99, 136, 121, 144, 36, 133…
## $ n_uq_chars              <int> 34, 13, 37, 28, 27, 33, 33, 27, 18, 38, 25, 1…
## $ n_commas                <int> 0, 0, 1, 1, 1, 2, 0, 0, 0, 1, 0, 3, 3, 4, 0, …
## $ n_digits                <int> 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, …
## $ n_exclaims              <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ n_extraspaces           <int> 1, 0, 1, 0, 1, 1, 0, 0, 0, 3, 4, 0, 0, 0, 1, …
## $ n_lowers                <int> 97, 20, 104, 113, 92, 116, 98, 138, 32, 109, …
## $ n_lowersp               <dbl> 0.7538462, 0.9545455, 0.8203125, 0.9120000, 0…
## $ n_periods               <int> 1, 0, 1, 0, 1, 2, 2, 1, 0, 4, 0, 0, 2, 3, 0, …
## $ n_words                 <int> 17, 3, 19, 19, 15, 22, 22, 17, 3, 23, 13, 5, …
## $ n_uq_words              <int> 17, 3, 16, 15, 15, 20, 19, 17, 3, 22, 11, 5, …
## $ n_caps                  <int> 22, 0, 13, 9, 3, 10, 18, 4, 4, 14, 3, 1, 18, …
## $ n_nonasciis             <int> 0, 4, 0, 0, 0, 12, 3, 0, 0, 0, 0, 0, 0, 0, 16…
## $ n_puncts                <int> 9, 0, 8, 1, 2, 4, 3, 1, 0, 4, 5, 1, 9, 2, 17,…
## $ n_capsp                 <dbl> 0.17692308, 0.04545455, 0.10937500, 0.0800000…
## $ n_charsperword          <dbl> 7.222222, 5.500000, 6.400000, 6.250000, 6.250…
## $ sent_afinn              <dbl> 1, 0, -3, 0, 3, 0, 1, 1, 0, 2, 4, 0, -1, 0, 2…
## $ sent_bing               <dbl> 0, 0, -1, 0, 2, 0, 0, 1, 0, 1, 2, 0, 0, 1, 3,…
## $ sent_syuzhet            <dbl> 0.60, 0.00, 0.05, 2.20, 1.95, 0.00, 0.85, 0.7…
## $ sent_vader              <dbl> 0.4, 0.0, -1.5, 0.0, 5.0, 0.0, 2.3, 1.9, 0.0,…
## $ n_polite                <dbl> 0.5000000, 0.0000000, -0.6250000, 0.0000000, …
## $ n_first_person          <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, …
## $ n_first_personp         <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ n_second_person         <int> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ n_second_personp        <int> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, …
## $ n_third_person          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ n_tobe                  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, …
## $ n_prepositions          <int> 2, 0, 0, 3, 3, 4, 2, 2, 0, 2, 1, 0, 1, 1, 0, …
## $ w1                      <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w2                      <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w3                      <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w4                      <dbl> 0.0000000, 0.0000000, 0.1176471, 0.0000000, 0…
## $ w5                      <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0588235…
## $ w6                      <dbl> 0.00000000, 0.00000000, 0.05882353, 0.0000000…
## $ w7                      <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w8                      <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w9                      <dbl> 0.00000000, 0.00000000, 0.05882353, 0.0000000…
## $ w10                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w11                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w12                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w13                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ w14                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.1176470…
## $ w15                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w16                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w17                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w18                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w19                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w20                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w21                     <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0…
## $ w22                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w23                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w24                     <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0…
## $ w25                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w26                     <dbl> 0.00000000, 0.00000000, 0.11764706, 0.0000000…
## $ w27                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w28                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w29                     <dbl> 0.16666667, 0.00000000, 0.00000000, 0.0000000…
## $ w30                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w31                     <dbl> 0.00000000, 0.00000000, 0.05882353, 0.1176470…
## $ w32                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w33                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w34                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w35                     <dbl> 0.00000000, 0.50000000, 0.00000000, 0.0000000…
## $ w36                     <dbl> 0.00000000, 0.00000000, 0.05882353, 0.0000000…
## $ w37                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w38                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.1176470…
## $ w39                     <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0…
## $ w40                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0588235…
## $ w41                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w42                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w43                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w44                     <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0…
## $ w45                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ w46                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0588235…
## $ w47                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w48                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w49                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w50                     <dbl> 0.00000000, 0.00000000, 0.05882353, 0.0000000…
## $ w51                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0588235…
## $ w52                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w53                     <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0…
## $ w54                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0588235…
## $ w55                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w56                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0588235…
## $ w57                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w58                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w59                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w60                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w61                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w62                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w63                     <dbl> 0.00000000, 0.00000000, 0.05882353, 0.0000000…
## $ w64                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ w65                     <dbl> 0.00000000, 0.00000000, 0.29411765, 0.0000000…
## $ w66                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w67                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ w68                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w69                     <dbl> 0.00000000, 0.00000000, 0.05882353, 0.0588235…
## $ w70                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w71                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.1176470…
## $ w72                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w73                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w74                     <dbl> 0.00000000, 0.50000000, 0.05882353, 0.0000000…
## $ w75                     <dbl> 0.16666667, 0.00000000, 0.00000000, 0.0000000…
## $ w76                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w77                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w78                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w79                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w80                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w81                     <dbl> 0.11111111, 0.00000000, 0.00000000, 0.0000000…
## $ w82                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w83                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w84                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w85                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w86                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0588235…
## $ w87                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w88                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w89                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w90                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w91                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w92                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w93                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w94                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w95                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ w96                     <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ w97                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w98                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0588235…
## $ w99                     <dbl> 0.05555556, 0.00000000, 0.00000000, 0.0000000…
## $ w100                    <dbl> 0.00000000, 0.00000000, 0.00000000, 0.0000000…
## $ sing_pron               <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, …
## $ desc1                   <chr> "Epi Job Openings Your \"ONE STOP\" for the l…
## $ row                     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
## $ doc_id                  <chr> "text1", "text2", "text3", "text4", "text5", …
## Found 'spacy_condaenv'. spacyr will use this environment
## successfully initialized (spaCy Version: 2.0.16, language model: en)
## (python options: type = "condaenv", value = "spacy_condaenv")
##   doc_id sentence_id token_id    token    lemma   pos  tag     entity
## 1  text1           1        1      Epi      epi PROPN  NNP      ORG_B
## 2  text1           1        2      Job      job PROPN  NNP      ORG_I
## 3  text1           1        3 Openings openings PROPN  NNP      ORG_I
## 4  text1           2        1     Your   -PRON-   ADJ PRP$           
## 5  text1           2        2        "        " PUNCT   ``           
## 6  text1           2        3      ONE      one   NUM   CD CARDINAL_B
##   nounphrase whitespace
## 1        beg       TRUE
## 2        mid       TRUE
## 3   end_root       TRUE
## 4        beg       TRUE
## 5        mid      FALSE
## 6        mid       TRUE
annotated_description person
Socially Determined Technology startup based in Washington [ GPE_B ] , DC [ GPE_B ] | Using # DataAnalytics [ WORK_OF_ART_B ] to improve health outcomes | Experts in the [ ORG_B ] Social [ ORG_I ] Determinants [ ORG_I ] of [ ORG_I ] Health [ ORG_I ] ( # SDOH ) org
Pharma [ ORG_B ] Tech [ ORG_I ] Outlook [ ORG_I ] Pharma Tech Outlook has contributors from the most established organizations who have been presenting their viewpoint using this unique print platform .. Pharma [ ORG_B ] Tech [ ORG_I ] Outlook [ ORG_I ] Pharma Tech Outlook has contributors from the most established organizations who have been presenting their viewpoint using this unique print platform . org
postDICOM PostDICOM is the [ DATE_B ] next [ DATE_I ] generation [ DATE_I ] # PACS [ PERSON_B ] which is built using Cloud [ ORG_B ] technologies .. It is especially designed for medical students , # doctors and # hospitals .. It is especially designed for medical students , # doctors and # hospitals . person
caitie hawley global health professional working at UW [ ORG_B ] .. 3/31/18 .. all views my own .. all views my own .. she / her / hers. she / her / hers org
INCLINE INCLINE is a [ ORG_B ] Climate [ ORG_I ] Change [ ORG_I ] Research [ ORG_I ] Support [ ORG_I ] Center [ ORG_I ] , created in 2011 [ DATE_B ] at the [ ORG_B ] University [ ORG_I ] of [ ORG_I ] São [ ORG_I ] Paulo [ ORG_I ] ( [ ORG_I ] USP [ ORG_I ] ) , Brazil [ GPE_B ] . org
Mark [ PERSON_B ] Pienaar [ PERSON_I ] person
Peter [ PERSON_B ] Daszak [ PERSON_I ] @EcoHealthNYC President .. @theNASEM. Forum on # microbialthreats Chair [ ORG_B ] .. Zoologist .. Parasitologist .. Ecologist .. British [ NORP_B ] .. American [ NORP_B ] . person
UNC Global Solutions Twitter home of Research [ ORG_B ] , [ ORG_I ] Innovation [ ORG_I ] and [ ORG_I ] Global [ ORG_I ] Solutions [ ORG_I ] at UNC [ ORG_B ] Gillings [ ORG_I ] School [ ORG_I ] of [ ORG_I ] Global [ ORG_I ] Public [ ORG_I ] Health [ ORG_I ] .. Find out about global health events and news here ! org
Adelaide Terracciano Digital Evangelist | Technology Enthusiast | Talk to me about [ MONEY_B ] # [ MONEY_I ] DigitalStrategy [ MONEY_I ] # [ MONEY_I ] Futurism # AI # IoT [ MONEY_B ]. Adelaide Terracciano Digital Evangelist | Technology Enthusiast | Talk to me about [ MONEY_B ] # [ MONEY_I ] DigitalStrategy [ MONEY_I ] # [ MONEY_I ] Futurism # AI # IoT [ MONEY_B ] org
Lisa [ PERSON_B ] Sullivan [ PERSON_I ] Principal-#Foresight [ PERSON_I ] & [ PERSON_I ] # Strategy at INFUSE [ ORG_B ] Corp. [ ORG_I ]. # [ MONEY_B ] education [ MONEY_I ] # [ MONEY_B ] workforce [ MONEY_I ]. # [ MONEY_B ] emergingtech [ MONEY_I ] # socimp # [ MONEY_B ] Futurist [ MONEY_I ] # [ MONEY_B ] Strategist [ MONEY_I ] # [ MONEY_I ] LongNow9125 [ GPE_B ] # FutureReady [ MONEY_B ] person
Pulse Lab Kampala Pulse Lab Kampala [ ORG_B ] is a UN [ ORG_B ] inter - agency [ NORP_B ] initiative , exploring opportunities for big data & [ ORG_B ] data innovation to achieve the [ LOC_B ] Global [ LOC_I ] Goals [ LOC_I ] .. http://t.co/RiOsbeH1ey org
CDC [ ORG_B ] Genomics [ ORG_I ] & [ ORG_I ] Precision [ ORG_I ] Health [ ORG_I ] CDC [ ORG_I ] Office [ ORG_I ] of [ ORG_I ] Genomics [ ORG_I ] and [ ORG_I ] Precision [ ORG_I ] Public [ ORG_I ] Health [ ORG_I ] : Using Genomics [ ORG_B ] & [ ORG_I ] Precision [ ORG_I ] Health [ ORG_I ] to Save Lives and Prevent Disease .. Links ≠ Endorsements . org
Reason [ ORG_B ] Digital [ ORG_I ] Working [ ORG_I ] exclusively in digital with projects that do social good .. Follow us for advice on websites , mobile , apps , email , innovation and more .. Follow us for advice on websites , mobile , apps , email , innovation and more . org
into .. AI -. The Global AI Ecosystem # intoAI get into AI [ GPE_B ] .. The # [ CARDINAL_B ] AI Knowledge and Global [ ORG_B ] Community [ ORG_I ] Platform [ ORG_I ] .. Helps you learn more about [ MONEY_B ] # [ MONEY_I ] AI [ MONEY_I ] .. Helps you learn more about [ MONEY_B ] # [ MONEY_I ] AI [ MONEY_I ] .. See @Neurons_AI , @Awards_AI @Events_AI @RoboticsandAI. # intoAI org
Edgar [ PERSON_B ] Garcia [ PERSON_I ]. That is why!!!/A mi me daban 2 [ CARDINAL_B ] ! ! !. That is why!!!/A mi me daban 2 [ CARDINAL_B ] ! ! ! person
Joseph [ PERSON_B ] Loh [ PERSON_I ] # PrivateEquity [ ORG_B ] and # Investment [ PERSON_B ] Expert [ PERSON_I ] in # EmergingMarkets [ MONEY_B ] .. Click the link to learn the basics of # PrivateEquity [ MONEY_B ] ( # PE [ MONEY_B ] ) . person
bryan gottsman Focus [ ORG_B ] on [ ORG_I ] the edge , AI [ PERSON_B ] , emerging tech .. Background in # HealthIT [ MONEY_B ] .. Husband [ PERSON_B ] and father making it happen every day for my family. Husband [ PERSON_B ] and father making it happen every day for my family person
Paul [ PERSON_B ] Blaser [ PERSON_I ] Data [ PERSON_I ] design , data visualization , data architecture , data quality .. Munging data in python , SQL [ ORG_B ] , & [ WORK_OF_ART_B ] .net .. # datavisualization # [ MONEY_B ] dataviz [ MONEY_I ] # [ CARDINAL_B ] dataisbeautiful # Tableau [ MONEY_B ] person
David [ PERSON_B ] Harris [ PERSON_I ] CEO , American [ ORG_B ] Jewish [ ORG_I ] Committee [ ORG_I ] ( AJC ) …. “. Too often we enjoy the comfort of opinion without the discomfort of thought .. Too often we enjoy the comfort of opinion without the discomfort of thought ..” - President John [ PERSON_B ] F. [ PERSON_I ] Kennedy [ PERSON_I ] person
CDC [ ORG_B ] Genomics [ ORG_I ] & [ ORG_I ] Precision [ ORG_I ] Health [ ORG_I ] CDC [ ORG_I ] Office [ ORG_I ] of [ ORG_I ] Genomics [ ORG_I ] and [ ORG_I ] Precision [ ORG_I ] Public [ ORG_I ] Health [ ORG_I ] : Using Genomics [ ORG_B ] & [ ORG_I ] Precision [ ORG_I ] Health [ ORG_I ] to Save Lives and Prevent Disease .. Links ≠ Endorsements . org
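
The labels in the sample above are consistent with a simple rule: a description is labelled person if any of its tokens carries a PERSON entity tag, and org otherwise. That rule is inferred from the sample rather than stated in this note, so the sketch below should be read as an assumption. It uses a toy token table with the same doc_id, token and entity columns as the parsed output; the third document is made up.

```r
# Toy token table mimicking the parsed output (doc_id, token, entity).
tokens <- data.frame(
  doc_id = c("text1", "text1", "text1", "text2", "text2", "text3"),
  token  = c("Epi", "Job", "Openings", "Mark", "Pienaar", "hello"),
  entity = c("ORG_B", "ORG_I", "ORG_I", "PERSON_B", "PERSON_I", ""),
  stringsAsFactors = FALSE
)

# TRUE for each document that has at least one PERSON-tagged token.
is_person_token <- grepl("^PERSON_", tokens$entity)
has_person <- vapply(split(is_person_token, tokens$doc_id), any, logical(1))

# Assumed rule: any PERSON tag gives "person", everything else gives "org".
label <- ifelse(has_person, "person", "org")
label
```

A rule of this kind inherits the tagger's mistakes: in the sample, postDICOM is labelled person because "PACS" is tagged as a PERSON entity.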

Labelled sentences

Finalising


desc1
Brian Martin M2M / IoT Sales Professional. Currently at Vodafone IoT (hence the tweets). Keeping networks, machines and people connected. Passionate about Life & Rugby.
Monique Thornton, MPH #Healthcomm: @Westat | Editor: @LTPHmedia | Writer: @theddrj | Let's talk: #socialmedia, #edutainment, #digitalhealth | Ops mine. #amwriting #publichealth
Bee 布瑞雅 Epidemiologist in-training @queensu, artist by design. | Formerly @iepi_mcmaster, @McMaster_MI @ccghr| Bridging data ethics & infectious disease | She/her.
Kayla Beth Translator terrestrielle / PhD student @uniofoxford @OxfordDemSci / researching experiences of air quality / views are my own
Ashley Perry Senior Vice President, Socially Determined. Improving the health of communities and the performance of health care organizations through #SDOH #analytics.
Melinda Mills Professor University of Oxford, @NuffieldCollege, Director @OxfordDemSci PI @sociogenome project
MPIDR Max Planck Institute for Demographic Research (MPIDR), Rostock, Germany. News & press releases.
Center for Innovative Design and Analysis The Center of Innovative Design and Analysis is part of the Department of Biostatistics & Informatics in the @Coloradosph at @CUAnschutz.
𝕍𝕠𝕝𝕜𝕒𝕟 𝕋𝕠𝕡𝕒𝕝𝕝𝕚 Criminologist | Andrew Young School of Policy Studies | Into Urban Crime, Cybercrime, Future Crime, Cashless Economy, Offender Psychology, Crime Policy, Crypto
Sam Konneh Entrepreneurial blogger, content writer seeking to merge my vast business, governance knowledge with the exciting SEO, marketing and emerging technologies