Introduction

This report compares the volume, behavior, content, popularity, and sentiment of the tweets made by barstoolbigcat and pftcommenter.

## # A tibble: 2 x 2
##           person     n
##            <chr> <int>
## 1 barstoolbigcat  3195
## 2   pftcommenter  3195
## # A tibble: 2 x 2
##           person           timestamp
##            <chr>              <dttm>
## 1 barstoolbigcat 2017-08-13 18:56:08
## 2   pftcommenter 2017-03-16 11:32:41

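3,195 tweets have been collected for each account, barstoolbigcat and pftcommenter; the identical counts likely reflect the Twitter API's limit of roughly 3,200 recent tweets per user timeline. Note that the oldest collected tweet from barstoolbigcat is from 2017-08-13 18:56:08, while the oldest from pftcommenter is from 2017-03-16 11:32:41. Because the counts are equal, barstoolbigcat's tweets cover a much shorter period, which suggests (assuming both timelines were collected at the same time) that barstoolbigcat tweets considerably more often.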
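
The counts and oldest timestamps above are per-account group summaries. The original analysis appears to have been done in R (the output is printed as tibbles), so the Python sketch below only illustrates the same summary; the tweets list is hypothetical sample data, not the collected tweets.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows (person, created_at); not the collected data.
tweets = [
    ("barstoolbigcat", "2017-08-13 18:56:08"),
    ("barstoolbigcat", "2017-08-20 09:00:00"),
    ("pftcommenter", "2017-03-16 11:32:41"),
    ("pftcommenter", "2017-05-02 14:10:00"),
    ("pftcommenter", "2017-09-01 08:00:00"),
]


def summarize(rows):
    """Return {person: (tweet_count, oldest_timestamp)} for (person, timestamp) rows."""
    counts = Counter(person for person, _ in rows)
    oldest = {}
    for person, stamp in rows:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if person not in oldest or ts < oldest[person]:
            oldest[person] = ts
    return {person: (counts[person], oldest[person]) for person in counts}


for person, (n, first) in sorted(summarize(tweets).items()):
    print(f"{person}: {n} tweets, oldest {first}")
```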

Tweet Volume

How often do barstoolbigcat and pftcommenter tweet? Does tweet volume vary across time periods (e.g., month or day of the week)?

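Is the variation in tweet volume across a given time period statistically significant? Here, I use the chi-squared goodness-of-fit test. If the p-value is below some threshold (e.g., 0.05), then I can reject the null hypothesis that tweets are uniformly distributed across that period. Every test below has p < 0.05, so our tweet volume does appear to differ by month and by day of the week. One caveat: neither set of collected tweets spans a full year, so some months necessarily have no tweets, and the month-level tests (df = 11, i.e. 12 months) partly reflect the collection window rather than a real seasonal pattern.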

## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 18268, df = 11, p-value < 2.2e-16
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 2443.4, df = 11, p-value < 2.2e-16
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 53.077, df = 6, p-value = 1.132e-09
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 171.66, df = 6, p-value < 2.2e-16
## 
## [1] 0.9813945
## [1] 1.312177
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 53.156, df = 6, p-value = 1.091e-09
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 126.2, df = 6, p-value < 2.2e-16
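
For one set of counts, the chi-squared goodness-of-fit statistic against a uniform null is the sum over categories of (observed - expected)^2 / expected, with one fewer degree of freedom than there are categories (df = 6 for seven weekdays, df = 11 for twelve months). Below is a minimal Python sketch, assuming the counts per category are already tabulated and that empty categories are kept as zeros. The 5% critical values are standard table values, the weekday counts in the example are made up, and the sketch computes only the statistic, not the exact p-value that R's chisq.test reports.

```python
from collections import Counter
from datetime import datetime

# 5% critical values of the chi-squared distribution for the degrees of
# freedom in the tests above (7 weekdays -> df 6, 12 months -> df 11).
CRITICAL_05 = {6: 12.592, 11: 19.675}


def chi_squared_uniform(counts):
    """Pearson chi-squared statistic against a uniform null.

    counts: observed frequency per category, zeros included.
    Returns (statistic, degrees_of_freedom).
    """
    k = len(counts)
    expected = sum(counts) / k  # same expected count in every category
    stat = sum((obs - expected) ** 2 / expected for obs in counts)
    return stat, k - 1


def weekday_counts(timestamps):
    """Tweets per day of week (Monday=0 ... Sunday=6), keeping empty days as 0."""
    c = Counter(datetime.strptime(t, "%Y-%m-%d %H:%M:%S").weekday() for t in timestamps)
    return [c.get(day, 0) for day in range(7)]


# Made-up weekday counts (Monday ... Sunday) for illustration.
stat, df = chi_squared_uniform([120, 95, 100, 110, 105, 60, 40])
print(f"X-squared = {stat:.2f}, df = {df}, reject uniform at 5%: {stat > CRITICAL_05[df]}")
```

Comparing the statistic with the critical value for its df gives the same reject or fail-to-reject decision at the 5% level as comparing the p-value with 0.05.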

Tweet Behavior

How often do we use hashtags, include links, retweet, and reply?

## # A tibble: 24 x 4
##            person     type response  value
##  *          <chr>    <chr>    <chr>  <dbl>
##  1 barstoolbigcat  hashtag      yes 0.0808
##  2   pftcommenter  hashtag      yes 0.0576
##  3 barstoolbigcat hashtag2      yes 0.0773
##  4   pftcommenter hashtag2      yes 0.0538
##  5 barstoolbigcat  hashtag       no 0.9192
##  6   pftcommenter  hashtag       no 0.9424
##  7 barstoolbigcat hashtag2       no 0.9227
##  8   pftcommenter hashtag2       no 0.9462
##  9 barstoolbigcat     link      yes 0.5183
## 10   pftcommenter     link      yes 0.3865
## # ... with 14 more rows
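
The values above are per-account proportions: the number of tweets with a feature divided by that account's total, with the "no" share equal to one minus the "yes" share (0.0808 + 0.9192 = 1 for barstoolbigcat's hashtags). The Python sketch below computes such proportions. The field names (text, is_retweet, reply_to) and the regular expressions for hashtags and links are my assumptions, and the sketch does not try to reproduce the unexplained hashtag2 category.

```python
import re
from collections import defaultdict

# Hypothetical tweet records; the field names are assumptions.
tweets = [
    {"person": "barstoolbigcat", "text": "Big game tonight #goblue https://t.co/x",
     "is_retweet": False, "reply_to": None},
    {"person": "barstoolbigcat", "text": "RT @pardonmytake: new episode",
     "is_retweet": True, "reply_to": None},
    {"person": "pftcommenter", "text": "@barstoolbigcat literaly no",
     "is_retweet": False, "reply_to": "barstoolbigcat"},
    {"person": "pftcommenter", "text": "ppl shoud watch this https://t.co/y",
     "is_retweet": False, "reply_to": None},
]

HASHTAG = re.compile(r"#\w+")
LINK = re.compile(r"https?://\S+")

# Each feature maps a tweet to True/False.
FEATURES = {
    "hashtag": lambda t: bool(HASHTAG.search(t["text"])),
    "link": lambda t: bool(LINK.search(t["text"])),
    "rt": lambda t: t["is_retweet"],
    "reply": lambda t: t["reply_to"] is not None,
}


def feature_shares(rows):
    """Return {(person, feature): share of that person's tweets with the feature}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t in rows:
        totals[t["person"]] += 1
        for name, test in FEATURES.items():
            hits[(t["person"], name)] += test(t)  # True counts as 1
    return {(p, name): hits[(p, name)] / totals[p] for p in totals for name in FEATURES}


for (person, feature), share in sorted(feature_shares(tweets).items()):
    print(f"{person:15} {feature:8} yes={share:.2f} no={1 - share:.2f}")
```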

Tweet Content

How long are our tweets?

## # A tibble: 1 x 3
##   char_count_count char_count_avg char_count_max
##              <int>          <dbl>          <dbl>
## 1               78       304.3846           1354
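
A length summary is a count, mean, and maximum of per-tweet character counts. One thing worth checking: an average of about 304 characters and a maximum of 1,354 exceed the 140-character limit that applied to most tweets in 2017, so the measured text may contain more than the visible tweet. One guess, which the output does not confirm, is emoji stored as escape sequences such as <U+009F>; the word tables further down contain tokens like 009f and f0 that look like fragments of encoded emoji. The Python sketch below shows how such escapes would inflate a length count; the escape pattern and the sample text are assumptions.

```python
import re
from statistics import mean

# Literal escape sequences such as "<U+009F>". Whether the collected text
# contains these is an assumption, not something the report shows.
ESCAPE = re.compile(r"<U\+[0-9A-Fa-f]{4,8}>")


def length_summary(texts, collapse_escapes=True):
    """Return (count, average length, max length) for a list of tweet texts."""
    if collapse_escapes:
        # Count each escape sequence as a single character.
        texts = [ESCAPE.sub("?", t) for t in texts]
    lengths = [len(t) for t in texts]
    return len(lengths), mean(lengths), max(lengths)


# Hypothetical text: one emoji written as four UTF-8 byte escapes.
sample = ["so good <U+00F0><U+009F><U+0098><U+0082>", "ok"]
print(length_summary(sample, collapse_escapes=False))
print(length_summary(sample))
```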

Word Frequency and Usage

Which words are we most likely to use?

## # A tibble: 1,567 x 3
##       screen_name  word          created_at
##             <chr> <chr>               <chr>
##  1 BarstoolBigCat  009f 2017-08-14 23:58:56
##  2 BarstoolBigCat  008d 2017-08-14 23:58:56
##  3 BarstoolBigCat  009f 2017-08-14 23:58:56
##  4 BarstoolBigCat  008d 2017-08-14 23:58:56
##  5 BarstoolBigCat  009f 2017-08-14 23:58:56
##  6 BarstoolBigCat  008d 2017-08-14 23:58:56
##  7 BarstoolBigCat  009f 2017-08-15 14:56:29
##  8 BarstoolBigCat  00a3 2017-08-15 14:56:29
##  9 BarstoolBigCat  009f 2017-08-15 14:56:29
## 10 BarstoolBigCat  00a3 2017-08-15 14:56:29
## # ... with 1,557 more rows
## # A tibble: 2 x 2
##           person total
##            <chr> <int>
## 1 barstoolbigcat 18454
## 2   pftcommenter 19860
## # A tibble: 13,812 x 5
## # Groups:   person [2]
##            person            word     n total        freq
##             <chr>           <chr> <int> <int>       <dbl>
##  1 barstoolbigcat @barstoolbigcat   497 18454 0.026931830
##  2   pftcommenter              f0   440 19860 0.022155086
##  3 barstoolbigcat              f0   372 18454 0.020158231
##  4   pftcommenter   @pftcommenter   283 19860 0.014249748
##  5 barstoolbigcat   @pardonmytake   264 18454 0.014305842
##  6   pftcommenter   @pardonmytake   243 19860 0.012235650
##  7   pftcommenter @barstoolbigcat   234 19860 0.011782477
##  8 barstoolbigcat        football   152 18454 0.008236697
##  9   pftcommenter              ur   134 19860 0.006747231
## 10 barstoolbigcat            time   119 18454 0.006448466
## # ... with 13,802 more rows
## # A tibble: 11,447 x 3
##                 word barstoolbigcat pftcommenter
##                <chr>          <dbl>        <dbl>
##  1        #bleedblue   5.418879e-05 5.035247e-05
##  2            #class   5.418879e-05 5.035247e-05
##  3             #done   5.418879e-05 5.035247e-05
##  4       #draftkings   5.418879e-05 5.035247e-05
##  5            #elite   5.418879e-05 5.035247e-05
##  6         #fraudsun   5.418879e-05 5.035247e-05
##  7           #goblue   5.418879e-05 5.035247e-05
##  8              #lsu   5.418879e-05 5.035247e-05
##  9   #notcounterfeit   5.418879e-05 5.035247e-05
## 10 #parentaladvisory   5.418879e-05 5.035247e-05
## # ... with 11,437 more rows
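
The freq column above is each word's count divided by that account's total word count (for example, 497 / 18,454 ≈ 0.0269 for @barstoolbigcat). Here is a Python sketch of that tokenize, filter, count, and divide pipeline. The tokenizer regex and the tiny stop-word list are illustrative assumptions and will not reproduce the exact tokens in the output; like the output, the sketch keeps @mentions and #hashtags as words.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; a real analysis would use a full list.
STOP_WORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "to"}
# Keep letters, digits, underscores, apostrophes, and the @ / # prefixes.
TOKEN = re.compile(r"[a-z0-9_#@']+")


def tokenize(text):
    """Lowercase, drop links and '&amp;', then split into word tokens."""
    text = re.sub(r"https?://\S+|&amp;", " ", text.lower())
    return [w for w in TOKEN.findall(text) if w.strip("'") and w not in STOP_WORDS]


def word_frequencies(rows):
    """rows: (person, text) pairs. Returns {person: [(word, n, total, freq), ...]}."""
    counts = defaultdict(Counter)
    for person, text in rows:
        counts[person].update(tokenize(text))
    result = {}
    for person, counter in counts.items():
        total = sum(counter.values())
        result[person] = [(w, n, total, n / total) for w, n in counter.most_common()]
    return result


# Hypothetical sample tweets.
sample = [
    ("barstoolbigcat", "The Bears and the Cubs @pardonmytake"),
    ("barstoolbigcat", "Cubs win https://t.co/abc"),
    ("pftcommenter", "literaly the best @barstoolbigcat"),
]
for person, rows in word_frequencies(sample).items():
    print(person, rows[:3])
```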

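Which words do we use at similar rates, and which are distinctive to one of us? In the tables below, logratio compares barstoolbigcat's frequency for a word with pftcommenter's: positive values mark words barstoolbigcat uses relatively more often, negative values mark words pftcommenter favors, and values near zero mark words we use at similar rates. The first table lists the words most distinctive to barstoolbigcat, the second the words closest to zero, and the third the most distinctive words in either direction.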

## # A tibble: 9,237 x 4
##        word barstoolbigcat pftcommenter logratio
##       <chr>          <dbl>        <dbl>    <dbl>
##  1      rpo   0.0015065760 3.902591e-05 3.653369
##  2  #10gawd   0.0010179568 3.902591e-05 3.261327
##  3 facebook   0.0007736471 3.902591e-05 2.986890
##  4     javy   0.0007329289 3.902591e-05 2.932823
##  5   lackey   0.0007329289 3.902591e-05 2.932823
##  6    picks   0.0006922106 3.902591e-05 2.875664
##  7     cubs   0.0012622664 7.805183e-05 2.783291
##  8      bro   0.0006107741 3.902591e-05 2.750501
##  9 rankings   0.0006107741 3.902591e-05 2.750501
## 10    bears   0.0023209414 1.561037e-04 2.699208
## # ... with 9,227 more rows
## # A tibble: 9,237 x 4
##             word barstoolbigcat pftcommenter    logratio
##            <chr>          <dbl>        <dbl>       <dbl>
##  1          book   5.293375e-04 5.463628e-04 -0.03165695
##  2           'em   1.221548e-04 1.170777e-04  0.04245103
##  3    #bleedblue   8.143654e-05 7.805183e-05  0.04245103
##  4        #class   8.143654e-05 7.805183e-05  0.04245103
##  5         #done   8.143654e-05 7.805183e-05  0.04245103
##  6   #draftkings   8.143654e-05 7.805183e-05  0.04245103
##  7 #draftthewins   1.628731e-04 1.561037e-04  0.04245103
##  8        #elite   8.143654e-05 7.805183e-05  0.04245103
##  9     #fraudsun   8.143654e-05 7.805183e-05  0.04245103
## 10       #goblue   8.143654e-05 7.805183e-05  0.04245103
## # ... with 9,227 more rows
## # A tibble: 9,237 x 4
##        word barstoolbigcat pftcommenter  logratio
##       <chr>          <dbl>        <dbl>     <dbl>
##  1 literaly   4.071827e-05 2.068373e-03 -3.927841
##  2      rpo   1.506576e-03 3.902591e-05  3.653369
##  3  actualy   4.071827e-05 1.522011e-03 -3.621111
##  4    shoud   4.071827e-05 1.170777e-03 -3.358746
##  5  #10gawd   1.017957e-03 3.902591e-05  3.261327
##  6      thx   4.071827e-05 1.053700e-03 -3.253386
##  7     woud   4.071827e-05 1.053700e-03 -3.253386
##  8    verse   4.071827e-05 1.014674e-03 -3.215646
##  9   dosent   4.071827e-05 9.366219e-04 -3.135603
## 10      ppl   1.221548e-04 2.692788e-03 -3.093043
## # ... with 9,227 more rows
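
The frequencies in these three tables appear to use add-one smoothing: each account's smallest value is 1 divided by a constant (1 / 3.902591e-05 ≈ 25,624 for pftcommenter), which keeps a word one account never used from producing a zero in the ratio. The log ratio is the natural log of barstoolbigcat's smoothed frequency over pftcommenter's; for rpo, ln(0.0015065760 / 3.902591e-05) ≈ 3.65. A Python sketch under those assumptions (the function name and example counts are hypothetical):

```python
import math
from collections import Counter


def log_ratios(counts_a, counts_b):
    """Natural-log ratio of add-one-smoothed word frequencies, a relative to b.

    counts_a, counts_b: Counter of word -> count for each person.
    Positive: word relatively more frequent for a; negative: for b.
    """
    vocab = set(counts_a) | set(counts_b)
    # Smoothed totals: every word in the shared vocabulary gets +1.
    denom_a = sum(counts_a[w] + 1 for w in vocab)
    denom_b = sum(counts_b[w] + 1 for w in vocab)
    return {
        w: math.log(((counts_a[w] + 1) / denom_a) / ((counts_b[w] + 1) / denom_b))
        for w in vocab
    }


# Hypothetical word counts for each account.
bigcat_counts = Counter({"rpo": 3, "book": 1})
pft_counts = Counter({"literaly": 3, "book": 1})
ratios = log_ratios(bigcat_counts, pft_counts)
distinctive = sorted(ratios.items(), key=lambda kv: abs(kv[1]), reverse=True)
shared = sorted(ratios.items(), key=lambda kv: abs(kv[1]))
print(distinctive)
print(shared)
```

Sorting by the value itself gives the words most distinctive to one account; sorting by its absolute value, smallest first, gives the words used at similar rates, as in the second table.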

Which words have we used more or less frequently over time?

## # A tibble: 748 x 6
##    time_floor       person     word count time_total word_total
##        <dttm>        <chr>    <chr> <int>      <int>      <int>
##  1 2017-03-01 pftcommenter      bad     3       1398         81
##  2 2017-03-01 pftcommenter barstool     1       1398         53
##  3 2017-03-01 pftcommenter     call     3       1398         47
##  4 2017-03-01 pftcommenter   called     2       1398         32
##  5 2017-03-01 pftcommenter      cat     1       1398         45
##  6 2017-03-01 pftcommenter    coach     2       1398         46
##  7 2017-03-01 pftcommenter  college     1       1398         46
##  8 2017-03-01 pftcommenter   coming     4       1398         56
##  9 2017-03-01 pftcommenter      dad     4       1398         38
## 10 2017-03-01 pftcommenter      day     4       1398        113
## # ... with 738 more rows
## # A tibble: 191 x 3
##          person     word             data
##           <chr>    <chr>           <list>
##  1 pftcommenter      bad <tibble [7 x 4]>
##  2 pftcommenter barstool <tibble [4 x 4]>
##  3 pftcommenter     call <tibble [7 x 4]>
##  4 pftcommenter   called <tibble [7 x 4]>
##  5 pftcommenter      cat <tibble [6 x 4]>
##  6 pftcommenter    coach <tibble [7 x 4]>
##  7 pftcommenter  college <tibble [7 x 4]>
##  8 pftcommenter   coming <tibble [7 x 4]>
##  9 pftcommenter      dad <tibble [6 x 4]>
## 10 pftcommenter      day <tibble [7 x 4]>
## # ... with 181 more rows
## # A tibble: 191 x 4
##          person     word             data    models
##           <chr>    <chr>           <list>    <list>
##  1 pftcommenter      bad <tibble [7 x 4]> <S3: glm>
##  2 pftcommenter barstool <tibble [4 x 4]> <S3: glm>
##  3 pftcommenter     call <tibble [7 x 4]> <S3: glm>
##  4 pftcommenter   called <tibble [7 x 4]> <S3: glm>
##  5 pftcommenter      cat <tibble [6 x 4]> <S3: glm>
##  6 pftcommenter    coach <tibble [7 x 4]> <S3: glm>
##  7 pftcommenter  college <tibble [7 x 4]> <S3: glm>
##  8 pftcommenter   coming <tibble [7 x 4]> <S3: glm>
##  9 pftcommenter      dad <tibble [6 x 4]> <S3: glm>
## 10 pftcommenter      day <tibble [7 x 4]> <S3: glm>
## # ... with 181 more rows
## # A tibble: 5 x 8
##           person     word       term      estimate    std.error statistic
##            <chr>    <chr>      <chr>         <dbl>        <dbl>     <dbl>
## 1 barstoolbigcat rushmore time_floor -9.331459e-07 1.528234e-07 -6.106042
## 2 barstoolbigcat       mt time_floor -1.072186e-06 1.974595e-07 -5.429901
## 3   pftcommenter       f0 time_floor  4.534702e-08 9.592594e-09  4.727295
## 4 barstoolbigcat    vegas time_floor -1.033661e-06 2.292757e-07 -4.508375
## 5 barstoolbigcat    fight time_floor -5.160781e-07 1.331107e-07 -3.877059
## # ... with 2 more variables: p.value <dbl>, adjusted_p_value <dbl>
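
The models listed above are glm objects fit per person and word, with time_floor as the predictor. The output does not show the model formula; a common choice for this kind of analysis is a binomial model of a word's count out of all words in each time period, and the sketch below assumes that. It fits logit(p) = b0 + b1 * month by Newton-Raphson in plain Python, using a month index so that b1 is the change in log-odds per month. If time_floor instead entered the R model as a date-time (seconds since 1970), the estimates above are per second: rushmore's -9.33e-07 corresponds to roughly -2.4 in log-odds per 30-day month. The function name and example counts are hypothetical.

```python
import math


def fit_binomial_trend(months, counts, totals, iterations=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * month to binomial counts by Newton-Raphson.

    months: time index per period (e.g. 0, 1, 2, ...); counts: uses of the
    word in each period; totals: all words in each period.
    Returns (b0, b1, se_b1); b1 > 0 means the word's share is rising.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and Fisher information (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(months, counts, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p
            weight = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] (d0, d1) = (g0, g1).
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    # Standard error of b1 from the inverse information matrix (last iteration).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Hypothetical monthly counts of one word out of 1,400 words per month.
months = list(range(7))
rising = [10, 14, 18, 25, 30, 38, 45]
b0, b1, se = fit_binomial_trend(months, rising, [1400] * 7)
print(f"slope = {b1:.3f} log-odds per month, z = {b1 / se:.1f}")
```

Dividing the slope by its standard error gives a Wald statistic comparable in spirit to the statistic column above; with many words tested, the p-values then need a multiple-comparison adjustment, as the adjusted_p_value column suggests.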

Tweet Popularity

How often do our tweets get liked (favorited) or retweeted?

## # A tibble: 2 x 10
##           person  uses rts_total favs_total rts_max favs_max rts_avg
##            <chr> <int>     <int>      <int>   <dbl>    <dbl>   <dbl>
## 1 barstoolbigcat  1514   2770117   15949094 1112620  3749240 1829.67
## 2   pftcommenter  1084   3949638   22600410  186060   755720 3643.58
## # ... with 3 more variables: favs_avg <dbl>, rts_median <dbl>,
## #   favs_median <dbl>
## # A tibble: 7,468 x 4
##            person         word rts_median favs_median
##             <chr>        <chr>      <dbl>       <dbl>
##  1 barstoolbigcat        'eers      213.0       831.0
##  2 barstoolbigcat       'yoffs        5.0        85.0
##  3 barstoolbigcat           #1       32.0       751.0
##  4 barstoolbigcat      #10gawd       85.5       649.5
##  5 barstoolbigcat          #ad       14.0       127.0
##  6 barstoolbigcat    #arodcorp       91.0      1425.0
##  7 barstoolbigcat       #askgh        9.0       206.0
##  8 barstoolbigcat #asseatinszn        8.0       392.0
##  9 barstoolbigcat        #awls      265.0      5228.0
## 10 barstoolbigcat     #badboyz       53.0       891.0
## # ... with 7,458 more rows
## # A tibble: 14,936 x 5
##            person         word  type   calc value
##  *          <chr>        <chr> <chr>  <chr> <dbl>
##  1 barstoolbigcat        'eers   rts median 213.0
##  2 barstoolbigcat       'yoffs   rts median   5.0
##  3 barstoolbigcat           #1   rts median  32.0
##  4 barstoolbigcat      #10gawd   rts median  85.5
##  5 barstoolbigcat          #ad   rts median  14.0
##  6 barstoolbigcat    #arodcorp   rts median  91.0
##  7 barstoolbigcat       #askgh   rts median   9.0
##  8 barstoolbigcat #asseatinszn   rts median   8.0
##  9 barstoolbigcat        #awls   rts median 265.0
## 10 barstoolbigcat     #badboyz   rts median  53.0
## # ... with 14,926 more rows
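
The first table reduces each account's tweets to totals, maxima, means, and medians of retweet and favorite counts (the averages are totals divided by uses; 2,770,117 / 1,514 ≈ 1,829.67), and the second reports medians over the tweets containing each word. Below is a Python sketch of both, assuming one row per tweet in a hypothetical field order and treating uses as the number of tweets summarized, which the output does not confirm.

```python
from collections import defaultdict
from statistics import mean, median


def popularity_summary(rows):
    """rows: (person, retweet_count, favorite_count) per tweet."""
    by_person = defaultdict(lambda: ([], []))
    for person, rts, favs in rows:
        by_person[person][0].append(rts)
        by_person[person][1].append(favs)
    return {
        person: {
            "uses": len(r),
            "rts_total": sum(r), "favs_total": sum(f),
            "rts_max": max(r), "favs_max": max(f),
            "rts_avg": mean(r), "favs_avg": mean(f),
            "rts_median": median(r), "favs_median": median(f),
        }
        for person, (r, f) in by_person.items()
    }


def word_medians(rows):
    """rows: (person, words, retweet_count, favorite_count) per tweet.

    Returns {(person, word): (rts_median, favs_median)} over the tweets
    that contain each word.
    """
    by_word = defaultdict(lambda: ([], []))
    for person, words, rts, favs in rows:
        for word in set(words):  # a tweet counts once per distinct word
            by_word[(person, word)][0].append(rts)
            by_word[(person, word)][1].append(favs)
    return {key: (median(r), median(f)) for key, (r, f) in by_word.items()}


# Hypothetical per-tweet rows: (person, words, retweets, favorites).
sample = [
    ("barstoolbigcat", ["cubs", "bears"], 120, 900),
    ("barstoolbigcat", ["cubs"], 40, 300),
    ("pftcommenter", ["literaly", "cubs"], 300, 2500),
]
print(popularity_summary((p, r, f) for p, _, r, f in sample))
print(word_medians(sample))
```

The medians are the more robust comparison here: a single viral tweet (barstoolbigcat's maximum of 1,112,620 retweets) moves the mean far more than the median.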

Sentiment Analysis

What is the sentiment (i.e. “tone”) of our tweets?

## # A tibble: 6,291 x 3
##       status_id         person total_words
##           <dbl>          <chr>       <int>
##  1 8.968831e+17 barstoolbigcat       18454
##  2 8.969027e+17 barstoolbigcat       18454
##  3 8.969098e+17 barstoolbigcat       18454
##  4 8.969100e+17 barstoolbigcat       18454
##  5 8.969153e+17 barstoolbigcat       18454
##  6 8.969155e+17 barstoolbigcat       18454
##  7 8.969165e+17 barstoolbigcat       18454
##  8 8.969253e+17 barstoolbigcat       18454
##  9 8.969291e+17 barstoolbigcat       18454
## 10 8.969313e+17 barstoolbigcat       18454
## # ... with 6,281 more rows
## # A tibble: 20 x 4
##            person    sentiment total_words words
##             <chr>        <chr>       <int> <dbl>
##  1 barstoolbigcat        anger       18454   627
##  2 barstoolbigcat anticipation       18454  1020
##  3 barstoolbigcat      disgust       18454   446
##  4 barstoolbigcat         fear       18454   620
##  5 barstoolbigcat          joy       18454   767
##  6 barstoolbigcat     negative       18454  1123
##  7 barstoolbigcat     positive       18454  1557
##  8 barstoolbigcat      sadness       18454   483
##  9 barstoolbigcat     surprise       18454   356
## 10 barstoolbigcat        trust       18454   763
## 11   pftcommenter        anger       19860   478
## 12   pftcommenter anticipation       19860   840
## 13   pftcommenter      disgust       19860   330
## 14   pftcommenter         fear       19860   525
## 15   pftcommenter          joy       19860   694
## 16   pftcommenter     negative       19860   992
## 17   pftcommenter     positive       19860  1458
## 18   pftcommenter      sadness       19860   460
## 19   pftcommenter     surprise       19860   349
## 20   pftcommenter        trust       19860   854
## # A tibble: 10 x 4
##       sentiment barstoolbigcat pftcommenter sentiment_diff
##           <chr>          <dbl>        <dbl>          <dbl>
##  1        trust         0.0413       0.0430        -0.0017
##  2     surprise         0.0193       0.0176         0.0017
##  3      sadness         0.0262       0.0232         0.0030
##  4          joy         0.0416       0.0349         0.0067
##  5         fear         0.0336       0.0264         0.0072
##  6      disgust         0.0242       0.0166         0.0076
##  7        anger         0.0340       0.0241         0.0099
##  8     positive         0.0844       0.0734         0.0110
##  9     negative         0.0609       0.0499         0.0110
## 10 anticipation         0.0553       0.0423         0.0130
## # A tibble: 10 x 9
##       sentiment  estimate statistic      p.value parameter  conf.low
##           <chr>     <dbl>     <dbl>        <dbl>     <dbl>     <dbl>
##  1        anger 1.4116544       627 1.188372e-08  532.2250 1.2512592
##  2 anticipation 1.3068015      1020 9.509408e-09  895.8720 1.1915770
##  3      disgust 1.4544863       446 2.159808e-07  373.7617 1.2587075
##  4         fear 1.2709285       620 5.665874e-05  551.4911 1.1296205
##  5          joy 1.1893909       767 9.659303e-04  703.6930 1.0719177
##  6     negative 1.2183072      1123 5.927508e-06 1018.6932 1.1175827
##  7     positive 1.1492640      1557 1.388938e-04 1452.1796 1.0693481
##  8      sadness 1.1299989       483 6.321560e-02  454.1975 0.9924752
##  9     surprise 1.0977749       356 2.277954e-01  339.5644 0.9444222
## 10        trust 0.9615135       763 4.404778e-01  778.8307 0.8709259
## # ... with 3 more variables: conf.high <dbl>, method <fctr>,
## #   alternative <fctr>
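
The ten categories match those of the NRC word-emotion lexicon. The proportions in the third table are sentiment-word counts divided by each account's total words (627 / 18,454 ≈ 0.0340 for barstoolbigcat's anger), and the last table compares the two rates per sentiment. Its columns match R's two-sample poisson.test: estimate is the rate ratio, statistic is barstoolbigcat's count, and parameter is the count expected under equal rates (1,105 * 18,454 / 38,314 ≈ 532.2 for anger). Conditional on the combined count, that test is an exact binomial test, which the Python sketch below reimplements. The function names are mine, and the sketch omits the confidence interval.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def poisson_rate_test(x1, t1, x2, t2):
    """Exact comparison of two Poisson rates, x1 / t1 versus x2 / t2.

    Under H0 (equal rates), x1 given x1 + x2 is Binomial(x1 + x2, t1 / (t1 + t2)).
    Returns (rate_ratio, expected_x1_under_h0, two_sided_p_value).
    """
    n = x1 + x2
    p = t1 / (t1 + t2)
    rate_ratio = (x1 / t1) / (x2 / t2)
    log_d = binom_log_pmf(x1, n, p)
    # Two-sided p-value: total probability of all outcomes no more likely
    # than the observed one (small tolerance for floating-point ties).
    p_value = sum(
        math.exp(lp)
        for lp in (binom_log_pmf(k, n, p) for k in range(n + 1))
        if lp <= log_d + 1e-7
    )
    return rate_ratio, n * p, min(1.0, p_value)


# Counts and word totals taken from the anger and trust rows above.
for name, x1, x2 in [("anger", 627, 478), ("trust", 763, 854)]:
    ratio, expected, p = poisson_rate_test(x1, 18454, x2, 19860)
    print(f"{name}: rate ratio {ratio:.4f}, expected {expected:.1f}, p = {p:.3g}")
```

For the anger row this gives a rate ratio of about 1.41 and an expected count of about 532.2, with a p-value far below 0.05, in line with the table; for trust the p-value stays well above 0.05.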

Conclusion

That’s it!