This report compares the volume, behavior, and content of the tweets made by barstoolbigcat and pftcommenter.
## # A tibble: 2 x 2
## person n
## <chr> <int>
## 1 barstoolbigcat 3195
## 2 pftcommenter 3195
## # A tibble: 2 x 2
## person timestamp
## <chr> <dttm>
## 1 barstoolbigcat 2017-08-13 18:56:08
## 2 pftcommenter 2017-03-16 11:32:41
3195 tweets have been collected for each of barstoolbigcat and pftcommenter. Note that the oldest tweet in the collected data is from 2017-08-13 18:56:08 for barstoolbigcat and from 2017-03-16 11:32:41 for pftcommenter.
How often do barstoolbigcat and pftcommenter tweet? Does the volume of tweets look different across temporal periods (e.g. year, month, etc.)?
Does the volume of our tweets within a given temporal period deviate significantly from a uniform distribution? Here, I use the chi-squared test. If the p-value is less than some threshold value (e.g. 0.05), then I can reject the null hypothesis (that the distribution is uniform). In fact, it appears that our tweet volume does differ depending on the month and day of the week.
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 18268, df = 11, p-value < 2.2e-16
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 2443.4, df = 11, p-value < 2.2e-16
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 53.077, df = 6, p-value = 1.132e-09
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 171.66, df = 6, p-value < 2.2e-16
##
## [1] 0.9813945
## [1] 1.312177
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 53.156, df = 6, p-value = 1.091e-09
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 126.2, df = 6, p-value < 2.2e-16
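The chi-squared tests above compare observed tweet counts per period against a uniform expectation. The report's own code is not shown, so the following is only a minimal Python sketch of that idea. The function names and the weekday counts are hypothetical, and the closed-form p-value is valid only for an even number of degrees of freedom (such as df = 6 for the seven days of the week).

```python
import math


def chi_squared_uniform(counts):
    """Chi-squared goodness-of-fit statistic against a uniform distribution."""
    total = sum(counts)
    expected = total / len(counts)  # same expected count in every bin
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    df = len(counts) - 1
    return statistic, df


def chi_squared_p_value_even_df(statistic, df):
    """Upper-tail p-value; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = statistic / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Hypothetical tweet counts for Monday through Sunday (not the report's data).
weekday_counts = [20, 10, 10, 10, 10, 10, 0]
statistic, df = chi_squared_uniform(weekday_counts)
p_value = chi_squared_p_value_even_df(statistic, df)
print(statistic, df, p_value)
```

A p-value below the chosen threshold (e.g. 0.05) would lead to rejecting the hypothesis that tweets are spread evenly across the bins.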
How often do we use hashtags, retweets, and replies?
## # A tibble: 24 x 4
## person type response value
## * <chr> <chr> <chr> <dbl>
## 1 barstoolbigcat hashtag yes 0.0808
## 2 pftcommenter hashtag yes 0.0576
## 3 barstoolbigcat hashtag2 yes 0.0773
## 4 pftcommenter hashtag2 yes 0.0538
## 5 barstoolbigcat hashtag no 0.9192
## 6 pftcommenter hashtag no 0.9424
## 7 barstoolbigcat hashtag2 no 0.9227
## 8 pftcommenter hashtag2 no 0.9462
## 9 barstoolbigcat link yes 0.5183
## 10 pftcommenter link yes 0.3865
## # ... with 14 more rows
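The table above reports, for each behavior type, the share of tweets answering "yes" or "no". A rough Python sketch of how such shares could be computed follows. The detection rules (a `#word` pattern for hashtags, an `http(s)://` pattern for links, a leading `RT @` for retweets, a leading `@` for replies) and the sample tweets are assumptions for illustration, not the rules used in the report.

```python
import re

HASHTAG = re.compile(r"#\w+")
LINK = re.compile(r"https?://\S+")


def behavior_shares(tweets):
    """Return the share of tweets that do ('yes') or do not ('no') show each behavior."""
    n = len(tweets)
    if n == 0:
        return {}
    flags = {
        "hashtag": [bool(HASHTAG.search(t)) for t in tweets],
        "link": [bool(LINK.search(t)) for t in tweets],
        "rt": [t.startswith("RT @") for t in tweets],
        "reply": [t.startswith("@") for t in tweets],
    }
    return {
        kind: {"yes": sum(values) / n, "no": 1 - sum(values) / n}
        for kind, values in flags.items()
    }


# Hypothetical tweets, not taken from the collected data.
sample = [
    "RT @someone: big game tonight",
    "@someone agreed",
    "New episode is up https://example.com #podcast",
    "Football is back",
]
shares = behavior_shares(sample)
print(shares)
```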
How long are our tweets?
## # A tibble: 1 x 3
## char_count_count char_count_avg char_count_max
## <int> <dbl> <dbl>
## 1 78 304.3846 1354
Which words are we most likely to use?
## # A tibble: 1,567 x 3
## screen_name word created_at
## <chr> <chr> <chr>
## 1 BarstoolBigCat 009f 2017-08-14 23:58:56
## 2 BarstoolBigCat 008d 2017-08-14 23:58:56
## 3 BarstoolBigCat 009f 2017-08-14 23:58:56
## 4 BarstoolBigCat 008d 2017-08-14 23:58:56
## 5 BarstoolBigCat 009f 2017-08-14 23:58:56
## 6 BarstoolBigCat 008d 2017-08-14 23:58:56
## 7 BarstoolBigCat 009f 2017-08-15 14:56:29
## 8 BarstoolBigCat 00a3 2017-08-15 14:56:29
## 9 BarstoolBigCat 009f 2017-08-15 14:56:29
## 10 BarstoolBigCat 00a3 2017-08-15 14:56:29
## # ... with 1,557 more rows
## # A tibble: 2 x 2
## person total
## <chr> <int>
## 1 barstoolbigcat 18454
## 2 pftcommenter 19860
## # A tibble: 13,812 x 5
## # Groups: person [2]
## person word n total freq
## <chr> <chr> <int> <int> <dbl>
## 1 barstoolbigcat @barstoolbigcat 497 18454 0.026931830
## 2 pftcommenter f0 440 19860 0.022155086
## 3 barstoolbigcat f0 372 18454 0.020158231
## 4 pftcommenter @pftcommenter 283 19860 0.014249748
## 5 barstoolbigcat @pardonmytake 264 18454 0.014305842
## 6 pftcommenter @pardonmytake 243 19860 0.012235650
## 7 pftcommenter @barstoolbigcat 234 19860 0.011782477
## 8 barstoolbigcat football 152 18454 0.008236697
## 9 pftcommenter ur 134 19860 0.006747231
## 10 barstoolbigcat time 119 18454 0.006448466
## # ... with 13,802 more rows
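In the table above, `freq` is the word count `n` divided by that person's `total` (for example, 497 / 18454 ≈ 0.0269 for the first row). A small Python sketch of that calculation is shown here; the function name and the sample words are hypothetical.

```python
from collections import Counter


def word_frequencies(words_by_person):
    """Build rows of person, word, n, total, freq, sorted by descending freq."""
    rows = []
    for person, words in words_by_person.items():
        counts = Counter(words)
        total = sum(counts.values())  # all words used by this person
        for word, n in counts.items():
            rows.append(
                {"person": person, "word": word, "n": n, "total": total, "freq": n / total}
            )
    rows.sort(key=lambda row: row["freq"], reverse=True)
    return rows


# Hypothetical tokenized words, not the collected data.
rows = word_frequencies({"person_a": ["football", "football", "time", "bears"]})
print(rows[0])

# Arithmetic check against the first row of the table above.
first_row_freq = 497 / 18454
print(first_row_freq)
```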
## # A tibble: 11,447 x 3
## word barstoolbigcat pftcommenter
## <chr> <dbl> <dbl>
## 1 #bleedblue 5.418879e-05 5.035247e-05
## 2 #class 5.418879e-05 5.035247e-05
## 3 #done 5.418879e-05 5.035247e-05
## 4 #draftkings 5.418879e-05 5.035247e-05
## 5 #elite 5.418879e-05 5.035247e-05
## 6 #fraudsun 5.418879e-05 5.035247e-05
## 7 #goblue 5.418879e-05 5.035247e-05
## 8 #lsu 5.418879e-05 5.035247e-05
## 9 #notcounterfeit 5.418879e-05 5.035247e-05
## 10 #parentaladvisory 5.418879e-05 5.035247e-05
## # ... with 11,437 more rows
Which words are most likely to be shared/different between us?
## # A tibble: 9,237 x 4
## word barstoolbigcat pftcommenter logratio
## <chr> <dbl> <dbl> <dbl>
## 1 rpo 0.0015065760 3.902591e-05 3.653369
## 2 #10gawd 0.0010179568 3.902591e-05 3.261327
## 3 facebook 0.0007736471 3.902591e-05 2.986890
## 4 javy 0.0007329289 3.902591e-05 2.932823
## 5 lackey 0.0007329289 3.902591e-05 2.932823
## 6 picks 0.0006922106 3.902591e-05 2.875664
## 7 cubs 0.0012622664 7.805183e-05 2.783291
## 8 bro 0.0006107741 3.902591e-05 2.750501
## 9 rankings 0.0006107741 3.902591e-05 2.750501
## 10 bears 0.0023209414 1.561037e-04 2.699208
## # ... with 9,227 more rows
## # A tibble: 9,237 x 4
## word barstoolbigcat pftcommenter logratio
## <chr> <dbl> <dbl> <dbl>
## 1 book 5.293375e-04 5.463628e-04 -0.03165695
## 2 'em 1.221548e-04 1.170777e-04 0.04245103
## 3 #bleedblue 8.143654e-05 7.805183e-05 0.04245103
## 4 #class 8.143654e-05 7.805183e-05 0.04245103
## 5 #done 8.143654e-05 7.805183e-05 0.04245103
## 6 #draftkings 8.143654e-05 7.805183e-05 0.04245103
## 7 #draftthewins 1.628731e-04 1.561037e-04 0.04245103
## 8 #elite 8.143654e-05 7.805183e-05 0.04245103
## 9 #fraudsun 8.143654e-05 7.805183e-05 0.04245103
## 10 #goblue 8.143654e-05 7.805183e-05 0.04245103
## # ... with 9,227 more rows
## # A tibble: 9,237 x 4
## word barstoolbigcat pftcommenter logratio
## <chr> <dbl> <dbl> <dbl>
## 1 literaly 4.071827e-05 2.068373e-03 -3.927841
## 2 rpo 1.506576e-03 3.902591e-05 3.653369
## 3 actualy 4.071827e-05 1.522011e-03 -3.621111
## 4 shoud 4.071827e-05 1.170777e-03 -3.358746
## 5 #10gawd 1.017957e-03 3.902591e-05 3.261327
## 6 thx 4.071827e-05 1.053700e-03 -3.253386
## 7 woud 4.071827e-05 1.053700e-03 -3.253386
## 8 verse 4.071827e-05 1.014674e-03 -3.215646
## 9 dosent 4.071827e-05 9.366219e-04 -3.135603
## 10 ppl 1.221548e-04 2.692788e-03 -3.093043
## # ... with 9,227 more rows
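The `logratio` column above matches the natural log of the ratio of the two frequency columns (for `rpo`, ln(0.0015065760 / 3.902591e-05) ≈ 3.653). The smallest frequencies in the table never reach zero, which is consistent with add-one smoothing of the form (n + 1) / (total + 1); that smoothing is an assumption here, as the report's code is not shown. A Python sketch under that assumption, with hypothetical names and counts:

```python
import math


def log_ratios(counts_a, counts_b):
    """Natural-log ratio of smoothed word frequencies, person A over person B."""
    vocabulary = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    ratios = {}
    for word in vocabulary:
        # Assumed add-one smoothing so a word unused by one person does not divide by zero.
        freq_a = (counts_a.get(word, 0) + 1) / (total_a + 1)
        freq_b = (counts_b.get(word, 0) + 1) / (total_b + 1)
        ratios[word] = math.log(freq_a / freq_b)
    return ratios


# Hypothetical word counts for two people.
ratios = log_ratios({"rpo": 3}, {"ppl": 3})
print(ratios)

# Arithmetic check against the `rpo` row of the table above.
rpo_logratio = math.log(0.0015065760 / 3.902591e-05)
print(rpo_logratio)
```

Positive values mark words used relatively more by the first person, negative values words used relatively more by the second, and values near zero words used at similar rates.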
Which words have we used more/less frequently over time?
## # A tibble: 748 x 6
## time_floor person word count time_total word_total
## <dttm> <chr> <chr> <int> <int> <int>
## 1 2017-03-01 pftcommenter bad 3 1398 81
## 2 2017-03-01 pftcommenter barstool 1 1398 53
## 3 2017-03-01 pftcommenter call 3 1398 47
## 4 2017-03-01 pftcommenter called 2 1398 32
## 5 2017-03-01 pftcommenter cat 1 1398 45
## 6 2017-03-01 pftcommenter coach 2 1398 46
## 7 2017-03-01 pftcommenter college 1 1398 46
## 8 2017-03-01 pftcommenter coming 4 1398 56
## 9 2017-03-01 pftcommenter dad 4 1398 38
## 10 2017-03-01 pftcommenter day 4 1398 113
## # ... with 738 more rows
## # A tibble: 191 x 3
## person word data
## <chr> <chr> <list>
## 1 pftcommenter bad <tibble [7 x 4]>
## 2 pftcommenter barstool <tibble [4 x 4]>
## 3 pftcommenter call <tibble [7 x 4]>
## 4 pftcommenter called <tibble [7 x 4]>
## 5 pftcommenter cat <tibble [6 x 4]>
## 6 pftcommenter coach <tibble [7 x 4]>
## 7 pftcommenter college <tibble [7 x 4]>
## 8 pftcommenter coming <tibble [7 x 4]>
## 9 pftcommenter dad <tibble [6 x 4]>
## 10 pftcommenter day <tibble [7 x 4]>
## # ... with 181 more rows
## # A tibble: 191 x 4
## person word data models
## <chr> <chr> <list> <list>
## 1 pftcommenter bad <tibble [7 x 4]> <S3: glm>
## 2 pftcommenter barstool <tibble [4 x 4]> <S3: glm>
## 3 pftcommenter call <tibble [7 x 4]> <S3: glm>
## 4 pftcommenter called <tibble [7 x 4]> <S3: glm>
## 5 pftcommenter cat <tibble [6 x 4]> <S3: glm>
## 6 pftcommenter coach <tibble [7 x 4]> <S3: glm>
## 7 pftcommenter college <tibble [7 x 4]> <S3: glm>
## 8 pftcommenter coming <tibble [7 x 4]> <S3: glm>
## 9 pftcommenter dad <tibble [6 x 4]> <S3: glm>
## 10 pftcommenter day <tibble [7 x 4]> <S3: glm>
## # ... with 181 more rows
## # A tibble: 5 x 8
## person word term estimate std.error statistic
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 barstoolbigcat rushmore time_floor -9.331459e-07 1.528234e-07 -6.106042
## 2 barstoolbigcat mt time_floor -1.072186e-06 1.974595e-07 -5.429901
## 3 pftcommenter f0 time_floor 4.534702e-08 9.592594e-09 4.727295
## 4 barstoolbigcat vegas time_floor -1.033661e-06 2.292757e-07 -4.508375
## 5 barstoolbigcat fight time_floor -5.160781e-07 1.331107e-07 -3.877059
## # ... with 2 more variables: p.value <dbl>, adjusted_p_value <dbl>
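The first table in this section counts each word per time bin (`time_floor`), alongside the total words in that bin (`time_total`) and the total uses of the word (`word_total`). The Python sketch below illustrates that binning step only, assuming monthly bins; the model-fitting step shown in the later tables is not reproduced. All names and records are hypothetical.

```python
from collections import Counter
from datetime import datetime


def month_floor(timestamp):
    """Round a timestamp down to the first instant of its month."""
    return timestamp.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def words_by_time(records):
    """records: iterable of (timestamp, person, word) tuples."""
    records = list(records)
    count = Counter((month_floor(ts), person, word) for ts, person, word in records)
    time_total = Counter((month_floor(ts), person) for ts, person, _ in records)
    word_total = Counter((person, word) for _, person, word in records)
    return [
        {
            "time_floor": time_floor,
            "person": person,
            "word": word,
            "count": n,
            "time_total": time_total[(time_floor, person)],
            "word_total": word_total[(person, word)],
        }
        for (time_floor, person, word), n in sorted(count.items())
    ]


# Hypothetical records, not the collected data.
records = [
    (datetime(2017, 3, 16, 11, 32, 41), "person_a", "bad"),
    (datetime(2017, 3, 20, 9, 0, 0), "person_a", "bad"),
    (datetime(2017, 3, 21, 9, 0, 0), "person_a", "day"),
    (datetime(2017, 4, 2, 9, 0, 0), "person_a", "bad"),
]
binned = words_by_time(records)
for row in binned:
    print(row)
```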
How often do our tweets get liked/favorited/retweeted?
## # A tibble: 2 x 10
## person uses rts_total favs_total rts_max favs_max rts_avg
## <chr> <int> <int> <int> <dbl> <dbl> <dbl>
## 1 barstoolbigcat 1514 2770117 15949094 1112620 3749240 1829.67
## 2 pftcommenter 1084 3949638 22600410 186060 755720 3643.58
## # ... with 3 more variables: favs_avg <dbl>, rts_median <dbl>,
## # favs_median <dbl>
## # A tibble: 7,468 x 4
## person word rts_median favs_median
## <chr> <chr> <dbl> <dbl>
## 1 barstoolbigcat 'eers 213.0 831.0
## 2 barstoolbigcat 'yoffs 5.0 85.0
## 3 barstoolbigcat #1 32.0 751.0
## 4 barstoolbigcat #10gawd 85.5 649.5
## 5 barstoolbigcat #ad 14.0 127.0
## 6 barstoolbigcat #arodcorp 91.0 1425.0
## 7 barstoolbigcat #askgh 9.0 206.0
## 8 barstoolbigcat #asseatinszn 8.0 392.0
## 9 barstoolbigcat #awls 265.0 5228.0
## 10 barstoolbigcat #badboyz 53.0 891.0
## # ... with 7,458 more rows
## # A tibble: 14,936 x 5
## person word type calc value
## * <chr> <chr> <chr> <chr> <dbl>
## 1 barstoolbigcat 'eers rts median 213.0
## 2 barstoolbigcat 'yoffs rts median 5.0
## 3 barstoolbigcat #1 rts median 32.0
## 4 barstoolbigcat #10gawd rts median 85.5
## 5 barstoolbigcat #ad rts median 14.0
## 6 barstoolbigcat #arodcorp rts median 91.0
## 7 barstoolbigcat #askgh rts median 9.0
## 8 barstoolbigcat #asseatinszn rts median 8.0
## 9 barstoolbigcat #awls rts median 265.0
## 10 barstoolbigcat #badboyz rts median 53.0
## # ... with 14,926 more rows
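The per-word tables above report the median retweets and favorites of the tweets containing each word. One way to sketch that grouping in Python is shown below. The input layout (dicts with `person`, `words`, `rts`, `favs`) and the choice to count a tweet once per distinct word are assumptions, and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_engagement_by_word(tweets):
    """Median retweets and favorites over the tweets that contain each word."""
    rts = defaultdict(list)
    favs = defaultdict(list)
    for tweet in tweets:
        # Assumption: a word repeated within one tweet counts that tweet once.
        for word in set(tweet["words"]):
            key = (tweet["person"], word)
            rts[key].append(tweet["rts"])
            favs[key].append(tweet["favs"])
    return {
        key: {"rts_median": median(rts[key]), "favs_median": median(favs[key])}
        for key in sorted(rts)
    }


# Hypothetical tweets, not the collected data.
tweets = [
    {"person": "person_a", "words": ["football", "time"], "rts": 10, "favs": 100},
    {"person": "person_a", "words": ["football", "football"], "rts": 20, "favs": 300},
]
medians = median_engagement_by_word(tweets)
print(medians)
```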
What is the sentiment (i.e. “tone”) of our tweets?
## # A tibble: 6,291 x 3
## status_id person total_words
## <dbl> <chr> <int>
## 1 8.968831e+17 barstoolbigcat 18454
## 2 8.969027e+17 barstoolbigcat 18454
## 3 8.969098e+17 barstoolbigcat 18454
## 4 8.969100e+17 barstoolbigcat 18454
## 5 8.969153e+17 barstoolbigcat 18454
## 6 8.969155e+17 barstoolbigcat 18454
## 7 8.969165e+17 barstoolbigcat 18454
## 8 8.969253e+17 barstoolbigcat 18454
## 9 8.969291e+17 barstoolbigcat 18454
## 10 8.969313e+17 barstoolbigcat 18454
## # ... with 6,281 more rows
## # A tibble: 20 x 4
## person sentiment total_words words
## <chr> <chr> <int> <dbl>
## 1 barstoolbigcat anger 18454 627
## 2 barstoolbigcat anticipation 18454 1020
## 3 barstoolbigcat disgust 18454 446
## 4 barstoolbigcat fear 18454 620
## 5 barstoolbigcat joy 18454 767
## 6 barstoolbigcat negative 18454 1123
## 7 barstoolbigcat positive 18454 1557
## 8 barstoolbigcat sadness 18454 483
## 9 barstoolbigcat surprise 18454 356
## 10 barstoolbigcat trust 18454 763
## 11 pftcommenter anger 19860 478
## 12 pftcommenter anticipation 19860 840
## 13 pftcommenter disgust 19860 330
## 14 pftcommenter fear 19860 525
## 15 pftcommenter joy 19860 694
## 16 pftcommenter negative 19860 992
## 17 pftcommenter positive 19860 1458
## 18 pftcommenter sadness 19860 460
## 19 pftcommenter surprise 19860 349
## 20 pftcommenter trust 19860 854
## # A tibble: 10 x 4
## sentiment barstoolbigcat pftcommenter sentiment_diff
## <chr> <dbl> <dbl> <dbl>
## 1 trust 0.0413 0.0430 -0.0017
## 2 surprise 0.0193 0.0176 0.0017
## 3 sadness 0.0262 0.0232 0.0030
## 4 joy 0.0416 0.0349 0.0067
## 5 fear 0.0336 0.0264 0.0072
## 6 disgust 0.0242 0.0166 0.0076
## 7 anger 0.0340 0.0241 0.0099
## 8 positive 0.0844 0.0734 0.0110
## 9 negative 0.0609 0.0499 0.0110
## 10 anticipation 0.0553 0.0423 0.0130
## # A tibble: 10 x 9
## sentiment estimate statistic p.value parameter conf.low
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 anger 1.4116544 627 1.188372e-08 532.2250 1.2512592
## 2 anticipation 1.3068015 1020 9.509408e-09 895.8720 1.1915770
## 3 disgust 1.4544863 446 2.159808e-07 373.7617 1.2587075
## 4 fear 1.2709285 620 5.665874e-05 551.4911 1.1296205
## 5 joy 1.1893909 767 9.659303e-04 703.6930 1.0719177
## 6 negative 1.2183072 1123 5.927508e-06 1018.6932 1.1175827
## 7 positive 1.1492640 1557 1.388938e-04 1452.1796 1.0693481
## 8 sadness 1.1299989 483 6.321560e-02 454.1975 0.9924752
## 9 surprise 1.0977749 356 2.277954e-01 339.5644 0.9444222
## 10 trust 0.9615135 763 4.404778e-01 778.8307 0.8709259
## # ... with 3 more variables: conf.high <dbl>, method <fctr>,
## # alternative <fctr>
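The sentiment tables can be cross-checked with simple arithmetic: each share is `words / total_words`, `sentiment_diff` is the difference between the two shares, and the `estimate` column in the last table appears consistent with the ratio of the two shares. The Python sketch below reproduces this for the `anger` row using the counts printed above; the function name is hypothetical, and the p-values from the report's test are not recomputed.

```python
def sentiment_share(words, total_words):
    """Share of a person's words that carry a given sentiment."""
    return words / total_words


# Counts for `anger` taken from the tables above.
anger_bigcat = sentiment_share(627, 18454)
anger_pft = sentiment_share(478, 19860)

sentiment_diff = anger_bigcat - anger_pft  # difference in shares
rate_ratio = anger_bigcat / anger_pft      # ratio of shares

print(round(anger_bigcat, 4), round(anger_pft, 4))
print(round(sentiment_diff, 4), round(rate_ratio, 4))
```

A ratio above 1 means barstoolbigcat uses words with that sentiment at a higher rate than pftcommenter, and a ratio below 1 means the reverse.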
That’s it!