This report compares the volume, behavior, and content of the tweets made by elhabro and tonyelhabr.
## # A tibble: 2 x 2
## person n
## <chr> <int>
## 1 elhabro 3164
## 2 tonyelhabr 519
## # A tibble: 2 x 2
## person timestamp
## <chr> <dttm>
## 1 elhabro 2014-09-23 17:23:37
## 2 tonyelhabr 2015-08-19 15:06:32
In total, 3164 tweets have been collected for elhabro and 519 for tonyelhabr. Note that the oldest tweet by elhabro in the collected data is from 2014-09-23 17:23:37, and the oldest tweet by tonyelhabr is from 2015-08-19 15:06:32.
How often do elhabro and tonyelhabr tweet? Does the volume of tweets look different for temporal periods (e.g. year, month, etc.)?
Does the volume of our tweets differ significantly across a given temporal period? Here, I use the chi-squared test. If the p-value is less than some threshold value (e.g. 0.05), then I can reject the null hypothesis (that the distribution is uniform). In fact, it appears that our tweet volume does differ depending on the month and day of the week, although the last test below (p-value = 0.04292) clears the 0.05 threshold only narrowly.
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 116.75, df = 11, p-value < 2.2e-16
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 154.32, df = 11, p-value < 2.2e-16
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 48.031, df = 6, p-value = 1.165e-08
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 37.965, df = 6, p-value = 1.141e-06
##
## [1] 0.9859181
## [1] 0.6603261
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 47.579, df = 6, p-value = 1.434e-08
##
##
## Chi-squared test for given probabilities
##
## data: .
## X-squared = 13.008, df = 6, p-value = 0.04292
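The uniformity tests above can be reproduced by hand. The following is a minimal sketch in Python (not the R session that produced the output), and it rests on a few assumptions: the function names and the sample counts are hypothetical, and the closed-form p-value used here holds only for an even number of degrees of freedom (such as df = 6 for days of the week).

```python
import math


def chi_squared_uniform(counts):
    """Chi-squared statistic of observed counts against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def chi_squared_p_value_even_df(statistic, df):
    """Upper-tail p-value using the closed form that is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = statistic / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# Hypothetical tweet counts for Monday..Sunday (not the report's data).
day_counts = [50, 62, 71, 80, 77, 58, 45]
statistic = chi_squared_uniform(day_counts)
p_value = chi_squared_p_value_even_df(statistic, df=len(day_counts) - 1)
print(f"X-squared = {statistic:.3f}, p-value = {p_value:.4g}")

# Cross-check with a statistic reported above (X-squared = 13.008, df = 6).
print(chi_squared_p_value_even_df(13.008, 6))
```

Feeding in the reported statistics with df = 6 should give p-values close to the ones printed above; the month-level tests (df = 11) need a general chi-squared distribution function instead of this shortcut.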
How often do we use hashtags, links, retweets, and replies?
## # A tibble: 24 x 4
## person type response value
## * <chr> <chr> <chr> <dbl>
## 1 elhabro hashtag yes 0.0553
## 2 tonyelhabr hashtag yes 0.0539
## 3 elhabro hashtag2 yes 0.0531
## 4 tonyelhabr hashtag2 yes 0.0539
## 5 elhabro hashtag no 0.9447
## 6 tonyelhabr hashtag no 0.9461
## 7 elhabro hashtag2 no 0.9469
## 8 tonyelhabr hashtag2 no 0.9461
## 9 elhabro link yes 0.3666
## 10 tonyelhabr link yes 0.4586
## # ... with 14 more rows
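Each `yes`/`no` pair in the table is a share of tweets that sums to 1 (e.g. 0.0553 + 0.9447). A minimal sketch of how such a share could be computed follows, in Python for illustration. The regular expressions, the `share_with` helper, and the sample tweets are assumptions, not the report's actual definitions.

```python
import re

# Simplified patterns; real tweet parsing may need stricter rules.
HASHTAG = re.compile(r"#\w+")
LINK = re.compile(r"https?://\S+")


def share_with(pattern, tweets):
    """Fraction of tweets containing at least one match of the pattern."""
    if not tweets:
        return 0.0
    hits = sum(1 for text in tweets if pattern.search(text))
    return hits / len(tweets)


# Hypothetical tweets (not taken from the collected data).
sample = [
    "Learning #rstats this weekend",
    "Good read: https://example.com/post",
    "gonna be a long day",
    "lol that game",
]
print("hashtag yes:", share_with(HASHTAG, sample))
print("link yes:", share_with(LINK, sample))
```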
How long are our tweets?
## # A tibble: 1 x 3
## char_count_count char_count_avg char_count_max
## <int> <dbl> <dbl>
## 1 60 344.85 3542
Which words are we most likely to use?
## # A tibble: 1,538 x 3
## screen_name word created_at
## <chr> <chr> <chr>
## 1 elhabro 009f 2015-01-01 03:46:08
## 2 elhabro 408d 2015-01-27 20:39:05
## 3 elhabro 009f 2015-03-25 02:25:31
## 4 elhabro 009f 2015-03-25 02:25:31
## 5 elhabro 009f 2015-03-25 02:25:31
## 6 elhabro 009f 2015-04-21 18:18:11
## 7 elhabro 009f 2015-04-29 05:48:08
## 8 elhabro 008e 2015-04-29 05:48:08
## 9 elhabro 009f 2015-04-29 05:48:08
## 10 elhabro 00ab 2015-04-29 05:48:08
## # ... with 1,528 more rows
## # A tibble: 2 x 2
## person total
## <chr> <int>
## 1 elhabro 17701
## 2 tonyelhabr 3283
## # A tibble: 8,121 x 5
## # Groups: person [2]
## person word n total freq
## <chr> <chr> <int> <int> <dbl>
## 1 elhabro f0 862 17701 0.048697814
## 2 elhabro game 145 17701 0.008191628
## 3 elhabro time 116 17701 0.006553302
## 4 elhabro @sailordanj 95 17701 0.005366928
## 5 elhabro cuz 89 17701 0.005027965
## 6 elhabro day 88 17701 0.004971471
## 7 elhabro lol 79 17701 0.004463025
## 8 elhabro @dragonflyjonez 73 17701 0.004124061
## 9 elhabro gonna 73 17701 0.004124061
## 10 elhabro haha 67 17701 0.003785097
## # ... with 8,111 more rows
## # A tibble: 6,922 x 3
## word elhabro tonyelhabr
## <chr> <dbl> <dbl>
## 1 #classof2016 5.649398e-05 0.0003045995
## 2 #finalfour 5.649398e-05 0.0003045995
## 3 #graduation 5.649398e-05 0.0003045995
## 4 #marchmadness 5.649398e-05 0.0003045995
## 5 #rstats 5.649398e-05 0.0003045995
## 6 @alisongriswold 5.649398e-05 0.0003045995
## 7 @basketballtalk 5.649398e-05 0.0003045995
## 8 @bill_easterly 5.649398e-05 0.0003045995
## 9 @chldishricardo 5.649398e-05 0.0003045995
## 10 @coliegestudent 5.649398e-05 0.0003045995
## # ... with 6,912 more rows
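The `freq` column above is each word's count divided by that person's total word count (e.g. 862 / 17701 for the first row). A minimal sketch of that calculation, in Python for illustration, is shown here; the `word_frequencies` helper and the token list are hypothetical.

```python
from collections import Counter


def word_frequencies(words):
    """Map each word to its count divided by the total number of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical tokens for one person (not the report's data).
tokens = ["game", "time", "game", "lol"]
freqs = word_frequencies(tokens)
print(freqs)

# Reproduce the first row of the table above: n = 862, total = 17701.
print(862 / 17701)
```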
Which words are most distinctive to each of us, and which do we use about equally often?
## # A tibble: 5,858 x 4
## word elhabro tonyelhabr logratio
## <chr> <dbl> <dbl> <dbl>
## 1 haha 0.0031969911 0.0001141944 3.332063
## 2 ppl 0.0022096850 0.0001141944 2.962703
## 3 yea 0.0015984955 0.0001141944 2.638916
## 4 prob 0.0015044664 0.0001141944 2.578292
## 5 summer 0.0012223789 0.0001141944 2.370652
## 6 f0 0.0405735778 0.0038826082 2.346610
## 7 lmao 0.0023507287 0.0002283887 2.331432
## 8 told 0.0011753644 0.0001141944 2.331432
## 9 mins 0.0009873061 0.0001141944 2.157078
## 10 gotta 0.0018805830 0.0002283887 2.108288
## # ... with 5,848 more rows
## # A tibble: 5,858 x 4
## word elhabro tonyelhabr logratio
## <chr> <dbl> <dbl> <dbl>
## 1 career 0.0005641749 0.0005709718 -0.01197551
## 2 job 0.0005641749 0.0005709718 -0.01197551
## 3 lakers 0.0005641749 0.0005709718 -0.01197551
## 4 #nbafinals 0.0002350729 0.0002283887 0.02884648
## 5 1st 0.0002350729 0.0002283887 0.02884648
## 6 android 0.0002350729 0.0002283887 0.02884648
## 7 bring 0.0002350729 0.0002283887 0.02884648
## 8 bus 0.0002350729 0.0002283887 0.02884648
## 9 buying 0.0002350729 0.0002283887 0.02884648
## 10 caught 0.0002350729 0.0002283887 0.02884648
## # ... with 5,848 more rows
## # A tibble: 5,858 x 4
## word elhabro tonyelhabr logratio
## <chr> <dbl> <dbl> <dbl>
## 1 haha 3.196991e-03 0.0001141944 3.332063
## 2 ppl 2.209685e-03 0.0001141944 2.962703
## 3 yea 1.598496e-03 0.0001141944 2.638916
## 4 prob 1.504466e-03 0.0001141944 2.578292
## 5 summer 1.222379e-03 0.0001141944 2.370652
## 6 f0 4.057358e-02 0.0038826082 2.346610
## 7 lmao 2.350729e-03 0.0002283887 2.331432
## 8 told 1.175364e-03 0.0001141944 2.331432
## 9 couch 4.701457e-05 0.0004567774 -2.273739
## 10 dion 4.701457e-05 0.0004567774 -2.273739
## # ... with 5,848 more rows
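The `logratio` values in these tables match the natural log of the elhabro frequency divided by the tonyelhabr frequency, so positive values mark words favored by elhabro and negative values mark words favored by tonyelhabr. A minimal Python sketch (the `log_ratio` helper is a hypothetical name; the inputs are copied from the rows above):

```python
import math


def log_ratio(freq_a, freq_b):
    """Natural log of freq_a / freq_b; positive favors a, negative favors b."""
    if freq_a <= 0 or freq_b <= 0:
        raise ValueError("frequencies must be positive")
    return math.log(freq_a / freq_b)


# (elhabro, tonyelhabr) frequencies taken from the tables above.
rows = {
    "haha": (0.0031969911, 0.0001141944),
    "career": (0.0005641749, 0.0005709718),
    "couch": (4.701457e-05, 0.0004567774),
}
for word, (elhabro, tonyelhabr) in rows.items():
    print(word, log_ratio(elhabro, tonyelhabr))
```

A value near zero, as for "career", means both accounts use the word at nearly the same rate.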
Which words have we used more/less frequently over time?
## # A tibble: 1,382 x 6
## time_floor person word count time_total word_total
## <dttm> <chr> <chr> <int> <int> <int>
## 1 2014-09-01 elhabro bad 1 73 41
## 2 2014-09-01 elhabro class 3 73 59
## 3 2014-09-01 elhabro gonna 1 73 77
## 4 2014-09-01 elhabro guess 2 73 36
## 5 2014-09-01 elhabro hours 1 73 40
## 6 2014-09-01 elhabro lol 1 73 86
## 7 2014-09-01 elhabro start 1 73 34
## 8 2014-09-01 elhabro twitter 1 73 49
## 9 2014-09-01 elhabro week 1 73 50
## 10 2014-10-01 elhabro bro 1 350 43
## # ... with 1,372 more rows
## # A tibble: 102 x 3
## person word data
## <chr> <chr> <list>
## 1 elhabro bad <tibble [27 x 4]>
## 2 elhabro class <tibble [24 x 4]>
## 3 elhabro gonna <tibble [32 x 4]>
## 4 elhabro guess <tibble [22 x 4]>
## 5 elhabro hours <tibble [22 x 4]>
## 6 elhabro lol <tibble [29 x 4]>
## 7 elhabro start <tibble [22 x 4]>
## 8 elhabro twitter <tibble [20 x 4]>
## 9 elhabro week <tibble [26 x 4]>
## 10 elhabro bro <tibble [14 x 4]>
## # ... with 92 more rows
## # A tibble: 102 x 4
## person word data models
## <chr> <chr> <list> <list>
## 1 elhabro bad <tibble [27 x 4]> <S3: glm>
## 2 elhabro class <tibble [24 x 4]> <S3: glm>
## 3 elhabro gonna <tibble [32 x 4]> <S3: glm>
## 4 elhabro guess <tibble [22 x 4]> <S3: glm>
## 5 elhabro hours <tibble [22 x 4]> <S3: glm>
## 6 elhabro lol <tibble [29 x 4]> <S3: glm>
## 7 elhabro start <tibble [22 x 4]> <S3: glm>
## 8 elhabro twitter <tibble [20 x 4]> <S3: glm>
## 9 elhabro week <tibble [26 x 4]> <S3: glm>
## 10 elhabro bro <tibble [14 x 4]> <S3: glm>
## # ... with 92 more rows
## # A tibble: 5 x 8
## person word term estimate std.error statistic
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 elhabro f0 time_floor 1.643854e-08 1.730128e-09 9.501347
## 2 elhabro lol time_floor -1.625020e-08 4.422427e-09 -3.674498
## 3 tonyelhabr game time_floor 7.449339e-08 2.036544e-08 3.657834
## 4 tonyelhabr kawhi time_floor 5.225945e-08 1.808690e-08 2.889354
## 5 elhabro class time_floor -1.561842e-08 5.672361e-09 -2.753424
## # ... with 2 more variables: p.value <dbl>, adjusted_p_value <dbl>
How often do our tweets get liked/favorited/retweeted?
## # A tibble: 2 x 10
## person uses rts_total favs_total rts_max favs_max rts_avg favs_avg
## <chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 elhabro 1148 4414 30412 909 737 3.84 26.49
## 2 tonyelhabr 184 131 1025 22 80 0.71 5.57
## # ... with 2 more variables: rts_median <dbl>, favs_median <dbl>
## # A tibble: 2,856 x 4
## person word rts_median favs_median
## <chr> <chr> <dbl> <dbl>
## 1 elhabro '16 0.0 20
## 2 elhabro '93 0.0 15
## 3 elhabro 'd 0.0 6
## 4 elhabro 'em 0.5 3
## 5 elhabro 'murica 1.0 5
## 6 elhabro 'twas 3.0 11
## 7 elhabro #1 0.0 5
## 8 elhabro #aparnahive 0.0 3
## 9 elhabro #ballislife 0.0 2
## 10 elhabro #battleof3009 0.0 1
## # ... with 2,846 more rows
## # A tibble: 5,712 x 5
## person word type calc value
## * <chr> <chr> <chr> <chr> <dbl>
## 1 elhabro '16 rts median 0.0
## 2 elhabro '93 rts median 0.0
## 3 elhabro 'd rts median 0.0
## 4 elhabro 'em rts median 0.5
## 5 elhabro 'murica rts median 1.0
## 6 elhabro 'twas rts median 3.0
## 7 elhabro #1 rts median 0.0
## 8 elhabro #aparnahive rts median 0.0
## 9 elhabro #ballislife rts median 0.0
## 10 elhabro #battleof3009 rts median 0.0
## # ... with 5,702 more rows
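The per-word medians above summarize the retweets and favorites of every tweet in which a word appears. A minimal sketch of that grouping follows, in Python for illustration; the `median_by_word` helper and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_word(rows):
    """rows: (word, retweets, favorites) tuples, one per use of a word."""
    rts, favs = defaultdict(list), defaultdict(list)
    for word, retweets, favorites in rows:
        rts[word].append(retweets)
        favs[word].append(favorites)
    return {word: (median(rts[word]), median(favs[word])) for word in rts}


# Hypothetical word uses (not the report's data).
uses = [("game", 0, 4), ("game", 1, 2), ("lol", 3, 11)]
result = median_by_word(uses)
print(result)
```

A word used an even number of times can have a fractional median, which is why values such as 0.5 appear in the table.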
What is the sentiment (i.e. “tone”) of our tweets?
## # A tibble: 3,642 x 3
## status_id person total_words
## <dbl> <chr> <int>
## 1 5.145406e+17 elhabro 17701
## 2 5.148827e+17 elhabro 17701
## 3 5.148970e+17 elhabro 17701
## 4 5.151551e+17 elhabro 17701
## 5 5.155344e+17 elhabro 17701
## 6 5.155472e+17 elhabro 17701
## 7 5.156022e+17 elhabro 17701
## 8 5.159098e+17 elhabro 17701
## 9 5.159100e+17 elhabro 17701
## 10 5.159102e+17 elhabro 17701
## # ... with 3,632 more rows
## # A tibble: 20 x 4
## person sentiment total_words words
## <chr> <chr> <int> <dbl>
## 1 elhabro anger 17701 498
## 2 elhabro anticipation 17701 931
## 3 elhabro disgust 17701 348
## 4 elhabro fear 17701 553
## 5 elhabro joy 17701 652
## 6 elhabro negative 17701 1049
## 7 elhabro positive 17701 1404
## 8 elhabro sadness 17701 550
## 9 elhabro surprise 17701 427
## 10 elhabro trust 17701 830
## 11 tonyelhabr anger 3283 96
## 12 tonyelhabr anticipation 3283 193
## 13 tonyelhabr disgust 3283 64
## 14 tonyelhabr fear 3283 100
## 15 tonyelhabr joy 3283 159
## 16 tonyelhabr negative 3283 174
## 17 tonyelhabr positive 3283 333
## 18 tonyelhabr sadness 3283 91
## 19 tonyelhabr surprise 3283 86
## 20 tonyelhabr trust 3283 207
## # A tibble: 10 x 4
## sentiment elhabro tonyelhabr sentiment_diff
## <chr> <dbl> <dbl> <dbl>
## 1 positive 0.0793 0.1014 -0.0221
## 2 trust 0.0469 0.0631 -0.0162
## 3 joy 0.0368 0.0484 -0.0116
## 4 anticipation 0.0526 0.0588 -0.0062
## 5 surprise 0.0241 0.0262 -0.0021
## 6 anger 0.0281 0.0292 -0.0011
## 7 disgust 0.0197 0.0195 0.0002
## 8 fear 0.0312 0.0305 0.0007
## 9 sadness 0.0311 0.0277 0.0034
## 10 negative 0.0593 0.0530 0.0063
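The shares in the last table are each sentiment's word count divided by the person's total word count, and `sentiment_diff` is the elhabro share minus the tonyelhabr share. A minimal Python sketch using counts from the table above (the variable names are hypothetical):

```python
# (elhabro, tonyelhabr) word counts per sentiment, taken from the table above.
counts = {
    "positive": (1404, 333),
    "joy": (652, 159),
    "negative": (1049, 174),
}
TOTAL_ELHABRO, TOTAL_TONYELHABR = 17701, 3283

shares = {}
for sentiment, (n_elhabro, n_tonyelhabr) in counts.items():
    share_a = n_elhabro / TOTAL_ELHABRO
    share_b = n_tonyelhabr / TOTAL_TONYELHABR
    shares[sentiment] = (share_a, share_b, share_a - share_b)
    print(sentiment, round(share_a, 4), round(share_b, 4), round(share_a - share_b, 4))
```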
## # A tibble: 10 x 9
## sentiment estimate statistic p.value parameter conf.low
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 anger 0.9621243 498 7.346559e-01 501.0672 0.7719001
## 2 anticipation 0.8946753 931 1.626770e-01 948.1474 0.7654059
## 3 disgust 1.0084918 348 1.000000e+00 347.5416 0.7706078
## 4 fear 1.0256477 553 8.716172e-01 550.8365 0.8275475
## 5 joy 0.7605426 652 2.685413e-03 684.1170 0.6385506
## 6 negative 1.1181481 1049 1.808206e-01 1031.6585 0.9515492
## 7 positive 0.7819806 1404 8.331172e-05 1465.2419 0.6934323
## 8 sadness 1.1209710 550 3.281363e-01 540.7139 0.8965403
## 9 surprise 0.9208789 427 4.663807e-01 432.7398 0.7289193
## 10 trust 0.7436710 830 1.972226e-04 874.7587 0.6378809
## # ... with 3 more variables: conf.high <dbl>, method <fctr>,
## # alternative <fctr>
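The `estimate` and `parameter` columns are consistent with a comparison of two Poisson rates: `estimate` matches the ratio of the two per-word rates, and `parameter` matches the count expected for elhabro if both accounts shared one rate. The report does not name the test, so this reading is an assumption. A minimal Python sketch that reproduces those two columns (function names are hypothetical; the p-value and confidence interval are not computed here):

```python
def rate_ratio(count_a, total_a, count_b, total_b):
    """Ratio of the per-word rate of a to the per-word rate of b."""
    return (count_a / total_a) / (count_b / total_b)


def expected_count_a(count_a, total_a, count_b, total_b):
    """Count expected for a if both accounts shared a single rate."""
    return (count_a + count_b) * total_a / (total_a + total_b)


# "anger" counts and word totals from the sentiment tables above.
anger = (498, 17701, 96, 3283)
print(rate_ratio(*anger))
print(expected_count_a(*anger))
```

An estimate below 1, as for joy, positive, and trust, means elhabro uses that sentiment's words at a lower rate than tonyelhabr.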
That’s it!