Introduction

This report compares the volume, behavior, and content of the tweets made by elhabro and tonyelhabr.

## # A tibble: 2 x 2
##       person     n
##        <chr> <int>
## 1    elhabro  3164
## 2 tonyelhabr   519
## # A tibble: 2 x 2
##       person           timestamp
##        <chr>              <dttm>
## 1    elhabro 2014-09-23 17:23:37
## 2 tonyelhabr 2015-08-19 15:06:32

In total, 3,164 tweets have been collected for elhabro and 519 for tonyelhabr. Note that the oldest tweet by elhabro in the collected data is from 2014-09-23 17:23:37, and the oldest tweet by tonyelhabr is from 2015-08-19 15:06:32.

Tweet Volume

How often do elhabro and tonyelhabr tweet? Does the volume of tweets differ across temporal periods (e.g. year or month)?

Does our tweet volume vary across a given temporal period more than chance alone would explain? Here, I use the chi-squared goodness-of-fit test. If the p-value is less than some threshold value (e.g. 0.05), then I can reject the null hypothesis that tweets are distributed uniformly across the period. In fact, it appears that our tweet volume does differ depending on the month and day of the week. Every test below reports a p-value under 0.05, though the last one (p = 0.04292) only narrowly.

## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 116.75, df = 11, p-value < 2.2e-16
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 154.32, df = 11, p-value < 2.2e-16
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 48.031, df = 6, p-value = 1.165e-08
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 37.965, df = 6, p-value = 1.141e-06
## 
## [1] 0.9859181
## [1] 0.6603261
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 47.579, df = 6, p-value = 1.434e-08
## 
## 
##  Chi-squared test for given probabilities
## 
## data:  .
## X-squared = 13.008, df = 6, p-value = 0.04292
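
To make the test above concrete, here is a minimal Python sketch of the goodness-of-fit statistic against a uniform null. It is independent of whatever code produced the output above. The weekday counts are hypothetical, and the function name `chi_squared_uniform` is made up for illustration. Instead of computing a p-value, it compares the statistic against the standard 0.05 critical value for 6 degrees of freedom.

```python
def chi_squared_uniform(counts):
    """Return (statistic, degrees of freedom) for a test against a uniform distribution."""
    total = sum(counts)
    expected = total / len(counts)  # uniform null: every bin expects the same count
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, len(counts) - 1


# Hypothetical tweet counts for Monday through Sunday (not the report's data).
weekday_counts = [60, 75, 80, 70, 85, 40, 30]
statistic, df = chi_squared_uniform(weekday_counts)

# 12.592 is the standard chi-squared critical value for df = 6 at the 0.05 level.
CRITICAL_VALUE_DF6 = 12.592
reject_uniform = statistic > CRITICAL_VALUE_DF6
print(f"X-squared = {statistic:.3f}, df = {df}, reject uniform: {reject_uniform}")
```

With seven weekday bins the degrees of freedom are 6, and with twelve month bins they are 11, which matches the `df` values in the output above.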

Tweet Behavior

How often do we use hashtags, retweets (RTs), and replies?

## # A tibble: 24 x 4
##        person     type response  value
##  *      <chr>    <chr>    <chr>  <dbl>
##  1    elhabro  hashtag      yes 0.0553
##  2 tonyelhabr  hashtag      yes 0.0539
##  3    elhabro hashtag2      yes 0.0531
##  4 tonyelhabr hashtag2      yes 0.0539
##  5    elhabro  hashtag       no 0.9447
##  6 tonyelhabr  hashtag       no 0.9461
##  7    elhabro hashtag2       no 0.9469
##  8 tonyelhabr hashtag2       no 0.9461
##  9    elhabro     link      yes 0.3666
## 10 tonyelhabr     link      yes 0.4586
## # ... with 14 more rows

Tweet Content

How long are our tweets?

## # A tibble: 1 x 3
##   char_count_count char_count_avg char_count_max
##              <int>          <dbl>          <dbl>
## 1               60         344.85           3542

Word Frequency and Usage

Which words are we most likely to use?

## # A tibble: 1,538 x 3
##    screen_name  word          created_at
##          <chr> <chr>               <chr>
##  1     elhabro  009f 2015-01-01 03:46:08
##  2     elhabro  408d 2015-01-27 20:39:05
##  3     elhabro  009f 2015-03-25 02:25:31
##  4     elhabro  009f 2015-03-25 02:25:31
##  5     elhabro  009f 2015-03-25 02:25:31
##  6     elhabro  009f 2015-04-21 18:18:11
##  7     elhabro  009f 2015-04-29 05:48:08
##  8     elhabro  008e 2015-04-29 05:48:08
##  9     elhabro  009f 2015-04-29 05:48:08
## 10     elhabro  00ab 2015-04-29 05:48:08
## # ... with 1,528 more rows
## # A tibble: 2 x 2
##       person total
##        <chr> <int>
## 1    elhabro 17701
## 2 tonyelhabr  3283
## # A tibble: 8,121 x 5
## # Groups:   person [2]
##     person            word     n total        freq
##      <chr>           <chr> <int> <int>       <dbl>
##  1 elhabro              f0   862 17701 0.048697814
##  2 elhabro            game   145 17701 0.008191628
##  3 elhabro            time   116 17701 0.006553302
##  4 elhabro     @sailordanj    95 17701 0.005366928
##  5 elhabro             cuz    89 17701 0.005027965
##  6 elhabro             day    88 17701 0.004971471
##  7 elhabro             lol    79 17701 0.004463025
##  8 elhabro @dragonflyjonez    73 17701 0.004124061
##  9 elhabro           gonna    73 17701 0.004124061
## 10 elhabro            haha    67 17701 0.003785097
## # ... with 8,111 more rows
## # A tibble: 6,922 x 3
##               word      elhabro   tonyelhabr
##              <chr>        <dbl>        <dbl>
##  1    #classof2016 5.649398e-05 0.0003045995
##  2      #finalfour 5.649398e-05 0.0003045995
##  3     #graduation 5.649398e-05 0.0003045995
##  4   #marchmadness 5.649398e-05 0.0003045995
##  5         #rstats 5.649398e-05 0.0003045995
##  6 @alisongriswold 5.649398e-05 0.0003045995
##  7 @basketballtalk 5.649398e-05 0.0003045995
##  8  @bill_easterly 5.649398e-05 0.0003045995
##  9 @chldishricardo 5.649398e-05 0.0003045995
## 10 @coliegestudent 5.649398e-05 0.0003045995
## # ... with 6,912 more rows
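
The `freq` column in the per-person table above is consistent with `n / total`, that is, a word's count divided by that person's total word count. A minimal Python sketch of that arithmetic follows. The token list is hypothetical and the helper name `word_frequencies` is made up for illustration.

```python
from collections import Counter


def word_frequencies(words):
    """Map each word to (n, total, freq), where freq = n / total."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: (n, total, n / total) for word, n in counts.items()}


# Hypothetical tokens for one person (not the report's data).
tokens = ["game", "time", "game", "lol", "game", "day", "time", "lol"]
freqs = word_frequencies(tokens)
print(freqs["game"])

# The same arithmetic applied to the "game" row for elhabro in the table above.
game_freq = 145 / 17701
print(round(game_freq, 9))
```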

Which words are we about equally likely to use, and which are most distinctive to one of us?

## # A tibble: 5,858 x 4
##      word      elhabro   tonyelhabr logratio
##     <chr>        <dbl>        <dbl>    <dbl>
##  1   haha 0.0031969911 0.0001141944 3.332063
##  2    ppl 0.0022096850 0.0001141944 2.962703
##  3    yea 0.0015984955 0.0001141944 2.638916
##  4   prob 0.0015044664 0.0001141944 2.578292
##  5 summer 0.0012223789 0.0001141944 2.370652
##  6     f0 0.0405735778 0.0038826082 2.346610
##  7   lmao 0.0023507287 0.0002283887 2.331432
##  8   told 0.0011753644 0.0001141944 2.331432
##  9   mins 0.0009873061 0.0001141944 2.157078
## 10  gotta 0.0018805830 0.0002283887 2.108288
## # ... with 5,848 more rows
## # A tibble: 5,858 x 4
##          word      elhabro   tonyelhabr    logratio
##         <chr>        <dbl>        <dbl>       <dbl>
##  1     career 0.0005641749 0.0005709718 -0.01197551
##  2        job 0.0005641749 0.0005709718 -0.01197551
##  3     lakers 0.0005641749 0.0005709718 -0.01197551
##  4 #nbafinals 0.0002350729 0.0002283887  0.02884648
##  5        1st 0.0002350729 0.0002283887  0.02884648
##  6    android 0.0002350729 0.0002283887  0.02884648
##  7      bring 0.0002350729 0.0002283887  0.02884648
##  8        bus 0.0002350729 0.0002283887  0.02884648
##  9     buying 0.0002350729 0.0002283887  0.02884648
## 10     caught 0.0002350729 0.0002283887  0.02884648
## # ... with 5,848 more rows
## # A tibble: 5,858 x 4
##      word      elhabro   tonyelhabr  logratio
##     <chr>        <dbl>        <dbl>     <dbl>
##  1   haha 3.196991e-03 0.0001141944  3.332063
##  2    ppl 2.209685e-03 0.0001141944  2.962703
##  3    yea 1.598496e-03 0.0001141944  2.638916
##  4   prob 1.504466e-03 0.0001141944  2.578292
##  5 summer 1.222379e-03 0.0001141944  2.370652
##  6     f0 4.057358e-02 0.0038826082  2.346610
##  7   lmao 2.350729e-03 0.0002283887  2.331432
##  8   told 1.175364e-03 0.0001141944  2.331432
##  9  couch 4.701457e-05 0.0004567774 -2.273739
## 10   dion 4.701457e-05 0.0004567774 -2.273739
## # ... with 5,848 more rows
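
The `logratio` column above matches the natural log of the elhabro frequency divided by the tonyelhabr frequency. Positive values mark words that elhabro uses relatively more, and values near zero mark words used about equally. The sketch below checks this against the "haha" row. The add-one smoothing in `smoothed_freq` is an assumption about how zero counts might be handled, not something the report states, and both function names are made up for illustration.

```python
import math


def log_ratio(freq_a, freq_b):
    """Natural log of the ratio of two word frequencies."""
    return math.log(freq_a / freq_b)


def smoothed_freq(n, total):
    """Add-one smoothed frequency (assumed here, not confirmed by the report)."""
    return (n + 1) / (total + 1)


# Frequencies for "haha" and "career" copied from the tables above.
haha = log_ratio(0.0031969911, 0.0001141944)
career = log_ratio(0.0005641749, 0.0005709718)
print(round(haha, 4), round(career, 4))
```

Smoothing of this kind keeps the ratio finite when one person has never used a word.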

Which words have we used more/less frequently over time?

## # A tibble: 1,382 x 6
##    time_floor  person    word count time_total word_total
##        <dttm>   <chr>   <chr> <int>      <int>      <int>
##  1 2014-09-01 elhabro     bad     1         73         41
##  2 2014-09-01 elhabro   class     3         73         59
##  3 2014-09-01 elhabro   gonna     1         73         77
##  4 2014-09-01 elhabro   guess     2         73         36
##  5 2014-09-01 elhabro   hours     1         73         40
##  6 2014-09-01 elhabro     lol     1         73         86
##  7 2014-09-01 elhabro   start     1         73         34
##  8 2014-09-01 elhabro twitter     1         73         49
##  9 2014-09-01 elhabro    week     1         73         50
## 10 2014-10-01 elhabro     bro     1        350         43
## # ... with 1,372 more rows
## # A tibble: 102 x 3
##     person    word              data
##      <chr>   <chr>            <list>
##  1 elhabro     bad <tibble [27 x 4]>
##  2 elhabro   class <tibble [24 x 4]>
##  3 elhabro   gonna <tibble [32 x 4]>
##  4 elhabro   guess <tibble [22 x 4]>
##  5 elhabro   hours <tibble [22 x 4]>
##  6 elhabro     lol <tibble [29 x 4]>
##  7 elhabro   start <tibble [22 x 4]>
##  8 elhabro twitter <tibble [20 x 4]>
##  9 elhabro    week <tibble [26 x 4]>
## 10 elhabro     bro <tibble [14 x 4]>
## # ... with 92 more rows
## # A tibble: 102 x 4
##     person    word              data    models
##      <chr>   <chr>            <list>    <list>
##  1 elhabro     bad <tibble [27 x 4]> <S3: glm>
##  2 elhabro   class <tibble [24 x 4]> <S3: glm>
##  3 elhabro   gonna <tibble [32 x 4]> <S3: glm>
##  4 elhabro   guess <tibble [22 x 4]> <S3: glm>
##  5 elhabro   hours <tibble [22 x 4]> <S3: glm>
##  6 elhabro     lol <tibble [29 x 4]> <S3: glm>
##  7 elhabro   start <tibble [22 x 4]> <S3: glm>
##  8 elhabro twitter <tibble [20 x 4]> <S3: glm>
##  9 elhabro    week <tibble [26 x 4]> <S3: glm>
## 10 elhabro     bro <tibble [14 x 4]> <S3: glm>
## # ... with 92 more rows
## # A tibble: 5 x 8
##       person  word       term      estimate    std.error statistic
##        <chr> <chr>      <chr>         <dbl>        <dbl>     <dbl>
## 1    elhabro    f0 time_floor  1.643854e-08 1.730128e-09  9.501347
## 2    elhabro   lol time_floor -1.625020e-08 4.422427e-09 -3.674498
## 3 tonyelhabr  game time_floor  7.449339e-08 2.036544e-08  3.657834
## 4 tonyelhabr kawhi time_floor  5.225945e-08 1.808690e-08  2.889354
## 5    elhabro class time_floor -1.561842e-08 5.672361e-09 -2.753424
## # ... with 2 more variables: p.value <dbl>, adjusted_p_value <dbl>
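
The first table in this section appears to bucket word uses by month (`time_floor`) and attach two totals to each row. The reading of `time_total` as all word uses in that month and `word_total` as all uses of that word is an assumption based on the column names. The Python sketch below shows that preparation step on hypothetical data. The model-fitting step is not reproduced here.

```python
from collections import Counter
from datetime import datetime


def floor_to_month(ts):
    """Round a timestamp down to the first instant of its month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# Hypothetical (timestamp, word) pairs for one person.
uses = [
    (datetime(2014, 9, 23, 17, 23), "class"),
    (datetime(2014, 9, 25, 9, 0), "class"),
    (datetime(2014, 9, 30, 12, 0), "lol"),
    (datetime(2014, 10, 2, 8, 30), "lol"),
]

count = Counter((floor_to_month(ts), word) for ts, word in uses)
time_total = Counter(floor_to_month(ts) for ts, _ in uses)  # uses per month
word_total = Counter(word for _, word in uses)              # uses per word

# One row per (month, word): count, time_total, word_total.
rows = [
    (month, word, n, time_total[month], word_total[word])
    for (month, word), n in sorted(count.items())
]
for row in rows:
    print(row)
```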

Tweet Popularity

How often do our tweets get liked/favorited/retweeted?

## # A tibble: 2 x 10
##       person  uses rts_total favs_total rts_max favs_max rts_avg favs_avg
##        <chr> <int>     <int>      <int>   <dbl>    <dbl>   <dbl>    <dbl>
## 1    elhabro  1148      4414      30412     909      737    3.84    26.49
## 2 tonyelhabr   184       131       1025      22       80    0.71     5.57
## # ... with 2 more variables: rts_median <dbl>, favs_median <dbl>
## # A tibble: 2,856 x 4
##     person          word rts_median favs_median
##      <chr>         <chr>      <dbl>       <dbl>
##  1 elhabro           '16        0.0          20
##  2 elhabro           '93        0.0          15
##  3 elhabro            'd        0.0           6
##  4 elhabro           'em        0.5           3
##  5 elhabro       'murica        1.0           5
##  6 elhabro         'twas        3.0          11
##  7 elhabro            #1        0.0           5
##  8 elhabro   #aparnahive        0.0           3
##  9 elhabro   #ballislife        0.0           2
## 10 elhabro #battleof3009        0.0           1
## # ... with 2,846 more rows
## # A tibble: 5,712 x 5
##     person          word  type   calc value
##  *   <chr>         <chr> <chr>  <chr> <dbl>
##  1 elhabro           '16   rts median   0.0
##  2 elhabro           '93   rts median   0.0
##  3 elhabro            'd   rts median   0.0
##  4 elhabro           'em   rts median   0.5
##  5 elhabro       'murica   rts median   1.0
##  6 elhabro         'twas   rts median   3.0
##  7 elhabro            #1   rts median   0.0
##  8 elhabro   #aparnahive   rts median   0.0
##  9 elhabro   #ballislife   rts median   0.0
## 10 elhabro #battleof3009   rts median   0.0
## # ... with 5,702 more rows
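
The last two tables above hold the same per-word medians in two layouts. The first is wide, with one row per word and a column per metric. The second is long, with one row per word and metric, which is why it has twice as many rows (5,712 versus 2,856). A minimal Python sketch of both steps on hypothetical data:

```python
from collections import defaultdict
from statistics import median

# Hypothetical (word, retweets, favorites) rows, one per word occurrence in a tweet.
occurrences = [
    ("game", 0, 3),
    ("game", 2, 5),
    ("game", 1, 10),
    ("lol", 0, 1),
    ("lol", 1, 2),
]

by_word = defaultdict(lambda: {"rts": [], "favs": []})
for word, rts, favs in occurrences:
    by_word[word]["rts"].append(rts)
    by_word[word]["favs"].append(favs)

# Wide form: one entry per word, holding (rts_median, favs_median).
wide = {word: (median(v["rts"]), median(v["favs"])) for word, v in by_word.items()}

# Long form: one row per (word, type, calc, value).
long_rows = [
    (word, kind, "median", value)
    for word, (rts_median, favs_median) in wide.items()
    for kind, value in (("rts", rts_median), ("favs", favs_median))
]
for row in long_rows:
    print(row)
```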

Sentiment Analysis

What is the sentiment (i.e. “tone”) of our tweets?

## # A tibble: 3,642 x 3
##       status_id  person total_words
##           <dbl>   <chr>       <int>
##  1 5.145406e+17 elhabro       17701
##  2 5.148827e+17 elhabro       17701
##  3 5.148970e+17 elhabro       17701
##  4 5.151551e+17 elhabro       17701
##  5 5.155344e+17 elhabro       17701
##  6 5.155472e+17 elhabro       17701
##  7 5.156022e+17 elhabro       17701
##  8 5.159098e+17 elhabro       17701
##  9 5.159100e+17 elhabro       17701
## 10 5.159102e+17 elhabro       17701
## # ... with 3,632 more rows
## # A tibble: 20 x 4
##        person    sentiment total_words words
##         <chr>        <chr>       <int> <dbl>
##  1    elhabro        anger       17701   498
##  2    elhabro anticipation       17701   931
##  3    elhabro      disgust       17701   348
##  4    elhabro         fear       17701   553
##  5    elhabro          joy       17701   652
##  6    elhabro     negative       17701  1049
##  7    elhabro     positive       17701  1404
##  8    elhabro      sadness       17701   550
##  9    elhabro     surprise       17701   427
## 10    elhabro        trust       17701   830
## 11 tonyelhabr        anger        3283    96
## 12 tonyelhabr anticipation        3283   193
## 13 tonyelhabr      disgust        3283    64
## 14 tonyelhabr         fear        3283   100
## 15 tonyelhabr          joy        3283   159
## 16 tonyelhabr     negative        3283   174
## 17 tonyelhabr     positive        3283   333
## 18 tonyelhabr      sadness        3283    91
## 19 tonyelhabr     surprise        3283    86
## 20 tonyelhabr        trust        3283   207
## # A tibble: 10 x 4
##       sentiment elhabro tonyelhabr sentiment_diff
##           <chr>   <dbl>      <dbl>          <dbl>
##  1     positive  0.0793     0.1014        -0.0221
##  2        trust  0.0469     0.0631        -0.0162
##  3          joy  0.0368     0.0484        -0.0116
##  4 anticipation  0.0526     0.0588        -0.0062
##  5     surprise  0.0241     0.0262        -0.0021
##  6        anger  0.0281     0.0292        -0.0011
##  7      disgust  0.0197     0.0195         0.0002
##  8         fear  0.0312     0.0305         0.0007
##  9      sadness  0.0311     0.0277         0.0034
## 10     negative  0.0593     0.0530         0.0063
## # A tibble: 10 x 9
##       sentiment  estimate statistic      p.value parameter  conf.low
##           <chr>     <dbl>     <dbl>        <dbl>     <dbl>     <dbl>
##  1        anger 0.9621243       498 7.346559e-01  501.0672 0.7719001
##  2 anticipation 0.8946753       931 1.626770e-01  948.1474 0.7654059
##  3      disgust 1.0084918       348 1.000000e+00  347.5416 0.7706078
##  4         fear 1.0256477       553 8.716172e-01  550.8365 0.8275475
##  5          joy 0.7605426       652 2.685413e-03  684.1170 0.6385506
##  6     negative 1.1181481      1049 1.808206e-01 1031.6585 0.9515492
##  7     positive 0.7819806      1404 8.331172e-05 1465.2419 0.6934323
##  8      sadness 1.1209710       550 3.281363e-01  540.7139 0.8965403
##  9     surprise 0.9208789       427 4.663807e-01  432.7398 0.7289193
## 10        trust 0.7436710       830 1.972226e-04  874.7587 0.6378809
## # ... with 3 more variables: conf.high <dbl>, method <fctr>,
## #   alternative <fctr>
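
The sentiment shares and differences above follow from dividing each sentiment's word count by the person's total word count. The `estimate` column in the last table also matches the ratio of elhabro's share to tonyelhabr's share. That reading is inferred from the numbers, since the test method is cut off in the output. The Python sketch below reproduces the arithmetic for two rows using counts copied from the tables above.

```python
# (words tagged with the sentiment, total words), copied from the tables above.
counts = {
    "positive": {"elhabro": (1404, 17701), "tonyelhabr": (333, 3283)},
    "anger": {"elhabro": (498, 17701), "tonyelhabr": (96, 3283)},
}


def share(words, total_words):
    """Fraction of a person's words that carry the sentiment."""
    return words / total_words


results = {}
for sentiment, by_person in counts.items():
    a = share(*by_person["elhabro"])
    b = share(*by_person["tonyelhabr"])
    results[sentiment] = {"diff": a - b, "ratio": a / b}
    print(f"{sentiment}: diff = {a - b:+.4f}, rate ratio = {a / b:.4f}")
```

A ratio below 1 means the sentiment makes up a smaller share of elhabro's words than of tonyelhabr's.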

Conclusion

That’s it!