1 Background

This document describes an initial analysis of the posting style used on the Donald Trump Twitter account @realDonaldTrump, focusing on the infamous “pled” Tweet posted on 2017/12/02:

I had to fire General Flynn because he lied to the Vice President and the FBI. He has pled guilty to those lies. It is a shame because his actions during the transition were lawful. There was nothing to hide! (Twitter ID: 937007006526959618)

The issue with this Tweet is that it may amount to an admission of obstructing justice if it was authored by Donald Trump. However, the President’s lawyer, John Dowd, later claimed that he had written the Tweet, potentially exonerating the President. This has led to considerable discussion about the authorship of the Tweet (e.g. see Ben Zimmer’s article in The Atlantic).

Prior to this controversy, we had been working on a stylistic analysis of the Tweeting style of the @realDonaldTrump account and we therefore decided to analyse the style of this questioned Tweet within the framework we have been developing. These results are preliminary, but we believe they are reliable and worth sharing at this point in time.

Questions, comments, and suggestions for improving our analysis are welcome.

2 Research Goals

Specifically, the main goals of our analysis are to

  1. Identify the primary dimensions of stylistic variation of the Tweets posted on the @realDonaldTrump account

  2. See where the questioned “pled” Tweet falls on these dimensions to assess how similar it is stylistically to the rest of the Tweets posted on this account.

  3. Determine whether the style of these Tweets, as captured by each of these dimensions, has changed over time.

Crucially, we are not trying to attribute the questioned Tweet. The problem is that we have very little reliable information about who authored any of the Tweets posted on the @realDonaldTrump account and it seems likely that multiple users have posted on this account over time. Based on the metadata provided with Twitter data, a previous study has claimed that Tweets posted using an Android phone, at least during the campaign, were written by Donald Trump, while Tweets posted using an iPhone were written by his associates. Although Trump was known to use an Android phone during the campaign, no evidence has been provided that his associates did not also use an Android phone, and we therefore consider this assumption to be unfounded. Furthermore, we do not have a comparison corpus of Tweets authored by Dowd.

Rather than conduct an authorship analysis, we therefore analysed the general Tweeting style used on this account based on an analysis of the usage of a wide range of grammatical forms. We then examined how the questioned Tweet compares.

3 Summary of Results

Based on a stylistic analysis of the absence and presence of a wide range of grammatical features, we find that the questioned Tweet is stylistically dissimilar to most Tweets posted on the @realDonaldTrump account.

4 Data

The data for this analysis was downloaded from the Trump Twitter Archives. We then removed all retweets from this corpus to focus on Tweets that originated from this account, leaving 21,320 Tweets posted between 2009 and 2017 in our final corpus, including the questioned Tweet. We also included the length of each Tweet in word tokens in this dataset, which we consider in our analysis, as well as a range of metadata, which we have not yet incorporated into our analysis.

5 Analysis

5.1 Short Text Multidimensional Analysis

Our approach to the stylistic analysis of the corpus, which is described in more detail in Clarke & Grieve (2017), is essentially a form of Douglas Biber’s Multidimensional Analysis (MDA) (e.g. see Biber, 1988), which we have modified for the analysis of short texts.

The goal of an MDA is to identify the most important dimensions of stylistic variation in a corpus of texts. This is usually achieved by measuring the relative frequencies of a large number of grammatical forms across the texts and then subjecting the resultant dataset to a factor analysis to identify the most common patterns of grammatical variation in those texts. These dimensions are then interpreted stylistically, based on both the linguistic features and the individual texts most strongly associated with each dimension.

The problem with applying MDA to short texts, such as Tweets, is that the relative frequencies of most grammatical forms cannot be measured accurately in short texts. To overcome this issue, we have developed a categorical form of MDA, which instead of looking at the relative frequencies of grammatical features, simply looks at their presence or absence in each text. We then use multiple correspondence analysis (MCA), as opposed to factor analysis, to identify common patterns of stylistic variation in the resultant binary linguistic data matrix.

5.2 Corpus Analysis

First, we tagged the corpus Tweets for part-of-speech information using the Gimpel et al. (2011) Twitter tagger. Using Perl scripts to analyse the tagged corpus, we then recorded the presence or absence of a wide range of grammatical forms (e.g. prepositions, time adverbials, predicative adjectives, passives), based on the standard feature set used in MDA (e.g. see Biber, 1988). We also recorded the presence/absence of a number of additional forms associated with Twitter and computer mediated communication more generally (e.g. hashtags, mentions, URLs).
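As a rough illustration of the presence/absence coding, the Twitter-specific forms can be detected with regular expressions. The following is a minimal sketch and not the Perl scripts used for the analysis; the example texts, the patterns, and the column names HASHTAG, MENTION, and URL are assumptions made for illustration.

```r
# Hypothetical example texts (not taken from the corpus)
texts <- c("Join us tonight! #MAGA https://t.co/abc",
           "Thank you @someone for the kind words",
           "It is a shame because nothing was hidden")

# Illustrative patterns; the analysis itself relied on a tagger and Perl scripts
patterns <- c(HASHTAG = "#\\w+",
              MENTION = "@\\w+",
              URL     = "https?://\\S+")

# One row per text, one column per feature; 1 = present, 0 = absent
features <- sapply(patterns, function(p) as.integer(grepl(p, texts, perl = TRUE)))
rownames(features) <- paste0("text", seq_along(texts))
features
```

Grammatical features such as passives or predicative adjectives cannot be captured this simply, which is why the tagged corpus is needed for most of the feature set.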

All features that occurred in at least 5% of the Tweets were retained, yielding a 21,320 text by 65 linguistic feature binary data matrix for further analysis (see Appendix for complete feature set).
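The 5% retention rule can be sketched in R as follows. This is an illustration on simulated data, not the filtering code used for the analysis; the matrix `mat`, the feature names, and the prevalences are all hypothetical.

```r
set.seed(42)
# Hypothetical binary matrix: 1000 texts by 4 features with different prevalences
n <- 1000
prevalence <- c(COMMON = 0.40, MEDIUM = 0.10, RARE = 0.01, NEVER = 0)
mat <- sapply(prevalence, function(p) rbinom(n, 1, p))

# Proportion of texts containing each feature
share <- colMeans(mat)

# Retain only features present in at least 5% of texts
keep <- share >= 0.05
retained <- mat[, keep, drop = FALSE]
colnames(retained)
```

Because the matrix is coded 0/1, the column mean is the proportion of texts that contain the feature, so no explicit counting is needed.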

5.3 Initial Stylistic Analysis

Before conducting a multivariate stylistic analysis of the complete corpus, we first identified individual features in the questioned Tweet that are especially rare in the complete corpus. The rest of the analysis presented in this document was conducted using the programming language R.

First, we read in the complete dataset, which is available online for download here.

tweets <- read.table("trumpdata.csv", header = TRUE, sep = ",")

We then computed the percentage of the 21,320 Tweets in the complete corpus that contain each of our 65 linguistic features and compared these percentages with the individual presence and absence of features in the questioned Tweet.

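# Percentage of Tweets in the corpus that contain each feature (columns 2 to 66)
percentage <- data.frame()
for (i in 2:66) {
    # Tabulate absence/presence and convert the counts to percentages
    temp <- as.data.frame(100 * prop.table(table(tweets[, i])))
    # The second row of the table corresponds to presence of the feature
    percentage[colnames(tweets[i]), 1] <- temp[2, 2]
}
# Set these percentages beside the features of the questioned Tweet (row 1)
comparison <- merge(percentage, t(as.data.frame(tweets[1, 2:66])), by = "row.names")
colnames(comparison) <- c("FEATURE", "OVERALL PERCENTAGE", "QUESTIONED TWEET")
comparison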

Based on this analysis, we can see that the questioned Tweet contains a number of forms that are relatively rare in the larger corpus, including perfect aspect and auxiliary HAVE (“has pled”), pronoun “it”, public verbs (“lied”), copula BE (“were”), predicative adjectives (“lawful”), and infinitival “to” (“to fire”, “to hide”), all of which occur in less than 20% of the Tweets in the corpus.
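The selection described here (features present in the questioned Tweet that occur in fewer than 20% of Tweets overall) can be expressed compactly. The sketch below uses a small made-up data frame, `toy`, in place of the real feature matrix; the feature names and values are hypothetical and chosen only to make the logic visible.

```r
# Hypothetical stand-in for the feature matrix: row 1 plays the role of the
# questioned Tweet and columns are binary features (1 = present, 0 = absent)
toy <- data.frame(
  PERFECT = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  PREP    = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  HASHTAG = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0),
  IT      = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Percentage of texts containing each feature
overall <- 100 * colMeans(toy)

# Features present in the first row that occur in fewer than 20% of texts
present <- unlist(toy[1, ]) == 1
rare_in_questioned <- names(overall)[present & overall < 20]
rare_in_questioned
```

In this toy example PREP is present in the first row but is too common to be selected, and HASHTAG is rare but absent from the first row.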

Even more notably, however, there are a number of additional forms that occur in the questioned Tweet but that we excluded from our multivariate analysis because they occurred in less than 5% of the 21,320 Tweets in the complete corpus.

Specifically, “had to” occurs in only 30 Tweets, “those” as a determiner occurs in only 43 Tweets, existential “there” occurs in only 262 Tweets, and “because”, which is used twice in the questioned Tweet, occurs in only 280 Tweets.
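Counts of this kind can be approximated for purely lexical forms with simple pattern matching. The following sketch is an assumption about how such a count might be made, not the procedure used to obtain the figures above; the texts and the names HAD_TO and BECAUSE are hypothetical.

```r
# Hypothetical texts standing in for the corpus
texts <- c("I had to act because it mattered",
           "There was nothing to hide!",
           "Great rally tonight, thank you!",
           "We won because we worked hard, and because we care")

# Illustrative word-level patterns; determiner "those" and existential "there"
# would need part-of-speech information, which plain patterns cannot supply
forms <- c(HAD_TO = "\\bhad to\\b", BECAUSE = "\\bbecause\\b")

# Number of texts containing each form at least once
counts <- sapply(forms, function(p) sum(grepl(p, texts, ignore.case = TRUE, perl = TRUE)))
counts
```

Note that a text is counted once however many times the form occurs in it, in keeping with the presence/absence coding used throughout.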

Overall, we therefore find strong initial evidence that the questioned Tweet was written in a relatively atypical style for the @realDonaldTrump account.

5.4 Statistical Stylistic Analysis

Next, we subjected the binary linguistic data matrix to a multiple correspondence analysis to identify underlying stylistic dimensions in the @realDonaldTrump Twitter account.

5.4.1 Packages

For this stage of the analysis we used the FactoMineR package to conduct MCA.

library(FactoMineR)

5.4.2 Multiple Correspondence Analysis

We ran an MCA on the linguistic data matrix to identify common patterns of grammatical variation in the corpus.

mcares <- MCA(tweets[, c(2:66)], graph = FALSE, ncp = 6)

coordind <- as.data.frame(mcares$ind$coord)
coordvar <- as.data.frame(mcares$var$coord)
contribvar <- as.data.frame(mcares$var$contrib)
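For readers unfamiliar with MCA, the base-R sketch below shows one standard formulation of the technique: a correspondence analysis of the indicator matrix, computed with a singular value decomposition. It runs on simulated data and is not the FactoMineR implementation; all object names (`X`, `Z`, `row_coord`, and so on) are hypothetical.

```r
set.seed(1)
n <- 200
# Hypothetical binary feature matrix (1 = present, 0 = absent)
X <- matrix(rbinom(n * 4, 1, 0.4), nrow = n,
            dimnames = list(NULL, paste0("F", 1:4)))

# Indicator matrix: one column for presence and one for absence of each feature
Z <- cbind(X, 1 - X)
colnames(Z) <- c(paste0(colnames(X), "_present"), paste0(colnames(X), "_absent"))

# Correspondence matrix with its row and column masses
P <- Z / sum(Z)
r <- rowSums(P)
cm <- colSums(P)

# Standardised residuals and their singular value decomposition
S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
dec <- svd(S)

# Principal inertias (eigenvalues) and percentage of variance per dimension
eigenvalues <- dec$d^2
pct <- 100 * eigenvalues / sum(eigenvalues)

# Principal coordinates of the texts on each dimension
row_coord <- (dec$u / sqrt(r)) %*% diag(dec$d)
```

With only binary features the total inertia of the indicator matrix equals 1, which is one reason the percentage of variance explained by any single dimension tends to look small in an MCA.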

We extracted the first six dimensions, primarily because they are interpretable (see below) and because the amount of variance explained drops to some degree after the sixth dimension.

eig <- as.data.frame(mcares$eig)
barplot(eig$"percentage of variance", ylim = c(0, 8), col = "dodgerblue", xlab = "Dimensions", 
    ylab = "Percentage of Variance Explained")

5.5 Interpretation of Dimensions

We then interpreted each of the six dimensions stylistically based on both the linguistic features and the individual Tweets most strongly associated with each pole of the dimension. It is important to note that these interpretations are preliminary. Our interpretations, however, do not affect the results of the MCA, including the relative stylistic typicality of the questioned Tweet.

5.5.1 Dimension 1: Length

Because we look at the presence/absence of grammatical features in each individual Tweet, rather than their relative frequencies, we have not controlled for text length. We chose to ignore relative frequency because Tweets are too short for relative frequencies to be meaningful. For example, a word that occurs once in a given Tweet should not be assumed to occur once in every Tweet posted on that account, and a word that never occurs in a given Tweet should not be assumed never to occur. This choice, however, means that text length is potentially a confounding variable in our analysis. Fortunately, the first dimension returned by the MCA, which primarily opposes the general absence and presence of features, is very strongly correlated with text length (r = .86), as can be seen in the figure below. Note that the relatively small number of Tweets longer than 30 words reflects the recent increase in the Tweet length limit from 140 to 280 characters.

cor(tweets$WORDCOUNT, coordind$"Dim 1")
## [1] 0.8585055
plot(tweets$WORDCOUNT, coordind$"Dim 1", pch = ".", col = "dodgerblue", cex = 2, 
    xlab = "Word Count", ylab = "Dimension 1")

Furthermore, we found that the other 5 dimensions show weak correlations with text length.

cor(tweets$WORDCOUNT, coordind$"Dim 2")
## [1] -0.1047239
cor(tweets$WORDCOUNT, coordind$"Dim 3")
## [1] -0.1258194
cor(tweets$WORDCOUNT, coordind$"Dim 4")
## [1] -0.1363268
cor(tweets$WORDCOUNT, coordind$"Dim 5")
## [1] 0.1311681
cor(tweets$WORDCOUNT, coordind$"Dim 6")
## [1] 0.06249827

By running an MCA we are therefore able to essentially control for text length, even though we did not measure relative frequencies directly.
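The same check can be run for all dimensions in a single call. The sketch below uses simulated values rather than the real scores; `wordcount` and `scores` are hypothetical stand-ins for `tweets$WORDCOUNT` and `coordind`.

```r
set.seed(7)
n <- 500
# Simulated word counts and dimension scores (hypothetical values)
wordcount <- rpois(n, 20)
scores <- data.frame(
  "Dim 1" = scale(wordcount)[, 1] + rnorm(n, sd = 0.5),
  "Dim 2" = rnorm(n),
  "Dim 3" = rnorm(n),
  check.names = FALSE
)

# Correlation of every dimension with text length in one call
length_cor <- sapply(scores, cor, y = wordcount)
round(length_cor, 2)
```

In this simulation only the first column is constructed to depend on length, so it alone should show a strong correlation.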

5.5.2 Dimension 2: Opinion

We interpreted Dimension 2 as primarily relating to the expression of opinion, with Tweets scoring positively on this dimension tending to contain overt claims related to the judgments and beliefs of the poster on a wide range of topics, including political and social issues. Alternatively, Tweets scoring negatively on this dimension tend to contain ostensibly factual claims, often linking to media reports.

A selection of Tweets assigned especially strong positive scores on Dimension 2 are presented below, all of which clearly express opinions.

When someone can discourage you, you probably aren’t determined enough. Be resolute. That’s what it takes to get things done. (316645737104539648)

Remember, no one ever said success was easy.Good luck doesn’t come overnight.But if u work hard & love it, u will find success & luck. (307195142387929088)

As everybody knows, but the haters & losers refuse to acknowledge, I do not wear a “wig.” My hair may not be perfect but it’s mine. (327077073380331525)

I have always been the same person-remain true to self.The media wants me to change but it would be very dishonest to supporters to do so! (764974286931169285)

Do you ever notice that lightweight @megynkelly constantly goes after me but when I hit back it is totally sexist. She is highly overrated! (646504336940531712)

A selection of Tweets assigned especially strong negative scores on Dimension 2 are presented below, all of which are far less overtly opinionated.

Congratulations to @TrumpDoral’s #BlueMonster course for being named one of the 10 Toughest courses on Tour This Year http://t.co/IanKmlfj8P (522080773776900097)

Join me in congratulating @NASA’s @AstroPeggy by using the hashtag #CongratsPeggy! Earlier today:… https://t.co/AUdJgxXiT4 (856556220911681539)

Congratulations to @TrumpDoral for being named one of @LINKSMagazine’s Great Destinations: http://t.co/2amXtgXTwQ (532552347265007618)

My @gretawire interview re: the dismal job report, getting ripped off by South Korea, 2016 election & #WWEHOF http://t.co/klReS5C1Ox (321322146120998912)

Today in history WrestleMania 23: I shave @VinceMcMahon’s hair–highest rated show in WWE history @WrestleFact http://t.co/7su88r1Mr0 (451080874201579520)

This distinction between opinionated and non-opinionated language is also reflected in the linguistic features most strongly associated with this dimension (i.e. with positive features tending to occur in Tweets that have strong positive scores and negative features tending to occur in Tweets that have strong negative scores).

Features associated with opinionated Tweets include copular verbs, predicative adjectives, amplifiers and other adverbs, exclamation marks, private verbs, stance verbs, subject pronouns, analytic negation, and contrastive conjunctions.

Features associated with non-opinionated Tweets include URLs, hashtags, mentions, auxiliary BE, colons, nominalisations, numerals, and prepositions.
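Lists of this kind can be read off a table of category coordinates by sorting it. The sketch below shows the idea on invented numbers; the category names and coordinates are hypothetical and do not reproduce any values from the analysis, where a table such as `coordvar` would play this role.

```r
# Hypothetical category coordinates on one dimension
coords <- c(FEATURE_A = 0.62, FEATURE_B = -0.81, FEATURE_C = -0.55,
            FEATURE_D = 0.58, FEATURE_E = 0.03, FEATURE_F = -0.47)

# Sort once, then read off both poles
ordered <- sort(coords, decreasing = TRUE)
top_positive <- names(head(ordered, 2))
top_negative <- names(rev(tail(ordered, 2)))

top_positive
top_negative
```

The same ordering applied to the Tweet coordinates would give the Tweets with the strongest positive and negative scores on a dimension.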

5.5.3 Dimension 3: Prediction

We interpreted Dimension 3 as primarily relating to the prediction of future events, with Tweets scoring positively on this dimension tending to contain claims about plans, upcoming events, and the future state of the world and Trump’s role in it. Alternatively, Tweets scoring negatively on this dimension tend to contain comments on the current state of the world.

A selection of Tweets assigned especially strong positive scores on Dimension 3 are presented below, all of which make predictions about future events.

They will soon be calling me MR. BREXIT! (766246213079498752)

Thank you @BillKristol. I am going to Make America Great Again! (636263744247623681)

I will be doing @GMA @GStephanopoulos this morning at around 7:00. Likewise, I will be doing @Morning_Joe at around 7:00. Figure it out! (691934371654270976)

Will be doing @greta interview tomorrow. So much to talk about! (603362659824705537)

Hey @Rosie–how is your recovery going? I hope you are doing well so we can start fighting again soon! (259037716283535364)

A selection of Tweets assigned especially strong negative scores on Dimension 3 are presented below, all of which are commenting on current or past events.

Putin has become a big hero in Russia with an all time high popularity. Obama, on the other hand, has fallen to his lowest ever numbers. SAD (447191138962014208)

Obama’s convention bounce is gone. @MittRomney has retaken the lead in the latest @RasmussenPoll http://t.co/yOw3U3ja (246315320871104512)

Stock Market has increased by 5.2 Trillion dollars since the election on November 8th, a 25% increase. Lowest unemployment in 16 years and.. (918058910673760258)

The Dunes here are amazing, and they’re how I learned about geomorphology, which is the study of movement landforms. We’ve had a great trip (14848387814)

Obama’s foreign policy is a complete and total disaster–the worst President we have ever had. (246246297579958274)

This distinction between predictive and non-predictive language is also reflected in the linguistic features most strongly associated with this dimension.

Features associated with predictive Tweets include the presence of progressives and auxiliary BE (primarily reflecting use of the “BE going to” construction), modals of prediction, time adverbs, passives, exclamation marks, and first person pronouns.

Features associated with non-predictive Tweets include the presence of verbs in the perfect aspect and auxiliary HAVE, past tense, BE as main verb, superlatives, and various forms of determiners.

5.5.4 Dimension 4: Advice

We interpreted Dimension 4 as primarily relating to the giving of advice, with Tweets scoring negatively on this dimension tending to give counsel, guidance, or instruction. Alternatively, Tweets scoring positively on this dimension tend to contain more general claims.

A selection of Tweets assigned especially strong negative scores on Dimension 4 are presented below, all of which give advice.

Entrepreneurs: See yourself as victorious–and the best way to be victorious is to be passionate. Find something you love doing! (468477764757377024)

Be sure to watch Oprah today (4 pm on Channel 7), I’ll be on with my entire family and it will be an entertaining hour..http://bit.ly/f7wNY8 (34616290815512576)

Success tip: See yourself as victorious. This will focus you in the right direction. Apply your skills and talent–and be tenacious. (398184556005556224)

Negotiation tip: Be reasonable & flexible. Being open to change could lead you into a fortunate situation and open the door to innovation. (362215996246851585)

Never allow your attitude to be a liability. Be positive and strong. Set your mind on winning– and keep it there. (431188704380465153)

A selection of Tweets assigned especially strong positive scores on Dimension 4 are presented below, all of which make claims as opposed to giving advice.

@bitchforshort Victim never died so they would not have been given death penalty-but did you see what these “innocent” boys were doing? (326959778163224577)

Congrats to Roger Clemens, he showed great courage. This case never should have been brought to trial. Andy Pettitte did the right thing. (215160398595362816)

Thank you to @DailyTelegraph reviewer @NeilMidgley who stated “‘You’ve Been Trumped’ was so biased in favour of the protesters… (260842555070115841)

Just in, big news- I have been declared the winner of the CNMI Rep Caucus with 72.8% of the vote! Thank you! #SuperTuesday #VoteTrump (709720456194994176)

The U.S. has appealed ro Russia not to intervene in Ukraine - Russia tells U.S. they will not become involved, and then laughs loudly! (439389810269380608)

This distinction between advisory and non-advisory language is also reflected in the linguistic features most strongly associated with this dimension.

Features associated with advisory Tweets include the presence of imperatives, mentions, modals of possibility, capitalisation, stance verbs, and superlatives.

Features associated with non-advisory Tweets include the presence of perfect aspect, auxiliary HAVE, past tense, passives, public verbs, questions, and modals of necessity.

In addition, note that we have labelled this dimension based on the negative pole, which is associated with more advisory Tweets. This is not an issue because the orientation of a dimension returned by an MCA is arbitrary. However, for the sake of consistency and to simplify the visualisations of the dimensions below, we have multiplied the dimension scores by -1 so that more advisory Tweets are assigned positive scores on this dimension.
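That the orientation of a dimension is arbitrary can be checked directly: multiplying the scores by -1 reverses the ordering of the texts but leaves the distances between them unchanged. A minimal sketch on simulated scores (the vector `dim_scores` is hypothetical):

```r
set.seed(3)
# Hypothetical scores on one dimension
dim_scores <- rnorm(100)
flipped <- dim_scores * -1

# Distances between texts are unchanged by the flip
same_distances <- all.equal(as.vector(dist(dim_scores)), as.vector(dist(flipped)))

# The ranking of texts is exactly reversed
reversed_rank <- identical(order(dim_scores), rev(order(flipped)))

same_distances
reversed_rank
```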

5.5.5 Dimension 5: Promotion

We interpreted Dimension 5 as primarily relating to self-promotion, with Tweets scoring positively on this dimension tending to publicise Trump and his campaign. Alternatively, Tweets scoring negatively on this dimension tend to be far less personal, often containing comments on external people and events.

A selection of Tweets assigned especially strong positive scores on Dimension 5 are presented below, all of which are highly promotional.

Thank you Las Vegas, Nevada- I love you! Departing for Greeley, Colorado now. Get out & VOTE! #ICYMI- watch here:… https://t.co/uIloPRtEn9 (792804645542305792)

Thank you Novi, Michigan! Get out and VOTE #TrumpPence16 on 11/8. Together, WE WILL MAKE AMERICA GREAT AGAIN!… https://t.co/n7Gd6Xc18H (781992481160323072)

Join me live- now in Las Vegas, Nevada! We will MAKE AMERICA SAFE & GREAT AGAIN! #VoteTrumpNV #NevadaCaucus https://t.co/IW9s9noxDT (701977550315098113)

Governor @Mike_Pence and I will be in Cleveland, Ohio tomorrow night at 7pm - join us! #MAGATickets:… https://t.co/kfJv5Po0x6 (789657808052621313)

We have many problems in our house (country!), and we need to fix them before we let visitors come over and stay. MAKE AMERICA GREAT AGAIN! (680481995374329856)

A selection of Tweets assigned especially strong negative scores on Dimension 5 are presented below, all of which have a far more impersonal focus.

The liberal media is focusing on @MittRomney’s bank records. How about reviewing @BarackObama’s illegal land deal contracts with Tony Rezko? (222777861831000064)

Army officer who led a sexual abuse prevention unit was just fired after being charged with violently going after his wife.What is going on? (335330801216536576)

How does Obama rationalize giving Iran $8B in sanction relief when a Christian pastor is being tortured in an Iranian prison? (405429450671529984)

How much is New York State spending on that obnoxious T.V. commercial that is being played endlessly for a tax incentive that doesn’t work? (428142410087297024)

Youth unemployment is at a record high. ObamaCare is a job destroyer which is ruining aspiring careers. It must be repealed. (494203189957496832)

This distinction between promotional and non-promotional language is also reflected in the linguistic features most strongly associated with this dimension.

Features associated with promotional Tweets include the presence of first person pronouns, exclamation marks, hashtags, capitalisation, imperatives, interjections, possessive determiners, time adverbials, and subject and object pronouns.

Features associated with non-promotional Tweets include the presence of questions and WH words, passives, third person singular, auxiliary BE, progressives, and nominalisations.

5.5.6 Dimension 6: Critique

Finally, we interpreted Dimension 6 as primarily relating to critical attacks, with Tweets scoring negatively on this dimension tending to contain criticism of various people and policies. Alternatively, Tweets scoring positively on this dimension tend to be more positive or neutral.

A selection of Tweets assigned especially strong negative scores on Dimension 6 are presented below, all of which are highly critical.

N.Y. City is paying FORTY MILLION DOLLARS to five men that many think are guilty as hell. So many facts - should have been trial. Politics! (480668639201198080)

With the $635 million dollar website fiasco, getting caught tapping phones of WORLD LEADERS and so much more, U.S. is looking really stupid! (394120617001492480)

Our GDP has been growing less than 2% for the last 5 years. ObamaCare will slow us down even more. Has to be repealed. (358315180217741312)

Dems have been complaining for months & months about Dir. Comey. Now that he has been fired they PRETEND to be aggrieved. Phony hypocrites! (862387734492663808)

I am extremely pleased to see that @CNN has finally been exposed as #FakeNews and garbage journalism. It’s about time! (881138485905772549)

A selection of Tweets assigned especially strong positive scores on Dimension 6 are presented below, all of which have a far less critical and often motivational style.

Ask: Is there anyone else who can do this better than I can?That’s just another way of saying know yourself & know your competition. (298846966853545984)

The more you know, the more you realize how much you don’t know. How can you possibly discover anything if you already know everything? (483688700447326209)

You’ve got something unique to offer–find out what it is. Ask yourself: What can I provide that does not yet exist? Innovation can follow.. (301801582369071104)

Ask yourself: “What can I learn today that I didn’t know before?” Always be a student, always be open to new ideas. (451820042876116993)

You’ve got something unique to offer. Find out what it is. Ask yourself: What can I provide that does not yet exist? (393406404494520320)

This distinction between critical and non-critical language is also reflected in the linguistic features most strongly associated with this dimension.

Features associated with critical Tweets include the presence of copular verbs and predicative adjectives, amplifiers, exclamation marks, capitalisation, passives, and perfect aspect.

Features associated with non-critical Tweets include the presence of WH forms and questions, modals of possibility, articles, subordinators, and BE as main verb.

In addition, note that we have labelled this dimension based on the negative pole, which is associated with more critical Tweets. As with Dimension 4, we have therefore multiplied the dimension scores by -1 so that more critical Tweets are assigned positive scores on this dimension.

5.6 Analysis of Questioned Tweet

After generating and interpreting the main stylistic dimensions in the @realDonaldTrump Twitter corpus, we were also able to see how the questioned Tweet compares to the style of language usually used on this account.

5.6.1 Visualisation

We plotted the complete set of 21,320 Tweets in the corpus, including the questioned Tweet, along our six dimensions, two dimensions at a time, to visualise the stylistic distribution of Tweets posted on @realDonaldTrump. We also generated histograms for each dimension individually showing the distribution of Tweets.

In both cases, we located the position of the questioned Tweet (which is in the first row of our dataset) along these dimensions, marked with a red circle in the scatterplots and a red line in the histograms, allowing for the typicality of the style of this Tweet to be compared to the other Tweets posted on this account.

Note that Dimensions 4 and 6 have been inverted as discussed above and that Dimension 1 is not especially meaningful, as it primarily reflects Tweet length.

coordind$"Dim 1"[1]
## [1] 0.5472094
coordind$"Dim 2"[1]
## [1] 0.2480119
plot(coordind$"Dim 1", coordind$"Dim 2", pch = ".", col = "dodgerblue", cex = 1.5, 
    xlab = "Length", ylab = "Opinion")
points(coordind$"Dim 1"[1], coordind$"Dim 2"[1], pch = 1, col = "red", cex = 2, 
    lwd = 2)

coordind$"Dim 3"[1]
## [1] -0.4193067
coordind$"Dim 4"[1] * -1
## [1] -0.1254125
plot(coordind$"Dim 3", coordind$"Dim 4" * -1, pch = ".", col = "dodgerblue", 
    cex = 1.5, xlab = "Prediction", ylab = "Advice")
points(coordind$"Dim 3"[1], coordind$"Dim 4"[1] * -1, pch = 1, col = "red", 
    cex = 2, lwd = 2)

coordind$"Dim 5"[1]
## [1] 0.1177956
coordind$"Dim 6"[1] * -1
## [1] 0.3366119
plot(coordind$"Dim 5", coordind$"Dim 6" * -1, pch = ".", col = "dodgerblue", 
    cex = 1.5, xlab = "Promotion", ylab = "Critique")
points(coordind$"Dim 5"[1], coordind$"Dim 6"[1] * -1, pch = 1, col = "red", 
    cex = 2, lwd = 2)

hist(coordind$"Dim 1", col = "dodgerblue", xlab = "Length", main = "")
abline(v = coordind$"Dim 1"[1], col = "red", lwd = 2)

hist(coordind$"Dim 2", col = "dodgerblue", xlab = "Opinion", main = "")
abline(v = coordind$"Dim 2"[1], col = "red", lwd = 2)

hist(coordind$"Dim 3", col = "dodgerblue", xlab = "Prediction", main = "")
abline(v = coordind$"Dim 3"[1], col = "red", lwd = 2)

hist(coordind$"Dim 4" * -1, col = "dodgerblue", xlab = "Advice", main = "")
abline(v = coordind$"Dim 4"[1] * -1, col = "red", lwd = 2)

hist(coordind$"Dim 5", col = "dodgerblue", xlab = "Promotion", main = "")
abline(v = coordind$"Dim 5"[1], col = "red", lwd = 2)

hist(coordind$"Dim 6" * -1, col = "dodgerblue", xlab = "Critique", main = "")
abline(v = coordind$"Dim 6"[1] * -1, col = "red", lwd = 2)
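One possible numerical complement to these plots, which is not part of the analysis reported here, is to express the position of a text on a dimension as an empirical percentile. The sketch below uses simulated scores; the vector `scores` and the value assigned to its first element are hypothetical.

```r
set.seed(11)
# Hypothetical dimension scores; the first element plays the questioned Tweet
scores <- c(-1.5, rnorm(999))

# Share of texts scoring at or below the first text
percentile <- 100 * mean(scores <= scores[1])

# Two-sided share of texts at least as far from the corpus median
centre <- median(scores)
extremeness <- 100 * mean(abs(scores - centre) >= abs(scores[1] - centre))

percentile
extremeness
```

A small value of `extremeness` would indicate that few texts lie as far from the centre of the distribution as the text of interest, in either direction.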

5.6.2 Conclusion

Overall, our analysis shows that the style of the questioned Tweet is relatively distinct from the general posting style that characterises the @realDonaldTrump account. Specifically, we find that the questioned Tweet is written in a less predictive and more critical style than most of these Tweets. We also find that the questioned Tweet contains a number of grammatical forms that are common in general but relatively uncommon in the Trump Twitter corpus, including most notably “had to”, “those” as a determiner, existential “there”, and “because”. We therefore conclude that the questioned Tweet was written in a relatively atypical style for the @realDonaldTrump account.

6 Change over Time

We also measured variation in each of our five main dimensions (Dimensions 2 to 6) across all days in the corpus from 2009 to 2017 to see how the style of Tweets posted on the @realDonaldTrump Twitter account changed over time. To simplify the visualisation of these trends, we calculated the daily average dimension scores and the 90 day moving average dimension scores for each of these dimensions.

Our results are plotted in five figures below, where the dots in the background represent the daily averages (note a small number of daily averages fall outside of the graphing window) and the superimposed line tracks the 90 day moving average. In addition, grey vertical lines mark the occurrence of several important events in the Donald Trump timeline, which in numerous cases appear to correspond to inflection points in these moving averages.
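The two smoothing steps, daily averaging followed by a centred moving average, can be sketched in base R without additional packages. This is an illustration on simulated data under our own assumptions, not the function used to produce the figures; the data frame `toy` and its columns are hypothetical, and the weights implement the usual centred moving average of even order, with half weights at the two ends of the window.

```r
set.seed(5)
# Hypothetical posting dates and dimension scores
dates <- as.Date("2016-01-01") + sample(0:199, 1000, replace = TRUE)
toy <- data.frame(date = dates, score = rnorm(1000, mean = 0.05))

# Daily average score
daily <- aggregate(score ~ date, data = toy, FUN = mean)

# Centred moving average of even order: half weights at both ends
win <- 90
w <- c(0.5, rep(1, win - 1), 0.5) / win
daily$smooth <- as.numeric(stats::filter(daily$score, w, sides = 2))

head(daily[!is.na(daily$smooth), ])
```

The moving average is undefined for the first and last 45 daily values, and the window counts days on which at least one text was posted rather than calendar days.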

First, we load the forecast library that we use to calculate moving averages.

library(forecast)

We then define a time series graphing function.

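tsgraph <- function(dim, win, lab, colo, miny, maxy) {
    # Pair each Tweet's date (column x) with its dimension score (column y)
    ts <- merge(as.Date(tweets$DATE), dim, by = "row.names")
    # Average the scores of all Tweets posted on the same day
    ts <- aggregate(y ~ x, ts, mean)
    # Centred moving average over `win` consecutive daily averages
    sts <- ts
    for (i in 1:ncol(ts)) {
        sts[, i] <- ma(ts[, i], order = win, centre = TRUE)
    }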
    par(mar = c(4, 4, 8, 2))
    plot(ts$x, ts$y, type = "n", ylim = c(miny, maxy), ylab = lab, xlab = "DATE", 
        axes = FALSE, frame.plot = TRUE)
    axis(side = 1, tick = TRUE, labels = FALSE, at = c(as.Date("2010-01-01"), 
        as.Date("2011-01-01"), as.Date("2012-01-01"), as.Date("2013-01-01"), 
        as.Date("2014-01-01"), as.Date("2015-01-01"), as.Date("2016-01-01"), 
        as.Date("2017-01-01"), as.Date("2018-01-01")))
    axis(side = 2, at = c(-0.1, 0, 0.1), labels = c("-0.1", "0", "+0.1"))
    abline(v = as.Date("2011-04-30"), col = "grey", lwd = 0.6)  # Birther Attacks
    abline(v = as.Date("2012-11-06"), col = "grey", lwd = 0.6)  # 2012 Election
    abline(v = as.Date("2015-06-16"), col = "grey", lwd = 0.6)  # Announces
    abline(v = as.Date("2016-02-01"), col = "grey", lwd = 0.6)  # Iowa
    abline(v = as.Date("2016-05-03"), col = "grey", lwd = 0.6)  # Presumptive Nominee
    abline(v = as.Date("2016-08-17"), col = "grey", lwd = 0.6)  # Bannon
    abline(v = as.Date("2016-11-08"), col = "grey", lwd = 0.6)  # Election
    abline(v = as.Date("2017-01-20"), col = "grey", lwd = 0.6)  # Inauguration
    abline(a = 0, b = 0, lty = 1, lwd = 0.5, col = "black")
    points(ts$x, ts$y, pch = 16, cex = 0.7, col = adjustcolor(colo, alpha.f = 0.2))
    lines(ts$x, sts$y, col = colo, lwd = 1.5)
    axis(side = 3, tick = TRUE, las = 2, cex.axis = 1, outer = FALSE, col.ticks = "grey", 
        col.axis = "gray50", col = NA, at = c(as.Date("2011-04-30"), as.Date("2012-11-06"), 
            as.Date("2015-06-16"), as.Date("2016-02-01"), as.Date("2016-05-03"), 
            as.Date("2016-08-17"), as.Date("2016-11-08"), as.Date("2017-01-20")), 
        labels = c("Birther", "Election", "Declares", "Iowa", "Nominee", "Bannon", 
            "Election", "Inauguration"))
    axis(side = 1, tick = TRUE, cex.axis = 1, at = seq(as.Date("2010-01-01"), 
        as.Date("2018-01-01"), by = "year"), labels = as.character(2010:2018))
    
}
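
The smoothed line drawn by the function above comes from a centred moving average. As a minimal sketch, assuming that ma is the function of that name from the forecast package, the same kind of smoother can be written in base R as follows. The helper centred_ma is hypothetical and is not part of our analysis code.

```r
# Centred moving average in base R.
# centred_ma is a hypothetical helper for illustration only.
centred_ma <- function(x, order) {
  if (order %% 2 == 0) {
    # Even window: give the two end points half weight so that the
    # window stays symmetric around the current observation.
    w <- c(0.5, rep(1, order - 1), 0.5) / order
  } else {
    # Odd window: plain equal weights.
    w <- rep(1 / order, order)
  }
  # sides = 2 centres the filter; positions without a full window are NA.
  as.numeric(stats::filter(x, w, sides = 2))
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
sm3 <- centred_ma(x, 3)  # odd window
sm4 <- centred_ma(x, 4)  # even window
print(sm3)
print(sm4)
```

Because the window has to fit entirely inside the series, the smoothed values are missing at both ends. With a window of 90, as used in the calls below, the smoothed line therefore starts and stops short of the raw points.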

Finally, we plot the five time series.

tsgraph(coordind$"Dim 2", 90, "OPINION", "red", min = -0.13, max = +0.13)

tsgraph(coordind$"Dim 3", 90, "PREDICTION", "dodgerblue", min = -0.13, max = +0.13)

tsgraph(coordind$"Dim 4" * -1, 90, "ADVICE", "darkviolet", min = -0.13, max = +0.13)

tsgraph(coordind$"Dim 5", 90, "PROMOTION", "forestgreen", min = -0.13, max = +0.13)

tsgraph(coordind$"Dim 6" * -1, 90, "CRITIQUE", "darkorange", min = -0.13, max = +0.13)

Note that this stage of the analysis was added on 2017/12/11.

7 Direction for Future Research

The main direction we plan to pursue in future research is the effect of the posting device (e.g. Android or iPhone) on our stylistic dimensions, especially given the open questions about authorship. We will also further refine our interpretations of the 6 stylistic dimensions extracted for this analysis and possibly extract additional dimensions.

8 Appendix: Linguistic Features

| CODE | DEFINITION | EXAMPLE |
|---|---|---|
| ACRONYM | Refers to any initials separated by full stops | U.S.A, U.S, N.Y. |
| AMPLIFIER | Refers to adverbs used to intensify the verb/adjective | very, absolutely, so |
| ANALNEG | Refers to ‘not’ plus contracted forms | can’t, cannot, not |
| ATTRIBADJ | Attributive adjectives: adjectives that come before the noun and any other adjective not tagged as predicative | The big cat |
| AUXBE | Auxiliary be | She was killed |
| AUXHAVE | Auxiliary have | She has spent all her money |
| BEMV | BE as a main verb | She was a beautiful girl |
| BRACKET | Brackets | ( ) |
| CAPS | Capitalisation: two or more capital letters that are not tagged as an acronym | MAKE AMERICA GREAT AGAIN |
| CCONJ | Coordinating conjunctions | and, & |
| CNTRSTCONJ | Contrastive conjunctions | but, by contrast |
| COLON | Colon | : |
| COMMA | Comma | , |
| COPVB | Copular verbs: verbs which are followed by a predicative | She looked beautiful, She was beautiful |
| DEFART | Definite article | the |
| DETQUAN | Quantifiers used as determiners | She ate several sandwiches |
| EXCLAM | Exclamation mark | ! |
| FSTPP | First person | I, We, us |
| FULSTOP | Full stop | . |
| GERUND | Prepositional complement | Sarah talked about leaving her job |
| HASHTAG | Hashtag | # |
| HAVEMV | HAVE as main verb | He had a wonderful smile |
| IMPERATIVE | Clauses in imperative mood | Go away!, Don’t be foolish! |
| INDEFART | Indefinite article | a, an |
| INFINITIVE | Infinitive | to be, to go, to have, to say |
| ING | Verb in -ing form | Having, going, seeing |
| IT | The pronoun IT plus contractions | It, it’s |
| MDNEC | Necessity modals | should, mustn’t, ought |
| MDPOSS | Possibility modals | can, may, mightn’t |
| MDPRED | Prediction modals | will, shall, I’ll |
| MENTION | Mentioning through the @ symbol | @username |
| MULTIWVB | Multi-word verb: particle verbs and prepositional verbs | look up, look forward |
| NOMIN | Nominalisations | action, statement |
| NUMDET | Numerals used as determiners | 3 mice, four children |
| NUMNOUN | Numerals used as nouns | He gave me seven. |
| OBJPRO | Object pronouns | them, him, us |
| OTHERADV | Other adverbs that are not tagged as amplifiers, downtoners, time and place adverbials, etc. | She danced beautifully |
| OTHRINTJ | Other interjections that are not tagged as laughter, positive interjection ‘Yes’, negative interjections ‘No’ | Wow, OMG, Congrats!, well |
| OTHRNOUN | Other nouns that are not tagged as numeral, quantifiers, nominalisations, ordinals | tea, book |
| OTHRSUB | Other subordinators that are not tagged as time, place, cause, concessive, and conditional subordinators | whereas, like |
| OTHRVERB | Other verbs that are not tagged as private verbs, public verbs, verb-ing, past tense verbs, participle verbs, third person singular, suasive verbs, perception verbs, copular verbs, be as main verb, auxiliary be, auxiliary have, pro-verb do, auxiliary do, stance verb | |
| PASSIVE | Passive constructions | Sally was prosecuted |
| PAST | Past tense verbs | went, saved, held |
| PERCEPTVB | Verbs of perception | hear, smell, taste |
| PERFECT | Perfect aspect | She had been to the shops already. |
| POSESPRPN | Possessive proper noun | Barack Obama’s cat |
| POSSDET | Possessive determiner | my cat, his dog, her lips, their sandwiches |
| PREDADJ | Predicative adjectives: adjectives that come after copular verbs | She was ugly, She looked stressed |
| PREP | Preposition | down the road, in your car |
| PROGRESSIVE | Progressive aspect | I am going, she was talking, he is thinking |
| PROQUAN | Quantitative pronoun | Everyone, everything, anyone, anything |
| PROVDO | Pro-verb do | He had done it also, She did it better. |
| PRPN | Proper nouns | Donald Trump, London, New York |
| PRVV | Private verbs | think, wonder, anticipate |
| PUBV | Public verbs | stated, said, shouted |
| QUES | Question mark | ? |
| RELCLAUSESUBGAP | Relative clause on subject position | A girl that likes cheese, the house that was haunted |
| SINFLECT | Third person singular verbs | talks, likes, thinks |
| SNDPP | Second person | you, you’re, your |
| STANCEVB | Stance verbs | want, seem, appear, need, like, love, hate |
| SUBJPRO | Subject pronoun | I, he, she, we, they |
| SUPERLATIVE | Superlative | best, greatest, happiest, slowest, oldest |
| TIMEADV | Time adverbial | last night, yesterday, afterwards, again |
| URL | URLs - can be images, videos, gifs, links… | |
| WHW | WH-words that are not tagged as adverbs/subordinators | who, how, what, when |
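
Several of the features listed above are surface-level and can be checked with simple pattern matching. The sketch below is an illustration only. The patterns vector and its regular expressions are hypothetical and are not the tagger used for our analysis, and recording each feature as present or absent is likewise an assumption made for the example. It is applied to the questioned Tweet quoted in the Background.

```r
# The questioned Tweet, as quoted in the Background section.
tweet <- paste(
  "I had to fire General Flynn because he lied to the Vice President and the FBI.",
  "He has pled guilty to those lies.",
  "It is a shame because his actions during the transition were lawful.",
  "There was nothing to hide!"
)

# Hypothetical regular expressions for a few punctuation and symbol features.
# Grammatical features (e.g. PASSIVE, PERFECT) need a tagger and are not covered.
patterns <- c(
  EXCLAM  = "!",
  QUES    = "\\?",
  HASHTAG = "#",
  MENTION = "@",
  COLON   = ":",
  COMMA   = ",",
  FULSTOP = "\\."
)

# TRUE if the feature occurs at least once in the Tweet, FALSE otherwise.
present <- vapply(patterns, function(p) grepl(p, tweet), logical(1))
print(present)
```

For this Tweet only EXCLAM and FULSTOP come out as present among the seven features checked.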