Data Source and wrangling

Oil-related data was downloaded from the website of the U.S. Energy Information Administration (EIA):

President Donald Trump's tweets from @realDonaldTrump were obtained from an open repository at the following address; they can also be downloaded in JSON format from the following GitHub repositories

All the data wrangling code can be found in the following GitHub repository

Oil data and Donald Trump's tweets were joined and grouped by day. Tweets were searched for the following oil-related words or combinations of words:

The result is a unified data set of 3684 observations and 20 variables
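
The wrangling code itself lives in the repository, so the following is only a minimal base R sketch of the keyword search and daily grouping described above. The tweets, the prices and the keyword list (`keywords`) are made up for illustration; the actual list of oil-related words is not reproduced here.

```r
# Toy tweets: text and dates are invented for illustration only.
tweets <- data.frame(
  date = as.Date(c("2018-04-20", "2018-04-20", "2018-04-21", "2018-04-23")),
  text = c("Crude OIL is too expensive",
           "Big rally tonight",
           "Travelling today",
           "Gasoline should be cheaper"),
  stringsAsFactors = FALSE
)

# Hypothetical keyword list (an assumption, not the author's list).
keywords <- c("oil", "opec", "gasoline", "crude")
pattern  <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")

# Flag each tweet, then aggregate to one row per day.
tweets$n      <- 1L
tweets$is_oil <- grepl(pattern, tweets$text, ignore.case = TRUE, perl = TRUE)

daily <- aggregate(tweets[c("n", "is_oil")],
                   by = list(date = tweets$date), FUN = sum)
names(daily) <- c("date", "Tweet_volume", "oil_word_count")
daily$oil_keyword <- daily$oil_word_count > 0

# Placeholder prices; days without a quote (e.g. weekends) become NA after the join.
prices <- data.frame(date  = as.Date(c("2018-04-20", "2018-04-23")),
                     Brent = c(70, 71))
joined <- merge(daily, prices, by = "date", all = TRUE)
print(joined)
```

Keeping every calendar day in the join (`all = TRUE`) is one plausible reason why the price columns contain missing values, which would explain the "Removed ... rows" warnings further down; that link is an assumption.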

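library(dplyr)      # %>%, mutate(), group_by(), summarise(), case_when()
library(ggplot2)    # plots
library(lubridate)  # year(); assumed to be the source of year() used below

load("trump_oil_all.RData")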

dim(trump_oil)
## [1] 3684   20
names(trump_oil)
##  [1] "date"           "oil_word_count" "oil_keyword"    "Tweet_volume"  
##  [5] "RT"             "Brent"          "WTI"            "Gas"           
##  [9] "Exxon"          "Chevron"        "Conoco"         "Shell"         
## [13] "BP"             "Eni"            "Total"          "Slb"           
## [17] "Hall"           "Baker"          "LAG"            "Period"

Data codebook

Descriptions of the variables, units of measurement, data categories and data sources can be found in the tables below:

Codebook Variable Description

Variable Name    Description
date             Date of the observation (date variable, format year-month-day)
oil_word_count   Volume of tweets that contain oil-related words (numeric)
oil_keyword      Logical variable that identifies if the observation is oil related (Boolean)
Tweet_volume     Total number of tweets made by Donald Trump on a single day (numeric)
RT               Number of retweets (numeric)
Brent            Brent spot oil price (numeric, in USD per barrel)
WTI              WTI spot oil price (numeric, in USD per barrel)
Gas              Henry Hub spot gas price (numeric, in USD per million Btu, MMBtu)
Exxon            Exxon Mobil daily opening stock price, NYSE (numeric, in USD)
Chevron          Chevron daily opening stock price, NYSE (numeric, in USD)
Conoco           ConocoPhillips daily opening stock price, NYSE (numeric, in USD)
Shell            Royal Dutch Shell daily opening stock price, NYSE (numeric, in USD)
BP               BP daily opening stock price, NYSE (numeric, in USD)
Eni              Eni daily opening stock price, NYSE (numeric, in USD)
Total            Total daily opening stock price, NYSE (numeric, in USD)
Slb              Schlumberger daily opening stock price, NYSE (numeric, in USD)
Hall             Halliburton daily opening stock price, NYSE (numeric, in USD)
Baker            Baker Hughes daily opening stock price, NYSE (numeric, in USD)
LAG              Position of a day relative to the nearest day with an oil tweet, from Seven_before to Seven, or No_oil_Tweet (categorical)
Period           Identifies whether the observation is dated earlier than 2016 ("Before") or not ("After") (categorical)
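
The codebook does not spell out how LAG is derived, so the sketch below is only one plausible reading. It assumes that a day within seven days after the most recent oil tweet gets a positive lag, that a day with no recent tweet but within seven days before the next one gets a negative lag, and that everything else counts as "No_oil_Tweet" (returned as NA here). The function name `signed_lag` and the precedence rule are assumptions, not the author's code.

```r
# Hypothetical helper: signed distance in days to an oil-tweet day.
#   dates  : Date vector, one entry per day
#   is_oil : logical vector, TRUE on days with an oil tweet
#   window : maximum distance (in days) that still counts as "near"
signed_lag <- function(dates, is_oil, window = 7) {
  oil_days <- dates[is_oil]
  vapply(seq_along(dates), function(i) {
    diffs <- as.numeric(dates[i] - oil_days)  # >= 0: tweet on or before this day
    since <- diffs[diffs >= 0]                # days since earlier tweets
    until <- -diffs[diffs < 0]                # days until later tweets
    if (length(since) > 0 && min(since) <= window) return(min(since))
    if (length(until) > 0 && min(until) <= window) return(-min(until))
    NA_real_                                  # no oil tweet within the window
  }, numeric(1))
}

# Toy calendar with a single oil tweet on the tenth day.
dates  <- as.Date("2019-01-01") + 0:19
is_oil <- seq_along(dates) == 10
lags   <- signed_lag(dates, is_oil)
print(data.frame(date = dates, lag = lags))
```

Giving precedence to the days after a tweet would be in line with the group sizes reported later, where the "after" lags hold far more observations than the "before" lags, but the repository remains the authoritative reference.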

Data exploration

Oil tweets and year distribution

One of the first aspects to look at is the distribution of oil-related tweets over time.

trump_oil %>% mutate(Year=year(date)) %>%
        group_by(Year) %>%
        summarise(Tweet_Vol=sum(Tweet_volume,na.rm=TRUE),
                  Oil_Tweets=sum(oil_keyword==TRUE,na.rm=TRUE),
                  percentage_Oil_Tweets=(sum(oil_keyword==TRUE,na.rm=TRUE)/sum(Tweet_volume,na.rm = TRUE))*100) %>%
        knitr::kable()

Year  Tweet_Vol  Oil_Tweets  percentage_Oil_Tweets
2009         56           0               0.000000
2010        142           8               5.633803
2011        774          51               6.589147
2012       3531         114               3.228547
2013       8144         119               1.461198
2014       5784          91               1.573306
2015       7536          95               1.260616
2016       4225          47               1.112426
2017       2605          97               3.723608
2018       3510         146               4.159544
2019       2146          75               3.494874

trump_oil %>% mutate(Year=year(date)) %>%
        group_by(Year) %>%
        summarise(Tweet_Vol=sum(Tweet_volume,na.rm=TRUE),
                  Oil_Tweets=sum(oil_keyword==TRUE,na.rm=TRUE),
                  percentage_Oil_Tweets=(sum(oil_keyword==TRUE,na.rm=TRUE)/sum(Tweet_volume,na.rm = TRUE))*100) %>%
        ggplot(.,aes(y=Oil_Tweets,x=as.factor(Year)))+
        geom_bar(stat = "identity",fill="red",alpha=0.4,color="blue")+  # alpha must be numeric
        labs(y="Oil related Tweets",
             title = "Trump Oil related Tweets over time",
             x="Year")+
        ggthemes::theme_tufte()

trump_oil %>% mutate(Year=year(date)) %>%
        group_by(Year) %>%
        summarise(Tweet_Vol=sum(Tweet_volume,na.rm=TRUE),
                  Oil_Tweets=sum(oil_keyword==TRUE,na.rm=TRUE),
                  percentage_Oil_Tweets=(sum(oil_keyword==TRUE,na.rm=TRUE)/sum(Tweet_volume,na.rm = TRUE))*100) %>%
        ggplot(.,aes(y=percentage_Oil_Tweets,x=as.factor(Year)))+
        geom_bar(stat = "identity",fill="blue",alpha=0.4,color="red")+  # alpha must be numeric
        labs(y="Oil related Tweets percentage",
             title = "Percentage of Trump Oil related Tweets over time",
             x="Year")+
        ggthemes::theme_tufte()

Looking at the raw counts, oil-related tweets rose sharply from 8 in 2010 to 119 in 2013 and peaked at 146 in 2018. Normalizing by the total volume of tweets changes the picture: the share of oil-related tweets fell from 6.6% in 2011 to 1.1% in 2016, and then recovered to between 3.5% and 4.2% in 2017-2019. Note that Oil_Tweets sums the daily oil_keyword flag, so it counts days with an oil tweet, while Tweet_Vol counts individual tweets.
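
As a quick check of the normalization, the percentages can be recomputed from the yearly totals printed in the table above. This sketch only re-types those printed numbers; the column name `pct` is introduced here for illustration.

```r
# Yearly totals copied from the summary table above.
yearly <- data.frame(
  Year       = 2009:2019,
  Tweet_Vol  = c(56, 142, 774, 3531, 8144, 5784, 7536, 4225, 2605, 3510, 2146),
  Oil_Tweets = c(0, 8, 51, 114, 119, 91, 95, 47, 97, 146, 75)
)

# Share of oil-related observations per 100 tweets.
yearly$pct <- 100 * yearly$Oil_Tweets / yearly$Tweet_Vol

# Range and mean of the share over the most recent three years.
recent <- yearly$Year >= 2017
print(range(yearly$pct[recent]))
print(mean(yearly$pct[recent]))
```

Because the numerator is a count of flagged days and the denominator a count of tweets, the ratio is best read as an index of how often oil comes up rather than as an exact share of tweets.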

Oil tweets and price

A fundamental question in this analysis is whether prices differ significantly on days when Donald Trump tweets about oil. This question can be addressed with a simple two-sample t-test, run separately for each period.

#t-test oil word vs brent oil price Before presidency

trump_oil %>% filter(Period=="Before") %>% 
        t.test(Brent~oil_keyword,data = .)
## 
##  Welch Two Sample t-test
## 
## data:  Brent by oil_keyword
## t = -6.5226, df = 675.13, p-value = 1.355e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.335166  -6.089766
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            88.98724            97.69971
#t-test oil word vs brent oil price after presidency

trump_oil %>% filter(Period=="After") %>% 
        t.test(Brent~oil_keyword,data = .)
## 
##  Welch Two Sample t-test
## 
## data:  Brent by oil_keyword
## t = -7.1919, df = 476.4, p-value = 2.492e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.354810 -4.769119
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            55.64804            62.21000
trump_oil %>% ggplot(.,aes(y=Brent,x=oil_keyword,fill=oil_keyword))+geom_boxplot(alpha=0.5)+
        labs(title = "Brent Oil Prices vs Trump's Oil tweets",
             x= "Oil Tweet",
             y="Brent spot Oil Price (USD per barrel)")+
        facet_grid(. ~ Period)+
        ggthemes::theme_tufte()
## Warning: Removed 1138 rows containing non-finite values (stat_boxplot).

#t-test oil word vs WTI oil price Before presidency

trump_oil %>% filter(Period=="Before") %>% 
        t.test(WTI~oil_keyword,data = .)
## 
##  Welch Two Sample t-test
## 
## data:  WTI by oil_keyword
## t = -4.9457, df = 646.53, p-value = 9.684e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.643486 -3.298911
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            81.91073            87.38193
#t-test oil word vs WTI oil price After presidency

trump_oil %>% filter(Period=="After") %>% 
        t.test(WTI~oil_keyword,data = .)
## 
##  Welch Two Sample t-test
## 
## data:  WTI by oil_keyword
## t = -6.3649, df = 445.55, p-value = 4.862e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.411631 -3.386315
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            52.26750            57.16647
trump_oil %>% ggplot(.,aes(y=WTI,x=oil_keyword,fill=oil_keyword))+geom_boxplot(alpha=0.5)+
        scale_fill_manual(values=c("blue","red"))+
        labs(title = "WTI Oil Prices vs Trump's Oil tweets",
             x= "Oil Tweet",
             y="WTI spot Oil Price (USD per barrel)")+
        facet_grid(. ~ Period)+
        ggthemes::theme_tufte()
## Warning: Removed 1150 rows containing non-finite values (stat_boxplot).

#t-test oil word vs Henry Hub gas price Before presidency

trump_oil %>% filter(Period=="Before") %>% 
        t.test(Gas~oil_keyword,data = .)
## 
##  Welch Two Sample t-test
## 
## data:  Gas by oil_keyword
## t = 8.733, df = 737.43, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3234874 0.5111040
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            3.755874            3.338578
#t-test oil word vs Henry Hub gas price  After presidency

trump_oil %>% filter(Period=="After") %>% 
        t.test(Gas~oil_keyword,data = .)
## 
##  Welch Two Sample t-test
## 
## data:  Gas by oil_keyword
## t = -3.4443, df = 465.04, p-value = 0.0006244
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2111928 -0.0577519
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            2.836565            2.971037
trump_oil %>% ggplot(.,aes(y=Gas,x=oil_keyword,fill=oil_keyword))+geom_boxplot(alpha=0.5)+
        scale_fill_manual(values=c("Darkblue","darkred"))+
        labs(title = "Henry Hub Gas Prices vs Trump's Oil tweets",
             x= "Oil Tweet",
             y="Henry Hub spot Gas Prices (USD per MMBtu)")+
        facet_grid(. ~ Period)+
        ggthemes::theme_tufte()
## Warning: Removed 1135 rows containing non-finite values (stat_boxplot).

For both Brent and WTI there is a statistically significant difference in price depending on whether Trump tweets about oil: average prices are lower on days without an oil tweet, and this holds both before and after he became president. Henry Hub gas prices also differ significantly with Trump's oil tweets. Before he became president, however, and in contrast with oil, gas prices were higher on days without an oil tweet (3.76 vs 3.34) and lower on oil-tweet days. This difference reversed after he became president, when gas prices were slightly higher on oil-tweet days (2.97 vs 2.84).
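
To make the "Welch Two Sample t-test" outputs above easier to read, here is a minimal sketch that computes the statistic, the degrees of freedom and the p-value by hand and compares them with `t.test()`. The data is synthetic (drawn with `rnorm()`), not the study data, and the helper name `welch` is introduced here for illustration.

```r
set.seed(1)

# Synthetic prices: 200 days without an oil tweet, 80 days with one.
price       <- c(rnorm(200, mean = 56, sd = 13), rnorm(80, mean = 62, sd = 12))
oil_keyword <- rep(c(FALSE, TRUE), times = c(200, 80))

# Welch's t-test by hand: the two group variances are not pooled.
welch <- function(x, y) {
  vx <- var(x) / length(x)                  # squared standard error, group x
  vy <- var(y) / length(y)                  # squared standard error, group y
  t  <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 /
    (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))  # Welch-Satterthwaite
  c(t = t, df = df, p.value = 2 * pt(-abs(t), df))
}

by_hand <- welch(price[!oil_keyword], price[oil_keyword])
builtin <- t.test(price ~ oil_keyword)

print(by_hand)
print(c(builtin$statistic, builtin$parameter, p.value = builtin$p.value))
```

With the formula interface the difference is taken as group FALSE minus group TRUE, so a negative t value, as in the Brent and WTI outputs, means the average price is higher on oil-tweet days.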

Oil tweets and price distribution

It is also interesting to evaluate whether President Trump's oil tweeting patterns are related to the price level, that is, whether he tends to tweet about oil when prices are higher or lower.
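
The analysis below relies on `cut(x, 3)`, which splits the observed range of `x` into three equal-width intervals, and on `table()` to cross-tabulate those intervals with the oil flag. A toy illustration with made-up prices (not the study data):

```r
# Made-up prices and oil-tweet flags, for illustration only.
brent       <- c(30, 45, 58, 62, 75, 90, 100, 110, 125)
oil_keyword <- c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

# Three equal-width bins spanning the range of the data.
price_range <- cut(brent, 3)

# Rows are price bins, columns are FALSE/TRUE for the oil flag.
counts <- table(price_range, oil_keyword)
print(counts)

# Number of oil-tweet days per price bin.
oil_by_bin <- counts[, "TRUE"]
print(oil_by_bin)
```

Equal-width bins do not hold equal numbers of days, which is why the comparison that follows is made against the counts expected under independence rather than against a flat distribution.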

data.frame(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword)) %>%
        filter(Var2==TRUE) %>%
        select(.,-Var2) %>%
        dplyr::rename(.,Price_range=Var1,Tweets=Freq) %>%
        knitr::kable()

Price_range   Tweets
(25.9,60.1]      157
(60.1,94.1]      196
(94.1,128]       306

data.frame(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword)) %>%
        filter(Var2==TRUE) %>%
        select(.,-Var2) %>%
        dplyr::rename(.,Price_range=Var1,Tweets=Freq) %>% 
        ggplot(aes(x=Price_range,y=Tweets,fill=Price_range))+geom_bar(stat="identity")+
        labs(y="Trump Oil Tweets",
             title = "Trump Tweets distribution by price oil range",
             x="Oil price Ranges")+ggthemes::theme_tufte()

Higher price ranges appear to attract more oil-related tweets. To test whether Donald Trump's oil tweets depend on price, we use a chi-square test of independence.

chisq.test(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword))
## 
##  Pearson's Chi-squared test
## 
## data:  table(cut(trump_oil$Brent, 3), trump_oil$oil_keyword)
## X-squared = 37.005, df = 2, p-value = 9.215e-09
data.frame(chisq.test(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword))$expected) %>% 
        select(.,TRUE.) %>% mutate(Price_range=rownames(.)) %>% dplyr::rename(.,Chisq_Expected_Tweets=TRUE.) %>%
        select(.,Price_range,Chisq_Expected_Tweets) %>% knitr::kable()

Price_range   Chisq_Expected_Tweets
(25.9,60.1]                185.8452
(60.1,94.1]                231.9183
(94.1,128]                 241.2364

data.frame(chisq.test(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword))$expected) %>% 
        select(.,TRUE.) %>% mutate(Price_range=rownames(.)) %>% dplyr::rename(.,Tweets=TRUE.) %>%
        select(.,Price_range,Tweets) %>%
        ggplot(aes(x=Price_range,y=Tweets,fill=Price_range))+geom_bar(stat="identity")+
        labs(y="Expected Tweets by Chi-square Dist.",
             title = "Chi-square expected tweets distribution by price oil range",
             x="Oil price Ranges")+ggthemes::theme_tufte()

The chi-square test rejects independence between price range and oil tweets (p-value = 9.2e-09): the observed distribution of tweets differs significantly from the one expected under independence. Donald Trump has more oil-related tweets than expected in the highest price bin (306 observed vs about 241 expected) and fewer than expected in the two lower bins. This is consistent with his oil tweets being a reaction to high oil prices, although the test by itself does not establish the direction of the relationship.
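
The expected counts under independence are simply row total times column total divided by the grand total. The sketch below rebuilds the contingency table to show this. Only the TRUE column (157, 196, 306) is printed in this document; the FALSE column is inferred here from the printed expected values and from the 1138 rows without a Brent price reported in the boxplot warning (3684 - 1138 = 2546 rows), so it should be treated as a reconstruction rather than as data read from the file.

```r
# Contingency table: TRUE column from the output above, FALSE column inferred.
observed <- matrix(
  c(561, 157,
    700, 196,
    626, 306),
  ncol = 2, byrow = TRUE,
  dimnames = list(Price_range = c("(25.9,60.1]", "(60.1,94.1]", "(94.1,128]"),
                  oil_keyword = c("FALSE", "TRUE"))
)

# Expected count in each cell: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
x_squared <- sum((observed - expected)^2 / expected)

test <- chisq.test(observed)
print(round(expected, 4))
print(c(by_hand = x_squared, builtin = unname(test$statistic)))
```

The TRUE column of `expected` should agree with the Chisq_Expected_Tweets table, and the statistic with the X-squared value reported by `chisq.test()` above, which is what supports the inferred FALSE counts.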

Oil prices and their behavior before and after an oil tweet

So far, high oil prices and oil-related tweets appear to be correlated. Beyond this correlation, it is interesting to evaluate how prices behave before and after an oil-related tweet, and to analyze the effect these tweets might have on the market for these commodities. The evaluation is done over two different periods: before and after the year in which Donald Trump became president.

trump_oil %>% 
        filter(Period=="Before") %>%
        group_by(LAG) %>% 
        summarise(n(),BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
                  BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>% 
        arrange(.,desc(BrentP)) %>% knitr::kable()

LAG           n()  BrentP    WTIP      GasP      BrentSD   WTISD     GasSD
Zero          478  97.69971  87.38193  3.338578  23.82141  20.00907  0.8337834
One           301  95.64033  85.49123  3.439340  24.38514  20.55360  0.8504588
Seven          57  94.42771  84.90042  3.766939  21.53348  18.14230  0.9737475
Two           234  93.21115  83.19205  3.493853  25.13038  21.08731  0.8138404
Five          104  93.12094  82.46556  3.357937  24.80643  20.34692  0.8372115
Six            70  92.75982  82.64815  3.594444  24.24057  19.78971  0.8188963
Four_before    21  91.62000  81.92800  4.039000  23.90184  17.76233  1.2405863
One_before     48  91.29108  81.72351  3.635676  24.36614  19.95510  0.7549523
Three         187  90.72584  81.15373  3.462843  26.08764  22.06054  0.9106416
Four          140  90.43564  80.95228  3.495570  25.51448  21.36600  0.8001054
Five_before    18  90.36417  79.63250  3.629167  26.29059  20.49569  0.9319721
Three_before   28  90.17294  81.70882  4.096471  23.27960  18.68586  0.8064346
Six_before     15  89.41000  83.12583  3.818462  26.40386  18.28130  1.0000237
Two_before     34  86.52118  78.21941  3.835294  25.41401  20.81788  0.9021233
Seven_before   11  84.75909  78.07727  3.829091  26.18512  22.05246  0.8240079
No_oil_Tweet  690  82.59109  80.11140  4.131186  17.38190  12.55725  0.7677614

trump_oil %>% 
        filter(Period=="After") %>%
        group_by(LAG) %>% 
        summarise(n(),BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
                  BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>% 
        arrange(.,desc(BrentP)) %>% knitr::kable()

LAG           n()  BrentP    WTIP      GasP      BrentSD   WTISD      GasSD
Zero          365  62.21000  57.16647  2.971037  11.88880  10.077184  0.5043826
One           217  60.36236  55.95943  2.942937  11.97314   9.635169  0.4686222
Two           136  58.18175  54.27134  2.902165  12.45402  10.238861  0.5619909
Three          90  56.77147  53.13015  2.848985  12.23827   9.865995  0.5954093
Four           67  54.33196  51.06311  2.766739  12.67375  10.283296  0.6225648
Six_before     14  53.85000  51.19500  2.929000  11.72331   8.500851  0.5695700
Six            47  53.68088  50.93281  2.729412  12.20876   9.668582  0.4767182
Seven          44  53.56531  50.94687  2.811562  12.47703   9.628227  0.4778589
Five_before    20  53.29438  50.36688  2.919375  11.08845   8.693283  0.6074588
One_before     37  52.76042  50.14958  2.733200  14.25078  11.376127  0.5622701
Four_before    21  52.47467  49.32400  2.874000  12.30110  10.462808  0.5568637
Five           54  51.52486  49.33658  2.663784  10.94321   8.689866  0.6097602
No_oil_Tweet   71  50.90765  48.13469  2.782353  13.23145  11.053217  0.4635972
Two_before     30  50.73409  48.72500  2.707391  11.67308   9.810139  0.5151983
Seven_before   12  50.70429  49.20857  2.715714  16.50307  13.223657  0.6521211
Three_before   23  50.23778  47.48000  2.723889  10.12455   8.993649  0.5911021

The summary tables show some interesting patterns in prices around oil tweets. However, the data is easier to analyze graphically, one commodity at a time.

#Brent oil prices behavior

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
                  BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>% 
        arrange(.,desc(BrentP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=BrentP,color=Period))+
        geom_point(size=7,alpha=0.5)+
        scale_color_manual(values=c("blue","red"))+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=BrentP-BrentSD, ymax=BrentP+BrentSD), width=.2,
                      position=position_dodge(0.05),color="black")+
        labs(title = "Brent Oil Price vs Trump Oil tweets",
             x="Lag (days)",
             y="Brent Oil spot prices (USD per barrel)")+
        ggthemes::theme_igray()

For Brent prices the graph shows several interesting points:

  • Brent prices before Trump's presidency were higher and more variable than prices after.
  • Before Trump's presidency, Brent prices oscillated both before and after an oil tweet, without a clear trend following the tweet.
  • After Trump became president, Brent prices show a clear trend after an oil tweet: the average falls from 62.2 USD on the day of the tweet to 51.5 USD five days later, and then levels off at around 53.6 USD on days six and seven.
  • Before Trump's presidency, the day of an oil tweet (lag = 0) has the highest average price in the summary table (97.70 USD), but the averages on the following days do not decline steadily.
  • After Trump became president, the day of an oil tweet (lag = 0) matches the maximum average price.

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
                  BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>% 
        arrange(.,desc(BrentP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=WTIP,color=Period))+
        scale_color_manual(values=c("darkgreen","purple"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=WTIP-WTISD, ymax=WTIP+WTISD), width=.2,
                      position=position_dodge(0.05),color="black")+
        labs(title = "WTI Oil Price vs Trump Oil tweets",
             x="Lag (days)",
             y="WTI Oil spot prices (USD per barrel)")+
        ggthemes::theme_igray()

WTI oil prices show a trend similar to that of Brent oil prices.

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
                  BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>% 
        arrange(.,desc(BrentP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=GasP,color=Period))+
        scale_color_manual(values=c("thistle4","steelblue4"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=GasP-GasSD, ymax=GasP+GasSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Henry Hub gas price vs Trump Oil tweets",
             x="Lag (days)",
             y="Henry Hub gas spot prices (USD per MMBtu)")+
        ggthemes::theme_igray()

Henry Hub gas prices do not seem to show a clear difference in behavior between the periods before and after Trump's presidency.
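
Every plot in this report repeats the same sixteen-branch `case_when()` to turn the LAG labels into numbers. As a possible simplification, the mapping could be stored once in a named vector and looked up by label. The names `lag_lookup` and `lag_to_num` are introduced here for illustration and are not part of the original code.

```r
# LAG label -> numeric position, same values as the case_when() calls.
lag_lookup <- c(
  Seven_before = -7, Six_before = -6, Five_before = -5, Four_before = -4,
  Three_before = -3, Two_before = -2, One_before = -1,
  Zero = 0, One = 1, Two = 2, Three = 3, Four = 4, Five = 5, Six = 6, Seven = 7,
  No_oil_Tweet = -8   # plotted to the left of the seven "before" days
)

# Vectorised lookup; labels that are not in the table map to NA,
# as they would with a case_when() that has no default branch.
lag_to_num <- function(lag) unname(lag_lookup[as.character(lag)])

print(lag_to_num(c("Zero", "Three_before", "No_oil_Tweet")))
```

Inside the pipelines this would replace the whole `case_when()` block with `mutate(Lag_num = lag_to_num(LAG))`, so a change to the mapping only has to be made in one place.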

American major operators and their NYSE stock behavior before and after an oil tweet

Exxon Mobil NYSE

#Exxon Mobil stock prices behavior

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(ExxonP=mean(Exxon,na.rm=T),ChevronP=mean(Chevron,na.rm=T),ConocoP=mean(Conoco,na.rm=T),
                  ExxonSD=sd(Exxon,na.rm=T),ChevronSD=sd(Chevron,na.rm=T),ConocoSD=sd(Conoco,na.rm=T)) %>% 
        arrange(.,desc(ExxonP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=ExxonP,color=Period))+
        scale_color_manual(values=c("blue","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=ExxonP-ExxonSD, ymax=ExxonP+ExxonSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Exxon Mobil stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Exxon Mobil stock price (USD)")+
        ggthemes::theme_igray()

Chevron NYSE

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(ExxonP=mean(Exxon,na.rm=T),ChevronP=mean(Chevron,na.rm=T),ConocoP=mean(Conoco,na.rm=T),
                  ExxonSD=sd(Exxon,na.rm=T),ChevronSD=sd(Chevron,na.rm=T),ConocoSD=sd(Conoco,na.rm=T)) %>% 
        arrange(.,desc(ExxonP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=ChevronP,color=Period))+
        scale_color_manual(values=c("darkred","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=ChevronP-ChevronSD, ymax=ChevronP+ChevronSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Chevron stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Chevron stock price (USD)")+
        ggthemes::theme_igray()

Conoco NYSE

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(ExxonP=mean(Exxon,na.rm=T),ChevronP=mean(Chevron,na.rm=T),ConocoP=mean(Conoco,na.rm=T),
                  ExxonSD=sd(Exxon,na.rm=T),ChevronSD=sd(Chevron,na.rm=T),ConocoSD=sd(Conoco,na.rm=T)) %>% 
        arrange(.,desc(ExxonP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=ConocoP,color=Period))+
        scale_color_manual(values=c("steelblue","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=ConocoP-ConocoSD, ymax=ConocoP+ConocoSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Conoco stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Conoco stock price (USD)")+
        ggthemes::theme_igray()

International major operators and their NYSE stock behavior before and after an oil tweet

Royal Dutch Shell

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
                  TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>% 
        arrange(.,desc(TotalP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=ShellP,color=Period))+
        scale_color_manual(values=c("orange","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=ShellP-ShellSD, ymax=ShellP+ShellSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Shell stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Shell stock price (USD)")+
        ggthemes::theme_igray()

Total

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
                  TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>% 
        arrange(.,desc(TotalP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=TotalP,color=Period))+
        scale_color_manual(values=c("purple","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=TotalP-TotalSD, ymax=TotalP+TotalSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Total stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Total stock price (USD)")+
        ggthemes::theme_igray()

BP

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
                  TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>% 
        arrange(.,desc(TotalP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=BPP,color=Period))+
        scale_color_manual(values=c("green4","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=BPP-BPSD, ymax=BPP+BPSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "BP stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="BP stock price (USD)")+
        ggthemes::theme_igray()

Eni

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
                  TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>% 
        arrange(.,desc(TotalP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=EniP,color=Period))+
        scale_color_manual(values=c("black","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=EniP-EniSD, ymax=EniP+EniSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Eni stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Eni stock price (USD)")+
        ggthemes::theme_igray()

International major service companies and their NYSE stock behavior before and after an oil tweet

Schlumberger

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(SlbP=mean(Slb,na.rm=T),HallP=mean(Hall,na.rm=T),BakerP=mean(Baker,na.rm=T),
                  SlbSD=sd(Slb,na.rm=T),HallSD=sd(Hall,na.rm=T),BakerSD=sd(Baker,na.rm=T)) %>% 
        arrange(.,desc(SlbP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=SlbP,color=Period))+
        scale_color_manual(values=c("darkblue","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=SlbP-SlbSD, ymax=SlbP+SlbSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Schlumberger stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Schlumberger stock price (USD)")+
        ggthemes::theme_igray()

Halliburton

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(SlbP=mean(Slb,na.rm=T),HallP=mean(Hall,na.rm=T),BakerP=mean(Baker,na.rm=T),
                  SlbSD=sd(Slb,na.rm=T),HallSD=sd(Hall,na.rm=T),BakerSD=sd(Baker,na.rm=T)) %>% 
        arrange(.,desc(SlbP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=HallP,color=Period))+
        scale_color_manual(values=c("darkred","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=HallP-HallSD, ymax=HallP+HallSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Halliburton stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Halliburton stock price (USD)")+
        ggthemes::theme_igray()

Baker Hughes International, a GE company

trump_oil %>% group_by(Period,LAG) %>% 
        summarise(SlbP=mean(Slb,na.rm=T),HallP=mean(Hall,na.rm=T),BakerP=mean(Baker,na.rm=T),
                  SlbSD=sd(Slb,na.rm=T),HallSD=sd(Hall,na.rm=T),BakerSD=sd(Baker,na.rm=T)) %>% 
        arrange(.,desc(SlbP)) %>%
        mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
                                 LAG=="One" ~ 1,
                                 LAG=="Two" ~ 2,
                                 LAG=="Three" ~ 3,
                                 LAG=="Four" ~ 4,
                                 LAG=="Five" ~ 5,
                                 LAG=="Six" ~ 6,
                                 LAG=="Seven" ~ 7,
                                 LAG=="One_before" ~ -1,
                                 LAG=="Two_before" ~ -2,
                                 LAG=="Three_before" ~ -3,
                                 LAG=="Four_before" ~ -4,
                                 LAG=="Five_before" ~ -5,
                                 LAG=="Six_before" ~ -6,
                                 LAG=="Seven_before" ~ -7,
                                 LAG=="No_oil_Tweet" ~ -8,))%>%
        ggplot(.,aes(x=as.factor(Lag_num),y=BakerP,color=Period))+
        scale_color_manual(values=c("steelblue4","red"))+
        geom_point(size=7,alpha=0.5)+
        facet_grid(.~Period)+
        geom_errorbar(aes(ymin=BakerP-BakerSD, ymax=BakerP+BakerSD), width=.2,
                      position=position_dodge(0.05))+
        labs(title = "Baker Hughes Int stock price NYSE vs Trump Oil tweets",
             x="Lag (days)",
             y="Baker Hughes Int stock price (USD)")+
        ggthemes::theme_igray()

Is the difference in commodity prices across the most notable lags significant?

#Anova test for Brent oil prices and Lags, after trump presidency

trump_oil %>% filter(Period=="After") %>%
        aov(Brent~LAG,data = .) %>% summary()
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## LAG          15  15322  1021.5   6.888 2.79e-14 ***
## Residuals   851 126200   148.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 381 observations deleted due to missingness
#Anova test for WTI oil prices and Lags, after trump presidency

trump_oil %>% filter(Period=="After") %>%
        aov(WTI~LAG,data = .) %>% summary()
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## LAG          15   8878   591.9   5.925 7.99e-12 ***
## Residuals   837  83602    99.9                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 395 observations deleted due to missingness
#Anova test for Henry hub gas prices and Lags, after trump presidency

trump_oil %>% filter(Period=="After") %>%
        aov(Gas~LAG,data = .) %>% summary()

The ANOVA tests on the commodity prices show that there is a significant difference between the lag groups. The next step is to test each of the future lags (one to seven days after) individually against lag zero for each commodity price, correcting the resulting p-values with a Bonferroni factor of 15, since there are 16 different lag categories (seven days before, seven days after, zero and the no-oil-tweet category) and therefore 15 possible comparisons against zero. These tests are conducted first on the data after 2016, the period of Donald Trump's presidency.

pval_Brent adj_pval_Brent pval_WTI adj_pval_WTI pval_Gas adj_pval_Gas Lag
0.1413949 1.0000000 0.2473435 1.0000000 0.5813290 1.0000000 1
0.0069585 0.1043768 0.0192674 0.2890112 0.2958014 1.0000000 2
0.0014873 0.0223101 0.0037113 0.0556697 0.1241664 1.0000000 3
0.0002378 0.0035673 0.0005188 0.0077815 0.0403541 0.6053113 4
0.0000014 0.0000217 0.0000055 0.0000820 0.0055702 0.0835532 5
0.0004196 0.0062938 0.0014855 0.0222828 0.0086880 0.1303196 6
0.0006561 0.0098421 0.0014634 0.0219514 0.0855699 1.0000000 7

For Brent, the Bonferroni-adjusted difference is significant at the 0.05 level for lags 3 to 7, and for WTI for lags 4 to 7; the strongest effect for both is at lag 5. At these lags prices tend to be lower than on the day of an oil tweet. Henry Hub gas prices do not show a significant adjusted difference at any of the tested lags.
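
As a minimal sketch, the Bonferroni adjustment in the table can be reproduced with base R's p.adjust, here applied to the unadjusted Brent p-values copied from the table. The variable names are illustrative, and n = 15 is the correction factor stated in the text; how the underlying pairwise tests were run is not shown in this report, so only the adjustment step is illustrated.

```r
# Unadjusted Brent p-values for lags 1 to 7, copied from the table above
pval_brent <- c(0.1413949, 0.0069585, 0.0014873, 0.0002378,
                0.0000014, 0.0004196, 0.0006561)

# Bonferroni: multiply each p-value by the number of comparisons, capped at 1
adj_brent <- p.adjust(pval_brent, method = "bonferroni", n = 15)

# Lags whose adjusted p-value stays below the 0.05 threshold
sig_lags <- which(adj_brent < 0.05)

print(data.frame(Lag = 1:7, pval = pval_brent, adj_pval = adj_brent))
print(sig_lags)
```

The same call can be repeated for the WTI and gas columns. Small differences in the last decimals are expected because the table shows rounded p-values.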

Is there any causality?

To address this question, a linear Granger causality test is applied at different lags. The test asks whether Trump's oil-related tweets Granger-cause each price series, for example Brent oil prices.
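
To make the idea concrete, here is a minimal base R sketch of a linear Granger test: the series is regressed on its own lags, then on its own lags plus the lagged tweet indicator, and an F-test compares the two fits. The helper granger_p and the simulated series are hypothetical and are not the code or data behind the tables below; packaged implementations such as lmtest::grangertest follow the same principle.

```r
# Hypothetical helper: p-value of a linear Granger test "x Granger-causes y" at lag k
granger_p <- function(y, x, k) {
  stopifnot(length(y) == length(x), k >= 1)
  # embed() returns the current value in column 1 and lags 1..k in the next columns
  Y <- embed(y, k + 1)
  X <- embed(x, k + 1)
  y_now  <- Y[, 1]
  y_lags <- Y[, -1, drop = FALSE]
  x_lags <- X[, -1, drop = FALSE]
  restricted   <- lm(y_now ~ y_lags)           # own past only
  unrestricted <- lm(y_now ~ y_lags + x_lags)  # own past plus past tweets
  # F-test: do the lagged tweet terms improve the fit?
  anova(restricted, unrestricted)[["Pr(>F)"]][2]
}

# Simulated data (not from the report): a tweet indicator that moves the
# series two days later
set.seed(1)
n <- 300
tweet <- rbinom(n, 1, 0.2)
price_change <- numeric(n)
for (i in 3:n) {
  price_change[i] <- 0.5 * price_change[i - 1] - 2 * tweet[i - 2] + rnorm(1)
}

# One p-value per lag order, analogous to one column of the tables below
p_values <- sapply(1:5, function(k) granger_p(price_change, tweet, k))
print(round(p_values, 4))
```

Granger tests assume stationary series, so with real prices one would typically work with differences or returns rather than raw levels. A small p-value only indicates that past tweets help predict the series, not that they cause it in a structural sense.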

After Presidency

Lag Brent WTI Gas Exxon Chevron Conoco Shell BP Eni Total Slb Hall Baker
1 0.3788865 0.1218394 0.5391637 0.7199238 0.6872522 0.1428166 0.5577220 0.7865204 0.9888505 0.7244813 0.1560236 0.2035646 0.9206718
2 0.0950934 0.2839154 0.4376983 0.7052023 0.6322892 0.0284113 0.5139555 0.3062810 0.5423909 0.2431858 0.3634272 0.4220438 0.5971868
3 0.1357214 0.2478331 0.6245419 0.4798047 0.8822479 0.0691592 0.3676087 0.4282354 0.7370478 0.3966321 0.2937457 0.3998610 0.8267625
4 0.1958477 0.3658168 0.7586976 0.6193188 0.9247578 0.1193007 0.5309001 0.3108296 0.5337400 0.3535148 0.4654277 0.2694358 0.9149002
5 0.2090139 0.2045971 0.9151566 0.7596635 0.6887044 0.1790673 0.7167755 0.4751900 0.7546911 0.4095972 0.5684025 0.3027925 0.6783316
6 0.0100615 0.0405328 0.9951684 0.8147781 0.7617168 0.2638261 0.7068435 0.5510556 0.8500280 0.4695312 0.5155986 0.3409791 0.5972505
7 0.0084340 0.0662446 0.9766937 0.7017936 0.7404184 0.3811698 0.8082257 0.6795385 0.8982271 0.5402812 0.6159437 0.3914021 0.6949052
8 0.0068809 0.0126593 0.9682278 0.8115574 0.8166166 0.4509790 0.7586276 0.6399768 0.7410641 0.3648474 0.7123440 0.5079508 0.6580017
9 0.0061314 0.0147441 0.9608297 0.8817206 0.8364847 0.4022920 0.7094930 0.4068686 0.7934637 0.2737906 0.6426183 0.4471208 0.4588495
10 0.0117433 0.0125890 0.9581397 0.7117214 0.7015058 0.4207974 0.7125026 0.3711448 0.8021638 0.3161656 0.6036470 0.4542016 0.3877114

Before Presidency

Lag Brent WTI Gas Exxon Chevron Conoco Shell BP Eni Total Slb Hall Baker
1 0.7127004 0.2624718 0.1372648 0.6894791 0.8750065 0.3465814 0.9700636 0.7972095 0.6217511 0.4091932 0.9176655 0.5162742 0.9226097
2 0.5816320 0.4909198 0.2028392 0.5309841 0.6555008 0.5126192 0.5311177 0.9629061 0.2981969 0.3878005 0.9733439 0.8122758 0.6598192
3 0.4817415 0.5459858 0.2772608 0.7543144 0.8276508 0.6809267 0.5951480 0.9763280 0.4703933 0.6083229 0.9923801 0.9525637 0.7464839
4 0.6498627 0.6041440 0.3268328 0.8979539 0.9237661 0.7981982 0.2347643 0.6835916 0.5006487 0.6957364 0.9372630 0.8361698 0.7934268
5 0.4851206 0.5355252 0.5362841 0.8610170 0.9558771 0.8516791 0.3456232 0.8049645 0.6390949 0.8198664 0.9568620 0.7705664 0.7200738
6 0.5439312 0.5831218 0.6403075 0.9158071 0.9655368 0.8967668 0.3505623 0.7255840 0.7754368 0.9016949 0.9820515 0.8383852 0.8260450
7 0.5890755 0.5354390 0.4975317 0.4074147 0.6865789 0.8361490 0.2499637 0.7680803 0.7302268 0.6814512 0.8528792 0.2979350 0.8744994
8 0.6923527 0.5936402 0.5955404 0.4147502 0.7743958 0.8793135 0.3275968 0.7657244 0.7208178 0.5861303 0.8916764 0.3422696 0.8054885
9 0.7071684 0.6727827 0.6726226 0.3645072 0.6789623 0.8541175 0.4379858 0.8357645 0.7734071 0.7040570 0.8648996 0.3589178 0.8710504
10 0.7770769 0.5864383 0.6684431 0.4137133 0.5438431 0.9021884 0.5167547 0.7648746 0.7629527 0.6736278 0.9090112 0.4397246 0.9101266

Commodities

bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
        mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After"))  %>%
        select(Lag,Brent,WTI,Gas,Period) %>%
        tidyr::gather(Brent,WTI,Gas,key="Commodity",value="Pvalue") %>%
        ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Commodity,shape=Commodity))+
        scale_color_manual(values=c("Blue","Red","Black"))+
        geom_point(size=3,alpha=0.8)+
        geom_line(aes(x=Lag,y=Pvalue,color=Commodity),alpha=0.8)+
        ylim(0,1)+
        geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
        geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
        labs(title = "Granger causality test results commodity prices as function of trump oil tweets",
             x="Lag (Days)",
             y="p-value of Granger causality test")+
        facet_grid(.~Period)+
        ggthemes::theme_igray()

From the Granger causality test results, the most notable difference is that before 2016, prior to Donald Trump becoming president, none of the lags from 1 to 10 is significant, so no causal effect of tweets on commodity prices can be claimed. After he became president, however, the higher lags (above 5) become significant for Brent and WTI oil (with WTI at lag 7 significant only at the 0.10 level), so Granger causality can be claimed. For Henry Hub gas prices there is no Granger causality at any tested lag.
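
The reading of the plot can be checked directly against the tabulated p-values. The short sketch below copies the after-presidency Brent and WTI columns from the table and lists the lags that fall under a chosen threshold; the variable names and the 0.05 threshold (the black dashed line in the plot) are the only assumptions.

```r
# Granger p-values after the presidency, lags 1 to 10, copied from the table
brent_after <- c(0.3788865, 0.0950934, 0.1357214, 0.1958477, 0.2090139,
                 0.0100615, 0.0084340, 0.0068809, 0.0061314, 0.0117433)
wti_after   <- c(0.1218394, 0.2839154, 0.2478331, 0.3658168, 0.2045971,
                 0.0405328, 0.0662446, 0.0126593, 0.0147441, 0.0125890)

alpha <- 0.05

# Lag orders whose p-value is below the threshold, per commodity
sig <- list(Brent = which(brent_after < alpha),
            WTI   = which(wti_after < alpha))
print(sig)
```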

American Major operators

bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
        mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After"))  %>%
        select(Lag,Exxon,Chevron,Conoco,Period) %>%
        tidyr::gather(Exxon,Chevron,Conoco,key="Stock",value="Pvalue") %>%
        ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Stock,shape=Stock))+
        scale_color_manual(values=c("Blue","Red","Black"))+
        geom_point(size=3,alpha=0.8)+
        geom_line(aes(x=Lag,y=Pvalue,color=Stock),alpha=0.8)+
        ylim(0,1)+
        geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
        geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
        labs(title = "Granger causality test results Stock prices as function of trump oil tweets",
             x="Lag (Days)",
             y="p-value of Granger causality test")+
        facet_grid(.~Period)+
        ggthemes::theme_igray()

International Major operators

bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
        mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After"))  %>%
        select(Lag,Shell,BP,Eni,Total,Period) %>%
        tidyr::gather(Shell,BP,Eni,Total,key="Stock",value="Pvalue") %>%
        ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Stock,shape=Stock))+
        scale_color_manual(values=c("Blue","Red","Black","darkgreen"))+
        geom_point(size=3,alpha=0.8)+
        geom_line(aes(x=Lag,y=Pvalue,color=Stock),alpha=0.8)+
        ylim(0,1)+
        geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
        geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
        labs(title = "Granger causality test results Stock prices as function of trump oil tweets",
             x="Lag (Days)",
             y="p-value of Granger causality test")+
        facet_grid(.~Period)+
        ggthemes::theme_igray()

Service Major companies

bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
        mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After"))  %>%
        select(Lag,Slb,Hall,Baker,Period) %>%
        tidyr::gather(Slb,Hall,Baker,key="Stock",value="Pvalue") %>%
        ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Stock,shape=Stock))+
        scale_color_manual(values=c("Blue","Red","Black"))+
        geom_point(size=3,alpha=0.8)+
        geom_line(aes(x=Lag,y=Pvalue,color=Stock),alpha=0.8)+
        ylim(0,1)+
        geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
        geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
        labs(title = "Granger causality test results Stock prices as function of trump oil tweets",
             x="Lag (Days)",
             y="p-value of Granger causality test")+
        facet_grid(.~Period)+
        ggthemes::theme_igray()

Significant Granger causality is found only for the oil commodities. There is no significant causality for the stock price of any major operator or service company, with the exception of ConocoPhillips at a lag of 2 days (p = 0.028) and, marginally, at 3 days (p = 0.069).

Word frequency and sentiment analysis

load("trump_tweets_all.RData")

dim(trump_tweets)
## [1] 38454     7
names(trump_tweets)
## [1] "Tweets"           "date"             "RT"              
## [4] "Month"            "Day"              "Year"            
## [7] "oil_related_word"

This dataset is composed of 38454 tweets by Donald Trump, from 1 January 2009 until 31 May 2019.

Word frequency comparison between oil tweets and non-oil tweets

trump_tweets_oil<-trump_tweets %>% filter(oil_related_word==TRUE) %>%
        select(Tweets) %>%
        unnest_tokens(word, Tweets) %>% 
        mutate(word = str_extract(word, "[a-z']+")) %>%
        anti_join(stop_words)
## Joining, by = "word"
trump_tweets_oil<-trump_tweets_oil[complete.cases(trump_tweets_oil),]

trump_tweets_no_oil<-trump_tweets %>% filter(oil_related_word==FALSE) %>%
        select(Tweets) %>%
        unnest_tokens(word, Tweets) %>% 
        mutate(word = str_extract(word, "[a-z']+")) %>%
        anti_join(stop_words)
## Joining, by = "word"
trump_tweets_no_oil<-trump_tweets_no_oil[complete.cases(trump_tweets_no_oil),]

bind_rows(mutate(trump_tweets_oil, author = "Oil"),
        mutate(trump_tweets_no_oil, author = "No_Oil")) %>%
        count(author, word) %>%
        group_by(author) %>%
        mutate(proportion = n / sum(n)) %>% 
        select(-n) %>% 
        spread(author,proportion) %>%
        ggplot(.,aes(x = No_Oil, y = Oil,color=No_Oil)) +
        geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3)+
        scale_x_log10(labels = percent_format()) +
        scale_y_log10(labels = percent_format()) +
        geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
        geom_abline(color = "red", lty = 2) +
        scale_color_gradient(limits = c(0, 0.001), low = "blue", high = "red") +
        labs(y = "Frequency Oil related tweets", x = "Frequency No Oil related tweets")+
        ggthemes::theme_tufte()+
        theme(legend.position="none")

Words such as collusion, prices, Iraq, Keystone and gallon tend to appear more often in oil-related tweets than in non-oil-related tweets.
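
The quantity plotted on each axis is the share of a word within its own group of tweets. A minimal base R sketch of that calculation follows; the toy tweets and the helpers tokenize and word_share are hypothetical, and unlike the pipeline above the sketch does not remove stop words.

```r
# Toy tweets (hypothetical text, not taken from the data set)
oil_tweets    <- c("oil prices are too high", "opec must lower oil prices")
no_oil_tweets <- c("great rally tonight", "the rally was great great")

# Split lower-cased text on anything that is not a letter or an apostrophe
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z']+"))
  words[nzchar(words)]
}

# Share of each word within its own group of tweets
word_share <- function(x) prop.table(table(tokenize(x)))

share_oil    <- word_share(oil_tweets)
share_no_oil <- word_share(no_oil_tweets)

print(sort(share_oil, decreasing = TRUE))
print(sort(share_no_oil, decreasing = TRUE))
```

Words whose share is larger in the oil group than in the non-oil group land above the dashed diagonal in the plot.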

Sentiment analysis

To perform the sentiment analysis, a predefined dictionary of already classified words is used, in this case the “bing” dictionary from the tidytext package

trump_tweets %>% unnest_tokens(word, Tweets) %>% inner_join(get_sentiments("bing")) %>% count(date,sentiment) %>%
        spread(sentiment, n, fill = 0) %>%
        mutate(sentiment = positive - negative,polarity=(positive - negative)/(positive + negative)) %>% 
        left_join(trump_oil,.) %>% 
        mutate(year=year(date)) %>%
        group_by(Period,oil_keyword) %>%
        summarise(positive=mean(positive,na.rm=TRUE),
                  negative=mean(negative,na.rm=TRUE),
                  sentiment=mean(sentiment,na.rm=TRUE),
                  polarity=mean(polarity,na.rm = TRUE)) %>% knitr::kable()
## Joining, by = "word"
## Joining, by = "date"
Period oil_keyword positive negative sentiment polarity
After FALSE 11.63658 7.121923 4.514654 0.3041282
After TRUE 17.17808 13.663014 3.515068 0.1190517
Before FALSE 15.69825 6.347368 9.350877 0.4532216
Before TRUE 22.44979 9.797071 12.652720 0.3170058
trump_tweets %>% unnest_tokens(word, Tweets) %>% inner_join(get_sentiments("bing")) %>% count(date,sentiment) %>%
        spread(sentiment, n, fill = 0) %>%
        mutate(sentiment = positive - negative,polarity=(positive - negative)/(positive + negative)) %>% 
        left_join(trump_oil,.) %>% 
        mutate(year=year(date)) %>%
        group_by(oil_keyword,year) %>%
        summarise(positive=mean(positive,na.rm=TRUE),
                  negative=mean(negative,na.rm=TRUE),
                  sentiment=mean(sentiment,na.rm=TRUE),
                  polarity=mean(polarity,na.rm = TRUE)) %>%
        ggplot(.,aes(x=as.factor(year),y=polarity,fill=oil_keyword))+
        geom_col(color="blue")+
        theme(axis.text.x = element_text(angle = 90, hjust = 1))+
        facet_wrap(.~oil_keyword)+
        labs(title = "Trump Tweets polarity",
             x="Year",
             y="Tweet Polarity")
## Joining, by = "word"
## Joining, by = "date"

The previous table and graph show that, in general, Trump's tweets have become more balanced between positive and negative words (lower polarity) over the years, especially the oil-related tweets, and even more so after he became US president.
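
The polarity score used in the code above is (positive - negative) / (positive + negative), which ranges from -1 (only negative words) to 1 (only positive words). The sketch below works through that formula on a single sentence; the mini-lexicon is a hypothetical stand-in for the “bing” dictionary and the function name polarity is illustrative.

```r
# Hypothetical mini-lexicon standing in for the "bing" dictionary
lexicon <- c(great = "positive", good = "positive", win = "positive",
             bad = "negative", high = "negative", disaster = "negative")

polarity <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  # Keep only the words that the lexicon classifies
  tags <- lexicon[words[words %in% names(lexicon)]]
  pos <- sum(tags == "positive")
  neg <- sum(tags == "negative")
  if (pos + neg == 0) return(NA_real_)  # no classified words: polarity undefined
  (pos - neg) / (pos + neg)
}

# Three positive words and one negative word: (3 - 1) / (3 + 1)
print(polarity("great great win but oil prices are too high"))
```

A value close to 0 therefore means positive and negative words appear in similar numbers, which is the sense in which the tweets are described as more balanced.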

Using a more advanced dictionary such as “nrc”, it is possible to categorize the sentiment of the tweets even further

trump_tweets %>% unnest_tokens(word, Tweets) %>% inner_join(get_sentiments("nrc")) %>% count(date,sentiment) %>%
        spread(sentiment, n, fill = 0) %>%
        left_join(trump_oil,.) %>% 
        mutate(year=year(date)) %>%
        group_by(Period,oil_keyword) %>%
        summarise(positive=mean(positive,na.rm=TRUE),
                  negative=mean(negative,na.rm=TRUE),
                  anger=mean(anger,na.rm=TRUE),
                  anticipation=mean(anticipation,na.rm = TRUE),
                  disgust=mean(disgust,na.rm = TRUE),
                  fear=mean(fear,na.rm=TRUE),
                  joy=mean(joy,na.rm=TRUE),
                  sadness=mean(sadness,na.rm=TRUE),
                  surprise=mean(surprise,na.rm=TRUE),
                  trust=mean(trust,na.rm=TRUE)) %>% knitr::kable()
## Joining, by = "word"
## Joining, by = "date"
Period oil_keyword positive negative anger anticipation disgust fear joy sadness surprise trust
After FALSE 11.83372 7.920375 4.373536 5.437939 2.688525 4.510539 4.485949 3.861827 3.296253 8.573771
After TRUE 19.33973 14.989041 8.482192 8.980822 6.013699 8.342466 7.120548 7.008219 5.627397 14.945206
Before FALSE 14.01532 6.791489 3.507234 7.145532 2.303830 3.661277 6.676596 3.616170 5.688511 9.171915
Before TRUE 22.07741 10.751046 5.424686 10.719665 3.625523 5.790795 10.391213 5.535565 8.332636 14.395397

Oil-related tweets after Donald Trump became president are characterized by higher levels of negativity, anger, disgust, fear and sadness than the other groups.