Oil-related data were downloaded from the website of the U.S. Energy Information Administration (EIA):
President Donald Trump's tweets from @realDonaldTrump were obtained from an open repository at the following address; they can also be downloaded in JSON format from the following GitHub repositories
All the data wrangling code can be found in the following GitHub repository
Oil data and Donald Trump's tweets were joined and grouped by day. The tweets were searched for the following oil-related words or combinations of words:
The result is a unified data set of 3684 observations and 20 variables
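# Packages used below: dplyr (pipes and verbs), lubridate (year), ggplot2 (plots)
library(dplyr)
library(lubridate)
library(ggplot2)
load("trump_oil_all.RData")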
dim(trump_oil)
## [1] 3684 20
names(trump_oil)
## [1] "date" "oil_word_count" "oil_keyword" "Tweet_volume"
## [5] "RT" "Brent" "WTI" "Gas"
## [9] "Exxon" "Chevron" "Conoco" "Shell"
## [13] "BP" "Eni" "Total" "Slb"
## [17] "Hall" "Baker" "LAG" "Period"
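The join and keyword search described above can be sketched as follows. This is a minimal illustration in base R, not the original wrangling code: the toy tweets, the prices and the `oil_pattern` keyword pattern are all invented for the example, since the actual keyword list is not reproduced here.

```r
# Toy inputs (invented for illustration; not real tweets or real prices).
tweets <- data.frame(
  date = as.Date(c("2018-04-20", "2018-04-20", "2018-04-21", "2018-04-23")),
  text = c("Oil prices are too high",
           "Great rally tonight",
           "Tax cuts are helping",
           "Gasoline is getting cheaper"),
  stringsAsFactors = FALSE
)
prices <- data.frame(
  date  = as.Date(c("2018-04-20", "2018-04-21", "2018-04-22", "2018-04-23")),
  Brent = c(70, NA, NA, 71)
)

# Hypothetical keyword pattern (assumption; the real list may differ).
oil_pattern <- "oil|opec|gasoline"

# Flag each tweet, then collapse to one row per day.
tweets$n      <- 1L
tweets$is_oil <- grepl(oil_pattern, tweets$text, ignore.case = TRUE)
daily <- aggregate(tweets[c("n", "is_oil")],
                   by = list(date = tweets$date), FUN = sum)
names(daily) <- c("date", "Tweet_volume", "oil_word_count")
daily$oil_keyword <- daily$oil_word_count > 0

# Left join onto the price calendar so days without tweets are kept (as NA).
joined <- merge(prices, daily, by = "date", all.x = TRUE)
joined
```

Keeping the price calendar as the left table means that days without any tweet stay in the data with missing tweet fields, which is why the summaries in this document use `na.rm = TRUE`.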
Descriptions of the variables, units of measurement, data categories and data sources can be found in the table below:
Codebook: Variable Description
| Variable Name | Description |
|---|---|
| date | Date of the observation (date variable, format year-month-day) |
| oil_word_count | Number of tweets that contain oil-related words (numeric) |
| oil_keyword | Logical variable that identifies whether the observation is oil related (Boolean) |
| Tweet_volume | Total number of tweets made by Donald Trump on a single day (numeric) |
| RT | Number of re-tweets (numeric) |
| Brent | Brent spot oil price (numeric, in USD per barrel) |
| WTI | WTI spot oil price (numeric, in USD per barrel) |
| Gas | Henry Hub spot gas price (numeric, in USD per TCF or trillion cubic feet) |
| Exxon | Exxon Mobil opening stock daily price NYSE (numeric, in USD) |
| Chevron | Chevron opening stock daily price NYSE (numeric, in USD) |
| Conoco | Conoco Phillips opening stock daily price NYSE (numeric, in USD) |
| Shell | Royal Dutch Shell opening stock daily price NYSE (numeric, in USD) |
| BP | BP opening stock daily price NYSE (numeric, in USD) |
| Eni | Eni opening stock daily price NYSE (numeric, in USD) |
| Total | Total opening stock daily price NYSE (numeric, in USD) |
| Slb | Schlumberger opening stock daily price NYSE (numeric, in USD) |
| Hall | Halliburton opening stock daily price NYSE (numeric, in USD) |
| Baker | Baker Hughes opening stock daily price NYSE (numeric, in USD) |
| LAG | Variable that identifies the relative position of a day compared to the nearest day with an oil tweet (categorical) |
| Period | Variable that identifies whether the observation falls in the period before or after Donald Trump became president (categorical: "Before" or "After") |
One of the first aspects to look at is the distribution of oil-related tweets over time.
trump_oil %>% mutate(Year=year(date)) %>%
group_by(Year) %>%
summarise(Tweet_Vol=sum(Tweet_volume,na.rm=TRUE),
Oil_Tweets=sum(oil_keyword==TRUE,na.rm=TRUE),
percentage_Oil_Tweets=(sum(oil_keyword==TRUE,na.rm=TRUE)/sum(Tweet_volume,na.rm = TRUE))*100) %>%
knitr::kable()
| Year | Tweet_Vol | Oil_Tweets | percentage_Oil_Tweets |
|---|---|---|---|
| 2009 | 56 | 0 | 0.000000 |
| 2010 | 142 | 8 | 5.633803 |
| 2011 | 774 | 51 | 6.589147 |
| 2012 | 3531 | 114 | 3.228547 |
| 2013 | 8144 | 119 | 1.461198 |
| 2014 | 5784 | 91 | 1.573306 |
| 2015 | 7536 | 95 | 1.260616 |
| 2016 | 4225 | 47 | 1.112426 |
| 2017 | 2605 | 97 | 3.723608 |
| 2018 | 3510 | 146 | 4.159544 |
| 2019 | 2146 | 75 | 3.494874 |
trump_oil %>% mutate(Year=year(date)) %>%
group_by(Year) %>%
summarise(Tweet_Vol=sum(Tweet_volume,na.rm=TRUE),
Oil_Tweets=sum(oil_keyword==TRUE,na.rm=TRUE),
percentage_Oil_Tweets=(sum(oil_keyword==TRUE,na.rm=TRUE)/sum(Tweet_volume,na.rm = TRUE))*100) %>%
ggplot(.,aes(y=Oil_Tweets,x=as.factor(Year)))+
geom_bar(stat = "identity",fill="red",alpha=0.4,color="blue")+
labs(y="Oil related Tweets",
title = "Trump Oil related Tweets over time",
x="Year")+
ggthemes::theme_tufte()
trump_oil %>% mutate(Year=year(date)) %>%
group_by(Year) %>%
summarise(Tweet_Vol=sum(Tweet_volume,na.rm=TRUE),
Oil_Tweets=sum(oil_keyword==TRUE,na.rm=TRUE),
percentage_Oil_Tweets=(sum(oil_keyword==TRUE,na.rm=TRUE)/sum(Tweet_volume,na.rm = TRUE))*100) %>%
ggplot(.,aes(y=percentage_Oil_Tweets,x=as.factor(Year)))+
geom_bar(stat = "identity",fill="blue",alpha=0.4,color="red")+
labs(y="Oil related Tweets percentage",
title = "Percentage of Trump Oil related Tweets over time",
x="Year")+
ggthemes::theme_tufte()
Looking at the raw counts, the number of oil-related tweets appears to have increased over the years. Normalizing by the total tweet volume changes the picture: the share of oil-related tweets fell from about 6.6% in 2011 to about 1.1% in 2016, and since 2017 it has stabilized between roughly 3.5% and 4.2%.
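The normalization is simple arithmetic, and it can be checked by hand for a few years. The sketch below reuses four rows of the yearly table above; only the data frame name `yearly` is introduced for the example.

```r
# Four rows copied from the yearly summary table above.
yearly <- data.frame(
  Year       = c(2011, 2013, 2016, 2018),
  Tweet_Vol  = c(774, 8144, 4225, 3510),
  Oil_Tweets = c(51, 119, 47, 146)
)

# Share of oil-related tweets, in percent of the total tweet volume.
yearly$percentage_Oil_Tweets <- 100 * yearly$Oil_Tweets / yearly$Tweet_Vol
round(yearly$percentage_Oil_Tweets, 2)
# → 6.59 1.46 1.11 4.16
```

The year 2013 has more oil-related tweets than 2011 (119 versus 51), yet a much smaller share, because the total volume grew by a factor of about ten.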
A fundamental question of this analysis is whether there is a significant difference in prices on days when Donald Trump tweets about oil. This question can be addressed with a simple two-sample t-test.
#t-test oil word vs brent oil price Before presidency
trump_oil %>% filter(Period=="Before") %>%
t.test(Brent~oil_keyword,data = .)
##
## Welch Two Sample t-test
##
## data: Brent by oil_keyword
## t = -6.5226, df = 675.13, p-value = 1.355e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.335166 -6.089766
## sample estimates:
## mean in group FALSE mean in group TRUE
## 88.98724 97.69971
#t-test oil word vs brent oil price after presidency
trump_oil %>% filter(Period=="After") %>%
t.test(Brent~oil_keyword,data = .)
##
## Welch Two Sample t-test
##
## data: Brent by oil_keyword
## t = -7.1919, df = 476.4, p-value = 2.492e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.354810 -4.769119
## sample estimates:
## mean in group FALSE mean in group TRUE
## 55.64804 62.21000
trump_oil %>% ggplot(.,aes(y=Brent,x=oil_keyword,fill=oil_keyword))+geom_boxplot(alpha=0.5)+
labs(title = "Brent Oil Prices vs Trump's Oil tweets",
x= "Oil Tweet",
y="Brent spot Oil Price (USD per barrel)")+
facet_grid(. ~ Period)+
ggthemes::theme_tufte()
## Warning: Removed 1138 rows containing non-finite values (stat_boxplot).
#t-test oil word vs WTI oil price Before presidency
trump_oil %>% filter(Period=="Before") %>%
t.test(WTI~oil_keyword,data = .)
##
## Welch Two Sample t-test
##
## data: WTI by oil_keyword
## t = -4.9457, df = 646.53, p-value = 9.684e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.643486 -3.298911
## sample estimates:
## mean in group FALSE mean in group TRUE
## 81.91073 87.38193
#t-test oil word vs WTI oil price After presidency
trump_oil %>% filter(Period=="After") %>%
t.test(WTI~oil_keyword,data = .)
##
## Welch Two Sample t-test
##
## data: WTI by oil_keyword
## t = -6.3649, df = 445.55, p-value = 4.862e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.411631 -3.386315
## sample estimates:
## mean in group FALSE mean in group TRUE
## 52.26750 57.16647
trump_oil %>% ggplot(.,aes(y=WTI,x=oil_keyword,fill=oil_keyword))+geom_boxplot(alpha=0.5)+
scale_fill_manual(values=c("blue","red"))+
labs(title = "WTI Oil Prices vs Trump's Oil tweets",
x= "Oil Tweet",
y="WTI spot Oil Price (USD per barrel)")+
facet_grid(. ~ Period)+
ggthemes::theme_tufte()
## Warning: Removed 1150 rows containing non-finite values (stat_boxplot).
#t-test oil word vs Henry Hub gas price Before presidency
trump_oil %>% filter(Period=="Before") %>%
t.test(Gas~oil_keyword,data = .)
##
## Welch Two Sample t-test
##
## data: Gas by oil_keyword
## t = 8.733, df = 737.43, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3234874 0.5111040
## sample estimates:
## mean in group FALSE mean in group TRUE
## 3.755874 3.338578
#t-test oil word vs Henry Hub gas price After presidency
trump_oil %>% filter(Period=="After") %>%
t.test(Gas~oil_keyword,data = .)
##
## Welch Two Sample t-test
##
## data: Gas by oil_keyword
## t = -3.4443, df = 465.04, p-value = 0.0006244
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2111928 -0.0577519
## sample estimates:
## mean in group FALSE mean in group TRUE
## 2.836565 2.971037
trump_oil %>% ggplot(.,aes(y=Gas,x=oil_keyword,fill=oil_keyword))+geom_boxplot(alpha=0.5)+
scale_fill_manual(values=c("Darkblue","darkred"))+
labs(title = "Henry Hub Gas Prices vs Trump's Oil tweets",
x= "Oil Tweet",
y="Henry Hub spot Gas Prices (USD per TCF)")+
facet_grid(. ~ Period)+
ggthemes::theme_tufte()
## Warning: Removed 1135 rows containing non-finite values (stat_boxplot).
For both Brent and WTI there is a statistically significant difference in price depending on whether Trump tweets about oil: prices are lower on days with no oil tweet, and this holds both before and after he became president. Henry Hub gas prices also show a significant difference in relation to Trump's oil tweets. However, before Trump became president, and in contrast with oil, gas prices were higher on days with no oil tweet and lower on oil tweet days; this difference reversed after he became president.
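To make the reported statistics easier to read, the sketch below recomputes the Welch t statistic and its degrees of freedom by hand and compares them with `t.test`. The two samples are simulated (an assumption for illustration; they are not the Brent data), and the names `no_tweet` and `tweet` are hypothetical.

```r
set.seed(1)
# Simulated prices on days without and with an oil tweet (synthetic data).
no_tweet <- rnorm(200, mean = 55, sd = 12)
tweet    <- rnorm(60,  mean = 62, sd = 12)

n1 <- length(no_tweet); n2 <- length(tweet)
v1 <- var(no_tweet) / n1   # squared standard error of the first mean
v2 <- var(tweet) / n2      # squared standard error of the second mean

# Welch statistic: difference in means over the unpooled standard error.
t_manual  <- (mean(no_tweet) - mean(tweet)) / sqrt(v1 + v2)
# Welch-Satterthwaite approximation of the degrees of freedom.
df_manual <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

res <- t.test(no_tweet, tweet)   # Welch test is the default (var.equal = FALSE)
c(manual = t_manual, t_test = unname(res$statistic))
c(manual = df_manual, t_test = unname(res$parameter))
```

As in the outputs above, the statistic is the mean of the first group minus the mean of the second, so a negative t means the second group (oil tweet days) has the higher mean. The test compares price levels between two sets of days and does not by itself say which way any influence runs.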
It is also interesting to evaluate whether President Trump's oil tweeting patterns are related to the price level, and to determine whether he tends to tweet about oil when prices are higher or lower.
data.frame(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword)) %>%
filter(Var2==TRUE) %>%
select(.,-Var2) %>%
dplyr::rename(.,Price_range=Var1,Tweets=Freq) %>%
knitr::kable()
| Price_range | Tweets |
|---|---|
| (25.9,60.1] | 157 |
| (60.1,94.1] | 196 |
| (94.1,128] | 306 |
data.frame(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword)) %>%
filter(Var2==TRUE) %>%
select(.,-Var2) %>%
dplyr::rename(.,Price_range=Var1,Tweets=Freq) %>%
ggplot(aes(x=Price_range,y=Tweets,fill=Price_range))+geom_bar(stat="identity")+
labs(y="Trump Oil Tweets",
title = "Trump Tweets distribution by price oil range",
x="Oil price Ranges")+ggthemes::theme_tufte()
Higher price ranges seem to attract more oil-related tweets. To test whether there is a dependency between Donald Trump's oil tweets and prices, we use a chi-square test of independence.
chisq.test(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword))
##
## Pearson's Chi-squared test
##
## data: table(cut(trump_oil$Brent, 3), trump_oil$oil_keyword)
## X-squared = 37.005, df = 2, p-value = 9.215e-09
data.frame(chisq.test(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword))$expected) %>%
select(.,TRUE.) %>% mutate(Price_range=rownames(.)) %>% dplyr::rename(.,Chisq_Expected_Tweets=TRUE.) %>%
select(.,Price_range,Chisq_Expected_Tweets) %>% knitr::kable()
| Price_range | Chisq_Expected_Tweets |
|---|---|
| (25.9,60.1] | 185.8452 |
| (60.1,94.1] | 231.9183 |
| (94.1,128] | 241.2364 |
data.frame(chisq.test(table(cut(trump_oil$Brent,3),trump_oil$oil_keyword))$expected) %>%
select(.,TRUE.) %>% mutate(Price_range=rownames(.)) %>% dplyr::rename(.,Tweets=TRUE.) %>%
select(.,Price_range,Tweets) %>%
ggplot(aes(x=Price_range,y=Tweets,fill=Price_range))+geom_bar(stat="identity")+
labs(y="Expected Tweets by Chi-square Dist.",
title = "Chi-square expected tweets distribution by price oil range",
x="Oil price Ranges")+ggthemes::theme_tufte()
The chi-square independence test shows that there is indeed a dependency between price range and oil tweets: the observed distribution of Trump's oil tweets differs significantly from the distribution expected under independence. Trump has more oil-related tweets than expected in the highest price bin (306 observed versus about 241 expected) and fewer than expected in the lowest price bin (157 versus about 186), which is consistent with the assumption that his oil tweeting is a reaction to high oil prices.
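The expected counts used by the test come from the row and column totals alone. The sketch below shows that calculation on a small made-up contingency table (the counts are invented for illustration and are not the tweet data) and compares it with what `chisq.test` returns.

```r
# Made-up 3 x 2 table: price bin (rows) by oil tweet flag (columns).
obs <- matrix(c(80, 70, 50,    # days without an oil tweet
                20, 30, 50),   # days with an oil tweet
              nrow = 3,
              dimnames = list(Price_range = c("low", "mid", "high"),
                              oil_keyword = c("FALSE", "TRUE")))

# Under independence: expected = row total * column total / grand total.
expected_manual <- outer(rowSums(obs), colSums(obs)) / sum(obs)

res <- chisq.test(obs)
res$expected    # same values as expected_manual
res$residuals   # Pearson residuals: (observed - expected) / sqrt(expected)
```

A positive residual in the high-price, oil-tweet cell means more oil tweets than independence would predict, which is the same reading applied to the observed and expected tables above.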
So far, high oil prices and oil-related tweets seem to be correlated. Although this correlation is interesting, it may be more informative to evaluate the behavior of prices before and after an oil-related tweet and to analyze the effect these tweets might have on the market for these commodities. This evaluation is done for two different periods: before and after the year in which Donald Trump became president.
trump_oil %>%
filter(Period=="Before") %>%
group_by(LAG) %>%
summarise(n(),BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>%
arrange(.,desc(BrentP)) %>% knitr::kable()
| LAG | n() | BrentP | WTIP | GasP | BrentSD | WTISD | GasSD |
|---|---|---|---|---|---|---|---|
| Zero | 478 | 97.69971 | 87.38193 | 3.338578 | 23.82141 | 20.00907 | 0.8337834 |
| One | 301 | 95.64033 | 85.49123 | 3.439340 | 24.38514 | 20.55360 | 0.8504588 |
| Seven | 57 | 94.42771 | 84.90042 | 3.766939 | 21.53348 | 18.14230 | 0.9737475 |
| Two | 234 | 93.21115 | 83.19205 | 3.493853 | 25.13038 | 21.08731 | 0.8138404 |
| Five | 104 | 93.12094 | 82.46556 | 3.357937 | 24.80643 | 20.34692 | 0.8372115 |
| Six | 70 | 92.75982 | 82.64815 | 3.594444 | 24.24057 | 19.78971 | 0.8188963 |
| Four_before | 21 | 91.62000 | 81.92800 | 4.039000 | 23.90184 | 17.76233 | 1.2405863 |
| One_before | 48 | 91.29108 | 81.72351 | 3.635676 | 24.36614 | 19.95510 | 0.7549523 |
| Three | 187 | 90.72584 | 81.15373 | 3.462843 | 26.08764 | 22.06054 | 0.9106416 |
| Four | 140 | 90.43564 | 80.95228 | 3.495570 | 25.51448 | 21.36600 | 0.8001054 |
| Five_before | 18 | 90.36417 | 79.63250 | 3.629167 | 26.29059 | 20.49569 | 0.9319721 |
| Three_before | 28 | 90.17294 | 81.70882 | 4.096471 | 23.27960 | 18.68586 | 0.8064346 |
| Six_before | 15 | 89.41000 | 83.12583 | 3.818462 | 26.40386 | 18.28130 | 1.0000237 |
| Two_before | 34 | 86.52118 | 78.21941 | 3.835294 | 25.41401 | 20.81788 | 0.9021233 |
| Seven_before | 11 | 84.75909 | 78.07727 | 3.829091 | 26.18512 | 22.05246 | 0.8240079 |
| No_oil_Tweet | 690 | 82.59109 | 80.11140 | 4.131186 | 17.38190 | 12.55725 | 0.7677614 |
trump_oil %>%
filter(Period=="After") %>%
group_by(LAG) %>%
summarise(n(),BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>%
arrange(.,desc(BrentP)) %>% knitr::kable()
| LAG | n() | BrentP | WTIP | GasP | BrentSD | WTISD | GasSD |
|---|---|---|---|---|---|---|---|
| Zero | 365 | 62.21000 | 57.16647 | 2.971037 | 11.88880 | 10.077184 | 0.5043826 |
| One | 217 | 60.36236 | 55.95943 | 2.942937 | 11.97314 | 9.635169 | 0.4686222 |
| Two | 136 | 58.18175 | 54.27134 | 2.902165 | 12.45402 | 10.238861 | 0.5619909 |
| Three | 90 | 56.77147 | 53.13015 | 2.848985 | 12.23827 | 9.865995 | 0.5954093 |
| Four | 67 | 54.33196 | 51.06311 | 2.766739 | 12.67375 | 10.283296 | 0.6225648 |
| Six_before | 14 | 53.85000 | 51.19500 | 2.929000 | 11.72331 | 8.500851 | 0.5695700 |
| Six | 47 | 53.68088 | 50.93281 | 2.729412 | 12.20876 | 9.668582 | 0.4767182 |
| Seven | 44 | 53.56531 | 50.94687 | 2.811562 | 12.47703 | 9.628227 | 0.4778589 |
| Five_before | 20 | 53.29438 | 50.36688 | 2.919375 | 11.08845 | 8.693283 | 0.6074588 |
| One_before | 37 | 52.76042 | 50.14958 | 2.733200 | 14.25078 | 11.376127 | 0.5622701 |
| Four_before | 21 | 52.47467 | 49.32400 | 2.874000 | 12.30110 | 10.462808 | 0.5568637 |
| Five | 54 | 51.52486 | 49.33658 | 2.663784 | 10.94321 | 8.689866 | 0.6097602 |
| No_oil_Tweet | 71 | 50.90765 | 48.13469 | 2.782353 | 13.23145 | 11.053217 | 0.4635972 |
| Two_before | 30 | 50.73409 | 48.72500 | 2.707391 | 11.67308 | 9.810139 | 0.5151983 |
| Seven_before | 12 | 50.70429 | 49.20857 | 2.715714 | 16.50307 | 13.223657 | 0.6521211 |
| Three_before | 23 | 50.23778 | 47.48000 | 2.723889 | 10.12455 | 8.993649 | 0.5911021 |
The summary tables show some interesting results on the relationship between tweets and prices. However, the data are easier to analyze graphically, one commodity at a time.
#Brent oil prices behavior
trump_oil %>% group_by(Period,LAG) %>%
summarise(BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>%
arrange(.,desc(BrentP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=BrentP,color=Period))+
geom_point(size=7,alpha=0.5)+
scale_color_manual(values=c("blue","red"))+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=BrentP-BrentSD, ymax=BrentP+BrentSD), width=.2,
position=position_dodge(0.05),color="black")+
labs(title = "Brent Oil Price vs Trump Oil tweets",
x="Lag (days)",
y="Brent Oil spot prices (USD per barrel)")+
ggthemes::theme_igray()
For Brent prices the graph shows various interesting points:
trump_oil %>% group_by(Period,LAG) %>%
summarise(BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>%
arrange(.,desc(BrentP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=WTIP,color=Period))+
scale_color_manual(values=c("darkgreen","purple"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=WTIP-WTISD, ymax=WTIP+WTISD), width=.2,
position=position_dodge(0.05),color="black")+
labs(title = "WTI Oil Price vs Trump Oil tweets",
x="Lag (days)",
y="WTI Oil spot prices (USD per barrel)")+
ggthemes::theme_igray()
WTI oil prices show a trend similar to that of Brent oil prices.
trump_oil %>% group_by(Period,LAG) %>%
summarise(BrentP=mean(Brent,na.rm=T),WTIP=mean(WTI,na.rm=T),GasP=mean(Gas,na.rm=T),
BrentSD=sd(Brent,na.rm=T),WTISD=sd(WTI,na.rm=T),GasSD=sd(Gas,na.rm=T)) %>%
arrange(.,desc(BrentP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=GasP,color=Period))+
scale_color_manual(values=c("thistle4","steelblue4"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=GasP-GasSD, ymax=GasP+GasSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Henry Hub gas price vs Trump Oil tweets",
x="Lag (days)",
y="Henry Hub gas spot prices (USD per TCF)")+
ggthemes::theme_igray()
Henry Hub gas prices do not seem to show a clear difference in behavior between the periods before and after Trump's presidency.
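The plotting code in this section repeats the same `case_when` recoding of the `LAG` labels for every chart. As a possible simplification, the same mapping can be held in a named lookup vector and applied once. The helper name `lag_to_num` is hypothetical; the label-to-number pairs are taken from the code above, including the choice of -8 for `No_oil_Tweet`.

```r
# Lookup from LAG label to numeric day offset (values as in the case_when calls).
lag_lookup <- c(
  Seven_before = -7, Six_before = -6, Five_before = -5, Four_before = -4,
  Three_before = -3, Two_before = -2, One_before = -1,
  Zero = 0, One = 1, Two = 2, Three = 3, Four = 4, Five = 5, Six = 6, Seven = 7,
  No_oil_Tweet = -8   # placed left of -7 on the axis, as in the original code
)

# Hypothetical helper: convert a vector of LAG labels to numeric offsets.
lag_to_num <- function(lag) unname(lag_lookup[as.character(lag)])

lag_to_num(c("Zero", "Three", "Two_before", "No_oil_Tweet"))
# → 0 3 -2 -8
```

With such a helper, each plot could start from `mutate(Lag_num = lag_to_num(LAG))` instead of repeating the sixteen-line recoding.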
#Exxon Mobil stock prices behavior
trump_oil %>% group_by(Period,LAG) %>%
summarise(ExxonP=mean(Exxon,na.rm=T),ChevronP=mean(Chevron,na.rm=T),ConocoP=mean(Conoco,na.rm=T),
ExxonSD=sd(Exxon,na.rm=T),ChevronSD=sd(Chevron,na.rm=T),ConocoSD=sd(Conoco,na.rm=T)) %>%
arrange(.,desc(ExxonP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=ExxonP,color=Period))+
scale_color_manual(values=c("blue","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=ExxonP-ExxonSD, ymax=ExxonP+ExxonSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Exxon Mobil stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Exxon Mobil stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(ExxonP=mean(Exxon,na.rm=T),ChevronP=mean(Chevron,na.rm=T),ConocoP=mean(Conoco,na.rm=T),
ExxonSD=sd(Exxon,na.rm=T),ChevronSD=sd(Chevron,na.rm=T),ConocoSD=sd(Conoco,na.rm=T)) %>%
arrange(.,desc(ExxonP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=ChevronP,color=Period))+
scale_color_manual(values=c("darkred","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=ChevronP-ChevronSD, ymax=ChevronP+ChevronSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Chevron stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Chevron stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(ExxonP=mean(Exxon,na.rm=T),ChevronP=mean(Chevron,na.rm=T),ConocoP=mean(Conoco,na.rm=T),
ExxonSD=sd(Exxon,na.rm=T),ChevronSD=sd(Chevron,na.rm=T),ConocoSD=sd(Conoco,na.rm=T)) %>%
arrange(.,desc(ExxonP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=ConocoP,color=Period))+
scale_color_manual(values=c("steelblue","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=ConocoP-ConocoSD, ymax=ConocoP+ConocoSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Conoco stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Conoco stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>%
arrange(.,desc(TotalP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=ShellP,color=Period))+
scale_color_manual(values=c("orange","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=ShellP-ShellSD, ymax=ShellP+ShellSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Shell stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Shell stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>%
arrange(.,desc(TotalP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=TotalP,color=Period))+
scale_color_manual(values=c("purple","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=TotalP-TotalSD, ymax=TotalP+TotalSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Total stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Total stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>%
arrange(.,desc(TotalP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=BPP,color=Period))+
scale_color_manual(values=c("green4","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=BPP-BPSD, ymax=BPP+BPSD), width=.2,
position=position_dodge(0.05))+
labs(title = "BP stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="BP stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(TotalP=mean(Total,na.rm=T),ShellP=mean(Shell,na.rm=T),BPP=mean(BP,na.rm=T),EniP=mean(Eni,na.rm=T),
TotalSD=sd(Total,na.rm=T),ShellSD=sd(Shell,na.rm=T),BPSD=sd(BP,na.rm=T),EniSD=sd(Eni,na.rm=T)) %>%
arrange(.,desc(TotalP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=EniP,color=Period))+
scale_color_manual(values=c("black","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=EniP-EniSD, ymax=EniP+EniSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Eni stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Eni stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(SlbP=mean(Slb,na.rm=T),HallP=mean(Hall,na.rm=T),BakerP=mean(Baker,na.rm=T),
SlbSD=sd(Slb,na.rm=T),HallSD=sd(Hall,na.rm=T),BakerSD=sd(Baker,na.rm=T)) %>%
arrange(.,desc(SlbP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=SlbP,color=Period))+
scale_color_manual(values=c("darkblue","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=SlbP-SlbSD, ymax=SlbP+SlbSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Schlumberger stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Schlumberger stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(SlbP=mean(Slb,na.rm=T),HallP=mean(Hall,na.rm=T),BakerP=mean(Baker,na.rm=T),
SlbSD=sd(Slb,na.rm=T),HallSD=sd(Hall,na.rm=T),BakerSD=sd(Baker,na.rm=T)) %>%
arrange(.,desc(SlbP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=HallP,color=Period))+
scale_color_manual(values=c("darkred","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=HallP-HallSD, ymax=HallP+HallSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Halliburton stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Halliburton stock price (USD)")+
ggthemes::theme_igray()
trump_oil %>% group_by(Period,LAG) %>%
summarise(SlbP=mean(Slb,na.rm=T),HallP=mean(Hall,na.rm=T),BakerP=mean(Baker,na.rm=T),
SlbSD=sd(Slb,na.rm=T),HallSD=sd(Hall,na.rm=T),BakerSD=sd(Baker,na.rm=T)) %>%
arrange(.,desc(SlbP)) %>%
mutate(Lag_num=case_when(LAG=="Zero" ~ 0,
LAG=="One" ~ 1,
LAG=="Two" ~ 2,
LAG=="Three" ~ 3,
LAG=="Four" ~ 4,
LAG=="Five" ~ 5,
LAG=="Six" ~ 6,
LAG=="Seven" ~ 7,
LAG=="One_before" ~ -1,
LAG=="Two_before" ~ -2,
LAG=="Three_before" ~ -3,
LAG=="Four_before" ~ -4,
LAG=="Five_before" ~ -5,
LAG=="Six_before" ~ -6,
LAG=="Seven_before" ~ -7,
LAG=="No_oil_Tweet" ~ -8,))%>%
ggplot(.,aes(x=as.factor(Lag_num),y=BakerP,color=Period))+
scale_color_manual(values=c("steelblue4","red"))+
geom_point(size=7,alpha=0.5)+
facet_grid(.~Period)+
geom_errorbar(aes(ymin=BakerP-BakerSD, ymax=BakerP+BakerSD), width=.2,
position=position_dodge(0.05))+
labs(title = "Baker Hughes Int stock price NYSE vs Trump Oil tweets",
x="Lag (days)",
y="Baker Hughes Int stock price (USD)")+
ggthemes::theme_igray()
#Anova test for Brent oil prices and Lags, after trump presidency
trump_oil %>% filter(Period=="After") %>%
aov(Brent~LAG,data = .) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## LAG 15 15322 1021.5 6.888 2.79e-14 ***
## Residuals 851 126200 148.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 381 observations deleted due to missingness
#Anova test for WTI oil prices and Lags, after trump presidency
trump_oil %>% filter(Period=="After") %>%
aov(WTI~LAG,data = .) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## LAG 15 8878 591.9 5.925 7.99e-12 ***
## Residuals 837 83602 99.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 395 observations deleted due to missingness
#Anova test for Henry hub gas prices and Lags, after trump presidency
trump_oil %>% filter(Period=="After") %>%
aov(Gas~LAG,data = .) %>% summary()
## (The output recorded for this chunk came from a model fitted on WTI, not Gas,
## and repeated the WTI table above; no Gas output is available.)
The ANOVA tests show that there is indeed a significant difference between the lag groups for Brent and WTI; the Henry Hub gas test was not actually run on gas prices in the recorded output, so that comparison is unconfirmed. The next step is to test each of the future lags individually (from one to seven days after an oil tweet) against day zero for each commodity price, correcting the resulting p-values with a Bonferroni factor of 15, since there are 16 different lag categories (seven days before, seven days after, day zero and the no-oil-tweet category). These tests are conducted first on the data for the period after Donald Trump became president.
| pval_Brent | adj_pval_Brent | pval_WTI | adj_pval_WTI | pval_Gas | adj_pval_Gas | Lag |
|---|---|---|---|---|---|---|
| 0.1413949 | 1.0000000 | 0.2473435 | 1.0000000 | 0.5813290 | 1.0000000 | 1 |
| 0.0069585 | 0.1043768 | 0.0192674 | 0.2890112 | 0.2958014 | 1.0000000 | 2 |
| 0.0014873 | 0.0223101 | 0.0037113 | 0.0556697 | 0.1241664 | 1.0000000 | 3 |
| 0.0002378 | 0.0035673 | 0.0005188 | 0.0077815 | 0.0403541 | 0.6053113 | 4 |
| 0.0000014 | 0.0000217 | 0.0000055 | 0.0000820 | 0.0055702 | 0.0835532 | 5 |
| 0.0004196 | 0.0062938 | 0.0014855 | 0.0222828 | 0.0086880 | 0.1303196 | 6 |
| 0.0006561 | 0.0098421 | 0.0014634 | 0.0219514 | 0.0855699 | 1.0000000 | 7 |
At the 0.05 level, the Bonferroni-adjusted differences are significant for Brent at lags 3 to 7 and for WTI at lags 4 to 7; at these lags prices tend to be lower than on the day of an oil tweet. Henry Hub gas prices do not show a significant adjusted difference at any of the tested lags.
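The adjustment itself can be reproduced from the raw p-values in the table. The sketch below applies `p.adjust` with the factor of 15 to the Brent column; the code that produced the raw p-values is not shown in this document, so only the correction step is illustrated, and the small differences in the last digits come from the rounding of the printed p-values.

```r
# Raw Brent p-values for lags 1 to 7, copied from the table above.
p_brent <- c(0.1413949, 0.0069585, 0.0014873, 0.0002378,
             0.0000014, 0.0004196, 0.0006561)

# Bonferroni: multiply each p-value by the number of comparisons (capped at 1).
adj_brent <- p.adjust(p_brent, method = "bonferroni", n = 15)
round(adj_brent, 4)

# Lags whose adjusted p-value is below 0.05.
which(adj_brent < 0.05)
# → 3 4 5 6 7
```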
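To test whether the tweets lead price changes, a linear Granger causality test is used at different lags. The test is set up in the direction of Trump's oil-related tweets Granger-causing prices, starting with Brent oil prices; the tables below report the resulting p-values for each series and lag.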
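The idea behind a linear Granger test can be sketched in base R as an F-test between two nested regressions: one that predicts the price from its own lags only, and one that also includes lagged tweets. This is an illustration under stated assumptions, not the code used for the tables in this document: the series are simulated, and `granger_p` is a hypothetical helper. Packages such as lmtest offer a ready-made `grangertest` function for the same purpose.

```r
set.seed(42)
n <- 300
# Simulated data: a daily tweet indicator that feeds into next-day price changes.
tweet <- rbinom(n, 1, 0.3)
price <- numeric(n)
for (i in 2:n) price[i] <- 0.5 * price[i - 1] + 2 * tweet[i - 1] + rnorm(1)

# Hypothetical helper: p-value of H0 "lags of x do not help predict y".
granger_p <- function(y, x, order = 1) {
  n <- length(y)
  # Matrix whose k-th column is the series lagged by k steps.
  lagmat <- function(v) {
    do.call(cbind, lapply(seq_len(order),
                          function(k) v[(order + 1 - k):(n - k)]))
  }
  y_now  <- y[(order + 1):n]
  y_lags <- lagmat(y)
  x_lags <- lagmat(x)
  restricted   <- lm(y_now ~ y_lags)            # own lags only
  unrestricted <- lm(y_now ~ y_lags + x_lags)   # own lags plus lags of x
  anova(restricted, unrestricted)$`Pr(>F)`[2]
}

granger_p(price, tweet, order = 1)   # small: lagged tweets help predict price
granger_p(price, tweet, order = 3)
```

A small p-value means that past tweets improve the prediction of the price beyond what past prices already provide, which is a statement about predictive precedence rather than proof of causation.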
After Presidency
| Lag | Brent | WTI | Gas | Exxon | Chevron | Conoco | Shell | BP | Eni | Total | Slb | Hall | Baker |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3788865 | 0.1218394 | 0.5391637 | 0.7199238 | 0.6872522 | 0.1428166 | 0.5577220 | 0.7865204 | 0.9888505 | 0.7244813 | 0.1560236 | 0.2035646 | 0.9206718 |
| 2 | 0.0950934 | 0.2839154 | 0.4376983 | 0.7052023 | 0.6322892 | 0.0284113 | 0.5139555 | 0.3062810 | 0.5423909 | 0.2431858 | 0.3634272 | 0.4220438 | 0.5971868 |
| 3 | 0.1357214 | 0.2478331 | 0.6245419 | 0.4798047 | 0.8822479 | 0.0691592 | 0.3676087 | 0.4282354 | 0.7370478 | 0.3966321 | 0.2937457 | 0.3998610 | 0.8267625 |
| 4 | 0.1958477 | 0.3658168 | 0.7586976 | 0.6193188 | 0.9247578 | 0.1193007 | 0.5309001 | 0.3108296 | 0.5337400 | 0.3535148 | 0.4654277 | 0.2694358 | 0.9149002 |
| 5 | 0.2090139 | 0.2045971 | 0.9151566 | 0.7596635 | 0.6887044 | 0.1790673 | 0.7167755 | 0.4751900 | 0.7546911 | 0.4095972 | 0.5684025 | 0.3027925 | 0.6783316 |
| 6 | 0.0100615 | 0.0405328 | 0.9951684 | 0.8147781 | 0.7617168 | 0.2638261 | 0.7068435 | 0.5510556 | 0.8500280 | 0.4695312 | 0.5155986 | 0.3409791 | 0.5972505 |
| 7 | 0.0084340 | 0.0662446 | 0.9766937 | 0.7017936 | 0.7404184 | 0.3811698 | 0.8082257 | 0.6795385 | 0.8982271 | 0.5402812 | 0.6159437 | 0.3914021 | 0.6949052 |
| 8 | 0.0068809 | 0.0126593 | 0.9682278 | 0.8115574 | 0.8166166 | 0.4509790 | 0.7586276 | 0.6399768 | 0.7410641 | 0.3648474 | 0.7123440 | 0.5079508 | 0.6580017 |
| 9 | 0.0061314 | 0.0147441 | 0.9608297 | 0.8817206 | 0.8364847 | 0.4022920 | 0.7094930 | 0.4068686 | 0.7934637 | 0.2737906 | 0.6426183 | 0.4471208 | 0.4588495 |
| 10 | 0.0117433 | 0.0125890 | 0.9581397 | 0.7117214 | 0.7015058 | 0.4207974 | 0.7125026 | 0.3711448 | 0.8021638 | 0.3161656 | 0.6036470 | 0.4542016 | 0.3877114 |
Before Presidency
| Lag | Brent | WTI | Gas | Exxon | Chevron | Conoco | Shell | BP | Eni | Total | Slb | Hall | Baker |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.7127004 | 0.2624718 | 0.1372648 | 0.6894791 | 0.8750065 | 0.3465814 | 0.9700636 | 0.7972095 | 0.6217511 | 0.4091932 | 0.9176655 | 0.5162742 | 0.9226097 |
| 2 | 0.5816320 | 0.4909198 | 0.2028392 | 0.5309841 | 0.6555008 | 0.5126192 | 0.5311177 | 0.9629061 | 0.2981969 | 0.3878005 | 0.9733439 | 0.8122758 | 0.6598192 |
| 3 | 0.4817415 | 0.5459858 | 0.2772608 | 0.7543144 | 0.8276508 | 0.6809267 | 0.5951480 | 0.9763280 | 0.4703933 | 0.6083229 | 0.9923801 | 0.9525637 | 0.7464839 |
| 4 | 0.6498627 | 0.6041440 | 0.3268328 | 0.8979539 | 0.9237661 | 0.7981982 | 0.2347643 | 0.6835916 | 0.5006487 | 0.6957364 | 0.9372630 | 0.8361698 | 0.7934268 |
| 5 | 0.4851206 | 0.5355252 | 0.5362841 | 0.8610170 | 0.9558771 | 0.8516791 | 0.3456232 | 0.8049645 | 0.6390949 | 0.8198664 | 0.9568620 | 0.7705664 | 0.7200738 |
| 6 | 0.5439312 | 0.5831218 | 0.6403075 | 0.9158071 | 0.9655368 | 0.8967668 | 0.3505623 | 0.7255840 | 0.7754368 | 0.9016949 | 0.9820515 | 0.8383852 | 0.8260450 |
| 7 | 0.5890755 | 0.5354390 | 0.4975317 | 0.4074147 | 0.6865789 | 0.8361490 | 0.2499637 | 0.7680803 | 0.7302268 | 0.6814512 | 0.8528792 | 0.2979350 | 0.8744994 |
| 8 | 0.6923527 | 0.5936402 | 0.5955404 | 0.4147502 | 0.7743958 | 0.8793135 | 0.3275968 | 0.7657244 | 0.7208178 | 0.5861303 | 0.8916764 | 0.3422696 | 0.8054885 |
| 9 | 0.7071684 | 0.6727827 | 0.6726226 | 0.3645072 | 0.6789623 | 0.8541175 | 0.4379858 | 0.8357645 | 0.7734071 | 0.7040570 | 0.8648996 | 0.3589178 | 0.8710504 |
| 10 | 0.7770769 | 0.5864383 | 0.6684431 | 0.4137133 | 0.5438431 | 0.9021884 | 0.5167547 | 0.7648746 | 0.7629527 | 0.6736278 | 0.9090112 | 0.4397246 | 0.9101266 |
bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After")) %>%
select(Lag,Brent,WTI,Gas,Period) %>%
tidyr::gather(Brent,WTI,Gas,key="Commodity",value="Pvalue") %>%
ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Commodity,shape=Commodity))+
scale_color_manual(values=c("Blue","Red","Black"))+
geom_point(size=3,alpha=0.8)+
geom_line(aes(x=Lag,y=Pvalue,color=Commodity),alpha=0.8)+
ylim(0,1)+
geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
labs(title = "Granger causality test results commodity prices as function of trump oil tweets",
x="Lag (Days)",
y="p-value of Granger causality test")+
facet_grid(.~Period)+
ggthemes::theme_igray()
From the Granger causality test results, the most notable difference is that before Donald Trump became president (before 2016) there is no lag between 1 and 10 days at which Granger causality of tweets on commodity prices could be claimed. After he became president, however, the higher lags (above 5) become significant at the 5% level for Brent and WTI oil (with the exception of WTI at lag 7, p = 0.066), so Granger causality can be claimed there. For Henry Hub gas prices there is no Granger causality at any tested lag.
bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After")) %>%
select(Lag,Exxon,Chevron,Conoco,Period) %>%
tidyr::gather(Exxon,Chevron,Conoco,key="Stock",value="Pvalue") %>%
ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Stock,shape=Stock))+
scale_color_manual(values=c("Blue","Red","Black"))+
geom_point(size=3,alpha=0.8)+
geom_line(aes(x=Lag,y=Pvalue,color=Stock),alpha=0.8)+
ylim(0,1)+
geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
labs(title = "Granger causality test results Stock prices as function of trump oil tweets",
x="Lag (Days)",
y="p-value of Granger causality test")+
facet_grid(.~Period)+
ggthemes::theme_igray()
bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After")) %>%
select(Lag,Shell,BP,Eni,Total,Period) %>%
tidyr::gather(Shell,BP,Eni,Total,key="Stock",value="Pvalue") %>%
ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Stock,shape=Stock))+
scale_color_manual(values=c("Blue","Red","Black","darkgreen"))+
geom_point(size=3,alpha=0.8)+
geom_line(aes(x=Lag,y=Pvalue,color=Stock),alpha=0.8)+
ylim(0,1)+
geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
labs(title = "Granger causality test results Stock prices as function of trump oil tweets",
x="Lag (Days)",
y="p-value of Granger causality test")+
facet_grid(.~Period)+
ggthemes::theme_igray()
bind_rows(Granger_causal_test_before,Granger_causal_test_after,.id = "Period") %>%
mutate(Period=case_when(Period==1 ~ "Before", Period==2 ~ "After")) %>%
select(Lag,Slb,Hall,Baker,Period) %>%
tidyr::gather(Slb,Hall,Baker,key="Stock",value="Pvalue") %>%
ggplot(.,aes(x=as.factor(Lag),y=Pvalue,color=Stock,shape=Stock))+
scale_color_manual(values=c("Blue","Red","Black"))+
geom_point(size=3,alpha=0.8)+
geom_line(aes(x=Lag,y=Pvalue,color=Stock),alpha=0.8)+
ylim(0,1)+
geom_hline(yintercept = 0.05,color="black",linetype = 'dashed')+
geom_hline(yintercept = 0.1,color="red",linetype = 'dashed')+
labs(title = "Granger causality test results Stock prices as function of trump oil tweets",
x="Lag (Days)",
y="p-value of Granger causality test")+
facet_grid(.~Period)+
ggthemes::theme_igray()
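Consistent, significant Granger causality appears only for the oil commodities. There is no significant Granger causality for the stock prices of any major operator or service company, with the exception of ConocoPhillips after the presidency at a 2-day lag (p = 0.028) and, at the 10% level only, at a 3-day lag (p = 0.069).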
load("trump_tweets_all.RData")
dim(trump_tweets)
## [1] 38454 7
names(trump_tweets)
## [1] "Tweets" "date" "RT"
## [4] "Month" "Day" "Year"
## [7] "oil_related_word"
This dataset is composed of 38454 tweets by Donald Trump, from 1 January 2009 to 31 May 2019.
trump_tweets_oil<-trump_tweets %>% filter(oil_related_word==TRUE) %>%
select(Tweets) %>%
unnest_tokens(word, Tweets) %>%
mutate(word = str_extract(word, "[a-z']+")) %>%
anti_join(stop_words)
## Joining, by = "word"
trump_tweets_oil<-trump_tweets_oil[complete.cases(trump_tweets_oil),]
trump_tweets_no_oil<-trump_tweets %>% filter(oil_related_word==FALSE) %>%
select(Tweets) %>%
unnest_tokens(word, Tweets) %>%
mutate(word = str_extract(word, "[a-z']+")) %>%
anti_join(stop_words)
## Joining, by = "word"
trump_tweets_no_oil<-trump_tweets_no_oil[complete.cases(trump_tweets_no_oil),]
bind_rows(mutate(trump_tweets_oil, author = "Oil"),
mutate(trump_tweets_no_oil, author = "No_Oil")) %>%
count(author, word) %>%
group_by(author) %>%
mutate(proportion = n / sum(n)) %>%
select(-n) %>%
spread(author,proportion) %>%
ggplot(.,aes(x = No_Oil, y = Oil,color=No_Oil)) +
geom_jitter(alpha = 0.1, size = 2.5, width = 0.3, height = 0.3)+
scale_x_log10(labels = percent_format()) +
scale_y_log10(labels = percent_format()) +
geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
geom_abline(color = "red", lty = 2) +
scale_color_gradient(limits = c(0, 0.001), low = "blue", high = "red") +
labs(y = "Frequency Oil related tweets", x = "Frequency No Oil related tweets")+
ggthemes::theme_tufte()+
# theme() goes after the complete theme, otherwise theme_tufte() resets legend.position
theme(legend.position="none")
Words such as collusion, prices, iraq, keystone and gallon tend to appear more often in oil-related tweets than in non-oil-related tweets.
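The comparison plotted above rests on per-group word proportions: tokenize, drop stop words, count, and divide by the group total. A base R sketch of that computation is shown below; the texts are made-up strings (not real tweets), the stop list is a tiny illustrative one, and `word_proportion()` is a hypothetical helper standing in for the tidytext pipeline used above.

```r
# Made-up example texts, not real tweets
oil_texts   <- c("oil prices are too high", "opec must lower oil prices")
other_texts <- c("great rally tonight", "prices of everything are fine")

# Tiny illustrative stop list (the analysis above uses tidytext's stop_words)
stop_list <- c("are", "too", "must", "of", "the")

# Hypothetical helper: share of each non-stop word within a group of texts
word_proportion <- function(texts, stop_list) {
  words  <- unlist(strsplit(tolower(texts), "[^a-z']+"))
  words  <- words[nzchar(words) & !(words %in% stop_list)]
  counts <- table(words)
  setNames(as.numeric(counts) / sum(counts), names(counts))
}

prop_oil   <- word_proportion(oil_texts, stop_list)
prop_other <- word_proportion(other_texts, stop_list)

# Words above the diagonal of the plot have prop_oil > prop_other
prop_oil["prices"]
prop_other["prices"]
```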
To perform the sentiment analysis, a predefined dictionary of already classified words is used, in this case the “bing” dictionary available through the tidytext package.
trump_tweets %>% unnest_tokens(word, Tweets) %>% inner_join(get_sentiments("bing")) %>% count(date,sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative,polarity=(positive - negative)/(positive + negative)) %>%
left_join(trump_oil,.) %>%
mutate(year=year(date)) %>%
group_by(Period,oil_keyword) %>%
summarise(positive=mean(positive,na.rm=TRUE),
negative=mean(negative,na.rm=TRUE),
sentiment=mean(sentiment,na.rm=TRUE),
polarity=mean(polarity,na.rm = TRUE)) %>% knitr::kable()
## Joining, by = "word"
## Joining, by = "date"
| Period | oil_keyword | positive | negative | sentiment | polarity |
|---|---|---|---|---|---|
| After | FALSE | 11.63658 | 7.121923 | 4.514654 | 0.3041282 |
| After | TRUE | 17.17808 | 13.663014 | 3.515068 | 0.1190517 |
| Before | FALSE | 15.69825 | 6.347368 | 9.350877 | 0.4532216 |
| Before | TRUE | 22.44979 | 9.797071 | 12.652720 | 0.3170058 |
trump_tweets %>% unnest_tokens(word, Tweets) %>% inner_join(get_sentiments("bing")) %>% count(date,sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative,polarity=(positive - negative)/(positive + negative)) %>%
left_join(trump_oil,.) %>%
mutate(year=year(date)) %>%
group_by(oil_keyword,year) %>%
summarise(positive=mean(positive,na.rm=TRUE),
negative=mean(negative,na.rm=TRUE),
sentiment=mean(sentiment,na.rm=TRUE),
polarity=mean(polarity,na.rm = TRUE)) %>%
ggplot(.,aes(x=as.factor(year),y=polarity,fill=oil_keyword))+
geom_col(color="blue")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
facet_wrap(.~oil_keyword)+
labs(title = "Trump Tweets polarity",
x="Year",
y="Tweet Polarity")
## Joining, by = "word"
## Joining, by = "date"
The previous table and graph show that, in general, Trump's tweets have become more balanced over the years, with a polarity closer to zero, that is, a more even mix of positive and negative words. This holds especially for oil-related tweets, and even more so after he became US president.
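One detail helps when reading the polarity column: polarity is computed per day as (positive - negative) / (positive + negative) and then averaged, so it is not the same as applying the formula to the averaged counts in the same row. The toy counts below are hypothetical and only illustrate that difference.

```r
# Hypothetical daily counts of positive and negative words
daily <- data.frame(positive = c(10, 2, 6),
                    negative = c(0, 2, 18))

# Per-day scores, as in the mutate() step above
daily$sentiment <- daily$positive - daily$negative
daily$polarity  <- daily$sentiment / (daily$positive + daily$negative)

# What the table reports: the mean of the daily polarity values
mean_of_daily_polarity <- mean(daily$polarity)

# A different quantity: the formula applied to the mean counts
polarity_of_means <- (mean(daily$positive) - mean(daily$negative)) /
  (mean(daily$positive) + mean(daily$negative))

mean_of_daily_polarity
polarity_of_means
```

In this toy case the two quantities even differ in sign, because the per-day average gives every day the same weight regardless of how many sentiment words it contains.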
Using a more advanced dictionary such as “nrc”, it is possible to categorize the sentiment of the tweets even further.
trump_tweets %>% unnest_tokens(word, Tweets) %>% inner_join(get_sentiments("nrc")) %>% count(date,sentiment) %>%
spread(sentiment, n, fill = 0) %>%
left_join(trump_oil,.) %>%
mutate(year=year(date)) %>%
group_by(Period,oil_keyword) %>%
summarise(positive=mean(positive,na.rm=TRUE),
negative=mean(negative,na.rm=TRUE),
anger=mean(anger,na.rm=TRUE),
anticipation=mean(anticipation,na.rm = TRUE),
disgust=mean(disgust,na.rm = TRUE),
fear=mean(fear,na.rm=TRUE),
joy=mean(joy,na.rm=TRUE),
sadness=mean(sadness,na.rm=TRUE),
surprise=mean(surprise,na.rm=TRUE),
trust=mean(trust,na.rm=TRUE)) %>% knitr::kable()
## Joining, by = "word"
## Joining, by = "date"
| Period | oil_keyword | positive | negative | anger | anticipation | disgust | fear | joy | sadness | surprise | trust |
|---|---|---|---|---|---|---|---|---|---|---|---|
| After | FALSE | 11.83372 | 7.920375 | 4.373536 | 5.437939 | 2.688525 | 4.510539 | 4.485949 | 3.861827 | 3.296253 | 8.573771 |
| After | TRUE | 19.33973 | 14.989041 | 8.482192 | 8.980822 | 6.013699 | 8.342466 | 7.120548 | 7.008219 | 5.627397 | 14.945206 |
| Before | FALSE | 14.01532 | 6.791489 | 3.507234 | 7.145532 | 2.303830 | 3.661277 | 6.676596 | 3.616170 | 5.688511 | 9.171915 |
| Before | TRUE | 22.07741 | 10.751046 | 5.424686 | 10.719665 | 3.625523 | 5.790795 | 10.391213 | 5.535565 | 8.332636 | 14.395397 |
Oil-related tweets after Donald Trump became president are characterized by higher levels of negativity, anger, disgust, fear and sadness than oil-related tweets before his presidency.