We choose Eikon as the resource for our dataset. Eikon is a financial analysis and trading software created by Thomson Reuters (now part of Refinitiv). There is no official R package for Eikon, however, there are unofficial ones. We used a package called “eikonapir”, there is unfortunately not a lot of instructions of how to use it and some of the data was challaning to find. One can read more about Eikon here, and about the API here.
It is first of all necessary to download the packages below. The second step is to get your API key from the Eikon terminal.
#install.packages("usethis")
library(usethis)
#install.packages("devtools")
library(devtools)
#install.packages("tidyverse")
library(tidyverse)
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#install_github("ahmedmohamedali/eikonapir")
library(eikonapir)
eikonapir::set_proxy_port(9000L)
eikonapir::set_app_id('d10629fe740649939772b69970f322f48531ccdc')
We decided to look at static current market data from the 30 constituents in the Dow Jones Industrial Index. However, the data we were looking for were not available for United Healthcare, Travelers Companies, and JP Morgan. We substituted those with PepsiCo, Morgan Stanley, and Willis Towers Watson to create our own Dow 30*.
To analyze the stocks, we obtained different data points. We wanted to get a broad perspective, consequently, we chose one valuation metric (P/E ratio), one profitabilty mertric (operating margin), one measure of the size of the company (market capitalization), and one measure of the firms debt (debt to market capitalization).
The first three were obtained using the codes ‘TR.PE’, ‘TR.OperatingMargin’, and ‘TR.MarketCap’. To get the ratio of debt to market cap, we had to divide ‘TR.NetDebt’ by TR.MartketCap’.
dow30 <- list("AAPL.O","MSFT.O","PEP.O","V","JNJ","WMT","PG","XOM","HD","VZ","INTC.O","DIS","MS","KO","CVX","MRK","PFE","CSCO.O","BA","MCD","NKE","UTX","IBM","MMM","AXP", "CAT","GS", "WBA.O","DOW","WLTW.OQ")
fields_data <- list(TR_Field('TR.PE'),TR_Field('TR.OperatingMargin'),TR_Field('TR.NetDebt'), TR_Field('TR.CompanyMarketCap'))
stocks_frame <- get_data(dow30, fields_data)
print(stocks_frame)
## Instrument P/E (Daily Time Series Ratio) Operating Margin, Percent
## 1 GS 10.4504043184194 23.46847
## 2 MS 10.4634356192343 22.38758
## 3 IBM 12.388214613324 14.25035
## 4 PFE 13.5997115992996 22.76921
## 5 INTC.O 13.6329777515425 32.90989
## 6 CAT 14.2019069801296 15.15478
## 7 WBA.O 14.5895072784707 3.65175
## 8 VZ 15.2447329601817 18.07998
## 9 AXP 15.7281881814161 19.51231
## 10 PEP.O 16.2396348973733 15.63539
## 11 CSCO.O 17.2582348575669 27.39481
## 12 CVX 17.3763893138653 9.73187
## 13 XOM 20.2430213081363 7.4603
## 14 MMM 20.4460454901773 21.99603
## 15 DIS 21.0084288399733 21.59695
## 16 UTX 22.0292961878139 12.85845
## 17 AAPL.O 22.101535785066 24.57202
## 18 MRK 22.6145446040438 19.62217
## 19 HD 23.3926109372899 14.35265
## 20 JNJ 25.0442827153456 22.06274
## 21 MCD 25.2602083758083 41.96203
## 22 WMT 25.9206148529997 3.33531
## 23 WLTW.OQ 27.9025125972646 9.50311
## 24 DOW 29.257787843909 9.79628
## 25 KO 29.2818555556814 22.88109
## 26 MSFT.O 29.4414394215409 34.11632
## 27 NKE 33.4257053010554 12.1993
## 28 V 35.985785074328 65.28703
## 29 BA 55.6714695991556 11.85341
## 30 PG 73.8628270630128 8.6106
## Net Debt Incl. Pref.Stock & Min.Interest Company Market Cap
## 1 4.05288e+11 78044325588.45
## 2 4.0638e+11 79327482833.95
## 3 3.3947e+10 120083582387.86
## 4 2.3277e+10 204596503797.08
## 5 1.4709e+10 2.5317e+11
## 6 2.8736e+10 80876028353.58
## 7 1.6454e+10 55980180677.12
## 8 1.11883e+11 245334719277.88
## 9 1.29258e+11 98920759188.02
## 10 2.3412e+10 184037575909.24
## 11 -8.768e+09 205274054149.43
## 12 2.5202e+10 228719899860.48
## 13 4.1488e+10 293511827798.42
## 14 1.1516e+10 98241653900.2
## 15 5.5543e+10 249635105838.82
## 16 4.1658e+10 128100478006.6
## 17 7.49e+09 1163957699400
## 18 1.6431e+10 214168186025.04
## 19 2.7424e+10 255302084377.76
## 20 1.0294e+10 345696439608.65
## 21 30037700000 145557878049.28
## 22 5.3586e+10 338811119609.6
## 23 3.716e+09 24071382017.32
## 24 1.8268e+10 40900914119.8
## 25 3.0176e+10 221551049104.67
## 26 -5.5067e+10 1121968442239.26
## 27 -1184000000 139714183803.5
## 28 1.0117e+10 390544291110.38
## 29 5.354e+09 204225682631.04
## 30 2.1118e+10 297436962964.96
stocksData <- data.frame(stocks_frame)
We then had to convert the data to from the factor form to the numeric form.
as.numeric.factor <- function(x){as.numeric(levels(x))[x]}
stockPE<-as.numeric.factor(stocksData$P.E..Daily.Time.Series.Ratio.)
operatingMargin<-as.numeric.factor(stocksData$Operating.Margin..Percent)
stocksNetDebt<-as.numeric.factor(stocksData$Net.Debt.Incl..Pref.Stock...Min.Interest)
stocksMC<-as.numeric.factor(stocksData$Company.Market.Cap)
And create a new data frame.
stockName <- as.character(stocksData$Instrument)
debtRatio <- stocksNetDebt/stocksMC
dow30Data <- data.frame(stockName,stockPE,operatingMargin,stocksNetDebt,stocksMC,debtRatio)
print(dow30Data)
## stockName stockPE operatingMargin stocksNetDebt stocksMC
## 1 GS 10.45040 23.46847 405288000000 7.804433e+10
## 2 MS 10.46344 22.38758 406380000000 7.932748e+10
## 3 IBM 12.38821 14.25035 33947000000 1.200836e+11
## 4 PFE 13.59971 22.76921 23277000000 2.045965e+11
## 5 INTC.O 13.63298 32.90989 14709000000 2.531700e+11
## 6 CAT 14.20191 15.15478 28736000000 8.087603e+10
## 7 WBA.O 14.58951 3.65175 16454000000 5.598018e+10
## 8 VZ 15.24473 18.07998 111883000000 2.453347e+11
## 9 AXP 15.72819 19.51231 129258000000 9.892076e+10
## 10 PEP.O 16.23963 15.63539 23412000000 1.840376e+11
## 11 CSCO.O 17.25823 27.39481 -8768000000 2.052741e+11
## 12 CVX 17.37639 9.73187 25202000000 2.287199e+11
## 13 XOM 20.24302 7.46030 41488000000 2.935118e+11
## 14 MMM 20.44605 21.99603 11516000000 9.824165e+10
## 15 DIS 21.00843 21.59695 55543000000 2.496351e+11
## 16 UTX 22.02930 12.85845 41658000000 1.281005e+11
## 17 AAPL.O 22.10154 24.57202 7490000000 1.163958e+12
## 18 MRK 22.61454 19.62217 16431000000 2.141682e+11
## 19 HD 23.39261 14.35265 27424000000 2.553021e+11
## 20 JNJ 25.04428 22.06274 10294000000 3.456964e+11
## 21 MCD 25.26021 41.96203 30037700000 1.455579e+11
## 22 WMT 25.92061 3.33531 53586000000 3.388111e+11
## 23 WLTW.OQ 27.90251 9.50311 3716000000 2.407138e+10
## 24 DOW 29.25779 9.79628 18268000000 4.090091e+10
## 25 KO 29.28186 22.88109 30176000000 2.215510e+11
## 26 MSFT.O 29.44144 34.11632 -55067000000 1.121968e+12
## 27 NKE 33.42571 12.19930 -1184000000 1.397142e+11
## 28 V 35.98579 65.28703 10117000000 3.905443e+11
## 29 BA 55.67147 11.85341 5354000000 2.042257e+11
## 30 PG 73.86283 8.61060 21118000000 2.974370e+11
## debtRatio
## 1 5.193048911
## 2 5.122814761
## 3 0.282694764
## 4 0.113770273
## 5 0.058099301
## 6 0.355309238
## 7 0.293925454
## 8 0.456042261
## 9 1.306682248
## 10 0.127213151
## 11 -0.042713630
## 12 0.110187177
## 13 0.141350351
## 14 0.117221154
## 15 0.222496751
## 16 0.325197850
## 17 0.006434942
## 18 0.076720078
## 19 0.107417846
## 20 0.029777570
## 21 0.206362585
## 22 0.158158918
## 23 0.154374186
## 24 0.446640384
## 25 0.136203372
## 26 -0.049080703
## 27 -0.008474444
## 28 0.025904872
## 29 0.026216096
## 30 0.070999918
We created a subsets based on P/E ratio to see if there is a difference in profitabilty and indebtedness between “expensive” companies and “cheap” companies. The dow 30* was cut in half and created 2 subset with 15 companies each.
lowPE <- subset.data.frame(dow30Data,stockPE <= quantile(stockPE,.5))
highPE <- subset.data.frame(dow30Data,stockPE > quantile(stockPE,.5))
We calculated the basic statistics of the sample means and for the whole sample and structured them in a data frame. Furthermore, with the “-1” we ignore the companies RICs.
allCompaniesMeans <- lapply(dow30Data[,-1], mean)
lowPEmeans <- lapply(lowPE[,-1], mean)
highPEmeans <- lapply(highPE[,-1], mean)
dow30means <- data.frame(rbind(allCompaniesMeans,lowPEmeans,highPEmeans))
print(dow30means)
## stockPE operatingMargin stocksNetDebt stocksMC
## allCompaniesMeans 23.80211 19.63374 51258123333 250258683088
## lowPEmeans 15.52472 18.39998 87888333333 165050246637
## highPEmeans 32.0795 20.8675 14627913333 335467119538
## debtRatio
## allCompaniesMeans 0.5190332
## lowPEmeans 0.9238761
## highPEmeans 0.1141902
We created a graph to show the correlation between the P/E ratio and the Operating Margin. As one can see, there is not much correlation there.
plot(x = dow30Data$operatingMargin, y = dow30Data$stockPE, main = "Operating Margin vs PE ratio", xlab = "Operating Margin", ylab = "Stock PE")
Eikon as a database provides data in multiple forms. Whether you get your data in wide or long format depends on how you download the data. In our case the original data was downloaded in wide format*. We used the wide format as it was easier to use in our calculations.
*You can see this above, when looking at the print of “stocks_frame”
Here we first download new data of the opening prices for Apple and Microsoft from three different dates.
aaplMsft <- list("AAPL.O","MSFT.O")
fields_dataOpen <- list("TIMESTAMP","OPEN")
aaplMsftOpen <- get_timeseries(aaplMsft, fields_dataOpen, start_date = "2016-01-10T12:00:00", end_date = "2019-01-10T12:00:00", interval = "yearly")
dataOpen <- data.frame(aaplMsftOpen)
print(dataOpen)
## TIMESTAMP OPEN NA.
## 1 2016-12-31T00:00:00Z 102.61 AAPL.O
## 2 2017-12-31T00:00:00Z 115.8 AAPL.O
## 3 2018-12-31T00:00:00Z 170.16 AAPL.O
## 4 2016-12-31T00:00:00Z 54.32 MSFT.O
## 5 2017-12-31T00:00:00Z 62.79 MSFT.O
## 6 2018-12-31T00:00:00Z 86.125 MSFT.O
We can see now that the data is presented in tall format. Next we are going to transform the data into a wide format using spread() command.
wideData <- spread(dataOpen,key = TIMESTAMP,value = OPEN)
print(wideData)
## NA. 2016-12-31T00:00:00Z 2017-12-31T00:00:00Z 2018-12-31T00:00:00Z
## 1 AAPL.O 102.61 115.8 170.16
## 2 MSFT.O 54.32 62.79 86.125
Gapminder foundation is an independent and non-profit foundation founded in 2005 and based in Stockholm, Sweden. It is a resource based on facts about the global development and the millennium development goals. As quoted by gapminder.org, its mission is to fight “devastating ignorance with fact-based worldviews everyone can understand”. It also provides services in collaboration with universities, United Nations’ organizations, public agencies and non-governmental organizations.
The dataset chosen is called “gapminder” that is a data frame with 1704 rows and 6 variables. The variables include years from 1952 to 2007 with an interval of 5 years (code = year), 142 countries (code = country), 5 continents (code = continent), average life expectancy in years (code = lifeExp), population (code = pop) and Gross Domestic Product per capita (code = gdpPercap) of the countries.
#install.packages("gapminder")
#install.packages("ggplot2")
#install.packages("tidyverse")
invisible(library(gapminder))
invisible(library(ggplot2))
invisible(library(tidyverse))
For our subset, we have used years from 1952 to 2002 to have a complete set of 50 years of data with intervals of 5 years.
demographicData <- subset(gapminder, year<=2002)
We are only comparing the GDP per capita and life expectancy data from year 2002.
gdpVsExpVsPop_2002 <- subset(demographicData, year==2002)
The mutate function is used in order to add a column in the subset to have the population of each country rounded down to 10^7 so that the sizes of the points (countries) on the graph scale down and can be shown in proportion to their relative population.
gdpVsExpVsPop_2002 <- mutate(gdpVsExpVsPop_2002, "Population (10^7)" = gdpVsExpVsPop_2002$pop/100000000)
The graph below illustrates a positive linear relationship between GDP per capita and average life expectancy of countries from five continents. The size of the circle shows the population of that country and its color illustrates the relevant continent it belongs to. For example the two biggest green circles on the graph show the two largest countries population-wise i.e China and India, with the yellow circle being the US, in the Americas.
ggplot(data=gdpVsExpVsPop_2002, aes(x=gdpVsExpVsPop_2002$gdpPercap, y=gdpVsExpVsPop_2002$lifeExp, color = continent, slope)) +
xlab("GDP per capita (US$, inflation-adjusted)") +
ylab("Life Expectancy") +
guides(size=FALSE) +
geom_smooth(se=FALSE, method="lm", color="black") +
geom_point(size=gdpVsExpVsPop_2002$`Population (10^7)`) +
ggtitle("Relationship between GDP per capita and Life Expectancy of countries") +
theme_classic()
We are now trying to find the y-intercept which will tell us the life expectancy of the countries with 0 GDP per capita and we use the cor function to find the correlation between GDP per capita and life expectancy. Note that a positive number represents that the two variables move in the same direction.
lm(formula = lifeExp ~ gdpPercap, data = gdpVsExpVsPop_2002)
##
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gdpVsExpVsPop_2002)
##
## Coefficients:
## (Intercept) gdpPercap
## 5.825e+01 7.507e-04
gdp_lifeExp_cor <- cor(x = gdpVsExpVsPop_2002$gdpPercap, y = gdpVsExpVsPop_2002$lifeExp)
print(gdp_lifeExp_cor)
## [1] 0.6818578
Mean increase in the population of the world from 1952 to 2002:
pop_1952 <- subset(demographicData, year==1952)
mean_pop_1952 <- mean(pop_1952$pop)
print(mean_pop_1952)
## [1] 16950402
pop_1962 <- subset(demographicData, year==1962)
mean_pop_1962 <- mean(pop_1962$pop)
print(mean_pop_1962)
## [1] 20421007
pop_1972 <- subset(demographicData, year==1972)
mean_pop_1972 <- mean(pop_1972$pop)
print(mean_pop_1972)
## [1] 25189980
pop_1982 <- subset(demographicData, year==1982)
mean_pop_1982 <- mean(pop_1982$pop)
print(mean_pop_1982)
## [1] 30207302
pop_1992 <- subset(demographicData, year==1992)
mean_pop_1992 <- mean(pop_1992$pop)
print(mean_pop_1992)
## [1] 35990917
pop_2002 <- subset(demographicData, year==2002)
mean_pop_2002 <- mean(pop_2002$pop)
print(mean_pop_2002)
## [1] 41457589
We create a data frame and present the mean population in a table.
years <- c(1952, 1962, 1972, 1982, 1992, 2002)
mean_pop_world <- c(mean_pop_1952, mean_pop_1962, mean_pop_1972, mean_pop_1982, mean_pop_1992, mean_pop_2002)
mean_pop_data <- data.frame(years, mean_pop_world)
colnames(mean_pop_data) <- c("Years", "Mean population of the world")
#install.packages("knitr")
library(knitr)
kable(mean_pop_data, caption = "Mean population of the world from 1952 to 2002")
| Years | Mean population of the world |
|---|---|
| 1952 | 16950402 |
| 1962 | 20421007 |
| 1972 | 25189980 |
| 1982 | 30207302 |
| 1992 | 35990917 |
| 2002 | 41457589 |
Now we calculate the mean increase population of the world from 1952 to 2002 with intervals of 10 years in between.
mean_increase_pop <- mean(mean_pop_1952, mean_pop_1962, mean_pop_1972, mean_pop_1982, mean_pop_1992, mean_pop_2002)
print(mean_increase_pop)
## [1] 16950402
The native data format of the source is long because every 5 years for each country from 1952 to 2007 are listed vertically. It is easier to work with long data format than wide because three-dimensional data is presented in two dimensions such as vertical and horizontal dimensions.