Mini Project - Eikon & Gapminder

Exercise 1 - Eikon

Introduction

We choose Eikon as the resource for our dataset. Eikon is a financial analysis and trading software created by Thomson Reuters (now part of Refinitiv). There is no official R package for Eikon, however, there are unofficial ones. We used a package called “eikonapir”, there is unfortunately not a lot of instructions of how to use it and some of the data was challaning to find. One can read more about Eikon here, and about the API here.

It is first of all necessary to download the packages below. The second step is to get your API key from the Eikon terminal.

#install.packages("usethis")
library(usethis)
#install.packages("devtools")
library(devtools)
#install.packages("tidyverse")
library(tidyverse)

## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0

## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

#install_github("ahmedmohamedali/eikonapir")
library(eikonapir)
eikonapir::set_proxy_port(9000L)
eikonapir::set_app_id('d10629fe740649939772b69970f322f48531ccdc')

The Data

We decided to look at static current market data from the 30 constituents in the Dow Jones Industrial Index. However, the data we were looking for were not available for United Healthcare, Travelers Companies, and JP Morgan. We substituted those with PepsiCo, Morgan Stanley, and Willis Towers Watson to create our own Dow 30*.

To analyze the stocks, we obtained different data points. We wanted to get a broad perspective, consequently, we chose one valuation metric (P/E ratio), one profitabilty mertric (operating margin), one measure of the size of the company (market capitalization), and one measure of the firms debt (debt to market capitalization).

The first three were obtained using the codes ‘TR.PE’, ‘TR.OperatingMargin’, and ‘TR.MarketCap’. To get the ratio of debt to market cap, we had to divide ‘TR.NetDebt’ by TR.MartketCap’.

dow30 <- list("AAPL.O","MSFT.O","PEP.O","V","JNJ","WMT","PG","XOM","HD","VZ","INTC.O","DIS","MS","KO","CVX","MRK","PFE","CSCO.O","BA","MCD","NKE","UTX","IBM","MMM","AXP", "CAT","GS", "WBA.O","DOW","WLTW.OQ")

fields_data <- list(TR_Field('TR.PE'),TR_Field('TR.OperatingMargin'),TR_Field('TR.NetDebt'), TR_Field('TR.CompanyMarketCap'))
stocks_frame <- get_data(dow30, fields_data)
print(stocks_frame)

##    Instrument P/E (Daily Time Series Ratio) Operating Margin, Percent
## 1          GS              10.4504043184194                  23.46847
## 2          MS              10.4634356192343                  22.38758
## 3         IBM               12.388214613324                  14.25035
## 4         PFE              13.5997115992996                  22.76921
## 5      INTC.O              13.6329777515425                  32.90989
## 6         CAT              14.2019069801296                  15.15478
## 7       WBA.O              14.5895072784707                   3.65175
## 8          VZ              15.2447329601817                  18.07998
## 9         AXP              15.7281881814161                  19.51231
## 10      PEP.O              16.2396348973733                  15.63539
## 11     CSCO.O              17.2582348575669                  27.39481
## 12        CVX              17.3763893138653                   9.73187
## 13        XOM              20.2430213081363                    7.4603
## 14        MMM              20.4460454901773                  21.99603
## 15        DIS              21.0084288399733                  21.59695
## 16        UTX              22.0292961878139                  12.85845
## 17     AAPL.O               22.101535785066                  24.57202
## 18        MRK              22.6145446040438                  19.62217
## 19         HD              23.3926109372899                  14.35265
## 20        JNJ              25.0442827153456                  22.06274
## 21        MCD              25.2602083758083                  41.96203
## 22        WMT              25.9206148529997                   3.33531
## 23    WLTW.OQ              27.9025125972646                   9.50311
## 24        DOW               29.257787843909                   9.79628
## 25         KO              29.2818555556814                  22.88109
## 26     MSFT.O              29.4414394215409                  34.11632
## 27        NKE              33.4257053010554                   12.1993
## 28          V               35.985785074328                  65.28703
## 29         BA              55.6714695991556                  11.85341
## 30         PG              73.8628270630128                    8.6106
##    Net Debt Incl. Pref.Stock & Min.Interest Company Market Cap
## 1                               4.05288e+11     78044325588.45
## 2                                4.0638e+11     79327482833.95
## 3                                3.3947e+10    120083582387.86
## 4                                2.3277e+10    204596503797.08
## 5                                1.4709e+10         2.5317e+11
## 6                                2.8736e+10     80876028353.58
## 7                                1.6454e+10     55980180677.12
## 8                               1.11883e+11    245334719277.88
## 9                               1.29258e+11     98920759188.02
## 10                               2.3412e+10    184037575909.24
## 11                               -8.768e+09    205274054149.43
## 12                               2.5202e+10    228719899860.48
## 13                               4.1488e+10    293511827798.42
## 14                               1.1516e+10      98241653900.2
## 15                               5.5543e+10    249635105838.82
## 16                               4.1658e+10     128100478006.6
## 17                                 7.49e+09      1163957699400
## 18                               1.6431e+10    214168186025.04
## 19                               2.7424e+10    255302084377.76
## 20                               1.0294e+10    345696439608.65
## 21                              30037700000    145557878049.28
## 22                               5.3586e+10     338811119609.6
## 23                                3.716e+09     24071382017.32
## 24                               1.8268e+10      40900914119.8
## 25                               3.0176e+10    221551049104.67
## 26                              -5.5067e+10   1121968442239.26
## 27                              -1184000000     139714183803.5
## 28                               1.0117e+10    390544291110.38
## 29                                5.354e+09    204225682631.04
## 30                               2.1118e+10    297436962964.96

stocksData <- data.frame(stocks_frame)

We then had to convert the data to from the factor form to the numeric form.

as.numeric.factor <- function(x){as.numeric(levels(x))[x]}

stockPE<-as.numeric.factor(stocksData$P.E..Daily.Time.Series.Ratio.)
operatingMargin<-as.numeric.factor(stocksData$Operating.Margin..Percent)
stocksNetDebt<-as.numeric.factor(stocksData$Net.Debt.Incl..Pref.Stock...Min.Interest)
stocksMC<-as.numeric.factor(stocksData$Company.Market.Cap)

And create a new data frame.

stockName <- as.character(stocksData$Instrument)
debtRatio <- stocksNetDebt/stocksMC
dow30Data <- data.frame(stockName,stockPE,operatingMargin,stocksNetDebt,stocksMC,debtRatio)
print(dow30Data)

##    stockName  stockPE operatingMargin stocksNetDebt     stocksMC
## 1         GS 10.45040        23.46847  405288000000 7.804433e+10
## 2         MS 10.46344        22.38758  406380000000 7.932748e+10
## 3        IBM 12.38821        14.25035   33947000000 1.200836e+11
## 4        PFE 13.59971        22.76921   23277000000 2.045965e+11
## 5     INTC.O 13.63298        32.90989   14709000000 2.531700e+11
## 6        CAT 14.20191        15.15478   28736000000 8.087603e+10
## 7      WBA.O 14.58951         3.65175   16454000000 5.598018e+10
## 8         VZ 15.24473        18.07998  111883000000 2.453347e+11
## 9        AXP 15.72819        19.51231  129258000000 9.892076e+10
## 10     PEP.O 16.23963        15.63539   23412000000 1.840376e+11
## 11    CSCO.O 17.25823        27.39481   -8768000000 2.052741e+11
## 12       CVX 17.37639         9.73187   25202000000 2.287199e+11
## 13       XOM 20.24302         7.46030   41488000000 2.935118e+11
## 14       MMM 20.44605        21.99603   11516000000 9.824165e+10
## 15       DIS 21.00843        21.59695   55543000000 2.496351e+11
## 16       UTX 22.02930        12.85845   41658000000 1.281005e+11
## 17    AAPL.O 22.10154        24.57202    7490000000 1.163958e+12
## 18       MRK 22.61454        19.62217   16431000000 2.141682e+11
## 19        HD 23.39261        14.35265   27424000000 2.553021e+11
## 20       JNJ 25.04428        22.06274   10294000000 3.456964e+11
## 21       MCD 25.26021        41.96203   30037700000 1.455579e+11
## 22       WMT 25.92061         3.33531   53586000000 3.388111e+11
## 23   WLTW.OQ 27.90251         9.50311    3716000000 2.407138e+10
## 24       DOW 29.25779         9.79628   18268000000 4.090091e+10
## 25        KO 29.28186        22.88109   30176000000 2.215510e+11
## 26    MSFT.O 29.44144        34.11632  -55067000000 1.121968e+12
## 27       NKE 33.42571        12.19930   -1184000000 1.397142e+11
## 28         V 35.98579        65.28703   10117000000 3.905443e+11
## 29        BA 55.67147        11.85341    5354000000 2.042257e+11
## 30        PG 73.86283         8.61060   21118000000 2.974370e+11
##       debtRatio
## 1   5.193048911
## 2   5.122814761
## 3   0.282694764
## 4   0.113770273
## 5   0.058099301
## 6   0.355309238
## 7   0.293925454
## 8   0.456042261
## 9   1.306682248
## 10  0.127213151
## 11 -0.042713630
## 12  0.110187177
## 13  0.141350351
## 14  0.117221154
## 15  0.222496751
## 16  0.325197850
## 17  0.006434942
## 18  0.076720078
## 19  0.107417846
## 20  0.029777570
## 21  0.206362585
## 22  0.158158918
## 23  0.154374186
## 24  0.446640384
## 25  0.136203372
## 26 -0.049080703
## 27 -0.008474444
## 28  0.025904872
## 29  0.026216096
## 30  0.070999918

Subsets and Operations

We created a subsets based on P/E ratio to see if there is a difference in profitabilty and indebtedness between “expensive” companies and “cheap” companies. The dow 30* was cut in half and created 2 subset with 15 companies each.

lowPE <- subset.data.frame(dow30Data,stockPE <= quantile(stockPE,.5))

highPE <- subset.data.frame(dow30Data,stockPE > quantile(stockPE,.5))

We calculated the basic statistics of the sample means and for the whole sample and structured them in a data frame. Furthermore, with the “-1” we ignore the companies RICs.

allCompaniesMeans <- lapply(dow30Data[,-1], mean)

lowPEmeans <- lapply(lowPE[,-1], mean) 
  
highPEmeans <- lapply(highPE[,-1], mean)

dow30means <- data.frame(rbind(allCompaniesMeans,lowPEmeans,highPEmeans))
print(dow30means)

##                    stockPE operatingMargin stocksNetDebt     stocksMC
## allCompaniesMeans 23.80211        19.63374   51258123333 250258683088
## lowPEmeans        15.52472        18.39998   87888333333 165050246637
## highPEmeans        32.0795         20.8675   14627913333 335467119538
##                   debtRatio
## allCompaniesMeans 0.5190332
## lowPEmeans        0.9238761
## highPEmeans       0.1141902

Graph

We created a graph to show the correlation between the P/E ratio and the Operating Margin. As one can see, there is not much correlation there.

plot(x = dow30Data$operatingMargin, y = dow30Data$stockPE, main = "Operating Margin vs PE ratio", xlab = "Operating Margin", ylab = "Stock PE")

Native data format

Eikon as a database provides data in multiple forms. Whether you get your data in wide or long format depends on how you download the data. In our case the original data was downloaded in wide format*. We used the wide format as it was easier to use in our calculations.

*You can see this above, when looking at the print of “stocks_frame”

Transforming the data format from tall to wide

Here we first download new data of the opening prices for Apple and Microsoft from three different dates.

aaplMsft <- list("AAPL.O","MSFT.O")
fields_dataOpen <- list("TIMESTAMP","OPEN")
aaplMsftOpen <- get_timeseries(aaplMsft, fields_dataOpen, start_date = "2016-01-10T12:00:00", end_date = "2019-01-10T12:00:00", interval = "yearly")
dataOpen <- data.frame(aaplMsftOpen)
print(dataOpen)

##              TIMESTAMP   OPEN    NA.
## 1 2016-12-31T00:00:00Z 102.61 AAPL.O
## 2 2017-12-31T00:00:00Z  115.8 AAPL.O
## 3 2018-12-31T00:00:00Z 170.16 AAPL.O
## 4 2016-12-31T00:00:00Z  54.32 MSFT.O
## 5 2017-12-31T00:00:00Z  62.79 MSFT.O
## 6 2018-12-31T00:00:00Z 86.125 MSFT.O

We can see now that the data is presented in tall format. Next we are going to transform the data into a wide format using spread() command.

wideData <- spread(dataOpen,key = TIMESTAMP,value = OPEN)
print(wideData)

##      NA. 2016-12-31T00:00:00Z 2017-12-31T00:00:00Z 2018-12-31T00:00:00Z
## 1 AAPL.O               102.61                115.8               170.16
## 2 MSFT.O                54.32                62.79               86.125

GAPMINDER.ORG

Introduction

Gapminder foundation is an independent and non-profit foundation founded in 2005 and based in Stockholm, Sweden. It is a resource based on facts about the global development and the millennium development goals. As quoted by gapminder.org, its mission is to fight “devastating ignorance with fact-based worldviews everyone can understand”. It also provides services in collaboration with universities, United Nations’ organizations, public agencies and non-governmental organizations.

The dataset chosen is called “gapminder” that is a data frame with 1704 rows and 6 variables. The variables include years from 1952 to 2007 with an interval of 5 years (code = year), 142 countries (code = country), 5 continents (code = continent), average life expectancy in years (code = lifeExp), population (code = pop) and Gross Domestic Product per capita (code = gdpPercap) of the countries.

Gapminder source

#install.packages("gapminder")
#install.packages("ggplot2")
#install.packages("tidyverse")
invisible(library(gapminder))
invisible(library(ggplot2))
invisible(library(tidyverse))

For our subset, we have used years from 1952 to 2002 to have a complete set of 50 years of data with intervals of 5 years.

demographicData <- subset(gapminder, year<=2002)

We are only comparing the GDP per capita and life expectancy data from year 2002.

gdpVsExpVsPop_2002 <- subset(demographicData, year==2002)

The mutate function is used in order to add a column in the subset to have the population of each country rounded down to 10^7 so that the sizes of the points (countries) on the graph scale down and can be shown in proportion to their relative population.

gdpVsExpVsPop_2002 <- mutate(gdpVsExpVsPop_2002, "Population (10^7)" = gdpVsExpVsPop_2002$pop/100000000)

Graph

The graph below illustrates a positive linear relationship between GDP per capita and average life expectancy of countries from five continents. The size of the circle shows the population of that country and its color illustrates the relevant continent it belongs to. For example the two biggest green circles on the graph show the two largest countries population-wise i.e China and India, with the yellow circle being the US, in the Americas.

ggplot(data=gdpVsExpVsPop_2002, aes(x=gdpVsExpVsPop_2002$gdpPercap, y=gdpVsExpVsPop_2002$lifeExp, color = continent, slope)) + 
  xlab("GDP per capita (US$, inflation-adjusted)") + 
  ylab("Life Expectancy") +
  guides(size=FALSE) + 
  geom_smooth(se=FALSE, method="lm", color="black") + 
  geom_point(size=gdpVsExpVsPop_2002$`Population (10^7)`) +
  ggtitle("Relationship between GDP per capita and Life Expectancy of countries") +
  theme_classic()

We are now trying to find the y-intercept which will tell us the life expectancy of the countries with 0 GDP per capita and we use the cor function to find the correlation between GDP per capita and life expectancy. Note that a positive number represents that the two variables move in the same direction.

lm(formula = lifeExp ~ gdpPercap, data = gdpVsExpVsPop_2002)

## 
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gdpVsExpVsPop_2002)
## 
## Coefficients:
## (Intercept)    gdpPercap  
##   5.825e+01    7.507e-04

gdp_lifeExp_cor <- cor(x = gdpVsExpVsPop_2002$gdpPercap, y = gdpVsExpVsPop_2002$lifeExp)
print(gdp_lifeExp_cor)

## [1] 0.6818578

Subsets and Operations

Mean increase in the population of the world from 1952 to 2002:

pop_1952 <- subset(demographicData, year==1952)
mean_pop_1952 <- mean(pop_1952$pop)
print(mean_pop_1952)

## [1] 16950402

pop_1962 <- subset(demographicData, year==1962)
mean_pop_1962 <- mean(pop_1962$pop)
print(mean_pop_1962)

## [1] 20421007

pop_1972 <- subset(demographicData, year==1972)
mean_pop_1972 <- mean(pop_1972$pop)
print(mean_pop_1972)

## [1] 25189980

pop_1982 <- subset(demographicData, year==1982)
mean_pop_1982 <- mean(pop_1982$pop)
print(mean_pop_1982)

## [1] 30207302

pop_1992 <- subset(demographicData, year==1992)
mean_pop_1992 <- mean(pop_1992$pop)
print(mean_pop_1992)

## [1] 35990917

pop_2002 <- subset(demographicData, year==2002)
mean_pop_2002 <- mean(pop_2002$pop)
print(mean_pop_2002)

## [1] 41457589

We create a data frame and present the mean population in a table.

years <- c(1952, 1962, 1972, 1982, 1992, 2002)
mean_pop_world <- c(mean_pop_1952, mean_pop_1962, mean_pop_1972, mean_pop_1982, mean_pop_1992, mean_pop_2002)
mean_pop_data <- data.frame(years, mean_pop_world)

colnames(mean_pop_data) <- c("Years", "Mean population of the world")

#install.packages("knitr")
library(knitr)

kable(mean_pop_data, caption = "Mean population of the world from 1952 to 2002")

Mean population of the world from 1952 to 2002
Years	Mean population of the world
1952	16950402
1962	20421007
1972	25189980
1982	30207302
1992	35990917
2002	41457589

Now we calculate the mean increase population of the world from 1952 to 2002 with intervals of 10 years in between.

mean_increase_pop <- mean(mean_pop_1952, mean_pop_1962, mean_pop_1972, mean_pop_1982, mean_pop_1992, mean_pop_2002)
print(mean_increase_pop)

## [1] 16950402

Native data format

The native data format of the source is long because every 5 years for each country from 1952 to 2007 are listed vertically. It is easier to work with long data format than wide because three-dimensional data is presented in two dimensions such as vertical and horizontal dimensions.

Mini Project - Eikon & Gapminder

Simon Jonsson, Okko Takkinen, Nabeel Asghar

11/12/2019

Exercise 1 - Eikon

Introduction

The Data

Subsets and Operations

Graph

Native data format

Transforming the data format from tall to wide

GAPMINDER.ORG

Introduction

Graph

Subsets and Operations

Native data format