To insert a code chunk, go to Code in the top toolbar or press Ctrl+Alt+I.

You can visualize the output of the code in the console (bottom of the window). Also, the created variables are listed in the environment (top-right pane). The bottom-right pane shows the files, plots, installed packages, help, and viewer tabs.

Downloading and visualizing online financial data

We will use the quantmod R package to download real financial data from Yahoo Finance. This package has the getSymbols() function, which downloads stock prices from the Internet. Install the quantmod package from the “Packages” tab in the bottom-right pane of RStudio.

Once installed, the package stays on your computer. However, you need to load it into memory in every new R session in which you use it. Use the library() function to load the package:

library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

getSymbols() downloads up-to-date financial data, such as stock prices, ETF prices, interest rates, exchange rates, etc. getSymbols() can download this data from multiple sources, including Yahoo Finance, FRED, Oanda, and Tiingo.

To download real historical monthly data for 2 US-listed stocks (AstraZeneca and Pfizer) and the S&P 500 market index (^GSPC) from January 2018 to the start of December 2021 (in the output further below, the last monthly observation is November 2021):

# We define a vector for the tickers:
tickers<-c("^GSPC","PFE","AZN")

# We download the historical prices for these tickers:
getSymbols(Symbols=tickers, from="2018-01-01", to = "2021-12-01", 
           src="yahoo", periodicity="monthly")
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## [1] "^GSPC" "PFE"   "AZN"

This function creates one xts-zoo R object for each ticker, named after the ticker (the caret is dropped from the object name, so ^GSPC is stored as GSPC). Each object has the corresponding historical monthly prices. xts stands for extensible time series. An xts-zoo object is designed to make time series data easy to manipulate.

The src argument indicates the source of the data; in this case it is Yahoo Finance. The periodicity argument specifies the granularity of the data (daily, weekly, monthly, quarterly).

You can view the FIRST 5 rows of the S&P 500 index using the head() function:

head(GSPC,5)
##            GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
## 2018-01-01   2683.73   2872.87  2682.36    2823.81 76860120000       2823.81
## 2018-02-01   2816.45   2835.96  2532.69    2713.83 79579410000       2713.83
## 2018-03-01   2715.22   2801.90  2585.89    2640.87 76369800000       2640.87
## 2018-04-01   2633.45   2717.49  2553.80    2648.05 69648590000       2648.05
## 2018-05-01   2642.96   2742.24  2594.62    2705.27 75617280000       2705.27

Yahoo Finance keeps track of the open, high, low, close (OHLC) and adjusted prices for each period. It also keeps track of the volume traded in each period. Adjusted prices apply to stocks, not to currencies: they account for dividend payments and stock splits.

To visualize how the S&P500 index has changed over time:

plot(GSPC$GSPC.Adjusted)

The chartSeries() function gives a better view, showing not only the price over time but also the volume traded each month:

chartSeries(GSPC, theme = ("white"))

Data management: merging and cleaning datasets

xts objects/datasets are easy to merge into 1 integrated dataset, so we can apply several functions or calculations to only one dataset instead of doing the same process to each dataset.

To do this, we use the merge() function:

prices<-merge(GSPC,AZN,PFE)

To calculate returns we MUST use adjusted prices, which account for any historical stock splits and dividend payments of the stocks.

To do this, we use the Ad() function from quantmod, which keeps only the adjusted-price columns:

adjprices<-Ad(prices)

We can rename the columns with the ticker names:

colnames(adjprices) <- c("GSPC","AZN","PFE")
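
As a minimal sketch of what merge() and Ad() do, the following uses two small made-up xts objects instead of downloaded data. The tickers AAA and BBB and all prices are hypothetical, and the sketch assumes the quantmod package is installed.

```r
library(quantmod)  # also loads xts

# Hypothetical month-start dates and prices, for illustration only
dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01"))
AAA <- xts(cbind(AAA.Close = c(10, 11, 12), AAA.Adjusted = c(9.5, 10.5, 11.5)),
           order.by = dates)
BBB <- xts(cbind(BBB.Close = c(20, 19, 21), BBB.Adjusted = c(19.0, 18.2, 20.3)),
           order.by = dates)

# merge() lines up the rows by date and puts all columns side by side
toy_prices <- merge(AAA, BBB)

# Ad() keeps only the columns whose name contains "Adjusted"
toy_adj <- Ad(toy_prices)
colnames(toy_adj) <- c("AAA", "BBB")
toy_adj
```

The merged object has one row per date and one column per original column, so any later calculation can be applied to all tickers at once.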

We can see the first rows of the 3 adjusted prices:

head(adjprices)
##               GSPC      AZN      PFE
## 2018-01-01 2823.81 30.85605 30.01973
## 2018-02-01 2713.83 29.21028 29.42809
## 2018-03-01 2640.87 31.64988 29.02999
## 2018-04-01 2648.05 32.15671 29.94612
## 2018-05-01 2705.27 33.51430 29.38990
## 2018-06-01 2718.37 31.77659 29.96587

Return calculation

Simple and continuously compounded (cc) return

A financial simple return for a stock ($R_t$) is calculated as the percentage change in price from the previous period ($t-1$) to the present period ($t$):

$$R_t = \frac{\text{Adjprice}_t - \text{Adjprice}_{t-1}}{\text{Adjprice}_{t-1}} = \frac{\text{Adjprice}_t}{\text{Adjprice}_{t-1}} - 1$$

Example: if the adjusted price of a stock at the end of January 2021 was 100.00 USD, and its previous (December 2020) adjusted price was 80.00 USD, then the monthly simple return of the stock in January 2021 will be:

$$R_{\text{Jan2021}} = \frac{\text{Adjprice}_{\text{Jan2021}}}{\text{Adjprice}_{\text{Dec2020}}} - 1 = \frac{100}{80} - 1 = 0.25$$

Returns can be expressed in decimal form (0.25) or as a percentage (25%).

In Finance it is highly recommended to calculate continuously compounded returns (cc returns) and to use them instead of simple returns for data analysis, statistics and econometric models.
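
The example above can be reproduced in R with plain arithmetic. This is only a sketch; the variable names are chosen here for illustration.

```r
# Adjusted prices from the example above
adjprice_dec2020 <- 80
adjprice_jan2021 <- 100

# Simple return: percentage change from the previous period
R_jan2021 <- adjprice_jan2021 / adjprice_dec2020 - 1

R_jan2021        # decimal form: 0.25
R_jan2021 * 100  # percentage form: 25
```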

One way to calculate cc returns is to subtract the log of the previous adjusted price (at $t-1$) from the log of the current adjusted price (at $t$):

$$r_t = \log(\text{Adjprice}_t) - \log(\text{Adjprice}_{t-1})$$

This is also called the first difference of the log of the price.

We can also calculate cc returns as the log of the ratio of the current adjusted price (at $t$) to the previous adjusted price (at $t-1$):

$$r_t = \log\left(\frac{\text{Adjprice}_t}{\text{Adjprice}_{t-1}}\right)$$

cc returns are usually represented by small r, while simple returns are represented by capital R.
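
A short base R sketch with made-up prices (80, 100 and 90; the third price is hypothetical) checks that both cc formulas agree. It also shows two properties that are commonly cited as reasons to prefer cc returns: a cc return equals log(1 + R), and cc returns add up over time.

```r
# Hypothetical adjusted prices for three consecutive months
adjprice <- c(80, 100, 90)
n <- length(adjprice)

# Formula 1: first difference of the log price
r_diff <- diff(log(adjprice))

# Formula 2: log of the price ratio
r_ratio <- log(adjprice[-1] / adjprice[-n])

all.equal(r_diff, r_ratio)     # both formulas give the same cc returns

# Simple returns for the same months
R <- adjprice[-1] / adjprice[-n] - 1
all.equal(r_diff, log(1 + R))  # cc return = log(1 + simple return)

# The sum of the monthly cc returns equals the cc return of the whole period
all.equal(sum(r_diff), log(adjprice[n] / adjprice[1]))
```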

Return calculation

We have historical monthly adjusted prices for each stock. We start by calculating returns for the S&P 500 index. We can use the log() function to calculate the natural logarithm, and the lag() function to get the previous value of the adjusted prices. Let’s calculate the monthly simple and cc returns for the S&P 500 as new objects:

# We calculate the simple returns of the S&P 500.
# For xts objects, lag() with k = 1 returns the value of the previous period:
SP500_R <- adjprices$GSPC / lag(adjprices$GSPC, k = 1) - 1

# We calculate the cc returns of the S&P 500:
SP500_r <- log(adjprices$GSPC) - log(lag(adjprices$GSPC, k = 1))

# We can also do the same calculation of cc returns using the
#  second formula:
SP500_r_2 <- log(adjprices$GSPC / lag(adjprices$GSPC, k = 1))

We calculated cc returns using two formulas and we got exactly the same result. The first formula gets the first difference of the log of the price. The first difference refers to the current value of a time series minus its value of the previous period.

We can simplify the calculation of cc returns by using the function diff(), which calculates the first difference of any time series xts dataset.

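# We calculate cc returns of the S&P 500; na.omit() drops the first row,
# which is NA because it has no previous price:
SP500_r <- na.omit(diff(log(adjprices$GSPC)))
# We calculate cc returns for the integrated adjprices dataset:
ccr <- na.omit(diff(log(adjprices)))

# We calculate simple returns for the integrated dataset
# (the first row is NA because it has no previous price):
R <- adjprices / lag(adjprices, k = 1) - 1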
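
To see what lag(), diff() and na.omit() do on an xts object, here is a small sketch with a made-up series (the ticker TOY and its prices are hypothetical). It assumes the xts package is installed; for xts objects the shift argument of lag() is called k.

```r
library(xts)

# Hypothetical month-start adjusted prices, for illustration only
toy <- xts(c(80, 100, 90),
           order.by = as.Date(c("2020-12-01", "2021-01-01", "2021-02-01")))
colnames(toy) <- "TOY"

# lag() shifts the series so each row holds the previous month's price;
# the first row has no previous price, so its return is NA
toy_R <- toy / lag(toy, k = 1) - 1

# diff(log()) gives the cc returns, also with NA in the first row
toy_r <- diff(log(toy))

toy_R
# na.omit() drops the first (NA) row
na.omit(toy_r)
```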

To visualize the monthly returns of the S&P500 over time:

plot(ccr$GSPC)

Since all returns are integrated in one dataset, we can also see all returns in one plot:

plot(ccr)

Descriptive statistics

Calculate summary statistics of the continuously compounded (cc) monthly returns using the summary() function (do the same with all stocks):

summary(ccr$GSPC)
##      Index                 GSPC         
##  Min.   :2018-02-01   Min.   :-0.13367  
##  1st Qu.:2019-01-08   1st Qu.:-0.01049  
##  Median :2019-12-16   Median : 0.01922  
##  Mean   :2019-12-16   Mean   : 0.01045  
##  3rd Qu.:2020-11-23   3rd Qu.: 0.03618  
##  Max.   :2021-11-01   Max.   : 0.11942

summary() does not show the standard deviation or the variance. You can also try the table.Stats() function, which requires the PerformanceAnalytics package. Go to the “Packages” tab in the bottom-right pane and install the PerformanceAnalytics package.

After you install the package, you need to load it in memory:

library(PerformanceAnalytics)
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked _by_ '.GlobalEnv':
## 
##     prices
## The following object is masked from 'package:graphics':
## 
##     legend
# Now you can use the table.Stats() function:
table.Stats(ccr$GSPC)
# This function calculates the most common measures of central tendency and
# dispersion. As central tendency it calculates the median and the arithmetic
# and geometric means. As dispersion measures it calculates the minimum and
# maximum values, quartile 1, quartile 3, the standard deviation, and the variance.
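
If only the mean, standard deviation and variance are needed, base R functions are enough. The sketch below uses a made-up vector of returns (hypothetical values, not the downloaded data); the same functions could be applied to a returns column after converting it with as.numeric().

```r
# Hypothetical monthly cc returns, for illustration only
toy_returns <- c(0.02, -0.01, 0.03, 0.015, -0.02, 0.04, 0.01, 0.005)

mean(toy_returns)  # arithmetic mean
sd(toy_returns)    # sample standard deviation
var(toy_returns)   # sample variance

# The variance is the square of the standard deviation
all.equal(var(toy_returns), sd(toy_returns)^2)
```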

The histogram

A histogram is a plot that shows how many observations of a variable fall within each range of values (bin).

Plot a histogram of returns (do the same with all stocks).

hist(ccr$GSPC, main="Histogram of S&P 500 monthly returns", 
     xlab="Continuously Compounded returns", col="dark green")
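
To see the numbers behind a histogram, hist() can be called with plot = FALSE, which returns the bins and their counts instead of drawing them. The returns and breaks below are hypothetical and chosen only to illustrate the idea.

```r
# Hypothetical monthly cc returns, for illustration only
toy_returns <- c(-0.12, -0.03, -0.01, 0.02, 0.04, 0.04, 0.07, 0.11)

# Bins of width 0.05 between -0.15 and 0.15
bin_breaks <- c(-0.15, -0.10, -0.05, 0, 0.05, 0.10, 0.15)

h <- hist(toy_returns, breaks = bin_breaks, plot = FALSE)

h$breaks  # limits of each bin
h$counts  # number of returns that fall in each bin
```

Each count is the height of one bar: here the bin from 0 to 0.05 holds the most observations.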

Introduction to hypothesis testing

The purpose of hypothesis testing is to check whether the data provide statistical evidence that a belief is very likely to be true. Different beliefs call for different types of hypothesis tests, some more sophisticated than others.

We start with the simplest case of hypothesis testing, the one-sample t-test, and learn about it with an example.

Here is an example of a t-test to check whether the S&P 500 has an average monthly return significantly greater than zero:

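# a) Hypotheses (one-sided, to match the question above)
# H0: mean(ccr$GSPC) = 0
# Ha: mean(ccr$GSPC) > 0

# b) Standard error of the mean: standard deviation divided by the
#    square root of the number of observations
se_GSPC.r <- sd(ccr$GSPC) / sqrt(nrow(ccr$GSPC) )
print(paste("Standard error S&P 500 =" , se_GSPC.r))
## [1] "Standard error S&P 500 = 0.00738590373406946"
# c) t-value: how many standard errors the sample mean is away from 0
t_GSPC.r <- (mean(ccr$GSPC) - 0) / se_GSPC.r
print(paste("t-value S&P 500 = ", t_GSPC.r))
## [1] "t-value S&P 500 =  1.41506181676838"
# Since the t-value of the mean return of the S&P 500 is lower than the rule-of-thumb
# critical value of 2 (and lower than the one-sided 5% critical value of about 1.68
# for 45 degrees of freedom), I can’t reject the null hypothesis. Therefore, the
# S&P 500 mean monthly return is not significantly greater than 0.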

The t.test() function in R does the same calculation in one step:

ttest_GSPC.r <- t.test(as.numeric(ccr$GSPC), alternative = "greater")
ttest_GSPC.r
## 
##  One Sample t-test
## 
## data:  as.numeric(ccr$GSPC)
## t = 1.4151, df = 45, p-value = 0.08197
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  -0.001952579          Inf
## sample estimates:
##  mean of x 
## 0.01045151
# It gives the same t-value as the manual calculation (1.4151)
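
The link between the manual steps and t.test() can be checked on any numeric vector. The sketch below uses made-up returns (hypothetical values) and also computes the one-sided p-value by hand with pt(), the cumulative distribution function of the t distribution.

```r
# Hypothetical monthly cc returns, for illustration only
toy_returns <- c(0.02, -0.01, 0.03, 0.015, -0.02, 0.04, 0.01, 0.005)
n <- length(toy_returns)

# Manual calculation
se_toy <- sd(toy_returns) / sqrt(n)     # standard error of the mean
t_toy  <- (mean(toy_returns) - 0) / se_toy  # t-value
# One-sided p-value: probability of a t-value at least this large under H0
p_toy  <- pt(t_toy, df = n - 1, lower.tail = FALSE)

# Same test with t.test()
tt <- t.test(toy_returns, alternative = "greater")

all.equal(unname(tt$statistic), t_toy)  # same t-value
all.equal(tt$p.value, p_toy)            # same p-value
```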

Challenge 1

Write R code to plot the histogram of returns of one of the stocks (PFE or AZN) and WRITE A MEANINGFUL INTERPRETATION OF THIS HISTOGRAM. It is recommended to read the note “Basic Statistics for Finance” posted in Canvas before you write your own interpretation.

hist(ccr$PFE, main="Histogram of Pfizer monthly returns", 
     xlab="Continuously Compounded returns", col="light blue")

The histogram shows that the single most frequent range of Pfizer's monthly returns is the negative one between -0.05 and 0, which at first might make the stock look like a bad investment option. However, looking at the histogram as a whole, we can see that positive monthly returns occur more often than negative ones, especially between 0 and 0.1.

In short, Pfizer is slightly more likely to offer a positive monthly return (probably between 0 and 0.1), and when the monthly return is negative, the loss will probably be between 0 and -0.05.

Challenge 2

Select 1 stock you want to further analyze. Run a t-test to check whether its average monthly return over time is significantly different from zero. You have to do the calculations MANUALLY and then use the t.test() function in R. You have to INTERPRET your results. Explain your calculations as comments in your .r file.

You have to:

  1. WRITE THE NULL AND THE ALTERNATIVE HYPOTHESIS

  2. Calculate the Standard error, which is the standard deviation of the MEAN of returns.

  3. Calculate the t-statistic. EXPLAIN/INTERPRET THE VALUE OF t YOU GOT.

  4. WRITE YOUR CONCLUSION OF THE t-TEST

# a)
# H0: mean(ccr$PFE) = 0
# Ha: mean(ccr$PFE) <> 0

# b)
se_PFE.r <- sd(ccr$PFE) / sqrt(nrow(ccr$PFE) )
print(paste("Standard error Pfizer =" , se_PFE.r))
## [1] "Standard error Pfizer = 0.0106360417880266"
# c)
t_PFE.r <- (mean(ccr$PFE) - 0) / se_PFE.r
print(paste("t-value Pfizer = ", t_PFE.r))
## [1] "t-value Pfizer =  1.15645076597965"
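# Simple method: the t.test() function.
# Note that alternative = "greater" runs a one-sided test (Ha: mean > 0); for the
# two-sided hypotheses in a), the p-value is twice the one-sided p-value shown below.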
ttest_PFE.r <- t.test(as.numeric(ccr$PFE), alternative = "greater")
ttest_PFE.r
## 
##  One Sample t-test
## 
## data:  as.numeric(ccr$PFE)
## t = 1.1565, df = 45, p-value = 0.1268
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  -0.005562401          Inf
## sample estimates:
##  mean of x 
## 0.01230006

From the t-test for Pfizer, the t-value is 1.1565 and the one-sided p-value is 0.1268. The t-value is below the critical value (about 2 as a rule of thumb) and the p-value is above 0.05, so there is not enough evidence that Pfizer has an average monthly return significantly greater than zero. Therefore we cannot reject the null hypothesis. This does not prove that the null hypothesis is true; it only means the data do not contradict it.
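
The reported figures can be cross-checked from the t-value and the degrees of freedom alone, using pt() and qt() from base R. This is a sketch under the assumption of a 5% significance level; the variable names are chosen here for illustration.

```r
# t-value and degrees of freedom reported by t.test() for Pfizer
t_PFE  <- 1.1565
df_PFE <- 45

# One-sided p-value (Ha: mean > 0), as reported by t.test() above
p_one_sided <- pt(t_PFE, df = df_PFE, lower.tail = FALSE)

# Two-sided p-value (Ha: mean <> 0) is twice the one-sided value
p_two_sided <- 2 * p_one_sided

# Exact 5% critical values that the "about 2" rule of thumb approximates
t_crit_one_sided <- qt(0.95,  df = df_PFE)
t_crit_two_sided <- qt(0.975, df = df_PFE)

p_one_sided
p_two_sided
t_PFE > t_crit_one_sided  # FALSE: H0 is not rejected in the one-sided test
t_PFE > t_crit_two_sided  # FALSE: H0 is not rejected in the two-sided test
```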