Ajuste de distribución

Librerías

#install.packages('quantmod')

library(moments)
library(nortest)
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tseries)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(quantmod)

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

## Loading required package: TTR

library(zoo)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ tibble  3.1.6     ✓ purrr   0.3.4
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date()        masks base::date()
## x dplyr::filter()          masks stats::filter()
## x xts::first()             masks dplyr::first()
## x lubridate::intersect()   masks base::intersect()
## x dplyr::lag()             masks stats::lag()
## x xts::last()              masks dplyr::last()
## x lubridate::setdiff()     masks base::setdiff()
## x lubridate::union()       masks base::union()

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:purrr':
## 
##     some

## The following object is masked from 'package:dplyr':
## 
##     recode

library(astsa)
library(foreign)
library(lmtest)
library(dynlm)
library(nlme)

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:dplyr':
## 
##     collapse

library(lmtest)
library(broom)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(knitr)
library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

library(parallel)
library(car)
library(mlogit)

## Loading required package: dfidx

## 
## Attaching package: 'dfidx'

## The following object is masked from 'package:MASS':
## 
##     select

## The following object is masked from 'package:stats':
## 
##     filter

library(dplyr)
library(tidyr)
library(forecast)

## 
## Attaching package: 'forecast'

## The following object is masked from 'package:nlme':
## 
##     getResponse

## The following object is masked from 'package:astsa':
## 
##     gas

library(fpp2)

## ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──

## ✓ fma       2.4     ✓ expsmooth 2.3

## ── Conflicts ───────────────────────────────────────────────── fpp2_conflicts ──
## x forecast::getResponse() masks nlme::getResponse()
## x car::some()             masks purrr::some()

## 
## Attaching package: 'fpp2'

## The following object is masked from 'package:astsa':
## 
##     oil

library(stats)

#Televisa <- read.csv(file.choose())

Actividad

Utilizar una serie empírica (económica, acción, commodity, crytpocurrency, etc.) para aplicar la metodología de ajuste de distribución revisada en la sesión. Se recomienda considerar un periodo de observaciones de un año.

#Obtener información en Yahoo

Televisa<-getSymbols("TV", from="2020-01-01", src = "yahoo", 
                    auto.assign = F)[,6]

Gráfica

plot(Televisa, col = "violet")

1. Gráfica de serie de datos.

#Se obtiene la grafica del proceso 
ggplot(Televisa, aes(x=1:nrow(Televisa), y=TV.Adjusted))+
  geom_line(col="blue",
            size=1)+
  labs(title="Serie de precios Televisa",
       y="Precio",
       x="Observaciones")

2. Gráfica de densidad de la serie de datos.

#Se obtiene la densidad de la serie
ggplot(Televisa, aes(x=TV.Adjusted))+
  geom_density(color= "red",
               fill = "grey",
               alpha = 0.5)+
  labs(title="Gráfica de densidad",
       y="Density",
       x="Televisa precio")

Histograma

#Se obtiene el histograma de la serie
ggplot(Televisa, aes(x=TV.Adjusted))+
  geom_histogram(color= "blue",
               fill = "grey",
               alpha = 0.5)+
    labs(title="Histograma",
       y="Count",
       x="Televisa precio")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

3. Resultado de la prueba estadística ADF.

#Se estima la prueba Augemnted Dickey-Fuller
tv.adf.test <- adf.test(Televisa)
tv.adf.test

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Televisa
## Dickey-Fuller = -2.2662, Lag order = 8, p-value = 0.4656
## alternative hypothesis: stationary

Como el p-value es >.05, entonces la serie tiene raíz unitaria, por tanto, no es estacionaria.

4. Mencionar si fue necesario obtener rendimientos logarítmicos de la serie y dar una explicación.

En este caso, es necesario obtener los rendimientos logarítmicos con la finalidad de inducir estacionariedad y que pueda ser modelada. Una ventaja por la que se suelen preferir los rendimientos logarítmicos es la forma de agregación de los periodos. Un rendimiento logarítmico de k-periodos es la suma de los rendimientos logarítmicos individuales; a diferencia del producto para los rendimientos simples.

#En esta linea de código se estiman las diferencias logarítmicas aplicando de manera anidada las funciones diff() y log()
tv.L1.return <- diff(log(Televisa))
#se almacenan las diferencias logarítminas en un data frame para poder graficarlas mediante ggplot2()
tv.L1.return.df <- as.data.frame(tv.L1.return)
ggplot(tv.L1.return.df, aes(x=1:nrow(tv.L1.return.df),y=tv.L1.return))+
  geom_line(col="violet",
            size=1)+
  labs(title="Retornos Logarítmicos Televisa",
       y="Rendimiento",
       x="Observaciones")

## Don't know how to automatically pick scale for object of type xts/zoo. Defaulting to continuous.

## Warning: Removed 1 row(s) containing missing values (geom_path).

ADF logarítmica

#Se estima nuevamente la prueba Dickey-Fuller para determinar si la diferenciación induji la estacionariedad a la serie.
tv.L1.return2 <- na.omit(tv.L1.return)

tv.L1.return.adf.test <- adf.test(tv.L1.return2)

## Warning in adf.test(tv.L1.return2): p-value smaller than printed p-value

tv.L1.return.adf.test

## 
##  Augmented Dickey-Fuller Test
## 
## data:  tv.L1.return2
## Dickey-Fuller = -8.151, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary

#Se obtiene la densidad de la serie
ggplot(tv.L1.return2, aes(x=TV.Adjusted))+
  geom_density(color= "red",
               fill = "grey",
               alpha = 0.5)+
  labs(title="Gráfica de densidad",
       y="Density",
       x="Televisa precio")

#Se obtiene el histograma de la serie
ggplot(tv.L1.return2, aes(x=TV.Adjusted))+
  geom_histogram(color= "blue",
               fill = "grey",
               alpha = 0.5)+
    labs(title="Histograma",
       y="Count",
       x="Televisa precio")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Estadística descriptiva de las series de rendimientos

5. Estadística descriptiva de la serie, dando una breve explicación de la media, desviación estándar, sesgo y kurtosis.

#Se estiman la estadística descriptiva de las series de rendimientos.
mean(tv.L1.return2)

## [1] -0.0005631797

var(tv.L1.return2)

##             TV.Adjusted
## TV.Adjusted 0.001260555

skewness(tv.L1.return2)

## TV.Adjusted 
##  0.04507872

kurtosis(tv.L1.return2)

## TV.Adjusted 
##    8.507065

Debido a que la kurtosis es >3, la distribución muestral es leptocúrticas. Existe asimetría y parece ser no-normal, kurtosis mayor a 3 y varianza diferente de 0 y los rendimientos son negativos.

qqnorm(tv.L1.return2$TV.Adjusted, pch = 1, col="violet", frame = FALSE)
qqline(tv.L1.return2$TV.Adjusted, col = "steelblue", lwd = 2)

## 6. Resultado de la prueba de normalidad Jarque-Bera y una breve explicación.

#Se estima la prueba de normalidad Jarque-Bera
jarque.bera.test(tv.L1.return2)

## 
##  Jarque Bera Test
## 
## data:  tv.L1.return2
## X-squared = 795.05, df = 2, p-value < 2.2e-16

Como p-value < .05 , entonces la serie no sigue una distribución normal.

7. Parámetros de la distribución NIG que ajustan los datos.

#install.packages("GeneralizedHyperbolic")

library(GeneralizedHyperbolic)

#obtención de parámetros de la distribucion empirica
fit.tv <- nigFit(tv.L1.return2)
#Se realiza el ajuste de la distribución con los parámetros obtenidos
adj.tv <- rnig(length(tv.L1.return2),mu=fit.tv$param[1],delta=fit.tv$param[2],alpha =fit.tv$param[3],beta =fit.tv$param[4])

tv.param.mu <- fit.tv$param[1]
tv.param.delta <- fit.tv$param[2]
tv.param.alpha <- fit.tv$param[3]
tv.param.beta <- fit.tv$param[4]
tv.param.mu

##            mu 
## -0.0005745319

tv.param.delta

##     delta 
## 0.0277988

tv.param.alpha

##    alpha 
## 22.61903

tv.param.beta

##        beta 
## 0.008316647

8. Resultados de las pruebas de bondad de ajuste y una breve explicación.

#install.packages("SuppDists")
#install.packages("kSamples")

library(SuppDists)
library(kSamples)

## 
## Attaching package: 'kSamples'

## The following object is masked from 'package:nortest':
## 
##     ad.test

#Bondad de ajuste
#Es necesario desactivar las librerias moments y nortest para evitar conflictos con las funciones ad.test, qn.test y ks.test, con las funcioanes de las librerías SuppDists y kSamples.
detach("package:moments", unload=TRUE)
detach("package:nortest", unload=TRUE)

#Se ejecutn las pruebas de bondad de ajuste
#Prueba Anderson-Darling
ad.test(tv.L1.return2,adj.tv)

## 
## 
##  Anderson-Darling k-sample test.
## 
## Number of samples:  2
## Sample sizes:  629, 629
## Number of ties: 10
## 
## Mean of  Anderson-Darling  Criterion: 1
## Standard deviation of  Anderson-Darling  Criterion: 0.76023
## 
## T.AD = ( Anderson-Darling  Criterion - mean)/sigma
## 
## Null Hypothesis: All samples come from a common population.
## 
##               AD   T.AD  asympt. P-value
## version 1: 1.286 0.3766           0.2365
## version 2: 1.290 0.3832           0.2350

Las muestras provienen de la misma población, ya que p-value >.05

#Prueba Kruscal-Wallis
qn.test(tv.L1.return2,adj.tv)

## 
## 
##  Kruskal-Wallis k-sample test.
## 
## Number of samples:  2
## Sample sizes:  629, 629
## Number of ties: 10
## 
## Null Hypothesis: All samples come from a common population.
## 
##   test statistic  asympt. P-value 
##           1.8090           0.1787

#Prueba Kolmogorov-Smirnov
#ks.test(tv.L1.return2,adj.tv)

9. Gráficas de ajuste de la distribución y una breve descripción

#Función que permite graficar la distribución normal, distribución empírica y distribución ajustada.
graph.kdn <- function(x,y,z) {
  name = toString(z)
  mu <- mean(x,na.rm = T)
  des <- sd(x, na.rm = T)
  li <- mu - 5*des
  lu <- mu + 5*des
  x1 <- seq(li, lu, length=500)
  z1 <- dnorm(x1, mu, des)
  dz <- density(x)
  dy <- density(y)
  m1 <- max(dz$y)*1.1
  plot(x1, z1, ylim=c(0, m1), type="l", main=name, col="black", xlab="Returns", ylab = "Density")
  lines(dz, lwd=1, col="darkblue", type="l")
  lines(dy, lwd=1, col="darkred", type="l")
}

#Graficas de ajuste de distribución
graph.kdn(tv.L1.return2,adj.tv,"BCA")

nigFit(tv.L1.return2, plots = TRUE)

## 
## Data:      tv.L1.return2 
## Parameter estimates:
##         mu       delta       alpha        beta  
## -0.0005745   0.0277988  22.6190270   0.0083166  
## Likelihood:         1269.709 
## criterion :         MLE 
## Method:             Nelder-Mead 
## Convergence code:   0 
## Iterations:         213

Contrastan la similitud entre la distribución empírica obtenida y la distribución teórica con los parámetros estimados

Ajuste de distribución

Mabel Álvarez

2022-07-04