PDF definition

A bivariate (joint) probability function has the form \[p(y_1,y_2)=P(Y_1=y_1,Y_2=y_2)\] which, as we can see, involves two random variables.

Properties

  1. \(p(y_1,y_2)\geq 0\) for all \((y_1,y_2)\)
  2. \(\sum_{y_1}\sum_{y_2}p(y_1,y_2)=1\), where the sum runs over all pairs \((y_1,y_2)\) with nonzero probability

CDF definition

The joint cumulative distribution function is defined as \[F(y_1,y_2)=P(Y_1 \leq y_1, Y_2 \leq y_2)\]

Discrete case

\[F(y_1,y_2)=\sum_{t_1 \leq y_1}\sum_{t_2 \leq y_2}p(t_1,t_2)\]

Continuous case

\[F(y_1,y_2)=\int_{-\infty}^{y_1}\int_{-\infty}^{y_2}f(t_1,t_2)\,dt_2\,dt_1\] where \(f\) is the joint density function.

Properties

  1. \(F(-\infty,-\infty)=F(-\infty,y_2)=F(y_1,-\infty)=0\)
  2. \(F(\infty, \infty)=1\)

Class example

The management of a fast-food restaurant is interested in the joint behavior of the random variables \(Y_1\), defined as the total time between a customer's arrival at the store and departure from the service window, and \(Y_2\), the time the customer waits in line before reaching the service window. Because \(Y_1\) includes the time a customer waits in line, we must have \(Y_1 \geq Y_2\). The relative frequency distribution of observed values of \(Y_1\) and \(Y_2\) can be modeled by the probability density function \[f(y_1,y_2)=\begin{cases} e^{-y_1} & \text{ , } 0 \leq y_2 \leq y_1 < \infty \\ 0 & \text{ , elsewhere} \end{cases}\]

with time measured in minutes. Find

  1. \(P(Y_1<2, Y_2>1)\)
  2. \(P(Y_1 \geq 2Y_2)\)
  3. \(P(Y_1-Y_2 \geq 1)\) (note that \(Y_1-Y_2\) denotes the time spent at the service window)

Lab in R

We will use the bike demand dataset to demonstrate how the theory learned in class is used in the field. The dataset can be found on the GES.

bici<- read.csv("hour.csv",
                na.strings = c("", "NA"),
                strip.white = TRUE)
colnames(bici)
 [1] "instant"    "dteday"     "season"     "yr"         "mnth"       "hr"         "holiday"   
 [8] "weekday"    "workingday" "weathersit" "temp"       "atemp"      "hum"        "windspeed" 
[15] "casual"     "registered" "cnt"       

dteday, hr - date and hour of each hourly record

season - 1 = spring, 2 = summer, 3 = fall, 4 = winter

holiday - whether the day is considered a holiday

workingday - whether the day is neither a weekend nor holiday

weathersit - weather situation:

  1. Clear, Few clouds, Partly cloudy
  2. Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  3. Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  4. Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

temp - temperature, normalized (derived from degrees Celsius)

atemp - “feels like” temperature, normalized (derived from degrees Celsius)

hum - relative humidity, normalized

windspeed - wind speed, normalized

casual - number of non-registered user rentals initiated

registered - number of registered user rentals initiated

cnt - number of total rentals

library(dplyr)
library(ggplot2)
library(MASS)

seasonx<-2
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) 
hist(bici_data$atemp)

hist(bici_data$hum)

bici_data %>% 
  ggplot(aes(atemp))+
  geom_histogram(bins=30, aes(y = ..density..) )+
  geom_rug()+
  geom_density()

bici_data %>% 
  ggplot(aes(hum))+
  geom_histogram(bins=30, aes(y = ..density..) )+
  geom_rug()+
  geom_density()

Contour plot: spring

seasonx<-1
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap: spring

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

Contour plot: summer

seasonx<-2
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap: summer

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

Contour plot: fall

seasonx<-3
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap: fall

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

Contour plot: winter

seasonx<-4
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap: winter

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

str(bici_density)
List of 3
 $ x: num [1:100] 0.151 0.157 0.162 0.168 0.173 ...
 $ y: num [1:100] 0.16 0.168 0.177 0.185 0.194 ...
 $ z: num [1:100, 1:100] 3.38e-15 7.71e-15 1.66e-14 3.36e-14 6.44e-14 ...
# Estimate P(atemp >= 0.5, hum <= 0.3) for winter by summing normalized grid cells
filterX<-bici_density$x>=0.5
filterY<-bici_density$y<=0.3
bici_density$z[filterX,filterY] %>% sum()
[1] 0.003170821
# Estimate P(0.4 <= atemp <= 0.8, 0.6 <= hum <= 0.85) for winter
filterX<-bici_density$x>=0.4 & bici_density$x<=0.8
filterY<-bici_density$y>=0.6 & bici_density$y<=0.85
bici_density$z[filterX,filterY] %>% sum()
[1] 0.2296997
bici_data<-
  bici %>%
  dplyr::select(season,atemp,hum,cnt)%>%
  filter(season==4)
##bici_density$y
x<-matrix(rep(bici_density$y,3),nrow = 3, byrow = TRUE)
sum(x*bici_density$y)
[1] 106.9188
library(expm)
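# Transition matrix of a 4-state Markov chain; states 1 and 4 are absorbing
p<-rbind(c(1,0,0,0),c(0.5,0,0.5,0),c(0,0.5,0,0.5),c(0,0,0,1))
# State distribution after 20 steps when starting in state 3
c(0,0,1,0) %*% (p%^%20)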
         [,1] [,2]         [,3]     [,4]
[1,] 0.333333    0 9.536743e-07 0.666666
---
title: "Bivariate probability distributions"
output:
  html_notebook: default
  html_document: default
---


## PDF definition
A bivariate (joint) probability function has the form $$p(y_1,y_2)=P(Y_1=y_1,Y_2=y_2)$$
which, as we can see, involves two *random variables*.

## Properties
1. $p(y_1,y_2)\geq 0$ for all $(y_1,y_2)$
2. $\sum_{y_1}\sum_{y_2}p(y_1,y_2)=1$, where the sum runs over all pairs $(y_1,y_2)$ with nonzero probability
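
The two properties can be checked on a small discrete case. The sketch below uses a hypothetical example that is not part of the class material: two fair dice are rolled, $Y_1$ is the smaller face and $Y_2$ the larger one. The variable names are illustrative only.

```{r}
# Hypothetical example: two fair dice, Y1 = smaller face, Y2 = larger face
rolls <- expand.grid(d1 = 1:6, d2 = 1:6)      # the 36 equally likely outcomes
y1 <- pmin(rolls$d1, rolls$d2)
y2 <- pmax(rolls$d1, rolls$d2)

# Joint probability function p(y1, y2) as a 6 x 6 table
p <- table(y1, y2) / nrow(rolls)

all(p >= 0)          # property 1: no negative probabilities
sum(p)               # property 2: probabilities add up to 1
p["2", "5"]          # P(Y1 = 2, Y2 = 5) = 2/36, from outcomes (2,5) and (5,2)
```

Cells with $y_1 > y_2$ are zero in this table, since the smaller face can never exceed the larger one.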

## CDF definition
The joint cumulative distribution function is defined as
$$F(y_1,y_2)=P(Y_1 \leq y_1, Y_2 \leq y_2)$$

### Discrete case

$$F(y_1,y_2)=\sum_{t_1 \leq y_1}\sum_{t_2 \leq y_2}p(t_1,t_2)$$

### Continuous case
$$F(y_1,y_2)=\int_{-\infty}^{y_1}\int_{-\infty}^{y_2}f(t_1,t_2)\,dt_2\,dt_1$$
where $f$ is the joint density function.

## Properties
1. $F(-\infty,-\infty)=F(-\infty,y_2)=F(y_1,-\infty)=0$
2. $F(\infty, \infty)=1$
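
For the discrete case, the joint CDF is a double cumulative sum of the joint probability function. The following is a minimal sketch on a hypothetical two-dice example (smaller face $Y_1$, larger face $Y_2$), which is an assumption made for illustration and not part of the class material.

```{r}
# Hypothetical joint probability function: two fair dice,
# rows index Y1 (smaller face), columns index Y2 (larger face)
rolls <- expand.grid(d1 = 1:6, d2 = 1:6)
p <- unclass(table(pmin(rolls$d1, rolls$d2),
                   pmax(rolls$d1, rolls$d2))) / nrow(rolls)

# Cumulate over y1 (down each column), then over y2 (along each row);
# the second apply() returns its results as columns, hence the t()
cdf <- t(apply(apply(p, 2, cumsum), 1, cumsum))

cdf[3, 4]    # F(3, 4) = P(Y1 <= 3, Y2 <= 4) = 15/36
cdf[6, 6]    # F at the largest support point, which must equal 1
```

The value 15/36 follows by counting: 16 outcomes have both faces at most 4, and only (4,4) has a smaller face above 3.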

## Class example
The management of a fast-food restaurant is interested in the joint behavior of the random variables $Y_1$, defined as the total time between a customer's arrival at the store and departure from the service window, and $Y_2$, the time the customer waits in line before reaching the service window. Because $Y_1$ includes the time a customer waits in line, we must have $Y_1 \geq Y_2$. The relative frequency distribution of observed values of $Y_1$ and $Y_2$ can be modeled by the probability density function
$$f(y_1,y_2)=\begin{cases}
e^{-y_1} & \text{ , } 0 \leq y_2 \leq y_1 < \infty \\ 
0 & \text{ ,  elsewhere}   
\end{cases}$$

with time measured in minutes. Find

a. $P(Y_1<2, Y_2>1)$
b. $P(Y_1 \geq 2Y_2)$
c. $P(Y_1-Y_2 \geq 1)$ (note that $Y_1-Y_2$ denotes the time spent at the service window)
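
One way to work the three parts is to integrate the density over the region each event defines, always respecting $0 \leq y_2 \leq y_1$:

$$\begin{aligned}
P(Y_1<2,\,Y_2>1) &= \int_{1}^{2}\int_{y_2}^{2} e^{-y_1}\,dy_1\,dy_2 = \int_{1}^{2}\left(e^{-y_2}-e^{-2}\right)dy_2 = e^{-1}-2e^{-2}\approx 0.097,\\
P(Y_1\geq 2Y_2) &= \int_{0}^{\infty}\int_{2y_2}^{\infty} e^{-y_1}\,dy_1\,dy_2 = \int_{0}^{\infty} e^{-2y_2}\,dy_2 = \tfrac{1}{2},\\
P(Y_1-Y_2\geq 1) &= \int_{0}^{\infty}\int_{y_2+1}^{\infty} e^{-y_1}\,dy_1\,dy_2 = \int_{0}^{\infty} e^{-(y_2+1)}\,dy_2 = e^{-1}\approx 0.368.
\end{aligned}$$

These values can be sanity-checked by simulation. The sketch below relies on two facts derived from the density: the marginal of $Y_1$ is $y_1 e^{-y_1}$ (a gamma density with shape 2 and rate 1), and given $Y_1 = y_1$ the wait $Y_2$ is uniform on $[0, y_1]$. The seed and sample size are arbitrary choices.

```{r}
set.seed(123)                          # arbitrary seed, for reproducibility only
n  <- 200000
y1 <- rgamma(n, shape = 2, rate = 1)   # marginal density of Y1: y1 * exp(-y1)
y2 <- runif(n, min = 0, max = y1)      # given Y1 = y1, Y2 is uniform on [0, y1]

# Monte Carlo estimates next to the closed-form answers
estimates <- c(a = mean(y1 < 2 & y2 > 1),
               b = mean(y1 >= 2 * y2),
               c = mean(y1 - y2 >= 1))
exact <- c(a = exp(-1) - 2 * exp(-2), b = 0.5, c = exp(-1))
round(rbind(estimates, exact), 3)
```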


# Lab in R

We will use the bike demand dataset to demonstrate how the theory learned in class is used in the field. The dataset can be found on the GES.

```{r}
bici<- read.csv("hour.csv",
                na.strings = c("", "NA"),
                strip.white = TRUE)
```

```{r}
colnames(bici)
```

**dteday**, **hr** - date and hour of each hourly record

**season** - 1 = spring, 2 = summer, 3 = fall, 4 = winter

**holiday** - whether the day is considered a holiday

**workingday** - whether the day is neither a weekend nor holiday

**weathersit** - weather situation:

1. Clear, Few clouds, Partly cloudy
2. Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3. Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4. Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

**temp** - temperature, normalized (derived from degrees Celsius)

**atemp** - "feels like" temperature, normalized (derived from degrees Celsius)

**hum** - relative humidity, normalized

**windspeed** - wind speed, normalized

**casual** - number of non-registered user rentals initiated

**registered** - number of registered user rentals initiated

**cnt** - number of total rentals

```{r}
library(dplyr)
library(ggplot2)
library(MASS)
```


```{r}
seasonx<-2
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) 
```


```{r}
hist(bici_data$atemp)
```

```{r}
hist(bici_data$hum)
```



```{r}
bici_data %>% 
  ggplot(aes(atemp))+
  geom_histogram(bins=30, aes(y = ..density..) )+
  geom_rug()+
  geom_density()
```


```{r}
bici_data %>% 
  ggplot(aes(hum))+
  geom_histogram(bins=30, aes(y = ..density..) )+
  geom_rug()+
  geom_density()
```

## Contour plot: spring


```{r}
seasonx<-1
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()
  
```


```{r}
bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)


```

## Heatmap: spring

```{r}
cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)

cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)

cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)


image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

```


## Contour plot: summer

```{r}
seasonx<-2
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()
  
```

```{r}
bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)


```

## Heatmap: summer

```{r}
cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)

cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)

cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)


image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

```

## Contour plot: fall

```{r}
seasonx<-3
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()
  
```

```{r}
bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)


```

## Heatmap: fall

```{r}
cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)

cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)

cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)


image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

```

## Contour plot: winter

```{r}
seasonx<-4
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum)
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) %>%
  ggplot( aes(atemp, hum) ) +
  geom_raster(aes(fill=cnt), interpolate = FALSE)+
  geom_point(size=0.1)+
  geom_density_2d()+
  geom_rug()
  
```

```{r}
bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)


```

## Heatmap: winter

```{r}
cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)

cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)

cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)


image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

```

```{r}
str(bici_density)


```

```{r}
# Estimate P(atemp >= 0.5, hum <= 0.3) for winter by summing normalized grid cells
filterX<-bici_density$x>=0.5
filterY<-bici_density$y<=0.3
bici_density$z[filterX,filterY] %>% sum()
```

```{r}
# Estimate P(0.4 <= atemp <= 0.8, 0.6 <= hum <= 0.85) for winter
filterX<-bici_density$x>=0.4 & bici_density$x<=0.8
filterY<-bici_density$y>=0.6 & bici_density$y<=0.85
bici_density$z[filterX,filterY] %>% sum()
```
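
The two chunks above approximate $P(a_1 \leq Y_1 \leq b_1,\, a_2 \leq Y_2 \leq b_2)$ by summing cells of the normalized `kde2d()` grid, where rows of `z` correspond to `x` and columns to `y`. The same computation can be wrapped in a small helper. The function `rect_prob` and the synthetic beta-distributed data below are hypothetical, introduced only so the chunk runs without `hour.csv`.

```{r}
library(MASS)

# Hypothetical helper (not part of the original lab): probability that the
# pair falls in [x_lo, x_hi] x [y_lo, y_hi], estimated from a kde2d() grid
rect_prob <- function(dens, x_lo = -Inf, x_hi = Inf, y_lo = -Inf, y_hi = Inf) {
  cell <- dens$z / sum(dens$z)               # cell probabilities that sum to 1
  in_x <- dens$x >= x_lo & dens$x <= x_hi    # rows of z index x
  in_y <- dens$y >= y_lo & dens$y <= y_hi    # columns of z index y
  sum(cell[in_x, in_y, drop = FALSE])
}

# Synthetic stand-in for (atemp, hum); both are assumed to lie in [0, 1]
set.seed(1)
x <- rbeta(500, 4, 4)
y <- rbeta(500, 5, 3)
dens <- kde2d(x, y, n = 100)

total <- rect_prob(dens)                     # whole grid
box   <- rect_prob(dens, x_lo = 0.4, x_hi = 0.8, y_lo = 0.6, y_hi = 0.85)
c(total = total, box = box)
```

With the lab data, `rect_prob(bici_density, 0.4, 0.8, 0.6, 0.85)` should reproduce the second sum above, since the helper applies the same filters to the same grid.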



```{r}
bici_data<-
  bici %>%
  dplyr::select(season,atemp,hum,cnt)%>%
  filter(season==4)
##bici_density$y
x<-matrix(rep(bici_density$y,3),nrow = 3, byrow = TRUE)
sum(x*bici_density$y)
```

```{r}
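library(expm)
# Transition matrix of a 4-state Markov chain; states 1 and 4 are absorbing
p<-rbind(c(1,0,0,0),c(0.5,0,0.5,0),c(0,0.5,0,0.5),c(0,0,0,1))
# State distribution after 20 steps when starting in state 3
c(0,0,1,0) %*% (p%^%20)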
```
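
The 20-step result above is already close to its limit, because states 1 and 4 are absorbing. As a cross-check, the limiting absorption probabilities can be obtained in base R by solving a small linear system. This is a sketch of the standard absorbing-chain calculation, not something taken from the class notes, and the names `Q`, `R` and `B` are illustrative.

```{r}
# Same transition matrix as in the chunk above
p <- rbind(c(1, 0, 0, 0), c(0.5, 0, 0.5, 0), c(0, 0.5, 0, 0.5), c(0, 0, 0, 1))

transient <- c(2, 3)                   # states that are eventually left
absorbing <- c(1, 4)                   # states that are never left

Q <- p[transient, transient]           # transient -> transient block
R <- p[transient, absorbing]           # transient -> absorbing block

# Absorption probabilities: B = (I - Q)^{-1} R
B <- solve(diag(length(transient)) - Q) %*% R
dimnames(B) <- list(start = transient, absorbed_in = absorbing)
B
```

The row for starting state 3 gives 1/3 for state 1 and 2/3 for state 4, in line with the 20-step values 0.333333 and 0.666666.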

