Definition: joint probability function (PMF)

A bivariate distribution for two discrete random variables \(Y_1\) and \(Y_2\) has the form \[p(y_1,y_2)=P(Y_1=y_1,Y_2=y_2)\] that is, it assigns a probability to each pair of values of the two random variables.

Properties

  1. \(p(y_1,y_2)\geq 0, \forall (y_1,y_2)\)
  2. \(\sum_{y_1}\sum_{y_2}p(y_1,y_2)=1\)
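
As a quick check of these two properties, the sketch below builds a small joint probability table in R. The table values and the name `p_joint` are made up for illustration; they do not come from the class notes.

```r
# Hypothetical joint pmf of (Y1, Y2): rows index y1 in {0, 1}, columns index y2 in {0, 1, 2}.
p_joint <- matrix(c(0.10, 0.20, 0.10,
                    0.25, 0.15, 0.20),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(y1 = c("0", "1"), y2 = c("0", "1", "2")))

# Property 1: every entry is non-negative.
all_nonneg <- all(p_joint >= 0)

# Property 2: the entries add up to 1 over all (y1, y2) pairs.
total <- sum(p_joint)

# A single joint probability, P(Y1 = 1, Y2 = 2).
p_1_2 <- p_joint["1", "2"]
```

Any table that fails either check is not a valid joint probability function.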

Definition: CDF

The joint cumulative distribution function is defined as \[F(y_1,y_2)=P(Y_1 \leq y_1, Y_2 \leq y_2)\]

Discrete case

\[F(y_1,y_2)=\sum_{t_1 \leq y_1}\sum_{t_2 \leq y_2}p(t_1,t_2)\]

Continuous case

\[F(y_1,y_2)=\int_{-\infty}^{y_1}\int_{-\infty}^{y_2}f(t_1,t_2)\,dt_2\,dt_1\]

Properties

  1. \(F(-\infty,-\infty)=F(-\infty,y_2)=F(y_1,-\infty)=0\)
  2. \(F(\infty, \infty)=1\)
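
For the discrete case, the definition and the two properties can be illustrated with a short R sketch. The probability table and the helper `joint_cdf` are hypothetical, chosen only to make the sums easy to follow.

```r
# Hypothetical joint pmf (illustrative values, not from the notes).
y1_vals <- c(0, 1)
y2_vals <- c(0, 1, 2)
p_joint <- matrix(c(0.10, 0.20, 0.10,
                    0.25, 0.15, 0.20),
                  nrow = 2, byrow = TRUE)

# F(y1, y2) = P(Y1 <= y1, Y2 <= y2): add the pmf over every support point
# whose coordinates do not exceed (y1, y2).
joint_cdf <- function(y1, y2) {
  sum(p_joint[y1_vals <= y1, y2_vals <= y2, drop = FALSE])
}

joint_cdf(0, 1)      # 0.10 + 0.20
joint_cdf(-Inf, 2)   # no support point qualifies, so the sum is 0
joint_cdf(Inf, Inf)  # the whole table
```

The `drop = FALSE` argument keeps the selection a matrix even when it is empty or has a single row, so `sum()` behaves the same in every case.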

Class example

The management of a fast-food restaurant is interested in the joint behavior of the random variables \(Y_1\), defined as the total time between a customer's arrival at the store and departure from the service window, and \(Y_2\), the time a customer waits in line before reaching the service window. Because \(Y_1\) includes the time a customer waits in line, we must have \(Y_1 \geq Y_2\). The relative frequency distribution of observed values of \(Y_1\) and \(Y_2\) can be modeled by the probability density function \[f(y_1,y_2)=\begin{cases} e^{-y_1} & \text{ , } 0 \leq y_2 \leq y_1 < \infty \\ 0 & \text{ , elsewhere} \end{cases}\]

with time measured in minutes. Find

  1. \(P(Y_1<2, Y_2>1)\)
  2. \(P(Y_1 \geq 2Y_2)\)
  3. \(P(Y_1-Y_2 \geq 1)\) (Note that \(Y_1-Y_2\) denotes the time spent at the service window.)
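
One possible way to work the three parts is to integrate the density over the region described by each event, always keeping the restriction \(0 \leq y_2 \leq y_1\). This is a sketch and not necessarily the derivation given in class.

\[P(Y_1<2,\,Y_2>1)=\int_{1}^{2}\int_{1}^{y_1}e^{-y_1}\,dy_2\,dy_1=\int_{1}^{2}(y_1-1)e^{-y_1}\,dy_1=\Big[-y_1e^{-y_1}\Big]_{1}^{2}=e^{-1}-2e^{-2}\approx 0.0972\]

\[P(Y_1\geq 2Y_2)=\int_{0}^{\infty}\int_{0}^{y_1/2}e^{-y_1}\,dy_2\,dy_1=\int_{0}^{\infty}\frac{y_1}{2}e^{-y_1}\,dy_1=\frac{1}{2}\]

\[P(Y_1-Y_2\geq 1)=\int_{1}^{\infty}\int_{0}^{y_1-1}e^{-y_1}\,dy_2\,dy_1=\int_{1}^{\infty}(y_1-1)e^{-y_1}\,dy_1=\Big[-y_1e^{-y_1}\Big]_{1}^{\infty}=e^{-1}\approx 0.3679\]

The first and third parts use the antiderivative \(\int (y_1-1)e^{-y_1}\,dy_1=-y_1e^{-y_1}\), which can be checked by differentiating.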

R lab

We will use the bike-sharing demand dataset to show how the theory learned in class is applied in practice. The dataset is available on the GES.

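# Read the hourly bike-sharing data.
# na.strings expects a character vector (not FALSE); "NA" is the default marker.
bici <- read.csv("hour.csv",
                 na.strings = "NA",
                 strip.white = TRUE)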
colnames(bici)
 [1] "instant"    "dteday"     "season"     "yr"         "mnth"       "hr"         "holiday"   
 [8] "weekday"    "workingday" "weathersit" "temp"       "atemp"      "hum"        "windspeed" 
[15] "casual"     "registered" "cnt"       

dteday, hr - date and hour of the observation

season - 1 = spring, 2 = summer, 3 = fall, 4 = winter

holiday - whether the day is considered a holiday

workingday - whether the day is neither a weekend nor a holiday

weathersit - weather situation:
  1: Clear, Few clouds, Partly cloudy
  2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

temp - temperature, normalized to the interval [0, 1] (original scale: Celsius)

atemp - "feels like" temperature, normalized to the interval [0, 1] (original scale: Celsius)

hum - relative humidity, normalized to the interval [0, 1]

windspeed - wind speed

casual - number of non-registered user rentals initiated

registered - number of registered user rentals initiated

cnt - number of total rentals

library(dplyr)
package ‘dplyr’ was built under R version 3.3.3
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union
library(ggplot2)
library(MASS)

Attaching package: ‘MASS’

The following object is masked from ‘package:dplyr’:

    select
seasonx<-2
bici_data<-
bici %>%  
  dplyr::select(season, atemp, hum, cnt) %>% 
  filter(season == seasonx) %>%
  group_by(atemp,hum) 
hist(bici_data$atemp)

hist(bici_data$hum)

bici_data %>% 
  ggplot(aes(atemp))+
  geom_histogram(bins=30, aes(y = ..density..) )+
  geom_rug()+
  geom_density()

bici_data %>% 
  ggplot(aes(hum))+
  geom_histogram(bins=30, aes(y = ..density..) )+
  geom_rug()+
  geom_density()

Contour plot: spring

seasonx <- 1  # 1 = spring
# Keep this season's observations of atemp, hum and cnt
bici_data <-
  bici %>%
  dplyr::select(season, atemp, hum, cnt) %>%
  filter(season == seasonx)
# (atemp, hum) points with 2D density contours and marginal rugs
bici_data %>%
  ggplot(aes(atemp, hum)) +
  geom_raster(aes(fill = cnt), interpolate = FALSE) +
  geom_point(size = 0.1) +
  geom_density_2d() +
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap spring

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

Contour plot: summer

seasonx <- 2  # 2 = summer
# Keep this season's observations of atemp, hum and cnt
bici_data <-
  bici %>%
  dplyr::select(season, atemp, hum, cnt) %>%
  filter(season == seasonx)
# (atemp, hum) points with 2D density contours and marginal rugs
bici_data %>%
  ggplot(aes(atemp, hum)) +
  geom_raster(aes(fill = cnt), interpolate = FALSE) +
  geom_point(size = 0.1) +
  geom_density_2d() +
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap summer

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

Contour plot: fall

seasonx <- 3  # 3 = fall
# Keep this season's observations of atemp, hum and cnt
bici_data <-
  bici %>%
  dplyr::select(season, atemp, hum, cnt) %>%
  filter(season == seasonx)
# (atemp, hum) points with 2D density contours and marginal rugs
bici_data %>%
  ggplot(aes(atemp, hum)) +
  geom_raster(aes(fill = cnt), interpolate = FALSE) +
  geom_point(size = 0.1) +
  geom_density_2d() +
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap fall

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

Contour plot: winter

seasonx <- 4  # 4 = winter
# Keep this season's observations of atemp, hum and cnt
bici_data <-
  bici %>%
  dplyr::select(season, atemp, hum, cnt) %>%
  filter(season == seasonx)
# (atemp, hum) points with 2D density contours and marginal rugs
bici_data %>%
  ggplot(aes(atemp, hum)) +
  geom_raster(aes(fill = cnt), interpolate = FALSE) +
  geom_point(size = 0.1) +
  geom_density_2d() +
  geom_rug()

bici_density <- kde2d(bici_data$atemp,bici_data$hum, n=100)
bici_density$z <- bici_density$z/sum( bici_density$z) 
sum(bici_density$z)
[1] 1

Heatmap winter

cols1 <- colorRampPalette(c("red", "white", "blue"),
                                    space = "Lab")(256)
cols2 <- colorRampPalette(c("#FFFFD4", "#FED98E", "#FE9929", "#D95F0E", "#993404"),space="Lab")(256)
cols3<-colorRampPalette(c("black","blue","green","orange","red"))(1000)
image(bici_density$z,  
      col = cols3, 
      zlim=c(min(bici_density$z), max(bici_density$z)))

str(bici_density)
List of 3
 $ x: num [1:100] 0.151 0.157 0.162 0.168 0.173 ...
 $ y: num [1:100] 0.16 0.168 0.177 0.185 0.194 ...
 $ z: num [1:100, 1:100] 3.38e-15 7.71e-15 1.66e-14 3.36e-14 6.44e-14 ...
# Estimated P(atemp >= 0.5, hum <= 0.3): add the grid cells inside the region
filterX<-bici_density$x>=0.5
filterY<-bici_density$y<=0.3
bici_density$z[filterX,filterY] %>% sum()
[1] 0.003170821
# Estimated P(0.4 <= atemp <= 0.8, 0.6 <= hum <= 0.85)
filterX<-bici_density$x>=0.4 & bici_density$x<=0.8
filterY<-bici_density$y>=0.6 & bici_density$y<=0.85
bici_density$z[filterX,filterY] %>% sum()
[1] 0.2296997
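
The two region probabilities above follow one pattern: normalize the `kde2d` grid so its cells sum to 1, then add the cells inside a rectangle. The sketch below wraps that pattern in a helper. The name `rect_prob` is hypothetical, and the data are synthetic stand-ins drawn with `rbeta`, not the bike dataset, so the numbers will differ from those shown in the lab.

```r
library(MASS)  # provides kde2d

set.seed(1)
# Synthetic stand-ins for atemp and hum (NOT the bike data): two variables in (0, 1).
x <- rbeta(500, 4, 4)
y <- rbeta(500, 5, 3)

# Kernel density estimate on a 100 x 100 grid, rescaled so the cells sum to 1.
# Rows of dens$z correspond to dens$x, columns to dens$y.
dens <- kde2d(x, y, n = 100)
dens$z <- dens$z / sum(dens$z)

# Hypothetical helper: approximate P(x_lo <= X <= x_hi, y_lo <= Y <= y_hi)
# by adding the probabilities of the grid cells inside the rectangle.
rect_prob <- function(d, x_lo = -Inf, x_hi = Inf, y_lo = -Inf, y_hi = Inf) {
  in_x <- d$x >= x_lo & d$x <= x_hi
  in_y <- d$y >= y_lo & d$y <= y_hi
  sum(d$z[in_x, in_y, drop = FALSE])
}

rect_prob(dens)                          # whole grid
rect_prob(dens, x_lo = 0.5, y_hi = 0.3)  # P(X >= 0.5, Y <= 0.3)
rect_prob(dens, 0.4, 0.8, 0.6, 0.85)     # P(0.4 <= X <= 0.8, 0.6 <= Y <= 0.85)
```

Because the grid only covers the observed range of the data and has finite resolution, these sums are approximations to the probabilities under the estimated density.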
bici_data<-
  bici %>%
  dplyr::select(season,atemp,hum,cnt)%>%
  filter(season==4)
##bici_density$y
x<-matrix(rep(bici_density$y,3),nrow = 3, byrow = TRUE)
sum(x*bici_density$y)
[1] 106.9188
library(expm)
Loading required package: Matrix

Attaching package: ‘expm’

The following object is masked from ‘package:Matrix’:

    expm
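# Transition matrix of a four-state Markov chain; states 1 and 4 are absorbing
p<-rbind(c(1,0,0,0),c(0.5,0,0.5,0),c(0,0.5,0,0.5),c(0,0,0,1))
w<-matrix(p,nrow = 4)  # copy of p, not used below
# Distribution after 20 steps when the chain starts in state 3
c(0,0,1,0) %*% (p%^%20)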
         [,1] [,2]         [,3]     [,4]
[1,] 0.333333    0 9.536743e-07 0.666666
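
The same 20-step distribution can be reproduced without the `expm` package by multiplying the matrix by itself in a loop. The helper `mat_power` below is a hypothetical name introduced for this sketch; the matrix is the one from the lab.

```r
# Transition matrix copied from the lab: states 1 and 4 are absorbing.
p <- rbind(c(1,   0,   0,   0),
           c(0.5, 0,   0.5, 0),
           c(0,   0.5, 0,   0.5),
           c(0,   0,   0,   1))

# Hypothetical helper: n-th matrix power by repeated multiplication (base R only).
mat_power <- function(m, n) {
  result <- diag(nrow(m))  # identity matrix, i.e. m to the power 0
  for (i in seq_len(n)) result <- result %*% m
  result
}

start <- c(0, 0, 1, 0)  # the chain starts in state 3
dist_20 <- start %*% mat_power(p, 20)
dist_20

# Each row of a transition matrix sums to 1, so the 20-step distribution does too.
sum(dist_20)
```

Starting from state 3, the chain can only be back in state 3 after an even number of steps, each round trip having probability 0.25, which is why the third entry equals 0.25^10 and the remaining mass is split between the absorbing states in the ratio 1 to 2.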
