---
title: "Worksheet 4, Chapter 3"
output: html_notebook
---
```{r}
#install.packages("tidyverse")
library(tidyverse)
```

# 3.2.4 Exercises

 - 1. Run ggplot(data = mpg). What do you see?
 - 2. How many rows are in mpg? How many columns?
 - 3. What does the drv variable describe? Read the help for ?mpg to find out.
 - 4. Make a scatterplot of hwy vs cyl.
 - 5. What happens if you make a scatterplot of class vs drv? Why is the plot not useful?

3.2.4.1 The plot is empty. `ggplot(data = mpg)` only creates a blank canvas; nothing is drawn until a geom layer is added.
```{r}
ggplot(data = mpg)
```

3.2.4.2 
```{r}
print( c("Row:",nrow(mpg),"Cols:",ncol(mpg)) )
```
3.2.4.3 
```{r}
print("f = front-wheel drive, r = rear wheel drive, 4 = 4wd")
```
3.2.4.4

```{r}
ggplot(mpg, aes(x = hwy, y = cyl)) +  geom_point()
```

3.2.4.5
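The plot does not produce enough distinct points to be useful. Both variables are categorical, so the cars can only fall on a few class/drv combinations and are drawn on top of each other; the plot cannot show how many cars each combination contains.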
```{r}
ggplot(mpg, aes(x = class, y = drv)) +  geom_point()
```

# 3.3.1 Exercises
3.3.1.1 What's gone wrong with this code? Why are the points not blue?
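Because `color = "blue"` sits inside `aes()`, ggplot2 treats "blue" as a data value: every point gets the same label, "blue", and the default color scale assigns a color to that label. To paint the points blue, `color = "blue"` has to be set outside `aes()`.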
```{r}
ggplot(data = mpg) +  
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```
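
A minimal sketch of the corrected call, with the color moved out of `aes()` so it is applied as a fixed value. The object name `p_blue` is only illustrative and is not part of the original exercise.

```{r}
library(ggplot2)

# A value set outside aes() is a constant for the layer, not a mapped variable.
p_blue <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
p_blue
```
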
3.3.1.2
Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?
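The `class` variable is categorical, which is why it can be mapped to a discrete color scale below. More generally, the character columns (`manufacturer`, `model`, `trans`, `drv`, `fl`, `class`) are categorical and the numeric columns (`displ`, `year`, `cyl`, `cty`, `hwy`) are continuous. Printing `mpg` shows each column's type (`<chr>`, `<int>`, `<dbl>`) under its name.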
```{r}
ggplot(data = mpg) +  
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
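
As a quick way to check the column types without reading the printed tibble header, the following sketch lists the storage class of every column in base R. The name `col_types` is an illustrative choice.

```{r}
library(ggplot2)

# "character" columns are the categorical ones; "integer" and "numeric" columns hold numbers.
col_types <- sapply(mpg, class)
col_types
```
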
3.3.1.3 
Map a continuous variable to color, size, and shape. How do these aesthetics behave differently for categorical vs. continuous variables?
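Shape works best with categorical variables, while size is easier to read for continuous variables. Color handles both: a categorical variable gets distinct colors and a continuous one gets a gradient. Mapping a continuous variable to shape raises an error.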
```{r}
ggplot(data = mpg) +  
  # The aesthetic is called shape (not shapes) and needs a categorical variable such as drv.
  geom_point(mapping = aes(x = displ, y = hwy, color = cty, size = hwy, shape = drv))
```
3.3.1.4 What happens if you map the same variable to multiple aesthetics?
The same variable is encoded redundantly: point size grows with `displ` along the x axis, and the color changes with its value as well.
```{r}
ggplot(mpg, aes(x = displ, y = hwy, colour = displ, size = displ)) +
  geom_point()
```
3.3.1.5
What does the stroke aesthetic do? What shapes does it work with? (Hint: use ?geom_point)
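It sets the thickness of a point's outline. It is most useful with shapes 21 to 25, which have a border (`colour`) separate from their fill (`fill`).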

```{r}
ggplot(mpg, aes(x = displ, y = hwy, colour = displ)) +
  geom_point(colour = "black", fill = "white", stroke = 3, shape=25)
```
3.3.1.6
What happens if you map an aesthetic to something other than a variable name, like aes(colour = displ < 5)? Note, you'll also need to specify x and y.
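The expression is evaluated for every row and the result is mapped like any other variable. `displ < 5` yields only `TRUE` or `FALSE`, so the aesthetic becomes discrete with two levels, much like a function that returns one of two possible values.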
```{r}
# Map the logical expression to colour, as in the exercise, and keep displ on the x axis.
ggplot(mpg, aes(x = displ, y = hwy, colour = displ < 5)) +
  geom_point()
```

# 3.5.1 Exercises

3.5.1.1 What happens if you facet on a continuous variable?
Every distinct value of the continuous variable is treated as a category, so each value of `hwy` that occurs in the dataset gets its own facet.
```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +  geom_point() +  facet_grid( ~ hwy)
```


3.5.1.2 What do the empty cells in plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?
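The empty cells are combinations of `drv` and `cyl` that have no cars in the dataset. In the scatterplot below, the points fall only on the combinations that exist, so each empty facet corresponds to a position with no point.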
```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = drv, y = cyl))
```
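
One way to see the same thing in numbers is to cross-tabulate the two variables and then draw the faceted plot the exercise refers to. This is a sketch; the name `drv_cyl` is illustrative.

```{r}
library(ggplot2)

# Cross-tabulate drive type against cylinder count; zero cells are the empty facets.
drv_cyl <- table(drv = mpg$drv, cyl = mpg$cyl)
drv_cyl

# The faceted plot from the exercise: panels for zero-count combinations stay empty.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
```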

3.5.1.3 What plots does the following code make? What does . do?
The `.` marks the dimension that is not faceted. With `drv ~ .` the plot is split into rows by the categories of `drv`, so the vertical axis is repeated in each facet.
```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ .)
```
If the dot comes first, the plot is split into columns instead, so the x axis is repeated in each facet.
```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(. ~ cyl)
```
3.5.1.4 Take the first faceted plot in this section:
What are the advantages to using faceting instead of the colour aesthetic? What are the disadvantages? How might the balance change if you had a larger dataset?
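The first advantage is that I can look at the data along more than one dimension; the second is that each category's behavior is shown in its own panel, whereas with color the categories overlap and are hard to tell apart. The trade-off is that comparisons across panels are harder than within a single plot. With a larger dataset, overplotting makes color harder to read, so faceting becomes more attractive.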
```{r}
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)
```


3.5.1.5 Read ?facet_wrap. What does nrow do? What does ncol do? What other options control the layout of the individual panels? Why doesn't facet_grid() have nrow and ncol arguments?
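`nrow` sets the number of rows in the panel layout and `ncol` sets the number of columns; they do not count the rows or columns of the dataset.
Other arguments that affect the layout include `dir`, `as.table`, and `scales`.
`facet_grid()` does not need `nrow` and `ncol` because its grid is fixed by the faceting variables: one row per level of the row variable and one column per level of the column variable.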
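
A short sketch of `ncol` in use: the seven values of `class` are wrapped into four columns, so ggplot2 works out the number of rows itself. The value 4 and the name `p_wrap` are arbitrary choices for illustration.

```{r}
library(ggplot2)

# ncol fixes the number of panel columns; the number of rows follows from the panel count.
p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, ncol = 4)
p_wrap
```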


3.5.1.6 When using facet_grid() you should usually put the variable with more unique levels in the columns. Why?
It depends on how the panels need to be laid out. Putting the variable with more levels in the columns spreads the panels along the horizontal axis, where a typical wide plotting area has more room; putting it in the rows stacks many panels vertically and squeezes the y axis of each one.


# 3.6.1 Exercises

3.6.1.1 What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?
```{r}
ggplot(data = mpg) + 
  geom_line(mapping = aes(x = displ, y = hwy)) 
   
```
```{r}
ggplot(data = mpg) + 
  geom_boxplot(mapping = aes(x = displ, y = hwy, group=class)) 
   
```
```{r}
ggplot(data = mpg) + 
 geom_histogram(mapping = aes( hwy)) 
 
```
```{r}
ggplot(data = mpg) + 
 geom_area(mapping = aes(displ, hwy)) 
 
```

3.6.1.2 Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
My prediction: the points will be colored by category (`drv`), and `geom_smooth()` will probably smooth the data. After running it: I did not expect the lines; I think they are trend lines, one per `drv` group.
```{r}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + 
  geom_point() + 
  geom_smooth(se = FALSE)
```

3.6.1.3 What does show.legend = FALSE do? What happens if you remove it?
Why do you think I used it earlier in the chapter?
It suppresses the legend for that layer. If you remove it, the legend for `drv` is drawn again.
```{r}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  # show.legend is a layer argument; passing it to ggplot() has no effect.
  geom_point(show.legend = FALSE) +
  geom_smooth(show.legend = FALSE, se = FALSE)
```
3.6.1.4 What does the se argument to geom_smooth() do?
It draws a shaded band around the smoothed line: the confidence interval of the fit (95% by default), which is computed from the standard error.
```{r}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  # show.legend is a layer argument; passing it to ggplot() has no effect.
  geom_point(show.legend = FALSE) +
  geom_smooth(show.legend = FALSE, se = TRUE)
```

3.6.1.5 Will these two graphs look different? Why/why not?
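No, I see no difference. Both plots use the same data and the same mappings; the first declares them once in `ggplot()`, while the second repeats them in each layer.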
```{r}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()

ggplot() + 
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
```
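
The visual impression can be checked by comparing the data each layer actually draws. The sketch below uses `layer_data()` from ggplot2 for that; the object names are illustrative, and both entries of the result are expected to be `TRUE` if the two specifications are equivalent.

```{r}
library(ggplot2)

p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

p_local <- ggplot() +
  geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))

# layer_data() builds the plot and returns the computed data for one layer.
# suppressMessages() hides the note geom_smooth() prints about its default method.
same_points <- suppressMessages(
  all.equal(layer_data(p_global, 1), layer_data(p_local, 1))
)
same_smooth <- suppressMessages(
  all.equal(layer_data(p_global, 2), layer_data(p_local, 2))
)
c(points = isTRUE(same_points), smooth = isTRUE(same_smooth))
```
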
3.6.1.6 Recreate the R code necessary to generate the following graphs.
```{r}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(se = FALSE)


ggplot(mpg, aes(x = displ, y = hwy, group=drv)) +
  geom_point()  +
  geom_smooth(se = FALSE )


ggplot(mpg, aes(x = displ, y = hwy, group=drv, color=drv)) +
  geom_point()  +
  geom_smooth(se = FALSE )

ggplot(mpg, aes(x = displ, y = hwy, group=drv, color=drv,linetype = drv)) +
  geom_point()  +
  geom_smooth(se = FALSE )
    
```


# 3.7.1 Exercises

3.7.1.1 What is the default geom associated with stat_summary()? How could you rewrite the previous plot to use that geom function instead of the stat function?
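The default geom is `geom_pointrange()`. By default `stat_summary()` shows the mean and its standard error (`mean_se()`); to reproduce the previous plot, the median, minimum, and maximum are passed explicitly.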
```{r}
ggplot(data = diamonds) +
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary",
    # Since ggplot2 3.3.0 these arguments are named fun, fun.min and fun.max.
    fun.y = median,
    fun.ymin = min,
    fun.ymax = max
  )
```

3.7.1.2 What does geom_col() do? How is it different to geom_bar()?
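`geom_col()` takes the bar heights directly from values in the data (`stat_identity()`), so it needs both `x` and `y`. `geom_bar()` needs only `x`: its heights come from counting the rows in each group (`stat_count()`).
In the first plot below, the `depth` values are stacked within each `cut`, so the vertical axis shows the sum of `depth`; in the second it shows the number of diamonds.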
```{r}
ggplot(data = diamonds) +
  geom_col(
    mapping = aes(x = cut, y = depth)

  )

ggplot(data = diamonds) +
 geom_bar(mapping = aes(x = cut)

  )


```
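
To make the relationship concrete, the sketch below counts the diamonds per `cut` by hand and passes the result to `geom_col()`, which should give the same bars as `geom_bar(aes(x = cut))`. The names `cut_counts` and `n` are illustrative.

```{r}
library(ggplot2)

# Count rows per cut by hand; geom_bar() computes this summary through stat_count().
cut_counts <- as.data.frame(table(cut = diamonds$cut), responseName = "n")

# geom_col() plots the precomputed n as-is.
ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = n))
```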

3.7.1.3 Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?
1. geom_hex() stat_bin_hex()
2. geom_bar() stat_count()
3. geom_smooth() stat_smooth()
etc. In each pair, the geom uses that stat by default and the stat uses that geom by default.

3.7.1.4 What variables does stat_smooth() compute? What parameters control its behaviour?
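It computes `y` (the predicted value), `ymin` and `ymax` (the bounds of the confidence interval), and `se` (the standard error). Its behavior is controlled mainly by `method`, by the `formula` used for the fit (for example `y ~ x`, `y ~ poly(x, 2)`, `y ~ log(x)`), and by `se` and `level` for the interval.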


3.7.1.5 In our proportion bar chart, we need to set group = 1. Why? In other words what is the problem with these two graphs?
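The problem is how the proportion is grouped. Without `group = 1`, the proportion is computed within each group, and by default each `cut` is its own group, so every bar has a proportion of 1 and all bars reach the same height. Setting `group = 1` makes the proportions relative to the whole dataset.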
```{r}
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop..))
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))


ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group=color))
```
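
For comparison, here is a sketch of the first chart with `group = 1` set. It assumes ggplot2 3.3.0 or later, where `after_stat(prop)` replaces the older `..prop..` notation; the name `p_prop` is illustrative.

```{r}
library(ggplot2)

# group = 1 puts every row in one group, so prop is each cut's share of all diamonds.
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
p_prop
```

With a single group the bar heights add up to 1, which is what a proportion chart should show.
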
# 3.8.1 Exercises

3.8.1.1 What is the problem with this plot? How could you improve it?
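The problem is overplotting: `cty` and `hwy` are rounded, so many cars share the same coordinates and are drawn on top of each other. Adding a little random noise (jitter) separates the overlapping points and shows where the data is concentrated.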
```{r}
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point(position=position_jitter())

```
3.8.1.2 What parameters to geom_jitter() control the amount of jittering?
`width` controls how much noise is added along the horizontal axis and `height` how much along the vertical axis. Both default to `NULL`, which means 40% of the resolution of the data. `position_jitter()` also takes a `seed`, which makes the jitter reproducible.
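
A minimal sketch of these parameters in use. The values 0.4 and 42 are arbitrary choices for illustration, and `p_jitter` is an illustrative name.

```{r}
library(ggplot2)

# width and height set the maximum shift in each direction, in data units;
# seed makes the random shifts repeatable from one render to the next.
p_jitter <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.4, height = 0.4, seed = 42))
p_jitter
```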

3.8.1.3 Compare and contrast geom_jitter() with geom_count().
`geom_jitter()` slightly distorts the positions, whereas `geom_count()` leaves each point at its true position and only enlarges it according to the number of observations at that location.

```{r}
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_count()

```

3.8.1.4 What’s the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.

```{r}
# The default position is "dodge2": boxplots that share an x value are placed side by side.
ggplot(data = mpg, mapping = aes(x = drv, y = hwy, colour = class)) +
  geom_boxplot()

```