Introducción

El siguiente documento nos servirá para aprender a usar KnitR y analizar el dataframe de diamantes que viene con ggplot2.

Instalando ggplot2

Para instalar ggplot2 usamos la instrucción install.packages("ggplot2").

library(ggplot2)

Dataset diamonds

head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
str(diamonds)
## 'data.frame':    53940 obs. of  10 variables:
##  $ carat  : num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
summary(diamonds)
##      carat               cut        color        clarity     
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655  
##                                     J: 2808   (Other): 2531  
##      depth           table           price             x         
##  Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
##  1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
##  Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
##  Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
##  3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
##  Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
##                                                                  
##        y                z         
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.720   1st Qu.: 2.910  
##  Median : 5.710   Median : 3.530  
##  Mean   : 5.735   Mean   : 3.539  
##  3rd Qu.: 6.540   3rd Qu.: 4.040  
##  Max.   :58.900   Max.   :31.800  
## 

Resumen del dataset

Variable Explicación Tipo
caract Kilates del diamante numerico
price Precio del diamante numerico entero
Clarity Claridad del diamante Ordinal

Analisis de la variable price vs carat

Estetica (Aesthetics)

qplot(carat,price,data=diamonds) ##Grafica basica

qplot(carat,price,data=diamonds,colour=color)

qplot(carat,price,data=diamonds,colour=color,shape=cut)

qplot(carat,price,data=diamonds,colour=color,shape=cut, size=x*y*z)

qplot(log(carat),log(price),data=diamonds,colour=color,shape=cut, size=x*y*z)

Facetting

Row ~ Column

qplot(carat,price,data=diamonds,facets=.~color)

qplot(carat,price,data=diamonds,facets=color~.)

qplot(carat,price,data=diamonds,facets=clarity~color)

Usando ggplot2()

ggplot(diamonds, aes(x = carat,y=price) ) +
        geom_point()

ggplot(diamonds, aes(x = carat,y=price) ) +
        geom_point() +
        ggtitle("Grafica kilate vs precio")

ggplot(diamonds, aes(x = carat,y=price,colour=color) ) +
        geom_point() +
        ggtitle("Grafica kilate vs precio")+
        xlab("Kilates") +
        ylab("Precio")

ggplot(diamonds, aes(x=price, y=..density..))+
        geom_histogram(fill="lightyellow",colour="grey60",size=0.2)+
        geom_density(fill="blue",alpha=0.2) 
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

ggplot(diamonds, aes(x=log(price), y=..density..))+
        geom_histogram(fill="yellow",colour="grey60",size=0.2)+
        geom_density(fill="blue",alpha=0.2) 
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.