visualization

library(tidyverse)  # ggplot2 (with the diamonds data) plus dplyr for %>% and filter()

diamonds %>%
  ggplot(aes(x = cut)) +
  geom_bar()

diamonds %>%
  ggplot(aes(x = carat)) +
  geom_histogram(binwidth = 0.5)

diamonds %>%
  filter(carat < 3) %>%
  ggplot(aes(x = carat)) +
  geom_histogram(binwidth = 0.5)

diamonds %>%
  ggplot(aes(x = carat, color = cut)) +
  geom_freqpoly()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
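
The message above is ggplot2 noting that it fell back to its default of 30 bins. A minimal sketch of setting the bin width explicitly, assuming dplyr and ggplot2 are installed; the value `0.1` is an illustrative choice and does not come from the original analysis:

```r
library(dplyr)
library(ggplot2)

# binwidth = 0.1 is an assumed value; try several and compare the shapes
p <- diamonds %>%
  ggplot(aes(x = carat, color = cut)) +
  geom_freqpoly(binwidth = 0.1)

p
```

With an explicit `binwidth`, the `stat_bin()` message is no longer printed.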

## typical values

diamonds %>%
    # keep only diamonds smaller than 3 carats
    filter(carat < 3) %>%
    # plot with narrow bins to show clustering at common carat sizes
    ggplot(aes(x = carat)) +
    geom_histogram(binwidth = 0.01)

faithful %>%
    ggplot(aes(x = eruptions)) +
    geom_histogram(binwidth = 0.25)

## unusual values

diamonds %>%
    ggplot(aes(x = y)) +
    geom_histogram() +
    coord_cartesian(ylim = c(0, 50))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
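
Zooming the count axis to 0-50 makes rare bars visible, but it does not show which rows they are. A minimal sketch of listing those rows, assuming the same thresholds (`y < 3` or `y > 20`) that the missing-values code uses; the name `unusual` is introduced here for illustration only:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# keep only the rows whose y dimension looks implausible
unusual <- diamonds %>%
  filter(y < 3 | y > 20) %>%
  select(price, x, y, z) %>%
  arrange(y)

unusual
```

Looking at `price`, `x`, and `z` next to `y` helps judge whether a value is a data-entry error or a genuinely unusual diamond.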

## missing values

diamonds %>%
  # replace implausible y values with NA instead of dropping the rows
  mutate(y_rev = ifelse(y < 3 | y > 20, NA, y)) %>%
  ggplot(aes(x = x, y = y_rev)) +
  geom_point()
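
Replacing suspect values with `NA` keeps every row, so it is worth checking how many values were affected. A minimal sketch, assuming the same thresholds as above; `diamonds2` and `n_missing` are names introduced here for illustration:

```r
library(dplyr)
library(ggplot2)

# recode implausible y values as NA, keeping all rows
diamonds2 <- diamonds %>%
  mutate(y_rev = ifelse(y < 3 | y > 20, NA, y))

# number of values that were replaced with NA
n_missing <- sum(is.na(diamonds2$y_rev))
n_missing

# na.rm = TRUE drops the NA rows silently instead of emitting a warning
p <- ggplot(diamonds2, aes(x = x, y = y_rev)) +
  geom_point(na.rm = TRUE)

p
```

Without `na.rm = TRUE`, `geom_point()` still draws the plot but warns about the rows it removed.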

## covariation

diamonds %>%
    ggplot(aes(x = cut, y = price)) +
    geom_boxplot()

diamonds %>%
    count(color, cut) %>%
    ggplot(aes(x = color, y = cut, fill = n)) +
    geom_tile()
