2015-07-15

About Data Science HC

Why use Markdown

  • Focus on writing (formatting is simplified)
  • Markdown documents
    • No HTML tags, so the source stays readable
    • Inline HTML is still available when needed
    • Pandoc's Markdown adds even more features (see also)

Why use R

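R is developed by some of the world's leading statisticians

R produces high-quality visualizations

R is remarkably flexible and powerful

R integrates easily with other tools

R is easy to extend and customize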

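As a data analyst, what matters most is:

  • A comfortable environment for writing analysis reports
  • No more tedious manual work when the same report has to be produced again and again
  • Results that others can easily verify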

Why R + Markdown is cool

  • Markdown + embedded R code chunks + LaTeX or MathML
  • Rmd -> md -> html (docx, pdf)
  • CheatSheet
  • Recent versions of RStudio already include R Markdown support
  • You can also install the rmarkdown package with:
install.packages("rmarkdown")
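The Rmd -> md -> html chain above is really two steps: knitr runs the R code and writes plain Markdown, then pandoc converts that Markdown to the final format. The sketch below runs the two steps separately on a throwaway file (the name pipeline.Rmd and its content are hypothetical), assuming knitr is installed; the pandoc step runs only if rmarkdown and pandoc are available.

```r
# Step 0: a throwaway Rmd file (hypothetical name and content).
rmd <- file.path(tempdir(), "pipeline.Rmd")
writeLines(c(
  "Mean distance: `r mean(cars$dist)`",
  "",
  "```{r}",
  "summary(cars$dist)",
  "```"
), rmd)

if (requireNamespace("knitr", quietly = TRUE)) {
  # Step 1 (knitr): evaluate chunks and inline code, write plain Markdown.
  md <- knitr::knit(rmd, output = file.path(tempdir(), "pipeline.md"),
                    quiet = TRUE)
  cat(readLines(md), sep = "\n")

  # Step 2 (pandoc): convert the Markdown to HTML, if pandoc is available.
  if (requireNamespace("rmarkdown", quietly = TRUE) &&
      rmarkdown::pandoc_available()) {
    rmarkdown::pandoc_convert(md, to = "html",
                              output = file.path(tempdir(), "pipeline.html"))
  }
}
```

In day-to-day use, rmarkdown::render() performs both steps in a single call.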

Overview

Markdown


R Code Chunks


Inline R Code and Equations

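  • Use `r expr` to insert the result of R code into Markdown text
  • Insert LaTeX equations:
    • Inline: $equation$ (no space right after the opening $ or before the closing $)
    • Display, as its own paragraph: $$equation$$

  • This is the 10th course offered by DSHC
  • The entropy index is \(-\sum_i p_i \log p_i\), which measures the disorder of a system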
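As a quick check of the entropy formula, here is a minimal R sketch. The function name entropy is our own illustration, not part of any package, and it uses the natural log; zero probabilities are dropped because p log p tends to 0 as p tends to 0.

```r
# Shannon entropy H = -sum(p_i * log(p_i)) of a discrete distribution p.
entropy <- function(p) {
  stopifnot(all(p >= 0), isTRUE(all.equal(sum(p), 1)))
  p <- p[p > 0]          # drop zero-probability outcomes
  -sum(p * log(p))
}

entropy(c(0.5, 0.5))                       # log(2): most disorder for 2 outcomes
entropy(c(1, 0))                           # 0: no disorder at all
entropy(table(iris$Species) / nrow(iris))  # three equally likely species: log(3)
```

Inside an Rmd file the same value could be placed in running text with inline code such as `r round(entropy(c(0.5, 0.5)), 3)`.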

Rendering Output

  • RStudio: "Knit" command (Ctrl+Shift+K)
  • Command line: rmarkdown::render function
rmarkdown::render("input.Rmd")
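A minimal sketch of calling rmarkdown::render() from a script, assuming the rmarkdown package and pandoc are installed (the code skips rendering otherwise). The temporary file and its content are illustrative only; output_format selects the target format, and render() returns the path of the generated file.

```r
# Write a tiny Rmd file to a temporary directory (illustrative content).
rmd <- file.path(tempdir(), "input.Rmd")
writeLines(c(
  "---",
  "title: \"Render demo\"",
  "output: html_document",
  "---",
  "",
  "Mean stopping distance: `r mean(cars$dist)` ft."
), rmd)

# Rendering needs the rmarkdown package plus pandoc; skip otherwise.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # "word_document" or "pdf_document" would work the same way
  # (PDF output additionally needs a LaTeX installation).
  out <- rmarkdown::render(rmd, output_format = "html_document", quiet = TRUE)
  print(out)  # path of the generated HTML file
}
```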

Markdown Quick Reference

In RStudio, open Help (?) in the UI to look up the Markdown syntax.

R Code Chunks Overview

R code will be evaluated and printed

```{r}
summary(cars$dist)
```
summary(cars$dist)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   26.00   36.00   42.98   56.00  120.00

Named R code chunk.

```{r plot}
summary(cars)
plot(cars)
```
  • Named chunks are easy to navigate in RStudio

Basic Chunk Options

  • echo (TRUE): whether to include the R source code in the output file
  • eval (TRUE): whether to evaluate the code chunk
  • message (TRUE): whether to keep messages emitted by message()
  • include (TRUE): whether to write the chunk into the output document; with FALSE nothing appears, but the code is still evaluated and plot files are still generated
  • warning (TRUE): whether to keep warnings in the output
  • comment ("##"): the prefix added to each line of text output
  • results ('markup'): 'hide' hides the text output; 'asis' writes the output as raw Markdown (useful for functions that emit Markdown, such as knitr::kable())
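To see what results='asis' changes: knitr::kable() returns a Markdown table as plain text. When those lines are written with cat(), as below, a chunk with header {r, results='asis'} passes them through as raw Markdown and pandoc renders a real table, whereas the default results='markup' would show the raw lines inside an output block. (Recent knitr versions already print a bare kable() result as-is.) A sketch, assuming knitr is installed:

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # kable() builds a Markdown (pipe) table from the first rows of cars.
  md <- knitr::kable(head(cars, 3), format = "markdown")
  # With results='asis' these lines become part of the document itself.
  cat(md, sep = "\n")
}
```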

Set global chunk options in a code chunk, usually the first one in the document:

knitr::opts_chunk$set(echo=FALSE, results='hide')
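A sketch of how global options behave, assuming knitr is installed: opts_chunk$set() changes the defaults that every later chunk inherits, opts_chunk$get() reads them back, and opts_chunk$restore() resets them. In an actual .Rmd file the set() call typically sits in a first chunk with header {r setup, include=FALSE} so that it does not appear in the output; any chunk can still override a default in its own header, e.g. {r, echo=TRUE}.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Change the defaults for all subsequent chunks.
  knitr::opts_chunk$set(echo = FALSE, results = "hide", fig.width = 6)
  str(knitr::opts_chunk$get(c("echo", "results", "fig.width")))

  # Reset to knitr's built-in defaults (echo = TRUE, results = "markup", ...).
  knitr::opts_chunk$restore()
}
```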

Formats (Rmd to XXX)

  • Documents
    • HTML (.html)
    • Word (.docx)
    • PDF (.pdf)
    • Markdown (.md)
  • Presentations
    • ioslides (.html, ioslides_presentation)
    • slidy (.html, slidy_presentation)
    • beamer (.pdf, beamer_presentation)
  • R notebooks (.R -> (.md) -> .XXX)

Example 1. Documents

Copy and paste the following text and save it as example.Rmd:
---
title: "Example 1"
output: html_document
---
Using Fisher's iris data set and one simple command, 
we can produce the following plot:

```{r}
library(ggplot2)
qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
```
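Since ggplot2 3.4.0, qplot() is deprecated. The same plot written with ggplot() + geom_point() looks like this; a sketch, assuming ggplot2 is installed, that could replace the qplot() line inside the chunk above:

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Map the same three variables as the qplot() call, then add points.
  p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
    geom_point()
  print(p)
}
```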

Example 2. Presentations

---
title: "Example 2"
output: slidy_presentation
---
##  Learn ggplot2 by example
Try Fisher’s iris data set
```{r}
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
```


##  Learn ggplot2 by example 2
Differentiate Species by color
```{r}
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, 
                      color=Species)) + geom_point(size=3)
```

Example 3. R notebooks

Save the following script as test.R:
#' Density Curve of Sepal Width
library(ggplot2)
density2 <- ggplot(data = iris, aes(x = Sepal.Width, fill = Species))
density2 + geom_density(alpha = 0.2) +
  xlab("Sepal Width") + ylab("Density")

Then render it into a report:
rmarkdown::render("test.R")
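Rendering an .R file relies on knitr::spin(): lines starting with #' become Markdown text, and a line starting with #+ sets options for the code chunk that follows. A sketch, assuming knitr is installed, that converts a small script (the chunk label density-plot is hypothetical) into its intermediate Rmd text without running any code:

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  script <- c(
    "#' Density Curve of Sepal Width",
    "#+ density-plot, fig.width=6",   # hypothetical label and option
    "summary(iris$Sepal.Width)"
  )
  # knit = FALSE stops after the conversion and returns the Rmd text.
  rmd_text <- knitr::spin(text = script, knit = FALSE)
  cat(rmd_text, sep = "\n")
}
```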

References