Source code of this doc (gist): https://gist.github.com/kongscn/34cdfc0585c820be3e43

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

R Markdown and R work well with multiple languages. If you are running a Unix-like OS, just use UTF-8 encoding everywhere and you’re done. If you are running Windows, well, I tried and tried but still got a lot of weird problems. Calling Sys.setlocale can fix the output, but the problem goes deeper than that. My suggestion is to throw it away and get a Mac or a Linux machine. If there’s a better workaround, please let me know in my gist.

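R Markdown can handle documents that contain Chinese (and other languages). On Unix-like systems, using UTF-8 is enough and no special setup is needed. On Windows, one tip is to set the encoding with Sys.setlocale, which fixes the output and a few other problems, but plenty of strange issues remain. Other than throwing away Windows, or giving up on multiple languages, I have no better advice. Also, setting the chunk option include=FALSE in an R Markdown document lets you run these settings without showing them.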
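
The Sys.setlocale tip can be sketched as a small setup chunk, the kind you would mark with include=FALSE so it runs without appearing in the output. This is only a sketch under assumptions: the locale names "Chinese" (Windows) and "en_US.UTF-8" (elsewhere) are illustrative choices and may not exist on a given machine, so the code checks the result instead of assuming success.

```r
# Intended for a chunk with the option include=FALSE, so it runs silently.
# Assumption: "Chinese" is accepted by Windows and "en_US.UTF-8" by
# Unix-like systems; replace both with whatever your system provides.
target <- if (.Platform$OS.type == "windows") "Chinese" else "en_US.UTF-8"

# Sys.setlocale() returns "" (with a warning) when the request fails.
result <- suppressWarnings(Sys.setlocale("LC_CTYPE", target))

if (identical(result, "")) {
  message("Locale '", target, "' is not available; still using ",
          Sys.getlocale("LC_CTYPE"))
}
```

Checking the return value matters because a failed call leaves the session in its previous locale, which is easy to miss when the chunk is hidden.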

1 Data import and cleaning

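# Read the Wuhan AQI sheet; file and column names are in Chinese (UTF-8)
aqidf = xlsx::read.xlsx('武汉AQI.xlsx',
                        sheetName="Sheet1",
                        encoding='UTF-8')
# Keep one row per date (日期)
aqidf = aqidf[!duplicated(aqidf$日期), ]
# Caution: assigning to levels() renames the existing levels in their current
# order; it does not reorder them, so check the labels against the raw data
levels(aqidf$质量等级) = c("优", "良", "轻度污染", "中度污染", "重度污染", "严重污染")

# Read the Wuhan weather sheet the same way
weatherdf = xlsx::read.xlsx('武汉天气.xlsx',
                            sheetName="Sheet1",
                            encoding='UTF-8')
weatherdf = weatherdf[!duplicated(weatherdf$日期), ]
# Drop the separate year (年), month (月) and day (日) columns
weatherdf = subset(weatherdf, select=-c(年, 月, 日))
summary(aqidf)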

##       日期               AQI指数          质量等级   当天AQI排名 
##  Min.   :2013-11-01   Min.   : 42.0   优      :25   Min.   : 16  
##  1st Qu.:2013-12-23   1st Qu.: 94.2   良      :27   1st Qu.: 86  
##  Median :2014-02-14   Median :124.5   轻度污染: 7   Median :106  
##  Mean   :2014-02-14   Mean   :146.8   中度污染:56   Mean   :115  
##  3rd Qu.:2014-04-08   3rd Qu.:189.2   重度污染:71   3rd Qu.:152  
##  Max.   :2014-05-31   Max.   :353.0   严重污染:24   Max.   :190  
##      PM2.5            PM10           Co            No2       
##  Min.   : 18.0   Min.   :  7   Min.   :0.70   Min.   : 22.0  
##  1st Qu.: 62.2   1st Qu.: 84   1st Qu.:1.05   1st Qu.: 44.2  
##  Median : 89.5   Median :134   Median :1.29   Median : 63.0  
##  Mean   :113.2   Mean   :145   Mean   :1.44   Mean   : 64.9  
##  3rd Qu.:147.8   3rd Qu.:199   3rd Qu.:1.75   3rd Qu.: 82.0  
##  Max.   :590.0   Max.   :406   Max.   :3.12   Max.   :132.0  
##       So2       
##  Min.   :  6.0  
##  1st Qu.: 28.0  
##  Median : 40.0  
##  Mean   : 46.2  
##  3rd Qu.: 62.8  
##  Max.   :112.0
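
One thing worth double-checking in the import step is the `levels(...) = c(...)` assignment. In R, assigning to levels() renames the existing levels position by position; it does not reorder them. If the factor's levels were sorted differently from the vector being assigned, the counts per grade in the summary may end up under the wrong labels. The sketch below shows the difference on made-up data; the English labels are hypothetical stand-ins for the Chinese grades, not values from the AQI file.

```r
# Hypothetical toy data; English labels stand in for the Chinese grades.
raw    <- c("good", "heavy", "light", "moderate", "light")
wanted <- c("good", "light", "moderate", "heavy")   # desired level order

# By default factor() sorts its levels: good, heavy, light, moderate
f <- factor(raw)

# (a) Assigning to levels() renames level i to wanted[i]:
#     heavy -> light, light -> moderate, moderate -> heavy
relabelled <- f
levels(relabelled) <- wanted
as.character(relabelled)   # "good" "light" "moderate" "heavy" "moderate"

# (b) factor(x, levels = ...) reorders the levels and keeps every value
reordered <- factor(raw, levels = wanted)
as.character(reordered)    # identical to raw
table(reordered)
```

If the goal is only to control the display order of the grades, form (b) is the safer choice, since it never changes which label a row carries.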

2 Plots

library(ggplot2)
p = qplot(日期, AQI指数, data=aqidf, geom="line")
p

Figure: line plot of AQI指数 (AQI index) against 日期 (date)
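
qplot() still works, but it has been deprecated in recent ggplot2 releases (3.4.0 onward), so the same line plot can also be written with ggplot() and geom_line(). The sketch below uses a made-up data frame with ASCII column names (`toy`, `date`, `aqi` are all hypothetical) so that it runs without the xlsx file; the values are synthetic and not taken from the AQI data.

```r
library(ggplot2)

# Hypothetical stand-in for aqidf; the aqi values are synthetic.
toy <- data.frame(
  date = seq(as.Date("2013-11-01"), by = "day", length.out = 30),
  aqi  = round(100 + 20 * sin(seq_len(30) / 3))
)

# Same idea as qplot(..., geom="line"), spelled out with ggplot()
p <- ggplot(toy, aes(x = date, y = aqi)) + geom_line()

# With the real data this would read:
#   ggplot(aqidf, aes(x = 日期, y = AQI指数)) + geom_line()
```

Printing `p` at the end of a chunk draws the plot, just as the bare `p` does in the original code.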

3 Stationarity test of the AQI index

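# Drop the date (日期) and quality grade (质量等级) columns
aqin = subset(aqidf, select=-c(日期, 质量等级))
# Build a daily time series indexed by date
aqixts = xts::as.xts(aqin, order.by=aqidf$日期, frequency=1)
aqi_idx = aqixts$AQI指数
# Fit an AR model to the first-differenced AQI index by maximum likelihood
ar(diff(as.ts(aqi_idx)), method='mle')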

## 
## Call:
## ar(x = diff(as.ts(aqi_idx)), method = "mle")
## 
## Coefficients:
##      1       2       3       4       5       6       7       8       9  
## -0.099  -0.462  -0.271  -0.323  -0.238  -0.226  -0.210  -0.139  -0.406  
##     10      11      12  
## -0.150  -0.155  -0.194  
## 
## Order selected 12  sigma^2 estimated as  1801

# Augmented Dickey-Fuller test with 12 lags and a constant term (type "c")
fUnitRoots::adfTest(aqi_idx, lags=12, type=c("c"))

## 
## Title:
##  Augmented Dickey-Fuller Test
## 
## Test Results:
##   PARAMETER:
##     Lag Order: 12
##   STATISTIC:
##     Dickey-Fuller: -1.3949
##   P VALUE:
##     0.5377 
## 
## Description:
##  Wed Jul  2 12:33:23 2014 by user:
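
With a p-value of 0.5377, the test above does not reject the null hypothesis of a unit root at the usual significance levels, so the result is consistent with an AQI index that is not stationary in levels. To make the statistic less of a black box, here is a minimal base R sketch of the regression behind an ADF test with a constant, run on simulated data. The series `y`, the lag count `k`, and all variable names are illustrative assumptions; adfTest also computes a p-value and handles details that this sketch leaves out.

```r
set.seed(1)
y <- cumsum(rnorm(200))   # simulated random walk, which has a unit root
k <- 2                    # number of lagged differences (illustrative)

dy <- diff(y)
# Row t of m holds dy_t, dy_{t-1}, ..., dy_{t-k}
m <- embed(dy, k + 1)
# Lagged level y_{t-1}, aligned with the first column of m
y_lag <- y[(k + 1):(length(y) - 1)]

# ADF regression: dy_t = a + g * y_{t-1} + sum_i b_i * dy_{t-i} + e_t
fit <- lm(m[, 1] ~ y_lag + m[, -1])

# The test statistic is the t value on the lagged level
adf_stat <- coef(summary(fit))["y_lag", "t value"]
adf_stat
# The asymptotic 5% critical value for the constant-only case is about -2.86;
# a statistic above that does not reject a unit root.
```

The statistic has to be compared against Dickey-Fuller critical values rather than the usual t table, which is why the p-value printed by lm for this coefficient should not be used.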

4 Comments

Amazing, isn’t it? You may wonder why anyone would use non-ASCII characters in source code. Non-English source (and docs too) is definitely a bad idea when you plan to share your work worldwide, but that is not the usual situation, right? Mostly you get some data (hopefully in English, but usually in YOUR language), work with it, do some analysis, write a simple result note (it’s not even a report), and maybe share it with someone around you. In this situation, multi-language support is meaningful. If you instead translate your data first and then work on it, things become LESS straightforward.

I think it’s best to keep things simple, even if the way to get there is more complex.

Again, you are welcome to comment on my gist for this page.

Rmd Demo: Chinese Enabled

Shel Kong

Wednesday, July 02, 2014
