Source code of this doc (gist): https://gist.github.com/kongscn/34cdfc0585c820be3e43
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
R Markdown and R work well with multiple languages. If you are running a Unix-like OS, just use UTF-8 encoding everywhere and you’re done; no special configuration is needed. If you are running Windows, well, I tried and tried but still hit a lot of weird problems. Setting the locale with Sys.setlocale fixes the output and a few other issues, but the problems go deeper than that. Short of throwing Windows away for a Mac or Linux, or giving up on multiple languages, I have no better suggestion. If there’s a better workaround, please let me know on my gist. One more tip: in an R Markdown document, the chunk option include=FALSE runs these settings without showing them in the output.
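As a rough sketch of the locale setup mentioned above, the body of such a setup chunk might look like the following. The locale names ("Chinese" on Windows, "en_US.UTF-8" elsewhere) are assumptions and depend on what the machine has installed; in the .Rmd file the code would sit in a chunk declared with include=FALSE so that it runs without appearing in the output.

```r
# Sketch of a setup chunk body (in the .Rmd, declare the chunk with include=FALSE).
# The locale names below are assumptions; available names vary by system.
target <- if (.Platform$OS.type == "windows") "Chinese" else "en_US.UTF-8"

# Sys.setlocale() returns the new locale string, or "" (with a warning)
# when the requested locale is not available.
result <- suppressWarnings(Sys.setlocale("LC_CTYPE", target))

if (identical(result, "")) {
  message("Locale '", target, "' is not available; keeping ",
          Sys.getlocale("LC_CTYPE"))
} else {
  message("LC_CTYPE is now ", result)
}
```

Checking the return value matters because a failed Sys.setlocale call only warns; the script keeps running under the old locale.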
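# Read the Wuhan AQI sheet; file name and column names are in Chinese
aqidf = xlsx::read.xlsx('武汉AQI.xlsx',
                        sheetName = "Sheet1",
                        encoding = 'UTF-8')
# Keep only the first row for each date (日期)
aqidf = aqidf[!duplicated(aqidf$日期), ]
# Caution: `levels<-` relabels the existing levels of the quality grade (质量等级)
# in their current order; it does not reorder them
levels(aqidf$质量等级) = c("优", "良", "轻度污染", "中度污染", "重度污染", "严重污染")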
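The levels assignment above deserves a closer look, because `levels(x) = ...` renames the existing levels in place rather than reordering them. A small sketch with made-up English labels (not the AQI data) shows how this differs from `factor(x, levels = ...)`:

```r
# Toy data with hypothetical labels; the default level order is alphabetical.
x <- factor(c("low", "high", "mid", "low"))
levels(x)
# "high" "low" "mid"

# Assigning to levels() renames levels 1, 2, 3 in their existing order,
# so the observations silently change meaning.
relabeled <- x
levels(relabeled) <- c("low", "mid", "high")
as.character(relabeled)
# "mid" "low" "high" "mid"

# factor(..., levels = ) only changes the order of the levels; values stay put.
reordered <- factor(x, levels = c("low", "mid", "high"))
as.character(reordered)
# "low" "high" "mid" "low"
```

If the labels coming out of the spreadsheet are already correct and only their order matters, the second form is probably the safer choice; comparing `table()` before and after is a quick check.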
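# Read the Wuhan weather sheet the same way
weatherdf = xlsx::read.xlsx('武汉天气.xlsx',
                            sheetName = "Sheet1",
                            encoding = 'UTF-8')
weatherdf = weatherdf[!duplicated(weatherdf$日期), ]
# Drop the separate year (年), month (月) and day (日) columns
weatherdf = subset(weatherdf, select = -c(年, 月, 日))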
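For readers unfamiliar with the de-duplication idiom used above, here is a minimal sketch on a made-up data frame (the column names `date`, `value` and `year` are hypothetical). `duplicated()` flags every repeat after the first occurrence, so negating it keeps the first row per key.

```r
# Hypothetical data: the first date appears twice
df <- data.frame(
  date  = as.Date(c("2014-01-01", "2014-01-01", "2014-01-02")),
  value = c(10, 11, 12),
  year  = c(2014, 2014, 2014)
)

duplicated(df$date)
# FALSE TRUE FALSE

# Keep the first row per date, then drop a column by name with a negative select
deduped <- df[!duplicated(df$date), ]
trimmed <- subset(deduped, select = -c(year))
names(trimmed)
# "date" "value"
```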
summary(aqidf)
##       日期               AQI指数        质量等级     当天AQI排名
##  Min.   :2013-11-01   Min.   : 42.0   优      :25   Min.   : 16
##  1st Qu.:2013-12-23   1st Qu.: 94.2   良      :27   1st Qu.: 86
##  Median :2014-02-14   Median :124.5   轻度污染: 7   Median :106
##  Mean   :2014-02-14   Mean   :146.8   中度污染:56   Mean   :115
##  3rd Qu.:2014-04-08   3rd Qu.:189.2   重度污染:71   3rd Qu.:152
##  Max.   :2014-05-31   Max.   :353.0   严重污染:24   Max.   :190
##      PM2.5            PM10           Co             No2
##  Min.   : 18.0   Min.   :  7   Min.   :0.70   Min.   : 22.0
##  1st Qu.: 62.2   1st Qu.: 84   1st Qu.:1.05   1st Qu.: 44.2
##  Median : 89.5   Median :134   Median :1.29   Median : 63.0
##  Mean   :113.2   Mean   :145   Mean   :1.44   Mean   : 64.9
##  3rd Qu.:147.8   3rd Qu.:199   3rd Qu.:1.75   3rd Qu.: 82.0
##  Max.   :590.0   Max.   :406   Max.   :3.12   Max.   :132.0
##       So2
##  Min.   :  6.0
##  1st Qu.: 28.0
##  Median : 40.0
##  Mean   : 46.2
##  3rd Qu.: 62.8
##  Max.   :112.0
library(ggplot2)
p = qplot(日期, AQI指数, data=aqidf, geom="line")
p
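qplot() is deprecated in recent ggplot2 releases (3.4.0 and later), so the same line chart can also be written with ggplot() directly. The sketch below uses a small made-up data frame with hypothetical column names `date` and `aqi` instead of the Wuhan data:

```r
library(ggplot2)

# Hypothetical stand-in for the AQI data frame
toy <- data.frame(
  date = seq(as.Date("2013-11-01"), by = "day", length.out = 10),
  aqi  = c(120, 95, 150, 210, 180, 90, 60, 130, 170, 140)
)

# Equivalent of qplot(date, aqi, data = toy, geom = "line")
p <- ggplot(toy, aes(x = date, y = aqi)) +
  geom_line()
p
```

Non-ASCII column names such as 日期 should work inside aes() in the same way, provided the session runs in a UTF-8 locale.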
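# Drop the non-numeric columns (date and quality grade) before building the time series
aqin = subset(aqidf, select = -c(日期, 质量等级))
aqixts = xts::as.xts(aqin, order.by = aqidf$日期, frequency = 1)
aqi_idx = aqixts$AQI指数
# Fit an AR model by maximum likelihood to the first-differenced AQI index
ar(diff(as.ts(aqi_idx)), method = 'mle')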
##
## Call:
## ar(x = diff(as.ts(aqi_idx)), method = "mle")
##
## Coefficients:
##      1       2       3       4       5       6       7       8       9
## -0.099  -0.462  -0.271  -0.323  -0.238  -0.226  -0.210  -0.139  -0.406
##     10      11      12
## -0.150  -0.155  -0.194
##
## Order selected 12  sigma^2 estimated as  1801
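To see what ar() reports on a series whose structure is known, here is a sketch on simulated data rather than the AQI series. The AR(1) coefficient 0.6, the sample size, `order.max = 6` and the seed are arbitrary choices for illustration.

```r
set.seed(42)

# Simulate a stationary AR(1) process: x_t = 0.6 * x_{t-1} + e_t
x <- arima.sim(model = list(ar = 0.6), n = 500)

# ar() selects the order by AIC (up to order.max) and estimates the coefficients
fit <- ar(x, method = "mle", order.max = 6)

fit$order      # selected order
fit$ar         # estimated coefficients, one per lag
fit$var.pred   # estimated innovation variance (printed as sigma^2)
```

One thing to keep in mind when reading the output above: with method = "mle" the default `order.max` is capped at 12, so a selected order of 12 means AIC picked the largest order it was allowed to try.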
# Augmented Dickey-Fuller test with a constant term ("c") and 12 lags
fUnitRoots::adfTest(aqi_idx, lags = 12, type = "c")
##
## Title:
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 12
## STATISTIC:
## Dickey-Fuller: -1.3949
## P VALUE:
## 0.5377
##
## Description:
## Wed Jul 2 12:33:23 2014 by user:
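With a p-value of about 0.54, the test above does not reject a unit root in the level series, which is consistent with modelling the differenced series. As a hedged illustration of the idea behind the test (not of fUnitRoots itself), the sketch below computes the plain Dickey-Fuller t-statistic, with no lagged differences, using base R's lm() on simulated data. The helper `df_stat` is hypothetical, and -2.86 is the usual asymptotic 5% critical value for the constant-only case.

```r
# Hypothetical helper: t-statistic of gamma in
#   diff(y)_t = alpha + gamma * y_{t-1} + e_t
df_stat <- function(y) {
  y    <- as.numeric(y)
  dy   <- diff(y)
  ylag <- y[-length(y)]
  fit  <- lm(dy ~ ylag)
  summary(fit)$coefficients["ylag", "t value"]
}

set.seed(1)
walk <- cumsum(rnorm(200))          # random walk: has a unit root by construction

stat_level <- df_stat(walk)         # statistic for the level series
stat_diff  <- df_stat(diff(walk))   # the differenced walk is white noise

# A statistic below about -2.86 rejects the unit root at the 5% level
c(level = stat_level, differenced = stat_diff)
```

The augmented version used by adfTest adds lagged differences to the regression to absorb serial correlation; this sketch leaves them out to keep the mechanics visible.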
Amazing, isn’t it! You may wonder why anyone would use non-ASCII characters in source code. Non-English source code (and docs too) is definitely a bad idea when you want to share your work worldwide. But that is not the usual situation, right? Mostly you get some data (hopefully in English, but usually in YOUR language), work with it, do some analysis, write up a simple note on the results (it’s not even a report), and maybe share it with someone around you. In this situation, multi-language support is meaningful. If you instead translate your data first and then work on it, the whole process becomes LESS straightforward.
I think it’s best to keep things simple, even if the way to get there is more complex.
Again, you are welcome to comment on the gist for this page.