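Download this week's assignment material from the course resources. It contains a website's traffic statistics over a period of time; every three lines form one record: time, ip, and pv. Read the data into R, then use the qplot function to show the time-series trend in an appropriate and visually clean way.
Solution:
Load the ggplot2 package: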
library("ggplot2")
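Load the test data lesson8.txt (with sep = "\n" each line is one field, so three consecutive lines fill the three list components):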
x = scan("lesson8.txt", sep = "\n", what = list("", "", ""))
x[[1]][1:10]
## [1] "Sun Jul 8 23:59:02 HKT 2012" "Mon Jul 9 23:59:01 HKT 2012"
## [3] "Tue Jul 10 23:59:01 HKT 2012" "Wed Jul 11 23:59:01 HKT 2012"
## [5] "Thu Jul 12 23:59:02 HKT 2012" "Fri Jul 13 23:59:01 HKT 2012"
## [7] "Sat Jul 14 23:59:02 HKT 2012" "Sun Jul 15 23:59:01 HKT 2012"
## [9] "Mon Jul 16 23:59:02 HKT 2012" "Tue Jul 17 23:59:01 HKT 2012"
x[[2]][1:10]
## [1] "1922" "2345" "2255" "2179" "2225" "2392" "1957" "1809" "2277" "2682"
x[[3]][1:10]
## [1] "91938" "108521" "89036" "84149" "85583" "79507" "83188"
## [8] "77675" "88402" "100121"
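As a cross-check on the scan() call, the same three-line records can also be read with readLines() and reshaped by hand. This is only a minimal sketch: it writes three sample records (copied from the output above) to a temporary file instead of reading lesson8.txt, and the names tmp, lines, records, and x_check are hypothetical.

```r
# Sample records copied from the output above; the real input is lesson8.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Sun Jul 8 23:59:02 HKT 2012", "1922", "91938",
  "Mon Jul 9 23:59:01 HKT 2012", "2345", "108521",
  "Tue Jul 10 23:59:01 HKT 2012", "2255", "89036"
), tmp)

lines <- readLines(tmp)
# Guard against a truncated file: the line count must be a multiple of 3.
stopifnot(length(lines) %% 3 == 0)

# Fill a 3-column matrix row by row, so each row is one record.
records <- matrix(lines, ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("time", "ip", "pv")))

# The scan() call used above should produce the same three columns.
x_check <- scan(tmp, sep = "\n", what = list("", "", ""), quiet = TRUE)
```

The explicit length check is the main reason to prefer this form: it fails loudly when the file does not contain a whole number of records.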
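Locale setting (the "C" locale turns off locale-specific conventions, so strptime can match the English day and month abbreviations in the data; see ?Sys.setlocale):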
lct <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
## [1] "C"
Convert the timestamps to dates (HKT is matched as literal text in the format string):
date <- as.Date(strptime(x[[1]], "%a %b %d %H:%M:%S HKT %Y"))
date[1:10]
## [1] "2012-07-08" "2012-07-09" "2012-07-10" "2012-07-11" "2012-07-12"
## [6] "2012-07-13" "2012-07-14" "2012-07-15" "2012-07-16" "2012-07-17"
Build the data set:
website = data.frame(date, ip = as.numeric(x[[2]]), pv = as.numeric(x[[3]]),
    stringsAsFactors = FALSE)
Plot the time-series trends:
qplot(date, ip, data = website, geom = c("line", "point"), main = "ip ~ date")
qplot(date, pv, data = website, geom = c("line", "point"), main = "pv ~ date")
qplot(date, pv/ip, data = website, geom = c("line", "point"), xlab = "date",
ylab = "pv / ip rate", main = "pv/ip ~ date")
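The assignment asks for qplot, which is what the calls above use. Since qplot has been deprecated as of ggplot2 3.4.0, the last plot could also be written with ggplot() directly. The sketch below assumes a small hypothetical data frame built from the first three records shown earlier, not the full data set.

```r
library(ggplot2)

# Hypothetical three-row sample taken from the printed output above.
website <- data.frame(
  date = as.Date(c("2012-07-08", "2012-07-09", "2012-07-10")),
  ip   = c(1922, 2345, 2255),
  pv   = c(91938, 108521, 89036)
)

# Same layers as geom = c("line", "point") in the qplot call.
p <- ggplot(website, aes(x = date, y = pv / ip)) +
  geom_line() +
  geom_point() +
  labs(x = "date", y = "pv / ip rate", title = "pv/ip ~ date")

print(p)
```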
Restore the original locale setting:
Sys.setlocale("LC_TIME", lct)
## [1] "Chinese (Simplified)_People's Republic of China.936"
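Switching the locale and restoring it by hand works, but the restore step is skipped if the parsing code fails in between. One possible alternative is to wrap both steps in a function and restore with on.exit(). The helper name parse_hkt below is hypothetical, and the sketch assumes the same timestamp format as above.

```r
# Hypothetical helper: parse "Sun Jul 8 23:59:02 HKT 2012"-style stamps
# under the "C" time locale and always restore the previous locale.
parse_hkt <- function(stamps) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")

  parsed <- strptime(stamps, "%a %b %d %H:%M:%S HKT %Y")
  # strptime returns NA silently on a mismatch, so check explicitly.
  if (any(is.na(parsed))) stop("some timestamps did not match the format")
  as.Date(parsed)
}

before <- Sys.getlocale("LC_TIME")
d <- parse_hkt(c("Sun Jul 8 23:59:02 HKT 2012", "Mon Jul 9 23:59:01 HKT 2012"))
after <- Sys.getlocale("LC_TIME")
```

With this form the caller never touches the locale directly, and before and after should hold the same value even when parse_hkt stops with an error.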