This is an experiment, mainly for me to learn R better. I want to extract access data from Google Analytics and combine it with content data from my WordPress blog. Code and pre-munged data
setwd("~/src/ganalytics")
library(plyr)
library(ggplot2)
library(stringr)
source("./utility.R")
load("data.Rda")
load("wp_posts.Rda")
load("wp_stats.Rda")
load("posts.Rda")
Let's check the total hits for one given page and compare with Google Analytics.
We'll check the hits for '407 Indonesian textbooks openly available'. Google Analytics shows 334 page views.
pagetitle <- "407 Indonesian textbooks openly available"
# we first select the matching rows, and then sum the visits
sum(tbl[tbl$pageTitle == pagetitle, ]$visits)
## [1] 335
I'm not sure why it's one more. Let's check a specific day.
On January 19, 2013, Google Analytics shows 23 visits in total.
sum(tbl[tbl$date == makedate("2013-01-19"), ]$visits)
## [1] 23
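Both spot checks follow the same pattern: select the matching rows, then sum the visits. Here is a minimal sketch of that pattern on a made-up data frame, since the real `tbl` isn't shown here. The column names mirror the ones used above, the values are invented, and `as.Date` stands in for `makedate` from utility.R, whose definition I'm not reproducing. It also shows how `aggregate` could give totals for every page or every day in one go.

```r
# Toy stand-in for `tbl`; the values are invented for illustration only.
toy <- data.frame(
  date      = as.Date(c("2013-01-18", "2013-01-19", "2013-01-19", "2013-01-20")),
  pageTitle = c("Post A", "Post A", "Post B", "Post A"),
  visits    = c(3, 5, 2, 4),
  stringsAsFactors = FALSE
)

# Same pattern as above: select the matching rows, then sum the visits.
post_a_total <- sum(toy$visits[toy$pageTitle == "Post A"])

# Totals for every page and for every day at once.
by_page <- aggregate(visits ~ pageTitle, data = toy, FUN = sum)
by_day  <- aggregate(visits ~ date, data = toy, FUN = sum)
```

With per-page totals in one table, it would be easier to compare many pages against Google Analytics instead of checking them one at a time.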
Let me try to plot the access to the Indonesian textbook page.
ind <- tbl[tbl$pageTitle == pagetitle, ]
ggplot(ind, aes(x = date, y = visits)) + geom_point() + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
Is there any correlation between length and reading level?
ggplot(wp_stats[wp_stats$flesch > 0, ], aes(x = flesch, y = length)) + geom_point() +
geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
## Warning: Removed 1 rows containing missing values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
It doesn't seem so. (For some reason, the reading-level score doesn't work well on a few posts, such as foreign songs, and gives a negative level, so I exclude these.)
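A side note on the two "Removed 1 rows containing missing values" warnings. One possible cause (a guess, since the data isn't shown here) is a missing `flesch` value: when a data frame is indexed with a logical vector that contains `NA`, R keeps that position as a row made entirely of `NA`s. A small sketch with invented numbers shows the effect, and how wrapping the condition in `which()` drops such rows.

```r
# Invented values; only the column names match wp_stats.
toy_stats <- data.frame(
  flesch = c(60, -5, NA, 45),
  length = c(1200, 300, 800, 2000)
)

# Logical indexing: the NA comparison yields an all-NA row in the result.
with_na_row <- toy_stats[toy_stats$flesch > 0, ]

# which() returns only the positions where the condition is TRUE,
# so both the negative score and the missing score are dropped.
clean <- toy_stats[which(toy_stats$flesch > 0), ]

nrow(with_na_row)
nrow(clean)
```

`subset(toy_stats, flesch > 0)` behaves like the `which()` version here, because `subset` treats missing values in the condition as false.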
Flesch-Kincaid level over log(total visits): no correlation?
ggplot(posts[posts$flesch > 0, ], aes(x = log(totvisits), y = flesch)) + geom_point() +
geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
Links over log of total visits
ggplot(posts, aes(x = log(totvisits), y = links)) + geom_point() + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
Comments over log of total visits
ggplot(posts, aes(x = log(totvisits), y = num_comments)) + geom_point() + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
Length over log of total visits
ggplot(posts, aes(x = log(totvisits), y = length)) + geom_point() + geom_smooth()
## geom_smooth: method="auto" and size of largest group is <1000, so using
## loess. Use 'method = x' to change the smoothing method.
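The smoothers give a visual impression. To put a number on each relationship, one option is a rank correlation per variable. The sketch below uses an invented data frame with the same column names as `posts` and is not a result from the real data. I'm picking Spearman's rho here as an assumption about what fits; it depends only on ranks, so taking the log of the visits makes no difference to it.

```r
# Invented values; only the column names match posts.
toy_posts <- data.frame(
  totvisits    = c(10, 25, 40, 80, 150, 400),
  links        = c(0, 2, 1, 3, 5, 4),
  num_comments = c(1, 0, 2, 2, 6, 3),
  length       = c(500, 1500, 700, 900, 1200, 600)
)

vars <- c("links", "num_comments", "length")

# Spearman rank correlation of each variable with total visits.
rho <- sapply(vars, function(v) {
  cor(toy_posts$totvisits, toy_posts[[v]], method = "spearman")
})

# Same computation on the log scale; the ranks, and so rho, are unchanged.
rho_log <- sapply(vars, function(v) {
  cor(log(toy_posts$totvisits), toy_posts[[v]], method = "spearman")
})

print(round(rho, 2))
```

If the real data had missing values, `cor` would need `use = "complete.obs"`, and `cor.test` could be swapped in to get a p-value as well.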