The instructions for obtaining R largely depend on the user’s hardware and operating system. The R Project has written an R Installation and Administration manual with complete, precise instructions about what to do, together with all sorts of additional information. The following is just a primer to get a person started.
Visit one of the links below to download the latest version of R for your operating system:
Microsoft Windows: http://cran.r-project.org/bin/windows/base/
On Microsoft Windows, click the R-x.y.z.exe installer to start installation. When it asks for “Customized startup options”, specify Yes. In the next window, be sure to select the SDI (single document interface) option.
To make it easier to work with R, install RStudio from
RStudio: https://www.rstudio.com/products/rstudio/download3/
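Once R is installed, a quick way to confirm that the console is working is to ask R which version it is running. The following is a minimal sketch using base R only; the "3.2.0" threshold is an arbitrary example value, not a requirement stated in this guide.

```r
# Print the version string of the running R installation
R.version.string

# Compare the running version against a minimum version
# ("3.2.0" is only an example threshold)
getRversion() >= "3.2.0"
```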
After downloading and installing RStudio, open it and install Swirl by typing:
install.packages("swirl")
Once the installation finishes, run
library(swirl)
Then you will be prompted to type:
swirl()
Now follow the prompts. When you are asked which course you want to choose, pick “R Programming”, then work through the programming course.
To read about Swirl, check out http://swirlstats.com/students.html
Google Trends is a useful way to compare changes in the popularity of certain search terms over time, and Google Trends data can be used as a proxy for all sorts of difficult-to-measure quantities like economic activity and disease propagation. To access Google Trends from R, install the gtrendsR package:
install.packages("gtrendsR")
After the package has finished installing, type
library(gtrendsR)
## Warning: package 'gtrendsR' was built under R version 3.2.5
## Creating a generic function for 'toJSON' from package 'jsonlite' in package 'googleVis'
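If you rerun a script often, you may prefer to install a package only when it is missing. The sketch below shows one common base-R pattern for this; `ensure_package` is a hypothetical helper name introduced here for illustration and is not part of gtrendsR or any other library.

```r
# Install a package only if it is not already available.
# `ensure_package` is a hypothetical helper, not a library function.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this returns TRUE without installing anything
ensure_package("stats")
```

In the same way, `ensure_package("gtrendsR")` followed by `library(gtrendsR)` would cover the two steps above.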
usr <- "metricswithr@gmail.com"
psw <- "BUEC333SFU"
gconnect(usr, psw)
You should now be connected to Google.
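If you use your own Google account, you may not want to type the password directly into a script. One possible approach, sketched below, is to read the credentials from environment variables with base R's `Sys.getenv`. The variable names `GTRENDS_USER` and `GTRENDS_PASSWORD` and the helper `read_credentials` are assumptions made for this example; the values set here are placeholders.

```r
# Read login details from environment variables instead of hard-coding them.
# GTRENDS_USER / GTRENDS_PASSWORD are hypothetical names chosen for this sketch.
read_credentials <- function() {
  usr <- Sys.getenv("GTRENDS_USER", unset = NA)
  psw <- Sys.getenv("GTRENDS_PASSWORD", unset = NA)
  if (is.na(usr) || is.na(psw)) {
    stop("Set GTRENDS_USER and GTRENDS_PASSWORD before connecting")
  }
  list(usr = usr, psw = psw)
}

# For demonstration only: set placeholder values inside this session.
# In practice these would usually live in a file such as ~/.Renviron.
Sys.setenv(GTRENDS_USER = "someone@example.com",
           GTRENDS_PASSWORD = "not-a-real-password")

creds <- read_credentials()
# gconnect(creds$usr, creds$psw)  # same call as in the text, once gtrendsR is loaded
```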
word1 <- "Thanksgiving"
word2 <- "Superbowl"
res1 <- gtrends(c(word1, word2))
plot(res1)
## Warning: Removed 2 rows containing missing values (geom_path).
You just plotted the time series of Google searches for word1 and word2. You can see that both words show quite a lot of seasonality and that one is a lot more popular than the other.
word3 <- "recession"
res2 <- gtrends(c(word1, word2, word3))
plot(res2)
## Warning: Removed 3 rows containing missing values (geom_path).
Compared to “Thanksgiving” and “Superbowl”, “recession” is a lot less popular. However, it is interesting to see that searches for “recession” increased just as official announcements about the recession were being made in 2008.
word4 <- "tofu"
res3 <- gtrends(word4, geo = c("CA","US","JP"))
plot(res3)
## Warning: Removed 3 rows containing missing values (geom_path).
First, there is a lot of seasonality in Canada and the US. You can see that the search for tofu, which is considered a “healthy food,” increases at the beginning of the year and then decreases throughout the year. This is consistent with the fact that one of the most popular New Year's resolutions is “being healthy.” There is another surge in interest in tofu somewhere in the summer, and then this interest appears to be constant until the end of the year. Perhaps this is consistent with the beach season?
Second, we do not observe the same type of seasonality for Japan. This could be because “tofu” is not spelled with the Latin alphabet, and because tofu is a routine food, so it is less likely to be associated with health trends and resolutions. Perhaps there are other reasons for this – can you think of one?
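Seasonality of this kind can also be checked numerically by averaging interest within each calendar month. The sketch below uses a made-up weekly series, not real Google Trends data, because the exact structure of the object returned by gtrends is not described here; the data frame `toy` and its column names are assumptions for illustration only.

```r
# Synthetic weekly series (NOT real Google Trends data):
# a flat baseline with a bump every January
dates <- seq(as.Date("2014-01-05"), as.Date("2015-12-27"), by = "week")
month <- as.integer(format(dates, "%m"))
hits  <- ifelse(month == 1, 80, 50)
toy   <- data.frame(date = dates, month = month, hits = hits)

# Average interest per calendar month; seasonal months stand out from the rest
monthly <- aggregate(hits ~ month, data = toy, FUN = mean)

# The month with the highest average interest
monthly[which.max(monthly$hits), ]
```

With real data, the same `aggregate` call could be applied once the search interest has been put into a data frame with a date column.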
Let’s see a more detailed plot for “tofu” in the US and Canada.
res4 <- gtrends(word4, geo = c("CA","US"), start_date = "2014-12-20", end_date = "2016-09-11")
plot(res4)
Now it looks as if, in Canada, there is a surge of interest in tofu somewhere in April. Can you think of a reason for this?
word5 <- "fasting"
res5 <- gtrends(c(word4, word5), geo = "CA", start_date = "2014-12-20", end_date = "2016-09-11")
plot(res5)
location <- "CA"
query <- c("Oil", "Gas", "Petroleum")
oil_data <- gtrends(query, location, start_date = "2015-01-01", end_date = "2016-09-11")
plot(oil_data)
## Warning: Removed 3 rows containing missing values (geom_path).
This graph plots the popularity of the three words “Oil”, “Gas”, and “Petroleum” in Canada only, between January 1, 2015 and September 11, 2016.
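The start and end dates in these queries are plain strings in YYYY-MM-DD form, so a typo is easy to miss. A minimal sketch of checking a date window with base R before sending a query is shown below; `check_window` is a hypothetical helper written for this example and is not part of gtrendsR.

```r
# Validate a date window given as "YYYY-MM-DD" strings.
# `check_window` is a hypothetical helper, not a gtrendsR function.
check_window <- function(start_date, end_date) {
  start <- as.Date(start_date, format = "%Y-%m-%d")
  end   <- as.Date(end_date, format = "%Y-%m-%d")
  if (is.na(start) || is.na(end)) {
    stop("Dates must be in YYYY-MM-DD form")
  }
  if (start >= end) {
    stop("start_date must come before end_date")
  }
  # Length of the window in days
  as.integer(end - start)
}

# The window used for the oil query above
check_window("2015-01-01", "2016-09-11")
```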
location <- "CA"
classes <- c("BUEC333","BUEC232")
SFU_data <- gtrends(classes, location, start_date = "2011-09-01", end_date = "2016-09-11")
plot(SFU_data)
## Warning: Removed 2 rows containing missing values (geom_path).
You can also see where some searches are being made. For example, if you type the code below, you will see that it is only in BC that the search for “BUEC333” and “BUEC232” is being made:
plot(SFU_data, type = "geo")
## starting httpd help server ...
## done
To read the documentation for the gtrends function, type:
?gtrends