STAT 451, Day 6

Data Scraping Continued

wunder-data.txt

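# Read the scraped temperatures (the file has no header row) and name the columns
wunder <- read.csv('wunder-data.txt', header=FALSE, col.names=c('Date', 'Temp'))
# Replace the Date column with a proper Date for every day of 2009
# (assumes the file has exactly 365 rows, one per day, in date order)
wunder$Date <- seq(as.Date("2009/1/1"), as.Date("2009/12/31"), "days")
head(wunder)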
        Date Temp
1 2009-01-01   26
2 2009-01-02   34
3 2009-01-03   27
4 2009-01-04   34
5 2009-01-05   34
6 2009-01-06   31
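The seq() call above discards whatever dates the file contained, so it is only safe if wunder-data.txt has exactly one row per day of 2009, in order. Below is a minimal sketch of an alternative. It assumes, without confirming it here, that the first column stores dates as YYYYMMDD numbers such as 20090101, and it parses them with as.Date(). The three-row sample string is made up for illustration.

```r
# Made-up sample in the ASSUMED layout of wunder-data.txt:
# no header row, a YYYYMMDD date, then the temperature
sample_txt <- "20090101,26
20090102,34
20090103,27"

wunder_demo <- read.csv(text = sample_txt, header = FALSE,
                        col.names = c("Date", "Temp"))

# Parse the numeric dates instead of replacing them with a generated sequence
wunder_demo$Date <- as.Date(as.character(wunder_demo$Date), format = "%Y%m%d")
str(wunder_demo)

# If you do use seq(), compare its length with nrow() first: 2009 has 365 days
days_2009 <- seq(as.Date("2009/1/1"), as.Date("2009/12/31"), "days")
length(days_2009)
```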

Try opening the data in the other file formats as well.

Using the default plot() method

plot(wunder)

[Figure: scatterplot of Temp against Date]
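When plot() receives a two-column data frame, it plots the second column against the first and uses the column names as axis labels. The sketch below is a slightly more explicit version. It uses the formula interface, draws a line instead of points, and sets custom labels. The temperatures are simulated as a stand-in so the snippet runs without wunder-data.txt, and they are not the real 2009 values. The figure is written to a temporary PDF so it also works without a screen.

```r
# Simulated stand-in data (NOT the real 2009 temperatures)
set.seed(1)
demo <- data.frame(Date = seq(as.Date("2009/1/1"), as.Date("2009/12/31"), "days"))
demo$Temp <- round(50 - 25 * cos(2 * pi * seq_len(nrow(demo)) / 365) +
                   rnorm(nrow(demo), sd = 5))

# Send the plot to a temporary PDF so the script runs non-interactively
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 8, height = 5)
plot(Temp ~ Date, data = demo, type = "l",
     xlab = "Date (2009)", ylab = "Temperature", main = "Daily temperature")
invisible(dev.off())
```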

Using package ggplot2

library(ggplot2)
ggplot(wunder, aes(Date,Temp))+geom_point()

[Figure: ggplot2 scatterplot of Temp against Date]

Using a least squares line with a 95% confidence band

# Ordinary least squares line of Temp on Date, shaded with a 95% confidence band
ggplot(wunder, aes(Date, Temp)) + geom_point() + stat_smooth(method = "lm", level = 0.95)

[Figure: scatterplot with least squares line and 95% confidence band]
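stat_smooth(method = "lm") fits an ordinary least squares line and shades a pointwise confidence interval for the fitted mean. The sketch below reproduces those numbers directly with lm() and predict(). It uses simulated stand-in data, which is an assumption because the real file is not bundled here. Date is converted to a numeric day count, which is how a Date axis is treated when the line is fitted.

```r
# Simulated stand-in data (NOT the real 2009 temperatures)
set.seed(1)
demo <- data.frame(Date = seq(as.Date("2009/1/1"), as.Date("2009/12/31"), "days"))
demo$Temp <- 40 + 0.05 * seq_len(nrow(demo)) + rnorm(nrow(demo), sd = 8)

# Least squares fit of Temp on Date (Date used as a day count)
fit <- lm(Temp ~ as.numeric(Date), data = demo)

# 95% confidence interval for the mean Temp at each observed date:
# columns fit, lwr, upr correspond to the line and the shaded band
band <- predict(fit, interval = "confidence", level = 0.95)
head(band)
```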

Using package ggplot2 with a LOESS fit

# With no method given, stat_smooth() uses LOESS here (fewer than 1,000 points)
ggplot(wunder, aes(Date, Temp)) + geom_point() + stat_smooth()

[Figure: scatterplot with LOESS curve and confidence band]
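With fewer than 1,000 points, stat_smooth() defaults to LOESS. This is a local regression that fits weighted low-degree polynomials within a moving window of neighbouring points. The sketch below fits it directly with loess() at span = 0.75, which is both the loess() default and the default span in stat_smooth(). It again uses simulated stand-in data rather than the real temperatures.

```r
# Simulated stand-in data with a seasonal cycle (NOT the real temperatures)
set.seed(1)
demo <- data.frame(Date = seq(as.Date("2009/1/1"), as.Date("2009/12/31"), "days"))
day <- seq_len(nrow(demo))
demo$Temp <- 50 - 25 * cos(2 * pi * day / 365) + rnorm(nrow(demo), sd = 5)

# Local regression of Temp on the numeric day count
lo <- loess(Temp ~ as.numeric(Date), data = demo, span = 0.75)

# Smoothed values and their standard errors (basis of the shaded band)
sm <- predict(lo, se = TRUE)
head(sm$fit)
```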

Data Scraping

According to the author…

A lot of people like to keep everything within a safe point-and-click interface, but trust me. Pick up just a little bit of programming skills, and you can open up a whole bag of possibilities for what you can do with data.

page 30

Spotify

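# Base URL of the daily Canadian chart; a date is appended to it
url <- "https://spotifycharts.com/regional/ca/daily/"
# One Date for every day in August 2018
timevalues <- seq(as.Date("2018/08/01"), as.Date("2018/08/31"), by = "day")
head(timevalues)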
[1] "2018-08-01" "2018-08-02" "2018-08-03" "2018-08-04" "2018-08-05"
[6] "2018-08-06"

Create a vector of URLs, one per day

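# Append each date to the base URL; paste0() is vectorised, so this returns
# one URL per date (note: it relies on the global `url` defined above)
unitedata <- function(x) {
  full_url <- paste0(url, x)
  full_url
}
finalurl <- unitedata(timevalues)
head(finalurl)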
[1] "https://spotifycharts.com/regional/ca/daily/2018-08-01"
[2] "https://spotifycharts.com/regional/ca/daily/2018-08-02"
[3] "https://spotifycharts.com/regional/ca/daily/2018-08-03"
[4] "https://spotifycharts.com/regional/ca/daily/2018-08-04"
[5] "https://spotifycharts.com/regional/ca/daily/2018-08-05"
[6] "https://spotifycharts.com/regional/ca/daily/2018-08-06"
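paste0() recycles the single base URL across all 31 dates, so unitedata() is a thin wrapper. Two options follow. First, the same vector can be built in one line. Second, here is a hypothetical variant (make_urls is an illustrative name, not from the slides) that takes the base URL as an argument instead of reading the global `url`. It formats the dates explicitly as YYYY-MM-DD, which is the form the URLs above use.

```r
url <- "https://spotifycharts.com/regional/ca/daily/"
timevalues <- seq(as.Date("2018/08/01"), as.Date("2018/08/31"), by = "day")

# paste0() converts each Date to "YYYY-MM-DD" and recycles `url`
finalurl_direct <- paste0(url, timevalues)

# Hypothetical helper: base URL passed in, date format made explicit
make_urls <- function(base, dates) paste0(base, format(dates, "%Y-%m-%d"))
finalurl_fn <- make_urls(url, timevalues)

length(finalurl_direct)
identical(finalurl_direct, finalurl_fn)
```

Passing the base URL as an argument makes it straightforward to reuse the helper for another region or chart without changing a global variable.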