Let’s create a function that will plot a heart. To do this I’ll draw one half of the heart parametrically, with x = -0.01(-t^2 + 40t + 1200)sin(pi*t/180) and y = 0.01(-t^2 + 40t + 1200)cos(pi*t/180), then mirror it in the y-axis for the other half. The rest is text/colour tweaking. The function also lets you add custom text inside the heart.
heart1 = function(text){
  t = seq(0, 60, len = 100)
  # Blank canvas, no axes
  plot(c(-8, 8), c(0, 20), type = 'n', axes = FALSE, xlab = '', ylab = '')
  # One half of the heart, parametrised by t
  x = -.01 * (-t^2 + 40 * t + 1200) * sin(pi * t / 180)
  y = .01 * (-t^2 + 40 * t + 1200) * cos(pi * t / 180)
  lines(x, y, lwd = 4, col = "hotpink")
  lines(-x, y, lwd = 4, col = "hotpink")  # mirrored other half
  text(0, 8, text, col = 'black', cex = 2.5)
}
heart1("Waifu")
If we get bored of this function (which we are already) we can start to break it down into its component parts and expose those as arguments. For example:
heart1 = function(text, colour, line_width){
  # missing() catches arguments that weren't supplied; exists() wouldn't,
  # because the formal arguments always "exist" inside the function
  if (missing(colour)) stop("Please enter a colour (e.g. hotpink)", call. = FALSE)
  if (missing(line_width)) stop("Please enter a line width (e.g. 4)", call. = FALSE)
  t = seq(0, 60, len = 100)
  plot(c(-8, 8), c(0, 20), type = 'n', axes = FALSE, xlab = '', ylab = '')
  x = -.01 * (-t^2 + 40 * t + 1200) * sin(pi * t / 180)
  y = .01 * (-t^2 + 40 * t + 1200) * cos(pi * t / 180)
  lines(x, y, lwd = line_width, col = colour)
  lines(-x, y, lwd = line_width, col = colour)
  text(0, 8, text, col = 'black', cex = 2.5)
}
heart1("Anime Waifu","lightpink","30")
Tesseract - image to text. This reads an image and extracts any text it finds. It’s quite nice actually, and there’s no rate limit either.
img <- "http://i.imgur.com/pQyYKoO.png"
# This is for png
download.file(img, "temp.png", mode = "wb")
localPNG <- readPNG("temp.png")
plot(0:1,0:1,type="n",ann=FALSE,axes=FALSE)
rasterImage(localPNG, 0,0,1,1)
tesseract = function(image){
  text <- ocr(image)  # run OCR on the image (file path or URL)
  cat(text)
}
tesseract(image = "http://i.imgur.com/pQyYKoO.png")
## 1 9
## :18 HAPPINuSS pm
## Abstract
## lied the phenomenon of the effect of Bobby's happiness on K
## ind that when Bobby's happiness index score was low, so was
## on the Adams Happiness Index. Using a single linear agreed"!
## they and Kris's happiness was found to be significant and aligns:
## ' implications for further teiationship research in the field ofKn's l
## n the use of happiness units in said relationship. Further mead! i
## 'e'eet eiaims made both hete and in previous mach.
## No method 2/10
## Results
## I was calculated to predict Kris's happiness based on Babb
## ; found, F(i,42) = 108.78, p < .001, with an R2 of .72.W
## K8 x Bobby's Happiness). (see appendices for output).
## ___/
# I win
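Those angle-bracket codes are just curly quotes and fi-ligatures my console can’t print. If all we care about is the words, one quick fix (assuming the OCR output is UTF-8, which I haven’t verified, and that your platform supports //TRANSLIT) is to transliterate it down to plain ASCII with base R’s iconv; a minimal sketch:
text <- ocr("http://i.imgur.com/pQyYKoO.png")
# Transliterate curly quotes / ligatures to ASCII equivalents;
# anything that can't be mapped is dropped via sub = ""
clean <- iconv(text, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "")
cat(clean)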
Connected up to Microsoft Cognitive Services, we can extract the “characteristics” of an image. Let’s see what we can get out of Kris.
key <- "6dd5d94a3b7640a0b0971818183fbce2"
key2 <- "7d89af95337d454ea91273be4eb0bb21"
img <- "http://i.imgur.com/chivNDA.png"
# This is for jpegs
# download.file(img, 'z.jpg', mode = 'wb')
# jj <- jpeg::readJPEG("z.jpg", native = TRUE)
# plot(0:1,0:1,type="n",ann=FALSE,axes=FALSE)
# rasterImage(jj,0,0,1,1)
# This is for png
download.file(img, "temp.png", mode = "wb")
localPNG <- readPNG("temp.png")
plot(0:1,0:1,type="n",ann=FALSE,axes=FALSE)
rasterImage(localPNG, 0,0,1,1)
#install.packages("devtools")
# require(devtools)
#install_github("flovv/Roxford", force = TRUE)
# https://github.com/flovv/Roxford
# http://flovv.github.io/Roxford_extension/
library(Roxford)
# What tags does this have
maxCandidates <- 4
visionURL = paste0("https://api.projectoxford.ai/vision/v1.0/describe?maxCandidates=",
maxCandidates)
mybody = list(maxCandidates = maxCandidates, url = img)
visionResponse = POST(url = visionURL, content_type("application/json"),
add_headers(.headers = c(`Ocp-Apim-Subscription-Key` = key)),
body = mybody, encode = "json")
content <- httr::content(visionResponse)
dataframeFromJSON <- function(l) {
  # Replace NULLs with NA and flatten each list element to a named vector
  l1 <- lapply(l, function(x) {
    x[sapply(x, is.null)] <- NA
    unlist(x)
  })
  # Line everything up on the union of the keys, then bind into a data frame
  keys <- unique(unlist(lapply(l1, names)))
  l2 <- lapply(l1, '[', keys)
  l3 <- lapply(l2, setNames, keys)
  res <- data.frame(do.call(rbind, l3))
  return(res)
}
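As an aside, all dataframeFromJSON is doing is taking a list of (possibly ragged) lists, turning NULLs into NAs and lining everything up on the union of the keys. A toy illustration with a made-up list (not real API output):
toy <- list(
  list(name = "a", score = 1),
  list(name = "b", score = NULL, extra = "x")
)
# Two rows, columns name / score / extra; the NULL and the missing key both come back as NA
dataframeFromJSON(toy)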
b <- dataframeFromJSON(content)
library(purrr)  # map_if() and the %>% pipe
library(dplyr)  # as_data_frame()
b %>% map_if(is.factor, as.character) %>% as_data_frame -> b
names(b)
## [1] "tags1" "tags2" "tags3"
## [4] "tags4" "tags5" "tags6"
## [7] "tags7" "tags8" "tags9"
## [10] "tags10" "tags11" "tags12"
## [13] "tags13" "tags14" "tags15"
## [16] "tags16" "tags17" "tags18"
## [19] "tags19" "captions.text" "captions.confidence"
## [22] "width" "height" "format"
b$tags20 <- "cute"
b$tags21 <- "super cute"
b$tags22 <- "10/10"
colstokeep <- grep("tags", names(b))
Categories <- paste0(b[1,colstokeep])
Caption <- b$captions.text
Caption_Confidence <- b$captions.confidence
Categories
## [1] "person" "indoor" "front" "woman" "man"
## [6] "young" "looking" "screen" "holding" "food"
## [11] "using" "monitor" "sitting" "computer" "table"
## [16] "laptop" "standing" "shirt" "sign" "cute"
## [21] "super cute" "10/10"
Caption
## [1] "person standing in front of a screen"
## [2] NA
## [3] NA
Caption_Confidence
## [1] "0.261040314793887" NA NA
# I win again
Actually, perhaps we don’t feel like doing something completely useless. Let’s have a look at what’s currently playing on some Bauer stations…
library(pander)  # for the output tables
channelID <- c(2180,7229,7226,2178,2179,2177,7214,7241,7228,7227,2181,7223,2175)
channelName <- c("Absolute Radio","Magic","Kiss","Absolute Radio 80s","Absolute Radio 90s","Absolute Radio 70s","Clyde 1","Planet Rock","Kisstory","Kiss Fresh","Absolute Classic Rock","Kerrang Radio","Absolute Radio 00s")
channelCode <- c("abr","mag","ki1","ab8","ab9","ab7","cl1","pln","ki2","ki3","abc","ker","ab0")
# Start with an empty data frame and build it up one station at a time
DF1 <- data.frame()
for (z in 1:length(channelName)){
  GET_Url1 <- paste0("http://listenapi.bauerradio.com/api9/nowplaying/", channelCode[z])
  data1 <- GET(GET_Url1)
  data1 <- content(data1)
  # Fall back to 0 when the API doesn't return a field
  TrackTitle <- if (!is.null(data1$TrackTitle)) data1$TrackTitle else 0
  ArtistName <- if (!is.null(data1$ArtistName)) data1$ArtistName else 0
  Image <- if (!is.null(data1$ArtistImageUrl)) data1$ArtistImageUrl else 0
  # If the last track has already finished, nothing is playing right now
  if (!is.null(data1$EventFinish) && as.character(Sys.time()) > data1$EventFinish){
    TrackTitle <- "No Song Playing"
    ArtistName <- "No Song Playing"
    Image <- "http://i.imgur.com/vTDs1W5.jpg"
  }
  DF <- data.frame(Date = Sys.time(),
                   TrackTitle = TrackTitle,
                   ArtistName = ArtistName,
                   Image = Image,
                   Brand = channelName[z])
  DF1 <- rbind(DF1, DF)
}
pander(DF1[,c(-4,-1)],caption = "Realtime Audio - Currently Playing")
| TrackTitle | ArtistName | Brand |
|---|---|---|
| Drinking In LA | Bran Van 3000 | Absolute Radio |
| No Song Playing | No Song Playing | Magic |
| Love On Me | Galantis & Hook N Sling | Kiss |
| Heaven Is A Place On Earth | Belinda Carlisle | Absolute Radio 80s |
| No Song Playing | No Song Playing | Absolute Radio 90s |
| No Song Playing | No Song Playing | Absolute Radio 70s |
| No Song Playing | No Song Playing | Clyde 1 |
| No Song Playing | No Song Playing | Planet Rock |
| Lose Yourself | Eminem | Kisstory |
| No Song Playing | No Song Playing | Kiss Fresh |
| It’s Only Rock ‘n’ Roll | The Rolling Stones | Absolute Classic Rock |
| No Song Playing | No Song Playing | Kerrang Radio |
| Lsf Lost Souls Forever | Kasabian | Absolute Radio 00s |
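One fragile bit above is that the “No Song Playing” check compares timestamps as plain strings. If EventFinish comes back as ISO-8601 (an assumption, I haven’t checked the exact format), it’s safer to parse both sides to POSIXct first; a minimal sketch with a hypothetical helper:
# Hypothetical helper: TRUE once the current track has finished.
# Assumes EventFinish looks like "2016-12-20T14:03:00" (format/timezone not verified).
song_finished <- function(event_finish) {
  finish <- as.POSIXct(event_finish, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
  !is.na(finish) && Sys.time() > finish
}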
We can pull down the most popular articles across a subsection of the internet, in this case Entertainment UK. I’m sorting by current velocity, as that’s a decent indicator of an article going viral: if the current velocity equals the maximum velocity, the article is either still growing or has only just stagnated, and you have to judge which by looking at the volume of shares alongside the velocity values.
Api_Url <- c("http://api.newswhip.com/v1/region/Uk/Entertainment/24?key=YVtTJWFDnnusn")
Name <- c("Entertainment")
# Start with an empty data frame; the loop below rbinds one block per category
DF <- data.frame()
Title <- 0
Link <- 0
Summary <- 0
Facebook_Interactions <- 0
Twitter_Retweets <- 0
Timestamp <- 0
Current_Velocity <- 0
Highest_Velocity <- 0
Category <- 0
for (p in 1:length(Api_Url)){
  Type <- GET(Api_Url[p])
  result <- httr::content(Type)
  for (i in 1:length(result$articles)){
    Title[i] <- result$articles[[i]]$headline
    Title[i] <- gsub("[^[:alnum:] ]", "", Title[i])
    Link[i] <- result$articles[[i]]$link
    Summary[i] <- result$articles[[i]]$excerpt
    Summary[i] <- gsub("[^[:alnum:] ]", "", Summary[i])
    Facebook_Interactions[i] <- result$articles[[i]]$recent_fb_counts
    Twitter_Retweets[i] <- result$articles[[i]]$recent_tw_counts
    Timestamp[i] <- result$articles[[i]]$publication_timestamp
    Current_Velocity[i] <- result$articles[[i]]$nw_score
    Highest_Velocity[i] <- result$articles[[i]]$max_nw_score
    Category[i] <- Name[p]
  }
  News <- data.frame(Title = Title[1:length(result$articles)],
                     Link = Link[1:length(result$articles)],
                     Summary = Summary[1:length(result$articles)],
                     Timestamp = Timestamp[1:length(result$articles)],
                     Facebook_Interactions = Facebook_Interactions[1:length(result$articles)],
                     Twitter_Retweets = Twitter_Retweets[1:length(result$articles)],
                     Current_Velocity = Current_Velocity[1:length(result$articles)],
                     Highest_Velocity = Highest_Velocity[1:length(result$articles)],
                     Category = Category[1:length(result$articles)])
  DF <- rbind(DF, News)
}
Spike <- DF
rm(DF,News)
Spike <- Spike[order(-Spike$Current_Velocity), ]
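# (Sketch) Flag the articles whose current velocity still matches their peak --
# per the note above, those are the ones either still growing or only just plateaued.
Spike$Still_Rising <- Spike$Current_Velocity >= Spike$Highest_Velocity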
pander(head(Spike[,c(1,5,6,7)]),caption = "Popular Articles - Entertainment")
| | Title | Facebook_Interactions | Twitter_Retweets | Current_Velocity |
|---|---|---|---|---|
| 2 | Lorry drives into German market reports BBC News | 5358 | 4140 | 1499 |
| 4 | Nine die in lorry attack on Berlin Christmas market | 6086 | 330 | 1435 |
| 6 | This engagement photo captured a strangers perfect reaction | 6829 | 10 | 718.9 |
| 17 | Monopoly Opens A Rules Helpline To Help Settle Disputes This Christmas | 4216 | 0 | 600.3 |
| 3 | Margot Robbie marries Tom Ackerley during secret Aussie wedding | 2434 | 80 | 340.3 |
| 32 | First Dates Mandi sends viewers into a frenzy over bizarre headpiece | 65 | 10 | 272 |
Here we’re looking at the 6 subreddits I’ve selected (“funny”, “top”, “aww”, “askreddit”, “pics”, “videos”). I’ve left some comments in so you can understand the absolute frustration I went through; this was torture to get working. Effectively I hit Reddit as an anonymous user with a throwaway user agent of “A cow” (and a POST field of “moo=moomooo”) and it lets me through to the page. Reddit supplies a JSON version of every page (just append .json to the URL), so I parse that, build each column into a data frame, and stack the different subreddits together with rbind.
Subreddits <- c("funny","top","aww","askreddit","pics","videos")
Urls <- c("http://reddit.com/r/funny.json?limit=100&after=100","https://www.reddit.com/top.json?limit=100&after=100",
"http://reddit.com/r/aww.json?limit=100&after=100","http://reddit.com/r/askreddit.json?limit=100&after=100",
"http://reddit.com/r/pics.json?limit=100&after=100","http://reddit.com/r/videos.json?limit=100&after=100")
Reddit <- data.frame(Title = "Title", Subreddit = "Subreddit", RedditLink = "RedditLink", Url = "Url", Likes = "Likes", Comments = "Comments")
Reddit <- Reddit[-1,]
Title <- 0
Subreddit <- 0
RedditLink <- 0
Url <- 0
Likes <- 0
Comments <- 0
# This is infuriating: it works completely differently on my local machine and my R server, and it also seems to change behaviour at random. Sometimes it pulls back the right data, sometimes it needs a loop. For some reason it really likes to just randomly unlist without being told to. Then again, I'm accessing reddit as a normal user under the user-agent "A cow".
# This is actually getting silly now: sometimes it gives a 404 when it reads the END of the json call, and it's surprisingly difficult to catch that exception.
# for (i in 1:length(Urls)){
# url = Urls[i]
# h <- new_handle()
# handle_setopt(h, COPYPOSTFIELDS = "moo=moomooo");
# handle_setheaders(h,"Content-Type" = "text/moo","Cache-Control" = "no-cache","User-Agent" = "A cow")
# con <- curl(url, handle = h)
# rawdat <- fromJSON(readLines(con, warn=FALSE))
#
#
# for (p in 1:length(rawdat$data$children)){
# # print(paste0(i," of ", length(Urls)," | ",p," of ",length(rawdat$data$children)))
# ifelse(is.null(rawdat$data$children[[p]]$data$title),Title <- "NULL", Title <- rawdat$data$children[[p]]$data$title)
# Title <- gsub("[^[:alnum:] ]", "", Title)
# ifelse(is.null(rawdat$data$children[[p]]$data$subreddit),Subreddit <- "NULL", Subreddit <- rawdat$data$children[[p]]$data$subreddit)
# ifelse(is.null(rawdat$data$children[[p]]$data$permalink), RedditLink <- "NULL", RedditLink <- paste0("www.reddit.com",rawdat$data$children[[p]]$data$permalink))
# ifelse(is.null(rawdat$data$children[[p]]$data$url),Url <- "NULL", Url <- rawdat$data$children[[p]]$data$url)
# ifelse(is.null(rawdat$data$children[[p]]$data$ups),Likes <- "NULL", Likes <- rawdat$data$children[[p]]$data$ups)
# ifelse(is.null(rawdat$data$children[[p]]$data$num_comments), Comments <- "NULL", Comments <- rawdat$data$children[[p]]$data$num_comments)
# Reddit_t <- data.frame(Title = Title, Subreddit = Subreddit, RedditLink= RedditLink, Url = Url, Likes = Likes, Comments = Comments)
# Reddit <- rbind(Reddit,Reddit_t)
# }
# }
# Turns out you can just index rawdat$data$children$data$x and it returns an entire column, rather than building the frame row-by-row. This has worked perfectly on an EC2 machine for 5 months, and the moment I run it locally everything falls apart.
library(curl)      # new_handle(), handle_setopt(), handle_setheaders()
library(jsonlite)  # fromJSON()
for (i in 1:length(Urls)){
  url <- Urls[i]
  h <- new_handle()
  handle_setopt(h, COPYPOSTFIELDS = "moo=moomooo")
  handle_setheaders(h, "Content-Type" = "text/moo", "Cache-Control" = "no-cache", "User-Agent" = "A cow")
  con <- curl(url, handle = h)
  rawdat <- fromJSON(readLines(con, warn = FALSE))
  # print(paste0(i," of ", length(Urls)))
  # jsonlite simplifies the listing, so each field comes back as a whole column
  Title      <- if (!is.null(rawdat$data$children$data$title)) rawdat$data$children$data$title else "NULL"
  Title      <- gsub("[^[:alnum:] ]", "", Title)
  Subreddit  <- if (!is.null(rawdat$data$children$data$subreddit)) rawdat$data$children$data$subreddit else "NULL"
  RedditLink <- if (!is.null(rawdat$data$children$data$permalink)) paste0("www.reddit.com", rawdat$data$children$data$permalink) else "NULL"
  Url        <- if (!is.null(rawdat$data$children$data$url)) rawdat$data$children$data$url else "NULL"
  Likes      <- if (!is.null(rawdat$data$children$data$ups)) rawdat$data$children$data$ups else "NULL"
  Comments   <- if (!is.null(rawdat$data$children$data$num_comments)) rawdat$data$children$data$num_comments else "NULL"
  Reddit_t <- data.frame(Title = Title, Subreddit = Subreddit, RedditLink = RedditLink, Url = Url, Likes = Likes, Comments = Comments)
  Reddit <- rbind(Reddit, Reddit_t)
}
Reddit$Deduped <- paste0(Reddit$Title,Reddit$Subreddit)
Reddit <- Reddit[!duplicated(Reddit$Deduped),]
Reddit$Deduped <- NULL
Reddit <- Reddit[order(-as.numeric(Reddit$Comments)), ]
pander(head(Reddit[,c(1,2,5,6)]),caption = "Reddit Posts by No. Comments")
| | Title | Subreddit | Likes | Comments |
|---|---|---|---|---|
| 138 | Reddit Whats the Best or Worst Last Minute Gift Youve Ever Given or Received | AskReddit | 13543 | 10819 |
| 565 | Spoiled Brat screaming at her Grandpa over an IPhone Appointment | videos | 27328 | 9094 |
| 132 | Truck ploughs into Christmas market in Berlin injuring several people reports | worldnews | 21925 | 9047 |
| 116 | Russian ambassador to Ankara Andrey Karlov attacked by unidentified gunmen | worldnews | 31264 | 8386 |
| 176 | What is slowly dying off and people just dont realize it | AskReddit | 5798 | 7624 |
| 326 | What are some common forms of sexism women face | AskReddit | 3625 | 5493 |
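For what it’s worth, the same request can be made with httr rather than raw curl handles, which makes the header juggling a little less painful. A minimal sketch with the same throwaway user agent:
library(httr)
library(jsonlite)
# Fetch one listing; jsonlite simplifies the children into a data frame,
# so each field comes back as a whole column
resp <- GET("https://www.reddit.com/r/funny.json?limit=100", user_agent("A cow"))
rawdat <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
head(rawdat$data$children$data$title)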