December 1, 2018
Create a web page presentation using R Markdown that features a plot created with Plotly. Host your webpage on either GitHub Pages, RPubs, or NeoCities. Your webpage must contain the date that you created the document, and it must contain a plot created with Plotly. We would love to see you show off your creativity!
To produce the following plots, data from the previous exercise (Stupid MAP) were used.
The aim is to compare the number of words per tweet by source. To do so, there are two plots:
1. A histogram of the number of words by group.
2. A boxplot of the number of words by group, to highlight differences in central tendency.
Moreover, there is an extra plot comparing the number of followers by group.
Remember that these plots are based on data extracted from the Twitter API over a short period of time (a few hours), and that only tweets containing the word "stupid" were used.
library(plotly)
library(ggplot2)
library(stringr)
load(file = "C:/GitHub/DataScience/C09_DDP/W2_assignment_MAP/Twitter_data/Twitter_data_20181018_094032.Rda")
newT <- as.data.frame(cbind(as.character(df$user_id),
                            as.character(df$created_at),
                            as.character(df$text),
                            as.character(df$source),
                            as.logical(df$is_quote),
                            as.logical(df$is_retweet),
                            as.integer(df$favorite_count),
                            as.integer(df$retweet_count),
                            as.integer(df$followers_count),
                            as.integer(df$friends_count)))
colnames(newT) <- c("ID", "date", "text", "source", "quote", "retweet",
                    "Nfav", "Nret", "followers", "Nfriends")
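newT$ID <- as.character(newT$ID)
newT$fecha <- as.POSIXct(newT$date)
newT$texto <- as.character(newT$text)
# Caution: where these columns are factors (the default result of
# as.data.frame() on a character matrix in R < 4.0), as.integer() returns
# level codes; as.integer(as.character(x)) recovers the original values.
newT$Nfav <- as.integer(newT$Nfav)
newT$Nret <- as.integer(newT$Nret)
newT$followers <- as.integer(newT$followers)
newT$Nfriends <- as.integer(newT$Nfriends)
str(newT)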
## 'data.frame': 23905 obs. of 12 variables:
## $ ID : chr "76948379" "1018464013473153029" "163751657" "537816901" ...
## $ date : Factor w/ 15209 levels "2018-10-18 00:38:06",..: 15209 15208 15207 15206 15205 15204 15203 15203 15202 15201 ...
## $ text : Factor w/ 23808 levels "'Fuck you Janet. I'm not coming to your stupid baby shower.' \U0001f602\U0001f602 https://t.co/vX9B2ioWrZ",..: 20267 15347 4274 15598 2065 7187 8074 13266 6576 4602 ...
## $ source : Factor w/ 231 levels " NotMalwareTech",..: 209 215 209 212 215 209 209 209 213 212 ...
## $ quote : Factor w/ 2 levels "FALSE","TRUE": 1 1 2 1 1 1 1 1 1 1 ...
## $ retweet : Factor w/ 1 level "FALSE": 1 1 1 1 1 1 1 1 1 1 ...
## $ Nfav : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Nret : int 1 1 1 1 1 1 1 1 1 1 ...
## $ followers: int 1896 3450 568 3100 3309 2 1306 2672 3819 1398 ...
## $ Nfriends : int 59 1226 301 2450 3467 841 751 2803 2867 1694 ...
## $ fecha : POSIXct, format: "2018-10-18 07:40:12" "2018-10-18 07:40:11" ...
## $ texto : chr "Stuck in stupid work feeling like crap! Can't wait to finish so I can go do some retail therapy and get my fat "| __truncated__ "I keep going to bed at the most stupid of times UGH" "@HariGovindk1 @honeygeorge74 @abhijitmajumder Its the communist, right?? <U+0001F602><U+0001F602><U+0001F602><U"| __truncated__ "I miss dying laughing at stupid shit with @cartermoore123" ...
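The `str()` output above lists several columns as factors, which is what `as.data.frame(cbind(...))` produces in R versions before 4.0, so the integer columns shown may be level codes rather than the original counts. The sketch below illustrates the difference on a made-up factor; the object names `f`, `codes`, `values` and `toy` are hypothetical and not part of the original analysis.

```r
# Toy factor standing in for a count column such as followers (values are made up)
f <- factor(c("1896", "25", "310"))

codes  <- as.integer(f)                # positions of each value among the sorted levels
values <- as.integer(as.character(f))  # the numbers actually stored in the labels

print(codes)
print(values)

# One possible alternative: build the data frame directly, so cbind() never
# coerces every column to character in the first place.
toy <- data.frame(
  ID = c("a", "b", "c"),
  followers = c(1896L, 25L, 310L),
  stringsAsFactors = FALSE
)
str(toy)
```

Whether the original numbers were affected depends on the R version used at the time; converting through `as.character()` gives the same result in either case.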
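# Keep only the three main Twitter clients; every other source becomes NA
newT$source[newT$source != "Twitter Web Client" &
              newT$source != "Twitter for iPhone" &
              newT$source != "Twitter for Android"] <- NA
# Count spaces in each tweet as a proxy for its number of words
newT <- mutate(newT, Nwords = str_count(text, " "))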
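Because `str_count(text, " ")` counts spaces, `Nwords` is one less than the number of words for single-spaced text. As a minimal sketch of an alternative (not what the original analysis did), counting runs of non-whitespace characters gives the word count directly. The vector `tweets` and the names `n_spaces` and `n_words` are illustrative.

```r
library(stringr)

# Two short example strings; the first appears in the str() output above
tweets <- c("I keep going to bed at the most stupid of times UGH", "stupid")

n_spaces <- str_count(tweets, " ")     # the measure used in this report
n_words  <- str_count(tweets, "\\S+")  # runs of non-whitespace characters

data.frame(tweets, n_spaces, n_words)
```

For comparing groups the choice hardly matters, since both measures differ by a constant on single-spaced text, but the second is easier to label on an axis.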
# histogram of Nwords by source
plot_ly(newT, x = ~Nwords, type = "histogram", color = ~factor(source))
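When several groups share one histogram, overlaying semi-transparent bars and labelling the axes can make the comparison easier to read. The following is a minimal sketch on simulated data, assuming nothing about the real tweets; the data frame `toy`, the object `p_hist` and the title text are made up for illustration.

```r
library(plotly)

# Simulated word counts for two sources (not the Twitter data)
set.seed(1)
toy <- data.frame(
  Nwords = c(rpois(200, 12), rpois(200, 18)),
  source = rep(c("Twitter for iPhone", "Twitter Web Client"), each = 200)
)

# Overlay the groups instead of placing the bars side by side
p_hist <- plot_ly(toy, x = ~Nwords, color = ~source,
                  type = "histogram", alpha = 0.6) %>%
  layout(
    barmode = "overlay",
    title = "Words per tweet by source (simulated)",
    xaxis = list(title = "Number of words"),
    yaxis = list(title = "Number of tweets")
  )

p_hist
```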
# boxplot of Nwords by source
plot_ly(newT, y = ~Nwords, type = "box", color = ~factor(source))
# boxplot of followers by source
plot_ly(newT, y = ~followers, type = "box", color = ~factor(source))
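Follower counts are usually heavily right-skewed, so a few very large accounts can squash the boxes against the axis. One option, sketched here on simulated data rather than the real tweets, is a logarithmic y axis; `toy` and `p_box` are hypothetical names, and accounts with zero followers would not appear on a log scale.

```r
library(plotly)

# Simulated, right-skewed follower counts for two sources (not the Twitter data)
set.seed(2)
toy <- data.frame(
  followers = round(c(rlnorm(200, 6, 1.5), rlnorm(200, 7, 1.5))) + 1,
  source = rep(c("Twitter for Android", "Twitter for iPhone"), each = 200)
)

# Same boxplot as above, but with a log-scaled y axis
p_box <- plot_ly(toy, y = ~followers, color = ~source, type = "box") %>%
  layout(yaxis = list(type = "log", title = "Followers (log scale)"))

p_box
```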
No brain = No discussion. No discussion = No conclusion.
Have a nice day and don't be stupid.