Final Project:

Introduction

In 2017, some financial companies use advanced data-analytics algorithms to conduct fully automated financial trading in place of human trading and analysis. This project is a good opportunity to explore the relationship between the movements of different stocks.

How is the movement of the Nasdaq index affected by Amazon and Facebook?

Research Questions

To explore how the movement of the Nasdaq index is affected by Amazon and Facebook, I break the question down into several topics about the financial markets.

I will compare one year of daily historical stock prices and volumes for Nasdaq (ticker NDAQ), Amazon (AMZN) and Facebook (FB).

The main resource I will use is http://www.nasdaq.com

Proposed Sources

For both short-term and long-term analysis, the related stock news will be analyzed for sentiment as a complement to the price data.

Data scraping:
http://www.nasdaq.com/symbol/ndaq/historical
http://www.nasdaq.com/symbol/amzn/historical
http://www.nasdaq.com/symbol/fb/historical

Data sentiments:
http://www.nasdaq.com/symbol/ndaq/news-headlines
http://www.nasdaq.com/symbol/amzn/news-headlines
http://www.nasdaq.com/symbol/fb/news-headlines

Proposed Methodology

The data visualization compares the movement and trend pattern of Nasdaq against the two stocks. For the text analysis, the plan is to use R's tm package to create corpora and run a corpus-cleaning algorithm, then to attempt various combinations of classification algorithms with RTextTools for sentiment analysis.

Data scraping

Scraping method 1 - Using quantmod

Use quantmod's getSymbols function to get the data

library(quantmod)

getSymbols("YHOO",src="google") # from google finance

## [1] "YHOO"

getSymbols("GOOG",src="yahoo") # from yahoo finance

## [1] "GOOG"

getSymbols("amzn")

## [1] "AMZN"

tail(AMZN)

##            AMZN.Open AMZN.High AMZN.Low AMZN.Close AMZN.Volume
## 2017-05-02    946.65    950.10   941.41     946.94     3812400
## 2017-05-03    946.00    946.00   935.90     941.03     3560000
## 2017-05-04    944.75    945.00   934.22     937.53     2409000
## 2017-05-05    940.52    940.79   930.30     934.15     2856600
## 2017-05-08    940.95    949.05   939.21     949.04     3390700
## 2017-05-09    952.80    957.89   950.20     952.82     3246200
##            AMZN.Adjusted
## 2017-05-02        946.94
## 2017-05-03        941.03
## 2017-05-04        937.53
## 2017-05-05        934.15
## 2017-05-08        949.04
## 2017-05-09        952.82

getSymbols("FB")

## [1] "FB"

tail(FB)

##            FB.Open FB.High FB.Low FB.Close FB.Volume FB.Adjusted
## 2017-05-02  153.34  153.44 151.66   152.78  21451000      152.78
## 2017-05-03  153.60  153.60 151.34   151.80  24379500      151.80
## 2017-05-04  150.17  151.52 148.72   150.85  36174000      150.85
## 2017-05-05  151.45  151.63 149.79   150.24  17051500      150.24
## 2017-05-08  150.71  151.08 149.74   151.06  15787900      151.06
## 2017-05-09  151.49  152.59 150.21   150.48  17368600      150.48

getSymbols("NDAQ")

## [1] "NDAQ"

tail(NDAQ)

##            NDAQ.Open NDAQ.High NDAQ.Low NDAQ.Close NDAQ.Volume
## 2017-05-02     68.45     68.64    68.02      68.36      570500
## 2017-05-03     68.34     68.53    67.28      67.81     1238300
## 2017-05-04     68.10     68.10    67.30      67.96      850000
## 2017-05-05     67.96     68.07    67.53      67.91      570400
## 2017-05-08     67.90     68.03    66.74      67.03      905900
## 2017-05-09     67.18     67.40    66.73      66.75     1186400
##            NDAQ.Adjusted
## 2017-05-02         68.36
## 2017-05-03         67.81
## 2017-05-04         67.96
## 2017-05-05         67.91
## 2017-05-08         67.03
## 2017-05-09         66.75

Scraping method 2 - Loading CSV

Read the CSV files directly into R

#load Amazon data
amzn2<-read.csv("https://raw.githubusercontent.com/fung1091/DATA607finalproject/master/amzn.csv", stringsAsFactors = FALSE)
head(amzn2)

##         date  close       volume   open   high     low
## 1      16:00 952.95    3,171,374 952.80 957.89 950.200
## 2 2017/05/09 952.82 3256357.0000 952.80 957.89 950.200
## 3 2017/05/08 949.04 3406954.0000 940.95 949.05 939.210
## 4 2017/05/05 934.15 2863978.0000 940.52 940.79 930.300
## 5 2017/05/04 937.53 2413486.0000 944.75 945.00 934.215
## 6 2017/05/03 941.03 3578108.0000 946.00 946.00 935.900

#load Facebook data
fb2<-read.csv("https://raw.githubusercontent.com/fung1091/DATA607finalproject/master/fb.csv", stringsAsFactors = FALSE)
head(fb2)

##         date  close        volume   open   high    low
## 1      16:00 150.48    16,999,575 151.49 152.59 150.21
## 2 2017/05/09 150.48 17381800.0000 151.49 152.59 150.21
## 3 2017/05/08 151.06 15813350.0000 150.71 151.08 149.74
## 4 2017/05/05 150.24 17104730.0000 151.45 151.63 149.79
## 5 2017/05/04 150.85 36185180.0000 150.17 151.52 148.72
## 6 2017/05/03 151.80 28301550.0000 153.60 153.60 151.34

#load Nasdaq data
ndaq2<-read.csv("https://raw.githubusercontent.com/fung1091/DATA607finalproject/master/ndaq.csv", stringsAsFactors = FALSE)
head(ndaq2)

##         date close       volume  open   high     low
## 1      16:00 66.74    1,116,147 67.18 67.400 66.7300
## 2 2017/05/09 66.75 1186355.0000 67.18 67.400 66.7300
## 3 2017/05/08 67.03  982837.0000 67.90 68.030 66.7400
## 4 2017/05/05 67.91  571990.0000 67.96 68.065 67.5265
## 5 2017/05/04 67.96  850034.0000 68.10 68.100 67.3000
## 6 2017/05/03 67.81 1239941.0000 68.34 68.530 67.2800
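The first row of each CSV carries an intraday time stamp (16:00) rather than a date, and its volume uses thousands separators, so the volume column loads as text. A minimal sketch of one way to tidy such a file, using a few rows copied from the Amazon output above. The helper name clean_quotes is hypothetical, and dropping the intraday row is an assumption of this sketch rather than what the project does later.

```r
# Sample rows copied from the amzn2 output above (inline so the sketch runs offline)
csv_text <- 'date,close,volume,open,high,low
16:00,952.95,"3,171,374",952.80,957.89,950.200
2017/05/09,952.82,3256357.0000,952.80,957.89,950.200
2017/05/08,949.04,3406954.0000,940.95,949.05,939.210'

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# clean_quotes is a hypothetical helper, not part of the original project
clean_quotes <- function(df) {
  # keep only rows whose date looks like yyyy/mm/dd (drops the intraday "16:00" row)
  df <- df[grepl("^[0-9]{4}/[0-9]{2}/[0-9]{2}$", df$date), ]
  df$date <- as.Date(df$date, format = "%Y/%m/%d")
  # strip thousands separators before converting volume to a number
  df$volume <- as.numeric(gsub(",", "", df$volume, fixed = TRUE))
  rownames(df) <- NULL
  df
}

amzn_clean <- clean_quotes(raw)
str(amzn_clean)
```

After this step the date column is a Date and volume is numeric, so the rows can be sorted and plotted directly.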

Scraping method 3 - Scraping the HTML table from the web page

Compared with the methods above, method 3 retrieves the most up-to-date data, possibly in real time, so method 3 is used for the data analysis.

library(rvest)
library(RCurl)
library(XML)

Create a function for scraping data from the web

# Scrape the widest HTML table on a page and return it as a data frame
ddd <- function(hkurl){

# download the raw page content (RCurl)
hdoc <- getURLContent(hkurl)

# find all tables in webpage (XML)
hktables <- readHTMLTable(hdoc)

# count the columns of each table (0 when a table could not be parsed)
columns <- sapply(hktables, function(t) if (is.null(t)) 0L else ncol(t))

# pick the table with the most columns, not the table at position max(columns)
seriess <- hktables[[which.max(columns)]]

colnames(seriess) <- c("Date","Open","High","Low","Close","Volume")
return(seriess)
}
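rvest is loaded above, but ddd relies on RCurl and XML. As a hedged alternative, the same "pick the widest table" idea can be sketched with rvest's read_html and html_table. The HTML here is a made-up stand-in for the quote page (two price rows are copied from the Facebook data), and largest_table is a hypothetical name.

```r
library(rvest)

# Made-up page with two tables; the second, wider one plays the role of the price table
page <- read_html('
<html><body>
  <table><tr><th>Key</th><th>Value</th></tr><tr><td>Symbol</td><td>FB</td></tr></table>
  <table>
    <tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th><th>Volume</th></tr>
    <tr><td>05/09/2017</td><td>151.49</td><td>152.59</td><td>150.21</td><td>150.48</td><td>17,381,800</td></tr>
    <tr><td>05/08/2017</td><td>150.71</td><td>151.08</td><td>149.74</td><td>151.06</td><td>15,813,350</td></tr>
  </table>
</body></html>')

# largest_table is a hypothetical helper: return the table with the most columns
largest_table <- function(doc) {
  tables <- html_table(html_nodes(doc, "table"))
  widths <- vapply(tables, ncol, integer(1))
  as.data.frame(tables[[which.max(widths)]])
}

prices <- largest_table(page)
prices
```

Selecting by which.max of the column counts keeps the choice tied to the table's shape rather than to its position on the page.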

Create data frames using the function

# Real time data for Nasdaq index
ndaq3 <- ddd('http://www.nasdaq.com/symbol/ndaq/historical')
head(ndaq3)

##         Date  Open   High     Low Close    Volume
## 1      16:00 67.18  67.40   66.73 66.74 1,116,147
## 2 05/09/2017 67.18   67.4   66.73 66.75 1,186,355
## 3 05/08/2017  67.9  68.03   66.74 67.03   982,837
## 4 05/05/2017 67.96 68.065 67.5265 67.91   571,990
## 5 05/04/2017  68.1   68.1    67.3 67.96   850,034
## 6 05/03/2017 68.34  68.53   67.28 67.81 1,239,941

# Real time data for Facebook
fb3 <- ddd('http://www.nasdaq.com/symbol/fb/historical')
head(fb3)

##         Date   Open   High    Low  Close     Volume
## 1      16:00 151.49 152.59 150.21 150.48 16,999,575
## 2 05/09/2017 151.49 152.59 150.21 150.48 17,381,800
## 3 05/08/2017 150.71 151.08 149.74 151.06 15,813,350
## 4 05/05/2017 151.45 151.63 149.79 150.24 17,104,730
## 5 05/04/2017 150.17 151.52 148.72 150.85 36,185,180
## 6 05/03/2017  153.6  153.6 151.34  151.8 28,301,550

# Real time data for Amazon
amzn3 <- ddd('http://www.nasdaq.com/symbol/amzn/historical')
head(amzn3)

##         Date   Open   High     Low  Close    Volume
## 1      16:00 952.80 957.89  950.20 952.95 3,171,374
## 2 05/09/2017  952.8 957.89   950.2 952.82 3,256,357
## 3 05/08/2017 940.95 949.05  939.21 949.04 3,406,954
## 4 05/05/2017 940.52 940.79   930.3 934.15 2,863,978
## 5 05/04/2017 944.75    945 934.215 937.53 2,413,486
## 6 05/03/2017    946    946   935.9 941.03 3,578,108

Replace the time stamp in the first row with today's date and remove commas

# Get the text of today
library(lubridate)
date <- today()
newdate <- strptime(as.character(date), "%Y-%m-%d")
txtdate <- format(newdate, "%m/%d/%Y")
txtdate

## [1] "05/10/2017"

# remove comma function
removeComma <- function(s) {gsub(",", "", s, fixed = TRUE)}

Function for the analysis calculations

abc <- function(z){
# convert the scraped factor columns to text and numbers
z$Date <- as.character(z$Date)
z$Open <- round(as.numeric(as.character(z$Open)),2)
z$High <- round(as.numeric(as.character(z$High)),2)
z$Low <- round(as.numeric(as.character(z$Low)),2)
z$Close <- round(as.numeric(as.character(z$Close)),2)
# strip thousands separators before converting Volume
z$Volume <- as.numeric(as.character(removeComma(z$Volume)))
# the first row is stamped with a time (16:00); replace it with today's date
z[1,1] <- txtdate

# derived measures
z$change <- z$Close - z$Open            # close minus open
z$rating_change <- z$change / z$Open    # change as a fraction of the open
z$date_range <- z$High - z$Low          # daily high-low range
z
}


fb3 <- abc(fb3)
head(fb3)

##         Date   Open   High    Low  Close   Volume change rating_change
## 1 05/10/2017 151.49 152.59 150.21 150.48 16999575  -1.01  -0.006667107
## 2 05/09/2017 151.49 152.59 150.21 150.48 17381800  -1.01  -0.006667107
## 3 05/08/2017 150.71 151.08 149.74 151.06 15813350   0.35   0.002322341
## 4 05/05/2017 151.45 151.63 149.79 150.24 17104730  -1.21  -0.007989435
## 5 05/04/2017 150.17 151.52 148.72 150.85 36185180   0.68   0.004528201
## 6 05/03/2017 153.60 153.60 151.34 151.80 28301550  -1.80  -0.011718750
##   date_range
## 1       2.38
## 2       2.38
## 3       1.34
## 4       1.84
## 5       2.80
## 6       2.26
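The three derived columns are simple arithmetic on each row. As a check, this sketch recomputes them for the 05/09/2017 Facebook row shown above; the variable names are chosen for illustration only.

```r
# Values copied from the 05/09/2017 Facebook row above
open_price  <- 151.49
high_price  <- 152.59
low_price   <- 150.21
close_price <- 150.48

change        <- close_price - open_price   # absolute move over the session
rating_change <- change / open_price        # move as a fraction of the open
date_range    <- high_price - low_price     # intraday high-low spread

round(c(change = change, rating_change = rating_change, date_range = date_range), 6)
```

The results agree with the change, rating_change and date_range values printed for that row.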

amzn3 <- abc(amzn3)
head(amzn3)

##         Date   Open   High    Low  Close  Volume change rating_change
## 1 05/10/2017 952.80 957.89 950.20 952.95 3171374   0.15  1.574307e-04
## 2 05/09/2017 952.80 957.89 950.20 952.82 3256357   0.02  2.099076e-05
## 3 05/08/2017 940.95 949.05 939.21 949.04 3406954   8.09  8.597694e-03
## 4 05/05/2017 940.52 940.79 930.30 934.15 2863978  -6.37 -6.772849e-03
## 5 05/04/2017 944.75 945.00 934.22 937.53 2413486  -7.22 -7.642233e-03
## 6 05/03/2017 946.00 946.00 935.90 941.03 3578108  -4.97 -5.253700e-03
##   date_range
## 1       7.69
## 2       7.69
## 3       9.84
## 4      10.49
## 5      10.78
## 6      10.10

ndaq3 <- abc(ndaq3)
head(ndaq3)

##         Date  Open  High   Low Close  Volume change rating_change
## 1 05/10/2017 67.18 67.40 66.73 66.74 1116147  -0.44 -0.0065495683
## 2 05/09/2017 67.18 67.40 66.73 66.75 1186355  -0.43 -0.0064007145
## 3 05/08/2017 67.90 68.03 66.74 67.03  982837  -0.87 -0.0128129602
## 4 05/05/2017 67.96 68.06 67.53 67.91  571990  -0.05 -0.0007357269
## 5 05/04/2017 68.10 68.10 67.30 67.96  850034  -0.14 -0.0020558003
## 6 05/03/2017 68.34 68.53 67.28 67.81 1239941  -0.53 -0.0077553409
##   date_range
## 1       0.67
## 2       0.67
## 3       1.29
## 4       0.53
## 5       0.80
## 6       1.25

Cleaning and Analysis 1 - Index comparison

library(tidyr)
library(dplyr)
fb4 <- fb3 %>% select(Date, Open)
colnames(fb4) <- c("Date","facebook")
amzn4 <- amzn3 %>% select(Date, Open)
colnames(amzn4) <- c("Date","amazon")
ndaq4 <- ndaq3 %>% select(Date, Open)
colnames(ndaq4) <- c("Date","nasdaq")

x <- inner_join(fb4,amzn4)

## Joining, by = "Date"

analysis1 <- inner_join(x,ndaq4)

## Joining, by = "Date"

analysis2 <- analysis1 %>% gather(company, index, 2:4)
head(analysis1)

##         Date facebook amazon nasdaq
## 1 05/10/2017   151.49 952.80  67.18
## 2 05/09/2017   151.49 952.80  67.18
## 3 05/08/2017   150.71 940.95  67.90
## 4 05/05/2017   151.45 940.52  67.96
## 5 05/04/2017   150.17 944.75  68.10
## 6 05/03/2017   153.60 946.00  68.34

head(analysis2)

##         Date  company  index
## 1 05/10/2017 facebook 151.49
## 2 05/09/2017 facebook 151.49
## 3 05/08/2017 facebook 150.71
## 4 05/05/2017 facebook 151.45
## 5 05/04/2017 facebook 150.17
## 6 05/03/2017 facebook 153.60
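The join-then-reshape step can be written a little more explicitly. This sketch names the join key with by = "Date", which avoids the "Joining, by" message, and uses tidyr's pivot_longer, the newer counterpart of gather. It assumes tidyr 1.0 or later and uses three rows of opening prices copied from the tables above.

```r
library(dplyr)
library(tidyr)

# Opening prices copied from the outputs above
dates <- c("05/09/2017", "05/08/2017", "05/05/2017")
fb   <- data.frame(Date = dates, facebook = c(151.49, 150.71, 151.45))
amzn <- data.frame(Date = dates, amazon   = c(952.80, 940.95, 940.52))
ndaq <- data.frame(Date = dates, nasdaq   = c(67.18, 67.90, 67.96))

# wide table: one row per date, one column per company
wide <- fb %>%
  inner_join(amzn, by = "Date") %>%
  inner_join(ndaq, by = "Date")

# long table: one row per date and company, the shape ggplot expects
long <- wide %>%
  pivot_longer(cols = -Date, names_to = "company", values_to = "index")

long
```

The wide form suits lm, where each company is a separate variable, while the long form suits ggplot, where company is mapped to color and shape.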

Stock index comparison

library(ggplot2)

graph1 <- ggplot(data=analysis2,
          aes(x=Date, y=index, group=company)) +
          geom_point(size=3, aes(shape=company, color=company)) +
          geom_line(size=1, aes(color=company)) +
          ggtitle("Profiles for stock index comparison")
graph1

Over these three months, Facebook's pattern looks more similar to Nasdaq's.

Linear regression comparison between Nasdaq and Facebook

The fitted relationship is a negative linear one

nas_fb <- lm(analysis1$nasdaq ~ analysis1$facebook)
plot(jitter(analysis1$nasdaq) ~ jitter(analysis1$facebook))

abline(nas_fb)

summary(nas_fb)

## 
## Call:
## lm(formula = analysis1$nasdaq ~ analysis1$facebook)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.94673 -0.69570  0.09836  0.66608  1.75615 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        91.61001    2.99488   30.59  < 2e-16 ***
## analysis1$facebook -0.15424    0.02124   -7.26 8.27e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.92 on 61 degrees of freedom
## Multiple R-squared:  0.4635, Adjusted R-squared:  0.4548 
## F-statistic: 52.71 on 1 and 61 DF,  p-value: 8.27e-10

hist(nas_fb$residuals)
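Read as an equation, the fitted model is nasdaq = 91.61 - 0.154 × facebook. A small sketch of what that implies, using the coefficients reported in the summary above. The helper fitted_nasdaq is hypothetical and the opening price of 150 is an arbitrary illustrative value.

```r
# Coefficients copied from summary(nas_fb) above
intercept <- 91.61001
slope     <- -0.15424

# fitted_nasdaq is a hypothetical helper: fitted Nasdaq open for a given Facebook open
fitted_nasdaq <- function(facebook_open) intercept + slope * facebook_open

# fitted value at a Facebook open of 150
fitted_nasdaq(150)

# effect of one extra dollar on the Facebook open (equals the slope)
fitted_nasdaq(151) - fitted_nasdaq(150)
```

The negative slope means that, within this sample, higher Facebook opening prices go with lower fitted Nasdaq opening prices.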

Linear regression comparison between Nasdaq and Amazon

The fitted relationship is a negative linear one

nas_amzn <- lm(analysis1$nasdaq ~ analysis1$amazon)
plot(jitter(analysis1$nasdaq) ~ jitter(analysis1$amazon))

abline(nas_amzn)

summary(nas_amzn)

## 
## Call:
## lm(formula = analysis1$nasdaq ~ analysis1$amazon)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2528 -0.5861  0.1641  0.6236  1.7302 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      90.041731   2.686439  33.517  < 2e-16 ***
## analysis1$amazon -0.022966   0.003058  -7.511 3.06e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9054 on 61 degrees of freedom
## Multiple R-squared:  0.4805, Adjusted R-squared:  0.4719 
## F-statistic: 56.41 on 1 and 61 DF,  p-value: 3.062e-10

hist(nas_amzn$residuals)
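For a regression with a single predictor, R-squared is the square of the correlation coefficient, and the correlation takes the sign of the slope. This sketch recovers the implied correlations from the two summaries above so that the fits can be compared on one scale.

```r
# R-squared and slope values copied from the two lm summaries above
fits <- data.frame(
  predictor = c("facebook", "amazon"),
  r_squared = c(0.4635, 0.4805),
  slope     = c(-0.15424, -0.022966)
)

# correlation = sign(slope) * sqrt(R-squared) for simple linear regression
fits$correlation <- sign(fits$slope) * sqrt(fits$r_squared)
fits
```

Both implied correlations are negative and of similar size, roughly -0.68 for Facebook and -0.69 for Amazon.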

Cleaning and Analysis 2 - Rating-change comparison

fb5 <- fb3 %>% select(Date, rating_change)
colnames(fb5) <- c("Date","facebook")
amzn5 <- amzn3 %>% select(Date, rating_change)
colnames(amzn5) <- c("Date","amazon")
ndaq5 <- ndaq3 %>% select(Date, rating_change)
colnames(ndaq5) <- c("Date","nasdaq")

x <- inner_join(fb5,amzn5)

## Joining, by = "Date"

analysis3 <- inner_join(x,ndaq5)

## Joining, by = "Date"

analysis4 <- analysis3 %>% gather(company, rating, 2:4)

# Visualization
graph2 <- ggplot(data=analysis4,
          aes(x=Date, y=rating, group=company)) +
          geom_point(size=3, aes(shape=company, color=company)) +
          geom_line(size=1, aes(color=company)) +
          ggtitle("stock rating comparison")
graph2

The rating change of Facebook appears to track Nasdaq's more closely than Amazon's does.

Cleaning and Analysis 3 - Stock date_range comparison

fb6 <- fb3 %>% select(Date, date_range)
colnames(fb6) <- c("Date","facebook")
amzn6 <- amzn3 %>% select(Date, date_range)
colnames(amzn6) <- c("Date","amazon")
ndaq6 <- ndaq3 %>% select(Date, date_range)
colnames(ndaq6) <- c("Date","nasdaq")

x <- inner_join(fb6,amzn6)

## Joining, by = "Date"

analysis5 <- inner_join(x,ndaq6)

## Joining, by = "Date"

analysis6 <- analysis5 %>% gather(company, date_range, 2:4)

# Visualization
graph3 <- ggplot(data=analysis6,
          aes(x=Date, y=date_range, group=company)) +
          geom_point(size=3, aes(shape=company, color=company)) +
          geom_line(size=1, aes(color=company)) +
          ggtitle("stock date_range comparison")
graph3

Facebook's daily range appears to be closely related to Nasdaq's.

Sentiment analysis

Comparison using sentiment analysis

library(rvest)
library(XML)
library(stringr)
library(RCurl)
library(bitops)
library(tibble) # function tibble
library(tidytext) # function unnest_tokens
library(dplyr) # function mutate
library(tidyr)
library(ggplot2)

Sentiment analysis of the most recent articles about Amazon, Facebook and Nasdaq, obtained by scraping

web <- function(i){

weblinks <- read_html(paste('https://www.nasdaq.com/symbol/',i,'/news-headlines',sep=""))

doc <- htmlParse(weblinks)

links <- xpathSApply(doc, "//a/@href")

#http://www.nasdaq.com/article/

webs <-unlist(str_extract_all(links, 'http://www.nasdaq.com/article.+'))
webs

}

Most recent articles for Amazon, Facebook and Nasdaq

article_amzn <- web("amzn")
#article_amzn
article_fb <- web("fb")
#article_fb
article_ndaq <- web("ndaq")
article_ndaq

##  [1] "http://www.nasdaq.com/article/4-stocks-im-never-selling-cm786315"                                                                       
##  [2] "http://www.nasdaq.com/article/4-stocks-im-never-selling-cm786315"                                                                       
##  [3] "http://www.nasdaq.com/article/the-first-thing-you-should-do-with-your-social-security-check-cm786333"                                   
##  [4] "http://www.nasdaq.com/article/the-first-thing-you-should-do-with-your-social-security-check-cm786333"                                   
##  [5] "http://www.nasdaq.com/article/bats-targets-nyse-and-nasdaq-endofday-volume-with-new-offering-20170508-01109"                            
##  [6] "http://www.nasdaq.com/article/us-economics-feds-bullard-current-funds-rate-reasonable-balance-sheet-unwind-could-start-in-2017-cm785162"
##  [7] "http://www.nasdaq.com/article/key-takeaways-from-ices-q1-earnings-cm784490"                                                             
##  [8] "http://www.nasdaq.com/article/a-drop-in-trading-volume-curbs-nasdaqs-results-cm783594"                                                  
##  [9] "http://www.nasdaq.com/article/intercontinental-exchange-ice-beats-q1-earnings-estimates-cm783595"                                       
## [10] "http://www.nasdaq.com/article/ice-allots-10-mln-for-regulatory-probes-earnings-rise-20170503-01190"                                     
## [11] "http://www.nasdaq.com/article/key-takeaways-from-cmes-q1-earnings-cm781544"                                                             
## [12] "http://www.nasdaq.com/article/3-things-you-didnt-know-about-frontier-communications-corporation-cm781065"                               
## [13] "http://www.nasdaq.com/article/key-takeaways-from-nasdaqs-q1-earnings-cm780749"                                                          
## [14] "http://www.nasdaq.com/article/cme-group-cme-q1-earnings-beat-revenues-miss-estimates-cm780387"
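The listing above contains repeated URLs, so the same article would be downloaded and counted twice. A hedged sketch of the link extraction using rvest alone, with duplicates dropped. The HTML is a made-up stand-in for the headlines page (the article URLs are copied from the listing above), and article_links is a hypothetical name.

```r
library(rvest)
library(stringr)

# Made-up stand-in for the headlines page
page <- read_html('
<html><body>
  <a href="http://www.nasdaq.com/article/4-stocks-im-never-selling-cm786315">headline</a>
  <a href="http://www.nasdaq.com/article/4-stocks-im-never-selling-cm786315">read more</a>
  <a href="http://www.nasdaq.com/article/key-takeaways-from-nasdaqs-q1-earnings-cm780749">headline</a>
  <a href="http://www.nasdaq.com/symbol/ndaq">quote page</a>
</body></html>')

# article_links is a hypothetical helper, not part of the original project
article_links <- function(doc) {
  hrefs <- html_attr(html_nodes(doc, "a"), "href")
  # keep article URLs only, then drop repeats
  unique(str_subset(hrefs, "^http://www\\.nasdaq\\.com/article/"))
}

links <- article_links(page)
links
```

Anchoring the pattern at the start of the URL also keeps out links that merely contain the article path somewhere inside a longer address.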

Download data

# A for loop was tried but overloaded, so the whole URL vector is passed to getURL
article_amzn1 <- getURL(article_amzn)
article_fb1 <- getURL(article_fb)
article_ndaq1 <- getURL(article_ndaq)
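If the overloading mentioned in the comment above came from sending too many requests at once, one hedged option is to fetch the pages one at a time, pause between requests, and record failures instead of stopping. fetch_all and its pause argument are hypothetical. The fetch argument defaults to RCurl's getURL and is swapped for a stub here so that the sketch runs offline.

```r
# fetch_all is a hypothetical helper, not part of the original project
fetch_all <- function(urls, fetch = RCurl::getURL, pause = 1) {
  pages <- character(length(urls))
  for (k in seq_along(urls)) {
    # a failed request yields NA instead of aborting the whole loop
    pages[k] <- tryCatch(fetch(urls[k]), error = function(e) NA_character_)
    # wait between requests to avoid hammering the server
    if (k < length(urls)) Sys.sleep(pause)
  }
  names(pages) <- urls
  pages
}

# Offline demonstration with a stub in place of getURL
fake_fetch <- function(u) {
  if (grepl("bad", u)) stop("simulated failure")
  paste0("<html>", u, "</html>")
}

pages <- fetch_all(c("http://example.com/a", "http://example.com/bad"),
                   fetch = fake_fetch, pause = 0)
pages
```

With the real downloader, a call such as fetch_all(article_amzn) would return one page per URL, with NA marking any request that failed.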

Data frame for each text

text <- function(q,j){
companies <- list(q)

stocks <- c(j)

series <- tibble()

for(i in seq_along(stocks)) {
        
        clean <- tibble(article = seq_along(companies[[i]]),
                        text = companies[[i]]) %>%
             unnest_tokens(word, text) %>%
             mutate(company = stocks[i]) %>%
             select(company, everything())

        series <- rbind(series, clean)
}

series$company <- factor(series$company, levels = rev(stocks))

series
}

Use the function to tokenize the articles for each company

text_amzn <- text(article_amzn1,'amazon')
text_amzn

## # A tibble: 252,936 × 3
##    company article    word
## *   <fctr>   <int>   <chr>
## 1   amazon       1 doctype
## 2   amazon       1    html
## 3   amazon       1    html
## 4   amazon       1    lang
## 5   amazon       1      en
## 6   amazon       1      us
## 7   amazon       1   class
## 8   amazon       1   inner
## 9   amazon       1    news
## 10  amazon       1   story
## # ... with 252,926 more rows

text_fb <- text(article_fb1,'facebook')
text_fb

## # A tibble: 251,425 × 3
##     company article    word
## *    <fctr>   <int>   <chr>
## 1  facebook       1 doctype
## 2  facebook       1    html
## 3  facebook       1    html
## 4  facebook       1    lang
## 5  facebook       1      en
## 6  facebook       1      us
## 7  facebook       1   class
## 8  facebook       1   inner
## 9  facebook       1    news
## 10 facebook       1   story
## # ... with 251,415 more rows

text_ndaq <- text(article_ndaq1,'nasdaq')
text_ndaq

## # A tibble: 245,628 × 3
##    company article    word
## *   <fctr>   <int>   <chr>
## 1   nasdaq       1 doctype
## 2   nasdaq       1    html
## 3   nasdaq       1    html
## 4   nasdaq       1    lang
## 5   nasdaq       1      en
## 6   nasdaq       1      us
## 7   nasdaq       1   class
## 8   nasdaq       1   inner
## 9   nasdaq       1    news
## 10  nasdaq       1   story
## # ... with 245,618 more rows
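The first tokens in each table above (doctype, html, lang, class) are HTML markup, which suggests that the raw page source is being tokenized along with the article text. A hedged sketch of extracting the visible paragraph text first with rvest. The page is made up, and the assumption that the article body sits in p elements may not hold for the real site.

```r
library(rvest)
library(dplyr)
library(tibble)
library(tidytext)

# Made-up article page
raw_html <- '<!DOCTYPE html>
<html lang="en-us"><head><title>Story</title></head>
<body class="inner news-story">
  <p>Amazon shares rise on strong earnings.</p>
  <p>Analysts remain optimistic.</p>
</body></html>'

# keep only the text inside <p> elements, dropping tags and attributes
paragraphs <- html_text(html_nodes(read_html(raw_html), "p"))

# tokenize the cleaned text the same way the text function does
words <- tibble(article = 1L, text = paste(paragraphs, collapse = " ")) %>%
  unnest_tokens(word, text)

words$word
```

Tokenizing only the extracted text keeps tag and attribute names out of the word counts, so the sentiment totals reflect the article rather than the page template.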

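Sentiment analysis (using the NRC lexicon)

text_sentiment <- function(text) {
text %>%
        # note: right_join keeps every lexicon row, including words that never
        # appear in the text; inner_join would count only the matched words
        right_join(get_sentiments("nrc")) %>%
        filter(!is.na(sentiment)) %>%
        count(sentiment, sort = TRUE)
}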
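The choice of join matters here. right_join keeps every row of the lexicon, including words that never occur in the text, whereas inner_join keeps only words found in both. This sketch shows the difference on a tiny made-up example; toy_lexicon is hypothetical and stands in for get_sentiments("nrc").

```r
library(dplyr)
library(tibble)

# Tokens from a made-up text
tokens <- tibble(word = c("good", "good", "profit", "table"))

# toy_lexicon is a hypothetical stand-in for get_sentiments("nrc")
toy_lexicon <- tibble(
  word      = c("good", "profit", "loss", "fear"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# right_join: unmatched lexicon words (loss, fear) still contribute one row each
right_counts <- tokens %>%
  right_join(toy_lexicon, by = "word") %>%
  count(sentiment, sort = TRUE)

# inner_join: only words that actually occur in the text are counted
inner_counts <- tokens %>%
  inner_join(toy_lexicon, by = "word") %>%
  count(sentiment, sort = TRUE)

right_counts
inner_counts
```

With right_join the made-up text appears to contain two negative words even though it has none, so inner_join is likely the safer choice when the counts are meant to describe the articles.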

Sentiment comparison for the three groups

sen_amzn <- text_sentiment(text_amzn)

## Joining, by = "word"

sen_amzn

## # A tibble: 10 × 2
##       sentiment     n
##           <chr> <int>
## 1      positive 11885
## 2         trust  6574
## 3      negative  6104
## 4  anticipation  3104
## 5       sadness  2849
## 6          fear  2630
## 7           joy  2560
## 8         anger  2335
## 9      surprise  1870
## 10      disgust  1457

sen_fb <- text_sentiment(text_fb)

## Joining, by = "word"

sen_fb

## # A tibble: 10 × 2
##       sentiment     n
##           <chr> <int>
## 1      positive 11704
## 2         trust  6454
## 3      negative  5988
## 4  anticipation  3008
## 5       sadness  2798
## 6          fear  2597
## 7           joy  2511
## 8         anger  2309
## 9      surprise  1830
## 10      disgust  1476

sen_ndaq <- text_sentiment(text_ndaq)

## Joining, by = "word"

sen_ndaq

## # A tibble: 10 × 2
##       sentiment     n
##           <chr> <int>
## 1      positive 11691
## 2         trust  6403
## 3      negative  5941
## 4  anticipation  3007
## 5       sadness  2775
## 6          fear  2604
## 7           joy  2465
## 8         anger  2276
## 9      surprise  1821
## 10      disgust  1400

The recent articles for Amazon, Facebook and Nasdaq all show a positive trend. I do not recommend using every article in the analysis, because the balance between positive and negative then becomes nearly the same for all three.

Visualization

# Net sentiment per 500-word chunk, using get_sentiments("bing")
graph <- function(tokens){
tokens %>%
        group_by(company) %>% 
        # number the words and assign each to a chunk of about 500 words
        mutate(word_count = 1:n(), index = word_count %/% 500 + 1) %>%
        inner_join(get_sentiments("bing")) %>%
        count(company, index = index , sentiment)  %>%
        ungroup() %>%
        spread(sentiment, n, fill = 0) %>%
        # net sentiment = positive words minus negative words
        mutate(sentiment = positive - negative) %>%
        ggplot(aes(index, sentiment, fill = company)) +
        geom_bar(alpha = 0.5, stat = "identity", show.legend = FALSE) +
        facet_wrap(~ company, ncol = 2, scales = "free_x")
}
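The index column splits each company's word stream into consecutive chunks of roughly 500 words, and the bar height is the number of positive words minus the number of negative words in each chunk. A base R sketch of that arithmetic on a made-up stream of 1,200 words; the positive and negative counts are invented for illustration.

```r
# position of each word in a made-up stream of 1,200 words
word_count <- 1:1200

# integer division puts words 1-499 in chunk 1, 500-999 in chunk 2, and so on
index <- word_count %/% 500 + 1

# number of words that land in each chunk
chunk_sizes <- table(index)
chunk_sizes

# net sentiment of a chunk = positive matches minus negative matches
positive <- c(12, 9, 4)
negative <- c(7, 10, 1)
net_sentiment <- positive - negative
net_sentiment
```

Because the division starts from word 1, the first chunk holds 499 words rather than 500; the later full chunks hold 500 each.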

Graphic comparison for Amazon, Facebook and Nasdaq

graph(text_amzn)

## Joining, by = "word"

graph(text_fb)

## Joining, by = "word"

graph(text_ndaq)

## Joining, by = "word"

Judging by the graphs, the sentiment-based predictions for Nasdaq and Amazon are more similar to each other than to Facebook's.

Conclusion

Comparing the price index and the sentiment for Amazon, Facebook and Nasdaq, the movements of Facebook and Nasdaq are more similar and more closely related than those of Amazon and Nasdaq. This suggests that a prediction for Nasdaq may depend more on Facebook than on Amazon.

DATA607finalproject

jim lung

05-01-2017