"Catalonia Independence sentiment analysis- Alvaro Bueno"
"12/5/2017"

Catalonia Independence sentiment analysis

The project consists of loading news content from diverse sites. The sources are mostly articles from October, November and December, plus some articles from March, when the Spanish government took its first actions this year against the independence movement.

Media

Catalan News, BBC, The Guardian, The Independent, ABC News, NPR, La Vanguardia, El Periódico, RTE.ie, Al Jazeera and Bloomberg

Project Drawbacks

  • Restricted access to a Spanish corpus, and no Catalan corpus available.
  • Dates were very difficult to reformat after site mining.
  • Twitter restricts access to the last 7 days, so it was not useful for the time span this project covers.
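The date problem above can be approached by trying several formats in turn. This is only a minimal sketch in base R: the helper name parse_mixed_dates and the list of formats are assumptions for illustration, not what the project actually used.

```r
# Hypothetical helper: parse date strings that may come in different formats.
# Each format is tried in order; a string keeps the first format that parses.
# The default formats are assumptions; month-name formats such as "%d %B %Y"
# depend on the session locale, so they are left out here.
parse_mixed_dates <- function(x, formats = c("%Y-%m-%d", "%d/%m/%Y")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out  # strings that match no format stay NA
}

dates <- parse_mixed_dates(c("2017-12-05", "01/10/2017", "not a date"))
print(dates)
```

Strings that match none of the formats come back as NA, so they can be inspected by hand instead of being silently misread.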

Data, Sources

  • The data frame is saved filtered by language and news company.
  • The content variable is the source for the sentiment analysis.
  • The package methods take care of whitespace, punctuation, and general clean-up of the data.
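A rough sketch of the filter-and-save step, using base R only. The column names lang, newscompany and content match the code later in the deck; the three rows of toy data and the .rds file are assumptions made up for the example.

```r
# Toy stand-in for the mined data frame (contents are illustrative only).
df <- data.frame(
  lang        = c("EN", "ES", "EN"),
  newscompany = c("bbc", "periodico", "CTN"),
  content     = c("first article text", "texto del articulo", "third article text"),
  stringsAsFactors = FALSE
)

# Filter by language and by news company.
english <- df[df$lang == "EN", ]
bbc     <- df[df$newscompany == "bbc", ]

# Save one filtered subset and read it back (a temp file, for the example).
path <- tempfile(fileext = ".rds")
saveRDS(english, path)
restored <- readRDS(path)

print(nrow(english))
print(bbc$content)
```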

Assumptions, Methodology

After taking all the content from these pages, note that the data was gathered in descending date order, most recent news first. The articles from December and the last week of November (when the ousted Catalan government aides were starting to leave prison on bail) come first, and the ones from October (right when the independence referendum was held, which passed with more than 90% of the vote) fall on the right side of the plot.

Assumptions, Methodology (2)

Using analyzeSentiment() from the SentimentAnalysis package, we score each document and then plot the variability in sentiment across the mined documents.

library(SentimentAnalysis)  # provides analyzeSentiment() and plotSentiment()

# Sentiment by language
sent_english <- analyzeSentiment(as.character(df[df$lang=='EN',]$content))
sent_spanish <- analyzeSentiment(as.character(df[df$lang=='ES',]$content), language='spanish')

# Sentiment by news company (El Periodico is processed as Spanish text)
sent_abc <- analyzeSentiment(as.character(df[df$newscompany=='abcnews',]$content))
sent_periodico <- analyzeSentiment(as.character(df[df$newscompany=='periodico',]$content), language='spanish')
sent_ctn <- analyzeSentiment(as.character(df[df$newscompany=='CTN',]$content))
sent_jaz <- analyzeSentiment(as.character(df[df$newscompany=='aljazeera',]$content))
sent_bbc <- analyzeSentiment(as.character(df[df$newscompany=='bbc',]$content))
sent_gua <- analyzeSentiment(as.character(df[df$newscompany=='guardian',]$content))
sent_bbg <- analyzeSentiment(as.character(df[df$newscompany=='bberg',]$content))
sent_ind <- analyzeSentiment(as.character(df[df$newscompany=='indep',]$content))
sent_npr <- analyzeSentiment(as.character(df[df$newscompany=='npr',]$content))
sent_rte <- analyzeSentiment(as.character(df[df$newscompany=='rte.ie',]$content))
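
The per-outlet calls above all follow the same pattern, so they could be collapsed into one loop. The sketch below is an assumption about how that might look, not the code used for this deck: score_by_company and toy_scorer are hypothetical names, and toy_scorer is a stand-in so the example runs without the package installed.

```r
# Hypothetical helper: split the data frame by outlet and score each group.
# `scorer` is any function taking (texts, language = ...).
score_by_company <- function(df, scorer) {
  groups <- split(df, df$newscompany)
  lapply(groups, function(g) {
    # Use the Spanish setting when every article of the outlet is tagged 'ES'.
    language <- if (all(g$lang == "ES")) "spanish" else "english"
    scorer(as.character(g$content), language = language)
  })
}

# Stand-in scorer (NOT a sentiment measure): reports what it was asked to do.
toy_scorer <- function(x, language = "english") {
  list(language = language, n_docs = length(x))
}

demo <- data.frame(
  lang        = c("EN", "EN", "ES"),
  newscompany = c("bbc", "bbc", "periodico"),
  content     = c("text one", "text two", "texto tres"),
  stringsAsFactors = FALSE
)

result <- score_by_company(demo, toy_scorer)
print(names(result))

# With the package loaded, the real call would presumably be:
# sent <- score_by_company(df, SentimentAnalysis::analyzeSentiment)
```

The result is a named list, so sent$bbc or sent$periodico would take the place of the separate sent_bbc and sent_periodico variables.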

Other Plots

plotSentiment(sent_english) 

[Plot: sentiment across the English-language articles]

Other Plots

plotSentiment(sent_spanish)

[Plot: sentiment across the Spanish-language articles]

Other Plots

plotSentiment(sent_ctn) 

[Plot: sentiment across the Catalan News articles]

Other Plots

plotSentiment(sent_jaz) 

[Plot: sentiment across the Al Jazeera articles]

Other Plots

plotSentiment(sent_bbc) 

[Plot: sentiment across the BBC articles]

Other Plots

plotSentiment(sent_npr) 

[Plot: sentiment across the NPR articles]

Conclusions

Sentiment variability increases toward the right side of the plots. This is especially visible in the third plot, for Catalan News, which peaks at the end. Those articles date from October, when the central government in Spain had just started to declare the polls illegal and the vote went ahead as promised.
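
One way to put a number on "variability increases toward the right" is a rolling standard deviation over the scores in document order. This is a minimal sketch, not part of the original analysis: rolling_sd is a hypothetical helper and the scores below are made-up values, not results from the mined articles.

```r
# Hypothetical helper: standard deviation over a sliding window of k scores.
rolling_sd <- function(x, k = 5) {
  n <- length(x)
  if (n < k) return(numeric(0))
  vapply(seq_len(n - k + 1), function(i) sd(x[i:(i + k - 1)]), numeric(1))
}

# Made-up scores in plot order (left to right), for illustration only.
# In practice this could be one score column of an analyzeSentiment() result,
# e.g. sent_ctn$SentimentQDAP (assuming that column is the one of interest).
scores <- c(0.01, 0.02, 0.01, 0.02, 0.01, 0.02, 0.10, -0.08, 0.12, -0.05)

spread <- rolling_sd(scores, k = 5)
print(round(spread, 3))
```

A larger value in the last windows than in the first ones would support the reading of the plots given above.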

Conclusions (2)

The flat, near-neutral sentiment across roughly 80% of each plot suggests that the press keeps a moderate tone in its reporting in order to stay impartial, regardless of which outlet the articles come from.

We can expect a similar increase in animosity in the days around the new vote on December 21 if events turn violent, as happened in October.