Using Text Mining and Twitter to Build a Sentiment Index for Financial Markets

Abstract:

Modeling the price behavior of financial instruments is complex, and so is modeling financial markets as a whole. This is especially true today, when information technologies play a major role and make it far easier to spread data and information about economic and financial events around the world. In addition, the number of market participants has grown considerably: it is now much easier for individuals to open investment and trading accounts, and to publish their own opinions online.

Tools used

Python 2.7
Pycharm
Machine Learning
Text-Mining
Twitter API
Linux machine - Ubuntu 14.04 LTS

Setup required before the code.

Before starting on the code, a few setup steps must be completed directly on the official Twitter site and on the Twitter Developers site.

1. Twitter account.

    username = '@iffranciscome'

2. Register an app.

    https://bitbucket.org/quant-ai/twittermining

3. Generate the keys and access tokens.

    consumer_key    = '4TJeb1mqGw22VmxWd8w7gf3Tk'
    consumer_secret = 'uvwY6vtdfxzbsbUOS14sHQMfZgKX0cRyAi7041XbdEkZWVKXhc'
    access_token    = '3288299311-sk9rkLFG0bXoeZbVbV4mJN7lmLCuUqweMhkO7BV'
    access_token_secret = 'FhGj0ab2j7b7UN5LLjgeZ3ZBeTyCrkt0T6dGe6IeYAoVj'
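Keeping the keys hardcoded in the script exposes them to anyone who can read the code. A minimal sketch of one alternative, assuming the four values are exported as environment variables; the variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are hypothetical, chosen only for this example. It runs on Python 2.7 and 3.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup
CREDENTIAL_VARS = {
    'consumer_key': 'TWITTER_CONSUMER_KEY',
    'consumer_secret': 'TWITTER_CONSUMER_SECRET',
    'access_token': 'TWITTER_ACCESS_TOKEN',
    'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
}


def load_credentials(environ=os.environ):
    """Read the four Twitter keys from the environment, failing loudly if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(sorted(missing)))
    # Map each logical name (consumer_key, ...) to the value of its variable
    return dict((name, environ[var]) for name, var in CREDENTIAL_VARS.items())
```

The result could then feed the authentication step, e.g. `creds = load_credentials()` followed by `tweepy.OAuthHandler(creds['consumer_key'], creds['consumer_secret'])`.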

Code part I: Initial setup

1. Load packages.

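    # Twitter API client https://github.com/tweepy/tweepy
    # (no alias: the code below calls tweepy.OAuthHandler and tweepy.API)
    import tweepy

    # Data handling
    import pandas as pd

    # Tokenizer for step 6; its models are downloaded once with
    # nltk.download('punkt') ('punkt_tab' on recent NLTK releases)
    from nltk.tokenize import word_tokenize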

2. Initialization parameters

    consumer_key    = '4TJeb1mqGw22VmxWd8w7gf3Tk'
    consumer_secret = 'uvwY6vtdfxzbsbUOS14sHQMfZgKX0cRyAi7041XbdEkZWVKXhc'
    access_token    = '3288299311-sk9rkLFG0bXoeZbVbV4mJN7lmLCuUqweMhkO7BV'
    access_token_secret = 'FhGj0ab2j7b7UN5LLjgeZ3ZBeTyCrkt0T6dGe6IeYAoVj'
    username = '@iffranciscome'

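3. Global authentication

    # OAuth handler built from the app's consumer key and secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Access token of the account on whose behalf requests are made
    auth.set_access_token(access_token, access_token_secret)
    # Client object used for every later call
    api = tweepy.API(auth)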
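`OAuthHandler` signs each request following OAuth 1.0a (RFC 5849) with HMAC-SHA1. To illustrate the protocol (this is not Tweepy's internal code), a simplified sketch of how that signature is computed. It assumes `params` holds the request's query parameters plus the `oauth_*` parameters (consumer key, nonce, signature method, timestamp, token, version) except `oauth_signature` itself, and it does not handle repeated keys. It runs on Python 2.7 and 3.

```python
import base64
import hashlib
import hmac

try:  # Python 3
    from urllib.parse import quote
except ImportError:  # Python 2.7
    from urllib import quote


def percent_encode(value):
    """RFC 3986 percent-encoding: only unreserved characters stay unescaped."""
    return quote(str(value), safe='-._~')


def oauth_signature(method, url, params, consumer_secret, token_secret):
    """HMAC-SHA1 signature for an OAuth 1.0a request (simplified: no repeated keys)."""
    # 1. Normalize parameters: encode, sort by key then value, join as k=v&k=v
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = '&'.join('%s=%s' % pair for pair in pairs)
    # 2. Signature base string: METHOD&encoded URL&encoded parameter string
    base_string = '&'.join([method.upper(), percent_encode(url), percent_encode(param_string)])
    # 3. Signing key: encoded consumer secret and token secret joined by '&'
    key = percent_encode(consumer_secret) + '&' + percent_encode(token_secret)
    digest = hmac.new(key.encode('utf-8'), base_string.encode('utf-8'), hashlib.sha1).digest()
    return base64.b64encode(digest).decode('ascii')
```

Because both secrets enter the signing key, a request cannot be forged without them, which is why the four values from step 3 must be kept private.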

4. Accounts selected for the study.

    # News outlets
    Periodicos = {'Nombre' : ['Wall Street Journal','Bloomberg Markets','CNBC'],
                  'Cuenta' : ['@WSJmarkets','@markets','@CNBC']}
    # Market professionals and commentators
    Profesion  = {'Nombre' : ['Guillermo Barba','Jim Cramer','Tom Keene'],
                  'Cuenta' : ['@memobarba','@jimcramer','@tomkeene']}
    # Exchanges and market institutions
    Bolsas     = {'Nombre' : ['Dow Jones','BMV','CME Group'],
                  'Cuenta' : ['@DowJones','@GrupoBMV','@CMEGroup']}

    # One DataFrame per group, with columns 'Nombre' (name) and 'Cuenta' (account)
    DFPer = pd.DataFrame(Periodicos)
    DFPro = pd.DataFrame(Profesion)
    DFBol = pd.DataFrame(Bolsas)

Code part II: Information extraction

5. Extract only the tweet text.

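    # Text of each account's most recent tweets (20 per call by default);
    # screen_name is passed as a keyword, without the leading '@'
    Tweets0 = [t.text for t in api.user_timeline(screen_name=DFPer['Cuenta'][0].lstrip('@'))]
    Tweets1 = [t.text for t in api.user_timeline(screen_name=DFPer['Cuenta'][1].lstrip('@'))]
    Tweets2 = [t.text for t in api.user_timeline(screen_name=DFPer['Cuenta'][2].lstrip('@'))]

    Tweets3 = [t.text for t in api.user_timeline(screen_name=DFPro['Cuenta'][0].lstrip('@'))]
    Tweets4 = [t.text for t in api.user_timeline(screen_name=DFPro['Cuenta'][1].lstrip('@'))]
    Tweets5 = [t.text for t in api.user_timeline(screen_name=DFPro['Cuenta'][2].lstrip('@'))]

    Tweets6 = [t.text for t in api.user_timeline(screen_name=DFBol['Cuenta'][0].lstrip('@'))]
    Tweets7 = [t.text for t in api.user_timeline(screen_name=DFBol['Cuenta'][1].lstrip('@'))]
    Tweets8 = [t.text for t in api.user_timeline(screen_name=DFBol['Cuenta'][2].lstrip('@'))]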
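The nine near-identical extraction lines can be collapsed into a loop over the three groups. A minimal sketch, written so it can be tested without network access: the hypothetical helper `collect_texts` receives the timeline-fetching function as a parameter instead of calling Tweepy itself. Note that `user_timeline` returns 20 tweets per call by default, and at most 200 when `count` is set.

```python
def collect_texts(groups, fetch_timeline):
    """Map (group, account) to the list of tweet texts returned by fetch_timeline.

    groups: dict of group name -> list of account handles (e.g. '@CNBC').
    fetch_timeline: callable that takes a screen name without '@' and returns
    objects with a .text attribute (such as Tweepy Status objects).
    """
    texts = {}
    for group, accounts in groups.items():
        for account in accounts:
            statuses = fetch_timeline(account.lstrip('@'))
            texts[(group, account)] = [status.text for status in statuses]
    return texts
```

With Tweepy it could be called as `collect_texts({'Periodicos': list(DFPer['Cuenta']), 'Profesion': list(DFPro['Cuenta']), 'Bolsas': list(DFBol['Cuenta'])}, lambda name: api.user_timeline(screen_name=name, count=200))`. Going back further than one page of results would need pagination, for example with `tweepy.Cursor`.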

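6. Tokenize tweets

    # Tokenize the first four tweets from the first news account
    T0 = word_tokenize(Tweets0[0])
    T1 = word_tokenize(Tweets0[1])
    T2 = word_tokenize(Tweets0[2])
    T3 = word_tokenize(Tweets0[3])

    # Tweets0 holds only strings, so metadata must be read from the Status objects
    Statuses0 = api.user_timeline(screen_name=DFPer['Cuenta'][0].lstrip('@'))
    Statuses0[0].favorite_count  # number of likes
    Statuses0[0].retweet_count   # number of retweets
    Statuses0[0].retweeted       # True only if the authenticated account retweeted it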
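`word_tokenize` is designed for ordinary prose, so Twitter-specific tokens such as @mentions, #hashtags, $cashtags and URLs may be split into pieces. NLTK also provides `TweetTokenizer` in `nltk.tokenize` for this case. As a dependency-free alternative, a minimal regular-expression sketch; the hypothetical `tokenize_tweet` and its patterns are illustrative assumptions and will not cover every case.

```python
import re

# Alternatives are tried left to right: URLs and Twitter entities before plain words
TOKEN_PATTERN = re.compile(r"""
    https?://\S+            # URLs
  | [@#$][A-Za-z_]\w*       # @mentions, #hashtags, $cashtags
  | \d+(?:[.,]\d+)*%?       # numbers, optionally with decimals or a percent sign
  | \w+(?:['-]\w+)*         # words, including contractions and hyphenated words
  | [^\w\s]                 # any other single non-space symbol
""", re.VERBOSE | re.UNICODE)


def tokenize_tweet(text, lowercase=True):
    """Split a tweet into tokens, keeping URLs, mentions, hashtags and cashtags whole."""
    tokens = TOKEN_PATTERN.findall(text)
    if lowercase:
        # URLs can be case-sensitive, so only the other tokens are lowercased
        tokens = [t if '://' in t else t.lower() for t in tokens]
    return tokens
```

For example, `tokenize_tweet("Stocks rally! $AAPL up 2.5% via @CNBC #markets https://t.co/AbC")` keeps `$aapl`, `2.5%`, `@cnbc`, `#markets` and the URL as single tokens.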

Code part III: Adjusting and arranging the data
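One possible way to arrange tokenized tweets into a sentiment index, not necessarily the approach this project uses, is a lexicon-based score. Each tweet gets (p - n) / (p + n), where p and n count its positive and negative lexicon words (0 when it has none), and each group's index is the mean score of its tweets. A minimal sketch, assuming lowercase tokens; the `POSITIVE`/`NEGATIVE` word lists and both helper functions are hypothetical placeholders, not a validated financial lexicon.

```python
# Hypothetical toy lexicons; a real study would need a validated financial word list
POSITIVE = {'rally', 'gain', 'gains', 'up', 'bullish', 'beat', 'record', 'growth'}
NEGATIVE = {'selloff', 'loss', 'losses', 'down', 'bearish', 'miss', 'crash', 'recession'}


def tweet_score(tokens):
    """(positive - negative) / (positive + negative), in [-1, 1]; 0.0 if no lexicon words."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    if pos + neg == 0:
        return 0.0
    return float(pos - neg) / (pos + neg)


def sentiment_index(tokenized_by_group):
    """Mean tweet score per group: {group: [tokens, tokens, ...]} -> {group: index}."""
    index = {}
    for group, tweets in tokenized_by_group.items():
        scores = [tweet_score(tokens) for tokens in tweets]
        # Groups without tweets get a neutral 0.0 instead of dividing by zero
        index[group] = sum(scores) / len(scores) if scores else 0.0
    return index
```

Counting tweets with no lexicon words as 0.0 pulls each index toward neutral; dropping them instead would make the index reflect only opinionated tweets.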

Official Twitter API

Twitter's official API for direct connectivity.

Official developer site https://dev.twitter.com/
General documentation https://dev.twitter.com/overview/documentation
REST API documentation https://dev.twitter.com/rest/public
Official testing console https://dev.twitter.com/rest/tools/console
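Connecting directly, without Tweepy, means building each request URL yourself and signing it with OAuth 1.0a. A minimal sketch of the URL-building half, assuming the v1.1 `statuses/user_timeline.json` endpoint (the one Tweepy's `user_timeline` calls); `user_timeline_url` is a hypothetical helper, and which endpoints remain reachable depends on the access level of your developer account. It runs on Python 2.7 and 3.

```python
try:  # Python 3
    from urllib.parse import urlencode
except ImportError:  # Python 2.7
    from urllib import urlencode

BASE_URL = 'https://api.twitter.com/1.1/statuses/user_timeline.json'


def user_timeline_url(screen_name, count=20):
    """Build the GET URL for a user's timeline; count is capped at 200 per request."""
    # A list of pairs keeps the query parameters in a predictable order
    params = [('screen_name', screen_name.lstrip('@')), ('count', min(count, 200))]
    return BASE_URL + '?' + urlencode(params)
```

The resulting URL still has to be sent with an OAuth `Authorization` header; that signing step is what Tweepy's `OAuthHandler` takes care of.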

Tweepy

Python library for communicating with the Twitter API.

Official site http://www.tweepy.org/
Documentation http://tweepy.readthedocs.org/en/v3.2.0/

Natural Language Toolkit (NLTK)

A platform that includes a Python library for building programs that work with human language data.

Official website: Natural Language Toolkit
GitHub page: GitHub wiki
Book/tutorial: NLTK Book

Other sources and references

Marco Bonzanini's blog
Real Python blog