
# COMPARING APPLE AND SAMSUNG THROUGH TEXT MINING ON TWITTER

## BIG DATA
### Students:
### Fabiola Aguilar - Cristóbal Matamala - Javiera Ramiréz
### Prof. Mauricio Huerta

---
class: center, middle
# **PROBLEM STATEMENT**
***

## Apple and Samsung compete directly in several product lines, and both depend heavily on what customers think of those products. This study applies opinion mining (sentiment analysis) to tweets in order to compare Apple and Samsung.

---
class: center, middle
# **MOTIVATION**
***

## We want to capture both the positive and the negative opinions that Twitter users express about Apple and Samsung, and use them for a comparative and descriptive analysis of the two brands.
---

# **RESEARCH QUESTIONS**
***
.pull-left[
+ ## Which brand has the higher proportion of positive opinions on Twitter?

+ ## What proportion of the opinions about Apple on Twitter is positive, and what proportion is negative?]

.pull-right[
+ ## Which brand has the higher proportion of negative opinions on Twitter?

+ ## What proportion of the opinions about Samsung on Twitter is positive, and what proportion is negative?]
---

# **OBJECTIVES**
***
.left-column[
# **GENERAL OBJECTIVE**
### **Implement a strategy to extract the opinions of Twitter users through text mining, so that their tweets can be classified.**
]

.right-column[
# SPECIFIC OBJECTIVES
.pull-left[
+ ### Build a data set from Twitter based on tweets about the Apple brand.
+ ### Build a data set from Twitter based on tweets about the Samsung brand.]
.pull-right[
+ ### Compare the Apple and Samsung brands using text mining of Twitter.
+ ### Describe the Apple and Samsung brands using text mining of Twitter.]
]

---

# **RESEARCH SCOPE**
***
<img style="border-radius: 50%;" src="alcance.jpeg"
width="200px"
/>

## Descriptive scope

---
class: middle

# **RESEARCH HYPOTHESES**
***

+ ## Apple has the higher proportion of positive opinions.

+ ## On Twitter, positive opinions outnumber negative ones for both brands.

+ ## Samsung has the higher proportion of negative opinions.

---
class: middle

# **DATA COLLECTION**
***
+ We use the rtweet package, which lets us interact with the Twitter API and extract the data.

```r
install.packages("rtweet") 
*library(rtweet)
```

+ We keep only tweets in Spanish and exclude retweets. _(Samsung example)_

```r
Samsung <- search_tweets("Samsung OR #samsung OR samsung OR #Samsung",
                                                   n = 3000, 
*                                       include_rts = FALSE,
*                                               lang = "es",
                                    retryonratelimit = TRUE)
```

+ Number of tweets: 17,996 for Apple and 8,819 for Samsung.

---
class: middle

# **DATA COLLECTION**
***
+ The collected data look like this:

<div id="htmlwidget-2635b99bb080a7251980" style="width:100%;height:auto;" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-2635b99bb080a7251980">{"x":{"filter":"none","vertical":false,"fillContainer":true,"data":[["1","2","3","4","5","6","7","8","9","10"],["954441538259640320","777966588","1253409942901420034","1253409942901420034","1253409942901420034","1253409942901420034","1253409942901420034","1253409942901420034","1253409942901420034","1253409942901420034"],["1464400436111233033","1464399989900251143","1464399176322588677","1464294483013476359","1464398817831243781","1462633786001149954","1463399173110202374","1462632757910130692","1464398397507399681","1463921099456987141"],["2021-11-27T01:07:25Z","2021-11-27T01:05:39Z","2021-11-27T01:02:25Z","2021-11-26T18:06:24Z","2021-11-27T01:01:00Z","2021-11-22T04:07:23Z","2021-11-24T06:48:46Z","2021-11-22T04:03:18Z","2021-11-27T00:59:19Z","2021-11-25T17:22:43Z"],["Memy_2198","Sabrococo","samsung_jin","samsung_jin","samsung_jin","samsung_jin","samsung_jin","samsung_jin","samsung_jin","samsung_jin"],["Vendo Samsung note 9\n128 GB\n6 GB RAM \nESTADO 9/10 \n290$ negociable https://t.co/tsIw5kZYgM","Gente, me han robado el móvil en barcelona en la parada de passeig de gracia, es un Samsung Galaxy A52s 5G, por favor si sabéis cualquier cosa avisadme","@Elymin136 Aún no me leo el libro :(","@Elymin136 @BTS_twt Esto me esta lastimando😭😭😭😭😭😭😭😭","@Elymin136 @BTS_twt En verdad te quedaron muy bonitos Ely.♡","@Elymin136 Amén.","Flaco me re gustas.🤯\n\n@BTS_twt https://t.co/1PLSZpM0rS","@bts_bighit @AMAs REYES LLA ROMPIERON... ESTOY ORGULLOSA 😭😭😭😭","@Elymin136 @BTS_twt Te quedaron re bonitos😭😭😭","@Incompletelyrcs Me encanta mucho la historia, creo que el final que decidas darle será el adecuado... 
realmente me encanta, la historia me tiene muy atrapada, y uff la redacción es perfecta.🛐"],["Twitter for Android","Twitter Web App","Twitter for Android","Twitter for Android","Twitter for Android","Twitter for Android","Twitter for Android","Twitter for Android","Twitter for Android","Twitter for Android"],[66,151,25,31,39,5,30,43,25,175],[null,null,"1464398605066907653","1464292479302254596","1464381525961629702","1462633108633706498",null,"1462630412409139205","1464381525961629702","1463918248504737797"],[null,null,"823960937120104448","823960937120104448","823960937120104448","823960937120104448",null,"1409798257","823960937120104448","1535376043"],[null,null,"Elymin136","Elymin136","Elymin136","Elymin136",null,"bts_bighit","Elymin136","Incompletelyrcs"],[false,false,false,false,false,false,false,false,false,false],[false,false,false,false,false,false,false,false,false,false],[0,2,0,0,0,0,1,0,0,0],[1,8,0,0,0,0,0,0,0,0],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],["http://pbs.twimg.com/media/FFKYbBbXMAgPrJE.jpg",null,null,null,null,null,"http://pbs.twimg.com/media/FE8JyR2VIAEsnO5.jpg",null,null,null],["https://t.co/tsIw5kZYgM",null,null,null,null,null,"https://t.co/1PLSZpM0rS",null,null,null],["https://twitter.com/Memy_2198/status/1464400436111233033/photo/1",null,null,null,null,null,"https://twitter.com/samsung_jin/status/1463399173110202374/photo/1",null,null,null],["photo",null,null,null,null,null,"photo",null,null,null],[["http://pbs.twimg.com/media/FFKYbBbXMAgPrJE.jpg","http://pbs.twimg.com/media/FFKYbaHX0AUvetQ.jpg","http://pbs.twimg.com/media/FFKYbuhWQAAdJzf.jpg"],null,null,null,null,null,"http://pbs.twimg.com/media/FE8JyR2VIAEsnO5.jpg",null
,null,null],[["https://t.co/tsIw5kZYgM","https://t.co/tsIw5kZYgM","https://t.co/tsIw5kZYgM"],null,null,null,null,null,"https://t.co/1PLSZpM0rS",null,null,null],[["https://twitter.com/Memy_2198/status/1464400436111233033/photo/1","https://twitter.com/Memy_2198/status/1464400436111233033/photo/1","https://twitter.com/Memy_2198/status/1464400436111233033/photo/1"],null,null,null,null,null,"https://twitter.com/samsung_jin/status/1463399173110202374/photo/1",null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,"823960937120104448",["823960937120104448","335141638"],["823960937120104448","335141638"],"823960937120104448","335141638",["1409798257","52536879"],["823960937120104448","335141638"],"1535376043"],[null,null,"Elymin136",["Elymin136","BTS_twt"],["Elymin136","BTS_twt"],"Elymin136","BTS_twt",["bts_bighit","AMAs"],["Elymin136","BTS_twt"],"Incompletelyrcs"],["es","es","es","es","es","es","es","es","es","es"],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,nul
l,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null]],[[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null],[null,null]],[[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null]],["https://twitter.com/Memy_2198/status/1464400436111233033","https://twitter.com/Sabrococo/status/1464399989900251143","https://twitter.com/samsung_jin/status/1464399176322588677","https://twitter.com/samsung_jin/status/1464294483013476359","https://twitter.com/samsung_jin/status/1464398817831243781","https://twitter.com/samsung_jin/status/1462633786001149954","https://twitter.com/samsung_jin/status/1463399173110202374","https://twitter.com/samsung_jin/status/1462632757910130692","https://twitter.com/samsung_ji
n/status/1464398397507399681","https://twitter.com/samsung_jin/status/1463921099456987141"],["Emely Altamirano💫","Turbopatater Dungeons & Sabrons","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣","𝚂𝚊𝚖𝚜𝚞𝚗𝚐_𝚓𝚒𝚗⁷ 𝙺𝚂𝙹𝟷🍎🐣"],["","Puerto Hurraco","Magic Shop - LTB","Magic Shop - LTB","Magic Shop - LTB","Magic Shop - LTB","Magic Shop - LTB","Magic Shop - LTB","Magic Shop - LTB","Magic Shop - LTB"],["1998🎢","✨ Furby con cuchillo opina sobre cosas y hace shitposting ✨ Doy los buenos días ✨ Cualquier pronombre 🏳️‍🌈🏳️‍⚧️ ✨ De hiatus en twich ✨ https://t.co/t15uY2l3Yb","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃","𝙹𝚒𝚖𝚝𝚘𝚋𝚎𝚛🐣🎃"],[null,"https://t.co/rtBn8NhXzO",null,null,null,null,null,null,null,null],[false,false,false,false,false,false,false,false,false,false],[229,5429,31,31,31,31,31,31,31,31],[165,616,490,490,490,490,490,490,490,490],[0,60,0,0,0,0,0,0,0,0],[4823,112847,5269,5269,5269,5269,5269,5269,5269,5269],[909,186949,16845,16845,16845,16845,16845,16845,16845,16845],["2018-01-19T19:52:50Z","2012-08-24T11:24:00Z","2020-04-23T19:46:56Z","2020-04-23T19:46:56Z","2020-04-23T19:46:56Z","2020-04-23T19:46:56Z","2020-04-23T19:46:56Z","2020-04-23T19:46:56Z","2020-04-23T19:46:56Z","2020-04-23T19:46:56Z"],[false,false,false,false,false,false,false,false,false,false],[null,"https://t.co/rtBn8NhXzO",null,null,null,null,null,null,null,null],[null,"https://linktr.ee/sabrococo",null,null,null,null,null,null,null,null],[null,null,null,null,null,null,null,null,null,null],[null,"https://pbs.twimg.com/profile_banners/777966588/1619186207","https://pbs.twimg.com/profile_banners/1253409942901420034/1637904536","https://pbs.twimg.com/profile_banners/1253409942901420034/1637904536","https://pbs.twimg.com/profile_banners/1253409942901420034/1637904536","https://pbs.twimg.com/profile_banners/1253409942901420034/1637904536","https://pbs.twimg.com/profile
_banners/1253409942901420034/1637904536","https://pbs.twimg.com/profile_banners/1253409942901420034/1637904536","https://pbs.twimg.com/profile_banners/1253409942901420034/1637904536","https://pbs.twimg.com/profile_banners/1253409942901420034/1637904536"],["http://abs.twimg.com/images/themes/theme1/bg.png","http://abs.twimg.com/images/themes/theme1/bg.png",null,null,null,null,null,null,null,null],["http://pbs.twimg.com/profile_images/1459010989614108672/ZVfwQqBE_normal.jpg","http://pbs.twimg.com/profile_images/1367246682044391427/VN_sk48F_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg","http://pbs.twimg.com/profile_images/1464103856267096066/U5-seDFK_normal.jpg"]],"container":"<table class=\"display fill-container\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>user_id<\/th>\n      <th>status_id<\/th>\n      <th>created_at<\/th>\n      <th>screen_name<\/th>\n      <th>text<\/th>\n      <th>source<\/th>\n      <th>display_text_width<\/th>\n      <th>reply_to_status_id<\/th>\n      <th>reply_to_user_id<\/th>\n      <th>reply_to_screen_name<\/th>\n      <th>is_quote<\/th>\n      <th>is_retweet<\/th>\n      <th>favorite_count<\/th>\n      <th>retweet_count<\/th>\n      <th>quote_count<\/th>\n      <th>reply_count<\/th>\n      <th>hashtags<\/th>\n      <th>symbols<\/th>\n      <th>urls_url<\/th>\n      <th>urls_t.co<\/th>\n      <th>urls_expanded_url<\/th>\n      <th>media_url<\/th>\n      <th>media_t.co<\/th>\n      <th>media_expanded_url<\/th>\n      
<th>media_type<\/th>\n      <th>ext_media_url<\/th>\n      <th>ext_media_t.co<\/th>\n      <th>ext_media_expanded_url<\/th>\n      <th>ext_media_type<\/th>\n      <th>mentions_user_id<\/th>\n      <th>mentions_screen_name<\/th>\n      <th>lang<\/th>\n      <th>quoted_status_id<\/th>\n      <th>quoted_text<\/th>\n      <th>quoted_created_at<\/th>\n      <th>quoted_source<\/th>\n      <th>quoted_favorite_count<\/th>\n      <th>quoted_retweet_count<\/th>\n      <th>quoted_user_id<\/th>\n      <th>quoted_screen_name<\/th>\n      <th>quoted_name<\/th>\n      <th>quoted_followers_count<\/th>\n      <th>quoted_friends_count<\/th>\n      <th>quoted_statuses_count<\/th>\n      <th>quoted_location<\/th>\n      <th>quoted_description<\/th>\n      <th>quoted_verified<\/th>\n      <th>retweet_status_id<\/th>\n      <th>retweet_text<\/th>\n      <th>retweet_created_at<\/th>\n      <th>retweet_source<\/th>\n      <th>retweet_favorite_count<\/th>\n      <th>retweet_retweet_count<\/th>\n      <th>retweet_user_id<\/th>\n      <th>retweet_screen_name<\/th>\n      <th>retweet_name<\/th>\n      <th>retweet_followers_count<\/th>\n      <th>retweet_friends_count<\/th>\n      <th>retweet_statuses_count<\/th>\n      <th>retweet_location<\/th>\n      <th>retweet_description<\/th>\n      <th>retweet_verified<\/th>\n      <th>place_url<\/th>\n      <th>place_name<\/th>\n      <th>place_full_name<\/th>\n      <th>place_type<\/th>\n      <th>country<\/th>\n      <th>country_code<\/th>\n      <th>geo_coords<\/th>\n      <th>coords_coords<\/th>\n      <th>bbox_coords<\/th>\n      <th>status_url<\/th>\n      <th>name<\/th>\n      <th>location<\/th>\n      <th>description<\/th>\n      <th>url<\/th>\n      <th>protected<\/th>\n      <th>followers_count<\/th>\n      <th>friends_count<\/th>\n      <th>listed_count<\/th>\n      <th>statuses_count<\/th>\n      <th>favourites_count<\/th>\n      <th>account_created_at<\/th>\n      <th>verified<\/th>\n      <th>profile_url<\/th>\n      
<th>profile_expanded_url<\/th>\n      <th>account_lang<\/th>\n      <th>profile_banner_url<\/th>\n      <th>profile_background_url<\/th>\n      <th>profile_image_url<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"className":"dt-right","targets":[7,13,14,15,16,37,38,42,43,44,52,53,57,58,59,78,79,80,81,82]},{"orderable":false,"targets":0}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>
---
class: middle

# **DATA COLLECTION**
***
+ We save the extracted data in an *.rds* file, repeating the procedure over 5 days (Nov 23-28).

```r
# Samsung is the object returned by search_tweets() above
write_rds(Samsung,
          file = file.path("TRABAJO-BIGDATA",
                           paste0(Sys.Date(),
                                  " Samsung.rds")))
```

+ We read the saved files and compile them.

```r
library(dplyr)  # mutate(), %>%
library(purrr)  # pmap_df()
library(here)   # here()

# Read one .rds file and tag its rows with the harvest date taken from the file name
leerRDS <- function(rds_file, patron, directorio) {
  rds_filename <- gsub(rds_file, pattern = paste0(directorio, "/"), replacement = "")
  rds_fecha    <- gsub(rds_filename, pattern = patron, replacement = "")
  readRDS(rds_file) %>%
    mutate(harvest_date = as.Date(rds_fecha))
}

patron     <- "Samsung.rds"
directorio <- here("TRABAJO-BIGDATA")
RDS_file_samsung <- grep(list.files(directorio, full.names = TRUE), pattern = patron, value = TRUE)

# leerRDS must exist before this call; patron and directorio are recycled for every file
samsung <- pmap_df(list(RDS_file_samsung, patron, directorio), leerRDS)
```
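
The save-and-compile cycle can be tried without Twitter access. The following is a minimal sketch in base R, assuming toy data frames in place of real harvests and a temporary folder in place of `TRABAJO-BIGDATA`; `leer_con_fecha` is a hypothetical helper that mirrors the idea of `leerRDS`, not the function used in the project.

```r
# Toy stand-ins for two daily harvests (hypothetical data, not real tweets)
dia1 <- data.frame(text = c("tweet a", "tweet b"), stringsAsFactors = FALSE)
dia2 <- data.frame(text = "tweet c", stringsAsFactors = FALSE)

# A temporary folder plays the role of TRABAJO-BIGDATA
directorio <- file.path(tempdir(), "demo-rds")
dir.create(directorio, showWarnings = FALSE)
saveRDS(dia1, file.path(directorio, "2021-11-23 Samsung.rds"))
saveRDS(dia2, file.path(directorio, "2021-11-24 Samsung.rds"))

# Hypothetical helper: read one file and tag it with the date in its name
leer_con_fecha <- function(rds_file, patron) {
  fecha <- trimws(sub(patron, "", basename(rds_file), fixed = TRUE))
  datos <- readRDS(rds_file)
  datos$harvest_date <- as.Date(fecha)
  datos
}

# Read every dated file and stack the results into one data frame
archivos  <- list.files(directorio, pattern = "Samsung\\.rds$", full.names = TRUE)
compilado <- do.call(rbind, lapply(archivos, leer_con_fecha, patron = "Samsung.rds"))
```

Using `basename()` instead of stripping the directory with a regular expression avoids surprises when the path contains characters that are special in patterns.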
---
class: middle
# **DATA COLLECTION**
***

+ The daily files are merged into a single data set so they are easier to work with.

```r
samsung <- pmap_df(list(RDS_file_samsung,
                        patron,
                        directorio), leerRDS)
```

---
class: middle
# **PREPROCESSING**
***

+ We select only the variables we are interested in.

```r
samsung <- Samsung %>%
  select(user_id,
         created_at,
         text,
         favorite_count,
         reply_count,
         retweet_count,
         followers_count)
```

+ We rename the variables with more practical names.

```r
samsung <- samsung %>%
  rename(Autor = user_id,
         Fecha = created_at,
         Texto = text,
         Respuestas = reply_count,
         Likes = favorite_count,
         Retweets = retweet_count,
         Seguidores = followers_count)
```

---
class: middle
# **PREPROCESSING**
***

+ A function called `limpiar_tokenizar` cleans the tweets. Its cleaning steps are:

```r
  nuevo_Texto <- tolower(Texto)
  # Remove web addresses
  nuevo_Texto <- str_replace_all(nuevo_Texto,"http\\S*", "")
  # Remove @ mentions
  nuevo_Texto <- str_replace_all(nuevo_Texto,"@\\S*", "")
  # Remove hashtags
  nuevo_Texto <- str_replace_all(nuevo_Texto,"#\\S*", "")
  # Replace punctuation marks with spaces
  nuevo_Texto <- str_replace_all(nuevo_Texto,"[[:punct:]]", " ")
  # Replace digits with spaces
  nuevo_Texto <- str_replace_all(nuevo_Texto,"[[:digit:]]", " ")
  # Collapse repeated whitespace into a single space
  nuevo_Texto <- str_replace_all(nuevo_Texto,"[\\s]+", " ")
```
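
To see the effect of these steps on a single tweet, here is a minimal sketch that wraps them in a function. It is an illustration under assumptions, not the project's `limpiar_tokenizar`: the name `limpiar_texto` is hypothetical, base R `gsub()` stands in for `str_replace_all()` so that no package is needed, and the final `trimws()` is an extra step that the slide does not show.

```r
# Hypothetical wrapper; gsub() stands in for stringr::str_replace_all()
limpiar_texto <- function(Texto) {
  nuevo_Texto <- tolower(Texto)
  nuevo_Texto <- gsub("http\\S*", "", nuevo_Texto)       # web addresses
  nuevo_Texto <- gsub("@\\S*", "", nuevo_Texto)          # mentions
  nuevo_Texto <- gsub("#\\S*", "", nuevo_Texto)          # hashtags
  nuevo_Texto <- gsub("[[:punct:]]", " ", nuevo_Texto)   # punctuation
  nuevo_Texto <- gsub("[[:digit:]]", " ", nuevo_Texto)   # digits
  nuevo_Texto <- gsub("\\s+", " ", nuevo_Texto)          # repeated whitespace
  trimws(nuevo_Texto)                                    # extra step: trim both ends
}

ejemplo <- limpiar_texto("Vendo Samsung Note 9, 128 GB! @tienda #oferta https://t.co/abc")
```

Here `ejemplo` holds `"vendo samsung note gb"`: the URL, mention, hashtag, digits and punctuation are gone, and only lowercase words remain.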

---
class: middle
# **PREPROCESSING**
***

+ When several tweets have identical text, we keep a single copy: the one with the most likes.

```r
samsung_nd <- samsungtext %>%
  arrange(desc(Likes)) %>%      # most liked first, so duplicated() keeps that copy
  filter(!duplicated(Texto))
```
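
The order of the rows matters here, because `duplicated()` marks every repeat after the first occurrence. A small sketch with made-up data (base R only, names hypothetical) shows that sorting by likes in descending order keeps the most liked copy:

```r
# Toy data (hypothetical): two copies of the same text with different likes
tweets <- data.frame(
  Texto = c("oferta samsung", "oferta samsung", "me gusta mi telefono"),
  Likes = c(2, 15, 4),
  stringsAsFactors = FALSE
)

# Sort by likes, highest first, then drop later repeats of the same text
ordenados      <- tweets[order(-tweets$Likes), ]
sin_duplicados <- ordenados[!duplicated(ordenados$Texto), ]
```

With ascending order the copy with 2 likes would survive instead, which is the opposite of the stated goal.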

+ The data set shrank to 83838 tweets for Apple and 838383 for Samsung.

+ We build a data frame with the words we want to exclude from the later analysis.

```r
samsung_words <- c("galaxyzflip","galaxy")
stop_words1   <- data.frame(Palabras = c(stopwords(kind = "es"), samsung_words))
```
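
How such a list is applied is not shown on the slide. One way to do it, sketched here with base R and a deliberately tiny made-up stop list (the project builds its own from `stopwords(kind = "es")`), is to drop every token found in the `Palabras` column:

```r
# Hypothetical short stop list, standing in for stop_words1
stop_words_demo <- data.frame(Palabras = c("el", "de", "galaxy"),
                              stringsAsFactors = FALSE)

# Toy tokens, one word per row
tokens <- data.frame(word = c("el", "precio", "de", "galaxy", "mejor"),
                     stringsAsFactors = FALSE)

# Keep only the tokens that are not in the stop list
tokens_filtrados <- tokens[!tokens$word %in% stop_words_demo$Palabras, , drop = FALSE]
```

In a dplyr pipeline the same filter is commonly written as an `anti_join()` between the tokens and the stop list.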

---

+ We compute how often each author uses each word.

+ We group identical words and add up the total number of times they appear.

+ We drop every word that appears only once.
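
These three steps can be sketched as follows. This is an illustration with made-up tokens and base R only; the column names `word` and `m` follow the tables in this deck, while `Autor`, `por_autor` and the toy values are assumptions.

```r
# Toy tokens (hypothetical): one row per word used by an author
tokens <- data.frame(
  Autor = c("a", "a", "a", "b", "b", "c"),
  word  = c("precio", "precio", "mejor", "precio", "oferta", "mejor"),
  stringsAsFactors = FALSE
)

# 1. Frequency of each word per author
por_autor <- aggregate(n ~ Autor + word, data = cbind(tokens, n = 1), FUN = sum)

# 2. Total occurrences of each word across all authors
frec <- aggregate(n ~ word, data = por_autor, FUN = sum)
names(frec)[names(frec) == "n"] <- "m"

# 3. Keep only words that appear more than once, most frequent first
frec <- frec[frec$m > 1, ]
frec <- frec[order(-frec$m), ]
```

In this toy case "oferta" appears once and is dropped, leaving "precio" (3) and "mejor" (2).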
---

.pull-left[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> word </th>
   <th style="text-align:right;"> m </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> precio </td>
   <td style="text-align:right;"> 693 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> iphone </td>
   <td style="text-align:right;"> 657 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> qled </td>
   <td style="text-align:right;"> 533 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mejor </td>
   <td style="text-align:right;"> 491 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> oferta </td>
   <td style="text-align:right;"> 428 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ahora </td>
   <td style="text-align:right;"> 406 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> word </th>
   <th style="text-align:right;"> m </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> app </td>
   <td style="text-align:right;"> 2289 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> now </td>
   <td style="text-align:right;"> 1393 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> spotify </td>
   <td style="text-align:right;"> 1322 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mejor </td>
   <td style="text-align:right;"> 1235 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> radio </td>
   <td style="text-align:right;"> 1151 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> android </td>
   <td style="text-align:right;"> 1126 </td>
  </tr>
</tbody>
</table>

]

---
class: middle
# **ANALYSIS**
***

.pull-left[
![](PRESENTACION_files/figure-html/16 setup-1.png)
]

.pull-right[
![](PRESENTACION_files/figure-html/17 setup-1.png)
]

---

.pull-left[
![](PRESENTACION_files/figure-html/18 setup-1.png)
]

.pull-right[
![](PRESENTACION_files/figure-html/19 setup-1.png)
]
---
class: middle
# **ANALYSIS**
***

.pull-left[
![](PRESENTACION_files/figure-html/20 setup-1.png)

]

.pull-right[
.center[**SAMSUNG IN APPLE**]
![](PRESENTACION_files/figure-html/21 setup-1.png)
]

---
class: middle
# **SENTIMENT ANALYSIS**
***

+ We read the sentiment lexicon from a _.csv_ file.

```r
afinn <- read.csv("lexico_afinn.en.es.csv", stringsAsFactors = FALSE, fileEncoding = "latin1") %>% 
  as_tibble()  # replaces the deprecated tbl_df()
```

+ We remove duplicated words.

```r
afinn <- afinn %>%
  arrange(Puntuacion) %>%
  filter(!duplicated(Palabra))
```
---
class: middle
# **SENTIMENT ANALYSIS**
***

+ We take the word-frequency data frame and intersect it (inner join) with the sentiment lexicon. We then create two new variables: the total score of each word (its frequency times its sentiment score) and a label indicating whether the word is positive or negative.

```r
analisis <- frec %>%
  inner_join(afinn,by =c("word"="Palabra"))%>%
  mutate(Ponderacion = m*Puntuacion,
         Apreciacion = ifelse(Puntuacion<0,"Negativo","Positivo"))
```

+ We compute the proportion of negative and positive words.

```r
analisis %>%
  group_by(Apreciacion)%>%
  summarise(total = sum(m))%>%
  mutate(proporcion = total/sum(total))
```
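
A worked example makes the join and the proportion concrete. The sketch below uses base R (`merge()` in place of `inner_join()`) and invented values; the column names match the code above, but the words, counts and scores are assumptions and do not come from the real lexicon.

```r
# Toy inputs (hypothetical values, not the real lexicon or counts)
frec  <- data.frame(word = c("mejor", "robo", "oferta"), m = c(4, 1, 3),
                    stringsAsFactors = FALSE)
afinn <- data.frame(Palabra = c("mejor", "robo", "feliz"), Puntuacion = c(3, -2, 3),
                    stringsAsFactors = FALSE)

# Inner join: only words present in both tables survive ("oferta" is dropped)
analisis <- merge(frec, afinn, by.x = "word", by.y = "Palabra")
analisis$Ponderacion <- analisis$m * analisis$Puntuacion
analisis$Apreciacion <- ifelse(analisis$Puntuacion < 0, "Negativo", "Positivo")

# Share of positive and negative word occurrences
totales <- aggregate(m ~ Apreciacion, data = analisis, FUN = sum)
totales$proporcion <- totales$m / sum(totales$m)
```

Here "mejor" contributes 4 positive occurrences and "robo" 1 negative occurrence, so the proportions are 0.8 positive and 0.2 negative. Words missing from the lexicon do not count toward either share.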

---
class: middle
# **SENTIMENT ANALYSIS**
***

.pull-left[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Apreciacion </th>
   <th style="text-align:right;"> total </th>
   <th style="text-align:right;"> proporcion </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Negativo </td>
   <td style="text-align:right;"> 2947 </td>
   <td style="text-align:right;"> 0.4189055 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Positivo </td>
   <td style="text-align:right;"> 4088 </td>
   <td style="text-align:right;"> 0.5810945 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Apreciacion </th>
   <th style="text-align:right;"> total </th>
   <th style="text-align:right;"> proporcion </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Negativo </td>
   <td style="text-align:right;"> 7610 </td>
   <td style="text-align:right;"> 0.4907461 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Positivo </td>
   <td style="text-align:right;"> 7897 </td>
   <td style="text-align:right;"> 0.5092539 </td>
  </tr>
</tbody>
</table>
]

---
class: middle
# **SENTIMENT ANALYSIS**
***

.pull-left[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Promedio </th>
   <th style="text-align:right;"> Varianza </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0.3179815 </td>
   <td style="text-align:right;"> 5.440386 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Promedio </th>
   <th style="text-align:right;"> Varianza </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0.0628103 </td>
   <td style="text-align:right;"> 5.044552 </td>
  </tr>
</tbody>
</table>
]
---
class: middle
# **SENTIMENT ANALYSIS**
***

![](PRESENTACION_files/figure-html/32 setup-1.png)

---
class: middle

# **COMPLICATIONS**
***

+ ### Number of tweets.
+ ### Promotional tweets.
+ ### Identical tweets.
+ ### Meaningless tokens (such as symbols).
+ ### Sentiment-analysis packages are built for languages other than Spanish.
+ ### Poorly translated words.

---

# THANK YOU FOR YOUR ATTENTION!