2/25/2020

iPhone 11 Sentiment Analysis

Business Context:

  • Prospective customers today read product reviews and feedback from experts and other users on social media before buying new gadgets.

Problem Description:

  • This is an analysis of tech reviews and customer feedback on Twitter, to gain insight into what customers seek from Apple’s iPhone series.
  • It also aims to identify the areas of concern that need more focus.

Data Collection

Recap Of Proposed Data Analytics Plan

  • Use topic modelling techniques to create topics based on the co-occurrence of words in the documents.
  • Use text mining and sentiment analysis.
  • Use visualization techniques to demonstrate distribution of sentiments, frequency of the tweets, frequent terms in the tweets and geography-wise tweet analysis.

Summary Of Peer Comments

  • Maybe try to add some extra visualizations to the project analysis. - We have added a wide range of visualizations to present the information as clearly as possible.
  • Could you also include the most popular region-specific #hashtags in your analysis? - We have included the region-wise frequency of tweets, and the most popular hashtags are visualized using a word cloud.
  • Will you be performing analysis on the launch of a product by location? - Yes, we have analyzed location-wise performance with regard to the product launch.
  • What amount of tweet data are you considering for your project? - We have considered around 23k tweets for our analysis.

Peer Comments (Contd.)

  • Which statistical model are you using for topic modelling, and what topics are you expecting from the topic modelling of your tweets? - We have used the LDA topic modelling technique and identified the top 5 topics with it.
  • Can you not pull tweets with a date filter on them? - We have pulled tweets using a date filter.

Data Summary

  • Data has been uploaded at https://drive.google.com/uc?export=download&id=1kOlxzs8Bq9WzXl8UlZDD8mOrTE9pyq6s
  • The data consists of the following columns: Sno, user_id, created_at, screen_name, text, source, is_retweet, favorite_count, retweet_count, reply_count, hashtags, place_full_name, place_type, country, country_description, followers_count, account_lang
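
A minimal sketch of how a CSV export with these columns might be loaded, using only the Python standard library. The function name `load_tweets`, the choice of integer columns, the treatment of empty cells as zero, and the sample row (including its `created_at` and `is_retweet` formats) are assumptions made for illustration; they are not taken from the actual data file.

```python
import csv
import io

# Column names as listed in the data summary.
COLUMNS = [
    "Sno", "user_id", "created_at", "screen_name", "text", "source",
    "is_retweet", "favorite_count", "retweet_count", "reply_count",
    "hashtags", "place_full_name", "place_type", "country",
    "country_description", "followers_count", "account_lang",
]
# Assumption: these columns hold integer counts.
INT_COLUMNS = ("favorite_count", "retweet_count", "reply_count", "followers_count")


def load_tweets(handle):
    """Read tweets from an open CSV file handle into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        for name in INT_COLUMNS:
            # Assumption: an empty cell means zero.
            row[name] = int(row[name] or 0)
        rows.append(row)
    return rows


# Tiny in-memory stand-in for the real file, with made-up values.
sample = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "1,42,2019-09-20 10:00:00,some_user,Love the new camera,Twitter for iPhone,"
    "FALSE,3,1,,iphone11,,,,,120,en\n"
)
tweets = load_tweets(sample)
```

With a real file, the same function could be called as `load_tweets(open(path, newline="", encoding="utf-8"))`, where `path` points to the downloaded CSV.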

Data Exploration

  • Frequency of tweets based on the source attributes
  • Frequency of tweets based on user location
  • Day-to-day trend of tweets about the iPhone 11 series
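
The frequency counts behind these exploration charts can be sketched as follows. This is an illustration under stated assumptions, not the code used for the slides: the helper names `count_by` and `tweets_per_day` are hypothetical, user location is approximated by the `country` column, the `created_at` timestamp format is assumed, and the rows are made up.

```python
from collections import Counter
from datetime import datetime


def count_by(rows, column):
    """Count tweets per distinct value of a column, most frequent first."""
    return Counter(row[column] or "(unknown)" for row in rows).most_common()


def tweets_per_day(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Count tweets per calendar day, sorted chronologically.

    The default timestamp format is an assumption about the export.
    """
    days = Counter(datetime.strptime(row["created_at"], fmt).date() for row in rows)
    return sorted(days.items())


# Made-up rows for illustration only.
rows = [
    {"source": "Twitter for iPhone", "country": "United States",
     "created_at": "2019-09-20 09:15:00"},
    {"source": "Twitter for Android", "country": "India",
     "created_at": "2019-09-20 18:40:00"},
    {"source": "Twitter for iPhone", "country": "",
     "created_at": "2019-09-21 07:05:00"},
]

by_source = count_by(rows, "source")    # tweets per client application
by_country = count_by(rows, "country")  # tweets per user location
daily = tweets_per_day(rows)            # day-to-day trend
```

Each result is a list of `(value, count)` pairs that can be passed directly to a bar or line chart.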

NLP Procedure

  • Because the data is extracted from Twitter, the tweets carry no predefined sentiment labels, so sentiment analysis is performed to infer them.
  • With no labelled data, there is no scope for building supervised models on training and validation sets, so topic modelling techniques are applied to the tweets instead.
  • The Latent Dirichlet Allocation (LDA) topic modelling technique is used in the analysis.
  • Steps involved: creating a corpus, tokenization, removing stop words, removing numbers, removing rare words, finding unigrams and bigrams, finding correlations between words, generating the document-term matrix, and computing tf-idf.
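
Most of the preprocessing steps listed above can be sketched in plain Python. The stop-word list, the rare-word threshold, the tf-idf variant (term frequency times `log(N / df)`), and the three sample tweets are assumptions chosen for illustration; the word-correlation step is not covered here.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "on", "in", "it", "my", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def clean(tokens):
    """Drop stop words and purely numeric tokens."""
    return [t for t in tokens if t not in STOP_WORDS and not t.isdigit()]


def drop_rare(docs, min_count=2):
    """Remove words that occur fewer than min_count times across the corpus."""
    totals = Counter(t for doc in docs for t in doc)
    return [[t for t in doc if totals[t] >= min_count] for doc in docs]


def bigrams(tokens):
    """Adjacent token pairs joined with an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def document_term_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the count of term j in document i."""
    vocab = sorted({t for doc in docs for t in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def tf_idf(rows):
    """Weight each count as (count / document length) * log(N / document frequency)."""
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in rows:
        total = sum(row) or 1
        weighted.append([
            (count / total) * math.log(n_docs / df[j]) if count else 0.0
            for j, count in enumerate(row)
        ])
    return weighted


# Made-up corpus for illustration only.
corpus = [
    "The camera on the iPhone 11 is great",
    "Battery life of my iPhone 11 is great",
    "The iPhone 11 camera is too expensive",
]
docs = drop_rare([clean(tokenize(text)) for text in corpus])
vocab, dtm = document_term_matrix(docs)
weights = tf_idf(dtm)
```

In this toy corpus the word "iphone" appears in every document, so its tf-idf weight is zero: the weighting favours terms that distinguish one tweet from another.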

Word Cloud of negative and positive words

Frequency of various sentiments gathered through these tweets
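
One simple way to obtain a sentiment distribution from unlabelled tweets is lexicon-based scoring. The sketch below is an assumption about the general approach, not the method used for the chart: the `LEXICON` dictionary is a hypothetical toy word list, and the three labels are chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real analysis would load a published sentiment lexicon.
LEXICON = {"love": 1, "great": 1, "amazing": 1, "bad": -1, "expensive": -1, "hate": -1}


def polarity(text):
    """Sum the lexicon scores of the words in a tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(text):
    """Map the summed score to a coarse sentiment label."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up tweets for illustration only.
tweets = [
    "Love the camera, great battery",
    "Too expensive and the notch looks bad",
    "Picked up my iPhone 11 today",
    "Great screen but expensive",
]
sentiment_counts = Counter(label(text) for text in tweets)
```

The last tweet contains one positive and one negative word, so its score cancels out to neutral, which shows a known limitation of summing word-level scores.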

Top 10 frequent terms for each sentiment
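
A ranking of frequent terms per sentiment can be sketched by grouping lexicon matches by their sentiment category. The word-to-sentiment mapping, the category names, and the helper `top_terms_by_sentiment` are hypothetical and stand in for whichever lexicon the analysis used.

```python
import re
from collections import Counter, defaultdict

# Hypothetical word-to-sentiment lexicon, for illustration only.
LEXICON = {
    "love": "joy", "great": "joy", "happy": "joy",
    "hate": "anger", "annoying": "anger",
    "worried": "fear", "risk": "fear",
}


def top_terms_by_sentiment(texts, lexicon, n=10):
    """Return {sentiment: [(term, count), ...]} with at most n terms per sentiment."""
    counts = defaultdict(Counter)
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            sentiment = lexicon.get(word)
            if sentiment is not None:
                counts[sentiment][word] += 1
    return {sentiment: c.most_common(n) for sentiment, c in counts.items()}


# Made-up tweets for illustration only.
tweets = [
    "Love the new camera, love the colours",
    "Great phone, happy with it",
    "I hate the price, so annoying",
    "Worried about the battery",
]
top_terms = top_terms_by_sentiment(tweets, LEXICON, n=2)
```

Setting `n=10` would give the top 10 terms per sentiment, as in the slide title.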

5 topic LDA model
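
To show what fitting an LDA model involves, here is a minimal collapsed Gibbs sampler in plain Python. It is a teaching sketch, not the implementation behind the slide: the function names, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions. The default is 5 topics to match the slide, but the tiny example corpus is fitted with 3.

```python
import random


def lda_gibbs(docs, n_topics=5, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (vocab, topic_word), where
    topic_word[k][w] counts how often word id w is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({t for doc in docs for t in doc})
    word_id = {t: i for i, t in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for token in doc:
            w, k = word_id[token], rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, token in enumerate(doc):
                w, k = word_id[token], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """Top n words per topic, ranked by assignment count."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_word
    ]


# Made-up, already tokenized documents for illustration only.
docs = [
    ["camera", "photo", "night", "camera"],
    ["battery", "charge", "battery", "life"],
    ["price", "expensive", "price", "cost"],
    ["camera", "photo", "lens"],
    ["battery", "life", "charge"],
]
vocab, topic_word = lda_gibbs(docs, n_topics=3, n_iter=100)
topics = top_words(vocab, topic_word)
```

The sampler is seeded, so repeated runs give the same topics. For a corpus of around 23k tweets, an optimized library implementation would be a more practical choice than this pure-Python loop.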

Key Take-aways

  • NLP has become increasingly popular over the past few years, and NLP research has yielded valuable insights.
  • The Natural Language Toolkit (NLTK) is one of the most popular Python libraries for NLP.
  • Regular expressions are an important part of NLP; they are used for pattern matching and filtering.
  • Common feature engineering techniques include removing stop words, stemming, lemmatization, and n-grams.
  • How you clean and preprocess your data will have a major effect on the conclusions you’ll be able to draw in your NLP classification problems.
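
As an illustration of regular expressions for pattern matching and filtering on tweets, the sketch below strips URLs and mentions, extracts hashtags, and keeps only tweets that mention the product. The patterns, helper names, and sample tweets are assumptions for illustration; real tweet text may need additional rules.

```python
import re

URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#(\w+)")
# Assumed pattern for the product name, allowing "iPhone11" and "iPhone 11".
PRODUCT = re.compile(r"\biphone\s*11\b", re.IGNORECASE)


def extract_hashtags(text):
    """Return the hashtags in a tweet, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def clean_tweet(text):
    """Remove URLs, mentions, and punctuation; collapse whitespace; lowercase."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = text.replace("#", "")                 # keep the hashtag word itself
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # drop remaining punctuation
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_product(texts):
    """Keep only the tweets that mention the product."""
    return [text for text in texts if PRODUCT.search(text)]


# Made-up tweets for illustration only.
raw = "RT @techfan: The #iPhone11 camera is amazing!! https://example.com/review"
cleaned = clean_tweet(raw)
hashtags = extract_hashtags(raw)
kept = filter_product([raw, "My old phone still works fine", "iphone 11 arrived"])
```

Hashtags are extracted from the raw text before cleaning, because `clean_tweet` removes the `#` marker.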