While waiting for a hyper-parameter tuning run to finish, I did a quick exploration of how OpenTable’s “brand new brand” was received on the Twitterverse. Pulling up a few lightweight Python tools, I ran a simple sentiment analysis on a week’s worth of tweets about the OpenTable rebrand, then visualized the results using dimple.js.
Interactive visualization
Here is how the result of that exercise looks. Click on the screenshot below to launch the interactive version.
Highlights
Visualization can be powerful, both for data scientists and less technical members of a team. Just from a glance at the screenshot you can start understanding the pace and range of responses we received, and the tooling lets you dive into the results quickly. Some things we saw:
The overwhelming majority of tweets about our rebranding were positive; only three tweets expressed a negative sentiment, and (as you can tell from the bubble size, which represents the number of retweets) they weren’t popular opinions. Fundafeast posted the most influential tweet, which expressed a moderately positive sentiment:
Just in Time for the Reservations Renaissance, OpenTable Debuts a New Look #grub #foodie http://t.co/yZhDhQ2DqH
— fundafeast (@fundafeast) March 3, 2015
On the other extreme, a tweet from a law firm blog scored a perfect 1.0 on the sentiment scale but traveled less far, with a single retweet (still better than any of the negative reactions):
Hey @Opentable @contentpilot thinks your re-branding is GREAT! So great in fact, we dedicated a blog entry to it! http://t.co/CxkGhO26yN — Paige Hornback (@PaigeHornback) March 3, 2015
How to do this
First, I used a Python package called twitter [1] to extract the tweets I needed. To do this, I signed into Twitter with my regular account and created a sample app at https://dev.twitter.com/apps. This generates the standard OAuth 1.0a authentication identifiers: consumer key, consumer secret, access token, and access token secret. See [2] for more information.
With these in hand, it was a breeze to connect to the Twitter API, like so:
import twitter
import json
import pandas as pd

# OAuth credentials generated at https://dev.twitter.com/apps
CONSUMER_KEY = '...'
CONSUMER_SECRET = '...'
OAUTH_TOKEN = '...'
OAUTH_TOKEN_SECRET = '...'

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(auth=auth)
Next, I performed my query (here is a simplified version). Note that I hit the API several times to extract tweets in batches:
from urllib import unquote

q = 'opentable brand OR opentable rebrand'
count = 100
search_results = twitter_api.search.tweets(q=q, count=count)
statuses = search_results['statuses']

for _ in range(10):
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:
        # No more results when next_results doesn't exist
        break
    # next_results is a ready-made query string, e.g. '?max_id=...&q=...&count=100';
    # strip the leading '?' and turn it into keyword arguments.
    kwargs = dict([kv.split('=') for kv in unquote(next_results[1:]).split('&')])
    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']
I found a lot of tweets in the week following the rebrand; removing the retweets left me with about half the total. From these, I extracted the fields I would use later and structured them into a pandas dataframe.
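The retweet filtering itself isn’t shown in the snippets here; a minimal sketch, assuming the standard Search API payload in which retweets carry a retweeted_status key:

# Keep only original tweets; retweets carry a 'retweeted_status' key.
statuses = [s for s in statuses if 'retweeted_status' not in s]

With the retweets gone, the field extraction looks like this: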
all_tweets = {'user': [], 'text': [], 'date': [], 'tweet_id': [],
              'retweet_count': [], 'urls': [], 'text_nu': []}
for status in statuses:
    all_tweets['user'] += [status['user']['screen_name']]
    all_tweets['text'] += [status['text']]
    all_tweets['date'] += [status['created_at']]
    all_tweets['tweet_id'] += [status['id']]
    all_tweets['retweet_count'] += [status['retweet_count']]
    urls = [a['url'] for a in status['entities']['urls']]
    all_tweets['urls'] += [urls]
    all_tweets['text_nu'] += [removeurls(status['text'], urls)]
Note that I strip any URLs out of the tweet text (stored in the text_nu field) with this tiny function:
def removeurls(text, urls):
    # Strip each URL entity out of the tweet text.
    for url in urls:
        text = text.replace(url, '')
    return text
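The snippets above never build the dataframe explicitly; presumably it is a direct conversion of the all_tweets dictionary (df is the name the later snippets assume):

# Assumed step: convert the column-oriented dict into a pandas dataframe.
df = pd.DataFrame(all_tweets)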
The reason I removed the URLs has to do with how I computed the sentiments, which brings us to sentiment estimation.
Sentiment estimation is a non-trivial problem, but here I just wanted a rough sense of the overall sentiment of the tweets, so I used a lightweight tool called TextBlob [3]. TextBlob offers a Naive Bayes sentiment estimator trained on a corpus of movie reviews (alongside its default pattern-based analyzer). It has its quirks, but it worked reasonably well for what I was trying to do. TextBlob breaks a tweet into sentences and assigns a sentiment polarity to each sentence. I found that averaging the per-sentence polarities, weighted by each sentence’s character length, reflected the intuitive sentiment of the tweet pretty well:
import numpy as np
import textblob

# Words (and parentheses) that skewed TextBlob's scores on this corpus
# are stripped before scoring.
QUIRKS = ["behind", "little", "sans", "Behind", "firms", "hard",
          "subtly", "Never", "(", ")"]

def weighted_sentiment(snippet):
    for quirk in QUIRKS:
        snippet = snippet.replace(quirk, "")
    blob = textblob.TextBlob(snippet)
    # Average the per-sentence polarities, weighted by sentence length.
    lens = [len(sentence) for sentence in blob.sentences]
    sentis = [sentence.sentiment.polarity for sentence in blob.sentences]
    return np.dot(lens, sentis) / np.sum(lens)
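To see what the scorer is doing, here is a small illustration of TextBlob’s per-sentence API next to the weighted score. The example text is made up, and the exact numbers depend on your TextBlob version:

# Made-up two-sentence snippet; scores vary with the TextBlob version.
example = "The new logo looks great. The font choice is a bit dull."
for sentence in textblob.TextBlob(example).sentences:
    # polarity ranges from -1.0 (most negative) to 1.0 (most positive)
    print("%s -> %.3f" % (sentence, sentence.sentiment.polarity))

print(weighted_sentiment(example))  # the length-weighted average of the above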
With this, getting the sentiments of all the tweets was a one-liner. Note that I pass in the text_nu field, the tweet text with the URLs removed, because URLs contribute nothing to the sentiment but unduly inflate the sentence lengths used as weights:
df['sentiment_polarity'] = df.text_nu.map(weighted_sentiment)
I wrote the whole dataframe out to a TSV file, and from there it was pretty easy to use dimple.js to generate the interactive visualization! You can poke into the source file of the implementation to see how it was done (hint: to size the bubbles, I had to scale the retweet count logarithmically).
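The export step isn’t shown in the post; a minimal sketch using pandas’ to_csv (the filename is made up, and whether the log scaling happened in Python or on the JavaScript side isn’t shown; numpy’s log1p is one way to do it in Python):

# Write the dataframe as tab-separated values for dimple.js to consume.
df.to_csv('opentable_tweets.tsv', sep='\t', index=False, encoding='utf-8')

# Hypothetical bubble-size column: log1p compresses the retweet counts
# and handles zero-retweet tweets gracefully.
df['bubble_size'] = np.log1p(df.retweet_count)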
That’s all, folks!
References
[1] Python Twitter Tools: http://mike.verdone.ca/twitter/ and https://github.com/sixohsix/twitter
[3] TextBlob: https://textblob.readthedocs.io/