Twitter Influence — Still Searching for the Perfect Answer

[Updated on 2/17/2011 — added the last section with additional information about Twitalyzer’s Community measurement and a little additional nod to TweetReach.]

A pretty intriguing post from Michael Healy came across the Twitterverse yesterday: #Measuring in 2010 — Analyzing the #measure Data of the Twitterati. What Michael did was take all of the tweets that used the #measure hashtag in 2010 and run them through an “influence” formula he developed. The tweet data was courtesy of Kevin Hillstrom, who had set up a Twapper Keeper archive of #measure tweets. I’ve set up a couple of Twapper Keeper archives in my day (hopefully, the #emetrics one will continue to function through the upcoming eMetrics conference, as I smell another juicy data set for us to play around with) and have been a bit frustrated with the quality of the exports: they required quite a bit of cleanup, especially of the timestamps, and I’ve been a little skeptical of the completeness of the data. But, maybe that’s just because I’ve been working with TweetReach of late, and it’s just so darn clean and robust that the free services really do start to pale in comparison.

I digress.</statementoftheobvious>

Michael’s stated goal was pretty simple:

I wanted to know who were the most influential members of the #measure Twitterverse.

This was an exercise he did as prep work for the Web Analytics Association Spring Awards Gala, which, if you’re going to be in the area, you should plan to attend, as it should be a really good time.

I read the post and had three immediate thoughts:

  • “Influence” is one of the Mid-Major Holy Grails of social media management
  • “#measure” could be swapped out for any brand or topic, and the ability to achieve Michael’s stated goal would come in damn handy in all sorts of situations
  • Michael is one of those Brains with a capital “B,” so it’s worth taking a close look at what he produces when he claims he’s “just having fun.”

The final result was a big ol’ diagram (Michael’s original post links to the full-sized image).

Michael’s formula focussed on both the volume of tweets and the “original content” in the tweets (using an Entropy calculation and a slight dampening of the score based on the volume of retweets).
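To make the idea a bit more concrete, here is a rough Python sketch of a score in that spirit. To be clear, this is not Michael’s formula: the word-level Shannon entropy, the multiplication by tweet volume, and the `damping` factor are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def word_entropy(tweets):
    """Shannon entropy (in bits) of the word distribution across a user's tweets."""
    words = [w.lower() for t in tweets for w in t.split()]
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def influence_score(tweets, damping=0.5):
    """Hypothetical score: volume times entropy, dampened by the share of retweets.

    The damping weight of 0.5 is an arbitrary assumption, not a published value.
    """
    if not tweets:
        return 0.0
    retweet_share = sum(t.startswith("RT @") for t in tweets) / len(tweets)
    return len(tweets) * word_entropy(tweets) * (1 - damping * retweet_share)
```

A user who tweets a lot of varied, original content scores higher here than one who mostly retweets, which is the general behavior the description above implies.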

Like Michael, I was surprised to see Szymon Szymanski (@ulyssez) as the dominant circle in the diagram. I’d certainly seen his tweets in the #measure stream, but I would have been more likely to guess @aknecht or @KISSMetrics (which is the good-sized circle off to the right of the diagram, as it so happens) would have had the dominant slot based on volume/variety.

So…Influence, You Say?

Seeing as I’ve been spending a lot of time with Twitalyzer of late, the next thought that popped into my mind was, “I wonder how Michael’s analysis of the top influencers would line up with Twitalyzer’s?”

It doesn’t take much to push that thought further and immediately hit a wrinkle:

  • Michael’s definition of influence was oriented towards tweet volume and tweet content
  • Twitalyzer’s definition of influence is based on an assessment of how likely a user is to be referenced or retweeted

Now, logically, if you have a high tweet volume, and your tweets contain a lot of original content (and, presumably, it’s not navel-gazing content, as that would rarely warrant the inclusion of the #measure hashtag), you’re more likely to be referenced or retweeted. Okay, there’s a logical link there, so maybe that’s not a huge wrinkle.

“Oh, bother,” said Pooh almost immediately, “I think we have a second wrinkle.” That being:

  • Michael’s analysis was based solely on tweets that included the #measure hashtag and the users who tweeted those tweets
  • Twitalyzer’s definition is more “user-based,” and takes into account the user’s direct and indirect network

And, a third wrinkle, just to round out a nice list:

  • Michael’s analysis was based on all #measure tweets from 2010
  • Twitalyzer operates more on a last day, last 7 days, last 30 days mode (with historical data going back much further…but it’s all based on when the user got plugged in as a daily-updated account)

For this third wrinkle, it seems reasonable to assume that the most influential #measure tweeters in 2010 are likely still fairly influential as of the last month.

In the end…does it matter? There’s only one way to know! Let’s take a look!

A Semi-Random Comparison

I’m not going to go through every bubble in Michael’s diagram. But, I am going to hit the “big” bubbles, look up their Twitalyzer Influence scores for the last 30 days, and then do the same for a smattering of small bubbles. For the “small bubble” users, I’ve only included users where it looks like Twitalyzer has been doing daily tracking for the last 30 days. And, these bubbles were also a random selection of users that I readily recognized (which, I realize, very likely introduced some sampling bias).

Let’s see what we see:

Username          Bubble Size  Twitalyzer Influence
@ulyssez          Ginormous    1.0%
@immeria          Huge         1.0%
@analyticscanvas  Huge         Not available*
@kissmetrics      Damn Big     24.0%
@mongoosemetrics  Big          5.0%
@thebrandbuilder  Big          Not available*
@cjpberry         Pretty Big   Not available*
@usujason         Pretty Big   2.0%
@corryprohens     Pretty Big   0.0%
@jdersh           Pretty Big   1.0%
@minethatdata     Pretty Big   3.0%
@hkwebanalytics   Pretty Big   1.0%
@johnlovett       Pretty Big   2.0%
@jimsterne        Small        2.0%
@analyticspierce  Small        1.0%
@ericjhansen      Small        1.0%
@aknecht          Small        4.0%
@tgwilson         Small        2.0%
@jojoba           Small        1.0%

* These accounts had not yet been Twitalyzed. As such, while I Twitalyzed them, a reliable 30-day average was not available, so I have not included their reported scores here.

So, what does this tell us? Well…seems like we don’t have a perfect correlation (we never do, do we?). From my own use of Twitter, the Twitalyzer scores square pretty well with what I would expect, although, yowza!, I wouldn’t have expected @kissmetrics to be running away from the pack like that!
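For anyone who wants to put a number on “not a perfect correlation,” a rank correlation is one way to go. The sketch below is a rough Python take using only the accounts in the table that have a 30-day score. The ordinal values assigned to the bubble sizes (Ginormous = 6 down to Small = 1) are an assumption, since the bubble sizes were eyeballed rather than measured.

```python
# Ordinal ranks for the eyeballed bubble sizes (an assumption for this sketch).
SIZE_RANK = {"Ginormous": 6, "Huge": 5, "Damn Big": 4,
             "Big": 3, "Pretty Big": 2, "Small": 1}

# (username, bubble size, Twitalyzer Influence %) for accounts with a 30-day score.
ROWS = [
    ("@ulyssez", "Ginormous", 1.0), ("@immeria", "Huge", 1.0),
    ("@kissmetrics", "Damn Big", 24.0), ("@mongoosemetrics", "Big", 5.0),
    ("@usujason", "Pretty Big", 2.0), ("@corryprohens", "Pretty Big", 0.0),
    ("@jdersh", "Pretty Big", 1.0), ("@minethatdata", "Pretty Big", 3.0),
    ("@hkwebanalytics", "Pretty Big", 1.0), ("@johnlovett", "Pretty Big", 2.0),
    ("@jimsterne", "Small", 2.0), ("@analyticspierce", "Small", 1.0),
    ("@ericjhansen", "Small", 1.0), ("@aknecht", "Small", 4.0),
    ("@tgwilson", "Small", 2.0), ("@jojoba", "Small", 1.0),
]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


sizes = [SIZE_RANK[size] for _, size, _ in ROWS]
scores = [score for _, _, score in ROWS]
rho = spearman(sizes, scores)
print(f"Spearman rank correlation: {rho:.2f}")
```

With only 16 accounts and lots of ties, the result is a rough indicator at best, so treat it as a conversation starter rather than a finding.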

I don’t think either one of these is “right” in any absolute way. The two approaches were developed for different purposes. Michael’s exercise was, I think, a couple of idle thoughts taken to a logical conclusion. Twitalyzer’s score is one metric inside a measurement platform that offers a whole suite of metrics and that has been evolving and maturing for a couple of years.

Does Any of This Really Matter?

I’m drawn to these sorts of exercises because I think they do matter. As web analysts, we got to a pretty consistent definition of a “page view” and a “visit” (“different tools calculate differently” be damned — the basic definition is the same), left things a little loose on “unique visitors,” and never really reached closure on “engagement” (philosophical debates as to whether it even matters notwithstanding).

As social media continues to gain traction with consumers and as social media platforms continue to evolve and mature, we absolutely need to be thinking about measurement within those platforms and we need to keep scrambling to keep up. And, hopefully, maybe we’ll be able to influence the evolution of those platforms so that they’re at least somewhat measurement-friendly. As long as we’ve got analysts pushing the tools and experimenting with new approaches (another example: @jojoba’s oxygenating alter ego posted her Social Media Masters Twitter Analytics presentation over the weekend — lots o’ tools out there!), we’ll get there!

So, yeah, it matters. The fact that it’s pretty interesting to watch (and maybe even help) some really, really sharp minds in our space try to crack some pretty hard nuts is just added gravy.

Update: Twitalyzer Community Scores

One of the few benefits of being based in the Eastern timezone with a lot of the heavy analytics work occurring on the west coast is that I got to wake up this morning to emails and comments on a post that went up shortly before I retired for the evening!

Eric Peterson sent me some of his thoughts and pointed me to the Community area under Tweets and Tags in Twitalyzer:

So…now I need to go do some more Twitalyzer exploration and thinking — from the list above, I need to think through the relationship between Participation, Influence, and Attention, methinks. One of the real draws, for me, of Twitalyzer, is that it enables picking a set of appropriate metrics that, together, measure the effectiveness of any particular Twitter engagement approach. The kicker is nailing down which of those metrics are the right fit in any given situation.

Jenn Deering Davis of TweetReach also sent me some TweetReach data on #measure that covers 2011 to date. TweetReach focusses on reach and exposure of tweets (the difference being that reach is “unique people exposed” and exposure is more “raw impressions”). From that perspective:

TweetReach’s approach has a more direct tie to traditional advertising measurement when it comes to tracking “impressions.” But, it also has a lot of other features that can help sniff out influential people on a particular topic — a major differentiator is that its trackers can use boolean logic, which they showcased in the work they did around the Super Bowl ads. It doesn’t really show this off when we’re looking at a community that is defined as tightly as the #measure hashtag.

Again, I say… so many tools…!

 

Analyzing Twitter — Practical Analysis

In my last post, I grabbed tweets with the “#emetrics” hashtag and did some analysis on them. One of the comments on that post asked what social tools I use for analysis — paid and free. Getting a bit more focussed than that, I thought it might be interesting to write up what free tools I use for Twitter analysis. There are lots of posts on “Twitter tools,” and I’ve spent more time than I like to admit sifting through them and trying to find ones that give me information I can really use. This, in some ways, is another one of those posts, except I’m going to provide a short list of tools I actually do use on a regular basis and how and why I use them.

What Kind of Analysis Are We Talking About?

I’m primarily focussed on the measurement and analysis of consumer brands on Twitter rather than on the measurement of one’s personal brand (e.g., @tgwilson). While there is some overlap, there are some things that make these fundamentally different. With that in mind, there are really three different lenses through which Twitter can be viewed, and they’re all important:

  • The brand’s Twitter account(s) — this is analysis of followers, lists, replies, retweets, and overall tweet reach
  • References of the brand or a campaign on Twitter — not necessarily mentions of @<brand>, but references to the brand in tweet content
  • References to specific topics that are relevant to the brand as a way to connect with consumers — at Resource Interactive, we call this a “shared passion,” and the nature of Twitter makes this particularly messy, but, to whatever level it’s feasible, it’s worth doing

While all three of these areas can also be applied in a competitor analysis, this is the only mention (almost) I’m going to make of that — some of the techniques described here make sense and some don’t when it comes to analyzing the competition.

And, one final note to qualify the rest of this post: this is not about “online listening” in the sense that it’s not really about identifying specific tweets that need a timely response (or a timely retweet). It’s much more about ways to gain visibility into what is going on in Twitter that is relevant to the brand, as well as whether the time spent investing in Twitter is providing meaningful results. Online listening tools can play a part in that…but we’ll cover that later in this post.

Capturing Tweets?

When it comes to Twitter analysis, it’s hard to get too far without having a nice little repository of tweets themselves. Unfortunately, Twitter has never made an endless history of tweets available for mining (or available for anything, for that matter). And, while the Library of Congress is archiving tweets, as far as I know, they haven’t opened up an API to allow analysts to mine them. On top of that, there are various limits to how often and how much data can be pulled in at one time through the Twitter API. As a consumer, I suppose I have to like that there are these limitations. As a data guy, it gets a little frustrating.

Two options that I’ve at least looked at or heard about on this front…but haven’t really cracked:

  • Twapper Keeper — this is a free service for setting up a tweet archive based on a hashtag, a search, or a specific user. In theory, it’s great. But, when I used it for my eMetrics tweet analysis, I stumbled into some kinks — the file download format is .tar (which just means you have to have a utility that can extract that archive format), and the date format changed throughout the data, so getting all of the tweets’ dates readable took some heavy string manipulation
  • R — this is an open source statistics package, and I talked to a fellow several months ago who had used it to hook into Twitter data and do some pretty intriguing stuff. I downloaded it and poked around in the documentation a bit…but didn’t make it much farther than that
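The date cleanup mentioned in the Twapper Keeper bullet is the kind of thing a few lines of script can take off your hands. Here is a minimal Python sketch that tries a list of candidate formats in turn. The three formats shown are assumptions for illustration only (the actual formats in a given export may well differ), so the list would need to be adjusted to match the data.

```python
from datetime import datetime, timezone

# Candidate formats are hypothetical; swap in whatever the export actually uses.
CANDIDATE_FORMATS = [
    "%a, %d %b %Y %H:%M:%S %z",  # e.g. "Sun, 17 Oct 2010 14:05:00 +0000"
    "%Y-%m-%d %H:%M:%S",         # e.g. "2010-10-17 14:05:00"
    "%m/%d/%Y %H:%M",            # e.g. "10/17/2010 14:05"
]


def parse_tweet_date(raw):
    """Return a naive UTC datetime for the first format that fits, else None."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is not None:
            # Normalize timezone-aware values to UTC, then drop the tzinfo
            # so every result can be compared and sorted together.
            parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
        return parsed
    return None
```

Returning None for anything unrecognized (rather than raising) makes it easy to list the leftovers and add another format for them.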

I also looked into just pulling Tweets directly into Excel or Access through a web query. It looks like I was a little late for that — Chandoo documented how to use Excel as a Twitter client, but then reported that Twitter made a change that means that approach no longer works as of September 2010.

So, for now, the best way I’ve found to reliably capture tweets for analysis is with RSS and Microsoft Outlook:

  1. Perform a search for the twitter username, a keyword, or a hashtag from http://search.twitter.com (or, if you just want to archive tweets for a specific user, just go to the user’s Twitter page)
  2. Copy the URL for the RSS for the search (or the user)
  3. Add a new RSS feed in MS Outlook and paste in the URL

From that point forward, assuming Outlook is updating periodically, the RSS feeds will all be captured.
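For anyone who would rather script the capture than lean on Outlook, the same idea can be sketched in Python using only the standard library. The feed below is a made-up sample, and the element names (author, title, pubDate) are assumptions about what an RSS 2.0 search feed contains, so check them against the actual feed before relying on this.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real one would be downloaded from the RSS URL.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>hypothetical search feed</title>
<item>
  <title>Great session at #emetrics</title>
  <author>user_a</author>
  <pubDate>Sun, 17 Oct 2010 14:05:00 +0000</pubDate>
</item>
<item>
  <title>RT @user_a Great session at #emetrics</title>
  <author>user_b</author>
  <pubDate>Mon, 18 Oct 2010 09:30:00 +0000</pubDate>
</item>
</channel></rss>"""


def feed_to_rows(xml_text):
    """Flatten RSS items into From / Subject / Received rows."""
    rows = []
    for item in ET.fromstring(xml_text).iter("item"):
        published = item.findtext("pubDate")
        if published is None:
            continue  # skip items without a date rather than guess one
        rows.append({
            "from": item.findtext("author", default=""),
            "subject": item.findtext("title", default=""),
            "received": parsedate_to_datetime(published),
        })
    return rows


for row in feed_to_rows(SAMPLE_FEED):
    print(row["received"].date(), row["from"], row["subject"])
```

The three keys deliberately mirror the From, Subject, and Received columns used in the Outlook view, so the rows drop into the same spreadsheet layout.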

There’s one more little trick: customize the view to make it more Excel/export-friendly. In Outlook 2007, go to View » Current View » Customize Current View » Fields. I typically remove everything except From, Subject, and Received. Then go to View » Current View » Format Columns and change the Received column format from Best Fit to the dd-Mmm-yy format. Finally, remove the grouping. This gives you a nice, flat view of the data. You can then simply select all the tweets you’re interested in, press <Ctrl>-<C>, and then paste them straight into Excel.

I haven’t tried this with hundreds of thousands of tweets, but it’s worked great for targeted searches where there are several thousand tweets.

Total Tweets, Replies, Retweets

While replies and retweets certainly aren’t enough to give you the ultimate ROI of your Twitter presence, they’re completely valid measures of whether you are engaging your followers (and, potentially, their followers). Setting up an RSS feed as described above based on a search for the Twitter username (without the “@”) will pick up both the tweets by that account and the tweets that reference that account.

It’s then a pretty straightforward exercise to add columns to a spreadsheet to classify tweets any number of ways by some use of the IF, ISERROR, and FIND functions. These can be used to quickly flag each tweet as a reply, a retweet, a tweet by the brand, or any mix of things:

  • Tweet by the brand — the “From” value is the brand’s Twitter username
  • Retweet — tweet contains the string “RT @<username>”
  • Reply — tweet is not a retweet and contains the string “@<username>”
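The three rules above translate almost word for word into a small function. This is a Python sketch of the same logic rather than the spreadsheet formulas themselves, and it makes two assumptions: matching is case-insensitive (Excel’s FIND is case-sensitive, so results may differ slightly), and each tweet gets exactly one label. The username "acme" is a hypothetical brand.

```python
def classify_tweet(sender, text, brand):
    """Label a tweet as a brand tweet, retweet, reply, or other.

    sender: the "From" value; text: the tweet body; brand: username without "@".
    """
    handle = "@" + brand.lower()
    lowered = text.lower()
    if sender.lower() == brand.lower():
        return "brand tweet"
    if "rt " + handle in lowered:
        return "retweet"
    if handle in lowered:
        # Not a retweet (checked above) but it references the account.
        return "reply"
    return "other"


# Hypothetical examples for a brand whose username is "acme".
print(classify_tweet("acme", "New post is up!", "acme"))       # brand tweet
print(classify_tweet("fan1", "RT @acme New post is up!", "acme"))  # retweet
print(classify_tweet("fan2", "@acme loved the post", "acme"))   # reply
```

The order of the checks matters: the retweet test has to come before the reply test, because every retweet also contains the “@<username>” string.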

Depending on how you’re looking at the data, you can add a column to roll up the date — changing the tweet date to be the tweet week (e.g., all tweets from 10/17/2010 to 10/23/2010 are assigned a date of 10/17/2010) or the tweet month. To convert a date into the appropriate week (assuming you want the week to start on Sunday):

=C1-WEEKDAY(C1)+1

To convert the date to the appropriate month (the first day of the month):

=DATE(YEAR(C1),MONTH(C1),1)

C1, of course, is the cell with the tweet date.

Then, a pivot table or two later, and you have trendable counts for each of these classifications.
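If the tweets are being handled in a script rather than a spreadsheet, the same roll-up and count can be sketched in Python. The function names here are made up for illustration, and the input is assumed to be a list of (date, classification) pairs.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Sunday on or before d (same result as =C1-WEEKDAY(C1)+1 in Excel)."""
    # isoweekday() is 1 for Monday through 7 for Sunday, so % 7 gives
    # the number of days elapsed since the most recent Sunday.
    return d - timedelta(days=d.isoweekday() % 7)


def month_start(d):
    """First day of d's month (same result as =DATE(YEAR(C1),MONTH(C1),1))."""
    return d.replace(day=1)


def count_by_period(rows, bucket):
    """Pivot-table-style counts keyed by (period start, classification)."""
    return Counter((bucket(d), label) for d, label in rows)


# Hypothetical classified tweets.
rows = [
    (date(2010, 10, 17), "reply"),
    (date(2010, 10, 23), "reply"),
    (date(2010, 10, 24), "retweet"),
]
for (period, label), n in sorted(count_by_period(rows, week_start).items()):
    print(period, label, n)
```

Passing `month_start` instead of `week_start` as the `bucket` argument gives the monthly version of the same counts.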

This same basic technique can be used with other RSS feeds and altered formulas to track competitor mentions, mentions of the brand (which may not match the brand’s Twitter username exactly), mention of specific products, etc.

Followers and Lists

Like replies and retweets, simply counting the number of followers you have isn’t a direct measure of business impact, but it is a measure of whether consumers are sufficiently engaged with your brand. Unfortunately, there are not exactly great options for tracking net follower growth over time. The “best” two options I’ve used:

  • Twitter Counter — this site provides historical counts of followers…but the changes in that historical data tend to be suspiciously evenly distributed. It’s better than nothing if you don’t have a time machine handy. (See the Twitalyzer note at the end of this post — I may be changing tools for this soon!)
  • Check the account manually — getting into a rhythm of just checking an account’s total followers is the best way I’ve found to accurately track total followers over time; in theory a script could be written and scheduled that would automatically check this on a recurring basis, but that’s not something I’ve tackled
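The scheduled script mentioned in the second bullet could be as simple as appending one dated row to a CSV file each time it runs. In this Python sketch, `fetch_count` is a hypothetical stand-in: it is whatever function you supply to look up the current follower count, since the sketch deliberately assumes nothing about how that number is retrieved.

```python
import csv
from datetime import date


def log_follower_count(path, username, fetch_count, today=None):
    """Append (date, username, count) to a CSV file and return the count.

    fetch_count is a caller-supplied function (hypothetical here) that
    takes a username and returns the current follower count.
    """
    today = today or date.today()
    count = fetch_count(username)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([today.isoformat(), username, count])
    return count


def net_growth(path, username):
    """Last logged count minus first logged count for one account."""
    with open(path, newline="") as f:
        counts = [int(row[2]) for row in csv.reader(f)
                  if row and row[1] == username]
    return counts[-1] - counts[0] if len(counts) >= 2 else 0
```

Run on a schedule (a daily cron job or Windows scheduled task, say), this builds exactly the kind of history that manual checking produces, just without the need to remember to do it.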

I also like to check lists and keep track of how many lists the Twitter account is included on. This is a measure, in my mind, of whether followers of the account are sufficiently interested in the brand or the content that they want to carve it off into a subset of their total followers so they are less likely to miss those tweets and/or because they see the Twitter stream as being part of a particular “set of experts.” Twitalyzer looks like it trends list membership over time, but, since I just discovered that it now does that, I can’t stand up and say, “I use that!” I may very well start!

Referrals to the Brand’s Site

This doesn’t always apply, but, if the account represents a brand, and the brand has a web site where the consumer can meaningfully engage with the brand in some way, then measuring referrals from Twitter to the site is a measure of whether Twitter is a meaningful traffic driver. There are fundamentally two types of referrals here:

  • Referrals from tweeted links by the brand’s Twitter account that refer back to the site — these can be tracked by a short URL (such as bit.ly), by adding campaign tracking parameters to the URL so the site’s web analytics tool can identify the traffic as a brand-triggered Twitter referral, or both. The campaign tracking is what is key, because it enables measuring more than simply “clicks:” whether the visitors are first-time visitors to the site or returning visitors, how deeply they engaged with the site, and whether they took any meaningful action (conversions) on the site
  • “Organic” referrals — overall referrals to the site from twitter.com. Depending on which web analytics tool you are using on your site, this may or may not include the clickthroughs from links tweeted by the brand.
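As an illustration of the campaign tracking in the first bullet, here is a small Python sketch that tags a link before it gets shortened and tweeted. The `utm_` parameter names follow the Google Analytics convention, which is an assumption about the tool in use (other web analytics tools use different parameters), and the default values are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_campaign_params(url, source="twitter", medium="social",
                        campaign="brand-tweets"):
    """Return url with campaign tracking parameters added to its query string.

    Parameter names assume Google Analytics; the values are placeholders.
    """
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunparse(parts._replace(query=urlencode(query)))


print(add_campaign_params("http://example.com/page?x=1"))
```

Tagging the long URL first and then shortening it means the short link’s click counts and the site-side visit data can be looked at side by side.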

By looking at referral traffic, you can measure both the volume of traffic to the site and the relative quality of the traffic when compared to other referral sources for the site.

(If the volume of that traffic is sufficiently high to warrant the effort, you may even consider targeting content on the landing page(s) for Twitter referral traffic to try to engage visitors more effectively. You know the visitor is engaged with social media, so why not test some secondary content on the page to see if you can use that knowledge to deliver more relevant content and CTAs?)

Word Clouds with Wordle

While this isn’t a technique for performance management, it’s hard to resist the opportunity to do a qualitative assessment of the tweets to look for any emerging or hot topics that warrant further investigation. Because all of the tweets have been captured, a word cloud can be interesting (see my eMetrics post for an example). Hands-down, Wordle makes the nicest word clouds out there. I just wish it was easier to save and re-use configuration settings.

One note here: you don’t want to just take all of the tweet content and drop it straight into Wordle, as the search criteria you used for the tweets will dwarf all of the other words. If you first drop the tweets into Word, you can then do a series of search and replaces (which you can record as a macro if you’re going to repeat the analysis over time) — replace the search terms, “RT,” and any other terms that you know will be dominant-but-not-interesting with blanks.
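The same search-and-replace cleanup can also be scripted instead of recorded as a Word macro. This Python sketch strips a list of dominant terms and returns one block of text ready to paste into Wordle. The function name and the sample terms are made up for illustration.

```python
import re


def strip_dominant_terms(tweets, terms):
    """Remove whole-word occurrences of terms (case-insensitive) from tweets."""
    if not terms:
        return "\n".join(tweets)
    # The lookarounds keep "RT" from matching inside words like "smart".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    cleaned = (pattern.sub(" ", t) for t in tweets)
    # Collapse the leftover runs of whitespace in each tweet.
    return "\n".join(" ".join(c.split()) for c in cleaned)


# Hypothetical tweets captured from a search on "#emetrics".
tweets = ["RT @a Great #emetrics talk", "Smart stuff #eMetrics"]
print(strip_dominant_terms(tweets, ["#emetrics", "RT"]))
```

Because the term list is just an argument, it is easy to keep one list per recurring analysis and re-run the cleanup whenever a fresh batch of tweets comes in.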

Not Exactly the Holy Grail…

Do all of these techniques, when appropriately combined, provide near-perfect measurement of Twitter? Absolutely not. Not even close. But, they’re cheap, they do have meaning, and they beat the tar out of not measuring at all. If I had to pick one tool that I was going to bet on that I’d be using inside of six months for more comprehensive performance measurement of Twitter, it would be Twitalyzer. It sure looks like it’s come a long way in the 6-9 months since I last gave it a look. What it does now that it didn’t do initially:

  • Offers a much larger set of measures — you can pick and choose which measures make sense for your Twitter strategy
  • Provides clear definitions of how each metric is calculated (less obfuscated than the definitions used by Klout)
  • Allows trending of the metrics (including Lists and Followers).

Twitalyzer, like Klout, Twitter Counter, and countless other tools, is centered on the Twitter account itself. As I’ve described here, there is more going on in Twitter that matters to your brand than just direct engagement with your Twitter account and the social graph of your followers. Online listening tools such as Nielsen Buzzmetrics can provide keyword-based monitoring of Twitter for brand mentions and sentiment — this is not online listening per se, really, but it is using online listening tools for measurement.

For the foreseeable future, “measuring Twitter” is going to require a mix of tools. As long as the mix and metrics are grounded in clear objectives and meaningful measures, that’s okay. Isn’t it?