Technorati is a poor source of blog ranking data …
A few days ago my friend Avinash Kaushik wrote and asked me for my FeedBurner subscriber numbers as an input to his very popular ranking of web analytics blogs. It gave me an opportunity to do something I’d meant to do for a while: ask Avinash to drop my site from his ranking system (which he agreed to do; thanks, Avinash!).
I asked Avinash to drop me from the rankings for two reasons:
- I’m pretty well established as a web analytics blogger, having done this for a long time. I started blogging about the topic while I was at JupiterResearch back in 2004 and have been writing essentially the same blog at a variety of URLs continuously since then. I figure it’s far better for Avinash’s list to highlight some newer folks in the web analytics blogosphere, great writers like Ian Thomas, Gary Angel, Judah Phillips, and Aurelie Pols!
- Technorati, which Avinash uses as the basis for his ranking system, is an extremely poor data source for ranking weblogs.
While it may be a unique source of this type of data, Technorati provides a lousy basis for accurately ranking blogs and appears to be very easily fooled by anyone actively working to increase their Technorati ranking.
Why would I say such a thing, you ask? An excellent question, but I have what I think is a pretty good answer (especially if you’ve ever had any concerns about the quality of data you use in your analysis …)
First (and maybe this is just something I’m being dumb about that is easily corrected), if you’ve had a blog for any length of time and have moved it to a new URL for any reason, Technorati seems incapable of grouping the old and new URLs together as a single blog. Have a look at the following:
As you can see here, based on a search for “web analytics” grouped using the “Blogs” tab in Technorati, my blog shows up as two distinct entries from two slightly different URLs, each with a slightly different level of authority. When you drill down into “authority,” which seems to provide at least part of the basis for Technorati’s ranking system, you’ll see slightly different results for each of these URLs:
The first blog URL lists “437 blog reactions to Analytics Demystified”
The second blog URL lists “461 blog reactions to Analytics Demystified”
What’s worse is that the exact same blog and the exact same content appear further down the same page of results:
Here the blog URL is www.analyticsdemystified.com/weblog, which was the original blog URL back when I was on the Blogger platform. Perhaps because this is the oldest URL Technorati has in the system, this URL has the most reactions:
Now, I wondered if perhaps these listings were de-duplicated and could be added up somehow. No such luck, it appears. There is a ton of duplication, and my blog is effectively broken up into three pieces, which must make it hard for anyone trying to assess the overall reaction to my writing over the past 3.5 years.
I asked around a bit to see if anyone knew why this happens, and Judah Phillips (who admits he’s been watching his Technorati ranking lately given his relative newness to the blogosphere) pointed me to this entry in the Technorati FAQ:
http://support.technorati.com/faq/topic/56
Just in case you don’t want to read the FAQ entry, I’ll summarize: you are more or less out of luck. According to the FAQ, “we are unable [to] transfer or combine links from different URLs at this time.” I suppose their answer makes sense, but it doesn’t make Technorati’s treatment of blogs that have moved any more useful or appropriate.
Oh well.
The second problem I have with Technorati is that it is either not paying very close attention to where these “blog reactions” are coming from or the system is very easily gamed. Consider the blogs in the number 2, 4, and 5 slots when you search for “web analytics” blogs at Technorati:
There is my friend Avinash, Mr. Marshall Sponder from the Web Analytics Association and KnowMoreMedia, and the entire team at FutureNow, Inc. This is what you would expect to see based on Avinash’s ranking system (although I think he might exclude the guys from FutureNow; I’m not sure …), given that, according to Avinash:
“The evolution of the ranking system continues with a couple of tweaks to the ranking this time around. The primary determinant of the rank in the list below is still Technorati.”
The problem arises when you start to examine the sites that make up the Authority calculation:
Here you can see that Avinash (who is widely loved; I love Avinash too!) has done a great job at generating reaction to his blog, getting 6,013 sites to link back to his content and earning a Technorati “authority score” of 948. Very cool … that is, until you start to examine the actual sites and blogs linking back to Avinash, at which point you notice something like this entry (#10 on the first page of results when I snapped this screenshot):
Hmmm, that is from Avinash’s own site. That’s odd, isn’t it, that Avinash’s own site would be included in his authority calculation? I thought so, so I quickly looked at the first ten pages of results:
What I found was that 44 percent of the top 50 sites listed as providing “blog reaction” to Occam’s Razor were Avinash’s own (albeit slightly different) URLs.
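If you want to run the same check on your own blog, the arithmetic is simple: 44 percent of the top 50 is 22 self-referencing entries. Here is a minimal Python sketch of that calculation, assuming you have copied the reacting URLs out of the Technorati listing by hand into a list (no Technorati data access is shown or implied). The function name `self_link_share` and the example.com URLs are hypothetical placeholders, not real listings.

```python
from urllib.parse import urlparse

def self_link_share(reacting_urls, own_hosts, top_n=50):
    """Return the fraction of the first top_n reacting URLs whose host is in own_hosts.

    own_hosts should list every hostname the blog has published under, since
    self-links can come from several "slightly different" URLs of the same blog.
    """
    top = reacting_urls[:top_n]
    if not top:
        return 0.0
    own = {host.lower() for host in own_hosts}
    # urlparse(...).hostname is already lowercased; it is None if the URL has no host.
    hits = sum(1 for url in top if (urlparse(url).hostname or "") in own)
    return hits / len(top)

# Hypothetical listing: 22 self-links (under two host variants) plus 28 links from other blogs.
listing = (
    ["http://www.example.com/blog/post-%d" % i for i in range(11)]
    + ["http://example.com/blog/?p=%d" % i for i in range(11)]
    + ["http://blog%d.example.org/" % i for i in range(28)]
)
print(self_link_share(listing, {"www.example.com", "example.com"}))  # 0.44
```

Passing only one host variant (say, just `www.example.com`) would report 22 percent here, which is one reason a per-URL view of a blog can hide how much of its “reaction” is self-generated.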
I’m not sure why that is, do you know?
I figured this might just be some strange anomaly, so I took a look at the same thing for Marshall Sponder’s blog, WebMetricsGuru. Fortunately, Marshall didn’t seem to have the same problem, but of the 52,152 reactions to Marshall’s blog contained in Technorati, it appears that the vast majority come from untargeted links to his site from other KnowMoreMedia properties:
Similar to the problem with Avinash’s listing, 80 percent of the top 50 sites listed as providing blog reaction to WebMetricsGuru were from KnowMoreMedia. When I continued looking at the results, this percentage actually went up to 83 percent of the top 100 sites “reacting” to Marshall.
I might be thinking about this the wrong way, but that hardly seems like the kind of “reaction” most bloggers are looking for.
Well, at this point I had to look at the Eisenbergs’ blog, which is a little newer to the blogosphere. Here I saw the exact same problem I found in Avinash’s listing: 34 percent of the top 50 sites listed as providing blog reaction to GrokDotCom are from, yep, you guessed it, GrokDotCom.
I also noticed that in the GrokDotCom authority listings some sites appeared again and again and again:
Again, I don’t know what’s going on here but as the basis of “blog popularity” this data seems pretty suspect to me.
Perhaps I’m naive, or perhaps I’m just plain confused, but given the inconsistencies I’ve described, the Technorati ranking system doesn’t seem to provide very useful results. Maybe I just happened to stumble on three anomalies; most other blogs listed in Avinash’s ranking don’t seem to have the same problems, though some surely do. For what it’s worth, none of my three blogs (?!?) listed in Technorati appear to have the problem described above. Maybe that’s what I’m doing wrong!
Now perhaps I am thinking about the authority calculation incorrectly. According to Technorati:
“Technorati Authority is the number of blogs linking to a website in the last six months. The higher the number, the more Technorati Authority the blog has.
It is important to note that we measure the number of blogs, rather than the number of links. So, if a blog links to your blog many times, it still only count[s] as +1 toward your authority. Of course, new links mean the +1 will last another 180 days.”
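Taken at face value, that definition is a distinct count of linking blogs over a rolling 180-day window. Here is a minimal Python sketch of it, assuming link records arrive as (linking blog, link date) pairs; the record format, the `authority` function, and the reference date are hypothetical illustrations, not Technorati’s actual code. Note that the result depends entirely on what gets treated as “one blog.”

```python
from datetime import datetime, timedelta

def authority(links, now, window_days=180):
    """Count distinct linking blogs with at least one link in the last window_days.

    links: iterable of (blog_id, link_time) pairs. Many links from the same
    blog_id still add only 1, matching the "+1 per blog" rule in the quote.
    """
    cutoff = now - timedelta(days=window_days)
    recent_blogs = {blog_id for blog_id, link_time in links if link_time >= cutoff}
    return len(recent_blogs)

now = datetime(2007, 9, 1)  # arbitrary reference date for the example
links = [
    ("blog-a", now - timedelta(days=1)),
    ("blog-a", now - timedelta(days=10)),   # repeat link from blog-a: still only +1
    ("blog-b", now - timedelta(days=200)),  # older than 180 days: not counted
    ("blog-c", now - timedelta(days=179)),
]
print(authority(links, now))  # 2
```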
This sounds good and more or less makes sense, except that you can see how KnowMoreMedia is kind of gaming the numbers on Marshall’s behalf, with all those completely irrelevant links back to his blog artificially running up his number of “blog reactions” and likely his authority score.
Also, I kept finding examples of blogs that, when I looked at the blogs linking to them, had the same problem described above: the blog being assessed linking back to itself. Here’s an example from one of the posts on Avinash’s site that has an authority ranking of “12”:
You can’t see all of it, but there are 13 reactions to Avinash’s post, and given an authority of 12 you would expect 12 different blogs, one of which has two posts linking to the page, right? Wrong. What you get is five different blogs, two of which are Avinash’s own work (albeit in two different domains) and three of which (the “SEO, SEM, Social Media, web analytics” listings in the image above) appear to be the exact same content on different domains.
If you examine the URLs in the authority listings that come from Avinash’s site, you’ll see that they are all slightly different. But clearly they all come from the exact same blog. If this is how Technorati calculates authority (and the FAQ answer about why my blog is listed three times suggests it is), then Technorati is essentially treating every distinct URL as a distinct blog.
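To make the difference concrete, here is a minimal Python sketch contrasting two ways of counting “reacting blogs”: one blog per distinct URL (which is what Technorati appears to be doing) versus grouping URLs by a normalized hostname and dropping the assessed blog’s own hosts. The normalization rule (lowercased host with a leading “www.” stripped), the function names, and the example.com URLs are all hypothetical; a rule this simple would not, for example, merge the same blog published on two entirely different domains.

```python
from urllib.parse import urlparse

def blog_key(url):
    """Hypothetical normalization: the URL's hostname with any leading 'www.' removed."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    return host[4:] if host.startswith("www.") else host

def count_per_url(linking_urls):
    """Naive count: every distinct URL is treated as a distinct blog."""
    return len(set(linking_urls))

def count_per_blog(linking_urls, own_keys=()):
    """Group URLs by blog_key, then exclude the assessed blog's own keys (self-links)."""
    return len({blog_key(u) for u in linking_urls} - set(own_keys))

urls = [
    "http://www.example.com/blog/post-1",
    "http://example.com/blog/post-2",
    "http://www.example.com/blog/?p=3",
    "http://other.example.org/review",
    "http://third.example.net/links",
]
print(count_per_url(urls))                    # 5
print(count_per_blog(urls, {"example.com"}))  # 2
```

Same five links, but the per-URL view reports five “blogs” while the grouped view finds only two outside blogs, which is the gap between the two counts described above.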
Huh?
Seems like a pretty easy system to “game” to me, or one that is easily fooled and mostly useless. At this point I’m even more confused about Technorati’s ability to de-duplicate blogs as an input to its authority ranking.
Please don’t get me wrong, I think Avinash is brilliant for publishing a list of popular blogs (especially one that ranks his own site #1; how amazing and magnificent is that!). I have learned to respect Marshall Sponder’s ability to write (and write, and write, and write), and I obviously get on well with Bryan and Jeffrey Eisenberg (Bryan is one of Analytics Demystified, Inc.’s trusted advisors). And I sincerely, sincerely hope that each of these fine gentlemen will see that this blog post is far from a criticism of their talents and passions.
But given that I’ve always worked with my clients to make sure they had the best data possible to serve as inputs for their analysis, something about the data reported by Technorati just doesn’t pass the old “smell test.”
Honestly, if anyone out there can help me understand what Technorati is doing and why a ranking system that is apparently so easily corrupted by self-reference and link farming is useful, I’m more than happy to hear from you! Feel free to email me directly or simply comment on this post. Until that time, I will view any ranking system based on Technorati data as quite suspect.
Perhaps you will as well …
I welcome your comments, criticisms, insights and feedback. If someone from Technorati wants to email or call me and explain what the heck I’m doing wrong and why everything I’ve written in this post is incorrect, I’ll gladly listen. And if the answer makes sense to me, I’ll even more gladly apologize for being so confused! If you think I’m carping, whining, or just being critical of Technorati’s data for no good reason, let me hear it! Frankly I sometimes worry that we don’t have enough engaged, thoughtful debate in the web analytics blogosphere …