Missing SAINT Classification Data [SiteCatalyst]
Recently, the Adobe SiteCatalyst product team “hit it out of the park” (to follow Ben’s analogy) with the latest SiteCatalyst point release! There are many awesome features that people like me have been waiting for. This release has things like segment comparisons, increased self-service capabilities via the Admin Console, Classifications on List Variables, Hourly Trending and improved text search filtering. Probably the biggest point release I have seen going back to version 9! Kudos to all involved!
However, the biggest feature enhancement was the SAINT Classification Rule Builder. This has been a long time coming and I am excited to start using it. I highly recommend you read more about this in the SiteCatalyst help section (login required). This new feature will go a long way towards helping clients maintain and clean up their SAINT Classifications. While I was giddy about the concept of SiteCatalyst customers having updated SAINT Classifications, I decided to share some other tips I have used to help clients minimize their missing SAINT data. When I work with clients to audit their Adobe SiteCatalyst implementations, one thing I review is how many of their eVars and sProps are missing SAINT classification data. Hopefully, these tips, combined with the new SAINT Classification Rule Builder, will lead you into SAINT Classification bliss!
The “None” Row in SAINT
In the past, I have explained how the “None” row in SiteCatalyst is annoying (at times), but actually a good thing, and not something to be feared. The “None” row can be extremely useful in Campaign reports and many others. If you see a “None” row in any eVar report, it simply means that when the chosen Success Events took place, there was no value for the current eVar. After a while, most SiteCatalyst users begin to understand this. Traffic variable (sProp) reports don’t have “None” rows: when there is no value, the report simply omits the data instead of lumping the remainder into a “None” row.
However, when it comes to SAINT Classifications, for the most part, the “None” row tends to be a bad thing. The reason is that when you see a “None” row, it can mean one of two things:
- The root eVar variable that you are classifying did not have a value
- You are missing SAINT Classification data, causing unclassified data to appear in the “None” row for the eVar (or sProp) classification
To better illustrate this, let’s look at an example. Let’s say you work for a company that sells video games. You are passing Product IDs to the Products variable and also have a few SAINT classifications of the Products variable, including the one shown here (Game Genre):
As you can see, there is a significant percentage of Orders and Revenue appearing in the “None” row of this classification report. But how do you know if the cause is #1 or #2 above or a mixture of both? Did someone launch new products and forget to pass a Product ID to the Products variable, and is that why there is no assigned Game Genre? Or do we have all of the Product IDs correctly assigned to the Products variable, but forgot to add the Game Genre meta-data via SAINT? Unfortunately, it is difficult to know the answer to this question without doing some research.
Isolating the True “None” Row
If you are a SiteCatalyst guru, you probably know that the fastest way to figure this out is to do what I call the “breakdown by the root” trick. I click the breakdown icon next to the “None” row and choose to break that row down by the variable that it is a classification of (its root). In this case, you would break down the Game Genre “None” row by the Products variable to see if any Product IDs show up. If you see Product IDs in the breakdown report, you know that you are missing SAINT classification data. If you only see a “None” value, then you have done all that you can do via SAINT and have to figure out why such a high percentage of Orders and Revenue are not being associated with a Product ID. The latter is often a tagging issue.
In this example, when you create this breakdown, you can see that both problems exist. About 4% of the Orders in the “None” row are missing a Product ID in the Products variable, which means that we have no way of knowing which Game Genre they would fall into. However, the rest of the items appearing in the breakdown report have Product IDs, which means that they are simply unclassified. Therefore, if we were to successfully classify all of these Product IDs, we could bring our overall percentage of unclassified Orders down from 22.1% to 0.8% (1,095/128,916 Orders), which makes a huge difference! I have found that having large “None” rows for classifications can confuse your users and lead to the perception that your data isn’t sound. To stay on top of this, another trick I suggest is that you schedule the preceding breakdown report to be mailed to you weekly for your most important variable classifications.
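If you export the breakdown report, the same split can be checked with a few lines of Python. This is only a sketch under stated assumptions: the `BreakdownRow` type and `split_none_row` function are hypothetical names, not part of SiteCatalyst, and the per-product Order counts are invented for illustration. Only the total (128,916 Orders) and the true “None” count (1,095 Orders) come from the example above.

```python
from typing import NamedTuple


class BreakdownRow(NamedTuple):
    product_id: str  # "None" when no Product ID was passed to the Products variable
    orders: int


def split_none_row(rows, total_orders):
    """Split a classification's "None" row into true "None" vs. unclassified Orders."""
    true_none = sum(r.orders for r in rows if r.product_id == "None")
    unclassified = sum(r.orders for r in rows if r.product_id != "None")
    return {
        "none_row_pct": 100 * (true_none + unclassified) / total_orders,
        "true_none_pct": 100 * true_none / total_orders,
        "unclassified_pct": 100 * unclassified / total_orders,
    }


TOTAL_ORDERS = 128_916  # from the example above

# Breakdown of the Game Genre "None" row by its root (Products).
breakdown = [
    BreakdownRow("None", 1_095),   # from the example above
    BreakdownRow("7777", 15_000),  # invented count, for illustration only
    BreakdownRow("7767", 12_395),  # invented count, for illustration only
]

summary = split_none_row(breakdown, TOTAL_ORDERS)
for name, pct in summary.items():
    print(f"{name}: {pct:.1f}%")
```

Anything reported under `unclassified_pct` can be fixed in SAINT; anything under `true_none_pct` points back to tagging.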
Using a “Dummy Value”
Next is what I call the “dummy value” trick. There are sometimes cases in which you know that you will be missing meta-data. For example, in the gaming scenario above, there could be a case in which you know the Product ID, but for some reason don’t yet have the Game Genre. Looking at the second report above, there may be a legitimate reason why Product IDs 7777 and 7767 don’t yet have a Genre assigned. If that is the case, my suggestion is that you set a “dummy value” in your SAINT file to act as a placeholder for the actual value that will be coming later. To do this, simply add the “dummy value” in any blank spots of your SAINT file. For example, let’s say that you download your products SAINT file and it looks like this:
All you have to do is fill in the blanks with a “dummy value.” I like to put “dummy” values in all caps and/or brackets so they are easy to identify and filter out of reports if needed. The preceding SAINT file would now look like this:
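For larger files, filling in the blanks by hand gets tedious, so it may be worth scripting. The following is a minimal Python sketch, not a SiteCatalyst feature. It assumes a tab-delimited SAINT file whose first non-comment line is the header row and whose first column is the key. The function name `fill_blanks`, the sample rows and the “Sports” genre are made up for illustration.

```python
# Hypothetical helper: fill empty classification cells in a tab-delimited
# SAINT file with a placeholder. Assumes the first non-comment line is the
# header row and the first column holds the key.
PLACEHOLDER = "[GAME GENRE COMING SOON]"


def fill_blanks(saint_text: str, placeholder: str = PLACEHOLDER) -> str:
    filled = []
    n_cols = None  # set from the header row
    for line in saint_text.splitlines():
        # Pass comment lines and empty lines through untouched.
        if line.startswith("#") or not line.strip():
            filled.append(line)
            continue
        cells = line.split("\t")
        if n_cols is None:
            n_cols = len(cells)  # header row: keep as-is
            filled.append(line)
            continue
        # Pad short rows so a missing trailing cell also gets the placeholder.
        cells += [""] * (n_cols - len(cells))
        key, values = cells[0], cells[1:]
        values = [v if v.strip() else placeholder for v in values]
        filled.append("\t".join([key] + values))
    return "\n".join(filled)


# Invented sample: one classified product and two without a Game Genre.
sample = "Key\tGame Genre\n1001\tSports\n7777\t\n7767"
result = fill_blanks(sample)
print(result)
```

Keeping the placeholder in one constant makes it easy to swap the message or to find and replace it later when the real values arrive.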
Once this file has been uploaded and processed, you can re-open the first report shown above and see this:
Obviously, not much has changed since all we did was move most of the “None” values to a new “dummy” row. However, we now can see that the actual “None” row is only about 0.8% and, more importantly, this report communicates to SiteCatalyst users that it is known that 21.3% of the Game Genres are currently missing (so don’t call and pester us!). You can put any message you want in the brackets such as “[GAME GENRE COMING SOON…]” or whatever you think makes sense to your users. Additionally, it is easy to see this report without the “dummy value” by simply using a search filter to remove anything with a “[” or “]” symbol, which is easier than removing the “None” row from reports.
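The same bracket convention also helps outside the SiteCatalyst interface. If you work with exported report rows, a filter along these lines drops the placeholder rows. This is a sketch only: `without_placeholders` is a hypothetical name, and the genre names and Order counts are invented.

```python
def without_placeholders(rows):
    """Drop rows whose classification value contains "[" or "]"."""
    return [(value, orders) for value, orders in rows
            if "[" not in value and "]" not in value]


# Invented report rows: (Game Genre, Orders).
report = [
    ("Sports", 500),
    ("[GAME GENRE COMING SOON]", 300),
    ("Puzzle", 150),
    ("None", 50),
]

visible = without_placeholders(report)
print(visible)
```

Note that the true “None” row is kept, since it contains no brackets. That is the point of using a distinct, easily filtered placeholder.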
If you have to deal with SAINT classifications on a regular basis, knowing how to do the following can make your life a lot easier:
- Isolate the true “None” values from those missing SAINT classifications
- Get a report of those SAINT items that are missing meta-data through scheduled reports
- Communicate which SAINT values are known to be missing vs. ones that are true “None” values through a “dummy value”
Together these tips should save you some time and headaches when it comes to SAINT. If you have any questions on these tips or additional ones, feel free to leave a comment here.
P.S. If you would like to take my advanced SiteCatalyst class or take classes related to Adobe ReportBuilder, Adobe Discover and many other web analytics topics, check out Analytics Demystified’s upcoming Midwest training classes: http://www.webanalyticsdemystified.com/accelerate/training-2013.asp