Approximate Count Distinct Function – Part 1
In Adobe Analytics, there are many advanced functions that can be used in Calculated Metrics. Most of the clients I work with have only scratched the surface of what can be done with these advanced functions. In this post, I want to spend some time discussing the Approximate Count Distinct function in Adobe Analytics and in my next post, I will build upon this one to show some ways you can take this function to the next level!
There are many times when you want to know how many rows of data exist for an eVar or sProp (dimension) value. Here are a few common examples:
- How many distinct pages were viewed this month?
- How many of our products were viewed this month?
- How many of our blog posts were viewed this month?
- How many of our campaign tracking codes generated visits this month?
As you can see, the possibilities are boundless. But the overall gist is that you want to see a count of unique values for a specified timeframe. Unfortunately, there has traditionally not been a great way to see this in Adobe Analytics. I am ashamed to admit that my main way to see this has always been to open the dimension report, scroll down to the area that lets you go to page 2,3,4 of the results and enter 50,000 to go to the last page of results and see the bottom row number and write it down on a piece of paper! Not exactly what you’d expect from a world-class analytics tool! It is a bit easier if you use Analysis Workspace, since you can see the total number of rows here:
To address this, Adobe added the Approximate Count Distinct function that allows you to pick a dimension and will calculate the number of unique values for the chosen timeframe. While the function isn’t exact, it is designed to be no more that 5% off, which is good enough for most analyses. To understand this function, let’s look at an example. Let’s imagine that you work for an online retailer and you sell a lot of products. Your team would like to know how many of these products are viewed at least once in the timeframe of your choosing. To do this, you would simply create a new calculated metric in which you drag over the Approximate Count Distinct function and then select the dimension (eVar or sProp) that you are interested in, which in this case is Products:
Once you save this Calculated Metric, it will be like all of your other metrics in Adobe Analytics. You can trend it and use it in combination with other metrics. Here is what it might look like in Analysis Workspace:
Here you can see the number of distinct products visitors viewed by day for the month of April. I have also included a Visits column to show some perspective. I have also added a new Calculated Metric that divides the distinct count of products by Visits and used conditional formatting to help visualize the data. Here is the formula for the third column:
The same process can be used with any dimension you are interested in within your implementation (i.e. blog posts, campaign codes, etc.)
Combining Distinct Counts With Other Dimensions
While the preceding information is useful, there is another way to use Approximate Distinct Count functions that I think is really exciting. Imagine that you are in a meeting and your boss asks you how many different products each of your marketing campaigns has generated? For example, does campaign X get people to view 20 products and campaign Y get people to view 50 products? For each visit from each campaign, how many products are viewed? Which of your campaigns gets people to view the most products? You get the gist…
To see this, what you really want to do is use the newly created Approximate Count of Products metric in your Tracking Code or other campaign reports. The good news is that you can do that in Adobe Analytics. All you need to do is open one of your campaign reports and add the Calculated Metric we created above to the report like this:
Here you can see that I am showing how many click-throughs and visits each campaign code received in the chosen timeframe. Next, I am showing the Approximate Count of Products for each campaign code and also dividing this by Visit. Just for fun, I also added how many Orders each campaign code generated and divided that by the Approximate Count of Products to see what portion of products viewed from each campaign code were purchased.
You can also view this data by any of your SAINT Classifications. In this case, if you have your campaign Tracking Codes classified by Campaign Name, you can create the same report for Campaign Name:
In this case, you can see that, for example, the VanityURL Campaign generated 19,727 Visits and 15,599 unique products viewed.
At this point, if you are like me you are saying to yourself: “Does this really work? That seems to be impossible…” I was very suspicious myself, so if you don’t really believe that this function works (especially with classifications), here is a method that Jen Lasser from Adobe told me you can use to check things out:
- Open up the report of the dimension for which you are getting Approximate Distinct Counts (in this case Products)
- Create a segment that isolates visits for one of the rows (in the preceding example, let’s use Campaign Name = VanityURL)
- Add this new segment to the report you opened in step 1 (in this case Products) and use the Instances metric (which in this case is Product Views)
- Look at the number of rows in Analysis Workspace (as shown earlier in post) or use the report page links at the bottom to go to the last page of results and check the row number (if using old reports) as shown here:
Here you can see that our value in the initial report for “VanityURL” was 15,599 and the largest row number was 15,101, which puts the value in the classification report about 3% off.
Conclusion
As you can see, the use of the Approximate Count Distinct function (link to Adobe help for more info) can add many new possibilities to your analyses in Adobe Analytics. Here, I have shown just a few examples, but depending upon your business and site objectives, there are many ways you can exploit this function to your advantage. In my next post, I will take this one step further and show you how to see how to calculate dimension penetration, or what % of all of your values received at least one view over a specified timeframe.