Dimension Penetration %
Last week, I explained how the Approximate Count Distinct function in Adobe Analytics can be used to see how many distinct dimension values occur within a specified timeframe. In that post, I showed how you could see how many different products or campaign codes are viewed without having to count up rows manually and how the function provided by Adobe can then be used in other Calculated Metrics. As a follow-on to that post, in this post, I am going to share a concept that I call “dimension penetration %.” The idea of dimension penetration % is that there may be times in which you want to see what % of all possible dimension values are viewed or have some other action taken. For example, you may want to see what % of all products available on your website were added to the shopping cart this month. The goal here is to identify the maximum number of dimension values (for a time period) and compare that to the number of dimension values that were acted upon (in the same time period). Here are just some of the business questions that you might want to answer with the concept of dimension penetration %:
- What % of available products are being viewed, added to cart, etc…?
- What % of available documents are being downloaded?
- What % of BOPIS products are picked up in store?
- What % of all campaign codes are being clicked?
- What % of all content items are viewed?
- What % of available videos are viewed?
- What % of all blog posts are viewed?
As you can see, there are many possibilities, depending upon the goals of your digital property. However, Adobe Analytics (and other digital analytics tools), only capture data for items that get “hits” in the date range you select. They are not clairvoyant and able to figure out the total sum of available items. For example, if you wanted to see what % of all campaign tracking codes had at least one click this month, Adobe Analytics can show you how many had at least one click, but it has no way of determining what the denominator should be, which is the total number of campaign codes you have purchased. If there are 1,000 campaign codes that never receive a click in the selected timeframe, as far as Adobe Analytics is concerned, they don’t exist. However, the following will share some ways that you can rectify this problem and calculate the penetration % for any Adobe Analytics dimension.
Calculating Dimension Penetration %
To calculate the dimension penetration %, you need to use the following formula:
For example, if you wanted to see what % of all blog posts available have had at least one view this month, you would calculate this by dividing the unique count of viewed blog posts by the total number of blog posts that could have been viewed. To illustrate this, let’s go through a real scenario. Based upon what was learned in the preceding post, you now know that it is easy to determine the numerator (how many unique blog posts were viewed) as long as you are capturing the blog post title or ID in an Adobe Analytics dimension (eVar or sProp). This can be done using the Approximate Count Distinct function like this:
Once this new Calculated Metric has been created, you can see how many distinct blog posts are viewed each day, week, month, etc…
So far, so good! You now have the numerator of the dimension penetration % formula completed. Unfortunately, that was the easy part!
Next, you have to figure out a way to get the denominator. This is a bit more difficult and I will share a few different ways to achieve this. Unfortunately, finding out how many dimension values exist (in this scenario, total # of available blog posts), is a manual effort. Whether you are trying to identify the total number of blog posts, videos, campaign codes, etc. you will probably have to work with someone at your company to figure out that number. Once you find that number, there are two ways that you can use it to calculate your dimension penetration %.
Adobe ReportBuilder Method
The first approach is to add the daily total count of the dimension you care about to an Excel spreadsheet and then use Adobe ReportBuilder to import the Approximate Count Distinct Calculated Metric created above by date. By importing the Approximate Count Distinct metric by date and lining it up with your total numbers by date, you can easily divide the two and compute the dimension penetration % as shown here:
In this case the items with a green background were inputted manually and mixed with an Adobe Analytics data block. Then formulas were added to compute the percentages.
However, you have to be careful not to SUM the daily Approximate Count numbers since the sum will be different than the Approximate Count of the entire month. To see an accurate count of unique blog posts viewed in the month of April, for example, you would need to create a separate data block like this:
Data Sources Method
The downside of the Adobe ReportBuilder method is that you have to leave Adobe Analytics proper and cannot take advantage of its web-based features like Dashboards, Analysis Workspace, Alerts, etc. Plus, it is more difficult to share the data with your other users. If you want to keep your users within the Adobe Analytics interface, you can use Data Sources. Shockingly, Data Sources has not changed that much since I blogged about in back in 2009! Data Sources is a mechanism to import metrics that don’t take place online into Adobe Analytics. It can be used to upload any number you want as long as you can tie that number to a date. In this case, you can use Data Sources to import the total number of dimension items that exist on each day.
To do this, you need to use the administration console to create a new Data Source. There is a wizard that walks you through the steps needed, which include creating a new numeric success event that will store your data. The wizard won’t let you complete the process unless you add at least one eVar, but you can remove that from the template later, so just pick any one if you don’t plan to upload numbers with eVar values. In this case, I used Blog Post Author (eVar3) in case I wanted to break out Total Blog Posts by Author. Here is what the wizard should look like when you are done:
Once this is complete, you can download your template and create an FTP folder to which you will upload files. Next, you will create your upload file that has date and the total number of blog posts for each date. Again, you will be responsible for identifying these numbers. Here is what a sample upload file might look like using the template provided by Adobe Analytics:
Next, you upload your data via FTP (you can read how to do this by clicking here). A few important things to note are that you cannot upload more than 90 days of data at one time, so you may have to upload your historical numbers in batches. You also cannot data for dates in the future, so my suggestion would be to upload all of your historical data and then upload one row of data (yesterday’s count) each day in an automated FTP process. When your data has successfully imported, you will see the numbers appear in Adobe Analytics just like any other metrics (see below). This new Count of Blog Posts metric can also be used in Analysis Workspace.
Now that you have the Count of Blog Posts that have been viewed for each day and the count of Total Blog Posts available for each day, you can [finally] create a Calculated Metric that divides these two metrics to see your daily penetration %:
This will produce a report that looks like this:
However, this report will not work if you change it to view the data by something other than day, since the Count of Blog Posts [e8] metric is not meant to be summed (as mentioned in the ReportBuilder method). If you do change it to report by week, you will see this:
This is obviously incorrect. The first column is correct, but the second column is drastically overstating the number of available blog posts! This is something you have to be mindful of in this type of analysis. If you want to see dimension penetration % by week or month, you would have to do some additional work. Let’s look at how you can view this data by week (special thinks to Urs Boller who helped me with this workaround!). One method is to identify how many dimension items existed yesterday and use that as the denominator. Unfortunately, this can be problematic if you are looking at a long timeframe and if there are many additional items added. But if you want to use this approach, you can create this new Calculated Metric to see yesterday’s # of blog posts:
Which produces this report:
As you can see, this approach treats yesterday’s total number as the denominator for all weeks, but if you look above, you will see that the first week only had 1,155 posts, not 1162. You could make this more precise by adding an IF statement to the Calculated Metric and use a weekly number or if you are crazy, add 31 IF statements and grab the exact number for each date.
The other approach you can take is to simply divide the incorrect summed Count of Blog Posts [e8] metric by 7 for week and 30 for month. This will give you an average number of blog posts that existed and will look like this:
This approach has pretty similar penetration % numbers as the other approach and will work best if you use full weeks or full months (in this case, I started with the first full week in January).
Automated Method (Advanced)
If you decide that finding out the total # of items for each dimension is too complicated (or if you are just too busy or lazy to find it!), I will demonstrate an automated approach to find out this information. However, this approach will not be 100% accurate and can only be used for dimension items that will be persistent on your site from the day they are added. For example, you cannot use the following approach to identify the total # of campaign codes, since they come and go regularly. But you can use the following approach to estimate the total # of values for items that, once added, will probably remain like files, content items or blog posts (as in this example).
Here is the approach. Step one is to create a date range that spans all of your analytics data like this:
You will also want to create another Date Range for the time period you want to see for recent activity. In this case, I created one for the Current Month To Date.
Next, create Segments for both of these Date Ranges (All Dates & Current month to Date):
Next, create a new Calculated Metric that divides the Current Month Approximate Count Distinct of Blog Posts by the All Dates Approximate Count Distinct of Blog Posts:
Lastly, create a report like this in Analysis Workspace:
By doing this, you are letting Adobe Analytics tell you how many dimension items you have (# of total blog posts in this case) by seeing the Approximate Count Distinct over all of your dates. The theory being that over a large timeframe all (or most) of your dimension items will be viewed at least once. In this case, Adobe Analytics has found 1,216 blog posts that have received at least one view since 1/1/16. As I stated earlier, this may not be exact, since there may be dimension items that are never viewed, but this approach allows you to calculate dimension penetration % in a semi-automated manner.
Lastly, if you wanted to adjust this to look at a different time period, you would drag over a different date range container on the first column and then have to make another copy of the 3rd column that uses the same date range as shown in the bottom table: