Shiny Web Analytics with R
It’s been a couple of months since I posted about my continued exploration of R. In part, that’s because I found myself using it primarily as a more-powerful-than-the-Google-Analytics-Chrome-extension access point for the Google Analytics API. While that was useful, it was a bit hard to write about, and there wasn’t much that I could easily show (“Look, Ma! I exported a .csv file that had data for a bunch of different segments in a flat table! …which I then brought into Excel to work with!”). And, overall, it’s only one little piece of where I think the value of the platform ultimately lies.
The Value of R Explored in This Post
I’d love to say that the development of this app (if you’re impatient to get to the goodies, you can check it out here or watch a 3.5-minute demo here) was all driven up front by these value areas…but my nose would grow to the point that it might knock over my monitor if I actually wrote that. Still, these are the key aspects of R that I think this application illustrates:
- Dynamically building API calls — with a bit of up-front thought and a little knowledge of Google Analytics dynamic segments, R (or any scripting language) can be set up to quickly iterate through a wide range of data sets. The Google Analytics web interface quickly starts to feel clunky and slow once you’re working with text-based API calls. (There’s a minimal sketch of the idea right after this list.)
- Customized data visualization — part of what I built came directly from something I’d done in Excel with conditional formatting. But, I was able to extend that visualization quite a bit using the ggplot2 package in R. That, I’m sure, was 20X more challenging for me than it would have been in something like Tableau, but it’s hard for me to know how much of that challenge was from me still being far, far from grokking ggplot2 in full. And, this is an interactive data visualization that required zero out-of-pocket costs, so no procurement or “expense pre-approval” was involved. I like that!
- Web-based, interactive data access — I had to get over the hump of “reactive functions” in Shiny (which Eric Goldsmith helped me out with!), but then it was surprisingly easy to stand up a web interface that actually seems to work pretty well. This specific app is posted publicly on a (free) hosted site, but a Shiny server can be set up on an intranet or behind a registration wall, so it doesn’t have to be publicly accessible. (And, Shiny is by no means the only way to go. Check out this post by Jowanza Joseph for another R-based interactive visualization using an entirely different set of R features.)
- Reusable/extensible scripting — I’m hoping to get some “You should add…” or “What about…?” feedback on this (from this post or from clients or from my own cogitation), as, for a fairly generic construct, there are many ways this basic setup could go. I also hope that a few readers will download the files (more complete instructions at the end of this post), try it out on their own data, and either get use from it directly or start tinkering and modifying it to suit their needs. This could be you! In theory, this app could be updated to work with Adobe Analytics data instead of Google Analytics data using the RSiteCatalyst package (which also allows text-based “dynamic” segment construction…although I haven’t yet cracked the code on actually getting that to work).
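To make that first bullet a bit more concrete, here’s a minimal sketch of the idea (this is not the app’s actual code): the segment strings use standard Google Analytics dynamic-segment syntax, the view ID is a placeholder, it assumes you’ve already authorized with the RGA package, and get_ga()’s argument names may vary a bit by package version.

```r
# Build GA dynamic-segment strings programmatically and loop through them,
# rather than clicking through the web UI. Assumes the RGA package and prior
# authorization (RGA has an authorize() function for that); the view ID below
# is a placeholder.
library(RGA)

view_id <- "ga:XXXXXXXX"  # placeholder GA view ID

devices       <- c(Desktop = "desktop", Mobile = "mobile", Tablet = "tablet")
visitor_types <- c(New = "New Visitor", Returning = "Returning Visitor")

results <- list()
for (d in names(devices)) {
  for (v in names(visitor_types)) {
    # GA dynamic-segment syntax: conditions joined with ";" are ANDed together
    seg <- paste0("sessions::condition::ga:deviceCategory==", devices[[d]],
                  ";ga:userType==", visitor_types[[v]])
    results[[paste(d, v)]] <- get_ga(profileId  = view_id,
                                     start.date = "30daysAgo",
                                     end.date   = "yesterday",
                                     metrics    = "ga:sessions",
                                     dimensions = "ga:date",
                                     segment    = seg)
  }
}
```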
Having said all of that, there are a few things that this example absolutely does not illustrate. But, with luck, I’ll have another post in a bit that covers some of those!
Where I Started, Where I Am Now
Nine days ago, I found myself with a free hour one night and decided to take my second run at Shiny, which is “a web application framework for R” from RStudio. Essentially, Shiny is a way to provide an interactive, web-based experience with R projects and their underlying data. Not only that, Shiny apps are “easy to write,” which is not only what their site says, but what one of my R mentors assured me when he first told me about Shiny. “Easy” is a relative term. I pretty handily flunked the Are you ready for shiny? quiz, but told myself that, since I mostly understood the answers once I read them, I’d give it a go. And, lo and behold, inside of an hour, I had the beginnings of a functioning app:
This was inspired by some of that “just using R to access the API” work I’d been doing — always starting out by slicing the traffic into the six buckets in this 3×2 matrix (with segments of specific user actions applied on top of that).
I was so excited that I’d gotten this initial pass completed that my mind immediately raced to all of the enhancements to this base app that I was going to quickly roll out. I knew that I’d taken some shortcuts in the initial code, and I knew I needed to remedy those first. And I quickly hit a wall. After several hours of trying to get a “reactive function” working correctly, I threw up my hands and asked Eric Goldsmith to point me in the right direction, which he promptly and graciously did. From there, I was off to the races and, ultimately, wound up with an app that looks like this:
This version cleaned up the visualizations (added labels of what metric was actually being used), added the sparkline blocks, and added percentages to the heatmap in addition to the raw numbers. And, more importantly, added a lot more user controls. Not counting the date ranges, I think this version has more than 1,000 possible configurations. You can try it yourself or watch a brief video of the app in action. I recommend the former, as you can do that without listening to my dopey voice, but do whatever feels right.
What’s Going On Behind the Scenes
What’s going on under the hood here isn’t exactly magic, and it’s not even something that is unique to R. I’m sure this exact same thing (or something very similar) could be done with Python — probably with some parts being easier/faster and other parts being more complex/slower. And it could probably even be done with Tableau or Domo or Google Data Studio 360 or any number of other platforms. But, how it’s working here is as follows (and the full code is available on GitHub):
- Data Access: I put my Google Analytics API client ID and client secret, as well as a list of GA view IDs, into variables in the script.
- Dynamic Segments: I built a matrix where each row includes the value that shows up in the dropdown (the segment group), the name of one segment within that group (Mobile, Desktop, Tablet, New Visitors, etc.), and the dynamic segment syntax for that slice of traffic. This list can be added to at any time, and the new values then become available in the application.
- Trendline Resolution: This is another list that simply provides the label (e.g., “By Day”) and the GA dimension name (e.g., “ga:date”); this could be modified, too, although I’m not sure what other values would make sense beyond the three included there currently.
- Metrics: This is also a list — very similar to the one above — that includes the metric name and the GA API name for each metric. Additional metrics could be added easily (such as specific goals).
- Linking the Setup to the Front End: This was another area where I got an Eric Goldsmith assist. The app is built so that, as values get added in the options above, they automatically get surfaced in the dropdowns.
- “Reactive” Functions: One of the key concepts/aspects of Shiny is the ability to have the functions in the back end figure out when they need to run based on what changes on the front end; there’s a condensed sketch of how this fits together right after this list. (As I was writing this post, Donal Phipps pointed me to this tutorial on the subject; I’ll need to go through it another 8-10 times before it sinks in fully.)
- Pull the Data with RGA’s get_ga() Function: Using the segment definitions, a couple of nested loops cycle through and, based on the selected values, pull the data for each heatmap “block” in the final output. This data gets pulled with whatever “date” dimension is selected. Basically, it pulls the data for the sparklines in the small multiples plot.
- Plot the Data: I started with a quick refresher on ggplot2 from this post by Tom Miller. For the heatmap, the data gets “rolled up” to remove the date dimension. The heatmap uses a combination of geom_tile() and geom_text() plots from the ggplot2 package. The small multiples at the bottom use a facet_grid() with geom_line().
- Publish the App: I just signed up for a free shinyapps.io account and published the app, which went way more smoothly than I expected it to! (And I then promptly hit up Jason Packer with some questions about what I’d done.)
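To make that list a bit more concrete, here’s a condensed sketch of how the pieces can fit together: configuration lists feeding the dropdowns, a reactive function pulling the data per segment with get_ga(), and ggplot2 drawing the small multiples. This is not the app’s actual code (the real ui.R and server.R are on GitHub); the names, segment definitions, and view ID are illustrative, it assumes authorization has already happened, and get_ga()’s argument names may vary a bit by RGA version.

```r
library(shiny)
library(RGA)
library(ggplot2)

view_id <- "ga:XXXXXXXX"  # placeholder

# Adding entries to these lists surfaces them in the dropdowns automatically
segments    <- c("Desktop" = "sessions::condition::ga:deviceCategory==desktop",
                 "Mobile"  = "sessions::condition::ga:deviceCategory==mobile")
metrics     <- c("Sessions" = "ga:sessions", "Users" = "ga:users")
resolutions <- c("By Day" = "ga:date", "By Week" = "ga:week")

ui <- fluidPage(
  selectInput("metric", "Metric", choices = metrics),
  selectInput("resolution", "Trendline Resolution", choices = resolutions),
  plotOutput("sparklines")
)

server <- function(input, output) {

  # Reactive function: re-pulls the data only when an input it reads changes
  ga_data <- reactive({
    per_segment <- lapply(names(segments), function(seg_name) {
      df <- get_ga(profileId  = view_id,
                   start.date = "30daysAgo",
                   end.date   = "yesterday",
                   metrics    = input$metric,
                   dimensions = input$resolution,
                   segment    = segments[[seg_name]])
      names(df) <- c("date", "value")  # normalize column names for plotting
      df$segment <- seg_name
      df
    })
    do.call(rbind, per_segment)
  })

  # Small multiples: one sparkline per segment
  output$sparklines <- renderPlot({
    ggplot(ga_data(), aes(x = date, y = value)) +
      geom_line() +
      facet_grid(segment ~ .)
  })
}

shinyApp(ui, server)
```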
And that’s all there is to it. Well, that’s “all” there is to it. This actually took me ~17 hours to get working. But, keep in mind that this was my first Shiny app, and I’m still early on the R learning curve.
The Most Challenging Things Were Least Expected
If someone had told me this exercise would take me ~17 hours of work to complete, I would have believed it. But, as is often the case for me with R, I would have totally muffed any estimate of where I would spend that time. A few things that took me much longer to figure out than I’d expected were:
- (Not Shown) Getting the reactive functions and calls to those functions set up properly. As mentioned above, I spun my wheels on this until I had an outside helping hand point me in the right direction.
- Getting the y-axis for the two visualizations in the same order. This seems like it would be simple, but geom_tile() and facet_grid() turned out to be two very different beasts.
- Getting the number and the percentage to show up in the top boxes. Once I realized that I just needed to do two different geom_text() calls for the values and “nudge” one value up a bit and the other value down a bit, this worked out (there’s a quick illustration after this list).
- Getting the x-axis labels above the plot. This turned out to be pretty easy for the small multiples at the bottom, but I ultimately gave up on getting them moved in the heatmap at the top (the third time I stumbled across this post when looking for a way to do this, I decided I could give up an inch or two on my pristine vision for the layout).
- Getting the “boxes” to line up column-wise. They still don’t line up! They’re close, though!
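For what it’s worth, here’s an illustration of the y-axis ordering and the double geom_text() trick with made-up data (this isn’t the app’s code): two geom_text() calls nudged in opposite directions for the number and the percentage, and consistent factor levels as one way to control row order. One wrinkle worth noting: a discrete y-axis runs bottom-up while facet_grid() rows run top-down, so one set of levels may need to be reversed to match the other.

```r
library(ggplot2)

# Made-up data: sessions by channel and device
df <- data.frame(
  channel  = rep(c("Paid", "Organic", "Direct"), times = 2),
  device   = rep(c("Desktop", "Mobile"), each = 3),
  sessions = c(120, 450, 300, 80, 260, 150)
)
df$pct <- df$sessions / sum(df$sessions)

# Fix the ordering once; reuse these levels for the small-multiples data too
df$channel <- factor(df$channel, levels = c("Direct", "Organic", "Paid"))

ggplot(df, aes(x = device, y = channel, fill = sessions)) +
  geom_tile() +
  geom_text(aes(label = sessions), nudge_y = 0.15) +                       # raw number, nudged up
  geom_text(aes(label = sprintf("%.0f%%", 100 * pct)), nudge_y = -0.15) +  # percentage, nudged down
  scale_fill_gradient(low = "white", high = "steelblue")
```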
The Least Challenging Things Were Delightful Surprises
On the flip side, there were some aspects of the effort that were super easy:
- There is no hard-coding of “the grid.” The layout there is completely driven by the data. If I had an option that had five different breakouts, the grid — both the heatmap and the small multiples — would automatically update to have five buckets along the selected dimension.
- The heatmap. Getting the initial heatmap was pretty easy (and there are lots of posts on the interwebs about doing this). scale_fill_gradient() FTW!
- ggplot2 “base theme.” This was something that clicked for me the last time I made a run at using ggplot2. Themes seem like a close cousin to CSS. So, I set up a “base theme” where I set out some of the basics I wanted for my visualizations, and then just selectively added to or overrode those for each visualization (there’s a small sketch of the idea after this list).
- Experimentation with the page layout. This was super-easy. I actually started with the selection options along the left side, then I switched them to be across the top of the page, and then I switched them back. I really did very little fiddling with the front end (the ui.R file). It seems like there is a lot of customization through HTML styles that can be done there, but this seemed pretty clean as is.
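Here’s a small sketch of that “base theme” idea (the specific settings are just examples, not the app’s actual theme): define the shared bits once, then layer plot-specific tweaks on top of it for each visualization.

```r
library(ggplot2)

# Shared settings, defined once
base_theme <- theme_minimal() +
  theme(axis.title       = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position  = "none")

# Heatmap-style plot: base theme, plus slightly larger x-axis text
heat_df <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
p_heatmap <- ggplot(heat_df, aes(factor(cyl), factor(gear), fill = mpg)) +
  geom_tile() +
  base_theme +
  theme(axis.text.x = element_text(size = 11))

# Small multiples: base theme, plus axis text stripped entirely
p_sparklines <- ggplot(mtcars, aes(wt, mpg)) +
  geom_line() +
  facet_grid(gear ~ cyl) +
  base_theme +
  theme(axis.text = element_blank())
```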
Try it Yourself?
Absolutely, one of the things I think is most promising about R is the ability to re-purpose and extend scripts and apps. In theory, you can fairly easily set up this exact app for your site (you don’t have to publish it anywhere — you can just run it locally; that’s all I’d done until yesterday afternoon):
- Make sure you have a Google Analytics API client ID and client secret, as well as at least one view ID (see steps 1 through 3 in this post)
- Create a new project in RStudio as an RShiny project. This will create a ui.R and a server.R file
- Replace the contents of those files with the ui.R and server.R files posted in this Github repository.
- In the server.R file, add your client ID and client secret on rows 9 and 10
- Starting on row 18, add one or more view IDs
- Make sure you have all of the packages installed (install.packages("[package name]")) that are listed in the library() calls at the top of server.R (a quick sketch of this step follows the list).
- Run the app!
- Leave a comment here as to how it went!
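For the package-install and run steps, here’s a quick sketch from the R console (the exact package list should match whatever library() calls sit at the top of server.R; the ones below are just the obvious candidates):

```r
install.packages(c("shiny", "RGA", "ggplot2"))  # plus any others listed in server.R

# Then, with the working directory set to the folder containing ui.R and server.R:
library(shiny)
runApp()
```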
Hopefully, even though it may be inefficiently written, the code still makes it fairly clear how you can readily extend it. I’ve got refinements I already want to make, but I’m weighing that against my desire to test the hypothesis that the shareability of R holds a lot of promise for web analytics. Let me know what you think!
Or, if you want to go with a much, much more sophisticated implementation (including integrating your Google Analytics data with data from a MySQL database), check out this post by Mark Edmondson.