I offer a useful tidbit for those of you who 1) rely on a page-tagging solution for web analytics, and 2) care about tracking downloads. I've encountered this situation several times as a web analyst, so I thought I'd write up a basic summary of the issue.
You will find this relevant if you are either a newbie web analyst or an experienced web analyst charged with educating your data consumers. Here's what you need to know:
Page-tagging
Many popular web analytics tools available today – like Omniture and Google Analytics – rely solely on JavaScript tags for data collection. Using this method, site owners place a tag on every page they wish to track; when a visitor accesses a tagged page, information is sent back to the web analytics tool. That, in a nutshell, is page-tagging.
Before you read any further, confirm the method of data collection used on your site. If you've got something other than page-tagging - such as log files, or network collection, or a hybrid solution - then you can stop reading now. The issue I'm about to describe only matters if you use page-tagging alone for data collection.
Tracking downloads
Here's the problem with using page-tagging to track downloads: it's not possible to embed a JavaScript tag inside a downloadable file. Instead, the tag tracks the very moment the visitor clicks a link to get a file. This counts downloads initiated, not downloads completed.
Measuring downloads initiated is not wrong, but it is slightly upstream from the spot we'd ideally like to track: successful completion of the download. Counting this slightly-upstream-but-still-valid action will necessarily overstate the number of real downloads, though, since it’s entirely possible to back out of the process before the file has finished downloading.
What it all means
Do not expect accounting-level precision from tag-based download stats. Take the metric at face value, and make sure everyone who uses the data understands how to interpret it. If you find yourself needing to compare tag-based downloads with an overlapping source of business data, I invite you to read my data reconciliation how-to guide post.
When I encountered this issue with downloads most recently (last month, in fact), my client opted to put a statement like the following as a footnote in a widely-distributed report that includes download stats:
"Values in this report approximate the number of successfully completed [Product X] downloads. Since no status is returned to the server when a download completes, it's not possible to get an exact figure. Therefore, we count downloads initiated."
So, armed with education and footnotes, you and your fellow analysts should feel confident using the "download initiated" value as a valid marker of site success.
Extra credit
Here's one of my favorite pieces by early photographer Eadweard Muybridge. Since it shows progressive snapshots of a single activity, it seems appropriate to include it here.
Image credit, Digital Journalist.
Isn't this a good case for *not* relying on page tags as your sole source of web analytics data? I'm not saying to try to build out a system that actually combines page tag and log file data, but I do think it is useful to occasionally take a look at your log files for your top downloads (and it's fine to ID these from the "downloads-initiated" data from the page tag).
The thing to check, though, is that the downloads are getting completed -- either by bytes transferred or by status code. This still stops short of "the content got put in front of the user," but it's a lot closer. This is especially important for larger downloads -- evaluation software, for instance, or PDFs that are 100s of pages long. It's good to confirm that people are sticking with the download and not bailing out early for one reason or another.
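A rough sketch of that kind of spot-check, in Python, might look like the following. It is only an illustration under stated assumptions: the log lines are assumed to be in Common Log Format, the file path and size are made-up sample values, and "completed" is taken to mean a 200 status with at least the full file size transferred. Resumed or ranged downloads (status 206) are not handled here and would need their own treatment.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format, e.g.
# 1.2.3.4 - - [04/Jun/2008:10:48:00 -0700] "GET /downloads/eval.zip HTTP/1.1" 200 1048576
LOG_LINE = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def completion_counts(lines, file_sizes):
    """Count requests and apparent completions for each path in file_sizes.

    file_sizes maps a request path to the file's size in bytes.
    A request counts as completed when the status is 200 and the
    bytes transferred are at least the file size (an assumption).
    """
    requested, completed = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") not in file_sizes:
            continue  # not a parsable line, or not a download we care about
        path = match.group("path")
        sent = 0 if match.group("bytes") == "-" else int(match.group("bytes"))
        requested[path] += 1
        if match.group("status") == "200" and sent >= file_sizes[path]:
            completed[path] += 1
    return requested, completed


# Hypothetical sample data: one full download, one abandoned, one unrelated page.
sample_lines = [
    '1.2.3.4 - - [04/Jun/2008:10:48:00 -0700] "GET /downloads/eval.zip HTTP/1.1" 200 1048576',
    '1.2.3.5 - - [04/Jun/2008:10:49:10 -0700] "GET /downloads/eval.zip HTTP/1.1" 200 20480',
    '1.2.3.6 - - [04/Jun/2008:10:50:30 -0700] "GET /index.html HTTP/1.1" 200 5120',
]
requested, completed = completion_counts(
    sample_lines, {"/downloads/eval.zip": 1048576}
)
```

The list of paths to check can come straight from the top "downloads-initiated" items in the page-tag reports, so only a handful of files need to be examined in the logs.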
Posted by: Tim Wilson | June 04, 2008 at 10:48 AM
Interesting post here... at least it provides some good info for a beginner in web analytics like me! Here is my take on tracking downloads. When a client wants to track the number of downloads, can the downloadable file be placed in a separate directory (let's say downloads.companyname.com/filename.zip)?
You will be able to track the download initiation from the log file by analyzing the data. While it's evident that the download started, you aren't able to confirm whether it completed successfully. It would be really good to compare the number of initiated downloads with the bandwidth usage in that subdirectory (in this case downloads.companyname.com and the corresponding file), which should give an approximate idea of how many downloads succeeded.
This isn't a way to track all completed downloads successfully, but it at least gives you insight into the approximate number of downloads that happened. I hope what I am trying to explain here makes sense!
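To make the arithmetic concrete, here is a small Python sketch of that comparison. The function names and the sample numbers are made up for illustration, and it assumes you can get the total bytes served for a single file along with that file's size. Because abandoned downloads also contribute bytes, the result is best read as an upper bound on completed downloads rather than an exact count.

```python
def approx_completed(total_bytes_served, file_size_bytes):
    """Estimate full-download equivalents as bytes served / file size."""
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    return total_bytes_served / file_size_bytes


def approx_completion_rate(initiated, total_bytes_served, file_size_bytes):
    """Compare full-download equivalents with downloads initiated.

    The result is capped at 1.0, since bytes served can exceed
    initiated * file size when the two sources disagree
    (for example, direct links that bypass the tagged page).
    """
    if initiated <= 0:
        raise ValueError("initiated must be positive")
    equivalents = approx_completed(total_bytes_served, file_size_bytes)
    return min(1.0, equivalents / initiated)


# Hypothetical numbers: 100 initiated downloads of a 2,000,000-byte file,
# with 150,000,000 bytes served for that file in the same period.
equivalents = approx_completed(150_000_000, 2_000_000)
rate = approx_completion_rate(100, 150_000_000, 2_000_000)
```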
Your blog is really interesting, following you at twitter and subscribed to your RSS!
Cheers,
Vinay
Posted by: Vinay | June 04, 2008 at 04:19 PM
I have to disagree with Tim in part. Yes, relying on a profile that analyzes web server logs for the PDF and other downloads is a way to easily do it (but out of reach of the pure ASP solutions). However, in many cases, we need to couple the downloads with other dimensions; in such cases there's a strong need to have those numbers in the same reports.
June, a quick note: WebTrends has taken out all references to tracking downloads from the 8.1 literature as well as the knowledge base. Weirdly, this means they don't support it anymore. Talk about a bummer...
Posted by: Jacques Warren | June 04, 2008 at 04:31 PM
Thanks for your feedback, everyone! I guess I'm not the only one to notice this issue.
Tim: True, page-tagging isn't the best way to do data collection if you care a lot about downloads. However, I think many companies have chosen a page-tagging tool just for the sheer convenience of it (read: never having to wrestle with logs). If a company decides to switch to logs (or tags+logs) then they have to decide if the added benefit is worth the added hassle.
Also, good point about usability and big files. Without "download completed" stats we don't really know if files are prohibitively large. I suppose a qualitative survey or a spot-check of logs could help answer that question, but it's still less than ideal.
Vinay: I'm glad you found me! I'm following you on Twitter now, too. I do understand your idea about using directory bandwidth to approximate completed downloads. However, I believe that bandwidth for a directory is just the sum of the bandwidth of every tagged html file in that directory - so if the directory contained no tagged files, the bandwidth would be zero. It would be pretty easy to test this out. :)
Jacques: I know exactly what you mean about wanting to "couple the downloads with other dimensions." Ideally we'd like to segment completed vs. failed downloads by referring keyword, or landing page, or anything else that visitor did during their visit or their lifetime. Right now we can't.
That's frustrating news about WebTrends and downloads. I'm still using WT v8 with one of my clients. Luckily they don't care too much about measuring downloads.
Posted by: June Dershewitz | June 05, 2008 at 12:10 PM
Hi June,
I understand that the functionality will still be "usable", but not officially part of support/how-to's. That really escapes me... I mean, that is almost enough for a prospect to decide against purchasing WebTrends. Add to this that Omniture say they have made it even easier to track events such as podcast viewing, etc. which you would need dcsmultitrack for in WebTrends.
Posted by: Jacques Warren | June 05, 2008 at 02:54 PM
Jacques: That is not a good move! I wonder if WebTrends has any plans to support download tracking functionality in the future? Based on the reaction that I've gotten to this blog post, there's obviously a demand for tag-based or hybrid solutions that provide better download tracking than what we currently get today.
Posted by: June Dershewitz | June 05, 2008 at 04:22 PM
Hi there June and Jacques - WebTrends continues to support the tracking of flash, downloads, and an array of different web 2.0 events using our dcsMultiTrack functionality. This feature is broadly used by our customers and will remain one of our key methods for collecting events that don't fit within the traditional metaphor of a page view.
In terms of the documentation, we have been cleaning up our content and will be reintroducing it in the next month. Also, keep your eyes out for some new tools from us to make the tracking of these events even easier.
Cheers,
Eric Rickson
Product Manager
WebTrends
Posted by: Eric Rickson | June 09, 2008 at 01:23 PM
Eric: Thanks for your feedback! I'm glad to hear that dcsMultiTrack will continue to be supported, and it's also a relief to know that the documentation will be reintroduced. There's obviously a need for it.
Posted by: June Dershewitz | June 09, 2008 at 02:41 PM
This is an interesting post as I've had to address this question quite a bit at my workplace.
Technically speaking, there is a way to "couple the downloads with other dimensions" such that you can look at download completions by segment X. The approach requires both a page-tagging solution with a first party cookie configuration and log files enabled on the servers that serve your downloads. However, there may be legal/privacy restrictions that would prevent you from doing this so I'll leave it at that.
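Purely as an illustration of the general idea, and not a description of any particular tool's setup, the join might be sketched in Python like this. It assumes the same first-party visitor ID is available both in the page-tag data and in the download server's logs, and that completed downloads have already been identified from those logs. All names and sample values are hypothetical, and the legal/privacy caveat above applies to anything along these lines.

```python
from collections import Counter


def completions_by_segment(tag_hits, completed_downloads):
    """Count completed downloads per segment by joining on visitor ID.

    tag_hits: iterable of (visitor_id, segment) pairs from the page-tag data.
    completed_downloads: iterable of (visitor_id, path) pairs from the logs.
    Visitors missing from the tag data are grouped under "(unknown)".
    """
    segment_of = {}
    for visitor_id, segment in tag_hits:
        # Keep the first segment seen for each visitor (a simplification).
        segment_of.setdefault(visitor_id, segment)

    counts = Counter()
    for visitor_id, _path in completed_downloads:
        counts[segment_of.get(visitor_id, "(unknown)")] += 1
    return counts


# Hypothetical sample data.
tag_hits = [
    ("v1", "organic search"),
    ("v2", "email"),
    ("v3", "organic search"),
]
completed_downloads = [
    ("v1", "/downloads/eval.zip"),
    ("v3", "/downloads/eval.zip"),
    ("v9", "/downloads/eval.zip"),  # no matching tag data
]
by_segment = completions_by_segment(tag_hits, completed_downloads)
```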
For companies that tend to have large files, you really should not use attempted downloads (provided by page-tagging solutions) interchangeably with download completions. You should just call it out for what it is - an attempt to download. There are way too many variables to consider why someone clicked on the link to download but did not fully download the file.
Posted by: Fred Kuu | June 11, 2008 at 11:16 PM
Hi Fred! Thanks for posting that tip about how to segment traffic based on downloads completed; very clever. I'm curious to know - since you do track downloads completed - what kind of business decisions you feel you can make based on downloads completed that you can't make based on downloads initiated?
Posted by: June Dershewitz | June 12, 2008 at 10:05 AM