Data Analysis

April 23, 2008

Web Analytics Data Reconciliation How-To Guide

I suspect that most experienced web analysts have done at least one data reconciliation project during the course of their tenure.  For something so common, however, it rarely gets discussed. 

Sure, it's not sexy like Angelina Jolie, but even Plain Jane likes a little attention now and then.

Data reconciliation is an important foundational activity because, when done well, it will inspire people to have confidence in the data that you share with them. Data quality will never be perfect, but it should be good enough for everyone to feel that they can make sound business decisions based on what's available.

Enough pep talk.  If you're on the brink of your first data reconciliation project, here's what to do:

1) Identify your two data sources

The need for data reconciliation arises when you have two separate systems that provide similar sets of data.  One of these sources - let's call it "Primary Source" - will necessarily be your standard web analytics application.  The other one - let's call it "Secondary Source" - can be one of several things, namely:

  • An upstream system, like campaigns (banner, search, email)
  • A downstream system, like commerce or downloads or form submissions
  • A parallel system, such as when you migrate from one web analytics tool to another.  In this case I'd advise you to break your project into smaller chunks according to individual reports you care to reconcile.

2) Learn how your primary source gets collected

Read the documentation and talk to your internal tech team.  Be clear on the scope of the data you're collecting - i.e. exactly which pages are tagged, or exactly which log files are processed.  If you're using page tags, know whether the tag is placed at the top or the bottom of the page (this will affect when the tag fires, which in turn affects the level of data loss to some extent).  Make note of any special filters, transformations or business logic used here.

3) Learn how your secondary source gets collected

If your secondary source is a parallel web analytics system, repeat the process you followed in step 2, above. 

If it's an upstream system you're stuck with whatever documentation and lore you can glean regarding how that works. 

If it's a downstream system you'll need to identify the group within your business that owns that system, then grill them on how they do data collection and how they transform the data into the metric you're trying to reconcile.  There's a lot of variability here, especially if your downstream system is homegrown, so be sure to do a thorough investigation.  As in step 2, make note of any special filters, transformations or business logic used here.

4) Compare data sets

Pick a sensible date range and granularity level, then pull corresponding data from both sources.  A good default would be daily totals for a month.  If you're dealing with really high volume you may want to isolate a subset of your data based on some attribute that you can reliably pull from both sources, like a single URL (for downloads) or a single product (for commerce).

Now put your data sets side by side in Excel and calculate the delta.  Compare the trends over time and see if you can explain the differences.  Ask yourself, are you comfortable with the differences you see?  If not, consider fine-tuning the way you pull data from your primary and/or secondary source in order to account for those differences.
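If you'd rather script the comparison than do it in Excel, here's a minimal sketch of the same delta calculation in Python.  The function name, the sample dates and the counts are all made up for illustration, and treating the secondary source as the baseline for the percentage is just one possible choice.

```python
from datetime import date


def daily_deltas(primary, secondary):
    """Compare two {day: count} dicts on the days they have in common.

    Returns a list of (day, primary, secondary, delta, pct_delta) tuples,
    where pct_delta is relative to the secondary source (None if that is 0).
    """
    rows = []
    for day in sorted(primary.keys() & secondary.keys()):
        p, s = primary[day], secondary[day]
        delta = p - s
        pct = (delta * 100.0 / s) if s else None
        rows.append((day, p, s, delta, pct))
    return rows


# Hypothetical daily order counts, invented for this example.
primary = {date(2008, 4, 1): 950, date(2008, 4, 2): 1010, date(2008, 4, 3): 880}
secondary = {date(2008, 4, 1): 1000, date(2008, 4, 2): 1000, date(2008, 4, 3): 1100}

for day, p, s, delta, pct in daily_deltas(primary, secondary):
    pct_text = "n/a" if pct is None else f"{pct:+.1f}%"
    print(day.isoformat(), p, s, delta, pct_text)
```

A day where the percentage jumps well outside the usual range (April 3 in this made-up data) is the kind of difference worth trying to explain.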

Reality check: you're never going to get a perfect match.  This is a good exercise, but know when to say when. Do not obsess!

5) Document and share your findings

This is the most important step.  Write a report about what you've done and what you've found.  Now go talk to people - give a verbal presentation of findings to your web analytics colleagues and your concerned data stakeholders. 

At this point you should be able to speak with confidence about the differences in the two data sources, and your goal should be to pass this confidence on to the people around you.  Save your report for future reference, as newcomers are likely to ask the questions you've already answered.

6) Plan to revisit if necessary

If reconciliation is part of tool migration, you are now done.  Good work.

If your secondary source is an upstream or downstream system, plan a periodic audit to make sure your findings are still valid.  If your systems are stable you can get away with doing this maybe once a year, but if you have any appreciable changes - like a major site redesign or a shopping cart overhaul - you may wish to do another quick round of reconciliation at that time.

March 27, 2008

Where to Put Integrated Data: 7 Helpful Questions

Web analytics data integration goes both ways.  When you marry clickstream data with other business data, you can put the combined result either inside or outside your web analytics application.  The trick is, if you can put it either place, how do you decide which place is best? 

Here are 7 questions to consider as you make your decision:

  1. Is this a one-off or will you need an ongoing feed?  Say you're working on a deep-dive analysis project, or you're preparing a data set to use for data mining.  You're probably pulling activity from a discrete period of time.  If so, integrate outside your web analytics application, where there's less overhead for a one-time task.  If, on the other hand, you're going to want this integrated data to be available at a moment's notice for all eternity, you're best off integrating wherever you can most easily automate your feed, which brings me to my next point:

  2. How much effort will it take to automate, in vs. out?  Call me lazy or call me practical, but sometimes the right answer is the easiest one (that's Occam's Razor, right?). The major commercial web analytics vendors have built-in integration tools, like Coremetrics Connect and Omniture Genesis.  If the data you need to integrate falls within the realm of what your web analytics application can handle, use the wizard and take a feed in.  If, on the other hand, you want to integrate custom data that's not wizard-able, take a feed out instead - but make sure you've got IT resources to help you automate the load into the destination system.

  3. Which analysis tools do your data consumers prefer to use?  Maybe you've got a favorite data visualization application (like Tableau), or predictive modeling software, or another business intelligence tool that people at your company like to use.  Yes?  Then integrate your data in a place where it will be easy to get at using that tool, most likely outside your web analytics application.  If you plan to use Excel you have more of a choice, because most web analytics vendors have Excel plug-ins.  You could integrate within your web analytics application and then feed it to Excel, or, if your data set is small enough, you could integrate by VLOOKUP()-ing right there inside Excel.

  4. Are your data consumers already active users of your web analytics application?  If so, you'd be doing them a favor by putting the integrated data where they're most likely to use it.  On the other hand, if they spend all day working with some other business data system, put it there instead.  It could be the factor that determines whether the integrated data ever gets adopted in practice by the people who are expected to use it. 

  5. Will you need reporting components that web analytics applications handle especially well, like browser overlay and pathing?  This will depend on whether your web analytics application actually lets you display integrated data in browser overlay and pathing reports.  If so, and if you can imagine actually using these reporting components, try to integrate inside your web analytics application.  Although web analytics applications are not as robust, generally speaking, as other data analysis tools, they manage to do a good job of presenting clickstream-specific data. 

  6. Are you hoping to integrate data that can actually be gathered at collection time?  Maybe the extra business data you want to integrate is something you'll be able to assign to a custom variable in your web analytics application at collection time.  If so, you'll be able to integrate without any after-the-fact joining.  If your integration data doesn't surface until further downstream, though, you can't use this approach.

  7. Do you need to store your integrated data behind the corporate firewall?  This isn't so much a technical issue as a legal one.  If the data you want to integrate involves personally-identifiable information and you're using a hosted web analytics solution, go re-read your site's privacy policy.  Chances are you will need to store the integrated data behind your own corporate firewall.  If you host your own web analytics application on-site you may still be able to integrate inside it, otherwise you'll need to pull a feed out.
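For the small-data case in question 3, the VLOOKUP()-style join can also be sketched outside Excel.  This is only an illustration of the idea: the function name, the campaign IDs and the field names below are hypothetical, not taken from any particular web analytics tool.

```python
def vlookup_join(clickstream_rows, lookup, key):
    """Attach lookup fields to each clickstream row that has a matching key.

    Rows with no match are kept unchanged, much like a VLOOKUP that finds nothing.
    """
    joined = []
    for row in clickstream_rows:
        merged = dict(row)  # copy so the input rows are not modified
        merged.update(lookup.get(row[key], {}))
        joined.append(merged)
    return joined


# Hypothetical clickstream export: visits per campaign.
clickstream = [
    {"campaign_id": "C-100", "visits": 1200},
    {"campaign_id": "C-200", "visits": 450},
    {"campaign_id": "C-999", "visits": 30},  # no matching business data
]

# Hypothetical business data keyed by the same campaign ID.
campaign_costs = {
    "C-100": {"cost": 250.0},
    "C-200": {"cost": 90.0},
}

for row in vlookup_join(clickstream, campaign_costs, "campaign_id"):
    print(row)
```

The join only works if both sides share a reliable key, which is the same requirement VLOOKUP() has in Excel.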

So, depending on your situation it's perfectly reasonable to join data in both directions - inside and outside your web analytics application.  Strive to find a solution that's practical, easy, legal, and most likely to make your data analysts happy.

January 14, 2008

How to Prep for Site Redesign Measurement

I've been blogging for a few months now and it's been really fun, especially when it comes to choosing topics to write about.  So far I've posted about career development, events I attend and host, data integration, user-generated content, ... heck, I've even posted a picture of my family's pet chicken.

Now, in an attempt to say more about what I actually do at Semphonic, I thought I'd tell you about one project I'm currently working on.  Without giving out any identifying details about my client I'd like to share some tidbits that will be generally useful if you find yourself in a similar position.

So I'm doing a site redesign analysis project.  My client has recently made some major structural, interface and content changes to their site, and I'm comparing visitor behavior (and outcomes and sentiment) before and after the launch of the new site.

Redesign analysis is something I've done numerous times throughout my career in web analytics.  Simple fact:  sites get redesigned over and over.  Analyzing the impact of redesign is mostly about picking the right things to measure and then coming up with an interesting story to tell project stakeholders.  Let it be clear that I'm talking about sweeping changes to a whole site that happen at the flip of a switch, not incremental changes made by way of testing. 

Sometimes a redesign launches and I - as the measurement person - have looked back and said, "D'oh!  I should have thought of that in advance!"  In the remainder of this post I'll list out 3 ways I prepare for a site redesign measurement project.  Setting up these things ahead of time will allow you to avoid some known pitfalls.

Redesign Prep #1: Take Screenshots

What to do:
Take "before" screenshots of all the major pages on your site and put them aside for safekeeping.  Yes, you may be able to dig archived pages out of a content management system, but screenshots are a more reliable bet.

Let me just say I absolutely love the screen capture utility called SnagIt, especially the scrolling window functionality.  I recently discovered that SnagIt can also preserve links on a web page, which is quite useful. 

Why to do it:
Because you'll need to know what the site looked like prior to redesign, and you may want to include these screenshots in your analysis presentation.

Redesign Prep #2: Decide on Timing

What to do:
Take out your calendar, pick "pre" and "post" windows of time to analyze, then pick dates a little farther out from that when you'll be able to share your analysis findings. 

It's a bit of a balancing act.  Invariably site owners want to know immediately whether or not their effort was a success, but from the analysis standpoint it's advisable to wait until a sufficient period of time has passed before you draw any conclusions about the impact of redesign. 

In my current project I'm doing a quick check-in at 2 weeks post-launch, followed by a final analysis at 1 month post-launch - that way I'm able to give some immediate feedback while I wait for the data to roll in and then use the longer time period in my final presentation.
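Here's a minimal sketch of how you might work out those windows on a calendar.  It assumes the "pre" and "post" windows are the same length and sit on either side of the launch date, and it approximates a month as 30 days.  The launch date is made up, and the function name is hypothetical.

```python
from datetime import date, timedelta


def analysis_windows(launch, days):
    """Return (pre_start, pre_end, post_start, post_end) for a window of `days` days.

    Assumption: the launch day counts as the first day of the "post" window,
    and the "pre" window ends the day before launch.
    """
    pre_start = launch - timedelta(days=days)
    pre_end = launch - timedelta(days=1)
    post_start = launch
    post_end = launch + timedelta(days=days - 1)
    return pre_start, pre_end, post_start, post_end


launch = date(2008, 1, 7)  # hypothetical launch date

for label, days in (("2-week check-in", 14), ("1-month final", 30)):
    pre_start, pre_end, post_start, post_end = analysis_windows(launch, days)
    print(f"{label}: pre {pre_start} to {pre_end}, post {post_start} to {post_end}")
```

Once you have the "post" end dates, pick your presentation dates a little after each one.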

Why to do it:
Because being proactive rather than reactive about data analysis is simply good form, plus it gives redesign project stakeholders some dates to look forward to.

Redesign Prep #3: Grab Data

What to do:
Figure out if there's any historical data you won't be able to get after the site launches.  This varies depending on what tool you're using, but browser overlay and "Next Page" reports are the most commonly affected.

What I mean is, you may only be able to view the browser overlay for the period of time when your page looks exactly like it does today.  For dates in the past when the page looked different, your browser overlay report may be unintelligible.

Know what's fleeting, then go in and grab data for your "pre" time period(s) while the data is still available.  If, like me, you choose to analyze two different windows of time - a short period and a long period - be sure to collect both of these snapshots.  Be thorough.

Why to do it:
Because you don't want to be faced with gaps in data as you pull together your analysis. 

So, how do you prepare for measuring redesign?  Anything you'd like to add to what I've mentioned here?  I welcome your comments.