Sometimes I need to compare 2 lists of URLs and find out how much they overlap. Which URLs are only in list #1? Which URLs are in list #2? Which URLs are in both lists?
Here’s an example: say I’ve got one list of URLs from my Web analytics tool (like Content>Top Content in Google Analytics) and another list of URLs from a sitemap-generating spider. Comparing the 2 lists lets me see which pages are actually getting traffic on my site versus which pages are reachable by spider. Although most URLs that get traffic are also spiderable, it’s also likely that some URLs get traffic but aren’t spiderable (like thank you pages) and some URLs are spiderable but get zero traffic.I've developed a quick and easy way to do this kind of comparison using Excel. You can download it here. I’ve included step-by-step instructions for you to follow. There are 8 steps:
- Paste in your first list of URLs
- Standardize them
- Pivot to remove duplicates
- Paste in your second list of URLs
- Standardize them
- Pivot to remove duplicates
- Concatenate the standardized, deduped lists and add a "magic" code
- Pivot again to get results
All the details are in my example spreadsheet.
If you're a programmer I’m sure you have a fancier way to compare lists. However, the beauty of my method is that anyone can do it as long as they have a copy of Excel and a basic understanding of pivot tables.Speaking of pivot tables, if you've never created one I encourage you to learn right this minute. It's something that every single Web analyst should know how to do. If I ever interview you for an analyst job, even an entry-level one, I will quiz you on pivot tables.
What do you think? Do you have an even easier method? And, as a Web analyst, why have you needed to compare lists of URLs?
Thanks for this informative article June. I'll be sure to try things out when I get a chance.
Posted by: Les | June 30, 2010 at 02:53 PM
this type issue arises all the time, and Excel is particularly ill-suited to solve it. I normally do it by using vlookup(textToFindFromListA, ListB,column) which would return the identical string sought, if found, otherwise return an error.
I think PowerPivot and DAX might be the best way to do it going forward, in the absence of a custom VBA function.
Posted by: Peter | July 06, 2010 at 04:07 AM
Thanks for the feedback, Peter! Ive used vlookup in the past, as well. The thing I like about my solution is that its bi-directional - youd have to do two vlookups to see the outliers in each group.
I haven't used PowerPivot and DAX but they sound like promising tools as ongoing (or high-volume) solutions. Ive just spent a few minutes of amusement watching the overview videos on their site. http://www.powerpivot.com/videos.aspx Cheers, June.
Posted by: June Dershewitz | July 11, 2010 at 12:44 PM
I've downloaded it. Thanks
Posted by: Joko Susilo | August 02, 2010 at 11:50 PM