Sometimes I need to compare 2 lists of URLs and find out how much they overlap. Which URLs are only in list #1? Which URLs are in list #2? Which URLs are in both lists?
Here’s an example: say I’ve got one list of URLs from my Web analytics tool (like Content>Top Content in Google Analytics) and another list of URLs from a sitemap-generating spider. Comparing the 2 lists lets me see which pages are actually getting traffic on my site versus which pages are reachable by spider. Although most URLs that get traffic are also spiderable, it’s also likely that some URLs get traffic but aren’t spiderable (like thank you pages) and some URLs are spiderable but get zero traffic.I've developed a quick and easy way to do this kind of comparison using Excel. You can download it here. I’ve included step-by-step instructions for you to follow. There are 8 steps:
- Paste in your first list of URLs
- Standardize them
- Pivot to remove duplicates
- Paste in your second list of URLs
- Standardize them
- Pivot to remove duplicates
- Concatenate the standardized, deduped lists and add a "magic" code
- Pivot again to get results
All the details are in my example spreadsheet.
If you're a programmer I’m sure you have a fancier way to compare lists. However, the beauty of my method is that anyone can do it as long as they have a copy of Excel and a basic understanding of pivot tables.Speaking of pivot tables, if you've never created one I encourage you to learn right this minute. It's something that every single Web analyst should know how to do. If I ever interview you for an analyst job, even an entry-level one, I will quiz you on pivot tables.
What do you think? Do you have an even easier method? And, as a Web analyst, why have you needed to compare lists of URLs?
Recent Comments