White Paper: Chasing the Long Tail of Natural Search

August 7th, 2006

How to size and capture the unbranded keyword

How does the average retailer’s long tail look?
How do I grow my unbranded keyword traffic?
How many people search for those unbranded terms?
How does natural search influence search term patterns?
What is the size of the long tail?
How much keyword traffic can one page yield?
How does the “Long Tail” of search terms compare to branded search traffic?
How can I optimize thousands of pages a day?
Do traditional SEO strategies still apply?
What does the average company’s Long Tail look like?
What is it worth to me?
Why is my search traffic mostly for my own company terms?

TABLE OF CONTENTS

Executive Summary
Introduction – The Long Search Tail
Sizing The Unbranded Tail
Page Yield Analysis
Capturing Long Tail Potential
PART I: Key Performance Indicators

PART II: Scalable SEO Strategies
Conclusion

ABOUT NETCONCEPTS

Netconcepts is a global natural search and web design firm trusted by leading retailers and media properties as specialists in search friendly E-Commerce development and natural search optimization services, including the industry’s only performance-based Proxy Optimization™ platform. Headquartered in Madison, Wisconsin, Netconcepts serves brands such as HSN, Discovery Communications, Kohls, REI, Verizon, Cabela’s, and leading SEM agency partners, including Performics and Resolution Media.

EXECUTIVE SUMMARY

Brand searches are a small minority of searches conducted every day. Yet most E-tailers rely on them for their natural search traffic. Imagine taking a concrete prediction of the value of search traffic available from non-brand searches to your next management meeting. Until now it has been difficult to find the numbers to justify investment in natural search optimization or to quantify a site’s potential search traffic.

In “Chasing the Long Tail of Natural Search: How to Capture the Unbranded Keyword,” Netconcepts attempts to provide insight to E-Commerce managers to quantify and capture their sites’ full natural search potential. It’s a big opportunity for big sites. But how big is it?

To answer that question, Netconcepts assessed 1.2 million unbranded natural search visits to 25 client merchant sites — boasting 5,000,000 pages indexed by search engines, in order to:
1. Develop a methodology for estimating unbranded search term traffic.
2. Introduce a set of Key Performance Indicators (KPIs) that reflect these dynamics.
3. Provide a benchmark for merchants to measure natural search performance against.

Here are our key findings on the “average” merchant’s long-tail profile:

• Only 14% of indexed pages yield search traffic, each generating roughly 4.6 visitors per month from non-brand searches.
• 189,000 brand searches are conducted every month.
• Retailers generate 80% of search traffic from brand keywords and 20% from non-brand terms.
• Total market potential for unbranded keyword traffic exceeds 7,000,000 searches per month: roughly 100 searches for every unique page, and 38 times greater than total brand searches.

Our proposed “Page Yield Theory” framework suggests that the E-Commerce websites best suited to capture this unbranded search potential combine large numbers of unique and “indexable” pages, and brand strength capable of wooing searchers into clicking.

But with a large site come management challenges. Limited resources. Hypercompetitive environment. Ever-evolving algorithms. How does the merchant successfully manage ongoing optimization of tens of thousands of website pages? You need a richer SEO management discipline based on KPIs and scalable solutions. We present such solutions in this report.

INTRODUCTION – THE LONG SEARCH TAIL


The term ‘long tail’ is a colloquial way of describing the low frequency portion of a Zipf or Pareto (’80/20 rule’) distribution graph. In these graphs, a high frequency population — ‘the head’ — is followed by a lower frequency population — ‘the tail.’ In aggregate, the tail can outweigh the head. This sort of distribution is common in life. Consider, for instance, the words ‘the’ and ‘of.’ These words are used more frequently than any other words; but it’s the thousands of less common words that represent the language as a whole.

This same statistical distribution occurs in search traffic patterns as well: common keywords people search for comprise the distribution “head,” and obscure keywords form a “long tail” of, literally, one-hit wonders. This phenomenon holds true for both paid and natural search. However, for most retailers, natural search traffic is made up almost entirely of “brand” traffic. In reality, their “long tail” of unbranded traffic is quite short, and it raises many difficult questions:
▫ How many search terms should drive traffic?
▫ How much traffic should any keyword drive?
▫ How many searches are performed for such terms?
▫ How does this compare to brand searches?
▫ How could we estimate such market potential?
▫ What kinds of SEO strategies can help capture the long tail opportunity?

In this paper, Netconcepts provides a framework to help merchants answer these questions. Drawing upon a retail client base that utilizes Netconcepts’ proprietary natural search proxy optimization platform (GravityStream™), we can begin to understand the performance dynamics of the long tail.

SIZING THE UNBRANDED TAIL


Major retailers who have yet to undertake serious natural search website optimization lament over two unsurprising, yet challenging phenomena:

As a percentage of natural search traffic, an estimated 95% or more typically comes from “brand” terms, defined here as searches for the merchant, derivatives of the merchant’s name or URL, or the merchant’s products (e.g., [petsmart], [pet smart], [petsmart.com], [petsmart dog collar], etc). While this brand traffic tends to produce a significant sales volume, these are essentially “defensive” sales. The brand is expected to win these search battles with no contest. Marketing logic and intuition suggest that a much larger treasure exists throughout the rest of the unbranded keyword universe. A strong brand should do more than just yield brand-centric search traffic; it should be put into a position to win appropriate “offensive” unbranded search battles. But how big is this “long tail” treasure? How does it compare to brand searches?

As a percentage of unique pages available on the website, a small percent (typically under 5%) yield results. In fact, this small set of pages is largely only converting the “brand” searches initiated above. This can be an unsettling realization: Most large retail websites consist of tens of thousands of unique dynamic pages (SKUs, products, subcategories, categories, departments, etc). Such obvious scale should be expected to yield powerful economies in the quest to reach non-brand searches. But what percentage should be expected? Why are those pages not pulling their weight?

How should a retail brand estimate the sales potential natural search promises? A few approaches are available:

Keyword Research

Keyword research tools like the Overture Keyword Selector and Wordtracker provide a basis for estimating demand. However, both have rather limited data reach and thus report a limited view of search demand when relied upon solely, although newer tools such as KeywordDiscovery reportedly encompass more terms. Still, while this approach may suffice for small websites with hundreds (maybe thousands) of keywords under watch, it does not easily scale to projections across hundreds of thousands of keywords. For large websites, this issue is exacerbated by a more significant limitation of knowledge: the marketer does not know in advance the entire universe of keywords for which his/her tens of thousands of pages might possibly yield useful traffic, nor does s/he have a way of mapping the two and taking any action to cause such an outcome. Because of this, keyword research alone can never provide a true estimate of natural search potential.

Pay Per Click (PPC) Extrapolation

The average retail brand spent $978,000 in 2005 on paid search marketing* (*Shop.org/Forrester 2006). As CPC rates continue to rise* (*Jupiter 2005 data suggests increase from $0.39 in 2004 to $0.58 in 2010), marketers look to combat decreasing ROI by focusing attention on natural search listings, and for good reason: other research reports and eye-tracking studies indicate a strong searcher preference towards clicking on natural listings over paid listings, as much as 85%* (*Jupiter 2004). It is tempting, therefore, to conclude that if paid search listings are only clicked 15% of the time, then natural search should yield 5 to 6 times a retailer’s paid search sales performance, for a fraction of the cost. However, this approach does not take into account factors such as searchers using paid and natural search at different points in the buying cycle, or vast differences in the keywords used in the paid search campaign. Nor does it account for the incomparable keyword-to-page mapping ratios: paid search allows unlimited keyword ads to point to any given landing page, regardless of whether those keywords are embedded at all on the page or within the website network, which is clearly a requirement for natural search visibility (in most circumstances). Therefore PPC extrapolation also fails to provide a useful estimate of non-brand natural search potential.

Page Yield Theory

Both of the previous approaches attempt to estimate search potential based on relatively limited keyword demand data, without respect to the supply side of the economic equation. It is important to recognize that unique pages and their myriad qualities are ultimately what generate natural search traffic. In addition, it is the very scale (the volume) of unique pages on most merchant websites that renders the previous approaches lacking. Therefore, we see a need for a more scientific method of estimating the potential value of the unbranded natural search tail. We propose calculating a robust and scalable prediction of long tail potential as a function of website size, a model which we call Page Yield Theory.

Calculating such a model would require large sets of the following types of data, across a significant sampling of websites:
▫ Total unique pages available on each website
▫ Percentage of pages that did and did not yield keyword traffic
▫ Average number of keywords yielded by a given page
▫ Average number of visits received for each yielded keyword
▫ Effective click-through rate per keyword to estimate total searches.

With such data, a long tail “composite” could be built to model long tail keyword performance based both on the “realized” portion of the tail (the traffic driven by a known percentage of pages), as well as the “potential” performance (the possible traffic if all unique pages yielded the average amount of keyword traffic). What follows is our methodology for calculating such a set of metrics.

_____________________________________________________________________

BREAKING DOWN 1.2 MILLION UNBRANDED SEARCHES



▫ Three-word search terms accounted for 36% of searches.
▫ Two-to-four word searches accounted for 81% of searches. (This compares to a recent MarketingSherpa study* which found the most popular number of words used per search to be only 2, accounting for 28.2% of searches, whereas 2 to 4 word terms accounted for 64.7%. The gap likely reflects a difference between general website information searches and more highly targeted retail-related searches.)
▫ 95% of these terms were referrals generated from Google, MSN Search, or Yahoo Search.

*MarketingSherpa, Search Marketing Benchmark Guide, 2005-2006
_____________________________________________________________________

PAGE YIELD ANALYSIS


Overview

We set out to estimate the number of searches performed across the potential non-brand keyword tail over a set of 25 E-Commerce proxy sites over January 2006. For this purpose, we created a representative sample of highly popular as well as less popular brands – over half of which were ranked within the Multichannel Merchant 100 list, servicing a range of sectors from Men’s and Women’s Apparel, Electronics, TV Shopping, Office Equipment, General Merchandise and more. We then obtained the following data to build a composite model of the average site’s keyword long tail potential.

Unique Pages (P)

Most marketers assume the number of unique pages of their website should be roughly equivalent to the number of product pages the website carries. However, with attribute-based navigation elements becoming more common, and serving to expose many crawlable pathway permutations for any given page, a more robust method of estimating unique pages is required. On the other hand, relying on any search engine’s reported index of a website is also a tenuous proposition, as we often find Google has between 10 and 1000 times as many deep-level pages indexed as Yahoo or MSN, yet Google’s index can change dramatically for no apparent reason. In fact, Google’s reported index of the proxy sites in this study (using the [site:] command) was 3.1 times the number of unique pages crawled by Googlebot – roughly 220,000 pages for the average E-Commerce proxy site in this study. While perhaps surprising to the uninitiated, this is not an uncommon phenomenon – Google’s index historically shows a “better safe than sorry” attitude towards keeping a few duplicate pages on hand.

Thus, based on both relative as well as historical performance, and the fact that all sites within the study had no major inherent crawler barriers (a point made by virtue of the common static-URL proxy configuration), we assumed that Googlebot’s resulting crawl would indeed be the most reliable means of estimating unique pages on a retail website. Using this method, during the January period, all proxy sites had over 3,000 unique pages crawled by Google, some with over 350,000. The average site had 73,000 unique pages as determined by this method.

Yielding and Non-Yielding Pages (YP)

Based on this, the average site had 73,000 unique pages, with an average of 10,000 pages (14%) yielding keyword traffic (P1), leaving an additional 63,000 pages — roughly 86%, with unrealized potential (P2). These pages combined are what drive the X-axis in the long tail.

If the ultimate aspiration of ethical search engine optimization is to ensure that every unique page on a website is “singing its own song,” then clearly the vast majority of pages that Googlebot has crawled and indexed (86% of the average pages) need either vocal lessons or a better song to sing (read: better keywords and better rankings).

Keywords per Page (KPP) and Hits per Keyword (HPK)

Over 600,000 total unbranded search terms generated visitors to the Yielding Pages above. The average number of these search terms on a per-site basis was roughly 24,000, producing a ratio of 2.4 Keywords per Page [24,000 Keywords / 10,000 Yielding Pages].

Furthermore, the 600,000+ unbranded search terms produced over 1.2 million unique search visits during the timeframe. To be more exact, each non-brand keyword generated roughly 1.9 hits. This provides the Y-axis of the long tail, and as one would expect, decreases gradually towards one over the length of the tail.

Both these ratios play key roles in scaling the length and thickness of the tail, assuming every page that yields traffic is capable of yielding 2.4 keywords at a rate of 1.9 hits per keyword. In other words, the data suggests that each additional page that yields search traffic will do so at a rate of roughly 4.6 unbranded search hits per month. These metrics are affected by seasonality trends, but are also a reflection of how many words are used on the average retailer’s website pages, how prevalent the keywords are in the page copy, HTML, backlinks, and other locations.
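The ratios above can be reproduced from the per-site averages quoted in this section. The following is a minimal sketch, not the tooling used in the study; the variable names are our own illustration, and the inputs are the rounded averages reported in the text.

```python
# Per-site averages reported in the study (January 2006 sample).
unique_pages = 73_000         # P: unique pages crawled by Googlebot
yielding_pages = 10_000       # P1: pages that received unbranded search visits
unbranded_keywords = 24_000   # distinct unbranded search terms per site
hits_per_keyword = 1.9        # HPK, as reported in the text

yield_ratio = yielding_pages / unique_pages               # share of pages yielding traffic
keywords_per_page = unbranded_keywords / yielding_pages   # KPP
hits_per_page = keywords_per_page * hits_per_keyword      # unbranded visits per yielding page

print(f"Yielding pages: {yield_ratio:.0%}")
print(f"Keywords per page: {keywords_per_page:.1f}")
print(f"Hits per yielding page per month: {hits_per_page:.1f}")
```

Substituting your own site's page, keyword, and visit counts gives a first comparison against the study averages.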

Click-Through Rate (CTR)

To determine how many searches were conducted on those 600,000+ keywords, we created a keyword sample from the data which allowed us to calculate an estimated click-through rate (CTR) to apply towards the January search term traffic. Specifically, we created a random sampling of 375 keywords from the January keyword search traffic. These terms were checked against the Overture Keyword Selector to estimate total searches conducted. By dividing clicks received (from all engines) by the estimated total searches, we found the effective click-through rate to be 4.7%.

CTR = [clicks (received) / searches (estimated)]

It should be noted that to derive this click-through calculation, we applied a discount factor against the Overture data as follows: Based on comparisons against WordTracker data, we first halved this number to account for Overture’s generally agreed overstatement of keyword data (due to fraudulent clicks and position checking). Since Overture keyword data is garnered from the Yahoo! Network, which has a search engine market reach of 20%, we then multiplied this halved data by a factor of 4.5 to roughly estimate the number of searches conducted across all engines.

To illustrate, suppose a site received 10 clicks for a term for which Overture reported 100 queries in January. We estimated that the 100 searches were overstated by a factor of two, so halved this to 50 actual queries. These 50 queries were based on some 20% of the search market. Thus multiplying by 4.5 places the estimated total searches for this term across major engines at 225, with the 10 clicks received representing roughly 4.4% [10 / 225] of the searches.
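The discount and click-through arithmetic in this illustration can be sketched as follows. This is a simplified reading of the procedure described above, under the stated factors (halve the Overture count, then multiply by 4.5); the function names are hypothetical and not part of any Overture or Wordtracker tool.

```python
def estimate_total_searches(overture_count, overstatement=2.0, reach_multiplier=4.5):
    """Scale an Overture query count to an all-engine estimate.

    overstatement: factor by which Overture is assumed to overstate queries.
    reach_multiplier: factor used in the study to extend Yahoo-network data
    to all major engines.
    """
    return overture_count / overstatement * reach_multiplier


def effective_ctr(clicks, overture_count):
    """Clicks received divided by the estimated all-engine searches."""
    return clicks / estimate_total_searches(overture_count)


searches = estimate_total_searches(100)   # the worked example: 225 searches
ctr = effective_ctr(10, 100)              # 10 clicks out of 225 searches

print(f"Estimated searches: {searches:.0f}")
print(f"Effective CTR: {ctr:.1%}")
```

Applying the same discount to the 84,000 brand queries reported by Overture, `estimate_total_searches(84_000)`, gives the 189,000 adjusted brand searches used elsewhere in this paper.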

Here is what the average site’s long tail would look like.

Branded Portion, A
Notice the portion of the graph to the right labeled ‘A.’ ‘A’ represents the site’s branded portion of the distribution head. According to Overture, the average site’s distribution head consisted of roughly 84,000 branded searches a month, upon which we assume they earned nearly 100% CTR, though recent studies* (*Hitwise, April 2006) find that only 85% of brand searches end up on the brand’s website. The discount factor, described above in the CTR section, is then applied to the branded searches, bringing the actual branded searches to 189,000. This section is serviced by a very small percentage of website pages.

Realized Portion, B1 & C1
On average, roughly 10,000 (P1) pages generated search traffic from 2.4 keywords, meaning as many as 24,000 keywords drove traffic to those pages (the X-axis). Those 24,000 keywords generated roughly 2 hits each, producing total keyword traffic on average of 48,000 hits (C1). The potential searches conducted for this portion of the long tail can be calculated by dividing 48,000 hits by our estimated CTR of 4.7%.

Realized Potential Searches (B1) = [hits / CTR]

Thus, the non-brand searches (B1) equal roughly 1 million (1,000,000). When compared against the average amount of brand searches (189,000), this total non-brand search estimate of 1 million (B1) is some 5 times larger than the number of searches conducted for the brand specifically – a preliminary indication of how much more potential the unbranded search market holds for even well-branded E-Commerce websites.

Unrealized Portion, B2 & C2
The average site had 73,000 unique pages (100%), with only 10,000 of those pages yielding traffic (14%), leaving an additional 63,000 pages (P2) with unrealized potential (86%). Multiplying 63,000 non-producing pages by 2.4 keywords per page results in roughly 150,000 unrealized keywords across the long tail (the X-axis). Multiplying 150,000 keywords by an estimated 2 hits per keyword yields roughly 300,000 hits for those non-yielding pages (the Y-axis). Admittedly, we would expect the hits to decrease towards 1 per keyword the farther out the tail goes. In any case, these are unrealized hits the site could receive if all pages were producing at the average rate. Total search potential (B2) for the unrealized portion of the long tail can then be calculated by dividing the 300,000 potential hits by our estimated 4.7% CTR. Thus, the unrealized potential is roughly 6.3 million searches.
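The realized and unrealized portions can be sketched with the same inputs. This is an illustration only: the function name is our own, and because the code skips the intermediate rounding used in the text (150,000 keywords, 300,000 hits), the unrealized figure comes out slightly above the rounded estimate quoted above.

```python
def potential_searches(pages, keywords_per_page, hits_per_keyword, ctr):
    """Estimate the searches behind a set of pages: (pages x KPP x HPK) / CTR."""
    hits = pages * keywords_per_page * hits_per_keyword
    return hits / ctr


KPP = 2.4    # keywords per yielding page (from the study)
HPK = 2.0    # hits per keyword, rounded to 2 as in this passage
CTR = 0.047  # effective click-through rate (from the study)

realized = potential_searches(10_000, KPP, HPK, CTR)    # B1: pages already yielding
unrealized = potential_searches(63_000, KPP, HPK, CTR)  # B2: pages not yet yielding

print(f"Realized searches (B1):   {realized / 1e6:.1f} million")
print(f"Unrealized searches (B2): {unrealized / 1e6:.1f} million")
```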

Long Tail Calculation
To recap, the formula to estimate both realized and unrealized search potential is as follows:

[ Pages(unique) x Keywords (per page) x Hits (per keyword) ] / Click-through rate

[ 73,000 pages x 2.4 KPP x 1.9 HPK ] / 4.7% CTR = 7,100,000 total searches

Simplifying this formula produces the following multiple:

[2.4 KPP x 1.9 HPK] / 4.7% CTR = 97

This suggests that, when calculated as a function of unique pages with roughly similar performance indicators, the potential unbranded search traffic for a large website can be estimated as roughly 100 times (over 97 times) the number of unique pages.

Furthermore, the calculated unbranded search potential (7.1 million searches) is roughly 38 times greater than the average site’s brand search universe of 189,000 searches.
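For readers who want to apply the recap formula to their own site, a minimal sketch follows. The constants are the study averages; the variable names are our own, and the results depend on how far the inputs are rounded.

```python
UNIQUE_PAGES = 73_000     # average unique pages per site
KPP = 2.4                 # keywords per page
HPK = 1.9                 # hits per keyword
CTR = 0.047               # effective click-through rate
BRAND_SEARCHES = 189_000  # adjusted monthly brand searches

# Searches per unique page: the "multiple" in the simplified formula.
searches_per_page = (KPP * HPK) / CTR

# Total unbranded search potential for the average site.
total_unbranded = UNIQUE_PAGES * searches_per_page

# How the unbranded potential compares with brand searches.
brand_ratio = total_unbranded / BRAND_SEARCHES

print(f"Searches per unique page: {searches_per_page:.0f}")
print(f"Total unbranded potential: {total_unbranded / 1e6:.1f} million")
print(f"Unbranded vs. brand searches: {brand_ratio:.1f}x")
```

With unrounded inputs the ratio lands between 37 and 38, consistent with the "roughly 38 times" figure derived from the rounded 7.1 million total.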

Note: These calculations assume that the average client has 73,000 uniquely crawled pages, that 14% of those pages yield traffic, and that each yielding page receives 4.6 keyword visitors per month, against 189,000 brand searches per month.

CAPTURING LONG TAIL POTENTIAL


Long Tail KPIs

By associating non-brand keyword performance with producing pages, the Page Yield Theory offers a manageable framework for optimizing performance of the natural search long tail. This framework consists of the KPIs used and calculated in this study, such as Yielding Pages, Keyword Yield per Page, and Keyword Hit Yield. These metrics provide a scientific assessment of large-scale optimization effectiveness, enabling merchants to more reliably guide optimization effort towards desired outcomes.

To illustrate, the objective of long tail optimization is to increase the length of the keyword distribution tail, while simultaneously increasing the thickness of the tail. As demonstrated earlier, if keyword demand is present, the length of the tail (X-axis) is a product of Yielding Pages and Keywords per Page ratios. The area under the curve (Y-axis) is defined by the Hits per Keyword ratio – a click-through rate multiplied against keyword demand. This is essentially a measurement of how highly a page ranks for that keyword.

In our study, the average merchant attained a yielding page ratio of 14%, with each yielding page producing 2.4 keywords per page, and each keyword yielding 1.9 hits during the period. Factoring these metrics is what enabled us to estimate the size of the unbranded search long-tail market potential.

While not all merchants may have ready access to these metrics, the model created in this study enables a merchant to estimate them based on the mix of brand and non-brand search keywords they presently experience, as the following example illustrates:

For example, if a merchant is currently receiving a natural search mix of 90% brand traffic and 10% non-brand traffic, our model would estimate that roughly 6% of the website’s available unique pages are yielding that traffic. Furthermore, if this merchant wishes to grow the channel performance mix to 50% brand and 50% non-brand, then 58% of unique pages would need to be empowered to generate such traffic – nearly ten times the current share of producing pages. By comparison, the average merchant in this study had a 14% page yield ratio, which would project to roughly 80% branded and 20% unbranded search.
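The relationship between traffic mix and page yield can be approximated as follows. This is our own reconstruction under explicit assumptions, not the exact model behind the figures above: it assumes brand visits stay fixed at the 189,000 average (brand searches at close to 100% click-through), that the site has the average 73,000 unique pages, and that each yielding page draws 2.4 x 1.9 unbranded visits per month. The function name is hypothetical.

```python
def required_yield_ratio(non_brand_share,
                         brand_visits=189_000,
                         unique_pages=73_000,
                         hits_per_page=2.4 * 1.9):
    """Share of unique pages that must yield traffic to reach a target mix.

    non_brand_share: target fraction of natural search visits that are
    unbranded (0 <= share < 1). All defaults are study averages and are
    assumptions of this sketch.
    """
    if not 0 <= non_brand_share < 1:
        raise ValueError("non_brand_share must be in [0, 1)")
    # Unbranded visits needed so that they form the target share of the total.
    non_brand_visits = brand_visits * non_brand_share / (1 - non_brand_share)
    # Pages needed at the average unbranded visits per yielding page.
    pages_needed = non_brand_visits / hits_per_page
    return pages_needed / unique_pages


for share in (0.10, 0.20, 0.50):
    print(f"{share:.0%} non-brand mix -> {required_yield_ratio(share):.0%} of pages yielding")
```

Under these assumptions the sketch reproduces the 6% and 14% figures and gives about 57% for an even mix, close to the 58% quoted above. A site with different brand traffic or page counts should substitute its own values.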

Here again are the average KPI metrics calculated in this study:

▫ Brand searches: 189,000
▫ Yielding pages: 14%
▫ Keywords per page: 2.4
▫ Hits per keyword: 1.9
▫ Click-through rate: 4.7%
▫ Index per crawl page ratio: 3.1

Compare these metrics against your own website’s performance. Clearly if your non-brand search traffic is less than 20% of your natural search channel, a majority of your pages are providing no tangible search value. Simply rewriting URLs and basic optimization may be enough to significantly move the needle. However, if 95% of your pages are yielding unbranded search traffic, your challenge is how to grow the yield further by increasing rankings and click-through rate. Different decisions are available and required depending on the current performance gap. Focusing on the metrics allows merchants to pursue strategies that deliberately seek to leverage the inherent scale of the website, and quantify the results.

The key is to remember the equation of scale: every yielding page produces 2.4 keywords, each of which yields 1.9 hits per month. Thus, each of your tens of thousands of unique pages is capable of attracting some 4.6 qualified visitors per month. Scalable strategies to increase unique pages and the yield of those pages represent the primary levers to profitably growing the long tail.

Scalable SEO Strategies

Page Yield Theory
Page Yield Theory suggests that maximizing unbranded natural search traffic is a matter of predictably increasing the yields of many tens of thousands of pages. However, the logarithmic “laws” that govern long-tail economics call for scalable solutions, whereas the effort-to-payoff ratio of traditional SEO leads to unsustainable, diminishing returns when effort is focused too narrowly on individual pages. This means that traditional tactics like embedding Meta tags across a few hundred pages, or writing more page copy, and making static pages, simply are not robust enough to scale across the long tail.

Mass Optimization
For example, our research has found that rewriting URLs from complex and dynamic to simple and static has a profound impact on not only the crawlability, indexability, and flow of link-gain (PageRank™ being Google’s popular metric) throughout a website’s network, but it causes profound increases in search traffic as well. (Note: This technique is common to the sites included in this study, all of which utilize the GravityStream™ proxy optimization platform technology.) In addition, exposing more unique pages through attribute-based navigation provides many permutations of unique page content, typically increasing unique indexed pages (the tail’s length) dramatically.

Mass optimization – URLs, thin slicing, user-generated content, tagging & bookmarking

Other examples of scalable solutions include optimized HTML code (e.g., title tags, heading tags, etc) on category or product page templates, which instantly affects the search ranking credentials of thousands of unique pages.

Copy Optimization
Copy optimization, often a dreaded but necessary tactic, must be approached in an equally scalable manner. For instance, an approach we think of as “thin slicing” involves lightly touching key elements (title tags, for instance) of thousands of pages in short amounts of time, monitoring for signs of life, and then, depending on estimated ROI, expanding the copy and reworking the pages that achieved little effect. This sort of methodology approaches optimization in quantifiable waves based on test data, as opposed to arbitrarily writing new copy for pages without informed selection criteria and/or test results. Similar thinking should be applied to other necessary but traditionally scale-free tasks, such as link building.

Web 2.0
Despite the hype, Web 2.0 does provide, in our view, a compelling and worthy platform for creatively enabling such mass optimization. The “organization” that can best scale to optimize a large website’s tens of thousands of pages on an ongoing basis, and for free, is in fact the customer base – the individuals actively searching for and interacting with those pages day after day. Consider enabling consumers and users to interact with your website in new, rich Web 2.0 ways:
▫ Tag a page in users’ own vocabulary (e.g., Flickr, Amazon)
▫ Add their own user-generated content to a page (e.g., Wikipedia, Amazon)
▫ Bookmark a page with their own crawler-friendly term (e.g., del.icio.us)
▫ Subscribe to your RSS feed and blog as “link bait”
▫ Provide a most-searched list of keywords others used to find a given page.

These sample tactics treat consumers as co-creators rather than just users, making the website more useful, and in the process, effectively “outsourcing” the optimization of the website’s long tail in a highly scalable way, back to the very people looking to find it.

CONCLUSION

Consider your website’s long tail profile – is a small percentage of your unique pages powering your natural search traffic and attracting only brand terms? For the E-Commerce websites included in this study, the total market potential for unbranded keyword traffic exceeds 7,000,000 searches per month. Put into context, that is roughly 100 unbranded searches for every unique page – some 38 times greater than the amount of monthly searches conducted for brand terms.

The E-Commerce marketer looking to capture the full potential of natural search must focus on unbranded search markets, by smartly leveraging inherent website scale and brand strength. Increasing the amount of page-yield described here calls for a more sophisticated management approach to natural search optimization that emphasizes testable strategies, that are capable of scaling across the website’s thousands of unique pages, and that can be measured against the new key performance indicators benchmarked here.

If your E-Commerce platform, analytics, or resources prevent this level of optimization management prior to the 2006 holiday season, talk with Netconcepts or your SEM agency about ways a proxy optimization platform like GravityStream™ can help your brand stop chasing your long tail, and begin to capture it.
