Scrapers stealing your content for SEO

December 15th, 2005

by Stephan Spencer

Content is king on the web. A site without content is doomed to lousy search engine rankings. Search engine spammers can’t be bothered writing good content. Especially when they can easily steal it from other web sites. How do they do it? They use “scrapers” — spiders that trawl web pages and/or RSS feeds and siphon off the content. They then stick your content on their own site and slap their own ads and affiliate links onto it.

The spammers especially want you to use relative links across your web site. That way they can lift your entire website and they don’t even have to go to the trouble of rejigging your internal links to make them point to the scraped copy. Granted, relative links save a little bandwidth compared with absolute links (also known as “hard links”). But let’s not make the spammer’s job any easier.

So use absolute links throughout your site.

As a side benefit, if your site responds to multiple domains and you use absolute links, you’ll also be helping the search engines reduce the potential for duplicate content by definitively identifying the full, canonical URL.
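As a rough illustration of the advice above, here is a minimal Python sketch that rewrites relative href values into absolute ones before a page is published. The base URL and the regex-based approach are assumptions made for the example (a real site would more likely do this in its templates or CMS), and it only handles double-quoted href attributes.

```python
import re
from urllib.parse import urljoin

# Hypothetical canonical base URL of the page being processed.
BASE_URL = "https://www.example.com/articles/"

# Matches double-quoted href attributes only (a simplification).
HREF_RE = re.compile(r'href="([^"]*)"')


def absolutize_links(html, base_url=BASE_URL):
    """Rewrite every double-quoted href value as an absolute URL."""
    def _replace(match):
        return 'href="' + urljoin(base_url, match.group(1)) + '"'
    return HREF_RE.sub(_replace, html)


page = '<a href="../about.html">About</a> <a href="/contact">Contact</a>'
print(absolutize_links(page))
```

Links that are already absolute pass through urljoin unchanged, so running this over a page twice is harmless.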

Also, to check if your site has been scraped, use Copyscape.

Tagging, tag clouds, and auto-tagging

December 13th, 2005

by Stephan Spencer

Tag clouds, a Web 2.0 sort of user interface for navigating tagged content (a.k.a. folksonomies), give certain hyperlinked keywords a larger font size than others. These links lead to various category pages, tag pages, or search results pages.

One of my favorite implementations of a tag cloud on a blog is on O’Reilly Radar (on the right).

Another is the one on Eurekster’s blog (on the left).

The latter uses a new approach of “auto-tagging”. Eurekster calls this tag cloud of theirs a “BuzzCloud”. Webmasters can get one for free by signing up for their new Swicki service, which is a personalized Web search engine that is targeted and relevant to your site’s audience. You can seed your buzzcloud with search terms of your choosing, then Eurekster adds additional terms based on which searches are popular with your visitors. Visitors who click on the links are taken to a Eurekster search results page for that term. The results popular with you & your audience are promoted to the top of the search results and marked with an icon — in essence, tagging the results as well as the term.

Tagging that requires manual intervention, as on del.icio.us and Technorati, definitely has its uses, but I think it is primarily for more web-intensive users; the combination of manual control and auto-tagging offered by Eurekster with swickis can potentially lead to mass uptake amongst web content editors. I’ve put a Eurekster swicki & buzzcloud here on my blog (on the right-hand column, near the bottom). Try it out and let me know what you think. Get your own free swicki for your blog or website here.

Affiliate programs that pass link gain (PageRank)

December 12th, 2005

by Stephan Spencer

Most affiliate programs do not benefit search engine rankings because the link from the affiliate to the merchant doesn’t count as a “vote.” Thus, the merchant will not see a benefit in their Google PageRank and consequently in their search engine rankings. For example, any merchant using LinkShare or Commission Junction will not see such a benefit. That’s because they all use temporary redirects, also known as 302 redirects. That type of redirect, which is the one programmers and site administrators tend to use by default, doesn’t pass the link gain (e.g. Google PageRank) on to the target (final destination) URL. Only a very few affiliate management services allow the merchant to capitalize on the link gain of the affiliate. MyAffiliateProgram.com is one such affiliate solution. So I checked them out, and it turns out that it kinda works. Yes, kinda.

Here’s the problem. The affiliate solution needs to use permanent redirects (a.k.a. 301 redirects) rather than temporary (302) ones. MyAffiliateProgram.com uses what they call “direct links.” Here are a couple of examples of affiliate-tracked direct links that they provided me to look at: http://www.myaffiliateprogram.com/?kbid=1001 or http://www.kitchen-universe.com?kbid=1001. But when you visit either of these 2 URLs, there is no redirect at all. Consequently, this creates lots of duplicate pages in Google when Googlebot finds these affiliate-tracked direct links and follows them. Taking the first URL as an example, if you search Google for site:www.myaffiliateprogram.com inurl:kbid you’ll see 6,980 duplicate pages in Google. In other words, these are pages that were already in Google with URLs that don’t have kbid= appended at the end.

Think about it this way: Yes, with MyAffiliateProgram.com a merchant will get PageRank flowing to all the links contained on the countless duplicates of the merchant’s home page that are getting indexed. But because there is no 301 redirect present, MyAffiliateProgram has failed to collapse the link gain to one definitive version of the merchant’s home page. Then search engine spiders come along and index all these versions of the merchant’s home page which compete with the merchant’s true home page (the one without any kbid=). Furthermore, searchers who click on listings in the search results that contain kbid= in the URL will get counted as referrals from the affiliate and the merchant will pay for that. Ouch!
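To make the missing step concrete, here is a minimal Python sketch of the redirect decision described above: if a URL carries the kbid= tracking parameter, answer with a 301 to the same URL without it. The function name and return shape are hypothetical, and this is not how MyAffiliateProgram.com or any other service actually works; a real handler would also need to record the affiliate referral (for example in a cookie) before redirecting, which is left out here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAM = "kbid"  # the affiliate parameter named in the post


def canonical_redirect(url):
    """Return (301, canonical_url) if the URL carries the tracking
    parameter, otherwise None (no redirect needed)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in params if k != TRACKING_PARAM]
    if len(kept) == len(params):
        return None  # nothing to strip, already canonical
    canonical = urlunsplit(parts._replace(query=urlencode(kept)))
    return 301, canonical


print(canonical_redirect("http://www.kitchen-universe.com/?kbid=1001"))
print(canonical_redirect("http://www.kitchen-universe.com/"))
```

Because every kbid= variant collapses to a single URL with a permanent redirect, the link gain lands on one definitive page instead of being spread across duplicates.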

So, buyer beware when shopping for an affiliate management service that passes PageRank to your site. The devil’s in the details.

Any readers want to recommend affiliate solutions that do effectively pass link gain?

UPDATE: Just found this great blog post from Greg Boser that discusses this issue in more detail.

Optimizing your content for more Google AdSense revenue

November 29th, 2005

by Stephan Spencer

Optimizing your site for higher search engine rankings is an obvious activity for anyone with a website. Optimizing your site for higher conversion rates is another obvious one. But how about optimizing for higher advertising revenue, specifically a bigger check from Google for the AdSense ads that you display on your site?

Consider for example if you had a website on redecorating for Do-It-Yourselfers. You might have a page all about “housepainting.” But, as described in this article in USA Today about webmasters making money off of AdSense, “housepainting” isn’t a great money term for AdSense revenue — it’s only a 20-cent word. “Home improvement,” on the other hand, is worth $2. That’s a $1.80 difference.
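To see what that difference adds up to, here is a small Python sketch using the two per-click figures quoted above. The click count is a made-up number for illustration, and treating the quoted keyword values as what the publisher actually earns per click is a simplifying assumption (Google does not disclose the revenue split).

```python
# Per-click values quoted in the post, in dollars.
VALUE_PER_CLICK = {"housepainting": 0.20, "home improvement": 2.00}


def monthly_revenue(keyword, clicks):
    """Estimated revenue for a number of ad clicks on one keyword theme."""
    return round(VALUE_PER_CLICK[keyword] * clicks, 2)


clicks = 500  # hypothetical number of ad clicks per month
for keyword in VALUE_PER_CLICK:
    print(keyword, monthly_revenue(keyword, clicks))
```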

So in effect you can give yourself a nice pay increase just by changing the keyword themes of your pages that display AdSense ads by creating new content pages around those keywords. And the real opportunists out there are creating pages about mesothelioma, a rare form of cancer caused by asbestos that lawyers are bidding on. That keyword is worth an order of magnitude more than “home improvement.” But that would be sooo dirty! Thankfully I don’t know anyone THAT dirty!

Blogging Builds Brands

October 18th, 2005

by Stephan Spencer

Originally published in All About Branding.com

Blogging is one of the hottest trends on the net. A blog (short for “web log”) is a web-based diary where the author can ruminate on whatever strikes his or her fancy. The blogger may share photos, poetry, political views, gossip, industry trends, business advice, or the latest on their personal life. By definition, blogs are organized in reverse chronological order. Many are updated daily. They can have one or multiple authors, such as a community blog.


Coverage of SES San Jose: Favorite SEO Tools

August 11th, 2005

by Stephan Spencer

Here we are, the last session of Search Engine Strategies. It’s been a great but exhausting conference. The session I attended was on SEO Tools. Three of the five panelists provided their PowerPoints on their websites (it just so happens they were the three best presentations), which you should definitely check out because they show screenshots of these tools in action. Download the first two PowerPoints from www.webuildpages.com/ses and the third from www.epiar.com/ses.

Jim Boykin:
Wayback Machine
Find Age of Website Tool
Poodle Predictor (spider simulator)
Copyscape (website plagiarism search)
URLinfo
Backlink Anchor Text Analyzer
KwMap (a keyword map for the whole Internet)
Hubfinder (looks for co-occurring backlinks, which may be authoritative links that help satisfy topic-dependent link authority algorithms. To use Hubfinder, enter a subject and/or competing URLs to analyze linkage data of top-ranked competing sites via the Yahoo! API.)
Keyword Tracker

Todd Malicoat:
Domain/server level information: Whois Source, DNS Stuff, and Check Class C IP Address (this last one is to make sure the links that you plan on buying are on different class C blocks)
Competitive information tools: GoogSpy, SwitchProxy extension for Firefox
Backlinks & offpage information tools: Pages Indexed, Backlinks Domain, PageRank, Allinanchor, Keyword Density tool, Yahoo! Link Harvester
Keyword information: Google Sets, Keyword Density tools, Google Suggest, Snap.com Keyword Stats
Header & page level information: Server Header Checker
Spidering & indexability: Xenu’s Link Sleuth, Sandbox Detection Tool
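The class C check mentioned in Todd’s list can be sketched in a few lines of Python. This is not the tool he presented, just a minimal illustration that assumes “class C block” means the /24 network an IPv4 address belongs to; the addresses below are reserved documentation addresses, not real link sources.

```python
import ipaddress
from collections import defaultdict


def group_by_class_c(ips):
    """Group IPv4 addresses by their /24 network (the "class C block")."""
    blocks = defaultdict(list)
    for ip in ips:
        # strict=False lets us pass a host address and get its network.
        network = ipaddress.ip_network(f"{ip}/24", strict=False)
        blocks[str(network)].append(ip)
    return dict(blocks)


def shared_blocks(ips):
    """Return only the blocks that contain more than one address."""
    return {block: members
            for block, members in group_by_class_c(ips).items()
            if len(members) > 1}


candidates = ["192.0.2.10", "192.0.2.200", "198.51.100.7"]
print(shared_blocks(candidates))
```

Any block that shows up in the output holds two or more of the candidate link sources, which is what the check is meant to catch.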

Ken Jurina:
Firefox extensions: SEOpen, Web Developer, Search Status, PDF Download, Roboform toolbar, Search Keys, IE View (all downloadable from http://extensionroom.mozdev.org)
Web CEO
Click Tracks
LiveSTATS
Roboform
Marketleap Link popularity check, Search engine saturation, Keyword verification

Bill Hartzer:
OptiLink
OptiSpider
Keyword Combinations
Keyword Helper
URL Trends domain analyzer (it also supports notifying you via email or RSS when changes happen)
Sources of other tools: www.seocompany.ca/tool/seo-tools.html, www.digitalpoint.com/tools/, www.seotoolset.com, www.seochat.com/seo-tools

Paul Bruemmer:
Alexa
RankingManager
Linxviewer
Yahoo! Finance
Hoovers Pro Plus
Print Screen Plus

Well I wanted to blog many more sessions than I did, but it ended up being a lot harder than I thought it would be. Thankfully for you, dear readers, there were many other capable bloggers blogging the SES sessions. In particular check out the coverage on Search Engine Roundtable blog.

By the way, a big hello to all the bloggers I met for the first time at SES, including Scott Miller, Aaron Wall, and Barry Schwartz, to name a few.

Coverage of SES San Jose: Search Engine Q&A On Links

August 10th, 2005

by Stephan Spencer

I’m a bit behind on my conference session blogging. Waaay too many parties going on; doesn’t leave much time for blogging. The Google Dance last night. Yahoo! party at Great America the night before. And tonight I’ve got another party to go to. Yesterday I spoke on RSS. I’ll post a recap on that session later.

I just attended “Search Engine Q&A On Links”, which was great. Lots of useful advice from Google and Yahoo! about linking (nobody seemed to want to ask poor Ask Jeeves any questions). It was funny how obviously diametrically opposed the engines were to the immediately prior session on “Buying and Selling Links”. It’s hard to reconcile the two different sets of advice. Matt in the hallway before this session was adamant: “Don’t buy links!”

Anyways, without any further ado, here’s the session recap:

Kaushal Kurapati from Ask Jeeves:
Be cautious of: reciprocal links and purchasing links
Avoid: link farms, cloaking pages, invisible or hidden links that trick the crawler
Become an authority on a subject
Focus on your business and content. The rest will follow. [I say: "yeah, right..."]
Teoma uses subject specific popularity: garner respect in your industry, subject-specific text based links can be understood. (hubs and authorities model)

Tim Mayer from Yahoo!:
Here’s some important news!! Yahoo! has just launched a brand new service: Site Explorer from Yahoo! Search. Stop scraping the Yahoo site for backlink results and use Site Explorer instead. Access via an API is offered too. And you can export as a CSV file.
Yahoo has 19.2 billion web objects in its index. Over 20 billion objects, when you include the audio and video.
Plans to use community to improve search quality. Social search = within a trusted network, where someone within your network vouches for a site.
Create natural linking strategies. When things start to look unnatural is when you’ll start getting into trouble. We look at intent (linking to plasma TVs, diamonds, and Viagra all on the same page) and extent (i.e. what looks normal; having everything on the page as links, or 200 links on the page, is too much!)
Yahoo! offers a much more comprehensive sample of backlinks than Google, but not a complete set of backlinks. New system (Site Explorer) will be reasonably comprehensive, in his opinion the most comprehensive out there.
It’s unnatural to link to sitemap-1 sitemap-2 sitemap-3 sitemap-4 sitemap-5. If you are doing this, you’re headed in the wrong direction.

Matt Cutts from Google:
Good links are earned links, links that are based on editorial discretion.
Create services that are really useful, e.g. newsletters, an article a day, syndication through RSS (attribute my article and give me a link). Start a blog.
Matt launched his blog today: mattcutts.com
Think outside the box.
Only SEOs and librarians do backlink searches. Historically we decided to dedicate a subset of our servers to backlinks. Only a sampling of backlinks would be displayed but only for a threshold of PageRank 4 or higher pages. A suggestion was made to show backlinks for lower PageRank pages too. We liked that idea so we now show a random sampling of backlinks, including low PageRank scoring pages too. We show twice as many backlinks as shown before, but still it’s only a sampling of the backlinks.
In graph theory terms, a clique (every node in the graph linking to every other node) is very unnatural. So don’t link to every single node in your network of sites; it’ll get flagged.
For dynamic sites, you’re very safe if you have fewer than 2 parameters; keep the values of those parameters to fewer than 5 digits, and don’t name a parameter “id”. Googlebot sometimes tries variations of URLs by dropping parameters, but we only do that deep level analysis on big, quality sites.
Another good approach that alltheweb came up with: spider would always go 1 dynamic page deep from a static page.
Search engines only grab 100k or 200k or 500k so be careful loading up a huge page with a lot of links.
PageRank isn’t as important as SOME people make it out to be. BUT it’s NOT like “PageRank? Oh yeah let’s shuffle that one under the rug! That was sooo 4 years ago!”
“BO” = backlink obsession
We export PageRank only once every 3 months or so.
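The dynamic-URL guideline in Matt’s notes above lends itself to a quick self-check. Here is a minimal Python sketch that reads the notes literally: fewer than 2 parameters, values shorter than 5 characters, and no parameter named “id”. The function name and thresholds are my own reading of the session notes, not anything Google has published as a tool or rule set.

```python
from urllib.parse import urlsplit, parse_qsl


def dynamic_url_warnings(url, max_params=1, max_value_len=4):
    """List the ways a URL departs from the guideline in the notes.

    Defaults read the notes literally: "fewer than 2 parameters" means
    at most 1, and "fewer than 5 digits" means at most 4 characters.
    """
    warnings = []
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if len(params) > max_params:
        warnings.append(f"{len(params)} parameters (more than {max_params})")
    for name, value in params:
        if name.lower() == "id":
            warnings.append('parameter is named "id"')
        if len(value) > max_value_len:
            warnings.append(f'value of "{name}" is longer than {max_value_len} characters')
    return warnings


print(dynamic_url_warnings("http://www.example.com/item?sku=1234"))
print(dynamic_url_warnings("http://www.example.com/item?id=123456&cat=2"))
```

An empty list means the URL fits the guideline as interpreted here; the thresholds are keyword arguments so a looser reading can be tested as well.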

Technorati tag: Search Engine Strategies

Coverage of SES San Jose: Search Algorithms, The Patent Files

August 8th, 2005

by Stephan Spencer

I attended the “Search Algorithms: The Patent Files” session first thing this morning. The panelists were Rand Fishkin, CEO of SEOmoz.org, Ani Kortikar, Founder and CEO, Netramind, Dr. E. Garcia of Mi Islita.com, and Jon Glick, Senior Director of Product Search, Become.com. My favorite presentation was from Jon. He was not overly technical (Dr. Garcia lost me at the advanced mathematics talking about calculating dot products of vectors) yet he gave solid advice. Here’s what he had to say, in summary:

Take these patents with a grain of salt, because…
- patent applicants don’t need to use all the stuff they include in a patent application.
- patent applicants don’t have to disclose all of a system’s features in a patent application.
- and they recognize that SEOs and their competitors are poring over their patent apps.

With that said, there are some valuable learnings from the 2003 Google patent. Search engines may take into account: CTR on your page in SERPs, rapid changes in content, rapid growth of in-links, and length of time users spend on your site.

So which of these actually impact your rankings? Some are red herrings, such as:
- Clickthrough rate (CTR): it’s too easy to distort (e.g. through clickbotting, which is evil and likely to get you penalized). Probably CTR is used for demotion only. In other words, high CTR won’t help your organic rankings, but low CTR may lower your rankings.
- Time spent on a site: when users hit the back button almost immediately, it can signify an irrelevant page or 404 error. However, if this was used then this would in effect reward black hat tactics like mousetrapping and endless pop-ups — tactics that trap users within a site.
- Rate of change in content: Most recent crawl date, last time the content changed, registration date, and first crawl date mostly impact crawl frequency, not ranking. Duplicate detection technologies are used to find meaningful changes in site content. Putting today’s date or today’s weather on the page does not count as a meaningful change and doesn’t help rankings. When a site changes its IP address, it is often re-evaluated because it is possibly under new ownership.

According to Jon, what’s not a red herring is:
- Rate of change in links: Most search engines limit how quickly a site can gain connectivity (sandboxing, link aging). A sudden jump in in-links (e.g. from link farming and interlinking and triangle linking lots of domains) can draw scrutiny. There are exceptions for “spike” sites (editorial review, lots of accompanying news/blog posts, lots of web searches).

Coverage of SES San Jose: Earning from Search & Contextual Ads

August 8th, 2005

by Stephan Spencer

Hello from sunny San Jose. I’m at the Search Engine Strategies conference – THE place to be if you care about search. I’m going to be blogging the sessions, so stay tuned over the next 4 days.

Here’s my first installment: a recap on the session I attended before lunch today on “Earning from Search & Contextual Ads”. Panelists were: Jason Calacanis, Co-Founder, Weblogs, Inc., Will Johnson, Yahoo! Search Marketing, Scott Meyer, President & CEO, About, Inc., Gokul Rajaram, Group Product Manager of Google AdSense, Google Inc. and Jen Slegg, Owner, JenSense.com.

Jen from JenSense.com started the panel off:
Jen started off by comparing and contrasting AdSense w/ Yahoo’s new YPN (Yahoo Publisher Network). Similarities include…
- very large pool of advertisers
- real time stats
- neither will tell you the revenue split
- can’t show both YPN and AdSense ads on the same page

Differences include…
with AdSense:
- 4 ads in smaller font
- international publishers ok
- offers additional tools & services
- more competition for highest paying
- multiple ad units per page
- “smart pricing” (CTR taken into account in pricing)

with YPN:
- 3 ads in a much larger font
- beta for US publishers
- only traditional ad units
- fewer publishers means less competition
- same ads on multiple units
- no smart pricing
- in future will be able to transfer your earnings to your advertising account

Many alternatives to AdSense and YPN:
- Kanoodle brightads: avg $0.35 earnings per click (EPC). 30,000 advertisers in network.
- Adsonar: thousands of advertisers
- Clicksor: avg $0.20 EPC. 4,000 advertisers running 20,000 campaigns. Will pull ads from other ad networks if insufficient clicks.
- Chitika: avg EPC $0.50
- Mirago: avg EPC .21p (approx $0.31 USD). you must invoice them. 12,000 advertisers
- ContextWeb: over 40,000 advertisers
- bidclix: avg EPC 0.30. 11,000 advertisers
- Others include Miva Adrevenue xpress, Quigo, etc.
Rhetorical question from Jen: “When will MSN jump in?”

Optimizing tips:
- Placement: Bottom of page is bad. Good practice is to make link color the same as other links on the site. Another good tactic is to place the ads on the left column where the nav usually is.
- Proximity:
- Ad unit selection: Try a variety of sizes and test.
- Ad unit colors & borders: Don’t use the standard ad unit colors / layout. Mix things up to prevent banner blindness. Try both complementary and contrasting colors. Most sites find hidden borders yield the highest CTR, like 2 or 3 times higher.
- URL filters: Don’t do it as a way to get higher paying ads to appear. Only block your direct competitors or your own websites.

Testing:
- Use AdSense or YPN channels to track highest CTR & earnings pages. AdSense or YPN may perform better. Try both.
- Test on non-holiday weeks
- Try switching ad placement, ad unit sizes and colors
- Keep track of what works and what doesn’t
- Never assume that what works on one site will work on another.


How blogging has paid off

July 19th, 2005

by Stephan Spencer

I was recently interviewed by a journalist on business blogging and its benefits. He wanted to know specifically what it’s done for me to have a blog. Here’s what I told him:

  • I’ve gotten inquiries from prospects who found Netconcepts through my blog.
  • My blog helps me get speaking gigs and PR. In fact, I recently got one of my blog entries taken verbatim by a well-respected US magazine — DM News — and published as an article.
  • It builds credibility and establishes me as a thought leader in the eyes of prospects and clients. For example, one of our recent clients chose us over a competitor for online marketing services partly because of my blog.
  • It’s helped upsell existing clients on additional services, as many of them are regularly reading my blog. For example, some of our clients are going to start a blog and use us for blog design, blog consulting, etc.
  • I’ve gotten links from popular bloggers, like Robert Scoble of Microsoft. It’s much more difficult to get a mention from Scoble (or other prominent bloggers) if you’re not a blogger. Scoble’s blog, called Scobleizer, is one of the most well-linked blogs on the Internet. Some bloggers have even included me on their blogroll, like Toby Bloomberg of Diva Marketing Blog (Thanks, Toby!)
  • It’s helped me with recruiting panelists for Thought Leaders Summits that I organized and moderated for MarketingProfs. For example, the lineup of panelists for one of the recent summits included Internet marketing gurus Seth Godin, Doc Searls, Robert Scoble, Steve Rubel, and Debbie Weil. My blog played a role in establishing my credibility with them and getting them to respond to my “cold call” email message.
  • Blogs are also great for SEO (search engine optimization). Links are important to the search engines, and the blogosphere is richly interlinked with bloggers linking so much to each other. Blogs are also rich in content, which search engines also like. If I blog about RSS and SEO (which I have), for example, next thing I know I’m #1 in Google for [rss and seo].
  • I’ve also built some great business relationships with other respected bloggers. They have referred business to me, shared speaking opportunities with me, etc.

I had yet another experience with that last item, just today in fact. I’m speaking at the Frost & Sullivan Sales and Marketing East conference in Boston, and a fellow blogger from a competing SEO firm who was sitting at the table I was facilitating earlier today on blogging very kindly publicly commended my blog to the rest of the group for its content and thought leadership. (Thanks Stephen!) There’s a guy who understands the benefits of coopetition (rather than competition)!

The journalist also wanted to know how my blog’s traffic had grown over time. Here are the charts I shared with him showing the growth trends in pageviews and visitors:

Pageviews:

Visitors:

A pretty respectable trend, I’d say. If you’re curious what the actual numbers are, I will give you a hint and say that both charts measure in the tens of thousands per month. Hopefully the trend will continue.

One thing I really need to do to keep the numbers heading northward is to blog more frequently. I’m sure traffic growth will accelerate once I do. I just need to buckle down! I guess I’ll just sleep less… (sigh). You other bloggers out there know what I’m saying here, don’t you! More often than we’d like, it’s the wee hours when we’re blogging.

How might a blog pay off for you? For some general ideas, read this article of mine, on blogging, published in last month’s issue of Multichannel Merchant magazine.

