Robots.txt is case sensitive!

What are robots?

Robots, crawlers, spiders or agents are programs used to traverse the wobbly world wide web automatically, taking note of which content is where. Search engines use these programs to build their indices, while spammers use them to scrape content for their own sites, or even to crawl the web for email addresses to spam you with even more. Understanding how search robots work is an intrinsic part of SEO!

What is robots.txt?

Robots.txt is a file which, when placed in the root of a publicly available web server, tells search engine robots and other agents which content they should and shouldn’t access. It is a request rather than a lock: reputable crawlers honour it, but nothing forces a badly behaved bot to. It can be used to keep crawlers out of members-only directories, for example, or away from pages which nobody should really be finding through search engines, or, as I was trying to do earlier today, to stop search engines from crawling (and therefore, usually, indexing) pages which are part of your affiliate system or associated with an affiliate ID.
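To make that concrete, a well-behaved crawler typically fetches /robots.txt first and checks each URL against it before requesting the page. Python’s standard library ships a parser for this, urllib.robotparser. The sketch below assumes a hypothetical /members/ directory on example.com. As far as I know, this parser does simple prefix matching and doesn’t implement Google’s * wildcard extension, so treat it as a rough check rather than a replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps every crawler out of a members-only area.
robots_txt = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse from a string instead of fetching a URL

print(parser.can_fetch("*", "https://example.com/members/profile"))  # False
print(parser.can_fetch("*", "https://example.com/about"))            # True
```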

What is an affiliate system?

Many e-businesses run affiliate systems, which can be described as a semi-automated process by which the e-business takes referrals from partner websites and remunerates them for any transactions that result from those referrals. Having worked in the online gaming industry for over 10 years now, I’m quite familiar with many of them, as they are an integral part of just about every online gambling business model.

Most affiliate systems use standard HTML links containing an affiliate ID parameter to track referrals from their partner, or affiliated, sites. Anyone who clicks a link containing the affiliate parameter will be associated with that affiliate’s account, and if they go on to purchase something from the site, the affiliate takes their share of that revenue.
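As a rough illustration, here is how a site might read the affiliate ID out of an incoming link before crediting the referral. The affid parameter name matches the one that turns up later in this post, but the function name and the example.com URLs are hypothetical, and a real affiliate platform will do a good deal more than this.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

def affiliate_id(url: str, param: str = "affid") -> Optional[str]:
    """Return the first value of the (hypothetical) affiliate parameter, or None."""
    params = parse_qs(urlsplit(url).query)  # e.g. {"affid": ["12345"]}
    values = params.get(param)
    return values[0] if values else None

print(affiliate_id("https://www.example.com/?affid=12345"))  # 12345
print(affiliate_id("https://www.example.com/"))              # None
```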

Search engine optimisation and brand protection

This is leading somewhere, I promise!

Many companies, when taking on an SEO consultant, agency or in-house employee, go straight for the proverbial jugular. They want to target the big, juicy keywords which will drive mountains of good, converting, valuable traffic. Because of this, they often overlook the basics, such as making sure they dominate the search results for their own brand names. Imagine, if you will, the panic in the office this morning when, no more than a few hours in, I spotted a discrepancy in our brand-term search results: an affiliate-tracked URL ranking in second place, earning its affiliate a nice bounty per referral as well as a share of any future revenue from every client who came through that link!

SEO, brand protection and affiliate URLs

Whoever was in here before me had not taken the time to ensure that search engines, and especially Google (ye olde search dominator), were not allowed to index URLs tagged with affiliate IDs. Because this affiliate was sending such a high volume of traffic through his affiliate ID, that URL got indexed for our brand name, and it is still ranking in second place. I immediately submitted a removal request and asked for a change to the robots.txt to ensure that affiliate parameters were blocked from this point forward… the reply: “What robots.txt?”

Robots.txt to block affiliate IDs

Given that the content/web developer people here hadn’t implemented one, and the urgency of getting this resolved, I quickly typed up a robots.txt file for them to upload… and ran it through the nicely-provided-by-Google testing utility:

Allowed by line 5: Disallow:

But what about line 6, you stupid test tool? The one that says:

Disallow: /?affid=*

No amount of fiddling would get it to work! I tried and tried and tried. And then I tested a second URL, one I had typed in myself rather than copied and pasted:

Blocked by line 6: Disallow: /?affid=*

And that is when it struck me… the URL I had copied and pasted used affId, and affid and affId are two completely and utterly different things according to robots.txt… Why? Because ROBOTS.TXT IS BLOODY CASE SENSITIVE!
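To see why the tester behaved this way, here is a minimal sketch of Google-style rule matching in Python. It is my own simplified approximation, not Google’s code: it assumes a single User-agent: * group, treats * as “any run of characters” and a trailing $ as an end anchor, and lets the longest matching rule win (with Allow winning a tie). The function names and the two-rule file (a reconstruction of just lines 5 and 6 quoted above) are hypothetical. Note that field names like Disallow are read case-insensitively, while the path patterns are compared exactly.

```python
import re

def parse_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Collect (directive, pattern) pairs, assuming one User-agent group."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()  # field names are case-insensitive...
        if field in ("allow", "disallow"):
            rules.append((field, value.strip()))  # ...but patterns are kept exactly
    return rules

def to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a case-sensitive regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" blocks nothing
        if to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len, allowed = len(pattern), directive == "allow"
    return allowed

robots_txt = """\
User-agent: *
Disallow:
Disallow: /?affid=*
"""
rules = parse_rules(robots_txt)
print(is_allowed(rules, "/?affid=12345"))  # False: the case matches the rule
print(is_allowed(rules, "/?affId=12345"))  # True: 'I' is not 'i'
```

Adding a second rule, Disallow: /?affId=*, makes both variants come back as blocked, which is the pragmatic fix when the same parameter turns up in more than one casing.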