Stopping web crawling bots from causing errors in BlogCFC
So I have been using BlogCFC for 970 days, and I love it. But one problem I have had since the beginning is that when my site gets hammered by web crawlers, I get a ton of errors. They usually hit between 2:00 AM and 6:00 AM and crawl my blog looking for new content. I appreciate what they do, but sometimes they can be VERY aggressive and start to cause timeout errors.
The result is that I wake up to dozens, or hundreds, of error emails and, very rarely, a crashed ColdFusion application server. Since I am on an Awesome VPS, I rarely have problems with crashing, and even fewer since I upgraded the JVM from the CF8 default. But I would rather not have my server brought to its knees every morning by bots, especially since I know that my worshippers from across the pond are just arriving at work and desire nothing more than to see if I have anything new to say.
So, finally, after 3 years, I decided to look into this problem. I've noticed that, more often than not, the timeout errors occur when a web crawler tries to hit the "print" link on every post. So I said to myself, "Self, do web crawlers need to index my 'print' page?"



