Welcome to the JaguarPC Community
JaguarPC

This is a discussion on Limiting Bad Bots/Scrapers in the VPS & Dedicated forum

  1. #1
    Nearly 100% Pure Carbon thecoalman's Avatar
    Join Date
    Nov 2007
    Location
    Northeast Pennsylvania
    Posts
    529

    Limiting Bad Bots/Scrapers

    I currently don't have any bot traps in place, but I'm probably going to implement one shortly. What I was wondering is whether there is a simple way, one that won't consume a lot of resources, to set limits on a single IP. For example, I had a recent incident where a scraper pulled about 1 GB of pages over a very short time. What I would like to do is limit access for abusive IPs that are making excessive requests.

    I have APF/BFD installed if that helps.
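
    To make the per-IP limit concrete, here is a minimal sketch of a sliding-window counter. It is an application-level illustration only, not an APF/BFD configuration; the class name, the thresholds, and the injectable clock are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window request counter keyed by client IP (illustrative only)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests      # assumed threshold, tune per site
        self.window_seconds = window_seconds  # assumed window length in seconds
        self.clock = clock                    # injectable so it can be tested
        self.hits = defaultdict(deque)        # ip -> timestamps of allowed requests

    def allow(self, ip):
        """Record one request from ip; return False once it is over the limit."""
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

    A caller would check something like limiter.allow(client_ip) before serving a page and answer with an error or a firewall block when it returns False. Memory grows with the number of distinct IPs seen, so a real deployment would also need to expire idle entries.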

  2. #2
    Ron
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    I don't have to worry about that, I guess. When I got scraped big time by 1 IP addy, support just disabled my entire account.
    Good luck

  3. #3
    Nearly 100% Pure Carbon thecoalman's Avatar
    Join Date
    Nov 2007
    Location
    Northeast Pennsylvania
    Posts
    529
    Ron, although having the site disabled is an option, I was hoping for something that might leave it online. :P

  4. #4
    Ron
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    Imagine.
    Good luck

  5. #5
    Old Hillbilly Connie's Avatar
    Join Date
    Sep 2001
    Location
    Hills of Missouri
    Posts
    2,648
    I use bot trap on 3 sites. Catches a ton of bots. I believe you can download it at http://danielwebb.us/software/bot-trap/.
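
    For anyone unfamiliar with the idea, a bot trap can be sketched roughly as below. This shows the general technique only and is not the code from the package linked above; the trap path, the in-memory ban set, and the function name are hypothetical.

```python
# Hypothetical trap URL. It would be listed under "Disallow" in robots.txt
# and linked invisibly from pages, so well-behaved crawlers never request it.
TRAP_PATH = "/bot-trap/"

banned_ips = set()  # in-memory for the sketch; a real site would persist this


def handle_request(ip, path):
    """Return an HTTP status code for the request (illustrative only)."""
    if ip in banned_ips:
        return 403
    if path.startswith(TRAP_PATH):
        # Only a client that ignored robots.txt should ever get here.
        banned_ips.add(ip)
        return 403
    return 200
```

    Once an IP touches the trap, every later request from it is refused, whatever page it asks for.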

    Forum Moderators - Jag Staff

    Spam Whackers Blog - Dedicated to fighting Spam and providing General SEO Tips
    Organize your Kitchen or purchase Kitchen Accessories at Condells
    Ihelpyou Forum - Dedicated to "Best Practices" SEO

  6. #6
    Nearly 100% Pure Carbon thecoalman's Avatar
    Join Date
    Nov 2007
    Location
    Northeast Pennsylvania
    Posts
    529
    Thanks Connie, I think I looked at that a few years ago when I was researching this. The trouble there is you're relying on the bot not adhering to robots.txt. There's even a note that some bots now do adhere to it, as I'm sure many webmasters have implemented the same thing.

    Having said that, I came across another suggestion to solve that problem: serve up a dynamic robots.txt file and whitelist the SEs to serve them the real robots.txt. Everyone else gets all pages denied.

    The bot won't know what specific pages might be the trap.

    A possible issue with that is if they are using a forged user agent, so I'd have to match the SEs' IPs. Arrrrrrrrrrgh!

    I want to make sure I'm only catching the bad guys, which is why I was hoping for a simple server implementation that could simply block IPs that generate excessive traffic.
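
    The dynamic robots.txt idea described above could look something like this rough sketch. The crawler names, the robots.txt contents, and the function name are placeholders, and it matches on user agent only, so it has exactly the forged user agent weakness already mentioned.

```python
# Hypothetical whitelist of crawler user-agent substrings (lowercase).
ALLOWED_CRAWLERS = ("googlebot", "bingbot")

# Hypothetical file contents; the real robots.txt would be site-specific.
REAL_ROBOTS = "User-agent: *\nDisallow: /bot-trap/\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent):
    """Pick the robots.txt body to serve for this user agent (illustrative only)."""
    ua = (user_agent or "").lower()
    if any(name in ua for name in ALLOWED_CRAWLERS):
        return REAL_ROBOTS  # whitelisted search engines see the real rules
    return DENY_ALL         # everyone else is told every page is off limits
```

    Because unknown clients are told that everything is disallowed, any page they then fetch is a violation, so they cannot tell which specific pages are the trap.
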
    Last edited by thecoalman; 03-28-2009 at 04:55 AM.

  7. #7
    the Windlord Gwaihir's Avatar
    Join Date
    Jun 2002
    Posts
    2,562
    Quote Originally Posted by thecoalman View Post
    a simple server implementation that could simply block IP's that generate excessive traffic.
    Then forget about the "bots" context and search the web again. That sort of firewalling surely exists, and I think I've even seen a thread on this forum where people exchanged ideas on how to set up their VPS's firewall like that.

    Or, if you're on a shared server, just drop a note to support right away to ask what they can do for you, as they'll have to set it up for you anyway.
    Regards,

    Wim Heemskerk
    ---
    Visit MeCCG.net - Cardgaming in J.R.R. Tolkien's Middle-earth
    And Gwaihir.net - The Middle-earth CCG store

  8. #8
    Old Hillbilly Connie's Avatar
    Join Date
    Sep 2001
    Location
    Hills of Missouri
    Posts
    2,648
    I've been using a dynamic robots.txt for a couple of years. In fact I only allow 4 robots to spider my sites.

    I also use a white list in .htaccess that only allows certain browsers, user agents, and IPs.

    My white list would be more effective if I could get a reverse and forward DNS check set up. That would prevent spoofing of the user agent.
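
    As a rough illustration of that reverse and forward DNS check: look up the hostname for the visiting IP, confirm it belongs to a trusted crawler domain, then resolve that hostname back and confirm it returns the same IP. The suffix list and function name below are assumptions for the example; each search engine documents its own crawler hostnames. Only the standard library socket calls are used, and the sketch handles IPv4 only.

```python
import socket

# Assumed suffix list for the example; check each search engine's own docs.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check (illustrative only)."""
    try:
        hostname = reverse(ip)[0]  # PTR lookup: IP -> hostname
        if not hostname.lower().rstrip(".").endswith(TRUSTED_SUFFIXES):
            return False
        addresses = forward(hostname)[2]  # A lookup: hostname -> IPv4 list
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
    return ip in addresses
```

    A client whose user agent claims to be a search engine but fails this check can be treated like any other unknown visitor. DNS lookups are slow, so results would normally be cached per IP.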

    Additionally I block a long list of known user agents that are bad.
    I know there are better ways to do this. For me the few scripts I found that looked better were above my head in how to set them up.

    You have a choice. Right now you have no protection. This will start giving you some while you're looking.

    Here's a list of IPs, with user agent and country, that I have blocked. The list is organized by class A block. I have about 100 from the last few months that I have not added to it yet.

    If you look at the list you'll see that very few bad bots identify themselves as a bot. Most will identify themselves as a browser.

    In protecting your site from rogue bots there is no one solution. You need multiple layers.


  9. #9
    Old Hillbilly Connie's Avatar
    Join Date
    Sep 2001
    Location
    Hills of Missouri
    Posts
    2,648
    Forgot to give you the link. http://www.spam-whackers.com/bad.bots.htm

