Welcome to the JaguarPC Community

This is a discussion on Spammed by Googlebot in the Shared & Semi-Dedicated forum

  1. #1
    JPC Member
    Join Date: May 2006
    Location: Belgium
    Posts: 20

    Spammed by Googlebot

    For the past few days I've been hammered (again!) by Googlebot, to the tune of about 300MB of data transfer per day.
    I use "Crawl-delay: 15" in robots.txt, but this seems to have no effect.
    Should I increase it to 60, or do I have options other than banning Googlebot (which I'd rather not do)?
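    Before changing the value, it may be worth confirming that the directive is at least being read as intended. The sketch below uses Python's standard urllib.robotparser on a hypothetical robots.txt (the thread only quotes the Crawl-delay line, so the User-agent line and the helper name crawl_delay_for are assumptions).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; only the Crawl-delay line comes from the post.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 15
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # Delay that a parser following the file would apply to Googlebot.
    print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))
    # A Crawl-delay line outside any User-agent group is ignored by this parser.
    print(crawl_delay_for("Crawl-delay: 15\n", "Googlebot"))
```

    This only shows what a parser reads from the file; it says nothing about whether a particular crawler chooses to honor the value.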


  2. #2
    Ron
    Loyal Client
    Join Date: Aug 2002
    Posts: 7,307
    Ensure that your URLs lead to unique data; in other words, make sure that each page can be reached through only one URL. For example, don't serve the site under both the "www" and non-"www" hostnames. Not only will that help reduce unnecessary crawling by Google, it is also good for avoiding duplicate-content penalties.
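    As a rough illustration of the "one URL per page" idea, here is a minimal sketch that maps the "www" and non-"www" variants of a URL to a single form. The function name canonical_url, the prefer_www flag, and the example.com host are all hypothetical; how a real site enforces this (typically with a permanent redirect) depends on its server setup.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, prefer_www: bool = True) -> str:
    """Map the www and non-www variants of a URL to one canonical form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Strip a leading "www." so both variants start from the same bare host.
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if prefer_www else bare
    if parts.port:
        host = f"{host}:{parts.port}"
    # An empty path and "/" name the same page; drop the fragment entirely.
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


if __name__ == "__main__":
    for candidate in ("http://example.com/page", "http://WWW.Example.com"):
        print(candidate, "->", canonical_url(candidate))
```

    A site could compare each requested URL against its canonical form and redirect when the two differ, so that crawlers only ever see one address per page.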

    300MB a day isn't a huge amount if you've got a sizeable site; it's about 9GB a month. Usually, I find that Googlebot does a deep index fairly infrequently, so it shouldn't be happening every day anyway.

    I can't recall if Googlebot obeys a crawl-delay. Even if it did, a 15-second delay would still allow it to access 5,760 pages a day (86,400 seconds / 15). If each page is about 53K, there's your 300MB. If you slow it down to 60 seconds (and it is obeying the directive), you'd reduce the number of possible pages by 75%, to 1,440 a day, or perhaps 75MB.
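    The estimate above is easy to reproduce. The following is a small sketch of the same arithmetic, assuming one request per crawl-delay interval around the clock and the 53K average page size used in the post; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def max_pages_per_day(crawl_delay_s: int) -> int:
    """Upper bound on requests per day if the bot waits crawl_delay_s between hits."""
    return SECONDS_PER_DAY // crawl_delay_s


def max_transfer_mb(crawl_delay_s: int, avg_page_kb: float = 53) -> float:
    """Upper bound on daily transfer in MB for a given delay and average page size."""
    return max_pages_per_day(crawl_delay_s) * avg_page_kb / 1024


if __name__ == "__main__":
    for delay in (15, 60):
        pages = max_pages_per_day(delay)
        print(f"{delay}s delay: {pages} pages/day, about {max_transfer_mb(delay):.0f} MB/day")
    # 15s delay: 5760 pages/day, about 298 MB/day
    # 60s delay: 1440 pages/day, about 75 MB/day
```

    These are ceilings rather than predictions: they only apply if the crawler actually respects the delay and requests pages continuously.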

    Personally, I find Google to be fairly well behaved compared to some other bots out there.

    Good luck!

  3. #3
    JPC Member
    Join Date: May 2006
    Location: Belgium
    Posts: 20
    Thanks a lot!
