Welcome to the JaguarPC Community
JaguarPC

This is a discussion on Ton of errors from googlebot in the Website Management forum

  1. #1
    Voluntarily Retired gohighvoltage's Avatar
    Join Date
    Jan 2011
    Posts
    641

    Ton of errors from googlebot

    I have been getting a ton of errors in my error_log from a googlebot.

    Many are:

    File does not exist: /usr/local/apache/htdocs/501.shtml

    Others are trying to access my websites via https:// when there is no SSL,

    and looking for other files that don't exist either.


    Should I just create the missing file?
    /usr/local/apache/htdocs/501.shtml


    The IP that is doing all the errors is:
    66.249.71.133

    Anyone else have this kind of issue?

    PS: Odd that Googlebot is searching the root directory of the VPS.
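A request that claims to be Googlebot can be checked with a reverse DNS lookup followed by a forward lookup: the host name should end in googlebot.com or google.com, and that name should resolve back to the same IP. The sketch below is a minimal illustration of that check, not something posted in this thread. The function name `is_googlebot` and the injectable `reverse` and `forward` parameters are assumptions added so the logic can be exercised without network access.

```python
import socket

# Domain suffixes expected for genuine Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip.

    reverse and forward default to the real socket lookups; they are
    parameters only so the check can be tried offline (an assumption of
    this sketch).
    """
    try:
        host = reverse(ip)[0]            # (hostname, aliases, addresses)
    except OSError:                      # covers socket.herror / socket.gaierror
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]     # forward-confirm the host name
    except OSError:
        return False
    return ip in addresses
```

Calling `is_googlebot("66.249.71.133")` with the defaults performs live DNS lookups, so the result depends on the network at the time it is run.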

  2. #2
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    Fix whatever is causing the 501 error or create a 501.shtml file to handle the error or remove the exception redirection and those errors will go away.
    Good luck

  3. #3
    Voluntarily Retired gohighvoltage's Avatar
    Join Date
    Jan 2011
    Posts
    641
    Hey Ron,

    I created a 501.shtml file by copying 500.shtml and renaming the copy to 501.shtml. What would cause the 501 error? I am not familiar with it. From the way the error appeared in the log, it looked like Googlebot was requesting 501.shtml directly. Any ideas about what would cause a 501 error, so I can check?

  4. #4
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    A good hint or two should be in your Apache recent visitors log and/or Apache error log in cPanel.
    Good luck

  5. #5
    Voluntarily Retired gohighvoltage's Avatar
    Join Date
    Jan 2011
    Posts
    641
    ok, thanks. I will delete the 501.shtml file I created, so it allows the error again, and then I will see what the logs say. I don't remember anything else that would show where the error was coming from or what caused it.

    crazy errors like this:

    [Mon Oct 24 11:52:35 2011] [error] [client 58.218.199.147] File does not exist: /usr/local/apache/htdocs/me
    [Mon Oct 24 12:45:16 2011] [error] [client 66.249.71.133] Invalid method in request \x16\x03\x01
    [Mon Oct 24 12:45:16 2011] [error] [client 66.249.71.133] Invalid method in request \x80C\x01\x03\x01

  6. #6
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    I think that's an error from using https on a non-SSL port. Looking at the error log, you'll probably see the two errors alternating, as the one causes the other immediately. Look at the timestamps.
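The bytes in those "Invalid method in request" lines are consistent with this: `\x16\x03\x01` is how a TLS handshake record begins (content type 0x16, version 3.1), and `\x80C\x01\x03\x01` matches an SSLv2-style ClientHello (length header with the high bit set, then message type 0x01). A rough sketch of how such lines could be picked out of an error_log follows. The function names and the regular expression are assumptions written for the log format quoted in this thread, not part of Apache.

```python
import re

# Matches the "Invalid method" lines in the format quoted in this thread.
LINE_RE = re.compile(
    r"\[client (?P<client>[\d.]+)\] Invalid method in request (?P<raw>.*)$"
)
ESCAPE_RE = re.compile(r"(\\x[0-9a-fA-F]{2})")


def escaped_to_bytes(raw):
    """Turn Apache's escaped text (a backslash, 'x', two hex digits) back into bytes."""
    out = bytearray()
    for part in ESCAPE_RE.split(raw):
        if ESCAPE_RE.fullmatch(part):
            out.append(int(part[2:], 16))
        else:
            out.extend(part.encode("latin-1", errors="replace"))
    return bytes(out)


def looks_like_tls_hello(data):
    """Heuristic: does the start of the request look like an SSL/TLS handshake?"""
    # TLS / SSLv3 record: content type 0x16 (handshake), major version 0x03.
    if len(data) >= 2 and data[0] == 0x16 and data[1] == 0x03:
        return True
    # SSLv2-style hello: high bit set in the length header, message type 0x01.
    if len(data) >= 3 and data[0] & 0x80 and data[2] == 0x01:
        return True
    return False


def find_tls_on_plain_port(lines):
    """Return the client IP for every log line that looks like TLS sent to a plain HTTP port."""
    hits = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and looks_like_tls_hello(escaped_to_bytes(match.group("raw"))):
            hits.append(match.group("client"))
    return hits
```

Feeding the three log lines from the previous post into `find_tls_on_plain_port` flags the two requests from 66.249.71.133 and ignores the "File does not exist" line.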

    I wonder where google is getting the idea to do that. Have you tried searching google's index for your site for the file? Do you use google's webmaster tools?
    https://www.google.com/webmasters/tools/home?hl=en

    Once you figure out which file it's trying to index, and you have a Webmaster Tools account, you can delete the file from their index. Also, robots.txt can be used to restrict Google and all well-behaved bots from looking for specific files, directories, and/or directory structures.
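Before relying on a robots.txt rule, it can help to check what it actually blocks. The sketch below uses Python's standard `urllib.robotparser` for that. The rules, the paths, and the `www.example.com` host are placeholders for illustration, not taken from the site in this thread, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; replace the paths with whatever the crawler keeps requesting.
ROBOTS_TXT = """\
User-agent: *
Disallow: /me
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/me", "/private/page.html", "/index.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Disallow rules are prefix matches, so `Disallow: /me` also covers a path such as `/menu`; a narrower rule may be needed if that is not intended.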
    Last edited by Ron; 10-24-2011 at 01:19 PM.
    Good luck

  7. #7
    JPC Dream Team JPC-Sabrina's Avatar
    Join Date
    Aug 2011
    Posts
    346
    This is an excellent resource. You can improve the performance of your site when you have a good tool to give you more detailed information. I would advise all site owners to explore this option.
    Last edited by JPC-Sabrina; 10-24-2011 at 01:24 PM. Reason: Content Editing
    JPC-Sabrina / Public Relations
    sabrina@jaguarpc.com

  8. #8
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    What?
    Good luck

  9. #9
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    What I mean by "What?" is that Webmaster Tools is not primarily a site performance resource. It's a place to interact with Google to tune your site's presence in the Google index as well as they allow.

    There are a couple of secondary things in there that speak directly to site performance, such as the amount of time it takes Google to download a page from your site and how long visitors to your site require to download and render a page.

    Such as my site's recent downturn then upturn in performance:

    ServerPerformance.png
    ServerPerformance2.png
    ServerPerformance3.png
    Good luck

  10. #10
    Voluntarily Retired gohighvoltage's Avatar
    Join Date
    Jan 2011
    Posts
    641
    Hi Ron.

    I have sitemaps on my sites, and I use Google's Webmaster Tools to submit and track all of them. I also use Google Analytics. So I am not sure why it is searching for items that don't exist, and I am still trying to figure out what is causing the 501 errors. LOL. Crazy!

  11. #11
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    Having sitemaps does not prevent google from indexing pages that aren't in the sitemaps. Sitemaps provide a positive referral, and don't imply non-existence of URLs not included.

    The 501s are being caused by Google requesting https pages on your non-SSL port. For some reason it's using https instead of http. Why that is I don't know, and I can't enumerate every possibility I can think of, but a couple are that there are bad links on the web pointing to your site, or that some vb plugin is exposing poorly or accidentally constructed dynamic URLs.

    Use the google search engine and type

    Site:gohighvoltage.com me

    (or whatever specific site and file is being requested) and see what it shows. See what crazy things are in the index and remove them using the URL removal tool.
    Then use robots.txt to stop the crawler from looking for things that it shouldn't be looking for.
    I don't know how else to put this info.
    Last edited by Ron; 10-24-2011 at 01:55 PM.
    Good luck

  12. #12
    Voluntarily Retired gohighvoltage's Avatar
    Join Date
    Jan 2011
    Posts
    641
    Great info Ron! I will work on that. It has been crazy trying to pinpoint it.

  13. #13
    Voluntarily Retired gohighvoltage's Avatar
    Join Date
    Jan 2011
    Posts
    641
    Ron, you are a genius. That is exactly what was causing the 501 error codes!! It keeps looking for https:// URLs.

    Oddly enough, a long time ago I had an SSL certificate for my website, because it was the cheapest way to get a dedicated IP on shared hosting with another vendor. I never used the https: pages except for my contact page. Now that I have a VPS, I have a dedicated IP and don't need the SSL certificate.

    This has to be the issue of why they are still looking for it.


    Now I have to figure out how to forward these requests, or block them from using https://

  14. #14
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    On second thought, I don't like their solution, waaaay too dangerous.

    How do I write my robots.txt to exclude SSL pages only?

    It uses an internal redirect (one the visiting bot knows nothing about) to exclude things sitewide. Even if this works as intended today (and I don't know that it would), there's no guarantee that Google will continue to differentiate URLs based on protocol properly.
    Last edited by Ron; 10-24-2011 at 02:11 PM.
    Good luck

  15. #15
    Ron
    Ron is offline
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    I'm not sure how to handle this problem. I don't think the above solution would even work... I think the connection would fail prior to getting to .htaccess.

    I would seriously consider just allowing the invalid protocols to be rejected and the errors handled by the 501 page.
    Good luck
