Welcome to the JaguarPC Community

This is a discussion on Googlebots and SID's in the Shared & Semi-Dedicated forum

  1. #1
    Oldfrog
    A geezer, with 1 foot in.
    Join Date: Apr 2004
    Posts: 204

    Googlebots and SID's

    I am having a problem with googlebots and session IDs in a phpBB install. Here is a typical request from the Latest Visitors screen:
    ~/forum/index.php?sid=591a5fac55d52ae258cdcc7d0a316c91
    The problem is that the bots keep indexing the same page multiple times, with a different SID each time. I have confirmed with Google that their bots don't "like" SIDs, and I was advised to disable them.
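To see why a crawler treats one page as many, it can help to compare the URLs with and without the session ID. The sketch below uses a hypothetical helper, `strip_sid()` (not part of phpBB), that drops the `sid` query parameter; it assumes simple URLs with no `#fragment`, and the second session ID is made up for illustration.

```php
<?php
// Hypothetical helper (not part of phpBB): removes the "sid" query parameter
// so that URLs differing only in their session ID collapse to one address.
// Assumes a simple URL with no "#fragment".
function strip_sid($url)
{
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string, nothing to strip
    }
    $base = substr($url, 0, $pos);
    parse_str(substr($url, $pos + 1), $params);
    unset($params['sid']);
    // Rebuild the remaining parameters, or drop the '?' if none are left
    return $params ? $base . '?' . http_build_query($params) : $base;
}

// Two crawler visits to the same page, each carrying a different session ID
$first  = '/forum/index.php?sid=591a5fac55d52ae258cdcc7d0a316c91';
$second = '/forum/index.php?sid=0123456789abcdef0123456789abcdef';

var_dump($first === $second);                       // bool(false)
var_dump(strip_sid($first) === strip_sid($second)); // bool(true)
```

As far as the crawler can tell, every fresh SID produces a new, distinct URL, even though the underlying page is the same.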

    I found a mod to the sessions.php file on the phpBB forums that is supposed to correct the problem but it hasn't done so entirely. Their site is down right now so I can't provide a link or go back for further info.

    Here is the user_agent for the offending bots:
    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    Here is the code for the mod that I installed in sessions.php (my addition is the strstr() check for the full user_agent string shown above):
    function append_sid($url, $non_html_amp = false)
    {
        global $SID, $HTTP_SERVER_VARS;

        // Append the session ID unless the URL already has one or the
        // visitor's user agent matches a known search-engine crawler
        if ( !empty($SID) && !preg_match('#sid=#', $url) &&
            !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Googlebot') &&
            !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)') &&
            !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'slurp@inktomi.com;') )
        {
            // '?' starts the query string; otherwise join with '&' ('&amp;' in HTML output)
            $url .= ( ( strpos($url, '?') !== false ) ? ( ( $non_html_amp ) ? '&' : '&amp;' ) : '?' ) . $SID;
        }

        return $url;
    }
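For testing outside phpBB, the same decision can be restated as a standalone function. This is a minimal sketch under assumptions: `is_search_bot()` and `append_sid_sketch()` are hypothetical names, `$sid` stands in for phpBB's global `$SID` (assumed to hold a string like `sid=<id>`), and `$user_agent` stands in for `$HTTP_SERVER_VARS['HTTP_USER_AGENT']`.

```php
<?php
// Hypothetical, standalone restatement of the modded append_sid() check.
function is_search_bot($user_agent)
{
    // Same tokens as the mod; strstr() is a plain, case-sensitive substring search
    foreach (array('Googlebot', 'slurp@inktomi.com;') as $token) {
        if (strstr($user_agent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// $sid is assumed to look like "sid=<id>", as phpBB's global $SID does
function append_sid_sketch($url, $sid, $user_agent, $non_html_amp = false)
{
    if ($sid !== '' && !preg_match('#sid=#', $url) && !is_search_bot($user_agent)) {
        // '?' starts the query string; otherwise join with '&' ('&amp;' in HTML)
        $sep = (strpos($url, '?') !== false) ? ($non_html_amp ? '&' : '&amp;') : '?';
        $url .= $sep . $sid;
    }
    return $url;
}

$googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
echo append_sid_sketch('index.php', 'sid=abc', $googlebot), "\n";   // index.php
echo append_sid_sketch('index.php', 'sid=abc', 'Mozilla/5.0'), "\n"; // index.php?sid=abc
```

Keeping the crawler tokens in one array makes it easy to add another bot without growing the `if` condition. Note that a check like this only affects links built through append_sid().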
    Does anyone else have this problem or any ideas on correcting it?
    Gravity, more than a good idea, it's the law!

  2. #2
    || $name ne 'R.Stiltskin'
    Join Date: Jun 2003
    Location: Tejas
    Posts: 2,438
    Have you reviewed the PHP subroutine strstr() to check how it handles its input? I don't use PHP, but if it's a typical sub, then strstr() may not like the format of your non-interpolated string, you know, regular expressions and all. The space, parenthesis, semicolon, plus, and slash characters are all special and may need some extra protection to be treated as regular characters. Without knowing the parsing rules or the expected input format of the sub, I'd guess the input may be getting ignored or botched. I don't know if single quotes are adequate for the task.

  3. #3
    jason
    Community Leader
    Join Date: Sep 2001
    Location: Rochester, NY
    Posts: 6,003
    strstr() is simply a string-level function. It looks for the occurrence of one string (the needle, in PHP lingo) inside another (the haystack), and if found it returns everything from the first occurrence of the needle to the end of the haystack string; if not found, it returns false. It doesn't work with regular expressions, so that's not the problem.
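A quick way to check this behavior is to run strstr() against the Googlebot user agent from the first post. The sketch below only uses the standard PHP functions strstr() and stristr(); the expected output is shown in the comments.

```php
<?php
$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Found: returns everything from the first occurrence of the needle to the end
echo strstr($ua, 'Googlebot'), "\n";
// Googlebot/2.1; +http://www.google.com/bot.html)

// Parentheses, semicolons, '+' and '/' are ordinary characters, not regex syntax
echo strstr($ua, '(compatible;'), "\n";
// (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

// Not found: returns false; the search is case-sensitive
var_dump(strstr($ua, 'googlebot'));
// bool(false)

// stristr() is the case-insensitive variant
echo stristr($ua, 'googlebot'), "\n";
// Googlebot/2.1; +http://www.google.com/bot.html)

// The full user agent as needle matches at offset 0 and returns the whole
// haystack, so it only matches agents that 'Googlebot' already matches
var_dump(strstr($ua, $ua) === $ua);
// bool(true)
```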

    I've had my head inside of PHP code all day and it is all starting to look the same now. When my eyes have had a rest for a while, I'll take another look and see what I can come up with.

    --Jason
    Jason Pitoniak
    Interbrite Communications
    www.interbrite.com www.kodiakskorner.com

  4. #4
    Oldfrog
    A geezer, with 1 foot in.
    Join Date: Apr 2004
    Posts: 204
    Thanks to both of you. If that is the case then the original code should be sufficient without my addition. Take your time, Jason, I appreciate it.
