If guests are disabled, then what about Googlebot?

Started by alex30, September 06, 2008, 11:42:26 PM

alex30

The forum I installed about 3 weeks ago still isn't indexed by Google. I'm starting to think it's because I blocked guests from viewing all boards.

I need your knowledge on this, guys. If that's the problem, then I'll need to open all boards to guests so the forum can be crawled by Google's bots.

Yahoo has indexed it and MSN has indexed it, but Google hasn't yet. Thanks for your replies.

ccbtimewiz

In SMF 1.1.x, all bots are technically treated as guests. That means if you disable access for guests, you disable it for bots as well.

AJ32

Also, I believe it would be against Google's T.O.S. to have bots viewing what guests can't. ;)

alex30

Then how have Yahoo and MSN indexed my forum? I can see their bots' presence.

AJ32

Are they actually indexing your topics, or just the main page?

ccbtimewiz

Perhaps those sites indexed your site previously or are working from a cache. Or they are simply indexing your topic titles via the index.

Quote from: AJ32 on September 06, 2008, 11:44:32 PM
Also, I believe it would be against Google's T.O.S. to have bots viewing what guests can't. ;)

Not too sure about that, actually.

AJ32

Quote
Not too sure about that, actually.

Not sure either, I just thought I read it somewhere. :-\

alex30

So guys, what's your advice?

I used a mod to protect topics, and I'm not sure what to do because I need guests disabled.

alex30

My website got indexed but the forum didn't, and that's a problem.

AJ32

I'm not sure what you should do, but why do you want Google to index it if guests can't view it? If Google can index it, it will 'cache' the page, and (even if guest viewing is off), guests will be able to see Google's cached page anyways.

karlbenson

AJ32 is correct: it is against the terms of service of Google and most other search engines to show spiders content that normal guests cannot view.

If you get caught employing such techniques, you will be either sandboxed or permanently banned from the search engine indexes.

alex30

Well, it's all about registration: most users don't register, they just view the forums.

Well, Google has to have my website like anybody else's, and I need to find some solution for these guys.

Quote
If Google can index it, it will 'cache' the page, and (even if guest viewing is off), guests will be able to see Google's cached page anyways.

So, you're saying that Google will index the forum anyway, even if guests are not allowed to view it? Should I open the forum for a while to get indexed and then protect it again?

karlbenson

If your site is open for a time, Google will index it.
When you make the forum private again, it will remove those pages again.

There are a couple of solutions.
1. Limit ALL guests' daily page views, e.g. 10 page views per day.
(This will affect spiders as well [they must have the exact same limit]. But at least search engines can index some pages.)
http://custom.simplemachines.org/mods/index.php?mod=1290

2. Look But No Read
Guests can read the board index but they can't read the topics themselves.
(This will affect spiders as well. Search engines will be able to index your board index and message index, but not the actual topics.)
http://custom.simplemachines.org/mods/index.php?mod=1332
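
To make option 1 concrete, here is a minimal sketch of a daily page-view limit that treats every guest the same way, spiders included. It is not the code of either linked mod: the function name guest_view_allowed, the array layout under 'limited_views', and the date handling are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, not taken from either linked mod.
// Counts page views per day in a session-like array and reports
// whether the current guest request is still under the limit.
function guest_view_allowed(array &$session, $dailyLimit, $today)
{
	// A limit of 0 (or less) is treated as "no limit configured".
	if ($dailyLimit <= 0)
		return true;

	// Start a fresh counter on the first visit or when the day changes.
	if (!isset($session['limited_views']) || $session['limited_views']['day'] !== $today)
		$session['limited_views'] = array('day' => $today, 'count' => 0);

	// Refuse once the limit is reached; spiders get no exemption here.
	if ($session['limited_views']['count'] >= $dailyLimit)
		return false;

	$session['limited_views']['count']++;
	return true;
}

// Usage: in a real request you would pass $_SESSION and date('Y-m-d').
$session = array();
var_dump(guest_view_allowed($session, 2, '2008-09-08')); // first view of the day
var_dump(guest_view_allowed($session, 2, '2008-09-08')); // second view
var_dump(guest_view_allowed($session, 2, '2008-09-08')); // over the limit
```

One thing to keep in mind with a session-based counter like this: a client that does not keep cookies starts a new session on every request, so the counter alone will not hold it to the limit.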

AJ32

Quote from: alex30 on September 08, 2008, 07:52:36 AM
Quote
If Google can index it, it will 'cache' the page, and (even if guest viewing is off), guests will be able to see Google's cached page anyways.

So, you're saying that Google will index the forum anyway, even if guests are not allowed to view it?

No, IF Google can index it whilst guests can't - but as karlbenson already confirmed, that's against their TOS.

tienf

But it would be better if we had an option for this, right? Does anyone have a code modification to let bots past the wall?

Arantor

Very bad idea. If Google can index content but guests cannot see it, that's usually considered a form of cloaking.

In any case, people that realise that's the case can reasonably easily change their browser's User Agent to appear to be Google.
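
Since the User-Agent header is whatever the client chooses to send, a check based on it alone can be fooled. As a rough sketch, a claimed Googlebot can instead be checked by a reverse DNS lookup followed by a forward lookup. The function name looks_like_real_googlebot and the injectable resolver parameters are hypothetical, and the host name suffixes (googlebot.com, google.com) are an assumption you should confirm against Google's own documentation. This only addresses spoofing; it does not change the cloaking problem.

```php
<?php
// Sketch only: checks a claimed Googlebot by reverse DNS, then forward DNS.
// $reverse and $forward are injectable so the logic can be tried offline;
// by default they are PHP's gethostbyaddr() and gethostbynamel().
function looks_like_real_googlebot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
	$host = call_user_func($reverse, $ip);

	// gethostbyaddr() returns false on malformed input
	// and the unchanged IP address when the lookup fails.
	if ($host === false || $host === $ip)
		return false;

	// Assumption: genuine crawler hosts end in googlebot.com or google.com.
	if (!preg_match('/\.(googlebot|google)\.com$/i', $host))
		return false;

	// The forward lookup of that host must lead back to the same address.
	$addresses = call_user_func($forward, $host);
	return is_array($addresses) && in_array($ip, $addresses, true);
}
```

Called as looks_like_real_googlebot($_SERVER['REMOTE_ADDR']), it performs two DNS lookups per request, so in practice the result would need caching.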
Holder of controversial views, all of which my own.


tienf

Yes ;), if they are that smart, a simple wall like that wouldn't stop them anyway ;)
Anyway, I have just written a few lines of code to allow bots and spiders, for anyone who wants it.


// Sanitize - cast as integer / prevent undefined index - especially after initial install of mod
$modSettings['limited_views'] = empty($modSettings['limited_views']) ? 0 : (int)$modSettings['limited_views'];


//  MODIFICATION GOES HERE
//  Allow common spiders/bots to index without views limitation
$interestingCrawlers = array(
'Teoma',   
'alexa',
'froogle',
'inktomi',
'looksmart',
'URL_Spider_SQL',
'Firefly',
'NationalDirectory',
'Ask Jeeves',
'TECNOSEEK',
'InfoSeek',
'WebFindBot',
'girafabot',
'crawler',
'www.galaxy.com',
'Googlebot',
'MSNBot',
'Scooter',
'Slurp',
'appie',
'FAST',
'WebBug',
'Spade',
'ZyBorg',
'rabaz',
'bot',
'spider');
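// Escape each name so characters such as "." match literally; the "i" modifier makes the match case-insensitive
$pattern = '/(' . implode('|', array_map('preg_quote', $interestingCrawlers)) . ')/i';
// The User-Agent header is optional, so fall back to an empty string
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match($pattern, $userAgent) === 1) // Found a match
{
	if (isset($_SESSION['limited_views']))
		unset($_SESSION['limited_views']);
	return;
}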
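
For anyone who wants to try the user-agent test on its own, here is a standalone sketch that wraps it in a function. The name is_listed_crawler is hypothetical, and the short list of names is only a sample of the list above. Case-insensitivity comes from the "i" pattern modifier, because the fourth argument of preg_match() is an integer flags value rather than a modifier string.

```php
<?php
// Hypothetical wrapper around the user-agent test, for trying it on sample strings.
function is_listed_crawler($userAgent, array $names)
{
	// An empty list would build a pattern that matches everything.
	if (empty($names))
		return false;

	// Escape each name so characters such as "." match literally.
	// (None of the names here contains "/", the pattern delimiter.)
	$escaped = array_map('preg_quote', $names);

	// The trailing "i" makes the match case-insensitive.
	$pattern = '/(' . implode('|', $escaped) . ')/i';

	return preg_match($pattern, (string) $userAgent) === 1;
}

$names = array('Googlebot', 'MSNBot', 'Slurp', 'www.galaxy.com');
var_dump(is_listed_crawler('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $names)); // bool(true)
var_dump(is_listed_crawler('Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0', $names)); // bool(false)
```

Note that broad entries such as 'bot', 'spider' and 'crawler' in the full list match almost any crawler, and also any visitor who puts one of those words in their User-Agent.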

Arantor

In the case of cloaking, Google can remove your site from their index entirely, though that hack would solve the immediate problem of letting Google in but not guests.

