If I apply a mod that prevents guests from seeing replies (so they have to register first), won't this prevent Google's bots from crawling my pages properly? What do I have to do so that Google's bots can access everything?
Here is the mod I'm referring to: http://custom.simplemachines.org/mods/index.php?mod=2082
Moreover, is there a mod that shows the first 128 characters of a reply and then mentions that guests must register?
What is the point of blocking posts for guests, but allowing Google (a guest itself) to index the full contents? All someone would have to do is google the topic and they would be able to read everything in the cached copy. To let Google index the full pages, you would have to modify SMF to add a new class of user: "bot", in addition to "guest" and "logged in user", and detect bots (based on USER_AGENT) and treat them like a "logged in user".
I haven't seen any mods with a "teaser" function, but maybe one could be adapted from the mod you mentioned. Instead of suppressing all replies, you would cut off replies at 128 characters or less (I would suggest cutting at the halfway point or at 128 characters, whichever is shorter, and cutting at a word boundary, not in the middle of a word). Actually, come to think of it, a general-purpose teaser($text, $max_length, $fraction) function would be harder to write than it sounds. You would have to ensure that all BBCode and HTML tags are ignored in the character count, and that they are properly closed (nothing left dangling or incomplete).
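For the easy half of the problem (where to cut), here is a minimal sketch that works on plain text only, with no BBCode or HTML in it. It is written in Python for brevity rather than SMF's PHP, and the name plain_teaser and its defaults are made up for illustration; they are not part of SMF or of the mod.

```python
def plain_teaser(text, max_length=128, fraction=0.5):
    """Cut plain text at max_length or fraction of its length, whichever is shorter."""
    cut = min(max_length, int(len(text) * fraction))
    if cut >= len(text):
        return text  # nothing to cut
    # If the cut lands inside a word, move left to the previous space.
    if cut > 0 and not text[cut].isspace() and not text[cut - 1].isspace():
        space = text.rfind(" ", 0, cut)
        if space != -1:  # a single very long word is cut where it stands
            cut = space
    return text[:cut].rstrip() + "..."


print(plain_teaser("The quick brown fox jumps over the lazy dog"))
```

Here half of the 43-character sentence is 21 characters, which falls inside "jumps", so the cut moves back to the space after "fox". The tag handling is what makes the general version harder.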
So I would have to create a new member group called "bot." How would I configure this group? :(
You're right about the Google caching, but not every user knows about that. Maybe 1 out of 5 do, the ones who are tech-savvy.
So I'm guessing the 128 character mod would be difficult to write. I have no idea how to do such a thing.
It wouldn't be a member group, since those apply only to logged-in members. You would have to modify SMF so instead of looking at just "guest" (not logged in) and "logged in member", there would be a third group, "bots" (or at least, googlebot). You would have to determine that a guest (not logged in) is actually a bot (certain USER_AGENTs, I would suppose) and treat them more like a logged-in member than a guest. Of course, you wouldn't want them to make posts, etc. I would think it would be fairly complicated, but maybe someone out there has actually done something along those lines.
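To make the three-way split concrete, here is a rough sketch of the decision logic only, in Python rather than SMF's PHP. Everything in it is an assumption for illustration: the function names, the list of user-agent substrings, and the permission rules are not taken from SMF. Keep in mind that a User-Agent header is trivial to fake, so anyone who sets theirs to look like Googlebot would get the same access.

```python
import re

# Illustrative substrings only; extend or replace to suit the crawlers you care about.
BOT_PATTERN = re.compile(r"googlebot|bingbot|slurp", re.IGNORECASE)


def classify_visitor(is_logged_in, user_agent):
    """Return 'member', 'bot' or 'guest' for the current request."""
    if is_logged_in:
        return "member"
    if BOT_PATTERN.search(user_agent or ""):
        return "bot"
    return "guest"


def can_read_replies(visitor):
    # Bots are treated like members for reading...
    return visitor in ("member", "bot")


def can_post(visitor):
    # ...but never for posting.
    return visitor == "member"


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
visitor = classify_visitor(False, ua)
print(visitor, can_read_replies(visitor), can_post(visitor))
```

In SMF itself the equivalent check would have to be wired in wherever the mod currently tests for a guest, which is the complicated part.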
If you want to rely on "security through obscurity", that's your business. After all, every Google entry lists "view cached page" -- that's an engraved invitation to go around all your work and view the full replies. If your readers aren't too tech-savvy, and there's no real cost to having someone read the full text through Google, maybe that would work for you. Perhaps providing just a "teaser" to all guests, including search engines, would be enough to accomplish what you want -- prodding readers to sign up?
A general teaser() function, if one hasn't already been written (anyone?), would probably first strip all BBCode and HTML tags from "$text" to get the character count, and decide where to split on the smaller of "$fraction*length($stripped_text)" and "$max_length". If this is done after parse_bbc() has converted BBCode to HTML, then you would only have to worry about HTML tags. Then it would move left if it's in the middle of a word, so that the cut is made after a full word. Then the fun starts. You would have to go through the (original) $text character by character, keeping track (on a stack) of what (BBCode and) HTML tags are active at any time. When you reach the cutoff point, you have to unwind the stack, adding the appropriate closing tags so that the text is properly balanced and all the post's tags are completely closed. If the original tags were incorrectly nested, you're probably no worse off than if the whole text were shown. It's not trivial, but maybe someone has already done most of it?
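Here is a sketch of that two-pass idea, assuming the text has already been through parse_bbc() so only HTML tags remain. It is in Python, using the standard library's html.parser, purely to show the stack bookkeeping; it is not SMF code, and the class names, the VOID_TAGS list, and the defaults are invented for the example. A close tag that does not match the innermost open tag is simply ignored, so badly nested input gets closed in stack order.

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # elements that never take a closing tag


class _TextOnly(HTMLParser):
    """Pass 1: collect the visible text, ignoring every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


class _Truncator(HTMLParser):
    """Pass 2: copy the HTML until `limit` visible characters have been emitted."""

    def __init__(self, limit):
        super().__init__()
        self.remaining = limit
        self.out = []
        self.stack = []  # names of the tags that are currently open

    def handle_starttag(self, tag, attrs):
        if self.remaining > 0:
            self.out.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        if self.remaining > 0:  # self-closed tag such as <br/>: copy, never push
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.remaining > 0 and self.stack and self.stack[-1] == tag:
            self.stack.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        piece = data[: self.remaining]
        self.out.append(escape(piece, quote=False))  # re-escape &, <, >
        self.remaining -= len(piece)


def teaser(html_text, max_length=128, fraction=0.5):
    collector = _TextOnly()
    collector.feed(html_text)
    collector.close()
    plain = "".join(collector.parts)

    cut = min(max_length, int(len(plain) * fraction))
    # Move left to the previous space if the cut falls inside a word.
    if 0 < cut < len(plain) and not plain[cut].isspace() and not plain[cut - 1].isspace():
        space = plain.rfind(" ", 0, cut)
        if space != -1:
            cut = space

    builder = _Truncator(cut)
    builder.feed(html_text)
    builder.close()
    # Unwind the stack: close whatever is still open, innermost first.
    closing = "".join(f"</{tag}>" for tag in reversed(builder.stack))
    ellipsis = "..." if cut < len(plain) else ""
    return "".join(builder.out) + ellipsis + closing


print(teaser("<p>Hello <b>wonderful world</b> of forums</p>"))
```

The visible text in the example is 31 characters, so the cut falls at 15, right after "wonderful", and the still-open b and p tags are closed after the ellipsis. Handling raw BBCode instead of HTML would need a different tokenizer, but the same stack approach.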