Simple Machines Community Forum

SMF Support => SMF 1.1.x Support => Topic started by: frostipuff - June 22, 2011, 12:01:30 PM

Title: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 12:01:30 PM
DISCLAIMER: I am not a webmaster. I just run two forums and, until now, I haven't needed to know much about the internals, because SMF does such an excellent job of letting me administer the forums through the UI.

I have two SMF forums (both on v1.1.14):

My robots.txt file resides at /public_html.

I've researched the Print Page issue on this forum, and I got some help from K@ yesterday, and I know that edits to the robots.txt file can take a while to appear in search results, but I am seeing MORE "guests"--as many as a dozen at any one time in Who's Online--indexing Print Page links. 

I should note that while all these new bots/spiders are wreaking search havoc on my public forum, I no longer see Google and Yahoo IPs indexing the Print Page links. So it seems my changes are at least partially working.

Here's a current copy of my robots.txt file. I'd be grateful if any of you can point out problems with it. For example, is it overkill?

Quote
User-agent: *
Disallow: /mypublicdomain.com/forum/index.php?action=activate
Disallow: /mypublicdomain.com/forum/index.php?action=admin
Disallow: /mypublicdomain.com/forum/index.php?action=arcade
Disallow: /mypublicdomain.com/forum/index.php?action=calendar
Disallow: /mypublicdomain.com/forum/index.php?action=collapse
Disallow: /mypublicdomain.com/forum/index.php?action=deletemsg
Disallow: /mypublicdomain.com/forum/index.php?action=editpoll
Disallow: /mypublicdomain.com/forum/index.php?action=help
Disallow: /mypublicdomain.com/forum/index.php?action=helpadmin
Disallow: /mypublicdomain.com/forum/index.php?action=lock
Disallow: /mypublicdomain.com/forum/index.php?action=login
Disallow: /mypublicdomain.com/forum/index.php?action=logout
Disallow: /mypublicdomain.com/forum/index.php?action=markasread
Disallow: /mypublicdomain.com/forum/index.php?action=mergetopics
Disallow: /mypublicdomain.com/forum/index.php?action=mlist
Disallow: /mypublicdomain.com/forum/index.php?action=modifykarma
Disallow: /mypublicdomain.com/forum/index.php?action=movetopic
Disallow: /mypublicdomain.com/forum/index.php?action=notify
Disallow: /mypublicdomain.com/forum/index.php?action=notifyboard
Disallow: /mypublicdomain.com/forum/index.php?action=pm
Disallow: /mypublicdomain.com/forum/index.php?action=post
Disallow: /mypublicdomain.com/forum/index.php?action=printpage
Disallow: /mypublicdomain.com/forum/index.php?action=profile
Disallow: /mypublicdomain.com/forum/index.php?action=profile;area=showposts;u=*
Disallow: /mypublicdomain.com/forum/index.php?action=profile;area=showposts;sa=attach;u=*
Disallow: /mypublicdomain.com/forum/index.php?action=register
Disallow: /mypublicdomain.com/forum/index.php?action=removetopic2
Disallow: /mypublicdomain.com/forum/index.php?action=reporttm
Disallow: /mypublicdomain.com/forum/index.php?action=search
Disallow: /mypublicdomain.com/forum/index.php?action=sendtopic
Disallow: /mypublicdomain.com/forum/index.php?action=splittopics
Disallow: /mypublicdomain.com/forum/index.php?action=stats
Disallow: /mypublicdomain.com/forum/index.php?action=sticky
Disallow: /mypublicdomain.com/forum/index.php?action=trackip
Disallow: /mypublicdomain.com/forum/index.php?action=unread
Disallow: /mypublicdomain.com/forum/index.php?action=unreadreplies
Disallow: /mypublicdomain.com/forum/index.php?wap2
Disallow: /mypublicdomain.com/forum/index.php?action=who
Disallow: mypublicdomain.com/forum/attachments/
Disallow: /mypublicdomain.com/forum/avatars/
Disallow: /mypublicdomain.com/forum/Packages/
Disallow: /mypublicdomain.com/forum/Smileys/
Disallow: /mypublicdomain.com/forum/Sources/
Disallow: /mypublicdomain.com/forum/Themes/
Disallow: /mypublicdomain.com/forum/*.msg
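
One thing that can be checked offline is whether a given Disallow rule actually matches a print-page URL. Robots.txt rules are matched against the path part of the URL (what follows the host name), so a rule that begins with the domain name may never match. The sketch below uses Python's standard urllib.robotparser, which does simple prefix matching; real crawlers can differ (for example in wildcard support). The path-only rule is a hypothetical variant for comparison, not something taken from this thread, and is_blocked is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Print-page URL in the form the forum generates for topic 628.
PRINT_URL = "http://www.mypublicdomain.com/forum/index.php?action=printpage;topic=628.0"


def is_blocked(rule_path: str, url: str = PRINT_URL) -> bool:
    """Return True if a lone 'Disallow: rule_path' rule blocks url for every agent."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch("*", url)


# Rule as written in the file above: the domain name is part of the path.
as_posted = is_blocked("/mypublicdomain.com/forum/index.php?action=printpage")

# Hypothetical variant: path only, starting at the site root.
path_only = is_blocked("/forum/index.php?action=printpage")

print("as posted blocks the print page:", as_posted)
print("path-only blocks the print page:", path_only)
```

Even a rule that matches only helps with crawlers that choose to honor robots.txt; it does nothing against bots that ignore the file.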

Has anyone had any luck inserting "Disallow: /print/"?

And could the increase in indexed print pages be correlated to the increase in comment spammers flooding SMF forums (such that I have had to make registration manual and will install a honey pot this weekend)?

I'm getting frustrated with trying different things, because no matter what tweaks I make to this file, I end up seeing an increase in indexed print topics.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 12:02:04 PM
Oh, and this post (http://www.simplemachines.org/community/index.php?topic=389687.20) says:

Quote from: Once Upon A Star - February 27, 2011, 01:06:39 PM
Just inside the function that's in Printpage.php, just add is_not_guest();

That looks promising, but where does this code fragment go? Do I put it in the first SQL query and add a WHERE clause filter like:

WHERE m.ID_TOPIC = $topic
AND is_not_guest();


If yes, isn't part of the predicate missing? That is, <something> AND is_not_guest() ?

If that's not where it goes, can someone please tell me where I insert this code fragment?

Thanks to K@ yesterday, I was directed to the NOINDEX META tag (http://www.robotstxt.org/meta.html), but that instructs me to put it on every page on my site. Really? I'm not running a traditional HTML-based web site, so where does this tag go?

The info at http://www.robotstxt.org/meta.html says to put a nofollow tag in the head, but I don't have HTML files; I have PHP files that I can edit. As for rel="nofollow", it would seem that Google (and other?) robots could still follow the links and index the print pages.
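
For reference, the tag in question is an ordinary HTML meta element, and PHP pages still send HTML to the browser, so the tag would sit in the head of whatever HTML the forum outputs. A rough way to check whether a given page already carries it is sketched below in Python, assuming you have saved the page's HTML source. The class and function names are made up for illustration, and the sample string is invented, not taken from a real forum page.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Invented sample page head, for illustration only.
sample = '<html><head><meta name="robots" content="noindex, follow" /></head><body></body></html>'
print(has_noindex(sample))
```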

Finally, there seems to be a lot of conflicting info in the various discussions about getting spiders to stop indexing the print pages. People say Use this AND this. Or use this BUT NOT that.

Frankly, I'm more confused than ever, really don't want to spend hours and hours on what should seemingly be a simple tweak, and my forum content is turning up in searches with useless hits.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 22, 2011, 02:46:52 PM
Just thinking out loud, here...

How about removing the "Print" button?

That'll stop 'em. ;)
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 03:12:58 PM
Yes, but I use Print to archive long posts as PDF.

But wait. OK now I am thinking out loud.

*pretend I just ran off to do some testing*

Assume I know the display page link; for example:

http://www.mypublicdomain.com/forum/index.php/topic,628.0.html

I click the Print link and get this:

http://www.mypublicdomain.com/forum/index.php?action=printpage;topic=628.0

All anyone registered as a member needs is the static bit in the URL (for the print action) and the topic ID:

?action=printpage;topic=<float>
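
To make the pattern concrete, here is a minimal Python sketch that recognizes a print-page URL of the form above and rebuilds the matching display-page URL. It assumes only the two URL shapes shown in this post; as far as I know, the value after topic= is a topic ID plus a starting offset rather than a true float. The function name display_url is made up for illustration.

```python
import re

# Matches ".../index.php?action=printpage;topic=<id>.<offset>"
PRINT_RE = re.compile(
    r"^(?P<base>.*/index\.php)\?action=printpage;topic=(?P<topic>\d+(?:\.\d+)?)$"
)


def display_url(print_url: str):
    """Map a print-page URL to the display-page form shown above, or return None."""
    match = PRINT_RE.match(print_url)
    if match is None:
        return None
    return f"{match['base']}/topic,{match['topic']}.html"


print(display_url("http://www.mypublicdomain.com/forum/index.php?action=printpage;topic=628.0"))
```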

Cool.

Now I just need to search this forum on how to remove the PRINT functionality.


Thanks for thinking out loud, K@.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 22, 2011, 04:37:38 PM
It's dead easy!

If you attach your theme's index.template.php, we can sort it in seconds!

(Actually, having just checked that out, it may take a bit longer...)
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 22, 2011, 04:45:19 PM
If you go to the theme's directory and edit Display.template.php, find this line:

'print' => array('text' => 465, 'image' => 'print.gif', 'lang' => true, 'custom' => 'target="_blank"', 'url' => $scripturl . '?action=printpage;topic=' . $context['current_topic'] . '.0'),

and delete it.

(Read my sig, first)
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 04:51:28 PM
I'm a technical writer. I live and die by backups. ;)

Thanks K@. I'll make the edit and then close this topic after.

Oh, one more dumb question. Do I have to make this change in every theme? Or just the default, the one that guests see?
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 22, 2011, 04:58:43 PM
Every theme that has that file in. Unless it's just to stop guests, in which case you'll only need to do your forum's default theme.

Be aware, though, that not all themes may have that file.

If they don't, they'll be using the file from the SMF default theme.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 05:11:51 PM
I see Display.template.php in all themes except for my default (English Steel).

If I understand you correctly, if I only care about preventing guests from accessing the Print Page links, I only need to edit

public_html/mypublicdomain.com/forum/Themes/default/Display.template.php

or also
/babylon
/classic

I liked the way English Steel looked, so I made it the forum default. Is it a problem that my forum's default doesn't have the Display template?

If not, I'll just modify the one in the /default directory and see how it goes.

If yes, as a workaround, I could set the forum default theme to SMF default and then manually change each user's theme back to English Steel. New users would see SMF as their default, but that's OK.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 05:13:04 PM
Also, I wrote a bunch of how-to procedures for users (add an avatar, change themes, change smileys, insert images, blah blah blah).

That stuff is all visible to guests.

Should I move it to a members-only section of the forum?
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 22, 2011, 05:14:06 PM
What you could do is copy the file from the default theme into English Steel and edit that.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 05:23:13 PM
Sounds like a plan. I just downloaded a copy and will rename it, edit it, and ftp it to the English Steel folder.

I'm not much of a PHP scripter, so I wasn't sure if adding that file to English Steel would mess anything up, and my work deadlines are taking up too much of my fun time.

I'll give it a go and see what happens.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 05:27:37 PM
Wee, the PRINT button is gone.

Thanks K@. (clever alias)

If it's OK, I'll leave this topic open until I am satisfied it did the trick with the evil robots.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 07:03:57 PM
Grrr, I have brand new guests (new IPs--I am actually tracking them now) who are printing.

Does this mean I should edit that line out of all the Display.template.php files?
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: Illori - June 22, 2011, 07:28:18 PM
It is still possible to go to that URL. Unless you add a permission for printing, people and bots will still be able to print pages.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 22, 2011, 09:14:03 PM
I've never seen anything like this. I have 25 guests parked on my site now, more than half of them "printing."

My guest permissions are pretty stringent. They can't do anything but view posts, and I don't even see any Print Page option (which I have since removed from the template).

I just enabled the "option to deny permissions" in Admin > Permissions > Settings. Unfortunately, the ? mouseover is cryptic, and that part of the SMF manual is not much more informative:

Quote
Enable advanced by-board permissions - With this unchecked, the Permissions by Board page of the Permissions section will be made real simple. Locally setting permissions will be restricted to just a few key commonly set permissions. You can make a Board: read only, disallow polls, or reply only. If you want to alter different permissions, you will want this checked. Having this checked will allow you to edit every permission for each local permission set Board for every Membergroup.

How? I don't need to restrict by board, since most of the forum is read-only to unregistered users. I want to stop them from indexing the Print Page thingamajig.

I've gone into each of those pages and nothing jumps out at me, such as a place where I can deny certain permissions for the guests membergroup.

Do I need to revoke print permissions from phpMyAdmin?
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 23, 2011, 05:53:35 AM
Maybe these people know about the printing thing and actually manually type the "action=printpage" thing.


THAT will show up in the log. BUT, it doesn't mean they can actually print anything, by doing that.


Of course, there's nothing to stop them from using their browser's "Print" facility, if they really DO want to print anything. So, even if that permission doesn't exist, they can still print your pages, if they want to.

I really wouldn't worry, Frosty.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 23, 2011, 08:19:11 AM
Oh, I don't give a diddly if they want to print topics. What I am trying to stop is the Print Page results ending up on Google. Those results are useless because they don't take the browser anywhere, unlike a display page result, which brings the browser straight to that topic in my forum.

I'm trying to improve my forum's search ratings, nothing more. So a Print Page result, like this one:

http://www.theeverydaybeauty.com/forum/index.php?action=printpage;topic=57.0

is poop.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 23, 2011, 08:32:58 AM
I wonder if it might be worth contacting Google....?
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: frostipuff - June 23, 2011, 02:47:16 PM
Maybe. I think their Webmaster Tools area has a help forum.

Meanwhile, I am starting to see a decrease in the print activity, so maybe removing the button was what worked the best.

I'll mark this solved, since we've really beaten it to death, and thanks much for the help.
Title: Re: Help! Print Page indexing getting worse, not better with robots.txt mods
Post by: kat - June 23, 2011, 02:52:47 PM
Always a pleasure!

Just a shame we couldn't totally nail it, really. :(