Pretty URLs

Started by SMFHacks.com Team, January 31, 2007, 10:56:43 AM


Nao 尚

As far as I know, it has the flag only on my version of SMF.
Basically, if you look at Display.php, it sets the flag when ".from" or ".new" is used, but not with ".msg". You can "fix" this by setting $context['robot_no_index'] = true; right after // Duplicate link!  Tell the robots not to link this.
Even better, add the following just before // Figure out all the link to the next/prev/first/last/etc. for wireless mainly.:

if (!empty($context['robot_no_index']) && function_exists('ob_googlebot_getAgent'))
{
	if (ob_googlebot_getAgent($_SERVER['HTTP_USER_AGENT'], $spider_name, $agent))
		redirectexit('topic=' . $topic . '.' . (($context['page_info']['current_page'] - 1) * $context['messages_per_page']), false, true);
}


Of course, for this to do anything, you need the "Googlebot & Spiders" mod installed. Alternatively, you could use SMF 2.0's own spider detection system, but I'm not using it for now, so I can't provide tested code for it. (I think the flag is $user_info['possibly_robot'].)
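If you go the SMF 2.0 route instead, the same redirect can presumably be keyed off its built-in spider detection. This is an untested sketch, assuming $user_info['possibly_robot'] is the right flag (as guessed above):

```php
// Untested sketch: same redirect as above, but using SMF 2.0's own spider
// detection instead of the "Googlebot & Spiders" mod. Assumes the
// $user_info['possibly_robot'] flag is populated by SMF's spider tracking.
if (!empty($context['robot_no_index']) && !empty($user_info['possibly_robot']))
	redirectexit('topic=' . $topic . '.' . (($context['page_info']['current_page'] - 1) * $context['messages_per_page']), false, true);
```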
I will not make any deals with you. I've resigned. I will not be pushed, filed, stamped, indexed, briefed, debriefed or numbered.

Aeva Media rocks your life.

bluegray

Quote from: viulian on January 12, 2008, 03:17:29 AM
Now, I know that Google will actually drop a page's ranking if its content is duplicated, and it seems that on my site, after 4-5 months of running Pretty URLs, there is a lot of duplicate content from Google's point of view.

I am wondering if this is a real problem, and if there are solutions for it?
You can also edit the robots.txt file in your root directory to include some of the following lines. This is the one from my forum; the last line is for those .msg URLs:

User-agent: *
Disallow: /cgi-bin/
Disallow: /index.php?action=activate
Disallow: /index.php?action=admin
Disallow: /index.php?action=arcade
Disallow: /index.php?action=calendar
Disallow: /index.php?action=collapse
Disallow: /index.php?action=deletemsg
Disallow: /index.php?action=editpoll
Disallow: /index.php?action=help
Disallow: /index.php?action=helpadmin
Disallow: /index.php?action=lock
Disallow: /index.php?action=login
Disallow: /index.php?action=logout
Disallow: /index.php?action=markasread
Disallow: /index.php?action=mergetopics
Disallow: /index.php?action=mlist
Disallow: /index.php?action=modifykarma
Disallow: /index.php?action=movetopic
Disallow: /index.php?action=notify
Disallow: /index.php?action=notifyboard
Disallow: /index.php?action=pm
Disallow: /index.php?action=post
Disallow: /index.php?action=profile
Disallow: /index.php?action=register
Disallow: /index.php?action=removetopic2
Disallow: /index.php?action=reporttm
Disallow: /index.php?action=search
Disallow: /index.php?action=sendtopic
Disallow: /index.php?action=splittopics
Disallow: /index.php?action=stats
Disallow: /index.php?action=sticky
Disallow: /index.php?action=trackip
Disallow: /index.php?action=unread
Disallow: /index.php?action=unreadreplies
Disallow: /index.php?action=who
Disallow: /Themes/

Disallow: */msg
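One caveat worth adding (my note, not from the original post): the trailing */msg rule relies on wildcard matching, which the original robots.txt standard does not define. Googlebot and the other major crawlers honor the * extension, but smaller spiders may ignore that line. A Googlebot-specific section expressing the same intent would look like:

```text
User-agent: Googlebot
Disallow: /*/msg
```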

Nao 尚

Your robots.txt file will forbid Google from visiting these pages, right...?
Then it will have more trouble indexing the latest posts.
My solution lets it visit the latest pages and be redirected to the page that should be indexed. It works well on my website. Try it (search for "site:nao.cyna.fr nao").
I would also recommend putting this somewhere in Post() in Post.php:

// Spiders should not be able to index a Post template
$context['robot_no_index'] = true;


This is because, if guest posting is enabled, Google will index your reply page. I just found out that my nao.cyna.fr website has this problem.

bluegray

Quote from: Nao 尚 on January 12, 2008, 10:08:26 AM
Your robots.txt file will forbid Google from visit these pages, right...?
Then it will have more trouble indexing the latest posts.
Yes, it will stop Google from visiting that page, but since it is duplicate content, it won't matter: Google will still index the main, unblocked page that you want indexed. I prefer this method because GoogleBot won't have to follow all those duplicate links and redirections, so it should index your site much faster.

You do have a point about the latest posts, though. With the robots.txt method, there will only be a new URL for Google to index once there are enough posts to make a new page. The content of the pages will still change with each post, though.
But I'm not sure that would even make a difference, since Google won't necessarily read the new page if it is redirected there from a new .msg URL that is already in its cache.
Will do some tests ;)

Nao 尚

Quote from: bluegray on January 12, 2008, 10:48:00 AM
Yes, it will stop Google from visiting that page, but since it is duplicate content, it won't matter because it will still index your main unblocked page that you want to have indexed.
I think Google visits your homepage more often than the rest of the site. So if a .msg link there can't be visited, Google won't update that page until it reaches it via traditional browsing, which means new messages won't be indexed immediately. That's the way I see it, at least.

Quote from: bluegray
I prefer this method, because GoogleBot won't have to follow all those duplicate links and redirections.
Well, one thing that could be improved is setting the "nofollow" flag on message links (i.e. the links found at the beginning of every message on a display page), because these links basically lead Google to exactly the same page.

Quote from: bluegray
But I'm not sure if that would even make a difference, since Google will not necessarily read the new page if it is redirected from a new msg url, if that url is already in its cache.
Will do some tests ;)
I think it's worth it...

Nao 尚

Update: from what I can see (at least in SMF 2.0), SMF already adds "nofollow" to most of its .msg links.
For individual posts, though, it doesn't use $message['link'] (which has nofollow); it only uses a hardcoded link with $message['href'] as the target. I've modified this to add the rel="nofollow" flag.
Hopefully this will have some influence... although Wikipedia says nofollow shouldn't be relied on for search engines and doesn't work very well for telling spiders not to follow a link:
http://en.wikipedia.org/wiki/Nofollow
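The modification described above might look roughly like this in a Display template (a hypothetical fragment; the actual markup varies by theme and SMF version):

```php
// Hypothetical Display.template.php fragment: the hardcoded per-post link,
// with rel="nofollow" added so spiders don't crawl the duplicate .msg URL.
// $message['href'] is the permalink SMF builds for the individual post.
echo '
	<a href="', $message['href'], '" rel="nofollow">', $message['subject'], '</a>';
```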

viulian

Thanks for the response guys.

Dannii - you are right. With the default theme, I get <meta name="robots" content="noindex" /> added to the page content, but with the narenciye theme I am using, it is not there.

I didn't know to pay attention to this :) I'll have to investigate and add the noindex tag to the other template. It didn't even cross my mind to check. I hate it. Oh well...

Thanks for the updates. I'll also follow the other advice (the robots.txt updates); it seems much cleaner!

bluegray

I'm still having problems with the latest snapshot. It looks like some PHP error causes it to exit. There are no error messages, no errors in the log, and all I get is a blank page on all my forum URLs. I have to revert to 0.3 since it works fine...

I tried removing all modifications - no difference. I also tried the uninstall.php script.

Dannii

Anything in the php error log (not SMF's)?
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

bluegray

Not that I can see... Where is the default location for the PHP error log? If it's the error_log file in the forum directory, there are no errors reported.

bluegray

I found the problem. Still no errors in the error log, though.
The problem was with my theme when including the code for http://www.crawltrack.fr/
It worked fine with the last version of Pretty URLs I used (0.3), though. What could cause this incompatibility?

Dannii

What's the code you included?
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

bluegray

Sorry for the bother. I think I have it under control now. I upgraded to the latest version of crawltrack and moved its include code from the theme template to the index.php file, as recommended on their site.

Looks like everything is working fine now - thanks for your patience ;)

Dannii

Hmm weird. But if it's working now, that's great :).
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

bluegray

It seems that not all the boards and topics are working. I still get blank pages for some boards and topics. I tried reinstalling and rebuilding the database (I used the uninstall.php on the google code site).

I get the following SMF error:


Hacking attempt...

INSERT INTO `bluegray_smf`.smf_pretty_topic_urls
(ID_TOPIC, pretty_url)
VALUES (11, "the-locator-locates!-(danie-krugel)")
File: /root/forum/Sources/PrettyUrls-Filters.php
Line: 139

Dannii

"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

bluegray

Yeah, there are some other values, but they are all similar to the one I posted: (13, "recommended-reading"), (143, "promises-promises"), etc.

bluegray

Looks like the log entry comes from line 316 in Subs.php:
log_error('Hacking attempt...' . "\n" . $db_string, $file, $line);
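One hedged guess at a workaround (my addition, not something from the thread): sanitize the generated slug down to a conservative character set before it ever reaches the query, so punctuation like "!" and parentheses can't trip SMF's query checker. A self-contained sketch, with an illustrative function name that is not part of PrettyURLs:

```php
<?php
// Illustrative helper (not part of PrettyURLs): reduce a topic subject to a
// conservative slug so characters like "!" or parentheses never reach the
// INSERT that SMF's query checker rejects.
function conservative_slug(string $subject): string
{
    $slug = strtolower($subject);
    // Replace every run of characters outside a-z and 0-9 with a single dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Trim leading/trailing dashes left over from punctuation.
    return trim($slug, '-');
}

echo conservative_slug('The Locator locates! (Danie Krugel)'), "\n";
// Expected: the-locator-locates-danie-krugel
```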

Nao 尚

Hey there.
Dannii, as your website has been down for some time now (like a week), I'm forced to post here about this.

I'm officially announcing my retirement as a co-developer of PrettyURLs.
It just so happens that my work was never used in PrettyURLs, except maybe for a couple of cool ideas I had. I was not credited anywhere either. I constantly submitted my custom changes to Dannii, with many solutions to problems people encountered, but apparently it was never good enough.

Today I finally upgraded my production website to SMF 2.0, and I'm using my custom version of PrettyURLs to "emulate" my previous version's hardcoded URL prettifier. It seems to be working okay and I will probably add more functionality in the future, but I won't do it in a mod file any longer; I'll hardcode it directly into my production website, just as I do the rest of my custom programming. That means it'll be harder for me to share my code, and since the code I previously shared was never put to use by Dannii, I think it's time to stop trying and call it quits.

Thanks to Dannii for sharing a lot of his good PHP coding practices, though. It was very helpful for me. And kudos to a great mod!

Dannii

Oh, that's a shame, because in the work I haven't committed yet I have both added you to the credits/licence file and added in your UTF-8 conversion code.

That said, you were always welcome to make the changes yourself rather than offering them to me. Your suggestions were usually quite good, just not in the part of the program I was thinking about at the time.

Anyways, thanks for your help!
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."
