mod_rewrite 301

Started by destalk, December 11, 2005, 12:48:01 PM

destalk

Hi

I want to switch on SEO-friendly URLs on one of my forums, which works fine on my server. But then I will be left with thousands of spidered 'dynamic' URLs.

So, does anyone know how to do mod_rewrite in the .htaccess file (or elsewhere), so that I can throw a 301 redirect from all the old dynamic URLs to the new SEO-friendly ones? Obviously, I don't want to have to write a separate redirect for each URL.  :-\

Any help, much appreciated.  :) 

JayBachatero

#1
Follow me on Twitter

"HELP!!! I've fallen and I can't get up"
This moment has been brought to you by LifeAlert

destalk

Quote from: JayBachatero on December 11, 2005, 10:16:25 PM
It does this automatically.  If you have http://www.simplemachines.org/community/index.php?topic=60194.0 it will go to http://www.simplemachines.org/community/index.php/topic,60194.html.

Hi JayBachatero

Thanks for that. But I'm sorry, that's not quite accurate. Or, at least, it's not what I was after.

What you are talking about is the way that URLs are displayed. When the SEO option is switched on, the displayed URLs are changed  - e.g. http://www.simplemachines.org/community/index.php/topic,60194.html. Which is great.

But if someone navigates to your site via the old dynamic URL (because it is listed somewhere as that) then it still displays that URL - e.g. http://www.simplemachines.org/community/index.php?topic=60194.0.

Because of that search engines will continue to display the old, incorrect, URL because the header code says that it is 200 OK. What needs to be done is for the old URL to throw a 301 http message in the header and redirect to the new SEO friendly URL, so that the search engines know that the new urls are being used. Otherwise they will just think that it is duplicate content.
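A minimal sketch of the mapping being described, assuming the URL shapes quoted in this thread. The function name `old_to_sef_url` is hypothetical and not part of SMF, and unlike the shortened example above it keeps the `.0` start part in the new URL:

```php
<?php
// Hypothetical helper, not part of SMF: maps an old dynamic URL to its
// search-engine-friendly form, or returns null if there is nothing to map.
function old_to_sef_url(string $url): ?string
{
    // Accept only ".../index.php?board=5.0" or ".../index.php?topic=60194.0"
    // style URLs; anything with extra parameters is left alone (assumption).
    if (!preg_match('~^(.*/index\.php)\?(board|topic)=([0-9]+(?:\.[a-z0-9]+)?)$~i', $url, $m)) {
        return null;
    }

    // Rebuild as ".../index.php/topic,60194.0.html".
    return $m[1] . '/' . strtolower($m[2]) . ',' . $m[3] . '.html';
}
```

The returned URL could then be sent with `header('Location: ' . $target, true, 301);` followed by `exit;`, so that the old address answers with 301 instead of 200.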

I hope I am being clear. :)

JayBachatero

OK, got you now.  I'm afraid that I can't help you with this, but I'm sure you will get your answer from someone else here.

-JayBachatero

destalk

Actually, I've just discovered this external forum topic specifically about mod_rewrite and SMF.

http://www.doriat.com/viewtopic.php?p=648#648

I haven't tried the solution yet, but I'll report back if it works, in case anyone is interested.

If anyone has a simpler solution, I'm still interested. :)

Oldiesmann

#6
That might work, but there are a couple of problems with that code:

1. It doesn't handle the start parameter (the part after the . that tells SMF what page or message we're viewing), so an attempt to access a specific page of a board or topic or a specific post within a topic would result in being redirected to the first page instead of the actual location.

2. It creates additional search engine friendly URLs for the profile and the search, which might not work properly and would result in extra redirection since SMF doesn't output the URLs this way.
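The "start" part mentioned in point 1 can be illustrated with a short sketch. `split_start` is a hypothetical helper, not an SMF function, and treating a missing start as `'0'` is an assumption made for the example:

```php
<?php
// Hypothetical helper: split a value such as "60194.15" or "125.msg532"
// into the numeric id and the start part that follows the first dot.
function split_start(string $value): array
{
    // Limit to two pieces so only the first dot separates id from start.
    $parts = explode('.', $value, 2);

    return [
        'id'    => (int) $parts[0],
        // Assumption: no start part means the first page.
        'start' => $parts[1] ?? '0',
    ];
}
```

A redirect that only carries the id across would send every such request to the first page, which is the problem described above.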

This topic came up on another board a couple of months ago. Here's the solution I came up with (this one works both ways, so if you ever decide to disable search engine friendly URLs, it will redirect the old URLs back to the correct ones).

index.php

Find
// Check if compressed output is enabled, supported, and not already being done.

Add before that:
// Only do this if we're just viewing a board or a topic - "board=" or "topic=" could be there in other situations as well...
if (empty($_REQUEST['action']) && (isset($_REQUEST['board']) || isset($_REQUEST['topic'])))
{
    // Friendly URLs are switched off: send "/board," and "/topic," requests back to the dynamic form.
    if (empty($modSettings['queryless_urls']))
    {
        // This is surprisingly simple... Figure out whether it's a board or a topic, and replace a few characters
        if (strpos(strtolower($_SERVER['REQUEST_URI']), '/board,') !== false)
        {
            // We're really only interested in what follows "/board,"...
            $string = substr($_SERVER['REQUEST_URI'], strpos(strtolower($_SERVER['REQUEST_URI']), '/board,') + 7);

            // This is really quite simple - replace every "/" with a ";", every "," with an "=" and get rid of ".html"...
            $location = $scripturl . '?board=' . str_replace(array('/', ',', '.html'), array(';', '=', ''), $string);

            header('HTTP/1.1 301 Moved Permanently');
            header('Location: ' . $location);
            // Stop here so the rest of the page is not generated after the redirect.
            exit;
        }
        elseif (strpos(strtolower($_SERVER['REQUEST_URI']), '/topic,') !== false)
        {
            // We only need what's after "/topic,"...
            $string = substr($_SERVER['REQUEST_URI'], strpos(strtolower($_SERVER['REQUEST_URI']), '/topic,') + 7);

            // Again, just replace slashes with semicolons, commas with equal signs and get rid of the .html...
            $location = $scripturl . '?topic=' . str_replace(array('/', ',', '.html'), array(';', '=', ''), $string);

            header('HTTP/1.1 301 Moved Permanently');
            header('Location: ' . $location);
            exit;
        }
    }
    // Friendly URLs are switched on: send "board=" and "topic=" requests to the friendly form.
    else
    {
        // Still just a simple matter of replacing things, although a bit more work is required for topics...
        if (strpos(strtolower($_SERVER['REQUEST_URI']), 'board=') !== false)
        {
            // Get whatever follows the "board="
            $string = substr($_SERVER['REQUEST_URI'], strpos(strtolower($_SERVER['REQUEST_URI']), 'board=') + 6);

            // Reverse of what we did above - replace semicolons with slashes, and equal signs with commas.
            // str_replace() returns the new string, so the result has to be assigned back.
            $string = str_replace(array(';', '='), array('/', ','), $string);

            // Don't forget the .html...
            $string .= '.html';

            $location = $scripturl . '/board,' . $string;

            header('HTTP/1.1 301 Moved Permanently');
            header('Location: ' . $location);
            exit;
        }
        elseif (strpos(strtolower($_SERVER['REQUEST_URI']), 'topic=') !== false)
        {
            // Get whatever follows the "topic="
            $string = substr($_SERVER['REQUEST_URI'], strpos(strtolower($_SERVER['REQUEST_URI']), 'topic=') + 6);

            // Split off the anchor string from the rest of it...
            // (Browsers normally don't send the "#..." part to the server, so this rarely applies.)
            if (strpos($string, '#') !== false)
            {
                // Where does the anchor string start?
                $pos = strpos($string, '#');

                // Isolate the anchor part from the rest of it
                $anchorstring = substr($string, $pos);

                // Now we just drop that part from the string...
                $string = str_replace($anchorstring, '', $string);
            }
            else
            {
                $anchorstring = '';
            }

            // Replace again
            $string = str_replace(array(';', '='), array('/', ','), $string);

            // Add the .html
            $string .= '.html';

            $location = $scripturl . '/topic,' . $string . $anchorstring;

            // Redirect
            header('HTTP/1.1 301 Moved Permanently');
            header('Location: ' . $location);
            exit;
        }
    }
}
Michael Eshom
Christian Metal Fans

JayBachatero

Moved it to the Tips and Tricks board.

destalk

#8
Quote from: Oldiesmann on December 14, 2005, 02:43:35 PM
That might work, but there are a couple of problems with that code:

Thanks Oldiesmann, that's great and seems like a much more elegant solution.

A couple of questions, if you don't mind?

I've noticed that when SE friendly URLs are enabled in SMF, that the pull down menus still point to the default dynamic PHP URLs. Will your solution also redirect the urls generated by the drop-down menus? Although, perhaps, a way to also make the pull down menus generate SE friendly URLs would be a better solution?

Quote
1. It doesn't handle the start parameter (the part after the . that tells SMF what page or message we're viewing), so an attempt to access a specific page of a board or topic or a specific post within a topic would result in being redirected to the first page instead of the actual location.

I suspect that this is intentional on the part of the author, to avoid duplication of content. I think that the idea is that it ensures that the search engines only index one instance of a topic (the beginning, as you said), rather than whatever point in the discussion the SE spider happened to pick up (which could well be an anchor point in the middle of a thread). From that point of view, it's quite clever.

I agree that the search and profiles don't really need to be SE friendly, though.

I'll try your solution though and report back. Will it work in all versions of SMF? (I'm using the Beta 1.1.)

Thanks JayBachatero, for moving this to Tips and Tricks. And thanks again to both of you for the help. :)

destalk

#9
Hi Oldiesmann

I am getting the following error when I paste that code in (this is with SEO friendly URLs switched off);

Warning: Unexpected character in input: ' in /home/domain/public_html/index.php on line 183


If I switch SE Friendly URLs on, the errors look like this;

Warning: Unexpected character in input: ' in /home/domain/public_html/index.php on line 183

Notice: Array to string conversion in /home/domain/public_html/index.php on line 171

Warning: Cannot modify header information - headers already sent by (output started at /home/domain/public_html/index.php:183) in /home/domain/public_html/index.php on line 179

Warning: Cannot modify header information - headers already sent by (output started at /home/domain/public_html/index.php:183) in /home/domain/public_html/index.php on line 180

Oldiesmann

Whoops! Mixed up the order of the values being passed to str_replace :)

Fixed the code above.

destalk

Thank you, as ever.

Very much appreciated.  :D

destalk

#12
Just one minor issue. When the redirect kicks in, it loses the # sign. E.G. it forwards the url to something like this;

http://www.domain.com/index.php/topic,125.msg532.html

When the URL is actually;

http://www.domain.com/index.php/topic,125.msg532.html#msg532

Again, this leaves the possibility of Google having two URLs to deal with.

<---EDIT--->

Also, did you say that this would work the other way around, i.e. SE friendly to dynamic/original URLs? Because I get the following error when I turn the SEO friendly option off:

Fatal error: Call to undefined function: strpost() in /home/domain/public_html/index.php on line 120

Ben_S

AFAIK google will ignore #'s anyway.
Liverpool FC Forum with 14 million+ posts.

destalk

Quote from: Ben_S on December 18, 2005, 06:17:36 PM
AFAIK google will ignore #'s anyway.

I was just about to disagree with you, then I went to check. You are quite right. :P

Thanks for that. :)

Oldiesmann

The code isn't supposed to drop the anchor string, but if Google ignores them anyway then I guess there's no point in fixing them. Also, the strpost() error is due to a typo. It should be strpos().
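One likely reason the anchor goes missing, offered as an assumption rather than something confirmed in this thread: browsers keep the `#...` fragment to themselves and do not send it to the server, so `$_SERVER['REQUEST_URI']` never contains it. The sketch below uses `parse_url` to show the split; `server_visible_part` is a hypothetical name.

```php
<?php
// Hypothetical helper: returns the part of a URL that is actually sent to
// the server (path plus query string), leaving the fragment behind.
function server_visible_part(string $url): string
{
    $parts = parse_url($url);

    // Fall back to "/" when the URL has no path component.
    $visible = $parts['path'] ?? '/';

    if (isset($parts['query'])) {
        $visible .= '?' . $parts['query'];
    }

    return $visible;
}
```

For the example URL above, the `#msg532` part is not in the result, so a server-side redirect has nothing to copy across.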

destalk

Thanks, as ever, Oldiesmann.

SleePy

#17
I would like to use this, since our forums can't do this because we don't have the mod we need.

I got it to do stuff like

Quote
RewriteEngine on
RewriteRule ^index.htm index.php [L]
RewriteRule ^forums.htm index.php?action=forum [L]
RewriteRule ^search.html index.php?action=search [L]

But as soon as I add

RewriteRule ^board([0-9.]*).html index.php?board=$1 [L]

I get a 500 Internal Server Error.

Am I doing anything wrong?

In fact, if I add anything else it breaks:
Quote
RewriteRule ^profile.html index.php?action=profile [L]
RewriteRule ^PrivateMessage.html index.php?action=pm [L]
RewriteRule ^Calendar.html index.php?action=calendar [L]
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

destalk

What would also be really nice is if the PHPSESSID stuff could be avoided when a forum is first viewed with SEF URLs switched on. Google is full of PHPSESSID URLs from my forums, so it is indexing them. Also, the first time that the home page is shown to a user, it still shows the PHP dynamic URLs.
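As an illustration only, here is a sketch of stripping a session ID out of a URL before it is output or used as a redirect target. `strip_session_id` is a hypothetical helper, not an SMF function, and it assumes the parameter is named `PHPSESSID` (PHP's default session name) with `;` or `&` as separators:

```php
<?php
// Hypothetical helper: remove a PHPSESSID parameter from a URL.
function strip_session_id(string $url): string
{
    // Drop "PHPSESSID=value" together with the separator that follows it,
    // keeping the "?", ";" or "&" that came before.
    $url = preg_replace('~([?;&])PHPSESSID=[^;&#]*[;&]?~', '$1', $url);

    // Tidy up a separator left dangling at the end or before a fragment.
    return preg_replace('~[?;&]+(?=#|$)~', '', $url);
}
```

This only cleans up a URL string; it does not change how sessions are started for first-time visitors or spiders.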

Any ideas welcome.

destalk

Hi

This all works fine, apart from one feature. I've noticed that when an email notification is sent to a member, the link is still in the old format. And when they click on the link, it redirects them to the beginning of a thread, rather than to the *new* post.

Is there any way to fix this?

Thanks.
