I have to shut down one of my boards but want to keep (some of) the content online. Since Google really likes the site, it would be great to keep the URLs alive, or even better to redirect them to the corresponding wiki pages (which are meant to replace the board pages).
Any ideas on how to do this - apart from saving the pages manually?
That's about the only way, pretty much.
Although you could just close the boards to new posting. I don't know of another way.
Thanks - I was hoping for some kind of mirror/dump possibility, as I work with some (non-forum) systems that build static pages like this to lower and distribute system load.
Being quite a PHP newb, I'm giving HTTrack and wget a try...
HTTrack isn't a bad choice to do a static mirror of a site. Be careful in the settings, however. Tossing more than four processes against a site can sometimes cause you to be banned on the server.
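To make the warning concrete: HTTrack lets you cap the number of simultaneous connections from the command line. A minimal invocation might look like this (the forum URL and output directory are placeholders, of course):

```shell
# Mirror the forum into ./forum-mirror, using at most 2 simultaneous
# connections (-c2) so we stay well under the ~4 that can get you banned.
httrack "http://example.com/forum/" -O ./forum-mirror -c2
```

The same limit can be set in the GUI under the connection/flow-control options.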
Never heard of that. Thanks for the tip, I'll remember that :)
Quote from: Motoko-chan on April 04, 2008, 11:25:29 AM
HTTrack isn't a bad choice to do a static mirror of a site. Be careful in the settings, however. Tossing more than four processes against a site can sometimes cause you to be banned on the server.
That is good advice about overloading sites (especially dynamically generated, database-backed sites like SMF) with a scraping tool like HTTrack.
However, HTTrack cannot properly dump an SMF forum: it can only crawl exposed links, and it will also crawl (or attempt to crawl) links that are simply not relevant.
What he needs in this case is a custom script.
This should be no more than about 20-40 lines of code.
Either loop through the database and dump each post body (with the time/date converted and the user name looked up from the member ID),
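A rough sketch of that first option, going straight to the database. The table and column names below assume SMF 1.x defaults with an `smf_` prefix, and the credentials are placeholders -- check your own schema before running anything like this:

```php
<?php
// Dump every post body to a text file, oldest topic first.
// Connection details and table/column names are assumptions;
// adjust them to your installation.
$db = mysqli_connect('localhost', 'user', 'pass', 'smf_db');

$result = mysqli_query($db,
    "SELECT m.subject, m.body, m.posterTime, mem.memberName
     FROM smf_messages AS m
     LEFT JOIN smf_members AS mem ON mem.ID_MEMBER = m.ID_MEMBER
     ORDER BY m.ID_TOPIC, m.ID_MSG");

while ($row = mysqli_fetch_assoc($result)) {
    // posterTime is a Unix timestamp; convert it to something readable.
    $when = date('Y-m-d H:i', $row['posterTime']);
    $who  = $row['memberName'] ? $row['memberName'] : 'Guest';
    file_put_contents('dump.txt',
        "[$when] $who: {$row['subject']}\n{$row['body']}\n\n",
        FILE_APPEND);
}
```

Note that the body comes back as raw BBC markup this way; the SSI approach below hands you parsed HTML instead.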
OR
use the SSI board news function to read the boards and topics: set the board ID to the ID of each board you are looping through, and the posts variable to a very large number (so you get all the replies), then concatenate the results as you loop.
This way you get the text (BBC-parsed and all) in a form suitable for dumping to a file.
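A sketch of that SSI approach, assuming SSI.php sits in the forum root. The exact parameters and array keys of `ssi_boardNews()` vary between SMF versions, so treat the names below as a starting point, not gospel:

```php
<?php
// Pull parsed posts through SMF's SSI layer and write one HTML
// file per board. Paths, board IDs and array keys are assumptions.
require_once('/path/to/forum/SSI.php');

$board_ids = array(1, 2, 3);   // the boards you want to archive
$huge      = 999999;           // "largest number" so everything comes back

foreach ($board_ids as $board) {
    // 'array' output returns the posts instead of echoing them,
    // with the body already BBC-parsed into HTML.
    $posts = ssi_boardNews($board, $huge, null, null, 'array');

    $out = '';
    foreach ($posts as $post)
        $out .= '<h2>' . $post['subject'] . "</h2>\n" . $post['body'] . "\n";

    file_put_contents("board-$board.html", $out);
}
```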
OR
loop through all his topics and get the text of each thread as follows:
$threaddata = file_get_contents( "$myurl?topic=$topic.0;all" );
//note that retrieving the topic that way will by default only return the first two or so pages of posts... I don't remember where to change it so it returns ALL the posts, but it shouldn't be too hard to find.
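Fleshed out, that third option might look like this. The forum URL and topic IDs are placeholders, and the `?topic=ID.0;all` URL shape assumes SMF's default URL scheme:

```php
<?php
// Fetch each topic's "show all posts" page and save it as static HTML.
// $myurl and $topics are placeholders for your own forum.
$myurl  = 'http://example.com/forum/index.php';
$topics = array(10, 11, 12);

foreach ($topics as $topic) {
    // ";all" asks SMF for every post in the topic on one page
    // (subject to the per-page limit mentioned above).
    $threaddata = file_get_contents("$myurl?topic=$topic.0;all");
    file_put_contents("topic-$topic.html", $threaddata);
    sleep(1);   // be gentle -- this hits the live server once per topic
}
```

The topic ID list itself could come from a query against the topics table, or from looping board index pages the same way.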
To optimize, you may switch the forum to a very low-bandwidth template (i.e. "Wireless").
Finally, you would add a redirect (301: Moved Permanently) in your .htaccess to permanently send search engines on to the resulting HTML file.
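One way to express that in .htaccess with mod_rewrite -- 301 being the permanent redirect that search engines follow. The paths here are hypothetical; map them onto wherever you put the dumped files:

```apache
# Send old SMF topic URLs to the static archive.
# /forum/index.php?topic=123.0  ->  /archive/topic-123.html
RewriteEngine On
RewriteCond %{QUERY_STRING} ^topic=(\d+)
RewriteRule ^forum/index\.php$ /archive/topic-%1.html? [R=301,L]
```

The trailing `?` in the target strips the old query string from the redirected URL.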
This is all very rough, but it should give you enough ideas to work with.