build a static version of an entire board?

Started by busy_one, April 03, 2008, 02:57:25 PM


busy_one

I have to shut down one of my boards but want to keep (some of) the content online. Since Google really likes the site, it would be great to keep the URLs alive or, even better, to redirect them to corresponding wiki pages (which are meant to succeed the board pages).

Any ideas on how to do this - apart from saving the pages manually?

metallica48423

That's about the only way, pretty much.

Although you could just close the boards to new posting. I don't know of another way.
Justin O'Leary
Ex-Project Manager
Ex-Lead Support Specialist

Quote: Microsoft wants us to "Imagine life without walls"...
I say, "If there are no walls, who needs Windows?"


Useful Links:
Online Manual!
How to Help us Help you
Search
Settings Repair Tool

busy_one

Thanks. I was hoping for some kind of mirror/dump option, as I work with some (non-forum) systems that build static pages like these to lower and distribute system load.

Being quite a PHP newb, I'm giving HTTrack/wget a try...

青山 素子

HTTrack isn't a bad choice to do a static mirror of a site. Be careful in the settings, however. Tossing more than four processes against a site can sometimes cause you to be banned on the server.
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


metallica48423

Never heard of that. Thanks for the tip, I'll remember that :)

steighan

Quote from: Motoko-chan - April 04, 2008, 11:25:29 AM
HTTrack isn't a bad choice to do a static mirror of a site. Be careful in the settings, however. Tossing more than four processes against a site can sometimes cause you to be banned on the server.

That is good advice about overloading sites (especially database-driven dynamic sites like SMF) with a scraping tool like HTTrack.

However, HTTrack cannot properly dump an SMF forum, as it can only crawl exposed links. Furthermore, it will crawl (or attempt to crawl) links that are simply not relevant.

What he needs in this case is a custom script.
This should take no more than about 20-40 lines of code.

Either loop through the database and dump each post body (converting the time/date and looking up the user name from the ID),

OR

use the SSI board news function to read the boards and topics: set the board ID to the ID of each board you are looping through and the POSTS variable to the largest number (so you get all the replies), and concatenate the output as you loop.
This way you get the text (BBCode parsed and all), suitable for dumping to a text file.

OR

loop through all his topics and
get the text of each thread as follows:
$threaddata = file_get_contents("{$myurl}?topic={$topic}.0;all"); // assumes $myurl is the URL of the forum's index.php and $topic is a topic ID

// Note that retrieving the topic that way will by default only return the first two or so pages of posts... I don't remember where to change it so it returns ALL the posts, but it shouldn't be too hard to find.
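
A rough sketch of that last approach, just to show the shape of it. It assumes the forum uses SMF's default URL scheme (index.php?topic=<id>.0;all) and that you already have the list of topic IDs. The function names, the example.com URL, the topic-<id>.html file names and the one-second pause are all made up for illustration, so adjust to taste.

```php
<?php
// Build the "show all posts" URL for one topic.
// Assumes SMF's default URL scheme: index.php?topic=<id>.0;all
function topic_url(string $forumUrl, int $topicId): string
{
    return $forumUrl . '?topic=' . $topicId . '.0;all';
}

// Name of the static file a topic is saved to (hypothetical naming scheme).
function topic_filename(int $topicId): string
{
    return 'topic-' . $topicId . '.html';
}

// Fetch every topic and write it into $outDir (which must already exist).
// Returns the IDs of the topics that could not be fetched.
function dump_topics(string $forumUrl, array $topicIds, string $outDir): array
{
    $failed = [];
    foreach ($topicIds as $topicId) {
        $html = @file_get_contents(topic_url($forumUrl, $topicId));
        if ($html === false) {
            $failed[] = $topicId;
            continue;
        }
        file_put_contents($outDir . '/' . topic_filename($topicId), $html);
        sleep(1); // pause between requests to keep the server load down
    }
    return $failed;
}

// Example call (hypothetical URL and topic IDs):
// $failed = dump_topics('http://example.com/forum/index.php', [101, 102, 103], __DIR__ . '/static');
```

The saved pages still contain links back to index.php, so you would probably want to rewrite or strip those before publishing the files.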

To optimize, you could set the template to a very low-bandwidth one (e.g. "Wireless").

Finally, you would set up a redirect (301: Moved Permanently) in your .htaccess so that search engines are permanently sent to the resulting HTML file.
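
One wrinkle with the redirect: SMF topic URLs carry the topic ID in the query string (index.php?topic=42.0), and a plain Redirect line does not look at query strings, so you most likely need mod_rewrite with a RewriteCond on QUERY_STRING. Here is a small sketch that generates such rules for a permanent (HTTP 301) redirect. It assumes the .htaccess sits in the forum directory and that the static pages are named topic-<id>.html under a path like /static; the function name and both of those choices are hypothetical.

```php
<?php
// Generate mod_rewrite rules that permanently redirect each old topic URL
// (index.php?topic=<id>.<start>...) to its static page.
// $staticPath is the URL path of the static files (assumed, e.g. "/static").
function redirect_rules(array $topicIds, string $staticPath): string
{
    $lines = ['RewriteEngine On'];
    foreach ($topicIds as $topicId) {
        $id = (int) $topicId;
        // Match "topic=<id>." at the start of the query string or after ; or &
        $lines[] = 'RewriteCond %{QUERY_STRING} (^|[;&])topic=' . $id . '\.';
        // The trailing "?" drops the old query string from the target URL
        $lines[] = 'RewriteRule ^index\.php$ ' . $staticPath . '/topic-' . $id . '.html? [R=301,L]';
    }
    return implode("\n", $lines) . "\n";
}

// Example (hypothetical topic IDs): print rules to paste into .htaccess
// echo redirect_rules([101, 102, 103], '/static');
```

With a large number of topics, one rule pair per topic gets long, so a single pattern-based rule may be the better choice once the naming scheme is settled.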

This is all very rough, but you should have enough ideas to use.
"Frequently wrong, but never in doubt"
