How to externally copy or archive an SMF forum?

Started by mpoloukhine, February 26, 2010, 02:11:32 PM

mpoloukhine

A forum I frequent is being shut down. The site owners have no intention of archiving the forum posts. Is there an app for downloading the publicly displayed posts of an SMF forum (1.1) to a file format that can display the threads in a readable format?

It need not be an SQL database format (although that would be ideal, as I could load it into my own SMF forum!). I'll settle for any format I can display online that would allow the public to read the contents.

I'm looking for something more manageable to run than a million print screens...

kat

Why not ask them to make a backup of the home directory and the database and get them to send it to you?


There are various website downloaders, but I doubt they'll do the job here, especially as you'll need the database, too.

Chas Large

You have to remember that a forum is not like a fixed website: all the post pages are generated depending on which page you visit, with the data pulled from the database into the template you select as a user. So unless you can access the database, there is no way to simply copy it.
My Modifications :)  My Forum

Please DO NOT PM me with support requests. Post the problem in the appropriate Support Board so everyone can benefit from the advice given.

mpoloukhine

As I said, the website owner is not interested in archiving it. I've asked, so we're on our own. And yeah, I've tried a couple of site leech programs and they obviously are not up to the task. What I really need is a copy of the SQL database, but with an uninterested site admin, it's not easy.

mpoloukhine

The site is still active, I can call on the site in that sense, and so in theory, it should be possible to have some script or routine armed with my site login credentials basically just run down the site links and capture the pages that are sent to it.

In theory.

By a programmer.

Which I am decidedly not.  ;)
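
For anyone who wants to try the logged-in approach described above, here is a minimal sketch using only the Python standard library. It keeps the login cookie between requests so later page fetches are made as the signed-in user. The action name `login2` and the field names `user` and `passwrd` are assumptions about the SMF 1.1 login form; check them against the form on the actual forum. The function names and the example URL are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that stores cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(forum_url, username, password):
    """Build (but do not send) the POST request that submits the login form.

    ASSUMPTION: the form posts to index.php?action=login2 with fields
    "user" and "passwrd". Verify this against the forum's own login page.
    """
    form = urllib.parse.urlencode({"user": username, "passwrd": password})
    login_url = forum_url.rstrip("/") + "/index.php?action=login2"
    # Supplying data= makes urllib send this as a POST request.
    return urllib.request.Request(login_url, data=form.encode("utf-8"))
```

A possible usage would be `opener, jar = make_session()`, then `opener.open(build_login_request("http://forum.example", "me", "secret"))`, and after that `opener.open(page_url).read()` for each page to capture. If the forum's login form differs from the assumptions, the login will simply fail and the pages will come back as the guest view.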

Chas Large


mpoloukhine

Thanks, hadn't tried that one, but it has the same general problem with logging in as the other software I've tried.

mpoloukhine

I'm not savvy enough to sort out exactly what goes wrong, but there's something about how that forum is set up that causes problems for this software in the login process and/or in reading the available sub-boards.

kat

Quite right, too, or we'd all find our forum appearing somewhere else.


Yeah, to get everything, you'll need the admin's login details.

mpoloukhine

Quote from: St. Kat on February 27, 2010, 04:54:58 AM
Quite right, too, or we'd all find our forum appearing somewhere else.


Yeah, to get everything, you'll need the admin's login details.
Yes, I can appreciate that. Just frustrated in that I'm not looking to scam the forum but to save it.

What I'm getting at is that there seems to me to be no reason a script can't be written to harvest each page as the user visits it. Instead of printing to screen, it prints to screen AND to an HTML file. It would be the user navigating, the script "filming" the results as HTML.

No way a security setup would block that, as it's on the user side, at user speed.
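
The "filming" step could be sketched roughly like this: given the HTML of a page the user has just received, write it to disk under a filename derived from the page's URL. This is only an illustration, not an existing tool. The function names are hypothetical, and it assumes the page HTML has already been obtained somehow.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit


def filename_for(url):
    """Turn a page URL into a filesystem-safe .html filename."""
    parts = urlsplit(url)
    raw = parts.path.strip("/")
    if parts.query:
        raw += "_" + parts.query
    # Replace anything that is not a letter, digit, dot, dash or underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_") or "index"
    return safe + ".html"


def save_page(url, html, out_dir):
    """Write one captured page into out_dir and return the path written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / filename_for(url)
    target.write_text(html, encoding="utf-8")
    return target
```

For example, `filename_for("http://forum.example/index.php?topic=123.0")` gives `index.php_topic_123.0.html`. Links inside the saved pages would still point at the live forum, so a fuller archive would also need to rewrite them to the local filenames.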

Taking it one step further, no reason a script couldn't be done to do the same in an automated fashion, and just slowed down to user speed.
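
The automated version, slowed down to user speed, might look something like the sketch below. It visits pages one at a time in breadth-first order and pauses between requests. How a page is downloaded (`fetch`) and how links are pulled out of it (`extract_links`) are passed in as functions, so nothing here is specific to SMF. All names and the default delay are illustrative assumptions.

```python
import time
from collections import deque


def crawl(start_url, fetch, extract_links, delay_seconds=5.0,
          max_pages=1000, sleep=time.sleep):
    """Visit pages breadth-first, pausing between requests.

    fetch(url) must return the page body; extract_links(url, body) must
    return the URLs found on that page. Both are supplied by the caller.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages[url] = fetch(url)
        for link in extract_links(url, pages[url]):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
        if queue:
            sleep(delay_seconds)  # wait like a human reader would
    return pages
```

The `max_pages` cap is there because forum pages link to each other in many ways, and without a limit a crawl of a large board could run for a very long time.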


MrPhil

A few weeks ago (give or take) there was a query about using an external site scraper tool to archive an SMF forum as a collection of static HTML files. It was on this board, but I can't seem to find it due to the broken search (it won't go to page 2)! Once search comes back, try searching for "HTML", "forum", and maybe "archive". If the contents of the forum in question are not hidden from guests, all sorts of off-the-shelf site scrapers should work. If you need to log in with your ID and password, there still may be some that work.

mpoloukhine

Thanks MrPhil, found this thread.

It references GNU Wget and its Windows version, but that's pretty much a routine to run from the admin access side. Pretty sure that won't work from the user side.

[edit]Modified some of the search filters to get more results and also found this thread. Same basic sources there too, and looking to do it from the admin side. I'm trying to save it from the user side.

[edit2]To get more search results while search is busted, display searches by date, and then re-search using the "days" search filter to filter out by date the threads the search already showed on page one.

MrPhil

Neither of those threads was the recent one I recall seeing, but keep looking!

If you can access the forum content as a guest (without signing on), anything that acts like a search engine spider should be able to crawl the whole site and grab everything. If you have to be signed on, it could get a bit more complicated. In either case, I don't see why you would need "admin" access.
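
As a rough illustration of what "acts like a search engine spider" means in practice, here is a small sketch of the link-finding half of a crawler, using only the Python standard library. It collects the links on one page and keeps those that stay on the same host. It also skips any link with an `action` parameter, on the assumption that the forum routes things like logout, posting and profile pages through `index.php?action=...` and that these are not wanted in an archive. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urljoin, urlsplit, urlunsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def forum_links(page_url, html):
    """Return same-host page links found in html, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc
    links = []
    for href in collector.hrefs:
        parts = urlsplit(urljoin(page_url, href))
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            continue  # skip mailto: links and other sites
        # ASSUMPTION: parameters may be separated by ";" as well as "&".
        pairs = parse_qsl(parts.query.replace(";", "&"), keep_blank_values=True)
        if any(key == "action" for key, _ in pairs):
            continue  # skip logout, posting and similar non-content links
        # Drop the #fragment so the same page is not listed twice.
        url = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
        if url not in links:
            links.append(url)
    return links
```

A crawler would call `forum_links` on each downloaded page and add the results to its list of pages still to visit. Skipping the logout link matters most when crawling while signed on, since following it would end the session partway through.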
