Simple Machines Community Forum

SMF Support => SMF 1.1.x Support => Topic started by: mpoloukhine on February 26, 2010, 02:11:32 PM

Title: How to externally copy or archive an SMF forum?
Post by: mpoloukhine on February 26, 2010, 02:11:32 PM
A forum I frequent is being shut down. The site owners have no intention of archiving the forum posts. Is there an app for downloading the publicly displayed posts of an SMF forum (1.1) to a file format that can display the threads in a readable format?

It need not be in an SQL DB format (although that would be ideal, as I could load it into my SMF forum!). I'll settle for any format I can display online that would allow the public to read the contents.

I'm looking for something more manageable to run than a million print screens...
Title: Re: How to externally copy or archive an SMF forum?
Post by: kat on February 26, 2010, 03:54:40 PM
Why not ask them to make a backup of the home directory and the database and get them to send it to you?


There are various website downloaders, but I doubt they'll do the job here, especially as you'll need the database, too.
Title: Re: How to externally copy or archive an SMF forum?
Post by: Chas Large on February 26, 2010, 03:58:28 PM
You have to remember that a forum is not like a static website. Each page is generated on request, with the data pulled from the database into whichever template you have selected as a user. So unless you have access to the database, there is no way to simply copy it.
Title: Re: How to externally copy or archive an SMF forum?
Post by: mpoloukhine on February 26, 2010, 04:06:51 PM
As I said, the website owner is not interested in archiving it. I've asked, so we're on our own. And yeah, I've tried a couple of site leech programs and they are obviously not up to the task. What I really need is a copy of the SQL DB, but with an uninterested site admin, it's not easy.
Title: Re: How to externally copy or archive an SMF forum?
Post by: mpoloukhine on February 26, 2010, 04:09:55 PM
The site is still active, so I can still reach it in that sense. In theory, it should be possible for some script or routine, armed with my site login credentials, to run down the site links and capture the pages that are sent to it.

In theory.

By a programmer.

Which I am decidedly not.  ;)
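
For anyone who does want to try the "script armed with my login credentials" idea, here is a minimal Python sketch, not a tested tool. It keeps cookies between requests and builds a login POST. The `action=login2` target and the `user`, `passwrd`, and `cookielength` field names are assumptions about a stock SMF 1.1 login form, and the helper names are made up for illustration; check the real form's HTML on the forum in question first.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(forum_url, username, password):
    """Build (but do not send) the login POST.

    ASSUMPTION: the action name and field names below mirror a stock
    SMF 1.1 login form; verify them against the forum's own login page.
    """
    form = {
        "user": username,
        "passwrd": password,
        "cookielength": "60",  # assumed field: minutes to stay logged in
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    # Supplying `data` makes urllib send this as a POST.
    return urllib.request.Request(forum_url + "?action=login2", data=data)


def fetch_page(opener, url):
    """Download one page as text using the session's cookies."""
    with opener.open(url, timeout=30) as response:
        # Older forums are often not UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")
```

Usage would be roughly: call `opener.open(build_login_request(...))` once, then call `fetch_page(opener, url)` for each page, so the login cookie set by the first request is sent with the later ones.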
Title: Re: How to externally copy or archive an SMF forum?
Post by: Chas Large on February 26, 2010, 04:45:11 PM
Have you tried Forum Downloader (http://www.softpedia.com/get/Internet/Download-Managers/Forum-Downloader.shtml) ?
Title: Re: How to externally copy or archive an SMF forum?
Post by: mpoloukhine on February 26, 2010, 06:01:51 PM
Thanks, hadn't tried that one, but it has the same general problem with logging in as the other software I've tried.
Title: Re: How to externally copy or archive an SMF forum?
Post by: mpoloukhine on February 26, 2010, 06:37:08 PM
I'm not savvy enough to sort out exactly what goes wrong, but there's something about how that forum is set up that causes problems for these programs in the login process and/or in reading the available sub-boards.
Title: Re: How to externally copy or archive an SMF forum?
Post by: kat on February 27, 2010, 04:54:58 AM
Quite right, too, or we'd all find our forum appearing somewhere else.


Yeah, to get everything, you'll need the admin's login details.
Title: Re: How to externally copy or archive an SMF forum?
Post by: mpoloukhine on February 27, 2010, 12:40:04 PM
Quote from: St. Kat on February 27, 2010, 04:54:58 AM
Quite right, too, or we'd all find our forum appearing somewhere else.


Yeah, to get everything, you'll need the admin's login details.
Yes, I can appreciate that. Just frustrated in that I'm not looking to scam the forum but to save it.

What I'm getting at is that there seems to be no reason a script can't be written to harvest each page as the user visits it. Instead of only rendering the page to the screen, it would render it to the screen AND save it to an HTML file. The user would do the navigating, with the script "filming" the results as HTML.

No security setup would block that, as it's on the user side, at user speed.

Taking it one step further, there's no reason a script couldn't be written to do the same in an automated fashion, just slowed down to user speed.
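
A rough Python sketch of that automated, slowed-down version might look like the following. It is an illustration under assumptions, not a finished archiver: `crawl`, `LinkCollector`, the `pageNNNNN.html` file naming, and the 5-second default delay are all made up here, and `fetch` stands for whatever function actually downloads a page (logged in or not).

```python
import time
import urllib.parse
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, out_dir, delay=5.0, max_pages=100):
    """Breadth-first crawl of one site, saving each page as an HTML file.

    `fetch` is any callable that takes a URL and returns the page text.
    Returns a dict mapping each saved URL to the path of its file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    host = urllib.parse.urlsplit(start_url).netloc
    queue, seen, saved = [start_url], {start_url}, {}

    while queue and len(saved) < max_pages:
        url = queue.pop(0)
        html = fetch(url)
        path = out_dir / f"page{len(saved):05d}.html"
        path.write_text(html, encoding="utf-8")
        saved[url] = path

        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#..." fragments.
            link = urllib.parse.urldefrag(urllib.parse.urljoin(url, href)).url
            # Stay on the same host and never queue a URL twice.
            if urllib.parse.urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)

        time.sleep(delay)  # pause between requests, roughly "user speed"
    return saved
```

Note that the saved files keep their original links, so they still point at the live forum; rewriting them to point at the local copies would be a further step.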

Title: Re: How to externally copy or archive an SMF forum?
Post by: MrPhil on February 27, 2010, 01:04:03 PM
A few weeks ago (give or take) there was a query about using an external site scraper tool to archive an SMF forum as a collection of static HTML files. It was on this board, but I can't seem to find it due to the broken search (won't go to page 2)! Once search comes back, look for HTML+forum+ maybe "archive"? If the contents of the forum in question are not hidden from guests, all sorts of off-the-shelf site scrapers should work. If you need to log in with your ID and password, there still may be some that work.
Title: Re: How to externally copy or archive an SMF forum?
Post by: mpoloukhine on February 27, 2010, 02:52:55 PM
Thanks MrPhil, found this thread (http://www.simplemachines.org/community/index.php?topic=329334.msg2198262#msg2198262).

It references GNU Wget (http://www.gnu.org/software/wget/) and the Windows version (http://pages.interlog.com/~tcharron/wgetwin.html), but that's pretty much a routine to run from the admin access side. Pretty sure that won't work from the user side.

[edit]Modified some of the search filters to get more results and also found this thread (http://www.simplemachines.org/community/index.php?topic=236280.msg1522074#msg1522074). Same basic sources there too, and looking to do it from the admin side. I'm trying to save it from the user side.

[edit2]To get more search results while search is busted, display searches by date, and then re-search using the "days" search filter to filter out by date the threads the search already showed on page one.
Title: Re: How to externally copy or archive an SMF forum?
Post by: MrPhil on February 27, 2010, 05:03:37 PM
Neither of those threads was the recent one I recall seeing, but keep looking!

If you can access the forum content as a guest (without signing on), anything that acts like a search engine spider should be able to crawl the whole site and grab everything. If you have to be signed on, it could get a bit more complicated. In either case, I don't see why you would need "admin" access.
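
One practical wrinkle for anything crawling "like a spider": forum links tend to include anchors, session IDs, and links that do things (log out, post, and so on). A minimal Python sketch of a URL filter follows. The `topic=...msg...#msg...` shape comes from the thread links quoted above; the `action=` and `PHPSESSID` handling and the `;` separator are assumptions about how an SMF-style forum builds its URLs, and `normalize` is a made-up helper name.

```python
import re
import urllib.parse

# ASSUMPTION: SMF-style URLs keep everything in the query string of index.php,
# separate parameters with ";" or "&", and may carry a PHPSESSID for guests.
PARAM_SPLIT = re.compile(r"[;&]")


def normalize(url):
    """Return a canonical form of `url`, or None if it should not be crawled."""
    url = urllib.parse.urldefrag(url).url  # drop "#msg123" anchors
    parts = urllib.parse.urlsplit(url)
    params = [p for p in PARAM_SPLIT.split(parts.query) if p]

    # Skip anything that performs an action (logout, post, ...).
    if any(p.startswith("action=") for p in params):
        return None

    # Drop per-visitor session IDs so the same page is not fetched twice.
    params = [p for p in params if not p.startswith("PHPSESSID=")]
    return urllib.parse.urlunsplit(parts._replace(query=";".join(params)))
```

For example, `normalize("http://f.example/index.php?action=logout;sesc=xyz")` returns `None`, which matters most when crawling while signed on: following a logout link would end the session partway through.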