Create a standalone HTML version of your messageboard - beta

Started by inthe80s, March 27, 2007, 09:59:16 PM

inthe80s

This is TOTALLY BETA!  Never given out before, use at your own risk!!!

Archive Script - Beta 0.1
by Charles R. Grosvenor Jr.  - http://www.chuckyg.com - [email protected]

This script can be seen in use at http://www.inthe00s.com/archive/

This script can be downloaded at http://www.inthe00s.com/archive/archive.zip

Purpose:
To capture all of the threads and messages from an active SMF board and build static HTML pages.

Requirements:
You will need the ability to upload PHP scripts.

You will need to edit this program's configuration files if you wish to disable certain boards from being created.

You will need Server Side Includes (SSI) in order for the navigation links to be visible on the message pages.

To customize, you will need to understand HTML.
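
On Apache, SSI for .shtml files is typically switched on with mod_include directives like the ones below, placed in the server configuration or in an .htaccess file. This is a generic sketch, assuming an Apache 2.x host that permits these overrides; it is not taken from the script's readme, other web servers use different settings, and many hosts already have SSI enabled.

```apache
# Allow server-side includes in this directory
Options +Includes
# Serve .shtml files as HTML and pass them through the SSI filter
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
```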

Installation:
Read the included readme for the details.

Why?
If you're like me, you want the search engines to index your site, but SMF isn't designed to allow search engine spiders easy access.  They hit servers kind of hard, tend to index a lot of stuff you'd prefer they didn't, and generally don't bring in much traffic anyways. 

So this script reads all the messages from your SMF board, builds some ASCII datafiles, and then builds static HTML pages from those datafiles.  You can point the search engine to look at your archive (also builds a sitemap.xml for the search engines) instead of your regular forum. 

With adsense, I found that I made way more money from the archives than I could from the ads on the messageboard.  I even ended up removing the ads from everything except the archives.

What's the Catch?
I'm pretty busy battling some spam issues with my server and one of my sites at the moment, so I have no idea how quickly I can respond to questions people have.  This is also the very first public version available, so I have no idea how friendly it is for someone who has never used it before.

It doesn't write anything to the database, so you shouldn't need to worry about backing anything up, but that's up to you.  If you don't back it up and something happens, it ain't my problem.  I've been using these scripts for 3 years without an issue.
Running SMF since May '04.  Started with YaBB on Oct 2001.

ladynada

EDITED!

DUH!  I found the changed archive_defaults_auto.php that your program created, in the protected folder!  It worked... now lemme continue the process, and I will reply as to how it goes.  Still, you need to know about the error I was reporting below.  And yes, it also made one on my home PC.  THANKS
Original reply below.



Hello,

I ran the setup on my test setup on home pc with apache mysql and php 4.6 and I get this error

Notice: Undefined offset: 0 in D:\public_html\bbs\protected\archive_setup.php on line 60

line 60 is in this section of code


# build directory array
ksort($arrayName);
for ($loop=0; $loop<=$maxId; $loop++ ){
    if ( empty($boards[$loop]) ) {
        $dirName = preg_replace("/\W/", "", strtolower($boardTitles[$loop]));
        $boards[$loop] = $dirName;
    }



specifically this line:



$dirName = preg_replace("/\W/", "", strtolower($boardTitles[$loop]));


I am excited about using your program.  Any ideas what's the matter?

I also tried on my live bbs and it did not give the error but I assume it had the same error, because it did not create any files in the archive folder.  it did make the archive folder though.  on my home pc, the same thing happened, and when I changed the error reporting on the php.ini file, it also ran and did nothing, and did not report this error.  so, on my home pc, when I have error reporting from php.ini on STRICT, then it does tell me the error is on line 60

what are those "/\W/"  ???

thanks,
nada
WORK for Truth, Print it, Take Time to READ ALL LINKS NOTED  click here --> The TWO Witnesses are Mom and Dad and SMF Skins
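
On the "/\W/" question above: in a PHP regular expression, \W matches any single character that is not a letter, a digit, or an underscore, so the preg_replace call strips spaces and punctuation from the lowercased board title to produce a directory name. The notice itself most likely means $boardTitles has no entry at index 0. Below is a minimal sketch of a guarded loop, assuming the same variable names as the excerpt. The boardDirName helper and the sample titles are made up for illustration and are not part of the archive script.

```php
<?php
// Hypothetical helper (not from the archive script): turn a board title
// into a directory name by lowercasing it and removing every non-word
// character (anything other than letters, digits and underscore).
function boardDirName($title)
{
    return preg_replace('/\W/', '', strtolower($title));
}

// Sample data for illustration only; note there is no board with ID 0.
$boardTitles = array(1 => 'Apostolic News', 3 => 'Breaking and Urgent!');
$boards = array();
$maxId = 3;

for ($loop = 0; $loop <= $maxId; $loop++) {
    // Skip IDs that have no title instead of reading an undefined offset.
    if (!isset($boardTitles[$loop])) {
        continue;
    }
    if (empty($boards[$loop])) {
        $boards[$loop] = boardDirName($boardTitles[$loop]);
    }
}

print_r($boards);
```

With the isset() check in place, IDs 0 and 2 are simply skipped, so the loop no longer depends on the error reporting level set in php.ini.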

ladynada

I have an update.

I have finished the process on my HOME bbs.  now, the part where you run the archive_create_indexfile.php
and then click the Start at the beginning, it took 13 minutes for 332 threads and 554 messages.

then, it made lotsa files in folders for each board, and separate sitemap.xml file INSIDE each folder for each board, and one BIG one that refers to the little ones.  Hope that is clear to folks. ok lookit, the big sitemap.xml has entries like this:


<sitemap>
<loc>http://localhost/bbs/archive/apostolicnews/sitemap.xml</loc>
<lastmod>2007-04-07</lastmod>
</sitemap>


and then inside that folder is another sitemap.xml with entries like this:


<url>
<loc>http://localhost/bbs/archive/apostolicnews/smf/1172106074.shtml</loc>
<lastmod>2007-04-07</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>


and also entries for monthly index.
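
The two-level layout described here matches the sitemaps.org protocol: a sitemap index lists the per-board sitemaps, and each of those lists the page URLs. Below is a rough sketch of how such an index could be built. It is not the archive script's actual code; the buildSitemapIndex function, the base URL and the board list are assumptions for illustration.

```php
<?php
// Hypothetical sketch (not the archive script's code): build a sitemap
// index that points at one sitemap.xml per board directory.
function buildSitemapIndex($archiveUrl, $boardDirs, $lastmod)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($boardDirs as $dir) {
        // Escape the URL so characters such as & stay valid inside XML.
        $loc = htmlspecialchars(rtrim($archiveUrl, '/') . '/' . $dir . '/sitemap.xml');
        $xml .= "<sitemap>\n";
        $xml .= "<loc>" . $loc . "</loc>\n";
        $xml .= "<lastmod>" . $lastmod . "</lastmod>\n";
        $xml .= "</sitemap>\n";
    }
    $xml .= "</sitemapindex>\n";
    return $xml;
}

// Example values only; substitute your own archive URL and board folders.
$index = buildSitemapIndex(
    'http://localhost/bbs/archive',
    array('apostolicnews', 'breakingandurgent'),
    '2007-04-07'
);
echo $index;
```

Only the top-level index needs to be submitted to the search engine; the per-board files are found through it.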

Now, I have a question.  In the instructions, it says:

Quote from: instructions
You will want to create a default file in the archive folder and include the
index_include.shtml if you want to use it for the list of the boards
available.


I have NO IDEA what this default file should be, nor how to include  the index_include.shtml for, what??  I do not understand this quoted paragraph at all.  Can anyone help me??

I am going to run it now on my live bbs.  I like the idea, and need a better sitemap.xml for google.  I would like to have had SEARCH ENGINE FRIENDLY URLS in this???  Maybe the author has considered that and found using these date codes better?  I do not know. but what is nice, if I understand this, is that the urls point to archived messages, so if your forum crashes, they are still accessible...

however, if people reach them from search engines, they will get these plain vanilla pages with the messages?  am I right?  so there is a lot to consider.

I may be wrong about all that, I am so new to all this, maybe the spider will go to the real page and get that???  LOL

I have not figured out how to add header and footer yet.

This is fun!  thank you for sharing this program with us!

nada


ladynada

took 11 minutes on the live bbs, about the same number of threads and messages.. gonna upload it to google, and see how it goes.
nada

ladynada

update:

I submitted the sitemap.xml to google and it has been accepted and read, had 486 links in it, where my old one only had like 70.  so we shall see, after the next time the bot comes by, how it works out.

thanks again to the author for being generous to share a personal program!
nada

spiros

Haven't tested it yet but it looks impressive!

Can I ask you something, is there a way to create a hyperlinked table of contents in a single page (containing hyperlinked topic titles only) for each board?

There is also a SEO script for SMF which also does similar things, http://www.webmasterstalks.com/tpmod.html;dl=item46 any experience on that?

ladynada

I keep getting error 404 on the shtml files and do not understand what they are and why this happens. I did some searches on google but understand even less.  seems like a command on the htaccess is needed but I want to know what I am doing before editing it.
can someone explain?
nada

ladynada

Quote from: instructions
You will want to create a default file in the archive folder and include the
index_include.shtml if you want to use it for the list of the boards
available.

can anyone answer this?

google bot is getting errors on these pages..
nada

ladynada

I get an error that says

[an error occurred while processing this directive]

when I run this

http://www.heartdaughter.com/overcomers/archive/breakingandurgent/index_04_2007.shtml

here is what is inside the file



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>The Two Witnesses Messages from April 2007</title>
<meta name="description" content="The Two Witnesses Messages from April 2007" />
<meta name="keywords" content="April, 2007, forum, messages" />
<meta name="mssmarttagspreventparsing" content="true" />
<meta name="classification" content="personal" />
<meta name="robots" content="all" />
<meta name="revisit-after" content="14 days" />
<meta name="distribution" content="global" />
<meta name="resource-type" content="document" />
<meta name="robots" content="all" />
<meta http-equiv="imagetoolbar" content="no" />
<style type="text/css" title="currentStyle">
/*******************************************************************************
  Message stuff
*******************************************************************************/
/* surrounds entire message */
.MessageEntire {
padding: 20px;
border: 1pt solid black;
}

/* surrounds all the messages, not the header and footer */
.MainBody{
padding-left: 20px;
padding-right: 20px;
}

/* used for the stuff that appears the top of the page */
.navigationBar {
padding-left: 20px;
padding-right: 20px;
font-family: verdana, sans-serif; font-size: 10pt;
}

/* surrounds entire message */
.MessageEntire {
padding: 20px;
border: 1pt solid black;
}

.MessageSubject {
}
.MessageSubject1 {
}
.MessageSubject2 {
font-weight: bold;
}

/* encompasses the author and the date */
.MessageAuthor {
}
.MessageAuthor1 {
}
.MessageAuthor2 {
font-weight: bold;
}

.MessageDate1 {
}
.MessageDate2 {
font-weight: bold;
}

.MessageText {
}


.copyright {
padding-top: 20px;
padding-right: 10px;
font: 8pt Verdana, Tahoma, helvetica, arial, sans-serif;
font-style: italic;
text-align: right;
}
</style>
</head>

<body><br /><div class="MainBody">
<h1>The Two Witnesses</h1><p><i>Posting Here is CLOSED Until They Arrive</i></p><br /><div class="MessageLink">This is an index of topics from the <a href="http://heartdaughter.com/overcomers/index.php?board=28.0">The Two Witnesses</a> topic on the <a href="http://heartdaughter.com/overcomers/">The Two Witnesses</a><br /><div class="MessageLink"><!--#include virtual="/archive/thetwowitnesses//header_2007.html"--></div><br />
<span style="font-size: 18pt;">April 2007</span>

<ul>
<li><!-- FIRSTDATE04/19/07 at 4:03 amFIRSTDATE --><a href="/archive/thetwowitnesses/smf/1176973384.shtml">THE TWO WITNESSES ARE ELIJAH AND THE DAUGHTER OF ZION</a> by ladynada - 04/19/2007</li>
<li><!-- FIRSTDATE04/04/07 at 1:57 pmFIRSTDATE --><a href="/archive/thetwowitnesses/smf/1175713024.shtml">SEARCHING FOR THE TWO WITNESSES</a> by ladynada - 04/04/2007</li>
</div>


<div class="copyright">
Copyright 2007
</div>


</body>
</html>



any help would be appreciated, I wish the author would come back to check on this.

nada

inthe80s

oh boy... I didn't mark the thread to notify me of responses, so I wasn't aware anyone even tried this.  So very sorry about that.

I'll try and respond tonight to the problems people are having, I was just stopping in quickly before running out the door for the afternoon.

ladynada

Quote from: inthe80s on April 19, 2007, 01:08:55 PM
oh boy... I didn't mark the thread to notify me of responses, so I wasn't aware anyone even tried this.  So very sorry about that.

I'll try and respond tonight to the problems people are having, I was just stopping in quickly before running out the door for the afternoon.

Blessings on you for your generosity to share this mod.

I fixed my problem, but when I went to post here earlier today, my firefox crashed, and then I got busy and forgot

I suspect you have your forum in your root of your website, because the links were wrong in the shtml files for the actual messages, so all I had to do was change the location in one of your files, and then it works GREAT!

the answer for me was in here


# the root url of the messageboard, everything that precedes /index.php in your URL
$server_root = $boardurl;
# name of the site
$boardName = $mbname;
# ABSOLUTE path of the site
$rootDirectory = '/home/pudnintane/public_html/';



so it was the $rootDirectory that I had to edit.

it is wonderful and I am looking forward to google using the sitemap.  google was already using it, but had some 404 errors on the messages, now it will be fine

thanks again
nada

inthe80s

Quote from: ladynada on April 19, 2007, 01:16:59 PM
Blessings on you for your generosity to share this mod.
not a problem, several people have asked me in the past for it, but the code was pretty messy and not well suited for others.

Quote
I fixed my problem, but when I went to post here earlier today, my firefox crashed, and then I got busy and forgot

I suspect you have your forum in your root of your website, because the links were wrong in the shtml files for the actual messages, so all I had to do was change the location in one of your files, and then it works GREAT!

the answer for me was in here


# the root url of the messageboard, everything that precedes /index.php in your URL
$server_root = $boardurl;
# name of the site
$boardName = $mbname;
# ABSOLUTE path of the site
$rootDirectory = '/home/pudnintane/public_html/';



so it was the $rootDirectory that I had to edit.

it is wonderful and I am looking forward to google using the sitemap.  google was already using it, but had some 404 errors on the messages, now it will be fine

cool, glad you were able to figure it out. 

Yes, I do have my messageboard in my root, I tried to extract out most of the paths so it would be easy to change for people who didn't.  if you have any recommendations on what I could add to the readme file to help others who may have a similar issue, let me know and I'll add it.

google, yahoo and msn all support a new addition to your robots.txt file, so once you're comfortable with your sitemap working, you can add a line in your robots.txt for all the search engines to find it automatically.

inthe80s

Quote from: spiros on April 12, 2007, 07:57:22 AM
Haven't tested it yet but it looks impressive!

Can I ask you something, is there a way to create a hyperlinked table of contents in a single page (containing hyperlinked topic titles only) for each board?

There is also a SEO script for SMF which also does similar things, http://www.webmasterstalks.com/tpmod.html;dl=item46 any experience on that?

the SEO script looks like it just modifies the links to your existing forum topics and indexes so they look like regular html files. 

The archive program I have is designed to make a backup of your database into a simple text database and then build a second low-bandwidth version of the topics as static html files. 

In my opinion, the SEO script isn't very useful.  Google will index a forum whether it's got regular php links or not.  I know because I had to block all search engines from my forums and point them to the static version I build, so they don't index the print links, and the reply links, etc.

I don't think either script is doing what you need it to.  You want just a list of forum topics per board?  I could probably modify one of the scripts to create that, I don't think it would take very long.

ladynada

Quote from: inthe80s on April 19, 2007, 07:04:14 PM
google, yahoo and msn all support a new addition to your robots.txt file, so once you're comfortable with your sitemap working, you can add a line in your robots.txt for all the search engines to find it automatically.

Hi,

I do not understand what the line in robots.txt file should be to point it to this archive. Please tell me cuz that is exactly what I want to do.


oh and.. well, actually I give credit to God for helping me figure out what to change to make it work for me.  ummm.. because the key was to NOT change whatever variable you were using to KNOW where my forum was and its messages.  I only wanted to change the variable you were using to WRITE the addresses of where the plain text messages were kept.  I knew it didnt matter whether the archive folder was in my root or in my forum folder, so that was easy to ignore.  so actually I fixed it but did not know how (God helps me)

I hope that helps you figure out what to say in the readme to help folks know which one to edit and what to put there.  It WAS GOOD that your descriptions highlighted the word URL so I knew that would require an http: and not a /home type thing. 


thanks,
nada

ladynada

Oh I forgot.  I actually have a multitude of OTHER questions for you!

I could not figure out how to use the header and footer files you mentioned. you did not include any.  I just put some html inside there, and it worked, but I want to include a php file actually so I can track when people and bots access the archive.

http://heartdaughter.com/archive/index_include.shtml

you can see mine here.

I added a little menu, it looks flaky.  I dont understand the coding you used where you had code and arrows and stuff.. I read the help file for apache...  what I really need, as I say, is to INCLUDE a php file for my bot tracker.

I kinda figure that I could make an index.php and you said include that other file in it, and that way I could have my tracker, but I want the tracker on every page..

I hope this all makes sense!

thanks again,
nada

inthe80s

Quote from: ladynada on April 19, 2007, 10:13:37 PM
Oh I forgot.  I actually have a multitude of OTHER questions for you!

I could not figure out how to use the header and footer files you mentioned. you did not include any.  I just put some html inside there, and it worked, but I want to include a php file actually so I can track when people and bots access the archive.

http://heartdaughter.com/archive/index_include.shtml

you can see mine here.

I added a little menu, it looks flaky.  I dont understand the coding you used where you had code and arrows and stuff.. I read the help file for apache...  what I really need, as I say, is to INCLUDE a php file for my bot tracker.

I kinda figure that I could make an index.php and you said include that other file in it, and that way I could have my tracker, but I want the tracker on every page..

I hope this all makes sense!

thanks again,
nada


I don't think you can include PHP files in there, because the file extensions are .shtml.  I would have to rewrite the scripts to output PHP files.  The headers and footers can only be HTML code.

arrows and stuff?  I don't know what you're talking about

The index_include.shtml is meant to be used inside an index.shtml file you make yourself; that way you can give out just a domain.com/archive link instead of a longer path.  A file something like this:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Messageboard Archive Index</title>
<meta name="description" content="Messages from the forum bbs" />
<meta name="keywords" content="forum, messages, index" />
<meta name="mssmarttagspreventparsing" content="true" />
<meta name="classification" content="personal" />
<meta name="robots" content="all" />
<meta name="revisit-after" content="14 days" />
<meta name="distribution" content="global" />
<meta name="resource-type" content="document" />
<meta name="robots" content="all" />
<meta http-equiv="imagetoolbar" content="no" />
<style type="text/css" title="currentStyle">
li { padding-bottom: 8px; font-size: 14pt; }
ul { padding-bottom: 8px; }
.headings { font-size: 18pt; width: 800px;}
</style>
<script type="text/javascript"> </script>
</head>

<body>
<!--#include virtual="index_include.shtml"-->

</body>
</html>

ladynada

oooh you are a dream!

thank you!

I will use that.  and yes that answered my question.

nada

yes its nice

http://heartdaughter.com/archive/index.shtml

now who do I tell to look at that?  robots?


inthe80s

Quote from: ladynada on April 20, 2007, 09:42:40 PM
oooh you are a dream!

thank you!

I will use that.  and yes that answered my question.

nada

yes its nice

http://heartdaughter.com/archive/index.shtml

now who do I tell to look at that?  robots?



make a file called robots.txt with Notepad and upload it to the ROOT of your domain.  Place this line in it

Sitemap: http://heartdaughter.com/archive/sitemap.xml

If you do a Google search for robots.txt, you will find more about telling the search engines how to better index your site.
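
A slightly fuller robots.txt, as a sketch only: the Sitemap line is the one given above, while the Disallow rule is an assumption showing how spiders could be kept off a live forum that sits in a folder. The /forum/ path is hypothetical; adjust it or drop the rule to suit your own layout.

```
User-agent: *
Disallow: /forum/
Sitemap: http://heartdaughter.com/archive/sitemap.xml
```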

ladynada

Quote from: inthe80s on April 21, 2007, 08:10:38 AM
make a file called robots.txt with Notepad and upload it to the ROOT of your domain.  Place this line in it

Sitemap: http://heartdaughter.com/archive/sitemap.xml

If you do a Google search for robots.txt, you will find more about telling the search engines how to better index your site.


I unastan!  thank you so much.  I fixed it.

nada
