Sitemap

Started by SlammedDime, May 12, 2007, 05:43:04 PM

nexxuscorp

Hello SlammedDime,

Thanks for the response. I have changed my Modifications.french.php, but the error still appears...

SlammedDime

Did you clear your file cache?
SlammedDime
Former Lead Customizer
BitBucket Projects
GeekStorage.com Hosting
                      My Mods
SimpleSEF
Ajax Quick Reply
Sitemap
more...
                     

nexxuscorp

Thanks a lot SlammedDime, all seems to be OK.
I forgot to clear the cache.

Thank you :p

beltazar

Hi SlammedDime,

This is a great mod. I installed it on SMF 2.0 RC2 and it works fine,
but Google Webmaster Tools reports this:

URL restricted by robots.txt
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.


This is my robots.txt:


User-agent: Mediapartners-Google*
Disallow:

User-agent: BotRightHere
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: Copernic
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: TightTwatBot
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: Openfind data gatherer
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: *
Disallow: /*action*
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: index.php?action=search*
Disallow: index.php?action=calendar*
Disallow: index.php?action=login*
Disallow: index.php?action=register*
Disallow: index.php?action=profile*
Disallow: index.php?action=stats*
Disallow: index.php?action=printpage*
Disallow: index.php?PHPSESSID=*
Disallow: index.php?*rss*
Disallow: index.php?*wap*
Disallow: index.php?*wap2*
Disallow: index.php?*imode*

User-agent: Mediapartners-Google
Allow:


What should I do?




SlammedDime

Why on earth do you have all of that?  Seems a bit overkill.

The line that is killing it is this...

User-agent: *
Disallow: /*action*

You're telling all user agents not to visit any SMF links that have "action" in them, and the sitemap URL is one of those links.
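
A minimal sketch of why that pattern catches the sitemap, assuming Google-style wildcard matching (`*` matches any run of characters, a trailing `$` anchors the end) and assuming the sitemap lives at `index.php?action=sitemap;xml`. The helper names are made up for illustration, and the sketch ignores `Allow` rules and per-agent groups, so treat it as a rough check rather than a full robots.txt parser.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with wildcards into a regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the URL, and everything else is taken literally. Patterns are matched
    from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any non-empty Disallow pattern matches the path."""
    return any(
        robots_pattern_to_regex(p).match(path) is not None
        for p in disallow_patterns
        if p
    )


# Assumed URLs, for illustration only.
SITEMAP_PATH = "/index.php?action=sitemap;xml"
TOPIC_PATH = "/index.php?topic=123.0"

print(is_blocked(SITEMAP_PATH, ["/*action*"]))  # True: the sitemap is blocked
print(is_blocked(TOPIC_PATH, ["/*action*"]))    # False: plain topic links pass
```

Running candidate URLs through a check like this before resubmitting the sitemap shows which rule, if any, still restricts it.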

Massl

Thanks for this great mod SlammedDime ;)

CrazyTech

I'm not sure what's up, but I've got everything installed and working.

However, I'm not getting indexed well at all. In fact, I'm sitting at fewer than 200 URLs where I easily had several thousand at minimum with vB prior to the conversion. I understand this takes time, and I've given it a couple of weeks to see. I guess what troubles me the most is that the XML sitemap shows only about 8,000 URLs, far fewer than the 11,000+ threads I have at the least. robots.txt, etc. are all fine. I've tried increasing the Google indexing, for example, without much change.

Is there something I'm missing with only 8,000 URLs and so few actually indexed? Just to compare, I installed the vB sitemap and within 24 hours I had an additional 1,200 pages indexed.

SlammedDime

The sitemap only provides a list of topics and boards to the search engines, it makes no promise or guarantee that you will be indexed... there is no mod that can and I can't tell you why vB works better than SMF with indexing.  Search engines index content not just because it's there, but because they deem it worthy to be indexed.

As for the number of URLs in the sitemap: you didn't mention which version of SMF you're using, but if you're using SMF 2.0, the XML sitemap only displays topics viewable to guests.  If you're using SMF 1.1.x, the XML sitemap displays whatever the viewer has access to see: a guest (Google) would only see topics a guest can see, while you as an admin would be shown all topics.  So if you have some topics that aren't available to guests, that is where the difference comes in.
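
To compare what a guest gets with what an admin gets, one option is to save the XML from each session and count the entries. This is only a rough sketch: the function name is made up, and it counts every `<loc>` element regardless of namespace, since the exact schema the mod emits is not shown in this topic.

```python
import xml.etree.ElementTree as ET


def count_sitemap_urls(xml_text):
    """Count <loc> elements in a sitemap document, ignoring the namespace."""
    root = ET.fromstring(xml_text)
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "loc")


# Tiny stand-in for a downloaded sitemap; the URLs are placeholders.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/index.php?topic=1.0</loc></url>
  <url><loc>http://example.com/index.php?topic=2.0</loc></url>
</urlset>"""

print(count_sitemap_urls(sample))  # 2
```

A file that was cut off partway through is not well-formed XML, so `ET.fromstring` raises `ET.ParseError` on it instead of returning a count.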

CrazyTech

Quote
The sitemap only provides a list of topics and boards to the search engines, it makes no promise or guarantee that you will be indexed... there is no mod that can and I can't tell you why vB works better than SMF with indexing.  Search engines index content not just because it's there, but because they deem it worthy to be indexed.

Well that's what I'm concerned about. If the only real variable is the software...which by the way is SMF 2.0 RC2.

I'm rehashing the sitemap just to make sure. Given your points above, I don't necessarily feel like it's a sitemap issue. I'm going to double-check all of the forums, but they should certainly be accessible to guests and therefore to search engine bots. One thing I did notice: I used the Googlebot download tool in Google Webmaster Tools, and it seems to download only a portion of the sitemap. It abruptly stops a little way down, in the middle of an entry. I'm just not sure if that's by design in the tool, or if that's what is actually happening to the sitemap. I know the vB mod generated 4-5 XML pages for my forum, and it looks like the MyBB mod does the same. I'm not sure if that has any bearing at all, but it's a thought. The only other difference is that the vB sitemap is compressed.

I'm definitely not blaming anything yet, so please don't take it the way that I'm complaining about the sitemap. I appreciate the work you do and releasing it to this community, so thanks!

SlammedDime

If Googlebot is stopping halfway through, then it may not like how long the page takes to generate... I put in set_time_limit calls to extend how much time PHP can take to generate the page, but if Google doesn't accept that it can take longer, that might be an issue I'll have to look into.

As for multiple pages - I never really considered it.... would be something to look into.
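
For anyone curious what multiple pages could look like, here is a rough sketch, not how the mod works. The sitemaps.org protocol allows up to 50,000 URLs per file plus a sitemap index that lists the files. The `page=` URL scheme and all function names below are hypothetical.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_FILE = 50000  # per-file URL limit in the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def chunk(urls, size):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Render one sitemap page containing the given URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{XML_HEADER}<urlset xmlns="{NS}">\n{entries}</urlset>\n'


def build_index(page_urls):
    """Render a sitemap index that points at each sitemap page."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in page_urls
    )
    return f'{XML_HEADER}<sitemapindex xmlns="{NS}">\n{entries}</sitemapindex>\n'


def build_paged_sitemaps(urls, base_url, per_file=MAX_URLS_PER_FILE):
    """Split the URLs into pages and return (index_xml, list_of_page_xml)."""
    pages = [build_urlset(part) for part in chunk(urls, per_file)]
    # Hypothetical URL scheme for the individual pages.
    page_urls = [
        f"{base_url}?action=sitemap;xml;page={i}"
        for i in range(1, len(pages) + 1)
    ]
    return build_index(page_urls), pages


# Example: 11,000 placeholder topic URLs, 2,500 per page.
topics = [f"http://example.com/index.php?topic={n}.0" for n in range(1, 11001)]
index_xml, pages = build_paged_sitemaps(
    topics, "http://example.com/index.php", per_file=2500
)
print(len(pages))  # 5
```

Smaller pages also mean each request takes less time to generate, which may help if the crawler gives up on slow responses.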

beltazar

Quote from: SlammedDime on December 11, 2009, 01:16:15 AM
Why on earth do you have all of that?  Seems a bit overkill.

The line that is killing it is this...

User-agent: *
Disallow: /*action*

You're telling all user-agents to not visit any links in smf that have action in them.

Oops... thanks, SlammedDime.

I've deleted those two lines, but Google still says the URL is restricted by robots.txt.
What should the correct robots.txt look like?

Thanks

spiros

Is there a way, when showing topics, not to have the board name linked?
Also, could one define the number of topics per page?

If you have board names linked it makes 200 links per page, which, I think, is twice the recommended amount by Google.

SlammedDime

This very page has over 294 links now that I've posted to it... I can't see how the number of links per page would hamper a bot, at least not in this day and age, when it takes a fraction of a second to run a regex over the page and capture all the links.

As for your questions though: the number of topics per page is not settable through an admin setting, but it can be changed in the Sitemap.php Source file (assuming you're using 2.0), under the listOptions array for topics.  The same goes for linked boards: that would have to be edited under the listOptions for topics.

spiros

Thanks for your answer. Can you please tell me what code I need to change to unlink Boards in topics view?

Quote
Keep the links on a given page to a reasonable number (fewer than 100).
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769

SlammedDime

There is a significant amount of debate as to whether or not that actually matters... Google Search

As for the code

Code (Find)
'format' => '[<a href="' . $scripturl . '?board=%d.0">%s</a>] <a href="' . $scripturl . '?topic=%d.0">%s</a>',


Code (Replace)
'format' => '[%s] <a href="' . $scripturl . '?topic=%d.0">%s</a>',

spiros

Just tried the change but it does not work. This is what it gives me:

Topic Started by Views Replies
[17] 6719 feedbot 11 0
[17] 6718 feedbot 6 0
[17] 6717 feedbot 8 0
[17] 6716 feedbot 11 0
[17] 6715 feedbot 10 0
[17] 6714 feedbot 7 0
[17] 6713 feedbot 7 0
[17] 6712 feedbot 10 0


And all links point to the same URL:
http://www.nonsmokersclub.com/forum/index.php?topic=0.0

instead of:

Topic Started by Views Replies
[Health news] Foetal heart rate monitor warning feedbot 8 0
[Health news] Rest Easy. When It Comes to Swine Flu, Your Pet Is Safe (HealthDay) feedbot 8 0
[Health news] For Gene Therapy, Seeing Signs of a Resurgence feedbot 8 0
[Health news] Young 'must have swine flu jab' feedbot 8 0
[Health news] Medical aid group raises alarm about AIDS funding (AP) feedbot 8 0
[Health news] Skin Deep: Surgery at a Spa? Buyer Beware. feedbot 9 0
[Health news] Tackle work stress, bosses told feedbot 8 0
[Health news] Democrats wrestle with abortion on health bill (Reuters) feedbot 6 0

SlammedDime

Sorry, one more line of code to remove...

This line is right below the one you modified.

Code (Remove)
'id_board' => false,
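
Why that extra line matters, as a rough sketch: the format string is filled positionally, so once the board-link placeholder is gone, a leftover board ID shifts every value one slot to the left. This assumes the mod passes the listed columns to sprintf in order, which is an inference from the output posted above. The helper and the simplified format strings are made up for illustration and use only `%s`. In PHP, the `%d` in the real format would turn a non-numeric board name into 0, which would explain the `topic=0.0` links.

```python
def php_like_format(fmt, args):
    """Fill '%s' placeholders left to right, ignoring surplus arguments.

    PHP's sprintf silently ignores extra arguments, whereas Python's %
    operator raises, so the surplus is trimmed here to mimic PHP.
    """
    needed = fmt.count("%s")
    return fmt % tuple(args[:needed])


# Simplified stand-ins for the original and the edited format strings.
OLD_FORMAT = "[board=%s %s] topic=%s %s"
NEW_FORMAT = "[%s] topic=%s %s"

# Columns in order: id_board, board name, id_topic, subject.
row_with_board_id = (17, "Health news", 6719, "Foetal heart rate monitor warning")
row_without_board_id = row_with_board_id[1:]

print(php_like_format(OLD_FORMAT, row_with_board_id))
# [board=17 Health news] topic=6719 Foetal heart rate monitor warning

print(php_like_format(NEW_FORMAT, row_with_board_id))
# [17] topic=Health news 6719   <- values shifted one slot to the left

print(php_like_format(NEW_FORMAT, row_without_board_id))
# [Health news] topic=6719 Foetal heart rate monitor warning
```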

spiros

Brilliant, many thanks :)

Massl

Hi,
I installed this mod today, but the SMF 2.0 error log shows these errors.

Is that normal?

SlammedDime

Quote from: SlammedDime on November 12, 2009, 04:57:42 PM
That means your language strings were not updated in your language file.  If you have any other Modifications.[language].php file OTHER than english, you need to copy what is in it to the other languages.