Simple Machines Community Forum

Archived Boards and Threads... => Archived Boards => SMF Feedback and Discussion => Topic started by: pothead - February 08, 2008, 02:06:23 PM

Title: Bots?
Posted by: pothead - February 08, 2008, 02:06:23 PM
I've recently been noticing Google bots on my site. Even if I ban their IPs, they still get in.
I'm wondering if this is bad or good? Can this slow my forum down?
And why do they do this? Checking traffic? Gathering information?

crawl-66-249-66-112.googlebot.com
IP: 66.249.66.112
There are others too, and they're routing through NY to California.

Anybody with knowledge on this is appreciated.


aloha
Title: Re: Bots?
Posted by: karlbenson - February 08, 2008, 02:15:37 PM
Google has thousands of different IPs it uses in different ranges.

If you want to stop Google, rather than IP banning, you should use a robots.txt file.
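Because the crawler's addresses span many ranges, a hostname check is usually more dependable than an IP list for telling whether a visitor really is Googlebot. The following is a minimal sketch of the reverse-then-forward DNS check, assuming genuine crawler hostnames end in googlebot.com or google.com (as the crawl-66-249-66-112.googlebot.com name above does). The function names and the injectable lookup parameters are hypothetical and exist only so the logic can be tried without network access.

```python
import socket

# Assumption: genuine Google crawler hostnames end in one of these domains.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def hostname_is_google(hostname):
    """Return True if the hostname ends in one of the assumed crawler domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_DOMAINS)


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then forward-resolve the
    hostname and confirm that it maps back to the same IP."""
    # Default to real DNS lookups; callers may pass stand-ins for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse_lookup(ip)
        if not hostname_is_google(hostname):
            return False
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False


if __name__ == "__main__":
    # Performs live DNS queries, so the result depends on the network.
    print(is_real_googlebot("66.249.66.112"))
```

A visitor that merely sends a Googlebot user-agent string but fails this check is not the real crawler, and banning it does not affect indexing.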
Title: Re: Bots?
Posted by: pkmnfrk - February 08, 2008, 02:20:18 PM
Also, I don't see much point in trying to stop them either, unless it's a private site or something. When they're crawling the site, they're adding your pages into the Google index, so people can search for them.
Title: Re: Bots?
Posted by: pothead - February 08, 2008, 04:22:09 PM
Thank you for the information.

Title: Re: Bots?
Posted by: Ðyєgσv - February 08, 2008, 11:04:40 PM
And just to answer your other question: no, they don't slow down your site ;)
Title: Re: Bots?
Posted by: dejiman - April 13, 2008, 08:19:40 PM
Please, I have submitted my sitemap to Google, and I have a robots.txt file to keep Google out of some places I don't want indexed, but Google does not index the other parts of my site.

Please check these XML links from my forum to confirm.

Maybe I am right?
www.dejimanaire.com/sitemap.xml (http://www.dejimanaire.com/sitemap.xml)

http://dejimanaire.com/index.php?type=rss;action=.xml (http://dejimanaire.com/index.php?type=rss;action=.xml)



Title: Re: Bots?
Posted by: karlbenson - April 13, 2008, 08:35:45 PM
In your robots.txt you don't need wildcards at the end. It is IMPLIED.
Also note, only Google/Yahoo support wildcards.

I've been doing a lot in this area on my forum recently.
I've had to implement a smart robots.txt (http://www.youposted.com/robots.txt) and serve a specific robots.txt to Yahoo/Google vs. msnbot vs. every other bot, because using the wildcards invalidated my robots.txt in several checkers.
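A "smart" robots.txt of this kind can be sketched as a function that picks a rule set from the requesting User-Agent header. This is only an illustration of the idea, not the actual youposted.com implementation: the three rule sets, the user-agent tokens, and the function name are all hypothetical.

```python
# Hypothetical rule sets; the real files on youposted.com are not reproduced here.
ROBOTS_WITH_WILDCARDS = "User-agent: *\nDisallow: /*action=printpage\n"
ROBOTS_MSNBOT = "User-agent: msnbot\nDisallow: /index.php?action=printpage\n"
ROBOTS_PLAIN = "User-agent: *\nDisallow: /index.php?action=printpage\n"

# Ordered (tokens, body) pairs; the first matching token wins.
# Assumption: these substrings identify the crawlers in the User-Agent header.
VARIANTS = [
    (("googlebot", "slurp"), ROBOTS_WITH_WILDCARDS),  # Google and Yahoo
    (("msnbot",), ROBOTS_MSNBOT),
]


def robots_for(user_agent):
    """Return the robots.txt body to serve for the given User-Agent header."""
    ua = (user_agent or "").lower()
    for tokens, body in VARIANTS:
        if any(token in ua for token in tokens):
            return body
    # Every other bot gets the wildcard-free version.
    return ROBOTS_PLAIN


if __name__ == "__main__":
    print(robots_for("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

Keeping the wildcard rules away from bots and checkers that do not understand them is the whole point of the split; the fallback is deliberately the most conservative file.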
Title: Re: Bots?
Posted by: Antechinus - April 14, 2008, 05:59:25 AM
You fluffed the link, Karl. You have a double http:// there.  ;)
Title: Re: Bots?
Posted by: Aleksi "Lex" Kilpinen - April 14, 2008, 06:32:24 AM
Quote from: Ðyєgσv - February 08, 2008, 11:04:40 PM
And just to answer your other question: no, they don't slow down your site ;)
Yahoo can even bring a site completely down :D But yeah, usually search bots don't do damage of any kind. :)
Title: Re: Bots?
Posted by: karlbenson - April 14, 2008, 12:57:34 PM
Indeed, there are two very aggressive bots I'm aware of:
Omgilibot and Yahoo.

As a first step, I think it's better to use robots.txt to block off some areas (ones you can be sure Yahoo Slurp is crawling) rather than a crawl-delay.

+ Edit: oops, fixed my link ;)
Title: Re: Bots?
Posted by: dejiman - April 15, 2008, 01:28:12 PM
Quote from: karlbenson - April 13, 2008, 08:35:45 PM
In your robots.txt you don't need wildcards at the end. It is IMPLIED.
Also note, only Google/Yahoo support wildcards.

I've been doing a lot in this area on my forum recently.
I've had to implement a smart robots.txt (http://www.youposted.com/robots.txt) and serve a specific robots.txt to Yahoo/Google vs. msnbot vs. every other bot, because using the wildcards invalidated my robots.txt in several checkers.


Please, how can I add my keywords for Google on my SMF forum? How can I get the link for the SMF sitemap that I am supposed to submit to Google? Did I create this sitemap (http://www.dejimanaire.com/sitemap.xml) correctly? Should I change all the .html to .php in the created sitemap? I will appreciate your effort.
Please, how would Google index the other parts of my site with titles?
Title: Re: Bots?
Posted by: karlbenson - April 15, 2008, 01:30:05 PM
You can submit the sitemap to Google via Google Webmasters.
You can also add the Sitemap autodiscovery line to your robots.txt like I've done in mine (http://www.youposted.com/robots.txt)

E.g.:
Quote
Sitemap: http://www.youposted.com/sitemap.xml

User-agent: *
Disallow: /attachments/
... {and the rest of my disallow}

Note: the blank line between Sitemap and User-agent MUST be there.
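One way to sanity-check a robots.txt like the one quoted above is to feed it to Python's standard urllib.robotparser module and ask what it found. This is a rough sketch: the file body is the excerpt from this post (without the elided rules), the two test URLs are made up for illustration, and site_maps() requires Python 3.8 or later. How Google's own parser treats the file may differ.

```python
from urllib.robotparser import RobotFileParser

# Excerpt from the post above; the elided Disallow rules are left out.
ROBOTS_TXT = """\
Sitemap: http://www.youposted.com/sitemap.xml

User-agent: *
Disallow: /attachments/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# List of Sitemap URLs found in the file, or None if there are none.
sitemaps = parser.site_maps()

# Hypothetical URLs, used only to exercise the Disallow rule.
blocked = not parser.can_fetch("Googlebot", "http://www.youposted.com/attachments/file.zip")
allowed = parser.can_fetch("Googlebot", "http://www.youposted.com/index.php")

if __name__ == "__main__":
    print(sitemaps, blocked, allowed)
```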
Title: Re: Bots?
Posted by: dejiman - April 17, 2008, 07:21:03 AM
Please, I want you to review my sitemap and the robots.txt that I submitted to Google Webmasters. Check out this link for my sitemap: http://www.dejimanaire.com/sitemap.xml (http://www.dejimanaire.com/sitemap.xml)

Google's robots only index my sitemap.

And check out this link for my robots.txt: www.dejimanaire.com/robots.txt (http://www.dejimanair.com/robots.txt)


Please, am I on the right track?

I really need Google's robots to crawl all around my site except the restricted areas.

Thanks for your support.
Title: Re: Bots?
Posted by: karlbenson - April 17, 2008, 11:24:53 AM
You don't need wildcards on the end of all those Disallows.
It is implied.

So
action=arcade
would also block
action=arcade;andanythingthatfollows
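The implied trailing wildcard comes from how robots.txt rules are matched: a Disallow value blocks every path that starts with it. A simplified sketch of that prefix matching follows. The function name is hypothetical, and the full /index.php?action=arcade form of the rule is an assumption about how the rule would be written, since only the action=arcade fragment is shown here.

```python
def is_disallowed(url_path, disallow_rules):
    """Prefix matching: a rule blocks every path that starts with it.
    An empty Disallow value blocks nothing."""
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)


# Assumed full form of the rule; no trailing wildcard is needed.
rules = ["/index.php?action=arcade"]

if __name__ == "__main__":
    for path in ("/index.php?action=arcade",
                 "/index.php?action=arcade;andanythingthatfollows",
                 "/index.php?action=profile"):
        print(path, is_disallowed(path, rules))
```

Real crawlers add their own extensions on top of this (such as the wildcards mentioned earlier in the thread), so treat it as the baseline behaviour only.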
Title: Re: Bots?
Posted by: Antechinus - April 17, 2008, 08:01:56 PM
Quote from: karlbenson - April 15, 2008, 01:30:05 PM
You can submit the sitemap to Google via Google Webmasters.
You can also add the Sitemap autodiscovery line to your robots.txt like I've done in mine (http://www.youposted.com/robots.txt)

E.g.:
Quote
Sitemap: http://www.youposted.com/sitemap.xml

User-agent: *
Disallow: /attachments/
... {and the rest of my disallow}

Note: the blank line between Sitemap and User-agent MUST be there.

Can you recommend a good tutorial on setting up a sitemap? Never having done it before I'm completely at a loss as to how to proceed.
Title: Re: Bots?
Posted by: karlbenson - April 17, 2008, 08:12:28 PM
There are mods for SMF which can make them.

I didn't actually look at any sitemap tutorials.
I looked at SlammedDime's mod and then wrote my own.
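For anyone writing their own, the core of a sitemap is small: a urlset element in the sitemaps.org namespace with one url/loc entry per page. The sketch below uses Python's standard library and is not how the SMF mods work internally; the function name and the example URLs are hypothetical, and a real generator would pull board and topic URLs from the forum database.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a str) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL automatically.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical forum URLs for illustration.
    print(build_sitemap([
        "http://www.example.com/index.php?board=1.0",
        "http://www.example.com/index.php?topic=42.0",
    ]))
```

Optional elements such as lastmod can be added per entry in the same way as loc.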
Title: Re: Bots?
Posted by: Antechinus - April 18, 2008, 06:27:23 AM
Cool. Thanks. I'll check out his mod.
Title: Re: Bots?
Posted by: dejiman - April 18, 2008, 03:55:45 PM
Quote from: Antechinus - April 17, 2008, 08:01:56 PM
Quote from: karlbenson - April 15, 2008, 01:30:05 PM
You can submit the sitemap to Google via Google Webmasters.
You can also add the Sitemap autodiscovery line to your robots.txt like I've done in mine (http://www.youposted.com/robots.txt)

E.g.:
Quote
Sitemap: http://www.youposted.com/sitemap.xml

User-agent: *
Disallow: /attachments/
... {and the rest of my disallow}

Note: the blank line between Sitemap and User-agent MUST be there.

Can you recommend a good tutorial on setting up a sitemap? Never having done it before I'm completely at a loss as to how to proceed.


Please, did you use SlammedDime's mod for your sitemap? When I installed his sitemap mod I received an error in the index.template.php file. Will it work with that error?
Title: Re: Bots?
Posted by: karlbenson - April 18, 2008, 04:15:40 PM
I looked at SlammedDime's mod, but wrote my own. (I have, however, previously used his mod.)

No, if you get an error installing, you'll need to install it manually.
Use either the package parser on the mod site or an external one such as http://www.adrevenueshare.com/parser
Title: Re: Bots?
Posted by: dejiman - April 18, 2008, 05:12:21 PM
I just uninstalled his mod and tried using Google's sitemap generator. You can now check my sitemap: http://www.dejimanaire.com/sitemap.xml

But I have a problem validating my sitemap with http://www.w3.org. I received these errors:

Schema validating with XSV 3.1-1 of 2007/12/11 16:20:05

Schema validator crashed

The maintainers of XSV will be notified, you don't need to
send mail about this unless you have extra information to provide.
If there are Schema errors reported below, try correcting
them and re-running the validation.

    * Target: http://www.dejimanaire.com
         (Real name: http://www.dejimanaire.com
          Last Modified: Fri, 18 Apr 2008 21:06:47 GMT
          Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8b mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635)
    * The target was not assessed

Low-level XML well-formedness and/or validity processing output


Error: Mismatched end tag: expected </td>, got </table>
in unnamed entity at line 58 char 10 of http://www.dejimanaire.com

Please, what should I do?
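The Target line in the report above is http://www.dejimanaire.com rather than the /sitemap.xml address, so the mismatched </td> error may come from the forum's home page HTML and not from the sitemap itself. A quick local well-formedness check on the sitemap file can help separate the two. This is a minimal sketch using Python's standard XML parser; the function name and the sample strings are hypothetical, and it checks well-formedness only, not validity against the sitemap schema.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # A tiny sitemap-like document that should parse cleanly.
    good = "<urlset><url><loc>http://www.example.com/</loc></url></urlset>"
    # HTML with an unclosed <td>, similar to the error in the report.
    bad = "<table><tr><td>cell</tr></table>"
    print(well_formedness_error(good))
    print(well_formedness_error(bad))
```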