Indexed, though blocked by robots.txt

Started by Shades., August 29, 2021, 10:36:34 AM

Shades.

Google is getting on my nerves! How do I fix this please?

Indexed, though blocked by robots.txt

Thanks 8)
ShadesWeb.com - Custom Logos - My Themes on SMF | My Themes on ShadesWeb
https://shadesweb.com

BikerHound.com - Sniffing out the road ahead
https://bikerhound.com

Dream as if you'll live forever; Live as if you'll die today. - James Dean

Antechinus

#BlockBotsByUserAgent
SetEnvIfNoCase User-Agent (Baidu|Barkrowler|Brandwatch|Garlik|Knowledge|libwww-perl|Linkdex|omgili|Proximic|Semrush|Sogou|Tweetmeme|Trendiction|Wordpress|Neevabot) bad_bot
<RequireAll>
Require all Granted
Require not env bad_bot
</RequireAll>

Add Googlebot to the above list in .htaccess. Google will go away. :)

Shades.

Like this?

Quote
#BlockBotsByUserAgent
SetEnvIfNoCase User-Agent (Baidu|Barkrowler|Brandwatch|Garlik|Knowledge|libwww-perl|Linkdex|omgili|Proximic|Semrush|Sogou|Tweetmeme|Trendiction|Wordpress|Neevabot|Googlebot) bad_bot
<RequireAll>
   Require all Granted
   Require not env bad_bot
</RequireAll>

Antechinus

That should kill it, as long as the actual user agent string contains "Googlebot" (I haven't checked). Up to you which of the others you want to keep. They are just ones that were being mildly annoying for me, so I thumped them. :)

Although you want to get rid of the line break in libwww-perl if you are keeping that. Should be one string, with hyphen, but without line break.
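
Whether the pattern would actually catch Googlebot can be checked offline. The following is a minimal sketch, assuming Python's `re` behaves like Apache's regex engine for a plain alternation such as this one, and assuming the commonly published Googlebot desktop user agent string. The function name `is_bad_bot` and the sample strings are illustrative, not part of the thread.

```python
import re

# Same alternation as the SetEnvIfNoCase line, with Googlebot appended.
BAD_BOTS = re.compile(
    r"(Baidu|Barkrowler|Brandwatch|Garlik|Knowledge|libwww-perl|Linkdex|omgili"
    r"|Proximic|Semrush|Sogou|Tweetmeme|Trendiction|Wordpress|Neevabot|Googlebot)",
    re.IGNORECASE,  # SetEnvIfNoCase matches case-insensitively
)


def is_bad_bot(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the User-Agent string."""
    return BAD_BOTS.search(user_agent) is not None


# Assumed sample strings; real crawlers send several variants.
googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

print(is_bad_bot(googlebot_ua))
print(is_bad_bot(firefox_ua))
```

Because the match is unanchored, any user agent that merely contains one of the listed words is flagged too, so short entries like "Knowledge" are worth a second look before deploying.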

Shades.

With this, though, Google won't be able to index? I need it to allow indexing... I don't know why it's kicking back that error on action=search, because I have permission set for guests to be allowed to search.

Mick.

Wanna know what's in my robots.txt? Nothing. ...and I don't use SMF's keywords either. Blank! Buahhahaha!

Antechinus

Quote from: Shades. on August 29, 2021, 06:11:55 PM
With this tho google won't be able to index? I need it to allow indexing...I don't know why it's kicking back that error on action=search cause I have permission for guest to be allowed to search.
Oh, ok. Well in that case I have NFI.
ETA: Hang on. Your first rule is this:
User-agent: *
Disallow: /*action

That's going to block action=search, yes? So maybe you want this:
Allow: /*action=search
Allow: /*page=*
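
To see why the extra Allow line would override the broad Disallow, here is a rough local checker. It is only a sketch, assuming the precedence Google documents for its own crawler: the longest matching pattern wins, and Allow wins a tie. The helper names `pattern_to_regex` and `is_allowed` and the sample paths are made up for illustration. Other crawlers may resolve conflicts differently.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs for one user-agent group.

    Longest matching pattern wins, Allow wins ties, no match means allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


before = [("Disallow", "/*action")]
after = before + [("Allow", "/*action=search"), ("Allow", "/*page=*")]

# Hypothetical paths in the style of SMF URLs.
print(is_allowed(before, "/index.php?action=search"))
print(is_allowed(after, "/index.php?action=search"))
print(is_allowed(after, "/index.php?action=profile"))
```

Under these assumptions the search URL is blocked by the first rule set and allowed by the second, while other action= URLs stay blocked. Python's built-in `urllib.robotparser` is not used here because it does not interpret `*` and `$` wildcards.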

Antechinus

Although honestly I'm not sure why a bot would need to index action=search anyway. Bots will crawl all of your public pages, which means they will index any public content that could be found through search. So, indexing the search page itself is not really useful, AFAICT. :)

Shades.

It is already:

Allow: /*action=

But I went ahead and added:

Allow: /*action=search

Allow: /*page=*

Any idea how to check it other than having to wait on the Google validation tool, or whatever it's called? Is there a robots.txt checker out there somewhere? LOL!

Shades.

Quote from: Antechinus on August 29, 2021, 06:29:37 PM
Although honestly I'm not sure why a bot would need to index action=search anyway. Bots will crawl all of your public pages, which means they will index any public content that could be found through search. So, indexing the search page itself is not really useful, AFAICT. :)
So I could probably just leave it and ignore the error?

Mick.

I think he uses Pretty URLs. Meh, junk.

Antechinus

Quote from: Shades. on August 29, 2021, 06:32:53 PM
Quote from: Antechinus on August 29, 2021, 06:29:37 PM
Although honestly I'm not sure why a bot would need to index action=search anyway. Bots will crawl all of your public pages, which means they will index any public content that could be found through search. So, indexing the search page itself is not really useful, AFAICT. :)
So I could probably just leave it and ignore the error?
I'm not an expert on bots, so maybe get an extra opinion. But, they are designed to find and index public content all by themselves, and they're pretty good at it, so I'm not sure a search form is going to be much use to a bot.

How are they going to use it anyway? They would have to think up search terms and parameters, then enter them, then submit the form, before they could analyse the results. Then they would have to go through the results. AFAICT they might as well just index the public pages without frigging around with search form queries.

Antechinus

Hey here's an idea. Maybe Google know search forms are useless to bots, so have coded their bots to ignore them, along with obvious things like action=search.

Shades.

Quote from: Mick. on August 29, 2021, 06:34:26 PM
I think he uses Pretty URLs. Meh, junk.
I'm getting away from it cause I'm starting to agree! ;)

Shades.

Quote from: Antechinus on August 29, 2021, 06:57:14 PM
Hey here's an idea. Maybe Google know search forms are useless to bots, so have coded their bots to ignore them, along with obvious things like action=search.
Well maybe since I have:

Allow: /*action=

I can use:

Disallow: /*action=search

I dunno what you think??

Antechinus


Shades.

I think I got it sorted out now thanks!

Quote
User-agent: *
Disallow: /*action
Disallow: /*PHPSESSID
Disallow: /*;
Allow: /$
Allow: /*board*.html$
Allow: /*topic*.html$
Allow: /*.xml
Allow: /*.css$
Allow: /*.js$
Allow: /*.png$
Allow: /*.jpg$
Allow: /*.gif$
Allow: /*sitemap

Sitemap: https://shadesweb.com/sitemap.xml

Mick.


Shades.

Yes, I didn't configure the robots.txt settings at first, but now it's set so I should be good to go! Thanks 8)