Googlebot onclick crawling (was: "Googlebot issues")

Started by cortez, February 25, 2014, 10:21:22 AM


Kindred

You cannot block htaccess variables in robots.txt.
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Gwenwyfar

#21
What do you mean by "htaccess variables"? robots.txt allows blocking parts of links, doesn't it? And Google is acting as if the click action is a link ending with "%1$d". So you could do a Disallow: *%1$d, if what I read about it is correct.

I'll give this a try, I'll post the results later.
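
For reference, here is a rough sketch of how that rule would behave under Google-style wildcard matching. It assumes `*` matches any run of characters and that `$` is only special as the very last character of a pattern, so the `$` inside `%1$d` is treated literally. The helper name `rule_matches` and the sample paths are made up for illustration; this is not how any particular crawler is implemented.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed semantics (Google-style): '*' matches any run of characters,
    a '$' at the very end anchors the end of the URL, and every other
    character (including a '$' in the middle) is literal. Rules match
    from the start of the path, so an unanchored rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece and let '*' become '.*'.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path, flags=re.DOTALL) is not None

# Hypothetical paths: one broken onclick-style link, one normal topic link.
print(rule_matches("*%1$d", "/index.php?topic=123.%1$d"))  # True
print(rule_matches("*%1$d", "/index.php?topic=123.0"))     # False
```

If the crawler percent-encodes the path before matching, the rule would need to be written in the same encoded form, so it is worth confirming the exact URL in the crawl error report.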

byproduct

Just an idea...

"Disallow: /forum/*action*"

* at the front, * at the back... too broad a wildcard match?

Instead, something like:
"/forum/index.php?action=*"
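
One way to compare the two patterns is to run both against a few sample paths. This is only a sketch: it assumes `*` means "any run of characters" and that rules match from the start of the path, and both `wildcard_match` and the sample paths are hypothetical.

```python
def wildcard_match(pattern: str, path: str) -> bool:
    """Prefix match where '*' in the pattern matches any run of characters.

    End-of-URL anchoring with '$' is deliberately not handled here.
    """
    first, *rest = pattern.split("*")
    if not path.startswith(first):
        return False
    pos = len(first)
    # Each remaining literal piece must appear, in order, after the last one.
    for piece in rest:
        pos = path.find(piece, pos)
        if pos == -1:
            return False
        pos += len(piece)
    return True

BROAD = "/forum/*action*"
NARROW = "/forum/index.php?action=*"

# Hypothetical sample paths.
for path in (
    "/forum/index.php?action=login",
    "/forum/index.php?topic=42.0",
    "/forum/transactions-faq/",  # contains "action" inside another word
):
    print(path, wildcard_match(BROAD, path), wildcard_match(NARROW, path))
```

Under these assumptions the broad pattern also catches any path under /forum/ that merely contains the letters "action", while the narrow one only catches real action URLs. The trailing `*` makes no difference, since an unanchored rule is already a prefix match.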

Gwenwyfar

Just to update on this: a month after adding Disallow: *.%1$d, there have been no more errors from this.

shawnb61

Thanks for the update.

Do you use Google's Webmaster Tools? Are you comfortable that Google is crawling the rest of your content correctly?

I was a bit too aggressive with robots.txt at one point & accidentally stopped crawling altogether.   
A question worth asking is born in experience & driven by necessity. - Fripp

Gwenwyfar

#25
Yes, otherwise I wouldn't know there were errors from it ;)

By default, not quite. I had to block some parts of the forum (or actions) as well, because they were creating errors too: the register page, member profiles (I have guest viewing disabled), and some others. But other than that, everything is fine. This is what my robots.txt looks like right now:

User-agent: *
Disallow: /index.php?action=login
Disallow: /index.php?action=register
Disallow: /index.php?action=stats
Disallow: /index.php?action=profile
Disallow: /index.php?action=dlattach
Disallow: /index.php?action=shop
Disallow: *.%1$d
Disallow: /index.php/board,3.0.html
(Yes, I know the topics inside will still be crawled this way, but only if the links are seen elsewhere.)
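
Given the over-blocking worry raised earlier in the thread, a rule set like this can be sanity-checked offline before relying on it. The sketch below hard-codes the Disallow rules from the robots.txt above and assumes Google-style matching (`*` as a wildcard, `$` special only at the end of a rule). The function names and the sample paths are hypothetical, and Google's own robots.txt testing tools remain the authority on what actually gets crawled.

```python
import re

# Disallow rules copied from the robots.txt above (User-agent: *).
DISALLOW_RULES = [
    "/index.php?action=login",
    "/index.php?action=register",
    "/index.php?action=stats",
    "/index.php?action=profile",
    "/index.php?action=dlattach",
    "/index.php?action=shop",
    "*.%1$d",
    "/index.php/board,3.0.html",
]

def compile_rule(pattern: str):
    """Turn one rule into a regex: '*' is a wildcard, a trailing '$' anchors."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""), re.DOTALL)

COMPILED = [compile_rule(rule) for rule in DISALLOW_RULES]

def is_blocked(path: str) -> bool:
    """True if any Disallow rule matches (there are no Allow rules to weigh)."""
    return any(rx.match(path) for rx in COMPILED)

# Hypothetical sample paths for a forum installed at the site root.
SHOULD_CRAWL = ["/", "/index.php", "/index.php?topic=10.0",
                "/index.php/topic,10.0.html", "/index.php?board=1.0"]
SHOULD_BLOCK = ["/index.php?action=login", "/index.php?action=profile;u=5",
                "/index.php?topic=10.%1$d", "/index.php/board,3.0.html"]

for path in SHOULD_CRAWL + SHOULD_BLOCK:
    print("blocked" if is_blocked(path) else "allowed", path)
```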

The only problem I noticed is that the keywords it lists for the forum seem completely off, at least in the top keywords. It adds things like the "who's viewing this topic" lists to the keywords, or things like "reply", "post" and the name of some custom profile field. Names of users, group names (even things like "admin"), dates... and almost no actual post content keywords. It feels like it's completely lost on what is what and is putting more weight on all the interface elements than on the actual post content. But then again, I don't know how it adds keywords for normal sites, so I don't know if this is normal or not. The only part of the interface it is not adding to the keywords is the header area (menu/forum name/forum slogan) and the footer.

There's that Data Highlighter, but I have no idea how useful it actually is. Forums, or more "general" types of content, are not even on the list when using it.

Kindred

None of those pages should create errors, even if accessed by a guest.
Assuming permissions are correct, there should be no need to block access to those features, since Google will never even see them as a guest (except for login and register).

If guests ARE seeing those locations, then you have poorly configured your permissions.

However... even then, none of those should be generating errors, and if they are, then you have something poorly coded.

Gwenwyfar

#27
Google calls it a "soft 404": the page gives an error/warning to the guest viewing it and no content, so it's an error page to Google. For register/login, I just don't want those showing up in search. I do not have anything poorly configured in my permissions. That would be Google being lost on forum content :P

And I did not modify anything on any of those pages, so no, it's not that either. Those are default SMF pages with default working permissions. The only different things on them would be a slightly different header (I just removed things from the HTML) and a sidebar, which Google seems to be properly treating as such, since it ignores it for the most part.

Quote
Assuming permissions are correct, there should be no need to block access to those features, since Google will never even see them as a guest (except for login and register).
Well, it does. Just like it follows the problematic page links this topic was about, which you said it also shouldn't...

Kindred

well, shop is not a default page.... :P


And Google only follows links that it can see...
By default, SMF does not EVER show an action to a user who cannot complete that action.
If guests are not supposed to view attachments (the dlattach action), then the attachment is not even displayed, UNLESS you have altered the basic code to show attachments to guests but not allow them to view attachments.
The same goes for profiles and stats... if those actions are not valid for guests, then SMF does not even display a link to them, UNLESS you have altered the basic code.


So -- once again, I state: those robots.txt entries are not needed.
(They won't HURT anything, but telling people to use them is incorrect, since they serve no useful purpose in a properly configured system.)

Hmmm... actually, again, I just checked: in a properly configured system, attempting to access those sections leads to the login page, not a 404 page (soft or hard).

Gwenwyfar

And it gives the same error as the others: that the guest can't access it. (And it makes no difference anyway, since I blocked this right after opening my forum to guests, before I had almost any mods at all, and to this date none of them modify permissions or these pages to do anything other than the default...)

You're forgetting that it can find these links inside posts or, in my case, also in some of the boards' descriptions, as I link child boards and some "index" topics in them. For profile links, that's a little obvious: you can see the profile links as a guest. The same goes for attachments; those are linked in many topics and Google finds them.

My permissions are working perfectly and no improper links/content show up for guests, just as by default.

Quote
Hmmm... actually, again, I just checked: in a properly configured system, attempting to access those sections leads to the login page, not a 404 page (soft or hard).
I'm not completely sure about that. I added those rules a long time ago and the errors are long gone, so I can't confirm it. I recall it being listed under soft 404. In any case, I either added it because it was listed as an error (which I think is most likely), or because it was showing up in search (less likely, since I rarely check that and practically only did so with my forum name). Both of those are a problem, so it's better this way.
