What are these URLs in Google search with action=tags;id?

Started by helpdivya, April 07, 2009, 02:17:35 PM


helpdivya

I have had a forum running for a long time, and I believe I had a tag mod installed earlier. I recently updated to SMF 2.0 RC1 and removed most of the mods.

Now I see lots of URLs like the ones below on Google's search results page, and clicking any of them takes you to the forum's home page. I am really confused; any pointers or suggestions would be helpful.

=================================
URLs that I see in Google search
==================================

www.ekhichdi.com/index.php?action=tags;id=528
http://www.ekhichdi.com/index.php?action=tags;id=545
www.ekhichdi.com/index.php?action=tags;id=543

karlbenson

Obviously you don't have that mod installed anymore, but links to it will remain in Google until Google works them out of its system.

With the mod uninstalled, action=tags is no longer a valid action, so the request falls through to the SMF board index.
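To illustrate that fall-through, here is a minimal Python sketch. It is not SMF's actual PHP code: the dispatcher, the trimmed-down action table, and the handler names are hypothetical. Any URL whose `action` is not registered (such as `tags` once the mod is gone) is served by the board index, which is why the old tag links land on the home page.

```python
from urllib.parse import urlsplit

# Hypothetical, trimmed-down action table. "tags" is deliberately absent,
# as it would be once the tag mod has been uninstalled.
ACTIONS = {
    "printpage": "print_page",
    "search": "search",
}


def parse_query(query):
    """Split a query string on ';' and '&' (the URLs in this thread use ';')."""
    params = {}
    for part in query.replace(";", "&").split("&"):
        if part:
            key, _, value = part.partition("=")
            params[key] = value
    return params


def dispatch(url):
    """Return the handler name for a URL, defaulting to the board index."""
    action = parse_query(urlsplit(url).query).get("action")
    # Unknown or missing actions fall through to the board index.
    return ACTIONS.get(action, "board_index")


print(dispatch("http://www.ekhichdi.com/index.php?action=tags;id=528"))  # board_index
print(dispatch("http://www.ekhichdi.com/index.php?action=printpage"))    # print_page
```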

helpdivya

I really don't know which mod that is. Or is it part of the out-of-the-box SMF install?

karlbenson

Those URLs are not part of any default SMF version.

They come from either the tagging system mod or the googletagged mod.

H

As with your other topic, these URLs should disappear as Google re-crawls the pages :)
-H
Former Support Team Lead
                              I recommend:
Namecheap (domains)
Fastmail (e-mail)
Linode (VPS)
                             

helpdivya

It has been almost 3 months now, but I still see lots of URLs with tag IDs in Google search, and clicking them takes me to my forum's home page.

For example, check www.ekhichdi.com/index.php?action=tags;id=138

And this URL, www.ekhichdi.com/index.php?action=tags;id=532, is just a couple of weeks old and is still getting into Google's cache. That suggests something on my forum is still generating these tag numbers, but I don't know what.

It looks like, apart from the main content pages, all the other SMF forum URLs are getting crawled. This is driving me insane.

Take pages like http://www.ekhichdi.com/microsoft-sharepoint-server/moss-2007-alerts-not-working/0/?action=printpage
This page has <meta name="robots" content="noindex" />, yet it is still getting indexed.
I don't want any printpage URL like the one above to be indexed by Google. What can be done to stop this?

I have only 2 mods installed: 1) Pretty URL 2) Ad Management

karlbenson

You can manually remove any link from Google via Google Webmaster Tools (one link at a time). :(

If noindex is set, Google shouldn't be indexing it.

The only reason I can see is if you're using robots.txt to block those sections.
Since robots.txt rules are only crawler directives, they would prevent Google from crawling those pages, so it would never see the meta noindex tag. And since robots.txt does not cause the removal of any pages from the index, nothing would change: the pages would stay stuck in the index until Google dropped them.
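That interaction can be modeled with Python's standard `urllib.robotparser` and `html.parser`. This is a simplified sketch of a well-behaved crawler, assuming it checks robots.txt before fetching and only then reads `<meta name="robots">`; `crawler_decision` and `RobotsMetaParser` are hypothetical names, and the `Disallow` rule in the usage lines is an invented example, not the poster's actual file.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here by the default
        # handle_startendtag implementation.
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def crawler_decision(robots_lines, url, fetch_html, agent="Googlebot"):
    """Simplified model: robots.txt is consulted before the page is fetched."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    if not rules.can_fetch(agent, url):
        # Blocked: the page is never downloaded, so its noindex is never seen.
        return "not crawled (any noindex tag is invisible)"
    parser = RobotsMetaParser()
    parser.feed(fetch_html(url))
    if "noindex" in parser.directives:
        return "crawled, dropped from index"
    return "crawled, indexable"


PRINT_URL = ("http://www.ekhichdi.com/microsoft-sharepoint-server/"
             "moss-2007-alerts-not-working/0/?action=printpage")
PAGE = '<html><head><meta name="robots" content="noindex" /></head></html>'

blocking = ["User-agent: *", "Disallow: /microsoft-sharepoint-server/"]
print(crawler_decision(blocking, PRINT_URL, lambda url: PAGE))
print(crawler_decision([], PRINT_URL, lambda url: PAGE))
```

With the blocking rule the noindex tag is never read; without it, the same page is crawled and the tag takes effect.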

helpdivya

I am not blocking any of these pages in my robots.txt. If you want, you can check it yourself.

Here are its contents:
============================
User-agent: gotdotnet.ch proxy services
Disallow: /

# Disallow TerrawizBot

User-agent: TerrawizBot
Disallow: /


User-agent: *
Disallow: /main
Disallow: /_db_backups

Sitemap: http://www.ekhichdi.com/sitemap.xml
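One way to double-check that is to feed the file above to Python's `urllib.robotparser` and ask whether Googlebot may fetch the URLs from this thread. A minimal sketch follows; note that this parser does plain prefix matching and does not implement Google's `*` wildcard extension, which is fine here because this file uses no wildcards.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt posted above, verbatim.
ROBOTS_TXT = """\
User-agent: gotdotnet.ch proxy services
Disallow: /

# Disallow TerrawizBot

User-agent: TerrawizBot
Disallow: /


User-agent: *
Disallow: /main
Disallow: /_db_backups

Sitemap: http://www.ekhichdi.com/sitemap.xml
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

urls = [
    "http://www.ekhichdi.com/index.php?action=tags;id=528",
    "http://www.ekhichdi.com/microsoft-sharepoint-server/"
    "moss-2007-alerts-not-working/0/?action=printpage",
    "http://www.ekhichdi.com/football/?wap2",
    "http://www.ekhichdi.com/main/",
]
# Googlebot falls under "User-agent: *": True for the first three, False for /main/.
for url in urls:
    print(rules.can_fetch("Googlebot", url), url)
```

So the tag, printpage and wap2 URLs are all crawlable under this file; only the bots named explicitly are shut out entirely.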

helpdivya

Any update on the above question? I find all the irrelevant pages in the index rather than the important pages, like posts.

I repeat: I have only 2 mods installed: 1) Pretty URL 2) Ads management

And posts just 10 days old are getting indexed with tag IDs. It's driving me crazy.

Now, after moving to RC1, I see lots of pages ending in /?wap2 in Google search.

See pages like:

www.ekhichdi.com/football/?wap2
www.ekhichdi.com/football/pele-or-maradona/?wap2
www.ekhichdi.com/football/pele-robbed-at-gunpoint/?wap2
www.ekhichdi.com/microsoft-sharepoint-server/9/?wap2
www.ekhichdi.com/wireless/windows-vista-unable-to-take-ip-from-wireless-dhcp-router/?wap2

These pages have no theme and no ads. I don't want any of these pages in the index.

Please help.

karlbenson

Try noindexing the wap2 pages in Themes/default/wireless.template.php.
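The thread never shows the actual template change, so this is only a sketch of the decision logic, written in Python rather than SMF's PHP. It assumes the goal is to emit `<meta name="robots" content="noindex" />` in the page head whenever the request is a wap, wap2 or imode view, or an `action=printpage` or `action=tags` URL; `robots_meta`, `query_params` and the two flag lists are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical lists of query flags and actions that should not be indexed.
NOINDEX_FLAGS = ("wap", "wap2", "imode")
NOINDEX_ACTIONS = ("printpage", "tags")
NOINDEX_META = '<meta name="robots" content="noindex" />'


def query_params(url):
    """Parse the query string, treating ';' and '&' as separators."""
    params = {}
    for part in urlsplit(url).query.replace(";", "&").split("&"):
        if part:
            key, _, value = part.partition("=")
            params[key] = value
    return params


def robots_meta(url):
    """Return the meta tag to print in <head>, or '' for indexable pages."""
    params = query_params(url)
    if any(flag in params for flag in NOINDEX_FLAGS):
        return NOINDEX_META
    if params.get("action") in NOINDEX_ACTIONS:
        return NOINDEX_META
    return ""


print(robots_meta("http://www.ekhichdi.com/football/?wap2"))
print(repr(robots_meta("http://www.ekhichdi.com/football/pele-or-maradona/")))
```

Unlike a robots.txt block, this approach lets Google crawl the page and see the noindex, which (per the explanation earlier in the thread) is what gets it dropped from the index.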

helpdivya

Thanks a ton for that. Could you please provide the code? I guess it would be a meta tag, but it would still help if you could give me the details.

Apart from that, please, please, please let me know how to keep the print pages and tag pages from being indexed. That is the main reason I opened this topic.

Please help

helpdivya

Does anyone have any update on my problem? Please help me.

H

The easiest thing to do would be to add these pages to robots.txt:


User-agent: *
Disallow: *wap*

User-agent: *
Disallow: *imode*

User-agent: *
Disallow: *tags*
-H
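Python's `urllib.robotparser` does not understand the `*` wildcards used above, so this sketch hand-rolls a matcher, assuming Google-style semantics: `*` matches any run of characters, a trailing `$` anchors the end, and the pattern is matched from the start of the path plus query string. `pattern_to_regex` and `is_disallowed` are hypothetical helpers, and `/general/swap-meet/` is an invented topic URL used to show that `*wap*` also catches any address that merely contains "wap".

```python
import re


def pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal and matched from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """True if any Disallow pattern matches the path (plus query string)."""
    return any(pattern_to_regex(p).match(path_and_query) for p in disallow_patterns)


H_RULES = ["*wap*", "*imode*", "*tags*"]

for path in [
    "/football/?wap2",                  # wap2 view: blocked
    "/index.php?action=tags;id=528",    # old tag link: blocked
    "/football/pele-or-maradona/",      # normal topic: allowed
    "/general/swap-meet/",              # hypothetical slug: blocked too
]:
    print(is_disallowed(path, H_RULES), path)
```

Keep in mind the earlier point in this thread: these rules stop crawling, but they do not by themselves remove pages that are already in the index.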

helpdivya

I have added the code below to robots.txt, since I found that the print and wap URLs have a ? in them.

User-agent: Googlebot
Disallow: /*?

Please verify whether this is the right code; otherwise I will use the code you gave above.

H

I wouldn't block anything containing ?, as this will block every URL that is not being 'prettified' by an SEO mod.
-H
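H's warning can be checked with a small matcher. This sketch assumes Google-style matching, where `*` matches any characters and `?` is an ordinary character; `matches` is a hypothetical helper, and `/index.php?topic=123.0` is only an example of SMF's standard, non-prettified topic URL format, not a real page on this forum.

```python
import re


def matches(pattern, path):
    """Google-style prefix match: '*' is a wildcard, everything else literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None


RULE = "/*?"
paths = [
    "/microsoft-sharepoint-server/moss-2007-alerts-not-working/0/?action=printpage",
    "/football/?wap2",
    "/index.php?topic=123.0",        # standard SMF URL: also blocked
    "/football/pele-or-maradona/",   # prettified URL: still allowed
]
for path in paths:
    print(matches(RULE, path), path)
```

The first three paths match and only the prettified one survives, so any page the Pretty URL mod does not rewrite would disappear from crawling. Also note that once a `User-agent: Googlebot` group exists, Googlebot follows only that group and ignores the `User-agent: *` rules.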

helpdivya

I am not using an SEO mod on my site. Also, an earlier reply suggested noindexing the wap2 pages in Themes/default/wireless.template.php.

Could you please give me the exact code to noindex the wap and print pages, and tell me where to insert it?

H

You must be using some sort of Pretty URL/SEO mod, as your URLs are not in the standard SMF format! :)
-H

helpdivya

Yes, I do use the Pretty URL mod, but printpage is not affected by Pretty URL. Printpage is an out-of-the-box SMF feature.

JBlaze

helpdivya, is this issue resolved?

Google keeps pages in its index that may no longer be in use, as regularexpression pointed out.

If this issue is resolved, please mark it so.
Jason Clemons
Former Team Member 2009 - 2012

hai2hai

Does anyone have a solution to the "Google indexes ?action=printpage, ..." problem?
VNUNI A-Excel accounting software
