I think that my robots.txt has a problem

Started by deprueba, November 18, 2011, 12:34:29 PM


deprueba

Hello,

I have a question. In my website root I have a blog based on WordPress. Google has indexed all my blog pages and posts from the root page.

My SMF forum is hosted in the same place, in a subdomain called "foro". The forum has only one page indexed in Google: the main page. Yesterday I read in Google's documentation that their robots don't read a robots.txt uploaded to a subdomain, so I deleted my forum's robots.txt from the subdomain and pasted its contents into the robots.txt in the root.

Now, my only robots.txt is this:

# Sitemap
Sitemap: http://heremiwebsite.com/sitemap.xml

# WordPress files and directories to allow/disallow
User-Agent: *
Allow: /wp-content/uploads/
Allow: /feed/$
Disallow: /wp-
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /?s=
Disallow: /search
Disallow: /archives/
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/
Disallow: /foro/index.php?action=help*
Disallow: /foro/index.php?action=search*
Disallow: /foro/index.php?action=login*
Disallow: /foro/index.php?action=register*
Disallow: /foro/index.php?action=profile*
Disallow: /foro/index.php?action=arcade*
Disallow: /foro/index.php?action=printpage*
Disallow: /foro/index.php?PHPSESSID=*
Disallow: /foro/index.php?*wap*
Disallow: /foro/index.php?*wap2*
Disallow: /foro/index.php?*imode*

# Rules for the best-known bots

User-agent: Googlebot-Image
Disallow: /wp-includes/
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Disallow:
User-agent: ia_archiver
Disallow: /
User-agent: duggmirror
Disallow: /
User-agent: noxtrumbot
Crawl-delay: 50
User-agent: msnbot
Crawl-delay: 30
User-agent: Slurp
Crawl-delay: 10
User-agent: MSIECrawler
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Microsoft.URL.Control
Disallow: /
User-agent: libwww
Disallow: /


I've created and sent to Google sitemaps from both sites.

I would appreciate any advice or opinion. Will this code work?

Thank you for reading.
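Rules like these can be sanity-checked locally before uploading, using Python's built-in `urllib.robotparser` (a quick sketch; the domain is the placeholder from this post, and only a subset of the rules is used, because the stdlib parser does not understand Google's wildcard extensions such as `*` and `$`):

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules above, without wildcard patterns,
# fed straight into the parser (no web server needed).
rules = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/
Disallow: /wp-admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

base = "http://heremiwebsite.com"
# Uploaded media stays crawlable because the Allow rule is more specific...
print(rp.can_fetch("*", base + "/wp-content/uploads/photo.jpg"))  # True
# ...while the admin area is blocked for generic crawlers.
print(rp.can_fetch("*", base + "/wp-admin/"))                     # False
```

To test the wildcard rules (`/*?`, `/*.php$`, and the `action=` patterns), a Google-compatible checker such as the robots.txt tester in Google's webmaster tools is needed, since those extensions go beyond the original robots.txt convention.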

KensonPlays

Try this; I ran it through an analyzer:

# Sitemap
Sitemap: http://heremiwebsite.com/sitemap.xml

# WordPress files and directories to allow/disallow
User-Agent: *
Allow: /wp-content/uploads/
Allow: /feed/$
Disallow: /wp-
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /?s=
Disallow: /search
Disallow: /archives/
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /feed/
Disallow: /trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/
Disallow: /foro/index.php?action=help*
Disallow: /foro/index.php?action=search*
Disallow: /foro/index.php?action=login*
Disallow: /foro/index.php?action=register*
Disallow: /foro/index.php?action=profile*
Disallow: /foro/index.php?action=arcade*
Disallow: /foro/index.php?action=printpage*
Disallow: /foro/index.php?PHPSESSID=*
Disallow: /foro/index.php?*wap*
Disallow: /foro/index.php?*wap2*
Disallow: /foro/index.php?*imode*

# Rules for the best-known bots

User-agent: Googlebot-Image
Disallow: /wp-includes/
Allow: /wp-content/uploads/

User-agent: Mediapartners-Google
Disallow:
User-agent: ia_archiver
Disallow: /
User-agent: duggmirror
Disallow: /
User-agent: noxtrumbot
Crawl-delay: 50
User-agent: msnbot
Crawl-delay: 30
User-agent: Slurp
Crawl-delay: 10
User-agent: MSIECrawler
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: HTTrack
Disallow: /
User-agent: Microsoft.URL.Control
Disallow: /
User-agent: libwww
Disallow: /

And here is a tutorial on creating one: http://tools.seobook.com/robots-txt/

deprueba

Uploaded! I'll report back with the results.

Thank you very much.

KensonPlays

You're welcome :) If it doesn't work, feel free to post again :)

deprueba

Let me ask a question: how long does Google take to read the new robots.txt file? My current status says 5 days ago...

Illori

You would have to ask Google; it depends. There is no set amount of time that it takes to crawl any one site.

deprueba

Quote from: Illori on November 19, 2011, 06:33:56 AM
You would have to ask Google; it depends. There is no set amount of time that it takes to crawl any one site.

... so it's a matter of time.

deprueba

Quote from: Kcmartz on November 18, 2011, 05:57:45 PM
You're welcome :) If it doesn't work, feel free to post again :)

It's working! Thank you very much.

nutn2lewz

Quote from: deprueba on November 18, 2011, 12:34:29 PM
... Yesterday I read in Google's documentation that their robots don't read a robots.txt uploaded to a subdomain, so I deleted my forum's robots.txt from the subdomain and pasted its contents into the robots.txt in the root. ...


Google does read robots.txt files in subdomains ...

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Just thought I would post so others know that Google DOES look for robots.txt files in subdomains. Google will look for robots.txt at subdomain.domain.com/robots.txt, which is probably located in your domain/subdomain directory.
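To illustrate that per-host scope, here is a small sketch (the hostnames are the placeholders from this thread): a crawler derives the robots.txt location from the scheme and host of the page it wants to fetch, so each subdomain gets its own file.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    robots.txt scope is per host (scheme + hostname + port), so a
    subdomain is served by its own robots.txt, not the root domain's.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://foro.heremiwebsite.com/index.php?action=recent"))
# http://foro.heremiwebsite.com/robots.txt
print(robots_url("http://heremiwebsite.com/2011/11/some-post/"))
# http://heremiwebsite.com/robots.txt
```

Note this also means the `/foro/...` rules in the root robots.txt only apply if the forum lives in a subdirectory of the main host; if it is really served from the `foro` subdomain, it needs its own robots.txt there.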

Good luck, Barry
