Bots are not entering my site!

Started by SwapsRulez, April 24, 2008, 11:44:16 AM


SwapsRulez

Hi there,

I have an SMF 2.0 Beta 3 forum here with a third-party SMF theme installed.

http://www.project-bb.org/

On my site, the bots are coming, but they aren't crawling the pages. I got this information from the logs in the Search Engines section of the admin CP. I've kept the following settings for the bots:

Search Engine Tracking Level : Very High
Permissions : Regular Members
Show spiders in who's online list : Show Spider Details

Also, I haven't kept a robots.txt because I don't have any private data to restrict access to.
I want the bots to go everywhere on the site. I have more than 900 topics, but still not a single topic has been crawled by the bots. I have also installed the Pretty URLs mod.

If I'm missing anything, or any permission setting, please do tell me. I really need the bots. :(

Thanks in advance. Waiting for your reply. :)
Project-BB.org: Educational Forum for Engineering, Diploma & Technical Students

The Engineering, Diploma & all technical students' lounge for Free Projects, Seminars, Syllabus, Question Papers, College Assignments, Placement Papers, E-Books, Company Information & other technical stuff.

Kindred

First of all, robots.txt keeps the bots from doing things like scanning the images directory, attempting to scan profiles, etc. You should put it back in.

Second, every time you CHANGE the way the site works (standard URLs vs. SEO URLs vs. Pretty URLs, etc.), it resets the scanning of your site.

Third... sometimes it takes weeks or even months for your site to be spidered. How do you KNOW that "not a single topic is crawled by the bots"?

Last, appropriate links to your site/content on other sites will help more than any silly "URL" mod.
Glory to Ukraine

Please do not PM, IM or email me with support questions. You will get better and faster responses in the support boards. Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

karlbenson

1) Your website was only registered on 14-Apr-2008.
It can take WEEKS or MONTHS to get SOME pages into search engines and for spiders to visit regularly.
Google in particular is suspicious of all new websites (until they are trusted) because of black-hat activity such as content scraping (republishing content scraped/extracted from other websites) and other cheats used by bad folk.

2) It appears that your site is being redirected when trying to access your robots.txt:
http://www.project-bb.org/robots.txt

3) Most of your topics appear to be 'SCRAPED/copied and pasted' topics/posts from other websites.
If search engine spiders detect that your content is scraped, they may well penalise you.

SwapsRulez

Thanks, Kindred & karlbenson, for your kind information.

First of all, I've now created the robots.txt here:

http://www.project-bb.org/robots.txt

Secondly, @Kindred, I checked the spider logs in the admin panel, and they show that the bots are viewing the board index only, as you can see in the screenshot below.



Also, I think the bots were not entering my site even before I installed the URL mod.

@karlbenson, I know that my site is quite new, but since the bots are reaching the board index, they must enter the site to check whether the content is copied or not. Also, although some content is copied, most of the programs & other things are genuinely mine. I created them myself.

I don't think this is the main problem. There must be some problem with a nofollow tag on the main index page, because when I checked it with the xml-sitemap.com tools to simulate Googlebot, it told me that robots.txt & the main index page's nofollow tag aren't allowing the site to be crawled.

So I have created a simple robots.txt now, but I'm still clueless about index.php's nofollow tag. I know a little HTML, but since the index page is generated at runtime by PHP, I can't handle that situation. Also, is there anything in the Pretty URLs mod that is making my board index nofollow?

Thanks in advance. Waiting for your reply. :)

Kindred

Quote from: Kindred on April 24, 2008, 11:54:54 AM
Last, appropriate links to your site/content on other sites will help more than any silly "URL" mod.


So, they have been hitting your site for a few days. It's going to take them a while to actually get past the index.

SwapsRulez

Quote from: Kindred on April 25, 2008, 08:03:57 AM
So, they have been hitting your site for a few days. It's going to take them a while to actually get past the index.


Wow... so for how many days are they going to crawl only my index? Actually, I do know quite a bit about crawlers, & I know that they extract the links from a page & then index them. But here the scenario is that they are just reaching the index.php page, & after that something isn't allowing them to enter the site.

What is that something??? That's all I want to know. :(

Oldiesmann

As a guest, I can view everything just fine, so bots should be able to view everything just fine as well.

Also, I highly recommend that you don't give bots more access than guests - giving them the same permissions as regular members may cause problems.
Michael Eshom
Christian Metal Fans

SwapsRulez

Quote from: Oldiesmann on April 25, 2008, 11:42:37 PM
As a guest, I can view everything just fine, so bots should be able to view everything just fine as well.

Also, I highly recommend that you don't give bots more access than guests - giving them the same permissions as regular members may cause problems.

Where should I give them the permissions? If you're talking about Admin -> Search Engines -> Settings -> "Apply restrictive permissions from group", there is no entry for Guests in that drop-down. That's why I've set it to Regular Members. What should I do???

Please, I really need the bots. Not a single page from my site has been crawled. Even though the bots are coming to the index, they haven't crawled index.php either. :(

Now I've fixed my robots.txt.


But... some online sitemap generator showed me that my index page has a nofollow tag.
If I've ruined my forum with some setting, please let me know what I have actually done. Also, how do I set index & follow tags on all the pages?

Please, help me. Thanks in advance.


EDIT: If the current situation continues & SMF is unable to give me support for my problem, I will sadly have to switch to phpBB, even though I'm an SMF lover.

Nigel

Hi SwapsRulez,

Quote
If the current situation continues & SMF is unable to give me support for my problem, I will sadly have to switch to phpBB, even though I'm an SMF lover.

I think you'll find blackmail tactics get frowned upon here, especially as everything is free and run by volunteers. Nobody will really care if you switch.

That said, you just need to be patient. As karlbenson said: "Google in particular is suspicious of all new websites (until they are trusted)." Spiders will visit, spiders will build up a picture of your site, spiders will even list a few pages, but it can take up to a year before a site starts to get fully indexed.

So rather than worrying about it now, your time would be much better spent building up a really good site/forum, full of useful original content that people want to read and link to. Then, when Google decides to start indexing properly, there'll be plenty for it to list.

Quote
Where should I give them the permissions?

I think Oldiesmann was saying DON'T give them any more permissions. As long as Guests can freely browse your forum, so can spiders. If you start giving them more permissions, you could find things like members' profiles being listed on Google. Your members would be extremely displeased and might delete their accounts.

Quote
But... some online sitemap generator showed me that my index page has a nofollow tag.

How about doing some 'real' research? View the source code in your browser and search for nofollow tags. I did that with your index page mentioned above and found only 3, in your recent topics list.
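The same "view source and search for nofollow" check can be scripted. Below is a minimal sketch using Python's standard html.parser; the HTML sample is hypothetical (a board link plus a print link), not copied from the site. It collects links carrying rel="nofollow" and any robots meta tag, which are the two places nofollow can appear in a page.

```python
from html.parser import HTMLParser


class NofollowCounter(HTMLParser):
    """Collect links marked rel="nofollow" and any <meta name="robots"> directives."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []
        self.robots_meta = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None, hence the `or ""` below
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "a" and "nofollow" in rel_values:
            self.nofollow_links.append(attrs.get("href"))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_meta.append(attrs.get("content") or "")


# Hypothetical page source; in practice paste the HTML from "View Source".
sample = """<html><head><meta name="robots" content="index,follow"></head><body>
<a href="index.php?board=1.0">Projects</a>
<a href="index.php?action=printpage;topic=1.0" rel="nofollow">Print</a>
</body></html>"""

counter = NofollowCounter()
counter.feed(sample)
print(len(counter.nofollow_links), counter.nofollow_links)
print(counter.robots_meta)
```

A nofollow on an individual link only affects that link; a board that is linked normally elsewhere on the page is still reachable.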

It takes time...
Nigel

SwapsRulez

Thanks for taking an interest in my problem/discussion, Nigel, & sorry about the blackmail. I really didn't want to take the topic in that direction.

Sooo... I checked the source code & found that there is a nofollow tag for each section of the forum. That's why the bots aren't entering any board of the forum. Please tell me where these tags are generated & how I can disable all the nofollow & noindex tags, if any!

Thanks in advance. Waiting for your reply. :)

Nigel

Quote
Sooo... I checked the source code & found that there is a nofollow tag for each section of the forum.

Where are you checking: the actual source code as displayed in your browser, or the SMF files? Noindex and nofollow do appear in the original files, but inside conditional (if) statements, so they're only added to the final output when required, e.g. to avoid duplicate content.
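To illustrate what "only added if required" means, here is a hypothetical sketch of a conditional robots meta tag. It is written in Python rather than SMF's PHP templates, and the conditions (a print view or a re-sorted listing) are assumptions for illustration, not SMF's actual rules.

```python
from typing import Optional


def robots_meta_tag(action: Optional[str], is_sorted_view: bool) -> str:
    """Return a robots meta tag only for pages that duplicate content found elsewhere.

    Hypothetical conditions for illustration; not SMF's actual template logic.
    """
    if action == "printpage" or is_sorted_view:
        # Duplicate views of a topic or board: ask engines not to index them.
        return '<meta name="robots" content="noindex" />'
    # Ordinary board and topic pages get no tag, so the default (index, follow) applies.
    return ""


print(repr(robots_meta_tag(None, False)))   # ordinary topic page
print(robots_meta_tag("printpage", False))  # print view
```

This is why searching the template files turns up noindex/nofollow strings that never reach the HTML of a normal page.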

I just checked a random board index and a random topic on your site and found only 2 nofollows, both on the print button.

Nigel

SwapsRulez

Quote from: Nigel on April 26, 2008, 01:23:17 PM
Where are you checking: the actual source code as displayed in your browser, or the SMF files? Noindex and nofollow do appear in the original files, but inside conditional (if) statements, so they're only added to the final output when required, e.g. to avoid duplicate content.

I just checked a random board index and a random topic on your site and found only 2 nofollows, both on the print button.


Very true. Then what might be the reason they don't even enter any section? :(
Also, my previous phpBB forum on free hosting was crawled quite well, even with no content. :|
Yet my index page isn't even crawled.

No problem. I will wait, as you said. But I'm still hoping everyone's friend (Google) will be my friend too. :)

starlion

Quote from: Kindred on April 24, 2008, 11:54:54 AM
First of all, robots.txt keeps the bots from doing things like scanning the images directory, attempting to scan profiles, etc. You should put it back in.

I think so, too. And SMF 2.0 is a beta version now. :D If you create a robots.txt, bots can enter your website. :D

SwapsRulez

Quote from: starlion on April 26, 2008, 02:45:51 PM
I think so, too. And SMF 2.0 is a beta version now. :D If you create a robots.txt, bots can enter your website. :D

Yep, I created a robots.txt here:

http://www.project-bb.org/robots.txt

Nigel

Quote
Then what might be the reason they don't even enter any section?

Dunno... I'm not an SEO expert. One thing to bear in mind is that there are a lot of links on a forum page. Bots tend to spend a certain amount of time, or follow a certain number of links, per visit, so it's gonna take time. Plus the newness of your site is gonna make a difference. My forum isn't live yet, but with my website it took about a year before Google started to fully index the pages, despite visiting regularly.
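A toy model makes the "certain number of links per visit" point concrete. This sketch assumes a hypothetical forum of 10 boards with 90 topics each (roughly the 900 topics mentioned in the thread) and a crawler that fetches a fixed number of pages per visit, breadth-first, resuming where it left off. Real crawlers schedule and prioritise pages very differently; this only shows how a per-visit budget stretches out coverage.

```python
from collections import deque


def build_forum(boards: int, topics_per_board: int) -> dict:
    """Hypothetical link graph: index -> boards -> topics (topics are leaves)."""
    graph = {"index": [f"board{b}" for b in range(boards)]}
    for b in range(boards):
        topics = [f"board{b}/topic{t}" for t in range(topics_per_board)]
        graph[f"board{b}"] = topics
        for topic in topics:
            graph[topic] = []
    return graph


def visits_needed(graph: dict, start: str, pages_per_visit: int) -> int:
    """Breadth-first crawl fetching at most pages_per_visit pages per visit,
    resuming from the same frontier on the next visit. Returns the visit count."""
    if pages_per_visit < 1:
        raise ValueError("pages_per_visit must be at least 1")
    frontier = deque([start])
    seen = {start}
    visits = 0
    while frontier:
        visits += 1
        fetched = 0
        while frontier and fetched < pages_per_visit:
            page = frontier.popleft()
            fetched += 1
            for link in graph[page]:
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return visits


forum = build_forum(boards=10, topics_per_board=90)  # 1 + 10 + 900 = 911 pages
for budget in (50, 10, 5):
    print(budget, visits_needed(forum, "index", budget))
```

In this model the visit count is simply the page count divided by the per-visit budget, rounded up, so a small budget on a new site means many visits before every topic has been fetched.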

Quote from: starlion on April 26, 2008, 02:45:51 PM
If you create a robots.txt, bots can enter your website.
Quote from: SwapsRulez on April 26, 2008, 03:06:57 PM
Yep, I created a robots.txt here.

Firstly, the robots.txt file tells bots where NOT to go. Normally they go everywhere, and you can exclude them from certain areas you don't want crawled. Note, however, that only the good bots abide by those rules. The bad ones will go there anyway.

Second, your robots.txt file isn't doing anything.


User-agent: *
Disallow:


The first line targets every bot; in the Disallow line you specify the files/folders you don't want crawled. An empty Disallow, as here, blocks nothing. But as I said, that will keep them out of certain areas, NOT coax them in.
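Python's standard urllib.robotparser can confirm both points offline: the two-line file above allows everything, and a restrictive file only takes URLs away. The restrictive rules below are a hypothetical example in the spirit of Kindred's advice (an images folder and profile pages); the paths and the topic URL are assumptions, not the site's real layout.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The file quoted above: applies to every bot, disallows nothing.
open_rules = parse_rules("User-agent: *\nDisallow:\n")

# Hypothetical restrictive file; the paths are assumptions for illustration.
restrictive_rules = parse_rules(
    "User-agent: *\n"
    "Disallow: /images/\n"
    "Disallow: /index.php?action=profile\n"
)

SITE = "http://www.project-bb.org"
urls = [
    SITE + "/index.php?topic=1.0",       # hypothetical topic URL
    SITE + "/images/logo.png",           # hypothetical image path
    SITE + "/index.php?action=profile",  # profile page
]

for url in urls:
    # can_fetch(useragent, url) checks the parsed rules for that user agent.
    print(url,
          open_rules.can_fetch("Googlebot", url),
          restrictive_rules.can_fetch("Googlebot", url))
```

Neither file makes a bot more likely to visit; the restrictive one just leaves fewer URLs it is allowed to fetch.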

I'd suggest forgetting about all this stuff for now and focusing on building up a good site first. Then they will come...

Nigel

SwapsRulez

Quote from: Nigel on April 26, 2008, 04:15:13 PM
Firstly, the robots.txt file tells bots where NOT to go. Normally they go everywhere, and you can exclude them from certain areas you don't want crawled. Note, however, that only the good bots abide by those rules. The bad ones will go there anyway.


Yep... I do know robots.txt as well, & the creator of the standard knows me. :P
While reading his site, I pointed out some mistakes on it & he corrected them. :D

Anyway, I'm going to wait now... Let's see what happens!!!! :)

karlbenson

Do you still require assistance with this?

