Bandwidth usage almost doubled

Started by Teh Lizzeh, November 18, 2014, 12:27:56 PM


Teh Lizzeh

I mean, the majority of the members I get aren't through Google anyway, so it wouldn't make that much of a difference, I suppose. I'm mostly wondering whether it'll significantly save my bandwidth, though. Of course, I could always try it and see what it does; if it doesn't help, it doesn't harm anything either. I'm currently awaiting a message back from my host, and I'd like to thank all of you for your help. I'll be marking this as solved, since I don't think there's more any of you could do with the information I'm able to provide and access at the moment. All of your suggestions did help for sure, though!

Gwenwyfar

Now, I'm not really sure you can disable those based on group; you'd have to check (the net is being a &¨%$$ with DNS, so my forum isn't loading for me to check =/). But considering a good part of the bandwidth often comes from images like banners, avatars and so on, perhaps you could just disable those for bots? I know you can create a group with permissions just for them.

It's about the same with my forum (almost no one comes from Google), but there are still those that do, so it's not completely useless even if that's the case for you :P
"It is impossible to communicate with one that does not wish to communicate"

Arantor

It is actually possible to hide avatars for guests without a mod with a bit of abuse of the innards.

Go into the smf_themes table, and add a row of:
id_member = -1
id_theme = 1
variable = show_no_avatars
value = 1

A new row in the table with these values should hide avatars from guests, which can cut down your bandwidth use.
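
If you'd rather do it as a single query, here's a sketch, assuming the default smf_ table prefix (adjust it to whatever your installation actually uses):

-- Hide avatars from guests (id_member = -1) on the default theme (id_theme = 1)
INSERT INTO smf_themes (id_member, id_theme, variable, value)
VALUES (-1, 1, 'show_no_avatars', '1');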

Illori

Quote from: Arantor on November 19, 2014, 05:36:07 PM
That's relatively new then since it never used to honour robots.txt.

new as in at least 2 years old then ;)

Arantor

Nah, I'm sure I've seen Baidu fail to listen more recently than that.

Steve

Quote from: K@ on November 19, 2014, 05:06:10 PM
'course, you could have others, like Baidu, spidering. They totally ignore robots.txt and you have to keep 'em out using an .htaccess file.

Is there a tutorial or instructions somewhere that can teach me how to do this?
DO NOT pm me for support!

Illori

my robots.txt file just has

#Baiduspider
User-agent: Baiduspider
Disallow: /
in it and it seems to work fine. you should try it.
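
If Baidu keeps crawling despite that, the .htaccess approach K@ mentioned is the fallback. A minimal sketch, assuming Apache with mod_rewrite enabled (using the same Baiduspider user-agent name as the robots.txt entry above):

# Send a 403 Forbidden to any request whose User-Agent mentions Baiduspider
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]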

Gwenwyfar

Doesn't seem to work for me. I added that early this morning, and Baidu has still been going all over my forum all day. Unless they only check the robots file once in a while... Will wait more and see if it stops.

Edit: Oddly enough, it's not showing up in the host stats; it shows other bots, but not Baidu. According to that, the other bots together account for just about 15 MB of bandwidth for the month, next to another 370 MB by pretty much me alone. Pretty low, if it's accurate. Hope it stays low :P

Edit 2: Seems they check the file daily, or something around that; it stopped showing up after midnight :)
"It is impossible to communicate with one that does not wish to communicate"

Teh Lizzeh

My host helped me figure out that it's the Bing spider using about 4.5 GB per month. I've added a robots.txt and I'm hoping that this will stop it from bugging around my site >.> Google only uses 370 MB in comparison, and Baidu hasn't shown its face, thankfully o.o

kat


Teh Lizzeh

Both Google and Bing seem to ignore my robots.txt. Did I misplace it on my server? It's in the www folder, the forum folder, AND the folder that holds all the actual forum files (I wasn't sure where to put it, since it only shows up on the server in the last one, but it said to put it in the root).

Arantor


Teh Lizzeh

childrenofolympus.net. If you add /robots.txt, it does show up, but only when I put the file in the forum folder on my FTP.

Arantor

Well, the fact that it shows up indicates you hit the right place, and I don't see anything obviously wrong with it, other than the fact that you're telling every search engine to go away (which may be desirable, I don't know).

Teh Lizzeh

Yeah, I figured I would test it out by telling them all to go away. I would like to allow Google in there, but I wanted to check whether it worked at all, which it clearly isn't, because somehow both Bing and Google are still showing up. Would it make more sense to try to block Bing separately? And would I have to add each agent in a separate piece of code?

Arantor

Firstly, there is always a delay between dropping the file in and the bots responding to it; it can be several days.

Secondly, some bots claim to be Google or Bing when they are not (like spammers), and they will ignore robots.txt anyway.

Thirdly, if you want to allow Google in but not Bing, you will want to block Bing on its own anyway, since the concept of robots.txt is 'if you're not listed here, it's OK to come in'.

Fourthly, yes, you will need one entry per agent, because the User-agent line only understands the name of a single user-agent, or * for everything. Nothing formally supports more complex constructions like User-agent: Google, Bing, for example.
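
So, for example, a robots.txt that turns Bing and Baidu away while leaving everyone else (Google included) free to crawl needs one block per agent, along these lines (using the Bingbot and Baiduspider names that came up earlier in this thread):

User-agent: Bingbot
Disallow: /

User-agent: Baiduspider
Disallow: /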

Teh Lizzeh

Ah, okay. I figured it would have responded by now, since I dropped the file in yesterday afternoon, but if it can take several days, that makes more sense.

So my code would look like this, if I'm correct?

User-agent: Bingbot
Disallow: /

And I'd repeat that for every agent I want to block, of course, but that's the only one that seems to be on my site.

Arantor

That assumes Bing tests for Bingbot; it may also test for msnbot. I'm also not sure whether it's a capital B on that or not, to be honest; it's been a while since I checked what Bing uses these days.
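
For what it's worth, the major crawlers generally treat the User-agent line as case-insensitive, so the capitalisation shouldn't matter, but a sketch covering both of the names mentioned would be:

User-agent: Bingbot
Disallow: /

User-agent: msnbot
Disallow: /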

Teh Lizzeh

Well, Bingbot is the one that comes to my site; I've looked, and there are four agents they use, Bingbot being the only one that ever shows up. But I see no problem in blocking all of them. I just want them to go away.

Steve

I'm a complete dolt with this stuff, so let me make sure I understand. If I create a text file called robots.txt and put this in it:

User-agent: *
Disallow: /

That would stop any spider or bot that doesn't ignore robots.txt? And it goes in the same folder as my .htaccess?
DO NOT pm me for support!
