Bandwidth usage almost doubled

Started by Teh Lizzeh, November 18, 2014, 12:27:56 PM


Teh Lizzeh

The bandwidth usage on my site has almost doubled over the past four months or so, but membership and traffic have stayed relatively constant; they've grown a bit, but nowhere near doubled. What else can cause this, and how can I stop it? I'd like to avoid paying more for my hosting unless I really have to. I'm not sure if I should be posting this here, so if I'm in the wrong place please let me know where I should go~

Gwenwyfar

What mods and themes do you have installed? Some of those (chats, for instance) can eat a big chunk of bandwidth. If you only have default SMF, then perhaps attachments/avatars and the like are being used more?
"It is impossible to communicate with one that does not wish to communicate"

Teh Lizzeh

Quote
Online Only Groups in the Group Key    1.0
Voter Visibility    2.0
Stars And Badges -JRJ-    2.1
Custom Username Color    1.2
SMF Staff Page    1.7
Board Notes    1.0.8
Dice Roller BBcode    1.3
Age And Location    1.0
Member Color Link    3.1
SubAccounts Mod    1.0.1
SimplePortal    2.3.5

That's all the mods I have right now, but I've had all of them basically since the beginning, and it's only spiked in the past few months. No one's using attachments, and avatars haven't been used that much more either. Do you think asking people to link their avatars instead of using the upload button would help a little? I can't imagine that almost doubling the usage, though.

Themes have been the same for almost two and a half years, too.

Illori

sounds to me like you are getting more hits by guests and spiders. do you have the spider log enabled in the admin panel? if so you can check and see if you are getting a ton of hits from a specific spider and maybe need to block that spider.

Gwenwyfar

Well, it depends on how much you're talking about to begin with... if your usage was low, then double a low usage is still low, and these small things could make a big difference. Now, if it was high and doubled, then there's something else in there, like the spiders/guests Illori pointed out :P

And yeah, many of these mods wouldn't do anything to your bandwidth, and those that do don't seem like much. Perhaps the chat from SimplePortal, if you have that active. Can't remember if that one has the option, but if so, you could lower the chat's refresh rate.

You could always just find a better host as well if that's the case, there are some good ones (also some listed in the hosts section in the forum here) that offer a lot of bandwidth for a low price.

Teh Lizzeh

My usage went from about 7-8 GB per month to 14-15 GB per month; I'm not sure if that's a lot or not. I don't have the spider logs enabled currently, so I'll do that and see what it shows.

Also I do not have the chat from simpleportal enabled so that's definitely not it~

Just for reference, how many hits per spider in a day is considered normal? Just so I know what I should be looking out for~

kat

Have a look at the "Logs" section of your cPanel. You'll find things like Webalizer, Bandwidth and suchlike.

They might tell you something.

Teh Lizzeh

I've looked at those logs and all they tell me is how much I'm using but it doesn't give me details of what changed to make it spike so much.

Gwenwyfar

AWStats will give you better info on where or who was accessing more, and so on, if your host is using cPanel, along with the other log pages in that section.
I think that's what K@ was talking about.

Teh Lizzeh

I'm getting a 404 Not Found error when trying to access AWStats, so I'm going to see if my host can help me out with that. Does anyone know what average spider traffic looks like, so I can check whether mine is abnormally high? That way at least I can tell if that's causing this or not. And if not, thanks for all of your help; you guys are awesome.

Gwenwyfar

See if the others work; most of the links in that section are logs/stats, like Webalizer and so on.
(Not that you shouldn't get your host to fix that anyway :P)

About a "normal" or "average" spider traffic, I don't know either, sorry.


Teh Lizzeh

Yeah, the rest of the cPanel works just fine; it's just that for AWStats I have to follow a separate link tied to my website, and that one specifically doesn't work. The logs and stats I can find directly in my cPanel don't tell me anything specific about who or what is causing the traffic; they just tell me how much there was and when.

Gwenwyfar

Many of those logs have links that take you to more detailed stats. In Webalizer, for instance, you can see which pages or files got more hits, which pages used more bandwidth, etc.

JBlaze

Just wait until one of your websites gets linked to on Reddit... :P
Jason Clemons
Former Team Member 2009 - 2012

Teh Lizzeh

Guys, is more than 1,200 hits by Google in 18 hours normal?

kat

It can be, sure.

If you want to keep them out, I believe Google still acknowledges a robots.txt file, at the moment.

Teh Lizzeh

Well, if that's one of the things using up so much of my bandwidth, it might be a good idea, although I don't know how much it will affect how easy I am to find? Or am I saying something really stupid here?

kat

Well, now you have a problem.

Do you want the spiders and have them use your bandwidth? Or, do you want to be found?

Your choice, I'm afraid. ;)

'course, you could have others, like Baidu, spidering. They totally ignore robots.txt and you have to keep 'em out using an .htaccess file.
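For reference, a minimal .htaccess sketch for that approach (this assumes an Apache server with mod_rewrite enabled; "Baiduspider" is the commonly published Baidu user-agent string, so verify it against your own logs):

```apache
# Return 403 Forbidden to any client whose user-agent contains "Baiduspider"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]
```

Unlike robots.txt, this is enforced server-side, so the spider can't simply ignore it (unless it starts sending a different user-agent string).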

Illori

Quote from: K@ on November 19, 2014, 05:06:10 PM
'course, you could have others, like Baidu, spidering. They totally ignore robots.txt and you have to keep 'em out using an .htaccess file.

i have had baidu blocked with robots.txt for over 2 years and it has not come back once.

Arantor

That's relatively new then since it never used to honour robots.txt.

Teh Lizzeh

I mean, the majority of the members I get aren't through Google anyway, so it wouldn't make that much of a difference, I suppose. I'm mostly wondering whether it'll significantly save my bandwidth. I could always try it and see what it does; if it doesn't help, it doesn't harm anything either. I'm currently awaiting a message back from my host, and I'd like to thank all of you for your help. I'll be marking this as solved, since I don't think there's more any of you could do with the information I'm able to provide and access at the moment. All of your suggestions did help, for sure!

Gwenwyfar

Now, I'm not really sure you can disable those based on group; you'd have to check (net being a &¨%$$ with DNS, so my forum isn't loading for me to check =/). But considering a good part of the bandwidth often comes from images like banners, avatars and so on, perhaps you could just disable those for bots? I know you can create a group with permissions just for them.

It's about the same with my forum (almost no one comes from Google), but there are still those that do, so it's not completely useless even if that's the case for you :P

Arantor

It is actually possible to hide avatars for guests without a mod with a bit of abuse of the innards.

Go into the smf_themes table, and add a row of:
id_member = -1
id_theme = 1
variable = show_no_avatars
value = 1

A new row in the table with these values should hide avatars from guests which can cut down your bandwidth use.
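The same row, expressed as a SQL statement (a sketch: it assumes the default smf_ table prefix and that theme 1 is your default theme; adjust both to match your installation):

```sql
-- id_member = -1 is SMF's convention for guests
INSERT INTO smf_themes (id_member, id_theme, variable, value)
VALUES (-1, 1, 'show_no_avatars', '1');
```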

Illori

Quote from: Arantor on November 19, 2014, 05:36:07 PM
That's relatively new then since it never used to honour robots.txt.

new as in at least 2 years old then ;)

Arantor

Nah, I'm sure I've seen Baidu fail to listen more recently than that.

Steve

Quote from: K@ on November 19, 2014, 05:06:10 PM
'course, you could have others, like Baidu, spidering. They totally ignore robots.txt and you have to keep 'em out using an .htaccess file.

Is there a tutorial/instructions somewhere to teach me how to do this?
DO NOT pm me for support!

Illori

my robots.txt file just has

#Baiduspider
User-agent: Baiduspider
Disallow: /

in it, and it seems to work fine. you should try it.

Gwenwyfar

Doesn't seem to work for me. I added that early this morning, and Baidu is still going all over my forum all day. Unless they only check the robots file once in a while... I'll wait some more and see if it stops.

Edit: Oddly enough, it's not showing up in the host stats; it shows other bots, but not Baidu. According to that, the other bots together are making up just about 15 MB of bandwidth for the month, out of another 370 MB by pretty much me alone. Pretty low, if it's accurate. Hope it stays low :P

Edit2: Seems they check the file daily or something around that; it stopped showing up after midnight :)

Teh Lizzeh

My host helped me figure out that it's the Bing spider, using about 4.5 GB per month. I've added a robots.txt, and I'm hoping this will stop it from bugging around my site >.> Google only uses 370 MB in comparison, and Baidu hasn't shown its face, thankfully o.o

Teh Lizzeh

Both Google and Bing seem to ignore my robots.txt. Did I misplace it on my server? It's in both the www folder and the forum folder AND the folder that holds all the actual forum files (I wasn't sure where to put it, since it only shows up on the server in the last one, but it said to put it in the root).

Teh Lizzeh

childrenofolympus.net: if you add /robots.txt, it does show up, but only when I put it in the forum folder via FTP.

Arantor

Well, the fact it shows up indicates that you hit the right place, and I don't see anything obviously wrong with it, other than the fact you're telling every search engine to go away (which may be desirable, I don't know).

Teh Lizzeh

Yeah, I figured I would test it out by telling them all to go away. I would like to allow Google in there, but I wanted to check if it worked at all, which, clearly, it doesn't, because somehow both Bing and Google are still showing up. Would it make more sense to block Bing separately? And would I have to add each agent in a separate piece of code?

Arantor

Firstly, there is always a delay in between dropping the file in and them responding to it - can be several days.

Secondly, some bots claim to be Google or Bing when they are not (like spammers), and they will ignore robots.txt anyway.

Thirdly, if you want to allow Google in but not Bing, you will want to be blocking Bing on its own anyway (since the concept of robots.txt is 'if you're not listed here, it's OK to come in').

Fourthly, yes, you will need one entry per agent, because the User-agent line only understands the name of a single user-agent, or * for everything. Nothing formally supports more complex constructions like "User-agent: Google, Bing", for example.
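Putting those points together, a robots.txt that turns away Bing's agents but leaves everyone else alone could look like the sketch below (bingbot and msnbot are the commonly cited Bing/MSN agent names; verify them against your own logs before relying on this):

```
User-agent: bingbot
Disallow: /

User-agent: msnbot
Disallow: /
```

Any agent not matched by a group falls through to the default, which, with no "User-agent: *" group present, is to allow everything.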

Teh Lizzeh

Ah, okay. I figured it would have responded by now, since I dropped the file in yesterday afternoon, but if it can take several days, that makes more sense.

So my code would look like this, if I'm correct?

User-agent: Bingbot
Disallow: /

And that for every agent I want to block, of course, but that's the only one that seems to be on my site.

Arantor

That assumes Bing tests for Bingbot; it may also test for msnbot. Also not sure if it's a capital B on that or not, to be honest; it's been a while since I checked what Bing uses these days.

Teh Lizzeh

Well, Bingbot is the one that comes to my site. I've looked, and there are four agents they use, Bingbot being the only one that ever visits. But I see no problem in blocking all of them; I just want them to go away.

Steve

I'm a complete dolt with this stuff so let me make sure I understand. If I create a text file called robots.txt and put in it:

User-agent: *
Disallow: /

That would stop any spider or bot that doesn't ignore robots.txt? And it goes in the same folder as my .htaccess?

Arantor

If you create a file called robots.txt with that content, and put it in the same folder as your top level stuff, it will block any well behaved robots.

I say 'top level stuff' because I don't know where your .htaccess file is (since you can have one in every folder if you want!).

If your site is example.com/index.php, example.com/robots.txt is where the file goes (same folder as SMF in that situation). But if your site is example.com/forum/index.php, it's *still* example.com/robots.txt (but this time it's not the same folder as SMF, if that makes sense).

Steve

Makes perfect sense and thank you. :D

My site is actually forums.example.com, but it still goes in example.com ... prior to the 'public_html' folder, yes?


Gwenwyfar

These bots are more annoying than I thought they'd be; I'm getting spammed by yet another unnamed Chinese bot now :(

From some of the info I found (it is always changing its IP, as usual), it seems to be from something called ShenZhen Sunrise Technology Co., Ltd., which might be just Baidu under another name. So yeah, they're still annoying, I guess, or at least the Chinese bots are :P

It doesn't get identified by SMF as a bot, though.

