How do I reduce bandwidth usage?

Started by ajac63, February 18, 2021, 10:26:28 PM

drewactual

Quote from: Aleksi "Lex" Kilpinen on March 02, 2021, 12:07:40 AM
The .htaccess should catch everything before it gets to SMF, but could be you just had a wave of crawlers that were not included in the .htaccess rules. Some "search crawlers" can really be a nuisance, and may appear in numbers reaching a hundred in a small timeframe.

hundreds?

I've experienced the China crawlers literally in the tens of thousands... the htaccess blocks took care of the majority of them, but they still return in the late Jan to early Mar timeframe... every year... this year and last they were in the 4k range at peak, but before the htaccess blocking (which someone provided here, and thank you for that, whoever you were) I would hit 32,000+ crawlers on top of my 300 or so users. I was watching the metrics like a hawk at the time, half willing to kill them right there, and the winning half morbidly curious to see if the server could handle it... it did... but man, they came in FORCE.

they ALL originated from China, and not a one of them gives a damn what you request or suggest; they just bear down on you... if they crash your server? they still don't care... they stack up to bum rush it again just as soon as you're back up... they crawl every. single. page. over and over... I despise them.

data harvesting is what they're doing, and it's amazing what they can put together by doing so: an innocuous comment here or there, a mention of a job title/position, a bit of information that means nothing by itself. but aggregate it with other comments from the same user over time, and then with other sources? boom, they get a complete picture of whatever the subject matter is, be it technical or personal. they are something else... and... we (the US) do it too... we do it as well as they do. nothing is 'private', and with AI it's easier to make sense of the pile of formless data... and forums are gold mines as rich as, or richer than, social media.

Aleksi "Lex" Kilpinen

I do believe that happens, but must say in my years online I have never seen a single crawler come in with a force quite like that. :o
Slava Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

drewactual

they tidied up their annual run a couple weeks ago... now I'm seeing somewhere between 400 and 1500 a day... that'll drop off by summer to around 500 tops... then, next January, they'll bum rush again.

what I should do is make a copy of their IPs and adjust accordingly. it 'should' be that simple.

2018 was 'the most', and I misspoke: it was 31k, not 32k... 2019 was just a hundred or so short of that, even while I was blocking one range (this was before whoever it was left the post identifying the ranges they had encountered)... in 2020 I had the htaccess blocks set up, and the same this year... I want to say the peak this year was just over 4k.
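
The "make a copy of their IPs and adjust accordingly" idea could be sketched roughly like this. It is only an illustration under stated assumptions: the access log is assumed to have the client IP as the first field of each line (as in Apache's common/combined formats), the server is assumed to be Apache 2.4 (where `Require not ip` inside `<RequireAll>` is the access-control syntax), and the function names, the /24 grouping, and the `min_hits` threshold are all hypothetical choices, not anything taken from the rules posted earlier in this thread.

```python
import ipaddress
from collections import Counter


def busiest_networks(log_lines, min_hits=3, prefix=24):
    """Count requests per IPv4 network of the given prefix length.

    Assumes the client IP is the first whitespace-separated field of
    each log line. Lines that do not start with an IP are skipped.
    """
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        if addr.version != 4:
            continue  # this sketch only handles IPv4
        # strict=False lets us derive the enclosing network from a host address
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        hits[net] += 1
    # Networks at or above the threshold, busiest first
    return [(net, n) for net, n in hits.most_common() if n >= min_hits]


def htaccess_block(networks):
    """Render Apache 2.4 style deny rules for the given networks."""
    lines = ["<RequireAll>", "    Require all granted"]
    lines += [f"    Require not ip {net}" for net in networks]
    lines.append("</RequireAll>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical log file name; point this at your own access log.
    with open("access.log", encoding="utf-8", errors="replace") as fh:
        nets = busiest_networks(fh, min_hits=1000)
    print(htaccess_block([net for net, _ in nets]))
```

Blocking whole /24 ranges is a blunt tool and can catch real users who share a range with a crawler, so the output is best reviewed by hand before it goes into .htaccess.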

ajac63

Quote from: drewactual on March 17, 2021, 10:31:09 AM
[...] the htaccess blocks took care of the majority of them but still in the late Jan to early Mar timeframe they return... every year [...]
I can identify with the 'late Jan to early Mar timeframe...', as it was February when the sudden spike in bandwidth usage happened to me, but it's notable that in your case they were all from China and had been for some years.  I wonder who they're harvesting data on behalf of?
Another SMF believer

drewactual

Quote from: ajac63 on March 17, 2021, 08:47:31 PM
I can identify with 'late Jan to early Mar timeframe...' as it was February when the sudden spike in b/width usage happened to me, but it's notable that in your case they were all from China and for some years.  I wonder who they're harvesting data on behalf of?

I don't think they're targeting anything in particular, or for anyone/anything in particular... I think they have a toy and by damn they're going to use it.

there was a time, pre-internet, when there were three entities you had to be concerned about collecting information... (and this is likely going to surprise you if you didn't know) first, the financial industry, particularly credit cards... they collected all kinds of information about your spending habits... then, churches: the Vatican first, but in a close second were the Mormons... they didn't reach out into the world, but what they knew about their followers was startling and astounding... then, the usual suspects: gov'ts... particularly the IRS, but it didn't stay there. then other gov'ts...

this interweb thing is a boon for data collection. you're already well documented no matter how well you hide from it. AI makes it even more complicated, as it takes a heaping pile of raw data without form and creates relationships between those data points on the fly, and that is where Google and the social media companies come in... followed closely by the US and China, who are neck and neck as to who collects more and what it's applied to... they actually sell it, for one, after targeting known users of products... your, say, kayak makers and marketers don't have to take out Super Bowl ads at millions of dollars to target the perhaps 1.5% of watchers that may be interested in their product... now, they just pay the big harvesters and get precision-targeted marketing.

.... the fear is wherever else it goes... one thing is for certain: it never goes away.

maybe five years ago, now, I got a FB message asking me "Is this you in this picture?", and there I sat in my drunken stupor at a Vegas blackjack table in the background of someone's vacation photo... I said "nope"... some months later I saw a picture in the 'photos of you' section and there I sat, 'tagged', but it wasn't me who confirmed it. I wonder if someone I know did, or if they said "yeah right, that's you, ya drunken clown"...

these robots/worms/crawlers aren't our friends... unless you're selling something, and then they are... somewhat... but not for long...

there ain't no hiding from it. it's the way it is and the way it's going to be, and there is nothing you can do about it even if you unplug, because your friends have Facebook and Twitter, and you and they carry phones, and those phones keep up with where you are and who you're around... unless you bounce completely off the grid they know... and then they know you've bounced...

wonderful, no? 

ajac63

Not wonderful at all >:(.  It seems cyberspace has become such a jungle of crawlers and spiders that it's getting more and more difficult to know what to expect next.  As to pre-internet info gatherers, I didn't know about the Vatican, but I did know about the practices of the Mormons; they wanted to know your life story as a prerequisite for joining.  I was literally 'this far' from being baptized when I luckily came to my senses and walked straight back out of their European headquarters opposite Exhibition Road, London. :-X
