PHP Session IDs "exploded" my database to 1 GB, exceeding my hoster's limits...

Started by Medizinmann99, April 27, 2024, 09:09:10 AM

Medizinmann99

Dear Sirs,

I am running SimpleMachines Forum version 2.0.13.

Seemingly overnight (I say seemingly, as this might also have crawled up slowly over time), my forum's database "exploded" from ca. 20 MB to 1,000 MB, making a total of 1,600 MB together with the files hosted at the hoster. It probably did happen overnight, because the hoster would have shut things down at 1 GB; it must have grown so quickly that the hoster's system reacted "600 MB too late", so to speak. When I look into the hoster's dashboard, it says 1 GB for the database and 600 MB for the files, for a total of 1.6 GB.

Looking at the contents of the database, it is clear that the extreme size is caused by PHP session IDs; in the database these live under
DATABASENAME_sessions
It accumulated a sweet total of 1,254,549 database entries full of PHP session IDs.
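
(For anyone else checking: a query like the following in phpMyAdmin's SQL tab shows which tables are eating the space. Just a sketch; note that table_rows is only an estimate on InnoDB tables.)

-- total on-disk size (data + indexes) per table, biggest first
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb,
       table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY size_mb DESC;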

So this exceeded the total storage limit for files and database, as my hoster allows 1,000 MB in total, and the hoster therefore deactivated access to the database and set it to read-only.

Which is good for the moment, as I guess this would otherwise have continued, maybe growing to terabytes and so on, lol.

I tried to export the database with MySQLDumper, which failed; MySQLDumper suddenly seems to malfunction completely, although its base files should theoretically be OK.
I guess this is either because the database is locked, or because MySQLDumper itself is now in a "forbidden space" or something, as I assume the database storage has priority over the file storage. I have no idea; it does not work properly and just gives me a lot of PHP errors and strange remnants of a user interface. (This does not have priority right now, I guess; I will have to play around to find out what causes it.)

So I used the phpMyAdmin tool in the hoster's dashboard to export the database. The charset was utf8mb4_unicode_ci, which was the default.
I guess the "real charset" which I have, or should have, in my database is some older UTF-8. According to AI, utf8mb4_unicode_ci will NOT corrupt the database even if the database runs on an older UTF-8 charset.

So I made the backup using the utf8mb4_unicode_ci charset with the "Quick" setting and the "SQL" file format.

It seems this successfully backed up the 1 GB database as an SQL file. I hope it won't be a corrupted file.

I just found out that the "real charset" is utf8_general_ci. So I guess I should have selected utf8_general_ci? Should I make a new backup? (This will cost me another gig of bandwidth, lol.)

I checked the stats, and I have 1.8 million pageviews this month (basically all guests), which I guess is a new record. I am the only one maintaining and writing in this forum, lol, but let's say it seems to be drawing a little bit more public attention at the moment.

Now the question is what I should do next.

I guess I must remove all PHP session IDs from the database manually using phpMyAdmin, and then make a new backup (so that I have a real backup of the smaller database containing only the real data) with charset utf8_general_ci.

Can any problems be caused by this? And how exactly should I delete them? I have no idea about phpMyAdmin.

Then I guess I should upload the forum's base files again (I deleted them because I thought I could free up enough space to get MySQLDumper working again, which did not help). I made a backup of the forum files via FTP of course, as well as a backup of the MySQLDumper files.

But IF I do this, I guess the same problem will happen again, tonight or in a few weeks or months or whenever: it will simply create a few million PHP session IDs again and wreck everything once more.

Is this perhaps some new denial-of-service (DoS) attack, developed by some script kiddie or some "great" hacker, targeted at SimpleMachines 2.0.13?

Can I turn off the whole PHP session ID system?

How should I resolve this? How can I prevent this in the future?

Thanks!

Best regards,

Medizinmann99

Kindred

No, you cannot turn off the session IDs. They are necessary.

However, you absolutely MUST upgrade to at least 2.0.19.

Your problem is that you got hit by spiders... probably ClaudeBot, as that has been responsible for overrunning at least two of my 2.0.x forums.
You should block ClaudeBot and Baidu...
And set a limit for the other spiders in robots.txt; a sketch is below.
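
For example, a minimal robots.txt sketch (note that Crawl-delay is only a hint: Bing and Yandex honor it, Googlebot ignores it):

# Block the worst offenders outright
User-agent: Baiduspider
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Ask everyone else to slow down (a hint only)
User-agent: *
Crawl-delay: 10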
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Medizinmann99

Thanks, I guess that explains it.

I just asked an AI (perplexity.ai) and I guess got a quite helpful answer:
=====
Based on the information provided in the search results, the key search engine crawlers known to create significant problems with SimpleMachines Forums (SMF) by excessive crawling and generating too many PHP session IDs are:
Baiduspider - The search engine crawler used by the Baidu search engine has been identified as a problematic crawler for SMF forums. It can overload the server by creating an excessive number of PHP sessions.

AhrefsBot - The web crawler used by Ahrefs, a popular SEO and backlink analysis tool, has also been reported to cause issues with SMF forums by generating too many PHP sessions.

To resolve these problems with a SimpleMachines Forum, the recommended steps are:
Create a robots.txt file in the root directory of your SMF forum and add the following directives:

User-agent: Baiduspider
Disallow: /

User-agent: AhrefsBot
Disallow: /

This will instruct the Baiduspider and AhrefsBot crawlers to not access any pages on your SMF forum.

Consider implementing IP-based blocking or rate-limiting measures to further restrict access from these problematic crawlers. This can be done at the server level or using .htaccess rules.

Monitor your server logs and analytics to identify any other overly aggressive crawlers or bots that may be causing issues with excessive PHP session creation or server load. Add them to the robots.txt file as needed.

Ensure that your SimpleMachines Forum software is up-to-date and that any available performance optimizations or caching mechanisms are properly configured to handle increased crawler activity.

By taking these steps, you can effectively mitigate the problems caused by Baiduspider, AhrefsBot, and other problematic crawlers on your SimpleMachines Forum, protecting your server resources and user experience.
====

So the first thing I will do now is create this robots.txt file; I will put the following contents in it:

User-agent: Baiduspider
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

Just tried; the server won't let me do it, as I am over quota, lol.

So the next step must be to somehow get the PHP session IDs out of the database with phpMyAdmin, to free up some quota (so there's room for the robots.txt file, lol!) without breaking anything. Any small guide on how to delete the session IDs "safely"?

If anything about this is safe, lol.

Can I corrupt the database or seriously break something by doing this?

Thanks!

Medizinmann99

I asked perplexity.ai if there is an easy way to rate-limit search engine crawlers via the .htaccess file, and below is the answer. Do you think the suggested values are "good"? Baiduspider, AhrefsBot and ClaudeBot are already excluded in robots.txt, but I guess the following additional settings cannot do any harm, and the other search engine crawlers mentioned will be rate-limited.

====
Based on the information provided in the search results, here is an easy way to rate limit search engine crawlers using the .htaccess file:
Create or edit the .htaccess file in the root directory of your website.
Add the following lines to the .htaccess file:

# Rate limit search engine crawlers
<IfModule mod_ratelimit.c>
  SetEnvIfNoCase User-Agent "^Baiduspider" crawler_bot
  SetEnvIfNoCase User-Agent "^AhrefsBot" crawler_bot
  SetEnvIfNoCase User-Agent "^Ezooms" crawler_bot
  SetEnvIfNoCase User-Agent "^Googlebot" crawler_bot
  SetEnvIfNoCase User-Agent "^Slurp" crawler_bot
  SetEnvIfNoCase User-Agent "^msnbot" crawler_bot
  SetEnvIfNoCase User-Agent "^Yandex" crawler_bot

  RLimitCPU crawler_bot 1 1
  RLimitIO crawler_bot 512 512
  RLimitMEM crawler_bot 128 128
</IfModule>

This configuration does the following:
It identifies several common search engine crawler user agents, such as Baiduspider, AhrefsBot, Googlebot, and others.
It then applies rate limiting rules to these identified crawlers, restricting their CPU usage, I/O, and memory consumption.
The specific rate limiting values used in this example are:
CPU: 1 second per request
I/O: 512 bytes per second
Memory: 128 KB per process
You can adjust these values as needed to suit your server's capacity and the traffic patterns of your website.
By implementing this .htaccess configuration, you can effectively rate limit the identified search engine crawlers, helping to protect your server resources and maintain the performance of your SimpleMachines Forum.
====
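
Hmm, though reading the actual Apache documentation, I am not sure this AI answer is right: RLimitCPU and RLimitMEM limit resources of processes spawned by Apache (CGI scripts and the like) and take numeric values, not an environment variable, and RLimitIO does not seem to exist at all. As far as I can tell, the real mod_ratelimit only throttles response bandwidth per request. If I read the docs correctly, a working Apache 2.4 sketch would look more like this (the bot names are just examples):

# Throttle response bandwidth for matched crawlers (Apache 2.4+, mod_ratelimit)
<If "%{HTTP_USER_AGENT} =~ /(AhrefsBot|Baiduspider|SemrushBot)/i">
    SetOutputFilter RATE_LIMIT
    # rate-limit is in KiB/s
    SetEnv rate-limit 64
</If>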

Medizinmann99

Another IMHO helpful answer from perplexity.ai regarding all this:
====
Based on the information provided in the search results, there are no significant real drawbacks to blocking Baiduspider, AhrefsBot, and ClaudeBot from your website using the robots.txt file, if your target audience is primarily in English and German speaking countries, and not in Asia.

The key points are:
Baiduspider is the web crawler used by the Baidu search engine, which is primarily focused on the Chinese market and has limited relevance for websites targeting English and German audiences.

AhrefsBot is a crawler used by the Ahrefs SEO tool, which is popular globally, but blocking it is unlikely to have a major impact on your website's visibility in your target markets.

ClaudeBot is an AI-related agent operated by Anthropic, and its purpose is not clearly defined. Blocking it is unlikely to have any significant consequences for your website's search engine optimization.
The search results indicate that blocking these crawlers in your robots.txt file is a common and recommended practice, especially if they are not relevant to your target audience and are causing issues like excessive server load.

The more important focus should be on optimizing your website's content and structure for the search engines and users that are relevant to your business, such as Google, Bing, and local search engines in your target markets.

Therefore, based on the information provided, blocking Baiduspider, AhrefsBot, and ClaudeBot in your robots.txt file is unlikely to have any real drawbacks for your website's search engine visibility, as long as your primary audience is in the English and German speaking countries, and not in Asia. This can be a useful measure to protect your server resources and focus on the search engines that matter most for your business.

Medizinmann99

Regarding the deletion of the session IDs, perplexity answered the following. Is this correct? The main question is whether this can cause serious problems...
===
Based on the information provided in the search results, to delete all PHP session IDs from the SimpleMachines 2.0.13 forums database using phpMyAdmin, you can follow these steps:
Log into your phpMyAdmin interface and locate the database for your SimpleMachines forum.
In the left-hand navigation, find the "_sessions" table or subdirectory within your forum database.
Once you have located the "_sessions" table, click on it to open the table view.
In the table view, you will see all the existing session IDs stored in the database.
To delete all the session IDs, you can use the "Select All" checkbox at the top of the table, and then click the "Delete" button to remove all the records.
Alternatively, you can also run a SQL query to delete all the session IDs in one go:

DELETE FROM _sessions;

This SQL query will delete all the records from the "_sessions" table, effectively removing all the stored PHP session IDs from your SimpleMachines forum database.
Remember to exercise caution when deleting session data, as it may impact the active user sessions on your forum. It's generally recommended to monitor your forum's activity and only delete session data if you encounter issues like excessive session IDs or server performance problems.
===
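
If the default smf_ table prefix is in use, I guess the real statement would need the prefix, i.e. something like:

-- assuming the default smf_ prefix; adjust to match your installation
DELETE FROM smf_sessions;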

mickjav

Sorry, why ignore the good advice given?

I think you're heading for a lot of problems without upgrading to at least 2.0.19 as suggested.

You could end up with MySQL version errors and PHP version errors.

If you're worried, there are a number of developers on the forums who could carry out upgrades (me not being one of them, lol).

If I missed where you replied to @Kindred, then I'll blame it on old age, lol.

Hopefully somebody will correct me if I'm wrong, but if you can upgrade all the way to 2.1.4, isn't the sessions table no longer used?

Medizinmann99

@mickjav
Thanks, of course I want to upgrade, but first I have to get rid of the extremely large database, and my current question is whether I can really delete all the PHP session IDs with phpMyAdmin... I just don't want to wreck my forum... so I am trying to do this step by step: first the smaller database, then robots.txt, then .htaccess, then the 2.0.13 forum software again from my backup, then I check that everything works, then the upgrades. I guess that's the smartest course of action.

mickjav

Maybe look for a better host that allows more than 1 or 2 MySQL databases and 1 GB of space.

I would make that my first move; the oversize problem would be solved, and then you can look at the other options.

When I was looking for a host, I posted what I thought of a lot of them at the link below.

PLEASE NOTE THIS IS A TEST COPY OF MY MAIN SITE; THE FORUMS HAVE NO ERRORS, BUT THAT CAN'T BE SAID FOR THE REST OF THE TEST SITE.

https://www.databasedreams.co.uk/testing/index.php?topic=1021.0

Sesquipedalian

You could also turn off the setting to use database-driven sessions. The session data will then be stored using PHP's default session management system (i.e. temporary files) rather than in your database. Your host might still complain about your overall data usage, if they count temporary files as part of that, but at the very least this will move the problem out of your database and make it easier to deal with.
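
If I remember right, in 2.0 that toggle is the "Use database driven sessions" checkbox under Server Settings > Cookies and Sessions, and it is stored in the settings table as databaseSession_enable. Flipping it directly with SQL (e.g. from phpMyAdmin) would look something like the sketch below, assuming the default smf_ table prefix; the admin checkbox is the safer route:

-- turn off database-driven sessions; PHP's file-based sessions take over
UPDATE smf_settings
SET value = '0'
WHERE variable = 'databaseSession_enable';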
I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.

shawnb61

The smf_sessions table is used in all versions.

Yes, you can truncate/empty the smf_sessions table.  But this effectively logs everyone out - even you.  The sessions table holds the state of everyone's login session.  No login session = no longer logged in. 

But if you need to reduce the size of the DB and this particular table is huge, yes, go ahead & do so, but everyone will need to log back on.
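
From phpMyAdmin's SQL tab it is a one-liner, assuming the default smf_ prefix:

-- TRUNCATE empties the table in place and is much faster than
-- deleting a million rows one by one
TRUNCATE TABLE smf_sessions;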

Note that the .htaccess updates can block the bad guys, folks who crawl too aggressively or don't generally play nice.

robots.txt is a set of guidelines that are used by the good guys - bots who follow the rules.  robots.txt is used to inform bots which parts of your site they should crawl & should not crawl, and how fast to do so.  (Bad bots generally don't bother looking at robots.txt...)

That all said, I'm curious how smf_sessions got 1M rows in it...  SMF normally cleans up unused & old sessions.  If they come back, we need to understand how & why it's growing so fast.  I'd start by looking at errors in the SMF and Apache logs, to see if something is interfering with the session cleanup.

I would not ask an AI bot how to build my robots.txt or .htaccess files.  There are some good examples to use as starting points here on the forum; e.g., vbgamer45 included his .htaccess, quoted below. It's a very helpful list if you don't have one (I would add claudebot to this list):
Quote from: vbgamer45 on February 22, 2024, 12:32:28 AM
My current badbot useragent list
<Location />
<Limit GET POST PUT>

# Begin Bad Bot Blocking
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase wordpress bad_bot
BrowserMatchNoCase istellabot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Cliqzbot bad_bot
BrowserMatchNoCase SocialRankIOBot bad_bot
BrowserMatchNoCase Mail.RU_Bot bad_bot
BrowserMatchNoCase Clickag Bot bad_bot
BrowserMatchNoCase Mediatoolkitbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase DotBot/1.1 bad_bot
BrowserMatchNoCase DataForSeoBot bad_bot
BrowserMatchNoCase www.timpi.io bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase trendictionbot bad_bot
BrowserMatchNoCase BLEXBot/1.0 bad_bot
BrowserMatchNoCase SeekportBot bad_bot
BrowserMatchNoCase Turnitin bad_bot
BrowserMatchNoCase omgili/0.5 bad_bot
BrowserMatchNoCase CheckHost bad_bot
BrowserMatchNoCase Amazonbot bad_bot
BrowserMatchNoCase SEOkicks bad_bot
<RequireAll>
Require all granted
<RequireNone>
Require env bad_bot
</RequireNone>
</RequireAll>

</Limit>
</Location>

Some other helpful reading materials:
https://www.imperva.com/blog/most-active-good-bots/
https://radar.cloudflare.com/traffic/verified-bots
https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/_generator_lists/bad-user-agents.list

Hope this helps.
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

Arantor

That's, uh, not how the sessions table works. You can delete the contents of the session table and people will remain logged in because the cookie will handle reauthenticating (and remaking) sessions for them just fine. (This is why the cookie has the components it has, such as the user id.)

What will be affected is anyone typing a post in the period leading up to the clearout of the table, which will produce a 'your session timed out while posting' error but not lose their post, and they can resubmit with a new cookie just fine.

vbgamer45's quoted list there is good, but I'd add claudebot, anthropic-ai and gptbot to it at a minimum; these have been shown to regularly not respect robots.txt at all.

Medizinmann99

Hmm... thanks to all so far. The forum works again, but I have hundreds of stupid "guests" which are most likely search engine crawlers, so the session IDs are probably starting to fill up again.

If I add the lines you proposed to .htaccess, the full .htaccess file looks like this:

=======================

# THIS WAS ALREADY IN THE .HTACCESS FILE FOR WHATEVER REASON (28.04.2024)
# # av:php5-engine
AddHandler av-php56 .php

#THIS STUFF SHOULD BLOCK OR RATE LIMIT BAD SEARCH ENGINE CRAWLERS (28.04.2024)
<Location />
<Limit GET POST PUT>

# Begin Bad Bot Blocking
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase wordpress bad_bot
BrowserMatchNoCase istellabot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Cliqzbot bad_bot
BrowserMatchNoCase SocialRankIOBot bad_bot
BrowserMatchNoCase Mail.RU_Bot bad_bot
BrowserMatchNoCase Clickag Bot bad_bot
BrowserMatchNoCase Mediatoolkitbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase DotBot/1.1 bad_bot
BrowserMatchNoCase DataForSeoBot bad_bot
BrowserMatchNoCase www.timpi.io bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase trendictionbot bad_bot
BrowserMatchNoCase BLEXBot/1.0 bad_bot
BrowserMatchNoCase SeekportBot bad_bot
BrowserMatchNoCase Turnitin bad_bot
BrowserMatchNoCase omgili/0.5 bad_bot
BrowserMatchNoCase CheckHost bad_bot
BrowserMatchNoCase Amazonbot bad_bot
BrowserMatchNoCase SEOkicks bad_bot
BrowserMatchNoCase Claudebot bad_bot
BrowserMatchNoCase GPTbot bad_bot
BrowserMatchNoCase anthropic-ai bad_bot
<RequireAll>
Require all granted
<RequireNone>
Require env bad_bot
</RequireNone>
</RequireAll>

</Limit>
</Location>
==================

Without the ==== of course.

This gives me a 500 error on ALL forum pages. Which is good, for the moment, as I guess it means "the forum" cannot fill up the sessions table once again.

If I remove the lines for bad-bot blocking, the .htaccess file looks like this

==========

# THIS WAS ALREADY IN THE .HTACCESS FILE FOR WHATEVER REASON (28.04.2024)
# # av:php5-engine
AddHandler av-php56 .php

==========

and then everything works again, i.e. the forum works fine, but I can see hundreds of guests in the forum statistics rapidly accessing the entire forum etc. etc.


Any suggestions on what to do next? First priority is to lock out all these funny "guests", I guess.

Arantor

Look at the server error log to see what it is complaining about specifically. Depending on the age of the server, it might be complaining about Apache 2.4 syntax, but the error message will tell us for sure.


I do have an alternative version of the code that might work for you instead:

RewriteCond %{HTTP_USER_AGENT} ^.*(Ahrefsbot|MauiBot|PetalBot).*$ [NC]
RewriteRule .* - [F,L]

You can add more bots in between the brackets by adding a | after PetalBot, though for the ones in your list you will need to put a \ before any special characters, including . / and + (an escaped example is below).
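
(Worth noting: <Location> blocks aren't allowed in .htaccess files at all; Apache only permits them in the main server configuration, which by itself would explain a 500.)

For example, here is what a couple of the versioned names from your list look like escaped. This assumes mod_rewrite is enabled; a RewriteEngine On line may be needed above it:

# dots, slashes and plus signs escaped so they match literally
RewriteCond %{HTTP_USER_AGENT} ^.*(Baiduspider\/2\.0|MJ12bot\/v1\.2\.5|sosospider\+).*$ [NC]
RewriteRule .* - [F,L]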

Medizinmann99

Thanks @Arantor, this works :-) No more "additional" guests!

But now there is a new problem: none of the attachments work anymore.

I mean, you can click on an attachment, the download window appears, you can download it (for example a JPG), and then you get an error message that the file is scrambled.

I downloaded the whole forum with FileZilla before I deleted it from the server, including the attachments, and there were no error messages.

I uploaded it again, no error messages.

Hm, I guess something scrambled the files, or did not transfer them correctly.

Or something is wrong with the basic forum software?

Medizinmann99

Very interesting... the attachment files on my hard drive are OK; these are the ones I downloaded from the server with FileZilla before deleting the forum software from the server.

But the attachment files which I uploaded to the server today are corrupted (really, f*cking corrupted: you can see half the pic, and then the other half is suddenly lots of weird colorful mumbo jumbo).

Hmm... time to play around with FileZilla and a single file to see what happens here.

Arantor

For FileZilla, https://www.simplemachines.org/community/index.php?topic=374178.0

The issue is that the ones downloaded from the server are corrupted because of FileZilla making assumptions it really, really shouldn't. (We gave up reporting this as a bug to FileZilla; they are adamant that it's not a bug. SMF 2.1 takes steps to avoid accidentally running into their bad decision.)

Medizinmann99

@Arantor
I also originally suspected the damn transfer mode, which I guess you are hinting at. I have FileZilla version 3.66.1 and the transfer mode is set to "automatic"; this has always worked perfectly for many, many years.

But now there is a very interesting phenomenon. The files on my hard drive are OK, and the files on the server are also OK (identical).

I can even, ROFL, delete the file on the server, and lo and behold: when I then click on the attachment in the forum (after a force refresh in the browser), i.e. I open the posting and at its end see the download option for the attachment with the identical name, the forum STILL offers the download (ROFL). I get the download window and then I get the remnants of the original file (which I can tell because a small part of the image is still visible in the corrupted file).

LOL, it really gets funnier by the moment.

Now I am starting to get a headache.

Please note that the attachment system still worked perfectly yesterday, or whenever this was.

Medizinmann99

Hm, don't start thinking yet: the attachment in the forum refers to another damn file, with a mumbo-jumbo name, so now I know where the forum takes this from. I have to take a look at that file, and I fear I know what I will find, probably messed-up sh*t, but let's see.

Medizinmann99

Yes, it was the damn transfer mode. All files which have encrypted attachment names are f*cked up.

So I now have a completely corrupt copy of all attachments.

And the originals on the server were deleted.

I have to check if the hoster still has a backup; there is one from yesterday, but I have no idea if restoring it will help.

Medizinmann99

Hm, I guess I was "quite" lucky: my hoster made a backup of the file space on March 30, 2024. I restored the entire file space and most attachments are back.

I re-edited the .htaccess file to shut out the unwanted "guests" (search engine crawlers) again.

The attachments from after that date I can still see in the database; I will manually add them again :-)

Is there a way to turn off the "attachment name encryption" in the forum, so that I only get clear filenames like
picture1.jpg
instead of, for example,
1_6350c04bbbf57566a714fe8b468ede17a8e05ca0
?

Any recommended software to make a VALID backup of ALL forum files?

Arantor

Is there a way to turn it off? No. It was made mandatory in 2009 after a security hole was demonstrated that exploited attachment filenames to the point of server takeover.

Any recommended software to make a valid backup? FileZilla with everything in 'binary' mode will do, as will any other tool that either doesn't assume a transfer mode or can be set to always use binary.

Medizinmann99

Are there any other files which could get corrupted by FileZilla set to "automatic" mode?

I guess it does the equivalent of what Godzilla does in automatic mode: Godzilla destroys cities, FileZilla destroys files; basically, the name says it! It is easy to remember this way...

Sesquipedalian

CyberDuck

Kindred

Please learn to use CODE BBC tags

Medizinmann99

Quote from: Kindred on April 28, 2024, 04:43:40 PM
Please learn to use CODE BBC tags

Who does this refer to?

I noted that the small avatar picture of my forum admin account had disappeared. I downloaded it from the browser cache and manually uploaded it into my profile again, and now it shows again! But I wonder why it disappeared; does dumb FileZilla in automatic mode also destroy avatar pictures? I ran repair_settings.php and set everything to the recommended values; this did not bring the avatar picture back, I had to manually re-upload it.

The forum overall seems to work completely normally again, and the bad search engine crawlers are obviously locked out.

I will make a complete backup on CD/DVD tomorrow. Then I will start the upgrade process etc.

Kindred

That comment was to you



I have had to modify your messages to make them more readable

Arantor

Quote from: Medizinmann99 on April 28, 2024, 04:23:08 PM
Are there any other files which could get corrupted by FileZilla set to "automatic" mode?

I guess it does the equivalent of what Godzilla does in automatic mode: Godzilla destroys cities, FileZilla destroys files; basically, the name says it! It is easy to remember this way...

FileZilla's automatic mode is applied to anything without an extension and anything it thinks is text, like .txt files, to convert between what Linux/Mac think is an end-of-line symbol and what Windows thinks is an end-of-line symbol.

As a rule, don't even let it guess and just force it to always use binary mode.

Personally I use WinSCP but that's also because I'm usually using SFTP servers rather than FTP or FTPS servers where the rules are a bit different for connecting.

Quote from: Medizinmann99 on April 28, 2024, 04:54:11 PM
does dumb FileZilla in automatic mode also destroy avatar pictures?

Avatars are stored with attachments by default in 2.0, so yes.

Medizinmann99

I forgot to post my .htaccess file, as this might be interesting for other users as well. It seems to lock out the bots effectively in my case. I just copied & pasted the most important bots you mentioned into the list; here is the bot-blocking part of my current .htaccess file:
RewriteCond %{HTTP_USER_AGENT} ^.*(Ahrefsbot|MauiBot|PetalBot|anthropic-ai|GPTbot|claudebot|baiduspider|Baiduspider/2.0|ahrefsbot/1.0).*$ [NC]
RewriteRule .* - [F,L]

Arantor

You need to put a backslash before the / in there, as I said before.

Medizinmann99

English is not my mother tongue; where exactly does the backslash go, please? Did I make an error in the .htaccess file?

Thanks :-)

Arantor

Yes, you put a / in the file; you need to write \/ instead. You also need to add \ before the . inside the brackets, because . has a special meaning that isn't just a dot.

Medizinmann99

Hm, like this?

RewriteCond %{HTTP_USER_AGENT} ^\.*(Ahrefsbot|MauiBot|PetalBot|anthropic-ai|GPTbot|claudebot|baiduspider|Baiduspider\/2.0|ahrefsbot\/1.0).*$ [NC]
RewriteRule .* - [F,L]
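
Or, reading Arantor's advice again, maybe the leading ^.* should stay as it was, and only the literal dots and slashes inside the brackets get backslashes, something like this (a guess on my part):

RewriteCond %{HTTP_USER_AGENT} ^.*(Ahrefsbot|MauiBot|PetalBot|anthropic-ai|GPTbot|claudebot|baiduspider|Baiduspider\/2\.0|ahrefsbot\/1\.0).*$ [NC]
RewriteRule .* - [F,L]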

Steve

@Medizinmann99 - please remember this is a family-friendly forum, and you have to keep from using coarse language. While yours was mild (otherwise I would have edited it), it's just better to get in the habit of not using it at all.  :)

Quote from: Sesquipedalian on April 28, 2024, 04:31:37 PM
CyberDuck
I'm not sure if that is needed or not (and it's not free). Like Arantor, I've been using WinSCP as my FTP program for years and have had zero problems.
DO NOT pm me for support!

Medizinmann99

@Steve
Ok :-)

I just burned three high-quality CDs with the database (downloaded with MySQLDumper and checked for integrity with WinRAR, which is possible since it is an SQL.GZ, so that's good) and added lots and lots of redundancy created with MultiPar/QuickPar (verified burn). I also burned three high-quality DVDs (verified burn) with all the forum's base files (re-downloaded with FileZilla in binary mode, no transfer errors, and I checked that pictures without extensions are still OK; everything is fine), everything RARed with MultiPar redundancy files until the DVDs were full. I checked the burn quality as well. Every copy operation was done with TeraCopy and verified. Well, I guess this should last a few years now.

Phew! I was really lucky this time; this ridiculous chain of events almost destroyed my attachments, of which there are around 1,000 at the moment.

The "guest invasion" seems to be over, despite the alleged errors in my .htaccess file.

Hm, now I can continue the upgrade process :-)

Can anyone point me to a guide on how to upgrade from 2.0.13, perhaps even to the latest version, if that is recommendable? I would be happy with a version known to be very stable and "finished". I mean finished for the moment, lol.

I will look into WinScp, thanks!

Oldiesmann

Quote from: Medizinmann99 on April 29, 2024, 09:20:48 AM
Can anyone point me to a guide on how to upgrade from 2.0.13, perhaps even to the latest version?

You can download the 2.1 upgrade package from the downloads page here, then extract the files, upload them to your server, and navigate to upgrade.php. See https://wiki.simplemachines.org/smf/Upgrading for more information.
Michael Eshom
Christian Metal Fans

Sesquipedalian

Quote from: Steve on April 29, 2024, 07:50:29 AM
Quote from: Sesquipedalian on April 28, 2024, 04:31:37 PM
CyberDuck
I'm not sure if that is needed or not (and it's not free).

Yes it is free, if you download it from their website. The version they offer in app stores costs money, but there's no need to get it that way.

Arantor

It's quite naggy about payment, though, and it doesn't play nice with some servers I've seen.
