Emails Stuck in Mail Queue

Started by Darkness7148, June 27, 2025, 12:55:38 PM

Darkness7148

I've noticed there are emails stuck in the mail queue that don't seem to be going anywhere. The top two are to me, and I didn't get alerts for those threads either.

Emails surely shouldn't be taking this long to send. It's an active forum, so the queue should have been triggered. There don't seem to be any issues with Send Test Email either.

You cannot view this attachment.

Running SMF 2.1.6.

Doug Heffernan

Were there any changes/modifications done to the forum recently?

Illori

Is the forum sending out any emails?

Darkness7148

Quote from: Illori on June 27, 2025, 01:02:00 PM
Is the forum sending out any emails?

I checked the Mail Delivery Report in WHM and emails are sending okay. I can't be sure they're from the forum though. I'll have to monitor the list.

Quote from: Doug Heffernan on June 27, 2025, 01:01:03 PM
Were there any changes/modifications done to the forum recently?

I upgraded to 2.1.5 on Wednesday morning and this started happening. I don't check the Mail Queue that often so could be a coincidence.

I did change my PHP version from 8.1 to 8.3 after I upgraded to 2.1.5 given that you said it was compatible with 8.4.

This error appeared in the Error log after I upgraded to SMF 2.1.5.

Quote
Guest
https://www.avpgalaxy.net/forum/cron.php
/home/avpgalax/public_html/forum/Sources/ScheduledTasks.php (Line 873)

Type of error: Cron
Error message:
2: Undefined array key "mail_failed_attempts"

Edit: Just received the top two messages in my emails now.

Doug Heffernan

Quote from: Darkness7148 on June 27, 2025, 01:15:26 PM
I upgraded to 2.1.5 on Wednesday morning and this started happening.

In the meantime there has been another version released, 2.1.6. Can you update your forum to it and see if it would help?

Darkness7148

I already did that. I upgraded as soon as it came out.

gkawa

I'm looking at the same problem. It's not a mail server problem. The Send test works, but I think it's a direct call to the send function and it's not using the mail queue. The mail queue is the problem.
I tried inserting a task in the queue manually, it stays there. Forcing the queue doesn't work. One thing I noticed is that the id_mail field changes every time the queue is processed. It's as if the task is attempted and re-entered into the queue. I don't see anything in the error log.

The forum was updated this week to 2.1.5 and 2.1.6 a couple of days after that. Everything went fine. There was a weird error in the log, calls to subs-admin.php with references to non-existing files. Unfortunately, I deleted them. I thought it was just something temporarily broken during the update. It didn't happen again since.

The forum has little movement, so I can't say if all mail tasks are failing or when the problem started. I'll keep looking into it and let you know if I find anything.

Liviu Lalescu

I am facing the same problems: the notification emails are delayed by many hours and the send queue is sometimes locked. The test email works, and the notification for a new user also works.

KittyGalore

Can confirm this is happening for me also.
SMF Curve 2.0x

gkawa

After a couple hours, the queue is empty. So, it's working, just delayed.

gkawa

I set a trigger on Insert that inserts into a log table, same structure as smf_mail_queue.
I have no idea what I'm looking for, but maybe someone can make sense out of it. What I see is that the queue is processed and all rows are reinserted into it with new id_mail values.
I wonder if it's about the time each message spends in the queue or the number of times it's processed before it goes out.

gkawa

I can confirm the queue is working; it's just delayed. I forgot to put a time control on the new log table. But the first three messages went out after 30-something loops. 30, 33, 34. I think the last one went around more cycles because it was already there when I set the log. I'll add a timestamp to the log and add some more messages to check.

shawnb61

In 2.1.5, I know there were changes to mail queuing & error handling.  Some mail transports are changing their behavior & becoming more tweaky.  In many instances, emails are just getting dropped.

Now, if errors are encountered, emails are no longer dropped, they stay in the queue & are retried at a later time (which I think is configurable).

So...   We should be seeing fewer/zero straightup dropped mails.  But...  We should be seeing more queued emails & delays.

https://github.com/SimpleMachines/SMF/pull/7788
https://github.com/SimpleMachines/SMF/issues/7787

It may need some tweaking.
A question worth asking is born in experience & driven by necessity. - Fripp

marcosbr

It sends hours late.
Do you feel superior?
Above is a slab and below is darkness. It's fire brother!
https://amigosdaeletronica.com.br

gkawa

Quote from: shawnb61 on June 29, 2025, 06:44:36 PM
In 2.1.5, I know there were changes to mail queuing & error handling.  Some mail transports are changing their behavior & becoming more tweaky.  In many instances, emails are just getting dropped.

Now, if errors are encountered, emails are no longer dropped, they stay in the queue & are retried at a later time (which I think is configurable).

So...   We should be seeing fewer/zero straightup dropped mails.  But...  We should be seeing more queued emails & delays.
That's what I was thinking when I saw that the messages were being re-queued. But I think something else is going on. I assume that the strategy is to select the whole queue and invoke whatever function does the sending for each row, then re-queue if something goes wrong. I guess the send test is calling the same function. So, I tried many times forcing the processing of the queue (Send mail queue now or refreshing the main page) and doing the send test at the same time. The tests always work. It can't be a mail server problem.

Here's a report from the log with timestamps: the number of times each message was processed in the queue, time_sent (the time it entered the mail queue), and the last time it was inserted into the log (the time when it was actually sent).


The last two showed up while I was testing. The first 8 were delayed between 100 and 108 minutes. I have no idea how the system works, but it seems too precise to be coincidental. By "precise" I mean around the same value. The process is asynchronous and depends on the activity in the forum. Although I was refreshing it on purpose, I didn't do it at precise intervals.

Hope it helps to find the problem. I'll let the log run overnight and let you people know if something else shows up tomorrow.

petewadey

I'm having the same problem since upgrading to 2.1.5, then 2.1.6. Most annoyingly, new members' activation emails are being delayed.

gkawa

I made a mistake. The timestamp indicates the last time the message was reinserted into the queue. The message was sent the next time, and I have no record of it. I have to add a delete trigger.



The first one was still in the queue this morning. We have little to no activity; it hadn't been processed since 20:35 last night, and it went away on the first try. But the last 2 are still there after 5 tries, and they have been stuck in the queue for more than 10 hours. I don't think it's relevant, but they are daily digests. I added the delete trigger and some more messages to see if I can get more info about it.

gkawa

I can confirm that it's related to time_sent. I sent another newsletter to a group. I set the time_sent of one message one day back, and it got sent immediately. Funny thing, one of the daily digests went with it after 10 hours. I guess the priority matters; those are 26 in priority.

gkawa

Almost all the newsletter notifications were sent after between 132 and 133 minutes. A very small part of them were in the 110 to 120 minute range. There was a password reset in the middle that went out after 115 minutes. So, whatever the number is, it's around 2 hours.
The second daily digest took over 800 minutes.

ChrisDyer

Same issue here.....

Does anyone have any idea of the root cause, other than that it is linked to 2.1.6?

I would have liked to add a screenshot, but apparently this isn't possible either.

oskar866

Funny enough I was about to post a new post about mail getting stuck in the queue when I saw this post.  I've not yet upgraded to 2.1.6, still on 2.1.4 but I'm getting the same problems.  Searching this forum shows a history of very similar questions but with no sure answer.  I won't hijack this post but I will follow it, a lot of people seem to be having the same problem.

Liviu Lalescu

With SMF-2.1.4 the notification emails worked very well for me. Only the recent updates to 2.1.5 and 2.1.6 brought me this delay problem.

gkawa

I can't say if it happened in 2.1.4. I noticed after the upgrade. I've been trying different things to find the reason, but I'm stuck. According to the log, typically two hours, with some exceptions below that. One was dispatched immediately, less than one second! Same priority as the others. And the Daily Digests are all being delayed more than 10 hours. I don't understand that since there's no way to tell them apart from the information in the mail queue. Except for the priority.



Darkness7148

I can confirm too that it's working.

gkawa

I'm trying it now, and it seems to be working. It's going to take a while because I tried a newsletter to 200 users. So far, it's sending at a rate of 10 per page refresh (it's set for that).

But I don't understand the problem. It's a system to protect the mail server from being overloaded if there are a lot of failed emails. I'm not having any.

The other question, this change has to be reverted before any update, right?

petewadey

Will there be an official update to resolve this? Now it's a known fault.

Sesquipedalian

Quote from: petewadey on July 01, 2025, 11:28:46 AM
Will there be an official update to resolve this? Now it's a known fault.
Yes. However, the official fix might not be the same as the changes described in the comments on the GitHub bug report.
I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.

shawnb61

Quote from: gkawa on July 01, 2025, 10:22:41 AM
But I don't understand the problem. It's a system to protect the mail server from being overloaded if there are a lot of failed emails.

My understanding... 

Mail servers have been changing behavior, & SMF needs to adapt to that.  In the past, mail servers acted as a relay...  SMF would give it an email, the server would acknowledge receipt, and SMF's job was done.  If the mail server couldn't send it, only the mail server knew.  To look at failed emails, you'd likely need to use a web client on your host.  (If you haven't done so, go check it out...  You may have 20 years of failed mail attempts in there, and a couple hundred replies where folks have replied to your forum's email...)

So, the old model was:
- SMF: here take this; Host: got it, I'll see what I can do; SMF: thank you, I'm done now!

SMF never really had to deal with items left in the queue that couldn't be sent, because all it had to do was pass it on to the mail server.  SMF's mail queue existed mainly to slow down bulk mailings, to avoid pissing off your host.

But we are seeing some hosts abandon this relay model...  As SMF connects to the mail server, it's being checked at that time.  The new model is:
- SMF: here take this; Host: let me check... gimme a minute... Nope! Recipient not found!; SMF: WHAT?!?!?

SMF wasn't designed to handle real-time responses from mail servers in this fashion.  Entries were getting stuck in the queue forever; sometimes clogging queue processing.  But some are acting this way (& we will likely see more going forward), so SMF has to adapt.  And sometimes the hosts are just like: "Recipient's mail server thinks we're sending too many emails!"  Expecting SMF (i.e., not the mail server as in the past) to hold on & retry later.  Again, new behavior SMF must adapt to. 

2.1.5 included some changes to handle this.  The SMF mail queue changed from being solely 'not sent yet' to a mix of 'try again later' and 'not sent yet'. 
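
As a rough illustration of that change (a sketch only, not SMF's actual code), the queue can be pictured as a list where each entry either leaves after a successful hand-off or stays behind with an attempt counter. The `process_queue()` function, the `$transport` callable, and the `attempts` field are hypothetical names invented for this example.

```php
<?php
// Illustrative sketch only; none of these names come from SMF.

/**
 * Run one pass over the queue.
 *
 * @param array    $queue     Entries like ['to' => 'a@example.com', 'attempts' => 0].
 * @param callable $transport Hypothetical sender: takes an entry, returns an SMTP reply code (int).
 * @return array Entries that remain queued after this pass.
 */
function process_queue(array $queue, callable $transport): array
{
    $remaining = [];

    foreach ($queue as $entry) {
        $code = $transport($entry);

        // 2xx: the server accepted the message, so it leaves the queue.
        if ($code >= 200 && $code < 300) {
            continue;
        }

        // Anything else: keep the entry ("try again later") and count the attempt.
        $entry['attempts']++;
        $remaining[] = $entry;
    }

    return $remaining;
}

// Usage: a fake transport that rejects one recipient in real time.
$queue = [
    ['to' => 'alice@example.com', 'attempts' => 0],
    ['to' => 'gone@example.com', 'attempts' => 0],
];
$fakeTransport = fn(array $entry): int => $entry['to'] === 'gone@example.com' ? 550 : 250;

$queue = process_queue($queue, $fakeTransport);
```

Under the old relay model the transport would answer 250 for everything, so every pass would empty the queue. With real-time checks, the rejected entry stays behind and has to be dealt with on a later pass.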

The new queue logic has been on THIS site for a while.  And THIS site sees a tremendous volume of email, so it looked like things were good.

Quote from: gkawa on July 01, 2025, 10:22:41 AM
The other question, this change has to be reverted before any update, right?
Any time you tweak the core SMF code for an issue such as this, you likely need to revert it before the official patch comes out. 
A question worth asking is born in experience & driven by necessity. - Fripp

gkawa

Quote from: shawnb61 on July 01, 2025, 11:54:16 AM
Any time you tweak the core SMF code for an issue such as this, you likely need to revert it before the official patch comes out.
Thanks. I'll keep that in mind.

By the way, I really really really really like the "old model". Any chance to have that as an option?
I understand that for most forum admins it's a problem. But I check my mail server regularly, mostly because it's my work server and it's shared with the forum. If SMF can't send emails, I have bigger fish to fry...

shawnb61

Quote from: gkawa on July 01, 2025, 12:29:46 PM
By the way, I really really really really like the "old model". Any chance to have that as an option?

This isn't an SMF choice - it's how some host mail servers behave.  So SMF must support both the old asynchronous relay model & the new synchronous one.

SMF can definitely speed up with a tweak or two.  Probably back to where you can't tell the difference.
A question worth asking is born in experience & driven by necessity. - Fripp

KittyGalore

Quote from: peter_mein on July 01, 2025, 09:34:22 AM
Hello
There seems to be a solution here
I tried it out and it works.

https://github.com/SimpleMachines/SMF/issues/8712

Yes, I can confirm that since I changed that code, the problem has been fixed for me as well. I was able to clear out the mail queue.
SMF Curve 2.0x

gkawa

Quote from: shawnb61 on July 01, 2025, 11:54:16 AM
But we are seeing some hosts abandon this relay model...  As SMF connects to the mail server, it's being checked at that time.  The new model is:
- SMF: here take this; Host: let me check... gimme a minute... Nope! Recipient not found!; SMF: WHAT?!?!?
Got it! I misunderstood the situation.

SleePy

@shawnb61,
You hit the nail on the head with the explanation. More and more modern mail servers are performing real-time checks as we are trying to hand off the mail, and the current mail code assumes it was a failure. So it just keeps trying and never sends the mail. This builds up, and mail stops flowing, as SMF didn't have a way to handle this.
We could update the code in SMF to just try and ignore responses, but then legit failures could result in mail not being sent. Think of your mail server being down while SMF just throws the mail out there and doesn't wait for a response. That is what we did before, but back then the mail server would effectively accept the mail and become responsible for trying to send it. Depending on the mail server config, it would send a message back to the sender's inbox, and the suggested setup was to have those forwarded to an admin (all of this is done in the mail server setup, not in SMF).

To adapt, SMF will need to do some more intelligent handling of emails. I have plans for 3.0 to handle this, and to add an additional unused column so that we can adapt the code in the future without being held back by not being able to make database changes.

The worst part about the failures would be the DOS that SMF would effectively start. When the mail queue started to pile up, it would keep trying to send emails, and if you just upped your mail queue size thinking you're busier, your system would keep trying to send. That could result in remote servers thinking you're performing some sort of attack and throttling or blocking you, after which you can no longer send any emails to that domain. Running your own mail server, you learn that your domain and IP reputation are very important, and you take care to follow the rules, or the big giants will block your domain and won't care about unblocking you. Those forms typically do nothing, and the blocks were time-based. With some "AI", I suspect they may try to reduce the time they block you.
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

shawnb61

Quote from: SleePy on July 01, 2025, 09:12:13 PM
The worst part about the failures would be the DOS that SMF would effectively start. When the mail queue started to pile up, it would keep trying to send emails, and if you just upped your mail queue size thinking you're busier, your system would keep trying to send. That could result in remote servers thinking you're performing some sort of attack and throttling or blocking you, after which you can no longer send any emails to that domain. Running your own mail server, you learn that your domain and IP reputation are very important, and you take care to follow the rules, or the big giants will block your domain and won't care about unblocking you.

Reading this, I can't help but think that massive sites like SMF, which have decades worth of old registrations & emails, are sending a massive volume of email to nowhere...

I bet a lot of those are for email accounts that are no longer valid.

I wonder if, upon seeing a 5xx response code, we should inactivate email for that user - unset the opt-in flag.  And never even *try* to send to them again.

I bet the volume of email being sent would drop. 

A lot.
A question worth asking is born in experience & driven by necessity. - Fripp

Aleksi "Lex" Kilpinen

If we could make it so, that it's retried once like a week or so later, and then act if it still fails - Even better. Just to make sure it's not just a system malfunction or something. (Only saying this, because my own email address has been dropped in places I really wish it hadn't, when Microsoft was having temporary issues... )
Slava Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

shawnb61

Quote from: Aleksi "Lex" Kilpinen on July 02, 2025, 01:24:11 PM
If we could make it so, that it's retried once like a week or so later, and then act if it still fails - Even better. Just to make sure it's not just a system malfunction or something.

The recent change does most of this already.  It slows down resends, & ultimately drops repeated fails. 

The good thing is that the mail transports generally report 4xx codes for those worth retrying.  5xx is normally dead altogether (although 552 is mailbox full, which might be temporary...).  A 550 normally means the mailbox doesn't exist.  Though it can mean that the sender (you/your host) has been blacklisted.
https://www.mailersend.com/blog/smtp-codes

Maybe opt out the 550s with repeated fails.  TBH - I'd opt out all of 5xx.  If they ever fix things they can opt back in.
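
A minimal sketch of that idea, assuming the transport hands back the numeric SMTP reply code. The function names and the failure threshold of 3 are hypothetical, chosen only for illustration; nothing here is SMF's actual code.

```php
<?php
// Illustrative sketch only; the function names and the threshold are assumptions.

/** Classify an SMTP reply code as 'sent' (2xx), 'retry' (4xx), or 'dead' (5xx). */
function classify_reply(int $code): string
{
    if ($code >= 200 && $code < 300) {
        return 'sent';
    }
    if ($code >= 400 && $code < 500) {
        return 'retry';
    }
    if ($code >= 500 && $code < 600) {
        return 'dead';
    }

    // Unknown or unexpected codes: assume retrying is the safer choice.
    return 'retry';
}

/** Decide whether to opt a member out after repeated permanent failures. */
function should_opt_out(int $code, int $previousFailures, int $threshold = 3): bool
{
    return classify_reply($code) === 'dead' && $previousFailures + 1 >= $threshold;
}
```

Here 552 (mailbox full) is treated like any other 5xx, in line with opting out all of 5xx. Requiring several failures before acting leaves room for a temporary problem on the receiving side to clear up.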

That's actually a benefit to this new synchronous mode we're starting to see...

Under the old relay model, the only way to see if emails are getting rejected is to log on to your host's webmail.  Your forum has no idea which emails are invalid.  If you want to clean those up, you must do so by hand.
A question worth asking is born in experience & driven by necessity. - Fripp

gkawa

Quote from: shawnb61 on July 02, 2025, 01:37:27 PM
TBH - I'd opt out all of 5xx.  If they ever fix things they can opt back in.
I'd do that. A mailbox full is as good as dead. I'm sure that person has bigger problems than missing forum notifications. And, most likely, it's an abandoned mailbox.

While checking on this issue, I found out that most daily digest subscriptions are from users who haven't logged into the forum in ages. And we have little to no traffic at all, fewer than 30 users are active every day. I can imagine that the problem is huge for popular forums.

CRM 114

Quote from: peter_mein on July 01, 2025, 09:34:22 AM
Hello
There seems to be a solution here
I tried it out and it works.

https://github.com/SimpleMachines/SMF/issues/8712
For me, this did not work.

Sending a test mail (SMTP) works though.
German Wet Shaving Forum: www.gut-rasiert.de/forum

Oldiesmann

Quote from: CRM 114 on July 02, 2025, 02:31:54 PM
Quote from: peter_mein on July 01, 2025, 09:34:22 AM
Hello
There seems to be a solution here
I tried it out and it works.

https://github.com/SimpleMachines/SMF/issues/8712
For me, this did not work.

Sending a test mail (SMTP) works though.

What didn't work about it? It's worked for me on two different forums (my birthday was two days ago so the birthday notification mail got stuck in the queue on both of them, and I've since gotten a topic reply notification email from one of them as well). After making the change if you go to the mail queue and click "Send queue now", it should send all the emails that are stuck in the queue (though I don't recommend doing this if you have a ton of emails in the queue - I only had 3 to 5 messages in the queue on each)

CRM 114

Quote from: Oldiesmann on July 02, 2025, 10:10:00 PM
After making the change if you go to the mail queue and click "Send queue now", it should send all the emails that are stuck in the queue (though I don't recommend doing this if you have a ton of emails in the queue - I only had 3 to 5 messages in the queue on each)
I have 3 mails stuck in the queue. When I click "Send queue now", nothing happens.
German Wet Shaving Forum: www.gut-rasiert.de/forum

CRM 114

Quote from: CRM 114 on July 02, 2025, 10:51:32 PM
I have 3 mails stuck in the queue. When I click "Send queue now", nothing happens.
Priority is "Very high".

Values in table settings:
mail_failed_attempts: 23
mail_limit: 10
mail_next_send: 0
mail_quantity: 10
mail_recent: 1751511038|10
mail_type: 1

The changes I made in ScheduledTasks.php, as described here:
You cannot view this attachment.
German Wet Shaving Forum: www.gut-rasiert.de/forum

peter_mein

Where has line 823 gone?
It is not deleted.

CRM 114

Maybe I misunderstood.

Find:
$email['priority'] = max($priority_offset, $email['priority'], min(ceil((time() - $email['time_sent']) / $smtp_expire * ($max_priority - $priority_offset)) + $priority_offset, $max_priority));

// Don't send if it's too soon. Also, if we've already failed a few times, only send on every fourth attempt so that we don't DOS some poor mail server.
if (time() < $next_send_time || ($email['priority'] >= $priority_offset && $email['priority'] % 4 !== 0)) {

Change to:
// Don't send if it's too soon. Also, if we've already failed a few times, only send on every fourth attempt so that we don't DOS some poor mail server.
if (time() >= $next_send_time) {
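
For anyone wondering why the original condition in the "Find" lines could hold mail back for a long time, here is a small sketch that evaluates that priority formula on its own. The three constants are assumptions picked for illustration (check your own ScheduledTasks.php for the real values), and `effective_priority()` and `is_skipped()` are hypothetical helper names, not SMF functions.

```php
<?php
// Illustrative sketch only. The constants below are ASSUMED values, not taken from this thread.
const SMTP_EXPIRE = 259200;  // assumed: seconds until a queued mail is given up on (3 days)
const PRIORITY_OFFSET = 8;   // assumed
const MAX_PRIORITY = 127;    // assumed

/** Priority as computed by the formula in the "Find" lines, for a mail of the given age. */
function effective_priority(int $storedPriority, int $ageSeconds): int
{
    $aged = (int) ceil($ageSeconds / SMTP_EXPIRE * (MAX_PRIORITY - PRIORITY_OFFSET)) + PRIORITY_OFFSET;

    return max(PRIORITY_OFFSET, $storedPriority, min($aged, MAX_PRIORITY));
}

/** The second half of the original condition: skip unless the priority is a multiple of 4. */
function is_skipped(int $storedPriority, int $ageSeconds): bool
{
    $priority = effective_priority($storedPriority, $ageSeconds);

    return $priority >= PRIORITY_OFFSET && $priority % 4 !== 0;
}

// Usage: walk a mail with stored priority 1 through its first three hours.
foreach ([0, 60, 3600, 7200, 10800] as $age) {
    printf("age %5d s  priority %3d  %s\n", $age, effective_priority(1, $age), is_skipped(1, $age) ? 'skipped' : 'sent');
}
```

With these assumed values, a brand-new message gets priority 8 and goes out at once, while a slightly older one lands on a priority that isn't a multiple of 4 and is skipped until its age pushes it up to the next multiple. That would be consistent with the roughly two-hour delays reported earlier in the thread, but it depends entirely on the assumed constants.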

Anyway, I copied line 823 back from the backup, but this did not solve the problem.

And: The second change affects 2 lines (828 and 857). Is this correct?

Can someone post their fixed ScheduledTasks.php here, so I can check it?
German Wet Shaving Forum: www.gut-rasiert.de/forum

petewadey

Quote from: CRM 114 on July 03, 2025, 03:51:04 AM
And: The second change affects 2 lines (828 and 857). Is this correct?



I also found 2 lines


CRM 114

German Wet Shaving Forum: www.gut-rasiert.de/forum


Oldiesmann

Quote from: CRM 114 on July 03, 2025, 12:32:09 AM
Quote from: CRM 114 on July 02, 2025, 10:51:32 PM
I have 3 mails stuck in the queue. When I click "Send queue now", nothing happens.
Priority is "Very high".

Values in table settings:
mail_failed_attempts: 23
mail_limit: 10
mail_next_send: 0
mail_quantity: 10
mail_recent: 1751511038|10
mail_type: 1

The changes I made in ScheduledTasks.php, as described here:
You cannot view this attachment.

Add ++$email['priority']; in there where line 823 was.


CapnK

Can also confirm that the code on GitHub quoted above works, v SMF 2.1.6.  Thanks for the tip!

Cypheros

Yes the fix works for me, too.  :)
Attached the changed ScheduledTasks.php for 2.1.6 using "Fix mail queue #8716" from jdarwood007 on Github.
