Apache [host] <defunct> zombie processes

Started by zap123, January 31, 2006, 02:38:39 PM


zap123

I just did a fresh install of SMF 1.1 RC2 on my server. The message board works fine when anyone is using it, but I keep getting many of these zombie processes, and the only way to get rid of them is to restart the Apache process. Can anyone think of what would be causing this?

Linux all up to date
Qmail server
Apache httpd 2.0.52
   -The log files did get to about 1gb but then the log rotated.
PHP 4.3.9-3.9


apache   19983  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache   20039  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache   20046  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache   20050  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache   20055  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache   20062  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache   26243  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache   26249  0.0  0.0     0    0 ?        Z    Jan29   0:00 [host] <defunct>
apache    4883  0.0  0.0     0    0 ?        Z    09:38   0:00 [host] <defunct>
apache    6062  0.0  0.0     0    0 ?        Z    11:32   0:00 [host] <defunct>
apache    6069  0.0  0.0     0    0 ?        Z    11:32   0:00 [host] <defunct>
apache    6075  0.0  0.0     0    0 ?        Z    11:33   0:00 [host] <defunct>
apache    6080  0.0  0.0     0    0 ?        Z    11:33   0:00 [host] <defunct>
apache    6085  0.0  0.0     0    0 ?        Z    11:33   0:00 [host] <defunct>
apache    6142  0.0  0.0     0    0 ?        Z    11:35   0:00 [host] <defunct>
apache    6148  0.0  0.0     0    0 ?        Z    11:36   0:00 [host] <defunct>
apache    6303  0.0  0.0     0    0 ?        Z    11:50   0:00 [host] <defunct>
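
Since a zombie only goes away when its parent collects it, it can help to list zombies together with their parent PID. A minimal sketch in Python, assuming a procps-style `ps` that accepts `-eo pid=,ppid=,stat=,comm=`; the function names and the parent PID in the sample are invented for illustration:

```python
import subprocess


def find_zombies(ps_output):
    """Parse `ps -eo pid=,ppid=,stat=,comm=` output into (pid, ppid, command) tuples."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        pid, ppid, stat, command = fields
        if stat.startswith("Z"):  # Z = exited but not yet reaped by its parent
            zombies.append((int(pid), int(ppid), command.strip()))
    return zombies


def current_zombies():
    """Run ps on this machine and return its zombies (assumes procps-style ps)."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,stat=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_zombies(out)


# Hypothetical sample: the zombie PIDs come from the listing above,
# the parent PID 1234 is made up.
sample = """\
 1234     1 S    httpd
19983  1234 Z    host <defunct>
20039  1234 Z    host <defunct>
"""
print(find_zombies(sample))
```

Calling `current_zombies()` on the affected server would show whether all the `[host]` entries hang off the same httpd child or several.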

zap123

Looks like it happens when someone logs out.

Ben_S

The error log will often show what's causing it.
Liverpool FC Forum with 14 million+ posts.

brooks

I have checked my error log, and there's nothing in there relating to this.  Can you please elaborate?  I have close to 100 of these running on my server, and it's raising the load quite a bit.

Edit: My specs are also as follows:

RedHat Enterprise Server 4
PHP 4.3.11
Apache 1.3.34

Thanks
-Brooks

DracoBN

Same problem here - no errors being thrown by PHP or Apache. Not sure if 1.06 has the same problems, but RC2 definitely does. After the last bounce of my httpd server, I'm sitting at 33 Zombies, from roughly 14 hours of traffic. They do get reaped from time to time, but overall they continue growing.

I've tried some httpd directives to control these a bit, but nothing as of yet is working. This is definitely specific to SMF, I had zero trouble with the server process hanging out until SMF was installed over the last week.

I'll be happy to help track these down as my time permits, just shoot me an email, or a PM and I'll see what I can do to assist.

System Specifics :

SMF 1.1RC2 (As of last Wednesday)
RedHat AS 4u1 (Nahant)
Kernel  2.6.9-11.EL

Packages Installed of Relevance :

httpd-2.0.52-12.ent (Apache version 2.0.52)
php-mysql-4.3.9-3.6
php-ldap-4.3.9-3.6
php-pear-4.3.9-3.6
php-gd-4.3.9-3.6
php-4.3.9-3.6
mysqlclient10-3.23.58-4.RHEL4.1
php-mysql-4.3.9-3.6
mysql-4.1.10a-2.RHEL4.1
libdbi-dbd-mysql-0.6.5-10.RHEL4.1
mysql-server-4.1.10a-2.RHEL4.1
mysql-devel-4.1.10a-2.RHEL4.1
mod_auth_mysql-2.6.1-2.2

And of course, the output from Top showing a few defuncts :
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
18048 apache    16   0 32316  20m 6948 S  0.0  4.1   1:24.32 /usr/sbin/httpd
18049 apache    16   0 31728  20m 6916 S  0.0  4.0   1:02.62 /usr/sbin/httpd
18050 apache    16   0 32836  21m 6928 S  0.0  4.2   1:21.25 /usr/sbin/httpd
18051 apache    16   0 33504  22m 7268 S  0.0  4.4   1:08.04 /usr/sbin/httpd
18417 apache    16   0 31596  20m 7284 S  0.0  4.1   1:05.98 /usr/sbin/httpd
18418 apache    16   0 32744  21m 6932 S  0.0  4.2   1:00.73 /usr/sbin/httpd
18419 apache    16   0 32964  21m 6928 S  0.0  4.3   1:35.64 /usr/sbin/httpd
18420 apache    16   0 32848  21m 6916 S  0.0  4.2   0:50.55 /usr/sbin/httpd
18422 apache    15   0 32984  21m 7304 S  0.0  4.3   1:07.54 /usr/sbin/httpd
 8481 apache    16   0     0    0    0 Z  0.0  0.0   0:00.01 [host] <defunct>
 9202 apache    16   0     0    0    0 Z  0.0  0.0   0:00.01 [host] <defunct>
 9363 apache    16   0     0    0    0 Z  0.0  0.0   0:00.01 [host] <defunct>
12565 apache    16   0     0    0    0 Z  0.0  0.0   0:00.00 [host] <defunct>


brooks

is anyone ever going to give anything other than a canned answer to this?

c'mon, don't you think this is a bit absurd?  This is from 2 days since a restart of Apache...

Quote
27693 ?        Z      0:00 [host] <defunct>
27957 ?        Z      0:00 [host] <defunct>
28039 ?        Z      0:00 [host] <defunct>
28112 ?        Z      0:00 [host] <defunct>
29680 ?        Z      0:00 [host] <defunct>
  338 ?        Z      0:00 [host] <defunct>
3154 ?        Z      0:00 [host] <defunct>
6600 ?        Z      0:00 [host] <defunct>
7118 ?        Z      0:00 [host] <defunct>
8441 ?        Z      0:00 [host] <defunct>
12968 ?        Z      0:00 [host] <defunct>
13178 ?        Z      0:00 [host] <defunct>
13311 ?        Z      0:00 [host] <defunct>
13387 ?        Z      0:00 [host] <defunct>
13401 ?        Z      0:00 [host] <defunct>
13740 ?        Z      0:00 [host] <defunct>
14618 ?        Z      0:00 [host] <defunct>
14654 ?        Z      0:00 [host] <defunct>
16716 ?        Z      0:00 [host] <defunct>
24317 ?        Z      0:00 [host] <defunct>
27014 ?        Z      0:00 [host] <defunct>
27027 ?        Z      0:00 [host] <defunct>
27065 ?        Z      0:00 [host] <defunct>
27103 ?        Z      0:00 [host] <defunct>
27172 ?        Z      0:00 [host] <defunct>
27227 ?        Z      0:00 [host] <defunct>
27237 ?        Z      0:00 [host] <defunct>
27276 ?        Z      0:00 [host] <defunct>
27283 ?        Z      0:00 [host] <defunct>
27288 ?        Z      0:00 [host] <defunct>
27346 ?        Z      0:00 [host] <defunct>
27383 ?        Z      0:00 [host] <defunct>
27403 ?        Z      0:00 [host] <defunct>
27442 ?        Z      0:00 [host] <defunct>
27460 ?        Z      0:00 [host] <defunct>
27466 ?        Z      0:00 [host] <defunct>
27510 ?        Z      0:00 [host] <defunct>
27529 ?        Z      0:00 [host] <defunct>
27535 ?        Z      0:00 [host] <defunct>
27623 ?        Z      0:00 [host] <defunct>
27633 ?        Z      0:00 [host] <defunct>
27638 ?        Z      0:00 [host] <defunct>
27679 ?        Z      0:00 [host] <defunct>
27712 ?        Z      0:00 [host] <defunct>
27726 ?        Z      0:00 [host] <defunct>
27744 ?        Z      0:00 [host] <defunct>
27749 ?        Z      0:00 [host] <defunct>
27765 ?        Z      0:00 [host] <defunct>
27772 ?        Z      0:00 [host] <defunct>
27789 ?        Z      0:00 [host] <defunct>
27794 ?        Z      0:00 [host] <defunct>
27799 ?        Z      0:00 [host] <defunct>
27806 ?        Z      0:00 [host] <defunct>
27833 ?        Z      0:00 [host] <defunct>
27844 ?        Z      0:00 [host] <defunct>
27850 ?        Z      0:00 [host] <defunct>
27862 ?        Z      0:00 [host] <defunct>
27867 ?        Z      0:00 [host] <defunct>
27872 ?        Z      0:00 [host] <defunct>
28099 ?        Z      0:00 [host] <defunct>
28118 ?        Z      0:00 [host] <defunct>
28123 ?        Z      0:00 [host] <defunct>
28303 ?        Z      0:00 [host] <defunct>
28315 ?        Z      0:00 [host] <defunct>
28326 ?        Z      0:00 [host] <defunct>
28336 ?        Z      0:00 [host] <defunct>
28341 ?        Z      0:00 [host] <defunct>
28346 ?        Z      0:00 [host] <defunct>
28376 ?        Z      0:00 [host] <defunct>
28389 ?        Z      0:00 [host] <defunct>
28438 ?        Z      0:00 [host] <defunct>
28582 ?        Z      0:00 [host] <defunct>
28596 ?        Z      0:00 [host] <defunct>
28628 ?        Z      0:00 [host] <defunct>
28647 ?        Z      0:00 [host] <defunct>
28665 ?        Z      0:00 [host] <defunct>
28670 ?        Z      0:00 [host] <defunct>
28688 ?        Z      0:00 [host] <defunct>
28693 ?        Z      0:00 [host] <defunct>
28698 ?        Z      0:00 [host] <defunct>
28708 ?        Z      0:00 [host] <defunct>
28713 ?        Z      0:00 [host] <defunct>
28745 ?        Z      0:00 [host] <defunct>
28750 ?        Z      0:00 [host] <defunct>
28755 ?        Z      0:00 [host] <defunct>
28760 ?        Z      0:00 [host] <defunct>
28780 ?        Z      0:00 [host] <defunct>
28790 ?        Z      0:00 [host] <defunct>
28795 ?        Z      0:00 [host] <defunct>
28811 ?        Z      0:00 [host] <defunct>
28821 ?        Z      0:00 [host] <defunct>
28828 ?        Z      0:00 [host] <defunct>
28839 ?        Z      0:00 [host] <defunct>
28849 ?        Z      0:00 [host] <defunct>
28866 ?        Z      0:00 [host] <defunct>
28892 ?        Z      0:00 [host] <defunct>
28924 ?        Z      0:00 [host] <defunct>
28929 ?        Z      0:00 [host] <defunct>
28934 ?        Z      0:00 [host] <defunct>
28944 ?        Z      0:00 [host] <defunct>
28949 ?        Z      0:00 [host] <defunct>
28959 ?        Z      0:00 [host] <defunct>
28965 ?        Z      0:00 [host] <defunct>
28970 ?        Z      0:00 [host] <defunct>
29273 ?        Z      0:00 [host] <defunct>
32238 ?        Z      0:00 [host] <defunct>

would you like to chime in with something more than an offhand remark, mr support tech guy?  ::)

Ben_S

Are there any log files (error or access logs) over 2GB?

Problems of this nature are pretty hard to track down when there is nothing in the logs to suggest what is causing it.

Trekkie101

Could it be the hostname lookups? I believe there was a post saying that the lookup kept going until it got an answer and didn't stop, because of an extra -W or something.

Try disabling hostname lookups in your admin panel. Does this help?

DracoBN

Ben - no logfiles in excess of even 100 meg here, I'm quite the nazi about keeping my logs rotated and cleaned.

I'll try disabling hostname lookups in SMF. My Apache install has the best practices applied for performance tuning, but nothing out of the ordinary. It's definitely related to SMF in some fashion, as this was never a problem with other software on the server (phpBB, and some older YabbSE installations). It didn't start happening until SMF was put into the mix.

I have tried various actions on the board to duplicate it, but as yet I've had no luck in determining which action on the board is leaving the zombies behind. There's nothing in any of my logs (access, error, smf, php error, syslogd) to indicate what the problem is, so I'm as stumped as anyone else.

I'll go clear my current queue of zombies, turn off hostname lookups in SMF, and see what that does.

DracoBN

Coming up on three hours, and zero zombies. By this time during peak traffic I should be sitting between 10-20 of them, so a definite improvement by turning off the hostname functionality.

I checked both nslookup and host on my system, and they appear to behave exactly as they should (i.e., they support the command line argument being passed), so I'm not sure why they would hang that way.

gethostbyaddr() shouldn't be causing the problem, since it's not exec'ing out to a shell, so that leaves one of the following as the culprit.

Subs.php, line 3458:
$test = @shell_exec('host -W 1 ' . @escapeshellarg($ip));

Subs.php, line 3471:
$test = @shell_exec('nslookup -timeout=1 ' . @escapeshellarg($ip));
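
Both calls above start an external program and depend on it finishing on its own. For comparison, here is a rough sketch in Python (not SMF's code, and not PHP) of running an external command with a hard deadline while always collecting the child's exit status. The helper name `run_with_deadline` is made up, and the two test commands are stand-ins for a fast and a hung lookup:

```python
import subprocess
import sys


def run_with_deadline(argv, seconds):
    """Run argv without a shell; return its stdout, or None on timeout or failure.

    subprocess.run() kills the child and waits for it when the timeout expires,
    so no <defunct> entry is left behind either way.
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout if result.returncode == 0 else None


# Stand-ins for a fast and a hung lookup (hypothetical; swap in something
# like ["host", "-W", "1", ip] if the host utility is installed).
fast = [sys.executable, "-c", "print('ok')"]
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_deadline(fast, 10))
print(run_with_deadline(hung, 0.5))
```

The point of the design is that the deadline is enforced by the caller rather than by the `-W` or `-timeout` flag of the program being run, which may behave differently between versions.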


brooks

can you direct me towards this option to turn off the hostname lookups?

I've been digging around in the admin panel for about 15 minutes, and haven't found a thing about it yet.

Thanks

DracoBN

Admin - Features and Options - Layout and Options Tab - Disable Hostname Lookups (near the bottom).

DracoBN

Coming up on 24 hours now, and zero zombies.

Should this be submitted as a bug?

DracoBN

OK, this is definitely the host -W line, but I think the actual problem occurs when a lookup runs long and the script ends up waiting for a response past the configured PHP script execution time.

I've set my script execution timer a little higher to see if it's timing out within the normal TCP configured 60 second window. Perhaps that will allow hostname lookups without causing the zombies to hang around.

As a question: why was the decision made to exec a 'host' or 'nslookup' call in the first place? exec() is an expensive operation, and of course always a security risk too. Given that gethostbyaddr() should be available in any PHP installation, and it uses the same methods as both of the exec'd programs, why not use it exclusively?
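
To illustrate the in-process alternative being asked about, here is a minimal sketch using Python's `socket.gethostbyaddr` as a stand-in for PHP's gethostbyaddr(). The function name `reverse_lookup` is invented for this example. One caveat worth stating: the call takes no timeout argument, so a slow resolver can still block the request, which may be part of why an external tool with a timeout flag was used.

```python
import ipaddress
import socket


def reverse_lookup(ip):
    """Return the hostname for ip, or ip itself if it is invalid or unresolvable."""
    try:
        ipaddress.ip_address(ip)  # reject anything that is not a literal address
    except ValueError:
        return ip
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return ip
    return hostname


print(reverse_lookup("127.0.0.1"))
```

No child process is created here, so there is nothing for the web server to reap afterwards.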

brooks

I'm not so sure about that.  I turned it off a few days ago, and I have a ton of them again.

Quote
6003 ?        Z      0:00 [host] <defunct>
12421 ?        Z      0:00 [host] <defunct>
15982 ?        Z      0:00 [host] <defunct>
22128 ?        Z      0:00 [host] <defunct>
---------------------<snip>------------------------
30667 ?        Z      0:00 [host] <defunct>
30687 ?        Z      0:00 [host] <defunct>
30698 ?        Z      0:00 [host] <defunct>
30705 ?        Z      0:00 [host] <defunct>
  534 ?        Z      0:00 [host] <defunct>
7670 ?        Z      0:00 [host] <defunct>
7777 ?        Z      0:00 [host] <defunct>
27447 ?        Z      0:00 [host] <defunct>
  510 ?        Z      0:00 [host] <defunct>
6985 ?        Z      0:00 [host] <defunct>

grand total of 284 in 3 days.  Little help here guys?

DracoBN

If you've turned it off, then that code is never called. I searched the entire codebase, and that's the only place it appears. Did you bounce httpd after changing the setting? (Zombies won't go away until you do.)

brooks

Absolutely, and I just did again.  I'll keep an eye on it and see if it happens again.

brooks

2 days, same issues.

Quote
14081 ?        Z      0:00 [host] <defunct>
17009 ?        Z      0:00 [host] <defunct>
19534 ?        Z      0:00 [host] <defunct>
19573 ?        Z      0:00 [host] <defunct>
19866 ?        Z      0:00 [host] <defunct>
20045 ?        Z      0:00 [host] <defunct>
20147 ?        Z      0:00 [host] <defunct>
20334 ?        Z      0:00 [host] <defunct>
20690 ?        Z      0:00 [host] <defunct>
20739 ?        Z      0:00 [host] <defunct>
21107 ?        Z      0:00 [host] <defunct>
21151 ?        Z      0:00 [host] <defunct>
21322 ?        Z      0:00 [host] <defunct>
21427 ?        Z      0:00 [host] <defunct>
21532 ?        Z      0:00 [host] <defunct>
26518 ?        Z      0:00 [host] <defunct>
29532 ?        Z      0:00 [host] <defunct>
4789 ?        Z      0:00 [host] <defunct>

edit: see screenshot from my admin panel here: hxxp:www.fsnhosting.com/004.jpg [nonactive]

DracoBN

I'd check installed mods, or other software then.

The base RC2 install only calls that code at the lines I documented above. You could always try commenting that code out to see whether something else is triggering it.

Sarge

Is this issue solved in recent versions of SMF? If so, from which version and up is it solved?

    Please do not PM me with support requests unless I invite you to.

http://www.zeriyt.com/   ~   http://www.galeriashqiptare.net/


Quote
<H> I had zero posts when I started posting

青山 素子

I hate to say things like this, but it isn't so much an SMF problem as Apache. Processes hang around as zombies until the parent process reads their exit status, at which point they disappear. For some reason Apache isn't doing this, so the zombies remain. (When you kill apache, the zombies are picked up by init, which closes them out).
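
The life cycle described above can be reproduced in a few lines. This is a Linux-only sketch in Python that reads the state letter from `/proc/<pid>/stat`; the helper names are made up for illustration:

```python
import subprocess
import sys
import time


def proc_state(pid):
    """Return the state letter from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Layout is "pid (comm) state ..."; split after the last ')' because
    # comm may itself contain spaces or parentheses.
    return data.rsplit(")", 1)[1].split()[0]


def zombie_demo():
    """Start a short-lived child and observe it before and after reaping."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + 10
    while proc_state(child.pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
    before = proc_state(child.pid)  # exited, exit status not yet read
    child.wait()                    # parent reads the exit status
    after = proc_state(child.pid)   # process table entry has been removed
    return before, after


print(zombie_demo())
```

Until `child.wait()` runs, the child sits in state Z, which is what `ps` and `top` show as `<defunct>`.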

I haven't had this happen, so I'm thinking it is related to possibly some PHP version or apache configuration item.
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


H

Sarge, did you try disabling hostname lookups as suggested?

http://docs.simplemachines.org/index.php?topic=105

The problem is caused by the "host" command. On some hosts the command acts differently due to the version being used.

I'm not sure what the current status of this is development-wise; however, I believe something was being looked at to work around this odd quirk.
-H
Former Support Team Lead
                              I recommend:
Namecheap (domains)
Fastmail (e-mail)
Linode (VPS)
                             

Sarge

Thanks to both of you for replying. Yes, I checked "Disable hostname lookups", restarted Apache and no zombies appeared anymore. Previously, we used to have 3,000+ zombies after about 2 days since last Apache restart. Our forum is based on SMF 1.1 RC3-1 (yes, I know, we have to upgrade).

Our issue is solved, thanks to support given in this topic, but I'd like to know: have people experienced this issue with SMF 1.1 Final? Or doesn't it matter which SMF version the forum is?

H

Quote
Our issue is solved, thanks to support given in this topic, but I'd like to know: have people experienced this issue with SMF 1.1 Final? Or doesn't it matter which SMF version the forum is?

As this is an issue with a tool outside of SMF, we can't directly correct it; however, I believe a workaround may be in the works for a future version.

I just remembered that I posted a temporary workaround which can be found here