Questions about internal_wait function

Fri Sep 17 19:39:37 UTC 2010

> Comments in the source code say that the real parent will get ECHILD while
> its children are ptraced, so I think some early kernels do not handle this
> well. The default kernel in RHEL 3.9 seems to handle it correctly, but I
> doubt that all 2.4 kernels do.

RHEL3 is not really 2.4, it is a strange hybrid in this area (NPTL backport).
Looking at vanilla 2.4.21 sources, I think it will have this problem too.

> So I find it a little difficult to make this better. Actually, strace can
> trace wait calls properly on 2.6 kernels without internal_wait, while some
> (but not all) 2.4 or older kernels need it. I think it would take a lot of
> work to make internal_wait work better with pgid and these wait options,
> which may be meaningless on 2.6.

I agree.  I don't think it's worth trying to make the internal_wait logic
handle all possible wait* arguments correctly.

Offhand, I don't have any particularly clever ideas for deducing when you
can or can't rely on the kernel to do the right thing without internal_wait.
Since we have never tried to be perfect for all wait* arguments before, it
wouldn't be the worst thing to just keep the old behavior when uname reports
a kernel older than 2.6, though in general I am against version number
checks.  The only thing I have thought of so far is a brute-force check at
startup time, where we spawn a traced child that spawns a child, just to see
whether the parent tracee gets the ECHILD error behavior.
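
A rough sketch of what such a startup probe might look like, only to make
the idea concrete.  The function name probe_parent_gets_echild and its
return convention are made up for illustration and are not existing strace
code.  It assumes PTRACE_ATTACH on the grandchild is permitted, and it uses
waitpid with WNOHANG in the parent tracee so the probe cannot block:

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of strace).
 * Returns 1 if a traced parent gets ECHILD when waiting for its own child
 * while we trace that child, 0 if the kernel handles the wait itself, and
 * -1 if the probe could not be carried out.
 */
int probe_parent_gets_echild(void)
{
    int fds[2], status, result = -1, ready, alive;
    pid_t parent, child = -1, w;

    if (pipe(fds) < 0)
        return -1;
    parent = fork();
    if (parent < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (parent == 0) {
        /* Parent tracee: traced by the prober from the start. */
        close(fds[0]);
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(2);
        child = fork();
        if (child < 0)
            _exit(2);
        if (child == 0) {
            /* Grandchild: idle until the prober kills it. */
            close(fds[1]);
            for (;;)
                pause();
        }
        /* Tell the prober which pid to attach to, then stop and wait. */
        if (write(fds[1], &child, sizeof child) != (ssize_t) sizeof child) {
            kill(child, SIGKILL);
            _exit(2);
        }
        raise(SIGSTOP);
        /* Resumed: the grandchild is now traced by the prober. */
        _exit(waitpid(child, NULL, WNOHANG) < 0 && errno == ECHILD ? 1 : 0);
    }

    /* Prober side. */
    close(fds[1]);
    w = waitpid(parent, &status, 0);
    ready = w == parent && WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP;
    alive = w != parent || WIFSTOPPED(status);

    if (ready
        && read(fds[0], &child, sizeof child) == (ssize_t) sizeof child
        && ptrace(PTRACE_ATTACH, child, NULL, NULL) == 0) {
        waitpid(child, NULL, 0);                 /* consume the attach stop */
        ptrace(PTRACE_CONT, parent, NULL, NULL); /* let the parent call wait */
        while (waitpid(parent, &status, 0) == parent) {
            if (WIFSTOPPED(status)) {
                /* Unrelated stop: resume without delivering the signal. */
                ptrace(PTRACE_CONT, parent, NULL, NULL);
                continue;
            }
            alive = 0;
            if (WIFEXITED(status) && WEXITSTATUS(status) <= 1)
                result = WEXITSTATUS(status);
            break;
        }
    }

    /* Clean up whatever is left of the two test processes. */
    close(fds[0]);
    if (child > 0) {
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
    }
    if (alive) {
        kill(parent, SIGKILL);
        waitpid(parent, NULL, 0);
    }
    return result;
}
```

A caller could run this once at startup and keep the internal_wait path only
when it returns 1.  What to do on -1 is an open choice; falling back to the
old behavior seems the conservative option.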

Thanks,
Roland