Questions about internal_wait function

Wang Chao wang.chao at cn.fujitsu.com
Fri Sep 17 09:38:08 UTC 2010


Sent on 2010-9-10 16:31, Roland McGrath wrote:
>> Moreover, if I got it all right, in old kernels such as the 2.4 series,
>> a wait call will lose its children who are ptraced. I guess that's
>> why strace has the function internal_wait. But in 2.6 series kernels,
>> I think a wait call will block if its children are currently ptraced
>> and strace does not need to do any additional work here. Doing so
>> in 2.6 will make things much more complex if the wait call uses a pgid
>> or options such as __WCLONE. I'm not very sure about this.
> 
> I think even 2.4 might do that better, but I'm not really sure.  There are
> a lot of versions and half-backport variants out there.  (RHEL3's kernel is
> based on 2.4.21, but has many, many changes, and does this right.)  I think
> it's certainly true that all 2.6 kernels correctly both leave a
> ptrace-waited zombie for the real parent to wait on and don't let a real
> parent see ECHILD while ptrace has stolen the child.  For them, there is
> probably no need for the internal_wait logic.
> 

Currently internal_wait only matches the pid and -1 cases and does not
handle a wait call that specifies a process group ID. It also ignores
Linux-only options such as __WCLONE and __WNOTHREAD. I have already
run into problems tracing such wait calls, where strace does not
work properly. However, as you said above, for 2.6 kernels there may be
no need for the internal_wait logic.
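
To make the gap concrete, here is a rough sketch of how the pid argument
of waitpid(2) selects children. The enum and both function names are made
up for illustration and are not taken from the strace source; the second
helper only models "matches pid and -1", not what internal_wait really does.

```c
#include <sys/types.h>

/* The four ways the pid argument of waitpid(2) can select children. */
enum wait_target {
    WAIT_ANY_CHILD,     /* pid == -1: any child */
    WAIT_OWN_PGRP,      /* pid == 0:  any child in the caller's process group */
    WAIT_GIVEN_PGRP,    /* pid < -1:  any child in process group -pid */
    WAIT_SINGLE_CHILD   /* pid > 0:   exactly that child */
};

/* Hypothetical helper: classify a waitpid() pid argument. */
static enum wait_target classify_wait_target(pid_t pid)
{
    if (pid == -1)
        return WAIT_ANY_CHILD;
    if (pid == 0)
        return WAIT_OWN_PGRP;
    if (pid < -1)
        return WAIT_GIVEN_PGRP;
    return WAIT_SINGLE_CHILD;
}

/*
 * Hypothetical helper: returns 1 if a matcher that only understands
 * "pid" and "-1" would cover this request, 0 for the process-group forms.
 */
static int covered_by_pid_only_matching(pid_t pid)
{
    enum wait_target t = classify_wait_target(pid);

    return t == WAIT_ANY_CHILD || t == WAIT_SINGLE_CHILD;
}
```

The two process-group forms (pid == 0 and pid < -1) are the ones left
uncovered, and options such as __WCLONE narrow the set of eligible
children further on top of that.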

Comments in the source code say that the real parent will get ECHILD while
its children are ptraced, so I think some early kernels do not handle this
well. The default kernel in RHEL 3.9 seems to handle it correctly, but I
doubt that all 2.4 kernels do.
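
For reference, this is the error a parent sees when the kernel thinks it
has no children left to wait for. The sketch below only shows the plain
ECHILD case with no ptrace involved; the function name is made up for
illustration, and it does not reproduce the old-kernel behaviour itself.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: returns 1 if a non-blocking wait for any child
 * fails with ECHILD, i.e. the kernel reports no waitable children for
 * the calling process. Returns 0 otherwise.
 */
static int wait_reports_no_children(void)
{
    int status;
    pid_t ret;

    /* Retry if the call is interrupted by a signal. */
    do {
        ret = waitpid(-1, &status, WNOHANG);
    } while (ret == -1 && errno == EINTR);

    return ret == -1 && errno == ECHILD;
}
```

On a kernel with the problem described in the comments, a real parent
could get this same result even though its child still exists and is
merely being traced.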

So I find it a little difficult to improve this. strace can trace wait
calls properly on 2.6 kernels without internal_wait, while some (but not
all) 2.4 or older kernels need it. I think it will take a lot of work to
make internal_wait handle pgid and these wait options, and that work may
be meaningless on 2.6.

I hope I have expressed my opinion clearly.

Thanks,
Wang Chao




