[PATCH] do not suspend waitpid under strace

Oleg Nesterov oleg at redhat.com
Tue Jun 7 13:43:33 UTC 2011


On 06/07, Denys Vlasenko wrote:
>
> strace suspends waitpid until there is a child
> for the waitpid'ing process to collect status from.
> Apparently, this is done because some very old kernels
> had ptrace bugs which made waitpid
> fail to see children. Comment in strace code:
>
>                /* There are children that this parent should block for.
>                   But ptrace made us the parent of the traced children
>                   and the real parent will get ECHILD from the wait call.
>                   ...

I am not sure I understand which bug/problem this comment describes...
OK, looking at the patch below, I guess it means that wait() etc. called
by the _tracee_ can fail with -ECHILD because it is ptraced, correct?
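
To make the expected behaviour concrete, here is a minimal sketch of what a
process should see from waitpid(), traced or not. The function names are
made up for illustration and the program does not use ptrace itself; the
idea is only that, on a kernel without the bug, the results should be the
same when it runs under "strace -f".

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just sleeps in pause(), poll it
 * with WNOHANG, then kill and reap it. Returns the result of the WNOHANG
 * poll: 0 means "you have children, but none has changed state yet",
 * -1 means waitpid() failed, -2 means fork() failed. */
int poll_live_child(void)
{
	int status;
	pid_t polled;
	pid_t pid = fork();

	if (pid < 0)
		return -2;
	if (pid == 0) {
		pause();	/* stay alive until the parent kills us */
		_exit(0);
	}

	polled = waitpid(pid, &status, WNOHANG);

	/* Clean up so that no child is left behind. */
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);

	return (int)polled;
}

/* Hypothetical helper: returns the errno that waitpid() reports when the
 * caller has no children at all, or 0 if waitpid() did not fail. */
int wait_errno_without_children(void)
{
	int status;

	if (waitpid(-1, &status, WNOHANG) == -1)
		return errno;
	return 0;
}

/* Aborts if either expectation does not hold. */
void check_wait_behaviour(void)
{
	assert(poll_live_child() == 0);
	assert(wait_errno_without_children() == ECHILD);
}
```

Calling check_wait_behaviour() from main() and running the result both
directly and under "strace -f" is one way to compare the two cases. ECHILD
is legitimate only in the second helper, where there really are no children.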

> It is definitely fixed in 2.6.x.

Yes. And even _if_ it were not fixed, strace should not try to work around
obvious kernel bugs; the kernel should be fixed.

> Oleg, can you tell approximately how many years ago it was fixed?

Oh, I can't say. I can't even recall whether I ever knew about something
like this. There were sooo many problems in this area. I am looking at
2.6.12 now, and I do not see how this code can return ECHILD in this
case. At least it shouldn't unless it has other bugs.

Well. _Maybe_ this was fixed by


	commit 1edfa64279794d193f64339fc97d49d858824588
	Author: Ingo Molnar <mingo at elte.hu>
	Date:   Mon Aug 19 18:15:30 2002 -0700

	    [PATCH] O(1) sys_exit(), threading, scalable-exit-2.5.31-A6

	    This fixes the ptrace wait4() anomaly that can be observed in any
	    previous Linux kernel i could get my hands at.

	    If the parent still has other children (that are being traced by
	    somebody), we wait for them or return immediately without an error in
	    case of WNOHANG.

	diff --git a/kernel/exit.c b/kernel/exit.c
	index f2390db..6526b6b 100644
	--- a/kernel/exit.c
	+++ b/kernel/exit.c
	@@ -731,7 +731,7 @@ repeat:
			tsk = next_thread(tsk);
		} while (tsk != current);
		read_unlock(&tasklist_lock);
	-	if (flag) {
	+	if (flag || !list_empty(&current->ptrace_children)) {
			retval = 0;
			if (options & WNOHANG)
				goto end_wait4;

The patch doesn't look right at first glance, but I guess it is too late
to argue ;) However, it looks like this patch tries to fix the problem
under discussion.
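
As I read the hunk, the decision it changes can be modelled in userspace
roughly as follows. This is a simplified sketch, not kernel code: the enum,
the function and its parameters are invented for illustration, and the
assumption that the fall-through case returns -ECHILD comes from the
problem being discussed, not from the quoted hunk itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the wait4() decision after the child scan. */
enum wait_outcome {
	WAIT_ECHILD,		/* fail with -ECHILD */
	WAIT_RETURN_ZERO,	/* WNOHANG: return 0 immediately */
	WAIT_BLOCK		/* sleep and rescan later */
};

/* flag:                the scan found at least one eligible child
 * has_ptrace_children: models !list_empty(&current->ptrace_children)
 * wnohang:             models (options & WNOHANG)
 * patched:             false = old "if (flag)" test, true = new test */
enum wait_outcome model_wait_outcome(bool flag, bool has_ptrace_children,
				     bool wnohang, bool patched)
{
	bool have_candidates = flag || (patched && has_ptrace_children);

	if (!have_candidates)
		return WAIT_ECHILD;	/* assumed fall-through result */
	return wnohang ? WAIT_RETURN_ZERO : WAIT_BLOCK;
}

/* Aborts if the model disagrees with the behaviour described above. */
void model_self_check(void)
{
	/* Children exist, but all are traced by somebody else. */
	assert(model_wait_outcome(false, true, true, false) == WAIT_ECHILD);
	assert(model_wait_outcome(false, true, true, true) == WAIT_RETURN_ZERO);
	assert(model_wait_outcome(false, true, false, true) == WAIT_BLOCK);
}
```

The only case where the two variants differ is flag == false with a
non-empty ptrace_children list, which is exactly the "real parent gets
ECHILD" situation from the strace comment.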

In short, I agree with Denys.

Oleg.




