strace cleanup() problems

Wed Jan 20 20:35:38 UTC 2010

Hi.

We found some problems with strace's cleanup() logic during the testing,
see below. Please bear in mind I am not familiar with strace's code ;)

------------------------------------------------------------------------------
strace.c:trace()

	  2470                          if (tcp->flags & TCB_BPTSET) {
	  2471                                  /*
	  2472                                   * One example is a breakpoint inherited from
	  2473                                   * parent through fork ().
	  2474                                   */
	  2475                                  if (clearbpt(tcp) < 0) /* Pretty fatal */ {
	  2476                                          droptcb(tcp);
	  2477                                          cleanup();
	  2478                                          return -1;
	  2479                                  }
	  2480                          }

Afaics, it is not right to do cleanup() if clearbpt() fails. Yes, the
"Pretty fatal" comment is correct, but we can have other processes to
continue tracing.

IOW, suppose "strace -c" traces the process P which fork's child C.
Now droptcb(C) can fail just because C is SIGKILLED and therefore
PTRACE_GETREGS/PTRACE_SETREGS fails. strace shouldn't "panic" in
this case imho, we should ignore ESRCH like ptrace_restart() does.

Actually, I don't really understand what cleanup() buys us if it is
called when something goes wrong. Can't strace just exit and rely on
implicit detach in kernel? Except cleanup() should probably send
CONT/TERM before exiting. Yes, we can leave the tracee in the "bad"
state, but I do not see how cleanup() can help. For example, it does
not try to clearbpt() if TCB_BPTSET.

In any case, cleanup() itself have problems which can lead to the
hang, please see below.

------------------------------------------------------------------------------
detach() assumes that if TCB_STARTUP is set, the tracee must have
the pending SIGSTOP queued by attach and thus my_tgkill(SIGSTOP) is
not needed.

This is mostly correct, but if cleanup() is the caller we can't trust
TCB_STARTUP.

	cleanup:

			if (tcp->flags & TCB_ATTACHED)
				detach(tcp, 0);
			else {
				kill(tcp->pid, SIGCONT);
				kill(tcp->pid, SIGTERM);
			}

If cleanup() finds a TCB_ATTACHED sub-thread before TCB_STARTUP
sub-thread X it sends SIGCONT to the thread group. This steals the
pending SIGSTOP and wakes up the TCB_STARTUP tracee X. After that
detach()->wait4(X) can hang "forever" until X exits or stops. For
example, X can sleep in sys_pause() and strace hangs in wait4(X).

Probably TCB_ATTACHED should be per-process, not per thread. Or
perhaps cleanup() should be changed to detach() sub-threads first,
and only then the main thread which has TCB_ATTACHED.

------------------------------------------------------------------------------
There is another reason why I think cleanup() should detach sub
threads first. The main thread can be zombie, in this case
my_tgkill(SIGSTOP) can't help and wait4() can't return unless
we detach and reap all sub-threads.

------------------------------------------------------------------------------
I can't understand why detach()->wait4() can't hang if TCB_STARTUP
is set and the previous ptrace(PTRACE_DETACH) succeeded.

------------------------------------------------------------------------------
Finally, the exact scenario which we had during the testing.

To simplify, the tracee does:

	fork();

	// both parent and child run he code below

	for (0 .. NUM_THREADS)
		pthread_create();

	raise(SIGFPE);

it runs under "strace -f". We have 2 thread groups, G1 and G2.

G2's main thread does raise(SIGFPE), this kills (the kernel sends
SIGKILL to all threads) a thread T in G2 and clearbpt(T) fails.
strace calls cleanup().

cleanup() tries to detach() all traced theads, from G1 and G2.
It detaches some of them, then it sees G1's main thread with
TCB_ATTACHED set and sends SIGCONT to G1.

This wakes up a TCB_STARTUP thread X from G1, X resumes and sleeps
in do_futex().

strace does detach(X) and hangs in wait4(X) (remember, detach()
trusts TCB_STARTUP and doesn't send SIGSTOP).

Now, wait4(X) hangs forever because X waits for FUTEX_WAKE from
another thread in this thread group, and that thread sleep in
TASK_TRACED.

Oleg.