[PATCH] Try to use PTRACE_SETOPTIONS to handle non-ptrace SIGTRAP

Thu Mar 4 22:50:46 UTC 2010

If you post replacement patches in quick succession, it would be nice to
say something about what changed from one day to the next.  Also, it's a
bit odd to be posting patches without any introductory explanation at all.
In this case, it's tackling a problem we've been talking about for a while,
so I can fill in OK.  But still it looks odd to see just patches fired off
without any of the usual banter.

> --- a/process.c
> +++ b/process.c
> @@ -866,6 +866,9 @@ internal_fork(struct tcb *tcp)
>  				clearbpt(tcpchild);
>  
>  			tcpchild->flags &= ~(TCB_SUSPENDED|TCB_STARTUP);
> +#ifdef TCB_PTRACE_OPTIONS
> +			tcpchild->flags |= tcp->flags & TCB_PTRACE_OPTIONS;
> +#endif
>  			if (ptrace_restart(PTRACE_SYSCALL, tcpchild, 0) < 0)
>  				return -1;
>  

I think I understand the logic behind this flag, but it merits some
comments in the code.  The idea is that when you first attach, you won't
have been able to do PTRACE_SETOPTIONS yet because the tracee wasn't yet
stopped so ptrace calls could have worked.  But in case that tracee hits an
execve right after attach, it will get the old-fashioned ptrace SIGTRAP for
that execve before it gets the SIGSTOP that PTRACE_ATTACH sent.  (It could
also get some normal non-tracing signal before the SIGSTOP, but that
doesn't matter to strace.  It can't get a syscall stop first, because you
haven't had the chance to do PTRACE_SYSCALL yet.)  Is that right?  Is there
any other case than that execve race that you are handling by having the
option of a tracee that doesn't have TCB_PTRACE_OPTIONS set?

If that's the only case, then is this really buying you anything over just
using TCB_STARTUP?

> @@ -1732,9 +1735,10 @@ struct tcb *tcp;
>  		fixvfork(tcp);
>  #endif /* SUNOS4 */
>  #if defined LINUX && defined TCB_WAITEXECVE
> -	if (exiting(tcp) && syserror(tcp))
> -		tcp->flags &= ~TCB_WAITEXECVE;
> -	else
> +	if (exiting(tcp)) {
> +		if (syserror(tcp))
> +			tcp->flags &= ~TCB_WAITEXECVE;
> +	} else
>  		tcp->flags |= TCB_WAITEXECVE;
>  #endif /* LINUX && TCB_WAITEXECVE */
>  	return 0;

Hmm.  This always needed a comment about the logic, and the change
certainly merits adding a full comment here now.  I don't really understand
why we shouldn't (always have) just set TCB_WAITEXECVE only in the (exiting
&& !syserror) case.  If you trace an execve, you're going to get its
syscall exit stop before you get its old-style traced-exec SIGTRAP.  So why
set it on entry and clear it on error exit, instead of only ever setting it
for success exit?

> +static int probe_ptrace_setoptions = PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC;

I think I understand the use of these bit masks.  But the logic is not
really intuitive, and even hard to convince oneself is entirely right.
Please give it a thorough comment about the use of this variable.
Also, it could as well be static inside is_ptrace_stop() where its
definition would be right next to the logic that uses it.

> +	/*
> +	 * Ask kernel to set signo to SIGTRAP | 0x80
> +	 * on ptrace-generated SIGTRAPs, and mark
> +	 * execve's SIGTRAP with PTRACE_EVENT_EXEC.
> +	 */

I would note the execve effect a bit differently here.  The most important
reason to use PTRACE_O_TRACEEXEC is that it avoids actual generation of any
signal altogether.  That's the whole reason it solves the problem of
SIGTRAP being blocked in the program.  So it does not do the motivating
subtlety justice to say "mark it with ..." (for the syscall case either, in
fact).  What's important is that we get non-signal ptrace stops here and
can identify them as such.

> +	if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
> +		/* PTRACE_O_TRACESYSGOOD works. */
> +		probe_ptrace_setoptions &= PTRACE_O_TRACESYSGOOD;

So this means that the definition is "bits clear might not work"?  Except
it's not really that, because the starting state is both bits set when both
are of unknown status.  So you clear the other guy's bit because he might
still not work, but you do?  It's pretty nonobvious.  It would make plenty
of sense if you were using &= ~ here instead.

> +	if ((WSTOPSIG(status) == SIGTRAP && probe_ptrace_setoptions))

Doubled parens.  Anyway I think it would be clearest just to write:

	return WSTOPSIG(status) == SIGTRAP && probe_ptrace_setoptions;

But, for exec doesn't it make sense to call it a possible ptrace stop only
if TCB_STARTUP?  In fact, that is already covered by your TCB_PTRACE_OPTIONS
check at the top.  So the only way this can be a ptrace stop is if
PTRACE_O_TRACESYSGOOD is not yet known to work.

Thanks,
Roland