[PATCH] check process death if ptrace() fails, v2

Wed Dec 10 12:33:39 UTC 2008

> On Tue, 2008-12-09 at 21:57 -0800, Roland McGrath wrote:
> > > >  but probably better instead to have a
> > > > check for a saved ptrace error in the main loop after the WIFSIGNALED and
> > > > WIFEXITED cases.  If we get further and there is a saved ptrace error, then
> > > > print the "<unavailable>" or perhaps something with the strerror string.
> > > 
> > > I'm not sure what are you referring to by "get further".
> > 
> > I mean when it didn't take the WIFSIGNALED or WIFEXITED cases (which do
> > "continue;") so it gets to:
> > 
> > 		if (!WIFSTOPPED(status)) {
> > 			fprintf(stderr, "PANIC: pid %u not stopped\n", pid);
> > 			droptcb(tcp);
> > 			continue;
> > 		}
> > 		if (debug)
> > 			fprintf(stderr, "pid %u stopped, [%s]\n",
> > 				pid, signame(WSTOPSIG(status)));
> 
> I am not sure there are guarantees this will necessarily happen.

I'm not sure what you mean here.  This is one of the three major paths
handling the possible kinds of wait result.  One of them will always happen.

For the scenarios we've been talking about, this one happening for a PID on
which we just experienced a ptrace error is not very likely.  But it's
possible in general.  (It's probably not possible if the error was ESRCH.)

> +extern long secure_ptrace P((int request, struct tcb *tcp, void *addr, void *data));
> +extern long secure_ptrace_SYSCALL P((struct tcb *tcp, int sig));
> +extern long secure_ptrace_CONT P((struct tcb *tcp, int sig));
> +extern long secure_ptrace_DETACH P((struct tcb *tcp, int sig));

These are not good names, and there is no reason to have several different
calls.  Just have one do_ptrace(), pass it the ptrace args and also a
string to use in case of error messages.  (This might not actually get
used, or maybe it gets stored for later.  But make a clean interface for
the function that gives one in case we want it.)  None of the callers
examine the return value, just success/failure; it doesn't need to return
long.  Just make it return a success indicator, int.

The new changes need more comments.  Comment formatting should always
follow the convention used in strace:
	/*
	 * Like this.
	 */

> --- strace.1/process.c	2008-12-04 19:16:29.000000000 +0100
> +++ strace.2/process.c	2008-12-09 19:00:00.000000000 +0100
> @@ -964,7 +964,8 @@ struct tcb *tcp;
>  
>  			tcpchild->flags &= ~(TCB_SUSPENDED|TCB_STARTUP);
>  			if (ptrace(PTRACE_SYSCALL, pid, (char *) 1, 0) < 0) {
> -				perror("resume: ptrace(PTRACE_SYSCALL, ...)");
> +				if (errno != ESRCH) /* if not killed meanwhile */
> +					perror("resume: ptrace(PTRACE_SYSCALL, ...)");
>  				return -1;
>  			}

This doesn't need a special case.  Just use do_ptrace on tcpchild.

> -	if (ptrace(PTRACE_SYSCALL, tcp->pid, (char *) 1, 0) < 0) {
> -		perror("resume: ptrace(PTRACE_SYSCALL, ...)");
> +	if (secure_ptrace_SYSCALL(tcp, 0) < 0) {
>  		return -1;
>  	}

When the body of an if is just one line, remove the braces.

> @@ -1491,12 +1491,13 @@ int sig;
>  	 * detached process would be left stopped (process state T).
>  	 */
>  	catch_sigstop = (tcp->flags & TCB_STARTUP);
> -	if ((error = ptrace(PTRACE_DETACH, tcp->pid, (char *) 1, sig)) == 0) {
> +	error = secure_ptrace_DETACH(tcp, sig);
> +	if (error == 0) {
>  		/* On a clear day, you can see forever. */
>  	}

You probably don't want to mess with the detach() logic at all.  It's not
at all like the uses of ptrace for getting registers et al.  It has
specific touchy error case handling you don't want to meddle with.  This is
not a place where the error-resilience we're trying to add makes any sense.

> @@ -2175,8 +2168,7 @@ handle_group_exit(struct tcb *tcp, int s
>  		  	if (leader != NULL && leader != tcp)
>  				leader->flags |= TCB_GROUP_EXITING;
>  		}
> -		else if (ptrace(PTRACE_CONT, tcp->pid, (char *) 1, sig) < 0) {
> -			perror("strace: ptrace(PTRACE_CONT, ...)");
> +		else if (secure_ptrace_CONT(tcp, sig) < 0) {
>  			cleanup();
>  			return -1;
>  		}

This shouldn't call cleanup() any more.  The purpose of storing of ptrace
errors is that we can deal with a failure of just this PID without bailing
out entirely.  The same goes for other places you touched similarly.

>  	tracing:
> -		if (ptrace(PTRACE_SYSCALL, pid, (char *) 1, 0) < 0) {
> -			perror("trace: ptrace(PTRACE_SYSCALL, ...)");
> +		/* ESRCH: task is dead or not stopped, we will handle it
> +		 * at waitpid. Otherwise we are confused and bail out. */
> +		if (secure_ptrace_SYSCALL(tcp, 0) < 0 && errno != ESRCH) {
>  			cleanup();
>  			return -1;

The ESRCH checks scattered around are not a good idea.  Probably the right
way to deal with this is to pass do_ptrace() a flag saying whether a
miscellaneous failure is tolerable.  In cases that don't affect control
flow (PEEKUSR et al), it always is.  In places like this, perhaps it's not
and then do_ptrace can call cleanup().  It's arguable whether we really
should bail out entirely for any of these, but probably we should stick
with it since it could indicate confusion in strace or kernel bad enough
that we really can't recover.

> @@ -2575,6 +2563,11 @@ struct tcb *tcp;
>  	if (tcp_last && (!outfname || followfork < 2 || tcp_last == tcp)) {
>  		tcp_last->flags |= TCB_REPRINT;
>  		tprintf(" <unfinished ...>\n");
> +	} else if (tcp_last->ptrace_errno) {
> +		if (tcp_last->flags & TCB_INSYSCALL)
> +			tprintf(" <unavailable>) =");
> +		tprintf(" ? <unavailable>\n");
> +		tcp_last->ptrace_errno = 0;

Printing a needed ")" must follow the example in trace_syscall:

		tprintf(") ");
		tabto(acolumn);

before the "= ..." part.

But anyway, this is not quite the logic needed.  As you can see in
trace_syscall, there is never a ")" printed that's not followed by a "= ...".
So it can't be right to conditionalize the ") =" and not the "...".
That just doesn't make sense.

Let's think through all the cases where there might have been a stored
ptrace error.

1. Error inside get_scno/syscall_fixup/syscall_enter.  That is, PID1 had a
   syscall-entry stop, then ptrace failed to read the registers to get the
   syscall number and args.

   We should fix trace_syscall so that it sets TCB_INSYSCALL for this case,
   making it consistent that it's always set after the entry stop was seen
   no matter what.  Otherwise recovering from this case would leave the
   state confused.

   Since we might not even have got the syscall number, we don't have any
   info to print.  In this event, we should print a placeholder to make it
   consistent, e.g.:
	???(<unavailable>
   That case would do the printing-state bookkeeping from the normal case:
	printleader(tcp);
	tcp->flags &= ~TCB_REPRINT;
	tcp_last = tcp;

   It's tempting to delay this message so that we suppress it entirely when
   it turns out the process just died by signal, so we can just print that.
   But OTOH, it is being correctly informative that there was a syscall
   entry attempted (even if it didn't get far enough to actually run the
   syscall).  And in case it's some other oddity, then it could be an
   arbitrary time before PID1 reports again, and then it would seem
   wrong not to report quickly (and with the right timestamp, when
   reporting those) that there was this mysterious syscall entry.

   Printing the "???(" here means we've reported the error, so if we do
   that eagerly then it should not leave a stored ptrace error to be seen
   later.  Once it's printed, it's just as if a normal entry of an unknown
   syscall number happened (and called printargs).

   If we decide instead to delay the message, then we'd have a tcb for PID1
   with TCB_INSYSCALL set and ptrace_error nonzero.  After that the
   possibilities are:

   a. Some different PID2 reports next instead, prints " <unfinished>".
      Here we would want to suppress the "<unfinished>" too if we're
      delaying printing the beginning of the line before it (i.e.,
      we haven't printed anything at all on this line yet anyway).
      Consider afresh next time we see PID1 again.
   b. PID1 dies.  Before "+++ killed ...", we do printleader.
      (We'd need to tell printleader it's the death case, either by
      an arg or a TCB_* flag.)  Here because it's a death report,
      we ignore the stored ptrace error.
   c. PID1 gets a non-syscall signal stop (not SIGTRAP).
      This can't actually happen in Linux.  It would mean the ptrace
      report ordering is confused.
   d. PID1 gets the corresponding syscall-exit stop.
      Now we'd get to the TCB_REPRINT check in trace_syscall.
      TCB_REPRINT is set if we had an (a) case and came back around later;
      or perhaps it isn't, we probably shouldn't set it in the (a) code if
      we didn't print the beginning of the line at all in the first place.
      When TCB_REPRINT is clear with the saved ptrace error, we haven't
      printed anything at all yet because we suppressed the "???(".
      So now we just print the "???(" before the normal ")".

      If get_error fails too when we had a saved ptrace error from
      syscall_enter, we probably don't want to print anything at all now.

2. Error inside sys_* with TCB_INSYSCALL clear.  That is, PID1 had a
   syscall-entry stop, succeeded in getting its arguments, but got a ptrace
   error in decoding some arguments.  This is usually handled in the sys_*
   directly by printing an address instead of decoding the argument.
   So this should probably be outside the stored ptrace error cases
   entirely.  If all these are umove*, then it's already separate
   and fine as it is.

3. Error inside get_scno/syscall_fixup/get_error.  That is, PID1 had a
   syscall-exit stop, then ptrace failed to read the return value registers.
   (This is I think the only case you've reported seeing in testing.)

   Likewise we need to be sure to clear TCB_INSYSCALL in these cases.

   If PID1 already has a saved ptrace error, this is the 1(d) case.

   Since we don't know the return value, we want to wind up with a line
   that ends in "= ? <unavailable>".  I think we can assume that if we get
   an error here, there is no result-param decoding we want to bother with.
   It's unlikely we can read memory after failing to read registers.

   When we have printed the start of the call already, we must do the
   TCB_REPRINT logic.  After that either the line looks like:
	"[pid nnn] <... foobar resumed> "
   or:
	"[pid nnn] foobar(1, 2"
   (If we haven't printed anything already, this is 1(d).)

   To match that line we need an ending, so we can immediately print:
   	" <...>) "
   in place of the normal sys_* printing plus ") ".  This indicates that we
   might have printed more result-param info (which is the case for some
   syscalls and not others).  After that, tabto and "= ? <unavailable>"
   matches the output style for other cases, and indicates that the return
   value is unknown.

4. Error inside sys_* with TCB_INSYSCALL set.  That is, PID1 had a
   syscall-exit stop, succeeded in getting the return value, but got a
   ptrace error in decoding some result params.  This is like #2 and
   probably needs no change.

I may not have covered every angle correctly here.  But I hope I've pointed
you in the direction of running down all the potential cases in your mind
and seeing what the series of code paths would be.

Thanks,
Roland