[PATCH 2/2] Trace fork series calls using PTRACE_SETOPTIONS on linux

Tue Sep 14 02:54:06 UTC 2010

> Yeah, this method is a good way to keep the compatibility with old kernels
> and I'm working on the patch now.

Great!  Thanks for working on this and taking care to do it right.

> > For the new Linux setbpt code that just changes registers instead of doing
> > breakpoints, I think we could make that support proper vfork too.  Right
> > now, it turns a vfork syscall into a clone syscall just like a fork syscall.
> > We could make that CLONE_PTRACE | CLONE_VFORK | CLONE_VM | SIGCHLD.
> > We might have to make sure that more registers get restored to their
> > original values, but perhaps that is already done OK.
> 
> I doubt it is that easy. With these flags set, I think strace and the
> program it traces won't finish normally, since strace will now block the
> new child when it returns from a fork-family call first, until its creator
> returns from that syscall. This conflicts with the logic of vfork.

You are referring to the:
				tcp = alloctcb(pid);
				tcp->flags |= TCB_ATTACHED | TCB_SUSPENDED;
case in the main tracing loop.  When an unexpected new child is seen, we
record it and leave it stopped.  Only when we see the parent's call succeed
do we know the right details (old register contents) to let the child
continue unmolested.  Indeed, that is true.  So I was wrong and in fact we
cannot make the current setbpt method work with CLONE_VFORK.  That's fine.

> Actually, when the new child returns from a fork-family syscall first,
> strace cannot easily get its parent's pid and determine the parent
> relationship, and thus cannot do the clearbpt work. I guess that's why
> strace chooses to block the new child if it returns first. Maybe
> /proc/<pid>/stat could help us get its parent id, but I'm not sure this
> method is portable. Moreover, the new child's parent is not always the
> creator that did the setbpt work, for example, a clone call with the
> CLONE_PARENT flag.

All that's true but it doesn't really constrain us.  That is, the one and
only thing it really makes hard to do is CLONE_PTRACE|CLONE_VFORK without
having PTRACE_O_TRACEVFORK.  So we just won't try to do that.

For non-vforks with the current method, we already handle this fine.
We just don't need to know about the new child right away, because the
parent will finish its clone syscall soon enough and then we'll be
able to match up the early-reporting child with its parent.  If we
have PTRACE_O_TRACE* working, then we also don't have a problem.  Even
for vfork, the PTRACE_EVENT_* report from the parent comes right after
creating the child (racing with its startup and its reporting of its
initial SIGSTOP), before the vfork wait begins.  So the current logic
that leaves the child stopped works fine: we see the parent report
PTRACE_EVENT_{CLONE,VFORK} and do PTRACE_GETEVENTMSG to see the child
pid, complete the bookkeeping, and resume the child.  When we resume
the parent again, then it will wait for the vfork child to finish
before it reports the syscall exit.
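
To make that sequence concrete, here is a rough sketch of the decoding
step.  It is not strace's actual code: the helper names ptrace_event_of
and new_child_from_event are made up for illustration, and it assumes
the usual Linux encoding where the event code sits above the stop
signal in the wait status.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: return the PTRACE_EVENT_* code carried in a
 * wait status, or 0 if the status is not a ptrace event stop.
 */
static int ptrace_event_of(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}

/*
 * Hypothetical helper: if the parent's stop is a fork/vfork/clone
 * event, ask the kernel for the new child's pid.  Returns the pid,
 * 0 if the stop is some other kind, or -1 if PTRACE_GETEVENTMSG fails.
 */
static pid_t new_child_from_event(pid_t parent, int status)
{
    int event = ptrace_event_of(status);
    unsigned long msg = 0;

    if (event != PTRACE_EVENT_FORK &&
        event != PTRACE_EVENT_VFORK &&
        event != PTRACE_EVENT_CLONE)
        return 0;
    if (ptrace(PTRACE_GETEVENTMSG, parent, NULL, &msg) < 0)
        return -1;
    return (pid_t)msg;
}
```

With the pid in hand, the tracer can look up the tcb that was left
stopped for the early-reporting child, finish the bookkeeping, and
resume it.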

But we do have real trouble for the first vfork call made before we
know whether the kernel supports PTRACE_O_TRACEVFORK properly.  The
safe assumption is the old behavior, where we turn it into a fork and
then give it the CLONE_PTRACE (setbpt) treatment.  But once we've done
one clone or fork or vfork of any kind, then we know that the kernel
support really works, and we can leave the next vfork alone.  If you
really didn't want to perturb the vfork behavior of the tracee even
once, strace could at startup fork a child that uses PTRACE_TRACEME
and then does a fork/vfork so strace can see how the kernel support
works before it handles any real tracee.
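
A startup probe along those lines might look roughly like the sketch
below.  This is only an illustration under my own assumptions, not a
proposed patch: the names probe_ptrace_fork_events and kill_and_reap
are invented here, and it only checks PTRACE_O_TRACEFORK (a vfork
check would follow the same pattern with PTRACE_O_TRACEVFORK).

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: kill a child or tracee, then wait until it is gone. */
static void kill_and_reap(pid_t pid)
{
    int status;

    kill(pid, SIGKILL);
    /* Skip any pending stop reports until the death itself shows up. */
    while (waitpid(pid, &status, 0) == pid) {
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
    }
}

/*
 * Hypothetical probe: returns 1 if the kernel reports PTRACE_EVENT_FORK
 * for a scratch tracee, 0 if it does not, -1 if the probe could not run.
 */
static int probe_ptrace_fork_events(void)
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        /* Scratch tracee: get traced, stop, then fork once. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        (void)fork();
        _exit(0);               /* both the tracee and its child exit */
    }

    /* Wait for the initial SIGSTOP; anything else means no tracing. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;

    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACEFORK) < 0) {
        kill_and_reap(child);   /* old kernel: option not accepted */
        return 0;
    }

    for (;;) {
        /* Resume the tracee, suppressing whatever signal stopped it. */
        if (ptrace(PTRACE_CONT, child, NULL, NULL) < 0) {
            kill_and_reap(child);
            return -1;
        }
        if (waitpid(child, &status, 0) != child)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return 0;           /* tracee finished without any event */
        if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_FORK << 8))) {
            unsigned long grandchild = 0;

            /* The new child was auto-attached; clean it up as well. */
            if (ptrace(PTRACE_GETEVENTMSG, child, NULL, &grandchild) == 0
                && grandchild > 0)
                kill_and_reap((pid_t)grandchild);
            kill_and_reap(child);
            return 1;
        }
    }
}
```

The idea would be to call this once before attaching to or starting any
real tracee, and to use the old setbpt path whenever it does not return 1.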

Thanks,
Roland