[PATCH 2/2] Trace fork series calls using PTRACE_SETOPTIONS on linux

Fri Sep 10 08:31:26 UTC 2010

> If so, I guess that what you want is compatibility with systems
> using early 2.4 kernels. But I doubt whether we need to do so,
> since the 2.4 series is really too old, right?

Indeed I think it's only some 2.4 kernels that behave in the problematic
fashion.  But strace has always been compatible with quite old kernels,
and we don't want to regress in that compatibility.

> Moreover, if I got it all right, in old kernels such as the 2.4 series,
> a wait call will lose its children that are ptraced. I guess that's
> why strace has the function internal_wait. But in 2.6 series kernels,
> I think a wait call will block if its children are currently ptraced
> and strace does not need to do any additional work here. Doing so
> in 2.6 will make things much more complex if the wait call uses pgid or
> options such as __WCLONE. I'm not very sure about this.

I think even 2.4 might do that better, but I'm not really sure.  There are
a lot of versions and half-backport variants out there.  (RHEL3's kernel is
based on 2.4.21, but has many, many changes, and does this right.)  I think
it's certainly true that all 2.6 kernels correctly both leave a
ptrace-waited zombie for the real parent to wait on and don't let a real
parent see ECHILD while ptrace has stolen the child.  For them, there is
probably no need for the internal_wait logic.

The wait question is really an entirely separate subject from what we've
just been discussing.  If you have some changes about that in mind, please
bring them up in another thread.

> Yes, in this way we can gain good compatibility with older kernels.
> But at the same time, we have also lost the benefits of using
> PTRACE_SETOPTIONS. For example, with old code, we still can not
> really trace vfork, while this can be easily done with PTRACE_SETOPTIONS.
> 
> I'm sorry if I don't understand correctly.

I'm not sure we're understanding each other.  In what I described, there
would be no lack of the benefits of PTRACE_SETOPTIONS.  In the first call
to clone/etc, we'd still do the old setbpt work, but also get the new
ptrace report.  Thereafter, having seen a PTRACE_EVENT_CLONE etc., we would
never use setbpt again.
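
To make that concrete, here is a rough sketch of the switch-over, not
actual strace code.  The flag use_setbpt and the helpers
enable_fork_events, ptrace_event_of and note_stop are hypothetical names
made up for illustration; only the PTRACE_* constants and the wait
macros are real.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical tracer-wide flag: nonzero while the old setbpt path is needed. */
static int use_setbpt = 1;

/* Ask the kernel to report fork-family calls.  Kernels that lack
 * PTRACE_SETOPTIONS fail this call, and setbpt simply stays in use. */
static long enable_fork_events(pid_t pid)
{
    long opts = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE;
    return ptrace(PTRACE_SETOPTIONS, pid, 0L, opts);
}

/* Return the PTRACE_EVENT_* code carried in a wait status, or 0 if none. */
static int ptrace_event_of(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}

/* Call on every stop.  The first fork-family event proves the kernel
 * reports them, so setbpt is retired from then on. */
static void note_stop(int status)
{
    switch (ptrace_event_of(status)) {
    case PTRACE_EVENT_FORK:
    case PTRACE_EVENT_VFORK:
    case PTRACE_EVENT_CLONE:
        use_setbpt = 0;
        break;
    default:
        break;
    }
}
```

The point of the sketch is that nothing has to probe the kernel version:
the first call still goes through setbpt, and the arrival of the event
itself is what tells us it was unnecessary.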

vfork is an additional issue.  When we decide not to use setbpt, we'd
certainly also decide not to use change_syscall any more either.  But
that's too late for the first vfork: we've already made it a fork instead.
The status quo is not that we "can not really trace vfork"; it's that we
turn a vfork into a fork, and then it gets traced just fine (but is doing
something different).

I believe the change_syscall logic dates from the old setbpt code (perhaps
still used for a non-Linux build, if any such still work).  There I think
the rationale was not to do a vfork because you were going to do actual
text-writing breakpoint insertion and wanted it to be separate for the
parent and child.  (I'm not really sure why you couldn't do even that with
a real vfork too, but anyway.)

For the new Linux setbpt code that just changes registers instead of doing
breakpoints, I think we could make that support proper vfork too.  Right
now, it turns a vfork syscall into a clone syscall just like a fork syscall.
We could make that CLONE_PTRACE | CLONE_VFORK | CLONE_VM | SIGCHLD.
We might have to make sure that more registers get restored to their
original values, but perhaps that is already done OK.
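
As a sketch of the flag words involved (the function names are
hypothetical, and I am assuming the fork rewrite passes
CLONE_PTRACE | SIGCHLD, which is worth checking against the code):

```c
#include <assert.h>
#include <signal.h>
#include <linux/sched.h>   /* CLONE_PTRACE, CLONE_VFORK, CLONE_VM, CSIGNAL */

/* Assumed flags for a fork rewritten as clone: traced child, SIGCHLD on exit. */
static unsigned long fork_as_clone_flags(void)
{
    return CLONE_PTRACE | SIGCHLD;
}

/* Proposed flags for a vfork rewritten as clone: the same, plus the two
 * bits that give vfork its semantics (shared address space, and the
 * parent suspended until the child execs or exits). */
static unsigned long vfork_as_clone_flags(void)
{
    return CLONE_PTRACE | CLONE_VFORK | CLONE_VM | SIGCHLD;
}
```

The exit signal occupies the low byte of the flags word (the CSIGNAL
mask), so it does not collide with any of the CLONE_* bits.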

With that done, then even the first vfork would remain a proper vfork.

Thanks,
Roland