strace's cleanup() logic is wrong? (Was: [RHEL4 patch 0/2] BZ#539506: fix 2/many bugs)
Oleg Nesterov
oleg at redhat.com
Fri Jan 8 19:06:25 UTC 2010
I need help from strace gurus ;)
On 01/08, Oleg Nesterov wrote:
>
> This series fixes 2 bugs, but the test-case
> still hangs, although the hang is now very unlikely (before this series it
> hung after 3-5 iterations, now it needs thousands).
>
> I am still investigating; this time it _looks_ like a user-space problem,
> but I am not sure.
Yes!!! I had never read strace's sources before, but this really looks like
a strace bug to me.
OK. In essence, the test-case does:
    fork();
    // both parent and child run the code below
    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(...);
    raise(SIGFPE);
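For anyone who wants to try this, the test-case above can be sketched as a compilable C program. This is a minimal sketch, not the original BZ#539506 test-case: NUM_THREADS = 4, the thread body (blocking in pause()), and the helper names run_thread_group(), test_case() and run_group_in_child() are all hypothetical. test_case() is what main() would call in a program run repeatedly under "strace -f"; run_group_in_child() only checks, without strace, that one thread group really dies from SIGFPE.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: the real thread count is not given in the report. */
#define NUM_THREADS 4

/* Hypothetical thread body: just block until the group is killed. */
static void *thread_func(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* What each thread group runs: create the sub-threads, then raise(SIGFPE).
 * SIGFPE is not handled, so its default action kills the whole thread
 * group, sub-threads included; normally this function does not return. */
void run_thread_group(void)
{
    pthread_t tid[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, thread_func, NULL);
    raise(SIGFPE);
}

/* The whole test-case: after fork() the parent (G1) and the child (G2)
 * both create threads and then kill themselves. */
void test_case(void)
{
    fork();
    run_thread_group();
}

/* Hypothetical checking helper (no strace involved): run one thread group
 * in a child process and return its wait status. */
int run_group_in_child(void)
{
    int status = 0;
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0) {
        run_thread_group();
        _exit(0);           /* not reached if SIGFPE is fatal */
    }
    waitpid(pid, &status, 0);
    return status;
}
```

Running the real reproducer means calling test_case() from main() and looping the program under "strace -f" until it hangs.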
It creates two thread groups, G1 and G2, and "strace -f" attaches to
all sub-threads.
Suppose that G1's main thread calls raise(SIGFPE). This kills all
sub-threads in G1, including, say, a sub-thread T.
In this case, ptrace(PTRACE_SYSCALL, T), for example, can fail (because T
is no longer stopped) and strace calls cleanup() to detach.
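This failure mode is the one documented in ptrace(2): requests such as PTRACE_SYSCALL fail with ESRCH when the target does not exist, is not traced by the caller, or is not stopped. A minimal sketch, assuming a hypothetical helper resume_tracee() that returns the errno value; since a tracee killed at exactly the right moment is hard to stage, demo_untraced_child() (also hypothetical) uses a child that was never traced, which hits the same ESRCH.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: resume a tracee until its next syscall stop.
 * Returns 0 on success or the errno value on failure.  ESRCH means the
 * target is gone, not traced by us, or not in a ptrace-stop (e.g. T was
 * just killed by its group's fatal SIGFPE). */
int resume_tracee(pid_t pid)
{
    if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == 0)
        return 0;
    return errno;
}

/* Stand-in demonstration: a child we never attached to is "not traced
 * by the caller", so PTRACE_SYSCALL on it fails with ESRCH as well. */
int demo_untraced_child(void)
{
    pid_t child = fork();
    int err;

    assert(child >= 0);
    if (child == 0) {
        for (;;)
            pause();        /* just stay alive until killed */
    }
    err = resume_tracee(child);
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return err;
}
```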
But! Unless I misread the code, cleanup() tries to detach _all_
attached tracees (which is just wrong imho) and does:
    for (i = 0; i < tcbtabsize; i++) {
        ...
        detach(tcbtab[i]);
        ...
    }
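To make the ordering problem concrete, here is a toy model of the loop above. It is a sketch, not strace code: struct tcb_model and other_group_detached_first() are hypothetical, and the model keeps only each tracee's pid and thread-group id. It answers one question: walking the table in slot order, as the loop does, is a tracee from another group detached while threads of the dying group are still waiting their turn?

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down model of a tcbtab[] entry: only what is
 * needed to reason about the detach order. */
struct tcb_model {
    int pid;    /* thread id of the tracee */
    int tgid;   /* thread group it belongs to (G1 or G2) */
};

/* Walk tab[] in slot order, like cleanup() walks tcbtab[].  Return true
 * if some tracee of another group comes up while threads of the dying
 * group `dying_tgid` have not all been detached yet. */
bool other_group_detached_first(const struct tcb_model *tab, size_t n,
                                int dying_tgid)
{
    size_t remaining = 0;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].tgid == dying_tgid)
            remaining++;

    for (size_t i = 0; i < n && remaining > 0; i++) {
        if (tab[i].tgid == dying_tgid)
            remaining--;
        else
            return true;    /* e.g. detach(X) from G2 before G1 is done */
    }
    return false;
}
```

With slots holding a G1 thread, then a G2 thread, then another G1 thread, it returns true. Since slots are presumably filled as tracees get attached, entries of G1 and G2 can interleave like this when both groups create threads concurrently.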
This means strace can try to detach a thread X from _another_ thread
group, G2, before it has detached the threads of G1. In this case a hang
is quite possible, because if PTRACE_DETACH fails, detach() does
waitpid(pid), not wait(-1). G1 is alive, so we can't expect waitpid(X)
to eventually succeed: X can wait for other threads sleeping in
TASK_TRACED.
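The waitpid(pid) vs wait(-1) difference can be shown with plain child processes; no ptrace is involved, so this is only an analogy for detach()'s behaviour, and compare_waits() is a hypothetical helper. waitpid() on one specific pid ignores state changes of every other child, which simply stay pending, while waitpid(-1, ...) (equivalent to wait()) reports whichever child changed state. The specific-pid probe uses WNOHANG so that the demo does not hang itself.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Analogy only: child "x" never changes state on its own (like X in the
 * mail), child "other" exits at once (like another tracee's pending
 * event).  Reports what waitpid(x) and waitpid(-1) each see. */
void compare_waits(pid_t *x_result, pid_t *any_result, pid_t *other_pid)
{
    int status;
    pid_t x = fork();

    assert(x >= 0);
    if (x == 0) {
        for (;;)
            pause();        /* never stops or exits by itself */
    }

    pid_t other = fork();
    assert(other >= 0);
    if (other == 0)
        _exit(0);           /* produces an event nobody asked for by pid */

    /* waitpid(x) only looks at x: with WNOHANG it returns 0 ("nothing
     * yet"); without WNOHANG it would block forever. */
    *x_result = waitpid(x, &status, WNOHANG);

    /* waitpid(-1) returns the child that did change state: "other". */
    *any_result = waitpid(-1, &status, 0);
    *other_pid = other;

    /* Clean up the blocked child. */
    kill(x, SIGKILL);
    waitpid(x, &status, 0);
}
```

By analogy, while strace sits in waitpid(X) it never sees, and so never resumes, the other tracees whose progress X may be waiting for.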
Thoughts?
Oleg.