strace scheduling fairness

Wed May 16 10:47:14 UTC 2012

On 05/16/2012 11:48 AM, Dmitry V. Levin wrote:
>> The testcase essentially starts threads, each of which does this:
>>
>>           for (;;) getuid();
>>
>> Since getuid is a very cheap syscall, it finishes very quickly.
>>
>> When there are 2-3 such threads, strace's waitpid() picks up one, restarts it,
>> picks another, restarts it, ... and meanwhile the first thread stops again at
>> syscall entry/exit and is ready to be picked up again.
>>
>> It may happen that the very first thread, the one that starts all the others,
>> is never picked up even though it is also ready to be handled by strace,
>> because there is always at least one other getuid()-looping thread ready,
>> and that thread gets picked up instead.
>>
>> One solution is to call waitpid() with WNOHANG in a loop until it returns 0
>> ("no more tracees to wait for"), then handle them all. It was NAKed a few
>> years ago.
>
> It would be nice to have a look at that discussion.  Do you have a
> reference?  What was the rationale that time?

I guess Roland saw it as "ugly" and as "trying to work around the ptrace
API, which is a hopelessly bad API anyway".

On the technical side, it adds an extra waitpid call per syscall
entry/exit and serializes strace even more: instead of servicing
and restarting each thread as fast as we can, we first collect the
other ready threads, which keeps already-ready threads stuck a bit longer.

I looked into it, and _maybe_ signalfd can help us noticeably.

I expect difficulties in the waitpid area: it's likely we can't just
block SIGCHLD and happily read the siginfos in batches from signalfd.
First, some SIGCHLDs can be lost: SIGCHLD is not a real-time signal,
so several children changing state while one SIGCHLD is already
pending produce only one notification. Second, tracees that exit stay
zombies until someone waits for them, so we will _still_ need to call
waitpid...

-- 
vda