[RFC] gradually moving strace from ptrace to perf

Wed Oct 2 12:16:05 UTC 2013

On 10/02/2013 01:20 PM, Jan Kratochvil wrote:
> On Wed, 02 Oct 2013 13:13:21 +0200, Denys Vlasenko wrote:
>> IIUC, this approach, if implemented fully, would move
>> most of strace's syscall decoding machinery to kernel,
>> what would be passed back to userspace are the strings
>> of generated output.
> 
> Yes.
> 
> 
>> To do something like this, I would want to reuse strace's
>> C code of syscall decoders with as small modifications
>> as possible (i.e. want to avoid a complete rewrite,
>> especially to a different language).
>> Is this achievable with systemtap?
> 
> No.  But some mechanical rewrite of about 300 syscall decoders (most of them
> are trivial ones) by an assigned intern IMO should not play a role in the
> design principle of strace-NextGeneration.
> 
> One cannot use C directly in systemtap as the systemtap scripting ensures the
> code is safe against crashes and lockups.

How about this then: in kernel code, just hook into the "sys_enter"
and "sys_exit" tracepoints, fetch the syscall number and the
params/return code, then branch into the syscall decoders (C code taken
from the strace source), and pass the generated strings back to userspace.
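
To make the "branch into syscall decoders" step concrete, here is a
rough sketch of the dispatch, written as plain userspace C rather than
kernel code. Everything in it is made up for illustration: struct
sc_event, the decoder signature, decode_event() and the syscall numbers
are assumptions, not strace's or the kernel's actual interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of what a sys_enter hook would capture. */
struct sc_event {
    long nr;       /* syscall number */
    long args[6];  /* raw syscall arguments */
};

/* A decoder formats one syscall's arguments into buf. */
typedef int (*sc_decoder)(const struct sc_event *ev, char *buf, size_t len);

static int decode_close(const struct sc_event *ev, char *buf, size_t len)
{
    return snprintf(buf, len, "close(%ld)", ev->args[0]);
}

static int decode_exit(const struct sc_event *ev, char *buf, size_t len)
{
    return snprintf(buf, len, "exit(%ld)", ev->args[0]);
}

/* Illustrative numbers only; real syscall tables are per-architecture. */
enum { SC_CLOSE = 3, SC_EXIT = 60, SC_MAX = 64 };

/* Table indexed by syscall number; missing entries stay NULL. */
static const sc_decoder decoders[SC_MAX] = {
    [SC_CLOSE] = decode_close,
    [SC_EXIT]  = decode_exit,
};

/* Branch into the matching decoder, or emit a generic fallback string. */
static int decode_event(const struct sc_event *ev, char *buf, size_t len)
{
    assert(ev != NULL && buf != NULL && len > 0);

    if (ev->nr >= 0 && ev->nr < SC_MAX && decoders[ev->nr] != NULL)
        return decoders[ev->nr](ev, buf, len);

    return snprintf(buf, len, "syscall_%ld(...)", ev->nr);
}
```

The hook would fill in a struct sc_event, call decode_event() with an
output buffer, and hand that buffer to userspace. Decoders that only
print integer arguments fit this shape directly; the ones that have to
follow pointers into user memory are the harder case.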

This means: no need to mess with perf or systemtap.

At first glance, such a hook doesn't take much code.

It's not all trivial, though. Fetching data will need copy_from_user,
which requires a sleepable context, and tracepoint callbacks don't
run in one. We'll need to use task_work for this. Should be doable,
although I envision difficulties with getting the task_work callback
invoked, and the task stopped, before it enters the syscall.
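
The split I have in mind looks roughly like the following. This is
only a userspace model of the pattern, not kernel code: struct
pending_sc, hook_sys_enter() and deferred_decode() are hypothetical
names, the "queued" flag stands in for queueing a task_work item, and
strncpy() stands in for a bounded copy from user memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-task state; stands in for data attached to the task. */
struct pending_sc {
    bool queued;        /* deferred work has been requested */
    long nr;            /* syscall number */
    const char *upath;  /* "user pointer": recorded but not dereferenced */
    char out[128];      /* decoded text, produced later */
};

/* Models the tracepoint callback: may not sleep, so it only records
 * raw values and requests the deferred work. */
static void hook_sys_enter(struct pending_sc *p, long nr, const char *upath)
{
    p->nr = nr;
    p->upath = upath;
    p->queued = true;
}

/* Models the deferred callback: sleeping is allowed here, so this is
 * where the user data is fetched and the output string is built. */
static void deferred_decode(struct pending_sc *p)
{
    char path[64];

    if (!p->queued)
        return;

    /* Stand-in for a bounded copy from user memory. */
    strncpy(path, p->upath, sizeof(path) - 1);
    path[sizeof(path) - 1] = '\0';

    snprintf(p->out, sizeof(p->out), "syscall_%ld(\"%s\")", p->nr, path);
    p->queued = false;
}
```

The model leaves out the hard part, which is the ordering: nothing
here guarantees that deferred_decode() runs before the syscall body
does.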

-- 
vda