[PATCHv4] print stack trace after each syscall

Sun Sep 22 01:45:38 UTC 2013

On Sat, Sep 21, 2013 at 12:54 PM, Masatake YAMATO <yamato at redhat.com> wrote:
> Hi,
>
> Thank you for reviewing the patch.
> But I need more discussion.
>
> On Thu, 19 Sep 2013 18:05:06 -0700, Luca Clementi <luca.clementi at gmail.com> wrote:
> ...
>>> Why delete_mmap_cache invocation is needed after wait related system calls?
>>>
>>
>> I can't find a logical explanation for that.
>> I think that part of the code should be removed as you indicated.
>
> O.k. Let's remove the code block.
>
> On Thu, 19 Sep 2013 18:05:06 -0700, Luca Clementi <luca.clementi at gmail.com> wrote:
>> Although I have one minor comment.
>> Since we are printing the stack trace on return of a sys call,
>> shouldn't we update the mapping _before_ we print the stack trace?
>> In most of the situation it does not matter except for execve (we can
>> avoid a "backtracing_error").
>>
>> I simply moved the delete_mmap_cache up in the trace_syscall_exiting:
>
>
> I think there is no difference between  _after_ and _before_.
>
>
> Let's think about calling mmap then getpid.
>
> ** _after_ case **
>
>        -1. mmap run in kernel and the map of process is modified.
>         0. control for mmap syscall is returned from kernel, then
>         1. the stack for calling mmap is printed, then
>         2. the cache is cleared, then
>         3. getpid is called, then
>         4. cache is rebuilt (in print_stack), then
>         5. the stack for calling getpid is printed(in print_stack).
>
>         Note: the map is refereed in 4 but between 0 and 5, the map for
>         the process is not modified.
>
> ** _before_ case **
>
>        -1. mmap run in kernel and the map of process is modified.
>         0. control for mmap syscall is returned from kernel, then
>         1. the cache is cleared, then
>         2. cache is rebuilt (in print_stack), then
>         3. the stack for calling mmap is printed, then
>         4. getpid is called, then
>         5. the stack for calling getpid is printed using the cache
>            built in 2.
>
> In the both case, when printing the stack trace for getpid, newer
> cache is used.
>
> As you wrote, execve is special.
> We are interested in the context of execve invoked.
> So we should capture the stack in `trace_syscall_entering'.
> After execve, the context is destroyed.

I agree on this, but in this way you will have to buffer the output
and print it only in the trace_syscall_exiting (to preserve the
current output, which in my opinion is more user friendly).
My was a dirty and bad work-around, I admit it.

> ...However, I've found more important thing.
> libunwind scans /proc/$pid/maps every print_stacktrace invocation!
> If libunwind provides `unw_get_proc_name_and_path',
> the map cache management will not be needed in strace.

I noticed that. Honestly speaking, I was thinking to first get this
patch in and then in a second step, try to code another patch which
provides a custom get_dyn_info_list_addr call back [1] (which is where
the scan of the maps is done).
In the custom get_dyn_info_list_addr we can then use the info we
already have in our mmap_cache.
My first concern was to get the functionality in, then we can work on
improving its performance.

The problem is that libunwind will scan anyway maps every stack trace
(aka every system call, that is performance killer), if you re-write
get_dyn_info_list_addr to use our mmap_cache you can parse the maps
file only when really needed and use the cache the other times.

> I already submitted a conceptual patch:
>   http://lists.nongnu.org/archive/html/libunwind-devel/2013-09/msg00009.html
>
> If `unw_get_proc_name_and_path' is introduced in libunwind, we can
> make unwind.c much simpler.

PS: I think this is a very useful feature to have in libunwind

[1] http://www.nongnu.org/libunwind/man/unw_create_addr_space(3).html

Luca