Structured output?

Philippe Ombredanne pombredanne at nexb.com
Thu Feb 6 14:41:54 UTC 2014


On Wed, Feb 5, 2014 at 12:43 AM, Dmitry V. Levin <ldv at altlinux.org> wrote:
> On Sun, Feb 02, 2014 at 09:39:35PM +0100, Philippe Ombredanne wrote:
>> Hi,
>> I was wondering if there could be something to make parsing of a
>> strace output a tad easier.
>>
>> In fact even though not too complex, parsing a strace output can be
>> almost as involved as the strace code that encodes the output itself.
>> It could be a CSV, JSON, or YAML output, or something else that
>> would have some inherent structure that would make it easier to parse,
>> so that a strace user who wants to interpret the output can focus on
>> the interpretation of the data rather than having to handle the
>> interpretation of the strace output first.
>>
>> Is that something that has already been discussed?
>> Any thoughts?
>
> strace output format resembles C syntax.
> And yes, it's so old that it probably predates html. :)

It does predate Linux too, right?  Your post says it all:
http://lvee.org/en/abstracts/94 I guess, though it would be fun to
know if Paul Kranenburg started strace before Linus started Linux in
1991.

> Printing of decoded C constructs is mostly open-coded, thus making any
> attempt to produce output in other formats a huge task.

Yes, and though the code for printing things is mostly well defined,
I am sure there are gremlins in dark corners, and there are lots of
places where the new code will need to be plugged in.
And huge indeed, as there are 1070+ tprintf and 690+ tprints calls.

> From another
> side, this open-codedness is error-prone, so there are probably some
> mistakes in corner cases waiting to be spotted.  Support of other formats
> inevitably means introducing some API for structured output, which will
> hopefully eliminate this class of mistakes, so that even users of
> traditional strace output format may benefit.

At a high level, the API may look something like this:
- first, regardless of the output format we decide on, the line-by-line
approach makes the most sense, i.e. output would still be one physical
line per syscall or +++ message

- then the key elements of a call are time, pid, function name, return
code, decoded return code, extra return info and of course arguments.

- for arguments this is a list of positional values or a key/value map,
where a value can be a list or map of further decoded structures.

- there is a duality of raw vs. decoded args, such as for file
descriptor-like arguments where you have a tuple of (fd, decoded path),
so it could make sense to provide both the raw and decoded values as a
tuple for many arguments

- whatever the API used, it should be able to print the exact same
format as today pixel for pixel ;)  and ideally support only one new
structured format for now.

- I have no special preference for a format, except that I prefer text
over binary and that this should be a standard, well-defined format
with plenty of available support in most languages. Of the text formats,
something CSV-like would still be flat (short of inventing a
non-standard way of nesting data) and would likely be a regression
from the current format; XML is likely too verbose; YAML structure
requires multiple lines; JSON is a tad verbose but less so than XML,
and a list entry can be packed on a single line. So a JSON-like format
is likely a good candidate, but it is not really super friendly for
piped input processing with common shell tools (though the current
strace output is more or less there for that today).
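
To make the points above a bit more concrete, here is a minimal Python
sketch of what one such record could look like when packed onto a single
JSON line. Every key name here (time, pid, syscall, args, raw, decoded,
ret, errno, info) is an assumption for illustration only, not an existing
strace format, and the sample values are made up.

```python
import json


def arg(raw, decoded=None):
    """Pair a raw argument value with its optional decoded form."""
    return {"raw": raw, "decoded": decoded}


def make_record(time, pid, syscall, args, ret, errno=None, info=None):
    """Build one syscall record; all key names are hypothetical."""
    return {
        "time": time,
        "pid": pid,
        "syscall": syscall,
        "args": args,      # positional list; each value may nest further
        "ret": ret,        # raw return code
        "errno": errno,    # decoded return code, if any
        "info": info,      # extra return info, if any
    }


record = make_record(
    time="14:41:54.000123",
    pid=1234,
    syscall="read",
    # The fd argument carries both the raw number and a decoded path.
    args=[arg(3, "/etc/passwd"), arg("root:x:0:0:"), arg(4096)],
    ret=11,
)

# One physical line per syscall: compact separators, no embedded newlines.
line = json.dumps(record, separators=(",", ":"))

# A consumer gets the structure back without a hand-written parser.
parsed = json.loads(line)
fd = parsed["args"][0]
```

With something like this, a script reading the trace would only need
json.loads() per line and could go straight to interpreting the data.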

So we could have a single or a few print functions replacing tprintf and tprints
that could accept some structures along with the formatting
code as used today, and a flag to output regular strace or structured strace?
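
As a rough model of that idea (in Python rather than strace's C, and not
a proposal for the actual function names), the sketch below keeps one
record and picks the renderer with a flag. The names emit, format_classic
and format_json, and the record keys, are all hypothetical; the classic
renderer is a simplification that only covers this one case.

```python
import json


def format_classic(rec):
    """Render a record in a simplified, C-like strace style."""
    text = "{}({}) = {}".format(rec["syscall"], ", ".join(rec["args"]), rec["ret"])
    if rec.get("errno"):
        # Failed calls append the errno name and its description.
        text += " {} ({})".format(rec["errno"], rec.get("errtext", ""))
    return text


def format_json(rec):
    """Render the same record as one compact JSON line."""
    return json.dumps(rec, separators=(",", ":"), sort_keys=True)


def emit(rec, structured=False):
    """Single entry point; the flag selects the output format."""
    return format_json(rec) if structured else format_classic(rec)


# Illustrative record; the arguments are stored already decoded as text.
rec = {
    "syscall": "open",
    "args": ['"/nonexistent"', "O_RDONLY"],
    "ret": -1,
    "errno": "ENOENT",
    "errtext": "No such file or directory",
}

classic_line = emit(rec)
json_line = emit(rec, structured=True)
```

The point is only that both outputs come from the same structure, so the
traditional format would not have to be open-coded at every call site.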

Something along these lines...

As a first step I think the best would be to come up with a few mock
samples of JSON output.
-- 
Philippe Ombredanne



