GSOC: Additional considerations for a structured output project proposal

Fri Mar 7 10:39:06 UTC 2014

Howdy:
If you are interested in this, you have probably read the discussion
in this thread [1].

To recap and add to it, there are several open topics that I would
like to see addressed in a proposal even if a solution is not clear
yet.
I am just throwing it all out there, though some items are definitely
more important than others:

- what would be the UI and command line options of a structured output?
- what kind of API restructuring would be needed to eventually handle
classical and structured output in the same print-like function
calls?
- should the output have the same structure regardless of the options
being used?  e.g. would there be empty slots for info that is not
traced (such as timing), or would all timing and maximum decoding
details always be included in a structured output?
- how to test that the structured output is correct and that the
classical output has not changed?
- how would raw vs. decoded args, or both, be handled (such as file
descriptors with -y and more)?
- how to ensure that the JSON created is valid and UTF-8, in particular
for strings? and would escaped strings need to be decoded on the
receiving end by a tool processing the JSON?
- how to handle notification messages such as +++ messages (exited,
killed, superseded) and --- messages (siginfo and stopped)?
- how to handle unfinished/resumed business?
- when using -ff, should the structured files be created with a js or
json extension or no special extension?
- how to handle the ioctl parser "or" feature in this:
ioctl(2, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or
TCGETS, {B38400 opost isig icanon echo ...}) = 0
- how to handle the ... ellipsis on shortened argument values like in this:
1390356495.438920 read(3</lib/x86_64-linux-gnu/libc-2.15.so>,
"\177ELF\2\1\1\0\0\0\0\0"..., 832) = 832
- how to handle timing with -t -tt -r and -T?
- how to handle return codes, messages and decoded errnos?
- should the structured output only contain strings (i.e. wrapping
every type in strings) or should it also contain numbers and other
JSON types?
- should decoded or'ed flags be structured (possibly as a list), like
O_WRONLY|O_APPEND|O_CREAT in:
open("xyzzy", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
- how to handle the case when a bit-set is full and only unset
elements are printed, prefixed by a tilde like this (see the strace(1)
man page):
sigprocmask(SIG_UNBLOCK, ~[], NULL) = 0
- how to handle the abbreviated /* 22 vars */ output in this:
execve("/usr/bin/make", ["make"], [/* 22 vars */]) = 0
- and the corresponding inverse verbose output as in:
execve("/usr/bin/make", ["make"], ["TERM=xxx", "SHELL=/bin/bash",
"XDG_SESSION_COOKIE=6cdeb7b4d6cae"..., "LANG=en_US.UTF-8",  "SHLVL=1",
"_=/usr/local/bin/strace"]) = 0
- should there be a structured -c output?
- how to handle the -i instruction pointer?
- would you reach out to the authors of existing strace output parsers
(see [2]) to collect requirements and have them involved somehow?
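
To make a few of these questions concrete, here is a rough sketch in
Python of what records for the open() and read() examples above might
look like. Everything in it is an assumption for discussion only: the
field names ("syscall", "args", "return", "timestamp", "fd", "path",
"value", "truncated") and the helper functions are hypothetical and not
an agreed format or any existing strace API. It picks one possible
answer to each question (flags as a list, a mix of JSON types, an
explicit truncation marker instead of "...", an empty timing slot) just
to show the trade-offs.

```python
import json

# All field and helper names here are hypothetical, for discussion only.

def split_flags(decoded):
    """Turn a decoded or'ed flag string into a list of flag names."""
    return decoded.split("|")

def shortened_string(data, truncated):
    """Represent a string argument that may have been shortened.

    latin-1 maps each byte to the code point with the same value, so
    arbitrary bytes survive JSON encoding; a consumer has to reverse
    this mapping to get the raw bytes back.
    """
    return {"value": data.decode("latin-1"), "truncated": truncated}

def to_json(record):
    """Serialize a record as pure ASCII, which is also valid UTF-8."""
    return json.dumps(record, ensure_ascii=True, sort_keys=True)

# open("xyzzy", O_WRONLY|O_APPEND|O_CREAT, 0666) = 3
open_call = {
    "syscall": "open",
    "args": ["xyzzy", split_flags("O_WRONLY|O_APPEND|O_CREAT"), 0o666],
    "return": 3,
    "timestamp": None,  # empty slot: timing was not requested
}

# 1390356495.438920 read(3</lib/x86_64-linux-gnu/libc-2.15.so>,
# "\177ELF\2\1\1\0\0\0\0\0"..., 832) = 832
read_call = {
    "syscall": "read",
    "args": [
        {"fd": 3, "path": "/lib/x86_64-linux-gnu/libc-2.15.so"},
        shortened_string(b"\177ELF\2\1\1\0\0\0\0\0", truncated=True),
        832,
    ],
    "return": 832,
    # Kept as a string here so no digits are lost to float rounding.
    "timestamp": "1390356495.438920",
}

if __name__ == "__main__":
    print(to_json(open_call))
    print(to_json(read_call))
```

Note that JSON has no octal literals, so the 0666 mode comes out as a
decimal number in this sketch; whether that is acceptable is part of the
"strings only or other JSON types" question above.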

[1] https://sourceforge.net/p/strace/mailman/message/31921480/
[2] https://sourceforge.net/p/strace/mailman/message/31924683/

-- 
Philippe Ombredanne