[GSOC 2014] update for JSON support

yangmin zhu zym0017d at gmail.com
Sun Aug 17 18:20:16 UTC 2014

Hi all,

I have updated my gist[0] with all my work. My patch is based on git
commit "57fac75 xtensa: sort values in struct_user_offsets". I noticed
that strace 4.9 has just been released and is about 20 commits ahead
of my work; I will continue working and rebase my patch onto strace
4.9. My gist[1] also contains a single large diff file with all of my
changes to strace.

1) The overall structure of the JSON output
I tested the JSON output using the programs in strace's test
directory. It now produces valid JSON for most syscalls, except for
ioctl() in some special cases and for signals. The JSON output is a
series of lines, and each line is a single valid JSON object
representing a syscall, a signal, or some other event, so users can
read just the lines they need instead of the entire file at once.
Each syscall object has at least four key/value pairs: type, name,
args and ret. type and name are strings, args is an array holding the
syscall's arguments, and ret is the return value (in most cases a
single number, but it may be of another type in some situations).
Some objects carry extra key/value pairs in special cases, such as
error and strerror.
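
To make the one-object-per-line layout concrete, here is a minimal C
sketch of how a single syscall record could be formatted. The
structure json_syscall_record and the function format_json_record()
are hypothetical names used only for illustration (they are not
strace functions), the arguments are assumed to be rendered already
as a JSON array, and string escaping is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical record type for illustration only; strace has no such
 * structure.  The arguments are assumed to have been rendered already
 * as a JSON array by the per-syscall output functions.
 */
struct json_syscall_record {
	const char *name;      /* syscall name, e.g. "splice" */
	const char *args_json; /* arguments as a JSON array, e.g. "[null, 100]" */
	long ret;              /* return value */
	const char *error;     /* errno name such as "EINVAL", or NULL */
	const char *strerror;  /* error description, or NULL */
};

/*
 * Format one syscall as a single line holding one JSON object.
 * Returns the snprintf() result (>= size means the line was truncated).
 */
static int format_json_record(char *buf, size_t size,
			      const struct json_syscall_record *r)
{
	char err[128] = "";

	/* Sketch only: names are assumed to need no JSON escaping. */
	assert(strpbrk(r->name, "\"\\") == NULL);

	/* The error and strerror keys appear only when the syscall failed. */
	if (r->error)
		snprintf(err, sizeof(err),
			 ", \"error\": \"%s\", \"strerror\": \"%s\"",
			 r->error, r->strerror ? r->strerror : "");

	return snprintf(buf, size,
			"{\"type\": \"syscall\", \"name\": \"%s\", "
			"\"args\": %s, \"ret\": %ld%s}\n",
			r->name, r->args_json, r->ret, err);
}
```

Because every record ends with a newline, a consumer can process the
trace with ordinary line-oriented tools, one syscall at a time.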

2) Some big changes to the old output
One of the most important changes concerns numbers. Because JSON
does not support octal or hexadecimal numbers, they are printed as
unsigned decimal (%u) in the JSON output, and all pointers (printed
with %p) are wrapped in double quotes.
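
The number rules can be illustrated with values from the example
output: the mmap() offset 0x1bb000 appears as 1814528, and the
permission bits 0755 of st_mode appear as 493. The helpers in this C
sketch are hypothetical and only show the printf conversions
involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration of the number rules (hypothetical helpers, not strace
 * code).  strace normally prints offsets in hex, e.g. 0x1bb000; JSON
 * has no hex or octal literals, so the JSON form is unsigned decimal.
 */
static int format_number(char *buf, size_t size, unsigned long v, int json)
{
	if (json)
		return snprintf(buf, size, "%lu", v);  /* 0x1bb000 -> 1814528 */
	return snprintf(buf, size, "%#lx", v);     /* 0x1bb000 -> 0x1bb000 */
}

/* The same idea for octal values such as file modes: 0755 -> 493. */
static int format_mode(char *buf, size_t size, unsigned int mode, int json)
{
	if (json)
		return snprintf(buf, size, "%u", mode);
	return snprintf(buf, size, "%#o", mode);
}

/*
 * Pointers become JSON strings.  The text produced by %p itself is
 * implementation-defined (glibc prints a 0x... value or "(nil)").
 */
static int format_json_pointer(char *buf, size_t size, const void *p)
{
	return snprintf(buf, size, "\"%p\"", p);
}
```
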
Another big change concerns some special output styles of strace.
First, strace prints the abbreviation ... when some output is too
long. Second, strace prints "?" for unknown values. Third, strace
produces output that is not valid JSON, such as {0x1e9b7e9c, 0x7f7a}
or bare symbolic names such as MCE_GET_LOG_LEN, MEMERASE or
MTRRIOC_DEL_ENTRY in syscall arguments.
The solution for the first two cases is simple: currently I just
output the strings "..." and "?", i.e. the abbreviation and the
question mark wrapped in double quotes.
The third problem is the biggest challenge of this work: currently I
need to modify these output functions one by one so that they produce
valid JSON. You can see my detailed modifications to these functions
in [2].
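
As an illustration of what such a modification could look like, this
C sketch uses hypothetical helpers (not strace's actual output
functions) to print a two-member structure as a JSON array of decimal
numbers, a symbolic constant as a JSON string, and a flag set such as
SPLICE_F_MOVE|SPLICE_F_NONBLOCK as an array of strings, following the
shapes used in the example output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers (not strace's actual functions) showing how an
 * output function could emit valid JSON instead of strace's usual text.
 */

/* A two-member structure: {0x1e9b7e9c, 0x7f7a} becomes [513506972, 32634]. */
static int format_pair_json(char *buf, size_t size,
			    unsigned long a, unsigned long b)
{
	return snprintf(buf, size, "[%lu, %lu]", a, b);
}

/* A bare symbolic name such as MEMERASE becomes the string "MEMERASE". */
static int format_symbol_json(char *buf, size_t size, const char *name)
{
	return snprintf(buf, size, "\"%s\"", name);
}

/*
 * A flag set "A|B" becomes the array ["A","B"], matching the open() and
 * splice() examples.  Output is silently truncated if BUF is too small.
 */
static void format_flags_json(char *buf, size_t size, const char *flags)
{
	size_t n = 0;

	assert(size >= 5);  /* room for [""] plus the terminating NUL */
	if (*flags == '\0') {
		strcpy(buf, "[]");
		return;
	}
	buf[n++] = '[';
	buf[n++] = '"';
	/* Stop early so the closing quote, bracket and NUL always fit. */
	for (const char *p = flags; *p && n + 5 < size; p++) {
		if (*p == '|') {
			buf[n++] = '"';
			buf[n++] = ',';
			buf[n++] = '"';
		} else {
			buf[n++] = *p;
		}
	}
	buf[n++] = '"';
	buf[n++] = ']';
	buf[n] = '\0';
}
```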

3) Example output
You can compare the two kinds of output below (produced with the
options "-v -y" and "-j -v -y"; there are more examples in my
gist[0]):

1: open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) =
2: {"type": "syscall", "name": "open", "args": ["/etc/ld.so.cache", ["O_RDONLY","O_CLOEXEC"]], "ret": [3, "/etc/ld.so.cache"]}

3: mmap(0x7f341644b000, 24576, PROT_READ|PROT_WRITE, 3</lib/x86_64-linux-gnu/libc-2.19.so>, 0x1bb000) = 0x7f341644b000
4: {"type": "syscall", "name": "mmap", "args": [139870053089280, "MAP_DENYWRITE", [3, "/lib/x86_64-linux-gnu/libc-2.19.so"], 1814528], "ret": 139870053089280}

5: read(3</tmp/strace_GSOC_json_test.txt>, "helloworld, this_is_a_test_strin"..., 1000) = 73
6: {"type": "syscall", "name": "read", "args": [[3, "/tmp/strace_GSOC_json_test.txt"], "helloworld, this_is_a_test_strin", 1000], "ret": 73}

7: splice(3</tmp/strace_GSOC_json_test.txt>, NULL, 4</tmp/strace_tmp_file2>, NULL, 100, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EINVAL (Invalid argument)
8: {"type": "syscall", "name": "splice", "args": [[3, "/tmp/strace_GSOC_json_test.txt"], null, [4, "/tmp/strace_tmp_file2"], null, 100, ["SPLICE_F_MOVE","SPLICE_F_NONBLOCK"]], "ret": -1, "error": "EINVAL", "strerror": "Invalid argument"}

9: fstat(3</lib/x86_64-linux-gnu/libc-2.19.so>, {st_dev=makedev(8, 1), st_ino=1048744, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=3608, st_size=1845024, st_atime=2014/08/17-20:39:31, st_mtime=2014/04/12-18:38:28, st_ctime=2014/06/12-15:09:58}) = 0
10: {"type": "syscall", "name": "fstat", "args": [[3, "/lib/x86_64-linux-gnu/libc-2.19.so"], {"st_dev": [8, 1], "st_ino": 1048744, "st_mode": ["S_IFREG",493], "st_nlink": 1, "st_uid": 0, "st_gid": 0, "st_blksize": 4096, "st_blocks": 3608, "st_size": 1845024, "st_atime": "[2014, 8,17,20,39,31]", "st_mtime": "[2014, 4,12,18,38,28]", "st_ctime": "[2014, 6,12,15, 9,58]"}], "ret": 0}

11: +++ exited with 0 +++
12: {"type": "+++", "name": "exited", "info": 0 }

4) Implementation
I have changed the design a lot since the first patch, and it now
differs significantly from my original proposal. The core idea is to
keep the strace code clean and change it as little as possible. You
can find a snippet of my changes to strace in [2]. tprintf() has been
modified so that it replaces the %o and %x specifiers with %u, and
the %p and %s specifiers with "%p" and "%s" (i.e. wrapped in double
quotes), so we do not need to modify the tprintf() calls in the
individual syscall functions. I have also introduced some helper
functions to keep the change clean and easy; you can find their usage
in [2] as well.
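
The format rewriting can be sketched in C as below. This assumes the
rewrite is applied to the format string before it reaches vfprintf();
json_rewrite_format() is a hypothetical name, not the function in the
patch, and the sketch additionally drops the '#' and '0' flags on
%o/%x/%X because JSON numbers must not have leading zeros.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of the tprintf() format rewriting:
 *   %o %x %X -> %u      (length modifiers kept; '#' and '0' flags dropped)
 *   %p %s    -> "%p" "%s"  (wrapped in double quotes)
 * Returns 0 on success, -1 if OUT (SIZE bytes) is too small.
 */
static int json_rewrite_format(const char *fmt, char *out, size_t size)
{
	size_t n = 0;

/* Append one character, keeping room for the terminating NUL. */
#define PUT(c) do { if (n + 1 >= size) return -1; out[n++] = (c); } while (0)

	if (size == 0)
		return -1;
	for (const char *p = fmt; *p != '\0'; ) {
		if (p[0] != '%' || p[1] == '%') {
			/* Ordinary character, or a literal "%%". */
			if (p[0] == '%')
				PUT(*p++);
			PUT(*p++);
			continue;
		}
		const char *start = p++;
		while (*p != '\0' && strchr("#0- +", *p))
			p++;                            /* flags */
		const char *flags_end = p;
		while (*p != '\0' && strchr("0123456789.*", *p))
			p++;                            /* width, precision */
		while (*p != '\0' && strchr("hlLqjzt", *p))
			p++;                            /* length modifiers */
		if (*p == '\0') {
			/* Incomplete specification: copy it unchanged. */
			for (const char *q = start; q < p; q++)
				PUT(*q);
			break;
		}
		int to_unsigned = strchr("oxX", *p) != NULL;
		int quoted = (*p == 'p' || *p == 's');
		if (quoted)
			PUT('"');
		PUT('%');
		for (const char *q = start + 1; q < flags_end; q++)
			if (!(to_unsigned && (*q == '#' || *q == '0')))
				PUT(*q);
		for (const char *q = flags_end; q < p; q++)
			PUT(*q);
		PUT(to_unsigned ? 'u' : *p);
		p++;
		if (quoted)
			PUT('"');
	}
	out[n] = '\0';
#undef PUT
	return 0;
}
```

Two limitations remain in this sketch: a format that writes a hex
prefix literally (such as "0x%x") would still print 0x followed by a
decimal number, and arguments printed through "%s" are not
JSON-escaped.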

5) Future plan
First, the current modification to strace is simple for most
syscalls, but it is still quite ugly for some extremely complex
output functions such as sys_futex() and sys_clone(). I want to try
other approaches to make the code cleaner after these changes.
Second, the format needs more testing with different situations and
arguments. I think I need to write a test for each syscall right
after I change it, because the programs in the test directory are not
enough to ensure that the output is valid.
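
For such tests, each output line must be checked as one complete JSON
object. One way to do this without external dependencies is a small
recursive-descent recognizer; the C sketch below is an assumption
about how a test helper could look (all names are hypothetical, and a
real test could use an existing JSON library instead). It rejects,
for example, a trailing comma after an object or a number written as
0755.

```c
#include <assert.h>
#include <string.h>

/*
 * Minimal JSON recognizer for one output line (hypothetical helpers, a
 * sketch only): it accepts or rejects a line, builds no tree and does
 * not limit nesting depth.
 */
static const char *parse_value(const char *s);

static const char *skip_ws(const char *s)
{
	return s + strspn(s, " \t\r\n");
}

static const char *parse_digits(const char *s)
{
	size_t n = strspn(s, "0123456789");

	return n ? s + n : NULL;
}

static const char *parse_number(const char *s)
{
	if (*s == '-')
		s++;
	/* A leading 0 must stand alone, so "0755" fails in the caller. */
	s = (*s == '0') ? s + 1 : parse_digits(s);
	if (s && *s == '.')
		s = parse_digits(s + 1);
	if (s && (*s == 'e' || *s == 'E'))
		s = parse_digits(s + 1 + (s[1] == '+' || s[1] == '-'));
	return s;
}

static const char *parse_string(const char *s)
{
	if (*s != '"')
		return NULL;
	for (s++; *s != '"'; s++) {
		if ((unsigned char) *s < 0x20)  /* control char or end of line */
			return NULL;
		if (*s == '\\') {
			s++;
			if (*s == 'u' && strspn(s + 1, "0123456789abcdefABCDEF") >= 4)
				s += 4;
			else if (*s == '\0' || !strchr("\"\\/bfnrt", *s))
				return NULL;
		}
	}
	return s + 1;
}

/* Members of an array ('[' ... ']') or of an object ('{' ... '}'). */
static const char *parse_container(const char *s, char close)
{
	s = skip_ws(s + 1);
	if (*s == close)
		return s + 1;
	for (;;) {
		if (close == '}') {  /* object member: "key": value */
			s = parse_string(s);
			if (!s || *(s = skip_ws(s)) != ':')
				return NULL;
			s++;
		}
		s = parse_value(s);
		if (!s)
			return NULL;
		s = skip_ws(s);
		if (*s == close)
			return s + 1;
		if (*s != ',')
			return NULL;
		/* A trailing comma fails on the next round: no member follows. */
		s = skip_ws(s + 1);
	}
}

static const char *parse_value(const char *s)
{
	s = skip_ws(s);
	switch (*s) {
	case '{': return parse_container(s, '}');
	case '[': return parse_container(s, ']');
	case '"': return parse_string(s);
	case 't': return strncmp(s, "true", 4) == 0 ? s + 4 : NULL;
	case 'f': return strncmp(s, "false", 5) == 0 ? s + 5 : NULL;
	case 'n': return strncmp(s, "null", 4) == 0 ? s + 4 : NULL;
	default: return parse_number(s);
	}
}

/* Returns 1 if LINE holds exactly one JSON object plus whitespace. */
static int is_valid_json_line(const char *line)
{
	const char *s = skip_ws(line);

	s = (*s == '{') ? parse_container(s, '}') : NULL;
	return s != NULL && *skip_ws(s) == '\0';
}
```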

Dmitry, could you please give me some suggestions on my next steps?
Thank you!

[0] https://gist.github.com/zym0017d
[1] https://gist.github.com/zym0017d/9ba84382f0d1596d1fab
[2] https://gist.github.com/zym0017d/55ab97db366de5cd709f


More information about the Strace-devel mailing list