[GSOC] Proposal Final Draft: Structured Output Filter

Grace O'Hair-Sherman gohairsh at ucsc.edu
Fri Mar 25 04:29:02 UTC 2016


Grace O’Hair-Sherman
gohairsh at ucsc.edu
432 Flood Avenue
San Francisco, CA, 94112
USA
1-415-425-3451


Emergency contact:
Amy O’Hair
1-415-334-5154


GSOC Proposal: Filter which will parse the plain text output of strace
and convert it to structured formats such as JSON


Synopsis:

As it is, the output of strace is not easily machine-readable. I
propose to solve this problem by providing a filter that parses strace
output and converts it to a structured format. The parser will be
written in Python and will be able to emit either JavaScript Object
Notation (JSON) or MessagePack (http://msgpack.org/). Here is an
example of how part of the strace output for a hello-world program
might be rendered as JSON (supposing the parser were named
strace_to_structured):

Partial output:
% strace -T ./hello
execve("./hello", ["./hello"...], [/* 33 vars */]) = 0 <0.000071>
brk(0)                  = 0x24e3000 <0.000006>


Newline-delimited JSON Stream:

{"syscall": "execve", "args": ["./hello", "[\"./hello\"...]", "[/* 33 vars */]"], "ret_val": 0, "ret_val_hex": "0", "kernel_time": "0.000071"}

{"syscall": "brk", "args": ["0"], "ret_val": 38678528, "ret_val_hex": "0x24e3000", "kernel_time": "0.000006"}

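To make the conversion above concrete, here is a minimal Python sketch
of how one such line might be parsed. It is only an illustration under
stated assumptions: the names LINE_RE, split_args and parse_line are
hypothetical, ret_val is emitted as a number, ret_val_hex is left out
for brevity, and lines the pattern does not cover (for example failed
calls such as "= -1 ENOENT ...") simply return None.

```python
import json
import re

# Hypothetical pattern: syscall name, raw argument text, a plain decimal
# or hex return value, and the optional <seconds> suffix added by -T.
LINE_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>0x[0-9a-fA-F]+|-?\d+)(?:\s+<(?P<time>\d+\.\d+)>)?$"
)


def split_args(text):
    """Split on top-level commas, ignoring commas in quotes or brackets."""
    args, current, depth, in_string, escaped = [], [], 0, False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
            continue  # the separating comma is not part of any argument
        current.append(ch)
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args


def strip_quotes(arg):
    """Drop the outer quotes of a plain string argument."""
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        return arg[1:-1]
    return arg


def parse_line(line):
    """Return a dict for one strace line, or None if it is not covered."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ret = match.group("ret")
    record = {
        "syscall": match.group("name"),
        "args": [strip_quotes(a) for a in split_args(match.group("args"))],
        "ret_val": int(ret, 16) if ret.startswith("0x") else int(ret),
    }
    if match.group("time") is not None:
        record["kernel_time"] = match.group("time")
    return record
```

Passing each result through json.dumps and writing one object per line
would give a newline-delimited stream like the one shown above.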


Benefits to Community

Anyone who wants to programmatically consume strace output must
currently write their own parser before they can use it. This filter
will save them that time and effort, since they can start from a
format that is already easy to parse.



Deliverables

Preparations completed: I have built strace and reviewed the previous
JSON work done in the project.


Roadmap:

23 May - 29 May     --     Investigate what useful JSON output would
look like, where the Python program should live in the SourceForge
repository, and how to package and distribute it (with help from the
community mailing list)

(Spring quarter classes at university)


30 May - 5 June     --     Set up the git repository, get dummy I/O
working, propose a JSON format, and get it reviewed on the community
mailing list

(Spring quarter classes)


6 June - 12 June     --     Begin creating a prototype that can produce
JSON output for one test from strace-code/test. I would approach this
by writing a parser class containing the parsing logic (possibly
regular expressions, as I have experience with them), instantiated
once per run; a syscall class, instantiated once for each system call
in the strace output; and one formatter class per output format. Each
formatter class would contain functions for outputting the fields of
the syscall objects in its format. I would begin with a single
formatter class for JSON and add classes for other formats as they
become feasible.

(Spring quarter final examinations at university)
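The three-class design described for this week could be sketched
roughly as follows. This is only an outline under assumptions: the
class and method names are hypothetical, the regular expression covers
only simple "name(args) = value" lines, and the argument text is kept
as one unsplit string (raw_args) to keep the example short.

```python
import json
import re


class Syscall:
    """One parsed system call; instantiated once per strace line."""

    def __init__(self, name, raw_args, ret_val):
        self.name = name
        self.raw_args = raw_args  # argument text, left unsplit here
        self.ret_val = ret_val    # kept as the string strace printed


class Parser:
    """Holds the parsing logic; instantiated once per run."""

    # Hypothetical pattern covering only "name(args) = value" lines.
    _LINE = re.compile(r"^(\w+)\((.*)\)\s+=\s+(\S+)")

    def parse(self, lines):
        for line in lines:
            match = self._LINE.match(line)
            if match:  # lines this sketch does not understand are skipped
                yield Syscall(*match.groups())


class JSONFormatter:
    """One formatter class per output format; this one emits JSON."""

    def format(self, call):
        return json.dumps(
            {
                "syscall": call.name,
                "raw_args": call.raw_args,
                "ret_val": call.ret_val,
            }
        )


def convert(lines, formatter):
    """Parse the lines and render each call with the chosen formatter."""
    return [formatter.format(call) for call in Parser().parse(lines)]
```

Keeping the formatter behind a single format() method is what would
let another output format be added later without touching the parser.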


13 June - 19 June     --     Continue work on the prototype (see
description above). Find or write a newline-delimited JSON validator
(like JSLint, but accepting newline-delimited rather than
comma-delimited JSON) to check the syntax of the JSON output.
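If no suitable validator turns up, a very small one could be written
with Python's standard json module. The sketch below is one possible
shape, not a finished tool: the function name validate_ndjson is
hypothetical, and it assumes that blank lines between records are
allowed and that every record must be a JSON object.

```python
import json


def validate_ndjson(lines):
    """Return (line_number, message) pairs for each invalid record."""
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank separator lines are tolerated
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            errors.append((number, "record is not a JSON object"))
    return errors
```

An empty result means every non-blank line parsed as a JSON object on
its own, which is the property newline-delimited JSON requires.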


20 June - 26 June     --     Finish the prototype (see above). Add to
the prototype a second formatter class that prints the syscall
objects' fields in strace's own format, as a correctness check: the
filter's output should closely match the original strace input.
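The round-trip check described here might look something like the
following sketch. The names are hypothetical, the pattern keeps
everything after "=" verbatim, and "similar" is assumed to mean "equal
once runs of whitespace are collapsed", since strace pads the "=" sign
for alignment.

```python
import re

# Hypothetical pattern: everything after "=" is kept verbatim, so a
# trailing <time> from strace -T survives the round trip.
LINE_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<result>.+)$"
)


def parse(line):
    """Return the captured fields as a dict, or None for other lines."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def to_strace(record):
    """Formatter that prints a record back in strace's own layout."""
    return "{name}({args}) = {result}".format(**record)


def squeeze(text):
    """Collapse whitespace runs so alignment padding is ignored."""
    return " ".join(text.split())


def round_trips(line):
    """True when re-emitting the parsed line reproduces the input."""
    record = parse(line)
    if record is None:
        return False
    return squeeze(to_strace(record)) == squeeze(line)
```

Running round_trips over every line of a test's strace output would
then flag any line where parsing lost or altered information.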

(GSOC Midterm evaluation submission period)


27 June - 3 July     --     Create an automated test using the initial
test program. Run the filter against more of the existing strace test
programs, fixing problems as they appear. Make the filter accept
command-line flags that control which formatter to use (JSON or
otherwise).
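One way the formatter selection could be exposed is with Python's
standard argparse module, as in the sketch below. The option names
(-f/--format), the format choices and the positional input argument
are all assumptions made for illustration; the real interface would be
settled on the mailing list.

```python
import argparse

# Hypothetical set of supported output formats.
FORMATS = ("json", "msgpack")


def build_arg_parser():
    """Build the command-line interface for the filter."""
    parser = argparse.ArgumentParser(
        prog="strace_to_structured",
        description="Convert plain-text strace output to a structured format.",
    )
    parser.add_argument(
        "-f",
        "--format",
        choices=FORMATS,
        default="json",
        help="output format (default: json)",
    )
    parser.add_argument(
        "input",
        nargs="?",
        default="-",
        help="file containing strace output, or '-' for standard input",
    )
    return parser
```

A side benefit of argparse is that it prints a usage message and exits
with status 2 when it meets an unknown flag, so that behaviour would
not need to be written by hand.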


4 July - 10 July     --     Ensure the filter correctly reads strace
output produced with flags (e.g. -T, -v) and emits the corresponding
JSON. Write usage text that the filter prints when presented with
unknown flags. Since many strace flags add fields to the default
output (for example, -T adds the time a system call spent in the
kernel), I plan to support such output by adding more arguments to the
syscall object’s initializer, with default values of null that are
overridden only when the corresponding flag is enabled. Additionally,
whatever function prints the objects as JSON would skip null values,
so that the filter’s output for default (no-flag) strace runs stays
uncluttered (making the common case simple).
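The null-default idea can be illustrated with a short sketch. The
class name SyscallRecord and the field names are hypothetical;
kernel_time stands for the value added by -T and timestamp for the
wall-clock prefix added by -t, and in Python "null" becomes None.

```python
import json


class SyscallRecord:
    """A syscall whose flag-dependent fields default to None."""

    def __init__(self, name, args, ret_val, kernel_time=None, timestamp=None):
        self.name = name
        self.args = args
        self.ret_val = ret_val
        self.kernel_time = kernel_time  # only present with strace -T
        self.timestamp = timestamp      # only present with strace -t

    def to_dict(self):
        """Return the fields, leaving out any that are still None."""
        fields = {
            "syscall": self.name,
            "args": self.args,
            "ret_val": self.ret_val,
            "kernel_time": self.kernel_time,
            "timestamp": self.timestamp,
        }
        return {key: value for key, value in fields.items() if value is not None}


def to_json(record):
    """Serialize one record as a single JSON line."""
    return json.dumps(record.to_dict())
```

With this shape, output from a plain strace run contains only the
three core fields, and each extra flag adds exactly one key.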


11 July - 17 July     --     Document the project so far and continue
the flag support work from the previous week.


18 July - 24 July     --       Write a demo program that consumes the
filter output and prints a summary of average time taken by different
system calls.
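A demo consumer of this kind could be quite small. The sketch below
assumes the filter emits one JSON object per line with a "syscall" key
and, when strace was run with -T, a "kernel_time" key holding seconds
as a string; the function name average_times is hypothetical.

```python
import json
from collections import defaultdict


def average_times(ndjson_lines):
    """Map each syscall name to its mean kernel time in seconds."""
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum, count]
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip blank separator lines
        record = json.loads(line)
        if "kernel_time" not in record:
            continue  # record came from a run without -T
        entry = totals[record["syscall"]]
        entry[0] += float(record["kernel_time"])
        entry[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}
```

Because the input is already structured, the demo needs no knowledge
of strace's text format at all, which is the point of the filter.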


25 July - 31 July     --     (Stretch goal) Enhance the filter to
output MessagePack (and ensure it works with one test from
strace-code/test)


1 August - 7 August     --     (Stretch goal) Run the filter with
MessagePack output against more of the existing strace test programs,
fixing problems as they appear.



8 August - 14 August     --     (Stretch goal) Ensure the filter
correctly reads strace output produced with flags (e.g. -T, -v) and
emits the corresponding MessagePack.

15 August - 23 August 19:00 UTC     --     Final week: tidy code,
improve documentation, and submit code sample.



Related Work:

A similar project was proposed and implemented during the 2014 Google
Summer of Code, the main difference being that it was meant to be
built directly into strace. It seems that project’s scope may have
been too big, and it was never integrated with strace. This proposal
has a smaller scope in that it will be a separate script that
post-processes strace output. Another difference is that this project
will result in a program with a choice of output formats, i.e. JSON or
MessagePack. (Inspired by this post: goo.gl/2yvCTG)



Biographical Information:

I am a second-year computer science major at the University of
California, Santa Cruz. I have taken Computer Architecture, Algorithms
and Abstract Data Types, Computer Systems and Assembly Language,
Introduction to Data Structures, and Accelerated Introduction to
Programming. By summer I will have taken Analysis of Algorithms as
well. Almost all of these classes have involved UNIX or Linux, Bash,
and Makefiles. I started developing on Ubuntu two years ago when I
interned at Gametime United. I have also used Git and written JSON,
both by hand and by generating it with a Python script.


I have experience meeting project deadlines; last summer I designed,
coded, and shipped an iOS application from start to finish in less
than eight weeks. (It is called Amino Ally: goo.gl/WTGgUz ) I haven’t
contributed to any open source projects yet, although I am a member of
my school’s Linux Users’ Group, so I’m really excited for this
opportunity to get more involved.


The relevant skills that will help me achieve this project’s goal
include Bash, Makefiles, Git, JSON, regular expressions, and Python
(see the GitHub repository github.com/gracefulPotato/gsoc-python for
some scripts I have written, including struct_ex.py, a very rough
draft of a script that outputs JSON).


During the last 10 weeks of Google Summer of Code I will be available
full time to work on my project. I have university classes during the
first two weeks and final examinations during part of the third week,
but I will nonetheless make sure to work at least 20 hours in each of
those three weeks. I consider this a serious full-time commitment and
I will make up the 60 hours missed during the first three weeks by
working 46 hours a week for the remaining 10 weeks.



