[GSOC] alternative structured output proposal rough draft

Grace O'Hair-Sherman gohairsh at ucsc.edu
Thu Mar 24 05:16:47 UTC 2016


Suggestions for improvement very welcome! Thank you.

Grace O’Hair-Sherman
gohairsh at ucsc.edu
[Extended contact information]


GSOC Proposal: Filter to Parse Plain Text strace output to Structured
Formats Like JSON



Synopsis:

As it is, the output of strace is not easily machine-readable. I
propose to solve this problem by providing a filter that parses strace
output and converts it to a structured format. The parser will be
written in Python and will offer a choice of output format: JavaScript
Object Notation (JSON) or MessagePack (http://msgpack.org/). Here is an
example of how partial strace output for a hello-world program
might be rendered as JSON (supposing the parser were named
strace_to_structured):


Partial output:

% strace -T ./hello
execve("./hello", ["./hello"...], [/* 33 vars */]) = 0 <0.000071>
brk(0)                  = 0x24e3000 <0.000006>


JSON:

{
    "strace -T ./hello | strace_to_structured": [{
        "syscall": "execve",
        "arguments": ["./hello", "[\"./hello\"...]", "[/* 33 vars */]"],
        "return_val": 0,
        "return_val_hex": "0",
        "time_in_kernel": 0.000071
    }, {
        "syscall": "brk",
        "arguments": [0],
        "return_val": 38678528,
        "return_val_hex": "0x24e3000",
        "time_in_kernel": 0.000006
    }]
}
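
To make the idea concrete, here is a minimal sketch of how such a
filter might turn the two lines above into that structure. It is only
an illustration under stated assumptions, not the proposed
implementation: the function names (split_args, convert_arg,
parse_line, convert) are hypothetical, the regular expression only
covers simple "name(args) = value <time>" lines, and the rule that
return_val_hex is "0" for a zero return is my guess from the example.

```python
import json
import re

# Assumed line shape: name(args) = return [extra text] [<seconds>]
LINE_RE = re.compile(
    r"^(?P<syscall>\w+)\((?P<args>.*)\)\s*=\s*"
    r"(?P<ret>-?(?:0x[0-9a-fA-F]+|\d+))"
    r"(?P<extra>.*?)(?:\s*<(?P<time>\d+\.\d+)>)?\s*$"
)


def split_args(text):
    """Split an argument string on top-level commas only."""
    args, current = [], []
    depth, in_string, escaped = 0, False, False
    for ch in text:
        if in_string:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch in "[{(":
            depth += 1
            current.append(ch)
        elif ch in "]})":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        args.append(tail)
    return args


def convert_arg(raw):
    """Integers become ints, quoted strings lose their quotes
    (escape sequences are not decoded), anything else stays raw."""
    if re.fullmatch(r"-?\d+", raw):
        return int(raw)
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    return raw


def parse_line(line):
    """Return a dict for one strace line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ret = match.group("ret")
    value = int(ret, 16 if "0x" in ret.lower() else 10)
    record = {
        "syscall": match.group("syscall"),
        "arguments": [convert_arg(a) for a in split_args(match.group("args"))],
        "return_val": value,
        # Assumption: zero is written as "0", as in the example above.
        "return_val_hex": format(value, "#x") if value else "0",
    }
    if match.group("time") is not None:
        record["time_in_kernel"] = float(match.group("time"))
    return record


def convert(lines, command):
    """Wrap all parsed lines under a key naming the command that was run."""
    calls = [rec for rec in map(parse_line, lines) if rec is not None]
    return {command: calls}


sample = [
    'execve("./hello", ["./hello"...], [/* 33 vars */]) = 0 <0.000071>',
    "brk(0)                  = 0x24e3000 <0.000006>",
]
print(json.dumps(convert(sample, "strace -T ./hello | strace_to_structured"),
                 indent=4))
```

One practical note: strace writes its trace to standard error by
default, so a real pipeline would need either the -o option to write
the trace to a file or a 2>&1 redirection before the pipe.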



Benefits to Community:

Anyone who wants to programmatically consume strace output must
currently write their own parser before they can use the output. This
parser will save these people time and effort as they can start with a
format that is easily parseable.



Deliverables:

Preparations completed: I have built strace and reviewed the previous
JSON work done in the project.


Deadlines:

23 May - 29 May     --     Investigation & research into what useful
JSON and MessagePack output would look like

Investigate where to put Python program in SourceForge and how to
package and distribute the program (with help from community mailing
list)

(Spring quarter classes at university)


30 May - 5 June     --     Set up repository; get dummy I/O working

Propose JSON and MessagePack formats and get review from community mailing list

(Spring quarter classes)


6 June - 12 June     --     Create prototype that can create JSON
output for one test from strace-code/test

(Spring quarter final examinations at university)


13 June - 19 June     --     Decide on how to validate JSON output.
Perhaps use a Python program that can consume and validate JSON.


20 June - 26 June     --     Create automated test using initial test program

Run filter with more existing strace programs, fixing problems as they appear.

(GSOC Midterm evaluation submission period)


27 June - 3 July     --     Write usage text that is emitted by filter
when presented with unknown flags

Ensure filter exits cleanly when interrupted


11 July - 17 July     --     Document project so far (Should this go
on the project wiki?)


18 July - 24 July     --     Enhance filter to output MessagePack (and
ensure it works with one test from strace-code/test)


25 July - 31 July     --     Run filter with MessagePack output and
with more existing strace programs, fixing problems as they appear.


1 August - 7 August     --     Ensure filter correctly reads strace
output when it is run with flags (e.g. -T, -v ) and correctly outputs
corresponding MessagePack


8 August - 14 August     --     Stretch goal: write a demo program
that consumes the filter output and prints a summary of average time
taken by different system calls.


15 August - 23 August 19:00 UTC     --     Final week: tidy code,
write tests, improve documentation and submit code sample.
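
As a rough idea of what the stretch-goal demo (8 August - 14 August)
could look like, here is a small sketch that consumes JSON in the
shape shown in the Synopsis and prints the average time per system
call. The function names (average_times, format_summary) and the
sample data are made up for illustration; the only thing taken from
the proposal is the JSON layout with its "syscall" and
"time_in_kernel" keys.

```python
import json
from collections import defaultdict


def average_times(document):
    """Map each syscall name to its mean time_in_kernel.

    `document` is the decoded JSON object: one key per traced command,
    each holding a list of call records. Records without a time
    (for example when strace was run without -T) are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for calls in document.values():
        for call in calls:
            elapsed = call.get("time_in_kernel")
            if elapsed is None:
                continue
            totals[call["syscall"]] += elapsed
            counts[call["syscall"]] += 1
    return {name: totals[name] / counts[name] for name in totals}


def format_summary(averages):
    """One line per syscall, slowest average first."""
    ordered = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return "\n".join("{:<12} {:.6f}".format(name, avg) for name, avg in ordered)


# Hypothetical filter output, in the format from the Synopsis.
sample_json = """
{
    "strace -T ./hello | strace_to_structured": [
        {"syscall": "execve", "arguments": ["./hello"], "return_val": 0,
         "return_val_hex": "0", "time_in_kernel": 0.000071},
        {"syscall": "brk", "arguments": [0], "return_val": 38678528,
         "return_val_hex": "0x24e3000", "time_in_kernel": 0.000006}
    ]
}
"""
print(format_summary(average_times(json.loads(sample_json))))
```

A MessagePack version of the same demo would only need a different
decoding step in place of json.loads; the summary logic would stay the
same.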



Related Work:

A similar project was proposed and implemented during the 2014 Google
Summer of Code; the main difference is that it was meant to be built
directly into strace. It seems that project’s scope may have been too
big, and it was never integrated with strace. This proposal has a
smaller scope: it will be a separate script that post-processes strace
output. Another difference is that this project will result in a
program with a choice of output formats, i.e. JSON or MessagePack.
(Inspired by this post: goo.gl/2yvCTG)



Biographical Information:

I am a second-year computer science major at University of California,
Santa Cruz. I have taken Computer Architecture, Algorithms and
Abstract Data Types, Computer Systems and Assembly Language,
Introduction to Data Structures, and Accelerated Introduction to
Programming. By summer I will have taken Analysis of Algorithms as
well. Almost all of these classes have involved UNIX or Linux, Bash,
and Makefiles. I started developing on Ubuntu two years ago when I
interned at Gametime United. I also used Git and wrote JSON, both
by hand and programmatically with a Python script.


I have experience meeting project deadlines; last summer I designed,
coded, and shipped an iOS application from start to finish in less
than eight weeks. (It is called Amino Ally: goo.gl/WTGgUz ) I haven’t
done any open source projects yet, although I’m a member of my
school’s Linux Users’ Group, so I’m really excited for this
opportunity to get more involved.


The relevant skills that will help me achieve this project’s goal
include Bash, Makefiles, Git, Python, and JSON.


During the last 10 weeks of Google Summer of Code I will be available
full time to work on my project. I have university classes during the
first two weeks and final examinations during part of the third week,
but I will nonetheless make sure to work at least 20 hours in each of
those three weeks. I consider this a serious full-time commitment and
I will make up the 60 hours missed during the first three weeks by
working 46 hours a week for the remaining 10 weeks.



