[GSOC] alternative structured output proposal rough draft

Philippe Ombredanne pombredanne at nexb.com
Thu Mar 24 16:09:06 UTC 2016


On Thu, Mar 24, 2016 at 6:16 AM, Grace O'Hair-Sherman <gohairsh at ucsc.edu> wrote:
> Suggestions for improvement very welcome! Thank you.
>
> Grace O’Hair-Sherman
> gohairsh at ucsc.edu
> [Extended contact information]
>
> GSOC Proposal: Filter to Parse Plain Text strace output to Structured
> Formats Like JSON
>
> Synopsis:
>
> As it is, the output of strace is not easily machine-readable. I
> propose to solve this problem by providing a filter to parse strace
> output and convert to a structured format. This parser will be written
> in Python and the output will have the option of being in JavaScript
> Object Notation or MessagePack (http://msgpack.org/). Here is an
> example of how partial output of strace run on a hello-world program
> might be output as JSON (supposing the parser were named
> strace_to_structured):
>

Thank you for submitting a draft. This is a good idea.

> Partial output:
>
> % strace -T ./hello
> execve("./hello", ["./hello"...], [/* 33 vars */]) = 0 <0.000071>
> brk(0)                  = 0x24e3000 <0.000006>
>
>
> JSON:
>
> {
>     "strace -T ./hello | strace_to_structured": [{
>             "syscall": "execve",
>             "arguments": ["./hello", "[\"./hello\"...]", "[/* 33 vars */]"],
>             "return_val": 0,
>             "return_val_hex": "0",
>             "time_in_kernel": 0.000071
>         }, {
>             "syscall": "brk",
>             "arguments": [0],
>             "return_val": 38678528,
>             "return_val_hex": "0x24e3000",
>             "time_in_kernel": 0.000006
>     }]
> }
>

A streamable line-by-line JSON format would likely be best.
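
To illustrate what I mean by streamable, here is a minimal sketch that
emits one JSON object per input line. The regular expression and the
function names are my own assumptions for illustration, and the pattern
only covers simple "name(args) = ret <time>" lines; the arguments are
kept as a raw string rather than parsed.

```python
import json
import re

# Hypothetical pattern: handles only simple "name(args) = ret <time>" lines.
LINE_RE = re.compile(r'^(\w+)\((.*)\)\s+=\s+(-?\w+)(?:\s+<([\d.]+)>)?\s*$')


def parse_line(line):
    """Return a dict for one strace line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    name, args, ret, elapsed = m.groups()
    record = {"syscall": name, "arguments": args, "return_val": ret}
    if elapsed is not None:
        record["time_in_kernel"] = float(elapsed)
    return record


def to_json_lines(lines):
    """Yield one JSON document per recognized line (unmatched lines are skipped)."""
    for line in lines:
        record = parse_line(line)
        if record is not None:
            yield json.dumps(record, sort_keys=True)
```

A consumer can then process each line as soon as it arrives, without
waiting for the closing bracket of one large array.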

> Benefits to Community
>
> Anyone who wants to programmatically consume strace output must
> currently write their own parser before they can use the output. This
> parser will save these people time and effort as they can start with a
> format that is easily parseable.
>
>
>
> Deliverables
>
> Preparations completed: I have built strace and reviewed the previous
> JSON work done in the project.
>
>
> Deadlines:
>
> 23 May - 29 May     --     Investigation & research into what useful
> JSON and MessagePack output would look like

I would leave aside MessagePack at first. If you structure your code
correctly, alternate serializations should not be too complex to add
afterwards.
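
As a rough sketch of what "structured correctly" could look like, keep
the parsed records as plain dicts and look up the serializer by name.
The names SERIALIZERS and emit are hypothetical, not an existing API.

```python
import json


def serialize_json(record):
    """Serialize one parsed record as a single-line JSON document."""
    return json.dumps(record, sort_keys=True)


# Hypothetical registry: another format would be one more entry here,
# and the parsing code would not need to change.
SERIALIZERS = {"json": serialize_json}


def emit(records, fmt="json"):
    """Serialize each record with the serializer registered under fmt."""
    serialize = SERIALIZERS[fmt]
    return [serialize(record) for record in records]
```

With this split, the parser never needs to know which output format
was requested.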

>
> Investigate where to put Python program in SourceForge and how to
> package and distribute the program (with help from community mailing
> list)

IMHO the best and simplest option is to have your own repo.
Depending on how this goes and what is involved (e.g. one or more scripts),
we can integrate it in the main codebase or keep it as a separate tool.

> (Spring quarter classes at university)
>
>
> 30 May - 5 June     --     Set up repository; get dummy I/O working
>
> Propose JSON and MessagePack formats and get review from community mailing list
>
> (Spring quarter classes)
>
>
> 6 June - 12 June     --     Create prototype that can create JSON
> output for one test from strace-code/test

What would be your approach? What data structures do you have in mind?

>
> (Spring quarter final examinations at university)
>
>
> 13 June - 19 June     --     Decide on how to validate JSON output.
> Perhaps use a python program that can consume and validate JSON.

What do you mean by validate here?
If this is to verify that the parsing was correct, you could consider
a two-way parse/unparse to verify against the standard strace output?
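
A minimal sketch of such a round-trip check, under my own assumptions:
the pattern is hypothetical and only handles simple lines, and the
comparison collapses runs of whitespace, which would also hide
whitespace differences inside quoted string arguments.

```python
import re

# Hypothetical pattern: handles only simple "name(args) = ret <time>" lines.
LINE_RE = re.compile(r'^(\w+)\((.*)\)\s+=\s+(-?\w+)(?:\s+<([\d.]+)>)?\s*$')


def parse(line):
    """Split one strace line into its parts, keeping every field as text."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    name, args, ret, elapsed = m.groups()
    return {"syscall": name, "arguments": args,
            "return_val": ret, "elapsed": elapsed}


def unparse(record):
    """Rebuild a strace-like line from a parsed record."""
    text = "%s(%s) = %s" % (record["syscall"], record["arguments"],
                            record["return_val"])
    if record["elapsed"] is not None:
        text += " <%s>" % record["elapsed"]
    return text


def normalize(line):
    """Collapse runs of whitespace so column alignment does not matter."""
    return " ".join(line.split())


def round_trips(line):
    """True if parsing then unparsing reproduces the normalized input."""
    record = parse(line)
    return record is not None and unparse(record) == normalize(line)
```

Running round_trips over every line of the existing test outputs would
flag any line the parser drops or alters.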

> 20 June - 26 June     --     Create automated test using initial test program
>
> Run filter with more existing strace programs, fixing problems as they appear.
>
> (GSOC Midterm evaluation submission period)
>
>
> 27 June - 3 July     --     Write usage text that is emitted by filter
> when presented with unknown flags
>
> Ensure filter exits cleanly when interrupted

Can you elaborate on what you mean here?

> 11 July - 17 July     --     Document project so far (Should this go
> on the project wiki?)
>
>
> 18 July - 24 July     --     enhance filter to output MessagePack (and
> ensure works with one test from strace-code/test)

IMHO drop MessagePack entirely for now. It would be a nice-to-have stretch goal.
Focus more on your approach and on how you would ensure comprehensive
coverage of whatever strace output is thrown at you.

> 25 July - 31 July     --     Run filter with MessagePack output and
> with more existing strace programs, fixing problems as they appear.
>
>
> 1 August - 7 August     --     Ensure filter correctly reads strace
> output when it is run with flags (e.g. -T, -v ) and correctly outputs
> corresponding MessagePack
>
>
> 8 August - 14 August     --     Stretch goal: write a demo program
> that consumes the filter output and prints a summary of average time
> taken by different system calls.

That is a nice-to-have and should be fairly easy once we have the structure.
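
For instance, assuming one JSON object per line with the "syscall" and
"time_in_kernel" fields from your example, a summary could be sketched
like this (average_times is a hypothetical name):

```python
import json
from collections import defaultdict


def average_times(json_lines):
    """Return {syscall name: mean time_in_kernel} for the given JSON lines."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in json_lines:
        record = json.loads(line)
        # Records without timing (strace run without -T) are skipped.
        if "time_in_kernel" not in record:
            continue
        totals[record["syscall"]] += record["time_in_kernel"]
        counts[record["syscall"]] += 1
    return {name: totals[name] / counts[name] for name in totals}
```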


> 15 August - 23 August 19:00 UTC     --     Final week: tidy code,
> write tests, improve documentation and submit code sample.

Tests should come first ;)

>
> Related Work:
>
> A similar project was proposed and implemented during the 2014 Google
> Summer of Code, the main difference being that it was supposed to be
> directly a part of strace. It seems that this project’s scope may have
> been too big and it was never integrated with strace. This proposal
> has a smaller scope in that it will be a separate script that does
> post-processing on strace output. Another difference is that this
> project will result in a program with options for different output
> formats, i.e. JSON or MessagePack. (Inspired by this post:
> goo.gl/2yvCTG)
>
>
>
> Biographical Information:
>
> I am a second-year computer science major at University of California,
> Santa Cruz. I have taken Computer Architecture, Algorithms and
> Abstract Data Types, Computer Systems and Assembly Language,
> Introduction to Data Structures, and Accelerated Introduction to
> Programming. By summer I will have taken Analysis of Algorithms as
> well. Almost all these classes have involved UNIX or Linux Bash and
> Makefiles. I started developing using Ubuntu two years ago when I
> interned at Gametime United. I also used Git and wrote JSON, both
> manually and automatically by writing a Python script.
>
>
> I have experience meeting project deadlines; last summer I designed,
> coded, and shipped an iOS application from start to finish in less
> than eight weeks. (It is called Amino Ally: goo.gl/WTGgUz ) I haven’t
> done any open source projects yet, although I’m a member of my
> school’s Linux Users’ Group, so I’m really excited for this
> opportunity to get more involved.
>
>
> The relevant skills that will help me achieve this project’s goal
> include Bash, Makefiles, Git, Python, and JSON.
>

Is there any public Python code of yours that can be seen somewhere?

> During the last 10 weeks of Google Summer of Code I will be available
> full time to work on my project. I have university classes during the
> first two weeks and final examinations during part of the third week,
> but I will nonetheless make sure to work at least 20 hours in each of
> those three weeks. I consider this a serious full-time commitment and
> I will make up the 60 hours missed during the first three weeks by
> working 46 hours a week for the remaining 10 weeks.

That's OK, the GSOC is not supposed to be a death march!
We like full time involvement so this is appreciated.

-- 
Cordially
Philippe Ombredanne



