Experience from writing trace_inputs.py

Philippe Ombredanne pombredanne at nexb.com
Fri Mar 21 13:43:54 UTC 2014


On Fri, Mar 21, 2014 at 2:35 PM, Philippe Ombredanne
<pombredanne at nexb.com> wrote:
> On Thu, Mar 20, 2014 at 9:50 PM, Marc-Antoine Ruel <maruel at chromium.org> wrote:
[...]
>> - Encoding, I had to write a state machine to read the logs properly, see
>> the >110 lines of strace_process_quoted_arguments(). That's independent of
>> -x.
>
> Agreed this is an issue and that is one reason for getting a
> structured output. Let me make a separate post with a python snippet
> that may help you there.

Marc-Antoine:

In your case, the standard Python shlex module may be of some help.
It already has much of the logic needed to lex shell-like quoted arguments.
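
As a minimal sketch of what shlex handles on its own, the example below splits a strace-like run of double-quoted strings. The sample strings are made up for illustration and are not taken from a real log. One caveat worth noting: in POSIX mode shlex only unescapes \" and \\ inside double quotes, so C-style escapes such as \n printed by strace stay as literal backslash sequences and would still need a separate decoding pass.

```python
import shlex

# Hypothetical sample: three double-quoted arguments, the last one
# containing escaped inner quotes.
sample = r'"/bin/sh" "-c" "echo \"hi there\""'
tokens = shlex.split(sample)
print(tokens)

# C-style escapes are NOT decoded by shlex: the backslash and the 'n'
# remain two separate characters in the resulting token.
escaped = shlex.split(r'"line one\nline two"')
print(escaped)
```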

Here is a Python snippet that demonstrates this approach. I claim no
copyright on this code and place it in the public domain, in case you
ever fancy reusing it.

Cordially
-- 
Philippe Ombredanne


import re
import shlex

# catch things like , [/* 65 vars */]
VARS_COMMENT = re.compile(r', \[\/\* \d+ vars \*\/\]')

def decode_args(args):
    """
    Return a list of arguments from args string.

    Based on a classical strace output like::
        execve("/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2   -o strace bjm.o ."...], [/* 24 vars */]) = 0
    The expected args string is something like::
        "/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2   -o strace bjm.o ."...], [/* 24 vars */]
    And the returned list looks like::
        ['/bin/bash', '/bin/bash', '-c', 'gcc -Wall -Wwrite-strings -g -O2   -o strace bjm.o ....']
    'Inner' arguments could be further decoded using the same approach.
    """
    try:
        # First some cleanup on args
        # remove var comments
        cleaned = re.sub(VARS_COMMENT, '', args)
        # remove deleted info: this can happen in decoded file descriptors
        # read(0</tmp/sh-thd-1391680596 (deleted)>, "..."..., 61) = 61
        cleaned = cleaned.replace(' (deleted)>', '>')
        # Then lex
        lexed = shlex.shlex(cleaned, posix=True)
        lexed.commenters = ''
        # use comma and whitespace as args delimiters
        lexed.whitespace_split = True
        lexed.whitespace += ','
        decoded = list(lexed)
        # Then fix brackets: [ at beginning and ] at end of each arg
        # FIXME: should do it only on the first and last arg but not all args
        fixed = [arg.lstrip('[').rstrip(']') for arg in decoded]
    except ValueError as e:
        # shlex raises ValueError on malformed input such as an unclosed quote
        raise ValueError('Error while decoding args: %r (%s)' % (args, e))
    return fixed
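
A short usage sketch follows, assuming Python 3. It repeats a condensed copy of decode_args() (without the error wrapping) so that it runs on its own, and feeds it two argument strings modelled on the examples in the docstring and comments above. The shortened gcc command line is an illustrative abbreviation, not real log output.

```python
import re
import shlex

# Same pattern as above: matches things like ", [/* 65 vars */]"
VARS_COMMENT = re.compile(r', \[/\* \d+ vars \*/\]')

def decode_args(args):
    """Condensed copy of the function above, for demonstration only."""
    cleaned = VARS_COMMENT.sub('', args)
    cleaned = cleaned.replace(' (deleted)>', '>')
    lexed = shlex.shlex(cleaned, posix=True)
    lexed.commenters = ''
    # use comma and whitespace as args delimiters
    lexed.whitespace_split = True
    lexed.whitespace += ','
    return [arg.lstrip('[').rstrip(']') for arg in lexed]

# Argument string of an execve() call, with the env vars comment
execve_args = ('"/bin/bash", ["/bin/bash", "-c", '
               '"gcc -Wall -o strace bjm.o ."...], [/* 24 vars */]')
print(decode_args(execve_args))

# Argument string of a read() call on a decoded, deleted file descriptor
read_args = '0</tmp/sh-thd-1391680596 (deleted)>, "..."..., 61'
print(decode_args(read_args))
```

Note that the truncation marker "..." that strace appends after a closing quote ends up glued to the decoded string, so a caller that cares about truncation would have to detect it before lexing.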



