Experience from writing trace_inputs.py
Philippe Ombredanne
pombredanne at nexb.com
Fri Mar 21 13:43:54 UTC 2014
On Fri, Mar 21, 2014 at 2:35 PM, Philippe Ombredanne
<pombredanne at nexb.com> wrote:
> On Thu, Mar 20, 2014 at 9:50 PM, Marc-Antoine Ruel <maruel at chromium.org> wrote:
[...]
>> - Encoding, I had to write a state machine to read the logs properly, see
>> the >110 lines of strace_process_quoted_arguments(). That's independent of
>> -x.
>
> Agreed this is an issue and that is one reason for getting a
> structured output. Let me make a separate post with a python snippet
> that may help you there.
Marc-Antoine:
In your case, the standard Python shlex module may be of some help, as it
already has much of the logic needed to lex shell-like quoted arguments.
Here is a Python snippet that demonstrates this approach. This code is
not copyrighted and is placed in the public domain, should you ever
fancy reusing it.
Cordially
--
Philippe Ombredanne
import re
import shlex

# Matches strace's environment placeholder, e.g. ", [/* 65 vars */]"
VARS_COMMENT = re.compile(r', \[/\* \d+ vars \*/\]')


def decode_args(args):
    """
    Return a list of arguments from an args string.

    Based on a classic strace output line such as::

        execve("/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o ."...], [/* 24 vars */]) = 0

    The expected args string is something like::

        "/bin/bash", ["/bin/bash", "-c", "gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o ."...], [/* 24 vars */]

    And the returned list looks like::

        ['/bin/bash', '/bin/bash', '-c', 'gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o ....']

    'Inner' arguments could be further decoded using the same approach.
    """
    try:
        # First, some cleanup on args.
        # Remove the environment variables comment.
        cleaned = VARS_COMMENT.sub('', args)
        # Remove "deleted" markers, which can appear in decoded file
        # descriptors, e.g.:
        # read(0</tmp/sh-thd-1391680596 (deleted)>, "..."..., 61) = 61
        cleaned = cleaned.replace(' (deleted)>', '>')
        # Then lex in POSIX mode so that quotes are honored and stripped.
        lexed = shlex.shlex(cleaned, posix=True)
        # Do not treat '#' as the start of a comment.
        lexed.commenters = ''
        # Use commas and whitespace as argument delimiters.
        lexed.whitespace_split = True
        lexed.whitespace += ','
        decoded = list(lexed)
    except ValueError as e:
        # shlex raises ValueError on e.g. an unbalanced quote.
        raise ValueError('Error while decoding args: %r.' % (args,)) from e
    # Then fix brackets: strip a leading [ and a trailing ] from each arg.
    # FIXME: this should only be done on the first and last element of an
    # array, not on all args.
    fixed = [arg.lstrip('[').rstrip(']') for arg in decoded]
    return fixed
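decode_args() expects only the text between the outer parentheses of a syscall line and, as its docstring notes, an "inner" argument such as the command string passed to `bash -c` can be lexed again. A minimal sketch of both steps follows. `split_syscall_line` and `decode_inner` are hypothetical helpers, not part of strace or of the snippet above, and the regex assumes one complete line ending in `) = <result>`: `<unfinished ...>`/`resumed` lines and string arguments that themselves contain `) = ` are not handled.

```python
import re
import shlex

# Hypothetical helper pattern: "<name>(<args>) = <result>".
# The greedy (.*) lets the args themselves contain parentheses; the
# regex backtracks to the last ")" that is followed by " = ".
SYSCALL_LINE = re.compile(r'^(\w+)\((.*)\)\s+=\s+(.*)$')


def split_syscall_line(line):
    """Hypothetical: split a complete strace line into (name, args, result)."""
    match = SYSCALL_LINE.match(line.strip())
    if match is None:
        raise ValueError('Not a complete syscall line: %r' % (line,))
    name, args, result = match.groups()
    return name, args, result


def decode_inner(command):
    """Hypothetical: re-lex an inner argument with POSIX shell rules."""
    return shlex.split(command)


line = ('execve("/bin/bash", ["/bin/bash", "-c", "gcc -Wall -o strace bjm.o"], '
        '[/* 24 vars */]) = 0')
name, args, result = split_syscall_line(line)
print(name)    # execve
print(result)  # 0
print(decode_inner('gcc -Wall -Wwrite-strings -g -O2 -o strace bjm.o'))
# ['gcc', '-Wall', '-Wwrite-strings', '-g', '-O2', '-o', 'strace', 'bjm.o']
```

The `args` string returned by split_syscall_line is the kind of input decode_args() takes; error returns such as `= -1 ENOENT (No such file or directory)` land whole in `result`.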
More information about the Strace-devel mailing list