[GSOC 2014] Some thoughts about the code and the proposal

Philippe Ombredanne pombredanne at nexb.com
Tue Mar 18 13:09:30 UTC 2014


On Thu, Mar 13, 2014 at 4:55 PM, yangmin zhu <zym0017d at gmail.com> wrote:
> I have spent some days reading the strace code to grasp its big picture.
> My main purpose was first to find out how strace dispatches each syscall
> to its printing function. I found that the main function (in file
> strace.c) first calls init(argc, argv) to initialize some important data
> structures, such as "static struct tcb **tcbtab;". It then processes the
> arguments (using getopt()), sets the corresponding global flags (such as
> cflag_t cflag, which represents the -c/-C options), and finally uses
> sigaction() to set up some signal handlers.
>
> After that, the main trace loop function "static int trace(void)" is called
> to handle all the work.
>
> The trace() function is very big, and I found that it eventually calls
> trace_syscall(tcp) to do the core output work, so I went into
> trace_syscall(tcp) to see what happens there.
>
> trace_syscall() is defined in syscall.c, and it simply uses exiting(tcp)
> to determine which function to call (trace_syscall_exiting(tcp) or
> trace_syscall_entering(tcp)). From there I came to understand why the
> output of "strace sleep 2" looks the way it does (in fact, I later set a
> breakpoint there and saw it more clearly).
>
> trace_syscall_entering(tcp) calls some other functions to populate
> several fields of the tcp structure, and I think the most important output
> is done by the following code:
>
>        if ((tcp->qual_flg & QUAL_RAW) && tcp->s_ent->sys_func != sys_exit)
>               res = printargs(tcp);
>        else
>               res = tcp->s_ent->sys_func(tcp);
>
> Apparently the else branch is executed for most syscalls, and it
> dispatches to the right function through the s_ent structure, which
> stores the address of the output function in a function pointer (together
> with the syscall's name string, etc.). I then became interested in where
> and how strace assigns the right value to tcp->s_ent, and I found that it
> is done in "static int get_scno(struct tcb *tcp)" by the line
> "tcp->s_ent = &sysent[scno];". The global pointer sysent points to the
> table sysent0, which is defined as:
>
>  const struct_sysent sysent0[] = {
>      #include "syscallent.h"
> };
>
> After looking at "syscallent.h", I finally understood how strace
> integrates the syscall table, and I think that if we want to add support
> for a new syscall, we can start from syscallent.h.
>
> Back to the real output functions: syscallent.h gives us the name of
> each output function, and they have the same names as the corresponding
> syscall functions. For example, in file.c I found sys_open(), which just
> calls static int decode_open(struct tcb *tcp, int offset). decode_open()
> does all the detailed work, since it knows the meaning of each argument
> of the open() syscall.
>
> Another interesting finding is that strace has low-level output functions
> that do the actual printing; the upper-level functions use them as a kind
> of API to produce their output without caring how the low-level
> functions work. Typical low-level output functions are tprintf(),
> printstr(), printpath(), printfd() and so on.
>
> I will spend more time reading and debugging the code to understand its
> implementation, though I think there is no need to understand all of the
> code deeply to finish the GSOC project.

yangmin zhu:
I think you are on the right track!
You are taking a very thorough approach, which is very good!

Small note: try to avoid HTML email and stick to plain text on the list.


> From this mail
> (http://sourceforge.net/p/strace/mailman/strace-devel/thread/4515571.KdWbzpdtLr%40vapier/#msg32095710),
> I found that "the advanced path decoding itself would be large enough to
> fill a whole 3 month GSOC project".
>
> So, are you suggesting that we not choose "advanced path decoding" for
> our proposal?

Forget that post: I was just suggesting that someone also look into
"advanced path decoding" in addition to his suggestion.

Please make a proposal!

> I read the discussion in that mail and found that "Structured output" is
> also a good choice; from my current understanding of strace, we could
> finish that work by modifying only the output part of strace.
>
> From this mail (http://sourceforge.net/p/strace/mailman/message/32072591/),
> does it mean that I should first finish a very basic prototype that
> addresses some of the problems in the list and post the patch to the
> mailing list?

That post was just a list of several details to possibly consider in a
proposal and an implementation.
It is great if you can post a few patches and ideas ahead of your
proposal, but it is not an absolute requirement.
If you submit a proposal and it is accepted by strace and the GSOC, the
work mode would be to discuss your approach and submit patches for review
to the mailing list as you go.


> By the way, I found in this
> mail (http://sourceforge.net/p/strace/mailman/message/31924683/) that in
> the current strace "Printing of decoded C constructs is mostly
> open-coded", that "Support of other formats inevitably means introducing
> some API for structured output", and that "the strace code base would
> have a framework to call an output module and that would take care of
> the exact output details."
>
> So I am just wondering why strace hard-codes these decoding functions,
> and why not use an approach like the one used in flex/bison, such as:
>
> We would first define a specification file (plain text with a specific
> grammar) like:
>
> define sys_open: open ( $1, $2 ) = $0
>
> and then strace would parse this file, substitute the $1, $2, $0
> variables with the real arguments, and output the resulting string. I
> have used flex/bison before, and I think this might be better than the
> hard-coded way?
>
> This is just my very first thought and I know it's immature (we would
> still need some special way to handle the arguments of complex syscalls,
> and that requires a great deal of work).

This is an intriguing idea! I like it, though I am sure the devil is in
the details.
Are you suggesting using a string template for the output, or having a
grammar do the actual decoding of arguments?


Thank you again for this detailed and thorough email!
Cordially

-- 
Philippe Ombredanne



