[GSOC 2014] Some thoughts about the code and the proposal

yangmin zhu zym0017d at gmail.com
Thu Mar 13 15:55:17 UTC 2014


Hi all,

  I have spent a few days reading the strace code to get the big picture.
My first goal was to find out how strace dispatches each syscall to its
printing function. I found that the main function (in strace.c) first
calls init(argc, argv) to initialize some important data structures such
as "static struct tcb **tcbtab;". It then processes the arguments (using
getopt()) and sets the corresponding global flags (such as "cflag_t cflag"
for the -c/-C options), and finally uses sigaction() to install some
signal handlers.

After that, the main trace loop function "static int trace(void)" is called
to handle all the work.

The trace() function is really big, and I found that it eventually calls
trace_syscall(tcp) to do the core output work, so I went into
trace_syscall(tcp) to see what happens there.

trace_syscall() is defined in syscall.c and simply uses exiting(tcp) to
decide which function to call (trace_syscall_exiting(tcp) or
trace_syscall_entering(tcp)). From there I came to understand why the
output of "strace sleep 2" looks the way it does (in fact I later set a
breakpoint there and saw it more clearly).
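
To make the entering/exiting split concrete, here is a minimal sketch. It is
not strace's actual code: every demo_* name, the flag value and the output
format are assumptions made up for illustration. It only shows the idea of one
function being called at every stop and a per-task flag deciding which half
runs.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not strace's real definitions. */
#define DEMO_INSYSCALL 0x1

struct demo_tcb {
    int flags;          /* DEMO_INSYSCALL is set between entry and exit */
    const char *name;   /* name of the syscall being traced */
};

/* Nonzero if the next stop for this task is a syscall exit. */
static int demo_exiting(const struct demo_tcb *tcp)
{
    return tcp->flags & DEMO_INSYSCALL;
}

/* Entry stop: format name and arguments, remember we are inside the call. */
static int demo_syscall_entering(struct demo_tcb *tcp, char *buf, size_t size)
{
    tcp->flags |= DEMO_INSYSCALL;
    return snprintf(buf, size, "%s(...", tcp->name);
}

/* Exit stop: format the return value and clear the in-syscall flag. */
static int demo_syscall_exiting(struct demo_tcb *tcp, long ret,
                                char *buf, size_t size)
{
    tcp->flags &= ~DEMO_INSYSCALL;
    return snprintf(buf, size, ") = %ld", ret);
}

/* Called once per stop; the flag alone decides which half runs. */
int demo_trace_syscall(struct demo_tcb *tcp, long ret, char *buf, size_t size)
{
    return demo_exiting(tcp)
        ? demo_syscall_exiting(tcp, ret, buf, size)
        : demo_syscall_entering(tcp, buf, size);
}
```

Calling demo_trace_syscall() twice on the same demo_tcb produces the first
half of the line and then the second half. In a real trace the syscall itself
runs between the two stops, which is presumably why the line for a sleeping
call appears in two parts.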



trace_syscall_entering(tcp) calls some other functions to populate some
fields of the tcp structure, and I think the most important output is done
by the following code:

       if ((tcp->qual_flg & QUAL_RAW) && tcp->s_ent->sys_func != sys_exit)
              res = printargs(tcp);
       else
              res = tcp->s_ent->sys_func(tcp);

Apparently, for most syscalls the else branch is executed, and it dispatches
to the right function through the s_ent structure, which stores the address
of the output function as a function pointer (together with the syscall's
name string and so on). I then became interested in where and how strace
assigns the right value to tcp->s_ent, and I found that it is done in
"static int get_scno(struct tcb *tcp)" by the line
"tcp->s_ent = &sysent[scno];". The underlying table, sysent0, is defined as:

 const struct_sysent sysent0[] = {
#include "syscallent.h"
};

After looking at syscallent.h, I finally understood how strace integrates
the syscall table. I think that if we want to add support for a new syscall,
we can start from syscallent.h.
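
The table-plus-function-pointer dispatch described above can be sketched as
follows. This is only an illustration under my own assumptions: the demo_*
types, the three entries and their numbering are hypothetical, and the entries
are written inline here, whereas strace pulls them in from syscallent.h.

```c
#include <stddef.h>
#include <stdio.h>

struct demo_tcb;
typedef int (*demo_sys_func)(struct demo_tcb *);

/* One row per syscall: argument count, decoder, printable name. */
struct demo_sysent {
    unsigned nargs;
    demo_sys_func sys_func;
    const char *sys_name;
};

struct demo_tcb {
    long scno;                        /* syscall number of the current stop */
    const struct demo_sysent *s_ent;  /* row selected for that number */
    char out[64];                     /* where the decoder writes its text */
};

/* Toy decoders; real ones would print the actual arguments. */
static int demo_sys_read(struct demo_tcb *tcp)
{
    return snprintf(tcp->out, sizeof tcp->out, "read(...)");
}

static int demo_sys_write(struct demo_tcb *tcp)
{
    return snprintf(tcp->out, sizeof tcp->out, "write(...)");
}

static int demo_sys_open(struct demo_tcb *tcp)
{
    return snprintf(tcp->out, sizeof tcp->out, "open(...)");
}

/* The array index is the (hypothetical) syscall number. */
static const struct demo_sysent demo_table[] = {
    { 3, demo_sys_read,  "read"  },
    { 3, demo_sys_write, "write" },
    { 3, demo_sys_open,  "open"  },
};

/* Look up the row for scno and call its decoder; -1 if scno is unknown. */
int demo_dispatch(struct demo_tcb *tcp, long scno)
{
    size_t count = sizeof demo_table / sizeof demo_table[0];

    if (scno < 0 || (size_t)scno >= count)
        return -1;
    tcp->scno = scno;
    tcp->s_ent = &demo_table[scno];
    return tcp->s_ent->sys_func(tcp);
}
```

With this layout, supporting another syscall in the sketch means adding one
decoder function and one row to demo_table; the dispatch code itself does not
change.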



Back to the real output functions: syscallent.h gives us the names of the
output functions, which are named after the corresponding syscalls. For
example, in file.c I found sys_open(), which just calls
"static int decode_open(struct tcb *tcp, int offset)". decode_open() does
all the detailed work; it knows the meaning of each argument of the open()
syscall.

Another interesting finding is that strace has low-level output functions
that do the actual printing; the higher-level functions just use them as a
kind of API and do not care how they work. Typical low-level output
functions are tprintf(), printstr(), printpath(), printfd() and so on.
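
The layering described above could look roughly like the sketch below. It is
an illustration only: the demo_* helpers are hypothetical, they write into a
fixed buffer instead of the trace output, and flags are printed as plain hex
rather than decoded symbolically. The real tprintf(), printpath() and
printfd() have different signatures.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Output sink for the sketch: a fixed buffer plus the used length. */
struct demo_out {
    char buf[128];
    size_t len;
};

/* Lowest layer: append formatted text, truncating when the buffer is full. */
void demo_tprintf(struct demo_out *o, const char *fmt, ...)
{
    va_list ap;
    size_t room;
    int n;

    if (o->len >= sizeof o->buf - 1)
        return;                                /* no space left */
    room = sizeof o->buf - o->len - 1;         /* chars that still fit */
    va_start(ap, fmt);
    n = vsnprintf(o->buf + o->len, room + 1, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;                                /* formatting error */
    o->len += (size_t)n < room ? (size_t)n : room;
}

/* Middle layer: small helpers built only on demo_tprintf(). */
void demo_printpath(struct demo_out *o, const char *path)
{
    demo_tprintf(o, "\"%s\"", path);
}

void demo_printfd(struct demo_out *o, int fd)
{
    demo_tprintf(o, "%d", fd);
}

/* Top layer: per-syscall decoders that never touch the buffer directly. */
void demo_decode_open(struct demo_out *o, const char *path, unsigned flags)
{
    demo_tprintf(o, "open(");
    demo_printpath(o, path);
    demo_tprintf(o, ", 0x%x)", flags);
}

void demo_decode_close(struct demo_out *o, int fd)
{
    demo_tprintf(o, "close(");
    demo_printfd(o, fd);
    demo_tprintf(o, ")");
}
```

Because the decoders only go through the helpers, changing where or how the
text is emitted would only require touching demo_tprintf() in this sketch.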



I will spend more time reading and debugging the code to understand its
implementation, but I think there is no need to understand all of the code
deeply to finish the GSOC project.




From this mail (
http://sourceforge.net/p/strace/mailman/strace-devel/thread/4515571.KdWbzpdtLr%40vapier/#msg32095710),
I find "the advanced path decoding itself would be large enough to fill a
whole 3 month GSOC project".

*So, are you suggesting that we should not choose "advanced path decoding"
as the proposal?*




I read the discussion in that mail and found that "Structured output" is
also a good choice. From my current understanding of strace, we could finish
that work by modifying only the output part of strace.

From this mail (http://sourceforge.net/p/strace/mailman/message/32072591/):

*Does it mean that I should first finish a very basic prototype that
addresses some of the problems in the list and post the patch to the mailing
list?*




By the way, I found in this mail (
http://sourceforge.net/p/strace/mailman/message/31924683/) that in the
current strace "Printing of decoded C constructs is mostly open-coded", that
"Support of other formats inevitably means introducing some API for
structured output", and that "the strace code base would have a framework to
call an output module and that would take care of the exact output details".

So I am just wondering why strace hard-codes these decoding functions
instead of using a method like the one used in flex/bison, such as:

We first define a specification file (plain text with a specific grammar)
like:

define sys_open: open ( $1, $2 ) = $0

and then strace parses this file, substitutes the variables $1, $2 and $0
with the real values, and outputs the resulting string. I have used
flex/bison before, and *I think this may be better than the hard-coded way?*

These are just my very first thoughts and I know they are immature (we would
still need some special way to handle the arguments of complex syscalls, and
that would require a great deal of work).
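
As a rough illustration of the substitution step only (not the parsing of a
specification file), here is a small sketch. Everything in it is an
assumption: demo_expand() is a hypothetical helper, placeholders are limited
to a single digit, and the values are passed in as already formatted strings,
with index 0 taken to be the text for $0 and index N the text for $N.

```c
#include <stddef.h>

/*
 * Expand "$N" placeholders (N = 0..9) in tmpl using vals[N].
 * Returns the length of the result, or -1 if a placeholder has no value
 * or the output buffer is too small (out is then left unterminated).
 */
int demo_expand(const char *tmpl, const char *const vals[], size_t nvals,
                char *out, size_t outsz)
{
    size_t pos = 0;
    const char *p;
    const char *v;

    if (outsz == 0)
        return -1;
    for (p = tmpl; *p != '\0'; p++) {
        if (p[0] == '$' && p[1] >= '0' && p[1] <= '9') {
            size_t idx = (size_t)(p[1] - '0');

            if (idx >= nvals)
                return -1;              /* no value for this placeholder */
            for (v = vals[idx]; *v != '\0'; v++) {
                if (pos + 1 >= outsz)
                    return -1;          /* keep room for the final NUL */
                out[pos++] = *v;
            }
            p++;                        /* skip the digit after '$' */
        } else {
            if (pos + 1 >= outsz)
                return -1;
            out[pos++] = *p;            /* ordinary character, copy as is */
        }
    }
    out[pos] = '\0';
    return (int)pos;
}
```

For the template "open ( $1, $2 ) = $0" and the strings "3", "\"/etc/passwd\""
and "O_RDONLY" as values 0, 1 and 2, this produces
open ( "/etc/passwd", O_RDONLY ) = 3. The hard part mentioned above, turning
raw arguments such as structures or flag sets into those strings, is left out
of the sketch entirely.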



Thanks

Yangmin Zhu