Paul Chaignon's GSoC status report - #9 of 12

Tue Jul 30 12:33:18 UTC 2019

On Tue, Jul 30, 2019 at 01:36:27PM +0200, Eugene Syromyatnikov wrote:
> On Tue, Jul 30, 2019 at 12:10 PM Paul Chaignon <paul.chaignon at gmail.com> wrote:
> >
> > On Tue, Jul 30, 2019 at 10:13:08AM +0200, Eugene Syromyatnikov wrote:
> > > On Mon, Jul 29, 2019 at 9:21 PM Paul Chaignon <paul.chaignon at gmail.com> wrote:
> > > > - Finished implementing generation of bit matching BPF program.  On a
> > > >   first look, results are not as good as I expected.  For instance, for
> > > >   x86_64 personality, when tracing a single syscall (bpf(2)), linear BPF
> > > >   program has 8 instructions whereas binary match program has 35
> > > >   instructions.  It would take a large number of traced syscalls to make
> > > >   the binary match program the best choice...
> > > What about -e trace=%net, %file, or other (popular) syscall class?
> >
> > I haven't checked those yet, but that's what I was planning on using to
> > have common/representative sets of traced syscalls in evals.
> >
> > Even with those, it's not sure the binary match program will be better.
> > In the worst case, the linear program requires one more instruction per
> > new traced syscall, and often less (when there are continuous sequences of
> > syscalls to trace).  So we would need at least 27 additional traced
> > syscalls to make the binary match worth it.
>
> That's why the question.
>
> % git grep -c '\bTF\b' linux|cat
> linux/32/syscallent.h:40
> linux/64/syscallent.h:40
> linux/alpha/syscallent.h:69
> linux/arm/syscallent.h:69
> linux/avr32/syscallent.h:62
> linux/bfin/syscallent.h:68
> linux/hppa/syscallent.h:64
> linux/i386/syscallent.h:69
> linux/ia64/syscallent.h:62
> linux/m68k/syscallent.h:69
> linux/microblaze/syscallent.h:69
> linux/mips/syscallent-compat.h:21
> linux/mips/syscallent-n32.h:60
> linux/mips/syscallent-n64.h:59
> linux/mips/syscallent-o32.h:67
> linux/powerpc/syscallent.h:67
> linux/powerpc64/syscallent.h:64
> linux/s390/syscallent.h:67
> linux/s390x/syscallent.h:62
> linux/sh/syscallent.h:69
> linux/sh64/syscallent.h:68
> linux/sparc/syscallent.h:69
> linux/sparc64/syscallent.h:66
> linux/syscallent-common-32.h:1
> linux/syscallent-common.h:4
> linux/x32/syscallent.h:62
> linux/x86_64/syscallent.h:60
> linux/xtensa/syscallent.h:65
> % git grep -c '\bTD\b' linux|cat
> linux/32/syscallent.h:88
> linux/64/syscallent.h:87
> linux/alpha/syscallent.h:106
> linux/arm/syscallent.h:112
> linux/avr32/syscallent.h:108
> linux/bfin/syscallent.h:109
> linux/hppa/syscallent.h:107
> linux/i386/syscallent.h:112
> linux/ia64/syscallent.h:103
> linux/m68k/syscallent.h:112
> linux/microblaze/syscallent.h:112
> linux/mips/syscallent-compat.h:19
> linux/mips/syscallent-n32.h:103
> linux/mips/syscallent-n64.h:100
> linux/mips/syscallent-o32.h:110
> linux/powerpc/syscallent.h:112
> linux/powerpc64/syscallent.h:106
> linux/s390/syscallent.h:112
> linux/s390x/syscallent.h:104
> linux/sh/syscallent.h:112
> linux/sh64/syscallent.h:110
> linux/sparc/syscallent.h:111
> linux/sparc64/syscallent.h:107
> linux/syscallent-common-32.h:7
> linux/syscallent-common.h:10
> linux/x32/syscallent.h:109
> linux/x86_64/syscallent.h:100
> linux/xtensa/syscallent.h:105
>
> (others are not so numerous, though)

Ah, thanks.  I didn't realize the %file set was that large.

>
> > There may be a few opportunities to shorten the binary match program too
> > (i.e., skip bit arrays that are known to be empty).
>
> The other thing that is possible is to have multiple program
> generation strategies implemented (with known program complexity
> estimations), and switch between then based on the filter requested
> (or simply generate all the possibilities and select one that is
> shorter or executes faster).

That's been the plan all along :-)
Even if we didn't have the binary match program, we'd still need a reverse
linear program.  I'm planning to send a second patchset to implement the
multiple strategies, i.e., keep a single strategy in the first patchset.

However, since each strategy would likely have its own function to count
the number of instructions (in the check step, to see if seccomp can be
enabled), I was thinking we could use that to find the best strategy
instead of having heuristics.

Paul