GSoC 2019: Efficient syscall tracing for strace

Wed Apr 10 09:29:33 UTC 2019

On Tue, Apr 09, 2019 at 04:48:51PM +0300, Dmitry V. Levin wrote:
> On Tue, Apr 09, 2019 at 02:56:23PM +0200, Paul Chaignon wrote:
> > On Tue, Apr 09, 2019 at 07:33:24AM +0200, Paul Chaignon wrote:
> > > On Mon, Apr 08, 2019 at 11:53:34PM +0300, Dmitry V. Levin wrote:
> > 
> > [...]
> > 
> > > > > There are several opportunities to improve the current cBPF filter: it
> > > > > currently contains a few unnecessary instructions and implements a linear
> > > > > matching algorithm over the syscall numbers (although it takes ranges of
> > > > > continuous numbers into account).  To limit the number of instructions,
> > > > > the cBPF program could also match on the reverse set of syscalls (match
> > > > > against syscalls that should be RET_ALLOWed instead of RET_TRACEd) when
> > > > > that makes sense.
> > > > 
> > > > I suppose libseccomp might be a good source of ideas for generating a more
> > > > optimal cBPF program.
> > > 
> > > Thanks for the suggestion, I'll have a look.  At the moment, I was
> > > thinking we could either use binary search or a simple bit-matching
> > > algorithm (see mail to Chen Jingpiao on the mailing list for details) to
> > > match filtered syscalls.
> > 
> > I've checked libseccomp's code [1] and run a small PoC.  They implement a
> > linear match over the list of filtered syscalls.  They don't even have
> > range matching as Chen's current prototype.  They have to work with
> > different constraints though as users can give different priorities to
> > different syscalls and they match high-priority syscalls first.
> > 
> > They had several discussions on performance of the BPF program [2, 3].
> > One contributor is working on a binary search patchset.  I guess they
> > can't use bitwise AND if they want to abide by the user's priorities.
> 
> Thanks.  It's not libseccomp then.
> 
> If optimizing BPF program by size, we could apply different optimization
> methods and choose at runtime one that produces the smallest program.

Even taking personalities into account, it doesn't look like we might
exceed BPF_MAXINSNS (4096 on older kernels).  The number of instructions
might also not always be a good indicator of the filter's overhead.  At
least for binary search, the number of instructions will likely be similar
to a linear search with ranges, but the average runtime will be lower.

I'm expecting we will have to choose between linear search (for small
filters) and only one of bitwise match and binary search.  In that case we
can try linear first and fallback to what we've determined to be best
between bitwise and binary search if linear produces a "too large" number
of instructions.  We'll know more with a PoC and evaluations.

Reading libseccomp's code, I realized we may have to rewrite jumps for
large filters.  cBPF enables conditional jumps of <=255.  We'll have to
rewrite any jump>255 to use an inconditional jump.
In general, I think we should always fallback to the usual ptrace behavior
if seccomp filter doesn't work for any reason (eventually with a warning).
The current prototype will error out in a lot of cases (e.g., filter
rejected by kernel).

Thanks a lot for taking the time to read my rather long texts :)

--
Paul