GSoC 2019: Efficient syscall tracing for strace

Paul Chaignon paul.chaignon at gmail.com
Mon Apr 8 20:15:50 UTC 2019


Hello all,

Please find my GSoC application draft below.
I'm sorry for posting it so close to the deadline (I have been working on
my PhD defense for the last few months).  I hope you will still consider it.

Regards,
Paul


Your name: Paul Chaignon
Title of your proposal: Efficient syscall tracing for strace

Abstract of your proposal:
===================
Strace currently adds significant overhead to any application it traces.
Even when users are interested in only a handful of syscalls, strace will
intercept all syscalls made by the observed processes.  Tracing any syscall
with strace involves several context switches to and from the strace
process in userspace, both when entering and exiting the syscall.
Since Linux 3.5 [1], userspace applications can however rely on
seccomp-bpf to filter the syscalls they want to trace.  In that case, the
set of monitored syscalls is filtered in the kernel, using cBPF, before
any context switch to userspace.  seccomp-bpf is well-established and
already used by several large open source applications.  strace could
leverage it to avoid tracing syscalls users don't want to see anyway.
The tracing landscape of Linux has also evolved drastically in recent years.
In particular, user applications can rely on eBPF programs, attached to
kprobes, tracepoints, and perf, to filter and aggregate data of interest
in the kernel, with low overhead.  Even though eBPF would currently be
unable to replace the entire feature set of strace on its own, it keeps
improving and may soon have that capability.
During this Google Summer of Code, I would like to finish and merge the
work already started on 1) relying on seccomp-bpf to filter syscalls in
kernel space and 2) allowing strace to use alternative backends.  The
second item will come with a tracepoint/BPF proof of concept to ensure
strace can support diverse backends, beyond the usual ptrace model.


Detailed description of your idea including explanation on why is it
innovative and what it will contribute:
================================================
Filtering syscalls with seccomp-bpf:
seccomp-bpf, in SECCOMP_SET_MODE_FILTER mode and with the
SECCOMP_RET_TRACE BPF return value, allows us to give control to a ptrace
userspace tracer for specific syscalls.  A cBPF program "attached" to
seccomp will run for all syscalls and select the appropriate action (in
our case SECCOMP_RET_ALLOW or SECCOMP_RET_TRACE).  In seccomp terms, our
cBPF program has to implement a blacklist of the syscalls we want to
trace.  This has to be carefully designed and tested, in particular to
avoid missing syscalls (e.g., new syscalls).
strace needs to generate the cBPF program to handle various architectures
and personalities while reducing overhead as much as possible.  The
resulting patch should aim to add negligible overhead when no syscalls are
being traced.
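
To make the idea concrete, here is a minimal sketch of how such a filter
could be generated.  It is not the code from the existing patchset:
build_trace_filter() is a hypothetical helper, it handles a single
architecture value, ignores personalities, and uses a plain linear match.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper (an assumption for illustration, not strace code):
 * fill `prog` with a cBPF program returning SECCOMP_RET_TRACE for every
 * syscall number listed in `traced` and SECCOMP_RET_ALLOW otherwise.
 * Returns the number of instructions written, or 0 if it cannot be built.
 */
static size_t
build_trace_filter(struct sock_filter *prog, size_t cap, uint32_t arch,
		   const uint32_t *traced, size_t n)
{
	size_t len = 0;

	/* Jump offsets are 8 bits wide, so this linear layout caps n at 255. */
	if (n > 255 || cap < n + 6)
		return 0;

	/* Unexpected architecture: trace everything rather than miss calls. */
	prog[len++] = (struct sock_filter) BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			offsetof(struct seccomp_data, arch));
	prog[len++] = (struct sock_filter) BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
			arch, 1, 0);
	prog[len++] = (struct sock_filter) BPF_STMT(BPF_RET | BPF_K,
			SECCOMP_RET_TRACE);

	/* Load the syscall number and compare it against each traced one. */
	prog[len++] = (struct sock_filter) BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			offsetof(struct seccomp_data, nr));
	for (size_t i = 0; i < n; i++)
		/* On a match, skip ahead to the final RET_TRACE. */
		prog[len++] = (struct sock_filter) BPF_JUMP(
				BPF_JMP | BPF_JEQ | BPF_K, traced[i],
				(uint8_t)(n - i), 0);

	prog[len++] = (struct sock_filter) BPF_STMT(BPF_RET | BPF_K,
			SECCOMP_RET_ALLOW);
	prog[len++] = (struct sock_filter) BPF_STMT(BPF_RET | BPF_K,
			SECCOMP_RET_TRACE);
	return len;
}
```

The sketch only builds the instruction array.  Installing it would still
require the tracee to set no_new_privs (or hold CAP_SYS_ADMIN) and load
the program in SECCOMP_MODE_FILTER, and the tracer to set
PTRACE_O_TRACESECCOMP so that SECCOMP_RET_TRACE produces a ptrace stop.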

Support for alternative backends:
strace currently relies on ptrace as the only backend to trace syscalls in
target processes.  Several other backends could however be used:
gdbserver, ftrace, tracepoint, etc.  I am not familiar enough with
gdbserver to be confident I could finish that backend, in addition to the
above, within the GSoC timeframe.  I am more familiar with
tracepoint/kprobes + BPF, but I don't think those are ready to fully
replace the current backend.  In particular, 1) such in-kernel probes
would send data to userspace through perf ring buffers and may therefore
lose events and 2) eBPF is still too limited to implement all of strace's
filters.
Instead I would like to work on adding support for switching the backend
used by strace, in preparation for other backends.  The current prototype
for that work [4] implements this support through generic function
pointers to the main functions required from the backend.  In addition, I
would like to develop a proof-of-concept tracepoint/BPF backend with two
objectives: 1) provide an initial estimate of the possible performance
gains and of the set of features that eBPF would need (e.g., bounded
loops, new string helpers) to implement the full set of strace filtering
features and 2) ensure the preparatory work for alternative backends is
not limited to the ptrace/gdbserver model of attach/detach and
synchronous tracing (as opposed to filtering on PIDs in BPF and sending
data to userspace asynchronously).
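
As a rough illustration of backend selection through function pointers,
the sketch below defines a small table of operations and a stub backend.
All names here (backend_ops, find_backend, the stub_* functions) are
hypothetical and are not taken from the prototype in [4].

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface; field names are illustrative assumptions. */
struct backend_ops {
	const char *name;
	int (*attach)(int pid);		/* start observing a process */
	int (*next_event)(int *pid);	/* 1 if an event is pending, else 0 */
	void (*detach)(int pid);	/* stop observing a process */
};

/* A stub backend that remembers a single attached PID. */
static int stub_attached = -1;

static int
stub_attach(int pid)
{
	stub_attached = pid;
	return 0;
}

static int
stub_next_event(int *pid)
{
	if (stub_attached < 0)
		return 0;
	*pid = stub_attached;
	return 1;
}

static void
stub_detach(int pid)
{
	if (stub_attached == pid)
		stub_attached = -1;
}

static const struct backend_ops stub_backend = {
	.name = "stub",
	.attach = stub_attach,
	.next_event = stub_next_event,
	.detach = stub_detach,
};

static const struct backend_ops *const backends[] = { &stub_backend };

/* Look a backend up by name; returns NULL if it is unknown. */
static const struct backend_ops *
find_backend(const char *name)
{
	for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
		if (strcmp(backends[i]->name, name) == 0)
			return backends[i];
	return NULL;
}
```

The rest of the tracer would then only call through the selected
backend_ops pointer.  An event-polling entry point such as next_event is
one way to keep the interface usable by a backend that delivers data
asynchronously; whether that shape suits strace is an open question.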


Description of previous work, existing solutions (links to prototypes,
bibliography are more than welcome)
=================================================
Filtering syscalls with seccomp-bpf:
There is already a working patchset [2] implementing this idea.  I have
rebased the prototype [3], run the tests, and performed a quick evaluation
of its performance.  With a trivial program calling getppid() in a loop
(./overhead below), strace with seccomp-bpf provides a ~35x improvement
over vanilla strace when no syscalls are actually traced:
    $ timeout -s SIGINT 5s strace -enone -o /dev/null ./overhead
    nb of get_ppid() calls: 548190
    $ timeout -s SIGINT 5s strace -n -enone -o /dev/null ./overhead
    nb of get_ppid() calls: 19206080
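
The source of ./overhead is not included in this message.  A program of
that kind might look like the following sketch, where every function name
is an assumption and a main() that simply calls run_overhead(0) is left
out:

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Set from the SIGINT handler to end the loop. */
static volatile sig_atomic_t stop;

static void
on_sigint(int sig)
{
	(void) sig;
	stop = 1;
}

/*
 * Issue getppid() through syscall(2) until SIGINT arrives or `max` calls
 * have been made (max == 0 means no limit).  Returns the number of calls.
 */
static unsigned long
count_getppid_calls(unsigned long max)
{
	unsigned long n = 0;

	while (!stop && (max == 0 || n < max)) {
		syscall(SYS_getppid);
		n++;
	}
	return n;
}

/* Install the handler, run the loop, and print the counter. */
static int
run_overhead(unsigned long max)
{
	signal(SIGINT, on_sigint);
	printf("nb of get_ppid() calls: %lu\n", count_getppid_calls(max));
	return 0;
}
```

Because getppid() does almost no work in the kernel, the counter mostly
reflects the per-syscall cost added by the tracer.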
There are several opportunities to improve the current cBPF filter: it
currently contains a few unnecessary instructions and implements a linear
matching algorithm over the syscall numbers (although it takes ranges of
contiguous numbers into account).  To limit the number of instructions,
the cBPF program could also match on the complementary set of syscalls
(match against syscalls that should be RET_ALLOWed instead of RET_TRACEd)
when that results in a shorter program.
The prototype currently relies on an -n switch to enable seccomp-bpf
filtering.  seccomp-bpf filtering should be enabled by default, but only
when syscalls can actually be filtered in kernel (-e but not -z/-Z for
instance).
The existing patchset also lacks documentation and a larger set of tests.
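
The choice between matching the traced set and matching the allowed set,
mentioned above, could be made with a simple cost estimate.  The sketch
below assumes a simplified cost model that is mine, not the patchset's: a
run of one syscall number costs one comparison and a longer contiguous
run costs two (lower and upper bound).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Estimate the number of comparisons needed to match every syscall number
 * i < nr for which traced[i] == want, under the assumed cost model.
 */
static size_t
match_cost(const bool *traced, size_t nr, bool want)
{
	size_t cost = 0;
	size_t i = 0;

	while (i < nr) {
		if (traced[i] != want) {
			i++;
			continue;
		}
		/* Walk to the end of this contiguous run. */
		size_t start = i;
		while (i < nr && traced[i] == want)
			i++;
		cost += (i - start == 1) ? 1 : 2;
	}
	return cost;
}

/* True if matching the allowed (non-traced) set is estimated cheaper. */
static bool
should_match_allowed(const bool *traced, size_t nr)
{
	return match_cost(traced, nr, false) < match_cost(traced, nr, true);
}
```

For example, tracing every syscall except one leaves two long traced runs
but a single allowed number, so matching the allowed set is cheaper under
this model.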

Support for alternative backends:
I have only started rebasing the existing prototype [4].  The
tracing_backend structure exposes the interface for backends.  It is
clearly still very close to the ptrace approach of syscall tracing (e.g.,
startup_child and detach functions) and unlikely to be a good fit for
tracepoint/kprobe tracing.


Mention the details of your academic studies, any previous work,
internships
===============================================
Academic Studies:
- PhD: I am currently pursuing a PhD in computer science at the
University of Lorraine, France, supervised at Inria (a French research
institute), and funded by the French government and Orange (a French
telecommunications company).  I submitted my PhD manuscript at the
beginning of March and will defend on the 7th of May.  This has been
taking almost all of my time for the last few months.  I will be free
after my defense, even though I remain a student until the end of 2019.
- MSc: I received an MSc in computer science from INSA-Rennes, France.


Any relevant skills that will help you to achieve the goal (programming
languages, frameworks)?
===================================================
- C: I've written a lot of C code during my thesis, for various
prototypes.  I've had to understand and modify several large codebases,
including Open vSwitch and Linux.
- BPF: Most of my thesis work is related to BPF.  I extended a userspace
implementation of BPF with a partial verifier and support for map
relocation.  I've also written several (non-open-source) tracing tools
using kprobes/BPF and two research prototypes using XDP/BPF.  I'm fairly
familiar with the internals of both cBPF and eBPF.


Any previous open-source projects (or even previous GSoC) you have
contributed to?
===================================================
- github-linguist [5]: I'm one of the maintainers of github-linguist, the
project used to identify the language of files on both GitHub and GitLab.
- bcc [6]: I've contributed around 70 pull requests to bcc, a BPF
framework and a collection of tracing tools.  In particular, I contributed
several pull requests to improve rewriting of external pointers (pointers
to kernel memory used inside the BPF VM).
- Minor: I've made a couple of small contributions to each of the
following projects: Linux (BPF verifier), LLVM (BPF object file and
libfuzzer), cis-ubuntu-ansible [7] (a project to apply the CIS security
guidelines to servers using Ansible), and several of the syntax
highlighting grammars used on GitHub.
- I participated in GSoC 2014 with the OWASP organization [11].
I developed security challenges to help teach application security to
students.  That was mostly a web development project with little to do
with strace or what I work on now.  I successfully completed that project.


Any open-source code of yours that we can check out?
========================================
The most recent, clean and non-trivial code I've contributed in open
source is probably my work on bcc [8].
If you're looking for a C project, I've open sourced a research prototype
written during my thesis and based on Open vSwitch [9].  This is clearly
not my cleanest code as we were looking to produce a quick prototype to
assess the performance gains of our approach.  It includes a partial
userspace implementation of a BPF verifier [10, validate_xxx functions]
which I've written from scratch.


Tentative schedule:
==============
Week 1: Add more tests for seccomp-bpf
Week 2: Add support and tests for ARM
Week 3: Prepare patchset for seccomp-bpf & Work on review rounds for
  seccomp-bpf
Week 4: Work on review rounds for seccomp-bpf
Week 5: Improve cBPF program and run evaluations
28/06 - Phase 1 deadline - Result: seccomp-bpf merged or in good state
Week 6: Rebase alt-backends
Week 7: Rebase alt-backends
Week 8: Prepare RFC patchset for alt-backends
Week 9: Develop PoC of tracepoint/BPF backend
26/07 - Phase 2 deadline - Result: alt-backends posted on mailing list
Week 10: Develop PoC of tracepoint/BPF backend
Week 11: Prepare patchset for alt-backends & Work on review rounds
Week 12: Final week to work on review rounds if necessary
19/08 - Final deadline - Result: alt-backends merged or in good state +
  RFC for tracepoint/BPF backend


1 - fb0fadf ("ptrace,seccomp: Add PTRACE_SECCOMP support"),
    https://lore.kernel.org/patchwork/patch/292118/
2 - https://github.com/strace/strace/commits/ppiao/gsoc-2018-final
3 - https://github.com/pchaigno/strace/commits/ppiao-seccomp
4 - https://github.com/esyr-rh/strace/commits/gdbserver-prep
5 - https://github.com/github/linguist
6 - https://github.com/iovisor/bcc
7 - https://github.com/awailly/cis-ubuntu-ansible
8 - https://github.com/iovisor/bcc/pulls?utf8=%E2%9C%93&q=is%3Apr+author%3Apchaigno+is%3Aclosed+external
9 - https://github.com/Orange-OpenSource/oko
10 - https://github.com/Orange-OpenSource/oko/blob/master/lib/bpf/ubpf_vm.c
11 - https://www.google-melange.com/archive/gsoc/2014/orgs/owasp/projects/pchaigno.html

