[PATCH RFC 2/4] Introduce seccomp-assisted syscall filtering

Mon Jul 22 16:32:37 UTC 2019

Sorry for the delay in answering.  I just went back to this after working
on the new BPF program.

On Sun, Jul 14, 2019 at 01:46:54PM +0300, Dmitry V. Levin wrote:
> On Sat, Jul 13, 2019 at 12:24:11PM +0200, Paul Chaignon wrote:

[...]

>
> I've had a cursory look at this patch, see my comments below.

Thanks for the quick review!

[...]

> > +
> > +   if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0) {
> > +           /* Check if strace is already being traced. */
> > +           if (errno == EPERM)
> > +                   _exit(0);
>
> We don't have a EPERM check in test_ptrace_get_syscall_info,
> why do we need it here?

I wanted to gracefully disable seccomp filtering in case of e.g.,
strace -f strace ...  I had not seen yet that test_ptrace_get_syscall_info
already causes this command to error out.  I removed it.

>
> [...]
> > +static bool
> > +traced_by_seccomp(unsigned int scno, unsigned int p)
> > +{
> > +   return !sysent_vec[p][scno].sys_func
> > +          || sysent_vec[p][scno].sys_flags & TRACE_INDIRECT_SUBCALL
> > +          || is_number_in_set_array(scno, trace_set, p)
> > +          || strcmp("execve", sysent_vec[p][scno].sys_name) == 0
> > +          || strcmp("execveat", sysent_vec[p][scno].sys_name) == 0
> > +#if defined SPARC || defined SPARC64
> > +          || strcmp("execv", sysent_vec[p][scno].sys_name) == 0
> > +#endif
> > +          || strcmp("socketcall", sysent_vec[p][scno].sys_name) == 0
> > +          || strcmp("ipc", sysent_vec[p][scno].sys_name) == 0
> > +#ifdef LINUX_MIPSO32
> > +          || strcmp("syscall", sysent_vec[p][scno].sys_name) == 0
> > +#endif
> > +          ;
> > +}
>
> struct tcb.sys_name is not the proper field to check,
> struct tcb.sen should be used instead.

Fixed it.  Thanks!

>
> By the way, why do we unconditionally trace multiplexing syscalls like ipc
> and socketcall?

This patchset doesn't handle such multiplexing syscalls.  It would require
some additional logic in the BPF program (whatever the approach for
syscall matching).  All subcalls are therefore currently sent to the
userspace tracer.

[...]

> > +
> > +static unsigned short
> > +init_sock_filter(struct sock_filter *filter)
> > +{
> > +   unsigned short pos = 0;
> > +#if SUPPORTED_PERSONALITIES > 1
> > +   unsigned int audit_arch_vec[] = {
> > +# if defined X86_64
> > +           AUDIT_ARCH_X86_64,
> > +           AUDIT_ARCH_I386,
> > +           AUDIT_ARCH_X86_64
> > +# elif SUPPORTED_PERSONALITIES == 2
> > +           AUDIT_ARCH_X86_64,
> > +           AUDIT_ARCH_I386
> > +# endif
> > +   };
> > +#endif
> > +   unsigned int syscall_bit_vec[] = {
> > +#if defined X86_64
> > +           0, 0, __X32_SYSCALL_BIT
> > +#elif defined X32
> > +           __X32_SYSCALL_BIT, 0
> > +#elif SUPPORTED_PERSONALITIES == 2
> > +           0, 0
> > +#else
> > +           0
> > +#endif
> > +   };
>
> Please don't use any arch-specific ifdefs in the common code like this.
> Use of arch-specific constants like AUDIT_ARCH_X86_64 in the common code
> like this is also not acceptable.

I had a look at the other arch-specific codes in strace, but couldn't find
the proper place for that code.  Should these simply be moved to the top
of the file as in syscall.c?

I also need to handle one special case for X86_64 in the BPF program.  On
X86_64, AUDIT_ARCH_X86_64 is used for both x32 and x86-64 and x32 has a
mask on syscall numbers to differentiate the two.  Should I also encode
that information in the audit_arch_vec array?
The BPF code to distinguish x32 and x86-64 is going to use
__X32_SYSCALL_BIT.  Should that constant also be moved outside the common
code?

Paul