[PATCH RFC 2/2] seccomp: implement SECCOMP_FILTER_FLAG_NO_INHERITANCE

Paul Chaignon paul.chaignon at gmail.com
Mon Nov 18 19:02:48 UTC 2019


On Mon, Nov 18, 2019 at 09:39:41PM +0300, Dmitry V. Levin wrote:
> On Mon, Nov 18, 2019 at 07:32:45PM +0100, Paul Chaignon wrote:
> > On Mon, Nov 18, 2019 at 04:27:25AM +0300, Dmitry V. Levin wrote:
> > > On Sun, Nov 17, 2019 at 06:52:04PM +0100, Paul Chaignon wrote:
> > > > On Fri, Nov 15, 2019 at 05:56:50PM +0100, Paul Chaignon wrote:
> > > > > On Fri, Nov 15, 2019 at 06:55:55PM +0300, Dmitry V. Levin wrote:
> > > > > > On Fri, Nov 15, 2019 at 04:46:14PM +0100, Paul Chaignon wrote:
> > > > > > > On Thu, Nov 14, 2019 at 07:06:20PM +0100, Paul Chaignon wrote:
> > > > > > > > On Mon, Nov 11, 2019 at 11:44:50PM +0300, Dmitry V. Levin wrote:
> > > > > > > > > On Mon, Nov 11, 2019 at 04:20:20PM +0100, Paul Chaignon wrote:
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > > >  /*
> > > > > > > > > >   * All BPF programs must return a 32-bit value.
> > > > > > > > > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > > > > > > > > index 55af6931c6ec..1df5b058b067 100644
> > > > > > > > > > --- a/kernel/fork.c
> > > > > > > > > > +++ b/kernel/fork.c
> > > > > > > > > > @@ -1605,6 +1605,11 @@ static void copy_seccomp(struct task_struct *p)
> > > > > > > > > >  	 */
> > > > > > > > > >  	assert_spin_locked(&current->sighand->siglock);
> > > > > > > > > >  
> > > > > > > > > > +	if (!inherited_seccomp_filter(current)) {
> > > > > > > > > > +		clear_tsk_thread_flag(p, TIF_SECCOMP);
> > > > > > > > > > +		return;
> > > > > > > > > > +	}
> > > > > > > > > 
> > > > > > > > > Since the task can have many seccomp filters (chained by seccomp_filter.prev)
> > > > > > > > > and SECCOMP_FILTER_FLAG_NO_INHERIT flag affects only those filters that were
> > > > > > > > > created using this flag, this code should copy only those filters that are
> > > > > > > > > excluded from the fork inheritance.
> > > > > > > > 
> > > > > > > > Hm, I had the semantics all wrong.  I'll send a v2 with a new patch and
> > > > > > > > test.
> > > > > > > 
> > > > > > > Actually, I think we should just mandate that all seccomp filters in the
> > > > > > > list have the same value for flag SECCOMP_FILTER_FLAG_NO_FORK_INHERIT.
> > > > > > > This should be as simple as checking that the flag has the same value for
> > > > > > > a new filter and its parent.  In case of TSYNC we should be fine since all
> > > > > > > threads need to have the same ancestors as the calling thread.
> > > > > > > 
> > > > > > > If we don't do that, we'll have to inherit partial lists, which prevents
> > > > > > > us from using references since we'll need to modify field .prev of some
> > > > > > > filters.  I'm guessing making actual copies of filters will not be
> > > > > > > acceptable upstream...
> > > > > > > 
> > > > > > > One big issue with this approach is if we want to strace an application
> > > > > > > that itself uses seccomp-bpf.  In that case, the application will likely
> > > > > > > fail to set up its own filter because flag NO_FORK_INHERIT differ.
> > > > > > 
> > > > > > Seccomp filters are ubiquitous nowadays, software like systemd-nspawn
> > > > > > generates and applies seccomp filters by default.  If the proposed feature
> > > > > > does not support this workflow, does it have a chance to be accepted
> > > > > > into the kernel?
> > > > > 
> > > > > Yep, that was my concern too...  I'll have a second look after the weekend
> > > > > in case I missed something, but I don't see any other clear solution at
> > > > > the moment :(
> > > > 
> > > > Actually, we could relax the constraint: when adding a new filter with
> > > > flag NO_FORK_INHERIT, we can simply mandate that all *previous* filters
> > > > (again, we only need to check the previous one) have NO_FORK_INHERIT as
> > > > well.  Then, forks inherit a sublist of filters starting with the first
> > > > filter without NO_FORK_INHERIT set.
> > > 
> > > Would it be possible to add a traditional seccomp filter after
> > > a NO_FORK_INHERIT one?  If yes, how is it going to work?
> > > If no, is there any security implications?
> > 
> > Scratch that, it won't work.  I always forget that new filters are added
> > at the beginning of the list, not at the end.
> > 
> > What if we had the PID inside the BPF program?  We could compare that
> > against a hardcoded value and shortcut to RET_ALLOW if they match.
> 
> I don't think it would work well with pid namespaces,
> and pid recycling would make it unreliable.

Ah, right.  If parent dies, PID could be recycled and end up assigned to a
descendant.

> 
> Why can't we have only some of seccomp filters to be NO_FORK_INHERIT'ed?

If we have filter list f1->nf2->f3, with only nf2 NO_FORK_INHERIT'ed, we
would need to make a copy of at least f1 upon forking to rewrite its .prev
pointer to point to f3 directly.  Currently, children get a reference to
the list, there's no need for any filter copy.

Paul


More information about the Strace-devel mailing list