[RFC PATCH 15/15] Ask some questions in comments

Eugene Syromiatnikov esyr at redhat.com
Fri Jun 5 17:34:31 UTC 2020


On Fri, Jun 05, 2020 at 06:58:46PM +0200, Ákos Uzonyi wrote:
> Thanks for your answers.
> 
> On Fri, 5 Jun 2020 at 13:57, Eugene Syromiatnikov <esyr at redhat.com> wrote:
> > On Mon, Jun 01, 2020 at 01:44:44PM +0200, Ákos Uzonyi wrote:
> > >  struct btree {
> > >       uint64_t set_value;         /**< Default set value */
> > > +
> > > +     /*
> > > +     Would it make sense to make this pointer type of void**?
> > > +     This type is quite confusing anyway, but maybe void** is a bit more clear.
> > > +
> > > +     As I see we always cast it to uint64_t* so we use it as a pointer to just
> > > +     raw data, but for pointer arithmetic we need to point it to a pointer type.
> >
> > I think it should be simply uint64_t*, I can't recall what I was thinking
> > about when decided it that way.  btree_get_block() has to be fixed
> > accordingly (well, triple-asterisk there should be a hint, but, oh, well).
> 
> Right, that would be the most intuitive type. Yes, btree_get_block has
> to be fixed, as it uses pointer arithmetic that expects this pointer
> to point to a pointer type.
> 
> > > +/*
> > > +Can we check in any efficient way, that these two btree caches are valid?
> >
> > IIRC, PID NS IDs are not recycled (it has to be verified, though), so we can
> > rely on that.
> 
> I tested it, and ns ids are recycled very fast.
> Not a big problem, however; I think the other caches are enough for
> good performance.

Well, that's a bummer; quite expected, probably, since those are
(derived from) pointers and not IDRs (IIRC).  At least we can rely
on the fact that the NS hierarchy is persistent for a specific process
throughout its lifetime.

> > >  struct proc_data {
> > >       int proc_pid;
> > > +
> > > +     /*
> > > +     Does it make sense to cache these values below?
> > > +
> > > +     As far as I know there is no other way to check this cache validity,
> > > +     than re-reading these values from /proc. But then we have to read these
> > > +     values again anyway, so no reason to cache.
> > > +
> > > +     The only thing I find useful to cache is proc_pid.
> > > +     With a cached proc_pid we don't have to read all entries in proc, just one.
> >
> > In fact, we (probably) can verify that the process is the same by checking
> > its starttime (it is 64-bit and shouldn't roll over) in /proc/pid/stat.
> 
> Wow that's a cool trick.
> 
> I realized I was wrong anyway, as this cache helps us to quickly find
> the proc_pid in case it's not yet in the proc_pid cache.
> 
> By the way, I'm now implementing the caching. I'm using two btrees:
> One for mapping (ns, ns_pid) pair to proc_pid, which is updated when
> we look up an id.
> And one for mapping proc_pid to proc_data, which is updated every time
> we read a dir in proc.
> 
> I think these caches give good enough performance, but if you have any
> comments, let me know.

Sounds reasonable to me so far.

> > > +/*
> > > +Do we need "last" parameter? Only 0 and our_ns are passed to it, but neither
> > > +does anything useful, as we can't go higher than our_ns anyway
> > > +(ioctl fails with EPERM).
> >
> > I think that the idea was to bail out early when we check pid relations
> > (for example, if we have 3 PID NS, for strace_NS - PID1_NS - PID2_NS, and we're
> > looking for PID2 in PID1_NS for whatever reason, then we can finish
> > early, but I can't remember exactly why I thought it could be useful).
> 
> I understand.
> But if I'm right, we only want to translate to our ns. And actually it
> does not seem to be a big problem if we read the full hierarchy; it is
> (most probably) very small, and we cache it in proc_data, so even if
> we don't use it, it might be useful later.

Yeah, I think you can easily ignore that hypothetical use case and drop
the "last" argument.

> > > diff --git a/tests/inject-nf.test b/tests/inject-nf.test
> > > index cadb5adb..9b01345e 100755
> > > --- a/tests/inject-nf.test
> > > +++ b/tests/inject-nf.test
> > > @@ -34,6 +34,12 @@ test_rval()
> > >  test_rval 0
> > >  test_rval 1
> > >  test_rval 0x7fffffff
> > > +
> > > +#This test fails (overflow) with these large values,
> > > +#as we use signed integers in pid translation code
> > > +#(PID_MAX_LIMIT is 2^22, so that seems reasonable)
> >
> > The issue is that format has actually changed, it used to be printed
> > as RVAL_UDECIMAL (kernel_ulong) and it is int now; I would say that the
> > expected output should be updated in this case (printing -2147483648
> > seems reasonable to me).  Maybe a separate test that checks that?
> >
> > > +#
> > > +#Maybe we could use a different syscall, for example getuid?
> >
> > Seems reasonable to me.
> 
> Ok, now I've replaced it with geteuid, and it works. I didn't use
> getuid, because alpha does not have getuid (but has geteuid, according
> to alpha/syscallent.h).

Yes, Alpha employs this funky getxuid that returns a pair of values.

> >
> > >  test_rval 0x80000000
> > >  test_rval 0xfffff000
> > >  test_rval 0xfffffffe
> >
> 
> 
> Just some questions:
> 
> In the future, should I send a lot of little patches (like this
> series), or is it cleaner if I send one squashed patch containing the
> current pidns.c code? I'm refactoring pidns.c, and diffs are a bit
> unreadable.

I think it's better to have something more geared towards the final version,
i.e. a single patch, unless you decide to split it.  But for cases where
there are lots of small/local changes you deem reasonable to annotate,
the multiple-commits-to-be-squashed format is probably preferable.

> Am I correct, that you didn't test your original code? (It had a
> couple compile errors).
> If that's the case, it's amazing for me, that the code had just a very
> few bugs :).

Yeah, I've tested only the btree part; the pidns.c part was mostly
write-only, as I abandoned it after writing down most of the logic
without actually checking whether it works/compiles.  So much for the
lack of focus.  The fact that it's actually of use surprises me as well.

Speaking of tests, we definitely need those, but there's the understandable
obstacle that PID NS creation requires elevated privileges; I think we can
go the same way as the btrfs tests: have a test that requires root and run
it semi-manually from time to time.


